
From Librametry to Webometrics

Hildrun Kretschmer 1, Mike Thelwall 2

1 Nerdi, NIWI, The Royal Netherlands Academy of Arts and Sciences, Amsterdam, The Netherlands

2 School of Computing and Information Technology, University of Wolverhampton, UK

Abstract

The development of information and library sciences together with science studies will, among other things, be fashioned by the development of quantitative studies conducted in this field. The terminology thus obtained shall be perceived as a reflection of the technical, social and political backgrounds of the researchers.

The technical redevelopment of methods of communication through the Internet presents a challenge for information scientists to cultivate novel quantitative methods and techniques in order to measure rates of information exchange in this new medium.

(This paper was presented at the National Seminar on Information Professionals for the Digital Era, 29th-30th January 2003, Chennai, India, and published in the Proceedings, MALA, Madras Library Association, edited by A. Amudhavalli, 2003)

Librametry/Librametrics versus Bibliometrics

Some fifty years ago, at the Aslib Conference in Leamington Spa in 1948, the term ‘Librametry’ was established by the famous Indian librarian S. R. Ranganathan: “...it is necessary for librarians to develop librametry in the lines of biometry, econometry and psychometry since many of the matters connected with library work and services involve large numbers...” (Aslib Proceedings 1949, p. 102). His suggestions were avidly welcomed at the conference, notably also by Bernal.

In preparation for this great event, Ranganathan in the course of the preceding 20 years, i.e. since 1925, had already successfully practised the application of ‘Elements of Statistical Calculus’ to library problems. The success accomplished had inspired him to introduce publicly the term ‘Librametry’ to the above-mentioned conference (Gopinath, 1992). For many years Ranganathan had worked in the library of the University of Madras.

Within the sphere of libraries, librametry was applied to a wide range of problems by Ranganathan and other Indian scientists. The uses listed below account for only some of them (Ranganathan, 1969, reprinted 1995).

• Librametry in the day-to-day work of a library
• Librametry in organising national or state library systems
• Operations research in library work
• Librametry and book selection
• Librametry and classification
• Library administration
• Library services

In the first years following Ranganathan's achievements there were hardly any responses to his work in the western world. As a result, more than 20 years after the introduction of the term ‘Librametry’ (sometimes also called ‘Librametrics’), the information scientist A. Pritchard coined the term ‘Bibliometrics’ in 1969 as a redevelopment of the term ‘Statistical Bibliography’ for the quantitative analysis of bibliographies (Sen, 1995).

This term was very rapidly accepted internationally, with ‘Bibliometrics’ later also being accepted by Indian librarians.

However, at a later time, the well-known English information scientist B. C. Brookes pointed out regretfully (1990, p. 41): "Had I known of Ranganathan's term in time, I would have adopted librametrics for information studies and bibliometrics for information science. But it was too late. Librarians liked bibliometrics too".

Brookes (1990) attached importance to a distinction to be made between Information Science and Information Studies, with the former term apparently being more theory-oriented and the latter term more application-oriented, i.e. going along the lines of the use of techniques with a view to optimising Library Administrations and Library Services.

Bibliometrics versus Scientometrics/Informetrics

Looking at the historical development in our field it becomes obvious that for some years the term ‘Bibliometrics’ has apparently been suffering a similar fate to the term ‘Librametry’. It seems as if the term ‘Bibliometrics’ has become subordinate to the term ‘Informetrics’.

Leo Egghe and Ronald Rousseau in 1987 heralded the advent of the henceforth biennially held international conferences with the ‘First International Conference on Bibliometrics and Theoretical Aspects of Information Retrieval’, which took place in Diepenbeek, Belgium. However, Jean Tague, in organizing her conference, went beyond the term ‘Bibliometrics’ and gave an enlarged name to the subsequently held international conference: ‘Second International Conference on Bibliometrics, Scientometrics and Informetrics’. This conference was held in 1989 in London, Ontario, Canada.

The underlying reason was that, independently of the other two terms mentioned above, the term ‘Scientometrics’ had been coined by Vassily V. Nalimov in Russia at the end of the 1960s (Hood & Wilson, 2001) and had later spread to Hungary, The Netherlands and Spain. The term ‘Scientometrics’ is said to encompass all quantitative aspects of studies in the field of science of science, communication in science and science policy.

The supporters of ‘Scientometrics’ had taken the view that their subject matter could not be covered by the term ‘Bibliometrics’.

Hence, we had to face the above enlargement of terms in the title of the Second International Conference.

It was interesting to see that the Proceedings of the Second International Conference involved a surprise. The editors Leo Egghe and Ronald Rousseau “...strongly endorse the use of the term ‘informetrics’...” (Informetrics 89/90, p. V). They took the view that ‘Bibliometrics’ and ‘Scientometrics’ were both subordinate to the term ‘Informetrics’. Accordingly, the subsequent conference held in Bangalore, India, in 1991, was given the title ‘Third International Conference on Informetrics’ (Chair: I.K. Ravichandra Rao).

During the preparations for the next conference, scheduled to be held in Berlin, Germany in 1993 (Chair: Hildrun Kretschmer), serious problems had arisen at a preceding conference held in Leiden, The Netherlands. At the Leiden conference the theme was: ‘Science and Technology Indicators - Evaluation in Science and Technology’. Understandably, a major part of the participants had come from Science Policy. Officially, the idea was to exclude ‘Informetrics’ from the title, since the subject ‘Evaluation’ could not be logically subordinated to ‘Informetrics’. However, the Berlin conference was held under the title ‘Fourth International Conference on Bibliometrics, Informetrics and Scientometrics’.

Generally speaking, the participants could be subdivided into two distinct scientific groups, with members of both groups representing fairly separate scientific interests. While the members of ‘Bibliometrics/Informetrics’ had come from libraries or information and documentation centres, the ‘Scientometricians’ had their homes in centres of science studies or sociological institutions. The first mentioned group was primarily concerned with “classical bibliometric laws, growth, principles underlying informetric distributions or the non-Gaussian nature of these distributions, etc.”, whereas the second group was substantially concerned with science-policy-oriented subjects, such as evaluation of science and technology. Nevertheless, both groups were united in their common approach of using the same identifiable objects, such as the number of publications, the number of citations, etc., as the points of departure for their empirical studies.

For all these persistently recurring problems, the development of the subject matter had meanwhile reached a stage that required the foundation of an international society at the Berlin conference in 1993. The result of discussions with Invited Speakers at the Plenary Session on “Bridging the Gaps between Bibliometrics, Informetrics and Scientometrics” at the constituent assembly of the society was to give the Society the title “International Society for Scientometrics and Informetrics (I.S.S.I.)”.

Henceforth, i.e. from 1995 on in Chicago, U.S.A. (Chair: Michael Koenig), the biennially held international conferences were termed ‘International Conference on Scientometrics and Informetrics’. Thus the term ‘Bibliometrics’ used at the first international conference in Belgium in 1987 has disappeared!

Clearly, there was a certain measure of agreement that ‘Bibliometrics’ could be subordinate to ‘Informetrics’. By the way, however, the term ‘Bibliometrics’, in contrast to ‘Librametry/Librametrics’, continues to be used in research.


Apart from these political changes there are technical changes that have made an impact. The new information technology by way of the Internet has called many old techniques into question.

Will library services still be needed in their old form? In Europe, for instance, some library institutes and information centres are being closed down.

The question arises whether the well-known old methodologies underlying librametrics, bibliometrics, informetrics and scientometrics will still be capable of serving their original purposes in the age of the Internet, or whether it is now time to devise new methods. Of course, this new development has to be taken into account in training the future generation.

Björneborn, L. & Ingwersen, P. (2001) point out that a new research field, webometrics, has emerged since the mid-1990s and furthermore that (p. 65): “Webometrics displays several similarities to informetric and scientometric studies and the application of common bibliometric methods.” Therefore, the new approach to quantitative measurement on the Internet, webometrics, devised and established only recently in Europe and Israel, should be given great attention as a new European perspective.

Webometrics

Probably the biggest changes in methods for information exchange in the West over the past ten years have all been driven by the potential of the Internet. Digital libraries (Fox & Urs, 2002) are one visible incursion into the domain of librarians, and today much more general information is electronic and available over the Internet.

In the West, librarians seem to be keeping their traditional roles, perhaps in a reduced form, but moving into new areas, helping users to search for information from a much greater variety of sources, many online. Information science research has also changed, with much research into how the new technologies are being used, particularly email (Herring, 2002) and the web (Cronin, 2001; Kling & McKim, 2000).

In addition to user studies there have been attempts to extract new kinds of information from the web, for example examining the relationship between areas of the web by counting the number of hyperlinks between them (Ingwersen, 1998). These kinds of studies have grown out of bibliometric analyses of the citations in published journal articles (Vohora et al., 2001; Borgman & Furner, 2002). In this paper we will give a brief overview of web link research and introduce a new European Union-funded project in this area, part of the emerging field of webometrics.

Web Hyperlink Research

Bibliometrics is the sub-area of informetrics for the quantitative study of aspects of documents. One of the main techniques used is to examine the relationships between academic journal articles through patterns that emerge from their reference sections. For example, if one article cites another then this will normally mean that the content in the cited article has been found useful or relevant in some way (Cronin, 1984). Citations can also be used by mapping software to draw pictures of the relationships between articles, authors, journals or academic fields (Small, 1999).

Given the extensive use of citations, several information scientists have noted their similarity to hyperlinks in web pages and have sought to exploit this to extract new information from those links (Larson, 1996; Rodríguez Gairín, 1997; Almind & Ingwersen, 1997; Rousseau, 1997). The thesis of Larson (1996) for instance, was that the relationship between a set of web pages on a given topic could be visualised by plotting them together with the hyperlinks between them.

Recent research has focussed not upon individual hyperlinks but upon counting links to and from web sites or other areas of the Web. Ingwersen (1998) used the advanced search capabilities of the search engine AltaVista to count links to and from entire countries. AltaVista allows Boolean queries based upon words, links and domain names (http://www.altavista.com/web/adv).

For example, the link command requests pages that link to the given URL or partial URL, and host requests pages that are hosted by the given domain name. The following example is a request for all web pages from the University of Delhi’s Institute of Informatics and Communication web site (http://www.iic.ac.in) that contain a link to a page in the National Institute of Science, Technology and Development Studies (http://www.nistads.res.in).

link:nistads.res.in and host:iic.ac.in

Note that the initial “www.” is omitted in case there are multiple domain names for the same site. This occurred in the example above, with the domain name euindia.iic.ac.in also belonging to the University of Delhi’s Institute of Informatics and Communication.

This gives the following results:

Figure 1. A section from the AltaVista results showing the discovery of a page hosted by the University of Delhi’s Institute of Informatics and Communication that links to the National Institute of Science, Technology and Development Studies. The URL of the page is http://euindia.iic.ac.in/networkpartners.php

Figure 2. A section from the linking page showing the link identified by AltaVista. The URL of the page is http://euindia.iic.ac.in/networkpartners.php and the link to http://nistads.res.in/ is described at the bottom.

The Boolean query facility, and similar features offered by other search engines such as AllTheWeb.com, are new tools with which to map the web. Other researchers have also used a web crawler to collect data on hyperlinks (Thelwall, 2001a). This is a program that automatically fetches pages from the web and extracts their links.

The advantage of this approach is greater control over the process of finding and extracting links, but its disadvantage is that a research crawler can only hope to crawl a limited subset of the web.
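The link-extraction step of such a crawler can be sketched as follows. This is a minimal illustration using only the Python standard library, not the crawler design of Thelwall (2001a); the class and function names are hypothetical, and fetching pages, politeness rules and duplicate detection are deliberately left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collect the absolute target URL of every <a href=...> in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the URL of the page itself.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return all link targets found in the HTML text of one page."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


def external_links(html, base_url):
    """Keep only links whose host differs from the host of the source page."""
    source_host = urlparse(base_url).netloc
    result = []
    for url in extract_links(html, base_url):
        host = urlparse(url).netloc
        if host and host != source_host:
            result.append(url)
    return result
```

Applied to the HTML text of a page such as http://euindia.iic.ac.in/networkpartners.php, external_links would keep the link to http://nistads.res.in/ and discard links that stay within the same host.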

It should be noted that the extent of use of the web varies by country. For example, in the UK, all universities have large web sites and these typically attract thousands of links from web pages from other universities. In India, however, not all universities have their own web sites yet (see http://www.imsc.ernet.in/webserv/servers.html) and the page and link counts for the set that we investigated were found to be very low. For example, the University of Delhi had 103 pages recorded by Google, and 267 links from the rest of the world (using the query: link:du.ac.in AND NOT host:du.ac.in).

See Figure 3 for a selection of results.

Figure 3. A section from the AltaVista results showing the discovery of 267 pages that link to Delhi University. The first six pages with a matching link are described.
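The two query patterns used in this section can be assembled programmatically. The sketch below only builds the query strings in the link:/host: syntax quoted in the text; the function names are hypothetical, and no search engine interface is called or assumed.

```python
def strip_www(domain):
    """Drop a leading 'www.' so that sibling host names of a site also match."""
    return domain[4:] if domain.startswith("www.") else domain


def interlink_query(source_domain, target_domain):
    """Query for pages hosted under source_domain that link to target_domain."""
    return f"link:{strip_www(target_domain)} and host:{strip_www(source_domain)}"


def inlink_query(target_domain):
    """Query for pages outside target_domain that link to target_domain."""
    domain = strip_www(target_domain)
    return f"link:{domain} AND NOT host:{domain}"


# Reproduces the two queries quoted in the text.
print(interlink_query("www.iic.ac.in", "www.nistads.res.in"))
print(inlink_query("du.ac.in"))
```

Excluding the target's own host in the second pattern matters because internal navigation links would otherwise inflate the count.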

Hyperlinks and Informal Scholarly Communication

Journal article citations have been investigated by information scientists for well over 30 years, but there is still controversy over basic issues such as whether citation counts are useful to help measure research quality (Moed, 2002; Vohora et al., 2001). The same debate has now started for hyperlinks. What can counts of hyperlinks usefully be used to show?

If a web site has many links pointing to it, then is this a good indicator that it has high quality content? Also, can hyperlinks be used to trace online scholarly communication? A series of studies have given some insights into why hyperlinks are created in academic settings and whether they are related to scholarly communication. The data for these studies has come from the university Web sites of a country.

The first question asked was whether counting links to university or departmental web sites would be a valid measure of research impact, in the way that citation counts might be. Although early results were disappointing (Thelwall, 2000; Thomas & Willett, 2000), later studies found statistically significant correlations in both cases (Thelwall, 2001b; 2002a; Chu et al., 2002; Li et al., 2003; Tang & Thelwall, 2002). This gave evidence that hyperlinks bore some relationship to scholarly communication, although they were not necessarily caused directly by it. In fact academic research may or may not use the web and even if it does, may not leave a trace in the form of hyperlinks (Kling & McKim, 2000). A recent paper reported on a survey of the reasons for creating a random sample of 414 hyperlinks between UK university web sites (Wilkinson et al., 2003).

It was found that whilst less than 1% were equivalent to journal citations, in terms of citing a refereed academic document, over 90% bore some relationship to scholarly activity. This is strong evidence that link counts are indicators of informal scholarly communication. An institution with high research productivity should naturally expect to have a high degree of informal scholarly communication, some of which will probably include the creating and attracting of hyperlinks.
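The kind of correlation test referred to above, between inlink counts and a research rating, can be sketched as follows. The text does not say which statistic the cited studies used; Spearman's rank correlation is assumed here because link counts are typically highly skewed. The function names are hypothetical and significance testing is omitted.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(xs), average_ranks(ys))
```

In use, xs would hold the inlink counts of a set of universities or departments and ys the corresponding research ratings, in the same order.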

Applications of Link Counts

Link counts for universities have been suggested as a (weak) proxy for university research quality in countries where there are no comparative figures available (Thelwall et al., 2001). They have also been suggested as alternative indicators of journal impact, both as a double-check on the Institute for Scientific Information’s (ISI) figures and to apply to journals not covered by them (Vaughan & Thelwall, 2003).

Links have also been used to track and analyse patterns of online scholarly communication within the European Union (EU) (Thelwall et al., 2003). Counts of links between EU universities were obtained from AltaVista and broken down by language of the linking page. It was found that English language pages and links were very prominent throughout, accounting for approximately 50% in nearly all countries. In the UK and Eire, English accounted for almost all pages, whereas in Greece it accounted for under 10%. From this the importance of English on the academic web was very clear, but also the fact that it was being used in tandem with national languages.

The WISER Project

The EU has recently financed a new consortium from England, The Netherlands and Spain to investigate further the potential to create new indicators from the web for use in science and technology policy making. This is a three-year project that started in November 2002 and is one possible direction for the future of information science research.

One of the main products of the project will be a ‘Best practice’ manual for Web data use in indicator research. This will take the form of a publicly available web site and will present best practice recommendations. An initial version will soon be available, with a finished version produced at the end of the project. This will be an ideal resource for those wishing to start webometric investigations (www.webindicators.org).

It will give detailed recommendations on how to collect data, the various options for analysing and reporting it, and hints on how to interpret the final results. In addition to the production of the handbook, there will be several projects that investigate different aspects of the web, including links, gender aspects and the deep Web.

The link studies will also investigate colinks. Two web pages are colinked if they are both linked to by a third web page. Colinks are used by search engines to indicate similarity of page content: if two pages are colinked then they are more likely to be about the same subject than two that are not (Thelwall & Wilkinson, 2004). Cocitations are also used in bibliometrics to generate pictures of the relationships between authors (White & McCain, 1998) or fields (Small, 1999). The colink study will investigate whether useful tools for mapping science and technology on the web can be built from colinks.
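The colink definition given above can be turned into a simple count. The sketch below is one straightforward way of doing this, not the method of the WISER project; the function name and the input format (a mapping from each page to the pages it links to) are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def colink_counts(outlinks):
    """Count, for every pair of pages, how many third pages link to both.

    outlinks maps a source page to the collection of pages it links to.
    """
    counts = Counter()
    for source, targets in outlinks.items():
        # The linking page must be a third page, so ignore links to itself.
        distinct = sorted(set(targets) - {source})
        for a, b in combinations(distinct, 2):
            counts[(a, b)] += 1
    return counts
```

Pairs with high counts are candidates for pages on a similar subject; the same counting applied to reference lists instead of hyperlinks gives cocitation counts.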

The gender studies will investigate whether there are differences in the way that male and female scientists and technologists are perceived on the web. This is likely to be a much more qualitative study because it would be very difficult to ascertain for each individual web page whether it was created by a man or woman. As a result of this, the link counting tools will be ineffective.

The deep Web, also known as the invisible web, is the name given to the set of web pages that are not indexed by search engines, perhaps because they are in an online database and a query must be typed to find them. The WISER project will investigate science and technology content in the deep Web and report on the importance of deep Web content. In summary, the WISER project will provide useful resources for those wishing to start Webometrics and information on the potential of a range of new Webometric techniques.

Acknowledgement

This work was supported by a grant from the Common Basis for Science, Technology and Innovation Indicators part of the Improving Human Research Potential specific programme of the Fifth Framework for Research and Technological Development of the European Commission.

It is part of the WISER project (Web indicators for scientific, technological and innovation research) (Contract HPV2-CT-2002-00015) (www.webindicators.org).

References

Almind, T. C. & Ingwersen, P. (1997). Informetric analyses on the world wide Web: methodological approaches to ‘Webometrics’. Journal of Documentation, 53(4), 404-426.

Björneborn, L. & Ingwersen, P. (2001). Perspectives of webometrics. Scientometrics, 50(1), 65-82.

Borgman, C. & Furner, J. (2002). Scholarly communication and bibliometrics. In: Cronin, B. (ed.), Annual Review of Information Science and Technology 36, Medford, NJ: Information Today Inc., pp. 3-72.

Brookes, B.C. (1990). Biblio-, sciento-, infor-metrics??? What are we talking about? In: Egghe, L. & Rousseau, R. (Eds.). Informetrics 89/90. Amsterdam: Elsevier Science Publisher, 31-44.

Chu, H., He, S. & Thelwall, M. (2002). Library and information science schools in Canada and USA: A Webometric perspective. Journal of Education for Library and Information Science, 43(2), 110-125.

Cronin, B. (1984). The citation process: the role and significance of citations in scientific communication. London: Taylor Graham.

Cronin, B. (2001). Bibliometrics and Beyond: Some thoughts on Web-based citation analysis. Journal of Information Science, 27(1), 1-7.

Egghe, L. & Rousseau, R. (1990). Preface. In: Egghe, L. & Rousseau, R. (Eds.). Informetrics 89/90. Amsterdam: Elsevier Science Publisher, p. V.

Fox, E. A. & Urs, S. R. (2002). Digital libraries. In: Cronin, B. (ed.), Annual Review of Information Science and Technology 36, Medford, NJ: Information Today Inc., pp. 503-589.

Gopinath, M.A. (1992). Shiyali Ramamrita Ranganathan: A Profile in Relation to Librametry. In: Ravichandra Rao, I.K. (Ed.). Informetrics 91. Bangalore: Sarada Ranganathan Endowment for Library Science, 9-16.

Hood, W.W. & Wilson, C.S. (2001). The literature of bibliometrics, scientometrics, and informetrics. Scientometrics, 52(2), 291-314.

Herring, S. C. (2002). Computer-Mediated Communication on the Internet. In: Cronin, B. (ed.), Annual Review of Information Science and Technology 36, Medford, NJ: Information Today Inc., pp. 109-168.

Ingwersen, P. (1998). The calculation of Web Impact Factors. Journal of Documentation, 54(2), 236-243.

Kling, R. & McKim, G. (2000). Not Just a Matter of Time: Field Differences in the Shaping of Electronic Media in Supporting Scientific Communication. Journal of the American Society for Information Science, 51(14), 1306-1320.

Kretschmer, H. & Thelwall, M. (2003). The development of information professionals: The European perspective: The way from librametry to webometrics. In: A. Amudhavalli (Ed.), Proceedings of the MALA Platinum Jubilee Celebrations, National Seminar on Information Professionals for the Digital Era, Madras, India, January 29-30, 2003, EFEX: Chennai, 2003, 13-25.

Larson, R. (1996). Bibliometrics of the World Wide Web: An exploratory analysis of the intellectual structure of Cyberspace. In Proceedings of ASIS96, 71-78. Retrieved July 1, 2002, from http://sherlock.berkeley.edu/asis96/asis96.html

Li, X., Thelwall, M., Musgrove, P. & Wilkinson, D. (2003). The relationship between the links/Web Impact Factors of computer science departments in UK and their RAE ratings or research productivities in 2001. Scientometrics, 57(2), 239-255.

Moed, H. F. (2002). The impact-factors debate: the ISI’s uses and limits, Nature, 415, 731-732.

Ranganathan, S.R. (1995, Reprint from 1969). Librametry and its Scope. The International Journal of Scientometrics and Informetrics, 1(1), 15-21.

Rodríguez Gairín, J. M. (1997). Valorando el impacto de la informacion en Internet: AltaVista, el "Citation Index" de la Red, Revista Espanola de Documentacion Cientifica, 20:175-181. Available: http://www.kronosdoc.com/publicacions/altavis.htm

Rousseau, R. (1997). Sitations: an exploratory study, Cybermetrics, 1. Available: http://www.cindoc.csic.es/cybermetrics/articles/v1i1p1.html

Sen, S.K. (1995). Introduction. The International Journal of Scientometrics and Informetrics, 1(1), 3-5.

Small, H. (1999). Visualising science through citation mapping, Journal of the American Society for Information Science, 50(9), 799-812.

Tang, R. & Thelwall, M. (2002). Disciplinary Differences in US Academic Departmental Web Site Interlinking, SUNY Albany.

Thelwall, M., Binns, R., Harries, G., Page-Kennedy, T., Price, E. & Wilkinson, D. (2001). Custom Interfaces for advanced queries in search engines, ASLIB Proceedings, 53(10), 413-422.

Thelwall, M., Tang, R. & Price, E. (2003). Linguistic patterns of academic Web use in Western Europe, Scientometrics, 56(3), 417-432.

Thelwall, M. & Wilkinson, D. (2004, to appear). Finding similar academic Web sites with links, bibliometric couplings and colinks. Information Processing & Management.

Thelwall, M. (2000). Web Impact Factors and Search Engine Coverage, Journal of Documentation, 56(2), 185-189.

Thelwall, M. (2001a). A Web crawler design for data mining, Journal of Information Science, 27(5), 319-325.

Thelwall, M. (2001b). Extracting macroscopic information from web links, Journal of the American Society for Information Science and Technology, 52(13), 1157-1168.

Thomas, O. & Willett, P. (2000). Webometric analysis of departments of Librarianship and information science. Journal of Information Science, 26(6), 421-428.

Vaughan, L. & Thelwall, M. (2003). Scholarly use of the Web: What are the key inducers of links to journal Web sites? Journal of the American Society for Information Science and Technology, 54(1), 29-38.

Vohora, S. B., Shah, Z. A. & Vohora, D. (2001). Low impact factors for Indian journals. Do factors other than quality influence? Current Science, 81(8), 867.

White, H. D. & McCain, K. W. (1998). Visualizing a discipline: an author co-citation analysis of information science, 1972-1995, Journal of the American Society for Information Science, 49(4), 327-355.

Wilkinson, D., Harries, G., Thelwall, M. & Price, E. (2003). Motivations for academic Web site interlinking: Evidence for the Web as a novel source of information on informal scholarly communication, Journal of Information Science, 29(1), 59-66.