What usage statistics say about online user behaviour
Philip Davis, Librarian, Cornell University
Presented at the 5th Fiesole Collection Development Retreat, Oxford University, July 24, 2003.

What do these stats mean?
A. Monthly Statistics by Journal: Subscribed Journal Usage

Journal  Jan 2002  Feb 2002  Mar 2002  Apr 2002  May 2002  Jun 2002  Total Use
A             100       131       136       193        80        56        696
B               0         0         0         0        10         0         10
C               8         6        39         5        88       368        514

Usage statistics do not tell us…
- what is being downloaded
- who did the downloading
- why an article was downloaded
- how many individuals are responsible for the statistics

Why we can’t know everything
- Patron confidentiality
- Use IP address as a surrogate for “user”
- Some IPs represent aggregate users:
  - Library proxy server
  - Public computers in libraries and labs
  - Dial-in modem users
- Some IPs are assigned dynamically

Results of two studies (ACS)
- Article downloads by IP address
  Previous studies have reported only aggregate use analysis.
- How scientists find the articles they read
  Using referral URL data: the location from which users were referred to the ACS site.
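The question “What do these stats mean?” can be made concrete with a little arithmetic on the table’s own figures: the totals alone do not reveal that most of journal C’s use falls in a single month. A minimal sketch using only the numbers in the monthly table; the “peak-month share” is an illustrative measure chosen here, not one used in the talk.

```python
# Monthly downloads, January to June 2002, copied from the monthly table.
usage = {
    "A": [100, 131, 136, 193, 80, 56],
    "B": [0, 0, 0, 0, 10, 0],
    "C": [8, 6, 39, 5, 88, 368],
}
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]


def summarize(counts):
    """Return total use, the busiest month, and that month's share of the total."""
    total = sum(counts)
    peak = max(range(len(counts)), key=lambda i: counts[i])
    share = counts[peak] / total if total else 0.0
    return total, months[peak], share


for journal, counts in usage.items():
    total, peak_month, share = summarize(counts)
    # e.g. "C: total=514, peak=Jun (72% of total)"
    print(f"{journal}: total={total}, peak={peak_month} ({share:.0%} of total)")
```

Journal A’s use is spread fairly evenly (its busiest month, April, is 28% of the total), while 72% of journal C’s use falls in June, a pattern the “Total Use” column hides.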
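Because the studies treat each IP address as a surrogate “user”, the basic analytic step is to group download records by IP and count articles and distinct journals per IP. A minimal sketch, assuming a hypothetical log of (IP address, journal, article) records; the record layout, journal codes, and values are invented for illustration and do not come from the ACS logs.

```python
from collections import defaultdict

# Hypothetical download records: (ip_address, journal_code, article_id).
# Layout and values are invented for illustration only.
records = [
    ("10.0.0.1", "J1", "a1"),
    ("10.0.0.1", "J1", "a2"),
    ("10.0.0.1", "J2", "a3"),
    ("10.0.0.2", "J1", "a1"),
    ("10.0.0.3", "J3", "a4"),
]


def per_ip_summary(records):
    """Map each IP to (number of downloads, number of distinct journals)."""
    downloads = defaultdict(int)
    journals = defaultdict(set)
    for ip, journal, _article in records:
        downloads[ip] += 1
        journals[ip].add(journal)
    return {ip: (downloads[ip], len(journals[ip])) for ip in downloads}


summary = per_ip_summary(records)
for ip, (n_downloads, n_journals) in sorted(summary.items()):
    print(ip, n_downloads, n_journals)
```

Nothing in such a summary distinguishes one heavy reader from a library proxy server or public terminal aggregating many people, which is the caveat listed under “Why we can’t know everything”.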
Most users download few articles
[Figure: histogram; x axis: Number of Downloads]

…from few journals
[Figure: histogram; x axis: Number of Journals]

The relationship is quadratic
[Figure: scatterplot; each point represents a “user”; x axis: Number of Journals; Library Proxy Server labelled; N = 1283, R² = 0.6798]

In fact, it’s an inverse square law
[Figure: scatterplot; y axis: x/y²; x axis: Number of Journals; Library Proxy Server labelled; N = 1283, R² = 0.6858]

Population size may be estimated
[Figure: log-log scatterplot; each point represents a journal; x axis: Number of Users; R² = 0.9169]

Analysis of individual use illustrates
- Most users download few articles from few journals
- A small number of users have a very large effect on total downloads
- User population size may be estimated by total use

Different paths to the same destination

Referral Type      Total Referrals  Unique IPs  Referrals per IP
library catalog              2,482         552               4.5
bib database                 2,372         324               7.3
e-journal list               1,813         405               4.5
web page                     1,108         190               5.8
web search                     996         491               2.0
email (web based)              592          79               7.5
article link                   571         204               2.8
other                           15           9               1.7
Total                        9,949       1,591               6.3

Web page referrals

Web Page Referral      Frequency  Percent
ACS Journal Web Page         366       33
News                         272       25
Department/lab               200       18
Faculty                       75        7
Course web page               43        4
Commercial                    31        3
Organization                  21        2
Personal                      19        2
Other                         81        7
Total                      1,108      100

Most users were referred infrequently
[Figure: histogram; y axis: Frequency; x axis: Number of Referrals per IP]

…from few sources
[Figure: histogram; y axis: Frequency; x axis: Number of Unique Domain Referrals per IP]

Yielding the same inverse square law
[Figure: scatterplot; each point represents a “user”; x axis: Number of Domains per IP; library proxy server labelled; N = 1591, R² = 0.4107]

In summary
- Scientists will use many
  different pathways to the same literature
- But use few and consistent methods of referral
- Underestimated the use of e-mail and bookmarking as a source of referral
- Underestimated bibliographic indexes
- Overestimated the importance of the library catalog

Implications
- Libraries
  - Develop redundant tools to facilitate access to literature
- Publishers
  - Facilitate direct linking to articles
  - Adoption of linking standards

“Save the time of the reader” (S.R. Ranganathan, from the Five Laws of Library Science)

References
P. Davis and L. Solla. An IP-level analysis of usage statistics for electronic journals in chemistry: Making inferences about user behavior. JASIST 54(11), 2003 (in press).
P. Davis. Information seeking behavior of scientists: a transaction log analysis of referral URLs. JASIST (in review, June 19, 2003).
http://people.cornell.edu/pages/pmd8/
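The “Referrals per IP” column of the referral-type table is total referrals divided by unique IPs. A short check using the table’s own figures reproduces that column to one decimal place. It also shows that the per-type unique-IP counts sum to 2,254, more than the overall 1,591, presumably because one IP can arrive by several types of referral.

```python
# Referral types: (total referrals, unique IPs), from the referral-type table.
referrals = {
    "library catalog": (2482, 552),
    "bib database": (2372, 324),
    "e-journal list": (1813, 405),
    "web page": (1108, 190),
    "web search": (996, 491),
    "email (web based)": (592, 79),
    "article link": (571, 204),
    "other": (15, 9),
}
overall = (9949, 1591)  # the "Total" row

# Referrals per IP = total referrals / unique IPs, rounded to one decimal.
per_ip = {kind: round(total / ips, 1) for kind, (total, ips) in referrals.items()}
for kind, ratio in per_ip.items():
    print(f"{kind:<18} {ratio}")
print(f"{'Total':<18} {round(overall[0] / overall[1], 1)}")

# Per-type referral totals add up to the overall total; per-type unique IPs do not.
type_total_referrals = sum(total for total, _ in referrals.values())
type_total_ips = sum(ips for _, ips in referrals.values())
print("referrals:", type_total_referrals, "vs", overall[0])
print("unique IPs:", type_total_ips, "vs", overall[1])
```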
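The “relationship is quadratic” slide reports an R² between the number of journals each “user” (IP) accessed and, presumably, the number of articles downloaded, but the slides do not state the fitting procedure. One way to obtain such a figure is an ordinary least-squares fit of downloads = a + b·j + c·j², sketched here by solving the normal equations directly. The per-IP data are invented, and the method is an assumption, not the study’s own.

```python
# Hypothetical per-IP data: (distinct journals, articles downloaded).
# Invented for illustration; not figures from the ACS study.
users = [(1, 1), (1, 2), (2, 3), (2, 5), (3, 8), (4, 14), (6, 30), (9, 70), (15, 210)]


def fit_quadratic(xs, ys):
    """Least-squares fit of y = a + b*x + c*x**2; returns ((a, b, c), R^2)."""
    # Normal equations (X^T X) beta = X^T y, with design columns 1, x, x^2.
    s = [sum(x ** k for x in xs) for k in range(5)]  # sums of x^0 .. x^4
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    m = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]  # augmented 3x4
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    a, b, c = (m[i][3] / m[i][i] for i in range(3))
    # Coefficient of determination for the fitted curve.
    mean_y = sum(ys) / len(ys)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    ss_res = sum((y - (a + b * x + c * x * x)) ** 2 for x, y in zip(xs, ys))
    return (a, b, c), 1 - ss_res / ss_tot


xs = [j for j, _ in users]
ys = [d for _, d in users]
(a, b, c), r2 = fit_quadratic(xs, ys)
print(f"downloads ~ {a:.2f} + {b:.2f}*j + {c:.2f}*j^2, R^2 = {r2:.3f}")
```

A single aggregate point such as the library proxy server can pull a least-squares fit like this strongly, which is one reason to identify it separately on the plot.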
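The “Population size may be estimated” slide plots each journal on log-log axes against its number of users, and the summary says user population may be estimated by total use. A minimal sketch of how such an estimate could work, assuming a power law users = k · (total use)^b fitted by least squares on log10 of both variables; the per-journal numbers are invented, and the functional form is an assumption, not taken from the paper.

```python
import math

# Hypothetical per-journal data: (total downloads, distinct users).
# Invented values for illustration; not figures from the study.
journals = [(50, 12), (200, 40), (800, 130), (3000, 400), (9000, 1000)]


def fit_power_law(points):
    """Fit users = k * total**b by least squares on log10 of both variables."""
    lx = [math.log10(total) for total, _ in points]
    ly = [math.log10(users) for _, users in points]
    n = len(points)
    mx, my = sum(lx) / n, sum(ly) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(lx, ly))
    sxx = sum((x - mx) ** 2 for x in lx)
    b = sxy / sxx           # slope on the log-log plot
    log_k = my - b * mx     # intercept on the log-log plot
    return 10 ** log_k, b


def estimate_users(total, k, b):
    """Predict a journal's number of distinct users from its total use."""
    return k * total ** b


k, b = fit_power_law(journals)
print(f"users ~ {k:.2f} * total^{b:.2f}")
print(round(estimate_users(1500, k, b)))
```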