Crunching Numbers: OPAC Log Analysis of WebVoyage


2007 SCVUGM, Stillwater, OK
Crunching Numbers:
OPAC Log Analysis of WebVoyage
Bennett Claire Ponsford
Digital Services Librarian
Texas A&M University Libraries
Overview






Why analyze your log files
How to do it
What we found
The changes we made
What the latest logs say
What next?
Why Analyze?




To see how your users search when you’re not watching
To resolve internal disagreements over default searches, limits, etc.
To see whether changes to WebVoyage really improved search results
As a counterpoint to task-based user testing






C.S. Lewis
Lewis, C. S. (Clive Staples)
LION WITCH?
LION, WITCH?
LION, WITCH, AND WARDROBE?
Lewis, C. S. (Clive Staples)
Issues to Think About

Does Voyager capture the data you need?
Does your network organize data the way you need?
Privacy concerns
Staff vs. public IP addresses
Do you want all searches or a sample?
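The staff vs. public question can be sketched in a few lines of Python, assuming you know which network ranges belong to staff machines and to public workstations inside the libraries. The ranges below are hypothetical documentation-only addresses, and the names `STAFF_NETWORKS`, `LIBRARY_PUBLIC_NETWORKS`, and `classify_client` are illustrative, not anything provided by Voyager.

```python
import ipaddress

# Hypothetical ranges (reserved documentation addresses); substitute your own.
STAFF_NETWORKS = [ipaddress.ip_network("192.0.2.0/26")]
LIBRARY_PUBLIC_NETWORKS = [ipaddress.ip_network("192.0.2.64/26")]


def classify_client(ip_text):
    """Return 'staff', 'public-inside', or 'outside' for a client IP string."""
    addr = ipaddress.ip_address(ip_text)
    if any(addr in net for net in STAFF_NETWORKS):
        return "staff"
    if any(addr in net for net in LIBRARY_PUBLIC_NETWORKS):
        return "public-inside"
    return "outside"
```

A classification like this only works if the network actually assigns staff and public machines to separate ranges, which is the point of the "does your network organize data the way you need?" question.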
How To

Read the documentation: Technical Manual, Chapter 15, Popacjob
Begin logging your data
Extract data into Access database
Clean up data as needed
Run queries
Scratch head and contact Tech Support
Data Fields
Field            Example value
Search_date      27 Sep 2007
Stat_string      WebOpac
Session_id       20070924111841
Search_type      Title keyword
Search_string    (TKEY story) AND (TKEY of) AND (TKEY english)
Data Fields (cont.)
Field            Example value
Limit_flag       Y
Limit_string     MEDI=v
Index_type       K
Relevance        Y
Hyperlink        N
Hits             8
Data Fields (cont.)
Field            Example value
Search_tab       1
Client_type      W
Client_ip        ###.###.##.###
Dbkey            AMDB20020820112825
Redirect Flag    N
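The Search_string example shown in the data fields, "(TKEY story) AND (TKEY of) AND (TKEY english)", can be broken into index codes and terms for further analysis. The sketch below assumes every clause follows that parenthesized "(CODE term)" shape; other search types may log a different syntax, and `parse_search_string` is an illustrative name.

```python
import re

# One clause looks like "(TKEY story)": an index code, whitespace, then the term.
TERM_PATTERN = re.compile(r"\((\w+)\s+([^()]+)\)")


def parse_search_string(search_string):
    """Return a list of (index_code, term) pairs found in a logged search string."""
    return [
        (code, term.strip())
        for code, term in TERM_PATTERN.findall(search_string)
    ]
```

For the example above this yields three pairs, each with the code TKEY, which makes it easy to count words per search or to look at the terms users actually typed.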
SQL for Count of Search Type
SELECT Fall_2007_OPAC_log.Search_type,
       Count(Fall_2007_OPAC_log.Search_type) AS CountOfSearch_type1
FROM Fall_2007_OPAC_log
WHERE (((Fall_2007_OPAC_log.Hyperlink)="N") AND
       ((Fall_2007_OPAC_log.Search_tab)="1") AND
       ((Fall_2007_OPAC_log.Client_type)="W"))
GROUP BY Fall_2007_OPAC_log.Search_type
ORDER BY Fall_2007_OPAC_log.Search_type;
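For readers without Access, the same count can be approximated with Python's built-in sqlite3 module. This is a sketch under assumptions: the table name `opac_log`, the four-column layout, and the sample rows are invented for illustration, and only the filter and grouping mirror the query above.

```python
import sqlite3


def count_search_types(rows):
    """rows: iterable of (search_type, hyperlink, search_tab, client_type)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE opac_log (search_type TEXT, hyperlink TEXT, "
        "search_tab TEXT, client_type TEXT)"
    )
    conn.executemany("INSERT INTO opac_log VALUES (?, ?, ?, ?)", rows)
    # Same conditions as the Access query: typed searches (not hyperlinks),
    # from search tab 1, from the web client.
    result = conn.execute(
        "SELECT search_type, COUNT(search_type) FROM opac_log "
        "WHERE hyperlink = 'N' AND search_tab = '1' AND client_type = 'W' "
        "GROUP BY search_type ORDER BY search_type"
    ).fetchall()
    conn.close()
    return result


# Made-up sample rows, not data from the talk.
sample = [
    ("Title keyword", "N", "1", "W"),
    ("Title keyword", "N", "1", "W"),
    ("Keyword", "N", "1", "W"),
    ("Keyword", "Y", "1", "W"),  # hyperlink search, excluded
    ("Keyword", "N", "2", "W"),  # other search tab, excluded
]
```

Calling `count_search_types(sample)` returns one (search type, count) row per type that passes the filter, in alphabetical order.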
Results
Search_type                    Count
Author Browse                    415
Author headings browse           826
Author keyword                  1612
Builder                            1
Call Number Browse              1128
Command                            8
Documents Call Number bro         53
Expert keyword (reverse c       2762
Journal title keyword            575
Journal title starts with       1396
Keyword                           78
Keyword (Sorted by releva      10099
Keyword Search                  1154
LC Call Number browse            312
Locally Assigned Call Num         27
Simple Search                    128
Subject Browse                   910
Subject headings browse          278
Subject Headings keyword         428
Title keyword                   2772
Title Redirect Keyword           415
Title starts with               4241
June 2006 (Voyager 5)
September 2006 (Voyager 5)


Changed interface
Defaults
Kept Tab at Simple Search
Changed Search to Keyword (CMD* with JavaScript)
Changed result sort to relevance
Fall 2006

Preparing to upgrade to Voyager 6.1
Some people unhappy with recent changes
New keyword searches with ^ to automatically AND words together
Default search
Search results sort order
Decided to look at the data
Decisions upgrading to V6

Basic data






Where are our searchers
What search tab are they using
How are they searching
Default search
Order of title searches
Simple limits
Where Are Our Searchers?
[Pie chart: searches from Inside Libraries vs. Outside Libraries; segment labels 65% and 35%]
What Search Tab Used?
[Bar chart: percentage of searches by search tab, Simple (1), Keyword (aka Builder - 2), and Course Reserves (3), for Inside Libraries vs. Outside Libraries; axis 0% to 100%]
Default Search: Discussion

Title search (TALL)



What we traditionally had used
Reference’s preference
General keyword search (new GKEY^*)


What users are used to in a Google world
More forgiving search
Comparison of Searches Used
[Bar chart: number of searches by search type, Inside Libraries vs. Outside Libraries; axis 0 to 8000; category labels only partially legible, including Journal Title, Author keyword, Subject keyword, Keyword (CMD*), and Keyword (Builder)]
Comparison of Hit Rates
[Bar chart: percentage of searches by number of hits (0, 1-10, 11-50, 51+), Title vs. Keyword; axis 0% to 50%]
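The hit-rate comparison groups searches by the Hits field into the ranges 0, 1-10, 11-50, and 51+. A minimal sketch of that bucketing follows; the function names are illustrative and the input is simply a list of hit counts for one search type.

```python
from collections import Counter

BUCKETS = ("0", "1-10", "11-50", "51+")


def hit_bucket(hits):
    """Map a hit count to one of the four ranges used in the comparison."""
    if hits == 0:
        return "0"
    if hits <= 10:
        return "1-10"
    if hits <= 50:
        return "11-50"
    return "51+"


def bucket_shares(hit_counts):
    """Return {bucket: fraction of searches}; empty dict if there are no searches."""
    total = len(hit_counts)
    if total == 0:
        return {}
    counts = Counter(hit_bucket(h) for h in hit_counts)
    return {bucket: counts[bucket] / total for bucket in BUCKETS}
```

Running `bucket_shares` once on the title searches and once on the keyword searches gives the two series to compare.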
Default Search: Decision

General keyword search (new GKEY^*)


User preference
Fewer No Hit results
First Title Search: Discussion

Left anchored title (TALL)


Preferred by Reference
Title keyword (new TKEY^*)

More forgiving
Title Search (TALL): Problems
[Bar chart: number of problem searches by cause: Initial article, Keywords, Misspelled, Abbreviations, Wrong search, Left out all articles; axis 0 to 300]
Title Search: Decision

Title keyword

Left-anchored title had too many problems
Simple Limits


Several additional location limits requested
Concern that too many would be confusing
Search Limits Used
[Bar chart: percentage of searches using each limit; axis 0% to 10%; category labels not recoverable]
Simple Limits: Decision

Added new limits and will evaluate with more data
Analysis of Voyager 6 Logs

Improved differentiation between library staff and public IP addresses
Where are our users?
[Bar chart, Fall 2007: percentage of searches from Library staff, Public computers, Outside library, and All public; axis 0% to 80%]
Keyword and Subject Searches
[Bar chart: percentage of searches by type: Keyword, Expert keyword, Subject browse, Subject keyword, Subject redirect; Library staff vs. All Public; axis 0% to 40%]
Author Searches
[Bar chart: percentage of searches by type: Author headings browse, Author keyword, Author redirect; Library staff vs. All Public; axis 0% to 8%]
Title Searches
[Bar chart: percentage of searches by type: Journal keyword, Journal title, Title keyword, Title redirect, Title starts with; Library Staff vs. All Public; axis 0% to 40%]
Location Limits Used
[Bar chart, Fall 2007: number of searches using each limit; axis 0 to 800; limits listed: Books, Curriculum Coll., Cushing Library, Ed. Media Services, English, Journals, Last 5 Years, Qatar Library, DVDs/Videos, Best Seller Coll., WWW Resources, West Campus Library]
Comparison of Limits
[Bar chart: percentage of searches using limits, Library Staff vs. All Public; axis 0% to 50%]
Have Changes Helped?


Search frequency
No hits percentage
Search Frequency
[Bar chart: percentage of searches by type: Title, Subject, Author, Call Number, Journal Title, Keyword; series Sum. 2006, Fall 2006, Spr. 2007, Fall 2007; axis 0% to 60%]
No Hit Percentages
[Chart: percentage of searches with no hits for Sum. 2006, Fall 2006, Spr. 2007, and Fall 2007; axis 10% to 30%]
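A no-hit percentage is the share of searches whose Hits value is 0. The sketch below computes it per search type from (search_type, hits) pairs; the function name and input shape are assumptions for illustration, and running it once per semester's log would give the figures to compare across periods.

```python
from collections import defaultdict


def no_hit_percentages(records):
    """records: iterable of (search_type, hits).

    Returns {search_type: percentage of that type's searches with zero hits}.
    """
    totals = defaultdict(int)
    zero_hits = defaultdict(int)
    for search_type, hits in records:
        totals[search_type] += 1
        if hits == 0:
            zero_hits[search_type] += 1
    return {t: 100.0 * zero_hits[t] / totals[t] for t in totals}
```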
Detailed No Hit Percentages
[Bar chart: percentage of searches with no hits by search type; series Summer 2006, Fall 2006, Spring 2007, Fall 2007; axis 0% to 20%; category labels only partially legible, including Keyword (GKEY^*), Title (CMD), Title keyword, Subject keyword, Author keyword, Journal title, and Journal keyword]
Analysis of No Hit Searches



Do we have the title?
Why did the search not find it?
What can we do to help?
No Hit Title Searches: Do We Own Them?
[Bar chart: percentage of no-hit title searches by answer: Yes, No, Unable to verify, Other; Keyword vs. Left-Anchored (preliminary); axis 0% to 50%]
No Hit Title Searches: Problems
[Bar chart: percentage of no-hit title searches by problem; Keyword vs. Left-Anchored (preliminary); axis 0% to 60%; category labels only partially legible, including Spelling, Stemming, Subtitle, Syntax, Vol#/date, Wrong field, Abbreviation, Article/ERIC/etc., Conference, Initial article, Limits, and Other]
What Next?



Continued analysis of searches with no hits
Analysis of search repair strategies
Word counts
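The slide does not spell out what "word counts" covers; one plausible reading is the number of words users type per search. Under that assumption, a distribution can be sketched as follows (`word_count_distribution` is an illustrative name, and the input is a list of plain query strings).

```python
from collections import Counter


def word_count_distribution(queries):
    """Return a Counter mapping words-per-query to the number of such queries."""
    return Counter(len(query.split()) for query in queries)
```

For example, the queries "lion witch", "lion witch and wardrobe", and "lewis" produce one query each of length 2, 4, and 1.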
Improvements: Spelling


Spellchecking
Automatic searching of variant spellings






“&” or “and”
British vs. American spellings
Numbers
Abbreviations
Did you mean? Suggestions based on field
Working on using ASPELL to create spellchecker
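Automatic searching of variant spellings could work by generating alternative queries and retrying them when the original finds nothing. The sketch below covers only the "&" / "and" and British vs. American cases with a tiny hypothetical swap table; it is not the ASPELL-based spellchecker mentioned above, and `SWAPS` and `spelling_variants` are illustrative names.

```python
# Hypothetical swap table; a real one would be much larger.
SWAPS = {
    "&": "and",
    "and": "&",
    "colour": "color",
    "color": "colour",
    "centre": "center",
    "center": "centre",
}


def spelling_variants(query):
    """Return alternative queries, each with one word swapped for a known variant."""
    words = query.lower().split()
    variants = []
    for i, word in enumerate(words):
        if word in SWAPS:
            swapped = words[:i] + [SWAPS[word]] + words[i + 1:]
            variants.append(" ".join(swapped))
    return variants
```

A query such as "War & Peace" yields the single variant "war and peace", while a query with no known variants yields an empty list.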
Improvements: Help

More granular no hits help





Specific search types
Any search with “conference” or “proceedings” in it
Journal title searches including “vol.”, “no.”, or a number
Searches with more than 4 or 5 words
More granular help for too many hits
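The more granular no-hits help can be thought of as a small set of rules applied to the failed search. The sketch below follows the conditions listed above; the topic labels, the function name `no_hit_help`, and the choice of "more than 4 words" as the length threshold are assumptions for illustration.

```python
import re

LONG_QUERY_WORDS = 4  # assumed threshold; the slide says "more than 4 or 5 words"


def no_hit_help(search_type, query):
    """Return a list of help topics to show for a search that found nothing."""
    q = query.lower()
    topics = []
    if "conference" in q or "proceedings" in q:
        topics.append("conference")
    is_journal_title = search_type.lower().startswith("journal title")
    if is_journal_title and (re.search(r"\b(vol|no)\.", q) or re.search(r"\d", q)):
        topics.append("journal-volume")
    if len(q.split()) > LONG_QUERY_WORDS:
        topics.append("long-query")
    return topics or ["general"]
```

Each topic label would map to a help message written for that situation, with "general" as the fallback when no rule applies.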
Improvements: Specific Searches

Keyword searches




Automatic stemming
Ignore punctuation and spacing
Ignore stop words
Title searches

Ignore initial article
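Several of these improvements amount to normalizing the query before searching. The sketch below shows punctuation and spacing cleanup, stop-word removal for keyword searches, and dropping an initial article for title searches. The word lists are short hypothetical English-only examples, stemming is left out, and the function names are illustrative.

```python
import re

INITIAL_ARTICLES = {"a", "an", "the"}          # hypothetical, English only
STOP_WORDS = {"a", "an", "and", "of", "the"}   # hypothetical stop-word list


def _clean_words(query):
    """Lowercase, replace punctuation with spaces, and split on whitespace."""
    return re.sub(r"[^\w\s]", " ", query.lower()).split()


def normalize_title(query):
    """Clean a title query and drop one leading article, if any."""
    words = _clean_words(query)
    if len(words) > 1 and words[0] in INITIAL_ARTICLES:
        words = words[1:]
    return " ".join(words)


def normalize_keywords(query):
    """Clean a keyword query and remove stop words."""
    return [w for w in _clean_words(query) if w not in STOP_WORDS]
```

With these, "The Lion, the Witch, and the Wardrobe" becomes "lion the witch and the wardrobe" as a title search, and "LION, WITCH, AND WARDROBE?" becomes the keywords lion, witch, wardrobe.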
More Information


Jansen, Bernard J. “Search log analysis: What it is, what’s been done, how to do it.” Library & Information Science Research 28 (2006): 407-432.
Yu, Holly, and Margo Young. “The impact of Web search engines on subject searching in OPAC.” Information Technology and Libraries 23 (2004): 168-180.
Contact Information

Bennett Claire Ponsford




[email protected]
979/845-0877
https://libcat.tamu.edu (production)
http://surprise-am.tamu.edu (test)