A New Kind of Catalog

Charley Pennell
Principal Cataloger for Metadata
North Carolina State University
North Carolina Library Association 2007
Where is this talk headed?







Local motivation
National trends
What is Endeca?
Features
Does Endeca work?
Where are we going from here?
Where is everybody else going?
Why a new catalog?
What was wrong with the old one?
A little TRLN catalog primer





TRLN libraries (Duke, NCCU, NCSU, UNC-CH) jointly develop and maintain BIS, 1985-1992
DRA implemented for catalog (UNC & Duke continue Acq/Serials modules), 1991-1993
No integrated keyword/browse capability, 1993-1999
Web2 catalog implemented, 1999
Sirsi & DRA “merge” in 2002; Taos DOA
A little TRLN catalog primer 2



NCSU & NCCU to Unicorn; Duke to Aleph;
UNC-CH to Millennium, 2003-2004
Sirsi/Dynix merger, 2004: vendor focus shifts
(even more) toward school/public market
While agreeing to continue to support Web2,
S/D increasingly looking to merge all product
catalogs into single interface
What was the catalog lacking?







Simplicity: a simple, hopefully uncluttered interface
Interactivity: ways to interact with results to get
better results
Forgiveness: just fix my typos and case errors,
don’t make me feel stupid!
Response time: always
Real-time sorting: the limit is how many?!!
Relevance ranking: as if!
Web services: use the Web to repurpose data,
enable mash-ups, add-ons & improvements
Which interface is ready for
immediate use?
So, why DOES everyone think that
the catalog stinks?
"Most integrated library systems, as they are
currently configured and used, should be
removed from public view."
- Roy Tennant, OCLC
The old model
The integrated library system




Historically, the ILS developed as an inventory
control system for use by library staff only
First library automation systems (Plessey, CLSI,
Geac, Innovative) were designed around circulation
or acquisitions functions
Interaction time was calibrated to the slow pace of
backroom work where the audience was basically
captive
Staff focus on known-item searching, not resource
discovery
The catalog as part of the ILS




The first integrated OPACs were veneers on top of
existing inventory management systems—patrons &
staff competed for system resources! They still do!
First OPACs allowed for browse only; early keyword
searching restricted to certain fields (A/T/S) only
Libraries with no IT support were stuck with what
their vendor provided and the enhancement process
for improvements
Libraries with IT support created their own systems:
BIS, NOTIS, Claremont Colleges, Georgetown,
PALS, DOBIS/LIBIS
The state of the ILS in 2007




Customer demands for increasing
functionality in a marketplace with
little $$ to spend has reduced the
ILS vendor pool through mergers
and buyouts
New functionality (multi-search,
ERMS, E-Ref, ILL, etc.) increasingly
being met by stand-alone and third party applications
Increasing competition from open source (Koha,
Evergreen, Scriblio, LibraryThing) and e-commerce
Q: Is our dogged adherence to MARC the only thing
keeping the remaining ILS vendors afloat?
The state of the catalog 2007



Library users’ search expectations have been
conditioned by interactions with commercial
Websites and Google, with which Libraries can
barely afford to compete, but must
Libraries are becoming increasingly
virtual as users interact with us
online (e-resources, Second Life)
User expectations for online
experiences are more interactive,
instantaneous, and inviting
Perhaps most importantly…

The information
resources
represented in the
catalog represent a
shrinking percentage
of what end users
need or want
Calhoun’s Aristotelian vs. Copernican
views of the catalog
What do users want from the OPAC?



Make subject searching in online catalogs easier using post-Boolean probabilistic searching with automatic spelling
correction, term weighting, intelligent stemming, relevance
feedback, and output ranking
Streamline users' book selection decisions at the catalog by
adding tables of contents and back-of-the-book indexes to
cataloging (i.e., metadata) records
Reduce the many failed subject searches by expanding the
online catalog with full texts—journal and newspaper articles,
encyclopedias, dissertations, government documents, etc.
Increase finding strategies in online catalogs through the library
classification
-- Markey, Karen (2007). “The online library catalog: Paradise
lost and paradise regained”, D-Lib Magazine, 13(1/2).

“Many researchers express surprise at the brevity
(from one to three words) of the queries people
submit to online systems. Belkin tells why so few
words make up their queries, "Precisely because of
the inquirer's lack of knowledge about a problem
area, it is impossible to specify what would resolve
it." For Belkin, the saving grace is the inquirer's
ability to recognize what he or she wants or does
not want during the course of the search. Therein
lies an important solution to the problem—
information systems that report results for easy
eyeballing and instantaneous recognition of relevant
possibilities.” – Karen Markey
What is an Endeca?



A software company based in
Cambridge, MA
A search and information
access technology provider for
a number of major e-commerce
websites
Developers of the Endeca
Information Access Platform
Endeca features







Commercial-strength search/sort speeds
Site customizable relevance ranking
Faceted browse
True browsing (LC classification)
Spell-checking
“Did you mean?”
Automatic word stemming
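A rough illustration of the “Did you mean?” idea, not Endeca's actual method (the slides do not describe it): a misspelled query is compared against indexed terms and the closest one is suggested. The vocabulary, the function name did_you_mean, and the 0.75 similarity cutoff are assumptions; Python's standard difflib module stands in for whatever matching the engine uses.

```python
import difflib

# Hypothetical list of terms taken from the catalog index.
INDEXED_TERMS = ["catalog", "cataloging", "metadata", "classification"]

def did_you_mean(query, vocabulary, cutoff=0.75):
    """Return the closest indexed term, or None if the query is already
    indexed or nothing is similar enough to suggest."""
    term = query.strip().lower()              # forgive case errors
    vocab = sorted({v.lower() for v in vocabulary})
    if term in vocab:
        return None                           # nothing to correct
    matches = difflib.get_close_matches(term, vocab, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(did_you_mean("Metdata", INDEXED_TERMS))
```

Lower cutoffs suggest more aggressively; a real catalog would likely also weigh how common each candidate term is.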
Endeca at NCSU Libraries





Went live in January
2006
Works with a text version
of a daily snapshot of
Libraries’ MARC & other
metadata
Used to improve the
discovery portion of the
library catalog
Interoperates with ILS for
holdings, current
availability status
Web2 interface still
present for known item &
authority searching
Implementation timeline



License / negotiation: Spring 2005
Acquire: Summer 2005
Implementation:





August 2005 : vendor training
September 2005 : finalize requirements
October 2005 – January 2006 : design and
development
January 12, 2006 : go-live date
Widen to TRLN partners: Winter 2008
Implementation Team



Implementation Team brought together from IT, DLI,
Cataloging, Collections, Reference, Circulation
Worked on indexing, UI, usability testing, etc.
Areas of contention





Number of initial search boxes (1 or 2)
Order, grouping of facets
Placement of classification hierarchies, breadcrumbs
Use of “search” and “browse” on tabs
Visualization aided by Tito’s wireframes
Brief view vs. Full view
gives user choice about
displaying holdings.
Reduces complexity of
continuing and online
resources.
8th (and Final) Revision:
Aggregate holdings
information by library.
NCSU Endeca features
Breadcrumbs
Call #
browse
Results
Facets
Features we started with







Faceted browse
Availability facet
Breadcrumbs
Spell check / Did you mean
Hierarchical subject browse based on LCC
Fuzzy link to live Web2 data
New book browse for titles added in last
week only
Features that we’ve added





New book browse based on relative date (last
week, last month, last three months)
RSS feeds based on user results
“Search within” results
Send search to TRLN partners
Static unique link to live Web2 data
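The relative-date new book browse can be sketched as a filter on a cataloging date. This is only an illustration: the record layout, the date_cataloged key, and the day counts chosen for “last month” and “last three months” are assumptions, not the NCSU implementation.

```python
from datetime import date, timedelta

# Assumed window lengths in days for each browse option.
WINDOWS = {"last week": 7, "last month": 30, "last three months": 90}

def new_titles(records, window, today):
    """Return titles whose cataloging date falls inside the chosen window."""
    cutoff = today - timedelta(days=WINDOWS[window])
    return [r["title"] for r in records
            if cutoff <= r["date_cataloged"] <= today]

# Made-up records for illustration only.
SAMPLE_RECORDS = [
    {"title": "A", "date_cataloged": date(2007, 9, 28)},
    {"title": "B", "date_cataloged": date(2007, 9, 10)},
    {"title": "C", "date_cataloged": date(2007, 7, 20)},
    {"title": "D", "date_cataloged": date(2007, 1, 1)},
]

print(new_titles(SAMPLE_RECORDS, "last month", today=date(2007, 10, 1)))
```

Because the window is computed from the current date at query time, the same daily snapshot can serve all three browse options.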
Relevance ranking
Based on locally customizable algorithm:



Most relevant: query exactly as entered
For multi-term searches: phrase match
Field match


title match more relevant than notes match
Other factors:



number of fields matched
weighted frequency
static ordering (publication date, circulation stats)
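One way to picture a tiered ranking like the one listed above is a sort key whose parts are compared in order. This is a minimal sketch, not Endeca's algorithm or NCSU's configuration: the field names, the weights in FIELD_WEIGHTS, and the record layout are all assumptions.

```python
# Assumed field weights: a title match counts for more than a notes match.
FIELD_WEIGHTS = {"title": 3, "subject": 2, "notes": 1}

def relevance_key(record, query):
    """Build a sort key whose parts follow the order of the criteria above."""
    q = query.lower().strip()
    terms = q.split()
    fields = {name: record.get(name, "").lower() for name in FIELD_WEIGHTS}
    words = {name: text.split() for name, text in fields.items()}

    exact = any(text == q for text in fields.values())        # query exactly as entered
    phrase = len(terms) > 1 and any(q in text for text in fields.values())
    matched = [n for n in fields if any(t in words[n] for t in terms)]
    best_field = max((FIELD_WEIGHTS[n] for n in matched), default=0)
    weighted_freq = sum(FIELD_WEIGHTS[n] * words[n].count(t)
                        for n in matched for t in terms)
    # Static ordering (publication year, circulation count) breaks remaining ties.
    return (exact, phrase, best_field, len(matched), weighted_freq,
            record.get("year", 0), record.get("circ_count", 0))

def rank(records, query):
    """Drop records with no matching term, then sort best-first."""
    hits = [r for r in records if relevance_key(r, query)[2] > 0]
    return sorted(hits, key=lambda r: relevance_key(r, query), reverse=True)

# Made-up records for illustration only.
SAMPLE = [
    {"id": 4, "title": "War and civil society", "year": 2003},
    {"id": 3, "title": "Reconstruction", "notes": "Covers the civil war era", "year": 2005},
    {"id": 2, "title": "The civil war in America", "year": 2001},
    {"id": 1, "title": "Civil war", "year": 1990},
    {"id": 5, "title": "Gardening", "year": 2006},
]

print([r["id"] for r in rank(SAMPLE, "civil war")])
```

In this sketch a phrase match in the notes outranks a title that merely contains both words, because the phrase tier is compared before the field tier; a site could reorder the tuple to change that.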
Faceting at the NCSU Libraries





Follows on what we have learned from the
commercial Web search model
Mines metadata already available via MARC record,
local class number, ILS item categories, circ status,
and date stamping
Required massive clean-up of 6xx subdivisions
Allows both pre- and post-coordinate limits
Uses table mapping to enable drilling down through
call number results
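The table mapping for call number drill-down might look something like the following. The sketch assumes a lookup table keyed by LC class letters; LCC_TABLE here is a four-entry excerpt for illustration, and the function name lcc_path is hypothetical.

```python
import re

# Tiny excerpt of an LC Classification lookup table (illustrative only).
LCC_TABLE = {
    "Q": "Science",
    "QA": "Mathematics",
    "T": "Technology",
    "TK": "Electrical engineering. Electronics. Nuclear engineering",
}

def lcc_path(call_number):
    """Map a call number to the chain of classes a user can drill through."""
    match = re.match(r"\s*([A-Za-z]{1,3})", call_number)
    if not match:
        return []                      # not an LC call number
    letters = match.group(1).upper()
    path = []
    for i in range(1, len(letters) + 1):
        prefix = letters[:i]
        if prefix in LCC_TABLE:
            path.append(f"{prefix} - {LCC_TABLE[prefix]}")
    return path

print(lcc_path("QA76.9 .D3 P46 2007"))
```

A fuller table would continue below the class letters into number ranges, which is what allows drilling down further within a subclass.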
Facet refinements

Availability
Author
Library
Format
Language

New(ness)









LC Classification
Subject: Topic
Subject: Genre
Subject: Region
Subject: Era
A single facet need not represent
data from a single field




Single Unicorn item types (Book, Kit, Manuscript, Map, Data set)
Multiple Unicorn item types (Audio, Microform, Thesis/Dissertation, Software & Multimedia, Videos)
Leader byte 07 (Bib lvl): Journal, Magazine
Library (Online)
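A sketch of how one Format facet could draw on several sources at once. The item type codes, the "ONLINE" library code, and the record layout are hypothetical; only the sources themselves (item types, leader byte 07, library) come from the slide. In MARC 21, leader position 07 holds the bibliographic level, where "s" means serial.

```python
# Hypothetical item type codes; several codes can feed one facet value.
ITEM_TYPE_TO_FORMAT = {
    "BOOK": "Book",
    "MAP": "Map",
    "AUDIOCD": "Audio",
    "CASSETTE": "Audio",
    "VHS": "Videos",
    "DVD": "Videos",
}

def format_facets(record):
    """Collect Format facet values from item types, the leader, and the library."""
    values = set()
    for item_type in record.get("item_types", []):
        if item_type in ITEM_TYPE_TO_FORMAT:
            values.add(ITEM_TYPE_TO_FORMAT[item_type])
    leader = record.get("leader", "")
    if len(leader) > 7 and leader[7] == "s":      # bibliographic level: serial
        values.add("Journal, Magazine")
    if record.get("library") == "ONLINE":         # assumed library code
        values.add("Online")
    return sorted(values)

print(format_facets({"item_types": [], "leader": "00000nas a2200000 a 4500",
                     "library": "ONLINE"}))
```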
Ranking facet results by number of postings makes
sense in a short list, but not in a long list
The author facet is less useful in some types
of searches …
… than others!
Technical overview
Information Access Platform (diagram):
Raw MARC data
→ NCSU exports and reformats → Flat text files
→ Data Foundry (parse text files) → Indices
→ MDEX Engine → HTTP → NCSU Web Application → HTTP
MARC ingest



MARC → flat text file(s) for ingest by Endeca.
Transformation accomplished with MARC4J.
Opportunity to manipulate data on the back-end.
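The slides name MARC4J (a Java library) for this step. The following Python sketch only illustrates the shape of the transformation, not that library's API or the actual Endeca input format. The pipe-delimited output, the FIELD_MAP table, and the in-memory record layout are assumptions; trimming trailing punctuation stands in for the kind of back-end data manipulation mentioned above.

```python
# Hypothetical mapping: MARC tag -> (flat-file field name, subfield codes to keep)
FIELD_MAP = {
    "100": ("author", "a"),
    "245": ("title", "ab"),
    "650": ("subject", "avxyz"),
}

def flatten(record_id, fields):
    """Turn a parsed record into 'name|value' lines, one per mapped field."""
    lines = [f"id|{record_id}"]
    for tag, subfields in fields:
        if tag not in FIELD_MAP:
            continue                               # unmapped tags are dropped
        name, keep = FIELD_MAP[tag]
        # Back-end clean-up: trim trailing ISBD punctuation from each subfield.
        parts = [text.rstrip(" /:;,") for code, text in subfields if code in keep]
        lines.append(f"{name}|{' '.join(parts)}")
    return "\n".join(lines)

# Made-up record: a list of (tag, [(subfield code, text), ...]) pairs.
SAMPLE_FIELDS = [
    ("245", [("a", "Sample title /"), ("c", "Jane Doe.")]),
    ("650", [("a", "Prisoners"), ("x", "Abuse of")]),
    ("999", [("a", "local note")]),
]

print(flatten("rec1", SAMPLE_FIELDS))
```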
Transformed data
The end result…
Video
Other Endeca library catalogs




Phoenix Public Library:
http://www.phoenixpubliclibrary.org/
McMaster University:
http://libcat.mcmaster.ca
Florida Center for Library Automation
http://catalog.fcla.edu/
Individual Florida universities
http://fs.catalog.fcla.edu/, etc.
Does Endeca work?
Problems:
authority control




Endeca is a keyword search engine; “browse” can
only be effected using sort options
There is no authority control within Endeca itself,
rather it relies on AC within ILS
To make use of available metadata, subjects were
split along subdivisions. Authors were not
Talks were held with the vendor to explain the
potential for drawing on authority x-refs to collocate
searches
Problems:
subject context


Problems with wrong delimiter values (esp. $v)
Problems maintaining context in atomized LCSH



One-way relationships
  English language$vDictionaries$xSpanish
Chronological headings devoid of geographic context
  Cuba$xHistory$yRevolution, 1959
Phrase headings expressed in multiple subdivisions
  Prisoners$xAbuse of
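The loss of context is easy to see when a heading is split into separate facet values. The sketch below assumes a simple subfield-to-facet mapping ($x to Topic, $v to Genre, $y to Era, $z to Region) and that the heading's tag decides the facet of its first element; neither mapping is documented in the slides.

```python
# Assumed mapping from subdivision codes to facets.
SUBFIELD_TO_FACET = {
    "x": "Subject: Topic",
    "v": "Subject: Genre",
    "y": "Subject: Era",
    "z": "Subject: Region",
}
# Assumed facet for the main heading ($a), by MARC tag.
MAIN_FACET_BY_TAG = {"650": "Subject: Topic", "651": "Subject: Region"}

def atomize(tag, heading):
    """Split a heading such as 'Cuba$xHistory$yRevolution, 1959' into facet values."""
    parts = heading.split("$")
    facets = [(MAIN_FACET_BY_TAG[tag], parts[0])]
    for part in parts[1:]:
        code, text = part[0], part[1:]
        facets.append((SUBFIELD_TO_FACET[code], text))
    return facets

for facet, value in atomize("651", "Cuba$xHistory$yRevolution, 1959"):
    print(facet, "=", value)
```

Once split, the Era value "Revolution, 1959" no longer says which country's revolution is meant, and "Abuse of" from Prisoners$xAbuse of becomes a Topic value that is meaningless on its own.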
Problems:
subject hierarchies

Chronological hierarchy not built into $y





“19th century” does not subsume 1800-1809, 1801-1861, 1809-1817, 1815-1861, 1817-1825, Civil War, 1861-1865, etc.
Geological periods exist as text only (Ordovician, Pleistocene, etc.)
Some chronological headings are expressed as text in 650$a
  Middle Ages
  Nineteen sixties
Geographic hierarchy not consistent between 651 and 650
  $zNorth Carolina$zRaleigh
  $aRaleigh (N.C.)
BT/NT/RT relationships from authority file lacking
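A possible way to supply the missing chronological hierarchy is to derive century labels from the years that appear in a $y value. This is a sketch under assumptions: it treats 1800-1899 as the 19th century (the convention implied by the examples above), and it cannot help with text-only periods such as "Middle Ages".

```python
import re

def ordinal(n):
    """Return 18 -> '18th', 21 -> '21st', and so on."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"

def centuries(era_heading):
    """List the centuries spanned by the four-digit years in a $y value."""
    years = [int(y) for y in re.findall(r"\b\d{4}\b", era_heading)]
    if not years:
        return []                       # text-only period: nothing to derive
    first, last = min(years) // 100 + 1, max(years) // 100 + 1
    return [f"{ordinal(c)} century" for c in range(first, last + 1)]

print(centuries("Civil War, 1861-1865"))
print(centuries("1789-1815"))
```

With labels like these added at indexing time, a "19th century" refinement could gather all the narrower date ranges beneath it.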
Some potential solutions





Search behavior education
FAST (Faceted Application of Subject
Terminology)
Web2 x-refs to redirect searches to Endeca
Combining $z hierarchies
Hierarchy lists
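Combining $z hierarchies could mean normalizing both geographic forms to one path, so that $zNorth Carolina$zRaleigh and $aRaleigh (N.C.) land in the same place. The sketch below assumes a lookup table for parenthetical qualifiers; QUALIFIERS holds a single illustrative entry and the function names are hypothetical.

```python
import re

# Illustrative one-entry table expanding parenthetical qualifiers.
QUALIFIERS = {"N.C.": "North Carolina"}

def path_from_z(subdivisions):
    """Join successive $z subdivisions, broadest first, into one path."""
    return " > ".join(subdivisions)

def path_from_651(name):
    """Rewrite 'Raleigh (N.C.)' as 'North Carolina > Raleigh' when possible."""
    match = re.fullmatch(r"(.+?) \((.+)\)", name)
    if match and match.group(2) in QUALIFIERS:
        return " > ".join([QUALIFIERS[match.group(2)], match.group(1)])
    return name                          # leave unqualified names unchanged

print(path_from_z(["North Carolina", "Raleigh"]))
print(path_from_651("Raleigh (N.C.)"))
```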
What do our users think?
“The new Endeca system is incredible. It would be difficult
to exaggerate how much better it is than our old online
card catalog (and therefore that of most other
universities). I've found myself searching the catalog
just for fun, whereas before it was a chore to find what I
needed.”
- NCSU Undergrad, Statistics
“The new library catalog search features are a big
improvement over the old system. Not only is the
search extremely fast, but seemingly it's much more
intelligent as well.”
- NCSU faculty, Psychology
Usability testing
Task Difficulty (share of tasks)
              Easy   Medium   Hard   Failed
New Catalog   59%    12%      7%     22%
Old Catalog   43%    12%      22%    23%
Usability testing
Average Task Duration: Old vs New Catalog
(Bar chart: Tasks 1-10, Old Catalog vs New Catalog; time axis from 00:00 to 03:36)
Usage statistics
Searches by Field Type: July 06 - Jan 07
(Bar chart: axis from 0 to 420,000 searches; categories: Keyword (default), ISBN, Title, Author, Subject, Multi-Field)
Newness
wearing off?

Requests by Search Type
                       March ‘06 - May ‘06   July ‘06 - January ‘07
Search                 51%                   67%
Search -> Navigation   29%                   25%
Navigation             20%                   8%
Navigation by Dimensions
July 06 – Jan 07
Subject: Topic: 26%
LC Classification: 21%
Format: 10%
New: 10%
Library: 10%
Subject: Genre: 6%
Author: 6%
Subject: Region: 4%
Language: 3%
Subject: Era: 2%
Availability: 2%
Navigation by Dimension (most used)
July 06 – Jan 07
(Bar chart of requests per dimension, axis from 0 to 140,000, in descending order of use: Subject: Topic, LC Classification, Format, New, Library, Subject: Genre, Author, Subject: Region, Language, Subject: Era, Availability)
Navigation by Dimension (order of UI presentation)
July 06 – Jan 07
Requests per dimension:
Availability: 9,286
LC Classification: 120,644
Subject: Topic: 145,589
Subject: Genre: 34,096
Format: 57,667
Library: 54,476
Subject: Region: 22,818
Subject: Era: 12,257
Language: 16,009
Author: 32,650
Where are we going from
here?
Future directions



Additional hierarchies (geographic names, dates)
Make use of NAF, SAF, particularly cross-reference
structure
Massage underlying metadata





Addition of Date Cataloged – Done!
Addition of LC Class numbers to e-resources – Done!
FRBR work numbers/records? – Tested!
FAST headings?
Accommodation of true browse for all indexes
Future opportunities


Expanding the scope of the implementation to the
10M records in TRLN (Duke, NCCU, NCSU, UNC-Chapel Hill)
Enrich catalog through external web services:



book jackets, reviews, TOC, etc. – Amazon, OCLC,
LibraryThing, Bowker Syndetics
Build use-case based cross-application shopping
cart functionality
Integrate catalog w/other tools through web
services—“Free the Data”
Web services…
Mobile device searching
Where is everybody else
going?

Catalogs detaching themselves from ILS





Detached data lends itself to experimentation
Don’t have to throw out baby with bathwater when
better interfaces come out
Data itself safe and secure in ILS
MARC becoming superfluous; MARC’s
granularity NOT!
Social interaction: reviews, folksonomic tags,
ratings
Phoenix Public Library on Endeca
III’s new faceted catalog, Encore
ExLibris Primo at Vanderbilt
Athens County, OH—Koha Zoom open source
Georgia PINES—Evergreen open source
Casey Bisson’s Scriblio
Danbury Public powered by LibraryThing
OCLC WorldCat Local at UW
Thanks for listening!
Charley Pennell
Principal Cataloger for Metadata
NCSU Libraries
North Carolina State University
Raleigh, NC 27695-7111
More info at: http://www.lib.ncsu.edu/endeca/