The Future of the Online Catalog

Library Automation: The Future of the Online Catalog

Andrew K. Pace
NCSU Libraries
July 28, 2006

What I will cover:

- Online catalog: the problem
- Brief environmental scan
- Endeca: team, timeline, technology
- Usability, statistical results, relevance study
- Dis-integrated systems / Future Catalogs

What ILS Catalogs Do Well…

(liberally stolen from Roy Tennant)
- Inventory control: what and where
- Known-item searching

What ILS Catalogs Don’t Do Well…

(liberally stolen from Roy Tennant, and augmented by me)
- Any search other than known item
- Most anything other than books (serials, e-resources, articles, digital objects)
- Logical groupings of results (e.g., FRBR)
- Faceted browsing
- Relevance ranking
- Sideways searching (suggestions, expansion of searches and search targets)

“OPAC Complainers”

“There is certainly no dearth of OPAC complainers. You have Andrew Pace (OPACs suck), and Roy Tennant (You Can’t Put Lipstick on a Pig) writing and presenting about the need for change (more simplicity) in the OPAC world. I can appreciate their arguments for a simpler OPAC (not to mention the rest of the system) but other then [sic] present their arguments, neither has much in the way of suggestions nor have they sparked a movement among librarians or the automation vendors to do anything about the situation.”
- ACRL Blog entry, Oct. 13, 2005

NextGen Library Search Tools

- RedLightGreen (RLG)
- OCLC FictionFinder
- Vivisimo clustered search (Ex Libris, Serials Solutions)
- Grokker (EBSCO)
- AquaBrowser visual context
- Endeca Information Access Platform
- OCLC Custom WorldCat and OpenWorldCat
- Innovative Interfaces OPAC Pro & Encore
- Ex Libris Primo
- Polaris AJAX-enabled OPAC
- SirsiDynix Enterprise Portal System, FAST
- Talis, et al.
- Web Services
- Georgia PINES and the Library 2.0 bandwagon

Endeca purchase decision

- Lots of topical searches and poor subject access
  – Keyword gives too many or too few results, which leads to general distrust
  – Misunderstanding of authority headings
- No relevancy ranking of results
- Needed more responsiveness (speed)

Implementation Team

- 7 representative team members
  – Andrew Pace, IT, Chair
  – Emily Lynema, IT, ex officio (tech lead)
  – Cindy Levine, Research and Information Services
  – Erik Moore, IT, ex officio (ILS librarian)
  – Charley Pennell, Metadata and Cataloging
  – Shirley Rodgers, IT
  – Tito Sierra, Digital Library Initiatives
- Timeline
  – License / negotiation: Spring 2005
  – Acquire: Summer 2005
  – Implementation: August 2005 – January 12, 2006

Technical Overview

- Endeca ProFind coexists with the SirsiDynix Unicorn ILS and the Web2 online catalog.
- Endeca indexes MARC records exported from Unicorn.
- The index is refreshed nightly with records added or updated during the previous day.

Endeca ProFind Overview

[Figure: Endeca ProFind architecture. Offline (nightly): NCSU exports and reformats raw MARC data into flat text files → Data Foundry parses the text files → indices. Always online: client browser ↔ (HTTP) ↔ NCSU Web Application ↔ (HTTP) ↔ Navigation Engine.]

Integrating Endeca

- Endeca doesn’t understand MARC data / MARC-8 character encoding
  – Translate to UTF-8 text files
- Each night a script updates the data indexed by Endeca:
  – Exports updated or new MARC records from Unicorn.
  – Reformats and merges these records with those already indexed.
  – Starts an Endeca re-index, completely rebuilding the index for the catalog.
  – The process requires about 4 hours.
- Retain Web2 OPAC for some functionality
  – Authority searching: known items and cross-references
  – Detailed record pages: how to make the Endeca -> Web2 link?
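The merge step of the nightly script can be sketched in Python. This is a minimal illustration under stated assumptions, not NCSU's actual script: each record is reduced to a control number plus a dict of tag/value pairs, MARC-8 to UTF-8 conversion is assumed to have happened already, and the tab-separated output format is invented for the example.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical simplified record: (control_number, {tag: value}).
# Assumes the MARC-8 data has already been translated to UTF-8 text.
Record = Tuple[str, Dict[str, str]]


def merge_records(
    existing: Iterable[Record], updates: Iterable[Record]
) -> Dict[str, Dict[str, str]]:
    """Fold the day's exported records into the full dataset.

    New control numbers are added; existing ones are overwritten by the
    updated copy. (Deleted records are not handled in this sketch.)
    """
    merged = {control_no: dict(fields) for control_no, fields in existing}
    for control_no, fields in updates:
        merged[control_no] = dict(fields)
    return merged


def to_flat_lines(merged: Dict[str, Dict[str, str]]) -> List[str]:
    """Serialize each record as one tab-separated line: id, then tag=value pairs.

    This line format is invented for illustration; it is not Endeca's input format.
    """
    lines = []
    for control_no in sorted(merged):
        pairs = [f"{tag}={value}" for tag, value in sorted(merged[control_no].items())]
        lines.append("\t".join([control_no] + pairs))
    return lines


if __name__ == "__main__":
    existing = [("a1001", {"245": "Old title"}), ("a1002", {"245": "Another book"})]
    updates = [("a1001", {"245": "Corrected title"}), ("a1003", {"245": "New book"})]
    for line in to_flat_lines(merge_records(existing, updates)):
        print(line)
    # The complete flat file would then be handed to a full re-index.
```

Because the slides describe a complete rebuild rather than an incremental index update, the sketch always writes out the whole merged dataset instead of only the changed records.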

Quick Demo

http://catalog.lib.ncsu.edu

Some User Reaction

“This is absolutely the coolest thing I've seen all century.”
- Will Owen, Head of Systems (UNC Libraries)

“Also, I'm really digging the new NCSU library catalog. Very nice.”
- Educause staff (non-librarian)

“The new Endeca system is incredible. It would be difficult to exaggerate how much better it is than our old online card catalog (and therefore that of most other universities). I've found myself searching the catalog just for fun, whereas before it was a chore to find what I needed.”
- NCSU Undergrad, Statistics

Basic statistics

(March – May 2006)

Requests by Search Type:
- Search (51%)
- Navigation (20%)
- Search -> Navigation (29%)

Navigation statistics

(March – May 2006)

[Chart: Navigation Requests by Dimension; axis: requests (0 to 150,000). Dimensions: Availability, LC Classification, Subject: Topic, Subject: Genre, Format, Library, Subject: Region, Subject: Era, Language, Author. Bar values range from 23,848 to 169,249 requests.]

Navigation statistics

(March – May 2006)

Navigation by Dimension:
- LC Classification (20%)
- Subject: Topic (19%)
- Library (11%)
- Author (9%)
- Format (9%)
- Subject: Genre (8%)
- Subject: Region (7%)
- Language (5%)
- Subject: Era (5%)
- New (4%)
- Availability (3%)
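Each navigation request above is a click on one value of one dimension (facet). The sketch below shows the general idea of guided navigation: count the values of a dimension across the current result set, then refine by one value and recount. It is a hypothetical Python illustration with an invented record shape, not how Endeca's Navigation Engine is implemented.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical record shape: dimension name -> list of values for that record.
Record = Dict[str, List[str]]


def facet_counts(records: List[Record], dimension: str) -> Counter:
    """Count how many records in the current result set carry each value."""
    counts: Counter = Counter()
    for record in records:
        counts.update(set(record.get(dimension, [])))  # count each record once per value
    return counts


def refine(records: List[Record], dimension: str, value: str) -> List[Record]:
    """Apply one navigation click: keep records tagged with `value` in `dimension`."""
    return [r for r in records if value in r.get(dimension, [])]


if __name__ == "__main__":
    results = [
        {"Format": ["Book"], "Library": ["Main"], "Language": ["English"]},
        {"Format": ["Book"], "Library": ["Branch"], "Language": ["French"]},
        {"Format": ["DVD"], "Library": ["Main"], "Language": ["English"]},
    ]
    print(facet_counts(results, "Format"))  # refinements offered, with counts
    books = refine(results, "Format", "Book")
    print(facet_counts(books, "Library"))   # counts recomputed after refining
```

The library names and dimension values here are placeholders; the point is only that counts are recomputed over the narrowed set after every refinement.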

Sorting statistics

(March – May 2006)

Sorting Requests:
- Pub Date (53%)
- Most Popular (19%)
- Title A-Z (13%)
- Author A-Z (9%)
- Call Number (6%)

Other interesting tidbits…

(March 2006)
- Authority searching decreased 45%
- Keyword searching increased 230%
  – Caveat: the default catalog search changed from title authority to keyword
- ~5% of keyword searches offered a spelling correction or suggestion
  – 3.1%: automatic spell correction
  – 2.3%: “Did you mean…” suggestion
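The split between automatic correction and a “Did you mean…” suggestion implies some confidence threshold. A minimal sketch of a two-tier rule, assuming edit distance against index terms: a very close match is applied automatically, a looser match is only suggested. The thresholds, the vocabulary, and the tie-breaking by term frequency are all hypothetical; this is not Endeca's spelling algorithm.

```python
from typing import Dict, Tuple

# Hypothetical thresholds; the values actually used at NCSU are not given.
AUTO_CORRECT_MAX_DISTANCE = 1
SUGGEST_MAX_DISTANCE = 2


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def check_term(term: str, vocabulary: Dict[str, int]) -> Tuple[str, str]:
    """Classify a keyword against index terms (term -> frequency).

    Returns ("ok", term), ("corrected", best), ("suggest", best) or ("none", term).
    """
    term = term.lower()
    if term in vocabulary:
        return ("ok", term)
    if not vocabulary:
        return ("none", term)
    # Closest term first; ties go to the more frequent, then alphabetical, term.
    dist, _, best = min((edit_distance(term, w), -f, w) for w, f in vocabulary.items())
    if dist <= AUTO_CORRECT_MAX_DISTANCE:
        return ("corrected", best)   # re-run the search with `best`
    if dist <= SUGGEST_MAX_DISTANCE:
        return ("suggest", best)     # show "Did you mean <best>?"
    return ("none", term)
```

Tuning the two thresholds shifts searches between the automatic-correction and suggestion buckets, which is the kind of “spell correction threshold” tweak listed under Future Plans.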

Usability Testing Trends

- 10 undergraduate students
  – 5 with the Endeca catalog
  – 5 with the old Web2 OPAC
- Endeca performed as well as the OPAC for known-item searching
  – 89% of Endeca tasks completed ‘easily’ (8/9)
  – 71% of OPAC tasks completed ‘easily’ (15/21)
- Endeca performed better than the OPAC for topical searching
  – 61% of Endeca tasks completed ‘easily’ (19/31)
  – 3% of Endeca tasks classed as ‘hard’ (1/31)
  – 33% of OPAC tasks completed ‘easily’ (13/39)
  – 26% of OPAC tasks classed as ‘hard’ (10/39)

A study in relevance

- Are search results in Endeca more likely to be relevant to a user’s query than search results in the Web2 OPAC?
- 100 topical user searches from one month in fall 2005
- How many of the top 5 results were relevant?
  – 40% relevant in the Web2 OPAC
  – 68% relevant in the Endeca catalog
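The study's measure is precision at 5: the share of the top five results judged relevant. A minimal Python sketch, assuming each query comes with a ranked result list and a set of relevance judgments (names and data are hypothetical). It counts missing slots as not relevant when fewer than five results come back; the slides do not say how the study handled that case.

```python
from typing import List, Sequence, Set, Tuple


def precision_at_k(results: Sequence[str], relevant: Set[str], k: int = 5) -> float:
    """Share of the top-k results judged relevant; missing slots count as not relevant."""
    top = list(results)[:k]
    return sum(1 for r in top if r in relevant) / k


def mean_precision_at_k(runs: List[Tuple[Sequence[str], Set[str]]], k: int = 5) -> float:
    """Average precision@k over queries; each run is (ranked results, relevant ids)."""
    return sum(precision_at_k(results, relevant, k) for results, relevant in runs) / len(runs)


if __name__ == "__main__":
    runs = [
        (["a", "b", "c", "d", "e"], {"a", "c"}),        # 2 of 5 relevant
        (["v", "w", "x", "y", "z"], {"v", "w", "x", "y"}),  # 4 of 5 relevant
    ]
    print(mean_precision_at_k(runs))
```

Because every query contributes exactly five slots under this convention, the mean of per-query precision equals the pooled figure: relevant results divided by 5 × 100 for the 100 searches.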

Relevance defined

Relevance ranking in Endeca: select from a variety of ranking modules and order them by importance.

Relevance matters most in Keyword Anywhere searches, which search all fields.

At NCSU…
1. Original query term(s) (no thesaurus, stemming, or spell correction)
2. Exact phrase match
3. Field ranking (Title higher than Author, higher than Table of Contents)
4. Number of fields that contain the term(s)
…
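An ordered list of ranking modules behaves like a tiered sort: the first module decides, and each later module only breaks ties left by the ones before it. The Python sketch below re-creates the four NCSU modules that the slide lists, using tuple comparison for the tiers. Field names (`title`, `author`, `toc`), weights, and scoring details are hypothetical, and Endeca's own modules are not reproduced. Because the sketch does no stemming or thesaurus expansion, module 1 simply counts how many literal query terms a record contains.

```python
import re
from typing import Dict, List, Tuple

# Field priority from the slide: Title > Author > Table of Contents.
FIELD_WEIGHTS = {"title": 3, "author": 2, "toc": 1}
Record = Dict[str, str]


def tokens(text: str) -> List[str]:
    """Lowercase word tokens; no stemming, thesaurus, or spell correction."""
    return re.findall(r"\w+", text.lower())


def has_phrase(words: List[str], phrase: List[str]) -> bool:
    """True if `phrase` occurs as a contiguous word sequence in `words`."""
    n = len(phrase)
    return n > 0 and any(words[i:i + n] == phrase for i in range(len(words) - n + 1))


def module_scores(record: Record, query: str) -> Tuple[int, int, int, int]:
    q = tokens(query)
    fields = {f: tokens(record.get(f, "")) for f in FIELD_WEIGHTS}
    hit = {f: bool(set(words) & set(q)) for f, words in fields.items()}
    # 1. Original query terms: how many literal terms appear in any field.
    original = sum(1 for t in set(q) if any(t in words for words in fields.values()))
    # 2. Exact phrase match in any field.
    phrase = int(any(has_phrase(words, q) for words in fields.values()))
    # 3. Field ranking: weight of the best field containing a query term.
    field = max((FIELD_WEIGHTS[f] for f in FIELD_WEIGHTS if hit[f]), default=0)
    # 4. Number of fields containing at least one query term.
    n_fields = sum(hit.values())
    return (original, phrase, field, n_fields)


def rank(records: List[Record], query: str) -> List[Record]:
    # Tuples compare element by element, so each module only breaks ties
    # left by the modules before it.
    return sorted(records, key=lambda r: module_scores(r, query), reverse=True)
```

For the query “statistics for engineers”, a record matching the phrase only in its table of contents still outranks one with “Engineers” in the author field, because module 1 is decided before field ranking is consulted.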

Future Plans

- Ongoing tweaks:
  – Continued usability testing
  – Relevance ranking algorithms & spell correction thresholds
  – Additional browsing options
- Endeca 2.0 ideas
  – FRBR-ized display
  – Discussions with OCLC regarding FAST (Faceted Access to Subject Terms) and FRBR
  – Patron-generated refinements (folksonomies?)
  – Enrich records with supplemental Web Services content: more usable TOCs, book reviews, etc.
  – The death of authority searching (?)
  – More integration with QuickSearch, other data repositories, and third-party discovery tools
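A FRBR-ized display groups editions and formats of the same work under one entry. The sketch below shows the simplest version of that idea: build a work-level key from normalized author and title, then cluster records that share it. It is a hypothetical illustration in Python with invented field names, far cruder than OCLC's FRBR work-set algorithm, and not a description of NCSU's plans.

```python
import re
import unicodedata
from collections import defaultdict
from typing import Dict, List

Record = Dict[str, str]


def normalize(text: str) -> str:
    """Lowercase, strip diacritics and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = re.sub(r"[^\w\s]", " ", no_marks.lower())
    return " ".join(cleaned.split())


def work_key(author: str, title: str) -> str:
    """Crude work-level key: normalized author plus title minus a leading article."""
    title_norm = re.sub(r"^(the|a|an) ", "", normalize(title))
    return f"{normalize(author)}/{title_norm}"


def group_by_work(records: List[Record]) -> Dict[str, List[Record]]:
    """Cluster manifestation-level records that share a work key."""
    groups: Dict[str, List[Record]] = defaultdict(list)
    for record in records:
        groups[work_key(record["author"], record["title"])].append(record)
    return dict(groups)
```

With this key, “Tolkien, J. R. R. / The Hobbit” (book) and “Tolkien, J.R.R. / Hobbit” (audiobook) fall into one group. Subtitles, translations, and uniform titles would need authority data that this sketch ignores.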

Stuff to read…

- Rethinking How We Provide Bibliographic Services for the University of California, by the Bibliographic Services Task Force. http://libraries.universityofcalifornia.edu/sopag/BSTF/Final.pdf
- The Changing Nature of the Catalog and Its Integration with Other Discovery Tools, by Karen Calhoun. http://www.loc.gov/catdir/calhoun-report-final.pdf
- The Changing Nature of the Catalog and Its Integration with Other Discovery Tools. Final Report. March 17, 2006. Prepared for the Library of Congress by Karen Calhoun: A Critical Review, by Thomas Mann. http://www.guild2910.org/AFSCMECalhounReviewREV.pdf
- A “Next Generation” Catalog, by Eric Morgan. http://dewey.library.nd.edu/morgan/ngc/
- Metadata Research Center, SILS. http://ils.unc.edu/mrc/
- University of Rochester eXtensible Catalog
- Toward a 21st Century Catalog, by Antelman, Lynema, and Pace. ITAL, Sept. 2006.

From the Calhoun Report

"If one accepts the premise that library collections have value, then library leaders must move swiftly to establish the catalog within the framework of online information discovery systems of all kinds. Because it is catalog data that has made collections accessible over time, to fail to define a strategic future for library catalogs places in jeopardy the legacy of the world's library collections themselves. For this reason, the option of rejecting library catalogs is not considered in this report."

The library system pile

“Seams serve as perceptible boundaries that provide points of reference; without such boundaries readers get ‘lost at sea’ and don’t know where they are in relation to anything else; they can’t perceive either the extent of what they have or what they don’t have.”
- Thomas Mann

Wither or Whither the Catalog?

Reversal of fortune

[Figure: OLD SEARCH MODEL vs. NEW SEARCH MODEL]

The library system puzzle

[Figure: puzzle pieces labeled Serials, A&I / FT DBs, Catalog, Web]

[Figure, expanded: puzzle pieces labeled Catalog, Serials, Metasearch, ERM Systems, Guided Navigation, Legacy ILS, GS, Digital Repositories, Web, IR, A&I / FT DBs]

Thank you.

http://www.lib.ncsu.edu/endeca
Andrew Pace, Head, IT