Reconciling OCLC and Orbis - Liberty Application Server
Reconciling OCLC
and Orbis
Managing a Bibliographic and
Holdings Synchronization
Between Yale University Library
and WorldCat
Melissa A. Wisner
Purpose of Presentation
Describing the process
What is involved?
Staffing required
Timeframe
Programming required
Are We Done Yet?
No!
Why do you want to come to this
talk?
For a collection of any size, reconciliation is a
detail-oriented project: planning, pre-processing,
OCLC processing, dealing with returned data, and
maintaining the data afterward
Why do this?
Living with your own standards—good or bad
What is your database of record?
YUL Background
Voyager ILS since 2002
Approximately 8.5 million bibliographic records
Member of (former) RLIN
OCLC Participant: adds PCC records, creates IR
records in WorldCat, sends weekly holdings updates,
does some cataloging directly in Connexion, ILL
lender
In the early 2000s YUL did a retrospective
conversion with OCLC
Standard Workflow between Voyager
and OCLC
Weekly export to OCLC (staff flag records to
send as needed)
Sporadic OCLC Batch Matches over the
years
Local program identifies “candidate” records by
encoding level and “UNCAT” status; sent to OCLC
as a separate project; any returned records with
encoding level 1, 4, or 7 are filtered, reloaded, and
overlay the original
Run LC Match once a month; a similar process run
against a local copy of the LCDB
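The encoding-level filter on returned records can be sketched as follows. This is a minimal, illustrative sketch, not Yale's actual program: it assumes records are represented by their raw MARC leader strings, with the encoding level at leader byte 17 per the MARC 21 bibliographic format.

```python
# Sketch: keep only returned records whose MARC leader encoding level
# (leader byte 17) is 1, 4, or 7, mirroring the candidate filter described.
# Names and sample leaders are illustrative.

ACCEPTED_LEVELS = {"1", "4", "7"}

def keep_record(leader: str) -> bool:
    """Return True if the leader's encoding level is one we reload."""
    return len(leader) >= 18 and leader[17] in ACCEPTED_LEVELS

# Example 24-character MARC leaders (encoding level at index 17)
full_level = "00714cam a2200205 a 4500"   # blank EL -> full level, skip
minimal    = "00714cam a22002057a 4500"   # EL 7 -> minimal level, keep
```

A real implementation would read the leader out of each MARC record with a library such as MARC4Java rather than handling raw strings.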
Arcadia Grant
Cultural Knowledge grant
March 2009-March 2013
$5 million/$1 million per year
Cambodian Newspapers, Khmer Rouge
Genocide documentation, African language
materials and more…
Layoffs and re-staffing
What Records to Send or
Exclude?
Divided up by locations for staff review
Uncovered some data problems we knew about and
didn’t know about…e.g. locations with no holdings in
them; locations that still had holdings we thought
had been migrated to new locations
Most significant…outdated MARC tags, outdated
format codes, practice different from OCLC, dual
script records
What Records to Send or
Exclude?
Sending approximately 6.7 million out of 8.5 million
bibliographic records as UTF-8
Excluding:
MARCIVE
E-resource records
Suppressed bibs
Unsuppressed bibs with suppressed holdings
records
In Process/On Order records
UNCAT records**
Tracking our Records
MySQL database created
Bib IDs
Exclude Project ID (local tracking)
OCLC Project IDs
Reload Dates
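A minimal sketch of that tracking table, using SQLite in place of MySQL so the example is self-contained. The column names are illustrative, not Yale's actual schema; they simply mirror the four items listed above.

```python
import sqlite3

# Illustrative tracking table (SQLite standing in for the MySQL database).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE recon_tracking (
        bib_id             INTEGER PRIMARY KEY,
        exclude_project_id TEXT,   -- local tracking of excluded sets
        oclc_project_id    TEXT,   -- OCLC project the bib was sent out in
        reload_date        TEXT    -- date the updated record was reloaded
    )
""")
conn.execute(
    "INSERT INTO recon_tracking VALUES (?, ?, ?, ?)",
    (1234567, None, "YUS-01", None),  # sample row: sent in project "YUS-01"
)
rows = conn.execute(
    "SELECT bib_id FROM recon_tracking WHERE oclc_project_id = 'YUS-01'"
).fetchall()
```

Queries like the one above are what drive both the QA of candidate-record queries and the per-project extract files described on the next slides.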
Tracking our Records
Used this to QA the results of the queries run
to identify all potential records
Used this to push out files of bib ids by OCLC
project ID to be used later to extract correct
records to send to OCLC
Tracking was/is a big effort of reconciliation!
Tracking our Records
As records are prepared for loading back into
Voyager this MySQL database will be
updated with those date(s)
OCLC will produce crossref reports and other
processing reports per each file, but these
are not concatenated into any form of a
relational database
Building an 079 Index in Voyager
Ex Libris contracted to generate and update
Voyager indexes
Created in both Production and Test
environments--took less than a day each
time; downtime required; $ for service
Added 079|a and 079|z left anchored indexes
Building an 079 Index in Voyager
Updated SYSN Composite Index to include new 079
indexes:
019|a
035|a |z
079|a |z
Indexes were mostly to assist staff in searching, but
also for bulk import profiles for ongoing loads
Exploring how to use the new indexes in ongoing
EOD or e-resource loads from vendors
OCLC Pre-Processing
OCLC IBM Mainframe limitations
Records sent in files of at most 100MB/90,000
records each, AND only 15 files per day
Separating records with 880s from those without
Additionally, OCLC is splitting out PCC records
from the YUS files
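The mainframe limits above amount to a simple batching problem. A hedged sketch, assuming we are splitting a list of bib IDs; the record-count and files-per-day caps come from the slide, while the 100MB size cap would need a separate byte-level check that is omitted here.

```python
# Sketch: split bib IDs into files of at most 90,000 records, then group
# files into daily batches of at most 15 (OCLC mainframe limits).

RECORDS_PER_FILE = 90_000
FILES_PER_DAY = 15

def chunk(items, size):
    """Split a list into consecutive slices of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def schedule(bib_ids):
    """Return a nested list: days -> files -> bib IDs."""
    files = chunk(bib_ids, RECORDS_PER_FILE)
    return chunk(files, FILES_PER_DAY)

days = schedule(list(range(200_000)))  # 3 files, fits in one day
```

At these limits, 6.7 million records is at least 75 files, i.e. a minimum of five sending days even before OCLC's own processing time.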
OCLC Pre-Processing
Each set of files sent as a “project” with
unique ID
Creating label files, tracking via spreadsheet
Suspended weekly exports to OCLC
(9/5/2010-12/20/2010**)
OCLC Pre-Processing
Deleting YUL IR records in WorldCat
Why? Easier matching?
5.7 million removed total
EBScan software process
Match routines set:
Example: match on this field and that or ….
Cross Ref Reports and Stats
Sample
Adding in prefixes of ocm and ocn
Other statistical reports
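The prefixing step can be sketched as below. This assumes the commonly described OCLC convention of "ocm" for control numbers of up to 8 digits and "ocn" for 9-digit numbers (10-digit and longer numbers later took the prefix "on", included here for completeness though outside this project's scope); the function name is illustrative.

```python
# Sketch of the ocm/ocn prefix convention: 1-8 digit OCLC numbers take
# "ocm", 9-digit numbers take "ocn", 10+ digits take "on".

def with_prefix(oclc_number: str) -> str:
    digits = oclc_number.strip().lstrip("0") or "0"  # normalize zero-padding
    if len(digits) <= 8:
        return "ocm" + digits
    if len(digits) == 9:
        return "ocn" + digits
    return "on" + digits
```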
Loading OCLC Numbers back
into Orbis
Basic Process:
Retrieve crossref report to be used as input
Script to de-dupe crossref reports by name*
Extract MARC record using Voyager API
BIB_RETRIEVE and crossref as input
MARC4Java Open Source to parse and update the
MARC record*
Remove any version of *OCoLC* *ocm* *ocn* in 035|a
Insert new IR number from crossref report with a prefix
of (OCoLC)
Loading OCLC Numbers back
into Orbis
Basic Process:
Compare 079|a to the crossref report: if the same,
move on; if there is no 079|a yet, add the new number
and move on; if different, update with the new
number and report out the old one
Remove any 079|z and report out
Prepare new file of MARC records for bulk import
Report out log summary of process, errors
encountered, discrepancies in 079|a
See handouts!
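The 035/079 steps above can be sketched as one per-record update function. This is a simplified illustration, not the actual Voyager/MARC4Java code: records are modeled as plain dicts of field -> values, the crossref number is passed in as a string, and all names are assumptions.

```python
import re

# Sketch of the per-record update: strip OCoLC/ocm/ocn numbers from 035|a,
# insert the new IR number with an (OCoLC) prefix, and reconcile 079|a/079|z
# against the crossref report, logging anything reportable.

OCLC_035 = re.compile(r"\(OCoLC\)|ocm|ocn", re.IGNORECASE)

def update_record(record: dict, crossref_number: str) -> dict:
    report = {}
    # Remove any version of *OCoLC*, *ocm*, *ocn* in 035|a...
    record["035a"] = [v for v in record.get("035a", []) if not OCLC_035.search(v)]
    # ...and insert the new IR number with an (OCoLC) prefix.
    record["035a"].append(f"(OCoLC){crossref_number}")
    # Compare 079|a to the crossref report.
    old = record.get("079a")
    if old is None:
        record["079a"] = crossref_number      # no 079|a yet: just add
    elif old != crossref_number:
        report["old_079a"] = old              # differs: update, report old one
        record["079a"] = crossref_number
    # Remove any 079|z and report out.
    if "079z" in record:
        report["removed_079z"] = record.pop("079z")
    return report

rec = {"035a": ["ocm00012345", "(CStRLIN)YALG1234"], "079a": "12345"}
log = update_record(rec, "987654321")
```

In the production workflow the same comparison also feeds the summary log of errors and 079|a discrepancies mentioned above.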
Loading OCLC Numbers back
into Orbis
Will also be our new permanent workflow
post-reconciliation—maintenance of these
control numbers!
Cornell, Columbia and Stanford all used
similar processes…
Original hope was to load 250,000 records
per day, 4 days a week; an estimated 6 weeks
to reload everything back into Orbis…
Loading OCLC Numbers back
into Orbis
It all depends on timing…OCLC processes 80K
records in 1-2 days, so for 6.7 million bibs that is
1.2 to 2.4 million/month, or 3 to 6 months total to
process our data!
We can keep pace with loading updated MARC
data, but waiting 6 months is a big deal
Need to keep 1 day a week for all other load activity
in Orbis
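The timing arithmetic above, spelled out (assuming a 30-day month for the round numbers on the slide):

```python
import math

# At 80,000 records per 1-2 days, 6.7 million bibs take roughly 3-6 months.
TOTAL_BIBS = 6_700_000
BATCH = 80_000
DAYS_PER_MONTH = 30

best_per_month = BATCH * DAYS_PER_MONTH         # 2.4 million/month
worst_per_month = BATCH * DAYS_PER_MONTH // 2   # 1.2 million/month

best_months = math.ceil(TOTAL_BIBS / best_per_month)
worst_months = math.ceil(TOTAL_BIBS / worst_per_month)
```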
Loading OCLC Numbers back
into Orbis
Run a keyword regen once a week, since the
keyword index is not updated during bulk import
Program to extract and update MARC records
can process 80K records in 15 minutes
Bulk import run no-key takes 2 hours to load 80K
records
Minimize the loss of any staff changes
Handling Errors
Reports from OCLC with no match records
(validation errors)
Correcting anything in OCLC?
Correcting records in Voyager then resubmitting post-reclamation?
See handouts!
Processing a “Gap” File
Suspended weekly exports to OCLC 9/5/2010
Extracted a version of the bib record between
9/8/10 and 9/10/2010
Identify and extract all changes and new records
from 9/8/10 onward that have an 079|a and whose
last operator in the history is not OCLCRECON
Send to OCLC as another one-off project
What Staff Will Do During
Reconciliation
No processing of holdings in OCLC
ILL OK
Will not create IR records so as not to affect
matching
Work in Orbis as normal otherwise
Modifications Needed to Resume
Weekly Exports to OCLC
Two file streams needed: one for archival
materials and one for everything else
PCC records will be split off once at OCLC
YUM records split off once at OCLC
New process/program created
Lessons Learned So Far
Consistent application of standards across
cataloging units (Suppressed, Suppressed!!!, In
Process records, etc.)
What is your database of record?
How much time to spend on fixing records so they
can be sent?
Maintenance of the control numbers long term
Questions?
Thank you!
[email protected]