Reconciling OCLC and Orbis - Liberty Application Server
Reconciling OCLC
and Orbis
Managing a Bibliographic and
Holdings Synchronization
Between Yale University Library
and WorldCat
Melissa A. Wisner
Purpose of Presentation
Describing the process
What is involved?
Staffing required
Timeframe
Programming required
Are We Done Yet?
No!
Why do you want to come to this
talk?
For a collection of any size, reconciliation is a
detail-oriented project: planning, pre-processing,
OCLC processing, dealing with returned data, and
maintaining the data afterward
Why do this?
Living with your own standards—good or bad
What is your database of record?
YUL Background
Voyager ILS since 2002
Approximately 8.5 million bibliographic records
Member of (former) RLIN
OCLC Participant: adds PCC records, creates IR
records in WorldCat, sends weekly holdings updates,
does some cataloging directly in Connexion, ILL
lender
In the early 2000s YUL did a retrospective
conversion with OCLC
Standard Workflow between Voyager
and OCLC
Weekly export to OCLC (staff flag records to
send as needed)
Sporadic OCLC Batch Matches over the
years
Local program identifies “candidate” records by
encoding level and “UNCAT” status; sent to OCLC
as a separate project; any returned records with
encoding level 1, 4, or 7 are filtered, reloaded, and
overlay the original
Run LC Match once a month; a similar process run
against a local copy of the LCDB
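The encoding-level filter on returned records can be sketched as follows. This is a minimal, illustrative sketch, not Yale's actual program: it assumes records are represented by their raw MARC leader strings, with the encoding level at leader byte 17 per the MARC 21 bibliographic format.

```python
# Sketch: keep only returned records whose MARC leader encoding level
# (leader byte 17) is 1, 4, or 7, mirroring the candidate filter described.
# Names and sample leaders are illustrative.

ACCEPTED_LEVELS = {"1", "4", "7"}

def keep_record(leader: str) -> bool:
    """Return True if the leader's encoding level is one we reload."""
    return len(leader) >= 18 and leader[17] in ACCEPTED_LEVELS

# Example 24-character MARC leaders (encoding level at index 17)
full_level = "00714cam a2200205 a 4500"   # blank EL -> full level, skip
minimal    = "00714cam a22002057a 4500"   # EL 7 -> minimal level, keep
```

A real implementation would read the leader out of each MARC record with a library such as MARC4Java rather than handling raw strings.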
Arcadia Grant
Cultural Knowledge grant
March 2009-March 2013
$5 million/$1 million per year
Cambodian Newspapers, Khmer Rouge
Genocide documentation, African language
materials and more…
Layoffs and re-staffing
What Records to Send or
Exclude?
Divided up by locations for staff review
Uncovered some data problems we knew about and
didn’t know about…e.g. locations with no holdings in
them; locations that still had holdings we thought
had been migrated to new locations
Most significant…outdated MARC tags, outdated
format codes, practice different from OCLC, dual
script records
What Records to Send or
Exclude?
Sending approximately 6.7 million out of 8.5 million
bibliographic records as UTF-8
Excluding:
MARCIVE
E-resource records
Suppressed bibs
Unsuppressed bibs with suppressed holdings
records
In Process/On Order records
UNCAT records**
Tracking our Records
MySQL database created
Bib IDs
Exclude Project ID (local tracking)
OCLC Project IDs
Reload Dates
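A minimal sketch of that tracking table, using SQLite in place of MySQL so the example is self-contained. The column names are illustrative, not Yale's actual schema; they simply mirror the four items listed above.

```python
import sqlite3

# Illustrative tracking table (SQLite standing in for the MySQL database).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE recon_tracking (
        bib_id             INTEGER PRIMARY KEY,
        exclude_project_id TEXT,   -- local tracking of excluded sets
        oclc_project_id    TEXT,   -- OCLC project the bib was sent out in
        reload_date        TEXT    -- date the updated record was reloaded
    )
""")
conn.execute(
    "INSERT INTO recon_tracking VALUES (?, ?, ?, ?)",
    (1234567, None, "YUS-01", None),  # sample row: sent in project "YUS-01"
)
rows = conn.execute(
    "SELECT bib_id FROM recon_tracking WHERE oclc_project_id = 'YUS-01'"
).fetchall()
```

Queries like the one above are what drive both the QA of candidate-record queries and the per-project extract files described on the next slides.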
Tracking our Records
Used this to QA the results of the queries run
to identify all potential records
Used this to push out files of bib ids by OCLC
project ID to be used later to extract correct
records to send to OCLC
Tracking was/is a big effort of reconciliation!
Tracking our Records
As records are prepared for loading back into
Voyager this MySQL database will be
updated with those date(s)
OCLC will produce crossref reports and other
processing reports per each file, but these
are not concatenated into any form of a
relational database
Building an 079 Index in Voyager
Ex Libris contracted to generate and update
Voyager indexes
Created in both Production and Test
environments--took less than a day each
time; downtime required; $ for service
Added 079|a and 079|z left anchored indexes
Building an 079 Index in Voyager
Updated SYSN Composite Index to include new 079
indexes:
019|a
035|a |z
079|a |z
Indexes were mostly to assist staff in searching, but
also for bulk import profiles for ongoing loads
Exploring how to use the new indexes in ongoing
EOD or e-resource loads from vendors
OCLC Pre-Processing
OCLC IBM Mainframe limitations
Records sent in files of at most 100MB/90,000
records each, AND only 15 files per day
Separating records with 880s from those without
Additionally, OCLC is splitting out PCC records
from the YUS files
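The mainframe limits above amount to a simple batching problem. A hedged sketch, assuming we are splitting a list of bib IDs; the record-count and files-per-day caps come from the slide, while the 100MB size cap would need a separate byte-level check that is omitted here.

```python
# Sketch: split bib IDs into files of at most 90,000 records, then group
# files into daily batches of at most 15 (OCLC mainframe limits).

RECORDS_PER_FILE = 90_000
FILES_PER_DAY = 15

def chunk(items, size):
    """Split a list into consecutive slices of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def schedule(bib_ids):
    """Return a nested list: days -> files -> bib IDs."""
    files = chunk(bib_ids, RECORDS_PER_FILE)
    return chunk(files, FILES_PER_DAY)

days = schedule(list(range(200_000)))  # 3 files, fits in one day
```

At these limits, 6.7 million records is at least 75 files, i.e. a minimum of five sending days even before OCLC's own processing time.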
OCLC Pre-Processing
Each set of files sent as a “project” with
unique ID
Creating label files, tracking via spreadsheet
Suspended weekly exports to OCLC
(9/5/2010-12/20/2010**)
OCLC Pre-Processing
Deleting YUL IR records in WorldCat
Why? Easier matching?
5.7 million removed total
EBScan software process
Match routines set:
Example: match on this field and that or ….
Cross Ref Reports and Stats
Sample
Adding in prefixes of ocm and ocn
Other statistical reports
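The prefixing step can be sketched as below. This assumes the commonly described OCLC convention of "ocm" for control numbers of up to 8 digits and "ocn" for 9-digit numbers (10-digit and longer numbers later took the prefix "on", included here for completeness though outside this project's scope); the function name is illustrative.

```python
# Sketch of the ocm/ocn prefix convention: 1-8 digit OCLC numbers take
# "ocm", 9-digit numbers take "ocn", 10+ digits take "on".

def with_prefix(oclc_number: str) -> str:
    digits = oclc_number.strip().lstrip("0") or "0"  # normalize zero-padding
    if len(digits) <= 8:
        return "ocm" + digits
    if len(digits) == 9:
        return "ocn" + digits
    return "on" + digits
```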
Loading OCLC Numbers back
into Orbis
Basic Process:
Retrieve crossref report to be used as input
Script to de-dupe crossref reports by name*
Extract MARC record using Voyager API
BIB_RETRIEVE and crossref as input
MARC4Java Open Source to parse and update the
MARC record*
Remove any version of *OCoLC* *ocm* *ocn* in 035|a
Insert new IR number from crossref report with a prefix
of (OCoLC)
Loading OCLC Numbers back
into Orbis
Basic Process:
Compare 079|a to the crossref report: if the same,
move on; if there is no 079|a yet, add the new number
and move on; if different, update with the new
number and report out the old one
Remove any 079|z and report out
Prepare new file of MARC records for bulk import
Report out log summary of process, errors
encountered, discrepancies in 079|a
See handouts!
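The 035/079 steps above can be sketched as one per-record update function. This is a simplified illustration, not the actual Voyager/MARC4Java code: records are modeled as plain dicts of field -> values, the crossref number is passed in as a string, and all names are assumptions.

```python
import re

# Sketch of the per-record update: strip OCoLC/ocm/ocn numbers from 035|a,
# insert the new IR number with an (OCoLC) prefix, and reconcile 079|a/079|z
# against the crossref report, logging anything reportable.

OCLC_035 = re.compile(r"\(OCoLC\)|ocm|ocn", re.IGNORECASE)

def update_record(record: dict, crossref_number: str) -> dict:
    report = {}
    # Remove any version of *OCoLC*, *ocm*, *ocn* in 035|a...
    record["035a"] = [v for v in record.get("035a", []) if not OCLC_035.search(v)]
    # ...and insert the new IR number with an (OCoLC) prefix.
    record["035a"].append(f"(OCoLC){crossref_number}")
    # Compare 079|a to the crossref report.
    old = record.get("079a")
    if old is None:
        record["079a"] = crossref_number      # no 079|a yet: just add
    elif old != crossref_number:
        report["old_079a"] = old              # differs: update, report old one
        record["079a"] = crossref_number
    # Remove any 079|z and report out.
    if "079z" in record:
        report["removed_079z"] = record.pop("079z")
    return report

rec = {"035a": ["ocm00012345", "(CStRLIN)YALG1234"], "079a": "12345"}
log = update_record(rec, "987654321")
```

In the production workflow the same comparison also feeds the summary log of errors and 079|a discrepancies mentioned above.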
Loading OCLC Numbers back
into Orbis
Will also be our new permanent workflow
post-reconciliation—maintenance of these
control numbers!
Cornell, Columbia and Stanford all used
similar processes…
Original hope was to load 250,000 records
per day, 4 days a week; an estimated 6 weeks
to reload everything back into Orbis…
Loading OCLC Numbers back
into Orbis
It all depends on timing…OCLC processes 80K
records in 1-2 days, so for 6.7 million bibs that is
1.2 to 2.4 million/month, or 3 to 6 months total to
process our data!
We can keep pace with loading updated MARC
data, but waiting 6 months is a big deal
Need to keep 1 day a week for all other load activity
in Orbis
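The timing arithmetic above, spelled out (assuming a 30-day month for the round numbers on the slide):

```python
import math

# At 80,000 records per 1-2 days, 6.7 million bibs take roughly 3-6 months.
TOTAL_BIBS = 6_700_000
BATCH = 80_000
DAYS_PER_MONTH = 30

best_per_month = BATCH * DAYS_PER_MONTH         # 2.4 million/month
worst_per_month = BATCH * DAYS_PER_MONTH // 2   # 1.2 million/month

best_months = math.ceil(TOTAL_BIBS / best_per_month)
worst_months = math.ceil(TOTAL_BIBS / worst_per_month)
```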
Loading OCLC Numbers back
into Orbis
Run a keyword regen once a week, since the
keyword index is not updated during bulk import
Program to extract and update MARC records
can process 80K records in 15 minutes
Bulk import run no-key takes 2 hours to load 80K
records
Minimize the loss of any staff changes
Handling Errors
Reports from OCLC with no match records
(validation errors)
Correcting anything in OCLC?
Correcting records in Voyager then resubmitting post-reclamation?
See handouts!
Processing a “Gap” File
Suspended weekly exports to OCLC 9/5/2010
Extracted a version of the bib record between
9/8/10 and 9/10/2010
Identify and extract all changes and new records
from 9/8/10 onward that have an 079|a and whose
last operator in the history is not OCLCRECON
Send to OCLC as another one-off project
What Staff Will Do During
Reconciliation
No processing of holdings in OCLC
ILL OK
Will not create IR records so as not to affect
matching
Work in Orbis as normal otherwise
Modifications Needed to Resume
Weekly Exports to OCLC
Two file streams needed: one for archival
materials and one for everything else
PCC records will be split off once at OCLC
YUM records split off once at OCLC
New process/program created
Lessons Learned So Far
Consistent application of standards across
cataloging units (Suppressed, Suppressed!!!, In
Process records, etc.)
What is your database of record?
How much time to spend on fixing records so they
can be sent?
Maintenance of the control numbers long term
Questions?
Thank you!
[email protected]