Batch-Load Points Counter
(MARCEdit project)
Amelia C. VanGundy
The University of Virginia’s College at Wise
Virginia SirsiDynix Library Users Group Meeting
Nov. 14, 2012
John Cook Wyllie Library
http://library.uvawise.edu/
• Ebook titles in OPAC & Ebook packages on web in finding aids
• Rate of e-book acquisition increased
– netLibrary: 3k titles per year
– EBSCOhost Ebook Academic Collection: 65k titles initial load, then 5-10k additional titles every quarter
2
Batch Loading Problems
• Existing procedures were difficult to follow
• Procedures were inconsistent
– especially for different vendors
• Didn't take advantage of MARCEdit Tools
• 949 holdings field now includes $a class#
– previously, files loaded with AUTO “call#”
3
Solution? Wish list?
• Determine quality of MARC records
– OCLC files vs. other vendor files
• Determine editing priorities
– required (001/949), recommended, optional
• Learn to construct Regular Expression Strings
– Batch Editing Tools & Find/Replace
• Streamlined format
– needed both an outline & more detailed info
• Make available on-line/web-page
4
MARCEdit proficiency
• Beginner
• Advanced Beginner
– Uses MARCEditor Tools window
(Add/Delete field, Edit Subfield Data, Sort by...)
– Can apply Regular Expression Strings
• Intermediate
– Uses MARC Tools wizard
(Extract Selected Records, MARCSplit)
– Can construct Regular Expressions
• Expert
5
Batch-Load Points Counter (BLPC)
people.uvawise.edu/acv6d/
6
Batch-Load Points Counter (BLPC)
Webpage & Project link
people.uvawise.edu/acv6d/
1. Introduction
– project concept & desired outcomes
2. Checklist
– outlines the batch-load procedures & steps
– points counter: “what to do” & “when to stop”
3. Processing Guidelines
– procedures & how-tos & copy/paste info
4. 949 processing
7
BLPC Introduction & Outcomes
• Validation
– determine integrity of the file
• Processing
– determine quality of the records
• Statistics
– track vendor pkgs, record counts, 001 prefixes
• Points
– max. points = 150 (2.5 hours)
• STOP & contact vendor (request corrected file)
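The cap above equates 150 points with 2.5 hours, which works out to roughly one point per minute of processing. A minimal sketch of the tally follows; the task names and point values are hypothetical, and treating "more than 150" as the stop condition is an assumption.

```python
MAX_POINTS = 150  # from the slide: 150 points = 2.5 hours, about 1 point per minute


def tally(points_by_task):
    """Return (total_points, should_stop) for a dict of task -> points."""
    total = sum(points_by_task.values())
    return total, total > MAX_POINTS


# Hypothetical checklist mark-ups; the tasks and values are illustrative only.
example = {"validate file": 10, "fix 001": 20, "delete non-LC 650": 15}
total, stop = tally(example)
print(f"{total} points, about {total} minutes; stop and contact vendor: {stop}")
```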
8
BLPC CheckList w/Time estimates
• Step 1 & 2: Preparation & validation
– number of records in file
– integrity of file
– valid URL links
• Step 3-4: Review & processing
– quality of records
– lists all processing/edits possible
• Step 5: 949 holdings
Print on one page (2 p. per sheet / front&back)
9
BLPC Processing Guidelines
(Procedures)
• Gives details for CheckList
– Steps 1-2, Steps 3-4, Step 5
• Gives the regular expression strings (copy/paste)
– Finding/ Replacing/Deleting
– MARCEditor Tools & MARCEdit Tools
• Always use along with Checklist
– includes information to process every field, BUT
– not every field needs processing
Do not print out
10
BLPC Step 1: Preparation & Reports
• MARC Validator
– Identify Invalid Records
– Validate Record (copy/paste into text file)
• Material Type Report
• Field Count
– verify vendor count against MARCEditor count (LDR/000)
– count early / count often
• Deduplicate (See Addt’l Instruct.)
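The Field Count check can also be approximated outside MARCEdit. The sketch below assumes the file has been broken into MARCEdit's mnemonic text format, where each field is a line such as `=245  10$a...` and each record starts with `=LDR`; the sample records and the vendor count are made up for illustration.

```python
from collections import Counter


def field_counts(mrk_text):
    """Count tag occurrences in mnemonic-format text; =LDR is reported as 000."""
    counts = Counter()
    for line in mrk_text.splitlines():
        if line.startswith("="):
            tag = line[1:4]
            counts["000" if tag == "LDR" else tag] += 1
    return counts


# Two made-up records separated by a blank line.
sample = """=LDR  00000nam a2200000Ia 4500
=001  ocn000000001
=245  10$aFirst title$h[electronic resource]

=LDR  00000nam a2200000Ia 4500
=001  ocn000000002
=245  10$aSecond title
"""

counts = field_counts(sample)
vendor_count = 2  # hypothetical figure supplied by the vendor
print("record count matches vendor count:", counts["000"] == vendor_count)
```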
11
Reports/MARC Validator:
Identify Invalid Records
12
Reports/MARC Validator:
Validate Records
13
Reports/Material Type
14
BLPC Step 2: Verify Field Counts
• Reports/FieldCount for error checking
– first field listed is 000 (corresponds to =LDR)
– last field listed is “numeric”
– 245 count
• Reports/MARCValidator errors
– open text file created in Step 1
– look for specific errors in error file
• Check URL links to make sure they work
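Before testing links, it can help to pull every 856 $u value into a list. This is a sketch under the assumption that the file is in MARCEdit mnemonic text format; it only extracts the URLs and does not open them, and the sample line is invented.

```python
import re


def urls_from_856(mrk_text):
    """Collect every $u value found in 856 fields of mnemonic-format text."""
    urls = []
    for line in mrk_text.splitlines():
        if line.startswith("=856"):
            # A subfield runs up to the next "$" delimiter or the end of the line.
            urls.extend(re.findall(r"\$u([^$]+)", line))
    return urls


sample = "=245  10$aA title\n=856  40$3Full text$uhttp://example.org/a$yClick here\n"
print(urls_from_856(sample))
```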
15
Reports/Field Count
(vendor count = 8556)
16
Field Count Error & "bad field tag"
(vendor count =694)
17
Reports/Field Count: Detail
(highlight field & right-click)
18
Review Validate Records report
(saved as text file in Step 1.B)
19
BLPC: Review for processing
Checklist Step 3 workflow


• Check field counts
• Mark-up notes on the Checklist
– Track/count fields that need processing
• Track points for fields that need processing
• Track points for fields that need manual editing
– Each record to fix means extra points
• Rule of thumb: for more than 12 manual edits
– Treat as separate post-load maintenance project
20
BLPC Checklist Step 3: Review Fields
Examples of required processing

• Examine first record & check field count
– Title control# – 001 (prefer OCLC#)
– If lacking: use info. from 035 or create local 001
• Check field counts / subfield counts
– Title/GMD – 245 $h
– URL – 856 $3 $y $u
• Check Validate Record text file for errors
– “Invalid field format” / “Subfield cannot repeat”
• Check field counts / indicator counts
– Subject – 650 Ind2 = 4/7 or 5/6/8
21
BLPC Checklist Step 4: Review fields
Examples of optional processing

• Check field count & delete if present
– 029 / 583 / 584 / 938
• Check field data and delete
– Other vendor pkg names (netLibrary/ebrary/myiLibrary/24x7/Ebsco)
• Check field data & ignore/defer
– 300 lacks phrase: (1 electronic resource)
22
BLPC Checklist with mark-ups
23
BLPC Processing workflow
Step 3 - Step 4


• Review Field Count
• Review Field data
– Use Find/Sort window and review first/last field
• Add/Delete/Edit field
• Review Field data
– look at field in first record or Find/Sort window
– Mistake? Typo? – use the Edit/SpecialUndo
• Review FieldCount
• Save edited file / SaveAs new filename
24
MARCEditor / MARCEdit Tools
BLPC Checklist identifies fields to process

• MARCEditor Tools window
– adding/editing/deleting fields
– adding/editing/deleting subfields
• MARCEditor Edit/Find window
– editing/replacing field data
– displays sortable list
• MARCEdit Tools wizard
– for select & extract records
– extract tab-delimited records for Excel
25
BLPC Processing: Add std. Phrase
506 => Step 3.S
• Check Field Count for presence of 506
• Delete existing 506 field (if present)
• Consult Step 3.S in BLPC Procedures
– Determine that AddField Tool is needed for processing
– Copy Std.phrase from Step 3.S notes
– Paste into AddField Tool window and submit
• Review 506 data in first record
• Check field count
• Save file
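The delete-then-add sequence for the 506 can be sketched on a single record in mnemonic text format. The standard phrase itself lives in the Step 3.S notes and is not reproduced here, so `PHRASE` is a placeholder; blank indicators (written as two backslashes in mnemonic format) and insertion in tag order are assumptions.

```python
PHRASE = "<standard phrase copied from Step 3.S>"  # placeholder, not the real text


def set_506(record, phrase):
    """Drop any existing 506 lines, then insert one new 506 in tag order."""
    lines = [l for l in record.splitlines() if not l.startswith("=506")]
    new_line = "=506  \\\\$a" + phrase  # two backslashes = two blank indicators
    for i, line in enumerate(lines):
        tag = line[1:4]
        if tag.isdigit() and tag > "506":
            lines.insert(i, new_line)
            break
    else:
        lines.append(new_line)  # no later field found, so add at the end
    return "\n".join(lines)


record = "=LDR  00000nam a2200000Ia 4500\n=245  10$aA title\n=506  \\\\$aOld note\n=856  40$uhttp://example.org/a"
updated = set_506(record, PHRASE)
print(updated)
```

Afterwards, the same review applies as on the slide: look at the 506 in the first record and re-check the field count.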
26
MARCEditor Tools: Add std. Phrase
506 => Step 3.S
27
BLPC Processing: Delete specific fields
650 Ind2= 5/6/8 (non-LC) => Step 3.V
• Check Field Count for Presence of 650 Ind2=5/6/8
• Consult Step 3.V in BLPC Procedures
– Optional Review – FindAll(RegEx) instructions
– Determine that Tools/DeleteField tool is needed
– Copy RegEx pattern from Step 3.V
– Paste into Tools/DeleteField window
– Use Regular Expressions radio button option
– Submit using Delete button
• Check Field Count & Indicator count
• Save file
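The effect of the delete step can be previewed with Python's `re` module. This sketch assumes mnemonic text format, in which the tag is followed by two spaces and a blank indicator is written as a backslash; the sample lines are invented.

```python
import re

# 650 fields whose second indicator is 5, 6, or 8 (non-LC subjects).
NON_LC_650 = re.compile(r"(=650  )(.[568])(\$a)(.+)")


def delete_non_lc_650(mrk_text):
    """Return the text with every line matching the pattern removed."""
    kept = [l for l in mrk_text.splitlines() if not NON_LC_650.match(l)]
    return "\n".join(kept)


sample = "\n".join([
    r"=245  10$aA title",
    r"=650  \0$aLibraries$xAutomation.",
    r"=650  \6$aLibraries$xAutomation (second scheme).",
    r"=650  \5$aLibraries$zCanada.",
])
cleaned = delete_non_lc_650(sample)
print(cleaned)
```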
28
MARCEditor: Delete specific fields
650 Ind2= 5/6/8 (non-LC) => Step 3.V
29
Regular expressions (RegEx)
• Finding/Editing patterns in strings (letters/numbers)
– Like learning another language
• Parentheses are used to group data
– Forces the computer to "store" data in "chunks"
– Data “chunks” are numbered for recall/retrieval/use
– Helps the programmer "read" the pattern
• Parentheses are optional functionality, not a requirement
• Some punctuation is "reserved" (has a special meaning)
• BLPC uses consistent format for RegEx patterns
30
Reading RegEx Patterns
650 Ind2= 5/6/8 (non-LC)
Pattern: (=650  )(.[568])(\$a)(.+)
(=650  )    look for 650 fields with two blank spaces
(.[568])    look for any Ind1 & listed Ind2 numbers
(\$a)       look for subfield $a (used as "anchor chunk")
(.+)        any characters to the end of the field
Use Edit/FindAll(RegEx) to verify pattern
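The way the pattern splits a field into chunks can be checked in Python as well. The sketch assumes a mnemonic-format line with two spaces after the tag (as the first chunk's description requires) and a backslash for a blank first indicator; the sample field is invented.

```python
import re

pattern = re.compile(r"(=650  )(.[568])(\$a)(.+)")
line = r"=650  \5$aLibraries$zCanada."

match = pattern.match(line)
# Each pair of parentheses becomes one numbered chunk, starting at 1.
for number, chunk in enumerate(match.groups(), start=1):
    print(number, repr(chunk))
```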
31
Interpreting RegEx punctuation
Pattern: (=650  )(.[568])(\$a)(.+)
( )    Parentheses for data “chunks”
.      Period for any single character (letter, number, space, or punctuation)
[ ]    Square brackets for a list using “OR”
\      Backslash before “reserved” punctuation, esp.: $ \ ( ) [ ]
+      Plus sign for more of the same (one or more)
“Chunks” are stored as: $1$2$3$4
32
Creating RegEx patterns
• Start with known pattern:
For non-LC Subjects: (=650  )(.[568])(\$a)(.+)
FindAll(RegEx) for “local” Subjects (Ind2 = 4/7)
(=650  )(.[47])(\$a)(.+)
FindAll(RegEx) for “local” Genres (Ind2 = 4/7)
(=655  )(.[47])(\$a)(.+)
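Since the variants differ only in the tag and the list of Ind2 values, the known pattern can be treated as a template. In this sketch `field_pattern` is a hypothetical helper, and the two spaces after the tag follow mnemonic text format.

```python
import re


def field_pattern(tag, ind2_values):
    """Build the four-chunk pattern for a tag and a list of Ind2 values."""
    return re.compile(r"(=%s  )(.[%s])(\$a)(.+)" % (tag, ind2_values))


local_subjects = field_pattern("650", "47")
local_genres = field_pattern("655", "47")

print(local_subjects.pattern)
print(local_genres.pattern)
```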
33
Editing with RegEx string pattern
650 BISAC subjects => 690
Start with known pattern: (=650  )(.[568])(\$a)(.+)
• Use Edit/Replace(RegEx): Change 650 to 690
– Identify “BISAC” subjects: Ind2=7 & $2 = bisacsh
• Determine which “chunks” change/stay the same
Find(RegEx): (=650  )(.[7])(\$a)(.+)(\$2bisacsh)
Replace(RegEx): (=690  )$2$3$4$5
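The same find/replace can be tried in Python before running it in MARCEdit. Two differences are worth noting as assumptions of this sketch: Python writes stored chunks as `\2` rather than `$2`, and parentheses in a Python replacement string are literal, so the new tag is written without them. The sample field is invented and uses mnemonic format with two spaces after the tag.

```python
import re

FIND = re.compile(r"(=650  )(.[7])(\$a)(.+)(\$2bisacsh)")
REPLACE = r"=690  \2\3\4\5"  # chunk 1 is replaced by the new tag; chunks 2-5 are kept


def bisac_to_690(line):
    """Retag a BISAC 650 as 690; any other line is returned unchanged."""
    return FIND.sub(REPLACE, line)


print(bisac_to_690(r"=650  \7$aCOMPUTERS / General.$2bisacsh"))
print(bisac_to_690(r"=650  \7$aComputers.$2fast"))
```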
34
Reading RegEx Patterns
650 BISAC subjects => 690
Pattern: (=650  )(.[7])(\$a)(.+)(\$2bisacsh)
(=650  )       look for 650 fields with two blank spaces
(.[7])         look for any Ind1 & Ind2 =7
(\$a)          look for subfield $a (optional “anchor” text)
(.+)           any characters up to the next “chunk”
(\$2bisacsh)   look for subfield & data at end of field
Can be shortened (which makes the pattern look complicated):
Find(RegEx): (=650)(.+\$2bisacsh)
Replace(RegEx): (=690)$2
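A quick comparison suggests the long and short forms give the same result on a typical BISAC field. The sketch uses Python's `\2` notation for stored chunks and writes the replacement tag without parentheses, since Python treats parentheses in a replacement string as literal text; the sample field is invented. Note that the short form does not test Ind2, so it would also retag a `$2bisacsh` field with a different indicator.

```python
import re

LONG_FIND = re.compile(r"(=650  )(.[7])(\$a)(.+)(\$2bisacsh)")
LONG_REPLACE = r"=690  \2\3\4\5"

SHORT_FIND = re.compile(r"(=650)(.+\$2bisacsh)")
SHORT_REPLACE = r"=690\2"  # chunk 2 already holds the spaces, indicators, and data

line = r"=650  \7$aCOMPUTERS / General.$2bisacsh"
long_result = LONG_FIND.sub(LONG_REPLACE, line)
short_result = SHORT_FIND.sub(SHORT_REPLACE, line)
print(long_result == short_result)
```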
35
MARCEditor: FindAll(RegEx)
Testing the pattern: 650 BISAC subjects
36
MARCEditor: Replace(RegEx)
650 BISAC subjects => 690
37
BLPC Step 5: 949 processing
Required processing
Policy: Include Class# in Unicorn Item record
949
$a -- Pull the call# from the 050$a
-- Insert the standard phrase: ' INTERNET'
$v -- Pull the 001/OCLC# as a unique no.
$w $h $t $x $z -- Add standard holdings data
• See Addt'l Instruct.
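The $a and $v rules can be sketched for one record in mnemonic text format. Only those two subfields are built here, because the values for $w $h $t $x $z are not given on the slide; the blank indicators, the subfield order, and the sample record are assumptions. A record without a 050 $a returns `None`, which corresponds to the "949 lacks call#" cleanup case.

```python
import re


def build_949(record):
    """Build a 949 line from 050 $a and 001, or return None if either is missing."""
    control_number = None
    call_number = None
    for line in record.splitlines():
        if line.startswith("=001"):
            control_number = line[6:].strip()  # data starts after '=001' and two spaces
        elif line.startswith("=050") and call_number is None:
            found = re.search(r"\$a([^$]+)", line)
            if found:
                call_number = found.group(1).strip()
    if not control_number or not call_number:
        return None
    return "=949  \\\\$a" + call_number + " INTERNET$v" + control_number


record = "\n".join([
    r"=LDR  00000nam a2200000Ia 4500",
    r"=001  ocn000000001",
    r"=050  \4$aZ699.35.M28$bV36 2012",
    r"=245  10$aA title",
])
print(build_949(record))
```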
38
Batch-loading
• MARCEdit with files no larger than 10k records
– MARCEdit/Tool MARCSplit
• MARCEditor/File: Compile File into MARC
• Unicorn batch load rpt uses 001 match point
– 'o' for OCLC# & 'g' for local vendor key
• Unicorn batch load rpt settings
– create new bibliographic records only
• Date cataloged -- back dated to prev. month
– prevents interference w/scheduled Authority reports
– max. load two files a day
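The 10k-record limit amounts to cutting the record list into fixed-size batches, which is what MARCSplit does inside MARCEdit. The following sketch only illustrates the arithmetic on a list of already-separated records; it is not a replacement for MARCSplit, and the record strings are dummies.

```python
def chunk_records(records, size=10000):
    """Split a list of records into consecutive batches of at most `size`."""
    return [records[i:i + size] for i in range(0, len(records), size)]


# 25,000 dummy records yield two full batches and one partial batch.
dummy = ["record %d" % n for n in range(25000)]
batches = chunk_records(dummy)
print([len(batch) for batch in batches])
```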
39
Identifying records for Cleanup
Checklist finds problems to correct post-load
• Item maintenance projects
– 949 lacks call#
• Bibliographic record maintenance projects
– 245 lacks $h (if more than 5-12 records)
– URLs lacking
• Record reload/overlay project
– Record already in OPAC (P-N duplicates)
40
MARCEdit Tools:
Select/Extract selected records
Step 3.F: 245 lacks $h
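The selection behind this extract can be sketched as follows, assuming mnemonic text format with records separated by a blank line. Treating a record with no 245 at all as "lacking $h" is an assumption of the sketch, and the sample records are invented.

```python
import re


def split_records(mrk_text):
    """Split mnemonic-format text into records at blank lines."""
    return [r for r in re.split(r"\n\s*\n", mrk_text.strip()) if r]


def lacks_245_h(record):
    """True if the record's 245 has no $h (or if there is no 245 at all)."""
    for line in record.splitlines():
        if line.startswith("=245"):
            return "$h" not in line
    return True


sample = (
    "=LDR  00000nam a2200000Ia 4500\n=245  10$aFirst title$h[electronic resource]\n\n"
    "=LDR  00000nam a2200000Ia 4500\n=245  10$aSecond title\n"
)
to_fix = [r for r in split_records(sample) if lacks_245_h(r)]
print(len(to_fix), "record(s) to extract")
```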
41
MARCEdit Tools:
Export Tab Delimited records
42
Help!
• MarcEdit Help
http://people.oregonstate.edu/~reeset/marcedit/html/help.html
– Click thru the Contents menu:
Contents / Using MARCEdit / Using the MARCEditor /
Editing Functions / Using Regular Expressions.
• RegularExpressions.info
http://www.regular-expressions.info/
• MARCEDIT-L list
http://metis3.gmu.edu/cgi-bin/wa?A0=MARCEDIT-L
• BATCH list
http://listserv.vt.edu/cgi-bin/wa?A0=batch
43
Amelia C. VanGundy
The University of Virginia's College at Wise
John Cook Wyllie Library
276-328-0154
http://people.uvawise.edu/acv6d/
Virginia SirsiDynix Library Users Group Meeting
Nov. 14, 2012
44
BLPC Project
Presentation revisions
Originally presented Nov. 14, 2012
• Additional Slides:
– BLPC Project web-page
– MARCEditor: FindAll(RegEx)
– MARCEdit Tools: Export Tab Delimited records
– BLPC Project: Presentation revisions
45