Starter show and example presentation slides using OCLC

October 2014
OCLC
ISNI Annual General Assembly, Frankfurt 2014
ISNI Assignment
Janifer Gatenby
EMEA Program Manager Metadata
OCLC
The world’s libraries. Connected.
Assigned ISNIs October 2014
Assigned (≈ 8 million), by basis of assignment:
• 2+ independent sources: 3,956,454
• 3+ VIAF sources: 494,002
• Unique name: 3,233,924
• Single source (JISC names, BOEK, Ringgold): 342,234
• Total: 8,026,614
Provisional: Possible: 701,157
Provisional: Unassigned: 9,953,505
ISNI Assignment: Batch loading
• Independent matching sources
• 3 VIAF sources
ISNI Matching
Matching criteria:
• Name
• Title
• Partial title
• Rare title word
• Date
• Publisher
• Personal affiliation
• Organisation affiliation
• ISBN, ISWC, ISAN, DOI +
• Other name identifier, e.g. IPI, VIAF, IPD
• Instrument
• Linked entities
• Dewey classification
Scoring:
• Scores are collected from each judge (ice skating style)
• Scores are lowered for common surnames and common titles
• Score > .85 = match
• Score > .6 but < .85 = possible match
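The slide gives the thresholds but not how the judges' scores are combined. The sketch below assumes, in keeping with the ice-skating analogy, that the highest and lowest judge scores are discarded and the rest averaged; the size of the common-name penalty is likewise hypothetical. Only the .85 and .6 thresholds come from the slide, and a score of exactly .85 is treated here as a possible match because the slide leaves that boundary open.

```python
from statistics import mean
from typing import List

MATCH_THRESHOLD = 0.85     # "Score > .85 = match" (from the slide)
POSSIBLE_THRESHOLD = 0.6   # "Score > .6 but < .85 = possible match"

def combined_score(judge_scores: List[float], common_name: bool = False,
                   common_name_penalty: float = 0.1) -> float:
    """Combine per-judge scores (each 0..1) into a single score.

    Assumption: ice-skating style, drop the highest and lowest scores and
    average the rest. The penalty for a common surname or title is a
    hypothetical fixed subtraction.
    """
    if not judge_scores:
        raise ValueError("need at least one judge score")
    scores = sorted(judge_scores)
    if len(scores) > 2:
        scores = scores[1:-1]  # discard the lowest and highest judge
    score = mean(scores)
    if common_name:
        score -= common_name_penalty
    return max(0.0, min(1.0, score))  # keep the result in 0..1

def classify(score: float) -> str:
    """Apply the slide's thresholds to a combined score."""
    if score > MATCH_THRESHOLD:
        return "match"
    if score > POSSIBLE_THRESHOLD:
        return "possible match"
    return "no match"
```

For example, judge scores of 0.9, 0.95, 0.2 and 0.92 combine to 0.91, a match; with the hypothetical common-surname penalty the same candidate drops to 0.81, a possible match.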
ISNI Assignment: Batch loading
• Unique name
• Single source
Central database - Trust
Confidence scale, from high (+) to low (-):
• Assigned: ≈ 8 million (publicly accessible at www.isni.org)
• Provisional: Possible: ≈ 638,000
• Provisional: Unassigned: 9+ million
Assignment only if confident; assignment is curated.
Assigned ISNIs are authoritative, unique, trustworthy and persistent.
Confidence is maintained through:
• Matching algorithms
• Data sampling
• Anomaly checks
• Quality assurance processes
• End user input notes
The two main problems for maintaining persistence are:
• duplicates needing to be merged
• undifferentiated identities needing to be split
ISNI errs on the side of making duplicates rather than mixed identities.
Thus the batch load process (usually) makes a provisional record:
• where there is no match (for fear of making a duplicate assignment)
• where there is a low-confidence match (for fear of making a mixed identity or a duplicate assignment)
• where a matching record already has another local ID for the same source, regardless of the strength of the match (for fear of making a mixed identity)
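The three batch-load rules can be read as a small decision procedure. This is a sketch under stated assumptions: the Candidate shape and the outcome labels are hypothetical, the 0.85 threshold is borrowed from the matching slide, and the strong-match branch (adding the source to the matched record) is an assumption, since the slide only describes when a provisional record is made.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class Candidate:
    """Best-matching database record (hypothetical shape)."""
    score: float
    # Local IDs already on the record, keyed by source code, e.g. {"PROQ": "123"}
    local_ids: Dict[str, str] = field(default_factory=dict)

def batch_load_outcome(source: str, local_id: str,
                       best: Optional[Candidate],
                       match_threshold: float = 0.85) -> str:
    """Decide what happens to one batch-loaded record."""
    if best is None:
        # No match: provisional, for fear of making a duplicate assignment.
        return "provisional"
    if best.score <= match_threshold:
        # Low-confidence match: provisional, for fear of a mixed identity or duplicate.
        return "provisional"
    existing = best.local_ids.get(source)
    if existing is not None and existing != local_id:
        # Same source already present with a different local ID: provisional
        # regardless of score, for fear of making a mixed identity.
        return "provisional"
    # Assumption: a strong, conflict-free match adds the source to that record.
    return "add source to matching record"
```

Note the ordering: the same-source check runs only after the score check, but because every branch that fails returns "provisional", a conflicting local ID blocks the merge however high the score is.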
Procedures for maximizing assignment
• Refinement of matching algorithms
  • e.g. introduced rare title word
  • now ignoring date of birth 1900
• Re-import program
  • rematch with new rules
  • rematch after new data added
• ISNI Quality Team: data sampling
  • assessing the impact of single source
  • recommendations for program changes
• New criteria
  • assessing uncommon surname assignment
  • rules for online rich assignment
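One refinement listed above is ignoring a date of birth of 1900, presumably because it often appears as an incorrect value (a later slide shows a duplicate record with DOB = 1900). A minimal sketch of a date judge that abstains in that case, so the date neither helps nor hurts the match; the 1.0 / 0.0 scores and the function name are hypothetical.

```python
from typing import Optional

# Birth years treated as unreliable and ignored by the date judge.
PLACEHOLDER_BIRTH_YEARS = {1900}

def date_judge(year_a: Optional[int], year_b: Optional[int]) -> Optional[float]:
    """Score agreement of two birth years, or None when the judge abstains.

    Only the "ignore 1900" rule comes from the slide; the scores are illustrative.
    """
    if year_a is None or year_b is None:
        return None  # nothing to compare
    if year_a in PLACEHOLDER_BIRTH_YEARS or year_b in PLACEHOLDER_BIRTH_YEARS:
        return None  # ignore date of birth 1900
    return 1.0 if year_a == year_b else 0.0
```

Returning None rather than 0.0 matters: a disagreeing date would otherwise pull a true match below threshold purely because one record carries the bad 1900 value.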
Online: Guarantee assignment – Personal Name
ISNIs will be automatically assigned where there are no possible matches in these cases:
• There are matches with a database record with a different source
• A personal name is unique and includes a surname and forename
• The request includes an “isNot” statement
• The metadata supplied is considered rich, as in these cases:
  • Full dates of birth and death supplied
  • Year of birth + 1 title or instrument + 1 related name (co-author or affiliated institution)
  • 1 title or instrument + 1 external URL link of type encyclopaedia or home page (not a social network page) + 1 related name (co-author or affiliated institution)
• The request is resolving a possible match by including a PPN
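The three "rich metadata" cases for personal names are a simple boolean rule. In this sketch the PersonRequest fields and the URL-type labels are hypothetical stand-ins for whatever the request schema actually carries; only the combinations tested come from the slide.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class PersonRequest:
    """Hypothetical fields of an online request for a personal name."""
    birth_date: Optional[str] = None   # full date, e.g. "1932-05-17"
    death_date: Optional[str] = None
    birth_year: Optional[int] = None   # set when only the year is known
    titles_or_instruments: List[str] = field(default_factory=list)
    related_names: List[str] = field(default_factory=list)  # co-authors or affiliated institutions
    url_types: List[str] = field(default_factory=list)      # e.g. "encyclopaedia", "home page", "social network"

# External link types that count; social network pages do not.
QUALIFYING_URL_TYPES = {"encyclopaedia", "home page"}

def is_rich_personal(req: PersonRequest) -> bool:
    """True if the request meets one of the three rich-metadata cases."""
    has_work = len(req.titles_or_instruments) >= 1
    has_related = len(req.related_names) >= 1
    has_url = any(t in QUALIFYING_URL_TYPES for t in req.url_types)
    full_dates = req.birth_date is not None and req.death_date is not None
    return (
        full_dates                                                # case 1
        or (req.birth_year is not None and has_work and has_related)  # case 2
        or (has_work and has_url and has_related)                 # case 3
    )
```

For instance, a request with a year of birth, one title and one co-author qualifies, while a title and co-author backed only by a social network link does not.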
Online: Guarantee assignment – Organisation Name
ISNIs will be automatically assigned where there are no possible matches in these cases:
• There are matches with a database record with a different source
• An organisation name is unique and does not consist only of abbreviations
• The metadata supplied is considered rich, i.e. it includes all of:
  • LOCODE
  • Organisation type
  • Organisation URL
• The request is resolving a possible match by including a PPN
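For organisations the rich-metadata rule is a conjunction: LOCODE and organisation type and organisation URL must all be present. The sketch below encodes that, plus a deliberately naive test for "consists only of abbreviations"; the OrgRequest fields and the abbreviation heuristic (every word made of capitals, optionally dotted) are assumptions, not ISNI's actual check.

```python
import re
from dataclasses import dataclass
from typing import Optional

@dataclass
class OrgRequest:
    """Hypothetical fields of an online request for an organisation."""
    name: str
    locode: Optional[str] = None   # UN/LOCODE, e.g. "DE FRA"
    org_type: Optional[str] = None
    url: Optional[str] = None

def is_rich_org(req: OrgRequest) -> bool:
    # Rich = LOCODE & organisation type & organisation URL, all present.
    return all([req.locode, req.org_type, req.url])

# Naive heuristic: a word looks like an abbreviation if it is only capital
# letters, each optionally followed by a dot, e.g. "OCLC" or "U.K.".
_ABBREVIATION = re.compile(r"^(?:[A-Z]\.?)+$")

def only_abbreviations(name: str) -> bool:
    words = name.split()
    return bool(words) and all(_ABBREVIATION.match(w) for w in words)
```

Under this heuristic "OCLC" alone would not qualify as a unique-name assignment, while "OCLC Online Computer Library Center" would.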
Maximizing assignment
• Enter a request record online (Web page or via API)
• Batch loaded records – passive method
  • Quality Team manual fixes
  • OCLC periodic re-match runs
  • Matches from later batch loading & online activity

  Source   May 2012 assigned   % assigned   Oct 2014 assigned   % assigned
  ALCS     41,523              63.86%       49,157              76.66%
  PROL     2,205               35.24%       4,143               66.18%
  PROQ     65,122              12.89%       243,481             48.19%

• Batch loaded records – active method
  • Resolve possible matches found by the system
  • Search the database for candidate records for merging
  • Enrich a record with URLs to external sources such as the author's web pages, Wikipedia, IMDB, MusicBrainz, Discogs, etc.

  Source   May 2012 assigned   % assigned   Oct 2014 assigned   % assigned
  AUVLU    0                   0%           1,716               48.28%
  ICLA     0                   0%           2,208               97.61%
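Because each source's record count changes between snapshots, the clearest comparison is the change in percentage points. A small sketch using only the percentages from the two tables:

```python
# Share of each source's records with an assigned ISNI: (May 2012 %, Oct 2014 %).
assigned_pct = {
    "ALCS": (63.86, 76.66),   # passive method
    "PROL": (35.24, 66.18),   # passive method
    "PROQ": (12.89, 48.19),   # passive method
    "AUVLU": (0.0, 48.28),    # active method
    "ICLA": (0.0, 97.61),     # active method
}

def gain_points(before: float, after: float) -> float:
    """Increase in percentage points, rounded to two decimals."""
    return round(after - before, 2)

for source, (before, after) in assigned_pct.items():
    print(f"{source}: +{gain_points(before, after):.2f} percentage points")
```

This gives gains of 12.80 (ALCS), 30.94 (PROL) and 35.30 (PROQ) points for the passively handled sources, against 48.28 (AUVLU) and 97.61 (ICLA) points for the actively handled ones.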
Finding possible matches
Command                    What it finds
Cn: proq & bs: [01]*       All your records with a possible match
Cn: proq & bs: 1*          Exact duplicates
Cn: proq & bs: 09*         Probably your duplicates
Cn: proq & bs: 08*         Most likely matches
Cn: proq & bs: 07*         Possible matches
Cn: proq & bs: 06*         Possible matches, lower match confidence

DECISIONS
• Records should merge
• One of the records should split (note to QT)
• Different identities
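The search commands follow one pattern: a source code plus a score-band prefix. The helper below builds them for any source code. The band_for_score mapping rests on an assumption the slide only implies: that the bs index holds the match score as digits without the decimal point (so 0.87 would start "08"). The function names are hypothetical.

```python
# Score-band prefixes for the bs index and what each finds (from the table).
BANDS = {
    "[01]": "All your records with a possible match",
    "1": "Exact duplicates",
    "09": "Probably your duplicates",
    "08": "Most likely matches",
    "07": "Possible matches",
    "06": "Possible matches, lower match confidence",
}

def possible_match_query(source_code: str, band: str) -> str:
    """Build a search command for one source code and score band."""
    if band not in BANDS:
        raise ValueError(f"unknown band {band!r}")
    return f"Cn: {source_code} & bs: {band}*"

def band_for_score(score: float) -> str:
    """Map a match score (0.6 to 1.0) to its band prefix.

    Assumption: bs stores the score's digits without the decimal point.
    """
    if not 0.6 <= score <= 1.0:
        raise ValueError("possible-match scores run from 0.6 to 1.0")
    if score >= 1.0:
        return "1"
    # Small epsilon guards against float error (e.g. 0.7 * 10); cap at 9.
    digit = min(9, int(score * 10 + 1e-9))
    return f"0{digit}"
```

Working from the top band down, as the slides later recommend (exact matches, then possible matches), means running the "1" query first and "06" last.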
Resolving Possible Matches
Compare Screen
Adding a new record – Michel Calame
Adding a new record
Adding a new record for an Organisation
New Organisation form
Adding your source to an existing record
Correcting and enriching
These are all the same person. The second record has an incorrect date of birth (1900).
Enriching
You can add a source note or a general note to any database record; your code does not need to be present on it.
Reporting errors
A general note triggers an email to the ISNI Quality Team for attention.
Atom Pub API (Machine to machine)
• Requests and replacements (you can replace your existing data by citing your local identifier)
• Request
  • Atom Pub header
  • Content = request in the ISNI XML request schema
• Documentation
  • ISNI Atom Pub API guidelines.doc
  • ISNI request.xsd (XML schema)
  • ISNI request schema.doc (describes the schema)
  • ISNI response.xsd (XML schema)
  • ISNI response schema.doc (describes the schema)
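A machine-to-machine request is an Atom entry whose content is an ISNI XML request. The sketch below only builds the envelope with the standard library; it assumes the request XML is carried inline in atom:content, and the id scheme, author name and placeholder payload are hypothetical. The real payload must follow ISNI request.xsd, and the endpoint and header details belong to the ISNI Atom Pub API guidelines.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("atom", ATOM_NS)  # serialise Atom elements as atom:...

def atom(tag: str) -> str:
    """Qualify a tag name with the Atom namespace."""
    return f"{{{ATOM_NS}}}{tag}"

def wrap_request(request_xml: str, local_id: str, author: str) -> bytes:
    """Wrap an XML request document in an Atom entry (sketch)."""
    entry = ET.Element(atom("entry"))
    ET.SubElement(entry, atom("id")).text = f"urn:x-example:{local_id}"  # hypothetical id scheme
    ET.SubElement(entry, atom("title")).text = f"ISNI request {local_id}"
    ET.SubElement(entry, atom("updated")).text = (
        datetime.now(timezone.utc).isoformat(timespec="seconds"))
    author_el = ET.SubElement(entry, atom("author"))
    ET.SubElement(author_el, atom("name")).text = author
    content = ET.SubElement(entry, atom("content"), {"type": "application/xml"})
    content.append(ET.fromstring(request_xml))  # request goes inline as XML
    return ET.tostring(entry, encoding="utf-8", xml_declaration=True)

# Placeholder payload only: NOT the real ISNI request schema.
body = wrap_request("<request><localIdentifier>ABC123</localIdentifier></request>",
                    "ABC123", "Example contributor")
print(body.decode("utf-8"))
```

When POSTing an entry to an Atom Pub collection, RFC 5023 specifies the media type application/atom+xml;type=entry.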
Documentation: Data Submission
Documents relating to data submission
• ISNI tab delimited format
• ISNI tab delimited format organisations
• ISNI data element values
• ISNI XML request schema
• ISNI XML request schema document
• ISNI Atom Pub interactive request requirements
• ISNI Data contributors usage guidelines
• ISNI database source profiles RAG information
• ISNI bulk load submission
Documents relating to data submission output
• ISNI XML response schema
• ISNI XML response schema document
• ISNI XML notification schema
• bulk load assigned ISNIs.xsd
• bulk load ISNI not assigned.xsd
• bulk load too many matches.xsd
• ISNI Data contributors reports and notifications guidelines
ISNI Charges
Enquiry: no charge
Resolving a possible match: no charge
Resolving a non-match: no charge
Correcting information or adding information to an existing record: no charge
Adding a source to a record (status is assigned, provisional or suspect) or adding a new record: ISNI request rate* (100 p.a. free)
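Only two kinds of action are charged, and the first 100 per year are free. A sketch of the resulting annual charge, assuming the allowance is counted per contributor per year over chargeable actions only; the action labels are hypothetical shorthand for the table rows, and the rate is an input because it is not stated here.

```python
from typing import Iterable

# Hypothetical labels for the two chargeable rows of the table;
# enquiries, resolving matches/non-matches and corrections are free.
CHARGEABLE = {"add source", "add new record"}

def annual_charge(actions: Iterable[str], rate_per_request: float,
                  free_allowance: int = 100) -> float:
    """Charge for one year's actions after the free allowance."""
    chargeable = sum(1 for a in actions if a in CHARGEABLE)
    billable = max(0, chargeable - free_allowance)
    return billable * rate_per_request
```

For example, 500 enquiries plus 120 new records at a hypothetical rate of 2.0 per request cost 40.0: the enquiries are free, and only the 20 new records beyond the allowance are billed.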
What is requested from ISNI Data Contributors?
• Ingest ISNIs
• Act on notifications (new assignments, changed assignments, errors and queries)
• Assist in reviewing possible matches (exact matches, then possible matches)
• Add a note to any record found with an error
• Supply URI
• Keep data up to date (become a RAG or use the services of an existing one)