
Vocabulary Matching for Book Indexing Suggestion in Linked Libraries – A Prototype Implementation & Evaluation
Antoine Isaac, Dirk Kramer, Lourens van der Meij,
Shenghui Wang, Stefan Schlobach, Johan Stapel
Problem: subject indexing
• Describing subjects of books
• Using concepts from vocabularies (e.g. thesauri)
Problem: re-indexing
• Describing a book that has already been described
• With a new vocabulary
– Fitting a different context (e.g., different libraries)
Why re-indexing at KB?
• The Dutch National Library (KB) holds many books that are also in Dutch public libraries
• The KB deposit collection is indexed with the Brinkman thesaurus
• Dutch public libraries use the Biblion thesaurus
[Figure: overlap between book collections: Dutch Public Libraries (indexed with Biblion) and the KB Deposit Collection (indexed with Brinkman)]
A wider issue
• KB shares books with many other libraries
• Each with its own description practices
[Figure: the KB among other book collections and the knowledge organization systems (KOSs) used to describe them.
Book collection datasets: KB Deposit Coll., KB Scientific Coll., LC (US Nat. Lib), BnF (French Nat. Lib), DNB (German Nat. Lib), Dutch Public Libraries, Dutch Booktrade.
Subject thesauri / subject heading lists: Brinkman, GTT, LCSH, RAMEAU, SWD.
Domain/discipline classifications: NBC, DDC (Dewey Decimal Classification).
Other classifications: BISAC subject codes, Doelgroep (audience).
Person/corporation data: LC authority file, Autorités BnF, Personennamendatei, KB Corporatie + Persoon.
Also shown: NUR, UNESCO classification, Biblion.
Line thickness indicates the degree of overlap between book collections; a collection placed vertically in line with KOSs is described using those KOSs.]
Room for improvement?
• Libraries devote large resources to indexing
– 20 people at KB
– About 20,000 books per year
• Re-using existing descriptions for re-indexing can benefit both sides
Alignment and re-indexing
• STITCH project
– Tackling semantic interoperability in Cultural Heritage
– Using ontology alignment
• Mappings between concepts from different vocabularies can be used for re-indexing
Basic idea: replace concepts in descriptions by conceptually equivalent concepts
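The basic idea can be illustrated with a minimal Python sketch. It is not the prototype's code (the suggestion service ran in SWI-Prolog): it assumes a hypothetical one-to-one mapping `EQUIVALENTS` from Biblion labels to Brinkman labels and swaps each subject of a description for its mapped counterpart, setting aside subjects with no known equivalent.

```python
# Minimal sketch, not the STITCH implementation: re-index a description by
# replacing each concept with a conceptually equivalent target concept.
# EQUIVALENTS is a hypothetical one-to-one mapping (labels from the slides).
EQUIVALENTS: dict[str, str] = {
    "Persian language": "Iranian language and literature",
}


def reindex(subjects: list[str], mapping: dict[str, str]) -> tuple[list[str], list[str]]:
    """Return (translated subjects, subjects with no known equivalent)."""
    translated: list[str] = []
    unmapped: list[str] = []
    for subject in subjects:
        if subject in mapping:
            translated.append(mapping[subject])
        else:
            unmapped.append(subject)
    return translated, unmapped


print(reindex(["Persian language", "Spain"], EQUIVALENTS))
# → (['Iranian language and literature'], ['Spain'])
```

One-to-one substitution like this is only the starting point; the requirements below show why groups of concepts and actual indexing practice also have to be taken into account.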
Goal: a re-indexing prototype
• Past: preliminary experiments with KB data
• Now: building a prototype and
– plugging it into the KB production system
– having it evaluated by its potential users (indexers)
• Prototype case: Dutch public libraries / KB
Suggesting Brinkman subjects based on Biblion ones
Alignment and re-indexing: requirements
Subjects can be complex
• Mappings between groups of concepts
"Travel guides" + "Spain" → "Spain; travel guides"
Concepts are used in descriptions
• Mappings taking into account extensional semantics
"Building engineering" → "Learning material ; building engineering"
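Group mappings can be sketched by giving each rule a set of source concepts on its left-hand side and firing it only when all of them occur in a book's description. The two rules below encode the slide examples; the set-containment test is an assumption about how group mappings apply, not a description of the prototype's internals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A many-to-one re-indexing rule: all `sources` together suggest `target`."""
    sources: frozenset[str]
    target: str


# The two slide examples encoded as rules (labels copied from the slides).
RULES = [
    Rule(frozenset({"Travel guides", "Spain"}), "Spain; travel guides"),
    Rule(frozenset({"Building engineering"}), "Learning material ; building engineering"),
]


def fire(description: set[str], rules: list[Rule]) -> list[str]:
    """Return the targets of all rules whose source concepts all occur in the description."""
    return [rule.target for rule in rules if rule.sources <= description]


print(fire({"Travel guides", "Spain"}, RULES))  # → ['Spain; travel guides']
print(fire({"Spain"}, RULES))                   # → []
```

The second rule is one-to-one in form, but its target reflects how indexers actually use Brinkman for such books (the extensional side), which a purely lexical match would not produce.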
Obtaining re-indexing rules
• Lexical alignments are not good enough
• Probabilistic rules are calculated
– Using extension of concepts: existing indexing
– Simple probabilities, with ad hoc adjustment
"Travel guides","Spain"→"Spain; travel guides", 0.982
• Not only based on Biblion subjects
– AUT – main authors of books
– KAR – “characteristic”
– DGP – intellectual level/target group
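A hedged sketch of how such probabilities can be computed from existing indexing, assuming a set of books indexed in both vocabularies. For each candidate antecedent (single source features and pairs of them), it estimates P(target | antecedent) as the fraction of books carrying the antecedent that also received the target. Non-subject metadata such as AUT, KAR or DGP is folded in as prefixed features (`"DGP:..."`), which is an assumed encoding; the book data, the `min_support` threshold and the "History" subject are invented, and the prototype's ad hoc adjustment is not reproduced.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: (source features, Brinkman subjects) for books indexed
# by both the public libraries and the KB. "DGP:adults" and "History" are invented.
BOOKS = [
    ({"Travel guides", "Spain"}, {"Spain; travel guides"}),
    ({"Travel guides", "Spain"}, {"Spain; travel guides"}),
    ({"Travel guides", "Spain", "DGP:adults"}, {"Spain; travel guides"}),
    ({"Spain"}, {"History"}),
]


def learn_rules(books, max_size=2, min_support=2):
    """Estimate P(target | antecedent) from co-occurrence in dually indexed books.

    Antecedents are sets of up to `max_size` source features that occur in at
    least `min_support` books. No smoothing or other adjustment is applied.
    """
    antecedent_counts = Counter()
    joint_counts = Counter()
    for features, targets in books:
        for size in range(1, max_size + 1):
            for combo in combinations(sorted(features), size):
                antecedent = frozenset(combo)
                antecedent_counts[antecedent] += 1
                for target in targets:
                    joint_counts[(antecedent, target)] += 1
    return {
        (antecedent, target): count / antecedent_counts[antecedent]
        for (antecedent, target), count in joint_counts.items()
        if antecedent_counts[antecedent] >= min_support
    }


rules = learn_rules(BOOKS)
print(rules[(frozenset({"Travel guides", "Spain"}), "Spain; travel guides")])  # → 1.0
print(rules[(frozenset({"Spain"}), "Spain; travel guides")])                   # → 0.75
```

The pair rule scores higher than "Spain" alone, which is the effect group mappings are meant to capture; the 0.982 on the slide comes from real KB data and the prototype's own adjustment, not from this toy set.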
Demo
Doesn't work?
User study
• Quantitative aspect
– How well does the tool compare to human subject indexing?
• Qualitative aspect
– User satisfaction
– Suggestions for improvement
Evaluation setting
• 6 indexers
• 6 weeks
• 284 books
• Evaluation integrated in daily indexing work
• Pre-evaluation briefing
• Questionnaire during evaluation
• Post-evaluation de-briefing & questionnaire
User study results
Suggestion class  # suggestions  Precision  Recall
blue                        308      72.7%   47.9%
purple                    1,188      10.7%   27.1%
red                       2,525      1.11%   5.98%
not suggested                89        n/a   19.0%
• Top-ranked mappings are indeed much better
• Individual book satisfaction level > 70%
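The table's figures are mutually consistent under one reading, assumed here: precision is the share of a class's suggestions that matched a subject the indexers assigned, and recall is the share of all subjects the indexers assigned, with the 89 "not suggested" subjects making up the remainder. The short Python check below recovers the implied counts and recomputes the recall column from them.

```python
# Consistency check of the results table under an assumed reading:
# precision = matched / suggested per class; recall = matched / all subjects
# assigned by the indexers, where "not suggested" subjects make up the rest.
TABLE = {  # class: (number of suggestions, precision)
    "blue": (308, 0.727),
    "purple": (1188, 0.107),
    "red": (2525, 0.0111),
}
NOT_SUGGESTED = 89

# Implied number of suggestions per class that matched an indexer's subject.
matched = {cls: round(n * p) for cls, (n, p) in TABLE.items()}
total_subjects = sum(matched.values()) + NOT_SUGGESTED

recall = {cls: m / total_subjects for cls, m in matched.items()}
recall["not suggested"] = NOT_SUGGESTED / total_subjects

print(matched)         # → {'blue': 224, 'purple': 127, 'red': 28}
print(total_subjects)  # → 468
for cls, r in recall.items():
    print(f"{cls}: {r:.2%}")
```

Rounded, the recomputed recall values match the table's 47.9%, 27.1%, 5.98% and 19.0%, which supports this reading of the columns.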
User study results (1)
• But the general satisfaction is lower
– Only two out of six would use the tool as such
• Quality of suggestions
– Lower-level suggestions are often not meaningful
• Perception of suggestions' quality
– Long lists with wrong suggestions at the end give a bad impression
– Ranking is appreciated, but it is not enough
User study results (2)
Suggestions were found promising
• Bridging the indexing gap between collections
– Different indexing strategies
"Persian language" (Biblion) vs. "Iranian language and literature" (Brinkman)
Lots of suggestions for improvement
• More re-indexing!
– Suggesting concepts from other vocabularies
– More context metadata as input
Conclusions
• Shows the potential of re-using data in a library network
• Alignment approach fitting indexing practice
• Concrete demonstration in the KB production environment
• Technology transfer: KB wants to continue efforts
• Flexibility: architecture ready to exploit other vocabularies
– Linked data & SKOS
Prototype components
[Figure: prototype architecture. Components shown: GGC cataloguing system; WinIBW cataloguing interface; STITCH script (VisualBasic); suggestion service (SWI-Prolog); lexical alignments; Sesame RDF store; Sesame SKOS RDF store; indexer; IE; STITCH stylesheet (XSLT); vocabulary service (Java/Tomcat); LOD SPARQL endpoints.]
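The "lexical alignments" component can be illustrated with a naive label matcher, assuming each vocabulary is available as a mapping from concept identifiers to preferred labels (for example read from `skos:prefLabel` values). Labels are normalized for case, accents, punctuation and whitespace, and concepts with identical normalized labels are paired. The concept identifiers, the Brinkman label "building engineering" and the normalization steps are hypothetical; the prototype's actual matcher is not described on the slides.

```python
import re
import unicodedata


def normalize(label: str) -> str:
    """Case-fold, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", label)
    without_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(re.findall(r"\w+", without_accents.casefold()))


def lexical_alignment(source: dict[str, str], target: dict[str, str]) -> list[tuple[str, str]]:
    """Pair concept IDs whose preferred labels are equal after normalization."""
    index: dict[str, list[str]] = {}
    for concept_id, label in target.items():
        index.setdefault(normalize(label), []).append(concept_id)
    return [
        (src_id, tgt_id)
        for src_id, label in source.items()
        for tgt_id in index.get(normalize(label), [])
    ]


# Hypothetical concept IDs; labels mostly taken from the slides.
biblion = {"bib:1": "Building engineering", "bib:2": "Persian language"}
brinkman = {"brk:1": "building engineering", "brk:2": "Iranian language and literature"}

print(lexical_alignment(biblion, brinkman))  # → [('bib:1', 'brk:1')]
```

The missing pair illustrates why lexical alignments are not good enough on their own: "Persian language" and "Iranian language and literature" share no label, so only the rules learned from existing indexing can connect them.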
Linked libraries?
[Figure: the map of book collections and KOSs from the "A wider issue" slide, extended with Wikipedia (.de and .nl) and others. It distinguishes existing KOS alignments from potential KOS alignments of interest, and marks LCSH as the currently available entry point to the LOD cloud. Line thickness indicates the degree of overlap between book collections; a collection placed vertically in line with KOSs is described using those KOSs.]
Thank you!
• Questions?
Screenshots
WinIBW production tool
STITCH suggestion tool
Original metadata
Concept suggestions
Comparing with human re-indexing
Complement: lexical alignments
Adding subjects using thesaurus access
Concept suggestions
Saving and back to WinIBW