Primo and Omeka: turning local databases into harvestable repositories
Alexander J. Jerabek
Librarian
Information Technology
Service des bibliothèques
[email protected]
2014-05-01
Goal
Make special peripheral collections more
accessible and more visible by integrating them
into Primo
The Pouchet collection
1. Donation of 36 000 print documents and 20 050 vinyl records to the Music library
2. Primarily pedagogic or popular documents
3. Catalogued apart from main catalogue, searchable in a local database (Access, .asp)
4. Ongoing work to catalogue all items
Problem
1. How to get existing records into Primo?
2. How to get new or modified records into Primo?
Local database ‘Palmaro’
Omeka
“Omeka is a free, flexible, and open source web-publishing platform for the display of library,
museum, archives, and scholarly collections and
exhibitions. Its “five-minute setup” makes
launching an online exhibition as easy as
launching a blog.”
http://omeka.org/about
Omeka is a project of the Roy Rosenzweig Center for History and New Media,
George Mason University.
Advantages of Omeka
1. Easy setup and maintenance
2. French interface
3. Does exactly what we need: create and update
records and allow harvesting by Primo
4. Useful plugins
5. Create multiple users
6. Long range plans for possible digitization
Disadvantages of Omeka
1. Not possible to make global changes to records
2. Dublin Core not always best fit for data
3. Not always easy to define default values
4. Not possible to export data
Omeka plugins
1. CSV Import
2. OAI-PMH Repository
3. Simple Vocab
4. Dublin Core Extended
5. Hide Elements
6. Collection Tree
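To illustrate what the OAI-PMH Repository plugin makes available to a harvester such as Primo, here is a minimal sketch that parses a ListRecords response in oai_dc format. The sample response, its identifier, and the title and creator values are invented for illustration; only the OAI-PMH and Dublin Core namespaces are standard.

```python
import xml.etree.ElementTree as ET

# Standard OAI-PMH and Dublin Core namespaces.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Invented sample response (assumption): a real repository returns more fields.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:10599</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Titre fictif</dc:title>
          <dc:creator>Aznavour (comp.)</dc:creator>
          <dc:creator>Coulonges (par.)</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def parse_list_records(xml_text):
    """Return one dict per harvested record: identifier plus repeatable DC fields."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.findall(".//oai:record", NS):
        identifier = rec.findtext(
            "oai:header/oai:identifier", default="", namespaces=NS
        )
        dc = {}
        for el in rec.findall("oai:metadata/oai_dc:dc/*", NS):
            name = el.tag.split("}", 1)[1]  # drop the "{namespace}" prefix
            dc.setdefault(name, []).append((el.text or "").strip())
        records.append({"identifier": identifier, "dc": dc})
    return records


if __name__ == "__main__":
    for record in parse_list_records(SAMPLE_RESPONSE):
        print(record["identifier"], record["dc"])
```

Each Dublin Core element is kept as a list because elements such as creator can repeat within one record.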
Prepare the staff
1. Create users
2. Write up procedures for creating records
3. Iterative process
4. Test runs in staging to find snags
Omeka admin
A few bugs
1. Dropping initial diacritic
2. Cannot search on three-letter words
Import data into Excel
1. Tidy data as much as possible
   1. Filters in Excel
   2. Search and replace in Textpad
   3. Corrections using OpenRefine (http://openrefine.org/)
2. Add columns, constants (e.g. Format)
3. Crosswalk column headers to DC elements
4. Save as CSV (UTF-8)
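The crosswalk and constant-column steps can also be scripted rather than done by hand in Excel. The sketch below is only an illustration: the source column names (Titre, Auteurs, Cote), the Format value, and the choice of Dublin Core targets are assumptions, not the mapping actually used for the collection. The semicolon matches the column delimiter chosen for the CSV import.

```python
import csv
import io

# Hypothetical crosswalk: the source column names are invented for illustration.
CROSSWALK = {
    "Titre": "Dublin Core:Title",
    "Auteurs": "Dublin Core:Creator",
    "Cote": "Dublin Core:Identifier",
}

# Constant column added to every record (the value is an assumption).
CONSTANTS = {"Dublin Core:Format": "Musique en feuille"}


def crosswalk_rows(rows):
    """Rename source columns to DC element headers and add the constants."""
    out = []
    for row in rows:
        new = {dc: (row.get(src) or "").strip() for src, dc in CROSSWALK.items()}
        new.update(CONSTANTS)
        out.append(new)
    return out


def to_csv_text(rows, delimiter=";"):
    """Serialize the crosswalked rows as delimited text."""
    fields = list(CROSSWALK.values()) + list(CONSTANTS)
    buf = io.StringIO(newline="")
    writer = csv.DictWriter(buf, fieldnames=fields, delimiter=delimiter)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def save_csv(rows, path):
    """Write the file as UTF-8 so accented characters survive the import."""
    with open(path, "w", encoding="utf-8", newline="") as fh:
        fh.write(to_csv_text(rows))


if __name__ == "__main__":
    source = [{"Titre": " Titre fictif ", "Auteurs": "Aznavour (comp.)", "Cote": "10599"}]
    print(to_csv_text(crosswalk_rows(source)))
```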
Excel to CSV
Dataset import into Omeka
Omeka CSV import defaults
Column delimiter: ;
Tag delimiter: |
File delimiter: ,
Element delimiter: /
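Before importing, it can help to preview how a file will be split under these delimiters. The sketch below is not the plugin's own code; it is a rough check, and the column names "Tags" and "Files" as well as the sample row are assumptions made for illustration.

```python
import csv
import io

# Delimiters as listed above.
COLUMN_DELIMITER = ";"
TAG_DELIMITER = "|"
FILE_DELIMITER = ","
ELEMENT_DELIMITER = "/"


def read_import_file(text, tag_column="Tags", file_column="Files"):
    """Split each cell into its individual values, column by column."""
    reader = csv.DictReader(io.StringIO(text), delimiter=COLUMN_DELIMITER)
    items = []
    for row in reader:
        item = {}
        for column, value in row.items():
            if column is None:
                continue  # extra cells beyond the header row are ignored
            if column == tag_column:
                sep = TAG_DELIMITER
            elif column == file_column:
                sep = FILE_DELIMITER
            else:
                sep = ELEMENT_DELIMITER
            parts = (value or "").split(sep)
            item[column] = [part.strip() for part in parts if part.strip()]
        items.append(item)
    return items


if __name__ == "__main__":
    sample = (
        "Dublin Core:Title;Dublin Core:Creator;Tags\r\n"
        "Titre fictif;Aznavour (comp.)/Coulonges (par.);chanson|piano\r\n"
    )
    for item in read_import_file(sample):
        print(item)
```

A preview like this also shows when a value that legitimately contains "/" would be split into two elements, which is worth checking before a large import.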
Setting up Primo
1. Set up a data source
2. Set up a scope
3. Set up a pipe
4. Create new local fields
5. Create new set of normalization rules
6. Tweak Primo interface
4. Create new local fields
1. lds08 : Parolier (lyricist)
2. lds09 : Compositeur (composer)
3. lds10 : Interprète (performer)
(see notes below for steps)
4. Rules for new local fields
Ex. new field for lyricist based on ‘ (par.) ’
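The logic of such a rule can be sketched outside Primo. The example below is an illustration in Python, not the Primo normalization rule syntax; it assumes that the role markers are appended to the name inside each creator value, and the sample values are invented.

```python
import re

# Role marker -> local field, following the local fields listed above.
ROLE_FIELDS = {"par": "lds08", "comp": "lds09", "interp": "lds10"}

# Matches "(par.)", "(comp.)", "(interp.)" and variants such as "(par. Fr.)".
ROLE_NOTE = re.compile(r"\(\s*(par|comp|interp)\.[^)]*\)")


def local_fields(creators):
    """Route each creator name to the local field(s) named by its role notes."""
    fields = {code: [] for code in ROLE_FIELDS.values()}
    for creator in creators:
        # The name is what remains once the role notes are removed.
        name = " ".join(ROLE_NOTE.sub("", creator).split())
        for match in ROLE_NOTE.finditer(creator):
            field = ROLE_FIELDS[match.group(1)]
            if name not in fields[field]:
                fields[field].append(name)
    return fields


if __name__ == "__main__":
    sample = ["Aznavour (comp.) (interp.) (par.)", "Coulonges (par. Fr.)"]
    print(local_fields(sample))
```

A creator with several role notes ends up in several local fields, which is the situation the Aznavour example later in the deck raises.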
5. Create new normalization rules
Strip out parenthetical notes for display
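As a rough illustration of the transformation (in Python rather than in Primo's own rule editor), a single regular expression can remove any parenthetical note from a value before display. The sample values are invented, and the pattern is an assumption that deliberately removes every parenthetical, not only role markers.

```python
import re

# Any run of text in parentheses, with the whitespace that precedes it.
PARENTHETICAL = re.compile(r"\s*\([^()]*\)")


def display_creator(value):
    """Return the value without parenthetical notes, with tidy spacing."""
    stripped = PARENTHETICAL.sub("", value)
    return " ".join(stripped.split())


if __name__ == "__main__":
    for raw in ["Aznavour (comp.) (interp.)", "Coulonges (par. Fr.)", "Aznavour"]:
        print(repr(raw), "->", repr(display_creator(raw)))
```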
(record modified in Omeka)
5. Create new normalization rules
Add complementary information
Not:
Dublin Core:Publisher = Bibliothèque de Musique
Dublin Core:Description = Disponible au comptoir de prêt
Instead added:
<display>
<ispartof>Musique en feuille no.10599, voir au comptoir de prêt de la Bibliothèque de Musique</ispartof>
5. Create new normalization rules
Added or modified a few elements to conform with our Aleph records
1. <display/type> = score
2. <search/general> = Musique en feuille
3. <search/searchscope> = ubibmusique
4. <facets/toplevel> = uqam_inst
5. <facets/library> = M
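For readers unfamiliar with the section/field notation, the sketch below builds the corresponding XML fragment so the nesting is visible. It is an illustration only: Primo produces the PNX record through its normalization rules, and the `record` root element used here is an assumption about the surrounding structure.

```python
import xml.etree.ElementTree as ET

# (section, field, value) triples taken from the list above.
PNX_CONSTANTS = [
    ("display", "type", "score"),
    ("search", "general", "Musique en feuille"),
    ("search", "searchscope", "ubibmusique"),
    ("facets", "toplevel", "uqam_inst"),
    ("facets", "library", "M"),
]


def build_pnx_fragment(constants=PNX_CONSTANTS):
    """Nest each field under its section, creating sections on first use."""
    record = ET.Element("record")  # assumed root element
    sections = {}
    for section, field, value in constants:
        if section not in sections:
            sections[section] = ET.SubElement(record, section)
        ET.SubElement(sections[section], field).text = value
    return record


if __name__ == "__main__":
    print(ET.tostring(build_pnx_fragment(), encoding="unicode"))
```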
6. Tweak Primo interface
The location/request tab and the More (SFX) tab are of no use for these records. Hide them with CSS
using the data source prefix:
ul.EXLResultTabs li.EXLRequestTab a[href*="BIBMUSIQUE"],
ul.EXLResultTabs li.EXLMoreTab a[href*="BIBMUSIQUE"] {display:none;}
HTML:
<ul class="EXLResultTabs…">
<li class="EXLRequestTab…">
<a href="display.do?tabs=requestTab….doc=BIBMUSIQUE10478...">
<a href="display.do?tabs=moreTab...&doc=BIBMUSIQUE10478...">
A few problems, questions remain
Aznavour and Coulonges
The problem of Aznavour as (comp.), (interp.), (par.): leave the
parenthetical elements in, or remove them?
Aznavour (include all facets)
The example of Georges Coulonges as (comp.), (par.): leave the
parenthetical elements in, or remove them?
Strip out parenthetical notes for facets and suggested new searches
In addition to ‘(par.)’ etc. we also have ‘(par. Fr.)’ and others. To get
them all we used:
Currently no way to limit or prefilter to ‘Musique en feuille’; searchable
elements are incompatible with visible elements:
Resource type vs Format
Library vs Collection
Not a visible searchable scope option
Outcomes
1. Collection is available via Primo
2. Records are added or modified in Omeka and harvested nightly into Primo
3. Circulation stats increase dramatically
Future plans
1. Phase 2 of Pouchet collection, ~10k vinyl recordings
2. Horus : Law library annual reports database, 1500 records
3. Gestio : Management documentation centre collection of
grey literature, technical papers, etc. 6000 records
4. Possibility of adding digital objects if sheet music is
scanned, documents are digitized
Questions?
[email protected]