Transcript DFG Viewer

Using Greenstone to create
digital libraries with DFG
standards
Elissa Ernst, Lais Carrasco, Maike Streit
Questions to answer
What kind of technical standards do we need to follow in
creating digitizations from physical objects?
How can we use XML schemas such as METS to organize and
structure a digital collection?
The DFG Viewer is part of DFG collections: what is it, how does
it work?
What are the standards regarding access, what does DFG say
about how the public can access the collection?
How can Greenstone be used to comply with these standards,
and can we make a functioning library that meets all of them?
Selection and standards
"The scientifically motivated digitisation of cultural heritage
materials is considered standard, not a technical novelty. When
it comes to envisioning projects, this means that it continues to
be important to create digitised copies whose quality for
research purposes is beyond reproach, but also that it is crucial
to use effective and cost-conscious methods which can be
applied systematically to large amounts of material."
- larger projects should cooperate with existing libraries and
other institutions and consider how they relate to other collections (e.g.
Google's digitizations of Bavarian State Library holdings)
- look around first to avoid duplicating material that has
already been scanned and is available elsewhere
- selection of works with defined scope
Digitizing Printed Content
Consider the source - using originals vs. microfilm
Consider the method - quantity vs. quality
Process of digitizing documents:
- preparation
- digitization proper
- cataloging / indexing or metadata generation
- long-term safeguarding / digital preservation
Imaging
Color images should definitely be provided, with or without full text
- Manuscripts and printed items up to about 1750 should
always be reproduced in color on the basis of the original.
- A high-quality digital master in uncompressed
TIFF (or raw) format for archiving, and derivative formats such as
JPEG and PNG for distribution.
- Recommended resolution for most color documents is 300 dpi
relative to the original. Color stored at 24-bit, greyscale at 8-bit.
- Color and size calibrators should also be included to ensure
fidelity.
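As a rough worked example of what 300 dpi and 24-bit colour mean for the size of an uncompressed master, the sketch below converts a physical page size into pixel dimensions and raw data size. The A4 original (210 x 297 mm) is an assumption chosen only for illustration, and real TIFF files add headers that are not counted here.

```python
from __future__ import annotations

MM_PER_INCH = 25.4

def pixel_dimensions(width_mm: float, height_mm: float, dpi: int) -> tuple[int, int]:
    """Pixel width and height of a scan of an original of the given physical size."""
    return (round(width_mm / MM_PER_INCH * dpi),
            round(height_mm / MM_PER_INCH * dpi))

def uncompressed_bytes(width_px: int, height_px: int, bits_per_pixel: int) -> int:
    """Size of the raw pixel data, ignoring row padding and file-format headers."""
    return (width_px * height_px * bits_per_pixel + 7) // 8

# Assumed example: an A4 original scanned at 300 dpi.
w, h = pixel_dimensions(210, 297, 300)
print(w, h)                                            # 2480 3508
print(f"{uncompressed_bytes(w, h, 24) / 1e6:.1f} MB")  # 24-bit colour: 26.1 MB
print(f"{uncompressed_bytes(w, h, 8) / 1e6:.1f} MB")   # 8-bit greyscale: 8.7 MB
```

Numbers like these are one reason the guidelines separate an archival master from lighter JPEG or PNG derivatives for distribution.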
Techniques to consider before digitising material:
• lighting environment (to evaluate the results)
• calibration of monitors (to address colour issues)
o use of a spectrophotometer to generate a correct
colour profile that matches the original
• camera choice (to reproduce originals according to size
and create sufficient quality)
o the camera's sensor matrix should be sufficient for the size of the
object in order to produce high-resolution results
Colour depth should be considered while digitising material:
• bitonal scans: 2 levels (1 bit per pixel), e.g. 1 = black, 0 = white
• greyscale: 256 levels per pixel (8-bit colour depth)
• colour images: 3 channels (RGB), 256 levels each:
256 x 256 x 256 = 16.7 million colours (24-bit colour depth)
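The figures above all follow from one rule: n bits can encode 2^n distinct values, and the bits of the individual channels add up. A minimal sketch of that arithmetic:

```python
def levels(bits_per_channel: int, channels: int = 1) -> int:
    """Number of distinct values a pixel can take: 2 ** (total bits per pixel)."""
    return 2 ** (bits_per_channel * channels)

for name, bits, channels in [("bitonal", 1, 1), ("greyscale", 8, 1), ("RGB colour", 8, 3)]:
    print(f"{name}: {bits * channels}-bit, {levels(bits, channels):,} values")
# bitonal: 1-bit, 2 values
# greyscale: 8-bit, 256 values
# RGB colour: 24-bit, 16,777,216 values
```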
Encoding and full text capability
Full text should be stored in Unicode, and can be generated by
two methods:
- OCR (optical character recognition): done automatically
with software; most effective for certain fonts and uniform
layouts. "Dirty" (uncorrected) OCR can still be used, despite its
errors, simply to return search results. OCR should always be considered for
machine-press era prints from 1850 onward.
- transcription: double keying, in which the contents are typed
out by hand twice and the two versions are compared for errors. Higher
accuracy, but more costly and often outsourced.
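Double keying works because two independent typists rarely make the same mistake in the same place, so every disagreement flags a likely error. The sketch below uses the standard-library difflib module to list the spans where two hypothetical keyings of the same line differ; how each flagged span is then resolved (a third reader, a rule set) is outside the sketch.

```python
from __future__ import annotations
import difflib

def keying_differences(first: str, second: str) -> list[tuple[str, str, str]]:
    """List (operation, text in first keying, text in second keying) for each mismatch."""
    matcher = difflib.SequenceMatcher(a=first, b=second, autojunk=False)
    return [(op, first[i1:i2], second[j1:j2])
            for op, i1, i2, j1, j2 in matcher.get_opcodes()
            if op != "equal"]

# Two hypothetical keyings of the same line; the second typist dropped an "r".
print(keying_differences("cultural heritage", "cultual heritage"))
# [('delete', 'r', '')]
```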
Metadata standards
DFG requires:
- software independent format
- must be integrated early into the workflow, not left until the
end
- must be integrated with a DFG-funded portal or virtual subject
library
- 4 kinds of metadata:
descriptive (bibliographic),
structural,
technical,
administrative (e.g. rights management)
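METS is one software-independent format with a place for each of these four kinds: dmdSec for descriptive metadata, amdSec with techMD and rightsMD for technical and administrative metadata, and structMap for structural metadata. The skeleton below builds such a file with Python's xml.etree.ElementTree. The IDs are hypothetical, and the result is not schema-valid as it stands, because METS expects each section to carry content (for example an mdWrap or mdRef inside dmdSec, and div elements inside structMap).

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)

def q(tag: str) -> str:
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"

def mets_skeleton() -> ET.Element:
    root = ET.Element(q("mets"))
    ET.SubElement(root, q("dmdSec"), ID="DMD1")           # descriptive (bibliographic)
    amd = ET.SubElement(root, q("amdSec"), ID="AMD1")
    ET.SubElement(amd, q("techMD"), ID="TECH1")           # technical
    ET.SubElement(amd, q("rightsMD"), ID="RIGHTS1")       # administrative: rights
    ET.SubElement(root, q("fileSec"))                     # pointers to image/full-text files
    ET.SubElement(root, q("structMap"), TYPE="PHYSICAL")  # structural
    return root

print(ET.tostring(mets_skeleton(), encoding="unicode"))
```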
Descriptive and structural requirements
Minimum requirement is descriptive metadata. Cataloging can
also be coordinated with a local library to share resources and
make things easier. Must be coded in such a way that other
portals can find and use the data.
How should the document be structured? By following the digital facsimile
(using TEI), i.e. the original physical page sequence, or
according to the work's text/paragraph structure (using METS)?
"The standards currently recommended for old prints are METS
or TEI. However, the METS-based DFG Viewer should be
supported in all cases." TEI can be converted to be compatible
with the DFG viewer.
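METS does not force a choice between the two: one file can carry a PHYSICAL structMap for the page sequence and a LOGICAL structMap for the text structure, tied together by smLink entries in a structLink section. The sketch below builds both for a hypothetical book. The TYPE values (physSequence, page, monograph, chapter) follow the DFG structure vocabulary as we understand it and should be checked against the DFG METS profile before use.

```python
from __future__ import annotations
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS)
ET.register_namespace("xlink", XLINK)

def m(tag: str) -> str:
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS}}}{tag}"

def build_struct_maps(pages: int, chapters: dict[str, range]) -> ET.Element:
    root = ET.Element(m("mets"))
    # Physical view: the original page sequence.
    phys = ET.SubElement(root, m("structMap"), TYPE="PHYSICAL")
    book = ET.SubElement(phys, m("div"), ID="PHYS_0000", TYPE="physSequence")
    for n in range(1, pages + 1):
        ET.SubElement(book, m("div"), ID=f"PHYS_{n:04d}", TYPE="page", ORDER=str(n))
    # Logical view: the work's text structure.
    log = ET.SubElement(root, m("structMap"), TYPE="LOGICAL")
    work = ET.SubElement(log, m("div"), ID="LOG_0000", TYPE="monograph")
    links = ET.SubElement(root, m("structLink"))
    for i, (label, page_range) in enumerate(chapters.items(), start=1):
        ET.SubElement(work, m("div"), ID=f"LOG_{i:04d}", TYPE="chapter", LABEL=label)
        # Link each chapter to every page it spans.
        for n in page_range:
            ET.SubElement(links, m("smLink"),
                          {f"{{{XLINK}}}from": f"LOG_{i:04d}", f"{{{XLINK}}}to": f"PHYS_{n:04d}"})
    return root

doc = build_struct_maps(4, {"Preface": range(1, 2), "Chapter 1": range(2, 5)})
print(ET.tostring(doc, encoding="unicode"))
```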
The DFG Viewer
The primary purpose of this METS-based interface is to display
images and their metadata in a uniform manner for all
DFG-funded projects. Suitable for browsing, viewing, and
downloading content in various resolutions. Metadata must be
provided in specific formats to be used with the viewer:
METS is the wrapper format containing references to the resource files as well as
most metadata.
MODS is used for displaying bibliographic metadata.
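In practice the MODS record sits inside the METS file, in a dmdSec whose mdWrap has MDTYPE="MODS" and an xmlData child. A minimal sketch of that nesting, with hypothetical title, author and date values and only a few of the MODS elements a real record would carry:

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
MODS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mets", METS)
ET.register_namespace("mods", MODS)

def dmd_sec(title: str, author: str, year: str) -> ET.Element:
    """Wrap a minimal MODS record in a METS dmdSec."""
    dmd = ET.Element(f"{{{METS}}}dmdSec", ID="DMD1")
    wrap = ET.SubElement(dmd, f"{{{METS}}}mdWrap", MDTYPE="MODS")
    xml_data = ET.SubElement(wrap, f"{{{METS}}}xmlData")
    mods = ET.SubElement(xml_data, f"{{{MODS}}}mods")
    title_info = ET.SubElement(mods, f"{{{MODS}}}titleInfo")
    ET.SubElement(title_info, f"{{{MODS}}}title").text = title
    name = ET.SubElement(mods, f"{{{MODS}}}name", type="personal")
    ET.SubElement(name, f"{{{MODS}}}namePart").text = author
    origin = ET.SubElement(mods, f"{{{MODS}}}originInfo")
    ET.SubElement(origin, f"{{{MODS}}}dateIssued").text = year
    return dmd

print(ET.tostring(dmd_sec("Example title", "Doe, Jane", "1750"), encoding="unicode"))
```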
From http://dfg-viewer.de/en/regarding-the-project/ :
"The DFG Viewer is a browser web service for
displaying digital representations from decentralised
library repositories. It has an XML interface for
exchanging meta- and structural data in
the METS/MODS format."
"The DFG Viewer is based on the free CMS TYPO3 and
can be used free of charge by anyone interested. This
can either be done centrally via the web service
operated here or by means of a local implementation."
Releasing a collection into the wild
"The DFG is a cosignatory to the Berlin Declaration on Open
Access. In the spirit of this declaration, the results of DFG-funded
digitisation projects should be accessible free of
charge to researchers around the world."
"Digitisation projects are expected to present their nature and
scope also on an English-language web page. The fact that the
project is funded by the DFG should be mentioned."
The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH),
a technical exchange protocol, allows metadata exchange between institutions
that use differing XML formats; Dublin Core is required as a minimum.
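OAI-PMH requests are plain HTTP GET requests carrying a verb parameter, and every compliant repository must be able to deliver oai_dc (unqualified Dublin Core). The sketch builds a ListRecords URL for a hypothetical endpoint and extracts the Dublin Core titles from a response; a small inline response stands in for a real harvest so the example runs offline.

```python
from __future__ import annotations
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

NS = {"dc": "http://purl.org/dc/elements/1.1/"}

def list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build a ListRecords request; oai_dc is the format every repository must support."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"

def dc_titles(response_xml: str) -> list[str]:
    """Collect dc:title values from an OAI-PMH response."""
    root = ET.fromstring(response_xml)
    return [title.text for title in root.iterfind(".//dc:title", NS)]

# A trimmed, hypothetical response standing in for a real harvest.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:doc1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example title</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("https://example.org/oai"))
# https://example.org/oai?verb=ListRecords&metadataPrefix=oai_dc
print(dc_titles(SAMPLE))  # ['Example title']
```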
Open access
What are the standards regarding access, what does DFG say about how the
public can access the collection?
The DFG funds the digitisation of scientific materials in order to make
them accessible to researchers in Germany and worldwide.
Therefore all projects should be designed such that their results will
be available to researchers quickly and for the long term.
In virtually all cases, this will entail the provision of digital copies on
the Internet.
It is expected that digital copies will be available online at no cost, in
a quality sufficient for the bulk of typical research purposes.
Basic requirements and architecture
The provisioning system combines digitised image or full-text files into a
document structure to enable users to navigate a document.
Furthermore, it establishes connections between digital documents, or
parts thereof (e.g. chapters, pages), and metadata, to allow users to
access the individual document or certain document parts based on a
metadata search.
Finally, it organises digital documents into digital collections or holdings
according to subject matter or origin, in order to let users navigate
documents and collections as they would an open-stack library arranged
by subject.
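The three functions described above (a navigable document structure, links between documents or their parts and metadata, and grouping into collections) can be pictured as a small data model. This is a hypothetical in-memory sketch for illustration only; none of the class or field names come from Greenstone or the DFG Viewer.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class Part:
    label: str                    # e.g. a chapter or a page
    pages: list[int]              # page images that make up this part
    metadata: dict[str, str] = field(default_factory=dict)

@dataclass
class Document:
    doc_id: str
    metadata: dict[str, str]
    parts: list[Part] = field(default_factory=list)

@dataclass
class Collection:
    name: str                     # grouped by subject matter or origin
    documents: list[Document] = field(default_factory=list)

    def search(self, key: str, value: str) -> list[tuple[str, str | None]]:
        """Find whole documents (part label None) and document parts whose metadata matches."""
        hits: list[tuple[str, str | None]] = []
        for doc in self.documents:
            if doc.metadata.get(key) == value:
                hits.append((doc.doc_id, None))
            hits.extend((doc.doc_id, p.label) for p in doc.parts if p.metadata.get(key) == value)
        return hits

travel = Collection("Travel literature")
travel.documents.append(
    Document("doc1", {"subject": "Travel"}, [Part("Chapter 1", [1, 2, 3], {"subject": "Italy"})]))
print(travel.search("subject", "Italy"))  # [('doc1', 'Chapter 1')]
```

Searching at part level is what lets a metadata query land a user directly on a chapter rather than on the start of the whole volume.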
It provides user interfaces for searching, navigating, accessing and
retrieving metadata, documents, collections and holdings, and it supports
largely automated export and import of standards-compliant raw data.
The provisioning systems of the individual libraries and archives should
allow access across institutions, both in navigating digital collections or
holdings and in searching indexes.
In addition, the transparent linkage of provisioning systems with local
catalogue systems and network databases is desirable.
Technical requirements
As far as applicable, servers must be set up to:
• Provide all materials in a quality that allows their convenient use for research
purposes on typical university equipment. This entails, for instance, providing a
type size that is easy to read.
• Provide all materials, conversely, in a quality that allows processing via DSL
without cumbersome delays.
• Enable the free download, for research purposes, of any complete unit as one
single file (e.g. of individual printed works).
• Support all currently popular browsers, to the extent viable.
Accessibility requirements
Collections / holdings may be accessible in a variety of ways:
• via the providing institution’s website;
• via an OAI interface;
• via a locally implemented or externally operated DFG
Viewer;
• via a search inquiry to the local and regional library catalogue
or the local online finding-aids system;
• via the virtual subject libraries’ shared portal or one of the
DFG-funded material-specific portals that enable integrated
access to all digital collections funded under the
DFG programme;
• via Internet search engines.
Navigation requirements
"All materials must be provided in a quality sufficient for
academic purposes and outfitted with intuitive navigation
features to facilitate easy use by the target community and on
typical university equipment. All currently popular browsers
must be supported to the extent that this is objectively viable."
The following navigation functions are considered the basic
standard:
• Go to any desired image
• Home, End, Forward, Back navigation
• Full text search (for books from 1850 onward)
• Metadata info: View current document information
• Help: Help menu should provide detailed descriptions with
examples for navigation and for searching the digital library
• Download, Print as PDF
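The basic page-navigation functions in this list (go to any image, Home, End, Forward, Back) amount to a small state machine over a page sequence. In this hypothetical sketch, out-of-range requests are clamped, so Back on the first page and Forward on the last page simply stay put; raising an error instead would be an equally valid design, but clamping keeps a viewer's buttons harmless.

```python
class PageNavigator:
    """Go-to, Home, End, Forward and Back over a sequence of page images."""

    def __init__(self, page_count: int) -> None:
        if page_count < 1:
            raise ValueError("a document needs at least one page")
        self.page_count = page_count
        self.current = 1  # pages are numbered from 1

    def go_to(self, page: int) -> int:
        # Clamp out-of-range requests to the first or last page.
        self.current = max(1, min(page, self.page_count))
        return self.current

    def home(self) -> int:
        return self.go_to(1)

    def end(self) -> int:
        return self.go_to(self.page_count)

    def forward(self) -> int:
        return self.go_to(self.current + 1)

    def back(self) -> int:
        return self.go_to(self.current - 1)

nav = PageNavigator(120)
nav.go_to(57)
print(nav.forward())  # 58
print(nav.end())      # 120
print(nav.forward())  # 120 (already on the last page)
```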
Is Greenstone capable of making a
collection that meets the DFG criteria?
Greenstone meets most DFG requirements, including:
- persistence and linkability of URLs for reliable use as a
research source
- an independent server providing the material and the means
to use it
... but the million-dollar question is this: can we integrate it with
the DFG Viewer?
Greenstone and the DFG Viewer
Greenstone has its own front end; however, using the DFG
Viewer only requires that metadata in METS/MODS format be
available over the web. The DFG Viewer can
be used remotely from its own website.
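Because the hosted viewer only needs a METS/MODS file it can fetch over the web, linking a Greenstone document to it comes down to building a URL. The sketch assumes the viewer's /show page accepts the METS file URL in a `tx_dlf[id]` query parameter, which is the form we understand current viewer links to take; verify it against the DFG Viewer documentation before relying on it. The METS URL is hypothetical and must be publicly reachable, not a localhost address.

```python
from urllib.parse import urlencode

# Assumption: the hosted DFG Viewer takes the METS URL in tx_dlf[id] on its /show page.
VIEWER_BASE = "https://dfg-viewer.de/show"

def dfg_viewer_link(mets_url: str) -> str:
    """Build a link that opens a publicly reachable METS/MODS file in the DFG Viewer."""
    return f"{VIEWER_BASE}?{urlencode({'tx_dlf[id]': mets_url})}"

print(dfg_viewer_link("https://library.example.org/mets/doc1.xml"))
# https://dfg-viewer.de/show?tx_dlf%5Bid%5D=https%3A%2F%2Flibrary.example.org%2Fmets%2Fdoc1.xml
```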
Greenstone provides a selection of metadata sets, but METS
and MODS are not included, so they would have to be
added. You can create a new metadata set with GEMS or
import a defined set from a local file. There is also a plugin
called GreenstoneMETSPlugin, which processes Greenstone
archive documents in METS form during collection building.
Greenstone has a metadata editor where desired schema can
be manually applied - might be a lot of work, but it's possible.
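Manual mapping usually starts from a crosswalk between the fields a collection already has and the target schema. As a hypothetical sketch, the mapping below takes Greenstone-style Dublin Core field names (dc.Title and so on, assumed here for illustration) and emits the corresponding MODS elements. It covers only three fields and skips everything else; in a real project each unmapped field would need a deliberate decision.

```python
from __future__ import annotations
import xml.etree.ElementTree as ET

MODS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS)

# Hypothetical, partial crosswalk: source field -> (MODS parent element, MODS child element).
CROSSWALK = {
    "dc.Title": ("titleInfo", "title"),
    "dc.Creator": ("name", "namePart"),
    "dc.Date": ("originInfo", "dateIssued"),
}

def to_mods(record: dict[str, list[str]]) -> ET.Element:
    """Convert a multi-valued field record into a MODS element, one parent per value."""
    mods = ET.Element(f"{{{MODS}}}mods")
    for source_field, values in record.items():
        if source_field not in CROSSWALK:
            continue  # unmapped fields need a manual decision
        parent_tag, child_tag = CROSSWALK[source_field]
        for value in values:
            parent = ET.SubElement(mods, f"{{{MODS}}}{parent_tag}")
            ET.SubElement(parent, f"{{{MODS}}}{child_tag}").text = value
    return mods

record = {"dc.Title": ["Example title"], "dc.Creator": ["Doe, Jane"], "dc.Rights": ["unmapped"]}
print(ET.tostring(to_mods(record), encoding="unicode"))
```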
Thanks for your attention!
Hope you got some ideas for your own projects.
Source material for all quotes unless otherwise specified:
http://www.dfg.de/download/pdf/foerderung/programme/lis/praxisregeln_digitalisierung_en.pdf