German Dissertations


What libraries can learn from
Google – and what they can do
better
Günter Mühlberger
University Innsbruck Library
Agenda
• Introduction
• A story about digitisation
• The continuation of the story
• Some conclusions
Introduction
• Department for Digitisation and Digital
Preservation
– Founded in 2002, 14 FTE, R&D and Digitisation
Services
– Has coordinated several EU R&D projects in
the digital library domain since 1998
– Currently involved in several projects, e.g.
IMPACT (mass digitisation of textual material,
text recognition and language technologies) and
PrestoPRIME (long-term preservation of audiovisual
material); both projects will set up a CoC
– Coordinator of the library network eBooks on
Demand (EOD) with 30 member libraries in 13
countries: Digitisation on Demand service
• Several medium and large scale digitisation
projects + respective applications for
searching, browsing, archiving
– Catalogue cards
– Newspapers and newspaper clippings
A short story
• January 2007
– Collection of 30,000 books from a
monastery library (“Servitenbibliothek”) given
as a gift to the library
– No spare shelves at the library for
such a collection, since a collection of
German dissertations occupies the best
stacks
– Suggestion to get rid of the
dissertations
– Decision to digitise first and then to
throw them away
• During 2007
– Several experiments with document
Digitisation of dissertations
• 2008 – mid 2010
– Real production process with two parallel
document scanners and up to 70,000 pages per
day, 50,000 pages on average
– Average of 2 minutes per dissertation (110 pages),
including ALL steps in the workflow
– Convincing scan quality: tests show that OCR
will be nearly perfect
– All extra pages (supplements, tables, etc.) are
handled separately
– Cutting documents individually is too time-consuming
– Change of paper quality
• Summer 2010
– We have processed 216,000 dissertations with 24
million pages, 1,800 shelf metres
– 400 GB image data (TIFF, Group 4, bitonal)
– Overall time invested: 8,000 hours or 5 person
years
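The production figures above can be cross-checked with a little arithmetic. The sketch below uses only the numbers quoted on the slides; the one assumption is that 1 GB is taken as 10**9 bytes.

```python
# Figures quoted on the slides
dissertations = 216_000
pages_total = 24_000_000        # "24 million pages" (rounded on the slide)
hours_total = 8_000
person_years = 5
image_bytes = 400 * 10**9       # assumption: 1 GB = 10**9 bytes

# Derived values
pages_per_dissertation = pages_total / dissertations
minutes_per_dissertation = hours_total * 60 / dissertations
hours_per_person_year = hours_total / person_years
kilobytes_per_page = image_bytes / pages_total / 1000

print(f"{pages_per_dissertation:.0f} pages per dissertation")
print(f"{minutes_per_dissertation:.1f} minutes per dissertation")
print(f"{hours_per_person_year:.0f} working hours per person year")
print(f"{kilobytes_per_page:.1f} kB per page image")
```

The results (about 111 pages and 2.2 minutes per dissertation, 1,600 hours per person year, roughly 17 kB per bitonal page image) agree with the rounded averages given on the slides.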
Continuation of the story
• How can we give access to this large
collection?
Copyright comes in
• Investigating Austrian copyright law
– We are allowed to scan for preservation
purposes. OK!
– We are allowed to store for preservation. OK!
– We are allowed to print out a copy and use it
in place of the original we held before we digitised
everything. Hm!
– We are allowed to use this copy for
interlibrary loan, but need to get it back.
Oops!
– We are not allowed to make them available to
the public. OK!
– We are not allowed to make them available to
our researchers and students at the university.
Oops!
Some more considerations
• “Making available” is a new kind of use
– Copying, distribution, translation, exhibiting,
etc. are traditional use forms and publisher
contracts cover this kind of use
– In 2003 (following the EU Directive on Copyright
from 2001) a new kind of use was introduced:
“making available”
– Since this is a new right, “old” contracts
(usually) do not cover it.
– The author is therefore the right holder, not the
publisher.
– In some countries it is more complicated (e.g. in
Germany), but as a rule of thumb most authors in
Europe still have the right to decide by whom,
when and how their digitised work will be made
available to the public
• Dissertations
– Even simpler since no publishers or RROs are
involved
– Dissertations were printed on behalf of the
Our approach to copyright
• Let the social Internet work for us
– Dissertations will be made available online,
but only title page, table of contents and
abstract/introduction will be shown to everyone
– Under discussion: Maybe also some more pages
and search snippets
– Readers will get the chance to write a short
“Request”: I would need this book for my
scientific work, etc.
– Readers will be encouraged to contact potential
right holders (“Do the diligent search for us”)
• Registration mechanism
– A prominent banner will appear: If you are the
author or if you know the author/right holder,
please help us!
– Authors will need to register (contact
details), set some options and confirm
their statement
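The access rule described on this slide can be sketched in a few lines of Python. This is only an illustration, not the library's actual system: the class `Dissertation`, the section labels and the flag `author_granted_access` are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field

# Hypothetical section labels; the slides name only these parts as public.
PUBLIC_SECTIONS = {"title_page", "table_of_contents", "abstract"}


@dataclass
class Dissertation:
    title: str
    pages: dict[int, str]                  # page number -> section label
    author_granted_access: bool = False    # set once a right holder has confirmed
    requests: list[str] = field(default_factory=list)

    def visible_pages(self) -> list[int]:
        """Page numbers shown to an anonymous reader."""
        if self.author_granted_access:
            return sorted(self.pages)
        return sorted(p for p, s in self.pages.items() if s in PUBLIC_SECTIONS)

    def add_request(self, text: str) -> None:
        """Record a reader's short statement of why the work is needed."""
        self.requests.append(text)


demo = Dissertation(
    title="Example thesis",
    pages={1: "title_page", 2: "table_of_contents", 3: "abstract",
           4: "body", 5: "body"},
)
print(demo.visible_pages())   # only the public front matter
demo.add_request("I would need this book for my scientific work.")
```

Until a right holder registers and grants access, only the front matter is listed; the collected requests are what could later be shown to the author.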
Authorisation
• Copyright options
– They may want to make a general statement: Open
Access, Creative Commons, All rights reserved
– Cooperation with an authors’ organisation (RRO)
would make sense
– Or they may want to make a specific statement:
This library is allowed to do this and that.
Then it is a simple bilateral, non-exclusive
contract.
• How to identify the right holder?
– Digital signatures or eCards would make life
much easier.
• Current plan:
– Author provides address.
– The author receives a letter with a list of TAN
codes, which are needed for any action within the
system.
– If he chooses to “reserve all rights” the data
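A minimal sketch of how such a TAN list might work, assuming six-digit numeric codes that can each be used once. The names `generate_tans` and `TanList` are hypothetical, and the slides do not specify the code format or how codes are stored.

```python
import secrets


def generate_tans(count: int = 10, digits: int = 6) -> list[str]:
    """Return `count` distinct random numeric codes to print in the letter."""
    tans: set[str] = set()
    while len(tans) < count:
        # secrets.randbelow draws from a cryptographically strong source
        tans.add(f"{secrets.randbelow(10**digits):0{digits}d}")
    return sorted(tans)


class TanList:
    """Tracks the unused TANs of one author; each TAN authorises one action."""

    def __init__(self, tans: list[str]) -> None:
        # Kept in memory for illustration; a real system would need
        # persistent and suitably protected storage.
        self._unused = set(tans)

    def redeem(self, tan: str) -> bool:
        """Return True and consume the TAN if it is valid and still unused."""
        if tan in self._unused:
            self._unused.remove(tan)
            return True
        return False


codes = generate_tans(count=5)
author_tans = TanList(codes)
first_use = author_tans.redeem(codes[0])    # accepted
second_use = author_tans.redeem(codes[0])   # rejected: already used
```

Sending the codes by post ties the online account to a postal address, which is the identification step the plan relies on in the absence of digital signatures.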
Our dream
• We hope
– That it becomes self-sustaining, with those
who need the information convincing
those who hold the rights to provide free
access, or at least to grant some access
rights to libraries
– That authors will understand why it is so
important that libraries digitise current
material and provide access to everyone
– That users will understand that authors
have rights (copyright and personal rights)
which need to be respected
– That RROs and publishers will understand
that not everyone is interested in “making
money with books written 30 years ago” but
that many are also willing to support the
idea of open access
– That thousands and tens of thousands of authors
What we can learn from Google
• Google’s mission is to organise the information
universe through technological innovation
– Therefore books are highly important (they contain much
better information than websites)
– Digitisation of books was just one step towards the
overall objective
• If you have a mission, do the first step first and
afterwards sort out the problems
– Organise the cheapest way to scan, build your own
machines, workflow, etc.
– Make a reasonable compromise between quantity and
quality
– Be innovative (take what is here but put it together in
a new way)
• Turn problems into opportunities
– Google almost certainly underestimated the impact of
copyright
– Settlement was probably not foreseen from the very
beginning, but now it is a great business opportunity
for them
– If it comes, it will allow them to make a lot of money
What libraries can do better
• Libraries also need to follow their
mission: to preserve the intellectual
heritage of mankind and to provide free
access to everyone
– Google is not a library
– It does many things as if it were a library
(and better), but it never will become a
library
– Preservation comprises analogue AND digital
preservation (go hand in hand)
• to digitise (collect) everything
– Libraries are collection holders, not Google or
anyone else
– Digitisation (and everything connected with it)
has to be part of the daily business and not
only of projects
– Digitisation should be twofold: on-demand AND
via mass digitisation (including cutting of
th
What libraries can do better
• to cooperate among each other (nationally and
internationally)
– Most libraries have the same books, even
duplicates within an institution
– Swedish books in Austria, German books in Sweden,
etc.
– Open access material will no longer belong to one
library, but to everyone!
– Therefore it definitely makes sense to cut up one
copy and store the pages both digitally and in
analogue form (acid-free box)
• to involve readers (and right holders)
– Libraries have a “natural authority” which needs
to be exploited as a market advantage
– Libraries are much nearer to authors and readers
than anyone else, but they need to give them the
chance to express themselves
– They may be slow, old-fashioned and
not at the technological forefront, but they are
trusted organisations and are able to mobilise
Let’s go to work!