WEB ARCHIVING IN THE BRITISH LIBRARY
John Tuck
Head of British Collections
February 2004
BRITISH LIBRARY: CONTEXT
Created by British Library Act 1972.
National Library of the United Kingdom.
Origins from 1753.
One of the world’s greatest research libraries.
160 million collection items.
BRITISH LIBRARY: COLLECTION DEVELOPMENT
Building the UK national published archive as completely as possible: current collecting and retrospective gap-filling, in print and electronic form.
Collecting research-level English-language material published worldwide in the humanities, social sciences and STM.
Buying foreign-language material selectively.
Material acquired through legal deposit, voluntary deposit from publishers, purchase, donation and exchange.
LEGISLATION
Legal Deposit Libraries Act 2003: enabling legislation.
VDEP: Voluntary Deposit of Electronic Publications.
DOMAIN.UK
Six-month experiment in 2001 to select and capture 100 UK websites.
Audit change, loss, links, etc.
Determine next steps.
DOMAIN.UK: Why?
Short-lived nature and changing content of many websites.
Loss of information.
Increasing reference to websites in research and scholarship.
DOMAIN.UK: Voluntary/Rights-Cleared Approach
Voluntary.
Required the explicit agreement of website publishers to take part in the pilot.
No public access.
DOMAIN.UK: Selection
Websites of historical or cultural significance.
A cross-section of subjects across the Dewey Decimal Classification.
DOMAIN.UK: Process
Selected sites e-mailed to seek approval and to check whether they were already archived.
Sites measured for links, size, change, etc.
Frequency of visits: every three weeks, or more often in some cases.
Pilot supported by the sites approached.
Report recommended scaling up.
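The slides do not say how Domain.uk measured links, size and change between visits. As a minimal sketch only, assuming each visit stores the raw HTML of a page, the Python below compares two captures of the same page. The function names (measure_capture, compare_captures) are hypothetical, and a SHA-256 digest is used here as one simple way to detect that content changed.

```python
import hashlib
from html.parser import HTMLParser


class LinkCounter(HTMLParser):
    """Collects href values from <a> tags in one captured HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def measure_capture(html: str) -> dict:
    """Hypothetical helper: size in bytes, link count and content digest of one capture."""
    parser = LinkCounter()
    parser.feed(html)
    parser.close()
    data = html.encode("utf-8")
    return {
        "size_bytes": len(data),
        "links": len(parser.links),
        "digest": hashlib.sha256(data).hexdigest(),
    }


def compare_captures(old_html: str, new_html: str) -> dict:
    """Hypothetical helper: compare two visits to the same page (e.g. three weeks apart)."""
    old, new = measure_capture(old_html), measure_capture(new_html)
    return {
        "changed": old["digest"] != new["digest"],
        "size_delta": new["size_bytes"] - old["size_bytes"],
        "link_delta": new["links"] - old["links"],
    }


if __name__ == "__main__":
    first = '<html><body><a href="/a">A</a></body></html>'
    second = '<html><body><a href="/a">A</a> <a href="/b">B</a></body></html>'
    print(compare_captures(first, second))
```

A digest only records that a page changed, not how; a real audit of change and loss would also need to store the captures themselves so that differences can be inspected.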
BRITISH LIBRARY WEB ARCHIVING PROGRAMME
Building on Domain.uk.
BL to play a leading role in collecting the UK web presence, in partnership with other institutions nationally and internationally.
Selective approach.
BRITISH LIBRARY WEB ARCHIVING PROGRAMME contd.
Co-ordinate a snapshot of the entire UK web presence at occasional intervals.
Achieve more regular capture of a limited and well-defined range of sites.
Sites judged to be research-level, whether in terms of the stated intentions of the sites themselves or of their potential to be primary resources for research.
WEB ARCHIVING PROGRAMME
Comprises a series of complementary projects and activities.
Operates entirely on a voluntary, rights-cleared basis pending secondary legal deposit legislation.
Aims to embed web archiving within the BL's overall collection development policy.
Aims to provide the infrastructure to collect, preserve and make accessible website material alongside material in other formats.
WEB ARCHIVING PROGRAMME STRANDS
Four main strands:
Definition of collection development policy.
UK Web Archiving Consortium.
International Internet Preservation Consortium.
Internet Archive: incunabula of the internet.
COLLECTION DEVELOPMENT
Appointment of a Curator, Web Archiving.
Extension of the policy defined for Domain.uk.
Sites of national, historical and cultural significance.
Research-level now or in the future.
UK WEB ARCHIVING CONSORTIUM
Two-year project.
Six partners: BL (lead), National Library of Scotland, National Library of Wales, National Archives, Joint Information Systems Committee and Wellcome Library.
Plan to use the PANDAS software developed by the National Library of Australia.
Rights to use individual sites to be cleared with rights-holders.
UK WEB ARCHIVING CONSORTIUM contd.
Procurement exercise in progress to recruit a supplier to host the service.
Intention to let the contract in April 2004 and to be operational in summer 2004.
Sites to be made accessible to users.
Each partner to collect up to 500 sites per year, i.e. up to 6,000 across the six partners over the two-year project.
INTERNATIONAL INTERNET PRESERVATION CONSORTIUM
Project involving national libraries.
Led by the Bibliothèque nationale de France.
Also includes the BL, Library of Congress, Library and Archives Canada, the Nordic countries, Italy, Australia and the Internet Archive.
INTERNATIONAL INTERNET PRESERVATION CONSORTIUM contd.
Aims to develop an automated web-crawler mechanism.
Open-source tools to search the web at regular intervals, matching agreed collection development policies.
Working groups on: access tools; content management; deep web; framework; metrics and test-beds; researcher requirements.
Still at a developmental stage.
INTERNET ARCHIVE
Collecting and saving sites since 1996.
Wayback Machine.
Legal, technical and procurement issues.
SOME CHALLENGES
Defining ‘UK’.
Rapid technological change.
Third-party rights (not always subject to UK law).
Libel/defamation issues.
Software issues: which platform?
Validity of a snapshot.
SOME CHALLENGES contd.
Formats for archiving.
Metadata standards.
Archiving ‘look and feel’.
Authenticity.