(DOM) Programme - BCS North London Branch

British Computer Society
North London Branch

Major Programmes

Richard Boulderstone
July 27, 2004

Agenda

• The British Library
• Vision
• Our Audiences/Customers
• ILS
• Digitisation
• Digital Object Management
• Web Archiving
• Collaboration
• Conclusions

Magna Carta

www.bl.uk


What Is The British Library?

Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of
Science and Invention (1855), National Central Library (1916), and National
Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India
Office Library and Records in 1982, and British Institute of Recorded Sound in
1983
• Flagship building at St Pancras - largest public building project in Great
Britain in 20th century - opened in 1998



World-Class Research Library
Key Statistics 2002/3

• 150 million items
• 8.2 million items consulted or supplied
• 408,000 reading room visits
• 618,000 catalogue records created
• 554,000 items received on legal deposit
• 651 km shelf capacity, 92% full, adding 12 km each year
• 18.5M Web Site Hits (www.bl.uk)
• 2,400 staff
• £85.2 million Grant in Aid and £27.0 million trading income in 2001/2
• Annual report - http://www.bl.uk/about/annual/latest.html


‘The World’s Knowledge’
Outcome Based Vision

To help people advance knowledge to enrich lives
… by aiding scientific advances
… by adding commercial value for businesses
… by contributing to UK “knowledge economy”
… through the pursuit of academic excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural wellbeing
… by giving information relevant to their interests
… by helping to find the next medical breakthrough
… by creating a link between the past, present and future


[Diagram: Our Audiences/Customers]
Sectors: BUSINESS, EDUCATION, PUBLIC, RESEARCHER, LIBRARIES
Audience labels: High R+D Industries, Prof. Services, Creative Industries, Publishing Industries, SMEs, School Libraries, Students, Teachers, 11>18, Visitors (child + adult), Lifelong Learner, Postgraduate/Undergraduate, Librarians, Scholars, Commercial Researcher, Public Libraries, H.E. Libraries, Broadcasting e.g. BBC, Publishing e.g. OED, Public
Service labels: Resource Discovery, Bespoke Services, Research Services, Document Supply, Reprographics, Innovation Centre, On-site Visits, School Tours, Web Learning, Exhibitions, Events, Tours, Publishing, Reading Rooms, Searching Tools, Training, Best Practice


Major Programmes/1
Da Vinci Notebook

Integrated Library System (ILS) Programme

ILS: Development

Data migration
• Due to finish in a few days
  • 16M+ BL records
  • 10M+ records from other sources

Online ILS software
• All online changes made (mainly interfaces) – final tests
• Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
• Most done for go-live
• Rest in priority order

ILS: Implementation

Training
• Courses to end-users well underway
• ‘Practice’ system available
• ‘Search only’ training also underway
Testing
• Functional testing (end to end) nearly complete
• Performance poor – OPAC very slow
  • Automated stress testing (LoadRunner scripts)
  • eIS trying to find area of problem
  • Ex Libris experts flying over
• Some security ‘hardening’ needed

ILS: Cutover from legacy systems

Now: Temporary Aleph cataloguing

7 June: Phase 1 – internal processing
• Staggered take-on of users to ease cutover problems
• Merge ‘temporary’ records

30 June: Phase 2 – reading rooms
• Reading rooms closed for cutover 26-29 June
• Mainly brand-new PCs etc rather than XP upgrade

30 July: Phase 3 – remote users
• Could be delayed if there are major problems


Future ILS development (ILS/2)

Current ILS development seen just as the start

Extra records
• E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
• E.g. Preservation records
Links to other new BL systems
• E.g. Digital Object Management (images, web pages etc)

New releases of Ex Libris packages


Major Programmes/2

International Dunhuang Project

Digitisation Programme


Background

• Digitisation Is The Process Of Converting Existing Physical Items Into Digital Surrogates.
• Digitisation Projects Must Take Into Account Metadata Creation, Optical Character Recognition, Navigation, Display, Archiving, Preservation.
• Overall Cost Of Digitisation Projects Is Declining, From £10s/Image To Sub-£1/Image.
• Automatic Devices To Digitise Are Becoming Available But Are Expensive (~£150,000).
• BL Has Had A Fairly Ad Hoc Approach Driven By
  • External Funding Opportunities
  • Curator Interest
• Projects Have Generally Created Their Own Approach, IT Resources, Project Management
• BL Has Created About 1.5M Digital Images So Far…


Digitisation Strategy

• Digitisation Strategy Project Was Formally Initiated On February 2, 2004
• Key objectives for the project are to define:
  • Selection Criteria
  • Uniform Approach
  • Communications Plan
  • Sustainability
  • Intellectual Property Rights
  • External Relationship Management
  • Funding
  • Integration with DOMS


Project Status Information

• Definitive Register of Projects
  • 19 Complete
  • 19 Current
  • 20 Planning
• JISC Sound (3,900 Hours)
• JISC Newspapers (2M Pages of 750M Pages)
• Chopin (Collaborative Project)
• Early English Books Online

Major Programmes/3

Gutenberg Bible

Digital Object Management (DOM) Programme

DOM Programme vision
Our mission is to enable the United Kingdom to preserve and use its digital intellectual property forever

Our vision is to create a management system for digital objects that will
• store and preserve any type of digital material in perpetuity
• provide access to this material to users with appropriate permissions
• ensure that the material is easy to find
• ensure that users can view the material with contemporary applications
• ensure that users can, where possible, experience material with the original look-and-feel


Introduction - history

Digital Library PFI
• Mar 1997 – Dec 1998
Digital Library System
• 1999 – early 2002
• Lessons
DOM Report
• Nov 2002
The DOM Programme
• Started September 2003


Drivers for the BL DOM Programme

• Legal deposit legislation for non-print material was granted royal assent in October 2003
• Existing voluntary deposit scheme operational since 2000
• Storage of digitised masters from early ’90s onwards
• New digitisation initiatives: newspapers, sound, etc
• Sound archive receives 12 TB of material per year (with 50 year collection)
• Web archiving
• Cartography and datasets
• Electronic journals, picture library
• … and ….
• …. and ….

We need a generic and cost-effective approach for the secure long term
storage of digital material that is produced by numerous initiatives


DOM – many topics to address
[Diagram: DOM components plotted by ESTIMATED SIZE OF COMPONENT (LOW to HIGH) against COMPONENT AMBIGUITY / COMPLEXITY (LOW to HIGH)]
Components shown: LDEP; Workflow; Resource Discovery; LDLSE; VDEP; Rights Management; Metadata Definition; Technical Requirements; File Format Registry; Persistent Identifiers; File Conversion Utilities; Authentication; Prototypes; SDM; RADM; Interfaces; Strategy Development
Other programmes shown: ILS; Web Archiving; Digitisation Programme
Legend: Started; Planned; Non-DOM projects; Planned co-operation

LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications


Scope - life cycle of objects


• Collection
  • Selection
  • Acquisition
  • Accession
  • Description
• Preservation
  • Storage
  • Preservation
• Access
  • Resource discovery
  • Delivery
  • Rendering

Scope – objects and processes


• Preservation store
  • Preserves the bit stream in perpetuity
• Access store
  • Access versions
  • Limited formats – in the flavour of the era
• Metadata to support resource discovery
  • Descriptive, Administrative, Links with existing tools e.g. Integrated Library System (ILS)
• Workflow
  • Ingest, e.g. Legal Deposit processing


[Diagram: DOM in context, from ingest sources through DOM to access]
ACCESS: Resource Discovery; Delivery; Digital Rights Management
DOM: Ingest; DOM Storage; Shared services (Signing, Authentication, Metadata, Persistent ID)
Ingest sources: DONATIONS; DOCUMENT SUPPLY (Publishers, Archives, Grey Literature; Non-Serial Store, Archiving, Operational Stores); LEGAL DEPOSIT (LDL Secure Environment, Legal Deposit Items, Legal Deposit Processing); WEB ARCHIVING; DIGITISATION (St Pancras Studios, Newspapers, NSA)

Timeline

[Diagram: release timeline across 2003, 2004 and 2005]
• Definition, R0: prototype will provide a basic preservation-quality digital object storage module
• BC: ET approve Business Case & Timeline
• Operational Storage Sub System, R1:
  • Consolidate R0 into operational system
  • Provide preservation-quality digital store for materials received under Voluntary Deposit of Electronic Publications (VDEP)
  • Integrate it with the existing VDEP front-end
• 1st Content Stream ingest, R2:
  • Support ingest for a major content stream
  • Integrate with core Library systems as required
• LDEP - initial format, R3 & R4:
  • Provide functionality for material covered by LDEP secondary legislation
• Open DOM to new projects, R5+

DOM: Project definition - 1
• Functional architecture “what”
  • Prototyping – assessing market solutions
  • Example issues: digital rights, file formats, etc
• Logical architecture “how – overall architecture”
  • Prototyping – basic functioning architecture
  • Example issues: allow changes to new suppliers, relationships to ILS, other projects etc
• Physical architecture “how – storage & specifics”
  • Prototyping – principal solutions and options
  • Example issues: how do we build it cost-effectively today, supplier selection criteria
• Business case
• Planning – incremental implementation phases

Cross team workshops – reviewing progress, debating detailed technical issues, planning immediate priorities, risk management & way forward

DOM: Project definition - 2



• Approach is to be incremental and not ‘Big Bang’
• We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a good solution
• A principal goal is to define:
  • An overall long term “logical architecture”
  • Within which, there will be successive generations of physical architectures
• We are building an understanding of the storage marketplace, and we will use that knowledge to manage procurement
• We are certain that we will need >500 TB of storage but we are uncertain when – we thus need flexible, scalable procurement




DOM architecture - overview
[Diagram: clients pass a DOMID to the DOM Storage Service and receive an OBJECT]
Clients: Resource Discovery (ILS; Non-cat based RD); Rights Management; LDEP; Doc supply; Others
DOM Storage Service: Compound objects/relations; Atomic Objects; Integrity; Authenticity; Unique persistent identifier (DOMID); Local resource locator
DOMID is mapped to node/vol/LRL
DOM Physical Storage


DOM System (release 3)

[Diagram: DOM System, comprising Mailroom, Access, Shared services, Administration and two Storage subsystems, connected to Aleph and to Publishers]

DOM logical architecture – integrity and authenticity

• Integrity:
  • System has capability to continuously monitor the object store to detect object corruption
  • It would then initiate object recovery
• Authenticity:
  • A process is defined to provide long-term assurance that an object that is re-presented is as it was when it was ingested
  • Based on the use of cryptographic signing techniques
    • Each object is signed when it is ingested
    • The signature is verified when required
    • The signing mechanism is “tightly” controlled
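
The sign-at-ingest, verify-on-demand and integrity-sweep steps above can be sketched as follows. This is only an illustration, not the DOM design: HMAC-SHA256 from the Python standard library stands in for whichever signing technique the programme adopts, and `SIGNING_KEY`, `sign_object`, `verify_object`, `scan_store` and the in-memory `store` are all hypothetical names.

```python
import hashlib
import hmac

# Hypothetical key; a real signing mechanism would be tightly controlled.
SIGNING_KEY = b"example-key-held-by-signing-service"


def sign_object(data: bytes) -> str:
    """Return a keyed signature computed when the object is ingested."""
    return hmac.new(SIGNING_KEY, data, hashlib.sha256).hexdigest()


def verify_object(data: bytes, signature: str) -> bool:
    """Check that the object is still exactly as it was at ingest."""
    return hmac.compare_digest(sign_object(data), signature)


def scan_store(store: dict) -> list:
    """Integrity sweep: return the IDs of objects whose bytes no longer verify."""
    return [oid for oid, (data, sig) in store.items() if not verify_object(data, sig)]


# Ingest two objects, signing each one as it arrives.
store = {}
for oid, data in {"obj-1": b"first object", "obj-2": b"second object"}.items():
    store[oid] = (data, sign_object(data))

# Simulate corruption of one stored object, then run the sweep.
store["obj-2"] = (b"second objecT", store["obj-2"][1])
corrupted = scan_store(store)
```

In this sketch the sweep only reports corrupted objects; recovery would then fetch a good copy from elsewhere, which is outside what the snippet shows.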


Procuring physical storage in volume

• A major cost is in physical storage
• The market for storage systems is changing rapidly, and this implies that “lock-in” is not sensible
• We thus need flexibility to change supplier over time
• Cost of storage is reducing by 30-40% per year
• Hence procure on rolling basis just ahead of demand
• Replace storage on a rolling basis on expiry of warranty
• The rolling programmes imply the need to be able to support a heterogeneous product solution
• The design of the logical architecture thus supports storage sourced from multiple storage vendors
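
The case for buying just ahead of demand can be illustrated with some simple arithmetic. The figures below are assumptions chosen only for illustration (100 TB needed per year for five years, a starting price of 1,000 units per TB, and a 35% annual price decline, the midpoint of the 30-40% range); the function names are hypothetical, and warranty-driven replacement is ignored.

```python
def upfront_cost(tb_per_year: float, years: int, price_per_tb: float) -> float:
    """Cost of buying all the capacity in year 0 at today's price."""
    return tb_per_year * years * price_per_tb


def rolling_cost(tb_per_year: float, years: int, price_per_tb: float,
                 annual_decline: float) -> float:
    """Cost of buying each year's capacity at that year's (lower) price."""
    total = 0.0
    for year in range(years):
        total += tb_per_year * price_per_tb * (1 - annual_decline) ** year
    return total


# Illustrative assumptions, not figures from the programme.
upfront = upfront_cost(100, 5, 1000.0)
rolling = rolling_cost(100, 5, 1000.0, 0.35)
saving = 1 - rolling / upfront
```

Under these assumptions the rolling purchases cost roughly half of the up-front purchase, which is the effect the procurement approach relies on.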


Disaster tolerance and the organisation of storage clusters

• One can obtain commercial disaster recovery (DR) solutions for common equipment configurations
• However, one cannot obtain such solutions for systems holding several hundred terabytes
• So we must build DR into the design of the system
• A single-site solution, subject to a common-mode disaster, would suffer considerable loss of availability after a disaster, and so is not acceptable
• This implies that we need a multi-site solution
• Conventionally these are based on a master-standby arrangement where only 50% of kit is delivering normal service
• Our design is based on the use of multiple autonomous independent peer clusters that cross-synchronise, so 100% of the kit delivers normal service


DOM architecture in the context of the storage solution market

• The dominant segment of the market focuses on delivering performance within a highly resilient single cluster
• However:
  • Many of our objects will be rarely accessed
    • so we do not want to pay for “maximised” performance we do not need
  • We have resilience by using multiple clusters, hence we have a reduced need for resilience within a cluster
    • so we do not want to pay for “maximised” resilience we do not need
• We are using these drivers to design a cost-effective large scale resilient solution


DOM storage subsystem architecture - overview

[Diagram: two storage clusters, each made up of a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, both using DOM Shared Services]
DOM Shared Services
• Unique ID
• Signing
• Logging


DOM storage subsystem architecture - access

[Diagram: two storage clusters, each made up of a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, with DOM Shared Services (Unique ID, Signing, Logging)]
Normal access/delivery is from local storage cluster


DOM storage subsystem architecture - access

[Diagram: two storage clusters, each made up of a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, with DOM Shared Services (Unique ID, Signing, Logging)]
When a cluster is off-line then access/delivery is from a remote storage cluster


DOM storage subsystem architecture - ingest

[Diagram: two storage clusters, each made up of a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, with DOM Shared Services (Unique ID, Signing, Logging). Arrows: Store; Synchronise remote store]
Normal ingest is to the local storage cluster and then the remote cluster is synchronised


DOM storage subsystem architecture - ingest

[Diagram: two storage clusters, each made up of a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, with DOM Shared Services (Unique ID, Signing, Logging). Arrows: Store; Synchronise remote store later]
When a cluster is off-line then ingest is managed by the remote storage cluster and the local cluster is synchronised later
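
The access and ingest behaviour described on the storage subsystem slides can be sketched as a small model. This is a toy illustration under stated assumptions, not the DOM implementation: the classes `StorageCluster` and `StorageGateway`, the `resynchronise` helper and the object IDs are hypothetical, and objects are held in memory rather than on physical storage.

```python
class StorageCluster:
    """One autonomous peer cluster (hypothetical in-memory model)."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.objects = {}   # object ID -> bytes
        self.pending = []   # IDs still to be copied to the off-line peer


class StorageGateway:
    """Routes requests to the local cluster, falling back to the remote peer."""

    def __init__(self, local, remote):
        self.local, self.remote = local, remote

    def ingest(self, domid, data):
        # Prefer the local cluster; use the remote one if local is off-line.
        if self.local.online:
            primary, peer = self.local, self.remote
        else:
            primary, peer = self.remote, self.local
        if not primary.online:
            raise RuntimeError("no storage cluster available")
        primary.objects[domid] = data
        if peer.online:
            peer.objects[domid] = data      # synchronise the peer straight away
        else:
            primary.pending.append(domid)   # synchronise the peer later
        return primary.name

    def access(self, domid):
        # Serve from the local cluster when possible, otherwise from the peer.
        for cluster in (self.local, self.remote):
            if cluster.online and domid in cluster.objects:
                return cluster.name, cluster.objects[domid]
        raise KeyError(domid)


def resynchronise(source, target):
    """Copy objects that were ingested while the target was off-line."""
    for domid in source.pending:
        target.objects[domid] = source.objects[domid]
    source.pending.clear()


a, b = StorageCluster("A"), StorageCluster("B")
gateway = StorageGateway(local=a, remote=b)
first = gateway.ingest("dom-1", b"x")       # normal ingest: local, then peer
a.online = False
second = gateway.ingest("dom-2", b"y")      # local off-line: remote takes over
served_by, _ = gateway.access("dom-1")      # delivered from the remote cluster
a.online = True
resynchronise(b, a)                         # local cluster catches up
```

Both clusters serve requests in normal operation, which is the point of the peer design as opposed to a master-standby pair.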


In conclusion



• We plan for generations of physical storage
  • Migration from one generation to the next
  • Allow changes of supplier
  • Purchase incrementally in modest quantities
  • Move quickly when required
  • Be cost conscious
• We provide assurance that an object is held and re-presented as when it was ingested
• We are designing a cost-effective large scale resilient solution

In summary: we take a long term view


Major Programmes/4

Web Archiving Programme


Structure of Programme

Web Archiving Programme is a collaborative initiative, implemented broadly across two consortia
• UK Web Archiving Consortium
  • Developing a selective approach to web archiving, procuring a common web archiving infrastructure and software to begin archiving activities at the earliest opportunity
• International Internet Preservation Consortium
  • Developing advanced web archiving technologies for the long-term, large-scale, continuous crawling requirements enabled through legislation


UK Web Archiving Consortium
Developing a selective approach to web archiving
• License for PANDAS about to be signed with NLA
  • Sub-licenses with consortium partners and contractor to follow
• ITT concluded with Magus Research winning the contract.
  • Implement a common web archiving infrastructure (lots of Linux machines + PANDAS)
  • Provide customisation/development of PANDAS
  • Provide help desk and support


International Internet Preservation Consortium
Developing advanced web archiving technologies
• Smart Crawler
  • Continuous adaptive crawler, adjusting crawl priority on the fly
  • Based on IA Heritrix
  • Working on requirements now
  • Expect to begin tender process in June
• Content Management
• Archival formats
• Framework
• Metrics and Test Bed
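
The idea of adjusting crawl priority on the fly can be illustrated with a re-prioritisable queue of URLs. This is a generic sketch using Python's standard `heapq` module; it is not Heritrix code and does not reflect the Smart Crawler requirements, which were still being written. The class `CrawlFrontier`, its methods and the example URLs are hypothetical.

```python
import heapq
import itertools


class CrawlFrontier:
    """Queue of URLs whose priority can be changed while the crawl is running."""

    def __init__(self):
        self._heap = []
        self._entries = {}                 # url -> its live heap entry
        self._counter = itertools.count()  # tie-breaker keeps insertion order

    def set_priority(self, url, priority):
        """Add a URL or change its priority; a lower value is crawled sooner."""
        if url in self._entries:
            self._entries[url][2] = None   # mark the old entry as stale
        entry = [priority, next(self._counter), url]
        self._entries[url] = entry
        heapq.heappush(self._heap, entry)

    def pop(self):
        """Return the next URL to crawl, skipping stale entries."""
        while self._heap:
            _, _, url = heapq.heappop(self._heap)
            if url is not None:
                del self._entries[url]
                return url
        raise IndexError("frontier is empty")


frontier = CrawlFrontier()
frontier.set_priority("http://example.org/news", 5)
frontier.set_priority("http://example.org/about", 1)
# The news page is observed to change often, so it is promoted mid-crawl.
frontier.set_priority("http://example.org/news", 0)
order = [frontier.pop(), frontier.pop()]
```

Marking the superseded entry as stale, rather than searching the heap to remove it, keeps each priority change cheap; how priorities are actually derived is left open here.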


External Collaboration


Digital Library Collaborations/Partnerships
Current

• UK Digital Preservation Coalition
  • Founder Member
• TEL (The European Library Project)
• Web Archiving UK
  • JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library Of Wales
• International Internet Preservation Consortium
  • BNF, Library Of Congress, Internet Archive, National Archives & Library Of Canada, National Library Of Australia, National Library Of Italy, National Libraries Of Nordic Countries
• JISC Funded - Digital Curation Centre
• Persistent Identifiers
  • DOI Foundation, European National Libraries (KB & DDB)
• Resource Discovery
  • Union Catalogues (SUNCAT)
• Digital Library Federation


Digital Library Collaborations/Partnerships
Potential

• Secure Legal Deposit Network
  • 6 Legal Deposit Libraries
• Global Digital Format Registry
  • Potential Partners (National Archives, DLF)
• Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
  • KB (Netherlands National Library) & Other Partners – FP6 Bid
• Digital Rights Management
  • Potential Partners (Publishers, JISC)
• Metadata
  • Publishers, Others?
• Authentication
  • JISC?
• Resource Discovery
  • Search Engine Vendors, Researchers
• Others?


Conclusions
• Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success

Can You Work With Us?



Slide 2

British Computer Society
North London Branch

Major Programmes

Richard Boulderstone
July 27, 2004
1

Agenda











The British Library
Vision
Our Audiences/Customers
ILS
Digitisation
Digital Object Management
Web Archiving
Collaboration
Conclusions

Magna Carta

www.bl.uk

2

What Is The British Library ?

Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of
Science and Invention (1855), National Central Library (1916), and National
Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India
Office Library and Records in 1982, and British Institute of Recorded Sound in
1983
• Flagship building at St Pancras - largest public building project in Great
Britain in 20th century - opened in 1998


www.bl.uk

3

World-Class Research Library
Key Statistics 2002/3













150 million items
8.2 million items consulted or supplied
408,000 reading room visits
618,000 catalogue records created
554,000 items received on legal deposit
651 km shelf capacity 92% full add 12 km each year
18.5M Web Site Hits (www.bl.uk)
2,400 staff
£85.2 million Grant in Aid and £27.0 million trading income in 2001/2
Annual report - http://www.bl.uk/about/annual/latest.html

www.bl.uk

4

‘The World’s Knowledge’
Outcome Based Vision
… by aiding scientific advances
… by adding commercial value for
businesses
… by contributing to UK “knowledge
economy”
… through the pursuit of academic
excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural
wellbeing
… by giving information relevant to their
interests
… by helping to find the next medical
breakthrough
… by creating a link between the past,
present and future

To help people
advance
knowledge to
enrich lives

www.bl.uk

5

High
R+D
Industries

Prof.
Services

Creative
Industries Publishing
Industries

Resource Discovery
Bespoke Services
Research Services
Document Supply
Reprographics
Innovation Centre

School
Students
Libraries Teachers 11>18
On-site Visits
School Tours
Web Learning

SMEs
EDUCATION

Lifelong
Learner

Exhibitions
Events
Tours
Publishing

Visitors
(child + adult)
Lifelong
Learner

PUBLIC

BUSINESS
Lifelong
Learner

Postgraduate/
Undergraduate
RESEARCHER

Librarians
Scholars

Lifelong
Learner

Reading Rooms
Bespoke Services
Reprographics
Publishing
Document Supply
Searching Tools

LIBRARIES

Commercial
Researcher

Public
Libraries

Broadcasting
e.g. BBC

Publishing
e.g. OED

Document Supply
Resource Discovery
Training
Best Practice

H.E.
Libraries

Public

www.bl.uk

6
2

Major Programmes/1
Da Vinci Notebook

Integrated Library System (ILS)
Programme

7

ILS: Development

Data migration
 Due to finish in a few days



16M+ BL records
10M+ records from other sources

Online ILS software
 All online changes made (mainly interfaces) – final tests
 Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
 Most ones done for go live
 Rest in priority order
www.bl.uk

8

ILS: Implementation

Training
 Courses to end-users well underway
 ‘Practice’ system available
 ‘Search only’ training also underway
Testing
 Functional testing (end to end) nearly complete
 Performance poor – OPAC very slow
 Automated stress testing (LoadRunner scripts)
 eIS trying to find area of problem
 Ex Libris experts flying over
 Some security ‘hardening’ needed
www.bl.uk

9

ILS: Cutover from legacy systems

Now:

Temporary Aleph cataloguing

Phase 1 – internal processing
 Staggered take-on of users to ease cutover problems
 Merge ‘temporary’ records

7 June:

Phase 2 – reading rooms
 Reading rooms closed for cutover 26-29 June
 Mainly brand-new PCs etc rather than XP upgrade

30 June:

Phase 3 – remote users
 Could be delayed major problems

30 July:

www.bl.uk

10

www.bl.uk

11

Future ILS development (ILS/2)

Current ILS development seen just as the start

Extra records
 E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
 E.g. Preservation records
Links to other new BL systems
 E.g. Digital Object Management (images, web pages etc)

New releases of Ex Libris packages

www.bl.uk

12

Major Programmes/2

International Dunhuang Project

Digitisation Programme

13

Background










Digitisation Is The Process Of Converting Existing Physical Items Into Digital
Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character
Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image to Sub £1/Image.
Automatic Devices To Digitise Are Becoming Availalble But Are Expensive ~£150,000.
BL Has Had Fairly Ad Hoc Approach Driven By
 External Funding Opportunities
 Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project
Management
BL Has Created About 1.5M Digital Images So Far…

www.bl.uk

14

Digitisation Strategy



Digitisation Strategy Project Was Formerly Initiated On February 2, 2004


Key objectives for the project are to define:










Selection Criteria
Uniform Approach
Communications Plan
Sustainability
Intellectual Property Rights
External Relationship Management
Funding
Integration with DOMS

www.bl.uk

15

Project Status Information


Definitive Register of Projects




19 Complete
19 Current
20 Planning



JISC Sound (3,900 Hours)



JISC Newspapers (2M Pages of 750M Pages)



Chopin (Collaborative Project)



Early English Books Online
www.bl.uk

16

Major Programmes/3

Gutenberg Bible

Digital Object Management (DOM)
Programme

17

DOM Programme vision
Our mission is to enable the United Kingdom to preserve
and use its digital intellectual property forever

Our vision is create a management system for digital objects
that will
 store and preserve any type of digital material in perpetuity
 provide access to this material to users with appropriate permissions
 ensure that the material is easy to find
 ensure that users can view the material with contemporary applications
 ensure that users can, where possible, experience material with the original look-and-feel

www.bl.uk

18

Introduction - history

Digital Library PFI
 Mar 1997 – Dec 1998
Digital Library System
 1999 – early 2002
 Lessons
DOM Report
 Nov 2002
The DOM Programme
 Started September 2003

www.bl.uk

19

Drivers for the BL DOM Programme












Legal deposit legislation for non-print material was granted royal assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12T of material per year (with 50 year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….

We need a generic and cost-effective approach for the secure long term
storage of digital material that is produced by numerous initiatives

www.bl.uk

20

DOM – many topics to address
HIGH

ILS
WEB
ARCHIVING

DIGITISATION
PROGRAMME

ESTIMATED SIZE OF COMPONENT

LDEP

WORKFLOW
RESOURCE
DISCOVERY
LDLSE

VDEP

RIGHTS
MANAGEMENT
METADATA
DEFINITION

TECHNICAL
REQUIREMENTS

FILE
FORMAT
REGISTRY

PERSISTENT
IDENTIFIERS

FILE
CONVERSION
UTILITIES

AUTHENTICATION
PROTOTYPES

SDM
RADM

INTERFACES

STRATEGY
DEVELOPMENT

LOW
LOW

COMPONENT AMBIGUITY / COMPLEXITY

Started

Planned

Non-DOM projects

Planned co-operation

HIGH
LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications

www.bl.uk

21

Scope - life cycle of objects


Collection
 Selection
 Acquisition
 Accession
 Description
 Preservation
 Storage
 Preservation
 Access
 Resource discovery
 Delivery
 Rendering
www.bl.uk

22

Scope – objects and processes


Preservation store
 Preserves the bit stream in perpetuity
 Access store
 Access versions
 Limited formats – in the flavour of the era
 Metadata to support resource discovery
 Descriptive, Administrative, Links with existing tools e.g. Integrated
Library System (ILS)
 Workflow
 Ingest, e.g. Legal Deposit processing

www.bl.uk

23

ACCESS

DOM

Resource Discovery

Delivery

Digital Rights Management

Shared services

DOM
Storage

Signing
Authentication
Metadata
Persistent ID

Ingest

DONATIONS

DOCUMENT SUPPLY
Publishers
Archives
Grey Literature

Non-Serial
Store
Archiving
Operational
Stores

LEGAL DEPOSIT

LDL Secure
Environment

Legal Deposit
Items
Legal Deposit
Processing

WEB
ARCHIVING

DIGITISATION
St Pancras
Studios

Newspapers

NSA
www.bl.uk

24

Timeline

Prototype will provide a
basic preservationquality digital object
storage module

Definition. R0

ET approve
Business Case
& Timeline

•Consolidate R0 into operational
system
•Provide preservation-quality digital
store for materials received under
Voluntary Deposit of Electronic
Publications (VDEP)
•Integrate it with the existing VDEP
front-end

BC
R0
Operat’l Storage
Sub System. R1

•Support ingest for a
major content stream
•Integrate with core
Library systems as
required

R1

1st Content
Stream ingest. R2

R2

LDEP - initial
format. R3 & R4

Provide functionality
for material covered by
LDEP secondary
legislation

R3
R4

Open DOM to new
projects. R5+

2003

2004

2005
www.bl.uk

25

DOM: Project definition - 1
Example issues
Functional Architecture “What”

digital rights, file
formats, etc

Prototyping – assessing market solutions

allow changes to
new suppliers,
relationships to ILS,
other projects etc

Logical architecture “how – overall architecture”
Prototyping - basic functioning architecture

how do we build it
cost-effectively
today, supplier
selection criteria

Physical architecture “how – storage & specifics”
Prototyping - principal solutions and options

• Business case
• Planning – incremental
implementation phases

Cross team workshops – reviewing progress, debating detailed technical
issues, planning immediate priorities, risk management & way forward
www.bl.uk

26

DOM: Project definition - 2



Approach is to be incremental and not ‘Big Bang’
We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a
good solution



A principal goal is to define:
 An overall long term “logical architecture”
 Within which, there will be successive generations of physical architectures



We are understanding the storage marketplace, and we will use the knowledge to manage
procurement
We are certain that we will need >500T of storage but we are uncertain when – we thus need
flexible scalable procurement



www.bl.uk

27

DOM architecture - overview
Resource Discovery
Non-cat
ILS
based RD

Others

Rights Management
LDEP

DOMID

OBJECT

DOM Storage Service

Compound
objects/relations
Integrity

Authenticity

Local resource locator

Object

Atomic
Objects

Unique persistent
identifier (DOMID)

Doc
supply

DOMID is
mapped to
node/vol/LRL

DOM Physical Storage

www.bl.uk

28

DOM System (release 3)

Aleph

Storage subsystem
Storage
subsystem

Access

Mailroom

Publishers

Shared services

Administration
DOM System
www.bl.uk

29

DOM logical architecture –
integrity and authenticity


Integrity:
 System has capability to continuously monitor the object store to detect
object corruption
 It would then initiate object recovery
 Authenticity:
 A process is defined to provide long-term assurance that an object that
is re-presented is as it was when it was ingested
 Based on the use of cryptographic signing techniques
 Each object is signed when it is ingested
 The signature is verified when required
 The signing mechanism is “tightly” controlled

www.bl.uk

30

Procuring physical storage in volume









A major cost is in physical storage
The market for storage systems is changing rapidly, and this implies that “lock-in” is
not sensible
We thus need flexibility to change supplier over time
Cost of storage is reducing by 30-40% per year
Hence procure on rolling basis just ahead of demand
Replace storage on a rolling basis on expiry of warranty
The rolling programmes imply the need to be able to support a heterogeneous
product solution
The design of the logical architecture thus supports storage sourced from multiple
storage vendors

www.bl.uk

31

Disaster tolerance and the organisation of storage clusters

• One can obtain commercial disaster recovery (DR) solutions for common equipment configurations
• However, one cannot obtain such solutions for multi-100 TB systems
• So we must build DR into the design of the system
• A single-site solution, subject to a common-mode disaster, would suffer considerable loss of availability after a disaster, and so is not acceptable
• This implies that we need a multi-site solution
• Conventionally these are based on a master-standby arrangement, where only 50% of the kit is delivering normal service
• Our design is based on the use of multiple autonomous, independent peer clusters that cross-synchronise, so 100% of the kit delivers normal service


DOM architecture in the context of the storage solution market

The dominant segment of the market focuses on delivering performance within a highly resilient single cluster

However:
• Many of our objects will be rarely accessed
  • so we do not want to pay for “maximised” performance we do not need
• We have resilience by using multiple clusters, hence we have a reduced need for resilience within a cluster
  • so we do not want to pay for “maximised” resilience we do not need
• We are using these drivers to design a cost-effective, large-scale, resilient solution


DOM storage subsystem architecture - overview

[Diagram: two storage clusters, each shown with a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage; DOM Shared Services provide Unique ID, Signing and Logging]


DOM storage subsystem architecture - access

Normal access/delivery is from the local storage cluster

[Diagram: two storage clusters, each shown with a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage; central DOM Shared Services provide Unique ID, Signing and Logging]


DOM storage subsystem architecture - access

When a cluster is off-line, access/delivery is from a remote storage cluster

[Diagram: two storage clusters, each shown with a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage; central DOM Shared Services provide Unique ID, Signing and Logging]
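
The two access cases (local when possible, remote when the local cluster is off-line) can be sketched as a simple routing function. This is a minimal Python illustration with hypothetical class and function names; it is not the DOM gateway's actual interface.

```python
from typing import Dict, List, Tuple


class Cluster:
    """In-memory stand-in for one storage cluster."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.online = True
        self.objects: Dict[str, bytes] = {}


def fetch(domid: str, local: Cluster, remotes: List[Cluster]) -> Tuple[str, bytes]:
    """Return (serving cluster name, data), preferring the local cluster."""
    for cluster in [local, *remotes]:
        # Skip clusters that are off-line or do not yet hold the object.
        if cluster.online and domid in cluster.objects:
            return cluster.name, cluster.objects[domid]
    raise LookupError(f"{domid} is not available from any online cluster")
```

Because every peer cluster holds a synchronised copy, the caller does not need to know which cluster answered; the returned name is included here only to make the fallback visible.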


DOM storage subsystem architecture - ingest

Normal ingest is to the local storage cluster, and then the remote cluster is synchronised

[Diagram: two storage clusters, each shown with a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage; central DOM Shared Services provide Unique ID, Signing and Logging. Labels: “Store”, “Synchronise remote store”]


DOM storage subsystem architecture - ingest

When a cluster is off-line, ingest is managed by the remote storage cluster and the local cluster is synchronised later

[Diagram: two storage clusters, each shown with a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage; central DOM Shared Services provide Unique ID, Signing and Logging. Labels: “Store”, “Synchronise remote store later”]
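
The two ingest cases can be sketched in the same spirit: store on whichever cluster is available, copy to the peer immediately if it is online, and otherwise queue the write so the peer can catch up later. The names below are hypothetical, and the backlog is kept on the off-line cluster object purely to keep the sketch short; a real system would hold it on a surviving peer.

```python
from collections import deque
from typing import Deque, Dict, Tuple


class Cluster:
    """In-memory stand-in for one storage cluster."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.online = True
        self.objects: Dict[str, bytes] = {}
        # Writes this cluster missed while it was off-line.
        self.backlog: Deque[Tuple[str, bytes]] = deque()


def ingest(domid: str, data: bytes, local: Cluster, remote: Cluster) -> str:
    """Store an object and return the name of the cluster that accepted it."""
    primary, secondary = (local, remote) if local.online else (remote, local)
    if not primary.online:
        raise RuntimeError("no cluster is online to accept the ingest")
    primary.objects[domid] = data
    if secondary.online:
        secondary.objects[domid] = data          # synchronise straight away
    else:
        secondary.backlog.append((domid, data))  # synchronise later
    return primary.name


def resynchronise(cluster: Cluster) -> int:
    """Bring a cluster back online and apply the writes it missed."""
    cluster.online = True
    applied = 0
    while cluster.backlog:
        domid, data = cluster.backlog.popleft()
        cluster.objects[domid] = data
        applied += 1
    return applied
```

In this sketch synchronisation is a plain copy; in practice it would also involve the shared signing and logging services shown in the diagrams.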


In conclusion

• We plan for generations of physical storage
  • Migration from one generation to the next
  • Allow changes of supplier
  • Purchase incrementally in modest quantities
  • Move quickly when required
  • Be cost conscious
• We provide assurance that an object is held and re-presented as it was when it was ingested
• We are designing a cost-effective, large-scale, resilient solution

In summary: we take a long-term view


Major Programmes/4

Web Archiving Programme


Structure of Programme

The Web Archiving Programme is a collaborative initiative, implemented broadly across two consortia:
• UK Web Archiving Consortium
  • Developing a selective approach to web archiving, and procuring a common web archiving infrastructure and software so that archiving can begin at the earliest opportunity
• International Internet Preservation Consortium
  • Developing advanced web archiving technologies for the long-term, large-scale, continuous crawling requirements enabled through legislation


UK Web Archiving Consortium
Developing a selective approach to web archiving
• Licence for PANDAS about to be signed with NLA
• Sub-licences with consortium partners and contractor to follow
• ITT concluded with Magus Research winning the contract
• Implement a common web archiving infrastructure (lots of Linux machines + PANDAS)
• Provide customisation/development of PANDAS
• Provide help desk and support


International Internet Preservation Consortium
Developing advanced web archiving technologies
• Smart Crawler
• Continuous adaptive crawler, adjusting crawl priority on the fly
• Based on IA Heritrix
• Working on requirements now
• Expect to begin tender process in June
• Content Management
• Archival formats
• Framework
• Metrics and Test Bed
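
To illustrate what "adjusting crawl priority on the fly" could mean, here is a minimal Python sketch of a crawl frontier whose priorities can be changed while crawling, plus a simple revisit policy. It is not Heritrix code and does not reflect the Smart Crawler requirements; the class, the halving/doubling policy and the interval limits are assumptions made for illustration.

```python
import heapq
import itertools
from typing import Dict, List, Optional, Tuple


class CrawlFrontier:
    """Queue of URLs ordered by priority (lower number = crawl sooner)."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, str]] = []
        self._priority: Dict[str, float] = {}
        self._counter = itertools.count()  # tie-breaker for equal priorities

    def schedule(self, url: str, priority: float) -> None:
        # Re-scheduling a URL supersedes its earlier entry.
        self._priority[url] = priority
        heapq.heappush(self._heap, (priority, next(self._counter), url))

    def next_url(self) -> Optional[str]:
        while self._heap:
            priority, _, url = heapq.heappop(self._heap)
            # Ignore stale entries left behind by a later schedule() call.
            if self._priority.get(url) == priority:
                del self._priority[url]
                return url
        return None


def next_interval(interval: float, changed: bool,
                  shortest: float = 1.0, longest: float = 64.0) -> float:
    """Assumed policy: revisit sooner if the page changed, later if not."""
    if changed:
        return max(shortest, interval / 2)
    return min(longest, interval * 2)
```

A crawler loop would take a URL from `next_url()`, fetch it, compare it with the previous capture, and call `schedule()` again with a priority derived from `next_interval()`.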


External Collaboration


Digital Library Collaborations/Partnerships
Current

• UK Digital Preservation Coalition
  • Founder Member
• TEL (The European Library Project)
• Web Archiving UK
  • JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library Of Wales
• International Internet Preservation Consortium
  • BNF, Library Of Congress, Internet Archive, National Archives & Library Of Canada, National Library Of Australia, National Library Of Italy, National Libraries Of Nordic Countries
• JISC Funded - Digital Curation Centre
• Persistent Identifiers
  • DOI Foundation, European National Libraries (KB & DDB)
• Resource Discovery
  • Union Catalogues (SUNCAT)
• Digital Library Federation


Digital Library Collaborations/Partnerships
Potential

• Secure Legal Deposit Network
  • 6 Legal Deposit Libraries
• Global Digital Format Registry
  • Potential Partners (National Archives, DLF)
• Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
  • KB (Netherlands National Library) & Other Partners – FP6 Bid
• Digital Rights Management
  • Potential Partners (Publishers, JISC)
• Metadata
  • Publishers, Others?
• Authentication
  • JISC?
• Resource Discovery
  • Search Engine Vendors, Researchers
• Others?


Conclusions
Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success




Can You Work With Us?



Slide 4

British Computer Society
North London Branch

Major Programmes

Richard Boulderstone
July 27, 2004
1

Agenda











The British Library
Vision
Our Audiences/Customers
ILS
Digitisation
Digital Object Management
Web Archiving
Collaboration
Conclusions

Magna Carta

www.bl.uk

2

What Is The British Library ?

Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of
Science and Invention (1855), National Central Library (1916), and National
Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India
Office Library and Records in 1982, and British Institute of Recorded Sound in
1983
• Flagship building at St Pancras - largest public building project in Great
Britain in 20th century - opened in 1998


www.bl.uk

3

World-Class Research Library
Key Statistics 2002/3













150 million items
8.2 million items consulted or supplied
408,000 reading room visits
618,000 catalogue records created
554,000 items received on legal deposit
651 km shelf capacity 92% full add 12 km each year
18.5M Web Site Hits (www.bl.uk)
2,400 staff
£85.2 million Grant in Aid and £27.0 million trading income in 2001/2
Annual report - http://www.bl.uk/about/annual/latest.html

www.bl.uk

4

‘The World’s Knowledge’
Outcome Based Vision
… by aiding scientific advances
… by adding commercial value for
businesses
… by contributing to UK “knowledge
economy”
… through the pursuit of academic
excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural
wellbeing
… by giving information relevant to their
interests
… by helping to find the next medical
breakthrough
… by creating a link between the past,
present and future

To help people
advance
knowledge to
enrich lives

www.bl.uk

5

High
R+D
Industries

Prof.
Services

Creative
Industries Publishing
Industries

Resource Discovery
Bespoke Services
Research Services
Document Supply
Reprographics
Innovation Centre

School
Students
Libraries Teachers 11>18
On-site Visits
School Tours
Web Learning

SMEs
EDUCATION

Lifelong
Learner

Exhibitions
Events
Tours
Publishing

Visitors
(child + adult)
Lifelong
Learner

PUBLIC

BUSINESS
Lifelong
Learner

Postgraduate/
Undergraduate
RESEARCHER

Librarians
Scholars

Lifelong
Learner

Reading Rooms
Bespoke Services
Reprographics
Publishing
Document Supply
Searching Tools

LIBRARIES

Commercial
Researcher

Public
Libraries

Broadcasting
e.g. BBC

Publishing
e.g. OED

Document Supply
Resource Discovery
Training
Best Practice

H.E.
Libraries

Public

www.bl.uk

6
2

Major Programmes/1
Da Vinci Notebook

Integrated Library System (ILS)
Programme

7

ILS: Development

Data migration
 Due to finish in a few days



16M+ BL records
10M+ records from other sources

Online ILS software
 All online changes made (mainly interfaces) – final tests
 Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
 Most ones done for go live
 Rest in priority order
www.bl.uk

8

ILS: Implementation

Training
 Courses to end-users well underway
 ‘Practice’ system available
 ‘Search only’ training also underway
Testing
 Functional testing (end to end) nearly complete
 Performance poor – OPAC very slow
 Automated stress testing (LoadRunner scripts)
 eIS trying to find area of problem
 Ex Libris experts flying over
 Some security ‘hardening’ needed
www.bl.uk

9

ILS: Cutover from legacy systems

Now:

Temporary Aleph cataloguing

Phase 1 – internal processing
 Staggered take-on of users to ease cutover problems
 Merge ‘temporary’ records

7 June:

Phase 2 – reading rooms
 Reading rooms closed for cutover 26-29 June
 Mainly brand-new PCs etc rather than XP upgrade

30 June:

Phase 3 – remote users
 Could be delayed major problems

30 July:

www.bl.uk

10

www.bl.uk

11

Future ILS development (ILS/2)

Current ILS development seen just as the start

Extra records
 E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
 E.g. Preservation records
Links to other new BL systems
 E.g. Digital Object Management (images, web pages etc)

New releases of Ex Libris packages

www.bl.uk

12

Major Programmes/2

International Dunhuang Project

Digitisation Programme

13

Background










Digitisation Is The Process Of Converting Existing Physical Items Into Digital
Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character
Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image to Sub £1/Image.
Automatic Devices To Digitise Are Becoming Availalble But Are Expensive ~£150,000.
BL Has Had Fairly Ad Hoc Approach Driven By
 External Funding Opportunities
 Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project
Management
BL Has Created About 1.5M Digital Images So Far…

www.bl.uk

14

Digitisation Strategy



Digitisation Strategy Project Was Formerly Initiated On February 2, 2004


Key objectives for the project are to define:










Selection Criteria
Uniform Approach
Communications Plan
Sustainability
Intellectual Property Rights
External Relationship Management
Funding
Integration with DOMS

www.bl.uk

15

Project Status Information


Definitive Register of Projects




19 Complete
19 Current
20 Planning



JISC Sound (3,900 Hours)



JISC Newspapers (2M Pages of 750M Pages)



Chopin (Collaborative Project)



Early English Books Online
www.bl.uk

16

Major Programmes/3

Gutenberg Bible

Digital Object Management (DOM)
Programme

17

DOM Programme vision
Our mission is to enable the United Kingdom to preserve
and use its digital intellectual property forever

Our vision is create a management system for digital objects
that will
 store and preserve any type of digital material in perpetuity
 provide access to this material to users with appropriate permissions
 ensure that the material is easy to find
 ensure that users can view the material with contemporary applications
 ensure that users can, where possible, experience material with the original look-and-feel

www.bl.uk

18

Introduction - history

Digital Library PFI
 Mar 1997 – Dec 1998
Digital Library System
 1999 – early 2002
 Lessons
DOM Report
 Nov 2002
The DOM Programme
 Started September 2003

www.bl.uk

19

Drivers for the BL DOM Programme












Legal deposit legislation for non-print material was granted royal assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12T of material per year (with 50 year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….

We need a generic and cost-effective approach for the secure long term
storage of digital material that is produced by numerous initiatives


20

DOM – many topics to address
[Figure: DOM components positioned by estimated size of component (LOW to HIGH) against component ambiguity / complexity (LOW to HIGH). Labels shown: ILS; Web Archiving; Digitisation Programme; LDEP; Workflow; Resource Discovery; LDLSE; VDEP; Rights Management; Metadata Definition; Technical Requirements; File Format Registry; Persistent Identifiers; File Conversion Utilities; Authentication; Prototypes; SDM; RADM; Interfaces; Strategy Development. Legend: Started; Planned; Non-DOM projects; Planned co-operation.]

LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications

21

Scope - life cycle of objects


Collection
 ▪ Selection
 ▪ Acquisition
 ▪ Accession
 ▪ Description
Preservation
 ▪ Storage
 ▪ Preservation
Access
 ▪ Resource discovery
 ▪ Delivery
 ▪ Rendering

22

Scope – objects and processes


Preservation store
 ▪ Preserves the bit stream in perpetuity
Access store
 ▪ Access versions
 ▪ Limited formats – in the flavour of the era
Metadata to support resource discovery
 ▪ Descriptive, Administrative, Links with existing tools e.g. Integrated Library System (ILS)
Workflow
 ▪ Ingest, e.g. Legal Deposit processing

23

[Diagram: DOM in context. Access: Resource Discovery; Delivery; Digital Rights Management. DOM: DOM Storage with shared services (Signing, Authentication, Metadata, Persistent ID). Ingest: Donations; Document Supply (Publishers, Archives, Grey Literature); Non-Serial Store; Archiving; Operational Stores; Legal Deposit (LDL Secure Environment, Legal Deposit Items, Legal Deposit Processing); Web Archiving; Digitisation (St Pancras Studios, Newspapers, NSA).]

24

Timeline

[Diagram: release timeline running across 2003, 2004 and 2005]

BC: ET approve Business Case & Timeline
R0 (Definition): Prototype will provide a basic preservation-quality digital object storage module
R1 (Operational Storage Sub System):
• Consolidate R0 into operational system
• Provide preservation-quality digital store for materials received under Voluntary Deposit of Electronic Publications (VDEP)
• Integrate it with the existing VDEP front-end
R2 (1st Content Stream ingest):
• Support ingest for a major content stream
• Integrate with core Library systems as required
R3 & R4 (LDEP - initial format): Provide functionality for material covered by LDEP secondary legislation
R5+: Open DOM to new projects

25

DOM: Project definition - 1
Functional Architecture “What”
 ▪ Prototyping – assessing market solutions
 ▪ Example issues: digital rights, file formats, etc
Logical architecture “how – overall architecture”
 ▪ Prototyping - basic functioning architecture
 ▪ Example issues: allow changes to new suppliers, relationships to ILS, other projects etc
Physical architecture “how – storage & specifics”
 ▪ Prototyping - principal solutions and options
 ▪ Example issues: how do we build it cost-effectively today, supplier selection criteria
• Business case
• Planning – incremental implementation phases

Cross team workshops – reviewing progress, debating detailed technical
issues, planning immediate priorities, risk management & way forward

26

DOM: Project definition - 2



Approach is to be incremental and not ‘Big Bang’
We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a
good solution

A principal goal is to define:
 ▪ An overall long term “logical architecture”
 ▪ Within which, there will be successive generations of physical architectures

We are building an understanding of the storage marketplace, and we will use that knowledge to manage
procurement
We are certain that we will need >500 TB of storage but we are uncertain when – we thus need
flexible, scalable procurement

27

DOM architecture - overview
[Diagram: DOM architecture overview. Client services (Resource Discovery, both ILS based and non-cat based; Rights Management; LDEP; Doc supply; Others) refer to an object by its unique persistent identifier (DOMID). The DOM Storage Service handles compound objects/relations, integrity and authenticity. The DOMID is mapped to node/vol/LRL (local resource locator), and DOM Physical Storage holds the atomic objects.]
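
The mapping from DOMID to node/vol/LRL can be pictured with a small sketch. Everything below is an assumption made for illustration: the class names, the use of a random hexadecimal string as the identifier, and the in-memory dictionary are not taken from the DOM design. The point it shows is that the identifier stays fixed while the physical location behind it can change.

```python
import uuid
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class PhysicalLocation:
    node: str    # storage node holding the object
    volume: str  # volume on that node
    lrl: str     # local resource locator within the volume


class DomidResolver:
    """Hypothetical lookup table from persistent DOMID to physical location."""

    def __init__(self) -> None:
        self._locations: Dict[str, PhysicalLocation] = {}

    def register(self, location: PhysicalLocation) -> str:
        # Assumed identifier format: a random 32-character hex string.
        domid = uuid.uuid4().hex
        self._locations[domid] = location
        return domid

    def resolve(self, domid: str) -> PhysicalLocation:
        return self._locations[domid]

    def migrate(self, domid: str, new_location: PhysicalLocation) -> None:
        # The DOMID is unchanged; only the location behind it moves.
        if domid not in self._locations:
            raise KeyError(domid)
        self._locations[domid] = new_location


resolver = DomidResolver()
domid = resolver.register(PhysicalLocation("node-a", "vol-01", "obj/0001"))
resolver.migrate(domid, PhysicalLocation("node-b", "vol-07", "obj/0001"))
print(resolver.resolve(domid))
```

Keeping the indirection in one place is what would let a later generation of physical storage replace an earlier one without changing any reference held by the catalogue or other client services.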


28

DOM System (release 3)

[Diagram: DOM System at release 3. Labels shown: Aleph; Storage subsystem (two instances); Access; Mailroom; Publishers; Shared services; Administration.]

29

DOM logical architecture –
integrity and authenticity


Integrity:
 ▪ System has capability to continuously monitor the object store to detect object corruption
 ▪ It would then initiate object recovery
Authenticity:
 ▪ A process is defined to provide long-term assurance that an object that is re-presented is as it was when it was ingested
 ▪ Based on the use of cryptographic signing techniques
 ▪ Each object is signed when it is ingested
 ▪ The signature is verified when required
 ▪ The signing mechanism is “tightly” controlled
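
A minimal sketch of sign-on-ingest and verify-on-demand follows. The slide does not say which signing scheme is used, so this example substitutes a keyed hash (HMAC with SHA-256 from the Python standard library) as a stand-in; the key, the class and the method names are all hypothetical. The same check doubles as a simple integrity scan over the store.

```python
import hashlib
import hmac
from typing import Dict, List, Tuple

# Hypothetical key; in practice it would be held by a tightly controlled signing service.
SIGNING_KEY = b"demo-key-held-by-signing-service"


def sign(data: bytes, key: bytes = SIGNING_KEY) -> str:
    """Return a keyed SHA-256 digest of the object's bit stream."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify(data: bytes, signature: str, key: bytes = SIGNING_KEY) -> bool:
    """Check that the data still matches the signature made at ingest."""
    return hmac.compare_digest(sign(data, key), signature)


class ObjectStore:
    def __init__(self) -> None:
        self._objects: Dict[str, Tuple[bytes, str]] = {}

    def ingest(self, object_id: str, data: bytes) -> None:
        # Each object is signed once, at the moment it is ingested.
        self._objects[object_id] = (data, sign(data))

    def read(self, object_id: str) -> bytes:
        data, signature = self._objects[object_id]
        if not verify(data, signature):
            raise ValueError(f"object {object_id} failed verification")
        return data

    def scan(self) -> List[str]:
        """Integrity pass: list the ids whose content no longer verifies."""
        return [oid for oid, (data, sig) in self._objects.items()
                if not verify(data, sig)]


store = ObjectStore()
store.ingest("obj-1", b"Magna Carta scan")
store.ingest("obj-2", b"Sound recording")

# Simulate corruption of one stored object.
data, sig = store._objects["obj-2"]
store._objects["obj-2"] = (data + b"\x00", sig)

corrupted = store.scan()
print(corrupted)
```

An object reported by the scan would be a candidate for recovery from another copy, as the integrity bullets describe.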

30

Procuring physical storage in volume









A major cost is in physical storage
The market for storage systems is changing rapidly, and this implies that “lock-in” is
not sensible
We thus need flexibility to change supplier over time
Cost of storage is reducing by 30-40% per year
Hence procure on rolling basis just ahead of demand
Replace storage on a rolling basis on expiry of warranty
The rolling programmes imply the need to be able to support a heterogeneous
product solution
The design of the logical architecture thus supports storage sourced from multiple
storage vendors
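
The argument for rolling procurement can be checked with simple arithmetic. The sketch below assumes a hypothetical starting price of 1,000 per terabyte, a flat demand of 100 TB a year for five years, and a 35% annual price fall (the middle of the 30-40% range quoted above). It ignores warranty-driven replacement and any volume discounts.

```python
from typing import Sequence


def upfront_cost(demand_tb: Sequence[float], price_per_tb: float) -> float:
    """Buy all capacity in year 0 at today's price."""
    return sum(demand_tb) * price_per_tb


def rolling_cost(demand_tb: Sequence[float], price_per_tb: float,
                 annual_decline: float) -> float:
    """Buy each year's capacity in that year, at that year's lower price."""
    total = 0.0
    for year, demand in enumerate(demand_tb):
        total += demand * price_per_tb * (1 - annual_decline) ** year
    return total


demand = [100, 100, 100, 100, 100]            # TB needed in each of five years (assumed)
upfront = upfront_cost(demand, 1000.0)
rolling = rolling_cost(demand, 1000.0, 0.35)
print(f"up front: {upfront:,.0f}  rolling: {rolling:,.0f}")
```

Under these assumptions the rolling purchase costs roughly half as much as buying everything at the start.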


31

Disaster tolerance and the organisation
of storage clusters









One can obtain commercial disaster recovery (DR) solutions for common equipment
configurations
However one cannot obtain such solutions for systems of several hundred terabytes
So we must build DR into the design of the system
A single site solution, subject to a common-mode disaster, would suffer considerable loss
of availability after a disaster, and so is not acceptable
This implies that we need a multi-site solution
Conventionally these are based on a master-standby pair where only 50% of kit is delivering
normal service
Our design is based on the use of multiple autonomous independent peer clusters that
cross-synchronise
so 100% of the kit delivers normal service

32

DOM architecture in the context of the
storage solution market


The dominant segment of the market focuses on delivering performance within
a highly resilient single cluster
However:
 ▪ Many of our objects will be rarely accessed
   so we do not want to pay for “maximised” performance we do not need
 ▪ We have resilience by using multiple clusters, hence we have a reduced need for resilience within a cluster
   so we do not want to pay for “maximised” resilience we do not need
We are using these drivers to design a cost-effective large scale resilient
solution

33

DOM storage subsystem architecture - overview

[Diagram: two storage clusters, each made up of a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, together with DOM Shared Services (Unique ID, Signing, Logging).]

34

DOM storage subsystem architecture - access

Normal access/delivery is from local storage cluster

[Diagram: two storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical Storage) and the DOM Shared central Services (Unique ID, Signing, Logging); the request is served by the local cluster.]

35

DOM storage subsystem architecture - access

When a cluster is off-line then access/delivery is from a remote storage cluster

[Diagram: two storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical Storage) and the DOM Shared central Services (Unique ID, Signing, Logging); the request is served by the remote cluster.]

36

DOM storage subsystem architecture - ingest

Normal ingest is to the local storage cluster and then the remote cluster is synchronised

[Diagram: two storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical Storage) and the DOM Shared central Services (Unique ID, Signing, Logging); arrows labelled “Store” and “Synchronise remote store”.]

37

DOM storage subsystem architecture - ingest

When a cluster is off-line then ingest is managed by the remote storage cluster and the local cluster is synchronised later

[Diagram: two storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical Storage) and the DOM Shared central Services (Unique ID, Signing, Logging); arrows labelled “Store” and “Synchronise remote store later”.]
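
The access and ingest behaviour on the preceding slides can be sketched as a toy model of two peer clusters. This is only an illustration under stated assumptions: the class names, the in-memory dictionaries and the explicit `synchronise` call are invented here, and a real system would need to handle conflicts, retries and more than two sites.

```python
from typing import Dict, List, Optional


class Cluster:
    """A hypothetical storage cluster that forwards new objects to its peer."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.online = True
        self.objects: Dict[str, bytes] = {}
        self.pending: List[str] = []          # ids not yet copied to the peer
        self.peer: Optional["Cluster"] = None

    def store(self, object_id: str, data: bytes) -> None:
        self.objects[object_id] = data
        self.pending.append(object_id)
        self.synchronise()

    def synchronise(self) -> None:
        # If the peer is off-line, keep the backlog and try again later.
        if self.peer is None or not self.peer.online:
            return
        for object_id in self.pending:
            self.peer.objects[object_id] = self.objects[object_id]
        self.pending.clear()


class Gateway:
    """Routes requests to the local cluster, or to the remote one if local is off-line."""

    def __init__(self, local: Cluster, remote: Cluster) -> None:
        self.local, self.remote = local, remote

    def _target(self) -> Cluster:
        if self.local.online:
            return self.local
        if self.remote.online:
            return self.remote
        raise RuntimeError("no storage cluster available")

    def ingest(self, object_id: str, data: bytes) -> None:
        self._target().store(object_id, data)

    def fetch(self, object_id: str) -> bytes:
        return self._target().objects[object_id]


site_a, site_b = Cluster("A"), Cluster("B")
site_a.peer, site_b.peer = site_b, site_a
gateway_a = Gateway(local=site_a, remote=site_b)

gateway_a.ingest("obj-1", b"first")    # normal ingest: stored at A, then copied to B
site_a.online = False
gateway_a.ingest("obj-2", b"second")   # A is off-line: B manages the ingest
served = gateway_a.fetch("obj-1")      # A is off-line: delivery comes from B
site_a.online = True
site_b.synchronise()                   # A is synchronised later
```

Because both clusters accept ingest and serve delivery, neither sits idle as a standby, which is the property the disaster-tolerance slide argues for.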


38

In conclusion



We plan for generations of physical storage
 ▪ Migration from one generation to the next
 ▪ Allow changes of supplier
 ▪ Purchase incrementally in modest quantities
 ▪ Move quickly when required
 ▪ Be cost conscious

We provide assurance that an object is held and re-presented as it was when it was
ingested

We are designing a cost-effective large scale resilient solution

In summary: we take a long term view

39

Major Programmes/4

Web Archiving Programme

40

Structure of Programme

Web Archiving Programme is a collaborative
initiative, roughly implemented across two
consortia
UK Web Archiving Consortium
 ▪ Developing a selective approach to web archiving, procuring a common web archiving infrastructure and software to begin archiving activities at the earliest opportunity
International Internet Preservation Consortium
 ▪ Developing advanced web archiving technologies for the long-term, large-scale, continuous crawling requirements enabled through legislation

41

UK Web Archiving Consortium
Developing a selective approach to web archiving
 ▪ License for PANDAS about to be signed with NLA
 ▪ Sub-licenses with consortium partners and contractor to follow
 ▪ ITT concluded with Magus Research winning the contract.
 ▪ Implement a common web archiving infrastructure (lots of Linux machines + PANDAS)
 ▪ Provide customisation/development of PANDAS
 ▪ Provide help desk and support

42

International Internet Preservation
Consortium
Developing advanced web archiving technologies
 ▪ Smart Crawler
    ▪ Continuous adaptive crawler, adjusting crawl priority on the fly
    ▪ Based on IA Heritrix
    ▪ Working on requirements now
    ▪ Expect to begin tender process in June
 ▪ Content Management
 ▪ Archival formats
 ▪ Framework
 ▪ Metrics and Test Bed
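
One way to picture "adjusting crawl priority on the fly" is a frontier that revisits a page sooner when it was found changed and later when it was not. The sketch below is a generic illustration using a priority queue; it is not the Heritrix API, and the class name, the halving/doubling rule and the interval limits are all assumptions.

```python
import heapq
from typing import Dict, List, Tuple


class AdaptiveFrontier:
    """Hypothetical crawl queue ordered by next due time (in hours)."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, str]] = []
        self._interval: Dict[str, float] = {}

    def add(self, url: str, interval: float = 24.0) -> None:
        self._interval[url] = interval
        heapq.heappush(self._heap, (0.0, url))   # new URLs are due immediately

    def next_due(self) -> Tuple[float, str]:
        return heapq.heappop(self._heap)

    def report(self, url: str, now: float, changed: bool) -> None:
        # Assumed rule: halve the revisit interval after a change, double it otherwise,
        # bounded between one hour and thirty days.
        factor = 0.5 if changed else 2.0
        interval = min(max(self._interval[url] * factor, 1.0), 24.0 * 30)
        self._interval[url] = interval
        heapq.heappush(self._heap, (now + interval, url))


frontier = AdaptiveFrontier()
frontier.add("http://example.org/news")
frontier.add("http://example.org/about")

# First pass: pretend the news page changed and the about page did not.
for _ in range(2):
    due, url = frontier.next_due()
    frontier.report(url, now=due, changed=url.endswith("/news"))

first_due, first_url = frontier.next_due()
print(first_due, first_url)
```

After one pass the frequently changing page is scheduled well ahead of the static one, so crawl effort follows observed change rather than a fixed schedule.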

43

External Collaboration

44

Digital Library Collaborations/Partnerships
Current










UK Digital Preservation Coalition
 ▪ Founder Member
TEL (The European Library Project)
Web Archiving UK
 ▪ JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library Of Wales
International Internet Preservation Consortium
 ▪ BNF, Library Of Congress, Internet Archive, National Archives & Library Of Canada, National Library Of Australia, National Library Of Italy, National Libraries Of Nordic Countries
JISC Funded - Digital Curation Centre
Persistent Identifiers
 ▪ DOI Foundation, European National Libraries (KB & DDB)
Resource Discovery
 ▪ Union Catalogues (SUNCAT)
Digital Library Federation

45

Digital Library Collaborations/Partnerships
Potential









Secure Legal Deposit Network
 ▪ 6 Legal Deposit Libraries
Global Digital Format Registry
 ▪ Potential Partners (National Archives, DLF)
Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
 ▪ KB (Netherlands National Library & Other Partners – FP6 Bid)
Digital Rights Management
 ▪ Potential Partners (Publishers, JISC)
Metadata
 ▪ Publishers, Others ?
Authentication
 ▪ JISC ?
Resource Discovery
 ▪ Search Engine Vendors, Researchers
Others ???

46

Conclusions
• Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success

Can You Work With Us?

47


Slide 6

British Computer Society
North London Branch

Major Programmes

Richard Boulderstone
July 27, 2004
1

Agenda











The British Library
Vision
Our Audiences/Customers
ILS
Digitisation
Digital Object Management
Web Archiving
Collaboration
Conclusions

Magna Carta

www.bl.uk

2

What Is The British Library ?

Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of
Science and Invention (1855), National Central Library (1916), and National
Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India
Office Library and Records in 1982, and British Institute of Recorded Sound in
1983
• Flagship building at St Pancras - largest public building project in Great
Britain in 20th century - opened in 1998


www.bl.uk

3

World-Class Research Library
Key Statistics 2002/3

• 150 million items
• 8.2 million items consulted or supplied
• 408,000 reading room visits
• 618,000 catalogue records created
• 554,000 items received on legal deposit
• 651 km of shelf capacity, 92% full, with 12 km added each year
• 18.5M web site hits (www.bl.uk)
• 2,400 staff
• £85.2 million Grant in Aid and £27.0 million trading income in 2001/2
• Annual report - http://www.bl.uk/about/annual/latest.html

4

‘The World’s Knowledge’
Outcome Based Vision

To help people advance knowledge to enrich lives

… by aiding scientific advances
… by adding commercial value for businesses
… by contributing to UK “knowledge economy”
… through the pursuit of academic excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural wellbeing
… by giving information relevant to their interests
… by helping to find the next medical breakthrough
… by creating a link between the past, present and future

5

[Diagram: Our Audiences/Customers. Five audience groups with their segments and the services offered to each. The labels below are listed in the order they were extracted; which segment or service list belongs to which group is not recoverable with certainty.]
Audience groups: BUSINESS, EDUCATION, PUBLIC, RESEARCHER, LIBRARIES ("Lifelong Learner" appears between neighbouring groups)
Segments shown: High R+D Industries; Prof. Services; Creative Industries; Publishing Industries; SMEs; School Libraries; Students 11>18; Teachers; Visitors (child + adult); Postgraduate/Undergraduate; Librarians; Scholars; Commercial Researcher; Public Libraries; Broadcasting (e.g. BBC); Publishing (e.g. OED); H.E. Libraries; Public
Service lists shown:
• Resource Discovery, Bespoke Services, Research Services, Document Supply, Reprographics, Innovation Centre
• On-site Visits, School Tours, Web Learning
• Exhibitions, Events, Tours, Publishing
• Reading Rooms, Bespoke Services, Reprographics, Publishing, Document Supply, Searching Tools
• Document Supply, Resource Discovery, Training, Best Practice

6

Major Programmes/1
Da Vinci Notebook

Integrated Library System (ILS) Programme

7

ILS: Development

Data migration
• Due to finish in a few days
• 16M+ BL records
• 10M+ records from other sources
Online ILS software
• All online changes made (mainly interfaces) – final tests
• Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
• Most of those needed for go-live are done
• Rest in priority order

8

ILS: Implementation

Training
• Courses for end-users well underway
• ‘Practice’ system available
• ‘Search only’ training also underway
Testing
• Functional testing (end to end) nearly complete
• Performance poor – OPAC very slow
• Automated stress testing (LoadRunner scripts)
• eIS trying to locate the problem area
• Ex Libris experts flying over
• Some security ‘hardening’ needed

9

ILS: Cutover from legacy systems

Now: Temporary Aleph cataloguing
7 June: Phase 1 – internal processing
• Staggered take-on of users to ease cutover problems
• Merge ‘temporary’ records
30 June: Phase 2 – reading rooms
• Reading rooms closed for cutover 26-29 June
• Mainly brand-new PCs etc rather than XP upgrade
30 July: Phase 3 – remote users
• Could be delayed if there are major problems

10

11

Future ILS development (ILS/2)

Current ILS development seen just as the start

Extra records
• E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
• E.g. Preservation records
Links to other new BL systems
• E.g. Digital Object Management (images, web pages etc)
New releases of Ex Libris packages

12

Major Programmes/2

International Dunhuang Project

Digitisation Programme

13

Background

• Digitisation Is The Process Of Converting Existing Physical Items Into Digital Surrogates.
• Digitisation Projects Must Take Into Account Metadata Creation, Optical Character Recognition, Navigation, Display, Archiving, Preservation.
• Overall Cost Of Digitisation Projects Is Declining, From £10s/Image To Sub-£1/Image.
• Automatic Devices To Digitise Are Becoming Available But Are Expensive (~£150,000).
• BL Has Had A Fairly Ad Hoc Approach Driven By
  • External Funding Opportunities
  • Curator Interest
• Projects Have Generally Created Their Own Approach, IT Resources, Project Management
• BL Has Created About 1.5M Digital Images So Far…

14

Digitisation Strategy

Digitisation Strategy Project Was Formally Initiated On February 2, 2004

Key objectives for the project are to define:
• Selection Criteria
• Uniform Approach
• Communications Plan
• Sustainability
• Intellectual Property Rights
• External Relationship Management
• Funding
• Integration with DOMS

15

Project Status Information

Definitive Register of Projects
• 19 Complete
• 19 Current
• 20 Planning
JISC Sound (3,900 Hours)
JISC Newspapers (2M Pages of 750M Pages)
Chopin (Collaborative Project)
Early English Books Online

16

Major Programmes/3

Gutenberg Bible

Digital Object Management (DOM) Programme

17

DOM Programme vision

Our mission is to enable the United Kingdom to preserve and use its digital intellectual property forever

Our vision is to create a management system for digital objects that will
• store and preserve any type of digital material in perpetuity
• provide access to this material to users with appropriate permissions
• ensure that the material is easy to find
• ensure that users can view the material with contemporary applications
• ensure that users can, where possible, experience material with the original look-and-feel

18

Introduction - history

Digital Library PFI
• Mar 1997 – Dec 1998
Digital Library System
• 1999 – early 2002
• Lessons
DOM Report
• Nov 2002
The DOM Programme
• Started September 2003

19

Drivers for the BL DOM Programme

• Legal deposit legislation for non-print material was granted royal assent in October 2003
• Existing voluntary deposit scheme operational since 2000
• Storage of digitised masters from early ’90s onwards
• New digitisation initiatives: newspapers, sound, etc
• Sound archive receives 12 TB of material per year (with 50 year collection)
• Web archiving
• Cartography and datasets
• Electronic journals, picture library
• … and …
• … and …

We need a generic and cost-effective approach for the secure long-term storage of digital material that is produced by numerous initiatives

20

DOM – many topics to address

[Chart: DOM components plotted by estimated size of component (low to high) against component ambiguity / complexity (low to high). Legend: Started; Planned; Non-DOM projects; Planned co-operation.]
Components shown: ILS; Web Archiving; Digitisation Programme; LDEP; Workflow; Resource Discovery; LDLSE; VDEP; Rights Management; Metadata Definition; Technical Requirements; File Format Registry; Persistent Identifiers; File Conversion Utilities; Authentication; Prototypes; SDM; RADM; Interfaces; Strategy Development

LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications

21

Scope - life cycle of objects

• Collection
  • Selection
  • Acquisition
  • Accession
  • Description
• Preservation
  • Storage
  • Preservation
• Access
  • Resource discovery
  • Delivery
  • Rendering

22

Scope – objects and processes

• Preservation store
  • Preserves the bit stream in perpetuity
• Access store
  • Access versions
  • Limited formats – in the flavour of the era
• Metadata to support resource discovery
  • Descriptive, Administrative, Links with existing tools e.g. Integrated Library System (ILS)
• Workflow
  • Ingest, e.g. Legal Deposit processing

23

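[Diagram: how access, DOM storage, shared services and ingest sources relate. Labels are listed as extracted; the exact grouping of the detail labels is not recoverable with certainty.]
Access: Resource Discovery, Delivery, Digital Rights Management
DOM: Storage, Ingest, Shared services (Signing, Authentication, Metadata, Persistent ID)
Ingest sources: Donations, Document Supply, Legal Deposit, Web Archiving, Digitisation
Detail labels: Publishers, Archives, Grey Literature; Non-Serial Store, Archiving, Operational Stores; LDL Secure Environment, Legal Deposit Items, Legal Deposit Processing; St Pancras Studios, Newspapers, NSA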

24

Timeline

[Timeline diagram covering 2003 to 2005, with milestones BC, R0, R1, R2, R3 and R4]
Definition (R0)
• Prototype will provide a basic preservation-quality digital object storage module
• ET approve Business Case (BC) & Timeline
Operational Storage Sub System (R1)
• Consolidate R0 into operational system
• Provide preservation-quality digital store for materials received under Voluntary Deposit of Electronic Publications (VDEP)
• Integrate it with the existing VDEP front-end
1st Content Stream ingest (R2)
• Support ingest for a major content stream
• Integrate with core Library systems as required
LDEP - initial format (R3 & R4)
• Provide functionality for material covered by LDEP secondary legislation
Open DOM to new projects (R5+)

25

DOM: Project definition - 1

Functional architecture (“what”)
• Prototyping – assessing market solutions
• Example issues: digital rights, file formats, etc
Logical architecture (“how – overall architecture”)
• Prototyping – basic functioning architecture
• Example issues: allow changes to new suppliers, relationships to ILS, other projects etc
Physical architecture (“how – storage & specifics”)
• Prototyping – principal solutions and options
• Example issues: how do we build it cost-effectively today, supplier selection criteria

• Business case
• Planning – incremental implementation phases

Cross team workshops – reviewing progress, debating detailed technical issues, planning immediate priorities, risk management & way forward

26

DOM: Project definition - 2

• Approach is to be incremental and not ‘Big Bang’
• We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a good solution
• A principal goal is to define:
  • An overall long-term “logical architecture”
  • Within which, there will be successive generations of physical architectures
• We are building an understanding of the storage marketplace, and we will use that knowledge to manage procurement
• We are certain that we will need >500 TB of storage but we are uncertain when – we thus need flexible, scalable procurement

27

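DOM architecture - overview

[Diagram: layered view of the DOM architecture. Labels are listed as extracted; exact placement within the layers is not recoverable with certainty.]
Services using DOM: Resource Discovery (ILS, Non-cat based RD), Rights Management, LDEP, Doc supply, Others
These refer to an OBJECT by its DOMID
DOM Storage Service: Compound objects/relations, Integrity, Authenticity, Unique persistent identifier (DOMID)
DOM Physical Storage: Atomic Objects, Local resource locator (LRL)
Note on the diagram: DOMID is mapped to node/vol/LRL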

28
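The overview above says that every object has a unique persistent identifier (DOMID) and that the DOMID is mapped to node/vol/LRL. A minimal sketch of that idea follows. It is an illustration only: the class names, the "domid:" identifier format and the in-memory dictionary are assumptions, not the Library's design.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Location:
    """Where one copy of an object physically lives (field names are hypothetical)."""
    node: str    # storage node
    volume: str  # volume on that node
    lrl: str     # local resource locator within the volume


class DomidRegistry:
    """Maps persistent DOMIDs to physical locations (in-memory stand-in)."""

    def __init__(self):
        self._next = count(1)
        self._locations = {}

    def mint(self):
        # The 'domid:' prefix and zero-padded counter are assumed for illustration.
        domid = f"domid:{next(self._next):08d}"
        self._locations[domid] = []
        return domid

    def add_location(self, domid, location):
        self._locations[domid].append(location)

    def relocate(self, domid, old, new):
        # A storage migration changes the location, never the DOMID.
        copies = self._locations[domid]
        copies[copies.index(old)] = new

    def resolve(self, domid):
        return list(self._locations[domid])


registry = DomidRegistry()
domid = registry.mint()
first = Location("node-a", "vol01", "objects/0001.tif")
registry.add_location(domid, first)
registry.relocate(domid, first, Location("node-b", "vol07", "objects/0001.tif"))
print(domid, registry.resolve(domid))
```

The point of the indirection is that catalogue records and other systems hold only the DOMID, so physical storage can be replaced generation by generation without touching them.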

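DOM System (release 3)

[Diagram: the DOM System at release 3 and the parties it connects to. Labels are listed as extracted.]
DOM System: Storage subsystem (shown twice), Access, Mailroom, Shared services, Administration
Also shown: Aleph, Publishers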

29

DOM logical architecture – integrity and authenticity

Integrity:
• System has capability to continuously monitor the object store to detect object corruption
• It would then initiate object recovery
Authenticity:
• A process is defined to provide long-term assurance that an object that is re-presented is as it was when it was ingested
• Based on the use of cryptographic signing techniques
• Each object is signed when it is ingested
• The signature is verified when required
• The signing mechanism is “tightly” controlled

30
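The integrity and authenticity points above can be sketched in a few lines. This is a hedged illustration, not the DOM implementation: it uses an HMAC from the Python standard library as a stand-in for whatever cryptographic signing technique was actually chosen, and the key, class and method names are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical key; in the slide's terms this is what must be "tightly" controlled.
SIGNING_KEY = b"demo-key-held-by-the-signing-service"


def sign(data: bytes) -> str:
    """Produce a keyed digest of the object's bytes (HMAC-SHA256 stand-in)."""
    return hmac.new(SIGNING_KEY, data, hashlib.sha256).hexdigest()


def verify(data: bytes, signature: str) -> bool:
    """Check that the bytes still match the signature recorded at ingest."""
    return hmac.compare_digest(sign(data), signature)


class ObjectStore:
    def __init__(self):
        self._objects = {}     # object id -> bytes
        self._signatures = {}  # object id -> signature recorded at ingest

    def ingest(self, object_id, data):
        # Each object is signed when it is ingested.
        self._objects[object_id] = data
        self._signatures[object_id] = sign(data)

    def scan(self):
        """Integrity monitoring: ids of objects whose bytes no longer verify."""
        return [oid for oid, data in self._objects.items()
                if not verify(data, self._signatures[oid])]

    def recover(self, object_id, good_copy):
        """Object recovery: accept a replacement only if it verifies."""
        if not verify(good_copy, self._signatures[object_id]):
            raise ValueError("replacement copy does not verify")
        self._objects[object_id] = good_copy


store = ObjectStore()
store.ingest("obj-1", b"page image bytes")
store.ingest("obj-2", b"sound recording bytes")
store._objects["obj-2"] = b"sound recording bytez"  # simulate silent corruption
corrupt = store.scan()
store.recover("obj-2", b"sound recording bytes")
print(corrupt, store.scan())
```

In a multi-cluster design the replacement copy would come from a peer cluster; here it is passed in by hand to keep the sketch self-contained.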

Procuring physical storage in volume

• A major cost is in physical storage
• The market for storage systems is changing rapidly, and this implies that “lock-in” is not sensible
• We thus need flexibility to change supplier over time
• Cost of storage is reducing by 30-40% per year
• Hence procure on a rolling basis just ahead of demand
• Replace storage on a rolling basis on expiry of warranty
• The rolling programmes imply the need to be able to support a heterogeneous product solution
• The design of the logical architecture thus supports storage sourced from multiple storage vendors

31
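The argument for rolling procurement can be checked with simple arithmetic. The sketch below compares buying five years of capacity up front with buying each year's increment at that year's price. The demand figures and the normalised starting price are illustrative assumptions; only the 30-40% annual price decline comes from the slide. It ignores warranty replacement and running costs.

```python
def upfront_cost(demand_per_year, price_now):
    """Buy all capacity today at today's unit price."""
    return sum(demand_per_year) * price_now


def rolling_cost(demand_per_year, price_now, annual_decline):
    """Buy each year's capacity in that year, as the unit price falls."""
    total = 0.0
    price = price_now
    for demand in demand_per_year:
        total += demand * price
        price *= 1.0 - annual_decline
    return total


demand = [100, 100, 100, 100, 100]  # assumed TB needed in each of five years
price = 1.0                         # normalised cost per TB in year 0

for decline in (0.30, 0.40):
    up = upfront_cost(demand, price)
    roll = rolling_cost(demand, price, decline)
    print(f"decline {decline:.0%}: upfront {up:.1f}, rolling {roll:.1f}, "
          f"saving {1 - roll / up:.0%}")
```

Under these assumptions the rolling purchase costs a little over half of the up-front purchase, which is the saving the slide's "just ahead of demand" policy is aiming for.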

Disaster tolerance and the organisation of storage clusters

• One can obtain commercial disaster recovery (DR) solutions for common equipment configurations
• However, one cannot obtain such solutions for systems of multi-100 TB scale
• So we must build DR into the design of the system
• A single-site solution, subject to a common-mode disaster, would suffer considerable loss of availability after a disaster, and so is not acceptable
• This implies that we need a multi-site solution
• Conventionally these are based on a master-standby arrangement where only 50% of kit is delivering normal service
• Our design is based on the use of multiple autonomous independent peer clusters that cross-synchronise, so 100% of the kit delivers normal service

32

DOM architecture in the context of the storage solution market

• The dominant segment of the market focuses on delivering performance within a highly resilient single cluster
• However:
  • Many of our objects will be rarely accessed, so we do not want to pay for “maximised” performance we do not need
  • We have resilience by using multiple clusters, hence we have a reduced need for resilience within a cluster, so we do not want to pay for “maximised” resilience we do not need
• We are using these drivers to design a cost-effective large scale resilient solution

33

DOM storage subsystem architecture - overview

[Diagram: two storage clusters, each made up of a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, together with DOM Shared Services (Unique ID, Signing, Logging)]

34

DOM storage subsystem architecture - access

Normal access/delivery is from local storage cluster

[Diagram: two storage clusters, each made up of a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, together with DOM Shared Services (Unique ID, Signing, Logging); the access path runs through the local cluster]

35

DOM storage subsystem architecture - access

When a cluster is off-line then access/delivery is from a remote storage cluster

[Diagram: the same two storage clusters and DOM Shared Services (Unique ID, Signing, Logging); the access path runs through the remote cluster]

36

DOM storage subsystem architecture - ingest

Normal ingest is to the local storage cluster and then the remote cluster is synchronised

[Diagram: the same two storage clusters and DOM Shared Services (Unique ID, Signing, Logging); arrows labelled "Store" and "Synchronise remote store"]

37

DOM storage subsystem architecture - ingest

When a cluster is off-line then ingest is managed by the remote storage cluster and the local cluster is synchronised later

[Diagram: the same two storage clusters and DOM Shared Services (Unique ID, Signing, Logging); arrows labelled "Store" and "Synchronise remote store later"]

38
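The four access and ingest cases above can be summarised in a small model. This is a sketch under stated assumptions, not the DOM storage subsystem: the class names and site names are hypothetical, there are exactly two peers, and synchronisation is done in-process and immediately, which a real multi-site system would not do.

```python
class Cluster:
    """One autonomous storage cluster (in-memory stand-in)."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.objects = {}


class PeerPair:
    """Two peer clusters that cross-synchronise; both serve normal traffic."""

    def __init__(self, a, b):
        self.clusters = {a.name: a, b.name: b}
        self.backlog = {a.name: [], b.name: []}  # object ids each cluster still lacks

    def _peer(self, cluster):
        return next(c for c in self.clusters.values() if c is not cluster)

    def ingest(self, local_name, object_id, data):
        local = self.clusters[local_name]
        # Normal ingest goes to the local cluster; if it is off-line, the remote one.
        target = local if local.online else self._peer(local)
        if not target.online:
            raise RuntimeError("no cluster available")
        target.objects[object_id] = data
        other = self._peer(target)
        if other.online:
            other.objects[object_id] = data              # synchronise now
        else:
            self.backlog[other.name].append(object_id)   # synchronise later
        return target.name

    def read(self, local_name, object_id):
        local = self.clusters[local_name]
        # Normal access is local; fall back to the remote cluster when off-line.
        source = local if local.online else self._peer(local)
        if not source.online:
            raise RuntimeError("no cluster available")
        return source.objects[object_id]

    def bring_online(self, name):
        cluster = self.clusters[name]
        cluster.online = True
        peer = self._peer(cluster)
        for object_id in self.backlog[name]:
            cluster.objects[object_id] = peer.objects[object_id]
        self.backlog[name].clear()


pair = PeerPair(Cluster("site-1"), Cluster("site-2"))
pair.ingest("site-1", "obj-1", b"first")
pair.clusters["site-1"].online = False               # local cluster goes off-line
stored_at = pair.ingest("site-1", "obj-2", b"second")
served = pair.read("site-1", "obj-1")
pair.bring_online("site-1")                           # catch up on missed ingests
print(stored_at, served, sorted(pair.clusters["site-1"].objects))
```

Because neither cluster is a standby, both carry normal load while healthy, which is the "100% of the kit delivers normal service" point made on the disaster tolerance slide.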

In conclusion

• We plan for generations of physical storage
  • Migration from one generation to the next
  • Allow changes of supplier
  • Purchase incrementally in modest quantities
  • Move quickly when required
  • Be cost conscious
• We provide assurance that an object is held and re-presented as it was when it was ingested
• We are designing a cost-effective large scale resilient solution

In summary: we take a long term view

39

Major Programmes/4

Web Archiving Programme

40

Structure of Programme

Web Archiving Programme is a collaborative initiative, implemented broadly across two consortia
• UK Web Archiving Consortium
  • Developing a selective approach to web archiving, procuring a common web archiving infrastructure and software to begin archiving activities at the earliest opportunity
• International Internet Preservation Consortium
  • Developing advanced web archiving technologies for the long-term, large-scale, continuous crawling requirements enabled through legislation

41

UK Web Archiving Consortium
Developing a selective approach to web archiving
• Licence for PANDAS about to be signed with NLA
• Sub-licences with consortium partners and contractor to follow
• ITT concluded with Magus Research winning the contract
• Implement a common web archiving infrastructure (lots of Linux machines + PANDAS)
• Provide customisation/development of PANDAS
• Provide help desk and support

42

International Internet Preservation Consortium
Developing advanced web archiving technologies
• Smart Crawler
  • Continuous adaptive crawler, adjusting crawl priority on the fly
  • Based on IA Heritrix
  • Working on requirements now
  • Expect to begin tender process in June
• Content Management
• Archival formats
• Framework
• Metrics and Test Bed

43
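The phrase "adjusting crawl priority on the fly" can be illustrated with a priority queue whose revisit intervals adapt to observed change. This is a hypothetical sketch only: it does not use or describe Heritrix's API, and the halving/doubling rule, the 1 to 168 hour bounds and the URLs are assumptions made for the example.

```python
import heapq


class AdaptiveFrontier:
    """Crawl queue ordered by due time; intervals adapt to how often pages change."""

    def __init__(self):
        self._heap = []      # entries are (due_time, sequence, url)
        self._seq = 0        # tie-breaker so equal due times pop in insertion order
        self._interval = {}  # url -> current revisit interval in hours

    def _push(self, url, due):
        heapq.heappush(self._heap, (due, self._seq, url))
        self._seq += 1

    def add(self, url, interval_hours=24.0, now=0.0):
        self._interval[url] = interval_hours
        self._push(url, now + interval_hours)

    def next_due(self):
        due, _, url = heapq.heappop(self._heap)
        return due, url

    def report(self, url, now, changed):
        """Halve the interval if the page changed, double it otherwise (bounded)."""
        interval = self._interval[url]
        interval = max(1.0, interval / 2) if changed else min(168.0, interval * 2)
        self._interval[url] = interval
        self._push(url, now + interval)


frontier = AdaptiveFrontier()
frontier.add("http://example.org/news")
frontier.add("http://example.org/about")

first = frontier.next_due()
frontier.report(first[1], now=first[0], changed=True)    # news changed: revisit sooner
second = frontier.next_due()
frontier.report(second[1], now=second[0], changed=False)  # about unchanged: back off
third = frontier.next_due()
print(first, second, third)
```

Frequently changing pages drift towards the front of the queue and static pages towards the back, without any manual re-prioritisation.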

External Collaboration

44

Digital Library Collaborations/Partnerships
Current

• UK Digital Preservation Coalition
  • Founder Member
• TEL (The European Library Project)
• Web Archiving UK
  • JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library Of Wales
• International Internet Preservation Consortium
  • BNF, Library Of Congress, Internet Archive, National Archives & Library Of Canada, National Library Of Australia, National Library Of Italy, National Libraries Of Nordic Countries
• JISC Funded - Digital Curation Centre
• Persistent Identifiers
  • DOI Foundation, European National Libraries (KB & DDB)
• Resource Discovery
  • Union Catalogues (SUNCAT)
• Digital Library Federation

45

Digital Library Collaborations/Partnerships
Potential

• Secure Legal Deposit Network
  • 6 Legal Deposit Libraries
• Global Digital Format Registry
  • Potential Partners (National Archives, DLF)
• Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
  • KB (Netherlands National Library & Other Partners – FP6 Bid)
• Digital Rights Management
  • Potential Partners (Publishers, JISC)
• Metadata
  • Publishers, Others?
• Authentication
  • JISC?
• Resource Discovery
  • Search Engine Vendors, Researchers
• Others?

46

Conclusions

• Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success

Can You Work With Us?

47


Slide 7

British Computer Society
North London Branch

Major Programmes

Richard Boulderstone
July 27, 2004
1

Agenda











The British Library
Vision
Our Audiences/Customers
ILS
Digitisation
Digital Object Management
Web Archiving
Collaboration
Conclusions

Magna Carta

www.bl.uk

2

What Is The British Library ?

Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of
Science and Invention (1855), National Central Library (1916), and National
Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India
Office Library and Records in 1982, and British Institute of Recorded Sound in
1983
• Flagship building at St Pancras - largest public building project in Great
Britain in 20th century - opened in 1998


www.bl.uk

3

World-Class Research Library
Key Statistics 2002/3













150 million items
8.2 million items consulted or supplied
408,000 reading room visits
618,000 catalogue records created
554,000 items received on legal deposit
651 km shelf capacity 92% full add 12 km each year
18.5M Web Site Hits (www.bl.uk)
2,400 staff
£85.2 million Grant in Aid and £27.0 million trading income in 2001/2
Annual report - http://www.bl.uk/about/annual/latest.html

www.bl.uk

4

‘The World’s Knowledge’
Outcome Based Vision
… by aiding scientific advances
… by adding commercial value for
businesses
… by contributing to UK “knowledge
economy”
… through the pursuit of academic
excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural
wellbeing
… by giving information relevant to their
interests
… by helping to find the next medical
breakthrough
… by creating a link between the past,
present and future

To help people
advance
knowledge to
enrich lives

www.bl.uk

5

High
R+D
Industries

Prof.
Services

Creative
Industries Publishing
Industries

Resource Discovery
Bespoke Services
Research Services
Document Supply
Reprographics
Innovation Centre

School
Students
Libraries Teachers 11>18
On-site Visits
School Tours
Web Learning

SMEs
EDUCATION

Lifelong
Learner

Exhibitions
Events
Tours
Publishing

Visitors
(child + adult)
Lifelong
Learner

PUBLIC

BUSINESS
Lifelong
Learner

Postgraduate/
Undergraduate
RESEARCHER

Librarians
Scholars

Lifelong
Learner

Reading Rooms
Bespoke Services
Reprographics
Publishing
Document Supply
Searching Tools

LIBRARIES

Commercial
Researcher

Public
Libraries

Broadcasting
e.g. BBC

Publishing
e.g. OED

Document Supply
Resource Discovery
Training
Best Practice

H.E.
Libraries

Public

www.bl.uk

6
2

Major Programmes/1
Da Vinci Notebook

Integrated Library System (ILS)
Programme

7

ILS: Development

Data migration
 Due to finish in a few days



16M+ BL records
10M+ records from other sources

Online ILS software
 All online changes made (mainly interfaces) – final tests
 Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
 Most ones done for go live
 Rest in priority order
www.bl.uk

8

ILS: Implementation

Training
 Courses to end-users well underway
 ‘Practice’ system available
 ‘Search only’ training also underway
Testing
 Functional testing (end to end) nearly complete
 Performance poor – OPAC very slow
 Automated stress testing (LoadRunner scripts)
 eIS trying to find area of problem
 Ex Libris experts flying over
 Some security ‘hardening’ needed
www.bl.uk

9

ILS: Cutover from legacy systems

Now:

Temporary Aleph cataloguing

Phase 1 – internal processing
 Staggered take-on of users to ease cutover problems
 Merge ‘temporary’ records

7 June:

Phase 2 – reading rooms
 Reading rooms closed for cutover 26-29 June
 Mainly brand-new PCs etc rather than XP upgrade

30 June:

Phase 3 – remote users
 Could be delayed major problems

30 July:

www.bl.uk

10

www.bl.uk

11

Future ILS development (ILS/2)

Current ILS development seen just as the start

Extra records
 E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
 E.g. Preservation records
Links to other new BL systems
 E.g. Digital Object Management (images, web pages etc)

New releases of Ex Libris packages

www.bl.uk

12

Major Programmes/2

International Dunhuang Project

Digitisation Programme

13

Background










Digitisation Is The Process Of Converting Existing Physical Items Into Digital
Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character
Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image to Sub £1/Image.
Automatic Devices To Digitise Are Becoming Availalble But Are Expensive ~£150,000.
BL Has Had Fairly Ad Hoc Approach Driven By
 External Funding Opportunities
 Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project
Management
BL Has Created About 1.5M Digital Images So Far…

www.bl.uk

14

Digitisation Strategy



Digitisation Strategy Project Was Formerly Initiated On February 2, 2004


Key objectives for the project are to define:










Selection Criteria
Uniform Approach
Communications Plan
Sustainability
Intellectual Property Rights
External Relationship Management
Funding
Integration with DOMS

www.bl.uk

15

Project Status Information


Definitive Register of Projects




19 Complete
19 Current
20 Planning



JISC Sound (3,900 Hours)



JISC Newspapers (2M Pages of 750M Pages)



Chopin (Collaborative Project)



Early English Books Online
www.bl.uk

16

Major Programmes/3

Gutenberg Bible

Digital Object Management (DOM)
Programme

17

DOM Programme vision
Our mission is to enable the United Kingdom to preserve
and use its digital intellectual property forever

Our vision is create a management system for digital objects
that will
 store and preserve any type of digital material in perpetuity
 provide access to this material to users with appropriate permissions
 ensure that the material is easy to find
 ensure that users can view the material with contemporary applications
 ensure that users can, where possible, experience material with the original look-and-feel

www.bl.uk

18

Introduction - history

Digital Library PFI
 Mar 1997 – Dec 1998
Digital Library System
 1999 – early 2002
 Lessons
DOM Report
 Nov 2002
The DOM Programme
 Started September 2003

www.bl.uk

19

Drivers for the BL DOM Programme












Legal deposit legislation for non-print material was granted royal assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12T of material per year (with 50 year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….

We need a generic and cost-effective approach for the secure long term
storage of digital material that is produced by numerous initiatives

www.bl.uk

20

DOM – many topics to address
HIGH

ILS
WEB
ARCHIVING

DIGITISATION
PROGRAMME

ESTIMATED SIZE OF COMPONENT

LDEP

WORKFLOW
RESOURCE
DISCOVERY
LDLSE

VDEP

RIGHTS
MANAGEMENT
METADATA
DEFINITION

TECHNICAL
REQUIREMENTS

FILE
FORMAT
REGISTRY

PERSISTENT
IDENTIFIERS

FILE
CONVERSION
UTILITIES

AUTHENTICATION
PROTOTYPES

SDM
RADM

INTERFACES

STRATEGY
DEVELOPMENT

LOW
LOW

COMPONENT AMBIGUITY / COMPLEXITY

Started

Planned

Non-DOM projects

Planned co-operation

HIGH
LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications

www.bl.uk

21

Scope - life cycle of objects


Collection
 Selection
 Acquisition
 Accession
 Description
 Preservation
 Storage
 Preservation
 Access
 Resource discovery
 Delivery
 Rendering
www.bl.uk

22

Scope – objects and processes


Preservation store
 Preserves the bit stream in perpetuity
 Access store
 Access versions
 Limited formats – in the flavour of the era
 Metadata to support resource discovery
 Descriptive, Administrative, Links with existing tools e.g. Integrated
Library System (ILS)
 Workflow
 Ingest, e.g. Legal Deposit processing

www.bl.uk

23

ACCESS

DOM

Resource Discovery

Delivery

Digital Rights Management

Shared services

DOM
Storage

Signing
Authentication
Metadata
Persistent ID

Ingest

DONATIONS

DOCUMENT SUPPLY
Publishers
Archives
Grey Literature

Non-Serial
Store
Archiving
Operational
Stores

LEGAL DEPOSIT

LDL Secure
Environment

Legal Deposit
Items
Legal Deposit
Processing

WEB
ARCHIVING

DIGITISATION
St Pancras
Studios

Newspapers

NSA
www.bl.uk

24

Timeline

Prototype will provide a
basic preservationquality digital object
storage module

Definition. R0

ET approve
Business Case
& Timeline

•Consolidate R0 into operational
system
•Provide preservation-quality digital
store for materials received under
Voluntary Deposit of Electronic
Publications (VDEP)
•Integrate it with the existing VDEP
front-end

BC
R0
Operat’l Storage
Sub System. R1

•Support ingest for a
major content stream
•Integrate with core
Library systems as
required

R1

1st Content
Stream ingest. R2

R2

LDEP - initial
format. R3 & R4

Provide functionality
for material covered by
LDEP secondary
legislation

R3
R4

Open DOM to new
projects. R5+

2003

2004

2005
www.bl.uk

25

DOM: Project definition - 1
Example issues
Functional Architecture “What”

digital rights, file
formats, etc

Prototyping – assessing market solutions

allow changes to
new suppliers,
relationships to ILS,
other projects etc

Logical architecture “how – overall architecture”
Prototyping - basic functioning architecture

how do we build it
cost-effectively
today, supplier
selection criteria

Physical architecture “how – storage & specifics”
Prototyping - principal solutions and options

• Business case
• Planning – incremental
implementation phases

Cross team workshops – reviewing progress, debating detailed technical
issues, planning immediate priorities, risk management & way forward
www.bl.uk

26

DOM: Project definition - 2



Approach is to be incremental and not ‘Big Bang’
We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a
good solution



A principal goal is to define:
 An overall long term “logical architecture”
 Within which, there will be successive generations of physical architectures



We are understanding the storage marketplace, and we will use the knowledge to manage
procurement
We are certain that we will need >500T of storage but we are uncertain when – we thus need
flexible scalable procurement



www.bl.uk

27

DOM architecture - overview
Resource Discovery
Non-cat
ILS
based RD

Others

Rights Management
LDEP

DOMID

OBJECT

DOM Storage Service

Compound
objects/relations
Integrity

Authenticity

Local resource locator

Object

Atomic
Objects

Unique persistent
identifier (DOMID)

Doc
supply

DOMID is
mapped to
node/vol/LRL

DOM Physical Storage

www.bl.uk

28

DOM System (release 3)

Aleph

Storage subsystem
Storage
subsystem

Access

Mailroom

Publishers

Shared services

Administration
DOM System
www.bl.uk

29

DOM logical architecture –
integrity and authenticity


Integrity:
 System has capability to continuously monitor the object store to detect
object corruption
 It would then initiate object recovery
 Authenticity:
 A process is defined to provide long-term assurance that an object that
is re-presented is as it was when it was ingested
 Based on the use of cryptographic signing techniques
 Each object is signed when it is ingested
 The signature is verified when required
 The signing mechanism is “tightly” controlled

www.bl.uk

30

Procuring physical storage in volume









A major cost is in physical storage
The market for storage systems is changing rapidly, and this implies that “lock-in” is
not sensible
We thus need flexibility to change supplier over time
Cost of storage is reducing by 30-40% per year
Hence procure on rolling basis just ahead of demand
Replace storage on a rolling basis on expiry of warranty
The rolling programmes imply the need to be able to support a heterogeneous
product solution
The design of the logical architecture thus supports storage sourced from multiple
storage vendors

www.bl.uk

31

Disaster tolerance and the organisation
of storage clusters









One can obtain commercial disaster recovery (DR) solutions for common equipment
configurations
However one cannot obtain such solutions for systems comprising multi-100 Tb systems
So we must build in the need for DR into the design of the system
A single site solution, subject to a common-mode disaster, would suffer considerable loss
of availability after a disaster, and so is not acceptable
This implies that we need a multi-site solution
Conventionally these are based on a master-standby where only 50% of kit is delivering
normal service
Our design is based on the use of multiple autonomous independent peer clusters that
cross-synchronise
so 100% of the kit delivers normal service

www.bl.uk

32

DOM architecture in the context of the
storage solution market


The dominant segment of the market focuses on delivering performance within
a highly resilient single cluster
 However:
 Many of our objects will be rarely accessed
 so we do not want to pay for “maximised” performance we do not need
 We have resilience by using multiple clusters, hence we have a reduced
need for resilience within a cluster
 so we do not want to pay for “maximised” resilience we do not need
 We are using these drivers to design a cost-effective large scale resilient
solution

www.bl.uk

33

DOM storage subsystem architecture - overview

DOM
Storage
gateway

DOM
Storage
gateway

DOM
Storage
Service

DOM
Storage
Service

DOM
Physical
Storage
Storage cluster

DOM
Shared
Services
• Unique ID
• Signing
• Logging

DOM
Physical
Storage
Storage cluster

www.bl.uk

34

DOM storage subsystem architecture - access

Normal access/delivery is from the local storage cluster

[Diagram: two or more storage clusters, each comprising a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, with DOM Shared Services (Unique ID, Signing, Logging); the access path runs through the local cluster]


DOM storage subsystem architecture - access

When a cluster is off-line, access/delivery is from a remote storage cluster

[Diagram: two or more storage clusters, each comprising a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, with DOM Shared Services (Unique ID, Signing, Logging); the access path runs through a remote cluster]
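
The two access slides can be read as a simple failover rule: try the local cluster first, then fall back to a remote peer. The sketch below is a minimal illustration under that reading; `Cluster`, `ClusterOffline` and `fetch` are hypothetical names, and the in-memory dictionary merely stands in for DOM Physical Storage.

```python
class ClusterOffline(Exception):
    """Raised when a storage cluster cannot serve requests."""


class Cluster:
    """Illustrative stand-in for one storage cluster (not the DOM implementation)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.online = True
        self.objects: dict[str, bytes] = {}

    def read(self, dom_id: str) -> bytes:
        if not self.online:
            raise ClusterOffline(self.name)
        return self.objects[dom_id]


def fetch(dom_id: str, local: Cluster, remotes: list[Cluster]) -> tuple[str, bytes]:
    """Return (serving cluster name, object bytes), preferring the local cluster."""
    for cluster in [local, *remotes]:
        try:
            return cluster.name, cluster.read(dom_id)
        except ClusterOffline:
            continue  # this cluster is down, try the next peer
    raise ClusterOffline(f"no cluster available for {dom_id}")
```

The sketch assumes every peer already holds a synchronised copy of the object, which is what the cross-synchronising design is intended to provide.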


DOM storage subsystem architecture - ingest

Normal ingest is to the local storage cluster and then the remote cluster is synchronised

[Diagram: two or more storage clusters, each comprising a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, with DOM Shared Services (Unique ID, Signing, Logging); arrows labelled "Store" (to the local cluster) and "Synchronise remote store"]


DOM storage subsystem architecture - ingest

When a cluster is off-line, ingest is managed by the remote storage cluster and the local cluster is synchronised later

[Diagram: two or more storage clusters, each comprising a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, with DOM Shared Services (Unique ID, Signing, Logging); arrows labelled "Store" (to the remote cluster) and "Synchronise remote store later"]
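
The two ingest slides describe a store-then-synchronise pattern with deferred synchronisation when a peer is off-line. The following is a minimal sketch of that pattern for two clusters; `Cluster`, `ingest` and `resync` are hypothetical names chosen for illustration, and the in-memory `pending` list is an assumption about how outstanding work might be tracked.

```python
class Cluster:
    """Illustrative stand-in for one storage cluster (not the DOM implementation)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.online = True
        self.objects: dict[str, bytes] = {}
        self.pending: list[tuple["Cluster", str]] = []  # (peer, dom_id) awaiting sync

    def store(self, dom_id: str, data: bytes) -> None:
        if not self.online:
            raise ConnectionError(f"{self.name} is off-line")
        self.objects[dom_id] = data


def ingest(dom_id: str, data: bytes, local: Cluster, remote: Cluster) -> str:
    """Store on the local cluster if it is up, otherwise on the remote one.

    The other cluster is synchronised immediately when possible, or queued
    for later. Returns the name of the cluster that accepted the object.
    """
    primary, peer = (local, remote) if local.online else (remote, local)
    primary.store(dom_id, data)
    if peer.online:
        peer.store(dom_id, data)
    else:
        primary.pending.append((peer, dom_id))
    return primary.name


def resync(cluster: Cluster) -> None:
    """Push queued objects to peers that have come back on-line."""
    still_pending = []
    for peer, dom_id in cluster.pending:
        if peer.online:
            peer.store(dom_id, cluster.objects[dom_id])
        else:
            still_pending.append((peer, dom_id))
    cluster.pending = still_pending
```

A real system would persist the pending queue rather than hold it in memory; the sketch only shows the control flow.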


In conclusion



We plan for generations of physical storage
• Migration from one generation to the next
• Allow changes of supplier
• Purchase incrementally in modest quantities
• Move quickly when required
• Be cost conscious

We provide assurance that an object is held and re-presented as it was when it was
ingested

We are designing a cost-effective, large-scale resilient solution

In summary: we take a long-term view
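
The assurance that an object is re-presented as it was when ingested can be illustrated with a sign-on-ingest, verify-on-access sketch. The talk says only that cryptographic signing techniques are used; the HMAC-SHA256 keyed digest below is a stand-in chosen for brevity, and `sign` and `verify` are hypothetical names.

```python
import hashlib
import hmac


def sign(data: bytes, key: bytes) -> str:
    """Compute a keyed digest of the object at ingest time.

    HMAC-SHA256 is an assumption made for this sketch; the actual DOM
    signing mechanism is not specified in the talk.
    """
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify(data: bytes, key: bytes, signature: str) -> bool:
    """Check that the object still matches the signature stored at ingest."""
    return hmac.compare_digest(sign(data, key), signature)
```

In such a scheme the signature would be stored alongside the object, and any later change to the bytes would cause `verify` to return `False`. The key would need to be held under tight control.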


Major Programmes/4

Web Archiving Programme


Structure of Programme

Web Archiving Programme is a collaborative
initiative, broadly split across two
consortia
• UK Web Archiving Consortium
  • Developing a selective approach to web archiving, procuring a common
    web archiving infrastructure and software to begin archiving activities as
    early as possible
• International Internet Preservation Consortium
  • Developing advanced web archiving technologies for the long-term,
    large-scale, continuous crawling requirements enabled through
    legislation


UK Web Archiving Consortium
Developing a selective approach to web archiving
• Licence for PANDAS about to be signed with NLA
  • Sub-licences with consortium partners and contractor to follow
• ITT concluded with Magus Research winning the contract
  • Implement a common web archiving infrastructure (lots of Linux
    machines + PANDAS)
  • Provide customisation/development of PANDAS
  • Provide help desk and support


International Internet Preservation
Consortium
Developing advanced web archiving technologies
• Smart Crawler
  • Continuous adaptive crawler, adjusting crawl priority on the fly
  • Based on IA Heritrix
  • Working on requirements now
  • Expect to begin tender process in June
• Content Management
• Archival formats
• Framework
• Metrics and Test Bed
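
The idea of adjusting crawl priority on the fly can be sketched with a priority queue that allows a URL to be rescheduled. This is a generic illustration only: `CrawlFrontier` is a hypothetical class, it does not use or imitate the Heritrix API, and the rule that a lower number means "crawl sooner" is an assumption.

```python
import heapq
import itertools
from typing import Optional


class CrawlFrontier:
    """Priority queue of URLs whose priorities can be changed after queuing."""

    def __init__(self) -> None:
        self._heap: list[tuple[int, int, str]] = []
        self._latest: dict[str, int] = {}  # url -> id of its newest heap entry
        self._ids = itertools.count()

    def schedule(self, url: str, priority: int) -> None:
        """Add a URL, or re-prioritise one already queued (lower = sooner)."""
        entry_id = next(self._ids)
        self._latest[url] = entry_id
        heapq.heappush(self._heap, (priority, entry_id, url))

    def next_url(self) -> Optional[str]:
        """Pop the most urgent URL, skipping entries made stale by rescheduling."""
        while self._heap:
            _, entry_id, url = heapq.heappop(self._heap)
            if self._latest.get(url) == entry_id:
                del self._latest[url]
                return url
        return None
```

Rescheduling simply pushes a new entry and leaves the old one to be discarded when it surfaces, which avoids searching the heap when a site's priority changes.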


External Collaboration


Digital Library Collaborations/Partnerships
Current










UK Digital Preservation Coalition
• Founder Member
TEL (The European Library Project)
Web Archiving UK
• JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library Of Wales
International Internet Preservation Consortium
• BNF, Library Of Congress, Internet Archive, National Archives & Library Of Canada, National Library Of
Australia, National Library Of Italy, National Libraries Of Nordic Countries
JISC Funded - Digital Curation Centre
Persistent Identifiers
• DOI Foundation, European National Libraries (KB & DDB)
Resource Discovery
• Union Catalogues (SUNCAT)
Digital Library Federation


Digital Library Collaborations/Partnerships
Potential









Secure Legal Deposit Network
• 6 Legal Deposit Libraries
Global Digital Format Registry
• Potential Partners (National Archives, DLF)
Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
• KB (Netherlands National Library) & Other Partners – FP6 Bid
Digital Rights Management
• Potential Partners (Publishers, JISC)
Metadata
• Publishers, Others?
Authentication
• JISC?
Resource Discovery
• Search Engine Vendors, Researchers
Others?


Conclusions
Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success




Can You Work With Us?




Slide 9

British Computer Society
North London Branch

Major Programmes

Richard Boulderstone
July 27, 2004
1

Agenda











The British Library
Vision
Our Audiences/Customers
ILS
Digitisation
Digital Object Management
Web Archiving
Collaboration
Conclusions

Magna Carta

www.bl.uk

2

What Is The British Library ?

Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of
Science and Invention (1855), National Central Library (1916), and National
Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India
Office Library and Records in 1982, and British Institute of Recorded Sound in
1983
• Flagship building at St Pancras - largest public building project in Great
Britain in 20th century - opened in 1998


www.bl.uk

3

World-Class Research Library
Key Statistics 2002/3













150 million items
8.2 million items consulted or supplied
408,000 reading room visits
618,000 catalogue records created
554,000 items received on legal deposit
651 km shelf capacity 92% full add 12 km each year
18.5M Web Site Hits (www.bl.uk)
2,400 staff
£85.2 million Grant in Aid and £27.0 million trading income in 2001/2
Annual report - http://www.bl.uk/about/annual/latest.html

www.bl.uk

4

‘The World’s Knowledge’
Outcome Based Vision
… by aiding scientific advances
… by adding commercial value for
businesses
… by contributing to UK “knowledge
economy”
… through the pursuit of academic
excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural
wellbeing
… by giving information relevant to their
interests
… by helping to find the next medical
breakthrough
… by creating a link between the past,
present and future

To help people
advance
knowledge to
enrich lives

www.bl.uk

5

High
R+D
Industries

Prof.
Services

Creative
Industries Publishing
Industries

Resource Discovery
Bespoke Services
Research Services
Document Supply
Reprographics
Innovation Centre

School
Students
Libraries Teachers 11>18
On-site Visits
School Tours
Web Learning

SMEs
EDUCATION

Lifelong
Learner

Exhibitions
Events
Tours
Publishing

Visitors
(child + adult)
Lifelong
Learner

PUBLIC

BUSINESS
Lifelong
Learner

Postgraduate/
Undergraduate
RESEARCHER

Librarians
Scholars

Lifelong
Learner

Reading Rooms
Bespoke Services
Reprographics
Publishing
Document Supply
Searching Tools

LIBRARIES

Commercial
Researcher

Public
Libraries

Broadcasting
e.g. BBC

Publishing
e.g. OED

Document Supply
Resource Discovery
Training
Best Practice

H.E.
Libraries

Public

www.bl.uk

6
2

Major Programmes/1
Da Vinci Notebook

Integrated Library System (ILS)
Programme

7

ILS: Development

Data migration
 Due to finish in a few days



16M+ BL records
10M+ records from other sources

Online ILS software
 All online changes made (mainly interfaces) – final tests
 Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
 Most ones done for go live
 Rest in priority order
www.bl.uk

8

ILS: Implementation

Training
 Courses to end-users well underway
 ‘Practice’ system available
 ‘Search only’ training also underway
Testing
 Functional testing (end to end) nearly complete
 Performance poor – OPAC very slow
 Automated stress testing (LoadRunner scripts)
 eIS trying to find area of problem
 Ex Libris experts flying over
 Some security ‘hardening’ needed
www.bl.uk

9

ILS: Cutover from legacy systems

Now:

Temporary Aleph cataloguing

Phase 1 – internal processing
 Staggered take-on of users to ease cutover problems
 Merge ‘temporary’ records

7 June:

Phase 2 – reading rooms
 Reading rooms closed for cutover 26-29 June
 Mainly brand-new PCs etc rather than XP upgrade

30 June:

Phase 3 – remote users
 Could be delayed major problems

30 July:

www.bl.uk

10

www.bl.uk

11

Future ILS development (ILS/2)

Current ILS development seen just as the start

Extra records
 E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
 E.g. Preservation records
Links to other new BL systems
 E.g. Digital Object Management (images, web pages etc)

New releases of Ex Libris packages

www.bl.uk

12

Major Programmes/2

International Dunhuang Project

Digitisation Programme

13

Background










Digitisation Is The Process Of Converting Existing Physical Items Into Digital
Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character
Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image to Sub £1/Image.
Automatic Devices To Digitise Are Becoming Availalble But Are Expensive ~£150,000.
BL Has Had Fairly Ad Hoc Approach Driven By
 External Funding Opportunities
 Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project
Management
BL Has Created About 1.5M Digital Images So Far…

www.bl.uk

14

Digitisation Strategy



Digitisation Strategy Project Was Formerly Initiated On February 2, 2004


Key objectives for the project are to define:










Selection Criteria
Uniform Approach
Communications Plan
Sustainability
Intellectual Property Rights
External Relationship Management
Funding
Integration with DOMS

www.bl.uk

15

Project Status Information


Definitive Register of Projects




19 Complete
19 Current
20 Planning



JISC Sound (3,900 Hours)



JISC Newspapers (2M Pages of 750M Pages)



Chopin (Collaborative Project)



Early English Books Online
www.bl.uk

16

Major Programmes/3

Gutenberg Bible

Digital Object Management (DOM)
Programme

17

DOM Programme vision
Our mission is to enable the United Kingdom to preserve
and use its digital intellectual property forever

Our vision is create a management system for digital objects
that will
 store and preserve any type of digital material in perpetuity
 provide access to this material to users with appropriate permissions
 ensure that the material is easy to find
 ensure that users can view the material with contemporary applications
 ensure that users can, where possible, experience material with the original look-and-feel

www.bl.uk

18

Introduction - history

Digital Library PFI
 Mar 1997 – Dec 1998
Digital Library System
 1999 – early 2002
 Lessons
DOM Report
 Nov 2002
The DOM Programme
 Started September 2003

www.bl.uk

19

Drivers for the BL DOM Programme












Legal deposit legislation for non-print material was granted royal assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12T of material per year (with 50 year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….

We need a generic and cost-effective approach for the secure long term
storage of digital material that is produced by numerous initiatives

www.bl.uk

20

DOM – many topics to address
HIGH

ILS
WEB
ARCHIVING

DIGITISATION
PROGRAMME

ESTIMATED SIZE OF COMPONENT

LDEP

WORKFLOW
RESOURCE
DISCOVERY
LDLSE

VDEP

RIGHTS
MANAGEMENT
METADATA
DEFINITION

TECHNICAL
REQUIREMENTS

FILE
FORMAT
REGISTRY

PERSISTENT
IDENTIFIERS

FILE
CONVERSION
UTILITIES

AUTHENTICATION
PROTOTYPES

SDM
RADM

INTERFACES

STRATEGY
DEVELOPMENT

LOW
LOW

COMPONENT AMBIGUITY / COMPLEXITY

Started

Planned

Non-DOM projects

Planned co-operation

HIGH
LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications

www.bl.uk

21

Scope - life cycle of objects


Collection
 Selection
 Acquisition
 Accession
 Description
 Preservation
 Storage
 Preservation
 Access
 Resource discovery
 Delivery
 Rendering
www.bl.uk

22

Scope – objects and processes

• Preservation store
  • Preserves the bit stream in perpetuity
• Access store
  • Access versions
  • Limited formats – in the flavour of the era
• Metadata to support resource discovery
  • Descriptive, administrative, links with existing tools, e.g. Integrated Library System (ILS)
• Workflow
  • Ingest, e.g. Legal Deposit processing


[Diagram: DOM in context. Labels: Access (Resource Discovery, Delivery, Digital Rights Management); DOM Shared services (Signing, Authentication, Metadata, Persistent ID); DOM Storage; Ingest. Sources shown: Donations; Document Supply (Publishers, Archives, Grey Literature); Non-Serial Store, Archiving, Operational Stores; Legal Deposit (LDL Secure Environment, Legal Deposit Items, Legal Deposit Processing); Web Archiving; Digitisation (St Pancras Studios, Newspapers, NSA).]

Timeline

Years shown: 2003, 2004, 2005

• Definition (R0): prototype will provide a basic preservation-quality digital object storage module
• BC: ET approve Business Case & Timeline
• Operational Storage Sub System (R1)
  • Consolidate R0 into operational system
  • Provide preservation-quality digital store for materials received under Voluntary Deposit of Electronic Publications (VDEP)
  • Integrate it with the existing VDEP front-end
• 1st Content Stream ingest (R2)
  • Support ingest for a major content stream
  • Integrate with core Library systems as required
• LDEP – initial format (R3 & R4)
  • Provide functionality for material covered by LDEP secondary legislation
• Open DOM to new projects (R5+)

DOM: Project definition - 1

• Functional architecture (“what”)
  • Prototyping: assessing market solutions
  • Example issues: digital rights, file formats, etc.
• Logical architecture (“how – overall architecture”)
  • Prototyping: basic functioning architecture
  • Example issues: allow changes to new suppliers, relationships to ILS, other projects, etc.
• Physical architecture (“how – storage & specifics”)
  • Prototyping: principal solutions and options
  • Example issues: how do we build it cost-effectively today, supplier selection criteria
• Business case
• Planning – incremental implementation phases

Cross-team workshops – reviewing progress, debating detailed technical
issues, planning immediate priorities, risk management & way forward

DOM: Project definition - 2

• Approach is to be incremental and not ‘Big Bang’
• We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a good solution
• A principal goal is to define:
  • An overall long-term “logical architecture”
  • Within which there will be successive generations of physical architectures
• We are building an understanding of the storage marketplace, and we will use that knowledge to manage procurement
• We are certain that we will need >500 TB of storage, but we are uncertain when, so we need flexible, scalable procurement




DOM architecture - overview

[Diagram: services using DOM (Resource Discovery, shown as Non-cat and ILS based RD; Rights Management; LDEP; Doc supply; Others) exchange a DOMID and an object with the DOM Storage Service. Labels on the DOM Storage Service: compound objects/relations, integrity, authenticity, unique persistent identifier (DOMID). The DOMID is mapped to node/vol/LRL (local resource locator). Atomic objects are held in DOM Physical Storage.]
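
The separation between a persistent DOMID and its physical location (node/vol/LRL) can be illustrated with a small sketch. This is a hypothetical model in Python, not the Library's implementation: the class names, the `Location` fields and the identifier strings are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Location:
    """Hypothetical physical location of one copy of an atomic object."""
    node: str    # storage node (assumed naming)
    volume: str  # volume on that node
    lrl: str     # local resource locator, e.g. a path within the volume


class DomidResolver:
    """Keeps the persistent DOMID separate from where the bytes live."""

    def __init__(self) -> None:
        self._locations: Dict[str, List[Location]] = {}

    def register(self, domid: str, location: Location) -> None:
        """Record that a copy of the object is held at this location."""
        self._locations.setdefault(domid, []).append(location)

    def resolve(self, domid: str) -> List[Location]:
        """Return every known location for the DOMID (empty if unknown)."""
        return list(self._locations.get(domid, []))

    def migrate(self, domid: str, old: Location, new: Location) -> None:
        """Move one copy to new storage; the DOMID itself never changes."""
        copies = self._locations[domid]
        copies[copies.index(old)] = new


resolver = DomidResolver()
first = Location("node-a", "vol-01", "0001/object.bin")
resolver.register("domid-0001", first)
resolver.migrate("domid-0001", first, Location("node-b", "vol-07", "0001/object.bin"))
```

Because callers only ever hold the DOMID, a copy can move to a new generation of physical storage without any change on the client side.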


DOM System (release 3)

[Diagram: DOM System (release 3). Labels: Aleph, Storage subsystem (two instances), Access, Mailroom, Publishers, Shared services, Administration.]

DOM logical architecture – integrity and authenticity

• Integrity:
  • System has the capability to continuously monitor the object store to detect object corruption
  • It would then initiate object recovery
• Authenticity:
  • A process is defined to provide long-term assurance that an object that is re-presented is as it was when it was ingested
  • Based on the use of cryptographic signing techniques
  • Each object is signed when it is ingested
  • The signature is verified when required
  • The signing mechanism is “tightly” controlled
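
The two checks described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the DOM design: SHA-256 is assumed as the fixity digest, and HMAC from the standard library stands in for whatever cryptographic signing scheme is actually used. The key, the object identifiers and the in-memory store are hypothetical.

```python
import hashlib
import hmac


def digest(data: bytes) -> str:
    """SHA-256 fixity value for an object's bit stream."""
    return hashlib.sha256(data).hexdigest()


def sign(data: bytes, key: bytes) -> str:
    """Sign at ingest. HMAC stands in for a real signature scheme here."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify(data: bytes, key: bytes, signature: str) -> bool:
    """Check, on re-presentation, that the object is as it was at ingest."""
    return hmac.compare_digest(sign(data, key), signature)


def find_corrupt(store: dict, fixity: dict) -> list:
    """Integrity sweep: return ids whose stored bytes no longer match."""
    return [oid for oid, data in store.items() if digest(data) != fixity[oid]]


key = b"demo-key-held-by-the-signing-service"  # assumption: tightly controlled
store = {"obj-1": b"Magna Carta scan", "obj-2": b"Sound recording"}
fixity = {oid: digest(data) for oid, data in store.items()}
signatures = {oid: sign(data, key) for oid, data in store.items()}

store["obj-2"] = b"Sound recordinG"            # simulate silent corruption
corrupt = find_corrupt(store, fixity)          # ids to recover from another copy
```

The sweep only detects corruption; recovery would mean fetching a good copy from elsewhere and re-verifying its signature before replacing the damaged one.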


Procuring physical storage in volume

• A major cost is in physical storage
• The market for storage systems is changing rapidly, so “lock-in” is not sensible
• We thus need flexibility to change supplier over time
• Cost of storage is reducing by 30-40% per year
• Hence procure on a rolling basis, just ahead of demand
• Replace storage on a rolling basis on expiry of warranty
• The rolling programmes imply the need to support a heterogeneous product solution
• The design of the logical architecture thus supports storage sourced from multiple storage vendors
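
The case for buying just ahead of demand can be checked with simple arithmetic. The sketch below is illustrative only: the five-year demand profile, the starting price per terabyte and the 35% decline (the midpoint of the 30-40% range quoted above) are assumptions, and warranty replacement and the time value of money are ignored.

```python
def upfront_cost(total_tb: float, price_per_tb: float) -> float:
    """Buy all capacity in year 0 at today's price."""
    return total_tb * price_per_tb


def rolling_cost(yearly_tb: list, price_per_tb: float, annual_decline: float) -> float:
    """Buy each year's capacity at that year's (falling) price."""
    total = 0.0
    for year, tb in enumerate(yearly_tb):
        total += tb * price_per_tb * (1.0 - annual_decline) ** year
    return total


demand = [100, 100, 100, 100, 100]  # assumed: 500 TB reached over five years
price = 1000.0                      # hypothetical cost units per TB in year 0
decline = 0.35                      # midpoint of the 30-40% range

big_bang = upfront_cost(sum(demand), price)
rolling = rolling_cost(demand, price, decline)
saving = 1.0 - rolling / big_bang
```

Under these assumptions the rolling purchase costs roughly half as much as buying all 500 TB at the outset, and the gap widens if the capacity is needed later than expected.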


Disaster tolerance and the organisation of storage clusters

• One can obtain commercial disaster recovery (DR) solutions for common equipment configurations
• However, one cannot obtain such solutions for systems holding several hundred terabytes
• So we must build DR into the design of the system
• A single-site solution, subject to a common-mode disaster, would suffer considerable loss of availability after a disaster, and so is not acceptable
• This implies that we need a multi-site solution
• Conventionally these are based on a master-standby arrangement, where only 50% of the kit is delivering normal service
• Our design is based on multiple autonomous, independent peer clusters that cross-synchronise, so 100% of the kit delivers normal service


DOM architecture in the context of the storage solution market

• The dominant segment of the market focuses on delivering performance within a highly resilient single cluster
• However:
  • Many of our objects will be rarely accessed
    • so we do not want to pay for “maximised” performance we do not need
  • We have resilience by using multiple clusters, hence we have a reduced need for resilience within a cluster
    • so we do not want to pay for “maximised” resilience we do not need
• We are using these drivers to design a cost-effective, large-scale, resilient solution


DOM storage subsystem architecture - overview

[Diagram: two storage clusters, each comprising a DOM Storage Gateway, a DOM Storage Service and DOM Physical Storage, together with DOM Shared Services (Unique ID, Signing, Logging).]


DOM storage subsystem architecture - access

Normal access/delivery is from the local storage cluster.

[Diagram: two storage clusters (DOM Storage Gateway, DOM Storage Service, DOM Physical Storage) and DOM Shared Services (Unique ID, Signing, Logging).]


DOM storage subsystem architecture - access

When a cluster is off-line, access/delivery is from a remote storage cluster.

[Diagram: two storage clusters (DOM Storage Gateway, DOM Storage Service, DOM Physical Storage) and DOM Shared Services (Unique ID, Signing, Logging).]


DOM storage subsystem architecture - ingest

Normal ingest is to the local storage cluster, and then the remote cluster is synchronised.

[Diagram: two storage clusters (DOM Storage Gateway, DOM Storage Service, DOM Physical Storage) and DOM Shared Services (Unique ID, Signing, Logging). Arrows labelled: Store; Synchronise remote store.]


DOM storage subsystem architecture - ingest

When a cluster is off-line, ingest is managed by the remote storage cluster and the local cluster is synchronised later.

[Diagram: two storage clusters (DOM Storage Gateway, DOM Storage Service, DOM Physical Storage) and DOM Shared Services (Unique ID, Signing, Logging). Arrows labelled: Store; Synchronise remote store later.]
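
The access and ingest behaviour shown in the preceding slides (serve locally when possible, fall back to the peer when a cluster is off-line, synchronise afterwards) can be modelled as a toy example. The Python below is a sketch under assumptions: the class names, the in-memory object store and the `pending` list are hypothetical and say nothing about how the real gateways or storage services are built.

```python
class Cluster:
    """One autonomous storage cluster (hypothetical model)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.online = True
        self.objects = {}   # DOMID -> bytes
        self.pending = []   # DOMIDs still to be copied to the peer


class PeerPair:
    """Two peer clusters that cross-synchronise; neither is a standby."""

    def __init__(self, a: Cluster, b: Cluster) -> None:
        self.clusters = {a.name: a, b.name: b}
        self.peer = {a.name: b, b.name: a}

    def _serving(self, local: str) -> Cluster:
        """Prefer the local cluster; fall back to its peer if it is off-line."""
        cluster = self.clusters[local]
        target = cluster if cluster.online else self.peer[local]
        if not target.online:
            raise RuntimeError("no cluster available")
        return target

    def ingest(self, local: str, domid: str, data: bytes) -> str:
        """Store on the serving cluster, then synchronise if the peer is up."""
        target = self._serving(local)
        target.objects[domid] = data
        target.pending.append(domid)
        self.synchronise()
        return target.name

    def read(self, local: str, domid: str) -> bytes:
        return self._serving(local).objects[domid]

    def synchronise(self) -> None:
        """Copy pending objects to the peer whenever both clusters are on-line."""
        for cluster in self.clusters.values():
            other = self.peer[cluster.name]
            if cluster.online and other.online:
                for domid in cluster.pending:
                    other.objects[domid] = cluster.objects[domid]
                cluster.pending.clear()


pair = PeerPair(Cluster("site-1"), Cluster("site-2"))
pair.ingest("site-1", "domid-1", b"first")               # normal: local, then sync
pair.clusters["site-1"].online = False
served_by = pair.ingest("site-1", "domid-2", b"second")  # handled by the remote peer
pair.clusters["site-1"].online = True
pair.synchronise()                                       # local cluster catches up
```

Both clusters take normal traffic in this model, which is the point of the peer arrangement: no capacity sits idle as a standby.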


In conclusion

• We plan for generations of physical storage
  • Migration from one generation to the next
  • Allow changes of supplier
  • Purchase incrementally in modest quantities
  • Move quickly when required
  • Be cost conscious
• We provide assurance that an object is held and re-presented as it was when ingested
• We are designing a cost-effective, large-scale, resilient solution

In summary: we take a long-term view


Major Programmes/4

Web Archiving Programme


Structure of Programme

The Web Archiving Programme is a collaborative
initiative, implemented broadly across two
consortia:
• UK Web Archiving Consortium
  • Developing a selective approach to web archiving, and procuring a common web archiving infrastructure and software so that archiving can begin as early as possible
• International Internet Preservation Consortium
  • Developing advanced web archiving technologies for the long-term, large-scale, continuous crawling requirements enabled through legislation


UK Web Archiving Consortium
Developing a selective approach to web archiving
• Licence for PANDAS about to be signed with NLA
• Sub-licences with consortium partners and contractor to follow
• ITT concluded, with Magus Research winning the contract to:
  • Implement a common web archiving infrastructure (lots of Linux machines + PANDAS)
  • Provide customisation/development of PANDAS
  • Provide help desk and support


International Internet Preservation Consortium
Developing advanced web archiving technologies
• Smart Crawler
  • Continuous adaptive crawler, adjusting crawl priority on the fly
  • Based on IA Heritrix
  • Working on requirements now
  • Expect to begin tender process in June
• Content Management
  • Archival formats
• Framework
• Metrics and Test Bed
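
What "adjusting crawl priority on the fly" means for a crawler's queue can be shown with a small frontier sketch. This is not Heritrix's algorithm or API: it is a generic priority queue built on Python's standard `heapq` module, and the class name, URLs and priority values are hypothetical.

```python
import heapq
import itertools


class CrawlFrontier:
    """Priority queue of URLs whose priority can be changed on the fly."""

    def __init__(self) -> None:
        self._heap = []
        self._current = {}                # url -> live heap entry
        self._counter = itertools.count() # tie-breaker for equal priorities

    def schedule(self, url: str, priority: float) -> None:
        """Add a URL or re-prioritise it; lower numbers are crawled sooner."""
        if url in self._current:
            self._current[url][2] = None  # invalidate the old entry
        entry = [priority, next(self._counter), url]
        self._current[url] = entry
        heapq.heappush(self._heap, entry)

    def next_url(self) -> str:
        """Pop the most urgent live URL, skipping invalidated entries."""
        while self._heap:
            _, _, url = heapq.heappop(self._heap)
            if url is not None:
                del self._current[url]
                return url
        raise IndexError("frontier is empty")


frontier = CrawlFrontier()
frontier.schedule("http://example.org/news", 5.0)
frontier.schedule("http://example.org/about", 1.0)
frontier.schedule("http://example.org/news", 0.5)  # page found to change often
order = [frontier.next_url(), frontier.next_url()]
```

An adaptive crawler would call `schedule` again whenever new evidence (for example, an observed change rate) alters how urgently a page should be revisited.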


External Collaboration


Digital Library Collaborations/Partnerships
Current

• UK Digital Preservation Coalition
  • Founder member
• TEL (The European Library Project)
• Web Archiving UK
  • JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library of Wales
• International Internet Preservation Consortium
  • BNF, Library of Congress, Internet Archive, National Archives & Library of Canada, National Library of Australia, National Library of Italy, National Libraries of Nordic Countries
• JISC-funded Digital Curation Centre
• Persistent Identifiers
  • DOI Foundation, European National Libraries (KB & DDB)
• Resource Discovery
  • Union Catalogues (SUNCAT)
• Digital Library Federation


Digital Library Collaborations/Partnerships
Potential

• Secure Legal Deposit Network
  • 6 Legal Deposit Libraries
• Global Digital Format Registry
  • Potential partners (National Archives, DLF)
• Patch (Standards & Tools to Extend the OAIS Model for Emulation & Migration)
  • KB (Netherlands National Library) & other partners – FP6 bid
• Digital Rights Management
  • Potential partners (Publishers, JISC)
• Metadata
  • Publishers, others?
• Authentication
  • JISC?
• Resource Discovery
  • Search engine vendors, researchers
• Others?


Conclusions
• Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success

Can You Work With Us?



Slide 11

British Computer Society
North London Branch

Major Programmes

Richard Boulderstone
July 27, 2004
1

Agenda











The British Library
Vision
Our Audiences/Customers
ILS
Digitisation
Digital Object Management
Web Archiving
Collaboration
Conclusions

Magna Carta

www.bl.uk

2

What Is The British Library ?

Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of
Science and Invention (1855), National Central Library (1916), and National
Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India
Office Library and Records in 1982, and British Institute of Recorded Sound in
1983
• Flagship building at St Pancras - largest public building project in Great
Britain in 20th century - opened in 1998


www.bl.uk

3

World-Class Research Library
Key Statistics 2002/3













150 million items
8.2 million items consulted or supplied
408,000 reading room visits
618,000 catalogue records created
554,000 items received on legal deposit
651 km shelf capacity 92% full add 12 km each year
18.5M Web Site Hits (www.bl.uk)
2,400 staff
£85.2 million Grant in Aid and £27.0 million trading income in 2001/2
Annual report - http://www.bl.uk/about/annual/latest.html

www.bl.uk

4

‘The World’s Knowledge’
Outcome Based Vision
… by aiding scientific advances
… by adding commercial value for
businesses
… by contributing to UK “knowledge
economy”
… through the pursuit of academic
excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural
wellbeing
… by giving information relevant to their
interests
… by helping to find the next medical
breakthrough
… by creating a link between the past,
present and future

To help people
advance
knowledge to
enrich lives

www.bl.uk

5

[Diagram: Our Audiences/Customers]
Segments: BUSINESS, EDUCATION, PUBLIC, RESEARCHER, LIBRARIES
User groups shown: High R+D Industries, Prof. Services, Creative Industries, Publishing Industries, SMEs, School Libraries, Teachers, Students 11>18, Visitors (child + adult), Postgraduate/Undergraduate, Librarians, Scholars, Commercial Researcher, Public Libraries, H.E. Libraries, Broadcasting (e.g. BBC), Publishing (e.g. OED), Lifelong Learner
Services shown: Resource Discovery, Bespoke Services, Research Services, Document Supply, Reprographics, Innovation Centre, On-site Visits, School Tours, Web Learning, Exhibitions, Events, Tours, Publishing, Reading Rooms, Searching Tools, Training, Best Practice

www.bl.uk

6
2

Major Programmes/1
Da Vinci Notebook

Integrated Library System (ILS)
Programme

7

ILS: Development

Data migration
 • Due to finish in a few days
 • 16M+ BL records
 • 10M+ records from other sources
Online ILS software
 • All online changes made (mainly interfaces) – final tests
 • Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
 • Most of those needed for go-live are done
 • Rest in priority order
www.bl.uk

8

ILS: Implementation

Training
 • Courses for end-users well underway
 • ‘Practice’ system available
 • ‘Search only’ training also underway
Testing
 • Functional testing (end to end) nearly complete
 • Performance poor – OPAC very slow
 • Automated stress testing (LoadRunner scripts)
 • eIS trying to find area of problem
 • Ex Libris experts flying over
 • Some security ‘hardening’ needed
www.bl.uk

9

ILS: Cutover from legacy systems

Now: Temporary Aleph cataloguing

7 June: Phase 1 – internal processing
 • Staggered take-on of users to ease cutover problems
 • Merge ‘temporary’ records

30 June: Phase 2 – reading rooms
 • Reading rooms closed for cutover 26-29 June
 • Mainly brand-new PCs etc rather than XP upgrade

30 July: Phase 3 – remote users
 • Could be delayed by major problems

www.bl.uk

10

www.bl.uk

11

Future ILS development (ILS/2)

Current ILS development seen just as the start

Extra records
 • E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
 • E.g. Preservation records
Links to other new BL systems
 • E.g. Digital Object Management (images, web pages etc)

New releases of Ex Libris packages

www.bl.uk

12

Major Programmes/2

International Dunhuang Project

Digitisation Programme

13

Background










• Digitisation Is The Process Of Converting Existing Physical Items Into Digital Surrogates.
• Digitisation Projects Must Take Into Account Metadata Creation, Optical Character Recognition, Navigation, Display, Archiving, Preservation.
• Overall Cost Of Digitisation Projects Is Declining, From £10s/Image to Sub £1/Image.
• Automatic Devices To Digitise Are Becoming Available But Are Expensive (~£150,000).
• BL Has Had Fairly Ad Hoc Approach Driven By
   • External Funding Opportunities
   • Curator Interest
• Projects Have Generally Created Their Own Approach, IT Resources, Project Management
• BL Has Created About 1.5M Digital Images So Far…

www.bl.uk

14

Digitisation Strategy



Digitisation Strategy Project Was Formally Initiated On February 2, 2004


Key objectives for the project are to define:
 • Selection Criteria
 • Uniform Approach
 • Communications Plan
 • Sustainability
 • Intellectual Property Rights
 • External Relationship Management
 • Funding
 • Integration with DOMS

www.bl.uk

15

Project Status Information


• Definitive Register of Projects
   • 19 Complete
   • 19 Current
   • 20 Planning
• JISC Sound (3,900 Hours)
• JISC Newspapers (2M Pages of 750M Pages)
• Chopin (Collaborative Project)
• Early English Books Online
www.bl.uk

16

Major Programmes/3

Gutenberg Bible

Digital Object Management (DOM)
Programme

17

DOM Programme vision
Our mission is to enable the United Kingdom to preserve and use its digital intellectual property forever

Our vision is to create a management system for digital objects that will
 • store and preserve any type of digital material in perpetuity
 • provide access to this material to users with appropriate permissions
 • ensure that the material is easy to find
 • ensure that users can view the material with contemporary applications
 • ensure that users can, where possible, experience material with the original look-and-feel

www.bl.uk

18

Introduction - history

Digital Library PFI
 • Mar 1997 – Dec 1998
Digital Library System
 • 1999 – early 2002
 • Lessons
DOM Report
 • Nov 2002
The DOM Programme
 • Started September 2003

www.bl.uk

19

Drivers for the BL DOM Programme












• Legal deposit legislation for non-print material was granted royal assent in October 2003
• Existing voluntary deposit scheme operational since 2000
• Storage of digitised masters from early ’90s onwards
• New digitisation initiatives: newspapers, sound, etc
• Sound archive receives 12 TB of material per year (with 50 year collection)
• Web archiving
• Cartography and datasets
• Electronic journals, picture library
• … and ….
• …. and ….

We need a generic and cost-effective approach for the secure long term
storage of digital material that is produced by numerous initiatives

www.bl.uk

20

DOM – many topics to address
[Chart: components plotted by ESTIMATED SIZE OF COMPONENT (LOW to HIGH) against COMPONENT AMBIGUITY / COMPLEXITY (LOW to HIGH)]
Components shown: LDEP, Workflow, Resource Discovery, LDLSE, VDEP, Rights Management, Metadata Definition, Technical Requirements, File Format Registry, Persistent Identifiers, File Conversion Utilities, Authentication, Prototypes, SDM, RADM, Interfaces, Strategy Development
Related programmes shown: ILS, Web Archiving, Digitisation Programme
Legend: Started, Planned, Non-DOM projects, Planned co-operation

LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications

www.bl.uk

21

Scope - life cycle of objects


• Collection
   • Selection
   • Acquisition
   • Accession
   • Description
• Preservation
   • Storage
   • Preservation
• Access
   • Resource discovery
   • Delivery
   • Rendering
www.bl.uk

22

Scope – objects and processes


• Preservation store
   • Preserves the bit stream in perpetuity
• Access store
   • Access versions
   • Limited formats – in the flavour of the era
• Metadata to support resource discovery
   • Descriptive, Administrative, Links with existing tools e.g. Integrated Library System (ILS)
• Workflow
   • Ingest, e.g. Legal Deposit processing

www.bl.uk

23

[Diagram: DOM in context]
ACCESS: Resource Discovery, Delivery, Digital Rights Management
Shared services: Signing, Authentication, Metadata, Persistent ID
DOM Storage, fed through Ingest
Sources shown: DONATIONS; DOCUMENT SUPPLY (Publishers, Archives, Grey Literature; Non-Serial Store, Archiving, Operational Stores); LEGAL DEPOSIT (LDL Secure Environment, Legal Deposit Items, Legal Deposit Processing); WEB ARCHIVING; DIGITISATION (St Pancras Studios, Newspapers, NSA)
www.bl.uk

24

Timeline

[Timeline diagram, 2003 to 2005]

BC: ET approve Business Case & Timeline

R0: Definition
 • Prototype will provide a basic preservation-quality digital object storage module

R1: Operat’l Storage Sub System
 • Consolidate R0 into operational system
 • Provide preservation-quality digital store for materials received under Voluntary Deposit of Electronic Publications (VDEP)
 • Integrate it with the existing VDEP front-end

R2: 1st Content Stream ingest
 • Support ingest for a major content stream
 • Integrate with core Library systems as required

R3 & R4: LDEP - initial format
 • Provide functionality for material covered by LDEP secondary legislation

R5+: Open DOM to new projects
www.bl.uk

25

DOM: Project definition - 1
Functional Architecture “What”
 • Prototyping – assessing market solutions
 • Example issues: digital rights, file formats, etc

Logical architecture “how – overall architecture”
 • Prototyping - basic functioning architecture
 • Example issues: allow changes to new suppliers, relationships to ILS, other projects etc

Physical architecture “how – storage & specifics”
 • Prototyping - principal solutions and options
 • Example issues: how do we build it cost-effectively today, supplier selection criteria

• Business case
• Planning – incremental implementation phases

Cross team workshops – reviewing progress, debating detailed technical
issues, planning immediate priorities, risk management & way forward
www.bl.uk

26

DOM: Project definition - 2



• Approach is to be incremental and not ‘Big Bang’
• We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a good solution
• A principal goal is to define:
   • An overall long term “logical architecture”
   • Within which there will be successive generations of physical architectures
• We are building an understanding of the storage marketplace, and we will use that knowledge to manage procurement
• We are certain that we will need >500 TB of storage but we are uncertain when – we thus need flexible scalable procurement



www.bl.uk

27

DOM architecture - overview
[Diagram: DOM architecture overview]
Clients shown: Resource Discovery (ILS, non-cat based RD, others), Rights Management, LDEP, Doc supply
Each OBJECT is addressed by a unique persistent identifier (DOMID)
DOM Storage Service: compound objects/relations, integrity, authenticity, atomic objects
DOMID is mapped to node/vol/LRL (local resource locator)
DOM Physical Storage
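
The note that a DOMID is mapped to node/vol/LRL can be illustrated with a small lookup table. This is only a sketch of the idea, not the BL's design: the class names, the identifier format `dom:0001` and the node, volume and path values are all hypothetical. What it shows is that the persistent identifier stays fixed while the physical location behind it can change.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Location:
    """Physical location of a stored object (field values are illustrative)."""
    node: str   # storage node
    vol: str    # volume on that node
    lrl: str    # local resource locator within the volume


class DomidResolver:
    """Maps a persistent DOMID to the object's current physical location."""

    def __init__(self) -> None:
        self._table: Dict[str, Location] = {}

    def register(self, domid: str, location: Location) -> None:
        self._table[domid] = location

    def migrate(self, domid: str, new_location: Location) -> None:
        # The DOMID never changes; only the mapping is updated when an
        # object moves to a new generation of physical storage.
        if domid not in self._table:
            raise KeyError(f"unknown DOMID: {domid}")
        self._table[domid] = new_location

    def resolve(self, domid: str) -> Optional[Location]:
        return self._table.get(domid)


resolver = DomidResolver()
resolver.register("dom:0001", Location("node-a", "vol-03", "objects/00/01"))
resolver.migrate("dom:0001", Location("node-b", "vol-11", "objects/00/01"))
```

Catalogue records and other clients would hold only the DOMID, so a storage migration need not touch them.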

www.bl.uk

28

DOM System (release 3)

[Diagram: DOM System (release 3). Labelled components: Aleph, Storage subsystem (shown twice), Access, Mailroom, Publishers, Shared services, Administration]
www.bl.uk

29

DOM logical architecture –
integrity and authenticity


• Integrity:
   • System has capability to continuously monitor the object store to detect object corruption
   • It would then initiate object recovery
• Authenticity:
   • A process is defined to provide long-term assurance that an object that is re-presented is as it was when it was ingested
   • Based on the use of cryptographic signing techniques
   • Each object is signed when it is ingested
   • The signature is verified when required
   • The signing mechanism is “tightly” controlled
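
The integrity and authenticity points above can be sketched in a few lines. This is an illustration under stated assumptions, not the DOM implementation: it uses an HMAC from Python's standard library as a stand-in for whatever cryptographic signing technique the programme chose, the key and object names are hypothetical, and plain dictionaries stand in for the object store and a replica.

```python
import hashlib
import hmac

# Hypothetical key; in a real system the signing key would be tightly controlled.
SIGNING_KEY = b"demo-key-held-by-signing-service"


def sign(data: bytes, key: bytes = SIGNING_KEY) -> str:
    """Return a keyed digest of the object, computed once at ingest."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify(data: bytes, signature: str, key: bytes = SIGNING_KEY) -> bool:
    """Check that the object is byte-for-byte as it was when it was signed."""
    return hmac.compare_digest(sign(data, key), signature)


def scrub(store: dict, signatures: dict, replica: dict) -> list:
    """Scan the store, detect corrupted objects and restore them from a replica.

    Returns the identifiers of the objects that were recovered.
    """
    recovered = []
    for object_id, data in store.items():
        if not verify(data, signatures[object_id]):
            good_copy = replica[object_id]
            # Only restore from a replica that itself verifies.
            if verify(good_copy, signatures[object_id]):
                store[object_id] = good_copy
                recovered.append(object_id)
    return recovered


# Ingest: sign each object once and keep the signature alongside it.
store = {"obj-1": b"Magna Carta scan", "obj-2": b"Gutenberg Bible scan"}
signatures = {oid: sign(data) for oid, data in store.items()}
replica = dict(store)

store["obj-2"] = b"Gutenberg Bible sc\x00n"   # simulate corruption in the store
recovered = scrub(store, signatures, replica)
```

The same `verify` call serves both purposes: run continuously over the store it detects corruption, and run on delivery it gives assurance that the object is as ingested.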

www.bl.uk

30

Procuring physical storage in volume









• A major cost is in physical storage
• The market for storage systems is changing rapidly, and this implies that “lock-in” is not sensible
• We thus need flexibility to change supplier over time
• Cost of storage is reducing by 30-40% per year
• Hence procure on rolling basis just ahead of demand
• Replace storage on a rolling basis on expiry of warranty
• The rolling programmes imply the need to be able to support a heterogeneous product solution
• The design of the logical architecture thus supports storage sourced from multiple storage vendors
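
The case for buying just ahead of demand can be checked with simple arithmetic. The sketch below assumes a hypothetical starting price of 1,000 per TB, a flat demand of 100 TB per year for five years, and a 35% annual price decline (the midpoint of the 30-40% range quoted above). It ignores warranty replacement and the overheads of running mixed hardware.

```python
def upfront_cost(total_tb: float, price_per_tb: float) -> float:
    """Buy all capacity in year 0 at today's price."""
    return total_tb * price_per_tb


def rolling_cost(yearly_tb: list, price_per_tb: float, annual_decline: float) -> float:
    """Buy each year's capacity in that year, at that year's price."""
    total = 0.0
    for year, tb in enumerate(yearly_tb):
        total += tb * price_per_tb * (1.0 - annual_decline) ** year
    return total


yearly_demand_tb = [100.0] * 5   # hypothetical: 500 TB needed over five years
start_price = 1000.0             # hypothetical price per TB in year 0
decline = 0.35                   # midpoint of the 30-40% range

upfront = upfront_cost(sum(yearly_demand_tb), start_price)
rolling = rolling_cost(yearly_demand_tb, start_price, decline)
```

Under these assumed figures the rolling purchase costs roughly half the up-front purchase, which is the effect the slide relies on.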

www.bl.uk

31

Disaster tolerance and the organisation
of storage clusters









• One can obtain commercial disaster recovery (DR) solutions for common equipment configurations
• However, one cannot obtain such solutions for systems holding several hundred terabytes
• So we must build DR into the design of the system
• A single site solution, subject to a common-mode disaster, would suffer considerable loss of availability after a disaster, and so is not acceptable
• This implies that we need a multi-site solution
• Conventionally these are based on a master-standby arrangement where only 50% of kit is delivering normal service
• Our design is based on the use of multiple autonomous independent peer clusters that cross-synchronise, so 100% of the kit delivers normal service

www.bl.uk

32

DOM architecture in the context of the
storage solution market


• The dominant segment of the market focuses on delivering performance within a highly resilient single cluster
• However:
   • Many of our objects will be rarely accessed, so we do not want to pay for “maximised” performance we do not need
   • We have resilience by using multiple clusters, hence we have a reduced need for resilience within a cluster, so we do not want to pay for “maximised” resilience we do not need
• We are using these drivers to design a cost-effective large scale resilient solution

www.bl.uk

33

DOM storage subsystem architecture - overview

[Diagram: two storage clusters, each with its own DOM Storage gateway, DOM Storage Service and DOM Physical Storage, both using the DOM Shared Services (Unique ID, Signing, Logging)]

www.bl.uk

34

DOM storage subsystem architecture - access

[Diagram: two storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical Storage) and the DOM Shared Services (Unique ID, Signing, Logging)]

Normal access/delivery is from local storage cluster

www.bl.uk

35

DOM storage subsystem architecture - access

[Diagram: two storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical Storage) and the DOM Shared Services (Unique ID, Signing, Logging)]

When a cluster is off-line then access/delivery is from a remote storage cluster

www.bl.uk

36

DOM storage subsystem architecture - ingest

[Diagram: two storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical Storage) and the DOM Shared Services (Unique ID, Signing, Logging); arrows labelled "Store" and "Synchronise remote store"]

Normal ingest is to the local storage cluster and then the remote cluster is synchronised

www.bl.uk

37

DOM storage subsystem architecture - ingest

[Diagram: two storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical Storage) and the DOM Shared Services (Unique ID, Signing, Logging); arrows labelled "Store" and "Synchronise remote store later"]

When a cluster is off-line then ingest is managed by the remote storage cluster and the local cluster is synchronised later
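
The access and ingest behaviour described on the last four slides can be sketched as two peer clusters behind a gateway. This is a toy model under stated assumptions: the class names, the site names and the in-memory dictionaries are hypothetical, and real synchronisation, signing and logging are left out. It shows local-first routing, fallback to the remote peer, and deferred synchronisation when a peer is off-line.

```python
class Cluster:
    """One autonomous storage cluster holding a full copy of the objects."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.online = True
        self.objects = {}
        self.pending = []  # (peer, object_id) pairs still to be synchronised

    def store(self, object_id: str, data: bytes) -> None:
        self.objects[object_id] = data


class Gateway:
    """Routes requests to the local cluster, falling back to the remote peer."""

    def __init__(self, local: Cluster, remote: Cluster) -> None:
        self.local = local
        self.remote = remote

    def _online_clusters(self) -> list:
        # Local cluster first, so it is preferred whenever it is available.
        return [c for c in (self.local, self.remote) if c.online]

    def read(self, object_id: str) -> bytes:
        for cluster in self._online_clusters():
            if object_id in cluster.objects:
                return cluster.objects[object_id]
        raise LookupError(f"{object_id} not available")

    def ingest(self, object_id: str, data: bytes) -> str:
        online = self._online_clusters()
        if not online:
            raise RuntimeError("no cluster online")
        primary = online[0]
        primary.store(object_id, data)
        peer = self.remote if primary is self.local else self.local
        if peer.online:
            peer.store(object_id, data)                 # synchronise straight away
        else:
            primary.pending.append((peer, object_id))   # synchronise later
        return primary.name


def resync(cluster: Cluster) -> None:
    """Push deferred objects to peers that have come back on-line."""
    still_pending = []
    for peer, object_id in cluster.pending:
        if peer.online:
            peer.store(object_id, cluster.objects[object_id])
        else:
            still_pending.append((peer, object_id))
    cluster.pending = still_pending


site_a, site_b = Cluster("site-a"), Cluster("site-b")
gateway_a = Gateway(local=site_a, remote=site_b)

gateway_a.ingest("obj-1", b"first")         # normal: stored locally, then synchronised
site_a.online = False                       # the local cluster goes off-line
handled_by = gateway_a.ingest("obj-2", b"second")
fallback_read = gateway_a.read("obj-1")     # served from the remote cluster
site_a.online = True
resync(site_b)                              # the local cluster catches up
```

Because both clusters accept reads and writes in normal operation, neither sits idle as a standby, which is the "100% of the kit" point made earlier.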

www.bl.uk

38

In conclusion



• We plan for generations of physical storage
   • Migration from one generation to the next
   • Allow changes of supplier
   • Purchase incrementally in modest quantities
   • Move quickly when required
   • Be cost conscious
• We provide assurance that an object is held and re-presented as when it was ingested
• We are designing a cost-effective large scale resilient solution

In summary: we take a long term view

www.bl.uk

39

Major Programmes/4

Web Archiving Programme

40

Structure of Programme

Web Archiving Programme is a collaborative initiative, roughly implemented across two consortia
• UK Web Archiving Consortium
   • Developing a selective approach to web archiving, procuring a common web archiving infrastructure and software to begin archiving activities at the earliest opportunity
• International Internet Preservation Consortium
   • Developing advanced web archiving technologies for the long-term, large-scale, continuous crawling requirements enabled through legislation

www.bl.uk

41

UK Web Archiving Consortium
Developing a selective approach to web archiving
• Licence for PANDAS about to be signed with NLA
• Sub-licences with consortium partners and contractor to follow
• ITT concluded with Magus Research winning the contract
   • Implement a common web archiving infrastructure (lots of Linux machines + PANDAS)
   • Provide customisation/development of PANDAS
   • Provide help desk and support

www.bl.uk

42

International Internet Preservation
Consortium
Developing advanced web archiving technologies
• Smart Crawler
   • Continuous adaptive crawler, adjusting crawl priority on the fly
   • Based on IA Heritrix
   • Working on requirements now
   • Expect to begin tender process in June
• Content Management
• Archival formats
• Framework
• Metrics and Test Bed
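
The idea of a continuous crawler that adjusts crawl priority on the fly can be illustrated with a priority queue keyed on each page's next due time. This is a generic sketch and does not use or model Heritrix; the class, the halving and doubling rule, the 24-hour default interval and the example URLs are all assumptions made for illustration.

```python
import heapq


class AdaptiveFrontier:
    """Crawl queue that reschedules each URL according to how often it changes."""

    def __init__(self) -> None:
        self._heap = []       # entries are (due_time_hours, url)
        self._interval = {}   # current revisit interval per URL, in hours

    def add(self, url: str, interval_hours: float = 24.0, now: float = 0.0) -> None:
        """Register a new URL; it is due for its first crawl immediately."""
        self._interval[url] = interval_hours
        heapq.heappush(self._heap, (now, url))

    def next_url(self) -> tuple:
        """Pop the URL with the earliest due time; returns (due_time, url)."""
        return heapq.heappop(self._heap)

    def report(self, url: str, changed: bool, now: float) -> None:
        """Adjust priority on the fly after a crawl of `url` at time `now`."""
        # Pages that changed are revisited sooner, stable pages later.
        factor = 0.5 if changed else 2.0
        self._interval[url] *= factor
        heapq.heappush(self._heap, (now + self._interval[url], url))

    def interval(self, url: str) -> float:
        return self._interval[url]


frontier = AdaptiveFrontier()
frontier.add("http://news.example/")
frontier.add("http://static.example/")

_, first = frontier.next_url()
frontier.report(first, changed=True, now=0.0)     # changed: interval 24 h -> 12 h
_, second = frontier.next_url()
frontier.report(second, changed=False, now=0.0)   # unchanged: interval 24 h -> 48 h
_, third = frontier.next_url()                    # the changing page comes round first
```

A production crawler would also weigh politeness limits, site importance and storage budgets; the sketch only shows how observed change can feed back into scheduling.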

www.bl.uk

43

External Collaboration

44

Digital Library Collaborations/Partnerships
Current










• UK Digital Preservation Coalition
   • Founder Member
• TEL (The European Library Project)
• Web Archiving UK
   • JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library Of Wales
• International Internet Preservation Consortium
   • BNF, Library Of Congress, Internet Archive, National Archives & Library Of Canada, National Library Of Australia, National Library Of Italy, National Libraries Of Nordic Countries
• JISC Funded - Digital Curation Centre
• Persistent Identifiers
   • DOI Foundation, European National Libraries (KB & DDB)
• Resource Discovery
   • Union Catalogues (SUNCAT)
• Digital Library Federation

www.bl.uk

45

Digital Library Collaborations/Partnerships
Potential









• Secure Legal Deposit Network
   • 6 Legal Deposit Libraries
• Global Digital Format Registry
   • Potential Partners (National Archives, DLF)
• Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
   • KB (Netherlands National Library) & Other Partners – FP6 Bid
• Digital Rights Management
   • Potential Partners (Publishers, JISC)
• Metadata
   • Publishers, Others ?
• Authentication
   • JISC ?
• Resource Discovery
   • Search Engine Vendors, Researchers
• Others ???

www.bl.uk

46

Conclusions
• Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success

Can You Work With Us?

www.bl.uk

47


Slide 12

British Computer Society
North London Branch

Major Programmes

Richard Boulderstone
July 27, 2004
1

Agenda











The British Library
Vision
Our Audiences/Customers
ILS
Digitisation
Digital Object Management
Web Archiving
Collaboration
Conclusions

Magna Carta

www.bl.uk

2

What Is The British Library ?

Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of
Science and Invention (1855), National Central Library (1916), and National
Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India
Office Library and Records in 1982, and British Institute of Recorded Sound in
1983
• Flagship building at St Pancras - largest public building project in Great
Britain in 20th century - opened in 1998


www.bl.uk

3

World-Class Research Library
Key Statistics 2002/3













150 million items
8.2 million items consulted or supplied
408,000 reading room visits
618,000 catalogue records created
554,000 items received on legal deposit
651 km shelf capacity 92% full add 12 km each year
18.5M Web Site Hits (www.bl.uk)
2,400 staff
£85.2 million Grant in Aid and £27.0 million trading income in 2001/2
Annual report - http://www.bl.uk/about/annual/latest.html

www.bl.uk

4

‘The World’s Knowledge’
Outcome Based Vision
… by aiding scientific advances
… by adding commercial value for
businesses
… by contributing to UK “knowledge
economy”
… through the pursuit of academic
excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural
wellbeing
… by giving information relevant to their
interests
… by helping to find the next medical
breakthrough
… by creating a link between the past,
present and future

To help people
advance
knowledge to
enrich lives

www.bl.uk

5

High
R+D
Industries

Prof.
Services

Creative
Industries Publishing
Industries

Resource Discovery
Bespoke Services
Research Services
Document Supply
Reprographics
Innovation Centre

School
Students
Libraries Teachers 11>18
On-site Visits
School Tours
Web Learning

SMEs
EDUCATION

Lifelong
Learner

Exhibitions
Events
Tours
Publishing

Visitors
(child + adult)
Lifelong
Learner

PUBLIC

BUSINESS
Lifelong
Learner

Postgraduate/
Undergraduate
RESEARCHER

Librarians
Scholars

Lifelong
Learner

Reading Rooms
Bespoke Services
Reprographics
Publishing
Document Supply
Searching Tools

LIBRARIES

Commercial
Researcher

Public
Libraries

Broadcasting
e.g. BBC

Publishing
e.g. OED

Document Supply
Resource Discovery
Training
Best Practice

H.E.
Libraries

Public

www.bl.uk

6
2

Major Programmes/1
Da Vinci Notebook

Integrated Library System (ILS)
Programme

7

ILS: Development

Data migration
 Due to finish in a few days



16M+ BL records
10M+ records from other sources

Online ILS software
 All online changes made (mainly interfaces) – final tests
 Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
 Most ones done for go live
 Rest in priority order
www.bl.uk

8

ILS: Implementation

Training
 Courses to end-users well underway
 ‘Practice’ system available
 ‘Search only’ training also underway
Testing
 Functional testing (end to end) nearly complete
 Performance poor – OPAC very slow
 Automated stress testing (LoadRunner scripts)
 eIS trying to find area of problem
 Ex Libris experts flying over
 Some security ‘hardening’ needed
www.bl.uk

9

ILS: Cutover from legacy systems

Now:

Temporary Aleph cataloguing

Phase 1 – internal processing
 Staggered take-on of users to ease cutover problems
 Merge ‘temporary’ records

7 June:

Phase 2 – reading rooms
 Reading rooms closed for cutover 26-29 June
 Mainly brand-new PCs etc rather than XP upgrade

30 June:

Phase 3 – remote users
 Could be delayed major problems

30 July:

www.bl.uk

10

www.bl.uk

11

Future ILS development (ILS/2)

Current ILS development seen just as the start

Extra records
 E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
 E.g. Preservation records
Links to other new BL systems
 E.g. Digital Object Management (images, web pages etc)

New releases of Ex Libris packages

www.bl.uk

12

Major Programmes/2

International Dunhuang Project

Digitisation Programme

13

Background










Digitisation Is The Process Of Converting Existing Physical Items Into Digital
Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character
Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image to Sub £1/Image.
Automatic Devices To Digitise Are Becoming Availalble But Are Expensive ~£150,000.
BL Has Had Fairly Ad Hoc Approach Driven By
 External Funding Opportunities
 Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project
Management
BL Has Created About 1.5M Digital Images So Far…

www.bl.uk

14

Digitisation Strategy



Digitisation Strategy Project Was Formerly Initiated On February 2, 2004


Key objectives for the project are to define:










Selection Criteria
Uniform Approach
Communications Plan
Sustainability
Intellectual Property Rights
External Relationship Management
Funding
Integration with DOMS

www.bl.uk

15

Project Status Information


Definitive Register of Projects




19 Complete
19 Current
20 Planning



JISC Sound (3,900 Hours)



JISC Newspapers (2M Pages of 750M Pages)



Chopin (Collaborative Project)



Early English Books Online
www.bl.uk

16

Major Programmes/3

Gutenberg Bible

Digital Object Management (DOM)
Programme

17

DOM Programme vision
Our mission is to enable the United Kingdom to preserve
and use its digital intellectual property forever

Our vision is create a management system for digital objects
that will
 store and preserve any type of digital material in perpetuity
 provide access to this material to users with appropriate permissions
 ensure that the material is easy to find
 ensure that users can view the material with contemporary applications
 ensure that users can, where possible, experience material with the original look-and-feel

www.bl.uk

18

Introduction - history

Digital Library PFI
 Mar 1997 – Dec 1998
Digital Library System
 1999 – early 2002
 Lessons
DOM Report
 Nov 2002
The DOM Programme
 Started September 2003

www.bl.uk

19

Drivers for the BL DOM Programme












Legal deposit legislation for non-print material was granted royal assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12T of material per year (with 50 year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….

We need a generic and cost-effective approach for the secure long term
storage of digital material that is produced by numerous initiatives

www.bl.uk

20

DOM – many topics to address
HIGH

ILS
WEB
ARCHIVING

DIGITISATION
PROGRAMME

ESTIMATED SIZE OF COMPONENT

LDEP

WORKFLOW
RESOURCE
DISCOVERY
LDLSE

VDEP

RIGHTS
MANAGEMENT
METADATA
DEFINITION

TECHNICAL
REQUIREMENTS

FILE
FORMAT
REGISTRY

PERSISTENT
IDENTIFIERS

FILE
CONVERSION
UTILITIES

AUTHENTICATION
PROTOTYPES

SDM
RADM

INTERFACES

STRATEGY
DEVELOPMENT

LOW
LOW

COMPONENT AMBIGUITY / COMPLEXITY

Started

Planned

Non-DOM projects

Planned co-operation

HIGH
LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications

www.bl.uk

21

Scope - life cycle of objects


Collection
 Selection
 Acquisition
 Accession
 Description
 Preservation
 Storage
 Preservation
 Access
 Resource discovery
 Delivery
 Rendering
www.bl.uk

22

Scope – objects and processes


Preservation store
 Preserves the bit stream in perpetuity
 Access store
 Access versions
 Limited formats – in the flavour of the era
 Metadata to support resource discovery
 Descriptive, Administrative, Links with existing tools e.g. Integrated
Library System (ILS)
 Workflow
 Ingest, e.g. Legal Deposit processing

www.bl.uk

23

ACCESS

DOM

Resource Discovery

Delivery

Digital Rights Management

Shared services

DOM
Storage

Signing
Authentication
Metadata
Persistent ID

Ingest

DONATIONS

DOCUMENT SUPPLY
Publishers
Archives
Grey Literature

Non-Serial
Store
Archiving
Operational
Stores

LEGAL DEPOSIT

LDL Secure
Environment

Legal Deposit
Items
Legal Deposit
Processing

WEB
ARCHIVING

DIGITISATION
St Pancras
Studios

Newspapers

NSA
www.bl.uk

24

Timeline

Prototype will provide a
basic preservationquality digital object
storage module

Definition. R0

ET approve
Business Case
& Timeline

•Consolidate R0 into operational
system
•Provide preservation-quality digital
store for materials received under
Voluntary Deposit of Electronic
Publications (VDEP)
•Integrate it with the existing VDEP
front-end

BC
R0
Operat’l Storage
Sub System. R1

•Support ingest for a
major content stream
•Integrate with core
Library systems as
required

R1

1st Content
Stream ingest. R2

R2

LDEP - initial
format. R3 & R4

Provide functionality
for material covered by
LDEP secondary
legislation

R3
R4

Open DOM to new
projects. R5+

2003

2004

2005
www.bl.uk

25

DOM: Project definition - 1
Example issues
Functional Architecture “What”

digital rights, file
formats, etc

Prototyping – assessing market solutions

allow changes to
new suppliers,
relationships to ILS,
other projects etc

Logical architecture “how – overall architecture”
Prototyping - basic functioning architecture

how do we build it
cost-effectively
today, supplier
selection criteria

Physical architecture “how – storage & specifics”
Prototyping - principal solutions and options

• Business case
• Planning – incremental
implementation phases

Cross team workshops – reviewing progress, debating detailed technical
issues, planning immediate priorities, risk management & way forward
www.bl.uk

26

DOM: Project definition - 2



Approach is to be incremental and not ‘Big Bang’
We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a
good solution



A principal goal is to define:
 An overall long term “logical architecture”
 Within which, there will be successive generations of physical architectures



We are understanding the storage marketplace, and we will use the knowledge to manage
procurement
We are certain that we will need >500T of storage but we are uncertain when – we thus need
flexible scalable procurement



www.bl.uk

27

DOM architecture - overview
Resource Discovery
Non-cat
ILS
based RD

Others

Rights Management
LDEP

DOMID

OBJECT

DOM Storage Service

Compound
objects/relations
Integrity

Authenticity

Local resource locator

Object

Atomic
Objects

Unique persistent
identifier (DOMID)

Doc
supply

DOMID is
mapped to
node/vol/LRL

DOM Physical Storage

www.bl.uk

28

DOM System (release 3)

Aleph

Storage subsystem
Storage
subsystem

Access

Mailroom

Publishers

Shared services

Administration
DOM System
www.bl.uk

29

DOM logical architecture –
integrity and authenticity


Integrity:
 System has capability to continuously monitor the object store to detect
object corruption
 It would then initiate object recovery
 Authenticity:
 A process is defined to provide long-term assurance that an object that
is re-presented is as it was when it was ingested
 Based on the use of cryptographic signing techniques
 Each object is signed when it is ingested
 The signature is verified when required
 The signing mechanism is “tightly” controlled

www.bl.uk

30

Procuring physical storage in volume









A major cost is in physical storage
The market for storage systems is changing rapidly, and this implies that “lock-in” is
not sensible
We thus need flexibility to change supplier over time
Cost of storage is reducing by 30-40% per year
Hence procure on rolling basis just ahead of demand
Replace storage on a rolling basis on expiry of warranty
The rolling programmes imply the need to be able to support a heterogeneous
product solution
The design of the logical architecture thus supports storage sourced from multiple
storage vendors

www.bl.uk

31

Disaster tolerance and the organisation
of storage clusters









One can obtain commercial disaster recovery (DR) solutions for common equipment
configurations
However one cannot obtain such solutions for systems comprising multi-100 Tb systems
So we must build in the need for DR into the design of the system
A single site solution, subject to a common-mode disaster, would suffer considerable loss
of availability after a disaster, and so is not acceptable
This implies that we need a multi-site solution
Conventionally these are based on a master-standby where only 50% of kit is delivering
normal service
Our design is based on the use of multiple autonomous independent peer clusters that
cross-synchronise
so 100% of the kit delivers normal service

www.bl.uk

32

DOM architecture in the context of the
storage solution market


The dominant segment of the market focuses on delivering performance within
a highly resilient single cluster
 However:
 Many of our objects will be rarely accessed
 so we do not want to pay for “maximised” performance we do not need
 We have resilience by using multiple clusters, hence we have a reduced
need for resilience within a cluster
 so we do not want to pay for “maximised” resilience we do not need
 We are using these drivers to design a cost-effective large scale resilient
solution

www.bl.uk

33


Slide 13

British Computer Society
North London Branch

Major Programmes

Richard Boulderstone
July 27, 2004
1

Agenda











The British Library
Vision
Our Audiences/Customers
ILS
Digitisation
Digital Object Management
Web Archiving
Collaboration
Conclusions

Magna Carta

www.bl.uk

2

What Is The British Library ?

Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of
Science and Invention (1855), National Central Library (1916), and National
Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India
Office Library and Records in 1982, and British Institute of Recorded Sound in
1983
• Flagship building at St Pancras - largest public building project in Great
Britain in 20th century - opened in 1998


www.bl.uk

3

World-Class Research Library
Key Statistics 2002/3













150 million items
8.2 million items consulted or supplied
408,000 reading room visits
618,000 catalogue records created
554,000 items received on legal deposit
651 km shelf capacity, 92% full, adding 12 km each year
18.5M Web Site Hits (www.bl.uk)
2,400 staff
£85.2 million Grant in Aid and £27.0 million trading income in 2001/2
Annual report - http://www.bl.uk/about/annual/latest.html

www.bl.uk

4

‘The World’s Knowledge’
Outcome Based Vision
… by aiding scientific advances
… by adding commercial value for
businesses
… by contributing to UK “knowledge
economy”
… through the pursuit of academic
excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural
wellbeing
… by giving information relevant to their
interests
… by helping to find the next medical
breakthrough
… by creating a link between the past,
present and future

To help people
advance
knowledge to
enrich lives

www.bl.uk

5

[Diagram: Our Audiences/Customers, showing five audience sectors with the groups and services associated with each.]
Sectors: BUSINESS, EDUCATION, PUBLIC, RESEARCHER, LIBRARIES
Groups shown: High R+D Industries, Prof. Services, Creative Industries, Publishing Industries, SMEs, School Libraries, Students, Teachers (11>18), Visitors (child + adult), Lifelong Learner, Postgraduate/Undergraduate, Librarians, Scholars, Commercial Researcher, Public Libraries, H.E. Libraries, Broadcasting (e.g. BBC), Publishing (e.g. OED), Public
Services shown: Resource Discovery, Bespoke Services, Research Services, Document Supply, Reprographics, Innovation Centre, On-site Visits, School Tours, Web Learning, Exhibitions, Events, Tours, Publishing, Reading Rooms, Searching Tools, Training, Best Practice

www.bl.uk

6

Major Programmes/1
Da Vinci Notebook

Integrated Library System (ILS)
Programme

7

ILS: Development

Data migration
 Due to finish in a few days



16M+ BL records
10M+ records from other sources

Online ILS software
 All online changes made (mainly interfaces) – final tests
 Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
 Most done for go-live
 Rest in priority order
www.bl.uk

8

ILS: Implementation

Training
 Courses to end-users well underway
 ‘Practice’ system available
 ‘Search only’ training also underway
Testing
 Functional testing (end to end) nearly complete
 Performance poor – OPAC very slow
 Automated stress testing (LoadRunner scripts)
 eIS trying to find area of problem
 Ex Libris experts flying over
 Some security ‘hardening’ needed
www.bl.uk

9

ILS: Cutover from legacy systems

Now:
Temporary Aleph cataloguing

7 June:
Phase 1 – internal processing
 Staggered take-on of users to ease cutover problems
 Merge ‘temporary’ records

30 June:
Phase 2 – reading rooms
 Reading rooms closed for cutover 26-29 June
 Mainly brand-new PCs etc rather than XP upgrade

30 July:
Phase 3 – remote users
 Could be delayed if there are major problems

www.bl.uk

10

www.bl.uk

11

Future ILS development (ILS/2)

Current ILS development seen just as the start

Extra records
 E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
 E.g. Preservation records
Links to other new BL systems
 E.g. Digital Object Management (images, web pages etc)

New releases of Ex Libris packages

www.bl.uk

12

Major Programmes/2

International Dunhuang Project

Digitisation Programme

13

Background










Digitisation Is The Process Of Converting Existing Physical Items Into Digital
Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character
Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image to Sub £1/Image.
Automatic Devices To Digitise Are Becoming Available But Are Expensive (~£150,000).
BL Has Had Fairly Ad Hoc Approach Driven By
 External Funding Opportunities
 Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project
Management
BL Has Created About 1.5M Digital Images So Far…

www.bl.uk

14

Digitisation Strategy



Digitisation Strategy Project Was Formally Initiated On February 2, 2004


Key objectives for the project are to define:










Selection Criteria
Uniform Approach
Communications Plan
Sustainability
Intellectual Property Rights
External Relationship Management
Funding
Integration with DOMS

www.bl.uk

15

Project Status Information


Definitive Register of Projects




19 Complete
19 Current
20 Planning



JISC Sound (3,900 Hours)



JISC Newspapers (2M Pages of 750M Pages)



Chopin (Collaborative Project)



Early English Books Online
www.bl.uk

16

Major Programmes/3

Gutenberg Bible

Digital Object Management (DOM)
Programme

17

DOM Programme vision
Our mission is to enable the United Kingdom to preserve
and use its digital intellectual property forever

Our vision is to create a management system for digital objects
that will
 store and preserve any type of digital material in perpetuity
 provide access to this material to users with appropriate permissions
 ensure that the material is easy to find
 ensure that users can view the material with contemporary applications
 ensure that users can, where possible, experience material with the original look-and-feel

www.bl.uk

18

Introduction - history

Digital Library PFI
 Mar 1997 – Dec 1998
Digital Library System
 1999 – early 2002
 Lessons
DOM Report
 Nov 2002
The DOM Programme
 Started September 2003

www.bl.uk

19

Drivers for the BL DOM Programme












Legal deposit legislation for non-print material was granted royal assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12T of material per year (with 50 year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….

We need a generic and cost-effective approach for the secure long term
storage of digital material that is produced by numerous initiatives

www.bl.uk

20

DOM – many topics to address
[Chart: components plotted by ESTIMATED SIZE OF COMPONENT (vertical axis, LOW to HIGH) against COMPONENT AMBIGUITY / COMPLEXITY (horizontal axis, LOW to HIGH).]
Legend: Started, Planned, Non-DOM projects, Planned co-operation
Components shown: ILS, Web Archiving, Digitisation Programme, LDEP, Workflow, Resource Discovery, LDLSE, VDEP, Rights Management, Metadata Definition, Technical Requirements, File Format Registry, Persistent Identifiers, File Conversion Utilities, Authentication, Prototypes, SDM, RADM, Interfaces, Strategy Development

LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications

www.bl.uk

21

Scope - life cycle of objects


Collection
 Selection
 Acquisition
 Accession
 Description
 Preservation
 Storage
 Preservation
 Access
 Resource discovery
 Delivery
 Rendering
www.bl.uk

22

Scope – objects and processes


Preservation store
 Preserves the bit stream in perpetuity
 Access store
 Access versions
 Limited formats – in the flavour of the era
 Metadata to support resource discovery
 Descriptive, Administrative, Links with existing tools e.g. Integrated
Library System (ILS)
 Workflow
 Ingest, e.g. Legal Deposit processing

www.bl.uk

23

[Diagram: DOM overview, showing Access, Shared services, DOM Storage and Ingest, together with the content sources.]
ACCESS: Resource Discovery, Delivery, Digital Rights Management
Shared services: Signing, Authentication, Metadata, Persistent ID
DOM Storage
Ingest
Source labels shown: DONATIONS, DOCUMENT SUPPLY, Publishers, Archives, Grey Literature, Non-Serial Store, Archiving, Operational Stores, LEGAL DEPOSIT, LDL Secure Environment, Legal Deposit Items, Legal Deposit Processing, WEB ARCHIVING, DIGITISATION, St Pancras Studios, Newspapers, NSA
www.bl.uk

24

Timeline

[Timeline diagram, 2003 to 2005: Business Case (BC) followed by releases R0 to R5+.]

Definition (R0)
• Prototype will provide a basic preservation-quality digital object storage module
• ET approve Business Case & Timeline

Operational Storage Sub System (R1)
• Consolidate R0 into operational system
• Provide preservation-quality digital store for materials received under Voluntary Deposit of Electronic Publications (VDEP)
• Integrate it with the existing VDEP front-end

1st Content Stream ingest (R2)
• Support ingest for a major content stream
• Integrate with core Library systems as required

LDEP - initial format (R3 & R4)
• Provide functionality for material covered by LDEP secondary legislation

Open DOM to new projects (R5+)
www.bl.uk

25

DOM: Project definition - 1
Functional architecture “what”
 Prototyping – assessing market solutions
 Example issues: digital rights, file formats, etc

Logical architecture “how – overall architecture”
 Prototyping – basic functioning architecture
 Example issues: allow changes to new suppliers, relationships to ILS, other projects etc

Physical architecture “how – storage & specifics”
 Prototyping – principal solutions and options
 Example issues: how do we build it cost-effectively today, supplier selection criteria

• Business case
• Planning – incremental implementation phases

Cross team workshops – reviewing progress, debating detailed technical
issues, planning immediate priorities, risk management & way forward
www.bl.uk

26

DOM: Project definition - 2



Approach is to be incremental and not ‘Big Bang’
We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a
good solution



A principal goal is to define:
 An overall long term “logical architecture”
 Within which, there will be successive generations of physical architectures



We are building an understanding of the storage marketplace, and we will use that knowledge to manage
procurement
We are certain that we will need >500 TB of storage but we are uncertain when, so we need
flexible, scalable procurement



www.bl.uk

27

DOM architecture - overview
[Diagram: DOM architecture overview.]
Services using DOM: Resource Discovery (ILS, non-cat based RD), Rights Management, LDEP, Doc supply, Others
Exchanged with the DOM Storage Service: DOMID, OBJECT
DOM Storage Service: compound objects/relations, integrity, authenticity, unique persistent identifier (DOMID)
DOMID is mapped to node/vol/LRL (local resource locator)
DOM Physical Storage: atomic objects

www.bl.uk

28
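The mapping of a DOMID to node/vol/LRL in the architecture overview can be illustrated with a small lookup table. This is a minimal sketch only: the `Location` fields, the `DomidResolver` class and the identifier format are hypothetical and are not taken from the DOM design. The point it illustrates is that the persistent identifier stays fixed while the physical location behind it may change.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Location:
    """Where one copy of an object lives (field names are assumptions)."""
    node: str    # storage node or cluster name
    volume: str  # volume on that node
    lrl: str     # local resource locator within the volume


class DomidResolver:
    """Maps a persistent DOMID to its current physical locations."""

    def __init__(self) -> None:
        self._table: Dict[str, List[Location]] = {}

    def register(self, domid: str, location: Location) -> None:
        self._table.setdefault(domid, []).append(location)

    def relocate(self, domid: str, old: Location, new: Location) -> None:
        # A storage migration changes the location, never the DOMID.
        copies = self._table[domid]
        copies[copies.index(old)] = new

    def resolve(self, domid: str) -> List[Location]:
        return list(self._table.get(domid, []))


resolver = DomidResolver()
first = Location("node-a", "vol01", "objects/0001.bin")
resolver.register("domid-0001", first)
resolver.relocate("domid-0001", first, Location("node-b", "vol07", "objects/0001.bin"))
print(resolver.resolve("domid-0001")[0].node)
```

Services that hold only the DOMID are unaffected by the relocation, which is what allows successive generations of physical storage behind one logical architecture.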

DOM System (release 3)

[Diagram: the DOM System at release 3 and the systems around it.]
Labels shown: Aleph, Storage subsystem (two instances), Access, Mailroom, Publishers, Shared services, Administration, DOM System
www.bl.uk

29

DOM logical architecture –
integrity and authenticity


Integrity:
 System has capability to continuously monitor the object store to detect
object corruption
 It would then initiate object recovery
 Authenticity:
 A process is defined to provide long-term assurance that an object that
is re-presented is as it was when it was ingested
 Based on the use of cryptographic signing techniques
 Each object is signed when it is ingested
 The signature is verified when required
 The signing mechanism is “tightly” controlled

www.bl.uk

30
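The sign-on-ingest and verify-when-required steps on the integrity and authenticity slide can be sketched as follows. The slide does not say which cryptographic technique is used, so this example assumes a keyed hash (HMAC-SHA256 from the Python standard library) purely as a stand-in; a production design might well use public-key signatures instead. The key and function names are hypothetical.

```python
import hashlib
import hmac

# Assumption: a secret key held by a tightly controlled signing service.
SIGNING_KEY = b"example-key-for-illustration-only"


def sign_on_ingest(content: bytes) -> str:
    """Return a hex signature computed once, when the object is ingested."""
    return hmac.new(SIGNING_KEY, content, hashlib.sha256).hexdigest()


def verify(content: bytes, signature: str) -> bool:
    """Check that the content re-presented now matches what was signed."""
    expected = hmac.new(SIGNING_KEY, content, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)


original = b"digitised master, page 1"
signature = sign_on_ingest(original)

print(verify(original, signature))            # unchanged object
print(verify(original + b"\x00", signature))  # corrupted or altered object
```

The same check could serve the integrity monitor: a failed verification on a stored copy would be the trigger for object recovery.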

Procuring physical storage in volume









A major cost is in physical storage
The market for storage systems is changing rapidly, and this implies that “lock-in” is
not sensible
We thus need flexibility to change supplier over time
Cost of storage is reducing by 30-40% per year
Hence procure on rolling basis just ahead of demand
Replace storage on a rolling basis on expiry of warranty
The rolling programmes imply the need to be able to support a heterogeneous
product solution
The design of the logical architecture thus supports storage sourced from multiple
storage vendors

www.bl.uk

31
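The case for procuring on a rolling basis, just ahead of demand, can be shown with simple arithmetic. The figures below are illustrative assumptions only (100 TB of new demand per year for five years and a notional starting price of 1,000 units per TB); the 35% annual decline is the midpoint of the 30-40% range quoted on the slide. Warranty-driven replacement is not modelled.

```python
def upfront_cost(yearly_demand_tb, price_per_tb):
    """Buy everything needed over the whole period at today's price."""
    return sum(yearly_demand_tb) * price_per_tb


def rolling_cost(yearly_demand_tb, price_per_tb, annual_decline):
    """Buy each year's capacity at that year's (lower) price."""
    total = 0.0
    for year, demand in enumerate(yearly_demand_tb):
        total += demand * price_per_tb * (1 - annual_decline) ** year
    return total


# Illustrative inputs, not British Library figures.
demand = [100] * 5
upfront = upfront_cost(demand, 1000)
rolling = rolling_cost(demand, 1000, 0.35)

print(f"up front: {upfront:,.0f}")
print(f"rolling:  {rolling:,.0f}")
print(f"saving:   {1 - rolling / upfront:.0%}")
```

Under these assumptions the rolling purchase costs roughly half as much as buying all the capacity at the outset.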

Disaster tolerance and the organisation
of storage clusters









One can obtain commercial disaster recovery (DR) solutions for common equipment
configurations
However, one cannot obtain such solutions for multi-100 TB systems
So we must build the need for DR into the design of the system
A single-site solution, subject to a common-mode disaster, would suffer considerable loss
of availability after a disaster, and so is not acceptable
This implies that we need a multi-site solution
Conventionally these are based on a master-standby arrangement where only 50% of the kit delivers
normal service
Our design is based on the use of multiple autonomous independent peer clusters that
cross-synchronise,
so 100% of the kit delivers normal service

www.bl.uk

32

DOM architecture in the context of the
storage solution market


The dominant segment of the market focuses on delivering performance within
a highly resilient single cluster
 However:
 Many of our objects will be rarely accessed
 so we do not want to pay for “maximised” performance we do not need
 We have resilience by using multiple clusters, hence we have a reduced
need for resilience within a cluster
 so we do not want to pay for “maximised” resilience we do not need
 We are using these drivers to design a cost-effective large scale resilient
solution

www.bl.uk

33

DOM storage subsystem architecture - overview

[Diagram: two storage clusters, each made up of a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, with DOM Shared Services alongside them.]
DOM Shared Services
• Unique ID
• Signing
• Logging

www.bl.uk

34

DOM storage subsystem architecture - access

Normal access/delivery is from the local storage cluster

[Diagram: two storage clusters, each made up of a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, plus DOM central Shared Services (Unique ID, Signing, Logging).]

www.bl.uk

35

DOM storage subsystem architecture - access

When a cluster is off-line, access/delivery is from a remote storage cluster

[Diagram: two storage clusters, each made up of a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, plus DOM central Shared Services (Unique ID, Signing, Logging).]

www.bl.uk

36

DOM storage subsystem architecture - ingest

Normal ingest is to the local storage cluster, and then the remote cluster is synchronised

[Diagram: two storage clusters, each made up of a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, plus DOM central Shared Services (Unique ID, Signing, Logging). Arrows labelled "Store" and "Synchronise remote store".]

www.bl.uk

37

DOM storage subsystem architecture - ingest

When a cluster is off-line, ingest is managed by the remote storage cluster and the local cluster is synchronised later

[Diagram: two storage clusters, each made up of a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, plus DOM central Shared Services (Unique ID, Signing, Logging). Arrows labelled "Store" and "Synchronise remote store later".]

www.bl.uk

38
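The access and ingest behaviour described on the four preceding slides can be sketched with two peer clusters. This is a minimal illustration under stated assumptions: the `Cluster` class, its `pending` queue and the `ingest` and `access` functions are hypothetical names, objects are held in memory, and the case where both clusters are off-line is not handled.

```python
from typing import Dict, List, Optional


class Cluster:
    """One autonomous storage cluster (a hypothetical stand-in)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.online = True
        self.objects: Dict[str, bytes] = {}
        self.pending: List[str] = []  # object IDs still to be copied to the peer

    def store(self, object_id: str, content: bytes, peer: "Cluster") -> None:
        self.objects[object_id] = content
        self.pending.append(object_id)
        self.synchronise(peer)

    def synchronise(self, peer: "Cluster") -> None:
        # Copy outstanding objects only when the peer is reachable;
        # otherwise they stay queued until a later call.
        if not peer.online:
            return
        for object_id in self.pending:
            peer.objects[object_id] = self.objects[object_id]
        self.pending.clear()


def ingest(object_id: str, content: bytes, local: Cluster, remote: Cluster) -> str:
    """Store locally when possible, otherwise let the remote cluster take it."""
    target, other = (local, remote) if local.online else (remote, local)
    target.store(object_id, content, other)
    return target.name


def access(object_id: str, local: Cluster, remote: Cluster) -> Optional[bytes]:
    """Deliver from the local cluster, falling back to the remote one."""
    source = local if local.online else remote
    return source.objects.get(object_id)


a, b = Cluster("site-a"), Cluster("site-b")
ingest("obj-1", b"first", a, b)   # normal case: stored at A, then copied to B
a.online = False
ingest("obj-2", b"second", a, b)  # A is off-line: B stores and queues the copy
print(access("obj-1", a, b))      # served by B while A is off-line
a.online = True
b.synchronise(a)                  # A catches up later
print(sorted(a.objects))
```

Because either cluster can serve requests and accept new objects, both deliver normal service, in contrast to a master-standby pair.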

In conclusion



We plan for generations of physical storage
 Migration from one generation to the next
 Allow changes of supplier
 Purchase incrementally in modest quantities
 Move quickly when required
 Be cost conscious



We provide assurance that an object is held and re-presented as it was when it was
ingested



We are designing a cost-effective large scale resilient solution

In summary: we take a long term view

www.bl.uk

39

Major Programmes/4

Web Archiving Programme

40

Structure of Programme

Web Archiving Programme is a collaborative
initiative, broadly implemented across two
consortia
 UK Web Archiving Consortium
 Developing a selective approach to web archiving, procuring a common
web archiving infrastructure and software to begin archiving activities at
the earliest opportunity
 International Internet Preservation Consortium
 Developing advanced web archiving technologies for the long-term,
large-scale, continuous crawling requirements enabled through
legislation

www.bl.uk

41

UK Web Archiving Consortium
Developing a selective approach to web archiving
 Licence for PANDAS about to be signed with NLA
 Sub-licences with consortium partners and contractor to follow
 ITT concluded with Magus Research winning the contract
 Implement a common web archiving infrastructure (lots of Linux
machines + PANDAS)
 Provide customisation/development of PANDAS
 Provide help desk and support

www.bl.uk

42

International Internet Preservation
Consortium
Developing advanced web archiving technologies
 Smart Crawler
 Continuous adaptive crawler, adjusting crawl priority on the fly
 Based on IA Heritrix
 Working on requirements now
 Expect to begin tender process in June
 Content Management
 Archival formats
 Framework
 Metrics and Test Bed

www.bl.uk

43
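The idea of a continuous adaptive crawler that adjusts crawl priority on the fly can be illustrated with a small scheduling sketch. It is not the Smart Crawler design and does not use any Heritrix API: the `AdaptiveSchedule` class, the halve-or-double rule and the example URLs are all assumptions made for illustration.

```python
import heapq
from typing import Dict, List, Tuple


class AdaptiveSchedule:
    """Revisit pages that change often sooner than pages that do not."""

    def __init__(self, min_interval: float = 1.0, max_interval: float = 64.0) -> None:
        self.min_interval = min_interval
        self.max_interval = max_interval
        self.interval: Dict[str, float] = {}
        self.queue: List[Tuple[float, str]] = []  # (time due, url)

    def add(self, url: str, now: float = 0.0) -> None:
        self.interval[url] = self.min_interval
        heapq.heappush(self.queue, (now, url))

    def next_due(self) -> Tuple[float, str]:
        return heapq.heappop(self.queue)

    def report(self, url: str, now: float, changed: bool) -> None:
        # Halve the revisit interval after a change, double it otherwise,
        # keeping the result within the configured bounds.
        factor = 0.5 if changed else 2.0
        new = self.interval[url] * factor
        self.interval[url] = max(self.min_interval, min(self.max_interval, new))
        heapq.heappush(self.queue, (now + self.interval[url], url))


schedule = AdaptiveSchedule()
schedule.add("http://example.org/news")
schedule.add("http://example.org/about")

for _ in range(6):
    due, url = schedule.next_due()
    # Stand-in for a real fetch: pretend only the news page ever changes.
    schedule.report(url, due, changed=url.endswith("/news"))

print(schedule.interval)
```

After a few rounds the frequently changing page keeps the shortest revisit interval while the static page drifts towards longer gaps, so crawl effort follows observed change.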

External Collaboration

44

Digital Library Collaborations/Partnerships
Current










UK Digital Preservation Coalition
 Founder Member
TEL (The European Library Project)
Web Archiving UK
 JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library Of Wales
International Internet Preservation Consortium
 BNF, Library Of Congress, Internet Archive, National Archives & Library Of Canada, National Library Of
Australia, National Library Of Italy, National Libraries Of Nordic Countries
JISC Funded - Digital Curation Centre
Persistent Identifiers
 DOI Foundation, European National Libraries (KB & DDB)
Resource Discovery
 Union Catalogues (SUNCAT)
Digital Library Federation

www.bl.uk

45

Digital Library Collaborations/Partnerships
Potential









Secure Legal Deposit Network
 6 Legal Deposit Libraries
Global Digital Format Registry
 Potential Partners (National Archives, DLF)
Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
 KB (Netherlands National Library & Other Partners – FP6 Bid)
Digital Rights Management
 Potential Partners (Publishers, JISC)
Metadata
 Publishers, Others ?
Authentication
 JISC ?
Resource Discovery
 Search Engine Vendors, Researchers
Others ???

www.bl.uk

46

Conclusions
Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success




Can You Work With Us?

www.bl.uk

47


Slide 14

British Computer Society
North London Branch

Major Programmes

Richard Boulderstone
July 27, 2004
1

Agenda











The British Library
Vision
Our Audiences/Customers
ILS
Digitisation
Digital Object Management
Web Archiving
Collaboration
Conclusions

Magna Carta

www.bl.uk

2

What Is The British Library ?

Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of
Science and Invention (1855), National Central Library (1916), and National
Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India
Office Library and Records in 1982, and British Institute of Recorded Sound in
1983
• Flagship building at St Pancras - largest public building project in Great
Britain in 20th century - opened in 1998


www.bl.uk

3

World-Class Research Library
Key Statistics 2002/3













150 million items
8.2 million items consulted or supplied
408,000 reading room visits
618,000 catalogue records created
554,000 items received on legal deposit
651 km shelf capacity 92% full add 12 km each year
18.5M Web Site Hits (www.bl.uk)
2,400 staff
£85.2 million Grant in Aid and £27.0 million trading income in 2001/2
Annual report - http://www.bl.uk/about/annual/latest.html

www.bl.uk

4

‘The World’s Knowledge’
Outcome Based Vision
… by aiding scientific advances
… by adding commercial value for
businesses
… by contributing to UK “knowledge
economy”
… through the pursuit of academic
excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural
wellbeing
… by giving information relevant to their
interests
… by helping to find the next medical
breakthrough
… by creating a link between the past,
present and future

To help people
advance
knowledge to
enrich lives

www.bl.uk

5

High
R+D
Industries

Prof.
Services

Creative
Industries Publishing
Industries

Resource Discovery
Bespoke Services
Research Services
Document Supply
Reprographics
Innovation Centre

School
Students
Libraries Teachers 11>18
On-site Visits
School Tours
Web Learning

SMEs
EDUCATION

Lifelong
Learner

Exhibitions
Events
Tours
Publishing

Visitors
(child + adult)
Lifelong
Learner

PUBLIC

BUSINESS
Lifelong
Learner

Postgraduate/
Undergraduate
RESEARCHER

Librarians
Scholars

Lifelong
Learner

Reading Rooms
Bespoke Services
Reprographics
Publishing
Document Supply
Searching Tools

LIBRARIES

Commercial
Researcher

Public
Libraries

Broadcasting
e.g. BBC

Publishing
e.g. OED

Document Supply
Resource Discovery
Training
Best Practice

H.E.
Libraries

Public

www.bl.uk

6
2

Major Programmes/1
Da Vinci Notebook

Integrated Library System (ILS)
Programme

7

ILS: Development

Data migration
 Due to finish in a few days



16M+ BL records
10M+ records from other sources

Online ILS software
 All online changes made (mainly interfaces) – final tests
 Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
 Most ones done for go live
 Rest in priority order
www.bl.uk

8

ILS: Implementation

Training
 Courses to end-users well underway
 ‘Practice’ system available
 ‘Search only’ training also underway
Testing
 Functional testing (end to end) nearly complete
 Performance poor – OPAC very slow
 Automated stress testing (LoadRunner scripts)
 eIS trying to find area of problem
 Ex Libris experts flying over
 Some security ‘hardening’ needed
www.bl.uk

9

ILS: Cutover from legacy systems

Now:

Temporary Aleph cataloguing

Phase 1 – internal processing
 Staggered take-on of users to ease cutover problems
 Merge ‘temporary’ records

7 June:

Phase 2 – reading rooms
 Reading rooms closed for cutover 26-29 June
 Mainly brand-new PCs etc rather than XP upgrade

30 June:

Phase 3 – remote users
 Could be delayed major problems

30 July:

www.bl.uk

10

www.bl.uk

11

Future ILS development (ILS/2)

Current ILS development seen just as the start

Extra records
 E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
 E.g. Preservation records
Links to other new BL systems
 E.g. Digital Object Management (images, web pages etc)

New releases of Ex Libris packages

www.bl.uk

12

Major Programmes/2

International Dunhuang Project

Digitisation Programme

13

Background










Digitisation Is The Process Of Converting Existing Physical Items Into Digital
Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character
Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image to Sub £1/Image.
Automatic Devices To Digitise Are Becoming Availalble But Are Expensive ~£150,000.
BL Has Had Fairly Ad Hoc Approach Driven By
 External Funding Opportunities
 Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project
Management
BL Has Created About 1.5M Digital Images So Far…

www.bl.uk

14

Digitisation Strategy



Digitisation Strategy Project Was Formerly Initiated On February 2, 2004


Key objectives for the project are to define:










Selection Criteria
Uniform Approach
Communications Plan
Sustainability
Intellectual Property Rights
External Relationship Management
Funding
Integration with DOMS

www.bl.uk

15

Project Status Information


Definitive Register of Projects




19 Complete
19 Current
20 Planning



JISC Sound (3,900 Hours)



JISC Newspapers (2M Pages of 750M Pages)



Chopin (Collaborative Project)



Early English Books Online
www.bl.uk

16

Major Programmes/3

Gutenberg Bible

Digital Object Management (DOM)
Programme

17

DOM Programme vision
Our mission is to enable the United Kingdom to preserve
and use its digital intellectual property forever

Our vision is create a management system for digital objects
that will
 store and preserve any type of digital material in perpetuity
 provide access to this material to users with appropriate permissions
 ensure that the material is easy to find
 ensure that users can view the material with contemporary applications
 ensure that users can, where possible, experience material with the original look-and-feel

www.bl.uk

18

Introduction - history

Digital Library PFI
 Mar 1997 – Dec 1998
Digital Library System
 1999 – early 2002
 Lessons
DOM Report
 Nov 2002
The DOM Programme
 Started September 2003

www.bl.uk

19

Drivers for the BL DOM Programme












Legal deposit legislation for non-print material was granted royal assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12T of material per year (with 50 year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….

We need a generic and cost-effective approach for the secure long term
storage of digital material that is produced by numerous initiatives

www.bl.uk

20

DOM – many topics to address
HIGH

ILS
WEB
ARCHIVING

DIGITISATION
PROGRAMME

ESTIMATED SIZE OF COMPONENT

LDEP

WORKFLOW
RESOURCE
DISCOVERY
LDLSE

VDEP

RIGHTS
MANAGEMENT
METADATA
DEFINITION

TECHNICAL
REQUIREMENTS

FILE
FORMAT
REGISTRY

PERSISTENT
IDENTIFIERS

FILE
CONVERSION
UTILITIES

AUTHENTICATION
PROTOTYPES

SDM
RADM

INTERFACES

STRATEGY
DEVELOPMENT

LOW
LOW

COMPONENT AMBIGUITY / COMPLEXITY

Started

Planned

Non-DOM projects

Planned co-operation

HIGH
LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications

www.bl.uk

21

Scope - life cycle of objects


Collection
 Selection
 Acquisition
 Accession
 Description
 Preservation
 Storage
 Preservation
 Access
 Resource discovery
 Delivery
 Rendering
www.bl.uk

22

Scope – objects and processes


Preservation store
 Preserves the bit stream in perpetuity
 Access store
 Access versions
 Limited formats – in the flavour of the era
 Metadata to support resource discovery
 Descriptive, Administrative, Links with existing tools e.g. Integrated
Library System (ILS)
 Workflow
 Ingest, e.g. Legal Deposit processing

www.bl.uk

23

ACCESS

DOM

Resource Discovery

Delivery

Digital Rights Management

Shared services

DOM
Storage

Signing
Authentication
Metadata
Persistent ID

Ingest

DONATIONS

DOCUMENT SUPPLY
Publishers
Archives
Grey Literature

Non-Serial
Store
Archiving
Operational
Stores

LEGAL DEPOSIT

LDL Secure
Environment

Legal Deposit
Items
Legal Deposit
Processing

WEB
ARCHIVING

DIGITISATION
St Pancras
Studios

Newspapers

NSA
www.bl.uk

24

Timeline

Time axis: 2003 to 2005

• Definition (R0)
  - Prototype will provide a basic preservation-quality digital object storage module
• Business Case (BC)
  - ET approve Business Case & Timeline
• Operational Storage Sub System (R1)
  - Consolidate R0 into operational system
  - Provide preservation-quality digital store for materials received under Voluntary Deposit of Electronic Publications (VDEP)
  - Integrate it with the existing VDEP front-end
• 1st Content Stream ingest (R2)
  - Support ingest for a major content stream
  - Integrate with core Library systems as required
• LDEP - initial format (R3 & R4)
  - Provide functionality for material covered by LDEP secondary legislation
• Open DOM to new projects (R5+)

25

DOM: Project definition - 1
• Functional architecture “what”
  - Prototyping – assessing market solutions
  - Example issues: digital rights, file formats, etc
• Logical architecture “how – overall architecture”
  - Prototyping – basic functioning architecture
  - Example issues: allow changes to new suppliers, relationships to ILS, other projects etc
• Physical architecture “how – storage & specifics”
  - Prototyping – principal solutions and options
  - Example issues: how do we build it cost-effectively today, supplier selection criteria
• Business case
• Planning – incremental implementation phases

Cross team workshops – reviewing progress, debating detailed technical
issues, planning immediate priorities, risk management & way forward

26

DOM: Project definition - 2



• Approach is to be incremental and not ‘Big Bang’
• We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a good solution
• A principal goal is to define:
  - An overall long term “logical architecture”
  - Within which there will be successive generations of physical architectures
• We are studying the storage marketplace, and we will use that knowledge to manage procurement
• We are certain that we will need >500 TB of storage but we are uncertain when; we thus need flexible, scalable procurement

27

DOM architecture - overview
[Diagram: client services (Resource Discovery, both ILS based and non-cat based RD; Rights Management; LDEP; Doc supply; others) refer to an object by its DOMID. The DOM Storage Service handles compound objects/relations, integrity and authenticity, and holds atomic objects, each with a unique persistent identifier (DOMID). The DOMID is mapped to node/vol/LRL (local resource locator) in DOM Physical Storage.]
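
The mapping from a persistent DOMID to a physical node/vol/LRL could be sketched as below. This is only an illustration under stated assumptions: the class and field names (`DomidResolver`, `Location`) and the use of a random UUID as the identifier are hypothetical and are not taken from the DOM design.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    """Where an atomic object physically lives (hypothetical field names)."""
    node: str    # storage node
    volume: str  # volume on that node
    lrl: str     # local resource locator within the volume


class DomidResolver:
    """Keeps the DOMID stable while the physical location may change."""

    def __init__(self):
        self._table = {}  # DOMID -> Location

    def register(self, location):
        # Assumption: any unique opaque string will do as a DOMID here.
        domid = uuid.uuid4().hex
        self._table[domid] = location
        return domid

    def relocate(self, domid, new_location):
        # Used when an object migrates to a new generation of storage.
        if domid not in self._table:
            raise KeyError(domid)
        self._table[domid] = new_location

    def resolve(self, domid):
        return self._table[domid]
```

The point of the indirection is that clients such as resource discovery only ever hold the DOMID, so storage can be replaced or re-sourced without touching them.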

28

DOM System (release 3)

[Diagram labels: Aleph; Storage subsystem; Access; Mailroom; Publishers; Shared services; Administration; all shown in relation to the DOM System.]

29

DOM logical architecture –
integrity and authenticity


• Integrity:
  - System has capability to continuously monitor the object store to detect object corruption
  - It would then initiate object recovery
• Authenticity:
  - A process is defined to provide long-term assurance that an object that is re-presented is as it was when it was ingested
  - Based on the use of cryptographic signing techniques
  - Each object is signed when it is ingested
  - The signature is verified when required
  - The signing mechanism is “tightly” controlled
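
A minimal sketch of sign-at-ingest, verify-on-demand and audit-then-recover follows. It assumes an HMAC-SHA256 keyed digest as a stand-in for the signing technique, which the slides do not specify; the key, class and method names are all hypothetical.

```python
import hashlib
import hmac

# Assumption: a secret held only by the signing service.
SIGNING_KEY = b"demo-key-held-by-signing-service"


def sign(data, key=SIGNING_KEY):
    """Return a keyed digest of the object's bytes."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify(data, signature, key=SIGNING_KEY):
    """True if the bytes still match the signature made at ingest."""
    return hmac.compare_digest(sign(data, key), signature)


class ObjectStore:
    def __init__(self):
        self.objects = {}     # object id -> bytes
        self.signatures = {}  # object id -> signature recorded at ingest

    def ingest(self, object_id, data):
        self.objects[object_id] = data
        self.signatures[object_id] = sign(data)

    def audit(self):
        """Integrity sweep: ids whose bytes no longer verify."""
        return [oid for oid, data in self.objects.items()
                if not verify(data, self.signatures[oid])]

    def recover(self, object_id, replica):
        """Restore a corrupted object from a replica that still verifies."""
        data = replica.objects[object_id]
        if not verify(data, self.signatures[object_id]):
            raise ValueError("replica copy is also corrupt")
        self.objects[object_id] = data
```

A production system would more likely use asymmetric signatures so that verification does not require the signing secret; the keyed digest here only keeps the sketch within the standard library.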

30

Procuring physical storage in volume









• A major cost is in physical storage
• The market for storage systems is changing rapidly, and this implies that “lock-in” is not sensible
• We thus need flexibility to change supplier over time
• Cost of storage is reducing by 30-40% per year
• Hence procure on a rolling basis just ahead of demand
• Replace storage on a rolling basis on expiry of warranty
• The rolling programmes imply the need to be able to support a heterogeneous product solution
• The design of the logical architecture thus supports storage sourced from multiple storage vendors
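
The case for buying just ahead of demand can be illustrated with a small calculation. The figures are assumptions for illustration only: 500 TB needed in equal steps over five years, a starting price of 1,000 units per TB, and a 35% annual price decline (the midpoint of the 30-40% quoted above). Warranty-driven replacement is ignored.

```python
def upfront_cost(total_tb, price_per_tb):
    """Cost of buying all capacity at today's price."""
    return total_tb * price_per_tb


def rolling_cost(yearly_tb, price_per_tb, annual_decline):
    """Cost of buying each year's capacity at that year's price."""
    cost = 0.0
    for year, tb in enumerate(yearly_tb):
        cost += tb * price_per_tb * (1 - annual_decline) ** year
    return cost


# Hypothetical demand profile and prices.
demand = [100, 100, 100, 100, 100]  # TB added in each of five years
everything_now = upfront_cost(sum(demand), 1000.0)
year_by_year = rolling_cost(demand, 1000.0, 0.35)
```

Under these assumptions the rolling purchases cost roughly half of the up-front purchase, which is why uncertainty about when the capacity is needed favours incremental procurement.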

31

Disaster tolerance and the organisation
of storage clusters









• One can obtain commercial disaster recovery (DR) solutions for common equipment configurations
• However, one cannot obtain such solutions for systems holding several hundred terabytes
• So we must build DR into the design of the system
• A single-site solution, subject to a common-mode disaster, would suffer considerable loss of availability after a disaster, and so is not acceptable
• This implies that we need a multi-site solution
• Conventionally these are based on a master-standby pair, where only 50% of the kit is delivering normal service
• Our design is based on the use of multiple autonomous independent peer clusters that cross-synchronise, so 100% of the kit delivers normal service

32

DOM architecture in the context of the
storage solution market


• The dominant segment of the market focuses on delivering performance within a highly resilient single cluster
• However:
  - Many of our objects will be rarely accessed
  - so we do not want to pay for “maximised” performance we do not need
  - We have resilience by using multiple clusters, hence we have a reduced need for resilience within a cluster
  - so we do not want to pay for “maximised” resilience we do not need
• We are using these drivers to design a cost-effective large scale resilient solution

33

DOM storage subsystem architecture - overview

[Diagram: two storage clusters, each comprising a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage. Both use the DOM Shared Services: Unique ID, Signing, Logging.]

34

DOM storage subsystem architecture - access

Normal access/delivery is from the local storage cluster.

[Diagram: two storage clusters (each with DOM Storage gateway, DOM Storage Service and DOM Physical Storage) and the DOM central shared services (Unique ID, Signing, Logging); the access path runs through the local cluster.]

35

DOM storage subsystem architecture - access

When a cluster is off-line, access/delivery is from a remote storage cluster.

[Diagram: two storage clusters (each with DOM Storage gateway, DOM Storage Service and DOM Physical Storage) and the DOM central shared services (Unique ID, Signing, Logging); the access path runs through the remote cluster.]

36

DOM storage subsystem architecture - ingest

Normal ingest is to the local storage cluster, and then the remote cluster is synchronised.

[Diagram: two storage clusters (each with DOM Storage gateway, DOM Storage Service and DOM Physical Storage) and the DOM central shared services (Unique ID, Signing, Logging). Arrows: Store; Synchronise remote store.]

37

DOM storage subsystem architecture - ingest

When a cluster is off-line, ingest is managed by the remote storage cluster and the local cluster is synchronised later.

[Diagram: two storage clusters (each with DOM Storage gateway, DOM Storage Service and DOM Physical Storage) and the DOM central shared services (Unique ID, Signing, Logging). Arrows: Store; Synchronise remote store later.]
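
The four access and ingest cases above can be sketched with two in-memory peer clusters. This is an illustrative model only: the class names, the `pending` queue and the `resynchronise` function are hypothetical and say nothing about how the DOM gateways are actually built.

```python
class Cluster:
    def __init__(self, name):
        self.name = name
        self.online = True
        self.objects = {}   # object id -> bytes
        self.pending = []   # (peer, object id) copies still owed to a peer


class Gateway:
    """Routes requests to the local cluster, falling back to the remote peer."""

    def __init__(self, local, remote):
        self.local = local
        self.remote = remote

    def ingest(self, object_id, data):
        # Prefer the local cluster; use the remote one if local is off-line.
        if self.local.online:
            primary, secondary = self.local, self.remote
        else:
            primary, secondary = self.remote, self.local
        if not primary.online:
            raise RuntimeError("no storage cluster available")
        primary.objects[object_id] = data
        if secondary.online:
            secondary.objects[object_id] = data      # synchronise now
        else:
            primary.pending.append((secondary, object_id))  # synchronise later
        return primary.name

    def fetch(self, object_id):
        # Normal delivery is local; a remote cluster serves if local is off-line.
        for cluster in (self.local, self.remote):
            if cluster.online and object_id in cluster.objects:
                return cluster.objects[object_id]
        raise KeyError(object_id)


def resynchronise(cluster):
    """Push deferred copies to peers that have come back on-line."""
    still_pending = []
    for peer, object_id in cluster.pending:
        if peer.online:
            peer.objects[object_id] = cluster.objects[object_id]
        else:
            still_pending.append((peer, object_id))
    cluster.pending = still_pending
```

Because either cluster can accept ingest and serve delivery, both carry normal load, which is the contrast with a master-standby arrangement.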

38

In conclusion



• We plan for generations of physical storage
  - Migration from one generation to the next
  - Allow changes of supplier
  - Purchase incrementally in modest quantities
  - Move quickly when required
  - Be cost conscious
• We provide assurance that an object is held and re-presented as it was when it was ingested
• We are designing a cost-effective large scale resilient solution

In summary: we take a long term view

39

Major Programmes/4

Web Archiving Programme

40

Structure of Programme

Web Archiving Programme is a collaborative initiative, implemented broadly across two consortia
• UK Web Archiving Consortium
  - Developing a selective approach to web archiving, procuring a common web archiving infrastructure and software to begin archiving activities at the earliest opportunity
• International Internet Preservation Consortium
  - Developing advanced web archiving technologies for the long-term, large-scale, continuous crawling requirements enabled through legislation

41

UK Web Archiving Consortium
Developing a selective approach to web archiving
• Licence for PANDAS about to be signed with NLA
  - Sub-licences with consortium partners and contractor to follow
• ITT concluded with Magus Research winning the contract
  - Implement a common web archiving infrastructure (lots of Linux machines + PANDAS)
  - Provide customisation/development of PANDAS
  - Provide help desk and support

42

International Internet Preservation
Consortium
Developing advanced web archiving technologies
• Smart Crawler
  - Continuous adaptive crawler, adjusting crawl priority on the fly
  - Based on IA Heritrix
  - Working on requirements now
  - Expect to begin tender process in June
• Content Management
  - Archival formats
• Framework
• Metrics and Test Bed
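
One way to picture "adjusting crawl priority on the fly" is a frontier that revisits pages sooner when they are seen to change and later when they are not. The sketch below is an assumption-laden illustration; it is not the Heritrix API and not the consortium's design. The class name, the halving/doubling rule and the interval bounds are all hypothetical.

```python
import heapq


class AdaptiveFrontier:
    """Priority queue of URLs ordered by the time each is next due."""

    def __init__(self, min_interval=1.0, max_interval=64.0):
        self._heap = []       # entries are (next_due_time, url)
        self._interval = {}   # url -> current revisit interval
        self.min_interval = min_interval
        self.max_interval = max_interval

    def add(self, url, now=0.0, interval=8.0):
        self._interval[url] = interval
        heapq.heappush(self._heap, (now, url))

    def next_url(self):
        """Pop the URL that is due soonest."""
        return heapq.heappop(self._heap)

    def report(self, url, crawled_at, changed):
        """Re-queue a crawled URL, adapting its revisit interval."""
        interval = self._interval[url]
        if changed:
            interval = max(self.min_interval, interval / 2)  # visit sooner
        else:
            interval = min(self.max_interval, interval * 2)  # back off
        self._interval[url] = interval
        heapq.heappush(self._heap, (crawled_at + interval, url))
```

A real continuous crawler would also weigh politeness limits, site importance and storage budget; the sketch isolates only the re-prioritisation step.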

43

External Collaboration

44

Digital Library Collaborations/Partnerships
Current










• UK Digital Preservation Coalition
  - Founder Member
• TEL (The European Library Project)
• Web Archiving UK
  - JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library of Wales
• International Internet Preservation Consortium
  - BNF, Library of Congress, Internet Archive, National Archives & Library of Canada, National Library of Australia, National Library of Italy, National Libraries of Nordic Countries
• JISC Funded - Digital Curation Centre
• Persistent Identifiers
  - DOI Foundation, European National Libraries (KB & DDB)
• Resource Discovery
  - Union Catalogues (SUNCAT)
• Digital Library Federation

45

Digital Library Collaborations/Partnerships
Potential









• Secure Legal Deposit Network
  - 6 Legal Deposit Libraries
• Global Digital Format Registry
  - Potential Partners (National Archives, DLF)
• Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
  - KB (Netherlands National Library & Other Partners – FP6 Bid)
• Digital Rights Management
  - Potential Partners (Publishers, JISC)
• Metadata
  - Publishers, Others?
• Authentication
  - JISC?
• Resource Discovery
  - Search Engine Vendors, Researchers
• Others?

46

Conclusions
Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success




Can You Work With Us?

47


Slide 16

British Computer Society
North London Branch

Major Programmes

Richard Boulderstone
July 27, 2004
1

Agenda











The British Library
Vision
Our Audiences/Customers
ILS
Digitisation
Digital Object Management
Web Archiving
Collaboration
Conclusions

Magna Carta

www.bl.uk

2

What Is The British Library ?

Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of
Science and Invention (1855), National Central Library (1916), and National
Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India
Office Library and Records in 1982, and British Institute of Recorded Sound in
1983
• Flagship building at St Pancras - largest public building project in Great
Britain in 20th century - opened in 1998


www.bl.uk

3

World-Class Research Library
Key Statistics 2002/3













150 million items
8.2 million items consulted or supplied
408,000 reading room visits
618,000 catalogue records created
554,000 items received on legal deposit
651 km shelf capacity 92% full add 12 km each year
18.5M Web Site Hits (www.bl.uk)
2,400 staff
£85.2 million Grant in Aid and £27.0 million trading income in 2001/2
Annual report - http://www.bl.uk/about/annual/latest.html

www.bl.uk

4

‘The World’s Knowledge’
Outcome Based Vision
… by aiding scientific advances
… by adding commercial value for
businesses
… by contributing to UK “knowledge
economy”
… through the pursuit of academic
excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural
wellbeing
… by giving information relevant to their
interests
… by helping to find the next medical
breakthrough
… by creating a link between the past,
present and future

To help people
advance
knowledge to
enrich lives

www.bl.uk

5

High
R+D
Industries

Prof.
Services

Creative
Industries Publishing
Industries

Resource Discovery
Bespoke Services
Research Services
Document Supply
Reprographics
Innovation Centre

School
Students
Libraries Teachers 11>18
On-site Visits
School Tours
Web Learning

SMEs
EDUCATION

Lifelong
Learner

Exhibitions
Events
Tours
Publishing

Visitors
(child + adult)
Lifelong
Learner

PUBLIC

BUSINESS
Lifelong
Learner

Postgraduate/
Undergraduate
RESEARCHER

Librarians
Scholars

Lifelong
Learner

Reading Rooms
Bespoke Services
Reprographics
Publishing
Document Supply
Searching Tools

LIBRARIES

Commercial
Researcher

Public
Libraries

Broadcasting
e.g. BBC

Publishing
e.g. OED

Document Supply
Resource Discovery
Training
Best Practice

H.E.
Libraries

Public

www.bl.uk

6
2

Major Programmes/1
Da Vinci Notebook

Integrated Library System (ILS)
Programme

7

ILS: Development

Data migration
 Due to finish in a few days



16M+ BL records
10M+ records from other sources

Online ILS software
 All online changes made (mainly interfaces) – final tests
 Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
 Most ones done for go live
 Rest in priority order
www.bl.uk

8

ILS: Implementation

Training
 Courses to end-users well underway
 ‘Practice’ system available
 ‘Search only’ training also underway
Testing
 Functional testing (end to end) nearly complete
 Performance poor – OPAC very slow
 Automated stress testing (LoadRunner scripts)
 eIS trying to find area of problem
 Ex Libris experts flying over
 Some security ‘hardening’ needed
www.bl.uk

9

ILS: Cutover from legacy systems

Now: Temporary Aleph cataloguing
7 June: Phase 1 – internal processing
  • Staggered take-on of users to ease cutover problems
  • Merge ‘temporary’ records
30 June: Phase 2 – reading rooms
  • Reading rooms closed for cutover 26-29 June
  • Mainly brand-new PCs etc. rather than XP upgrade
30 July: Phase 3 – remote users
  • Could be delayed if there are major problems

10


11

Future ILS development (ILS/2)

Current ILS development seen as just the start

Extra records
  • E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
  • E.g. Preservation records
Links to other new BL systems
  • E.g. Digital Object Management (images, web pages etc)
New releases of Ex Libris packages

12

Major Programmes/2
[Image: International Dunhuang Project]

Digitisation Programme

13

Background

• Digitisation Is The Process Of Converting Existing Physical Items Into Digital Surrogates.
• Digitisation Projects Must Take Into Account Metadata Creation, Optical Character Recognition, Navigation, Display, Archiving, Preservation.
• Overall Cost Of Digitisation Projects Is Declining, From £10s/Image To Sub-£1/Image.
• Automatic Devices To Digitise Are Becoming Available But Are Expensive (~£150,000).
• BL Has Had A Fairly Ad Hoc Approach Driven By
  • External Funding Opportunities
  • Curator Interest
• Projects Have Generally Created Their Own Approach, IT Resources, Project Management
• BL Has Created About 1.5M Digital Images So Far…

14

Digitisation Strategy

Digitisation Strategy Project Was Formally Initiated On February 2, 2004

Key objectives for the project are to define:
• Selection Criteria
• Uniform Approach
• Communications Plan
• Sustainability
• Intellectual Property Rights
• External Relationship Management
• Funding
• Integration with DOMS

15

Project Status Information

• Definitive Register of Projects
  • 19 Complete
  • 19 Current
  • 20 Planning
• JISC Sound (3,900 Hours)
• JISC Newspapers (2M Pages of 750M Pages)
• Chopin (Collaborative Project)
• Early English Books Online

16

Major Programmes/3
[Image: Gutenberg Bible]

Digital Object Management (DOM) Programme

17

DOM Programme vision

Our mission is to enable the United Kingdom to preserve and use its digital intellectual property forever

Our vision is to create a management system for digital objects that will
• store and preserve any type of digital material in perpetuity
• provide access to this material to users with appropriate permissions
• ensure that the material is easy to find
• ensure that users can view the material with contemporary applications
• ensure that users can, where possible, experience material with the original look-and-feel

18

Introduction - history

Digital Library PFI
  • Mar 1997 – Dec 1998
Digital Library System
  • 1999 – early 2002
  • Lessons
DOM Report
  • Nov 2002
The DOM Programme
  • Started September 2003

19

Drivers for the BL DOM Programme

• Legal deposit legislation for non-print material was granted royal assent in October 2003
• Existing voluntary deposit scheme operational since 2000
• Storage of digitised masters from early ’90s onwards
• New digitisation initiatives: newspapers, sound, etc
• Sound archive receives 12 TB of material per year (with 50 year collection)
• Web archiving
• Cartography and datasets
• Electronic journals, picture library
• … and ….
• …. and ….

We need a generic and cost-effective approach for the secure long-term storage of digital material that is produced by numerous initiatives

20

DOM – many topics to address

[Chart: components plotted by ESTIMATED SIZE OF COMPONENT (low to high) against COMPONENT AMBIGUITY / COMPLEXITY (low to high). Legend: Started, Planned, Non-DOM projects, Planned co-operation.]

Components shown: ILS, Web Archiving, Digitisation Programme, LDEP, Workflow, Resource Discovery, LDLSE, VDEP, Rights Management, Metadata Definition, Technical Requirements, File Format Registry, Persistent Identifiers, File Conversion Utilities, Authentication, Prototypes, SDM, RADM, Interfaces, Strategy Development

LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications

21

Scope - life cycle of objects

• Collection
  • Selection
  • Acquisition
  • Accession
  • Description
• Preservation
  • Storage
  • Preservation
• Access
  • Resource discovery
  • Delivery
  • Rendering

22

Scope – objects and processes

• Preservation store
  • Preserves the bit stream in perpetuity
• Access store
  • Access versions
  • Limited formats – in the flavour of the era
• Metadata to support resource discovery
  • Descriptive, Administrative, Links with existing tools e.g. Integrated Library System (ILS)
• Workflow
  • Ingest, e.g. Legal Deposit processing

23

[Diagram: the DOM system and its content sources.]
DOM
  ACCESS: Resource Discovery, Delivery, Digital Rights Management
  Shared services: Signing, Authentication, Metadata, Persistent ID
  DOM Storage
  Ingest
Content sources: DONATIONS; DOCUMENT SUPPLY; LEGAL DEPOSIT (LDL Secure Environment, Legal Deposit Items, Legal Deposit Processing); WEB ARCHIVING; DIGITISATION (St Pancras Studios, Newspapers, NSA)
Other labels: Publishers, Archives, Grey Literature, Non-Serial Store, Archiving, Operational Stores

24

Timeline (2003 – 2005)

BC – ET approve Business Case & Timeline
R0 – Definition
  • Prototype will provide a basic preservation-quality digital object storage module
R1 – Operational Storage Sub System
  • Consolidate R0 into operational system
  • Provide preservation-quality digital store for materials received under Voluntary Deposit of Electronic Publications (VDEP)
  • Integrate it with the existing VDEP front-end
R2 – 1st Content Stream ingest
  • Support ingest for a major content stream
  • Integrate with core Library systems as required
R3 & R4 – LDEP, initial format
  • Provide functionality for material covered by LDEP secondary legislation
R5+ – Open DOM to new projects

25

DOM: Project definition - 1

Functional architecture “What”
  • Prototyping – assessing market solutions
  • Example issues: digital rights, file formats, etc
Logical architecture “how – overall architecture”
  • Prototyping – basic functioning architecture
  • Example issues: allow changes to new suppliers, relationships to ILS, other projects etc
Physical architecture “how – storage & specifics”
  • Prototyping – principal solutions and options
  • Example issues: how do we build it cost-effectively today, supplier selection criteria

• Business case
• Planning – incremental implementation phases

Cross team workshops – reviewing progress, debating detailed technical issues, planning immediate priorities, risk management & way forward

26

DOM: Project definition - 2

• Approach is to be incremental and not ‘Big Bang’
• We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a good solution
• A principal goal is to define:
  • An overall long-term “logical architecture”
  • Within which there will be successive generations of physical architectures
• We are building an understanding of the storage marketplace, and we will use that knowledge to manage procurement
• We are certain that we will need >500 TB of storage but we are uncertain when – we thus need flexible, scalable procurement

27

DOM architecture - overview

[Diagram: Resource Discovery (ILS, Non-cat based RD, Others), Rights Management, LDEP and Doc supply refer to each OBJECT by its DOMID. The DOM Storage Service handles compound objects/relations, integrity and authenticity. DOM Physical Storage holds atomic objects, each found through a local resource locator (LRL).]
• Each object has a unique persistent identifier (DOMID)
• DOMID is mapped to node/vol/LRL
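
The mapping from DOMID to node/vol/LRL can be pictured with a minimal sketch. The slide gives no implementation detail, so the `Location` type, the `locations` table, the identifier format and the `resolve`/`migrate` helpers are all hypothetical names chosen for illustration. The point is only that the persistent identifier stays fixed while the physical location behind it can change.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    node: str   # storage node holding the object
    vol: str    # volume on that node
    lrl: str    # local resource locator within the volume


# Hypothetical resolver table: DOMID -> current physical location
locations = {
    "domid:0001": Location("node-a", "vol-03", "2004/07/0001.tif"),
}


def resolve(domid: str) -> Location:
    """Look up where an object currently lives."""
    return locations[domid]


def migrate(domid: str, new_location: Location) -> None:
    """Move an object to new storage; callers keep using the same DOMID."""
    locations[domid] = new_location


# The object moves to a different node, but its identifier does not change.
migrate("domid:0001", Location("node-b", "vol-01", "0001.tif"))
current = resolve("domid:0001")
```

Keeping the indirection in one table is what would let later generations of physical storage replace earlier ones without touching catalogue records that cite the DOMID.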

28

DOM System (release 3)

[Diagram labels: Aleph, Storage subsystem (shown twice), Access, Mailroom, Publishers, Shared services, Administration, DOM System.]

29

DOM logical architecture – integrity and authenticity

• Integrity:
  • System has capability to continuously monitor the object store to detect object corruption
  • It would then initiate object recovery
• Authenticity:
  • A process is defined to provide long-term assurance that an object that is re-presented is as it was when it was ingested
  • Based on the use of cryptographic signing techniques
  • Each object is signed when it is ingested
  • The signature is verified when required
  • The signing mechanism is “tightly” controlled
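
The sign-at-ingest and verify-on-demand steps above can be sketched as follows. The slides do not name a signing scheme, so this illustration uses an HMAC from the Python standard library as a stand-in; a production system might well use public-key signatures instead. The key, the object IDs and the helper names are assumptions made for the example.

```python
import hashlib
import hmac

# Hypothetical signing key; a real system would keep keys under tight control
SIGNING_KEY = b"demo-key-not-for-production"


def sign_object(data: bytes, key: bytes = SIGNING_KEY) -> str:
    """Return a hex signature computed over the object's bytes at ingest."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify_object(data: bytes, signature: str, key: bytes = SIGNING_KEY) -> bool:
    """Check that the bytes re-presented match the signature made at ingest."""
    return hmac.compare_digest(sign_object(data, key), signature)


def find_corrupt(store: dict, signatures: dict) -> list:
    """Scan the store and list the IDs whose content no longer verifies."""
    return sorted(oid for oid, data in store.items()
                  if not verify_object(data, signatures[oid]))


# Ingest: each object is signed once, when it enters the store.
store = {"obj-1": b"Magna Carta scan", "obj-2": b"Sound recording"}
signatures = {oid: sign_object(data) for oid, data in store.items()}

# Monitoring: simulate corruption of one object, then scan the store.
store["obj-2"] = b"Sound recordinG"
corrupt = find_corrupt(store, signatures)
```

A scan like `find_corrupt` only detects damage; recovery would then copy a good version back from another cluster, which the sketch leaves out.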

30

Procuring physical storage in volume

• A major cost is in physical storage
• The market for storage systems is changing rapidly, and this implies that “lock-in” is not sensible
• We thus need flexibility to change supplier over time
• Cost of storage is reducing by 30-40% per year
• Hence procure on a rolling basis just ahead of demand
• Replace storage on a rolling basis on expiry of warranty
• The rolling programmes imply the need to be able to support a heterogeneous product solution
• The design of the logical architecture thus supports storage sourced from multiple storage vendors
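
A small worked example may make the case for rolling procurement concrete. The figures are hypothetical: a made-up starting price of 1000 per TB, 500 TB bought in five equal yearly steps, and a 35% yearly price decline (the midpoint of the 30-40% range quoted above). The sketch ignores warranty replacement and running costs.

```python
def upfront_cost(total_tb: float, price_per_tb: float) -> float:
    """Cost of buying all capacity in year 0 at today's price."""
    return total_tb * price_per_tb


def rolling_cost(yearly_tb: list, price_per_tb: float, annual_decline: float) -> float:
    """Cost of buying each year's increment at that year's (lower) price."""
    total = 0.0
    for year, tb in enumerate(yearly_tb):
        total += tb * price_per_tb * (1 - annual_decline) ** year
    return total


# Hypothetical demand profile: 100 TB needed in each of five years.
demand = [100, 100, 100, 100, 100]
up = upfront_cost(sum(demand), 1000)
roll = rolling_cost(demand, 1000, 0.35)
print(f"upfront: {up:.0f}  rolling: {roll:.0f}")
```

Under these assumptions the rolling purchase costs roughly half of the upfront one, and capacity that is never needed is never bought.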

31

Disaster tolerance and the organisation of storage clusters

• One can obtain commercial disaster recovery (DR) solutions for common equipment configurations
• However, one cannot obtain such solutions for systems holding multiple hundreds of terabytes
• So we must build DR into the design of the system
• A single-site solution, subject to a common-mode disaster, would suffer considerable loss of availability after a disaster, and so is not acceptable
• This implies that we need a multi-site solution
• Conventionally these are based on a master-standby arrangement where only 50% of the kit is delivering normal service
• Our design is based on the use of multiple autonomous independent peer clusters that cross-synchronise, so 100% of the kit delivers normal service

32

DOM architecture in the context of the storage solution market

• The dominant segment of the market focuses on delivering performance within a highly resilient single cluster
• However:
  • Many of our objects will be rarely accessed
    • so we do not want to pay for “maximised” performance we do not need
  • We have resilience by using multiple clusters, hence we have a reduced need for resilience within a cluster
    • so we do not want to pay for “maximised” resilience we do not need
• We are using these drivers to design a cost-effective large-scale resilient solution

33

DOM storage subsystem architecture - overview

[Diagram: two storage clusters, each with a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage. Both use the DOM Shared Services: Unique ID, Signing, Logging.]

34

DOM storage subsystem architecture - access

[Diagram: two storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical Storage) and the DOM shared (central) services: Unique ID, Signing, Logging.]

Normal access/delivery is from the local storage cluster

35

DOM storage subsystem architecture - access

[Diagram: two storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical Storage) and the DOM shared (central) services: Unique ID, Signing, Logging.]

When a cluster is off-line, access/delivery is from a remote storage cluster

36

DOM storage subsystem architecture - ingest

[Diagram: two storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical Storage) and the DOM shared (central) services: Unique ID, Signing, Logging. Arrows: Store; Synchronise remote store.]

Normal ingest is to the local storage cluster, and then the remote cluster is synchronised

37

DOM storage subsystem architecture - ingest

[Diagram: two storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical Storage) and the DOM shared (central) services: Unique ID, Signing, Logging. Arrows: Store; Synchronise remote store later.]

When a cluster is off-line, ingest is managed by the remote storage cluster and the local cluster is synchronised later
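
The access and ingest behaviour on the last four slides can be summarised in a short sketch. This is an illustration under simplifying assumptions, not the DOM design itself: two in-memory clusters, synchronous copying, and hypothetical names (`Cluster`, `ingest`, `synchronise`, `access`). Gateways, the shared services and conflict handling are left out.

```python
class Cluster:
    """One autonomous storage cluster holding objects keyed by DOMID."""

    def __init__(self, name: str):
        self.name = name
        self.online = True
        self.objects = {}     # DOMID -> bytes
        self.pending = []     # DOMIDs still to be copied to the peer


def synchronise(source: Cluster, peer: Cluster) -> None:
    """Copy pending objects to the peer; leave them queued if it is off-line."""
    if not peer.online:
        return
    for domid in source.pending:
        peer.objects[domid] = source.objects[domid]
    source.pending.clear()


def ingest(local: Cluster, remote: Cluster, domid: str, data: bytes) -> str:
    """Store on the local cluster if it is up, otherwise on the remote one."""
    target, peer = (local, remote) if local.online else (remote, local)
    if not target.online:
        raise RuntimeError("no cluster available")
    target.objects[domid] = data
    target.pending.append(domid)
    synchronise(target, peer)
    return target.name


def access(local: Cluster, remote: Cluster, domid: str) -> bytes:
    """Deliver from the local cluster, falling back to the remote one."""
    for cluster in (local, remote):
        if cluster.online and domid in cluster.objects:
            return cluster.objects[domid]
    raise KeyError(domid)


a, b = Cluster("site-A"), Cluster("site-B")
ingest(a, b, "dom-1", b"first")                # normal: stored at A, copied to B
a.online = False
stored_at = ingest(a, b, "dom-2", b"second")   # A is down: B takes the ingest
fallback = access(a, b, "dom-1")               # served by B while A is off-line
a.online = True
synchronise(b, a)                              # A catches up later
```

Because both clusters accept ingest and serve access in normal running, neither sits idle as a standby, which is the 100% utilisation argument made on the disaster tolerance slide.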

38

In conclusion

• We plan for generations of physical storage
  • Migration from one generation to the next
  • Allow changes of supplier
  • Purchase incrementally in modest quantities
  • Move quickly when required
  • Be cost conscious
• We provide assurance that an object is held and re-presented as it was when it was ingested
• We are designing a cost-effective large-scale resilient solution

In summary: we take a long-term view

39

Major Programmes/4

Web Archiving Programme

40

Structure of Programme

The Web Archiving Programme is a collaborative initiative, implemented broadly across two consortia
• UK Web Archiving Consortium
  • Developing a selective approach to web archiving, procuring a common web archiving infrastructure and software to begin archiving activities as early as possible
• International Internet Preservation Consortium
  • Developing advanced web archiving technologies for the long-term, large-scale, continuous crawling requirements enabled through legislation

41

UK Web Archiving Consortium
Developing a selective approach to web archiving
• Licence for PANDAS about to be signed with NLA
• Sub-licences with consortium partners and contractor to follow
• ITT concluded with Magus Research winning the contract
  • Implement a common web archiving infrastructure (lots of Linux machines + PANDAS)
  • Provide customisation/development of PANDAS
  • Provide help desk and support

42

International Internet Preservation Consortium
Developing advanced web archiving technologies
• Smart Crawler
  • Continuous adaptive crawler, adjusting crawl priority on the fly
  • Based on IA Heritrix
  • Working on requirements now
  • Expect to begin tender process in June
• Content Management
  • Archival formats
• Framework
• Metrics and Test Bed
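
To give a feel for "adjusting crawl priority on the fly", here is a minimal sketch of a re-prioritisable crawl queue. It is not Heritrix's frontier and does not use its API; the `CrawlQueue` class, the URLs and the priority values are hypothetical, and the rule that a changed page gets a higher priority is an assumption made for the example.

```python
import heapq


class CrawlQueue:
    """Priority queue of URLs whose priorities can change between fetches."""

    def __init__(self):
        self._heap = []        # entries are (negated priority, url)
        self._priority = {}    # url -> current priority

    def schedule(self, url: str, priority: float) -> None:
        """Add a URL, or re-prioritise one that is already queued."""
        self._priority[url] = priority
        heapq.heappush(self._heap, (-priority, url))

    def next_url(self) -> str:
        """Pop the highest-priority URL, skipping stale heap entries."""
        while self._heap:
            neg, url = heapq.heappop(self._heap)
            if self._priority.get(url) == -neg:
                del self._priority[url]
                return url
        raise IndexError("crawl queue is empty")


queue = CrawlQueue()
queue.schedule("http://example.org/news", 1.0)
queue.schedule("http://example.org/about", 2.0)
# Hypothetical rule: the news page was seen to change, so raise its priority.
queue.schedule("http://example.org/news", 5.0)
order = [queue.next_url(), queue.next_url()]
```

Old heap entries are left in place and ignored when popped, which keeps re-prioritisation cheap at the cost of some wasted heap space.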

43

External Collaboration

44

Digital Library Collaborations/Partnerships
Current

• UK Digital Preservation Coalition
  • Founder Member
• TEL (The European Library Project)
• Web Archiving UK
  • JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library Of Wales
• International Internet Preservation Consortium
  • BNF, Library Of Congress, Internet Archive, National Archives & Library Of Canada, National Library Of Australia, National Library Of Italy, National Libraries Of Nordic Countries
• JISC Funded - Digital Curation Centre
• Persistent Identifiers
  • DOI Foundation, European National Libraries (KB & DDB)
• Resource Discovery
  • Union Catalogues (SUNCAT)
• Digital Library Federation

45

Digital Library Collaborations/Partnerships
Potential

• Secure Legal Deposit Network
  • 6 Legal Deposit Libraries
• Global Digital Format Registry
  • Potential Partners (National Archives, DLF)
• Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
  • KB (Netherlands National Library & Other Partners – FP6 Bid)
• Digital Rights Management
  • Potential Partners (Publishers, JISC)
• Metadata
  • Publishers, Others ?
• Authentication
  • JISC ?
• Resource Discovery
  • Search Engine Vendors, Researchers
• Others ???

46

Conclusions

• Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success

Can You Work With Us?

47


Slide 17

British Computer Society
North London Branch

Major Programmes

Richard Boulderstone
July 27, 2004
1

Agenda











The British Library
Vision
Our Audiences/Customers
ILS
Digitisation
Digital Object Management
Web Archiving
Collaboration
Conclusions

Magna Carta

www.bl.uk

2

What Is The British Library ?

Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of
Science and Invention (1855), National Central Library (1916), and National
Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India
Office Library and Records in 1982, and British Institute of Recorded Sound in
1983
• Flagship building at St Pancras - largest public building project in Great
Britain in 20th century - opened in 1998


www.bl.uk

3

World-Class Research Library
Key Statistics 2002/3













150 million items
8.2 million items consulted or supplied
408,000 reading room visits
618,000 catalogue records created
554,000 items received on legal deposit
651 km shelf capacity 92% full add 12 km each year
18.5M Web Site Hits (www.bl.uk)
2,400 staff
£85.2 million Grant in Aid and £27.0 million trading income in 2001/2
Annual report - http://www.bl.uk/about/annual/latest.html

www.bl.uk

4

‘The World’s Knowledge’
Outcome Based Vision
… by aiding scientific advances
… by adding commercial value for
businesses
… by contributing to UK “knowledge
economy”
… through the pursuit of academic
excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural
wellbeing
… by giving information relevant to their
interests
… by helping to find the next medical
breakthrough
… by creating a link between the past,
present and future

To help people
advance
knowledge to
enrich lives

www.bl.uk

5

High
R+D
Industries

Prof.
Services

Creative
Industries Publishing
Industries

Resource Discovery
Bespoke Services
Research Services
Document Supply
Reprographics
Innovation Centre

School
Students
Libraries Teachers 11>18
On-site Visits
School Tours
Web Learning

SMEs
EDUCATION

Lifelong
Learner

Exhibitions
Events
Tours
Publishing

Visitors
(child + adult)
Lifelong
Learner

PUBLIC

BUSINESS
Lifelong
Learner

Postgraduate/
Undergraduate
RESEARCHER

Librarians
Scholars

Lifelong
Learner

Reading Rooms
Bespoke Services
Reprographics
Publishing
Document Supply
Searching Tools

LIBRARIES

Commercial
Researcher

Public
Libraries

Broadcasting
e.g. BBC

Publishing
e.g. OED

Document Supply
Resource Discovery
Training
Best Practice

H.E.
Libraries

Public

www.bl.uk

6
2

Major Programmes/1
Da Vinci Notebook

Integrated Library System (ILS)
Programme

7

ILS: Development

Data migration
 Due to finish in a few days



16M+ BL records
10M+ records from other sources

Online ILS software
 All online changes made (mainly interfaces) – final tests
 Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
 Most ones done for go live
 Rest in priority order
www.bl.uk

8

ILS: Implementation

Training
 Courses to end-users well underway
 ‘Practice’ system available
 ‘Search only’ training also underway
Testing
 Functional testing (end to end) nearly complete
 Performance poor – OPAC very slow
 Automated stress testing (LoadRunner scripts)
 eIS trying to find area of problem
 Ex Libris experts flying over
 Some security ‘hardening’ needed
www.bl.uk

9

ILS: Cutover from legacy systems

Now:

Temporary Aleph cataloguing

Phase 1 – internal processing
 Staggered take-on of users to ease cutover problems
 Merge ‘temporary’ records

7 June:

Phase 2 – reading rooms
 Reading rooms closed for cutover 26-29 June
 Mainly brand-new PCs etc rather than XP upgrade

30 June:

Phase 3 – remote users
 Could be delayed major problems

30 July:

www.bl.uk

10

www.bl.uk

11

Future ILS development (ILS/2)

Current ILS development seen just as the start

Extra records
 E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
 E.g. Preservation records
Links to other new BL systems
 E.g. Digital Object Management (images, web pages etc)

New releases of Ex Libris packages

www.bl.uk

12

Major Programmes/2

International Dunhuang Project

Digitisation Programme

13

Background










Digitisation Is The Process Of Converting Existing Physical Items Into Digital
Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character
Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image to Sub £1/Image.
Automatic Devices To Digitise Are Becoming Availalble But Are Expensive ~£150,000.
BL Has Had Fairly Ad Hoc Approach Driven By
 External Funding Opportunities
 Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project
Management
BL Has Created About 1.5M Digital Images So Far…

www.bl.uk

14

Digitisation Strategy



Digitisation Strategy Project Was Formerly Initiated On February 2, 2004


Key objectives for the project are to define:










Selection Criteria
Uniform Approach
Communications Plan
Sustainability
Intellectual Property Rights
External Relationship Management
Funding
Integration with DOMS

www.bl.uk

15

Project Status Information


Definitive Register of Projects




19 Complete
19 Current
20 Planning



JISC Sound (3,900 Hours)



JISC Newspapers (2M Pages of 750M Pages)



Chopin (Collaborative Project)



Early English Books Online
www.bl.uk

16

Major Programmes/3

Gutenberg Bible

Digital Object Management (DOM)
Programme

17

DOM Programme vision
Our mission is to enable the United Kingdom to preserve
and use its digital intellectual property forever

Our vision is create a management system for digital objects
that will
 store and preserve any type of digital material in perpetuity
 provide access to this material to users with appropriate permissions
 ensure that the material is easy to find
 ensure that users can view the material with contemporary applications
 ensure that users can, where possible, experience material with the original look-and-feel

www.bl.uk

18

Introduction - history

Digital Library PFI
 Mar 1997 – Dec 1998
Digital Library System
 1999 – early 2002
 Lessons
DOM Report
 Nov 2002
The DOM Programme
 Started September 2003

www.bl.uk

19

Drivers for the BL DOM Programme












Legal deposit legislation for non-print material was granted royal assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12T of material per year (with 50 year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….

We need a generic and cost-effective approach for the secure long term
storage of digital material that is produced by numerous initiatives

www.bl.uk

20

DOM – many topics to address
HIGH

ILS
WEB
ARCHIVING

DIGITISATION
PROGRAMME

ESTIMATED SIZE OF COMPONENT

LDEP

WORKFLOW
RESOURCE
DISCOVERY
LDLSE

VDEP

RIGHTS
MANAGEMENT
METADATA
DEFINITION

TECHNICAL
REQUIREMENTS

FILE
FORMAT
REGISTRY

PERSISTENT
IDENTIFIERS

FILE
CONVERSION
UTILITIES

AUTHENTICATION
PROTOTYPES

SDM
RADM

INTERFACES

STRATEGY
DEVELOPMENT

LOW
LOW

COMPONENT AMBIGUITY / COMPLEXITY

Started

Planned

Non-DOM projects

Planned co-operation

HIGH
LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications

www.bl.uk

21

Scope - life cycle of objects


Collection
 Selection
 Acquisition
 Accession
 Description
 Preservation
 Storage
 Preservation
 Access
 Resource discovery
 Delivery
 Rendering
www.bl.uk

22

Scope – objects and processes


Preservation store
 Preserves the bit stream in perpetuity
 Access store
 Access versions
 Limited formats – in the flavour of the era
 Metadata to support resource discovery
 Descriptive, Administrative, Links with existing tools e.g. Integrated
Library System (ILS)
 Workflow
 Ingest, e.g. Legal Deposit processing

www.bl.uk

23

ACCESS

DOM

Resource Discovery

Delivery

Digital Rights Management

Shared services

DOM
Storage

Signing
Authentication
Metadata
Persistent ID

Ingest

DONATIONS

DOCUMENT SUPPLY
Publishers
Archives
Grey Literature

Non-Serial
Store
Archiving
Operational
Stores

LEGAL DEPOSIT

LDL Secure
Environment

Legal Deposit
Items
Legal Deposit
Processing

WEB
ARCHIVING

DIGITISATION
St Pancras
Studios

Newspapers

NSA
www.bl.uk

24

Timeline

Prototype will provide a
basic preservationquality digital object
storage module

Definition. R0

ET approve
Business Case
& Timeline

•Consolidate R0 into operational
system
•Provide preservation-quality digital
store for materials received under
Voluntary Deposit of Electronic
Publications (VDEP)
•Integrate it with the existing VDEP
front-end

BC
R0
Operat’l Storage
Sub System. R1

•Support ingest for a
major content stream
•Integrate with core
Library systems as
required

R1

1st Content
Stream ingest. R2

R2

LDEP - initial
format. R3 & R4

Provide functionality
for material covered by
LDEP secondary
legislation

R3
R4

Open DOM to new
projects. R5+

2003

2004

2005
www.bl.uk

25

DOM: Project definition - 1
Example issues
Functional Architecture “What”

digital rights, file
formats, etc

Prototyping – assessing market solutions

allow changes to
new suppliers,
relationships to ILS,
other projects etc

Logical architecture “how – overall architecture”
Prototyping - basic functioning architecture

how do we build it
cost-effectively
today, supplier
selection criteria

Physical architecture “how – storage & specifics”
Prototyping - principal solutions and options

• Business case
• Planning – incremental
implementation phases

Cross team workshops – reviewing progress, debating detailed technical
issues, planning immediate priorities, risk management & way forward
www.bl.uk

26

DOM: Project definition - 2



Approach is to be incremental and not ‘Big Bang’
We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a
good solution



A principal goal is to define:
 An overall long term “logical architecture”
 Within which, there will be successive generations of physical architectures



We are understanding the storage marketplace, and we will use the knowledge to manage
procurement
We are certain that we will need >500T of storage but we are uncertain when – we thus need
flexible scalable procurement



www.bl.uk

27

DOM architecture - overview
Resource Discovery
Non-cat
ILS
based RD

Others

Rights Management
LDEP

DOMID

OBJECT

DOM Storage Service

Compound
objects/relations
Integrity

Authenticity

Local resource locator

Object

Atomic
Objects

Unique persistent
identifier (DOMID)

Doc
supply

DOMID is
mapped to
node/vol/LRL

DOM Physical Storage

www.bl.uk

28

DOM System (release 3)

Aleph

Storage subsystem
Storage
subsystem

Access

Mailroom

Publishers

Shared services

Administration
DOM System
www.bl.uk

29

DOM logical architecture –
integrity and authenticity


Integrity:
 System has capability to continuously monitor the object store to detect
object corruption
 It would then initiate object recovery
 Authenticity:
 A process is defined to provide long-term assurance that an object that
is re-presented is as it was when it was ingested
 Based on the use of cryptographic signing techniques
 Each object is signed when it is ingested
 The signature is verified when required
 The signing mechanism is “tightly” controlled

www.bl.uk

30

Procuring physical storage in volume









A major cost is in physical storage
The market for storage systems is changing rapidly, and this implies that “lock-in” is
not sensible
We thus need flexibility to change supplier over time
Cost of storage is reducing by 30-40% per year
Hence procure on rolling basis just ahead of demand
Replace storage on a rolling basis on expiry of warranty
The rolling programmes imply the need to be able to support a heterogeneous
product solution
The design of the logical architecture thus supports storage sourced from multiple
storage vendors

www.bl.uk

31

Disaster tolerance and the organisation
of storage clusters









One can obtain commercial disaster recovery (DR) solutions for common equipment
configurations
However one cannot obtain such solutions for systems comprising multi-100 Tb systems
So we must build in the need for DR into the design of the system
A single site solution, subject to a common-mode disaster, would suffer considerable loss
of availability after a disaster, and so is not acceptable
This implies that we need a multi-site solution
Conventionally these are based on a master-standby where only 50% of kit is delivering
normal service
Our design is based on the use of multiple autonomous independent peer clusters that
cross-synchronise
so 100% of the kit delivers normal service

www.bl.uk

32

DOM architecture in the context of the
storage solution market


The dominant segment of the market focuses on delivering performance within
a highly resilient single cluster
 However:
 Many of our objects will be rarely accessed
 so we do not want to pay for “maximised” performance we do not need
 We have resilience by using multiple clusters, hence we have a reduced
need for resilience within a cluster
 so we do not want to pay for “maximised” resilience we do not need
 We are using these drivers to design a cost-effective large scale resilient
solution

www.bl.uk

33

DOM storage subsystem architecture - overview

DOM
Storage
gateway

DOM
Storage
gateway

DOM
Storage
Service

DOM
Storage
Service

DOM
Physical
Storage
Storage cluster

DOM
Shared
Services
• Unique ID
• Signing
• Logging

DOM
Physical
Storage
Storage cluster

www.bl.uk

34

DOM storage subsystem architecture - access

DOM
Storage
gateway

Normal access/delivery
is from local storage
cluster

DOM
Storage
Service

DOM
Physical
Storage
Storage cluster

DOM
Storage
gateway

DOM
Storage
Service

••
••
••

DOM
DOM
Shared
central
Services
Unique
Unique ID
ID
Signing
Signing
Logging
Logging

DOM
Physical
Storage
Storage cluster

www.bl.uk

35

DOM storage subsystem architecture - access

DOM
Storage
gateway

When a cluster is off-line
then access/delivery is
from a remote storage
cluster

DOM
Storage
Service

DOM
Physical
Storage
Storage cluster

DOM
Storage
gateway

DOM
Storage
Service

••
••
••

DOM
DOM
Shared
central
Services
Unique
Unique ID
ID
Signing
Signing
Logging
Logging

DOM
Physical
Storage
Storage cluster

www.bl.uk

36

DOM storage subsystem architecture - ingest

DOM
Storage
gateway

DOM
Storage
Service

Normal ingest is to the
local storage cluster and
then the remote cluster
is synchronised

Synchronise remote store

Store

DOM
Physical
Storage
Storage cluster

••
••
••

DOM
DOM
Shared
central
Services
Unique
Unique ID
ID
Signing
Signing
Logging
Logging

DOM
Storage
gateway

DOM
Storage
Service

DOM
Physical
Storage
Storage cluster

www.bl.uk

37

DOM storage subsystem architecture - ingest

DOM
Storage
gateway

DOM
Storage
Service

When a cluster is off-line
then ingest is managed by
the remote storage cluster
and the local cluster is
synchronised later

Synchronise remote store later

Store

DOM
Physical
Storage
Storage cluster

••
••
••

DOM
DOM
Shared
central
Services
Unique
Unique ID
ID
Signing
Signing
Logging
Logging

DOM
Storage
gateway

DOM
Storage
Service

DOM
Physical
Storage
Storage cluster

www.bl.uk

38

Major Programmes/3

Gutenberg Bible

Digital Object Management (DOM)
Programme

17

DOM Programme vision
Our mission is to enable the United Kingdom to preserve
and use its digital intellectual property forever

Our vision is create a management system for digital objects
that will
 store and preserve any type of digital material in perpetuity
 provide access to this material to users with appropriate permissions
 ensure that the material is easy to find
 ensure that users can view the material with contemporary applications
 ensure that users can, where possible, experience material with the original look-and-feel

www.bl.uk

18

Introduction - history

Digital Library PFI
 Mar 1997 – Dec 1998
Digital Library System
 1999 – early 2002
 Lessons
DOM Report
 Nov 2002
The DOM Programme
 Started September 2003

www.bl.uk

19

Drivers for the BL DOM Programme












Legal deposit legislation for non-print material was granted royal assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12T of material per year (with 50 year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….

We need a generic and cost-effective approach for the secure long term
storage of digital material that is produced by numerous initiatives

www.bl.uk

20

DOM – many topics to address
HIGH

ILS
WEB
ARCHIVING

DIGITISATION
PROGRAMME

ESTIMATED SIZE OF COMPONENT

LDEP

WORKFLOW
RESOURCE
DISCOVERY
LDLSE

VDEP

RIGHTS
MANAGEMENT
METADATA
DEFINITION

TECHNICAL
REQUIREMENTS

FILE
FORMAT
REGISTRY

PERSISTENT
IDENTIFIERS

FILE
CONVERSION
UTILITIES

AUTHENTICATION
PROTOTYPES

SDM
RADM

INTERFACES

STRATEGY
DEVELOPMENT

LOW
LOW

COMPONENT AMBIGUITY / COMPLEXITY

Started

Planned

Non-DOM projects

Planned co-operation

HIGH
LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications

www.bl.uk

21

Scope - life cycle of objects


Collection
 Selection
 Acquisition
 Accession
 Description
 Preservation
 Storage
 Preservation
 Access
 Resource discovery
 Delivery
 Rendering
www.bl.uk

22

Scope – objects and processes


Preservation store
 Preserves the bit stream in perpetuity
 Access store
 Access versions
 Limited formats – in the flavour of the era
 Metadata to support resource discovery
 Descriptive, Administrative, Links with existing tools e.g. Integrated
Library System (ILS)
 Workflow
 Ingest, e.g. Legal Deposit processing

www.bl.uk

23

ACCESS

DOM

Resource Discovery

Delivery

Digital Rights Management

Shared services

DOM
Storage

Signing
Authentication
Metadata
Persistent ID

Ingest

DONATIONS

DOCUMENT SUPPLY
Publishers
Archives
Grey Literature

Non-Serial
Store
Archiving
Operational
Stores

LEGAL DEPOSIT

LDL Secure
Environment

Legal Deposit
Items
Legal Deposit
Processing

WEB
ARCHIVING

DIGITISATION
St Pancras
Studios

Newspapers

NSA
www.bl.uk

24

Timeline

Prototype will provide a
basic preservationquality digital object
storage module

Definition. R0

ET approve
Business Case
& Timeline

•Consolidate R0 into operational
system
•Provide preservation-quality digital
store for materials received under
Voluntary Deposit of Electronic
Publications (VDEP)
•Integrate it with the existing VDEP
front-end

BC
R0
Operat’l Storage
Sub System. R1

•Support ingest for a
major content stream
•Integrate with core
Library systems as
required

R1

1st Content
Stream ingest. R2

R2

LDEP - initial
format. R3 & R4

Provide functionality
for material covered by
LDEP secondary
legislation

R3
R4

Open DOM to new
projects. R5+

2003

2004

2005
www.bl.uk

25

DOM: Project definition - 1
Example issues
Functional Architecture “What”

digital rights, file
formats, etc

Prototyping – assessing market solutions

allow changes to
new suppliers,
relationships to ILS,
other projects etc

Logical architecture “how – overall architecture”
Prototyping - basic functioning architecture

how do we build it
cost-effectively
today, supplier
selection criteria

Physical architecture “how – storage & specifics”
Prototyping - principal solutions and options

• Business case
• Planning – incremental
implementation phases

Cross team workshops – reviewing progress, debating detailed technical
issues, planning immediate priorities, risk management & way forward
www.bl.uk

26

DOM: Project definition - 2



• Approach is to be incremental and not ‘Big Bang’
• We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a good solution
• A principal goal is to define:
  • An overall long term “logical architecture”
  • Within which, there will be successive generations of physical architectures
• We are building an understanding of the storage marketplace, and we will use that knowledge to manage procurement
• We are certain that we will need >500 TB of storage but we are uncertain when, so we need flexible, scalable procurement




27

DOM architecture - overview
[Diagram. Labels recoverable from the slide:]
• Resource Discovery (ILS based RD, Non-cat, Others); Rights Management; LDEP; Doc supply
• OBJECT, referred to by DOMID
• DOM Storage Service: compound objects/relations, integrity, authenticity
• Object: atomic objects, each with a unique persistent identifier (DOMID) and a local resource locator
• DOM Physical Storage: DOMID is mapped to node/vol/LRL
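
The mapping of a DOMID to node/vol/LRL can be pictured as a lookup table that is updated when storage is replaced, while the identifier itself never changes. The following is only a minimal sketch under that assumption; the class names, field names and identifier strings are hypothetical and do not come from the DOM design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    node: str    # storage node holding the object (hypothetical naming)
    volume: str  # volume on that node
    lrl: str     # local resource locator within the volume


class DomidResolver:
    """Maps persistent DOMIDs to their current physical location."""

    def __init__(self):
        self._table = {}

    def register(self, domid, location):
        # Record where a newly ingested object has been stored.
        self._table[domid] = location

    def migrate(self, domid, new_location):
        # The DOMID stays fixed; only the mapping changes when storage is replaced.
        if domid not in self._table:
            raise KeyError(domid)
        self._table[domid] = new_location

    def resolve(self, domid):
        return self._table[domid]
```

Callers that hold only a DOMID are unaffected by a move from one generation of physical storage to the next, because only the resolver's table is rewritten.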


28

DOM System (release 3)

[Diagram. Labels recoverable from the slide: Aleph, Publishers, Mailroom, Access, Storage subsystem, Shared services, Administration, DOM System]

29

DOM logical architecture –
integrity and authenticity


• Integrity:
  • System has capability to continuously monitor the object store to detect object corruption
  • It would then initiate object recovery
• Authenticity:
  • A process is defined to provide long-term assurance that an object that is re-presented is as it was when it was ingested
  • Based on the use of cryptographic signing techniques
  • Each object is signed when it is ingested
  • The signature is verified when required
  • The signing mechanism is “tightly” controlled
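
The sign-on-ingest, verify-on-demand and monitor-then-recover steps above can be sketched in a few lines. This is an illustration only: the slide does not say which signing technique is used, so a keyed HMAC-SHA256 from the Python standard library stands in for it, and the class and method names are hypothetical.

```python
import hashlib
import hmac


class ObjectStore:
    def __init__(self, signing_key: bytes):
        self._key = signing_key   # assumed to be held only by the signing service
        self._objects = {}        # domid -> stored bytes
        self._signatures = {}     # domid -> signature recorded at ingest

    def _sign(self, data: bytes) -> str:
        return hmac.new(self._key, data, hashlib.sha256).hexdigest()

    def ingest(self, domid: str, data: bytes) -> None:
        # Each object is signed when it is ingested.
        self._objects[domid] = data
        self._signatures[domid] = self._sign(data)

    def verify(self, domid: str) -> bool:
        # The signature is verified when required.
        actual = self._sign(self._objects[domid])
        return hmac.compare_digest(self._signatures[domid], actual)

    def scan(self):
        # Integrity monitoring: list objects that no longer match their signature.
        return [d for d in self._objects if not self.verify(d)]

    def recover_from(self, domid: str, replica: "ObjectStore") -> None:
        # Object recovery: copy back a replica's bytes, but only if they still verify.
        if not replica.verify(domid):
            raise ValueError("replica copy is also corrupt")
        self._objects[domid] = replica._objects[domid]
```

A periodic job would call `scan()` and pass each reported DOMID to `recover_from()` with a second store that holds the same objects.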


30

Procuring physical storage in volume









• A major cost is in physical storage
• The market for storage systems is changing rapidly, and this implies that “lock-in” is not sensible
• We thus need flexibility to change supplier over time
• Cost of storage is reducing by 30-40% per year
• Hence procure on rolling basis just ahead of demand
• Replace storage on a rolling basis on expiry of warranty
• The rolling programmes imply the need to be able to support a heterogeneous product solution
• The design of the logical architecture thus supports storage sourced from multiple storage vendors
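
The case for buying just ahead of demand follows from simple arithmetic on the 30-40% annual price decline. The sketch below compares one upfront purchase with equal yearly tranches; the 500 TB total, the five-year horizon and the normalised unit price of 1.0 per TB are illustrative assumptions, not figures from the programme.

```python
def upfront_cost(total_tb, unit_price=1.0):
    # Buy all capacity today at today's price.
    return total_tb * unit_price


def rolling_cost(total_tb, years, annual_decline, unit_price=1.0):
    # Buy an equal tranche each year; the price falls by annual_decline per year.
    per_year = total_tb / years
    return sum(
        per_year * unit_price * (1 - annual_decline) ** year
        for year in range(years)
    )


if __name__ == "__main__":
    for decline in (0.30, 0.40):
        print(decline, upfront_cost(500), round(rolling_cost(500, 5, decline), 2))
```

Under these assumptions the rolling purchase costs about 277 units at a 30% decline and about 231 units at a 40% decline, against 500 units upfront.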


31

Disaster tolerance and the organisation
of storage clusters









• One can obtain commercial disaster recovery (DR) solutions for common equipment configurations
• However, one cannot obtain such solutions for systems holding multiple hundreds of terabytes
• So we must build DR into the design of the system
• A single site solution, subject to a common-mode disaster, would suffer considerable loss of availability after a disaster, and so is not acceptable
• This implies that we need a multi-site solution
• Conventionally these are based on a master-standby arrangement, where only 50% of kit is delivering normal service
• Our design is based on the use of multiple autonomous independent peer clusters that cross-synchronise, so 100% of the kit delivers normal service


32

DOM architecture in the context of the
storage solution market


• The dominant segment of the market focuses on delivering performance within a highly resilient single cluster
• However:
  • Many of our objects will be rarely accessed
    • so we do not want to pay for “maximised” performance we do not need
  • We have resilience by using multiple clusters, hence we have a reduced need for resilience within a cluster
    • so we do not want to pay for “maximised” resilience we do not need
• We are using these drivers to design a cost-effective large scale resilient solution


33

DOM storage subsystem architecture - overview

[Diagram: two storage clusters, each made up of a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, alongside DOM Shared Services (Unique ID, Signing, Logging).]


34

DOM storage subsystem architecture - access

Normal access/delivery is from local storage cluster.

[Diagram: two storage clusters, each with a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, plus DOM Shared (central) Services: Unique ID, Signing, Logging.]


35

DOM storage subsystem architecture - access

When a cluster is off-line then access/delivery is from a remote storage cluster.

[Diagram: two storage clusters, each with a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, plus DOM Shared (central) Services: Unique ID, Signing, Logging.]


36

DOM storage subsystem architecture - ingest

Normal ingest is to the local storage cluster and then the remote cluster is synchronised.

[Diagram: two storage clusters, each with a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, plus DOM Shared (central) Services: Unique ID, Signing, Logging. Arrows labelled “Store” and “Synchronise remote store”.]


37

DOM storage subsystem architecture - ingest

When a cluster is off-line then ingest is managed by the remote storage cluster and the local cluster is synchronised later.

[Diagram: two storage clusters, each with a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, plus DOM Shared (central) Services: Unique ID, Signing, Logging. Arrows labelled “Store” and “Synchronise remote store later”.]
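
The access and ingest behaviour described on the last four slides (serve from the local cluster, fall back to the remote peer, and synchronise a returning cluster later) can be sketched as follows. This is a toy model under stated assumptions: the `Cluster`, `Gateway` and `resynchronise` names are hypothetical, storage is an in-memory dictionary, and only two peers are modelled.

```python
class Cluster:
    def __init__(self, name):
        self.name = name
        self.online = True
        self.objects = {}   # domid -> bytes
        self.pending = []   # (peer, domid) pairs waiting for that peer to return


class Gateway:
    """Routes requests to the local cluster, falling back to the remote peer."""

    def __init__(self, local, remote):
        self.local = local
        self.remote = remote

    def ingest(self, domid, data):
        if self.local.online:
            primary, peer = self.local, self.remote
        else:
            primary, peer = self.remote, self.local
        if not primary.online:
            raise RuntimeError("no storage cluster available")
        primary.objects[domid] = data
        if peer.online:
            peer.objects[domid] = data             # synchronise straight away
        else:
            primary.pending.append((peer, domid))  # synchronise later

    def access(self, domid):
        # Prefer the local cluster; use the remote one if local is off-line.
        for cluster in (self.local, self.remote):
            if cluster.online and domid in cluster.objects:
                return cluster.objects[domid]
        raise KeyError(domid)


def resynchronise(cluster):
    """Push objects queued while a peer was off-line; keep any still undeliverable."""
    still_pending = []
    for peer, domid in cluster.pending:
        if peer.online:
            peer.objects[domid] = cluster.objects[domid]
        else:
            still_pending.append((peer, domid))
    cluster.pending = still_pending
```

Because both peers accept reads and writes in normal operation, neither sits idle as a standby; the queue in `pending` is what lets a cluster catch up after an outage.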


38

In conclusion



• We plan for generations of physical storage
  • Migration from one generation to the next
  • Allow changes of supplier
  • Purchase incrementally in modest quantities
  • Move quickly when required
  • Be cost conscious
• We provide assurance that an object is held and re-presented as it was when it was ingested
• We are designing a cost-effective large scale resilient solution

In summary: we take a long term view


39

Major Programmes/4

Web Archiving Programme

40

Structure of Programme

Web Archiving Programme is a collaborative initiative, broadly implemented across two consortia
• UK Web Archiving Consortium
  • Developing a selective approach to web archiving, procuring a common web archiving infrastructure and software to begin archiving activities at the earliest opportunity
• International Internet Preservation Consortium
  • Developing advanced web archiving technologies for the long-term, large-scale, continuous crawling requirements enabled through legislation


41

UK Web Archiving Consortium
Developing a selective approach to web archiving
• Licence for PANDAS about to be signed with NLA
• Sub-licences with consortium partners and contractor to follow
• ITT concluded with Magus Research winning the contract:
  • Implement a common web archiving infrastructure (lots of Linux machines + PANDAS)
  • Provide customisation/development of PANDAS
  • Provide help desk and support


42

International Internet Preservation
Consortium
Developing advanced web archiving technologies
• Smart Crawler
  • Continuous adaptive crawler, adjusting crawl priority on the fly
  • Based on IA Heritrix
  • Working on requirements now
  • Expect to begin tender process in June
• Content Management
• Archival formats
• Framework
• Metrics and Test Bed


43

External Collaboration

44

Digital Library Collaborations/Partnerships
Current










• UK Digital Preservation Coalition
  • Founder Member
• TEL (The European Library Project)
• Web Archiving UK
  • JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library Of Wales
• International Internet Preservation Consortium
  • BNF, Library Of Congress, Internet Archive, National Archives & Library Of Canada, National Library Of Australia, National Library Of Italy, National Libraries Of Nordic Countries
• JISC Funded - Digital Curation Centre
• Persistent Identifiers
  • DOI Foundation, European National Libraries (KB & DDB)
• Resource Discovery
  • Union Catalogues (SUNCAT)
• Digital Library Federation


45

Digital Library Collaborations/Partnerships
Potential









• Secure Legal Deposit Network
  • 6 Legal Deposit Libraries
• Global Digital Format Registry
  • Potential Partners (National Archives, DLF)
• Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
  • KB (Netherlands National Library & Other Partners – FP6 Bid)
• Digital Rights Management
  • Potential Partners (Publishers, JISC)
• Metadata
  • Publishers, Others ?
• Authentication
  • JISC ?
• Resource Discovery
  • Search Engine Vendors, Researchers
• Others ???


46

Conclusions
• Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success




Can You Work With Us?


47


Slide 21

British Computer Society
North London Branch

Major Programmes

Richard Boulderstone
July 27, 2004
1

Agenda











The British Library
Vision
Our Audiences/Customers
ILS
Digitisation
Digital Object Management
Web Archiving
Collaboration
Conclusions

Magna Carta

www.bl.uk

2

What Is The British Library ?

Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of
Science and Invention (1855), National Central Library (1916), and National
Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India
Office Library and Records in 1982, and British Institute of Recorded Sound in
1983
• Flagship building at St Pancras - largest public building project in Great
Britain in 20th century - opened in 1998


www.bl.uk

3

World-Class Research Library
Key Statistics 2002/3













150 million items
8.2 million items consulted or supplied
408,000 reading room visits
618,000 catalogue records created
554,000 items received on legal deposit
651 km shelf capacity 92% full add 12 km each year
18.5M Web Site Hits (www.bl.uk)
2,400 staff
£85.2 million Grant in Aid and £27.0 million trading income in 2001/2
Annual report - http://www.bl.uk/about/annual/latest.html

www.bl.uk

4

‘The World’s Knowledge’
Outcome Based Vision
… by aiding scientific advances
… by adding commercial value for
businesses
… by contributing to UK “knowledge
economy”
… through the pursuit of academic
excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural
wellbeing
… by giving information relevant to their
interests
… by helping to find the next medical
breakthrough
… by creating a link between the past,
present and future

To help people
advance
knowledge to
enrich lives

www.bl.uk

5

High
R+D
Industries

Prof.
Services

Creative
Industries Publishing
Industries

Resource Discovery
Bespoke Services
Research Services
Document Supply
Reprographics
Innovation Centre

School
Students
Libraries Teachers 11>18
On-site Visits
School Tours
Web Learning

SMEs
EDUCATION

Lifelong
Learner

Exhibitions
Events
Tours
Publishing

Visitors
(child + adult)
Lifelong
Learner

PUBLIC

BUSINESS
Lifelong
Learner

Postgraduate/
Undergraduate
RESEARCHER

Librarians
Scholars

Lifelong
Learner

Reading Rooms
Bespoke Services
Reprographics
Publishing
Document Supply
Searching Tools

LIBRARIES

Commercial
Researcher

Public
Libraries

Broadcasting
e.g. BBC

Publishing
e.g. OED

Document Supply
Resource Discovery
Training
Best Practice

H.E.
Libraries

Public

www.bl.uk

6
2

Major Programmes/1
Da Vinci Notebook

Integrated Library System (ILS)
Programme

7

ILS: Development

Data migration
 Due to finish in a few days



16M+ BL records
10M+ records from other sources

Online ILS software
 All online changes made (mainly interfaces) – final tests
 Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
 Most ones done for go live
 Rest in priority order
www.bl.uk

8

ILS: Implementation

Training
 Courses to end-users well underway
 ‘Practice’ system available
 ‘Search only’ training also underway
Testing
 Functional testing (end to end) nearly complete
 Performance poor – OPAC very slow
 Automated stress testing (LoadRunner scripts)
 eIS trying to find area of problem
 Ex Libris experts flying over
 Some security ‘hardening’ needed
www.bl.uk

9

ILS: Cutover from legacy systems

Now:

Temporary Aleph cataloguing

Phase 1 – internal processing
 Staggered take-on of users to ease cutover problems
 Merge ‘temporary’ records

7 June:

Phase 2 – reading rooms
 Reading rooms closed for cutover 26-29 June
 Mainly brand-new PCs etc rather than XP upgrade

30 June:

Phase 3 – remote users
 Could be delayed major problems

30 July:

www.bl.uk

10

www.bl.uk

11

Future ILS development (ILS/2)

Current ILS development seen just as the start

Extra records
 E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
 E.g. Preservation records
Links to other new BL systems
 E.g. Digital Object Management (images, web pages etc)

New releases of Ex Libris packages

www.bl.uk

12

Major Programmes/2

International Dunhuang Project

Digitisation Programme

13

Background

Digitisation Is The Process Of Converting Existing Physical Items Into Digital Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image To Sub-£1/Image.
Automatic Devices To Digitise Are Becoming Available But Are Expensive (~£150,000).
BL Has Had A Fairly Ad Hoc Approach Driven By
• External Funding Opportunities
• Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project Management
BL Has Created About 1.5M Digital Images So Far…

Digitisation Strategy

Digitisation Strategy Project Was Formally Initiated On February 2, 2004

Key objectives for the project are to define:
• Selection Criteria
• Uniform Approach
• Communications Plan
• Sustainability
• Intellectual Property Rights
• External Relationship Management
• Funding
• Integration with DOMS

Project Status Information

Definitive Register of Projects
• 19 Complete
• 19 Current
• 20 Planning
JISC Sound (3,900 Hours)
JISC Newspapers (2M Pages of 750M Pages)
Chopin (Collaborative Project)
Early English Books Online

Major Programmes/3

Gutenberg Bible

Digital Object Management (DOM)
Programme


DOM Programme vision
Our mission is to enable the United Kingdom to preserve and use its digital intellectual property forever

Our vision is to create a management system for digital objects that will
• store and preserve any type of digital material in perpetuity
• provide access to this material to users with appropriate permissions
• ensure that the material is easy to find
• ensure that users can view the material with contemporary applications
• ensure that users can, where possible, experience material with the original look-and-feel

Introduction - history

Digital Library PFI
• Mar 1997 – Dec 1998
Digital Library System
• 1999 – early 2002
• Lessons
DOM Report
• Nov 2002
The DOM Programme
• Started September 2003

Drivers for the BL DOM Programme

Legal deposit legislation for non-print material was granted Royal Assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc.
Sound archive receives 12 TB of material per year (with 50-year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and …

We need a generic and cost-effective approach for the secure long-term storage of digital material that is produced by numerous initiatives

DOM – many topics to address
[Chart: components plotted by estimated size of component (LOW to HIGH) against component ambiguity / complexity (LOW to HIGH). Legend: Started; Planned; Non-DOM projects; Planned co-operation.]

Components shown: ILS, Web Archiving, Digitisation Programme, LDEP, Workflow, Resource Discovery, LDLSE, VDEP, Rights Management, Metadata Definition, Technical Requirements, File Format Registry, Persistent Identifiers, File Conversion Utilities, Authentication, Prototypes, SDM, RADM, Interfaces, Strategy Development

LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications

Scope - life cycle of objects

Collection
• Selection
• Acquisition
• Accession
• Description
Preservation
• Storage
• Preservation
Access
• Resource discovery
• Delivery
• Rendering

Scope – objects and processes

Preservation store
• Preserves the bit stream in perpetuity
Access store
• Access versions
• Limited formats – in the flavour of the era
Metadata to support resource discovery
• Descriptive, Administrative, Links with existing tools, e.g. Integrated Library System (ILS)
Workflow
• Ingest, e.g. Legal Deposit processing

[Diagram: DOM in context]
Access
• Resource Discovery
• Delivery
• Digital Rights Management
DOM Storage, with shared services
• Signing
• Authentication
• Metadata
• Persistent ID
Ingest
• Donations
• Document Supply: Publishers, Archives, Grey Literature; Non-Serial Store, Archiving, Operational Stores
• Legal Deposit: LDL Secure Environment, Legal Deposit Items, Legal Deposit Processing
• Web Archiving
• Digitisation: St Pancras Studios, Newspapers, NSA

Timeline

[Timeline chart: releases laid out along a time axis running from 2003 to 2005]

Definition. R0
• Prototype will provide a basic preservation-quality digital object storage module
• ET approve Business Case & Timeline (BC milestone)
Operational Storage Sub System. R1
• Consolidate R0 into operational system
• Provide preservation-quality digital store for materials received under Voluntary Deposit of Electronic Publications (VDEP)
• Integrate it with the existing VDEP front-end
1st Content Stream ingest. R2
• Support ingest for a major content stream
• Integrate with core Library systems as required
LDEP - initial format. R3 & R4
• Provide functionality for material covered by LDEP secondary legislation
Open DOM to new projects. R5+

DOM: Project definition - 1
Functional Architecture “What”
• Prototyping – assessing market solutions
• Example issues: digital rights, file formats, etc.
Logical architecture “how – overall architecture”
• Prototyping – basic functioning architecture
• Example issues: allow changes to new suppliers, relationships to ILS, other projects, etc.
Physical architecture “how – storage & specifics”
• Prototyping – principal solutions and options
• Example issues: how do we build it cost-effectively today, supplier selection criteria

• Business case
• Planning – incremental implementation phases

Cross team workshops – reviewing progress, debating detailed technical issues, planning immediate priorities, risk management & way forward

DOM: Project definition - 2

Approach is to be incremental and not ‘Big Bang’
We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a good solution

A principal goal is to define:
• An overall long-term “logical architecture”
• Within which there will be successive generations of physical architectures

We are building an understanding of the storage marketplace, and we will use that knowledge to manage procurement
We are certain that we will need >500 TB of storage but we are uncertain when – we thus need flexible, scalable procurement

DOM architecture - overview
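[Diagram: DOM architecture overview]
Services that use DOM
• Resource Discovery: ILS, Non-cat based RD, Others
• Rights Management
• LDEP
• Doc supply
Passed between these services and the DOM Storage Service: DOMID, object
DOM Storage Service
• Compound objects/relations
• Integrity
• Authenticity
• Unique persistent identifier (DOMID)
• DOMID is mapped to node/vol/LRL
Passed between the DOM Storage Service and DOM Physical Storage: local resource locator, object
DOM Physical Storage
• Atomic objects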
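
The mapping of a DOMID to node/vol/LRL can be illustrated with a minimal sketch. Everything named here (`Location`, `DomidIndex`, the use of a random UUID as the identifier, the example paths) is an assumption made for illustration; the slides do not say how DOMIDs are generated or how the mapping is stored. The point shown is only that the identifier handed to client services stays fixed while the physical location behind it can change.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    """Where one copy of an object lives (all three fields are illustrative)."""
    node: str    # storage node or cluster name
    volume: str  # volume on that node
    lrl: str     # local resource locator within the volume


class DomidIndex:
    """Hypothetical index from a persistent DOMID to current physical locations."""

    def __init__(self) -> None:
        self._locations: dict[str, list[Location]] = {}

    def register(self, locations: list[Location]) -> str:
        # Assumption: a random UUID stands in for the real DOMID scheme.
        domid = uuid.uuid4().hex
        self._locations[domid] = list(locations)
        return domid

    def resolve(self, domid: str) -> list[Location]:
        # Clients only ever hold the DOMID; locations are looked up on demand.
        return list(self._locations[domid])

    def relocate(self, domid: str, old: Location, new: Location) -> None:
        # Moving an object (for example to a newer storage generation)
        # changes the mapping, never the DOMID itself.
        copies = self._locations[domid]
        copies[copies.index(old)] = new


index = DomidIndex()
first_home = Location("node-a", "vol1", "0001/object.bin")
domid = index.register([first_home])
index.relocate(domid, first_home, Location("node-b", "vol7", "9f/object.bin"))
print(domid, index.resolve(domid))
```

Keeping the indirection in one place is what would let physical storage be replaced without touching catalogue records that cite the DOMID.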


DOM System (release 3)

[Diagram: DOM System (release 3). Labelled elements: Aleph, Storage subsystem (shown twice), Access, Mailroom, Publishers, Shared services, Administration.]

DOM logical architecture –
integrity and authenticity


Integrity:
• System has the capability to continuously monitor the object store to detect object corruption
• It would then initiate object recovery
Authenticity:
• A process is defined to provide long-term assurance that an object that is re-presented is as it was when it was ingested
• Based on the use of cryptographic signing techniques
• Each object is signed when it is ingested
• The signature is verified when required
• The signing mechanism is “tightly” controlled
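
The sign-at-ingest, verify-on-demand and continuous-monitoring ideas above can be sketched as follows. This is a minimal sketch under stated assumptions: the slides do not name a signing scheme, so HMAC-SHA256 from the Python standard library stands in for it (a keyed MAC, not a public-key signature), the key is a demo value, and the in-memory dictionary `store` is a hypothetical stand-in for the object store.

```python
import hashlib
import hmac

# Assumption: a demo key. In the slides the signing mechanism is "tightly"
# controlled, which suggests the real key would live in a separate service.
SIGNING_KEY = b"demo-key-held-by-the-signing-service"


def sign(data: bytes) -> str:
    """Return a keyed digest of the object's bit stream."""
    return hmac.new(SIGNING_KEY, data, hashlib.sha256).hexdigest()


def ingest(store: dict, object_id: str, data: bytes) -> None:
    """Each object is signed when it is ingested."""
    store[object_id] = {"data": data, "signature": sign(data)}


def verify(store: dict, object_id: str) -> bool:
    """The signature is verified when required."""
    record = store[object_id]
    return hmac.compare_digest(record["signature"], sign(record["data"]))


def audit(store: dict) -> list:
    """One monitoring pass: list objects whose bits no longer match."""
    return [object_id for object_id in store if not verify(store, object_id)]


store = {}
ingest(store, "obj-1", b"first object")
ingest(store, "obj-2", b"second object")
store["obj-2"]["data"] = b"second 0bject"  # simulate silent corruption
print(audit(store))  # objects that would need recovery from another copy
```

An audit loop of this kind only detects corruption; recovery depends on a good copy being held elsewhere, which is where the multiple storage clusters come in.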


Procuring physical storage in volume

A major cost is in physical storage
The market for storage systems is changing rapidly, and this implies that “lock-in” is not sensible
We thus need flexibility to change supplier over time
Cost of storage is reducing by 30-40% per year
Hence procure on a rolling basis just ahead of demand
Replace storage on a rolling basis on expiry of warranty
The rolling programmes imply the need to be able to support a heterogeneous product solution
The design of the logical architecture thus supports storage sourced from multiple storage vendors
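
The case for rolling procurement follows from simple arithmetic on the 30-40% annual price decline. The sketch below compares buying all capacity at today's price with buying each year's tranche at that year's price. The demand profile (100 TB per year for five years), the starting price of 1,000 per TB and the 35% decline are illustrative assumptions, not BL figures, and warranty replacement is ignored.

```python
def upfront_cost(demand_tb_per_year, price_per_tb):
    """Buy the whole requirement in year 0 at today's price."""
    return sum(demand_tb_per_year) * price_per_tb


def rolling_cost(demand_tb_per_year, price_per_tb, annual_decline):
    """Buy each year's tranche at that year's (lower) price."""
    total = 0.0
    for year, tb in enumerate(demand_tb_per_year):
        total += tb * price_per_tb * (1 - annual_decline) ** year
    return total


# Illustrative assumptions only: 500 TB needed over five years.
demand = [100, 100, 100, 100, 100]
price_today = 1000.0  # arbitrary currency units per TB
decline = 0.35        # midpoint of the 30-40% range quoted above

print("upfront:", upfront_cost(demand, price_today))
print("rolling:", rolling_cost(demand, price_today, decline))
```

Under these assumptions the rolling purchase costs roughly half as much, and the gap widens the later the demand actually arrives, which matches the point that the timing of the >500 TB requirement is uncertain.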


Disaster tolerance and the organisation of storage clusters

One can obtain commercial disaster recovery (DR) solutions for common equipment configurations
However, one cannot obtain such solutions for systems of several hundred terabytes
So we must build DR into the design of the system
A single-site solution, subject to a common-mode disaster, would suffer considerable loss of availability after a disaster, and so is not acceptable
This implies that we need a multi-site solution
Conventionally these are based on a master-standby arrangement where only 50% of the kit is delivering normal service
Our design is based on the use of multiple autonomous independent peer clusters that cross-synchronise, so 100% of the kit delivers normal service

DOM architecture in the context of the storage solution market

The dominant segment of the market focuses on delivering performance within a highly resilient single cluster
However:
• Many of our objects will be rarely accessed, so we do not want to pay for “maximised” performance we do not need
• We have resilience by using multiple clusters, hence we have a reduced need for resilience within a cluster, so we do not want to pay for “maximised” resilience we do not need
We are using these drivers to design a cost-effective large-scale resilient solution

DOM storage subsystem architecture - overview

[Diagram: two storage clusters, each made up of a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage. Both use the DOM Shared Services: Unique ID, Signing, Logging.]

DOM storage subsystem architecture - access

Normal access/delivery is from the local storage cluster

[Diagram: two storage clusters, each made up of a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, with DOM Shared central Services (Unique ID, Signing, Logging). The access path runs through the local cluster.]

DOM storage subsystem architecture - access

When a cluster is off-line then access/delivery is from a remote storage cluster

[Diagram: two storage clusters, each made up of a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, with DOM Shared central Services (Unique ID, Signing, Logging). The access path runs through the remote cluster.]

DOM storage subsystem architecture - ingest

Normal ingest is to the local storage cluster and then the remote cluster is synchronised

[Diagram: two storage clusters, each made up of a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, with DOM Shared central Services (Unique ID, Signing, Logging). Arrows labelled “Store” and “Synchronise remote store”.]

DOM storage subsystem architecture - ingest

When a cluster is off-line then ingest is managed by the remote storage cluster and the local cluster is synchronised later

[Diagram: two storage clusters, each made up of a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, with DOM Shared central Services (Unique ID, Signing, Logging). Arrows labelled “Store” and “Synchronise remote store later”.]
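
The access and ingest behaviour on the preceding slides (prefer the local cluster, fall back to a remote peer, synchronise an off-line cluster later) can be sketched in a few lines. The class and method names are hypothetical, synchronisation is done in-process, and the backlog is held by the coordinating object purely for illustration; the slides do not describe the actual mechanism.

```python
class Cluster:
    """One autonomous storage cluster (in-memory stand-in)."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.objects = {}


class PeerStore:
    """Hypothetical coordinator for peer clusters that cross-synchronise."""

    def __init__(self, clusters):
        self.clusters = list(clusters)
        # Objects each cluster has missed while off-line.
        self.backlog = {c.name: {} for c in self.clusters}

    def _local_first(self, local):
        return [local] + [c for c in self.clusters if c is not local]

    def ingest(self, local, object_id, data):
        if not any(c.online for c in self.clusters):
            raise RuntimeError("no storage cluster is on-line")
        for cluster in self._local_first(local):
            if cluster.online:
                cluster.objects[object_id] = data  # store, then synchronise peers
            else:
                self.backlog[cluster.name][object_id] = data  # synchronise later

    def read(self, local, object_id):
        # Normal delivery is from the local cluster; otherwise use a peer.
        for cluster in self._local_first(local):
            if cluster.online and object_id in cluster.objects:
                return cluster.name, cluster.objects[object_id]
        raise KeyError(object_id)

    def bring_online(self, cluster):
        cluster.online = True
        cluster.objects.update(self.backlog[cluster.name])
        self.backlog[cluster.name].clear()


site_a, site_b = Cluster("site-a"), Cluster("site-b")
store = PeerStore([site_a, site_b])
store.ingest(site_a, "obj-1", b"bits")
site_a.online = False
store.ingest(site_a, "obj-2", b"more bits")  # handled by site-b
print(store.read(site_a, "obj-2"))           # served remotely
store.bring_online(site_a)
print(store.read(site_a, "obj-2"))           # served locally again
```

Because every cluster serves reads and accepts ingest, none of the kit sits idle as a standby, which is the contrast with master-standby drawn on the disaster tolerance slide.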


In conclusion

We plan for generations of physical storage
• Migration from one generation to the next
• Allow changes of supplier
• Purchase incrementally in modest quantities
• Move quickly when required
• Be cost conscious

We provide assurance that an object is held and re-presented as it was when it was ingested

We are designing a cost-effective large-scale resilient solution

In summary: we take a long-term view

Major Programmes/4

Web Archiving Programme


Structure of Programme

Web Archiving Programme is a collaborative initiative, implemented broadly across two consortia
UK Web Archiving Consortium
• Developing a selective approach to web archiving, procuring a common web archiving infrastructure and software to begin archiving activities at the earliest opportunity
International Internet Preservation Consortium
• Developing advanced web archiving technologies for the long-term, large-scale, continuous crawling requirements enabled through legislation

UK Web Archiving Consortium
Developing a selective approach to web archiving
• Licence for PANDAS about to be signed with NLA
• Sub-licences with consortium partners and contractor to follow
• ITT concluded with Magus Research winning the contract to:
  • Implement a common web archiving infrastructure (lots of Linux machines + PANDAS)
  • Provide customisation/development of PANDAS
  • Provide help desk and support

International Internet Preservation Consortium
Developing advanced web archiving technologies
Smart Crawler
• Continuous adaptive crawler, adjusting crawl priority on the fly
• Based on IA Heritrix
• Working on requirements now
• Expect to begin tender process in June
Content Management
Archival formats
Framework
Metrics and Test Bed
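
One way to picture a continuous crawler that adjusts crawl priority on the fly is a revisit queue whose intervals shrink for pages that change and grow for pages that do not. The sketch below is an illustration only: it is not the Heritrix API or the Smart Crawler design, and the `Frontier` class, the halving/doubling rule and the interval limits are all assumptions.

```python
import heapq

DAY = 86400.0  # seconds


class Frontier:
    """Hypothetical crawl frontier ordered by next-due time."""

    def __init__(self):
        self._heap = []      # (due_time, url), earliest first
        self._interval = {}  # url -> current revisit interval in seconds

    def add(self, url, now, interval=DAY):
        self._interval[url] = interval
        heapq.heappush(self._heap, (now, url))

    def pop(self):
        """Return (due_time, url) for the page that is most overdue."""
        return heapq.heappop(self._heap)

    def report(self, url, now, changed,
               min_interval=DAY / 24, max_interval=30 * DAY):
        """Reschedule after a fetch: revisit changed pages sooner."""
        interval = self._interval[url]
        if changed:
            interval = max(min_interval, interval / 2)
        else:
            interval = min(max_interval, interval * 2)
        self._interval[url] = interval
        heapq.heappush(self._heap, (now + interval, url))
        return interval


frontier = Frontier()
frontier.add("http://www.bl.uk/", now=0.0)
frontier.add("http://www.bl.uk/about/annual/latest.html", now=0.0)
due, url = frontier.pop()
frontier.report(url, now=0.0, changed=True)   # will come round again sooner
due, url = frontier.pop()
frontier.report(url, now=0.0, changed=False)  # pushed further out
print(frontier.pop())
```

A selective approach such as the one the UK consortium describes could use fixed schedules instead; the adaptive rule matters most for the large-scale continuous crawling that legislation would enable.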


External Collaboration


Digital Library Collaborations/Partnerships
Current

UK Digital Preservation Coalition
• Founder Member
TEL (The European Library Project)
Web Archiving UK
• JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library Of Wales
International Internet Preservation Consortium
• BNF, Library Of Congress, Internet Archive, National Archives & Library Of Canada, National Library Of Australia, National Library Of Italy, National Libraries Of Nordic Countries
JISC Funded - Digital Curation Centre
Persistent Identifiers
• DOI Foundation, European National Libraries (KB & DDB)
Resource Discovery
• Union Catalogues (SUNCAT)
Digital Library Federation

Digital Library Collaborations/Partnerships
Potential

Secure Legal Deposit Network
• 6 Legal Deposit Libraries
Global Digital Format Registry
• Potential Partners (National Archives, DLF)
Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
• KB (Netherlands National Library) & Other Partners – FP6 Bid
Digital Rights Management
• Potential Partners (Publishers, JISC)
Metadata
• Publishers, Others?
Authentication
• JISC?
Resource Discovery
• Search Engine Vendors, Researchers
Others?

Conclusions
Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success




Can You Work With Us?



Slide 23

British Computer Society
North London Branch

Major Programmes

Richard Boulderstone
July 27, 2004
1

Agenda











The British Library
Vision
Our Audiences/Customers
ILS
Digitisation
Digital Object Management
Web Archiving
Collaboration
Conclusions

Magna Carta

www.bl.uk

2

What Is The British Library ?

Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of
Science and Invention (1855), National Central Library (1916), and National
Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India
Office Library and Records in 1982, and British Institute of Recorded Sound in
1983
• Flagship building at St Pancras - largest public building project in Great
Britain in 20th century - opened in 1998


www.bl.uk

3

World-Class Research Library
Key Statistics 2002/3













150 million items
8.2 million items consulted or supplied
408,000 reading room visits
618,000 catalogue records created
554,000 items received on legal deposit
651 km shelf capacity, 92% full; adding 12 km each year
18.5M Web Site Hits (www.bl.uk)
2,400 staff
£85.2 million Grant in Aid and £27.0 million trading income in 2001/2
Annual report - http://www.bl.uk/about/annual/latest.html
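
As a rough illustration of the shelf figures above, the remaining headroom can be worked out directly. This is only a sketch: it assumes that "add 12 km each year" means the collection grows by 12 km of shelving per year at a constant rate, and the function name is invented for the example.

```python
def years_until_full(capacity_km: float, fraction_full: float,
                     growth_km_per_year: float) -> float:
    """Years of shelf space left, assuming constant linear growth."""
    free_km = capacity_km * (1.0 - fraction_full)
    return free_km / growth_km_per_year


if __name__ == "__main__":
    # Figures taken from the key statistics: 651 km, 92% full, 12 km per year.
    free_km = 651 * (1.0 - 0.92)
    print(f"Free shelf space: {free_km:.1f} km")
    print(f"Years until full: {years_until_full(651, 0.92, 12):.1f}")
```

On these assumptions there is roughly 52 km of free shelving, which is a little over four years of growth.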

www.bl.uk

4

‘The World’s Knowledge’
Outcome Based Vision
… by aiding scientific advances
… by adding commercial value for businesses
… by contributing to UK “knowledge economy”
… through the pursuit of academic excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural wellbeing
… by giving information relevant to their interests
… by helping to find the next medical breakthrough
… by creating a link between the past, present and future

To help people advance knowledge to enrich lives

www.bl.uk

5

[Diagram: the Library's audiences, arranged in five segments (BUSINESS, EDUCATION, PUBLIC, RESEARCHER, LIBRARIES), with the services offered to each. The label "Lifelong Learner" appears four times.]

Audience labels: High R+D Industries; Prof. Services; Creative Industries; Publishing Industries; SMEs; School Libraries; Students; Teachers; 11>18; Visitors (child + adult); Postgraduate/Undergraduate; Librarians; Scholars; Commercial Researcher; Public Libraries; H.E. Libraries; Broadcasting, e.g. BBC; Publishing, e.g. OED

Service lists, in the order they appear:
• Resource Discovery, Bespoke Services, Research Services, Document Supply, Reprographics, Innovation Centre
• On-site Visits, School Tours, Web Learning
• Exhibitions, Events, Tours, Publishing
• Reading Rooms, Bespoke Services, Reprographics, Publishing, Document Supply, Searching Tools
• Document Supply, Resource Discovery, Training, Best Practice

Public

www.bl.uk

6

Major Programmes/1
Da Vinci Notebook

Integrated Library System (ILS)
Programme

7

ILS: Development

Data migration
• Due to finish in a few days
• 16M+ BL records
• 10M+ records from other sources
Online ILS software
• All online changes made (mainly interfaces) – final tests
• Web OPAC configuration – tested by staff, HE and expert users
Batch imports / exports
• Most done for go-live
• Rest in priority order
www.bl.uk

8

ILS: Implementation

Training
• Courses to end-users well underway
• ‘Practice’ system available
• ‘Search only’ training also underway
Testing
• Functional testing (end to end) nearly complete
• Performance poor – OPAC very slow
• Automated stress testing (LoadRunner scripts)
• eIS trying to find area of problem
• Ex Libris experts flying over
• Some security ‘hardening’ needed
www.bl.uk

9

ILS: Cutover from legacy systems

Now: Temporary Aleph cataloguing

7 June: Phase 1 – internal processing
• Staggered take-on of users to ease cutover problems
• Merge ‘temporary’ records

30 June: Phase 2 – reading rooms
• Reading rooms closed for cutover 26-29 June
• Mainly brand-new PCs etc rather than XP upgrade

30 July: Phase 3 – remote users
• Could be delayed by major problems

www.bl.uk

10

www.bl.uk

11

Future ILS development (ILS/2)

Current ILS development seen just as the start

Extra records
• E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
• E.g. Preservation records
Links to other new BL systems
• E.g. Digital Object Management (images, web pages etc)

New releases of Ex Libris packages

www.bl.uk

12

Major Programmes/2

International Dunhuang Project

Digitisation Programme

13

Background










Digitisation Is The Process Of Converting Existing Physical Items Into Digital Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image To Sub-£1/Image.
Automatic Devices To Digitise Are Becoming Available But Are Expensive (~£150,000).
BL Has Had A Fairly Ad Hoc Approach Driven By
• External Funding Opportunities
• Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project Management
BL Has Created About 1.5M Digital Images So Far…

www.bl.uk

14

Digitisation Strategy



Digitisation Strategy Project Was Formally Initiated On February 2, 2004


Key objectives for the project are to define:










Selection Criteria
Uniform Approach
Communications Plan
Sustainability
Intellectual Property Rights
External Relationship Management
Funding
Integration with DOMS

www.bl.uk

15

Project Status Information


Definitive Register of Projects




19 Complete
19 Current
20 Planning



JISC Sound (3,900 Hours)



JISC Newspapers (2M Pages of 750M Pages)



Chopin (Collaborative Project)



Early English Books Online
www.bl.uk

16

Major Programmes/3

Gutenberg Bible

Digital Object Management (DOM)
Programme

17

DOM Programme vision
Our mission is to enable the United Kingdom to preserve
and use its digital intellectual property forever

Our vision is to create a management system for digital objects that will
• store and preserve any type of digital material in perpetuity
• provide access to this material to users with appropriate permissions
• ensure that the material is easy to find
• ensure that users can view the material with contemporary applications
• ensure that users can, where possible, experience material with the original look-and-feel

www.bl.uk

18

Introduction - history

Digital Library PFI
• Mar 1997 – Dec 1998
Digital Library System
• 1999 – early 2002
• Lessons
DOM Report
• Nov 2002
The DOM Programme
• Started September 2003

www.bl.uk

19

Drivers for the BL DOM Programme












Legal deposit legislation for non-print material was granted royal assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12 TB of material per year (with 50 year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….

We need a generic and cost-effective approach for the secure long term
storage of digital material that is produced by numerous initiatives

www.bl.uk

20

DOM – many topics to address

[Chart: DOM components plotted by estimated size of component (LOW to HIGH) against component ambiguity / complexity (LOW to HIGH). Legend: Started, Planned, Non-DOM projects, Planned co-operation.]

Components shown: ILS, Web Archiving, Digitisation Programme, LDEP, Workflow, Resource Discovery, LDLSE, VDEP, Rights Management, Metadata Definition, Technical Requirements, File Format Registry, Persistent Identifiers, File Conversion Utilities, Authentication, Prototypes, SDM, RADM, Interfaces, Strategy Development

LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications

www.bl.uk

21

Scope - life cycle of objects


Collection
• Selection
• Acquisition
• Accession
• Description
Preservation
• Storage
• Preservation
Access
• Resource discovery
• Delivery
• Rendering
www.bl.uk

22

Scope – objects and processes


Preservation store
• Preserves the bit stream in perpetuity
Access store
• Access versions
• Limited formats – in the flavour of the era
Metadata to support resource discovery
• Descriptive, Administrative, Links with existing tools e.g. Integrated Library System (ILS)
Workflow
• Ingest, e.g. Legal Deposit processing

www.bl.uk

23

[Diagram: DOM in context, showing content sources passing through Ingest into DOM Storage, with DOM Shared services alongside and Access on top.]

Access: Resource Discovery, Delivery, Digital Rights Management
DOM Shared services: Signing, Authentication, Metadata, Persistent ID
Content source labels:
• DONATIONS
• DOCUMENT SUPPLY: Publishers, Archives, Grey Literature, Non-Serial Store, Archiving, Operational Stores
• LEGAL DEPOSIT: LDL Secure Environment, Legal Deposit Items, Legal Deposit Processing
• WEB ARCHIVING
• DIGITISATION: St Pancras Studios, Newspapers, NSA
www.bl.uk

24

Timeline

[Timeline diagram covering 2003 to 2005]

R0: Definition
• Prototype will provide a basic preservation-quality digital object storage module

BC: ET approve Business Case & Timeline

R1: Operational Storage Sub System
• Consolidate R0 into operational system
• Provide preservation-quality digital store for materials received under Voluntary Deposit of Electronic Publications (VDEP)
• Integrate it with the existing VDEP front-end

R2: 1st Content Stream ingest
• Support ingest for a major content stream
• Integrate with core Library systems as required

R3 & R4: LDEP, initial format
• Provide functionality for material covered by LDEP secondary legislation

R5+: Open DOM to new projects
www.bl.uk

25

DOM: Project definition - 1
Functional architecture (“what”)
• Prototyping – assessing market solutions
• Example issues: digital rights, file formats, etc

Logical architecture (“how – overall architecture”)
• Prototyping – basic functioning architecture
• Example issues: allow changes to new suppliers, relationships to ILS, other projects etc

Physical architecture (“how – storage & specifics”)
• Prototyping – principal solutions and options
• Example issues: how do we build it cost-effectively today, supplier selection criteria

• Business case
• Planning – incremental implementation phases

Cross team workshops – reviewing progress, debating detailed technical issues, planning immediate priorities, risk management & way forward
www.bl.uk

26

DOM: Project definition - 2



Approach is to be incremental and not ‘Big Bang’
We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a good solution

A principal goal is to define:
• An overall long term “logical architecture”
• Within which, there will be successive generations of physical architectures

We are building an understanding of the storage marketplace, and we will use that knowledge to manage procurement
We are certain that we will need >500 TB of storage but we are uncertain when – we thus need flexible, scalable procurement



www.bl.uk

27

DOM architecture - overview

[Diagram: client systems reach objects through the DOM Storage Service, which sits on top of DOM Physical Storage.]

Client systems: Resource Discovery (ILS based RD, Non-cat RD), Rights Management, LDEP, Doc supply, Others
Clients refer to an OBJECT by its unique persistent identifier (DOMID)
DOM Storage Service: compound objects/relations, integrity, authenticity
DOMID is mapped to node/vol/LRL (local resource locator)
DOM Physical Storage: atomic objects
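
The mapping from DOMID to node/vol/LRL can be pictured as a lookup table that the storage service owns. The sketch below is illustrative only: the class and method names, the use of a random UUID as the identifier, and the example locator strings are all assumptions, not details taken from the DOM design.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Location:
    node: str  # storage node (hypothetical naming)
    vol: str   # volume on that node
    lrl: str   # local resource locator within the volume


class DomidResolver:
    """Maps persistent identifiers to current physical locations."""

    def __init__(self) -> None:
        self._locations: Dict[str, List[Location]] = {}

    def register(self, location: Location) -> str:
        """Issue a new identifier for an object stored at `location`."""
        domid = uuid.uuid4().hex  # assumption: any unique, opaque string would do
        self._locations[domid] = [location]
        return domid

    def relocate(self, domid: str, old: Location, new: Location) -> None:
        """Record a move, e.g. to a newer generation of physical storage."""
        locations = self._locations[domid]
        locations[locations.index(old)] = new

    def resolve(self, domid: str) -> List[Location]:
        """Return the current locations; the identifier itself never changes."""
        return list(self._locations[domid])
```

The point of the indirection is that client systems hold only the DOMID, so objects can be moved between nodes, volumes or suppliers without any client having to change its references.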

www.bl.uk

28

DOM System (release 3)

[Diagram of the DOM System. Labels: Aleph, Storage subsystem (shown twice), Access, Mailroom, Publishers, Shared services, Administration.]
www.bl.uk

29

DOM logical architecture –
integrity and authenticity


Integrity:
• System has capability to continuously monitor the object store to detect object corruption
• It would then initiate object recovery
Authenticity:
• A process is defined to provide long-term assurance that an object that is re-presented is as it was when it was ingested
• Based on the use of cryptographic signing techniques
• Each object is signed when it is ingested
• The signature is verified when required
• The signing mechanism is “tightly” controlled
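
The sign-at-ingest, verify-on-demand idea can be sketched in a few lines. This is a minimal illustration under stated assumptions: it uses an HMAC-SHA256 keyed digest from the Python standard library as a stand-in for whatever signing technique the programme actually selects (a true digital signature would use a private/public key pair), and the key, function names and store layout are hypothetical.

```python
import hashlib
import hmac
from typing import Dict, List, Tuple

# Hypothetical key for the example; a real signing key would be tightly controlled.
SIGNING_KEY = b"demo-key-not-for-production"


def sign_object(data: bytes, key: bytes = SIGNING_KEY) -> str:
    """Compute a keyed digest of the object's bit stream at ingest."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify_object(data: bytes, signature: str, key: bytes = SIGNING_KEY) -> bool:
    """Check that the bit stream still matches the signature made at ingest."""
    return hmac.compare_digest(sign_object(data, key), signature)


def scan_store(store: Dict[str, Tuple[bytes, str]]) -> List[str]:
    """Integrity sweep: return ids of objects that fail verification."""
    return [object_id for object_id, (data, signature) in store.items()
            if not verify_object(data, signature)]
```

A periodic call to `scan_store` plays the role of the continuous monitoring described above; any id it returns would be a candidate for recovery from another copy.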

www.bl.uk

30

Procuring physical storage in volume









A major cost is in physical storage
The market for storage systems is changing rapidly, and this implies that “lock-in” is not sensible
We thus need flexibility to change supplier over time
Cost of storage is reducing by 30-40% per year
Hence procure on a rolling basis just ahead of demand
Replace storage on a rolling basis on expiry of warranty
The rolling programmes imply the need to be able to support a heterogeneous product solution
The design of the logical architecture thus supports storage sourced from multiple storage vendors
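
The case for rolling procurement follows from simple arithmetic on the falling unit price. The sketch below compares buying all capacity up front with buying each year's requirement at that year's price. The demand profile, the starting price of 1000 (in arbitrary currency units per TB) and the 35% annual decline are illustrative assumptions chosen from within the 30-40% range quoted above, not BL figures.

```python
from typing import List


def upfront_cost(demand_tb_per_year: List[float], price_per_tb: float) -> float:
    """Buy the whole multi-year requirement today at today's price."""
    return sum(demand_tb_per_year) * price_per_tb


def rolling_cost(demand_tb_per_year: List[float], price_per_tb: float,
                 annual_decline: float) -> float:
    """Buy each year's requirement at that year's (lower) price."""
    total = 0.0
    price = price_per_tb
    for demand_tb in demand_tb_per_year:
        total += demand_tb * price
        price *= 1.0 - annual_decline  # price falls before the next purchase
    return total


if __name__ == "__main__":
    demand = [100.0] * 5  # hypothetical: 500 TB needed over five years
    print("Up front:", upfront_cost(demand, 1000.0))
    print("Rolling: ", round(rolling_cost(demand, 1000.0, 0.35)))
```

With these assumed numbers the rolling purchases cost about half as much as buying everything in year one, before counting the saving from not powering and maintaining idle capacity.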

www.bl.uk

31

Disaster tolerance and the organisation
of storage clusters









One can obtain commercial disaster recovery (DR) solutions for common equipment configurations
However, one cannot obtain such solutions for systems of multi-100 TB scale
So we must build DR into the design of the system
A single-site solution, subject to a common-mode disaster, would suffer considerable loss of availability after a disaster, and so is not acceptable
This implies that we need a multi-site solution
Conventionally these are based on a master-standby arrangement where only 50% of kit is delivering normal service
Our design is based on the use of multiple autonomous independent peer clusters that cross-synchronise, so 100% of the kit delivers normal service

www.bl.uk

32

DOM architecture in the context of the
storage solution market


The dominant segment of the market focuses on delivering performance within a highly resilient single cluster
However:
• Many of our objects will be rarely accessed, so we do not want to pay for “maximised” performance we do not need
• We have resilience by using multiple clusters, hence we have a reduced need for resilience within a cluster, so we do not want to pay for “maximised” resilience we do not need
We are using these drivers to design a cost-effective large scale resilient solution

www.bl.uk

33

DOM storage subsystem architecture - overview

[Diagram: two storage clusters, each made up of a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, together with the DOM Shared Services.]

DOM Shared Services
• Unique ID
• Signing
• Logging

www.bl.uk

34

DOM storage subsystem architecture - access

[Diagram: two storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical Storage) and the DOM Shared central Services (Unique ID, Signing, Logging).]

Normal access/delivery is from local storage cluster

www.bl.uk

35

DOM storage subsystem architecture - access

[Diagram: two storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical Storage) and the DOM Shared central Services (Unique ID, Signing, Logging).]

When a cluster is off-line then access/delivery is from a remote storage cluster

www.bl.uk

36

DOM storage subsystem architecture - ingest

[Diagram: two storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical Storage) and the DOM Shared central Services (Unique ID, Signing, Logging). Arrows: Store; Synchronise remote store.]

Normal ingest is to the local storage cluster and then the remote cluster is synchronised

www.bl.uk

37

DOM storage subsystem architecture - ingest

[Diagram: two storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical Storage) and the DOM Shared central Services (Unique ID, Signing, Logging). Arrows: Store; Synchronise remote store later.]

When a cluster is off-line then ingest is managed by the remote storage cluster and the local cluster is synchronised later
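
The access and ingest rules on the four slides above can be modelled with two peer clusters that each serve requests and catch each other up after an outage. This is a toy in-memory sketch: the class, the function names and the "pending" queue are assumptions made for illustration, and a real system would add the unique ID, signing and logging shared services.

```python
from typing import Dict


class Cluster:
    """One autonomous storage cluster (toy, in-memory)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.online = True
        self.objects: Dict[str, bytes] = {}
        self.pending: Dict[str, bytes] = {}  # stored here, not yet on the peer


def ingest(local: Cluster, remote: Cluster, object_id: str, data: bytes) -> str:
    """Store on the local cluster if it is up, otherwise on the remote one."""
    target, peer = (local, remote) if local.online else (remote, local)
    if not target.online:
        raise RuntimeError("no storage cluster available")
    target.objects[object_id] = data
    if peer.online:
        peer.objects[object_id] = data     # normal case: synchronise at once
    else:
        target.pending[object_id] = data   # peer is down: synchronise later
    return target.name


def synchronise(source: Cluster, destination: Cluster) -> int:
    """Copy objects the destination missed while off-line; return the count."""
    if not destination.online:
        return 0
    copied = len(source.pending)
    destination.objects.update(source.pending)
    source.pending.clear()
    return copied


def fetch(local: Cluster, remote: Cluster, object_id: str) -> bytes:
    """Deliver from the local cluster when possible, else from the remote one."""
    for cluster in (local, remote):
        if cluster.online and object_id in cluster.objects:
            return cluster.objects[object_id]
    raise KeyError(object_id)
```

Because either cluster can take ingest and serve access, both are doing useful work in normal operation, which is the contrast with a master-standby arrangement.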

www.bl.uk

38

In conclusion



We plan for generations of physical storage
• Migration from one generation to the next
• Allow changes of supplier
• Purchase incrementally in modest quantities
• Move quickly when required
• Be cost conscious

We provide assurance that an object is held and re-presented as it was when it was ingested

We are designing a cost-effective large scale resilient solution

In summary: we take a long term view

www.bl.uk

39

Major Programmes/4

Web Archiving Programme

40

Structure of Programme

Web Archiving Programme is a collaborative initiative, broadly delivered through two consortia
UK Web Archiving Consortium
• Developing a selective approach to web archiving, procuring a common web archiving infrastructure and software to begin archiving activities at the earliest opportunity
International Internet Preservation Consortium
• Developing advanced web archiving technologies for the long-term, large-scale, continuous crawling requirements enabled through legislation

www.bl.uk

41

UK Web Archiving Consortium
Developing a selective approach to web archiving
• License for PANDAS about to be signed with NLA
• Sub-licenses with consortium partners and contractor to follow
• ITT concluded with Magus Research winning the contract
• Implement a common web archiving infrastructure (lots of Linux machines + PANDAS)
• Provide customisation/development of PANDAS
• Provide help desk and support

www.bl.uk

42

International Internet Preservation
Consortium
Developing advanced web archiving technologies
Smart Crawler
• Continuous adaptive crawler, adjusting crawl priority on the fly
• Based on IA Heritrix
• Working on requirements now
• Expect to begin tender process in June
Content Management
Archival formats
Framework
Metrics and Test Bed
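
One way to picture "adjusting crawl priority on the fly" is a crawl frontier that revisits a page sooner when it was found changed and later when it was not. The sketch below is a generic illustration of that idea using a priority queue from the Python standard library. It is not the Heritrix API and not the consortium's design; the class name, the halving/doubling rule and the interval limits are all assumptions.

```python
import heapq
from typing import Dict, List, Tuple


class AdaptiveFrontier:
    """Crawl queue ordered by next-due time, with per-URL revisit intervals."""

    def __init__(self, min_interval: float = 1.0, max_interval: float = 64.0) -> None:
        self.min_interval = min_interval
        self.max_interval = max_interval
        self._heap: List[Tuple[float, str]] = []  # (due time, url)
        self._interval: Dict[str, float] = {}

    def add(self, url: str, now: float = 0.0, interval: float = 8.0) -> None:
        """Schedule a new URL for an immediate first visit."""
        self._interval[url] = interval
        heapq.heappush(self._heap, (now, url))

    def next_due(self) -> Tuple[float, str]:
        """Pop the URL with the earliest due time."""
        return heapq.heappop(self._heap)

    def report(self, url: str, visited_at: float, changed: bool) -> None:
        """Reschedule after a visit: halve the interval if the page changed,
        double it if not, staying within the configured limits."""
        interval = self._interval[url]
        interval = interval / 2 if changed else interval * 2
        interval = min(max(interval, self.min_interval), self.max_interval)
        self._interval[url] = interval
        heapq.heappush(self._heap, (visited_at + interval, url))
```

Frequently changing pages drift towards the front of the queue while static ones are visited less often, so crawl effort follows observed change rather than a fixed schedule.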

www.bl.uk

43

External Collaboration

44

Digital Library Collaborations/Partnerships
Current










UK Digital Preservation Coalition
• Founder Member
TEL (The European Library Project)
Web Archiving UK
• JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library Of Wales
International Internet Preservation Consortium
• BNF, Library Of Congress, Internet Archive, National Archives & Library Of Canada, National Library Of Australia, National Library Of Italy, National Libraries Of Nordic Countries
JISC Funded - Digital Curation Centre
Persistent Identifiers
• DOI Foundation, European National Libraries (KB & DDB)
Resource Discovery
• Union Catalogues (SUNCAT)
Digital Library Federation

www.bl.uk

45

Digital Library Collaborations/Partnerships
Potential









Secure Legal Deposit Network
• 6 Legal Deposit Libraries
Global Digital Format Registry
• Potential Partners (National Archives, DLF)
Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
• KB (Netherlands National Library & Other Partners – FP6 Bid)
Digital Rights Management
• Potential Partners (Publishers, JISC)
Metadata
• Publishers, Others ?
Authentication
• JISC ?
Resource Discovery
• Search Engine Vendors, Researchers
Others ???

www.bl.uk

46

Conclusions
Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success




Can You Work With Us?

www.bl.uk

47


Slide 24

British Computer Society
North London Branch

Major Programmes

Richard Boulderstone
July 27, 2004
1

Agenda











The British Library
Vision
Our Audiences/Customers
ILS
Digitisation
Digital Object Management
Web Archiving
Collaboration
Conclusions

Magna Carta

www.bl.uk

2

What Is The British Library ?

Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of
Science and Invention (1855), National Central Library (1916), and National
Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India
Office Library and Records in 1982, and British Institute of Recorded Sound in
1983
• Flagship building at St Pancras - largest public building project in Great
Britain in 20th century - opened in 1998


www.bl.uk

3

World-Class Research Library
Key Statistics 2002/3













150 million items
8.2 million items consulted or supplied
408,000 reading room visits
618,000 catalogue records created
554,000 items received on legal deposit
651 km shelf capacity 92% full add 12 km each year
18.5M Web Site Hits (www.bl.uk)
2,400 staff
£85.2 million Grant in Aid and £27.0 million trading income in 2001/2
Annual report - http://www.bl.uk/about/annual/latest.html

www.bl.uk

4

‘The World’s Knowledge’
Outcome Based Vision
… by aiding scientific advances
… by adding commercial value for
businesses
… by contributing to UK “knowledge
economy”
… through the pursuit of academic
excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural
wellbeing
… by giving information relevant to their
interests
… by helping to find the next medical
breakthrough
… by creating a link between the past,
present and future

To help people
advance
knowledge to
enrich lives

www.bl.uk

5

High
R+D
Industries

Prof.
Services

Creative
Industries Publishing
Industries

Resource Discovery
Bespoke Services
Research Services
Document Supply
Reprographics
Innovation Centre

School
Students
Libraries Teachers 11>18
On-site Visits
School Tours
Web Learning

SMEs
EDUCATION

Lifelong
Learner

Exhibitions
Events
Tours
Publishing

Visitors
(child + adult)
Lifelong
Learner

PUBLIC

BUSINESS
Lifelong
Learner

Postgraduate/
Undergraduate
RESEARCHER

Librarians
Scholars

Lifelong
Learner

Reading Rooms
Bespoke Services
Reprographics
Publishing
Document Supply
Searching Tools

LIBRARIES

Commercial
Researcher

Public
Libraries

Broadcasting
e.g. BBC

Publishing
e.g. OED

Document Supply
Resource Discovery
Training
Best Practice

H.E.
Libraries

Public

www.bl.uk

6
2

Major Programmes/1
Da Vinci Notebook

Integrated Library System (ILS)
Programme

7

ILS: Development

Data migration
 Due to finish in a few days



16M+ BL records
10M+ records from other sources

Online ILS software
 All online changes made (mainly interfaces) – final tests
 Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
 Most ones done for go live
 Rest in priority order
www.bl.uk

8

ILS: Implementation

Training
 Courses to end-users well underway
 ‘Practice’ system available
 ‘Search only’ training also underway
Testing
 Functional testing (end to end) nearly complete
 Performance poor – OPAC very slow
 Automated stress testing (LoadRunner scripts)
 eIS trying to find area of problem
 Ex Libris experts flying over
 Some security ‘hardening’ needed
www.bl.uk

9

ILS: Cutover from legacy systems

Now:

Temporary Aleph cataloguing

Phase 1 – internal processing
 Staggered take-on of users to ease cutover problems
 Merge ‘temporary’ records

7 June:

Phase 2 – reading rooms
 Reading rooms closed for cutover 26-29 June
 Mainly brand-new PCs etc rather than XP upgrade

30 June:

Phase 3 – remote users
 Could be delayed major problems

30 July:

www.bl.uk

10

www.bl.uk

11

Future ILS development (ILS/2)

Current ILS development seen just as the start

Extra records
 E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
 E.g. Preservation records
Links to other new BL systems
 E.g. Digital Object Management (images, web pages etc)

New releases of Ex Libris packages

www.bl.uk

12

Major Programmes/2

International Dunhuang Project

Digitisation Programme

13

Background










Digitisation Is The Process Of Converting Existing Physical Items Into Digital
Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character
Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image to Sub £1/Image.
Automatic Devices To Digitise Are Becoming Availalble But Are Expensive ~£150,000.
BL Has Had Fairly Ad Hoc Approach Driven By
 External Funding Opportunities
 Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project
Management
BL Has Created About 1.5M Digital Images So Far…

www.bl.uk

14

Digitisation Strategy



Digitisation Strategy Project Was Formerly Initiated On February 2, 2004


Key objectives for the project are to define:










Selection Criteria
Uniform Approach
Communications Plan
Sustainability
Intellectual Property Rights
External Relationship Management
Funding
Integration with DOMS

www.bl.uk

15

Project Status Information


Definitive Register of Projects




19 Complete
19 Current
20 Planning



JISC Sound (3,900 Hours)



JISC Newspapers (2M Pages of 750M Pages)



Chopin (Collaborative Project)



Early English Books Online
www.bl.uk

16

Major Programmes/3

Gutenberg Bible

Digital Object Management (DOM)
Programme

17

DOM Programme vision
Our mission is to enable the United Kingdom to preserve
and use its digital intellectual property forever

Our vision is create a management system for digital objects
that will
 store and preserve any type of digital material in perpetuity
 provide access to this material to users with appropriate permissions
 ensure that the material is easy to find
 ensure that users can view the material with contemporary applications
 ensure that users can, where possible, experience material with the original look-and-feel

www.bl.uk

18

Introduction - history

Digital Library PFI
 Mar 1997 – Dec 1998
Digital Library System
 1999 – early 2002
 Lessons
DOM Report
 Nov 2002
The DOM Programme
 Started September 2003

www.bl.uk

19

Drivers for the BL DOM Programme












Legal deposit legislation for non-print material was granted royal assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12T of material per year (with 50 year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….

We need a generic and cost-effective approach for the secure long term
storage of digital material that is produced by numerous initiatives

www.bl.uk

20

DOM – many topics to address
HIGH

ILS
WEB
ARCHIVING

DIGITISATION
PROGRAMME

ESTIMATED SIZE OF COMPONENT

LDEP

WORKFLOW
RESOURCE
DISCOVERY
LDLSE

VDEP

RIGHTS
MANAGEMENT
METADATA
DEFINITION

TECHNICAL
REQUIREMENTS

FILE
FORMAT
REGISTRY

PERSISTENT
IDENTIFIERS

FILE
CONVERSION
UTILITIES

AUTHENTICATION
PROTOTYPES

SDM
RADM

INTERFACES

STRATEGY
DEVELOPMENT

LOW
LOW

COMPONENT AMBIGUITY / COMPLEXITY

Started

Planned

Non-DOM projects

Planned co-operation

HIGH
LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications

www.bl.uk

21

Scope - life cycle of objects


Collection
 Selection
 Acquisition
 Accession
 Description
 Preservation
 Storage
 Preservation
 Access
 Resource discovery
 Delivery
 Rendering
www.bl.uk

22

Scope – objects and processes


Preservation store
 Preserves the bit stream in perpetuity
 Access store
 Access versions
 Limited formats – in the flavour of the era
 Metadata to support resource discovery
 Descriptive, Administrative, Links with existing tools e.g. Integrated
Library System (ILS)
 Workflow
 Ingest, e.g. Legal Deposit processing

www.bl.uk

23

ACCESS

DOM

Resource Discovery

Delivery

Digital Rights Management

Shared services

DOM
Storage

Signing
Authentication
Metadata
Persistent ID

Ingest

DONATIONS

DOCUMENT SUPPLY
Publishers
Archives
Grey Literature

Non-Serial
Store
Archiving
Operational
Stores

LEGAL DEPOSIT

LDL Secure
Environment

Legal Deposit
Items
Legal Deposit
Processing

WEB
ARCHIVING

DIGITISATION
St Pancras
Studios

Newspapers

NSA
www.bl.uk

24

Timeline

Prototype will provide a
basic preservationquality digital object
storage module

Definition. R0

ET approve
Business Case
& Timeline

•Consolidate R0 into operational
system
•Provide preservation-quality digital
store for materials received under
Voluntary Deposit of Electronic
Publications (VDEP)
•Integrate it with the existing VDEP
front-end

BC
R0
Operat’l Storage
Sub System. R1

•Support ingest for a
major content stream
•Integrate with core
Library systems as
required

R1

1st Content
Stream ingest. R2

R2

LDEP - initial
format. R3 & R4

Provide functionality
for material covered by
LDEP secondary
legislation

R3
R4

Open DOM to new
projects. R5+

2003

2004

2005
www.bl.uk

25

DOM: Project definition - 1
Example issues
Functional Architecture “What”

digital rights, file
formats, etc

Prototyping – assessing market solutions

allow changes to
new suppliers,
relationships to ILS,
other projects etc

Logical architecture “how – overall architecture”
Prototyping - basic functioning architecture

how do we build it
cost-effectively
today, supplier
selection criteria

Physical architecture “how – storage & specifics”
Prototyping - principal solutions and options

• Business case
• Planning – incremental
implementation phases

Cross team workshops – reviewing progress, debating detailed technical
issues, planning immediate priorities, risk management & way forward
www.bl.uk

26

DOM: Project definition - 2



Approach is to be incremental and not ‘Big Bang’
We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a
good solution



A principal goal is to define:
 An overall long term “logical architecture”
 Within which, there will be successive generations of physical architectures



We are understanding the storage marketplace, and we will use the knowledge to manage
procurement
We are certain that we will need >500T of storage but we are uncertain when – we thus need
flexible scalable procurement



www.bl.uk

27

DOM architecture - overview
Resource Discovery
Non-cat
ILS
based RD

Others

Rights Management
LDEP

DOMID

OBJECT

DOM Storage Service

Compound
objects/relations
Integrity

Authenticity

Local resource locator

Object

Atomic
Objects

Unique persistent
identifier (DOMID)

Doc
supply

DOMID is
mapped to
node/vol/LRL

DOM Physical Storage

www.bl.uk

28

DOM System (release 3)

Aleph

Storage subsystem
Storage
subsystem

Access

Mailroom

Publishers

Shared services

Administration
DOM System
www.bl.uk

29

DOM logical architecture –
integrity and authenticity


Integrity:
 System has capability to continuously monitor the object store to detect
object corruption
 It would then initiate object recovery
 Authenticity:
 A process is defined to provide long-term assurance that an object that
is re-presented is as it was when it was ingested
 Based on the use of cryptographic signing techniques
 Each object is signed when it is ingested
 The signature is verified when required
 The signing mechanism is “tightly” controlled

www.bl.uk

30

Procuring physical storage in volume









• A major cost is in physical storage
• The market for storage systems is changing rapidly, and this implies that “lock-in” is not sensible
• We thus need flexibility to change supplier over time
• Cost of storage is reducing by 30-40% per year
• Hence procure on a rolling basis just ahead of demand
• Replace storage on a rolling basis on expiry of warranty
• The rolling programmes imply the need to be able to support a heterogeneous product solution
• The design of the logical architecture thus supports storage sourced from multiple storage vendors
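The case for buying just ahead of demand follows from simple compounding. The Python sketch below works through it under stated assumptions: a 35% annual price decline (the midpoint of the 30-40% quoted above), a unit price of 1.0 per terabyte in year 0, and a hypothetical demand of 100 TB in each of five years. None of these figures beyond the percentage range come from the slides.

```python
def unit_price(initial: float, annual_decline: float, year: int) -> float:
    """Price per terabyte after `year` years of compound decline."""
    return initial * (1.0 - annual_decline) ** year


def rolling_cost(tb_per_year, initial: float, annual_decline: float) -> float:
    """Total spend when each year's capacity is bought in that year."""
    return sum(tb * unit_price(initial, annual_decline, year)
               for year, tb in enumerate(tb_per_year))


demand = [100, 100, 100, 100, 100]          # hypothetical: 500 TB over five years
upfront = sum(demand) * unit_price(1.0, 0.35, 0)
rolling = rolling_cost(demand, 1.0, 0.35)
```

Under these assumptions the rolling purchase costs roughly half of buying all 500 TB in year 0, before counting the power, space and warranty costs of capacity that would otherwise sit idle.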

31

Disaster tolerance and the organisation
of storage clusters









• Commercial disaster recovery (DR) solutions are available for common equipment configurations
• However, no such solutions are available for systems holding several hundred terabytes
• So we must build DR into the design of the system
• A single-site solution, subject to a common-mode disaster, would suffer considerable loss of availability after a disaster, and so is not acceptable
• This implies that we need a multi-site solution
• Conventionally these are based on a master-standby arrangement, where only 50% of the kit is delivering normal service
• Our design is based on multiple autonomous, independent peer clusters that cross-synchronise, so 100% of the kit delivers normal service

32

DOM architecture in the context of the
storage solution market


• The dominant segment of the market focuses on delivering performance within a highly resilient single cluster
• However:
  • Many of our objects will be rarely accessed, so we do not want to pay for “maximised” performance we do not need
  • We have resilience by using multiple clusters, hence we have a reduced need for resilience within a cluster, so we do not want to pay for “maximised” resilience we do not need
• We are using these drivers to design a cost-effective, large-scale, resilient solution

33

DOM storage subsystem architecture - overview

[Diagram. Two DOM Storage gateways, two DOM Storage Services and two storage clusters of DOM Physical Storage, plus DOM Shared Services (Unique ID, Signing, Logging).]

34

DOM storage subsystem architecture - access

Normal access/delivery is from the local storage cluster

[Diagram. Two DOM Storage gateways, each with a DOM Storage Service and a storage cluster of DOM Physical Storage, plus DOM central Shared Services (Unique ID, Signing, Logging).]

35

DOM storage subsystem architecture - access

When a cluster is off-line, access/delivery is from a remote storage cluster

[Diagram. Two DOM Storage gateways, each with a DOM Storage Service and a storage cluster of DOM Physical Storage, plus DOM central Shared Services (Unique ID, Signing, Logging).]

36

DOM storage subsystem architecture - ingest

Normal ingest is to the local storage cluster, and then the remote cluster is synchronised

[Diagram. Two DOM Storage gateways, each with a DOM Storage Service and a storage cluster of DOM Physical Storage, plus DOM central Shared Services (Unique ID, Signing, Logging). Arrows labelled “Store” and “Synchronise remote store”.]

37

DOM storage subsystem architecture - ingest

When a cluster is off-line, ingest is managed by the remote storage cluster and the local cluster is synchronised later

[Diagram. Two DOM Storage gateways, each with a DOM Storage Service and a storage cluster of DOM Physical Storage, plus DOM central Shared Services (Unique ID, Signing, Logging). Arrows labelled “Store” and “Synchronise remote store later”.]
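The access and ingest behaviour on the last four slides (use the local cluster, fall back to the remote one, synchronise later) can be sketched as follows. This is a minimal Python illustration with two in-memory clusters; the `Cluster` and `Gateway` classes, their methods and the backlog queue are assumptions made for the example, not a description of the actual DOM gateway.

```python
class Cluster:
    """A stand-in for one autonomous storage cluster."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.online = True
        self.objects: dict = {}


class Gateway:
    """Routes requests to the local cluster, falling back to its peer."""

    def __init__(self, local: Cluster, remote: Cluster) -> None:
        self.local, self.remote = local, remote
        self.backlog: list = []  # (target cluster, domid) pairs awaiting sync

    def ingest(self, domid: str, data: bytes) -> str:
        primary, peer = ((self.local, self.remote) if self.local.online
                         else (self.remote, self.local))
        if not primary.online:
            raise RuntimeError("no cluster available")
        primary.objects[domid] = data
        if peer.online:
            peer.objects[domid] = data          # synchronise straight away
        else:
            self.backlog.append((peer, domid))  # synchronise later
        return primary.name

    def read(self, domid: str) -> bytes:
        source = self.local if self.local.online else self.remote
        return source.objects[domid]

    def resync(self) -> None:
        """Copy queued objects to clusters that have come back on-line."""
        still_waiting = []
        for target, domid in self.backlog:
            if target.online:
                source = self.remote if target is self.local else self.local
                target.objects[domid] = source.objects[domid]
            else:
                still_waiting.append((target, domid))
        self.backlog = still_waiting


a, b = Cluster("site-a"), Cluster("site-b")
gw = Gateway(local=a, remote=b)
gw.ingest("dom-1", b"x")   # stored at site-a, copied to site-b
a.online = False
gw.ingest("dom-2", b"y")   # stored at site-b, queued for site-a
a.online = True
gw.resync()                # site-a catches up
```

Because both clusters accept reads and writes in normal running, neither is an idle standby, which is the sense in which all of the kit delivers normal service.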

38

In conclusion



• We plan for generations of physical storage
  • Migration from one generation to the next
  • Allow changes of supplier
  • Purchase incrementally in modest quantities
  • Move quickly when required
  • Be cost conscious
• We provide assurance that an object is held and re-presented as it was when it was ingested
• We are designing a cost-effective, large-scale, resilient solution

In summary: we take a long-term view

39

Major Programmes/4

Web Archiving Programme

40

Structure of Programme

The Web Archiving Programme is a collaborative initiative, broadly split across two consortia
• UK Web Archiving Consortium
  • Developing a selective approach to web archiving, and procuring a common web archiving infrastructure and software so that archiving can begin as early as possible
• International Internet Preservation Consortium
  • Developing advanced web archiving technologies for the long-term, large-scale, continuous crawling requirements enabled through legislation

41

UK Web Archiving Consortium
Developing a selective approach to web archiving
• Licence for PANDAS about to be signed with NLA
  • Sub-licences with consortium partners and contractor to follow
• ITT concluded, with Magus Research winning the contract:
  • Implement a common web archiving infrastructure (lots of Linux machines + PANDAS)
  • Provide customisation/development of PANDAS
  • Provide help desk and support

42

International Internet Preservation
Consortium
Developing advanced web archiving technologies
• Smart Crawler
  • Continuous adaptive crawler, adjusting crawl priority on the fly
  • Based on IA Heritrix
  • Working on requirements now
  • Expect to begin tender process in June
• Content Management
• Archival formats
• Framework
• Metrics and Test Bed
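One way to picture “adjusting crawl priority on the fly” is a revisit queue whose intervals shrink for pages that change and grow for pages that do not. The Python sketch below is a generic illustration using the standard-library `heapq` module; it is not Heritrix code or its API, and the halving/doubling rule, the 24-hour starting interval and the URLs are assumptions.

```python
import heapq


class CrawlQueue:
    """Revisit queue: pages that changed recently come round again sooner."""

    def __init__(self) -> None:
        self._heap = []       # entries are (next_visit_hour, url)
        self._interval = {}   # url -> current revisit interval in hours

    def add(self, url: str, interval: float = 24.0, now: float = 0.0) -> None:
        self._interval[url] = interval
        heapq.heappush(self._heap, (now + interval, url))

    def next_due(self):
        """Pop the (hour, url) pair that is due soonest."""
        return heapq.heappop(self._heap)

    def report(self, url: str, changed: bool, now: float) -> None:
        # Halve the interval for pages that changed, double it otherwise.
        self._interval[url] *= 0.5 if changed else 2.0
        heapq.heappush(self._heap, (now + self._interval[url], url))


q = CrawlQueue()
q.add("http://example.org/news")
q.add("http://example.org/about")
due, url = q.next_due()                 # both pages are first due at hour 24
q.report(url, changed=False, now=due)   # unchanged page: wait longer next time
due, url = q.next_due()
q.report(url, changed=True, now=due)    # changed page: come back sooner
```

After these two visits the page that changed is scheduled well ahead of the one that did not, so crawl effort drifts towards the parts of the web that are actually moving.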

43

External Collaboration

44

Digital Library Collaborations/Partnerships
Current










• UK Digital Preservation Coalition
  • Founder Member
• TEL (The European Library Project)
• Web Archiving UK
  • JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library of Wales
• International Internet Preservation Consortium
  • BNF, Library of Congress, Internet Archive, National Archives & Library of Canada, National Library of Australia, National Library of Italy, National Libraries of Nordic Countries
• JISC Funded - Digital Curation Centre
• Persistent Identifiers
  • DOI Foundation, European National Libraries (KB & DDB)
• Resource Discovery
  • Union Catalogues (SUNCAT)
• Digital Library Federation

45

Digital Library Collaborations/Partnerships
Potential









• Secure Legal Deposit Network
  • 6 Legal Deposit Libraries
• Global Digital Format Registry
  • Potential Partners (National Archives, DLF)
• Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
  • KB (Netherlands National Library) & Other Partners – FP6 Bid
• Digital Rights Management
  • Potential Partners (Publishers, JISC)
• Metadata
  • Publishers, Others?
• Authentication
  • JISC?
• Resource Discovery
  • Search Engine Vendors, Researchers
• Others???

46

Conclusions
• Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success

Can You Work With Us?

47


Slide 25

British Computer Society
North London Branch

Major Programmes

Richard Boulderstone
July 27, 2004
1

Agenda











• The British Library
• Vision
• Our Audiences/Customers
• ILS
• Digitisation
• Digital Object Management
• Web Archiving
• Collaboration
• Conclusions

[Image: Magna Carta]

2

What Is The British Library ?

Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of
Science and Invention (1855), National Central Library (1916), and National
Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India
Office Library and Records in 1982, and British Institute of Recorded Sound in
1983
• Flagship building at St Pancras - largest public building project in Great
Britain in 20th century - opened in 1998


www.bl.uk

3

World-Class Research Library
Key Statistics 2002/3













• 150 million items
• 8.2 million items consulted or supplied
• 408,000 reading room visits
• 618,000 catalogue records created
• 554,000 items received on legal deposit
• 651 km shelf capacity, 92% full, adding 12 km each year
• 18.5M web site hits (www.bl.uk)
• 2,400 staff
• £85.2 million Grant in Aid and £27.0 million trading income in 2001/2
• Annual report - http://www.bl.uk/about/annual/latest.html

4

‘The World’s Knowledge’
Outcome Based Vision
To help people advance knowledge to enrich lives
… by aiding scientific advances
… by adding commercial value for businesses
… by contributing to UK “knowledge economy”
… through the pursuit of academic excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural wellbeing
… by giving information relevant to their interests
… by helping to find the next medical breakthrough
… by creating a link between the past, present and future

5

[Diagram: Our audiences/customers and the services offered to each. Groupings below are inferred from the order of the labels on the slide.]
• BUSINESS (High R+D Industries, Prof. Services, Creative Industries, Publishing Industries, SMEs): Resource Discovery, Bespoke Services, Research Services, Document Supply, Reprographics, Innovation Centre
• EDUCATION (School Libraries, Students 11>18, Teachers): On-site Visits, School Tours, Web Learning
• PUBLIC (Visitors (child + adult)): Exhibitions, Events, Tours, Publishing
• RESEARCHER (Postgraduate/Undergraduate, Librarians, Scholars, Commercial Researcher): Reading Rooms, Bespoke Services, Reprographics, Publishing, Document Supply, Searching Tools
• LIBRARIES (Public Libraries, H.E. Libraries): Document Supply, Resource Discovery, Training, Best Practice
• Other labels: Lifelong Learner (shown against several groups), Broadcasting e.g. BBC, Publishing e.g. OED, Public

6

Major Programmes/1
Da Vinci Notebook

Integrated Library System (ILS)
Programme

7

ILS: Development

• Data migration
  • Due to finish in a few days
  • 16M+ BL records
  • 10M+ records from other sources
• Online ILS software
  • All online changes made (mainly interfaces) – final tests
  • Web OPAC configuration – tested by staff, HE, expert
• Batch imports / exports
  • Most done for go-live
  • Rest in priority order

8

ILS: Implementation

• Training
  • Courses for end-users well underway
  • ‘Practice’ system available
  • ‘Search only’ training also underway
• Testing
  • Functional testing (end to end) nearly complete
  • Performance poor – OPAC very slow
  • Automated stress testing (LoadRunner scripts)
  • eIS trying to find area of problem
  • Ex Libris experts flying over
  • Some security ‘hardening’ needed

9

ILS: Cutover from legacy systems

• Now: Temporary Aleph cataloguing
• 7 June: Phase 1 – internal processing
  • Staggered take-on of users to ease cutover problems
  • Merge ‘temporary’ records
• 30 June: Phase 2 – reading rooms
  • Reading rooms closed for cutover 26-29 June
  • Mainly brand-new PCs etc rather than XP upgrade
• 30 July: Phase 3 – remote users
  • Could be delayed if there are major problems

10


Future ILS development (ILS/2)

Current ILS development seen just as the start
• Extra records
  • E.g. Sound archive, Manuscripts, Newspaper issues
• Extra functions
  • E.g. Preservation records
• Links to other new BL systems
  • E.g. Digital Object Management (images, web pages etc)
• New releases of Ex Libris packages

12

Major Programmes/2

International Dunhuang Project

Digitisation Programme

13

Background










• Digitisation Is The Process Of Converting Existing Physical Items Into Digital Surrogates.
• Digitisation Projects Must Take Into Account Metadata Creation, Optical Character Recognition, Navigation, Display, Archiving, Preservation.
• Overall Cost Of Digitisation Projects Is Declining, From £10s/Image To Sub-£1/Image.
• Automatic Devices To Digitise Are Becoming Available But Are Expensive (~£150,000).
• BL Has Had A Fairly Ad Hoc Approach Driven By
  • External Funding Opportunities
  • Curator Interest
• Projects Have Generally Created Their Own Approach, IT Resources, Project Management
• BL Has Created About 1.5M Digital Images So Far…

14

Digitisation Strategy



• Digitisation Strategy Project Was Formally Initiated On February 2, 2004
• Key objectives for the project are to define:
  • Selection Criteria
  • Uniform Approach
  • Communications Plan
  • Sustainability
  • Intellectual Property Rights
  • External Relationship Management
  • Funding
  • Integration with DOMS

15

Project Status Information


• Definitive Register of Projects
  • 19 Complete
  • 19 Current
  • 20 Planning
• JISC Sound (3,900 Hours)
• JISC Newspapers (2M Pages of 750M Pages)
• Chopin (Collaborative Project)
• Early English Books Online

16

Major Programmes/3

Gutenberg Bible

Digital Object Management (DOM)
Programme

17

DOM Programme vision
Our mission is to enable the United Kingdom to preserve
and use its digital intellectual property forever

Our vision is to create a management system for digital objects that will
• store and preserve any type of digital material in perpetuity
• provide access to this material to users with appropriate permissions
• ensure that the material is easy to find
• ensure that users can view the material with contemporary applications
• ensure that users can, where possible, experience material with the original look-and-feel

18

Introduction - history

• Digital Library PFI
  • Mar 1997 – Dec 1998
• Digital Library System
  • 1999 – early 2002
  • Lessons
• DOM Report
  • Nov 2002
• The DOM Programme
  • Started September 2003

19

Drivers for the BL DOM Programme












• Legal deposit legislation for non-print material received Royal Assent in October 2003
• Existing voluntary deposit scheme operational since 2000
• Storage of digitised masters from early ’90s onwards
• New digitisation initiatives: newspapers, sound, etc
• Sound archive receives 12 TB of material per year (with a 50-year collection)
• Web archiving
• Cartography and datasets
• Electronic journals, picture library
• … and …

We need a generic and cost-effective approach for the secure long-term storage of digital material that is produced by numerous initiatives

20

DOM – many topics to address
[Chart: components plotted by ESTIMATED SIZE OF COMPONENT (low to high) against COMPONENT AMBIGUITY / COMPLEXITY (low to high), keyed as Started, Planned, Non-DOM projects and Planned co-operation. Components shown: ILS, Web Archiving, Digitisation Programme, LDEP, Workflow, Resource Discovery, LDLSE, VDEP, Rights Management, Metadata Definition, Technical Requirements, File Format Registry, Persistent Identifiers, File Conversion Utilities, Authentication, Prototypes, SDM, RADM, Interfaces, Strategy Development.]

LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications

21

Scope - life cycle of objects


Collection
 Selection
 Acquisition
 Accession
 Description
 Preservation
 Storage
 Preservation
 Access
 Resource discovery
 Delivery
 Rendering
www.bl.uk

22

Scope – objects and processes


• Preservation store
  • Preserves the bit stream in perpetuity
• Access store
  • Access versions
  • Limited formats – in the flavour of the era
• Metadata to support resource discovery
  • Descriptive, Administrative, Links with existing tools e.g. Integrated Library System (ILS)
• Workflow
  • Ingest, e.g. Legal Deposit processing

23

[Diagram: DOM overview. Groupings below are inferred from the order of the labels on the slide.]
• ACCESS: Resource Discovery, Delivery, Digital Rights Management
• DOM: DOM Storage, with Shared services (Signing, Authentication, Metadata, Persistent ID) and Ingest
• Content sources:
  • DONATIONS
  • DOCUMENT SUPPLY: Publishers, Archives, Grey Literature; Non-Serial Store, Archiving, Operational Stores
  • LEGAL DEPOSIT: LDL Secure Environment, Legal Deposit Items, Legal Deposit Processing
  • WEB ARCHIVING
  • DIGITISATION: St Pancras Studios, Newspapers, NSA

24

Timeline

[Timeline of DOM releases, 2003 to 2005]
• BC: ET approve Business Case & Timeline
• R0, Definition: prototype will provide a basic preservation-quality digital object storage module
• R1, Operational Storage Sub System:
  • Consolidate R0 into operational system
  • Provide preservation-quality digital store for materials received under Voluntary Deposit of Electronic Publications (VDEP)
  • Integrate it with the existing VDEP front-end
• R2, 1st Content Stream ingest:
  • Support ingest for a major content stream
  • Integrate with core Library systems as required
• R3 & R4, LDEP - initial format: provide functionality for material covered by LDEP secondary legislation
• R5+: Open DOM to new projects

25

DOM: Project definition - 1
[Diagram: three levels of definition, each with its prototyping activity and example issues]
• Functional architecture “what”
  • Prototyping: assessing market solutions
  • Example issues: digital rights, file formats, etc
• Logical architecture “how – overall architecture”
  • Prototyping: basic functioning architecture
  • Example issues: allow changes to new suppliers, relationships to ILS, other projects etc
• Physical architecture “how – storage & specifics”
  • Prototyping: principal solutions and options
  • Example issues: how do we build it cost-effectively today, supplier selection criteria
• Business case
• Planning – incremental implementation phases

Cross team workshops – reviewing progress, debating detailed technical issues, planning immediate priorities, risk management & way forward

26


Slide 26

British Computer Society
North London Branch

Major Programmes

Richard Boulderstone
July 27, 2004
1

Agenda











The British Library
Vision
Our Audiences/Customers
ILS
Digitisation
Digital Object Management
Web Archiving
Collaboration
Conclusions

Magna Carta

www.bl.uk

2

What Is The British Library ?

Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of
Science and Invention (1855), National Central Library (1916), and National
Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India
Office Library and Records in 1982, and British Institute of Recorded Sound in
1983
• Flagship building at St Pancras - largest public building project in Great
Britain in 20th century - opened in 1998


www.bl.uk

3

World-Class Research Library
Key Statistics 2002/3













150 million items
8.2 million items consulted or supplied
408,000 reading room visits
618,000 catalogue records created
554,000 items received on legal deposit
651 km shelf capacity 92% full add 12 km each year
18.5M Web Site Hits (www.bl.uk)
2,400 staff
£85.2 million Grant in Aid and £27.0 million trading income in 2001/2
Annual report - http://www.bl.uk/about/annual/latest.html

www.bl.uk

4

‘The World’s Knowledge’
Outcome Based Vision
… by aiding scientific advances
… by adding commercial value for
businesses
… by contributing to UK “knowledge
economy”
… through the pursuit of academic
excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural
wellbeing
… by giving information relevant to their
interests
… by helping to find the next medical
breakthrough
… by creating a link between the past,
present and future

To help people
advance
knowledge to
enrich lives

www.bl.uk

5

High
R+D
Industries

Prof.
Services

Creative
Industries Publishing
Industries

Resource Discovery
Bespoke Services
Research Services
Document Supply
Reprographics
Innovation Centre

School
Students
Libraries Teachers 11>18
On-site Visits
School Tours
Web Learning

SMEs
EDUCATION

Lifelong
Learner

Exhibitions
Events
Tours
Publishing

Visitors
(child + adult)
Lifelong
Learner

PUBLIC

BUSINESS
Lifelong
Learner

Postgraduate/
Undergraduate
RESEARCHER

Librarians
Scholars

Lifelong
Learner

Reading Rooms
Bespoke Services
Reprographics
Publishing
Document Supply
Searching Tools

LIBRARIES

Commercial
Researcher

Public
Libraries

Broadcasting
e.g. BBC

Publishing
e.g. OED

Document Supply
Resource Discovery
Training
Best Practice

H.E.
Libraries

Public

www.bl.uk

6
2

Major Programmes/1
Da Vinci Notebook

Integrated Library System (ILS)
Programme

7

ILS: Development

Data migration
 Due to finish in a few days



16M+ BL records
10M+ records from other sources

Online ILS software
 All online changes made (mainly interfaces) – final tests
 Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
 Most ones done for go live
 Rest in priority order
www.bl.uk

8

ILS: Implementation

Training
 Courses to end-users well underway
 ‘Practice’ system available
 ‘Search only’ training also underway
Testing
 Functional testing (end to end) nearly complete
 Performance poor – OPAC very slow
 Automated stress testing (LoadRunner scripts)
 eIS trying to find area of problem
 Ex Libris experts flying over
 Some security ‘hardening’ needed
www.bl.uk

9

ILS: Cutover from legacy systems

Now:

Temporary Aleph cataloguing

Phase 1 – internal processing
 Staggered take-on of users to ease cutover problems
 Merge ‘temporary’ records

7 June:

Phase 2 – reading rooms
 Reading rooms closed for cutover 26-29 June
 Mainly brand-new PCs etc rather than XP upgrade

30 June:

Phase 3 – remote users
 Could be delayed major problems

30 July:

www.bl.uk

10

www.bl.uk

11

Future ILS development (ILS/2)

Current ILS development seen just as the start

Extra records
 E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
 E.g. Preservation records
Links to other new BL systems
 E.g. Digital Object Management (images, web pages etc)

New releases of Ex Libris packages

www.bl.uk

12

Major Programmes/2

International Dunhuang Project

Digitisation Programme

13

Background










Digitisation Is The Process Of Converting Existing Physical Items Into Digital
Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character
Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image to Sub £1/Image.
Automatic Devices To Digitise Are Becoming Availalble But Are Expensive ~£150,000.
BL Has Had Fairly Ad Hoc Approach Driven By
 External Funding Opportunities
 Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project
Management
BL Has Created About 1.5M Digital Images So Far…

www.bl.uk

14

Digitisation Strategy



Digitisation Strategy Project Was Formerly Initiated On February 2, 2004


Key objectives for the project are to define:










Selection Criteria
Uniform Approach
Communications Plan
Sustainability
Intellectual Property Rights
External Relationship Management
Funding
Integration with DOMS

www.bl.uk

15

Project Status Information


Definitive Register of Projects




19 Complete
19 Current
20 Planning



JISC Sound (3,900 Hours)



JISC Newspapers (2M Pages of 750M Pages)



Chopin (Collaborative Project)



Early English Books Online
www.bl.uk

16

Major Programmes/3

Gutenberg Bible

Digital Object Management (DOM)
Programme

17

DOM Programme vision
Our mission is to enable the United Kingdom to preserve
and use its digital intellectual property forever

Our vision is create a management system for digital objects
that will
 store and preserve any type of digital material in perpetuity
 provide access to this material to users with appropriate permissions
 ensure that the material is easy to find
 ensure that users can view the material with contemporary applications
 ensure that users can, where possible, experience material with the original look-and-feel


18

Introduction - history

Digital Library PFI
 Mar 1997 – Dec 1998
Digital Library System
 1999 – early 2002
 Lessons
DOM Report
 Nov 2002
The DOM Programme
 Started September 2003


19

Drivers for the BL DOM Programme












Legal deposit legislation for non-print material was granted royal assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12 TB of material per year (with 50 year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….

We need a generic and cost-effective approach for the secure long term
storage of digital material that is produced by numerous initiatives


20

DOM – many topics to address
[Chart: components plotted by estimated size of component (vertical axis, low to high) against component ambiguity / complexity (horizontal axis, low to high)]
Items shown: ILS, Web Archiving, Digitisation Programme, LDEP, Workflow, Resource Discovery, LDLSE, VDEP, Rights Management, Metadata Definition, Technical Requirements, File Format Registry, Persistent Identifiers, File Conversion Utilities, Authentication, Prototypes, SDM, RADM, Interfaces, Strategy Development
Legend: Started, Planned, Non-DOM projects, Planned co-operation

LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications


21

Scope - life cycle of objects


Collection
 Selection
 Acquisition
 Accession
 Description
Preservation
 Storage
 Preservation
Access
 Resource discovery
 Delivery
 Rendering

22

Scope – objects and processes


Preservation store
 Preserves the bit stream in perpetuity
Access store
 Access versions
 Limited formats – in the flavour of the era
Metadata to support resource discovery
 Descriptive, Administrative, Links with existing tools e.g. Integrated Library System (ILS)
Workflow
 Ingest, e.g. Legal Deposit processing


23

[Diagram: DOM access, shared services, storage and ingest sources]
Access: Resource Discovery, Delivery, Digital Rights Management
DOM: Storage, Ingest
Shared services: Signing, Authentication, Metadata, Persistent ID
Donations
Document Supply: Publishers, Archives, Grey Literature; Non-Serial Store, Archiving, Operational Stores
Legal Deposit: LDL Secure Environment, Legal Deposit Items, Legal Deposit Processing
Web Archiving
Digitisation: St Pancras Studios, Newspapers, NSA

24

Timeline

[Timeline diagram, 2003 to 2005]
BC: ET approve Business Case & Timeline
R0 (Definition): Prototype will provide a basic preservation-quality digital object storage module
R1 (Operational Storage Subsystem):
• Consolidate R0 into operational system
• Provide preservation-quality digital store for materials received under Voluntary Deposit of Electronic Publications (VDEP)
• Integrate it with the existing VDEP front-end
R2 (1st Content Stream ingest):
• Support ingest for a major content stream
• Integrate with core Library systems as required
R3 & R4 (LDEP - initial format): Provide functionality for material covered by LDEP secondary legislation
R5+: Open DOM to new projects

25

DOM: Project definition - 1
Functional architecture “what”
 Prototyping – assessing market solutions
 Example issues: digital rights, file formats, etc
Logical architecture “how – overall architecture”
 Prototyping – basic functioning architecture
 Example issues: allow changes to new suppliers, relationships to ILS, other projects etc
Physical architecture “how – storage & specifics”
 Prototyping – principal solutions and options
 Example issues: how do we build it cost-effectively today, supplier selection criteria

• Business case
• Planning – incremental implementation phases

Cross team workshops – reviewing progress, debating detailed technical issues, planning immediate priorities, risk management & way forward

26

DOM: Project definition - 2



Approach is to be incremental and not ‘Big Bang’
We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a
good solution



A principal goal is to define:
 An overall long term “logical architecture”
 Within which, there will be successive generations of physical architectures



We are studying the storage marketplace, and we will use the knowledge to manage procurement
We are certain that we will need >500 TB of storage but we are uncertain when – we thus need flexible, scalable procurement




27

DOM architecture - overview
[Diagram: layered DOM architecture]
Resource Discovery: ILS, non-catalogue-based RD, others
Rights Management: LDEP, Doc supply
DOM Storage Service: objects identified by a unique persistent identifier (DOMID); compound objects/relations; integrity; authenticity
DOM Physical Storage: atomic objects; DOMID is mapped to node/vol/LRL (local resource locator)
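
The mapping from a persistent DOMID to a physical location can be pictured with a small sketch. Everything here is an assumption made for illustration: the `Location` fields simply follow the slide's node/vol/LRL wording, and the identifier format, class names and method names are hypothetical rather than taken from the DOM design.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Location:
    """Where one copy of an object lives (field names are hypothetical)."""
    node: str    # storage node
    volume: str  # volume on that node
    lrl: str     # local resource locator within the volume


class DomidResolver:
    """Maps a persistent DOMID to one or more physical locations."""

    def __init__(self) -> None:
        self._locations: Dict[str, List[Location]] = {}

    def register(self, domid: str, location: Location) -> None:
        self._locations.setdefault(domid, []).append(location)

    def move(self, domid: str, old: Location, new: Location) -> None:
        # Physical storage is replaced over time; the DOMID never changes.
        copies = self._locations[domid]
        copies[copies.index(old)] = new

    def resolve(self, domid: str) -> List[Location]:
        return list(self._locations[domid])


resolver = DomidResolver()
first = Location("node-a", "vol-01", "0001/object.bin")
resolver.register("domid:example-1", first)
resolver.move("domid:example-1", first, Location("node-b", "vol-07", "0001/object.bin"))
```

Keeping the identifier separate from the location is what would let a later generation of physical storage replace an earlier one without touching anything that refers to the object.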


28

DOM System (release 3)

[Diagram: DOM System (release 3)]
Components shown: Aleph, Access, Mailroom, Publishers, Storage subsystem (two instances), Shared services, Administration

29

DOM logical architecture –
integrity and authenticity


Integrity:
 System has capability to continuously monitor the object store to detect object corruption
 It would then initiate object recovery
Authenticity:
 A process is defined to provide long-term assurance that an object that is re-presented is as it was when it was ingested
 Based on the use of cryptographic signing techniques
 Each object is signed when it is ingested
 The signature is verified when required
 The signing mechanism is “tightly” controlled

www.bl.uk
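
The sign-on-ingest, verify-on-demand idea on the integrity and authenticity slide can be sketched as follows. The slide does not say which signing technique is used, so this is only an illustration: a keyed HMAC-SHA256 from the Python standard library stands in for whatever scheme the programme adopts (a production system might well prefer public-key signatures), and the key and function names are hypothetical.

```python
import hashlib
import hmac

# Hypothetical key; the slide only says the signing mechanism is tightly controlled.
SIGNING_KEY = b"example-key-held-by-the-signing-service"


def sign_object(data: bytes) -> str:
    """Return a hex signature computed when the object is ingested."""
    return hmac.new(SIGNING_KEY, data, hashlib.sha256).hexdigest()


def verify_object(data: bytes, signature: str) -> bool:
    """Check that re-presented bytes match the signature made at ingest."""
    expected = sign_object(data)
    return hmac.compare_digest(expected, signature)


original = b"bytes of a deposited publication"
signature = sign_object(original)

print(verify_object(original, signature))         # True
print(verify_object(original + b"!", signature))  # False
```

The same check, run periodically over the whole store, is one way the continuous monitoring for corruption could be approached.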

30

Procuring physical storage in volume









A major cost is in physical storage
The market for storage systems is changing rapidly, and this implies that “lock-in” is
not sensible
We thus need flexibility to change supplier over time
Cost of storage is reducing by 30-40% per year
Hence procure on rolling basis just ahead of demand
Replace storage on a rolling basis on expiry of warranty
The rolling programmes imply the need to be able to support a heterogeneous
product solution
The design of the logical architecture thus supports storage sourced from multiple
storage vendors
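
The argument for buying just ahead of demand can be checked with simple arithmetic. The sketch below assumes a 35% annual price decline (the midpoint of the 30-40% quoted above); the demand curve and starting price are made-up figures, not British Library data, and warranty replacement is ignored.

```python
def upfront_cost(demand_tb, price_per_tb):
    """Buy the peak capacity in year 0 at today's price."""
    return max(demand_tb) * price_per_tb


def rolling_cost(demand_tb, price_per_tb, annual_decline):
    """Buy only each year's extra capacity at that year's price."""
    total = 0.0
    owned = 0.0
    for year, needed in enumerate(demand_tb):
        extra = max(0.0, needed - owned)
        total += extra * price_per_tb * (1 - annual_decline) ** year
        owned += extra
    return total


demand = [100, 200, 300, 400, 500]  # TB needed in years 0..4 (illustrative)
price = 1000.0                      # currency units per TB in year 0 (illustrative)

print(upfront_cost(demand, price))
print(round(rolling_cost(demand, price, 0.35)))
```

With these illustrative figures the rolling purchase comes to roughly half the upfront cost, which is the effect the slide relies on.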


31

Disaster tolerance and the organisation
of storage clusters









One can obtain commercial disaster recovery (DR) solutions for common equipment configurations
However, one cannot obtain such solutions for multi-100 TB systems
So we must build DR into the design of the system
A single-site solution, subject to a common-mode disaster, would suffer considerable loss of availability after a disaster, and so is not acceptable
This implies that we need a multi-site solution
Conventionally these are based on a master-standby arrangement where only 50% of kit is delivering normal service
Our design is based on the use of multiple autonomous independent peer clusters that cross-synchronise, so 100% of the kit delivers normal service


32

DOM architecture in the context of the
storage solution market


The dominant segment of the market focuses on delivering performance within a highly resilient single cluster
However:
 Many of our objects will be rarely accessed
  so we do not want to pay for “maximised” performance we do not need
 We have resilience by using multiple clusters, hence we have a reduced need for resilience within a cluster
  so we do not want to pay for “maximised” resilience we do not need
We are using these drivers to design a cost-effective large scale resilient solution


33

DOM storage subsystem architecture - overview

[Diagram: two storage clusters, each with its own DOM Storage gateway, DOM Storage Service and DOM Physical Storage, both using DOM Shared Services (Unique ID, Signing, Logging)]


34

DOM storage subsystem architecture - access

[Diagram: two storage clusters, each with its own DOM Storage gateway, DOM Storage Service and DOM Physical Storage, both using DOM Shared (central) Services: Unique ID, Signing, Logging]
Normal access/delivery is from local storage cluster


35

DOM storage subsystem architecture - access

[Diagram: two storage clusters, each with its own DOM Storage gateway, DOM Storage Service and DOM Physical Storage, both using DOM Shared (central) Services: Unique ID, Signing, Logging]
When a cluster is off-line then access/delivery is from a remote storage cluster


36

DOM storage subsystem architecture - ingest

[Diagram: two storage clusters, each with its own DOM Storage gateway, DOM Storage Service and DOM Physical Storage, both using DOM Shared (central) Services: Unique ID, Signing, Logging; arrows labelled "Store" and "Synchronise remote store"]
Normal ingest is to the local storage cluster and then the remote cluster is synchronised


37

DOM storage subsystem architecture - ingest

[Diagram: two storage clusters, each with its own DOM Storage gateway, DOM Storage Service and DOM Physical Storage, both using DOM Shared (central) Services: Unique ID, Signing, Logging; arrows labelled "Store" and "Synchronise remote store later"]
When a cluster is off-line then ingest is managed by the remote storage cluster and the local cluster is synchronised later
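
The access and ingest behaviour described on the four preceding slides can be condensed into a short sketch. It is a toy model under stated assumptions, not the DOM implementation: the `Cluster` and `Gateway` classes, their methods and the in-memory dictionaries are all hypothetical, and only two peer clusters are modelled.

```python
class Cluster:
    """One autonomous storage cluster (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.objects = {}   # DOMID -> bytes
        self.pending = []   # DOMIDs still to be copied to the peer
        self.peer = None

    def store(self, domid, data):
        self.objects[domid] = data
        if self.peer.online:
            self.peer.objects[domid] = data   # synchronise remote store
        else:
            self.pending.append(domid)        # synchronise later

    def resync(self):
        """Push objects the peer missed while it was off-line."""
        for domid in self.pending:
            self.peer.objects[domid] = self.objects[domid]
        self.pending.clear()


class Gateway:
    """Routes requests to the local cluster, falling back to the remote one."""

    def __init__(self, local, remote):
        self.local, self.remote = local, remote

    def _target(self):
        if self.local.online:
            return self.local
        if self.remote.online:
            return self.remote
        raise RuntimeError("no storage cluster available")

    def ingest(self, domid, data):
        self._target().store(domid, data)

    def fetch(self, domid):
        return self._target().objects[domid]


a, b = Cluster("site-a"), Cluster("site-b")
a.peer, b.peer = b, a
gateway = Gateway(local=a, remote=b)

gateway.ingest("domid:1", b"first")    # stored at site-a, copied to site-b
a.online = False
gateway.ingest("domid:2", b"second")   # site-a is down, so site-b takes it
a.online = True
b.resync()                             # site-a catches up
```

Because both clusters accept ingest and serve access in normal operation, neither sits idle as a standby, which is the point made on the disaster tolerance slide.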


38

In conclusion



We plan for generations of physical storage
 Migration from one generation to the next
 Allow changes of supplier
 Purchase incrementally in modest quantities
 Move quickly when required
 Be cost conscious



We provide assurance that an object is held and re-presented as when it was
ingested



We are designing a cost-effective large scale resilient solution

In summary: we take a long term view


39

Major Programmes/4

Web Archiving Programme

40

Structure of Programme

Web Archiving Programme is a collaborative initiative, broadly implemented across two consortia
 UK Web Archiving Consortium
  Developing a selective approach to web archiving, procuring a common web archiving infrastructure and software to begin archiving activities at the earliest opportunity
 International Internet Preservation Consortium
  Developing advanced web archiving technologies for the long-term, large-scale, continuous crawling requirements enabled through legislation


41

UK Web Archiving Consortium
Developing a selective approach to web archiving
 License for PANDAS about to be signed with NLA
 Sub-licenses with consortium partners and contractor to follow
 ITT concluded with Magus Research winning the contract.
 Implement a common web archiving infrastructure (lots of Linux machines + PANDAS)
 Provide customisation/development of PANDAS
 Provide help desk and support


42

International Internet Preservation
Consortium
Developing advanced web archiving technologies
 Smart Crawler
  Continuous adaptive crawler, adjusting crawl priority on the fly
  Based on IA Heritrix
  Working on requirements now
  Expect to begin tender process in June
 Content Management
 Archival formats
 Framework
 Metrics and Test Bed

www.bl.uk
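
One way to picture a crawler that adjusts crawl priority on the fly is a priority queue whose revisit interval shrinks for pages that change and grows for pages that do not. The sketch below is purely illustrative: it is not the Heritrix API or the consortium's design, and the class name, the halving/doubling rule and the interval bounds are all assumptions.

```python
import heapq
import itertools


class AdaptiveFrontier:
    """Crawl queue whose revisit interval adapts to how often a page changes."""

    def __init__(self, min_interval=1.0, max_interval=64.0):
        self.min_interval = min_interval
        self.max_interval = max_interval
        self._heap = []                    # (next_due_time, tie_breaker, url)
        self._interval = {}                # url -> current revisit interval
        self._counter = itertools.count()

    def add(self, url, now=0.0):
        self._interval[url] = self.min_interval
        heapq.heappush(self._heap, (now, next(self._counter), url))

    def pop_due(self):
        """Return (due_time, url) for the page that should be crawled next."""
        due, _, url = heapq.heappop(self._heap)
        return due, url

    def report(self, url, crawled_at, changed):
        """Halve the interval if the page changed, double it otherwise."""
        factor = 0.5 if changed else 2.0
        interval = self._interval[url] * factor
        interval = max(self.min_interval, min(self.max_interval, interval))
        self._interval[url] = interval
        heapq.heappush(self._heap, (crawled_at + interval, next(self._counter), url))


frontier = AdaptiveFrontier()
frontier.add("http://example.org/news")
frontier.add("http://example.org/about")

_, first = frontier.pop_due()
frontier.report(first, crawled_at=0.0, changed=True)    # revisit soon
_, second = frontier.pop_due()
frontier.report(second, crawled_at=0.0, changed=False)  # revisit later
```

Pages that keep changing stay near the front of the queue, while static pages drift towards the maximum interval.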

43

External Collaboration

44

Digital Library Collaborations/Partnerships
Current










UK Digital Preservation Coalition
 Founder Member
TEL (The European Library Project)
Web Archiving UK
 JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library Of Wales
International Internet Preservation Consortium
 BNF, Library Of Congress, Internet Archive, National Archives & Library Of Canada, National Library Of
Australia, National Library Of Italy, National Libraries Of Nordic Countries
JISC Funded - Digital Curation Centre
Persistent Identifiers
 DOI Foundation, European National Libraries (KB & DDB)
Resource Discovery
 Union Catalogues (SUNCAT)
Digital Library Federation


45

Digital Library Collaborations/Partnerships
Potential









Secure Legal Deposit Network
 6 Legal Deposit Libraries
Global Digital Format Registry
 Potential Partners (National Archives, DLF)
Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
 KB (Netherlands National Library & Other Partners – FP6 Bid)
Digital Rights Management
 Potential Partners (Publishers, JISC)
Metadata
 Publishers, Others ?
Authentication
 JISC ?
Resource Discovery
 Search Engine Vendors, Researchers
Others ???


46

Conclusions
Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success




Can You Work With Us?


47



Slide 28

British Computer Society
North London Branch

Major Programmes

Richard Boulderstone
July 27, 2004
1

Agenda











The British Library
Vision
Our Audiences/Customers
ILS
Digitisation
Digital Object Management
Web Archiving
Collaboration
Conclusions

Magna Carta


2

What Is The British Library?

Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of
Science and Invention (1855), National Central Library (1916), and National
Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India
Office Library and Records in 1982, and British Institute of Recorded Sound in
1983
• Flagship building at St Pancras - largest public building project in Great
Britain in 20th century - opened in 1998



3

World-Class Research Library
Key Statistics 2002/3













150 million items
8.2 million items consulted or supplied
408,000 reading room visits
618,000 catalogue records created
554,000 items received on legal deposit
651 km shelf capacity, 92% full, adding 12 km each year
18.5M Web Site Hits (www.bl.uk)
2,400 staff
£85.2 million Grant in Aid and £27.0 million trading income in 2001/2
Annual report - http://www.bl.uk/about/annual/latest.html


4

‘The World’s Knowledge’
Outcome Based Vision
… by aiding scientific advances
… by adding commercial value for businesses
… by contributing to UK “knowledge economy”
… through the pursuit of academic excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural wellbeing
… by giving information relevant to their interests
… by helping to find the next medical breakthrough
… by creating a link between the past, present and future

To help people advance knowledge to enrich lives


5

[Diagram: audiences and services grouped under five segments: BUSINESS, EDUCATION, PUBLIC, RESEARCHER and LIBRARIES. Audiences shown include high R+D industries, professional services, creative industries, publishing industries, SMEs, school students, libraries and teachers (11>18), visitors (child + adult), lifelong learners, postgraduates/undergraduates, scholars, librarians, commercial researchers, public libraries, H.E. libraries, broadcasting (e.g. BBC) and publishing (e.g. OED). Services shown include Resource Discovery, Bespoke Services, Research Services, Document Supply, Reprographics, Innovation Centre, On-site Visits, School Tours, Web Learning, Exhibitions, Events, Tours, Publishing, Reading Rooms, Searching Tools, Training and Best Practice.]


6
2

Major Programmes/1
Da Vinci Notebook

Integrated Library System (ILS)
Programme

7

ILS: Development

Data migration
 Due to finish in a few days
 16M+ BL records
 10M+ records from other sources
Online ILS software
 All online changes made (mainly interfaces) – final tests
 Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
 Most done for go-live
 Rest in priority order

8

ILS: Implementation

Training
 Courses to end-users well underway
 ‘Practice’ system available
 ‘Search only’ training also underway
Testing
 Functional testing (end to end) nearly complete
 Performance poor – OPAC very slow
 Automated stress testing (LoadRunner scripts)
 eIS trying to find area of problem
 Ex Libris experts flying over
 Some security ‘hardening’ needed

9

ILS: Cutover from legacy systems

Now: Temporary Aleph cataloguing

7 June: Phase 1 – internal processing
 Staggered take-on of users to ease cutover problems
 Merge ‘temporary’ records

30 June: Phase 2 – reading rooms
 Reading rooms closed for cutover 26-29 June
 Mainly brand-new PCs etc rather than XP upgrade

30 July: Phase 3 – remote users
 Could be delayed by major problems


10


11

Future ILS development (ILS/2)

Current ILS development seen just as the start

Extra records
 E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
 E.g. Preservation records
Links to other new BL systems
 E.g. Digital Object Management (images, web pages etc)

New releases of Ex Libris packages


12

Major Programmes/2

International Dunhuang Project

Digitisation Programme

13

Background










Digitisation Is The Process Of Converting Existing Physical Items Into Digital
Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character
Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image to Sub £1/Image.
Automatic Devices To Digitise Are Becoming Available But Are Expensive ~£150,000.
BL Has Had Fairly Ad Hoc Approach Driven By
 External Funding Opportunities
 Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project
Management
BL Has Created About 1.5M Digital Images So Far…


14

Digitisation Strategy



Digitisation Strategy Project Was Formally Initiated On February 2, 2004


Key objectives for the project are to define:










Selection Criteria
Uniform Approach
Communications Plan
Sustainability
Intellectual Property Rights
External Relationship Management
Funding
Integration with DOMS


15

Project Status Information


Definitive Register of Projects
 19 Complete
 19 Current
 20 Planning
JISC Sound (3,900 Hours)
JISC Newspapers (2M Pages of 750M Pages)
Chopin (Collaborative Project)
Early English Books Online

16

Major Programmes/3

Gutenberg Bible

Digital Object Management (DOM)
Programme

17

DOM Programme vision
Our mission is to enable the United Kingdom to preserve
and use its digital intellectual property forever

Our vision is to create a management system for digital objects
that will
 store and preserve any type of digital material in perpetuity
 provide access to this material to users with appropriate permissions
 ensure that the material is easy to find
 ensure that users can view the material with contemporary applications
 ensure that users can, where possible, experience material with the original look-and-feel


18

Introduction - history

Digital Library PFI
 Mar 1997 – Dec 1998
Digital Library System
 1999 – early 2002
 Lessons
DOM Report
 Nov 2002
The DOM Programme
 Started September 2003


19

Drivers for the BL DOM Programme












Legal deposit legislation for non-print material was granted royal assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12T of material per year (with 50 year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….

We need a generic and cost-effective approach for the secure long term
storage of digital material that is produced by numerous initiatives


20

DOM – many topics to address
[Chart: components plotted by ESTIMATED SIZE OF COMPONENT (LOW to HIGH) against COMPONENT AMBIGUITY / COMPLEXITY (LOW to HIGH). Components shown: ILS, Web Archiving, Digitisation Programme, LDEP, Workflow, Resource Discovery, LDLSE, VDEP, Rights Management, Metadata Definition, Technical Requirements, File Format Registry, Persistent Identifiers, File Conversion Utilities, Authentication, Prototypes, SDM, RADM, Interfaces, Strategy Development. Legend: Started, Planned, Non-DOM projects, Planned co-operation.]

LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications


21

Scope - life cycle of objects


Collection
 Selection
 Acquisition
 Accession
 Description
Preservation
 Storage
 Preservation
Access
 Resource discovery
 Delivery
 Rendering

22

Scope – objects and processes


Preservation store
 Preserves the bit stream in perpetuity
Access store
 Access versions
 Limited formats – in the flavour of the era
Metadata to support resource discovery
 Descriptive, Administrative, Links with existing tools e.g. Integrated Library System (ILS)
Workflow
 Ingest, e.g. Legal Deposit processing


23

[Diagram: DOM at the centre, made up of DOM Storage, Ingest and shared services (Signing, Authentication, Metadata, Persistent ID). ACCESS sits on one side: Resource Discovery, Delivery, Digital Rights Management. Content sources shown: DONATIONS; DOCUMENT SUPPLY (Publishers, Archives, Grey Literature); LEGAL DEPOSIT (LDL Secure Environment, Legal Deposit Items, Legal Deposit Processing); WEB ARCHIVING; DIGITISATION (St Pancras Studios, Newspapers, NSA). Other labels: Non-Serial Store, Archiving, Operational Stores.]

24

Timeline

[Timeline diagram spanning 2003, 2004 and 2005, with milestones BC, R0, R1, R2, R3, R4]

Definition. R0
• Prototype will provide a basic preservation-quality digital object storage module
• ET approve Business Case (BC) & Timeline

Operational Storage Sub System. R1
• Consolidate R0 into operational system
• Provide preservation-quality digital store for materials received under Voluntary Deposit of Electronic Publications (VDEP)
• Integrate it with the existing VDEP front-end

1st Content Stream ingest. R2
• Support ingest for a major content stream
• Integrate with core Library systems as required

LDEP - initial format. R3 & R4
• Provide functionality for material covered by LDEP secondary legislation

Open DOM to new projects. R5+

25

DOM: Project definition - 1
Functional architecture “what”
 Prototyping – assessing market solutions
 Example issues: digital rights, file formats, etc

Logical architecture “how – overall architecture”
 Prototyping – basic functioning architecture
 Example issues: allow changes to new suppliers, relationships to ILS, other projects etc

Physical architecture “how – storage & specifics”
 Prototyping – principal solutions and options
 Example issues: how do we build it cost-effectively today, supplier selection criteria

• Business case
• Planning – incremental implementation phases

Cross team workshops – reviewing progress, debating detailed technical issues, planning immediate priorities, risk management & way forward

26

DOM: Project definition - 2



Approach is to be incremental and not ‘Big Bang’
We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a good solution

A principal goal is to define:
 An overall long term “logical architecture”
 Within which, there will be successive generations of physical architectures

We are building an understanding of the storage marketplace, and we will use that knowledge to manage procurement
We are certain that we will need >500 TB of storage but we are uncertain when – we thus need flexible, scalable procurement




27

DOM architecture - overview
[Diagram: Resource Discovery (ILS, non-cat based RD, others), Rights Management, LDEP and Doc supply refer to an OBJECT by its DOMID. The DOM Storage Service handles compound objects/relations, integrity and authenticity. Each atomic object has a unique persistent identifier (DOMID); the DOMID is mapped to node/vol/LRL (local resource locator) in DOM Physical Storage.]
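
The mapping from a persistent DOMID to a node/vol/LRL location can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the class names, the field names and the identifier format are hypothetical and are not taken from the DOM design. The point shown is only that callers hold the DOMID, while the physical location behind it can change.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    """Physical address of an object (field names are illustrative)."""
    node: str    # storage node holding the object
    volume: str  # volume on that node
    lrl: str     # local resource locator within the volume


class DomidResolver:
    """Hypothetical lookup table from persistent DOMID to physical location."""

    def __init__(self):
        self._locations = {}  # DOMID -> Location

    def register(self, domid, location):
        # A persistent identifier is assigned once and never reused.
        if domid in self._locations:
            raise ValueError(f"DOMID already registered: {domid}")
        self._locations[domid] = location

    def relocate(self, domid, new_location):
        # Moving an object to new storage changes only the mapping.
        if domid not in self._locations:
            raise KeyError(domid)
        self._locations[domid] = new_location

    def resolve(self, domid):
        return self._locations[domid]


resolver = DomidResolver()
resolver.register("domid-0001", Location("node-a", "vol-03", "obj/0001"))
resolver.relocate("domid-0001", Location("node-b", "vol-11", "obj/0001"))
print(resolver.resolve("domid-0001").node)  # node-b
```

Keeping the identifier separate from the location is what would let storage be replaced or moved between suppliers without touching catalogue records that cite the DOMID.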


28

DOM System (release 3)

[Diagram: DOM System with Mailroom, Storage subsystem, Access, Shared services and Administration; Publishers and Aleph also appear.]

29

DOM logical architecture –
integrity and authenticity


Integrity:
 System has capability to continuously monitor the object store to detect object corruption
 It would then initiate object recovery
Authenticity:
 A process is defined to provide long-term assurance that an object that is re-presented is as it was when it was ingested
 Based on the use of cryptographic signing techniques
 Each object is signed when it is ingested
 The signature is verified when required
 The signing mechanism is “tightly” controlled
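
The sign-on-ingest, verify-on-demand and monitoring steps above can be sketched as follows. This is an illustrative sketch only, not the DOM implementation: the slides do not name a signing scheme, so a keyed HMAC-SHA256 from the Python standard library stands in for it here (a real system might well use public-key signatures), and the class and method names are hypothetical.

```python
import hashlib
import hmac


class SignedStore:
    """Toy object store that signs on ingest and verifies on demand."""

    def __init__(self, signing_key):
        self._key = signing_key   # stands in for a tightly controlled key
        self._objects = {}        # object id -> stored bytes
        self._signatures = {}     # object id -> signature made at ingest

    def _sign(self, data):
        return hmac.new(self._key, data, hashlib.sha256).hexdigest()

    def ingest(self, object_id, data):
        # Each object is signed when it is ingested.
        self._objects[object_id] = data
        self._signatures[object_id] = self._sign(data)

    def verify(self, object_id):
        # Recompute the signature and compare it with the one from ingest.
        current = self._sign(self._objects[object_id])
        return hmac.compare_digest(current, self._signatures[object_id])

    def find_corrupt(self):
        # One monitoring pass over the whole store.
        return sorted(oid for oid in self._objects if not self.verify(oid))


store = SignedStore(signing_key=b"demo-key-not-for-production")
store.ingest("obj-1", b"first object")
store.ingest("obj-2", b"second object")
store._objects["obj-2"] = b"second 0bject"  # simulate silent corruption
print(store.find_corrupt())  # ['obj-2']
```

A scan like `find_corrupt` only detects damage; recovery would then need a good copy from elsewhere, for example from a second storage cluster.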


30

Procuring physical storage in volume









A major cost is in physical storage
The market for storage systems is changing rapidly, and this implies that “lock-in” is
not sensible
We thus need flexibility to change supplier over time
Cost of storage is reducing by 30-40% per year
Hence procure on rolling basis just ahead of demand
Replace storage on a rolling basis on expiry of warranty
The rolling programmes imply the need to be able to support a heterogeneous
product solution
The design of the logical architecture thus supports storage sourced from multiple
storage vendors
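
The case for buying on a rolling basis just ahead of demand can be checked with simple arithmetic. The sketch below is an illustration under stated assumptions: only the 30-40% annual price decline comes from the slide, while the 100 TB per year demand, the five-year horizon and the starting price of 1000 per TB are hypothetical figures. Replacement on warranty expiry is ignored.

```python
def upfront_cost(yearly_tb, years, price_per_tb):
    """Cost of buying all capacity in year 0 at today's price."""
    return yearly_tb * years * price_per_tb


def rolling_cost(yearly_tb, years, price_per_tb, annual_decline):
    """Cost of buying each year's capacity at that year's price."""
    total = 0.0
    price = price_per_tb
    for _ in range(years):
        total += yearly_tb * price
        price *= 1.0 - annual_decline  # price falls before the next purchase
    return total


# Hypothetical demand: 100 TB per year for 5 years at 1000 per TB today.
for decline in (0.30, 0.40):
    rolling = rolling_cost(100, 5, 1000.0, decline)
    saving = 1 - rolling / upfront_cost(100, 5, 1000.0)
    print(f"decline {decline:.0%}: rolling purchase saves {saving:.0%}")
```

Under these assumed figures the rolling purchase costs roughly half as much as buying everything at the start, which is the reasoning behind avoiding supplier lock-in and large one-off purchases.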


31

Disaster tolerance and the organisation
of storage clusters









One can obtain commercial disaster recovery (DR) solutions for common equipment
configurations
However one cannot obtain such solutions for multi-100 TB systems
So we must build DR into the design of the system
A single site solution, subject to a common-mode disaster, would suffer considerable loss
of availability after a disaster, and so is not acceptable
This implies that we need a multi-site solution
Conventionally these are based on a master-standby where only 50% of kit is delivering
normal service
Our design is based on the use of multiple autonomous independent peer clusters that
cross-synchronise
so 100% of the kit delivers normal service


32

DOM architecture in the context of the
storage solution market


The dominant segment of the market focuses on delivering performance within
a highly resilient single cluster
 However:
 Many of our objects will be rarely accessed
 so we do not want to pay for “maximised” performance we do not need
 We have resilience by using multiple clusters, hence we have a reduced
need for resilience within a cluster
 so we do not want to pay for “maximised” resilience we do not need
 We are using these drivers to design a cost-effective large scale resilient
solution


33

DOM storage subsystem architecture - overview

[Diagram: two storage clusters, each consisting of a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, together with DOM Shared Services (Unique ID, Signing, Logging).]


34

DOM storage subsystem architecture - access

Normal access/delivery is from local storage cluster

[Diagram: two storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical Storage) and DOM Shared central Services (Unique ID, Signing, Logging).]


35

DOM storage subsystem architecture - access

When a cluster is off-line then access/delivery is from a remote storage cluster

[Diagram: two storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical Storage) and DOM Shared central Services (Unique ID, Signing, Logging).]


36

DOM storage subsystem architecture - ingest

Normal ingest is to the local storage cluster and then the remote cluster is synchronised

[Diagram: two storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical Storage) and DOM Shared central Services (Unique ID, Signing, Logging), with steps labelled “Store” and “Synchronise remote store”.]


37

DOM storage subsystem architecture - ingest

When a cluster is off-line then ingest is managed by the remote storage cluster and the local cluster is synchronised later

[Diagram: two storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical Storage) and DOM Shared central Services (Unique ID, Signing, Logging), with steps labelled “Store” and “Synchronise remote store later”.]
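
The access and ingest behaviour described on the four slides above can be sketched as a pair of peer clusters. This is a minimal illustration under stated assumptions, not the DOM design: the class names, the in-memory dictionaries and the pending queue are all hypothetical, and real clusters would synchronise asynchronously over a network.

```python
class Cluster:
    """One autonomous storage cluster (in-memory stand-in)."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.objects = {}   # object id -> data
        self.pending = []   # objects still to be copied to the peer


class PeerPair:
    """Two peer clusters that cross-synchronise."""

    def __init__(self, local, remote):
        self.local = local
        self.remote = remote

    def ingest(self, object_id, data):
        # Prefer the local cluster; fall back to the remote one if it is off-line.
        primary, peer = (self.local, self.remote) if self.local.online \
            else (self.remote, self.local)
        if not primary.online:
            raise RuntimeError("no cluster available for ingest")
        primary.objects[object_id] = data
        if peer.online:
            peer.objects[object_id] = data             # synchronise now
        else:
            primary.pending.append((object_id, data))  # synchronise later

    def read(self, object_id):
        # Normal delivery is local; use the remote copy if local cannot serve it.
        for cluster in (self.local, self.remote):
            if cluster.online and object_id in cluster.objects:
                return cluster.objects[object_id]
        raise KeyError(object_id)

    def resynchronise(self):
        # Copy queued objects across once both clusters are back on-line.
        for src, dst in ((self.local, self.remote), (self.remote, self.local)):
            if src.online and dst.online:
                for object_id, data in src.pending:
                    dst.objects[object_id] = data
                src.pending.clear()


pair = PeerPair(Cluster("local"), Cluster("remote"))
pair.ingest("obj-1", b"stored while both clusters are up")
pair.local.online = False
pair.ingest("obj-2", b"stored while local is off-line")
print(pair.read("obj-2"))            # served by the remote cluster
pair.local.online = True
pair.resynchronise()
print("obj-2" in pair.local.objects)  # True
```

Because both clusters accept ingest and serve reads, all of the equipment does useful work in normal operation, in contrast to a master-standby arrangement.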


38

In conclusion



We plan for generations of physical storage
 Migration from one generation to the next
 Allow changes of supplier
 Purchase incrementally in modest quantities
 Move quickly when required
 Be cost conscious



We provide assurance that an object is held and re-presented as when it was
ingested



We are designing a cost-effective large scale resilient solution

In summary: we take a long term view


39

Major Programmes/4

Web Archiving Programme

40

Structure of Programme

Web Archiving Programme is a collaborative
initiative, broadly implemented across two
consortia
 UK Web Archiving Consortium
 Developing a selective approach to web archiving, procuring a common web archiving infrastructure and software to begin archiving activities at the earliest opportunity
 International Internet Preservation Consortium
 Developing advanced web archiving technologies for the long-term, large-scale, continuous crawling requirements enabled through legislation


41

UK Web Archiving Consortium
Developing a selective approach to web archiving
 License for PANDAS about to be signed with NLA
 Sub-licenses with consortium partners and contractor to follow
 ITT concluded with Magus Research winning the contract.
 Implement a common web archiving infrastructure (lots of Linux machines + PANDAS)
 Provide customisation/development of PANDAS
 Provide help desk and support


42

International Internet Preservation
Consortium
Developing advanced web archiving technologies
 Smart Crawler
 Continuous adaptive crawler, adjusting crawl priority on the fly
 Based on IA Heritrix
 Working on requirements now
 Expect to begin tender process in June
 Content Management
 Archival formats
 Framework
 Metrics and Test Bed
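
One way to picture a continuous adaptive crawler that adjusts crawl priority on the fly is a priority queue keyed on each page's next due time. The sketch below is an assumption made for illustration and is not how Heritrix or the planned Smart Crawler works: the class name, the halve-or-double rule and the interval limits are all hypothetical.

```python
import heapq


class AdaptiveFrontier:
    """Toy crawl frontier: pages that change are revisited sooner."""

    def __init__(self, shortest=1.0, longest=64.0):
        self._shortest = shortest
        self._longest = longest
        self._heap = []       # entries are (next_due_time, url)
        self._interval = {}   # url -> current revisit interval

    def add(self, url, now=0.0, interval=8.0):
        self._interval[url] = interval
        heapq.heappush(self._heap, (now, url))

    def pop_next(self):
        # The page with the earliest due time has the highest priority.
        return heapq.heappop(self._heap)

    def report(self, url, now, changed):
        # Halve the interval if the page changed, otherwise double it.
        interval = self._interval[url]
        if changed:
            interval = max(self._shortest, interval / 2)
        else:
            interval = min(self._longest, interval * 2)
        self._interval[url] = interval
        heapq.heappush(self._heap, (now + interval, url))


frontier = AdaptiveFrontier()
frontier.add("http://example.org/news")
frontier.add("http://example.org/static")
_, first = frontier.pop_next()
frontier.report(first, now=0.0, changed=True)     # revisit after 4 time units
_, second = frontier.pop_next()
frontier.report(second, now=0.0, changed=False)   # revisit after 16 time units
print(frontier.pop_next())  # (4.0, 'http://example.org/news')
```

The time values are abstract units; a real crawler would also weigh politeness limits, site importance and storage budget when ordering its queue.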


43

External Collaboration

44

Digital Library Collaborations/Partnerships
Current










UK Digital Preservation Coalition
 Founder Member
TEL (The European Library Project)
Web Archiving UK
 JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library Of Wales
International Internet Preservation Consortium
 BNF, Library Of Congress, Internet Archive, National Archives & Library Of Canada, National Library Of
Australia, National Library Of Italy, National Libraries Of Nordic Countries
JISC Funded - Digital Curation Centre
Persistent Identifiers
 DOI Foundation, European National Libraries (KB & DDB)
Resource Discovery
 Union Catalogues (SUNCAT)
Digital Library Federation


45

Digital Library Collaborations/Partnerships
Potential









Secure Legal Deposit Network
 6 Legal Deposit Libraries
Global Digital Format Registry
 Potential Partners (National Archives, DLF)
Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
 KB (Netherlands National Library & Other Partners – FP6 Bid)
Digital Rights Management
 Potential Partners (Publishers, JISC)
Metadata
 Publishers, Others ?
Authentication
 JISC ?
Resource Discovery
 Search Engine Vendors, Researchers
Others ???


46

Conclusions
Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success




Can You Work With Us?


47


Slide 29

British Computer Society
North London Branch

Major Programmes

Richard Boulderstone
July 27, 2004
1

Agenda











The British Library
Vision
Our Audiences/Customers
ILS
Digitisation
Digital Object Management
Web Archiving
Collaboration
Conclusions

Magna Carta

www.bl.uk

2

What Is The British Library ?

Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of
Science and Invention (1855), National Central Library (1916), and National
Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India
Office Library and Records in 1982, and British Institute of Recorded Sound in
1983
• Flagship building at St Pancras - largest public building project in Great
Britain in 20th century - opened in 1998


www.bl.uk

3

World-Class Research Library
Key Statistics 2002/3













150 million items
8.2 million items consulted or supplied
408,000 reading room visits
618,000 catalogue records created
554,000 items received on legal deposit
651 km shelf capacity 92% full add 12 km each year
18.5M Web Site Hits (www.bl.uk)
2,400 staff
£85.2 million Grant in Aid and £27.0 million trading income in 2001/2
Annual report - http://www.bl.uk/about/annual/latest.html

www.bl.uk

4

‘The World’s Knowledge’
Outcome Based Vision
… by aiding scientific advances
… by adding commercial value for
businesses
… by contributing to UK “knowledge
economy”
… through the pursuit of academic
excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural
wellbeing
… by giving information relevant to their
interests
… by helping to find the next medical
breakthrough
… by creating a link between the past,
present and future

To help people
advance
knowledge to
enrich lives

www.bl.uk

5

High
R+D
Industries

Prof.
Services

Creative
Industries Publishing
Industries

Resource Discovery
Bespoke Services
Research Services
Document Supply
Reprographics
Innovation Centre

School
Students
Libraries Teachers 11>18
On-site Visits
School Tours
Web Learning

SMEs
EDUCATION

Lifelong
Learner

Exhibitions
Events
Tours
Publishing

Visitors
(child + adult)
Lifelong
Learner

PUBLIC

BUSINESS
Lifelong
Learner

Postgraduate/
Undergraduate
RESEARCHER

Librarians
Scholars

Lifelong
Learner

Reading Rooms
Bespoke Services
Reprographics
Publishing
Document Supply
Searching Tools

LIBRARIES

Commercial
Researcher

Public
Libraries

Broadcasting
e.g. BBC

Publishing
e.g. OED

Document Supply
Resource Discovery
Training
Best Practice

H.E.
Libraries

Public

www.bl.uk

6
2

Major Programmes/1
Da Vinci Notebook

Integrated Library System (ILS)
Programme

7

ILS: Development

Data migration
 Due to finish in a few days



16M+ BL records
10M+ records from other sources

Online ILS software
 All online changes made (mainly interfaces) – final tests
 Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
 Most ones done for go live
 Rest in priority order
www.bl.uk

8

ILS: Implementation

Training
 Courses to end-users well underway
 ‘Practice’ system available
 ‘Search only’ training also underway
Testing
 Functional testing (end to end) nearly complete
 Performance poor – OPAC very slow
 Automated stress testing (LoadRunner scripts)
 eIS trying to find area of problem
 Ex Libris experts flying over
 Some security ‘hardening’ needed
www.bl.uk

9

ILS: Cutover from legacy systems

Now:

Temporary Aleph cataloguing

Phase 1 – internal processing
 Staggered take-on of users to ease cutover problems
 Merge ‘temporary’ records

7 June:

Phase 2 – reading rooms
 Reading rooms closed for cutover 26-29 June
 Mainly brand-new PCs etc rather than XP upgrade

30 June:

Phase 3 – remote users
 Could be delayed major problems

30 July:

www.bl.uk

10

www.bl.uk

11

Future ILS development (ILS/2)

Current ILS development seen just as the start

Extra records
 E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
 E.g. Preservation records
Links to other new BL systems
 E.g. Digital Object Management (images, web pages etc)

New releases of Ex Libris packages

www.bl.uk

12

Major Programmes/2

International Dunhuang Project

Digitisation Programme

13

Background










Digitisation Is The Process Of Converting Existing Physical Items Into Digital
Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character
Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image to Sub £1/Image.
Automatic Devices To Digitise Are Becoming Availalble But Are Expensive ~£150,000.
BL Has Had Fairly Ad Hoc Approach Driven By
 External Funding Opportunities
 Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project
Management
BL Has Created About 1.5M Digital Images So Far…

www.bl.uk

14

Digitisation Strategy



Digitisation Strategy Project Was Formerly Initiated On February 2, 2004


Key objectives for the project are to define:










Selection Criteria
Uniform Approach
Communications Plan
Sustainability
Intellectual Property Rights
External Relationship Management
Funding
Integration with DOMS

www.bl.uk

15

Project Status Information


Definitive Register of Projects




19 Complete
19 Current
20 Planning



JISC Sound (3,900 Hours)



JISC Newspapers (2M Pages of 750M Pages)



Chopin (Collaborative Project)



Early English Books Online
www.bl.uk

16

Major Programmes/3

Gutenberg Bible

Digital Object Management (DOM)
Programme

17

DOM Programme vision
Our mission is to enable the United Kingdom to preserve
and use its digital intellectual property forever

Our vision is create a management system for digital objects
that will
 store and preserve any type of digital material in perpetuity
 provide access to this material to users with appropriate permissions
 ensure that the material is easy to find
 ensure that users can view the material with contemporary applications
 ensure that users can, where possible, experience material with the original look-and-feel

www.bl.uk

18

Introduction - history

Digital Library PFI
 Mar 1997 – Dec 1998
Digital Library System
 1999 – early 2002
 Lessons
DOM Report
 Nov 2002
The DOM Programme
 Started September 2003

www.bl.uk

19

Drivers for the BL DOM Programme












Legal deposit legislation for non-print material was granted royal assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12T of material per year (with 50 year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….

We need a generic and cost-effective approach for the secure long term
storage of digital material that is produced by numerous initiatives

www.bl.uk

20

DOM – many topics to address
HIGH

ILS
WEB
ARCHIVING

DIGITISATION
PROGRAMME

ESTIMATED SIZE OF COMPONENT

LDEP

WORKFLOW
RESOURCE
DISCOVERY
LDLSE

VDEP

RIGHTS
MANAGEMENT
METADATA
DEFINITION

TECHNICAL
REQUIREMENTS

FILE
FORMAT
REGISTRY

PERSISTENT
IDENTIFIERS

FILE
CONVERSION
UTILITIES

AUTHENTICATION
PROTOTYPES

SDM
RADM

INTERFACES

STRATEGY
DEVELOPMENT

LOW
LOW

COMPONENT AMBIGUITY / COMPLEXITY

Started

Planned

Non-DOM projects

Planned co-operation

HIGH
LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications

www.bl.uk

21

Scope - life cycle of objects


Collection
 Selection
 Acquisition
 Accession
 Description
 Preservation
 Storage
 Preservation
 Access
 Resource discovery
 Delivery
 Rendering
www.bl.uk

22

Scope – objects and processes


Preservation store
 Preserves the bit stream in perpetuity
 Access store
 Access versions
 Limited formats – in the flavour of the era
 Metadata to support resource discovery
 Descriptive, Administrative, Links with existing tools e.g. Integrated
Library System (ILS)
 Workflow
 Ingest, e.g. Legal Deposit processing

www.bl.uk

23

ACCESS

DOM

Resource Discovery

Delivery

Digital Rights Management

Shared services

DOM
Storage

Signing
Authentication
Metadata
Persistent ID

Ingest

DONATIONS

DOCUMENT SUPPLY
Publishers
Archives
Grey Literature

Non-Serial
Store
Archiving
Operational
Stores

LEGAL DEPOSIT

LDL Secure
Environment

Legal Deposit
Items
Legal Deposit
Processing

WEB
ARCHIVING

DIGITISATION
St Pancras
Studios

Newspapers

NSA
www.bl.uk

24

Timeline

Prototype will provide a
basic preservationquality digital object
storage module

Definition. R0

ET approve
Business Case
& Timeline

•Consolidate R0 into operational
system
•Provide preservation-quality digital
store for materials received under
Voluntary Deposit of Electronic
Publications (VDEP)
•Integrate it with the existing VDEP
front-end

BC
R0
Operat’l Storage
Sub System. R1

•Support ingest for a
major content stream
•Integrate with core
Library systems as
required

R1

1st Content
Stream ingest. R2

R2

LDEP - initial
format. R3 & R4

Provide functionality
for material covered by
LDEP secondary
legislation

R3
R4

Open DOM to new
projects. R5+

2003

2004

2005
www.bl.uk

25

DOM: Project definition - 1
Example issues
Functional Architecture “What”

digital rights, file
formats, etc

Prototyping – assessing market solutions

allow changes to
new suppliers,
relationships to ILS,
other projects etc

Logical architecture “how – overall architecture”
Prototyping - basic functioning architecture

how do we build it
cost-effectively
today, supplier
selection criteria

Physical architecture “how – storage & specifics”
Prototyping - principal solutions and options

• Business case
• Planning – incremental
implementation phases

Cross team workshops – reviewing progress, debating detailed technical
issues, planning immediate priorities, risk management & way forward
www.bl.uk

26

DOM: Project definition - 2



Approach is to be incremental and not ‘Big Bang’
We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a
good solution



A principal goal is to define:
 An overall long term “logical architecture”
 Within which, there will be successive generations of physical architectures



We are understanding the storage marketplace, and we will use the knowledge to manage
procurement
We are certain that we will need >500T of storage but we are uncertain when – we thus need
flexible scalable procurement



www.bl.uk

27

DOM architecture - overview
Resource Discovery
Non-cat
ILS
based RD

Others

Rights Management
LDEP

DOMID

OBJECT

DOM Storage Service

Compound
objects/relations
Integrity

Authenticity

Local resource locator

Object

Atomic
Objects

Unique persistent
identifier (DOMID)

Doc
supply

DOMID is
mapped to
node/vol/LRL

DOM Physical Storage

www.bl.uk

28

DOM System (release 3)

Aleph

Storage subsystem
Storage
subsystem

Access

Mailroom

Publishers

Shared services

Administration
DOM System
www.bl.uk

29

DOM logical architecture –
integrity and authenticity


Integrity:
 System has capability to continuously monitor the object store to detect
object corruption
 It would then initiate object recovery
 Authenticity:
 A process is defined to provide long-term assurance that an object that
is re-presented is as it was when it was ingested
 Based on the use of cryptographic signing techniques
 Each object is signed when it is ingested
 The signature is verified when required
 The signing mechanism is “tightly” controlled

www.bl.uk

30

Procuring physical storage in volume









A major cost is in physical storage
The market for storage systems is changing rapidly, and this implies that “lock-in” is
not sensible
We thus need flexibility to change supplier over time
Cost of storage is reducing by 30-40% per year
Hence procure on rolling basis just ahead of demand
Replace storage on a rolling basis on expiry of warranty
The rolling programmes imply the need to be able to support a heterogeneous
product solution
The design of the logical architecture thus supports storage sourced from multiple
storage vendors

www.bl.uk

31

Disaster tolerance and the organisation of storage clusters

• One can obtain commercial disaster recovery (DR) solutions for common equipment configurations
• However, one cannot obtain such solutions for systems holding several hundred terabytes
• So we must build DR into the design of the system
• A single-site solution, subject to a common-mode disaster, would suffer considerable loss of availability after a disaster, and so is not acceptable
• This implies that we need a multi-site solution
• Conventionally these are based on a master-standby arrangement, where only 50% of the kit is delivering normal service
• Our design is based on multiple autonomous, independent peer clusters that cross-synchronise, so 100% of the kit delivers normal service

32

DOM architecture in the context of the storage solution market

• The dominant segment of the market focuses on delivering performance within a highly resilient single cluster
• However:
  • Many of our objects will be rarely accessed
    • so we do not want to pay for “maximised” performance we do not need
  • We have resilience by using multiple clusters, hence we have a reduced need for resilience within a cluster
    • so we do not want to pay for “maximised” resilience we do not need
• We are using these drivers to design a cost-effective large scale resilient solution

33

DOM storage subsystem architecture - overview

[Diagram: two DOM Storage gateways, two DOM Storage Services and two storage clusters of DOM Physical Storage, alongside DOM Shared Services (Unique ID, Signing, Logging)]

34

DOM storage subsystem architecture - access

Normal access/delivery is from the local storage cluster

[Diagram: two DOM Storage gateways, two DOM Storage Services and two storage clusters of DOM Physical Storage, with DOM Shared (central) Services (Unique ID, Signing, Logging)]

35

DOM storage subsystem architecture - access

When a cluster is off-line then access/delivery is from a remote storage cluster

[Diagram: two DOM Storage gateways, two DOM Storage Services and two storage clusters of DOM Physical Storage, with DOM Shared (central) Services (Unique ID, Signing, Logging)]
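The access rule on these two slides (serve from the local cluster, fall back to a remote peer when the local one is off-line) can be sketched as below. The class and function names are hypothetical and the in-memory dictionary merely stands in for physical storage; nothing here reflects the actual DOM gateway interface.

```python
from typing import Dict, Sequence, Tuple


class StorageCluster:
    """Toy stand-in for one storage cluster (illustrative only)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.online = True
        self.objects: Dict[str, bytes] = {}


def read_object(object_id: str,
                local: StorageCluster,
                remotes: Sequence[StorageCluster]) -> Tuple[str, bytes]:
    """Return (serving cluster name, data), preferring the local cluster."""
    for cluster in [local] + list(remotes):
        if cluster.online and object_id in cluster.objects:
            return cluster.name, cluster.objects[object_id]
    raise LookupError(f"no on-line cluster holds object {object_id!r}")
```

Because the clusters are peers, the same function works from either site: each site simply passes its own cluster as `local`.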


36

DOM storage subsystem architecture - ingest

Normal ingest is to the local storage cluster and then the remote cluster is synchronised

[Diagram: two DOM Storage gateways, two DOM Storage Services and two storage clusters of DOM Physical Storage, with DOM Shared (central) Services (Unique ID, Signing, Logging). Arrows labelled “Store” and “Synchronise remote store”]

37

DOM storage subsystem architecture - ingest

When a cluster is off-line then ingest is managed by the remote storage cluster and the local cluster is synchronised later

[Diagram: two DOM Storage gateways, two DOM Storage Services and two storage clusters of DOM Physical Storage, with DOM Shared (central) Services (Unique ID, Signing, Logging). Arrows labelled “Store” and “Synchronise remote store later”]
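The two ingest cases (store locally and synchronise the peer, or store on the remote peer and catch up later) might be modelled as in the sketch below. It is a toy under stated assumptions: two peers only, a simple pending list as the catch-up mechanism, and invented names throughout. The slides do not describe how DOM actually tracks outstanding synchronisation.

```python
from typing import Dict, List, Tuple


class Cluster:
    """Toy peer cluster with a list of copies it still owes to off-line peers."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.online = True
        self.objects: Dict[str, bytes] = {}
        self.pending: List[Tuple["Cluster", str]] = []


def ingest(object_id: str, data: bytes, local: Cluster, remote: Cluster) -> str:
    """Store on the local cluster if it is on-line, otherwise on the remote one."""
    primary, secondary = (local, remote) if local.online else (remote, local)
    if not primary.online:
        raise RuntimeError("no cluster is on-line to accept the ingest")
    primary.objects[object_id] = data
    if secondary.online:
        secondary.objects[object_id] = data  # synchronise the peer straight away
    else:
        primary.pending.append((secondary, object_id))  # synchronise later
    return primary.name


def resynchronise(cluster: Cluster) -> None:
    """Replay owed copies to peers that have come back on-line."""
    still_pending = []
    for peer, object_id in cluster.pending:
        if peer.online:
            peer.objects[object_id] = cluster.objects[object_id]
        else:
            still_pending.append((peer, object_id))
    cluster.pending = still_pending
```

Calling `resynchronise` on the cluster that accepted the ingest, once its peer is back, brings both copies into line again.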


38

In conclusion

• We plan for generations of physical storage
  • Migration from one generation to the next
  • Allow changes of supplier
  • Purchase incrementally in modest quantities
  • Move quickly when required
  • Be cost conscious
• We provide assurance that an object is held and re-presented as it was when it was ingested
• We are designing a cost-effective large scale resilient solution

In summary: we take a long-term view

39

Major Programmes/4

Web Archiving Programme

40

Structure of Programme

The Web Archiving Programme is a collaborative initiative, broadly organised across two consortia:
• UK Web Archiving Consortium
  • Developing a selective approach to web archiving, and procuring a common web archiving infrastructure and software so that archiving can begin at the earliest opportunity
• International Internet Preservation Consortium
  • Developing advanced web archiving technologies for the long-term, large-scale, continuous crawling requirements enabled through legislation

41

UK Web Archiving Consortium
Developing a selective approach to web archiving
• Licence for PANDAS about to be signed with NLA
• Sub-licences with consortium partners and contractor to follow
• ITT concluded, with Magus Research winning the contract
• Implement a common web archiving infrastructure (lots of Linux machines + PANDAS)
• Provide customisation/development of PANDAS
• Provide help desk and support

42

International Internet Preservation Consortium
Developing advanced web archiving technologies
• Smart Crawler
  • Continuous adaptive crawler, adjusting crawl priority on the fly
  • Based on IA Heritrix
  • Working on requirements now
  • Expect to begin tender process in June
• Content Management
• Archival formats
• Framework
• Metrics and Test Bed
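One generic way to adjust crawl priority on the fly is a priority queue with lazy deletion: a re-prioritised URL is pushed again and stale entries are skipped when popped. The sketch below shows that technique only. It is not Heritrix's API or the Smart Crawler design, and the class and method names are invented for illustration.

```python
import heapq
from typing import Dict, List, Optional, Tuple


class CrawlFrontier:
    """Queue of URLs to crawl, where a URL's priority may change while queued."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, str]] = []  # (-priority, sequence, url)
        self._priority: Dict[str, float] = {}          # url -> current priority
        self._sequence = 0                             # tie-breaker, keeps insertion order

    def schedule(self, url: str, priority: float) -> None:
        """Queue a URL, or change the priority of one that is already queued."""
        self._priority[url] = priority
        heapq.heappush(self._heap, (-priority, self._sequence, url))
        self._sequence += 1

    def next_url(self) -> Optional[str]:
        """Pop the highest-priority URL, ignoring entries made stale by rescheduling."""
        while self._heap:
            negated, _, url = heapq.heappop(self._heap)
            if self._priority.get(url) == -negated:
                del self._priority[url]
                return url
        return None
```

An adaptive crawler would call `schedule` again whenever it revises its estimate of how often a page changes or how important it is, so the next fetch always reflects the latest priorities.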


43

External Collaboration

44

Digital Library Collaborations/Partnerships
Current

• UK Digital Preservation Coalition
  • Founder Member
• TEL (The European Library Project)
• Web Archiving UK
  • JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library Of Wales
• International Internet Preservation Consortium
  • BNF, Library Of Congress, Internet Archive, National Archives & Library Of Canada, National Library Of Australia, National Library Of Italy, National Libraries Of Nordic Countries
• JISC Funded - Digital Curation Centre
• Persistent Identifiers
  • DOI Foundation, European National Libraries (KB & DDB)
• Resource Discovery
  • Union Catalogues (SUNCAT)
• Digital Library Federation

45

Digital Library Collaborations/Partnerships
Potential

• Secure Legal Deposit Network
  • 6 Legal Deposit Libraries
• Global Digital Format Registry
  • Potential Partners (National Archives, DLF)
• Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
  • KB (Netherlands National Library & Other Partners – FP6 Bid)
• Digital Rights Management
  • Potential Partners (Publishers, JISC)
• Metadata
  • Publishers, Others ?
• Authentication
  • JISC ?
• Resource Discovery
  • Search Engine Vendors, Researchers
• Others ???

46

Conclusions
Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success




Can You Work With Us?


47



Slide 31

British Computer Society
North London Branch

Major Programmes

Richard Boulderstone
July 27, 2004
1

Agenda











The British Library
Vision
Our Audiences/Customers
ILS
Digitisation
Digital Object Management
Web Archiving
Collaboration
Conclusions

Magna Carta

www.bl.uk

2

What Is The British Library ?

Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of
Science and Invention (1855), National Central Library (1916), and National
Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India
Office Library and Records in 1982, and British Institute of Recorded Sound in
1983
• Flagship building at St Pancras - largest public building project in Great
Britain in 20th century - opened in 1998


www.bl.uk

3

World-Class Research Library
Key Statistics 2002/3













150 million items
8.2 million items consulted or supplied
408,000 reading room visits
618,000 catalogue records created
554,000 items received on legal deposit
651 km shelf capacity 92% full add 12 km each year
18.5M Web Site Hits (www.bl.uk)
2,400 staff
£85.2 million Grant in Aid and £27.0 million trading income in 2001/2
Annual report - http://www.bl.uk/about/annual/latest.html

www.bl.uk

4

‘The World’s Knowledge’
Outcome Based Vision
… by aiding scientific advances
… by adding commercial value for
businesses
… by contributing to UK “knowledge
economy”
… through the pursuit of academic
excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural
wellbeing
… by giving information relevant to their
interests
… by helping to find the next medical
breakthrough
… by creating a link between the past,
present and future

To help people
advance
knowledge to
enrich lives

www.bl.uk

5

High
R+D
Industries

Prof.
Services

Creative
Industries Publishing
Industries

Resource Discovery
Bespoke Services
Research Services
Document Supply
Reprographics
Innovation Centre

School
Students
Libraries Teachers 11>18
On-site Visits
School Tours
Web Learning

SMEs
EDUCATION

Lifelong
Learner

Exhibitions
Events
Tours
Publishing

Visitors
(child + adult)
Lifelong
Learner

PUBLIC

BUSINESS
Lifelong
Learner

Postgraduate/
Undergraduate
RESEARCHER

Librarians
Scholars

Lifelong
Learner

Reading Rooms
Bespoke Services
Reprographics
Publishing
Document Supply
Searching Tools

LIBRARIES

Commercial
Researcher

Public
Libraries

Broadcasting
e.g. BBC

Publishing
e.g. OED

Document Supply
Resource Discovery
Training
Best Practice

H.E.
Libraries

Public

www.bl.uk

6
2

Major Programmes/1
Da Vinci Notebook

Integrated Library System (ILS)
Programme

7

ILS: Development

Data migration
 Due to finish in a few days



16M+ BL records
10M+ records from other sources

Online ILS software
 All online changes made (mainly interfaces) – final tests
 Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
 Most ones done for go live
 Rest in priority order
www.bl.uk

8

ILS: Implementation

Training
 Courses to end-users well underway
 ‘Practice’ system available
 ‘Search only’ training also underway
Testing
 Functional testing (end to end) nearly complete
 Performance poor – OPAC very slow
 Automated stress testing (LoadRunner scripts)
 eIS trying to find area of problem
 Ex Libris experts flying over
 Some security ‘hardening’ needed
www.bl.uk

9

ILS: Cutover from legacy systems

Now:

Temporary Aleph cataloguing

Phase 1 – internal processing
 Staggered take-on of users to ease cutover problems
 Merge ‘temporary’ records

7 June:

Phase 2 – reading rooms
 Reading rooms closed for cutover 26-29 June
 Mainly brand-new PCs etc rather than XP upgrade

30 June:

Phase 3 – remote users
 Could be delayed major problems

30 July:

www.bl.uk

10

www.bl.uk

11

Future ILS development (ILS/2)

Current ILS development seen just as the start

Extra records
 E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
 E.g. Preservation records
Links to other new BL systems
 E.g. Digital Object Management (images, web pages etc)

New releases of Ex Libris packages

www.bl.uk

12

Major Programmes/2

International Dunhuang Project

Digitisation Programme

13

Background










Digitisation Is The Process Of Converting Existing Physical Items Into Digital
Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character
Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image to Sub £1/Image.
Automatic Devices To Digitise Are Becoming Availalble But Are Expensive ~£150,000.
BL Has Had Fairly Ad Hoc Approach Driven By
 External Funding Opportunities
 Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project
Management
BL Has Created About 1.5M Digital Images So Far…

www.bl.uk

14

Digitisation Strategy



Digitisation Strategy Project Was Formerly Initiated On February 2, 2004


Key objectives for the project are to define:










Selection Criteria
Uniform Approach
Communications Plan
Sustainability
Intellectual Property Rights
External Relationship Management
Funding
Integration with DOMS

www.bl.uk

15

Project Status Information


Definitive Register of Projects




19 Complete
19 Current
20 Planning



JISC Sound (3,900 Hours)



JISC Newspapers (2M Pages of 750M Pages)



Chopin (Collaborative Project)



Early English Books Online
www.bl.uk

16

Major Programmes/3

Gutenberg Bible

Digital Object Management (DOM)
Programme

17

DOM Programme vision
Our mission is to enable the United Kingdom to preserve
and use its digital intellectual property forever

Our vision is create a management system for digital objects
that will
 store and preserve any type of digital material in perpetuity
 provide access to this material to users with appropriate permissions
 ensure that the material is easy to find
 ensure that users can view the material with contemporary applications
 ensure that users can, where possible, experience material with the original look-and-feel

www.bl.uk

18

Introduction - history

Digital Library PFI
 Mar 1997 – Dec 1998
Digital Library System
 1999 – early 2002
 Lessons
DOM Report
 Nov 2002
The DOM Programme
 Started September 2003

www.bl.uk

19

Drivers for the BL DOM Programme












Legal deposit legislation for non-print material was granted royal assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12T of material per year (with 50 year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….

We need a generic and cost-effective approach for the secure long term
storage of digital material that is produced by numerous initiatives

www.bl.uk

20

DOM – many topics to address
[Chart: DOM components plotted on two axes, estimated size of component (low to high) and component ambiguity / complexity (low to high). Legend: Started; Planned; Non-DOM projects; Planned co-operation.]
Components shown: ILS, Web Archiving, Digitisation Programme, LDEP, Workflow, Resource Discovery, LDLSE, VDEP, Rights Management, Metadata Definition, Technical Requirements, File Format Registry, Persistent Identifiers, File Conversion Utilities, Authentication, Prototypes, SDM, RADM, Interfaces, Strategy Development

LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications


Scope - life cycle of objects


• Collection
  - Selection
  - Acquisition
  - Accession
  - Description
• Preservation
  - Storage
  - Preservation
• Access
  - Resource discovery
  - Delivery
  - Rendering

Scope – objects and processes


• Preservation store
  - Preserves the bit stream in perpetuity
• Access store
  - Access versions
  - Limited formats – in the flavour of the era
• Metadata to support resource discovery
  - Descriptive, administrative, links with existing tools, e.g. Integrated Library System (ILS)
• Workflow
  - Ingest, e.g. Legal Deposit processing


[Diagram: flow of material through DOM, from ingest to access.
Ingest sources: DONATIONS; DOCUMENT SUPPLY (Publishers, Archives, Grey Literature, Non-Serial Store, Archiving, Operational Stores); LEGAL DEPOSIT (LDL Secure Environment, Legal Deposit Items, Legal Deposit Processing); WEB ARCHIVING; DIGITISATION (St Pancras Studios, Newspapers, NSA).
DOM: Ingest, DOM Storage, Shared services (Signing, Authentication, Metadata, Persistent ID).
ACCESS: Resource Discovery, Delivery, Digital Rights Management.]

Timeline

• R0: Definition
  - Prototype will provide a basic preservation-quality digital object storage module
• BC: ET approve Business Case & Timeline
• R1: Operational Storage Sub System
  - Consolidate R0 into operational system
  - Provide preservation-quality digital store for materials received under Voluntary Deposit of Electronic Publications (VDEP)
  - Integrate it with the existing VDEP front-end
• R2: 1st Content Stream ingest
  - Support ingest for a major content stream
  - Integrate with core Library systems as required
• R3 & R4: LDEP - initial format
  - Provide functionality for material covered by LDEP secondary legislation
• R5+: Open DOM to new projects

Time axis: 2003, 2004, 2005

DOM: Project definition - 1
• Functional architecture: “what”
  - Prototyping: assessing market solutions
  - Example issues: digital rights, file formats, etc.
• Logical architecture: “how – overall architecture”
  - Prototyping: basic functioning architecture
  - Example issues: allow changes to new suppliers, relationships to ILS, other projects, etc.
• Physical architecture: “how – storage & specifics”
  - Prototyping: principal solutions and options
  - Example issues: how do we build it cost-effectively today, supplier selection criteria
• Business case
• Planning – incremental implementation phases
• Cross-team workshops: reviewing progress, debating detailed technical issues, planning immediate priorities, risk management & way forward

DOM: Project definition - 2



• Approach is to be incremental and not ‘Big Bang’
• We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a good solution
• A principal goal is to define:
  - An overall long-term “logical architecture”
  - Within which there will be successive generations of physical architectures
• We are building an understanding of the storage marketplace, and we will use that knowledge to manage procurement
• We are certain that we will need >500 TB of storage but we are uncertain when, so we need flexible, scalable procurement




DOM architecture - overview
[Diagram: DOM architecture overview.
Users of the DOM Storage Service: Resource Discovery (ILS-based RD, non-cat based RD, others), Rights Management, LDEP, Doc supply. Each refers to an OBJECT by its DOMID.
DOM Storage Service: compound objects/relations, integrity, authenticity, atomic objects, unique persistent identifier (DOMID).
DOM Physical Storage: the DOMID is mapped to node/vol/LRL, where LRL is the local resource locator.]
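
The mapping from a persistent DOMID to a node/vol/LRL location can be pictured as a lookup table held by the storage service. The following is a minimal sketch only, assuming a simple in-memory dictionary: the class names, the DOMID string format and the field values are hypothetical and are not taken from the BL design.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Location:
    """Hypothetical physical location: storage node, volume, local resource locator."""
    node: str
    vol: str
    lrl: str


class DomidResolver:
    """Maps a persistent DOMID to one or more physical locations."""

    def __init__(self) -> None:
        self._table: Dict[str, List[Location]] = {}

    def register(self, domid: str, location: Location) -> None:
        # An object may be held in several places (e.g. one copy per cluster).
        self._table.setdefault(domid, []).append(location)

    def relocate(self, domid: str, old: Location, new: Location) -> None:
        # Migration to new storage changes the location, never the DOMID.
        locations = self._table[domid]
        locations[locations.index(old)] = new

    def resolve(self, domid: str) -> List[Location]:
        return list(self._table[domid])


resolver = DomidResolver()
first = Location(node="node-a", vol="vol-01", lrl="0001/obj.bin")
resolver.register("domid:0001", first)
resolver.relocate("domid:0001", first, Location("node-b", "vol-07", "0001/obj.bin"))
print(resolver.resolve("domid:0001"))
```

Keeping the identifier separate from the location is what lets successive generations of physical storage be swapped in without changing how other systems refer to an object.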


DOM System (release 3)

[Diagram: DOM System (release 3). Inside the DOM System: storage subsystems, Access, Shared services, Administration. Connected to it: Aleph, Mailroom, Publishers.]

DOM logical architecture –
integrity and authenticity


• Integrity:
  - System has capability to continuously monitor the object store to detect object corruption
  - It would then initiate object recovery
• Authenticity:
  - A process is defined to provide long-term assurance that an object that is re-presented is as it was when it was ingested
  - Based on the use of cryptographic signing techniques
  - Each object is signed when it is ingested
  - The signature is verified when required
  - The signing mechanism is “tightly” controlled
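
The sign-at-ingest and verify-later cycle described above can be illustrated with a short sketch. It is not the DOM implementation: it assumes a keyed hash (HMAC-SHA-256 from the Python standard library) as a stand-in for whatever signing technique the programme selects, and the key, class and object names are hypothetical.

```python
import hashlib
import hmac
from typing import Dict, List

# Assumption: a single secret key stands in for the "tightly" controlled signing
# mechanism; a real system would more likely use public-key signatures.
SIGNING_KEY = b"example-key-not-for-production"


def sign(data: bytes) -> str:
    """Return a keyed SHA-256 digest (HMAC) of the object's bit stream."""
    return hmac.new(SIGNING_KEY, data, hashlib.sha256).hexdigest()


def verify(data: bytes, signature: str) -> bool:
    """Check that the bit stream still matches the signature made at ingest."""
    return hmac.compare_digest(sign(data), signature)


class ObjectStore:
    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}
        self.signatures: Dict[str, str] = {}

    def ingest(self, object_id: str, data: bytes) -> None:
        # Each object is signed when it is ingested.
        self.objects[object_id] = data
        self.signatures[object_id] = sign(data)

    def audit(self) -> List[str]:
        """One pass of integrity monitoring: list objects that fail verification."""
        return [oid for oid, data in self.objects.items()
                if not verify(data, self.signatures[oid])]


store = ObjectStore()
store.ingest("obj-1", b"digitised master")
store.ingest("obj-2", b"electronic journal issue")
store.objects["obj-2"] = b"electronic journal issu3"  # simulate bit corruption
print(store.audit())
```

An object reported by `audit` would be the trigger for recovery, for example by re-copying it from another cluster.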


Procuring physical storage in volume









• A major cost is in physical storage
• The market for storage systems is changing rapidly, and this implies that “lock-in” is not sensible
• We thus need flexibility to change supplier over time
• Cost of storage is reducing by 30-40% per year
• Hence procure on a rolling basis just ahead of demand
• Replace storage on a rolling basis on expiry of warranty
• The rolling programmes imply the need to be able to support a heterogeneous product solution
• The design of the logical architecture thus supports storage sourced from multiple storage vendors
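
The case for rolling procurement can be checked with simple arithmetic. The sketch below compares buying all capacity at today's price with buying each year's capacity at that year's price. Only the 30-40% yearly price reduction comes from the slide; the unit price, the five-year horizon and the 100 TB per year demand are illustrative assumptions.

```python
def rolling_cost(tb_per_year: float, years: int, price_per_tb: float, annual_drop: float) -> float:
    """Total spend when each year's capacity is bought at that year's price."""
    total = 0.0
    for year in range(years):
        total += tb_per_year * price_per_tb * (1.0 - annual_drop) ** year
    return total


def upfront_cost(tb_per_year: float, years: int, price_per_tb: float) -> float:
    """Total spend when all capacity is bought at today's price."""
    return tb_per_year * years * price_per_tb


# Illustrative figures only: the unit price and demand profile are assumptions.
PRICE = 1000.0   # currency units per TB today
DROP = 0.35      # midpoint of the 30-40% yearly reduction
upfront = upfront_cost(100, 5, PRICE)
rolling = rolling_cost(100, 5, PRICE, DROP)
print(f"upfront: {upfront:,.0f}  rolling: {rolling:,.0f}  saving: {1 - rolling / upfront:.0%}")
```

Under these assumptions the rolling purchase costs roughly half as much as the upfront purchase, before counting warranty replacement.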


Disaster tolerance and the organisation
of storage clusters









• One can obtain commercial disaster recovery (DR) solutions for common equipment configurations
• However, one cannot obtain such solutions for multi-100 TB systems
• So we must build DR into the design of the system
• A single-site solution, subject to a common-mode disaster, would suffer considerable loss of availability after a disaster, and so is not acceptable
• This implies that we need a multi-site solution
• Conventionally these are based on a master-standby arrangement where only 50% of kit is delivering normal service
• Our design is based on the use of multiple autonomous independent peer clusters that cross-synchronise, so 100% of the kit delivers normal service


DOM architecture in the context of the
storage solution market


• The dominant segment of the market focuses on delivering performance within a highly resilient single cluster
• However:
  - Many of our objects will be rarely accessed, so we do not want to pay for “maximised” performance we do not need
  - We have resilience by using multiple clusters, hence we have a reduced need for resilience within a cluster, so we do not want to pay for “maximised” resilience we do not need
• We are using these drivers to design a cost-effective large-scale resilient solution


DOM storage subsystem architecture - overview

[Diagram: two storage clusters, each consisting of a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, together with DOM Shared Services (Unique ID, Signing, Logging).]


DOM storage subsystem architecture - access

Normal access/delivery is from the local storage cluster.

[Diagram: two storage clusters (each with a DOM Storage gateway, DOM Storage Service and DOM Physical Storage) and the DOM shared central services (Unique ID, Signing, Logging); the request is served by the local cluster.]


DOM storage subsystem architecture - access

When a cluster is off-line, access/delivery is from a remote storage cluster.

[Diagram: two storage clusters (each with a DOM Storage gateway, DOM Storage Service and DOM Physical Storage) and the DOM shared central services (Unique ID, Signing, Logging); the request is served by the remote cluster.]
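
The two access cases (local cluster available, local cluster off-line) amount to a read with fallback to a peer. The following is a minimal sketch of that behaviour, assuming each cluster holds a full copy of the object; the `Cluster` class, the `fetch` function and the site names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


class Cluster:
    """Hypothetical stand-in for one storage cluster (gateway + service + storage)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.online = True
        self.objects: Dict[str, bytes] = {}

    def read(self, domid: str) -> Optional[bytes]:
        if not self.online:
            raise ConnectionError(f"{self.name} is off-line")
        return self.objects.get(domid)


def fetch(domid: str, local: Cluster, remotes: List[Cluster]) -> Tuple[str, bytes]:
    """Serve from the local cluster; fall back to remote peers if it is off-line."""
    for cluster in [local, *remotes]:
        try:
            data = cluster.read(domid)
        except ConnectionError:
            continue  # try the next peer
        if data is not None:
            return cluster.name, data
    raise LookupError(f"{domid} not available from any cluster")


site_a, site_b = Cluster("site-a"), Cluster("site-b")
for site in (site_a, site_b):
    site.objects["domid:42"] = b"page image"

normal = fetch("domid:42", site_a, [site_b])    # local cluster is up
site_a.online = False
failover = fetch("domid:42", site_a, [site_b])  # local cluster is off-line
print(normal[0], failover[0])
```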


DOM storage subsystem architecture - ingest

Normal ingest is to the local storage cluster and then the remote cluster is synchronised.

[Diagram: two storage clusters (each with a DOM Storage gateway, DOM Storage Service and DOM Physical Storage) and the DOM shared central services (Unique ID, Signing, Logging). Arrows: "Store" to the local cluster; "Synchronise remote store".]


DOM storage subsystem architecture - ingest

When a cluster is off-line, ingest is managed by the remote storage cluster and the local cluster is synchronised later.

[Diagram: two storage clusters (each with a DOM Storage gateway, DOM Storage Service and DOM Physical Storage) and the DOM shared central services (Unique ID, Signing, Logging). Arrows: "Store" to the remote cluster; "Synchronise remote store later".]
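
The two ingest cases can be sketched as a store on whichever cluster is available, followed by synchronisation of the peer, immediately if it is up and later if it is not. This is an illustrative model only: the `Cluster` class, its `pending` list and the `ingest` function are hypothetical and say nothing about how the real system tracks outstanding synchronisation.

```python
from typing import Dict, List, Tuple


class Cluster:
    def __init__(self, name: str) -> None:
        self.name = name
        self.online = True
        self.objects: Dict[str, bytes] = {}
        # Objects this cluster still owes to peers that were off-line.
        self.pending: List[Tuple["Cluster", str]] = []

    def store(self, domid: str, data: bytes) -> None:
        self.objects[domid] = data

    def synchronise(self) -> None:
        """Push owed objects to peers that are now reachable."""
        still_pending = []
        for peer, domid in self.pending:
            if peer.online:
                peer.store(domid, self.objects[domid])
            else:
                still_pending.append((peer, domid))
        self.pending = still_pending


def ingest(domid: str, data: bytes, local: Cluster, remote: Cluster) -> str:
    """Store on the local cluster if it is up, otherwise on the remote one."""
    primary, secondary = (local, remote) if local.online else (remote, local)
    if not primary.online:
        raise ConnectionError("no cluster available for ingest")
    primary.store(domid, data)
    primary.pending.append((secondary, domid))
    primary.synchronise()  # immediate if the peer is up, deferred otherwise
    return primary.name


site_a, site_b = Cluster("site-a"), Cluster("site-b")
ingest("domid:1", b"first", site_a, site_b)               # normal ingest
site_a.online = False
served_by = ingest("domid:2", b"second", site_a, site_b)  # local cluster off-line
site_a.online = True
site_b.synchronise()                                      # catch up later
print(served_by, sorted(site_a.objects))
```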


In conclusion



• We plan for generations of physical storage
  - Migration from one generation to the next
  - Allow changes of supplier
  - Purchase incrementally in modest quantities
  - Move quickly when required
  - Be cost conscious
• We provide assurance that an object is held and re-presented as it was when it was ingested
• We are designing a cost-effective large-scale resilient solution

In summary: we take a long-term view


Major Programmes/4

Web Archiving Programme


Structure of Programme

The Web Archiving Programme is a collaborative
initiative, implemented roughly across two
consortia
• UK Web Archiving Consortium
  - Developing a selective approach to web archiving, procuring a common web archiving infrastructure and software to begin archiving activities at the earliest opportunity
• International Internet Preservation Consortium
  - Developing advanced web archiving technologies for the long-term, large-scale, continuous crawling requirements enabled through legislation


UK Web Archiving Consortium
Developing a selective approach to web archiving
• License for PANDAS about to be signed with NLA
  - Sub-licenses with consortium partners and contractor to follow
• ITT concluded with Magus Research winning the contract
  - Implement a common web archiving infrastructure (lots of Linux machines + PANDAS)
  - Provide customisation/development of PANDAS
  - Provide help desk and support


International Internet Preservation
Consortium
Developing advanced web archiving technologies
• Smart Crawler
  - Continuous adaptive crawler, adjusting crawl priority on the fly
  - Based on IA Heritrix
  - Working on requirements now
  - Expect to begin tender process in June
• Content Management
• Archival formats
• Framework
• Metrics and Test Bed
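
Adjusting crawl priority on the fly can be pictured as a priority queue whose entries may be re-prioritised while the crawl is running. The sketch below shows one common way to do that with Python's `heapq`. It is not Heritrix code and does not use any Heritrix API; the `CrawlFrontier` class and the example URLs are hypothetical.

```python
import heapq
from typing import Dict, List, Optional, Tuple


class CrawlFrontier:
    """Priority queue of URLs whose priorities can change while crawling."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, str]] = []
        self._current: Dict[str, Tuple[float, int]] = {}  # url -> live (priority, seq)
        self._seq = 0

    def set_priority(self, url: str, priority: float) -> None:
        # Lower number = crawl sooner. Old heap entries are left in place and
        # skipped later ("lazy deletion"), since heapq has no update operation.
        self._seq += 1
        self._current[url] = (priority, self._seq)
        heapq.heappush(self._heap, (priority, self._seq, url))

    def pop(self) -> Optional[str]:
        while self._heap:
            priority, seq, url = heapq.heappop(self._heap)
            if self._current.get(url) == (priority, seq):
                del self._current[url]
                return url
        return None


frontier = CrawlFrontier()
frontier.set_priority("http://example.org/news", 5.0)
frontier.set_priority("http://example.org/archive", 2.0)
# The news page is observed to change often, so it is promoted on the fly.
frontier.set_priority("http://example.org/news", 1.0)
order = [frontier.pop(), frontier.pop(), frontier.pop()]
print(order)
```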


External Collaboration


Digital Library Collaborations/Partnerships
Current
• UK Digital Preservation Coalition
  - Founder Member
• TEL (The European Library Project)
• Web Archiving UK
  - JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library of Wales
• International Internet Preservation Consortium
  - BNF, Library of Congress, Internet Archive, National Archives & Library of Canada, National Library of Australia, National Library of Italy, National Libraries of Nordic Countries
• JISC Funded - Digital Curation Centre
• Persistent Identifiers
  - DOI Foundation, European National Libraries (KB & DDB)
• Resource Discovery
  - Union Catalogues (SUNCAT)
• Digital Library Federation


Digital Library Collaborations/Partnerships
Potential
• Secure Legal Deposit Network
  - 6 Legal Deposit Libraries
• Global Digital Format Registry
  - Potential Partners (National Archives, DLF)
• Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
  - KB (Netherlands National Library & Other Partners – FP6 Bid)
• Digital Rights Management
  - Potential Partners (Publishers, JISC)
• Metadata
  - Publishers, Others ?
• Authentication
  - JISC ?
• Resource Discovery
  - Search Engine Vendors, Researchers
• Others ???


Conclusions
Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success




Can You Work With Us?



Slide 33

British Computer Society
North London Branch

Major Programmes

Richard Boulderstone
July 27, 2004
1

Agenda











The British Library
Vision
Our Audiences/Customers
ILS
Digitisation
Digital Object Management
Web Archiving
Collaboration
Conclusions

Magna Carta

www.bl.uk

2

What Is The British Library ?

Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of
Science and Invention (1855), National Central Library (1916), and National
Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India
Office Library and Records in 1982, and British Institute of Recorded Sound in
1983
• Flagship building at St Pancras - largest public building project in Great
Britain in 20th century - opened in 1998


www.bl.uk

3

World-Class Research Library
Key Statistics 2002/3













150 million items
8.2 million items consulted or supplied
408,000 reading room visits
618,000 catalogue records created
554,000 items received on legal deposit
651 km shelf capacity 92% full add 12 km each year
18.5M Web Site Hits (www.bl.uk)
2,400 staff
£85.2 million Grant in Aid and £27.0 million trading income in 2001/2
Annual report - http://www.bl.uk/about/annual/latest.html

www.bl.uk

4

‘The World’s Knowledge’
Outcome Based Vision

… by aiding scientific advances
… by adding commercial value for businesses
… by contributing to UK “knowledge economy”
… through the pursuit of academic excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural wellbeing
… by giving information relevant to their interests
… by helping to find the next medical breakthrough
… by creating a link between the past, present and future

To help people advance knowledge to enrich lives

www.bl.uk

5

[Diagram of audiences and customers: audience groups with their segments and the services offered to them.]
Audience groups: BUSINESS, EDUCATION, PUBLIC, RESEARCHER, LIBRARIES
Segments named: High R+D Industries, Prof. Services, Creative Industries, Publishing Industries, SMEs, School Libraries, Students, Teachers, 11>18, Visitors (child + adult), Lifelong Learner, Postgraduate/Undergraduate, Librarians, Scholars, Commercial Researcher, Public Libraries, H.E. Libraries, Broadcasting (e.g. BBC), Publishing (e.g. OED), Public
Services named: Resource Discovery, Bespoke Services, Research Services, Document Supply, Reprographics, Innovation Centre, On-site Visits, School Tours, Web Learning, Exhibitions, Events, Tours, Publishing, Reading Rooms, Searching Tools, Training, Best Practice

www.bl.uk

6

Major Programmes/1
Da Vinci Notebook

Integrated Library System (ILS) Programme

7

ILS: Development

• Data migration
  - Due to finish in a few days
  - 16M+ BL records
  - 10M+ records from other sources
• Online ILS software
  - All online changes made (mainly interfaces) – final tests
  - Web OPAC configuration – tested by staff, HE, expert
• Batch imports / exports
  - Most of those needed for go-live are done
  - Rest in priority order
www.bl.uk

8

ILS: Implementation

• Training
  - Courses for end users well under way
  - ‘Practice’ system available
  - ‘Search only’ training also under way
• Testing
  - Functional testing (end to end) nearly complete
  - Performance poor – OPAC very slow
  - Automated stress testing (LoadRunner scripts)
  - eIS trying to locate the problem area
  - Ex Libris experts flying over
  - Some security ‘hardening’ needed
www.bl.uk

9

ILS: Cutover from legacy systems

• Now: Temporary Aleph cataloguing
• 7 June: Phase 1 – internal processing
  - Staggered take-on of users to ease cutover problems
  - Merge ‘temporary’ records
• 30 June: Phase 2 – reading rooms
  - Reading rooms closed for cutover 26-29 June
  - Mainly brand-new PCs etc rather than XP upgrade
• 30 July: Phase 3 – remote users
  - Could be delayed if there are major problems

www.bl.uk

10

www.bl.uk

11

Future ILS development (ILS/2)

Current ILS development seen just as the start

• Extra records
  - E.g. Sound archive, Manuscripts, Newspaper issues
• Extra functions
  - E.g. Preservation records
• Links to other new BL systems
  - E.g. Digital Object Management (images, web pages etc)
• New releases of Ex Libris packages

www.bl.uk

12

Major Programmes/2

International Dunhuang Project

Digitisation Programme

13

Background

• Digitisation Is The Process Of Converting Existing Physical Items Into Digital Surrogates.
• Digitisation Projects Must Take Into Account Metadata Creation, Optical Character Recognition, Navigation, Display, Archiving, Preservation.
• Overall Cost Of Digitisation Projects Is Declining, From £10s/Image To Sub £1/Image.
• Automatic Devices To Digitise Are Becoming Available But Are Expensive (~£150,000).
• BL Has Had Fairly Ad Hoc Approach Driven By
  - External Funding Opportunities
  - Curator Interest
• Projects Have Generally Created Their Own Approach, IT Resources, Project Management
• BL Has Created About 1.5M Digital Images So Far…

www.bl.uk

14

Digitisation Strategy

Digitisation Strategy Project Was Formally Initiated On February 2, 2004

Key objectives for the project are to define:
• Selection Criteria
• Uniform Approach
• Communications Plan
• Sustainability
• Intellectual Property Rights
• External Relationship Management
• Funding
• Integration with DOMS

www.bl.uk

15

Project Status Information

• Definitive Register of Projects
  - 19 Complete
  - 19 Current
  - 20 Planning
• JISC Sound (3,900 Hours)
• JISC Newspapers (2M Pages of 750M Pages)
• Chopin (Collaborative Project)
• Early English Books Online
www.bl.uk

16

Major Programmes/3

Gutenberg Bible

Digital Object Management (DOM) Programme

17

DOM Programme vision

Our mission is to enable the United Kingdom to preserve and use its digital intellectual property forever

Our vision is to create a management system for digital objects that will
• store and preserve any type of digital material in perpetuity
• provide access to this material to users with appropriate permissions
• ensure that the material is easy to find
• ensure that users can view the material with contemporary applications
• ensure that users can, where possible, experience material with the original look-and-feel

www.bl.uk

18

Introduction - history

• Digital Library PFI
  - Mar 1997 – Dec 1998
• Digital Library System
  - 1999 – early 2002
  - Lessons
• DOM Report
  - Nov 2002
• The DOM Programme
  - Started September 2003

www.bl.uk

19

Drivers for the BL DOM Programme

• Legal deposit legislation for non-print material was granted royal assent in October 2003
• Existing voluntary deposit scheme operational since 2000
• Storage of digitised masters from early ’90s onwards
• New digitisation initiatives: newspapers, sound, etc
• Sound archive receives 12 TB of material per year (with 50 year collection)
• Web archiving
• Cartography and datasets
• Electronic journals, picture library
• … and …
• … and …

We need a generic and cost-effective approach for the secure long term storage of digital material that is produced by numerous initiatives

www.bl.uk

20

DOM – many topics to address

[Chart: components plotted by ESTIMATED SIZE OF COMPONENT (LOW to HIGH) against COMPONENT AMBIGUITY / COMPLEXITY (LOW to HIGH).]
Components shown: ILS, Web Archiving, Digitisation Programme, LDEP, Workflow, Resource Discovery, LDLSE, VDEP, Rights Management, Metadata Definition, Technical Requirements, File Format Registry, Persistent Identifiers, File Conversion Utilities, Authentication, Prototypes, SDM, RADM, Interfaces, Strategy Development
Legend: Started, Planned, Non-DOM projects, Planned co-operation

LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications

www.bl.uk

21

Scope - life cycle of objects

• Collection
  - Selection
  - Acquisition
  - Accession
  - Description
• Preservation
  - Storage
  - Preservation
• Access
  - Resource discovery
  - Delivery
  - Rendering
www.bl.uk

22

Scope – objects and processes

• Preservation store
  - Preserves the bit stream in perpetuity
• Access store
  - Access versions
  - Limited formats – in the flavour of the era
• Metadata to support resource discovery
  - Descriptive, Administrative, Links with existing tools e.g. Integrated Library System (ILS)
• Workflow
  - Ingest, e.g. Legal Deposit processing

www.bl.uk

23

[Diagram: DOM and the systems around it.]
ACCESS: Resource Discovery, Delivery, Digital Rights Management
DOM: Storage, Ingest
Shared services: Signing, Authentication, Metadata, Persistent ID
Connected areas: DONATIONS, DOCUMENT SUPPLY, LEGAL DEPOSIT, WEB ARCHIVING, DIGITISATION
Other labels: Publishers, Archives, Grey Literature, Non-Serial Store, Archiving, Operational Stores, LDL Secure Environment, Legal Deposit Items, Legal Deposit Processing, St Pancras Studios, Newspapers, NSA
www.bl.uk

24

Timeline (2003 to 2005)

• Definition (R0): prototype will provide a basic preservation-quality digital object storage module
• Business Case (BC): ET approve Business Case & Timeline
• Operational Storage Sub System (R1)
  - Consolidate R0 into operational system
  - Provide preservation-quality digital store for materials received under Voluntary Deposit of Electronic Publications (VDEP)
  - Integrate it with the existing VDEP front-end
• 1st Content Stream ingest (R2)
  - Support ingest for a major content stream
  - Integrate with core Library systems as required
• LDEP - initial format (R3 & R4): provide functionality for material covered by LDEP secondary legislation
• Open DOM to new projects (R5+)
www.bl.uk

25

DOM: Project definition - 1

• Functional Architecture “What”
  - Prototyping – assessing market solutions
  - Example issues: digital rights, file formats, etc
• Logical architecture “how – overall architecture”
  - Prototyping - basic functioning architecture
  - Example issues: allow changes to new suppliers, relationships to ILS, other projects etc
• Physical architecture “how – storage & specifics”
  - Prototyping - principal solutions and options
  - Example issues: how do we build it cost-effectively today, supplier selection criteria
• Business case
• Planning – incremental implementation phases

Cross team workshops – reviewing progress, debating detailed technical issues, planning immediate priorities, risk management & way forward
www.bl.uk

26

DOM: Project definition - 2

• Approach is to be incremental and not ‘Big Bang’
• We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a good solution
• A principal goal is to define:
  - An overall long term “logical architecture”
  - Within which, there will be successive generations of physical architectures
• We are building an understanding of the storage marketplace, and we will use that knowledge to manage procurement
• We are certain that we will need >500 TB of storage but we are uncertain when – we thus need flexible scalable procurement



www.bl.uk

27

DOM architecture - overview

[Diagram: client systems above the DOM Storage Service, which sits above DOM Physical Storage.]
Client systems: Resource Discovery (ILS, Non-cat based RD, Others), Rights Management, LDEP, Doc supply
Exchanged between clients and the service: DOMID, OBJECT
DOM Storage Service: Compound objects/relations, Atomic Objects, Integrity, Authenticity, Unique persistent identifier (DOMID), Local resource locator
DOMID is mapped to node/vol/LRL
DOM Physical Storage
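
The mapping from a persistent DOMID to node/vol/LRL can be pictured with a small lookup table. What follows is only a sketch under stated assumptions: the class and field names, and the use of a random token as the identifier, are hypothetical and are not taken from the DOM design.

```python
from dataclasses import dataclass
import uuid


@dataclass(frozen=True)
class Location:
    """Where one copy of an object physically lives (field names are assumed)."""
    node: str    # storage node or cluster
    volume: str  # volume on that node
    lrl: str     # local resource locator within the volume


class DomIdResolver:
    """Maps a persistent DOMID to the current physical locations of the object."""

    def __init__(self) -> None:
        self._locations: dict[str, list[Location]] = {}

    def register(self, location: Location) -> str:
        """Mint a new DOMID for an object stored at `location`."""
        dom_id = uuid.uuid4().hex  # assumption: any unique, opaque token will do
        self._locations[dom_id] = [location]
        return dom_id

    def relocate(self, dom_id: str, old: Location, new: Location) -> None:
        """Record a migration to new storage; the DOMID itself never changes."""
        copies = self._locations[dom_id]
        copies[copies.index(old)] = new

    def resolve(self, dom_id: str) -> list[Location]:
        """Return the locations currently holding the object."""
        return list(self._locations[dom_id])


resolver = DomIdResolver()
first_home = Location("node-a", "vol-01", "obj/000123")
dom_id = resolver.register(first_home)

# Later the object moves to a newer generation of storage.
second_home = Location("node-b", "vol-07", "obj/9f3c")
resolver.relocate(dom_id, first_home, second_home)
```

Because clients hold only the DOMID, the physical storage underneath can be replaced without changing any reference held by the ILS or other systems.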

www.bl.uk

28

DOM System (release 3)

[Diagram: the DOM System and its surroundings. Labels: Aleph, Storage subsystem (shown twice), Access, Mailroom, Publishers, Shared services, Administration.]
www.bl.uk

29

DOM logical architecture – integrity and authenticity

• Integrity:
  - System has capability to continuously monitor the object store to detect object corruption
  - It would then initiate object recovery
• Authenticity:
  - A process is defined to provide long-term assurance that an object that is re-presented is as it was when it was ingested
  - Based on the use of cryptographic signing techniques
  - Each object is signed when it is ingested
  - The signature is verified when required
  - The signing mechanism is “tightly” controlled
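
The sign-at-ingest, verify-on-demand cycle can be sketched in a few lines. The slides say only that "cryptographic signing techniques" are used, so the choice here is an assumption: a keyed SHA-256 tag (HMAC) from the Python standard library stands in for whatever signature scheme the programme adopts, and the key, object IDs and function names are hypothetical.

```python
import hashlib
import hmac

# Hypothetical key; in a real system it would sit inside the tightly controlled signing service.
SIGNING_KEY = b"demo-key-held-by-the-signing-service"


def sign(content: bytes, key: bytes = SIGNING_KEY) -> str:
    """Return a keyed SHA-256 tag for the content, computed once at ingest."""
    return hmac.new(key, content, hashlib.sha256).hexdigest()


def verify(content: bytes, signature: str, key: bytes = SIGNING_KEY) -> bool:
    """Check that the content still matches the tag recorded at ingest."""
    return hmac.compare_digest(sign(content, key), signature)


def scrub(store: dict[str, bytes], signatures: dict[str, str]) -> list[str]:
    """One pass of the integrity monitor: IDs of objects that fail verification."""
    return [obj_id for obj_id, content in store.items()
            if not verify(content, signatures[obj_id])]


store = {"obj-1": b"first object", "obj-2": b"second object"}
signatures = {obj_id: sign(content) for obj_id, content in store.items()}

store["obj-2"] = b"second objecT"   # simulate silent corruption of one object
damaged = scrub(store, signatures)  # IDs that would be queued for recovery
```

A monitor of this kind only detects damage; recovery would copy a good version back from another cluster and verify it against the same signature.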

www.bl.uk

30

Procuring physical storage in volume

• A major cost is in physical storage
• The market for storage systems is changing rapidly, and this implies that “lock-in” is not sensible
• We thus need flexibility to change supplier over time
• Cost of storage is reducing by 30-40% per year
• Hence procure on rolling basis just ahead of demand
• Replace storage on a rolling basis on expiry of warranty
• The rolling programmes imply the need to be able to support a heterogeneous product solution
• The design of the logical architecture thus supports storage sourced from multiple storage vendors
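
A little arithmetic shows why falling prices favour buying just ahead of demand. This is a sketch with invented inputs: the demand ramp and the starting price are hypothetical, the 35% annual decline is simply the midpoint of the 30-40% range quoted above, and replacement on expiry of warranty is ignored.

```python
def upfront_cost(demand_tb: list[float], price_per_tb: float) -> float:
    """Buy the peak capacity in year 0 at the year-0 price."""
    return max(demand_tb) * price_per_tb


def rolling_cost(demand_tb: list[float], price_per_tb: float,
                 annual_decline: float) -> float:
    """Each year, buy only the extra capacity needed, at that year's price."""
    total, owned = 0.0, 0.0
    for year, needed in enumerate(demand_tb):
        extra = max(0.0, needed - owned)
        total += extra * price_per_tb * (1 - annual_decline) ** year
        owned += extra
    return total


demand = [100, 200, 300, 400, 500]  # TB needed in years 0..4 (hypothetical ramp)
price = 1000.0                      # cost per TB in year 0 (hypothetical units)

upfront = upfront_cost(demand, price)        # 500 TB at the year-0 price
rolling = rolling_cost(demand, price, 0.35)  # 100 TB a year at falling prices
```

With these made-up inputs the rolling purchase costs roughly half the up-front purchase, and it also postpones the commitment until the timing of demand is known.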

www.bl.uk

31

Disaster tolerance and the organisation of storage clusters

• One can obtain commercial disaster recovery (DR) solutions for common equipment configurations
• However one cannot obtain such solutions for systems holding several hundred TB
• So we must build DR into the design of the system
• A single site solution, subject to a common-mode disaster, would suffer considerable loss of availability after a disaster, and so is not acceptable
• This implies that we need a multi-site solution
• Conventionally these are based on a master-standby arrangement where only 50% of kit is delivering normal service
• Our design is based on the use of multiple autonomous independent peer clusters that cross-synchronise, so 100% of the kit delivers normal service

www.bl.uk

32

DOM architecture in the context of the storage solution market

• The dominant segment of the market focuses on delivering performance within a highly resilient single cluster
• However:
  - Many of our objects will be rarely accessed, so we do not want to pay for “maximised” performance we do not need
  - We have resilience by using multiple clusters, hence we have a reduced need for resilience within a cluster, so we do not want to pay for “maximised” resilience we do not need
• We are using these drivers to design a cost-effective large scale resilient solution

www.bl.uk

33

DOM storage subsystem architecture - overview

[Diagram: two storage clusters, each with a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, both using DOM Shared Services (Unique ID, Signing, Logging).]

www.bl.uk

34

DOM storage subsystem architecture - access

[Diagram: two storage clusters, each with a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, both using DOM Shared Services (Unique ID, Signing, Logging).]
Normal access/delivery is from local storage cluster

www.bl.uk

35

DOM storage subsystem architecture - access

[Diagram: two storage clusters, each with a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, both using DOM Shared Services (Unique ID, Signing, Logging).]
When a cluster is off-line then access/delivery is from a remote storage cluster

www.bl.uk

36

DOM storage subsystem architecture - ingest

[Diagram: two storage clusters, each with a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, both using DOM Shared Services (Unique ID, Signing, Logging). Arrows: Store, Synchronise remote store.]
Normal ingest is to the local storage cluster and then the remote cluster is synchronised

www.bl.uk

37

DOM storage subsystem architecture - ingest

[Diagram: two storage clusters, each with a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, both using DOM Shared Services (Unique ID, Signing, Logging). Arrows: Store, Synchronise remote store later.]
When a cluster is off-line then ingest is managed by the remote storage cluster and the local cluster is synchronised later
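
The access and ingest behaviour on the four slides above can be mimicked with two in-memory clusters. This is an illustrative sketch only: the class names, the dictionary storage and the per-cluster backlog are assumptions made for the example and are not the DOM implementation.

```python
class Cluster:
    """One autonomous storage cluster (a hypothetical in-memory stand-in)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.online = True
        self.objects: dict[str, bytes] = {}
        # Writes missed while off-line; in practice the peer would hold this backlog.
        self.pending: dict[str, bytes] = {}


class Gateway:
    """Routes requests to the local cluster, falling back to the remote peer."""

    def __init__(self, local: Cluster, remote: Cluster) -> None:
        self.local, self.remote = local, remote

    def ingest(self, dom_id: str, content: bytes) -> str:
        """Store on the first available cluster, then synchronise the other."""
        primary, secondary = ((self.local, self.remote) if self.local.online
                              else (self.remote, self.local))
        if not primary.online:
            raise RuntimeError("no storage cluster available")
        primary.objects[dom_id] = content
        if secondary.online:
            secondary.objects[dom_id] = content
        else:
            secondary.pending[dom_id] = content  # synchronise later
        return primary.name

    def fetch(self, dom_id: str) -> bytes:
        """Deliver from the local cluster when it is up, else from the remote one."""
        source = self.local if self.local.online else self.remote
        if not source.online:
            raise RuntimeError("no storage cluster available")
        return source.objects[dom_id]


def resynchronise(cluster: Cluster) -> None:
    """Bring a recovered cluster up to date with the writes it missed."""
    cluster.online = True
    cluster.objects.update(cluster.pending)
    cluster.pending.clear()


site_a, site_b = Cluster("site-a"), Cluster("site-b")
gateway_a = Gateway(local=site_a, remote=site_b)

gateway_a.ingest("dom-1", b"journal issue")  # stored at site-a, copied to site-b
site_a.online = False
gateway_a.ingest("dom-2", b"web snapshot")   # stored at site-b, queued for site-a
resynchronise(site_a)                        # site-a catches up when it returns
```

Both clusters serve requests in normal running, which is the contrast with a master-standby pair drawn earlier in the deck.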

www.bl.uk

38

In conclusion

• We plan for generations of physical storage
  - Migration from one generation to the next
  - Allow changes of supplier
  - Purchase incrementally in modest quantities
  - Move quickly when required
  - Be cost conscious
• We provide assurance that an object is held and re-presented as when it was ingested
• We are designing a cost-effective large scale resilient solution

In summary: we take a long term view

www.bl.uk

39

Major Programmes/4

Web Archiving Programme

40

Structure of Programme

Web Archiving Programme is a collaborative initiative, implemented broadly across two consortia
• UK Web Archiving Consortium
  - Developing a selective approach to web archiving, procuring a common web archiving infrastructure and software to begin archiving activities at the earliest opportunity
• International Internet Preservation Consortium
  - Developing advanced web archiving technologies for the long-term, large-scale, continuous crawling requirements enabled through legislation

www.bl.uk

41

UK Web Archiving Consortium
Developing a selective approach to web archiving
• Licence for PANDAS about to be signed with NLA
  - Sub-licences with consortium partners and contractor to follow
• ITT concluded with Magus Research winning the contract.
  - Implement a common web archiving infrastructure (lots of Linux machines + PANDAS)
  - Provide customisation/development of PANDAS
  - Provide help desk and support

www.bl.uk

42

International Internet Preservation Consortium
Developing advanced web archiving technologies
• Smart Crawler
  - Continuous adaptive crawler, adjusting crawl priority on the fly
  - Based on IA Heritrix
  - Working on requirements now
  - Expect to begin tender process in June
• Content Management
  - Archival formats
• Framework
• Metrics and Test Bed
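
"Adjusting crawl priority on the fly" can be illustrated with a priority queue whose scores follow how often each page is seen to change. This is a generic sketch, not Heritrix's API or the consortium's design: the class, the scoring rule and the example URLs are all assumptions.

```python
import heapq


class CrawlFrontier:
    """Priority queue of URLs whose priority tracks how often each page changes."""

    def __init__(self) -> None:
        self._heap: list[tuple[float, str]] = []
        self._score: dict[str, float] = {}

    def add(self, url: str, score: float = 0.5) -> None:
        """Queue a URL with the given score (higher is crawled sooner)."""
        self._score[url] = score
        heapq.heappush(self._heap, (-score, url))  # negate: heapq is a min-heap

    def next_url(self) -> str:
        """Pop the URL with the highest current score, skipping stale heap entries."""
        while self._heap:
            neg_score, url = heapq.heappop(self._heap)
            if self._score.get(url) == -neg_score:
                return url
        raise IndexError("frontier is empty")

    def report(self, url: str, changed: bool, weight: float = 0.5) -> None:
        """After a visit, move the score towards 1 if the page changed, else towards 0."""
        old = self._score[url]
        new = (1 - weight) * old + weight * (1.0 if changed else 0.0)
        self.add(url, new)  # re-queue with the adjusted priority


frontier = CrawlFrontier()
frontier.add("http://example.org/news")
frontier.add("http://example.org/about")

url = frontier.next_url()
frontier.report(url, changed=False)  # an unchanged page drops down the queue
```

Pages that keep changing rise to the front of the queue, while static pages are revisited less often.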

www.bl.uk

43

External Collaboration

44

Digital Library Collaborations/Partnerships
Current

• UK Digital Preservation Coalition
  - Founder Member
• TEL (The European Library Project)
• Web Archiving UK
  - JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library Of Wales
• International Internet Preservation Consortium
  - BNF, Library Of Congress, Internet Archive, National Archives & Library Of Canada, National Library Of Australia, National Library Of Italy, National Libraries Of Nordic Countries
• JISC Funded - Digital Curation Centre
• Persistent Identifiers
  - DOI Foundation, European National Libraries (KB & DDB)
• Resource Discovery
  - Union Catalogues (SUNCAT)
• Digital Library Federation

www.bl.uk

45

Digital Library Collaborations/Partnerships
Potential

• Secure Legal Deposit Network
  - 6 Legal Deposit Libraries
• Global Digital Format Registry
  - Potential Partners (National Archives, DLF)
• Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
  - KB (Netherlands National Library & Other Partners – FP6 Bid)
• Digital Rights Management
  - Potential Partners (Publishers, JISC)
• Metadata
  - Publishers, Others?
• Authentication
  - JISC?
• Resource Discovery
  - Search Engine Vendors, Researchers
• Others???

www.bl.uk

46

Conclusions
• Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success

Can You Work With Us?

www.bl.uk

47


Slide 34

British Computer Society
North London Branch

Major Programmes

Richard Boulderstone
July 27, 2004
1

Agenda











The British Library
Vision
Our Audiences/Customers
ILS
Digitisation
Digital Object Management
Web Archiving
Collaboration
Conclusions

Magna Carta

www.bl.uk

2

What Is The British Library ?

Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of
Science and Invention (1855), National Central Library (1916), and National
Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India
Office Library and Records in 1982, and British Institute of Recorded Sound in
1983
• Flagship building at St Pancras - largest public building project in Great
Britain in 20th century - opened in 1998


www.bl.uk

3

World-Class Research Library
Key Statistics 2002/3













150 million items
8.2 million items consulted or supplied
408,000 reading room visits
618,000 catalogue records created
554,000 items received on legal deposit
651 km shelf capacity 92% full add 12 km each year
18.5M Web Site Hits (www.bl.uk)
2,400 staff
£85.2 million Grant in Aid and £27.0 million trading income in 2001/2
Annual report - http://www.bl.uk/about/annual/latest.html

www.bl.uk

4

‘The World’s Knowledge’
Outcome Based Vision
… by aiding scientific advances
… by adding commercial value for
businesses
… by contributing to UK “knowledge
economy”
… through the pursuit of academic
excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural
wellbeing
… by giving information relevant to their
interests
… by helping to find the next medical
breakthrough
… by creating a link between the past,
present and future

To help people
advance
knowledge to
enrich lives

www.bl.uk

5

High
R+D
Industries

Prof.
Services

Creative
Industries Publishing
Industries

Resource Discovery
Bespoke Services
Research Services
Document Supply
Reprographics
Innovation Centre

School
Students
Libraries Teachers 11>18
On-site Visits
School Tours
Web Learning

SMEs
EDUCATION

Lifelong
Learner

Exhibitions
Events
Tours
Publishing

Visitors
(child + adult)
Lifelong
Learner

PUBLIC

BUSINESS
Lifelong
Learner

Postgraduate/
Undergraduate
RESEARCHER

Librarians
Scholars

Lifelong
Learner

Reading Rooms
Bespoke Services
Reprographics
Publishing
Document Supply
Searching Tools

LIBRARIES

Commercial
Researcher

Public
Libraries

Broadcasting
e.g. BBC

Publishing
e.g. OED

Document Supply
Resource Discovery
Training
Best Practice

H.E.
Libraries

Public

www.bl.uk

6
2

Major Programmes/1
Da Vinci Notebook

Integrated Library System (ILS)
Programme

7

ILS: Development

Data migration
 Due to finish in a few days



16M+ BL records
10M+ records from other sources

Online ILS software
 All online changes made (mainly interfaces) – final tests
 Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
 Most ones done for go live
 Rest in priority order
www.bl.uk

8

ILS: Implementation

Training
 Courses to end-users well underway
 ‘Practice’ system available
 ‘Search only’ training also underway
Testing
 Functional testing (end to end) nearly complete
 Performance poor – OPAC very slow
 Automated stress testing (LoadRunner scripts)
 eIS trying to find area of problem
 Ex Libris experts flying over
 Some security ‘hardening’ needed
www.bl.uk

9

ILS: Cutover from legacy systems

Now:

Temporary Aleph cataloguing

Phase 1 – internal processing
 Staggered take-on of users to ease cutover problems
 Merge ‘temporary’ records

7 June:

Phase 2 – reading rooms
 Reading rooms closed for cutover 26-29 June
 Mainly brand-new PCs etc rather than XP upgrade

30 June:

Phase 3 – remote users
 Could be delayed major problems

30 July:

www.bl.uk

10

www.bl.uk

11

Future ILS development (ILS/2)

Current ILS development seen just as the start

Extra records
 E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
 E.g. Preservation records
Links to other new BL systems
 E.g. Digital Object Management (images, web pages etc)

New releases of Ex Libris packages

www.bl.uk

12

Major Programmes/2

International Dunhuang Project

Digitisation Programme

13

Background










Digitisation Is The Process Of Converting Existing Physical Items Into Digital
Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character
Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image to Sub £1/Image.
Automatic Devices To Digitise Are Becoming Availalble But Are Expensive ~£150,000.
BL Has Had Fairly Ad Hoc Approach Driven By
 External Funding Opportunities
 Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project
Management
BL Has Created About 1.5M Digital Images So Far…

www.bl.uk

14

Digitisation Strategy



Digitisation Strategy Project Was Formerly Initiated On February 2, 2004


Key objectives for the project are to define:










Selection Criteria
Uniform Approach
Communications Plan
Sustainability
Intellectual Property Rights
External Relationship Management
Funding
Integration with DOMS

www.bl.uk

15

Project Status Information


Definitive Register of Projects




19 Complete
19 Current
20 Planning



JISC Sound (3,900 Hours)



JISC Newspapers (2M Pages of 750M Pages)



Chopin (Collaborative Project)



Early English Books Online
www.bl.uk

16

Major Programmes/3

Gutenberg Bible

Digital Object Management (DOM)
Programme

17

DOM Programme vision
Our mission is to enable the United Kingdom to preserve
and use its digital intellectual property forever

Our vision is create a management system for digital objects
that will
 store and preserve any type of digital material in perpetuity
 provide access to this material to users with appropriate permissions
 ensure that the material is easy to find
 ensure that users can view the material with contemporary applications
 ensure that users can, where possible, experience material with the original look-and-feel

www.bl.uk

18

Introduction - history

Digital Library PFI
 Mar 1997 – Dec 1998
Digital Library System
 1999 – early 2002
 Lessons
DOM Report
 Nov 2002
The DOM Programme
 Started September 2003

www.bl.uk

19

Drivers for the BL DOM Programme












Legal deposit legislation for non-print material was granted royal assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12T of material per year (with 50 year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….

We need a generic and cost-effective approach for the secure long term
storage of digital material that is produced by numerous initiatives

www.bl.uk

20

DOM – many topics to address
HIGH

ILS
WEB
ARCHIVING

DIGITISATION
PROGRAMME

ESTIMATED SIZE OF COMPONENT

LDEP

WORKFLOW
RESOURCE
DISCOVERY
LDLSE

VDEP

RIGHTS
MANAGEMENT
METADATA
DEFINITION

TECHNICAL
REQUIREMENTS

FILE
FORMAT
REGISTRY

PERSISTENT
IDENTIFIERS

FILE
CONVERSION
UTILITIES

AUTHENTICATION
PROTOTYPES

SDM
RADM

INTERFACES

STRATEGY
DEVELOPMENT

LOW
LOW

COMPONENT AMBIGUITY / COMPLEXITY

Started

Planned

Non-DOM projects

Planned co-operation

HIGH
LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications

www.bl.uk

21

Scope - life cycle of objects


Collection
 Selection
 Acquisition
 Accession
 Description
 Preservation
 Storage
 Preservation
 Access
 Resource discovery
 Delivery
 Rendering
www.bl.uk

22

Scope – objects and processes


Preservation store
 Preserves the bit stream in perpetuity
 Access store
 Access versions
 Limited formats – in the flavour of the era
 Metadata to support resource discovery
 Descriptive, Administrative, Links with existing tools e.g. Integrated
Library System (ILS)
 Workflow
 Ingest, e.g. Legal Deposit processing

www.bl.uk

23

ACCESS

DOM

Resource Discovery

Delivery

Digital Rights Management

Shared services

DOM
Storage

Signing
Authentication
Metadata
Persistent ID

Ingest

DONATIONS

DOCUMENT SUPPLY
Publishers
Archives
Grey Literature

Non-Serial
Store
Archiving
Operational
Stores

LEGAL DEPOSIT

LDL Secure
Environment

Legal Deposit
Items
Legal Deposit
Processing

WEB
ARCHIVING

DIGITISATION
St Pancras
Studios

Newspapers

NSA
www.bl.uk

24

Timeline

Prototype will provide a
basic preservationquality digital object
storage module

Definition. R0

ET approve
Business Case
& Timeline

•Consolidate R0 into operational
system
•Provide preservation-quality digital
store for materials received under
Voluntary Deposit of Electronic
Publications (VDEP)
•Integrate it with the existing VDEP
front-end

BC
R0
Operat’l Storage
Sub System. R1

•Support ingest for a
major content stream
•Integrate with core
Library systems as
required

R1

1st Content
Stream ingest. R2

R2

LDEP - initial
format. R3 & R4

Provide functionality
for material covered by
LDEP secondary
legislation

R3
R4

Open DOM to new
projects. R5+

2003

2004

2005
www.bl.uk

25

DOM: Project definition - 1
Example issues
Functional Architecture “What”

digital rights, file
formats, etc

Prototyping – assessing market solutions

allow changes to
new suppliers,
relationships to ILS,
other projects etc

Logical architecture “how – overall architecture”
Prototyping - basic functioning architecture

how do we build it
cost-effectively
today, supplier
selection criteria

Physical architecture “how – storage & specifics”
Prototyping - principal solutions and options

• Business case
• Planning – incremental
implementation phases

Cross team workshops – reviewing progress, debating detailed technical
issues, planning immediate priorities, risk management & way forward
www.bl.uk

26

DOM: Project definition - 2



Approach is to be incremental and not ‘Big Bang’
We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a
good solution



A principal goal is to define:
 An overall long term “logical architecture”
 Within which, there will be successive generations of physical architectures



We are understanding the storage marketplace, and we will use the knowledge to manage
procurement
We are certain that we will need >500T of storage but we are uncertain when – we thus need
flexible scalable procurement



www.bl.uk

27

DOM architecture - overview
Resource Discovery
Non-cat
ILS
based RD

Others

Rights Management
LDEP

DOMID

OBJECT

DOM Storage Service

Compound
objects/relations
Integrity

Authenticity

Local resource locator

Object

Atomic
Objects

Unique persistent
identifier (DOMID)

Doc
supply

DOMID is
mapped to
node/vol/LRL

DOM Physical Storage

www.bl.uk

28

DOM System (release 3)

Aleph

Storage subsystem
Storage
subsystem

Access

Mailroom

Publishers

Shared services

Administration
DOM System
www.bl.uk

29

DOM logical architecture –
integrity and authenticity


Integrity:
 System has capability to continuously monitor the object store to detect
object corruption
 It would then initiate object recovery
 Authenticity:
 A process is defined to provide long-term assurance that an object that
is re-presented is as it was when it was ingested
 Based on the use of cryptographic signing techniques
 Each object is signed when it is ingested
 The signature is verified when required
 The signing mechanism is “tightly” controlled

www.bl.uk

30

Procuring physical storage in volume









A major cost is in physical storage
The market for storage systems is changing rapidly, and this implies that “lock-in” is
not sensible
We thus need flexibility to change supplier over time
Cost of storage is reducing by 30-40% per year
Hence procure on rolling basis just ahead of demand
Replace storage on a rolling basis on expiry of warranty
The rolling programmes imply the need to be able to support a heterogeneous
product solution
The design of the logical architecture thus supports storage sourced from multiple
storage vendors

www.bl.uk

31

Disaster tolerance and the organisation
of storage clusters









One can obtain commercial disaster recovery (DR) solutions for common equipment
configurations
However one cannot obtain such solutions for systems comprising multi-100 Tb systems
So we must build in the need for DR into the design of the system
A single site solution, subject to a common-mode disaster, would suffer considerable loss
of availability after a disaster, and so is not acceptable
This implies that we need a multi-site solution
Conventionally these are based on a master-standby where only 50% of kit is delivering
normal service
Our design is based on the use of multiple autonomous independent peer clusters that
cross-synchronise
so 100% of the kit delivers normal service

www.bl.uk

32

DOM architecture in the context of the
storage solution market


The dominant segment of the market focuses on delivering performance within
a highly resilient single cluster
 However:
 Many of our objects will be rarely accessed
 so we do not want to pay for “maximised” performance we do not need
 We have resilience by using multiple clusters, hence we have a reduced
need for resilience within a cluster
 so we do not want to pay for “maximised” resilience we do not need
 We are using these drivers to design a cost-effective large scale resilient
solution

www.bl.uk

33

DOM storage subsystem architecture - overview

DOM
Storage
gateway

DOM
Storage
gateway

DOM
Storage
Service

DOM
Storage
Service

DOM
Physical
Storage
Storage cluster

DOM
Shared
Services
• Unique ID
• Signing
• Logging

DOM
Physical
Storage
Storage cluster

www.bl.uk

34

DOM storage subsystem architecture - access

Normal access/delivery is from the local storage cluster

[Diagram: storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical Storage) and DOM Shared central Services (Unique ID, Signing, Logging)]

DOM storage subsystem architecture - access

When a cluster is off-line, access/delivery is from a remote storage cluster

[Diagram: storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical Storage) and DOM Shared central Services (Unique ID, Signing, Logging)]
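
The access rule on the two slides above (serve from the local cluster, fall back to a remote one when the local cluster is off-line) can be sketched as follows. This is a minimal illustration only: the StorageCluster class, the fetch function and the cluster names are hypothetical and are not taken from the DOM design.

```python
from dataclasses import dataclass, field


@dataclass
class StorageCluster:
    """Hypothetical stand-in for one DOM storage cluster."""
    name: str
    online: bool = True
    objects: dict = field(default_factory=dict)  # DOMID -> bytes


def fetch(domid, local, remotes):
    """Return (cluster name, object), preferring the local cluster."""
    for cluster in [local, *remotes]:
        if cluster.online and domid in cluster.objects:
            return cluster.name, cluster.objects[domid]
    raise LookupError(f"{domid} is not available from any online cluster")


site_a = StorageCluster("site-a", objects={"dom-1": b"payload"})
site_b = StorageCluster("site-b", objects={"dom-1": b"payload"})

print(fetch("dom-1", site_a, [site_b]))  # local cluster answers
site_a.online = False
print(fetch("dom-1", site_a, [site_b]))  # remote cluster answers
```

Because both clusters hold a full copy, the caller sees the same object either way; only the serving cluster changes.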


DOM storage subsystem architecture - ingest

Normal ingest is to the local storage cluster, and then the remote cluster is synchronised

[Diagram: storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical Storage) and DOM Shared central Services (Unique ID, Signing, Logging); labels “Store” and “Synchronise remote store”]

DOM storage subsystem architecture - ingest

When a cluster is off-line, ingest is managed by the remote storage cluster and the local cluster is synchronised later

[Diagram: storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical Storage) and DOM Shared central Services (Unique ID, Signing, Logging); labels “Store” and “Synchronise remote store later”]
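
The two ingest cases above (store locally and synchronise the peer, or store remotely and synchronise the off-line cluster later) can be sketched with a simple backlog per cluster. This is a sketch under stated assumptions: the Cluster class, the backlog queue and the bring_online helper are hypothetical names chosen for illustration, and the real synchronisation mechanism is not described in the slides.

```python
from collections import deque


class Cluster:
    """Hypothetical peer cluster: an object store plus a backlog of missed writes."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.objects = {}
        self.backlog = deque()  # (domid, data) pairs to apply on return

    def store(self, domid, data):
        self.objects[domid] = data


def ingest(domid, data, local, remote):
    """Store on the first online cluster, then synchronise or queue for the peer."""
    primary, peer = (local, remote) if local.online else (remote, local)
    if not primary.online:
        raise RuntimeError("no storage cluster is online")
    primary.store(domid, data)
    if peer.online:
        peer.store(domid, data)             # synchronise now
    else:
        peer.backlog.append((domid, data))  # synchronise later
    return primary.name


def bring_online(cluster):
    """Mark a cluster as online and replay the writes it missed."""
    cluster.online = True
    while cluster.backlog:
        cluster.store(*cluster.backlog.popleft())


local, remote = Cluster("local"), Cluster("remote")
ingest("dom-1", b"a", local, remote)   # normal case: both clusters updated
local.online = False
ingest("dom-2", b"b", local, remote)   # local is off-line: remote takes the ingest
bring_online(local)                    # local catches up
print(sorted(local.objects))
```

Since either cluster can accept an ingest, neither is a passive standby, which is the point made on the disaster tolerance slide.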


In conclusion

• We plan for generations of physical storage
  • Migration from one generation to the next
  • Allow changes of supplier
  • Purchase incrementally in modest quantities
  • Move quickly when required
  • Be cost conscious
• We provide assurance that an object is held and re-presented as it was when it was ingested
• We are designing a cost-effective, large-scale, resilient solution

In summary: we take a long-term view

Major Programmes/4

Web Archiving Programme


Structure of Programme

The Web Archiving Programme is a collaborative initiative, broadly organised across two consortia
• UK Web Archiving Consortium
  • Developing a selective approach to web archiving, procuring a common web archiving infrastructure and software to begin archiving activities as early as possible
• International Internet Preservation Consortium
  • Developing advanced web archiving technologies for the long-term, large-scale, continuous crawling requirements enabled through legislation

UK Web Archiving Consortium
Developing a selective approach to web archiving
• Licence for PANDAS about to be signed with NLA
  • Sub-licences with consortium partners and contractor to follow
• ITT concluded, with Magus Research winning the contract
  • Implement a common web archiving infrastructure (lots of Linux machines + PANDAS)
  • Provide customisation/development of PANDAS
  • Provide help desk and support

International Internet Preservation Consortium
Developing advanced web archiving technologies
• Smart Crawler
  • Continuous adaptive crawler, adjusting crawl priority on the fly
  • Based on IA Heritrix
  • Working on requirements now
  • Expect to begin tender process in June
• Content Management
• Archival formats
• Framework
• Metrics and Test Bed
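
One way to picture "adjusting crawl priority on the fly" is a revisit queue in which pages that changed since the last visit are rescheduled sooner and unchanged pages later. The sketch below is an illustration only. It is not the Smart Crawler design or Heritrix's scheduling logic, and the AdaptiveScheduler class, the halve/double rule, the interval limits and the example.org URLs are all assumptions.

```python
import heapq


class AdaptiveScheduler:
    """Revisit pages that change often sooner than pages that do not."""

    def __init__(self, min_interval=1.0, max_interval=64.0):
        self.min_interval = min_interval
        self.max_interval = max_interval
        self.interval = {}  # url -> current revisit interval
        self.queue = []     # heap of (next_visit_time, url)

    def add(self, url, now=0.0, interval=8.0):
        self.interval[url] = interval
        heapq.heappush(self.queue, (now + interval, url))

    def next_url(self):
        """Pop the (due time, url) pair that is due soonest."""
        return heapq.heappop(self.queue)

    def report(self, url, visited_at, changed):
        """Halve the interval if the page changed, otherwise double it."""
        factor = 0.5 if changed else 2.0
        new = self.interval[url] * factor
        new = min(self.max_interval, max(self.min_interval, new))
        self.interval[url] = new
        heapq.heappush(self.queue, (visited_at + new, url))


sched = AdaptiveScheduler()
sched.add("http://example.org/news")
sched.add("http://example.org/about")
for _ in range(2):
    due, url = sched.next_url()
    # Assume the news page changes on every visit and the about page never does.
    sched.report(url, visited_at=due, changed=url.endswith("/news"))
print(sched.interval)
```

After one round the frequently changing page has a shorter revisit interval than the static one, so it moves to the front of the queue.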


External Collaboration


Digital Library Collaborations/Partnerships
Current

• UK Digital Preservation Coalition
  • Founder Member
• TEL (The European Library Project)
• Web Archiving UK
  • JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library Of Wales
• International Internet Preservation Consortium
  • BNF, Library Of Congress, Internet Archive, National Archives & Library Of Canada, National Library Of Australia, National Library Of Italy, National Libraries Of Nordic Countries
• JISC Funded - Digital Curation Centre
• Persistent Identifiers
  • DOI Foundation, European National Libraries (KB & DDB)
• Resource Discovery
  • Union Catalogues (SUNCAT)
• Digital Library Federation

Digital Library Collaborations/Partnerships
Potential

• Secure Legal Deposit Network
  • 6 Legal Deposit Libraries
• Global Digital Format Registry
  • Potential Partners (National Archives, DLF)
• Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
  • KB (Netherlands National Library) & Other Partners – FP6 Bid
• Digital Rights Management
  • Potential Partners (Publishers, JISC)
• Metadata
  • Publishers, Others?
• Authentication
  • JISC?
• Resource Discovery
  • Search Engine Vendors, Researchers
• Others?

Conclusions
Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success

Can You Work With Us?


Slide 35

British Computer Society
North London Branch

Major Programmes

Richard Boulderstone
July 27, 2004

Agenda

• The British Library
• Vision
• Our Audiences/Customers
• ILS
• Digitisation
• Digital Object Management
• Web Archiving
• Collaboration
• Conclusions

[Image: Magna Carta]

What Is The British Library?

Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of Science and Invention (1855), National Central Library (1916), and National Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India Office Library and Records in 1982, and British Institute of Recorded Sound in 1983
• Flagship building at St Pancras - largest public building project in Great Britain in 20th century - opened in 1998

World-Class Research Library
Key Statistics 2002/3

• 150 million items
• 8.2 million items consulted or supplied
• 408,000 reading room visits
• 618,000 catalogue records created
• 554,000 items received on legal deposit
• 651 km shelf capacity, 92% full, adding 12 km each year
• 18.5M Web Site Hits (www.bl.uk)
• 2,400 staff
• £85.2 million Grant in Aid and £27.0 million trading income in 2001/2
• Annual report - http://www.bl.uk/about/annual/latest.html

‘The World’s Knowledge’
Outcome Based Vision

To help people advance knowledge to enrich lives
… by aiding scientific advances
… by adding commercial value for businesses
… by contributing to UK “knowledge economy”
… through the pursuit of academic excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural wellbeing
… by giving information relevant to their interests
… by helping to find the next medical breakthrough
… by creating a link between the past, present and future

[Diagram: audience segments BUSINESS, EDUCATION, PUBLIC, RESEARCHER and LIBRARIES; the label “Lifelong Learner” appears four times.
Audience labels: High R+D Industries; Prof. Services; Creative Industries; Publishing Industries; SMEs; School; Students; Libraries; Teachers; 11>18; Visitors (child + adult); Postgraduate/Undergraduate; Librarians; Scholars; Commercial Researcher; Public Libraries; H.E. Libraries; Broadcasting e.g. BBC; Publishing e.g. OED; Public.
Service labels: Resource Discovery, Bespoke Services, Research Services, Document Supply, Reprographics, Innovation Centre; On-site Visits, School Tours, Web Learning; Exhibitions, Events, Tours, Publishing; Reading Rooms, Bespoke Services, Reprographics, Publishing, Document Supply, Searching Tools; Document Supply, Resource Discovery, Training, Best Practice.]

Major Programmes/1

Integrated Library System (ILS) Programme

[Image: Da Vinci Notebook]

ILS: Development

• Data migration
  • Due to finish in a few days
  • 16M+ BL records
  • 10M+ records from other sources
• Online ILS software
  • All online changes made (mainly interfaces) – final tests
  • Web OPAC configuration – tested by staff, HE, expert
• Batch imports / exports
  • Most done for go-live
  • Rest in priority order

ILS: Implementation

• Training
  • Courses for end-users well underway
  • ‘Practice’ system available
  • ‘Search only’ training also underway
• Testing
  • Functional testing (end to end) nearly complete
  • Performance poor – OPAC very slow
  • Automated stress testing (LoadRunner scripts)
  • eIS trying to find area of problem
  • Ex Libris experts flying over
  • Some security ‘hardening’ needed

ILS: Cutover from legacy systems

Now: Temporary Aleph cataloguing
7 June: Phase 1 – internal processing
  • Staggered take-on of users to ease cutover problems
  • Merge ‘temporary’ records
30 June: Phase 2 – reading rooms
  • Reading rooms closed for cutover 26-29 June
  • Mainly brand-new PCs etc rather than XP upgrade
30 July: Phase 3 – remote users
  • Could be delayed if there are major problems

Future ILS development (ILS/2)

Current ILS development seen just as the start
• Extra records
  • E.g. Sound archive, Manuscripts, Newspaper issues
• Extra functions
  • E.g. Preservation records
• Links to other new BL systems
  • E.g. Digital Object Management (images, web pages etc)
• New releases of Ex Libris packages

Major Programmes/2

Digitisation Programme

[Image: International Dunhuang Project]

Background

• Digitisation Is The Process Of Converting Existing Physical Items Into Digital Surrogates.
• Digitisation Projects Must Take Into Account Metadata Creation, Optical Character Recognition, Navigation, Display, Archiving, Preservation.
• Overall Cost Of Digitisation Projects Is Declining, From £10s/Image To Sub £1/Image.
• Automatic Devices To Digitise Are Becoming Available But Are Expensive (~£150,000).
• BL Has Had A Fairly Ad Hoc Approach Driven By
  • External Funding Opportunities
  • Curator Interest
• Projects Have Generally Created Their Own Approach, IT Resources, Project Management
• BL Has Created About 1.5M Digital Images So Far…

Digitisation Strategy

Digitisation Strategy Project Was Formally Initiated On February 2, 2004

Key objectives for the project are to define:
• Selection Criteria
• Uniform Approach
• Communications Plan
• Sustainability
• Intellectual Property Rights
• External Relationship Management
• Funding
• Integration with DOMS

Project Status Information

• Definitive Register of Projects
  • 19 Complete
  • 19 Current
  • 20 Planning
• JISC Sound (3,900 Hours)
• JISC Newspapers (2M Pages of 750M Pages)
• Chopin (Collaborative Project)
• Early English Books Online

Major Programmes/3

Digital Object Management (DOM) Programme

[Image: Gutenberg Bible]

DOM Programme vision
Our mission is to enable the United Kingdom to preserve and use its digital intellectual property forever

Our vision is to create a management system for digital objects that will
• store and preserve any type of digital material in perpetuity
• provide access to this material to users with appropriate permissions
• ensure that the material is easy to find
• ensure that users can view the material with contemporary applications
• ensure that users can, where possible, experience material with the original look-and-feel

Introduction - history

• Digital Library PFI
  • Mar 1997 – Dec 1998
• Digital Library System
  • 1999 – early 2002
  • Lessons
• DOM Report
  • Nov 2002
• The DOM Programme
  • Started September 2003

Drivers for the BL DOM Programme

• Legal deposit legislation for non-print material was granted royal assent in October 2003
• Existing voluntary deposit scheme operational since 2000
• Storage of digitised masters from early ’90s onwards
• New digitisation initiatives: newspapers, sound, etc
• Sound archive receives 12 TB of material per year (with 50 year collection)
• Web archiving
• Cartography and datasets
• Electronic journals, picture library
• … and …
• … and …

We need a generic and cost-effective approach for the secure long-term storage of digital material that is produced by numerous initiatives

DOM – many topics to address

[Chart: components plotted by estimated size of component (low to high) against component ambiguity / complexity (low to high).
Components shown: ILS; Web Archiving; Digitisation Programme; LDEP; Workflow; Resource Discovery; LDLSE; VDEP; Rights Management; Metadata Definition; Technical Requirements; File Format Registry; Persistent Identifiers; File Conversion Utilities; Authentication; Prototypes; SDM; RADM; Interfaces; Strategy Development.
Legend: Started, Planned, Non-DOM projects, Planned co-operation.]

LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications

Scope - life cycle of objects

• Collection
  • Selection
  • Acquisition
  • Accession
  • Description
• Preservation
  • Storage
  • Preservation
• Access
  • Resource discovery
  • Delivery
  • Rendering

Scope – objects and processes

• Preservation store
  • Preserves the bit stream in perpetuity
• Access store
  • Access versions
  • Limited formats – in the flavour of the era
• Metadata to support resource discovery
  • Descriptive, Administrative, Links with existing tools e.g. Integrated Library System (ILS)
• Workflow
  • Ingest, e.g. Legal Deposit processing

[Diagram: DOM overview.
ACCESS: Resource Discovery, Delivery, Digital Rights Management.
DOM: DOM Storage, Ingest, and Shared services (Signing, Authentication, Metadata, Persistent ID).
Content sources: DONATIONS, DOCUMENT SUPPLY, LEGAL DEPOSIT, WEB ARCHIVING, DIGITISATION.
Other labels: Publishers, Archives, Grey Literature; Non-Serial Store, Archiving, Operational Stores; LDL Secure Environment, Legal Deposit Items, Legal Deposit Processing; St Pancras Studios, Newspapers, NSA.]

Timeline

[Timeline chart spanning 2003, 2004 and 2005, with milestones BC, R0, R1, R2, R3, R4]
• BC: ET approve Business Case & Timeline
• Definition. R0
  • Prototype will provide a basic preservation-quality digital object storage module
• Operational Storage Sub System. R1
  • Consolidate R0 into operational system
  • Provide preservation-quality digital store for materials received under Voluntary Deposit of Electronic Publications (VDEP)
  • Integrate it with the existing VDEP front-end
• 1st Content Stream ingest. R2
  • Support ingest for a major content stream
  • Integrate with core Library systems as required
• LDEP - initial format. R3 & R4
  • Provide functionality for material covered by LDEP secondary legislation
• Open DOM to new projects. R5+

DOM: Project definition - 1

• Functional architecture (“what”)
  • Prototyping: assessing market solutions
  • Example issues: digital rights, file formats, etc
• Logical architecture (“how – overall architecture”)
  • Prototyping: basic functioning architecture
  • Example issues: allow changes to new suppliers, relationships to ILS, other projects etc
• Physical architecture (“how – storage & specifics”)
  • Prototyping: principal solutions and options
  • Example issues: how do we build it cost-effectively today, supplier selection criteria
• Business case
• Planning – incremental implementation phases

Cross-team workshops – reviewing progress, debating detailed technical issues, planning immediate priorities, risk management & way forward

DOM: Project definition - 2

• Approach is to be incremental and not ‘Big Bang’
• We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a good solution
• A principal goal is to define:
  • An overall long-term “logical architecture”
  • Within which there will be successive generations of physical architectures
• We are building our understanding of the storage marketplace, and we will use that knowledge to manage procurement
• We are certain that we will need >500 TB of storage but we are uncertain when – we thus need flexible, scalable procurement

DOM architecture - overview

[Diagram. Client systems: Resource Discovery (ILS, non-cat based RD, others), Rights Management, LDEP, Doc supply. Between the clients and the DOM Storage Service: DOMID, OBJECT. DOM Storage Service: compound objects/relations, atomic objects, integrity, authenticity, unique persistent identifier (DOMID). DOMID is mapped to node/vol/LRL. Between the DOM Storage Service and DOM Physical Storage: local resource locator, object.]
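
The statement that the DOMID is mapped to node/vol/LRL suggests a level of indirection between the persistent identifier and the physical location. A minimal sketch of that idea follows, assuming an in-memory table. The Location and Resolver names, the identifier format and the example locations are hypothetical, and the slides do not say how the mapping is actually held.

```python
from typing import NamedTuple


class Location(NamedTuple):
    node: str    # storage node (hypothetical naming)
    volume: str  # volume on that node
    lrl: str     # local resource locator within the volume


class Resolver:
    """Maps a persistent DOMID to wherever the object currently lives."""

    def __init__(self):
        self._table = {}

    def register(self, domid, location):
        self._table[domid] = location

    def resolve(self, domid):
        return self._table[domid]

    def migrate(self, domid, new_location):
        """Point an existing DOMID at new storage; the DOMID itself never changes."""
        if domid not in self._table:
            raise KeyError(domid)
        self._table[domid] = new_location


resolver = Resolver()
resolver.register("dom-0001", Location("node-a", "vol-3", "2004/07/obj-17"))
resolver.migrate("dom-0001", Location("node-b", "vol-1", "gen2/obj-17"))
print(resolver.resolve("dom-0001"))
```

Clients only ever hold the DOMID, so objects can move to a new generation of storage, or to another supplier's kit, without any client needing to change.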


DOM System (release 3)

[Diagram of the DOM System. Labels: Aleph, Storage subsystem (twice), Access, Mailroom, Publishers, Shared services, Administration]

DOM logical architecture – integrity and authenticity

• Integrity:
  • System has capability to continuously monitor the object store to detect object corruption
  • It would then initiate object recovery
• Authenticity:
  • A process is defined to provide long-term assurance that an object that is re-presented is as it was when it was ingested
  • Based on the use of cryptographic signing techniques
  • Each object is signed when it is ingested
  • The signature is verified when required
  • The signing mechanism is “tightly” controlled
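
The sign-at-ingest, verify-on-demand and continuous-monitoring steps above can be sketched with Python's standard library. This is an illustration under stated assumptions: the slides do not name the signing technique, so a keyed HMAC-SHA256 digest is used here as a stand-in for a cryptographic signature, and the key, function names and object IDs are hypothetical.

```python
import hashlib
import hmac

# Hypothetical key; in practice the signing mechanism would be tightly controlled.
SIGNING_KEY = b"demo-key-held-by-the-signing-service"


def sign(data: bytes) -> str:
    """Produce a keyed digest of the object at ingest time."""
    return hmac.new(SIGNING_KEY, data, hashlib.sha256).hexdigest()


def verify(data: bytes, signature: str) -> bool:
    """Check that the object re-presented matches what was signed at ingest."""
    return hmac.compare_digest(sign(data), signature)


def audit(store: dict, signatures: dict) -> list:
    """Integrity sweep: return the IDs of objects that no longer verify."""
    return [domid for domid, data in store.items()
            if not verify(data, signatures[domid])]


store = {"dom-1": b"first object", "dom-2": b"second object"}
signatures = {domid: sign(data) for domid, data in store.items()}

store["dom-2"] = b"second 0bject"  # simulate silent corruption
print(audit(store, signatures))    # IDs that would need recovery
```

An object flagged by the sweep would then be recovered from a peer cluster's copy, which is where the multi-cluster design and the integrity check meet.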


Procuring physical storage in volume

• A major cost is in physical storage
• The market for storage systems is changing rapidly, and this implies that “lock-in” is not sensible
• We thus need flexibility to change supplier over time
• Cost of storage is reducing by 30-40% per year
• Hence procure on a rolling basis, just ahead of demand
• Replace storage on a rolling basis on expiry of warranty
• The rolling programmes imply the need to be able to support a heterogeneous product solution
• The design of the logical architecture thus supports storage sourced from multiple storage vendors
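
The case for buying just ahead of demand can be made concrete with a small calculation. The figures below are illustrative assumptions only: 500 TB bought either all at once or in five equal yearly tranches, at a notional starting price of 1,000 per TB, with the 30-40% annual price decline quoted above. Running costs and warranty replacement are ignored.

```python
def rolling_cost(total_tb, years, price_per_tb, annual_decline):
    """Cost of buying equal tranches each year as the unit price falls."""
    tranche = total_tb / years
    return sum(tranche * price_per_tb * (1 - annual_decline) ** year
               for year in range(years))


def upfront_cost(total_tb, price_per_tb):
    """Cost of buying everything at today's price."""
    return total_tb * price_per_tb


# Illustrative inputs only: 500 TB over 5 years at a notional 1,000 per TB.
for decline in (0.30, 0.40):
    rolling = rolling_cost(500, 5, 1000, decline)
    saving = 1 - rolling / upfront_cost(500, 1000)
    print(f"{decline:.0%} decline: rolling purchase costs {saving:.0%} less")
```

Under these assumptions the rolling purchase costs a little over half as much as buying everything on day one, which is why the logical architecture has to tolerate a mix of storage products.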



DOM Programme vision
Our mission is to enable the United Kingdom to preserve
and use its digital intellectual property forever

Our vision is create a management system for digital objects
that will
 store and preserve any type of digital material in perpetuity
 provide access to this material to users with appropriate permissions
 ensure that the material is easy to find
 ensure that users can view the material with contemporary applications
 ensure that users can, where possible, experience material with the original look-and-feel

www.bl.uk

18

Introduction - history

Digital Library PFI
 Mar 1997 – Dec 1998
Digital Library System
 1999 – early 2002
 Lessons
DOM Report
 Nov 2002
The DOM Programme
 Started September 2003

www.bl.uk

19

Drivers for the BL DOM Programme












Legal deposit legislation for non-print material was granted royal assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12T of material per year (with 50 year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….

We need a generic and cost-effective approach for the secure long term
storage of digital material that is produced by numerous initiatives

www.bl.uk

20

DOM – many topics to address
HIGH

ILS
WEB
ARCHIVING

DIGITISATION
PROGRAMME

ESTIMATED SIZE OF COMPONENT

LDEP

WORKFLOW
RESOURCE
DISCOVERY
LDLSE

VDEP

RIGHTS
MANAGEMENT
METADATA
DEFINITION

TECHNICAL
REQUIREMENTS

FILE
FORMAT
REGISTRY

PERSISTENT
IDENTIFIERS

FILE
CONVERSION
UTILITIES

AUTHENTICATION
PROTOTYPES

SDM
RADM

INTERFACES

STRATEGY
DEVELOPMENT

LOW
LOW

COMPONENT AMBIGUITY / COMPLEXITY

Started

Planned

Non-DOM projects

Planned co-operation

HIGH
LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications

www.bl.uk

21

Scope - life cycle of objects


Collection
 Selection
 Acquisition
 Accession
 Description
 Preservation
 Storage
 Preservation
 Access
 Resource discovery
 Delivery
 Rendering
www.bl.uk

22

Scope – objects and processes


Preservation store
 Preserves the bit stream in perpetuity
 Access store
 Access versions
 Limited formats – in the flavour of the era
 Metadata to support resource discovery
 Descriptive, Administrative, Links with existing tools e.g. Integrated
Library System (ILS)
 Workflow
 Ingest, e.g. Legal Deposit processing

www.bl.uk

23

[Figure: DOM in context (untitled diagram). Labels recovered:
Access: Resource Discovery; Delivery; Digital Rights Management.
DOM shared services: Signing; Authentication; Metadata; Persistent ID.
DOM Storage.
Ingest sources: Donations; Document Supply (Publishers, Archives, Grey Literature; Non-Serial Store, Archiving, Operational Stores); Legal Deposit (LDL Secure Environment, Legal Deposit Items, Legal Deposit Processing); Web Archiving; Digitisation (St Pancras Studios, Newspapers, NSA).]

Timeline (2003 to 2005)

• Definition, R0: prototype will provide a basic preservation-quality digital object storage module
• BC: ET approve Business Case & Timeline
• Operational Storage Sub System, R1: consolidate R0 into operational system; provide preservation-quality digital store for materials received under Voluntary Deposit of Electronic Publications (VDEP); integrate it with the existing VDEP front-end
• 1st Content Stream ingest, R2: support ingest for a major content stream; integrate with core Library systems as required
• LDEP - initial format, R3 & R4: provide functionality for material covered by LDEP secondary legislation
• Open DOM to new projects, R5+

DOM: Project definition - 1

Functional Architecture “What”
Prototyping – assessing market solutions
Example issues: digital rights, file formats, etc.

Logical architecture “how – overall architecture”
Prototyping - basic functioning architecture
Example issues: allow changes to new suppliers, relationships to ILS, other projects, etc.

Physical architecture “how – storage & specifics”
Prototyping - principal solutions and options
Example issues: how do we build it cost-effectively today, supplier selection criteria

• Business case
• Planning – incremental implementation phases

Cross team workshops – reviewing progress, debating detailed technical issues, planning immediate priorities, risk management & way forward

DOM: Project definition - 2

• Approach is to be incremental and not ‘Big Bang’
• We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a good solution

• A principal goal is to define:
  • An overall long-term “logical architecture”
  • Within which, there will be successive generations of physical architectures

• We are building an understanding of the storage marketplace, and we will use that knowledge to manage procurement
• We are certain that we will need >500 TB of storage but we are uncertain when – we thus need flexible, scalable procurement

DOM architecture - overview

[Figure: layered architecture diagram. Client systems (Resource Discovery, labelled “Non-cat”, “ILS based RD” and “Others”; Rights Management; LDEP; Doc supply) exchange a DOMID and an object with the DOM Storage Service. The DOM Storage Service covers compound objects/relations, integrity, authenticity and the unique persistent identifier (DOMID). Each DOMID is mapped to a node/vol/LRL (local resource locator) for the atomic objects held in DOM Physical Storage.]
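The mapping from a persistent DOMID to a node/vol/LRL can be pictured with a small lookup table. The following is only a sketch under stated assumptions: the class names, the `domid:0001` identifier format and the node and volume names are all hypothetical and do not come from the DOM design itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    """Where one copy of an object lives (hypothetical layout)."""
    node: str    # storage node name
    volume: str  # volume on that node
    lrl: str     # local resource locator within the volume


class DomIdResolver:
    """Maps a persistent DOMID to its current physical locations."""

    def __init__(self) -> None:
        self._locations: dict[str, list[Location]] = {}

    def register(self, domid: str, location: Location) -> None:
        self._locations.setdefault(domid, []).append(location)

    def relocate(self, domid: str, old: Location, new: Location) -> None:
        # The DOMID never changes; only the mapping is updated, e.g. when
        # an object moves to a new generation of physical storage.
        copies = self._locations[domid]
        copies[copies.index(old)] = new

    def resolve(self, domid: str) -> list[Location]:
        return list(self._locations.get(domid, []))


resolver = DomIdResolver()
old = Location("node-a", "vol-01", "objects/0001")
resolver.register("domid:0001", old)
new = Location("node-b", "vol-07", "objects/0001")
resolver.relocate("domid:0001", old, new)
```

The point of the indirection is that client systems only ever hold the DOMID, so physical storage can be replaced without touching catalogue records.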

DOM System (release 3)

[Figure: block diagram of the DOM System. Labels recovered: Aleph; Storage subsystem (shown twice); Access; Mailroom; Publishers; Shared services; Administration.]

DOM logical architecture – integrity and authenticity

• Integrity:
  • System has capability to continuously monitor the object store to detect object corruption
  • It would then initiate object recovery
• Authenticity:
  • A process is defined to provide long-term assurance that an object that is re-presented is as it was when it was ingested
  • Based on the use of cryptographic signing techniques
  • Each object is signed when it is ingested
  • The signature is verified when required
  • The signing mechanism is “tightly” controlled
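The sign-at-ingest, verify-on-demand and monitor-then-recover steps can be sketched in a few lines. This is an illustration under assumptions, not the DOM implementation: it uses an HMAC from the Python standard library as a stand-in for whatever signing technique is actually chosen, and the key, the in-memory `store` and `replica` dictionaries and the function names are all hypothetical.

```python
import hashlib
import hmac

# Assumption: illustrative key only; a real signing service would keep
# its key material under tight control.
SIGNING_KEY = b"demo-key-held-by-the-signing-service"


def sign(data: bytes) -> str:
    """Produce a signature for the object's bit stream at ingest."""
    return hmac.new(SIGNING_KEY, data, hashlib.sha256).hexdigest()


def verify(data: bytes, signature: str) -> bool:
    """Check that the bit stream still matches its ingest-time signature."""
    return hmac.compare_digest(sign(data), signature)


def scan_and_recover(store: dict, replica: dict, signatures: dict) -> list:
    """Integrity pass: find corrupted objects and restore them from a replica."""
    recovered = []
    for domid, signature in signatures.items():
        if not verify(store[domid], signature):
            if verify(replica[domid], signature):
                store[domid] = replica[domid]
                recovered.append(domid)
    return recovered


# Ingest: sign once and keep the signature alongside the object.
store = {"domid:0001": b"original bit stream"}
replica = dict(store)
signatures = {domid: sign(data) for domid, data in store.items()}

store["domid:0001"] = b"original bit strean"  # simulate corruption
recovered = scan_and_recover(store, replica, signatures)
```

A keyed hash is enough to show the flow; a public-key signature would additionally let parties without the signing key verify an object.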

Procuring physical storage in volume

• A major cost is in physical storage
• The market for storage systems is changing rapidly, and this implies that “lock-in” is not sensible
• We thus need flexibility to change supplier over time
• Cost of storage is reducing by 30-40% per year
• Hence procure on a rolling basis, just ahead of demand
• Replace storage on a rolling basis on expiry of warranty
• The rolling programmes imply the need to be able to support a heterogeneous product solution
• The design of the logical architecture thus supports storage sourced from multiple storage vendors
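The case for buying just ahead of demand follows from simple arithmetic. The sketch below compares buying 500 TB up front with buying 100 TB in each of five years while the price falls by 35% a year (the midpoint of the 30-40% range quoted above). The starting price of 1,000 per TB and the even five-year demand profile are assumptions chosen purely for illustration.

```python
def upfront_cost(total_tb: float, price_per_tb: float) -> float:
    """Cost of buying all capacity at today's price."""
    return total_tb * price_per_tb


def rolling_cost(tb_per_year: float, years: int,
                 price_per_tb: float, annual_decline: float) -> float:
    """Cost of buying the same capacity in equal yearly tranches."""
    total = 0.0
    for year in range(years):
        # Price in a given year after compounding the annual decline.
        total += tb_per_year * price_per_tb * (1 - annual_decline) ** year
    return total


PRICE_PER_TB = 1000.0  # hypothetical starting price, arbitrary currency units

upfront = upfront_cost(500, PRICE_PER_TB)
rolling = rolling_cost(100, 5, PRICE_PER_TB, 0.35)
saving_fraction = 1 - rolling / upfront
```

Under these assumptions the rolling purchase costs roughly half as much as the up-front purchase. The sketch ignores warranty-driven replacement and any volume discounts, both of which would change the figures.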

Disaster tolerance and the organisation of storage clusters

• One can obtain commercial disaster recovery (DR) solutions for common equipment configurations
• However, one cannot obtain such solutions for systems of multi-100 TB scale
• So we must build DR into the design of the system
• A single-site solution, subject to a common-mode disaster, would suffer considerable loss of availability after a disaster, and so is not acceptable
• This implies that we need a multi-site solution
• Conventionally these are based on a master-standby arrangement where only 50% of the kit is delivering normal service
• Our design is based on the use of multiple autonomous independent peer clusters that cross-synchronise, so 100% of the kit delivers normal service

DOM architecture in the context of the storage solution market

• The dominant segment of the market focuses on delivering performance within a highly resilient single cluster
• However:
  • Many of our objects will be rarely accessed, so we do not want to pay for “maximised” performance we do not need
  • We have resilience by using multiple clusters, hence we have a reduced need for resilience within a cluster, so we do not want to pay for “maximised” resilience we do not need
• We are using these drivers to design a cost-effective large scale resilient solution

DOM storage subsystem architecture - overview

[Figure: two sites, each with a DOM Storage gateway, a DOM Storage Service and a storage cluster of DOM Physical Storage, plus DOM Shared Services (Unique ID, Signing, Logging).]

DOM storage subsystem architecture - access

[Figure: two sites, each with a DOM Storage gateway, a DOM Storage Service and a storage cluster of DOM Physical Storage, plus central DOM Shared Services (Unique ID, Signing, Logging). Annotation: normal access/delivery is from the local storage cluster.]

DOM storage subsystem architecture - access

[Figure: two sites, each with a DOM Storage gateway, a DOM Storage Service and a storage cluster of DOM Physical Storage, plus central DOM Shared Services (Unique ID, Signing, Logging). Annotation: when a cluster is off-line, access/delivery is from a remote storage cluster.]
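The access behaviour on the two access slides (read locally, fall back to a remote cluster when the local one is off-line) can be sketched as follows. The class and function names, the site names and the in-memory object dictionaries are hypothetical; the real gateway and storage service interfaces are not described in the slides.

```python
class ClusterOffline(Exception):
    """Raised when a read is attempted against an off-line cluster."""


class StorageCluster:
    def __init__(self, name: str) -> None:
        self.name = name
        self.online = True
        self.objects = {}  # DOMID -> bytes

    def read(self, domid: str) -> bytes:
        if not self.online:
            raise ClusterOffline(self.name)
        return self.objects[domid]


def deliver(domid: str, local: StorageCluster, remotes: list) -> tuple:
    """Read from the local cluster, falling back to remote peers."""
    for cluster in [local, *remotes]:
        try:
            return cluster.name, cluster.read(domid)
        except ClusterOffline:
            continue  # try the next peer
    raise ClusterOffline("no cluster available")


site_a, site_b = StorageCluster("site-a"), StorageCluster("site-b")
for site in (site_a, site_b):
    site.objects["domid:0001"] = b"object bytes"

normal = deliver("domid:0001", site_a, [site_b])    # served locally
site_a.online = False
failover = deliver("domid:0001", site_a, [site_b])  # served remotely
```

Because both peers hold every object, the fallback needs no special recovery step: the remote cluster simply answers the same request.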

DOM storage subsystem architecture - ingest

[Figure: two sites, each with a DOM Storage gateway, a DOM Storage Service and a storage cluster of DOM Physical Storage, plus central DOM Shared Services (Unique ID, Signing, Logging). Annotation: normal ingest is to the local storage cluster (“Store”) and then the remote cluster is synchronised (“Synchronise remote store”).]

DOM storage subsystem architecture - ingest

[Figure: two sites, each with a DOM Storage gateway, a DOM Storage Service and a storage cluster of DOM Physical Storage, plus central DOM Shared Services (Unique ID, Signing, Logging). Annotation: when a cluster is off-line, ingest is managed by the remote storage cluster (“Store”) and the local cluster is synchronised later (“Synchronise remote store later”).]
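The two ingest cases (store locally then synchronise the peer, or store at the peer and synchronise later) can be sketched with a pending queue per cluster. This is a simplified illustration with hypothetical names and in-memory storage; it assumes exactly two peers and ignores conflicts, signing and logging.

```python
class PeerCluster:
    def __init__(self, name: str) -> None:
        self.name = name
        self.online = True
        self.objects = {}   # DOMID -> bytes
        self.pending = []   # DOMIDs still to be copied to the peer
        self.peer = None

    def store(self, domid: str, data: bytes) -> None:
        self.objects[domid] = data
        self.pending.append(domid)
        self.synchronise()

    def synchronise(self) -> None:
        if self.peer is None or not self.peer.online:
            return  # peer unavailable: leave objects queued for later
        while self.pending:
            domid = self.pending.pop(0)
            self.peer.objects[domid] = self.objects[domid]


def ingest(domid: str, data: bytes, local: PeerCluster, remote: PeerCluster) -> str:
    """Store at the local cluster if it is on-line, otherwise at the remote peer."""
    target = local if local.online else remote
    target.store(domid, data)
    return target.name


site_a, site_b = PeerCluster("site-a"), PeerCluster("site-b")
site_a.peer, site_b.peer = site_b, site_a

ingest("domid:0001", b"first", site_a, site_b)   # normal ingest
site_a.online = False
handled_by = ingest("domid:0002", b"second", site_a, site_b)
queued = list(site_b.pending)                     # waiting for site-a
site_a.online = True
site_b.synchronise()                              # catch-up after recovery
```

Either peer can accept new material, which is what lets all of the kit deliver normal service rather than half of it standing by.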

In conclusion

• We plan for generations of physical storage
  • Migration from one generation to the next
  • Allow changes of supplier
  • Purchase incrementally in modest quantities
  • Move quickly when required
  • Be cost conscious
• We provide assurance that an object is held and re-presented as it was when it was ingested
• We are designing a cost-effective large scale resilient solution

In summary: we take a long term view

Major Programmes/4

Web Archiving Programme


Structure of Programme

Web Archiving Programme is a collaborative initiative, roughly implemented across two consortia
• UK Web Archiving Consortium
  • Developing a selective approach to web archiving, procuring a common web archiving infrastructure and software to begin archiving activities at the earliest opportunity
• International Internet Preservation Consortium
  • Developing advanced web archiving technologies for the long-term, large-scale, continuous crawling requirements enabled through legislation

UK Web Archiving Consortium
Developing a selective approach to web archiving
• Licence for PANDAS about to be signed with NLA
  • Sub-licences with consortium partners and contractor to follow
• ITT concluded with Magus Research winning the contract
  • Implement a common web archiving infrastructure (lots of Linux machines + PANDAS)
  • Provide customisation/development of PANDAS
  • Provide help desk and support

International Internet Preservation Consortium
Developing advanced web archiving technologies
• Smart Crawler
  • Continuous adaptive crawler, adjusting crawl priority on the fly
  • Based on IA Heritrix
  • Working on requirements now
  • Expect to begin tender process in June
• Content Management
• Archival formats
• Framework
• Metrics and Test Bed
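"Adjusting crawl priority on the fly" amounts to a queue whose entries can be re-prioritised while the crawl is running. The sketch below shows one common way to do that with Python's standard `heapq` module. It is not Heritrix's API or the consortium's design; the class name, method names and example URLs are all hypothetical.

```python
import heapq
import itertools


class CrawlFrontier:
    """Priority queue of URLs whose priorities can change during a crawl."""

    def __init__(self) -> None:
        self._heap = []
        self._entries = {}                 # url -> live heap entry
        self._counter = itertools.count()  # tie-breaker for equal priorities

    def schedule(self, url: str, priority: int) -> None:
        """Add a URL, or re-prioritise it (lower number = crawl sooner)."""
        if url in self._entries:
            self._entries[url][2] = None   # invalidate the stale entry
        entry = [priority, next(self._counter), url]
        self._entries[url] = entry
        heapq.heappush(self._heap, entry)

    def next_url(self):
        """Return the most urgent URL, skipping invalidated entries."""
        while self._heap:
            _, _, url = heapq.heappop(self._heap)
            if url is not None:
                del self._entries[url]
                return url
        return None


frontier = CrawlFrontier()
frontier.schedule("http://example.org/news", 5)
frontier.schedule("http://example.org/about", 3)
frontier.schedule("http://example.org/news", 1)  # page seen to change often
order = [frontier.next_url(), frontier.next_url(), frontier.next_url()]
```

Stale entries are left in the heap and skipped when popped, which keeps re-prioritisation cheap when priorities change frequently.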

External Collaboration


Digital Library Collaborations/Partnerships
Current

• UK Digital Preservation Coalition
  • Founder Member
• TEL (The European Library Project)
• Web Archiving UK
  • JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library Of Wales
• International Internet Preservation Consortium
  • BNF, Library Of Congress, Internet Archive, National Archives & Library Of Canada, National Library Of Australia, National Library Of Italy, National Libraries Of Nordic Countries
• JISC Funded - Digital Curation Centre
• Persistent Identifiers
  • DOI Foundation, European National Libraries (KB & DDB)
• Resource Discovery
  • Union Catalogues (SUNCAT)
• Digital Library Federation

Digital Library Collaborations/Partnerships
Potential

• Secure Legal Deposit Network
  • 6 Legal Deposit Libraries
• Global Digital Format Registry
  • Potential Partners (National Archives, DLF)
• Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
  • KB (Netherlands National Library & Other Partners – FP6 Bid)
• Digital Rights Management
  • Potential Partners (Publishers, JISC)
• Metadata
  • Publishers, Others?
• Authentication
  • JISC?
• Resource Discovery
  • Search Engine Vendors, Researchers
• Others?

Conclusions
Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success




Can You Work With Us?



Slide 37

British Computer Society
North London Branch

Major Programmes

Richard Boulderstone
July 27, 2004

Agenda

• The British Library
• Vision
• Our Audiences/Customers
• ILS
• Digitisation
• Digital Object Management
• Web Archiving
• Collaboration
• Conclusions

Magna Carta

What Is The British Library ?

Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of
Science and Invention (1855), National Central Library (1916), and National
Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India
Office Library and Records in 1982, and British Institute of Recorded Sound in
1983
• Flagship building at St Pancras - largest public building project in Great
Britain in 20th century - opened in 1998



World-Class Research Library
Key Statistics 2002/3

• 150 million items
• 8.2 million items consulted or supplied
• 408,000 reading room visits
• 618,000 catalogue records created
• 554,000 items received on legal deposit
• 651 km shelf capacity, 92% full; adding 12 km each year
• 18.5M Web Site Hits (www.bl.uk)
• 2,400 staff
• £85.2 million Grant in Aid and £27.0 million trading income in 2001/2
• Annual report - http://www.bl.uk/about/annual/latest.html

‘The World’s Knowledge’
Outcome Based Vision

To help people advance knowledge to enrich lives

… by aiding scientific advances
… by adding commercial value for businesses
… by contributing to UK “knowledge economy”
… through the pursuit of academic excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural wellbeing
… by giving information relevant to their interests
… by helping to find the next medical breakthrough
… by creating a link between the past, present and future

[Figure: audiences and services diagram (untitled slide; agenda item “Our Audiences/Customers”). Labels recovered:
Segments: BUSINESS; EDUCATION; PUBLIC; RESEARCHER; LIBRARIES.
Audiences: High R+D Industries; Prof. Services; Creative Industries; Publishing Industries; SMEs; School Libraries; Teachers; Students 11>18; Visitors (child + adult); Lifelong Learner (shown several times); Postgraduate/Undergraduate; Librarians; Scholars; Commercial Researcher; Public Libraries; H.E. Libraries; Broadcasting, e.g. BBC; Publishing, e.g. OED; Public.
Services: Resource Discovery; Bespoke Services; Research Services; Document Supply; Reprographics; Innovation Centre; On-site Visits; School Tours; Web Learning; Exhibitions; Events; Tours; Publishing; Reading Rooms; Searching Tools; Training; Best Practice.]

Major Programmes/1
Da Vinci Notebook

Integrated Library System (ILS)
Programme


ILS: Development

• Data migration
  • Due to finish in a few days
  • 16M+ BL records
  • 10M+ records from other sources
• Online ILS software
  • All online changes made (mainly interfaces) – final tests
  • Web OPAC configuration – tested by staff, HE, expert
• Batch imports / exports
  • Most done for go-live
  • Rest in priority order

ILS: Implementation

• Training
  • Courses to end-users well underway
  • ‘Practice’ system available
  • ‘Search only’ training also underway
• Testing
  • Functional testing (end to end) nearly complete
  • Performance poor – OPAC very slow
  • Automated stress testing (LoadRunner scripts)
  • eIS trying to find area of problem
  • Ex Libris experts flying over
  • Some security ‘hardening’ needed

ILS: Cutover from legacy systems

Now: Temporary Aleph cataloguing

7 June: Phase 1 – internal processing
• Staggered take-on of users to ease cutover problems
• Merge ‘temporary’ records

30 June: Phase 2 – reading rooms
• Reading rooms closed for cutover 26-29 June
• Mainly brand-new PCs etc. rather than XP upgrade

30 July: Phase 3 – remote users
• Could be delayed if there are major problems


Future ILS development (ILS/2)

Current ILS development seen just as the start

• Extra records
  • E.g. Sound archive, Manuscripts, Newspaper issues
• Extra functions
  • E.g. Preservation records
• Links to other new BL systems
  • E.g. Digital Object Management (images, web pages etc.)

New releases of Ex Libris packages

Major Programmes/2

International Dunhuang Project

Digitisation Programme


Background

• Digitisation Is The Process Of Converting Existing Physical Items Into Digital Surrogates.
• Digitisation Projects Must Take Into Account Metadata Creation, Optical Character Recognition, Navigation, Display, Archiving, Preservation.
• Overall Cost Of Digitisation Projects Is Declining, From £10s/Image To Sub £1/Image.
• Automatic Devices To Digitise Are Becoming Available But Are Expensive (~£150,000).
• BL Has Had A Fairly Ad Hoc Approach Driven By
  • External Funding Opportunities
  • Curator Interest
• Projects Have Generally Created Their Own Approach, IT Resources, Project Management
• BL Has Created About 1.5M Digital Images So Far…

Digitisation Strategy

• Digitisation Strategy Project Was Formally Initiated On February 2, 2004
• Key objectives for the project are to define:
  • Selection Criteria
  • Uniform Approach
  • Communications Plan
  • Sustainability
  • Intellectual Property Rights
  • External Relationship Management
  • Funding
  • Integration with DOMS

Project Status Information

• Definitive Register of Projects
  • 19 Complete
  • 19 Current
  • 20 Planning
• JISC Sound (3,900 Hours)
• JISC Newspapers (2M Pages of 750M Pages)
• Chopin (Collaborative Project)
• Early English Books Online

Major Programmes/3

Gutenberg Bible

Digital Object Management (DOM)
Programme


DOM Programme vision
Our mission is to enable the United Kingdom to preserve and use its digital intellectual property forever

Our vision is to create a management system for digital objects that will
• store and preserve any type of digital material in perpetuity
• provide access to this material to users with appropriate permissions
• ensure that the material is easy to find
• ensure that users can view the material with contemporary applications
• ensure that users can, where possible, experience material with the original look-and-feel

Introduction - history

• Digital Library PFI
  • Mar 1997 – Dec 1998
• Digital Library System
  • 1999 – early 2002
  • Lessons
• DOM Report
  • Nov 2002
• The DOM Programme
  • Started September 2003



Slide 38

British Computer Society
North London Branch

Major Programmes

Richard Boulderstone
July 27, 2004
1

Agenda











The British Library
Vision
Our Audiences/Customers
ILS
Digitisation
Digital Object Management
Web Archiving
Collaboration
Conclusions

Magna Carta

www.bl.uk

2

What Is The British Library ?

Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of
Science and Invention (1855), National Central Library (1916), and National
Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India
Office Library and Records in 1982, and British Institute of Recorded Sound in
1983
• Flagship building at St Pancras - largest public building project in Great
Britain in 20th century - opened in 1998


www.bl.uk

3

World-Class Research Library
Key Statistics 2002/3













150 million items
8.2 million items consulted or supplied
408,000 reading room visits
618,000 catalogue records created
554,000 items received on legal deposit
651 km shelf capacity 92% full add 12 km each year
18.5M Web Site Hits (www.bl.uk)
2,400 staff
£85.2 million Grant in Aid and £27.0 million trading income in 2001/2
Annual report - http://www.bl.uk/about/annual/latest.html

www.bl.uk

4

‘The World’s Knowledge’
Outcome Based Vision
… by aiding scientific advances
… by adding commercial value for
businesses
… by contributing to UK “knowledge
economy”
… through the pursuit of academic
excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural
wellbeing
… by giving information relevant to their
interests
… by helping to find the next medical
breakthrough
… by creating a link between the past,
present and future

To help people
advance
knowledge to
enrich lives

www.bl.uk

5

[Diagram: audience segments, the groups within each segment and the services offered to them; the grouping below follows the order of the labels on the slide]

BUSINESS
Audiences: High R+D Industries, Prof. Services, Creative Industries, Publishing Industries, SMEs, Lifelong Learner
Services: Resource Discovery, Bespoke Services, Research Services, Document Supply, Reprographics, Innovation Centre

EDUCATION
Audiences: School Libraries, Students 11>18, Teachers, Lifelong Learner
Services: On-site Visits, School Tours, Web Learning

PUBLIC
Audiences: Visitors (child + adult), Lifelong Learner
Services: Exhibitions, Events, Tours, Publishing

RESEARCHER
Audiences: Postgraduate/Undergraduate, Commercial Researcher, Librarians, Scholars, Lifelong Learner
Services: Reading Rooms, Bespoke Services, Reprographics, Publishing, Document Supply, Searching Tools

LIBRARIES
Audiences: Public Libraries, H.E. Libraries, Broadcasting e.g. BBC, Publishing e.g. OED
Services: Document Supply, Resource Discovery, Training, Best Practice


6

Major Programmes/1

Integrated Library System (ILS) Programme

[Image: Da Vinci Notebook]

7

ILS: Development

Data migration
 Due to finish in a few days
 16M+ BL records
 10M+ records from other sources
Online ILS software
 All online changes made (mainly interfaces) – final tests
 Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
 Most done for go-live
 Rest in priority order

8

ILS: Implementation

Training
 Courses to end-users well underway
 ‘Practice’ system available
 ‘Search only’ training also underway
Testing
 Functional testing (end to end) nearly complete
 Performance poor – OPAC very slow
 Automated stress testing (LoadRunner scripts)
 eIS trying to find area of problem
 Ex Libris experts flying over
 Some security ‘hardening’ needed

9

ILS: Cutover from legacy systems

Now: Temporary Aleph cataloguing

7 June: Phase 1 – internal processing
 Staggered take-on of users to ease cutover problems
 Merge ‘temporary’ records

30 June: Phase 2 – reading rooms
 Reading rooms closed for cutover 26-29 June
 Mainly brand-new PCs etc rather than XP upgrade

30 July: Phase 3 – remote users
 Could be delayed if there are major problems


10


Future ILS development (ILS/2)

Current ILS development seen just as the start

Extra records
 E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
 E.g. Preservation records
Links to other new BL systems
 E.g. Digital Object Management (images, web pages etc)

New releases of Ex Libris packages


12

Major Programmes/2

Digitisation Programme

[Image: International Dunhuang Project]

13

Background










Digitisation Is The Process Of Converting Existing Physical Items Into Digital Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image To Sub £1/Image.
Automatic Devices To Digitise Are Becoming Available But Are Expensive (~£150,000).
BL Has Had A Fairly Ad Hoc Approach Driven By
 External Funding Opportunities
 Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project Management
BL Has Created About 1.5M Digital Images So Far…


14

Digitisation Strategy



Digitisation Strategy Project Was Formally Initiated On February 2, 2004


Key objectives for the project are to define:










Selection Criteria
Uniform Approach
Communications Plan
Sustainability
Intellectual Property Rights
External Relationship Management
Funding
Integration with DOMS


15

Project Status Information


Definitive Register of Projects
 19 Complete
 19 Current
 20 Planning
JISC Sound (3,900 Hours)
JISC Newspapers (2M Pages of 750M Pages)
Chopin (Collaborative Project)
Early English Books Online

16

Major Programmes/3

Digital Object Management (DOM) Programme

[Image: Gutenberg Bible]

17

DOM Programme vision
Our mission is to enable the United Kingdom to preserve
and use its digital intellectual property forever

Our vision is to create a management system for digital objects
that will
 store and preserve any type of digital material in perpetuity
 provide access to this material to users with appropriate permissions
 ensure that the material is easy to find
 ensure that users can view the material with contemporary applications
 ensure that users can, where possible, experience material with the original look-and-feel


18

Introduction - history

Digital Library PFI
 Mar 1997 – Dec 1998
Digital Library System
 1999 – early 2002
 Lessons
DOM Report
 Nov 2002
The DOM Programme
 Started September 2003


19

Drivers for the BL DOM Programme












Legal deposit legislation for non-print material received Royal Assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12 TB of material per year (with 50 year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….

We need a generic and cost-effective approach for the secure long term
storage of digital material that is produced by numerous initiatives


20

DOM – many topics to address
[Chart: DOM components plotted by estimated size of component (low to high) against component ambiguity / complexity (low to high). Legend: Started; Planned; Non-DOM projects; Planned co-operation.]

Components shown: ILS, Web Archiving, Digitisation Programme, LDEP, Workflow, Resource Discovery, LDLSE, VDEP, Rights Management, Metadata Definition, Technical Requirements, File Format Registry, Persistent Identifiers, File Conversion Utilities, Authentication, Prototypes, SDM, RADM, Interfaces, Strategy Development

LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications


21

Scope - life cycle of objects


 Collection
   Selection
   Acquisition
   Accession
   Description
 Preservation
   Storage
   Preservation
 Access
   Resource discovery
   Delivery
   Rendering

22

Scope – objects and processes


 Preservation store
   Preserves the bit stream in perpetuity
 Access store
   Access versions
   Limited formats – in the flavour of the era
 Metadata to support resource discovery
   Descriptive, Administrative, Links with existing tools e.g. Integrated Library System (ILS)
 Workflow
   Ingest, e.g. Legal Deposit processing


23

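[Diagram: DOM in context. Content sources feed Ingest; DOM Storage and the shared services sit in the middle; Access sits on top. The grouping below follows the order of the labels on the slide.]

ACCESS: Resource Discovery, Delivery, Digital Rights Management
DOM shared services: Signing, Authentication, Metadata, Persistent ID
DOM Storage
Ingest
Content sources:
 DONATIONS
 DOCUMENT SUPPLY: Publishers, Archives, Grey Literature
 LEGAL DEPOSIT: LDL Secure Environment, Legal Deposit Items, Legal Deposit Processing
 WEB ARCHIVING
 DIGITISATION: St Pancras Studios, Newspapers, NSA
Other labels: Non-Serial Store, Archiving, Operational Stores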

24

Timeline

[Timeline chart, 2003 to 2005]

BC: ET approve Business Case & Timeline
R0 – Definition
 Prototype will provide a basic preservation-quality digital object storage module
R1 – Operational Storage Sub System
 Consolidate R0 into operational system
 Provide preservation-quality digital store for materials received under Voluntary Deposit of Electronic Publications (VDEP)
 Integrate it with the existing VDEP front-end
R2 – 1st Content Stream ingest
 Support ingest for a major content stream
 Integrate with core Library systems as required
R3 & R4 – LDEP - initial format
 Provide functionality for material covered by LDEP secondary legislation
R5+ – Open DOM to new projects

25

DOM: Project definition - 1
Functional architecture “What”
 Prototyping – assessing market solutions
 Example issues: digital rights, file formats, etc
Logical architecture “how – overall architecture”
 Prototyping – basic functioning architecture
 Example issues: allow changes to new suppliers, relationships to ILS, other projects etc
Physical architecture “how – storage & specifics”
 Prototyping – principal solutions and options
 Example issues: how do we build it cost-effectively today, supplier selection criteria

• Business case
• Planning – incremental implementation phases

Cross team workshops – reviewing progress, debating detailed technical issues, planning immediate priorities, risk management & way forward

26

DOM: Project definition - 2



Approach is to be incremental and not ‘Big Bang’
We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a good solution

A principal goal is to define:
 An overall long term “logical architecture”
 Within which, there will be successive generations of physical architectures

We are building an understanding of the storage marketplace, and we will use that knowledge to manage procurement
We are certain that we will need >500 TB of storage but we are uncertain when – we thus need flexible, scalable procurement




27

DOM architecture - overview
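[Diagram: DOM architecture overview; the grouping below follows the order of the labels on the slide]

Client systems: Resource Discovery (ILS, non-cat based RD), Rights Management, LDEP, Doc supply, Others
 Clients refer to an OBJECT by its DOMID
DOM Storage Service
 Unique persistent identifier (DOMID)
 Compound objects/relations
 Atomic objects
 Integrity
 Authenticity
 DOMID is mapped to node/vol/LRL (local resource locator)
DOM Physical Storage
 Object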
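
The mapping noted on this slide (a persistent DOMID resolved to node/vol/LRL) can be sketched as below. This is a minimal illustration only: the class names, the `dom:0001` identifier format, the node and volume names and the in-memory dictionary are all assumptions made for the example, not the BL design.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Location:
    """One physical copy of an object: node / volume / local resource locator."""
    node: str
    volume: str
    lrl: str  # local resource locator, e.g. a path within the volume


class DomIdResolver:
    """Hypothetical in-memory map from a persistent DOMID to physical locations."""

    def __init__(self) -> None:
        self._locations: Dict[str, List[Location]] = {}

    def register(self, domid: str, location: Location) -> None:
        # The DOMID never changes; locations are added as copies are stored.
        self._locations.setdefault(domid, []).append(location)

    def relocate(self, domid: str, old: Location, new: Location) -> None:
        # Moving to a new storage generation changes the location, not the DOMID.
        copies = self._locations[domid]
        copies[copies.index(old)] = new

    def resolve(self, domid: str) -> List[Location]:
        return list(self._locations[domid])


resolver = DomIdResolver()
first = Location(node="cluster-a-01", volume="vol7", lrl="objects/00/2f/0001")
resolver.register("dom:0001", first)
resolver.relocate("dom:0001", first, Location("cluster-a-09", "vol2", "objects/00/2f/0001"))
print(resolver.resolve("dom:0001")[0].node)
```

Client systems hold only the DOMID, so a change of storage hardware or supplier touches the resolver and nothing else.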


28

DOM System (release 3)

[Diagram: DOM System (release 3). Labelled components: Aleph, Access, Mailroom, Publishers, Shared services, Administration and two Storage subsystems.]

29

DOM logical architecture –
integrity and authenticity


 Integrity:
   System has capability to continuously monitor the object store to detect object corruption
   It would then initiate object recovery
 Authenticity:
   A process is defined to provide long-term assurance that an object that is re-presented is as it was when it was ingested
   Based on the use of cryptographic signing techniques
   Each object is signed when it is ingested
   The signature is verified when required
   The signing mechanism is “tightly” controlled
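
The two checks above can be sketched in a few lines. The slide does not say which signing scheme is used, so this example substitutes HMAC-SHA256 from the Python standard library for whatever cryptographic signature the real system applies; the key, the object IDs and the function names are hypothetical.

```python
import hashlib
import hmac

# Assumption: a demo key. A real signing key would be held under tight control.
SIGNING_KEY = b"demo-key-held-by-signing-service"


def sign(content: bytes) -> str:
    """Produce a signature for an object at ingest (HMAC-SHA256 as a stand-in)."""
    return hmac.new(SIGNING_KEY, content, hashlib.sha256).hexdigest()


def verify(content: bytes, signature: str) -> bool:
    """Authenticity: is the re-presented object identical to what was ingested?"""
    return hmac.compare_digest(sign(content), signature)


def find_corrupt(store: dict, signatures: dict) -> list:
    """Integrity: scan the object store and list IDs whose content no longer verifies."""
    return [oid for oid, content in store.items() if not verify(content, signatures[oid])]


store = {"dom:0001": b"first object", "dom:0002": b"second object"}
signatures = {oid: sign(content) for oid, content in store.items()}  # signed at ingest

store["dom:0002"] = b"second 0bject"  # simulate corruption of one stored copy
print(find_corrupt(store, signatures))  # ['dom:0002']
```

An object flagged by the scan would then be recovered from another copy, which is the recovery step the slide mentions.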


30

Procuring physical storage in volume









A major cost is in physical storage
The market for storage systems is changing rapidly, and this implies that “lock-in” is not sensible
We thus need flexibility to change supplier over time
Cost of storage is reducing by 30-40% per year
Hence procure on a rolling basis just ahead of demand
Replace storage on a rolling basis on expiry of warranty
The rolling programmes imply the need to be able to support a heterogeneous product solution
The design of the logical architecture thus supports storage sourced from multiple storage vendors
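
A rough worked example of why falling prices favour rolling procurement. Only the 30-40% annual decline comes from the slide; the 100 TB per year demand, the five-year horizon and the starting price are assumed figures chosen for illustration, and the sketch ignores warranty replacement and financing costs.

```python
def rolling_cost(tb_per_year: float, years: int, price_per_tb: float, annual_decline: float) -> float:
    """Total spend when each year's capacity is bought at that year's price."""
    total = 0.0
    for year in range(years):
        total += tb_per_year * price_per_tb * (1 - annual_decline) ** year
    return total


def upfront_cost(tb_per_year: float, years: int, price_per_tb: float) -> float:
    """Total spend when all capacity is bought in year 0 at today's price."""
    return tb_per_year * years * price_per_tb


# Hypothetical inputs: 100 TB/year for 5 years at 1,000 (any currency) per TB today.
for decline in (0.30, 0.40):
    rolling = rolling_cost(100, 5, 1000.0, decline)
    upfront = upfront_cost(100, 5, 1000.0)
    print(f"{decline:.0%} decline: rolling is {rolling / upfront:.0%} of upfront cost")
```

Under these assumed figures, buying just ahead of demand costs roughly half as much as buying everything at the start.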


31

Disaster tolerance and the organisation
of storage clusters









One can obtain commercial disaster recovery (DR) solutions for common equipment configurations
However, one cannot obtain such solutions for multi-100 TB systems
So we must build DR into the design of the system
A single site solution, subject to a common-mode disaster, would suffer considerable loss of availability after a disaster, and so is not acceptable
This implies that we need a multi-site solution
Conventionally these are based on a master-standby arrangement where only 50% of kit is delivering normal service
Our design is based on the use of multiple autonomous independent peer clusters that cross-synchronise, so 100% of the kit delivers normal service
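
The peer-cluster behaviour described here (serve from the local cluster, fall back to the remote peer, and synchronise whichever cluster missed an ingest) can be sketched as a toy model. Everything in it is hypothetical: the class names, the two-site setup and the in-memory dictionaries stand in for real gateways and storage.

```python
class Cluster:
    """Hypothetical autonomous storage cluster holding a full copy of every object."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.online = True
        self.objects = {}   # DOMID -> content
        self.pending = []   # DOMIDs the peer still needs (it was off-line at ingest)


class Gateway:
    """Routes requests to the local cluster, falling back to the remote peer."""

    def __init__(self, local: Cluster, remote: Cluster) -> None:
        self.local, self.remote = local, remote

    def _first_online(self):
        for cluster, peer in ((self.local, self.remote), (self.remote, self.local)):
            if cluster.online:
                return cluster, peer
        raise RuntimeError("no storage cluster available")

    def ingest(self, domid: str, content: bytes) -> str:
        cluster, peer = self._first_online()
        cluster.objects[domid] = content
        if peer.online:
            peer.objects[domid] = content   # synchronise the peer straight away
        else:
            cluster.pending.append(domid)   # synchronise later, when the peer returns
        return cluster.name

    def read(self, domid: str) -> bytes:
        cluster, _ = self._first_online()
        return cluster.objects[domid]


def resynchronise(source: Cluster, target: Cluster) -> None:
    """Copy everything the target missed while it was off-line."""
    for domid in source.pending:
        target.objects[domid] = source.objects[domid]
    source.pending.clear()


a, b = Cluster("site-a"), Cluster("site-b")
gateway_a = Gateway(local=a, remote=b)

gateway_a.ingest("dom:0001", b"normal ingest")          # stored at site-a, copied to site-b
a.online = False
stored_at = gateway_a.ingest("dom:0002", b"failover")   # site-a is down, site-b takes it
a.online = True
resynchronise(source=b, target=a)
print(stored_at, sorted(a.objects))
```

Both clusters serve requests in normal running, which is the contrast with a master-standby pair where the standby sits idle.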


32

DOM architecture in the context of the
storage solution market


 The dominant segment of the market focuses on delivering performance within a highly resilient single cluster
 However:
   Many of our objects will be rarely accessed, so we do not want to pay for “maximised” performance we do not need
   We have resilience by using multiple clusters, hence we have a reduced need for resilience within a cluster, so we do not want to pay for “maximised” resilience we do not need
 We are using these drivers to design a cost-effective large scale resilient solution


33

DOM storage subsystem architecture - overview

[Diagram: two storage clusters side by side. Each cluster has a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage. DOM Shared Services (Unique ID, Signing, Logging) sit between the clusters.]


34

DOM storage subsystem architecture - access

[Diagram: two storage clusters (each with a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage) and DOM shared central services (Unique ID, Signing, Logging)]

Normal access/delivery is from local storage cluster


35

DOM storage subsystem architecture - access

[Diagram: two storage clusters (each with a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage) and DOM shared central services (Unique ID, Signing, Logging)]

When a cluster is off-line then access/delivery is from a remote storage cluster


36

DOM storage subsystem architecture - ingest

[Diagram: two storage clusters (each with a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage) and DOM shared central services (Unique ID, Signing, Logging). Arrows: Store; Synchronise remote store.]

Normal ingest is to the local storage cluster and then the remote cluster is synchronised


37

DOM storage subsystem architecture - ingest

[Diagram: two storage clusters (each with a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage) and DOM shared central services (Unique ID, Signing, Logging). Arrows: Store; Synchronise remote store later.]

When a cluster is off-line then ingest is managed by the remote storage cluster and the local cluster is synchronised later


38

In conclusion



We plan for generations of physical storage
 Migration from one generation to the next
 Allow changes of supplier
 Purchase incrementally in modest quantities
 Move quickly when required
 Be cost conscious

We provide assurance that an object is held and re-presented as it was when it was ingested

We are designing a cost-effective large scale resilient solution

In summary: we take a long term view


39

Major Programmes/4

Web Archiving Programme

40

Structure of Programme

Web Archiving Programme is a collaborative initiative, implemented broadly across two consortia
 UK Web Archiving Consortium
   Developing a selective approach to web archiving, procuring a common web archiving infrastructure and software to begin archiving activities as early as possible
 International Internet Preservation Consortium
   Developing advanced web archiving technologies for the long term, large scale, continuous crawling requirements enabled through legislation


41

UK Web Archiving Consortium
Developing a selective approach to web archiving
 Licence for PANDAS about to be signed with NLA
 Sub-licences with consortium partners and contractor to follow
 ITT concluded with Magus Research winning the contract
   Implement a common web archiving infrastructure (lots of Linux machines + PANDAS)
   Provide customisation/development of PANDAS
   Provide help desk and support


42

International Internet Preservation
Consortium
Developing advanced web archiving technologies
 Smart Crawler
 Continuous adaptive crawler, adjusting crawl priority on the fly
 Based on IA Heritrix
 Working on requirements now
 Expect to begin tender process in June
 Content Management
 Archival formats
 Framework
 Metrics and Test Bed
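
One way to picture "adjusting crawl priority on the fly" is a frontier that revisits a page sooner when it was found changed and later when it was not. The sketch below is an assumption made for illustration: it is not the Smart Crawler specification and not how Heritrix schedules URLs, and the halve/double rule, the interval limits and the example URLs are invented.

```python
import heapq


class CrawlFrontier:
    """Hypothetical frontier: pages that change often are revisited sooner."""

    def __init__(self) -> None:
        self._heap = []        # entries are (next_visit_time_in_hours, url)
        self._interval = {}    # url -> current revisit interval in hours

    def add(self, url: str, interval_hours: float = 24.0) -> None:
        self._interval[url] = interval_hours
        heapq.heappush(self._heap, (0.0, url))

    def next_url(self):
        """Pop the URL that is due soonest, returning (due_time, url)."""
        return heapq.heappop(self._heap)

    def report(self, url: str, now: float, changed: bool) -> None:
        """Adjust priority on the fly: halve the interval on change, else double it."""
        factor = 0.5 if changed else 2.0
        # Keep the interval between one hour and thirty days.
        self._interval[url] = min(max(self._interval[url] * factor, 1.0), 24.0 * 30)
        heapq.heappush(self._heap, (now + self._interval[url], url))


frontier = CrawlFrontier()
frontier.add("http://example.org/news")
frontier.add("http://example.org/about")

# Pretend the first crawl found that only the news page had changed.
observed = {"http://example.org/news": True, "http://example.org/about": False}
for _ in range(2):
    _, url = frontier.next_url()
    frontier.report(url, now=0.0, changed=observed[url])

due = frontier.next_url()
print(due)  # (12.0, 'http://example.org/news')
```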


43

External Collaboration

44

Digital Library Collaborations/Partnerships
Current










UK Digital Preservation Coalition
 Founder Member
TEL (The European Library Project)
Web Archiving UK
 JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library Of Wales
International Internet Preservation Consortium
 BNF, Library Of Congress, Internet Archive, National Archives & Library Of Canada, National Library Of
Australia, National Library Of Italy, National Libraries Of Nordic Countries
JISC Funded - Digital Curation Centre
Persistent Identifiers
 DOI Foundation, European National Libraries (KB & DDB)
Resource Discovery
 Union Catalogues (SUNCAT)
Digital Library Federation


45

Digital Library Collaborations/Partnerships
Potential









Secure Legal Deposit Network
 6 Legal Deposit Libraries
Global Digital Format Registry
 Potential Partners (National Archives, DLF)
Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
 KB (Netherlands National Library & Other Partners – FP6 Bid)
Digital Rights Management
 Potential Partners (Publishers, JISC)
Metadata
 Publishers, Others ?
Authentication
 JISC ?
Resource Discovery
 Search Engine Vendors, Researchers
Others ???


46

Conclusions
Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success




Can You Work With Us?


47


Slide 39

British Computer Society
North London Branch

Major Programmes

Richard Boulderstone
July 27, 2004
1

Agenda











The British Library
Vision
Our Audiences/Customers
ILS
Digitisation
Digital Object Management
Web Archiving
Collaboration
Conclusions

Magna Carta

www.bl.uk

2

What Is The British Library ?

Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of
Science and Invention (1855), National Central Library (1916), and National
Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India
Office Library and Records in 1982, and British Institute of Recorded Sound in
1983
• Flagship building at St Pancras - largest public building project in Great
Britain in 20th century - opened in 1998


www.bl.uk

3

World-Class Research Library
Key Statistics 2002/3













150 million items
8.2 million items consulted or supplied
408,000 reading room visits
618,000 catalogue records created
554,000 items received on legal deposit
651 km shelf capacity 92% full add 12 km each year
18.5M Web Site Hits (www.bl.uk)
2,400 staff
£85.2 million Grant in Aid and £27.0 million trading income in 2001/2
Annual report - http://www.bl.uk/about/annual/latest.html

www.bl.uk

4

‘The World’s Knowledge’
Outcome Based Vision
… by aiding scientific advances
… by adding commercial value for
businesses
… by contributing to UK “knowledge
economy”
… through the pursuit of academic
excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural
wellbeing
… by giving information relevant to their
interests
… by helping to find the next medical
breakthrough
… by creating a link between the past,
present and future

To help people
advance
knowledge to
enrich lives

www.bl.uk

5

High
R+D
Industries

Prof.
Services

Creative
Industries Publishing
Industries

Resource Discovery
Bespoke Services
Research Services
Document Supply
Reprographics
Innovation Centre

School
Students
Libraries Teachers 11>18
On-site Visits
School Tours
Web Learning

SMEs
EDUCATION

Lifelong
Learner

Exhibitions
Events
Tours
Publishing

Visitors
(child + adult)
Lifelong
Learner

PUBLIC

BUSINESS
Lifelong
Learner

Postgraduate/
Undergraduate
RESEARCHER

Librarians
Scholars

Lifelong
Learner

Reading Rooms
Bespoke Services
Reprographics
Publishing
Document Supply
Searching Tools

LIBRARIES

Commercial
Researcher

Public
Libraries

Broadcasting
e.g. BBC

Publishing
e.g. OED

Document Supply
Resource Discovery
Training
Best Practice

H.E.
Libraries

Public

www.bl.uk

6
2

Major Programmes/1
Da Vinci Notebook

Integrated Library System (ILS)
Programme

7

ILS: Development

Data migration
 Due to finish in a few days



16M+ BL records
10M+ records from other sources

Online ILS software
 All online changes made (mainly interfaces) – final tests
 Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
 Most ones done for go live
 Rest in priority order
www.bl.uk

8

ILS: Implementation

Training
 Courses to end-users well underway
 ‘Practice’ system available
 ‘Search only’ training also underway
Testing
 Functional testing (end to end) nearly complete
 Performance poor – OPAC very slow
 Automated stress testing (LoadRunner scripts)
 eIS trying to find area of problem
 Ex Libris experts flying over
 Some security ‘hardening’ needed
www.bl.uk

9

ILS: Cutover from legacy systems

Now:

Temporary Aleph cataloguing

Phase 1 – internal processing
 Staggered take-on of users to ease cutover problems
 Merge ‘temporary’ records

7 June:

Phase 2 – reading rooms
 Reading rooms closed for cutover 26-29 June
 Mainly brand-new PCs etc rather than XP upgrade

30 June:

Phase 3 – remote users
 Could be delayed major problems

30 July:

www.bl.uk

10

www.bl.uk

11

Future ILS development (ILS/2)

Current ILS development seen just as the start

Extra records
 E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
 E.g. Preservation records
Links to other new BL systems
 E.g. Digital Object Management (images, web pages etc)

New releases of Ex Libris packages

www.bl.uk

12

Major Programmes/2

International Dunhuang Project

Digitisation Programme

13

Background










Digitisation Is The Process Of Converting Existing Physical Items Into Digital
Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character
Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image to Sub £1/Image.
Automatic Devices To Digitise Are Becoming Availalble But Are Expensive ~£150,000.
BL Has Had Fairly Ad Hoc Approach Driven By
 External Funding Opportunities
 Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project
Management
BL Has Created About 1.5M Digital Images So Far…

www.bl.uk

14

Digitisation Strategy



Digitisation Strategy Project Was Formerly Initiated On February 2, 2004


Key objectives for the project are to define:










Selection Criteria
Uniform Approach
Communications Plan
Sustainability
Intellectual Property Rights
External Relationship Management
Funding
Integration with DOMS

www.bl.uk

15

Project Status Information


Definitive Register of Projects




19 Complete
19 Current
20 Planning



JISC Sound (3,900 Hours)



JISC Newspapers (2M Pages of 750M Pages)



Chopin (Collaborative Project)



Early English Books Online
www.bl.uk

16

Major Programmes/3

Gutenberg Bible

Digital Object Management (DOM)
Programme

17

DOM Programme vision
Our mission is to enable the United Kingdom to preserve
and use its digital intellectual property forever

Our vision is create a management system for digital objects
that will
 store and preserve any type of digital material in perpetuity
 provide access to this material to users with appropriate permissions
 ensure that the material is easy to find
 ensure that users can view the material with contemporary applications
 ensure that users can, where possible, experience material with the original look-and-feel

www.bl.uk

18

Introduction - history

Digital Library PFI
 Mar 1997 – Dec 1998
Digital Library System
 1999 – early 2002
 Lessons
DOM Report
 Nov 2002
The DOM Programme
 Started September 2003

www.bl.uk

19

Drivers for the BL DOM Programme












Legal deposit legislation for non-print material was granted royal assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12T of material per year (with 50 year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….

We need a generic and cost-effective approach for the secure long term
storage of digital material that is produced by numerous initiatives

www.bl.uk

20

DOM – many topics to address
HIGH

ILS
WEB
ARCHIVING

DIGITISATION
PROGRAMME

ESTIMATED SIZE OF COMPONENT

LDEP

WORKFLOW
RESOURCE
DISCOVERY
LDLSE

VDEP

RIGHTS
MANAGEMENT
METADATA
DEFINITION

TECHNICAL
REQUIREMENTS

FILE
FORMAT
REGISTRY

PERSISTENT
IDENTIFIERS

FILE
CONVERSION
UTILITIES

AUTHENTICATION
PROTOTYPES

SDM
RADM

INTERFACES

STRATEGY
DEVELOPMENT

LOW
LOW

COMPONENT AMBIGUITY / COMPLEXITY

Started

Planned

Non-DOM projects

Planned co-operation

HIGH
LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications

www.bl.uk

21

Scope - life cycle of objects


Collection
 Selection
 Acquisition
 Accession
 Description
 Preservation
 Storage
 Preservation
 Access
 Resource discovery
 Delivery
 Rendering
www.bl.uk

22

Scope – objects and processes


Preservation store
 Preserves the bit stream in perpetuity
 Access store
 Access versions
 Limited formats – in the flavour of the era
 Metadata to support resource discovery
 Descriptive, Administrative, Links with existing tools e.g. Integrated
Library System (ILS)
 Workflow
 Ingest, e.g. Legal Deposit processing

www.bl.uk

23

ACCESS

DOM

Resource Discovery

Delivery

Digital Rights Management

Shared services

DOM
Storage

Signing
Authentication
Metadata
Persistent ID

Ingest

DONATIONS

DOCUMENT SUPPLY
Publishers
Archives
Grey Literature

Non-Serial
Store
Archiving
Operational
Stores

LEGAL DEPOSIT

LDL Secure
Environment

Legal Deposit
Items
Legal Deposit
Processing

WEB
ARCHIVING

DIGITISATION
St Pancras
Studios

Newspapers

NSA
www.bl.uk

24

Timeline

Prototype will provide a
basic preservationquality digital object
storage module

Definition. R0

ET approve
Business Case
& Timeline

•Consolidate R0 into operational
system
•Provide preservation-quality digital
store for materials received under
Voluntary Deposit of Electronic
Publications (VDEP)
•Integrate it with the existing VDEP
front-end

BC
R0
Operat’l Storage
Sub System. R1

•Support ingest for a
major content stream
•Integrate with core
Library systems as
required

R1

1st Content
Stream ingest. R2

R2

LDEP - initial
format. R3 & R4

Provide functionality
for material covered by
LDEP secondary
legislation

R3
R4

Open DOM to new
projects. R5+

2003

2004

2005
www.bl.uk

25

DOM: Project definition - 1
Example issues
Functional Architecture “What”

digital rights, file
formats, etc

Prototyping – assessing market solutions

allow changes to
new suppliers,
relationships to ILS,
other projects etc

Logical architecture “how – overall architecture”
Prototyping - basic functioning architecture

how do we build it
cost-effectively
today, supplier
selection criteria

Physical architecture “how – storage & specifics”
Prototyping - principal solutions and options

• Business case
• Planning – incremental
implementation phases

Cross team workshops – reviewing progress, debating detailed technical
issues, planning immediate priorities, risk management & way forward
www.bl.uk

26

DOM: Project definition - 2



Approach is to be incremental and not ‘Big Bang’
We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a
good solution



A principal goal is to define:
 An overall long term “logical architecture”
 Within which, there will be successive generations of physical architectures



We are understanding the storage marketplace, and we will use the knowledge to manage
procurement
We are certain that we will need >500T of storage but we are uncertain when – we thus need
flexible scalable procurement



www.bl.uk

27

DOM architecture - overview
Resource Discovery
Non-cat
ILS
based RD

Others

Rights Management
LDEP

DOMID

OBJECT

DOM Storage Service

Compound
objects/relations
Integrity

Authenticity

Local resource locator

Object

Atomic
Objects

Unique persistent
identifier (DOMID)

Doc
supply

DOMID is
mapped to
node/vol/LRL

DOM Physical Storage

www.bl.uk

28

DOM System (release 3)

Aleph

Storage subsystem
Storage
subsystem

Access

Mailroom

Publishers

Shared services

Administration
DOM System
www.bl.uk

29

DOM logical architecture –
integrity and authenticity


Integrity:
 System has capability to continuously monitor the object store to detect
object corruption
 It would then initiate object recovery
 Authenticity:
 A process is defined to provide long-term assurance that an object that
is re-presented is as it was when it was ingested
 Based on the use of cryptographic signing techniques
 Each object is signed when it is ingested
 The signature is verified when required
 The signing mechanism is “tightly” controlled

www.bl.uk

30

Procuring physical storage in volume









A major cost is in physical storage
The market for storage systems is changing rapidly, and this implies that “lock-in” is
not sensible
We thus need flexibility to change supplier over time
Cost of storage is reducing by 30-40% per year
Hence procure on rolling basis just ahead of demand
Replace storage on a rolling basis on expiry of warranty
The rolling programmes imply the need to be able to support a heterogeneous
product solution
The design of the logical architecture thus supports storage sourced from multiple
storage vendors

www.bl.uk

31

Disaster tolerance and the organisation
of storage clusters









One can obtain commercial disaster recovery (DR) solutions for common equipment
configurations
However one cannot obtain such solutions for systems comprising multi-100 Tb systems
So we must build in the need for DR into the design of the system
A single site solution, subject to a common-mode disaster, would suffer considerable loss
of availability after a disaster, and so is not acceptable
This implies that we need a multi-site solution
Conventionally these are based on a master-standby where only 50% of kit is delivering
normal service
Our design is based on the use of multiple autonomous independent peer clusters that
cross-synchronise
so 100% of the kit delivers normal service

www.bl.uk

32

DOM architecture in the context of the
storage solution market


The dominant segment of the market focuses on delivering performance within
a highly resilient single cluster
 However:
 Many of our objects will be rarely accessed
 so we do not want to pay for “maximised” performance we do not need
 We have resilience by using multiple clusters, hence we have a reduced
need for resilience within a cluster
 so we do not want to pay for “maximised” resilience we do not need
 We are using these drivers to design a cost-effective large scale resilient
solution

www.bl.uk

33

DOM storage subsystem architecture - overview

DOM
Storage
gateway

DOM
Storage
gateway

DOM
Storage
Service

DOM
Storage
Service

DOM
Physical
Storage
Storage cluster

DOM
Shared
Services
• Unique ID
• Signing
• Logging

DOM
Physical
Storage
Storage cluster

www.bl.uk

34

DOM storage subsystem architecture - access

DOM
Storage
gateway

Normal access/delivery
is from local storage
cluster

DOM
Storage
Service

DOM
Physical
Storage
Storage cluster

DOM
Storage
gateway

DOM
Storage
Service

••
••
••

DOM
DOM
Shared
central
Services
Unique
Unique ID
ID
Signing
Signing
Logging
Logging

DOM
Physical
Storage
Storage cluster

www.bl.uk

35

DOM storage subsystem architecture - access

DOM
Storage
gateway

When a cluster is off-line
then access/delivery is
from a remote storage
cluster

DOM
Storage
Service

DOM
Physical
Storage
Storage cluster

DOM
Storage
gateway

DOM
Storage
Service

••
••
••

DOM
DOM
Shared
central
Services
Unique
Unique ID
ID
Signing
Signing
Logging
Logging

DOM
Physical
Storage
Storage cluster

www.bl.uk

36

DOM storage subsystem architecture - ingest

DOM
Storage
gateway

DOM
Storage
Service

Normal ingest is to the
local storage cluster and
then the remote cluster
is synchronised

Synchronise remote store

Store

DOM
Physical
Storage
Storage cluster

••
••
••

DOM
DOM
Shared
central
Services
Unique
Unique ID
ID
Signing
Signing
Logging
Logging

DOM
Storage
gateway

DOM
Storage
Service

DOM
Physical
Storage
Storage cluster

www.bl.uk

37

DOM storage subsystem architecture - ingest

When a cluster is off-line, ingest is managed by the remote storage cluster and the local cluster is synchronised later.

[Diagram: two storage clusters, each with a DOM Storage gateway, DOM Storage Service and DOM Physical Storage, plus the DOM central Shared Services (Unique ID, Signing, Logging). Arrows: "Store" into the available cluster, with "Synchronise remote store later".]
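
The two ingest slides can be read as a store-then-synchronise protocol with a deferred queue for the off-line case. The sketch below is one way to express that in Python, under stated assumptions: the `Cluster` class, the `pending` queue and both function names are hypothetical, and the immediate copy to the peer stands in for whatever synchronisation mechanism the real system uses.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical storage cluster with a queue of copies it still owes to peers."""
    name: str
    online: bool = True
    objects: dict = field(default_factory=dict)    # DOMID -> bytes
    pending: deque = field(default_factory=deque)  # (peer, DOMID) pairs


def ingest(domid, data, local, remote):
    """Store on the local cluster if it is on-line, otherwise on the remote one."""
    primary, peer = (local, remote) if local.online else (remote, local)
    if not primary.online:
        raise RuntimeError("no cluster is on-line")
    primary.objects[domid] = data
    if peer.online:
        peer.objects[domid] = data             # synchronise straight away
    else:
        primary.pending.append((peer, domid))  # synchronise later
    return primary.name


def synchronise(cluster):
    """Replay queued copies to peers that are back on-line."""
    for _ in range(len(cluster.pending)):
        peer, domid = cluster.pending.popleft()
        if peer.online:
            peer.objects[domid] = cluster.objects[domid]
        else:
            cluster.pending.append((peer, domid))


a, b = Cluster("local"), Cluster("remote")
ingest("dom:1", b"x", a, b)  # normal ingest: local stores, remote is synchronised
a.online = False
ingest("dom:2", b"y", a, b)  # local off-line: remote stores, local is owed a copy
a.online = True
synchronise(b)               # local cluster catches up
```

After the final call both clusters hold the same two objects and the queue is empty.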


38

In conclusion



We plan for generations of physical storage
 Migration from one generation to the next
 Allow changes of supplier
 Purchase incrementally in modest quantities
 Move quickly when required
 Be cost conscious



We provide assurance that an object is held, and re-presented, as it was when it was ingested



We are designing a cost-effective large scale resilient solution

In summary: we take a long term view


39

Major Programmes/4

Web Archiving Programme

40

Structure of Programme

The Web Archiving Programme is a collaborative initiative, implemented broadly across two consortia:
• UK Web Archiving Consortium
  - Developing a selective approach to web archiving, and procuring a common web archiving infrastructure and software to begin archiving activities at the earliest opportunity
• International Internet Preservation Consortium
  - Developing advanced web archiving technologies for the long-term, large-scale, continuous crawling requirements enabled through legislation


41

UK Web Archiving Consortium
Developing a selective approach to web archiving
• Licence for PANDAS about to be signed with NLA
• Sub-licences with consortium partners and contractor to follow
• ITT concluded with Magus Research winning the contract
  - Implement a common web archiving infrastructure (lots of Linux machines + PANDAS)
  - Provide customisation/development of PANDAS
  - Provide help desk and support


42

International Internet Preservation
Consortium
Developing advanced web archiving technologies
• Smart Crawler
  - Continuous adaptive crawler, adjusting crawl priority on the fly
  - Based on IA Heritrix
  - Working on requirements now
  - Expect to begin tender process in June
• Content Management
• Archival formats
• Framework
• Metrics and Test Bed


43
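
The Smart Crawler bullet on the slide above, a continuous crawler that adjusts crawl priority on the fly, can be pictured as a priority queue whose revisit intervals react to what each visit finds. The sketch below is a generic illustration only: it does not use or reproduce the Heritrix API, and the class name, the halving/doubling rule and the 24-hour starting interval are all assumptions.

```python
import heapq


class AdaptiveFrontier:
    """Hypothetical crawl frontier: pages that change often are revisited sooner."""

    def __init__(self, base_interval=24.0):
        self.base_interval = base_interval  # hours between visits to begin with
        self._heap = []                     # (next visit time, url)
        self._interval = {}                 # url -> current revisit interval

    def add(self, url, now=0.0):
        self._interval[url] = self.base_interval
        heapq.heappush(self._heap, (now, url))

    def next_url(self):
        """Pop the URL that is due soonest, as (due time, url)."""
        return heapq.heappop(self._heap)

    def report(self, url, visited_at, changed):
        """Halve the interval if the page changed, otherwise double it."""
        self._interval[url] *= 0.5 if changed else 2.0
        heapq.heappush(self._heap, (visited_at + self._interval[url], url))


frontier = AdaptiveFrontier()
frontier.add("http://example.org/news")
frontier.add("http://example.org/about")
_, first = frontier.next_url()
frontier.report(first, visited_at=0.0, changed=False)
```

A production crawler would also weigh politeness limits, site importance and storage budget; the point here is only that priority is recomputed after every visit.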

External Collaboration

44

Digital Library Collaborations/Partnerships
Current

• UK Digital Preservation Coalition
  - Founder Member
• TEL (The European Library Project)
• Web Archiving UK
  - JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library Of Wales
• International Internet Preservation Consortium
  - BNF, Library Of Congress, Internet Archive, National Archives & Library Of Canada, National Library Of Australia, National Library Of Italy, National Libraries Of Nordic Countries
• JISC Funded - Digital Curation Centre
• Persistent Identifiers
  - DOI Foundation, European National Libraries (KB & DDB)
• Resource Discovery
  - Union Catalogues (SUNCAT)
• Digital Library Federation


45

Digital Library Collaborations/Partnerships
Potential

• Secure Legal Deposit Network
  - 6 Legal Deposit Libraries
• Global Digital Format Registry
  - Potential Partners (National Archives, DLF)
• Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
  - KB (Netherlands National Library & Other Partners – FP6 Bid)
• Digital Rights Management
  - Potential Partners (Publishers, JISC)
• Metadata
  - Publishers, Others ?
• Authentication
  - JISC ?
• Resource Discovery
  - Search Engine Vendors, Researchers
• Others ???


46

Conclusions
Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success




Can You Work With Us?


47



[Diagram: Our audiences/customers, grouped as BUSINESS, EDUCATION, PUBLIC, RESEARCHER and LIBRARIES, with the services offered to each group. Audience labels shown: High R+D Industries, Prof. Services, Creative Industries, Publishing Industries, SMEs; School, Students, Libraries, Teachers 11>18; Visitors (child + adult); Postgraduate/Undergraduate, Librarians, Scholars, Commercial Researcher; Public Libraries, H.E. Libraries, Broadcasting (e.g. BBC), Publishing (e.g. OED), Public; and Lifelong Learner alongside several groups. Service lists shown: Resource Discovery, Bespoke Services, Research Services, Document Supply, Reprographics, Innovation Centre; On-site Visits, School Tours, Web Learning; Exhibitions, Events, Tours, Publishing; Reading Rooms, Bespoke Services, Reprographics, Publishing, Document Supply, Searching Tools; Document Supply, Resource Discovery, Training, Best Practice.]


6

Major Programmes/1
Da Vinci Notebook

Integrated Library System (ILS)
Programme

7

ILS: Development

Data migration
• Due to finish in a few days
  - 16M+ BL records
  - 10M+ records from other sources
Online ILS software
• All online changes made (mainly interfaces) – final tests
• Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
• Most done for go-live
• Rest in priority order

8

ILS: Implementation

Training
 Courses to end-users well underway
 ‘Practice’ system available
 ‘Search only’ training also underway
Testing
 Functional testing (end to end) nearly complete
 Performance poor – OPAC very slow
 Automated stress testing (LoadRunner scripts)
 eIS trying to find area of problem
 Ex Libris experts flying over
 Some security ‘hardening’ needed

9

ILS: Cutover from legacy systems

Now: Temporary Aleph cataloguing

7 June: Phase 1 – internal processing
• Staggered take-on of users to ease cutover problems
• Merge ‘temporary’ records

30 June: Phase 2 – reading rooms
• Reading rooms closed for cutover 26-29 June
• Mainly brand-new PCs etc rather than XP upgrade

30 July: Phase 3 – remote users
• Could be delayed by major problems


10


11

Future ILS development (ILS/2)

Current ILS development seen just as the start

Extra records
 E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
 E.g. Preservation records
Links to other new BL systems
 E.g. Digital Object Management (images, web pages etc)

New releases of Ex Libris packages


12

Major Programmes/2

International Dunhuang Project

Digitisation Programme

13

Background

• Digitisation Is The Process Of Converting Existing Physical Items Into Digital Surrogates.
• Digitisation Projects Must Take Into Account Metadata Creation, Optical Character Recognition, Navigation, Display, Archiving, Preservation.
• Overall Cost Of Digitisation Projects Is Declining, From £10s/Image To Sub-£1/Image.
• Automatic Devices To Digitise Are Becoming Available But Are Expensive (~£150,000).
• BL Has Had A Fairly Ad Hoc Approach Driven By
  - External Funding Opportunities
  - Curator Interest
• Projects Have Generally Created Their Own Approach, IT Resources, Project Management
• BL Has Created About 1.5M Digital Images So Far…


14

Digitisation Strategy



Digitisation Strategy Project Was Formally Initiated On February 2, 2004

Key objectives for the project are to define:
• Selection Criteria
• Uniform Approach
• Communications Plan
• Sustainability
• Intellectual Property Rights
• External Relationship Management
• Funding
• Integration with DOMS


15

Project Status Information


Definitive Register of Projects
• 19 Complete
• 19 Current
• 20 Planning

JISC Sound (3,900 Hours)
JISC Newspapers (2M Pages of 750M Pages)
Chopin (Collaborative Project)
Early English Books Online

16

Major Programmes/3

Gutenberg Bible

Digital Object Management (DOM)
Programme

17

DOM Programme vision
Our mission is to enable the United Kingdom to preserve
and use its digital intellectual property forever

Our vision is to create a management system for digital objects that will
• store and preserve any type of digital material in perpetuity
• provide access to this material to users with appropriate permissions
• ensure that the material is easy to find
• ensure that users can view the material with contemporary applications
• ensure that users can, where possible, experience material with the original look-and-feel


18

Introduction - history

Digital Library PFI
 Mar 1997 – Dec 1998
Digital Library System
 1999 – early 2002
 Lessons
DOM Report
 Nov 2002
The DOM Programme
 Started September 2003


19

Drivers for the BL DOM Programme

• Legal deposit legislation for non-print material was granted royal assent in October 2003
• Existing voluntary deposit scheme operational since 2000
• Storage of digitised masters from early ’90s onwards
• New digitisation initiatives: newspapers, sound, etc
• Sound archive receives 12 TB of material per year (with 50 year collection)
• Web archiving
• Cartography and datasets
• Electronic journals, picture library
• … and ….
• …. and ….

We need a generic and cost-effective approach for the secure long term
storage of digital material that is produced by numerous initiatives


20

DOM – many topics to address
[Chart: DOM components and related projects plotted by estimated size of component (low to high) against component ambiguity/complexity (low to high). Items shown: ILS, Web Archiving, Digitisation Programme, LDEP, Workflow, Resource Discovery, LDLSE, VDEP, Rights Management, Metadata Definition, Technical Requirements, File Format Registry, Persistent Identifiers, File Conversion Utilities, Authentication, Prototypes, SDM, RADM, Interfaces, Strategy Development. Legend: Started; Planned; Non-DOM projects; Planned co-operation.]

LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications


21

Scope - life cycle of objects


• Collection
  - Selection
  - Acquisition
  - Accession
  - Description
• Preservation
  - Storage
  - Preservation
• Access
  - Resource discovery
  - Delivery
  - Rendering

22

Scope – objects and processes


• Preservation store
  - Preserves the bit stream in perpetuity
• Access store
  - Access versions
  - Limited formats – in the flavour of the era
• Metadata to support resource discovery
  - Descriptive, Administrative, Links with existing tools e.g. Integrated Library System (ILS)
• Workflow
  - Ingest, e.g. Legal Deposit processing


23

[Diagram: the DOM system and its surroundings. ACCESS: Resource Discovery, Delivery, Digital Rights Management. DOM: DOM Storage and Shared services (Signing, Authentication, Metadata, Persistent ID). Ingest sources: DONATIONS; DOCUMENT SUPPLY (Publishers, Archives, Grey Literature; Non-Serial Store, Archiving, Operational Stores); LEGAL DEPOSIT (LDL Secure Environment, Legal Deposit Items, Legal Deposit Processing); WEB ARCHIVING; DIGITISATION (St Pancras Studios, Newspapers, NSA).]

24

Timeline

• Definition (R0): prototype will provide a basic preservation-quality digital object storage module; ET approve Business Case (BC) & Timeline
• Operational Storage Sub System (R1)
  - Consolidate R0 into operational system
  - Provide preservation-quality digital store for materials received under Voluntary Deposit of Electronic Publications (VDEP)
  - Integrate it with the existing VDEP front-end
• 1st Content Stream ingest (R2)
  - Support ingest for a major content stream
  - Integrate with core Library systems as required
• LDEP - initial format (R3 & R4): provide functionality for material covered by LDEP secondary legislation
• Open DOM to new projects (R5+)

Time axis: 2003, 2004, 2005

25

DOM: Project definition - 1
• Functional architecture (“what”)
  - Example issues: digital rights, file formats, etc
  - Prototyping: assessing market solutions
• Logical architecture (“how – overall architecture”)
  - Example issues: allow changes to new suppliers, relationships to ILS, other projects etc
  - Prototyping: basic functioning architecture
• Physical architecture (“how – storage & specifics”)
  - Example issues: how do we build it cost-effectively today, supplier selection criteria
  - Prototyping: principal solutions and options
• Business case
• Planning – incremental implementation phases

Cross team workshops – reviewing progress, debating detailed technical issues, planning immediate priorities, risk management & way forward

26

DOM: Project definition - 2



Approach is to be incremental and not ‘Big Bang’
We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a good solution

A principal goal is to define:
• An overall long-term “logical architecture”
• Within which there will be successive generations of physical architectures

We are building an understanding of the storage marketplace, and we will use that knowledge to manage procurement
We are certain that we will need >500 TB of storage but we are uncertain when; we thus need flexible, scalable procurement




27

DOM architecture - overview
[Diagram: applications (ILS-based and non-catalogue Resource Discovery, Rights Management, LDEP, Doc supply and others) refer to an OBJECT by its DOMID. The DOM Storage Service handles compound objects/relations, integrity, authenticity and the unique persistent identifier (DOMID). Each DOMID is mapped to node/vol/LRL (local resource locator), under which atomic objects are held in DOM Physical Storage.]
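
The overview diagram states that a DOMID is mapped to node/vol/LRL. One way to picture this indirection is a lookup table that can be updated when an object moves to a new generation of storage while its persistent identifier stays fixed. The Python sketch below is illustrative only: the `Location` and `DomidResolver` names, the method names and the UUID-based identifier format are assumptions, not the DOM programme's actual scheme.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    node: str  # storage node
    vol: str   # volume on that node
    lrl: str   # local resource locator within the volume


class DomidResolver:
    """Hypothetical table from persistent DOMID to current physical location."""

    def __init__(self):
        self._table = {}

    def register(self, location):
        domid = f"dom:{uuid.uuid4()}"  # identifier format is an assumption
        self._table[domid] = location
        return domid

    def migrate(self, domid, new_location):
        """Point an existing DOMID at new storage; the DOMID itself never changes."""
        if domid not in self._table:
            raise KeyError(domid)
        self._table[domid] = new_location

    def resolve(self, domid):
        return self._table[domid]


resolver = DomidResolver()
domid = resolver.register(Location("node-1", "vol-07", "0001/abc.tif"))
resolver.migrate(domid, Location("node-9", "vol-02", "0001/abc.tif"))
print(resolver.resolve(domid))
```

Because applications hold only the DOMID, a change of node, volume or supplier is invisible to them.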


28

DOM System (release 3)

[Diagram: DOM System (release 3). Components shown inside the DOM System: two Storage subsystems, Access, Mailroom, Shared services and Administration. Other labels shown: Aleph, Publishers.]

29

DOM logical architecture –
integrity and authenticity


• Integrity:
  - System has capability to continuously monitor the object store to detect object corruption
  - It would then initiate object recovery
• Authenticity:
  - A process is defined to provide long-term assurance that an object that is re-presented is as it was when it was ingested
  - Based on the use of cryptographic signing techniques
  - Each object is signed when it is ingested
  - The signature is verified when required
  - The signing mechanism is “tightly” controlled


30
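
The integrity and authenticity slide above describes a sign-on-ingest, verify-on-demand cycle plus a monitoring scan that triggers recovery. The sketch below shows that cycle in Python under stated assumptions: HMAC-SHA256 from the standard library stands in for whatever signing technique the programme selects, the hard-coded key is for illustration only, and recovering from a peer cluster's copy is an assumption since the slide does not say where recovery comes from.

```python
import hashlib
import hmac

SIGNING_KEY = b"demo-key-held-by-the-signing-service"  # assumption: illustrative only


def sign(data: bytes) -> str:
    """Produce a keyed digest of the object at ingest time."""
    return hmac.new(SIGNING_KEY, data, hashlib.sha256).hexdigest()


def verify(data: bytes, signature: str) -> bool:
    """Check that the bytes still match the signature taken at ingest."""
    return hmac.compare_digest(sign(data), signature)


def scan(store: dict, signatures: dict) -> list:
    """Return the DOMIDs whose stored bytes no longer verify."""
    return [domid for domid, data in store.items()
            if not verify(data, signatures[domid])]


def recover(store: dict, signatures: dict, peer_store: dict) -> list:
    """Replace corrupt objects with a peer copy, but only if that copy verifies."""
    repaired = []
    for domid in scan(store, signatures):
        candidate = peer_store.get(domid)
        if candidate is not None and verify(candidate, signatures[domid]):
            store[domid] = candidate
            repaired.append(domid)
    return repaired


store = {"dom:1": b"magna carta"}
signatures = {domid: sign(data) for domid, data in store.items()}
peer = dict(store)
store["dom:1"] = b"magna cartb"          # simulate corruption of one object
print(scan(store, signatures))           # lists the corrupt object
print(recover(store, signatures, peer))  # repairs it from the peer copy
```

Keeping the key inside a separate signing service is one reading of the slide's remark that the signing mechanism is tightly controlled.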

Procuring physical storage in volume

• A major cost is in physical storage
• The market for storage systems is changing rapidly, and this implies that “lock-in” is not sensible
• We thus need flexibility to change supplier over time
• Cost of storage is reducing by 30-40% per year
• Hence procure on rolling basis just ahead of demand
• Replace storage on a rolling basis on expiry of warranty
• The rolling programmes imply the need to be able to support a heterogeneous product solution
• The design of the logical architecture thus supports storage sourced from multiple storage vendors
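
The argument for rolling procurement can be checked with a little arithmetic. The sketch below compares buying all capacity up front with buying each year's capacity in that year, using the 30-40% annual price decline quoted on the slide. The demand profile of 100 TB per year for five years and the relative unit price of 1.0 are invented for illustration, and warranty-driven replacement is left out.

```python
def upfront_cost(yearly_demand, unit_price):
    """Buy everything in year 0 at today's price."""
    return sum(yearly_demand) * unit_price


def rolling_cost(yearly_demand, unit_price, annual_decline):
    """Buy each year's capacity in that year, at that year's lower price."""
    total = 0.0
    for year, demand in enumerate(yearly_demand):
        total += demand * unit_price * (1 - annual_decline) ** year
    return total


demand_tb = [100, 100, 100, 100, 100]  # assumed profile, 500 TB in total
for decline in (0.30, 0.40):
    ratio = rolling_cost(demand_tb, 1.0, decline) / upfront_cost(demand_tb, 1.0)
    print(f"{decline:.0%} annual decline: rolling purchase saves {1 - ratio:.0%}")
```

Under these assumed figures the rolling purchase costs a little over half as much as the up-front one, which is the effect the slide relies on.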


31

Disaster tolerance and the organisation of storage clusters

• One can obtain commercial disaster recovery (DR) solutions for common equipment configurations
• However, one cannot obtain such solutions for systems of multi-100 TB scale
• So we must build DR into the design of the system
• A single-site solution, subject to a common-mode disaster, would suffer considerable loss of availability after a disaster, and so is not acceptable
• This implies that we need a multi-site solution
• Conventionally these are based on a master-standby arrangement where only 50% of the kit is delivering normal service
• Our design is based on the use of multiple autonomous independent peer clusters that cross-synchronise, so 100% of the kit delivers normal service


32

DOM architecture in the context of the
storage solution market


The dominant segment of the market focuses on delivering performance within
a highly resilient single cluster
 However:
 Many of our objects will be rarely accessed
 so we do not want to pay for “maximised” performance we do not need
 We have resilience by using multiple clusters, hence we have a reduced
need for resilience within a cluster
 so we do not want to pay for “maximised” resilience we do not need
 We are using these drivers to design a cost-effective large scale resilient
solution


33


Slide 41

British Computer Society
North London Branch

Major Programmes

Richard Boulderstone
July 27, 2004
1

Agenda











The British Library
Vision
Our Audiences/Customers
ILS
Digitisation
Digital Object Management
Web Archiving
Collaboration
Conclusions

Magna Carta

www.bl.uk

2

What Is The British Library ?

Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of
Science and Invention (1855), National Central Library (1916), and National
Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India
Office Library and Records in 1982, and British Institute of Recorded Sound in
1983
• Flagship building at St Pancras - largest public building project in Great
Britain in 20th century - opened in 1998


www.bl.uk

3

World-Class Research Library
Key Statistics 2002/3













150 million items
8.2 million items consulted or supplied
408,000 reading room visits
618,000 catalogue records created
554,000 items received on legal deposit
651 km shelf capacity 92% full add 12 km each year
18.5M Web Site Hits (www.bl.uk)
2,400 staff
£85.2 million Grant in Aid and £27.0 million trading income in 2001/2
Annual report - http://www.bl.uk/about/annual/latest.html

www.bl.uk

4

‘The World’s Knowledge’
Outcome Based Vision
… by aiding scientific advances
… by adding commercial value for
businesses
… by contributing to UK “knowledge
economy”
… through the pursuit of academic
excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural
wellbeing
… by giving information relevant to their
interests
… by helping to find the next medical
breakthrough
… by creating a link between the past,
present and future

To help people
advance
knowledge to
enrich lives

www.bl.uk

5

High
R+D
Industries

Prof.
Services

Creative
Industries Publishing
Industries

Resource Discovery
Bespoke Services
Research Services
Document Supply
Reprographics
Innovation Centre

School
Students
Libraries Teachers 11>18
On-site Visits
School Tours
Web Learning

SMEs
EDUCATION

Lifelong
Learner

Exhibitions
Events
Tours
Publishing

Visitors
(child + adult)
Lifelong
Learner

PUBLIC

BUSINESS
Lifelong
Learner

Postgraduate/
Undergraduate
RESEARCHER

Librarians
Scholars

Lifelong
Learner

Reading Rooms
Bespoke Services
Reprographics
Publishing
Document Supply
Searching Tools

LIBRARIES

Commercial
Researcher

Public
Libraries

Broadcasting
e.g. BBC

Publishing
e.g. OED

Document Supply
Resource Discovery
Training
Best Practice

H.E.
Libraries

Public

www.bl.uk

6
2

Major Programmes/1
Da Vinci Notebook

Integrated Library System (ILS)
Programme

7

ILS: Development

Data migration
 Due to finish in a few days



16M+ BL records
10M+ records from other sources

Online ILS software
 All online changes made (mainly interfaces) – final tests
 Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
 Most ones done for go live
 Rest in priority order
www.bl.uk

8

ILS: Implementation

Training
 Courses to end-users well underway
 ‘Practice’ system available
 ‘Search only’ training also underway
Testing
 Functional testing (end to end) nearly complete
 Performance poor – OPAC very slow
 Automated stress testing (LoadRunner scripts)
 eIS trying to find area of problem
 Ex Libris experts flying over
 Some security ‘hardening’ needed
www.bl.uk

9

ILS: Cutover from legacy systems

Now:

Temporary Aleph cataloguing

Phase 1 – internal processing
 Staggered take-on of users to ease cutover problems
 Merge ‘temporary’ records

7 June:

Phase 2 – reading rooms
 Reading rooms closed for cutover 26-29 June
 Mainly brand-new PCs etc rather than XP upgrade

30 June:

Phase 3 – remote users
 Could be delayed major problems

30 July:

www.bl.uk

10

www.bl.uk

11

Future ILS development (ILS/2)

Current ILS development seen just as the start

Extra records
 E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
 E.g. Preservation records
Links to other new BL systems
 E.g. Digital Object Management (images, web pages etc)

New releases of Ex Libris packages

www.bl.uk

12

Major Programmes/2

International Dunhuang Project

Digitisation Programme

13

Background










Digitisation Is The Process Of Converting Existing Physical Items Into Digital
Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character
Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image to Sub £1/Image.
Automatic Devices To Digitise Are Becoming Availalble But Are Expensive ~£150,000.
BL Has Had Fairly Ad Hoc Approach Driven By
 External Funding Opportunities
 Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project
Management
BL Has Created About 1.5M Digital Images So Far…

www.bl.uk

14

Digitisation Strategy



Digitisation Strategy Project Was Formerly Initiated On February 2, 2004


Key objectives for the project are to define:










Selection Criteria
Uniform Approach
Communications Plan
Sustainability
Intellectual Property Rights
External Relationship Management
Funding
Integration with DOMS

www.bl.uk

15

Project Status Information


Definitive Register of Projects




19 Complete
19 Current
20 Planning



JISC Sound (3,900 Hours)



JISC Newspapers (2M Pages of 750M Pages)



Chopin (Collaborative Project)



Early English Books Online
www.bl.uk

16

Major Programmes/3

Gutenberg Bible

Digital Object Management (DOM)
Programme

17

DOM Programme vision
Our mission is to enable the United Kingdom to preserve
and use its digital intellectual property forever

Our vision is create a management system for digital objects
that will
 store and preserve any type of digital material in perpetuity
 provide access to this material to users with appropriate permissions
 ensure that the material is easy to find
 ensure that users can view the material with contemporary applications
 ensure that users can, where possible, experience material with the original look-and-feel

www.bl.uk

18

Introduction - history

Digital Library PFI
 Mar 1997 – Dec 1998
Digital Library System
 1999 – early 2002
 Lessons
DOM Report
 Nov 2002
The DOM Programme
 Started September 2003

www.bl.uk

19

Drivers for the BL DOM Programme












Legal deposit legislation for non-print material was granted royal assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12T of material per year (with 50 year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….

We need a generic and cost-effective approach for the secure long term
storage of digital material that is produced by numerous initiatives

www.bl.uk

20

DOM – many topics to address
HIGH

ILS
WEB
ARCHIVING

DIGITISATION
PROGRAMME

ESTIMATED SIZE OF COMPONENT

LDEP

WORKFLOW
RESOURCE
DISCOVERY
LDLSE

VDEP

RIGHTS
MANAGEMENT
METADATA
DEFINITION

TECHNICAL
REQUIREMENTS

FILE
FORMAT
REGISTRY

PERSISTENT
IDENTIFIERS

FILE
CONVERSION
UTILITIES

AUTHENTICATION
PROTOTYPES

SDM
RADM

INTERFACES

STRATEGY
DEVELOPMENT

LOW
LOW

COMPONENT AMBIGUITY / COMPLEXITY

Started

Planned

Non-DOM projects

Planned co-operation

HIGH
LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications

www.bl.uk

21

Scope - life cycle of objects


Collection
 Selection
 Acquisition
 Accession
 Description
 Preservation
 Storage
 Preservation
 Access
 Resource discovery
 Delivery
 Rendering
www.bl.uk

22

Scope – objects and processes


Preservation store
 Preserves the bit stream in perpetuity
 Access store
 Access versions
 Limited formats – in the flavour of the era
 Metadata to support resource discovery
 Descriptive, Administrative, Links with existing tools e.g. Integrated
Library System (ILS)
 Workflow
 Ingest, e.g. Legal Deposit processing

www.bl.uk

23

ACCESS

DOM

Resource Discovery

Delivery

Digital Rights Management

Shared services

DOM
Storage

Signing
Authentication
Metadata
Persistent ID

Ingest

DONATIONS

DOCUMENT SUPPLY
Publishers
Archives
Grey Literature

Non-Serial
Store
Archiving
Operational
Stores

LEGAL DEPOSIT

LDL Secure
Environment

Legal Deposit
Items
Legal Deposit
Processing

WEB
ARCHIVING

DIGITISATION
St Pancras
Studios

Newspapers

NSA
www.bl.uk

24

Timeline

Prototype will provide a
basic preservationquality digital object
storage module

Definition. R0

ET approve
Business Case
& Timeline

•Consolidate R0 into operational
system
•Provide preservation-quality digital
store for materials received under
Voluntary Deposit of Electronic
Publications (VDEP)
•Integrate it with the existing VDEP
front-end

BC
R0
Operat’l Storage
Sub System. R1

•Support ingest for a
major content stream
•Integrate with core
Library systems as
required

R1

1st Content
Stream ingest. R2

R2

LDEP - initial
format. R3 & R4

Provide functionality
for material covered by
LDEP secondary
legislation

R3
R4

Open DOM to new
projects. R5+

2003

2004

2005
www.bl.uk

25

DOM: Project definition - 1
Functional architecture “What”
  • Prototyping – assessing market solutions
  • Example issues: digital rights, file formats, etc
Logical architecture “how – overall architecture”
  • Prototyping – basic functioning architecture
  • Example issues: allow changes to new suppliers, relationships to ILS, other projects etc
Physical architecture “how – storage & specifics”
  • Prototyping – principal solutions and options
  • Example issues: how do we build it cost-effectively today, supplier selection criteria

• Business case
• Planning – incremental implementation phases

Cross team workshops – reviewing progress, debating detailed technical issues, planning immediate priorities, risk management & way forward

26

DOM: Project definition - 2



Approach is to be incremental and not ‘Big Bang’
We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a
good solution



A principal goal is to define:
 An overall long term “logical architecture”
 Within which, there will be successive generations of physical architectures



We are building an understanding of the storage marketplace, and we will use that knowledge to manage procurement
We are certain that we will need >500 TB of storage but we are uncertain when – we thus need flexible, scalable procurement




27

DOM architecture - overview
[Diagram: Resource Discovery (ILS, Non-cat based RD, Others), Rights Management, LDEP and Doc supply refer to an OBJECT by its DOMID. The DOM Storage Service covers compound objects/relations, integrity and authenticity. Each atomic object has a unique persistent identifier (DOMID); the DOMID is mapped to node/vol/LRL (local resource locator) in DOM Physical Storage.]
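
The DOMID-to-location mapping on this slide can be pictured as a lookup table. What follows is a minimal sketch only: the class names `Location` and `DomIdResolver`, the field names and the identifier format are illustrative assumptions, not part of the DOM design described in the talk.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Location:
    """Physical placement of one copy of an object (field names are hypothetical)."""
    node: str    # storage node or cluster holding the copy
    volume: str  # volume on that node
    lrl: str     # local resource locator within the volume


class DomIdResolver:
    """Maps a persistent DOMID to the current physical locations of its copies."""

    def __init__(self) -> None:
        self._table: Dict[str, List[Location]] = {}

    def register(self, domid: str, location: Location) -> None:
        self._table.setdefault(domid, []).append(location)

    def relocate(self, domid: str, old: Location, new: Location) -> None:
        # Moving to a new storage generation changes the location only;
        # the DOMID held by other systems stays the same.
        copies = self._table[domid]
        copies[copies.index(old)] = new

    def resolve(self, domid: str) -> List[Location]:
        return list(self._table[domid])
```

Because callers hold only the DOMID, physical storage can be replaced or re-sourced without touching resource discovery, rights management or document supply.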


28

DOM System (release 3)

[Diagram: the DOM System comprises Storage subsystems, Access, Shared services and Administration; Aleph, Mailroom and Publishers appear alongside it.]

29

DOM logical architecture –
integrity and authenticity


Integrity:
  • System has capability to continuously monitor the object store to detect object corruption
  • It would then initiate object recovery
Authenticity:
  • A process is defined to provide long-term assurance that an object that is re-presented is as it was when it was ingested
  • Based on the use of cryptographic signing techniques
  • Each object is signed when it is ingested
  • The signature is verified when required
  • The signing mechanism is “tightly” controlled
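
The sign-at-ingest and verify-later cycle, together with a monitoring pass over the store, might look like the sketch below. It is an illustration under stated assumptions: HMAC-SHA256 from the Python standard library stands in for whatever cryptographic signing technique DOM actually uses (the slides do not say), and the key, function names and in-memory store are all hypothetical.

```python
import hashlib
import hmac
from typing import Dict, List

# Assumption: a demo key. In a real system the signing key would be
# held by a tightly controlled signing service.
SIGNING_KEY = b"demo-key-held-by-signing-service"


def sign(data: bytes, key: bytes = SIGNING_KEY) -> str:
    """Return an HMAC-SHA256 tag, standing in for the signature made at ingest."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify(data: bytes, signature: str, key: bytes = SIGNING_KEY) -> bool:
    """Check that the object is bit-for-bit what was signed at ingest."""
    return hmac.compare_digest(sign(data, key), signature)


def find_corrupt(store: Dict[str, bytes], signatures: Dict[str, str]) -> List[str]:
    """One monitoring pass: list the object IDs whose bytes no longer verify."""
    return [oid for oid, data in store.items() if not verify(data, signatures[oid])]
```

Any ID returned by `find_corrupt` would be a candidate for recovery from another copy of the object.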


30

Procuring physical storage in volume









A major cost is in physical storage
The market for storage systems is changing rapidly, and this implies that “lock-in” is
not sensible
We thus need flexibility to change supplier over time
Cost of storage is reducing by 30-40% per year
Hence procure on rolling basis just ahead of demand
Replace storage on a rolling basis on expiry of warranty
The rolling programmes imply the need to be able to support a heterogeneous
product solution
The design of the logical architecture thus supports storage sourced from multiple
storage vendors
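
The case for buying just ahead of demand can be checked with simple arithmetic. The sketch below compares one up-front purchase with yearly purchases at falling prices. The demand figures, the starting price and the 35% decline (the midpoint of the 30-40% range quoted above) are assumptions chosen for illustration, not British Library figures.

```python
from typing import Sequence


def upfront_cost(demand_tb: Sequence[float], price_per_tb: float) -> float:
    """Buy the whole multi-year requirement at today's price."""
    return sum(demand_tb) * price_per_tb


def rolling_cost(demand_tb: Sequence[float], price_per_tb: float,
                 annual_decline: float) -> float:
    """Buy each year's requirement at that year's (lower) price."""
    total = 0.0
    for year, tb in enumerate(demand_tb):
        total += tb * price_per_tb * (1 - annual_decline) ** year
    return total


# Hypothetical inputs: 100 TB needed in each of three years,
# 1,000 currency units per TB today, price falling 35% a year.
demand = [100, 100, 100]
print(upfront_cost(demand, 1000.0))
print(rolling_cost(demand, 1000.0, 0.35))
```

With these assumed inputs the up-front purchase costs 300,000 units and the rolling purchases about 207,250, a saving of roughly 30% before counting warranty and replacement effects.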


31

Disaster tolerance and the organisation
of storage clusters









One can obtain commercial disaster recovery (DR) solutions for common equipment
configurations
However, one cannot obtain such solutions for multi-100 TB systems
So we must build DR into the design of the system
A single site solution, subject to a common-mode disaster, would suffer considerable loss
of availability after a disaster, and so is not acceptable
This implies that we need a multi-site solution
Conventionally these are based on a master-standby where only 50% of kit is delivering
normal service
Our design is based on the use of multiple autonomous independent peer clusters that
cross-synchronise
so 100% of the kit delivers normal service


32

DOM architecture in the context of the
storage solution market


The dominant segment of the market focuses on delivering performance within a highly resilient single cluster
However:
  • Many of our objects will be rarely accessed
      • so we do not want to pay for “maximised” performance we do not need
  • We have resilience by using multiple clusters, hence we have a reduced need for resilience within a cluster
      • so we do not want to pay for “maximised” resilience we do not need
We are using these drivers to design a cost-effective large scale resilient solution

33

DOM storage subsystem architecture - overview

[Diagram: two storage clusters, each consisting of a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, with DOM Shared Services (Unique ID, Signing, Logging) common to both.]

34

DOM storage subsystem architecture - access

Normal access/delivery is from local storage cluster

[Diagram: storage clusters, each with a DOM Storage gateway, DOM Storage Service and DOM Physical Storage, plus DOM Shared central Services (Unique ID, Signing, Logging).]

35

DOM storage subsystem architecture - access

When a cluster is off-line then access/delivery is from a remote storage cluster

[Diagram: storage clusters, each with a DOM Storage gateway, DOM Storage Service and DOM Physical Storage, plus DOM Shared central Services (Unique ID, Signing, Logging).]
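
The access rule on the two access slides (serve locally, fall back to a remote peer when the local cluster is off-line) can be sketched as follows. The `Cluster` class is a hypothetical in-memory stand-in and `fetch` is an illustrative name; neither comes from the DOM design.

```python
from typing import Dict, List


class Cluster:
    """One autonomous storage cluster (in-memory stand-in for illustration)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.online = True
        self.objects: Dict[str, bytes] = {}

    def get(self, domid: str) -> bytes:
        if not self.online:
            raise ConnectionError(f"{self.name} is off-line")
        return self.objects[domid]


def fetch(domid: str, local: Cluster, remotes: List[Cluster]) -> bytes:
    """Serve from the local cluster; fall back to a remote peer if it is off-line."""
    for cluster in [local, *remotes]:
        try:
            return cluster.get(domid)
        except ConnectionError:
            continue  # try the next peer
    raise ConnectionError("no cluster available")
```

Because the clusters are peers rather than master and standby, the same function serves both the normal and the degraded case.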


36

DOM storage subsystem architecture - ingest

Normal ingest is to the local storage cluster and then the remote cluster is synchronised

[Diagram: storage clusters, each with a DOM Storage gateway, DOM Storage Service and DOM Physical Storage, plus DOM Shared central Services (Unique ID, Signing, Logging). Arrows labelled "Store" and "Synchronise remote store".]

37

DOM storage subsystem architecture - ingest

When a cluster is off-line then ingest is managed by the remote storage cluster and the local cluster is synchronised later

[Diagram: storage clusters, each with a DOM Storage gateway, DOM Storage Service and DOM Physical Storage, plus DOM Shared central Services (Unique ID, Signing, Logging). Arrows labelled "Store" and "Synchronise remote store later".]
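
The two ingest cases (store locally then synchronise the peer, or store remotely and synchronise the local cluster later) can be sketched with a queue of pending writes. This is a simplified illustration: the `Cluster` class, the `pending` queue and the function names are assumptions, and a real system would also need conflict handling and durable queues.

```python
from typing import Dict, List, Tuple


class Cluster:
    """One autonomous storage cluster (in-memory stand-in for illustration)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.online = True
        self.objects: Dict[str, bytes] = {}
        self.pending: List[Tuple[str, bytes]] = []  # writes owed to the peer

    def store(self, domid: str, data: bytes) -> None:
        self.objects[domid] = data


def ingest(domid: str, data: bytes, local: Cluster, remote: Cluster) -> None:
    """Store on the first available cluster, then synchronise the other."""
    primary, peer = (local, remote) if local.online else (remote, local)
    if not primary.online:
        raise ConnectionError("no cluster available for ingest")
    primary.store(domid, data)
    if peer.online:
        peer.store(domid, data)                # synchronise straight away
    else:
        primary.pending.append((domid, data))  # synchronise later


def resynchronise(source: Cluster, recovered: Cluster) -> None:
    """Replay queued writes once the off-line cluster is back."""
    while source.pending:
        domid, data = source.pending.pop(0)
        recovered.store(domid, data)
```

After `resynchronise` runs, both clusters hold the same objects again, which is the cross-synchronising peer behaviour the slides describe.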


38

In conclusion



We plan for generations of physical storage
  • Migration from one generation to the next
  • Allow changes of supplier
  • Purchase incrementally in modest quantities
  • Move quickly when required
  • Be cost conscious

We provide assurance that an object is held and re-presented as it was when it was ingested



We are designing a cost-effective large scale resilient solution

In summary: we take a long term view


39

Major Programmes/4

Web Archiving Programme

40

Structure of Programme

Web Archiving Programme is a collaborative initiative, roughly implemented across two consortia
  • UK Web Archiving Consortium
      • Developing a selective approach to web archiving, procuring a common web archiving infrastructure and software to begin archiving activities at the earliest opportunity
  • International Internet Preservation Consortium
      • Developing advanced web archiving technologies for the long-term, large-scale, continuous crawling requirements enabled through legislation

41

UK Web Archiving Consortium
Developing a selective approach to web archiving
  • Licence for PANDAS about to be signed with NLA
  • Sub-licences with consortium partners and contractor to follow
  • ITT concluded with Magus Research winning the contract
      • Implement a common web archiving infrastructure (lots of Linux machines + PANDAS)
      • Provide customisation/development of PANDAS
      • Provide help desk and support

42

International Internet Preservation
Consortium
Developing advanced web archiving technologies
  • Smart Crawler
      • Continuous adaptive crawler, adjusting crawl priority on the fly
      • Based on IA Heritrix
      • Working on requirements now
      • Expect to begin tender process in June
  • Content Management
  • Archival formats
  • Framework
  • Metrics and Test Bed
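
One way to picture a crawler "adjusting crawl priority on the fly" is a priority queue whose revisit interval shrinks for pages that change and grows for pages that do not. The sketch below is an illustration only: it is not Heritrix's algorithm or API, and the class name, the halving/doubling rule and the 24-hour default are all assumptions.

```python
import heapq
from typing import Dict, List, Tuple


class AdaptiveFrontier:
    """Crawl queue whose revisit interval adapts to how often a page changes."""

    def __init__(self, base_interval: float = 24.0) -> None:
        self.base_interval = base_interval        # hours; hypothetical default
        self._interval: Dict[str, float] = {}
        self._heap: List[Tuple[float, str]] = []  # (next due time, url)

    def add(self, url: str, now: float = 0.0) -> None:
        self._interval[url] = self.base_interval
        heapq.heappush(self._heap, (now, url))

    def next_url(self) -> Tuple[float, str]:
        """Pop the URL that is due soonest."""
        return heapq.heappop(self._heap)

    def report(self, url: str, changed: bool, now: float) -> None:
        # Changed pages are revisited sooner, unchanged pages less often.
        factor = 0.5 if changed else 2.0
        self._interval[url] *= factor
        heapq.heappush(self._heap, (now + self._interval[url], url))
```

A crawl loop would call `next_url`, fetch the page, compare it with the previous capture and pass the outcome to `report`.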


43

External Collaboration

44

Digital Library Collaborations/Partnerships
Current










UK Digital Preservation Coalition
  • Founder Member
TEL (The European Library Project)
Web Archiving UK
  • JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library Of Wales
International Internet Preservation Consortium
  • BNF, Library Of Congress, Internet Archive, National Archives & Library Of Canada, National Library Of Australia, National Library Of Italy, National Libraries Of Nordic Countries
JISC Funded - Digital Curation Centre
Persistent Identifiers
  • DOI Foundation, European National Libraries (KB & DDB)
Resource Discovery
  • Union Catalogues (SUNCAT)
Digital Library Federation

45

Digital Library Collaborations/Partnerships
Potential









Secure Legal Deposit Network
  • 6 Legal Deposit Libraries
Global Digital Format Registry
  • Potential Partners (National Archives, DLF)
Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
  • KB (Netherlands National Library) & Other Partners – FP6 Bid
Digital Rights Management
  • Potential Partners (Publishers, JISC)
Metadata
  • Publishers, Others?
Authentication
  • JISC?
Resource Discovery
  • Search Engine Vendors, Researchers
Others?

46

Conclusions
Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success




Can You Work With Us?


47


Slide 43

British Computer Society
North London Branch

Major Programmes

Richard Boulderstone
July 27, 2004
1

Agenda











The British Library
Vision
Our Audiences/Customers
ILS
Digitisation
Digital Object Management
Web Archiving
Collaboration
Conclusions

Magna Carta

www.bl.uk

2

What Is The British Library ?

Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of
Science and Invention (1855), National Central Library (1916), and National
Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India
Office Library and Records in 1982, and British Institute of Recorded Sound in
1983
• Flagship building at St Pancras - largest public building project in Great
Britain in 20th century - opened in 1998


www.bl.uk

3

World-Class Research Library
Key Statistics 2002/3













150 million items
8.2 million items consulted or supplied
408,000 reading room visits
618,000 catalogue records created
554,000 items received on legal deposit
651 km shelf capacity 92% full add 12 km each year
18.5M Web Site Hits (www.bl.uk)
2,400 staff
£85.2 million Grant in Aid and £27.0 million trading income in 2001/2
Annual report - http://www.bl.uk/about/annual/latest.html

www.bl.uk

4

‘The World’s Knowledge’
Outcome Based Vision
… by aiding scientific advances
… by adding commercial value for
businesses
… by contributing to UK “knowledge
economy”
… through the pursuit of academic
excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural
wellbeing
… by giving information relevant to their
interests
… by helping to find the next medical
breakthrough
… by creating a link between the past,
present and future

To help people
advance
knowledge to
enrich lives

www.bl.uk

5

High
R+D
Industries

Prof.
Services

Creative
Industries Publishing
Industries

Resource Discovery
Bespoke Services
Research Services
Document Supply
Reprographics
Innovation Centre

School
Students
Libraries Teachers 11>18
On-site Visits
School Tours
Web Learning

SMEs
EDUCATION

Lifelong
Learner

Exhibitions
Events
Tours
Publishing

Visitors
(child + adult)
Lifelong
Learner

PUBLIC

BUSINESS
Lifelong
Learner

Postgraduate/
Undergraduate
RESEARCHER

Librarians
Scholars

Lifelong
Learner

Reading Rooms
Bespoke Services
Reprographics
Publishing
Document Supply
Searching Tools

LIBRARIES

Commercial
Researcher

Public
Libraries

Broadcasting
e.g. BBC

Publishing
e.g. OED

Document Supply
Resource Discovery
Training
Best Practice

H.E.
Libraries

Public

www.bl.uk

6
2

Major Programmes/1
Da Vinci Notebook

Integrated Library System (ILS)
Programme

7

ILS: Development

Data migration
 Due to finish in a few days



16M+ BL records
10M+ records from other sources

Online ILS software
 All online changes made (mainly interfaces) – final tests
 Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
 Most ones done for go live
 Rest in priority order
www.bl.uk

8

ILS: Implementation

Training
 Courses to end-users well underway
 ‘Practice’ system available
 ‘Search only’ training also underway
Testing
 Functional testing (end to end) nearly complete
 Performance poor – OPAC very slow
 Automated stress testing (LoadRunner scripts)
 eIS trying to find area of problem
 Ex Libris experts flying over
 Some security ‘hardening’ needed
www.bl.uk

9

ILS: Cutover from legacy systems

Now: Temporary Aleph cataloguing

7 June: Phase 1 – internal processing
• Staggered take-on of users to ease cutover problems
• Merge ‘temporary’ records

30 June: Phase 2 – reading rooms
• Reading rooms closed for cutover 26-29 June
• Mainly brand-new PCs etc rather than XP upgrade

30 July: Phase 3 – remote users
• Could be delayed by major problems

10

11

Future ILS development (ILS/2)

Current ILS development seen just as the start

Extra records
• E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
• E.g. Preservation records
Links to other new BL systems
• E.g. Digital Object Management (images, web pages etc)

New releases of Ex Libris packages

12

Major Programmes/2

International Dunhuang Project

Digitisation Programme

13

Background

• Digitisation Is The Process Of Converting Existing Physical Items Into Digital Surrogates.
• Digitisation Projects Must Take Into Account Metadata Creation, Optical Character Recognition, Navigation, Display, Archiving, Preservation.
• Overall Cost Of Digitisation Projects Is Declining, From £10s/Image To Sub-£1/Image.
• Automatic Devices To Digitise Are Becoming Available But Are Expensive (~£150,000).
• BL Has Had A Fairly Ad Hoc Approach Driven By External Funding Opportunities And Curator Interest.
• Projects Have Generally Created Their Own Approach, IT Resources, Project Management.
• BL Has Created About 1.5M Digital Images So Far…

14

Digitisation Strategy

Digitisation Strategy Project Was Formally Initiated On February 2, 2004

Key objectives for the project are to define:
• Selection Criteria
• Uniform Approach
• Communications Plan
• Sustainability
• Intellectual Property Rights
• External Relationship Management
• Funding
• Integration with DOMS

15

Project Status Information

Definitive Register of Projects
• 19 Complete
• 19 Current
• 20 Planning

Projects include
• JISC Sound (3,900 Hours)
• JISC Newspapers (2M Pages of 750M Pages)
• Chopin (Collaborative Project)
• Early English Books Online

16

Major Programmes/3

Gutenberg Bible

Digital Object Management (DOM)
Programme

17

DOM Programme vision
Our mission is to enable the United Kingdom to preserve
and use its digital intellectual property forever

Our vision is to create a management system for digital objects
that will
• store and preserve any type of digital material in perpetuity
• provide access to this material to users with appropriate permissions
• ensure that the material is easy to find
• ensure that users can view the material with contemporary applications
• ensure that users can, where possible, experience material with the original look-and-feel

18

Introduction - history

Digital Library PFI
• Mar 1997 – Dec 1998
Digital Library System
• 1999 – early 2002
• Lessons
DOM Report
• Nov 2002
The DOM Programme
• Started September 2003

19

Drivers for the BL DOM Programme

• Legal deposit legislation for non-print material was granted royal assent in October 2003
• Existing voluntary deposit scheme operational since 2000
• Storage of digitised masters from early ’90s onwards
• New digitisation initiatives: newspapers, sound, etc
• Sound archive receives 12 TB of material per year (with 50 year collection)
• Web archiving
• Cartography and datasets
• Electronic journals, picture library
• … and ….
• …. and ….

We need a generic and cost-effective approach for the secure long term storage of digital material that is produced by numerous initiatives

20

DOM – many topics to address
[Chart: components plotted by ESTIMATED SIZE OF COMPONENT (low to high) against COMPONENT AMBIGUITY / COMPLEXITY (low to high). Legend: Started; Planned; Non-DOM projects; Planned co-operation.]

Components shown: ILS; Web Archiving; Digitisation Programme; LDEP; Workflow; Resource Discovery; LDLSE; VDEP; Rights Management; Metadata Definition; Technical Requirements; File Format Registry; Persistent Identifiers; File Conversion Utilities; Authentication; Prototypes; SDM; RADM; Interfaces; Strategy Development

LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications

21

Scope - life cycle of objects


Collection
• Selection
• Acquisition
• Accession
• Description
Preservation
• Storage
• Preservation
Access
• Resource discovery
• Delivery
• Rendering

22

Scope – objects and processes


Preservation store
• Preserves the bit stream in perpetuity
Access store
• Access versions
• Limited formats – in the flavour of the era
Metadata to support resource discovery
• Descriptive, Administrative, Links with existing tools e.g. Integrated Library System (ILS)
Workflow
• Ingest, e.g. Legal Deposit processing

23

ACCESS

DOM

Resource Discovery

Delivery

Digital Rights Management

Shared services

DOM
Storage

Signing
Authentication
Metadata
Persistent ID

Ingest

DONATIONS

DOCUMENT SUPPLY
Publishers
Archives
Grey Literature

Non-Serial
Store
Archiving
Operational
Stores

LEGAL DEPOSIT

LDL Secure
Environment

Legal Deposit
Items
Legal Deposit
Processing

WEB
ARCHIVING

DIGITISATION
St Pancras
Studios

Newspapers

NSA

24

Timeline

Release plan, 2003 to 2005:

• BC: ET approve Business Case & Timeline
• R0 (Definition): prototype will provide a basic preservation-quality digital object storage module
• R1 (Operational Storage Sub System): consolidate R0 into operational system; provide preservation-quality digital store for materials received under Voluntary Deposit of Electronic Publications (VDEP); integrate it with the existing VDEP front-end
• R2 (1st Content Stream ingest): support ingest for a major content stream; integrate with core Library systems as required
• R3 & R4 (LDEP - initial format): provide functionality for material covered by LDEP secondary legislation
• R5+: open DOM to new projects

25

DOM: Project definition - 1
Functional architecture “what”
• Prototyping – assessing market solutions
• Example issues: digital rights, file formats, etc
Logical architecture “how – overall architecture”
• Prototyping – basic functioning architecture
• Example issues: allow changes to new suppliers, relationships to ILS, other projects etc
Physical architecture “how – storage & specifics”
• Prototyping – principal solutions and options
• Example issues: how do we build it cost-effectively today, supplier selection criteria

• Business case
• Planning – incremental implementation phases

Cross team workshops – reviewing progress, debating detailed technical issues, planning immediate priorities, risk management & way forward

26

DOM: Project definition - 2

• Approach is to be incremental and not ‘Big Bang’
• We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a good solution
• A principal goal is to define an overall long term “logical architecture”, within which there will be successive generations of physical architectures
• We are building an understanding of the storage marketplace, and we will use that knowledge to manage procurement
• We are certain that we will need >500 TB of storage but we are uncertain when – we thus need flexible, scalable procurement



27

DOM architecture - overview
[Diagram labels: Resource Discovery (ILS based RD; Non-cat); Rights Management; LDEP; Doc supply; Others; DOMID; Object; DOM Storage Service (Compound objects/relations; Integrity; Authenticity); Atomic Objects; Local resource locator; DOM Physical Storage]

• Each object has a unique persistent identifier (DOMID)
• DOMID is mapped to node/vol/LRL
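
The mapping from DOMID to node/vol/LRL can be pictured as a lookup table that changes when storage is replaced while the identifier stays fixed. The sketch below is only an illustration of that idea: the `Location` and `Resolver` names, the use of a random UUID as the identifier and the example paths are all assumptions, since the slides do not give the DOMID format or any interface.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    node: str      # storage node holding the object
    volume: str    # volume on that node
    lrl: str       # local resource locator within the volume


class Resolver:
    """Hypothetical table mapping a persistent DOMID to its current location."""

    def __init__(self) -> None:
        self._table: dict[str, Location] = {}

    def register(self, location: Location) -> str:
        # Assumption: a random UUID stands in for the real DOMID scheme.
        domid = uuid.uuid4().hex
        self._table[domid] = location
        return domid

    def relocate(self, domid: str, location: Location) -> None:
        """A storage migration updates the mapping; the DOMID never changes."""
        if domid not in self._table:
            raise KeyError(domid)
        self._table[domid] = location

    def resolve(self, domid: str) -> Location:
        return self._table[domid]


resolver = Resolver()
domid = resolver.register(Location("node-1", "vol-07", "a/b/0001"))
resolver.relocate(domid, Location("node-9", "vol-02", "x/0001"))
print(resolver.resolve(domid).node)
```

Keeping the identifier separate from the physical location is what would let other systems hold a DOMID across successive generations of storage.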

28

DOM System (release 3)

[Diagram labels: Aleph; Publishers; Mailroom; Access; Administration; Shared services; Storage subsystem (shown twice); DOM System]

29

DOM logical architecture –
integrity and authenticity


Integrity:
• System has capability to continuously monitor the object store to detect object corruption
• It would then initiate object recovery
Authenticity:
• A process is defined to provide long-term assurance that an object that is re-presented is as it was when it was ingested
• Based on the use of cryptographic signing techniques
• Each object is signed when it is ingested
• The signature is verified when required
• The signing mechanism is “tightly” controlled
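
A minimal sketch of the sign-at-ingest and verify-on-demand idea follows. The slides say only that "cryptographic signing techniques" are used, so the choice of HMAC-SHA256 from the Python standard library is an assumption made to keep the example self-contained (a production system might well use public-key signatures instead). The key, function names and sample objects are hypothetical.

```python
import hashlib
import hmac

# Hypothetical key; in a real system access to it would be tightly controlled.
SIGNING_KEY = b"example-key-not-for-production"


def sign_object(data: bytes, key: bytes = SIGNING_KEY) -> str:
    """Compute a keyed SHA-256 tag over the object's bit stream at ingest."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify_object(data: bytes, signature: str, key: bytes = SIGNING_KEY) -> bool:
    """Recompute the tag and compare it with the one recorded at ingest."""
    return hmac.compare_digest(sign_object(data, key), signature)


def find_corrupt(store: dict[str, bytes], signatures: dict[str, str]) -> list[str]:
    """Integrity sweep: return identifiers whose stored bytes no longer verify."""
    return sorted(oid for oid, data in store.items()
                  if not verify_object(data, signatures[oid]))


store = {"obj-1": b"first object", "obj-2": b"second object"}
signatures = {oid: sign_object(data) for oid, data in store.items()}
store["obj-2"] = b"second objecT"   # simulate corruption of one stored object
print(find_corrupt(store, signatures))
```

In this sketch the same check serves both purposes on the slide: run periodically over the whole store it detects corruption, and run on a single object at delivery time it gives assurance that the object is as it was when ingested.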

30

Procuring physical storage in volume

• A major cost is in physical storage
• The market for storage systems is changing rapidly, and this implies that “lock-in” is not sensible
• We thus need flexibility to change supplier over time
• Cost of storage is reducing by 30-40% per year
• Hence procure on a rolling basis just ahead of demand
• Replace storage on a rolling basis on expiry of warranty
• The rolling programmes imply the need to be able to support a heterogeneous product solution
• The design of the logical architecture thus supports storage sourced from multiple storage vendors
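
The argument for rolling procurement can be checked with simple arithmetic. The sketch below compares buying all capacity up front with buying each year's capacity at that year's price, using the 30-40% annual decline quoted above. The demand profile (100 TB in each of five years) and the starting price are made-up figures for illustration only, and the model ignores warranty replacement, support and financing costs.

```python
def upfront_cost(total_tb: float, price_per_tb: float) -> float:
    """Cost of buying all capacity in year 0 at today's price."""
    return total_tb * price_per_tb


def rolling_cost(demand_tb_per_year: list[float], price_per_tb: float,
                 annual_decline: float) -> float:
    """Cost of buying each year's capacity at that year's (lower) price."""
    total = 0.0
    for year, tb in enumerate(demand_tb_per_year):
        total += tb * price_per_tb * (1.0 - annual_decline) ** year
    return total


demand = [100.0] * 5    # hypothetical: 100 TB needed in each of 5 years
price = 1000.0          # hypothetical starting price per TB

print(upfront_cost(sum(demand), price))
print(round(rolling_cost(demand, price, 0.30)))
print(round(rolling_cost(demand, price, 0.40)))
```

Under these assumed figures the rolling purchase costs roughly half as much as the up-front purchase, which is the reasoning behind procuring just ahead of demand.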

31

Disaster tolerance and the organisation
of storage clusters

• One can obtain commercial disaster recovery (DR) solutions for common equipment configurations
• However, one cannot obtain such solutions for multi-100 TB systems
• So we must build DR into the design of the system
• A single site solution, subject to a common-mode disaster, would suffer considerable loss of availability after a disaster, and so is not acceptable
• This implies that we need a multi-site solution
• Conventionally these are based on a master-standby arrangement where only 50% of kit is delivering normal service
• Our design is based on the use of multiple autonomous independent peer clusters that cross-synchronise, so 100% of the kit delivers normal service

32

DOM architecture in the context of the
storage solution market


The dominant segment of the market focuses on delivering performance within a highly resilient single cluster
However:
• Many of our objects will be rarely accessed, so we do not want to pay for “maximised” performance we do not need
• We have resilience by using multiple clusters, hence we have a reduced need for resilience within a cluster, so we do not want to pay for “maximised” resilience we do not need
We are using these drivers to design a cost-effective large scale resilient solution

33

DOM storage subsystem architecture - overview

[Diagram: two storage clusters, each with a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, alongside DOM Shared Services (Unique ID, Signing, Logging)]

34

DOM storage subsystem architecture - access

Normal access/delivery is from local storage cluster

[Diagram: two storage clusters, each with a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, alongside DOM shared/central services (Unique ID, Signing, Logging)]

35

DOM storage subsystem architecture - access

When a cluster is off-line then access/delivery is from a remote storage cluster

[Diagram: two storage clusters, each with a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, alongside DOM shared/central services (Unique ID, Signing, Logging)]

36

DOM storage subsystem architecture - ingest

Normal ingest is to the local storage cluster and then the remote cluster is synchronised

[Diagram: two storage clusters, each with a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, alongside DOM shared/central services (Unique ID, Signing, Logging). Arrows: Store; Synchronise remote store]

37

DOM storage subsystem architecture - ingest

When a cluster is off-line then ingest is managed by the remote storage cluster and the local cluster is synchronised later

[Diagram: two storage clusters, each with a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, alongside DOM shared/central services (Unique ID, Signing, Logging). Arrows: Store; Synchronise remote store later]
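
The access and ingest behaviour on the preceding four slides can be summarised in a small in-memory model. This is a sketch under assumptions rather than the DOM design itself: the `Cluster` class, the `ingest` and `fetch` functions and the two-site setup are hypothetical, and real synchronisation would involve networking, signing and logging that are left out here.

```python
class Cluster:
    """One autonomous storage cluster (hypothetical in-memory model)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.online = True
        self.objects: dict[str, bytes] = {}
        # (peer, object id) pairs that still have to be copied to a peer
        self.pending: list[tuple["Cluster", str]] = []

    def ingest(self, oid: str, data: bytes, peer: "Cluster") -> None:
        """Store locally, then synchronise the peer now or queue it for later."""
        self.objects[oid] = data
        self.pending.append((peer, oid))
        self.flush()

    def flush(self) -> None:
        """Push queued objects to any peer that is currently on-line."""
        remaining = []
        for peer, oid in self.pending:
            if peer.online:
                peer.objects[oid] = self.objects[oid]
            else:
                remaining.append((peer, oid))
        self.pending = remaining


def ingest(local: Cluster, remote: Cluster, oid: str, data: bytes) -> str:
    """Ingest at the local cluster if it is up, otherwise at the remote one."""
    target, peer = (local, remote) if local.online else (remote, local)
    if not target.online:
        raise RuntimeError("no cluster available")
    target.ingest(oid, data, peer)
    return target.name


def fetch(local: Cluster, remote: Cluster, oid: str) -> bytes:
    """Deliver from the local cluster, falling back to the remote one."""
    for cluster in (local, remote):
        if cluster.online and oid in cluster.objects:
            return cluster.objects[oid]
    raise KeyError(oid)


a, b = Cluster("site-a"), Cluster("site-b")
ingest(a, b, "obj-1", b"payload")   # stored at site-a, then copied to site-b
a.online = False
ingest(a, b, "obj-2", b"more")      # site-a is down, so site-b takes the ingest
a.online = True
b.flush()                           # site-a is synchronised later
```

Because both clusters accept ingest and serve delivery, neither sits idle as a standby, which matches the "100% of the kit delivers normal service" point made earlier.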

38

In conclusion

We plan for generations of physical storage
• Migration from one generation to the next
• Allow changes of supplier
• Purchase incrementally in modest quantities
• Move quickly when required
• Be cost conscious

We provide assurance that an object is held and re-presented as it was when it was ingested

We are designing a cost-effective large scale resilient solution

In summary: we take a long term view

39

Major Programmes/4

Web Archiving Programme

40

Structure of Programme

Web Archiving Programme is a collaborative
initiative, broadly implemented across two
consortia
UK Web Archiving Consortium
• Developing a selective approach to web archiving, procuring a common web archiving infrastructure and software to begin archiving activities as early as possible
International Internet Preservation Consortium
• Developing advanced web archiving technologies for the long-term, large scale, continuous crawling requirements enabled through legislation

41

UK Web Archiving Consortium
Developing a selective approach to web archiving
• License for PANDAS about to be signed with NLA
• Sub-licenses with consortium partners and contractor to follow
• ITT concluded with Magus Research winning the contract.
• Implement a common web archiving infrastructure (lots of Linux machines + PANDAS)
• Provide customisation/development of PANDAS
• Provide help desk and support

42

International Internet Preservation
Consortium
Developing advanced web archiving technologies
Smart Crawler
• Continuous adaptive crawler, adjusting crawl priority on the fly
• Based on IA Heritrix
• Working on requirements now
• Expect to begin tender process in June
Content Management
Archival formats
Framework
Metrics and Test Bed
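
One way to picture "adjusting crawl priority on the fly" is a scheduler that revisits pages sooner when they were found to have changed and later when they were not. The sketch below illustrates that general idea only. It is not the Heritrix API or the IIPC design, and the class name, the halving/doubling rule, the interval bounds and the example URLs are all assumptions.

```python
import heapq


class AdaptiveScheduler:
    """Revisit pages that change often sooner, and stable pages later."""

    def __init__(self, min_interval: float = 1.0, max_interval: float = 64.0) -> None:
        self.min_interval = min_interval
        self.max_interval = max_interval
        self.interval: dict[str, float] = {}
        self.queue: list[tuple[float, str]] = []   # (time the visit is due, url)

    def add(self, url: str, now: float = 0.0, interval: float = 8.0) -> None:
        """Register a URL with an initial revisit interval, due immediately."""
        self.interval[url] = interval
        heapq.heappush(self.queue, (now, url))

    def next_url(self) -> tuple[float, str]:
        """Pop the URL whose visit is due earliest."""
        return heapq.heappop(self.queue)

    def report(self, url: str, visited_at: float, changed: bool) -> None:
        """Adjust the revisit interval from the crawl result and requeue the URL."""
        factor = 0.5 if changed else 2.0
        new = min(self.max_interval,
                  max(self.min_interval, self.interval[url] * factor))
        self.interval[url] = new
        heapq.heappush(self.queue, (visited_at + new, url))


scheduler = AdaptiveScheduler()
scheduler.add("http://example.org/news")
scheduler.add("http://example.org/about")
for _ in range(2):
    due, url = scheduler.next_url()
    # Pretend the news page had changed and the about page had not.
    scheduler.report(url, visited_at=due, changed=url.endswith("/news"))
first = scheduler.next_url()
```

After one round, the page that changed is due again before the page that did not, so crawl effort shifts towards frequently updated content.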

43

External Collaboration

44

Digital Library Collaborations/Partnerships
Current

UK Digital Preservation Coalition
• Founder Member
TEL (The European Library Project)
Web Archiving UK
• JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library Of Wales
International Internet Preservation Consortium
• BNF, Library Of Congress, Internet Archive, National Archives & Library Of Canada, National Library Of Australia, National Library Of Italy, National Libraries Of Nordic Countries
JISC Funded - Digital Curation Centre
Persistent Identifiers
• DOI Foundation, European National Libraries (KB & DDB)
Resource Discovery
• Union Catalogues (SUNCAT)
Digital Library Federation

45

Digital Library Collaborations/Partnerships
Potential

Secure Legal Deposit Network
• 6 Legal Deposit Libraries
Global Digital Format Registry
• Potential Partners (National Archives, DLF)
Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
• KB (Netherlands National Library & Other Partners – FP6 Bid)
Digital Rights Management
• Potential Partners (Publishers, JISC)
Metadata
• Publishers, Others ?
Authentication
• JISC ?
Resource Discovery
• Search Engine Vendors, Researchers
Others ???

46

Conclusions
Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success




Can You Work With Us?

47


Slide 44

British Computer Society
North London Branch

Major Programmes

Richard Boulderstone
July 27, 2004
1

Agenda











The British Library
Vision
Our Audiences/Customers
ILS
Digitisation
Digital Object Management
Web Archiving
Collaboration
Conclusions

Magna Carta

www.bl.uk

2

What Is The British Library ?

Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of
Science and Invention (1855), National Central Library (1916), and National
Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India
Office Library and Records in 1982, and British Institute of Recorded Sound in
1983
• Flagship building at St Pancras - largest public building project in Great
Britain in 20th century - opened in 1998


www.bl.uk

3

World-Class Research Library
Key Statistics 2002/3













150 million items
8.2 million items consulted or supplied
408,000 reading room visits
618,000 catalogue records created
554,000 items received on legal deposit
651 km shelf capacity 92% full add 12 km each year
18.5M Web Site Hits (www.bl.uk)
2,400 staff
£85.2 million Grant in Aid and £27.0 million trading income in 2001/2
Annual report - http://www.bl.uk/about/annual/latest.html

www.bl.uk

4

‘The World’s Knowledge’
Outcome Based Vision
… by aiding scientific advances
… by adding commercial value for
businesses
… by contributing to UK “knowledge
economy”
… through the pursuit of academic
excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural
wellbeing
… by giving information relevant to their
interests
… by helping to find the next medical
breakthrough
… by creating a link between the past,
present and future

To help people
advance
knowledge to
enrich lives

www.bl.uk

5

High
R+D
Industries

Prof.
Services

Creative
Industries Publishing
Industries

Resource Discovery
Bespoke Services
Research Services
Document Supply
Reprographics
Innovation Centre

School
Students
Libraries Teachers 11>18
On-site Visits
School Tours
Web Learning

SMEs
EDUCATION

Lifelong
Learner

Exhibitions
Events
Tours
Publishing

Visitors
(child + adult)
Lifelong
Learner

PUBLIC

BUSINESS
Lifelong
Learner

Postgraduate/
Undergraduate
RESEARCHER

Librarians
Scholars

Lifelong
Learner

Reading Rooms
Bespoke Services
Reprographics
Publishing
Document Supply
Searching Tools

LIBRARIES

Commercial
Researcher

Public
Libraries

Broadcasting
e.g. BBC

Publishing
e.g. OED

Document Supply
Resource Discovery
Training
Best Practice

H.E.
Libraries

Public

www.bl.uk

6
2

Major Programmes/1
Da Vinci Notebook

Integrated Library System (ILS)
Programme

7

ILS: Development

Data migration
 Due to finish in a few days



16M+ BL records
10M+ records from other sources

Online ILS software
 All online changes made (mainly interfaces) – final tests
 Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
 Most ones done for go live
 Rest in priority order
www.bl.uk

8

ILS: Implementation

Training
 Courses to end-users well underway
 ‘Practice’ system available
 ‘Search only’ training also underway
Testing
 Functional testing (end to end) nearly complete
 Performance poor – OPAC very slow
 Automated stress testing (LoadRunner scripts)
 eIS trying to find area of problem
 Ex Libris experts flying over
 Some security ‘hardening’ needed
www.bl.uk

9

ILS: Cutover from legacy systems

Now:

Temporary Aleph cataloguing

Phase 1 – internal processing
 Staggered take-on of users to ease cutover problems
 Merge ‘temporary’ records

7 June:

Phase 2 – reading rooms
 Reading rooms closed for cutover 26-29 June
 Mainly brand-new PCs etc rather than XP upgrade

30 June:

Phase 3 – remote users
 Could be delayed major problems

30 July:

www.bl.uk

10

www.bl.uk

11

Future ILS development (ILS/2)

Current ILS development seen just as the start

Extra records
 E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
 E.g. Preservation records
Links to other new BL systems
 E.g. Digital Object Management (images, web pages etc)

New releases of Ex Libris packages

www.bl.uk

12

Major Programmes/2

International Dunhuang Project

Digitisation Programme

13

Background










Digitisation Is The Process Of Converting Existing Physical Items Into Digital
Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character
Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image to Sub £1/Image.
Automatic Devices To Digitise Are Becoming Availalble But Are Expensive ~£150,000.
BL Has Had Fairly Ad Hoc Approach Driven By
 External Funding Opportunities
 Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project
Management
BL Has Created About 1.5M Digital Images So Far…

www.bl.uk

14

Digitisation Strategy



Digitisation Strategy Project Was Formerly Initiated On February 2, 2004


Key objectives for the project are to define:










Selection Criteria
Uniform Approach
Communications Plan
Sustainability
Intellectual Property Rights
External Relationship Management
Funding
Integration with DOMS

www.bl.uk

15

Project Status Information


Definitive Register of Projects




19 Complete
19 Current
20 Planning



JISC Sound (3,900 Hours)



JISC Newspapers (2M Pages of 750M Pages)



Chopin (Collaborative Project)



Early English Books Online
www.bl.uk

16

Major Programmes/3

Gutenberg Bible

Digital Object Management (DOM)
Programme

17

DOM Programme vision
Our mission is to enable the United Kingdom to preserve
and use its digital intellectual property forever

Our vision is create a management system for digital objects
that will
 store and preserve any type of digital material in perpetuity
 provide access to this material to users with appropriate permissions
 ensure that the material is easy to find
 ensure that users can view the material with contemporary applications
 ensure that users can, where possible, experience material with the original look-and-feel

www.bl.uk

18

Introduction - history

Digital Library PFI
 Mar 1997 – Dec 1998
Digital Library System
 1999 – early 2002
 Lessons
DOM Report
 Nov 2002
The DOM Programme
 Started September 2003

www.bl.uk

19

Drivers for the BL DOM Programme












Legal deposit legislation for non-print material was granted royal assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12T of material per year (with 50 year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….

We need a generic and cost-effective approach for the secure long term
storage of digital material that is produced by numerous initiatives

www.bl.uk

20

DOM – many topics to address
HIGH

ILS
WEB
ARCHIVING

DIGITISATION
PROGRAMME

ESTIMATED SIZE OF COMPONENT

LDEP

WORKFLOW
RESOURCE
DISCOVERY
LDLSE

VDEP

RIGHTS
MANAGEMENT
METADATA
DEFINITION

TECHNICAL
REQUIREMENTS

FILE
FORMAT
REGISTRY

PERSISTENT
IDENTIFIERS

FILE
CONVERSION
UTILITIES

AUTHENTICATION
PROTOTYPES

SDM
RADM

INTERFACES

STRATEGY
DEVELOPMENT

LOW
LOW

COMPONENT AMBIGUITY / COMPLEXITY

Started

Planned

Non-DOM projects

Planned co-operation

HIGH
LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications

www.bl.uk

21

Scope - life cycle of objects


Collection
 Selection
 Acquisition
 Accession
 Description
 Preservation
 Storage
 Preservation
 Access
 Resource discovery
 Delivery
 Rendering
www.bl.uk

22

Scope – objects and processes


Preservation store
 Preserves the bit stream in perpetuity
 Access store
 Access versions
 Limited formats – in the flavour of the era
 Metadata to support resource discovery
 Descriptive, Administrative, Links with existing tools e.g. Integrated
Library System (ILS)
 Workflow
 Ingest, e.g. Legal Deposit processing

www.bl.uk

23

ACCESS

DOM

Resource Discovery

Delivery

Digital Rights Management

Shared services

DOM
Storage

Signing
Authentication
Metadata
Persistent ID

Ingest

DONATIONS

DOCUMENT SUPPLY
Publishers
Archives
Grey Literature

Non-Serial
Store
Archiving
Operational
Stores

LEGAL DEPOSIT

LDL Secure
Environment

Legal Deposit
Items
Legal Deposit
Processing

WEB
ARCHIVING

DIGITISATION
St Pancras
Studios

Newspapers

NSA
www.bl.uk

24

Timeline

Prototype will provide a
basic preservationquality digital object
storage module

Definition. R0

ET approve
Business Case
& Timeline

•Consolidate R0 into operational
system
•Provide preservation-quality digital
store for materials received under
Voluntary Deposit of Electronic
Publications (VDEP)
•Integrate it with the existing VDEP
front-end

BC
R0
Operat’l Storage
Sub System. R1

•Support ingest for a
major content stream
•Integrate with core
Library systems as
required

R1

1st Content
Stream ingest. R2

R2

LDEP - initial
format. R3 & R4

Provide functionality
for material covered by
LDEP secondary
legislation

R3
R4

Open DOM to new
projects. R5+

2003

2004

2005
www.bl.uk

25

DOM: Project definition - 1
Example issues
Functional Architecture “What”

digital rights, file
formats, etc

Prototyping – assessing market solutions

allow changes to
new suppliers,
relationships to ILS,
other projects etc

Logical architecture “how – overall architecture”
Prototyping - basic functioning architecture

how do we build it
cost-effectively
today, supplier
selection criteria

Physical architecture “how – storage & specifics”
Prototyping - principal solutions and options

• Business case
• Planning – incremental
implementation phases

Cross team workshops – reviewing progress, debating detailed technical
issues, planning immediate priorities, risk management & way forward
www.bl.uk

26

DOM: Project definition - 2



Approach is to be incremental and not ‘Big Bang’
We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a
good solution



A principal goal is to define:
 An overall long term “logical architecture”
 Within which, there will be successive generations of physical architectures



We are understanding the storage marketplace, and we will use the knowledge to manage
procurement
We are certain that we will need >500T of storage but we are uncertain when – we thus need
flexible scalable procurement



www.bl.uk

27

DOM architecture - overview
Resource Discovery
Non-cat
ILS
based RD

Others

Rights Management
LDEP

DOMID

OBJECT

DOM Storage Service

Compound
objects/relations
Integrity

Authenticity

Local resource locator

Object

Atomic
Objects

Unique persistent
identifier (DOMID)

Doc
supply

DOMID is
mapped to
node/vol/LRL

DOM Physical Storage

www.bl.uk

28

DOM System (release 3)

Aleph

Storage subsystem
Storage
subsystem

Access

Mailroom

Publishers

Shared services

Administration
DOM System
www.bl.uk

29

DOM logical architecture –
integrity and authenticity


Integrity:
 System has capability to continuously monitor the object store to detect
object corruption
 It would then initiate object recovery
 Authenticity:
 A process is defined to provide long-term assurance that an object that
is re-presented is as it was when it was ingested
 Based on the use of cryptographic signing techniques
 Each object is signed when it is ingested
 The signature is verified when required
 The signing mechanism is “tightly” controlled

www.bl.uk

30

Procuring physical storage in volume









A major cost is in physical storage
The market for storage systems is changing rapidly, and this implies that “lock-in” is
not sensible
We thus need flexibility to change supplier over time
Cost of storage is reducing by 30-40% per year
Hence procure on rolling basis just ahead of demand
Replace storage on a rolling basis on expiry of warranty
The rolling programmes imply the need to be able to support a heterogeneous
product solution
The design of the logical architecture thus supports storage sourced from multiple
storage vendors

www.bl.uk

31

Disaster tolerance and the organisation
of storage clusters









One can obtain commercial disaster recovery (DR) solutions for common equipment
configurations
However one cannot obtain such solutions for systems comprising multi-100 Tb systems
So we must build in the need for DR into the design of the system
A single site solution, subject to a common-mode disaster, would suffer considerable loss
of availability after a disaster, and so is not acceptable
This implies that we need a multi-site solution
Conventionally these are based on a master-standby where only 50% of kit is delivering
normal service
Our design is based on the use of multiple autonomous independent peer clusters that
cross-synchronise
so 100% of the kit delivers normal service

www.bl.uk

32

DOM architecture in the context of the
storage solution market


The dominant segment of the market focuses on delivering performance within
a highly resilient single cluster
 However:
 Many of our objects will be rarely accessed
 so we do not want to pay for “maximised” performance we do not need
 We have resilience by using multiple clusters, hence we have a reduced
need for resilience within a cluster
 so we do not want to pay for “maximised” resilience we do not need
 We are using these drivers to design a cost-effective large scale resilient
solution

www.bl.uk

33

DOM storage subsystem architecture - overview

DOM
Storage
gateway

DOM
Storage
gateway

DOM
Storage
Service

DOM
Storage
Service

DOM
Physical
Storage
Storage cluster

DOM
Shared
Services
• Unique ID
• Signing
• Logging

DOM
Physical
Storage
Storage cluster

www.bl.uk

34

DOM storage subsystem architecture - access

DOM
Storage
gateway

Normal access/delivery
is from local storage
cluster

DOM
Storage
Service

DOM
Physical
Storage
Storage cluster

DOM
Storage
gateway

DOM
Storage
Service

••
••
••

DOM
DOM
Shared
central
Services
Unique
Unique ID
ID
Signing
Signing
Logging
Logging

DOM
Physical
Storage
Storage cluster

www.bl.uk

35

DOM storage subsystem architecture - access

DOM
Storage
gateway

When a cluster is off-line
then access/delivery is
from a remote storage
cluster

DOM
Storage
Service

DOM
Physical
Storage
Storage cluster

DOM
Storage
gateway

DOM
Storage
Service

••
••
••

DOM
DOM
Shared
central
Services
Unique
Unique ID
ID
Signing
Signing
Logging
Logging

DOM
Physical
Storage
Storage cluster

www.bl.uk

36

DOM storage subsystem architecture - ingest

DOM
Storage
gateway

DOM
Storage
Service

Normal ingest is to the
local storage cluster and
then the remote cluster
is synchronised

Synchronise remote store

Store

DOM
Physical
Storage
Storage cluster

••
••
••

DOM
DOM
Shared
central
Services
Unique
Unique ID
ID
Signing
Signing
Logging
Logging

DOM
Storage
gateway

DOM
Storage
Service

DOM
Physical
Storage
Storage cluster

www.bl.uk

37

DOM storage subsystem architecture - ingest

DOM
Storage
gateway

DOM
Storage
Service

When a cluster is off-line
then ingest is managed by
the remote storage cluster
and the local cluster is
synchronised later

Synchronise remote store later

Store

DOM
Physical
Storage
Storage cluster

••
••
••

DOM
DOM
Shared
central
Services
Unique
Unique ID
ID
Signing
Signing
Logging
Logging

DOM
Storage
gateway

DOM
Storage
Service

DOM
Physical
Storage
Storage cluster

www.bl.uk

38

In conclusion



We plan for generations of physical storage
 Migration from one generation to the next
 Allow changes of supplier
 Purchase incrementally in modest quantities
 Move quickly when required
 Be cost conscious



We provide assurance that an object is held and re-presented as when it was
ingested



We are designing a cost-effective large scale resilient solution

In summary: we take a long term view

www.bl.uk

39

Major Programmes/4

Web Archiving Programme

40

Structure of Programme

Web Archiving Programme is a collaborative
initiative, roughly implemented across two
consortiums
 UK Web Archiving Consortium
 Developing a selective approach to web archiving, procuring a common
web archiving infrastructure and software to begin archiving activities at
the earliest
 International Internet Preservation Consortium
 Developing advanced web archiving technologies for the long terms,
large scale, continuous crawling requirements enabled through
legislation

www.bl.uk

41

UK Web Archiving Consortium
Developing a selective approach to web archiving
 License for PANDAS about to be signed with NLA
 Sub-licenses with consortium partners and contractor to follow
 ITT concluded with Magus Research winning the contract.
 Implement a common web archiving infrastructure (lots of Linux
machines + PANDAS)
 Provide customisation/development of PANDAS
 Provide help desk and support

www.bl.uk

42

International Internet Preservation
Consortium
Developing advanced web archiving technologies
 Smart Crawler
 Continuous adaptive crawler, adjusting crawl priority on the fly
 Based on IA Heritrix
 Working on requirements now
 Expect to begin tender process in June
 Content Management
 Archival formats
 Framework
 Metrics and Test Bed

www.bl.uk

43

External Collaboration

44

Digital Library Collaborations/Partnerships
Current










UK Digital Preservation Coalition
 Founder Member
TEL (The European Library Project)
Web Archiving UK
 JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library Of Wales
International Internet Preservation Consortium
 BNF, Library Of Congress, Internet Archive, National Archives & Library Of Canada, National Library Of
Australia, National Library Of Italy, National Libraries Of Nordic Countries
JISC Funded - Digital Curation Centre
Persistent Identifiers
 DOI Foundation, European National Libraries (KB & DDB)
Resource Discovery
 Union Catalogues (SUNCAT)
Digital Library Federation

www.bl.uk

45

Digital Library Collaborations/Partnerships
Potential









Secure Legal Deposit Network
 6 Legal Deposit Libraries
Global Digital Format Registry
 Potential Partners (National Archives, DLF)
Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
 KB (Netherlands National Library & Other Partners – FP6 Bid)
Digital Rights Management
 Potential Partners (Publishers, JISC)
Metadata
 Publishers, Others ?
Authentication
 JISC ?
Resource Discovery
 Search Engine Vendors, Researchers
Others ???

www.bl.uk

46

Conclusions
Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success




Can You Work With Us?

www.bl.uk

47


Slide 45

British Computer Society
North London Branch

Major Programmes

Richard Boulderstone
July 27, 2004
1

Agenda











The British Library
Vision
Our Audiences/Customers
ILS
Digitisation
Digital Object Management
Web Archiving
Collaboration
Conclusions

Magna Carta

www.bl.uk

2

What Is The British Library ?

Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of
Science and Invention (1855), National Central Library (1916), and National
Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India
Office Library and Records in 1982, and British Institute of Recorded Sound in
1983
• Flagship building at St Pancras - largest public building project in Great
Britain in 20th century - opened in 1998


www.bl.uk

3

World-Class Research Library
Key Statistics 2002/3













150 million items
8.2 million items consulted or supplied
408,000 reading room visits
618,000 catalogue records created
554,000 items received on legal deposit
651 km shelf capacity, 92% full, adding 12 km each year
18.5M Web Site Hits (www.bl.uk)
2,400 staff
£85.2 million Grant in Aid and £27.0 million trading income in 2001/2
Annual report - http://www.bl.uk/about/annual/latest.html

www.bl.uk

4

‘The World’s Knowledge’
Outcome Based Vision
… by aiding scientific advances
… by adding commercial value for
businesses
… by contributing to UK “knowledge
economy”
… through the pursuit of academic
excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural
wellbeing
… by giving information relevant to their
interests
… by helping to find the next medical
breakthrough
… by creating a link between the past,
present and future

To help people
advance
knowledge to
enrich lives

www.bl.uk

5

High
R+D
Industries

Prof.
Services

Creative
Industries Publishing
Industries

Resource Discovery
Bespoke Services
Research Services
Document Supply
Reprographics
Innovation Centre

School
Students
Libraries Teachers 11>18
On-site Visits
School Tours
Web Learning

SMEs
EDUCATION

Lifelong
Learner

Exhibitions
Events
Tours
Publishing

Visitors
(child + adult)
Lifelong
Learner

PUBLIC

BUSINESS
Lifelong
Learner

Postgraduate/
Undergraduate
RESEARCHER

Librarians
Scholars

Lifelong
Learner

Reading Rooms
Bespoke Services
Reprographics
Publishing
Document Supply
Searching Tools

LIBRARIES

Commercial
Researcher

Public
Libraries

Broadcasting
e.g. BBC

Publishing
e.g. OED

Document Supply
Resource Discovery
Training
Best Practice

H.E.
Libraries

Public

www.bl.uk

6
2

Major Programmes/1
Da Vinci Notebook

Integrated Library System (ILS)
Programme

7

ILS: Development

Data migration
 Due to finish in a few days



16M+ BL records
10M+ records from other sources

Online ILS software
 All online changes made (mainly interfaces) – final tests
 Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
 Most done for go-live
 Rest in priority order
www.bl.uk

8

ILS: Implementation

Training
 Courses to end-users well underway
 ‘Practice’ system available
 ‘Search only’ training also underway
Testing
 Functional testing (end to end) nearly complete
 Performance poor – OPAC very slow
 Automated stress testing (LoadRunner scripts)
 eIS trying to find area of problem
 Ex Libris experts flying over
 Some security ‘hardening’ needed
www.bl.uk

9

ILS: Cutover from legacy systems

Now:
Temporary Aleph cataloguing

7 June:
Phase 1 – internal processing
 Staggered take-on of users to ease cutover problems
 Merge ‘temporary’ records

30 June:
Phase 2 – reading rooms
 Reading rooms closed for cutover 26-29 June
 Mainly brand-new PCs etc rather than XP upgrade

30 July:
Phase 3 – remote users
 Could be delayed if there are major problems

www.bl.uk

10

www.bl.uk

11

Future ILS development (ILS/2)

Current ILS development seen just as the start

Extra records
 E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
 E.g. Preservation records
Links to other new BL systems
 E.g. Digital Object Management (images, web pages etc)

New releases of Ex Libris packages

www.bl.uk

12

Major Programmes/2

International Dunhuang Project

Digitisation Programme

13

Background










Digitisation Is The Process Of Converting Existing Physical Items Into Digital
Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character
Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image to Sub £1/Image.
Automatic Devices To Digitise Are Becoming Available But Are Expensive (~£150,000).
BL Has Had Fairly Ad Hoc Approach Driven By
 External Funding Opportunities
 Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project
Management
BL Has Created About 1.5M Digital Images So Far…

www.bl.uk

14

Digitisation Strategy



Digitisation Strategy Project Was Formally Initiated On February 2, 2004


Key objectives for the project are to define:










Selection Criteria
Uniform Approach
Communications Plan
Sustainability
Intellectual Property Rights
External Relationship Management
Funding
Integration with DOMS

www.bl.uk

15

Project Status Information


Definitive Register of Projects




19 Complete
19 Current
20 Planning



JISC Sound (3,900 Hours)



JISC Newspapers (2M Pages of 750M Pages)



Chopin (Collaborative Project)



Early English Books Online
www.bl.uk

16

Major Programmes/3

Gutenberg Bible

Digital Object Management (DOM)
Programme

17

DOM Programme vision
Our mission is to enable the United Kingdom to preserve
and use its digital intellectual property forever

Our vision is to create a management system for digital objects
that will
 store and preserve any type of digital material in perpetuity
 provide access to this material to users with appropriate permissions
 ensure that the material is easy to find
 ensure that users can view the material with contemporary applications
 ensure that users can, where possible, experience material with the original look-and-feel

www.bl.uk

18

Introduction - history

Digital Library PFI
 Mar 1997 – Dec 1998
Digital Library System
 1999 – early 2002
 Lessons
DOM Report
 Nov 2002
The DOM Programme
 Started September 2003

www.bl.uk

19

Drivers for the BL DOM Programme












Legal deposit legislation for non-print material was granted royal assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12T of material per year (with 50 year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….

We need a generic and cost-effective approach for the secure long term
storage of digital material that is produced by numerous initiatives

www.bl.uk

20

DOM – many topics to address
HIGH

ILS
WEB
ARCHIVING

DIGITISATION
PROGRAMME

ESTIMATED SIZE OF COMPONENT

LDEP

WORKFLOW
RESOURCE
DISCOVERY
LDLSE

VDEP

RIGHTS
MANAGEMENT
METADATA
DEFINITION

TECHNICAL
REQUIREMENTS

FILE
FORMAT
REGISTRY

PERSISTENT
IDENTIFIERS

FILE
CONVERSION
UTILITIES

AUTHENTICATION
PROTOTYPES

SDM
RADM

INTERFACES

STRATEGY
DEVELOPMENT

LOW
LOW

COMPONENT AMBIGUITY / COMPLEXITY

Started

Planned

Non-DOM projects

Planned co-operation

HIGH
LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications

www.bl.uk

21

Scope - life cycle of objects


Collection
 Selection
 Acquisition
 Accession
 Description
 Preservation
 Storage
 Preservation
 Access
 Resource discovery
 Delivery
 Rendering
www.bl.uk

22

Scope – objects and processes


Preservation store
 Preserves the bit stream in perpetuity
 Access store
 Access versions
 Limited formats – in the flavour of the era
 Metadata to support resource discovery
 Descriptive, Administrative, Links with existing tools e.g. Integrated
Library System (ILS)
 Workflow
 Ingest, e.g. Legal Deposit processing

www.bl.uk

23

ACCESS

DOM

Resource Discovery

Delivery

Digital Rights Management

Shared services

DOM
Storage

Signing
Authentication
Metadata
Persistent ID

Ingest

DONATIONS

DOCUMENT SUPPLY
Publishers
Archives
Grey Literature

Non-Serial
Store
Archiving
Operational
Stores

LEGAL DEPOSIT

LDL Secure
Environment

Legal Deposit
Items
Legal Deposit
Processing

WEB
ARCHIVING

DIGITISATION
St Pancras
Studios

Newspapers

NSA
www.bl.uk

24

Timeline

Prototype will provide a
basic preservation-quality digital object
storage module

Definition. R0

ET approve
Business Case
& Timeline

•Consolidate R0 into operational
system
•Provide preservation-quality digital
store for materials received under
Voluntary Deposit of Electronic
Publications (VDEP)
•Integrate it with the existing VDEP
front-end

BC
R0
Operat’l Storage
Sub System. R1

•Support ingest for a
major content stream
•Integrate with core
Library systems as
required

R1

1st Content
Stream ingest. R2

R2

LDEP - initial
format. R3 & R4

Provide functionality
for material covered by
LDEP secondary
legislation

R3
R4

Open DOM to new
projects. R5+

2003

2004

2005
www.bl.uk

25

DOM: Project definition - 1
Example issues
Functional Architecture “What”

digital rights, file
formats, etc

Prototyping – assessing market solutions

allow changes to
new suppliers,
relationships to ILS,
other projects etc

Logical architecture “how – overall architecture”
Prototyping - basic functioning architecture

how do we build it
cost-effectively
today, supplier
selection criteria

Physical architecture “how – storage & specifics”
Prototyping - principal solutions and options

• Business case
• Planning – incremental
implementation phases

Cross team workshops – reviewing progress, debating detailed technical
issues, planning immediate priorities, risk management & way forward
www.bl.uk

26

DOM: Project definition - 2



Approach is to be incremental and not ‘Big Bang’
We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a
good solution



A principal goal is to define:
 An overall long term “logical architecture”
 Within which, there will be successive generations of physical architectures



We are building an understanding of the storage marketplace, and we will use that knowledge to manage
procurement
We are certain that we will need >500T of storage but we are uncertain when – we thus need
flexible scalable procurement



www.bl.uk

27

DOM architecture - overview
Resource Discovery
Non-cat
ILS
based RD

Others

Rights Management
LDEP

DOMID

OBJECT

DOM Storage Service

Compound
objects/relations
Integrity

Authenticity

Local resource locator

Object

Atomic
Objects

Unique persistent
identifier (DOMID)

Doc
supply

DOMID is
mapped to
node/vol/LRL

DOM Physical Storage
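
The mapping shown in this diagram (a persistent DOMID resolved to node/vol/LRL) can be pictured as a small lookup table. The sketch below is illustrative only: the class names `Location` and `DomIdResolver`, the field names, and the identifier format are assumptions made for the example, not part of the DOM design described in the slides.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Location:
    """Where one copy of an object physically lives (hypothetical fields)."""
    node: str    # storage node within a cluster
    volume: str  # volume on that node
    lrl: str     # local resource locator, e.g. a path on the volume


class DomIdResolver:
    """Maps a persistent DOMID to its current physical locations."""

    def __init__(self) -> None:
        self._table: Dict[str, List[Location]] = {}

    def register(self, domid: str, location: Location) -> None:
        self._table.setdefault(domid, []).append(location)

    def relocate(self, domid: str, old: Location, new: Location) -> None:
        # Storage migration changes the location, never the DOMID.
        copies = self._table[domid]
        copies[copies.index(old)] = new

    def resolve(self, domid: str) -> List[Location]:
        return list(self._table.get(domid, []))


resolver = DomIdResolver()
first = Location("node-a", "vol-01", "/objects/0001")
resolver.register("domid:0001", first)
resolver.relocate("domid:0001", first, Location("node-b", "vol-07", "/objects/0001"))
print(resolver.resolve("domid:0001")[0].node)  # node-b
```

Keeping the identifier separate from the location is what would let a later generation of physical storage replace an earlier one without touching anything that refers to the object.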

www.bl.uk

28

DOM System (release 3)

Aleph

Storage subsystem
Storage
subsystem

Access

Mailroom

Publishers

Shared services

Administration
DOM System
www.bl.uk

29

DOM logical architecture –
integrity and authenticity


Integrity:
 System has capability to continuously monitor the object store to detect
object corruption
 It would then initiate object recovery
 Authenticity:
 A process is defined to provide long-term assurance that an object that
is re-presented is as it was when it was ingested
 Based on the use of cryptographic signing techniques
 Each object is signed when it is ingested
 The signature is verified when required
 The signing mechanism is “tightly” controlled
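
The sign-at-ingest, verify-on-demand idea above can be sketched in a few lines. This is a minimal illustration, not the DOM signing service: it uses a keyed SHA-256 digest (HMAC) from the Python standard library as a stand-in for whatever signing technique was actually chosen, and the key, function names and in-memory store are all assumptions.

```python
import hashlib
import hmac
from typing import Dict, List

SIGNING_KEY = b"demo-key-held-by-the-signing-service"  # hypothetical key


def sign(data: bytes) -> str:
    """Return a keyed SHA-256 digest of the object's bit stream."""
    return hmac.new(SIGNING_KEY, data, hashlib.sha256).hexdigest()


def verify(data: bytes, signature: str) -> bool:
    """Check that the bit stream still matches the signature made at ingest."""
    return hmac.compare_digest(sign(data), signature)


def ingest(store: Dict[str, bytes], signatures: Dict[str, str],
           object_id: str, data: bytes) -> None:
    store[object_id] = data
    signatures[object_id] = sign(data)  # signed once, at ingest


def audit(store: Dict[str, bytes], signatures: Dict[str, str]) -> List[str]:
    """Return the IDs of objects whose stored bytes no longer verify."""
    return [oid for oid, data in store.items()
            if not verify(data, signatures[oid])]


store: Dict[str, bytes] = {}
signatures: Dict[str, str] = {}
ingest(store, signatures, "obj-1", b"page image bytes")
ingest(store, signatures, "obj-2", b"sound recording bytes")
store["obj-2"] = b"sound recording bytez"  # simulate corruption
print(audit(store, signatures))  # ['obj-2']
```

An audit pass like this, run continuously over the store, is one way the integrity monitoring could find objects that need recovery from another copy.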

www.bl.uk

30

Procuring physical storage in volume









A major cost is in physical storage
The market for storage systems is changing rapidly, and this implies that “lock-in” is
not sensible
We thus need flexibility to change supplier over time
Cost of storage is reducing by 30-40% per year
Hence procure on rolling basis just ahead of demand
Replace storage on a rolling basis on expiry of warranty
The rolling programmes imply the need to be able to support a heterogeneous
product solution
The design of the logical architecture thus supports storage sourced from multiple
storage vendors
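
The case for rolling procurement can be made with simple arithmetic. The sketch below compares buying all capacity up front with buying one tranche per year while the unit price falls. The inputs are assumptions chosen for illustration (500 TB bought over five years, price normalised to 1.0 per TB, a 35% annual decline as the midpoint of the 30-40% range quoted above); it ignores warranty replacement and volume discounts.

```python
def upfront_cost(total_tb: float, price_per_tb: float) -> float:
    """Cost of buying all capacity in year 0."""
    return total_tb * price_per_tb


def rolling_cost(tb_per_year: float, years: int, price_per_tb: float,
                 annual_decline: float) -> float:
    """Cost of buying one tranche per year as the unit price falls."""
    total = 0.0
    for year in range(years):
        total += tb_per_year * price_per_tb * (1 - annual_decline) ** year
    return total


# Assumed inputs: 500 TB over 5 years, price normalised to 1.0 per TB,
# 35% annual decline (midpoint of the 30-40% range on the slide).
up = upfront_cost(500, 1.0)
roll = rolling_cost(100, 5, 1.0, 0.35)
print(f"upfront: {up:.0f}, rolling: {roll:.0f}")  # upfront: 500, rolling: 253
```

Under these assumptions the rolling purchase costs roughly half as much for the same final capacity, which is the argument for procuring just ahead of demand.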

www.bl.uk

31

Disaster tolerance and the organisation
of storage clusters









One can obtain commercial disaster recovery (DR) solutions for common equipment
configurations
However, one cannot obtain such solutions for systems of several hundred terabytes
So we must build DR into the design of the system
A single site solution, subject to a common-mode disaster, would suffer considerable loss
of availability after a disaster, and so is not acceptable
This implies that we need a multi-site solution
Conventionally these are based on a master-standby arrangement, where only 50% of the kit is delivering
normal service
Our design is based on multiple autonomous, independent peer clusters that
cross-synchronise,
so 100% of the kit delivers normal service

www.bl.uk

32

DOM architecture in the context of the
storage solution market


The dominant segment of the market focuses on delivering performance within
a highly resilient single cluster
 However:
 Many of our objects will be rarely accessed
 so we do not want to pay for “maximised” performance we do not need
 We have resilience by using multiple clusters, hence we have a reduced
need for resilience within a cluster
 so we do not want to pay for “maximised” resilience we do not need
 We are using these drivers to design a cost-effective large scale resilient
solution

www.bl.uk

33

DOM storage subsystem architecture - overview

DOM
Storage
gateway

DOM
Storage
gateway

DOM
Storage
Service

DOM
Storage
Service

DOM
Physical
Storage
Storage cluster

DOM
Shared
Services
• Unique ID
• Signing
• Logging

DOM
Physical
Storage
Storage cluster

www.bl.uk

34

DOM storage subsystem architecture - access

DOM
Storage
gateway

Normal access/delivery
is from local storage
cluster

DOM
Storage
Service

DOM
Physical
Storage
Storage cluster

DOM
Storage
gateway

DOM
Storage
Service

••
••
••

DOM
Shared central
Services
• Unique ID
• Signing
• Logging

DOM
Physical
Storage
Storage cluster

www.bl.uk

35

DOM storage subsystem architecture - access

DOM
Storage
gateway

When a cluster is off-line
then access/delivery is
from a remote storage
cluster

DOM
Storage
Service

DOM
Physical
Storage
Storage cluster

DOM
Storage
gateway

DOM
Storage
Service

••
••
••

DOM
Shared central
Services
• Unique ID
• Signing
• Logging

DOM
Physical
Storage
Storage cluster
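
The two access slides (serve from the local cluster normally, from a remote cluster when the local one is off-line) amount to a simple fallback rule. The sketch below is a toy model under stated assumptions: the `Cluster` class, the `fetch` function and the in-memory object store are hypothetical names for the example, not the DOM storage gateway's interface.

```python
from typing import Dict, List, Optional, Tuple


class Cluster:
    """A storage cluster holding its own copy of the objects (illustrative)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.online = True
        self.objects: Dict[str, bytes] = {}


def fetch(domid: str, local: Cluster,
          remotes: List[Cluster]) -> Optional[Tuple[str, bytes]]:
    """Serve from the local cluster; fall back to a remote peer if needed."""
    for cluster in [local, *remotes]:
        if cluster.online and domid in cluster.objects:
            return cluster.name, cluster.objects[domid]
    return None


local, remote = Cluster("local"), Cluster("remote")
for c in (local, remote):
    c.objects["domid:1"] = b"object bytes"

print(fetch("domid:1", local, [remote]))  # ('local', b'object bytes')
local.online = False
print(fetch("domid:1", local, [remote]))  # ('remote', b'object bytes')
```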

www.bl.uk

36

DOM storage subsystem architecture - ingest

DOM
Storage
gateway

DOM
Storage
Service

Normal ingest is to the
local storage cluster and
then the remote cluster
is synchronised

Synchronise remote store

Store

DOM
Physical
Storage
Storage cluster

••
••
••

DOM
Shared central
Services
• Unique ID
• Signing
• Logging

DOM
Storage
gateway

DOM
Storage
Service

DOM
Physical
Storage
Storage cluster

www.bl.uk

37

DOM storage subsystem architecture - ingest

DOM
Storage
gateway

DOM
Storage
Service

When a cluster is off-line
then ingest is managed by
the remote storage cluster
and the local cluster is
synchronised later

Synchronise remote store later

Store

DOM
Physical
Storage
Storage cluster

••
••
••

DOM
Shared central
Services
• Unique ID
• Signing
• Logging

DOM
Storage
gateway

DOM
Storage
Service

DOM
Physical
Storage
Storage cluster
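
The two ingest slides describe a write path with deferred synchronisation: store on the local cluster and synchronise the remote one, or, if the local cluster is off-line, store remotely and bring the local cluster up to date later. A minimal sketch of that behaviour follows. The `Cluster` class, the `pending` list and the function names are assumptions for illustration; the real synchronisation mechanism is not described in the slides.

```python
from typing import Dict, List


class Cluster:
    """A peer storage cluster (illustrative in-memory model)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.online = True
        self.objects: Dict[str, bytes] = {}
        self.pending: List[str] = []  # object IDs owed to this cluster


def ingest(domid: str, data: bytes, local: Cluster, remote: Cluster) -> str:
    """Store on the first available cluster, then synchronise the other."""
    primary, secondary = (local, remote) if local.online else (remote, local)
    if not primary.online:
        raise RuntimeError("no storage cluster available")
    primary.objects[domid] = data
    if secondary.online:
        secondary.objects[domid] = data   # synchronise straight away
    else:
        secondary.pending.append(domid)   # synchronise later
    return primary.name


def catch_up(cluster: Cluster, peer: Cluster) -> None:
    """Bring a cluster that has come back on-line up to date from its peer."""
    cluster.online = True
    for domid in cluster.pending:
        cluster.objects[domid] = peer.objects[domid]
    cluster.pending.clear()


local, remote = Cluster("local"), Cluster("remote")
ingest("domid:1", b"a", local, remote)              # normal case
local.online = False
served_by = ingest("domid:2", b"b", local, remote)  # local is off-line
catch_up(local, remote)
print(served_by, sorted(local.objects))  # remote ['domid:1', 'domid:2']
```

Because either peer can accept an ingest, no cluster sits idle as a standby, which matches the "100% of the kit delivers normal service" point made earlier.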

www.bl.uk

38

In conclusion



We plan for generations of physical storage
 Migration from one generation to the next
 Allow changes of supplier
 Purchase incrementally in modest quantities
 Move quickly when required
 Be cost conscious



We provide assurance that an object is held and re-presented as it was when it was
ingested



We are designing a cost-effective large scale resilient solution

In summary: we take a long term view

www.bl.uk

39

Major Programmes/4

Web Archiving Programme

40

Structure of Programme

Web Archiving Programme is a collaborative
initiative, broadly split across two
consortia
 UK Web Archiving Consortium
 Developing a selective approach to web archiving, procuring a common
web archiving infrastructure and software to begin archiving activities at
the earliest opportunity
 International Internet Preservation Consortium
 Developing advanced web archiving technologies for the long-term,
large-scale, continuous crawling requirements enabled through
legislation

www.bl.uk

41

UK Web Archiving Consortium
Developing a selective approach to web archiving
 License for PANDAS about to be signed with NLA
 Sub-licenses with consortium partners and contractor to follow
 ITT concluded with Magus Research winning the contract.
 Implement a common web archiving infrastructure (lots of Linux
machines + PANDAS)
 Provide customisation/development of PANDAS
 Provide help desk and support

www.bl.uk

42

International Internet Preservation
Consortium
Developing advanced web archiving technologies
 Smart Crawler
 Continuous adaptive crawler, adjusting crawl priority on the fly
 Based on IA Heritrix
 Working on requirements now
 Expect to begin tender process in June
 Content Management
 Archival formats
 Framework
 Metrics and Test Bed
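
One way to picture "adjusting crawl priority on the fly" is a revisit queue whose intervals shrink for pages that keep changing and grow for pages that do not. The sketch below is a generic illustration of that idea using a standard-library priority queue. It is not Heritrix code and does not reflect the consortium's requirements; the class name, the halve/double rule and the interval bounds are all assumptions.

```python
import heapq
from typing import Dict, List, Tuple


class AdaptiveFrontier:
    """Revisit queue where pages that change often come round sooner."""

    def __init__(self, min_interval: float = 1.0, max_interval: float = 64.0) -> None:
        self._heap: List[Tuple[float, str]] = []   # (next due time, url)
        self._interval: Dict[str, float] = {}
        self._min = min_interval
        self._max = max_interval

    def add(self, url: str, now: float, interval: float = 8.0) -> None:
        self._interval[url] = interval
        heapq.heappush(self._heap, (now, url))

    def next_url(self) -> Tuple[float, str]:
        return heapq.heappop(self._heap)

    def report(self, url: str, now: float, changed: bool) -> None:
        # Halve the revisit interval if the page changed, double it if not.
        factor = 0.5 if changed else 2.0
        interval = min(self._max, max(self._min, self._interval[url] * factor))
        self._interval[url] = interval
        heapq.heappush(self._heap, (now + interval, url))


frontier = AdaptiveFrontier()
frontier.add("http://example.org/news", now=0.0)
frontier.add("http://example.org/about", now=0.0)
for _ in range(2):
    due, url = frontier.next_url()
    frontier.report(url, now=due, changed=url.endswith("/news"))
print(frontier.next_url())  # (4.0, 'http://example.org/news')
```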

www.bl.uk

43

External Collaboration

44

Digital Library Collaborations/Partnerships
Current










UK Digital Preservation Coalition
 Founder Member
TEL (The European Library Project)
Web Archiving UK
 JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library Of Wales
International Internet Preservation Consortium
 BNF, Library Of Congress, Internet Archive, National Archives & Library Of Canada, National Library Of
Australia, National Library Of Italy, National Libraries Of Nordic Countries
JISC Funded - Digital Curation Centre
Persistent Identifiers
 DOI Foundation, European National Libraries (KB & DDB)
Resource Discovery
 Union Catalogues (SUNCAT)
Digital Library Federation

www.bl.uk

45

Digital Library Collaborations/Partnerships
Potential









Secure Legal Deposit Network
 6 Legal Deposit Libraries
Global Digital Format Registry
 Potential Partners (National Archives, DLF)
Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
 KB (Netherlands National Library & Other Partners – FP6 Bid)
Digital Rights Management
 Potential Partners (Publishers, JISC)
Metadata
 Publishers, Others ?
Authentication
 JISC ?
Resource Discovery
 Search Engine Vendors, Researchers
Others ???

www.bl.uk

46

Conclusions
Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success




Can You Work With Us?

www.bl.uk

47


Slide 46

British Computer Society
North London Branch

Major Programmes

Richard Boulderstone
July 27, 2004
1

Agenda











The British Library
Vision
Our Audiences/Customers
ILS
Digitisation
Digital Object Management
Web Archiving
Collaboration
Conclusions

Magna Carta

www.bl.uk

2

What Is The British Library ?

Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of
Science and Invention (1855), National Central Library (1916), and National
Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India
Office Library and Records in 1982, and British Institute of Recorded Sound in
1983
• Flagship building at St Pancras - largest public building project in Great
Britain in 20th century - opened in 1998


www.bl.uk

3

World-Class Research Library
Key Statistics 2002/3













150 million items
8.2 million items consulted or supplied
408,000 reading room visits
618,000 catalogue records created
554,000 items received on legal deposit
651 km shelf capacity, 92% full, adding 12 km each year
18.5M Web Site Hits (www.bl.uk)
2,400 staff
£85.2 million Grant in Aid and £27.0 million trading income in 2001/2
Annual report - http://www.bl.uk/about/annual/latest.html

www.bl.uk

4

‘The World’s Knowledge’
Outcome Based Vision
… by aiding scientific advances
… by adding commercial value for
businesses
… by contributing to UK “knowledge
economy”
… through the pursuit of academic
excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural
wellbeing
… by giving information relevant to their
interests
… by helping to find the next medical
breakthrough
… by creating a link between the past,
present and future

To help people
advance
knowledge to
enrich lives

www.bl.uk

5

High
R+D
Industries

Prof.
Services

Creative
Industries Publishing
Industries

Resource Discovery
Bespoke Services
Research Services
Document Supply
Reprographics
Innovation Centre

School
Students
Libraries Teachers 11>18
On-site Visits
School Tours
Web Learning

SMEs
EDUCATION

Lifelong
Learner

Exhibitions
Events
Tours
Publishing

Visitors
(child + adult)
Lifelong
Learner

PUBLIC

BUSINESS
Lifelong
Learner

Postgraduate/
Undergraduate
RESEARCHER

Librarians
Scholars

Lifelong
Learner

Reading Rooms
Bespoke Services
Reprographics
Publishing
Document Supply
Searching Tools

LIBRARIES

Commercial
Researcher

Public
Libraries

Broadcasting
e.g. BBC

Publishing
e.g. OED

Document Supply
Resource Discovery
Training
Best Practice

H.E.
Libraries

Public

www.bl.uk

6
2

Major Programmes/1
Da Vinci Notebook

Integrated Library System (ILS)
Programme

7

ILS: Development

Data migration
 Due to finish in a few days



16M+ BL records
10M+ records from other sources

Online ILS software
 All online changes made (mainly interfaces) – final tests
 Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
 Most done for go-live
 Rest in priority order
www.bl.uk

8

ILS: Implementation

Training
 Courses to end-users well underway
 ‘Practice’ system available
 ‘Search only’ training also underway
Testing
 Functional testing (end to end) nearly complete
 Performance poor – OPAC very slow
 Automated stress testing (LoadRunner scripts)
 eIS trying to find area of problem
 Ex Libris experts flying over
 Some security ‘hardening’ needed
www.bl.uk

9

ILS: Cutover from legacy systems

Now:
Temporary Aleph cataloguing

7 June:
Phase 1 – internal processing
 Staggered take-on of users to ease cutover problems
 Merge ‘temporary’ records

30 June:
Phase 2 – reading rooms
 Reading rooms closed for cutover 26-29 June
 Mainly brand-new PCs etc rather than XP upgrade

30 July:
Phase 3 – remote users
 Could be delayed if there are major problems

www.bl.uk

10

www.bl.uk

11

Future ILS development (ILS/2)

Current ILS development seen just as the start

Extra records
 E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
 E.g. Preservation records
Links to other new BL systems
 E.g. Digital Object Management (images, web pages etc)

New releases of Ex Libris packages

www.bl.uk

12

Major Programmes/2

International Dunhuang Project

Digitisation Programme

13

Background










Digitisation Is The Process Of Converting Existing Physical Items Into Digital
Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character
Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image to Sub £1/Image.
Automatic Devices To Digitise Are Becoming Available But Are Expensive (~£150,000).
BL Has Had Fairly Ad Hoc Approach Driven By
 External Funding Opportunities
 Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project
Management
BL Has Created About 1.5M Digital Images So Far…

www.bl.uk

14

Digitisation Strategy



Digitisation Strategy Project Was Formally Initiated On February 2, 2004


Key objectives for the project are to define:










Selection Criteria
Uniform Approach
Communications Plan
Sustainability
Intellectual Property Rights
External Relationship Management
Funding
Integration with DOMS

www.bl.uk

15

Project Status Information


Definitive Register of Projects




19 Complete
19 Current
20 Planning



JISC Sound (3,900 Hours)



JISC Newspapers (2M Pages of 750M Pages)



Chopin (Collaborative Project)



Early English Books Online
www.bl.uk

16

Major Programmes/3

Gutenberg Bible

Digital Object Management (DOM)
Programme

17

DOM Programme vision
Our mission is to enable the United Kingdom to preserve
and use its digital intellectual property forever

Our vision is to create a management system for digital objects
that will
 store and preserve any type of digital material in perpetuity
 provide access to this material to users with appropriate permissions
 ensure that the material is easy to find
 ensure that users can view the material with contemporary applications
 ensure that users can, where possible, experience material with the original look-and-feel

www.bl.uk

18

Introduction - history

Digital Library PFI
 Mar 1997 – Dec 1998
Digital Library System
 1999 – early 2002
 Lessons
DOM Report
 Nov 2002
The DOM Programme
 Started September 2003

www.bl.uk

19

Drivers for the BL DOM Programme












Legal deposit legislation for non-print material was granted royal assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12T of material per year (with 50 year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….

We need a generic and cost-effective approach for the secure long term
storage of digital material that is produced by numerous initiatives

www.bl.uk

20

DOM – many topics to address
HIGH

ILS
WEB
ARCHIVING

DIGITISATION
PROGRAMME

ESTIMATED SIZE OF COMPONENT

LDEP

WORKFLOW
RESOURCE
DISCOVERY
LDLSE

VDEP

RIGHTS
MANAGEMENT
METADATA
DEFINITION

TECHNICAL
REQUIREMENTS

FILE
FORMAT
REGISTRY

PERSISTENT
IDENTIFIERS

FILE
CONVERSION
UTILITIES

AUTHENTICATION
PROTOTYPES

SDM
RADM

INTERFACES

STRATEGY
DEVELOPMENT

LOW
LOW

COMPONENT AMBIGUITY / COMPLEXITY

Started

Planned

Non-DOM projects

Planned co-operation

HIGH
LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications

www.bl.uk

21

Scope - life cycle of objects


Collection
 Selection
 Acquisition
 Accession
 Description
 Preservation
 Storage
 Preservation
 Access
 Resource discovery
 Delivery
 Rendering
www.bl.uk

22

Scope – objects and processes


Preservation store
 Preserves the bit stream in perpetuity
 Access store
 Access versions
 Limited formats – in the flavour of the era
 Metadata to support resource discovery
 Descriptive, Administrative, Links with existing tools e.g. Integrated
Library System (ILS)
 Workflow
 Ingest, e.g. Legal Deposit processing

www.bl.uk

23

ACCESS

DOM

Resource Discovery

Delivery

Digital Rights Management

Shared services

DOM
Storage

Signing
Authentication
Metadata
Persistent ID

Ingest

DONATIONS

DOCUMENT SUPPLY
Publishers
Archives
Grey Literature

Non-Serial
Store
Archiving
Operational
Stores

LEGAL DEPOSIT

LDL Secure
Environment

Legal Deposit
Items
Legal Deposit
Processing

WEB
ARCHIVING

DIGITISATION
St Pancras
Studios

Newspapers

NSA
www.bl.uk

24

Timeline

Prototype will provide a
basic preservation-quality digital object
storage module

Definition. R0

ET approve
Business Case
& Timeline

•Consolidate R0 into operational
system
•Provide preservation-quality digital
store for materials received under
Voluntary Deposit of Electronic
Publications (VDEP)
•Integrate it with the existing VDEP
front-end

BC
R0
Operat’l Storage
Sub System. R1

•Support ingest for a
major content stream
•Integrate with core
Library systems as
required

R1

1st Content
Stream ingest. R2

R2

LDEP - initial
format. R3 & R4

Provide functionality
for material covered by
LDEP secondary
legislation

R3
R4

Open DOM to new
projects. R5+

2003

2004

2005
www.bl.uk

25

DOM: Project definition - 1
Example issues
Functional Architecture “What”

digital rights, file
formats, etc

Prototyping – assessing market solutions

allow changes to
new suppliers,
relationships to ILS,
other projects etc

Logical architecture “how – overall architecture”
Prototyping - basic functioning architecture

how do we build it
cost-effectively
today, supplier
selection criteria

Physical architecture “how – storage & specifics”
Prototyping - principal solutions and options

• Business case
• Planning – incremental
implementation phases

Cross team workshops – reviewing progress, debating detailed technical
issues, planning immediate priorities, risk management & way forward
www.bl.uk

26

DOM: Project definition - 2



Approach is to be incremental and not ‘Big Bang’
We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a
good solution



A principal goal is to define:
 An overall long term “logical architecture”
 Within which, there will be successive generations of physical architectures



We are building an understanding of the storage marketplace, and we will use that knowledge to manage
procurement
We are certain that we will need >500T of storage but we are uncertain when – we thus need
flexible scalable procurement



www.bl.uk

27

DOM architecture - overview

[Diagram. Labelled users of the store: Resource Discovery (ILS, Non-cat based RD, Others), Rights Management, LDEP and Doc supply. Labels between them and the store: DOMID, OBJECT. DOM Storage Service: compound objects/relations, atomic objects, integrity, authenticity, unique persistent identifier (DOMID), local resource locator (LRL). DOMID is mapped to node/vol/LRL. DOM Physical Storage.]
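
The mapping of a DOMID to node/vol/LRL named in the overview diagram could be sketched as below. This is a minimal illustration only: the names `Location`, `DomidResolver`, `mint` and `migrate`, and the use of a random UUID as the identifier, are assumptions made for the example and are not taken from the slides.

```python
from dataclasses import dataclass
from typing import Dict, List
import uuid


@dataclass(frozen=True)
class Location:
    """Where one copy of an object lives (hypothetical field names)."""
    node: str    # storage node or cluster name
    volume: str  # volume on that node
    lrl: str     # local resource locator, e.g. a path within the volume


class DomidResolver:
    """Maps a persistent identifier to its current physical locations."""

    def __init__(self) -> None:
        self._locations: Dict[str, List[Location]] = {}

    def mint(self) -> str:
        # The identifier carries no location information, so storage can be
        # replaced without changing identifiers held by other systems.
        domid = uuid.uuid4().hex
        self._locations[domid] = []
        return domid

    def add_location(self, domid: str, location: Location) -> None:
        self._locations[domid].append(location)

    def migrate(self, domid: str, old: Location, new: Location) -> None:
        # Moving an object between storage generations only updates the map.
        copies = self._locations[domid]
        copies[copies.index(old)] = new

    def resolve(self, domid: str) -> List[Location]:
        return list(self._locations[domid])


resolver = DomidResolver()
domid = resolver.mint()
old = Location("node-a", "vol01", "objects/0001.tif")
resolver.add_location(domid, old)
resolver.migrate(domid, old, Location("node-b", "vol07", "objects/0001.tif"))
print(domid, resolver.resolve(domid))
```

Keeping the identifier free of location detail is what would let successive generations of physical storage come and go while systems such as the ILS continue to hold the same DOMID.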


DOM System (release 3)

[Diagram of the DOM System; labelled components: Aleph, Storage subsystem, Access, Mailroom, Publishers, Shared services, Administration.]

DOM logical architecture – integrity and authenticity

• Integrity:
  • System has capability to continuously monitor the object store to detect object corruption
  • It would then initiate object recovery
• Authenticity:
  • A process is defined to provide long-term assurance that an object that is re-presented is as it was when it was ingested
  • Based on the use of cryptographic signing techniques
    • Each object is signed when it is ingested
    • The signature is verified when required
    • The signing mechanism is “tightly” controlled
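
The integrity and authenticity points above can be illustrated with a short sketch. The slides do not say which signing technique is used, so this example uses HMAC-SHA256 from the Python standard library purely as a stand-in; a real deployment might well use asymmetric signatures instead. The key, the dictionary-based stores and the function names are all hypothetical.

```python
import hashlib
import hmac
from typing import Dict, List

# Hypothetical key; in practice it would be held by a tightly controlled service.
SIGNING_KEY = b"demo-key-held-by-the-signing-service"


def sign(data: bytes) -> str:
    """Signature recorded at ingest (HMAC-SHA256 as a stand-in)."""
    return hmac.new(SIGNING_KEY, data, hashlib.sha256).hexdigest()


def verify(data: bytes, signature: str) -> bool:
    """True if the bytes still match the signature made at ingest."""
    return hmac.compare_digest(sign(data), signature)


def scan_and_recover(store: Dict[str, bytes], replica: Dict[str, bytes],
                     signatures: Dict[str, str]) -> List[str]:
    """One monitoring pass: detect corrupt objects, restore from a replica."""
    recovered = []
    for domid, signature in signatures.items():
        if verify(store.get(domid, b""), signature):
            continue
        # Only restore from a copy that itself verifies.
        if domid in replica and verify(replica[domid], signature):
            store[domid] = replica[domid]
            recovered.append(domid)
    return recovered


original = b"digitised page 1"
signatures = {"obj-1": sign(original)}
store = {"obj-1": b"digitised pagX 1"}   # simulated corruption
replica = {"obj-1": original}

print(verify(store["obj-1"], signatures["obj-1"]))
recovered = scan_and_recover(store, replica, signatures)
print(recovered, verify(store["obj-1"], signatures["obj-1"]))
```

The same signature serves both purposes in this sketch: a failed check during a monitoring pass triggers recovery, and a successful check at delivery time gives assurance that the object is as it was at ingest.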


Procuring physical storage in volume

• A major cost is in physical storage
• The market for storage systems is changing rapidly, and this implies that “lock-in” is not sensible
• We thus need flexibility to change supplier over time
• Cost of storage is reducing by 30-40% per year
• Hence procure on a rolling basis just ahead of demand
• Replace storage on a rolling basis on expiry of warranty
• The rolling programmes imply the need to be able to support a heterogeneous product solution
• The design of the logical architecture thus supports storage sourced from multiple storage vendors
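
The case for buying just ahead of demand can be checked with simple arithmetic. The sketch below compares buying all capacity up front with buying it year by year while the price falls by 30% or 40% a year. The demand profile (100 TB a year for five years) and the starting price of 1000 currency units per TB are invented for illustration; only the 30-40% decline comes from the slide. Warranty replacement and running costs are ignored.

```python
from typing import List


def upfront_cost(total_tb: float, price_per_tb: float) -> float:
    """Buy all capacity in year 0 at today's price."""
    return total_tb * price_per_tb


def rolling_cost(demand_tb_per_year: List[float], price_per_tb: float,
                 annual_decline: float) -> float:
    """Buy each year's capacity at that year's (lower) price."""
    total = 0.0
    for year, tb in enumerate(demand_tb_per_year):
        total += tb * price_per_tb * (1 - annual_decline) ** year
    return total


# Hypothetical figures: 500 TB needed over five years, 100 TB per year.
demand = [100, 100, 100, 100, 100]
price = 1000.0

print(upfront_cost(sum(demand), price))
for decline in (0.30, 0.40):
    print(decline, round(rolling_cost(demand, price, decline)))
```

Under these assumptions the up-front purchase costs 500,000 units, against about 277,000 at a 30% annual decline and about 231,000 at 40%, which is roughly 55% and 46% of the up-front figure.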


Disaster tolerance and the organisation of storage clusters

• One can obtain commercial disaster recovery (DR) solutions for common equipment configurations
• However, one cannot obtain such solutions for systems holding multiple hundreds of terabytes
• So we must build DR into the design of the system
• A single-site solution, subject to a common-mode disaster, would suffer considerable loss of availability after a disaster, and so is not acceptable
• This implies that we need a multi-site solution
• Conventionally these are based on a master-standby arrangement where only 50% of kit is delivering normal service
• Our design is based on the use of multiple autonomous independent peer clusters that cross-synchronise, so 100% of the kit delivers normal service


DOM architecture in the context of the storage solution market

• The dominant segment of the market focuses on delivering performance within a highly resilient single cluster
• However:
  • Many of our objects will be rarely accessed
    • so we do not want to pay for “maximised” performance we do not need
  • We have resilience by using multiple clusters, hence we have a reduced need for resilience within a cluster
    • so we do not want to pay for “maximised” resilience we do not need
• We are using these drivers to design a cost-effective large scale resilient solution


DOM storage subsystem architecture - overview

[Diagram: two storage clusters, each with a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, together with DOM Shared Services (Unique ID, Signing, Logging).]


DOM storage subsystem architecture - access

Normal access/delivery is from the local storage cluster

[Diagram: storage clusters, each with a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, alongside DOM Shared (central) Services: Unique ID, Signing, Logging.]


DOM storage subsystem architecture - access

When a cluster is off-line then access/delivery is from a remote storage cluster

[Diagram: storage clusters, each with a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, alongside DOM Shared (central) Services: Unique ID, Signing, Logging.]


DOM storage subsystem architecture - ingest

Normal ingest is to the local storage cluster and then the remote cluster is synchronised

[Diagram: storage clusters, each with a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, alongside DOM Shared (central) Services: Unique ID, Signing, Logging. Arrows labelled “Store” and “Synchronise remote store”.]


DOM storage subsystem architecture - ingest

When a cluster is off-line then ingest is managed by the remote storage cluster and the local cluster is synchronised later

[Diagram: storage clusters, each with a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, alongside DOM Shared (central) Services: Unique ID, Signing, Logging. Arrows labelled “Store” and “Synchronise remote store later”.]
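
The access and ingest behaviour described on the four slides above (local first, remote when a cluster is off-line, synchronise later) can be sketched as follows. This is an in-memory illustration under simplifying assumptions: the class names `Cluster` and `Gateway`, the `pending` queue and the immediate synchronisation in the normal case are all hypothetical, and the slides do not say whether real synchronisation is synchronous or deferred.

```python
from typing import Dict, List, Tuple


class Cluster:
    """One autonomous storage cluster (in-memory stand-in)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.online = True
        self.objects: Dict[str, bytes] = {}
        # Writes accepted here that the peer has not yet received.
        self.pending: List[Tuple[str, bytes]] = []


class Gateway:
    """Routes requests to the local cluster, falling back to the remote peer."""

    def __init__(self, local: Cluster, remote: Cluster) -> None:
        self.local = local
        self.remote = remote

    def ingest(self, domid: str, data: bytes) -> str:
        primary, peer = ((self.local, self.remote) if self.local.online
                         else (self.remote, self.local))
        if not primary.online:
            raise RuntimeError("no storage cluster available")
        primary.objects[domid] = data              # Store
        if peer.online:
            peer.objects[domid] = data             # Synchronise remote store
        else:
            primary.pending.append((domid, data))  # Synchronise later
        return primary.name

    def access(self, domid: str) -> bytes:
        for cluster in (self.local, self.remote):
            if cluster.online and domid in cluster.objects:
                return cluster.objects[domid]
        raise KeyError(domid)

    def resync(self) -> None:
        """Replay queued writes once both clusters are on-line again."""
        for source, target in ((self.local, self.remote),
                               (self.remote, self.local)):
            if source.online and target.online:
                for domid, data in source.pending:
                    target.objects[domid] = data
                source.pending.clear()


site_a, site_b = Cluster("site-a"), Cluster("site-b")
gateway = Gateway(local=site_a, remote=site_b)

gateway.ingest("obj-1", b"first")               # normal ingest
site_a.online = False
stored_on = gateway.ingest("obj-2", b"second")  # remote cluster takes the write
fetched = gateway.access("obj-1")               # served by the remote cluster
site_a.online = True
gateway.resync()                                # local cluster catches up
print(stored_on, fetched, sorted(site_a.objects))
```

Because both clusters accept reads and writes in normal operation, neither sits idle as a standby, which is the point made earlier about 100% of the kit delivering normal service.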


In conclusion

• We plan for generations of physical storage
  • Migration from one generation to the next
  • Allow changes of supplier
  • Purchase incrementally in modest quantities
  • Move quickly when required
  • Be cost conscious
• We provide assurance that an object is held and re-presented as it was when it was ingested
• We are designing a cost-effective large scale resilient solution

In summary: we take a long term view


Major Programmes/4

Web Archiving Programme


Structure of Programme

Web Archiving Programme is a collaborative initiative, broadly implemented across two consortia
• UK Web Archiving Consortium
  • Developing a selective approach to web archiving, procuring a common web archiving infrastructure and software to begin archiving activities at the earliest opportunity
• International Internet Preservation Consortium
  • Developing advanced web archiving technologies for the long-term, large-scale, continuous crawling requirements enabled through legislation


UK Web Archiving Consortium
Developing a selective approach to web archiving
• Licence for PANDAS about to be signed with NLA
  • Sub-licences with consortium partners and contractor to follow
• ITT concluded with Magus Research winning the contract
  • Implement a common web archiving infrastructure (lots of Linux machines + PANDAS)
  • Provide customisation/development of PANDAS
  • Provide help desk and support


International Internet Preservation Consortium
Developing advanced web archiving technologies
• Smart Crawler
  • Continuous adaptive crawler, adjusting crawl priority on the fly
  • Based on IA Heritrix
  • Working on requirements now
  • Expect to begin tender process in June
• Content Management
  • Archival formats
• Framework
• Metrics and Test Bed
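
One way to picture a crawler that adjusts crawl priority on the fly is a frontier that revisits pages more often when they are seen to change. The sketch below is a hypothetical illustration in Python and is not the Heritrix API or the consortium's design; the class `AdaptiveFrontier`, the halving and doubling rule and the interval limits are all assumptions.

```python
import heapq
from typing import Dict, List, Tuple


class AdaptiveFrontier:
    """Crawl frontier whose revisit interval adapts to observed change."""

    def __init__(self, min_interval: float = 1.0, max_interval: float = 64.0) -> None:
        self.min_interval = min_interval
        self.max_interval = max_interval
        self._heap: List[Tuple[float, str]] = []   # (next due time, url)
        self._interval: Dict[str, float] = {}

    def add(self, url: str, now: float = 0.0) -> None:
        self._interval[url] = self.min_interval
        heapq.heappush(self._heap, (now, url))

    def next_url(self) -> Tuple[float, str]:
        """Pop the URL that is due soonest."""
        return heapq.heappop(self._heap)

    def report(self, url: str, changed: bool, now: float) -> None:
        """Feed back a crawl result and reschedule the URL."""
        interval = self._interval[url]
        # Changed pages are revisited sooner, unchanged ones less often.
        interval = interval / 2 if changed else interval * 2
        interval = max(self.min_interval, min(self.max_interval, interval))
        self._interval[url] = interval
        heapq.heappush(self._heap, (now + interval, url))


frontier = AdaptiveFrontier()
frontier.add("http://example.org/news")
frontier.add("http://example.org/about")

# Simulated results: the news page changes on every visit, the about page never.
visits = []
for _ in range(8):
    due, url = frontier.next_url()
    visits.append(url)
    frontier.report(url, changed=url.endswith("/news"), now=due)
print(visits.count("http://example.org/news"), visits.count("http://example.org/about"))
```

In this simulation the frequently changing page receives six of the eight visits, which shows how crawl effort drifts towards content that actually changes.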


External Collaboration


Digital Library Collaborations/Partnerships
Current

• UK Digital Preservation Coalition
  • Founder Member
• TEL (The European Library Project)
• Web Archiving UK
  • JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library of Wales
• International Internet Preservation Consortium
  • BNF, Library of Congress, Internet Archive, National Archives & Library of Canada, National Library of Australia, National Library of Italy, National Libraries of Nordic Countries
• JISC Funded - Digital Curation Centre
• Persistent Identifiers
  • DOI Foundation, European National Libraries (KB & DDB)
• Resource Discovery
  • Union Catalogues (SUNCAT)
• Digital Library Federation


Digital Library Collaborations/Partnerships
Potential

• Secure Legal Deposit Network
  • 6 Legal Deposit Libraries
• Global Digital Format Registry
  • Potential Partners (National Archives, DLF)
• Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
  • KB (Netherlands National Library) & Other Partners – FP6 Bid
• Digital Rights Management
  • Potential Partners (Publishers, JISC)
• Metadata
  • Publishers, Others?
• Authentication
  • JISC?
• Resource Discovery
  • Search Engine Vendors, Researchers
• Others???


Conclusions
Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success




Can You Work With Us?



