Transcript (DOM) Programme - BCS North London Branch
Slide 1
British Computer Society
North London Branch
Major Programmes
Richard Boulderstone
July 27, 2004
1
Agenda
• The British Library
• Vision
• Our Audiences/Customers
• ILS
• Digitisation
• Digital Object Management
• Web Archiving
• Collaboration
• Conclusions
Magna Carta
www.bl.uk
2
What Is The British Library?
• Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of
Science and Invention (1855), National Central Library (1916), and National
Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India
Office Library and Records in 1982, and British Institute of Recorded Sound in
1983
• Flagship building at St Pancras - largest public building project in Great
Britain in 20th century - opened in 1998
3
World-Class Research Library
Key Statistics 2002/3
• 150 million items
• 8.2 million items consulted or supplied
• 408,000 reading room visits
• 618,000 catalogue records created
• 554,000 items received on legal deposit
• 651 km shelf capacity, 92% full, add 12 km each year
• 18.5M Web Site Hits (www.bl.uk)
• 2,400 staff
• £85.2 million Grant in Aid and £27.0 million trading income in 2001/2
• Annual report - http://www.bl.uk/about/annual/latest.html
4
‘The World’s Knowledge’
Outcome Based Vision
To help people advance knowledge to enrich lives
… by aiding scientific advances
… by adding commercial value for businesses
… by contributing to UK “knowledge economy”
… through the pursuit of academic excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural wellbeing
… by giving information relevant to their interests
… by helping to find the next medical breakthrough
… by creating a link between the past, present and future
5
[Diagram: Our Audiences/Customers - five audience segments with their sub-groups and the services offered to each. "Lifelong Learner" appears four times around the diagram.]
BUSINESS: High R+D Industries, Prof. Services, Creative Industries, Publishing Industries, SMEs
Services: Resource Discovery, Bespoke Services, Research Services, Document Supply, Reprographics, Innovation Centre
EDUCATION: School Libraries, Students, Teachers, 11>18
Services: On-site Visits, School Tours, Web Learning
PUBLIC: Visitors (child + adult)
Services: Exhibitions, Events, Tours, Publishing
RESEARCHER: Postgraduate/Undergraduate, Librarians, Scholars, Commercial Researcher
Services: Reading Rooms, Bespoke Services, Reprographics, Publishing, Document Supply, Searching Tools
LIBRARIES: Public Libraries, H.E. Libraries, Broadcasting (e.g. BBC), Publishing (e.g. OED)
Services: Document Supply, Resource Discovery, Training, Best Practice
6
Major Programmes/1
Da Vinci Notebook
Integrated Library System (ILS)
Programme
7
ILS: Development
Data migration
Due to finish in a few days
16M+ BL records
10M+ records from other sources
Online ILS software
All online changes made (mainly interfaces) – final tests
Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
Most done for go-live
Rest in priority order
8
ILS: Implementation
Training
Courses to end-users well underway
‘Practice’ system available
‘Search only’ training also underway
Testing
Functional testing (end to end) nearly complete
Performance poor – OPAC very slow
Automated stress testing (LoadRunner scripts)
eIS trying to find area of problem
Ex Libris experts flying over
Some security ‘hardening’ needed
9
ILS: Cutover from legacy systems
Now:
Temporary Aleph cataloguing
7 June:
Phase 1 – internal processing
Staggered take-on of users to ease cutover problems
Merge ‘temporary’ records
30 June:
Phase 2 – reading rooms
Reading rooms closed for cutover 26-29 June
Mainly brand-new PCs etc rather than XP upgrade
30 July:
Phase 3 – remote users
Could be delayed if there are major problems
10
11
Future ILS development (ILS/2)
Current ILS development seen just as the start
Extra records
E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
E.g. Preservation records
Links to other new BL systems
E.g. Digital Object Management (images, web pages etc)
New releases of Ex Libris packages
12
Major Programmes/2
International Dunhuang Project
Digitisation Programme
13
Background
Digitisation Is The Process Of Converting Existing Physical Items Into Digital
Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character
Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image to Sub £1/Image.
Automatic Devices To Digitise Are Becoming Available But Are Expensive (~£150,000).
BL Has Had Fairly Ad Hoc Approach Driven By
External Funding Opportunities
Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project
Management
BL Has Created About 1.5M Digital Images So Far…
14
Digitisation Strategy
Digitisation Strategy Project Was Formally Initiated On February 2, 2004
Key objectives for the project are to define:
Selection Criteria
Uniform Approach
Communications Plan
Sustainability
Intellectual Property Rights
External Relationship Management
Funding
Integration with DOMS
15
Project Status Information
Definitive Register of Projects
19 Complete
19 Current
20 Planning
JISC Sound (3,900 Hours)
JISC Newspapers (2M Pages of 750M Pages)
Chopin (Collaborative Project)
Early English Books Online
16
Major Programmes/3
Gutenberg Bible
Digital Object Management (DOM)
Programme
17
DOM Programme vision
Our mission is to enable the United Kingdom to preserve and use its digital intellectual property forever
Our vision is to create a management system for digital objects that will
store and preserve any type of digital material in perpetuity
provide access to this material to users with appropriate permissions
ensure that the material is easy to find
ensure that users can view the material with contemporary applications
ensure that users can, where possible, experience material with the original look-and-feel
18
Introduction - history
Digital Library PFI
Mar 1997 – Dec 1998
Digital Library System
1999 – early 2002
Lessons
DOM Report
Nov 2002
The DOM Programme
Started September 2003
19
Drivers for the BL DOM Programme
Legal deposit legislation for non-print material was granted royal assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12 TB of material per year (with 50 year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….
We need a generic and cost-effective approach for the secure long term
storage of digital material that is produced by numerous initiatives
20
DOM – many topics to address
[Chart: components plotted by ESTIMATED SIZE OF COMPONENT (LOW to HIGH) against COMPONENT AMBIGUITY / COMPLEXITY (LOW to HIGH)]
Legend: Started, Planned, Non-DOM projects, Planned co-operation
Components shown: ILS, Web Archiving, Digitisation Programme, LDEP, Workflow, Resource Discovery, LDLSE, VDEP, Rights Management, Metadata Definition, Technical Requirements, File Format Registry, Persistent Identifiers, File Conversion Utilities, Authentication, Prototypes, SDM, RADM, Interfaces, Strategy Development
LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications
21
Scope - life cycle of objects
Collection: Selection, Acquisition, Accession, Description
Preservation: Storage, Preservation
Access: Resource discovery, Delivery, Rendering
22
Scope – objects and processes
Preservation store
Preserves the bit stream in perpetuity
Access store
Access versions
Limited formats – in the flavour of the era
Metadata to support resource discovery
Descriptive, Administrative, Links with existing tools e.g. Integrated
Library System (ILS)
Workflow
Ingest, e.g. Legal Deposit processing
23
[Diagram: DOM in context - access services, shared services and ingest streams]
ACCESS: Resource Discovery, Delivery, Digital Rights Management
DOM shared services: Storage, Signing, Authentication, Metadata, Persistent ID
DOM ingest streams:
DONATIONS
DOCUMENT SUPPLY: Publishers, Archives, Grey Literature, Non-Serial Store, Archiving, Operational Stores
LEGAL DEPOSIT: LDL Secure Environment, Legal Deposit Items, Legal Deposit Processing
WEB ARCHIVING
DIGITISATION: St Pancras Studios, Newspapers, NSA
24
Timeline
[Timeline diagram spanning 2003 to 2005]
BC: ET approve Business Case & Timeline
R0 - Definition: Prototype will provide a basic preservation-quality digital object storage module
R1 - Operational Storage Sub System:
• Consolidate R0 into operational system
• Provide preservation-quality digital store for materials received under Voluntary Deposit of Electronic Publications (VDEP)
• Integrate it with the existing VDEP front-end
R2 - 1st Content Stream ingest:
• Support ingest for a major content stream
• Integrate with core Library systems as required
R3 & R4 - LDEP - initial format: Provide functionality for material covered by LDEP secondary legislation
R5+ - Open DOM to new projects
25
DOM: Project definition - 1
Functional Architecture “What”
Prototyping – assessing market solutions
Example issues: digital rights, file formats, etc
Logical architecture “how – overall architecture”
Prototyping - basic functioning architecture
Example issues: allow changes to new suppliers, relationships to ILS, other projects etc
Physical architecture “how – storage & specifics”
Prototyping - principal solutions and options
Example issues: how do we build it cost-effectively today, supplier selection criteria
• Business case
• Planning – incremental implementation phases
Cross team workshops – reviewing progress, debating detailed technical issues, planning immediate priorities, risk management & way forward
26
DOM: Project definition - 2
Approach is to be incremental and not ‘Big Bang’
We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a
good solution
A principal goal is to define:
An overall long term “logical architecture”
Within which, there will be successive generations of physical architectures
We are building an understanding of the storage marketplace, and we will use that knowledge to manage procurement
We are certain that we will need >500 TB of storage but we are uncertain when – we thus need flexible scalable procurement
27
DOM architecture - overview
[Diagram: layered DOM architecture]
Client systems: Resource Discovery (ILS based RD, Non-cat, Others), Rights Management, LDEP, Doc supply
Each OBJECT is referenced by its unique persistent identifier (DOMID)
DOM Storage Service: Compound objects/relations, Integrity, Authenticity
DOMID is mapped to node/vol/LRL (LRL: Local resource locator)
DOM Physical Storage: Atomic Objects
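The mapping from DOMID to node/vol/LRL can be pictured with a small lookup table. This is only an illustrative sketch in Python: the class names, the identifier format and the node and volume names are all hypothetical, and the slides do not say how the real mapping is implemented. The point it shows is that the DOMID stays fixed while the physical location behind it can change.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    node: str  # storage node holding the object (hypothetical naming)
    vol: str   # volume on that node
    lrl: str   # local resource locator within the volume


class DomIdResolver:
    """Maps a persistent DOMID to its current physical location."""

    def __init__(self):
        self._table = {}

    def register(self, domid, location):
        self._table[domid] = location

    def relocate(self, domid, new_location):
        # The DOMID never changes; only the mapping is updated when an
        # object moves to a new generation of physical storage.
        if domid not in self._table:
            raise KeyError(domid)
        self._table[domid] = new_location

    def resolve(self, domid):
        return self._table[domid]


resolver = DomIdResolver()
resolver.register("domid-0001", Location("node-a", "vol-03", "obj/0001"))
resolver.relocate("domid-0001", Location("node-b", "vol-11", "obj/0001"))
print(resolver.resolve("domid-0001"))
```

Client systems such as resource discovery would hold only the DOMID, so a change of storage supplier would not disturb them.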
28
DOM System (release 3)
[Diagram: DOM System (release 3)]
Labelled elements: Aleph, Publishers, Mailroom, Access, Shared services, Administration, Storage subsystem (x2), DOM System
29
DOM logical architecture – integrity and authenticity
Integrity:
System has capability to continuously monitor the object store to detect
object corruption
It would then initiate object recovery
Authenticity:
A process is defined to provide long-term assurance that an object that
is re-presented is as it was when it was ingested
Based on the use of cryptographic signing techniques
Each object is signed when it is ingested
The signature is verified when required
The signing mechanism is “tightly” controlled
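The sign-on-ingest and verify-on-demand steps can be sketched as follows. This is a minimal illustration, not the DOM implementation: it uses an HMAC from the Python standard library as a stand-in for whatever cryptographic signing scheme the programme adopts, and the key, the store layout and the function names are assumptions. The scan function shows how the same check could support the integrity monitoring described above.

```python
import hashlib
import hmac

# Assumption: a single secret key; real key management is out of scope here.
SIGNING_KEY = b"hypothetical-secret-key"


def sign(data: bytes) -> str:
    """Return a keyed digest of the object bytes (stand-in for a signature)."""
    return hmac.new(SIGNING_KEY, data, hashlib.sha256).hexdigest()


def verify(data: bytes, signature: str) -> bool:
    """Check that the bytes still match the signature made at ingest."""
    return hmac.compare_digest(sign(data), signature)


def ingest(store: dict, domid: str, data: bytes) -> None:
    # Each object is signed when it is ingested.
    store[domid] = {"data": data, "signature": sign(data)}


def scan(store: dict) -> list:
    """Return the IDs of objects whose bytes no longer match their signature."""
    return [domid for domid, rec in store.items()
            if not verify(rec["data"], rec["signature"])]


store = {}
ingest(store, "domid-0001", b"first object")
ingest(store, "domid-0002", b"second object")
store["domid-0002"]["data"] = b"second objecT"  # simulate corruption
corrupted = scan(store)
print(corrupted)
```

An object flagged by the scan would then be a candidate for recovery from another copy.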
30
Procuring physical storage in volume
A major cost is in physical storage
The market for storage systems is changing rapidly, and this implies that “lock-in” is
not sensible
We thus need flexibility to change supplier over time
Cost of storage is reducing by 30-40% per year
Hence procure on rolling basis just ahead of demand
Replace storage on a rolling basis on expiry of warranty
The rolling programmes imply the need to be able to support a heterogeneous
product solution
The design of the logical architecture thus supports storage sourced from multiple
storage vendors
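The case for procuring just ahead of demand can be checked with simple arithmetic. The figures below are assumptions chosen for illustration: 500 TB needed over five years in equal steps, a normalised price of 1.0 per TB today, and a 35% annual price decline (the midpoint of the 30-40% quoted). Warranty replacement and operating costs are ignored.

```python
def rolling_cost(tb_per_year, years, price_per_tb, annual_decline):
    """Total cost when each year's capacity is bought at that year's price."""
    total = 0.0
    for year in range(years):
        total += tb_per_year * price_per_tb * (1 - annual_decline) ** year
    return total


# Assumed scenario: 500 TB in total, either all bought now or 100 TB per year.
upfront = 500 * 1.0
rolling = rolling_cost(tb_per_year=100, years=5, price_per_tb=1.0,
                       annual_decline=0.35)

print(f"upfront: {upfront:.1f}  rolling: {rolling:.1f}")
```

Under these assumptions the rolling purchase costs roughly half as much as buying everything at the start, which is the effect the slide relies on.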
31
Disaster tolerance and the organisation of storage clusters
One can obtain commercial disaster recovery (DR) solutions for common equipment
configurations
However one cannot obtain such solutions for multi-100 TB systems
So we must build DR into the design of the system
A single site solution, subject to a common-mode disaster, would suffer considerable loss
of availability after a disaster, and so is not acceptable
This implies that we need a multi-site solution
Conventionally these are based on a master-standby where only 50% of kit is delivering
normal service
Our design is based on the use of multiple autonomous independent peer clusters that
cross-synchronise
so 100% of the kit delivers normal service
32
DOM architecture in the context of the storage solution market
The dominant segment of the market focuses on delivering performance within
a highly resilient single cluster
However:
Many of our objects will be rarely accessed
so we do not want to pay for “maximised” performance we do not need
We have resilience by using multiple clusters, hence we have a reduced
need for resilience within a cluster
so we do not want to pay for “maximised” resilience we do not need
We are using these drivers to design a cost-effective large scale resilient
solution
33
DOM storage subsystem architecture - overview
[Diagram: two storage clusters, each comprising a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, together with DOM Shared Services]
DOM Shared Services:
• Unique ID
• Signing
• Logging
34
DOM storage subsystem architecture - access
Normal access/delivery is from local storage cluster
[Diagram: two storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical Storage) and DOM Shared central Services (Unique ID, Signing, Logging)]
35
DOM storage subsystem architecture - access
When a cluster is off-line then access/delivery is from a remote storage cluster
[Diagram: two storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical Storage) and DOM Shared central Services (Unique ID, Signing, Logging)]
36
DOM storage subsystem architecture - ingest
Normal ingest is to the local storage cluster and then the remote cluster is synchronised
[Diagram: two storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical Storage) and DOM Shared central Services (Unique ID, Signing, Logging). Arrows: Store; Synchronise remote store]
37
DOM storage subsystem architecture - ingest
When a cluster is off-line then ingest is managed by the remote storage cluster and the local cluster is synchronised later
[Diagram: two storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical Storage) and DOM Shared central Services (Unique ID, Signing, Logging). Arrows: Store; Synchronise remote store later]
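The access and ingest behaviour of the peer clusters can be sketched in a few lines of Python. This is a toy model under stated assumptions: two clusters only, in-memory dictionaries instead of physical storage, and hypothetical class, method and cluster names. It shows a gateway preferring its local cluster, falling back to the remote one when the local cluster is off-line, and queued objects being synchronised later.

```python
class Cluster:
    """One autonomous storage cluster with a single peer (toy model)."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.objects = {}
        self.pending = []  # (domid, data) pairs still to be sent to the peer
        self.peer = None

    def store(self, domid, data):
        self.objects[domid] = data
        if self.peer.online:
            self.peer.objects[domid] = data      # synchronise straight away
        else:
            self.pending.append((domid, data))   # synchronise later

    def resynchronise(self):
        """Push queued objects to the peer once it is back on-line."""
        if self.peer.online:
            for domid, data in self.pending:
                self.peer.objects[domid] = data
            self.pending.clear()


class Gateway:
    """Routes requests to the local cluster, or the remote one if it is down."""

    def __init__(self, local, remote):
        self.local = local
        self.remote = remote

    def _active(self):
        if self.local.online:
            return self.local
        if self.remote.online:
            return self.remote
        raise RuntimeError("no storage cluster available")

    def ingest(self, domid, data):
        self._active().store(domid, data)

    def fetch(self, domid):
        return self._active().objects[domid]


north, south = Cluster("north"), Cluster("south")
north.peer, south.peer = south, north
gateway = Gateway(local=north, remote=south)

gateway.ingest("domid-0001", b"a")     # stored in north, synchronised to south
north.online = False
gateway.ingest("domid-0002", b"b")     # managed by south, queued for north
fetched = gateway.fetch("domid-0001")  # served by south while north is down
north.online = True
south.resynchronise()                  # north catches up
```

Because both clusters serve requests in normal operation, no equipment sits idle as a standby.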
38
In conclusion
We plan for generations of physical storage
Migration from one generation to the next
Allow changes of supplier
Purchase incrementally in modest quantities
Move quickly when required
Be cost conscious
We provide assurance that an object is held and re-presented as it was when it was ingested
We are designing a cost-effective large scale resilient solution
In summary: we take a long term view
39
Major Programmes/4
Web Archiving Programme
40
Structure of Programme
Web Archiving Programme is a collaborative initiative, broadly split across two consortia
UK Web Archiving Consortium
Developing a selective approach to web archiving, procuring a common web archiving infrastructure and software to begin archiving activities at the earliest opportunity
International Internet Preservation Consortium
Developing advanced web archiving technologies for the long-term, large-scale, continuous crawling requirements enabled through legislation
41
UK Web Archiving Consortium
Developing a selective approach to web archiving
License for PANDAS about to be signed with NLA
Sub-licenses with consortium partners and contractor to follow
ITT concluded with Magus Research winning the contract.
Implement a common web archiving infrastructure (lots of Linux machines + PANDAS)
Provide customisation/development of PANDAS
Provide help desk and support
42
International Internet Preservation Consortium
Developing advanced web archiving technologies
Smart Crawler
Continuous adaptive crawler, adjusting crawl priority on the fly
Based on IA Heritrix
Working on requirements now
Expect to begin tender process in June
Content Management
Archival formats
Framework
Metrics and Test Bed
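One way to picture a crawler "adjusting crawl priority on the fly" is a frontier ordered by next-due time, where pages that changed since the last visit are revisited sooner and unchanged pages less often. The sketch below is purely illustrative: the halving and doubling policy, the interval bounds and the class name are assumptions, and it does not use or describe the Heritrix API.

```python
import heapq


class Frontier:
    """Priority queue of URLs ordered by the time they are next due."""

    def __init__(self, min_interval=1.0, max_interval=64.0):
        self._heap = []      # entries are (next_due_time, url)
        self._interval = {}  # current revisit interval per URL
        self.min_interval = min_interval
        self.max_interval = max_interval

    def add(self, url, now=0.0, interval=8.0):
        self._interval[url] = interval
        heapq.heappush(self._heap, (now, url))

    def pop_due(self):
        """Return (due_time, url) for the most urgent URL."""
        return heapq.heappop(self._heap)

    def report(self, url, now, changed):
        # Assumed policy: halve the interval if the page changed,
        # double it otherwise, within fixed bounds.
        interval = self._interval[url]
        interval = interval / 2 if changed else interval * 2
        interval = max(self.min_interval, min(self.max_interval, interval))
        self._interval[url] = interval
        heapq.heappush(self._heap, (now + interval, url))


frontier = Frontier()
frontier.add("http://example.org/news")
frontier.add("http://example.org/about")
for _ in range(2):
    due, url = frontier.pop_due()
    # Pretend only the news page changes between visits.
    frontier.report(url, now=due, changed=url.endswith("/news"))
next_due, next_url = frontier.pop_due()
print(next_due, next_url)
```

After one round the frequently changing page is scheduled well ahead of the static one, which is the adaptive behaviour the slide describes.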
43
External Collaboration
44
Digital Library Collaborations/Partnerships
Current
UK Digital Preservation Coalition
Founder Member
TEL (The European Library Project)
Web Archiving UK
JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library Of Wales
International Internet Preservation Consortium
BNF, Library Of Congress, Internet Archive, National Archives & Library Of Canada, National Library Of
Australia, National Library Of Italy, National Libraries Of Nordic Countries
JISC Funded - Digital Curation Centre
Persistent Identifiers
DOI Foundation, European National Libraries (KB & DDB)
Resource Discovery
Union Catalogues (SUNCAT)
Digital Library Federation
45
Digital Library Collaborations/Partnerships
Potential
Secure Legal Deposit Network
6 Legal Deposit Libraries
Global Digital Format Registry
Potential Partners (National Archives, DLF)
Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
KB (Netherlands National Library & Other Partners – FP6 Bid)
Digital Rights Management
Potential Partners (Publishers, JISC)
Metadata
Publishers, Others ?
Authentication
JISC ?
Resource Discovery
Search Engine Vendors, Researchers
Others ???
46
Conclusions
• Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success
• Can You Work With Us?
47
Slide 2
British Computer Society
North London Branch
Major Programmes
Richard Boulderstone
July 27, 2004
1
Agenda
•
•
•
•
•
•
•
•
•
The British Library
Vision
Our Audiences/Customers
ILS
Digitisation
Digital Object Management
Web Archiving
Collaboration
Conclusions
Magna Carta
www.bl.uk
2
What Is The British Library ?
Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of
Science and Invention (1855), National Central Library (1916), and National
Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India
Office Library and Records in 1982, and British Institute of Recorded Sound in
1983
• Flagship building at St Pancras - largest public building project in Great
Britain in 20th century - opened in 1998
•
www.bl.uk
3
World-Class Research Library
Key Statistics 2002/3
•
•
•
•
•
•
•
•
•
•
150 million items
8.2 million items consulted or supplied
408,000 reading room visits
618,000 catalogue records created
554,000 items received on legal deposit
651 km shelf capacity 92% full add 12 km each year
18.5M Web Site Hits (www.bl.uk)
2,400 staff
£85.2 million Grant in Aid and £27.0 million trading income in 2001/2
Annual report - http://www.bl.uk/about/annual/latest.html
www.bl.uk
4
‘The World’s Knowledge’
Outcome Based Vision
… by aiding scientific advances
… by adding commercial value for
businesses
… by contributing to UK “knowledge
economy”
… through the pursuit of academic
excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural
wellbeing
… by giving information relevant to their
interests
… by helping to find the next medical
breakthrough
… by creating a link between the past,
present and future
To help people
advance
knowledge to
enrich lives
www.bl.uk
5
High
R+D
Industries
Prof.
Services
Creative
Industries Publishing
Industries
Resource Discovery
Bespoke Services
Research Services
Document Supply
Reprographics
Innovation Centre
School
Students
Libraries Teachers 11>18
On-site Visits
School Tours
Web Learning
SMEs
EDUCATION
Lifelong
Learner
Exhibitions
Events
Tours
Publishing
Visitors
(child + adult)
Lifelong
Learner
PUBLIC
BUSINESS
Lifelong
Learner
Postgraduate/
Undergraduate
RESEARCHER
Librarians
Scholars
Lifelong
Learner
Reading Rooms
Bespoke Services
Reprographics
Publishing
Document Supply
Searching Tools
LIBRARIES
Commercial
Researcher
Public
Libraries
Broadcasting
e.g. BBC
Publishing
e.g. OED
Document Supply
Resource Discovery
Training
Best Practice
H.E.
Libraries
Public
www.bl.uk
6
2
Major Programmes/1
Da Vinci Notebook
Integrated Library System (ILS)
Programme
7
ILS: Development
Data migration
Due to finish in a few days
16M+ BL records
10M+ records from other sources
Online ILS software
All online changes made (mainly interfaces) – final tests
Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
Most ones done for go live
Rest in priority order
www.bl.uk
8
ILS: Implementation
Training
Courses to end-users well underway
‘Practice’ system available
‘Search only’ training also underway
Testing
Functional testing (end to end) nearly complete
Performance poor – OPAC very slow
Automated stress testing (LoadRunner scripts)
eIS trying to find area of problem
Ex Libris experts flying over
Some security ‘hardening’ needed
www.bl.uk
9
ILS: Cutover from legacy systems
Now:
Temporary Aleph cataloguing
Phase 1 – internal processing
Staggered take-on of users to ease cutover problems
Merge ‘temporary’ records
7 June:
Phase 2 – reading rooms
Reading rooms closed for cutover 26-29 June
Mainly brand-new PCs etc rather than XP upgrade
30 June:
Phase 3 – remote users
Could be delayed major problems
30 July:
www.bl.uk
10
www.bl.uk
11
Future ILS development (ILS/2)
Current ILS development seen just as the start
Extra records
E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
E.g. Preservation records
Links to other new BL systems
E.g. Digital Object Management (images, web pages etc)
New releases of Ex Libris packages
www.bl.uk
12
Major Programmes/2
International Dunhuang Project
Digitisation Programme
13
Background
Digitisation Is The Process Of Converting Existing Physical Items Into Digital
Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character
Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image to Sub £1/Image.
Automatic Devices To Digitise Are Becoming Availalble But Are Expensive ~£150,000.
BL Has Had Fairly Ad Hoc Approach Driven By
External Funding Opportunities
Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project
Management
BL Has Created About 1.5M Digital Images So Far…
www.bl.uk
14
Digitisation Strategy
Digitisation Strategy Project Was Formerly Initiated On February 2, 2004
Key objectives for the project are to define:
Selection Criteria
Uniform Approach
Communications Plan
Sustainability
Intellectual Property Rights
External Relationship Management
Funding
Integration with DOMS
www.bl.uk
15
Project Status Information
Definitive Register of Projects
19 Complete
19 Current
20 Planning
JISC Sound (3,900 Hours)
JISC Newspapers (2M Pages of 750M Pages)
Chopin (Collaborative Project)
Early English Books Online
www.bl.uk
16
Major Programmes/3
Gutenberg Bible
Digital Object Management (DOM)
Programme
17
DOM Programme vision
Our mission is to enable the United Kingdom to preserve
and use its digital intellectual property forever
Our vision is create a management system for digital objects
that will
store and preserve any type of digital material in perpetuity
provide access to this material to users with appropriate permissions
ensure that the material is easy to find
ensure that users can view the material with contemporary applications
ensure that users can, where possible, experience material with the original look-and-feel
www.bl.uk
18
Introduction - history
Digital Library PFI
Mar 1997 – Dec 1998
Digital Library System
1999 – early 2002
Lessons
DOM Report
Nov 2002
The DOM Programme
Started September 2003
www.bl.uk
19
Drivers for the BL DOM Programme
Legal deposit legislation for non-print material was granted royal assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12T of material per year (with 50 year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….
We need a generic and cost-effective approach for the secure long term
storage of digital material that is produced by numerous initiatives
www.bl.uk
20
DOM – many topics to address
HIGH
ILS
WEB
ARCHIVING
DIGITISATION
PROGRAMME
ESTIMATED SIZE OF COMPONENT
LDEP
WORKFLOW
RESOURCE
DISCOVERY
LDLSE
VDEP
RIGHTS
MANAGEMENT
METADATA
DEFINITION
TECHNICAL
REQUIREMENTS
FILE
FORMAT
REGISTRY
PERSISTENT
IDENTIFIERS
FILE
CONVERSION
UTILITIES
AUTHENTICATION
PROTOTYPES
SDM
RADM
INTERFACES
STRATEGY
DEVELOPMENT
LOW
LOW
COMPONENT AMBIGUITY / COMPLEXITY
Started
Planned
Non-DOM projects
Planned co-operation
HIGH
LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications
www.bl.uk
21
Scope - life cycle of objects
Collection
Selection
Acquisition
Accession
Description
Preservation
Storage
Preservation
Access
Resource discovery
Delivery
Rendering
www.bl.uk
22
Scope – objects and processes
Preservation store
Preserves the bit stream in perpetuity
Access store
Access versions
Limited formats – in the flavour of the era
Metadata to support resource discovery
Descriptive, Administrative, Links with existing tools e.g. Integrated
Library System (ILS)
Workflow
Ingest, e.g. Legal Deposit processing
www.bl.uk
23
ACCESS
DOM
Resource Discovery
Delivery
Digital Rights Management
Shared services
DOM
Storage
Signing
Authentication
Metadata
Persistent ID
Ingest
DONATIONS
DOCUMENT SUPPLY
Publishers
Archives
Grey Literature
Non-Serial
Store
Archiving
Operational
Stores
LEGAL DEPOSIT
LDL Secure
Environment
Legal Deposit
Items
Legal Deposit
Processing
WEB
ARCHIVING
DIGITISATION
St Pancras
Studios
Newspapers
NSA
www.bl.uk
24
Timeline
Prototype will provide a
basic preservationquality digital object
storage module
Definition. R0
ET approve
Business Case
& Timeline
•Consolidate R0 into operational
system
•Provide preservation-quality digital
store for materials received under
Voluntary Deposit of Electronic
Publications (VDEP)
•Integrate it with the existing VDEP
front-end
BC
R0
Operat’l Storage
Sub System. R1
•Support ingest for a
major content stream
•Integrate with core
Library systems as
required
R1
1st Content
Stream ingest. R2
R2
LDEP - initial
format. R3 & R4
Provide functionality
for material covered by
LDEP secondary
legislation
R3
R4
Open DOM to new
projects. R5+
2003
2004
2005
www.bl.uk
25
DOM: Project definition - 1
Example issues
Functional Architecture “What”
digital rights, file
formats, etc
Prototyping – assessing market solutions
allow changes to
new suppliers,
relationships to ILS,
other projects etc
Logical architecture “how – overall architecture”
Prototyping - basic functioning architecture
how do we build it
cost-effectively
today, supplier
selection criteria
Physical architecture “how – storage & specifics”
Prototyping - principal solutions and options
• Business case
• Planning – incremental
implementation phases
Cross team workshops – reviewing progress, debating detailed technical
issues, planning immediate priorities, risk management & way forward
www.bl.uk
26
DOM: Project definition - 2
Approach is to be incremental and not ‘Big Bang’
We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a
good solution
A principal goal is to define:
An overall long term “logical architecture”
Within which, there will be successive generations of physical architectures
We are understanding the storage marketplace, and we will use the knowledge to manage
procurement
We are certain that we will need >500T of storage but we are uncertain when – we thus need
flexible scalable procurement
www.bl.uk
27
DOM architecture - overview
Resource Discovery
Non-cat
ILS
based RD
Others
Rights Management
LDEP
DOMID
OBJECT
DOM Storage Service
Compound
objects/relations
Integrity
Authenticity
Local resource locator
Object
Atomic
Objects
Unique persistent
identifier (DOMID)
Doc
supply
DOMID is
mapped to
node/vol/LRL
DOM Physical Storage
www.bl.uk
28
DOM System (release 3)
Aleph
Storage subsystem
Storage
subsystem
Access
Mailroom
Publishers
Shared services
Administration
DOM System
www.bl.uk
29
DOM logical architecture –
integrity and authenticity
Integrity:
System has capability to continuously monitor the object store to detect
object corruption
It would then initiate object recovery
Authenticity:
A process is defined to provide long-term assurance that an object that
is re-presented is as it was when it was ingested
Based on the use of cryptographic signing techniques
Each object is signed when it is ingested
The signature is verified when required
The signing mechanism is “tightly” controlled
www.bl.uk
30
Procuring physical storage in volume
A major cost is in physical storage
The market for storage systems is changing rapidly, and this implies that “lock-in” is
not sensible
We thus need flexibility to change supplier over time
Cost of storage is reducing by 30-40% per year
Hence procure on rolling basis just ahead of demand
Replace storage on a rolling basis on expiry of warranty
The rolling programmes imply the need to be able to support a heterogeneous
product solution
The design of the logical architecture thus supports storage sourced from multiple
storage vendors
www.bl.uk
31
Disaster tolerance and the organisation
of storage clusters
One can obtain commercial disaster recovery (DR) solutions for common equipment
configurations
However one cannot obtain such solutions for systems comprising multi-100 Tb systems
So we must build in the need for DR into the design of the system
A single site solution, subject to a common-mode disaster, would suffer considerable loss
of availability after a disaster, and so is not acceptable
This implies that we need a multi-site solution
Conventionally these are based on a master-standby where only 50% of kit is delivering
normal service
Our design is based on the use of multiple autonomous independent peer clusters that
cross-synchronise
so 100% of the kit delivers normal service
www.bl.uk
32
DOM architecture in the context of the
storage solution market
The dominant segment of the market focuses on delivering performance within
a highly resilient single cluster
However:
Many of our objects will be rarely accessed
so we do not want to pay for “maximised” performance we do not need
We have resilience by using multiple clusters, hence we have a reduced
need for resilience within a cluster
so we do not want to pay for “maximised” resilience we do not need
We are using these drivers to design a cost-effective large scale resilient
solution
www.bl.uk
33
DOM storage subsystem architecture - overview
DOM
Storage
gateway
DOM
Storage
gateway
DOM
Storage
Service
DOM
Storage
Service
DOM
Physical
Storage
Storage cluster
DOM
Shared
Services
• Unique ID
• Signing
• Logging
DOM
Physical
Storage
Storage cluster
www.bl.uk
34
DOM storage subsystem architecture - access
Normal access/delivery is from the local storage cluster
[Diagram: two storage clusters, each made up of a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, alongside the DOM shared central services (Unique ID, Signing, Logging)]
35
DOM storage subsystem architecture - access
When a cluster is off-line, access/delivery is from a remote storage cluster
[Diagram: two storage clusters, each made up of a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, alongside the DOM shared central services (Unique ID, Signing, Logging)]
36
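The access behaviour on the two slides above (read locally, fall back to a remote cluster when the local one is off-line) can be sketched as below. The `StorageCluster` class, the `fetch` function and the identifier `dom-1` are hypothetical names for illustration, not part of the DOM design as presented.

```python
class StorageCluster:
    """Hypothetical stand-in for one DOM storage cluster."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.objects = {}  # DOMID -> bytes

    def get(self, dom_id):
        if not self.online:
            raise ConnectionError(f"cluster {self.name} is off-line")
        return self.objects[dom_id]


def fetch(dom_id, local, remotes):
    """Read from the local cluster, falling back to remote peers in order."""
    for cluster in [local, *remotes]:
        try:
            return cluster.name, cluster.get(dom_id)
        except ConnectionError:
            continue  # this cluster is off-line, so try the next peer
    raise ConnectionError("no storage cluster is on-line")


site_a, site_b = StorageCluster("A"), StorageCluster("B")
site_a.objects["dom-1"] = site_b.objects["dom-1"] = b"payload"

print(fetch("dom-1", site_a, [site_b]))  # served by the local cluster
site_a.online = False
print(fetch("dom-1", site_a, [site_b]))  # served by the remote cluster
```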
DOM storage subsystem architecture - ingest
Normal ingest is to the local storage cluster, and then the remote cluster is synchronised
[Diagram: two storage clusters, each made up of a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, alongside the DOM shared central services (Unique ID, Signing, Logging); arrows labelled "Store" and "Synchronise remote store"]
37
DOM storage subsystem architecture - ingest
When a cluster is off-line, ingest is managed by the remote storage cluster and the local cluster is synchronised later
[Diagram: two storage clusters, each made up of a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, alongside the DOM shared central services (Unique ID, Signing, Logging); arrows labelled "Store" and "Synchronise remote store later"]
38
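The two ingest cases above (synchronise the peer immediately, or queue the copy until an off-line cluster returns) can be sketched as below. The `Cluster` class, its `pending` queue and the `ingest` helper are assumptions made for illustration; the slides do not describe how the real synchronisation is implemented.

```python
from collections import deque


class Cluster:
    """Hypothetical stand-in for one DOM storage cluster."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.objects = {}        # DOMID -> bytes
        self.pending = deque()   # DOMIDs still to be copied to the peer
        self.peer = None

    def ingest(self, dom_id, data):
        """Store locally, then synchronise the peer now or queue for later."""
        self.objects[dom_id] = data
        self.pending.append(dom_id)
        self.synchronise()

    def synchronise(self):
        """Copy queued objects to the peer for as long as it is on-line."""
        while self.pending and self.peer.online:
            dom_id = self.pending.popleft()
            self.peer.objects[dom_id] = self.objects[dom_id]


def ingest(dom_id, data, local, remote):
    """Ingest via the local cluster, or via the remote one if local is off-line."""
    target = local if local.online else remote
    if not target.online:
        raise ConnectionError("no storage cluster is on-line")
    target.ingest(dom_id, data)
    return target.name


a, b = Cluster("A"), Cluster("B")
a.peer, b.peer = b, a

ingest("dom-1", b"one", a, b)  # normal case: stored on A, copied to B at once
a.online = False
ingest("dom-2", b"two", a, b)  # A off-line: stored on B, copy to A is queued
a.online = True
b.synchronise()                # A is back, so B drains its queue
```

Because each cluster keeps its own queue, either peer can accept ingest on its own, which matches the autonomous peer arrangement rather than a master-standby one.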
In conclusion
We plan for generations of physical storage
Migration from one generation to the next
Allow changes of supplier
Purchase incrementally in modest quantities
Move quickly when required
Be cost conscious
We provide assurance that an object is held and re-presented as it was when it was
ingested
We are designing a cost-effective large scale resilient solution
In summary: we take a long term view
39
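The assurance that an object is re-presented as it was when ingested can be illustrated with a small sign-and-verify sketch. It uses an HMAC-SHA256 tag from the Python standard library as a stand-in for whatever cryptographic signing technique the DOM system actually uses; the key and function names are hypothetical.

```python
import hashlib
import hmac

# Hypothetical key for illustration only. A real signing key would be
# held under tight control and would not appear in source code.
SIGNING_KEY = b"example-key-not-for-production"


def sign(data: bytes) -> str:
    """Return a hex HMAC-SHA256 tag computed over the object's bit stream."""
    return hmac.new(SIGNING_KEY, data, hashlib.sha256).hexdigest()


def verify(data: bytes, signature: str) -> bool:
    """Check that the object still matches the tag recorded at ingest."""
    return hmac.compare_digest(sign(data), signature)


original = b"digital object as ingested"
tag = sign(original)                  # recorded when the object is ingested

print(verify(original, tag))          # unchanged object
print(verify(original + b"!", tag))   # altered object
```

A symmetric HMAC is used here only to keep the sketch self-contained; an asymmetric signature scheme would let verification happen without access to the signing key.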
Major Programmes/4
Web Archiving Programme
40
Structure of Programme
The Web Archiving Programme is a collaborative
initiative, broadly implemented across two
consortia
UK Web Archiving Consortium
Developing a selective approach to web archiving, procuring a common
web archiving infrastructure and software to begin archiving activities at
the earliest opportunity
International Internet Preservation Consortium
Developing advanced web archiving technologies for the long-term,
large-scale, continuous crawling requirements enabled through
legislation
41
UK Web Archiving Consortium
Developing a selective approach to web archiving
Licence for PANDAS about to be signed with NLA
Sub-licences with consortium partners and contractor to follow
ITT concluded with Magus Research winning the contract
Implement a common web archiving infrastructure (lots of Linux
machines + PANDAS)
Provide customisation/development of PANDAS
Provide help desk and support
42
International Internet Preservation
Consortium
Developing advanced web archiving technologies
Smart Crawler
Continuous adaptive crawler, adjusting crawl priority on the fly
Based on IA Heritrix
Working on requirements now
Expect to begin tender process in June
Content Management
Archival formats
Framework
Metrics and Test Bed
43
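The idea of a crawler that adjusts crawl priority on the fly can be sketched with a priority queue that tolerates re-prioritisation. This is a generic illustration built on Python's `heapq`; it is not the Heritrix API, and the class name, method names and URLs are hypothetical.

```python
import heapq
import itertools


class CrawlQueue:
    """Priority queue of URLs whose priorities can change while crawling."""

    def __init__(self):
        self._heap = []                    # entries: (priority, tie-breaker, url)
        self._current = {}                 # url -> latest priority
        self._counter = itertools.count()  # keeps insertion order for ties

    def schedule(self, url, priority):
        """Add a URL, or re-prioritise one already queued (lower = sooner)."""
        self._current[url] = priority
        heapq.heappush(self._heap, (priority, next(self._counter), url))

    def next_url(self):
        """Pop the most urgent URL, skipping entries made stale by updates."""
        while self._heap:
            priority, _, url = heapq.heappop(self._heap)
            if self._current.get(url) == priority:
                del self._current[url]
                return url
        return None


queue = CrawlQueue()
queue.schedule("http://www.bl.uk/", 5)
queue.schedule("http://example.org/news", 3)
queue.schedule("http://www.bl.uk/", 1)  # page seen to change often: promote it
print(queue.next_url())
print(queue.next_url())
```

Old heap entries are left in place and discarded lazily when popped, which keeps a priority change as cheap as a single push.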
External Collaboration
44
Digital Library Collaborations/Partnerships
Current
UK Digital Preservation Coalition
Founder Member
TEL (The European Library Project)
Web Archiving UK
JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library Of Wales
International Internet Preservation Consortium
BNF, Library Of Congress, Internet Archive, National Archives & Library Of Canada, National Library Of
Australia, National Library Of Italy, National Libraries Of Nordic Countries
JISC Funded - Digital Curation Centre
Persistent Identifiers
DOI Foundation, European National Libraries (KB & DDB)
Resource Discovery
Union Catalogues (SUNCAT)
Digital Library Federation
45
Digital Library Collaborations/Partnerships
Potential
Secure Legal Deposit Network
6 Legal Deposit Libraries
Global Digital Format Registry
Potential Partners (National Archives, DLF)
Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
KB (Netherlands National Library & Other Partners – FP6 Bid)
Digital Rights Management
Potential Partners (Publishers, JISC)
Metadata
Publishers, Others ?
Authentication
JISC ?
Resource Discovery
Search Engine Vendors, Researchers
Others ???
46
Conclusions
• Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success
• Can You Work With Us?
47
Slide 4
British Computer Society
North London Branch
Major Programmes
Richard Boulderstone
July 27, 2004
1
Agenda
•
•
•
•
•
•
•
•
•
The British Library
Vision
Our Audiences/Customers
ILS
Digitisation
Digital Object Management
Web Archiving
Collaboration
Conclusions
Magna Carta
www.bl.uk
2
What Is The British Library ?
Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of
Science and Invention (1855), National Central Library (1916), and National
Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India
Office Library and Records in 1982, and British Institute of Recorded Sound in
1983
• Flagship building at St Pancras - largest public building project in Great
Britain in 20th century - opened in 1998
•
www.bl.uk
3
World-Class Research Library
Key Statistics 2002/3
•
•
•
•
•
•
•
•
•
•
150 million items
8.2 million items consulted or supplied
408,000 reading room visits
618,000 catalogue records created
554,000 items received on legal deposit
651 km shelf capacity 92% full add 12 km each year
18.5M Web Site Hits (www.bl.uk)
2,400 staff
£85.2 million Grant in Aid and £27.0 million trading income in 2001/2
Annual report - http://www.bl.uk/about/annual/latest.html
www.bl.uk
4
‘The World’s Knowledge’
Outcome Based Vision
… by aiding scientific advances
… by adding commercial value for
businesses
… by contributing to UK “knowledge
economy”
… through the pursuit of academic
excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural
wellbeing
… by giving information relevant to their
interests
… by helping to find the next medical
breakthrough
… by creating a link between the past,
present and future
To help people
advance
knowledge to
enrich lives
www.bl.uk
5
High
R+D
Industries
Prof.
Services
Creative
Industries Publishing
Industries
Resource Discovery
Bespoke Services
Research Services
Document Supply
Reprographics
Innovation Centre
School
Students
Libraries Teachers 11>18
On-site Visits
School Tours
Web Learning
SMEs
EDUCATION
Lifelong
Learner
Exhibitions
Events
Tours
Publishing
Visitors
(child + adult)
Lifelong
Learner
PUBLIC
BUSINESS
Lifelong
Learner
Postgraduate/
Undergraduate
RESEARCHER
Librarians
Scholars
Lifelong
Learner
Reading Rooms
Bespoke Services
Reprographics
Publishing
Document Supply
Searching Tools
LIBRARIES
Commercial
Researcher
Public
Libraries
Broadcasting
e.g. BBC
Publishing
e.g. OED
Document Supply
Resource Discovery
Training
Best Practice
H.E.
Libraries
Public
www.bl.uk
6
2
Major Programmes/1
Da Vinci Notebook
Integrated Library System (ILS)
Programme
7
ILS: Development
Data migration
Due to finish in a few days
16M+ BL records
10M+ records from other sources
Online ILS software
All online changes made (mainly interfaces) – final tests
Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
Most ones done for go live
Rest in priority order
www.bl.uk
8
ILS: Implementation
Training
Courses to end-users well underway
‘Practice’ system available
‘Search only’ training also underway
Testing
Functional testing (end to end) nearly complete
Performance poor – OPAC very slow
Automated stress testing (LoadRunner scripts)
eIS trying to find area of problem
Ex Libris experts flying over
Some security ‘hardening’ needed
www.bl.uk
9
ILS: Cutover from legacy systems
Now:
Temporary Aleph cataloguing
Phase 1 – internal processing
Staggered take-on of users to ease cutover problems
Merge ‘temporary’ records
7 June:
Phase 2 – reading rooms
Reading rooms closed for cutover 26-29 June
Mainly brand-new PCs etc rather than XP upgrade
30 June:
Phase 3 – remote users
Could be delayed major problems
30 July:
www.bl.uk
10
www.bl.uk
11
Future ILS development (ILS/2)
Current ILS development seen just as the start
Extra records
E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
E.g. Preservation records
Links to other new BL systems
E.g. Digital Object Management (images, web pages etc)
New releases of Ex Libris packages
www.bl.uk
12
Major Programmes/2
International Dunhuang Project
Digitisation Programme
13
Background
Digitisation Is The Process Of Converting Existing Physical Items Into Digital
Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character
Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image to Sub £1/Image.
Automatic Devices To Digitise Are Becoming Availalble But Are Expensive ~£150,000.
BL Has Had Fairly Ad Hoc Approach Driven By
External Funding Opportunities
Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project
Management
BL Has Created About 1.5M Digital Images So Far…
www.bl.uk
14
Digitisation Strategy
Digitisation Strategy Project Was Formerly Initiated On February 2, 2004
Key objectives for the project are to define:
Selection Criteria
Uniform Approach
Communications Plan
Sustainability
Intellectual Property Rights
External Relationship Management
Funding
Integration with DOMS
www.bl.uk
15
Project Status Information
Definitive Register of Projects
19 Complete
19 Current
20 Planning
JISC Sound (3,900 Hours)
JISC Newspapers (2M Pages of 750M Pages)
Chopin (Collaborative Project)
Early English Books Online
www.bl.uk
16
Major Programmes/3
Gutenberg Bible
Digital Object Management (DOM)
Programme
17
DOM Programme vision
Our mission is to enable the United Kingdom to preserve
and use its digital intellectual property forever
Our vision is create a management system for digital objects
that will
store and preserve any type of digital material in perpetuity
provide access to this material to users with appropriate permissions
ensure that the material is easy to find
ensure that users can view the material with contemporary applications
ensure that users can, where possible, experience material with the original look-and-feel
www.bl.uk
18
Introduction - history
Digital Library PFI
Mar 1997 – Dec 1998
Digital Library System
1999 – early 2002
Lessons
DOM Report
Nov 2002
The DOM Programme
Started September 2003
www.bl.uk
19
Drivers for the BL DOM Programme
Legal deposit legislation for non-print material was granted royal assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12T of material per year (with 50 year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….
We need a generic and cost-effective approach for the secure long term
storage of digital material that is produced by numerous initiatives
www.bl.uk
20
DOM – many topics to address
HIGH
ILS
WEB
ARCHIVING
DIGITISATION
PROGRAMME
ESTIMATED SIZE OF COMPONENT
LDEP
WORKFLOW
RESOURCE
DISCOVERY
LDLSE
VDEP
RIGHTS
MANAGEMENT
METADATA
DEFINITION
TECHNICAL
REQUIREMENTS
FILE
FORMAT
REGISTRY
PERSISTENT
IDENTIFIERS
FILE
CONVERSION
UTILITIES
AUTHENTICATION
PROTOTYPES
SDM
RADM
INTERFACES
STRATEGY
DEVELOPMENT
LOW
LOW
COMPONENT AMBIGUITY / COMPLEXITY
Started
Planned
Non-DOM projects
Planned co-operation
HIGH
LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications
www.bl.uk
21
Scope - life cycle of objects
Collection
Selection
Acquisition
Accession
Description
Preservation
Storage
Preservation
Access
Resource discovery
Delivery
Rendering
www.bl.uk
22
Scope – objects and processes
Preservation store
Preserves the bit stream in perpetuity
Access store
Access versions
Limited formats – in the flavour of the era
Metadata to support resource discovery
Descriptive, Administrative, Links with existing tools e.g. Integrated
Library System (ILS)
Workflow
Ingest, e.g. Legal Deposit processing
www.bl.uk
23
ACCESS
DOM
Resource Discovery
Delivery
Digital Rights Management
Shared services
DOM
Storage
Signing
Authentication
Metadata
Persistent ID
Ingest
DONATIONS
DOCUMENT SUPPLY
Publishers
Archives
Grey Literature
Non-Serial
Store
Archiving
Operational
Stores
LEGAL DEPOSIT
LDL Secure
Environment
Legal Deposit
Items
Legal Deposit
Processing
WEB
ARCHIVING
DIGITISATION
St Pancras
Studios
Newspapers
NSA
www.bl.uk
24
Timeline
Definition (R0): Prototype will provide a basic preservation-quality digital object storage module
Business Case (BC): ET approve Business Case & Timeline
Operational Storage Sub System (R1):
• Consolidate R0 into operational system
• Provide preservation-quality digital store for materials received under Voluntary Deposit of Electronic Publications (VDEP)
• Integrate it with the existing VDEP front-end
1st Content Stream ingest (R2):
• Support ingest for a major content stream
• Integrate with core Library systems as required
LDEP - initial format (R3 & R4): Provide functionality for material covered by LDEP secondary legislation
Open DOM to new projects (R5+)
Time axis: 2003, 2004, 2005
25
DOM: Project definition - 1
Example issues
Functional Architecture “What”
digital rights, file formats, etc
Prototyping – assessing market solutions
allow changes to new suppliers, relationships to ILS, other projects etc
Logical architecture “how – overall architecture”
Prototyping - basic functioning architecture
how do we build it cost-effectively today, supplier selection criteria
Physical architecture “how – storage & specifics”
Prototyping - principal solutions and options
• Business case
• Planning – incremental implementation phases
Cross team workshops – reviewing progress, debating detailed technical issues, planning immediate priorities, risk management & way forward
26
DOM: Project definition - 2
Approach is to be incremental and not ‘Big Bang’
We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a good solution
A principal goal is to define:
An overall long term “logical architecture”
Within which, there will be successive generations of physical architectures
We are building an understanding of the storage marketplace, and we will use that knowledge to manage procurement
We are certain that we will need >500 TB of storage but we are uncertain when, so we need flexible, scalable procurement
27
DOM architecture - overview
Resource Discovery
Non-cat
ILS
based RD
Others
Rights Management
LDEP
DOMID
OBJECT
DOM Storage Service
Compound
objects/relations
Integrity
Authenticity
Local resource locator
Object
Atomic
Objects
Unique persistent identifier (DOMID)
Doc supply
DOMID is mapped to node/vol/LRL
DOM Physical Storage
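The mapping from a DOMID to node/vol/LRL shown on this slide could be sketched roughly as follows. This is only an illustration, not the DOM implementation: the class names, the example identifier and the reading of "node" and "vol" as storage node and volume are assumptions; LRL is the local resource locator named on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    """One physical copy of an object (field meanings are assumed)."""
    node: str  # storage node holding the copy
    vol: str   # volume on that node
    lrl: str   # local resource locator within the volume


class LocationIndex:
    """Hypothetical index from persistent DOMID to physical locations."""

    def __init__(self):
        self._locations = {}

    def register(self, domid, location):
        # The DOMID never changes; locations can be added as storage is replaced.
        self._locations.setdefault(domid, []).append(location)

    def resolve(self, domid):
        # Return a copy so callers cannot alter the index by accident.
        return list(self._locations.get(domid, []))


index = LocationIndex()
index.register("domid-0001", Location("node-a", "vol-03", "objects/0001"))
index.register("domid-0001", Location("node-b", "vol-11", "objects/0001"))
```

Keeping the identifier separate from the location is what lets a later generation of physical storage be introduced without changing any DOMID.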
28
DOM System (release 3)
Aleph
Storage subsystem
Storage
subsystem
Access
Mailroom
Publishers
Shared services
Administration
DOM System
29
DOM logical architecture – integrity and authenticity
Integrity:
System has capability to continuously monitor the object store to detect object corruption
It would then initiate object recovery
Authenticity:
A process is defined to provide long-term assurance that an object that is re-presented is as it was when it was ingested
Based on the use of cryptographic signing techniques
Each object is signed when it is ingested
The signature is verified when required
The signing mechanism is “tightly” controlled
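The integrity and authenticity points above can be illustrated with a minimal sketch. It is an assumption-laden stand-in rather than the DOM design: a keyed HMAC-SHA256 digest from the Python standard library takes the place of a real cryptographic signature, the key, function names and identifiers are hypothetical, and recovery simply copies from a second in-memory store.

```python
import hashlib
import hmac

# Hypothetical key; in a real system it would be held by a tightly controlled signing service.
SIGNING_KEY = b"demo-key-for-illustration-only"


def sign(data):
    """Keyed digest of the object bytes (stand-in for a real signature)."""
    return hmac.new(SIGNING_KEY, data, hashlib.sha256).hexdigest()


def ingest(store, signatures, domid, data):
    """Store the object and record its signature at ingest time."""
    store[domid] = data
    signatures[domid] = sign(data)


def verify(store, signatures, domid):
    """True if the stored bytes still match the signature made at ingest."""
    return hmac.compare_digest(sign(store[domid]), signatures[domid])


def scan(store, signatures):
    """Integrity sweep: identifiers whose stored bytes no longer verify."""
    return sorted(d for d in store if not verify(store, signatures, d))


def recover(store, replica, corrupted):
    """Replace corrupted objects with the copies held in another store."""
    for domid in corrupted:
        store[domid] = replica[domid]
```

A monitoring process would call `scan` on a schedule and pass its result to `recover`; verification on delivery is a single call to `verify`.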
30
Procuring physical storage in volume
A major cost is in physical storage
The market for storage systems is changing rapidly, and this implies that “lock-in” is not sensible
We thus need flexibility to change supplier over time
Cost of storage is reducing by 30-40% per year
Hence procure on rolling basis just ahead of demand
Replace storage on a rolling basis on expiry of warranty
The rolling programmes imply the need to be able to support a heterogeneous product solution
The design of the logical architecture thus supports storage sourced from multiple storage vendors
31
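The procurement argument on the slide above (prices falling 30-40% per year, so buy just ahead of demand) can be checked with a little arithmetic. The figures in this sketch are purely illustrative assumptions: a starting price of 1,000 per TB, demand of 100 TB per year for five years, and a 35% annual decline taken as the midpoint of the quoted range. It ignores warranty replacement, power and staff costs.

```python
def price_per_tb(year, start_price, annual_decline):
    """Price in a given year if cost falls by a fixed fraction each year."""
    return start_price * (1.0 - annual_decline) ** year


def upfront_cost(demand_tb, start_price):
    """Buy all capacity in year 0 at the year 0 price."""
    return sum(demand_tb) * start_price


def rolling_cost(demand_tb, start_price, annual_decline):
    """Buy each year's capacity in that year, at that year's price."""
    return sum(
        tb * price_per_tb(year, start_price, annual_decline)
        for year, tb in enumerate(demand_tb)
    )


demand = [100, 100, 100, 100, 100]  # TB needed per year (assumed)
upfront = upfront_cost(demand, start_price=1000.0)
rolling = rolling_cost(demand, start_price=1000.0, annual_decline=0.35)
print(f"up-front: {upfront:,.0f}  rolling: {rolling:,.0f}")
```

Under these assumptions the rolling purchase costs roughly half of the up-front purchase, which is the case for buying incrementally and for keeping the architecture able to accept storage from different vendors over time.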
Disaster tolerance and the organisation of storage clusters
One can obtain commercial disaster recovery (DR) solutions for common equipment configurations
However, one cannot obtain such solutions for multi-100 TB systems
So we must build DR into the design of the system
A single site solution, subject to a common-mode disaster, would suffer considerable loss of availability after a disaster, and so is not acceptable
This implies that we need a multi-site solution
Conventionally these are based on a master-standby arrangement where only 50% of kit is delivering normal service
Our design is based on the use of multiple autonomous independent peer clusters that cross-synchronise
so 100% of the kit delivers normal service
32
DOM architecture in the context of the storage solution market
The dominant segment of the market focuses on delivering performance within a highly resilient single cluster
However:
Many of our objects will be rarely accessed
so we do not want to pay for “maximised” performance we do not need
We have resilience by using multiple clusters, hence we have a reduced need for resilience within a cluster
so we do not want to pay for “maximised” resilience we do not need
We are using these drivers to design a cost-effective large scale resilient solution
33
DOM storage subsystem architecture - overview
[Diagram: two storage clusters, each with a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage; DOM Shared Services provide Unique ID, Signing and Logging]
34
DOM storage subsystem architecture - access
Normal access/delivery is from local storage cluster
[Diagram: two storage clusters, each with a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage; DOM Shared central Services provide Unique ID, Signing and Logging]
35
DOM storage subsystem architecture - access
When a cluster is off-line then access/delivery is from a remote storage cluster
[Diagram: two storage clusters, each with a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage; DOM Shared central Services provide Unique ID, Signing and Logging]
36
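The access behaviour on the two slides above (deliver from the local cluster, fall back to a remote cluster when the local one is off-line) might look something like the sketch below. The `Cluster` class, the `deliver` function and the site names are hypothetical, and in-memory dictionaries stand in for physical storage.

```python
class Cluster:
    """Toy stand-in for one storage cluster."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.objects = {}

    def read(self, domid):
        if not self.online:
            raise ConnectionError(f"{self.name} is off-line")
        return self.objects[domid]


def deliver(domid, local, remotes):
    """Return (cluster name, object bytes), preferring the local cluster."""
    for cluster in [local, *remotes]:
        try:
            return cluster.name, cluster.read(domid)
        except ConnectionError:
            continue  # this cluster is off-line, so try the next peer
    raise LookupError(f"no on-line cluster can deliver {domid}")


site_a, site_b = Cluster("site-a"), Cluster("site-b")
site_a.objects["dom-1"] = b"payload"
site_b.objects["dom-1"] = b"payload"
```

Because every cluster is a peer holding the same objects, the caller does not need to know which site answered; the returned cluster name is there only for logging.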
DOM storage subsystem architecture - ingest
Normal ingest is to the local storage cluster and then the remote cluster is synchronised
[Diagram: two storage clusters, each with a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage; DOM Shared central Services provide Unique ID, Signing and Logging. Arrows: Store; Synchronise remote store]
37
DOM storage subsystem architecture - ingest
When a cluster is off-line then ingest is managed by the remote storage cluster and the local cluster is synchronised later
[Diagram: two storage clusters, each with a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage; DOM Shared central Services provide Unique ID, Signing and Logging. Arrows: Store; Synchronise remote store later]
38
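The two ingest cases above (store locally then synchronise the remote cluster; or, if the local cluster is off-line, store remotely and synchronise later) can be sketched as follows. All names are hypothetical and the pending queue is an assumed mechanism; the slides do not say how deferred synchronisation is tracked.

```python
class Cluster:
    """Toy stand-in for one storage cluster."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.objects = {}
        self.pending = []  # (peer, domid) pairs still to be copied to a peer


def ingest(domid, data, local, remote):
    """Store on the first on-line cluster, then synchronise or queue for the other."""
    primary, secondary = (local, remote) if local.online else (remote, local)
    if not primary.online:
        raise ConnectionError("no on-line cluster available for ingest")
    primary.objects[domid] = data
    if secondary.online:
        secondary.objects[domid] = data              # synchronise straight away
    else:
        primary.pending.append((secondary, domid))   # synchronise later
    return primary.name


def synchronise(cluster):
    """Copy queued objects to peers that are back on-line; return the number copied."""
    still_pending, copied = [], 0
    for peer, domid in cluster.pending:
        if peer.online:
            peer.objects[domid] = cluster.objects[domid]
            copied += 1
        else:
            still_pending.append((peer, domid))
    cluster.pending = still_pending
    return copied
```

A real system would also sign each object and log the operation through the shared services; those steps are left out here to keep the synchronisation logic visible.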
In conclusion
We plan for generations of physical storage
Migration from one generation to the next
Allow changes of supplier
Purchase incrementally in modest quantities
Move quickly when required
Be cost conscious
We provide assurance that an object is held and re-presented as when it was ingested
We are designing a cost-effective large scale resilient solution
In summary: we take a long term view
39
Major Programmes/4
Web Archiving Programme
40
Structure of Programme
Web Archiving Programme is a collaborative initiative, roughly implemented across two consortiums
UK Web Archiving Consortium
Developing a selective approach to web archiving, procuring a common web archiving infrastructure and software to begin archiving activities at the earliest opportunity
International Internet Preservation Consortium
Developing advanced web archiving technologies for the long-term, large-scale, continuous crawling requirements enabled through legislation
41
UK Web Archiving Consortium
Developing a selective approach to web archiving
License for PANDAS about to be signed with NLA
Sub-licenses with consortium partners and contractor to follow
ITT concluded with Magus Research winning the contract
Implement a common web archiving infrastructure (lots of Linux machines + PANDAS)
Provide customisation/development of PANDAS
Provide help desk and support
42
International Internet Preservation Consortium
Developing advanced web archiving technologies
Smart Crawler
Continuous adaptive crawler, adjusting crawl priority on the fly
Based on IA Heritrix
Working on requirements now
Expect to begin tender process in June
Content Management
Archival formats
Framework
Metrics and Test Bed
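The "continuous adaptive crawler, adjusting crawl priority on the fly" idea from the Smart Crawler item above can be illustrated with a small priority queue. This is a generic sketch and not the Heritrix API or the consortium's design: the `Frontier` class, its method names and the halving/doubling rule are all assumptions made for the example.

```python
import heapq
import itertools


class Frontier:
    """Crawl queue in which a URL's priority can be changed while queued."""

    def __init__(self):
        self._heap = []
        self._priority = {}                # current priority per URL (lower = sooner)
        self._counter = itertools.count()  # tie-breaker keeps insertion order

    def schedule(self, url, priority):
        # Re-scheduling a URL records the new priority; the old heap entry goes stale.
        self._priority[url] = priority
        heapq.heappush(self._heap, (priority, next(self._counter), url))

    def next_url(self):
        while self._heap:
            priority, _, url = heapq.heappop(self._heap)
            if self._priority.get(url) == priority:  # skip stale entries
                del self._priority[url]
                return url
        return None

    def record_fetch(self, url, changed, priority):
        # Assumed rule: pages that changed are revisited sooner, stable pages later.
        new_priority = priority / 2 if changed else priority * 2
        self.schedule(url, new_priority)
        return new_priority
```

Stale heap entries are discarded lazily when popped, which avoids searching the heap each time a priority changes.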
43
External Collaboration
44
Digital Library Collaborations/Partnerships
Current
UK Digital Preservation Coalition
Founder Member
TEL (The European Library Project)
Web Archiving UK
JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library Of Wales
International Internet Preservation Consortium
BNF, Library Of Congress, Internet Archive, National Archives & Library Of Canada, National Library Of Australia, National Library Of Italy, National Libraries Of Nordic Countries
JISC Funded - Digital Curation Centre
Persistent Identifiers
DOI Foundation, European National Libraries (KB & DDB)
Resource Discovery
Union Catalogues (SUNCAT)
Digital Library Federation
45
Digital Library Collaborations/Partnerships
Potential
Secure Legal Deposit Network
6 Legal Deposit Libraries
Global Digital Format Registry
Potential Partners (National Archives, DLF)
Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
KB (Netherlands National Library & Other Partners – FP6 Bid)
Digital Rights Management
Potential Partners (Publishers, JISC)
Metadata
Publishers, Others ?
Authentication
JISC ?
Resource Discovery
Search Engine Vendors, Researchers
Others ???
46
Conclusions
• Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success
• Can You Work With Us?
47
Slide 6
British Computer Society
North London Branch
Major Programmes
Richard Boulderstone
July 27, 2004
1
Agenda
•
•
•
•
•
•
•
•
•
The British Library
Vision
Our Audiences/Customers
ILS
Digitisation
Digital Object Management
Web Archiving
Collaboration
Conclusions
Magna Carta
www.bl.uk
2
What Is The British Library ?
Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of
Science and Invention (1855), National Central Library (1916), and National
Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India
Office Library and Records in 1982, and British Institute of Recorded Sound in
1983
• Flagship building at St Pancras - largest public building project in Great
Britain in 20th century - opened in 1998
•
www.bl.uk
3
World-Class Research Library
Key Statistics 2002/3
•
•
•
•
•
•
•
•
•
•
150 million items
8.2 million items consulted or supplied
408,000 reading room visits
618,000 catalogue records created
554,000 items received on legal deposit
651 km shelf capacity 92% full add 12 km each year
18.5M Web Site Hits (www.bl.uk)
2,400 staff
£85.2 million Grant in Aid and £27.0 million trading income in 2001/2
Annual report - http://www.bl.uk/about/annual/latest.html
www.bl.uk
4
‘The World’s Knowledge’
Outcome Based Vision
… by aiding scientific advances
… by adding commercial value for
businesses
… by contributing to UK “knowledge
economy”
… through the pursuit of academic
excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural
wellbeing
… by giving information relevant to their
interests
… by helping to find the next medical
breakthrough
… by creating a link between the past,
present and future
To help people
advance
knowledge to
enrich lives
www.bl.uk
5
High
R+D
Industries
Prof.
Services
Creative
Industries Publishing
Industries
Resource Discovery
Bespoke Services
Research Services
Document Supply
Reprographics
Innovation Centre
School
Students
Libraries Teachers 11>18
On-site Visits
School Tours
Web Learning
SMEs
EDUCATION
Lifelong
Learner
Exhibitions
Events
Tours
Publishing
Visitors
(child + adult)
Lifelong
Learner
PUBLIC
BUSINESS
Lifelong
Learner
Postgraduate/
Undergraduate
RESEARCHER
Librarians
Scholars
Lifelong
Learner
Reading Rooms
Bespoke Services
Reprographics
Publishing
Document Supply
Searching Tools
LIBRARIES
Commercial
Researcher
Public
Libraries
Broadcasting
e.g. BBC
Publishing
e.g. OED
Document Supply
Resource Discovery
Training
Best Practice
H.E.
Libraries
Public
www.bl.uk
6
2
Major Programmes/1
Da Vinci Notebook
Integrated Library System (ILS)
Programme
7
ILS: Development
Data migration
Due to finish in a few days
16M+ BL records
10M+ records from other sources
Online ILS software
All online changes made (mainly interfaces) – final tests
Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
Most ones done for go live
Rest in priority order
www.bl.uk
8
ILS: Implementation
Training
Courses to end-users well underway
‘Practice’ system available
‘Search only’ training also underway
Testing
Functional testing (end to end) nearly complete
Performance poor – OPAC very slow
Automated stress testing (LoadRunner scripts)
eIS trying to find area of problem
Ex Libris experts flying over
Some security ‘hardening’ needed
www.bl.uk
9
ILS: Cutover from legacy systems
Now:
Temporary Aleph cataloguing
Phase 1 – internal processing
Staggered take-on of users to ease cutover problems
Merge ‘temporary’ records
7 June:
Phase 2 – reading rooms
Reading rooms closed for cutover 26-29 June
Mainly brand-new PCs etc rather than XP upgrade
30 June:
Phase 3 – remote users
Could be delayed major problems
30 July:
www.bl.uk
10
www.bl.uk
11
Future ILS development (ILS/2)
Current ILS development seen just as the start
Extra records
E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
E.g. Preservation records
Links to other new BL systems
E.g. Digital Object Management (images, web pages etc)
New releases of Ex Libris packages
www.bl.uk
12
Major Programmes/2
International Dunhuang Project
Digitisation Programme
13
Background
Digitisation Is The Process Of Converting Existing Physical Items Into Digital
Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character
Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image to Sub £1/Image.
Automatic Devices To Digitise Are Becoming Available But Are Expensive (~£150,000).
BL Has Had Fairly Ad Hoc Approach Driven By
External Funding Opportunities
Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project
Management
BL Has Created About 1.5M Digital Images So Far…
14
Digitisation Strategy
Digitisation Strategy Project Was Formally Initiated On February 2, 2004
Key objectives for the project are to define:
Selection Criteria
Uniform Approach
Communications Plan
Sustainability
Intellectual Property Rights
External Relationship Management
Funding
Integration with DOMS
15
Project Status Information
Definitive Register of Projects
19 Complete
19 Current
20 Planning
JISC Sound (3,900 Hours)
JISC Newspapers (2M Pages of 750M Pages)
Chopin (Collaborative Project)
Early English Books Online
16
Major Programmes/3
Gutenberg Bible
Digital Object Management (DOM)
Programme
17
DOM Programme vision
Our mission is to enable the United Kingdom to preserve
and use its digital intellectual property forever
Our vision is to create a management system for digital objects
that will
store and preserve any type of digital material in perpetuity
provide access to this material to users with appropriate permissions
ensure that the material is easy to find
ensure that users can view the material with contemporary applications
ensure that users can, where possible, experience material with the original look-and-feel
18
Introduction - history
Digital Library PFI
Mar 1997 – Dec 1998
Digital Library System
1999 – early 2002
Lessons
DOM Report
Nov 2002
The DOM Programme
Started September 2003
19
Drivers for the BL DOM Programme
Legal deposit legislation for non-print material was granted royal assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12T of material per year (with 50 year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….
We need a generic and cost-effective approach for the secure long term
storage of digital material that is produced by numerous initiatives
20
DOM – many topics to address
[Chart: DOM components plotted by estimated size of component (low to high) against component ambiguity / complexity (low to high).]
Components shown: ILS; Web Archiving; Digitisation Programme; LDEP; Workflow; Resource Discovery; LDLSE; VDEP; Rights Management; Metadata Definition; Technical Requirements; File Format Registry; Persistent Identifiers; File Conversion Utilities; Authentication; Prototypes; SDM; RADM; Interfaces; Strategy Development
Legend: Started; Planned; Non-DOM projects; Planned co-operation
LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications
21
Scope - life cycle of objects
Collection
Selection
Acquisition
Accession
Description
Preservation
Storage
Preservation
Access
Resource discovery
Delivery
Rendering
22
Scope – objects and processes
Preservation store
Preserves the bit stream in perpetuity
Access store
Access versions
Limited formats – in the flavour of the era
Metadata to support resource discovery
Descriptive, Administrative, Links with existing tools e.g. Integrated
Library System (ILS)
Workflow
Ingest, e.g. Legal Deposit processing
23
[Diagram: content streams feeding DOM Ingest, with DOM shared services and DOM Access above.]
DOM Access: Resource Discovery, Delivery, Digital Rights Management
DOM shared services: Storage, Signing, Authentication, Metadata, Persistent ID
Ingest streams:
Donations
Document Supply (Publishers, Archives, Grey Literature, Non-Serial Store, Archiving, Operational Stores)
Legal Deposit (LDL Secure Environment, Legal Deposit Items, Legal Deposit Processing)
Web Archiving
Digitisation (St Pancras Studios, Newspapers, NSA)
24
Timeline
[Timeline chart covering 2003 to 2005, with milestones BC, R0, R1, R2, R3, R4 and R5+.]
BC
•ET approve Business Case & Timeline
Definition. R0
•Prototype will provide a basic preservation-quality digital object storage module
Operat’l Storage Sub System. R1
•Consolidate R0 into operational system
•Provide preservation-quality digital store for materials received under Voluntary Deposit of Electronic Publications (VDEP)
•Integrate it with the existing VDEP front-end
1st Content Stream ingest. R2
•Support ingest for a major content stream
•Integrate with core Library systems as required
LDEP - initial format. R3 & R4
•Provide functionality for material covered by LDEP secondary legislation
Open DOM to new projects. R5+
25
DOM: Project definition - 1
Functional architecture “what”
Prototyping – assessing market solutions
Example issues: digital rights, file formats, etc
Logical architecture “how – overall architecture”
Prototyping – basic functioning architecture
Example issues: allow changes to new suppliers, relationships to ILS, other projects etc
Physical architecture “how – storage & specifics”
Prototyping – principal solutions and options
Example issues: how do we build it cost-effectively today, supplier selection criteria
• Business case
• Planning – incremental implementation phases
Cross team workshops – reviewing progress, debating detailed technical
issues, planning immediate priorities, risk management & way forward
26
DOM: Project definition - 2
Approach is to be incremental and not ‘Big Bang’
We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a
good solution
A principal goal is to define:
An overall long term “logical architecture”
Within which, there will be successive generations of physical architectures
We are building an understanding of the storage marketplace, and we will use that knowledge to manage
procurement
We are certain that we will need >500T of storage but we are uncertain when – we thus need
flexible scalable procurement
27
DOM architecture - overview
[Diagram: layered view from resource discovery down to physical storage.]
Resource Discovery (ILS-based RD, non-catalogue, others), Rights Management, LDEP and Doc supply refer to an object by its DOMID
DOM Storage Service: compound objects/relations, integrity, authenticity, unique persistent identifier (DOMID)
DOMID is mapped to node/vol/LRL (local resource locator)
DOM Physical Storage: atomic objects
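The mapping from a persistent DOMID to a node/vol/LRL can be sketched roughly as below. This is only an illustration of the idea on the slide, not the BL design: the class names `Location` and `DomidResolver`, the method names and the example identifier strings are all hypothetical. The point it shows is that physical storage can be replaced while the identifier seen by resource discovery stays the same.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Location:
    """One physical copy of an object: node / volume / local resource locator."""
    node: str
    volume: str
    lrl: str


class DomidResolver:
    """Hypothetical lookup table from a persistent DOMID to physical locations."""

    def __init__(self) -> None:
        self._locations: Dict[str, List[Location]] = {}

    def register(self, domid: str, location: Location) -> None:
        # Record that a copy of the object now lives at this location.
        self._locations.setdefault(domid, []).append(location)

    def relocate(self, domid: str, old: Location, new: Location) -> None:
        # A storage migration changes the mapping, never the DOMID itself.
        copies = self._locations[domid]
        copies[copies.index(old)] = new

    def resolve(self, domid: str) -> List[Location]:
        # Return a copy so callers cannot alter the table by accident.
        return list(self._locations.get(domid, []))
```

A caller such as the ILS would hold only the DOMID and ask the resolver for the current locations each time it needs the object.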
28
DOM System (release 3)
[Diagram: the DOM System at release 3, comprising two storage subsystems, Access, Mailroom, Shared services and Administration, with links to Aleph and to Publishers.]
29
DOM logical architecture –
integrity and authenticity
Integrity:
System has capability to continuously monitor the object store to detect
object corruption
It would then initiate object recovery
Authenticity:
A process is defined to provide long-term assurance that an object that
is re-presented is as it was when it was ingested
Based on the use of cryptographic signing techniques
Each object is signed when it is ingested
The signature is verified when required
The signing mechanism is “tightly” controlled
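The sign-at-ingest, verify-on-demand and continuous-monitoring steps above can be sketched as follows. The slides do not say which signing scheme is used, so this sketch assumes a keyed HMAC-SHA256 from the Python standard library as a stand-in for a real digital signature; the class `SignedStore` and its method names are hypothetical.

```python
import hashlib
import hmac
from typing import Dict, List


class SignedStore:
    """Toy object store that signs on ingest and can audit itself later."""

    def __init__(self, signing_key: bytes) -> None:
        # Assumption: one tightly controlled key held by the signing service.
        self._key = signing_key
        self._objects: Dict[str, bytes] = {}
        self._signatures: Dict[str, bytes] = {}

    def _sign(self, data: bytes) -> bytes:
        return hmac.new(self._key, data, hashlib.sha256).digest()

    def ingest(self, domid: str, data: bytes) -> None:
        # Each object is signed once, at the moment it is ingested.
        self._objects[domid] = data
        self._signatures[domid] = self._sign(data)

    def verify(self, domid: str) -> bool:
        # Recompute the signature and compare it in constant time.
        expected = self._signatures[domid]
        actual = self._sign(self._objects[domid])
        return hmac.compare_digest(expected, actual)

    def audit(self) -> List[str]:
        # Integrity monitoring: list every object that no longer verifies,
        # so that recovery from another copy can be started.
        return sorted(d for d in self._objects if not self.verify(d))
```

In a real system the audit would run continuously in the background and the failed identifiers would be handed to a recovery process that restores the object from another cluster.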
30
Procuring physical storage in volume
A major cost is in physical storage
The market for storage systems is changing rapidly, and this implies that “lock-in” is
not sensible
We thus need flexibility to change supplier over time
Cost of storage is reducing by 30-40% per year
Hence procure on rolling basis just ahead of demand
Replace storage on a rolling basis on expiry of warranty
The rolling programmes imply the need to be able to support a heterogeneous
product solution
The design of the logical architecture thus supports storage sourced from multiple
storage vendors
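The case for buying just ahead of demand can be illustrated with a small calculation. The inputs in the usage note are invented for illustration (the slides give only the 30-40% annual price decline), and the sketch ignores replacement on warranty expiry, power and staff costs.

```python
def upfront_cost(tb_per_year: float, years: int, price_per_tb: float) -> float:
    """Buy all the capacity in year 0 at today's price."""
    return tb_per_year * years * price_per_tb


def rolling_cost(tb_per_year: float, years: int, price_per_tb: float,
                 annual_decline: float) -> float:
    """Buy each year's capacity in that year, at that year's lower price."""
    return sum(
        tb_per_year * price_per_tb * (1.0 - annual_decline) ** year
        for year in range(years)
    )
```

For example, with a hypothetical 100 TB needed per year for five years, a normalised price of 1.0 per TB and a 35% annual decline, `upfront_cost(100, 5, 1.0)` can be compared with `rolling_cost(100, 5, 1.0, 0.35)`; under these assumptions the rolling purchase costs roughly half as much.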
31
Disaster tolerance and the organisation
of storage clusters
One can obtain commercial disaster recovery (DR) solutions for common equipment
configurations
However one cannot obtain such solutions for multi-100 TB systems
So we must build DR into the design of the system
A single site solution, subject to a common-mode disaster, would suffer considerable loss
of availability after a disaster, and so is not acceptable
This implies that we need a multi-site solution
Conventionally these are based on a master-standby arrangement in which only 50% of the kit is delivering
normal service
Our design is based on the use of multiple autonomous independent peer clusters that
cross-synchronise
so 100% of the kit delivers normal service
32
DOM architecture in the context of the
storage solution market
The dominant segment of the market focuses on delivering performance within
a highly resilient single cluster
However:
Many of our objects will be rarely accessed
so we do not want to pay for “maximised” performance we do not need
We have resilience by using multiple clusters, hence we have a reduced
need for resilience within a cluster
so we do not want to pay for “maximised” resilience we do not need
We are using these drivers to design a cost-effective large scale resilient
solution
33
DOM storage subsystem architecture - overview
[Diagram: two storage clusters, each with a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, both using DOM Shared Services (Unique ID, Signing, Logging).]
34
DOM storage subsystem architecture - access
Normal access/delivery is from the local storage cluster
[Diagram: two storage clusters, each with a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, both using DOM shared central services (Unique ID, Signing, Logging).]
35
DOM storage subsystem architecture - access
When a cluster is off-line then access/delivery is from a remote storage cluster
[Diagram: two storage clusters, each with a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, both using DOM shared central services (Unique ID, Signing, Logging).]
36
DOM storage subsystem architecture - ingest
Normal ingest is to the local storage cluster and then the remote cluster is synchronised
[Diagram: two storage clusters, each with a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, both using DOM shared central services (Unique ID, Signing, Logging). Arrows: Store (local), Synchronise remote store.]
37
DOM storage subsystem architecture - ingest
When a cluster is off-line then ingest is managed by the remote storage cluster and the local cluster is synchronised later
[Diagram: two storage clusters, each with a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, both using DOM shared central services (Unique ID, Signing, Logging). Arrows: Store (remote), Synchronise remote store later.]
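The access and ingest behaviour described on the preceding slides (serve locally when possible, fall back to the remote cluster, and synchronise later after an outage) can be sketched with two peer clusters. This is an illustrative model only: the `Cluster` class, the `backlog` list and the function names are hypothetical, and a real system would also need conflict handling, signing and logging.

```python
from typing import Dict, List


class Cluster:
    """One autonomous storage cluster in a two-peer arrangement."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.online = True
        self.objects: Dict[str, bytes] = {}
        self.backlog: List[str] = []  # DOMIDs the peer has not yet received


def fetch(domid: str, local: Cluster, remote: Cluster) -> bytes:
    # Normal delivery is from the local cluster; the remote one is the fallback.
    for cluster in (local, remote):
        if cluster.online and domid in cluster.objects:
            return cluster.objects[domid]
    raise LookupError(f"{domid} is not available from any online cluster")


def ingest(domid: str, data: bytes, local: Cluster, remote: Cluster) -> Cluster:
    # Store on the local cluster if it is up, otherwise on the remote one.
    primary, peer = (local, remote) if local.online else (remote, local)
    if not primary.online:
        raise RuntimeError("no storage cluster is online")
    primary.objects[domid] = data
    if peer.online:
        peer.objects[domid] = data      # synchronise straight away
    else:
        primary.backlog.append(domid)   # synchronise later
    return primary


def resynchronise(source: Cluster, target: Cluster) -> None:
    # Replay everything the target missed while it was off-line.
    if not target.online:
        return
    for domid in source.backlog:
        target.objects[domid] = source.objects[domid]
    source.backlog.clear()
```

Because both clusters accept reads and writes in normal running, all of the equipment is in service, which is the contrast with a master-standby arrangement drawn earlier in the talk.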
38
In conclusion
We plan for generations of physical storage
Migration from one generation to the next
Allow changes of supplier
Purchase incrementally in modest quantities
Move quickly when required
Be cost conscious
We provide assurance that an object is held and re-presented as it was when it was
ingested
We are designing a cost-effective large scale resilient solution
In summary: we take a long term view
39
Major Programmes/4
Web Archiving Programme
40
Structure of Programme
Web Archiving Programme is a collaborative
initiative, roughly implemented across two
consortia
UK Web Archiving Consortium
Developing a selective approach to web archiving, procuring a common
web archiving infrastructure and software to begin archiving activities as
early as possible
International Internet Preservation Consortium
Developing advanced web archiving technologies for the long-term,
large-scale, continuous crawling requirements enabled through
legislation
41
UK Web Archiving Consortium
Developing a selective approach to web archiving
Licence for PANDAS about to be signed with NLA
Sub-licences with consortium partners and contractor to follow
ITT concluded with Magus Research winning the contract.
Implement a common web archiving infrastructure (lots of Linux
machines + PANDAS)
Provide customisation/development of PANDAS
Provide help desk and support
42
International Internet Preservation
Consortium
Developing advanced web archiving technologies
Smart Crawler
Continuous adaptive crawler, adjusting crawl priority on the fly
Based on IA Heritrix
Working on requirements now
Expect to begin tender process in June
Content Management
Archival formats
Framework
Metrics and Test Bed
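The idea of a crawler that adjusts crawl priority on the fly can be sketched as a priority queue whose entries can be re-scored while the crawl is running. This is a generic illustration, not the Heritrix API or the IIPC design: the `CrawlFrontier` class and its methods are hypothetical, and how a priority is computed (change frequency, importance, and so on) is left open.

```python
import heapq
import itertools
from typing import Dict, List, Optional, Tuple


class CrawlFrontier:
    """Priority queue of URLs whose priorities can change while queued."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, str]] = []  # (-priority, seq, url)
        self._latest: Dict[str, int] = {}              # url -> newest seq
        self._seq = itertools.count()

    def schedule(self, url: str, priority: float) -> None:
        # Adding a URL again simply supersedes its earlier heap entry.
        seq = next(self._seq)
        self._latest[url] = seq
        heapq.heappush(self._heap, (-priority, seq, url))

    def next_url(self) -> Optional[str]:
        # Pop the highest-priority URL, skipping superseded entries.
        while self._heap:
            _, seq, url = heapq.heappop(self._heap)
            if self._latest.get(url) == seq:
                del self._latest[url]
                return url
        return None
```

A continuous crawler would call `schedule` again for a page each time it is fetched, raising the priority of pages seen to change often and lowering it for pages that stay the same.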
43
External Collaboration
44
Digital Library Collaborations/Partnerships
Current
UK Digital Preservation Coalition
Founder Member
TEL (The European Library Project)
Web Archiving UK
JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library Of Wales
International Internet Preservation Consortium
BNF, Library Of Congress, Internet Archive, National Archives & Library Of Canada, National Library Of
Australia, National Library Of Italy, National Libraries Of Nordic Countries
JISC Funded - Digital Curation Centre
Persistent Identifiers
DOI Foundation, European National Libraries (KB & DDB)
Resource Discovery
Union Catalogues (SUNCAT)
Digital Library Federation
45
Digital Library Collaborations/Partnerships
Potential
Secure Legal Deposit Network
6 Legal Deposit Libraries
Global Digital Format Registry
Potential Partners (National Archives, DLF)
Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
KB (Netherlands National Library & Other Partners – FP6 Bid)
Digital Rights Management
Potential Partners (Publishers, JISC)
Metadata
Publishers, Others ?
Authentication
JISC ?
Resource Discovery
Search Engine Vendors, Researchers
Others ???
46
Conclusions
• Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success
• Can You Work With Us?
47
Slide 8
British Computer Society
North London Branch
Major Programmes
Richard Boulderstone
July 27, 2004
1
Agenda
• The British Library
• Vision
• Our Audiences/Customers
• ILS
• Digitisation
• Digital Object Management
• Web Archiving
• Collaboration
• Conclusions
Magna Carta
2
What Is The British Library ?
• Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of Science and Invention (1855), National Central Library (1916), and National Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India Office Library and Records in 1982, and British Institute of Recorded Sound in 1983
• Flagship building at St Pancras - largest public building project in Great Britain in 20th century - opened in 1998
3
World-Class Research Library
Key Statistics 2002/3
• 150 million items
• 8.2 million items consulted or supplied
• 408,000 reading room visits
• 618,000 catalogue records created
• 554,000 items received on legal deposit
• 651 km shelf capacity, 92% full, adding 12 km each year
• 18.5M Web Site Hits (www.bl.uk)
• 2,400 staff
• £85.2 million Grant in Aid and £27.0 million trading income in 2001/2
• Annual report - http://www.bl.uk/about/annual/latest.html
4
‘The World’s Knowledge’
Outcome Based Vision
To help people advance knowledge to enrich lives
… by aiding scientific advances
… by adding commercial value for businesses
… by contributing to UK “knowledge economy”
… through the pursuit of academic excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural wellbeing
… by giving information relevant to their interests
… by helping to find the next medical breakthrough
… by creating a link between the past, present and future
5
[Diagram: audience segments and the services offered to each. Segments: RESEARCHER, BUSINESS, EDUCATION, PUBLIC, LIBRARIES]
User groups shown: High R+D Industries, Prof. Services, Creative Industries, Publishing Industries, SMEs, School Students 11>18, Teachers, Libraries, Visitors (child + adult), Postgraduate/Undergraduate, Scholars, Librarians, Commercial Researcher, Lifelong Learner, Public Libraries, H.E. Libraries, Broadcasting e.g. BBC, Publishing e.g. OED
Services shown: Resource Discovery, Bespoke Services, Research Services, Document Supply, Reprographics, Innovation Centre, On-site Visits, School Tours, Web Learning, Exhibitions, Events, Tours, Publishing, Reading Rooms, Searching Tools, Training, Best Practice
6
Major Programmes/1
Da Vinci Notebook
Integrated Library System (ILS) Programme
7
ILS: Development
Data migration
Due to finish in a few days
16M+ BL records
10M+ records from other sources
Online ILS software
All online changes made (mainly interfaces) – final tests
Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
Most of those needed for go-live are done
Rest to follow in priority order
8
ILS: Implementation
Training
Courses for end-users well underway
‘Practice’ system available
‘Search only’ training also underway
Testing
Functional testing (end to end) nearly complete
Performance poor – OPAC very slow
Automated stress testing (LoadRunner scripts)
eIS trying to locate the problem area
Ex Libris experts flying over
Some security ‘hardening’ needed
9
ILS: Cutover from legacy systems
Now: Temporary Aleph cataloguing
7 June: Phase 1 – internal processing
Staggered take-on of users to ease cutover problems
Merge ‘temporary’ records
30 June: Phase 2 – reading rooms
Reading rooms closed for cutover 26-29 June
Mainly brand-new PCs etc rather than XP upgrade
30 July: Phase 3 – remote users
Could be delayed if there are major problems
10
Future ILS development (ILS/2)
Current ILS development seen just as the start
Extra records
E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
E.g. Preservation records
Links to other new BL systems
E.g. Digital Object Management (images, web pages etc)
New releases of Ex Libris packages
12
Major Programmes/2
International Dunhuang Project
Digitisation Programme
13
Background
Digitisation Is The Process Of Converting Existing Physical Items Into Digital Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image To Sub £1/Image.
Automatic Devices To Digitise Are Becoming Available But Are Expensive (~£150,000).
BL Has Had A Fairly Ad Hoc Approach Driven By
External Funding Opportunities
Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project Management
BL Has Created About 1.5M Digital Images So Far…
14
Digitisation Strategy
Digitisation Strategy Project Was Formally Initiated On February 2, 2004
Key objectives for the project are to define:
Selection Criteria
Uniform Approach
Communications Plan
Sustainability
Intellectual Property Rights
External Relationship Management
Funding
Integration with DOMS
15
Project Status Information
Definitive Register of Projects
19 Complete
19 Current
20 Planning
JISC Sound (3,900 Hours)
JISC Newspapers (2M Pages of 750M Pages)
Chopin (Collaborative Project)
Early English Books Online
16
Major Programmes/3
Gutenberg Bible
Digital Object Management (DOM) Programme
17
DOM Programme vision
Our mission is to enable the United Kingdom to preserve and use its digital intellectual property forever
Our vision is to create a management system for digital objects that will
store and preserve any type of digital material in perpetuity
provide access to this material to users with appropriate permissions
ensure that the material is easy to find
ensure that users can view the material with contemporary applications
ensure that users can, where possible, experience material with the original look-and-feel
18
Introduction - history
Digital Library PFI
Mar 1997 – Dec 1998
Digital Library System
1999 – early 2002
Lessons
DOM Report
Nov 2002
The DOM Programme
Started September 2003
19
Drivers for the BL DOM Programme
Legal deposit legislation for non-print material was granted royal assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12 TB of material per year (with 50 year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….
We need a generic and cost-effective approach for the secure long term storage of digital material that is produced by numerous initiatives
20
DOM – many topics to address
[Chart: DOM components plotted by ESTIMATED SIZE OF COMPONENT (LOW to HIGH) against COMPONENT AMBIGUITY / COMPLEXITY (LOW to HIGH). Legend: Started, Planned, Non-DOM projects, Planned co-operation]
Components shown: ILS, WEB ARCHIVING, DIGITISATION PROGRAMME, LDEP, WORKFLOW, RESOURCE DISCOVERY, LDLSE, VDEP, RIGHTS MANAGEMENT, METADATA DEFINITION, TECHNICAL REQUIREMENTS, FILE FORMAT REGISTRY, PERSISTENT IDENTIFIERS, FILE CONVERSION UTILITIES, AUTHENTICATION, PROTOTYPES, SDM, RADM, INTERFACES, STRATEGY DEVELOPMENT
LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications
21
Scope - life cycle of objects
Collection: Selection, Acquisition, Accession, Description
Preservation: Storage, Preservation
Access: Resource discovery, Delivery, Rendering
22
Scope – objects and processes
Preservation store
Preserves the bit stream in perpetuity
Access store
Access versions
Limited formats – in the flavour of the era
Metadata to support resource discovery
Descriptive, Administrative, Links with existing tools e.g. Integrated Library System (ILS)
Workflow
Ingest, e.g. Legal Deposit processing
23
[Diagram: DOM and its surrounding systems]
ACCESS: Resource Discovery, Delivery, Digital Rights Management
DOM: Storage, plus shared services (Signing, Authentication, Metadata, Persistent ID)
Ingest sources shown: DONATIONS, DOCUMENT SUPPLY, Publishers, Archives, Grey Literature, Non-Serial Store, Archiving, Operational Stores, LEGAL DEPOSIT (LDL Secure Environment, Legal Deposit Items, Legal Deposit Processing), WEB ARCHIVING, DIGITISATION (St Pancras Studios, Newspapers, NSA)
24
Timeline
[Timeline diagram, 2003 to 2005]
BC: ET approve Business Case & Timeline
Definition, R0: Prototype will provide a basic preservation-quality digital object storage module
Operat’l Storage Sub System, R1:
• Consolidate R0 into operational system
• Provide preservation-quality digital store for materials received under Voluntary Deposit of Electronic Publications (VDEP)
• Integrate it with the existing VDEP front-end
1st Content Stream ingest, R2:
• Support ingest for a major content stream
• Integrate with core Library systems as required
LDEP - initial format, R3 & R4: Provide functionality for material covered by LDEP secondary legislation
Open DOM to new projects, R5+
25
DOM: Project definition - 1
Functional Architecture “What”
Prototyping – assessing market solutions
Example issues: digital rights, file formats, etc
Logical architecture “how – overall architecture”
Prototyping - basic functioning architecture
Example issues: allow changes to new suppliers, relationships to ILS, other projects etc
Physical architecture “how – storage & specifics”
Prototyping - principal solutions and options
Example issues: how do we build it cost-effectively today, supplier selection criteria
• Business case
• Planning – incremental implementation phases
Cross team workshops – reviewing progress, debating detailed technical issues, planning immediate priorities, risk management & way forward
26
DOM: Project definition - 2
Approach is to be incremental and not ‘Big Bang’
We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a good solution
A principal goal is to define:
An overall long term “logical architecture”
Within which, there will be successive generations of physical architectures
We are building an understanding of the storage marketplace, and we will use that knowledge to manage procurement
We are certain that we will need >500 TB of storage but we are uncertain when – we thus need flexible, scalable procurement
27
DOM architecture - overview
[Diagram: DOM architecture overview]
Client services: Resource Discovery (ILS based RD, Non-cat, Others), Rights Management, LDEP, Doc supply
Each OBJECT is addressed by its DOMID, the unique persistent identifier
DOM Storage Service: Compound objects/relations, Integrity, Authenticity
DOMID is mapped to node/vol/LRL (local resource locator)
DOM Physical Storage: Atomic Objects
28
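The identifier mapping in the overview diagram can be pictured as a small lookup table. The sketch below is a minimal Python illustration only: the class names, the DOMID string format and the field layout are assumptions made for the example, and only the idea that a persistent DOMID resolves to a node/volume/local resource locator comes from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhysicalLocation:
    """Where one copy of an object lives (field names are illustrative)."""
    node: str    # storage node holding the copy
    volume: str  # volume on that node
    lrl: str     # local resource locator within the volume


class DomidResolver:
    """Maps a persistent DOMID to its current physical locations (hypothetical helper)."""

    def __init__(self) -> None:
        self._locations: dict[str, list[PhysicalLocation]] = {}

    def register(self, domid: str, location: PhysicalLocation) -> None:
        """Record a new copy of the object identified by domid."""
        self._locations.setdefault(domid, []).append(location)

    def relocate(self, domid: str, old: PhysicalLocation, new: PhysicalLocation) -> None:
        """Move a copy to new storage; the DOMID held by users is unchanged."""
        copies = self._locations[domid]
        copies[copies.index(old)] = new

    def resolve(self, domid: str) -> list[PhysicalLocation]:
        """Return every known location for the object, or an empty list."""
        return list(self._locations.get(domid, []))
```

Keeping the physical location behind the resolver is what would let a generation of storage be replaced without changing the identifiers that catalogue records or users hold.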
DOM System (release 3)
[Diagram: DOM System (release 3)]
Components shown: Aleph, Access, Mailroom, Publishers, Storage subsystem (two instances), Shared services, Administration
29
DOM logical architecture –
integrity and authenticity
Integrity:
System has capability to continuously monitor the object store to detect
object corruption
It would then initiate object recovery
Authenticity:
A process is defined to provide long-term assurance that an object that
is re-presented is as it was when it was ingested
Based on the use of cryptographic signing techniques
Each object is signed when it is ingested
The signature is verified when required
The signing mechanism is “tightly” controlled
30
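The sign-on-ingest and verify-on-demand steps above might look roughly like this in Python. The slide does not say which cryptographic technique is used, so this sketch substitutes HMAC-SHA256 from the standard library, which is a keyed message authentication code rather than a public-key signature; the function names and the in-memory store are likewise assumptions made for illustration.

```python
import hashlib
import hmac


def sign_object(key: bytes, data: bytes) -> str:
    """Return a hex tag for the object, computed once when it is ingested."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify_object(key: bytes, data: bytes, signature: str) -> bool:
    """Check that the object is still byte-for-byte as it was when ingested."""
    expected = sign_object(key, data)
    # compare_digest avoids leaking information through comparison timing
    return hmac.compare_digest(expected, signature)


def scrub(store: dict[str, bytes], signatures: dict[str, str], key: bytes) -> list[str]:
    """Walk the object store and list identifiers whose content no longer verifies."""
    return [
        object_id
        for object_id, data in store.items()
        if not verify_object(key, data, signatures[object_id])
    ]
```

A periodic run of `scrub` stands in for the continuous monitoring described under Integrity; any identifiers it returns would be candidates for recovery from another copy. Keeping `key` under restricted access corresponds to the “tightly” controlled signing mechanism.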
Procuring physical storage in volume
A major cost is in physical storage
The market for storage systems is changing rapidly, and this implies that “lock-in” is
not sensible
We thus need flexibility to change supplier over time
Cost of storage is reducing by 30-40% per year
Hence procure on rolling basis just ahead of demand
Replace storage on a rolling basis on expiry of warranty
The rolling programmes imply the need to be able to support a heterogeneous
product solution
The design of the logical architecture thus supports storage sourced from multiple
storage vendors
31
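The case for procuring just ahead of demand can be checked with simple arithmetic. The Python sketch below compares buying all capacity at today's price with buying each year's requirement at that year's price; the starting price, the yearly demand and the 35% decline are made-up figures chosen from within the 30-40% range quoted above, not BL data.

```python
def upfront_cost(total_tb: float, price_per_tb: float) -> float:
    """Cost of buying all capacity now at the current price."""
    return total_tb * price_per_tb


def rolling_cost(yearly_demand_tb: list[float], price_per_tb: float,
                 annual_decline: float) -> float:
    """Cost of buying each year's capacity at that year's (falling) price."""
    total = 0.0
    price = price_per_tb
    for demand in yearly_demand_tb:
        total += demand * price
        price *= 1.0 - annual_decline  # price falls before the next purchase
    return total


# Hypothetical figures: 100 TB needed per year for five years, £1,000 per TB today.
demand = [100.0] * 5
all_now = upfront_cost(sum(demand), 1000.0)
year_by_year = rolling_cost(demand, 1000.0, 0.35)
```

With these illustrative numbers the year-by-year purchase comes to about £252,600 against £500,000 for buying everything in year one, which is the effect the rolling programme is meant to capture.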
Disaster tolerance and the organisation
of storage clusters
One can obtain commercial disaster recovery (DR) solutions for common equipment configurations
However one cannot obtain such solutions for systems holding several hundred terabytes
So we must build DR into the design of the system
A single site solution, subject to a common-mode disaster, would suffer considerable loss of availability after a disaster, and so is not acceptable
This implies that we need a multi-site solution
Conventionally these are based on a master-standby arrangement where only 50% of kit is delivering normal service
Our design is based on the use of multiple autonomous independent peer clusters that cross-synchronise
so 100% of the kit delivers normal service
32
DOM architecture in the context of the
storage solution market
The dominant segment of the market focuses on delivering performance within
a highly resilient single cluster
However:
Many of our objects will be rarely accessed
so we do not want to pay for “maximised” performance we do not need
We have resilience by using multiple clusters, hence we have a reduced
need for resilience within a cluster
so we do not want to pay for “maximised” resilience we do not need
We are using these drivers to design a cost-effective large scale resilient
solution
33
DOM storage subsystem architecture - overview
[Diagram: two peer storage clusters, each comprising a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage]
DOM Shared Services
• Unique ID
• Signing
• Logging
34
DOM storage subsystem architecture - access
[Diagram: two peer storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical Storage) with DOM shared central services: Unique ID, Signing, Logging]
Normal access/delivery is from local storage cluster
35
DOM storage subsystem architecture - access
[Diagram: two peer storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical Storage) with DOM shared central services: Unique ID, Signing, Logging]
When a cluster is off-line then access/delivery is from a remote storage cluster
36
DOM storage subsystem architecture - ingest
[Diagram: two peer storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical Storage) with DOM shared central services: Unique ID, Signing, Logging. Arrows: Store, Synchronise remote store]
Normal ingest is to the local storage cluster and then the remote cluster is synchronised
37
DOM storage subsystem architecture - ingest
[Diagram: two peer storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical Storage) with DOM shared central services: Unique ID, Signing, Logging. Arrows: Store, Synchronise remote store later]
When a cluster is off-line then ingest is managed by the remote storage cluster and the local cluster is synchronised later
38
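The access and ingest behaviour shown in the four diagrams above can be sketched as a gateway that prefers its local cluster and falls back to the remote peer. This is a simplified Python illustration, not the BL implementation: the class and method names are invented, storage is an in-memory dictionary, and synchronisation to an on-line peer is done immediately rather than as a separate background step.

```python
class StorageCluster:
    """One autonomous peer cluster (in-memory stand-in for real storage)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.online = True
        self.objects: dict[str, bytes] = {}


class StorageGateway:
    """Routes requests to the local cluster, falling back to the remote peer."""

    def __init__(self, local: StorageCluster, remote: StorageCluster) -> None:
        self.local = local
        self.remote = remote
        # cluster name -> identifiers that cluster missed while off-line
        self.pending: dict[str, list[str]] = {local.name: [], remote.name: []}

    def ingest(self, object_id: str, data: bytes) -> str:
        """Store in the local cluster if it is up, otherwise in the remote one."""
        if self.local.online:
            primary, peer = self.local, self.remote
        else:
            primary, peer = self.remote, self.local
        if not primary.online:
            raise RuntimeError("no storage cluster is online")
        primary.objects[object_id] = data
        if peer.online:
            peer.objects[object_id] = data            # synchronise the peer now
        else:
            self.pending[peer.name].append(object_id)  # synchronise it later
        return primary.name

    def fetch(self, object_id: str) -> bytes:
        """Deliver from the local cluster when possible, else from the remote."""
        for cluster in (self.local, self.remote):
            if cluster.online and object_id in cluster.objects:
                return cluster.objects[object_id]
        raise KeyError(object_id)

    def resynchronise(self, cluster: StorageCluster) -> None:
        """Copy across the objects a cluster missed while it was off-line."""
        source = self.remote if cluster is self.local else self.local
        for object_id in self.pending[cluster.name]:
            cluster.objects[object_id] = source.objects[object_id]
        self.pending[cluster.name].clear()
```

Because either cluster can take ingest and delivery, both carry normal load, which is the contrast with a master-standby arrangement drawn on the disaster tolerance slide.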
In conclusion
We plan for generations of physical storage
Migration from one generation to the next
Allow changes of supplier
Purchase incrementally in modest quantities
Move quickly when required
Be cost conscious
We provide assurance that an object is held and re-presented as it was when it was ingested
We are designing a cost-effective large scale resilient solution
In summary: we take a long term view
39
Major Programmes/4
Web Archiving Programme
40
Structure of Programme
Web Archiving Programme is a collaborative initiative, implemented broadly across two consortia
UK Web Archiving Consortium
Developing a selective approach to web archiving, procuring a common web archiving infrastructure and software to begin archiving activities at the earliest opportunity
International Internet Preservation Consortium
Developing advanced web archiving technologies for the long-term, large-scale, continuous crawling requirements enabled through legislation
41
UK Web Archiving Consortium
Developing a selective approach to web archiving
License for PANDAS about to be signed with NLA
Sub-licenses with consortium partners and contractor to follow
ITT concluded with Magus Research winning the contract.
Implement a common web archiving infrastructure (lots of Linux machines + PANDAS)
Provide customisation/development of PANDAS
Provide help desk and support
42
International Internet Preservation
Consortium
Developing advanced web archiving technologies
Smart Crawler
Continuous adaptive crawler, adjusting crawl priority on the fly
Based on IA Heritrix
Working on requirements now
Expect to begin tender process in June
Content Management
Archival formats
Framework
Metrics and Test Bed
43
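One way to picture a continuous crawler that adjusts priority on the fly is a priority queue in which each page's revisit interval shrinks when the page is found changed and grows when it is not. The Python sketch below is an assumption-laden illustration: it does not use Heritrix or any of its interfaces, and the class name, the halving/doubling rule and the interval bounds are all invented for the example.

```python
import heapq


class AdaptiveFrontier:
    """Queue of URLs whose revisit interval adapts to observed change (illustrative)."""

    def __init__(self, min_interval: float = 1.0, max_interval: float = 64.0) -> None:
        self._heap: list[tuple[float, str]] = []  # (time next due, url)
        self._interval: dict[str, float] = {}
        self.min_interval = min_interval
        self.max_interval = max_interval

    def add(self, url: str, now: float = 0.0, interval: float = 8.0) -> None:
        """Schedule a new URL for an immediate first visit."""
        self._interval[url] = interval
        heapq.heappush(self._heap, (now, url))

    def next_due(self) -> tuple[float, str]:
        """Pop the URL that is due soonest."""
        return heapq.heappop(self._heap)

    def report(self, url: str, now: float, changed: bool) -> float:
        """Reschedule a URL after a visit and return its new interval."""
        interval = self._interval[url]
        # Revisit changed pages sooner and unchanged pages less often.
        interval = interval / 2 if changed else interval * 2
        interval = max(self.min_interval, min(self.max_interval, interval))
        self._interval[url] = interval
        heapq.heappush(self._heap, (now + interval, url))
        return interval
```

A crawl loop would repeatedly call `next_due`, fetch the page, compare it with the previous capture, and pass the outcome to `report`.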
External Collaboration
44
Digital Library Collaborations/Partnerships
Current
UK Digital Preservation Coalition
Founder Member
TEL (The European Library Project)
Web Archiving UK
JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library Of Wales
International Internet Preservation Consortium
BNF, Library Of Congress, Internet Archive, National Archives & Library Of Canada, National Library Of
Australia, National Library Of Italy, National Libraries Of Nordic Countries
JISC Funded - Digital Curation Centre
Persistent Identifiers
DOI Foundation, European National Libraries (KB & DDB)
Resource Discovery
Union Catalogues (SUNCAT)
Digital Library Federation
45
Digital Library Collaborations/Partnerships
Potential
Secure Legal Deposit Network
6 Legal Deposit Libraries
Global Digital Format Registry
Potential Partners (National Archives, DLF)
Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
KB (Netherlands National Library & Other Partners – FP6 Bid)
Digital Rights Management
Potential Partners (Publishers, JISC)
Metadata
Publishers, Others ?
Authentication
JISC ?
Resource Discovery
Search Engine Vendors, Researchers
Others ???
46
Conclusions
• Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success
• Can You Work With Us?
47
Slide 9
British Computer Society
North London Branch
Major Programmes
Richard Boulderstone
July 27, 2004
1
Agenda
•
•
•
•
•
•
•
•
•
The British Library
Vision
Our Audiences/Customers
ILS
Digitisation
Digital Object Management
Web Archiving
Collaboration
Conclusions
Magna Carta
www.bl.uk
2
What Is The British Library ?
Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of
Science and Invention (1855), National Central Library (1916), and National
Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India
Office Library and Records in 1982, and British Institute of Recorded Sound in
1983
• Flagship building at St Pancras - largest public building project in Great
Britain in 20th century - opened in 1998
•
www.bl.uk
3
World-Class Research Library
Key Statistics 2002/3
•
•
•
•
•
•
•
•
•
•
150 million items
8.2 million items consulted or supplied
408,000 reading room visits
618,000 catalogue records created
554,000 items received on legal deposit
651 km shelf capacity 92% full add 12 km each year
18.5M Web Site Hits (www.bl.uk)
2,400 staff
£85.2 million Grant in Aid and £27.0 million trading income in 2001/2
Annual report - http://www.bl.uk/about/annual/latest.html
www.bl.uk
4
‘The World’s Knowledge’
Outcome Based Vision
… by aiding scientific advances
… by adding commercial value for
businesses
… by contributing to UK “knowledge
economy”
… through the pursuit of academic
excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural
wellbeing
… by giving information relevant to their
interests
… by helping to find the next medical
breakthrough
… by creating a link between the past,
present and future
To help people
advance
knowledge to
enrich lives
www.bl.uk
5
High
R+D
Industries
Prof.
Services
Creative
Industries Publishing
Industries
Resource Discovery
Bespoke Services
Research Services
Document Supply
Reprographics
Innovation Centre
School
Students
Libraries Teachers 11>18
On-site Visits
School Tours
Web Learning
SMEs
EDUCATION
Lifelong
Learner
Exhibitions
Events
Tours
Publishing
Visitors
(child + adult)
Lifelong
Learner
PUBLIC
BUSINESS
Lifelong
Learner
Postgraduate/
Undergraduate
RESEARCHER
Librarians
Scholars
Lifelong
Learner
Reading Rooms
Bespoke Services
Reprographics
Publishing
Document Supply
Searching Tools
LIBRARIES
Commercial
Researcher
Public
Libraries
Broadcasting
e.g. BBC
Publishing
e.g. OED
Document Supply
Resource Discovery
Training
Best Practice
H.E.
Libraries
Public
www.bl.uk
6
2
Major Programmes/1
Da Vinci Notebook
Integrated Library System (ILS)
Programme
7
ILS: Development
Data migration
Due to finish in a few days
16M+ BL records
10M+ records from other sources
Online ILS software
All online changes made (mainly interfaces) – final tests
Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
Most ones done for go live
Rest in priority order
www.bl.uk
8
ILS: Implementation
Training
Courses to end-users well underway
‘Practice’ system available
‘Search only’ training also underway
Testing
Functional testing (end to end) nearly complete
Performance poor – OPAC very slow
Automated stress testing (LoadRunner scripts)
eIS trying to find area of problem
Ex Libris experts flying over
Some security ‘hardening’ needed
www.bl.uk
9
ILS: Cutover from legacy systems
Now:
Temporary Aleph cataloguing
Phase 1 – internal processing
Staggered take-on of users to ease cutover problems
Merge ‘temporary’ records
7 June:
Phase 2 – reading rooms
Reading rooms closed for cutover 26-29 June
Mainly brand-new PCs etc rather than XP upgrade
30 June:
Phase 3 – remote users
Could be delayed major problems
30 July:
www.bl.uk
10
www.bl.uk
11
Future ILS development (ILS/2)
Current ILS development seen just as the start
Extra records
E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
E.g. Preservation records
Links to other new BL systems
E.g. Digital Object Management (images, web pages etc)
New releases of Ex Libris packages
www.bl.uk
12
Major Programmes/2
International Dunhuang Project
Digitisation Programme
13
Background
Digitisation Is The Process Of Converting Existing Physical Items Into Digital
Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character
Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image to Sub £1/Image.
Automatic Devices To Digitise Are Becoming Availalble But Are Expensive ~£150,000.
BL Has Had Fairly Ad Hoc Approach Driven By
External Funding Opportunities
Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project
Management
BL Has Created About 1.5M Digital Images So Far…
www.bl.uk
14
Digitisation Strategy
Digitisation Strategy Project Was Formerly Initiated On February 2, 2004
Key objectives for the project are to define:
Selection Criteria
Uniform Approach
Communications Plan
Sustainability
Intellectual Property Rights
External Relationship Management
Funding
Integration with DOMS
www.bl.uk
15
Project Status Information
Definitive Register of Projects
19 Complete
19 Current
20 Planning
JISC Sound (3,900 Hours)
JISC Newspapers (2M Pages of 750M Pages)
Chopin (Collaborative Project)
Early English Books Online
www.bl.uk
16
Major Programmes/3
Gutenberg Bible
Digital Object Management (DOM)
Programme
17
DOM Programme vision
Our mission is to enable the United Kingdom to preserve
and use its digital intellectual property forever
Our vision is create a management system for digital objects
that will
store and preserve any type of digital material in perpetuity
provide access to this material to users with appropriate permissions
ensure that the material is easy to find
ensure that users can view the material with contemporary applications
ensure that users can, where possible, experience material with the original look-and-feel
www.bl.uk
18
Introduction - history
Digital Library PFI
Mar 1997 – Dec 1998
Digital Library System
1999 – early 2002
Lessons
DOM Report
Nov 2002
The DOM Programme
Started September 2003
www.bl.uk
19
Drivers for the BL DOM Programme
Legal deposit legislation for non-print material was granted royal assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12T of material per year (with 50 year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….
We need a generic and cost-effective approach for the secure long term
storage of digital material that is produced by numerous initiatives
www.bl.uk
20
DOM – many topics to address
HIGH
ILS
WEB
ARCHIVING
DIGITISATION
PROGRAMME
ESTIMATED SIZE OF COMPONENT
LDEP
WORKFLOW
RESOURCE
DISCOVERY
LDLSE
VDEP
RIGHTS
MANAGEMENT
METADATA
DEFINITION
TECHNICAL
REQUIREMENTS
FILE
FORMAT
REGISTRY
PERSISTENT
IDENTIFIERS
FILE
CONVERSION
UTILITIES
AUTHENTICATION
PROTOTYPES
SDM
RADM
INTERFACES
STRATEGY
DEVELOPMENT
LOW
LOW
COMPONENT AMBIGUITY / COMPLEXITY
Started
Planned
Non-DOM projects
Planned co-operation
HIGH
LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications
www.bl.uk
21
Scope - life cycle of objects
Collection
Selection
Acquisition
Accession
Description
Preservation
Storage
Preservation
Access
Resource discovery
Delivery
Rendering
www.bl.uk
22
Scope – objects and processes
Preservation store
Preserves the bit stream in perpetuity
Access store
Access versions
Limited formats – in the flavour of the era
Metadata to support resource discovery
Descriptive, Administrative, Links with existing tools e.g. Integrated
Library System (ILS)
Workflow
Ingest, e.g. Legal Deposit processing
www.bl.uk
23
ACCESS
DOM
Resource Discovery
Delivery
Digital Rights Management
Shared services
DOM
Storage
Signing
Authentication
Metadata
Persistent ID
Ingest
DONATIONS
DOCUMENT SUPPLY
Publishers
Archives
Grey Literature
Non-Serial
Store
Archiving
Operational
Stores
LEGAL DEPOSIT
LDL Secure
Environment
Legal Deposit
Items
Legal Deposit
Processing
WEB
ARCHIVING
DIGITISATION
St Pancras
Studios
Newspapers
NSA
www.bl.uk
24
Timeline
Prototype will provide a
basic preservationquality digital object
storage module
Definition. R0
ET approve
Business Case
& Timeline
•Consolidate R0 into operational
system
•Provide preservation-quality digital
store for materials received under
Voluntary Deposit of Electronic
Publications (VDEP)
•Integrate it with the existing VDEP
front-end
BC
R0
Operat’l Storage
Sub System. R1
•Support ingest for a
major content stream
•Integrate with core
Library systems as
required
R1
1st Content
Stream ingest. R2
R2
LDEP - initial
format. R3 & R4
Provide functionality
for material covered by
LDEP secondary
legislation
R3
R4
Open DOM to new
projects. R5+
2003
2004
2005
www.bl.uk
25
DOM: Project definition - 1
Example issues
Functional Architecture “What”
digital rights, file
formats, etc
Prototyping – assessing market solutions
allow changes to
new suppliers,
relationships to ILS,
other projects etc
Logical architecture “how – overall architecture”
Prototyping - basic functioning architecture
how do we build it
cost-effectively
today, supplier
selection criteria
Physical architecture “how – storage & specifics”
Prototyping - principal solutions and options
• Business case
• Planning – incremental
implementation phases
Cross team workshops – reviewing progress, debating detailed technical
issues, planning immediate priorities, risk management & way forward
www.bl.uk
26
DOM: Project definition - 2
Approach is to be incremental and not ‘Big Bang’
We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a
good solution
A principal goal is to define:
An overall long term “logical architecture”
Within which, there will be successive generations of physical architectures
We are understanding the storage marketplace, and we will use the knowledge to manage
procurement
We are certain that we will need >500T of storage but we are uncertain when – we thus need
flexible scalable procurement
www.bl.uk
27
DOM architecture - overview
Resource Discovery
Non-cat
ILS
based RD
Others
Rights Management
LDEP
DOMID
OBJECT
DOM Storage Service
Compound
objects/relations
Integrity
Authenticity
Local resource locator
Object
Atomic
Objects
Unique persistent
identifier (DOMID)
Doc
supply
DOMID is
mapped to
node/vol/LRL
DOM Physical Storage
www.bl.uk
28
DOM System (release 3)
Aleph
Storage subsystem
Storage
subsystem
Access
Mailroom
Publishers
Shared services
Administration
DOM System
www.bl.uk
29
DOM logical architecture –
integrity and authenticity
Integrity:
System has capability to continuously monitor the object store to detect
object corruption
It would then initiate object recovery
Authenticity:
A process is defined to provide long-term assurance that an object that
is re-presented is as it was when it was ingested
Based on the use of cryptographic signing techniques
Each object is signed when it is ingested
The signature is verified when required
The signing mechanism is “tightly” controlled
www.bl.uk
30
Procuring physical storage in volume
A major cost is in physical storage
The market for storage systems is changing rapidly, and this implies that “lock-in” is
not sensible
We thus need flexibility to change supplier over time
Cost of storage is reducing by 30-40% per year
Hence procure on rolling basis just ahead of demand
Replace storage on a rolling basis on expiry of warranty
The rolling programmes imply the need to be able to support a heterogeneous
product solution
The design of the logical architecture thus supports storage sourced from multiple
storage vendors
www.bl.uk
31
Disaster tolerance and the organisation
of storage clusters
One can obtain commercial disaster recovery (DR) solutions for common equipment
configurations
However one cannot obtain such solutions for systems comprising multi-100 Tb systems
So we must build in the need for DR into the design of the system
A single site solution, subject to a common-mode disaster, would suffer considerable loss
of availability after a disaster, and so is not acceptable
This implies that we need a multi-site solution
Conventionally these are based on a master-standby where only 50% of kit is delivering
normal service
Our design is based on the use of multiple autonomous independent peer clusters that
cross-synchronise
so 100% of the kit delivers normal service
www.bl.uk
32
DOM architecture in the context of the
storage solution market
The dominant segment of the market focuses on delivering performance within
a highly resilient single cluster
However:
Many of our objects will be rarely accessed
so we do not want to pay for “maximised” performance we do not need
We have resilience by using multiple clusters, hence we have a reduced
need for resilience within a cluster
so we do not want to pay for “maximised” resilience we do not need
We are using these drivers to design a cost-effective large scale resilient
solution
33
DOM storage subsystem architecture - overview
[Diagram: two storage clusters, each with a DOM Storage gateway, a DOM Storage Service
and DOM Physical Storage. Both clusters use the DOM Shared Services: Unique ID, Signing,
Logging.]
34
DOM storage subsystem architecture - access
Normal access/delivery is from the local storage cluster
[Diagram: two storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical
Storage) and the DOM Shared central Services (Unique ID, Signing, Logging).]
35
DOM storage subsystem architecture - access
When a cluster is off-line then access/delivery is from a remote storage cluster
[Diagram: two storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical
Storage) and the DOM Shared central Services (Unique ID, Signing, Logging).]
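The two access slides can be read as a simple fallback rule: serve from the local cluster, and go to a remote peer only when the local one is off-line. The sketch below illustrates that rule; the `Cluster` class, the `deliver` function and the DOMID values are hypothetical names, not part of the DOM design as presented.

```python
class Cluster:
    """A stand-in for one storage cluster holding objects keyed by DOMID."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.objects = {}   # DOMID -> object bytes

    def read(self, domid):
        if not self.online:
            raise ConnectionError(f"{self.name} is off-line")
        return self.objects[domid]


def deliver(domid, local, remotes):
    """Return (cluster name, object), preferring the local cluster."""
    for cluster in [local, *remotes]:
        try:
            return cluster.name, cluster.read(domid)
        except ConnectionError:
            continue   # this cluster is off-line, try the next peer
    raise LookupError(f"no cluster is available to deliver {domid}")


# Both peers hold the same object because the clusters cross-synchronise.
site_a, site_b = Cluster("site-a"), Cluster("site-b")
site_a.objects["domid-1"] = site_b.objects["domid-1"] = b"object bytes"

normal = deliver("domid-1", site_a, [site_b])      # served locally
site_a.online = False
degraded = deliver("domid-1", site_a, [site_b])    # served from the remote peer
```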
36
DOM storage subsystem architecture - ingest
Normal ingest is to the local storage cluster and then the remote cluster is synchronised
[Diagram: two storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical
Storage) and the DOM Shared central Services (Unique ID, Signing, Logging). Arrows
labelled "Store" and "Synchronise remote store".]
37
DOM storage subsystem architecture - ingest
When a cluster is off-line then ingest is managed by the remote storage cluster and the
local cluster is synchronised later
[Diagram: two storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical
Storage) and the DOM Shared central Services (Unique ID, Signing, Logging). Arrows
labelled "Store" and "Synchronise remote store later".]
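The two ingest slides describe store-then-synchronise, with the synchronisation deferred when a peer is off-line. A minimal two-cluster sketch of that behaviour follows; the class, the backlog list and the function names are assumptions for illustration, and a real system would also need signing, logging and conflict handling.

```python
class Cluster:
    """A stand-in for one storage cluster in a two-cluster peer arrangement."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.objects = {}    # DOMID -> object bytes
        self.backlog = []    # DOMIDs stored here that the peer has not yet received


def ingest(domid, data, local, remote):
    """Store on the local cluster if it is on-line, otherwise on the remote one."""
    primary, peer = (local, remote) if local.online else (remote, local)
    if not primary.online:
        raise ConnectionError("no cluster is available for ingest")
    primary.objects[domid] = data
    if peer.online:
        peer.objects[domid] = data        # synchronise the other cluster now
    else:
        primary.backlog.append(domid)     # synchronise it later
    return primary.name


def resynchronise(source, target):
    """Copy the backlog from source to target once target is back on-line."""
    if not target.online:
        return 0
    copied = 0
    while source.backlog:
        domid = source.backlog.pop()
        target.objects[domid] = source.objects[domid]
        copied += 1
    return copied


site_a, site_b = Cluster("site-a"), Cluster("site-b")
first = ingest("domid-1", b"x", site_a, site_b)    # normal ingest, both updated
site_a.online = False
second = ingest("domid-2", b"y", site_a, site_b)   # handled by the remote cluster
site_a.online = True
caught_up = resynchronise(site_b, site_a)          # local cluster synchronised later
```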
38
In conclusion
We plan for generations of physical storage
Migration from one generation to the next
Allow changes of supplier
Purchase incrementally in modest quantities
Move quickly when required
Be cost conscious
We provide assurance that an object is held and re-presented as it was when it was
ingested
We are designing a cost-effective large scale resilient solution
In summary: we take a long term view
39
Major Programmes/4
Web Archiving Programme
40
Structure of Programme
The Web Archiving Programme is a collaborative
initiative, implemented broadly across two
consortia
UK Web Archiving Consortium
Developing a selective approach to web archiving, procuring a common
web archiving infrastructure and software to begin archiving activities as
early as possible
International Internet Preservation Consortium
Developing advanced web archiving technologies for the long-term,
large-scale, continuous crawling requirements enabled through
legislation
41
UK Web Archiving Consortium
Developing a selective approach to web archiving
License for PANDAS about to be signed with NLA
Sub-licenses with consortium partners and contractor to follow
ITT concluded with Magus Research winning the contract.
Implement a common web archiving infrastructure (lots of Linux
machines + PANDAS)
Provide customisation/development of PANDAS
Provide help desk and support
42
International Internet Preservation
Consortium
Developing advanced web archiving technologies
Smart Crawler
Continuous adaptive crawler, adjusting crawl priority on the fly
Based on IA Heritrix
Working on requirements now
Expect to begin tender process in June
Content Management
Archival formats
Framework
Metrics and Test Bed
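As an illustration of the Smart Crawler's "adjusting crawl priority on the fly", the sketch below shows a crawl frontier whose URL priorities can be changed after scheduling. It is a generic priority-queue technique using Python's `heapq`, not the Heritrix implementation; the class name, method names and URLs are hypothetical.

```python
import heapq
import itertools


class CrawlFrontier:
    """Queue of URLs to crawl; a lower priority number is crawled sooner."""

    def __init__(self):
        self._heap = []
        self._current = {}                  # url -> its live heap entry
        self._counter = itertools.count()   # tie-breaker, keeps insertion order

    def schedule(self, url, priority):
        """Add a URL, or change the priority of one already queued."""
        old = self._current.get(url)
        if old is not None:
            old[3] = False                  # mark the superseded entry as stale
        entry = [priority, next(self._counter), url, True]
        self._current[url] = entry
        heapq.heappush(self._heap, entry)

    def next_url(self):
        """Pop the most urgent live URL, skipping stale entries."""
        while self._heap:
            _, _, url, live = heapq.heappop(self._heap)
            if live:
                del self._current[url]
                return url
        return None


frontier = CrawlFrontier()
frontier.schedule("http://example.org/news", 5)
frontier.schedule("http://example.org/about", 3)
# The news page is found to change often, so its priority is raised mid-crawl.
frontier.schedule("http://example.org/news", 1)
order = [frontier.next_url(), frontier.next_url(), frontier.next_url()]
```

Stale entries are left in the heap and discarded when popped, which avoids re-sorting the queue every time a priority changes.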
43
External Collaboration
44
Digital Library Collaborations/Partnerships
Current
UK Digital Preservation Coalition
Founder Member
TEL (The European Library Project)
Web Archiving UK
JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library Of Wales
International Internet Preservation Consortium
BNF, Library Of Congress, Internet Archive, National Archives & Library Of Canada, National Library Of
Australia, National Library Of Italy, National Libraries Of Nordic Countries
JISC Funded - Digital Curation Centre
Persistent Identifiers
DOI Foundation, European National Libraries (KB & DDB)
Resource Discovery
Union Catalogues (SUNCAT)
Digital Library Federation
45
Digital Library Collaborations/Partnerships
Potential
Secure Legal Deposit Network
6 Legal Deposit Libraries
Global Digital Format Registry
Potential Partners (National Archives, DLF)
Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
KB (Netherlands National Library & Other Partners – FP6 Bid)
Digital Rights Management
Potential Partners (Publishers, JISC)
Metadata
Publishers, Others ?
Authentication
JISC ?
Resource Discovery
Search Engine Vendors, Researchers
Others ???
46
Conclusions
Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success
Can You Work With Us?
47
Slide 11
British Computer Society
North London Branch
Major Programmes
Richard Boulderstone
July 27, 2004
1
Agenda
•
•
•
•
•
•
•
•
•
The British Library
Vision
Our Audiences/Customers
ILS
Digitisation
Digital Object Management
Web Archiving
Collaboration
Conclusions
Magna Carta
www.bl.uk
2
What Is The British Library ?
Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of
Science and Invention (1855), National Central Library (1916), and National
Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India
Office Library and Records in 1982, and British Institute of Recorded Sound in
1983
• Flagship building at St Pancras - largest public building project in Great
Britain in 20th century - opened in 1998
•
www.bl.uk
3
World-Class Research Library
Key Statistics 2002/3
•
•
•
•
•
•
•
•
•
•
150 million items
8.2 million items consulted or supplied
408,000 reading room visits
618,000 catalogue records created
554,000 items received on legal deposit
651 km shelf capacity 92% full add 12 km each year
18.5M Web Site Hits (www.bl.uk)
2,400 staff
£85.2 million Grant in Aid and £27.0 million trading income in 2001/2
Annual report - http://www.bl.uk/about/annual/latest.html
www.bl.uk
4
‘The World’s Knowledge’
Outcome Based Vision
… by aiding scientific advances
… by adding commercial value for
businesses
… by contributing to UK “knowledge
economy”
… through the pursuit of academic
excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural
wellbeing
… by giving information relevant to their
interests
… by helping to find the next medical
breakthrough
… by creating a link between the past,
present and future
To help people
advance
knowledge to
enrich lives
www.bl.uk
5
High
R+D
Industries
Prof.
Services
Creative
Industries Publishing
Industries
Resource Discovery
Bespoke Services
Research Services
Document Supply
Reprographics
Innovation Centre
School
Students
Libraries Teachers 11>18
On-site Visits
School Tours
Web Learning
SMEs
EDUCATION
Lifelong
Learner
Exhibitions
Events
Tours
Publishing
Visitors
(child + adult)
Lifelong
Learner
PUBLIC
BUSINESS
Lifelong
Learner
Postgraduate/
Undergraduate
RESEARCHER
Librarians
Scholars
Lifelong
Learner
Reading Rooms
Bespoke Services
Reprographics
Publishing
Document Supply
Searching Tools
LIBRARIES
Commercial
Researcher
Public
Libraries
Broadcasting
e.g. BBC
Publishing
e.g. OED
Document Supply
Resource Discovery
Training
Best Practice
H.E.
Libraries
Public
www.bl.uk
6
2
Major Programmes/1
Da Vinci Notebook
Integrated Library System (ILS)
Programme
7
ILS: Development
Data migration
Due to finish in a few days
16M+ BL records
10M+ records from other sources
Online ILS software
All online changes made (mainly interfaces) – final tests
Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
Most ones done for go live
Rest in priority order
www.bl.uk
8
ILS: Implementation
Training
Courses to end-users well underway
‘Practice’ system available
‘Search only’ training also underway
Testing
Functional testing (end to end) nearly complete
Performance poor – OPAC very slow
Automated stress testing (LoadRunner scripts)
eIS trying to find area of problem
Ex Libris experts flying over
Some security ‘hardening’ needed
www.bl.uk
9
ILS: Cutover from legacy systems
Now:
Temporary Aleph cataloguing
Phase 1 – internal processing
Staggered take-on of users to ease cutover problems
Merge ‘temporary’ records
7 June:
Phase 2 – reading rooms
Reading rooms closed for cutover 26-29 June
Mainly brand-new PCs etc rather than XP upgrade
30 June:
Phase 3 – remote users
Could be delayed major problems
30 July:
www.bl.uk
10
www.bl.uk
11
Future ILS development (ILS/2)
Current ILS development seen just as the start
Extra records
E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
E.g. Preservation records
Links to other new BL systems
E.g. Digital Object Management (images, web pages etc)
New releases of Ex Libris packages
www.bl.uk
12
Major Programmes/2
International Dunhuang Project
Digitisation Programme
13
Background
Digitisation Is The Process Of Converting Existing Physical Items Into Digital
Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character
Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image to Sub £1/Image.
Automatic Devices To Digitise Are Becoming Availalble But Are Expensive ~£150,000.
BL Has Had Fairly Ad Hoc Approach Driven By
External Funding Opportunities
Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project
Management
BL Has Created About 1.5M Digital Images So Far…
www.bl.uk
14
Digitisation Strategy
Digitisation Strategy Project Was Formerly Initiated On February 2, 2004
Key objectives for the project are to define:
Selection Criteria
Uniform Approach
Communications Plan
Sustainability
Intellectual Property Rights
External Relationship Management
Funding
Integration with DOMS
www.bl.uk
15
Project Status Information
Definitive Register of Projects
19 Complete
19 Current
20 Planning
JISC Sound (3,900 Hours)
JISC Newspapers (2M Pages of 750M Pages)
Chopin (Collaborative Project)
Early English Books Online
www.bl.uk
16
Major Programmes/3
Gutenberg Bible
Digital Object Management (DOM)
Programme
17
DOM Programme vision
Our mission is to enable the United Kingdom to preserve
and use its digital intellectual property forever
Our vision is create a management system for digital objects
that will
store and preserve any type of digital material in perpetuity
provide access to this material to users with appropriate permissions
ensure that the material is easy to find
ensure that users can view the material with contemporary applications
ensure that users can, where possible, experience material with the original look-and-feel
www.bl.uk
18
Introduction - history
Digital Library PFI
Mar 1997 – Dec 1998
Digital Library System
1999 – early 2002
Lessons
DOM Report
Nov 2002
The DOM Programme
Started September 2003
www.bl.uk
19
Drivers for the BL DOM Programme
Legal deposit legislation for non-print material was granted royal assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12T of material per year (with 50 year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….
We need a generic and cost-effective approach for the secure long term
storage of digital material that is produced by numerous initiatives
www.bl.uk
20
DOM – many topics to address
HIGH
ILS
WEB
ARCHIVING
DIGITISATION
PROGRAMME
ESTIMATED SIZE OF COMPONENT
LDEP
WORKFLOW
RESOURCE
DISCOVERY
LDLSE
VDEP
RIGHTS
MANAGEMENT
METADATA
DEFINITION
TECHNICAL
REQUIREMENTS
FILE
FORMAT
REGISTRY
PERSISTENT
IDENTIFIERS
FILE
CONVERSION
UTILITIES
AUTHENTICATION
PROTOTYPES
SDM
RADM
INTERFACES
STRATEGY
DEVELOPMENT
LOW
LOW
COMPONENT AMBIGUITY / COMPLEXITY
Started
Planned
Non-DOM projects
Planned co-operation
HIGH
LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications
www.bl.uk
21
Scope - life cycle of objects
Collection
Selection
Acquisition
Accession
Description
Preservation
Storage
Preservation
Access
Resource discovery
Delivery
Rendering
www.bl.uk
22
Scope – objects and processes
Preservation store
Preserves the bit stream in perpetuity
Access store
Access versions
Limited formats – in the flavour of the era
Metadata to support resource discovery
Descriptive, Administrative, Links with existing tools e.g. Integrated
Library System (ILS)
Workflow
Ingest, e.g. Legal Deposit processing
www.bl.uk
23
ACCESS
DOM
Resource Discovery
Delivery
Digital Rights Management
Shared services
DOM
Storage
Signing
Authentication
Metadata
Persistent ID
Ingest
DONATIONS
DOCUMENT SUPPLY
Publishers
Archives
Grey Literature
Non-Serial
Store
Archiving
Operational
Stores
LEGAL DEPOSIT
LDL Secure
Environment
Legal Deposit
Items
Legal Deposit
Processing
WEB
ARCHIVING
DIGITISATION
St Pancras
Studios
Newspapers
NSA
www.bl.uk
24
Timeline
[Timeline diagram covering 2003, 2004 and 2005]
BC: ET approve Business Case & Timeline
R0 (Definition): Prototype will provide a basic preservation-quality digital object storage module
R1 (Operational Storage Sub System):
• Consolidate R0 into operational system
• Provide preservation-quality digital store for materials received under Voluntary Deposit of Electronic Publications (VDEP)
• Integrate it with the existing VDEP front-end
R2 (1st Content Stream ingest):
• Support ingest for a major content stream
• Integrate with core Library systems as required
R3 & R4 (LDEP - initial format): Provide functionality for material covered by LDEP secondary legislation
R5+: Open DOM to new projects
25
DOM: Project definition - 1
Functional Architecture “What”
Prototyping – assessing market solutions
Example issues: digital rights, file formats, etc
Logical architecture “how – overall architecture”
Prototyping - basic functioning architecture
Example issues: allow changes to new suppliers, relationships to ILS, other projects etc
Physical architecture “how – storage & specifics”
Prototyping - principal solutions and options
Example issues: how do we build it cost-effectively today, supplier selection criteria
• Business case
• Planning – incremental implementation phases
Cross team workshops – reviewing progress, debating detailed technical issues, planning immediate priorities, risk management & way forward
26
DOM: Project definition - 2
Approach is to be incremental and not ‘Big Bang’
We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a good solution
A principal goal is to define:
An overall long term “logical architecture”
Within which, there will be successive generations of physical architectures
We are building an understanding of the storage marketplace, and we will use that knowledge to manage procurement
We are certain that we will need more than 500 TB of storage but we are uncertain when, so we need flexible, scalable procurement
27
DOM architecture - overview
[Diagram: Resource Discovery (ILS-based RD, non-cat RD, others), Rights Management, LDEP and Doc supply refer to an OBJECT by DOMID through the DOM Storage Service, which sits above DOM Physical Storage]
DOM Storage Service: compound objects/relations, integrity, authenticity, unique persistent identifier (DOMID)
DOM Physical Storage: atomic objects, local resource locator (LRL)
DOMID is mapped to node/vol/LRL
28
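The mapping from DOMID to node/vol/LRL can be illustrated with a minimal sketch. The slide gives no data model, so the class names, field types and example identifiers below are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Location:
    """Physical location of an object (hypothetical field layout)."""
    node: str  # storage node
    vol: str   # volume on that node
    lrl: str   # local resource locator within the volume


class DomidRegistry:
    """Maps a persistent DOMID to the object's current physical location."""

    def __init__(self) -> None:
        self._locations: Dict[str, Location] = {}

    def register(self, domid: str, location: Location) -> None:
        self._locations[domid] = location

    def resolve(self, domid: str) -> Optional[Location]:
        return self._locations.get(domid)

    def migrate(self, domid: str, new_location: Location) -> None:
        # The DOMID never changes; only the mapping is updated when
        # the object moves to a new generation of physical storage.
        if domid not in self._locations:
            raise KeyError(domid)
        self._locations[domid] = new_location


registry = DomidRegistry()
registry.register("domid-0001", Location("node-a", "vol-03", "2004/07/obj1.tif"))
registry.migrate("domid-0001", Location("node-b", "vol-01", "2004/07/obj1.tif"))
print(registry.resolve("domid-0001"))
```

Keeping the identifier separate from the location is what would let clients such as resource discovery hold a DOMID indefinitely while physical storage is replaced underneath.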
DOM System (release 3)
[Diagram: DOM System comprising two storage subsystems, Access, Shared services and Administration, shown together with Aleph, Mailroom and Publishers]
29
DOM logical architecture – integrity and authenticity
Integrity:
System has capability to continuously monitor the object store to detect object corruption
It would then initiate object recovery
Authenticity:
A process is defined to provide long-term assurance that an object that is re-presented is as it was when it was ingested
Based on the use of cryptographic signing techniques
Each object is signed when it is ingested
The signature is verified when required
The signing mechanism is “tightly” controlled
30
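The sign-on-ingest, verify-on-demand and monitor-then-recover steps can be sketched as follows. The slide does not name a signing scheme; this sketch uses HMAC-SHA256 from the Python standard library as a stand-in for a real digital signature, and the key, class and method names are hypothetical.

```python
import hashlib
import hmac
from typing import Dict, List

# Assumption: in practice the key would sit inside a tightly controlled signing service.
SIGNING_KEY = b"hypothetical-secret-key"


def sign(data: bytes) -> str:
    """Return an HMAC-SHA256 tag over the object's bit stream."""
    return hmac.new(SIGNING_KEY, data, hashlib.sha256).hexdigest()


def verify(data: bytes, signature: str) -> bool:
    """Check that the bit stream still matches the tag made at ingest."""
    return hmac.compare_digest(sign(data), signature)


class ObjectStore:
    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}
        self.signatures: Dict[str, str] = {}

    def ingest(self, object_id: str, data: bytes) -> None:
        # Each object is signed once, when it is ingested.
        self.objects[object_id] = data
        self.signatures[object_id] = sign(data)

    def find_corrupt(self) -> List[str]:
        # Integrity scan: report every object whose bits no longer verify.
        return sorted(
            oid for oid, data in self.objects.items()
            if not verify(data, self.signatures[oid])
        )

    def recover(self, object_id: str, replica: "ObjectStore") -> None:
        # Recovery: accept the replica's copy only if it verifies.
        candidate = replica.objects[object_id]
        if not verify(candidate, self.signatures[object_id]):
            raise ValueError("replica copy is also corrupt: " + object_id)
        self.objects[object_id] = candidate


local, remote = ObjectStore(), ObjectStore()
for store in (local, remote):
    store.ingest("obj-1", b"original bit stream")

local.objects["obj-1"] = b"original bit strean"  # simulate corruption
damaged = local.find_corrupt()
for object_id in damaged:
    local.recover(object_id, remote)
print(damaged, local.find_corrupt())
```

An asymmetric signature would let anyone verify an object without holding the signing secret; HMAC is used here only because it needs no third-party library.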
Procuring physical storage in volume
A major cost is in physical storage
The market for storage systems is changing rapidly, and this implies that “lock-in” is not sensible
We thus need flexibility to change supplier over time
Cost of storage is reducing by 30-40% per year
Hence procure on rolling basis just ahead of demand
Replace storage on a rolling basis on expiry of warranty
The rolling programmes imply the need to be able to support a heterogeneous product solution
The design of the logical architecture thus supports storage sourced from multiple storage vendors
31
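The argument for rolling procurement can be made concrete with a small worked calculation. The figures below (100 TB of new demand per year for five years, a starting price of 1000 per TB, a 35% annual price decline) are hypothetical and chosen only to sit inside the 30-40% range quoted above; the sketch ignores warranty replacement and the cost of capital.

```python
from typing import List


def upfront_cost(demand_tb: List[float], price_per_tb: float) -> float:
    """Cost of buying all capacity at today's price."""
    return sum(demand_tb) * price_per_tb


def rolling_cost(demand_tb: List[float], price_per_tb: float,
                 annual_decline: float) -> float:
    """Cost of buying each year's capacity at that year's (lower) price."""
    total = 0.0
    for year, tb in enumerate(demand_tb):
        total += tb * price_per_tb * (1.0 - annual_decline) ** year
    return total


demand = [100.0] * 5   # hypothetical: 100 TB of new demand in each of 5 years
price = 1000.0         # hypothetical price per TB in year 0
decline = 0.35         # within the 30-40% range on the slide

upfront = upfront_cost(demand, price)
rolling = rolling_cost(demand, price, decline)
print(f"upfront: {upfront:,.0f}  rolling: {rolling:,.0f}")
```

Under these assumptions the rolling purchases cost roughly half the upfront purchase, which is the effect the slide relies on when it recommends buying just ahead of demand.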
Disaster tolerance and the organisation of storage clusters
One can obtain commercial disaster recovery (DR) solutions for common equipment configurations
However, one cannot obtain such solutions for systems holding several hundred terabytes
So we must build DR into the design of the system
A single-site solution, subject to a common-mode disaster, would suffer considerable loss of availability after a disaster, and so is not acceptable
This implies that we need a multi-site solution
Conventionally these are based on a master-standby arrangement, where only 50% of the kit is delivering normal service
Our design is based on the use of multiple autonomous independent peer clusters that cross-synchronise, so 100% of the kit delivers normal service
32
DOM architecture in the context of the storage solution market
The dominant segment of the market focuses on delivering performance within a highly resilient single cluster
However:
Many of our objects will be rarely accessed, so we do not want to pay for “maximised” performance we do not need
We have resilience by using multiple clusters, hence we have a reduced need for resilience within a cluster, so we do not want to pay for “maximised” resilience we do not need
We are using these drivers to design a cost-effective large scale resilient solution
33
DOM storage subsystem architecture - overview
[Diagram: two storage clusters, each with a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, and DOM Shared Services (Unique ID, Signing, Logging) common to both]
34
DOM storage subsystem architecture - access
Normal access/delivery is from local storage cluster
[Diagram: two storage clusters, each with a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, and DOM shared central services (Unique ID, Signing, Logging)]
35
DOM storage subsystem architecture - access
When a cluster is off-line then access/delivery is from a remote storage cluster
[Diagram: two storage clusters, each with a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, and DOM shared central services (Unique ID, Signing, Logging)]
36
DOM storage subsystem architecture - ingest
Normal ingest is to the local storage cluster and then the remote cluster is synchronised
[Diagram: two storage clusters, each with a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, and DOM shared central services (Unique ID, Signing, Logging); arrows labelled “Store” and “Synchronise remote store”]
37
DOM storage subsystem architecture - ingest
When a cluster is off-line then ingest is managed by the remote storage cluster and the local cluster is synchronised later
[Diagram: two storage clusters, each with a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, and DOM shared central services (Unique ID, Signing, Logging); arrows labelled “Store” and “Synchronise remote store later”]
38
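The access and ingest behaviour described on the four slides above can be sketched as a pair of peer clusters behind a gateway. This is an illustration under assumptions: the class names, the in-memory dictionaries and the pending-set mechanism are hypothetical, and the slides do not say how synchronisation is actually implemented.

```python
from typing import Dict, List, Set


class Cluster:
    def __init__(self, name: str) -> None:
        self.name = name
        self.online = True
        self.objects: Dict[str, bytes] = {}
        self.pending: Set[str] = set()  # ids stored here that the peer still lacks


def synchronise(a: Cluster, b: Cluster) -> None:
    """Copy pending objects in both directions; does nothing while a peer is off-line."""
    if not (a.online and b.online):
        return
    for src, dst in ((a, b), (b, a)):
        for object_id in src.pending:
            dst.objects[object_id] = src.objects[object_id]
        src.pending.clear()


class Gateway:
    """Routes requests for one site: local cluster first, remote peer as fallback."""

    def __init__(self, local: Cluster, remote: Cluster) -> None:
        self.local = local
        self.remote = remote

    def _available(self) -> List[Cluster]:
        return [c for c in (self.local, self.remote) if c.online]

    def ingest(self, object_id: str, data: bytes) -> str:
        available = self._available()
        if not available:
            raise RuntimeError("no storage cluster available")
        target = available[0]            # local if on-line, otherwise remote
        target.objects[object_id] = data
        target.pending.add(object_id)
        synchronise(self.local, self.remote)  # deferred if a peer is off-line
        return target.name

    def access(self, object_id: str) -> bytes:
        for cluster in self._available():
            if object_id in cluster.objects:
                return cluster.objects[object_id]
        raise KeyError(object_id)


site_a, site_b = Cluster("A"), Cluster("B")
gateway_a = Gateway(local=site_a, remote=site_b)

stored_at_1 = gateway_a.ingest("obj-1", b"one")  # normal ingest: local, then sync
site_a.online = False
stored_at_2 = gateway_a.ingest("obj-2", b"two")  # local off-line: remote takes it
served = gateway_a.access("obj-1")               # delivered from the remote cluster
site_a.online = True
synchronise(site_a, site_b)                      # local catches up later
print(stored_at_1, stored_at_2, sorted(site_a.objects))
```

Because both clusters accept ingest and serve access, neither is a passive standby, which matches the slide's point that all of the kit delivers normal service.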
In conclusion
We plan for generations of physical storage
Migration from one generation to the next
Allow changes of supplier
Purchase incrementally in modest quantities
Move quickly when required
Be cost conscious
We provide assurance that an object is held and re-presented as it was when it was ingested
We are designing a cost-effective large scale resilient solution
In summary: we take a long term view
39
Major Programmes/4
Web Archiving Programme
40
Structure of Programme
The Web Archiving Programme is a collaborative initiative, implemented broadly across two consortia
UK Web Archiving Consortium
Developing a selective approach to web archiving, procuring a common web archiving infrastructure and software to begin archiving activities at the earliest opportunity
International Internet Preservation Consortium
Developing advanced web archiving technologies for the long-term, large-scale, continuous crawling requirements enabled through legislation
41
UK Web Archiving Consortium
Developing a selective approach to web archiving
Licence for PANDAS about to be signed with NLA
Sub-licences with consortium partners and contractor to follow
ITT concluded with Magus Research winning the contract
Implement a common web archiving infrastructure (lots of Linux machines + PANDAS)
Provide customisation/development of PANDAS
Provide help desk and support
42
International Internet Preservation Consortium
Developing advanced web archiving technologies
Smart Crawler
Continuous adaptive crawler, adjusting crawl priority on the fly
Based on IA Heritrix
Working on requirements now
Expect to begin tender process in June
Content Management
Archival formats
Framework
Metrics and Test Bed
43
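The idea of a continuous adaptive crawler that adjusts crawl priority on the fly can be illustrated with a small priority-queue sketch. This is not the Heritrix API or the consortium's design; the class, the halve-or-double rule and the interval bounds (in hours) are assumptions chosen for illustration.

```python
import heapq
from typing import Dict, List, Tuple


class AdaptiveFrontier:
    """Queue of URLs ordered by next due time; earlier due time is crawled first."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, str]] = []
        self._interval: Dict[str, float] = {}
        self._counter = 0  # tie-breaker so equal due times pop in insertion order

    def schedule(self, url: str, now: float, interval: float = 24.0) -> None:
        self._interval[url] = interval
        heapq.heappush(self._heap, (now + interval, self._counter, url))
        self._counter += 1

    def next_due(self) -> Tuple[float, str]:
        due, _, url = heapq.heappop(self._heap)
        return due, url

    def report(self, url: str, now: float, changed: bool) -> None:
        # Pages that changed are revisited sooner; stable pages less often.
        interval = self._interval[url]
        if changed:
            interval = max(1.0, interval / 2)
        else:
            interval = min(168.0, interval * 2)
        self.schedule(url, now, interval)


frontier = AdaptiveFrontier()
frontier.schedule("http://example.org/news", now=0.0)
frontier.schedule("http://example.org/about", now=0.0)

due1, first = frontier.next_due()
frontier.report(first, now=due1, changed=True)     # news changed: revisit sooner
due2, second = frontier.next_due()
frontier.report(second, now=due2, changed=False)   # about unchanged: back off
due3, third = frontier.next_due()
print(first, second, third, due3)
```

After one round the frequently changing page is already scheduled ahead of the stable one, which is the behaviour "adjusting crawl priority on the fly" suggests.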
External Collaboration
44
Digital Library Collaborations/Partnerships
Current
UK Digital Preservation Coalition
Founder Member
TEL (The European Library Project)
Web Archiving UK
JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library Of Wales
International Internet Preservation Consortium
BNF, Library Of Congress, Internet Archive, National Archives & Library Of Canada, National Library Of Australia, National Library Of Italy, National Libraries Of Nordic Countries
JISC Funded - Digital Curation Centre
Persistent Identifiers
DOI Foundation, European National Libraries (KB & DDB)
Resource Discovery
Union Catalogues (SUNCAT)
Digital Library Federation
45
Digital Library Collaborations/Partnerships
Potential
Secure Legal Deposit Network
6 Legal Deposit Libraries
Global Digital Format Registry
Potential Partners (National Archives, DLF)
Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
KB (Netherlands National Library & Other Partners – FP6 Bid)
Digital Rights Management
Potential Partners (Publishers, JISC)
Metadata
Publishers, Others ?
Authentication
JISC ?
Resource Discovery
Search Engine Vendors, Researchers
Others ???
46
Conclusions
• Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success
Can You Work With Us?
47
[Diagram: audience segments (BUSINESS, EDUCATION, PUBLIC, RESEARCHER, LIBRARIES), the groups within each, and the services offered to them, including Resource Discovery, Document Supply, Bespoke Services, Reprographics, Reading Rooms, Searching Tools, Exhibitions, Events, Tours, Web Learning, Training and Best Practice]
6
Major Programmes/1
Da Vinci Notebook
Integrated Library System (ILS)
Programme
7
ILS: Development
Data migration
Due to finish in a few days
16M+ BL records
10M+ records from other sources
Online ILS software
All online changes made (mainly interfaces) – final tests
Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
Most done for go-live
Rest in priority order
8
ILS: Implementation
Training
Courses to end-users well underway
‘Practice’ system available
‘Search only’ training also underway
Testing
Functional testing (end to end) nearly complete
Performance poor – OPAC very slow
Automated stress testing (LoadRunner scripts)
eIS trying to find area of problem
Ex Libris experts flying over
Some security ‘hardening’ needed
9
ILS: Cutover from legacy systems
Now:
Temporary Aleph cataloguing
7 June:
Phase 1 – internal processing
Staggered take-on of users to ease cutover problems
Merge ‘temporary’ records
30 June:
Phase 2 – reading rooms
Reading rooms closed for cutover 26-29 June
Mainly brand-new PCs etc rather than XP upgrade
30 July:
Phase 3 – remote users
Could be delayed if there are major problems
10
11
Future ILS development (ILS/2)
Current ILS development seen just as the start
Extra records
E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
E.g. Preservation records
Links to other new BL systems
E.g. Digital Object Management (images, web pages etc)
New releases of Ex Libris packages
12
Major Programmes/2
International Dunhuang Project
Digitisation Programme
13
Background
Digitisation Is The Process Of Converting Existing Physical Items Into Digital Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image To Sub £1/Image.
Automatic Devices To Digitise Are Becoming Available But Are Expensive (~£150,000).
BL Has Had A Fairly Ad Hoc Approach Driven By
External Funding Opportunities
Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project Management
BL Has Created About 1.5M Digital Images So Far…
14
Digitisation Strategy
Digitisation Strategy Project Was Formally Initiated On February 2, 2004
Key objectives for the project are to define:
Selection Criteria
Uniform Approach
Communications Plan
Sustainability
Intellectual Property Rights
External Relationship Management
Funding
Integration with DOMS
15
Project Status Information
Definitive Register of Projects
19 Complete
19 Current
20 Planning
JISC Sound (3,900 Hours)
JISC Newspapers (2M Pages of 750M Pages)
Chopin (Collaborative Project)
Early English Books Online
16
Major Programmes/3
Gutenberg Bible
Digital Object Management (DOM)
Programme
17
DOM Programme vision
Our mission is to enable the United Kingdom to preserve and use its digital intellectual property forever
Our vision is to create a management system for digital objects that will
store and preserve any type of digital material in perpetuity
provide access to this material to users with appropriate permissions
ensure that the material is easy to find
ensure that users can view the material with contemporary applications
ensure that users can, where possible, experience material with the original look-and-feel
18
Introduction - history
Digital Library PFI
Mar 1997 – Dec 1998
Digital Library System
1999 – early 2002
Lessons
DOM Report
Nov 2002
The DOM Programme
Started September 2003
19
Drivers for the BL DOM Programme
Legal deposit legislation for non-print material was granted royal assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12 TB of material per year (with 50 year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….
We need a generic and cost-effective approach for the secure long term storage of digital material that is produced by numerous initiatives
20
Slide 13
British Computer Society
North London Branch
Major Programmes
Richard Boulderstone
July 27, 2004
1
Agenda
•
•
•
•
•
•
•
•
•
The British Library
Vision
Our Audiences/Customers
ILS
Digitisation
Digital Object Management
Web Archiving
Collaboration
Conclusions
Magna Carta
www.bl.uk
2
What Is The British Library ?
Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of
Science and Invention (1855), National Central Library (1916), and National
Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India
Office Library and Records in 1982, and British Institute of Recorded Sound in
1983
• Flagship building at St Pancras - largest public building project in Great
Britain in 20th century - opened in 1998
•
www.bl.uk
3
World-Class Research Library
Key Statistics 2002/3
•
•
•
•
•
•
•
•
•
•
150 million items
8.2 million items consulted or supplied
408,000 reading room visits
618,000 catalogue records created
554,000 items received on legal deposit
651 km shelf capacity 92% full add 12 km each year
18.5M Web Site Hits (www.bl.uk)
2,400 staff
£85.2 million Grant in Aid and £27.0 million trading income in 2001/2
Annual report - http://www.bl.uk/about/annual/latest.html
www.bl.uk
4
‘The World’s Knowledge’
Outcome Based Vision
… by aiding scientific advances
… by adding commercial value for
businesses
… by contributing to UK “knowledge
economy”
… through the pursuit of academic
excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural
wellbeing
… by giving information relevant to their
interests
… by helping to find the next medical
breakthrough
… by creating a link between the past,
present and future
To help people
advance
knowledge to
enrich lives
www.bl.uk
5
High
R+D
Industries
Prof.
Services
Creative
Industries Publishing
Industries
Resource Discovery
Bespoke Services
Research Services
Document Supply
Reprographics
Innovation Centre
School
Students
Libraries Teachers 11>18
On-site Visits
School Tours
Web Learning
SMEs
EDUCATION
Lifelong
Learner
Exhibitions
Events
Tours
Publishing
Visitors
(child + adult)
Lifelong
Learner
PUBLIC
BUSINESS
Lifelong
Learner
Postgraduate/
Undergraduate
RESEARCHER
Librarians
Scholars
Lifelong
Learner
Reading Rooms
Bespoke Services
Reprographics
Publishing
Document Supply
Searching Tools
LIBRARIES
Commercial
Researcher
Public
Libraries
Broadcasting
e.g. BBC
Publishing
e.g. OED
Document Supply
Resource Discovery
Training
Best Practice
H.E.
Libraries
Public
www.bl.uk
6
2
Major Programmes/1
Da Vinci Notebook
Integrated Library System (ILS)
Programme
7
ILS: Development
Data migration
Due to finish in a few days
16M+ BL records
10M+ records from other sources
Online ILS software
All online changes made (mainly interfaces) – final tests
Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
Most ones done for go live
Rest in priority order
www.bl.uk
8
ILS: Implementation
Training
Courses to end-users well underway
‘Practice’ system available
‘Search only’ training also underway
Testing
Functional testing (end to end) nearly complete
Performance poor – OPAC very slow
Automated stress testing (LoadRunner scripts)
eIS trying to find area of problem
Ex Libris experts flying over
Some security ‘hardening’ needed
www.bl.uk
9
ILS: Cutover from legacy systems
Now:
Temporary Aleph cataloguing
Phase 1 – internal processing
Staggered take-on of users to ease cutover problems
Merge ‘temporary’ records
7 June:
Phase 2 – reading rooms
Reading rooms closed for cutover 26-29 June
Mainly brand-new PCs etc rather than XP upgrade
30 June:
Phase 3 – remote users
Could be delayed major problems
30 July:
www.bl.uk
10
www.bl.uk
11
Future ILS development (ILS/2)
Current ILS development seen just as the start
Extra records
E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
E.g. Preservation records
Links to other new BL systems
E.g. Digital Object Management (images, web pages etc)
New releases of Ex Libris packages
www.bl.uk
12
Major Programmes/2
International Dunhuang Project
Digitisation Programme
13
Background
Digitisation Is The Process Of Converting Existing Physical Items Into Digital
Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character
Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image to Sub £1/Image.
Automatic Devices To Digitise Are Becoming Available But Are Expensive ~£150,000.
BL Has Had Fairly Ad Hoc Approach Driven By
External Funding Opportunities
Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project
Management
BL Has Created About 1.5M Digital Images So Far…
www.bl.uk
14
Digitisation Strategy
Digitisation Strategy Project Was Formally Initiated On February 2, 2004
Key objectives for the project are to define:
Selection Criteria
Uniform Approach
Communications Plan
Sustainability
Intellectual Property Rights
External Relationship Management
Funding
Integration with DOMS
www.bl.uk
15
Project Status Information
Definitive Register of Projects
19 Complete
19 Current
20 Planning
JISC Sound (3,900 Hours)
JISC Newspapers (2M Pages of 750M Pages)
Chopin (Collaborative Project)
Early English Books Online
www.bl.uk
16
Major Programmes/3
Gutenberg Bible
Digital Object Management (DOM)
Programme
17
DOM Programme vision
Our mission is to enable the United Kingdom to preserve
and use its digital intellectual property forever
Our vision is to create a management system for digital objects
that will
store and preserve any type of digital material in perpetuity
provide access to this material to users with appropriate permissions
ensure that the material is easy to find
ensure that users can view the material with contemporary applications
ensure that users can, where possible, experience material with the original look-and-feel
www.bl.uk
18
Introduction - history
Digital Library PFI
Mar 1997 – Dec 1998
Digital Library System
1999 – early 2002
Lessons
DOM Report
Nov 2002
The DOM Programme
Started September 2003
www.bl.uk
19
Drivers for the BL DOM Programme
Legal deposit legislation for non-print material was granted royal assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12T of material per year (with 50 year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….
We need a generic and cost-effective approach for the secure long term
storage of digital material that is produced by numerous initiatives
www.bl.uk
20
DOM – many topics to address
[Chart: DOM components plotted by estimated size of component (low to high) against component ambiguity / complexity (low to high)]
Components: LDEP, Workflow, Resource Discovery, LDLSE, VDEP, Rights Management, Metadata Definition, Technical Requirements, File Format Registry, Persistent Identifiers, File Conversion Utilities, Authentication, Prototypes, SDM, RADM, Interfaces, Strategy Development
Related programmes: ILS, Web Archiving, Digitisation Programme
Key: Started, Planned, Non-DOM projects, Planned co-operation
LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications
www.bl.uk
21
Scope - life cycle of objects
Collection
Selection
Acquisition
Accession
Description
Preservation
Storage
Preservation
Access
Resource discovery
Delivery
Rendering
www.bl.uk
22
Scope – objects and processes
Preservation store
Preserves the bit stream in perpetuity
Access store
Access versions
Limited formats – in the flavour of the era
Metadata to support resource discovery
Descriptive, Administrative, Links with existing tools e.g. Integrated
Library System (ILS)
Workflow
Ingest, e.g. Legal Deposit processing
www.bl.uk
23
[Diagram: DOM in context, with labels in slide order]
Access: DOM Resource Discovery, Delivery, Digital Rights Management
Shared services: DOM Storage, Signing, Authentication, Metadata, Persistent ID
Ingest: Donations; Document Supply (Publishers, Archives, Grey Literature, Non-Serial Store, Archiving, Operational Stores); Legal Deposit (LDL Secure Environment, Legal Deposit Items, Legal Deposit Processing); Web Archiving; Digitisation (St Pancras Studios, Newspapers, NSA)
www.bl.uk
24
Timeline
[Timeline diagram covering 2003, 2004 and 2005]
BC: ET approve Business Case & Timeline
R0, Definition: Prototype will provide a basic preservation-quality digital object storage module
R1, Operational Storage Sub System:
• Consolidate R0 into operational system
• Provide preservation-quality digital store for materials received under Voluntary Deposit of Electronic Publications (VDEP)
• Integrate it with the existing VDEP front-end
R2, 1st Content Stream ingest:
• Support ingest for a major content stream
• Integrate with core Library systems as required
R3 & R4, LDEP - initial format: Provide functionality for material covered by LDEP secondary legislation
R5+: Open DOM to new projects
www.bl.uk
25
DOM: Project definition - 1
Functional Architecture “What”
Example issues: digital rights, file formats, etc
Prototyping – assessing market solutions
Logical architecture “how – overall architecture”
Example issues: allow changes to new suppliers, relationships to ILS, other projects etc
Prototyping - basic functioning architecture
Physical architecture “how – storage & specifics”
Example issues: how do we build it cost-effectively today, supplier selection criteria
Prototyping - principal solutions and options
• Business case
• Planning – incremental
implementation phases
Cross team workshops – reviewing progress, debating detailed technical
issues, planning immediate priorities, risk management & way forward
www.bl.uk
26
DOM: Project definition - 2
Approach is to be incremental and not ‘Big Bang’
We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a
good solution
A principal goal is to define:
An overall long term “logical architecture”
Within which, there will be successive generations of physical architectures
We are building an understanding of the storage marketplace, and we will use that knowledge to manage
procurement
We are certain that we will need >500T of storage but we are uncertain when – we thus need
flexible scalable procurement
www.bl.uk
27
DOM architecture - overview
[Diagram: client services address an object by its DOMID]
Client services: Resource Discovery (ILS based RD, Non-cat, Others), Rights Management, LDEP, Doc supply
DOM Storage Service: Compound objects/relations, Integrity, Authenticity, Local resource locator
Atomic Objects: each has a unique persistent identifier (DOMID)
DOMID is mapped to node/vol/LRL
DOM Physical Storage
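A minimal Python sketch of the DOMID-to-location mapping described on this slide. The names Location, StorageService, register, relocate and resolve, and the example identifiers, are illustrative assumptions and not part of the DOM design. The sketch only shows that the persistent identifier stays fixed while the node/vol/LRL behind it can change.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    """Hypothetical physical address: storage node, volume, local resource locator."""
    node: str
    volume: str
    lrl: str


class StorageService:
    """Illustrative resolver from a persistent DOMID to its current location."""

    def __init__(self) -> None:
        self._locations: dict[str, Location] = {}

    def register(self, domid: str, location: Location) -> None:
        # A DOMID is assigned once and never reused.
        if domid in self._locations:
            raise ValueError(f"DOMID already registered: {domid}")
        self._locations[domid] = location

    def relocate(self, domid: str, new_location: Location) -> None:
        # Moving an object to newer storage changes its location, not its identifier.
        if domid not in self._locations:
            raise KeyError(domid)
        self._locations[domid] = new_location

    def resolve(self, domid: str) -> Location:
        # Callers only ever hold the DOMID; the physical address is looked up here.
        return self._locations[domid]
```

Keeping callers ignorant of the physical address is one way to allow successive generations of physical storage underneath a stable logical architecture.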
www.bl.uk
28
DOM System (release 3)
Aleph
Storage subsystem
Storage
subsystem
Access
Mailroom
Publishers
Shared services
Administration
DOM System
www.bl.uk
29
DOM logical architecture –
integrity and authenticity
Integrity:
System has capability to continuously monitor the object store to detect
object corruption
It would then initiate object recovery
Authenticity:
A process is defined to provide long-term assurance that an object that
is re-presented is as it was when it was ingested
Based on the use of cryptographic signing techniques
Each object is signed when it is ingested
The signature is verified when required
The signing mechanism is “tightly” controlled
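The sign-at-ingest and verify-on-demand idea can be sketched in Python as follows. The slide does not say which cryptographic technique is used. This sketch substitutes a keyed hash (HMAC-SHA256) from the standard library as a stand-in, and the key, function names and store layout are all assumptions made for illustration.

```python
import hashlib
import hmac

# Hypothetical key for illustration only; a "tightly" controlled signing
# mechanism would keep key material outside application code.
SIGNING_KEY = b"example-key-not-for-production"


def sign_object(data: bytes) -> str:
    """Compute a keyed signature over the object's bytes at ingest time."""
    return hmac.new(SIGNING_KEY, data, hashlib.sha256).hexdigest()


def verify_object(data: bytes, signature: str) -> bool:
    """Check that the bytes being re-presented match what was signed at ingest."""
    return hmac.compare_digest(sign_object(data), signature)


def find_corrupt(store: dict[str, tuple[bytes, str]]) -> list[str]:
    """Scan the whole store and list identifiers whose bytes no longer verify."""
    return sorted(
        object_id
        for object_id, (data, signature) in store.items()
        if not verify_object(data, signature)
    )
```

Running find_corrupt periodically corresponds to the monitoring described under Integrity. Any identifier it returns would be a candidate for recovery from another copy.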
www.bl.uk
30
Procuring physical storage in volume
A major cost is in physical storage
The market for storage systems is changing rapidly, and this implies that “lock-in” is
not sensible
We thus need flexibility to change supplier over time
Cost of storage is reducing by 30-40% per year
Hence procure on rolling basis just ahead of demand
Replace storage on a rolling basis on expiry of warranty
The rolling programmes imply the need to be able to support a heterogeneous
product solution
The design of the logical architecture thus supports storage sourced from multiple
storage vendors
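The case for rolling procurement can be illustrated with a small calculation in Python. Only the 30-40% annual price decline and the 500T order of magnitude come from the slides. The unit price, the five-year horizon and the assumption that demand grows evenly are hypothetical.

```python
def upfront_cost(total_tb: float, price_per_tb: float) -> float:
    """Cost of buying all capacity at today's price."""
    return total_tb * price_per_tb


def rolling_cost(total_tb: float, price_per_tb: float,
                 years: int, annual_decline: float) -> float:
    """Cost of buying an equal share each year at that year's (lower) price."""
    per_year = total_tb / years
    cost = 0.0
    for year in range(years):
        # The price falls by the same fraction every year (assumed constant).
        cost += per_year * price_per_tb * (1 - annual_decline) ** year
    return cost


if __name__ == "__main__":
    # Hypothetical figures: 500 TB over 5 years, 1000 currency units per TB today.
    print(upfront_cost(500, 1000.0))
    print(rolling_cost(500, 1000.0, years=5, annual_decline=0.35))
```

Under these assumed figures, buying just ahead of demand costs roughly half as much as buying everything at the start, before counting the saving from not powering and maintaining idle capacity.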
www.bl.uk
31
Disaster tolerance and the organisation
of storage clusters
One can obtain commercial disaster recovery (DR) solutions for common equipment
configurations
However, one cannot obtain such solutions for systems holding several hundred terabytes
So we must build in the need for DR into the design of the system
A single site solution, subject to a common-mode disaster, would suffer considerable loss
of availability after a disaster, and so is not acceptable
This implies that we need a multi-site solution
Conventionally these are based on a master-standby where only 50% of kit is delivering
normal service
Our design is based on the use of multiple autonomous independent peer clusters that
cross-synchronise
so 100% of the kit delivers normal service
www.bl.uk
32
DOM architecture in the context of the
storage solution market
The dominant segment of the market focuses on delivering performance within
a highly resilient single cluster
However:
Many of our objects will be rarely accessed
so we do not want to pay for “maximised” performance we do not need
We have resilience by using multiple clusters, hence we have a reduced
need for resilience within a cluster
so we do not want to pay for “maximised” resilience we do not need
We are using these drivers to design a cost-effective large scale resilient
solution
www.bl.uk
33
DOM storage subsystem architecture - overview
[Diagram: two storage clusters, each with a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, alongside DOM Shared Services]
DOM Shared Services:
• Unique ID
• Signing
• Logging
www.bl.uk
34
DOM storage subsystem architecture - access
Normal access/delivery is from local storage cluster
[Diagram: two storage clusters, each with a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, alongside DOM Shared central Services (Unique ID, Signing, Logging)]
www.bl.uk
35
DOM storage subsystem architecture - access
When a cluster is off-line then access/delivery is from a remote storage cluster
[Diagram: two storage clusters, each with a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, alongside DOM Shared central Services (Unique ID, Signing, Logging)]
www.bl.uk
36
DOM storage subsystem architecture - ingest
Normal ingest is to the local storage cluster and then the remote cluster is synchronised
[Diagram: two storage clusters, each with a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, alongside DOM Shared central Services (Unique ID, Signing, Logging); arrows labelled "Store" and "Synchronise remote store"]
www.bl.uk
37
DOM storage subsystem architecture - ingest
When a cluster is off-line then ingest is managed by the remote storage cluster and the local cluster is synchronised later
[Diagram: two storage clusters, each with a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, alongside DOM Shared central Services (Unique ID, Signing, Logging); arrows labelled "Store" and "Synchronise remote store later"]
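The access and ingest behaviour on the four preceding slides can be sketched with two peer clusters in Python. This is an illustration under stated assumptions, not the DOM implementation. The Cluster and Gateway classes, the in-memory dictionaries and the backlog list are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical storage cluster holding objects keyed by DOMID."""
    name: str
    online: bool = True
    objects: dict = field(default_factory=dict)


class Gateway:
    """Illustrative gateway that prefers its local cluster and falls back to the peer."""

    def __init__(self, local: Cluster, remote: Cluster) -> None:
        self.local, self.remote = local, remote
        self.backlog = []  # (target cluster, domid) pairs awaiting synchronisation

    def _primary_and_peer(self):
        if self.local.online:
            return self.local, self.remote
        if self.remote.online:
            return self.remote, self.local
        raise RuntimeError("no storage cluster available")

    def ingest(self, domid: str, data: bytes) -> str:
        primary, peer = self._primary_and_peer()
        primary.objects[domid] = data          # store
        if peer.online:
            peer.objects[domid] = data         # synchronise the other cluster now
        else:
            self.backlog.append((peer, domid))  # synchronise later
        return primary.name

    def access(self, domid: str) -> bytes:
        primary, _ = self._primary_and_peer()
        return primary.objects[domid]

    def resynchronise(self) -> None:
        waiting = []
        for target, domid in self.backlog:
            source = self.remote if target is self.local else self.local
            if target.online:
                target.objects[domid] = source.objects[domid]
            else:
                waiting.append((target, domid))
        self.backlog = waiting
```

Because both clusters accept reads and writes in normal operation, neither sits idle as a standby. This is the sense in which 100% of the kit delivers normal service.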
www.bl.uk
38
In conclusion
We plan for generations of physical storage
Migration from one generation to the next
Allow changes of supplier
Purchase incrementally in modest quantities
Move quickly when required
Be cost conscious
We provide assurance that an object is held and re-presented as when it was
ingested
We are designing a cost-effective large scale resilient solution
In summary: we take a long term view
www.bl.uk
39
Major Programmes/4
Web Archiving Programme
40
Structure of Programme
Web Archiving Programme is a collaborative
initiative, roughly implemented across two
consortiums
UK Web Archiving Consortium
Developing a selective approach to web archiving, procuring a common
web archiving infrastructure and software to begin archiving activities at
the earliest opportunity
International Internet Preservation Consortium
Developing advanced web archiving technologies for the long-term,
large-scale, continuous crawling requirements enabled through
legislation
www.bl.uk
41
UK Web Archiving Consortium
Developing a selective approach to web archiving
License for PANDAS about to be signed with NLA
Sub-licenses with consortium partners and contractor to follow
ITT concluded with Magus Research winning the contract.
Implement a common web archiving infrastructure (lots of Linux
machines + PANDAS)
Provide customisation/development of PANDAS
Provide help desk and support
www.bl.uk
42
International Internet Preservation
Consortium
Developing advanced web archiving technologies
Smart Crawler
Continuous adaptive crawler, adjusting crawl priority on the fly
Based on IA Heritrix
Working on requirements now
Expect to begin tender process in June
Content Management
Archival formats
Framework
Metrics and Test Bed
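One way to picture "adjusting crawl priority on the fly" is a scheduler that revisits a page sooner when it was found changed and later when it was not. The Python sketch below is an assumption made for illustration. It does not describe Heritrix or the IIPC Smart Crawler requirements, and the class name, interval limits and halving/doubling rule are hypothetical.

```python
import heapq


class AdaptiveScheduler:
    """Illustrative revisit scheduler keyed on the next due time of each URL."""

    def __init__(self, min_interval: float = 1.0, max_interval: float = 64.0) -> None:
        self.min_interval = min_interval
        self.max_interval = max_interval
        self._heap = []         # (next_due_time, url), smallest due time first
        self._interval = {}     # url -> current revisit interval
        self._last_digest = {}  # url -> content digest seen on the last visit

    def add(self, url: str, now: float = 0.0, interval: float = 8.0) -> None:
        self._interval[url] = interval
        heapq.heappush(self._heap, (now, url))

    def next_due(self):
        """Pop the URL that is due soonest."""
        return heapq.heappop(self._heap)

    def record(self, url: str, visited_at: float, digest: str) -> float:
        """Record a visit and reschedule: halve the interval on change, double otherwise."""
        changed = self._last_digest.get(url) != digest
        self._last_digest[url] = digest
        factor = 0.5 if changed else 2.0
        interval = self._interval[url] * factor
        interval = min(self.max_interval, max(self.min_interval, interval))
        self._interval[url] = interval
        heapq.heappush(self._heap, (visited_at + interval, url))
        return interval
```

A first visit always counts as a change in this sketch, because there is no earlier digest to compare against.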
www.bl.uk
43
External Collaboration
44
Digital Library Collaborations/Partnerships
Current
UK Digital Preservation Coalition
Founder Member
TEL (The European Library Project)
Web Archiving UK
JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library Of Wales
International Internet Preservation Consortium
BNF, Library Of Congress, Internet Archive, National Archives & Library Of Canada, National Library Of
Australia, National Library Of Italy, National Libraries Of Nordic Countries
JISC Funded - Digital Curation Centre
Persistent Identifiers
DOI Foundation, European National Libraries (KB & DDB)
Resource Discovery
Union Catalogues (SUNCAT)
Digital Library Federation
www.bl.uk
45
Digital Library Collaborations/Partnerships
Potential
Secure Legal Deposit Network
6 Legal Deposit Libraries
Global Digital Format Registry
Potential Partners (National Archives, DLF)
Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
KB (Netherlands National Library & Other Partners – FP6 Bid)
Digital Rights Management
Potential Partners (Publishers, JISC)
Metadata
Publishers, Others ?
Authentication
JISC ?
Resource Discovery
Search Engine Vendors, Researchers
Others ???
www.bl.uk
46
Conclusions
Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success
•
•
Can You Work With Us?
www.bl.uk
47
British Computer Society
North London Branch
Major Programmes
Richard Boulderstone
July 27, 2004
1
Agenda
• The British Library
• Vision
• Our Audiences/Customers
• ILS
• Digitisation
• Digital Object Management
• Web Archiving
• Collaboration
• Conclusions
Magna Carta
2
What Is The British Library?
• Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of Science and Invention (1855), National Central Library (1916), and National Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India Office Library and Records in 1982, and British Institute of Recorded Sound in 1983
• Flagship building at St Pancras - largest public building project in Great Britain in 20th century - opened in 1998
3
World-Class Research Library
Key Statistics 2002/3
• 150 million items
• 8.2 million items consulted or supplied
• 408,000 reading room visits
• 618,000 catalogue records created
• 554,000 items received on legal deposit
• 651 km shelf capacity, 92% full; adding 12 km each year
• 18.5M Web Site Hits (www.bl.uk)
• 2,400 staff
• £85.2 million Grant in Aid and £27.0 million trading income in 2001/2
• Annual report - http://www.bl.uk/about/annual/latest.html
4
‘The World’s Knowledge’
Outcome Based Vision
To help people advance knowledge to enrich lives
… by aiding scientific advances
… by adding commercial value for businesses
… by contributing to UK “knowledge economy”
… through the pursuit of academic excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural wellbeing
… by giving information relevant to their interests
… by helping to find the next medical breakthrough
… by creating a link between the past, present and future
5
[Diagram: audiences and customers in five segments: BUSINESS, EDUCATION, PUBLIC, RESEARCHER and LIBRARIES.]
[User groups shown include High R+D Industries, Prof. Services, Creative Industries, Publishing Industries, SMEs, school libraries, students and teachers (11>18), lifelong learners, visitors (child + adult), postgraduates/undergraduates, scholars, librarians, commercial researchers, public libraries, H.E. libraries, broadcasting (e.g. BBC) and publishing (e.g. OED).]
[Services shown include Resource Discovery, Bespoke Services, Research Services, Document Supply, Reprographics, Innovation Centre, On-site Visits, School Tours, Web Learning, Exhibitions, Events, Tours, Publishing, Reading Rooms, Searching Tools, Training and Best Practice.]
6
Major Programmes/1
Da Vinci Notebook
Integrated Library System (ILS) Programme
7
ILS: Development
Data migration
Due to finish in a few days
16M+ BL records
10M+ records from other sources
Online ILS software
All online changes made (mainly interfaces) – final tests
Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
Most done for go-live
Remainder to follow in priority order
8
ILS: Implementation
Training
Courses to end-users well underway
‘Practice’ system available
‘Search only’ training also underway
Testing
Functional testing (end to end) nearly complete
Performance poor – OPAC very slow
Automated stress testing (LoadRunner scripts)
eIS trying to locate the source of the problem
Ex Libris experts flying over
Some security ‘hardening’ needed
9
ILS: Cutover from legacy systems
Now:
Temporary Aleph cataloguing
7 June:
Phase 1 – internal processing
Staggered take-on of users to ease cutover problems
Merge ‘temporary’ records
30 June:
Phase 2 – reading rooms
Reading rooms closed for cutover 26-29 June
Mainly brand-new PCs etc rather than XP upgrade
30 July:
Phase 3 – remote users
Could be delayed if there are major problems
10
11
Future ILS development (ILS/2)
Current ILS development is seen as just the start
Extra records
E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
E.g. Preservation records
Links to other new BL systems
E.g. Digital Object Management (images, web pages etc)
New releases of Ex Libris packages
12
Major Programmes/2
International Dunhuang Project
Digitisation Programme
13
Background
Digitisation Is The Process Of Converting Existing Physical Items Into Digital Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image To Sub-£1/Image.
Automatic Devices To Digitise Are Becoming Available But Are Expensive (~£150,000).
BL Has Had A Fairly Ad Hoc Approach Driven By
External Funding Opportunities
Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project Management
BL Has Created About 1.5M Digital Images So Far…
14
Digitisation Strategy
Digitisation Strategy Project Was Formally Initiated On February 2, 2004
Key objectives for the project are to define:
Selection Criteria
Uniform Approach
Communications Plan
Sustainability
Intellectual Property Rights
External Relationship Management
Funding
Integration with DOMS
15
Project Status Information
Definitive Register of Projects
19 Complete
19 Current
20 Planning
JISC Sound (3,900 Hours)
JISC Newspapers (2M Pages of 750M Pages)
Chopin (Collaborative Project)
Early English Books Online
16
Major Programmes/3
Gutenberg Bible
Digital Object Management (DOM) Programme
17
DOM Programme vision
Our mission is to enable the United Kingdom to preserve and use its digital intellectual property forever
Our vision is to create a management system for digital objects that will
store and preserve any type of digital material in perpetuity
provide access to this material to users with appropriate permissions
ensure that the material is easy to find
ensure that users can view the material with contemporary applications
ensure that users can, where possible, experience material with the original look-and-feel
18
Introduction - history
Digital Library PFI
Mar 1997 – Dec 1998
Digital Library System
1999 – early 2002
Lessons
DOM Report
Nov 2002
The DOM Programme
Started September 2003
19
Drivers for the BL DOM Programme
Legal deposit legislation for non-print material received Royal Assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12 TB of material per year (with 50-year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….
We need a generic and cost-effective approach for the secure long-term storage of digital material that is produced by numerous initiatives
20
DOM – many topics to address
[Chart: components plotted by ESTIMATED SIZE OF COMPONENT (low to high) against COMPONENT AMBIGUITY / COMPLEXITY (low to high).]
[Components shown: ILS, Web Archiving, Digitisation Programme, LDEP, Workflow, Resource Discovery, LDLSE, VDEP, Rights Management, Metadata Definition, Technical Requirements, File Format Registry, Persistent Identifiers, File Conversion Utilities, Authentication, Prototypes, SDM, RADM, Interfaces, Strategy Development.]
[Legend: Started; Planned; Non-DOM projects; Planned co-operation.]
LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications
21
Scope - life cycle of objects
Collection
Selection
Acquisition
Accession
Description
Preservation
Storage
Preservation
Access
Resource discovery
Delivery
Rendering
22
Scope – objects and processes
Preservation store
Preserves the bit stream in perpetuity
Access store
Access versions
Limited formats – in the flavour of the era
Metadata to support resource discovery
Descriptive, Administrative, Links with existing tools e.g. Integrated Library System (ILS)
Workflow
Ingest, e.g. Legal Deposit processing
23
[Diagram: DOM in context.]
[Ingest sources: Donations; Document Supply (Publishers, Archives, Grey Literature, Non-Serial Store); Archiving (Operational Stores); Legal Deposit (LDL Secure Environment, Legal Deposit Items, Legal Deposit Processing); Web Archiving; Digitisation (St Pancras Studios, Newspapers, NSA).]
[DOM shared services: Storage, Signing, Authentication, Metadata, Persistent ID.]
[Access: Resource Discovery, Delivery, Digital Rights Management.]
24
Timeline
BC: ET approve Business Case & Timeline
R0: Definition
Prototype will provide a basic preservation-quality digital object storage module
R1: Operational Storage Sub System
• Consolidate R0 into operational system
• Provide preservation-quality digital store for materials received under Voluntary Deposit of Electronic Publications (VDEP)
• Integrate it with the existing VDEP front-end
R2: 1st Content Stream ingest
• Support ingest for a major content stream
• Integrate with core Library systems as required
R3 & R4: LDEP - initial format
Provide functionality for material covered by LDEP secondary legislation
R5+: Open DOM to new projects
Time axis: 2003, 2004, 2005
25
DOM: Project definition - 1
Functional architecture “what”
Example issues: digital rights, file formats, etc
Prototyping – assessing market solutions
Logical architecture “how – overall architecture”
Example issues: allow changes to new suppliers, relationships to ILS, other projects etc
Prototyping - basic functioning architecture
Physical architecture “how – storage & specifics”
Example issues: how do we build it cost-effectively today, supplier selection criteria
Prototyping - principal solutions and options
• Business case
• Planning – incremental implementation phases
Cross team workshops – reviewing progress, debating detailed technical issues, planning immediate priorities, risk management & way forward
26
DOM: Project definition - 2
Approach is to be incremental and not ‘Big Bang’
We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a good solution
A principal goal is to define:
An overall long term “logical architecture”
Within which, there will be successive generations of physical architectures
We are building an understanding of the storage marketplace, and we will use that knowledge to manage procurement
We are certain that we will need >500 TB of storage but we are uncertain when – we thus need flexible, scalable procurement
27
DOM architecture - overview
[Diagram: Resource Discovery (Non-cat, ILS based RD, Others), Rights Management, LDEP and Doc supply refer to each OBJECT by its DOMID.]
[DOM Storage Service: compound objects/relations, integrity, authenticity, and the link from the unique persistent identifier (DOMID) to a local resource locator.]
[DOM Physical Storage: holds the atomic objects.]
DOMID is mapped to node/vol/LRL
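The mapping from a persistent DOMID to node/vol/LRL can be pictured with a small lookup table. This is only a sketch: the class names, the `domid:` prefix and the use of a random UUID are assumptions made for illustration, since the slides do not say how identifiers are minted or stored.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class StorageLocation:
    node: str    # storage node (hypothetical naming)
    volume: str  # volume on that node
    lrl: str     # local resource locator within the volume


class DomIdResolver:
    """Maps a persistent DOMID to one or more physical locations."""

    def __init__(self) -> None:
        self._locations: Dict[str, List[StorageLocation]] = {}

    def mint(self) -> str:
        # Assumption: any globally unique, opaque string can serve as a DOMID.
        return "domid:" + uuid.uuid4().hex

    def register(self, domid: str, location: StorageLocation) -> None:
        self._locations.setdefault(domid, []).append(location)

    def relocate(self, domid: str, old: StorageLocation, new: StorageLocation) -> None:
        # Moving an object to new hardware changes the mapping, never the DOMID.
        locations = self._locations[domid]
        locations[locations.index(old)] = new

    def resolve(self, domid: str) -> List[StorageLocation]:
        return list(self._locations.get(domid, []))
```

Keeping the identifier opaque and the location in a separate table is what would let successive generations of physical storage be swapped in without touching catalogue records that cite the DOMID.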
28
DOM System (release 3)
[Diagram: DOM System (release 3). Components shown: Access, Mailroom (fed by Publishers), Aleph, Shared services, Administration, and two storage subsystems.]
29
DOM logical architecture – integrity and authenticity
Integrity:
System has capability to continuously monitor the object store to detect object corruption
It would then initiate object recovery
Authenticity:
A process is defined to provide long-term assurance that an object that is re-presented is as it was when it was ingested
Based on the use of cryptographic signing techniques
Each object is signed when it is ingested
The signature is verified when required
The signing mechanism is “tightly” controlled
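The sign-on-ingest and verify-on-demand steps above can be sketched as follows. The slides do not name the signing scheme, so HMAC-SHA256 from the Python standard library is used here purely as a stand-in (it is a keyed hash, and a production system might well prefer public-key signatures). The `ObjectStore` class and its method names are hypothetical, and object recovery is left out.

```python
import hashlib
import hmac
from typing import Dict, List


def sign(content: bytes, key: bytes) -> str:
    # HMAC-SHA256 stands in for the signing scheme, which the slides do not name.
    return hmac.new(key, content, hashlib.sha256).hexdigest()


def verify(content: bytes, signature: str, key: bytes) -> bool:
    # Constant-time comparison avoids leaking how much of the signature matched.
    return hmac.compare_digest(sign(content, key), signature)


class ObjectStore:
    """Toy in-memory store: signs on ingest, verifies on demand."""

    def __init__(self, key: bytes) -> None:
        self._key = key  # the "tightly controlled" secret in this sketch
        self._objects: Dict[str, bytes] = {}
        self._signatures: Dict[str, str] = {}

    def ingest(self, object_id: str, content: bytes) -> None:
        self._objects[object_id] = content
        self._signatures[object_id] = sign(content, self._key)

    def is_authentic(self, object_id: str) -> bool:
        return verify(self._objects[object_id], self._signatures[object_id], self._key)

    def scan(self) -> List[str]:
        """Return the IDs of objects whose content no longer matches its signature."""
        return [oid for oid in self._objects if not self.is_authentic(oid)]
```

A background job calling `scan()` on a schedule would correspond to the continuous monitoring described under Integrity; anything it reports would be a candidate for recovery from another copy.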
30
Procuring physical storage in volume
A major cost is in physical storage
The market for storage systems is changing rapidly, and this implies that “lock-in” is not sensible
We thus need flexibility to change supplier over time
Cost of storage is reducing by 30-40% per year
Hence procure on rolling basis just ahead of demand
Replace storage on a rolling basis on expiry of warranty
The rolling programmes imply the need to be able to support a heterogeneous product solution
The design of the logical architecture thus supports storage sourced from multiple storage vendors
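The case for buying just ahead of demand follows from compound price decline. The sketch below compares buying five years of capacity up front with buying each year's share at that year's price. The demand figures and the unit price are invented for illustration; only the 30-40% annual decline comes from the slide.

```python
from typing import Sequence


def unit_cost(initial_cost: float, annual_decline: float, years: int) -> float:
    """Cost per unit of storage after `years` of compound price decline."""
    return initial_cost * (1.0 - annual_decline) ** years


def total_spend(demand_per_year: Sequence[float], initial_cost: float,
                annual_decline: float, buy_upfront: bool) -> float:
    """Total spend for the capacity needed in each year."""
    if buy_upfront:
        # Everything is bought in year 0 at the year-0 price.
        return sum(demand_per_year) * initial_cost
    # Rolling procurement: each year's capacity is bought at that year's price.
    return sum(
        amount * unit_cost(initial_cost, annual_decline, year)
        for year, amount in enumerate(demand_per_year)
    )


# Illustrative only: 100 units of new capacity per year for five years,
# a starting price of 1.0 per unit, and a 35% annual decline.
demand = [100.0] * 5
upfront = total_spend(demand, 1.0, 0.35, buy_upfront=True)
rolling = total_spend(demand, 1.0, 0.35, buy_upfront=False)
```

With these made-up inputs the rolling purchase costs roughly half as much as buying everything in the first year, before counting the warranty and power costs of capacity that would otherwise sit idle.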
31
Disaster tolerance and the organisation of storage clusters
One can obtain commercial disaster recovery (DR) solutions for common equipment configurations
However one cannot obtain such solutions for multi-100 TB systems
So we must build DR into the design of the system
A single site solution, subject to a common-mode disaster, would suffer considerable loss of availability after a disaster, and so is not acceptable
This implies that we need a multi-site solution
Conventionally these are based on a master-standby arrangement where only 50% of the kit is delivering normal service
Our design is based on the use of multiple autonomous independent peer clusters that cross-synchronise
so 100% of the kit delivers normal service
32
DOM architecture in the context of the storage solution market
The dominant segment of the market focuses on delivering performance within a highly resilient single cluster
However:
Many of our objects will be rarely accessed
so we do not want to pay for “maximised” performance we do not need
We have resilience by using multiple clusters, hence we have a reduced need for resilience within a cluster
so we do not want to pay for “maximised” resilience we do not need
We are using these drivers to design a cost-effective large scale resilient solution
33
DOM storage subsystem architecture - overview
[Diagram: two storage clusters, each comprising a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage.]
[DOM Shared Services serve both clusters: Unique ID, Signing, Logging.]
34
DOM storage subsystem architecture - access
Normal access/delivery is from local storage cluster
[Diagram: two storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical Storage) with DOM central Shared Services (Unique ID, Signing, Logging); the request is served by the local cluster.]
35
DOM storage subsystem architecture - access
When a cluster is off-line then access/delivery is from a remote storage cluster
[Diagram: two storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical Storage) with DOM central Shared Services (Unique ID, Signing, Logging); the request is served by the remote cluster.]
36
DOM storage subsystem architecture - ingest
Normal ingest is to the local storage cluster and then the remote cluster is synchronised
[Diagram: two storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical Storage) with DOM central Shared Services (Unique ID, Signing, Logging); arrows labelled "Store" into the local cluster and "Synchronise remote store" to the remote cluster.]
37
DOM storage subsystem architecture - ingest
When a cluster is off-line then ingest is managed by the remote storage cluster and the local cluster is synchronised later
[Diagram: two storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical Storage) with DOM central Shared Services (Unique ID, Signing, Logging); arrows labelled "Store" into the remote cluster and "Synchronise remote store later".]
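The four access and ingest cases on these slides (local when available, remote when the local cluster is off-line, immediate or deferred synchronisation) can be modelled with two in-memory peers. This is a toy sketch: the `Cluster` and `PeerPair` names, the pending list and the explicit `resync` call are assumptions, and the real design would also involve the shared Unique ID, Signing and Logging services.

```python
from typing import Dict, List


class Cluster:
    def __init__(self, name: str) -> None:
        self.name = name
        self.online = True
        self.objects: Dict[str, bytes] = {}
        self.pending: List[str] = []  # IDs still to be copied to the peer


class PeerPair:
    """Two autonomous clusters that cross-synchronise."""

    def __init__(self, a: Cluster, b: Cluster) -> None:
        self._clusters = {a.name: a, b.name: b}
        self._peer = {a.name: b, b.name: a}

    def ingest(self, local_name: str, object_id: str, content: bytes) -> str:
        local, remote = self._clusters[local_name], self._peer[local_name]
        # Prefer the local cluster; fall back to the remote one if it is off-line.
        target, other = (local, remote) if local.online else (remote, local)
        if not target.online:
            raise RuntimeError("no cluster available for ingest")
        target.objects[object_id] = content
        if other.online:
            other.objects[object_id] = content  # synchronise immediately
        else:
            target.pending.append(object_id)    # synchronise later
        return target.name

    def read(self, local_name: str, object_id: str) -> bytes:
        local, remote = self._clusters[local_name], self._peer[local_name]
        source = local if local.online else remote
        if not source.online:
            raise RuntimeError("no cluster available for access")
        return source.objects[object_id]

    def resync(self, recovered_name: str) -> None:
        """Copy objects that were ingested while `recovered_name` was off-line."""
        recovered, peer = self._clusters[recovered_name], self._peer[recovered_name]
        for object_id in peer.pending:
            recovered.objects[object_id] = peer.objects[object_id]
        peer.pending.clear()
```

Because either cluster can accept ingest and serve reads, both carry normal load, which is the contrast the slides draw with a master-standby pair.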
38
In conclusion
We plan for generations of physical storage
Migration from one generation to the next
Allow changes of supplier
Purchase incrementally in modest quantities
Move quickly when required
Be cost conscious
We provide assurance that an object is held and re-presented as when it was ingested
We are designing a cost-effective large scale resilient solution
In summary: we take a long term view
39
Major Programmes/4
Web Archiving Programme
40
Structure of Programme
Web Archiving Programme is a collaborative initiative, implemented broadly across two consortia
UK Web Archiving Consortium
Developing a selective approach to web archiving, procuring a common web archiving infrastructure and software to begin archiving activities at the earliest opportunity
International Internet Preservation Consortium
Developing advanced web archiving technologies for the long-term, large-scale, continuous crawling requirements enabled through legislation
41
UK Web Archiving Consortium
Developing a selective approach to web archiving
Licence for PANDAS about to be signed with NLA
Sub-licences with consortium partners and contractor to follow
ITT concluded, with Magus Research winning the contract
Implement a common web archiving infrastructure (lots of Linux machines + PANDAS)
Provide customisation/development of PANDAS
Provide help desk and support
42
International Internet Preservation Consortium
Developing advanced web archiving technologies
Smart Crawler
Continuous adaptive crawler, adjusting crawl priority on the fly
Based on IA Heritrix
Working on requirements now
Expect to begin tender process in June
Content Management
Archival formats
Framework
Metrics and Test Bed
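One way to picture the Smart Crawler item, "adjusting crawl priority on the fly", is a priority queue keyed on when each page is next due, with the revisit interval shortened for pages that changed and lengthened for pages that did not. This is not the Heritrix API, and the requirements were still being written at the time of the talk. The class name, the halve/double policy and the interval bounds are all assumptions.

```python
import heapq
from typing import Dict, List, Tuple


class AdaptiveFrontier:
    """Crawl queue whose revisit interval adapts to how often a page changes."""

    def __init__(self, min_interval: float = 1.0, max_interval: float = 64.0) -> None:
        self._heap: List[Tuple[float, str]] = []  # (next due time, url)
        self._interval: Dict[str, float] = {}
        self._min, self._max = min_interval, max_interval

    def add(self, url: str, now: float = 0.0, interval: float = 8.0) -> None:
        self._interval[url] = interval
        heapq.heappush(self._heap, (now, url))

    def next_due(self) -> Tuple[float, str]:
        # The page with the earliest due time is crawled next.
        return heapq.heappop(self._heap)

    def report(self, url: str, now: float, changed: bool) -> None:
        # Halve the interval when the page changed, double it when it did not.
        factor = 0.5 if changed else 2.0
        interval = min(self._max, max(self._min, self._interval[url] * factor))
        self._interval[url] = interval
        heapq.heappush(self._heap, (now + interval, url))
```

A caller would loop on `next_due()`, fetch the page, compare it with the previous capture, and call `report()` so that frequently changing pages rise in priority over time.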
43
External Collaboration
44
Digital Library Collaborations/Partnerships
Current
UK Digital Preservation Coalition
Founder Member
TEL (The European Library Project)
Web Archiving UK
JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library Of Wales
International Internet Preservation Consortium
BNF, Library Of Congress, Internet Archive, National Archives & Library Of Canada, National Library Of Australia, National Library Of Italy, National Libraries Of Nordic Countries
JISC Funded - Digital Curation Centre
Persistent Identifiers
DOI Foundation, European National Libraries (KB & DDB)
Resource Discovery
Union Catalogues (SUNCAT)
Digital Library Federation
45
Digital Library Collaborations/Partnerships
Potential
Secure Legal Deposit Network
6 Legal Deposit Libraries
Global Digital Format Registry
Potential Partners (National Archives, DLF)
Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
KB (Netherlands National Library & Other Partners – FP6 Bid)
Digital Rights Management
Potential Partners (Publishers, JISC)
Metadata
Publishers, Others?
Authentication
JISC?
Resource Discovery
Search Engine Vendors, Researchers
Others ???
46
Conclusions
• Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success
• Can You Work With Us?
47
Slide 16
British Computer Society
North London Branch
Major Programmes
Richard Boulderstone
July 27, 2004
1
Agenda
•
•
•
•
•
•
•
•
•
The British Library
Vision
Our Audiences/Customers
ILS
Digitisation
Digital Object Management
Web Archiving
Collaboration
Conclusions
Magna Carta
www.bl.uk
2
What Is The British Library ?
Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of
Science and Invention (1855), National Central Library (1916), and National
Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India
Office Library and Records in 1982, and British Institute of Recorded Sound in
1983
• Flagship building at St Pancras - largest public building project in Great
Britain in 20th century - opened in 1998
•
www.bl.uk
3
World-Class Research Library
Key Statistics 2002/3
•
•
•
•
•
•
•
•
•
•
150 million items
8.2 million items consulted or supplied
408,000 reading room visits
618,000 catalogue records created
554,000 items received on legal deposit
651 km shelf capacity 92% full add 12 km each year
18.5M Web Site Hits (www.bl.uk)
2,400 staff
£85.2 million Grant in Aid and £27.0 million trading income in 2001/2
Annual report - http://www.bl.uk/about/annual/latest.html
www.bl.uk
4
‘The World’s Knowledge’
Outcome Based Vision
… by aiding scientific advances
… by adding commercial value for
businesses
… by contributing to UK “knowledge
economy”
… through the pursuit of academic
excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural
wellbeing
… by giving information relevant to their
interests
… by helping to find the next medical
breakthrough
… by creating a link between the past,
present and future
To help people
advance
knowledge to
enrich lives
www.bl.uk
5
High
R+D
Industries
Prof.
Services
Creative
Industries Publishing
Industries
Resource Discovery
Bespoke Services
Research Services
Document Supply
Reprographics
Innovation Centre
School
Students
Libraries Teachers 11>18
On-site Visits
School Tours
Web Learning
SMEs
EDUCATION
Lifelong
Learner
Exhibitions
Events
Tours
Publishing
Visitors
(child + adult)
Lifelong
Learner
PUBLIC
BUSINESS
Lifelong
Learner
Postgraduate/
Undergraduate
RESEARCHER
Librarians
Scholars
Lifelong
Learner
Reading Rooms
Bespoke Services
Reprographics
Publishing
Document Supply
Searching Tools
LIBRARIES
Commercial
Researcher
Public
Libraries
Broadcasting
e.g. BBC
Publishing
e.g. OED
Document Supply
Resource Discovery
Training
Best Practice
H.E.
Libraries
Public
www.bl.uk
6
2
Major Programmes/1
Da Vinci Notebook
Integrated Library System (ILS)
Programme
7
ILS: Development
Data migration
Due to finish in a few days
16M+ BL records
10M+ records from other sources
Online ILS software
All online changes made (mainly interfaces) – final tests
Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
Most ones done for go live
Rest in priority order
www.bl.uk
8
ILS: Implementation
Training
Courses to end-users well underway
‘Practice’ system available
‘Search only’ training also underway
Testing
Functional testing (end to end) nearly complete
Performance poor – OPAC very slow
Automated stress testing (LoadRunner scripts)
eIS trying to find area of problem
Ex Libris experts flying over
Some security ‘hardening’ needed
www.bl.uk
9
ILS: Cutover from legacy systems
Now:
Temporary Aleph cataloguing
Phase 1 – internal processing
Staggered take-on of users to ease cutover problems
Merge ‘temporary’ records
7 June:
Phase 2 – reading rooms
Reading rooms closed for cutover 26-29 June
Mainly brand-new PCs etc rather than XP upgrade
30 June:
Phase 3 – remote users
Could be delayed major problems
30 July:
www.bl.uk
10
www.bl.uk
11
Future ILS development (ILS/2)
Current ILS development seen just as the start
Extra records
E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
E.g. Preservation records
Links to other new BL systems
E.g. Digital Object Management (images, web pages etc)
New releases of Ex Libris packages
www.bl.uk
12
Major Programmes/2
International Dunhuang Project
Digitisation Programme
13
Background
Digitisation Is The Process Of Converting Existing Physical Items Into Digital
Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character
Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image to Sub £1/Image.
Automatic Devices To Digitise Are Becoming Availalble But Are Expensive ~£150,000.
BL Has Had Fairly Ad Hoc Approach Driven By
External Funding Opportunities
Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project
Management
BL Has Created About 1.5M Digital Images So Far…
www.bl.uk
14
Digitisation Strategy
Digitisation Strategy Project Was Formerly Initiated On February 2, 2004
Key objectives for the project are to define:
Selection Criteria
Uniform Approach
Communications Plan
Sustainability
Intellectual Property Rights
External Relationship Management
Funding
Integration with DOMS
www.bl.uk
15
Project Status Information
Definitive Register of Projects
19 Complete
19 Current
20 Planning
JISC Sound (3,900 Hours)
JISC Newspapers (2M Pages of 750M Pages)
Chopin (Collaborative Project)
Early English Books Online
www.bl.uk
16
Major Programmes/3
Gutenberg Bible
Digital Object Management (DOM)
Programme
17
DOM Programme vision
Our mission is to enable the United Kingdom to preserve
and use its digital intellectual property forever
Our vision is create a management system for digital objects
that will
store and preserve any type of digital material in perpetuity
provide access to this material to users with appropriate permissions
ensure that the material is easy to find
ensure that users can view the material with contemporary applications
ensure that users can, where possible, experience material with the original look-and-feel
www.bl.uk
18
Introduction - history
Digital Library PFI
Mar 1997 – Dec 1998
Digital Library System
1999 – early 2002
Lessons
DOM Report
Nov 2002
The DOM Programme
Started September 2003
www.bl.uk
19
Drivers for the BL DOM Programme
Legal deposit legislation for non-print material was granted royal assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12T of material per year (with 50 year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….
We need a generic and cost-effective approach for the secure long term
storage of digital material that is produced by numerous initiatives
www.bl.uk
20
DOM – many topics to address
HIGH
ILS
WEB
ARCHIVING
DIGITISATION
PROGRAMME
ESTIMATED SIZE OF COMPONENT
LDEP
WORKFLOW
RESOURCE
DISCOVERY
LDLSE
VDEP
RIGHTS
MANAGEMENT
METADATA
DEFINITION
TECHNICAL
REQUIREMENTS
FILE
FORMAT
REGISTRY
PERSISTENT
IDENTIFIERS
FILE
CONVERSION
UTILITIES
AUTHENTICATION
PROTOTYPES
SDM
RADM
INTERFACES
STRATEGY
DEVELOPMENT
LOW
LOW
COMPONENT AMBIGUITY / COMPLEXITY
Started
Planned
Non-DOM projects
Planned co-operation
HIGH
LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications
www.bl.uk
21
Scope - life cycle of objects
Collection
Selection
Acquisition
Accession
Description
Preservation
Storage
Preservation
Access
Resource discovery
Delivery
Rendering
www.bl.uk
22
Scope – objects and processes
Preservation store
Preserves the bit stream in perpetuity
Access store
Access versions
Limited formats – in the flavour of the era
Metadata to support resource discovery
Descriptive, Administrative, Links with existing tools e.g. Integrated
Library System (ILS)
Workflow
Ingest, e.g. Legal Deposit processing
www.bl.uk
23
ACCESS
DOM
Resource Discovery
Delivery
Digital Rights Management
Shared services
DOM
Storage
Signing
Authentication
Metadata
Persistent ID
Ingest
DONATIONS
DOCUMENT SUPPLY
Publishers
Archives
Grey Literature
Non-Serial
Store
Archiving
Operational
Stores
LEGAL DEPOSIT
LDL Secure
Environment
Legal Deposit
Items
Legal Deposit
Processing
WEB
ARCHIVING
DIGITISATION
St Pancras
Studios
Newspapers
NSA
www.bl.uk
24
Timeline
Prototype will provide a
basic preservationquality digital object
storage module
Definition. R0
ET approve
Business Case
& Timeline
•Consolidate R0 into operational
system
•Provide preservation-quality digital
store for materials received under
Voluntary Deposit of Electronic
Publications (VDEP)
•Integrate it with the existing VDEP
front-end
BC
R0
Operat’l Storage
Sub System. R1
•Support ingest for a
major content stream
•Integrate with core
Library systems as
required
R1
1st Content
Stream ingest. R2
R2
LDEP - initial
format. R3 & R4
Provide functionality
for material covered by
LDEP secondary
legislation
R3
R4
Open DOM to new
projects. R5+
2003
2004
2005
www.bl.uk
25
DOM: Project definition - 1
Example issues
Functional Architecture “What”
digital rights, file
formats, etc
Prototyping – assessing market solutions
allow changes to
new suppliers,
relationships to ILS,
other projects etc
Logical architecture “how – overall architecture”
Prototyping - basic functioning architecture
how do we build it
cost-effectively
today, supplier
selection criteria
Physical architecture “how – storage & specifics”
Prototyping - principal solutions and options
• Business case
• Planning – incremental
implementation phases
Cross team workshops – reviewing progress, debating detailed technical
issues, planning immediate priorities, risk management & way forward
www.bl.uk
26
DOM: Project definition - 2
Approach is to be incremental and not ‘Big Bang’
We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a
good solution
A principal goal is to define:
An overall long term “logical architecture”
Within which, there will be successive generations of physical architectures
We are understanding the storage marketplace, and we will use the knowledge to manage
procurement
We are certain that we will need >500T of storage but we are uncertain when – we thus need
flexible scalable procurement
www.bl.uk
27
DOM architecture - overview
Resource Discovery
Non-cat
ILS
based RD
Others
Rights Management
LDEP
DOMID
OBJECT
DOM Storage Service
Compound
objects/relations
Integrity
Authenticity
Local resource locator
Object
Atomic
Objects
Unique persistent
identifier (DOMID)
Doc
supply
DOMID is
mapped to
node/vol/LRL
DOM Physical Storage
www.bl.uk
28
DOM System (release 3)
Aleph
Storage subsystem
Storage
subsystem
Access
Mailroom
Publishers
Shared services
Administration
DOM System
www.bl.uk
29
DOM logical architecture –
integrity and authenticity
Integrity:
System has capability to continuously monitor the object store to detect
object corruption
It would then initiate object recovery
Authenticity:
A process is defined to provide long-term assurance that an object that
is re-presented is as it was when it was ingested
Based on the use of cryptographic signing techniques
Each object is signed when it is ingested
The signature is verified when required
The signing mechanism is “tightly” controlled
www.bl.uk
30
Slide 17
British Computer Society
North London Branch
Major Programmes
Richard Boulderstone
July 27, 2004
1
Agenda
• The British Library
• Vision
• Our Audiences/Customers
• ILS
• Digitisation
• Digital Object Management
• Web Archiving
• Collaboration
• Conclusions
Magna Carta
www.bl.uk
2
What Is The British Library ?
• Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of
Science and Invention (1855), National Central Library (1916), and National
Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India
Office Library and Records in 1982, and British Institute of Recorded Sound in
1983
• Flagship building at St Pancras - largest public building project in Great
Britain in 20th century - opened in 1998
www.bl.uk
3
World-Class Research Library
Key Statistics 2002/3
• 150 million items
• 8.2 million items consulted or supplied
• 408,000 reading room visits
• 618,000 catalogue records created
• 554,000 items received on legal deposit
• 651 km shelf capacity, 92% full; adding 12 km each year
• 18.5M Web Site Hits (www.bl.uk)
• 2,400 staff
• £85.2 million Grant in Aid and £27.0 million trading income in 2001/2
• Annual report - http://www.bl.uk/about/annual/latest.html
www.bl.uk
4
‘The World’s Knowledge’
Outcome Based Vision
To help people advance knowledge to enrich lives
… by aiding scientific advances
… by adding commercial value for businesses
… by contributing to UK “knowledge economy”
… through the pursuit of academic excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural wellbeing
… by giving information relevant to their interests
… by helping to find the next medical breakthrough
… by creating a link between the past, present and future
www.bl.uk
5
[Diagram: Our Audiences/Customers. Five audience segments (Business, Education, Public, Researcher, Libraries), each shown with the groups it covers (for example SMEs, school students and teachers, visitors, scholars, public and H.E. libraries) and the services offered to it (for example Document Supply, Resource Discovery, Reading Rooms, Exhibitions, Web Learning).]
www.bl.uk
6
Major Programmes/1
Da Vinci Notebook
Integrated Library System (ILS)
Programme
7
ILS: Development
Data migration
Due to finish in a few days
16M+ BL records
10M+ records from other sources
Online ILS software
All online changes made (mainly interfaces) – final tests
Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
Most done for go-live
Rest in priority order
www.bl.uk
8
ILS: Implementation
Training
Courses to end-users well underway
‘Practice’ system available
‘Search only’ training also underway
Testing
Functional testing (end to end) nearly complete
Performance poor – OPAC very slow
Automated stress testing (LoadRunner scripts)
eIS trying to find area of problem
Ex Libris experts flying over
Some security ‘hardening’ needed
www.bl.uk
9
ILS: Cutover from legacy systems
Now:
Temporary Aleph cataloguing
7 June:
Phase 1 – internal processing
Staggered take-on of users to ease cutover problems
Merge ‘temporary’ records
30 June:
Phase 2 – reading rooms
Reading rooms closed for cutover 26-29 June
Mainly brand-new PCs etc rather than XP upgrade
30 July:
Phase 3 – remote users
Could be delayed if there are major problems
www.bl.uk
10
www.bl.uk
11
Future ILS development (ILS/2)
Current ILS development seen just as the start
Extra records
E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
E.g. Preservation records
Links to other new BL systems
E.g. Digital Object Management (images, web pages etc)
New releases of Ex Libris packages
www.bl.uk
12
Major Programmes/2
International Dunhuang Project
Digitisation Programme
13
Background
Digitisation Is The Process Of Converting Existing Physical Items Into Digital
Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character
Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image to Sub £1/Image.
Automatic Devices To Digitise Are Becoming Available But Are Expensive (~£150,000).
BL Has Had A Fairly Ad Hoc Approach Driven By
External Funding Opportunities
Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project
Management
BL Has Created About 1.5M Digital Images So Far…
www.bl.uk
14
Digitisation Strategy
Digitisation Strategy Project Was Formally Initiated On February 2, 2004
Key objectives for the project are to define:
Selection Criteria
Uniform Approach
Communications Plan
Sustainability
Intellectual Property Rights
External Relationship Management
Funding
Integration with DOMS
www.bl.uk
15
Project Status Information
Definitive Register of Projects
19 Complete
19 Current
20 Planning
JISC Sound (3,900 Hours)
JISC Newspapers (2M Pages of 750M Pages)
Chopin (Collaborative Project)
Early English Books Online
www.bl.uk
16
Major Programmes/3
Gutenberg Bible
Digital Object Management (DOM)
Programme
17
DOM Programme vision
Our mission is to enable the United Kingdom to preserve
and use its digital intellectual property forever
Our vision is to create a management system for digital objects
that will
store and preserve any type of digital material in perpetuity
provide access to this material to users with appropriate permissions
ensure that the material is easy to find
ensure that users can view the material with contemporary applications
ensure that users can, where possible, experience material with the original look-and-feel
www.bl.uk
18
Introduction - history
Digital Library PFI
Mar 1997 – Dec 1998
Digital Library System
1999 – early 2002
Lessons
DOM Report
Nov 2002
The DOM Programme
Started September 2003
www.bl.uk
19
Drivers for the BL DOM Programme
Legal deposit legislation for non-print material was granted royal assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12T of material per year (with 50 year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….
We need a generic and cost-effective approach for the secure long term
storage of digital material that is produced by numerous initiatives
www.bl.uk
20
DOM – many topics to address
[Chart: components plotted by estimated size of component (low to high) against component ambiguity / complexity (low to high), and marked as started, planned, non-DOM projects or planned co-operation.]
Components shown: ILS, Web Archiving, Digitisation Programme, LDEP, Workflow, Resource Discovery, LDLSE, VDEP, Rights Management, Metadata Definition, Technical Requirements, File Format Registry, Persistent Identifiers, File Conversion Utilities, Authentication, Prototypes, SDM, RADM, Interfaces, Strategy Development
LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications
www.bl.uk
21
Scope - life cycle of objects
Collection
Selection
Acquisition
Accession
Description
Preservation
Storage
Preservation
Access
Resource discovery
Delivery
Rendering
www.bl.uk
22
Scope – objects and processes
Preservation store
Preserves the bit stream in perpetuity
Access store
Access versions
Limited formats – in the flavour of the era
Metadata to support resource discovery
Descriptive, Administrative, Links with existing tools e.g. Integrated
Library System (ILS)
Workflow
Ingest, e.g. Legal Deposit processing
www.bl.uk
23
[Diagram: DOM in context. Access side: Resource Discovery, Delivery, Digital Rights Management. DOM core: Storage, Ingest and Shared services (Signing, Authentication, Metadata, Persistent ID). Labels on the ingest side: Donations, Document Supply, Publishers, Archives, Grey Literature, Non-Serial, Store, Archiving, Operational Stores, Legal Deposit (LDL Secure Environment, Legal Deposit Items, Legal Deposit Processing), Web Archiving, Digitisation (St Pancras Studios, Newspapers, NSA).]
www.bl.uk
24
Timeline
Definition (R0)
Prototype will provide a basic preservation-quality digital object storage module
ET approve Business Case & Timeline (BC)
Operational Storage Sub System (R1)
• Consolidate R0 into operational system
• Provide preservation-quality digital store for materials received under Voluntary Deposit of Electronic Publications (VDEP)
• Integrate it with the existing VDEP front-end
1st Content Stream ingest (R2)
• Support ingest for a major content stream
• Integrate with core Library systems as required
LDEP - initial format (R3 & R4)
Provide functionality for material covered by LDEP secondary legislation
Open DOM to new projects (R5+)
Time axis: 2003, 2004, 2005
www.bl.uk
25
DOM: Project definition - 1
Functional Architecture “What”
Prototyping – assessing market solutions
Example issues: digital rights, file formats, etc
Logical architecture “how – overall architecture”
Prototyping - basic functioning architecture
Example issues: allow changes to new suppliers, relationships to ILS, other projects etc
Physical architecture “how – storage & specifics”
Prototyping - principal solutions and options
Example issues: how do we build it cost-effectively today, supplier selection criteria
• Business case
• Planning – incremental implementation phases
Cross team workshops – reviewing progress, debating detailed technical
issues, planning immediate priorities, risk management & way forward
www.bl.uk
26
DOM: Project definition - 2
Approach is to be incremental and not ‘Big Bang’
We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a
good solution
A principal goal is to define:
An overall long term “logical architecture”
Within which, there will be successive generations of physical architectures
We are understanding the storage marketplace, and we will use the knowledge to manage
procurement
We are certain that we will need >500T of storage but we are uncertain when – we thus need
flexible scalable procurement
www.bl.uk
27
DOM architecture - overview
[Diagram: Resource Discovery (ILS based RD, Non-cat, Others), Rights Management, LDEP and Doc supply refer to an object by its DOMID. The DOM Storage Service covers compound objects/relations, integrity, authenticity and the local resource locator, and holds atomic objects, each with a unique persistent identifier (DOMID). The DOMID is mapped to node/vol/LRL in DOM Physical Storage.]
www.bl.uk
28
DOM System (release 3)
[Diagram: the DOM System at release 3. Labels: Aleph, Mailroom, Publishers, Access, Storage subsystem (shown twice), Shared services, Administration.]
www.bl.uk
29
DOM logical architecture –
integrity and authenticity
Integrity:
System has capability to continuously monitor the object store to detect
object corruption
It would then initiate object recovery
Authenticity:
A process is defined to provide long-term assurance that an object that
is re-presented is as it was when it was ingested
Based on the use of cryptographic signing techniques
Each object is signed when it is ingested
The signature is verified when required
The signing mechanism is “tightly” controlled
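
The integrity and authenticity points above can be illustrated with a minimal sketch. It is not the DOM implementation: the slides do not say which signing technique is used, so HMAC-SHA256 from the Python standard library stands in for it here, and the key, object IDs and function names are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical key; in the slides' terms this would sit inside the
# "tightly" controlled signing mechanism.
SIGNING_KEY = b"example-key-not-for-production"


def sign_object(data: bytes, key: bytes = SIGNING_KEY) -> str:
    """Return a hex signature computed over the object's bytes at ingest."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify_object(data: bytes, signature: str, key: bytes = SIGNING_KEY) -> bool:
    """Check that the bytes re-presented now match what was signed at ingest."""
    expected = hmac.new(key, data, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)


def find_corrupt(store: dict, signatures: dict) -> list:
    """Integrity sweep: list the IDs of objects whose bytes no longer verify."""
    return sorted(oid for oid, data in store.items()
                  if not verify_object(data, signatures[oid]))


# Ingest two objects, then simulate corruption of one stored copy.
store = {"obj-1": b"Magna Carta scan", "obj-2": b"Sound recording"}
signatures = {oid: sign_object(data) for oid, data in store.items()}
store["obj-2"] = b"Sound recordinG"

corrupt = find_corrupt(store, signatures)  # candidates for object recovery
```

A real system would more likely keep the signatures and the key outside the object store itself, so that corruption of the store cannot also damage the evidence used to detect it.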
www.bl.uk
30
Procuring physical storage in volume
A major cost is in physical storage
The market for storage systems is changing rapidly, and this implies that “lock-in” is
not sensible
We thus need flexibility to change supplier over time
Cost of storage is reducing by 30-40% per year
Hence procure on rolling basis just ahead of demand
Replace storage on a rolling basis on expiry of warranty
The rolling programmes imply the need to be able to support a heterogeneous
product solution
The design of the logical architecture thus supports storage sourced from multiple
storage vendors
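
The case for buying just ahead of demand can be checked with simple arithmetic. The sketch below is illustrative only: the yearly demand and the starting price are made-up figures, and only the 30-40% annual price reduction comes from the slide. It ignores warranty replacement and the time value of money.

```python
def upfront_cost(demand_per_year, price_now):
    """Buy the capacity for every year today, at today's price."""
    return sum(demand_per_year) * price_now


def rolling_cost(demand_per_year, price_now, annual_decline):
    """Buy each year's capacity in that year, at that year's lower price."""
    return sum(tb * price_now * (1 - annual_decline) ** year
               for year, tb in enumerate(demand_per_year))


demand = [100, 100, 100, 100, 100]  # TB added per year (illustrative)
price = 1000.0                      # cost per TB today (illustrative units)

upfront = upfront_cost(demand, price)
rolling_30 = rolling_cost(demand, price, 0.30)
rolling_40 = rolling_cost(demand, price, 0.40)
```

With these assumed inputs the rolling purchase costs roughly 46 to 55 percent of the upfront purchase, which is the effect the slide relies on.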
www.bl.uk
31
Disaster tolerance and the organisation
of storage clusters
One can obtain commercial disaster recovery (DR) solutions for common equipment
configurations
However, one cannot obtain such solutions for systems holding several hundred terabytes
So we must build DR into the design of the system
A single site solution, subject to a common-mode disaster, would suffer considerable loss
of availability after a disaster, and so is not acceptable
This implies that we need a multi-site solution
Conventionally these are based on a master-standby where only 50% of kit is delivering
normal service
Our design is based on the use of multiple autonomous independent peer clusters that
cross-synchronise
so 100% of the kit delivers normal service
www.bl.uk
32
DOM architecture in the context of the
storage solution market
The dominant segment of the market focuses on delivering performance within
a highly resilient single cluster
However:
Many of our objects will be rarely accessed
so we do not want to pay for “maximised” performance we do not need
We have resilience by using multiple clusters, hence we have a reduced
need for resilience within a cluster
so we do not want to pay for “maximised” resilience we do not need
We are using these drivers to design a cost-effective large scale resilient
solution
www.bl.uk
33
DOM storage subsystem architecture - overview
[Diagram: two storage clusters, each with its own DOM Storage gateway, DOM Storage Service and DOM Physical Storage. Both use the DOM Shared Services:]
• Unique ID
• Signing
• Logging
www.bl.uk
34
DOM storage subsystem architecture - access
Normal access/delivery is from local storage cluster
[Diagram: two storage clusters, each with a DOM Storage gateway, DOM Storage Service and DOM Physical Storage, plus the DOM shared central services (Unique ID, Signing, Logging).]
www.bl.uk
35
DOM storage subsystem architecture - access
When a cluster is off-line then access/delivery is from a remote storage cluster
[Diagram: two storage clusters, each with a DOM Storage gateway, DOM Storage Service and DOM Physical Storage, plus the DOM shared central services (Unique ID, Signing, Logging).]
www.bl.uk
36
DOM storage subsystem architecture - ingest
Normal ingest is to the local storage cluster and then the remote cluster is synchronised
[Diagram: two storage clusters, each with a DOM Storage gateway, DOM Storage Service and DOM Physical Storage, plus the DOM shared central services (Unique ID, Signing, Logging). Arrows are labelled “Store” and “Synchronise remote store”.]
www.bl.uk
37
DOM storage subsystem architecture - ingest
When a cluster is off-line then ingest is managed by the remote storage cluster and the local cluster is synchronised later
[Diagram: two storage clusters, each with a DOM Storage gateway, DOM Storage Service and DOM Physical Storage, plus the DOM shared central services (Unique ID, Signing, Logging). Arrows are labelled “Store” and “Synchronise remote store later”.]
www.bl.uk
38
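
The access and ingest behaviour on the four preceding slides can be sketched as a small in-memory model. This is an assumption-laden illustration rather than the DOM design: the `Cluster` and `Gateway` classes, their method names and the `dom:` identifiers are hypothetical, and the shared services (unique ID, signing, logging) are left out.

```python
class Cluster:
    """One autonomous storage cluster (hypothetical in-memory stand-in)."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.objects = {}   # DOMID -> bytes
        self.pending = []   # (DOMID, bytes) still to be copied to the peer


class Gateway:
    """Routes access and ingest for one site, preferring its local cluster."""

    def __init__(self, local, remote):
        self.local = local
        self.remote = remote

    def _serving(self):
        # Use the local cluster when it is up, otherwise fall back to the peer.
        for cluster in (self.local, self.remote):
            if cluster.online:
                return cluster
        raise RuntimeError("no storage cluster available")

    def _peer_of(self, cluster):
        return self.remote if cluster is self.local else self.local

    def fetch(self, domid):
        return self._serving().objects[domid]

    def ingest(self, domid, data):
        target = self._serving()
        target.objects[domid] = data
        target.pending.append((domid, data))
        self.synchronise(target)
        return target.name

    def synchronise(self, source):
        """Copy queued objects to the peer; keep them queued if it is off-line."""
        peer = self._peer_of(source)
        if not peer.online:
            return
        while source.pending:
            domid, data = source.pending.pop(0)
            peer.objects[domid] = data


site_a, site_b = Cluster("A"), Cluster("B")
gateway = Gateway(local=site_a, remote=site_b)

gateway.ingest("dom:1", b"first")               # normal ingest: stored in A, copied to B
site_a.online = False
stored_in = gateway.ingest("dom:2", b"second")  # A is off-line: B handles the ingest
read_back = gateway.fetch("dom:1")              # access is served from B
site_a.online = True
gateway.synchronise(site_b)                     # A is synchronised later
```

Because both clusters accept reads and writes in normal operation, neither sits idle as a standby, which is the "100% of the kit" point made on the disaster tolerance slide.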
In conclusion
We plan for generations of physical storage
Migration from one generation to the next
Allow changes of supplier
Purchase incrementally in modest quantities
Move quickly when required
Be cost conscious
We provide assurance that an object is held and re-presented as when it was
ingested
We are designing a cost-effective large scale resilient solution
In summary: we take a long term view
www.bl.uk
39
Major Programmes/4
Web Archiving Programme
40
Structure of Programme
The Web Archiving Programme is a collaborative
initiative, implemented broadly across two
consortia
UK Web Archiving Consortium
Developing a selective approach to web archiving, procuring a common
web archiving infrastructure and software to begin archiving activities at
the earliest opportunity
International Internet Preservation Consortium
Developing advanced web archiving technologies for the long-term,
large-scale, continuous crawling requirements enabled through
legislation
www.bl.uk
41
UK Web Archiving Consortium
Developing a selective approach to web archiving
License for PANDAS about to be signed with NLA
Sub-licenses with consortium partners and contractor to follow
ITT concluded with Magus Research winning the contract.
Implement a common web archiving infrastructure (lots of Linux
machines + PANDAS)
Provide customisation/development of PANDAS
Provide help desk and support
www.bl.uk
42
International Internet Preservation
Consortium
Developing advanced web archiving technologies
Smart Crawler
Continuous adaptive crawler, adjusting crawl priority on the fly
Based on IA Heritrix
Working on requirements now
Expect to begin tender process in June
Content Management
Archival formats
Framework
Metrics and Test Bed
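
One way to picture a "continuous adaptive crawler, adjusting crawl priority on the fly" is a revisit queue whose intervals shrink for pages that change and grow for pages that do not. The sketch below is a generic illustration under that assumption. It does not use or describe the Heritrix API, and the class name, interval bounds and example URLs are hypothetical.

```python
import heapq


class AdaptiveFrontier:
    """Crawl queue whose revisit interval adapts to observed change."""

    def __init__(self, min_interval=1.0, max_interval=64.0):
        self.min_interval = min_interval
        self.max_interval = max_interval
        self.interval = {}  # url -> current revisit interval
        self.heap = []      # (next_due_time, url)

    def add(self, url, now=0.0, interval=8.0):
        self.interval[url] = interval
        heapq.heappush(self.heap, (now, url))

    def next_url(self):
        """Pop the URL that is due soonest."""
        return heapq.heappop(self.heap)

    def report(self, url, now, changed):
        """Reschedule: halve the interval if the page changed, else double it."""
        factor = 0.5 if changed else 2.0
        new = self.interval[url] * factor
        new = min(self.max_interval, max(self.min_interval, new))
        self.interval[url] = new
        heapq.heappush(self.heap, (now + new, url))


frontier = AdaptiveFrontier()
frontier.add("http://news.example/")
frontier.add("http://static.example/")

# Simulated first visit at time 0: the news page changed, the static page did not.
for _ in range(2):
    due, url = frontier.next_url()
    frontier.report(url, now=due, changed=url.startswith("http://news"))

order = [url for _, url in sorted(frontier.heap)]
```

After one round the frequently changing page is scheduled for an earlier revisit than the static one, so crawl effort drifts towards content that is actually changing.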
www.bl.uk
43
External Collaboration
44
Digital Library Collaborations/Partnerships
Current
UK Digital Preservation Coalition
Founder Member
TEL (The European Library Project)
Web Archiving UK
JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library Of Wales
International Internet Preservation Consortium
BNF, Library Of Congress, Internet Archive, National Archives & Library Of Canada, National Library Of
Australia, National Library Of Italy, National Libraries Of Nordic Countries
JISC Funded - Digital Curation Centre
Persistent Identifiers
DOI Foundation, European National Libraries (KB & DDB)
Resource Discovery
Union Catalogues (SUNCAT)
Digital Library Federation
www.bl.uk
45
Digital Library Collaborations/Partnerships
Potential
Secure Legal Deposit Network
6 Legal Deposit Libraries
Global Digital Format Registry
Potential Partners (National Archives, DLF)
Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
KB (Netherlands National Library & Other Partners – FP6 Bid)
Digital Rights Management
Potential Partners (Publishers, JISC)
Metadata
Publishers, Others ?
Authentication
JISC ?
Resource Discovery
Search Engine Vendors, Researchers
Others ???
www.bl.uk
46
Conclusions
• Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success
• Can You Work With Us?
www.bl.uk
47
Slide 18
British Computer Society
North London Branch
Major Programmes
Richard Boulderstone
July 27, 2004
1
Agenda
•
•
•
•
•
•
•
•
•
The British Library
Vision
Our Audiences/Customers
ILS
Digitisation
Digital Object Management
Web Archiving
Collaboration
Conclusions
Magna Carta
www.bl.uk
2
What Is The British Library ?
Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of
Science and Invention (1855), National Central Library (1916), and National
Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India
Office Library and Records in 1982, and British Institute of Recorded Sound in
1983
• Flagship building at St Pancras - largest public building project in Great
Britain in 20th century - opened in 1998
•
www.bl.uk
3
World-Class Research Library
Key Statistics 2002/3
•
•
•
•
•
•
•
•
•
•
150 million items
8.2 million items consulted or supplied
408,000 reading room visits
618,000 catalogue records created
554,000 items received on legal deposit
651 km shelf capacity 92% full add 12 km each year
18.5M Web Site Hits (www.bl.uk)
2,400 staff
£85.2 million Grant in Aid and £27.0 million trading income in 2001/2
Annual report - http://www.bl.uk/about/annual/latest.html
www.bl.uk
4
‘The World’s Knowledge’
Outcome Based Vision
… by aiding scientific advances
… by adding commercial value for
businesses
… by contributing to UK “knowledge
economy”
… through the pursuit of academic
excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural
wellbeing
… by giving information relevant to their
interests
… by helping to find the next medical
breakthrough
… by creating a link between the past,
present and future
To help people
advance
knowledge to
enrich lives
www.bl.uk
5
High
R+D
Industries
Prof.
Services
Creative
Industries Publishing
Industries
Resource Discovery
Bespoke Services
Research Services
Document Supply
Reprographics
Innovation Centre
School
Students
Libraries Teachers 11>18
On-site Visits
School Tours
Web Learning
SMEs
EDUCATION
Lifelong
Learner
Exhibitions
Events
Tours
Publishing
Visitors
(child + adult)
Lifelong
Learner
PUBLIC
BUSINESS
Lifelong
Learner
Postgraduate/
Undergraduate
RESEARCHER
Librarians
Scholars
Lifelong
Learner
Reading Rooms
Bespoke Services
Reprographics
Publishing
Document Supply
Searching Tools
LIBRARIES
Commercial
Researcher
Public
Libraries
Broadcasting
e.g. BBC
Publishing
e.g. OED
Document Supply
Resource Discovery
Training
Best Practice
H.E.
Libraries
Public
www.bl.uk
6
2
Major Programmes/1
Da Vinci Notebook
Integrated Library System (ILS)
Programme
7
ILS: Development
Data migration
Due to finish in a few days
16M+ BL records
10M+ records from other sources
Online ILS software
All online changes made (mainly interfaces) – final tests
Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
Most ones done for go live
Rest in priority order
www.bl.uk
8
ILS: Implementation
Training
Courses to end-users well underway
‘Practice’ system available
‘Search only’ training also underway
Testing
Functional testing (end to end) nearly complete
Performance poor – OPAC very slow
Automated stress testing (LoadRunner scripts)
eIS trying to find area of problem
Ex Libris experts flying over
Some security ‘hardening’ needed
www.bl.uk
9
ILS: Cutover from legacy systems
Now:
Temporary Aleph cataloguing
Phase 1 – internal processing
Staggered take-on of users to ease cutover problems
Merge ‘temporary’ records
7 June:
Phase 2 – reading rooms
Reading rooms closed for cutover 26-29 June
Mainly brand-new PCs etc rather than XP upgrade
30 June:
Phase 3 – remote users
Could be delayed major problems
30 July:
www.bl.uk
10
www.bl.uk
11
Future ILS development (ILS/2)
Current ILS development seen just as the start
Extra records
E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
E.g. Preservation records
Links to other new BL systems
E.g. Digital Object Management (images, web pages etc)
New releases of Ex Libris packages
www.bl.uk
12
Major Programmes/2
International Dunhuang Project
Digitisation Programme
13
Background
Digitisation Is The Process Of Converting Existing Physical Items Into Digital
Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character
Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image to Sub £1/Image.
Automatic Devices To Digitise Are Becoming Availalble But Are Expensive ~£150,000.
BL Has Had Fairly Ad Hoc Approach Driven By
External Funding Opportunities
Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project
Management
BL Has Created About 1.5M Digital Images So Far…
www.bl.uk
14
Digitisation Strategy
Digitisation Strategy Project Was Formerly Initiated On February 2, 2004
Key objectives for the project are to define:
Selection Criteria
Uniform Approach
Communications Plan
Sustainability
Intellectual Property Rights
External Relationship Management
Funding
Integration with DOMS
www.bl.uk
15
Project Status Information
Definitive Register of Projects
19 Complete
19 Current
20 Planning
JISC Sound (3,900 Hours)
JISC Newspapers (2M Pages of 750M Pages)
Chopin (Collaborative Project)
Early English Books Online
www.bl.uk
16
Major Programmes/3
Gutenberg Bible
Digital Object Management (DOM)
Programme
17
DOM Programme vision
Our mission is to enable the United Kingdom to preserve
and use its digital intellectual property forever
Our vision is create a management system for digital objects
that will
store and preserve any type of digital material in perpetuity
provide access to this material to users with appropriate permissions
ensure that the material is easy to find
ensure that users can view the material with contemporary applications
ensure that users can, where possible, experience material with the original look-and-feel
www.bl.uk
18
Introduction - history
Digital Library PFI
Mar 1997 – Dec 1998
Digital Library System
1999 – early 2002
Lessons
DOM Report
Nov 2002
The DOM Programme
Started September 2003
www.bl.uk
19
Drivers for the BL DOM Programme
Legal deposit legislation for non-print material was granted royal assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12T of material per year (with 50 year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….
We need a generic and cost-effective approach for the secure long term
storage of digital material that is produced by numerous initiatives
www.bl.uk
20
DOM – many topics to address
HIGH
ILS
WEB
ARCHIVING
DIGITISATION
PROGRAMME
ESTIMATED SIZE OF COMPONENT
LDEP
WORKFLOW
RESOURCE
DISCOVERY
LDLSE
VDEP
RIGHTS
MANAGEMENT
METADATA
DEFINITION
TECHNICAL
REQUIREMENTS
FILE
FORMAT
REGISTRY
PERSISTENT
IDENTIFIERS
FILE
CONVERSION
UTILITIES
AUTHENTICATION
PROTOTYPES
SDM
RADM
INTERFACES
STRATEGY
DEVELOPMENT
LOW
LOW
COMPONENT AMBIGUITY / COMPLEXITY
Started
Planned
Non-DOM projects
Planned co-operation
HIGH
LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications
www.bl.uk
21
Scope - life cycle of objects
Collection
Selection
Acquisition
Accession
Description
Preservation
Storage
Preservation
Access
Resource discovery
Delivery
Rendering
www.bl.uk
22
Scope – objects and processes
Preservation store
Preserves the bit stream in perpetuity
Access store
Access versions
Limited formats – in the flavour of the era
Metadata to support resource discovery
Descriptive, Administrative, Links with existing tools e.g. Integrated
Library System (ILS)
Workflow
Ingest, e.g. Legal Deposit processing
www.bl.uk
23
[Diagram: DOM in context, showing content sources, DOM shared services and access]
Access: Resource Discovery; Delivery; Digital Rights Management
DOM shared services: Storage; Signing; Authentication; Metadata; Persistent ID; Ingest
Content sources: Donations; Document Supply; Publishers; Archives; Grey Literature; Non-Serial Store; Archiving; Operational Stores; Legal Deposit (LDL Secure Environment, Legal Deposit Items, Legal Deposit Processing); Web Archiving; Digitisation (St Pancras Studios, Newspapers, NSA)
24
Timeline
[Timeline diagram, 2003 to 2005]
Definition (R0)
• Prototype will provide a basic preservation-quality digital object storage module
BC
• ET approve Business Case & Timeline
Operational Storage Sub System (R1)
• Consolidate R0 into operational system
• Provide preservation-quality digital store for materials received under Voluntary Deposit of Electronic Publications (VDEP)
• Integrate it with the existing VDEP front-end
1st Content Stream ingest (R2)
• Support ingest for a major content stream
• Integrate with core Library systems as required
LDEP - initial format (R3 & R4)
• Provide functionality for material covered by LDEP secondary legislation
Open DOM to new projects (R5+)
25
DOM: Project definition - 1
Functional architecture: “what”
• Prototyping – assessing market solutions
Logical architecture: “how – overall architecture”
• Prototyping – basic functioning architecture
Physical architecture: “how – storage & specifics”
• Prototyping – principal solutions and options
Example issues
• digital rights, file formats, etc
• allow changes to new suppliers, relationships to ILS, other projects etc
• how do we build it cost-effectively today, supplier selection criteria
• Business case
• Planning – incremental implementation phases
Cross team workshops – reviewing progress, debating detailed technical
issues, planning immediate priorities, risk management & way forward
26
DOM: Project definition - 2
Approach is to be incremental and not ‘Big Bang’
We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a
good solution
A principal goal is to define:
An overall long term “logical architecture”
Within which, there will be successive generations of physical architectures
We are building an understanding of the storage marketplace, and we will use that knowledge to manage
procurement
We are certain that we will need more than 500 TB of storage but we are uncertain when, so we need
flexible, scalable procurement
27
DOM architecture - overview
[Diagram: layered DOM architecture]
Clients: Resource Discovery (Non-cat, ILS based RD, Others); Rights Management; LDEP; Doc supply
Clients refer to an object by its DOMID
DOM Storage Service: compound objects/relations; integrity; authenticity; local resource locator; atomic objects; unique persistent identifier (DOMID)
DOMID is mapped to node/vol/LRL
DOM Physical Storage
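The mapping from a persistent DOMID to a node/vol/LRL location could be sketched as follows. This is a minimal illustration under stated assumptions, not the Library's implementation: the class names, the use of UUIDs as DOMIDs and the in-memory dictionary are all hypothetical.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    """Physical location of an object (hypothetical field names)."""
    node: str    # storage node holding the object
    volume: str  # volume on that node
    lrl: str     # local resource locator within the volume


class DomidResolver:
    """Maps a persistent DOMID to wherever the object currently lives."""

    def __init__(self):
        self._locations = {}

    def register(self, location):
        # Assumption: DOMIDs are opaque and carry no location information,
        # so an object can move without its identifier changing.
        domid = str(uuid.uuid4())
        self._locations[domid] = location
        return domid

    def relocate(self, domid, new_location):
        # Moving to a new generation of storage only updates the mapping.
        if domid not in self._locations:
            raise KeyError(domid)
        self._locations[domid] = new_location

    def resolve(self, domid):
        return self._locations[domid]


resolver = DomidResolver()
domid = resolver.register(Location("node-a", "vol-01", "obj/0001"))
resolver.relocate(domid, Location("node-b", "vol-07", "obj/0001"))
print(resolver.resolve(domid).node)
```

Keeping the identifier separate from the location is what would let physical storage be replaced while references held by resource discovery or rights management stay valid.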
28
DOM System (release 3)
[Diagram] Components shown: Aleph; Mailroom; Publishers; Access; Shared services; Administration; Storage subsystem (two instances), all within the DOM System
29
DOM logical architecture –
integrity and authenticity
Integrity:
System has capability to continuously monitor the object store to detect
object corruption
It would then initiate object recovery
Authenticity:
A process is defined to provide long-term assurance that an object that
is re-presented is as it was when it was ingested
Based on the use of cryptographic signing techniques
Each object is signed when it is ingested
The signature is verified when required
The signing mechanism is “tightly” controlled
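A minimal sketch of the sign-at-ingest, verify-on-demand and monitor-then-recover cycle described above. The slides do not name a signing scheme; this sketch substitutes an HMAC-SHA256 keyed digest from the Python standard library as a stand-in, and the key, class and method names are all hypothetical.

```python
import hashlib
import hmac

# Assumption: one secret key held by a tightly controlled signing service.
SIGNING_KEY = b"example-key-not-for-production"


def sign(content: bytes) -> str:
    """Return a keyed signature for the object's bit stream."""
    return hmac.new(SIGNING_KEY, content, hashlib.sha256).hexdigest()


def verify(content: bytes, signature: str) -> bool:
    """Check that content still matches the signature made at ingest."""
    return hmac.compare_digest(sign(content), signature)


class ObjectStore:
    def __init__(self):
        self._objects = {}     # object id -> bytes
        self._signatures = {}  # object id -> signature recorded at ingest

    def ingest(self, object_id, content):
        self._objects[object_id] = content
        self._signatures[object_id] = sign(content)

    def audit(self):
        """Return ids of objects whose stored bytes no longer verify."""
        return [oid for oid, data in self._objects.items()
                if not verify(data, self._signatures[oid])]

    def recover(self, object_id, good_copy):
        """Replace a corrupt object with a copy that verifies."""
        if not verify(good_copy, self._signatures[object_id]):
            raise ValueError("replacement copy does not verify")
        self._objects[object_id] = good_copy


store = ObjectStore()
store.ingest("obj-1", b"Magna Carta scan")
store._objects["obj-1"] = b"Magna Carta scam"  # simulate corruption in the store
corrupt = store.audit()
store.recover("obj-1", b"Magna Carta scan")
```

In a multi-cluster design the good copy would plausibly come from a peer cluster; here it is passed in directly to keep the example self-contained.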
30
Procuring physical storage in volume
A major cost is in physical storage
The market for storage systems is changing rapidly, and this implies that “lock-in” is
not sensible
We thus need flexibility to change supplier over time
Cost of storage is reducing by 30-40% per year
Hence procure on rolling basis just ahead of demand
Replace storage on a rolling basis on expiry of warranty
The rolling programmes imply the need to be able to support a heterogeneous
product solution
The design of the logical architecture thus supports storage sourced from multiple
storage vendors
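The case for buying just ahead of demand can be illustrated with a small calculation. The demand figures and the starting price below are invented for illustration; only the 30-40% annual price decline comes from the slide. The sketch ignores warranty-expiry replacement and any volume discounts.

```python
def rolling_cost(yearly_demand_tb, price_per_tb, annual_decline):
    """Cost when each year's capacity is bought at that year's price."""
    total = 0.0
    for year, demand in enumerate(yearly_demand_tb):
        total += demand * price_per_tb * (1 - annual_decline) ** year
    return total


def upfront_cost(yearly_demand_tb, price_per_tb):
    """Cost when all capacity is bought in year 0 at today's price."""
    return sum(yearly_demand_tb) * price_per_tb


demand = [100, 100, 100, 100, 100]  # hypothetical TB added per year
price = 1000.0                      # hypothetical price per TB in year 0

upfront = upfront_cost(demand, price)
rolling_30 = rolling_cost(demand, price, 0.30)
rolling_40 = rolling_cost(demand, price, 0.40)
print(f"up front: {upfront:,.0f}")
print(f"rolling at 30% decline: {rolling_30:,.0f}")
print(f"rolling at 40% decline: {rolling_40:,.0f}")
```

Under these assumptions the rolling purchase costs roughly half as much as buying five years of capacity on day one, which is the arithmetic behind procuring on a rolling basis.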
31
Disaster tolerance and the organisation
of storage clusters
Commercial disaster recovery (DR) solutions are available for common equipment
configurations
However, no such solutions are available for multi-100 TB systems
So we must build DR into the design of the system
A single-site solution, subject to a common-mode disaster, would suffer considerable loss
of availability after a disaster, and so is not acceptable
This implies that we need a multi-site solution
Conventionally these are based on a master-standby pair, where only 50% of the kit delivers
normal service
Our design is based on multiple autonomous, independent peer clusters that
cross-synchronise,
so 100% of the kit delivers normal service
32
DOM architecture in the context of the
storage solution market
The dominant segment of the market focuses on delivering performance within
a highly resilient single cluster
However:
Many of our objects will be rarely accessed
so we do not want to pay for “maximised” performance we do not need
We have resilience by using multiple clusters, hence we have a reduced
need for resilience within a cluster
so we do not want to pay for “maximised” resilience we do not need
We are using these drivers to design a cost-effective large scale resilient
solution
33
DOM storage subsystem architecture - overview
[Diagram: two DOM Storage Gateways, each in front of a DOM Storage Service and a storage cluster of DOM Physical Storage; DOM Shared Services (unique ID, signing, logging) serve both]
34
DOM storage subsystem architecture - access
Normal access/delivery is from the local storage cluster
[Diagram: two DOM Storage Gateways, each with a DOM Storage Service and a storage cluster of DOM Physical Storage, plus DOM shared central services (unique ID, signing, logging)]
35
DOM storage subsystem architecture - access
When a cluster is off-line then access/delivery is from a remote storage cluster
[Diagram: two DOM Storage Gateways, each with a DOM Storage Service and a storage cluster of DOM Physical Storage, plus DOM shared central services (unique ID, signing, logging)]
36
DOM storage subsystem architecture - ingest
Normal ingest is to the local storage cluster and then the remote cluster is synchronised
[Diagram: two DOM Storage Gateways, each with a DOM Storage Service and a storage cluster of DOM Physical Storage, plus DOM shared central services (unique ID, signing, logging); arrows labelled “Store” and “Synchronise remote store”]
37
DOM storage subsystem architecture - ingest
When a cluster is off-line then ingest is managed by the remote storage cluster and the local cluster is synchronised later
[Diagram: two DOM Storage Gateways, each with a DOM Storage Service and a storage cluster of DOM Physical Storage, plus DOM shared central services (unique ID, signing, logging); arrows labelled “Store” and “Synchronise remote store later”]
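The access and ingest behaviour on the last four slides can be sketched with two peer clusters and a gateway that prefers the local one. This is an illustrative model only: the class names, the pending queue and the immediate copy to the peer are assumptions, not details taken from the DOM design.

```python
class Cluster:
    def __init__(self, name):
        self.name = name
        self.online = True
        self.objects = {}   # DOMID -> content
        self.pending = []   # DOMIDs still to be copied to an off-line peer


class Gateway:
    """Routes requests to the local cluster, falling back to the peer."""

    def __init__(self, local, remote):
        self.local, self.remote = local, remote

    def _pick(self):
        if self.local.online:
            return self.local, self.remote
        if self.remote.online:
            return self.remote, self.local
        raise RuntimeError("no storage cluster available")

    def ingest(self, domid, content):
        primary, peer = self._pick()
        primary.objects[domid] = content
        if peer.online:
            peer.objects[domid] = content   # synchronise the peer now
        else:
            primary.pending.append(domid)   # synchronise the peer later

    def fetch(self, domid):
        primary, _ = self._pick()
        return primary.objects[domid]


def resynchronise(source, target):
    """Copy objects that were queued while target was off-line."""
    for domid in source.pending:
        target.objects[domid] = source.objects[domid]
    source.pending.clear()


a, b = Cluster("site-a"), Cluster("site-b")
gateway = Gateway(local=a, remote=b)
gateway.ingest("d1", b"first")   # normal: stored at a, then copied to b
a.online = False
gateway.ingest("d2", b"second")  # a off-line: b manages the ingest
fetched = gateway.fetch("d1")    # served from b while a is off-line
a.online = True
resynchronise(b, a)              # a catches up afterwards
```

Because either cluster can take reads and writes, neither is an idle standby, which is the point made earlier about 100% of the kit delivering normal service.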
38
In conclusion
We plan for generations of physical storage
Migration from one generation to the next
Allow changes of supplier
Purchase incrementally in modest quantities
Move quickly when required
Be cost conscious
We provide assurance that an object is held and re-presented as when it was
ingested
We are designing a cost-effective large scale resilient solution
In summary: we take a long term view
39
Major Programmes/4
Web Archiving Programme
40
Structure of Programme
Web Archiving Programme is a collaborative
initiative, implemented broadly across two
consortia
UK Web Archiving Consortium
Developing a selective approach to web archiving, procuring a common
web archiving infrastructure and software to begin archiving activities as
early as possible
International Internet Preservation Consortium
Developing advanced web archiving technologies for the long-term,
large-scale, continuous crawling requirements enabled through
legislation
41
UK Web Archiving Consortium
Developing a selective approach to web archiving
Licence for PANDAS about to be signed with NLA
Sub-licences with consortium partners and contractor to follow
ITT concluded with Magus Research winning the contract.
Implement a common web archiving infrastructure (lots of Linux
machines + PANDAS)
Provide customisation/development of PANDAS
Provide help desk and support
42
International Internet Preservation
Consortium
Developing advanced web archiving technologies
Smart Crawler
Continuous adaptive crawler, adjusting crawl priority on the fly
Based on IA Heritrix
Working on requirements now
Expect to begin tender process in June
Content Management
Archival formats
Framework
Metrics and Test Bed
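One way to read “adjusting crawl priority on the fly” is a revisit queue whose intervals shrink for pages that change and grow for pages that do not. The sketch below is a hypothetical policy for illustration only; it is not the Smart Crawler design or Heritrix's behaviour, and the URLs, intervals and bounds are invented.

```python
import heapq


class CrawlFrontier:
    """Priority queue of URLs ordered by the time each is next due."""

    def __init__(self):
        self._heap = []
        self._interval = {}  # url -> current revisit interval in hours

    def add(self, url, interval_hours, now=0.0):
        self._interval[url] = interval_hours
        heapq.heappush(self._heap, (now + interval_hours, url))

    def next_url(self):
        """Pop the URL that is due soonest; returns (due_time, url)."""
        return heapq.heappop(self._heap)

    def report(self, url, changed, now):
        # Hypothetical policy: halve the interval for pages that changed,
        # double it for pages that did not, within fixed bounds.
        interval = self._interval[url]
        interval = interval / 2 if changed else interval * 2
        interval = min(max(interval, 1.0), 24.0 * 30)
        self._interval[url] = interval
        heapq.heappush(self._heap, (now + interval, url))


frontier = CrawlFrontier()
frontier.add("http://example.org/news", interval_hours=6.0)
frontier.add("http://example.org/static", interval_hours=8.0)

due, first = frontier.next_url()
frontier.report(first, changed=True, now=due)    # news: revisit in 3 hours
due, second = frontier.next_url()
frontier.report(second, changed=False, now=due)  # static: revisit in 16 hours
_, third = frontier.next_url()
```

With this policy the frequently changing page comes round again before the static page, without any crawl schedule being fixed in advance.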
43
External Collaboration
44
Digital Library Collaborations/Partnerships
Current
UK Digital Preservation Coalition
Founder Member
TEL (The European Library Project)
Web Archiving UK
JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library Of Wales
International Internet Preservation Consortium
BNF, Library Of Congress, Internet Archive, National Archives & Library Of Canada, National Library Of
Australia, National Library Of Italy, National Libraries Of Nordic Countries
JISC Funded - Digital Curation Centre
Persistent Identifiers
DOI Foundation, European National Libraries (KB & DDB)
Resource Discovery
Union Catalogues (SUNCAT)
Digital Library Federation
45
Digital Library Collaborations/Partnerships
Potential
Secure Legal Deposit Network
6 Legal Deposit Libraries
Global Digital Format Registry
Potential Partners (National Archives, DLF)
Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
KB (Netherlands National Library & Other Partners – FP6 Bid)
Digital Rights Management
Potential Partners (Publishers, JISC)
Metadata
Publishers, Others ?
Authentication
JISC ?
Resource Discovery
Search Engine Vendors, Researchers
Others ???
46
Conclusions
• Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success
• Can You Work With Us?
47
Slide 20
British Computer Society
North London Branch
Major Programmes
Richard Boulderstone
July 27, 2004
1
Agenda
•
•
•
•
•
•
•
•
•
The British Library
Vision
Our Audiences/Customers
ILS
Digitisation
Digital Object Management
Web Archiving
Collaboration
Conclusions
Magna Carta
www.bl.uk
2
What Is The British Library ?
Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of
Science and Invention (1855), National Central Library (1916), and National
Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India
Office Library and Records in 1982, and British Institute of Recorded Sound in
1983
• Flagship building at St Pancras - largest public building project in Great
Britain in 20th century - opened in 1998
•
www.bl.uk
3
World-Class Research Library
Key Statistics 2002/3
•
•
•
•
•
•
•
•
•
•
150 million items
8.2 million items consulted or supplied
408,000 reading room visits
618,000 catalogue records created
554,000 items received on legal deposit
651 km shelf capacity 92% full add 12 km each year
18.5M Web Site Hits (www.bl.uk)
2,400 staff
£85.2 million Grant in Aid and £27.0 million trading income in 2001/2
Annual report - http://www.bl.uk/about/annual/latest.html
www.bl.uk
4
‘The World’s Knowledge’
Outcome Based Vision
… by aiding scientific advances
… by adding commercial value for
businesses
… by contributing to UK “knowledge
economy”
… through the pursuit of academic
excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural
wellbeing
… by giving information relevant to their
interests
… by helping to find the next medical
breakthrough
… by creating a link between the past,
present and future
To help people
advance
knowledge to
enrich lives
www.bl.uk
5
High
R+D
Industries
Prof.
Services
Creative
Industries Publishing
Industries
Resource Discovery
Bespoke Services
Research Services
Document Supply
Reprographics
Innovation Centre
School
Students
Libraries Teachers 11>18
On-site Visits
School Tours
Web Learning
SMEs
EDUCATION
Lifelong
Learner
Exhibitions
Events
Tours
Publishing
Visitors
(child + adult)
Lifelong
Learner
PUBLIC
BUSINESS
Lifelong
Learner
Postgraduate/
Undergraduate
RESEARCHER
Librarians
Scholars
Lifelong
Learner
Reading Rooms
Bespoke Services
Reprographics
Publishing
Document Supply
Searching Tools
LIBRARIES
Commercial
Researcher
Public
Libraries
Broadcasting
e.g. BBC
Publishing
e.g. OED
Document Supply
Resource Discovery
Training
Best Practice
H.E.
Libraries
Public
www.bl.uk
6
2
Major Programmes/1
Da Vinci Notebook
Integrated Library System (ILS)
Programme
7
ILS: Development
Data migration
Due to finish in a few days
16M+ BL records
10M+ records from other sources
Online ILS software
All online changes made (mainly interfaces) – final tests
Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
Most ones done for go live
Rest in priority order
www.bl.uk
8
ILS: Implementation
Training
Courses to end-users well underway
‘Practice’ system available
‘Search only’ training also underway
Testing
Functional testing (end to end) nearly complete
Performance poor – OPAC very slow
Automated stress testing (LoadRunner scripts)
eIS trying to find area of problem
Ex Libris experts flying over
Some security ‘hardening’ needed
www.bl.uk
9
ILS: Cutover from legacy systems
Now:
Temporary Aleph cataloguing
7 June:
Phase 1 – internal processing
Staggered take-on of users to ease cutover problems
Merge ‘temporary’ records
30 June:
Phase 2 – reading rooms
Reading rooms closed for cutover 26-29 June
Mainly brand-new PCs etc rather than XP upgrade
30 July:
Phase 3 – remote users
Could be delayed if there are major problems
10
11
Future ILS development (ILS/2)
Current ILS development seen just as the start
Extra records
E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
E.g. Preservation records
Links to other new BL systems
E.g. Digital Object Management (images, web pages etc)
New releases of Ex Libris packages
12
Major Programmes/2
International Dunhuang Project
Digitisation Programme
13
Background
Digitisation Is The Process Of Converting Existing Physical Items Into Digital
Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character
Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image to Sub £1/Image.
Automatic Devices To Digitise Are Becoming Available But Are Expensive (~£150,000).
BL Has Had Fairly Ad Hoc Approach Driven By
External Funding Opportunities
Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project
Management
BL Has Created About 1.5M Digital Images So Far…
14
Digitisation Strategy
Digitisation Strategy Project Was Formally Initiated On February 2, 2004
Key objectives for the project are to define:
Selection Criteria
Uniform Approach
Communications Plan
Sustainability
Intellectual Property Rights
External Relationship Management
Funding
Integration with DOMS
15
Project Status Information
Definitive Register of Projects
19 Complete
19 Current
20 Planning
JISC Sound (3,900 Hours)
JISC Newspapers (2M Pages of 750M Pages)
Chopin (Collaborative Project)
Early English Books Online
16
Major Programmes/3
Gutenberg Bible
Digital Object Management (DOM)
Programme
17
DOM Programme vision
Our mission is to enable the United Kingdom to preserve
and use its digital intellectual property forever
Our vision is to create a management system for digital objects
that will
store and preserve any type of digital material in perpetuity
provide access to this material to users with appropriate permissions
ensure that the material is easy to find
ensure that users can view the material with contemporary applications
ensure that users can, where possible, experience material with the original look-and-feel
18
Introduction - history
Digital Library PFI
Mar 1997 – Dec 1998
Digital Library System
1999 – early 2002
Lessons
DOM Report
Nov 2002
The DOM Programme
Started September 2003
19
Drivers for the BL DOM Programme
Legal deposit legislation for non-print material was granted royal assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12T of material per year (with 50 year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….
We need a generic and cost-effective approach for the secure long term
storage of digital material that is produced by numerous initiatives
20
DOM – many topics to address
[Chart: topics plotted by ESTIMATED SIZE OF COMPONENT (low to high) against COMPONENT AMBIGUITY / COMPLEXITY (low to high)]
Topics plotted: ILS, WEB ARCHIVING, DIGITISATION PROGRAMME, LDEP, WORKFLOW, RESOURCE DISCOVERY, LDLSE, VDEP, RIGHTS MANAGEMENT, METADATA DEFINITION, TECHNICAL REQUIREMENTS, FILE FORMAT REGISTRY, PERSISTENT IDENTIFIERS, FILE CONVERSION UTILITIES, AUTHENTICATION, PROTOTYPES, SDM, RADM, INTERFACES, STRATEGY DEVELOPMENT
Legend: Started, Planned, Non-DOM projects, Planned co-operation
LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications
21
Scope - life cycle of objects
Collection
Selection
Acquisition
Accession
Description
Preservation
Storage
Preservation
Access
Resource discovery
Delivery
Rendering
22
Scope – objects and processes
Preservation store
Preserves the bit stream in perpetuity
Access store
Access versions
Limited formats – in the flavour of the era
Metadata to support resource discovery
Descriptive, Administrative, Links with existing tools e.g. Integrated
Library System (ILS)
Workflow
Ingest, e.g. Legal Deposit processing
23
ACCESS
DOM
Resource Discovery
Delivery
Digital Rights Management
Shared services
DOM
Storage
Signing
Authentication
Metadata
Persistent ID
Ingest
DONATIONS
DOCUMENT SUPPLY
Publishers
Archives
Grey Literature
Non-Serial
Store
Archiving
Operational
Stores
LEGAL DEPOSIT
LDL Secure
Environment
Legal Deposit
Items
Legal Deposit
Processing
WEB
ARCHIVING
DIGITISATION
St Pancras
Studios
Newspapers
NSA
24
Timeline
Time axis: 2003, 2004, 2005
BC: ET approve Business Case & Timeline
R0: Definition
Prototype will provide a basic preservation-quality digital object storage module
R1: Operational Storage Sub System
• Consolidate R0 into operational system
• Provide preservation-quality digital store for materials received under Voluntary Deposit of Electronic Publications (VDEP)
• Integrate it with the existing VDEP front-end
R2: 1st Content Stream ingest
• Support ingest for a major content stream
• Integrate with core Library systems as required
R3 & R4: LDEP - initial format
Provide functionality for material covered by LDEP secondary legislation
R5+: Open DOM to new projects
25
DOM: Project definition - 1
Functional architecture “what”
Prototyping – assessing market solutions
Example issues: digital rights, file formats, etc
Logical architecture “how – overall architecture”
Prototyping – basic functioning architecture
Example issues: allow changes to new suppliers, relationships to ILS, other projects etc
Physical architecture “how – storage & specifics”
Prototyping – principal solutions and options
Example issues: how do we build it cost-effectively today, supplier selection criteria
• Business case
• Planning – incremental implementation phases
Cross team workshops – reviewing progress, debating detailed technical
issues, planning immediate priorities, risk management & way forward
26
DOM: Project definition - 2
Approach is to be incremental and not ‘Big Bang’
We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a
good solution
A principal goal is to define:
An overall long term “logical architecture”
Within which, there will be successive generations of physical architectures
We are building an understanding of the storage marketplace, and we will use that knowledge to manage
procurement
We are certain that we will need >500T of storage but we are uncertain when – we thus need
flexible scalable procurement
27
DOM architecture - overview
[Diagram: layered DOM architecture]
Users of the storage service: Resource Discovery (ILS based RD, Non-cat, Others), Rights Management, LDEP, Doc supply
Each OBJECT is addressed by its DOMID
DOM Storage Service: Compound objects/relations, Atomic Objects, Integrity, Authenticity, Unique persistent identifier (DOMID), Local resource locator
DOMID is mapped to node/vol/LRL
DOM Physical Storage
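
The mapping from DOMID to node/vol/LRL can be pictured as a small lookup table. The sketch below is illustrative only: the class names, the use of a random UUID as the identifier and the example node and volume names are all assumptions, not details taken from the DOM design. It shows the one property the slide implies, namely that the identifier stays fixed while the physical location behind it can change.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    """Physical placement of an object (field names follow the slide's node/vol/LRL)."""
    node: str
    volume: str
    lrl: str  # local resource locator


class LocationRegistry:
    """Hypothetical registry mapping a persistent identifier to a location."""

    def __init__(self):
        self._locations = {}

    def register(self, location: Location) -> str:
        # Assumption: a random UUID stands in for the real DOMID format.
        domid = uuid.uuid4().hex
        self._locations[domid] = location
        return domid

    def relocate(self, domid: str, new_location: Location) -> None:
        # The identifier is unchanged; only the mapping is updated.
        if domid not in self._locations:
            raise KeyError(domid)
        self._locations[domid] = new_location

    def resolve(self, domid: str) -> Location:
        return self._locations[domid]


registry = LocationRegistry()
domid = registry.register(Location("node-1", "vol-07", "objects/0001"))
# Later the object moves to a newer generation of storage.
registry.relocate(domid, Location("node-9", "vol-02", "objects/0001"))
current = registry.resolve(domid)
```

Anything that stored the identifier before the move still resolves it afterwards, which is what lets physical storage be replaced without touching catalogue records.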
28
DOM System (release 3)
Aleph
Storage subsystem
Storage
subsystem
Access
Mailroom
Publishers
Shared services
Administration
DOM System
29
DOM logical architecture –
integrity and authenticity
Integrity:
System has capability to continuously monitor the object store to detect
object corruption
It would then initiate object recovery
Authenticity:
A process is defined to provide long-term assurance that an object that
is re-presented is as it was when it was ingested
Based on the use of cryptographic signing techniques
Each object is signed when it is ingested
The signature is verified when required
The signing mechanism is “tightly” controlled
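
As a rough illustration of the sign-at-ingest, verify-when-required cycle, the sketch below uses an HMAC over the object's bytes. This is an assumption made for brevity: the slides say only that cryptographic signing techniques are used, and a production system might well use public-key signatures instead. The key, the function names and the in-memory store are all hypothetical.

```python
import hashlib
import hmac

# Assumption: a fixed demo key; real key management would be tightly controlled.
SIGNING_KEY = b"hypothetical-demo-key"


def sign_object(data: bytes, key: bytes = SIGNING_KEY) -> str:
    """Return a hex signature computed over the object's bit stream at ingest."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify_object(data: bytes, signature: str, key: bytes = SIGNING_KEY) -> bool:
    """Check that the bit stream still matches the signature recorded at ingest."""
    return hmac.compare_digest(sign_object(data, key), signature)


def find_corrupt(store: dict, signatures: dict) -> list:
    """Integrity sweep: list identifiers whose stored bytes no longer verify."""
    return sorted(oid for oid, data in store.items()
                  if not verify_object(data, signatures[oid]))


store = {"obj-1": b"first object", "obj-2": b"second object"}
signatures = {oid: sign_object(data) for oid, data in store.items()}

store["obj-2"] = b"second 0bject"  # simulate corruption of one stored object
corrupt = find_corrupt(store, signatures)
```

A sweep like `find_corrupt` corresponds to the continuous monitoring described under Integrity; any identifier it returns would be a candidate for object recovery.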
30
Procuring physical storage in volume
A major cost is in physical storage
The market for storage systems is changing rapidly, and this implies that “lock-in” is
not sensible
We thus need flexibility to change supplier over time
Cost of storage is reducing by 30-40% per year
Hence procure on rolling basis just ahead of demand
Replace storage on a rolling basis on expiry of warranty
The rolling programmes imply the need to be able to support a heterogeneous
product solution
The design of the logical architecture thus supports storage sourced from multiple
storage vendors
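
The case for buying just ahead of demand can be checked with simple arithmetic. The figures below are assumptions chosen for illustration (100 T needed per year for five years, price normalised to 1.0 per T today); only the 30-40% annual price decline comes from the slide.

```python
def upfront_cost(tb_per_year: float, years: int, price_per_tb: float) -> float:
    """Cost of buying the whole requirement today at today's price."""
    return tb_per_year * years * price_per_tb


def rolling_cost(tb_per_year: float, years: int, price_per_tb: float,
                 annual_decline: float) -> float:
    """Cost of buying each year's requirement at that year's (lower) price."""
    return sum(tb_per_year * price_per_tb * (1 - annual_decline) ** year
               for year in range(years))


upfront = upfront_cost(100, 5, 1.0)
rolling_30 = rolling_cost(100, 5, 1.0, 0.30)  # slower end of the quoted decline
rolling_40 = rolling_cost(100, 5, 1.0, 0.40)  # faster end of the quoted decline
```

Under these assumptions the rolling purchases come to roughly half of the upfront figure at either end of the range, before counting the warranty and replacement effects mentioned above.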
31
Disaster tolerance and the organisation
of storage clusters
One can obtain commercial disaster recovery (DR) solutions for common equipment
configurations
However, one cannot obtain such solutions for multi-100 TB systems
So we must build DR into the design of the system
A single site solution, subject to a common-mode disaster, would suffer considerable loss
of availability after a disaster, and so is not acceptable
This implies that we need a multi-site solution
Conventionally these are based on a master-standby arrangement where only 50% of kit is delivering
normal service
Our design is based on the use of multiple autonomous independent peer clusters that
cross-synchronise
so 100% of the kit delivers normal service
32
DOM architecture in the context of the
storage solution market
The dominant segment of the market focuses on delivering performance within
a highly resilient single cluster
However:
Many of our objects will be rarely accessed
so we do not want to pay for “maximised” performance we do not need
We have resilience by using multiple clusters, hence we have a reduced
need for resilience within a cluster
so we do not want to pay for “maximised” resilience we do not need
We are using these drivers to design a cost-effective large scale resilient
solution
33
DOM storage subsystem architecture - overview
[Diagram: two storage clusters, each comprising a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage]
DOM Shared Services:
• Unique ID
• Signing
• Logging
34
DOM storage subsystem architecture - access
Normal access/delivery is from local storage cluster
[Diagram: two storage clusters, each with a DOM Storage gateway, DOM Storage Service and DOM Physical Storage, linked to DOM shared central services (Unique ID, Signing, Logging)]
35
DOM storage subsystem architecture - access
When a cluster is off-line then access/delivery is from a remote storage cluster
[Diagram: two storage clusters, each with a DOM Storage gateway, DOM Storage Service and DOM Physical Storage, linked to DOM shared central services (Unique ID, Signing, Logging)]
36
DOM storage subsystem architecture - ingest
Normal ingest is to the local storage cluster and then the remote cluster is synchronised
[Diagram: two storage clusters, each with a DOM Storage gateway, DOM Storage Service and DOM Physical Storage, linked to DOM shared central services (Unique ID, Signing, Logging); arrows labelled "Store" and "Synchronise remote store"]
37
DOM storage subsystem architecture - ingest
When a cluster is off-line then ingest is managed by the remote storage cluster and the local cluster is synchronised later
[Diagram: two storage clusters, each with a DOM Storage gateway, DOM Storage Service and DOM Physical Storage, linked to DOM shared central services (Unique ID, Signing, Logging); arrows labelled "Store" and "Synchronise remote store later"]
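
The access and ingest behaviour shown on the last four slides can be summarised in a toy model. This is a sketch under stated assumptions, not the DOM implementation: the class names, the in-memory dictionaries and the backlog list are all hypothetical, and the shared services (unique ID, signing, logging) are left out.

```python
class Cluster:
    """One autonomous storage cluster holding objects keyed by identifier."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.objects = {}


class StorageGateway:
    """Routes requests to the local cluster, falling back to the remote peer."""

    def __init__(self, local, remote):
        self.local = local
        self.remote = remote
        self.backlog = []  # (target cluster, object id, data) awaiting synchronisation

    def ingest(self, object_id, data):
        """Store on the first available cluster and queue a copy for its peer."""
        if self.local.online:
            primary, peer = self.local, self.remote
        else:
            primary, peer = self.remote, self.local
        if not primary.online:
            raise RuntimeError("no storage cluster available")
        primary.objects[object_id] = data
        self.backlog.append((peer, object_id, data))
        self.synchronise()
        return primary.name

    def synchronise(self):
        """Copy queued objects to any peer that is on-line; keep the rest queued."""
        remaining = []
        for cluster, object_id, data in self.backlog:
            if cluster.online:
                cluster.objects[object_id] = data
            else:
                remaining.append((cluster, object_id, data))
        self.backlog = remaining

    def access(self, object_id):
        """Deliver from the local cluster when possible, otherwise from the remote one."""
        for cluster in (self.local, self.remote):
            if cluster.online and object_id in cluster.objects:
                return cluster.name, cluster.objects[object_id]
        raise KeyError(object_id)


site_a, site_b = Cluster("site-a"), Cluster("site-b")
gateway = StorageGateway(local=site_a, remote=site_b)

first = gateway.ingest("obj-1", b"stored while both clusters are up")
site_a.online = False                     # local cluster goes off-line
second = gateway.ingest("obj-2", b"stored during the outage")
served_by, _ = gateway.access("obj-1")    # delivered from the remote cluster
site_a.online = True                      # local cluster returns
gateway.synchronise()                     # it catches up on what it missed
```

Because both clusters accept ingest and serve access, neither sits idle as a standby, which is the point made earlier about 100% of the kit delivering normal service.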
38
In conclusion
We plan for generations of physical storage
Migration from one generation to the next
Allow changes of supplier
Purchase incrementally in modest quantities
Move quickly when required
Be cost conscious
We provide assurance that an object is held and re-presented as when it was
ingested
We are designing a cost-effective large scale resilient solution
In summary: we take a long term view
39
Major Programmes/4
Web Archiving Programme
40
Structure of Programme
Web Archiving Programme is a collaborative
initiative, broadly implemented across two
consortia
UK Web Archiving Consortium
Developing a selective approach to web archiving, procuring a common
web archiving infrastructure and software to begin archiving activities at
the earliest opportunity
International Internet Preservation Consortium
Developing advanced web archiving technologies for the long-term,
large-scale, continuous crawling requirements enabled through
legislation
41
UK Web Archiving Consortium
Developing a selective approach to web archiving
Licence for PANDAS about to be signed with NLA
Sub-licences with consortium partners and contractor to follow
ITT concluded with Magus Research winning the contract.
Implement a common web archiving infrastructure (lots of Linux
machines + PANDAS)
Provide customisation/development of PANDAS
Provide help desk and support
42
International Internet Preservation
Consortium
Developing advanced web archiving technologies
Smart Crawler
Continuous adaptive crawler, adjusting crawl priority on the fly
Based on IA Heritrix
Working on requirements now
Expect to begin tender process in June
Content Management
Archival formats
Framework
Metrics and Test Bed
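
One way to picture a crawler that adjusts crawl priority on the fly is a priority queue whose entries can be re-ranked while the crawl is running. The sketch below is a generic illustration with hypothetical names; it is not Heritrix's API and says nothing about how the Smart Crawler was actually specified.

```python
import heapq
import itertools


class CrawlFrontier:
    """Queue of URLs to visit; a lower priority number means crawl sooner."""

    def __init__(self):
        self._heap = []
        self._priority = {}           # url -> its current priority
        self._counter = itertools.count()  # tie-breaker for equal priorities

    def schedule(self, url, priority):
        """Add a URL, or change the priority of one already queued."""
        self._priority[url] = priority
        heapq.heappush(self._heap, (priority, next(self._counter), url))

    def next_url(self):
        """Return the most urgent URL, skipping entries made stale by re-ranking."""
        while self._heap:
            priority, _, url = heapq.heappop(self._heap)
            if self._priority.get(url) == priority:
                del self._priority[url]
                return url
        return None


frontier = CrawlFrontier()
frontier.schedule("http://example.org/news", 5)
frontier.schedule("http://example.org/about", 2)
# The news page turns out to change often, so it is promoted mid-crawl.
frontier.schedule("http://example.org/news", 1)

order = [frontier.next_url(), frontier.next_url(), frontier.next_url()]
```

Old heap entries are left in place and discarded when popped, which keeps re-prioritisation cheap at the cost of some extra memory.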
43
External Collaboration
44
Digital Library Collaborations/Partnerships
Current
UK Digital Preservation Coalition
Founder Member
TEL (The European Library Project)
Web Archiving UK
JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library Of Wales
International Internet Preservation Consortium
BNF, Library Of Congress, Internet Archive, National Archives & Library Of Canada, National Library Of
Australia, National Library Of Italy, National Libraries Of Nordic Countries
JISC Funded - Digital Curation Centre
Persistent Identifiers
DOI Foundation, European National Libraries (KB & DDB)
Resource Discovery
Union Catalogues (SUNCAT)
Digital Library Federation
45
Digital Library Collaborations/Partnerships
Potential
Secure Legal Deposit Network
6 Legal Deposit Libraries
Global Digital Format Registry
Potential Partners (National Archives, DLF)
Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
KB (Netherlands National Library & Other Partners – FP6 Bid)
Digital Rights Management
Potential Partners (Publishers, JISC)
Metadata
Publishers, Others ?
Authentication
JISC ?
Resource Discovery
Search Engine Vendors, Researchers
Others ???
46
Conclusions
• Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success
• Can You Work With Us?
47
Slide 22
British Computer Society
North London Branch
Major Programmes
Richard Boulderstone
July 27, 2004
1
Agenda
• The British Library
• Vision
• Our Audiences/Customers
• ILS
• Digitisation
• Digital Object Management
• Web Archiving
• Collaboration
• Conclusions
Magna Carta
www.bl.uk
2
What Is The British Library?
• Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of Science and Invention (1855), National Central Library (1916), and National Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India Office Library and Records in 1982, and British Institute of Recorded Sound in 1983
• Flagship building at St Pancras - largest public building project in Great Britain in 20th century - opened in 1998
www.bl.uk
3
World-Class Research Library
Key Statistics 2002/3
• 150 million items
• 8.2 million items consulted or supplied
• 408,000 reading room visits
• 618,000 catalogue records created
• 554,000 items received on legal deposit
• 651 km shelf capacity, 92% full, adding 12 km each year
• 18.5M Web Site Hits (www.bl.uk)
• 2,400 staff
• £85.2 million Grant in Aid and £27.0 million trading income in 2001/2
• Annual report - http://www.bl.uk/about/annual/latest.html
www.bl.uk
4
‘The World’s Knowledge’
Outcome Based Vision
To help people advance knowledge to enrich lives
… by aiding scientific advances
… by adding commercial value for businesses
… by contributing to UK “knowledge economy”
… through the pursuit of academic excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural wellbeing
… by giving information relevant to their interests
… by helping to find the next medical breakthrough
… by creating a link between the past, present and future
www.bl.uk
5
[Diagram: Our Audiences/Customers, in five segments: BUSINESS, EDUCATION, PUBLIC, RESEARCHER, LIBRARIES]
Audiences shown: High R+D Industries, Prof. Services, Creative Industries, Publishing Industries, SMEs, School Students, Libraries, Teachers 11>18, Visitors (child + adult), Lifelong Learner, Postgraduate/Undergraduate, Librarians, Scholars, Commercial Researcher, Public Libraries, H.E. Libraries, Broadcasting (e.g. BBC), Publishing (e.g. OED)
Services shown: Resource Discovery, Bespoke Services, Research Services, Document Supply, Reprographics, Innovation Centre, On-site Visits, School Tours, Web Learning, Exhibitions, Events, Tours, Publishing, Reading Rooms, Searching Tools, Training, Best Practice
www.bl.uk
6
Major Programmes/1
Da Vinci Notebook
Integrated Library System (ILS) Programme
7
ILS: Development
Data migration
Due to finish in a few days
16M+ BL records
10M+ records from other sources
Online ILS software
All online changes made (mainly interfaces) – final tests
Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
Most done for go-live
Remainder to follow in priority order
www.bl.uk
8
ILS: Implementation
Training
Courses to end-users well underway
‘Practice’ system available
‘Search only’ training also underway
Testing
Functional testing (end to end) nearly complete
Performance poor – OPAC very slow
Automated stress testing (LoadRunner scripts)
eIS trying to locate the problem area
Ex Libris experts flying over
Some security ‘hardening’ needed
www.bl.uk
9
ILS: Cutover from legacy systems
Now:
Temporary Aleph cataloguing
7 June:
Phase 1 – internal processing
Staggered take-on of users to ease cutover problems
Merge ‘temporary’ records
30 June:
Phase 2 – reading rooms
Reading rooms closed for cutover 26-29 June
Mainly brand-new PCs etc rather than XP upgrade
30 July:
Phase 3 – remote users
Could be delayed if there are major problems
www.bl.uk
10
www.bl.uk
11
Future ILS development (ILS/2)
Current ILS development seen just as the start
Extra records
E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
E.g. Preservation records
Links to other new BL systems
E.g. Digital Object Management (images, web pages etc)
New releases of Ex Libris packages
www.bl.uk
12
Major Programmes/2
International Dunhuang Project
Digitisation Programme
13
Background
Digitisation Is The Process Of Converting Existing Physical Items Into Digital
Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character
Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image to Sub £1/Image.
Automatic Devices To Digitise Are Becoming Available But Are Expensive ~£150,000.
BL Has Had Fairly Ad Hoc Approach Driven By
External Funding Opportunities
Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project
Management
BL Has Created About 1.5M Digital Images So Far…
www.bl.uk
14
Digitisation Strategy
Digitisation Strategy Project Was Formally Initiated On February 2, 2004
Key objectives for the project are to define:
Selection Criteria
Uniform Approach
Communications Plan
Sustainability
Intellectual Property Rights
External Relationship Management
Funding
Integration with DOMS
www.bl.uk
15
Project Status Information
Definitive Register of Projects
19 Complete
19 Current
20 Planning
JISC Sound (3,900 Hours)
JISC Newspapers (2M Pages of 750M Pages)
Chopin (Collaborative Project)
Early English Books Online
www.bl.uk
16
Major Programmes/3
Gutenberg Bible
Digital Object Management (DOM) Programme
17
DOM Programme vision
Our mission is to enable the United Kingdom to preserve and use its digital intellectual property forever
Our vision is to create a management system for digital objects that will
store and preserve any type of digital material in perpetuity
provide access to this material to users with appropriate permissions
ensure that the material is easy to find
ensure that users can view the material with contemporary applications
ensure that users can, where possible, experience material with the original look-and-feel
www.bl.uk
18
Introduction - history
Digital Library PFI
Mar 1997 – Dec 1998
Digital Library System
1999 – early 2002
Lessons
DOM Report
Nov 2002
The DOM Programme
Started September 2003
www.bl.uk
19
Drivers for the BL DOM Programme
Legal deposit legislation for non-print material was granted royal assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12 TB of material per year (with 50 year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….
We need a generic and cost-effective approach for the secure long term
storage of digital material that is produced by numerous initiatives
www.bl.uk
20
DOM – many topics to address
[Chart: components plotted by estimated size of component (low to high) against component ambiguity / complexity (low to high)]
DOM components: LDEP, Workflow, Resource Discovery, LDLSE, VDEP, Rights Management, Metadata Definition, Technical Requirements, File Format Registry, Persistent Identifiers, File Conversion Utilities, Authentication, Prototypes, SDM, RADM, Interfaces, Strategy Development
Non-DOM projects: ILS, Web Archiving, Digitisation Programme
Legend: Started, Planned, Non-DOM projects, Planned co-operation
LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications
www.bl.uk
21
Scope - life cycle of objects
Collection
Selection
Acquisition
Accession
Description
Preservation
Storage
Preservation
Access
Resource discovery
Delivery
Rendering
www.bl.uk
22
Scope – objects and processes
Preservation store
Preserves the bit stream in perpetuity
Access store
Access versions
Limited formats – in the flavour of the era
Metadata to support resource discovery
Descriptive, Administrative, Links with existing tools e.g. Integrated
Library System (ILS)
Workflow
Ingest, e.g. Legal Deposit processing
www.bl.uk
23
[Diagram: DOM in context, showing content sources feeding ingest, the DOM core, and access]
Access: Resource Discovery, Delivery, Digital Rights Management
DOM core labels: Storage, Shared services, Signing, Authentication, Metadata, Persistent ID, Ingest
Content source labels: Donations; Document Supply; Publishers; Archives; Grey Literature; Non-Serial Store; Archiving; Operational Stores; Legal Deposit (LDL Secure Environment, Legal Deposit Items, Legal Deposit Processing); Web Archiving; Digitisation (St Pancras Studios, Newspapers, NSA)
www.bl.uk
24
Timeline
BC: ET approve Business Case & Timeline
R0, Definition: Prototype will provide a basic preservation-quality digital object storage module
R1, Operational Storage Sub System:
• Consolidate R0 into operational system
• Provide preservation-quality digital store for materials received under Voluntary Deposit of Electronic Publications (VDEP)
• Integrate it with the existing VDEP front-end
R2, 1st Content Stream ingest:
• Support ingest for a major content stream
• Integrate with core Library systems as required
R3 & R4, LDEP - initial format: Provide functionality for material covered by LDEP secondary legislation
R5+: Open DOM to new projects
Time axis: 2003, 2004, 2005
www.bl.uk
25
DOM: Project definition - 1
Functional Architecture “What”
Prototyping – assessing market solutions
Example issues: digital rights, file formats, etc
Logical architecture “how – overall architecture”
Prototyping - basic functioning architecture
Example issues: allow changes to new suppliers, relationships to ILS, other projects etc
Physical architecture “how – storage & specifics”
Prototyping - principal solutions and options
Example issues: how do we build it cost-effectively today, supplier selection criteria
• Business case
• Planning – incremental implementation phases
Cross team workshops – reviewing progress, debating detailed technical issues, planning immediate priorities, risk management & way forward
www.bl.uk
26
DOM: Project definition - 2
Approach is to be incremental and not ‘Big Bang’
We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a good solution
A principal goal is to define:
An overall long term “logical architecture”
Within which, there will be successive generations of physical architectures
We are building an understanding of the storage marketplace, and we will use that knowledge to manage procurement
We are certain that we will need >500 TB of storage but we are uncertain when – we thus need flexible scalable procurement
www.bl.uk
27
DOM architecture - overview
[Diagram: DOM architecture overview]
Users of the store: Resource Discovery (ILS-based RD, non-catalogue RD, others), Rights Management, LDEP, Doc supply; each refers to an object by its DOMID
DOM Storage Service: compound objects/relations, integrity, authenticity, unique persistent identifier (DOMID)
DOMID is mapped to node/vol/LRL (local resource locator)
DOM Physical Storage: atomic objects
www.bl.uk
28
DOM System (release 3)
[Diagram: DOM System (release 3). Labels: Aleph, Publishers, Mailroom, Access, Storage subsystem (two instances), Shared services, Administration]
www.bl.uk
29
DOM logical architecture – integrity and authenticity
Integrity:
System has capability to continuously monitor the object store to detect
object corruption
It would then initiate object recovery
Authenticity:
A process is defined to provide long-term assurance that an object that
is re-presented is as it was when it was ingested
Based on the use of cryptographic signing techniques
Each object is signed when it is ingested
The signature is verified when required
The signing mechanism is “tightly” controlled
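The sign-at-ingest, verify-on-demand and monitor-then-recover steps above can be sketched as follows. This is a minimal illustration, not the DOM implementation: the slides do not name an algorithm, so a keyed SHA-256 tag (HMAC) from the Python standard library stands in for "cryptographic signing", and `SIGNING_KEY`, the function names and the in-memory stores are all hypothetical.

```python
import hashlib
import hmac

# Assumption: a single secret key stands in for the "tightly" controlled signing mechanism.
SIGNING_KEY = b"hypothetical-signing-key"


def sign_object(data: bytes, key: bytes = SIGNING_KEY) -> str:
    """Compute a keyed SHA-256 tag over the object's bit stream at ingest."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify_object(data: bytes, signature: str, key: bytes = SIGNING_KEY) -> bool:
    """Check that the bit stream held now matches what was signed at ingest."""
    return hmac.compare_digest(sign_object(data, key), signature)


def scan_store(store: dict, signatures: dict) -> list:
    """Integrity monitoring pass: return ids of objects that fail verification."""
    return sorted(oid for oid, data in store.items()
                  if not verify_object(data, signatures[oid]))


def recover(store: dict, replica: dict, corrupted: list) -> None:
    """Object recovery: overwrite each corrupted object with the replica's copy."""
    for oid in corrupted:
        store[oid] = replica[oid]


# Ingest two objects, keep a replica, then simulate corruption of one object.
store = {"obj-1": b"magna carta scan", "obj-2": b"sound recording"}
signatures = {oid: sign_object(data) for oid, data in store.items()}
replica = dict(store)
store["obj-2"] = b"sound recordinG"

corrupted = scan_store(store, signatures)
recover(store, replica, corrupted)
remaining = scan_store(store, signatures)
```

A public-key signature scheme would let verification happen without access to the signing key, which may fit the "tightly controlled" requirement better; the keyed tag is used here only because it needs no third-party library.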
www.bl.uk
30
Procuring physical storage in volume
A major cost is in physical storage
The market for storage systems is changing rapidly, and this implies that “lock-in” is
not sensible
We thus need flexibility to change supplier over time
Cost of storage is reducing by 30-40% per year
Hence procure on rolling basis just ahead of demand
Replace storage on a rolling basis on expiry of warranty
The rolling programmes imply the need to be able to support a heterogeneous
product solution
The design of the logical architecture thus supports storage sourced from multiple
storage vendors
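The argument for buying just ahead of demand can be checked with a little arithmetic. The sketch below compares buying all capacity at today's price with buying each year's increment at that year's price. The starting price, the five-year horizon and the even 100 TB per year demand are illustrative assumptions; only the 30-40% annual decline comes from the slide. Warranty-expiry replacement and running costs are ignored.

```python
def rolling_cost(demand_tb_per_year, price_per_tb, annual_decline):
    """Total spend when each year's extra capacity is bought at that year's price."""
    total = 0.0
    for year, extra_tb in enumerate(demand_tb_per_year):
        # The unit price falls by the same fraction every year.
        total += extra_tb * price_per_tb * (1 - annual_decline) ** year
    return total


def upfront_cost(demand_tb_per_year, price_per_tb):
    """Total spend when all capacity is bought in year 0 at today's price."""
    return sum(demand_tb_per_year) * price_per_tb


# Hypothetical figures: 500 TB needed over five years, 1000 currency units per TB today.
demand = [100.0] * 5
upfront = upfront_cost(demand, 1000.0)
rolling_30 = rolling_cost(demand, 1000.0, 0.30)
rolling_40 = rolling_cost(demand, 1000.0, 0.40)
print(upfront, rolling_30, rolling_40)
```

Under these assumptions the rolling purchases come to roughly 55% of the upfront spend at a 30% annual decline and roughly 46% at 40%, which is the saving the rolling programme is aiming for.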
www.bl.uk
31
Disaster tolerance and the organisation of storage clusters
One can obtain commercial disaster recovery (DR) solutions for common equipment configurations
However one cannot obtain such solutions for systems holding several hundred terabytes
So we must build DR into the design of the system
A single site solution, subject to a common-mode disaster, would suffer considerable loss of availability after a disaster, and so is not acceptable
This implies that we need a multi-site solution
Conventionally these are based on a master-standby arrangement where only 50% of kit is delivering normal service
Our design is based on the use of multiple autonomous independent peer clusters that cross-synchronise
so 100% of the kit delivers normal service
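A small model may help show how peer clusters differ from master-standby. The sketch below follows the behaviour described on the storage subsystem slides: ingest and access normally use the local cluster, a remote cluster takes over when the local one is off-line, and the off-line cluster is synchronised later. The class and function names, the two-cluster limit and the immediate (synchronous) copy to the peer are simplifying assumptions, not the DOM design.

```python
class Cluster:
    """One autonomous storage cluster holding objects keyed by DOMID."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.objects = {}   # DOMID -> bytes
        self.pending = []   # DOMIDs still to be pushed to the peer
        self.peer = None


def pair(a, b):
    """Make two clusters peers of each other."""
    a.peer, b.peer = b, a


class Gateway:
    """Storage gateway that prefers its local cluster and falls back to the peer."""

    def __init__(self, local):
        self.local = local

    def ingest(self, domid, data):
        target = self.local if self.local.online else self.local.peer
        if not target.online:
            raise RuntimeError("no storage cluster available")
        target.objects[domid] = data
        if target.peer.online:
            target.peer.objects[domid] = data   # synchronise the other cluster now
        else:
            target.pending.append(domid)        # synchronise it later
        return target.name

    def fetch(self, domid):
        for cluster in (self.local, self.local.peer):
            if cluster.online and domid in cluster.objects:
                return cluster.objects[domid], cluster.name
        raise KeyError(domid)


def resynchronise(cluster):
    """Push objects stored while the peer was off-line; return how many were sent."""
    if not cluster.peer.online:
        return 0
    sent = len(cluster.pending)
    for domid in cluster.pending:
        cluster.peer.objects[domid] = cluster.objects[domid]
    cluster.pending.clear()
    return sent


site_a, site_b = Cluster("site-a"), Cluster("site-b")
pair(site_a, site_b)
gateway = Gateway(site_a)

first = gateway.ingest("dom:1", b"x")     # normal ingest at the local cluster
site_a.online = False                     # local cluster goes off-line
second = gateway.ingest("dom:2", b"y")    # ingest is managed by the remote cluster
during_outage = gateway.fetch("dom:1")    # access is served remotely
site_a.online = True
caught_up = resynchronise(site_b)         # local cluster is synchronised later
after_outage = gateway.fetch("dom:2")
```

Because both clusters accept reads and writes in normal operation, neither sits idle as a standby, which is the point of the "100% of the kit" claim.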
www.bl.uk
32
DOM architecture in the context of the
storage solution market
The dominant segment of the market focuses on delivering performance within
a highly resilient single cluster
However:
Many of our objects will be rarely accessed
so we do not want to pay for “maximised” performance we do not need
We have resilience by using multiple clusters, hence we have a reduced
need for resilience within a cluster
so we do not want to pay for “maximised” resilience we do not need
We are using these drivers to design a cost-effective large scale resilient
solution
www.bl.uk
33
DOM storage subsystem architecture - overview
[Diagram: two storage clusters, each with a DOM Storage Gateway, DOM Storage Service and DOM Physical Storage, sharing DOM Shared Services (Unique ID, Signing, Logging)]
www.bl.uk
34
DOM storage subsystem architecture - access
Normal access/delivery is from local storage cluster
[Diagram: two storage clusters, each with a DOM Storage Gateway, DOM Storage Service and DOM Physical Storage, sharing DOM central services (Unique ID, Signing, Logging)]
www.bl.uk
35
DOM storage subsystem architecture - access
When a cluster is off-line then access/delivery is from a remote storage cluster
[Diagram: two storage clusters, each with a DOM Storage Gateway, DOM Storage Service and DOM Physical Storage, sharing DOM central services (Unique ID, Signing, Logging)]
www.bl.uk
36
DOM storage subsystem architecture - ingest
Normal ingest is to the local storage cluster and then the remote cluster is synchronised
[Diagram: two storage clusters, each with a DOM Storage Gateway, DOM Storage Service and DOM Physical Storage, sharing DOM central services (Unique ID, Signing, Logging); arrows labelled "Store" and "Synchronise remote store"]
www.bl.uk
37
DOM storage subsystem architecture - ingest
When a cluster is off-line then ingest is managed by the remote storage cluster and the local cluster is synchronised later
[Diagram: two storage clusters, each with a DOM Storage Gateway, DOM Storage Service and DOM Physical Storage, sharing DOM central services (Unique ID, Signing, Logging); arrows labelled "Store" and "Synchronise remote store later"]
www.bl.uk
38
In conclusion
We plan for generations of physical storage
Migration from one generation to the next
Allow changes of supplier
Purchase incrementally in modest quantities
Move quickly when required
Be cost conscious
We provide assurance that an object is held and re-presented as when it was
ingested
We are designing a cost-effective large scale resilient solution
In summary: we take a long term view
www.bl.uk
39
Major Programmes/4
Web Archiving Programme
40
Structure of Programme
Web Archiving Programme is a collaborative initiative, broadly implemented across two consortia
UK Web Archiving Consortium
Developing a selective approach to web archiving, procuring a common web archiving infrastructure and software to begin archiving activities at the earliest opportunity
International Internet Preservation Consortium
Developing advanced web archiving technologies for the long-term, large-scale, continuous crawling requirements enabled through legislation
www.bl.uk
41
UK Web Archiving Consortium
Developing a selective approach to web archiving
Licence for PANDAS about to be signed with NLA
Sub-licences with consortium partners and contractor to follow
ITT concluded with Magus Research winning the contract
Implement a common web archiving infrastructure (lots of Linux machines + PANDAS)
Provide customisation/development of PANDAS
Provide help desk and support
www.bl.uk
42
International Internet Preservation Consortium
Developing advanced web archiving technologies
Smart Crawler
Continuous adaptive crawler, adjusting crawl priority on the fly
Based on IA Heritrix
Working on requirements now
Expect to begin tender process in June
Content Management
Archival formats
Framework
Metrics and Test Bed
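One way to picture a continuous adaptive crawler that adjusts crawl priority on the fly is a revisit queue whose intervals shrink for pages that keep changing and grow for pages that do not. The sketch below is an illustration only: it is not Heritrix code or the consortium's design, and the class name, the halving and doubling rule, the hour-based intervals and the example URLs are all assumptions.

```python
import heapq


class CrawlFrontier:
    """Priority queue of URLs ordered by the time each is next due for a visit."""

    def __init__(self):
        self._heap = []       # entries are (due_time_in_hours, url)
        self._interval = {}   # url -> current revisit interval in hours

    def add(self, url, interval=24.0, now=0.0):
        """Schedule a new URL for its first visit."""
        self._interval[url] = interval
        heapq.heappush(self._heap, (now + interval, url))

    def next_due(self):
        """Pop the URL that is due soonest, returning (due_time, url)."""
        return heapq.heappop(self._heap)

    def report(self, url, changed, now, min_interval=1.0, max_interval=168.0):
        """Adapt the revisit interval after a crawl and reschedule the URL."""
        interval = self._interval[url]
        if changed:
            interval = max(min_interval, interval / 2)   # revisit sooner
        else:
            interval = min(max_interval, interval * 2)   # revisit less often
        self._interval[url] = interval
        heapq.heappush(self._heap, (now + interval, url))
        return interval


frontier = CrawlFrontier()
frontier.add("http://news.example/")
frontier.add("http://static.example/")

due, url = frontier.next_due()                                   # news page at hour 24
news_interval = frontier.report(url, changed=True, now=due)
due, url = frontier.next_due()                                   # static page at hour 24
static_interval = frontier.report(url, changed=False, now=due)
upcoming = frontier.next_due()                                   # news page comes round first
```

A production crawler would also weigh politeness limits, site importance and available capacity; the queue here only shows how priority can follow observed change.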
www.bl.uk
43
External Collaboration
44
Digital Library Collaborations/Partnerships
Current
UK Digital Preservation Coalition
Founder Member
TEL (The European Library Project)
Web Archiving UK
JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library Of Wales
International Internet Preservation Consortium
BNF, Library Of Congress, Internet Archive, National Archives & Library Of Canada, National Library Of Australia, National Library Of Italy, National Libraries Of Nordic Countries
JISC Funded - Digital Curation Centre
Persistent Identifiers
DOI Foundation, European National Libraries (KB & DDB)
Resource Discovery
Union Catalogues (SUNCAT)
Digital Library Federation
www.bl.uk
45
Digital Library Collaborations/Partnerships
Potential
Secure Legal Deposit Network
6 Legal Deposit Libraries
Global Digital Format Registry
Potential Partners (National Archives, DLF)
Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
KB (Netherlands National Library & Other Partners – FP6 Bid)
Digital Rights Management
Potential Partners (Publishers, JISC)
Metadata
Publishers, Others ?
Authentication
JISC ?
Resource Discovery
Search Engine Vendors, Researchers
Others ???
www.bl.uk
46
Conclusions
• Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success
• Can You Work With Us?
www.bl.uk
47
Slide 23
British Computer Society
North London Branch
Major Programmes
Richard Boulderstone
July 27, 2004
1
Agenda
•
•
•
•
•
•
•
•
•
The British Library
Vision
Our Audiences/Customers
ILS
Digitisation
Digital Object Management
Web Archiving
Collaboration
Conclusions
Magna Carta
www.bl.uk
2
What Is The British Library ?
Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of
Science and Invention (1855), National Central Library (1916), and National
Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India
Office Library and Records in 1982, and British Institute of Recorded Sound in
1983
• Flagship building at St Pancras - largest public building project in Great
Britain in 20th century - opened in 1998
•
www.bl.uk
3
World-Class Research Library
Key Statistics 2002/3
•
•
•
•
•
•
•
•
•
•
150 million items
8.2 million items consulted or supplied
408,000 reading room visits
618,000 catalogue records created
554,000 items received on legal deposit
651 km shelf capacity 92% full add 12 km each year
18.5M Web Site Hits (www.bl.uk)
2,400 staff
£85.2 million Grant in Aid and £27.0 million trading income in 2001/2
Annual report - http://www.bl.uk/about/annual/latest.html
www.bl.uk
4
‘The World’s Knowledge’
Outcome Based Vision
… by aiding scientific advances
… by adding commercial value for
businesses
… by contributing to UK “knowledge
economy”
… through the pursuit of academic
excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural
wellbeing
… by giving information relevant to their
interests
… by helping to find the next medical
breakthrough
… by creating a link between the past,
present and future
To help people
advance
knowledge to
enrich lives
www.bl.uk
5
High
R+D
Industries
Prof.
Services
Creative
Industries Publishing
Industries
Resource Discovery
Bespoke Services
Research Services
Document Supply
Reprographics
Innovation Centre
School
Students
Libraries Teachers 11>18
On-site Visits
School Tours
Web Learning
SMEs
EDUCATION
Lifelong
Learner
Exhibitions
Events
Tours
Publishing
Visitors
(child + adult)
Lifelong
Learner
PUBLIC
BUSINESS
Lifelong
Learner
Postgraduate/
Undergraduate
RESEARCHER
Librarians
Scholars
Lifelong
Learner
Reading Rooms
Bespoke Services
Reprographics
Publishing
Document Supply
Searching Tools
LIBRARIES
Commercial
Researcher
Public
Libraries
Broadcasting
e.g. BBC
Publishing
e.g. OED
Document Supply
Resource Discovery
Training
Best Practice
H.E.
Libraries
Public
www.bl.uk
6
2
Major Programmes/1
Da Vinci Notebook
Integrated Library System (ILS)
Programme
7
ILS: Development
Data migration
Due to finish in a few days
16M+ BL records
10M+ records from other sources
Online ILS software
All online changes made (mainly interfaces) – final tests
Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
Most ones done for go live
Rest in priority order
www.bl.uk
8
ILS: Implementation
Training
Courses to end-users well underway
‘Practice’ system available
‘Search only’ training also underway
Testing
Functional testing (end to end) nearly complete
Performance poor – OPAC very slow
Automated stress testing (LoadRunner scripts)
eIS trying to find area of problem
Ex Libris experts flying over
Some security ‘hardening’ needed
www.bl.uk
9
ILS: Cutover from legacy systems
Now:
Temporary Aleph cataloguing
Phase 1 – internal processing
Staggered take-on of users to ease cutover problems
Merge ‘temporary’ records
7 June:
Phase 2 – reading rooms
Reading rooms closed for cutover 26-29 June
Mainly brand-new PCs etc rather than XP upgrade
30 June:
Phase 3 – remote users
Could be delayed major problems
30 July:
www.bl.uk
10
www.bl.uk
11
Future ILS development (ILS/2)
Current ILS development seen just as the start
Extra records
E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
E.g. Preservation records
Links to other new BL systems
E.g. Digital Object Management (images, web pages etc)
New releases of Ex Libris packages
www.bl.uk
12
Major Programmes/2
International Dunhuang Project
Digitisation Programme
13
Background
Digitisation Is The Process Of Converting Existing Physical Items Into Digital
Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character
Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image to Sub £1/Image.
Automatic Devices To Digitise Are Becoming Availalble But Are Expensive ~£150,000.
BL Has Had Fairly Ad Hoc Approach Driven By
External Funding Opportunities
Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project
Management
BL Has Created About 1.5M Digital Images So Far…
www.bl.uk
14
Digitisation Strategy
Digitisation Strategy Project Was Formerly Initiated On February 2, 2004
Key objectives for the project are to define:
Selection Criteria
Uniform Approach
Communications Plan
Sustainability
Intellectual Property Rights
External Relationship Management
Funding
Integration with DOMS
www.bl.uk
15
Project Status Information
Definitive Register of Projects
19 Complete
19 Current
20 Planning
JISC Sound (3,900 Hours)
JISC Newspapers (2M Pages of 750M Pages)
Chopin (Collaborative Project)
Early English Books Online
www.bl.uk
16
Major Programmes/3
Gutenberg Bible
Digital Object Management (DOM)
Programme
17
DOM Programme vision
Our mission is to enable the United Kingdom to preserve
and use its digital intellectual property forever
Our vision is create a management system for digital objects
that will
store and preserve any type of digital material in perpetuity
provide access to this material to users with appropriate permissions
ensure that the material is easy to find
ensure that users can view the material with contemporary applications
ensure that users can, where possible, experience material with the original look-and-feel
www.bl.uk
18
Introduction - history
Digital Library PFI
Mar 1997 – Dec 1998
Digital Library System
1999 – early 2002
Lessons
DOM Report
Nov 2002
The DOM Programme
Started September 2003
www.bl.uk
19
Drivers for the BL DOM Programme
Legal deposit legislation for non-print material was granted royal assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12T of material per year (with 50 year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….
We need a generic and cost-effective approach for the secure long term
storage of digital material that is produced by numerous initiatives
www.bl.uk
20
DOM – many topics to address
HIGH
ILS
WEB
ARCHIVING
DIGITISATION
PROGRAMME
ESTIMATED SIZE OF COMPONENT
LDEP
WORKFLOW
RESOURCE
DISCOVERY
LDLSE
VDEP
RIGHTS
MANAGEMENT
METADATA
DEFINITION
TECHNICAL
REQUIREMENTS
FILE
FORMAT
REGISTRY
PERSISTENT
IDENTIFIERS
FILE
CONVERSION
UTILITIES
AUTHENTICATION
PROTOTYPES
SDM
RADM
INTERFACES
STRATEGY
DEVELOPMENT
LOW
LOW
COMPONENT AMBIGUITY / COMPLEXITY
Started
Planned
Non-DOM projects
Planned co-operation
HIGH
LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications
www.bl.uk
21
Scope - life cycle of objects
Collection
Selection
Acquisition
Accession
Description
Preservation
Storage
Preservation
Access
Resource discovery
Delivery
Rendering
www.bl.uk
22
Scope – objects and processes
Preservation store
Preserves the bit stream in perpetuity
Access store
Access versions
Limited formats – in the flavour of the era
Metadata to support resource discovery
Descriptive, Administrative, Links with existing tools e.g. Integrated
Library System (ILS)
Workflow
Ingest, e.g. Legal Deposit processing
www.bl.uk
23
ACCESS
DOM
Resource Discovery
Delivery
Digital Rights Management
Shared services
DOM
Storage
Signing
Authentication
Metadata
Persistent ID
Ingest
DONATIONS
DOCUMENT SUPPLY
Publishers
Archives
Grey Literature
Non-Serial
Store
Archiving
Operational
Stores
LEGAL DEPOSIT
LDL Secure
Environment
Legal Deposit
Items
Legal Deposit
Processing
WEB
ARCHIVING
DIGITISATION
St Pancras
Studios
Newspapers
NSA
www.bl.uk
24
Timeline
Prototype will provide a
basic preservationquality digital object
storage module
Definition. R0
ET approve
Business Case
& Timeline
•Consolidate R0 into operational
system
•Provide preservation-quality digital
store for materials received under
Voluntary Deposit of Electronic
Publications (VDEP)
•Integrate it with the existing VDEP
front-end
BC
R0
Operat’l Storage
Sub System. R1
•Support ingest for a
major content stream
•Integrate with core
Library systems as
required
R1
1st Content
Stream ingest. R2
R2
LDEP - initial
format. R3 & R4
Provide functionality
for material covered by
LDEP secondary
legislation
R3
R4
Open DOM to new
projects. R5+
2003
2004
2005
www.bl.uk
25
DOM: Project definition - 1
Example issues
Functional Architecture “What”
digital rights, file
formats, etc
Prototyping – assessing market solutions
allow changes to
new suppliers,
relationships to ILS,
other projects etc
Logical architecture “how – overall architecture”
Prototyping - basic functioning architecture
how do we build it
cost-effectively
today, supplier
selection criteria
Physical architecture “how – storage & specifics”
Prototyping - principal solutions and options
• Business case
• Planning – incremental
implementation phases
Cross team workshops – reviewing progress, debating detailed technical
issues, planning immediate priorities, risk management & way forward
www.bl.uk
26
DOM: Project definition - 2
Approach is to be incremental and not ‘Big Bang’
We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a
good solution
A principal goal is to define:
An overall long term “logical architecture”
Within which, there will be successive generations of physical architectures
We are understanding the storage marketplace, and we will use the knowledge to manage
procurement
We are certain that we will need >500T of storage but we are uncertain when – we thus need
flexible scalable procurement
www.bl.uk
27
DOM architecture - overview
Resource Discovery
Non-cat
ILS
based RD
Others
Rights Management
LDEP
DOMID
OBJECT
DOM Storage Service
Compound
objects/relations
Integrity
Authenticity
Local resource locator
Object
Atomic
Objects
Unique persistent
identifier (DOMID)
Doc
supply
DOMID is
mapped to
node/vol/LRL
DOM Physical Storage
www.bl.uk
28
DOM System (release 3)
Aleph
Storage subsystem
Storage
subsystem
Access
Mailroom
Publishers
Shared services
Administration
DOM System
www.bl.uk
29
DOM logical architecture –
integrity and authenticity
Integrity:
System has capability to continuously monitor the object store to detect
object corruption
It would then initiate object recovery
Authenticity:
A process is defined to provide long-term assurance that an object that
is re-presented is as it was when it was ingested
Based on the use of cryptographic signing techniques
Each object is signed when it is ingested
The signature is verified when required
The signing mechanism is “tightly” controlled
www.bl.uk
30
Procuring physical storage in volume
A major cost is in physical storage
The market for storage systems is changing rapidly, and this implies that “lock-in” is
not sensible
We thus need flexibility to change supplier over time
Cost of storage is reducing by 30-40% per year
Hence procure on rolling basis just ahead of demand
Replace storage on a rolling basis on expiry of warranty
The rolling programmes imply the need to be able to support a heterogeneous
product solution
The design of the logical architecture thus supports storage sourced from multiple
storage vendors
www.bl.uk
31
Disaster tolerance and the organisation
of storage clusters
One can obtain commercial disaster recovery (DR) solutions for common equipment
configurations
However one cannot obtain such solutions for systems comprising multi-100 Tb systems
So we must build in the need for DR into the design of the system
A single site solution, subject to a common-mode disaster, would suffer considerable loss
of availability after a disaster, and so is not acceptable
This implies that we need a multi-site solution
Conventionally these are based on a master-standby where only 50% of kit is delivering
normal service
Our design is based on the use of multiple autonomous independent peer clusters that
cross-synchronise
so 100% of the kit delivers normal service
www.bl.uk
32
DOM architecture in the context of the
storage solution market
The dominant segment of the market focuses on delivering performance within
a highly resilient single cluster
However:
Many of our objects will be rarely accessed
so we do not want to pay for “maximised” performance we do not need
We have resilience by using multiple clusters, hence we have a reduced
need for resilience within a cluster
so we do not want to pay for “maximised” resilience we do not need
We are using these drivers to design a cost-effective large scale resilient
solution
www.bl.uk
33
DOM storage subsystem architecture - overview
[Diagram: two storage clusters, each made up of a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, both using the DOM Shared Services (Unique ID, Signing, Logging)]
www.bl.uk
34
DOM storage subsystem architecture - access
Normal access/delivery is from the local storage cluster
[Diagram: two storage clusters, each with a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, plus the DOM shared central services (Unique ID, Signing, Logging)]
www.bl.uk
35
DOM storage subsystem architecture - access
When a cluster is off-line, access/delivery is from a remote storage cluster
[Diagram: two storage clusters, each with a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, plus the DOM shared central services (Unique ID, Signing, Logging)]
www.bl.uk
36
DOM storage subsystem architecture - ingest
Normal ingest is to the local storage cluster and then the remote cluster is synchronised
[Diagram: two storage clusters, each with a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, plus the DOM shared central services (Unique ID, Signing, Logging); arrows labelled “Store” and “Synchronise remote store”]
www.bl.uk
37
DOM storage subsystem architecture - ingest
When a cluster is off-line, ingest is managed by the remote storage cluster and the local cluster is synchronised later
[Diagram: two storage clusters, each with a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, plus the DOM shared central services (Unique ID, Signing, Logging); arrows labelled “Store” and “Synchronise remote store later”]
www.bl.uk
38
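The access and ingest behaviour described on the preceding slides can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the DOM design itself: the Cluster and Gateway classes, their method names and the site names are hypothetical, there are only two peers, and objects are held in memory.

```python
class Cluster:
    """One autonomous storage cluster (hypothetical model)."""

    def __init__(self, name: str):
        self.name = name
        self.online = True
        self.objects = {}    # object id -> bytes
        self.pending = []    # (peer, object id) pairs awaiting synchronisation

    def store(self, object_id: str, data: bytes, peers: list) -> None:
        self.objects[object_id] = data
        for peer in peers:
            if peer.online:
                peer.objects[object_id] = data          # synchronise now
            else:
                self.pending.append((peer, object_id))  # synchronise later

    def resync(self) -> None:
        """Push queued objects to peers that are back on-line."""
        still_pending = []
        for peer, object_id in self.pending:
            if peer.online:
                peer.objects[object_id] = self.objects[object_id]
            else:
                still_pending.append((peer, object_id))
        self.pending = still_pending


class Gateway:
    """Routes requests to the local cluster, falling back to a remote peer."""

    def __init__(self, local: Cluster, remotes: list):
        self.local = local
        self.remotes = remotes

    def _first_online(self) -> Cluster:
        for cluster in [self.local] + self.remotes:
            if cluster.online:
                return cluster
        raise RuntimeError("no storage cluster is on-line")

    def ingest(self, object_id: str, data: bytes) -> str:
        target = self._first_online()
        peers = [c for c in [self.local] + self.remotes if c is not target]
        target.store(object_id, data, peers)
        return target.name

    def fetch(self, object_id: str) -> bytes:
        return self._first_online().objects[object_id]


a, b = Cluster("site-a"), Cluster("site-b")
gateway = Gateway(local=a, remotes=[b])
gateway.ingest("obj-1", b"one")              # stored at site-a, copied to site-b
a.online = False
served_by = gateway.ingest("obj-2", b"two")  # site-a is down, so site-b takes it
a.online = True
b.resync()                                   # site-a catches up later
```

Because every cluster can accept both ingest and access requests, all of the kit does useful work in normal running. The sketch leaves out cases a real design would have to address, such as a fetch arriving at a cluster that has not yet been resynchronised.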
In conclusion
We plan for generations of physical storage
Migration from one generation to the next
Allow changes of supplier
Purchase incrementally in modest quantities
Move quickly when required
Be cost conscious
We provide assurance that an object is held and re-presented as when it was
ingested
We are designing a cost-effective large scale resilient solution
In summary: we take a long term view
www.bl.uk
39
Major Programmes/4
Web Archiving Programme
40
Structure of Programme
The Web Archiving Programme is a collaborative
initiative, implemented broadly across two
consortia
UK Web Archiving Consortium
Developing a selective approach to web archiving, procuring a common
web archiving infrastructure and software to begin archiving activities at
the earliest opportunity
International Internet Preservation Consortium
Developing advanced web archiving technologies for the long-term,
large-scale, continuous crawling requirements enabled through
legislation
www.bl.uk
41
UK Web Archiving Consortium
Developing a selective approach to web archiving
Licence for PANDAS about to be signed with NLA
Sub-licences with consortium partners and contractor to follow
ITT concluded, with Magus Research winning the contract
Implement a common web archiving infrastructure (lots of Linux
machines + PANDAS)
Provide customisation/development of PANDAS
Provide help desk and support
www.bl.uk
42
International Internet Preservation
Consortium
Developing advanced web archiving technologies
Smart Crawler
Continuous adaptive crawler, adjusting crawl priority on the fly
Based on IA Heritrix
Working on requirements now
Expect to begin tender process in June
Content Management
Archival formats
Framework
Metrics and Test Bed
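To make "adjusting crawl priority on the fly" concrete, here is a small Python sketch of a crawl frontier whose priorities can be changed while URLs are still queued. It is a generic priority-queue illustration and does not reflect the Heritrix API or the consortium's Smart Crawler design; the CrawlFrontier class and the example URLs are hypothetical.

```python
import heapq
import itertools


class CrawlFrontier:
    """Priority queue of URLs whose priorities can change while crawling."""

    def __init__(self):
        self._heap = []                    # entries: [priority, sequence, url]
        self._entries = {}                 # url -> live heap entry
        self._counter = itertools.count()  # tie-breaker keeps ordering stable

    def schedule(self, url: str, priority: float) -> None:
        """Add a URL, or re-prioritise it if already queued (lower = sooner)."""
        if url in self._entries:
            self._entries[url][2] = None   # mark the old entry as stale
        entry = [priority, next(self._counter), url]
        self._entries[url] = entry
        heapq.heappush(self._heap, entry)

    def next_url(self) -> str:
        """Pop the most urgent URL, skipping stale entries."""
        while self._heap:
            _, _, url = heapq.heappop(self._heap)
            if url is not None:
                del self._entries[url]
                return url
        raise IndexError("frontier is empty")


frontier = CrawlFrontier()
frontier.schedule("http://example.org/", 5.0)
frontier.schedule("http://example.org/news", 3.0)
frontier.schedule("http://example.org/", 1.0)   # page changes often: crawl sooner
order = [frontier.next_url(), frontier.next_url()]
```

An adaptive crawler would call schedule() again whenever it learns something new about a page, for example how often its content changes, so that the next visit moves up or down the queue.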
www.bl.uk
43
External Collaboration
44
Digital Library Collaborations/Partnerships
Current
UK Digital Preservation Coalition
Founder Member
TEL (The European Library Project)
Web Archiving UK
JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library Of Wales
International Internet Preservation Consortium
BNF, Library Of Congress, Internet Archive, National Archives & Library Of Canada, National Library Of
Australia, National Library Of Italy, National Libraries Of Nordic Countries
JISC Funded - Digital Curation Centre
Persistent Identifiers
DOI Foundation, European National Libraries (KB & DDB)
Resource Discovery
Union Catalogues (SUNCAT)
Digital Library Federation
www.bl.uk
45
Digital Library Collaborations/Partnerships
Potential
Secure Legal Deposit Network
6 Legal Deposit Libraries
Global Digital Format Registry
Potential Partners (National Archives, DLF)
Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
KB (Netherlands National Library & Other Partners – FP6 Bid)
Digital Rights Management
Potential Partners (Publishers, JISC)
Metadata
Publishers, Others ?
Authentication
JISC ?
Resource Discovery
Search Engine Vendors, Researchers
Others ???
www.bl.uk
46
Conclusions
• Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success
• Can You Work With Us?
www.bl.uk
47
[Diagram: audiences and customers. Five audience groups (Business, Education, Public, Researcher, Libraries), with example customers in each and the services offered to them, such as Resource Discovery, Research Services, Document Supply, Reprographics, Reading Rooms, Searching Tools, Exhibitions, Events, School Tours, Web Learning, Training and Best Practice]
www.bl.uk
6
Major Programmes/1
Da Vinci Notebook
Integrated Library System (ILS)
Programme
7
ILS: Development
Data migration
Due to finish in a few days
16M+ BL records
10M+ records from other sources
Online ILS software
All online changes made (mainly interfaces) – final tests
Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
Most done for go-live
Rest in priority order
www.bl.uk
8
ILS: Implementation
Training
Courses to end-users well underway
‘Practice’ system available
‘Search only’ training also underway
Testing
Functional testing (end to end) nearly complete
Performance poor – OPAC very slow
Automated stress testing (LoadRunner scripts)
eIS trying to find area of problem
Ex Libris experts flying over
Some security ‘hardening’ needed
www.bl.uk
9
ILS: Cutover from legacy systems
Now:
Temporary Aleph cataloguing
7 June:
Phase 1 – internal processing
Staggered take-on of users to ease cutover problems
Merge ‘temporary’ records
30 June:
Phase 2 – reading rooms
Reading rooms closed for cutover 26-29 June
Mainly brand-new PCs etc rather than XP upgrade
30 July:
Phase 3 – remote users
Could be delayed by major problems
www.bl.uk
10
www.bl.uk
11
Future ILS development (ILS/2)
Current ILS development seen just as the start
Extra records
E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
E.g. Preservation records
Links to other new BL systems
E.g. Digital Object Management (images, web pages etc)
New releases of Ex Libris packages
www.bl.uk
12
Major Programmes/2
International Dunhuang Project
Digitisation Programme
13
Background
Digitisation Is The Process Of Converting Existing Physical Items Into Digital
Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character
Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image to Sub £1/Image.
Automatic Devices To Digitise Are Becoming Available But Are Expensive ~£150,000.
BL Has Had Fairly Ad Hoc Approach Driven By
External Funding Opportunities
Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project
Management
BL Has Created About 1.5M Digital Images So Far…
www.bl.uk
14
Digitisation Strategy
Digitisation Strategy Project Was Formally Initiated On February 2, 2004
Key objectives for the project are to define:
Selection Criteria
Uniform Approach
Communications Plan
Sustainability
Intellectual Property Rights
External Relationship Management
Funding
Integration with DOMS
www.bl.uk
15
Project Status Information
Definitive Register of Projects
19 Complete
19 Current
20 Planning
JISC Sound (3,900 Hours)
JISC Newspapers (2M Pages of 750M Pages)
Chopin (Collaborative Project)
Early English Books Online
www.bl.uk
16
Major Programmes/3
Gutenberg Bible
Digital Object Management (DOM)
Programme
17
DOM Programme vision
Our mission is to enable the United Kingdom to preserve
and use its digital intellectual property forever
Our vision is to create a management system for digital objects
that will
store and preserve any type of digital material in perpetuity
provide access to this material to users with appropriate permissions
ensure that the material is easy to find
ensure that users can view the material with contemporary applications
ensure that users can, where possible, experience material with the original look-and-feel
www.bl.uk
18
Introduction - history
Digital Library PFI
Mar 1997 – Dec 1998
Digital Library System
1999 – early 2002
Lessons
DOM Report
Nov 2002
The DOM Programme
Started September 2003
www.bl.uk
19
Drivers for the BL DOM Programme
Legal deposit legislation for non-print material was granted royal assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12 TB of material per year (with 50 year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….
We need a generic and cost-effective approach for the secure long term
storage of digital material that is produced by numerous initiatives
www.bl.uk
20
DOM – many topics to address
[Chart: components plotted by estimated size of component (low to high) against component ambiguity/complexity (low to high). Components shown: ILS, Web Archiving, Digitisation Programme, LDEP, Workflow, Resource Discovery, LDLSE, VDEP, Rights Management, Metadata Definition, Technical Requirements, File Format Registry, Persistent Identifiers, File Conversion Utilities, Authentication, Prototypes, SDM, RADM, Interfaces, Strategy Development. Legend: Started, Planned, Non-DOM projects, Planned co-operation]
LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications
www.bl.uk
21
Scope - life cycle of objects
Collection
Selection
Acquisition
Accession
Description
Preservation
Storage
Preservation
Access
Resource discovery
Delivery
Rendering
www.bl.uk
22
Scope – objects and processes
Preservation store
Preserves the bit stream in perpetuity
Access store
Access versions
Limited formats – in the flavour of the era
Metadata to support resource discovery
Descriptive, Administrative, Links with existing tools e.g. Integrated
Library System (ILS)
Workflow
Ingest, e.g. Legal Deposit processing
www.bl.uk
23
[Diagram: DOM in context. Access: DOM Resource Discovery, Delivery, Digital Rights Management. Shared services: Signing, Authentication, Metadata, Persistent ID. DOM Storage. Ingest sources labelled: Donations, Document Supply, Publishers, Archives, Grey Literature, Non-Serial Store, Archiving Operational Stores, Legal Deposit (LDL Secure Environment, Legal Deposit Items, Legal Deposit Processing), Web Archiving, Digitisation (St Pancras Studios, Newspapers, NSA)]
www.bl.uk
24
Timeline
BC, Business Case: ET approve Business Case & Timeline
R0, Definition: prototype will provide a basic preservation-quality digital object storage module
R1, Operational Storage Sub System:
• Consolidate R0 into operational system
• Provide preservation-quality digital store for materials received under Voluntary Deposit of Electronic Publications (VDEP)
• Integrate it with the existing VDEP front-end
R2, 1st Content Stream ingest:
• Support ingest for a major content stream
• Integrate with core Library systems as required
R3 & R4, LDEP - initial format: provide functionality for material covered by LDEP secondary legislation
R5+: open DOM to new projects
Timeline axis: 2003, 2004, 2005
www.bl.uk
25
DOM: Project definition - 1
Functional architecture “what”
Example issues: digital rights, file formats, etc
Prototyping – assessing market solutions
Logical architecture “how – overall architecture”
Example issues: allow changes to new suppliers, relationships to ILS, other projects etc
Prototyping – basic functioning architecture
Physical architecture “how – storage & specifics”
Example issues: how do we build it cost-effectively today, supplier selection criteria
Prototyping – principal solutions and options
• Business case
• Planning – incremental implementation phases
Cross team workshops – reviewing progress, debating detailed technical
issues, planning immediate priorities, risk management & way forward
www.bl.uk
26
DOM: Project definition - 2
Approach is to be incremental and not ‘Big Bang’
We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a
good solution
A principal goal is to define:
An overall long term “logical architecture”
Within which, there will be successive generations of physical architectures
We are building an understanding of the storage marketplace, and we will use that knowledge to manage
procurement
We are certain that we will need >500 TB of storage but we are uncertain when – we thus need
flexible, scalable procurement
www.bl.uk
27
DOM architecture - overview
[Diagram: layered DOM architecture. Top layer: Resource Discovery (ILS-based RD, non-cat, others), Rights Management, LDEP and Doc supply, which refer to an object by its DOMID. Middle layer: DOM Storage Service, covering compound objects/relations, atomic objects, integrity, authenticity, the unique persistent identifier (DOMID) and the local resource locator. The DOMID is mapped to node/vol/LRL. Bottom layer: DOM Physical Storage]
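The mapping of a DOMID to node/vol/LRL can be pictured as a lookup table kept by the storage service. The Python sketch below is an assumption-laden illustration: the Location and DomidResolver names, the identifier format and the example paths are all hypothetical, and nothing here is taken from the actual DOM design beyond the idea that a persistent identifier resolves to a node, a volume and a local resource locator.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Location:
    node: str     # storage node holding the copy
    volume: str   # volume on that node
    lrl: str      # local resource locator within the volume


class DomidResolver:
    """Maps a persistent DOMID to the physical locations of its copies."""

    def __init__(self):
        self._table: Dict[str, List[Location]] = {}

    def register(self, domid: str, location: Location) -> None:
        self._table.setdefault(domid, []).append(location)

    def relocate(self, domid: str, old: Location, new: Location) -> None:
        """Record a migration to new storage without changing the DOMID."""
        copies = self._table[domid]
        copies[copies.index(old)] = new

    def resolve(self, domid: str) -> List[Location]:
        return list(self._table[domid])


resolver = DomidResolver()
first = Location(node="node-1", volume="vol-07", lrl="/a/b/0001")
resolver.register("domid-0001", first)

# Later the object is migrated to a newer generation of storage.
moved = Location(node="node-9", volume="vol-01", lrl="/x/0001")
resolver.relocate("domid-0001", first, moved)
current = resolver.resolve("domid-0001")
```

Keeping the identifier separate from the location is what lets catalogue records and other systems hold a stable reference while the physical storage underneath changes supplier or generation.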
www.bl.uk
28
Slide 25
British Computer Society
North London Branch
Major Programmes
Richard Boulderstone
July 27, 2004
1
Agenda
•
•
•
•
•
•
•
•
•
The British Library
Vision
Our Audiences/Customers
ILS
Digitisation
Digital Object Management
Web Archiving
Collaboration
Conclusions
Magna Carta
www.bl.uk
2
What Is The British Library ?
Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of
Science and Invention (1855), National Central Library (1916), and National
Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India
Office Library and Records in 1982, and British Institute of Recorded Sound in
1983
• Flagship building at St Pancras - largest public building project in Great
Britain in 20th century - opened in 1998
•
www.bl.uk
3
World-Class Research Library
Key Statistics 2002/3
•
•
•
•
•
•
•
•
•
•
150 million items
8.2 million items consulted or supplied
408,000 reading room visits
618,000 catalogue records created
554,000 items received on legal deposit
651 km shelf capacity 92% full add 12 km each year
18.5M Web Site Hits (www.bl.uk)
2,400 staff
£85.2 million Grant in Aid and £27.0 million trading income in 2001/2
Annual report - http://www.bl.uk/about/annual/latest.html
www.bl.uk
4
‘The World’s Knowledge’
Outcome Based Vision
… by aiding scientific advances
… by adding commercial value for
businesses
… by contributing to UK “knowledge
economy”
… through the pursuit of academic
excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural
wellbeing
… by giving information relevant to their
interests
… by helping to find the next medical
breakthrough
… by creating a link between the past,
present and future
To help people
advance
knowledge to
enrich lives
www.bl.uk
5
High
R+D
Industries
Prof.
Services
Creative
Industries Publishing
Industries
Resource Discovery
Bespoke Services
Research Services
Document Supply
Reprographics
Innovation Centre
School
Students
Libraries Teachers 11>18
On-site Visits
School Tours
Web Learning
SMEs
EDUCATION
Lifelong
Learner
Exhibitions
Events
Tours
Publishing
Visitors
(child + adult)
Lifelong
Learner
PUBLIC
BUSINESS
Lifelong
Learner
Postgraduate/
Undergraduate
RESEARCHER
Librarians
Scholars
Lifelong
Learner
Reading Rooms
Bespoke Services
Reprographics
Publishing
Document Supply
Searching Tools
LIBRARIES
Commercial
Researcher
Public
Libraries
Broadcasting
e.g. BBC
Publishing
e.g. OED
Document Supply
Resource Discovery
Training
Best Practice
H.E.
Libraries
Public
www.bl.uk
6
2
Major Programmes/1
Da Vinci Notebook
Integrated Library System (ILS)
Programme
7
ILS: Development
Data migration
Due to finish in a few days
16M+ BL records
10M+ records from other sources
Online ILS software
All online changes made (mainly interfaces) – final tests
Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
Most ones done for go live
Rest in priority order
www.bl.uk
8
ILS: Implementation
Training
Courses to end-users well underway
‘Practice’ system available
‘Search only’ training also underway
Testing
Functional testing (end to end) nearly complete
Performance poor – OPAC very slow
Automated stress testing (LoadRunner scripts)
eIS trying to find area of problem
Ex Libris experts flying over
Some security ‘hardening’ needed
www.bl.uk
9
ILS: Cutover from legacy systems
Now:
Temporary Aleph cataloguing
Phase 1 – internal processing
Staggered take-on of users to ease cutover problems
Merge ‘temporary’ records
7 June:
Phase 2 – reading rooms
Reading rooms closed for cutover 26-29 June
Mainly brand-new PCs etc rather than XP upgrade
30 June:
Phase 3 – remote users
Could be delayed major problems
30 July:
www.bl.uk
10
www.bl.uk
11
Future ILS development (ILS/2)
Current ILS development seen just as the start
Extra records
E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
E.g. Preservation records
Links to other new BL systems
E.g. Digital Object Management (images, web pages etc)
New releases of Ex Libris packages
www.bl.uk
12
Major Programmes/2
International Dunhuang Project
Digitisation Programme
13
Background
Digitisation Is The Process Of Converting Existing Physical Items Into Digital
Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character
Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image to Sub £1/Image.
Automatic Devices To Digitise Are Becoming Availalble But Are Expensive ~£150,000.
BL Has Had Fairly Ad Hoc Approach Driven By
External Funding Opportunities
Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project
Management
BL Has Created About 1.5M Digital Images So Far…
www.bl.uk
14
Digitisation Strategy
Digitisation Strategy Project Was Formerly Initiated On February 2, 2004
Key objectives for the project are to define:
Selection Criteria
Uniform Approach
Communications Plan
Sustainability
Intellectual Property Rights
External Relationship Management
Funding
Integration with DOMS
www.bl.uk
15
Project Status Information
Definitive Register of Projects
19 Complete
19 Current
20 Planning
JISC Sound (3,900 Hours)
JISC Newspapers (2M Pages of 750M Pages)
Chopin (Collaborative Project)
Early English Books Online
www.bl.uk
16
Major Programmes/3
Gutenberg Bible
Digital Object Management (DOM)
Programme
17
DOM Programme vision
Our mission is to enable the United Kingdom to preserve
and use its digital intellectual property forever
Our vision is create a management system for digital objects
that will
store and preserve any type of digital material in perpetuity
provide access to this material to users with appropriate permissions
ensure that the material is easy to find
ensure that users can view the material with contemporary applications
ensure that users can, where possible, experience material with the original look-and-feel
www.bl.uk
18
Introduction - history
Digital Library PFI
Mar 1997 – Dec 1998
Digital Library System
1999 – early 2002
Lessons
DOM Report
Nov 2002
The DOM Programme
Started September 2003
www.bl.uk
19
Drivers for the BL DOM Programme
Legal deposit legislation for non-print material was granted royal assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12T of material per year (with 50 year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….
We need a generic and cost-effective approach for the secure long term
storage of digital material that is produced by numerous initiatives
www.bl.uk
20
DOM – many topics to address
HIGH
ILS
WEB
ARCHIVING
DIGITISATION
PROGRAMME
ESTIMATED SIZE OF COMPONENT
LDEP
WORKFLOW
RESOURCE
DISCOVERY
LDLSE
VDEP
RIGHTS
MANAGEMENT
METADATA
DEFINITION
TECHNICAL
REQUIREMENTS
FILE
FORMAT
REGISTRY
PERSISTENT
IDENTIFIERS
FILE
CONVERSION
UTILITIES
AUTHENTICATION
PROTOTYPES
SDM
RADM
INTERFACES
STRATEGY
DEVELOPMENT
LOW
LOW
COMPONENT AMBIGUITY / COMPLEXITY
Started
Planned
Non-DOM projects
Planned co-operation
HIGH
LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications
www.bl.uk
21
Scope - life cycle of objects
Collection
Selection
Acquisition
Accession
Description
Preservation
Storage
Preservation
Access
Resource discovery
Delivery
Rendering
www.bl.uk
22
Scope – objects and processes
Preservation store
Preserves the bit stream in perpetuity
Access store
Access versions
Limited formats – in the flavour of the era
Metadata to support resource discovery
Descriptive, Administrative, Links with existing tools e.g. Integrated
Library System (ILS)
Workflow
Ingest, e.g. Legal Deposit processing
www.bl.uk
23
[Diagram: DOM in context. Labels as extracted, grouped by their headings:]
Access: DOM; Resource Discovery; Delivery; Digital Rights Management
Shared services: DOM; Storage; Signing; Authentication; Metadata; Persistent ID
Ingest: Donations; Document Supply (Publishers; Archives; Grey Literature; Non-Serial Store; Archiving; Operational Stores); Legal Deposit (LDL Secure Environment; Legal Deposit Items; Legal Deposit Processing); Web Archiving; Digitisation (St Pancras Studios; Newspapers; NSA)
24
Timeline
Definition (R0)
Prototype will provide a basic preservation-quality digital object storage module
ET approve Business Case (BC) & Timeline
Operational Storage Sub System (R1)
• Consolidate R0 into operational system
• Provide preservation-quality digital store for materials received under Voluntary Deposit of Electronic Publications (VDEP)
• Integrate it with the existing VDEP front-end
1st Content Stream ingest (R2)
• Support ingest for a major content stream
• Integrate with core Library systems as required
LDEP - initial format (R3 & R4)
Provide functionality for material covered by LDEP secondary legislation
Open DOM to new projects (R5+)
Time axis: 2003, 2004, 2005
25
DOM: Project definition - 1
Functional Architecture “What”
Prototyping – assessing market solutions
Example issues: digital rights, file formats, etc
Logical architecture “how – overall architecture”
Prototyping - basic functioning architecture
Example issues: allow changes to new suppliers, relationships to ILS, other projects etc
Physical architecture “how – storage & specifics”
Prototyping - principal solutions and options
Example issues: how do we build it cost-effectively today, supplier selection criteria
• Business case
• Planning – incremental implementation phases
Cross team workshops – reviewing progress, debating detailed technical
issues, planning immediate priorities, risk management & way forward
26
DOM: Project definition - 2
Approach is to be incremental and not ‘Big Bang’
We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a
good solution
A principal goal is to define:
An overall long term “logical architecture”
Within which, there will be successive generations of physical architectures
We are building an understanding of the storage marketplace, and we will use that knowledge to manage
procurement
We are certain that we will need >500 TB of storage but we are uncertain when – we thus need
flexible, scalable procurement
27
DOM architecture - overview
[Diagram: DOM architecture overview. Labels as extracted:]
Client services: Resource Discovery (ILS based RD; Non-cat; Others); Rights Management; LDEP; Doc supply
Interface labels: DOMID; Object
DOM Storage Service: Compound objects/relations; Integrity; Authenticity; Atomic Objects; Unique persistent identifier (DOMID); Local resource locator
DOMID is mapped to node/vol/LRL
DOM Physical Storage
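The note that a DOMID is mapped to node/vol/LRL can be pictured as a lookup table kept by the storage service. The sketch below is only an illustration: the identifier format, the field names and the in-memory dictionary are all assumptions, since the slides do not specify how the mapping is held.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Location:
    """Where one copy of an object lives (field names are assumptions)."""
    node: str    # storage node
    volume: str  # volume on that node
    lrl: str     # local resource locator within the volume


class DomidResolver:
    """Maps a persistent DOMID to the current physical locations of its copies."""

    def __init__(self) -> None:
        self._table: Dict[str, List[Location]] = {}

    def register(self, domid: str, location: Location) -> None:
        # An object may have copies on several nodes or clusters.
        self._table.setdefault(domid, []).append(location)

    def relocate(self, domid: str, old: Location, new: Location) -> None:
        # Moving to new storage changes the location, never the DOMID.
        copies = self._table[domid]
        copies[copies.index(old)] = new

    def resolve(self, domid: str) -> List[Location]:
        # Return a copy so callers cannot alter the table by accident.
        return list(self._table.get(domid, []))


resolver = DomidResolver()
first = Location("node-a", "vol-01", "0001/obj.bin")  # hypothetical values
resolver.register("domid:0001", first)
resolver.relocate("domid:0001", first, Location("node-b", "vol-07", "0001/obj.bin"))
print(resolver.resolve("domid:0001"))
```

Keeping the identifier separate from the location is what would let physical storage be replaced generation by generation without changing any reference held by the ILS or other client systems.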
28
DOM System (release 3)
[Diagram: DOM System. Labels as extracted: Aleph; Storage subsystem (shown twice); Access; Mailroom; Publishers; Shared services; Administration]
29
DOM logical architecture –
integrity and authenticity
Integrity:
System has capability to continuously monitor the object store to detect
object corruption
It would then initiate object recovery
Authenticity:
A process is defined to provide long-term assurance that an object that
is re-presented is as it was when it was ingested
Based on the use of cryptographic signing techniques
Each object is signed when it is ingested
The signature is verified when required
The signing mechanism is “tightly” controlled
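The sign-on-ingest, verify-on-demand and monitor-then-recover steps above can be sketched as follows. This is a minimal illustration, not the DOM design: HMAC-SHA256 from the Python standard library stands in for whatever cryptographic signing scheme was actually chosen, and the key, class and method names are hypothetical.

```python
import hashlib
import hmac
from typing import Dict, List, Tuple

# Hypothetical key; in practice it would be held by a tightly controlled signing service.
SIGNING_KEY = b"example-key-held-by-signing-service"


def sign(data: bytes, key: bytes = SIGNING_KEY) -> str:
    """Return a keyed digest of the object (stand-in for a digital signature)."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify(data: bytes, signature: str, key: bytes = SIGNING_KEY) -> bool:
    """Check that the object still matches the signature made at ingest."""
    return hmac.compare_digest(sign(data, key), signature)


class ObjectStore:
    """Toy object store that keeps each object alongside its ingest signature."""

    def __init__(self) -> None:
        self._objects: Dict[str, Tuple[bytes, str]] = {}

    def ingest(self, domid: str, data: bytes) -> str:
        # Authenticity: each object is signed when it is ingested.
        signature = sign(data)
        self._objects[domid] = (data, signature)
        return signature

    def audit(self) -> List[str]:
        # Integrity: scan the store and report objects that no longer verify.
        return [d for d, (data, sig) in self._objects.items() if not verify(data, sig)]

    def recover(self, domid: str, good_copy: bytes) -> None:
        # Recovery: accept a replacement only if it matches the original signature.
        _, sig = self._objects[domid]
        if not verify(good_copy, sig):
            raise ValueError("replacement copy does not match ingest signature")
        self._objects[domid] = (good_copy, sig)


store = ObjectStore()
store.ingest("domid:0001", b"original bit stream")
print(store.audit())  # no corrupt objects expected at this point
```

In a multi-cluster arrangement the replacement copy passed to `recover` would come from another cluster; continuous monitoring would amount to running `audit` on a schedule.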
30
Procuring physical storage in volume
A major cost is in physical storage
The market for storage systems is changing rapidly, and this implies that “lock-in” is
not sensible
We thus need flexibility to change supplier over time
Cost of storage is reducing by 30-40% per year
Hence procure on rolling basis just ahead of demand
Replace storage on a rolling basis on expiry of warranty
The rolling programmes imply the need to be able to support a heterogeneous
product solution
The design of the logical architecture thus supports storage sourced from multiple
storage vendors
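The case for buying just ahead of demand follows from simple arithmetic on the 30-40% annual price decline. The sketch below compares buying everything up front with buying each year's requirement at that year's price. The unit price and the demand profile are invented for illustration; only the rate of decline comes from the slide.

```python
from typing import List


def upfront_cost(demand_tb: List[float], price_per_tb: float) -> float:
    """Cost of buying the whole requirement in year 0 at today's price."""
    return sum(demand_tb) * price_per_tb


def rolling_cost(demand_tb: List[float], price_per_tb: float, annual_decline: float) -> float:
    """Cost of buying each year's requirement at that year's (lower) price."""
    total = 0.0
    for year, tb in enumerate(demand_tb):
        total += tb * price_per_tb * (1.0 - annual_decline) ** year
    return total


# Assumed profile: 500 TB needed over five years, 100 TB per year.
demand = [100.0, 100.0, 100.0, 100.0, 100.0]
price = 1000.0  # hypothetical price per TB in year 0

for decline in (0.30, 0.40):
    saving = 1.0 - rolling_cost(demand, price, decline) / upfront_cost(demand, price)
    print(f"decline {decline:.0%}: rolling purchase saves {saving:.0%}")
```

The comparison ignores replacement on expiry of warranty and any discount for volume, so it shows the direction of the effect rather than a procurement model.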
31
Disaster tolerance and the organisation
of storage clusters
One can obtain commercial disaster recovery (DR) solutions for common equipment
configurations
However, such solutions are not available for multi-100 TB systems
So we must build DR into the design of the system
A single site solution, subject to a common-mode disaster, would suffer considerable loss
of availability after a disaster, and so is not acceptable
This implies that we need a multi-site solution
Conventionally these are based on a master-standby arrangement in which only 50% of the kit delivers
normal service
Our design is based on the use of multiple autonomous independent peer clusters that
cross-synchronise
so 100% of the kit delivers normal service
32
DOM architecture in the context of the
storage solution market
The dominant segment of the market focuses on delivering performance within
a highly resilient single cluster
However:
Many of our objects will be rarely accessed
so we do not want to pay for “maximised” performance we do not need
We have resilience by using multiple clusters, hence we have a reduced
need for resilience within a cluster
so we do not want to pay for “maximised” resilience we do not need
We are using these drivers to design a cost-effective large scale resilient
solution
33
DOM storage subsystem architecture - overview
[Diagram: two storage clusters, each comprising a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, together with DOM Shared Services (Unique ID, Signing, Logging)]
34
DOM storage subsystem architecture - access
Normal access/delivery is from local storage cluster
[Diagram: two storage clusters, each comprising a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, together with DOM Shared Services (Unique ID, Signing, Logging)]
35
DOM storage subsystem architecture - access
When a cluster is off-line then access/delivery is from a remote storage cluster
[Diagram: two storage clusters, each comprising a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, together with DOM Shared Services (Unique ID, Signing, Logging)]
36
DOM storage subsystem architecture - ingest
Normal ingest is to the local storage cluster and then the remote cluster is synchronised
[Diagram: two storage clusters, each comprising a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, together with DOM Shared Services (Unique ID, Signing, Logging). Arrows: Store; Synchronise remote store]
37
DOM storage subsystem architecture - ingest
When a cluster is off-line then ingest is managed by the remote storage cluster and the local cluster is synchronised later
[Diagram: two storage clusters, each comprising a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, together with DOM Shared Services (Unique ID, Signing, Logging). Arrows: Store; Synchronise remote store later]
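The access and ingest behaviour described on the four preceding slides can be sketched with two peer clusters and a gateway that prefers the local one. This is a toy model under stated assumptions: the class names, the in-memory dictionaries and the queue of pending writes are hypothetical, and the real synchronisation mechanism is not described in the slides.

```python
from typing import Dict, List, Optional, Tuple


class Cluster:
    """One autonomous storage cluster that cross-synchronises with a peer."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.online = True
        self.objects: Dict[str, bytes] = {}
        self.pending: List[Tuple[str, bytes]] = []  # writes not yet sent to the peer
        self.peer: Optional["Cluster"] = None

    def store(self, domid: str, data: bytes) -> None:
        self.objects[domid] = data
        self.pending.append((domid, data))
        self.synchronise()

    def synchronise(self) -> None:
        # Push queued writes to the peer; if it is off-line they wait until later.
        if self.peer is None or not self.peer.online:
            return
        for domid, data in self.pending:
            self.peer.objects[domid] = data
        self.pending.clear()


class Gateway:
    """Routes requests to the local cluster, falling back to the remote one."""

    def __init__(self, local: Cluster, remote: Cluster) -> None:
        self.local = local
        self.remote = remote

    def ingest(self, domid: str, data: bytes) -> str:
        for cluster in (self.local, self.remote):
            if cluster.online:
                cluster.store(domid, data)
                return cluster.name
        raise RuntimeError("no storage cluster available")

    def fetch(self, domid: str) -> bytes:
        for cluster in (self.local, self.remote):
            if cluster.online and domid in cluster.objects:
                return cluster.objects[domid]
        raise KeyError(domid)


a, b = Cluster("A"), Cluster("B")
a.peer, b.peer = b, a
gateway = Gateway(local=a, remote=b)

gateway.ingest("obj-1", b"one")  # stored on A, then B is synchronised
a.online = False
gateway.ingest("obj-2", b"two")  # A is off-line: stored on B, queued for A
a.online = True
b.synchronise()                  # A catches up later
```

Because both clusters accept reads and writes in normal running, neither sits idle as a standby, which is the point made earlier about 100% of the kit delivering normal service.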
38
In conclusion
We plan for generations of physical storage
Migration from one generation to the next
Allow changes of supplier
Purchase incrementally in modest quantities
Move quickly when required
Be cost conscious
We provide assurance that an object is held and re-presented as it was when it was
ingested
We are designing a cost-effective large scale resilient solution
In summary: we take a long term view
39
Major Programmes/4
Web Archiving Programme
40
Structure of Programme
The Web Archiving Programme is a collaborative
initiative, implemented broadly across two
consortia
UK Web Archiving Consortium
Developing a selective approach to web archiving, procuring a common
web archiving infrastructure and software to begin archiving activities at
the earliest opportunity
International Internet Preservation Consortium
Developing advanced web archiving technologies for the long-term,
large-scale, continuous crawling requirements enabled through
legislation
41
UK Web Archiving Consortium
Developing a selective approach to web archiving
Licence for PANDAS about to be signed with NLA
Sub-licences with consortium partners and contractor to follow
ITT concluded with Magus Research winning the contract.
Implement a common web archiving infrastructure (lots of Linux
machines + PANDAS)
Provide customisation/development of PANDAS
Provide help desk and support
42
International Internet Preservation
Consortium
Developing advanced web archiving technologies
Smart Crawler
Continuous adaptive crawler, adjusting crawl priority on the fly
Based on IA Heritrix
Working on requirements now
Expect to begin tender process in June
Content Management
Archival formats
Framework
Metrics and Test Bed
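A continuous adaptive crawler that adjusts crawl priority on the fly can be illustrated with a priority queue whose entries are rescheduled after each visit. The sketch below is an assumption-laden illustration, not the Smart Crawler design and not Heritrix code: the priority is taken to be the time a page is next due, and the halve-or-double revisit rule is invented for the example.

```python
import heapq
import itertools
from typing import Dict, List, Optional, Tuple


class CrawlFrontier:
    """Priority queue of URLs whose priorities can change while crawling."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, str]] = []
        self._priority: Dict[str, float] = {}
        self._counter = itertools.count()  # tie-breaker for equal priorities

    def schedule(self, url: str, priority: float) -> None:
        """Add a URL or change its priority (lower value is crawled sooner)."""
        self._priority[url] = priority
        heapq.heappush(self._heap, (priority, next(self._counter), url))

    def next_url(self) -> Optional[str]:
        """Pop the most urgent URL, skipping entries made stale by rescheduling."""
        while self._heap:
            priority, _, url = heapq.heappop(self._heap)
            if self._priority.get(url) == priority:
                del self._priority[url]
                return url
        return None


def reschedule_after_visit(frontier: CrawlFrontier, url: str, now: float,
                           interval: float, changed: bool) -> float:
    """Revisit changed pages sooner and unchanged pages later (assumed rule)."""
    new_interval = max(interval / 2.0, 1.0) if changed else interval * 2.0
    frontier.schedule(url, now + new_interval)
    return new_interval


frontier = CrawlFrontier()
frontier.schedule("http://example.org/news", 10.0)
frontier.schedule("http://example.org/about", 5.0)
frontier.schedule("http://example.org/news", 1.0)  # priority raised on the fly
print(frontier.next_url())
```

Stale heap entries are discarded lazily on pop, which keeps rescheduling cheap at the cost of some extra entries in the queue.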
43
External Collaboration
44
Digital Library Collaborations/Partnerships
Current
UK Digital Preservation Coalition
Founder Member
TEL (The European Library Project)
Web Archiving UK
JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library Of Wales
International Internet Preservation Consortium
BNF, Library Of Congress, Internet Archive, National Archives & Library Of Canada, National Library Of
Australia, National Library Of Italy, National Libraries Of Nordic Countries
JISC Funded - Digital Curation Centre
Persistent Identifiers
DOI Foundation, European National Libraries (KB & DDB)
Resource Discovery
Union Catalogues (SUNCAT)
Digital Library Federation
45
Digital Library Collaborations/Partnerships
Potential
Secure Legal Deposit Network
6 Legal Deposit Libraries
Global Digital Format Registry
Potential Partners (National Archives, DLF)
Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
KB (Netherlands National Library & Other Partners – FP6 Bid)
Digital Rights Management
Potential Partners (Publishers, JISC)
Metadata
Publishers, Others ?
Authentication
JISC ?
Resource Discovery
Search Engine Vendors, Researchers
Others ???
46
Conclusions
• Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success
• Can You Work With Us?
47
Slide 27
British Computer Society
North London Branch
Major Programmes
Richard Boulderstone
July 27, 2004
1
Agenda
•
•
•
•
•
•
•
•
•
The British Library
Vision
Our Audiences/Customers
ILS
Digitisation
Digital Object Management
Web Archiving
Collaboration
Conclusions
Magna Carta
www.bl.uk
2
What Is The British Library ?
Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of
Science and Invention (1855), National Central Library (1916), and National
Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India
Office Library and Records in 1982, and British Institute of Recorded Sound in
1983
• Flagship building at St Pancras - largest public building project in Great
Britain in 20th century - opened in 1998
•
www.bl.uk
3
World-Class Research Library
Key Statistics 2002/3
•
•
•
•
•
•
•
•
•
•
150 million items
8.2 million items consulted or supplied
408,000 reading room visits
618,000 catalogue records created
554,000 items received on legal deposit
651 km shelf capacity 92% full add 12 km each year
18.5M Web Site Hits (www.bl.uk)
2,400 staff
£85.2 million Grant in Aid and £27.0 million trading income in 2001/2
Annual report - http://www.bl.uk/about/annual/latest.html
www.bl.uk
4
‘The World’s Knowledge’
Outcome Based Vision
… by aiding scientific advances
… by adding commercial value for
businesses
… by contributing to UK “knowledge
economy”
… through the pursuit of academic
excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural
wellbeing
… by giving information relevant to their
interests
… by helping to find the next medical
breakthrough
… by creating a link between the past,
present and future
To help people
advance
knowledge to
enrich lives
www.bl.uk
5
High
R+D
Industries
Prof.
Services
Creative
Industries Publishing
Industries
Resource Discovery
Bespoke Services
Research Services
Document Supply
Reprographics
Innovation Centre
School
Students
Libraries Teachers 11>18
On-site Visits
School Tours
Web Learning
SMEs
EDUCATION
Lifelong
Learner
Exhibitions
Events
Tours
Publishing
Visitors
(child + adult)
Lifelong
Learner
PUBLIC
BUSINESS
Lifelong
Learner
Postgraduate/
Undergraduate
RESEARCHER
Librarians
Scholars
Lifelong
Learner
Reading Rooms
Bespoke Services
Reprographics
Publishing
Document Supply
Searching Tools
LIBRARIES
Commercial
Researcher
Public
Libraries
Broadcasting
e.g. BBC
Publishing
e.g. OED
Document Supply
Resource Discovery
Training
Best Practice
H.E.
Libraries
Public
www.bl.uk
6
2
Major Programmes/1
Da Vinci Notebook
Integrated Library System (ILS)
Programme
7
ILS: Development
Data migration
Due to finish in a few days
16M+ BL records
10M+ records from other sources
Online ILS software
All online changes made (mainly interfaces) – final tests
Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
Most ones done for go live
Rest in priority order
www.bl.uk
8
ILS: Implementation
Training
Courses to end-users well underway
‘Practice’ system available
‘Search only’ training also underway
Testing
Functional testing (end to end) nearly complete
Performance poor – OPAC very slow
Automated stress testing (LoadRunner scripts)
eIS trying to find area of problem
Ex Libris experts flying over
Some security ‘hardening’ needed
www.bl.uk
9
ILS: Cutover from legacy systems
Now: Temporary Aleph cataloguing
7 June: Phase 1 – internal processing
Staggered take-on of users to ease cutover problems
Merge ‘temporary’ records
30 June: Phase 2 – reading rooms
Reading rooms closed for cutover 26-29 June
Mainly brand-new PCs etc. rather than XP upgrade
30 July: Phase 3 – remote users
Could be delayed by major problems
10
11
Future ILS development (ILS/2)
Current ILS development seen just as the start
Extra records
E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
E.g. Preservation records
Links to other new BL systems
E.g. Digital Object Management (images, web pages etc)
New releases of Ex Libris packages
12
Major Programmes/2
International Dunhuang Project
Digitisation Programme
13
Background
Digitisation Is The Process Of Converting Existing Physical Items Into Digital
Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character
Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image to Sub £1/Image.
Automatic Devices To Digitise Are Becoming Available But Are Expensive (~£150,000).
BL Has Had Fairly Ad Hoc Approach Driven By
External Funding Opportunities
Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project
Management
BL Has Created About 1.5M Digital Images So Far…
14
Digitisation Strategy
Digitisation Strategy Project Was Formally Initiated On February 2, 2004
Key objectives for the project are to define:
Selection Criteria
Uniform Approach
Communications Plan
Sustainability
Intellectual Property Rights
External Relationship Management
Funding
Integration with DOMS
15
Project Status Information
Definitive Register of Projects
19 Complete
19 Current
20 Planning
JISC Sound (3,900 Hours)
JISC Newspapers (2M Pages of 750M Pages)
Chopin (Collaborative Project)
Early English Books Online
16
Major Programmes/3
Gutenberg Bible
Digital Object Management (DOM)
Programme
17
DOM Programme vision
Our mission is to enable the United Kingdom to preserve
and use its digital intellectual property forever
Our vision is to create a management system for digital objects
that will
store and preserve any type of digital material in perpetuity
provide access to this material to users with appropriate permissions
ensure that the material is easy to find
ensure that users can view the material with contemporary applications
ensure that users can, where possible, experience material with the original look-and-feel
18
Introduction - history
Digital Library PFI
Mar 1997 – Dec 1998
Digital Library System
1999 – early 2002
Lessons
DOM Report
Nov 2002
The DOM Programme
Started September 2003
19
Drivers for the BL DOM Programme
Legal deposit legislation for non-print material received Royal Assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12T of material per year (with 50 year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….
We need a generic and cost-effective approach for the secure long term
storage of digital material that is produced by numerous initiatives
20
DOM – many topics to address
[Chart: DOM components plotted by estimated size of component (low to high) against component ambiguity / complexity (low to high)]
Components: LDEP, Workflow, Resource Discovery, LDLSE, VDEP, Rights Management, Metadata Definition, Technical Requirements, File Format Registry, Persistent Identifiers, File Conversion Utilities, Authentication, Prototypes, SDM, RADM, Interfaces, Strategy Development
Also shown: ILS, Web Archiving, Digitisation Programme
Legend: Started, Planned, Non-DOM projects, Planned co-operation
LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications
21
Scope - life cycle of objects
Collection: Selection, Acquisition, Accession, Description
Preservation: Storage, Preservation
Access: Resource discovery, Delivery, Rendering
22
Scope – objects and processes
Preservation store
Preserves the bit stream in perpetuity
Access store
Access versions
Limited formats – in the flavour of the era
Metadata to support resource discovery
Descriptive, Administrative, Links with existing tools e.g. Integrated
Library System (ILS)
Workflow
Ingest, e.g. Legal Deposit processing
23
[Diagram: DOM in context, with ingest sources feeding DOM and DOM feeding access]
ACCESS: Resource Discovery, Delivery, Digital Rights Management
DOM shared services: Storage, Signing, Authentication, Metadata, Persistent ID
INGEST sources: DONATIONS, DOCUMENT SUPPLY, LEGAL DEPOSIT, WEB ARCHIVING, DIGITISATION
Other labels: Publishers, Archives, Grey Literature, Non-Serial Store, Archiving, Operational Stores, LDL Secure Environment, Legal Deposit Items, Legal Deposit Processing, St Pancras Studios, Newspapers, NSA
24
Timeline
[Timeline chart, 2003 to 2005]
BC: ET approve Business Case & Timeline
R0 (Definition): Prototype will provide a basic preservation-quality digital object storage module
R1 (Operational Storage Sub System):
• Consolidate R0 into operational system
• Provide preservation-quality digital store for materials received under Voluntary Deposit of Electronic Publications (VDEP)
• Integrate it with the existing VDEP front-end
R2 (1st Content Stream ingest):
• Support ingest for a major content stream
• Integrate with core Library systems as required
R3 & R4 (LDEP - initial format): Provide functionality for material covered by LDEP secondary legislation
R5+: Open DOM to new projects
25
DOM: Project definition - 1
Functional architecture (“what”)
Prototyping – assessing market solutions
Example issues: digital rights, file formats, etc.
Logical architecture (“how – overall architecture”)
Prototyping – basic functioning architecture
Example issues: allow changes to new suppliers, relationships to ILS, other projects, etc.
Physical architecture (“how – storage & specifics”)
Prototyping – principal solutions and options
Example issues: how do we build it cost-effectively today, supplier selection criteria
• Business case
• Planning – incremental implementation phases
Cross team workshops – reviewing progress, debating detailed technical issues, planning immediate priorities, risk management & way forward
26
DOM: Project definition - 2
Approach is to be incremental and not ‘Big Bang’
We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a
good solution
A principal goal is to define:
An overall long term “logical architecture”
Within which, there will be successive generations of physical architectures
We are building an understanding of the storage marketplace, and we will use that knowledge to manage procurement
We are certain that we will need >500T of storage but we are uncertain when – we thus need
flexible scalable procurement
27
DOM architecture - overview
[Diagram: DOM architecture overview]
Users of the DOM Storage Service: Resource Discovery (ILS-based RD, Non-cat, Others), Rights Management, LDEP, Doc supply
Labels between users and service: DOMID, OBJECT
DOM Storage Service: compound objects/relations, integrity, authenticity
Object: atomic objects, unique persistent identifier (DOMID), local resource locator
DOMID is mapped to node/vol/LRL
DOM Physical Storage
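The note that a DOMID is mapped to node/vol/LRL can be sketched as a small lookup table. This is only an illustration, not the BL design: the class names, the UUID-based identifier format and the example locations are all assumptions, since the slides do not say how DOMIDs are built or stored.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    """Where one copy of an object lives (field names are illustrative)."""
    node: str    # storage node within a cluster
    volume: str  # volume on that node
    lrl: str     # local resource locator, e.g. a path on the volume


class DomIdRegistry:
    """Maps persistent DOMIDs to their current physical locations."""

    def __init__(self):
        self._locations = {}

    def register(self, location):
        # Hypothetical ID format: the slides do not define one.
        dom_id = f"domid:{uuid.uuid4()}"
        self._locations[dom_id] = location
        return dom_id

    def resolve(self, dom_id):
        return self._locations[dom_id]

    def relocate(self, dom_id, new_location):
        # The object moves to new hardware; the DOMID stays the same.
        if dom_id not in self._locations:
            raise KeyError(dom_id)
        self._locations[dom_id] = new_location
```

Keeping the identifier separate from the location is what would let physical storage be replaced generation by generation without changing any reference held by the ILS or other systems.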
28
DOM System (release 3)
[Diagram: DOM System (release 3)]
Labels: Aleph, Mailroom, Publishers, Access, Administration, Shared services, Storage subsystem (shown twice)
29
DOM logical architecture –
integrity and authenticity
Integrity:
System has capability to continuously monitor the object store to detect
object corruption
It would then initiate object recovery
Authenticity:
A process is defined to provide long-term assurance that an object that
is re-presented is as it was when it was ingested
Based on the use of cryptographic signing techniques
Each object is signed when it is ingested
The signature is verified when required
The signing mechanism is “tightly” controlled
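The sign-on-ingest, verify-on-demand cycle described above can be sketched as follows. The slides do not name the signing scheme, so this example swaps in HMAC-SHA256 from the Python standard library, which is a keyed hash rather than a public-key signature; the function names and the dictionary-based store are assumptions made for illustration.

```python
import hashlib
import hmac


def sign_object(data: bytes, key: bytes) -> str:
    """Return a hex signature computed over the object's bytes at ingest."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify_object(data: bytes, key: bytes, signature: str) -> bool:
    """Check that the bytes being re-presented match the ingest signature."""
    expected = sign_object(data, key)
    return hmac.compare_digest(expected, signature)


def find_corrupt(store: dict, signatures: dict, key: bytes) -> list:
    """Scan the whole store and list IDs whose content no longer verifies."""
    return [
        obj_id
        for obj_id, data in store.items()
        if not verify_object(data, key, signatures[obj_id])
    ]
```

A periodic call to `find_corrupt` stands in for the continuous monitoring of the object store; any ID it returns would be a candidate for recovery from another copy. Tight control of the signing mechanism corresponds here to restricting who can read `key`.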
30
Procuring physical storage in volume
A major cost is in physical storage
The market for storage systems is changing rapidly, and this implies that “lock-in” is
not sensible
We thus need flexibility to change supplier over time
Cost of storage is reducing by 30-40% per year
Hence procure on rolling basis just ahead of demand
Replace storage on a rolling basis on expiry of warranty
The rolling programmes imply the need to be able to support a heterogeneous
product solution
The design of the logical architecture thus supports storage sourced from multiple
storage vendors
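The case for procuring on a rolling basis can be checked with simple arithmetic. The sketch below compares buying all capacity up front with buying each year's capacity at that year's price, assuming a constant annual price decline. The demand profile and price per terabyte are made-up inputs, and warranty-driven replacement is ignored.

```python
def upfront_cost(yearly_demand_tb, price_per_tb):
    """Buy all capacity in year 0 at today's price."""
    return sum(yearly_demand_tb) * price_per_tb


def rolling_cost(yearly_demand_tb, price_per_tb, annual_decline):
    """Buy each year's capacity at that year's (lower) price."""
    total = 0.0
    for year, demand in enumerate(yearly_demand_tb):
        total += demand * price_per_tb * (1 - annual_decline) ** year
    return total


# Hypothetical inputs: 100 TB needed in each of five years, 35% yearly decline.
demand = [100] * 5
saving = upfront_cost(demand, 1000) - rolling_cost(demand, 1000, 0.35)
```

With these inputs the rolling purchases come to roughly half the up-front figure, which is why the uncertainty about when the >500T will be needed matters less than it first appears.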
31
Disaster tolerance and the organisation
of storage clusters
One can obtain commercial disaster recovery (DR) solutions for common equipment
configurations
However, one cannot obtain such solutions for multi-100 TB systems
So we must build DR into the design of the system
A single site solution, subject to a common-mode disaster, would suffer considerable loss
of availability after a disaster, and so is not acceptable
This implies that we need a multi-site solution
Conventionally these are based on a master-standby arrangement, where only 50% of the kit is delivering normal service
Our design is based on the use of multiple autonomous independent peer clusters that
cross-synchronise
so 100% of the kit delivers normal service
32
DOM architecture in the context of the
storage solution market
The dominant segment of the market focuses on delivering performance within
a highly resilient single cluster
However:
Many of our objects will be rarely accessed
so we do not want to pay for “maximised” performance we do not need
We have resilience by using multiple clusters, hence we have a reduced
need for resilience within a cluster
so we do not want to pay for “maximised” resilience we do not need
We are using these drivers to design a cost-effective large scale resilient
solution
33
DOM storage subsystem architecture - overview
[Diagram: two storage clusters, each made up of a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, both using DOM Shared Services (Unique ID, Signing, Logging)]
34
DOM storage subsystem architecture - access
Normal access/delivery is from local storage cluster
[Diagram: two storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical Storage) and DOM shared central services (Unique ID, Signing, Logging)]
35
DOM storage subsystem architecture - access
When a cluster is off-line then access/delivery is from a remote storage cluster
[Diagram: two storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical Storage) and DOM shared central services (Unique ID, Signing, Logging)]
36
DOM storage subsystem architecture - ingest
Normal ingest is to the local storage cluster and then the remote cluster is synchronised
[Diagram: two storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical Storage) and DOM shared central services (Unique ID, Signing, Logging), with arrows labelled "Store" and "Synchronise remote store"]
37
DOM storage subsystem architecture - ingest
When a cluster is off-line then ingest is managed by the remote storage cluster and the local cluster is synchronised later
[Diagram: two storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical Storage) and DOM shared central services (Unique ID, Signing, Logging), with arrows labelled "Store" and "Synchronise remote store later"]
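The access and ingest behaviour on the last four slides can be sketched as a gateway in front of two peer clusters. This is a toy in-memory model under stated assumptions: the class and method names are invented, each cluster is a dictionary, and synchronisation is a simple copy rather than whatever protocol the real system uses.

```python
class Cluster:
    """One autonomous storage cluster (an in-memory stand-in)."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.objects = {}


class Gateway:
    """Routes requests to the local cluster, falling back to the remote peer."""

    def __init__(self, local, remote):
        self.local = local
        self.remote = remote
        self.pending = []  # (cluster, dom_id) pairs awaiting synchronisation

    def ingest(self, dom_id, data):
        primary, peer = self._pick()
        primary.objects[dom_id] = data
        if peer.online:
            peer.objects[dom_id] = data          # synchronise the peer now
        else:
            self.pending.append((peer, dom_id))  # synchronise it later
        return primary.name

    def fetch(self, dom_id):
        # Prefer the local cluster; use the remote one if local cannot serve.
        for cluster in (self.local, self.remote):
            if cluster.online and dom_id in cluster.objects:
                return cluster.objects[dom_id]
        raise KeyError(dom_id)

    def resync(self):
        """Copy deferred objects to peers that have come back on-line."""
        still_pending = []
        for cluster, dom_id in self.pending:
            if cluster.online:
                source = self.remote if cluster is self.local else self.local
                cluster.objects[dom_id] = source.objects[dom_id]
            else:
                still_pending.append((cluster, dom_id))
        self.pending = still_pending

    def _pick(self):
        if self.local.online:
            return self.local, self.remote
        if self.remote.online:
            return self.remote, self.local
        raise RuntimeError("no storage cluster is on-line")
```

Because either cluster can accept ingest and serve access, both deliver normal service all the time, in contrast to a master-standby pair.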
38
In conclusion
We plan for generations of physical storage
Migration from one generation to the next
Allow changes of supplier
Purchase incrementally in modest quantities
Move quickly when required
Be cost conscious
We provide assurance that an object is held and re-presented as when it was
ingested
We are designing a cost-effective large scale resilient solution
In summary: we take a long term view
39
Major Programmes/4
Web Archiving Programme
40
Structure of Programme
Web Archiving Programme is a collaborative initiative, implemented broadly across two consortia
UK Web Archiving Consortium
Developing a selective approach to web archiving, procuring a common web archiving infrastructure and software to begin archiving activities as early as possible
International Internet Preservation Consortium
Developing advanced web archiving technologies for the long-term, large-scale, continuous crawling requirements enabled through legislation
41
UK Web Archiving Consortium
Developing a selective approach to web archiving
License for PANDAS about to be signed with NLA
Sub-licenses with consortium partners and contractor to follow
ITT concluded with Magus Research winning the contract.
Implement a common web archiving infrastructure (lots of Linux
machines + PANDAS)
Provide customisation/development of PANDAS
Provide help desk and support
42
International Internet Preservation
Consortium
Developing advanced web archiving technologies
Smart Crawler
Continuous adaptive crawler, adjusting crawl priority on the fly
Based on IA Heritrix
Working on requirements now
Expect to begin tender process in June
Content Management
Archival formats
Framework
Metrics and Test Bed
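One way to picture the Smart Crawler's "adjusting crawl priority on the fly" is a priority queue whose revisit interval shrinks for pages that changed and grows for pages that did not. This is a generic sketch, not the Heritrix or IIPC design: the class name, the halving and doubling rule, and the interval bounds are all assumptions.

```python
import heapq


class AdaptiveFrontier:
    """Crawl queue whose revisit interval adapts to how often a page changes."""

    def __init__(self, min_interval=1.0, max_interval=64.0):
        self.min_interval = min_interval
        self.max_interval = max_interval
        self._heap = []       # (next_due_time, url)
        self._interval = {}   # url -> current revisit interval

    def add(self, url, now=0.0, interval=8.0):
        self._interval[url] = interval
        heapq.heappush(self._heap, (now, url))

    def next_url(self):
        """Pop the URL that is due soonest, returning (due_time, url)."""
        return heapq.heappop(self._heap)

    def report(self, url, now, changed):
        """Reschedule after a fetch: changed pages are revisited sooner."""
        interval = self._interval[url]
        if changed:
            interval = max(self.min_interval, interval / 2)
        else:
            interval = min(self.max_interval, interval * 2)
        self._interval[url] = interval
        heapq.heappush(self._heap, (now + interval, url))
        return interval
```

A crawl loop would call `next_url`, fetch the page, compare it with the archived copy, and pass the result to `report`, so that frequently changing sites rise in the queue without any manual tuning.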
43
External Collaboration
44
Digital Library Collaborations/Partnerships
Current
UK Digital Preservation Coalition
Founder Member
TEL (The European Library Project)
Web Archiving UK
JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library Of Wales
International Internet Preservation Consortium
BNF, Library Of Congress, Internet Archive, National Archives & Library Of Canada, National Library Of
Australia, National Library Of Italy, National Libraries Of Nordic Countries
JISC Funded - Digital Curation Centre
Persistent Identifiers
DOI Foundation, European National Libraries (KB & DDB)
Resource Discovery
Union Catalogues (SUNCAT)
Digital Library Federation
45
Digital Library Collaborations/Partnerships
Potential
Secure Legal Deposit Network
6 Legal Deposit Libraries
Global Digital Format Registry
Potential Partners (National Archives, DLF)
Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
KB (Netherlands National Library & Other Partners – FP6 Bid)
Digital Rights Management
Potential Partners (Publishers, JISC)
Metadata
Publishers, Others ?
Authentication
JISC ?
Resource Discovery
Search Engine Vendors, Researchers
Others ???
46
Conclusions
• Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success
• Can You Work With Us?
47
Slide 28
British Computer Society
North London Branch
Major Programmes
Richard Boulderstone
July 27, 2004
1
Agenda
•
•
•
•
•
•
•
•
•
The British Library
Vision
Our Audiences/Customers
ILS
Digitisation
Digital Object Management
Web Archiving
Collaboration
Conclusions
Magna Carta
www.bl.uk
2
What Is The British Library ?
Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of
Science and Invention (1855), National Central Library (1916), and National
Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India
Office Library and Records in 1982, and British Institute of Recorded Sound in
1983
• Flagship building at St Pancras - largest public building project in Great
Britain in 20th century - opened in 1998
•
www.bl.uk
3
World-Class Research Library
Key Statistics 2002/3
•
•
•
•
•
•
•
•
•
•
150 million items
8.2 million items consulted or supplied
408,000 reading room visits
618,000 catalogue records created
554,000 items received on legal deposit
651 km shelf capacity 92% full add 12 km each year
18.5M Web Site Hits (www.bl.uk)
2,400 staff
£85.2 million Grant in Aid and £27.0 million trading income in 2001/2
Annual report - http://www.bl.uk/about/annual/latest.html
www.bl.uk
4
‘The World’s Knowledge’
Outcome Based Vision
… by aiding scientific advances
… by adding commercial value for
businesses
… by contributing to UK “knowledge
economy”
… through the pursuit of academic
excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural
wellbeing
… by giving information relevant to their
interests
… by helping to find the next medical
breakthrough
… by creating a link between the past,
present and future
To help people
advance
knowledge to
enrich lives
www.bl.uk
5
High
R+D
Industries
Prof.
Services
Creative
Industries Publishing
Industries
Resource Discovery
Bespoke Services
Research Services
Document Supply
Reprographics
Innovation Centre
School
Students
Libraries Teachers 11>18
On-site Visits
School Tours
Web Learning
SMEs
EDUCATION
Lifelong
Learner
Exhibitions
Events
Tours
Publishing
Visitors
(child + adult)
Lifelong
Learner
PUBLIC
BUSINESS
Lifelong
Learner
Postgraduate/
Undergraduate
RESEARCHER
Librarians
Scholars
Lifelong
Learner
Reading Rooms
Bespoke Services
Reprographics
Publishing
Document Supply
Searching Tools
LIBRARIES
Commercial
Researcher
Public
Libraries
Broadcasting
e.g. BBC
Publishing
e.g. OED
Document Supply
Resource Discovery
Training
Best Practice
H.E.
Libraries
Public
www.bl.uk
6
2
Major Programmes/1
Da Vinci Notebook
Integrated Library System (ILS)
Programme
7
ILS: Development
Data migration
Due to finish in a few days
16M+ BL records
10M+ records from other sources
Online ILS software
All online changes made (mainly interfaces) – final tests
Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
Most ones done for go live
Rest in priority order
www.bl.uk
8
ILS: Implementation
Training
Courses to end-users well underway
‘Practice’ system available
‘Search only’ training also underway
Testing
Functional testing (end to end) nearly complete
Performance poor – OPAC very slow
Automated stress testing (LoadRunner scripts)
eIS trying to find area of problem
Ex Libris experts flying over
Some security ‘hardening’ needed
www.bl.uk
9
ILS: Cutover from legacy systems
Now:
Temporary Aleph cataloguing
Phase 1 – internal processing
Staggered take-on of users to ease cutover problems
Merge ‘temporary’ records
7 June:
Phase 2 – reading rooms
Reading rooms closed for cutover 26-29 June
Mainly brand-new PCs etc rather than XP upgrade
30 June:
Phase 3 – remote users
Could be delayed major problems
30 July:
www.bl.uk
10
www.bl.uk
11
Future ILS development (ILS/2)
Current ILS development seen just as the start
Extra records
E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
E.g. Preservation records
Links to other new BL systems
E.g. Digital Object Management (images, web pages etc)
New releases of Ex Libris packages
www.bl.uk
12
Major Programmes/2
International Dunhuang Project
Digitisation Programme
13
Background
Digitisation Is The Process Of Converting Existing Physical Items Into Digital
Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character
Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image to Sub £1/Image.
Automatic Devices To Digitise Are Becoming Availalble But Are Expensive ~£150,000.
BL Has Had Fairly Ad Hoc Approach Driven By
External Funding Opportunities
Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project
Management
BL Has Created About 1.5M Digital Images So Far…
www.bl.uk
14
Digitisation Strategy
Digitisation Strategy Project Was Formerly Initiated On February 2, 2004
Key objectives for the project are to define:
Selection Criteria
Uniform Approach
Communications Plan
Sustainability
Intellectual Property Rights
External Relationship Management
Funding
Integration with DOMS
www.bl.uk
15
Project Status Information
Definitive Register of Projects
19 Complete
19 Current
20 Planning
JISC Sound (3,900 Hours)
JISC Newspapers (2M Pages of 750M Pages)
Chopin (Collaborative Project)
Early English Books Online
www.bl.uk
16
Major Programmes/3
Gutenberg Bible
Digital Object Management (DOM)
Programme
17
DOM Programme vision
Our mission is to enable the United Kingdom to preserve
and use its digital intellectual property forever
Our vision is create a management system for digital objects
that will
store and preserve any type of digital material in perpetuity
provide access to this material to users with appropriate permissions
ensure that the material is easy to find
ensure that users can view the material with contemporary applications
ensure that users can, where possible, experience material with the original look-and-feel
www.bl.uk
18
Introduction - history
Digital Library PFI
Mar 1997 – Dec 1998
Digital Library System
1999 – early 2002
Lessons
DOM Report
Nov 2002
The DOM Programme
Started September 2003
www.bl.uk
19
Drivers for the BL DOM Programme
Legal deposit legislation for non-print material was granted royal assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12T of material per year (with 50 year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….
We need a generic and cost-effective approach for the secure long term
storage of digital material that is produced by numerous initiatives
www.bl.uk
20
DOM – many topics to address
HIGH
ILS
WEB
ARCHIVING
DIGITISATION
PROGRAMME
ESTIMATED SIZE OF COMPONENT
LDEP
WORKFLOW
RESOURCE
DISCOVERY
LDLSE
VDEP
RIGHTS
MANAGEMENT
METADATA
DEFINITION
TECHNICAL
REQUIREMENTS
FILE
FORMAT
REGISTRY
PERSISTENT
IDENTIFIERS
FILE
CONVERSION
UTILITIES
AUTHENTICATION
PROTOTYPES
SDM
RADM
INTERFACES
STRATEGY
DEVELOPMENT
LOW
LOW
COMPONENT AMBIGUITY / COMPLEXITY
Started
Planned
Non-DOM projects
Planned co-operation
HIGH
LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications
www.bl.uk
21
Scope - life cycle of objects
Collection
Selection
Acquisition
Accession
Description
Preservation
Storage
Preservation
Access
Resource discovery
Delivery
Rendering
www.bl.uk
22
Scope – objects and processes
Preservation store
Preserves the bit stream in perpetuity
Access store
Access versions
Limited formats – in the flavour of the era
Metadata to support resource discovery
Descriptive, Administrative, Links with existing tools e.g. Integrated
Library System (ILS)
Workflow
Ingest, e.g. Legal Deposit processing
www.bl.uk
23
ACCESS
DOM
Resource Discovery
Delivery
Digital Rights Management
Shared services
DOM
Storage
Signing
Authentication
Metadata
Persistent ID
Ingest
DONATIONS
DOCUMENT SUPPLY
Publishers
Archives
Grey Literature
Non-Serial
Store
Archiving
Operational
Stores
LEGAL DEPOSIT
LDL Secure
Environment
Legal Deposit
Items
Legal Deposit
Processing
WEB
ARCHIVING
DIGITISATION
St Pancras
Studios
Newspapers
NSA
www.bl.uk
24
Timeline
Prototype will provide a
basic preservationquality digital object
storage module
Definition. R0
ET approve
Business Case
& Timeline
•Consolidate R0 into operational
system
•Provide preservation-quality digital
store for materials received under
Voluntary Deposit of Electronic
Publications (VDEP)
•Integrate it with the existing VDEP
front-end
BC
R0
Operat’l Storage
Sub System. R1
•Support ingest for a
major content stream
•Integrate with core
Library systems as
required
R1
1st Content
Stream ingest. R2
R2
LDEP - initial
format. R3 & R4
Provide functionality
for material covered by
LDEP secondary
legislation
R3
R4
Open DOM to new
projects. R5+
2003
2004
2005
www.bl.uk
25
DOM: Project definition - 1
Example issues
Functional Architecture “What”
digital rights, file
formats, etc
Prototyping – assessing market solutions
allow changes to
new suppliers,
relationships to ILS,
other projects etc
Logical architecture “how – overall architecture”
Prototyping - basic functioning architecture
how do we build it
cost-effectively
today, supplier
selection criteria
Physical architecture “how – storage & specifics”
Prototyping - principal solutions and options
• Business case
• Planning – incremental
implementation phases
Cross team workshops – reviewing progress, debating detailed technical
issues, planning immediate priorities, risk management & way forward
www.bl.uk
26
DOM: Project definition - 2
Approach is to be incremental and not ‘Big Bang’
We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a
good solution
A principal goal is to define:
An overall long term “logical architecture”
Within which, there will be successive generations of physical architectures
We are understanding the storage marketplace, and we will use the knowledge to manage
procurement
We are certain that we will need >500T of storage but we are uncertain when – we thus need
flexible scalable procurement
www.bl.uk
27
DOM architecture - overview
Resource Discovery
Non-cat
ILS
based RD
Others
Rights Management
LDEP
DOMID
OBJECT
DOM Storage Service
Compound
objects/relations
Integrity
Authenticity
Local resource locator
Object
Atomic
Objects
Unique persistent
identifier (DOMID)
Doc
supply
DOMID is
mapped to
node/vol/LRL
DOM Physical Storage
www.bl.uk
28
DOM System (release 3)
Aleph
Storage subsystem
Storage
subsystem
Access
Mailroom
Publishers
Shared services
Administration
DOM System
www.bl.uk
29
DOM logical architecture –
integrity and authenticity
Integrity:
System has capability to continuously monitor the object store to detect
object corruption
It would then initiate object recovery
Authenticity:
A process is defined to provide long-term assurance that an object that
is re-presented is as it was when it was ingested
Based on the use of cryptographic signing techniques
Each object is signed when it is ingested
The signature is verified when required
The signing mechanism is “tightly” controlled
www.bl.uk
30
Procuring physical storage in volume
A major cost is in physical storage
The market for storage systems is changing rapidly, and this implies that “lock-in” is
not sensible
We thus need flexibility to change supplier over time
Cost of storage is reducing by 30-40% per year
Hence procure on rolling basis just ahead of demand
Replace storage on a rolling basis on expiry of warranty
The rolling programmes imply the need to be able to support a heterogeneous
product solution
The design of the logical architecture thus supports storage sourced from multiple
storage vendors
www.bl.uk
31
Disaster tolerance and the organisation
of storage clusters
One can obtain commercial disaster recovery (DR) solutions for common equipment
configurations
However one cannot obtain such solutions for systems comprising multi-100 Tb systems
So we must build in the need for DR into the design of the system
A single site solution, subject to a common-mode disaster, would suffer considerable loss
of availability after a disaster, and so is not acceptable
This implies that we need a multi-site solution
Conventionally these are based on a master-standby where only 50% of kit is delivering
normal service
Our design is based on the use of multiple autonomous independent peer clusters that
cross-synchronise
so 100% of the kit delivers normal service
www.bl.uk
32
DOM architecture in the context of the
storage solution market
The dominant segment of the market focuses on delivering performance within
a highly resilient single cluster
However:
Many of our objects will be rarely accessed
so we do not want to pay for “maximised” performance we do not need
We have resilience by using multiple clusters, hence we have a reduced
need for resilience within a cluster
so we do not want to pay for “maximised” resilience we do not need
We are using these drivers to design a cost-effective large scale resilient
solution
www.bl.uk
33
DOM storage subsystem architecture - overview
DOM
Storage
gateway
DOM
Storage
gateway
DOM
Storage
Service
DOM
Storage
Service
DOM
Physical
Storage
Storage cluster
DOM
Shared
Services
• Unique ID
• Signing
• Logging
DOM
Physical
Storage
Storage cluster
www.bl.uk
34
DOM storage subsystem architecture - access
DOM
Storage
gateway
Normal access/delivery
is from local storage
cluster
DOM
Storage
Service
DOM
Physical
Storage
Storage cluster
DOM
Storage
gateway
DOM
Storage
Service
••
••
••
DOM
DOM
Shared
central
Services
Unique
Unique ID
ID
Signing
Signing
Logging
Logging
DOM
Physical
Storage
Storage cluster
www.bl.uk
35
DOM storage subsystem architecture - access
DOM
Storage
gateway
When a cluster is off-line
then access/delivery is
from a remote storage
cluster
DOM
Storage
Service
DOM
Physical
Storage
Storage cluster
DOM
Storage
gateway
DOM
Storage
Service
••
••
••
DOM
DOM
Shared
central
Services
Unique
Unique ID
ID
Signing
Signing
Logging
Logging
DOM
Physical
Storage
Storage cluster
www.bl.uk
36
DOM storage subsystem architecture - ingest
DOM
Storage
gateway
DOM
Storage
Service
Normal ingest is to the
local storage cluster and
then the remote cluster
is synchronised
Synchronise remote store
Store
DOM
Physical
Storage
Storage cluster
••
••
••
DOM
DOM
Shared
central
Services
Unique
Unique ID
ID
Signing
Signing
Logging
Logging
DOM
Storage
gateway
DOM
Storage
Service
DOM
Physical
Storage
Storage cluster
www.bl.uk
37
DOM storage subsystem architecture - ingest
DOM
Storage
gateway
DOM
Storage
Service
When a cluster is off-line
then ingest is managed by
the remote storage cluster
and the local cluster is
synchronised later
Synchronise remote store later
Store
DOM
Physical
Storage
Storage cluster
••
••
••
DOM
DOM
Shared
central
Services
Unique
Unique ID
ID
Signing
Signing
Logging
Logging
DOM
Storage
gateway
DOM
Storage
Service
DOM
Physical
Storage
Storage cluster
www.bl.uk
38
In conclusion
We plan for generations of physical storage
Migration from one generation to the next
Allow changes of supplier
Purchase incrementally in modest quantities
Move quickly when required
Be cost conscious
We provide assurance that an object is held and re-presented as when it was
ingested
We are designing a cost-effective large scale resilient solution
In summary: we take a long term view
www.bl.uk
39
Major Programmes/4
Web Archiving Programme
40
Structure of Programme
Web Archiving Programme is a collaborative
initiative, roughly implemented across two
consortiums
UK Web Archiving Consortium
Developing a selective approach to web archiving, procuring a common
web archiving infrastructure and software to begin archiving activities at
the earliest
International Internet Preservation Consortium
Developing advanced web archiving technologies for the long terms,
large scale, continuous crawling requirements enabled through
legislation
www.bl.uk
41
UK Web Archiving Consortium
Developing a selective approach to web archiving
License for PANDAS about to be signed with NLA
Sub-licenses with consortium partners and contractor to follow
ITT concluded with Magus Research winning the contract.
Implement a common web arching infrastructure (lots of Linux
machines + PANDAS)
Provide customisation/development of PANDAS
Provide help desk and support
www.bl.uk
42
International Internet Preservation
Consortium
Developing advanced web archiving technologies
Smart Crawler
Continuous adaptive crawler, adjusting crawl priority on the fly
Based on IA Heritrix
Working on requirements now
Expect to being tender process in June
Content Management
Archival formats
Framework
Metrics and Test Bed
www.bl.uk
43
External Collaboration
44
Digital Library Collaborations/Partnerships
Current
UK Digital Preservation Collation
Founder Member
TEL (The European Library Project)
Web Archiving UK
JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library Of Wales
International Internet Preservation Consortium
BNF, Library Of Congress, Internet Archive, National Archives & Library Of Canada, National Library Of
Australia, National Library Of Italy, National Libraries Of Nordic Countries
JISC Funded - Digital Curation Centre
Persistent Identifiers
DOI Foundation, European National Libraries (KB & DDB)
Resource Discovery
Union Catalogues (SUNCAT)
Digital Library Federation
45
Digital Library Collaborations/Partnerships
Potential
Secure Legal Deposit Network
6 Legal Deposit Libraries
Global Digital Format Registry
Potential Partners (National Archives, DLF)
Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
KB (Netherlands National Library & Other Partners – FP6 Bid)
Digital Rights Management
Potential Partners (Publishers, JISC)
Metadata
Publishers, Others ?
Authentication
JISC ?
Resource Discovery
Search Engine Vendors, Researchers
Others ???
46
Conclusions
• Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success
• Can You Work With Us?
47
Slide 29
British Computer Society
North London Branch
Major Programmes
Richard Boulderstone
July 27, 2004
1
Agenda
• The British Library
• Vision
• Our Audiences/Customers
• ILS
• Digitisation
• Digital Object Management
• Web Archiving
• Collaboration
• Conclusions
Magna Carta
2
What Is The British Library ?
• Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of
Science and Invention (1855), National Central Library (1916), and National
Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India
Office Library and Records in 1982, and British Institute of Recorded Sound in
1983
• Flagship building at St Pancras - largest public building project in Great
Britain in 20th century - opened in 1998
3
World-Class Research Library
Key Statistics 2002/3
• 150 million items
• 8.2 million items consulted or supplied
• 408,000 reading room visits
• 618,000 catalogue records created
• 554,000 items received on legal deposit
• 651 km shelf capacity, 92% full, adding 12 km each year
• 18.5M Web Site Hits (www.bl.uk)
• 2,400 staff
• £85.2 million Grant in Aid and £27.0 million trading income in 2001/2
• Annual report - http://www.bl.uk/about/annual/latest.html
4
‘The World’s Knowledge’
Outcome Based Vision
To help people advance knowledge to enrich lives
… by aiding scientific advances
… by adding commercial value for businesses
… by contributing to UK “knowledge economy”
… through the pursuit of academic excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural wellbeing
… by giving information relevant to their interests
… by helping to find the next medical breakthrough
… by creating a link between the past, present and future
5
[Diagram: the Library's audiences and the services offered to each. Groupings below are reconstructed from the diagram labels.]
BUSINESS: High R+D Industries, Prof. Services, Creative Industries, Publishing Industries, SMEs, Lifelong Learner. Services: Resource Discovery, Bespoke Services, Research Services, Document Supply, Reprographics, Innovation Centre
EDUCATION: School Libraries, Students and Teachers 11>18, Lifelong Learner. Services: On-site Visits, School Tours, Web Learning
PUBLIC: Visitors (child + adult), Lifelong Learner. Services: Exhibitions, Events, Tours, Publishing
RESEARCHER: Postgraduate/Undergraduate, Librarians, Scholars, Commercial Researcher, Lifelong Learner. Services: Reading Rooms, Bespoke Services, Reprographics, Publishing, Document Supply, Searching Tools
LIBRARIES: Public Libraries, H.E. Libraries. Services: Document Supply, Resource Discovery, Training, Best Practice
Also shown: Broadcasting (e.g. BBC), Publishing (e.g. OED)
6
2
Major Programmes/1
Da Vinci Notebook
Integrated Library System (ILS)
Programme
7
ILS: Development
Data migration
Due to finish in a few days
16M+ BL records
10M+ records from other sources
Online ILS software
All online changes made (mainly interfaces) – final tests
Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
Most done for go-live
Rest in priority order
8
ILS: Implementation
Training
Courses to end-users well underway
‘Practice’ system available
‘Search only’ training also underway
Testing
Functional testing (end to end) nearly complete
Performance poor – OPAC very slow
Automated stress testing (LoadRunner scripts)
eIS trying to locate the problem area
Ex Libris experts flying over
Some security ‘hardening’ needed
9
ILS: Cutover from legacy systems
Now:
Temporary Aleph cataloguing
7 June:
Phase 1 – internal processing
Staggered take-on of users to ease cutover problems
Merge ‘temporary’ records
30 June:
Phase 2 – reading rooms
Reading rooms closed for cutover 26-29 June
Mainly brand-new PCs etc rather than XP upgrade
30 July:
Phase 3 – remote users
Could be delayed by major problems
10
11
Future ILS development (ILS/2)
Current ILS development seen just as the start
Extra records
E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
E.g. Preservation records
Links to other new BL systems
E.g. Digital Object Management (images, web pages etc)
New releases of Ex Libris packages
12
Major Programmes/2
International Dunhuang Project
Digitisation Programme
13
Background
Digitisation Is The Process Of Converting Existing Physical Items Into Digital
Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character
Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image to Sub £1/Image.
Automatic Devices To Digitise Are Becoming Available But Are Expensive (~£150,000).
BL Has Had A Fairly Ad Hoc Approach, Driven By
External Funding Opportunities
Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project
Management
BL Has Created About 1.5M Digital Images So Far…
14
Digitisation Strategy
Digitisation Strategy Project Was Formally Initiated On February 2, 2004
Key objectives for the project are to define:
Selection Criteria
Uniform Approach
Communications Plan
Sustainability
Intellectual Property Rights
External Relationship Management
Funding
Integration with DOMS
15
Project Status Information
Definitive Register of Projects
19 Complete
19 Current
20 Planning
JISC Sound (3,900 Hours)
JISC Newspapers (2M Pages of 750M Pages)
Chopin (Collaborative Project)
Early English Books Online
16
Major Programmes/3
Gutenberg Bible
Digital Object Management (DOM)
Programme
17
DOM Programme vision
Our mission is to enable the United Kingdom to preserve
and use its digital intellectual property forever
Our vision is to create a management system for digital objects
that will
store and preserve any type of digital material in perpetuity
provide access to this material to users with appropriate permissions
ensure that the material is easy to find
ensure that users can view the material with contemporary applications
ensure that users can, where possible, experience material with the original look-and-feel
18
Introduction - history
Digital Library PFI
Mar 1997 – Dec 1998
Digital Library System
1999 – early 2002
Lessons
DOM Report
Nov 2002
The DOM Programme
Started September 2003
19
Drivers for the BL DOM Programme
Legal deposit legislation for non-print material was granted royal assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12T of material per year (with 50 year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….
We need a generic and cost-effective approach for the secure long term
storage of digital material that is produced by numerous initiatives
20
DOM – many topics to address
[Chart: components plotted by ESTIMATED SIZE OF COMPONENT (low to high) against COMPONENT AMBIGUITY / COMPLEXITY (low to high). Legend: Started, Planned, Non-DOM projects, Planned co-operation.]
Components shown: ILS, Web Archiving, Digitisation Programme, LDEP, Workflow, Resource Discovery, LDLSE, VDEP, Rights Management, Metadata Definition, Technical Requirements, File Format Registry, Persistent Identifiers, File Conversion Utilities, Authentication, Prototypes, SDM, RADM, Interfaces, Strategy Development
LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications
21
Scope - life cycle of objects
Collection
Selection
Acquisition
Accession
Description
Preservation
Storage
Preservation
Access
Resource discovery
Delivery
Rendering
22
Scope – objects and processes
Preservation store
Preserves the bit stream in perpetuity
Access store
Access versions
Limited formats – in the flavour of the era
Metadata to support resource discovery
Descriptive, Administrative, Links with existing tools e.g. Integrated
Library System (ILS)
Workflow
Ingest, e.g. Legal Deposit processing
23
[Diagram: DOM in context. Groupings below are reconstructed from the diagram labels.]
ACCESS: Resource Discovery, Delivery, Digital Rights Management
DOM shared services: Storage, Signing, Authentication, Metadata, Persistent ID, Ingest
Content sources:
DONATIONS
DOCUMENT SUPPLY: Publishers, Archives, Grey Literature, Non-Serial Store
LEGAL DEPOSIT: LDL Secure Environment, Legal Deposit Items, Legal Deposit Processing
WEB ARCHIVING
DIGITISATION: St Pancras Studios, Newspapers, NSA
Also shown: Archiving, Operational Stores
24
Timeline
[Timeline chart spanning 2003, 2004 and 2005, with milestones BC, R0, R1, R2, R3, R4 and R5+]
BC: ET approve Business Case & Timeline
R0 - Definition
• Prototype will provide a basic preservation-quality digital object storage module
R1 - Operational Storage Sub System
• Consolidate R0 into operational system
• Provide preservation-quality digital store for materials received under Voluntary Deposit of Electronic Publications (VDEP)
• Integrate it with the existing VDEP front-end
R2 - 1st Content Stream ingest
• Support ingest for a major content stream
• Integrate with core Library systems as required
R3 & R4 - LDEP, initial format
• Provide functionality for material covered by LDEP secondary legislation
R5+ - Open DOM to new projects
25
DOM: Project definition - 1
Functional architecture “what”
Prototyping – assessing market solutions
Example issues: digital rights, file formats, etc
Logical architecture “how – overall architecture”
Prototyping - basic functioning architecture
Example issues: allow changes to new suppliers, relationships to ILS, other projects etc
Physical architecture “how – storage & specifics”
Prototyping - principal solutions and options
Example issues: how do we build it cost-effectively today, supplier selection criteria
• Business case
• Planning – incremental implementation phases
Cross team workshops – reviewing progress, debating detailed technical
issues, planning immediate priorities, risk management & way forward
26
DOM: Project definition - 2
Approach is to be incremental and not ‘Big Bang’
We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a
good solution
A principal goal is to define:
An overall long term “logical architecture”
Within which, there will be successive generations of physical architectures
We are building an understanding of the storage marketplace, and we will use that knowledge to manage
procurement
We are certain that we will need >500T of storage but we are uncertain when – we thus need
flexible scalable procurement
27
DOM architecture - overview
[Diagram: DOM architecture overview, in three layers]
Client systems: Resource Discovery (ILS based RD, Non-cat, Others), Rights Management, LDEP, Doc supply. Clients refer to an OBJECT by its DOMID.
DOM Storage Service: Compound objects/relations, Integrity, Authenticity. Each Object has a unique persistent identifier (DOMID) and a local resource locator.
DOM Physical Storage: Atomic Objects. DOMID is mapped to node/vol/LRL.
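The statement that a DOMID is mapped to node/vol/LRL can be pictured as a lookup table kept by the storage service. The sketch below is a minimal illustration under that assumption; the class names, the identifier format and the idea of holding several copies per DOMID are hypothetical and are not taken from the DOM design.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Location:
    node: str    # storage node holding the copy
    volume: str  # volume on that node
    lrl: str     # local resource locator within the volume


class DomidResolver:
    """Maps a persistent DOMID to the physical locations of its copies."""

    def __init__(self) -> None:
        self._locations: Dict[str, List[Location]] = {}

    def register(self, domid: str, location: Location) -> None:
        """Record one more physical copy of the object."""
        self._locations.setdefault(domid, []).append(location)

    def relocate(self, domid: str, old: Location, new: Location) -> None:
        """A storage migration changes the location, never the DOMID."""
        copies = self._locations[domid]
        copies[copies.index(old)] = new

    def resolve(self, domid: str) -> List[Location]:
        """Return the known locations, or an empty list for an unknown DOMID."""
        return list(self._locations.get(domid, []))
```

Keeping the identifier separate from the location is what would let physical storage be replaced generation by generation without breaking references held by client systems.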
28
DOM System (release 3)
[Diagram: DOM System (release 3). Components shown: Aleph, Publishers, Mailroom, Access, Administration, Shared services and two Storage subsystems.]
29
DOM logical architecture –
integrity and authenticity
Integrity:
System has capability to continuously monitor the object store to detect
object corruption
It would then initiate object recovery
Authenticity:
A process is defined to provide long-term assurance that an object that
is re-presented is as it was when it was ingested
Based on the use of cryptographic signing techniques
Each object is signed when it is ingested
The signature is verified when required
The signing mechanism is “tightly” controlled
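The sign-at-ingest, verify-on-demand cycle described above can be sketched as follows. The slides do not name an algorithm, so this example uses HMAC-SHA256 from the Python standard library as a stand-in; a production system would more likely use a public-key digital signature, and the function names and the in-memory store here are hypothetical.

```python
import hashlib
import hmac


def sign_object(data: bytes, key: bytes) -> str:
    """Return a hex tag computed over the object's bytes at ingest."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify_object(data: bytes, key: bytes, tag: str) -> bool:
    """True if the bytes re-presented now match what was signed at ingest."""
    expected = sign_object(data, key)
    return hmac.compare_digest(expected, tag)


def scan_store(store: dict, tags: dict, key: bytes) -> list:
    """Integrity sweep: list the IDs of objects whose bytes no longer verify."""
    return [object_id for object_id, data in store.items()
            if not verify_object(data, key, tags[object_id])]
```

Running `scan_store` on a schedule corresponds to the continuous monitoring bullet: any identifier it returns would be a candidate for object recovery from another copy. Keeping `key` under strict control is the analogue of the "tightly" controlled signing mechanism.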
30
Procuring physical storage in volume
A major cost is in physical storage
The market for storage systems is changing rapidly, and this implies that “lock-in” is
not sensible
We thus need flexibility to change supplier over time
Cost of storage is reducing by 30-40% per year
Hence procure on rolling basis just ahead of demand
Replace storage on a rolling basis on expiry of warranty
The rolling programmes imply the need to be able to support a heterogeneous
product solution
The design of the logical architecture thus supports storage sourced from multiple
storage vendors
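The argument for buying just ahead of demand can be checked with simple arithmetic. The sketch below compares buying all capacity at today's price with buying each year's capacity in the year it is needed, while the unit price falls by a fixed fraction annually. The demand profile and the price per terabyte are made-up inputs; only the 30-40% annual decline comes from the slide. It ignores warranty-driven replacement and the cost of money.

```python
def rolling_cost(demand_tb_per_year, price_per_tb, annual_decline):
    """Total cost when each year's capacity is bought in that year."""
    total = 0.0
    for year, demand in enumerate(demand_tb_per_year):
        # Price in a given year after compounding the annual decline.
        total += demand * price_per_tb * (1 - annual_decline) ** year
    return total


def upfront_cost(demand_tb_per_year, price_per_tb):
    """Total cost when all capacity is bought at today's price."""
    return sum(demand_tb_per_year) * price_per_tb


if __name__ == "__main__":
    demand = [100, 100, 100, 100, 100]   # hypothetical: 500 TB over five years
    price = 1000.0                       # hypothetical price per TB today
    print(upfront_cost(demand, price))
    print(rolling_cost(demand, price, 0.35))
```

With these illustrative inputs and a 35% annual decline, the rolling purchases come to roughly half the up-front figure, which is the effect the slide relies on.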
31
Disaster tolerance and the organisation
of storage clusters
One can obtain commercial disaster recovery (DR) solutions for common equipment
configurations
However, one cannot obtain such solutions for systems of several hundred terabytes
So we must build DR into the design of the system
A single site solution, subject to a common-mode disaster, would suffer considerable loss
of availability after a disaster, and so is not acceptable
This implies that we need a multi-site solution
Conventionally these are based on a master-standby arrangement, where only 50% of the kit is delivering
normal service
Our design is based on the use of multiple autonomous independent peer clusters that
cross-synchronise
so 100% of the kit delivers normal service
32
DOM architecture in the context of the
storage solution market
The dominant segment of the market focuses on delivering performance within
a highly resilient single cluster
However:
Many of our objects will be rarely accessed
so we do not want to pay for “maximised” performance we do not need
We have resilience by using multiple clusters, hence we have a reduced
need for resilience within a cluster
so we do not want to pay for “maximised” resilience we do not need
We are using these drivers to design a cost-effective large scale resilient
solution
33
DOM storage subsystem architecture - overview
[Diagram: two storage clusters, each made up of a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage. DOM Shared Services (Unique ID, Signing, Logging) serve both clusters.]
34
DOM storage subsystem architecture - access
Normal access/delivery is from the local storage cluster
[Diagram: two storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical Storage) with DOM Shared central Services (Unique ID, Signing, Logging); the request is served by the local cluster.]
35
DOM storage subsystem architecture - access
When a cluster is off-line then access/delivery is from a remote storage cluster
[Diagram: two storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical Storage) with DOM Shared central Services (Unique ID, Signing, Logging); the request is served by the remote cluster.]
36
DOM storage subsystem architecture - ingest
Normal ingest is to the local storage cluster and then the remote cluster is synchronised
[Diagram: two storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical Storage) with DOM Shared central Services (Unique ID, Signing, Logging). Arrows: "Store" into the local cluster, then "Synchronise remote store".]
37
DOM storage subsystem architecture - ingest
When a cluster is off-line then ingest is managed by the remote storage cluster and the local cluster is synchronised later
[Diagram: two storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical Storage) with DOM Shared central Services (Unique ID, Signing, Logging). Arrows: "Store" into the remote cluster, then "Synchronise remote store later".]
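The access and ingest behaviour shown on the four storage subsystem slides can be summarised in a small model of two peer clusters. This is a sketch of the stated behaviour only: the class, the in-memory dictionaries and the backlog mechanism are assumptions made for illustration, not a description of how the DOM storage service is built.

```python
class Cluster:
    """One autonomous storage cluster holding objects keyed by identifier."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.objects = {}
        self.backlog = {}  # objects owed to the peer while it was off-line

    def store(self, object_id, data, peer):
        """Store locally, then synchronise the peer now or queue it for later."""
        self.objects[object_id] = data
        if peer.online:
            peer.objects[object_id] = data
        else:
            self.backlog[object_id] = data

    def resync(self, peer):
        """Push queued objects once the peer is back on-line."""
        if peer.online:
            peer.objects.update(self.backlog)
            self.backlog.clear()


def ingest(object_id, data, local, remote):
    """Ingest to the local cluster if available, otherwise to the remote one."""
    target, other = (local, remote) if local.online else (remote, local)
    if not target.online:
        raise RuntimeError("no storage cluster available")
    target.store(object_id, data, other)
    return target.name


def fetch(object_id, local, remote):
    """Deliver from the local cluster if available, otherwise from the remote one."""
    for cluster in (local, remote):
        if cluster.online and object_id in cluster.objects:
            return cluster.objects[object_id]
    raise KeyError(object_id)
```

Because either cluster can accept ingest and serve delivery, both carry normal load, in contrast to a master-standby pair where the standby idles until a failure.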
Slide 30
British Computer Society
North London Branch
Major Programmes
Richard Boulderstone
July 27, 2004
1
Agenda
•
•
•
•
•
•
•
•
•
The British Library
Vision
Our Audiences/Customers
ILS
Digitisation
Digital Object Management
Web Archiving
Collaboration
Conclusions
Magna Carta
www.bl.uk
2
What Is The British Library ?
Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of
Science and Invention (1855), National Central Library (1916), and National
Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India
Office Library and Records in 1982, and British Institute of Recorded Sound in
1983
• Flagship building at St Pancras - largest public building project in Great
Britain in 20th century - opened in 1998
•
www.bl.uk
3
World-Class Research Library
Key Statistics 2002/3
•
•
•
•
•
•
•
•
•
•
150 million items
8.2 million items consulted or supplied
408,000 reading room visits
618,000 catalogue records created
554,000 items received on legal deposit
651 km shelf capacity 92% full add 12 km each year
18.5M Web Site Hits (www.bl.uk)
2,400 staff
£85.2 million Grant in Aid and £27.0 million trading income in 2001/2
Annual report - http://www.bl.uk/about/annual/latest.html
www.bl.uk
4
‘The World’s Knowledge’
Outcome Based Vision
… by aiding scientific advances
… by adding commercial value for
businesses
… by contributing to UK “knowledge
economy”
… through the pursuit of academic
excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural
wellbeing
… by giving information relevant to their
interests
… by helping to find the next medical
breakthrough
… by creating a link between the past,
present and future
To help people
advance
knowledge to
enrich lives
www.bl.uk
5
High
R+D
Industries
Prof.
Services
Creative
Industries Publishing
Industries
Resource Discovery
Bespoke Services
Research Services
Document Supply
Reprographics
Innovation Centre
School
Students
Libraries Teachers 11>18
On-site Visits
School Tours
Web Learning
SMEs
EDUCATION
Lifelong
Learner
Exhibitions
Events
Tours
Publishing
Visitors
(child + adult)
Lifelong
Learner
PUBLIC
BUSINESS
Lifelong
Learner
Postgraduate/
Undergraduate
RESEARCHER
Librarians
Scholars
Lifelong
Learner
Reading Rooms
Bespoke Services
Reprographics
Publishing
Document Supply
Searching Tools
LIBRARIES
Commercial
Researcher
Public
Libraries
Broadcasting
e.g. BBC
Publishing
e.g. OED
Document Supply
Resource Discovery
Training
Best Practice
H.E.
Libraries
Public
www.bl.uk
6
2
Major Programmes/1
Da Vinci Notebook
Integrated Library System (ILS)
Programme
7
ILS: Development
Data migration
Due to finish in a few days
16M+ BL records
10M+ records from other sources
Online ILS software
All online changes made (mainly interfaces) – final tests
Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
Most ones done for go live
Rest in priority order
www.bl.uk
8
ILS: Implementation
Training
Courses to end-users well underway
‘Practice’ system available
‘Search only’ training also underway
Testing
Functional testing (end to end) nearly complete
Performance poor – OPAC very slow
Automated stress testing (LoadRunner scripts)
eIS trying to find area of problem
Ex Libris experts flying over
Some security ‘hardening’ needed
www.bl.uk
9
ILS: Cutover from legacy systems
Now:
Temporary Aleph cataloguing
Phase 1 – internal processing
Staggered take-on of users to ease cutover problems
Merge ‘temporary’ records
7 June:
Phase 2 – reading rooms
Reading rooms closed for cutover 26-29 June
Mainly brand-new PCs etc rather than XP upgrade
30 June:
Phase 3 – remote users
Could be delayed major problems
30 July:
www.bl.uk
10
www.bl.uk
11
Future ILS development (ILS/2)
Current ILS development seen just as the start
Extra records
E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
E.g. Preservation records
Links to other new BL systems
E.g. Digital Object Management (images, web pages etc)
New releases of Ex Libris packages
www.bl.uk
12
Major Programmes/2
International Dunhuang Project
Digitisation Programme
13
Background
Digitisation Is The Process Of Converting Existing Physical Items Into Digital
Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character
Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image to Sub £1/Image.
Automatic Devices To Digitise Are Becoming Availalble But Are Expensive ~£150,000.
BL Has Had Fairly Ad Hoc Approach Driven By
External Funding Opportunities
Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project
Management
BL Has Created About 1.5M Digital Images So Far…
www.bl.uk
14
Digitisation Strategy
Digitisation Strategy Project Was Formerly Initiated On February 2, 2004
Key objectives for the project are to define:
Selection Criteria
Uniform Approach
Communications Plan
Sustainability
Intellectual Property Rights
External Relationship Management
Funding
Integration with DOMS
www.bl.uk
15
Project Status Information
Definitive Register of Projects
19 Complete
19 Current
20 Planning
JISC Sound (3,900 Hours)
JISC Newspapers (2M Pages of 750M Pages)
Chopin (Collaborative Project)
Early English Books Online
www.bl.uk
16
Major Programmes/3
Gutenberg Bible
Digital Object Management (DOM)
Programme
17
DOM Programme vision
Our mission is to enable the United Kingdom to preserve
and use its digital intellectual property forever
Our vision is create a management system for digital objects
that will
store and preserve any type of digital material in perpetuity
provide access to this material to users with appropriate permissions
ensure that the material is easy to find
ensure that users can view the material with contemporary applications
ensure that users can, where possible, experience material with the original look-and-feel
www.bl.uk
18
Introduction - history
Digital Library PFI
Mar 1997 – Dec 1998
Digital Library System
1999 – early 2002
Lessons
DOM Report
Nov 2002
The DOM Programme
Started September 2003
www.bl.uk
19
Drivers for the BL DOM Programme
Legal deposit legislation for non-print material was granted royal assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12T of material per year (with 50 year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….
We need a generic and cost-effective approach for the secure long term
storage of digital material that is produced by numerous initiatives
www.bl.uk
20
DOM – many topics to address
HIGH
ILS
WEB
ARCHIVING
DIGITISATION
PROGRAMME
ESTIMATED SIZE OF COMPONENT
LDEP
WORKFLOW
RESOURCE
DISCOVERY
LDLSE
VDEP
RIGHTS
MANAGEMENT
METADATA
DEFINITION
TECHNICAL
REQUIREMENTS
FILE
FORMAT
REGISTRY
PERSISTENT
IDENTIFIERS
FILE
CONVERSION
UTILITIES
AUTHENTICATION
PROTOTYPES
SDM
RADM
INTERFACES
STRATEGY
DEVELOPMENT
LOW
LOW
COMPONENT AMBIGUITY / COMPLEXITY
Started
Planned
Non-DOM projects
Planned co-operation
HIGH
LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications
www.bl.uk
21
Scope - life cycle of objects
Collection
Selection
Acquisition
Accession
Description
Preservation
Storage
Preservation
Access
Resource discovery
Delivery
Rendering
www.bl.uk
22
Scope – objects and processes
Preservation store
Preserves the bit stream in perpetuity
Access store
Access versions
Limited formats – in the flavour of the era
Metadata to support resource discovery
Descriptive, Administrative, Links with existing tools e.g. Integrated
Library System (ILS)
Workflow
Ingest, e.g. Legal Deposit processing
www.bl.uk
23
ACCESS
DOM
Resource Discovery
Delivery
Digital Rights Management
Shared services
DOM
Storage
Signing
Authentication
Metadata
Persistent ID
Ingest
DONATIONS
DOCUMENT SUPPLY
Publishers
Archives
Grey Literature
Non-Serial
Store
Archiving
Operational
Stores
LEGAL DEPOSIT
LDL Secure
Environment
Legal Deposit
Items
Legal Deposit
Processing
WEB
ARCHIVING
DIGITISATION
St Pancras
Studios
Newspapers
NSA
www.bl.uk
24
Timeline
Prototype will provide a
basic preservationquality digital object
storage module
Definition. R0
ET approve
Business Case
& Timeline
•Consolidate R0 into operational
system
•Provide preservation-quality digital
store for materials received under
Voluntary Deposit of Electronic
Publications (VDEP)
•Integrate it with the existing VDEP
front-end
BC
R0
Operat’l Storage
Sub System. R1
•Support ingest for a
major content stream
•Integrate with core
Library systems as
required
R1
1st Content
Stream ingest. R2
R2
LDEP - initial
format. R3 & R4
Provide functionality
for material covered by
LDEP secondary
legislation
R3
R4
Open DOM to new
projects. R5+
2003
2004
2005
www.bl.uk
25
DOM: Project definition - 1
Example issues
Functional Architecture “What”
digital rights, file
formats, etc
Prototyping – assessing market solutions
allow changes to
new suppliers,
relationships to ILS,
other projects etc
Logical architecture “how – overall architecture”
Prototyping - basic functioning architecture
how do we build it
cost-effectively
today, supplier
selection criteria
Physical architecture “how – storage & specifics”
Prototyping - principal solutions and options
• Business case
• Planning – incremental
implementation phases
Cross team workshops – reviewing progress, debating detailed technical
issues, planning immediate priorities, risk management & way forward
www.bl.uk
26
DOM: Project definition - 2
Approach is to be incremental and not ‘Big Bang’
We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a
good solution
A principal goal is to define:
An overall long term “logical architecture”
Within which, there will be successive generations of physical architectures
We are understanding the storage marketplace, and we will use the knowledge to manage
procurement
We are certain that we will need >500T of storage but we are uncertain when – we thus need
flexible scalable procurement
www.bl.uk
27
DOM architecture - overview
Resource Discovery
Non-cat
ILS
based RD
Others
Rights Management
LDEP
DOMID
OBJECT
DOM Storage Service
Compound
objects/relations
Integrity
Authenticity
Local resource locator
Object
Atomic
Objects
Unique persistent
identifier (DOMID)
Doc
supply
DOMID is
mapped to
node/vol/LRL
DOM Physical Storage
www.bl.uk
28
DOM System (release 3)
Aleph
Storage subsystem
Storage
subsystem
Access
Mailroom
Publishers
Shared services
Administration
DOM System
www.bl.uk
29
DOM logical architecture –
integrity and authenticity
Integrity:
System has capability to continuously monitor the object store to detect
object corruption
It would then initiate object recovery
Authenticity:
A process is defined to provide long-term assurance that an object that
is re-presented is as it was when it was ingested
Based on the use of cryptographic signing techniques
Each object is signed when it is ingested
The signature is verified when required
The signing mechanism is “tightly” controlled
www.bl.uk
30
Procuring physical storage in volume
A major cost is in physical storage
The market for storage systems is changing rapidly, and this implies that “lock-in” is
not sensible
We thus need flexibility to change supplier over time
Cost of storage is reducing by 30-40% per year
Hence procure on rolling basis just ahead of demand
Replace storage on a rolling basis on expiry of warranty
The rolling programmes imply the need to be able to support a heterogeneous
product solution
The design of the logical architecture thus supports storage sourced from multiple
storage vendors
www.bl.uk
31
Disaster tolerance and the organisation of storage clusters
One can obtain commercial disaster recovery (DR) solutions for common equipment configurations
However, one cannot obtain such solutions for systems of several hundred terabytes
So we must build DR into the design of the system
A single-site solution, subject to a common-mode disaster, would suffer considerable loss of availability after a disaster, and so is not acceptable
This implies that we need a multi-site solution
Conventionally these are based on a master-standby arrangement, where only 50% of the kit is delivering normal service
Our design is based on the use of multiple autonomous independent peer clusters that cross-synchronise, so 100% of the kit delivers normal service
32
DOM architecture in the context of the storage solution market
The dominant segment of the market focuses on delivering performance within a highly resilient single cluster
However:
Many of our objects will be rarely accessed
so we do not want to pay for “maximised” performance we do not need
We have resilience by using multiple clusters, hence we have a reduced need for resilience within a cluster
so we do not want to pay for “maximised” resilience we do not need
We are using these drivers to design a cost-effective large scale resilient solution
33
DOM storage subsystem architecture - overview
[Diagram: two storage clusters, each with a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, alongside DOM Shared Services (Unique ID, Signing, Logging)]
34
DOM storage subsystem architecture - access
Normal access/delivery is from the local storage cluster
[Diagram: two storage clusters, each with a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, alongside DOM Shared central Services (Unique ID, Signing, Logging)]
35
DOM storage subsystem architecture - access
When a cluster is off-line then access/delivery is from a remote storage cluster
[Diagram: two storage clusters, each with a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, alongside DOM Shared central Services (Unique ID, Signing, Logging)]
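The two access cases (local cluster on-line, local cluster off-line) can be sketched as a gateway that tries its local cluster first and falls back to a remote peer. The `Cluster` and `Gateway` classes and their methods are hypothetical; the slides show the behaviour only at diagram level.

```python
class Cluster:
    """Toy storage cluster holding objects in memory."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.objects = {}

    def read(self, domid):
        if not self.online:
            raise ConnectionError(f"{self.name} is off-line")
        return self.objects[domid]


class Gateway:
    """Hypothetical storage gateway attached to one local cluster."""

    def __init__(self, local, remotes):
        self.local = local
        self.remotes = list(remotes)

    def fetch(self, domid):
        # Normal delivery is from the local cluster; peers are used only
        # when a cluster in the preferred order is off-line.
        for cluster in [self.local, *self.remotes]:
            try:
                return cluster.name, cluster.read(domid)
            except ConnectionError:
                continue
        raise ConnectionError("no storage cluster is available")
```

Because the clusters are peers holding synchronised copies, the caller gets the same object either way; the returned cluster name is included here only to make the fallback visible.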
36
DOM storage subsystem architecture - ingest
Normal ingest is to the local storage cluster and then the remote cluster is synchronised
[Diagram: two storage clusters, each with a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, alongside DOM Shared central Services (Unique ID, Signing, Logging); arrows labelled "Store" and "Synchronise remote store"]
37
DOM storage subsystem architecture - ingest
When a cluster is off-line then ingest is managed by the remote storage cluster and the local cluster is synchronised later
[Diagram: two storage clusters, each with a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, alongside DOM Shared central Services (Unique ID, Signing, Logging); arrows labelled "Store" and "Synchronise remote store later"]
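The two ingest cases can be sketched in the same style: store on whichever cluster is available, copy to the peer straight away if it is on-line, and otherwise queue the object for later synchronisation. The names `Cluster`, `ingest` and `resynchronise` and the use of a pending queue are assumptions made for illustration, not a description of the actual DOM design.

```python
from collections import deque


class Cluster:
    """Toy storage cluster with a queue of objects still owed to its peer."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.objects = {}
        self.pending = deque()  # DOMIDs not yet copied to the peer

    def store(self, domid, data):
        if not self.online:
            raise ConnectionError(f"{self.name} is off-line")
        self.objects[domid] = data


def ingest(domid, data, local, remote):
    """Store on the local cluster if it is on-line, else on the remote one."""
    primary, peer = (local, remote) if local.online else (remote, local)
    primary.store(domid, data)
    try:
        peer.store(domid, data)        # synchronise immediately if possible
    except ConnectionError:
        primary.pending.append(domid)  # otherwise synchronise later
    return primary.name


def resynchronise(source, target):
    """Copy queued objects once the target cluster is back on-line."""
    while source.pending:
        domid = source.pending[0]
        target.store(domid, source.objects[domid])
        source.pending.popleft()       # dequeue only after a successful copy
```

Dequeuing only after the copy succeeds means an interrupted resynchronisation can be re-run without losing track of outstanding objects.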
38
In conclusion
We plan for generations of physical storage
Migration from one generation to the next
Allow changes of supplier
Purchase incrementally in modest quantities
Move quickly when required
Be cost conscious
We provide assurance that an object is held and re-presented as it was when it was ingested
We are designing a cost-effective large scale resilient solution
In summary: we take a long term view
39
Major Programmes/4
Web Archiving Programme
40
Structure of Programme
The Web Archiving Programme is a collaborative initiative, implemented broadly across two consortia
UK Web Archiving Consortium
Developing a selective approach to web archiving, procuring a common web archiving infrastructure and software to begin archiving activities as early as possible
International Internet Preservation Consortium
Developing advanced web archiving technologies for the long-term, large-scale, continuous crawling requirements enabled through legislation
41
UK Web Archiving Consortium
Developing a selective approach to web archiving
Licence for PANDAS about to be signed with NLA
Sub-licences with consortium partners and contractor to follow
ITT concluded with Magus Research winning the contract.
Implement a common web archiving infrastructure (lots of Linux machines + PANDAS)
Provide customisation/development of PANDAS
Provide help desk and support
42
International Internet Preservation Consortium
Developing advanced web archiving technologies
Smart Crawler
Continuous adaptive crawler, adjusting crawl priority on the fly
Based on IA Heritrix
Working on requirements now
Expect to begin tender process in June
Content Management
Archival formats
Framework
Metrics and Test Bed
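The idea of a continuous adaptive crawler that adjusts crawl priority on the fly can be sketched as a revisit queue whose interval shrinks for pages that change and grows for pages that do not. This is an illustrative sketch only: it does not use or describe the Heritrix API, and the class name, the halving/doubling policy and the interval limits are all assumptions.

```python
import heapq


class AdaptiveFrontier:
    """Hypothetical crawl frontier ordered by next-due time."""

    def __init__(self, min_interval=1.0, max_interval=64.0):
        self._heap = []      # entries are (due_time, url)
        self._interval = {}  # url -> current revisit interval
        self._digest = {}    # url -> digest of the last fetched content
        self.min_interval = min_interval
        self.max_interval = max_interval

    def add(self, url, now=0.0, interval=8.0):
        self._interval[url] = interval
        heapq.heappush(self._heap, (now, url))

    def next_due(self):
        """Pop the URL that is due soonest."""
        return heapq.heappop(self._heap)

    def report(self, url, digest, now):
        """Record a fetch result and reschedule the URL."""
        previous = self._digest.get(url)
        interval = self._interval[url]
        if previous is not None:
            if previous != digest:
                # Content changed: revisit sooner.
                interval = max(self.min_interval, interval / 2)
            else:
                # Content unchanged: revisit less often.
                interval = min(self.max_interval, interval * 2)
        self._digest[url] = digest
        self._interval[url] = interval
        heapq.heappush(self._heap, (now + interval, url))
        return interval
```

A crawl loop would repeatedly call `next_due`, fetch the page, hash its content and pass the hash to `report`, so frequently changing sites drift to the front of the queue over time.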
43
External Collaboration
44
Digital Library Collaborations/Partnerships
Current
UK Digital Preservation Coalition
Founder Member
TEL (The European Library Project)
Web Archiving UK
JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library Of Wales
International Internet Preservation Consortium
BNF, Library Of Congress, Internet Archive, National Archives & Library Of Canada, National Library Of Australia, National Library Of Italy, National Libraries Of Nordic Countries
JISC Funded - Digital Curation Centre
Persistent Identifiers
DOI Foundation, European National Libraries (KB & DDB)
Resource Discovery
Union Catalogues (SUNCAT)
Digital Library Federation
45
Digital Library Collaborations/Partnerships
Potential
Secure Legal Deposit Network
6 Legal Deposit Libraries
Global Digital Format Registry
Potential Partners (National Archives, DLF)
Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
KB (Netherlands National Library & Other Partners – FP6 Bid)
Digital Rights Management
Potential Partners (Publishers, JISC)
Metadata
Publishers, Others ?
Authentication
JISC ?
Resource Discovery
Search Engine Vendors, Researchers
Others ???
46
Conclusions
• Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success
• Can You Work With Us?
47
Slide 32
British Computer Society
North London Branch
Major Programmes
Richard Boulderstone
July 27, 2004
1
Agenda
•
•
•
•
•
•
•
•
•
The British Library
Vision
Our Audiences/Customers
ILS
Digitisation
Digital Object Management
Web Archiving
Collaboration
Conclusions
Magna Carta
www.bl.uk
2
What Is The British Library ?
Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of
Science and Invention (1855), National Central Library (1916), and National
Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India
Office Library and Records in 1982, and British Institute of Recorded Sound in
1983
• Flagship building at St Pancras - largest public building project in Great
Britain in 20th century - opened in 1998
•
www.bl.uk
3
World-Class Research Library
Key Statistics 2002/3
•
•
•
•
•
•
•
•
•
•
150 million items
8.2 million items consulted or supplied
408,000 reading room visits
618,000 catalogue records created
554,000 items received on legal deposit
651 km shelf capacity 92% full add 12 km each year
18.5M Web Site Hits (www.bl.uk)
2,400 staff
£85.2 million Grant in Aid and £27.0 million trading income in 2001/2
Annual report - http://www.bl.uk/about/annual/latest.html
www.bl.uk
4
‘The World’s Knowledge’
Outcome Based Vision
… by aiding scientific advances
… by adding commercial value for
businesses
… by contributing to UK “knowledge
economy”
… through the pursuit of academic
excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural
wellbeing
… by giving information relevant to their
interests
… by helping to find the next medical
breakthrough
… by creating a link between the past,
present and future
To help people
advance
knowledge to
enrich lives
www.bl.uk
5
High
R+D
Industries
Prof.
Services
Creative
Industries Publishing
Industries
Resource Discovery
Bespoke Services
Research Services
Document Supply
Reprographics
Innovation Centre
School
Students
Libraries Teachers 11>18
On-site Visits
School Tours
Web Learning
SMEs
EDUCATION
Lifelong
Learner
Exhibitions
Events
Tours
Publishing
Visitors
(child + adult)
Lifelong
Learner
PUBLIC
BUSINESS
Lifelong
Learner
Postgraduate/
Undergraduate
RESEARCHER
Librarians
Scholars
Lifelong
Learner
Reading Rooms
Bespoke Services
Reprographics
Publishing
Document Supply
Searching Tools
LIBRARIES
Commercial
Researcher
Public
Libraries
Broadcasting
e.g. BBC
Publishing
e.g. OED
Document Supply
Resource Discovery
Training
Best Practice
H.E.
Libraries
Public
www.bl.uk
6
2
Major Programmes/1
Da Vinci Notebook
Integrated Library System (ILS)
Programme
7
ILS: Development
Data migration
Due to finish in a few days
16M+ BL records
10M+ records from other sources
Online ILS software
All online changes made (mainly interfaces) – final tests
Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
Most ones done for go live
Rest in priority order
www.bl.uk
8
ILS: Implementation
Training
Courses to end-users well underway
‘Practice’ system available
‘Search only’ training also underway
Testing
Functional testing (end to end) nearly complete
Performance poor – OPAC very slow
Automated stress testing (LoadRunner scripts)
eIS trying to find area of problem
Ex Libris experts flying over
Some security ‘hardening’ needed
www.bl.uk
9
ILS: Cutover from legacy systems
Now:
Temporary Aleph cataloguing
Phase 1 – internal processing
Staggered take-on of users to ease cutover problems
Merge ‘temporary’ records
7 June:
Phase 2 – reading rooms
Reading rooms closed for cutover 26-29 June
Mainly brand-new PCs etc rather than XP upgrade
30 June:
Phase 3 – remote users
Could be delayed major problems
30 July:
www.bl.uk
10
www.bl.uk
11
Future ILS development (ILS/2)
Current ILS development seen just as the start
Extra records
E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
E.g. Preservation records
Links to other new BL systems
E.g. Digital Object Management (images, web pages etc)
New releases of Ex Libris packages
www.bl.uk
12
Major Programmes/2
International Dunhuang Project
Digitisation Programme
13
Background
Digitisation Is The Process Of Converting Existing Physical Items Into Digital
Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character
Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image to Sub £1/Image.
Automatic Devices To Digitise Are Becoming Availalble But Are Expensive ~£150,000.
BL Has Had Fairly Ad Hoc Approach Driven By
External Funding Opportunities
Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project
Management
BL Has Created About 1.5M Digital Images So Far…
www.bl.uk
14
Digitisation Strategy
Digitisation Strategy Project Was Formerly Initiated On February 2, 2004
Key objectives for the project are to define:
Selection Criteria
Uniform Approach
Communications Plan
Sustainability
Intellectual Property Rights
External Relationship Management
Funding
Integration with DOMS
www.bl.uk
15
Project Status Information
Definitive Register of Projects
19 Complete
19 Current
20 Planning
JISC Sound (3,900 Hours)
JISC Newspapers (2M Pages of 750M Pages)
Chopin (Collaborative Project)
Early English Books Online
www.bl.uk
16
Major Programmes/3
Gutenberg Bible
Digital Object Management (DOM)
Programme
17
DOM Programme vision
Our mission is to enable the United Kingdom to preserve
and use its digital intellectual property forever
Our vision is create a management system for digital objects
that will
store and preserve any type of digital material in perpetuity
provide access to this material to users with appropriate permissions
ensure that the material is easy to find
ensure that users can view the material with contemporary applications
ensure that users can, where possible, experience material with the original look-and-feel
www.bl.uk
18
Introduction - history
Digital Library PFI
Mar 1997 – Dec 1998
Digital Library System
1999 – early 2002
Lessons
DOM Report
Nov 2002
The DOM Programme
Started September 2003
www.bl.uk
19
Drivers for the BL DOM Programme
Legal deposit legislation for non-print material was granted royal assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12T of material per year (with 50 year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….
We need a generic and cost-effective approach for the secure long term
storage of digital material that is produced by numerous initiatives
www.bl.uk
20
DOM – many topics to address
HIGH
ILS
WEB
ARCHIVING
DIGITISATION
PROGRAMME
ESTIMATED SIZE OF COMPONENT
LDEP
WORKFLOW
RESOURCE
DISCOVERY
LDLSE
VDEP
RIGHTS
MANAGEMENT
METADATA
DEFINITION
TECHNICAL
REQUIREMENTS
FILE
FORMAT
REGISTRY
PERSISTENT
IDENTIFIERS
FILE
CONVERSION
UTILITIES
AUTHENTICATION
PROTOTYPES
SDM
RADM
INTERFACES
STRATEGY
DEVELOPMENT
LOW
LOW
COMPONENT AMBIGUITY / COMPLEXITY
Started
Planned
Non-DOM projects
Planned co-operation
HIGH
LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications
www.bl.uk
21
Scope - life cycle of objects
Collection
Selection
Acquisition
Accession
Description
Preservation
Storage
Preservation
Access
Resource discovery
Delivery
Rendering
www.bl.uk
22
Scope – objects and processes
Preservation store
Preserves the bit stream in perpetuity
Access store
Access versions
Limited formats – in the flavour of the era
Metadata to support resource discovery
Descriptive, Administrative, Links with existing tools e.g. Integrated
Library System (ILS)
Workflow
Ingest, e.g. Legal Deposit processing
23
[Diagram labels. Access: DOM Resource Discovery, Delivery, Digital Rights Management. Shared services: DOM Storage, Signing, Authentication, Metadata, Persistent ID. Ingest: Donations; Document Supply (Publishers, Archives, Grey Literature, Non-Serial Store, Archiving, Operational Stores); Legal Deposit (LDL Secure Environment, Legal Deposit Items, Legal Deposit Processing); Web Archiving; Digitisation (St Pancras Studios, Newspapers, NSA).]
24
Timeline
Definition (R0)
Prototype will provide a basic preservation-quality digital object storage module
ET approve Business Case & Timeline (BC)
Operational Storage Sub System (R1)
• Consolidate R0 into operational system
• Provide preservation-quality digital store for materials received under Voluntary Deposit of Electronic Publications (VDEP)
• Integrate it with the existing VDEP front-end
1st Content Stream ingest (R2)
• Support ingest for a major content stream
• Integrate with core Library systems as required
LDEP - initial format (R3 & R4)
Provide functionality for material covered by LDEP secondary legislation
Open DOM to new projects (R5+)
[Timeline axis: 2003, 2004, 2005, with milestones BC, R0, R1, R2, R3, R4]
25
DOM: Project definition - 1
Functional Architecture “What”
Prototyping – assessing market solutions
Example issues: digital rights, file formats, etc
Logical architecture “how – overall architecture”
Prototyping - basic functioning architecture
Example issues: allow changes to new suppliers, relationships to ILS, other projects etc
Physical architecture “how – storage & specifics”
Prototyping - principal solutions and options
Example issues: how do we build it cost-effectively today, supplier selection criteria
• Business case
• Planning – incremental implementation phases
Cross team workshops – reviewing progress, debating detailed technical
issues, planning immediate priorities, risk management & way forward
26
DOM: Project definition - 2
Approach is to be incremental and not ‘Big Bang’
We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a
good solution
A principal goal is to define:
An overall long term “logical architecture”
Within which, there will be successive generations of physical architectures
We are building our understanding of the storage marketplace, and we will use that knowledge to manage
procurement
We are certain that we will need >500 TB of storage but we are uncertain when – we thus need
flexible, scalable procurement
27
DOM architecture - overview
[Diagram labels. Users of the service: Resource Discovery (ILS based RD, Non-cat, Others), Rights Management, LDEP, Doc supply; each refers to an object by its DOMID. DOM Storage Service: Compound objects/relations, Integrity, Authenticity, Local resource locator; atomic objects are held under a unique persistent identifier (DOMID).]
DOMID is mapped to node/vol/LRL
DOM Physical Storage
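The mapping from DOMID to node/vol/LRL can be pictured as a lookup table. The sketch below is illustrative only: the class names, the field meanings (node, volume, local resource locator) and the example identifiers are assumptions, not details given on the slide. It shows why a persistent identifier helps: physical storage can change while the DOMID held by other systems stays the same.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Location:
    """Where an object currently lives (field meanings are assumed)."""
    node: str    # storage node holding the object
    volume: str  # volume on that node
    lrl: str     # local resource locator within the volume


class DomidResolver:
    """Hypothetical lookup table from persistent DOMID to physical location."""

    def __init__(self) -> None:
        self._table: Dict[str, Location] = {}

    def register(self, domid: str, location: Location) -> None:
        if domid in self._table:
            raise ValueError(f"DOMID already registered: {domid}")
        self._table[domid] = location

    def relocate(self, domid: str, new_location: Location) -> None:
        # A storage migration changes the location; the DOMID is unchanged.
        if domid not in self._table:
            raise KeyError(domid)
        self._table[domid] = new_location

    def resolve(self, domid: str) -> Optional[Location]:
        return self._table.get(domid)


resolver = DomidResolver()
resolver.register("domid-0001", Location("node-a", "vol-03", "obj/0001"))
# Later, the object moves to a new generation of storage.
resolver.relocate("domid-0001", Location("node-b", "vol-11", "obj/0001"))
```

Systems such as the ILS would then store only the DOMID and never a physical path.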
28
DOM System (release 3)
[Diagram labels: DOM System containing two Storage subsystems, Access, Mailroom, Shared services and Administration; also shown: Aleph and Publishers]
29
DOM logical architecture –
integrity and authenticity
Integrity:
System has capability to continuously monitor the object store to detect
object corruption
It would then initiate object recovery
Authenticity:
A process is defined to provide long-term assurance that an object that
is re-presented is as it was when it was ingested
Based on the use of cryptographic signing techniques
Each object is signed when it is ingested
The signature is verified when required
The signing mechanism is “tightly” controlled
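A minimal sketch of the sign-on-ingest, verify-on-demand idea follows. The slides do not say which signing scheme is used, so an HMAC-SHA256 with a closely held key stands in here purely as an assumption; a production system might well use public-key signatures instead. The class and method names are hypothetical.

```python
import hashlib
import hmac
from typing import Dict, List, Tuple

# Assumption: a secret key held only by the signing service.
SIGNING_KEY = b"demo-key-held-by-the-signing-service"


def sign(data: bytes) -> str:
    """Return a keyed digest of the object's bytes."""
    return hmac.new(SIGNING_KEY, data, hashlib.sha256).hexdigest()


def verify(data: bytes, signature: str) -> bool:
    """Check that the bytes still match the signature made at ingest."""
    return hmac.compare_digest(sign(data), signature)


class ObjectStore:
    def __init__(self) -> None:
        self._objects: Dict[str, Tuple[bytes, str]] = {}

    def ingest(self, object_id: str, data: bytes) -> None:
        # Each object is signed when it is ingested.
        self._objects[object_id] = (data, sign(data))

    def retrieve(self, object_id: str) -> bytes:
        # The signature is verified when the object is re-presented.
        data, signature = self._objects[object_id]
        if not verify(data, signature):
            raise ValueError(f"object {object_id} failed verification")
        return data

    def scan(self) -> List[str]:
        # Integrity monitoring: list objects whose bytes no longer verify.
        return [oid for oid, (data, sig) in self._objects.items()
                if not verify(data, sig)]


store = ObjectStore()
store.ingest("a", b"hello")
store.ingest("b", b"world")
# Simulate silent corruption of object "b" on disk.
store._objects["b"] = (b"w0rld", store._objects["b"][1])
corrupted = store.scan()
```

In the design described above, a non-empty scan result would trigger object recovery, for example from another copy of the object.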
30
Procuring physical storage in volume
A major cost is in physical storage
The market for storage systems is changing rapidly, and this implies that “lock-in” is
not sensible
We thus need flexibility to change supplier over time
Cost of storage is reducing by 30-40% per year
Hence procure on rolling basis just ahead of demand
Replace storage on a rolling basis on expiry of warranty
The rolling programmes imply the need to be able to support a heterogeneous
product solution
The design of the logical architecture thus supports storage sourced from multiple
storage vendors
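The case for rolling procurement can be checked with simple arithmetic. In the sketch below every figure is illustrative rather than taken from the programme: demand of 100 TB per year for five years, a starting price of 1,000 currency units per TB, and a 35% annual price decline (the middle of the 30-40% range quoted above). Replacement on expiry of warranty is ignored.

```python
from typing import Sequence


def upfront_cost(total_tb: float, price_per_tb: float) -> float:
    """Cost of buying all capacity today at today's price."""
    return total_tb * price_per_tb


def rolling_cost(demand_tb_per_year: Sequence[float],
                 price_per_tb: float,
                 annual_decline: float) -> float:
    """Cost of buying each year's capacity at that year's price."""
    total = 0.0
    price = price_per_tb
    for tb in demand_tb_per_year:
        total += tb * price
        price *= (1.0 - annual_decline)  # price falls before the next purchase
    return total


demand = [100.0] * 5                      # 500 TB in total (illustrative)
upfront = upfront_cost(sum(demand), 1000.0)
rolling = rolling_cost(demand, 1000.0, 0.35)
saving_fraction = 1.0 - rolling / upfront
print(f"upfront={upfront:.0f} rolling={rolling:.0f} saving={saving_fraction:.0%}")
```

Under these assumptions, buying just ahead of demand costs roughly half as much as buying the full capacity at the start.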
31
Disaster tolerance and the organisation
of storage clusters
One can obtain commercial disaster recovery (DR) solutions for common equipment
configurations
However, one cannot obtain such solutions for systems holding multiple hundreds of terabytes
So we must build DR into the design of the system
A single site solution, subject to a common-mode disaster, would suffer considerable loss
of availability after a disaster, and so is not acceptable
This implies that we need a multi-site solution
Conventionally these are based on a master-standby where only 50% of kit is delivering
normal service
Our design is based on the use of multiple autonomous independent peer clusters that
cross-synchronise
so 100% of the kit delivers normal service
32
DOM architecture in the context of the
storage solution market
The dominant segment of the market focuses on delivering performance within
a highly resilient single cluster
However:
Many of our objects will be rarely accessed
so we do not want to pay for “maximised” performance we do not need
We have resilience by using multiple clusters, hence we have a reduced
need for resilience within a cluster
so we do not want to pay for “maximised” resilience we do not need
We are using these drivers to design a cost-effective large scale resilient
solution
33
DOM storage subsystem architecture - overview
[Diagram: two storage clusters, each made up of a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, with DOM Shared Services (Unique ID, Signing, Logging) alongside them]
34
DOM storage subsystem architecture - access
Normal access/delivery is from local storage cluster
[Diagram: two storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical Storage) and DOM Shared central Services (Unique ID, Signing, Logging)]
35
DOM storage subsystem architecture - access
When a cluster is off-line then access/delivery is from a remote storage cluster
[Diagram: two storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical Storage) and DOM Shared central Services (Unique ID, Signing, Logging)]
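The access behaviour on the two access slides (serve from the local cluster, fall back to a remote cluster when the local one is off-line) can be sketched as below. The `Cluster` class, the `fetch` function and the cluster names are hypothetical; the slides do not describe the gateway's actual interface.

```python
from typing import Dict, List, Tuple


class Cluster:
    """Hypothetical stand-in for one autonomous storage cluster."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.online = True
        self.objects: Dict[str, bytes] = {}


def fetch(object_id: str, clusters: List[Cluster]) -> Tuple[str, bytes]:
    """Return (serving cluster name, data).

    `clusters` is ordered with the local cluster first, so a remote
    peer is only consulted when the local cluster cannot serve.
    """
    for cluster in clusters:
        if cluster.online and object_id in cluster.objects:
            return cluster.name, cluster.objects[object_id]
    raise LookupError(f"no on-line cluster holds {object_id}")


local, remote = Cluster("local"), Cluster("remote")
for c in (local, remote):          # both peers hold a synchronised copy
    c.objects["domid-0001"] = b"object bytes"

normal_source, _ = fetch("domid-0001", [local, remote])
local.online = False               # simulate the local cluster going off-line
failover_source, data = fetch("domid-0001", [local, remote])
```

Because both peers hold the object, the caller sees the same bytes in either case; only the serving cluster differs.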
36
DOM storage subsystem architecture - ingest
Normal ingest is to the local storage cluster and then the remote cluster is synchronised
[Diagram: two storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical Storage) and DOM Shared central Services (Unique ID, Signing, Logging); arrows labelled "Store" and "Synchronise remote store"]
37
DOM storage subsystem architecture - ingest
When a cluster is off-line then ingest is managed by the remote storage cluster and the local cluster is synchronised later
[Diagram: two storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical Storage) and DOM Shared central Services (Unique ID, Signing, Logging); arrows labelled "Store" and "Synchronise remote store later"]
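The ingest behaviour on the two ingest slides can be sketched as a pair of peer clusters, each keeping a queue of objects still to be copied to the other. This is an illustration under assumptions: the class, the queue-based synchronisation and the site names are hypothetical, and real cross-synchronisation would also need to handle conflicts and partial failures.

```python
from typing import Dict, List


class Cluster:
    """Hypothetical peer cluster with a queue of objects awaiting sync."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.online = True
        self.objects: Dict[str, bytes] = {}
        self.pending: List[str] = []   # ids not yet copied to the peer

    def store(self, object_id: str, data: bytes, peer: "Cluster") -> None:
        if not self.online:
            raise RuntimeError(f"{self.name} is off-line")
        self.objects[object_id] = data
        self.pending.append(object_id)
        self.synchronise(peer)

    def synchronise(self, peer: "Cluster") -> None:
        # Copy queued objects to the peer; do nothing while it is off-line.
        if not peer.online:
            return
        while self.pending:
            object_id = self.pending.pop(0)
            peer.objects[object_id] = self.objects[object_id]


def ingest(object_id: str, data: bytes, local: Cluster, remote: Cluster) -> str:
    """Store at the local cluster if it is on-line, else at the remote one."""
    primary, secondary = (local, remote) if local.online else (remote, local)
    primary.store(object_id, data, secondary)
    return primary.name


site_a, site_b = Cluster("site-a"), Cluster("site-b")
ingest("obj-1", b"first", site_a, site_b)                # normal ingest
site_a.online = False                                    # local cluster fails
handled_by = ingest("obj-2", b"second", site_a, site_b)  # remote takes over
site_a.online = True
site_b.synchronise(site_a)                               # catch up later
```

Both clusters accept work in normal operation, which is the point made earlier about 100% of the kit delivering normal service.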
38
In conclusion
We plan for generations of physical storage
Migration from one generation to the next
Allow changes of supplier
Purchase incrementally in modest quantities
Move quickly when required
Be cost conscious
We provide assurance that an object is held and re-presented as when it was
ingested
We are designing a cost-effective large scale resilient solution
In summary: we take a long term view
39
Major Programmes/4
Web Archiving Programme
40
Structure of Programme
The Web Archiving Programme is a collaborative
initiative, implemented broadly across two
consortia
UK Web Archiving Consortium
Developing a selective approach to web archiving, procuring a common
web archiving infrastructure and software to begin archiving activities at
the earliest opportunity
International Internet Preservation Consortium
Developing advanced web archiving technologies for the long-term,
large-scale, continuous crawling requirements enabled through
legislation
41
UK Web Archiving Consortium
Developing a selective approach to web archiving
License for PANDAS about to be signed with NLA
Sub-licenses with consortium partners and contractor to follow
ITT concluded with Magus Research winning the contract.
Implement a common web archiving infrastructure (lots of Linux
machines + PANDAS)
Provide customisation/development of PANDAS
Provide help desk and support
42
International Internet Preservation
Consortium
Developing advanced web archiving technologies
Smart Crawler
Continuous adaptive crawler, adjusting crawl priority on the fly
Based on IA Heritrix
Working on requirements now
Expect to begin tender process in June
Content Management
Archival formats
Framework
Metrics and Test Bed
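One way to picture "adjusting crawl priority on the fly" is a frontier that revisits pages sooner when they were found changed and later when they were not. The sketch below is a generic illustration, not the Smart Crawler design and not Heritrix's API; the class name, the halve/double rule, the interval bounds and the URLs are all assumptions.

```python
import heapq
from typing import Dict, List, Tuple


class CrawlFrontier:
    """Hypothetical frontier ordered by next-due time (in hours)."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, str]] = []
        self._interval: Dict[str, float] = {}
        self._counter = 0   # tie-breaker so equal due times pop in insert order

    def _push(self, url: str, due: float) -> None:
        heapq.heappush(self._heap, (due, self._counter, url))
        self._counter += 1

    def add(self, url: str, interval: float = 24.0, now: float = 0.0) -> None:
        self._interval[url] = interval
        self._push(url, now)

    def next_due(self) -> Tuple[float, str]:
        due, _, url = heapq.heappop(self._heap)
        return due, url

    def report(self, url: str, crawled_at: float, changed: bool) -> None:
        # Revisit changed pages twice as often, unchanged pages half as often,
        # keeping the interval between one hour and thirty days.
        factor = 0.5 if changed else 2.0
        interval = min(max(self._interval[url] * factor, 1.0), 24.0 * 30)
        self._interval[url] = interval
        self._push(url, crawled_at + interval)


frontier = CrawlFrontier()
frontier.add("http://example.org/news")
frontier.add("http://example.org/about")
# First pass: the news page had changed, the about page had not.
for changed in (True, False):
    due, url = frontier.next_due()
    frontier.report(url, crawled_at=due, changed=changed)
next_visit = frontier.next_due()
```

After the first pass the frequently changing page is scheduled ahead of the static one.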
43
External Collaboration
44
Digital Library Collaborations/Partnerships
Current
UK Digital Preservation Coalition
Founder Member
TEL (The European Library Project)
Web Archiving UK
JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library Of Wales
International Internet Preservation Consortium
BNF, Library Of Congress, Internet Archive, National Archives & Library Of Canada, National Library Of
Australia, National Library Of Italy, National Libraries Of Nordic Countries
JISC Funded - Digital Curation Centre
Persistent Identifiers
DOI Foundation, European National Libraries (KB & DDB)
Resource Discovery
Union Catalogues (SUNCAT)
Digital Library Federation
45
Digital Library Collaborations/Partnerships
Potential
Secure Legal Deposit Network
6 Legal Deposit Libraries
Global Digital Format Registry
Potential Partners (National Archives, DLF)
Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
KB (Netherlands National Library & Other Partners – FP6 Bid)
Digital Rights Management
Potential Partners (Publishers, JISC)
Metadata
Publishers, Others ?
Authentication
JISC ?
Resource Discovery
Search Engine Vendors, Researchers
Others ???
46
Conclusions
Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success
Can You Work With Us?
47
Slide 34
British Computer Society
North London Branch
Major Programmes
Richard Boulderstone
July 27, 2004
1
Agenda
•
•
•
•
•
•
•
•
•
The British Library
Vision
Our Audiences/Customers
ILS
Digitisation
Digital Object Management
Web Archiving
Collaboration
Conclusions
Magna Carta
www.bl.uk
2
What Is The British Library ?
Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of
Science and Invention (1855), National Central Library (1916), and National
Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India
Office Library and Records in 1982, and British Institute of Recorded Sound in
1983
• Flagship building at St Pancras - largest public building project in Great
Britain in 20th century - opened in 1998
•
www.bl.uk
3
World-Class Research Library
Key Statistics 2002/3
•
•
•
•
•
•
•
•
•
•
150 million items
8.2 million items consulted or supplied
408,000 reading room visits
618,000 catalogue records created
554,000 items received on legal deposit
651 km shelf capacity 92% full add 12 km each year
18.5M Web Site Hits (www.bl.uk)
2,400 staff
£85.2 million Grant in Aid and £27.0 million trading income in 2001/2
Annual report - http://www.bl.uk/about/annual/latest.html
www.bl.uk
4
‘The World’s Knowledge’
Outcome Based Vision
… by aiding scientific advances
… by adding commercial value for
businesses
… by contributing to UK “knowledge
economy”
… through the pursuit of academic
excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural
wellbeing
… by giving information relevant to their
interests
… by helping to find the next medical
breakthrough
… by creating a link between the past,
present and future
To help people
advance
knowledge to
enrich lives
www.bl.uk
5
High
R+D
Industries
Prof.
Services
Creative
Industries Publishing
Industries
Resource Discovery
Bespoke Services
Research Services
Document Supply
Reprographics
Innovation Centre
School
Students
Libraries Teachers 11>18
On-site Visits
School Tours
Web Learning
SMEs
EDUCATION
Lifelong
Learner
Exhibitions
Events
Tours
Publishing
Visitors
(child + adult)
Lifelong
Learner
PUBLIC
BUSINESS
Lifelong
Learner
Postgraduate/
Undergraduate
RESEARCHER
Librarians
Scholars
Lifelong
Learner
Reading Rooms
Bespoke Services
Reprographics
Publishing
Document Supply
Searching Tools
LIBRARIES
Commercial
Researcher
Public
Libraries
Broadcasting
e.g. BBC
Publishing
e.g. OED
Document Supply
Resource Discovery
Training
Best Practice
H.E.
Libraries
Public
www.bl.uk
6
2
Major Programmes/1
Da Vinci Notebook
Integrated Library System (ILS)
Programme
7
ILS: Development
Data migration
Due to finish in a few days
16M+ BL records
10M+ records from other sources
Online ILS software
All online changes made (mainly interfaces) – final tests
Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
Most ones done for go live
Rest in priority order
www.bl.uk
8
ILS: Implementation
Training
Courses to end-users well underway
‘Practice’ system available
‘Search only’ training also underway
Testing
Functional testing (end to end) nearly complete
Performance poor – OPAC very slow
Automated stress testing (LoadRunner scripts)
eIS trying to find area of problem
Ex Libris experts flying over
Some security ‘hardening’ needed
www.bl.uk
9
ILS: Cutover from legacy systems
Now:
Temporary Aleph cataloguing
7 June:
Phase 1 – internal processing
Staggered take-on of users to ease cutover problems
Merge ‘temporary’ records
30 June:
Phase 2 – reading rooms
Reading rooms closed for cutover 26-29 June
Mainly brand-new PCs etc rather than XP upgrade
30 July:
Phase 3 – remote users
Could be delayed if there are major problems
www.bl.uk
10
www.bl.uk
11
Future ILS development (ILS/2)
Current ILS development seen just as the start
Extra records
E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
E.g. Preservation records
Links to other new BL systems
E.g. Digital Object Management (images, web pages etc)
New releases of Ex Libris packages
www.bl.uk
12
Major Programmes/2
International Dunhuang Project
Digitisation Programme
13
Background
Digitisation Is The Process Of Converting Existing Physical Items Into Digital
Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character
Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image to Sub £1/Image.
Automatic Devices To Digitise Are Becoming Available But Are Expensive ~£150,000.
BL Has Had Fairly Ad Hoc Approach Driven By
External Funding Opportunities
Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project
Management
BL Has Created About 1.5M Digital Images So Far…
www.bl.uk
14
Digitisation Strategy
Digitisation Strategy Project Was Formally Initiated On February 2, 2004
Key objectives for the project are to define:
Selection Criteria
Uniform Approach
Communications Plan
Sustainability
Intellectual Property Rights
External Relationship Management
Funding
Integration with DOMS
www.bl.uk
15
Project Status Information
Definitive Register of Projects
19 Complete
19 Current
20 Planning
JISC Sound (3,900 Hours)
JISC Newspapers (2M Pages of 750M Pages)
Chopin (Collaborative Project)
Early English Books Online
www.bl.uk
16
Major Programmes/3
Gutenberg Bible
Digital Object Management (DOM)
Programme
17
DOM Programme vision
Our mission is to enable the United Kingdom to preserve
and use its digital intellectual property forever
Our vision is to create a management system for digital objects
that will
store and preserve any type of digital material in perpetuity
provide access to this material to users with appropriate permissions
ensure that the material is easy to find
ensure that users can view the material with contemporary applications
ensure that users can, where possible, experience material with the original look-and-feel
www.bl.uk
18
Introduction - history
Digital Library PFI
Mar 1997 – Dec 1998
Digital Library System
1999 – early 2002
Lessons
DOM Report
Nov 2002
The DOM Programme
Started September 2003
www.bl.uk
19
Drivers for the BL DOM Programme
Legal deposit legislation for non-print material was granted royal assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12T of material per year (with 50 year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….
We need a generic and cost-effective approach for the secure long term
storage of digital material that is produced by numerous initiatives
www.bl.uk
20
DOM – many topics to address
[Chart: DOM components plotted by ESTIMATED SIZE OF COMPONENT (low to high) against COMPONENT AMBIGUITY / COMPLEXITY (low to high)]
Components shown: ILS, Web Archiving, Digitisation Programme, LDEP, Workflow, Resource Discovery, LDLSE, VDEP, Rights Management, Metadata Definition, Technical Requirements, File Format Registry, Persistent Identifiers, File Conversion Utilities, Authentication, Prototypes, SDM, RADM, Interfaces, Strategy Development
Legend: Started, Planned, Non-DOM projects, Planned co-operation
LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications
www.bl.uk
21
Scope - life cycle of objects
Collection
Selection
Acquisition
Accession
Description
Preservation
Storage
Preservation
Access
Resource discovery
Delivery
Rendering
www.bl.uk
22
Scope – objects and processes
Preservation store
Preserves the bit stream in perpetuity
Access store
Access versions
Limited formats – in the flavour of the era
Metadata to support resource discovery
Descriptive, Administrative, Links with existing tools e.g. Integrated
Library System (ILS)
Workflow
Ingest, e.g. Legal Deposit processing
www.bl.uk
23
[Diagram: DOM in the context of its content streams]
ACCESS: DOM Resource Discovery, Delivery, Digital Rights Management
Shared services and DOM Storage: Signing, Authentication, Metadata, Persistent ID
Ingest from:
DONATIONS
DOCUMENT SUPPLY: Publishers, Archives, Grey Literature, Non-Serial Store, Archiving, Operational Stores
LEGAL DEPOSIT: LDL Secure Environment, Legal Deposit Items, Legal Deposit Processing
WEB ARCHIVING
DIGITISATION: St Pancras Studios, Newspapers, NSA
www.bl.uk
24
Timeline
Milestones, 2003 to 2005:
BC – ET approve Business Case & Timeline
R0 – Definition
• Prototype will provide a basic preservation-quality digital object storage module
R1 – Operational Storage Sub System
• Consolidate R0 into operational system
• Provide preservation-quality digital store for materials received under Voluntary Deposit of Electronic Publications (VDEP)
• Integrate it with the existing VDEP front-end
R2 – 1st Content Stream ingest
• Support ingest for a major content stream
• Integrate with core Library systems as required
R3 & R4 – LDEP, initial format
• Provide functionality for material covered by LDEP secondary legislation
R5+ – Open DOM to new projects
www.bl.uk
25
DOM: Project definition - 1
Functional architecture “what”
Example issues: digital rights, file formats, etc
Prototyping – assessing market solutions
Logical architecture “how – overall architecture”
Example issues: allow changes to new suppliers, relationships to ILS, other projects etc
Prototyping – basic functioning architecture
Physical architecture “how – storage & specifics”
Example issues: how do we build it cost-effectively today, supplier selection criteria
Prototyping – principal solutions and options
• Business case
• Planning – incremental implementation phases
Cross team workshops – reviewing progress, debating detailed technical
issues, planning immediate priorities, risk management & way forward
www.bl.uk
26
DOM: Project definition - 2
Approach is to be incremental and not ‘Big Bang’
We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a
good solution
A principal goal is to define:
An overall long term “logical architecture”
Within which, there will be successive generations of physical architectures
We are building an understanding of the storage marketplace, and we will use that knowledge to manage
procurement
We are certain that we will need >500T of storage but we are uncertain when – we thus need
flexible scalable procurement
www.bl.uk
27
DOM architecture - overview
[Diagram: client services pass a DOMID and an object to and from the DOM Storage Service, which sits on top of DOM Physical Storage]
Client services: Resource Discovery (ILS based RD, Non-cat, Others), Rights Management, LDEP, Doc supply
DOM Storage Service: Compound objects/relations, Integrity, Authenticity, Unique persistent identifier (DOMID)
DOMID is mapped to node/vol/LRL (Local resource locator)
DOM Physical Storage: Atomic Objects
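The statement that a DOMID is mapped to node/vol/LRL can be illustrated with a small lookup table. This is only a sketch under assumptions: the class names, the identifier format "domid:0001" and the node and volume names are hypothetical and do not come from the slides. It shows the one property the slide implies, namely that the persistent identifier stays fixed while the physical location behind it can change.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    """Where an object currently lives (field names are illustrative)."""
    node: str  # storage node
    vol: str   # volume on that node
    lrl: str   # local resource locator within the volume


class DomidResolver:
    """Maps a persistent DOMID to the object's current physical location."""

    def __init__(self) -> None:
        self._table: dict[str, Location] = {}

    def register(self, domid: str, location: Location) -> None:
        self._table[domid] = location

    def resolve(self, domid: str) -> Location:
        return self._table[domid]

    def relocate(self, domid: str, new_location: Location) -> None:
        # Moving an object to new storage changes the mapping, never the DOMID.
        if domid not in self._table:
            raise KeyError(domid)
        self._table[domid] = new_location


resolver = DomidResolver()
resolver.register("domid:0001", Location("node-a", "vol-03", "obj/0001"))
resolver.relocate("domid:0001", Location("node-b", "vol-11", "obj/0001"))
current = resolver.resolve("domid:0001")
print(current)
```

Callers that hold only the DOMID are unaffected by the move, which is what lets successive generations of physical storage sit under one logical architecture.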
www.bl.uk
28
DOM System (release 3)
Aleph
Storage subsystem
Storage
subsystem
Access
Mailroom
Publishers
Shared services
Administration
DOM System
www.bl.uk
29
DOM logical architecture –
integrity and authenticity
Integrity:
System has capability to continuously monitor the object store to detect
object corruption
It would then initiate object recovery
Authenticity:
A process is defined to provide long-term assurance that an object that
is re-presented is as it was when it was ingested
Based on the use of cryptographic signing techniques
Each object is signed when it is ingested
The signature is verified when required
The signing mechanism is “tightly” controlled
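The sign-at-ingest and verify-when-required steps, together with the integrity monitoring described above, can be sketched as follows. The slides do not say which cryptographic technique is used, so this sketch makes an assumption: it uses a keyed hash (HMAC-SHA256 from the Python standard library) as a simple stand-in for a signature. The key, the in-memory dictionaries and the function names are all hypothetical.

```python
import hashlib
import hmac

# Assumption: an illustrative key; a real signing service would guard its key tightly.
SIGNING_KEY = b"demo-key-held-by-the-signing-service"


def sign(data: bytes) -> str:
    """Return a keyed SHA-256 tag for the object's bit stream."""
    return hmac.new(SIGNING_KEY, data, hashlib.sha256).hexdigest()


def verify(data: bytes, signature: str) -> bool:
    """Check that the bit stream still matches the tag made at ingest."""
    return hmac.compare_digest(sign(data), signature)


store: dict[str, bytes] = {}
signatures: dict[str, str] = {}


def ingest(object_id: str, data: bytes) -> None:
    """Each object is signed when it is ingested."""
    store[object_id] = data
    signatures[object_id] = sign(data)


def scan_for_corruption() -> list[str]:
    """Integrity sweep: list the objects whose content no longer verifies."""
    return [oid for oid, data in store.items() if not verify(data, signatures[oid])]


ingest("obj-1", b"first object")
ingest("obj-2", b"second object")
store["obj-2"] = b"second objecT"  # simulate silent corruption of one object
damaged = scan_for_corruption()
print(damaged)  # ['obj-2']
```

An object reported by the sweep would then be recovered from another copy, for example from a peer storage cluster.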
www.bl.uk
30
Procuring physical storage in volume
A major cost is in physical storage
The market for storage systems is changing rapidly, and this implies that “lock-in” is
not sensible
We thus need flexibility to change supplier over time
Cost of storage is reducing by 30-40% per year
Hence procure on rolling basis just ahead of demand
Replace storage on a rolling basis on expiry of warranty
The rolling programmes imply the need to be able to support a heterogeneous
product solution
The design of the logical architecture thus supports storage sourced from multiple
storage vendors
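The argument for procuring just ahead of demand can be checked with simple arithmetic. The figures below are assumptions chosen for illustration only (a starting price of 1,000 per terabyte, demand growing by 100T a year, and a 35% annual price decline taken from the middle of the 30-40% range quoted above). The sketch also ignores the cost of replacing storage when warranties expire.

```python
def upfront_cost(demand_tb, price_per_tb):
    """Buy the peak capacity in year 0 at today's price."""
    return max(demand_tb) * price_per_tb


def rolling_cost(demand_tb, price_per_tb, annual_decline):
    """Each year buy only the extra capacity needed, at that year's price."""
    total = 0.0
    owned = 0.0
    for year, needed in enumerate(demand_tb):
        extra = max(0.0, needed - owned)
        total += extra * price_per_tb * (1 - annual_decline) ** year
        owned += extra
    return total


demand = [100, 200, 300, 400, 500]  # hypothetical terabytes needed in years 0 to 4
price = 1000.0                      # hypothetical price per terabyte in year 0
decline = 0.35                      # assumed annual price reduction

upfront = upfront_cost(demand, price)
rolling = rolling_cost(demand, price, decline)
print(f"up-front: {upfront:,.0f}  rolling: {rolling:,.0f}")
```

Under these assumptions the rolling purchase costs roughly half as much as buying everything at the start, and it also leaves room to change supplier at each purchase.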
www.bl.uk
31
Disaster tolerance and the organisation
of storage clusters
One can obtain commercial disaster recovery (DR) solutions for common equipment
configurations
However one cannot obtain such solutions for multi-100 TB systems
So we must build DR into the design of the system
A single site solution, subject to a common-mode disaster, would suffer considerable loss
of availability after a disaster, and so is not acceptable
This implies that we need a multi-site solution
Conventionally these are based on a master-standby where only 50% of kit is delivering
normal service
Our design is based on the use of multiple autonomous independent peer clusters that
cross-synchronise
so 100% of the kit delivers normal service
www.bl.uk
32
DOM architecture in the context of the
storage solution market
The dominant segment of the market focuses on delivering performance within
a highly resilient single cluster
However:
Many of our objects will be rarely accessed
so we do not want to pay for “maximised” performance we do not need
We have resilience by using multiple clusters, hence we have a reduced
need for resilience within a cluster
so we do not want to pay for “maximised” resilience we do not need
We are using these drivers to design a cost-effective large scale resilient
solution
www.bl.uk
33
DOM storage subsystem architecture - overview
[Diagram: two storage clusters, each made up of a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, linked by DOM Shared Services]
DOM Shared Services:
• Unique ID
• Signing
• Logging
www.bl.uk
34
DOM storage subsystem architecture - access
Normal access/delivery is from local storage cluster
[Diagram: two storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical Storage) and DOM central Shared Services (Unique ID, Signing, Logging); the access path runs through the local cluster]
www.bl.uk
35
DOM storage subsystem architecture - access
When a cluster is off-line then access/delivery is from a remote storage cluster
[Diagram: two storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical Storage) and DOM central Shared Services (Unique ID, Signing, Logging); the access path runs through the remote cluster]
www.bl.uk
36
DOM storage subsystem architecture - ingest
Normal ingest is to the local storage cluster and then the remote cluster is synchronised
[Diagram: two storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical Storage) and DOM central Shared Services (Unique ID, Signing, Logging); arrows labelled "Store" and "Synchronise remote store"]
www.bl.uk
37
DOM storage subsystem architecture - ingest
When a cluster is off-line then ingest is managed by the remote storage cluster and the local cluster is synchronised later
[Diagram: two storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical Storage) and DOM central Shared Services (Unique ID, Signing, Logging); arrows labelled "Store" and "Synchronise remote store later"]
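The access and ingest behaviour of two peer clusters, as described on the preceding slides, can be sketched in a few lines. This is a toy model under assumptions: the class and method names are hypothetical, dictionaries stand in for physical storage, and synchronisation is shown as an immediate copy rather than the background process a real system would need.

```python
class Cluster:
    """One autonomous storage cluster (a dict stands in for physical storage)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.online = True
        self.objects: dict[str, bytes] = {}
        self.pending: dict[str, bytes] = {}  # objects the peer has not received yet


class PeerPair:
    """Two peer clusters that cross-synchronise; neither is a passive standby."""

    def __init__(self, a: Cluster, b: Cluster) -> None:
        self.clusters = (a, b)

    def _peer(self, cluster: Cluster) -> Cluster:
        a, b = self.clusters
        return b if cluster is a else a

    def _available(self, local: Cluster) -> Cluster:
        # Prefer the local cluster; fall back to the remote one if it is off-line.
        target = local if local.online else self._peer(local)
        if not target.online:
            raise RuntimeError("no storage cluster available")
        return target

    def ingest(self, local: Cluster, object_id: str, data: bytes) -> str:
        target = self._available(local)
        target.objects[object_id] = data
        other = self._peer(target)
        if other.online:
            other.objects[object_id] = data   # synchronise the other store now
        else:
            target.pending[object_id] = data  # synchronise it later
        return target.name

    def read(self, local: Cluster, object_id: str) -> bytes:
        return self._available(local).objects[object_id]

    def bring_online(self, cluster: Cluster) -> None:
        # Catch up on everything ingested by the peer during the outage.
        peer = self._peer(cluster)
        cluster.objects.update(peer.pending)
        peer.pending.clear()
        cluster.online = True


site1, site2 = Cluster("site-1"), Cluster("site-2")
pair = PeerPair(site1, site2)

pair.ingest(site1, "obj-1", b"stored while both sites are up")
site1.online = False
handled_by = pair.ingest(site1, "obj-2", b"stored during the outage")
served = pair.read(site1, "obj-1")  # delivered from the remote cluster
pair.bring_online(site1)
print(handled_by, sorted(site1.objects))
```

Because both clusters accept reads and writes in normal operation, all of the equipment delivers service, in contrast to a master-standby arrangement.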
www.bl.uk
38
In conclusion
We plan for generations of physical storage
Migration from one generation to the next
Allow changes of supplier
Purchase incrementally in modest quantities
Move quickly when required
Be cost conscious
We provide assurance that an object is held and re-presented as when it was
ingested
We are designing a cost-effective large scale resilient solution
In summary: we take a long term view
www.bl.uk
39
Major Programmes/4
Web Archiving Programme
40
Structure of Programme
Web Archiving Programme is a collaborative
initiative, broadly implemented across two
consortia
UK Web Archiving Consortium
Developing a selective approach to web archiving, procuring a common
web archiving infrastructure and software to begin archiving activities at
the earliest opportunity
International Internet Preservation Consortium
Developing advanced web archiving technologies for the long-term,
large-scale, continuous crawling requirements enabled through
legislation
www.bl.uk
41
UK Web Archiving Consortium
Developing a selective approach to web archiving
License for PANDAS about to be signed with NLA
Sub-licenses with consortium partners and contractor to follow
ITT concluded with Magus Research winning the contract.
Implement a common web archiving infrastructure (lots of Linux
machines + PANDAS)
Provide customisation/development of PANDAS
Provide help desk and support
www.bl.uk
42
International Internet Preservation
Consortium
Developing advanced web archiving technologies
Smart Crawler
Continuous adaptive crawler, adjusting crawl priority on the fly
Based on IA Heritrix
Working on requirements now
Expect to begin tender process in June
Content Management
Archival formats
Framework
Metrics and Test Bed
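The idea of a continuous adaptive crawler that adjusts crawl priority on the fly can be illustrated with a priority queue keyed on the next due time. This is a hypothetical sketch, not the Smart Crawler design and not Heritrix's scheduling: the class name, the halving and doubling rule, the interval bounds and the example URLs are all assumptions.

```python
import heapq


class AdaptiveFrontier:
    """Crawl queue that revisits frequently changing pages sooner."""

    def __init__(self, base_interval=24.0, min_interval=1.0, max_interval=720.0):
        self.base_interval = base_interval  # hours between visits for a new URL
        self.min_interval = min_interval
        self.max_interval = max_interval
        self._heap = []       # entries are (next_due_hour, url)
        self._interval = {}   # current revisit interval per URL

    def add(self, url, now=0.0):
        self._interval[url] = self.base_interval
        heapq.heappush(self._heap, (now, url))

    def next_url(self):
        """Pop the URL that is due soonest."""
        return heapq.heappop(self._heap)

    def report(self, url, now, changed):
        """Reschedule a URL: halve its interval if it changed, double it if not."""
        interval = self._interval[url]
        interval = interval / 2 if changed else interval * 2
        interval = min(self.max_interval, max(self.min_interval, interval))
        self._interval[url] = interval
        heapq.heappush(self._heap, (now + interval, url))


frontier = AdaptiveFrontier()
frontier.add("https://example.org/news")
frontier.add("https://example.org/static")

# First pass: pretend the news page changed and the static page did not.
for _ in range(2):
    due, url = frontier.next_url()
    frontier.report(url, now=due, changed=url.endswith("/news"))

first_due = frontier.next_url()
print(first_due)  # (12.0, 'https://example.org/news')
```

After one pass the changing page is due again in 12 hours and the static page in 48, so crawl effort shifts towards content that is actually changing.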
www.bl.uk
43
External Collaboration
44
Digital Library Collaborations/Partnerships
Current
UK Digital Preservation Coalition
Founder Member
TEL (The European Library Project)
Web Archiving UK
JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library Of Wales
International Internet Preservation Consortium
BNF, Library Of Congress, Internet Archive, National Archives & Library Of Canada, National Library Of
Australia, National Library Of Italy, National Libraries Of Nordic Countries
JISC Funded - Digital Curation Centre
Persistent Identifiers
DOI Foundation, European National Libraries (KB & DDB)
Resource Discovery
Union Catalogues (SUNCAT)
Digital Library Federation
www.bl.uk
45
Digital Library Collaborations/Partnerships
Potential
Secure Legal Deposit Network
6 Legal Deposit Libraries
Global Digital Format Registry
Potential Partners (National Archives, DLF)
Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
KB (Netherlands National Library & Other Partners – FP6 Bid)
Digital Rights Management
Potential Partners (Publishers, JISC)
Metadata
Publishers, Others ?
Authentication
JISC ?
Resource Discovery
Search Engine Vendors, Researchers
Others ???
www.bl.uk
46
Conclusions
• Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success
• Can You Work With Us?
www.bl.uk
47
Slide 36
British Computer Society
North London Branch
Major Programmes
Richard Boulderstone
July 27, 2004
1
Agenda
• The British Library
• Vision
• Our Audiences/Customers
• ILS
• Digitisation
• Digital Object Management
• Web Archiving
• Collaboration
• Conclusions
Magna Carta
www.bl.uk
2
What Is The British Library?
• Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of
Science and Invention (1855), National Central Library (1916), and National
Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India
Office Library and Records in 1982, and British Institute of Recorded Sound in
1983
• Flagship building at St Pancras - largest public building project in Great
Britain in 20th century - opened in 1998
3
World-Class Research Library
Key Statistics 2002/3
• 150 million items
• 8.2 million items consulted or supplied
• 408,000 reading room visits
• 618,000 catalogue records created
• 554,000 items received on legal deposit
• 651 km shelf capacity, 92% full; adding 12 km each year
• 18.5M web site hits (www.bl.uk)
• 2,400 staff
• £85.2 million Grant in Aid and £27.0 million trading income in 2001/2
• Annual report - http://www.bl.uk/about/annual/latest.html
4
‘The World’s Knowledge’
Outcome Based Vision
To help people advance knowledge to enrich lives
… by aiding scientific advances
… by adding commercial value for businesses
… by contributing to UK “knowledge economy”
… through the pursuit of academic excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural wellbeing
… by giving information relevant to their interests
… by helping to find the next medical breakthrough
… by creating a link between the past, present and future
5
[Diagram: Our Audiences/Customers, grouped as Business, Education, Public, Researcher and Libraries]
Audiences shown: High R+D Industries, Professional Services, Creative Industries, Publishing Industries, SMEs; School Students, Libraries and Teachers (11>18); Visitors (child + adult); Postgraduate/Undergraduate, Scholars, Librarians, Commercial Researcher; Public Libraries, H.E. Libraries, Broadcasting (e.g. BBC), Publishing (e.g. OED); Lifelong Learner (shown in several groups)
Services shown: Resource Discovery, Bespoke Services, Research Services, Document Supply, Reprographics, Innovation Centre, On-site Visits, School Tours, Web Learning, Exhibitions, Events, Tours, Publishing, Reading Rooms, Searching Tools, Training, Best Practice
6
Major Programmes/1
Da Vinci Notebook
Integrated Library System (ILS)
Programme
7
ILS: Development
Data migration
Due to finish in a few days
16M+ BL records
10M+ records from other sources
Online ILS software
All online changes made (mainly interfaces) – final tests
Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
Most done for go-live
Rest in priority order
8
ILS: Implementation
Training
Courses to end-users well underway
‘Practice’ system available
‘Search only’ training also underway
Testing
Functional testing (end to end) nearly complete
Performance poor – OPAC very slow
Automated stress testing (LoadRunner scripts)
eIS trying to find area of problem
Ex Libris experts flying over
Some security ‘hardening’ needed
9
ILS: Cutover from legacy systems
Now:
Temporary Aleph cataloguing
7 June:
Phase 1 – internal processing
Staggered take-on of users to ease cutover problems
Merge ‘temporary’ records
30 June:
Phase 2 – reading rooms
Reading rooms closed for cutover 26-29 June
Mainly brand-new PCs etc rather than XP upgrade
30 July:
Phase 3 – remote users
Could be delayed if there are major problems
10
11
Future ILS development (ILS/2)
Current ILS development seen just as the start
Extra records
E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
E.g. Preservation records
Links to other new BL systems
E.g. Digital Object Management (images, web pages etc)
New releases of Ex Libris packages
12
Major Programmes/2
International Dunhuang Project
Digitisation Programme
13
Background
Digitisation Is The Process Of Converting Existing Physical Items Into Digital
Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character
Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image to Sub £1/Image.
Automatic Devices To Digitise Are Becoming Available But Are Expensive (~£150,000).
BL Has Had Fairly Ad Hoc Approach Driven By
External Funding Opportunities
Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project
Management
BL Has Created About 1.5M Digital Images So Far…
14
Digitisation Strategy
Digitisation Strategy Project Was Formally Initiated On February 2, 2004
Key objectives for the project are to define:
Selection Criteria
Uniform Approach
Communications Plan
Sustainability
Intellectual Property Rights
External Relationship Management
Funding
Integration with DOMS
15
Project Status Information
Definitive Register of Projects
19 Complete
19 Current
20 Planning
JISC Sound (3,900 Hours)
JISC Newspapers (2M Pages of 750M Pages)
Chopin (Collaborative Project)
Early English Books Online
16
Major Programmes/3
Gutenberg Bible
Digital Object Management (DOM)
Programme
17
DOM Programme vision
Our mission is to enable the United Kingdom to preserve
and use its digital intellectual property forever
Our vision is to create a management system for digital objects
that will
store and preserve any type of digital material in perpetuity
provide access to this material to users with appropriate permissions
ensure that the material is easy to find
ensure that users can view the material with contemporary applications
ensure that users can, where possible, experience material with the original look-and-feel
18
Introduction - history
Digital Library PFI
Mar 1997 – Dec 1998
Digital Library System
1999 – early 2002
Lessons
DOM Report
Nov 2002
The DOM Programme
Started September 2003
19
Drivers for the BL DOM Programme
Legal deposit legislation for non-print material was granted royal assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12T of material per year (with 50 year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….
We need a generic and cost-effective approach for the secure long term
storage of digital material that is produced by numerous initiatives
20
DOM – many topics to address
[Chart: DOM components plotted by estimated size of component (low to high) against component ambiguity/complexity (low to high). Legend: Started, Planned, Non-DOM projects, Planned co-operation.]
Components shown: ILS, Web Archiving, Digitisation Programme, LDEP, Workflow, Resource Discovery, LDLSE, VDEP, Rights Management, Metadata Definition, Technical Requirements, File Format Registry, Persistent Identifiers, File Conversion Utilities, Authentication, Prototypes, SDM, RADM, Interfaces, Strategy Development
LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications
21
Scope - life cycle of objects
Collection: Selection, Acquisition, Accession, Description
Preservation: Storage, Preservation
Access: Resource discovery, Delivery, Rendering
22
Scope – objects and processes
Preservation store
Preserves the bit stream in perpetuity
Access store
Access versions
Limited formats – in the flavour of the era
Metadata to support resource discovery
Descriptive, Administrative, Links with existing tools e.g. Integrated
Library System (ILS)
Workflow
Ingest, e.g. Legal Deposit processing
23
[Diagram: DOM in context, from ingest sources through DOM to access]
Ingest sources: Donations; Document Supply (Publishers, Archives, Grey Literature, Non-Serial); Legal Deposit (Legal Deposit Items, Legal Deposit Processing, LDL Secure Environment); Web Archiving; Digitisation (St Pancras Studios, Newspapers, NSA)
DOM: Ingest, Storage
Shared services: Signing, Authentication, Metadata, Persistent ID
Access: Resource Discovery, Delivery, Digital Rights Management
Other labels: Store, Archiving, Operational Stores
24
Timeline
[Timeline: 2003 to 2005, with milestones BC, R0, R1, R2, R3, R4]
Definition. R0
Prototype will provide a basic preservation-quality digital object storage module
ET approve Business Case & Timeline (BC)
Operational Storage Sub System. R1
• Consolidate R0 into operational system
• Provide preservation-quality digital store for materials received under Voluntary Deposit of Electronic Publications (VDEP)
• Integrate it with the existing VDEP front-end
1st Content Stream ingest. R2
• Support ingest for a major content stream
• Integrate with core Library systems as required
LDEP - initial format. R3 & R4
• Provide functionality for material covered by LDEP secondary legislation
Open DOM to new projects. R5+
25
DOM: Project definition - 1
Functional architecture “what”
Prototyping – assessing market solutions
Example issues: digital rights, file formats, etc
Logical architecture “how – overall architecture”
Prototyping – basic functioning architecture
Example issues: allow changes to new suppliers, relationships to ILS, other projects etc
Physical architecture “how – storage & specifics”
Prototyping – principal solutions and options
Example issues: how do we build it cost-effectively today, supplier selection criteria
• Business case
• Planning – incremental implementation phases
Cross team workshops – reviewing progress, debating detailed technical
issues, planning immediate priorities, risk management & way forward
26
DOM: Project definition - 2
Approach is to be incremental and not ‘Big Bang’
We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a
good solution
A principal goal is to define:
An overall long term “logical architecture”
Within which, there will be successive generations of physical architectures
We are building an understanding of the storage marketplace, and we will use that knowledge to
manage procurement
We are certain that we will need >500T of storage but we are uncertain when – we thus need
flexible scalable procurement
27
DOM architecture - overview
[Diagram: DOM architecture overview]
Client systems: Resource Discovery (ILS-based RD, Non-cat, Others), Rights Management, LDEP, Doc supply
Each refers to an object by its unique persistent identifier (DOMID)
DOM Storage Service: Compound objects/relations, Atomic Objects, Integrity, Authenticity
DOMID is mapped to node/vol/LRL (local resource locator)
DOM Physical Storage
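
The mapping of a DOMID to node/vol/LRL can be pictured as a lookup table that sits between the persistent identifier and the physical storage. The sketch below is only an illustration of that idea: the class names, field names and sample identifiers are hypothetical and are not taken from the DOM design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    """Where one copy of an object lives (hypothetical field names)."""
    node: str    # storage node
    volume: str  # volume on that node
    lrl: str     # local resource locator within the volume


class DomidResolver:
    """Maps a persistent DOMID to its current physical location."""

    def __init__(self):
        self._table = {}  # DOMID -> Location

    def register(self, domid, location):
        self._table[domid] = location

    def relocate(self, domid, new_location):
        # Moving an object to new storage changes only this mapping;
        # the DOMID held by other systems stays the same.
        if domid not in self._table:
            raise KeyError(domid)
        self._table[domid] = new_location

    def resolve(self, domid):
        return self._table[domid]


resolver = DomidResolver()
resolver.register("domid-0001", Location("node-a", "vol-03", "2004/07/obj.bin"))
resolver.relocate("domid-0001", Location("node-b", "vol-11", "2004/07/obj.bin"))
print(resolver.resolve("domid-0001").node)  # node-b
```

Keeping the identifier separate from the location is what would let a later generation of physical storage replace an earlier one without changing references held in the ILS or other systems.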
28
DOM System (release 3)
[Diagram: DOM System (release 3). Labels shown: Aleph, Mailroom, Publishers, Access, Administration, Shared services, two Storage subsystems, DOM System]
29
DOM logical architecture –
integrity and authenticity
Integrity:
System has capability to continuously monitor the object store to detect
object corruption
It would then initiate object recovery
Authenticity:
A process is defined to provide long-term assurance that an object that
is re-presented is as it was when it was ingested
Based on the use of cryptographic signing techniques
Each object is signed when it is ingested
The signature is verified when required
The signing mechanism is “tightly” controlled
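
A minimal sketch of sign-on-ingest and verify-on-demand follows. The slides say only that cryptographic signing techniques are used; this example assumes HMAC-SHA256 from the Python standard library as a stand-in, and the key, store layout and function names are all hypothetical.

```python
import hashlib
import hmac

# Assumption: a secret held only by the tightly controlled signing service.
SIGNING_KEY = b"demo-key-held-by-the-signing-service"

store = {}  # DOMID -> (object bytes, signature recorded at ingest)


def sign(data):
    """Return a hex signature over the object's bytes."""
    return hmac.new(SIGNING_KEY, data, hashlib.sha256).hexdigest()


def verify(data, signature):
    """True if the bytes still match the signature made at ingest."""
    return hmac.compare_digest(sign(data), signature)


def ingest(domid, data):
    store[domid] = (data, sign(data))


def audit():
    """Scan the whole store and list DOMIDs that fail verification."""
    return [d for d, (data, sig) in store.items() if not verify(data, sig)]


ingest("domid-0001", b"first object")
ingest("domid-0002", b"second object")

# Simulate corruption of one stored object.
data, sig = store["domid-0002"]
store["domid-0002"] = (data + b"\x00", sig)

corrupted = audit()
print(corrupted)  # ['domid-0002']
```

An audit loop of this kind, run continuously, is one way the integrity monitoring described above could detect a damaged object and trigger recovery from another copy.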
30
Procuring physical storage in volume
A major cost is in physical storage
The market for storage systems is changing rapidly, and this implies that “lock-in” is
not sensible
We thus need flexibility to change supplier over time
Cost of storage is reducing by 30-40% per year
Hence procure on rolling basis just ahead of demand
Replace storage on a rolling basis on expiry of warranty
The rolling programmes imply the need to be able to support a heterogeneous
product solution
The design of the logical architecture thus supports storage sourced from multiple
storage vendors
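
The argument for rolling procurement can be checked with simple arithmetic. The sketch below compares buying all capacity up front with buying each year's requirement at that year's price. The demand figures and starting price are invented for illustration; only the annual price decline (35%, the midpoint of the 30-40% quoted above) comes from the slide.

```python
def upfront_cost(demand_tb, price_per_tb):
    """Buy every year's capacity in year 0 at today's price."""
    return sum(demand_tb) * price_per_tb


def rolling_cost(demand_tb, price_per_tb, annual_decline):
    """Buy each year's capacity in that year, at that year's price."""
    total = 0.0
    for year, tb in enumerate(demand_tb):
        total += tb * price_per_tb * (1 - annual_decline) ** year
    return total


demand = [100, 100, 100, 100, 100]  # TB needed per year (hypothetical)
price = 1000.0                      # cost per TB in year 0 (hypothetical)

up = upfront_cost(demand, price)
roll = rolling_cost(demand, price, annual_decline=0.35)
print(f"up front: {up:,.0f}  rolling: {roll:,.0f}")
```

Under these assumptions the rolling purchase costs roughly half of the up-front purchase, before counting the further saving from not powering and maintaining capacity that is not yet needed.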
31
Disaster tolerance and the organisation
of storage clusters
One can obtain commercial disaster recovery (DR) solutions for common equipment
configurations
However, one cannot obtain such solutions for multi-100 TB systems
So we must build DR into the design of the system
A single site solution, subject to a common-mode disaster, would suffer considerable loss
of availability after a disaster, and so is not acceptable
This implies that we need a multi-site solution
Conventionally these are based on a master-standby where only 50% of kit is delivering
normal service
Our design is based on the use of multiple autonomous independent peer clusters that
cross-synchronise
so 100% of the kit delivers normal service
32
DOM architecture in the context of the
storage solution market
The dominant segment of the market focuses on delivering performance within
a highly resilient single cluster
However:
Many of our objects will be rarely accessed
so we do not want to pay for “maximised” performance we do not need
We have resilience by using multiple clusters, hence we have a reduced
need for resilience within a cluster
so we do not want to pay for “maximised” resilience we do not need
We are using these drivers to design a cost-effective large scale resilient
solution
33
DOM storage subsystem architecture - overview
[Diagram: two storage clusters, each made up of a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, together with DOM Shared Services]
DOM Shared Services
• Unique ID
• Signing
• Logging
34
DOM storage subsystem architecture - access
Normal access/delivery is from local storage cluster
[Diagram: the two storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical Storage) and the DOM shared central services (Unique ID, Signing, Logging), with the access path running through the local cluster]
35
DOM storage subsystem architecture - access
When a cluster is off-line then access/delivery is from a remote storage cluster
[Diagram: the two storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical Storage) and the DOM shared central services (Unique ID, Signing, Logging), with the access path running through the remote cluster]
36
DOM storage subsystem architecture - ingest
Normal ingest is to the local storage cluster and then the remote cluster is synchronised
[Diagram: the two storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical Storage) and the DOM shared central services (Unique ID, Signing, Logging), with arrows labelled "Store" and "Synchronise remote store"]
37
DOM storage subsystem architecture - ingest
When a cluster is off-line then ingest is managed by the remote storage cluster and the local cluster is synchronised later
[Diagram: the two storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical Storage) and the DOM shared central services (Unique ID, Signing, Logging), with arrows labelled "Store" and "Synchronise remote store later"]
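
The access and ingest behaviour shown on the last four slides can be sketched as a small routing rule in the gateway: use the local cluster when it is on-line, fall back to the remote cluster when it is not, and queue writes for any cluster that is off-line. This is a toy model under that reading of the slides; the class names, method names and site names are hypothetical.

```python
class Cluster:
    """One autonomous storage cluster (toy in-memory model)."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.objects = {}  # DOMID -> bytes


class Gateway:
    """Routes ingest and access between a local and a remote peer cluster."""

    def __init__(self, local, remote):
        self.local = local
        self.remote = remote
        self.backlog = []  # (cluster, domid, data) waiting to be synchronised

    def ingest(self, domid, data):
        if self.local.online:
            primary, secondary = self.local, self.remote
        else:
            primary, secondary = self.remote, self.local
        if not primary.online:
            raise RuntimeError("no storage cluster available")
        primary.objects[domid] = data
        if secondary.online:
            secondary.objects[domid] = data  # synchronise the peer now
        else:
            self.backlog.append((secondary, domid, data))  # synchronise later
        return primary.name

    def access(self, domid):
        for cluster in (self.local, self.remote):
            if cluster.online:
                return cluster.objects[domid]
        raise RuntimeError("no storage cluster available")

    def synchronise(self):
        """Replay queued writes to clusters that are back on-line."""
        remaining = []
        for cluster, domid, data in self.backlog:
            if cluster.online:
                cluster.objects[domid] = data
            else:
                remaining.append((cluster, domid, data))
        self.backlog = remaining


site_a, site_b = Cluster("site-a"), Cluster("site-b")
gateway = Gateway(local=site_a, remote=site_b)

gateway.ingest("domid-0001", b"one")               # stored locally, peer synchronised
site_a.online = False
served_by = gateway.ingest("domid-0002", b"two")   # handled by the remote cluster
fetched = gateway.access("domid-0001")             # delivered from the remote cluster
site_a.online = True
gateway.synchronise()                              # local cluster catches up
print(served_by, sorted(site_a.objects))
```

Because both clusters accept work in normal operation, neither sits idle as a standby, which is the "100% of the kit delivers normal service" point made earlier.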
38
In conclusion
We plan for generations of physical storage
Migration from one generation to the next
Allow changes of supplier
Purchase incrementally in modest quantities
Move quickly when required
Be cost conscious
We provide assurance that an object is held and re-presented as it was when it was
ingested
We are designing a cost-effective large scale resilient solution
In summary: we take a long term view
39
Major Programmes/4
Web Archiving Programme
40
Structure of Programme
Web Archiving Programme is a collaborative
initiative, broadly implemented across two
consortia
UK Web Archiving Consortium
Developing a selective approach to web archiving, procuring a common
web archiving infrastructure and software to begin archiving activities at
the earliest opportunity
International Internet Preservation Consortium
Developing advanced web archiving technologies for the long-term,
large-scale, continuous crawling requirements enabled through
legislation
41
UK Web Archiving Consortium
Developing a selective approach to web archiving
License for PANDAS about to be signed with NLA
Sub-licenses with consortium partners and contractor to follow
ITT concluded with Magus Research winning the contract.
Implement a common web archiving infrastructure (lots of Linux
machines + PANDAS)
Provide customisation/development of PANDAS
Provide help desk and support
42
International Internet Preservation
Consortium
Developing advanced web archiving technologies
Smart Crawler
Continuous adaptive crawler, adjusting crawl priority on the fly
Based on IA Heritrix
Working on requirements now
Expect to begin tender process in June
Content Management
Archival formats
Framework
Metrics and Test Bed
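
One way to picture a crawler that adjusts crawl priority on the fly is a priority queue ordered by next visit time, where pages found to have changed are revisited sooner and unchanged pages less often. The sketch below is a generic illustration of that idea only. It is not the Heritrix design or the consortium's requirements, and the class name, intervals and URLs are hypothetical.

```python
import heapq


class AdaptiveFrontier:
    """Queue of URLs ordered by next visit time, with adaptive revisit intervals."""

    def __init__(self, min_interval=1.0, max_interval=64.0):
        self._heap = []      # entries are (next_visit_time, url)
        self._interval = {}  # url -> current revisit interval
        self.min_interval = min_interval
        self.max_interval = max_interval

    def add(self, url, now=0.0, interval=8.0):
        self._interval[url] = interval
        heapq.heappush(self._heap, (now + interval, url))

    def pop_due(self):
        """Return (due_time, url) for the page that should be crawled next."""
        return heapq.heappop(self._heap)

    def report(self, url, visited_at, changed):
        """Halve the interval if the page changed, double it if it did not."""
        interval = self._interval[url]
        interval = interval / 2 if changed else interval * 2
        interval = max(self.min_interval, min(self.max_interval, interval))
        self._interval[url] = interval
        heapq.heappush(self._heap, (visited_at + interval, url))


frontier = AdaptiveFrontier()
frontier.add("http://example.org/news")
frontier.add("http://example.org/about")

# First pass: pretend only the news page had changed since the last crawl.
for _ in range(2):
    due, url = frontier.pop_due()
    frontier.report(url, visited_at=due, changed=url.endswith("/news"))

next_due, next_url = frontier.pop_due()
print(next_due, next_url)  # 12.0 http://example.org/news
```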
43
External Collaboration
44
Digital Library Collaborations/Partnerships
Current
UK Digital Preservation Coalition
Founder Member
TEL (The European Library Project)
Web Archiving UK
JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library Of Wales
International Internet Preservation Consortium
BNF, Library Of Congress, Internet Archive, National Archives & Library Of Canada, National Library Of
Australia, National Library Of Italy, National Libraries Of Nordic Countries
JISC Funded - Digital Curation Centre
Persistent Identifiers
DOI Foundation, European National Libraries (KB & DDB)
Resource Discovery
Union Catalogues (SUNCAT)
Digital Library Federation
45
Digital Library Collaborations/Partnerships
Potential
Secure Legal Deposit Network
6 Legal Deposit Libraries
Global Digital Format Registry
Potential Partners (National Archives, DLF)
Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
KB (Netherlands National Library & Other Partners – FP6 Bid)
Digital Rights Management
Potential Partners (Publishers, JISC)
Metadata
Publishers, Others ?
Authentication
JISC ?
Resource Discovery
Search Engine Vendors, Researchers
Others ???
46
Conclusions
• Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success
• Can You Work With Us?
47
Slide 37
British Computer Society
North London Branch
Major Programmes
Richard Boulderstone
July 27, 2004
1
Agenda
•
•
•
•
•
•
•
•
•
The British Library
Vision
Our Audiences/Customers
ILS
Digitisation
Digital Object Management
Web Archiving
Collaboration
Conclusions
Magna Carta
www.bl.uk
2
What Is The British Library ?
Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of
Science and Invention (1855), National Central Library (1916), and National
Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India
Office Library and Records in 1982, and British Institute of Recorded Sound in
1983
• Flagship building at St Pancras - largest public building project in Great
Britain in 20th century - opened in 1998
•
www.bl.uk
3
World-Class Research Library
Key Statistics 2002/3
•
•
•
•
•
•
•
•
•
•
150 million items
8.2 million items consulted or supplied
408,000 reading room visits
618,000 catalogue records created
554,000 items received on legal deposit
651 km shelf capacity 92% full add 12 km each year
18.5M Web Site Hits (www.bl.uk)
2,400 staff
£85.2 million Grant in Aid and £27.0 million trading income in 2001/2
Annual report - http://www.bl.uk/about/annual/latest.html
www.bl.uk
4
‘The World’s Knowledge’
Outcome Based Vision
… by aiding scientific advances
… by adding commercial value for
businesses
… by contributing to UK “knowledge
economy”
… through the pursuit of academic
excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural
wellbeing
… by giving information relevant to their
interests
… by helping to find the next medical
breakthrough
… by creating a link between the past,
present and future
To help people
advance
knowledge to
enrich lives
www.bl.uk
5
High
R+D
Industries
Prof.
Services
Creative
Industries Publishing
Industries
Resource Discovery
Bespoke Services
Research Services
Document Supply
Reprographics
Innovation Centre
School
Students
Libraries Teachers 11>18
On-site Visits
School Tours
Web Learning
SMEs
EDUCATION
Lifelong
Learner
Exhibitions
Events
Tours
Publishing
Visitors
(child + adult)
Lifelong
Learner
PUBLIC
BUSINESS
Lifelong
Learner
Postgraduate/
Undergraduate
RESEARCHER
Librarians
Scholars
Lifelong
Learner
Reading Rooms
Bespoke Services
Reprographics
Publishing
Document Supply
Searching Tools
LIBRARIES
Commercial
Researcher
Public
Libraries
Broadcasting
e.g. BBC
Publishing
e.g. OED
Document Supply
Resource Discovery
Training
Best Practice
H.E.
Libraries
Public
www.bl.uk
6
2
Major Programmes/1
Da Vinci Notebook
Integrated Library System (ILS)
Programme
7
ILS: Development
Data migration
Due to finish in a few days
16M+ BL records
10M+ records from other sources
Online ILS software
All online changes made (mainly interfaces) – final tests
Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
Most ones done for go live
Rest in priority order
www.bl.uk
8
ILS: Implementation
Training
Courses to end-users well underway
‘Practice’ system available
‘Search only’ training also underway
Testing
Functional testing (end to end) nearly complete
Performance poor – OPAC very slow
Automated stress testing (LoadRunner scripts)
eIS trying to find area of problem
Ex Libris experts flying over
Some security ‘hardening’ needed
www.bl.uk
9
ILS: Cutover from legacy systems
Now:
Temporary Aleph cataloguing
Phase 1 – internal processing
Staggered take-on of users to ease cutover problems
Merge ‘temporary’ records
7 June:
Phase 2 – reading rooms
Reading rooms closed for cutover 26-29 June
Mainly brand-new PCs etc rather than XP upgrade
30 June:
Phase 3 – remote users
Could be delayed major problems
30 July:
www.bl.uk
10
www.bl.uk
11
Future ILS development (ILS/2)
Current ILS development seen just as the start
Extra records
E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
E.g. Preservation records
Links to other new BL systems
E.g. Digital Object Management (images, web pages etc)
New releases of Ex Libris packages
www.bl.uk
12
Major Programmes/2
International Dunhuang Project
Digitisation Programme
13
Background
Digitisation Is The Process Of Converting Existing Physical Items Into Digital
Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character
Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image to Sub £1/Image.
Automatic Devices To Digitise Are Becoming Availalble But Are Expensive ~£150,000.
BL Has Had Fairly Ad Hoc Approach Driven By
External Funding Opportunities
Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project
Management
BL Has Created About 1.5M Digital Images So Far…
www.bl.uk
14
Digitisation Strategy
Digitisation Strategy Project Was Formerly Initiated On February 2, 2004
Key objectives for the project are to define:
Selection Criteria
Uniform Approach
Communications Plan
Sustainability
Intellectual Property Rights
External Relationship Management
Funding
Integration with DOMS
www.bl.uk
15
Project Status Information
Definitive Register of Projects
19 Complete
19 Current
20 Planning
JISC Sound (3,900 Hours)
JISC Newspapers (2M Pages of 750M Pages)
Chopin (Collaborative Project)
Early English Books Online
www.bl.uk
16
Major Programmes/3
Gutenberg Bible
Digital Object Management (DOM)
Programme
17
DOM Programme vision
Our mission is to enable the United Kingdom to preserve
and use its digital intellectual property forever
Our vision is create a management system for digital objects
that will
store and preserve any type of digital material in perpetuity
provide access to this material to users with appropriate permissions
ensure that the material is easy to find
ensure that users can view the material with contemporary applications
ensure that users can, where possible, experience material with the original look-and-feel
www.bl.uk
18
Introduction - history
Digital Library PFI
Mar 1997 – Dec 1998
Digital Library System
1999 – early 2002
Lessons
DOM Report
Nov 2002
The DOM Programme
Started September 2003
www.bl.uk
19
Drivers for the BL DOM Programme
Legal deposit legislation for non-print material was granted royal assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12T of material per year (with 50 year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….
We need a generic and cost-effective approach for the secure long term
storage of digital material that is produced by numerous initiatives
www.bl.uk
20
DOM – many topics to address
HIGH
ILS
WEB
ARCHIVING
DIGITISATION
PROGRAMME
ESTIMATED SIZE OF COMPONENT
LDEP
WORKFLOW
RESOURCE
DISCOVERY
LDLSE
VDEP
RIGHTS
MANAGEMENT
METADATA
DEFINITION
TECHNICAL
REQUIREMENTS
FILE
FORMAT
REGISTRY
PERSISTENT
IDENTIFIERS
FILE
CONVERSION
UTILITIES
AUTHENTICATION
PROTOTYPES
SDM
RADM
INTERFACES
STRATEGY
DEVELOPMENT
LOW
LOW
COMPONENT AMBIGUITY / COMPLEXITY
Started
Planned
Non-DOM projects
Planned co-operation
HIGH
LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications
www.bl.uk
21
Scope - life cycle of objects
Collection
Selection
Acquisition
Accession
Description
Preservation
Storage
Preservation
Access
Resource discovery
Delivery
Rendering
www.bl.uk
22
Scope – objects and processes
Preservation store
Preserves the bit stream in perpetuity
Access store
Access versions
Limited formats – in the flavour of the era
Metadata to support resource discovery
Descriptive, Administrative, Links with existing tools e.g. Integrated
Library System (ILS)
Workflow
Ingest, e.g. Legal Deposit processing
www.bl.uk
23
ACCESS
DOM
Resource Discovery
Delivery
Digital Rights Management
Shared services
DOM
Storage
Signing
Authentication
Metadata
Persistent ID
Ingest
DONATIONS
DOCUMENT SUPPLY
Publishers
Archives
Grey Literature
Non-Serial
Store
Archiving
Operational
Stores
LEGAL DEPOSIT
LDL Secure
Environment
Legal Deposit
Items
Legal Deposit
Processing
WEB
ARCHIVING
DIGITISATION
St Pancras
Studios
Newspapers
NSA
www.bl.uk
24
Timeline
Prototype will provide a
basic preservationquality digital object
storage module
Definition. R0
ET approve
Business Case
& Timeline
•Consolidate R0 into operational
system
•Provide preservation-quality digital
store for materials received under
Voluntary Deposit of Electronic
Publications (VDEP)
•Integrate it with the existing VDEP
front-end
BC
R0
Operat’l Storage
Sub System. R1
•Support ingest for a
major content stream
•Integrate with core
Library systems as
required
R1
1st Content
Stream ingest. R2
R2
LDEP - initial
format. R3 & R4
Provide functionality
for material covered by
LDEP secondary
legislation
R3
R4
Open DOM to new
projects. R5+
2003
2004
2005
www.bl.uk
25
DOM: Project definition - 1
Example issues
Functional Architecture “What”
digital rights, file
formats, etc
Prototyping – assessing market solutions
allow changes to
new suppliers,
relationships to ILS,
other projects etc
Logical architecture “how – overall architecture”
Prototyping - basic functioning architecture
how do we build it
cost-effectively
today, supplier
selection criteria
Physical architecture “how – storage & specifics”
Prototyping - principal solutions and options
• Business case
• Planning – incremental
implementation phases
Cross team workshops – reviewing progress, debating detailed technical
issues, planning immediate priorities, risk management & way forward
www.bl.uk
26
DOM: Project definition - 2
Approach is to be incremental and not ‘Big Bang’
We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a
good solution
A principal goal is to define:
An overall long term “logical architecture”
Within which, there will be successive generations of physical architectures
We are understanding the storage marketplace, and we will use the knowledge to manage
procurement
We are certain that we will need >500T of storage but we are uncertain when – we thus need
flexible scalable procurement
www.bl.uk
27
DOM architecture - overview
Resource Discovery
Non-cat
ILS
based RD
Others
Rights Management
LDEP
DOMID
OBJECT
DOM Storage Service
Compound
objects/relations
Integrity
Authenticity
Local resource locator
Object
Atomic
Objects
Unique persistent
identifier (DOMID)
Doc
supply
DOMID is
mapped to
node/vol/LRL
DOM Physical Storage
www.bl.uk
28
DOM System (release 3)
Aleph
Storage subsystem
Storage
subsystem
Access
Mailroom
Publishers
Shared services
Administration
DOM System
www.bl.uk
29
DOM logical architecture –
integrity and authenticity
Integrity:
System has capability to continuously monitor the object store to detect
object corruption
It would then initiate object recovery
Authenticity:
A process is defined to provide long-term assurance that an object that
is re-presented is as it was when it was ingested
Based on the use of cryptographic signing techniques
Each object is signed when it is ingested
The signature is verified when required
The signing mechanism is “tightly” controlled
www.bl.uk
30
Procuring physical storage in volume
A major cost is in physical storage
The market for storage systems is changing rapidly, and this implies that “lock-in” is
not sensible
We thus need flexibility to change supplier over time
Cost of storage is reducing by 30-40% per year
Hence procure on rolling basis just ahead of demand
Replace storage on a rolling basis on expiry of warranty
The rolling programmes imply the need to be able to support a heterogeneous
product solution
The design of the logical architecture thus supports storage sourced from multiple
storage vendors
www.bl.uk
31
Disaster tolerance and the organisation
of storage clusters
One can obtain commercial disaster recovery (DR) solutions for common equipment
configurations
However one cannot obtain such solutions for systems comprising multi-100 Tb systems
So we must build in the need for DR into the design of the system
A single site solution, subject to a common-mode disaster, would suffer considerable loss
of availability after a disaster, and so is not acceptable
This implies that we need a multi-site solution
Conventionally these are based on a master-standby arrangement, where only 50% of the kit delivers
normal service
Our design is based on the use of multiple autonomous independent peer clusters that
cross-synchronise
so 100% of the kit delivers normal service
32
DOM architecture in the context of the
storage solution market
The dominant segment of the market focuses on delivering performance within
a highly resilient single cluster
However:
Many of our objects will be rarely accessed
so we do not want to pay for “maximised” performance we do not need
We have resilience by using multiple clusters, hence we have a reduced
need for resilience within a cluster
so we do not want to pay for “maximised” resilience we do not need
We are using these drivers to design a cost-effective large scale resilient
solution
33
DOM storage subsystem architecture - overview
[Diagram: two sets of DOM Storage gateway, DOM Storage Service and DOM Physical Storage (storage cluster), alongside DOM Shared Services (Unique ID, Signing, Logging).]
34
DOM storage subsystem architecture - access
Normal access/delivery is from the local storage cluster
[Diagram: two sets of DOM Storage gateway, DOM Storage Service and DOM Physical Storage (storage cluster), alongside DOM shared central services (Unique ID, Signing, Logging).]
35
DOM storage subsystem architecture - access
When a cluster is off-line then access/delivery is from a remote storage cluster
[Diagram: two sets of DOM Storage gateway, DOM Storage Service and DOM Physical Storage (storage cluster), alongside DOM shared central services (Unique ID, Signing, Logging).]
36
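The two access cases, local delivery in normal operation and remote delivery when the local cluster is off-line, can be sketched as a gateway that tries clusters in order. `Cluster`, `Gateway`, the `online` flag and the site names are hypothetical stand-ins. The slides do not describe how the gateway detects an outage.

```python
from typing import Dict, List, Optional, Tuple


class Cluster:
    """One storage cluster; `online` is toggled to simulate an outage."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.online = True
        self.objects: Dict[str, bytes] = {}


class Gateway:
    """Serves reads from its local cluster, falling back to remote peers."""

    def __init__(self, local: Cluster, remotes: List[Cluster]) -> None:
        self.local = local
        self.remotes = remotes

    def fetch(self, domid: str) -> Optional[Tuple[str, bytes]]:
        # Local cluster first, then each remote in turn.
        for cluster in [self.local, *self.remotes]:
            if cluster.online and domid in cluster.objects:
                return cluster.name, cluster.objects[domid]
        return None


site_a, site_b = Cluster("site-a"), Cluster("site-b")
for cluster in (site_a, site_b):
    cluster.objects["domid-0001"] = b"payload"

gateway = Gateway(local=site_a, remotes=[site_b])
normal = gateway.fetch("domid-0001")      # served by the local cluster
site_a.online = False
degraded = gateway.fetch("domid-0001")    # served by the remote cluster
```

Because both clusters are peers holding the same objects, the caller sees the same bytes in either case and only the serving site differs.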
DOM storage subsystem architecture - ingest
Normal ingest is to the local storage cluster and then the remote cluster is synchronised
[Diagram: two sets of DOM Storage gateway, DOM Storage Service and DOM Physical Storage (storage cluster), alongside DOM shared central services (Unique ID, Signing, Logging); arrows labelled “Store” and “Synchronise remote store”.]
37
DOM storage subsystem architecture - ingest
When a cluster is off-line then ingest is managed by the remote storage cluster and the local cluster is synchronised later
[Diagram: two sets of DOM Storage gateway, DOM Storage Service and DOM Physical Storage (storage cluster), alongside DOM shared central services (Unique ID, Signing, Logging); arrows labelled “Store” and “Synchronise remote store later”.]
38
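The ingest behaviour on the two slides above, store locally then synchronise the peer, or store remotely and catch up later, can be sketched as follows. The `pending` list, the `flush_pending` method and the two-site setup are assumptions chosen to make the idea concrete. The actual synchronisation mechanism is not described in the slides.

```python
from typing import Dict, List, Tuple


class Cluster:
    def __init__(self, name: str) -> None:
        self.name = name
        self.online = True
        self.objects: Dict[str, bytes] = {}
        # Writes owed to peers that were off-line at ingest time.
        self.pending: List[Tuple["Cluster", str]] = []

    def store(self, domid: str, data: bytes, peers: List["Cluster"]) -> None:
        self.objects[domid] = data
        for peer in peers:
            if peer.online:
                peer.objects[domid] = data          # synchronise now
            else:
                self.pending.append((peer, domid))  # synchronise later

    def flush_pending(self) -> None:
        still_pending = []
        for peer, domid in self.pending:
            if peer.online:
                peer.objects[domid] = self.objects[domid]
            else:
                still_pending.append((peer, domid))
        self.pending = still_pending


def ingest(domid: str, data: bytes, local: Cluster, remote: Cluster) -> str:
    """Store via the local cluster if it is up, otherwise via the remote one."""
    target, peer = (local, remote) if local.online else (remote, local)
    if not target.online:
        raise RuntimeError("no storage cluster is available")
    target.store(domid, data, peers=[peer])
    return target.name


site_a, site_b = Cluster("site-a"), Cluster("site-b")
first = ingest("domid-0001", b"one", local=site_a, remote=site_b)
site_a.online = False
second = ingest("domid-0002", b"two", local=site_a, remote=site_b)
site_a.online = True
site_b.flush_pending()
```

After the final call both sites hold both objects, which is the property that lets either cluster serve normal traffic.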
In conclusion
We plan for generations of physical storage
Migration from one generation to the next
Allow changes of supplier
Purchase incrementally in modest quantities
Move quickly when required
Be cost conscious
We provide assurance that an object is held and re-presented as it was when
ingested
We are designing a cost-effective large scale resilient solution
In summary: we take a long-term view
39
Major Programmes/4
Web Archiving Programme
40
Structure of Programme
Web Archiving Programme is a collaborative
initiative, implemented broadly across two
consortia
UK Web Archiving Consortium
Developing a selective approach to web archiving, procuring a common
web archiving infrastructure and software to begin archiving activities as
early as possible
International Internet Preservation Consortium
Developing advanced web archiving technologies for the long-term,
large-scale, continuous crawling requirements enabled through
legislation
41
UK Web Archiving Consortium
Developing a selective approach to web archiving
Licence for PANDAS about to be signed with NLA
Sub-licences with consortium partners and contractor to follow
ITT concluded with Magus Research winning the contract.
Implement a common web archiving infrastructure (lots of Linux
machines + PANDAS)
Provide customisation/development of PANDAS
Provide help desk and support
42
International Internet Preservation
Consortium
Developing advanced web archiving technologies
Smart Crawler
Continuous adaptive crawler, adjusting crawl priority on the fly
Based on IA Heritrix
Working on requirements now
Expect to begin tender process in June
Content Management
Archival formats
Framework
Metrics and Test Bed
43
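One way to picture a continuous crawler that adjusts crawl priority on the fly is a queue ordered by when each URL is next due, with the revisit interval shortened for pages that change and lengthened for pages that do not. This is a stand-alone illustration and not Heritrix code. The `crawl` function, the `has_changed` callback, the halving and doubling rule and the example hostnames are all assumptions.

```python
import heapq
from typing import Callable, List, Tuple


def crawl(seeds: List[str],
          has_changed: Callable[[str, int], bool],
          steps: int,
          base_interval: float = 8.0) -> List[str]:
    """Visit URLs in due-time order, adapting each URL's revisit interval.

    `has_changed(url, visit_number)` stands in for fetching and comparing
    the page; it is a hypothetical callback, not part of any real crawler.
    """
    # Heap entries: (time the URL is next due, url, current interval, visits)
    queue: List[Tuple[float, str, float, int]] = [
        (0.0, url, base_interval, 0) for url in seeds
    ]
    heapq.heapify(queue)
    visited = []
    for _ in range(steps):
        due, url, interval, visits = heapq.heappop(queue)
        visited.append(url)
        if has_changed(url, visits):
            interval = max(1.0, interval / 2)   # changing pages: come back sooner
        else:
            interval = interval * 2             # static pages: back off
        heapq.heappush(queue, (due + interval, url, interval, visits + 1))
    return visited


order = crawl(
    seeds=["news.example", "static.example"],
    has_changed=lambda url, n: url.startswith("news"),
    steps=6,
)
```

With these inputs the frequently changing site takes most of the crawl budget while the static one is visited once and then pushed back.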
External Collaboration
44
Digital Library Collaborations/Partnerships
Current
UK Digital Preservation Coalition
Founder Member
TEL (The European Library Project)
Web Archiving UK
JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library Of Wales
International Internet Preservation Consortium
BNF, Library Of Congress, Internet Archive, National Archives & Library Of Canada, National Library Of
Australia, National Library Of Italy, National Libraries Of Nordic Countries
JISC Funded - Digital Curation Centre
Persistent Identifiers
DOI Foundation, European National Libraries (KB & DDB)
Resource Discovery
Union Catalogues (SUNCAT)
Digital Library Federation
45
Digital Library Collaborations/Partnerships
Potential
Secure Legal Deposit Network
6 Legal Deposit Libraries
Global Digital Format Registry
Potential Partners (National Archives, DLF)
Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
KB (Netherlands National Library & Other Partners – FP6 Bid)
Digital Rights Management
Potential Partners (Publishers, JISC)
Metadata
Publishers, Others ?
Authentication
JISC ?
Resource Discovery
Search Engine Vendors, Researchers
Others ???
46
Conclusions
• Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success
• Can You Work With Us?
47
Slide 39
British Computer Society
North London Branch
Major Programmes
Richard Boulderstone
July 27, 2004
1
Agenda
•
•
•
•
•
•
•
•
•
The British Library
Vision
Our Audiences/Customers
ILS
Digitisation
Digital Object Management
Web Archiving
Collaboration
Conclusions
Magna Carta
www.bl.uk
2
What Is The British Library ?
Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of
Science and Invention (1855), National Central Library (1916), and National
Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India
Office Library and Records in 1982, and British Institute of Recorded Sound in
1983
• Flagship building at St Pancras - largest public building project in Great
Britain in 20th century - opened in 1998
•
www.bl.uk
3
World-Class Research Library
Key Statistics 2002/3
•
•
•
•
•
•
•
•
•
•
150 million items
8.2 million items consulted or supplied
408,000 reading room visits
618,000 catalogue records created
554,000 items received on legal deposit
651 km shelf capacity 92% full add 12 km each year
18.5M Web Site Hits (www.bl.uk)
2,400 staff
£85.2 million Grant in Aid and £27.0 million trading income in 2001/2
Annual report - http://www.bl.uk/about/annual/latest.html
www.bl.uk
4
‘The World’s Knowledge’
Outcome Based Vision
… by aiding scientific advances
… by adding commercial value for
businesses
… by contributing to UK “knowledge
economy”
… through the pursuit of academic
excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural
wellbeing
… by giving information relevant to their
interests
… by helping to find the next medical
breakthrough
… by creating a link between the past,
present and future
To help people
advance
knowledge to
enrich lives
www.bl.uk
5
High
R+D
Industries
Prof.
Services
Creative
Industries Publishing
Industries
Resource Discovery
Bespoke Services
Research Services
Document Supply
Reprographics
Innovation Centre
School
Students
Libraries Teachers 11>18
On-site Visits
School Tours
Web Learning
SMEs
EDUCATION
Lifelong
Learner
Exhibitions
Events
Tours
Publishing
Visitors
(child + adult)
Lifelong
Learner
PUBLIC
BUSINESS
Lifelong
Learner
Postgraduate/
Undergraduate
RESEARCHER
Librarians
Scholars
Lifelong
Learner
Reading Rooms
Bespoke Services
Reprographics
Publishing
Document Supply
Searching Tools
LIBRARIES
Commercial
Researcher
Public
Libraries
Broadcasting
e.g. BBC
Publishing
e.g. OED
Document Supply
Resource Discovery
Training
Best Practice
H.E.
Libraries
Public
www.bl.uk
6
2
Major Programmes/1
Da Vinci Notebook
Integrated Library System (ILS)
Programme
7
ILS: Development
Data migration
Due to finish in a few days
16M+ BL records
10M+ records from other sources
Online ILS software
All online changes made (mainly interfaces) – final tests
Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
Most ones done for go live
Rest in priority order
www.bl.uk
8
ILS: Implementation
Training
Courses to end-users well underway
‘Practice’ system available
‘Search only’ training also underway
Testing
Functional testing (end to end) nearly complete
Performance poor – OPAC very slow
Automated stress testing (LoadRunner scripts)
eIS trying to find area of problem
Ex Libris experts flying over
Some security ‘hardening’ needed
www.bl.uk
9
ILS: Cutover from legacy systems
Now:
Temporary Aleph cataloguing
Phase 1 – internal processing
Staggered take-on of users to ease cutover problems
Merge ‘temporary’ records
7 June:
Phase 2 – reading rooms
Reading rooms closed for cutover 26-29 June
Mainly brand-new PCs etc rather than XP upgrade
30 June:
Phase 3 – remote users
Could be delayed major problems
30 July:
www.bl.uk
10
www.bl.uk
11
Future ILS development (ILS/2)
Current ILS development seen just as the start
Extra records
E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
E.g. Preservation records
Links to other new BL systems
E.g. Digital Object Management (images, web pages etc)
New releases of Ex Libris packages
www.bl.uk
12
Major Programmes/2
International Dunhuang Project
Digitisation Programme
13
Background
Digitisation Is The Process Of Converting Existing Physical Items Into Digital
Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character
Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image to Sub £1/Image.
Automatic Devices To Digitise Are Becoming Availalble But Are Expensive ~£150,000.
BL Has Had Fairly Ad Hoc Approach Driven By
External Funding Opportunities
Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project
Management
BL Has Created About 1.5M Digital Images So Far…
www.bl.uk
14
Digitisation Strategy
Digitisation Strategy Project Was Formerly Initiated On February 2, 2004
Key objectives for the project are to define:
Selection Criteria
Uniform Approach
Communications Plan
Sustainability
Intellectual Property Rights
External Relationship Management
Funding
Integration with DOMS
www.bl.uk
15
Project Status Information
Definitive Register of Projects
19 Complete
19 Current
20 Planning
JISC Sound (3,900 Hours)
JISC Newspapers (2M Pages of 750M Pages)
Chopin (Collaborative Project)
Early English Books Online
www.bl.uk
16
Major Programmes/3
Gutenberg Bible
Digital Object Management (DOM)
Programme
17
DOM Programme vision
Our mission is to enable the United Kingdom to preserve
and use its digital intellectual property forever
Our vision is create a management system for digital objects
that will
store and preserve any type of digital material in perpetuity
provide access to this material to users with appropriate permissions
ensure that the material is easy to find
ensure that users can view the material with contemporary applications
ensure that users can, where possible, experience material with the original look-and-feel
www.bl.uk
18
Introduction - history
Digital Library PFI
Mar 1997 – Dec 1998
Digital Library System
1999 – early 2002
Lessons
DOM Report
Nov 2002
The DOM Programme
Started September 2003
www.bl.uk
19
Drivers for the BL DOM Programme
Legal deposit legislation for non-print material was granted royal assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12T of material per year (with 50 year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….
We need a generic and cost-effective approach for the secure long term
storage of digital material that is produced by numerous initiatives
www.bl.uk
20
DOM – many topics to address
HIGH
ILS
WEB
ARCHIVING
DIGITISATION
PROGRAMME
ESTIMATED SIZE OF COMPONENT
LDEP
WORKFLOW
RESOURCE
DISCOVERY
LDLSE
VDEP
RIGHTS
MANAGEMENT
METADATA
DEFINITION
TECHNICAL
REQUIREMENTS
FILE
FORMAT
REGISTRY
PERSISTENT
IDENTIFIERS
FILE
CONVERSION
UTILITIES
AUTHENTICATION
PROTOTYPES
SDM
RADM
INTERFACES
STRATEGY
DEVELOPMENT
LOW
LOW
COMPONENT AMBIGUITY / COMPLEXITY
Started
Planned
Non-DOM projects
Planned co-operation
HIGH
LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications
www.bl.uk
21
Scope - life cycle of objects
Collection: Selection, Acquisition, Accession, Description
Preservation: Storage, Preservation
Access: Resource discovery, Delivery, Rendering
22
Scope – objects and processes
Preservation store
Preserves the bit stream in perpetuity
Access store
Access versions
Limited formats – in the flavour of the era
Metadata to support resource discovery
Descriptive, Administrative, Links with existing tools e.g. Integrated
Library System (ILS)
Workflow
Ingest, e.g. Legal Deposit processing
23
[Diagram: DOM services and content sources]
Access (DOM): Resource Discovery, Delivery, Digital Rights Management
Shared services (DOM): Storage, Signing, Authentication, Metadata, Persistent ID
Ingest
Labels under Donations / Document Supply: Publishers, Archives, Grey Literature, Non-Serial Store, Archiving, Operational Stores
Legal Deposit: LDL Secure Environment, Legal Deposit Items, Legal Deposit Processing
Web Archiving
Digitisation: St Pancras Studios, Newspapers, NSA
24
Timeline
Definition (R0)
• Prototype will provide a basic preservation-quality digital object storage module
Business Case (BC)
• ET approve Business Case & Timeline
Operational Storage Sub System (R1)
• Consolidate R0 into operational system
• Provide preservation-quality digital store for materials received under Voluntary Deposit of Electronic Publications (VDEP)
• Integrate it with the existing VDEP front-end
1st Content Stream ingest (R2)
• Support ingest for a major content stream
• Integrate with core Library systems as required
LDEP - initial format (R3 & R4)
• Provide functionality for material covered by LDEP secondary legislation
Open DOM to new projects (R5+)
Time axis: 2003, 2004, 2005
25
DOM: Project definition - 1
Functional architecture “what”
Prototyping – assessing market solutions
Example issues: digital rights, file formats, etc
Logical architecture “how – overall architecture”
Prototyping - basic functioning architecture
Example issues: allow changes to new suppliers, relationships to ILS, other projects etc
Physical architecture “how – storage & specifics”
Prototyping - principal solutions and options
Example issues: how do we build it cost-effectively today, supplier selection criteria
• Business case
• Planning – incremental implementation phases
Cross team workshops – reviewing progress, debating detailed technical
issues, planning immediate priorities, risk management & way forward
26
DOM: Project definition - 2
Approach is to be incremental and not ‘Big Bang’
We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a
good solution
A principal goal is to define:
An overall long term “logical architecture”
Within which, there will be successive generations of physical architectures
We are building an understanding of the storage marketplace, and we will use the knowledge to manage
procurement
We are certain that we will need >500 TB of storage but we are uncertain when – we thus need
flexible scalable procurement
27
DOM architecture - overview
[Diagram: DOM architecture overview]
Consumers: Resource Discovery (ILS based RD, Non-cat, Others), Rights Management, LDEP, Doc supply
Labels between consumers and storage service: DOMID, Object
DOM Storage Service: Compound objects/relations, Integrity, Authenticity, Unique persistent identifier (DOMID), Local resource locator, Atomic Objects
DOMID is mapped to node/vol/LRL
DOM Physical Storage
28
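The mapping named in the diagram (a persistent DOMID resolved to a node/vol/LRL) can be sketched as a small lookup table. This is an illustration only: the class names, the field names and the DOMID strings are assumptions, not details of the DOM design.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PhysicalLocation:
    """Where one copy of an object lives (hypothetical fields)."""
    node: str    # storage node
    volume: str  # volume on that node
    lrl: str     # local resource locator within the volume


class DomidResolver:
    """Maps a persistent DOMID to its current physical locations."""

    def __init__(self) -> None:
        self._table: Dict[str, List[PhysicalLocation]] = {}

    def register(self, domid: str, location: PhysicalLocation) -> None:
        self._table.setdefault(domid, []).append(location)

    def relocate(self, domid: str, old: PhysicalLocation,
                 new: PhysicalLocation) -> None:
        # A storage migration changes the location, never the DOMID.
        locations = self._table[domid]
        locations[locations.index(old)] = new

    def resolve(self, domid: str) -> List[PhysicalLocation]:
        # Return a copy so callers cannot alter the table by accident.
        return list(self._table.get(domid, []))
```

Keeping the identifier separate from the location is what would let successive generations of physical storage be swapped in without changing any reference held by the catalogue or by users.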
DOM System (release 3)
[Diagram: DOM System. Components shown: Aleph, Mailroom, Publishers, Access, Administration, Shared services, and two Storage subsystems]
29
DOM logical architecture –
integrity and authenticity
Integrity:
System has capability to continuously monitor the object store to detect
object corruption
It would then initiate object recovery
Authenticity:
A process is defined to provide long-term assurance that an object that
is re-presented is as it was when it was ingested
Based on the use of cryptographic signing techniques
Each object is signed when it is ingested
The signature is verified when required
The signing mechanism is “tightly” controlled
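The sign-at-ingest and verify-on-demand cycle can be sketched as follows. This is a minimal sketch under stated assumptions: the slides do not say which signing technique is used, so HMAC-SHA256 from the Python standard library stands in for it here (a real deployment might well use asymmetric signatures), and the class and method names are hypothetical.

```python
import hashlib
import hmac
from typing import Dict, List, Tuple


class ObjectStore:
    """Toy in-memory store: DOMID -> (content, signature)."""

    def __init__(self, signing_key: bytes) -> None:
        # Stands in for the tightly controlled signing mechanism.
        self._key = signing_key
        self._objects: Dict[str, Tuple[bytes, bytes]] = {}

    def _sign(self, content: bytes) -> bytes:
        return hmac.new(self._key, content, hashlib.sha256).digest()

    def ingest(self, domid: str, content: bytes) -> None:
        # Each object is signed once, when it is ingested.
        self._objects[domid] = (content, self._sign(content))

    def verify(self, domid: str) -> bool:
        # Recompute the signature and compare in constant time.
        content, signature = self._objects[domid]
        return hmac.compare_digest(self._sign(content), signature)

    def scan(self) -> List[str]:
        # Integrity monitoring: list the DOMIDs that fail verification.
        return sorted(d for d in self._objects if not self.verify(d))

    def corrupt(self, domid: str) -> None:
        # Demo helper only: flip one bit of a non-empty object.
        content, signature = self._objects[domid]
        damaged = bytes([content[0] ^ 0x01]) + content[1:]
        self._objects[domid] = (damaged, signature)
```

Any DOMID returned by `scan()` would be a candidate for object recovery from another copy.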
30
Procuring physical storage in volume
A major cost is in physical storage
The market for storage systems is changing rapidly, and this implies that “lock-in” is
not sensible
We thus need flexibility to change supplier over time
Cost of storage is reducing by 30-40% per year
Hence procure on rolling basis just ahead of demand
Replace storage on a rolling basis on expiry of warranty
The rolling programmes imply the need to be able to support a heterogeneous
product solution
The design of the logical architecture thus supports storage sourced from multiple
storage vendors
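The case for buying just ahead of demand can be checked with simple arithmetic. In the sketch below only the 30-40% annual price decline comes from the slide; the 100 TB per year growth, the five-year horizon and the starting price of 1000 units per TB are made-up figures, and the model ignores discounting and the replacement of storage at warranty expiry.

```python
def upfront_cost(tb_per_year: float, years: int, price_per_tb: float) -> float:
    """Buy all capacity in year 0 at today's price."""
    return tb_per_year * years * price_per_tb


def rolling_cost(tb_per_year: float, years: int, price_per_tb: float,
                 annual_decline: float) -> float:
    """Buy each year's capacity at that year's (lower) price."""
    total = 0.0
    for year in range(years):
        total += tb_per_year * price_per_tb * (1.0 - annual_decline) ** year
    return total


if __name__ == "__main__":
    up = upfront_cost(100, 5, 1000.0)
    roll = rolling_cost(100, 5, 1000.0, 0.35)
    print(f"upfront: {up:.0f}, rolling: {roll:.0f}, ratio: {roll / up:.2f}")
```

With a 35% annual decline, the rolling purchase in this toy model costs about half as much as buying everything in the first year.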
31
Disaster tolerance and the organisation
of storage clusters
One can obtain commercial disaster recovery (DR) solutions for common equipment
configurations
However one cannot obtain such solutions for multi-100 TB systems
So we must build DR into the design of the system
A single site solution, subject to a common-mode disaster, would suffer considerable loss
of availability after a disaster, and so is not acceptable
This implies that we need a multi-site solution
Conventionally these are based on a master-standby where only 50% of kit is delivering
normal service
Our design is based on the use of multiple autonomous independent peer clusters that
cross-synchronise
so 100% of the kit delivers normal service
32
DOM architecture in the context of the
storage solution market
The dominant segment of the market focuses on delivering performance within
a highly resilient single cluster
However:
Many of our objects will be rarely accessed
so we do not want to pay for “maximised” performance we do not need
We have resilience by using multiple clusters, hence we have a reduced
need for resilience within a cluster
so we do not want to pay for “maximised” resilience we do not need
We are using these drivers to design a cost-effective large scale resilient
solution
33
DOM storage subsystem architecture - overview
[Diagram: two stacks, each with a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage (labelled “Storage cluster”), plus DOM Shared Services: Unique ID, Signing, Logging]
34
DOM storage subsystem architecture - access
Normal access/delivery is from local storage cluster
[Diagram: two gateway / Storage Service / Physical Storage stacks (each labelled “Storage cluster”) and DOM Shared central Services: Unique ID, Signing, Logging]
35
DOM storage subsystem architecture - access
When a cluster is off-line then access/delivery is from a remote storage cluster
[Diagram: two gateway / Storage Service / Physical Storage stacks (each labelled “Storage cluster”) and DOM Shared central Services: Unique ID, Signing, Logging]
36
DOM storage subsystem architecture - ingest
Normal ingest is to the local storage cluster and then the remote cluster is synchronised
[Diagram: two gateway / Storage Service / Physical Storage stacks (each labelled “Storage cluster”) and DOM Shared central Services: Unique ID, Signing, Logging. Arrows: Store; Synchronise remote store]
37
DOM storage subsystem architecture - ingest
When a cluster is off-line then ingest is managed by the remote storage cluster and the local cluster is synchronised later
[Diagram: two gateway / Storage Service / Physical Storage stacks (each labelled “Storage cluster”) and DOM Shared central Services: Unique ID, Signing, Logging. Arrows: Store; Synchronise remote store later]
38
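The access and ingest behaviour shown on the four storage subsystem slides can be sketched with two in-memory peer clusters. This is a rough model under assumptions: the class names, the `pending` bookkeeping and the `resync` method are hypothetical, and the shared services (unique ID, signing, logging) are left out.

```python
from typing import Dict, List, Optional


class Cluster:
    """One autonomous storage cluster (in-memory stand-in)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.online = True
        self.objects: Dict[str, bytes] = {}


class Gateway:
    """Routes access and ingest for one site across two peer clusters."""

    def __init__(self, local: Cluster, remote: Cluster) -> None:
        self.local = local
        self.remote = remote
        # DOMIDs each cluster missed while it was off-line.
        self.pending: Dict[str, List[str]] = {local.name: [], remote.name: []}

    def read(self, domid: str) -> Optional[bytes]:
        # Prefer the local cluster; fall back to the remote peer.
        for cluster in (self.local, self.remote):
            if cluster.online and domid in cluster.objects:
                return cluster.objects[domid]
        return None

    def ingest(self, domid: str, content: bytes) -> None:
        if self.local.online:
            primary, peer = self.local, self.remote
        else:
            primary, peer = self.remote, self.local
        if not primary.online:
            raise RuntimeError("no storage cluster available")
        primary.objects[domid] = content
        if peer.online:
            peer.objects[domid] = content          # synchronise now
        else:
            self.pending[peer.name].append(domid)  # synchronise later

    def resync(self, cluster: Cluster) -> None:
        """Bring a cluster that is back on-line up to date from its peer."""
        peer = self.remote if cluster is self.local else self.local
        for domid in self.pending[cluster.name]:
            cluster.objects[domid] = peer.objects[domid]
        self.pending[cluster.name].clear()
```

Because both clusters serve reads and accept ingest in normal operation, neither sits idle as a standby, which is the point made on the disaster tolerance slide.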
In conclusion
We plan for generations of physical storage
Migration from one generation to the next
Allow changes of supplier
Purchase incrementally in modest quantities
Move quickly when required
Be cost conscious
We provide assurance that an object is held and re-presented as when it was
ingested
We are designing a cost-effective large scale resilient solution
In summary: we take a long term view
39
Major Programmes/4
Web Archiving Programme
40
Structure of Programme
Web Archiving Programme is a collaborative
initiative, roughly implemented across two
consortiums
UK Web Archiving Consortium
Developing a selective approach to web archiving, procuring a common
web archiving infrastructure and software to begin archiving activities at
the earliest opportunity
International Internet Preservation Consortium
Developing advanced web archiving technologies for the long-term,
large-scale, continuous crawling requirements enabled through
legislation
41
UK Web Archiving Consortium
Developing a selective approach to web archiving
License for PANDAS about to be signed with NLA
Sub-licenses with consortium partners and contractor to follow
ITT concluded with Magus Research winning the contract.
Implement a common web archiving infrastructure (lots of Linux
machines + PANDAS)
Provide customisation/development of PANDAS
Provide help desk and support
42
International Internet Preservation
Consortium
Developing advanced web archiving technologies
Smart Crawler
Continuous adaptive crawler, adjusting crawl priority on the fly
Based on IA Heritrix
Working on requirements now
Expect to begin tender process in June
Content Management
Archival formats
Framework
Metrics and Test Bed
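Adjusting crawl priority on the fly can be illustrated with a priority queue whose entries may be re-prioritised while the crawl runs. This is a generic sketch, not the Heritrix API or the consortium's design: the `CrawlFrontier` class, its methods and the example URLs are all hypothetical.

```python
import heapq
import itertools
from typing import Dict, List, Optional, Tuple


class CrawlFrontier:
    """Priority queue of URLs whose priorities can change while crawling."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, str]] = []
        # url -> the (priority, sequence) pair that is currently valid
        self._current: Dict[str, Tuple[float, int]] = {}
        self._seq = itertools.count()  # tie-breaker: earlier entries first

    def set_priority(self, url: str, priority: float) -> None:
        """Add a URL or re-prioritise it; lower numbers are crawled sooner."""
        seq = next(self._seq)
        self._current[url] = (priority, seq)
        # Older heap entries for this URL are now stale and skipped on pop.
        heapq.heappush(self._heap, (priority, seq, url))

    def pop(self) -> Optional[str]:
        """Return the most urgent URL, discarding stale heap entries."""
        while self._heap:
            priority, seq, url = heapq.heappop(self._heap)
            if self._current.get(url) == (priority, seq):
                del self._current[url]
                return url
        return None
```

A crawler built this way could, for example, call `set_priority` with a smaller number for a site observed to change often, so that it is revisited sooner.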
43
External Collaboration
44
Digital Library Collaborations/Partnerships
Current
UK Digital Preservation Coalition
Founder Member
TEL (The European Library Project)
Web Archiving UK
JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library Of Wales
International Internet Preservation Consortium
BNF, Library Of Congress, Internet Archive, National Archives & Library Of Canada, National Library Of
Australia, National Library Of Italy, National Libraries Of Nordic Countries
JISC Funded - Digital Curation Centre
Persistent Identifiers
DOI Foundation, European National Libraries (KB & DDB)
Resource Discovery
Union Catalogues (SUNCAT)
Digital Library Federation
45
Digital Library Collaborations/Partnerships
Potential
Secure Legal Deposit Network
6 Legal Deposit Libraries
Global Digital Format Registry
Potential Partners (National Archives, DLF)
Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
KB (Netherlands National Library & Other Partners – FP6 Bid)
Digital Rights Management
Potential Partners (Publishers, JISC)
Metadata
Publishers, Others ?
Authentication
JISC ?
Resource Discovery
Search Engine Vendors, Researchers
Others ???
46
Conclusions
• Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success
• Can You Work With Us?
47
Slide 41
British Computer Society
North London Branch
Major Programmes
Richard Boulderstone
July 27, 2004
1
Agenda
•
•
•
•
•
•
•
•
•
The British Library
Vision
Our Audiences/Customers
ILS
Digitisation
Digital Object Management
Web Archiving
Collaboration
Conclusions
Magna Carta
www.bl.uk
2
What Is The British Library ?
Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of
Science and Invention (1855), National Central Library (1916), and National
Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India
Office Library and Records in 1982, and British Institute of Recorded Sound in
1983
• Flagship building at St Pancras - largest public building project in Great
Britain in 20th century - opened in 1998
•
www.bl.uk
3
World-Class Research Library
Key Statistics 2002/3
•
•
•
•
•
•
•
•
•
•
150 million items
8.2 million items consulted or supplied
408,000 reading room visits
618,000 catalogue records created
554,000 items received on legal deposit
651 km shelf capacity 92% full add 12 km each year
18.5M Web Site Hits (www.bl.uk)
2,400 staff
£85.2 million Grant in Aid and £27.0 million trading income in 2001/2
Annual report - http://www.bl.uk/about/annual/latest.html
www.bl.uk
4
‘The World’s Knowledge’
Outcome Based Vision
… by aiding scientific advances
… by adding commercial value for
businesses
… by contributing to UK “knowledge
economy”
… through the pursuit of academic
excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural
wellbeing
… by giving information relevant to their
interests
… by helping to find the next medical
breakthrough
… by creating a link between the past,
present and future
To help people
advance
knowledge to
enrich lives
www.bl.uk
5
High
R+D
Industries
Prof.
Services
Creative
Industries Publishing
Industries
Resource Discovery
Bespoke Services
Research Services
Document Supply
Reprographics
Innovation Centre
School
Students
Libraries Teachers 11>18
On-site Visits
School Tours
Web Learning
SMEs
EDUCATION
Lifelong
Learner
Exhibitions
Events
Tours
Publishing
Visitors
(child + adult)
Lifelong
Learner
PUBLIC
BUSINESS
Lifelong
Learner
Postgraduate/
Undergraduate
RESEARCHER
Librarians
Scholars
Lifelong
Learner
Reading Rooms
Bespoke Services
Reprographics
Publishing
Document Supply
Searching Tools
LIBRARIES
Commercial
Researcher
Public
Libraries
Broadcasting
e.g. BBC
Publishing
e.g. OED
Document Supply
Resource Discovery
Training
Best Practice
H.E.
Libraries
Public
www.bl.uk
6
2
Major Programmes/1
Da Vinci Notebook
Integrated Library System (ILS)
Programme
7
ILS: Development
Data migration
Due to finish in a few days
16M+ BL records
10M+ records from other sources
Online ILS software
All online changes made (mainly interfaces) – final tests
Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
Most done for go-live
Rest in priority order
8
ILS: Implementation
Training
Courses to end-users well underway
‘Practice’ system available
‘Search only’ training also underway
Testing
Functional testing (end to end) nearly complete
Performance poor – OPAC very slow
Automated stress testing (LoadRunner scripts)
eIS trying to find area of problem
Ex Libris experts flying over
Some security ‘hardening’ needed
9
ILS: Cutover from legacy systems
Now:
Temporary Aleph cataloguing
7 June:
Phase 1 – internal processing
Staggered take-on of users to ease cutover problems
Merge ‘temporary’ records
30 June:
Phase 2 – reading rooms
Reading rooms closed for cutover 26-29 June
Mainly brand-new PCs etc rather than XP upgrade
30 July:
Phase 3 – remote users
Could be delayed if there are major problems
10
Future ILS development (ILS/2)
Current ILS development seen just as the start
Extra records
E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
E.g. Preservation records
Links to other new BL systems
E.g. Digital Object Management (images, web pages etc)
New releases of Ex Libris packages
12
Major Programmes/2
International Dunhuang Project
Digitisation Programme
13
Background
Digitisation Is The Process Of Converting Existing Physical Items Into Digital
Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character
Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image to Sub £1/Image.
Automatic Devices To Digitise Are Becoming Available But Are Expensive ~£150,000.
BL Has Had Fairly Ad Hoc Approach Driven By
External Funding Opportunities
Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project
Management
BL Has Created About 1.5M Digital Images So Far…
14
Digitisation Strategy
Digitisation Strategy Project Was Formally Initiated On February 2, 2004
Key objectives for the project are to define:
Selection Criteria
Uniform Approach
Communications Plan
Sustainability
Intellectual Property Rights
External Relationship Management
Funding
Integration with DOMS
15
Project Status Information
Definitive Register of Projects
19 Complete
19 Current
20 Planning
JISC Sound (3,900 Hours)
JISC Newspapers (2M Pages of 750M Pages)
Chopin (Collaborative Project)
Early English Books Online
16
Major Programmes/3
Gutenberg Bible
Digital Object Management (DOM)
Programme
17
DOM Programme vision
Our mission is to enable the United Kingdom to preserve
and use its digital intellectual property forever
Our vision is to create a management system for digital objects
that will
store and preserve any type of digital material in perpetuity
provide access to this material to users with appropriate permissions
ensure that the material is easy to find
ensure that users can view the material with contemporary applications
ensure that users can, where possible, experience material with the original look-and-feel
18
Introduction - history
Digital Library PFI
Mar 1997 – Dec 1998
Digital Library System
1999 – early 2002
Lessons
DOM Report
Nov 2002
The DOM Programme
Started September 2003
19
Drivers for the BL DOM Programme
Legal deposit legislation for non-print material was granted royal assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12T of material per year (with 50 year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….
We need a generic and cost-effective approach for the secure long term
storage of digital material that is produced by numerous initiatives
20
DOM – many topics to address
[Chart: DOM components plotted by estimated size of component (low to high) against component ambiguity / complexity (low to high). Components shown: ILS, Web Archiving, Digitisation Programme, LDEP, Workflow, Resource Discovery, LDLSE, VDEP, Rights Management, Metadata Definition, Technical Requirements, File Format Registry, Persistent Identifiers, File Conversion Utilities, Authentication, Prototypes, SDM, RADM, Interfaces, Strategy Development. Legend: Started; Planned; Non-DOM projects; Planned co-operation.]
LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications
21
Scope - life cycle of objects
Collection
Selection
Acquisition
Accession
Description
Preservation
Storage
Preservation
Access
Resource discovery
Delivery
Rendering
22
Scope – objects and processes
Preservation store
Preserves the bit stream in perpetuity
Access store
Access versions
Limited formats – in the flavour of the era
Metadata to support resource discovery
Descriptive, Administrative, Links with existing tools e.g. Integrated
Library System (ILS)
Workflow
Ingest, e.g. Legal Deposit processing
23
[Diagram: DOM in context. Content sources feeding ingest: Donations; Document Supply (Publishers, Archives, Grey Literature, Non-Serial Store); Legal Deposit (LDL Secure Environment, Legal Deposit Items, Legal Deposit Processing); Web Archiving; Digitisation (St Pancras Studios, Newspapers, NSA). DOM shared services: Storage, Signing, Authentication, Metadata, Persistent ID, Ingest. DOM access: Resource Discovery, Delivery, Digital Rights Management. Also shown: Archiving and Operational Stores.]
24
Timeline
2003 to 2005, in release order:
BC: ET approve Business Case & Timeline
R0 (Definition): Prototype will provide a basic preservation-quality digital object storage module
R1 (Operational Storage Sub System):
• Consolidate R0 into operational system
• Provide preservation-quality digital store for materials received under Voluntary Deposit of Electronic Publications (VDEP)
• Integrate it with the existing VDEP front-end
R2 (1st Content Stream ingest):
• Support ingest for a major content stream
• Integrate with core Library systems as required
R3 & R4 (LDEP - initial format): Provide functionality for material covered by LDEP secondary legislation
R5+: Open DOM to new projects
25
DOM: Project definition - 1
Functional architecture “what”
Prototyping – assessing market solutions
Example issues: digital rights, file formats, etc
Logical architecture “how – overall architecture”
Prototyping – basic functioning architecture
Example issues: allow changes to new suppliers, relationships to ILS, other projects etc
Physical architecture “how – storage & specifics”
Prototyping – principal solutions and options
Example issues: how do we build it cost-effectively today, supplier selection criteria
• Business case
• Planning – incremental implementation phases
Cross team workshops – reviewing progress, debating detailed technical
issues, planning immediate priorities, risk management & way forward
26
DOM: Project definition - 2
Approach is to be incremental and not ‘Big Bang’
We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a
good solution
A principal goal is to define:
An overall long term “logical architecture”
Within which, there will be successive generations of physical architectures
We are building an understanding of the storage marketplace, and we will use that knowledge to manage
procurement
We are certain that we will need >500T of storage but we are uncertain when – we thus need
flexible scalable procurement
27
DOM architecture - overview
[Diagram: Resource Discovery (ILS-based RD, non-catalogue RD, others), Rights Management, LDEP and Doc supply refer to an object by its DOMID. The DOM Storage Service handles compound objects/relations, integrity, authenticity and the local resource locator. Each atomic object has a unique persistent identifier (DOMID), and the DOMID is mapped to node/vol/LRL in DOM Physical Storage.]
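The mapping from a persistent DOMID to node/vol/LRL can be pictured as a small lookup service. The sketch below is only an illustration under stated assumptions: the class names, the use of a random UUID as the identifier and the in-memory dictionary are all hypothetical, since the slides do not describe the real identifier scheme or storage layout.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    """One physical copy of an object (field names are hypothetical)."""
    node: str
    volume: str
    lrl: str  # local resource locator


class DomidResolver:
    """Keeps the DOMID stable while physical locations change."""

    def __init__(self):
        self._locations = {}

    def register(self, location):
        # Assumption: a random UUID stands in for the real DOMID format.
        domid = uuid.uuid4().hex
        self._locations[domid] = [location]
        return domid

    def relocate(self, domid, old, new):
        # Storage migration changes the mapping, never the identifier.
        copies = self._locations[domid]
        copies[copies.index(old)] = new

    def resolve(self, domid):
        return list(self._locations[domid])


resolver = DomidResolver()
first_home = Location("node-a", "vol-01", "objects/0001")
domid = resolver.register(first_home)
new_home = Location("node-c", "vol-07", "objects/0001")
resolver.relocate(domid, first_home, new_home)
print(domid, resolver.resolve(domid))
```

The point of the indirection is that catalogue records and other systems only ever hold the DOMID, so a later generation of physical storage can be introduced without touching them.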
28
DOM System (release 3)
[Diagram: the DOM System at release 3. Components shown: Aleph, two Storage subsystems, Access, Mailroom, Publishers, Shared services and Administration.]
29
DOM logical architecture –
integrity and authenticity
Integrity:
System has capability to continuously monitor the object store to detect
object corruption
It would then initiate object recovery
Authenticity:
A process is defined to provide long-term assurance that an object that
is re-presented is as it was when it was ingested
Based on the use of cryptographic signing techniques
Each object is signed when it is ingested
The signature is verified when required
The signing mechanism is “tightly” controlled
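A minimal sketch of the integrity and authenticity ideas above, under stated assumptions. The slides say only that cryptographic signing is used; here an HMAC-SHA256 from the Python standard library stands in for whatever signing scheme was actually chosen, the key is a placeholder, and recovery from a second copy is an assumed mechanism. All names are hypothetical.

```python
import hashlib
import hmac

# Assumption: in a real system this key would be held only by a tightly
# controlled signing service, not embedded in code.
SIGNING_KEY = b"demo-key-held-by-the-signing-service"


def sign(data: bytes) -> str:
    """Sign an object at ingest (HMAC used as a stand-in for signing)."""
    return hmac.new(SIGNING_KEY, data, hashlib.sha256).hexdigest()


def verify(data: bytes, signature: str) -> bool:
    """Check that data is still what was signed at ingest."""
    return hmac.compare_digest(sign(data), signature)


class ObjectStore:
    def __init__(self):
        self.primary = {}
        self.replica = {}     # assumed second copy used for recovery
        self.signatures = {}

    def ingest(self, object_id, data):
        self.primary[object_id] = data
        self.replica[object_id] = data
        self.signatures[object_id] = sign(data)

    def audit(self):
        """Verify every object; repair corrupt ones from the replica."""
        repaired = []
        for object_id, signature in self.signatures.items():
            if not verify(self.primary[object_id], signature):
                candidate = self.replica[object_id]
                if verify(candidate, signature):
                    self.primary[object_id] = candidate
                    repaired.append(object_id)
        return repaired


store = ObjectStore()
store.ingest("obj-1", b"digitised page")
store.primary["obj-1"] = b"digitised pagf"  # simulate corruption
print(store.audit())
```

Running `audit` on a schedule corresponds to the continuous monitoring on the slide, and calling `verify` at delivery time corresponds to checking the signature when required.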
30
Procuring physical storage in volume
A major cost is in physical storage
The market for storage systems is changing rapidly, and this implies that “lock-in” is
not sensible
We thus need flexibility to change supplier over time
Cost of storage is reducing by 30-40% per year
Hence procure on rolling basis just ahead of demand
Replace storage on a rolling basis on expiry of warranty
The rolling programmes imply the need to be able to support a heterogeneous
product solution
The design of the logical architecture thus supports storage sourced from multiple
storage vendors
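The argument for buying just ahead of demand can be checked with simple arithmetic. The sketch below compares buying five years of capacity up front with buying each year's capacity at that year's price. The 35% yearly decline sits inside the 30-40% range quoted above; the starting price and the demand profile are made-up figures for illustration only.

```python
def unit_price(year, start_price, annual_decline):
    """Price per TB after `year` years of steady decline."""
    return start_price * (1.0 - annual_decline) ** year


def upfront_cost(demand_tb, start_price):
    """Buy all capacity in year 0 at today's price."""
    return sum(demand_tb) * start_price


def rolling_cost(demand_tb, start_price, annual_decline):
    """Buy each year's capacity at that year's price."""
    return sum(
        tb * unit_price(year, start_price, annual_decline)
        for year, tb in enumerate(demand_tb)
    )


DEMAND_TB = [100, 100, 100, 100, 100]  # hypothetical: 500 TB over five years
START_PRICE = 1000.0                   # hypothetical price per TB in year 0
DECLINE = 0.35                         # within the 30-40% range on the slide

upfront = upfront_cost(DEMAND_TB, START_PRICE)
rolling = rolling_cost(DEMAND_TB, START_PRICE, DECLINE)
print(f"up front: {upfront:,.0f}  rolling: {rolling:,.0f}")
```

With these illustrative numbers the rolling purchase costs about half as much as buying everything at the start, before any saving from not powering or maintaining unused capacity.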
31
Disaster tolerance and the organisation
of storage clusters
One can obtain commercial disaster recovery (DR) solutions for common equipment
configurations
However, one cannot obtain such solutions for multi-100 TB systems
So we must build in the need for DR into the design of the system
A single site solution, subject to a common-mode disaster, would suffer considerable loss
of availability after a disaster, and so is not acceptable
This implies that we need a multi-site solution
Conventionally these are based on a master-standby where only 50% of kit is delivering
normal service
Our design is based on the use of multiple autonomous independent peer clusters that
cross-synchronise
so 100% of the kit delivers normal service
32
DOM architecture in the context of the
storage solution market
The dominant segment of the market focuses on delivering performance within
a highly resilient single cluster
However:
Many of our objects will be rarely accessed
so we do not want to pay for “maximised” performance we do not need
We have resilience by using multiple clusters, hence we have a reduced
need for resilience within a cluster
so we do not want to pay for “maximised” resilience we do not need
We are using these drivers to design a cost-effective large scale resilient
solution
33
DOM storage subsystem architecture - overview
[Diagram: two storage clusters, each comprising a DOM Storage Gateway, a DOM Storage Service and DOM Physical Storage. Both use the DOM Shared Services: Unique ID, Signing, Logging.]
34
DOM storage subsystem architecture - access
[Diagram: the same two-cluster layout with the DOM shared central services (Unique ID, Signing, Logging). Normal access/delivery is from the local storage cluster.]
35
DOM storage subsystem architecture - access
[Diagram: the same two-cluster layout with the DOM shared central services (Unique ID, Signing, Logging). When a cluster is off-line, access/delivery is from a remote storage cluster.]
36
DOM storage subsystem architecture - ingest
[Diagram: the same two-cluster layout with the DOM shared central services (Unique ID, Signing, Logging). Normal ingest is to the local storage cluster (“Store”), and then the remote cluster is synchronised (“Synchronise remote store”).]
37
DOM storage subsystem architecture - ingest
[Diagram: the same two-cluster layout with the DOM shared central services (Unique ID, Signing, Logging). When a cluster is off-line, ingest is managed by the remote storage cluster (“Store”) and the local cluster is synchronised later (“Synchronise remote store later”).]
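The access and ingest behaviour in the four storage subsystem slides above can be summarised in a short sketch. It is a simplified model under stated assumptions: two in-memory clusters, a gateway that prefers its local cluster, and a pending list standing in for whatever synchronisation mechanism the real design uses. All class and method names are hypothetical.

```python
class Cluster:
    def __init__(self, name):
        self.name = name
        self.online = True
        self.objects = {}
        self.pending = []  # ids this cluster still owes to its peer


class Gateway:
    """Gateway for one site: prefers its local cluster, falls back to the peer."""

    def __init__(self, local, remote):
        self.local = local
        self.remote = remote

    def ingest(self, object_id, data):
        if self.local.online:
            first, second = self.local, self.remote
        else:
            first, second = self.remote, self.local
        if not first.online:
            raise RuntimeError("no storage cluster available")
        first.objects[object_id] = data
        if second.online:
            second.objects[object_id] = data   # synchronise the peer now
        else:
            first.pending.append(object_id)    # synchronise the peer later

    def read(self, object_id):
        for cluster in (self.local, self.remote):
            if cluster.online and object_id in cluster.objects:
                return cluster.objects[object_id]
        raise KeyError(object_id)

    def synchronise(self):
        """Copy across anything ingested while a peer was off-line."""
        pairs = ((self.local, self.remote), (self.remote, self.local))
        for source, target in pairs:
            if source.online and target.online:
                for object_id in source.pending:
                    target.objects[object_id] = source.objects[object_id]
                source.pending.clear()


site_a, site_b = Cluster("A"), Cluster("B")
gateway = Gateway(local=site_a, remote=site_b)
gateway.ingest("obj-1", b"first")
site_a.online = False
gateway.ingest("obj-2", b"second")      # handled by the remote cluster
print(gateway.read("obj-1"))            # served from the remote cluster
site_a.online = True
gateway.synchronise()
print(sorted(site_a.objects))
```

Because both clusters serve reads and accept ingest in normal operation, neither sits idle as a standby, which is the "100% of the kit delivers normal service" point made earlier.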
38
In conclusion
We plan for generations of physical storage
Migration from one generation to the next
Allow changes of supplier
Purchase incrementally in modest quantities
Move quickly when required
Be cost conscious
We provide assurance that an object is held and re-presented as when it was
ingested
We are designing a cost-effective large scale resilient solution
In summary: we take a long term view
39
Major Programmes/4
Web Archiving Programme
40
Structure of Programme
Web Archiving Programme is a collaborative
initiative, implemented roughly across two
consortia
UK Web Archiving Consortium
Developing a selective approach to web archiving, procuring a common
web archiving infrastructure and software to begin archiving activities at
the earliest opportunity
International Internet Preservation Consortium
Developing advanced web archiving technologies for the long-term,
large-scale, continuous crawling requirements enabled through
legislation
41
UK Web Archiving Consortium
Developing a selective approach to web archiving
Licence for PANDAS about to be signed with NLA
Sub-licences with consortium partners and contractor to follow
ITT concluded with Magus Research winning the contract.
Implement a common web archiving infrastructure (lots of Linux
machines + PANDAS)
Provide customisation/development of PANDAS
Provide help desk and support
42
International Internet Preservation
Consortium
Developing advanced web archiving technologies
Smart Crawler
Continuous adaptive crawler, adjusting crawl priority on the fly
Based on IA Heritrix
Working on requirements now
Expect to begin tender process in June
Content Management
Archival formats
Framework
Metrics and Test Bed
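To make "adjusting crawl priority on the fly" concrete, the sketch below shows one common way to build a re-prioritisable crawl frontier using a heap. It is an illustration only: it is not Heritrix's API or the consortium's design, and the class name, method names and priority values are hypothetical.

```python
import heapq
import itertools


class CrawlFrontier:
    """Priority queue of URLs whose priorities can change while queued."""

    def __init__(self):
        self._heap = []
        self._entries = {}                 # url -> its live heap entry
        self._counter = itertools.count()  # tie-breaker, keeps FIFO order

    def schedule(self, url, priority):
        """Add a URL or change its priority; lower numbers are crawled first."""
        if url in self._entries:
            self._entries[url][2] = None   # mark the old entry as stale
        entry = [priority, next(self._counter), url]
        self._entries[url] = entry
        heapq.heappush(self._heap, entry)

    def next_url(self):
        """Pop the most urgent live URL, skipping stale entries."""
        while self._heap:
            _, _, url = heapq.heappop(self._heap)
            if url is not None:
                del self._entries[url]
                return url
        return None


frontier = CrawlFrontier()
frontier.schedule("http://example.org/news", 5)
frontier.schedule("http://example.org/about", 3)
# A page found to change often is promoted while the crawl is running.
frontier.schedule("http://example.org/news", 1)
print(frontier.next_url())
print(frontier.next_url())
```

An adaptive crawler would call `schedule` again whenever its estimate of a page's importance or change rate is updated, rather than fixing the crawl order in advance.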
43
External Collaboration
44
Digital Library Collaborations/Partnerships
Current
UK Digital Preservation Coalition
Founder Member
TEL (The European Library Project)
Web Archiving UK
JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library Of Wales
International Internet Preservation Consortium
BNF, Library Of Congress, Internet Archive, National Archives & Library Of Canada, National Library Of
Australia, National Library Of Italy, National Libraries Of Nordic Countries
JISC Funded - Digital Curation Centre
Persistent Identifiers
DOI Foundation, European National Libraries (KB & DDB)
Resource Discovery
Union Catalogues (SUNCAT)
Digital Library Federation
45
Digital Library Collaborations/Partnerships
Potential
Secure Legal Deposit Network
6 Legal Deposit Libraries
Global Digital Format Registry
Potential Partners (National Archives, DLF)
Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
KB (Netherlands National Library & Other Partners – FP6 Bid)
Digital Rights Management
Potential Partners (Publishers, JISC)
Metadata
Publishers, Others ?
Authentication
JISC ?
Resource Discovery
Search Engine Vendors, Researchers
Others ???
46
Conclusions
• Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success
• Can You Work With Us?
47
Slide 42
British Computer Society
North London Branch
Major Programmes
Richard Boulderstone
July 27, 2004
1
Agenda
•
•
•
•
•
•
•
•
•
The British Library
Vision
Our Audiences/Customers
ILS
Digitisation
Digital Object Management
Web Archiving
Collaboration
Conclusions
Magna Carta
www.bl.uk
2
What Is The British Library ?
Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of
Science and Invention (1855), National Central Library (1916), and National
Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India
Office Library and Records in 1982, and British Institute of Recorded Sound in
1983
• Flagship building at St Pancras - largest public building project in Great
Britain in 20th century - opened in 1998
•
www.bl.uk
3
World-Class Research Library
Key Statistics 2002/3
•
•
•
•
•
•
•
•
•
•
150 million items
8.2 million items consulted or supplied
408,000 reading room visits
618,000 catalogue records created
554,000 items received on legal deposit
651 km shelf capacity 92% full add 12 km each year
18.5M Web Site Hits (www.bl.uk)
2,400 staff
£85.2 million Grant in Aid and £27.0 million trading income in 2001/2
Annual report - http://www.bl.uk/about/annual/latest.html
www.bl.uk
4
‘The World’s Knowledge’
Outcome Based Vision
… by aiding scientific advances
… by adding commercial value for
businesses
… by contributing to UK “knowledge
economy”
… through the pursuit of academic
excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural
wellbeing
… by giving information relevant to their
interests
… by helping to find the next medical
breakthrough
… by creating a link between the past,
present and future
To help people
advance
knowledge to
enrich lives
www.bl.uk
5
High
R+D
Industries
Prof.
Services
Creative
Industries Publishing
Industries
Resource Discovery
Bespoke Services
Research Services
Document Supply
Reprographics
Innovation Centre
School
Students
Libraries Teachers 11>18
On-site Visits
School Tours
Web Learning
SMEs
EDUCATION
Lifelong
Learner
Exhibitions
Events
Tours
Publishing
Visitors
(child + adult)
Lifelong
Learner
PUBLIC
BUSINESS
Lifelong
Learner
Postgraduate/
Undergraduate
RESEARCHER
Librarians
Scholars
Lifelong
Learner
Reading Rooms
Bespoke Services
Reprographics
Publishing
Document Supply
Searching Tools
LIBRARIES
Commercial
Researcher
Public
Libraries
Broadcasting
e.g. BBC
Publishing
e.g. OED
Document Supply
Resource Discovery
Training
Best Practice
H.E.
Libraries
Public
www.bl.uk
6
2
Major Programmes/1
Da Vinci Notebook
Integrated Library System (ILS)
Programme
7
ILS: Development
Data migration
Due to finish in a few days
16M+ BL records
10M+ records from other sources
Online ILS software
All online changes made (mainly interfaces) – final tests
Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
Most ones done for go live
Rest in priority order
www.bl.uk
8
ILS: Implementation
Training
Courses to end-users well underway
‘Practice’ system available
‘Search only’ training also underway
Testing
Functional testing (end to end) nearly complete
Performance poor – OPAC very slow
Automated stress testing (LoadRunner scripts)
eIS trying to find area of problem
Ex Libris experts flying over
Some security ‘hardening’ needed
www.bl.uk
9
ILS: Cutover from legacy systems
Now:
Temporary Aleph cataloguing
Phase 1 – internal processing
Staggered take-on of users to ease cutover problems
Merge ‘temporary’ records
7 June:
Phase 2 – reading rooms
Reading rooms closed for cutover 26-29 June
Mainly brand-new PCs etc rather than XP upgrade
30 June:
Phase 3 – remote users
Could be delayed major problems
30 July:
www.bl.uk
10
www.bl.uk
11
Future ILS development (ILS/2)
Current ILS development seen just as the start
Extra records
E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
E.g. Preservation records
Links to other new BL systems
E.g. Digital Object Management (images, web pages etc)
New releases of Ex Libris packages
www.bl.uk
12
Major Programmes/2
International Dunhuang Project
Digitisation Programme
13
Background
Digitisation Is The Process Of Converting Existing Physical Items Into Digital
Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character
Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image to Sub £1/Image.
Automatic Devices To Digitise Are Becoming Availalble But Are Expensive ~£150,000.
BL Has Had Fairly Ad Hoc Approach Driven By
External Funding Opportunities
Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project
Management
BL Has Created About 1.5M Digital Images So Far…
www.bl.uk
14
Digitisation Strategy
Digitisation Strategy Project Was Formerly Initiated On February 2, 2004
Key objectives for the project are to define:
Selection Criteria
Uniform Approach
Communications Plan
Sustainability
Intellectual Property Rights
External Relationship Management
Funding
Integration with DOMS
www.bl.uk
15
Project Status Information
Definitive Register of Projects
19 Complete
19 Current
20 Planning
JISC Sound (3,900 Hours)
JISC Newspapers (2M Pages of 750M Pages)
Chopin (Collaborative Project)
Early English Books Online
www.bl.uk
16
Major Programmes/3
Gutenberg Bible
Digital Object Management (DOM)
Programme
17
DOM Programme vision
Our mission is to enable the United Kingdom to preserve
and use its digital intellectual property forever
Our vision is create a management system for digital objects
that will
store and preserve any type of digital material in perpetuity
provide access to this material to users with appropriate permissions
ensure that the material is easy to find
ensure that users can view the material with contemporary applications
ensure that users can, where possible, experience material with the original look-and-feel
www.bl.uk
18
Introduction - history
Digital Library PFI
Mar 1997 – Dec 1998
Digital Library System
1999 – early 2002
Lessons
DOM Report
Nov 2002
The DOM Programme
Started September 2003
www.bl.uk
19
Drivers for the BL DOM Programme
Legal deposit legislation for non-print material was granted royal assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12T of material per year (with 50 year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….
We need a generic and cost-effective approach for the secure long term
storage of digital material that is produced by numerous initiatives
www.bl.uk
20
DOM – many topics to address
HIGH
ILS
WEB
ARCHIVING
DIGITISATION
PROGRAMME
ESTIMATED SIZE OF COMPONENT
LDEP
WORKFLOW
RESOURCE
DISCOVERY
LDLSE
VDEP
RIGHTS
MANAGEMENT
METADATA
DEFINITION
TECHNICAL
REQUIREMENTS
FILE
FORMAT
REGISTRY
PERSISTENT
IDENTIFIERS
FILE
CONVERSION
UTILITIES
AUTHENTICATION
PROTOTYPES
SDM
RADM
INTERFACES
STRATEGY
DEVELOPMENT
LOW
LOW
COMPONENT AMBIGUITY / COMPLEXITY
Started
Planned
Non-DOM projects
Planned co-operation
HIGH
LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications
www.bl.uk
21
Scope - life cycle of objects
Collection
Selection
Acquisition
Accession
Description
Preservation
Storage
Preservation
Access
Resource discovery
Delivery
Rendering
www.bl.uk
22
Scope – objects and processes
Preservation store
Preserves the bit stream in perpetuity
Access store
Access versions
Limited formats – in the flavour of the era
Metadata to support resource discovery
Descriptive, Administrative, Links with existing tools e.g. Integrated
Library System (ILS)
Workflow
Ingest, e.g. Legal Deposit processing
www.bl.uk
23
ACCESS
DOM
Resource Discovery
Delivery
Digital Rights Management
Shared services
DOM
Storage
Signing
Authentication
Metadata
Persistent ID
Ingest
DONATIONS
DOCUMENT SUPPLY
Publishers
Archives
Grey Literature
Non-Serial
Store
Archiving
Operational
Stores
LEGAL DEPOSIT
LDL Secure
Environment
Legal Deposit
Items
Legal Deposit
Processing
WEB
ARCHIVING
DIGITISATION
St Pancras
Studios
Newspapers
NSA
www.bl.uk
24
Timeline
Prototype will provide a
basic preservationquality digital object
storage module
Definition. R0
ET approve
Business Case
& Timeline
•Consolidate R0 into operational
system
•Provide preservation-quality digital
store for materials received under
Voluntary Deposit of Electronic
Publications (VDEP)
•Integrate it with the existing VDEP
front-end
BC
R0
Operat’l Storage
Sub System. R1
•Support ingest for a
major content stream
•Integrate with core
Library systems as
required
R1
1st Content
Stream ingest. R2
R2
LDEP - initial
format. R3 & R4
Provide functionality
for material covered by
LDEP secondary
legislation
R3
R4
Open DOM to new
projects. R5+
2003
2004
2005
www.bl.uk
25
DOM: Project definition - 1
Example issues
Functional Architecture “What”
digital rights, file
formats, etc
Prototyping – assessing market solutions
allow changes to
new suppliers,
relationships to ILS,
other projects etc
Logical architecture “how – overall architecture”
Prototyping - basic functioning architecture
how do we build it
cost-effectively
today, supplier
selection criteria
Physical architecture “how – storage & specifics”
Prototyping - principal solutions and options
• Business case
• Planning – incremental
implementation phases
Cross team workshops – reviewing progress, debating detailed technical
issues, planning immediate priorities, risk management & way forward
www.bl.uk
26
DOM: Project definition - 2
Approach is to be incremental and not ‘Big Bang’
We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a
good solution
A principal goal is to define:
An overall long term “logical architecture”
Within which, there will be successive generations of physical architectures
We are understanding the storage marketplace, and we will use the knowledge to manage
procurement
We are certain that we will need >500T of storage but we are uncertain when – we thus need
flexible scalable procurement
www.bl.uk
27
DOM architecture - overview
Resource Discovery
Non-cat
ILS
based RD
Others
Rights Management
LDEP
DOMID
OBJECT
DOM Storage Service
Compound
objects/relations
Integrity
Authenticity
Local resource locator
Object
Atomic
Objects
Unique persistent
identifier (DOMID)
Doc
supply
DOMID is
mapped to
node/vol/LRL
DOM Physical Storage
www.bl.uk
28
DOM System (release 3)
Aleph
Storage subsystem
Storage
subsystem
Access
Mailroom
Publishers
Shared services
Administration
DOM System
www.bl.uk
29
DOM logical architecture –
integrity and authenticity
Integrity:
System has capability to continuously monitor the object store to detect
object corruption
It would then initiate object recovery
Authenticity:
A process is defined to provide long-term assurance that an object that
is re-presented is as it was when it was ingested
Based on the use of cryptographic signing techniques
Each object is signed when it is ingested
The signature is verified when required
The signing mechanism is “tightly” controlled
www.bl.uk
30
Procuring physical storage in volume
A major cost is in physical storage
The market for storage systems is changing rapidly, and this implies that “lock-in” is
not sensible
We thus need flexibility to change supplier over time
Cost of storage is reducing by 30-40% per year
Hence procure on rolling basis just ahead of demand
Replace storage on a rolling basis on expiry of warranty
The rolling programmes imply the need to be able to support a heterogeneous
product solution
The design of the logical architecture thus supports storage sourced from multiple
storage vendors
www.bl.uk
31
Disaster tolerance and the organisation
of storage clusters
One can obtain commercial disaster recovery (DR) solutions for common equipment
configurations
However one cannot obtain such solutions for systems comprising multi-100 Tb systems
So we must build in the need for DR into the design of the system
A single site solution, subject to a common-mode disaster, would suffer considerable loss
of availability after a disaster, and so is not acceptable
This implies that we need a multi-site solution
Conventionally these are based on a master-standby where only 50% of kit is delivering
normal service
Our design is based on the use of multiple autonomous independent peer clusters that
cross-synchronise
so 100% of the kit delivers normal service
www.bl.uk
32
DOM architecture in the context of the
storage solution market
The dominant segment of the market focuses on delivering performance within
a highly resilient single cluster
However:
Many of our objects will be rarely accessed
so we do not want to pay for “maximised” performance we do not need
We have resilience by using multiple clusters, hence we have a reduced
need for resilience within a cluster
so we do not want to pay for “maximised” resilience we do not need
We are using these drivers to design a cost-effective large scale resilient
solution
www.bl.uk
33
DOM storage subsystem architecture - overview
DOM
Storage
gateway
DOM
Storage
gateway
DOM
Storage
Service
DOM
Storage
Service
DOM
Physical
Storage
Storage cluster
DOM
Shared
Services
• Unique ID
• Signing
• Logging
DOM
Physical
Storage
Storage cluster
www.bl.uk
34
DOM storage subsystem architecture - access
DOM
Storage
gateway
Normal access/delivery
is from local storage
cluster
DOM
Storage
Service
DOM
Physical
Storage
Storage cluster
DOM
Storage
gateway
DOM
Storage
Service
••
••
••
DOM
DOM
Shared
central
Services
Unique
Unique ID
ID
Signing
Signing
Logging
Logging
DOM
Physical
Storage
Storage cluster
www.bl.uk
35
DOM storage subsystem architecture - access
DOM
Storage
gateway
When a cluster is off-line
then access/delivery is
from a remote storage
cluster
DOM
Storage
Service
DOM
Physical
Storage
Storage cluster
DOM
Storage
gateway
DOM
Storage
Service
••
••
••
DOM
DOM
Shared
central
Services
Unique
Unique ID
ID
Signing
Signing
Logging
Logging
DOM
Physical
Storage
Storage cluster
www.bl.uk
36
DOM storage subsystem architecture - ingest
DOM
Storage
gateway
DOM
Storage
Service
Normal ingest is to the
local storage cluster and
then the remote cluster
is synchronised
Synchronise remote store
Store
DOM
Physical
Storage
Storage cluster
••
••
••
DOM
DOM
Shared
central
Services
Unique
Unique ID
ID
Signing
Signing
Logging
Logging
DOM
Storage
gateway
DOM
Storage
Service
DOM
Physical
Storage
Storage cluster
www.bl.uk
37
DOM storage subsystem architecture - ingest
DOM
Storage
gateway
DOM
Storage
Service
When a cluster is off-line
then ingest is managed by
the remote storage cluster
and the local cluster is
synchronised later
Synchronise remote store later
Store
DOM
Physical
Storage
Storage cluster
••
••
••
DOM
DOM
Shared
central
Services
Unique
Unique ID
ID
Signing
Signing
Logging
Logging
DOM
Storage
gateway
DOM
Storage
Service
DOM
Physical
Storage
Storage cluster
www.bl.uk
38
In conclusion
We plan for generations of physical storage
Migration from one generation to the next
Allow changes of supplier
Purchase incrementally in modest quantities
Move quickly when required
Be cost conscious
We provide assurance that an object is held and re-presented as when it was
ingested
We are designing a cost-effective large scale resilient solution
In summary: we take a long term view
www.bl.uk
39
Major Programmes/4
Web Archiving Programme
40
Structure of Programme
Web Archiving Programme is a collaborative
initiative, roughly implemented across two
consortiums
UK Web Archiving Consortium
Developing a selective approach to web archiving, procuring a common
web archiving infrastructure and software to begin archiving activities at
the earliest
International Internet Preservation Consortium
Developing advanced web archiving technologies for the long terms,
large scale, continuous crawling requirements enabled through
legislation
www.bl.uk
41
UK Web Archiving Consortium
Developing a selective approach to web archiving
License for PANDAS about to be signed with NLA
Sub-licenses with consortium partners and contractor to follow
ITT concluded with Magus Research winning the contract.
Implement a common web arching infrastructure (lots of Linux
machines + PANDAS)
Provide customisation/development of PANDAS
Provide help desk and support
www.bl.uk
42
International Internet Preservation Consortium
Developing advanced web archiving technologies
Smart Crawler
Continuous adaptive crawler, adjusting crawl priority on the fly
Based on IA Heritrix
Working on requirements now
Expect to begin tender process in June
Content Management
Archival formats
Framework
Metrics and Test Bed
43
External Collaboration
44
Digital Library Collaborations/Partnerships
Current
UK Digital Preservation Coalition
Founder Member
TEL (The European Library Project)
Web Archiving UK
JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library Of Wales
International Internet Preservation Consortium
BNF, Library Of Congress, Internet Archive, National Archives & Library Of Canada, National Library Of
Australia, National Library Of Italy, National Libraries Of Nordic Countries
JISC Funded - Digital Curation Centre
Persistent Identifiers
DOI Foundation, European National Libraries (KB & DDB)
Resource Discovery
Union Catalogues (SUNCAT)
Digital Library Federation
45
Digital Library Collaborations/Partnerships
Potential
Secure Legal Deposit Network
6 Legal Deposit Libraries
Global Digital Format Registry
Potential Partners (National Archives, DLF)
Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
KB (Netherlands National Library & Other Partners – FP6 Bid)
Digital Rights Management
Potential Partners (Publishers, JISC)
Metadata
Publishers, Others ?
Authentication
JISC ?
Resource Discovery
Search Engine Vendors, Researchers
Others ???
46
Conclusions
• Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success
• Can You Work With Us?
47
Slide 43
British Computer Society
North London Branch
Major Programmes
Richard Boulderstone
July 27, 2004
1
Agenda
• The British Library
• Vision
• Our Audiences/Customers
• ILS
• Digitisation
• Digital Object Management
• Web Archiving
• Collaboration
• Conclusions
Magna Carta
2
What Is The British Library?
• Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of Science and Invention (1855), National Central Library (1916), and National Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India Office Library and Records in 1982, and British Institute of Recorded Sound in 1983
• Flagship building at St Pancras - largest public building project in Great Britain in 20th century - opened in 1998
3
World-Class Research Library
Key Statistics 2002/3
• 150 million items
• 8.2 million items consulted or supplied
• 408,000 reading room visits
• 618,000 catalogue records created
• 554,000 items received on legal deposit
• 651 km shelf capacity, 92% full, add 12 km each year
• 18.5M Web Site Hits (www.bl.uk)
• 2,400 staff
• £85.2 million Grant in Aid and £27.0 million trading income in 2001/2
• Annual report - http://www.bl.uk/about/annual/latest.html
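As a rough worked example of the shelf figures above, the remaining headroom can be estimated as follows. This is only a sketch: it assumes that "add 12 km each year" means the collection fills about 12 km of shelving annually and that this rate stays constant, neither of which the slide states outright.

```python
def years_until_full(capacity_km: float, fraction_full: float,
                     growth_km_per_year: float) -> float:
    """Years of shelf space left if growth stays constant (an assumption)."""
    free_km = capacity_km * (1.0 - fraction_full)
    return free_km / growth_km_per_year


# Figures quoted on the slide: 651 km capacity, 92% full, 12 km added per year.
free_km = 651 * (1 - 0.92)
years = years_until_full(651, 0.92, 12)
print(f"{free_km:.1f} km free, about {years:.1f} years at 12 km/year")
```

On these assumptions roughly 52 km of shelving is free, a little over four years of growth.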
4
‘The World’s Knowledge’
Outcome Based Vision
To help people advance knowledge to enrich lives
… by aiding scientific advances
… by adding commercial value for businesses
… by contributing to UK “knowledge economy”
… through the pursuit of academic excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural wellbeing
… by giving information relevant to their interests
… by helping to find the next medical breakthrough
… by creating a link between the past, present and future
5
[Diagram: audiences and the services offered to them, grouped under EDUCATION, PUBLIC, BUSINESS, RESEARCHER and LIBRARIES.]
Audiences shown: High R+D Industries; Prof. Services; Creative Industries; Publishing Industries; SMEs; School Students; Libraries; Teachers 11>18; Visitors (child + adult); Lifelong Learner; Postgraduate/Undergraduate; Librarians; Scholars; Commercial Researcher; Public Libraries; H.E. Libraries; Broadcasting e.g. BBC; Publishing e.g. OED; Public
Services shown: Resource Discovery; Bespoke Services; Research Services; Document Supply; Reprographics; Innovation Centre; On-site Visits; School Tours; Web Learning; Exhibitions; Events; Tours; Publishing; Reading Rooms; Searching Tools; Training; Best Practice
6
Major Programmes/1
Da Vinci Notebook
Integrated Library System (ILS) Programme
7
ILS: Development
Data migration
Due to finish in a few days
16M+ BL records
10M+ records from other sources
Online ILS software
All online changes made (mainly interfaces) – final tests
Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
Most done for go-live
Rest in priority order
8
ILS: Implementation
Training
Courses to end-users well underway
‘Practice’ system available
‘Search only’ training also underway
Testing
Functional testing (end to end) nearly complete
Performance poor – OPAC very slow
Automated stress testing (LoadRunner scripts)
eIS trying to find area of problem
Ex Libris experts flying over
Some security ‘hardening’ needed
9
ILS: Cutover from legacy systems
Now:
Temporary Aleph cataloguing
7 June:
Phase 1 – internal processing
Staggered take-on of users to ease cutover problems
Merge ‘temporary’ records
30 June:
Phase 2 – reading rooms
Reading rooms closed for cutover 26-29 June
Mainly brand-new PCs etc rather than XP upgrade
30 July:
Phase 3 – remote users
Could be delayed if there are major problems
10
11
Future ILS development (ILS/2)
Current ILS development seen just as the start
Extra records
E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
E.g. Preservation records
Links to other new BL systems
E.g. Digital Object Management (images, web pages etc)
New releases of Ex Libris packages
12
Major Programmes/2
International Dunhuang Project
Digitisation Programme
13
Background
Digitisation Is The Process Of Converting Existing Physical Items Into Digital
Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character
Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image to Sub £1/Image.
Automatic Devices To Digitise Are Becoming Available But Are Expensive ~£150,000.
BL Has Had Fairly Ad Hoc Approach Driven By
External Funding Opportunities
Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project
Management
BL Has Created About 1.5M Digital Images So Far…
14
Digitisation Strategy
Digitisation Strategy Project Was Formally Initiated On February 2, 2004
Key objectives for the project are to define:
Selection Criteria
Uniform Approach
Communications Plan
Sustainability
Intellectual Property Rights
External Relationship Management
Funding
Integration with DOMS
15
Project Status Information
Definitive Register of Projects
19 Complete
19 Current
20 Planning
JISC Sound (3,900 Hours)
JISC Newspapers (2M Pages of 750M Pages)
Chopin (Collaborative Project)
Early English Books Online
16
Major Programmes/3
Gutenberg Bible
Digital Object Management (DOM) Programme
17
DOM Programme vision
Our mission is to enable the United Kingdom to preserve
and use its digital intellectual property forever
Our vision is to create a management system for digital objects
that will
store and preserve any type of digital material in perpetuity
provide access to this material to users with appropriate permissions
ensure that the material is easy to find
ensure that users can view the material with contemporary applications
ensure that users can, where possible, experience material with the original look-and-feel
18
Introduction - history
Digital Library PFI
Mar 1997 – Dec 1998
Digital Library System
1999 – early 2002
Lessons
DOM Report
Nov 2002
The DOM Programme
Started September 2003
19
Drivers for the BL DOM Programme
Legal deposit legislation for non-print material was granted royal assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12T of material per year (with 50 year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….
We need a generic and cost-effective approach for the secure long term
storage of digital material that is produced by numerous initiatives
20
DOM – many topics to address
[Chart: components plotted by estimated size of component (LOW to HIGH) against component ambiguity / complexity (LOW to HIGH). Legend: Started; Planned; Non-DOM projects; Planned co-operation.]
Components shown: ILS; Web Archiving; Digitisation Programme; LDEP; Workflow; Resource Discovery; LDLSE; VDEP; Rights Management; Metadata Definition; Technical Requirements; File Format Registry; Persistent Identifiers; File Conversion Utilities; Authentication; Prototypes; SDM; RADM; Interfaces; Strategy Development
LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications
21
Scope - life cycle of objects
Collection
Selection
Acquisition
Accession
Description
Preservation
Storage
Preservation
Access
Resource discovery
Delivery
Rendering
22
Scope – objects and processes
Preservation store
Preserves the bit stream in perpetuity
Access store
Access versions
Limited formats – in the flavour of the era
Metadata to support resource discovery
Descriptive, Administrative, Links with existing tools e.g. Integrated
Library System (ILS)
Workflow
Ingest, e.g. Legal Deposit processing
23
[Diagram: DOM in context.]
ACCESS: Resource Discovery; Delivery; Digital Rights Management
DOM Shared services: Signing; Authentication; Metadata; Persistent ID
DOM Storage
DOM Ingest
Content sources: DONATIONS; DOCUMENT SUPPLY; LEGAL DEPOSIT (LDL Secure Environment, Legal Deposit Items, Legal Deposit Processing); WEB ARCHIVING; DIGITISATION (St Pancras Studios, Newspapers, NSA)
Other labels: Publishers; Archives; Grey Literature; Non-Serial Store; Archiving; Operational Stores
24
Timeline
Definition. R0
• Prototype will provide a basic preservation-quality digital object storage module
• ET approve Business Case & Timeline
Operational Storage Sub System. R1
• Consolidate R0 into operational system
• Provide preservation-quality digital store for materials received under Voluntary Deposit of Electronic Publications (VDEP)
• Integrate it with the existing VDEP front-end
1st Content Stream ingest. R2
• Support ingest for a major content stream
• Integrate with core Library systems as required
LDEP - initial format. R3 & R4
• Provide functionality for material covered by LDEP secondary legislation
Open DOM to new projects. R5+
[Timeline axis: 2003, 2004, 2005. Milestones: BC, R0, R1, R2, R3, R4.]
25
DOM: Project definition - 1
Functional Architecture “What”
Prototyping – assessing market solutions
Example issues: digital rights, file formats, etc
Logical architecture “how – overall architecture”
Prototyping - basic functioning architecture
Example issues: allow changes to new suppliers, relationships to ILS, other projects etc
Physical architecture “how – storage & specifics”
Prototyping - principal solutions and options
Example issues: how do we build it cost-effectively today, supplier selection criteria
• Business case
• Planning – incremental implementation phases
Cross team workshops – reviewing progress, debating detailed technical issues, planning immediate priorities, risk management & way forward
26
DOM: Project definition - 2
Approach is to be incremental and not ‘Big Bang’
We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a
good solution
A principal goal is to define:
An overall long term “logical architecture”
Within which, there will be successive generations of physical architectures
We are building an understanding of the storage marketplace, and we will use that knowledge to manage
procurement
We are certain that we will need >500 TB of storage but we are uncertain when – we thus need
flexible, scalable procurement
27
DOM architecture - overview
[Diagram: client systems exchange DOMID and OBJECT with the DOM Storage Service, which sits on DOM Physical Storage.]
Client systems: Resource Discovery (ILS based RD, Non-cat based RD, Others); Rights Management; LDEP; Doc supply
DOM Storage Service
• Compound objects/relations
• Atomic Objects
• Integrity
• Authenticity
• Unique persistent identifier (DOMID)
DOMID is mapped to node/vol/LRL (LRL: Local resource locator)
DOM Physical Storage
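The mapping from a DOMID to node/vol/LRL can be pictured as a small lookup table. The sketch below is illustrative only: the class names, the identifier format ("dom:0001") and the node and volume names are all hypothetical, and the slides do not describe how the real mapping is held. The point it shows is that the persistent identifier stays fixed while the physical location changes when storage is replaced.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Location:
    node: str  # storage node holding the object (hypothetical naming)
    vol: str   # volume on that node
    lrl: str   # local resource locator within the volume


class DomidResolver:
    """Hypothetical table mapping a persistent DOMID to its current location."""

    def __init__(self) -> None:
        self._table: Dict[str, Location] = {}

    def register(self, domid: str, location: Location) -> None:
        self._table[domid] = location

    def resolve(self, domid: str) -> Optional[Location]:
        return self._table.get(domid)

    def relocate(self, domid: str, new_location: Location) -> None:
        # The DOMID never changes; only the physical location does.
        if domid not in self._table:
            raise KeyError(domid)
        self._table[domid] = new_location


resolver = DomidResolver()
resolver.register("dom:0001", Location("node-a", "vol-03", "0001.bin"))
before = resolver.resolve("dom:0001")

# Migrating to a newer generation of physical storage.
resolver.relocate("dom:0001", Location("node-b", "vol-11", "0001.bin"))
after = resolver.resolve("dom:0001")
```

Callers such as resource discovery or document supply would only ever hold the DOMID, so a change of node or volume is invisible to them.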
28
DOM System (release 3)
[Diagram: DOM System (release 3). Labels: Aleph; Mailroom; Publishers; Access; Storage subsystem; Shared services; Administration.]
29
DOM logical architecture – integrity and authenticity
Integrity:
System has capability to continuously monitor the object store to detect
object corruption
It would then initiate object recovery
Authenticity:
A process is defined to provide long-term assurance that an object that
is re-presented is as it was when it was ingested
Based on the use of cryptographic signing techniques
Each object is signed when it is ingested
The signature is verified when required
The signing mechanism is “tightly” controlled
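The sign-on-ingest and verify-later cycle described above can be sketched as follows. This is a minimal illustration, not the DOM design: the slides say only that "cryptographic signing techniques" are used, so the sketch substitutes a keyed hash (HMAC-SHA256 from the Python standard library) for whatever signature scheme was actually chosen, and the key, store layout and function names are hypothetical.

```python
import hashlib
import hmac
from typing import Dict, List, Tuple

# Hypothetical key; in the slides the signing mechanism is "tightly" controlled.
SIGNING_KEY = b"demo-key-held-by-the-signing-service"

# object id -> (stored bytes, signature recorded at ingest)
store: Dict[str, Tuple[bytes, str]] = {}


def sign(data: bytes, key: bytes = SIGNING_KEY) -> str:
    """Keyed SHA-256 digest standing in for a cryptographic signature."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify(data: bytes, signature: str, key: bytes = SIGNING_KEY) -> bool:
    return hmac.compare_digest(sign(data, key), signature)


def ingest(object_id: str, data: bytes) -> None:
    # Each object is signed when it is ingested.
    store[object_id] = (data, sign(data))


def audit() -> List[str]:
    """Scan the store and list objects whose bytes no longer match."""
    return [oid for oid, (data, sig) in store.items() if not verify(data, sig)]


ingest("obj-1", b"first object")
ingest("obj-2", b"second object")
clean = audit()

# Simulate silent corruption of one stored object.
data, sig = store["obj-2"]
store["obj-2"] = (data + b"\x00", sig)
corrupted = audit()
```

An object listed by `audit()` would be a candidate for recovery from another copy; that recovery step is outside this sketch.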
30
Procuring physical storage in volume
A major cost is in physical storage
The market for storage systems is changing rapidly, and this implies that “lock-in” is
not sensible
We thus need flexibility to change supplier over time
Cost of storage is reducing by 30-40% per year
Hence procure on rolling basis just ahead of demand
Replace storage on a rolling basis on expiry of warranty
The rolling programmes imply the need to be able to support a heterogeneous
product solution
The design of the logical architecture thus supports storage sourced from multiple
storage vendors
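The case for buying just ahead of demand can be checked with simple arithmetic. The sketch below assumes a 35% annual price decline (the midpoint of the 30-40% quoted), a hypothetical demand of 100 TB in each of five years, and a price normalised to 1.0 per TB in the first year. It ignores replacement on warranty expiry and any volume discounts.

```python
from typing import Sequence


def upfront_cost(total_tb: float, price_per_tb: float) -> float:
    """Cost of buying all the capacity at today's price."""
    return total_tb * price_per_tb


def rolling_cost(yearly_tb: Sequence[float], price_per_tb: float,
                 annual_decline: float) -> float:
    """Cost of buying each year's capacity at that year's (lower) price."""
    cost = 0.0
    for year, tb in enumerate(yearly_tb):
        cost += tb * price_per_tb * (1 - annual_decline) ** year
    return cost


demand_tb = [100, 100, 100, 100, 100]  # hypothetical demand profile
up = upfront_cost(sum(demand_tb), 1.0)
roll = rolling_cost(demand_tb, 1.0, 0.35)
print(f"upfront: {up:.0f} units, rolling: {roll:.0f} units")
```

Under these assumptions the rolling purchase costs about half as much as buying 500 TB on day one.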
31
Disaster tolerance and the organisation of storage clusters
One can obtain commercial disaster recovery (DR) solutions for common equipment
configurations
However one cannot obtain such solutions for multi-100 TB systems
So we must build DR into the design of the system
A single site solution, subject to a common-mode disaster, would suffer considerable loss
of availability after a disaster, and so is not acceptable
This implies that we need a multi-site solution
Conventionally these are based on a master-standby where only 50% of kit is delivering
normal service
Our design is based on the use of multiple autonomous independent peer clusters that
cross-synchronise
so 100% of the kit delivers normal service
32
DOM architecture in the context of the storage solution market
The dominant segment of the market focuses on delivering performance within
a highly resilient single cluster
However:
Many of our objects will be rarely accessed
so we do not want to pay for “maximised” performance we do not need
We have resilience by using multiple clusters, hence we have a reduced
need for resilience within a cluster
so we do not want to pay for “maximised” resilience we do not need
We are using these drivers to design a cost-effective large scale resilient
solution
33
DOM storage subsystem architecture - overview
[Diagram: two storage clusters, each comprising a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, plus DOM Shared Services.]
DOM Shared Services
• Unique ID
• Signing
• Logging
34
DOM storage subsystem architecture - access
[Diagram: two storage clusters, each comprising a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, plus DOM Shared central Services (Unique ID, Signing, Logging).]
Normal access/delivery is from local storage cluster
35
DOM storage subsystem architecture - access
[Diagram: two storage clusters, each comprising a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, plus DOM Shared central Services (Unique ID, Signing, Logging).]
When a cluster is off-line then access/delivery is from a remote storage cluster
36
DOM storage subsystem architecture - ingest
[Diagram: two storage clusters, each comprising a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, plus DOM Shared central Services (Unique ID, Signing, Logging). Arrow labels: Store; Synchronise remote store.]
Normal ingest is to the local storage cluster and then the remote cluster is synchronised
37
DOM storage subsystem architecture - ingest
[Diagram: two storage clusters, each comprising a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, plus DOM Shared central Services (Unique ID, Signing, Logging). Arrow labels: Store; Synchronise remote store later.]
When a cluster is off-line then ingest is managed by the remote storage cluster and the local cluster is synchronised later
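The access and ingest behaviour shown on the four storage subsystem slides can be sketched with two peer clusters and a gateway. Everything here is a simplified illustration under stated assumptions: the class and method names are hypothetical, the clusters are in-memory dictionaries, and synchronisation is a direct copy rather than whatever mechanism the real system uses.

```python
from typing import Dict, List, Tuple


class StorageCluster:
    def __init__(self, name: str) -> None:
        self.name = name
        self.online = True
        self.objects: Dict[str, bytes] = {}


class StorageGateway:
    """Routes to the local cluster when it is up, otherwise to the remote peer."""

    def __init__(self, local: StorageCluster, remote: StorageCluster) -> None:
        self.local = local
        self.remote = remote
        # (cluster that missed the write, object id) pairs awaiting sync.
        self.pending: List[Tuple[StorageCluster, str]] = []

    def ingest(self, object_id: str, data: bytes) -> str:
        if self.local.online:
            primary, peer = self.local, self.remote
        else:
            primary, peer = self.remote, self.local
        if not primary.online:
            raise RuntimeError("no storage cluster available")
        primary.objects[object_id] = data
        if peer.online:
            peer.objects[object_id] = data          # synchronise straight away
        else:
            self.pending.append((peer, object_id))  # synchronise later
        return primary.name

    def access(self, object_id: str) -> bytes:
        for cluster in (self.local, self.remote):   # local first, then remote
            if cluster.online and object_id in cluster.objects:
                return cluster.objects[object_id]
        raise KeyError(object_id)

    def synchronise(self) -> int:
        """Copy queued objects to clusters that are back on-line."""
        copied, still_pending = 0, []
        for cluster, object_id in self.pending:
            if cluster.online:
                source = self.remote if cluster is self.local else self.local
                cluster.objects[object_id] = source.objects[object_id]
                copied += 1
            else:
                still_pending.append((cluster, object_id))
        self.pending = still_pending
        return copied


local, remote = StorageCluster("local"), StorageCluster("remote")
gateway = StorageGateway(local, remote)
gateway.ingest("obj-1", b"a")                 # normal ingest, both clusters hold it

local.online = False                          # local cluster goes off-line
served_by = gateway.ingest("obj-2", b"b")     # ingest handled by the remote peer
fetched = gateway.access("obj-1")             # access served by the remote peer

local.online = True                           # local cluster returns
copied = gateway.synchronise()                # local catches up on obj-2
```

Because both clusters accept reads and writes in normal running, neither is an idle standby, which is the "100% of the kit delivers normal service" point made earlier in the deck.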
38
In conclusion
We plan for generations of physical storage
Migration from one generation to the next
Allow changes of supplier
Purchase incrementally in modest quantities
Move quickly when required
Be cost conscious
We provide assurance that an object is held and re-presented as when it was
ingested
We are designing a cost-effective large scale resilient solution
In summary: we take a long term view
39
Major Programmes/4
Web Archiving Programme
40
Structure of Programme
Web Archiving Programme is a collaborative
initiative, organised broadly across two
consortia
UK Web Archiving Consortium
Developing a selective approach to web archiving, procuring a common
web archiving infrastructure and software to begin archiving activities at
the earliest opportunity
International Internet Preservation Consortium
Developing advanced web archiving technologies for the long-term,
large-scale, continuous crawling requirements enabled through
legislation
41
UK Web Archiving Consortium
Developing a selective approach to web archiving
License for PANDAS about to be signed with NLA
Sub-licenses with consortium partners and contractor to follow
ITT concluded with Magus Research winning the contract.
Implement a common web archiving infrastructure (lots of Linux
machines + PANDAS)
Provide customisation/development of PANDAS
Provide help desk and support
42
International Internet Preservation Consortium
Developing advanced web archiving technologies
Smart Crawler
Continuous adaptive crawler, adjusting crawl priority on the fly
Based on IA Heritrix
Working on requirements now
Expect to begin tender process in June
Content Management
Archival formats
Framework
Metrics and Test Bed
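The idea of a crawler "adjusting crawl priority on the fly" can be illustrated with a priority queue whose entries are re-scored as the crawl proceeds. This is a generic sketch using Python's standard heapq module; it is not Heritrix's API or the Smart Crawler design, and the class name, URLs and doubling/halving rule are all hypothetical.

```python
import heapq
from typing import Dict, List, Optional, Tuple


class CrawlFrontier:
    """Queue of URLs where a URL's priority can be changed at any time."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, str]] = []
        self._priority: Dict[str, float] = {}
        self._counter = 0  # tie-breaker so equal priorities pop in insertion order

    def schedule(self, url: str, priority: float) -> None:
        """Add a URL, or change its priority; higher values are crawled sooner."""
        self._priority[url] = priority
        self._counter += 1
        heapq.heappush(self._heap, (-priority, self._counter, url))

    def next_url(self) -> Optional[str]:
        while self._heap:
            neg_priority, _, url = heapq.heappop(self._heap)
            # Skip stale entries left behind by an earlier schedule() call.
            if self._priority.get(url) == -neg_priority:
                del self._priority[url]
                return url
        return None


def adjusted_priority(priority: float, changed: bool) -> float:
    """Hypothetical rule: revisit changing pages sooner, static pages later."""
    return priority * 2 if changed else priority / 2


frontier = CrawlFrontier()
frontier.schedule("http://example.org/news", 1.0)
frontier.schedule("http://example.org/about", 1.0)

# After a visit, the news page had changed and the about page had not.
frontier.schedule("http://example.org/about", adjusted_priority(1.0, changed=False))
frontier.schedule("http://example.org/news", adjusted_priority(1.0, changed=True))

order = [frontier.next_url(), frontier.next_url(), frontier.next_url()]
```

The page that changed is now served first, and the stale queue entries are discarded silently when they surface.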
43
External Collaboration
44
Digital Library Collaborations/Partnerships
Current
UK Digital Preservation Coalition
Founder Member
TEL (The European Library Project)
Web Archiving UK
JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library Of Wales
International Internet Preservation Consortium
BNF, Library Of Congress, Internet Archive, National Archives & Library Of Canada, National Library Of
Australia, National Library Of Italy, National Libraries Of Nordic Countries
JISC Funded - Digital Curation Centre
Persistent Identifiers
DOI Foundation, European National Libraries (KB & DDB)
Resource Discovery
Union Catalogues (SUNCAT)
Digital Library Federation
45
Digital Library Collaborations/Partnerships
Potential
Secure Legal Deposit Network
6 Legal Deposit Libraries
Global Digital Format Registry
Potential Partners (National Archives, DLF)
Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
KB (Netherlands National Library & Other Partners – FP6 Bid)
Digital Rights Management
Potential Partners (Publishers, JISC)
Metadata
Publishers, Others ?
Authentication
JISC ?
Resource Discovery
Search Engine Vendors, Researchers
Others ???
46
Conclusions
• Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success
• Can You Work With Us?
47
Slide 44
British Computer Society
North London Branch
Major Programmes
Richard Boulderstone
July 27, 2004
1
Agenda
•
•
•
•
•
•
•
•
•
The British Library
Vision
Our Audiences/Customers
ILS
Digitisation
Digital Object Management
Web Archiving
Collaboration
Conclusions
Magna Carta
www.bl.uk
2
What Is The British Library ?
Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of
Science and Invention (1855), National Central Library (1916), and National
Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India
Office Library and Records in 1982, and British Institute of Recorded Sound in
1983
• Flagship building at St Pancras - largest public building project in Great
Britain in 20th century - opened in 1998
•
www.bl.uk
3
World-Class Research Library
Key Statistics 2002/3
•
•
•
•
•
•
•
•
•
•
150 million items
8.2 million items consulted or supplied
408,000 reading room visits
618,000 catalogue records created
554,000 items received on legal deposit
651 km shelf capacity 92% full add 12 km each year
18.5M Web Site Hits (www.bl.uk)
2,400 staff
£85.2 million Grant in Aid and £27.0 million trading income in 2001/2
Annual report - http://www.bl.uk/about/annual/latest.html
www.bl.uk
4
‘The World’s Knowledge’
Outcome Based Vision
… by aiding scientific advances
… by adding commercial value for
businesses
… by contributing to UK “knowledge
economy”
… through the pursuit of academic
excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural
wellbeing
… by giving information relevant to their
interests
… by helping to find the next medical
breakthrough
… by creating a link between the past,
present and future
To help people
advance
knowledge to
enrich lives
www.bl.uk
5
High
R+D
Industries
Prof.
Services
Creative
Industries Publishing
Industries
Resource Discovery
Bespoke Services
Research Services
Document Supply
Reprographics
Innovation Centre
School
Students
Libraries Teachers 11>18
On-site Visits
School Tours
Web Learning
SMEs
EDUCATION
Lifelong
Learner
Exhibitions
Events
Tours
Publishing
Visitors
(child + adult)
Lifelong
Learner
PUBLIC
BUSINESS
Lifelong
Learner
Postgraduate/
Undergraduate
RESEARCHER
Librarians
Scholars
Lifelong
Learner
Reading Rooms
Bespoke Services
Reprographics
Publishing
Document Supply
Searching Tools
LIBRARIES
Commercial
Researcher
Public
Libraries
Broadcasting
e.g. BBC
Publishing
e.g. OED
Document Supply
Resource Discovery
Training
Best Practice
H.E.
Libraries
Public
www.bl.uk
6
2
Major Programmes/1
Da Vinci Notebook
Integrated Library System (ILS)
Programme
7
ILS: Development
Data migration
Due to finish in a few days
16M+ BL records
10M+ records from other sources
Online ILS software
All online changes made (mainly interfaces) – final tests
Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
Most ones done for go live
Rest in priority order
www.bl.uk
8
ILS: Implementation
Training
Courses to end-users well underway
‘Practice’ system available
‘Search only’ training also underway
Testing
Functional testing (end to end) nearly complete
Performance poor – OPAC very slow
Automated stress testing (LoadRunner scripts)
eIS trying to find area of problem
Ex Libris experts flying over
Some security ‘hardening’ needed
www.bl.uk
9
ILS: Cutover from legacy systems
Now:
Temporary Aleph cataloguing
Phase 1 – internal processing
Staggered take-on of users to ease cutover problems
Merge ‘temporary’ records
7 June:
Phase 2 – reading rooms
Reading rooms closed for cutover 26-29 June
Mainly brand-new PCs etc rather than XP upgrade
30 June:
Phase 3 – remote users
Could be delayed major problems
30 July:
www.bl.uk
10
www.bl.uk
11
Future ILS development (ILS/2)
Current ILS development seen just as the start
Extra records
E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
E.g. Preservation records
Links to other new BL systems
E.g. Digital Object Management (images, web pages etc)
New releases of Ex Libris packages
www.bl.uk
12
Major Programmes/2
International Dunhuang Project
Digitisation Programme
13
Background
Digitisation Is The Process Of Converting Existing Physical Items Into Digital
Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character
Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image to Sub £1/Image.
Automatic Devices To Digitise Are Becoming Availalble But Are Expensive ~£150,000.
BL Has Had Fairly Ad Hoc Approach Driven By
External Funding Opportunities
Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project
Management
BL Has Created About 1.5M Digital Images So Far…
www.bl.uk
14
Digitisation Strategy
Digitisation Strategy Project Was Formerly Initiated On February 2, 2004
Key objectives for the project are to define:
Selection Criteria
Uniform Approach
Communications Plan
Sustainability
Intellectual Property Rights
External Relationship Management
Funding
Integration with DOMS
www.bl.uk
15
Project Status Information
Definitive Register of Projects
19 Complete
19 Current
20 Planning
JISC Sound (3,900 Hours)
JISC Newspapers (2M Pages of 750M Pages)
Chopin (Collaborative Project)
Early English Books Online
www.bl.uk
16
Major Programmes/3
Gutenberg Bible
Digital Object Management (DOM)
Programme
17
DOM Programme vision
Our mission is to enable the United Kingdom to preserve
and use its digital intellectual property forever
Our vision is create a management system for digital objects
that will
store and preserve any type of digital material in perpetuity
provide access to this material to users with appropriate permissions
ensure that the material is easy to find
ensure that users can view the material with contemporary applications
ensure that users can, where possible, experience material with the original look-and-feel
www.bl.uk
18
Introduction - history
Digital Library PFI
Mar 1997 – Dec 1998
Digital Library System
1999 – early 2002
Lessons
DOM Report
Nov 2002
The DOM Programme
Started September 2003
www.bl.uk
19
Drivers for the BL DOM Programme
Legal deposit legislation for non-print material was granted royal assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12T of material per year (with 50 year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….
We need a generic and cost-effective approach for the secure long term
storage of digital material that is produced by numerous initiatives
www.bl.uk
20
DOM – many topics to address
HIGH
ILS
WEB
ARCHIVING
DIGITISATION
PROGRAMME
ESTIMATED SIZE OF COMPONENT
LDEP
WORKFLOW
RESOURCE
DISCOVERY
LDLSE
VDEP
RIGHTS
MANAGEMENT
METADATA
DEFINITION
TECHNICAL
REQUIREMENTS
FILE
FORMAT
REGISTRY
PERSISTENT
IDENTIFIERS
FILE
CONVERSION
UTILITIES
AUTHENTICATION
PROTOTYPES
SDM
RADM
INTERFACES
STRATEGY
DEVELOPMENT
LOW
LOW
COMPONENT AMBIGUITY / COMPLEXITY
Started
Planned
Non-DOM projects
Planned co-operation
HIGH
LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications
www.bl.uk
21
Scope - life cycle of objects
Collection
Selection
Acquisition
Accession
Description
Preservation
Storage
Preservation
Access
Resource discovery
Delivery
Rendering
www.bl.uk
22
Scope – objects and processes
Preservation store
Preserves the bit stream in perpetuity
Access store
Access versions
Limited formats – in the flavour of the era
Metadata to support resource discovery
Descriptive, Administrative, Links with existing tools e.g. Integrated
Library System (ILS)
Workflow
Ingest, e.g. Legal Deposit processing
www.bl.uk
23
ACCESS
DOM
Resource Discovery
Delivery
Digital Rights Management
Shared services
DOM
Storage
Signing
Authentication
Metadata
Persistent ID
Ingest
DONATIONS
DOCUMENT SUPPLY
Publishers
Archives
Grey Literature
Non-Serial
Store
Archiving
Operational
Stores
LEGAL DEPOSIT
LDL Secure
Environment
Legal Deposit
Items
Legal Deposit
Processing
WEB
ARCHIVING
DIGITISATION
St Pancras
Studios
Newspapers
NSA
www.bl.uk
24
Timeline
Prototype will provide a
basic preservationquality digital object
storage module
Definition. R0
ET approve
Business Case
& Timeline
•Consolidate R0 into operational
system
•Provide preservation-quality digital
store for materials received under
Voluntary Deposit of Electronic
Publications (VDEP)
•Integrate it with the existing VDEP
front-end
BC
R0
Operat’l Storage
Sub System. R1
•Support ingest for a
major content stream
•Integrate with core
Library systems as
required
R1
1st Content
Stream ingest. R2
R2
LDEP - initial
format. R3 & R4
Provide functionality
for material covered by
LDEP secondary
legislation
R3
R4
Open DOM to new
projects. R5+
2003
2004
2005
www.bl.uk
25
DOM: Project definition - 1
Example issues
Functional Architecture “What”
digital rights, file
formats, etc
Prototyping – assessing market solutions
allow changes to
new suppliers,
relationships to ILS,
other projects etc
Logical architecture “how – overall architecture”
Prototyping - basic functioning architecture
how do we build it
cost-effectively
today, supplier
selection criteria
Physical architecture “how – storage & specifics”
Prototyping - principal solutions and options
• Business case
• Planning – incremental
implementation phases
Cross team workshops – reviewing progress, debating detailed technical
issues, planning immediate priorities, risk management & way forward
www.bl.uk
26
DOM: Project definition - 2
Approach is to be incremental and not ‘Big Bang’
We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a
good solution
A principal goal is to define:
An overall long term “logical architecture”
Within which, there will be successive generations of physical architectures
We are understanding the storage marketplace, and we will use the knowledge to manage
procurement
We are certain that we will need >500T of storage but we are uncertain when – we thus need
flexible scalable procurement
www.bl.uk
27
DOM architecture - overview
Resource Discovery
Non-cat
ILS
based RD
Others
Rights Management
LDEP
DOMID
OBJECT
DOM Storage Service
Compound
objects/relations
Integrity
Authenticity
Local resource locator
Object
Atomic
Objects
Unique persistent
identifier (DOMID)
Doc
supply
DOMID is
mapped to
node/vol/LRL
DOM Physical Storage
www.bl.uk
28
DOM System (release 3)
Aleph
Storage subsystem
Storage
subsystem
Access
Mailroom
Publishers
Shared services
Administration
DOM System
www.bl.uk
29
DOM logical architecture –
integrity and authenticity
Integrity:
System has capability to continuously monitor the object store to detect
object corruption
It would then initiate object recovery
Authenticity:
A process is defined to provide long-term assurance that an object that
is re-presented is as it was when it was ingested
Based on the use of cryptographic signing techniques
Each object is signed when it is ingested
The signature is verified when required
The signing mechanism is “tightly” controlled
www.bl.uk
30
Procuring physical storage in volume
A major cost is in physical storage
The market for storage systems is changing rapidly, and this implies that “lock-in” is
not sensible
We thus need flexibility to change supplier over time
Cost of storage is reducing by 30-40% per year
Hence procure on rolling basis just ahead of demand
Replace storage on a rolling basis on expiry of warranty
The rolling programmes imply the need to be able to support a heterogeneous
product solution
The design of the logical architecture thus supports storage sourced from multiple
storage vendors
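The case for rolling procurement can be illustrated with some simple arithmetic. The figures below are assumptions chosen for illustration (a starting price of 1000 per TB, a 35% annual price fall taken from the middle of the 30-40% range, and demand of 100 TB per year for five years); they are not British Library numbers, and warranty-driven replacement is left out.

```python
def upfront_cost(total_tb: float, price_per_tb: float) -> float:
    """Cost of buying all capacity at today's price."""
    return total_tb * price_per_tb


def rolling_cost(tb_per_year: float, years: int,
                 price_per_tb: float, annual_decline: float) -> float:
    """Cost of buying each year's capacity at that year's (lower) price."""
    total = 0.0
    for year in range(years):
        total += tb_per_year * price_per_tb * (1 - annual_decline) ** year
    return total


PRICE_PER_TB = 1000.0   # hypothetical starting price
ANNUAL_DECLINE = 0.35   # assumed, within the 30-40% range quoted above

upfront = upfront_cost(500, PRICE_PER_TB)
rolling = rolling_cost(100, 5, PRICE_PER_TB, ANNUAL_DECLINE)
print(f"up-front: {upfront:.0f}, rolling: {rolling:.0f}")
```

Under these assumptions, buying just ahead of demand costs roughly half as much as buying the full 500 TB on day one.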
Disaster tolerance and the organisation of storage clusters
One can obtain commercial disaster recovery (DR) solutions for common equipment
configurations
However, one cannot obtain such solutions for multi-100 TB systems
So we must build DR into the design of the system
A single site solution, subject to a common-mode disaster, would suffer considerable loss
of availability after a disaster, and so is not acceptable
This implies that we need a multi-site solution
Conventionally these are based on a master-standby arrangement, where only 50% of the kit is
delivering normal service
Our design is based on the use of multiple autonomous independent peer clusters that
cross-synchronise
so 100% of the kit delivers normal service
DOM architecture in the context of the storage solution market
The dominant segment of the market focuses on delivering performance within
a highly resilient single cluster
However:
Many of our objects will be rarely accessed
so we do not want to pay for “maximised” performance we do not need
We have resilience by using multiple clusters, hence we have a reduced
need for resilience within a cluster
so we do not want to pay for “maximised” resilience we do not need
We are using these drivers to design a cost-effective large scale resilient
solution
DOM storage subsystem architecture - overview
[Diagram: two storage clusters, each comprising a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, together with DOM Shared Services (Unique ID, Signing, Logging).]
DOM storage subsystem architecture - access
Normal access/delivery is from the local storage cluster
[Diagram: two storage clusters, each comprising a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, together with DOM shared central services (Unique ID, Signing, Logging).]
DOM storage subsystem architecture - access
When a cluster is off-line then access/delivery is from a remote storage cluster
[Diagram: two storage clusters, each comprising a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, together with DOM shared central services (Unique ID, Signing, Logging).]
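The access rule on these two slides (serve from the local cluster, fall back to a remote peer when the local cluster is off-line) can be sketched as below. This is a toy model under assumptions: the Cluster class, the fetch function and the site names are hypothetical, and both peers are taken to hold a synchronised copy already.

```python
class Cluster:
    """A storage cluster holding objects keyed by DOMID (names are assumptions)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.online = True
        self.objects: dict[str, bytes] = {}


def fetch(domid: str, local: Cluster, remotes: list[Cluster]) -> tuple[str, bytes]:
    """Serve from the local cluster when it is online, else from a remote peer."""
    for cluster in [local, *remotes]:
        if cluster.online and domid in cluster.objects:
            return cluster.name, cluster.objects[domid]
    raise LookupError(f"{domid} is not available from any online cluster")


site_a, site_b = Cluster("site-a"), Cluster("site-b")
for site in (site_a, site_b):
    site.objects["domid-0001"] = b"payload"  # both peers hold a synchronised copy

normal = fetch("domid-0001", site_a, [site_b])    # served locally
site_a.online = False                             # local cluster goes off-line
failover = fetch("domid-0001", site_a, [site_b])  # served by the remote peer
print(normal[0], failover[0])
```

Because every cluster can serve requests, none of the equipment sits idle as a standby.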
DOM storage subsystem architecture - ingest
Normal ingest is to the local storage cluster and then the remote cluster is synchronised
[Diagram: two storage clusters, each comprising a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, together with DOM shared central services (Unique ID, Signing, Logging). Arrows: "Store" into the local cluster, "Synchronise remote store" to the remote cluster.]
DOM storage subsystem architecture - ingest
When a cluster is off-line then ingest is managed by the remote storage cluster and the local cluster is synchronised later
[Diagram: two storage clusters, each comprising a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, together with DOM shared central services (Unique ID, Signing, Logging). Arrows: "Store" into the remote cluster, "Synchronise remote store later" to the off-line cluster.]
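The two ingest cases can be sketched in the same toy style. Everything here is an assumption made for illustration (class and function names, the in-memory pending queue, two clusters only); the slides do not describe how synchronisation is actually implemented.

```python
class Cluster:
    """A storage cluster with a queue of objects awaiting peer synchronisation."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.objects = {}
        self.pending = []  # (domid, data) pairs the peer has not yet received

    def store(self, domid, data):
        self.objects[domid] = data


def ingest(domid, data, local, remote):
    """Store at the local cluster if it is online, otherwise at the remote one.

    The peer is synchronised immediately when online, or queued for later.
    """
    primary, peer = (local, remote) if local.online else (remote, local)
    if not primary.online:
        raise RuntimeError("no cluster is online")
    primary.store(domid, data)
    if peer.online:
        peer.store(domid, data)
    else:
        primary.pending.append((domid, data))
    return primary.name


def resynchronise(source, target):
    """Replay queued objects to a cluster that has come back online."""
    if not target.online:
        return 0
    count = len(source.pending)
    for domid, data in source.pending:
        target.store(domid, data)
    source.pending.clear()
    return count


site_a, site_b = Cluster("site-a"), Cluster("site-b")
first = ingest("domid-0001", b"one", site_a, site_b)   # normal ingest
site_a.online = False
second = ingest("domid-0002", b"two", site_a, site_b)  # local cluster off-line
site_a.online = True
replayed = resynchronise(site_b, site_a)               # catch up later
print(first, second, replayed)
```

A production design would also need durable queues and conflict handling, which this sketch leaves out.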
In conclusion
We plan for generations of physical storage
Migration from one generation to the next
Allow changes of supplier
Purchase incrementally in modest quantities
Move quickly when required
Be cost conscious
We provide assurance that an object is held and re-presented as it was when it was
ingested
We are designing a cost-effective large scale resilient solution
In summary: we take a long term view
Major Programmes/4
Web Archiving Programme
Structure of Programme
Web Archiving Programme is a collaborative
initiative, broadly implemented across two
consortia
UK Web Archiving Consortium
Developing a selective approach to web archiving, procuring a common
web archiving infrastructure and software to begin archiving activities at
the earliest opportunity
International Internet Preservation Consortium
Developing advanced web archiving technologies for the long-term,
large-scale, continuous crawling requirements enabled through
legislation
UK Web Archiving Consortium
Developing a selective approach to web archiving
Licence for PANDAS about to be signed with NLA
Sub-licences with consortium partners and contractor to follow
ITT concluded with Magus Research winning the contract
Implement a common web archiving infrastructure (lots of Linux
machines + PANDAS)
Provide customisation/development of PANDAS
Provide help desk and support
International Internet Preservation Consortium
Developing advanced web archiving technologies
Smart Crawler
Continuous adaptive crawler, adjusting crawl priority on the fly
Based on IA Heritrix
Working on requirements now
Expect to begin tender process in June
Content Management
Archival formats
Framework
Metrics and Test Bed
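One way to picture "adjusting crawl priority on the fly" is a queue in which each URL's revisit interval shrinks when its content has changed and grows when it has not. The sketch below is a generic illustration built on Python's heapq module; it is not Heritrix's API or the consortium's design, and the class name, interval bounds and example URLs are assumptions.

```python
import heapq


class AdaptiveFrontier:
    """Crawl queue whose per-URL revisit interval adapts to observed change."""

    def __init__(self, min_interval=1.0, max_interval=64.0):
        self._heap = []      # entries are (next_due_time, url)
        self._interval = {}  # current revisit interval per URL
        self.min_interval = min_interval
        self.max_interval = max_interval

    def add(self, url, now=0.0):
        self._interval[url] = self.min_interval
        heapq.heappush(self._heap, (now, url))

    def pop_due(self):
        """Return (due_time, url) for the URL that is due soonest."""
        return heapq.heappop(self._heap)

    def report(self, url, now, changed):
        # Pages that changed are revisited sooner; stable pages back off.
        interval = self._interval[url]
        if changed:
            interval = max(self.min_interval, interval / 2)
        else:
            interval = min(self.max_interval, interval * 2)
        self._interval[url] = interval
        heapq.heappush(self._heap, (now + interval, url))


frontier = AdaptiveFrontier()
frontier.add("https://example.org/about")
frontier.add("https://example.org/news")

_, first_url = frontier.pop_due()
frontier.report(first_url, now=0.0, changed=False)   # stable page backs off
_, second_url = frontier.pop_due()
frontier.report(second_url, now=0.0, changed=True)   # changing page stays hot
next_due, next_url = frontier.pop_due()
print(next_due, next_url)
```

In this run the frequently changing page comes round again before the stable one, which is the behaviour a continuous adaptive crawler is after.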
External Collaboration
Digital Library Collaborations/Partnerships
Current
UK Digital Preservation Coalition
Founder Member
TEL (The European Library Project)
Web Archiving UK
JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library Of Wales
International Internet Preservation Consortium
BNF, Library Of Congress, Internet Archive, National Archives & Library Of Canada, National Library Of
Australia, National Library Of Italy, National Libraries Of Nordic Countries
JISC Funded - Digital Curation Centre
Persistent Identifiers
DOI Foundation, European National Libraries (KB & DDB)
Resource Discovery
Union Catalogues (SUNCAT)
Digital Library Federation
Digital Library Collaborations/Partnerships
Potential
Secure Legal Deposit Network
6 Legal Deposit Libraries
Global Digital Format Registry
Potential Partners (National Archives, DLF)
Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
KB (Netherlands National Library & Other Partners – FP6 Bid)
Digital Rights Management
Potential Partners (Publishers, JISC)
Metadata
Publishers, Others ?
Authentication
JISC ?
Resource Discovery
Search Engine Vendors, Researchers
Others ???
Conclusions
• Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success
• Can You Work With Us?
Slide 46
British Computer Society
North London Branch
Major Programmes
Richard Boulderstone
July 27, 2004
1
Agenda
•
•
•
•
•
•
•
•
•
The British Library
Vision
Our Audiences/Customers
ILS
Digitisation
Digital Object Management
Web Archiving
Collaboration
Conclusions
Magna Carta
www.bl.uk
2
What Is The British Library ?
Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of
Science and Invention (1855), National Central Library (1916), and National
Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India
Office Library and Records in 1982, and British Institute of Recorded Sound in
1983
• Flagship building at St Pancras - largest public building project in Great
Britain in 20th century - opened in 1998
•
www.bl.uk
3
World-Class Research Library
Key Statistics 2002/3
•
•
•
•
•
•
•
•
•
•
150 million items
8.2 million items consulted or supplied
408,000 reading room visits
618,000 catalogue records created
554,000 items received on legal deposit
651 km shelf capacity 92% full add 12 km each year
18.5M Web Site Hits (www.bl.uk)
2,400 staff
£85.2 million Grant in Aid and £27.0 million trading income in 2001/2
Annual report - http://www.bl.uk/about/annual/latest.html
www.bl.uk
4
‘The World’s Knowledge’
Outcome Based Vision
… by aiding scientific advances
… by adding commercial value for
businesses
… by contributing to UK “knowledge
economy”
… through the pursuit of academic
excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural
wellbeing
… by giving information relevant to their
interests
… by helping to find the next medical
breakthrough
… by creating a link between the past,
present and future
To help people
advance
knowledge to
enrich lives
www.bl.uk
5
High
R+D
Industries
Prof.
Services
Creative
Industries Publishing
Industries
Resource Discovery
Bespoke Services
Research Services
Document Supply
Reprographics
Innovation Centre
School
Students
Libraries Teachers 11>18
On-site Visits
School Tours
Web Learning
SMEs
EDUCATION
Lifelong
Learner
Exhibitions
Events
Tours
Publishing
Visitors
(child + adult)
Lifelong
Learner
PUBLIC
BUSINESS
Lifelong
Learner
Postgraduate/
Undergraduate
RESEARCHER
Librarians
Scholars
Lifelong
Learner
Reading Rooms
Bespoke Services
Reprographics
Publishing
Document Supply
Searching Tools
LIBRARIES
Commercial
Researcher
Public
Libraries
Broadcasting
e.g. BBC
Publishing
e.g. OED
Document Supply
Resource Discovery
Training
Best Practice
H.E.
Libraries
Public
www.bl.uk
6
2
Major Programmes/1
Da Vinci Notebook
Integrated Library System (ILS)
Programme
7
ILS: Development
Data migration
Due to finish in a few days
16M+ BL records
10M+ records from other sources
Online ILS software
All online changes made (mainly interfaces) – final tests
Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
Most ones done for go live
Rest in priority order
www.bl.uk
8
ILS: Implementation
Training
Courses to end-users well underway
‘Practice’ system available
‘Search only’ training also underway
Testing
Functional testing (end to end) nearly complete
Performance poor – OPAC very slow
Automated stress testing (LoadRunner scripts)
eIS trying to find area of problem
Ex Libris experts flying over
Some security ‘hardening’ needed
www.bl.uk
9
ILS: Cutover from legacy systems
Now:
Temporary Aleph cataloguing
Phase 1 – internal processing
Staggered take-on of users to ease cutover problems
Merge ‘temporary’ records
7 June:
Phase 2 – reading rooms
Reading rooms closed for cutover 26-29 June
Mainly brand-new PCs etc rather than XP upgrade
30 June:
Phase 3 – remote users
Could be delayed major problems
30 July:
www.bl.uk
10
www.bl.uk
11
Future ILS development (ILS/2)
Current ILS development seen just as the start
Extra records
E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
E.g. Preservation records
Links to other new BL systems
E.g. Digital Object Management (images, web pages etc)
New releases of Ex Libris packages
www.bl.uk
12
Major Programmes/2
International Dunhuang Project
Digitisation Programme
13
Background
Digitisation Is The Process Of Converting Existing Physical Items Into Digital
Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character
Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image to Sub £1/Image.
Automatic Devices To Digitise Are Becoming Availalble But Are Expensive ~£150,000.
BL Has Had Fairly Ad Hoc Approach Driven By
External Funding Opportunities
Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project
Management
BL Has Created About 1.5M Digital Images So Far…
www.bl.uk
14
Digitisation Strategy
Digitisation Strategy Project Was Formerly Initiated On February 2, 2004
Key objectives for the project are to define:
Selection Criteria
Uniform Approach
Communications Plan
Sustainability
Intellectual Property Rights
External Relationship Management
Funding
Integration with DOMS
www.bl.uk
15
Project Status Information
Definitive Register of Projects
19 Complete
19 Current
20 Planning
JISC Sound (3,900 Hours)
JISC Newspapers (2M Pages of 750M Pages)
Chopin (Collaborative Project)
Early English Books Online
www.bl.uk
16
Major Programmes/3
Gutenberg Bible
Digital Object Management (DOM)
Programme
17
DOM Programme vision
Our mission is to enable the United Kingdom to preserve
and use its digital intellectual property forever
Our vision is create a management system for digital objects
that will
store and preserve any type of digital material in perpetuity
provide access to this material to users with appropriate permissions
ensure that the material is easy to find
ensure that users can view the material with contemporary applications
ensure that users can, where possible, experience material with the original look-and-feel
www.bl.uk
18
Introduction - history
Digital Library PFI
Mar 1997 – Dec 1998
Digital Library System
1999 – early 2002
Lessons
DOM Report
Nov 2002
The DOM Programme
Started September 2003
www.bl.uk
19
Drivers for the BL DOM Programme
Legal deposit legislation for non-print material was granted royal assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12T of material per year (with 50 year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….
We need a generic and cost-effective approach for the secure long term
storage of digital material that is produced by numerous initiatives
www.bl.uk
20
DOM – many topics to address
HIGH
ILS
WEB
ARCHIVING
DIGITISATION
PROGRAMME
ESTIMATED SIZE OF COMPONENT
LDEP
WORKFLOW
RESOURCE
DISCOVERY
LDLSE
VDEP
RIGHTS
MANAGEMENT
METADATA
DEFINITION
TECHNICAL
REQUIREMENTS
FILE
FORMAT
REGISTRY
PERSISTENT
IDENTIFIERS
FILE
CONVERSION
UTILITIES
AUTHENTICATION
PROTOTYPES
SDM
RADM
INTERFACES
STRATEGY
DEVELOPMENT
LOW
LOW
COMPONENT AMBIGUITY / COMPLEXITY
Started
Planned
Non-DOM projects
Planned co-operation
HIGH
LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications
21
Scope - life cycle of objects
Collection: Selection, Acquisition, Accession, Description
Preservation: Storage, Preservation
Access: Resource discovery, Delivery, Rendering
22
Scope – objects and processes
Preservation store
Preserves the bit stream in perpetuity
Access store
Access versions
Limited formats – in the flavour of the era
Metadata to support resource discovery
Descriptive, Administrative, Links with existing tools e.g. Integrated
Library System (ILS)
Workflow
Ingest, e.g. Legal Deposit processing
23
[Diagram: DOM in context. Access: Resource Discovery, Delivery, Digital Rights Management. DOM shared services: Storage, Signing, Authentication, Metadata, Persistent ID, Ingest. Content sources and feeds: Donations; Document Supply; Publishers; Archives; Grey Literature; Non-Serial Store; Archiving; Operational Stores; Legal Deposit (LDL Secure Environment, Legal Deposit Items, Legal Deposit Processing); Web Archiving; Digitisation (St Pancras Studios, Newspapers, NSA).]
24
Timeline
[Timeline diagram, 2003 to 2005]
BC: ET approve Business Case & Timeline
R0 (Definition): Prototype will provide a basic preservation-quality digital object storage module
R1 (Operational Storage Sub System):
• Consolidate R0 into operational system
• Provide preservation-quality digital store for materials received under Voluntary Deposit of Electronic Publications (VDEP)
• Integrate it with the existing VDEP front-end
R2 (1st Content Stream ingest):
• Support ingest for a major content stream
• Integrate with core Library systems as required
R3 & R4 (LDEP - initial format): Provide functionality for material covered by LDEP secondary legislation
R5+: Open DOM to new projects
25
DOM: Project definition - 1
Example issues
Functional Architecture “What”
digital rights, file formats, etc
Prototyping – assessing market solutions
allow changes to new suppliers, relationships to ILS, other projects etc
Logical architecture “how – overall architecture”
Prototyping - basic functioning architecture
how do we build it cost-effectively today, supplier selection criteria
Physical architecture “how – storage & specifics”
Prototyping - principal solutions and options
• Business case
• Planning – incremental implementation phases
Cross team workshops – reviewing progress, debating detailed technical
issues, planning immediate priorities, risk management & way forward
26
DOM: Project definition - 2
Approach is to be incremental and not ‘Big Bang’
We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a
good solution
A principal goal is to define:
An overall long term “logical architecture”
Within which, there will be successive generations of physical architectures
We are understanding the storage marketplace, and we will use the knowledge to manage
procurement
We are certain that we will need >500 TB of storage but we are uncertain when – we thus need
flexible, scalable procurement
27
DOM architecture - overview
[Diagram: DOM architecture overview. Client services (Resource Discovery: "Non-cat", "ILS based RD", "Others"; Rights Management; LDEP; Doc supply) refer to an object by its DOMID. The DOM Storage Service handles compound objects/relations, integrity and authenticity. Each atomic object has a unique persistent identifier (DOMID) and a local resource locator (LRL); the DOMID is mapped to node/vol/LRL in DOM Physical Storage.]
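The mapping from DOMID to node/vol/LRL can be pictured with a minimal sketch. The class names, field names and identifier strings below are hypothetical illustrations, not the Library's actual design; the only idea taken from the slide is that callers hold a persistent DOMID while the physical location can change underneath it.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Location:
    """Where one copy of an atomic object lives (hypothetical fields)."""
    node: str   # storage node name
    vol: str    # volume on that node
    lrl: str    # local resource locator, e.g. a path within the volume


class LocationIndex:
    """Maps a persistent DOMID to its current physical location(s)."""

    def __init__(self) -> None:
        self._locations: Dict[str, List[Location]] = {}

    def register(self, domid: str, location: Location) -> None:
        self._locations.setdefault(domid, []).append(location)

    def resolve(self, domid: str) -> List[Location]:
        # Callers only ever hold the DOMID; physical details stay internal.
        return list(self._locations.get(domid, []))

    def migrate(self, domid: str, old: Location, new: Location) -> None:
        # Moving to a new storage generation changes the mapping, not the DOMID.
        copies = self._locations[domid]
        copies[copies.index(old)] = new


index = LocationIndex()
first = Location(node="node-a", vol="vol-01", lrl="objects/0001")
index.register("domid:example-0001", first)
index.migrate("domid:example-0001", first,
              Location(node="node-b", vol="vol-07", lrl="objects/0001"))
```

After the migration, resolving the same DOMID returns the new location, which is the property that lets physical storage be replaced without disturbing references held by other systems.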
28
DOM System (release 3)
[Diagram: the DOM System at release 3. Labels: Aleph, Mailroom, Publishers, Access, Storage subsystem (two instances), Shared services, Administration.]
29
DOM logical architecture – integrity and authenticity
Integrity:
System has capability to continuously monitor the object store to detect
object corruption
It would then initiate object recovery
Authenticity:
A process is defined to provide long-term assurance that an object that
is re-presented is as it was when it was ingested
Based on the use of cryptographic signing techniques
Each object is signed when it is ingested
The signature is verified when required
The signing mechanism is “tightly” controlled
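The sign-at-ingest, verify-when-required idea can be sketched as follows. This is an illustration under stated assumptions: it uses an HMAC from the Python standard library as a stand-in for whatever cryptographic signing technique the programme adopts, and the key, object IDs and function names are hypothetical.

```python
import hashlib
import hmac


def sign_object(content: bytes, signing_key: bytes) -> str:
    """Return a hex signature computed over the object's bytes at ingest."""
    return hmac.new(signing_key, content, hashlib.sha256).hexdigest()


def verify_object(content: bytes, signature: str, signing_key: bytes) -> bool:
    """Check that re-presented bytes still match the ingest-time signature."""
    expected = sign_object(content, signing_key)
    return hmac.compare_digest(expected, signature)


def find_corrupt(store: dict, signatures: dict, signing_key: bytes) -> list:
    """One monitoring pass: list IDs whose stored bytes no longer verify."""
    return [
        object_id
        for object_id, content in store.items()
        if not verify_object(content, signatures[object_id], signing_key)
    ]


key = b"demo-key-held-by-the-signing-service"  # hypothetical key
store = {"obj-1": b"first object", "obj-2": b"second object"}
signatures = {oid: sign_object(data, key) for oid, data in store.items()}

store["obj-2"] = b"second objecT"  # simulate corruption on disk
corrupt = find_corrupt(store, signatures, key)
```

A monitoring pass of this kind only detects corruption; recovery would then copy a good version back from another cluster, and the signing key would need the "tightly" controlled handling the slide mentions.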
30
Procuring physical storage in volume
A major cost is in physical storage
The market for storage systems is changing rapidly, and this implies that “lock-in” is
not sensible
We thus need flexibility to change supplier over time
Cost of storage is reducing by 30-40% per year
Hence procure on rolling basis just ahead of demand
Replace storage on a rolling basis on expiry of warranty
The rolling programmes imply the need to be able to support a heterogeneous
product solution
The design of the logical architecture thus supports storage sourced from multiple
storage vendors
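The case for procuring just ahead of demand can be shown with simple arithmetic. The sketch below compares buying peak capacity up front with buying each year's increment at that year's price; the demand curve, the starting price and the 35% decline are made-up figures chosen from within the 30-40% range quoted above, and the model ignores warranty replacement and financing costs.

```python
def rolling_cost(demand_tb, price_per_tb, annual_decline):
    """Cost of buying each year's extra capacity at that year's price."""
    total = 0.0
    installed = 0.0
    for year, needed in enumerate(demand_tb):
        extra = max(0.0, needed - installed)
        total += extra * price_per_tb * (1 - annual_decline) ** year
        installed += extra
    return total


def upfront_cost(demand_tb, price_per_tb):
    """Cost of buying the peak capacity in year 0 at today's price."""
    return max(demand_tb) * price_per_tb


demand = [100, 200, 300, 400, 500]   # TB needed by end of each year (made up)
price = 1000.0                       # currency units per TB today (made up)

upfront = upfront_cost(demand, price)
rolling = rolling_cost(demand, price, annual_decline=0.35)
```

With these illustrative inputs the rolling purchase costs roughly half of the up-front purchase, which is the effect the slide relies on.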
31
Disaster tolerance and the organisation of storage clusters
One can obtain commercial disaster recovery (DR) solutions for common equipment
configurations
However one cannot obtain such solutions for systems at the multi-100 TB scale
So we must build DR into the design of the system
A single site solution, subject to a common-mode disaster, would suffer considerable loss
of availability after a disaster, and so is not acceptable
This implies that we need a multi-site solution
Conventionally these are based on a master-standby arrangement where only 50% of kit is delivering
normal service
Our design is based on the use of multiple autonomous independent peer clusters that
cross-synchronise
so 100% of the kit delivers normal service
32
DOM architecture in the context of the
storage solution market
The dominant segment of the market focuses on delivering performance within
a highly resilient single cluster
However:
Many of our objects will be rarely accessed
so we do not want to pay for “maximised” performance we do not need
We have resilience by using multiple clusters, hence we have a reduced
need for resilience within a cluster
so we do not want to pay for “maximised” resilience we do not need
We are using these drivers to design a cost-effective large scale resilient
solution
33
DOM storage subsystem architecture - overview
[Diagram: two storage clusters. Each has a DOM Storage Gateway and a DOM Storage Service in front of DOM Physical Storage. Both use DOM Shared Services: Unique ID, Signing, Logging.]
34
DOM storage subsystem architecture - access
Normal access/delivery is from local storage cluster
[Diagram: the two storage clusters (DOM Storage Gateway, DOM Storage Service, DOM Physical Storage) and the DOM Shared central Services (Unique ID, Signing, Logging), showing the access path to the local cluster.]
35
DOM storage subsystem architecture - access
When a cluster is off-line then access/delivery is from a remote storage cluster
[Diagram: the two storage clusters (DOM Storage Gateway, DOM Storage Service, DOM Physical Storage) and the DOM Shared central Services (Unique ID, Signing, Logging), showing the access path to the remote cluster.]
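The two access cases above (local cluster normally, remote cluster when the local one is off-line) can be sketched as a small routing function. This is a toy model under assumptions: the class, the site names and the identifier are hypothetical, and a real gateway would also deal with authentication, rights and logging.

```python
class Cluster:
    """A storage cluster holding its own copy of each object (toy model)."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.objects = {}


def fetch(domid, local, remotes):
    """Serve from the local cluster; fall back to a remote peer if needed."""
    for cluster in [local, *remotes]:
        if cluster.online and domid in cluster.objects:
            return cluster.name, cluster.objects[domid]
    raise LookupError(f"{domid} is not available from any online cluster")


site_a, site_b = Cluster("site-a"), Cluster("site-b")
for cluster in (site_a, site_b):
    cluster.objects["domid:example-0001"] = b"payload"

normal = fetch("domid:example-0001", local=site_a, remotes=[site_b])
site_a.online = False
degraded = fetch("domid:example-0001", local=site_a, remotes=[site_b])
```

Because every peer holds a full copy, the fallback needs no promotion step: the remote cluster is already delivering normal service.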
36
DOM storage subsystem architecture - ingest
Normal ingest is to the local storage cluster and then the remote cluster is synchronised
[Diagram: the two storage clusters (DOM Storage Gateway, DOM Storage Service, DOM Physical Storage) and the DOM Shared central Services (Unique ID, Signing, Logging). Arrows labelled "Store" and "Synchronise remote store".]
37
DOM storage subsystem architecture - ingest
When a cluster is off-line then ingest is managed by the remote storage cluster and the local cluster is synchronised later
[Diagram: the two storage clusters (DOM Storage Gateway, DOM Storage Service, DOM Physical Storage) and the DOM Shared central Services (Unique ID, Signing, Logging). Arrows labelled "Store" and "Synchronise remote store later".]
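The ingest behaviour on the last two slides (store on one cluster, synchronise the peer immediately or later) can be sketched as below. It is a simplified two-cluster model, not the programme's implementation: the class, the pending queue and the function names are hypothetical, and unique ID allocation, signing and logging are left out.

```python
class Cluster:
    def __init__(self, name):
        self.name = name
        self.online = True
        self.objects = {}
        self.pending = []   # (peer, domid) pairs waiting to be synchronised

    def store(self, domid, content):
        self.objects[domid] = content


def ingest(domid, content, local, remote):
    """Store on whichever cluster is online, then synchronise the other."""
    primary, secondary = (local, remote) if local.online else (remote, local)
    if not primary.online:
        raise RuntimeError("no storage cluster is online")
    primary.store(domid, content)
    if secondary.online:
        secondary.store(domid, content)       # normal case: sync straight away
    else:
        primary.pending.append((secondary, domid))  # off-line peer: sync later
    return primary.name


def catch_up(cluster):
    """Replay deferred synchronisations to peers that are back online."""
    still_waiting = []
    for peer, domid in cluster.pending:
        if peer.online:
            peer.store(domid, cluster.objects[domid])
        else:
            still_waiting.append((peer, domid))
    cluster.pending = still_waiting


site_a, site_b = Cluster("site-a"), Cluster("site-b")
ingest("domid:1", b"one", local=site_a, remote=site_b)   # both online

site_a.online = False
handled_by = ingest("domid:2", b"two", local=site_a, remote=site_b)

site_a.online = True
catch_up(site_b)
```

Once the off-line cluster returns and the deferred synchronisation is replayed, both clusters hold the same objects again.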
38
In conclusion
We plan for generations of physical storage
Migration from one generation to the next
Allow changes of supplier
Purchase incrementally in modest quantities
Move quickly when required
Be cost conscious
We provide assurance that an object is held and re-presented as it was when it was
ingested
We are designing a cost-effective large scale resilient solution
In summary: we take a long term view
39
Major Programmes/4
Web Archiving Programme
40
Structure of Programme
Web Archiving Programme is a collaborative
initiative, broadly implemented across two
consortia
UK Web Archiving Consortium
Developing a selective approach to web archiving, procuring a common
web archiving infrastructure and software to begin archiving activities at
the earliest opportunity
International Internet Preservation Consortium
Developing advanced web archiving technologies for the long-term,
large-scale, continuous crawling requirements enabled through
legislation
41
UK Web Archiving Consortium
Developing a selective approach to web archiving
Licence for PANDAS about to be signed with NLA
Sub-licences with consortium partners and contractor to follow
ITT concluded with Magus Research winning the contract.
Implement a common web archiving infrastructure (lots of Linux
machines + PANDAS)
Provide customisation/development of PANDAS
Provide help desk and support
42
International Internet Preservation
Consortium
Developing advanced web archiving technologies
Smart Crawler
Continuous adaptive crawler, adjusting crawl priority on the fly
Based on IA Heritrix
Working on requirements now
Expect to begin tender process in June
Content Management
Archival formats
Framework
Metrics and Test Bed
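The "adjusting crawl priority on the fly" idea behind the Smart Crawler can be illustrated with a small priority queue. This is a generic sketch, not Heritrix code and not the consortium's design: the class name, the URLs and the priority values are hypothetical, and re-prioritisation is done by marking the old queue entry as stale.

```python
import heapq
import itertools


class CrawlFrontier:
    """Priority queue of URLs whose priorities can change while crawling."""

    def __init__(self):
        self._heap = []
        self._current = {}                 # url -> live heap entry
        self._counter = itertools.count()  # tie-breaker keeps ordering stable

    def set_priority(self, url, priority):
        """Add a URL, or re-prioritise one already queued (lower = sooner)."""
        if url in self._current:
            self._current[url][2] = None   # mark the old entry as stale
        entry = [priority, next(self._counter), url]
        self._current[url] = entry
        heapq.heappush(self._heap, entry)

    def next_url(self):
        """Pop the most urgent URL, skipping stale entries."""
        while self._heap:
            _, _, url = heapq.heappop(self._heap)
            if url is not None:
                del self._current[url]
                return url
        return None


frontier = CrawlFrontier()
frontier.set_priority("http://example.org/news", 5)
frontier.set_priority("http://example.org/about", 3)
# The news page is observed to change often, so it is promoted on the fly.
frontier.set_priority("http://example.org/news", 1)

order = [frontier.next_url(), frontier.next_url(), frontier.next_url()]
```

The promoted page is fetched first, the other page second, and the third call returns None because only a stale entry remains.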
43
External Collaboration
44
Digital Library Collaborations/Partnerships
Current
UK Digital Preservation Coalition
Founder Member
TEL (The European Library Project)
Web Archiving UK
JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library Of Wales
International Internet Preservation Consortium
BNF, Library Of Congress, Internet Archive, National Archives & Library Of Canada, National Library Of
Australia, National Library Of Italy, National Libraries Of Nordic Countries
JISC Funded - Digital Curation Centre
Persistent Identifiers
DOI Foundation, European National Libraries (KB & DDB)
Resource Discovery
Union Catalogues (SUNCAT)
Digital Library Federation
45
Digital Library Collaborations/Partnerships
Potential
Secure Legal Deposit Network
6 Legal Deposit Libraries
Global Digital Format Registry
Potential Partners (National Archives, DLF)
Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
KB (Netherlands National Library & Other Partners – FP6 Bid)
Digital Rights Management
Potential Partners (Publishers, JISC)
Metadata
Publishers, Others ?
Authentication
JISC ?
Resource Discovery
Search Engine Vendors, Researchers
Others ???
46
Conclusions
• Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success
• Can You Work With Us?
47
British Computer Society
North London Branch
Major Programmes
Richard Boulderstone
July 27, 2004
1
Agenda
•
•
•
•
•
•
•
•
•
The British Library
Vision
Our Audiences/Customers
ILS
Digitisation
Digital Object Management
Web Archiving
Collaboration
Conclusions
Magna Carta
www.bl.uk
2
What Is The British Library ?
Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of
Science and Invention (1855), National Central Library (1916), and National
Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India
Office Library and Records in 1982, and British Institute of Recorded Sound in
1983
• Flagship building at St Pancras - largest public building project in Great
Britain in 20th century - opened in 1998
•
www.bl.uk
3
World-Class Research Library
Key Statistics 2002/3
•
•
•
•
•
•
•
•
•
•
150 million items
8.2 million items consulted or supplied
408,000 reading room visits
618,000 catalogue records created
554,000 items received on legal deposit
651 km shelf capacity 92% full add 12 km each year
18.5M Web Site Hits (www.bl.uk)
2,400 staff
£85.2 million Grant in Aid and £27.0 million trading income in 2001/2
Annual report - http://www.bl.uk/about/annual/latest.html
www.bl.uk
4
‘The World’s Knowledge’
Outcome Based Vision
… by aiding scientific advances
… by adding commercial value for
businesses
… by contributing to UK “knowledge
economy”
… through the pursuit of academic
excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural
wellbeing
… by giving information relevant to their
interests
… by helping to find the next medical
breakthrough
… by creating a link between the past,
present and future
To help people
advance
knowledge to
enrich lives
www.bl.uk
5
High
R+D
Industries
Prof.
Services
Creative
Industries Publishing
Industries
Resource Discovery
Bespoke Services
Research Services
Document Supply
Reprographics
Innovation Centre
School
Students
Libraries Teachers 11>18
On-site Visits
School Tours
Web Learning
SMEs
EDUCATION
Lifelong
Learner
Exhibitions
Events
Tours
Publishing
Visitors
(child + adult)
Lifelong
Learner
PUBLIC
BUSINESS
Lifelong
Learner
Postgraduate/
Undergraduate
RESEARCHER
Librarians
Scholars
Lifelong
Learner
Reading Rooms
Bespoke Services
Reprographics
Publishing
Document Supply
Searching Tools
LIBRARIES
Commercial
Researcher
Public
Libraries
Broadcasting
e.g. BBC
Publishing
e.g. OED
Document Supply
Resource Discovery
Training
Best Practice
H.E.
Libraries
Public
www.bl.uk
6
2
Major Programmes/1
Da Vinci Notebook
Integrated Library System (ILS) Programme
7
ILS: Development
Data migration
Due to finish in a few days
16M+ BL records
10M+ records from other sources
Online ILS software
All online changes made (mainly interfaces) – final tests
Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
Most of those needed for go-live are done
The rest will follow in priority order
8
ILS: Implementation
Training
Courses to end-users well underway
‘Practice’ system available
‘Search only’ training also underway
Testing
Functional testing (end to end) nearly complete
Performance poor – OPAC very slow
Automated stress testing (LoadRunner scripts)
eIS trying to find area of problem
Ex Libris experts flying over
Some security ‘hardening’ needed
9
ILS: Cutover from legacy systems
Now:
Temporary Aleph cataloguing
7 June:
Phase 1 – internal processing
Staggered take-on of users to ease cutover problems
Merge ‘temporary’ records
30 June:
Phase 2 – reading rooms
Reading rooms closed for cutover 26-29 June
Mainly brand-new PCs etc rather than XP upgrade
30 July:
Phase 3 – remote users
Could be delayed if there are major problems
10
Future ILS development (ILS/2)
Current ILS development seen just as the start
Extra records
E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
E.g. Preservation records
Links to other new BL systems
E.g. Digital Object Management (images, web pages etc)
New releases of Ex Libris packages
12
Major Programmes/2
International Dunhuang Project
Digitisation Programme
13
Background
Digitisation Is The Process Of Converting Existing Physical Items Into Digital Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image To Sub-£1/Image.
Automatic Devices To Digitise Are Becoming Available But Are Expensive (~£150,000).
BL Has Had A Fairly Ad Hoc Approach Driven By
External Funding Opportunities
Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project Management
BL Has Created About 1.5M Digital Images So Far…
14
Digitisation Strategy
Digitisation Strategy Project Was Formally Initiated On February 2, 2004
Key objectives for the project are to define:
Selection Criteria
Uniform Approach
Communications Plan
Sustainability
Intellectual Property Rights
External Relationship Management
Funding
Integration with DOMS
15
Project Status Information
Definitive Register of Projects
19 Complete
19 Current
20 Planning
JISC Sound (3,900 Hours)
JISC Newspapers (2M Pages of 750M Pages)
Chopin (Collaborative Project)
Early English Books Online
16
Major Programmes/3
Gutenberg Bible
Digital Object Management (DOM) Programme
17
DOM Programme vision
Our mission is to enable the United Kingdom to preserve and use its digital intellectual property forever
Our vision is to create a management system for digital objects that will
store and preserve any type of digital material in perpetuity
provide access to this material to users with appropriate permissions
ensure that the material is easy to find
ensure that users can view the material with contemporary applications
ensure that users can, where possible, experience material with the original look-and-feel
18
Introduction - history
Digital Library PFI
Mar 1997 – Dec 1998
Digital Library System
1999 – early 2002
Lessons
DOM Report
Nov 2002
The DOM Programme
Started September 2003
19
Drivers for the BL DOM Programme
Legal deposit legislation for non-print material was granted royal assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12T of material per year (with 50 year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….
We need a generic and cost-effective approach for the secure long term
storage of digital material that is produced by numerous initiatives
20
DOM – many topics to address
[Chart: components plotted by ESTIMATED SIZE OF COMPONENT (low to high) against COMPONENT AMBIGUITY / COMPLEXITY (low to high); legend: Started, Planned, Non-DOM projects, Planned co-operation]
Components shown: ILS, Web Archiving, Digitisation Programme, LDEP, Workflow, Resource Discovery, LDLSE, VDEP, Rights Management, Metadata Definition, Technical Requirements, File Format Registry, Persistent Identifiers, File Conversion Utilities, Authentication, Prototypes, SDM, RADM, Interfaces, Strategy Development
LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications
21
Scope - life cycle of objects
Collection
Selection
Acquisition
Accession
Description
Preservation
Storage
Preservation
Access
Resource discovery
Delivery
Rendering
22
Scope – objects and processes
Preservation store
Preserves the bit stream in perpetuity
Access store
Access versions
Limited formats – in the flavour of the era
Metadata to support resource discovery
Descriptive, Administrative, Links with existing tools e.g. Integrated Library System (ILS)
Workflow
Ingest, e.g. Legal Deposit processing
23
[Diagram: sources of digital material, DOM ingest and shared services, and access]
ACCESS: Resource Discovery, Delivery, Digital Rights Management
DOM shared services: Storage, Signing, Authentication, Metadata, Persistent ID
Ingest sources: DONATIONS; DOCUMENT SUPPLY; LEGAL DEPOSIT (LDL Secure Environment, Legal Deposit Items, Legal Deposit Processing); WEB ARCHIVING; DIGITISATION (St Pancras Studios, Newspapers, NSA)
Other labels: Publishers, Archives, Grey Literature, Non-Serial Store, Archiving, Operational Stores
24
Timeline
[Timeline chart: 2003 to 2005]
BC – ET approve Business Case & Timeline
R0 – Definition
Prototype will provide a basic preservation-quality digital object storage module
R1 – Operational Storage Sub System
•Consolidate R0 into operational system
•Provide preservation-quality digital store for materials received under Voluntary Deposit of Electronic Publications (VDEP)
•Integrate it with the existing VDEP front-end
R2 – 1st Content Stream ingest
•Support ingest for a major content stream
•Integrate with core Library systems as required
R3 & R4 – LDEP - initial format
Provide functionality for material covered by LDEP secondary legislation
R5+ – Open DOM to new projects
25
DOM: Project definition - 1
Functional Architecture “What”
Prototyping – assessing market solutions
Example issues: digital rights, file formats, etc
Logical architecture “how – overall architecture”
Prototyping - basic functioning architecture
Example issues: allow changes to new suppliers, relationships to ILS, other projects etc
Physical architecture “how – storage & specifics”
Prototyping - principal solutions and options
Example issues: how do we build it cost-effectively today, supplier selection criteria
• Business case
• Planning – incremental implementation phases
Cross team workshops – reviewing progress, debating detailed technical issues, planning immediate priorities, risk management & way forward
26
DOM: Project definition - 2
Approach is to be incremental and not ‘Big Bang’
We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a good solution
A principal goal is to define:
An overall long term “logical architecture”
Within which, there will be successive generations of physical architectures
We are building an understanding of the storage marketplace, and we will use that knowledge to manage procurement
We are certain that we will need >500 TB of storage but we are uncertain when – we thus need flexible, scalable procurement
27
DOM architecture - overview
[Diagram: DOM architecture in layers]
Clients: Resource Discovery (ILS based RD, Non-cat, Others), Rights Management, LDEP, Doc supply
Clients refer to an OBJECT by its unique persistent identifier (DOMID)
DOM Storage Service: compound objects/relations, atomic objects, integrity, authenticity
DOMID is mapped to node/vol/LRL (LRL: local resource locator)
DOM Physical Storage
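The separation between a persistent DOMID and its node/vol/LRL location can be illustrated with a minimal sketch. Everything named here (`Location`, `DomidRegistry`, the use of a random UUID as the identifier, the node and volume names) is a hypothetical illustration, not the DOM design itself; the only idea taken from the slide is that the identifier stays fixed while the mapping to physical storage can change.

```python
from dataclasses import dataclass
import uuid


@dataclass(frozen=True)
class Location:
    """Where one copy of an object lives: node / volume / local resource locator."""
    node: str
    vol: str
    lrl: str


class DomidRegistry:
    """Hypothetical registry mapping a persistent DOMID to physical locations."""

    def __init__(self):
        self._locations = {}  # DOMID -> list of Location

    def register(self, location):
        # The identifier is minted independently of the location, so it never
        # has to change when the object is moved (assumption: random UUID).
        domid = uuid.uuid4().hex
        self._locations[domid] = [location]
        return domid

    def relocate(self, domid, old, new):
        # A storage migration updates the mapping only; the DOMID is untouched.
        copies = self._locations[domid]
        copies[copies.index(old)] = new

    def resolve(self, domid):
        return list(self._locations[domid])


registry = DomidRegistry()
first = Location("node-a", "vol-01", "obj/0001")
domid = registry.register(first)
moved = Location("node-b", "vol-07", "obj/0001")
registry.relocate(domid, first, moved)   # e.g. a new generation of storage
print(domid, registry.resolve(domid))
```

Callers such as resource discovery or document supply would hold only the DOMID, which is why a change of storage supplier need not disturb them.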
28
DOM System (release 3)
[Diagram: DOM System (release 3). Labels: Publishers, Mailroom, Access, Administration, Shared services, two Storage subsystems, and a link to Aleph]
29
DOM logical architecture – integrity and authenticity
Integrity:
The system can continuously monitor the object store to detect object corruption
It then initiates object recovery
Authenticity:
A process is defined to provide long-term assurance that an object that is re-presented is as it was when it was ingested
Based on the use of cryptographic signing techniques
Each object is signed when it is ingested
The signature is verified when required
The signing mechanism is “tightly” controlled
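The sign-at-ingest, verify-on-demand and scan-for-corruption steps above can be sketched as follows. This is an illustration under stated assumptions, not the DOM implementation: a keyed HMAC-SHA256 digest from the Python standard library stands in for whatever cryptographic signing technique was actually chosen, and the key, class and method names are all hypothetical.

```python
import hashlib
import hmac

# Assumption: in a real system the key would be held only by a tightly
# controlled signing service, not alongside the objects.
SIGNING_KEY = b"demo-key-held-by-the-signing-service"


def sign(data: bytes) -> str:
    """Return a keyed digest of the object, computed once at ingest."""
    return hmac.new(SIGNING_KEY, data, hashlib.sha256).hexdigest()


def verify(data: bytes, signature: str) -> bool:
    """Check that the bytes still match the signature recorded at ingest."""
    return hmac.compare_digest(sign(data), signature)


class ObjectStore:
    def __init__(self):
        self._objects = {}     # object id -> bytes
        self._signatures = {}  # object id -> signature recorded at ingest

    def ingest(self, object_id, data):
        self._objects[object_id] = data
        self._signatures[object_id] = sign(data)

    def scan(self):
        """Integrity check: ids whose bytes no longer match their signature."""
        return [oid for oid, data in self._objects.items()
                if not verify(data, self._signatures[oid])]

    def recover(self, object_id, good_copy):
        """Replace a corrupted object, but only if the replacement verifies."""
        if not verify(good_copy, self._signatures[object_id]):
            raise ValueError("replacement copy does not verify")
        self._objects[object_id] = good_copy


store = ObjectStore()
store.ingest("obj-1", b"Magna Carta scan")
store._objects["obj-1"] = b"Magna Carta scam"   # simulate silent corruption
damaged = store.scan()
store.recover("obj-1", b"Magna Carta scan")     # e.g. copy from another cluster
print(damaged, store.scan())
```

Because recovery re-verifies the replacement against the ingest-time signature, a bad copy from another cluster cannot overwrite the record of what was originally deposited.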
30
Procuring physical storage in volume
A major cost is in physical storage
The market for storage systems is changing rapidly, and this implies that “lock-in” is
not sensible
We thus need flexibility to change supplier over time
Cost of storage is reducing by 30-40% per year
Hence procure on rolling basis just ahead of demand
Replace storage on a rolling basis on expiry of warranty
The rolling programmes imply the need to be able to support a heterogeneous
product solution
The design of the logical architecture thus supports storage sourced from multiple
storage vendors
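The case for buying just ahead of demand can be made concrete with a small calculation. The figures are assumptions chosen for illustration: a hypothetical starting price of 1,000 per TB, a 35% annual price decline (the middle of the 30-40% range quoted above), and a requirement of 500 TB that arrives evenly over five years.

```python
def upfront_cost(total_tb, price_per_tb):
    """Cost of buying all the capacity at today's price."""
    return total_tb * price_per_tb


def rolling_cost(tb_per_year, years, price_per_tb, annual_decline):
    """Cost of buying each year's capacity at that year's (lower) price."""
    cost = 0.0
    for year in range(years):
        cost += tb_per_year * price_per_tb * (1 - annual_decline) ** year
    return cost


PRICE = 1000.0  # hypothetical price per TB in year 0 (currency unit is arbitrary)
upfront = upfront_cost(500, PRICE)
rolling = rolling_cost(100, 5, PRICE, 0.35)
print(f"up-front: {upfront:,.0f}  rolling: {rolling:,.0f}")
```

Under these assumed figures the rolling purchase costs about half as much as buying everything up front, before counting the benefit of not powering and maintaining unused capacity.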
31
Disaster tolerance and the organisation of storage clusters
One can obtain commercial disaster recovery (DR) solutions for common equipment configurations
However, one cannot obtain such solutions for systems holding multiple hundreds of terabytes
So we must build DR into the design of the system
A single-site solution, subject to a common-mode disaster, would suffer considerable loss of availability after a disaster, and so is not acceptable
This implies that we need a multi-site solution
Conventionally these are based on a master-standby arrangement, where only 50% of the kit is delivering normal service
Our design is based on the use of multiple autonomous independent peer clusters that cross-synchronise
so 100% of the kit delivers normal service
32
DOM architecture in the context of the storage solution market
The dominant segment of the market focuses on delivering performance within
a highly resilient single cluster
However:
Many of our objects will be rarely accessed
so we do not want to pay for “maximised” performance we do not need
We have resilience by using multiple clusters, hence we have a reduced
need for resilience within a cluster
so we do not want to pay for “maximised” resilience we do not need
We are using these drivers to design a cost-effective large scale resilient
solution
33
DOM storage subsystem architecture - overview
[Diagram: two storage clusters, each with its own DOM Storage gateway, DOM Storage Service and DOM Physical Storage]
DOM Shared Services
• Unique ID
• Signing
• Logging
34
DOM storage subsystem architecture - access
[Diagram: two storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical Storage) and DOM Shared central Services (Unique ID, Signing, Logging)]
Normal access/delivery is from local storage cluster
35
DOM storage subsystem architecture - access
[Diagram: two storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical Storage) and DOM Shared central Services (Unique ID, Signing, Logging)]
When a cluster is off-line then access/delivery is from a remote storage cluster
36
DOM storage subsystem architecture - ingest
[Diagram: two storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical Storage) and DOM Shared central Services (Unique ID, Signing, Logging)]
Normal ingest is to the local storage cluster and then the remote cluster is synchronised
Arrow labels: Store; Synchronise remote store
37
DOM storage subsystem architecture - ingest
[Diagram: two storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical Storage) and DOM Shared central Services (Unique ID, Signing, Logging)]
When a cluster is off-line then ingest is managed by the remote storage cluster and the local cluster is synchronised later
Arrow labels: Store; Synchronise remote store later
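The four access and ingest cases shown in these diagrams (local when possible, remote when the local cluster is off-line, with the peer catching up later) can be sketched in a few lines. This is a toy in-memory model under stated assumptions: the class names, the site names and the simple pending-queue mechanism are hypothetical and say nothing about how the real storage service synchronises.

```python
class Cluster:
    def __init__(self, name):
        self.name = name
        self.online = True
        self.objects = {}   # DOMID -> bytes
        self.pending = []   # DOMIDs stored here that the peer still lacks


class PeerPair:
    """Two autonomous peer clusters that cross-synchronise."""

    def __init__(self, local, remote):
        self.local, self.remote = local, remote

    def _peer(self, cluster):
        return self.remote if cluster is self.local else self.local

    def ingest(self, domid, data):
        # Prefer the local cluster; fall back to the remote one if it is off-line.
        target = self.local if self.local.online else self.remote
        if not target.online:
            raise RuntimeError("no cluster available")
        target.objects[domid] = data
        target.pending.append(domid)
        self.synchronise()
        return target.name

    def read(self, domid):
        # Normal delivery is local; a remote cluster serves when local is off-line.
        for cluster in (self.local, self.remote):
            if cluster.online and domid in cluster.objects:
                return cluster.name, cluster.objects[domid]
        raise KeyError(domid)

    def synchronise(self):
        # Copy queued objects to the peer when both ends are reachable;
        # otherwise leave them queued for a later attempt.
        for cluster in (self.local, self.remote):
            peer = self._peer(cluster)
            if cluster.online and peer.online:
                for domid in cluster.pending:
                    peer.objects[domid] = cluster.objects[domid]
                cluster.pending.clear()


pair = PeerPair(Cluster("site-a"), Cluster("site-b"))
pair.ingest("d1", b"one")              # stored at site-a, then copied to site-b
pair.local.online = False              # site-a goes off-line
where = pair.ingest("d2", b"two")      # ingest is handled by site-b
served_by, _ = pair.read("d1")         # delivery also comes from site-b
pair.local.online = True
pair.synchronise()                     # site-a is synchronised later
print(where, served_by, sorted(pair.local.objects))
```

Both clusters accept work in normal running, which is the sense in which 100% of the kit delivers service rather than half of it sitting idle as a standby.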
38
In conclusion
We plan for generations of physical storage
Migration from one generation to the next
Allow changes of supplier
Purchase incrementally in modest quantities
Move quickly when required
Be cost conscious
We provide assurance that an object is held and re-presented as it was when it was ingested
We are designing a cost-effective large scale resilient solution
In summary: we take a long term view
39
Major Programmes/4
Web Archiving Programme
40
Structure of Programme
Web Archiving Programme is a collaborative initiative, broadly organised across two consortia
UK Web Archiving Consortium
Developing a selective approach to web archiving, procuring a common web archiving infrastructure and software to begin archiving activities as early as possible
International Internet Preservation Consortium
Developing advanced web archiving technologies for the long-term, large-scale, continuous crawling requirements enabled through legislation
41
UK Web Archiving Consortium
Developing a selective approach to web archiving
License for PANDAS about to be signed with NLA
Sub-licenses with consortium partners and contractor to follow
ITT concluded with Magus Research winning the contract.
Implement a common web archiving infrastructure (lots of Linux machines + PANDAS)
Provide customisation/development of PANDAS
Provide help desk and support
42
International Internet Preservation Consortium
Developing advanced web archiving technologies
Smart Crawler
Continuous adaptive crawler, adjusting crawl priority on the fly
Based on IA Heritrix
Working on requirements now
Expect to begin tender process in June
Content Management
Archival formats
Framework
Metrics and Test Bed
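One way to picture a crawler that adjusts crawl priority on the fly is a revisit queue whose intervals shrink for pages that keep changing and grow for pages that do not. The sketch below is a generic illustration only: it is not the IIPC Smart Crawler design (whose requirements were still being written) and does not use any Heritrix API; the class name, the halving and doubling factors and the example URLs are all assumptions.

```python
import heapq


class AdaptiveFrontier:
    """Hypothetical revisit queue: the earliest-due URL is crawled next."""

    def __init__(self):
        self._heap = []       # (time the URL is next due, url)
        self._interval = {}   # url -> current revisit interval in seconds

    def add(self, url, now=0.0, interval=3600.0):
        self._interval[url] = interval
        heapq.heappush(self._heap, (now + interval, url))

    def next_url(self):
        return heapq.heappop(self._heap)

    def report(self, url, now, changed,
               min_interval=60.0, max_interval=30 * 86400.0):
        # Pages that changed are revisited sooner; unchanged pages back off.
        factor = 0.5 if changed else 2.0
        interval = self._interval[url] * factor
        interval = min(max(interval, min_interval), max_interval)
        self._interval[url] = interval
        heapq.heappush(self._heap, (now + interval, url))


frontier = AdaptiveFrontier()
frontier.add("http://example.org/about")
frontier.add("http://example.org/news")
_, first = frontier.next_url()
frontier.report(first, now=3600.0, changed=False)   # static page: back off
_, second = frontier.next_url()
frontier.report(second, now=3600.0, changed=True)   # changing page: speed up
_, third = frontier.next_url()
print(first, second, third)
```

After one round the frequently changing page is already scheduled ahead of the static one, so crawl effort drifts towards content that actually changes.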
43
External Collaboration
44
Digital Library Collaborations/Partnerships
Current
UK Digital Preservation Coalition
Founder Member
TEL (The European Library Project)
Web Archiving UK
JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library Of Wales
International Internet Preservation Consortium
BNF, Library Of Congress, Internet Archive, National Archives & Library Of Canada, National Library Of Australia, National Library Of Italy, National Libraries Of Nordic Countries
JISC Funded - Digital Curation Centre
Persistent Identifiers
DOI Foundation, European National Libraries (KB & DDB)
Resource Discovery
Union Catalogues (SUNCAT)
Digital Library Federation
45
Digital Library Collaborations/Partnerships
Potential
Secure Legal Deposit Network
6 Legal Deposit Libraries
Global Digital Format Registry
Potential Partners (National Archives, DLF)
Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
KB (Netherlands National Library & Other Partners – FP6 Bid)
Digital Rights Management
Potential Partners (Publishers, JISC)
Metadata
Publishers, Others ?
Authentication
JISC ?
Resource Discovery
Search Engine Vendors, Researchers
Others ???
46
Conclusions
• Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success
• Can You Work With Us?
47
47
Slide 3
British Computer Society
North London Branch
Major Programmes
Richard Boulderstone
July 27, 2004
1
Agenda
• The British Library
• Vision
• Our Audiences/Customers
• ILS
• Digitisation
• Digital Object Management
• Web Archiving
• Collaboration
• Conclusions
Magna Carta
2
What Is The British Library?
• Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of Science and Invention (1855), National Central Library (1916), and National Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India Office Library and Records in 1982, and British Institute of Recorded Sound in 1983
• Flagship building at St Pancras - largest public building project in Great Britain in 20th century - opened in 1998
3
World-Class Research Library
Key Statistics 2002/3
• 150 million items
• 8.2 million items consulted or supplied
• 408,000 reading room visits
• 618,000 catalogue records created
• 554,000 items received on legal deposit
• 651 km shelf capacity, 92% full; add 12 km each year
• 18.5M Web Site Hits (www.bl.uk)
• 2,400 staff
• £85.2 million Grant in Aid and £27.0 million trading income in 2001/2
• Annual report - http://www.bl.uk/about/annual/latest.html
4
‘The World’s Knowledge’
Outcome Based Vision
To help people advance knowledge to enrich lives
… by aiding scientific advances
… by adding commercial value for businesses
… by contributing to UK “knowledge economy”
… through the pursuit of academic excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural wellbeing
… by giving information relevant to their interests
… by helping to find the next medical breakthrough
… by creating a link between the past, present and future
5
[Diagram: Our Audiences/Customers]
Segments: BUSINESS, EDUCATION, PUBLIC, RESEARCHER, LIBRARIES
Audience groups shown: High R+D Industries, Prof. Services, Creative Industries, Publishing Industries, SMEs, School Libraries, Students and Teachers (11>18), Lifelong Learner, Visitors (child + adult), Postgraduate/Undergraduate, Scholars, Librarians, Commercial Researcher, Public Libraries, H.E. Libraries, Broadcasting (e.g. BBC), Publishing (e.g. OED)
Services shown: Resource Discovery, Bespoke Services, Research Services, Document Supply, Reprographics, Innovation Centre, On-site Visits, School Tours, Web Learning, Exhibitions, Events, Tours, Publishing, Reading Rooms, Searching Tools, Training, Best Practice
6
Major Programmes/1
Da Vinci Notebook
Integrated Library System (ILS) Programme
7
ILS: Development
Data migration
Due to finish in a few days
16M+ BL records
10M+ records from other sources
Online ILS software
All online changes made (mainly interfaces) – final tests
Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
Most done for go-live
Remainder to follow in priority order
8
ILS: Implementation
Training
Courses for end-users well underway
‘Practice’ system available
‘Search only’ training also underway
Testing
Functional testing (end to end) nearly complete
Performance poor – OPAC very slow
Automated stress testing (LoadRunner scripts)
eIS trying to locate the problem area
Ex Libris experts flying over
Some security ‘hardening’ needed
9
ILS: Cutover from legacy systems
Now: Temporary Aleph cataloguing
7 June: Phase 1 – internal processing
Staggered take-on of users to ease cutover problems
Merge ‘temporary’ records
30 June: Phase 2 – reading rooms
Reading rooms closed for cutover 26-29 June
Mainly brand-new PCs etc rather than XP upgrade
30 July: Phase 3 – remote users
Could be delayed if there are major problems
10
11
Future ILS development (ILS/2)
Current ILS development is seen as just the start
Extra records
E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
E.g. Preservation records
Links to other new BL systems
E.g. Digital Object Management (images, web pages etc)
New releases of Ex Libris packages
12
Major Programmes/2
International Dunhuang Project
Digitisation Programme
13
Background
Digitisation Is The Process Of Converting Existing Physical Items Into Digital Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image To Sub-£1/Image.
Automatic Devices To Digitise Are Becoming Available But Are Expensive (~£150,000).
BL Has Had A Fairly Ad Hoc Approach Driven By
External Funding Opportunities
Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project Management
BL Has Created About 1.5M Digital Images So Far…
14
Digitisation Strategy
Digitisation Strategy Project Was Formally Initiated On February 2, 2004
Key objectives for the project are to define:
Selection Criteria
Uniform Approach
Communications Plan
Sustainability
Intellectual Property Rights
External Relationship Management
Funding
Integration with DOMS
15
Project Status Information
Definitive Register of Projects
19 Complete
19 Current
20 Planning
JISC Sound (3,900 Hours)
JISC Newspapers (2M Pages of 750M Pages)
Chopin (Collaborative Project)
Early English Books Online
16
Major Programmes/3
Gutenberg Bible
Digital Object Management (DOM) Programme
17
DOM Programme vision
Our mission is to enable the United Kingdom to preserve and use its digital intellectual property forever
Our vision is to create a management system for digital objects that will
store and preserve any type of digital material in perpetuity
provide access to this material to users with appropriate permissions
ensure that the material is easy to find
ensure that users can view the material with contemporary applications
ensure that users can, where possible, experience material with the original look-and-feel
18
Introduction - history
Digital Library PFI: Mar 1997 – Dec 1998
Digital Library System: 1999 – early 2002
Lessons
DOM Report: Nov 2002
The DOM Programme: started September 2003
19
Drivers for the BL DOM Programme
Legal deposit legislation for non-print material was granted royal assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12 TB of material per year (with 50 year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….
We need a generic and cost-effective approach for the secure long term storage of digital material that is produced by numerous initiatives
20
DOM – many topics to address
[Chart: components plotted by ESTIMATED SIZE OF COMPONENT (low to high) against COMPONENT AMBIGUITY / COMPLEXITY (low to high)]
Legend: Started, Planned, Non-DOM projects, Planned co-operation
Components shown: ILS, Web Archiving, Digitisation Programme, LDEP, Workflow, Resource Discovery, LDLSE, VDEP, Rights Management, Metadata Definition, Technical Requirements, File Format Registry, Persistent Identifiers, File Conversion Utilities, Authentication, Prototypes, SDM, RADM, Interfaces, Strategy Development
LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications
21
Scope - life cycle of objects
Collection: Selection, Acquisition, Accession, Description
Preservation: Storage, Preservation
Access: Resource discovery, Delivery, Rendering
22
Scope – objects and processes
Preservation store
Preserves the bit stream in perpetuity
Access store
Access versions
Limited formats – in the flavour of the era
Metadata to support resource discovery
Descriptive, Administrative, Links with existing tools e.g. Integrated Library System (ILS)
Workflow
Ingest, e.g. Legal Deposit processing
23
[Diagram: DOM in context]
ACCESS: Resource Discovery, Delivery, Digital Rights Management
DOM shared services: Storage, Signing, Authentication, Metadata, Persistent ID
Ingest sources: DONATIONS, DOCUMENT SUPPLY, LEGAL DEPOSIT (LDL Secure Environment, Legal Deposit Items, Legal Deposit Processing), WEB ARCHIVING, DIGITISATION (St Pancras Studios, Newspapers, NSA)
Other labels: Publishers, Archives, Grey Literature, Non-Serial Store, Archiving, Operational Stores
24
Timeline
Years shown: 2003, 2004, 2005
BC: ET approve Business Case & Timeline
R0 – Definition: prototype will provide a basic preservation-quality digital object storage module
R1 – Operational Storage Sub System:
• Consolidate R0 into operational system
• Provide preservation-quality digital store for materials received under Voluntary Deposit of Electronic Publications (VDEP)
• Integrate it with the existing VDEP front-end
R2 – 1st Content Stream ingest:
• Support ingest for a major content stream
• Integrate with core Library systems as required
R3 & R4 – LDEP, initial format: provide functionality for material covered by LDEP secondary legislation
R5+ – Open DOM to new projects
25
DOM: Project definition - 1
Functional Architecture “What”
Prototyping – assessing market solutions
Example issues: digital rights, file formats, etc
Logical architecture “how – overall architecture”
Prototyping - basic functioning architecture
Example issues: allow changes to new suppliers, relationships to ILS, other projects etc
Physical architecture “how – storage & specifics”
Prototyping - principal solutions and options
Example issues: how do we build it cost-effectively today, supplier selection criteria
• Business case
• Planning – incremental implementation phases
Cross team workshops – reviewing progress, debating detailed technical issues, planning immediate priorities, risk management & way forward
26
DOM: Project definition - 2
Approach is to be incremental and not ‘Big Bang’
We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a good solution
A principal goal is to define:
An overall long term “logical architecture”
Within which there will be successive generations of physical architectures
We are building an understanding of the storage marketplace, and we will use that knowledge to manage procurement
We are certain that we will need >500 TB of storage but we are uncertain when – we thus need flexible, scalable procurement
27
DOM architecture - overview
[Diagram: DOM architecture]
Consumers: Resource Discovery (ILS based RD, Non-cat, Others), Rights Management, LDEP, Doc supply
Each OBJECT is referenced by its DOMID
DOM Storage Service: Compound objects/relations, Integrity, Authenticity
Atomic Objects: Unique persistent identifier (DOMID), Local resource locator
DOMID is mapped to node/vol/LRL
DOM Physical Storage
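The mapping from DOMID to node/vol/LRL can be pictured as a small lookup table. The sketch below is illustrative only: the class names, the use of a random UUID as the identifier, and the example paths are all assumptions, not details of the BL design. It shows why a persistent identifier lets objects move between storage generations without callers noticing.

```python
from dataclasses import dataclass
from typing import Dict, List
import uuid


@dataclass(frozen=True)
class Location:
    """Where one copy of an atomic object lives (node/vol/LRL on the slide)."""
    node: str
    volume: str
    lrl: str  # local resource locator, e.g. a path within the volume


class DomidResolver:
    """Hypothetical lookup table from persistent DOMID to physical locations."""

    def __init__(self) -> None:
        self._locations: Dict[str, List[Location]] = {}

    def register(self, location: Location) -> str:
        """Mint a new DOMID and record its first location."""
        domid = uuid.uuid4().hex  # assumption: any unique opaque string will do
        self._locations[domid] = [location]
        return domid

    def relocate(self, domid: str, old: Location, new: Location) -> None:
        """Move an object to new storage without changing its DOMID."""
        copies = self._locations[domid]
        copies[copies.index(old)] = new

    def resolve(self, domid: str) -> List[Location]:
        """Return the current physical locations for a DOMID."""
        return list(self._locations[domid])


resolver = DomidResolver()
first = Location(node="node-a", volume="vol-01", lrl="objects/0001.tif")
domid = resolver.register(first)
# Migrating to a newer storage generation leaves the identifier untouched.
resolver.relocate(domid, first, Location("node-b", "vol-07", "objects/0001.tif"))
```

Resource discovery, rights management and document supply would hold only the DOMID, so a change of supplier touches the table and nothing else.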
28
DOM System (release 3)
[Diagram labels: Aleph, Storage subsystem (shown twice), Access, Mailroom, Publishers, Shared services, Administration, DOM System]
29
DOM logical architecture – integrity and authenticity
Integrity:
System has capability to continuously monitor the object store to detect object corruption
It would then initiate object recovery
Authenticity:
A process is defined to provide long-term assurance that an object that is re-presented is as it was when it was ingested
Based on the use of cryptographic signing techniques
Each object is signed when it is ingested
The signature is verified when required
The signing mechanism is “tightly” controlled
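A minimal sketch of sign-at-ingest, verify-on-demand and a monitoring scan follows. It is not the BL mechanism: a keyed HMAC-SHA256 stands in for whatever signing technique was actually chosen, the key is a hard-coded placeholder, and recovery by copying from a second store is an assumption about what "object recovery" might involve.

```python
import hashlib
import hmac
from typing import Dict, List, Tuple

SIGNING_KEY = b"demo-key-held-by-the-signing-service"  # hypothetical secret


def sign(data: bytes) -> str:
    """Return a keyed digest of the object's bit stream."""
    return hmac.new(SIGNING_KEY, data, hashlib.sha256).hexdigest()


def verify(data: bytes, signature: str) -> bool:
    """Check that the bit stream still matches the signature made at ingest."""
    return hmac.compare_digest(sign(data), signature)


class ObjectStore:
    """Toy store holding object bytes alongside their ingest-time signature."""

    def __init__(self) -> None:
        self.objects: Dict[str, Tuple[bytes, str]] = {}

    def ingest(self, object_id: str, data: bytes) -> None:
        self.objects[object_id] = (data, sign(data))

    def scan(self) -> List[str]:
        """Integrity monitor: list the ids of objects that fail verification."""
        return [oid for oid, (data, sig) in self.objects.items()
                if not verify(data, sig)]

    def recover(self, object_id: str, replica: "ObjectStore") -> None:
        """Replace a corrupt object with the copy held by another store."""
        self.objects[object_id] = replica.objects[object_id]


local, remote = ObjectStore(), ObjectStore()
for store in (local, remote):
    store.ingest("obj-1", b"original bit stream")

# Simulate silent corruption of the local copy (signature left unchanged).
_, signature = local.objects["obj-1"]
local.objects["obj-1"] = (b"original bit strean", signature)

corrupt = local.scan()
for object_id in corrupt:
    local.recover(object_id, remote)
```

Keeping the key inside one signing service, rather than in every store, is one way to read the slide's point that the signing mechanism is tightly controlled.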
30
Procuring physical storage in volume
A major cost is in physical storage
The market for storage systems is changing rapidly, and this implies that “lock-in” is not sensible
We thus need flexibility to change supplier over time
Cost of storage is reducing by 30-40% per year
Hence procure on rolling basis just ahead of demand
Replace storage on a rolling basis on expiry of warranty
The rolling programmes imply the need to be able to support a heterogeneous product solution
The design of the logical architecture thus supports storage sourced from multiple storage vendors
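The case for buying just ahead of demand can be checked with simple arithmetic. In the sketch below only the 30-40% annual price decline comes from the slide; the demand profile (100 TB in each of five years) and the normalised starting price are invented for illustration, and replacement on warranty expiry is ignored.

```python
def rolling_cost(demand_per_year, start_price, annual_decline):
    """Total spend when each year's capacity is bought at that year's price."""
    return sum(capacity * start_price * (1 - annual_decline) ** year
               for year, capacity in enumerate(demand_per_year))


def upfront_cost(demand_per_year, start_price):
    """Total spend when all capacity is bought at today's price."""
    return sum(demand_per_year) * start_price


demand = [100, 100, 100, 100, 100]  # hypothetical TB needed in each of five years
price = 1.0                          # normalised cost per TB today

upfront = upfront_cost(demand, price)
for decline in (0.30, 0.40):
    rolling = rolling_cost(demand, price, decline)
    print(f"decline {decline:.0%}: rolling {rolling:.1f} vs upfront {upfront:.1f}")
```

With these made-up figures the rolling purchase comes to roughly 46% to 55% of the upfront spend, which is the effect the slide relies on.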
31
Disaster tolerance and the organisation of storage clusters
One can obtain commercial disaster recovery (DR) solutions for common equipment configurations
However one cannot obtain such solutions for multi-100 TB systems
So we must build DR into the design of the system
A single site solution, subject to a common-mode disaster, would suffer considerable loss of availability after a disaster, and so is not acceptable
This implies that we need a multi-site solution
Conventionally these are based on a master-standby arrangement where only 50% of kit is delivering normal service
Our design is based on the use of multiple autonomous independent peer clusters that cross-synchronise
so 100% of the kit delivers normal service
32
DOM architecture in the context of the storage solution market
The dominant segment of the market focuses on delivering performance within a highly resilient single cluster
However:
Many of our objects will be rarely accessed
so we do not want to pay for “maximised” performance we do not need
We have resilience by using multiple clusters, hence we have a reduced need for resilience within a cluster
so we do not want to pay for “maximised” resilience we do not need
We are using these drivers to design a cost-effective large scale resilient solution
33
DOM storage subsystem architecture - overview
[Diagram: two storage clusters, each comprising a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, linked to DOM Shared Services (Unique ID, Signing, Logging)]
34
DOM storage subsystem architecture - access
Normal access/delivery is from local storage cluster
[Diagram: two storage clusters, each comprising a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, with DOM shared central services (Unique ID, Signing, Logging)]
35
DOM storage subsystem architecture - access
When a cluster is off-line then access/delivery is from a remote storage cluster
[Diagram: two storage clusters, each comprising a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, with DOM shared central services (Unique ID, Signing, Logging)]
36
DOM storage subsystem architecture - ingest
Normal ingest is to the local storage cluster and then the remote cluster is synchronised
[Diagram: two storage clusters, each comprising a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, with DOM shared central services (Unique ID, Signing, Logging); arrows labelled "Store" and "Synchronise remote store"]
37
DOM storage subsystem architecture - ingest
When a cluster is off-line then ingest is managed by the remote storage cluster and the local cluster is synchronised later
[Diagram: two storage clusters, each comprising a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, with DOM shared central services (Unique ID, Signing, Logging); arrows labelled "Store" and "Synchronise remote store later"]
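The access and ingest behaviour on the last four slides can be summarised in a short sketch of two peer clusters. Everything here is a simplification under stated assumptions: the class and method names are invented, synchronisation is a plain copy of pending objects, and conflict handling, the gateway layer and the shared central services are left out.

```python
from typing import Dict, List, Optional


class Cluster:
    """One autonomous storage cluster holding object bytes by identifier."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.online = True
        self.objects: Dict[str, bytes] = {}
        self.pending: List[str] = []  # ids still to be copied to the peer


class PeerPair:
    """Two peer clusters that cross-synchronise; either can serve requests."""

    def __init__(self, local: Cluster, remote: Cluster) -> None:
        self.local, self.remote = local, remote

    def _choose(self) -> Cluster:
        if self.local.online:
            return self.local
        if self.remote.online:
            return self.remote
        raise RuntimeError("no storage cluster is on-line")

    def ingest(self, object_id: str, data: bytes) -> str:
        """Store on the local cluster, or on the remote one if local is down."""
        target = self._choose()
        target.objects[object_id] = data
        target.pending.append(object_id)
        self.synchronise()
        return target.name

    def access(self, object_id: str) -> Optional[bytes]:
        """Deliver from the local cluster, or the remote one if local is down."""
        return self._choose().objects.get(object_id)

    def synchronise(self) -> None:
        """Copy pending objects across once both clusters are on-line."""
        if not (self.local.online and self.remote.online):
            return  # synchronise later
        for src, dst in ((self.local, self.remote), (self.remote, self.local)):
            for object_id in src.pending:
                dst.objects[object_id] = src.objects[object_id]
            src.pending.clear()


pair = PeerPair(Cluster("local"), Cluster("remote"))
pair.ingest("a", b"first")               # normal ingest, remote then synchronised
pair.local.online = False
served_by = pair.ingest("b", b"second")  # local off-line: remote manages ingest
pair.local.online = True
pair.synchronise()                       # local cluster catches up later
```

Because both clusters accept work whenever they are on-line, neither sits idle as a standby, which is the contrast with master-standby drawn on the disaster tolerance slide.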
38
In conclusion
We plan for generations of physical storage
Migration from one generation to the next
Allow changes of supplier
Purchase incrementally in modest quantities
Move quickly when required
Be cost conscious
We provide assurance that an object is held and re-presented as it was when it was ingested
We are designing a cost-effective large scale resilient solution
In summary: we take a long term view
39
Major Programmes/4
Web Archiving Programme
40
Structure of Programme
Web Archiving Programme is a collaborative initiative, broadly implemented across two consortia
UK Web Archiving Consortium
Developing a selective approach to web archiving, procuring a common web archiving infrastructure and software to begin archiving activities at the earliest opportunity
International Internet Preservation Consortium
Developing advanced web archiving technologies for the long-term, large-scale, continuous crawling requirements enabled through legislation
41
UK Web Archiving Consortium
Developing a selective approach to web archiving
Licence for PANDAS about to be signed with NLA
Sub-licences with consortium partners and contractor to follow
ITT concluded with Magus Research winning the contract.
Implement a common web archiving infrastructure (lots of Linux machines + PANDAS)
Provide customisation/development of PANDAS
Provide help desk and support
42
International Internet Preservation Consortium
Developing advanced web archiving technologies
Smart Crawler
Continuous adaptive crawler, adjusting crawl priority on the fly
Based on IA Heritrix
Working on requirements now
Expect to begin tender process in June
Content Management
Archival formats
Framework
Metrics and Test Bed
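"Adjusting crawl priority on the fly" can be illustrated with a re-prioritisable queue of URLs. This is a toy, not Heritrix and not the consortium's design: the frontier class, the halve-or-double revisit rule and the example URLs are all assumptions made for the sketch.

```python
import heapq
import itertools
from typing import Dict, List, Tuple


class CrawlFrontier:
    """Priority queue of URLs whose priorities can be changed while crawling."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, str]] = []
        self._current: Dict[str, float] = {}  # latest priority per URL
        self._counter = itertools.count()     # tie-breaker keeps ordering stable

    def schedule(self, url: str, priority: float) -> None:
        """Add a URL or re-prioritise it; lower numbers are crawled sooner."""
        self._current[url] = priority
        heapq.heappush(self._heap, (priority, next(self._counter), url))

    def next_url(self) -> str:
        """Pop the most urgent URL, skipping entries made stale by updates."""
        while self._heap:
            priority, _, url = heapq.heappop(self._heap)
            if self._current.get(url) == priority:
                del self._current[url]
                return url
        raise IndexError("frontier is empty")


def revisit_priority(changed: bool, previous: float) -> float:
    """Hypothetical rule: revisit changing pages sooner, static pages later."""
    return previous / 2 if changed else previous * 2


frontier = CrawlFrontier()
frontier.schedule("http://example.org/news", 4.0)
frontier.schedule("http://example.org/about", 4.0)
# The news page was seen to change, so it is promoted on the fly.
frontier.schedule("http://example.org/news", revisit_priority(True, 4.0))
order = [frontier.next_url(), frontier.next_url()]
```

Old heap entries are left in place and skipped when popped, which keeps re-prioritisation cheap at the cost of some wasted heap space.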
43
External Collaboration
44
Digital Library Collaborations/Partnerships
Current
UK Digital Preservation Coalition
Founder Member
TEL (The European Library Project)
Web Archiving UK
JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library Of Wales
International Internet Preservation Consortium
BNF, Library Of Congress, Internet Archive, National Archives & Library Of Canada, National Library Of Australia, National Library Of Italy, National Libraries Of Nordic Countries
JISC Funded - Digital Curation Centre
Persistent Identifiers
DOI Foundation, European National Libraries (KB & DDB)
Resource Discovery
Union Catalogues (SUNCAT)
Digital Library Federation
45
Digital Library Collaborations/Partnerships
Potential
Secure Legal Deposit Network
6 Legal Deposit Libraries
Global Digital Format Registry
Potential Partners (National Archives, DLF)
Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
KB (Netherlands National Library & Other Partners – FP6 Bid)
Digital Rights Management
Potential Partners (Publishers, JISC)
Metadata
Publishers, Others ?
Authentication
JISC ?
Resource Discovery
Search Engine Vendors, Researchers
Others ???
46
Conclusions
• Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success
• Can You Work With Us?
47
Slide 4
British Computer Society
North London Branch
Major Programmes
Richard Boulderstone
July 27, 2004
1
Agenda
•
•
•
•
•
•
•
•
•
The British Library
Vision
Our Audiences/Customers
ILS
Digitisation
Digital Object Management
Web Archiving
Collaboration
Conclusions
Magna Carta
www.bl.uk
2
What Is The British Library ?
Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of
Science and Invention (1855), National Central Library (1916), and National
Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India
Office Library and Records in 1982, and British Institute of Recorded Sound in
1983
• Flagship building at St Pancras - largest public building project in Great
Britain in 20th century - opened in 1998
•
www.bl.uk
3
World-Class Research Library
Key Statistics 2002/3
•
•
•
•
•
•
•
•
•
•
150 million items
8.2 million items consulted or supplied
408,000 reading room visits
618,000 catalogue records created
554,000 items received on legal deposit
651 km shelf capacity 92% full add 12 km each year
18.5M Web Site Hits (www.bl.uk)
2,400 staff
£85.2 million Grant in Aid and £27.0 million trading income in 2001/2
Annual report - http://www.bl.uk/about/annual/latest.html
www.bl.uk
4
‘The World’s Knowledge’
Outcome Based Vision
… by aiding scientific advances
… by adding commercial value for
businesses
… by contributing to UK “knowledge
economy”
… through the pursuit of academic
excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural
wellbeing
… by giving information relevant to their
interests
… by helping to find the next medical
breakthrough
… by creating a link between the past,
present and future
To help people
advance
knowledge to
enrich lives
www.bl.uk
5
High
R+D
Industries
Prof.
Services
Creative
Industries Publishing
Industries
Resource Discovery
Bespoke Services
Research Services
Document Supply
Reprographics
Innovation Centre
School
Students
Libraries Teachers 11>18
On-site Visits
School Tours
Web Learning
SMEs
EDUCATION
Lifelong
Learner
Exhibitions
Events
Tours
Publishing
Visitors
(child + adult)
Lifelong
Learner
PUBLIC
BUSINESS
Lifelong
Learner
Postgraduate/
Undergraduate
RESEARCHER
Librarians
Scholars
Lifelong
Learner
Reading Rooms
Bespoke Services
Reprographics
Publishing
Document Supply
Searching Tools
LIBRARIES
Commercial
Researcher
Public
Libraries
Broadcasting
e.g. BBC
Publishing
e.g. OED
Document Supply
Resource Discovery
Training
Best Practice
H.E.
Libraries
Public
www.bl.uk
6
2
Major Programmes/1
Da Vinci Notebook
Integrated Library System (ILS)
Programme
7
ILS: Development
Data migration
Due to finish in a few days
16M+ BL records
10M+ records from other sources
Online ILS software
All online changes made (mainly interfaces) – final tests
Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
Most ones done for go live
Rest in priority order
www.bl.uk
8
ILS: Implementation
Training
Courses to end-users well underway
‘Practice’ system available
‘Search only’ training also underway
Testing
Functional testing (end to end) nearly complete
Performance poor – OPAC very slow
Automated stress testing (LoadRunner scripts)
eIS trying to find area of problem
Ex Libris experts flying over
Some security ‘hardening’ needed
www.bl.uk
9
ILS: Cutover from legacy systems
Now:
Temporary Aleph cataloguing
Phase 1 – internal processing
Staggered take-on of users to ease cutover problems
Merge ‘temporary’ records
7 June:
Phase 2 – reading rooms
Reading rooms closed for cutover 26-29 June
Mainly brand-new PCs etc rather than XP upgrade
30 June:
Phase 3 – remote users
Could be delayed major problems
30 July:
www.bl.uk
10
www.bl.uk
11
Future ILS development (ILS/2)
Current ILS development seen just as the start
Extra records
E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
E.g. Preservation records
Links to other new BL systems
E.g. Digital Object Management (images, web pages etc)
New releases of Ex Libris packages
www.bl.uk
12
Major Programmes/2
International Dunhuang Project
Digitisation Programme
13
Background
Digitisation Is The Process Of Converting Existing Physical Items Into Digital
Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character
Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image to Sub £1/Image.
Automatic Devices To Digitise Are Becoming Availalble But Are Expensive ~£150,000.
BL Has Had Fairly Ad Hoc Approach Driven By
External Funding Opportunities
Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project
Management
BL Has Created About 1.5M Digital Images So Far…
www.bl.uk
14
Digitisation Strategy
Digitisation Strategy Project Was Formerly Initiated On February 2, 2004
Key objectives for the project are to define:
Selection Criteria
Uniform Approach
Communications Plan
Sustainability
Intellectual Property Rights
External Relationship Management
Funding
Integration with DOMS
www.bl.uk
15
Project Status Information
Definitive Register of Projects
19 Complete
19 Current
20 Planning
JISC Sound (3,900 Hours)
JISC Newspapers (2M Pages of 750M Pages)
Chopin (Collaborative Project)
Early English Books Online
www.bl.uk
16
Major Programmes/3
Gutenberg Bible
Digital Object Management (DOM)
Programme
17
DOM Programme vision
Our mission is to enable the United Kingdom to preserve
and use its digital intellectual property forever
Our vision is create a management system for digital objects
that will
store and preserve any type of digital material in perpetuity
provide access to this material to users with appropriate permissions
ensure that the material is easy to find
ensure that users can view the material with contemporary applications
ensure that users can, where possible, experience material with the original look-and-feel
www.bl.uk
18
Introduction - history
Digital Library PFI
Mar 1997 – Dec 1998
Digital Library System
1999 – early 2002
Lessons
DOM Report
Nov 2002
The DOM Programme
Started September 2003
www.bl.uk
19
Drivers for the BL DOM Programme
Legal deposit legislation for non-print material was granted royal assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12T of material per year (with 50 year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….
We need a generic and cost-effective approach for the secure long term
storage of digital material that is produced by numerous initiatives
www.bl.uk
20
DOM – many topics to address
HIGH
ILS
WEB
ARCHIVING
DIGITISATION
PROGRAMME
ESTIMATED SIZE OF COMPONENT
LDEP
WORKFLOW
RESOURCE
DISCOVERY
LDLSE
VDEP
RIGHTS
MANAGEMENT
METADATA
DEFINITION
TECHNICAL
REQUIREMENTS
FILE
FORMAT
REGISTRY
PERSISTENT
IDENTIFIERS
FILE
CONVERSION
UTILITIES
AUTHENTICATION
PROTOTYPES
SDM
RADM
INTERFACES
STRATEGY
DEVELOPMENT
LOW
LOW
COMPONENT AMBIGUITY / COMPLEXITY
Started
Planned
Non-DOM projects
Planned co-operation
HIGH
LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications
www.bl.uk
21
Scope - life cycle of objects
Collection
Selection
Acquisition
Accession
Description
Preservation
Storage
Preservation
Access
Resource discovery
Delivery
Rendering
www.bl.uk
22
Scope – objects and processes
Preservation store
Preserves the bit stream in perpetuity
Access store
Access versions
Limited formats – in the flavour of the era
Metadata to support resource discovery
Descriptive, Administrative, Links with existing tools e.g. Integrated
Library System (ILS)
Workflow
Ingest, e.g. Legal Deposit processing
www.bl.uk
23
ACCESS
DOM
Resource Discovery
Delivery
Digital Rights Management
Shared services
DOM
Storage
Signing
Authentication
Metadata
Persistent ID
Ingest
DONATIONS
DOCUMENT SUPPLY
Publishers
Archives
Grey Literature
Non-Serial
Store
Archiving
Operational
Stores
LEGAL DEPOSIT
LDL Secure
Environment
Legal Deposit
Items
Legal Deposit
Processing
WEB
ARCHIVING
DIGITISATION
St Pancras
Studios
Newspapers
NSA
www.bl.uk
24
Timeline
Prototype will provide a
basic preservationquality digital object
storage module
Definition. R0
ET approve
Business Case
& Timeline
•Consolidate R0 into operational
system
•Provide preservation-quality digital
store for materials received under
Voluntary Deposit of Electronic
Publications (VDEP)
•Integrate it with the existing VDEP
front-end
BC
R0
Operat’l Storage
Sub System. R1
•Support ingest for a
major content stream
•Integrate with core
Library systems as
required
R1
1st Content
Stream ingest. R2
R2
LDEP - initial
format. R3 & R4
Provide functionality
for material covered by
LDEP secondary
legislation
R3
R4
Open DOM to new
projects. R5+
2003
2004
2005
www.bl.uk
25
DOM: Project definition - 1
Example issues
Functional Architecture “What”
digital rights, file
formats, etc
Prototyping – assessing market solutions
allow changes to
new suppliers,
relationships to ILS,
other projects etc
Logical architecture “how – overall architecture”
Prototyping - basic functioning architecture
how do we build it
cost-effectively
today, supplier
selection criteria
Physical architecture “how – storage & specifics”
Prototyping - principal solutions and options
• Business case
• Planning – incremental
implementation phases
Cross team workshops – reviewing progress, debating detailed technical
issues, planning immediate priorities, risk management & way forward
www.bl.uk
26
DOM: Project definition - 2
Approach is to be incremental and not ‘Big Bang’
We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a
good solution
A principal goal is to define:
An overall long term “logical architecture”
Within which, there will be successive generations of physical architectures
We are understanding the storage marketplace, and we will use the knowledge to manage
procurement
We are certain that we will need >500T of storage but we are uncertain when – we thus need
flexible scalable procurement
www.bl.uk
27
DOM architecture - overview
Resource Discovery
Non-cat
ILS
based RD
Others
Rights Management
LDEP
DOMID
OBJECT
DOM Storage Service
Compound
objects/relations
Integrity
Authenticity
Local resource locator
Object
Atomic
Objects
Unique persistent
identifier (DOMID)
Doc
supply
DOMID is
mapped to
node/vol/LRL
DOM Physical Storage
www.bl.uk
28
DOM System (release 3)
Aleph
Storage subsystem
Storage
subsystem
Access
Mailroom
Publishers
Shared services
Administration
DOM System
www.bl.uk
29
DOM logical architecture –
integrity and authenticity
Integrity:
System has capability to continuously monitor the object store to detect
object corruption
It would then initiate object recovery
Authenticity:
A process is defined to provide long-term assurance that an object that
is re-presented is as it was when it was ingested
Based on the use of cryptographic signing techniques
Each object is signed when it is ingested
The signature is verified when required
The signing mechanism is “tightly” controlled
www.bl.uk
30
Procuring physical storage in volume
A major cost is in physical storage
The market for storage systems is changing rapidly, and this implies that “lock-in” is
not sensible
We thus need flexibility to change supplier over time
Cost of storage is reducing by 30-40% per year
Hence procure on rolling basis just ahead of demand
Replace storage on a rolling basis on expiry of warranty
The rolling programmes imply the need to be able to support a heterogeneous
product solution
The design of the logical architecture thus supports storage sourced from multiple
storage vendors
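The case for buying just ahead of demand can be checked with simple arithmetic. The figures below are assumptions chosen for illustration (100 TB of new demand per year for five years, a starting price of 1,000 per TB in arbitrary units, and a 35% annual price decline, the midpoint of the 30-40% range above); the sketch ignores replacement on warranty expiry, running costs and the cost of money.

```python
def upfront_cost(yearly_tb, price_per_tb):
    """Buy all the capacity in year 0 at today's price."""
    return sum(yearly_tb) * price_per_tb


def rolling_cost(yearly_tb, price_per_tb, annual_decline):
    """Buy each year's capacity in that year, at that year's lower price."""
    return sum(
        tb * price_per_tb * (1 - annual_decline) ** year
        for year, tb in enumerate(yearly_tb)
    )


demand = [100] * 5                      # TB needed in years 0..4 (assumed)
up = upfront_cost(demand, 1000.0)
roll = rolling_cost(demand, 1000.0, 0.35)
print(f"upfront: {up:,.0f}  rolling: {roll:,.0f}  ratio: {roll / up:.2f}")
```

Under these assumptions the rolling purchase costs about half as much as buying the full 500 TB at the outset.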
Disaster tolerance and the organisation of storage clusters
One can obtain commercial disaster recovery (DR) solutions for common equipment configurations
However, one cannot obtain such solutions for systems holding several hundred terabytes
So we must build DR into the design of the system
A single-site solution, exposed to a common-mode disaster, would suffer considerable loss of availability after a disaster, and so is not acceptable
This implies that we need a multi-site solution
Conventionally these are based on a master-standby arrangement, where only 50% of the kit delivers normal service
Our design is based on multiple autonomous, independent peer clusters that cross-synchronise, so 100% of the kit delivers normal service
DOM architecture in the context of the storage solution market
The dominant segment of the market focuses on delivering performance within
a highly resilient single cluster
However:
Many of our objects will be rarely accessed
so we do not want to pay for “maximised” performance we do not need
We have resilience by using multiple clusters, hence we have a reduced
need for resilience within a cluster
so we do not want to pay for “maximised” resilience we do not need
We are using these drivers to design a cost-effective large scale resilient
solution
DOM storage subsystem architecture - overview
[Diagram: two parallel stacks, each with a DOM Storage gateway, a DOM Storage Service and a storage cluster of DOM Physical Storage; DOM Shared Services (Unique ID, Signing, Logging) serve both stacks]
DOM storage subsystem architecture - access
Normal access/delivery is from the local storage cluster
[Diagram: two stacks, each with a DOM Storage gateway, a DOM Storage Service and a storage cluster of DOM Physical Storage, plus the DOM Shared Services (Unique ID, Signing, Logging)]
DOM storage subsystem architecture - access
When a cluster is off-line then access/delivery is from a remote storage cluster
[Diagram: two stacks, each with a DOM Storage gateway, a DOM Storage Service and a storage cluster of DOM Physical Storage, plus the DOM Shared Services (Unique ID, Signing, Logging)]
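The two access cases, local delivery in normal operation and remote delivery when the local cluster is off-line, can be sketched as a gateway that tries its own cluster first. This is a toy model under stated assumptions: `Cluster`, `Gateway` and `fetch` are hypothetical names, and an off-line cluster is simulated with a flag.

```python
class Cluster:
    """A storage cluster holding objects keyed by DOMID (toy model)."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.objects = {}

    def read(self, domid):
        if not self.online:
            raise ConnectionError(f"{self.name} is off-line")
        return self.objects[domid]


class Gateway:
    """Serves requests from the local cluster, falling back to the remote peer."""

    def __init__(self, local, remote):
        self.local = local
        self.remote = remote

    def fetch(self, domid):
        try:
            return self.local.read(domid), self.local.name
        except ConnectionError:
            # Local cluster unavailable: deliver from the peer instead.
            return self.remote.read(domid), self.remote.name


site_a, site_b = Cluster("cluster-a"), Cluster("cluster-b")
for cluster in (site_a, site_b):
    cluster.objects["obj-1"] = b"payload"   # both peers hold a synchronised copy

gateway = Gateway(local=site_a, remote=site_b)
normal = gateway.fetch("obj-1")
site_a.online = False
degraded = gateway.fetch("obj-1")
print(normal[1], degraded[1])
```

Because both peers are full copies, the fallback path needs no special standby hardware; the remote cluster is simply doing more of what it already does.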
DOM storage subsystem architecture - ingest
Normal ingest is to the local storage cluster and then the remote cluster is synchronised
[Diagram: two stacks, each with a DOM Storage gateway, a DOM Storage Service and a storage cluster of DOM Physical Storage, plus the DOM Shared Services (Unique ID, Signing, Logging). Arrows labelled "Store" and "Synchronise remote store"]
DOM storage subsystem architecture - ingest
When a cluster is off-line then ingest is managed by the remote storage cluster and the local cluster is synchronised later
[Diagram: two stacks, each with a DOM Storage gateway, a DOM Storage Service and a storage cluster of DOM Physical Storage, plus the DOM Shared Services (Unique ID, Signing, Logging). Arrows labelled "Store" and "Synchronise remote store later"]
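The ingest behaviour in the two slides above can be sketched in the same toy style: store on whichever cluster is available, copy to the peer if it is reachable, and otherwise remember what the peer is owed. All names (`Cluster`, `ingest`, `resynchronise`, `backlog`) are hypothetical, and the copy to the peer is done immediately here for simplicity, whereas a real system might synchronise asynchronously.

```python
class Cluster:
    """A storage cluster with a backlog of objects its peer has not yet received."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.objects = {}
        self.backlog = []

    def store(self, domid, data):
        self.objects[domid] = data


def ingest(domid, data, local, remote):
    """Store on the local cluster if it is up, otherwise on the remote one."""
    primary, peer = (local, remote) if local.online else (remote, local)
    if not primary.online:
        raise ConnectionError("no storage cluster available")
    primary.store(domid, data)
    if peer.online:
        peer.store(domid, data)          # synchronise the other cluster now
    else:
        primary.backlog.append(domid)    # synchronise it later
    return primary.name


def resynchronise(source, target):
    """Copy everything the target missed while it was off-line."""
    while source.backlog:
        domid = source.backlog.pop(0)
        target.store(domid, source.objects[domid])


site_a, site_b = Cluster("cluster-a"), Cluster("cluster-b")
first = ingest("obj-1", b"one", local=site_a, remote=site_b)    # normal case
site_a.online = False
second = ingest("obj-2", b"two", local=site_a, remote=site_b)   # local is down
site_a.online = True
resynchronise(source=site_b, target=site_a)
print(first, second, sorted(site_a.objects))
```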
In conclusion
We plan for generations of physical storage
Migration from one generation to the next
Allow changes of supplier
Purchase incrementally in modest quantities
Move quickly when required
Be cost conscious
We provide assurance that an object is held and re-presented as it was when ingested
We are designing a cost-effective large scale resilient solution
In summary: we take a long term view
Major Programmes/4
Web Archiving Programme
Structure of Programme
The Web Archiving Programme is a collaborative initiative, broadly divided between two consortia
UK Web Archiving Consortium
Developing a selective approach to web archiving, and procuring a common web archiving infrastructure and software so that archiving can begin at the earliest opportunity
International Internet Preservation Consortium
Developing advanced web archiving technologies for the long-term, large-scale, continuous crawling requirements enabled through legislation
UK Web Archiving Consortium
Developing a selective approach to web archiving
Licence for PANDAS about to be signed with NLA
Sub-licences with consortium partners and contractor to follow
ITT concluded with Magus Research winning the contract
Implement a common web archiving infrastructure (lots of Linux machines + PANDAS)
Provide customisation/development of PANDAS
Provide help desk and support
International Internet Preservation Consortium
Developing advanced web archiving technologies
Smart Crawler
Continuous adaptive crawler, adjusting crawl priority on the fly
Based on IA Heritrix
Working on requirements now
Expect to begin tender process in June
Content Management
Archival formats
Framework
Metrics and Test Bed
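As a rough illustration of "adjusting crawl priority on the fly" in the Smart Crawler item above, the sketch below keeps a crawl frontier as a priority queue whose entries can be re-prioritised while the crawl runs. It is not the Heritrix API or the consortium's design; `CrawlFrontier`, `schedule` and `next_url` are hypothetical names, and the priorities are arbitrary numbers.

```python
import heapq
import itertools


class CrawlFrontier:
    """Priority queue of URLs; a higher priority is crawled sooner."""

    def __init__(self):
        self._heap = []
        self._priority = {}                 # url -> current priority
        self._counter = itertools.count()   # tie-breaker keeps insertion order

    def schedule(self, url, priority):
        """Add a URL, or change the priority of one already queued."""
        self._priority[url] = priority
        heapq.heappush(self._heap, (-priority, next(self._counter), url))

    def next_url(self):
        """Pop the highest-priority URL, skipping superseded heap entries."""
        while self._heap:
            neg_priority, _, url = heapq.heappop(self._heap)
            if self._priority.get(url) == -neg_priority:
                del self._priority[url]
                return url
        return None


frontier = CrawlFrontier()
frontier.schedule("http://example.org/news", 1)
frontier.schedule("http://example.org/about", 2)
# The news page is observed to change often, so raise its priority mid-crawl.
frontier.schedule("http://example.org/news", 5)
order = [frontier.next_url(), frontier.next_url(), frontier.next_url()]
print(order)
```

Re-scheduling pushes a fresh heap entry and leaves the old one to be discarded when it surfaces, which avoids searching the heap on every priority change.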
External Collaboration
Digital Library Collaborations/Partnerships
Current
UK Digital Preservation Coalition
Founder Member
TEL (The European Library Project)
Web Archiving UK
JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library Of Wales
International Internet Preservation Consortium
BNF, Library Of Congress, Internet Archive, National Archives & Library Of Canada, National Library Of
Australia, National Library Of Italy, National Libraries Of Nordic Countries
JISC Funded - Digital Curation Centre
Persistent Identifiers
DOI Foundation, European National Libraries (KB & DDB)
Resource Discovery
Union Catalogues (SUNCAT)
Digital Library Federation
Digital Library Collaborations/Partnerships
Potential
Secure Legal Deposit Network
6 Legal Deposit Libraries
Global Digital Format Registry
Potential Partners (National Archives, DLF)
Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
KB (Netherlands National Library & Other Partners – FP6 Bid)
Digital Rights Management
Potential Partners (Publishers, JISC)
Metadata
Publishers, Others ?
Authentication
JISC ?
Resource Discovery
Search Engine Vendors, Researchers
Others ???
Conclusions
• Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success
• Can You Work With Us?
Slide 6
British Computer Society
North London Branch
Major Programmes
Richard Boulderstone
July 27, 2004
1
Agenda
•
•
•
•
•
•
•
•
•
The British Library
Vision
Our Audiences/Customers
ILS
Digitisation
Digital Object Management
Web Archiving
Collaboration
Conclusions
Magna Carta
www.bl.uk
2
What Is The British Library ?
Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of
Science and Invention (1855), National Central Library (1916), and National
Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India
Office Library and Records in 1982, and British Institute of Recorded Sound in
1983
• Flagship building at St Pancras - largest public building project in Great
Britain in 20th century - opened in 1998
•
www.bl.uk
3
World-Class Research Library
Key Statistics 2002/3
•
•
•
•
•
•
•
•
•
•
150 million items
8.2 million items consulted or supplied
408,000 reading room visits
618,000 catalogue records created
554,000 items received on legal deposit
651 km shelf capacity 92% full add 12 km each year
18.5M Web Site Hits (www.bl.uk)
2,400 staff
£85.2 million Grant in Aid and £27.0 million trading income in 2001/2
Annual report - http://www.bl.uk/about/annual/latest.html
www.bl.uk
4
‘The World’s Knowledge’
Outcome Based Vision
… by aiding scientific advances
… by adding commercial value for
businesses
… by contributing to UK “knowledge
economy”
… through the pursuit of academic
excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural
wellbeing
… by giving information relevant to their
interests
… by helping to find the next medical
breakthrough
… by creating a link between the past,
present and future
To help people
advance
knowledge to
enrich lives
www.bl.uk
5
High
R+D
Industries
Prof.
Services
Creative
Industries Publishing
Industries
Resource Discovery
Bespoke Services
Research Services
Document Supply
Reprographics
Innovation Centre
School
Students
Libraries Teachers 11>18
On-site Visits
School Tours
Web Learning
SMEs
EDUCATION
Lifelong
Learner
Exhibitions
Events
Tours
Publishing
Visitors
(child + adult)
Lifelong
Learner
PUBLIC
BUSINESS
Lifelong
Learner
Postgraduate/
Undergraduate
RESEARCHER
Librarians
Scholars
Lifelong
Learner
Reading Rooms
Bespoke Services
Reprographics
Publishing
Document Supply
Searching Tools
LIBRARIES
Commercial
Researcher
Public
Libraries
Broadcasting
e.g. BBC
Publishing
e.g. OED
Document Supply
Resource Discovery
Training
Best Practice
H.E.
Libraries
Public
www.bl.uk
6
2
Major Programmes/1
Da Vinci Notebook
Integrated Library System (ILS)
Programme
7
ILS: Development
Data migration
Due to finish in a few days
16M+ BL records
10M+ records from other sources
Online ILS software
All online changes made (mainly interfaces) – final tests
Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
Most ones done for go live
Rest in priority order
www.bl.uk
8
ILS: Implementation
Training
Courses to end-users well underway
‘Practice’ system available
‘Search only’ training also underway
Testing
Functional testing (end to end) nearly complete
Performance poor – OPAC very slow
Automated stress testing (LoadRunner scripts)
eIS trying to find area of problem
Ex Libris experts flying over
Some security ‘hardening’ needed
www.bl.uk
9
ILS: Cutover from legacy systems
Now:
Temporary Aleph cataloguing
Phase 1 – internal processing
Staggered take-on of users to ease cutover problems
Merge ‘temporary’ records
7 June:
Phase 2 – reading rooms
Reading rooms closed for cutover 26-29 June
Mainly brand-new PCs etc rather than XP upgrade
30 June:
Phase 3 – remote users
Could be delayed major problems
30 July:
www.bl.uk
10
www.bl.uk
11
Future ILS development (ILS/2)
Current ILS development seen just as the start
Extra records
E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
E.g. Preservation records
Links to other new BL systems
E.g. Digital Object Management (images, web pages etc)
New releases of Ex Libris packages
www.bl.uk
12
Major Programmes/2
International Dunhuang Project
Digitisation Programme
13
Background
Digitisation Is The Process Of Converting Existing Physical Items Into Digital
Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character
Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image to Sub £1/Image.
Automatic Devices To Digitise Are Becoming Availalble But Are Expensive ~£150,000.
BL Has Had Fairly Ad Hoc Approach Driven By
External Funding Opportunities
Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project
Management
BL Has Created About 1.5M Digital Images So Far…
www.bl.uk
14
Digitisation Strategy
Digitisation Strategy Project Was Formerly Initiated On February 2, 2004
Key objectives for the project are to define:
Selection Criteria
Uniform Approach
Communications Plan
Sustainability
Intellectual Property Rights
External Relationship Management
Funding
Integration with DOMS
www.bl.uk
15
Project Status Information
Definitive Register of Projects
19 Complete
19 Current
20 Planning
JISC Sound (3,900 Hours)
JISC Newspapers (2M Pages of 750M Pages)
Chopin (Collaborative Project)
Early English Books Online
www.bl.uk
16
Major Programmes/3
Gutenberg Bible
Digital Object Management (DOM)
Programme
17
DOM Programme vision
Our mission is to enable the United Kingdom to preserve
and use its digital intellectual property forever
Our vision is create a management system for digital objects
that will
store and preserve any type of digital material in perpetuity
provide access to this material to users with appropriate permissions
ensure that the material is easy to find
ensure that users can view the material with contemporary applications
ensure that users can, where possible, experience material with the original look-and-feel
www.bl.uk
18
Introduction - history
Digital Library PFI
Mar 1997 – Dec 1998
Digital Library System
1999 – early 2002
Lessons
DOM Report
Nov 2002
The DOM Programme
Started September 2003
www.bl.uk
19
Drivers for the BL DOM Programme
Legal deposit legislation for non-print material was granted royal assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12T of material per year (with 50 year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….
We need a generic and cost-effective approach for the secure long term
storage of digital material that is produced by numerous initiatives
www.bl.uk
20
DOM – many topics to address
[Chart: components plotted by estimated size of component (low to high) against component ambiguity / complexity (low to high)]
Components shown: ILS; Web Archiving; Digitisation Programme; LDEP; Workflow; Resource Discovery; LDLSE; VDEP; Rights Management; Metadata Definition; Technical Requirements; File Format Registry; Persistent Identifiers; File Conversion Utilities; Authentication; Prototypes; SDM; RADM; Interfaces; Strategy Development
Key: Started; Planned; Non-DOM projects; Planned co-operation
LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications
21
Scope - life cycle of objects
Collection: Selection, Acquisition, Accession, Description
Preservation: Storage, Preservation
Access: Resource discovery, Delivery, Rendering
22
Scope – objects and processes
Preservation store
Preserves the bit stream in perpetuity
Access store
Access versions
Limited formats – in the flavour of the era
Metadata to support resource discovery
Descriptive, Administrative, Links with existing tools e.g. Integrated
Library System (ILS)
Workflow
Ingest, e.g. Legal Deposit processing
23
[Diagram: DOM and its surrounding services]
Access: Resource Discovery, Delivery, Digital Rights Management
DOM shared services: Storage, Signing, Authentication, Metadata, Persistent ID
Ingest from: Donations; Document Supply; Legal Deposit (LDL Secure Environment, Legal Deposit Items, Legal Deposit Processing); Web Archiving; Digitisation (St Pancras Studios, Newspapers, NSA)
Other labels: Publishers, Archives, Grey Literature, Non-Serial, Store, Archiving, Operational Stores
24
Timeline
BC: ET approve Business Case & Timeline
R0 (Definition): Prototype will provide a basic preservation-quality digital object storage module
R1 (Operational Storage Sub System):
• Consolidate R0 into operational system
• Provide preservation-quality digital store for materials received under Voluntary Deposit of Electronic Publications (VDEP)
• Integrate it with the existing VDEP front-end
R2 (1st Content Stream ingest):
• Support ingest for a major content stream
• Integrate with core Library systems as required
R3 & R4 (LDEP - initial format): Provide functionality for material covered by LDEP secondary legislation
R5+: Open DOM to new projects
Time axis: 2003, 2004, 2005
25
DOM: Project definition - 1
Functional Architecture “What”
Prototyping – assessing market solutions
Example issues: digital rights, file formats, etc
Logical architecture “how – overall architecture”
Prototyping - basic functioning architecture
Example issues: allow changes to new suppliers, relationships to ILS, other projects etc
Physical architecture “how – storage & specifics”
Prototyping - principal solutions and options
Example issues: how do we build it cost-effectively today, supplier selection criteria
• Business case
• Planning – incremental implementation phases
Cross team workshops – reviewing progress, debating detailed technical
issues, planning immediate priorities, risk management & way forward
26
DOM: Project definition - 2
Approach is to be incremental and not ‘Big Bang’
We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a
good solution
A principal goal is to define:
An overall long term “logical architecture”
Within which, there will be successive generations of physical architectures
We are building an understanding of the storage marketplace, and we will use that knowledge to manage
procurement
We are certain that we will need >500 TB of storage but we are uncertain when, so we need
flexible, scalable procurement
27
DOM architecture - overview
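[Diagram: DOM architecture]
Client services: Resource Discovery (ILS-based RD, non-cat, others), Rights Management, LDEP, Doc supply
Between clients and the DOM Storage Service: DOMID, OBJECT
DOM Storage Service: compound objects/relations, integrity, authenticity, atomic objects, unique persistent identifier (DOMID)
Between the DOM Storage Service and DOM Physical Storage: local resource locator, object
DOMID is mapped to node/vol/LRL
DOM Physical Storage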
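The mapping noted in the diagram (a DOMID resolved to node/vol/LRL) can be sketched as a simple lookup table. This is a minimal illustration only: the class names, field names and the identifier format "domid:0001" are hypothetical and are not taken from the BL design.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Location:
    """Where one copy of an atomic object lives (field names are illustrative)."""
    node: str  # storage node within a cluster
    vol: str   # volume on that node
    lrl: str   # local resource locator, e.g. a path within the volume


class DomidResolver:
    """Maps a persistent DOMID to its current physical locations."""

    def __init__(self) -> None:
        self._table: Dict[str, List[Location]] = {}

    def register(self, domid: str, location: Location) -> None:
        self._table.setdefault(domid, []).append(location)

    def relocate(self, domid: str, old: Location, new: Location) -> None:
        # Storage can be migrated without changing the DOMID that clients hold.
        locations = self._table[domid]
        locations[locations.index(old)] = new

    def resolve(self, domid: str) -> List[Location]:
        return list(self._table.get(domid, []))


resolver = DomidResolver()
first = Location(node="node-a", vol="vol-01", lrl="objects/0001")
resolver.register("domid:0001", first)
resolver.relocate("domid:0001", first, Location("node-b", "vol-07", "objects/0001"))
print(resolver.resolve("domid:0001")[0].node)
```

Keeping the identifier separate from the location is what would let successive generations of physical storage be swapped in behind an unchanged DOMID.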
28
DOM System (release 3)
[Diagram labels: Aleph; Publishers; Mailroom; Access; Administration; Shared services; Storage subsystem (two); DOM System]
29
DOM logical architecture –
integrity and authenticity
Integrity:
System has capability to continuously monitor the object store to detect
object corruption
It would then initiate object recovery
Authenticity:
A process is defined to provide long-term assurance that an object that
is re-presented is as it was when it was ingested
Based on the use of cryptographic signing techniques
Each object is signed when it is ingested
The signature is verified when required
The signing mechanism is “tightly” controlled
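The sign-on-ingest, verify-on-demand and continuous-monitoring ideas above can be sketched as follows. The slide says only that "cryptographic signing techniques" are used, so this sketch assumes HMAC-SHA256 as a stand-in; the key, class and method names are hypothetical, and a real system might well use public-key signatures instead.

```python
import hashlib
import hmac
from typing import Dict, List, Tuple

# Hypothetical key; in practice it would be held by a tightly controlled signing service.
SIGNING_KEY = b"demo-key-held-by-the-signing-service"


def sign(data: bytes) -> str:
    """Return a keyed digest of the object (HMAC-SHA256 as a stand-in for signing)."""
    return hmac.new(SIGNING_KEY, data, hashlib.sha256).hexdigest()


def verify(data: bytes, signature: str) -> bool:
    """Check that the bytes still match the signature made at ingest."""
    return hmac.compare_digest(sign(data), signature)


class ObjectStore:
    def __init__(self) -> None:
        self._objects: Dict[str, Tuple[bytes, str]] = {}

    def ingest(self, object_id: str, data: bytes) -> None:
        # Each object is signed when it is ingested.
        self._objects[object_id] = (data, sign(data))

    def scan(self) -> List[str]:
        """Integrity monitoring: list IDs whose stored bytes no longer verify."""
        return [oid for oid, (data, sig) in self._objects.items()
                if not verify(data, sig)]

    def simulate_corruption(self, object_id: str) -> None:
        # Demo helper only: flip one bit in the stored bytes.
        data, sig = self._objects[object_id]
        self._objects[object_id] = (data[:-1] + bytes([data[-1] ^ 0x01]), sig)


store = ObjectStore()
store.ingest("obj-1", b"digitised master")
clean = store.scan()
store.simulate_corruption("obj-1")
damaged = store.scan()
print(clean, damaged)
```

An ID returned by `scan` would be the trigger for object recovery, for example by copying a good version from another cluster.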
30
Procuring physical storage in volume
A major cost is in physical storage
The market for storage systems is changing rapidly, and this implies that “lock-in” is
not sensible
We thus need flexibility to change supplier over time
Cost of storage is reducing by 30-40% per year
Hence procure on rolling basis just ahead of demand
Replace storage on a rolling basis on expiry of warranty
The rolling programmes imply the need to be able to support a heterogeneous
product solution
The design of the logical architecture thus supports storage sourced from multiple
storage vendors
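The case for buying "just ahead of demand" can be illustrated with some simple arithmetic. The figures here are assumptions chosen for illustration (100 TB needed each year for five years, a starting price of 1,000 per TB, and a 35% annual price fall, the midpoint of the 30-40% range above); replacement on warranty expiry and running costs are ignored.

```python
from typing import List


def upfront_cost(total_tb: float, price_per_tb: float) -> float:
    """Cost of buying all the capacity at today's price."""
    return total_tb * price_per_tb


def rolling_cost(yearly_tb: List[float], price_per_tb: float,
                 annual_decline: float) -> float:
    """Cost of buying each year's capacity at that year's (falling) price."""
    cost = 0.0
    for year, tb in enumerate(yearly_tb):
        cost += tb * price_per_tb * (1 - annual_decline) ** year
    return cost


demand = [100.0] * 5  # hypothetical: 100 TB of new capacity per year
all_at_once = upfront_cost(sum(demand), 1000.0)
year_by_year = rolling_cost(demand, 1000.0, 0.35)
print(f"up front: {all_at_once:,.0f}  rolling: {year_by_year:,.0f}")
```

Under these assumed figures the rolling purchase costs roughly half as much as buying everything in the first year, which is the effect the slide relies on.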
31
Disaster tolerance and the organisation
of storage clusters
One can obtain commercial disaster recovery (DR) solutions for common equipment
configurations
However, one cannot obtain such solutions for systems holding several hundred terabytes
So we must build DR into the design of the system
A single site solution, subject to a common-mode disaster, would suffer considerable loss
of availability after a disaster, and so is not acceptable
This implies that we need a multi-site solution
Conventionally these are based on a master-standby where only 50% of kit is delivering
normal service
Our design is based on the use of multiple autonomous independent peer clusters that
cross-synchronise
so 100% of the kit delivers normal service
32
DOM architecture in the context of the
storage solution market
The dominant segment of the market focuses on delivering performance within
a highly resilient single cluster
However:
Many of our objects will be rarely accessed
so we do not want to pay for “maximised” performance we do not need
We have resilience by using multiple clusters, hence we have a reduced
need for resilience within a cluster
so we do not want to pay for “maximised” resilience we do not need
We are using these drivers to design a cost-effective large scale resilient
solution
33
DOM storage subsystem architecture - overview
[Diagram: two storage clusters, each with a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage; both use DOM Shared Services]
DOM Shared Services:
• Unique ID
• Signing
• Logging
34
DOM storage subsystem architecture - access
Normal access/delivery is from local storage cluster
[Diagram: two storage clusters, each with a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, plus DOM shared central services (Unique ID, Signing, Logging)]
35
DOM storage subsystem architecture - access
When a cluster is off-line then access/delivery is from a remote storage cluster
[Diagram: two storage clusters, each with a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, plus DOM shared central services (Unique ID, Signing, Logging)]
36
DOM storage subsystem architecture - ingest
Normal ingest is to the local storage cluster and then the remote cluster is synchronised
[Diagram: two storage clusters, each with a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, plus DOM shared central services (Unique ID, Signing, Logging); arrows labelled "Store" and "Synchronise remote store"]
37
DOM storage subsystem architecture - ingest
When a cluster is off-line then ingest is managed by the remote storage cluster and the local cluster is synchronised later
[Diagram: two storage clusters, each with a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, plus DOM shared central services (Unique ID, Signing, Logging); arrows labelled "Store" and "Synchronise remote store later"]
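The four access and ingest cases on these slides can be sketched with two peer clusters and a gateway that prefers its local cluster. This is a toy model under stated assumptions: the class and method names are hypothetical, objects are held in memory, and synchronisation is a plain copy, none of which is taken from the BL design.

```python
from typing import Dict, List, Optional


class Cluster:
    def __init__(self, name: str) -> None:
        self.name = name
        self.online = True
        self.objects: Dict[str, bytes] = {}
        self.pending: List[str] = []  # object IDs still to be copied to the peer
        self.peer: Optional["Cluster"] = None

    def ingest(self, object_id: str, data: bytes) -> None:
        self.objects[object_id] = data
        self.pending.append(object_id)
        self.synchronise()

    def synchronise(self) -> None:
        """Copy pending objects to the peer if reachable; otherwise keep them queued."""
        if self.peer is None or not self.peer.online:
            return
        for object_id in self.pending:
            self.peer.objects[object_id] = self.objects[object_id]
        self.pending.clear()


class Gateway:
    """Routes requests to the local cluster, falling back to the remote peer."""

    def __init__(self, local: Cluster, remote: Cluster) -> None:
        self.local = local
        self.remote = remote

    def _target(self) -> Cluster:
        if self.local.online:
            return self.local
        if self.remote.online:
            return self.remote
        raise RuntimeError("no storage cluster available")

    def ingest(self, object_id: str, data: bytes) -> None:
        self._target().ingest(object_id, data)

    def fetch(self, object_id: str) -> bytes:
        return self._target().objects[object_id]


north, south = Cluster("north"), Cluster("south")
north.peer, south.peer = south, north
gateway = Gateway(local=north, remote=south)

gateway.ingest("obj-1", b"one")        # normal ingest: local store, then remote sync
north.online = False                   # local cluster goes off-line
gateway.ingest("obj-2", b"two")        # ingest is handled by the remote cluster
served_remotely = gateway.fetch("obj-1")
north.online = True
south.synchronise()                    # local cluster is synchronised later
print(sorted(north.objects))
```

Because both clusters accept reads and writes in normal running, all of the kit delivers service, in contrast with a master-standby pair.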
38
In conclusion
We plan for generations of physical storage
Migration from one generation to the next
Allow changes of supplier
Purchase incrementally in modest quantities
Move quickly when required
Be cost conscious
We provide assurance that an object is held and re-presented as when it was
ingested
We are designing a cost-effective large scale resilient solution
In summary: we take a long term view
39
Major Programmes/4
Web Archiving Programme
40
Structure of Programme
Web Archiving Programme is a collaborative
initiative, implemented broadly across two
consortia
UK Web Archiving Consortium
Developing a selective approach to web archiving, procuring a common
web archiving infrastructure and software to begin archiving activities at
the earliest opportunity
International Internet Preservation Consortium
Developing advanced web archiving technologies for the long-term,
large-scale, continuous crawling requirements enabled through
legislation
41
UK Web Archiving Consortium
Developing a selective approach to web archiving
License for PANDAS about to be signed with NLA
Sub-licenses with consortium partners and contractor to follow
ITT concluded with Magus Research winning the contract.
Implement a common web archiving infrastructure (lots of Linux
machines + PANDAS)
Provide customisation/development of PANDAS
Provide help desk and support
42
International Internet Preservation
Consortium
Developing advanced web archiving technologies
Smart Crawler
Continuous adaptive crawler, adjusting crawl priority on the fly
Based on IA Heritrix
Working on requirements now
Expect to begin tender process in June
Content Management
Archival formats
Framework
Metrics and Test Bed
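The idea of a crawler "adjusting crawl priority on the fly" can be sketched with a priority queue whose entries may be re-prioritised while the crawl is running. This is a generic illustration, not the Heritrix API or the Smart Crawler design; the class name, method names and example URLs are hypothetical.

```python
import heapq
import itertools
from typing import Dict, List, Optional, Tuple


class CrawlFrontier:
    """Queue of URLs to crawl; lower priority numbers are crawled sooner."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, str]] = []
        self._priority: Dict[str, float] = {}
        self._counter = itertools.count()  # tie-breaker keeps insertion order

    def schedule(self, url: str, priority: float) -> None:
        """Add a URL, or change the priority of one already queued."""
        self._priority[url] = priority
        heapq.heappush(self._heap, (priority, next(self._counter), url))

    def next_url(self) -> Optional[str]:
        """Pop the most urgent URL, skipping entries made stale by rescheduling."""
        while self._heap:
            priority, _, url = heapq.heappop(self._heap)
            if self._priority.get(url) == priority:
                del self._priority[url]
                return url
        return None


frontier = CrawlFrontier()
frontier.schedule("http://example.org/news", 5.0)
frontier.schedule("http://example.org/archive", 9.0)
# Suppose the archive page is found to change often: raise its urgency mid-crawl.
frontier.schedule("http://example.org/archive", 1.0)
order = [frontier.next_url(), frontier.next_url(), frontier.next_url()]
print(order)
```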
43
External Collaboration
44
Digital Library Collaborations/Partnerships
Current
UK Digital Preservation Coalition
Founder Member
TEL (The European Library Project)
Web Archiving UK
JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library Of Wales
International Internet Preservation Consortium
BNF, Library Of Congress, Internet Archive, National Archives & Library Of Canada, National Library Of
Australia, National Library Of Italy, National Libraries Of Nordic Countries
JISC Funded - Digital Curation Centre
Persistent Identifiers
DOI Foundation, European National Libraries (KB & DDB)
Resource Discovery
Union Catalogues (SUNCAT)
Digital Library Federation
45
Digital Library Collaborations/Partnerships
Potential
Secure Legal Deposit Network
6 Legal Deposit Libraries
Global Digital Format Registry
Potential Partners (National Archives, DLF)
Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
KB (Netherlands National Library & Other Partners – FP6 Bid)
Digital Rights Management
Potential Partners (Publishers, JISC)
Metadata
Publishers, Others ?
Authentication
JISC ?
Resource Discovery
Search Engine Vendors, Researchers
Others ???
46
Conclusions
• Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success
• Can You Work With Us?
47
Slide 8
British Computer Society
North London Branch
Major Programmes
Richard Boulderstone
July 27, 2004
1
Agenda
•
•
•
•
•
•
•
•
•
The British Library
Vision
Our Audiences/Customers
ILS
Digitisation
Digital Object Management
Web Archiving
Collaboration
Conclusions
Magna Carta
www.bl.uk
2
What Is The British Library ?
Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of
Science and Invention (1855), National Central Library (1916), and National
Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India
Office Library and Records in 1982, and British Institute of Recorded Sound in
1983
• Flagship building at St Pancras - largest public building project in Great
Britain in 20th century - opened in 1998
•
www.bl.uk
3
World-Class Research Library
Key Statistics 2002/3
•
•
•
•
•
•
•
•
•
•
150 million items
8.2 million items consulted or supplied
408,000 reading room visits
618,000 catalogue records created
554,000 items received on legal deposit
651 km shelf capacity 92% full add 12 km each year
18.5M Web Site Hits (www.bl.uk)
2,400 staff
£85.2 million Grant in Aid and £27.0 million trading income in 2001/2
Annual report - http://www.bl.uk/about/annual/latest.html
www.bl.uk
4
‘The World’s Knowledge’
Outcome Based Vision
… by aiding scientific advances
… by adding commercial value for
businesses
… by contributing to UK “knowledge
economy”
… through the pursuit of academic
excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural
wellbeing
… by giving information relevant to their
interests
… by helping to find the next medical
breakthrough
… by creating a link between the past,
present and future
To help people
advance
knowledge to
enrich lives
www.bl.uk
5
[Diagram: Our Audiences/Customers. Five segments: BUSINESS, EDUCATION, PUBLIC, RESEARCHER, LIBRARIES]
Audiences shown: High R+D Industries, Prof. Services, Creative Industries, Publishing Industries, SMEs, School Libraries, Students, Teachers, 11>18, Visitors (child + adult), Postgraduate/Undergraduate, Librarians, Scholars, Commercial Researcher, Public Libraries, H.E. Libraries, Broadcasting (e.g. BBC), Publishing (e.g. OED), Lifelong Learner (shown four times)
Services shown: Resource Discovery, Bespoke Services, Research Services, Document Supply, Reprographics, Innovation Centre, On-site Visits, School Tours, Web Learning, Exhibitions, Events, Tours, Publishing, Reading Rooms, Searching Tools, Training, Best Practice
6
2
Major Programmes/1
Integrated Library System (ILS) Programme
[Image: Da Vinci Notebook]
7
ILS: Development
Data migration
Due to finish in a few days
16M+ BL records
10M+ records from other sources
Online ILS software
All online changes made (mainly interfaces) – final tests
Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
Most of those needed for go-live are done
The rest will follow in priority order
8
ILS: Implementation
Training
Courses to end-users well underway
‘Practice’ system available
‘Search only’ training also underway
Testing
Functional testing (end to end) nearly complete
Performance poor – OPAC very slow
Automated stress testing (LoadRunner scripts)
eIS trying to find area of problem
Ex Libris experts flying over
Some security ‘hardening’ needed
9
ILS: Cutover from legacy systems
Now:
Temporary Aleph cataloguing
7 June:
Phase 1 – internal processing
Staggered take-on of users to ease cutover problems
Merge ‘temporary’ records
30 June:
Phase 2 – reading rooms
Reading rooms closed for cutover 26-29 June
Mainly brand-new PCs etc rather than XP upgrade
30 July:
Phase 3 – remote users
Could be delayed if there are major problems
10
11
Future ILS development (ILS/2)
Current ILS development seen just as the start
Extra records
E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
E.g. Preservation records
Links to other new BL systems
E.g. Digital Object Management (images, web pages etc)
New releases of Ex Libris packages
12
Major Programmes/2
Digitisation Programme
[Image: International Dunhuang Project]
13
Background
Digitisation Is The Process Of Converting Existing Physical Items Into Digital Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image To Sub £1/Image.
Automatic Devices To Digitise Are Becoming Available But Are Expensive (~£150,000).
BL Has Had A Fairly Ad Hoc Approach, Driven By:
External Funding Opportunities
Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project Management
BL Has Created About 1.5M Digital Images So Far…
14
Digitisation Strategy
Digitisation Strategy Project Was Formally Initiated On February 2, 2004
Key objectives for the project are to define:
Selection Criteria
Uniform Approach
Communications Plan
Sustainability
Intellectual Property Rights
External Relationship Management
Funding
Integration with DOMS
15
Project Status Information
Definitive Register of Projects
19 Complete
19 Current
20 Planning
JISC Sound (3,900 Hours)
JISC Newspapers (2M Pages of 750M Pages)
Chopin (Collaborative Project)
Early English Books Online
16
Major Programmes/3
Digital Object Management (DOM) Programme
[Image: Gutenberg Bible]
17
DOM Programme vision
Our mission is to enable the United Kingdom to preserve and use its digital intellectual property forever
Our vision is to create a management system for digital objects that will
store and preserve any type of digital material in perpetuity
provide access to this material to users with appropriate permissions
ensure that the material is easy to find
ensure that users can view the material with contemporary applications
ensure that users can, where possible, experience material with the original look-and-feel
18
Introduction - history
Digital Library PFI
Mar 1997 – Dec 1998
Digital Library System
1999 – early 2002
Lessons
DOM Report
Nov 2002
The DOM Programme
Started September 2003
19
Drivers for the BL DOM Programme
Legal deposit legislation for non-print material was granted royal assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12T of material per year (with 50 year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….
We need a generic and cost-effective approach for the secure long term
storage of digital material that is produced by numerous initiatives
20
DOM – many topics to address
[Chart: components plotted by ESTIMATED SIZE OF COMPONENT (LOW to HIGH) against COMPONENT AMBIGUITY / COMPLEXITY (LOW to HIGH)]
Legend: Started, Planned, Non-DOM projects, Planned co-operation
Components shown: ILS, WEB ARCHIVING, DIGITISATION PROGRAMME, LDEP, WORKFLOW, RESOURCE DISCOVERY, LDLSE, VDEP, RIGHTS MANAGEMENT, METADATA DEFINITION, TECHNICAL REQUIREMENTS, FILE FORMAT REGISTRY, PERSISTENT IDENTIFIERS, FILE CONVERSION UTILITIES, AUTHENTICATION, PROTOTYPES, SDM, RADM, INTERFACES, STRATEGY DEVELOPMENT
LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications
21
Scope - life cycle of objects
Collection
Selection
Acquisition
Accession
Description
Preservation
Storage
Preservation
Access
Resource discovery
Delivery
Rendering
22
Scope – objects and processes
Preservation store
Preserves the bit stream in perpetuity
Access store
Access versions
Limited formats – in the flavour of the era
Metadata to support resource discovery
Descriptive, Administrative, Links with existing tools e.g. Integrated Library System (ILS)
Workflow
Ingest, e.g. Legal Deposit processing
23
[Diagram: DOM and the content streams that feed it]
DOM ACCESS: Resource Discovery, Delivery, Digital Rights Management
DOM Shared services: Storage, Signing, Authentication, Metadata, Persistent ID
DOM Ingest
Content streams shown: DONATIONS; DOCUMENT SUPPLY (Publishers, Archives, Grey Literature, Non-Serial Store, Archiving, Operational Stores); LEGAL DEPOSIT (LDL Secure Environment, Legal Deposit Items, Legal Deposit Processing); WEB ARCHIVING; DIGITISATION (St Pancras Studios, Newspapers, NSA)
24
Timeline
[Timeline chart, year axis 2003, 2004, 2005; milestones BC, R0, R1, R2, R3, R4]
BC: ET approve Business Case & Timeline
R0: Definition
Prototype will provide a basic preservation-quality digital object storage module
R1: Operat’l Storage Sub System
•Consolidate R0 into operational system
•Provide preservation-quality digital store for materials received under Voluntary Deposit of Electronic Publications (VDEP)
•Integrate it with the existing VDEP front-end
R2: 1st Content Stream ingest
•Support ingest for a major content stream
•Integrate with core Library systems as required
R3 & R4: LDEP - initial format
Provide functionality for material covered by LDEP secondary legislation
R5+: Open DOM to new projects
25
DOM: Project definition - 1
Functional Architecture “What”
Prototyping – assessing market solutions
Example issues: digital rights, file formats, etc
Logical architecture “how – overall architecture”
Prototyping - basic functioning architecture
Example issues: allow changes to new suppliers, relationships to ILS, other projects etc
Physical architecture “how – storage & specifics”
Prototyping - principal solutions and options
Example issues: how do we build it cost-effectively today, supplier selection criteria
• Business case
• Planning – incremental implementation phases
Cross team workshops – reviewing progress, debating detailed technical issues, planning immediate priorities, risk management & way forward
26
DOM: Project definition - 2
Approach is to be incremental and not ‘Big Bang’
We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a good solution
A principal goal is to define:
An overall long term “logical architecture”
Within which, there will be successive generations of physical architectures
We are building an understanding of the storage marketplace, and we will use that knowledge to manage procurement
We are certain that we will need >500T of storage but we are uncertain when – we thus need flexible, scalable procurement
27
DOM architecture - overview
[Diagram: DOM architecture layers]
Users of the store: Resource Discovery (ILS based RD, Non-cat, Others), Rights Management, LDEP, Doc supply
These refer to an OBJECT by its DOMID
DOM Storage Service: Compound objects/relations, Atomic Objects, Integrity, Authenticity, Unique persistent identifier (DOMID), Local resource locator
DOMID is mapped to node/vol/LRL
DOM Physical Storage
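The indirection shown on this slide, where a persistent DOMID is mapped to node/vol/LRL, might be sketched as follows. This is only an illustration under stated assumptions, not the BL design: the `Location` fields, the `DomidResolver` class and the sample identifiers are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    """Hypothetical physical location: storage node, volume, local resource locator."""
    node: str
    vol: str
    lrl: str


class DomidResolver:
    """Maps a persistent DOMID to its current physical locations (assumed design)."""

    def __init__(self) -> None:
        self._table: dict[str, list[Location]] = {}

    def register(self, domid: str, location: Location) -> None:
        # Record one more physical copy of the object.
        self._table.setdefault(domid, []).append(location)

    def migrate(self, domid: str, old: Location, new: Location) -> None:
        # The DOMID stays fixed; only the physical mapping changes.
        locations = self._table[domid]
        locations[locations.index(old)] = new

    def resolve(self, domid: str) -> list[Location]:
        # Return a copy so callers cannot alter the mapping table.
        return list(self._table.get(domid, []))
```

Because callers only ever hold the DOMID, an object can be moved to a new generation of physical storage without any catalogue record changing.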
28
DOM System (release 3)
[Diagram: DOM System, comprising two Storage subsystems, Access, Shared services and Administration, with Aleph, Mailroom and Publishers shown as connected parties]
29
DOM logical architecture – integrity and authenticity
Integrity:
System has capability to continuously monitor the object store to detect object corruption
It would then initiate object recovery
Authenticity:
A process is defined to provide long-term assurance that an object that is re-presented is as it was when it was ingested
Based on the use of cryptographic signing techniques
Each object is signed when it is ingested
The signature is verified when required
The signing mechanism is “tightly” controlled
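The sign-at-ingest, verify-on-demand pattern described above can be sketched in a few lines. The slides do not name a signing scheme, so this sketch swaps in HMAC-SHA256 from the Python standard library as a stand-in; a production system would more plausibly use asymmetric signatures with the key held by a tightly controlled signing service. The `ObjectStore` class and its method names are hypothetical.

```python
import hashlib
import hmac


class ObjectStore:
    """Toy in-memory store in which each object is signed once, at ingest."""

    def __init__(self, signing_key: bytes) -> None:
        self._key = signing_key   # assumed to be held by a controlled signing service
        self._objects = {}        # DOMID -> object bytes
        self._signatures = {}     # DOMID -> signature recorded at ingest

    def _sign(self, data: bytes) -> bytes:
        # HMAC-SHA256 is an illustrative stand-in for the real signing technique.
        return hmac.new(self._key, data, hashlib.sha256).digest()

    def ingest(self, domid: str, data: bytes) -> None:
        self._objects[domid] = data
        self._signatures[domid] = self._sign(data)

    def verify(self, domid: str) -> bool:
        # Authenticity: is the object still as it was when it was ingested?
        expected = self._signatures[domid]
        actual = self._sign(self._objects[domid])
        return hmac.compare_digest(expected, actual)

    def scan(self) -> list:
        # Integrity monitoring: list the objects that no longer verify.
        return [domid for domid in self._objects if not self.verify(domid)]
```

Anything returned by `scan()` would be a candidate for recovery from another copy; that recovery step is left out of the sketch.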
30
Procuring physical storage in volume
A major cost is in physical storage
The market for storage systems is changing rapidly, and this implies that “lock-in” is not sensible
We thus need flexibility to change supplier over time
Cost of storage is reducing by 30-40% per year
Hence procure on rolling basis just ahead of demand
Replace storage on a rolling basis on expiry of warranty
The rolling programmes imply the need to be able to support a heterogeneous product solution
The design of the logical architecture thus supports storage sourced from multiple storage vendors
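The arithmetic behind procuring just ahead of demand can be illustrated with a short calculation. All figures here are assumptions chosen for illustration, not BL numbers: a starting price of £1,000 per TB, a 35% annual price decline (the midpoint of the 30-40% range quoted above), and demand of 100 TB per year for five years. Replacement on expiry of warranty is ignored.

```python
def upfront_cost(total_tb: float, price_per_tb: float) -> float:
    """Cost of buying all capacity in year 0 at today's price."""
    return total_tb * price_per_tb


def rolling_cost(yearly_tb: list, price_per_tb: float, annual_decline: float) -> float:
    """Cost of buying each year's capacity at that year's (falling) price."""
    total = 0.0
    for year, tb in enumerate(yearly_tb):
        # Price in a given year is the starting price compounded down by the decline.
        total += tb * price_per_tb * (1.0 - annual_decline) ** year
    return total


demand = [100.0] * 5  # hypothetical: 100 TB needed in each of five years
print("all up front:", upfront_cost(sum(demand), 1000.0))
print("rolling:     ", round(rolling_cost(demand, 1000.0, 0.35)))
```

Under these assumptions the rolling purchase costs roughly half as much as buying 500 TB on day one, which is the case for flexible, scalable procurement.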
31
Disaster tolerance and the organisation of storage clusters
One can obtain commercial disaster recovery (DR) solutions for common equipment configurations
However one cannot obtain such solutions for multi-100 TB systems
So we must build DR into the design of the system
A single site solution, subject to a common-mode disaster, would suffer considerable loss of availability after a disaster, and so is not acceptable
This implies that we need a multi-site solution
Conventionally these are based on a master-standby arrangement, where only 50% of kit is delivering normal service
Our design is based on the use of multiple autonomous independent peer clusters that cross-synchronise
so 100% of the kit delivers normal service
32
DOM architecture in the context of the storage solution market
The dominant segment of the market focuses on delivering performance within a highly resilient single cluster
However:
Many of our objects will be rarely accessed
so we do not want to pay for “maximised” performance we do not need
We have resilience by using multiple clusters, hence we have a reduced need for resilience within a cluster
so we do not want to pay for “maximised” resilience we do not need
We are using these drivers to design a cost-effective large scale resilient solution
33
DOM storage subsystem architecture - overview
[Diagram: two storage clusters, each consisting of a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, together with DOM Shared Services]
DOM Shared Services:
• Unique ID
• Signing
• Logging
34
DOM storage subsystem architecture - access
Normal access/delivery is from local storage cluster
[Diagram: two storage clusters, each consisting of a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, together with DOM shared/central services (Unique ID, Signing, Logging)]
35
DOM storage subsystem architecture - access
When a cluster is off-line then access/delivery is from a remote storage cluster
[Diagram: two storage clusters, each consisting of a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, together with DOM shared/central services (Unique ID, Signing, Logging)]
36
DOM storage subsystem architecture - ingest
Normal ingest is to the local storage cluster and then the remote cluster is synchronised
[Diagram: two storage clusters, each consisting of a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, together with DOM shared/central services (Unique ID, Signing, Logging). Arrows labelled “Store” and “Synchronise remote store”]
37
DOM storage subsystem architecture - ingest
When a cluster is off-line then ingest is managed by the remote storage cluster and the local cluster is synchronised later
[Diagram: two storage clusters, each consisting of a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, together with DOM shared/central services (Unique ID, Signing, Logging). Arrows labelled “Store” and “Synchronise remote store later”]
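The access and ingest behaviour on the last four slides (serve from the local cluster, fall back to the remote peer, and catch up an off-line cluster later) can be sketched as below. This is a simplified illustration, not the DOM implementation: the `Cluster`, `Gateway` and `resynchronise` names are hypothetical, there are only two clusters, and synchronisation happens inline rather than in the background.

```python
class Cluster:
    """One autonomous storage cluster (storage service plus physical storage)."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.objects = {}   # DOMID -> object bytes
        self.pending = []   # writes owed to this cluster while it was off-line


class Gateway:
    """Routes requests to the local cluster, falling back to the remote peer."""

    def __init__(self, local, remote):
        self.local = local
        self.remote = remote

    def ingest(self, domid, data):
        # Prefer the local cluster; use the remote peer if the local one is off-line.
        if self.local.online:
            primary, peer = self.local, self.remote
        else:
            primary, peer = self.remote, self.local
        if not primary.online:
            raise RuntimeError("no storage cluster available")
        primary.objects[domid] = data
        if peer.online:
            peer.objects[domid] = data            # synchronise the peer now
        else:
            peer.pending.append((domid, data))    # synchronise the peer later
        return primary.name

    def fetch(self, domid):
        # Normal delivery is local; the remote cluster is only the fallback.
        for cluster in (self.local, self.remote):
            if cluster.online and domid in cluster.objects:
                return cluster.objects[domid]
        raise KeyError(domid)


def resynchronise(cluster):
    """Bring a cluster back on-line and apply the writes it missed."""
    cluster.online = True
    for domid, data in cluster.pending:
        cluster.objects[domid] = data
    cluster.pending.clear()
```

Because both clusters accept ingest and serve delivery in normal operation, neither sits idle as a standby, which is the "100% of the kit" point made on the disaster tolerance slide.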
38
In conclusion
We plan for generations of physical storage
Migration from one generation to the next
Allow changes of supplier
Purchase incrementally in modest quantities
Move quickly when required
Be cost conscious
We provide assurance that an object is held and re-presented as it was when it was ingested
We are designing a cost-effective large scale resilient solution
In summary: we take a long term view
39
Major Programmes/4
Web Archiving Programme
40
Structure of Programme
Web Archiving Programme is a collaborative initiative, broadly implemented across two consortia
UK Web Archiving Consortium
Developing a selective approach to web archiving, procuring a common web archiving infrastructure and software to begin archiving activities as early as possible
International Internet Preservation Consortium
Developing advanced web archiving technologies for the long-term, large scale, continuous crawling requirements enabled through legislation
41
UK Web Archiving Consortium
Developing a selective approach to web archiving
License for PANDAS about to be signed with NLA
Sub-licenses with consortium partners and contractor to follow
ITT concluded with Magus Research winning the contract.
Implement a common web archiving infrastructure (lots of Linux machines + PANDAS)
Provide customisation/development of PANDAS
Provide help desk and support
42
International Internet Preservation Consortium
Developing advanced web archiving technologies
Smart Crawler
Continuous adaptive crawler, adjusting crawl priority on the fly
Based on IA Heritrix
Working on requirements now
Expect to begin tender process in June
Content Management
Archival formats
Framework
Metrics and Test Bed
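One way to read "continuous adaptive crawler, adjusting crawl priority on the fly" is a scheduler that revisits a page sooner when it is seen to change and later when it is not. The sketch below illustrates that idea only; it is not how Heritrix or the planned Smart Crawler works, and the `AdaptiveScheduler` class, the halve-or-double rule and the interval bounds are all assumptions.

```python
import hashlib
import heapq


class AdaptiveScheduler:
    """Revisit queue whose per-URL interval adapts to observed change."""

    MIN_INTERVAL = 1.0    # assumed lower bound, in arbitrary time units
    MAX_INTERVAL = 64.0   # assumed upper bound

    def __init__(self):
        self._heap = []       # (next_due_time, url), earliest first
        self._interval = {}   # url -> current revisit interval
        self._digest = {}     # url -> hash of the content last seen

    def add(self, url, now=0.0, interval=8.0):
        self._interval[url] = interval
        heapq.heappush(self._heap, (now, url))

    def next_url(self):
        # Pop the URL that is due soonest; returns (due_time, url).
        return heapq.heappop(self._heap)

    def record_visit(self, url, content, now):
        digest = hashlib.sha256(content).hexdigest()
        changed = self._digest.get(url) != digest
        self._digest[url] = digest
        interval = self._interval[url]
        # Changed pages are revisited sooner, unchanged pages later.
        interval = interval / 2 if changed else interval * 2
        interval = max(self.MIN_INTERVAL, min(self.MAX_INTERVAL, interval))
        self._interval[url] = interval
        heapq.heappush(self._heap, (now + interval, url))
        return interval
```

A first visit always counts as a change, since there is no earlier content to compare against.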
43
External Collaboration
44
Digital Library Collaborations/Partnerships
Current
UK Digital Preservation Coalition
Founder Member
TEL (The European Library Project)
Web Archiving UK
JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library Of Wales
International Internet Preservation Consortium
BNF, Library Of Congress, Internet Archive, National Archives & Library Of Canada, National Library Of Australia, National Library Of Italy, National Libraries Of Nordic Countries
JISC Funded - Digital Curation Centre
Persistent Identifiers
DOI Foundation, European National Libraries (KB & DDB)
Resource Discovery
Union Catalogues (SUNCAT)
Digital Library Federation
45
Digital Library Collaborations/Partnerships
Potential
Secure Legal Deposit Network
6 Legal Deposit Libraries
Global Digital Format Registry
Potential Partners (National Archives, DLF)
Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
KB (Netherlands National Library & Other Partners – FP6 Bid)
Digital Rights Management
Potential Partners (Publishers, JISC)
Metadata
Publishers, Others ?
Authentication
JISC ?
Resource Discovery
Search Engine Vendors, Researchers
Others ???
46
Conclusions
• Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success
• Can You Work With Us?
47
Publishers, Others ?
Authentication
JISC ?
Resource Discovery
Search Engine Vendors, Researchers
Others ???
www.bl.uk
46
Conclusions
Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success
•
•
Can You Work With Us?
www.bl.uk
47
Slide 10
British Computer Society
North London Branch
Major Programmes
Richard Boulderstone
July 27, 2004
1
Agenda
• The British Library
• Vision
• Our Audiences/Customers
• ILS
• Digitisation
• Digital Object Management
• Web Archiving
• Collaboration
• Conclusions
Magna Carta
2
What Is The British Library?
• Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of Science and Invention (1855), National Central Library (1916), and National Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India Office Library and Records in 1982, and British Institute of Recorded Sound in 1983
• Flagship building at St Pancras - largest public building project in Great Britain in 20th century - opened in 1998
3
World-Class Research Library
Key Statistics 2002/3
• 150 million items
• 8.2 million items consulted or supplied
• 408,000 reading room visits
• 618,000 catalogue records created
• 554,000 items received on legal deposit
• 651 km shelf capacity, 92% full, adding 12 km each year
• 18.5M Web Site Hits (www.bl.uk)
• 2,400 staff
• £85.2 million Grant in Aid and £27.0 million trading income in 2001/2
• Annual report - http://www.bl.uk/about/annual/latest.html
4
‘The World’s Knowledge’
Outcome Based Vision
… by aiding scientific advances
… by adding commercial value for businesses
… by contributing to UK “knowledge economy”
… through the pursuit of academic excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural wellbeing
… by giving information relevant to their interests
… by helping to find the next medical breakthrough
… by creating a link between the past, present and future
To help people advance knowledge to enrich lives
5
[Diagram: audiences and the services offered to them]
Audience groups: BUSINESS, EDUCATION, PUBLIC, RESEARCHER, LIBRARIES
Audiences named: High R+D Industries; Prof. Services; Creative Industries; Publishing Industries; SMEs; School Students; Libraries; Teachers 11>18; Lifelong Learner; Visitors (child + adult); Postgraduate/Undergraduate; Librarians; Scholars; Commercial Researcher; Public Libraries; H.E. Libraries; Broadcasting (e.g. BBC); Publishing (e.g. OED); Public
Services named: Resource Discovery; Bespoke Services; Research Services; Document Supply; Reprographics; Innovation Centre; On-site Visits; School Tours; Web Learning; Exhibitions; Events; Tours; Publishing; Reading Rooms; Searching Tools; Training; Best Practice
6
Major Programmes/1
Da Vinci Notebook
Integrated Library System (ILS)
Programme
7
ILS: Development
Data migration
Due to finish in a few days
16M+ BL records
10M+ records from other sources
Online ILS software
All online changes made (mainly interfaces) – final tests
Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
Most done for go-live
Rest to follow in priority order
8
ILS: Implementation
Training
Courses to end-users well underway
‘Practice’ system available
‘Search only’ training also underway
Testing
Functional testing (end to end) nearly complete
Performance poor – OPAC very slow
Automated stress testing (LoadRunner scripts)
eIS trying to find area of problem
Ex Libris experts flying over
Some security ‘hardening’ needed
9
ILS: Cutover from legacy systems
Now:
Temporary Aleph cataloguing
7 June:
Phase 1 – internal processing
Staggered take-on of users to ease cutover problems
Merge ‘temporary’ records
30 June:
Phase 2 – reading rooms
Reading rooms closed for cutover 26-29 June
Mainly brand-new PCs etc rather than XP upgrade
30 July:
Phase 3 – remote users
Could be delayed by major problems
10
11
Future ILS development (ILS/2)
Current ILS development seen just as the start
Extra records
E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
E.g. Preservation records
Links to other new BL systems
E.g. Digital Object Management (images, web pages etc)
New releases of Ex Libris packages
12
Major Programmes/2
International Dunhuang Project
Digitisation Programme
13
Background
Digitisation Is The Process Of Converting Existing Physical Items Into Digital
Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character
Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image to Sub £1/Image.
Automatic Devices To Digitise Are Becoming Available But Are Expensive (~£150,000).
BL Has Had Fairly Ad Hoc Approach Driven By
External Funding Opportunities
Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project
Management
BL Has Created About 1.5M Digital Images So Far…
14
Digitisation Strategy
Digitisation Strategy Project Was Formally Initiated On February 2, 2004
Key objectives for the project are to define:
Selection Criteria
Uniform Approach
Communications Plan
Sustainability
Intellectual Property Rights
External Relationship Management
Funding
Integration with DOMS
15
Project Status Information
Definitive Register of Projects
19 Complete
19 Current
20 Planning
JISC Sound (3,900 Hours)
JISC Newspapers (2M Pages of 750M Pages)
Chopin (Collaborative Project)
Early English Books Online
16
Major Programmes/3
Gutenberg Bible
Digital Object Management (DOM)
Programme
17
DOM Programme vision
Our mission is to enable the United Kingdom to preserve and use its digital intellectual property forever
Our vision is to create a management system for digital objects that will
store and preserve any type of digital material in perpetuity
provide access to this material to users with appropriate permissions
ensure that the material is easy to find
ensure that users can view the material with contemporary applications
ensure that users can, where possible, experience material with the original look-and-feel
18
Introduction - history
Digital Library PFI
Mar 1997 – Dec 1998
Digital Library System
1999 – early 2002
Lessons
DOM Report
Nov 2002
The DOM Programme
Started September 2003
19
Drivers for the BL DOM Programme
Legal deposit legislation for non-print material was granted royal assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12T of material per year (with 50 year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….
We need a generic and cost-effective approach for the secure long term
storage of digital material that is produced by numerous initiatives
20
DOM – many topics to address
[Chart: components plotted by estimated size of component (low to high) against component ambiguity / complexity (low to high)]
Components: ILS; Web Archiving; Digitisation Programme; LDEP; Workflow; Resource Discovery; LDLSE; VDEP; Rights Management; Metadata Definition; Technical Requirements; File Format Registry; Persistent Identifiers; File Conversion Utilities; Authentication; Prototypes; SDM; RADM; Interfaces; Strategy Development
Legend: Started; Planned; Non-DOM projects; Planned co-operation
LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications
21
Scope - life cycle of objects
Collection
Selection
Acquisition
Accession
Description
Preservation
Storage
Preservation
Access
Resource discovery
Delivery
Rendering
22
Scope – objects and processes
Preservation store
Preserves the bit stream in perpetuity
Access store
Access versions
Limited formats – in the flavour of the era
Metadata to support resource discovery
Descriptive, Administrative, Links with existing tools e.g. Integrated
Library System (ILS)
Workflow
Ingest, e.g. Legal Deposit processing
23
[Diagram: DOM and its content streams]
Access: Resource Discovery; Delivery; Digital Rights Management
Shared services: Signing; Authentication; Metadata; Persistent ID
DOM Storage
Ingest
Content streams: Donations; Document Supply; Publishers; Archives; Grey Literature; Non-Serial Store; Archiving; Operational Stores; Legal Deposit (LDL Secure Environment, Legal Deposit Items, Legal Deposit Processing); Web Archiving; Digitisation (St Pancras Studios, Newspapers, NSA)
24
Timeline
Definition. R0
Prototype will provide a basic preservation-quality digital object storage module
ET approve Business Case & Timeline (BC)
Operat’l Storage Sub System. R1
• Consolidate R0 into operational system
• Provide preservation-quality digital store for materials received under Voluntary Deposit of Electronic Publications (VDEP)
• Integrate it with the existing VDEP front-end
1st Content Stream ingest. R2
• Support ingest for a major content stream
• Integrate with core Library systems as required
LDEP - initial format. R3 & R4
Provide functionality for material covered by LDEP secondary legislation
Open DOM to new projects. R5+
Milestones on the 2003 to 2005 timeline: BC, R0, R1, R2, R3, R4
25
DOM: Project definition - 1
Functional Architecture “What”
Prototyping – assessing market solutions
Example issues: digital rights, file formats, etc
Logical architecture “how – overall architecture”
Prototyping - basic functioning architecture
Example issues: allow changes to new suppliers, relationships to ILS, other projects etc
Physical architecture “how – storage & specifics”
Prototyping - principal solutions and options
Example issues: how do we build it cost-effectively today, supplier selection criteria
• Business case
• Planning – incremental implementation phases
Cross team workshops – reviewing progress, debating detailed technical issues, planning immediate priorities, risk management & way forward
26
DOM: Project definition - 2
Approach is to be incremental and not ‘Big Bang’
We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a
good solution
A principal goal is to define:
An overall long term “logical architecture”
Within which, there will be successive generations of physical architectures
We are building an understanding of the storage marketplace, and we will use that knowledge to manage procurement
We are certain that we will need >500T of storage but we are uncertain when, so we need flexible, scalable procurement
27
DOM architecture - overview
[Diagram] Labels: Resource Discovery (ILS based RD, Non-cat, Others); Rights Management; LDEP; Doc supply; OBJECT; DOMID
DOM Storage Service: compound objects/relations; integrity; authenticity; unique persistent identifier (DOMID); local resource locator; atomic objects
DOMID is mapped to node/vol/LRL
DOM Physical Storage
28
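The mapping of a DOMID to node/vol/LRL on the architecture overview slide can be pictured as a lookup table. The following is a minimal sketch only, assuming a DOMID is an opaque string and a location is a (node, volume, local resource locator) triple. The class, method and field names are hypothetical and are not taken from the BL design.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Location:
    """Where one copy of an object lives (all field names are assumptions)."""
    node: str    # storage node or cluster holding the copy
    volume: str  # volume on that node
    lrl: str     # local resource locator within the volume


class DomidResolver:
    """Maps a persistent DOMID to its current physical locations."""

    def __init__(self) -> None:
        self._table: Dict[str, List[Location]] = {}

    def register(self, domid: str, location: Location) -> None:
        """Record one more copy of the object identified by domid."""
        self._table.setdefault(domid, []).append(location)

    def relocate(self, domid: str, old: Location, new: Location) -> None:
        """Update the mapping after a migration; the DOMID itself never changes."""
        locations = self._table[domid]
        locations[locations.index(old)] = new

    def resolve(self, domid: str) -> List[Location]:
        """Return every known location, or an empty list for an unknown DOMID."""
        return list(self._table.get(domid, []))
```

Keeping the identifier separate from the location is what would let storage be migrated or re-sourced without changing any reference held by the ILS or other systems.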
DOM System (release 3)
[Diagram labels: Aleph; Storage subsystem (shown twice); Access; Mailroom; Publishers; Shared services; Administration; DOM System]
29
DOM logical architecture –
integrity and authenticity
Integrity:
System has capability to continuously monitor the object store to detect
object corruption
It would then initiate object recovery
Authenticity:
A process is defined to provide long-term assurance that an object that
is re-presented is as it was when it was ingested
Based on the use of cryptographic signing techniques
Each object is signed when it is ingested
The signature is verified when required
The signing mechanism is “tightly” controlled
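The sign-on-ingest, verify-on-demand and monitor-then-recover steps above can be sketched as follows. This is an illustration under stated assumptions, not the programme's mechanism: a keyed SHA-256 HMAC from the Python standard library stands in for whatever cryptographic signing technique is actually used, and the key, class and method names are hypothetical.

```python
import hashlib
import hmac
from typing import Dict, List, Tuple

# Hypothetical key; a real key would sit inside a tightly controlled signing service.
SIGNING_KEY = b"example-key-not-for-production"


def sign(content: bytes, key: bytes = SIGNING_KEY) -> str:
    """Return a keyed SHA-256 tag for the object's bit stream."""
    return hmac.new(key, content, hashlib.sha256).hexdigest()


def verify(content: bytes, signature: str, key: bytes = SIGNING_KEY) -> bool:
    """Check that the content still matches the tag recorded at ingest."""
    return hmac.compare_digest(sign(content, key), signature)


class ObjectStore:
    """Toy store: object id -> (content, signature recorded at ingest)."""

    def __init__(self) -> None:
        self._objects: Dict[str, Tuple[bytes, str]] = {}

    def ingest(self, object_id: str, content: bytes) -> None:
        """Sign the object once, at the moment it is ingested."""
        self._objects[object_id] = (content, sign(content))

    def read(self, object_id: str) -> bytes:
        """Re-present an object only if it still verifies."""
        content, signature = self._objects[object_id]
        if not verify(content, signature):
            raise ValueError(f"object {object_id} failed verification")
        return content

    def scan(self) -> List[str]:
        """Integrity sweep: return ids whose content no longer verifies."""
        return [oid for oid, (content, sig) in self._objects.items()
                if not verify(content, sig)]

    def recover(self, object_id: str, replica: "ObjectStore") -> None:
        """Replace a corrupted object with a verified copy from another store."""
        content = replica.read(object_id)  # raises if the replica is bad too
        _, signature = replica._objects[object_id]
        self._objects[object_id] = (content, signature)
```

In this sketch `scan` plays the role of the continuous monitor and `recover` the role of object recovery, with the original ingest signature carried over unchanged.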
30
Procuring physical storage in volume
A major cost is in physical storage
The market for storage systems is changing rapidly, and this implies that “lock-in” is
not sensible
We thus need flexibility to change supplier over time
Cost of storage is reducing by 30-40% per year
Hence procure on rolling basis just ahead of demand
Replace storage on a rolling basis on expiry of warranty
The rolling programmes imply the need to be able to support a heterogeneous
product solution
The design of the logical architecture thus supports storage sourced from multiple
storage vendors
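The case for buying just ahead of demand can be checked with simple arithmetic. The sketch below uses made-up figures (100 TB needed in each of five years, a starting price of 1,000 per TB) and assumes a 35% annual price decline, the midpoint of the 30-40% range quoted above. The function names are hypothetical.

```python
def rolling_cost(demand_tb_per_year, price_per_tb, annual_decline):
    """Cost of buying each year's capacity in the year it is needed."""
    total = 0.0
    for year, tb in enumerate(demand_tb_per_year):
        # The unit price falls by the same fraction every year.
        total += tb * price_per_tb * (1 - annual_decline) ** year
    return total


def upfront_cost(demand_tb_per_year, price_per_tb):
    """Cost of buying all the capacity in year 0 at today's price."""
    return sum(demand_tb_per_year) * price_per_tb


# Illustrative figures only, not BL data.
demand = [100] * 5
print(upfront_cost(demand, 1000.0))
print(round(rolling_cost(demand, 1000.0, 0.35)))
```

Under these assumptions the rolling purchase costs roughly half as much as buying all 500 TB up front, before counting the warranty and replacement effects the slide also mentions.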
31
Disaster tolerance and the organisation
of storage clusters
One can obtain commercial disaster recovery (DR) solutions for common equipment configurations
However, one cannot obtain such solutions for multi-100 TB systems
So we must build DR into the design of the system
A single-site solution, subject to a common-mode disaster, would suffer considerable loss of availability after a disaster, and so is not acceptable
This implies that we need a multi-site solution
Conventionally these are based on a master-standby arrangement, where only 50% of the kit is delivering normal service
Our design is based on the use of multiple autonomous independent peer clusters that cross-synchronise, so 100% of the kit delivers normal service
32
DOM architecture in the context of the
storage solution market
The dominant segment of the market focuses on delivering performance within
a highly resilient single cluster
However:
Many of our objects will be rarely accessed
so we do not want to pay for “maximised” performance we do not need
We have resilience by using multiple clusters, hence we have a reduced
need for resilience within a cluster
so we do not want to pay for “maximised” resilience we do not need
We are using these drivers to design a cost-effective large scale resilient
solution
33
DOM storage subsystem architecture - overview
[Diagram: two storage clusters, each made up of a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, together with DOM Shared Services]
DOM Shared Services:
• Unique ID
• Signing
• Logging
34
DOM storage subsystem architecture - access
Normal access/delivery is from local storage cluster
[Diagram: two storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical Storage) and DOM Shared central Services (Unique ID, Signing, Logging)]
35
DOM storage subsystem architecture - access
When a cluster is off-line then access/delivery is from a remote storage cluster
[Diagram: two storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical Storage) and DOM Shared central Services (Unique ID, Signing, Logging)]
36
DOM storage subsystem architecture - ingest
Normal ingest is to the local storage cluster and then the remote cluster is synchronised
[Diagram: two storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical Storage) and DOM Shared central Services (Unique ID, Signing, Logging); labels: "Store", "Synchronise remote store"]
37
DOM storage subsystem architecture - ingest
When a cluster is off-line then ingest is managed by the remote storage cluster and the local cluster is synchronised later
[Diagram: two storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical Storage) and DOM Shared central Services (Unique ID, Signing, Logging); labels: "Store", "Synchronise remote store later"]
38
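The access and ingest behaviour described on the four storage subsystem slides (use the local cluster normally, fall back to the remote peer when a cluster is off-line, synchronise later) can be sketched as below. This is a toy in-memory model under stated assumptions: exactly two peer clusters, a single gateway, and hypothetical class and method names. It is not the BL implementation.

```python
from typing import Dict, List, Tuple


class Cluster:
    """One autonomous storage cluster (a toy in-memory stand-in)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.online = True
        self.objects: Dict[str, bytes] = {}


class Gateway:
    """Routes requests to the local cluster, falling back to the remote peer."""

    def __init__(self, local: Cluster, remote: Cluster) -> None:
        self.local = local
        self.remote = remote
        # Object ids each cluster still has to receive once it is back on-line.
        self.pending: Dict[str, List[str]] = {local.name: [], remote.name: []}

    def _primary_and_peer(self) -> Tuple[Cluster, Cluster]:
        if self.local.online:
            return self.local, self.remote
        if self.remote.online:
            return self.remote, self.local
        raise RuntimeError("no storage cluster is available")

    def ingest(self, object_id: str, content: bytes) -> str:
        """Store on the first available cluster, then synchronise its peer."""
        primary, peer = self._primary_and_peer()
        primary.objects[object_id] = content
        if peer.online:
            peer.objects[object_id] = content          # synchronise now
        else:
            self.pending[peer.name].append(object_id)  # synchronise later
        return primary.name

    def fetch(self, object_id: str) -> bytes:
        """Deliver from the local cluster if possible, else from the remote one."""
        for cluster in (self.local, self.remote):
            if cluster.online and object_id in cluster.objects:
                return cluster.objects[object_id]
        raise KeyError(object_id)

    def resynchronise(self, cluster: Cluster) -> None:
        """Copy across the objects a cluster missed while it was off-line."""
        source = self.remote if cluster is self.local else self.local
        for object_id in self.pending[cluster.name]:
            cluster.objects[object_id] = source.objects[object_id]
        self.pending[cluster.name].clear()
```

Because either cluster can take ingest and delivery, both carry normal service in this model, which is the contrast with a master-standby arrangement drawn on the disaster tolerance slide.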
In conclusion
We plan for generations of physical storage
Migration from one generation to the next
Allow changes of supplier
Purchase incrementally in modest quantities
Move quickly when required
Be cost conscious
We provide assurance that an object is held and re-presented as it was when it was ingested
We are designing a cost-effective large scale resilient solution
In summary: we take a long term view
39
Major Programmes/4
Web Archiving Programme
40
Structure of Programme
Web Archiving Programme is a collaborative initiative, implemented broadly across two consortia
UK Web Archiving Consortium
Developing a selective approach to web archiving, procuring a common web archiving infrastructure and software to begin archiving activities at the earliest opportunity
International Internet Preservation Consortium
Developing advanced web archiving technologies for the long-term, large-scale, continuous crawling requirements enabled through legislation
41
UK Web Archiving Consortium
Developing a selective approach to web archiving
License for PANDAS about to be signed with NLA
Sub-licenses with consortium partners and contractor to follow
ITT concluded with Magus Research winning the contract.
Implement a common web archiving infrastructure (lots of Linux machines + PANDAS)
Provide customisation/development of PANDAS
Provide help desk and support
42
International Internet Preservation
Consortium
Developing advanced web archiving technologies
Smart Crawler
Continuous adaptive crawler, adjusting crawl priority on the fly
Based on IA Heritrix
Working on requirements now
Expect to begin tender process in June
Content Management
Archival formats
Framework
Metrics and Test Bed
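One way to picture "adjusting crawl priority on the fly" is a queue in which each URL's revisit interval shrinks when the page is found changed and grows when it is not. The sketch below is purely illustrative: it is not the Smart Crawler design and does not use the Heritrix API, and the halve/double rule, the interval bounds and all names are assumptions.

```python
import heapq
from typing import Dict, List, Tuple


class AdaptiveCrawlQueue:
    """Priority queue of URLs whose revisit interval adapts to observed change."""

    def __init__(self, min_interval: float = 1.0, max_interval: float = 64.0) -> None:
        self.min_interval = min_interval
        self.max_interval = max_interval
        self._interval: Dict[str, float] = {}
        self._heap: List[Tuple[float, str]] = []  # (next due time, url)

    def add(self, url: str, now: float = 0.0) -> None:
        """Schedule a new URL for an immediate first visit."""
        self._interval[url] = self.min_interval
        heapq.heappush(self._heap, (now, url))

    def next_due(self) -> Tuple[float, str]:
        """Pop the URL that is due soonest."""
        return heapq.heappop(self._heap)

    def report(self, url: str, now: float, changed: bool) -> None:
        """Re-schedule a URL: halve its interval if it changed, else double it."""
        interval = self._interval[url]
        interval = interval / 2 if changed else interval * 2
        interval = max(self.min_interval, min(self.max_interval, interval))
        self._interval[url] = interval
        heapq.heappush(self._heap, (now + interval, url))
```

A crawler loop would call `next_due`, fetch the page, compare it with the stored copy and then call `report`, so frequently changing sites drift towards the front of the queue.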
43
External Collaboration
44
Digital Library Collaborations/Partnerships
Current
UK Digital Preservation Coalition
Founder Member
TEL (The European Library Project)
Web Archiving UK
JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library Of Wales
International Internet Preservation Consortium
BNF, Library Of Congress, Internet Archive, National Archives & Library Of Canada, National Library Of
Australia, National Library Of Italy, National Libraries Of Nordic Countries
JISC Funded - Digital Curation Centre
Persistent Identifiers
DOI Foundation, European National Libraries (KB & DDB)
Resource Discovery
Union Catalogues (SUNCAT)
Digital Library Federation
45
Digital Library Collaborations/Partnerships
Potential
Secure Legal Deposit Network
6 Legal Deposit Libraries
Global Digital Format Registry
Potential Partners (National Archives, DLF)
Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
KB (Netherlands National Library & Other Partners – FP6 Bid)
Digital Rights Management
Potential Partners (Publishers, JISC)
Metadata
Publishers, Others ?
Authentication
JISC ?
Resource Discovery
Search Engine Vendors, Researchers
Others ???
46
Conclusions
• Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success
• Can You Work With Us?
47
Storage
gateway
DOM
Storage
Service
DOM
Storage
Service
DOM
Physical
Storage
Storage cluster
DOM
Shared
Services
• Unique ID
• Signing
• Logging
DOM
Physical
Storage
Storage cluster
www.bl.uk
34
DOM storage subsystem architecture - access
DOM
Storage
gateway
Normal access/delivery
is from local storage
cluster
DOM
Storage
Service
DOM
Physical
Storage
Storage cluster
DOM
Storage
gateway
DOM
Storage
Service
••
••
••
DOM
DOM
Shared
central
Services
Unique
Unique ID
ID
Signing
Signing
Logging
Logging
DOM
Physical
Storage
Storage cluster
www.bl.uk
35
DOM storage subsystem architecture - access
DOM
Storage
gateway
When a cluster is off-line
then access/delivery is
from a remote storage
cluster
DOM
Storage
Service
DOM
Physical
Storage
Storage cluster
DOM
Storage
gateway
DOM
Storage
Service
••
••
••
DOM
DOM
Shared
central
Services
Unique
Unique ID
ID
Signing
Signing
Logging
Logging
DOM
Physical
Storage
Storage cluster
www.bl.uk
36
DOM storage subsystem architecture - ingest
DOM
Storage
gateway
DOM
Storage
Service
Normal ingest is to the
local storage cluster and
then the remote cluster
is synchronised
Synchronise remote store
Store
DOM
Physical
Storage
Storage cluster
••
••
••
DOM
DOM
Shared
central
Services
Unique
Unique ID
ID
Signing
Signing
Logging
Logging
DOM
Storage
gateway
DOM
Storage
Service
DOM
Physical
Storage
Storage cluster
www.bl.uk
37
DOM storage subsystem architecture - ingest
DOM
Storage
gateway
DOM
Storage
Service
When a cluster is off-line
then ingest is managed by
the remote storage cluster
and the local cluster is
synchronised later
Synchronise remote store later
Store
DOM
Physical
Storage
Storage cluster
••
••
••
DOM
DOM
Shared
central
Services
Unique
Unique ID
ID
Signing
Signing
Logging
Logging
DOM
Storage
gateway
DOM
Storage
Service
DOM
Physical
Storage
Storage cluster
www.bl.uk
38
In conclusion
We plan for generations of physical storage
Migration from one generation to the next
Allow changes of supplier
Purchase incrementally in modest quantities
Move quickly when required
Be cost conscious
We provide assurance that an object is held and re-presented as when it was
ingested
We are designing a cost-effective large scale resilient solution
In summary: we take a long term view
www.bl.uk
39
Major Programmes/4
Web Archiving Programme
40
Structure of Programme
Web Archiving Programme is a collaborative
initiative, roughly implemented across two
consortiums
UK Web Archiving Consortium
Developing a selective approach to web archiving, procuring a common
web archiving infrastructure and software to begin archiving activities at
the earliest
International Internet Preservation Consortium
Developing advanced web archiving technologies for the long terms,
large scale, continuous crawling requirements enabled through
legislation
www.bl.uk
41
UK Web Archiving Consortium
Developing a selective approach to web archiving
License for PANDAS about to be signed with NLA
Sub-licenses with consortium partners and contractor to follow
ITT concluded with Magus Research winning the contract.
Implement a common web arching infrastructure (lots of Linux
machines + PANDAS)
Provide customisation/development of PANDAS
Provide help desk and support
www.bl.uk
42
International Internet Preservation
Consortium
Developing advanced web archiving technologies
Smart Crawler
Continuous adaptive crawler, adjusting crawl priority on the fly
Based on IA Heritrix
Working on requirements now
Expect to being tender process in June
Content Management
Archival formats
Framework
Metrics and Test Bed
www.bl.uk
43
External Collaboration
44
Digital Library Collaborations/Partnerships
Current
UK Digital Preservation Collation
Founder Member
TEL (The European Library Project)
Web Archiving UK
JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library Of Wales
International Internet Preservation Consortium
BNF, Library Of Congress, Internet Archive, National Archives & Library Of Canada, National Library Of
Australia, National Library Of Italy, National Libraries Of Nordic Countries
JISC Funded - Digital Curation Centre
Persistent Identifiers
DOI Foundation, European National Libraries (KB & DDB)
Resource Discovery
Union Catalogues (SUNCAT)
Digital Library Federation
www.bl.uk
45
Digital Library Collaborations/Partnerships
Potential
Secure Legal Deposit Network
6 Legal Deposit Libraries
Global Digital Format Registry
Potential Partners (National Archives, DLF)
Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
KB (Netherlands National Library & Other Partners – FP6 Bid)
Digital Rights Management
Potential Partners (Publishers, JISC)
Metadata
Publishers, Others ?
Authentication
JISC ?
Resource Discovery
Search Engine Vendors, Researchers
Others ???
www.bl.uk
46
Conclusions
Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success
•
•
Can You Work With Us?
www.bl.uk
47
Slide 12
British Computer Society
North London Branch
Major Programmes
Richard Boulderstone
July 27, 2004
1
Agenda
• The British Library
• Vision
• Our Audiences/Customers
• ILS
• Digitisation
• Digital Object Management
• Web Archiving
• Collaboration
• Conclusions
Magna Carta
www.bl.uk
2
What Is The British Library ?
• Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of
Science and Invention (1855), National Central Library (1916), and National
Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India
Office Library and Records in 1982, and British Institute of Recorded Sound in
1983
• Flagship building at St Pancras - largest public building project in Great
Britain in 20th century - opened in 1998
www.bl.uk
3
World-Class Research Library
Key Statistics 2002/3
• 150 million items
• 8.2 million items consulted or supplied
• 408,000 reading room visits
• 618,000 catalogue records created
• 554,000 items received on legal deposit
• 651 km shelf capacity, 92% full, adding 12 km each year
• 18.5M Web Site Hits (www.bl.uk)
• 2,400 staff
• £85.2 million Grant in Aid and £27.0 million trading income in 2001/2
• Annual report - http://www.bl.uk/about/annual/latest.html
www.bl.uk
4
‘The World’s Knowledge’
Outcome Based Vision
To help people advance knowledge to enrich lives
… by aiding scientific advances
… by adding commercial value for businesses
… by contributing to UK “knowledge economy”
… through the pursuit of academic excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural wellbeing
… by giving information relevant to their interests
… by helping to find the next medical breakthrough
… by creating a link between the past, present and future
www.bl.uk
5
[Diagram: the Library's five audiences (BUSINESS, EDUCATION, PUBLIC, RESEARCHER, LIBRARIES), the customer groups within each, and the services offered to them]
Customer groups shown: high R+D industries, professional services, creative industries, publishing industries, SMEs; school libraries, teachers, students 11>18; visitors (child + adult); postgraduates/undergraduates, librarians, scholars, commercial researchers; public libraries, H.E. libraries; broadcasting (e.g. BBC), publishing (e.g. OED); lifelong learners
Services shown: Resource Discovery, Bespoke Services, Research Services, Document Supply, Reprographics, Innovation Centre, On-site Visits, School Tours, Web Learning, Exhibitions, Events, Tours, Publishing, Reading Rooms, Searching Tools, Training, Best Practice
www.bl.uk
6
Major Programmes/1
Da Vinci Notebook
Integrated Library System (ILS) Programme
7
ILS: Development
Data migration
Due to finish in a few days
16M+ BL records
10M+ records from other sources
Online ILS software
All online changes made (mainly interfaces) – final tests
Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
Most of those needed for go-live are done
Rest in priority order
www.bl.uk
8
ILS: Implementation
Training
Courses to end-users well underway
‘Practice’ system available
‘Search only’ training also underway
Testing
Functional testing (end to end) nearly complete
Performance poor – OPAC very slow
Automated stress testing (LoadRunner scripts)
eIS trying to find area of problem
Ex Libris experts flying over
Some security ‘hardening’ needed
www.bl.uk
9
ILS: Cutover from legacy systems
Now:
Temporary Aleph cataloguing
Phase 1 – internal processing
Staggered take-on of users to ease cutover problems
Merge ‘temporary’ records
7 June:
Phase 2 – reading rooms
Reading rooms closed for cutover 26-29 June
Mainly brand-new PCs etc rather than XP upgrade
30 June:
Phase 3 – remote users
Could be delayed if there are major problems
30 July:
www.bl.uk
10
www.bl.uk
11
Future ILS development (ILS/2)
Current ILS development seen just as the start
Extra records
E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
E.g. Preservation records
Links to other new BL systems
E.g. Digital Object Management (images, web pages etc)
New releases of Ex Libris packages
www.bl.uk
12
Major Programmes/2
International Dunhuang Project
Digitisation Programme
13
Background
Digitisation Is The Process Of Converting Existing Physical Items Into Digital
Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character
Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image to Sub £1/Image.
Automatic Devices To Digitise Are Becoming Available But Are Expensive ~£150,000.
BL Has Had Fairly Ad Hoc Approach Driven By
External Funding Opportunities
Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project
Management
BL Has Created About 1.5M Digital Images So Far…
www.bl.uk
14
Digitisation Strategy
Digitisation Strategy Project Was Formally Initiated On February 2, 2004
Key objectives for the project are to define:
Selection Criteria
Uniform Approach
Communications Plan
Sustainability
Intellectual Property Rights
External Relationship Management
Funding
Integration with DOMS
www.bl.uk
15
Project Status Information
Definitive Register of Projects
19 Complete
19 Current
20 Planning
JISC Sound (3,900 Hours)
JISC Newspapers (2M Pages of 750M Pages)
Chopin (Collaborative Project)
Early English Books Online
www.bl.uk
16
Major Programmes/3
Gutenberg Bible
Digital Object Management (DOM) Programme
17
DOM Programme vision
Our mission is to enable the United Kingdom to preserve
and use its digital intellectual property forever
Our vision is to create a management system for digital objects
that will
store and preserve any type of digital material in perpetuity
provide access to this material to users with appropriate permissions
ensure that the material is easy to find
ensure that users can view the material with contemporary applications
ensure that users can, where possible, experience material with the original look-and-feel
www.bl.uk
18
Introduction - history
Digital Library PFI
Mar 1997 – Dec 1998
Digital Library System
1999 – early 2002
Lessons
DOM Report
Nov 2002
The DOM Programme
Started September 2003
www.bl.uk
19
Drivers for the BL DOM Programme
Legal deposit legislation for non-print material was granted royal assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12T of material per year (with 50 year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….
We need a generic and cost-effective approach for the secure long term
storage of digital material that is produced by numerous initiatives
www.bl.uk
20
DOM – many topics to address
[Chart: DOM components plotted by estimated size of component (low to high) against component ambiguity/complexity (low to high); each component is marked as started, planned, a non-DOM project, or planned co-operation]
Components shown: ILS, Web Archiving, Digitisation Programme, LDEP, Workflow, Resource Discovery, LDLSE, VDEP, Rights Management, Metadata Definition, Technical Requirements, File Format Registry, Persistent Identifiers, File Conversion Utilities, Authentication, Prototypes, SDM, RADM, Interfaces, Strategy Development
LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications
www.bl.uk
21
Scope - life cycle of objects
Collection: Selection, Acquisition, Accession, Description
Preservation: Storage, Preservation
Access: Resource discovery, Delivery, Rendering
www.bl.uk
22
Scope – objects and processes
Preservation store
Preserves the bit stream in perpetuity
Access store
Access versions
Limited formats – in the flavour of the era
Metadata to support resource discovery
Descriptive, Administrative, Links with existing tools e.g. Integrated
Library System (ILS)
Workflow
Ingest, e.g. Legal Deposit processing
www.bl.uk
23
[Diagram: DOM and its surrounding systems]
Access: Resource Discovery, Delivery, Digital Rights Management
DOM: Storage; Shared services (Signing, Authentication, Metadata, Persistent ID); Ingest
Content sources: Donations; Document Supply (Publishers, Archives, Grey Literature, Non-Serial Store, Archiving, Operational Stores); Legal Deposit (LDL Secure Environment, Legal Deposit Items, Legal Deposit Processing); Web Archiving; Digitisation (St Pancras Studios, Newspapers, NSA)
www.bl.uk
24
Timeline
[Timeline: DOM releases, 2003 to 2005]
Definition. R0
Prototype will provide a basic preservation-quality digital object storage module
ET approve Business Case & Timeline (BC)
Operational Storage Sub System. R1
• Consolidate R0 into operational system
• Provide preservation-quality digital store for materials received under Voluntary Deposit of Electronic Publications (VDEP)
• Integrate it with the existing VDEP front-end
1st Content Stream ingest. R2
• Support ingest for a major content stream
• Integrate with core Library systems as required
LDEP - initial format. R3 & R4
Provide functionality for material covered by LDEP secondary legislation
Open DOM to new projects. R5+
www.bl.uk
25
DOM: Project definition - 1
Functional architecture “what”
Prototyping – assessing market solutions
Example issues: digital rights, file formats, etc
Logical architecture “how – overall architecture”
Prototyping - basic functioning architecture
Example issues: allow changes to new suppliers, relationships to ILS, other projects etc
Physical architecture “how – storage & specifics”
Prototyping - principal solutions and options
Example issues: how do we build it cost-effectively today, supplier selection criteria
• Business case
• Planning – incremental implementation phases
Cross team workshops – reviewing progress, debating detailed technical issues, planning immediate priorities, risk management & way forward
www.bl.uk
26
DOM: Project definition - 2
Approach is to be incremental and not ‘Big Bang’
We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a
good solution
A principal goal is to define:
An overall long term “logical architecture”
Within which, there will be successive generations of physical architectures
We are building an understanding of the storage marketplace, and we will use that knowledge to
manage procurement
We are certain that we will need >500 TB of storage but we are uncertain when – we thus need
flexible, scalable procurement
www.bl.uk
27
DOM architecture - overview
[Diagram: DOM architecture layers]
Users of the storage service: Resource Discovery (Non-cat, ILS based RD, Others), Rights Management, LDEP, Doc supply
Objects are passed to and from the storage service by DOMID
DOM Storage Service: compound objects/relations, integrity, authenticity, atomic objects, unique persistent identifier (DOMID)
DOMID is mapped to node/vol/LRL (local resource locator)
DOM Physical Storage
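The mapping of a DOMID to node/vol/LRL can be pictured as a lookup table held by the storage service. The following is a minimal sketch in Python, not the Library's implementation: the names Location and DomidResolver, the in-memory dictionary and the identifier format are all assumptions made for illustration, since the slides do not describe the actual data structure.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Location:
    """Where one copy of an object lives (field names are illustrative)."""
    node: str    # storage node holding the copy
    volume: str  # volume on that node
    lrl: str     # local resource locator within the volume


class DomidResolver:
    """Hypothetical table mapping a persistent DOMID to physical locations."""

    def __init__(self) -> None:
        self._table: Dict[str, List[Location]] = {}

    def register(self, domid: str, location: Location) -> None:
        """Record one more physical copy of the object."""
        self._table.setdefault(domid, []).append(location)

    def relocate(self, domid: str, old: Location, new: Location) -> None:
        """Swap one location for another; the DOMID itself never changes."""
        copies = self._table[domid]
        copies[copies.index(old)] = new

    def resolve(self, domid: str) -> List[Location]:
        """Return all known locations, or an empty list for an unknown DOMID."""
        return list(self._table.get(domid, []))
```

The point of the indirection is that users only ever hold the DOMID, so storage can move to a new generation of hardware by updating the table rather than every reference.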
www.bl.uk
28
DOM System (release 3)
[Diagram: DOM System (release 3)]
Components shown: Aleph, Mailroom, Publishers, Access, Administration, Shared services, two storage subsystems
www.bl.uk
29
DOM logical architecture – integrity and authenticity
Integrity:
System has capability to continuously monitor the object store to detect
object corruption
It would then initiate object recovery
Authenticity:
A process is defined to provide long-term assurance that an object that
is re-presented is as it was when it was ingested
Based on the use of cryptographic signing techniques
Each object is signed when it is ingested
The signature is verified when required
The signing mechanism is “tightly” controlled
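The sign-at-ingest, verify-when-required and monitor-then-recover steps can be sketched as follows. This is an illustration under stated assumptions, not the DOM design: it uses HMAC-SHA256 from the Python standard library as a stand-in for whatever cryptographic signing technique was chosen, and the ObjectStore class, its methods and the demo key are hypothetical.

```python
import hashlib
import hmac
from typing import Dict, List

# Hypothetical key; in practice the signing mechanism would be tightly controlled.
SIGNING_KEY = b"demo-key-held-by-the-signing-service"


def sign(data: bytes, key: bytes = SIGNING_KEY) -> str:
    """Return an HMAC-SHA256 signature over the object's bit stream."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify(data: bytes, signature: str, key: bytes = SIGNING_KEY) -> bool:
    """Check that the bytes are unchanged since they were signed."""
    return hmac.compare_digest(sign(data, key), signature)


class ObjectStore:
    """Illustrative in-memory object store keyed by DOMID."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}
        self.signatures: Dict[str, str] = {}

    def ingest(self, domid: str, data: bytes) -> None:
        """Store the object and sign it at the moment of ingest."""
        self.objects[domid] = data
        self.signatures[domid] = sign(data)

    def scan(self) -> List[str]:
        """Integrity pass: return the DOMIDs of objects that fail verification."""
        return [d for d, data in self.objects.items()
                if not verify(data, self.signatures[d])]

    def recover(self, domid: str, good_copy: bytes) -> None:
        """Replace a corrupt object with a copy that verifies against the stored signature."""
        if not verify(good_copy, self.signatures[domid]):
            raise ValueError("replacement copy does not verify")
        self.objects[domid] = good_copy
```

A scheduled job could call scan() repeatedly and fetch a good copy from another cluster for each DOMID it returns.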
www.bl.uk
30
Procuring physical storage in volume
A major cost is in physical storage
The market for storage systems is changing rapidly, and this implies that “lock-in” is
not sensible
We thus need flexibility to change supplier over time
Cost of storage is reducing by 30-40% per year
Hence procure on rolling basis just ahead of demand
Replace storage on a rolling basis on expiry of warranty
The rolling programmes imply the need to be able to support a heterogeneous
product solution
The design of the logical architecture thus supports storage sourced from multiple
storage vendors
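The case for buying just ahead of demand can be checked with simple arithmetic. The sketch below uses made-up figures, not the Library's: 100 TB needed per year for five years, a starting price of 1,000 per TB in arbitrary units, and a 35% annual price fall (the midpoint of the 30-40% range above). Warranty replacement costs are ignored.

```python
def upfront_cost(total_tb: float, price_per_tb: float) -> float:
    """Cost of buying all capacity at today's price."""
    return total_tb * price_per_tb


def rolling_cost(tb_per_year: float, years: int,
                 price_per_tb: float, annual_fall: float) -> float:
    """Cost of buying each year's capacity at that year's (falling) price."""
    cost = 0.0
    for year in range(years):
        cost += tb_per_year * price_per_tb * (1 - annual_fall) ** year
    return cost


if __name__ == "__main__":
    print("upfront:", upfront_cost(500, 1000.0))
    print("rolling:", rolling_cost(100, 5, 1000.0, 0.35))
```

Under these assumed figures the rolling purchase costs roughly half as much as buying all 500 TB on day one.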
www.bl.uk
31
Disaster tolerance and the organisation of storage clusters
One can obtain commercial disaster recovery (DR) solutions for common equipment
configurations
However, one cannot obtain such solutions for multi-100 TB systems
So we must build DR into the design of the system
A single site solution, subject to a common-mode disaster, would suffer considerable loss
of availability after a disaster, and so is not acceptable
This implies that we need a multi-site solution
Conventionally these are based on a master-standby arrangement, where only 50% of kit is delivering
normal service
Our design is based on the use of multiple autonomous independent peer clusters that
cross-synchronise,
so 100% of the kit delivers normal service
www.bl.uk
32
DOM architecture in the context of the storage solution market
The dominant segment of the market focuses on delivering performance within
a highly resilient single cluster
However:
Many of our objects will be rarely accessed
so we do not want to pay for “maximised” performance we do not need
We have resilience by using multiple clusters, hence we have a reduced
need for resilience within a cluster
so we do not want to pay for “maximised” resilience we do not need
We are using these drivers to design a cost-effective large scale resilient
solution
www.bl.uk
33
DOM storage subsystem architecture - overview
[Diagram: two storage clusters, each comprising a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, both using DOM Shared Services]
DOM Shared Services
• Unique ID
• Signing
• Logging
www.bl.uk
34
DOM storage subsystem architecture - access
[Diagram: two storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical Storage) and DOM shared central services (Unique ID, Signing, Logging)]
Normal access/delivery is from the local storage cluster
www.bl.uk
35
DOM storage subsystem architecture - access
[Diagram: two storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical Storage) and DOM shared central services (Unique ID, Signing, Logging)]
When a cluster is off-line, access/delivery is from a remote storage cluster
www.bl.uk
36
DOM storage subsystem architecture - ingest
[Diagram: two storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical Storage) and DOM shared central services (Unique ID, Signing, Logging); arrows labelled "Store" and "Synchronise remote store"]
Normal ingest is to the local storage cluster, and then the remote cluster is synchronised
www.bl.uk
37
DOM storage subsystem architecture - ingest
[Diagram: two storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical Storage) and DOM shared central services (Unique ID, Signing, Logging); arrows labelled "Store" and "Synchronise remote store later"]
When a cluster is off-line, ingest is managed by the remote storage cluster and the local cluster is synchronised later
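The access and ingest behaviour of two peer clusters can be sketched in a few lines. This is a toy model under stated assumptions, not the DOM storage subsystem: the Cluster and Gateway classes, the in-memory dictionaries and the pending list are hypothetical, and only two clusters are modelled.

```python
from typing import Dict, List, Tuple


class Cluster:
    """One autonomous storage cluster (illustrative, in-memory)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.online = True
        self.objects: Dict[str, bytes] = {}


class Gateway:
    """Storage gateway with a local cluster and one remote peer."""

    def __init__(self, local: Cluster, remote: Cluster) -> None:
        self.local = local
        self.remote = remote
        # (target cluster, DOMID) pairs that still need synchronising
        self.pending: List[Tuple[Cluster, str]] = []

    def ingest(self, domid: str, data: bytes) -> str:
        """Store on the local cluster if it is online, else on the remote one."""
        if self.local.online:
            first, second = self.local, self.remote
        else:
            first, second = self.remote, self.local
        if not first.online:
            raise RuntimeError("no cluster available")
        first.objects[domid] = data
        self.pending.append((second, domid))
        self.synchronise()
        return first.name

    def synchronise(self) -> None:
        """Copy pending objects to any peer that is now reachable."""
        still_pending = []
        for target, domid in self.pending:
            source = self.remote if target is self.local else self.local
            if target.online and source.online:
                target.objects[domid] = source.objects[domid]
            else:
                still_pending.append((target, domid))
        self.pending = still_pending

    def access(self, domid: str) -> bytes:
        """Serve from the local cluster, falling back to the remote peer."""
        for cluster in (self.local, self.remote):
            if cluster.online and domid in cluster.objects:
                return cluster.objects[domid]
        raise KeyError(domid)
```

Because either cluster can accept ingest and serve access, both carry normal load, in contrast to a master-standby pair where the standby sits idle.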
www.bl.uk
38
In conclusion
We plan for generations of physical storage
Migration from one generation to the next
Allow changes of supplier
Purchase incrementally in modest quantities
Move quickly when required
Be cost conscious
We provide assurance that an object is held and re-presented as it was when
ingested
We are designing a cost-effective large scale resilient solution
In summary: we take a long term view
www.bl.uk
39
Major Programmes/4
Web Archiving Programme
40
Structure of Programme
Web Archiving Programme is a collaborative
initiative, implemented broadly across two
consortia
UK Web Archiving Consortium
Developing a selective approach to web archiving, procuring a common
web archiving infrastructure and software to begin archiving activities as
early as possible
International Internet Preservation Consortium
Developing advanced web archiving technologies for the long-term,
large-scale, continuous crawling requirements enabled through
legislation
www.bl.uk
41
UK Web Archiving Consortium
Developing a selective approach to web archiving
Licence for PANDAS about to be signed with NLA
Sub-licences with consortium partners and contractor to follow
ITT concluded with Magus Research winning the contract.
Implement a common web archiving infrastructure (lots of Linux
machines + PANDAS)
Provide customisation/development of PANDAS
Provide help desk and support
www.bl.uk
42
International Internet Preservation Consortium
Developing advanced web archiving technologies
Smart Crawler
Continuous adaptive crawler, adjusting crawl priority on the fly
Based on IA Heritrix
Working on requirements now
Expect to begin tender process in June
Content Management
Archival formats
Framework
Metrics and Test Bed
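One way to picture "adjusting crawl priority on the fly" is a frontier that revisits a URL sooner when its content was found to have changed and later when it had not. The sketch below is an assumption-laden illustration, not the Smart Crawler requirements and not the Heritrix API: the AdaptiveFrontier class, the halving/doubling rule and the 24-hour default are all hypothetical.

```python
import heapq
from typing import Dict, List, Tuple


class AdaptiveFrontier:
    """Crawl frontier that revisits frequently changing URLs sooner."""

    def __init__(self, base_interval: float = 24.0) -> None:
        self.base_interval = base_interval        # hours; illustrative default
        self.interval: Dict[str, float] = {}      # current revisit interval per URL
        self.queue: List[Tuple[float, str]] = []  # heap of (next due time, URL)

    def add(self, url: str, now: float = 0.0) -> None:
        """Schedule a new URL for an immediate first visit."""
        self.interval[url] = self.base_interval
        heapq.heappush(self.queue, (now, url))

    def next_url(self) -> Tuple[float, str]:
        """Pop the URL that is due soonest."""
        return heapq.heappop(self.queue)

    def report(self, url: str, now: float, changed: bool) -> None:
        """Halve the revisit interval if the page changed, double it if not."""
        factor = 0.5 if changed else 2.0
        self.interval[url] *= factor
        heapq.heappush(self.queue, (now + self.interval[url], url))
```

After each fetch the crawler would call report() with whether the content differed from the previous capture, so the ordering of the queue adapts continuously.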
www.bl.uk
43
External Collaboration
44
Digital Library Collaborations/Partnerships
Current
UK Digital Preservation Coalition
Founder Member
TEL (The European Library Project)
Web Archiving UK
JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library Of Wales
International Internet Preservation Consortium
BNF, Library Of Congress, Internet Archive, National Archives & Library Of Canada, National Library Of
Australia, National Library Of Italy, National Libraries Of Nordic Countries
JISC Funded - Digital Curation Centre
Persistent Identifiers
DOI Foundation, European National Libraries (KB & DDB)
Resource Discovery
Union Catalogues (SUNCAT)
Digital Library Federation
www.bl.uk
45
Digital Library Collaborations/Partnerships
Potential
Secure Legal Deposit Network
6 Legal Deposit Libraries
Global Digital Format Registry
Potential Partners (National Archives, DLF)
Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
KB (Netherlands National Library & Other Partners – FP6 Bid)
Digital Rights Management
Potential Partners (Publishers, JISC)
Metadata
Publishers, Others ?
Authentication
JISC ?
Resource Discovery
Search Engine Vendors, Researchers
Others ???
www.bl.uk
46
Conclusions
• Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success
• Can You Work With Us?
www.bl.uk
47
Slide 13
British Computer Society
North London Branch
Major Programmes
Richard Boulderstone
July 27, 2004
1
Agenda
•
•
•
•
•
•
•
•
•
The British Library
Vision
Our Audiences/Customers
ILS
Digitisation
Digital Object Management
Web Archiving
Collaboration
Conclusions
Magna Carta
www.bl.uk
2
What Is The British Library ?
Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of
Science and Invention (1855), National Central Library (1916), and National
Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India
Office Library and Records in 1982, and British Institute of Recorded Sound in
1983
• Flagship building at St Pancras - largest public building project in Great
Britain in 20th century - opened in 1998
•
www.bl.uk
3
World-Class Research Library
Key Statistics 2002/3
•
•
•
•
•
•
•
•
•
•
150 million items
8.2 million items consulted or supplied
408,000 reading room visits
618,000 catalogue records created
554,000 items received on legal deposit
651 km shelf capacity 92% full add 12 km each year
18.5M Web Site Hits (www.bl.uk)
2,400 staff
£85.2 million Grant in Aid and £27.0 million trading income in 2001/2
Annual report - http://www.bl.uk/about/annual/latest.html
www.bl.uk
4
‘The World’s Knowledge’
Outcome Based Vision
… by aiding scientific advances
… by adding commercial value for
businesses
… by contributing to UK “knowledge
economy”
… through the pursuit of academic
excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural
wellbeing
… by giving information relevant to their
interests
… by helping to find the next medical
breakthrough
… by creating a link between the past,
present and future
To help people
advance
knowledge to
enrich lives
www.bl.uk
5
High
R+D
Industries
Prof.
Services
Creative
Industries Publishing
Industries
Resource Discovery
Bespoke Services
Research Services
Document Supply
Reprographics
Innovation Centre
School
Students
Libraries Teachers 11>18
On-site Visits
School Tours
Web Learning
SMEs
EDUCATION
Lifelong
Learner
Exhibitions
Events
Tours
Publishing
Visitors
(child + adult)
Lifelong
Learner
PUBLIC
BUSINESS
Lifelong
Learner
Postgraduate/
Undergraduate
RESEARCHER
Librarians
Scholars
Lifelong
Learner
Reading Rooms
Bespoke Services
Reprographics
Publishing
Document Supply
Searching Tools
LIBRARIES
Commercial
Researcher
Public
Libraries
Broadcasting
e.g. BBC
Publishing
e.g. OED
Document Supply
Resource Discovery
Training
Best Practice
H.E.
Libraries
Public
www.bl.uk
6
2
Major Programmes/1
Da Vinci Notebook
Integrated Library System (ILS)
Programme
7
ILS: Development
Data migration
Due to finish in a few days
16M+ BL records
10M+ records from other sources
Online ILS software
All online changes made (mainly interfaces) – final tests
Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
Most ones done for go live
Rest in priority order
www.bl.uk
8
ILS: Implementation
Training
Courses to end-users well underway
‘Practice’ system available
‘Search only’ training also underway
Testing
Functional testing (end to end) nearly complete
Performance poor – OPAC very slow
Automated stress testing (LoadRunner scripts)
eIS trying to find area of problem
Ex Libris experts flying over
Some security ‘hardening’ needed
www.bl.uk
9
ILS: Cutover from legacy systems
Now:
Temporary Aleph cataloguing
Phase 1 – internal processing
Staggered take-on of users to ease cutover problems
Merge ‘temporary’ records
7 June:
Phase 2 – reading rooms
Reading rooms closed for cutover 26-29 June
Mainly brand-new PCs etc rather than XP upgrade
30 June:
Phase 3 – remote users
Could be delayed major problems
30 July:
www.bl.uk
10
www.bl.uk
11
Future ILS development (ILS/2)
Current ILS development seen just as the start
Extra records
E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
E.g. Preservation records
Links to other new BL systems
E.g. Digital Object Management (images, web pages etc)
New releases of Ex Libris packages
www.bl.uk
12
Major Programmes/2
International Dunhuang Project
Digitisation Programme
13
Background
Digitisation Is The Process Of Converting Existing Physical Items Into Digital
Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character
Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image to Sub £1/Image.
Automatic Devices To Digitise Are Becoming Availalble But Are Expensive ~£150,000.
BL Has Had Fairly Ad Hoc Approach Driven By
External Funding Opportunities
Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project
Management
BL Has Created About 1.5M Digital Images So Far…
www.bl.uk
14
Digitisation Strategy
Digitisation Strategy Project Was Formerly Initiated On February 2, 2004
Key objectives for the project are to define:
Selection Criteria
Uniform Approach
Communications Plan
Sustainability
Intellectual Property Rights
External Relationship Management
Funding
Integration with DOMS
www.bl.uk
15
Project Status Information
Definitive Register of Projects
19 Complete
19 Current
20 Planning
JISC Sound (3,900 Hours)
JISC Newspapers (2M Pages of 750M Pages)
Chopin (Collaborative Project)
Early English Books Online
www.bl.uk
16
Major Programmes/3
Gutenberg Bible
Digital Object Management (DOM)
Programme
17
DOM Programme vision
Our mission is to enable the United Kingdom to preserve
and use its digital intellectual property forever
Our vision is create a management system for digital objects
that will
store and preserve any type of digital material in perpetuity
provide access to this material to users with appropriate permissions
ensure that the material is easy to find
ensure that users can view the material with contemporary applications
ensure that users can, where possible, experience material with the original look-and-feel
www.bl.uk
18
Introduction - history
Digital Library PFI
Mar 1997 – Dec 1998
Digital Library System
1999 – early 2002
Lessons
DOM Report
Nov 2002
The DOM Programme
Started September 2003
www.bl.uk
19
Drivers for the BL DOM Programme
Legal deposit legislation for non-print material was granted royal assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12T of material per year (with 50 year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….
We need a generic and cost-effective approach for the secure long term
storage of digital material that is produced by numerous initiatives
DOM – many topics to address
[Chart: components plotted by estimated size of component (low to high) against component ambiguity / complexity (low to high)]
Components shown: ILS, Web Archiving, Digitisation Programme, LDEP, Workflow, Resource Discovery, LDLSE, VDEP, Rights Management, Metadata Definition, Technical Requirements, File Format Registry, Persistent Identifiers, File Conversion Utilities, Authentication, Prototypes, SDM, RADM, Interfaces, Strategy Development
Legend: Started, Planned, Non-DOM projects, Planned co-operation
LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications
Scope - life cycle of objects
Collection
Selection
Acquisition
Accession
Description
Preservation
Storage
Preservation
Access
Resource discovery
Delivery
Rendering
Scope – objects and processes
Preservation store
Preserves the bit stream in perpetuity
Access store
Access versions
Limited formats – in the flavour of the era
Metadata to support resource discovery
Descriptive, Administrative, Links with existing tools e.g. Integrated
Library System (ILS)
Workflow
Ingest, e.g. Legal Deposit processing
[Diagram: content streams, DOM shared services and access]
Access: DOM Resource Discovery, Delivery, Digital Rights Management
Shared services: DOM Storage, Signing, Authentication, Metadata, Persistent ID
Ingest streams:
Donations
Document Supply: Publishers, Archives, Grey Literature, Non-Serial Store, Archiving, Operational Stores
Legal Deposit: LDL Secure Environment, Legal Deposit Items, Legal Deposit Processing
Web Archiving
Digitisation: St Pancras Studios, Newspapers, NSA
Timeline (2003 to 2005)
BC: ET approve Business Case & Timeline
R0 – Definition: prototype will provide a basic preservation-quality digital object storage module
R1 – Operational Storage Sub System:
• Consolidate R0 into operational system
• Provide preservation-quality digital store for materials received under Voluntary Deposit of Electronic Publications (VDEP)
• Integrate it with the existing VDEP front-end
R2 – 1st Content Stream ingest:
• Support ingest for a major content stream
• Integrate with core Library systems as required
R3 & R4 – LDEP, initial format: provide functionality for material covered by LDEP secondary legislation
R5+ – Open DOM to new projects
DOM: Project definition - 1
Functional architecture “what”
Prototyping – assessing market solutions
Example issues: digital rights, file formats, etc
Logical architecture “how – overall architecture”
Prototyping – basic functioning architecture
Example issues: allow changes to new suppliers, relationships to ILS, other projects etc
Physical architecture “how – storage & specifics”
Prototyping – principal solutions and options
Example issues: how do we build it cost-effectively today, supplier selection criteria
• Business case
• Planning – incremental implementation phases
Cross team workshops – reviewing progress, debating detailed technical
issues, planning immediate priorities, risk management & way forward
DOM: Project definition - 2
Approach is to be incremental and not ‘Big Bang’
We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a
good solution
A principal goal is to define:
An overall long term “logical architecture”
Within which, there will be successive generations of physical architectures
We are building an understanding of the storage marketplace, and we will use that knowledge to manage
procurement
We are certain that we will need >500T of storage but we are uncertain when – we thus need
flexible scalable procurement
DOM architecture - overview
[Diagram: layered view of the DOM architecture]
Users of the storage service: Resource Discovery (Non-cat, ILS based RD, Others), Rights Management, LDEP, Doc supply
Each object is addressed by a unique persistent identifier (DOMID)
DOM Storage Service: compound objects/relations, atomic objects, integrity, authenticity, local resource locator (LRL)
DOMID is mapped to node/vol/LRL
DOM Physical Storage
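The mapping from a persistent DOMID to node/vol/LRL can be pictured with a small sketch. This is an illustration only, not the DOM design: the class names, the `domid:` identifier format and the example node and volume names are all assumptions made for the example.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Location:
    """One physical copy of an object: node / volume / local resource locator."""
    node: str
    volume: str
    lrl: str


class DomidRegistry:
    """Hypothetical registry that maps a persistent DOMID to physical locations."""

    def __init__(self) -> None:
        self._next = count(1)
        self._locations: dict[str, list[Location]] = {}

    def mint(self) -> str:
        """Issue a new identifier that says nothing about where the object lives."""
        domid = f"domid:{next(self._next):08d}"  # assumed format, for illustration
        self._locations[domid] = []
        return domid

    def add_location(self, domid: str, location: Location) -> None:
        """Record one more physical copy of the object."""
        self._locations[domid].append(location)

    def move(self, domid: str, old: Location, new: Location) -> None:
        """Re-point a DOMID after a storage migration; the DOMID itself is unchanged."""
        copies = self._locations[domid]
        copies[copies.index(old)] = new

    def resolve(self, domid: str) -> list[Location]:
        """Return the current physical locations for a DOMID."""
        return list(self._locations[domid])


registry = DomidRegistry()
domid = registry.mint()
first_home = Location("node-a", "vol-01", "objects/0001")
registry.add_location(domid, first_home)
# A later generation of storage replaces node-a; users still quote the same DOMID.
registry.move(domid, first_home, Location("node-b", "vol-07", "objects/0001"))
```

Keeping the identifier free of location detail is what would let successive generations of physical storage be swapped in without changing any reference held by the ILS or other systems.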
DOM System (release 3)
[Diagram: components of the DOM System]
Components shown: Aleph, Storage subsystem (two instances), Access, Mailroom, Publishers, Shared services, Administration
DOM logical architecture –
integrity and authenticity
Integrity:
System has capability to continuously monitor the object store to detect
object corruption
It would then initiate object recovery
Authenticity:
A process is defined to provide long-term assurance that an object that
is re-presented is as it was when it was ingested
Based on the use of cryptographic signing techniques
Each object is signed when it is ingested
The signature is verified when required
The signing mechanism is “tightly” controlled
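A minimal sketch of the sign-at-ingest, verify-on-demand cycle described above. The slides do not name the signing technique, so this example substitutes an HMAC-SHA256 keyed tag from the Python standard library for a true digital signature; the `ObjectStore` class and its method names are hypothetical.

```python
import hashlib
import hmac


def sign(data: bytes, key: bytes) -> str:
    """Return a hex HMAC-SHA256 tag for the object's bit stream."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify(data: bytes, signature: str, key: bytes) -> bool:
    """Check that the bit stream still matches the tag recorded at ingest."""
    return hmac.compare_digest(sign(data, key), signature)


class ObjectStore:
    """Hypothetical store that signs each object once, when it is ingested."""

    def __init__(self, key: bytes) -> None:
        self._key = key  # stands in for the tightly controlled signing mechanism
        self._objects: dict[str, bytes] = {}
        self._signatures: dict[str, str] = {}

    def ingest(self, object_id: str, data: bytes) -> None:
        """Store the bit stream and record its tag."""
        self._objects[object_id] = data
        self._signatures[object_id] = sign(data, self._key)

    def present(self, object_id: str) -> bytes:
        """Authenticity: refuse to re-present an object that no longer verifies."""
        data = self._objects[object_id]
        if not verify(data, self._signatures[object_id], self._key):
            raise ValueError(f"object {object_id} failed verification")
        return data

    def scan(self) -> list[str]:
        """Integrity sweep: list the objects whose stored bits have changed."""
        return [
            object_id
            for object_id, data in self._objects.items()
            if not verify(data, self._signatures[object_id], self._key)
        ]
```

Running `scan()` on a schedule would correspond to the continuous monitoring mentioned above; any object it reports would then be recovered from another copy.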
Procuring physical storage in volume
A major cost is in physical storage
The market for storage systems is changing rapidly, and this implies that “lock-in” is
not sensible
We thus need flexibility to change supplier over time
Cost of storage is reducing by 30-40% per year
Hence procure on rolling basis just ahead of demand
Replace storage on a rolling basis on expiry of warranty
The rolling programmes imply the need to be able to support a heterogeneous
product solution
The design of the logical architecture thus supports storage sourced from multiple
storage vendors
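The case for buying just ahead of demand can be checked with simple arithmetic. The sketch below compares buying all capacity up front with buying each year's increment at that year's price. The starting price, the even 100T-per-year demand profile and the 35% decline (the midpoint of the 30-40% quoted above) are illustrative assumptions, and warranty-driven replacement is left out.

```python
def upfront_cost(demand_tb_by_year: list[float], price_per_tb: float) -> float:
    """Spend if all the capacity is bought in year 0 at today's price."""
    return sum(demand_tb_by_year) * price_per_tb


def rolling_cost(
    demand_tb_by_year: list[float], price_per_tb: float, annual_decline: float
) -> float:
    """Spend if each year's extra capacity is bought at that year's price."""
    total = 0.0
    for year, added_tb in enumerate(demand_tb_by_year):
        total += added_tb * price_per_tb * (1 - annual_decline) ** year
    return total


demand = [100.0] * 5   # assumed: 500T reached in five equal yearly steps
price = 1000.0         # assumed price per T in year 0, arbitrary currency units

upfront = upfront_cost(demand, price)
rolling = rolling_cost(demand, price, annual_decline=0.35)
print(f"up front: {upfront:,.0f}  rolling: {rolling:,.0f}")
```

Under these assumptions the rolling purchase costs roughly half the up-front purchase, which is the effect the procurement approach relies on.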
Disaster tolerance and the organisation
of storage clusters
One can obtain commercial disaster recovery (DR) solutions for common equipment
configurations
However, one cannot obtain such solutions for multi-100 TB systems
So we must build DR into the design of the system
A single site solution, subject to a common-mode disaster, would suffer considerable loss
of availability after a disaster, and so is not acceptable
This implies that we need a multi-site solution
Conventionally these are based on a master-standby arrangement, where only 50% of the kit is delivering
normal service
Our design is based on the use of multiple autonomous independent peer clusters that
cross-synchronise
so 100% of the kit delivers normal service
DOM architecture in the context of the
storage solution market
The dominant segment of the market focuses on delivering performance within
a highly resilient single cluster
However:
Many of our objects will be rarely accessed
so we do not want to pay for “maximised” performance we do not need
We have resilience by using multiple clusters, hence we have a reduced
need for resilience within a cluster
so we do not want to pay for “maximised” resilience we do not need
We are using these drivers to design a cost-effective large scale resilient
solution
DOM storage subsystem architecture - overview
[Diagram: two storage clusters, each comprising a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage]
DOM Shared Services:
• Unique ID
• Signing
• Logging
DOM storage subsystem architecture - access
[Diagram: two storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical Storage) and DOM shared central services (Unique ID, Signing, Logging)]
Normal access/delivery is from local storage cluster
DOM storage subsystem architecture - access
[Diagram: two storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical Storage) and DOM shared central services (Unique ID, Signing, Logging)]
When a cluster is off-line then access/delivery is from a remote storage cluster
DOM storage subsystem architecture - ingest
[Diagram: two storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical Storage) and DOM shared central services (Unique ID, Signing, Logging)]
Normal ingest is to the local storage cluster and then the remote cluster is synchronised
Arrows: Store; Synchronise remote store
DOM storage subsystem architecture - ingest
[Diagram: two storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical Storage) and DOM shared central services (Unique ID, Signing, Logging)]
When a cluster is off-line then ingest is managed by the remote storage cluster and the local cluster is synchronised later
Arrows: Store; Synchronise remote store later
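The access and ingest behaviour on the four storage subsystem slides can be sketched as a toy model of two peer clusters. This is a simplification under stated assumptions: `Cluster`, `Gateway` and `resynchronise` are hypothetical names, synchronisation is shown as an immediate copy, and the shared services (unique ID, signing, logging) are left out.

```python
class Cluster:
    """Toy storage cluster: an object store plus the writes owed to its peer."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.online = True
        self.objects: dict[str, bytes] = {}
        self.pending: dict[str, bytes] = {}  # writes the off-line peer has missed


class Gateway:
    """Hypothetical storage gateway in front of a local cluster and one remote peer."""

    def __init__(self, local: Cluster, remote: Cluster) -> None:
        self.local = local
        self.remote = remote

    def ingest(self, object_id: str, data: bytes) -> str:
        """Store on the local cluster if it is up, otherwise on the remote one."""
        if self.local.online:
            primary, peer = self.local, self.remote
        else:
            primary, peer = self.remote, self.local
        if not primary.online:
            raise RuntimeError("no storage cluster is online")
        primary.objects[object_id] = data
        if peer.online:
            peer.objects[object_id] = data      # synchronise the other cluster now
        else:
            primary.pending[object_id] = data   # synchronise it later
        return primary.name

    def fetch(self, object_id: str) -> bytes:
        """Deliver from the local cluster, falling back to the remote cluster."""
        for cluster in (self.local, self.remote):
            if cluster.online and object_id in cluster.objects:
                return cluster.objects[object_id]
        raise KeyError(object_id)


def resynchronise(source: Cluster, target: Cluster) -> None:
    """Replay the writes that the target missed while it was off-line."""
    target.objects.update(source.pending)
    source.pending.clear()
```

Because either cluster can accept ingest and serve delivery, both are doing normal work all the time, unlike a master-standby pair.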
In conclusion
We plan for generations of physical storage
Migration from one generation to the next
Allow changes of supplier
Purchase incrementally in modest quantities
Move quickly when required
Be cost conscious
We provide assurance that an object is held and re-presented as when it was
ingested
We are designing a cost-effective large scale resilient solution
In summary: we take a long term view
Major Programmes/4
Web Archiving Programme
Structure of Programme
Web Archiving Programme is a collaborative
initiative, implemented broadly across two
consortia
UK Web Archiving Consortium
Developing a selective approach to web archiving, procuring a common
web archiving infrastructure and software to begin archiving activities as
early as possible
International Internet Preservation Consortium
Developing advanced web archiving technologies for the long-term,
large-scale, continuous crawling requirements enabled through
legislation
UK Web Archiving Consortium
Developing a selective approach to web archiving
License for PANDAS about to be signed with NLA
Sub-licenses with consortium partners and contractor to follow
ITT concluded with Magus Research winning the contract.
Implement a common web archiving infrastructure (lots of Linux
machines + PANDAS)
Provide customisation/development of PANDAS
Provide help desk and support
International Internet Preservation
Consortium
Developing advanced web archiving technologies
Smart Crawler
Continuous adaptive crawler, adjusting crawl priority on the fly
Based on IA Heritrix
Working on requirements now
Expect to begin tender process in June
Content Management
Archival formats
Framework
Metrics and Test Bed
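The idea of a crawler that adjusts crawl priority on the fly can be illustrated with a re-prioritisable queue. This is a generic sketch using Python's `heapq` module, not Heritrix's API or the consortium's design; the `CrawlFrontier` class, the priority values and the example.org URLs are assumptions.

```python
import heapq
import itertools


class CrawlFrontier:
    """Queue of URLs whose priorities can be changed while the crawl is running."""

    def __init__(self) -> None:
        self._heap: list[list] = []
        self._live: dict[str, list] = {}   # url -> its current heap entry
        self._tie = itertools.count()      # keeps ordering stable for equal priorities

    def schedule(self, url: str, priority: int) -> None:
        """Add a URL, or change its priority (lower number = crawl sooner)."""
        old = self._live.get(url)
        if old is not None:
            old[2] = None                  # mark the superseded entry as stale
        entry = [priority, next(self._tie), url]
        self._live[url] = entry
        heapq.heappush(self._heap, entry)

    def next_url(self):
        """Pop the most urgent live URL, skipping stale entries; None when empty."""
        while self._heap:
            _, _, url = heapq.heappop(self._heap)
            if url is not None:
                del self._live[url]
                return url
        return None


frontier = CrawlFrontier()
frontier.schedule("http://example.org/news", 5)
frontier.schedule("http://example.org/about", 3)
# The news page is observed to change often, so it is promoted mid-crawl.
frontier.schedule("http://example.org/news", 1)
```

A continuous crawler would call `schedule` again after each visit, using how much the page had changed to pick the next priority.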
External Collaboration
Digital Library Collaborations/Partnerships
Current
UK Digital Preservation Coalition
Founder Member
TEL (The European Library Project)
Web Archiving UK
JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library Of Wales
International Internet Preservation Consortium
BNF, Library Of Congress, Internet Archive, National Archives & Library Of Canada, National Library Of
Australia, National Library Of Italy, National Libraries Of Nordic Countries
JISC Funded - Digital Curation Centre
Persistent Identifiers
DOI Foundation, European National Libraries (KB & DDB)
Resource Discovery
Union Catalogues (SUNCAT)
Digital Library Federation
Digital Library Collaborations/Partnerships
Potential
Secure Legal Deposit Network
6 Legal Deposit Libraries
Global Digital Format Registry
Potential Partners (National Archives, DLF)
Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
KB (Netherlands National Library & Other Partners – FP6 Bid)
Digital Rights Management
Potential Partners (Publishers, JISC)
Metadata
Publishers, Others ?
Authentication
JISC ?
Resource Discovery
Search Engine Vendors, Researchers
Others ???
Conclusions
• Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success
• Can You Work With Us?
Slide 15
British Computer Society
North London Branch
Major Programmes
Richard Boulderstone
July 27, 2004
1
Agenda
•
•
•
•
•
•
•
•
•
The British Library
Vision
Our Audiences/Customers
ILS
Digitisation
Digital Object Management
Web Archiving
Collaboration
Conclusions
Magna Carta
www.bl.uk
2
What Is The British Library ?
Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of
Science and Invention (1855), National Central Library (1916), and National
Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India
Office Library and Records in 1982, and British Institute of Recorded Sound in
1983
• Flagship building at St Pancras - largest public building project in Great
Britain in 20th century - opened in 1998
•
www.bl.uk
3
World-Class Research Library
Key Statistics 2002/3
•
•
•
•
•
•
•
•
•
•
150 million items
8.2 million items consulted or supplied
408,000 reading room visits
618,000 catalogue records created
554,000 items received on legal deposit
651 km shelf capacity 92% full add 12 km each year
18.5M Web Site Hits (www.bl.uk)
2,400 staff
£85.2 million Grant in Aid and £27.0 million trading income in 2001/2
Annual report - http://www.bl.uk/about/annual/latest.html
www.bl.uk
4
‘The World’s Knowledge’
Outcome Based Vision
… by aiding scientific advances
… by adding commercial value for
businesses
… by contributing to UK “knowledge
economy”
… through the pursuit of academic
excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural
wellbeing
… by giving information relevant to their
interests
… by helping to find the next medical
breakthrough
… by creating a link between the past,
present and future
To help people
advance
knowledge to
enrich lives
www.bl.uk
5
[Diagram: Our Audiences/Customers. Five audience groups (BUSINESS, EDUCATION, PUBLIC, RESEARCHER, LIBRARIES), each shown with its typical users (e.g. High R+D Industries, SMEs, School Students, Teachers, Visitors, Postgraduates/Undergraduates, Scholars, Public and H.E. Libraries, with the Lifelong Learner common to several groups) and the services offered to them (e.g. Resource Discovery, Document Supply, Reprographics, Bespoke Services, Reading Rooms, Exhibitions, School Tours, Web Learning, Training, Best Practice)]
6
Major Programmes/1
Da Vinci Notebook
Integrated Library System (ILS)
Programme
7
ILS: Development
Data migration
Due to finish in a few days
16M+ BL records
10M+ records from other sources
Online ILS software
All online changes made (mainly interfaces) – final tests
Web OPAC configuration – tested by staff, HE and expert users
Batch imports / exports
Most done for go-live
Rest in priority order
8
ILS: Implementation
Training
Courses to end-users well underway
‘Practice’ system available
‘Search only’ training also underway
Testing
Functional testing (end to end) nearly complete
Performance poor – OPAC very slow
Automated stress testing (LoadRunner scripts)
eIS trying to find area of problem
Ex Libris experts flying over
Some security ‘hardening’ needed
9
ILS: Cutover from legacy systems
Now: Temporary Aleph cataloguing
7 June: Phase 1 – internal processing
Staggered take-on of users to ease cutover problems
Merge ‘temporary’ records
30 June: Phase 2 – reading rooms
Reading rooms closed for cutover 26-29 June
Mainly brand-new PCs etc rather than XP upgrade
30 July: Phase 3 – remote users
Could be delayed if there are major problems
10
11
Future ILS development (ILS/2)
Current ILS development seen just as the start
Extra records
E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
E.g. Preservation records
Links to other new BL systems
E.g. Digital Object Management (images, web pages etc)
New releases of Ex Libris packages
12
Major Programmes/2
International Dunhuang Project
Digitisation Programme
13
Background
Digitisation Is The Process Of Converting Existing Physical Items Into Digital
Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character
Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image to Sub £1/Image.
Automatic Devices To Digitise Are Becoming Available But Are Expensive (~£150,000).
BL Has Had Fairly Ad Hoc Approach Driven By
External Funding Opportunities
Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project
Management
BL Has Created About 1.5M Digital Images So Far…
14
Digitisation Strategy
Digitisation Strategy Project Was Formally Initiated On February 2, 2004
Key objectives for the project are to define:
Selection Criteria
Uniform Approach
Communications Plan
Sustainability
Intellectual Property Rights
External Relationship Management
Funding
Integration with DOMS
15
Project Status Information
Definitive Register of Projects
19 Complete
19 Current
20 Planning
JISC Sound (3,900 Hours)
JISC Newspapers (2M Pages of 750M Pages)
Chopin (Collaborative Project)
Early English Books Online
16
Major Programmes/3
Gutenberg Bible
Digital Object Management (DOM)
Programme
17
DOM Programme vision
Our mission is to enable the United Kingdom to preserve
and use its digital intellectual property forever
Our vision is to create a management system for digital objects that will
store and preserve any type of digital material in perpetuity
provide access to this material to users with appropriate permissions
ensure that the material is easy to find
ensure that users can view the material with contemporary applications
ensure that users can, where possible, experience material with the original look-and-feel
18
Introduction - history
Digital Library PFI
Mar 1997 – Dec 1998
Digital Library System
1999 – early 2002
Lessons
DOM Report
Nov 2002
The DOM Programme
Started September 2003
19
Drivers for the BL DOM Programme
Legal deposit legislation for non-print material was granted royal assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12T of material per year (with 50 year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….
We need a generic and cost-effective approach for the secure long term
storage of digital material that is produced by numerous initiatives
20
DOM – many topics to address
[Chart: DOM components plotted by ESTIMATED SIZE OF COMPONENT (LOW to HIGH) against COMPONENT AMBIGUITY / COMPLEXITY (LOW to HIGH)]
Components shown: ILS, Web Archiving, Digitisation Programme, LDEP, Workflow, Resource Discovery, LDLSE, VDEP, Rights Management, Metadata Definition, Technical Requirements, File Format Registry, Persistent Identifiers, File Conversion Utilities, Authentication, Prototypes, SDM, RADM, Interfaces, Strategy Development
Legend: Started, Planned, Non-DOM projects, Planned co-operation
LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications
21
Scope - life cycle of objects
Collection: Selection, Acquisition, Accession, Description
Preservation: Storage, Preservation
Access: Resource discovery, Delivery, Rendering
22
Scope – objects and processes
Preservation store
Preserves the bit stream in perpetuity
Access store
Access versions
Limited formats – in the flavour of the era
Metadata to support resource discovery
Descriptive, Administrative, Links with existing tools e.g. Integrated Library System (ILS)
Workflow
Ingest, e.g. Legal Deposit processing
23
[Diagram: DOM in context]
Content sources feeding Ingest: Donations; Document Supply (Publishers, Archives, Grey Literature, Non-Serial Store, Archiving, Operational Stores); Legal Deposit (LDL Secure Environment, Legal Deposit Items, Legal Deposit Processing); Web Archiving; Digitisation (St Pancras Studios, Newspapers, NSA)
DOM shared services: Storage, Signing, Authentication, Metadata, Persistent ID, Ingest
DOM access: Resource Discovery, Delivery, Digital Rights Management
24
Timeline
[Timeline diagram, 2003 to 2005]
BC: ET approve Business Case & Timeline
R0 (Definition): Prototype will provide a basic preservation-quality digital object storage module
R1 (Operational Storage Sub System):
• Consolidate R0 into operational system
• Provide preservation-quality digital store for materials received under Voluntary Deposit of Electronic Publications (VDEP)
• Integrate it with the existing VDEP front-end
R2 (1st Content Stream ingest):
• Support ingest for a major content stream
• Integrate with core Library systems as required
R3 & R4 (LDEP - initial format): Provide functionality for material covered by LDEP secondary legislation
R5+: Open DOM to new projects
25
DOM: Project definition - 1
Functional Architecture “What”
Example issues: digital rights, file formats, etc
Prototyping – assessing market solutions
Logical architecture “how – overall architecture”
Example issues: allow changes to new suppliers, relationships to ILS, other projects etc
Prototyping - basic functioning architecture
Physical architecture “how – storage & specifics”
Example issues: how do we build it cost-effectively today, supplier selection criteria
Prototyping - principal solutions and options
• Business case
• Planning – incremental implementation phases
Cross team workshops – reviewing progress, debating detailed technical issues, planning immediate priorities, risk management & way forward
26
DOM: Project definition - 2
Approach is to be incremental and not ‘Big Bang’
We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a
good solution
A principal goal is to define:
An overall long term “logical architecture”
Within which, there will be successive generations of physical architectures
We are building an understanding of the storage marketplace, and we will use that knowledge to manage procurement
We are certain that we will need >500T of storage but we are uncertain when – we thus need flexible, scalable procurement
27
DOM architecture - overview
[Diagram: DOM architecture in three layers]
Clients: Resource Discovery (ILS based RD, Non-cat, Others), Rights Management, LDEP, Doc supply. Clients refer to an OBJECT by its DOMID
DOM Storage Service: Compound objects/relations, Integrity, Authenticity, Unique persistent identifier (DOMID). DOMID is mapped to node/vol/LRL (LRL = local resource locator)
DOM Physical Storage: Atomic Objects
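The mapping from a persistent DOMID to node/vol/LRL can be pictured with a minimal sketch. Everything named here (the `Location` fields, the `DomIdResolver` class, the UUID-based identifier format) is hypothetical and only for illustration; the slides do not say how DOMIDs are formed or resolved.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    """Where an object physically lives (hypothetical fields)."""
    node: str  # storage node
    vol: str   # volume on that node
    lrl: str   # local resource locator within the volume


class DomIdResolver:
    """Maps persistent DOMIDs to physical locations."""

    def __init__(self):
        self._table = {}

    def register(self, location):
        # Assumption: a random UUID stands in for the real DOMID scheme.
        domid = f"domid:{uuid.uuid4()}"
        self._table[domid] = location
        return domid

    def relocate(self, domid, new_location):
        # Physical storage can change between generations; the DOMID does not.
        if domid not in self._table:
            raise KeyError(domid)
        self._table[domid] = new_location

    def resolve(self, domid):
        return self._table[domid]


resolver = DomIdResolver()
domid = resolver.register(Location("node-a", "vol-01", "obj/0001"))
resolver.relocate(domid, Location("node-b", "vol-07", "obj/0001"))
```

The point of the indirection is that catalogue records and other clients hold only the DOMID, so a migration to new physical storage changes the table and nothing else.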
28
DOM System (release 3)
[Diagram: DOM System (release 3). Components shown: Publishers, Mailroom and Aleph outside the system boundary; Access, Administration, Shared services and two Storage subsystems inside the DOM System]
29
DOM logical architecture – integrity and authenticity
Integrity:
System has capability to continuously monitor the object store to detect
object corruption
It would then initiate object recovery
Authenticity:
A process is defined to provide long-term assurance that an object that
is re-presented is as it was when it was ingested
Based on the use of cryptographic signing techniques
Each object is signed when it is ingested
The signature is verified when required
The signing mechanism is “tightly” controlled
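The sign-at-ingest, verify-on-demand and monitor-then-recover steps above can be sketched as follows. This is an illustration under stated assumptions, not the BL design: HMAC-SHA256 from the Python standard library stands in for whatever signing technique the programme adopted, and the `ObjectStore` class and the key handling are hypothetical.

```python
import hashlib
import hmac

# Assumption: in practice the key would sit in a tightly controlled signing service.
SIGNING_KEY = b"demo-key-not-for-production"


def sign(data: bytes) -> str:
    """Return a hex signature for the object's bit stream."""
    return hmac.new(SIGNING_KEY, data, hashlib.sha256).hexdigest()


def verify(data: bytes, signature: str) -> bool:
    """Check that the bit stream still matches the signature made at ingest."""
    return hmac.compare_digest(sign(data), signature)


class ObjectStore:
    def __init__(self):
        self._objects = {}     # object id -> bytes
        self._signatures = {}  # object id -> signature taken at ingest

    def ingest(self, object_id: str, data: bytes) -> None:
        self._objects[object_id] = data
        self._signatures[object_id] = sign(data)

    def audit(self) -> list:
        """One pass of integrity monitoring: return ids of corrupted objects."""
        return [oid for oid, data in self._objects.items()
                if not verify(data, self._signatures[oid])]

    def recover(self, object_id: str, good_copy: bytes) -> None:
        """Replace a corrupted object, but only with a copy that verifies."""
        if not verify(good_copy, self._signatures[object_id]):
            raise ValueError("replacement copy fails verification")
        self._objects[object_id] = good_copy


store = ObjectStore()
store.ingest("obj-1", b"Magna Carta scan")
store._objects["obj-1"] = b"Magna Carta scam"  # simulate bit rot
corrupted = store.audit()
store.recover("obj-1", b"Magna Carta scan")
```

In a multi-cluster design the good copy passed to `recover` would naturally come from a peer cluster.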
30
Procuring physical storage in volume
A major cost is in physical storage
The market for storage systems is changing rapidly, and this implies that “lock-in” is
not sensible
We thus need flexibility to change supplier over time
Cost of storage is reducing by 30-40% per year
Hence procure on rolling basis just ahead of demand
Replace storage on a rolling basis on expiry of warranty
The rolling programmes imply the need to be able to support a heterogeneous
product solution
The design of the logical architecture thus supports storage sourced from multiple
storage vendors
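A small worked example shows why falling prices favour buying just ahead of demand. The demand curve and starting price below are hypothetical; only the 30-40% annual price decline comes from the slide (35% is used here). The sketch ignores replacement on expiry of warranty.

```python
def upfront_cost(demand_tb, price_per_tb):
    """Buy the final capacity in year 0 at today's price."""
    return max(demand_tb) * price_per_tb


def rolling_cost(demand_tb, price_per_tb, annual_decline):
    """Buy only each year's extra capacity at that year's (lower) price."""
    total, owned = 0.0, 0.0
    for year, needed in enumerate(demand_tb):
        extra = max(0.0, needed - owned)
        total += extra * price_per_tb * (1 - annual_decline) ** year
        owned += extra
    return total


# Hypothetical figures: cumulative demand in TB per year and a starting price.
demand = [100, 200, 300, 400, 500]
start_price = 1000.0  # currency units per TB, illustrative only

upfront = upfront_cost(demand, start_price)
rolling = rolling_cost(demand, start_price, annual_decline=0.35)
print(f"upfront: {upfront:,.0f}  rolling: {rolling:,.0f}")
```

With these illustrative inputs the rolling purchases cost roughly half of the single upfront purchase, and the gap widens if demand arrives later than expected.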
31
Disaster tolerance and the organisation of storage clusters
One can obtain commercial disaster recovery (DR) solutions for common equipment
configurations
However, one cannot obtain such solutions for systems holding several hundred terabytes
So we must build DR into the design of the system
A single site solution, subject to a common-mode disaster, would suffer considerable loss
of availability after a disaster, and so is not acceptable
This implies that we need a multi-site solution
Conventionally these are based on a master-standby where only 50% of kit is delivering
normal service
Our design is based on the use of multiple autonomous independent peer clusters that
cross-synchronise
so 100% of the kit delivers normal service
32
DOM architecture in the context of the storage solution market
The dominant segment of the market focuses on delivering performance within
a highly resilient single cluster
However:
Many of our objects will be rarely accessed
so we do not want to pay for “maximised” performance we do not need
We have resilience by using multiple clusters, hence we have a reduced
need for resilience within a cluster
so we do not want to pay for “maximised” resilience we do not need
We are using these drivers to design a cost-effective large scale resilient
solution
33
DOM storage subsystem architecture - overview
[Diagram: two storage clusters, each made up of a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, together with DOM Shared Services]
DOM Shared Services:
• Unique ID
• Signing
• Logging
34
DOM storage subsystem architecture - access
Normal access/delivery is from local storage cluster
[Diagram: two storage clusters (each a DOM Storage gateway, DOM Storage Service and DOM Physical Storage) and the DOM central Shared Services (Unique ID, Signing, Logging)]
35
DOM storage subsystem architecture - access
When a cluster is off-line then access/delivery is from a remote storage cluster
[Diagram: two storage clusters (each a DOM Storage gateway, DOM Storage Service and DOM Physical Storage) and the DOM central Shared Services (Unique ID, Signing, Logging)]
36
DOM storage subsystem architecture - ingest
Normal ingest is to the local storage cluster and then the remote cluster is synchronised
[Diagram: two storage clusters (each a DOM Storage gateway, DOM Storage Service and DOM Physical Storage) and the DOM central Shared Services (Unique ID, Signing, Logging). Arrows: Store (to the local cluster), Synchronise remote store]
37
DOM storage subsystem architecture - ingest
When a cluster is off-line then ingest is managed by the remote storage cluster and the local cluster is synchronised later
[Diagram: two storage clusters (each a DOM Storage gateway, DOM Storage Service and DOM Physical Storage) and the DOM central Shared Services (Unique ID, Signing, Logging). Arrows: Store (to the remote cluster), Synchronise remote store later]
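The access and ingest behaviour described on the last four slides can be sketched with two peer clusters. The `Cluster` class, the `route` function and the in-memory dictionaries are hypothetical stand-ins; the slides describe the behaviour, not an implementation.

```python
class Cluster:
    """One autonomous storage cluster (gateway + service + physical storage)."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.objects = {}  # object id -> bytes
        self.pending = []  # (peer, object id) pairs still to synchronise

    def ingest(self, object_id, data, peers):
        # Store locally, then synchronise each peer now or queue it for later.
        self.objects[object_id] = data
        for peer in peers:
            if peer.online:
                peer.objects[object_id] = data
            else:
                self.pending.append((peer, object_id))

    def flush_pending(self):
        # Deliver queued objects to peers that have come back on-line.
        still_waiting = []
        for peer, object_id in self.pending:
            if peer.online:
                peer.objects[object_id] = self.objects[object_id]
            else:
                still_waiting.append((peer, object_id))
        self.pending = still_waiting


def route(local, remote):
    """Pick the local cluster when it is on-line, otherwise the remote one."""
    if local.online:
        return local
    if remote.online:
        return remote
    raise RuntimeError("no storage cluster available")


site_a, site_b = Cluster("A"), Cluster("B")

# Normal ingest at site A: stored locally, remote cluster synchronised.
route(site_a, site_b).ingest("obj-1", b"one", peers=[site_b])

# Site A goes off-line: ingest is handled by B, A is synchronised later.
site_a.online = False
route(site_a, site_b).ingest("obj-2", b"two", peers=[site_a])
served_from = route(site_a, site_b).name

site_a.online = True
site_b.flush_pending()
```

Because both clusters accept ingest and serve access in normal running, neither sits idle as a standby.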
38
In conclusion
We plan for generations of physical storage
Migration from one generation to the next
Allow changes of supplier
Purchase incrementally in modest quantities
Move quickly when required
Be cost conscious
We provide assurance that an object is held and re-presented as it was when it was ingested
We are designing a cost-effective large scale resilient solution
In summary: we take a long term view
39
Major Programmes/4
Web Archiving Programme
40
Structure of Programme
Web Archiving Programme is a collaborative initiative, implemented roughly across two consortia
UK Web Archiving Consortium
Developing a selective approach to web archiving, procuring a common web archiving infrastructure and software to begin archiving activities at the earliest opportunity
International Internet Preservation Consortium
Developing advanced web archiving technologies for the long-term, large-scale, continuous crawling requirements enabled through legislation
41
UK Web Archiving Consortium
Developing a selective approach to web archiving
Licence for PANDAS about to be signed with NLA
Sub-licences with consortium partners and contractor to follow
ITT concluded with Magus Research winning the contract.
Implement a common web archiving infrastructure (lots of Linux machines + PANDAS)
Provide customisation/development of PANDAS
Provide help desk and support
42
International Internet Preservation Consortium
Developing advanced web archiving technologies
Smart Crawler
Continuous adaptive crawler, adjusting crawl priority on the fly
Based on IA Heritrix
Working on requirements now
Expect to begin tender process in June
Content Management
Archival formats
Framework
Metrics and Test Bed
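The Smart Crawler idea of "adjusting crawl priority on the fly" can be illustrated with a priority queue whose entries may be rescheduled. This is a generic sketch, not the Heritrix API or the IIPC design; the `CrawlFrontier` class and the priority values are hypothetical.

```python
import heapq
import itertools


class CrawlFrontier:
    """Priority queue of URLs whose priorities can change on the fly."""

    def __init__(self):
        self._heap = []
        self._entries = {}                 # url -> live heap entry
        self._counter = itertools.count()  # tie-breaker for equal priorities

    def schedule(self, url, priority):
        # Lower number = crawl sooner. Rescheduling invalidates the old entry.
        if url in self._entries:
            self._entries[url][2] = None
        entry = [priority, next(self._counter), url]
        self._entries[url] = entry
        heapq.heappush(self._heap, entry)

    def next_url(self):
        # Skip entries that were invalidated by a later reschedule.
        while self._heap:
            _, _, url = heapq.heappop(self._heap)
            if url is not None:
                del self._entries[url]
                return url
        return None


frontier = CrawlFrontier()
frontier.schedule("http://example.org/archive", 5)
frontier.schedule("http://example.org/news", 5)
frontier.schedule("http://example.org/news", 1)  # page found to change often
order = [frontier.next_url(), frontier.next_url(), frontier.next_url()]
```

A continuous crawler would call `schedule` again after each visit, raising or lowering a page's priority according to how much it changed.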
43
External Collaboration
44
Digital Library Collaborations/Partnerships
Current
UK Digital Preservation Coalition
Founder Member
TEL (The European Library Project)
Web Archiving UK
JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library Of Wales
International Internet Preservation Consortium
BNF, Library Of Congress, Internet Archive, National Archives & Library Of Canada, National Library Of Australia, National Library Of Italy, National Libraries Of Nordic Countries
JISC Funded - Digital Curation Centre
Persistent Identifiers
DOI Foundation, European National Libraries (KB & DDB)
Resource Discovery
Union Catalogues (SUNCAT)
Digital Library Federation
45
Digital Library Collaborations/Partnerships
Potential
Secure Legal Deposit Network
6 Legal Deposit Libraries
Global Digital Format Registry
Potential Partners (National Archives, DLF)
Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
KB (Netherlands National Library & Other Partners – FP6 Bid)
Digital Rights Management
Potential Partners (Publishers, JISC)
Metadata
Publishers, Others ?
Authentication
JISC ?
Resource Discovery
Search Engine Vendors, Researchers
Others ???
46
Conclusions
• Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success
• Can You Work With Us?
47
Global Digital Format Registry
Potential Partners (National Archives, DLF)
Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
KB (Netherlands National Library & Other Partners – FP6 Bid)
Digital Rights Management
Potential Partners (Publishers, JISC)
Metadata
Publishers, Others ?
Authentication
JISC ?
Resource Discovery
Search Engine Vendors, Researchers
Others ???
www.bl.uk
46
Conclusions
Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success
•
•
Can You Work With Us?
www.bl.uk
47
Slide 17
British Computer Society
North London Branch
Major Programmes
Richard Boulderstone
July 27, 2004
Agenda
• The British Library
• Vision
• Our Audiences/Customers
• ILS
• Digitisation
• Digital Object Management
• Web Archiving
• Collaboration
• Conclusions
Magna Carta
What Is The British Library?
• Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of Science and Invention (1855), National Central Library (1916), and National Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India Office Library and Records in 1982, and British Institute of Recorded Sound in 1983
• Flagship building at St Pancras - largest public building project in Great Britain in 20th century - opened in 1998
World-Class Research Library
Key Statistics 2002/3
• 150 million items
• 8.2 million items consulted or supplied
• 408,000 reading room visits
• 618,000 catalogue records created
• 554,000 items received on legal deposit
• 651 km shelf capacity, 92% full; adding 12 km each year
• 18.5M Web Site Hits (www.bl.uk)
• 2,400 staff
• £85.2 million Grant in Aid and £27.0 million trading income in 2001/2
• Annual report - http://www.bl.uk/about/annual/latest.html
‘The World’s Knowledge’
Outcome Based Vision
To help people advance knowledge to enrich lives
… by aiding scientific advances
… by adding commercial value for businesses
… by contributing to UK “knowledge economy”
… through the pursuit of academic excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural wellbeing
… by giving information relevant to their interests
… by helping to find the next medical breakthrough
… by creating a link between the past, present and future
[Diagram: audience segments BUSINESS, EDUCATION, PUBLIC, RESEARCHER and LIBRARIES, each surrounded by its user groups and the services offered to them]
User groups shown: High R+D Industries, Prof. Services, Creative Industries, Publishing Industries, SMEs, Commercial Researcher, School Students, Libraries, Teachers 11>18, Lifelong Learner, Visitors (child + adult), Postgraduate/Undergraduate, Librarians, Scholars, Public Libraries, H.E. Libraries, Broadcasting (e.g. BBC), Publishing (e.g. OED)
Services shown: Resource Discovery, Bespoke Services, Research Services, Document Supply, Reprographics, Innovation Centre, On-site Visits, School Tours, Web Learning, Exhibitions, Events, Tours, Publishing, Reading Rooms, Searching Tools, Training, Best Practice
Major Programmes/1
Da Vinci Notebook
Integrated Library System (ILS) Programme
ILS: Development
Data migration
Due to finish in a few days
16M+ BL records
10M+ records from other sources
Online ILS software
All online changes made (mainly interfaces) – final tests
Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
Most done for go-live
Rest in priority order
ILS: Implementation
Training
Courses to end-users well underway
‘Practice’ system available
‘Search only’ training also underway
Testing
Functional testing (end to end) nearly complete
Performance poor – OPAC very slow
Automated stress testing (LoadRunner scripts)
eIS trying to find area of problem
Ex Libris experts flying over
Some security ‘hardening’ needed
ILS: Cutover from legacy systems
Now: Temporary Aleph cataloguing
7 June: Phase 1 – internal processing
Staggered take-on of users to ease cutover problems
Merge ‘temporary’ records
30 June: Phase 2 – reading rooms
Reading rooms closed for cutover 26-29 June
Mainly brand-new PCs etc rather than XP upgrade
30 July: Phase 3 – remote users
Could be delayed if there are major problems
Future ILS development (ILS/2)
Current ILS development seen just as the start
Extra records
E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
E.g. Preservation records
Links to other new BL systems
E.g. Digital Object Management (images, web pages etc)
New releases of Ex Libris packages
Major Programmes/2
International Dunhuang Project
Digitisation Programme
Background
Digitisation Is The Process Of Converting Existing Physical Items Into Digital Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image To Sub-£1/Image.
Automatic Devices To Digitise Are Becoming Available But Are Expensive (~£150,000).
BL Has Had A Fairly Ad Hoc Approach Driven By
External Funding Opportunities
Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project Management
BL Has Created About 1.5M Digital Images So Far…
Digitisation Strategy
Digitisation Strategy Project Was Formally Initiated On February 2, 2004
Key objectives for the project are to define:
Selection Criteria
Uniform Approach
Communications Plan
Sustainability
Intellectual Property Rights
External Relationship Management
Funding
Integration with DOMS
Project Status Information
Definitive Register of Projects
19 Complete
19 Current
20 Planning
JISC Sound (3,900 Hours)
JISC Newspapers (2M Pages of 750M Pages)
Chopin (Collaborative Project)
Early English Books Online
Major Programmes/3
Gutenberg Bible
Digital Object Management (DOM) Programme
DOM Programme vision
Our mission is to enable the United Kingdom to preserve and use its digital intellectual property forever
Our vision is to create a management system for digital objects that will
store and preserve any type of digital material in perpetuity
provide access to this material to users with appropriate permissions
ensure that the material is easy to find
ensure that users can view the material with contemporary applications
ensure that users can, where possible, experience material with the original look-and-feel
Introduction - history
Digital Library PFI
Mar 1997 – Dec 1998
Digital Library System
1999 – early 2002
Lessons
DOM Report
Nov 2002
The DOM Programme
Started September 2003
Drivers for the BL DOM Programme
Legal deposit legislation for non-print material was granted royal assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12T of material per year (with 50 year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….
We need a generic and cost-effective approach for the secure long term storage of digital material that is produced by numerous initiatives
DOM – many topics to address
[Chart: components plotted by estimated size of component (low to high) against component ambiguity/complexity (low to high); each component is marked as Started, Planned, Non-DOM project or Planned co-operation]
Components shown: ILS, Web Archiving, Digitisation Programme, LDEP, Workflow, Resource Discovery, LDLSE, VDEP, Rights Management, Metadata Definition, Technical Requirements, File Format Registry, Persistent Identifiers, File Conversion Utilities, Authentication, Prototypes, SDM, RADM, Interfaces, Strategy Development
LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications
Scope - life cycle of objects
Collection
Selection
Acquisition
Accession
Description
Preservation
Storage
Preservation
Access
Resource discovery
Delivery
Rendering
Scope – objects and processes
Preservation store
Preserves the bit stream in perpetuity
Access store
Access versions
Limited formats – in the flavour of the era
Metadata to support resource discovery
Descriptive, Administrative, Links with existing tools e.g. Integrated Library System (ILS)
Workflow
Ingest, e.g. Legal Deposit processing
[Diagram: DOM in the context of its content sources and access services]
Access: Resource Discovery, Delivery, Digital Rights Management
DOM: Ingest, Storage, and shared services (Signing, Authentication, Metadata, Persistent ID)
Content sources: Donations; Document Supply (Publishers, Archives, Grey Literature, Non-Serial Store, Archiving, Operational Stores); Legal Deposit (LDL Secure Environment, Legal Deposit Items, Legal Deposit Processing); Web Archiving; Digitisation (St Pancras Studios, Newspapers, NSA)
Timeline
Releases, 2003 to 2005:
BC: ET approve Business Case & Timeline
Definition (R0): Prototype will provide a basic preservation-quality digital object storage module
Operational Storage Sub System (R1):
• Consolidate R0 into operational system
• Provide preservation-quality digital store for materials received under Voluntary Deposit of Electronic Publications (VDEP)
• Integrate it with the existing VDEP front-end
1st Content Stream ingest (R2):
• Support ingest for a major content stream
• Integrate with core Library systems as required
LDEP - initial format (R3 & R4): Provide functionality for material covered by LDEP secondary legislation
Open DOM to new projects (R5+)
DOM: Project definition - 1
Functional architecture “what”
Prototyping – assessing market solutions
Logical architecture “how – overall architecture”
Prototyping - basic functioning architecture
Physical architecture “how – storage & specifics”
Prototyping - principal solutions and options
Example issues: digital rights, file formats, etc; allow changes to new suppliers, relationships to ILS, other projects etc; how do we build it cost-effectively today, supplier selection criteria
• Business case
• Planning – incremental implementation phases
Cross team workshops – reviewing progress, debating detailed technical issues, planning immediate priorities, risk management & way forward
DOM: Project definition - 2
Approach is to be incremental and not ‘Big Bang’
We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a good solution
A principal goal is to define:
An overall long term “logical architecture”
Within which, there will be successive generations of physical architectures
We are building an understanding of the storage marketplace, and we will use that knowledge to manage procurement
We are certain that we will need >500 TB of storage but we are uncertain when – we thus need flexible, scalable procurement
DOM architecture - overview
[Diagram: layered architecture, top to bottom]
Resource Discovery (ILS based RD, Non-cat, Others), Rights Management, LDEP and Doc supply refer to an object by its DOMID
DOM Storage Service: compound objects/relations, integrity, authenticity; atomic objects, each with a unique persistent identifier (DOMID)
DOMID is mapped to node/vol/LRL (local resource locator)
DOM Physical Storage
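The mapping of a DOMID to node/vol/LRL on this slide could be sketched as a simple lookup table. This is a minimal illustration only: the names `Location`, `DomidResolver`, `register`, `relocate` and `resolve`, and the identifier format, are hypothetical and do not come from the DOM design.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Location:
    """Where one copy of an object lives: storage node, volume, local resource locator."""
    node: str
    vol: str
    lrl: str


class DomidResolver:
    """Maps a persistent DOMID to its physical locations (hypothetical API)."""

    def __init__(self) -> None:
        self._table: Dict[str, List[Location]] = {}

    def register(self, domid: str, location: Location) -> None:
        # The DOMID never changes; only its list of locations does.
        self._table.setdefault(domid, []).append(location)

    def relocate(self, domid: str, old: Location, new: Location) -> None:
        # Storage migration: swap the location, keep the identifier.
        locations = self._table[domid]
        locations[locations.index(old)] = new

    def resolve(self, domid: str) -> List[Location]:
        # Unknown identifiers resolve to an empty list.
        return list(self._table.get(domid, []))


resolver = DomidResolver()
first = Location(node="node-a", vol="vol-01", lrl="0001/obj.bin")
resolver.register("domid:0001", first)
# Move the object to a newer generation of storage without changing its DOMID.
resolver.relocate("domid:0001", first, Location("node-b", "vol-07", "0001/obj.bin"))
```

Keeping the identifier separate from the location is what lets physical storage be replaced generation by generation while references held by other systems stay valid.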
DOM System (release 3)
[Diagram: DOM System comprising two Storage subsystems, Access, Shared services and Administration, with links to Aleph, Mailroom and Publishers]
DOM logical architecture – integrity and authenticity
Integrity:
System has capability to continuously monitor the object store to detect object corruption
It would then initiate object recovery
Authenticity:
A process is defined to provide long-term assurance that an object that is re-presented is as it was when it was ingested
Based on the use of cryptographic signing techniques
Each object is signed when it is ingested
The signature is verified when required
The signing mechanism is “tightly” controlled
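The sign-at-ingest, verify-when-required and monitor-for-corruption steps above can be illustrated with a short sketch. It is an assumption-laden stand-in, not the DOM mechanism: a keyed SHA-256 MAC from the Python standard library replaces a real digital signature scheme, and the key, object IDs and function names are hypothetical.

```python
import hashlib
import hmac
from typing import Dict, List

# Hypothetical key; in practice the signing key would be tightly controlled.
SIGNING_KEY = b"demo-key-held-by-the-signing-service"


def sign(data: bytes) -> str:
    """Return a keyed SHA-256 MAC of the object, standing in for a signature."""
    return hmac.new(SIGNING_KEY, data, hashlib.sha256).hexdigest()


def verify(data: bytes, signature: str) -> bool:
    """True if the object still matches the signature recorded at ingest."""
    return hmac.compare_digest(sign(data), signature)


def find_corrupt(store: Dict[str, bytes], signatures: Dict[str, str]) -> List[str]:
    """Integrity sweep: list the IDs whose stored bytes no longer verify."""
    return [oid for oid, data in store.items() if not verify(data, signatures[oid])]


store = {"obj-1": b"first object", "obj-2": b"second object"}
signatures = {oid: sign(data) for oid, data in store.items()}  # signed at ingest

store["obj-2"] = b"second 0bject"  # simulate silent corruption of one object
corrupt = find_corrupt(store, signatures)  # these objects would be queued for recovery
```

In a multi-cluster design, recovery of a corrupt object would typically mean re-copying it from a peer cluster whose copy still verifies.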
Procuring physical storage in volume
A major cost is in physical storage
The market for storage systems is changing rapidly, and this implies that “lock-in” is not sensible
We thus need flexibility to change supplier over time
Cost of storage is reducing by 30-40% per year
Hence procure on rolling basis just ahead of demand
Replace storage on a rolling basis on expiry of warranty
The rolling programmes imply the need to be able to support a heterogeneous product solution
The design of the logical architecture thus supports storage sourced from multiple storage vendors
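The case for rolling procurement can be made concrete with a small worked calculation. The figures are illustrative assumptions, not British Library data: a starting price of £5,000 per TB, demand of 100 TB per year for five years, and a 35% annual price decline (the midpoint of the 30-40% range quoted above). Replacement on warranty expiry is ignored.

```python
from typing import Sequence


def upfront_cost(total_tb: float, price_per_tb: float) -> float:
    """Cost of buying all capacity in year 0 at today's price."""
    return total_tb * price_per_tb


def rolling_cost(tb_per_year: Sequence[float], price_per_tb: float,
                 annual_decline: float) -> float:
    """Cost of buying each year's capacity at that year's (lower) price."""
    cost = 0.0
    for year, tb in enumerate(tb_per_year):
        # Price after `year` years of compound decline.
        cost += tb * price_per_tb * (1.0 - annual_decline) ** year
    return cost


# Hypothetical inputs: 500 TB needed over five years.
upfront = upfront_cost(500, 5000.0)
rolling = rolling_cost([100] * 5, 5000.0, 0.35)
saving = 1.0 - rolling / upfront  # fraction saved by buying just ahead of demand
```

Under these assumptions the rolling purchases cost roughly half of the upfront purchase, which is why uncertainty about when capacity is needed favours buying in modest increments.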
Disaster tolerance and the organisation of storage clusters
One can obtain commercial disaster recovery (DR) solutions for common equipment configurations
However, one cannot obtain such solutions for systems of several hundred terabytes
So we must build DR into the design of the system
A single site solution, subject to a common-mode disaster, would suffer considerable loss of availability after a disaster, and so is not acceptable
This implies that we need a multi-site solution
Conventionally these are based on a master-standby arrangement where only 50% of kit is delivering normal service
Our design is based on the use of multiple autonomous independent peer clusters that cross-synchronise
so 100% of the kit delivers normal service
DOM architecture in the context of the storage solution market
The dominant segment of the market focuses on delivering performance within a highly resilient single cluster
However:
Many of our objects will be rarely accessed
so we do not want to pay for “maximised” performance we do not need
We have resilience by using multiple clusters, hence we have a reduced need for resilience within a cluster
so we do not want to pay for “maximised” resilience we do not need
We are using these drivers to design a cost-effective large scale resilient solution
DOM storage subsystem architecture - overview
[Diagram: two storage clusters, each comprising a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, with DOM Shared Services between them]
DOM Shared Services
• Unique ID
• Signing
• Logging
DOM storage subsystem architecture - access
Normal access/delivery is from local storage cluster
[Diagram: two storage clusters (each with DOM Storage gateway, DOM Storage Service and DOM Physical Storage) and DOM shared central services (Unique ID, Signing, Logging); the request is served by the local cluster]
DOM storage subsystem architecture - access
When a cluster is off-line then access/delivery is from a remote storage cluster
[Diagram: two storage clusters (each with DOM Storage gateway, DOM Storage Service and DOM Physical Storage) and DOM shared central services (Unique ID, Signing, Logging); the request is served by the remote cluster]
DOM storage subsystem architecture - ingest
Normal ingest is to the local storage cluster and then the remote cluster is synchronised
[Diagram: two storage clusters (each with DOM Storage gateway, DOM Storage Service and DOM Physical Storage) and DOM shared central services (Unique ID, Signing, Logging); arrows labelled “Store” and “Synchronise remote store”]
DOM storage subsystem architecture - ingest
When a cluster is off-line then ingest is managed by the remote storage cluster and the local cluster is synchronised later
[Diagram: two storage clusters (each with DOM Storage gateway, DOM Storage Service and DOM Physical Storage) and DOM shared central services (Unique ID, Signing, Logging); arrows labelled “Store” and “Synchronise remote store later”]
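The access and ingest behaviour on the preceding four slides can be summarised in a small sketch of a gateway in front of two peer clusters. Everything here is a hypothetical illustration with in-memory stand-ins: the `Cluster` and `Gateway` classes, their method names and the pending-synchronisation queue are assumptions, not the DOM implementation.

```python
from typing import Dict, List, Tuple


class Cluster:
    """One autonomous storage cluster (in-memory stand-in)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.online = True
        self.objects: Dict[str, bytes] = {}


class Gateway:
    """Routes requests to the local cluster, falling back to the remote peer."""

    def __init__(self, local: Cluster, remote: Cluster) -> None:
        self.local = local
        self.remote = remote
        # (cluster that still needs a copy, object id)
        self.pending: List[Tuple[Cluster, str]] = []

    def ingest(self, oid: str, data: bytes) -> str:
        """Store on the local cluster if it is on-line, otherwise on the remote one."""
        if self.local.online:
            primary, peer = self.local, self.remote
        else:
            primary, peer = self.remote, self.local
        if not primary.online:
            raise RuntimeError("no storage cluster available")
        primary.objects[oid] = data
        self.pending.append((peer, oid))
        self.synchronise()  # copies immediately if the peer is reachable
        return primary.name

    def synchronise(self) -> None:
        """Copy queued objects to any peer that is now on-line; keep the rest queued."""
        still_pending = []
        for target, oid in self.pending:
            source = self.remote if target is self.local else self.local
            if target.online and source.online:
                target.objects[oid] = source.objects[oid]
            else:
                still_pending.append((target, oid))
        self.pending = still_pending

    def fetch(self, oid: str) -> bytes:
        """Serve from the local cluster when possible, otherwise from the remote one."""
        for cluster in (self.local, self.remote):
            if cluster.online and oid in cluster.objects:
                return cluster.objects[oid]
        raise KeyError(oid)


site_a, site_b = Cluster("A"), Cluster("B")
gateway = Gateway(local=site_a, remote=site_b)
stored_1 = gateway.ingest("obj-1", b"x")   # normal ingest: local store, remote synchronised
site_a.online = False
stored_2 = gateway.ingest("obj-2", b"y")   # local off-line: remote store, sync queued
fetched = gateway.fetch("obj-1")           # local off-line: served by the remote cluster
site_a.online = True
gateway.synchronise()                      # local cluster catches up later
```

Because both clusters accept reads and writes in normal operation, neither sits idle as a standby, which matches the "100% of the kit delivers normal service" aim stated earlier.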
In conclusion
We plan for generations of physical storage
Migration from one generation to the next
Allow changes of supplier
Purchase incrementally in modest quantities
Move quickly when required
Be cost conscious
We provide assurance that an object is held and re-presented as it was when ingested
We are designing a cost-effective large scale resilient solution
In summary: we take a long term view
Major Programmes/4
Web Archiving Programme
Structure of Programme
Web Archiving Programme is a collaborative initiative, implemented broadly across two consortia
UK Web Archiving Consortium
Developing a selective approach to web archiving, procuring a common web archiving infrastructure and software to begin archiving activities at the earliest opportunity
International Internet Preservation Consortium
Developing advanced web archiving technologies for the long-term, large-scale, continuous crawling requirements enabled through legislation
UK Web Archiving Consortium
Developing a selective approach to web archiving
License for PANDAS about to be signed with NLA
Sub-licenses with consortium partners and contractor to follow
ITT concluded with Magus Research winning the contract.
Implement a common web archiving infrastructure (lots of Linux machines + PANDAS)
Provide customisation/development of PANDAS
Provide help desk and support
International Internet Preservation Consortium
Developing advanced web archiving technologies
Smart Crawler
Continuous adaptive crawler, adjusting crawl priority on the fly
Based on IA Heritrix
Working on requirements now
Expect to begin tender process in June
Content Management
Archival formats
Framework
Metrics and Test Bed
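One way to picture a continuous crawler that adjusts crawl priority on the fly is a priority queue keyed on each page's next due time, with the revisit interval shortened when a page has changed and lengthened when it has not. The sketch below is only an illustration of that idea: it does not use or describe the Heritrix API, and the class name, the halving/doubling rule and the interval bounds are all assumptions.

```python
import heapq
from typing import Dict, List, Tuple


class AdaptiveFrontier:
    """Priority queue of URLs ordered by next due time (hypothetical design)."""

    def __init__(self, min_interval: float = 1.0, max_interval: float = 64.0) -> None:
        self.min_interval = min_interval
        self.max_interval = max_interval
        self._heap: List[Tuple[float, str]] = []
        self._interval: Dict[str, float] = {}

    def add(self, url: str, now: float, interval: float = 8.0) -> None:
        """Queue a new URL for immediate crawling with a starting revisit interval."""
        self._interval[url] = interval
        heapq.heappush(self._heap, (now, url))

    def next_due(self) -> Tuple[float, str]:
        """Pop the (due time, URL) pair that is due soonest."""
        return heapq.heappop(self._heap)

    def report(self, url: str, now: float, changed: bool) -> None:
        """Re-queue a crawled URL: changed pages sooner, unchanged pages later."""
        factor = 0.5 if changed else 2.0
        interval = self._interval[url] * factor
        interval = min(self.max_interval, max(self.min_interval, interval))
        self._interval[url] = interval
        heapq.heappush(self._heap, (now + interval, url))


frontier = AdaptiveFrontier()
frontier.add("http://example.org/news", now=0.0)
frontier.add("http://example.org/about", now=0.0)

# First pass: pretend the news page had changed and the about page had not.
for _ in range(2):
    due, url = frontier.next_due()
    frontier.report(url, now=due, changed=url.endswith("/news"))

upcoming = frontier.next_due()  # the frequently changing page comes back first
```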
External Collaboration
Digital Library Collaborations/Partnerships
Current
UK Digital Preservation Coalition
Founder Member
TEL (The European Library Project)
Web Archiving UK
JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library Of Wales
International Internet Preservation Consortium
BNF, Library Of Congress, Internet Archive, National Archives & Library Of Canada, National Library Of Australia, National Library Of Italy, National Libraries Of Nordic Countries
JISC Funded - Digital Curation Centre
Persistent Identifiers
DOI Foundation, European National Libraries (KB & DDB)
Resource Discovery
Union Catalogues (SUNCAT)
Digital Library Federation
Digital Library Collaborations/Partnerships
Potential
Secure Legal Deposit Network
6 Legal Deposit Libraries
Global Digital Format Registry
Potential Partners (National Archives, DLF)
Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
KB (Netherlands National Library & Other Partners – FP6 Bid)
Digital Rights Management
Potential Partners (Publishers, JISC)
Metadata
Publishers, Others ?
Authentication
JISC ?
Resource Discovery
Search Engine Vendors, Researchers
Others ???
Conclusions
• Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success
Can You Work With Us?
Slide 18
British Computer Society
North London Branch
Major Programmes
Richard Boulderstone
July 27, 2004
1
Agenda
•
•
•
•
•
•
•
•
•
The British Library
Vision
Our Audiences/Customers
ILS
Digitisation
Digital Object Management
Web Archiving
Collaboration
Conclusions
Magna Carta
www.bl.uk
2
What Is The British Library ?
Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of
Science and Invention (1855), National Central Library (1916), and National
Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India
Office Library and Records in 1982, and British Institute of Recorded Sound in
1983
• Flagship building at St Pancras - largest public building project in Great
Britain in 20th century - opened in 1998
•
www.bl.uk
3
World-Class Research Library
Key Statistics 2002/3
•
•
•
•
•
•
•
•
•
•
150 million items
8.2 million items consulted or supplied
408,000 reading room visits
618,000 catalogue records created
554,000 items received on legal deposit
651 km shelf capacity 92% full add 12 km each year
18.5M Web Site Hits (www.bl.uk)
2,400 staff
£85.2 million Grant in Aid and £27.0 million trading income in 2001/2
Annual report - http://www.bl.uk/about/annual/latest.html
www.bl.uk
4
‘The World’s Knowledge’
Outcome Based Vision
… by aiding scientific advances
… by adding commercial value for
businesses
… by contributing to UK “knowledge
economy”
… through the pursuit of academic
excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural
wellbeing
… by giving information relevant to their
interests
… by helping to find the next medical
breakthrough
… by creating a link between the past,
present and future
To help people
advance
knowledge to
enrich lives
www.bl.uk
5
High
R+D
Industries
Prof.
Services
Creative
Industries Publishing
Industries
Resource Discovery
Bespoke Services
Research Services
Document Supply
Reprographics
Innovation Centre
School
Students
Libraries Teachers 11>18
On-site Visits
School Tours
Web Learning
SMEs
EDUCATION
Lifelong
Learner
Exhibitions
Events
Tours
Publishing
Visitors
(child + adult)
Lifelong
Learner
PUBLIC
BUSINESS
Lifelong
Learner
Postgraduate/
Undergraduate
RESEARCHER
Librarians
Scholars
Lifelong
Learner
Reading Rooms
Bespoke Services
Reprographics
Publishing
Document Supply
Searching Tools
LIBRARIES
Commercial
Researcher
Public
Libraries
Broadcasting
e.g. BBC
Publishing
e.g. OED
Document Supply
Resource Discovery
Training
Best Practice
H.E.
Libraries
Public
www.bl.uk
6
2
Major Programmes/1
Da Vinci Notebook
Integrated Library System (ILS)
Programme
7
ILS: Development
Data migration
Due to finish in a few days
16M+ BL records
10M+ records from other sources
Online ILS software
All online changes made (mainly interfaces) – final tests
Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
Most ones done for go live
Rest in priority order
www.bl.uk
8
ILS: Implementation
Training
Courses to end-users well underway
‘Practice’ system available
‘Search only’ training also underway
Testing
Functional testing (end to end) nearly complete
Performance poor – OPAC very slow
Automated stress testing (LoadRunner scripts)
eIS trying to find area of problem
Ex Libris experts flying over
Some security ‘hardening’ needed
www.bl.uk
9
ILS: Cutover from legacy systems
Now:
Temporary Aleph cataloguing
Phase 1 – internal processing
Staggered take-on of users to ease cutover problems
Merge ‘temporary’ records
7 June:
Phase 2 – reading rooms
Reading rooms closed for cutover 26-29 June
Mainly brand-new PCs etc rather than XP upgrade
30 June:
Phase 3 – remote users
Could be delayed major problems
30 July:
www.bl.uk
10
www.bl.uk
11
Future ILS development (ILS/2)
Current ILS development seen just as the start
Extra records
E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
E.g. Preservation records
Links to other new BL systems
E.g. Digital Object Management (images, web pages etc)
New releases of Ex Libris packages
www.bl.uk
12
Major Programmes/2
International Dunhuang Project
Digitisation Programme
13
Background
Digitisation Is The Process Of Converting Existing Physical Items Into Digital
Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character
Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image to Sub £1/Image.
Automatic Devices To Digitise Are Becoming Availalble But Are Expensive ~£150,000.
BL Has Had Fairly Ad Hoc Approach Driven By
External Funding Opportunities
Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project
Management
BL Has Created About 1.5M Digital Images So Far…
www.bl.uk
14
Digitisation Strategy
Digitisation Strategy Project Was Formerly Initiated On February 2, 2004
Key objectives for the project are to define:
Selection Criteria
Uniform Approach
Communications Plan
Sustainability
Intellectual Property Rights
External Relationship Management
Funding
Integration with DOMS
www.bl.uk
15
Project Status Information
Definitive Register of Projects
19 Complete
19 Current
20 Planning
JISC Sound (3,900 Hours)
JISC Newspapers (2M Pages of 750M Pages)
Chopin (Collaborative Project)
Early English Books Online
www.bl.uk
16
Major Programmes/3
Gutenberg Bible
Digital Object Management (DOM)
Programme
17
DOM Programme vision
Our mission is to enable the United Kingdom to preserve
and use its digital intellectual property forever
Our vision is create a management system for digital objects
that will
store and preserve any type of digital material in perpetuity
provide access to this material to users with appropriate permissions
ensure that the material is easy to find
ensure that users can view the material with contemporary applications
ensure that users can, where possible, experience material with the original look-and-feel
www.bl.uk
18
Introduction - history
Digital Library PFI
Mar 1997 – Dec 1998
Digital Library System
1999 – early 2002
Lessons
DOM Report
Nov 2002
The DOM Programme
Started September 2003
www.bl.uk
19
Drivers for the BL DOM Programme
Legal deposit legislation for non-print material was granted royal assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12T of material per year (with 50 year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….
We need a generic and cost-effective approach for the secure long term
storage of digital material that is produced by numerous initiatives
www.bl.uk
20
DOM – many topics to address
HIGH
ILS
WEB
ARCHIVING
DIGITISATION
PROGRAMME
ESTIMATED SIZE OF COMPONENT
LDEP
WORKFLOW
RESOURCE
DISCOVERY
LDLSE
VDEP
RIGHTS
MANAGEMENT
METADATA
DEFINITION
TECHNICAL
REQUIREMENTS
FILE
FORMAT
REGISTRY
PERSISTENT
IDENTIFIERS
FILE
CONVERSION
UTILITIES
AUTHENTICATION
PROTOTYPES
SDM
RADM
INTERFACES
STRATEGY
DEVELOPMENT
LOW
LOW
COMPONENT AMBIGUITY / COMPLEXITY
Started
Planned
Non-DOM projects
Planned co-operation
HIGH
LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications
www.bl.uk
21
Scope - life cycle of objects
Collection
Selection
Acquisition
Accession
Description
Preservation
Storage
Preservation
Access
Resource discovery
Delivery
Rendering
www.bl.uk
22
Scope – objects and processes
Preservation store
Preserves the bit stream in perpetuity
Access store
Access versions
Limited formats – in the flavour of the era
Metadata to support resource discovery
Descriptive, Administrative, Links with existing tools e.g. Integrated
Library System (ILS)
Workflow
Ingest, e.g. Legal Deposit processing
www.bl.uk
23
ACCESS
DOM
Resource Discovery
Delivery
Digital Rights Management
Shared services
DOM
Storage
Signing
Authentication
Metadata
Persistent ID
Ingest
DONATIONS
DOCUMENT SUPPLY
Publishers
Archives
Grey Literature
Non-Serial
Store
Archiving
Operational
Stores
LEGAL DEPOSIT
LDL Secure
Environment
Legal Deposit
Items
Legal Deposit
Processing
WEB
ARCHIVING
DIGITISATION
St Pancras
Studios
Newspapers
NSA
www.bl.uk
24
Timeline
Prototype will provide a
basic preservationquality digital object
storage module
Definition. R0
ET approve
Business Case
& Timeline
•Consolidate R0 into operational
system
•Provide preservation-quality digital
store for materials received under
Voluntary Deposit of Electronic
Publications (VDEP)
•Integrate it with the existing VDEP
front-end
BC
R0
Operat’l Storage
Sub System. R1
•Support ingest for a
major content stream
•Integrate with core
Library systems as
required
R1
1st Content
Stream ingest. R2
R2
LDEP - initial
format. R3 & R4
Provide functionality
for material covered by
LDEP secondary
legislation
R3
R4
Open DOM to new
projects. R5+
2003
2004
2005
www.bl.uk
25
DOM: Project definition - 1
Example issues
Functional Architecture “What”
digital rights, file
formats, etc
Prototyping – assessing market solutions
allow changes to
new suppliers,
relationships to ILS,
other projects etc
Logical architecture “how – overall architecture”
Prototyping - basic functioning architecture
how do we build it
cost-effectively
today, supplier
selection criteria
Physical architecture “how – storage & specifics”
Prototyping - principal solutions and options
• Business case
• Planning – incremental
implementation phases
Cross team workshops – reviewing progress, debating detailed technical
issues, planning immediate priorities, risk management & way forward
www.bl.uk
26
DOM: Project definition - 2
Approach is to be incremental and not ‘Big Bang’
We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a
good solution
A principal goal is to define:
An overall long term “logical architecture”
Within which, there will be successive generations of physical architectures
We are understanding the storage marketplace, and we will use the knowledge to manage
procurement
We are certain that we will need >500T of storage but we are uncertain when – we thus need
flexible scalable procurement
www.bl.uk
27
DOM architecture - overview
Resource Discovery
Non-cat
ILS
based RD
Others
Rights Management
LDEP
DOMID
OBJECT
DOM Storage Service
Compound
objects/relations
Integrity
Authenticity
Local resource locator
Object
Atomic
Objects
Unique persistent
identifier (DOMID)
Doc
supply
DOMID is
mapped to
node/vol/LRL
DOM Physical Storage
www.bl.uk
28
DOM System (release 3)
Aleph
Storage subsystem
Storage
subsystem
Access
Mailroom
Publishers
Shared services
Administration
DOM System
www.bl.uk
29
DOM logical architecture – integrity and authenticity
Integrity:
System has capability to continuously monitor the object store to detect object corruption
It would then initiate object recovery
Authenticity:
A process is defined to provide long-term assurance that an object that is re-presented is as it was when it was ingested
Based on the use of cryptographic signing techniques
Each object is signed when it is ingested
The signature is verified when required
The signing mechanism is “tightly” controlled
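The sign-on-ingest and verify-when-required steps might look roughly like the sketch below. The slides do not say which signing technique is used, so this example substitutes a keyed HMAC-SHA256 from the Python standard library for a real signature scheme; the key, function names and in-memory store are all hypothetical.

```python
import hashlib
import hmac
from typing import Dict, List

# Hypothetical key; in a real system it would be held only by a tightly
# controlled signing service, never alongside the stored objects.
SIGNING_KEY = b"demo-key-held-by-the-signing-service"


def sign_object(data: bytes, key: bytes = SIGNING_KEY) -> str:
    """Compute the signature recorded when an object is ingested."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify_object(data: bytes, signature: str, key: bytes = SIGNING_KEY) -> bool:
    """Check that an object still matches its ingest-time signature."""
    return hmac.compare_digest(sign_object(data, key), signature)


def find_corrupt(store: Dict[str, bytes], signatures: Dict[str, str]) -> List[str]:
    """One pass of an integrity monitor: list objects that fail verification."""
    return sorted(
        domid for domid, data in store.items()
        if not verify_object(data, signatures[domid])
    )


if __name__ == "__main__":
    store = {"domid-1": b"first object", "domid-2": b"second object"}
    signatures = {domid: sign_object(data) for domid, data in store.items()}
    store["domid-2"] = b"second objecT"  # simulate corruption
    print(find_corrupt(store, signatures))
```

An object reported by the monitor would then be a candidate for recovery from another copy.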
30
Procuring physical storage in volume
A major cost is in physical storage
The market for storage systems is changing rapidly, and this implies that “lock-in” is not sensible
We thus need flexibility to change supplier over time
Cost of storage is reducing by 30-40% per year
Hence procure on a rolling basis just ahead of demand
Replace storage on a rolling basis on expiry of warranty
The rolling programmes imply the need to be able to support a heterogeneous product solution
The design of the logical architecture thus supports storage sourced from multiple storage vendors
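The effect of a 30-40% annual price fall on rolling procurement can be shown with simple arithmetic. In the sketch below the starting price, yearly demand and five-year horizon are invented purely for illustration; only the rate of decline comes from the slide, and warranty replacement is ignored.

```python
def upfront_cost(total_tb: float, price_per_tb: float) -> float:
    """Cost of buying all capacity in year 0 at the year-0 price."""
    return total_tb * price_per_tb


def rolling_cost(tb_per_year: float, years: int,
                 price_per_tb: float, annual_decline: float) -> float:
    """Cost of buying one year's capacity at a time as the price falls."""
    return sum(
        tb_per_year * price_per_tb * (1 - annual_decline) ** year
        for year in range(years)
    )


if __name__ == "__main__":
    # Hypothetical figures: 100 TB needed per year for 5 years,
    # 1000 currency units per TB in year 0, price falling 35% per year.
    print(upfront_cost(500, 1000.0))
    print(rolling_cost(100, 5, 1000.0, 0.35))
```

Under these assumed figures the rolling purchases cost roughly half as much as buying all 500 TB at the start.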
31
Disaster tolerance and the organisation of storage clusters
One can obtain commercial disaster recovery (DR) solutions for common equipment configurations
However, one cannot obtain such solutions for systems holding several hundred terabytes
So we must build DR into the design of the system
A single site solution, subject to a common-mode disaster, would suffer considerable loss of availability after a disaster, and so is not acceptable
This implies that we need a multi-site solution
Conventionally these are based on a master-standby arrangement where only 50% of kit is delivering normal service
Our design is based on the use of multiple autonomous independent peer clusters that cross-synchronise, so 100% of the kit delivers normal service
32
DOM architecture in the context of the storage solution market
The dominant segment of the market focuses on delivering performance within a highly resilient single cluster
However:
Many of our objects will be rarely accessed, so we do not want to pay for “maximised” performance we do not need
We have resilience by using multiple clusters, hence we have a reduced need for resilience within a cluster, so we do not want to pay for “maximised” resilience we do not need
We are using these drivers to design a cost-effective large scale resilient solution
33
DOM storage subsystem architecture - overview
[Diagram: two storage clusters, each made up of a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, together with DOM Shared Services]
DOM Shared Services:
• Unique ID
• Signing
• Logging
34
DOM storage subsystem architecture - access
Normal access/delivery is from the local storage cluster
[Diagram: two storage clusters, each made up of a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, with DOM central Shared Services (Unique ID, Signing, Logging)]
35
DOM storage subsystem architecture - access
When a cluster is off-line, access/delivery is from a remote storage cluster
[Diagram: two storage clusters, each made up of a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, with DOM central Shared Services (Unique ID, Signing, Logging)]
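The access rule on the two slides above (serve from the local cluster, fall back to a remote cluster when the local one is off-line) can be sketched as follows. The `Cluster` class and `fetch` function are hypothetical stand-ins for the gateway and storage service, with an in-memory dictionary in place of physical storage.

```python
from typing import Dict, List, Tuple


class Cluster:
    """Hypothetical stand-in for one storage cluster."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.online = True
        self.objects: Dict[str, bytes] = {}


def fetch(domid: str, local: Cluster, remotes: List[Cluster]) -> Tuple[bytes, str]:
    """Return the object and the name of the cluster that served it.

    The local cluster is tried first; remote clusters are used only
    when the local one is off-line or does not hold the object.
    """
    for cluster in [local, *remotes]:
        if cluster.online and domid in cluster.objects:
            return cluster.objects[domid], cluster.name
    raise LookupError(f"{domid} is not available from any on-line cluster")


if __name__ == "__main__":
    local, remote = Cluster("local"), Cluster("remote")
    for cluster in (local, remote):
        cluster.objects["domid-1"] = b"payload"
    print(fetch("domid-1", local, [remote]))
    local.online = False
    print(fetch("domid-1", local, [remote]))
```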
36
DOM storage subsystem architecture - ingest
Normal ingest is to the local storage cluster and then the remote cluster is synchronised
[Diagram: two storage clusters, each made up of a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, with DOM central Shared Services (Unique ID, Signing, Logging); arrows labelled “Store” and “Synchronise remote store”]
37
DOM storage subsystem architecture - ingest
When a cluster is off-line, ingest is managed by the remote storage cluster and the local cluster is synchronised later
[Diagram: two storage clusters, each made up of a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, with DOM central Shared Services (Unique ID, Signing, Logging); arrows labelled “Store” and “Synchronise remote store later”]
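The two ingest cases (store locally then synchronise the peer, or store remotely and synchronise the off-line cluster later) could be modelled as below. This is a sketch under stated assumptions: two peer clusters only, an in-memory store, and a simple pending list per cluster. None of these names come from the DOM design.

```python
from typing import Dict, List


class Cluster:
    """Hypothetical stand-in for one storage cluster."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.online = True
        self.objects: Dict[str, bytes] = {}


class PeerClusters:
    """Two autonomous peers that cross-synchronise."""

    def __init__(self, first: Cluster, second: Cluster) -> None:
        self._peers = {first.name: second, second.name: first}
        # DOMIDs each cluster still needs to receive from its peer.
        self.pending: Dict[str, List[str]] = {first.name: [], second.name: []}

    def ingest(self, domid: str, data: bytes, local: Cluster) -> str:
        """Store an object and return the name of the cluster that took it."""
        remote = self._peers[local.name]
        primary, secondary = (local, remote) if local.online else (remote, local)
        if not primary.online:
            raise RuntimeError("no cluster is on-line")
        primary.objects[domid] = data
        if secondary.online:
            secondary.objects[domid] = data              # synchronise now
        else:
            self.pending[secondary.name].append(domid)   # synchronise later
        return primary.name

    def resync(self, cluster: Cluster) -> int:
        """Copy everything a returning cluster missed; return the count."""
        source = self._peers[cluster.name]
        waiting = self.pending[cluster.name]
        for domid in waiting:
            cluster.objects[domid] = source.objects[domid]
        count = len(waiting)
        waiting.clear()
        return count
```

Because either peer can accept ingest on its own, both clusters carry normal service rather than one sitting idle as a standby.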
38
In conclusion
We plan for generations of physical storage
Migration from one generation to the next
Allow changes of supplier
Purchase incrementally in modest quantities
Move quickly when required
Be cost conscious
We provide assurance that an object is held and re-presented as it was when it was ingested
We are designing a cost-effective large scale resilient solution
In summary: we take a long term view
39
Major Programmes/4
Web Archiving Programme
40
Structure of Programme
The Web Archiving Programme is a collaborative initiative, implemented broadly across two consortia
UK Web Archiving Consortium
Developing a selective approach to web archiving, procuring a common web archiving infrastructure and software to begin archiving activities at the earliest opportunity
International Internet Preservation Consortium
Developing advanced web archiving technologies for the long-term, large-scale, continuous crawling requirements enabled through legislation
41
UK Web Archiving Consortium
Developing a selective approach to web archiving
Licence for PANDAS about to be signed with NLA
Sub-licences with consortium partners and contractor to follow
ITT concluded, with Magus Research winning the contract
Implement a common web archiving infrastructure (lots of Linux machines + PANDAS)
Provide customisation/development of PANDAS
Provide help desk and support
42
International Internet Preservation Consortium
Developing advanced web archiving technologies
Smart Crawler
Continuous adaptive crawler, adjusting crawl priority on the fly
Based on IA Heritrix
Working on requirements now
Expect to begin tender process in June
Content Management
Archival formats
Framework
Metrics and Test Bed
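One way to picture a crawler that adjusts crawl priority on the fly is a priority queue whose entries can be re-scored between visits. The sketch below is an illustration only and is not how Heritrix or the planned Smart Crawler works: the doubling and halving rule and all names are assumptions.

```python
import heapq
import itertools
from typing import Dict, List, Optional, Tuple


class CrawlFrontier:
    """Priority queue of URLs whose priorities can change at any time."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, str]] = []
        self._priority: Dict[str, float] = {}
        self._counter = itertools.count()  # tie-breaker: first scheduled wins

    def schedule(self, url: str, priority: float) -> None:
        """Add a URL, or re-prioritise one that is already queued."""
        self._priority[url] = priority
        heapq.heappush(self._heap, (-priority, next(self._counter), url))

    def next_url(self) -> Optional[str]:
        """Pop the highest-priority URL, skipping superseded entries."""
        while self._heap:
            negated, _, url = heapq.heappop(self._heap)
            if self._priority.get(url) == -negated:
                del self._priority[url]
                return url
        return None

    def report_visit(self, url: str, changed: bool, old_priority: float) -> float:
        """Hypothetical policy: revisit changed pages sooner, stable ones later."""
        new_priority = old_priority * 2 if changed else old_priority / 2
        self.schedule(url, new_priority)
        return new_priority
```

Re-scheduling pushes a fresh heap entry and leaves the old one to be discarded when it surfaces, which avoids searching the heap on every priority change.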
43
External Collaboration
44
Digital Library Collaborations/Partnerships
Current
UK Digital Preservation Coalition
Founder Member
TEL (The European Library Project)
Web Archiving UK
JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library Of Wales
International Internet Preservation Consortium
BNF, Library Of Congress, Internet Archive, National Archives & Library Of Canada, National Library Of Australia, National Library Of Italy, National Libraries Of Nordic Countries
JISC Funded - Digital Curation Centre
Persistent Identifiers
DOI Foundation, European National Libraries (KB & DDB)
Resource Discovery
Union Catalogues (SUNCAT)
Digital Library Federation
45
Digital Library Collaborations/Partnerships
Potential
Secure Legal Deposit Network
6 Legal Deposit Libraries
Global Digital Format Registry
Potential Partners (National Archives, DLF)
Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
KB (Netherlands National Library & Other Partners – FP6 Bid)
Digital Rights Management
Potential Partners (Publishers, JISC)
Metadata
Publishers, Others ?
Authentication
JISC ?
Resource Discovery
Search Engine Vendors, Researchers
Others ???
46
Conclusions
• Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success
• Can You Work With Us?
47
Slide 20
British Computer Society
North London Branch
Major Programmes
Richard Boulderstone
July 27, 2004
1
Agenda
•
•
•
•
•
•
•
•
•
The British Library
Vision
Our Audiences/Customers
ILS
Digitisation
Digital Object Management
Web Archiving
Collaboration
Conclusions
Magna Carta
www.bl.uk
2
What Is The British Library ?
Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of
Science and Invention (1855), National Central Library (1916), and National
Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India
Office Library and Records in 1982, and British Institute of Recorded Sound in
1983
• Flagship building at St Pancras - largest public building project in Great
Britain in 20th century - opened in 1998
•
www.bl.uk
3
World-Class Research Library
Key Statistics 2002/3
•
•
•
•
•
•
•
•
•
•
150 million items
8.2 million items consulted or supplied
408,000 reading room visits
618,000 catalogue records created
554,000 items received on legal deposit
651 km shelf capacity 92% full add 12 km each year
18.5M Web Site Hits (www.bl.uk)
2,400 staff
£85.2 million Grant in Aid and £27.0 million trading income in 2001/2
Annual report - http://www.bl.uk/about/annual/latest.html
www.bl.uk
4
‘The World’s Knowledge’
Outcome Based Vision
… by aiding scientific advances
… by adding commercial value for
businesses
… by contributing to UK “knowledge
economy”
… through the pursuit of academic
excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural
wellbeing
… by giving information relevant to their
interests
… by helping to find the next medical
breakthrough
… by creating a link between the past,
present and future
To help people
advance
knowledge to
enrich lives
www.bl.uk
5
High
R+D
Industries
Prof.
Services
Creative
Industries Publishing
Industries
Resource Discovery
Bespoke Services
Research Services
Document Supply
Reprographics
Innovation Centre
School
Students
Libraries Teachers 11>18
On-site Visits
School Tours
Web Learning
SMEs
EDUCATION
Lifelong
Learner
Exhibitions
Events
Tours
Publishing
Visitors
(child + adult)
Lifelong
Learner
PUBLIC
BUSINESS
Lifelong
Learner
Postgraduate/
Undergraduate
RESEARCHER
Librarians
Scholars
Lifelong
Learner
Reading Rooms
Bespoke Services
Reprographics
Publishing
Document Supply
Searching Tools
LIBRARIES
Commercial
Researcher
Public
Libraries
Broadcasting
e.g. BBC
Publishing
e.g. OED
Document Supply
Resource Discovery
Training
Best Practice
H.E.
Libraries
Public
www.bl.uk
6
2
Major Programmes/1
Da Vinci Notebook
Integrated Library System (ILS)
Programme
7
ILS: Development
Data migration
Due to finish in a few days
16M+ BL records
10M+ records from other sources
Online ILS software
All online changes made (mainly interfaces) – final tests
Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
Most ones done for go live
Rest in priority order
www.bl.uk
8
ILS: Implementation
Training
Courses to end-users well underway
‘Practice’ system available
‘Search only’ training also underway
Testing
Functional testing (end to end) nearly complete
Performance poor – OPAC very slow
Automated stress testing (LoadRunner scripts)
eIS trying to find area of problem
Ex Libris experts flying over
Some security ‘hardening’ needed
www.bl.uk
9
ILS: Cutover from legacy systems
Now:
Temporary Aleph cataloguing
Phase 1 – internal processing
Staggered take-on of users to ease cutover problems
Merge ‘temporary’ records
7 June:
Phase 2 – reading rooms
Reading rooms closed for cutover 26-29 June
Mainly brand-new PCs etc rather than XP upgrade
30 June:
Phase 3 – remote users
Could be delayed major problems
30 July:
www.bl.uk
10
www.bl.uk
11
Future ILS development (ILS/2)
Current ILS development seen just as the start
Extra records
E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
E.g. Preservation records
Links to other new BL systems
E.g. Digital Object Management (images, web pages etc)
New releases of Ex Libris packages
www.bl.uk
12
Major Programmes/2
International Dunhuang Project
Digitisation Programme
13
Background
Digitisation Is The Process Of Converting Existing Physical Items Into Digital
Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character
Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image to Sub £1/Image.
Automatic Devices To Digitise Are Becoming Availalble But Are Expensive ~£150,000.
BL Has Had Fairly Ad Hoc Approach Driven By
External Funding Opportunities
Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project
Management
BL Has Created About 1.5M Digital Images So Far…
www.bl.uk
14
Digitisation Strategy
Digitisation Strategy Project Was Formerly Initiated On February 2, 2004
Key objectives for the project are to define:
Selection Criteria
Uniform Approach
Communications Plan
Sustainability
Intellectual Property Rights
External Relationship Management
Funding
Integration with DOMS
www.bl.uk
15
Project Status Information
Definitive Register of Projects
19 Complete
19 Current
20 Planning
JISC Sound (3,900 Hours)
JISC Newspapers (2M Pages of 750M Pages)
Chopin (Collaborative Project)
Early English Books Online
www.bl.uk
16
Major Programmes/3
Gutenberg Bible
Digital Object Management (DOM)
Programme
17
DOM Programme vision
Our mission is to enable the United Kingdom to preserve
and use its digital intellectual property forever
Our vision is create a management system for digital objects
that will
store and preserve any type of digital material in perpetuity
provide access to this material to users with appropriate permissions
ensure that the material is easy to find
ensure that users can view the material with contemporary applications
ensure that users can, where possible, experience material with the original look-and-feel
www.bl.uk
18
Introduction - history
Digital Library PFI
Mar 1997 – Dec 1998
Digital Library System
1999 – early 2002
Lessons
DOM Report
Nov 2002
The DOM Programme
Started September 2003
www.bl.uk
19
Drivers for the BL DOM Programme
Legal deposit legislation for non-print material was granted royal assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12 TB of material per year (with 50 year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….
We need a generic and cost-effective approach for the secure long term
storage of digital material that is produced by numerous initiatives
20
DOM – many topics to address
[Chart: components plotted by ESTIMATED SIZE OF COMPONENT (LOW to HIGH) against COMPONENT AMBIGUITY / COMPLEXITY (LOW to HIGH)]
Components shown: ILS; WEB ARCHIVING; DIGITISATION PROGRAMME; LDEP; WORKFLOW; RESOURCE DISCOVERY; LDLSE; VDEP; RIGHTS MANAGEMENT; METADATA DEFINITION; TECHNICAL REQUIREMENTS; FILE FORMAT REGISTRY; PERSISTENT IDENTIFIERS; FILE CONVERSION UTILITIES; AUTHENTICATION; PROTOTYPES; SDM; RADM; INTERFACES; STRATEGY DEVELOPMENT
Legend: Started; Planned; Non-DOM projects; Planned co-operation
LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications
21
Scope - life cycle of objects
Collection: Selection, Acquisition, Accession, Description
Preservation: Storage, Preservation
Access: Resource discovery, Delivery, Rendering
22
Scope – objects and processes
Preservation store
Preserves the bit stream in perpetuity
Access store
Access versions
Limited formats – in the flavour of the era
Metadata to support resource discovery
Descriptive, Administrative, Links with existing tools e.g. Integrated
Library System (ILS)
Workflow
Ingest, e.g. Legal Deposit processing
23
[Diagram. Labels in source order: ACCESS; DOM; Resource Discovery; Delivery; Digital Rights Management; Shared services; DOM; Storage; Signing; Authentication; Metadata; Persistent ID; Ingest; DONATIONS; DOCUMENT SUPPLY; Publishers; Archives; Grey Literature; Non-Serial; Store; Archiving; Operational Stores; LEGAL DEPOSIT; LDL Secure Environment; Legal Deposit Items; Legal Deposit Processing; WEB ARCHIVING; DIGITISATION; St Pancras Studios; Newspapers; NSA]
24
Timeline
Definition. R0: Prototype will provide a basic preservation-quality digital object storage module
BC: ET approve Business Case & Timeline
Operat’l Storage Sub System. R1:
•Consolidate R0 into operational system
•Provide preservation-quality digital store for materials received under Voluntary Deposit of Electronic Publications (VDEP)
•Integrate it with the existing VDEP front-end
1st Content Stream ingest. R2:
•Support ingest for a major content stream
•Integrate with core Library systems as required
LDEP - initial format. R3 & R4: Provide functionality for material covered by LDEP secondary legislation
Open DOM to new projects. R5+
Time axis: 2003, 2004, 2005
25
DOM: Project definition - 1
Example issues
Functional Architecture “What”
digital rights, file formats, etc
Prototyping – assessing market solutions
allow changes to new suppliers, relationships to ILS, other projects etc
Logical architecture “how – overall architecture”
Prototyping - basic functioning architecture
how do we build it cost-effectively today, supplier selection criteria
Physical architecture “how – storage & specifics”
Prototyping - principal solutions and options
• Business case
• Planning – incremental implementation phases
Cross team workshops – reviewing progress, debating detailed technical
issues, planning immediate priorities, risk management & way forward
26
DOM: Project definition - 2
Approach is to be incremental and not ‘Big Bang’
We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a
good solution
A principal goal is to define:
An overall long term “logical architecture”
Within which, there will be successive generations of physical architectures
We are building an understanding of the storage marketplace, and we will use that knowledge to manage procurement
We are certain that we will need >500 TB of storage but we are uncertain when, so we need flexible, scalable procurement
27
DOM architecture - overview
[Diagram. Labels in source order: Resource Discovery; Non-cat; ILS based RD; Others; Rights Management; LDEP; DOMID; OBJECT; DOM Storage Service; Compound objects/relations; Integrity; Authenticity; Local resource locator; Object; Atomic Objects; Unique persistent identifier (DOMID); Doc supply; DOM Physical Storage]
DOMID is mapped to node/vol/LRL
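The mapping from a DOMID to node/vol/LRL can be pictured as a small lookup table. The sketch below is only an illustration under stated assumptions: the class names, method names and sample values are hypothetical and are not taken from the DOM design. It shows the property the slide implies, namely that the persistent identifier stays fixed while the physical location behind it can change.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    node: str    # storage node that holds a copy (hypothetical naming)
    volume: str  # volume on that node
    lrl: str     # local resource locator within the volume


class DomIdResolver:
    """Maps a persistent DOMID to wherever its copies currently live."""

    def __init__(self) -> None:
        self._locations: dict[str, set[Location]] = {}

    def register(self, domid: str, location: Location) -> None:
        self._locations.setdefault(domid, set()).add(location)

    def migrate(self, domid: str, old: Location, new: Location) -> None:
        # The DOMID never changes; only the physical location behind it does.
        copies = self._locations[domid]
        copies.remove(old)
        copies.add(new)

    def resolve(self, domid: str) -> set[Location]:
        return set(self._locations.get(domid, set()))


resolver = DomIdResolver()
old = Location("node-a1", "vol-07", "0001/ab12")  # hypothetical values
new = Location("node-a9", "vol-01", "0001/ab12")
resolver.register("domid:0001", old)
resolver.migrate("domid:0001", old, new)  # e.g. a move to newer storage
```

Callers only ever hold the DOMID, so a move to a new generation of storage needs no change on their side.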
28
DOM System (release 3)
[Diagram labels: Aleph; Storage subsystem (two instances); Access; Mailroom; Publishers; Shared services; Administration; DOM System]
29
DOM logical architecture –
integrity and authenticity
Integrity:
System has capability to continuously monitor the object store to detect
object corruption
It would then initiate object recovery
Authenticity:
A process is defined to provide long-term assurance that an object that
is re-presented is as it was when it was ingested
Based on the use of cryptographic signing techniques
Each object is signed when it is ingested
The signature is verified when required
The signing mechanism is “tightly” controlled
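The sign-at-ingest and verify-on-demand steps above can be sketched as follows. This is a minimal illustration, not the DOM implementation: it uses a keyed hash (HMAC-SHA256 from the Python standard library) as a stand-in for whatever cryptographic signing technique the programme selects, and the key, object IDs and function names are all assumptions made for the example.

```python
import hashlib
import hmac

# Hypothetical key; a real system would keep signing material tightly controlled.
SIGNING_KEY = b"example-key-not-for-production"


def sign_object(data: bytes) -> str:
    """Return a hex signature computed over the object's bytes at ingest."""
    return hmac.new(SIGNING_KEY, data, hashlib.sha256).hexdigest()


def verify_object(data: bytes, signature: str) -> bool:
    """Check that the bytes re-presented match the signature made at ingest."""
    return hmac.compare_digest(sign_object(data), signature)


def find_corrupt(store: dict[str, bytes], signatures: dict[str, str]) -> list[str]:
    """Scan the store and return the IDs of objects that fail verification."""
    return sorted(
        oid for oid, data in store.items()
        if not verify_object(data, signatures[oid])
    )


store = {"obj-1": b"first object", "obj-2": b"second object"}
signatures = {oid: sign_object(data) for oid, data in store.items()}

store["obj-2"] = b"second 0bject"  # simulate corruption in the object store
corrupt = find_corrupt(store, signatures)
```

A monitoring job could run a scan like `find_corrupt` continuously and hand any failing IDs to a recovery step, for example by copying the object back from another cluster.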
30
Procuring physical storage in volume
A major cost is in physical storage
The market for storage systems is changing rapidly, and this implies that “lock-in” is
not sensible
We thus need flexibility to change supplier over time
Cost of storage is reducing by 30-40% per year
Hence procure on rolling basis just ahead of demand
Replace storage on a rolling basis on expiry of warranty
The rolling programmes imply the need to be able to support a heterogeneous
product solution
The design of the logical architecture thus supports storage sourced from multiple
storage vendors
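The case for buying just ahead of demand can be checked with simple arithmetic. The sketch below compares buying all capacity at today's price with buying it in yearly tranches while the price falls. The starting price, the five equal tranches and the 35% decline (the midpoint of the 30-40% range above) are assumptions chosen for illustration only.

```python
def upfront_cost(total_tb: float, price_per_tb: float) -> float:
    """Cost of buying the whole capacity at today's price."""
    return total_tb * price_per_tb


def rolling_cost(yearly_tb: list[float], price_per_tb: float,
                 annual_decline: float) -> float:
    """Cost of buying each year's increment at that year's (lower) price."""
    cost = 0.0
    for year, tb in enumerate(yearly_tb):
        cost += tb * price_per_tb * (1 - annual_decline) ** year
    return cost


# Hypothetical figures: 500 TB bought in five equal yearly tranches,
# a starting price of 1,000 currency units per TB, and a 35% yearly decline.
yearly = [100.0] * 5
up = upfront_cost(sum(yearly), 1000.0)
roll = rolling_cost(yearly, 1000.0, 0.35)
```

Under these assumptions the rolling purchase costs about half as much as the up-front purchase, before counting the benefit of not powering or maintaining capacity that is not yet needed.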
31
Disaster tolerance and the organisation
of storage clusters
One can obtain commercial disaster recovery (DR) solutions for common equipment configurations
However, one cannot obtain such solutions for systems holding several hundred terabytes
So we must build DR into the design of the system
A single-site solution, subject to a common-mode disaster, would suffer considerable loss of availability after a disaster, and so is not acceptable
This implies that we need a multi-site solution
Conventionally these are based on a master-standby arrangement, where only 50% of the kit is delivering normal service
Our design is based on the use of multiple autonomous independent peer clusters that cross-synchronise, so 100% of the kit delivers normal service
32
DOM architecture in the context of the
storage solution market
The dominant segment of the market focuses on delivering performance within
a highly resilient single cluster
However:
Many of our objects will be rarely accessed
so we do not want to pay for “maximised” performance we do not need
We have resilience by using multiple clusters, hence we have a reduced
need for resilience within a cluster
so we do not want to pay for “maximised” resilience we do not need
We are using these drivers to design a cost-effective large scale resilient
solution
33
DOM storage subsystem architecture - overview
[Diagram: two storage clusters, each with a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, together with DOM Shared Services]
DOM Shared Services:
• Unique ID
• Signing
• Logging
34
DOM storage subsystem architecture - access
Normal access/delivery is from local storage cluster
[Diagram: two storage clusters, each with a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, together with DOM Shared central Services (Unique ID, Signing, Logging)]
35
DOM storage subsystem architecture - access
When a cluster is off-line then access/delivery is from a remote storage cluster
[Diagram: two storage clusters, each with a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, together with DOM Shared central Services (Unique ID, Signing, Logging)]
36
DOM storage subsystem architecture - ingest
Normal ingest is to the local storage cluster and then the remote cluster is synchronised
[Diagram: two storage clusters, each with a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, together with DOM Shared central Services (Unique ID, Signing, Logging); arrows labelled "Store" and "Synchronise remote store"]
37
DOM storage subsystem architecture - ingest
When a cluster is off-line then ingest is managed by the remote storage cluster and the local cluster is synchronised later
[Diagram: two storage clusters, each with a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, together with DOM Shared central Services (Unique ID, Signing, Logging); arrows labelled "Store" and "Synchronise remote store later"]
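The access and ingest behaviour shown across these slides can be summarised in a short sketch. It is an illustration under assumptions, not the DOM design itself: the class and method names are invented, there are only two clusters, and shared services such as signing and logging are left out.

```python
class Cluster:
    def __init__(self, name: str) -> None:
        self.name = name
        self.online = True
        self.objects: dict[str, bytes] = {}


class Gateway:
    """Routes requests to the local cluster, or to its peer when off-line."""

    def __init__(self, local: Cluster, remote: Cluster) -> None:
        self.local = local
        self.remote = remote
        # (target cluster, object ID) pairs still waiting to be synchronised
        self.backlog: list[tuple[Cluster, str]] = []

    def _pick(self) -> tuple[Cluster, Cluster]:
        if self.local.online:
            return self.local, self.remote
        if self.remote.online:
            return self.remote, self.local
        raise RuntimeError("no storage cluster is on-line")

    def ingest(self, domid: str, data: bytes) -> str:
        primary, peer = self._pick()
        primary.objects[domid] = data
        if peer.online:
            peer.objects[domid] = data          # synchronise straight away
        else:
            self.backlog.append((peer, domid))  # synchronise later
        return primary.name

    def read(self, domid: str) -> bytes:
        primary, peer = self._pick()
        if domid in primary.objects:
            return primary.objects[domid]
        if peer.online and domid in peer.objects:
            return peer.objects[domid]
        raise KeyError(domid)

    def synchronise(self) -> None:
        remaining = []
        for target, domid in self.backlog:
            source = self.remote if target is self.local else self.local
            if target.online and source.online:
                target.objects[domid] = source.objects[domid]
            else:
                remaining.append((target, domid))
        self.backlog = remaining


site_a, site_b = Cluster("site-a"), Cluster("site-b")
gateway = Gateway(local=site_a, remote=site_b)
first = gateway.ingest("dom-1", b"x")   # normal ingest: local, then remote
site_a.online = False
second = gateway.ingest("dom-2", b"y")  # local off-line: remote takes over
served = gateway.read("dom-1")          # delivered from the remote cluster
site_a.online = True
gateway.synchronise()                   # local cluster catches up
```

Because each cluster can serve both reads and writes, neither sits idle as a standby, which is the point made earlier about 100% of the kit delivering normal service.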
38
In conclusion
We plan for generations of physical storage
Migration from one generation to the next
Allow changes of supplier
Purchase incrementally in modest quantities
Move quickly when required
Be cost conscious
We provide assurance that an object is held and re-presented as when it was
ingested
We are designing a cost-effective large scale resilient solution
In summary: we take a long term view
39
Major Programmes/4
Web Archiving Programme
40
Structure of Programme
Web Archiving Programme is a collaborative initiative, broadly implemented across two consortia
UK Web Archiving Consortium
Developing a selective approach to web archiving, procuring a common web archiving infrastructure and software to begin archiving activities at the earliest opportunity
International Internet Preservation Consortium
Developing advanced web archiving technologies for the long-term, large-scale, continuous crawling requirements enabled through legislation
41
UK Web Archiving Consortium
Developing a selective approach to web archiving
License for PANDAS about to be signed with NLA
Sub-licenses with consortium partners and contractor to follow
ITT concluded with Magus Research winning the contract.
Implement a common web archiving infrastructure (lots of Linux machines + PANDAS)
Provide customisation/development of PANDAS
Provide help desk and support
42
International Internet Preservation
Consortium
Developing advanced web archiving technologies
Smart Crawler
Continuous adaptive crawler, adjusting crawl priority on the fly
Based on IA Heritrix
Working on requirements now
Expect to begin tender process in June
Content Management
Archival formats
Framework
Metrics and Test Bed
43
External Collaboration
44
Digital Library Collaborations/Partnerships
Current
UK Digital Preservation Coalition
Founder Member
TEL (The European Library Project)
Web Archiving UK
JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library Of Wales
International Internet Preservation Consortium
BNF, Library Of Congress, Internet Archive, National Archives & Library Of Canada, National Library Of
Australia, National Library Of Italy, National Libraries Of Nordic Countries
JISC Funded - Digital Curation Centre
Persistent Identifiers
DOI Foundation, European National Libraries (KB & DDB)
Resource Discovery
Union Catalogues (SUNCAT)
Digital Library Federation
45
Digital Library Collaborations/Partnerships
Potential
Secure Legal Deposit Network
6 Legal Deposit Libraries
Global Digital Format Registry
Potential Partners (National Archives, DLF)
Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
KB (Netherlands National Library & Other Partners – FP6 Bid)
Digital Rights Management
Potential Partners (Publishers, JISC)
Metadata
Publishers, Others ?
Authentication
JISC ?
Resource Discovery
Search Engine Vendors, Researchers
Others ???
46
Conclusions
• Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success
• Can You Work With Us?
47
[Diagram: audience segments with the audiences and services shown for each]
Segments: EDUCATION; PUBLIC; BUSINESS; RESEARCHER; LIBRARIES
Audiences: High R+D Industries; Prof. Services; Creative Industries; Publishing Industries; SMEs; School; Students; Libraries; Teachers 11>18; Lifelong Learner (shown in several segments); Visitors (child + adult); Postgraduate/Undergraduate; Librarians; Scholars; Commercial Researcher; Public Libraries; Broadcasting e.g. BBC; Publishing e.g. OED; H.E. Libraries; Public
Services: Resource Discovery; Bespoke Services; Research Services; Document Supply; Reprographics; Innovation Centre; On-site Visits; School Tours; Web Learning; Exhibitions; Events; Tours; Publishing; Reading Rooms; Searching Tools; Training; Best Practice
6
Major Programmes/1
Da Vinci Notebook
Integrated Library System (ILS)
Programme
7
ILS: Development
Data migration
Due to finish in a few days
16M+ BL records
10M+ records from other sources
Online ILS software
All online changes made (mainly interfaces) – final tests
Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
Most done for go-live
Remainder in priority order
8
ILS: Implementation
Training
Courses for end-users well underway
‘Practice’ system available
‘Search only’ training also underway
Testing
Functional testing (end to end) nearly complete
Performance poor – OPAC very slow
Automated stress testing (LoadRunner scripts)
eIS trying to find area of problem
Ex Libris experts flying over
Some security ‘hardening’ needed
9
ILS: Cutover from legacy systems
Now:
Temporary Aleph cataloguing
7 June:
Phase 1 – internal processing
Staggered take-on of users to ease cutover problems
Merge ‘temporary’ records
30 June:
Phase 2 – reading rooms
Reading rooms closed for cutover 26-29 June
Mainly brand-new PCs etc rather than XP upgrade
30 July:
Phase 3 – remote users
Could be delayed if there are major problems
10
Future ILS development (ILS/2)
Current ILS development seen just as the start
Extra records
E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
E.g. Preservation records
Links to other new BL systems
E.g. Digital Object Management (images, web pages etc)
New releases of Ex Libris packages
12
Major Programmes/2
International Dunhuang Project
Digitisation Programme
13
Background
Digitisation Is The Process Of Converting Existing Physical Items Into Digital
Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character
Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image to Sub £1/Image.
Automatic Devices To Digitise Are Becoming Available But Are Expensive (~£150,000).
BL Has Had Fairly Ad Hoc Approach Driven By
External Funding Opportunities
Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project
Management
BL Has Created About 1.5M Digital Images So Far…
14
Slide 22
British Computer Society
North London Branch
Major Programmes
Richard Boulderstone
July 27, 2004
1
Agenda
•
•
•
•
•
•
•
•
•
The British Library
Vision
Our Audiences/Customers
ILS
Digitisation
Digital Object Management
Web Archiving
Collaboration
Conclusions
Magna Carta
www.bl.uk
2
What Is The British Library ?
Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of
Science and Invention (1855), National Central Library (1916), and National
Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India
Office Library and Records in 1982, and British Institute of Recorded Sound in
1983
• Flagship building at St Pancras - largest public building project in Great
Britain in 20th century - opened in 1998
•
www.bl.uk
3
World-Class Research Library
Key Statistics 2002/3
•
•
•
•
•
•
•
•
•
•
150 million items
8.2 million items consulted or supplied
408,000 reading room visits
618,000 catalogue records created
554,000 items received on legal deposit
651 km shelf capacity 92% full add 12 km each year
18.5M Web Site Hits (www.bl.uk)
2,400 staff
£85.2 million Grant in Aid and £27.0 million trading income in 2001/2
Annual report - http://www.bl.uk/about/annual/latest.html
www.bl.uk
4
‘The World’s Knowledge’
Outcome Based Vision
… by aiding scientific advances
… by adding commercial value for
businesses
… by contributing to UK “knowledge
economy”
… through the pursuit of academic
excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural
wellbeing
… by giving information relevant to their
interests
… by helping to find the next medical
breakthrough
… by creating a link between the past,
present and future
To help people
advance
knowledge to
enrich lives
5
[Diagram: Our Audiences/Customers. Five audience groups (Business, Education, Public, Researcher, Libraries), each shown with its customer segments (e.g. high R+D industries, professional services, creative and publishing industries, SMEs; school libraries, teachers and students 11>18; visitors, child + adult; postgraduates/undergraduates, scholars, librarians, commercial researchers; public and H.E. libraries; broadcasting, e.g. BBC; publishing, e.g. OED; lifelong learners in several groups) and the services offered to them (e.g. resource discovery, bespoke services, research services, document supply, reprographics, Innovation Centre, on-site visits, school tours, web learning, exhibitions, events, tours, publishing, reading rooms, searching tools, training, best practice).]
6
2
Major Programmes/1
Da Vinci Notebook
Integrated Library System (ILS)
Programme
7
ILS: Development
Data migration
Due to finish in a few days
16M+ BL records
10M+ records from other sources
Online ILS software
All online changes made (mainly interfaces) – final tests
Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
Most done for go-live
Rest in priority order
8
ILS: Implementation
Training
Courses to end-users well underway
‘Practice’ system available
‘Search only’ training also underway
Testing
Functional testing (end to end) nearly complete
Performance poor – OPAC very slow
Automated stress testing (LoadRunner scripts)
eIS trying to find area of problem
Ex Libris experts flying over
Some security ‘hardening’ needed
9
ILS: Cutover from legacy systems
Now:
Temporary Aleph cataloguing
7 June:
Phase 1 – internal processing
Staggered take-on of users to ease cutover problems
Merge ‘temporary’ records
30 June:
Phase 2 – reading rooms
Reading rooms closed for cutover 26-29 June
Mainly brand-new PCs etc rather than XP upgrade
30 July:
Phase 3 – remote users
Could be delayed if there are major problems
10
11
Future ILS development (ILS/2)
Current ILS development seen just as the start
Extra records
E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
E.g. Preservation records
Links to other new BL systems
E.g. Digital Object Management (images, web pages etc)
New releases of Ex Libris packages
12
Major Programmes/2
International Dunhuang Project
Digitisation Programme
13
Background
Digitisation Is The Process Of Converting Existing Physical Items Into Digital
Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character
Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image to Sub £1/Image.
Automatic Devices To Digitise Are Becoming Available But Are Expensive ~£150,000.
BL Has Had Fairly Ad Hoc Approach Driven By
External Funding Opportunities
Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project
Management
BL Has Created About 1.5M Digital Images So Far…
14
Digitisation Strategy
Digitisation Strategy Project Was Formally Initiated On February 2, 2004
Key objectives for the project are to define:
Selection Criteria
Uniform Approach
Communications Plan
Sustainability
Intellectual Property Rights
External Relationship Management
Funding
Integration with DOMS
15
Project Status Information
Definitive Register of Projects
19 Complete
19 Current
20 Planning
JISC Sound (3,900 Hours)
JISC Newspapers (2M Pages of 750M Pages)
Chopin (Collaborative Project)
Early English Books Online
16
Major Programmes/3
Gutenberg Bible
Digital Object Management (DOM)
Programme
17
DOM Programme vision
Our mission is to enable the United Kingdom to preserve
and use its digital intellectual property forever
Our vision is to create a management system for digital objects
that will
store and preserve any type of digital material in perpetuity
provide access to this material to users with appropriate permissions
ensure that the material is easy to find
ensure that users can view the material with contemporary applications
ensure that users can, where possible, experience material with the original look-and-feel
18
Introduction - history
Digital Library PFI
Mar 1997 – Dec 1998
Digital Library System
1999 – early 2002
Lessons
DOM Report
Nov 2002
The DOM Programme
Started September 2003
19
Drivers for the BL DOM Programme
Legal deposit legislation for non-print material was granted royal assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12 TB of material per year (with a 50-year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….
We need a generic and cost-effective approach for the secure long term
storage of digital material that is produced by numerous initiatives
20
DOM – many topics to address
[Chart: DOM components plotted by estimated size of component (LOW to HIGH) against component ambiguity / complexity (LOW to HIGH). Components shown: LDEP, Workflow, Resource Discovery, LDLSE, VDEP, Rights Management, Metadata Definition, Technical Requirements, File Format Registry, Persistent Identifiers, File Conversion Utilities, Authentication, Prototypes, SDM, RADM, Interfaces, Strategy Development. Also shown: ILS, Web Archiving, Digitisation Programme. Legend: Started; Planned; Non-DOM projects; Planned co-operation.]
LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications
21
Scope - life cycle of objects
Collection: Selection, Acquisition, Accession, Description
Preservation: Storage, Preservation
Access: Resource discovery, Delivery, Rendering
22
Scope – objects and processes
Preservation store
Preserves the bit stream in perpetuity
Access store
Access versions
Limited formats – in the flavour of the era
Metadata to support resource discovery
Descriptive, Administrative, Links with existing tools e.g. Integrated
Library System (ILS)
Workflow
Ingest, e.g. Legal Deposit processing
23
[Diagram: DOM in context. Access: Resource Discovery, Delivery, Digital Rights Management. DOM shared services: Storage, Signing, Authentication, Metadata, Persistent ID. Ingest sources: Donations; Document Supply; Publishers; Archives; Grey Literature; Non-Serial Store; Archiving; Operational Stores; Legal Deposit (LDL Secure Environment, Legal Deposit Items, Legal Deposit Processing); Web Archiving; Digitisation (St Pancras Studios, Newspapers, NSA).]
24
Timeline
2003 to 2005, in successive releases:
• Definition (R0): prototype will provide a basic preservation-quality digital object storage module; ET approve Business Case (BC) & Timeline
• Operational Storage Sub System (R1): consolidate R0 into operational system; provide preservation-quality digital store for materials received under Voluntary Deposit of Electronic Publications (VDEP); integrate it with the existing VDEP front-end
• 1st Content Stream ingest (R2): support ingest for a major content stream; integrate with core Library systems as required
• LDEP - initial format (R3 & R4): provide functionality for material covered by LDEP secondary legislation
• Open DOM to new projects (R5+)
25
DOM: Project definition - 1
Example issues
Functional Architecture “What”
digital rights, file formats, etc
Prototyping – assessing market solutions
allow changes to new suppliers, relationships to ILS, other projects etc
Logical architecture “how – overall architecture”
Prototyping - basic functioning architecture
how do we build it cost-effectively today, supplier selection criteria
Physical architecture “how – storage & specifics”
Prototyping - principal solutions and options
• Business case
• Planning – incremental implementation phases
Cross team workshops – reviewing progress, debating detailed technical
issues, planning immediate priorities, risk management & way forward
26
DOM: Project definition - 2
Approach is to be incremental and not ‘Big Bang’
We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a
good solution
A principal goal is to define:
An overall long term “logical architecture”
Within which, there will be successive generations of physical architectures
We are building an understanding of the storage marketplace, and we will use that knowledge to manage
procurement
We are certain that we will need >500 TB of storage but we are uncertain when – we thus need
flexible, scalable procurement
27
DOM architecture - overview
[Diagram: Resource Discovery (ILS-based RD, non-catalogue RD, others), Rights Management, LDEP and Doc supply address objects by DOMID through the DOM Storage Service. The DOM Storage Service covers compound objects/relations, integrity and authenticity. Each atomic object has a unique persistent identifier (DOMID); the DOMID is mapped to node/vol/LRL (local resource locator) in DOM Physical Storage.]
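The mapping from a DOMID to node/vol/LRL can be pictured as a lookup table kept by the storage service. The following is only a minimal sketch: the identifier strings, the `Location` fields and the `DomidResolver` class are hypothetical names chosen for illustration, not taken from the slides.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Location:
    """Where one copy of an object physically lives (illustrative fields)."""
    node: str    # storage node or cluster
    volume: str  # volume on that node
    lrl: str     # local resource locator within the volume


class DomidResolver:
    """Maps a persistent DOMID to its current physical locations."""

    def __init__(self) -> None:
        self._table: Dict[str, List[Location]] = {}

    def register(self, domid: str, location: Location) -> None:
        self._table.setdefault(domid, []).append(location)

    def relocate(self, domid: str, old: Location, new: Location) -> None:
        # Storage generations change; the DOMID itself never does.
        copies = self._table[domid]
        copies[copies.index(old)] = new

    def resolve(self, domid: str) -> List[Location]:
        return list(self._table[domid])
```

Because callers only ever hold the DOMID, moving an object to a new generation of physical storage changes the table entry and nothing else.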
28
DOM System (release 3)
[Diagram: the DOM System at release 3, showing Publishers, the Mailroom, Aleph, Access, Shared services, Administration and two Storage subsystems.]
29
DOM logical architecture –
integrity and authenticity
Integrity:
System has capability to continuously monitor the object store to detect
object corruption
It would then initiate object recovery
Authenticity:
A process is defined to provide long-term assurance that an object that
is re-presented is as it was when it was ingested
Based on the use of cryptographic signing techniques
Each object is signed when it is ingested
The signature is verified when required
The signing mechanism is “tightly” controlled
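The sign-at-ingest, verify-on-demand and monitor-then-recover steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions: the slides do not name a signing algorithm, so a keyed SHA-256 digest (HMAC) from the standard library stands in for it, and `SIGNING_KEY` and `ObjectStore` are hypothetical.

```python
import hashlib
import hmac
from typing import Dict, List, Tuple

# Assumption: one secret key held by a tightly controlled signing service.
SIGNING_KEY = b"example-key-held-by-the-signing-service"


def sign(data: bytes) -> str:
    """Return a keyed SHA-256 signature (HMAC) of the object's bit stream."""
    return hmac.new(SIGNING_KEY, data, hashlib.sha256).hexdigest()


def verify(data: bytes, signature: str) -> bool:
    """True if the object is bit-for-bit as it was when signed."""
    return hmac.compare_digest(sign(data), signature)


class ObjectStore:
    """Toy store: object id -> (bit stream, signature taken at ingest)."""

    def __init__(self) -> None:
        self.objects: Dict[str, Tuple[bytes, str]] = {}

    def ingest(self, object_id: str, data: bytes) -> None:
        self.objects[object_id] = (data, sign(data))

    def audit(self) -> List[str]:
        """Integrity monitoring: list ids whose content no longer verifies."""
        return [oid for oid, (data, sig) in self.objects.items()
                if not verify(data, sig)]

    def recover(self, object_id: str, replica: "ObjectStore") -> None:
        """Object recovery: copy back a replica that still verifies."""
        data, sig = replica.objects[object_id]
        if not verify(data, sig):
            raise ValueError(f"replica of {object_id} is also corrupt")
        self.objects[object_id] = (data, sig)
```

A production design would more likely use public-key signatures so that verification does not require access to the signing secret; the HMAC here only keeps the sketch within the standard library.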
30
Procuring physical storage in volume
A major cost is in physical storage
The market for storage systems is changing rapidly, and this implies that “lock-in” is
not sensible
We thus need flexibility to change supplier over time
Cost of storage is reducing by 30-40% per year
Hence procure on rolling basis just ahead of demand
Replace storage on a rolling basis on expiry of warranty
The rolling programmes imply the need to be able to support a heterogeneous
product solution
The design of the logical architecture thus supports storage sourced from multiple
storage vendors
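A rough worked calculation shows why falling prices favour buying just ahead of demand. All figures are assumptions for illustration: a 35% annual price fall (the midpoint of the 30-40% quoted above), a starting price of 1.0 cost unit per TB, and demand of 100 TB added per year for five years. Warranty replacement and running costs are ignored.

```python
def upfront_cost(total_tb: float, price_per_tb: float) -> float:
    """Buy everything in year 0 at today's price."""
    return total_tb * price_per_tb


def rolling_cost(yearly_tb, price_per_tb: float, annual_decline: float) -> float:
    """Buy each year's increment at that year's (lower) price."""
    cost = 0.0
    for year, tb in enumerate(yearly_tb):
        cost += tb * price_per_tb * (1.0 - annual_decline) ** year
    return cost


if __name__ == "__main__":
    demand = [100, 100, 100, 100, 100]   # hypothetical TB added per year
    price = 1.0                          # hypothetical cost units per TB, year 0
    decline = 0.35                       # midpoint of the 30-40% range
    up = upfront_cost(sum(demand), price)
    roll = rolling_cost(demand, price, decline)
    print(f"up-front: {up:.1f}, rolling: {roll:.1f}, ratio: {roll / up:.2f}")
```

With these made-up figures the rolling purchase costs about half as much as buying all 500 TB in the first year.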
31
Disaster tolerance and the organisation
of storage clusters
One can obtain commercial disaster recovery (DR) solutions for common equipment
configurations
However, one cannot obtain such solutions for systems holding several hundred terabytes
So we must build in the need for DR into the design of the system
A single site solution, subject to a common-mode disaster, would suffer considerable loss
of availability after a disaster, and so is not acceptable
This implies that we need a multi-site solution
Conventionally these are based on a master-standby arrangement, where only 50% of the kit is delivering
normal service
Our design is based on the use of multiple autonomous independent peer clusters that
cross-synchronise
so 100% of the kit delivers normal service
32
DOM architecture in the context of the
storage solution market
The dominant segment of the market focuses on delivering performance within
a highly resilient single cluster
However:
Many of our objects will be rarely accessed
so we do not want to pay for “maximised” performance we do not need
We have resilience by using multiple clusters, hence we have a reduced
need for resilience within a cluster
so we do not want to pay for “maximised” resilience we do not need
We are using these drivers to design a cost-effective large scale resilient
solution
33
DOM storage subsystem architecture - overview
[Diagram: two storage clusters, each with a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, plus DOM Shared Services (Unique ID, Signing, Logging).]
34
DOM storage subsystem architecture - access
Normal access/delivery is from local storage cluster
[Diagram: two storage clusters, each with a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, plus DOM shared central services (Unique ID, Signing, Logging).]
35
DOM storage subsystem architecture - access
When a cluster is off-line then access/delivery is from a remote storage cluster
[Diagram: two storage clusters, each with a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, plus DOM shared central services (Unique ID, Signing, Logging).]
36
DOM storage subsystem architecture - ingest
Normal ingest is to the local storage cluster and then the remote cluster is synchronised
[Diagram: two storage clusters, each with a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, plus DOM shared central services (Unique ID, Signing, Logging). Steps labelled: Store; Synchronise remote store.]
37
DOM storage subsystem architecture - ingest
When a cluster is off-line then ingest is managed by the remote storage cluster and the local cluster is synchronised later
[Diagram: two storage clusters, each with a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, plus DOM shared central services (Unique ID, Signing, Logging). Steps labelled: Store; Synchronise remote store later.]
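The access and ingest behaviour on these four slides (serve locally when possible, fall back to the remote cluster, synchronise the peer immediately or later) can be modelled with a small Python sketch. It is a toy, in-memory illustration only: `Cluster`, `Gateway` and `make_pair` are hypothetical names, and real cross-synchronisation would also need conflict handling and durable queues.

```python
from typing import Dict, List, Optional


class Cluster:
    """One autonomous storage cluster (toy model, in-memory)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.online = True
        self.objects: Dict[str, bytes] = {}
        self.pending: List[str] = []  # ids still to be copied to the peer
        self.peer: Optional["Cluster"] = None

    def store(self, domid: str, data: bytes) -> None:
        self.objects[domid] = data
        self.pending.append(domid)
        self.synchronise()

    def synchronise(self) -> None:
        """Push pending objects to the peer if it is reachable."""
        if self.peer is None or not self.peer.online:
            return  # peer off-line: keep the backlog and retry later
        for domid in self.pending:
            self.peer.objects[domid] = self.objects[domid]
        self.pending.clear()


class Gateway:
    """Routes requests to the local cluster, falling back to the remote one."""

    def __init__(self, local: Cluster, remote: Cluster) -> None:
        self.local, self.remote = local, remote

    def _target(self) -> Cluster:
        if self.local.online:
            return self.local
        if self.remote.online:
            return self.remote
        raise RuntimeError("no storage cluster available")

    def ingest(self, domid: str, data: bytes) -> str:
        cluster = self._target()
        cluster.store(domid, data)
        return cluster.name  # which cluster accepted the object

    def fetch(self, domid: str) -> bytes:
        return self._target().objects[domid]


def make_pair():
    """Build two peer clusters that synchronise with each other."""
    a, b = Cluster("A"), Cluster("B")
    a.peer, b.peer = b, a
    return a, b
```

Both clusters accept work in normal running, which is the sense in which 100% of the kit delivers service, in contrast to a master-standby pair.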
38
In conclusion
We plan for generations of physical storage
Migration from one generation to the next
Allow changes of supplier
Purchase incrementally in modest quantities
Move quickly when required
Be cost conscious
We provide assurance that an object is held and re-presented as when it was
ingested
We are designing a cost-effective large scale resilient solution
In summary: we take a long term view
39
Major Programmes/4
Web Archiving Programme
40
Structure of Programme
Web Archiving Programme is a collaborative
initiative, implemented broadly across two
consortia
UK Web Archiving Consortium
Developing a selective approach to web archiving, procuring a common
web archiving infrastructure and software to begin archiving activities as
early as possible
International Internet Preservation Consortium
Developing advanced web archiving technologies for the long-term,
large-scale, continuous crawling requirements enabled through
legislation
41
UK Web Archiving Consortium
Developing a selective approach to web archiving
Licence for PANDAS about to be signed with NLA
Sub-licences with consortium partners and contractor to follow
ITT concluded with Magus Research winning the contract.
Implement a common web archiving infrastructure (lots of Linux
machines + PANDAS)
Provide customisation/development of PANDAS
Provide help desk and support
42
International Internet Preservation
Consortium
Developing advanced web archiving technologies
Smart Crawler
Continuous adaptive crawler, adjusting crawl priority on the fly
Based on IA Heritrix
Working on requirements now
Expect to begin tender process in June
Content Management
Archival formats
Framework
Metrics and Test Bed
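One way to picture a continuous adaptive crawler that adjusts crawl priority on the fly is a priority queue ordered by each URL's next due time, with the revisit interval shortened for pages that changed and lengthened for pages that did not. The sketch below is an assumption-laden illustration, not the Heritrix API or the consortium's design: `AdaptiveFrontier`, the halve/double rule and the default intervals (in hours) are all hypothetical.

```python
import heapq
from typing import Dict, List, Tuple


class AdaptiveFrontier:
    """Priority queue of URLs ordered by next due time (toy model)."""

    def __init__(self, base_interval: float = 24.0,
                 min_interval: float = 1.0, max_interval: float = 720.0) -> None:
        self.base_interval = base_interval
        self.min_interval, self.max_interval = min_interval, max_interval
        self.interval: Dict[str, float] = {}       # hours between visits
        self.queue: List[Tuple[float, str]] = []   # (due time, url)

    def add(self, url: str, now: float = 0.0) -> None:
        self.interval[url] = self.base_interval
        heapq.heappush(self.queue, (now, url))

    def next_due(self) -> Tuple[float, str]:
        """Pop the URL that is due soonest."""
        return heapq.heappop(self.queue)

    def report(self, url: str, now: float, changed: bool) -> None:
        """Re-prioritise after a visit: halve the interval if the page
        changed, double it if not, within the configured bounds."""
        factor = 0.5 if changed else 2.0
        new = min(self.max_interval,
                  max(self.min_interval, self.interval[url] * factor))
        self.interval[url] = new
        heapq.heappush(self.queue, (now + new, url))
```

A crawl loop would repeatedly call `next_due()`, fetch the page, compare it with the previous capture, and call `report()` with the result.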
43
External Collaboration
44
Digital Library Collaborations/Partnerships
Current
UK Digital Preservation Coalition
Founder Member
TEL (The European Library Project)
Web Archiving UK
JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library Of Wales
International Internet Preservation Consortium
BNF, Library Of Congress, Internet Archive, National Archives & Library Of Canada, National Library Of
Australia, National Library Of Italy, National Libraries Of Nordic Countries
JISC Funded - Digital Curation Centre
Persistent Identifiers
DOI Foundation, European National Libraries (KB & DDB)
Resource Discovery
Union Catalogues (SUNCAT)
Digital Library Federation
45
Digital Library Collaborations/Partnerships
Potential
Secure Legal Deposit Network
6 Legal Deposit Libraries
Global Digital Format Registry
Potential Partners (National Archives, DLF)
Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
KB (Netherlands National Library & Other Partners – FP6 Bid)
Digital Rights Management
Potential Partners (Publishers, JISC)
Metadata
Publishers, Others ?
Authentication
JISC ?
Resource Discovery
Search Engine Vendors, Researchers
Others ???
46
Conclusions
• Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success
• Can You Work With Us?
47
Slide 23
British Computer Society
North London Branch
Major Programmes
Richard Boulderstone
July 27, 2004
1
Agenda
•
•
•
•
•
•
•
•
•
The British Library
Vision
Our Audiences/Customers
ILS
Digitisation
Digital Object Management
Web Archiving
Collaboration
Conclusions
Magna Carta
www.bl.uk
2
What Is The British Library ?
Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of
Science and Invention (1855), National Central Library (1916), and National
Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India
Office Library and Records in 1982, and British Institute of Recorded Sound in
1983
• Flagship building at St Pancras - largest public building project in Great
Britain in 20th century - opened in 1998
•
www.bl.uk
3
World-Class Research Library
Key Statistics 2002/3
•
•
•
•
•
•
•
•
•
•
150 million items
8.2 million items consulted or supplied
408,000 reading room visits
618,000 catalogue records created
554,000 items received on legal deposit
651 km shelf capacity 92% full add 12 km each year
18.5M Web Site Hits (www.bl.uk)
2,400 staff
£85.2 million Grant in Aid and £27.0 million trading income in 2001/2
Annual report - http://www.bl.uk/about/annual/latest.html
www.bl.uk
4
‘The World’s Knowledge’
Outcome Based Vision
… by aiding scientific advances
… by adding commercial value for
businesses
… by contributing to UK “knowledge
economy”
… through the pursuit of academic
excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural
wellbeing
… by giving information relevant to their
interests
… by helping to find the next medical
breakthrough
… by creating a link between the past,
present and future
To help people
advance
knowledge to
enrich lives
www.bl.uk
5
High
R+D
Industries
Prof.
Services
Creative
Industries Publishing
Industries
Resource Discovery
Bespoke Services
Research Services
Document Supply
Reprographics
Innovation Centre
School
Students
Libraries Teachers 11>18
On-site Visits
School Tours
Web Learning
SMEs
EDUCATION
Lifelong
Learner
Exhibitions
Events
Tours
Publishing
Visitors
(child + adult)
Lifelong
Learner
PUBLIC
BUSINESS
Lifelong
Learner
Postgraduate/
Undergraduate
RESEARCHER
Librarians
Scholars
Lifelong
Learner
Reading Rooms
Bespoke Services
Reprographics
Publishing
Document Supply
Searching Tools
LIBRARIES
Commercial
Researcher
Public
Libraries
Broadcasting
e.g. BBC
Publishing
e.g. OED
Document Supply
Resource Discovery
Training
Best Practice
H.E.
Libraries
Public
www.bl.uk
6
2
Major Programmes/1
Da Vinci Notebook
Integrated Library System (ILS)
Programme
7
ILS: Development
Data migration
Due to finish in a few days
16M+ BL records
10M+ records from other sources
Online ILS software
All online changes made (mainly interfaces) – final tests
Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
Most ones done for go live
Rest in priority order
www.bl.uk
8
ILS: Implementation
Training
Courses to end-users well underway
‘Practice’ system available
‘Search only’ training also underway
Testing
Functional testing (end to end) nearly complete
Performance poor – OPAC very slow
Automated stress testing (LoadRunner scripts)
eIS trying to find area of problem
Ex Libris experts flying over
Some security ‘hardening’ needed
www.bl.uk
9
ILS: Cutover from legacy systems
Now:
Temporary Aleph cataloguing
Phase 1 – internal processing
Staggered take-on of users to ease cutover problems
Merge ‘temporary’ records
7 June:
Phase 2 – reading rooms
Reading rooms closed for cutover 26-29 June
Mainly brand-new PCs etc rather than XP upgrade
30 June:
Phase 3 – remote users
Could be delayed major problems
30 July:
www.bl.uk
10
www.bl.uk
11
Future ILS development (ILS/2)
Current ILS development seen just as the start
Extra records
E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
E.g. Preservation records
Links to other new BL systems
E.g. Digital Object Management (images, web pages etc)
New releases of Ex Libris packages
www.bl.uk
12
Major Programmes/2
International Dunhuang Project
Digitisation Programme
13
Background
Digitisation Is The Process Of Converting Existing Physical Items Into Digital
Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character
Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image to Sub £1/Image.
Automatic Devices To Digitise Are Becoming Availalble But Are Expensive ~£150,000.
BL Has Had Fairly Ad Hoc Approach Driven By
External Funding Opportunities
Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project
Management
BL Has Created About 1.5M Digital Images So Far…
www.bl.uk
14
Digitisation Strategy
Digitisation Strategy Project Was Formerly Initiated On February 2, 2004
Key objectives for the project are to define:
Selection Criteria
Uniform Approach
Communications Plan
Sustainability
Intellectual Property Rights
External Relationship Management
Funding
Integration with DOMS
www.bl.uk
15
Project Status Information
Definitive Register of Projects
19 Complete
19 Current
20 Planning
JISC Sound (3,900 Hours)
JISC Newspapers (2M Pages of 750M Pages)
Chopin (Collaborative Project)
Early English Books Online
www.bl.uk
16
Major Programmes/3
Gutenberg Bible
Digital Object Management (DOM)
Programme
17
DOM Programme vision
Our mission is to enable the United Kingdom to preserve
and use its digital intellectual property forever
Our vision is create a management system for digital objects
that will
store and preserve any type of digital material in perpetuity
provide access to this material to users with appropriate permissions
ensure that the material is easy to find
ensure that users can view the material with contemporary applications
ensure that users can, where possible, experience material with the original look-and-feel
www.bl.uk
18
Introduction - history
Digital Library PFI
Mar 1997 – Dec 1998
Digital Library System
1999 – early 2002
Lessons
DOM Report
Nov 2002
The DOM Programme
Started September 2003
www.bl.uk
19
Drivers for the BL DOM Programme
Legal deposit legislation for non-print material was granted royal assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12T of material per year (with 50 year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….
We need a generic and cost-effective approach for the secure long term
storage of digital material that is produced by numerous initiatives
www.bl.uk
20
DOM – many topics to address
HIGH
ILS
WEB
ARCHIVING
DIGITISATION
PROGRAMME
ESTIMATED SIZE OF COMPONENT
LDEP
WORKFLOW
RESOURCE
DISCOVERY
LDLSE
VDEP
RIGHTS
MANAGEMENT
METADATA
DEFINITION
TECHNICAL
REQUIREMENTS
FILE
FORMAT
REGISTRY
PERSISTENT
IDENTIFIERS
FILE
CONVERSION
UTILITIES
AUTHENTICATION
PROTOTYPES
SDM
RADM
INTERFACES
STRATEGY
DEVELOPMENT
LOW
LOW
COMPONENT AMBIGUITY / COMPLEXITY
Started
Planned
Non-DOM projects
Planned co-operation
HIGH
LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications
www.bl.uk
21
Scope - life cycle of objects
Collection
Selection
Acquisition
Accession
Description
Preservation
Storage
Preservation
Access
Resource discovery
Delivery
Rendering
www.bl.uk
22
Scope – objects and processes
Preservation store
Preserves the bit stream in perpetuity
Access store
Access versions
Limited formats – in the flavour of the era
Metadata to support resource discovery
Descriptive, Administrative, Links with existing tools e.g. Integrated
Library System (ILS)
Workflow
Ingest, e.g. Legal Deposit processing
www.bl.uk
23
ACCESS
DOM
Resource Discovery
Delivery
Digital Rights Management
Shared services
DOM
Storage
Signing
Authentication
Metadata
Persistent ID
Ingest
DONATIONS
DOCUMENT SUPPLY
Publishers
Archives
Grey Literature
Non-Serial
Store
Archiving
Operational
Stores
LEGAL DEPOSIT
LDL Secure
Environment
Legal Deposit
Items
Legal Deposit
Processing
WEB
ARCHIVING
DIGITISATION
St Pancras
Studios
Newspapers
NSA
www.bl.uk
24
Timeline
Prototype will provide a
basic preservationquality digital object
storage module
Definition. R0
ET approve
Business Case
& Timeline
•Consolidate R0 into operational
system
•Provide preservation-quality digital
store for materials received under
Voluntary Deposit of Electronic
Publications (VDEP)
•Integrate it with the existing VDEP
front-end
BC
R0
Operat’l Storage
Sub System. R1
•Support ingest for a
major content stream
•Integrate with core
Library systems as
required
R1
1st Content
Stream ingest. R2
R2
LDEP - initial
format. R3 & R4
Provide functionality
for material covered by
LDEP secondary
legislation
R3
R4
Open DOM to new
projects. R5+
2003
2004
2005
www.bl.uk
25
DOM: Project definition - 1
Example issues
Functional Architecture “What”
digital rights, file
formats, etc
Prototyping – assessing market solutions
allow changes to
new suppliers,
relationships to ILS,
other projects etc
Logical architecture “how – overall architecture”
Prototyping - basic functioning architecture
how do we build it
cost-effectively
today, supplier
selection criteria
Physical architecture “how – storage & specifics”
Prototyping - principal solutions and options
• Business case
• Planning – incremental
implementation phases
Cross team workshops – reviewing progress, debating detailed technical
issues, planning immediate priorities, risk management & way forward
www.bl.uk
26
DOM: Project definition - 2
Approach is to be incremental and not ‘Big Bang’
We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a
good solution
A principal goal is to define:
An overall long term “logical architecture”
Within which, there will be successive generations of physical architectures
We are understanding the storage marketplace, and we will use the knowledge to manage
procurement
We are certain that we will need >500T of storage but we are uncertain when – we thus need
flexible scalable procurement
www.bl.uk
27
DOM architecture - overview
Resource Discovery
Non-cat
ILS
based RD
Others
Rights Management
LDEP
DOMID
OBJECT
DOM Storage Service
Compound
objects/relations
Integrity
Authenticity
Local resource locator
Object
Atomic
Objects
Unique persistent
identifier (DOMID)
Doc
supply
DOMID is
mapped to
node/vol/LRL
DOM Physical Storage
www.bl.uk
28
DOM System (release 3)
Aleph
Storage subsystem
Storage
subsystem
Access
Mailroom
Publishers
Shared services
Administration
DOM System
www.bl.uk
29
DOM logical architecture –
integrity and authenticity
Integrity:
System has capability to continuously monitor the object store to detect
object corruption
It would then initiate object recovery
Authenticity:
A process is defined to provide long-term assurance that an object that
is re-presented is as it was when it was ingested
Based on the use of cryptographic signing techniques
Each object is signed when it is ingested
The signature is verified when required
The signing mechanism is “tightly” controlled
www.bl.uk
30
Procuring physical storage in volume
A major cost is in physical storage
The market for storage systems is changing rapidly, and this implies that “lock-in” is
not sensible
We thus need flexibility to change supplier over time
Cost of storage is reducing by 30-40% per year
Hence procure on rolling basis just ahead of demand
Replace storage on a rolling basis on expiry of warranty
The rolling programmes imply the need to be able to support a heterogeneous
product solution
The design of the logical architecture thus supports storage sourced from multiple
storage vendors
www.bl.uk
31
Disaster tolerance and the organisation
of storage clusters
One can obtain commercial disaster recovery (DR) solutions for common equipment
configurations
However one cannot obtain such solutions for systems comprising multi-100 Tb systems
So we must build in the need for DR into the design of the system
A single site solution, subject to a common-mode disaster, would suffer considerable loss
of availability after a disaster, and so is not acceptable
This implies that we need a multi-site solution
Conventionally these are based on a master-standby where only 50% of kit is delivering
normal service
Our design is based on the use of multiple autonomous independent peer clusters that
cross-synchronise
so 100% of the kit delivers normal service
www.bl.uk
32
DOM architecture in the context of the
storage solution market
The dominant segment of the market focuses on delivering performance within
a highly resilient single cluster
However:
Many of our objects will be rarely accessed
so we do not want to pay for “maximised” performance we do not need
We have resilience by using multiple clusters, hence we have a reduced
need for resilience within a cluster
so we do not want to pay for “maximised” resilience we do not need
We are using these drivers to design a cost-effective large scale resilient
solution
www.bl.uk
33
DOM storage subsystem architecture - overview
DOM
Storage
gateway
DOM
Storage
gateway
DOM
Storage
Service
DOM
Storage
Service
DOM
Physical
Storage
Storage cluster
DOM
Shared
Services
• Unique ID
• Signing
• Logging
DOM
Physical
Storage
Storage cluster
www.bl.uk
34
DOM storage subsystem architecture - access
DOM
Storage
gateway
Normal access/delivery
is from local storage
cluster
DOM
Storage
Service
DOM
Physical
Storage
Storage cluster
DOM
Storage
gateway
DOM
Storage
Service
••
••
••
DOM
DOM
Shared
central
Services
Unique
Unique ID
ID
Signing
Signing
Logging
Logging
DOM
Physical
Storage
Storage cluster
www.bl.uk
35
DOM storage subsystem architecture - access
DOM
Storage
gateway
When a cluster is off-line
then access/delivery is
from a remote storage
cluster
DOM
Storage
Service
DOM
Physical
Storage
Storage cluster
DOM
Storage
gateway
DOM
Storage
Service
••
••
••
DOM
DOM
Shared
central
Services
Unique
Unique ID
ID
Signing
Signing
Logging
Logging
DOM
Physical
Storage
Storage cluster
www.bl.uk
36
DOM storage subsystem architecture - ingest
DOM
Storage
gateway
DOM
Storage
Service
Normal ingest is to the
local storage cluster and
then the remote cluster
is synchronised
Synchronise remote store
Store
DOM
Physical
Storage
Storage cluster
••
••
••
DOM
DOM
Shared
central
Services
Unique
Unique ID
ID
Signing
Signing
Logging
Logging
DOM
Storage
gateway
DOM
Storage
Service
DOM
Physical
Storage
Storage cluster
www.bl.uk
37
DOM storage subsystem architecture - ingest
DOM
Storage
gateway
DOM
Storage
Service
When a cluster is off-line
then ingest is managed by
the remote storage cluster
and the local cluster is
synchronised later
Synchronise remote store later
Store
DOM
Physical
Storage
Storage cluster
••
••
••
DOM
DOM
Shared
central
Services
Unique
Unique ID
ID
Signing
Signing
Logging
Logging
DOM
Storage
gateway
DOM
Storage
Service
DOM
Physical
Storage
Storage cluster
www.bl.uk
38
In conclusion
We plan for generations of physical storage
Migration from one generation to the next
Allow changes of supplier
Purchase incrementally in modest quantities
Move quickly when required
Be cost conscious
We provide assurance that an object is held and re-presented as when it was
ingested
We are designing a cost-effective large scale resilient solution
In summary: we take a long term view
39
Major Programmes/4
Web Archiving Programme
40
Structure of Programme
The Web Archiving Programme is a collaborative
initiative, implemented broadly across two
consortia
UK Web Archiving Consortium
Developing a selective approach to web archiving, procuring a common
web archiving infrastructure and software to begin archiving activities
as early as possible
International Internet Preservation Consortium
Developing advanced web archiving technologies for the long-term,
large-scale, continuous crawling requirements enabled through
legislation
41
UK Web Archiving Consortium
Developing a selective approach to web archiving
License for PANDAS about to be signed with NLA
Sub-licenses with consortium partners and contractor to follow
ITT concluded with Magus Research winning the contract.
Implement a common web archiving infrastructure (lots of Linux
machines + PANDAS)
Provide customisation/development of PANDAS
Provide help desk and support
42
International Internet Preservation
Consortium
Developing advanced web archiving technologies
Smart Crawler
Continuous adaptive crawler, adjusting crawl priority on the fly
Based on IA Heritrix
Working on requirements now
Expect to begin tender process in June
Content Management
Archival formats
Framework
Metrics and Test Bed
43
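The Smart Crawler idea above, a continuous crawler that adjusts crawl priority on the fly, can be illustrated with a small priority queue. This sketch does not model Heritrix or any IIPC design: the CrawlFrontier class, the doubling/halving rule and the priority bounds are assumptions chosen only to show the mechanism.

```python
import heapq


class CrawlFrontier:
    """Hypothetical frontier: pages that change often are revisited sooner."""

    def __init__(self) -> None:
        self._heap: list[tuple[float, int, str]] = []  # (-priority, sequence, url)
        self._priority: dict[str, float] = {}          # url -> current priority
        self._seq = 0                                  # tie-breaker: first in, first out

    def schedule(self, url: str, priority: float) -> None:
        """Queue a URL; a later call for the same URL supersedes earlier ones."""
        self._priority[url] = priority
        heapq.heappush(self._heap, (-priority, self._seq, url))
        self._seq += 1

    def next_url(self):
        """Pop the highest-priority URL, skipping superseded heap entries."""
        while self._heap:
            neg_priority, _, url = heapq.heappop(self._heap)
            if self._priority.get(url) == -neg_priority:
                del self._priority[url]
                return url
        return None

    def record_visit(self, url: str, changed: bool, priority: float) -> float:
        """Adapt priority: double it if the page changed, halve it otherwise."""
        new_priority = min(priority * 2, 64.0) if changed else max(priority / 2, 1.0)
        self.schedule(url, new_priority)
        return new_priority


frontier = CrawlFrontier()
frontier.schedule("http://example.org/news", 4.0)
frontier.schedule("http://example.org/about", 4.0)
first = frontier.next_url()                        # equal priority: queued first wins
frontier.record_visit(first, changed=True, priority=4.0)
second = frontier.next_url()                       # the changed page now outranks the other
```

A real crawler would also respect politeness delays and crawl budgets; those are left out to keep the priority adjustment visible.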
External Collaboration
44
Digital Library Collaborations/Partnerships
Current
UK Digital Preservation Coalition
Founder Member
TEL (The European Library Project)
Web Archiving UK
JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library Of Wales
International Internet Preservation Consortium
BNF, Library Of Congress, Internet Archive, National Archives & Library Of Canada, National Library Of
Australia, National Library Of Italy, National Libraries Of Nordic Countries
JISC Funded - Digital Curation Centre
Persistent Identifiers
DOI Foundation, European National Libraries (KB & DDB)
Resource Discovery
Union Catalogues (SUNCAT)
Digital Library Federation
45
Digital Library Collaborations/Partnerships
Potential
Secure Legal Deposit Network
6 Legal Deposit Libraries
Global Digital Format Registry
Potential Partners (National Archives, DLF)
Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
KB (Netherlands National Library & Other Partners – FP6 Bid)
Digital Rights Management
Potential Partners (Publishers, JISC)
Metadata
Publishers, Others ?
Authentication
JISC ?
Resource Discovery
Search Engine Vendors, Researchers
Others ???
46
Conclusions
• Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success
• Can You Work With Us?
47
Slide 24
British Computer Society
North London Branch
Major Programmes
Richard Boulderstone
July 27, 2004
1
Agenda
• The British Library
• Vision
• Our Audiences/Customers
• ILS
• Digitisation
• Digital Object Management
• Web Archiving
• Collaboration
• Conclusions
Magna Carta
2
What Is The British Library?
• Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of Science and Invention (1855), National Central Library (1916), and National Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India Office Library and Records in 1982, and British Institute of Recorded Sound in 1983
• Flagship building at St Pancras - largest public building project in Great Britain in 20th century - opened in 1998
3
World-Class Research Library
Key Statistics 2002/3
• 150 million items
• 8.2 million items consulted or supplied
• 408,000 reading room visits
• 618,000 catalogue records created
• 554,000 items received on legal deposit
• 651 km shelf capacity, 92% full, adding 12 km each year
• 18.5M Web Site Hits (www.bl.uk)
• 2,400 staff
• £85.2 million Grant in Aid and £27.0 million trading income in 2001/2
• Annual report - http://www.bl.uk/about/annual/latest.html
4
‘The World’s Knowledge’
Outcome Based Vision
… by aiding scientific advances
… by adding commercial value for
businesses
… by contributing to UK “knowledge
economy”
… through the pursuit of academic
excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural
wellbeing
… by giving information relevant to their
interests
… by helping to find the next medical
breakthrough
… by creating a link between the past,
present and future
To help people
advance
knowledge to
enrich lives
5
High
R+D
Industries
Prof.
Services
Creative
Industries Publishing
Industries
Resource Discovery
Bespoke Services
Research Services
Document Supply
Reprographics
Innovation Centre
School
Students
Libraries Teachers 11>18
On-site Visits
School Tours
Web Learning
SMEs
EDUCATION
Lifelong
Learner
Exhibitions
Events
Tours
Publishing
Visitors
(child + adult)
Lifelong
Learner
PUBLIC
BUSINESS
Lifelong
Learner
Postgraduate/
Undergraduate
RESEARCHER
Librarians
Scholars
Lifelong
Learner
Reading Rooms
Bespoke Services
Reprographics
Publishing
Document Supply
Searching Tools
LIBRARIES
Commercial
Researcher
Public
Libraries
Broadcasting
e.g. BBC
Publishing
e.g. OED
Document Supply
Resource Discovery
Training
Best Practice
H.E.
Libraries
Public
6
2
Major Programmes/1
Da Vinci Notebook
Integrated Library System (ILS)
Programme
7
ILS: Development
Data migration
Due to finish in a few days
16M+ BL records
10M+ records from other sources
Online ILS software
All online changes made (mainly interfaces) – final tests
Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
Most done for go-live
Rest in priority order
8
ILS: Implementation
Training
Courses to end-users well underway
‘Practice’ system available
‘Search only’ training also underway
Testing
Functional testing (end to end) nearly complete
Performance poor – OPAC very slow
Automated stress testing (LoadRunner scripts)
eIS trying to find area of problem
Ex Libris experts flying over
Some security ‘hardening’ needed
9
ILS: Cutover from legacy systems
Now:
Temporary Aleph cataloguing
7 June:
Phase 1 – internal processing
Staggered take-on of users to ease cutover problems
Merge ‘temporary’ records
30 June:
Phase 2 – reading rooms
Reading rooms closed for cutover 26-29 June
Mainly brand-new PCs etc rather than XP upgrade
30 July:
Phase 3 – remote users
Could be delayed if there are major problems
10
11
Future ILS development (ILS/2)
Current ILS development seen just as the start
Extra records
E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
E.g. Preservation records
Links to other new BL systems
E.g. Digital Object Management (images, web pages etc)
New releases of Ex Libris packages
12
Major Programmes/2
International Dunhuang Project
Digitisation Programme
13
Background
Digitisation Is The Process Of Converting Existing Physical Items Into Digital
Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character
Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image to Sub £1/Image.
Automatic Devices To Digitise Are Becoming Available But Are Expensive (~£150,000).
BL Has Had Fairly Ad Hoc Approach Driven By
External Funding Opportunities
Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project
Management
BL Has Created About 1.5M Digital Images So Far…
14
Digitisation Strategy
Digitisation Strategy Project Was Formally Initiated On February 2, 2004
Key objectives for the project are to define:
Selection Criteria
Uniform Approach
Communications Plan
Sustainability
Intellectual Property Rights
External Relationship Management
Funding
Integration with DOMS
15
Project Status Information
Definitive Register of Projects
19 Complete
19 Current
20 Planning
JISC Sound (3,900 Hours)
JISC Newspapers (2M Pages of 750M Pages)
Chopin (Collaborative Project)
Early English Books Online
16
Major Programmes/3
Gutenberg Bible
Digital Object Management (DOM)
Programme
17
DOM Programme vision
Our mission is to enable the United Kingdom to preserve
and use its digital intellectual property forever
Our vision is to create a management system for digital objects
that will
store and preserve any type of digital material in perpetuity
provide access to this material to users with appropriate permissions
ensure that the material is easy to find
ensure that users can view the material with contemporary applications
ensure that users can, where possible, experience material with the original look-and-feel
18
Introduction - history
Digital Library PFI
Mar 1997 – Dec 1998
Digital Library System
1999 – early 2002
Lessons
DOM Report
Nov 2002
The DOM Programme
Started September 2003
19
Drivers for the BL DOM Programme
Legal deposit legislation for non-print material was granted royal assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12T of material per year (with 50 year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….
We need a generic and cost-effective approach for the secure long term
storage of digital material that is produced by numerous initiatives
20
DOM – many topics to address
HIGH
ILS
WEB
ARCHIVING
DIGITISATION
PROGRAMME
ESTIMATED SIZE OF COMPONENT
LDEP
WORKFLOW
RESOURCE
DISCOVERY
LDLSE
VDEP
RIGHTS
MANAGEMENT
METADATA
DEFINITION
TECHNICAL
REQUIREMENTS
FILE
FORMAT
REGISTRY
PERSISTENT
IDENTIFIERS
FILE
CONVERSION
UTILITIES
AUTHENTICATION
PROTOTYPES
SDM
RADM
INTERFACES
STRATEGY
DEVELOPMENT
LOW
LOW
COMPONENT AMBIGUITY / COMPLEXITY
Started
Planned
Non-DOM projects
Planned co-operation
HIGH
LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications
21
Scope - life cycle of objects
Collection
Selection
Acquisition
Accession
Description
Preservation
Storage
Preservation
Access
Resource discovery
Delivery
Rendering
22
Scope – objects and processes
Preservation store
Preserves the bit stream in perpetuity
Access store
Access versions
Limited formats – in the flavour of the era
Metadata to support resource discovery
Descriptive, Administrative, Links with existing tools e.g. Integrated
Library System (ILS)
Workflow
Ingest, e.g. Legal Deposit processing
23
ACCESS
DOM
Resource Discovery
Delivery
Digital Rights Management
Shared services
DOM
Storage
Signing
Authentication
Metadata
Persistent ID
Ingest
DONATIONS
DOCUMENT SUPPLY
Publishers
Archives
Grey Literature
Non-Serial
Store
Archiving
Operational
Stores
LEGAL DEPOSIT
LDL Secure
Environment
Legal Deposit
Items
Legal Deposit
Processing
WEB
ARCHIVING
DIGITISATION
St Pancras
Studios
Newspapers
NSA
24
Timeline
Definition (R0)
• Prototype will provide a basic preservation-quality digital object storage module
• ET approve Business Case (BC) & Timeline
Operational Storage Sub System (R1)
• Consolidate R0 into operational system
• Provide preservation-quality digital store for materials received under Voluntary Deposit of Electronic Publications (VDEP)
• Integrate it with the existing VDEP front-end
1st Content Stream ingest (R2)
• Support ingest for a major content stream
• Integrate with core Library systems as required
LDEP - initial format (R3 & R4)
• Provide functionality for material covered by LDEP secondary legislation
Open DOM to new projects (R5+)
Timeline axis: 2003, 2004, 2005
25
DOM: Project definition - 1
Functional Architecture “What”
Prototyping – assessing market solutions
Example issues: digital rights, file formats, etc
Logical architecture “how – overall architecture”
Prototyping - basic functioning architecture
Example issues: allow changes to new suppliers, relationships to ILS, other projects etc
Physical architecture “how – storage & specifics”
Prototyping - principal solutions and options
Example issues: how do we build it cost-effectively today, supplier selection criteria
• Business case
• Planning – incremental implementation phases
Cross team workshops – reviewing progress, debating detailed technical
issues, planning immediate priorities, risk management & way forward
26
DOM: Project definition - 2
Approach is to be incremental and not ‘Big Bang’
We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a
good solution
A principal goal is to define:
An overall long term “logical architecture”
Within which, there will be successive generations of physical architectures
We are building an understanding of the storage marketplace, and we will use that knowledge to manage
procurement
We are certain that we will need >500T of storage but we are uncertain when – we thus need
flexible scalable procurement
27
DOM architecture - overview
Resource Discovery
Non-cat
ILS
based RD
Others
Rights Management
LDEP
DOMID
OBJECT
DOM Storage Service
Compound
objects/relations
Integrity
Authenticity
Local resource locator
Object
Atomic
Objects
Unique persistent
identifier (DOMID)
Doc
supply
DOMID is
mapped to
node/vol/LRL
DOM Physical Storage
28
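The overview above says that each object has a unique persistent identifier (DOMID) which is mapped to node/vol/LRL. A minimal sketch of that indirection follows. The slides do not give the DOMID format or any interface, so the Location fields, the DomidRegistry class and the use of a random UUID as the identifier are all assumptions for illustration.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    """Where the bytes currently live: storage node, volume, local resource locator."""
    node: str
    volume: str
    lrl: str


class DomidRegistry:
    """Hypothetical lookup table from persistent DOMID to current location."""

    def __init__(self) -> None:
        self._locations: dict[str, Location] = {}

    def register(self, location: Location) -> str:
        """Mint a new identifier (assumed here to be a random UUID) for an object."""
        domid = uuid.uuid4().hex
        self._locations[domid] = location
        return domid

    def relocate(self, domid: str, new_location: Location) -> None:
        """Point an existing DOMID at new storage, e.g. after a hardware migration."""
        if domid not in self._locations:
            raise KeyError(domid)
        self._locations[domid] = new_location

    def resolve(self, domid: str) -> Location:
        """Return the current physical location for a DOMID."""
        return self._locations[domid]


registry = DomidRegistry()
domid = registry.register(Location("node-a", "vol-01", "2004/obj-0001"))
before = registry.resolve(domid)
registry.relocate(domid, Location("node-b", "vol-07", "2004/obj-0001"))
after = registry.resolve(domid)
```

The point of the indirection is that catalogue records and other systems hold only the DOMID, so storage can move between generations or suppliers without those references changing.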
DOM System (release 3)
Aleph
Storage subsystem
Storage
subsystem
Access
Mailroom
Publishers
Shared services
Administration
DOM System
29
DOM logical architecture –
integrity and authenticity
Integrity:
System has capability to continuously monitor the object store to detect
object corruption
It would then initiate object recovery
Authenticity:
A process is defined to provide long-term assurance that an object that
is re-presented is as it was when it was ingested
Based on the use of cryptographic signing techniques
Each object is signed when it is ingested
The signature is verified when required
The signing mechanism is “tightly” controlled
30
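The integrity and authenticity points above (sign at ingest, verify when required, monitor the store for corruption) can be sketched as follows. The slides do not name the signing technique, so HMAC-SHA256 from the Python standard library is used here purely as a stand-in; the key, function names and in-memory store are hypothetical.

```python
import hashlib
import hmac

# Hypothetical key for illustration. The slides say the signing mechanism is
# "tightly" controlled, so a real key would sit inside a protected service.
SIGNING_KEY = b"example-key-for-illustration-only"


def sign_object(data: bytes, key: bytes = SIGNING_KEY) -> str:
    """Compute a keyed digest (HMAC-SHA256) when the object is ingested."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify_object(data: bytes, signature: str, key: bytes = SIGNING_KEY) -> bool:
    """Recompute the digest and compare it with the stored one in constant time."""
    return hmac.compare_digest(sign_object(data, key), signature)


def find_corrupt(store: dict[str, tuple[bytes, str]]) -> list[str]:
    """Integrity sweep: return the IDs whose stored bytes no longer verify."""
    return sorted(object_id for object_id, (data, signature) in store.items()
                  if not verify_object(data, signature))


original = b"digitised master, page 1"
store = {
    "obj-1": (original, sign_object(original)),
    "obj-2": (b"web snapshot", sign_object(b"web snapshot")),
}
# Simulate corruption of obj-2: one byte changes after ingest, signature does not.
store["obj-2"] = (b"web snapshoT", store["obj-2"][1])
corrupt = find_corrupt(store)  # candidates for recovery from another cluster
```

In a multi-cluster design, an object flagged by the sweep could be restored from a peer cluster whose copy still verifies.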
Procuring physical storage in volume
A major cost is in physical storage
The market for storage systems is changing rapidly, and this implies that “lock-in” is
not sensible
We thus need flexibility to change supplier over time
Cost of storage is reducing by 30-40% per year
Hence procure on rolling basis just ahead of demand
Replace storage on a rolling basis on expiry of warranty
The rolling programmes imply the need to be able to support a heterogeneous
product solution
The design of the logical architecture thus supports storage sourced from multiple
storage vendors
31
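The case for procuring on a rolling basis just ahead of demand can be shown with simple arithmetic. The figures below are assumptions for illustration only: a 35% annual price drop (the midpoint of the 30-40% quoted above), a normalised starting price of 1.0 per terabyte, and a made-up demand of 100 TB per year for five years.

```python
def unit_cost(year: int, initial: float = 1.0, annual_drop: float = 0.35) -> float:
    """Price per terabyte after `year` years of compounding price falls."""
    return initial * (1.0 - annual_drop) ** year


def rolling_cost(demand_per_year: list[float], annual_drop: float = 0.35) -> float:
    """Total spend when each year's capacity is bought in the year it is needed."""
    return sum(demand * unit_cost(year, annual_drop=annual_drop)
               for year, demand in enumerate(demand_per_year))


demand = [100.0, 100.0, 100.0, 100.0, 100.0]   # hypothetical TB needed each year
upfront = sum(demand) * unit_cost(0)           # buy all 500 TB at today's price
rolling = rolling_cost(demand)                 # buy 100 TB a year as prices fall
saving = 1.0 - rolling / upfront
print(f"up-front: {upfront:.1f}, rolling: {rolling:.1f}, saving: {saving:.0%}")
```

Under these assumptions the rolling purchase costs roughly half of the up-front purchase, before counting the warranty and power costs of disks that would otherwise sit unused.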
Disaster tolerance and the organisation
of storage clusters
One can obtain commercial disaster recovery (DR) solutions for common equipment
configurations
However one cannot obtain such solutions for systems holding several hundred terabytes
So we must build DR into the design of the system
A single site solution, subject to a common-mode disaster, would suffer considerable loss
of availability after a disaster, and so is not acceptable
This implies that we need a multi-site solution
Conventionally these are based on a master-standby arrangement, where only 50% of the kit is delivering
normal service
Our design is based on the use of multiple autonomous independent peer clusters that
cross-synchronise
so 100% of the kit delivers normal service
32
DOM architecture in the context of the
storage solution market
The dominant segment of the market focuses on delivering performance within
a highly resilient single cluster
However:
Many of our objects will be rarely accessed
so we do not want to pay for “maximised” performance we do not need
We have resilience by using multiple clusters, hence we have a reduced
need for resilience within a cluster
so we do not want to pay for “maximised” resilience we do not need
We are using these drivers to design a cost-effective large scale resilient
solution
33
DOM storage subsystem architecture - overview
[Diagram: two storage clusters, each with a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, sharing DOM Shared Services (Unique ID, Signing, Logging)]
34
Slide 25
DOM architecture in the context of the
storage solution market
The dominant segment of the market focuses on delivering performance within
a highly resilient single cluster
However:
Many of our objects will be rarely accessed
so we do not want to pay for “maximised” performance we do not need
We have resilience by using multiple clusters, hence we have a reduced
need for resilience within a cluster
so we do not want to pay for “maximised” resilience we do not need
We are using these drivers to design a cost-effective large scale resilient
solution
www.bl.uk
33
DOM storage subsystem architecture - overview
DOM
Storage
gateway
DOM
Storage
gateway
DOM
Storage
Service
DOM
Storage
Service
DOM
Physical
Storage
Storage cluster
DOM
Shared
Services
• Unique ID
• Signing
• Logging
DOM
Physical
Storage
Storage cluster
www.bl.uk
34
DOM storage subsystem architecture - access
DOM
Storage
gateway
Normal access/delivery
is from local storage
cluster
DOM
Storage
Service
DOM
Physical
Storage
Storage cluster
DOM
Storage
gateway
DOM
Storage
Service
••
••
••
DOM
DOM
Shared
central
Services
Unique
Unique ID
ID
Signing
Signing
Logging
Logging
DOM
Physical
Storage
Storage cluster
www.bl.uk
35
DOM storage subsystem architecture - access
DOM
Storage
gateway
When a cluster is off-line
then access/delivery is
from a remote storage
cluster
DOM
Storage
Service
DOM
Physical
Storage
Storage cluster
DOM
Storage
gateway
DOM
Storage
Service
••
••
••
DOM
DOM
Shared
central
Services
Unique
Unique ID
ID
Signing
Signing
Logging
Logging
DOM
Physical
Storage
Storage cluster
www.bl.uk
36
DOM storage subsystem architecture - ingest
DOM
Storage
gateway
DOM
Storage
Service
Normal ingest is to the
local storage cluster and
then the remote cluster
is synchronised
Synchronise remote store
Store
DOM
Physical
Storage
Storage cluster
••
••
••
DOM
DOM
Shared
central
Services
Unique
Unique ID
ID
Signing
Signing
Logging
Logging
DOM
Storage
gateway
DOM
Storage
Service
DOM
Physical
Storage
Storage cluster
www.bl.uk
37
DOM storage subsystem architecture - ingest
DOM
Storage
gateway
DOM
Storage
Service
When a cluster is off-line
then ingest is managed by
the remote storage cluster
and the local cluster is
synchronised later
Synchronise remote store later
Store
DOM
Physical
Storage
Storage cluster
••
••
••
DOM
DOM
Shared
central
Services
Unique
Unique ID
ID
Signing
Signing
Logging
Logging
DOM
Storage
gateway
DOM
Storage
Service
DOM
Physical
Storage
Storage cluster
www.bl.uk
38
In conclusion
We plan for generations of physical storage
Migration from one generation to the next
Allow changes of supplier
Purchase incrementally in modest quantities
Move quickly when required
Be cost conscious
We provide assurance that an object is held and re-presented as when it was
ingested
We are designing a cost-effective large scale resilient solution
In summary: we take a long term view
www.bl.uk
39
Major Programmes/4
Web Archiving Programme
40
Structure of Programme
Web Archiving Programme is a collaborative
initiative, roughly implemented across two
consortiums
UK Web Archiving Consortium
Developing a selective approach to web archiving, procuring a common
web archiving infrastructure and software to begin archiving activities at
the earliest
International Internet Preservation Consortium
Developing advanced web archiving technologies for the long terms,
large scale, continuous crawling requirements enabled through
legislation
www.bl.uk
41
UK Web Archiving Consortium
Developing a selective approach to web archiving
License for PANDAS about to be signed with NLA
Sub-licenses with consortium partners and contractor to follow
ITT concluded with Magus Research winning the contract.
Implement a common web arching infrastructure (lots of Linux
machines + PANDAS)
Provide customisation/development of PANDAS
Provide help desk and support
www.bl.uk
42
International Internet Preservation
Consortium
Developing advanced web archiving technologies
Smart Crawler
Continuous adaptive crawler, adjusting crawl priority on the fly
Based on IA Heritrix
Working on requirements now
Expect to being tender process in June
Content Management
Archival formats
Framework
Metrics and Test Bed
www.bl.uk
43
External Collaboration
44
Digital Library Collaborations/Partnerships
Current
UK Digital Preservation Collation
Founder Member
TEL (The European Library Project)
Web Archiving UK
JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library Of Wales
International Internet Preservation Consortium
BNF, Library Of Congress, Internet Archive, National Archives & Library Of Canada, National Library Of
Australia, National Library Of Italy, National Libraries Of Nordic Countries
JISC Funded - Digital Curation Centre
Persistent Identifiers
DOI Foundation, European National Libraries (KB & DDB)
Resource Discovery
Union Catalogues (SUNCAT)
Digital Library Federation
www.bl.uk
45
Digital Library Collaborations/Partnerships
Potential
Secure Legal Deposit Network
6 Legal Deposit Libraries
Global Digital Format Registry
Potential Partners (National Archives, DLF)
Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
KB (Netherlands National Library & Other Partners – FP6 Bid)
Digital Rights Management
Potential Partners (Publishers, JISC)
Metadata
Publishers, Others ?
Authentication
JISC ?
Resource Discovery
Search Engine Vendors, Researchers
Others ???
www.bl.uk
46
Conclusions
Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success
•
•
Can You Work With Us?
www.bl.uk
47
Slide 26
British Computer Society
North London Branch
Major Programmes
Richard Boulderstone
July 27, 2004
1
Agenda
• The British Library
• Vision
• Our Audiences/Customers
• ILS
• Digitisation
• Digital Object Management
• Web Archiving
• Collaboration
• Conclusions
Magna Carta
2
What Is The British Library?
• Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of Science and Invention (1855), National Central Library (1916), and National Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India Office Library and Records in 1982, and British Institute of Recorded Sound in 1983
• Flagship building at St Pancras - largest public building project in Great Britain in 20th century - opened in 1998
3
World-Class Research Library
Key Statistics 2002/3
• 150 million items
• 8.2 million items consulted or supplied
• 408,000 reading room visits
• 618,000 catalogue records created
• 554,000 items received on legal deposit
• 651 km shelf capacity, 92% full, adding 12 km each year
• 18.5M Web Site Hits (www.bl.uk)
• 2,400 staff
• £85.2 million Grant in Aid and £27.0 million trading income in 2001/2
• Annual report - http://www.bl.uk/about/annual/latest.html
4
‘The World’s Knowledge’
Outcome Based Vision
To help people advance knowledge to enrich lives
… by aiding scientific advances
… by adding commercial value for businesses
… by contributing to UK “knowledge economy”
… through the pursuit of academic excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural wellbeing
… by giving information relevant to their interests
… by helping to find the next medical breakthrough
… by creating a link between the past, present and future
5
[Diagram: audience/customer segments BUSINESS, EDUCATION, PUBLIC, RESEARCHER and LIBRARIES, each shown with its audiences and the services offered to them.
Audiences shown: High R+D Industries, Prof. Services, Creative Industries, Publishing Industries, SMEs, School Students, Libraries, Teachers 11>18, Lifelong Learner, Visitors (child + adult), Postgraduate/Undergraduate, Librarians, Scholars, Commercial Researcher, Public Libraries, H.E. Libraries, Broadcasting e.g. BBC, Publishing e.g. OED.
Services shown: Resource Discovery, Bespoke Services, Research Services, Document Supply, Reprographics, Innovation Centre, On-site Visits, School Tours, Web Learning, Exhibitions, Events, Tours, Publishing, Reading Rooms, Searching Tools, Training, Best Practice.]
6
2
Major Programmes/1
Da Vinci Notebook
Integrated Library System (ILS)
Programme
7
ILS: Development
Data migration
Due to finish in a few days
16M+ BL records
10M+ records from other sources
Online ILS software
All online changes made (mainly interfaces) – final tests
Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
Most done for go-live
Rest in priority order
8
ILS: Implementation
Training
Courses to end-users well underway
‘Practice’ system available
‘Search only’ training also underway
Testing
Functional testing (end to end) nearly complete
Performance poor – OPAC very slow
Automated stress testing (LoadRunner scripts)
eIS trying to find area of problem
Ex Libris experts flying over
Some security ‘hardening’ needed
9
ILS: Cutover from legacy systems
Now:
Temporary Aleph cataloguing
7 June:
Phase 1 – internal processing
Staggered take-on of users to ease cutover problems
Merge ‘temporary’ records
30 June:
Phase 2 – reading rooms
Reading rooms closed for cutover 26-29 June
Mainly brand-new PCs etc rather than XP upgrade
30 July:
Phase 3 – remote users
Could be delayed if there are major problems
10
11
Future ILS development (ILS/2)
Current ILS development seen just as the start
Extra records
E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
E.g. Preservation records
Links to other new BL systems
E.g. Digital Object Management (images, web pages etc)
New releases of Ex Libris packages
12
Major Programmes/2
International Dunhuang Project
Digitisation Programme
13
Background
Digitisation Is The Process Of Converting Existing Physical Items Into Digital
Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character
Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image to Sub £1/Image.
Automatic Devices To Digitise Are Becoming Available But Are Expensive (~£150,000).
BL Has Had Fairly Ad Hoc Approach Driven By
External Funding Opportunities
Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project
Management
BL Has Created About 1.5M Digital Images So Far…
14
Digitisation Strategy
Digitisation Strategy Project Was Formally Initiated On February 2, 2004
Key objectives for the project are to define:
Selection Criteria
Uniform Approach
Communications Plan
Sustainability
Intellectual Property Rights
External Relationship Management
Funding
Integration with DOMS
15
Project Status Information
Definitive Register of Projects
19 Complete
19 Current
20 Planning
JISC Sound (3,900 Hours)
JISC Newspapers (2M Pages of 750M Pages)
Chopin (Collaborative Project)
Early English Books Online
16
Major Programmes/3
Gutenberg Bible
Digital Object Management (DOM)
Programme
17
DOM Programme vision
Our mission is to enable the United Kingdom to preserve
and use its digital intellectual property forever
Our vision is to create a management system for digital objects
that will
store and preserve any type of digital material in perpetuity
provide access to this material to users with appropriate permissions
ensure that the material is easy to find
ensure that users can view the material with contemporary applications
ensure that users can, where possible, experience material with the original look-and-feel
18
Introduction - history
Digital Library PFI
Mar 1997 – Dec 1998
Digital Library System
1999 – early 2002
Lessons
DOM Report
Nov 2002
The DOM Programme
Started September 2003
19
Drivers for the BL DOM Programme
Legal deposit legislation for non-print material was granted royal assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12T of material per year (with 50 year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….
We need a generic and cost-effective approach for the secure long term
storage of digital material that is produced by numerous initiatives
20
DOM – many topics to address
[Chart: components plotted by ESTIMATED SIZE OF COMPONENT (low to high) against COMPONENT AMBIGUITY / COMPLEXITY (low to high).
Legend: Started, Planned, Non-DOM projects, Planned co-operation.
Components shown: ILS, Web Archiving, Digitisation Programme, LDEP, Workflow, Resource Discovery, LDLSE, VDEP, Rights Management, Metadata Definition, Technical Requirements, File Format Registry, Persistent Identifiers, File Conversion Utilities, Authentication, Prototypes, SDM, RADM, Interfaces, Strategy Development.]
LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications
21
Scope - life cycle of objects
Collection
Selection
Acquisition
Accession
Description
Preservation
Storage
Preservation
Access
Resource discovery
Delivery
Rendering
22
Scope – objects and processes
Preservation store
Preserves the bit stream in perpetuity
Access store
Access versions
Limited formats – in the flavour of the era
Metadata to support resource discovery
Descriptive, Administrative, Links with existing tools e.g. Integrated
Library System (ILS)
Workflow
Ingest, e.g. Legal Deposit processing
23
[Diagram: DOM in context.
ACCESS: Resource Discovery, Delivery, Digital Rights Management
DOM Shared services: Signing, Authentication, Metadata, Persistent ID
DOM Storage
Ingest sources: DONATIONS, DOCUMENT SUPPLY, LEGAL DEPOSIT (LDL Secure Environment, Legal Deposit Items, Legal Deposit Processing), WEB ARCHIVING, DIGITISATION (St Pancras Studios, Newspapers, NSA)
Other elements shown: Publishers, Archives, Grey Literature, Non-Serial Store, Archiving, Operational Stores]
24
Timeline
BC: ET approve Business Case & Timeline
R0: Definition
• Prototype will provide a basic preservation-quality digital object storage module
R1: Operational Storage Sub System
• Consolidate R0 into operational system
• Provide preservation-quality digital store for materials received under Voluntary Deposit of Electronic Publications (VDEP)
• Integrate it with the existing VDEP front-end
R2: 1st Content Stream ingest
• Support ingest for a major content stream
• Integrate with core Library systems as required
R3 & R4: LDEP - initial format
• Provide functionality for material covered by LDEP secondary legislation
R5+: Open DOM to new projects
Time axis: 2003, 2004, 2005
25
DOM: Project definition - 1
Functional architecture “what”
• Prototyping – assessing market solutions
• Example issues: digital rights, file formats, etc
Logical architecture “how – overall architecture”
• Prototyping - basic functioning architecture
• Example issues: allow changes to new suppliers, relationships to ILS, other projects etc
Physical architecture “how – storage & specifics”
• Prototyping - principal solutions and options
• Example issues: how do we build it cost-effectively today, supplier selection criteria
• Business case
• Planning – incremental implementation phases
Cross team workshops – reviewing progress, debating detailed technical issues, planning immediate priorities, risk management & way forward
26
DOM: Project definition - 2
Approach is to be incremental and not ‘Big Bang’
We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a
good solution
A principal goal is to define:
An overall long term “logical architecture”
Within which, there will be successive generations of physical architectures
We are building an understanding of the storage marketplace, and we will use that knowledge to manage procurement
We are certain that we will need >500 TB of storage but we are uncertain when – we thus need flexible, scalable procurement
27
DOM architecture - overview
[Diagram: Resource Discovery (ILS based RD, Non-cat, Others), Rights Management, LDEP and Doc supply refer to an OBJECT by its DOMID.
DOM Storage Service: Compound objects/relations, Integrity, Authenticity; Atomic Objects, each with a unique persistent identifier (DOMID) and a Local resource locator (LRL).
DOMID is mapped to node/vol/LRL in DOM Physical Storage.]
28
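The overview states only that a DOMID is mapped to node/vol/LRL. A minimal Python sketch of that idea follows. The class names, the use of a random hex string as the identifier and the in-memory table are all assumptions made for illustration, not the Library's design.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    """Physical location of an atomic object (field names are illustrative)."""
    node: str  # storage node holding the object
    vol: str   # volume on that node
    lrl: str   # local resource locator within the volume


class DomIdResolver:
    """Keeps the persistent DOMID separate from the physical location."""

    def __init__(self):
        self._table = {}

    def register(self, location):
        # Hypothetical identifier scheme: a random 32-character hex string.
        domid = uuid.uuid4().hex
        self._table[domid] = location
        return domid

    def relocate(self, domid, new_location):
        # Moving an object (for example to a new storage generation)
        # changes only the mapping, never the DOMID itself.
        if domid not in self._table:
            raise KeyError(domid)
        self._table[domid] = new_location

    def resolve(self, domid):
        return self._table[domid]
```

Because callers hold only the DOMID, an object can be moved between nodes or volumes without any change on the resource discovery side.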
DOM System (release 3)
[Diagram: DOM System. Elements shown: Aleph, Publishers, Mailroom, Access, Administration, Shared services and two Storage subsystems.]
29
DOM logical architecture –
integrity and authenticity
Integrity:
System has capability to continuously monitor the object store to detect
object corruption
It would then initiate object recovery
Authenticity:
A process is defined to provide long-term assurance that an object that
is re-presented is as it was when it was ingested
Based on the use of cryptographic signing techniques
Each object is signed when it is ingested
The signature is verified when required
The signing mechanism is “tightly” controlled
30
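The sign-on-ingest, verify-on-demand and monitor-for-corruption steps can be sketched in a few lines of Python. This is an illustration only: it uses an HMAC-SHA256 keyed digest from the standard library as a stand-in for whatever signing technique the programme adopts, and the key, function names and dictionary-based store are assumptions.

```python
import hashlib
import hmac

# Hypothetical key; a real system would keep this under tight control.
SIGNING_KEY = b"demo-key-not-for-production"


def sign_object(data, key=SIGNING_KEY):
    """Return a keyed digest of the object's bytes, computed at ingest."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify_object(data, signature, key=SIGNING_KEY):
    """True if the bytes still match the signature recorded at ingest."""
    return hmac.compare_digest(sign_object(data, key), signature)


def find_corrupt(store, signatures, key=SIGNING_KEY):
    """Scan the store and list the identifiers of objects that fail verification."""
    return [
        domid
        for domid, data in store.items()
        if not verify_object(data, signatures[domid], key)
    ]
```

A monitoring job could call find_corrupt on a schedule and pass each identifier it returns to a recovery step, such as restoring the object from another copy.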
Procuring physical storage in volume
A major cost is in physical storage
The market for storage systems is changing rapidly, and this implies that “lock-in” is
not sensible
We thus need flexibility to change supplier over time
Cost of storage is reducing by 30-40% per year
Hence procure on rolling basis just ahead of demand
Replace storage on a rolling basis on expiry of warranty
The rolling programmes imply the need to be able to support a heterogeneous
product solution
The design of the logical architecture thus supports storage sourced from multiple
storage vendors
31
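The case for buying just ahead of demand can be checked with simple arithmetic. The Python sketch below compares buying all capacity at today's price with buying each year's requirement at that year's price. The demand profile, the price unit and the 35% decline (the midpoint of the 30-40% quoted) are assumptions for illustration, and the model ignores warranty replacement, power and staffing.

```python
def unit_price(initial_price, annual_decline, year):
    """Price per unit of capacity after `year` years of steady decline."""
    return initial_price * (1 - annual_decline) ** year


def upfront_cost(demand_per_year, initial_price):
    """Cost of buying all the capacity in year 0 at the initial price."""
    return sum(demand_per_year) * initial_price


def rolling_cost(demand_per_year, initial_price, annual_decline):
    """Cost of buying each year's capacity at that year's price."""
    return sum(
        demand * unit_price(initial_price, annual_decline, year)
        for year, demand in enumerate(demand_per_year)
    )


if __name__ == "__main__":
    demand = [100, 100, 100, 100, 100]  # assumed TB needed in each of five years
    print(upfront_cost(demand, 1.0))        # price units for buying everything now
    print(rolling_cost(demand, 1.0, 0.35))  # price units for buying year by year
```

With these assumed figures, rolling procurement costs roughly half as much as buying all 500 TB at the start.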
Disaster tolerance and the organisation
of storage clusters
One can obtain commercial disaster recovery (DR) solutions for common equipment
configurations
However, one cannot obtain such solutions for multi-100 TB systems
So we must build DR into the design of the system
A single site solution, subject to a common-mode disaster, would suffer considerable loss
of availability after a disaster, and so is not acceptable
This implies that we need a multi-site solution
Conventionally these are based on a master-standby arrangement, where only 50% of the kit is delivering normal service
Our design is based on the use of multiple autonomous independent peer clusters that cross-synchronise, so 100% of the kit delivers normal service
32
DOM architecture in the context of the
storage solution market
The dominant segment of the market focuses on delivering performance within
a highly resilient single cluster
However:
Many of our objects will be rarely accessed
so we do not want to pay for “maximised” performance we do not need
We have resilience by using multiple clusters, hence we have a reduced
need for resilience within a cluster
so we do not want to pay for “maximised” resilience we do not need
We are using these drivers to design a cost-effective large scale resilient
solution
33
DOM storage subsystem architecture - overview
[Diagram: two storage clusters, each consisting of a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, together with DOM Shared Services:]
• Unique ID
• Signing
• Logging
34
DOM storage subsystem architecture - access
Normal access/delivery is from local storage cluster
[Diagram: two storage clusters, each with a DOM Storage gateway, DOM Storage Service and DOM Physical Storage, plus DOM Shared central Services (Unique ID, Signing, Logging).]
35
DOM storage subsystem architecture - access
When a cluster is off-line then access/delivery is from a remote storage cluster
[Diagram: two storage clusters, each with a DOM Storage gateway, DOM Storage Service and DOM Physical Storage, plus DOM Shared central Services (Unique ID, Signing, Logging).]
36
DOM storage subsystem architecture - ingest
Normal ingest is to the local storage cluster and then the remote cluster is synchronised
[Diagram: two storage clusters, each with a DOM Storage gateway, DOM Storage Service and DOM Physical Storage, plus DOM Shared central Services (Unique ID, Signing, Logging). Arrows: Store; Synchronise remote store.]
37
DOM storage subsystem architecture - ingest
When a cluster is off-line then ingest is managed by the remote storage cluster and the local cluster is synchronised later
[Diagram: two storage clusters, each with a DOM Storage gateway, DOM Storage Service and DOM Physical Storage, plus DOM Shared central Services (Unique ID, Signing, Logging). Arrows: Store; Synchronise remote store later.]
38
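The access and ingest behaviour described on the four preceding slides can be summarised in a small Python model of two peer clusters. It is a toy sketch under stated assumptions: the class, the function names and the backlog mechanism are invented for illustration, and the model ignores the gateways, the shared services and conflict handling.

```python
class Cluster:
    """One autonomous storage cluster in a two-site peer arrangement."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.objects = {}   # DOMID -> object bytes
        self.backlog = {}   # objects the peer missed while it was off-line


def ingest(domid, data, local, remote):
    """Store on the local cluster if it is up, otherwise on the remote one."""
    primary, secondary = (local, remote) if local.online else (remote, local)
    if not primary.online:
        raise RuntimeError("no storage cluster available")
    primary.objects[domid] = data
    if secondary.online:
        secondary.objects[domid] = data  # synchronise the peer straight away
    else:
        primary.backlog[domid] = data    # remember it for later synchronisation
    return primary.name


def access(domid, local, remote):
    """Deliver from the local cluster, falling back to the remote one."""
    for cluster in (local, remote):
        if cluster.online and domid in cluster.objects:
            return cluster.objects[domid]
    raise KeyError(domid)


def catch_up(recovered, peer):
    """Bring a recovered cluster back on-line and replay what it missed."""
    recovered.online = True
    recovered.objects.update(peer.backlog)
    peer.backlog.clear()
```

Both clusters serve requests in normal operation, which is the sense in which 100% of the kit delivers normal service.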
In conclusion
We plan for generations of physical storage
Migration from one generation to the next
Allow changes of supplier
Purchase incrementally in modest quantities
Move quickly when required
Be cost conscious
We provide assurance that an object is held and re-presented as it was when it was ingested
We are designing a cost-effective large scale resilient solution
In summary: we take a long term view
39
Major Programmes/4
Web Archiving Programme
40
Structure of Programme
Web Archiving Programme is a collaborative
initiative, roughly implemented across two
consortiums
UK Web Archiving Consortium
Developing a selective approach to web archiving, procuring a common web archiving infrastructure and software to begin archiving activities at the earliest opportunity
International Internet Preservation Consortium
Developing advanced web archiving technologies for the long-term, large-scale, continuous crawling requirements enabled through legislation
41
UK Web Archiving Consortium
Developing a selective approach to web archiving
Licence for PANDAS about to be signed with NLA
Sub-licences with consortium partners and contractor to follow
ITT concluded with Magus Research winning the contract.
Implement a common web archiving infrastructure (lots of Linux machines + PANDAS)
Provide customisation/development of PANDAS
Provide help desk and support
42
International Internet Preservation
Consortium
Developing advanced web archiving technologies
Smart Crawler
Continuous adaptive crawler, adjusting crawl priority on the fly
Based on IA Heritrix
Working on requirements now
Expect to begin tender process in June
Content Management
Archival formats
Framework
Metrics and Test Bed
43
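One way to picture "adjusting crawl priority on the fly" is a revisit queue whose intervals shrink for pages that change and grow for pages that do not. The Python sketch below is a generic illustration of that idea. It is not the Heritrix API or the consortium's design, and the class name, interval bounds and halving/doubling rule are assumptions.

```python
import heapq

MIN_INTERVAL = 1.0    # assumed lower bound on the revisit interval, in hours
MAX_INTERVAL = 168.0  # assumed upper bound (one week)


class AdaptiveFrontier:
    """Revisit queue that adapts each URL's interval to observed change."""

    def __init__(self):
        self._heap = []       # entries are (time_due, url)
        self._interval = {}   # url -> current revisit interval in hours

    def add(self, url, interval=24.0, now=0.0):
        self._interval[url] = interval
        heapq.heappush(self._heap, (now + interval, url))

    def pop_next(self):
        """Return (time_due, url) for the URL that is due soonest."""
        return heapq.heappop(self._heap)

    def interval(self, url):
        return self._interval[url]

    def report(self, url, changed, now):
        """Reschedule a crawled URL: sooner if it changed, later if not."""
        factor = 0.5 if changed else 2.0
        new_interval = self._interval[url] * factor
        new_interval = min(max(new_interval, MIN_INTERVAL), MAX_INTERVAL)
        self._interval[url] = new_interval
        heapq.heappush(self._heap, (now + new_interval, url))
```

A crawl loop would repeatedly call pop_next, fetch the page, compare it with the previous capture and call report with the result.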
External Collaboration
44
Digital Library Collaborations/Partnerships
Current
UK Digital Preservation Coalition
Founder Member
TEL (The European Library Project)
Web Archiving UK
JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library Of Wales
International Internet Preservation Consortium
BNF, Library Of Congress, Internet Archive, National Archives & Library Of Canada, National Library Of
Australia, National Library Of Italy, National Libraries Of Nordic Countries
JISC Funded - Digital Curation Centre
Persistent Identifiers
DOI Foundation, European National Libraries (KB & DDB)
Resource Discovery
Union Catalogues (SUNCAT)
Digital Library Federation
45
Digital Library Collaborations/Partnerships
Potential
Secure Legal Deposit Network
6 Legal Deposit Libraries
Global Digital Format Registry
Potential Partners (National Archives, DLF)
Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
KB (Netherlands National Library & Other Partners – FP6 Bid)
Digital Rights Management
Potential Partners (Publishers, JISC)
Metadata
Publishers, Others ?
Authentication
JISC ?
Resource Discovery
Search Engine Vendors, Researchers
Others ???
46
Conclusions
• Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success
• Can You Work With Us?
47
Slide 27
British Computer Society
North London Branch
Major Programmes
Richard Boulderstone
July 27, 2004
1
Agenda
•
•
•
•
•
•
•
•
•
The British Library
Vision
Our Audiences/Customers
ILS
Digitisation
Digital Object Management
Web Archiving
Collaboration
Conclusions
Magna Carta
www.bl.uk
2
What Is The British Library ?
Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of
Science and Invention (1855), National Central Library (1916), and National
Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India
Office Library and Records in 1982, and British Institute of Recorded Sound in
1983
• Flagship building at St Pancras - largest public building project in Great
Britain in 20th century - opened in 1998
•
www.bl.uk
3
World-Class Research Library
Key Statistics 2002/3
•
•
•
•
•
•
•
•
•
•
150 million items
8.2 million items consulted or supplied
408,000 reading room visits
618,000 catalogue records created
554,000 items received on legal deposit
651 km shelf capacity 92% full add 12 km each year
18.5M Web Site Hits (www.bl.uk)
2,400 staff
£85.2 million Grant in Aid and £27.0 million trading income in 2001/2
Annual report - http://www.bl.uk/about/annual/latest.html
www.bl.uk
4
‘The World’s Knowledge’
Outcome Based Vision
… by aiding scientific advances
… by adding commercial value for
businesses
… by contributing to UK “knowledge
economy”
… through the pursuit of academic
excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural
wellbeing
… by giving information relevant to their
interests
… by helping to find the next medical
breakthrough
… by creating a link between the past,
present and future
To help people
advance
knowledge to
enrich lives
www.bl.uk
5
High
R+D
Industries
Prof.
Services
Creative
Industries Publishing
Industries
Resource Discovery
Bespoke Services
Research Services
Document Supply
Reprographics
Innovation Centre
School
Students
Libraries Teachers 11>18
On-site Visits
School Tours
Web Learning
SMEs
EDUCATION
Lifelong
Learner
Exhibitions
Events
Tours
Publishing
Visitors
(child + adult)
Lifelong
Learner
PUBLIC
BUSINESS
Lifelong
Learner
Postgraduate/
Undergraduate
RESEARCHER
Librarians
Scholars
Lifelong
Learner
Reading Rooms
Bespoke Services
Reprographics
Publishing
Document Supply
Searching Tools
LIBRARIES
Commercial
Researcher
Public
Libraries
Broadcasting
e.g. BBC
Publishing
e.g. OED
Document Supply
Resource Discovery
Training
Best Practice
H.E.
Libraries
Public
www.bl.uk
6
2
Major Programmes/1
Da Vinci Notebook
Integrated Library System (ILS)
Programme
7
ILS: Development
Data migration
Due to finish in a few days
16M+ BL records
10M+ records from other sources
Online ILS software
All online changes made (mainly interfaces) – final tests
Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
Most ones done for go live
Rest in priority order
www.bl.uk
8
ILS: Implementation
Training
Courses to end-users well underway
‘Practice’ system available
‘Search only’ training also underway
Testing
Functional testing (end to end) nearly complete
Performance poor – OPAC very slow
Automated stress testing (LoadRunner scripts)
eIS trying to find area of problem
Ex Libris experts flying over
Some security ‘hardening’ needed
www.bl.uk
9
ILS: Cutover from legacy systems
Now:
Temporary Aleph cataloguing
Phase 1 – internal processing
Staggered take-on of users to ease cutover problems
Merge ‘temporary’ records
7 June:
Phase 2 – reading rooms
Reading rooms closed for cutover 26-29 June
Mainly brand-new PCs etc rather than XP upgrade
30 June:
Phase 3 – remote users
Could be delayed major problems
30 July:
www.bl.uk
10
www.bl.uk
11
Future ILS development (ILS/2)
Current ILS development seen just as the start
Extra records
E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
E.g. Preservation records
Links to other new BL systems
E.g. Digital Object Management (images, web pages etc)
New releases of Ex Libris packages
www.bl.uk
12
Major Programmes/2
International Dunhuang Project
Digitisation Programme
13
Background
Digitisation Is The Process Of Converting Existing Physical Items Into Digital
Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character
Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image to Sub £1/Image.
Automatic Devices To Digitise Are Becoming Availalble But Are Expensive ~£150,000.
BL Has Had Fairly Ad Hoc Approach Driven By
External Funding Opportunities
Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project
Management
BL Has Created About 1.5M Digital Images So Far…
www.bl.uk
14
Digitisation Strategy
Digitisation Strategy Project Was Formerly Initiated On February 2, 2004
Key objectives for the project are to define:
Selection Criteria
Uniform Approach
Communications Plan
Sustainability
Intellectual Property Rights
External Relationship Management
Funding
Integration with DOMS
www.bl.uk
15
Project Status Information
Definitive Register of Projects
19 Complete
19 Current
20 Planning
JISC Sound (3,900 Hours)
JISC Newspapers (2M Pages of 750M Pages)
Chopin (Collaborative Project)
Early English Books Online
www.bl.uk
16
Major Programmes/3
Gutenberg Bible
Digital Object Management (DOM)
Programme
17
DOM Programme vision
Our mission is to enable the United Kingdom to preserve
and use its digital intellectual property forever
Our vision is create a management system for digital objects
that will
store and preserve any type of digital material in perpetuity
provide access to this material to users with appropriate permissions
ensure that the material is easy to find
ensure that users can view the material with contemporary applications
ensure that users can, where possible, experience material with the original look-and-feel
www.bl.uk
18
Introduction - history
Digital Library PFI
Mar 1997 – Dec 1998
Digital Library System
1999 – early 2002
Lessons
DOM Report
Nov 2002
The DOM Programme
Started September 2003
19
Drivers for the BL DOM Programme
Legal deposit legislation for non-print material was granted royal assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12T of material per year (with 50 year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….
We need a generic and cost-effective approach for the secure long term
storage of digital material that is produced by numerous initiatives
20
DOM – many topics to address
[Chart: components plotted by ESTIMATED SIZE OF COMPONENT (LOW to HIGH) against COMPONENT AMBIGUITY / COMPLEXITY (LOW to HIGH)]
Components shown: ILS, WEB ARCHIVING, DIGITISATION PROGRAMME, LDEP, WORKFLOW, RESOURCE DISCOVERY, LDLSE, VDEP, RIGHTS MANAGEMENT, METADATA DEFINITION, TECHNICAL REQUIREMENTS, FILE FORMAT REGISTRY, PERSISTENT IDENTIFIERS, FILE CONVERSION UTILITIES, AUTHENTICATION, PROTOTYPES, SDM, RADM, INTERFACES, STRATEGY DEVELOPMENT
Legend: Started, Planned, Non-DOM projects, Planned co-operation
LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications
21
Scope - life cycle of objects
Collection
Selection
Acquisition
Accession
Description
Preservation
Storage
Preservation
Access
Resource discovery
Delivery
Rendering
22
Scope – objects and processes
Preservation store
Preserves the bit stream in perpetuity
Access store
Access versions
Limited formats – in the flavour of the era
Metadata to support resource discovery
Descriptive, Administrative, Links with existing tools e.g. Integrated
Library System (ILS)
Workflow
Ingest, e.g. Legal Deposit processing
23
ACCESS
DOM
Resource Discovery
Delivery
Digital Rights Management
Shared services
DOM
Storage
Signing
Authentication
Metadata
Persistent ID
Ingest
DONATIONS
DOCUMENT SUPPLY
Publishers
Archives
Grey Literature
Non-Serial
Store
Archiving
Operational
Stores
LEGAL DEPOSIT
LDL Secure
Environment
Legal Deposit
Items
Legal Deposit
Processing
WEB
ARCHIVING
DIGITISATION
St Pancras
Studios
Newspapers
NSA
24
Timeline
Prototype will provide a
basic preservation-quality digital object
storage module
Definition. R0
ET approve
Business Case
& Timeline
•Consolidate R0 into operational
system
•Provide preservation-quality digital
store for materials received under
Voluntary Deposit of Electronic
Publications (VDEP)
•Integrate it with the existing VDEP
front-end
BC
R0
Operat’l Storage
Sub System. R1
•Support ingest for a
major content stream
•Integrate with core
Library systems as
required
R1
1st Content
Stream ingest. R2
R2
LDEP - initial
format. R3 & R4
Provide functionality
for material covered by
LDEP secondary
legislation
R3
R4
Open DOM to new
projects. R5+
2003
2004
2005
25
DOM: Project definition - 1
Example issues
Functional Architecture “What”
digital rights, file
formats, etc
Prototyping – assessing market solutions
allow changes to
new suppliers,
relationships to ILS,
other projects etc
Logical architecture “how – overall architecture”
Prototyping - basic functioning architecture
how do we build it
cost-effectively
today, supplier
selection criteria
Physical architecture “how – storage & specifics”
Prototyping - principal solutions and options
• Business case
• Planning – incremental
implementation phases
Cross team workshops – reviewing progress, debating detailed technical
issues, planning immediate priorities, risk management & way forward
26
DOM: Project definition - 2
Approach is to be incremental and not ‘Big Bang’
We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a
good solution
A principal goal is to define:
An overall long term “logical architecture”
Within which, there will be successive generations of physical architectures
We are building an understanding of the storage marketplace, and we will use that knowledge to manage
procurement
We are certain that we will need >500T of storage but we are uncertain when – we thus need
flexible scalable procurement
27
DOM architecture - overview
Resource Discovery
Non-cat
ILS
based RD
Others
Rights Management
LDEP
DOMID
OBJECT
DOM Storage Service
Compound
objects/relations
Integrity
Authenticity
Local resource locator
Object
Atomic
Objects
Unique persistent
identifier (DOMID)
Doc
supply
DOMID is
mapped to
node/vol/LRL
DOM Physical Storage
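The mapping from a persistent DOMID to a node/vol/LRL location can be pictured with a small lookup table. This is only an illustrative sketch: the class names, field names and identifier strings below are assumptions, not taken from the DOM design. It shows the one property the slide implies, namely that the DOMID stays fixed while the physical location behind it can change.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Location:
    """One physical copy of an object (hypothetical field names)."""
    node: str    # storage node holding the copy
    volume: str  # volume on that node
    lrl: str     # local resource locator within the volume


class DomIdResolver:
    """Maps a persistent DOMID to the current location(s) of the object."""

    def __init__(self) -> None:
        self._table: Dict[str, List[Location]] = {}

    def register(self, domid: str, location: Location) -> None:
        # Record an additional copy of the object.
        self._table.setdefault(domid, []).append(location)

    def resolve(self, domid: str) -> List[Location]:
        # Return a copy of the list so callers cannot edit the table.
        return list(self._table[domid])

    def relocate(self, domid: str, old: Location, new: Location) -> None:
        # The DOMID is unchanged; only the physical mapping is updated,
        # for example when storage is replaced by a newer generation.
        copies = self._table[domid]
        copies[copies.index(old)] = new


resolver = DomIdResolver()
first = Location("node-a", "vol-1", "obj/0001")
resolver.register("domid-0001", first)
resolver.relocate("domid-0001", first, Location("node-b", "vol-7", "obj/0001"))
print(resolver.resolve("domid-0001"))
```

Users and catalogue records would hold only the DOMID, so a change of supplier or storage generation would touch the table and not the references.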
28
DOM System (release 3)
Aleph
Storage subsystem
Storage
subsystem
Access
Mailroom
Publishers
Shared services
Administration
DOM System
29
DOM logical architecture –
integrity and authenticity
Integrity:
System has capability to continuously monitor the object store to detect
object corruption
It would then initiate object recovery
Authenticity:
A process is defined to provide long-term assurance that an object that
is re-presented is as it was when it was ingested
Based on the use of cryptographic signing techniques
Each object is signed when it is ingested
The signature is verified when required
The signing mechanism is “tightly” controlled
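The integrity and authenticity points above can be sketched in a few lines. The slides do not say which signing technique is used, so this example makes two assumptions: a SHA-256 digest stands in for the integrity check, and an HMAC keyed by a secret stands in for the signing mechanism (a real deployment might well use public-key signatures instead). All function names and the key are hypothetical.

```python
import hashlib
import hmac
from typing import Dict, List


def digest(data: bytes) -> str:
    """Fixity value recorded at ingest and re-checked by the monitor."""
    return hashlib.sha256(data).hexdigest()


def sign(data: bytes, key: bytes) -> str:
    """Signature produced when the object is ingested (HMAC as a stand-in)."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify(data: bytes, key: bytes, signature: str) -> bool:
    """True if the object is bit-for-bit what was signed at ingest."""
    return hmac.compare_digest(sign(data, key), signature)


def find_corrupt(store: Dict[str, bytes], digests: Dict[str, str]) -> List[str]:
    """One monitoring pass: list object ids whose bytes no longer match."""
    return [oid for oid, data in store.items() if digest(data) != digests[oid]]


SIGNING_KEY = b"held-only-by-the-signing-service"  # hypothetical key

store = {"obj-1": b"digitised master", "obj-2": b"deposited e-journal"}
digests = {oid: digest(data) for oid, data in store.items()}
signatures = {oid: sign(data, SIGNING_KEY) for oid, data in store.items()}

store["obj-2"] = b"deposited e-journaX"  # simulate silent corruption
for oid in find_corrupt(store, digests):
    print("recover from another copy:", oid)
print("obj-1 authentic:", verify(store["obj-1"], SIGNING_KEY, signatures["obj-1"]))
```

Keeping the key inside a single signing service is one way to read "tightly controlled": the storage clusters can verify objects on request but cannot mint new signatures themselves if verification is also routed through that service.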
30
Procuring physical storage in volume
A major cost is in physical storage
The market for storage systems is changing rapidly, and this implies that “lock-in” is
not sensible
We thus need flexibility to change supplier over time
Cost of storage is reducing by 30-40% per year
Hence procure on rolling basis just ahead of demand
Replace storage on a rolling basis on expiry of warranty
The rolling programmes imply the need to be able to support a heterogeneous
product solution
The design of the logical architecture thus supports storage sourced from multiple
storage vendors
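The case for buying just ahead of demand follows from simple arithmetic. The sketch below compares buying all capacity on day one with buying equal tranches each year while the unit price falls. The 500 TB total and the 30-40% annual decline come from the slides; the starting price per TB, the five-year horizon and the even spread of purchases are assumptions made purely for illustration, and replacement on warranty expiry is ignored.

```python
def upfront_cost(total_tb: float, price_per_tb: float) -> float:
    """Cost of buying the whole capacity at today's price."""
    return total_tb * price_per_tb


def rolling_cost(total_tb: float, price_per_tb: float,
                 years: int, annual_decline: float) -> float:
    """Cost of buying equal yearly tranches as the unit price declines."""
    per_year = total_tb / years
    return sum(per_year * price_per_tb * (1 - annual_decline) ** year
               for year in range(years))


TOTAL_TB = 500
PRICE_PER_TB = 1000.0  # hypothetical starting price

up_front = upfront_cost(TOTAL_TB, PRICE_PER_TB)
rolling = rolling_cost(TOTAL_TB, PRICE_PER_TB, years=5, annual_decline=0.35)
print(f"up front: {up_front:,.0f}  rolling: {rolling:,.0f}")
```

With these assumed figures the rolling purchase costs roughly half as much as the up-front purchase, which is why the logical architecture has to tolerate a mix of storage products bought in different years.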
31
Disaster tolerance and the organisation
of storage clusters
One can obtain commercial disaster recovery (DR) solutions for common equipment
configurations
However, one cannot obtain such solutions for multi-100 TB systems
So we must build DR into the design of the system
A single site solution, subject to a common-mode disaster, would suffer considerable loss
of availability after a disaster, and so is not acceptable
This implies that we need a multi-site solution
Conventionally these are based on a master-standby where only 50% of kit is delivering
normal service
Our design is based on the use of multiple autonomous independent peer clusters that
cross-synchronise
so 100% of the kit delivers normal service
32
DOM architecture in the context of the
storage solution market
The dominant segment of the market focuses on delivering performance within
a highly resilient single cluster
However:
Many of our objects will be rarely accessed
so we do not want to pay for “maximised” performance we do not need
We have resilience by using multiple clusters, hence we have a reduced
need for resilience within a cluster
so we do not want to pay for “maximised” resilience we do not need
We are using these drivers to design a cost-effective large scale resilient
solution
33
DOM storage subsystem architecture - overview
DOM
Storage
gateway
DOM
Storage
gateway
DOM
Storage
Service
DOM
Storage
Service
DOM
Physical
Storage
Storage cluster
DOM
Shared
Services
• Unique ID
• Signing
• Logging
DOM
Physical
Storage
Storage cluster
34
DOM storage subsystem architecture - access
DOM
Storage
gateway
Normal access/delivery
is from local storage
cluster
DOM
Storage
Service
DOM
Physical
Storage
Storage cluster
DOM
Storage
gateway
DOM
Storage
Service
DOM
Shared
Services
• Unique ID
• Signing
• Logging
DOM
Physical
Storage
Storage cluster
35
DOM storage subsystem architecture - access
DOM
Storage
gateway
When a cluster is off-line
then access/delivery is
from a remote storage
cluster
DOM
Storage
Service
DOM
Physical
Storage
Storage cluster
DOM
Storage
gateway
DOM
Storage
Service
DOM
Shared
Services
• Unique ID
• Signing
• Logging
DOM
Physical
Storage
Storage cluster
36
DOM storage subsystem architecture - ingest
DOM
Storage
gateway
DOM
Storage
Service
Normal ingest is to the
local storage cluster and
then the remote cluster
is synchronised
Synchronise remote store
Store
DOM
Physical
Storage
Storage cluster
DOM
Shared
Services
• Unique ID
• Signing
• Logging
DOM
Storage
gateway
DOM
Storage
Service
DOM
Physical
Storage
Storage cluster
37
DOM storage subsystem architecture - ingest
DOM
Storage
gateway
DOM
Storage
Service
When a cluster is off-line
then ingest is managed by
the remote storage cluster
and the local cluster is
synchronised later
Synchronise remote store later
Store
DOM
Physical
Storage
Storage cluster
DOM
Shared
Services
• Unique ID
• Signing
• Logging
DOM
Storage
gateway
DOM
Storage
Service
DOM
Physical
Storage
Storage cluster
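The four access and ingest diagrams describe one behaviour: use the local cluster when it is on-line, fall back to the remote peer when it is not, and catch up afterwards. A minimal sketch of that behaviour follows, assuming just two clusters and an in-memory dictionary as the object store. The class and method names are invented for the example and are not the DOM interfaces.

```python
from typing import Dict, List, Tuple


class Cluster:
    """One autonomous storage cluster (in-memory stand-in)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.online = True
        self.objects: Dict[str, bytes] = {}
        self.pending: List[str] = []  # ids the peer missed while off-line


class Gateway:
    """Storage gateway that prefers its local cluster."""

    def __init__(self, local: Cluster, remote: Cluster) -> None:
        self.local = local
        self.remote = remote

    def _pick(self) -> Tuple[Cluster, Cluster]:
        # Return (cluster to use now, its peer).
        if self.local.online:
            return self.local, self.remote
        if self.remote.online:
            return self.remote, self.local
        raise RuntimeError("no storage cluster is on-line")

    def ingest(self, domid: str, data: bytes) -> str:
        primary, peer = self._pick()
        primary.objects[domid] = data
        if peer.online:
            peer.objects[domid] = data      # synchronise the peer straight away
        else:
            primary.pending.append(domid)   # synchronise the peer later
        return primary.name

    def fetch(self, domid: str) -> bytes:
        primary, _ = self._pick()
        return primary.objects[domid]

    def resynchronise(self) -> None:
        # Push anything a cluster missed while it was off-line.
        for src, dst in ((self.local, self.remote), (self.remote, self.local)):
            if src.online and dst.online:
                for domid in src.pending:
                    dst.objects[domid] = src.objects[domid]
                src.pending.clear()


site_a, site_b = Cluster("site-a"), Cluster("site-b")
gateway = Gateway(local=site_a, remote=site_b)
gateway.ingest("domid-1", b"first object")   # stored locally, peer synchronised
site_a.online = False
gateway.ingest("domid-2", b"second object")  # handled by the remote cluster
site_a.online = True
gateway.resynchronise()                      # local cluster catches up
print(sorted(site_a.objects))
```

Both clusters serve requests in normal running, which is the contrast with a master-standby pair drawn on the disaster tolerance slide. The sketch leaves out what a real design must handle, such as reads arriving at a cluster that has come back on-line but has not yet caught up.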
38
In conclusion
We plan for generations of physical storage
Migration from one generation to the next
Allow changes of supplier
Purchase incrementally in modest quantities
Move quickly when required
Be cost conscious
We provide assurance that an object is held and re-presented as when it was
ingested
We are designing a cost-effective large scale resilient solution
In summary: we take a long term view
39
Major Programmes/4
Web Archiving Programme
40
Structure of Programme
Web Archiving Programme is a collaborative
initiative, roughly implemented across two
consortiums
UK Web Archiving Consortium
Developing a selective approach to web archiving, procuring a common
web archiving infrastructure and software to begin archiving activities at
the earliest opportunity
International Internet Preservation Consortium
Developing advanced web archiving technologies for the long-term,
large scale, continuous crawling requirements enabled through
legislation
41
UK Web Archiving Consortium
Developing a selective approach to web archiving
Licence for PANDAS about to be signed with NLA
Sub-licences with consortium partners and contractor to follow
ITT concluded with Magus Research winning the contract.
Implement a common web archiving infrastructure (lots of Linux
machines + PANDAS)
Provide customisation/development of PANDAS
Provide help desk and support
42
International Internet Preservation
Consortium
Developing advanced web archiving technologies
Smart Crawler
Continuous adaptive crawler, adjusting crawl priority on the fly
Based on IA Heritrix
Working on requirements now
Expect to begin tender process in June
Content Management
Archival formats
Framework
Metrics and Test Bed
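"Adjusting crawl priority on the fly" can be illustrated with a priority queue whose entries may be re-scored after they are queued. The sketch below is a generic technique and makes no claim about how Heritrix or the planned Smart Crawler works; the class name, the example URLs and the convention that a lower number means "crawl sooner" are all assumptions.

```python
import heapq
import itertools
from typing import Dict, List, Optional, Tuple


class CrawlFrontier:
    """Queue of URLs whose priority can change while they wait."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, str]] = []
        self._current: Dict[str, float] = {}  # url -> latest priority
        self._counter = itertools.count()     # tie-breaker keeps FIFO order

    def schedule(self, url: str, priority: float) -> None:
        # Scheduling a URL again simply records a newer priority; the
        # older heap entry is left in place and skipped when popped.
        self._current[url] = priority
        heapq.heappush(self._heap, (priority, next(self._counter), url))

    def next_url(self) -> Optional[str]:
        while self._heap:
            priority, _, url = heapq.heappop(self._heap)
            if self._current.get(url) == priority:
                del self._current[url]
                return url
        return None


frontier = CrawlFrontier()
frontier.schedule("http://example.org/news", 5.0)
frontier.schedule("http://example.org/archive", 3.0)
# The news page is seen to change often, so it is promoted.
frontier.schedule("http://example.org/news", 1.0)
print(frontier.next_url())
print(frontier.next_url())
print(frontier.next_url())
```

A continuous crawler would feed observations such as how often a page changes back into `schedule`, so pages that matter more are revisited sooner without rebuilding the queue.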
43
External Collaboration
44
Digital Library Collaborations/Partnerships
Current
UK Digital Preservation Coalition
Founder Member
TEL (The European Library Project)
Web Archiving UK
JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library Of Wales
International Internet Preservation Consortium
BNF, Library Of Congress, Internet Archive, National Archives & Library Of Canada, National Library Of
Australia, National Library Of Italy, National Libraries Of Nordic Countries
JISC Funded - Digital Curation Centre
Persistent Identifiers
DOI Foundation, European National Libraries (KB & DDB)
Resource Discovery
Union Catalogues (SUNCAT)
Digital Library Federation
45
Digital Library Collaborations/Partnerships
Potential
Secure Legal Deposit Network
6 Legal Deposit Libraries
Global Digital Format Registry
Potential Partners (National Archives, DLF)
Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
KB (Netherlands National Library & Other Partners – FP6 Bid)
Digital Rights Management
Potential Partners (Publishers, JISC)
Metadata
Publishers, Others ?
Authentication
JISC ?
Resource Discovery
Search Engine Vendors, Researchers
Others ???
46
Conclusions
• Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success
• Can You Work With Us?
47
Slide 29
British Computer Society
North London Branch
Major Programmes
Richard Boulderstone
July 27, 2004
1
Agenda
•
•
•
•
•
•
•
•
•
The British Library
Vision
Our Audiences/Customers
ILS
Digitisation
Digital Object Management
Web Archiving
Collaboration
Conclusions
Magna Carta
www.bl.uk
2
What Is The British Library ?
Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of
Science and Invention (1855), National Central Library (1916), and National
Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India
Office Library and Records in 1982, and British Institute of Recorded Sound in
1983
• Flagship building at St Pancras - largest public building project in Great
Britain in 20th century - opened in 1998
•
www.bl.uk
3
World-Class Research Library
Key Statistics 2002/3
•
•
•
•
•
•
•
•
•
•
150 million items
8.2 million items consulted or supplied
408,000 reading room visits
618,000 catalogue records created
554,000 items received on legal deposit
651 km shelf capacity 92% full add 12 km each year
18.5M Web Site Hits (www.bl.uk)
2,400 staff
£85.2 million Grant in Aid and £27.0 million trading income in 2001/2
Annual report - http://www.bl.uk/about/annual/latest.html
www.bl.uk
4
‘The World’s Knowledge’
Outcome Based Vision
To help people advance knowledge to enrich lives
… by aiding scientific advances
… by adding commercial value for businesses
… by contributing to UK “knowledge economy”
… through the pursuit of academic excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural wellbeing
… by giving information relevant to their interests
… by helping to find the next medical breakthrough
… by creating a link between the past, present and future
5
[Diagram: Our audiences/customers and the services offered to them, in five segments: BUSINESS, EDUCATION, PUBLIC, RESEARCHER, LIBRARIES]
Audiences shown: High R+D Industries, Prof. Services, Creative Industries, Publishing Industries, SMEs, School Libraries, Teachers, Students 11>18, Visitors (child + adult), Postgraduate/Undergraduate, Scholars, Librarians, Commercial Researcher, Public Libraries, H.E. Libraries, Broadcasting (e.g. BBC), Publishing (e.g. OED), Lifelong Learner (shown in several segments)
Services shown: Resource Discovery, Bespoke Services, Research Services, Document Supply, Reprographics, Innovation Centre, On-site Visits, School Tours, Web Learning, Exhibitions, Events, Tours, Publishing, Reading Rooms, Searching Tools, Training, Best Practice
6
Major Programmes/1
Da Vinci Notebook
Integrated Library System (ILS) Programme
7
ILS: Development
Data migration
Due to finish in a few days
16M+ BL records
10M+ records from other sources
Online ILS software
All online changes made (mainly interfaces) – final tests
Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
Most done for go-live
Rest to follow in priority order
8
ILS: Implementation
Training
Courses to end-users well underway
‘Practice’ system available
‘Search only’ training also underway
Testing
Functional testing (end to end) nearly complete
Performance poor – OPAC very slow
Automated stress testing (LoadRunner scripts)
eIS trying to find area of problem
Ex Libris experts flying over
Some security ‘hardening’ needed
9
ILS: Cutover from legacy systems
Now: Temporary Aleph cataloguing
7 June: Phase 1 – internal processing
Staggered take-on of users to ease cutover problems
Merge ‘temporary’ records
30 June: Phase 2 – reading rooms
Reading rooms closed for cutover 26-29 June
Mainly brand-new PCs etc rather than XP upgrade
30 July: Phase 3 – remote users
Could be delayed if there are major problems
10
11
Future ILS development (ILS/2)
Current ILS development seen just as the start
Extra records
E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
E.g. Preservation records
Links to other new BL systems
E.g. Digital Object Management (images, web pages etc)
New releases of Ex Libris packages
12
Major Programmes/2
International Dunhuang Project
Digitisation Programme
13
Background
Digitisation Is The Process Of Converting Existing Physical Items Into Digital Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image to Sub £1/Image.
Automatic Devices To Digitise Are Becoming Available But Are Expensive (~£150,000).
BL Has Had A Fairly Ad Hoc Approach Driven By
External Funding Opportunities
Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project Management
BL Has Created About 1.5M Digital Images So Far…
14
Digitisation Strategy
Digitisation Strategy Project Was Formally Initiated On February 2, 2004
Key objectives for the project are to define:
Selection Criteria
Uniform Approach
Communications Plan
Sustainability
Intellectual Property Rights
External Relationship Management
Funding
Integration with DOMS
15
Project Status Information
Definitive Register of Projects
19 Complete
19 Current
20 Planning
JISC Sound (3,900 Hours)
JISC Newspapers (2M Pages of 750M Pages)
Chopin (Collaborative Project)
Early English Books Online
16
Major Programmes/3
Gutenberg Bible
Digital Object Management (DOM) Programme
17
DOM Programme vision
Our mission is to enable the United Kingdom to preserve and use its digital intellectual property forever
Our vision is to create a management system for digital objects that will
store and preserve any type of digital material in perpetuity
provide access to this material to users with appropriate permissions
ensure that the material is easy to find
ensure that users can view the material with contemporary applications
ensure that users can, where possible, experience material with the original look-and-feel
18
Introduction - history
Digital Library PFI
Mar 1997 – Dec 1998
Digital Library System
1999 – early 2002
Lessons
DOM Report
Nov 2002
The DOM Programme
Started September 2003
19
Drivers for the BL DOM Programme
Legal deposit legislation for non-print material was granted royal assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12T of material per year (with 50 year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….
We need a generic and cost-effective approach for the secure long term
storage of digital material that is produced by numerous initiatives
20
DOM – many topics to address
[Chart: DOM components plotted by estimated size of component (low to high) against component ambiguity/complexity (low to high). Legend: Started, Planned, Non-DOM projects, Planned co-operation]
Components shown: LDEP, Workflow, Resource Discovery, LDLSE, VDEP, Rights Management, Metadata Definition, Technical Requirements, File Format Registry, Persistent Identifiers, File Conversion Utilities, Authentication, Prototypes, SDM, RADM, Interfaces, Strategy Development
Also shown: ILS, Web Archiving, Digitisation Programme
LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications
21
Scope - life cycle of objects
Collection
Selection
Acquisition
Accession
Description
Preservation
Storage
Preservation
Access
Resource discovery
Delivery
Rendering
22
Scope – objects and processes
Preservation store
Preserves the bit stream in perpetuity
Access store
Access versions
Limited formats – in the flavour of the era
Metadata to support resource discovery
Descriptive, Administrative, Links with existing tools e.g. Integrated
Library System (ILS)
Workflow
Ingest, e.g. Legal Deposit processing
23
[Diagram: the DOM system in context, with access services above, shared services in the centre and ingest sources below]
Access: Resource Discovery, Delivery, Digital Rights Management
Shared services: Storage, Signing, Authentication, Metadata, Persistent ID
Ingest sources and related stores: Donations, Document Supply, Publishers, Archives, Grey Literature, Non-Serial Store, Archiving, Operational Stores, Legal Deposit (LDL Secure Environment, Legal Deposit Items, Legal Deposit Processing), Web Archiving, Digitisation (St Pancras Studios, Newspapers, NSA)
24
Timeline
[Timeline chart covering 2003, 2004 and 2005]
BC: ET approve Business Case & Timeline
R0 (Definition): Prototype will provide a basic preservation-quality digital object storage module
R1 (Operational Storage Sub System):
• Consolidate R0 into operational system
• Provide preservation-quality digital store for materials received under Voluntary Deposit of Electronic Publications (VDEP)
• Integrate it with the existing VDEP front-end
R2 (1st Content Stream ingest):
• Support ingest for a major content stream
• Integrate with core Library systems as required
R3 & R4 (LDEP - initial format): Provide functionality for material covered by LDEP secondary legislation
R5+: Open DOM to new projects
25
DOM: Project definition - 1
Functional Architecture “What”
Example issues: digital rights, file formats, etc
Prototyping – assessing market solutions
Logical architecture “how – overall architecture”
Example issues: allow changes to new suppliers, relationships to ILS, other projects etc
Prototyping - basic functioning architecture
Physical architecture “how – storage & specifics”
Example issues: how do we build it cost-effectively today, supplier selection criteria
Prototyping - principal solutions and options
• Business case
• Planning – incremental implementation phases
Cross team workshops – reviewing progress, debating detailed technical issues, planning immediate priorities, risk management & way forward
26
DOM: Project definition - 2
Approach is to be incremental and not ‘Big Bang’
We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a good solution
A principal goal is to define:
An overall long term “logical architecture”
Within which, there will be successive generations of physical architectures
We are building an understanding of the storage marketplace, and we will use that knowledge to manage procurement
We are certain that we will need >500T of storage but we are uncertain when – we thus need flexible scalable procurement
27
DOM architecture - overview
[Diagram: DOM architecture overview in three layers]
Clients: Resource Discovery (ILS based RD, Non-cat, Others), Rights Management, LDEP, Doc supply; clients refer to an OBJECT by its DOMID
DOM Storage Service: Compound objects/relations, Integrity, Authenticity, Atomic Objects, Unique persistent identifier (DOMID), Local resource locator
DOMID is mapped to node/vol/LRL
DOM Physical Storage
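The mapping from a persistent DOMID to a node/vol/LRL location can be illustrated with a small lookup table. This is a sketch under stated assumptions only: the class names, the use of a random UUID as the identifier, and the `relocate` operation are all hypothetical and are not taken from the DOM design.

```python
# Illustrative sketch: a persistent identifier that survives storage moves.
# All names here (Location, DomIdResolver, relocate) are hypothetical.
import uuid
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Location:
    node: str    # storage node holding the object
    volume: str  # volume on that node
    lrl: str     # local resource locator within the volume


class DomIdResolver:
    """Maps a persistent DOMID to one or more physical locations."""

    def __init__(self) -> None:
        self._locations: Dict[str, List[Location]] = {}

    def register(self, location: Location) -> str:
        """Assign a new DOMID to an object stored at `location`."""
        dom_id = uuid.uuid4().hex  # assumption: any unique string would do
        self._locations[dom_id] = [location]
        return dom_id

    def add_copy(self, dom_id: str, location: Location) -> None:
        """Record a further copy, for example on a second cluster."""
        self._locations[dom_id].append(location)

    def relocate(self, dom_id: str, old: Location, new: Location) -> None:
        """Point the DOMID at new storage; the identifier itself never changes."""
        copies = self._locations[dom_id]
        copies[copies.index(old)] = new

    def resolve(self, dom_id: str) -> List[Location]:
        """Return every known physical location for the object."""
        return list(self._locations[dom_id])
```

Keeping the identifier separate from the location is what would let physical storage be replaced generation by generation without changing any reference held by the ILS or other clients.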
28
DOM System (release 3)
[Diagram: the DOM System at release 3. Components shown: Aleph, Access, Mailroom, Publishers, two Storage subsystems, Shared services, Administration]
29
DOM logical architecture – integrity and authenticity
Integrity:
System has capability to continuously monitor the object store to detect
object corruption
It would then initiate object recovery
Authenticity:
A process is defined to provide long-term assurance that an object that
is re-presented is as it was when it was ingested
Based on the use of cryptographic signing techniques
Each object is signed when it is ingested
The signature is verified when required
The signing mechanism is “tightly” controlled
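The sign-at-ingest, verify-on-demand and continuous-monitoring steps above can be sketched as follows. This is a minimal illustration, not the DOM implementation: the slide does not name a signing scheme, so HMAC-SHA256 with a single secret key is used here purely as a stand-in, and the class and method names are hypothetical.

```python
# Sketch of sign-at-ingest and verify-on-demand, using HMAC-SHA256 as a
# stand-in for the unspecified cryptographic signing technique.
import hashlib
import hmac
from typing import Dict, List


class SignedObjectStore:
    def __init__(self, signing_key: bytes) -> None:
        # The key stands for the "tightly" controlled signing mechanism.
        self._key = signing_key
        self._objects: Dict[str, bytes] = {}
        self._signatures: Dict[str, str] = {}

    def _sign(self, data: bytes) -> str:
        return hmac.new(self._key, data, hashlib.sha256).hexdigest()

    def ingest(self, object_id: str, data: bytes) -> None:
        """Store the object and record its signature at ingest time."""
        self._objects[object_id] = data
        self._signatures[object_id] = self._sign(data)

    def verify(self, object_id: str) -> bool:
        """Check that the stored bytes still match the ingest signature."""
        current = self._sign(self._objects[object_id])
        return hmac.compare_digest(current, self._signatures[object_id])

    def scan(self) -> List[str]:
        """Integrity sweep: return the ids of objects that fail verification."""
        return [oid for oid in self._objects if not self.verify(oid)]
```

In this sketch a sweep such as `scan()` would run continuously, and any id it returns would trigger recovery of the object from another copy.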
30
Procuring physical storage in volume
A major cost is in physical storage
The market for storage systems is changing rapidly, and this implies that “lock-in” is
not sensible
We thus need flexibility to change supplier over time
Cost of storage is reducing by 30-40% per year
Hence procure on rolling basis just ahead of demand
Replace storage on a rolling basis on expiry of warranty
The rolling programmes imply the need to be able to support a heterogeneous
product solution
The design of the logical architecture thus supports storage sourced from multiple
storage vendors
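The case for buying storage just ahead of demand follows from simple arithmetic on a falling unit price. The sketch below compares buying everything up front with buying one increment per year. The starting price, the 35% annual decline (the midpoint of the 30-40% quoted above) and the five equal 100 TB increments are illustrative assumptions, not figures from the programme, and warranty-driven replacement is not modelled.

```python
# Worked comparison: up-front purchase versus rolling purchase of storage.
# All figures passed to these functions below are illustrative assumptions.
from typing import Sequence


def upfront_cost(total_tb: float, price_per_tb: float) -> float:
    """Cost of buying all capacity in year 0 at today's price."""
    return total_tb * price_per_tb


def rolling_cost(yearly_tb: Sequence[float], price_per_tb: float,
                 annual_decline: float) -> float:
    """Cost of buying each year's increment at that year's (lower) price."""
    total = 0.0
    price = price_per_tb
    for tb in yearly_tb:
        total += tb * price
        price *= 1.0 - annual_decline  # price falls before the next purchase
    return total


if __name__ == "__main__":
    increments = [100.0] * 5  # assumed: 500 TB needed over five years
    start_price = 1000.0      # assumed price per TB in arbitrary currency units
    decline = 0.35            # assumed midpoint of the quoted 30-40% per year

    up = upfront_cost(sum(increments), start_price)
    roll = rolling_cost(increments, start_price, decline)
    print(f"Up-front: {up:,.0f}")
    print(f"Rolling:  {roll:,.0f}")
    print(f"Saving:   {100 * (1 - roll / up):.0f}%")
```

Under these assumptions the rolling purchase costs roughly half as much, which is why the uncertainty about when capacity will be needed matters less than the ability to buy in increments.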
31
Disaster tolerance and the organisation of storage clusters
One can obtain commercial disaster recovery (DR) solutions for common equipment configurations
However, one cannot obtain such solutions for multi-100 TB systems
So we must build DR into the design of the system
A single-site solution, subject to a common-mode disaster, would suffer considerable loss of availability after a disaster, and so is not acceptable
This implies that we need a multi-site solution
Conventionally these are based on a master-standby arrangement where only 50% of kit is delivering normal service
Our design is based on the use of multiple autonomous independent peer clusters that cross-synchronise, so 100% of the kit delivers normal service
32
DOM architecture in the context of the storage solution market
The dominant segment of the market focuses on delivering performance within
a highly resilient single cluster
However:
Many of our objects will be rarely accessed
so we do not want to pay for “maximised” performance we do not need
We have resilience by using multiple clusters, hence we have a reduced
need for resilience within a cluster
so we do not want to pay for “maximised” resilience we do not need
We are using these drivers to design a cost-effective large scale resilient
solution
33
DOM storage subsystem architecture - overview
[Diagram: two storage clusters. Each cluster has its own DOM Storage gateway, DOM Storage Service and DOM Physical Storage. Both clusters use the DOM Shared Services]
DOM Shared Services:
• Unique ID
• Signing
• Logging
34
DOM storage subsystem architecture - access
Normal access/delivery is from local storage cluster
[Diagram: two storage clusters, each with a DOM Storage gateway, DOM Storage Service and DOM Physical Storage, plus the DOM Shared central Services (Unique ID, Signing, Logging). The access path runs through the local cluster]
35
DOM storage subsystem architecture - access
When a cluster is off-line then access/delivery is from a remote storage cluster
[Diagram: two storage clusters, each with a DOM Storage gateway, DOM Storage Service and DOM Physical Storage, plus the DOM Shared central Services (Unique ID, Signing, Logging). The access path runs through the remote cluster]
36
DOM storage subsystem architecture - ingest
Normal ingest is to the local storage cluster and then the remote cluster is synchronised
[Diagram: two storage clusters, each with a DOM Storage gateway, DOM Storage Service and DOM Physical Storage, plus the DOM Shared central Services (Unique ID, Signing, Logging). Arrows labelled "Store" and "Synchronise remote store"]
37
DOM storage subsystem architecture - ingest
When a cluster is off-line then ingest is managed by the remote storage cluster and the local cluster is synchronised later
[Diagram: two storage clusters, each with a DOM Storage gateway, DOM Storage Service and DOM Physical Storage, plus the DOM Shared central Services (Unique ID, Signing, Logging). Arrows labelled "Store" and "Synchronise remote store later"]
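The access and ingest rules on the last four slides (use the local cluster when it is on-line, fall back to the remote cluster when it is not, and synchronise afterwards) can be sketched with two peer clusters. This is an illustration under assumptions: the class, the function names and the in-memory backlog are hypothetical, and real synchronisation would be considerably more involved.

```python
# Sketch of two peer storage clusters with failover for ingest and access.
# StorageCluster, ingest, synchronise and read are hypothetical names.
from typing import Dict, List


class StorageCluster:
    def __init__(self, name: str) -> None:
        self.name = name
        self.online = True
        self.objects: Dict[str, bytes] = {}
        self.backlog: List[str] = []  # ids not yet copied to the peer


def ingest(local: StorageCluster, remote: StorageCluster,
           object_id: str, data: bytes) -> str:
    """Store on the local cluster if it is on-line, otherwise on the remote one.

    Returns the name of the cluster that accepted the object.
    """
    primary, secondary = (local, remote) if local.online else (remote, local)
    if not primary.online:
        raise RuntimeError("no storage cluster is on-line")
    primary.objects[object_id] = data
    if secondary.online:
        secondary.objects[object_id] = data   # synchronise straight away
    else:
        primary.backlog.append(object_id)     # synchronise later
    return primary.name


def synchronise(source: StorageCluster, target: StorageCluster) -> int:
    """Copy the source's backlog to the target once the target is back on-line."""
    if not target.online:
        return 0
    copied = 0
    for object_id in source.backlog:
        target.objects[object_id] = source.objects[object_id]
        copied += 1
    source.backlog.clear()
    return copied


def read(local: StorageCluster, remote: StorageCluster, object_id: str) -> bytes:
    """Deliver from the local cluster, or from the remote one if local is off-line."""
    cluster = local if local.online else remote
    if not cluster.online:
        raise RuntimeError("no storage cluster is on-line")
    return cluster.objects[object_id]
```

Because either cluster can accept ingest and serve access, neither sits idle as a standby, which matches the peer-cluster argument made earlier in the deck.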
38
In conclusion
We plan for generations of physical storage
Migration from one generation to the next
Allow changes of supplier
Purchase incrementally in modest quantities
Move quickly when required
Be cost conscious
We provide assurance that an object is held and re-presented as when it was
ingested
We are designing a cost-effective large scale resilient solution
In summary: we take a long term view
39
Major Programmes/4
Web Archiving Programme
40
Structure of Programme
Web Archiving Programme is a collaborative initiative, roughly implemented across two consortia
UK Web Archiving Consortium
Developing a selective approach to web archiving, procuring a common web archiving infrastructure and software to begin archiving activities as early as possible
International Internet Preservation Consortium
Developing advanced web archiving technologies for the long-term, large-scale, continuous crawling requirements enabled through legislation
41
UK Web Archiving Consortium
Developing a selective approach to web archiving
License for PANDAS about to be signed with NLA
Sub-licenses with consortium partners and contractor to follow
ITT concluded with Magus Research winning the contract.
Implement a common web archiving infrastructure (lots of Linux machines + PANDAS)
Provide customisation/development of PANDAS
Provide help desk and support
42
International Internet Preservation Consortium
Developing advanced web archiving technologies
Smart Crawler
Continuous adaptive crawler, adjusting crawl priority on the fly
Based on IA Heritrix
Working on requirements now
Expect to begin tender process in June
Content Management
Archival formats
Framework
Metrics and Test Bed
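The idea of a continuous adaptive crawler that adjusts crawl priority on the fly can be illustrated with a priority queue. The sketch below is not based on Heritrix or on the consortium's requirements; the class name, the doubling and halving rule, and the lazy-deletion approach are all assumptions chosen to keep the example short.

```python
# Sketch of an adaptive crawl frontier: pages that changed since the last
# visit are revisited sooner. Names and the adjustment rule are hypothetical.
import heapq
from typing import Dict, List, Optional, Tuple


class AdaptiveFrontier:
    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, str]] = []
        self._priority: Dict[str, float] = {}  # current priority per URL
        self._counter = 0                      # tie-breaker: first in, first out

    def schedule(self, url: str, priority: float) -> None:
        """Queue a URL; a later call for the same URL supersedes earlier ones."""
        self._priority[url] = priority
        heapq.heappush(self._heap, (-priority, self._counter, url))
        self._counter += 1

    def next_url(self) -> Optional[str]:
        """Pop the highest-priority URL, skipping superseded heap entries."""
        while self._heap:
            neg_priority, _, url = heapq.heappop(self._heap)
            if self._priority.get(url) == -neg_priority:
                del self._priority[url]
                return url
        return None

    def record_visit(self, url: str, changed: bool, priority: float) -> float:
        """Re-queue a visited URL with a raised or lowered priority."""
        new_priority = priority * 2.0 if changed else priority / 2.0
        self.schedule(url, new_priority)
        return new_priority
```

A crawler loop would call `next_url()`, fetch the page, compare it with the previous capture, and feed the result back through `record_visit()`.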
43
External Collaboration
44
Digital Library Collaborations/Partnerships
Current
UK Digital Preservation Coalition
Founder Member
TEL (The European Library Project)
Web Archiving UK
JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library Of Wales
International Internet Preservation Consortium
BNF, Library Of Congress, Internet Archive, National Archives & Library Of Canada, National Library Of
Australia, National Library Of Italy, National Libraries Of Nordic Countries
JISC Funded - Digital Curation Centre
Persistent Identifiers
DOI Foundation, European National Libraries (KB & DDB)
Resource Discovery
Union Catalogues (SUNCAT)
Digital Library Federation
45
Digital Library Collaborations/Partnerships
Potential
Secure Legal Deposit Network
6 Legal Deposit Libraries
Global Digital Format Registry
Potential Partners (National Archives, DLF)
Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
KB (Netherlands National Library & Other Partners – FP6 Bid)
Digital Rights Management
Potential Partners (Publishers, JISC)
Metadata
Publishers, Others ?
Authentication
JISC ?
Resource Discovery
Search Engine Vendors, Researchers
Others ???
46
Conclusions
• Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success
• Can You Work With Us?
47
Slide 30
British Computer Society
North London Branch
Major Programmes
Richard Boulderstone
July 27, 2004
1
Agenda
•
•
•
•
•
•
•
•
•
The British Library
Vision
Our Audiences/Customers
ILS
Digitisation
Digital Object Management
Web Archiving
Collaboration
Conclusions
Magna Carta
www.bl.uk
2
What Is The British Library ?
Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of
Science and Invention (1855), National Central Library (1916), and National
Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India
Office Library and Records in 1982, and British Institute of Recorded Sound in
1983
• Flagship building at St Pancras - largest public building project in Great
Britain in 20th century - opened in 1998
•
www.bl.uk
3
World-Class Research Library
Key Statistics 2002/3
•
•
•
•
•
•
•
•
•
•
150 million items
8.2 million items consulted or supplied
408,000 reading room visits
618,000 catalogue records created
554,000 items received on legal deposit
651 km shelf capacity 92% full add 12 km each year
18.5M Web Site Hits (www.bl.uk)
2,400 staff
£85.2 million Grant in Aid and £27.0 million trading income in 2001/2
Annual report - http://www.bl.uk/about/annual/latest.html
www.bl.uk
4
‘The World’s Knowledge’
Outcome Based Vision
… by aiding scientific advances
… by adding commercial value for
businesses
… by contributing to UK “knowledge
economy”
… through the pursuit of academic
excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural
wellbeing
… by giving information relevant to their
interests
… by helping to find the next medical
breakthrough
… by creating a link between the past,
present and future
To help people
advance
knowledge to
enrich lives
www.bl.uk
5
High
R+D
Industries
Prof.
Services
Creative
Industries Publishing
Industries
Resource Discovery
Bespoke Services
Research Services
Document Supply
Reprographics
Innovation Centre
School
Students
Libraries Teachers 11>18
On-site Visits
School Tours
Web Learning
SMEs
EDUCATION
Lifelong
Learner
Exhibitions
Events
Tours
Publishing
Visitors
(child + adult)
Lifelong
Learner
PUBLIC
BUSINESS
Lifelong
Learner
Postgraduate/
Undergraduate
RESEARCHER
Librarians
Scholars
Lifelong
Learner
Reading Rooms
Bespoke Services
Reprographics
Publishing
Document Supply
Searching Tools
LIBRARIES
Commercial
Researcher
Public
Libraries
Broadcasting
e.g. BBC
Publishing
e.g. OED
Document Supply
Resource Discovery
Training
Best Practice
H.E.
Libraries
Public
www.bl.uk
6
2
Major Programmes/1
Da Vinci Notebook
Integrated Library System (ILS)
Programme
7
ILS: Development
Data migration
Due to finish in a few days
16M+ BL records
10M+ records from other sources
Online ILS software
All online changes made (mainly interfaces) – final tests
Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
Most ones done for go live
Rest in priority order
www.bl.uk
8
ILS: Implementation
Training
Courses to end-users well underway
‘Practice’ system available
‘Search only’ training also underway
Testing
Functional testing (end to end) nearly complete
Performance poor – OPAC very slow
Automated stress testing (LoadRunner scripts)
eIS trying to find area of problem
Ex Libris experts flying over
Some security ‘hardening’ needed
www.bl.uk
9
ILS: Cutover from legacy systems
Now:
Temporary Aleph cataloguing
Phase 1 – internal processing
Staggered take-on of users to ease cutover problems
Merge ‘temporary’ records
7 June:
Phase 2 – reading rooms
Reading rooms closed for cutover 26-29 June
Mainly brand-new PCs etc rather than XP upgrade
30 June:
Phase 3 – remote users
Could be delayed major problems
30 July:
www.bl.uk
10
www.bl.uk
11
Future ILS development (ILS/2)
Current ILS development seen just as the start
Extra records
E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
E.g. Preservation records
Links to other new BL systems
E.g. Digital Object Management (images, web pages etc)
New releases of Ex Libris packages
www.bl.uk
12
Major Programmes/2
International Dunhuang Project
Digitisation Programme
13
Background
Digitisation Is The Process Of Converting Existing Physical Items Into Digital
Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character
Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image to Sub £1/Image.
Automatic Devices To Digitise Are Becoming Availalble But Are Expensive ~£150,000.
BL Has Had Fairly Ad Hoc Approach Driven By
External Funding Opportunities
Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project
Management
BL Has Created About 1.5M Digital Images So Far…
www.bl.uk
14
Digitisation Strategy
Digitisation Strategy Project Was Formerly Initiated On February 2, 2004
Key objectives for the project are to define:
Selection Criteria
Uniform Approach
Communications Plan
Sustainability
Intellectual Property Rights
External Relationship Management
Funding
Integration with DOMS
www.bl.uk
15
Project Status Information
Definitive Register of Projects
19 Complete
19 Current
20 Planning
JISC Sound (3,900 Hours)
JISC Newspapers (2M Pages of 750M Pages)
Chopin (Collaborative Project)
Early English Books Online
www.bl.uk
16
Major Programmes/3
Gutenberg Bible
Digital Object Management (DOM)
Programme
17
DOM Programme vision
Our mission is to enable the United Kingdom to preserve
and use its digital intellectual property forever
Our vision is create a management system for digital objects
that will
store and preserve any type of digital material in perpetuity
provide access to this material to users with appropriate permissions
ensure that the material is easy to find
ensure that users can view the material with contemporary applications
ensure that users can, where possible, experience material with the original look-and-feel
www.bl.uk
18
Introduction - history
Digital Library PFI
Mar 1997 – Dec 1998
Digital Library System
1999 – early 2002
Lessons
DOM Report
Nov 2002
The DOM Programme
Started September 2003
www.bl.uk
19
Drivers for the BL DOM Programme
Legal deposit legislation for non-print material was granted royal assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12T of material per year (with 50 year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….
We need a generic and cost-effective approach for the secure long term
storage of digital material that is produced by numerous initiatives
www.bl.uk
20
DOM – many topics to address
HIGH
ILS
WEB
ARCHIVING
DIGITISATION
PROGRAMME
ESTIMATED SIZE OF COMPONENT
LDEP
WORKFLOW
RESOURCE
DISCOVERY
LDLSE
VDEP
RIGHTS
MANAGEMENT
METADATA
DEFINITION
TECHNICAL
REQUIREMENTS
FILE
FORMAT
REGISTRY
PERSISTENT
IDENTIFIERS
FILE
CONVERSION
UTILITIES
AUTHENTICATION
PROTOTYPES
SDM
RADM
INTERFACES
STRATEGY
DEVELOPMENT
LOW
LOW
COMPONENT AMBIGUITY / COMPLEXITY
Started
Planned
Non-DOM projects
Planned co-operation
HIGH
LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications
www.bl.uk
21
Scope - life cycle of objects
Collection
Selection
Acquisition
Accession
Description
Preservation
Storage
Preservation
Access
Resource discovery
Delivery
Rendering
www.bl.uk
22
Scope – objects and processes
Preservation store
Preserves the bit stream in perpetuity
Access store
Access versions
Limited formats – in the flavour of the era
Metadata to support resource discovery
Descriptive, Administrative, Links with existing tools e.g. Integrated
Library System (ILS)
Workflow
Ingest, e.g. Legal Deposit processing
www.bl.uk
23
ACCESS
DOM
Resource Discovery
Delivery
Digital Rights Management
Shared services
DOM
Storage
Signing
Authentication
Metadata
Persistent ID
Ingest
DONATIONS
DOCUMENT SUPPLY
Publishers
Archives
Grey Literature
Non-Serial
Store
Archiving
Operational
Stores
LEGAL DEPOSIT
LDL Secure
Environment
Legal Deposit
Items
Legal Deposit
Processing
WEB
ARCHIVING
DIGITISATION
St Pancras
Studios
Newspapers
NSA
www.bl.uk
24
Timeline
Prototype will provide a
basic preservationquality digital object
storage module
Definition. R0
ET approve
Business Case
& Timeline
•Consolidate R0 into operational
system
•Provide preservation-quality digital
store for materials received under
Voluntary Deposit of Electronic
Publications (VDEP)
•Integrate it with the existing VDEP
front-end
BC
R0
Operat’l Storage
Sub System. R1
•Support ingest for a
major content stream
•Integrate with core
Library systems as
required
R1
1st Content
Stream ingest. R2
R2
LDEP - initial
format. R3 & R4
Provide functionality
for material covered by
LDEP secondary
legislation
R3
R4
Open DOM to new
projects. R5+
2003
2004
2005
www.bl.uk
25
DOM: Project definition - 1
Example issues
Functional Architecture “What”
digital rights, file
formats, etc
Prototyping – assessing market solutions
allow changes to
new suppliers,
relationships to ILS,
other projects etc
Logical architecture “how – overall architecture”
Prototyping - basic functioning architecture
how do we build it
cost-effectively
today, supplier
selection criteria
Physical architecture “how – storage & specifics”
Prototyping - principal solutions and options
• Business case
• Planning – incremental
implementation phases
Cross team workshops – reviewing progress, debating detailed technical
issues, planning immediate priorities, risk management & way forward
www.bl.uk
26
DOM: Project definition - 2
Approach is to be incremental and not ‘Big Bang’
We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a
good solution
A principal goal is to define:
An overall long term “logical architecture”
Within which, there will be successive generations of physical architectures
We are understanding the storage marketplace, and we will use the knowledge to manage
procurement
We are certain that we will need >500T of storage but we are uncertain when – we thus need
flexible scalable procurement
www.bl.uk
27
DOM architecture - overview
Resource Discovery
Non-cat
ILS
based RD
Others
Rights Management
LDEP
DOMID
OBJECT
DOM Storage Service
Compound
objects/relations
Integrity
Authenticity
Local resource locator
Object
Atomic
Objects
Unique persistent
identifier (DOMID)
Doc
supply
DOMID is
mapped to
node/vol/LRL
DOM Physical Storage
www.bl.uk
28
DOM System (release 3)
Aleph
Storage subsystem
Storage
subsystem
Access
Mailroom
Publishers
Shared services
Administration
DOM System
www.bl.uk
29
DOM logical architecture –
integrity and authenticity
Integrity:
System has capability to continuously monitor the object store to detect
object corruption
It would then initiate object recovery
Authenticity:
A process is defined to provide long-term assurance that an object that
is re-presented is as it was when it was ingested
Based on the use of cryptographic signing techniques
Each object is signed when it is ingested
The signature is verified when required
The signing mechanism is “tightly” controlled
www.bl.uk
30
Procuring physical storage in volume
A major cost is in physical storage
The market for storage systems is changing rapidly, and this implies that “lock-in” is
not sensible
We thus need flexibility to change supplier over time
Cost of storage is reducing by 30-40% per year
Hence procure on rolling basis just ahead of demand
Replace storage on a rolling basis on expiry of warranty
The rolling programmes imply the need to be able to support a heterogeneous
product solution
The design of the logical architecture thus supports storage sourced from multiple
storage vendors
www.bl.uk
31
Disaster tolerance and the organisation
of storage clusters
One can obtain commercial disaster recovery (DR) solutions for common equipment
configurations
However one cannot obtain such solutions for systems comprising multi-100 Tb systems
So we must build in the need for DR into the design of the system
A single site solution, subject to a common-mode disaster, would suffer considerable loss
of availability after a disaster, and so is not acceptable
This implies that we need a multi-site solution
Conventionally these are based on a master-standby where only 50% of kit is delivering
normal service
Our design is based on the use of multiple autonomous independent peer clusters that
cross-synchronise
so 100% of the kit delivers normal service
www.bl.uk
32
DOM architecture in the context of the
storage solution market
The dominant segment of the market focuses on delivering performance within
a highly resilient single cluster
However:
Many of our objects will be rarely accessed
so we do not want to pay for “maximised” performance we do not need
We have resilience by using multiple clusters, hence we have a reduced
need for resilience within a cluster
so we do not want to pay for “maximised” resilience we do not need
We are using these drivers to design a cost-effective large scale resilient
solution
www.bl.uk
33
DOM storage subsystem architecture - overview
DOM
Storage
gateway
DOM
Storage
gateway
DOM
Storage
Service
DOM
Storage
Service
DOM
Physical
Storage
Storage cluster
DOM
Shared
Services
• Unique ID
• Signing
• Logging
DOM
Physical
Storage
Storage cluster
www.bl.uk
34
DOM storage subsystem architecture - access
DOM
Storage
gateway
Normal access/delivery
is from local storage
cluster
DOM
Storage
Service
DOM
Physical
Storage
Storage cluster
DOM
Storage
gateway
DOM
Storage
Service
••
••
••
DOM
DOM
Shared
central
Services
Unique
Unique ID
ID
Signing
Signing
Logging
Logging
DOM
Physical
Storage
Storage cluster
www.bl.uk
35
DOM storage subsystem architecture - access
DOM
Storage
gateway
When a cluster is off-line
then access/delivery is
from a remote storage
cluster
DOM
Storage
Service
DOM
Physical
Storage
Storage cluster
DOM
Storage
gateway
DOM
Storage
Service
••
••
••
DOM
DOM
Shared
central
Services
Unique
Unique ID
ID
Signing
Signing
Logging
Logging
DOM
Physical
Storage
Storage cluster
www.bl.uk
36
DOM storage subsystem architecture - ingest
DOM
Storage
gateway
DOM
Storage
Service
Normal ingest is to the
local storage cluster and
then the remote cluster
is synchronised
Synchronise remote store
Store
DOM
Physical
Storage
Storage cluster
••
••
••
DOM
DOM
Shared
central
Services
Unique
Unique ID
ID
Signing
Signing
Logging
Logging
DOM
Storage
gateway
DOM
Storage
Service
DOM
Physical
Storage
Storage cluster
www.bl.uk
37
DOM storage subsystem architecture - ingest
When a cluster is off-line, ingest is managed by the remote storage cluster and the local cluster is synchronised later
[Diagram: two DOM Storage gateways, two DOM Storage Services and two storage clusters of DOM Physical Storage, together with DOM Shared central Services (Unique ID, Signing, Logging); arrows labelled "Store" and "Synchronise remote store later"]
38
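The two ingest slides above can be read as "store on whichever cluster is on-line, then synchronise the peer now or later". The sketch below illustrates that idea under stated assumptions: the `Cluster` class, the `ingest` and `synchronise` functions and the per-cluster pending queue are hypothetical names chosen for illustration, and the slides do not say how the real system tracks outstanding synchronisation.

```python
from collections import deque


class Cluster:
    """Hypothetical in-memory stand-in for one storage cluster."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.objects = {}       # identifier -> bytes
        self.pending = deque()  # (peer, identifier) copies still owed to peers

    def store(self, dom_id, data):
        self.objects[dom_id] = data


def synchronise(cluster):
    """Push queued objects to peers that are on-line; keep the rest queued."""
    still_pending = deque()
    while cluster.pending:
        peer, dom_id = cluster.pending.popleft()
        if peer.online:
            peer.store(dom_id, cluster.objects[dom_id])
        else:
            still_pending.append((peer, dom_id))
    cluster.pending = still_pending


def ingest(dom_id, data, local, remote):
    """Store on the first on-line cluster, then try to synchronise its peer."""
    primary, peer = (local, remote) if local.online else (remote, local)
    if not primary.online:
        raise RuntimeError("no storage cluster is on-line")
    primary.store(dom_id, data)
    primary.pending.append((peer, dom_id))
    synchronise(primary)
    return primary.name


local, remote = Cluster("local"), Cluster("remote")
ingest("dom:0001", b"first", local, remote)   # normal: stored locally, peer synchronised

local.online = False
ingest("dom:0002", b"second", local, remote)  # managed by the remote cluster
held_while_offline = "dom:0002" in local.objects

local.online = True
synchronise(remote)                           # the local cluster catches up later
```

Keeping the queue on the cluster that accepted the object means the off-line cluster needs no knowledge of what it missed; it simply receives the backlog once it returns.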
In conclusion
We plan for generations of physical storage
Migration from one generation to the next
Allow changes of supplier
Purchase incrementally in modest quantities
Move quickly when required
Be cost conscious
We provide assurance that an object is held and re-presented as it was when it was
ingested
We are designing a cost-effective, large-scale, resilient solution
In summary: we take a long-term view
39
Major Programmes/4
Web Archiving Programme
40
Structure of Programme
The Web Archiving Programme is a collaborative
initiative, broadly implemented across two
consortia
UK Web Archiving Consortium
Developing a selective approach to web archiving, procuring a common
web archiving infrastructure and software to begin archiving activities as
early as possible
International Internet Preservation Consortium
Developing advanced web archiving technologies for the long-term,
large-scale, continuous crawling requirements enabled through
legislation
41
UK Web Archiving Consortium
Developing a selective approach to web archiving
Licence for PANDAS about to be signed with NLA
Sub-licences with consortium partners and contractor to follow
ITT concluded with Magus Research winning the contract
Implement a common web archiving infrastructure (lots of Linux
machines + PANDAS)
Provide customisation/development of PANDAS
Provide help desk and support
42
International Internet Preservation
Consortium
Developing advanced web archiving technologies
Smart Crawler
Continuous adaptive crawler, adjusting crawl priority on the fly
Based on IA Heritrix
Working on requirements now
Expect to begin tender process in June
Content Management
Archival formats
Framework
Metrics and Test Bed
43
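The "Smart Crawler" item above mentions a continuous adaptive crawler that adjusts crawl priority on the fly. One common way to picture this is a priority queue in which pages that keep changing are revisited sooner and static pages less often. The sketch below is an illustration of that general idea only. It is not the IIPC design and not how Heritrix works, and the `AdaptiveSchedule` class, its halving and doubling rule, the interval bounds and the example URLs are all assumptions. Page fetches are simulated rather than performed.

```python
import heapq


class AdaptiveSchedule:
    """Revisit pages that change often sooner than pages that stay static."""

    def __init__(self, min_interval=1.0, max_interval=64.0):
        self.min_interval = min_interval
        self.max_interval = max_interval
        self.queue = []        # heap of (next due time, url)
        self.interval = {}     # url -> current revisit interval
        self.last_digest = {}  # url -> content digest seen on the previous visit

    def add(self, url, now=0.0, interval=8.0):
        self.interval[url] = interval
        heapq.heappush(self.queue, (now, url))

    def next_url(self):
        """Pop the (due time, url) pair that is due soonest."""
        return heapq.heappop(self.queue)

    def record(self, url, digest, now):
        """Adjust the revisit interval from the observed content, then requeue."""
        previous = self.last_digest.get(url)
        if previous is not None:
            if digest != previous:
                # Page changed: visit twice as often, down to the lower bound.
                self.interval[url] = max(self.min_interval, self.interval[url] / 2)
            else:
                # Page unchanged: back off, up to the upper bound.
                self.interval[url] = min(self.max_interval, self.interval[url] * 2)
        self.last_digest[url] = digest
        heapq.heappush(self.queue, (now + self.interval[url], url))


schedule = AdaptiveSchedule()
schedule.add("http://example.org/news")
schedule.add("http://example.org/about")

# Simulated observations: the news page differs on every visit, the about page never does.
for visit in range(6):
    due, url = schedule.next_url()
    digest = f"news-{visit}" if url.endswith("/news") else "about"
    schedule.record(url, digest, now=due)
```

After the simulated visits, the frequently changing page ends up with a much shorter revisit interval than the static one, which is the sense in which priority is adjusted during the crawl.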
External Collaboration
44
Digital Library Collaborations/Partnerships
Current
UK Digital Preservation Coalition
Founder Member
TEL (The European Library Project)
Web Archiving UK
JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library Of Wales
International Internet Preservation Consortium
BNF, Library Of Congress, Internet Archive, National Archives & Library Of Canada, National Library Of
Australia, National Library Of Italy, National Libraries Of Nordic Countries
JISC Funded - Digital Curation Centre
Persistent Identifiers
DOI Foundation, European National Libraries (KB & DDB)
Resource Discovery
Union Catalogues (SUNCAT)
Digital Library Federation
45
Digital Library Collaborations/Partnerships
Potential
Secure Legal Deposit Network
6 Legal Deposit Libraries
Global Digital Format Registry
Potential Partners (National Archives, DLF)
Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
KB (Netherlands National Library & Other Partners – FP6 Bid)
Digital Rights Management
Potential Partners (Publishers, JISC)
Metadata
Publishers, Others ?
Authentication
JISC ?
Resource Discovery
Search Engine Vendors, Researchers
Others ???
46
Conclusions
• Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success
• Can You Work With Us?
47
Slide 32
British Computer Society
North London Branch
Major Programmes
Richard Boulderstone
July 27, 2004
1
Agenda
•
•
•
•
•
•
•
•
•
The British Library
Vision
Our Audiences/Customers
ILS
Digitisation
Digital Object Management
Web Archiving
Collaboration
Conclusions
Magna Carta
www.bl.uk
2
What Is The British Library ?
Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of
Science and Invention (1855), National Central Library (1916), and National
Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India
Office Library and Records in 1982, and British Institute of Recorded Sound in
1983
• Flagship building at St Pancras - largest public building project in Great
Britain in 20th century - opened in 1998
•
www.bl.uk
3
World-Class Research Library
Key Statistics 2002/3
•
•
•
•
•
•
•
•
•
•
150 million items
8.2 million items consulted or supplied
408,000 reading room visits
618,000 catalogue records created
554,000 items received on legal deposit
651 km shelf capacity 92% full add 12 km each year
18.5M Web Site Hits (www.bl.uk)
2,400 staff
£85.2 million Grant in Aid and £27.0 million trading income in 2001/2
Annual report - http://www.bl.uk/about/annual/latest.html
www.bl.uk
4
‘The World’s Knowledge’
Outcome Based Vision
… by aiding scientific advances
… by adding commercial value for
businesses
… by contributing to UK “knowledge
economy”
… through the pursuit of academic
excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural
wellbeing
… by giving information relevant to their
interests
… by helping to find the next medical
breakthrough
… by creating a link between the past,
present and future
To help people
advance
knowledge to
enrich lives
www.bl.uk
5
High
R+D
Industries
Prof.
Services
Creative
Industries Publishing
Industries
Resource Discovery
Bespoke Services
Research Services
Document Supply
Reprographics
Innovation Centre
School
Students
Libraries Teachers 11>18
On-site Visits
School Tours
Web Learning
SMEs
EDUCATION
Lifelong
Learner
Exhibitions
Events
Tours
Publishing
Visitors
(child + adult)
Lifelong
Learner
PUBLIC
BUSINESS
Lifelong
Learner
Postgraduate/
Undergraduate
RESEARCHER
Librarians
Scholars
Lifelong
Learner
Reading Rooms
Bespoke Services
Reprographics
Publishing
Document Supply
Searching Tools
LIBRARIES
Commercial
Researcher
Public
Libraries
Broadcasting
e.g. BBC
Publishing
e.g. OED
Document Supply
Resource Discovery
Training
Best Practice
H.E.
Libraries
Public
www.bl.uk
6
2
Major Programmes/1
Da Vinci Notebook
Integrated Library System (ILS)
Programme
7
ILS: Development
Data migration
Due to finish in a few days
16M+ BL records
10M+ records from other sources
Online ILS software
All online changes made (mainly interfaces) – final tests
Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
Most ones done for go live
Rest in priority order
www.bl.uk
8
ILS: Implementation
Training
Courses to end-users well underway
‘Practice’ system available
‘Search only’ training also underway
Testing
Functional testing (end to end) nearly complete
Performance poor – OPAC very slow
Automated stress testing (LoadRunner scripts)
eIS trying to find area of problem
Ex Libris experts flying over
Some security ‘hardening’ needed
www.bl.uk
9
ILS: Cutover from legacy systems
Now:
Temporary Aleph cataloguing
Phase 1 – internal processing
Staggered take-on of users to ease cutover problems
Merge ‘temporary’ records
7 June:
Phase 2 – reading rooms
Reading rooms closed for cutover 26-29 June
Mainly brand-new PCs etc rather than XP upgrade
30 June:
Phase 3 – remote users
Could be delayed major problems
30 July:
www.bl.uk
10
www.bl.uk
11
Future ILS development (ILS/2)
Current ILS development seen just as the start
Extra records
E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
E.g. Preservation records
Links to other new BL systems
E.g. Digital Object Management (images, web pages etc)
New releases of Ex Libris packages
www.bl.uk
12
Major Programmes/2
International Dunhuang Project
Digitisation Programme
13
Background
Digitisation Is The Process Of Converting Existing Physical Items Into Digital
Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character
Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image to Sub £1/Image.
Automatic Devices To Digitise Are Becoming Availalble But Are Expensive ~£150,000.
BL Has Had Fairly Ad Hoc Approach Driven By
External Funding Opportunities
Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project
Management
BL Has Created About 1.5M Digital Images So Far…
www.bl.uk
14
Digitisation Strategy
Digitisation Strategy Project Was Formerly Initiated On February 2, 2004
Key objectives for the project are to define:
Selection Criteria
Uniform Approach
Communications Plan
Sustainability
Intellectual Property Rights
External Relationship Management
Funding
Integration with DOMS
www.bl.uk
15
Project Status Information
Definitive Register of Projects
19 Complete
19 Current
20 Planning
JISC Sound (3,900 Hours)
JISC Newspapers (2M Pages of 750M Pages)
Chopin (Collaborative Project)
Early English Books Online
www.bl.uk
16
Major Programmes/3
Gutenberg Bible
Digital Object Management (DOM)
Programme
17
DOM Programme vision
Our mission is to enable the United Kingdom to preserve
and use its digital intellectual property forever
Our vision is create a management system for digital objects
that will
store and preserve any type of digital material in perpetuity
provide access to this material to users with appropriate permissions
ensure that the material is easy to find
ensure that users can view the material with contemporary applications
ensure that users can, where possible, experience material with the original look-and-feel
www.bl.uk
18
Introduction - history
Digital Library PFI
Mar 1997 – Dec 1998
Digital Library System
1999 – early 2002
Lessons
DOM Report
Nov 2002
The DOM Programme
Started September 2003
www.bl.uk
19
Drivers for the BL DOM Programme
Legal deposit legislation for non-print material was granted royal assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12T of material per year (with 50 year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….
We need a generic and cost-effective approach for the secure long term
storage of digital material that is produced by numerous initiatives
www.bl.uk
20
DOM – many topics to address
HIGH
ILS
WEB
ARCHIVING
DIGITISATION
PROGRAMME
ESTIMATED SIZE OF COMPONENT
LDEP
WORKFLOW
RESOURCE
DISCOVERY
LDLSE
VDEP
RIGHTS
MANAGEMENT
METADATA
DEFINITION
TECHNICAL
REQUIREMENTS
FILE
FORMAT
REGISTRY
PERSISTENT
IDENTIFIERS
FILE
CONVERSION
UTILITIES
AUTHENTICATION
PROTOTYPES
SDM
RADM
INTERFACES
STRATEGY
DEVELOPMENT
LOW
LOW
COMPONENT AMBIGUITY / COMPLEXITY
Started
Planned
Non-DOM projects
Planned co-operation
HIGH
LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications
www.bl.uk
21
Scope - life cycle of objects
Collection
Selection
Acquisition
Accession
Description
Preservation
Storage
Preservation
Access
Resource discovery
Delivery
Rendering
www.bl.uk
22
Scope – objects and processes
Preservation store
Preserves the bit stream in perpetuity
Access store
Access versions
Limited formats – in the flavour of the era
Metadata to support resource discovery
Descriptive, Administrative, Links with existing tools e.g. Integrated
Library System (ILS)
Workflow
Ingest, e.g. Legal Deposit processing
www.bl.uk
23
ACCESS
DOM
Resource Discovery
Delivery
Digital Rights Management
Shared services
DOM
Storage
Signing
Authentication
Metadata
Persistent ID
Ingest
DONATIONS
DOCUMENT SUPPLY
Publishers
Archives
Grey Literature
Non-Serial
Store
Archiving
Operational
Stores
LEGAL DEPOSIT
LDL Secure
Environment
Legal Deposit
Items
Legal Deposit
Processing
WEB
ARCHIVING
DIGITISATION
St Pancras
Studios
Newspapers
NSA
www.bl.uk
24
Timeline
Prototype will provide a
basic preservationquality digital object
storage module
Definition. R0
ET approve
Business Case
& Timeline
•Consolidate R0 into operational
system
•Provide preservation-quality digital
store for materials received under
Voluntary Deposit of Electronic
Publications (VDEP)
•Integrate it with the existing VDEP
front-end
BC
R0
Operat’l Storage
Sub System. R1
•Support ingest for a
major content stream
•Integrate with core
Library systems as
required
R1
1st Content
Stream ingest. R2
R2
LDEP - initial
format. R3 & R4
Provide functionality
for material covered by
LDEP secondary
legislation
R3
R4
Open DOM to new
projects. R5+
2003
2004
2005
www.bl.uk
25
DOM: Project definition - 1
Example issues
Functional Architecture “What”
digital rights, file
formats, etc
Prototyping – assessing market solutions
allow changes to
new suppliers,
relationships to ILS,
other projects etc
Logical architecture “how – overall architecture”
Prototyping - basic functioning architecture
how do we build it
cost-effectively
today, supplier
selection criteria
Physical architecture “how – storage & specifics”
Prototyping - principal solutions and options
• Business case
• Planning – incremental
implementation phases
Cross team workshops – reviewing progress, debating detailed technical
issues, planning immediate priorities, risk management & way forward
www.bl.uk
26
DOM: Project definition - 2
Approach is to be incremental and not ‘Big Bang’
We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a
good solution
A principal goal is to define:
An overall long term “logical architecture”
Within which, there will be successive generations of physical architectures
We are understanding the storage marketplace, and we will use the knowledge to manage
procurement
We are certain that we will need >500T of storage but we are uncertain when – we thus need
flexible scalable procurement
www.bl.uk
27
DOM architecture - overview
[Diagram: client services (Resource Discovery, comprising ILS based RD, Non-cat and Others; Rights Management; LDEP; Doc supply) sit above the DOM Storage Service and refer to each OBJECT by its DOMID]
DOM Storage Service: compound objects/relations, integrity, authenticity, unique persistent identifier (DOMID)
DOMID is mapped to node/vol/LRL (local resource locator)
DOM Physical Storage: atomic objects
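The mapping shown in the diagram, where a persistent DOMID resolves to a node/vol/LRL location, could be sketched roughly as follows. This is an illustrative Python sketch only: the class names, the field layout and the in-memory table are assumptions made for the example and are not taken from the DOM design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    """Hypothetical physical location: storage node, volume and local resource locator."""
    node: str
    vol: str
    lrl: str


class DomidResolver:
    """Toy in-memory table mapping a persistent DOMID to physical locations."""

    def __init__(self) -> None:
        self._table: dict[str, list[Location]] = {}

    def register(self, domid: str, location: Location) -> None:
        # One DOMID may map to several locations (e.g. one per storage cluster).
        self._table.setdefault(domid, []).append(location)

    def resolve(self, domid: str) -> list[Location]:
        # Callers only ever hold the DOMID; the physical location can change
        # between storage generations without the identifier changing.
        if domid not in self._table:
            raise KeyError(f"unknown DOMID: {domid}")
        return list(self._table[domid])

    def relocate(self, domid: str, old: Location, new: Location) -> None:
        # Simulates migrating an object to a new generation of physical storage.
        locations = self._table[domid]
        locations[locations.index(old)] = new
```

Keeping the identifier separate from the location is what would let storage be replaced or re-sourced without touching anything that cites the object.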
28
DOM System (release 3)
[Diagram components: Aleph, Publishers, Mailroom, Access, Administration, Shared services and two Storage subsystems, shown within or connected to the DOM System]
29
DOM logical architecture –
integrity and authenticity
Integrity:
System has capability to continuously monitor the object store to detect
object corruption
It would then initiate object recovery
Authenticity:
A process is defined to provide long-term assurance that an object that
is re-presented is as it was when it was ingested
Based on the use of cryptographic signing techniques
Each object is signed when it is ingested
The signature is verified when required
The signing mechanism is “tightly” controlled
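The two ideas above (sign at ingest, verify when required, and scan the store for corruption) can be illustrated with a minimal Python sketch. The slides do not say which signing scheme is used; HMAC-SHA256 from the standard library is used here purely as a stand-in, and the function names and the dictionary-based store are assumptions for the example.

```python
import hashlib
import hmac


def sign_object(data: bytes, key: bytes) -> str:
    """Return a hex signature for an object at ingest (HMAC-SHA256 stand-in)."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify_object(data: bytes, key: bytes, signature: str) -> bool:
    """Check that a re-presented object still matches its ingest signature."""
    expected = sign_object(data, key)
    return hmac.compare_digest(expected, signature)


def find_corrupt(store: dict[str, bytes], signatures: dict[str, str], key: bytes) -> list[str]:
    """Scan the whole store and list identifiers whose content fails verification."""
    return sorted(
        object_id
        for object_id, data in store.items()
        if not verify_object(data, key, signatures[object_id])
    )
```

In a multi-cluster design, any identifier returned by `find_corrupt` would be a candidate for recovery from a copy held elsewhere. A production system would more likely use asymmetric signatures with a tightly controlled private key.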
30
Procuring physical storage in volume
A major cost is in physical storage
The market for storage systems is changing rapidly, and this implies that “lock-in” is
not sensible
We thus need flexibility to change supplier over time
Cost of storage is reducing by 30-40% per year
Hence procure on rolling basis just ahead of demand
Replace storage on a rolling basis on expiry of warranty
The rolling programmes imply the need to be able to support a heterogeneous
product solution
The design of the logical architecture thus supports storage sourced from multiple
storage vendors
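The case for buying just ahead of demand can be made concrete with a small worked calculation. Only the 30-40% annual price reduction comes from the slide (35% is used below); the starting price, the demand profile and the function names are hypothetical.

```python
def upfront_cost(demand_tb: list[float], price_per_tb: float) -> float:
    """Cost of buying all the capacity in year 0 at the year-0 price."""
    return sum(demand_tb) * price_per_tb


def rolling_cost(demand_tb: list[float], price_per_tb: float, annual_drop: float) -> float:
    """Cost of buying each year's extra capacity at that year's (lower) price."""
    total = 0.0
    for year, tb in enumerate(demand_tb):
        total += tb * price_per_tb * (1.0 - annual_drop) ** year
    return total


# Hypothetical figures: 100 TB of new demand per year for 5 years,
# 1,000 currency units per TB in year 0, price falling 35% per year.
demand = [100.0] * 5
print(upfront_cost(demand, 1000.0))
print(round(rolling_cost(demand, 1000.0, 0.35)))
```

Under these assumed figures the rolling purchase costs roughly half as much as buying everything at the start. The sketch ignores warranty-driven replacement and operating costs.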
31
Disaster tolerance and the organisation
of storage clusters
One can obtain commercial disaster recovery (DR) solutions for common equipment
configurations
However, one cannot obtain such solutions for multi-100 TB systems
So we must build DR into the design of the system
A single site solution, subject to a common-mode disaster, would suffer considerable loss
of availability after a disaster, and so is not acceptable
This implies that we need a multi-site solution
Conventionally these are based on a master-standby arrangement, where only 50% of the kit is delivering
normal service
Our design is based on the use of multiple autonomous independent peer clusters that
cross-synchronise,
so 100% of the kit delivers normal service
32
DOM architecture in the context of the
storage solution market
The dominant segment of the market focuses on delivering performance within
a highly resilient single cluster
However:
Many of our objects will be rarely accessed
so we do not want to pay for “maximised” performance we do not need
We have resilience by using multiple clusters, hence we have a reduced
need for resilience within a cluster
so we do not want to pay for “maximised” resilience we do not need
We are using these drivers to design a cost-effective large scale resilient
solution
33
DOM storage subsystem architecture - overview
[Diagram: two storage clusters, each comprising a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, together with the DOM Shared Services (Unique ID, Signing, Logging)]
34
DOM storage subsystem architecture - access
Normal access/delivery is from the local storage cluster
[Diagram: two storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical Storage) and the DOM Shared Services (Unique ID, Signing, Logging)]
35
DOM storage subsystem architecture - access
When a cluster is off-line, access/delivery is from a remote storage cluster
[Diagram: two storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical Storage) and the DOM Shared Services (Unique ID, Signing, Logging)]
36
DOM storage subsystem architecture - ingest
Normal ingest is to the local storage cluster, and then the remote cluster is synchronised
[Diagram: two storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical Storage) and the DOM Shared Services (Unique ID, Signing, Logging); arrows labelled “Store” and “Synchronise remote store”]
37
DOM storage subsystem architecture - ingest
When a cluster is off-line, ingest is managed by the remote storage cluster and the local cluster is synchronised later
[Diagram: two storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical Storage) and the DOM Shared Services (Unique ID, Signing, Logging); arrows labelled “Store” and “Synchronise remote store later”]
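The access and ingest behaviour described on the last four slides (serve and store locally when possible, fall back to the remote cluster when the local one is off-line, and synchronise later) can be sketched as a toy model. This is a sketch under stated assumptions, not the DOM implementation: the class names, the in-memory stores and the pending-list mechanism are all invented for illustration.

```python
class Cluster:
    """Toy storage cluster: an in-memory object store plus an on-line flag."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.online = True
        self.objects: dict[str, bytes] = {}


class PeerPair:
    """Two autonomous peer clusters that cross-synchronise (illustrative only)."""

    def __init__(self, local: Cluster, remote: Cluster) -> None:
        self.local = local
        self.remote = remote
        # Object ids stored on one cluster that still need copying to the other.
        self.pending: dict[str, list[str]] = {local.name: [], remote.name: []}

    def _peer(self, cluster: Cluster) -> Cluster:
        return self.remote if cluster is self.local else self.local

    def ingest(self, object_id: str, data: bytes) -> str:
        """Store on the local cluster if it is on-line, else on the remote one."""
        target = self.local if self.local.online else self.remote
        if not target.online:
            raise RuntimeError("no cluster available for ingest")
        target.objects[object_id] = data
        self.pending[self._peer(target).name].append(object_id)
        self.synchronise()
        return target.name

    def read(self, object_id: str) -> bytes:
        """Deliver from the local cluster, falling back to the remote one."""
        for cluster in (self.local, self.remote):
            if cluster.online and object_id in cluster.objects:
                return cluster.objects[object_id]
        raise KeyError(object_id)

    def synchronise(self) -> None:
        """Copy pending objects to a cluster once both it and its peer are on-line."""
        for cluster in (self.local, self.remote):
            source = self._peer(cluster)
            if not (cluster.online and source.online):
                continue  # synchronised later, once both are reachable
            for object_id in self.pending[cluster.name]:
                cluster.objects[object_id] = source.objects[object_id]
            self.pending[cluster.name] = []
```

Because both clusters accept reads and writes in normal operation, neither sits idle as a standby, which is the point made on the disaster tolerance slide.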
38
In conclusion
We plan for generations of physical storage
Migration from one generation to the next
Allow changes of supplier
Purchase incrementally in modest quantities
Move quickly when required
Be cost conscious
We provide assurance that an object is held and re-presented as it was when it was
ingested
We are designing a cost-effective large scale resilient solution
In summary: we take a long term view
39
Major Programmes/4
Web Archiving Programme
40
Structure of Programme
The Web Archiving Programme is a collaborative
initiative, broadly implemented across two
consortia
UK Web Archiving Consortium
Developing a selective approach to web archiving, procuring a common
web archiving infrastructure and software to begin archiving activities at
the earliest opportunity
International Internet Preservation Consortium
Developing advanced web archiving technologies for the long-term,
large-scale, continuous crawling requirements enabled through
legislation
41
UK Web Archiving Consortium
Developing a selective approach to web archiving
Licence for PANDAS about to be signed with NLA
Sub-licences with consortium partners and contractor to follow
ITT concluded, with Magus Research winning the contract
Implement a common web archiving infrastructure (lots of Linux
machines + PANDAS)
Provide customisation/development of PANDAS
Provide help desk and support
42
International Internet Preservation
Consortium
Developing advanced web archiving technologies
Smart Crawler
Continuous adaptive crawler, adjusting crawl priority on the fly
Based on IA Heritrix
Working on requirements now
Expect to begin tender process in June
Content Management
Archival formats
Framework
Metrics and Test Bed
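The slide gives no detail on how the Smart Crawler would adjust crawl priority on the fly. As one possible reading, the Python sketch below reschedules a URL sooner if it changed at the last visit and later if it did not. The class, the halving/doubling rule and the hour-based clock are all assumptions; none of this reflects the Heritrix API or the consortium's requirements.

```python
import heapq


class AdaptiveFrontier:
    """Toy crawl frontier: URLs are revisited sooner if they changed last time."""

    def __init__(self, base_interval: float = 24.0) -> None:
        self._heap: list[tuple[float, str]] = []  # (next due time in hours, url)
        self._interval: dict[str, float] = {}
        self._base = base_interval

    def add(self, url: str, now: float = 0.0) -> None:
        """Register a new URL, due for its first crawl immediately."""
        self._interval[url] = self._base
        heapq.heappush(self._heap, (now, url))

    def next_url(self) -> tuple[float, str]:
        """Pop the URL that is due soonest."""
        return heapq.heappop(self._heap)

    def report(self, url: str, crawled_at: float, changed: bool) -> None:
        """Adjust the revisit interval on the fly and reschedule the URL."""
        factor = 0.5 if changed else 2.0  # assumed rule, not from the slides
        self._interval[url] *= factor
        heapq.heappush(self._heap, (crawled_at + self._interval[url], url))
```

A crawler loop would repeatedly call `next_url`, fetch the page, compare it with the previous capture and call `report` with the outcome.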
43
External Collaboration
44
Digital Library Collaborations/Partnerships
Current
UK Digital Preservation Coalition
Founder Member
TEL (The European Library Project)
Web Archiving UK
JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library Of Wales
International Internet Preservation Consortium
BNF, Library Of Congress, Internet Archive, National Archives & Library Of Canada, National Library Of
Australia, National Library Of Italy, National Libraries Of Nordic Countries
JISC Funded - Digital Curation Centre
Persistent Identifiers
DOI Foundation, European National Libraries (KB & DDB)
Resource Discovery
Union Catalogues (SUNCAT)
Digital Library Federation
45
Digital Library Collaborations/Partnerships
Potential
Secure Legal Deposit Network
6 Legal Deposit Libraries
Global Digital Format Registry
Potential Partners (National Archives, DLF)
Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
KB (Netherlands National Library & Other Partners – FP6 Bid)
Digital Rights Management
Potential Partners (Publishers, JISC)
Metadata
Publishers, Others ?
Authentication
JISC ?
Resource Discovery
Search Engine Vendors, Researchers
Others ???
46
Conclusions
• Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success
• Can You Work With Us?
47
Slide 34
British Computer Society
North London Branch
Major Programmes
Richard Boulderstone
July 27, 2004
1
Agenda
•
•
•
•
•
•
•
•
•
The British Library
Vision
Our Audiences/Customers
ILS
Digitisation
Digital Object Management
Web Archiving
Collaboration
Conclusions
Magna Carta
www.bl.uk
2
What Is The British Library ?
Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of
Science and Invention (1855), National Central Library (1916), and National
Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India
Office Library and Records in 1982, and British Institute of Recorded Sound in
1983
• Flagship building at St Pancras - largest public building project in Great
Britain in 20th century - opened in 1998
•
www.bl.uk
3
World-Class Research Library
Key Statistics 2002/3
•
•
•
•
•
•
•
•
•
•
150 million items
8.2 million items consulted or supplied
408,000 reading room visits
618,000 catalogue records created
554,000 items received on legal deposit
651 km shelf capacity 92% full add 12 km each year
18.5M Web Site Hits (www.bl.uk)
2,400 staff
£85.2 million Grant in Aid and £27.0 million trading income in 2001/2
Annual report - http://www.bl.uk/about/annual/latest.html
www.bl.uk
4
‘The World’s Knowledge’
Outcome Based Vision
… by aiding scientific advances
… by adding commercial value for
businesses
… by contributing to UK “knowledge
economy”
… through the pursuit of academic
excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural
wellbeing
… by giving information relevant to their
interests
… by helping to find the next medical
breakthrough
… by creating a link between the past,
present and future
To help people
advance
knowledge to
enrich lives
www.bl.uk
5
High
R+D
Industries
Prof.
Services
Creative
Industries Publishing
Industries
Resource Discovery
Bespoke Services
Research Services
Document Supply
Reprographics
Innovation Centre
School
Students
Libraries Teachers 11>18
On-site Visits
School Tours
Web Learning
SMEs
EDUCATION
Lifelong
Learner
Exhibitions
Events
Tours
Publishing
Visitors
(child + adult)
Lifelong
Learner
PUBLIC
BUSINESS
Lifelong
Learner
Postgraduate/
Undergraduate
RESEARCHER
Librarians
Scholars
Lifelong
Learner
Reading Rooms
Bespoke Services
Reprographics
Publishing
Document Supply
Searching Tools
LIBRARIES
Commercial
Researcher
Public
Libraries
Broadcasting
e.g. BBC
Publishing
e.g. OED
Document Supply
Resource Discovery
Training
Best Practice
H.E.
Libraries
Public
www.bl.uk
6
2
Major Programmes/1
Da Vinci Notebook
Integrated Library System (ILS)
Programme
7
ILS: Development
Data migration
Due to finish in a few days
16M+ BL records
10M+ records from other sources
Online ILS software
All online changes made (mainly interfaces) – final tests
Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
Most ones done for go live
Rest in priority order
www.bl.uk
8
ILS: Implementation
Training
Courses to end-users well underway
‘Practice’ system available
‘Search only’ training also underway
Testing
Functional testing (end to end) nearly complete
Performance poor – OPAC very slow
Automated stress testing (LoadRunner scripts)
eIS trying to find area of problem
Ex Libris experts flying over
Some security ‘hardening’ needed
www.bl.uk
9
ILS: Cutover from legacy systems
Now:
Temporary Aleph cataloguing
Phase 1 – internal processing
Staggered take-on of users to ease cutover problems
Merge ‘temporary’ records
7 June:
Phase 2 – reading rooms
Reading rooms closed for cutover 26-29 June
Mainly brand-new PCs etc rather than XP upgrade
30 June:
Phase 3 – remote users
Could be delayed major problems
30 July:
www.bl.uk
10
www.bl.uk
11
Future ILS development (ILS/2)
Current ILS development seen just as the start
Extra records
E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
E.g. Preservation records
Links to other new BL systems
E.g. Digital Object Management (images, web pages etc)
New releases of Ex Libris packages
www.bl.uk
12
Major Programmes/2
International Dunhuang Project
Digitisation Programme
13
Background
Digitisation Is The Process Of Converting Existing Physical Items Into Digital
Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character
Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image to Sub £1/Image.
Automatic Devices To Digitise Are Becoming Availalble But Are Expensive ~£150,000.
BL Has Had Fairly Ad Hoc Approach Driven By
External Funding Opportunities
Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project
Management
BL Has Created About 1.5M Digital Images So Far…
www.bl.uk
14
Digitisation Strategy
Digitisation Strategy Project Was Formerly Initiated On February 2, 2004
Key objectives for the project are to define:
Selection Criteria
Uniform Approach
Communications Plan
Sustainability
Intellectual Property Rights
External Relationship Management
Funding
Integration with DOMS
www.bl.uk
15
Project Status Information
Definitive Register of Projects
19 Complete
19 Current
20 Planning
JISC Sound (3,900 Hours)
JISC Newspapers (2M Pages of 750M Pages)
Chopin (Collaborative Project)
Early English Books Online
www.bl.uk
16
Major Programmes/3
Gutenberg Bible
Digital Object Management (DOM)
Programme
17
DOM Programme vision
Our mission is to enable the United Kingdom to preserve
and use its digital intellectual property forever
Our vision is create a management system for digital objects
that will
store and preserve any type of digital material in perpetuity
provide access to this material to users with appropriate permissions
ensure that the material is easy to find
ensure that users can view the material with contemporary applications
ensure that users can, where possible, experience material with the original look-and-feel
18
Introduction - history
Digital Library PFI
Mar 1997 – Dec 1998
Digital Library System
1999 – early 2002
Lessons
DOM Report
Nov 2002
The DOM Programme
Started September 2003
19
Drivers for the BL DOM Programme
Legal deposit legislation for non-print material was granted royal assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12T of material per year (with 50 year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….
We need a generic and cost-effective approach for the secure long term
storage of digital material that is produced by numerous initiatives
20
DOM – many topics to address
[Chart: components plotted by estimated size of component (low to high) against component ambiguity / complexity (low to high)]
Components shown: LDEP, Workflow, Resource Discovery, LDLSE, VDEP, Rights Management, Metadata Definition, Technical Requirements, File Format Registry, Persistent Identifiers, File Conversion Utilities, Authentication, Prototypes, SDM, RADM, Interfaces, Strategy Development
Also shown: ILS, Web Archiving, Digitisation Programme
Legend: Started, Planned, Non-DOM projects, Planned co-operation
LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications
21
Scope - life cycle of objects
Collection
Selection
Acquisition
Accession
Description
Preservation
Storage
Preservation
Access
Resource discovery
Delivery
Rendering
22
Scope – objects and processes
Preservation store
Preserves the bit stream in perpetuity
Access store
Access versions
Limited formats – in the flavour of the era
Metadata to support resource discovery
Descriptive, Administrative, Links with existing tools e.g. Integrated
Library System (ILS)
Workflow
Ingest, e.g. Legal Deposit processing
23
[Diagram with the following labelled groups]
Access: DOM Resource Discovery, Delivery, Digital Rights Management
Shared services: DOM Storage, Signing, Authentication, Metadata, Persistent ID
Ingest: Donations; Document Supply (Publishers, Archives, Grey Literature, Non-Serial Store, Archiving, Operational Stores); Legal Deposit (LDL Secure Environment, Legal Deposit Items, Legal Deposit Processing); Web Archiving; Digitisation (St Pancras Studios, Newspapers, NSA)
24
Timeline
BC: ET approve Business Case & Timeline
Definition. R0
• Prototype will provide a basic preservation-quality digital object storage module
Operational Storage Sub System. R1
• Consolidate R0 into operational system
• Provide preservation-quality digital store for materials received under Voluntary Deposit of Electronic Publications (VDEP)
• Integrate it with the existing VDEP front-end
1st Content Stream ingest. R2
• Support ingest for a major content stream
• Integrate with core Library systems as required
LDEP - initial format. R3 & R4
• Provide functionality for material covered by LDEP secondary legislation
Open DOM to new projects. R5+
[Timeline axis: 2003, 2004, 2005, with milestones BC, R0, R1, R2, R3, R4]
25
DOM: Project definition - 1
Functional Architecture “What”
Prototyping – assessing market solutions
Example issues: digital rights, file formats, etc
Logical architecture “how – overall architecture”
Prototyping - basic functioning architecture
Example issues: allow changes to new suppliers, relationships to ILS, other projects etc
Physical architecture “how – storage & specifics”
Prototyping - principal solutions and options
Example issues: how do we build it cost-effectively today, supplier selection criteria
• Business case
• Planning – incremental implementation phases
Cross team workshops – reviewing progress, debating detailed technical
issues, planning immediate priorities, risk management & way forward
26
DOM: Project definition - 2
Approach is to be incremental and not ‘Big Bang’
We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a
good solution
A principal goal is to define:
An overall long term “logical architecture”
Within which, there will be successive generations of physical architectures
We are building an understanding of the storage marketplace, and we will use that knowledge to manage
procurement
We are certain that we will need >500T of storage but we are uncertain when – we thus need
flexible scalable procurement
27
DOM architecture - overview
[Diagram: Resource Discovery (non-cat, ILS based RD, others), Rights Management, LDEP and Doc supply refer to an object by its DOMID]
DOM Storage Service: compound objects/relations, integrity, authenticity, local resource locator
Atomic objects each carry a unique persistent identifier (DOMID)
DOMID is mapped to node/vol/LRL in DOM Physical Storage
28
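The mapping from a persistent DOMID to a node/vol/LRL can be illustrated with a minimal sketch. Everything named here (the `Location` fields, the `DomidRegistry` class, the `domid:` prefix and the use of a UUID) is an assumption made for illustration; the slides do not give the identifier format or any interface.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Location:
    """Where one copy of an atomic object lives (hypothetical fields)."""
    node: str    # storage node
    volume: str  # volume on that node
    lrl: str     # local resource locator within the volume


class DomidRegistry:
    """Maps a persistent DOMID to its current physical locations."""

    def __init__(self) -> None:
        self._locations: Dict[str, List[Location]] = {}

    def register(self, location: Location) -> str:
        # The real DOMID format is not stated; a UUID stands in for it.
        domid = f"domid:{uuid.uuid4()}"
        self._locations[domid] = [location]
        return domid

    def migrate(self, domid: str, old: Location, new: Location) -> None:
        # Moving to a new storage generation changes the location,
        # never the identifier that other systems hold.
        locations = self._locations[domid]
        locations[locations.index(old)] = new

    def resolve(self, domid: str) -> List[Location]:
        return list(self._locations[domid])
```

The point of the indirection is that resource discovery, rights management and document supply only ever hold the DOMID, so physical storage can be replaced without touching them.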
DOM System (release 3)
[Diagram of the DOM System. Labelled elements: Aleph, Storage subsystem (two), Access, Mailroom, Publishers, Shared services, Administration]
29
DOM logical architecture –
integrity and authenticity
Integrity:
System has capability to continuously monitor the object store to detect
object corruption
It would then initiate object recovery
Authenticity:
A process is defined to provide long-term assurance that an object that
is re-presented is as it was when it was ingested
Based on the use of cryptographic signing techniques
Each object is signed when it is ingested
The signature is verified when required
The signing mechanism is “tightly” controlled
30
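The sign-on-ingest, verify-on-demand and monitor-then-recover steps can be sketched as follows. This is an illustration under stated assumptions, not the DOM design: HMAC-SHA256 from the Python standard library stands in for whatever cryptographic signing technique is actually used, and `ObjectStore`, `find_corrupt` and `recover` are hypothetical names.

```python
import hashlib
import hmac
from typing import Dict, List

# Assumption: a production system would keep the signing key under tight
# control and would probably use asymmetric signatures; a shared HMAC key
# is used here only to keep the sketch self-contained.
SIGNING_KEY = b"demo-key-not-for-production"


def sign(data: bytes) -> str:
    return hmac.new(SIGNING_KEY, data, hashlib.sha256).hexdigest()


def verify(data: bytes, signature: str) -> bool:
    return hmac.compare_digest(sign(data), signature)


class ObjectStore:
    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}
        self.signatures: Dict[str, str] = {}

    def ingest(self, object_id: str, data: bytes) -> None:
        # Each object is signed when it is ingested.
        self.objects[object_id] = data
        self.signatures[object_id] = sign(data)

    def find_corrupt(self) -> List[str]:
        # Integrity monitoring: re-check every stored object against its signature.
        return [oid for oid, data in self.objects.items()
                if not verify(data, self.signatures[oid])]

    def recover(self, object_id: str, replica: "ObjectStore") -> bool:
        # Object recovery: accept the replica's copy only if it verifies.
        candidate = replica.objects.get(object_id)
        if candidate is None or not verify(candidate, self.signatures[object_id]):
            return False
        self.objects[object_id] = candidate
        return True
```

Running `find_corrupt` on a schedule and calling `recover` for each hit gives the continuous monitoring loop described above, with the second cluster acting as the recovery source.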
Procuring physical storage in volume
A major cost is in physical storage
The market for storage systems is changing rapidly, and this implies that “lock-in” is
not sensible
We thus need flexibility to change supplier over time
Cost of storage is reducing by 30-40% per year
Hence procure on rolling basis just ahead of demand
Replace storage on a rolling basis on expiry of warranty
The rolling programmes imply the need to be able to support a heterogeneous
product solution
The design of the logical architecture thus supports storage sourced from multiple
storage vendors
31
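The argument for buying just ahead of demand can be checked with simple arithmetic. The figures below (price per terabyte, yearly demand) are hypothetical; only the 30-40% annual price decline comes from the slide, and replacement on warranty expiry is left out to keep the sketch short.

```python
from typing import Sequence


def upfront_cost(demand_tb_per_year: Sequence[float], price_per_tb: float) -> float:
    """Buy all the capacity in year one at the year-one price."""
    return sum(demand_tb_per_year) * price_per_tb


def rolling_cost(demand_tb_per_year: Sequence[float], price_per_tb: float,
                 annual_decline: float) -> float:
    """Buy each year's capacity at that year's (falling) price."""
    total = 0.0
    price = price_per_tb
    for tb in demand_tb_per_year:
        total += tb * price
        price *= 1.0 - annual_decline
    return total


if __name__ == "__main__":
    demand = [100.0] * 5  # hypothetical: 100 TB needed in each of five years
    print(upfront_cost(demand, 1000.0))
    print(rolling_cost(demand, 1000.0, 0.35))
```

With these assumed inputs, rolling purchase comes to roughly half the cost of buying all 500 TB at the year-one price, which is the effect the procurement approach relies on.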
Disaster tolerance and the organisation
of storage clusters
One can obtain commercial disaster recovery (DR) solutions for common equipment
configurations
However, one cannot obtain such solutions for multi-100 TB systems
So we must build DR into the design of the system
A single site solution, subject to a common-mode disaster, would suffer considerable loss
of availability after a disaster, and so is not acceptable
This implies that we need a multi-site solution
Conventionally these are based on a master-standby where only 50% of kit is delivering
normal service
Our design is based on the use of multiple autonomous independent peer clusters that
cross-synchronise
so 100% of the kit delivers normal service
32
DOM architecture in the context of the
storage solution market
The dominant segment of the market focuses on delivering performance within
a highly resilient single cluster
However:
Many of our objects will be rarely accessed
so we do not want to pay for “maximised” performance we do not need
We have resilience by using multiple clusters, hence we have a reduced
need for resilience within a cluster
so we do not want to pay for “maximised” resilience we do not need
We are using these drivers to design a cost-effective large scale resilient
solution
33
DOM storage subsystem architecture - overview
[Diagram: two storage clusters, each with a DOM Storage Gateway, a DOM Storage Service and DOM Physical Storage, together with DOM Shared Services (Unique ID, Signing, Logging)]
34
DOM storage subsystem architecture - access
Normal access/delivery is from the local storage cluster
[Diagram: two storage clusters, each with a DOM Storage Gateway, a DOM Storage Service and DOM Physical Storage, together with DOM Shared central Services (Unique ID, Signing, Logging)]
35
DOM storage subsystem architecture - access
When a cluster is off-line then access/delivery is from a remote storage cluster
[Diagram: two storage clusters, each with a DOM Storage Gateway, a DOM Storage Service and DOM Physical Storage, together with DOM Shared central Services (Unique ID, Signing, Logging)]
36
DOM storage subsystem architecture - ingest
Normal ingest is to the local storage cluster and then the remote cluster is synchronised
[Diagram: two storage clusters, each with a DOM Storage Gateway, a DOM Storage Service and DOM Physical Storage, together with DOM Shared central Services (Unique ID, Signing, Logging); arrows labelled "Store" and "Synchronise remote store"]
37
DOM storage subsystem architecture - ingest
When a cluster is off-line then ingest is managed by the remote storage cluster and the local cluster is synchronised later
[Diagram: two storage clusters, each with a DOM Storage Gateway, a DOM Storage Service and DOM Physical Storage, together with DOM Shared central Services (Unique ID, Signing, Logging); arrows labelled "Store" and "Synchronise remote store later"]
38
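The four access and ingest cases above (local when possible, remote when the local cluster is off-line, synchronise afterwards) can be simulated in a few lines. This is a toy model under assumptions: `Cluster`, `Gateway` and `synchronise` are hypothetical names, objects are held in memory, and conflict handling, signing and logging are ignored.

```python
from typing import Dict, List


class Cluster:
    """One autonomous peer storage cluster."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.online = True
        self.store: Dict[str, bytes] = {}
        self.pending: List[str] = []  # objects the peer has not yet received


def synchronise(a: Cluster, b: Cluster) -> None:
    """Cross-synchronise two peers; does nothing while either is off-line."""
    if not (a.online and b.online):
        return
    for source, dest in ((a, b), (b, a)):
        for object_id in source.pending:
            dest.store[object_id] = source.store[object_id]
        source.pending.clear()


class Gateway:
    """Routes requests to the local cluster, or to the remote peer if it is off-line."""

    def __init__(self, local: Cluster, remote: Cluster) -> None:
        self.local = local
        self.remote = remote

    def _target(self) -> Cluster:
        if self.local.online:
            return self.local
        if self.remote.online:
            return self.remote
        raise RuntimeError("no storage cluster available")

    def ingest(self, object_id: str, data: bytes) -> str:
        target = self._target()
        target.store[object_id] = data
        target.pending.append(object_id)
        synchronise(self.local, self.remote)  # immediate if both peers are up
        return target.name

    def access(self, object_id: str) -> bytes:
        return self._target().store[object_id]
```

Because both peers accept ingest and serve access in normal running, all of the equipment does useful work, in contrast to a master-standby pair where the standby sits idle.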
In conclusion
We plan for generations of physical storage
Migration from one generation to the next
Allow changes of supplier
Purchase incrementally in modest quantities
Move quickly when required
Be cost conscious
We provide assurance that an object is held and re-presented as when it was
ingested
We are designing a cost-effective large scale resilient solution
In summary: we take a long term view
39
Major Programmes/4
Web Archiving Programme
40
Structure of Programme
Web Archiving Programme is a collaborative
initiative, implemented largely through two
consortia
UK Web Archiving Consortium
Developing a selective approach to web archiving, procuring a common
web archiving infrastructure and software to begin archiving activities at
the earliest opportunity
International Internet Preservation Consortium
Developing advanced web archiving technologies for the long-term,
large-scale, continuous crawling requirements enabled through
legislation
41
UK Web Archiving Consortium
Developing a selective approach to web archiving
License for PANDAS about to be signed with NLA
Sub-licenses with consortium partners and contractor to follow
ITT concluded with Magus Research winning the contract.
Implement a common web archiving infrastructure (lots of Linux
machines + PANDAS)
Provide customisation/development of PANDAS
Provide help desk and support
42
International Internet Preservation
Consortium
Developing advanced web archiving technologies
Smart Crawler
Continuous adaptive crawler, adjusting crawl priority on the fly
Based on IA Heritrix
Working on requirements now
Expect to begin tender process in June
Content Management
Archival formats
Framework
Metrics and Test Bed
43
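One way to picture a crawler that adjusts crawl priority on the fly is a priority queue whose revisit interval shrinks for pages that change and grows for pages that do not. This is a generic sketch, not the Smart Crawler design and not the Heritrix API; the class name, the halving/doubling rule and the interval limits are all assumptions.

```python
import heapq
from typing import Dict, List, Tuple


class AdaptiveFrontier:
    """Crawl queue ordered by next due time, with per-URL revisit intervals."""

    def __init__(self, min_interval: float = 1.0, max_interval: float = 64.0) -> None:
        self.min_interval = min_interval
        self.max_interval = max_interval
        self._heap: List[Tuple[float, str]] = []
        self._interval: Dict[str, float] = {}

    def add(self, url: str, now: float = 0.0, interval: float = 8.0) -> None:
        self._interval[url] = interval
        heapq.heappush(self._heap, (now, url))

    def next_due(self) -> Tuple[float, str]:
        # Returns (due_time, url) for the most urgent page.
        return heapq.heappop(self._heap)

    def report(self, url: str, now: float, changed: bool) -> None:
        # Assumed rule: halve the interval if the page changed, double it if not,
        # then clamp to the configured limits and reschedule.
        factor = 0.5 if changed else 2.0
        interval = self._interval[url] * factor
        interval = min(self.max_interval, max(self.min_interval, interval))
        self._interval[url] = interval
        heapq.heappush(self._heap, (now + interval, url))
```

Pages that change often drift towards the front of the queue, so crawl effort follows observed change rather than a fixed schedule.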
External Collaboration
44
Digital Library Collaborations/Partnerships
Current
UK Digital Preservation Coalition
Founder Member
TEL (The European Library Project)
Web Archiving UK
JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library Of Wales
International Internet Preservation Consortium
BNF, Library Of Congress, Internet Archive, National Archives & Library Of Canada, National Library Of
Australia, National Library Of Italy, National Libraries Of Nordic Countries
JISC Funded - Digital Curation Centre
Persistent Identifiers
DOI Foundation, European National Libraries (KB & DDB)
Resource Discovery
Union Catalogues (SUNCAT)
Digital Library Federation
45
Digital Library Collaborations/Partnerships
Potential
Secure Legal Deposit Network
6 Legal Deposit Libraries
Global Digital Format Registry
Potential Partners (National Archives, DLF)
Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
KB (Netherlands National Library & Other Partners – FP6 Bid)
Digital Rights Management
Potential Partners (Publishers, JISC)
Metadata
Publishers, Others ?
Authentication
JISC ?
Resource Discovery
Search Engine Vendors, Researchers
Others ???
46
Conclusions
• Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success
Can You Work With Us?
47
Slide 36
British Computer Society
North London Branch
Major Programmes
Richard Boulderstone
July 27, 2004
1
Agenda
•
•
•
•
•
•
•
•
•
The British Library
Vision
Our Audiences/Customers
ILS
Digitisation
Digital Object Management
Web Archiving
Collaboration
Conclusions
Magna Carta
www.bl.uk
2
What Is The British Library ?
Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of
Science and Invention (1855), National Central Library (1916), and National
Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India
Office Library and Records in 1982, and British Institute of Recorded Sound in
1983
• Flagship building at St Pancras - largest public building project in Great
Britain in 20th century - opened in 1998
•
www.bl.uk
3
World-Class Research Library
Key Statistics 2002/3
• 150 million items
• 8.2 million items consulted or supplied
• 408,000 reading room visits
• 618,000 catalogue records created
• 554,000 items received on legal deposit
• 651 km shelf capacity, 92% full; adding 12 km each year
• 18.5M Web Site Hits (www.bl.uk)
• 2,400 staff
• £85.2 million Grant in Aid and £27.0 million trading income in 2001/2
• Annual report - http://www.bl.uk/about/annual/latest.html
4
‘The World’s Knowledge’
Outcome Based Vision
To help people advance knowledge to enrich lives
… by aiding scientific advances
… by adding commercial value for businesses
… by contributing to UK “knowledge economy”
… through the pursuit of academic excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural wellbeing
… by giving information relevant to their interests
… by helping to find the next medical breakthrough
… by creating a link between the past, present and future
5
[Diagram: Our Audiences/Customers. Five audience groups are shown: BUSINESS, EDUCATION, PUBLIC, RESEARCHER and LIBRARIES, with the lifelong learner appearing alongside several of them. Audiences named include high R+D industries, professional services, creative industries, publishing industries, SMEs, school libraries, teachers, students 11>18, visitors (child + adult), postgraduates/undergraduates, scholars, librarians, commercial researchers, public libraries, H.E. libraries, broadcasting (e.g. BBC) and publishing (e.g. OED). Services named include resource discovery, bespoke services, research services, document supply, reprographics, Innovation Centre, on-site visits, school tours, web learning, exhibitions, events, tours, publishing, reading rooms, searching tools, training and best practice.]
6
Major Programmes/1
Da Vinci Notebook
Integrated Library System (ILS)
Programme
7
ILS: Development
Data migration
Due to finish in a few days
16M+ BL records
10M+ records from other sources
Online ILS software
All online changes made (mainly interfaces) – final tests
Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
Most of those needed for go-live are done
Rest in priority order
8
ILS: Implementation
Training
Courses to end-users well underway
‘Practice’ system available
‘Search only’ training also underway
Testing
Functional testing (end to end) nearly complete
Performance poor – OPAC very slow
Automated stress testing (LoadRunner scripts)
eIS trying to find area of problem
Ex Libris experts flying over
Some security ‘hardening’ needed
9
ILS: Cutover from legacy systems
Now:
Temporary Aleph cataloguing
Phase 1 – internal processing
Staggered take-on of users to ease cutover problems
Merge ‘temporary’ records
7 June:
Phase 2 – reading rooms
Reading rooms closed for cutover 26-29 June
Mainly brand-new PCs etc rather than XP upgrade
30 June:
Phase 3 – remote users
Could be delayed if there are major problems
30 July:
10
11
Future ILS development (ILS/2)
Current ILS development seen just as the start
Extra records
E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
E.g. Preservation records
Links to other new BL systems
E.g. Digital Object Management (images, web pages etc)
New releases of Ex Libris packages
12
Major Programmes/2
International Dunhuang Project
Digitisation Programme
13
Background
Digitisation Is The Process Of Converting Existing Physical Items Into Digital
Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character
Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image to Sub £1/Image.
Automatic Devices To Digitise Are Becoming Available But Are Expensive (~£150,000).
BL Has Had Fairly Ad Hoc Approach Driven By
External Funding Opportunities
Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project
Management
BL Has Created About 1.5M Digital Images So Far…
14
Digitisation Strategy
Digitisation Strategy Project Was Formally Initiated On February 2, 2004
Key objectives for the project are to define:
Selection Criteria
Uniform Approach
Communications Plan
Sustainability
Intellectual Property Rights
External Relationship Management
Funding
Integration with DOMS
15
Project Status Information
Definitive Register of Projects
19 Complete
19 Current
20 Planning
JISC Sound (3,900 Hours)
JISC Newspapers (2M Pages of 750M Pages)
Chopin (Collaborative Project)
Early English Books Online
16
Major Programmes/3
Gutenberg Bible
Digital Object Management (DOM)
Programme
17
DOM Programme vision
Our mission is to enable the United Kingdom to preserve
and use its digital intellectual property forever
Our vision is to create a management system for digital objects
that will
store and preserve any type of digital material in perpetuity
provide access to this material to users with appropriate permissions
ensure that the material is easy to find
ensure that users can view the material with contemporary applications
ensure that users can, where possible, experience material with the original look-and-feel
18
Introduction - history
Digital Library PFI
Mar 1997 – Dec 1998
Digital Library System
1999 – early 2002
Lessons
DOM Report
Nov 2002
The DOM Programme
Started September 2003
19
Drivers for the BL DOM Programme
Legal deposit legislation for non-print material was granted royal assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12T of material per year (with 50 year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….
We need a generic and cost-effective approach for the secure long term
storage of digital material that is produced by numerous initiatives
20
DOM – many topics to address
[Chart: DOM components plotted by estimated size of component (low to high) against component ambiguity / complexity (low to high). Components shown: ILS, Web Archiving, Digitisation Programme, LDEP, Workflow, Resource Discovery, LDLSE, VDEP, Rights Management, Metadata Definition, Technical Requirements, File Format Registry, Persistent Identifiers, File Conversion Utilities, Authentication, Prototypes, SDM, RADM, Interfaces, Strategy Development. Legend: Started; Planned; Non-DOM projects; Planned co-operation.]
LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications
21
Scope - life cycle of objects
Collection
Selection
Acquisition
Accession
Description
Preservation
Storage
Preservation
Access
Resource discovery
Delivery
Rendering
22
Scope – objects and processes
Preservation store
Preserves the bit stream in perpetuity
Access store
Access versions
Limited formats – in the flavour of the era
Metadata to support resource discovery
Descriptive, Administrative, Links with existing tools e.g. Integrated
Library System (ILS)
Workflow
Ingest, e.g. Legal Deposit processing
23
[Diagram: DOM in context. Labels shown, in order: ACCESS (DOM Resource Discovery, Delivery, Digital Rights Management); DOM shared services (Storage, Signing, Authentication, Metadata, Persistent ID, Ingest); DONATIONS; DOCUMENT SUPPLY (Publishers, Archives, Grey Literature, Non-Serial Store); Archiving Operational Stores; LEGAL DEPOSIT (LDL Secure Environment, Legal Deposit Items, Legal Deposit Processing); WEB ARCHIVING; DIGITISATION (St Pancras Studios, Newspapers, NSA).]
24
Timeline
BC: ET approve Business Case & Timeline
R0 (Definition): Prototype will provide a basic preservation-quality digital object storage module
R1 (Operational Storage Sub System):
• Consolidate R0 into operational system
• Provide preservation-quality digital store for materials received under Voluntary Deposit of Electronic Publications (VDEP)
• Integrate it with the existing VDEP front-end
R2 (1st Content Stream ingest):
• Support ingest for a major content stream
• Integrate with core Library systems as required
R3 & R4 (LDEP - initial format): Provide functionality for material covered by LDEP secondary legislation
R5+: Open DOM to new projects
Time axis: 2003, 2004, 2005
25
DOM: Project definition - 1
Functional architecture “what”
Prototyping – assessing market solutions
Logical architecture “how – overall architecture”
Prototyping – basic functioning architecture
Physical architecture “how – storage & specifics”
Prototyping – principal solutions and options
• Business case
• Planning – incremental implementation phases
Example issues
digital rights, file formats, etc
allow changes to new suppliers, relationships to ILS, other projects etc
how do we build it cost-effectively today, supplier selection criteria
Cross team workshops – reviewing progress, debating detailed technical issues, planning immediate priorities, risk management & way forward
26
DOM: Project definition - 2
Approach is to be incremental and not ‘Big Bang’
We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a
good solution
A principal goal is to define:
An overall long term “logical architecture”
Within which, there will be successive generations of physical architectures
We are building an understanding of the storage marketplace, and we will use that knowledge to manage
procurement
We are certain that we will need >500T of storage but we are uncertain when – we thus need
flexible scalable procurement
27
DOM architecture - overview
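[Diagram: DOM architecture. Client systems (ILS-based resource discovery, non-catalogue resource discovery, others, rights management, LDEP, document supply) pass a DOMID to the DOM Storage Service and receive an object. The DOM Storage Service handles compound objects/relations, integrity and authenticity, and atomic objects identified by a unique persistent identifier (DOMID). Each DOMID is mapped to node/vol/LRL (local resource locator) in DOM Physical Storage.]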
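The mapping from DOMID to node/vol/LRL noted in the diagram can be illustrated with a small lookup table. This is a minimal sketch only: the class names, the DOMID string format and the field names are hypothetical and are not taken from the BL design.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Location:
    """Where one copy of an object lives (field names are illustrative)."""
    node: str    # storage node within a cluster
    volume: str  # volume on that node
    lrl: str     # local resource locator on that volume


class DomidResolver:
    """Maps a persistent identifier to its current physical location."""

    def __init__(self) -> None:
        self._table: Dict[str, Location] = {}

    def register(self, domid: str, location: Location) -> None:
        self._table[domid] = location

    def relocate(self, domid: str, new_location: Location) -> None:
        # The identifier stays the same when storage is replaced or migrated.
        if domid not in self._table:
            raise KeyError(domid)
        self._table[domid] = new_location

    def resolve(self, domid: str) -> Optional[Location]:
        return self._table.get(domid)


resolver = DomidResolver()
resolver.register("domid-0001", Location("node-a", "vol-01", "objects/0001.bin"))
resolver.relocate("domid-0001", Location("node-b", "vol-07", "objects/0001.bin"))
```

Keeping the identifier separate from the location is what lets physical storage be replaced generation by generation without changing references held by other systems.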
28
DOM System (release 3)
[Diagram: DOM System (release 3). Components shown: Aleph, two storage subsystems, access, mailroom, publishers, shared services and administration.]
29
DOM logical architecture –
integrity and authenticity
Integrity:
System has capability to continuously monitor the object store to detect
object corruption
It would then initiate object recovery
Authenticity:
A process is defined to provide long-term assurance that an object that
is re-presented is as it was when it was ingested
Based on the use of cryptographic signing techniques
Each object is signed when it is ingested
The signature is verified when required
The signing mechanism is “tightly” controlled
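The sign-at-ingest and verify-on-demand steps can be sketched as follows. The slides do not name a signing scheme, so this sketch assumes a keyed HMAC-SHA256 tag from the Python standard library as a stand-in for a full digital signature; the key, function names and sample objects are hypothetical.

```python
import hashlib
import hmac

# Hypothetical key; a real deployment would keep this under tight control.
SIGNING_KEY = b"example-key-not-for-production"


def sign_object(data: bytes, key: bytes = SIGNING_KEY) -> str:
    """Return a hex tag computed over the object's bytes at ingest."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify_object(data: bytes, tag: str, key: bytes = SIGNING_KEY) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(sign_object(data, key), tag)


def find_corrupt(store: dict, tags: dict) -> list:
    """Integrity sweep: list identifiers whose bytes no longer match their tag."""
    return [oid for oid, data in store.items() if not verify_object(data, tags[oid])]


store = {"obj-1": b"first object", "obj-2": b"second object"}
tags = {oid: sign_object(data) for oid, data in store.items()}
store["obj-2"] = b"second objecT"  # simulate corruption of one stored object
corrupt = find_corrupt(store, tags)
```

An object flagged by the sweep would then be a candidate for recovery from another copy.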
30
Procuring physical storage in volume
A major cost is in physical storage
The market for storage systems is changing rapidly, and this implies that “lock-in” is
not sensible
We thus need flexibility to change supplier over time
Cost of storage is reducing by 30-40% per year
Hence procure on rolling basis just ahead of demand
Replace storage on a rolling basis on expiry of warranty
The rolling programmes imply the need to be able to support a heterogeneous
product solution
The design of the logical architecture thus supports storage sourced from multiple
storage vendors
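The case for buying just ahead of demand can be shown with a short calculation. The figures are illustrative assumptions only: a starting price of 1,000 per terabyte, demand of 100 TB a year for five years, and a 35% annual price fall (the middle of the 30-40% range quoted above). Warranty-driven replacement is not modelled.

```python
def upfront_cost(total_tb: float, price_per_tb: float) -> float:
    """Cost of buying all capacity in year one at today's price."""
    return total_tb * price_per_tb


def rolling_cost(yearly_tb: list, price_per_tb: float, annual_decline: float) -> float:
    """Cost of buying each year's capacity at that year's (falling) price."""
    cost = 0.0
    price = price_per_tb
    for tb in yearly_tb:
        cost += tb * price
        price *= (1.0 - annual_decline)
    return cost


demand = [100.0] * 5                      # hypothetical: 100 TB needed each year
up = upfront_cost(sum(demand), 1000.0)    # buy all 500 TB now
roll = rolling_cost(demand, 1000.0, 0.35) # buy 100 TB a year as prices fall
```

Under these assumptions the rolling purchase costs roughly half as much as buying everything up front.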
31
Disaster tolerance and the organisation
of storage clusters
One can obtain commercial disaster recovery (DR) solutions for common equipment
configurations
However, one cannot obtain such solutions for multi-100 TB systems
So we must build DR into the design of the system
A single site solution, subject to a common-mode disaster, would suffer considerable loss
of availability after a disaster, and so is not acceptable
This implies that we need a multi-site solution
Conventionally these are based on a master-standby where only 50% of kit is delivering
normal service
Our design is based on the use of multiple autonomous independent peer clusters that
cross-synchronise
so 100% of the kit delivers normal service
32
DOM architecture in the context of the
storage solution market
The dominant segment of the market focuses on delivering performance within
a highly resilient single cluster
However:
Many of our objects will be rarely accessed
so we do not want to pay for “maximised” performance we do not need
We have resilience by using multiple clusters, hence we have a reduced
need for resilience within a cluster
so we do not want to pay for “maximised” resilience we do not need
We are using these drivers to design a cost-effective large scale resilient
solution
33
DOM storage subsystem architecture - overview
[Diagram: two storage clusters, each comprising a DOM Storage Gateway, a DOM Storage Service and DOM Physical Storage, both using DOM Shared Services (unique ID, signing, logging).]
34
DOM storage subsystem architecture - access
[Diagram: two storage clusters (each with DOM Storage Gateway, DOM Storage Service and DOM Physical Storage) and central DOM Shared Services (unique ID, signing, logging).]
Normal access/delivery is from the local storage cluster
35
DOM storage subsystem architecture - access
[Diagram: two storage clusters (each with DOM Storage Gateway, DOM Storage Service and DOM Physical Storage) and central DOM Shared Services (unique ID, signing, logging).]
When a cluster is off-line, access/delivery is from a remote storage cluster
36
DOM storage subsystem architecture - ingest
[Diagram: two storage clusters (each with DOM Storage Gateway, DOM Storage Service and DOM Physical Storage) and central DOM Shared Services (unique ID, signing, logging). Arrows: "Store" to the local cluster, "Synchronise remote store" to the remote cluster.]
Normal ingest is to the local storage cluster and then the remote cluster is synchronised
37
DOM storage subsystem architecture - ingest
[Diagram: two storage clusters (each with DOM Storage Gateway, DOM Storage Service and DOM Physical Storage) and central DOM Shared Services (unique ID, signing, logging). Arrows: "Store" to the remote cluster, "Synchronise remote store later".]
When a cluster is off-line, ingest is managed by the remote storage cluster and the local cluster is synchronised later
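The access and ingest behaviour described on the last four slides can be sketched with two in-memory peer clusters. This is a toy illustration under stated assumptions: the class and function names are hypothetical, synchronisation is modelled as a direct copy, and nothing here reflects the actual DOM interfaces.

```python
from typing import Dict, List, Optional


class Cluster:
    """One autonomous storage cluster (a toy in-memory stand-in)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.online = True
        self.objects: Dict[str, bytes] = {}
        self.backlog: List[str] = []  # ids still to be copied to the peer


def ingest(local: Cluster, remote: Cluster, oid: str, data: bytes) -> str:
    """Store in the local cluster if it is up, otherwise in the remote one."""
    primary, peer = (local, remote) if local.online else (remote, local)
    if not primary.online:
        raise RuntimeError("no cluster available")
    primary.objects[oid] = data
    if peer.online:
        peer.objects[oid] = data      # synchronise straight away
    else:
        primary.backlog.append(oid)   # synchronise later
    return primary.name


def access(local: Cluster, remote: Cluster, oid: str) -> Optional[bytes]:
    """Read from the local cluster, falling back to the remote one."""
    for cluster in (local, remote):
        if cluster.online and oid in cluster.objects:
            return cluster.objects[oid]
    return None


def resynchronise(source: Cluster, target: Cluster) -> None:
    """Copy queued objects once the target cluster is back on-line."""
    if not target.online:
        return
    for oid in source.backlog:
        target.objects[oid] = source.objects[oid]
    source.backlog.clear()


site_a, site_b = Cluster("A"), Cluster("B")
ingest(site_a, site_b, "obj-1", b"one")              # normal: stored in A, synced to B
site_a.online = False
stored_in = ingest(site_a, site_b, "obj-2", b"two")  # A is down: B takes the ingest
site_a.online = True
resynchronise(site_b, site_a)                        # A catches up afterwards
```

Because either cluster can serve reads and accept ingest, both carry normal load, in contrast to a master-standby pair where the standby sits idle.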
38
In conclusion
We plan for generations of physical storage
Migration from one generation to the next
Allow changes of supplier
Purchase incrementally in modest quantities
Move quickly when required
Be cost conscious
We provide assurance that an object is held and re-presented as when it was
ingested
We are designing a cost-effective large scale resilient solution
In summary: we take a long term view
39
Major Programmes/4
Web Archiving Programme
40
Structure of Programme
Web Archiving Programme is a collaborative
initiative, implemented broadly across two
consortiums
UK Web Archiving Consortium
Developing a selective approach to web archiving, procuring a common
web archiving infrastructure and software to begin archiving activities at
the earliest opportunity
International Internet Preservation Consortium
Developing advanced web archiving technologies for the long-term,
large-scale, continuous crawling requirements enabled through
legislation
41
UK Web Archiving Consortium
Developing a selective approach to web archiving
License for PANDAS about to be signed with NLA
Sub-licenses with consortium partners and contractor to follow
ITT concluded with Magus Research winning the contract.
Implement a common web archiving infrastructure (lots of Linux
machines + PANDAS)
Provide customisation/development of PANDAS
Provide help desk and support
42
International Internet Preservation
Consortium
Developing advanced web archiving technologies
Smart Crawler
Continuous adaptive crawler, adjusting crawl priority on the fly
Based on IA Heritrix
Working on requirements now
Expect to begin tender process in June
Content Management
Archival formats
Framework
Metrics and Test Bed
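The idea of a continuous adaptive crawler that adjusts crawl priority on the fly can be sketched with a priority queue. This is not the Heritrix API or the IIPC design: the class, the halve/double rule and the example URLs are assumptions made purely for illustration.

```python
import heapq
from typing import Dict, List, Tuple


class AdaptiveScheduler:
    """Toy revisit scheduler: pages that change often are crawled sooner."""

    def __init__(self, min_interval: float = 1.0, max_interval: float = 64.0) -> None:
        self.min_interval = min_interval
        self.max_interval = max_interval
        self.interval: Dict[str, float] = {}
        self.queue: List[Tuple[float, str]] = []  # (next due time, url)

    def add(self, url: str, now: float, interval: float = 8.0) -> None:
        self.interval[url] = interval
        heapq.heappush(self.queue, (now, url))

    def next_due(self) -> Tuple[float, str]:
        return heapq.heappop(self.queue)

    def report(self, url: str, now: float, changed: bool) -> None:
        # Halve the revisit interval after a change, double it otherwise.
        factor = 0.5 if changed else 2.0
        new = min(self.max_interval, max(self.min_interval, self.interval[url] * factor))
        self.interval[url] = new
        heapq.heappush(self.queue, (now + new, url))


scheduler = AdaptiveScheduler()
scheduler.add("http://example.org/news", now=0.0)
scheduler.add("http://example.org/about", now=0.0)
for _ in range(2):
    due, url = scheduler.next_due()
    # Pretend only the news page had changed when it was fetched.
    scheduler.report(url, now=due, changed=url.endswith("/news"))
first_again = scheduler.next_due()[1]
```

After one round, the page that changed is scheduled for its next visit ahead of the page that did not.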
43
External Collaboration
44
Digital Library Collaborations/Partnerships
Current
UK Digital Preservation Coalition
Founder Member
TEL (The European Library Project)
Web Archiving UK
JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library Of Wales
International Internet Preservation Consortium
BNF, Library Of Congress, Internet Archive, National Archives & Library Of Canada, National Library Of
Australia, National Library Of Italy, National Libraries Of Nordic Countries
JISC Funded - Digital Curation Centre
Persistent Identifiers
DOI Foundation, European National Libraries (KB & DDB)
Resource Discovery
Union Catalogues (SUNCAT)
Digital Library Federation
45
Digital Library Collaborations/Partnerships
Potential
Secure Legal Deposit Network
6 Legal Deposit Libraries
Global Digital Format Registry
Potential Partners (National Archives, DLF)
Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
KB (Netherlands National Library & Other Partners – FP6 Bid)
Digital Rights Management
Potential Partners (Publishers, JISC)
Metadata
Publishers, Others ?
Authentication
JISC ?
Resource Discovery
Search Engine Vendors, Researchers
Others ???
46
Conclusions
• Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success
• Can You Work With Us?
47
Slide 37
British Computer Society
North London Branch
Major Programmes
Richard Boulderstone
July 27, 2004
1
Agenda
•
•
•
•
•
•
•
•
•
The British Library
Vision
Our Audiences/Customers
ILS
Digitisation
Digital Object Management
Web Archiving
Collaboration
Conclusions
Magna Carta
www.bl.uk
2
What Is The British Library ?
Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of
Science and Invention (1855), National Central Library (1916), and National
Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India
Office Library and Records in 1982, and British Institute of Recorded Sound in
1983
• Flagship building at St Pancras - largest public building project in Great
Britain in 20th century - opened in 1998
•
www.bl.uk
3
World-Class Research Library
Key Statistics 2002/3
•
•
•
•
•
•
•
•
•
•
150 million items
8.2 million items consulted or supplied
408,000 reading room visits
618,000 catalogue records created
554,000 items received on legal deposit
651 km shelf capacity 92% full add 12 km each year
18.5M Web Site Hits (www.bl.uk)
2,400 staff
£85.2 million Grant in Aid and £27.0 million trading income in 2001/2
Annual report - http://www.bl.uk/about/annual/latest.html
www.bl.uk
4
‘The World’s Knowledge’
Outcome Based Vision
… by aiding scientific advances
… by adding commercial value for
businesses
… by contributing to UK “knowledge
economy”
… through the pursuit of academic
excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural
wellbeing
… by giving information relevant to their
interests
… by helping to find the next medical
breakthrough
… by creating a link between the past,
present and future
To help people
advance
knowledge to
enrich lives
www.bl.uk
5
High
R+D
Industries
Prof.
Services
Creative
Industries Publishing
Industries
Resource Discovery
Bespoke Services
Research Services
Document Supply
Reprographics
Innovation Centre
School
Students
Libraries Teachers 11>18
On-site Visits
School Tours
Web Learning
SMEs
EDUCATION
Lifelong
Learner
Exhibitions
Events
Tours
Publishing
Visitors
(child + adult)
Lifelong
Learner
PUBLIC
BUSINESS
Lifelong
Learner
Postgraduate/
Undergraduate
RESEARCHER
Librarians
Scholars
Lifelong
Learner
Reading Rooms
Bespoke Services
Reprographics
Publishing
Document Supply
Searching Tools
LIBRARIES
Commercial
Researcher
Public
Libraries
Broadcasting
e.g. BBC
Publishing
e.g. OED
Document Supply
Resource Discovery
Training
Best Practice
H.E.
Libraries
Public
www.bl.uk
6
2
Major Programmes/1
Da Vinci Notebook
Integrated Library System (ILS)
Programme
7
ILS: Development
Data migration
Due to finish in a few days
16M+ BL records
10M+ records from other sources
Online ILS software
All online changes made (mainly interfaces) – final tests
Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
Most ones done for go live
Rest in priority order
www.bl.uk
8
ILS: Implementation
Training
Courses to end-users well underway
‘Practice’ system available
‘Search only’ training also underway
Testing
Functional testing (end to end) nearly complete
Performance poor – OPAC very slow
Automated stress testing (LoadRunner scripts)
eIS trying to find area of problem
Ex Libris experts flying over
Some security ‘hardening’ needed
www.bl.uk
9
ILS: Cutover from legacy systems
Now:
Temporary Aleph cataloguing
Phase 1 – internal processing
Staggered take-on of users to ease cutover problems
Merge ‘temporary’ records
7 June:
Phase 2 – reading rooms
Reading rooms closed for cutover 26-29 June
Mainly brand-new PCs etc rather than XP upgrade
30 June:
Phase 3 – remote users
Could be delayed major problems
30 July:
www.bl.uk
10
www.bl.uk
11
Future ILS development (ILS/2)
Current ILS development seen just as the start
Extra records
E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
E.g. Preservation records
Links to other new BL systems
E.g. Digital Object Management (images, web pages etc)
New releases of Ex Libris packages
www.bl.uk
12
Major Programmes/2
International Dunhuang Project
Digitisation Programme
13
Background
Digitisation Is The Process Of Converting Existing Physical Items Into Digital
Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character
Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image to Sub £1/Image.
Automatic Devices To Digitise Are Becoming Availalble But Are Expensive ~£150,000.
BL Has Had Fairly Ad Hoc Approach Driven By
External Funding Opportunities
Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project
Management
BL Has Created About 1.5M Digital Images So Far…
www.bl.uk
14
Digitisation Strategy
Digitisation Strategy Project Was Formerly Initiated On February 2, 2004
Key objectives for the project are to define:
Selection Criteria
Uniform Approach
Communications Plan
Sustainability
Intellectual Property Rights
External Relationship Management
Funding
Integration with DOMS
www.bl.uk
15
Project Status Information
Definitive Register of Projects
19 Complete
19 Current
20 Planning
JISC Sound (3,900 Hours)
JISC Newspapers (2M Pages of 750M Pages)
Chopin (Collaborative Project)
Early English Books Online
www.bl.uk
16
Major Programmes/3
Gutenberg Bible
Digital Object Management (DOM)
Programme
17
DOM Programme vision
Our mission is to enable the United Kingdom to preserve
and use its digital intellectual property forever
Our vision is create a management system for digital objects
that will
store and preserve any type of digital material in perpetuity
provide access to this material to users with appropriate permissions
ensure that the material is easy to find
ensure that users can view the material with contemporary applications
ensure that users can, where possible, experience material with the original look-and-feel
www.bl.uk
18
Introduction - history
Digital Library PFI
Mar 1997 – Dec 1998
Digital Library System
1999 – early 2002
Lessons
DOM Report
Nov 2002
The DOM Programme
Started September 2003
www.bl.uk
19
Drivers for the BL DOM Programme
Legal deposit legislation for non-print material was granted royal assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12T of material per year (with 50 year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….
We need a generic and cost-effective approach for the secure long term
storage of digital material that is produced by numerous initiatives
www.bl.uk
20
DOM – many topics to address
HIGH
ILS
WEB
ARCHIVING
DIGITISATION
PROGRAMME
ESTIMATED SIZE OF COMPONENT
LDEP
WORKFLOW
RESOURCE
DISCOVERY
LDLSE
VDEP
RIGHTS
MANAGEMENT
METADATA
DEFINITION
TECHNICAL
REQUIREMENTS
FILE
FORMAT
REGISTRY
PERSISTENT
IDENTIFIERS
FILE
CONVERSION
UTILITIES
AUTHENTICATION
PROTOTYPES
SDM
RADM
INTERFACES
STRATEGY
DEVELOPMENT
LOW
LOW
COMPONENT AMBIGUITY / COMPLEXITY
Started
Planned
Non-DOM projects
Planned co-operation
HIGH
LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications
www.bl.uk
21
Scope - life cycle of objects
Collection
Selection
Acquisition
Accession
Description
Preservation
Storage
Preservation
Access
Resource discovery
Delivery
Rendering
www.bl.uk
22
Scope – objects and processes
Preservation store
Preserves the bit stream in perpetuity
Access store
Access versions
Limited formats – in the flavour of the era
Metadata to support resource discovery
Descriptive, Administrative, Links with existing tools e.g. Integrated
Library System (ILS)
Workflow
Ingest, e.g. Legal Deposit processing
www.bl.uk
23
ACCESS
DOM
Resource Discovery
Delivery
Digital Rights Management
Shared services
DOM
Storage
Signing
Authentication
Metadata
Persistent ID
Ingest
DONATIONS
DOCUMENT SUPPLY
Publishers
Archives
Grey Literature
Non-Serial
Store
Archiving
Operational
Stores
LEGAL DEPOSIT
LDL Secure
Environment
Legal Deposit
Items
Legal Deposit
Processing
WEB
ARCHIVING
DIGITISATION
St Pancras
Studios
Newspapers
NSA
www.bl.uk
24
Timeline
Prototype will provide a
basic preservationquality digital object
storage module
Definition. R0
ET approve
Business Case
& Timeline
•Consolidate R0 into operational
system
•Provide preservation-quality digital
store for materials received under
Voluntary Deposit of Electronic
Publications (VDEP)
•Integrate it with the existing VDEP
front-end
BC
R0
Operat’l Storage
Sub System. R1
•Support ingest for a
major content stream
•Integrate with core
Library systems as
required
R1
1st Content
Stream ingest. R2
R2
LDEP - initial
format. R3 & R4
Provide functionality
for material covered by
LDEP secondary
legislation
R3
R4
Open DOM to new
projects. R5+
2003
2004
2005
www.bl.uk
25
DOM: Project definition - 1
Example issues
Functional Architecture “What”
digital rights, file
formats, etc
Prototyping – assessing market solutions
allow changes to
new suppliers,
relationships to ILS,
other projects etc
Logical architecture “how – overall architecture”
Prototyping - basic functioning architecture
how do we build it
cost-effectively
today, supplier
selection criteria
Physical architecture “how – storage & specifics”
Prototyping - principal solutions and options
• Business case
• Planning – incremental
implementation phases
Cross team workshops – reviewing progress, debating detailed technical
issues, planning immediate priorities, risk management & way forward
www.bl.uk
26
DOM: Project definition - 2
Approach is to be incremental and not ‘Big Bang’
We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a
good solution
A principal goal is to define:
An overall long term “logical architecture”
Within which, there will be successive generations of physical architectures
We are understanding the storage marketplace, and we will use the knowledge to manage
procurement
We are certain that we will need >500T of storage but we are uncertain when – we thus need
flexible scalable procurement
www.bl.uk
27
DOM architecture - overview
Resource Discovery
Non-cat
ILS
based RD
Others
Rights Management
LDEP
DOMID
OBJECT
DOM Storage Service
Compound
objects/relations
Integrity
Authenticity
Local resource locator
Object
Atomic
Objects
Unique persistent
identifier (DOMID)
Doc
supply
DOMID is
mapped to
node/vol/LRL
DOM Physical Storage
www.bl.uk
28
DOM System (release 3)
Aleph
Storage subsystem
Storage
subsystem
Access
Mailroom
Publishers
Shared services
Administration
DOM System
www.bl.uk
29
DOM logical architecture –
integrity and authenticity
Integrity:
System has capability to continuously monitor the object store to detect
object corruption
It would then initiate object recovery
Authenticity:
A process is defined to provide long-term assurance that an object that
is re-presented is as it was when it was ingested
Based on the use of cryptographic signing techniques
Each object is signed when it is ingested
The signature is verified when required
The signing mechanism is “tightly” controlled
www.bl.uk
30
Procuring physical storage in volume
A major cost is in physical storage
The market for storage systems is changing rapidly, and this implies that “lock-in” is
not sensible
We thus need flexibility to change supplier over time
Cost of storage is reducing by 30-40% per year
Hence procure on rolling basis just ahead of demand
Replace storage on a rolling basis on expiry of warranty
The rolling programmes imply the need to be able to support a heterogeneous
product solution
The design of the logical architecture thus supports storage sourced from multiple
storage vendors
www.bl.uk
31
Disaster tolerance and the organisation
of storage clusters
One can obtain commercial disaster recovery (DR) solutions for common equipment
configurations
However one cannot obtain such solutions for systems comprising multi-100 Tb systems
So we must build in the need for DR into the design of the system
A single site solution, subject to a common-mode disaster, would suffer considerable loss
of availability after a disaster, and so is not acceptable
This implies that we need a multi-site solution
Conventionally these are based on a master-standby where only 50% of kit is delivering
normal service
Our design is based on the use of multiple autonomous independent peer clusters that
cross-synchronise
so 100% of the kit delivers normal service
www.bl.uk
32
DOM architecture in the context of the
storage solution market
The dominant segment of the market focuses on delivering performance within
a highly resilient single cluster
However:
Many of our objects will be rarely accessed
so we do not want to pay for “maximised” performance we do not need
We have resilience by using multiple clusters, hence we have a reduced
need for resilience within a cluster
so we do not want to pay for “maximised” resilience we do not need
We are using these drivers to design a cost-effective large scale resilient
solution
www.bl.uk
33
Slide 38
British Computer Society
North London Branch
Major Programmes
Richard Boulderstone
July 27, 2004
1
Agenda
• The British Library
• Vision
• Our Audiences/Customers
• ILS
• Digitisation
• Digital Object Management
• Web Archiving
• Collaboration
• Conclusions
[Image: Magna Carta]
www.bl.uk
2
What Is The British Library?
• Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of Science and Invention (1855), National Central Library (1916), and National Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India Office Library and Records in 1982, and British Institute of Recorded Sound in 1983
• Flagship building at St Pancras - largest public building project in Great Britain in 20th century - opened in 1998
www.bl.uk
3
World-Class Research Library
Key Statistics 2002/3
• 150 million items
• 8.2 million items consulted or supplied
• 408,000 reading room visits
• 618,000 catalogue records created
• 554,000 items received on legal deposit
• 651 km shelf capacity, 92% full, add 12 km each year
• 18.5M Web Site Hits (www.bl.uk)
• 2,400 staff
• £85.2 million Grant in Aid and £27.0 million trading income in 2001/2
• Annual report - http://www.bl.uk/about/annual/latest.html
www.bl.uk
4
‘The World’s Knowledge’
Outcome Based Vision
To help people advance knowledge to enrich lives
… by aiding scientific advances
… by adding commercial value for businesses
… by contributing to UK “knowledge economy”
… through the pursuit of academic excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural wellbeing
… by giving information relevant to their interests
… by helping to find the next medical breakthrough
… by creating a link between the past, present and future
www.bl.uk
5
[Diagram: Our Audiences/Customers. Five audience groups are shown: BUSINESS, EDUCATION, PUBLIC, RESEARCHER and LIBRARIES, each with its segments and services.
Segments named: High R+D Industries, Prof. Services, Creative Industries, Publishing Industries, SMEs, School Libraries, Teachers, Students 11>18, Visitors (child + adult), Postgraduate/Undergraduate, Scholars, Librarians, Commercial Researcher, Public Libraries, H.E. Libraries, Broadcasting (e.g. BBC), Publishing (e.g. OED), Lifelong Learner (appears under several groups).
Services named: Resource Discovery, Bespoke Services, Research Services, Document Supply, Reprographics, Innovation Centre, On-site Visits, School Tours, Web Learning, Exhibitions, Events, Tours, Publishing, Reading Rooms, Searching Tools, Training, Best Practice.]
www.bl.uk
6
Major Programmes/1
Da Vinci Notebook
Integrated Library System (ILS)
Programme
7
ILS: Development
Data migration
Due to finish in a few days
16M+ BL records
10M+ records from other sources
Online ILS software
All online changes made (mainly interfaces) – final tests
Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
Most of those needed for go-live are done
The rest will follow in priority order
www.bl.uk
8
ILS: Implementation
Training
Courses to end-users well underway
‘Practice’ system available
‘Search only’ training also underway
Testing
Functional testing (end to end) nearly complete
Performance poor – OPAC very slow
Automated stress testing (LoadRunner scripts)
eIS trying to find area of problem
Ex Libris experts flying over
Some security ‘hardening’ needed
www.bl.uk
9
ILS: Cutover from legacy systems
Now:
Temporary Aleph cataloguing
7 June:
Phase 1 – internal processing
Staggered take-on of users to ease cutover problems
Merge ‘temporary’ records
30 June:
Phase 2 – reading rooms
Reading rooms closed for cutover 26-29 June
Mainly brand-new PCs etc rather than XP upgrade
30 July:
Phase 3 – remote users
Could be delayed if there are major problems
www.bl.uk
10
www.bl.uk
11
Future ILS development (ILS/2)
Current ILS development seen just as the start
Extra records
E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
E.g. Preservation records
Links to other new BL systems
E.g. Digital Object Management (images, web pages etc)
New releases of Ex Libris packages
www.bl.uk
12
Major Programmes/2
International Dunhuang Project
Digitisation Programme
13
Background
Digitisation Is The Process Of Converting Existing Physical Items Into Digital
Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character
Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image to Sub £1/Image.
Automatic Devices To Digitise Are Becoming Available But Are Expensive (~£150,000).
BL Has Had Fairly Ad Hoc Approach Driven By
External Funding Opportunities
Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project
Management
BL Has Created About 1.5M Digital Images So Far…
www.bl.uk
14
Digitisation Strategy
Digitisation Strategy Project Was Formally Initiated On February 2, 2004
Key objectives for the project are to define:
Selection Criteria
Uniform Approach
Communications Plan
Sustainability
Intellectual Property Rights
External Relationship Management
Funding
Integration with DOMS
www.bl.uk
15
Project Status Information
Definitive Register of Projects
19 Complete
19 Current
20 Planning
JISC Sound (3,900 Hours)
JISC Newspapers (2M Pages of 750M Pages)
Chopin (Collaborative Project)
Early English Books Online
www.bl.uk
16
Major Programmes/3
Gutenberg Bible
Digital Object Management (DOM)
Programme
17
DOM Programme vision
Our mission is to enable the United Kingdom to preserve
and use its digital intellectual property forever
Our vision is to create a management system for digital objects
that will
store and preserve any type of digital material in perpetuity
provide access to this material to users with appropriate permissions
ensure that the material is easy to find
ensure that users can view the material with contemporary applications
ensure that users can, where possible, experience material with the original look-and-feel
www.bl.uk
18
Introduction - history
Digital Library PFI
Mar 1997 – Dec 1998
Digital Library System
1999 – early 2002
Lessons
DOM Report
Nov 2002
The DOM Programme
Started September 2003
www.bl.uk
19
Drivers for the BL DOM Programme
Legal deposit legislation for non-print material was granted royal assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12T of material per year (with 50 year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….
We need a generic and cost-effective approach for the secure long term
storage of digital material that is produced by numerous initiatives
www.bl.uk
20
DOM – many topics to address
[Chart: components plotted by estimated size of component (low to high) against component ambiguity / complexity (low to high).
Components shown: ILS, Web Archiving, Digitisation Programme, LDEP, Workflow, Resource Discovery, LDLSE, VDEP, Rights Management, Metadata Definition, Technical Requirements, File Format Registry, Persistent Identifiers, File Conversion Utilities, Authentication, Prototypes, SDM, RADM, Interfaces, Strategy Development.
Legend: Started, Planned, Non-DOM projects, Planned co-operation.]
LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications
www.bl.uk
21
Scope - life cycle of objects
Collection: Selection, Acquisition, Accession, Description
Preservation: Storage, Preservation
Access: Resource discovery, Delivery, Rendering
www.bl.uk
22
Scope – objects and processes
Preservation store
Preserves the bit stream in perpetuity
Access store
Access versions
Limited formats – in the flavour of the era
Metadata to support resource discovery
Descriptive, Administrative, Links with existing tools e.g. Integrated
Library System (ILS)
Workflow
Ingest, e.g. Legal Deposit processing
www.bl.uk
23
[Diagram: DOM in context. Labels shown, in the order they appear:
ACCESS: DOM Resource Discovery, Delivery, Digital Rights Management.
Shared services: DOM Storage, Signing, Authentication, Metadata, Persistent ID.
Ingest.
DONATIONS.
DOCUMENT SUPPLY: Publishers, Archives, Grey Literature, Non-Serial Store, Archiving, Operational Stores.
LEGAL DEPOSIT: LDL Secure Environment, Legal Deposit Items, Legal Deposit Processing.
WEB ARCHIVING.
DIGITISATION: St Pancras Studios, Newspapers, NSA.]
www.bl.uk
24
Timeline
BC: ET approve Business Case & Timeline
R0 (Definition): Prototype will provide a basic preservation-quality digital object storage module
R1 (Operational Storage Sub System):
• Consolidate R0 into operational system
• Provide preservation-quality digital store for materials received under Voluntary Deposit of Electronic Publications (VDEP)
• Integrate it with the existing VDEP front-end
R2 (1st Content Stream ingest):
• Support ingest for a major content stream
• Integrate with core Library systems as required
R3 & R4 (LDEP - initial format): Provide functionality for material covered by LDEP secondary legislation
R5+: Open DOM to new projects
Time axis: 2003, 2004, 2005
www.bl.uk
25
DOM: Project definition - 1
Functional Architecture “What”
Prototyping – assessing market solutions
Example issues: digital rights, file formats, etc
Logical architecture “how – overall architecture”
Prototyping - basic functioning architecture
Example issues: allow changes to new suppliers, relationships to ILS, other projects etc
Physical architecture “how – storage & specifics”
Prototyping - principal solutions and options
Example issues: how do we build it cost-effectively today, supplier selection criteria
• Business case
• Planning – incremental implementation phases
Cross team workshops – reviewing progress, debating detailed technical issues, planning immediate priorities, risk management & way forward
www.bl.uk
26
DOM: Project definition - 2
Approach is to be incremental and not ‘Big Bang’
We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a
good solution
A principal goal is to define:
An overall long term “logical architecture”
Within which, there will be successive generations of physical architectures
We are building an understanding of the storage marketplace, and we will use that knowledge to manage procurement
We are certain that we will need >500T of storage but we are uncertain when – we thus need flexible, scalable procurement
www.bl.uk
27
DOM architecture - overview
[Diagram: layered DOM architecture. Labels shown:
Top: Resource Discovery (ILS based RD, Non-cat, Others), Rights Management, LDEP, Doc supply. These refer to an OBJECT by its unique persistent identifier (DOMID).
Middle: DOM Storage Service, covering compound objects/relations, integrity and authenticity. DOMID is mapped to node/vol/LRL (local resource locator).
Bottom: DOM Physical Storage, holding atomic objects.]
28
DOM System (release 3)
[Diagram: DOM System. Components shown: Aleph, two Storage subsystems, Access, Mailroom, Publishers, Shared services, Administration.]
www.bl.uk
29
DOM logical architecture –
integrity and authenticity
Integrity:
System has capability to continuously monitor the object store to detect
object corruption
It would then initiate object recovery
Authenticity:
A process is defined to provide long-term assurance that an object that
is re-presented is as it was when it was ingested
Based on the use of cryptographic signing techniques
Each object is signed when it is ingested
The signature is verified when required
The signing mechanism is “tightly” controlled
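The sign-at-ingest and verify-on-demand cycle above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the DOM implementation: the slides do not name an algorithm, so a keyed HMAC-SHA256 from the standard library stands in for the signing technique, and the function names, the demo key and the in-memory `store` are all hypothetical. A production system would more likely use public-key signatures with the key held by a tightly controlled signing service.

```python
import hashlib
import hmac


def sign_object(data: bytes, key: bytes) -> str:
    """Compute a signature over the object's bytes at ingest time."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify_object(data: bytes, key: bytes, signature: str) -> bool:
    """Check that the stored bytes still match the signature made at ingest."""
    return hmac.compare_digest(sign_object(data, key), signature)


def find_corrupt(store: dict[str, bytes], signatures: dict[str, str], key: bytes) -> list[str]:
    """Integrity sweep: return the IDs of objects whose bytes no longer verify."""
    return sorted(
        object_id
        for object_id, data in store.items()
        if not verify_object(data, key, signatures[object_id])
    )


key = b"demo-signing-key"  # hypothetical; a real key would never live in code
store = {"obj-1": b"first object", "obj-2": b"second object"}
signatures = {object_id: sign_object(data, key) for object_id, data in store.items()}

# Simulate silent corruption of one stored object, then run the sweep.
store["obj-2"] = b"second 0bject"
corrupt = find_corrupt(store, signatures, key)
```

An object flagged by the sweep would then be recovered from a good copy, which is where the multiple storage clusters come in.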
www.bl.uk
30
Procuring physical storage in volume
A major cost is in physical storage
The market for storage systems is changing rapidly, and this implies that “lock-in” is
not sensible
We thus need flexibility to change supplier over time
Cost of storage is reducing by 30-40% per year
Hence procure on rolling basis just ahead of demand
Replace storage on a rolling basis on expiry of warranty
The rolling programmes imply the need to be able to support a heterogeneous
product solution
The design of the logical architecture thus supports storage sourced from multiple
storage vendors
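The case for buying just ahead of demand follows from simple arithmetic, sketched below in Python. The 30-40% annual price decline is from the slide; the demand profile (100 T growing to 500 T over five years) and the starting price of 1,000 per T are invented purely for illustration, and the sketch ignores replacement at warranty expiry, staff costs and discounts for volume.

```python
def upfront_cost(demand_tb: list[float], price_per_tb: float) -> float:
    """Buy the whole final capacity in year 0 at today's price."""
    return max(demand_tb) * price_per_tb


def rolling_cost(demand_tb: list[float], price_per_tb: float, annual_decline: float) -> float:
    """Buy only each year's extra capacity, at that year's reduced price."""
    total = 0.0
    owned = 0.0
    for year, needed in enumerate(demand_tb):
        increment = max(0.0, needed - owned)
        total += increment * price_per_tb * (1 - annual_decline) ** year
        owned += increment
    return total


demand = [100, 200, 300, 400, 500]  # hypothetical capacity needed in years 0..4
price = 1000.0                      # hypothetical year-0 price per T

all_at_once = upfront_cost(demand, price)
rolling_30 = rolling_cost(demand, price, 0.30)
rolling_40 = rolling_cost(demand, price, 0.40)
print(all_at_once, round(rolling_30), round(rolling_40))
```

Under these assumed figures the rolling purchases cost roughly half as much as buying everything up front, and the gap widens as the rate of decline increases.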
www.bl.uk
31
Disaster tolerance and the organisation
of storage clusters
One can obtain commercial disaster recovery (DR) solutions for common equipment
configurations
However one cannot obtain such solutions for multi-100 TB systems
So we must build DR into the design of the system
A single site solution, subject to a common-mode disaster, would suffer considerable loss
of availability after a disaster, and so is not acceptable
This implies that we need a multi-site solution
Conventionally these are based on a master-standby where only 50% of kit is delivering
normal service
Our design is based on the use of multiple autonomous independent peer clusters that
cross-synchronise
so 100% of the kit delivers normal service
www.bl.uk
32
DOM architecture in the context of the
storage solution market
The dominant segment of the market focuses on delivering performance within
a highly resilient single cluster
However:
Many of our objects will be rarely accessed
so we do not want to pay for “maximised” performance we do not need
We have resilience by using multiple clusters, hence we have a reduced
need for resilience within a cluster
so we do not want to pay for “maximised” resilience we do not need
We are using these drivers to design a cost-effective large scale resilient
solution
www.bl.uk
33
DOM storage subsystem architecture - overview
[Diagram: two storage clusters. Each cluster has a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage. Both clusters use DOM Shared Services:
• Unique ID
• Signing
• Logging]
www.bl.uk
34
DOM storage subsystem architecture - access
Normal access/delivery is from local storage cluster
[Diagram: two storage clusters, each with a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, plus DOM Shared central Services (Unique ID, Signing, Logging).]
www.bl.uk
35
DOM storage subsystem architecture - access
When a cluster is off-line then access/delivery is from a remote storage cluster
[Diagram: two storage clusters, each with a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, plus DOM Shared central Services (Unique ID, Signing, Logging).]
www.bl.uk
36
DOM storage subsystem architecture - ingest
Normal ingest is to the local storage cluster and then the remote cluster is synchronised
[Diagram: two storage clusters, each with a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, plus DOM Shared central Services (Unique ID, Signing, Logging). Arrow labels: Store; Synchronise remote store.]
www.bl.uk
37
DOM storage subsystem architecture - ingest
When a cluster is off-line then ingest is managed by the remote storage cluster and the local cluster is synchronised later
[Diagram: two storage clusters, each with a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, plus DOM Shared central Services (Unique ID, Signing, Logging). Arrow labels: Store; Synchronise remote store later.]
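The access and ingest behaviour described on these slides can be sketched as a small Python model of two peer clusters. It is a toy under stated assumptions, not the DOM design: `Cluster`, `Gateway`, the in-memory dictionaries and the per-cluster backlog are hypothetical, and synchronisation here runs immediately after each ingest, whereas a real system would presumably do it asynchronously.

```python
class Cluster:
    """One autonomous storage cluster (in-memory stand-in)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.online = True
        self.objects: dict[str, bytes] = {}


class Gateway:
    """Routes reads and ingests, preferring the local cluster."""

    def __init__(self, local: Cluster, remote: Cluster) -> None:
        self.local = local
        self.remote = remote
        # Objects waiting to be copied to each cluster, keyed by cluster name.
        self.backlog: dict[str, dict[str, bytes]] = {local.name: {}, remote.name: {}}

    def ingest(self, domid: str, data: bytes) -> str:
        """Store on the local cluster if on-line, otherwise on the remote one."""
        if self.local.online:
            primary, peer = self.local, self.remote
        elif self.remote.online:
            primary, peer = self.remote, self.local
        else:
            raise RuntimeError("no storage cluster is on-line")
        primary.objects[domid] = data
        self.backlog[peer.name][domid] = data
        self.synchronise()
        return primary.name

    def read(self, domid: str) -> tuple[str, bytes]:
        """Serve from the local cluster, falling back to the remote one."""
        for cluster in (self.local, self.remote):
            if cluster.online and domid in cluster.objects:
                return cluster.name, cluster.objects[domid]
        raise KeyError(domid)

    def synchronise(self) -> None:
        """Copy any backlog to every cluster that is currently on-line."""
        for cluster in (self.local, self.remote):
            if cluster.online:
                cluster.objects.update(self.backlog[cluster.name])
                self.backlog[cluster.name].clear()


site_a, site_b = Cluster("A"), Cluster("B")
gateway = Gateway(local=site_a, remote=site_b)
gateway.ingest("domid-1", b"one")              # stored on A, then copied to B

site_a.online = False                          # local cluster goes off-line
stored_on = gateway.ingest("domid-2", b"two")  # remote cluster manages the ingest
served_by, _ = gateway.read("domid-1")         # delivery falls back to the remote

site_a.online = True                           # local cluster returns
gateway.synchronise()                          # and is synchronised later
```

Because either cluster can take reads and ingests, both deliver normal service all the time, which is the contrast with a master-standby pair drawn earlier in the talk.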
www.bl.uk
38
In conclusion
We plan for generations of physical storage
Migration from one generation to the next
Allow changes of supplier
Purchase incrementally in modest quantities
Move quickly when required
Be cost conscious
We provide assurance that an object is held and re-presented as when it was
ingested
We are designing a cost-effective large scale resilient solution
In summary: we take a long term view
www.bl.uk
39
Major Programmes/4
Web Archiving Programme
40
Structure of Programme
Web Archiving Programme is a collaborative
initiative, roughly implemented across two
consortiums
UK Web Archiving Consortium
Developing a selective approach to web archiving, procuring a common web archiving infrastructure and software to begin archiving activities as early as possible
International Internet Preservation Consortium
Developing advanced web archiving technologies for the long-term, large-scale, continuous crawling requirements enabled through legislation
www.bl.uk
41
UK Web Archiving Consortium
Developing a selective approach to web archiving
License for PANDAS about to be signed with NLA
Sub-licenses with consortium partners and contractor to follow
ITT concluded with Magus Research winning the contract.
Implement a common web archiving infrastructure (lots of Linux machines + PANDAS)
Provide customisation/development of PANDAS
Provide help desk and support
www.bl.uk
42
International Internet Preservation
Consortium
Developing advanced web archiving technologies
Smart Crawler
Continuous adaptive crawler, adjusting crawl priority on the fly
Based on IA Heritrix
Working on requirements now
Expect to begin tender process in June
Content Management
Archival formats
Framework
Metrics and Test Bed
www.bl.uk
43
External Collaboration
44
Digital Library Collaborations/Partnerships
Current
UK Digital Preservation Coalition
Founder Member
TEL (The European Library Project)
Web Archiving UK
JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library Of Wales
International Internet Preservation Consortium
BNF, Library Of Congress, Internet Archive, National Archives & Library Of Canada, National Library Of
Australia, National Library Of Italy, National Libraries Of Nordic Countries
JISC Funded - Digital Curation Centre
Persistent Identifiers
DOI Foundation, European National Libraries (KB & DDB)
Resource Discovery
Union Catalogues (SUNCAT)
Digital Library Federation
www.bl.uk
45
Digital Library Collaborations/Partnerships
Potential
Secure Legal Deposit Network
6 Legal Deposit Libraries
Global Digital Format Registry
Potential Partners (National Archives, DLF)
Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
KB (Netherlands National Library & Other Partners – FP6 Bid)
Digital Rights Management
Potential Partners (Publishers, JISC)
Metadata
Publishers, Others ?
Authentication
JISC ?
Resource Discovery
Search Engine Vendors, Researchers
Others ???
www.bl.uk
46
Conclusions
• Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success
• Can You Work With Us?
www.bl.uk
47
Slide 39
British Computer Society
North London Branch
Major Programmes
Richard Boulderstone
July 27, 2004
1
Agenda
•
•
•
•
•
•
•
•
•
The British Library
Vision
Our Audiences/Customers
ILS
Digitisation
Digital Object Management
Web Archiving
Collaboration
Conclusions
Magna Carta
www.bl.uk
2
What Is The British Library ?
Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of
Science and Invention (1855), National Central Library (1916), and National
Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India
Office Library and Records in 1982, and British Institute of Recorded Sound in
1983
• Flagship building at St Pancras - largest public building project in Great
Britain in 20th century - opened in 1998
•
www.bl.uk
3
World-Class Research Library
Key Statistics 2002/3
•
•
•
•
•
•
•
•
•
•
150 million items
8.2 million items consulted or supplied
408,000 reading room visits
618,000 catalogue records created
554,000 items received on legal deposit
651 km shelf capacity 92% full add 12 km each year
18.5M Web Site Hits (www.bl.uk)
2,400 staff
£85.2 million Grant in Aid and £27.0 million trading income in 2001/2
Annual report - http://www.bl.uk/about/annual/latest.html
www.bl.uk
4
‘The World’s Knowledge’
Outcome Based Vision
… by aiding scientific advances
… by adding commercial value for
businesses
… by contributing to UK “knowledge
economy”
… through the pursuit of academic
excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural
wellbeing
… by giving information relevant to their
interests
… by helping to find the next medical
breakthrough
… by creating a link between the past,
present and future
To help people
advance
knowledge to
enrich lives
www.bl.uk
5
High
R+D
Industries
Prof.
Services
Creative
Industries Publishing
Industries
Resource Discovery
Bespoke Services
Research Services
Document Supply
Reprographics
Innovation Centre
School
Students
Libraries Teachers 11>18
On-site Visits
School Tours
Web Learning
SMEs
EDUCATION
Lifelong
Learner
Exhibitions
Events
Tours
Publishing
Visitors
(child + adult)
Lifelong
Learner
PUBLIC
BUSINESS
Lifelong
Learner
Postgraduate/
Undergraduate
RESEARCHER
Librarians
Scholars
Lifelong
Learner
Reading Rooms
Bespoke Services
Reprographics
Publishing
Document Supply
Searching Tools
LIBRARIES
Commercial
Researcher
Public
Libraries
Broadcasting
e.g. BBC
Publishing
e.g. OED
Document Supply
Resource Discovery
Training
Best Practice
H.E.
Libraries
Public
www.bl.uk
6
2
Major Programmes/1
Da Vinci Notebook
Integrated Library System (ILS)
Programme
7
ILS: Development
Data migration
Due to finish in a few days
16M+ BL records
10M+ records from other sources
Online ILS software
All online changes made (mainly interfaces) – final tests
Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
Most ones done for go live
Rest in priority order
www.bl.uk
8
ILS: Implementation
Training
Courses to end-users well underway
‘Practice’ system available
‘Search only’ training also underway
Testing
Functional testing (end to end) nearly complete
Performance poor – OPAC very slow
Automated stress testing (LoadRunner scripts)
eIS trying to find area of problem
Ex Libris experts flying over
Some security ‘hardening’ needed
www.bl.uk
9
ILS: Cutover from legacy systems
Now:
Temporary Aleph cataloguing
Phase 1 – internal processing
Staggered take-on of users to ease cutover problems
Merge ‘temporary’ records
7 June:
Phase 2 – reading rooms
Reading rooms closed for cutover 26-29 June
Mainly brand-new PCs etc rather than XP upgrade
30 June:
Phase 3 – remote users
Could be delayed major problems
30 July:
www.bl.uk
10
www.bl.uk
11
Future ILS development (ILS/2)
Current ILS development seen just as the start
Extra records
E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
E.g. Preservation records
Links to other new BL systems
E.g. Digital Object Management (images, web pages etc)
New releases of Ex Libris packages
www.bl.uk
12
Major Programmes/2
International Dunhuang Project
Digitisation Programme
13
Background
Digitisation Is The Process Of Converting Existing Physical Items Into Digital
Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character
Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image to Sub £1/Image.
Automatic Devices To Digitise Are Becoming Availalble But Are Expensive ~£150,000.
BL Has Had Fairly Ad Hoc Approach Driven By
External Funding Opportunities
Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project
Management
BL Has Created About 1.5M Digital Images So Far…
www.bl.uk
14
Digitisation Strategy
Digitisation Strategy Project Was Formerly Initiated On February 2, 2004
Key objectives for the project are to define:
Selection Criteria
Uniform Approach
Communications Plan
Sustainability
Intellectual Property Rights
External Relationship Management
Funding
Integration with DOMS
www.bl.uk
15
Project Status Information
Definitive Register of Projects
19 Complete
19 Current
20 Planning
JISC Sound (3,900 Hours)
JISC Newspapers (2M Pages of 750M Pages)
Chopin (Collaborative Project)
Early English Books Online
www.bl.uk
16
Major Programmes/3
Gutenberg Bible
Digital Object Management (DOM)
Programme
17
DOM Programme vision
Our mission is to enable the United Kingdom to preserve
and use its digital intellectual property forever
Our vision is create a management system for digital objects
that will
store and preserve any type of digital material in perpetuity
provide access to this material to users with appropriate permissions
ensure that the material is easy to find
ensure that users can view the material with contemporary applications
ensure that users can, where possible, experience material with the original look-and-feel
www.bl.uk
18
Introduction - history
Digital Library PFI
Mar 1997 – Dec 1998
Digital Library System
1999 – early 2002
Lessons
DOM Report
Nov 2002
The DOM Programme
Started September 2003
www.bl.uk
19
Drivers for the BL DOM Programme
Legal deposit legislation for non-print material was granted royal assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12T of material per year (with 50 year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….
We need a generic and cost-effective approach for the secure long term
storage of digital material that is produced by numerous initiatives
www.bl.uk
20
DOM – many topics to address
HIGH
ILS
WEB
ARCHIVING
DIGITISATION
PROGRAMME
ESTIMATED SIZE OF COMPONENT
LDEP
WORKFLOW
RESOURCE
DISCOVERY
LDLSE
VDEP
RIGHTS
MANAGEMENT
METADATA
DEFINITION
TECHNICAL
REQUIREMENTS
FILE
FORMAT
REGISTRY
PERSISTENT
IDENTIFIERS
FILE
CONVERSION
UTILITIES
AUTHENTICATION
PROTOTYPES
SDM
RADM
INTERFACES
STRATEGY
DEVELOPMENT
LOW
LOW
COMPONENT AMBIGUITY / COMPLEXITY
Started
Planned
Non-DOM projects
Planned co-operation
HIGH
LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications
www.bl.uk
21
Scope - life cycle of objects
Collection
Selection
Acquisition
Accession
Description
Preservation
Storage
Preservation
Access
Resource discovery
Delivery
Rendering
www.bl.uk
22
Scope – objects and processes
Preservation store
Preserves the bit stream in perpetuity
Access store
Access versions
Limited formats – in the flavour of the era
Metadata to support resource discovery
Descriptive, Administrative, Links with existing tools e.g. Integrated
Library System (ILS)
Workflow
Ingest, e.g. Legal Deposit processing
www.bl.uk
23
ACCESS
DOM
Resource Discovery
Delivery
Digital Rights Management
Shared services
DOM
Storage
Signing
Authentication
Metadata
Persistent ID
Ingest
DONATIONS
DOCUMENT SUPPLY
Publishers
Archives
Grey Literature
Non-Serial
Store
Archiving
Operational
Stores
LEGAL DEPOSIT
LDL Secure
Environment
Legal Deposit
Items
Legal Deposit
Processing
WEB
ARCHIVING
DIGITISATION
St Pancras
Studios
Newspapers
NSA
www.bl.uk
24
Timeline
Prototype will provide a
basic preservationquality digital object
storage module
Definition. R0
ET approve
Business Case
& Timeline
•Consolidate R0 into operational
system
•Provide preservation-quality digital
store for materials received under
Voluntary Deposit of Electronic
Publications (VDEP)
•Integrate it with the existing VDEP
front-end
BC
R0
Operat’l Storage
Sub System. R1
•Support ingest for a
major content stream
•Integrate with core
Library systems as
required
R1
1st Content
Stream ingest. R2
R2
LDEP - initial
format. R3 & R4
Provide functionality
for material covered by
LDEP secondary
legislation
R3
R4
Open DOM to new
projects. R5+
2003
2004
2005
www.bl.uk
25
DOM: Project definition - 1
Example issues
Functional Architecture “What”
digital rights, file
formats, etc
Prototyping – assessing market solutions
allow changes to
new suppliers,
relationships to ILS,
other projects etc
Logical architecture “how – overall architecture”
Prototyping - basic functioning architecture
how do we build it
cost-effectively
today, supplier
selection criteria
Physical architecture “how – storage & specifics”
Prototyping - principal solutions and options
• Business case
• Planning – incremental
implementation phases
Cross team workshops – reviewing progress, debating detailed technical
issues, planning immediate priorities, risk management & way forward
www.bl.uk
26
DOM: Project definition - 2
Approach is to be incremental and not ‘Big Bang’
We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a
good solution
A principal goal is to define:
An overall long term “logical architecture”
Within which, there will be successive generations of physical architectures
We are building an understanding of the storage marketplace, and we will use that knowledge to manage
procurement
We are certain that we will need >500 TB of storage but we are uncertain when – we thus need
flexible, scalable procurement
www.bl.uk
27
DOM architecture - overview
[Diagram: Resource Discovery (ILS-based RD, non-catalogue RD, others), Rights Management, LDEP and Doc supply refer to an object by its DOMID. The DOM Storage Service handles compound objects/relations, integrity and authenticity, and maps each unique persistent identifier (DOMID) to node/vol/LRL (local resource locator). Atomic objects are held in DOM Physical Storage.]
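The mapping from DOMID to node/vol/LRL can be pictured as a small lookup table. The sketch below is illustrative only: the class names, field names and identifier format are assumptions, not taken from the DOM design. It shows why a persistent identifier can stay fixed while the physical location behind it changes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    """Where one copy of an object physically lives (all fields hypothetical)."""
    node: str
    volume: str
    lrl: str  # local resource locator, e.g. a path within the volume


class DomidResolver:
    """Maps a persistent DOMID to the current physical locations of its copies."""

    def __init__(self):
        self._table = {}

    def register(self, domid, location):
        # One DOMID may have several copies, e.g. one per storage cluster.
        self._table.setdefault(domid, []).append(location)

    def relocate(self, domid, old, new):
        """Record a migration to new storage; the DOMID itself never changes."""
        copies = self._table[domid]
        copies[copies.index(old)] = new

    def resolve(self, domid):
        # Return a copy of the list so callers cannot alter the table.
        return list(self._table[domid])
```

Callers hold only the DOMID, so replacing a generation of physical storage means updating the table rather than every reference to the object.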
www.bl.uk
28
DOM System (release 3)
[Diagram: the DOM System comprises Access, Mailroom, Administration, Shared services and two storage subsystems, with external connections to Aleph and to Publishers.]
www.bl.uk
29
DOM logical architecture –
integrity and authenticity
Integrity:
System has capability to continuously monitor the object store to detect
object corruption
It would then initiate object recovery
Authenticity:
A process is defined to provide long-term assurance that an object that
is re-presented is as it was when it was ingested
Based on the use of cryptographic signing techniques
Each object is signed when it is ingested
The signature is verified when required
The signing mechanism is “tightly” controlled
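A minimal sketch of the sign-on-ingest, verify-on-retrieval idea, together with an integrity sweep. It is not the DOM implementation: the slides do not name a signing scheme, so a keyed HMAC-SHA256 tag from the Python standard library stands in for a cryptographic signature, and the key, class and method names are all assumptions.

```python
import hashlib
import hmac

# Hypothetical key; a real deployment would keep signing keys under
# tight control, for example in dedicated hardware.
SIGNING_KEY = b"example-key-not-for-production"


def sign(data: bytes, key: bytes = SIGNING_KEY) -> str:
    """Return a hex HMAC-SHA256 tag computed over the object's bytes."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify(data: bytes, signature: str, key: bytes = SIGNING_KEY) -> bool:
    """Check that the object still matches the tag recorded at ingest."""
    return hmac.compare_digest(sign(data, key), signature)


class ObjectStore:
    """Toy in-memory store: object ID -> (bytes, tag taken at ingest)."""

    def __init__(self) -> None:
        self._objects: dict[str, tuple[bytes, str]] = {}

    def ingest(self, object_id: str, data: bytes) -> None:
        # Each object is signed once, at the moment it is ingested.
        self._objects[object_id] = (data, sign(data))

    def retrieve(self, object_id: str) -> bytes:
        # The tag is verified before the object is re-presented.
        data, signature = self._objects[object_id]
        if not verify(data, signature):
            raise ValueError(f"object {object_id} failed verification")
        return data

    def audit(self) -> list[str]:
        """Integrity sweep: return IDs of objects that no longer verify."""
        return [oid for oid, (data, sig) in self._objects.items()
                if not verify(data, sig)]
```

In this sketch, `audit()` plays the part of continuous monitoring; an object it reports would be recovered from another copy.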
www.bl.uk
30
Procuring physical storage in volume
A major cost is in physical storage
The market for storage systems is changing rapidly, and this implies that “lock-in” is
not sensible
We thus need flexibility to change supplier over time
Cost of storage is reducing by 30-40% per year
Hence procure on rolling basis just ahead of demand
Replace storage on a rolling basis on expiry of warranty
The rolling programmes imply the need to be able to support a heterogeneous
product solution
The design of the logical architecture thus supports storage sourced from multiple
storage vendors
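The case for buying just ahead of demand can be checked with a small calculation. The demand profile and starting price below are invented for illustration; only the 30-40% annual price decline comes from the slide. Warranty-expiry replacement is ignored here.

```python
def upfront_cost(demand_tb, price_per_tb):
    """Buy the peak capacity in year 0 at today's price."""
    return max(demand_tb) * price_per_tb


def rolling_cost(demand_tb, price_per_tb, annual_decline):
    """Buy only each year's extra capacity, at that year's (lower) price."""
    total, owned = 0.0, 0.0
    for year, needed in enumerate(demand_tb):
        price = price_per_tb * (1 - annual_decline) ** year
        extra = max(0.0, needed - owned)
        total += extra * price
        owned += extra
    return total


# Hypothetical figures: demand ramps to 500 TB over five years,
# starting price 1,000 (arbitrary currency units) per TB.
demand = [100, 200, 300, 400, 500]
print(upfront_cost(demand, 1000.0))
print(rolling_cost(demand, 1000.0, 0.30))
print(rolling_cost(demand, 1000.0, 0.40))
```

Under these assumed figures, rolling purchase at a 30% annual decline costs about 277,000 units against 500,000 for buying everything up front, and a 40% decline lowers it further.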
www.bl.uk
31
Disaster tolerance and the organisation
of storage clusters
One can obtain commercial disaster recovery (DR) solutions for common equipment
configurations
However, one cannot obtain such solutions for multi-100 TB systems
So we must build DR into the design of the system
A single site solution, subject to a common-mode disaster, would suffer considerable loss
of availability after a disaster, and so is not acceptable
This implies that we need a multi-site solution
Conventionally these are based on a master-standby arrangement, where only 50% of the kit is delivering
normal service
Our design is based on the use of multiple autonomous independent peer clusters that
cross-synchronise
so 100% of the kit delivers normal service
www.bl.uk
32
DOM architecture in the context of the
storage solution market
The dominant segment of the market focuses on delivering performance within
a highly resilient single cluster
However:
Many of our objects will be rarely accessed
so we do not want to pay for “maximised” performance we do not need
We have resilience by using multiple clusters, hence we have a reduced
need for resilience within a cluster
so we do not want to pay for “maximised” resilience we do not need
We are using these drivers to design a cost-effective large scale resilient
solution
www.bl.uk
33
DOM storage subsystem architecture - overview
[Diagram: two storage clusters, each comprising a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, together with DOM Shared Services (Unique ID, Signing, Logging).]
www.bl.uk
34
DOM storage subsystem architecture - access
Normal access/delivery is from local storage cluster
[Diagram: two storage clusters, each comprising a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, together with DOM shared central services (Unique ID, Signing, Logging).]
www.bl.uk
35
DOM storage subsystem architecture - access
When a cluster is off-line then access/delivery is from a remote storage cluster
[Diagram: two storage clusters, each comprising a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, together with DOM shared central services (Unique ID, Signing, Logging).]
www.bl.uk
36
DOM storage subsystem architecture - ingest
Normal ingest is to the local storage cluster and then the remote cluster is synchronised
[Diagram: two storage clusters, each comprising a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, together with DOM shared central services (Unique ID, Signing, Logging). Arrows labelled "Store" and "Synchronise remote store".]
www.bl.uk
37
DOM storage subsystem architecture - ingest
When a cluster is off-line then ingest is managed by the remote storage cluster and the local cluster is synchronised later
[Diagram: two storage clusters, each comprising a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, together with DOM shared central services (Unique ID, Signing, Logging). Arrows labelled "Store" and "Synchronise remote store later".]
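The access and ingest behaviour described on the last four slides can be sketched with two peer clusters. This is a toy model under stated assumptions: the class and function names are hypothetical, storage is an in-memory dictionary, and synchronisation is a direct copy rather than whatever protocol the DOM design uses.

```python
class Cluster:
    """Toy storage cluster: an in-memory object map plus an online flag."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.objects = {}
        self.pending = []  # (peer, object_id) pairs awaiting synchronisation

    def store(self, object_id, data):
        self.objects[object_id] = data


def ingest(local, remote, object_id, data):
    """Store on the first online cluster, then synchronise its peer."""
    primary, peer = (local, remote) if local.online else (remote, local)
    if not primary.online:
        raise RuntimeError("no storage cluster available")
    primary.store(object_id, data)
    if peer.online:
        peer.store(object_id, data)
    else:
        primary.pending.append((peer, object_id))  # synchronise later
    return primary.name


def access(local, remote, object_id):
    """Serve from the local cluster, falling back to the remote one."""
    for cluster in (local, remote):
        if cluster.online and object_id in cluster.objects:
            return cluster.objects[object_id]
    raise KeyError(object_id)


def resync(cluster):
    """Push queued objects to peers that have come back online."""
    still_pending = []
    for peer, object_id in cluster.pending:
        if peer.online:
            peer.store(object_id, cluster.objects[object_id])
        else:
            still_pending.append((peer, object_id))
    cluster.pending = still_pending
```

Because either cluster can take ingest and serve access, both deliver normal service when healthy, which is the contrast with a master-standby pair drawn earlier in the deck.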
www.bl.uk
38
In conclusion
We plan for generations of physical storage
Migration from one generation to the next
Allow changes of supplier
Purchase incrementally in modest quantities
Move quickly when required
Be cost conscious
We provide assurance that an object is held and re-presented as when it was
ingested
We are designing a cost-effective large scale resilient solution
In summary: we take a long term view
www.bl.uk
39
Major Programmes/4
Web Archiving Programme
40
Structure of Programme
Web Archiving Programme is a collaborative
initiative, broadly organised across two
consortia
UK Web Archiving Consortium
Developing a selective approach to web archiving, procuring a common
web archiving infrastructure and software to begin archiving activities at
the earliest opportunity
International Internet Preservation Consortium
Developing advanced web archiving technologies for the long-term,
large-scale, continuous crawling requirements enabled through
legislation
www.bl.uk
41
UK Web Archiving Consortium
Developing a selective approach to web archiving
License for PANDAS about to be signed with NLA
Sub-licenses with consortium partners and contractor to follow
ITT concluded with Magus Research winning the contract.
Implement a common web archiving infrastructure (lots of Linux
machines + PANDAS)
Provide customisation/development of PANDAS
Provide help desk and support
www.bl.uk
42
International Internet Preservation
Consortium
Developing advanced web archiving technologies
Smart Crawler
Continuous adaptive crawler, adjusting crawl priority on the fly
Based on IA Heritrix
Working on requirements now
Expect to begin tender process in June
Content Management
Archival formats
Framework
Metrics and Test Bed
www.bl.uk
43
External Collaboration
44
Digital Library Collaborations/Partnerships
Current
UK Digital Preservation Coalition
Founder Member
TEL (The European Library Project)
Web Archiving UK
JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library Of Wales
International Internet Preservation Consortium
BNF, Library Of Congress, Internet Archive, National Archives & Library Of Canada, National Library Of
Australia, National Library Of Italy, National Libraries Of Nordic Countries
JISC Funded - Digital Curation Centre
Persistent Identifiers
DOI Foundation, European National Libraries (KB & DDB)
Resource Discovery
Union Catalogues (SUNCAT)
Digital Library Federation
www.bl.uk
45
Digital Library Collaborations/Partnerships
Potential
Secure Legal Deposit Network
6 Legal Deposit Libraries
Global Digital Format Registry
Potential Partners (National Archives, DLF)
Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
KB (Netherlands National Library & Other Partners – FP6 Bid)
Digital Rights Management
Potential Partners (Publishers, JISC)
Metadata
Publishers, Others ?
Authentication
JISC ?
Resource Discovery
Search Engine Vendors, Researchers
Others ???
www.bl.uk
46
Conclusions
Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success
•
•
Can You Work With Us?
www.bl.uk
47
Slide 40
British Computer Society
North London Branch
Major Programmes
Richard Boulderstone
July 27, 2004
1
Agenda
• The British Library
• Vision
• Our Audiences/Customers
• ILS
• Digitisation
• Digital Object Management
• Web Archiving
• Collaboration
• Conclusions
Magna Carta
www.bl.uk
2
What Is The British Library ?
• Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of
Science and Invention (1855), National Central Library (1916), and National
Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India
Office Library and Records in 1982, and British Institute of Recorded Sound in
1983
• Flagship building at St Pancras - largest public building project in Great
Britain in 20th century - opened in 1998
www.bl.uk
3
World-Class Research Library
Key Statistics 2002/3
• 150 million items
• 8.2 million items consulted or supplied
• 408,000 reading room visits
• 618,000 catalogue records created
• 554,000 items received on legal deposit
• 651 km shelf capacity, 92% full, adding 12 km each year
• 18.5M Web Site Hits (www.bl.uk)
• 2,400 staff
• £85.2 million Grant in Aid and £27.0 million trading income in 2001/2
• Annual report - http://www.bl.uk/about/annual/latest.html
www.bl.uk
4
‘The World’s Knowledge’
Outcome Based Vision
… by aiding scientific advances
… by adding commercial value for
businesses
… by contributing to UK “knowledge
economy”
… through the pursuit of academic
excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural
wellbeing
… by giving information relevant to their
interests
… by helping to find the next medical
breakthrough
… by creating a link between the past,
present and future
To help people
advance
knowledge to
enrich lives
www.bl.uk
5
[Diagram: Our Audiences/Customers. Five segments: BUSINESS, EDUCATION, PUBLIC, RESEARCHER and LIBRARIES. Audiences shown include high R+D industries, professional services, creative industries, publishing industries, SMEs, school libraries, students and teachers (11>18), visitors (child + adult), lifelong learners, postgraduates/undergraduates, librarians, scholars, commercial researchers, public libraries, H.E. libraries, broadcasting (e.g. BBC) and publishing (e.g. OED). Services shown include Resource Discovery, Bespoke Services, Research Services, Document Supply, Reprographics, Innovation Centre, On-site Visits, School Tours, Web Learning, Exhibitions, Events, Tours, Publishing, Reading Rooms, Searching Tools, Training and Best Practice.]
www.bl.uk
6
2
Major Programmes/1
Da Vinci Notebook
Integrated Library System (ILS)
Programme
7
ILS: Development
Data migration
Due to finish in a few days
16M+ BL records
10M+ records from other sources
Online ILS software
All online changes made (mainly interfaces) – final tests
Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
Most done for go-live
Rest in priority order
www.bl.uk
8
ILS: Implementation
Training
Courses to end-users well underway
‘Practice’ system available
‘Search only’ training also underway
Testing
Functional testing (end to end) nearly complete
Performance poor – OPAC very slow
Automated stress testing (LoadRunner scripts)
eIS trying to find area of problem
Ex Libris experts flying over
Some security ‘hardening’ needed
www.bl.uk
9
ILS: Cutover from legacy systems
Now:
Temporary Aleph cataloguing
Phase 1 – internal processing
Staggered take-on of users to ease cutover problems
Merge ‘temporary’ records
7 June:
Phase 2 – reading rooms
Reading rooms closed for cutover 26-29 June
Mainly brand-new PCs etc rather than XP upgrade
30 June:
Phase 3 – remote users
Could be delayed if there are major problems
30 July:
www.bl.uk
10
www.bl.uk
11
Future ILS development (ILS/2)
Current ILS development seen just as the start
Extra records
E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
E.g. Preservation records
Links to other new BL systems
E.g. Digital Object Management (images, web pages etc)
New releases of Ex Libris packages
www.bl.uk
12
Major Programmes/2
International Dunhuang Project
Digitisation Programme
13
Background
Digitisation Is The Process Of Converting Existing Physical Items Into Digital
Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character
Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image to Sub £1/Image.
Automatic Devices To Digitise Are Becoming Available But Are Expensive ~£150,000.
BL Has Had Fairly Ad Hoc Approach Driven By
External Funding Opportunities
Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project
Management
BL Has Created About 1.5M Digital Images So Far…
www.bl.uk
14
Digitisation Strategy
Digitisation Strategy Project Was Formally Initiated On February 2, 2004
Key objectives for the project are to define:
Selection Criteria
Uniform Approach
Communications Plan
Sustainability
Intellectual Property Rights
External Relationship Management
Funding
Integration with DOMS
www.bl.uk
15
Project Status Information
Definitive Register of Projects
19 Complete
19 Current
20 Planning
JISC Sound (3,900 Hours)
JISC Newspapers (2M Pages of 750M Pages)
Chopin (Collaborative Project)
Early English Books Online
www.bl.uk
16
Major Programmes/3
Gutenberg Bible
Digital Object Management (DOM)
Programme
17
DOM Programme vision
Our mission is to enable the United Kingdom to preserve
and use its digital intellectual property forever
Our vision is to create a management system for digital objects
that will
store and preserve any type of digital material in perpetuity
provide access to this material to users with appropriate permissions
ensure that the material is easy to find
ensure that users can view the material with contemporary applications
ensure that users can, where possible, experience material with the original look-and-feel
www.bl.uk
18
Introduction - history
Digital Library PFI
Mar 1997 – Dec 1998
Digital Library System
1999 – early 2002
Lessons
DOM Report
Nov 2002
The DOM Programme
Started September 2003
www.bl.uk
19
Drivers for the BL DOM Programme
Legal deposit legislation for non-print material was granted royal assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12 TB of material per year (with 50 year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….
We need a generic and cost-effective approach for the secure long term
storage of digital material that is produced by numerous initiatives
www.bl.uk
20
DOM – many topics to address
[Chart: DOM components plotted by estimated size of component (low to high) against component ambiguity/complexity (low to high). Legend: Started, Planned, Non-DOM projects, Planned co-operation. Components shown: ILS, Web Archiving, Digitisation Programme, LDEP, Workflow, Resource Discovery, LDLSE, VDEP, Rights Management, Metadata Definition, Technical Requirements, File Format Registry, Persistent Identifiers, File Conversion Utilities, Authentication, Prototypes, SDM, RADM, Interfaces, Strategy Development.]
LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications
www.bl.uk
21
Scope - life cycle of objects
Collection
Selection
Acquisition
Accession
Description
Preservation
Storage
Preservation
Access
Resource discovery
Delivery
Rendering
www.bl.uk
22
Scope – objects and processes
Preservation store
Preserves the bit stream in perpetuity
Access store
Access versions
Limited formats – in the flavour of the era
Metadata to support resource discovery
Descriptive, Administrative, Links with existing tools e.g. Integrated
Library System (ILS)
Workflow
Ingest, e.g. Legal Deposit processing
www.bl.uk
23
[Diagram: DOM in context. ACCESS: Resource Discovery, Delivery, Digital Rights Management. DOM: DOM Storage with shared services (Signing, Authentication, Metadata, Persistent ID) and Ingest. Content sources: DONATIONS; DOCUMENT SUPPLY (Publishers, Archives, Grey Literature, Non-Serial Store, Archiving, Operational Stores); LEGAL DEPOSIT (LDL Secure Environment, Legal Deposit Items, Legal Deposit Processing); WEB ARCHIVING; DIGITISATION (St Pancras Studios, Newspapers, NSA).]
24
Slide 41
British Computer Society
North London Branch
Major Programmes
Richard Boulderstone
July 27, 2004
1
Agenda
•
•
•
•
•
•
•
•
•
The British Library
Vision
Our Audiences/Customers
ILS
Digitisation
Digital Object Management
Web Archiving
Collaboration
Conclusions
Magna Carta
www.bl.uk
2
What Is The British Library ?
Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of
Science and Invention (1855), National Central Library (1916), and National
Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India
Office Library and Records in 1982, and British Institute of Recorded Sound in
1983
• Flagship building at St Pancras - largest public building project in Great
Britain in 20th century - opened in 1998
•
www.bl.uk
3
World-Class Research Library
Key Statistics 2002/3
•
•
•
•
•
•
•
•
•
•
150 million items
8.2 million items consulted or supplied
408,000 reading room visits
618,000 catalogue records created
554,000 items received on legal deposit
651 km shelf capacity 92% full add 12 km each year
18.5M Web Site Hits (www.bl.uk)
2,400 staff
£85.2 million Grant in Aid and £27.0 million trading income in 2001/2
Annual report - http://www.bl.uk/about/annual/latest.html
www.bl.uk
4
‘The World’s Knowledge’
Outcome Based Vision
… by aiding scientific advances
… by adding commercial value for
businesses
… by contributing to UK “knowledge
economy”
… through the pursuit of academic
excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural
wellbeing
… by giving information relevant to their
interests
… by helping to find the next medical
breakthrough
… by creating a link between the past,
present and future
To help people
advance
knowledge to
enrich lives
www.bl.uk
5
High
R+D
Industries
Prof.
Services
Creative
Industries Publishing
Industries
Resource Discovery
Bespoke Services
Research Services
Document Supply
Reprographics
Innovation Centre
School
Students
Libraries Teachers 11>18
On-site Visits
School Tours
Web Learning
SMEs
EDUCATION
Lifelong
Learner
Exhibitions
Events
Tours
Publishing
Visitors
(child + adult)
Lifelong
Learner
PUBLIC
BUSINESS
Lifelong
Learner
Postgraduate/
Undergraduate
RESEARCHER
Librarians
Scholars
Lifelong
Learner
Reading Rooms
Bespoke Services
Reprographics
Publishing
Document Supply
Searching Tools
LIBRARIES
Commercial
Researcher
Public
Libraries
Broadcasting
e.g. BBC
Publishing
e.g. OED
Document Supply
Resource Discovery
Training
Best Practice
H.E.
Libraries
Public
www.bl.uk
6
2
Major Programmes/1
Da Vinci Notebook
Integrated Library System (ILS)
Programme
7
ILS: Development
Data migration
Due to finish in a few days
16M+ BL records
10M+ records from other sources
Online ILS software
All online changes made (mainly interfaces) – final tests
Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
Most ones done for go live
Rest in priority order
www.bl.uk
8
ILS: Implementation
Training
Courses to end-users well underway
‘Practice’ system available
‘Search only’ training also underway
Testing
Functional testing (end to end) nearly complete
Performance poor – OPAC very slow
Automated stress testing (LoadRunner scripts)
eIS trying to find area of problem
Ex Libris experts flying over
Some security ‘hardening’ needed
www.bl.uk
9
ILS: Cutover from legacy systems
Now:
Temporary Aleph cataloguing
Phase 1 – internal processing
Staggered take-on of users to ease cutover problems
Merge ‘temporary’ records
7 June:
Phase 2 – reading rooms
Reading rooms closed for cutover 26-29 June
Mainly brand-new PCs etc rather than XP upgrade
30 June:
Phase 3 – remote users
Could be delayed major problems
30 July:
www.bl.uk
10
www.bl.uk
11
Future ILS development (ILS/2)
Current ILS development seen just as the start
Extra records
E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
E.g. Preservation records
Links to other new BL systems
E.g. Digital Object Management (images, web pages etc)
New releases of Ex Libris packages
www.bl.uk
12
Major Programmes/2
International Dunhuang Project
Digitisation Programme
13
Background
Digitisation Is The Process Of Converting Existing Physical Items Into Digital
Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character
Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image to Sub £1/Image.
Automatic Devices To Digitise Are Becoming Availalble But Are Expensive ~£150,000.
BL Has Had Fairly Ad Hoc Approach Driven By
External Funding Opportunities
Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project
Management
BL Has Created About 1.5M Digital Images So Far…
www.bl.uk
14
Digitisation Strategy
Digitisation Strategy Project Was Formerly Initiated On February 2, 2004
Key objectives for the project are to define:
Selection Criteria
Uniform Approach
Communications Plan
Sustainability
Intellectual Property Rights
External Relationship Management
Funding
Integration with DOMS
www.bl.uk
15
Project Status Information
Definitive Register of Projects
19 Complete
19 Current
20 Planning
JISC Sound (3,900 Hours)
JISC Newspapers (2M Pages of 750M Pages)
Chopin (Collaborative Project)
Early English Books Online
www.bl.uk
16
Major Programmes/3
Gutenberg Bible
Digital Object Management (DOM)
Programme
17
DOM Programme vision
Our mission is to enable the United Kingdom to preserve
and use its digital intellectual property forever
Our vision is to create a management system for digital objects
that will
store and preserve any type of digital material in perpetuity
provide access to this material to users with appropriate permissions
ensure that the material is easy to find
ensure that users can view the material with contemporary applications
ensure that users can, where possible, experience material with the original look-and-feel
18
Introduction - history
Digital Library PFI
Mar 1997 – Dec 1998
Digital Library System
1999 – early 2002
Lessons
DOM Report
Nov 2002
The DOM Programme
Started September 2003
19
Drivers for the BL DOM Programme
Legal deposit legislation for non-print material was granted royal assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12 TB of material per year (with a 50-year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….
We need a generic and cost-effective approach for the secure long term
storage of digital material that is produced by numerous initiatives
20
DOM – many topics to address
[Chart: components plotted by estimated size (low to high) against ambiguity/complexity (low to high)]
Components: ILS, Web Archiving, Digitisation Programme, LDEP, Workflow, Resource Discovery, LDLSE, VDEP, Rights Management, Metadata Definition, Technical Requirements, File Format Registry, Persistent Identifiers, File Conversion Utilities, Authentication, Prototypes, SDM, RADM, Interfaces, Strategy Development
Legend: Started, Planned, Non-DOM projects, Planned co-operation
LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications
21
Scope - life cycle of objects
Collection: Selection, Acquisition, Accession, Description
Preservation: Storage, Preservation
Access: Resource discovery, Delivery, Rendering
22
Scope – objects and processes
Preservation store
Preserves the bit stream in perpetuity
Access store
Access versions
Limited formats – in the flavour of the era
Metadata to support resource discovery
Descriptive, Administrative, Links with existing tools e.g. Integrated
Library System (ILS)
Workflow
Ingest, e.g. Legal Deposit processing
23
[Diagram: DOM access, shared services and ingest sources]
Access (DOM): Resource Discovery, Delivery, Digital Rights Management
Shared services (DOM): Storage, Signing, Authentication, Metadata, Persistent ID
Ingest sources:
Donations / Document Supply: Publishers, Archives, Grey Literature, Non-Serial Store, Archiving, Operational Stores
Legal Deposit: LDL Secure Environment, Legal Deposit Items, Legal Deposit Processing
Web Archiving
Digitisation: St Pancras Studios, Newspapers, NSA
24
Timeline
Planned releases, 2003 to 2005:
BC: ET approve Business Case & Timeline
R0 (Definition): prototype will provide a basic preservation-quality digital object storage module
R1 (Operational Storage Sub System):
• Consolidate R0 into operational system
• Provide preservation-quality digital store for materials received under Voluntary Deposit of Electronic Publications (VDEP)
• Integrate it with the existing VDEP front-end
R2 (1st Content Stream ingest):
• Support ingest for a major content stream
• Integrate with core Library systems as required
R3 & R4 (LDEP - initial format): provide functionality for material covered by LDEP secondary legislation
R5+: Open DOM to new projects
25
DOM: Project definition - 1
Functional architecture (“what”)
Prototyping – assessing market solutions
Example issues: digital rights, file formats, etc
Logical architecture (“how – overall architecture”)
Prototyping – basic functioning architecture
Example issues: allow changes to new suppliers, relationships to ILS, other projects etc
Physical architecture (“how – storage & specifics”)
Prototyping – principal solutions and options
Example issues: how do we build it cost-effectively today, supplier selection criteria
• Business case
• Planning – incremental implementation phases
Cross team workshops – reviewing progress, debating detailed technical issues, planning immediate priorities, risk management & way forward
26
DOM: Project definition - 2
Approach is to be incremental and not ‘Big Bang’
We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a
good solution
A principal goal is to define:
An overall long term “logical architecture”
Within which, there will be successive generations of physical architectures
We are building an understanding of the storage marketplace, and we will use that knowledge to manage
procurement
We are certain that we will need >500 TB of storage but we are uncertain when – we thus need
flexible, scalable procurement
27
DOM architecture - overview
[Diagram: layers of the DOM architecture]
Upper layer: Resource Discovery (Non-cat, ILS based RD, Others), Rights Management, LDEP, Doc supply
Objects are referenced by DOMID
DOM Storage Service: Compound objects/relations, Integrity, Authenticity, Local resource locator
Atomic Objects: each has a unique persistent identifier (DOMID)
DOMID is mapped to node/vol/LRL
DOM Physical Storage
28
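The DOMID-to-location mapping on this slide can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `Location` and `DomIdResolver` classes, the in-memory table, and the `dom:0001` identifier format do not come from the slides. The point it shows is that the persistent identifier stays fixed while the node/vol/LRL it maps to can change as physical storage is replaced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    """Physical placement of one atomic object (all fields hypothetical)."""
    node: str    # storage node holding the object
    volume: str  # volume on that node
    lrl: str     # local resource locator within the volume


class DomIdResolver:
    """Maps persistent DOMIDs to their current physical location."""

    def __init__(self):
        self._table = {}

    def register(self, dom_id, location):
        if dom_id in self._table:
            raise ValueError(f"DOMID already registered: {dom_id}")
        self._table[dom_id] = location

    def relocate(self, dom_id, new_location):
        # The DOMID stays the same; only the mapping changes.
        if dom_id not in self._table:
            raise KeyError(dom_id)
        self._table[dom_id] = new_location

    def resolve(self, dom_id):
        return self._table[dom_id]


resolver = DomIdResolver()
resolver.register("dom:0001", Location("node-a", "vol-01", "obj/0001.tif"))
# Migrating the object to newer storage leaves its identifier untouched.
resolver.relocate("dom:0001", Location("node-b", "vol-07", "obj/0001.tif"))
```

Callers such as resource discovery or document supply would hold only the DOMID and call `resolve` when they need the object, which is what keeps them independent of any one generation of storage.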
DOM System (release 3)
[Diagram labels: Aleph, Storage subsystem (shown twice), Access, Mailroom, Publishers, Shared services, Administration, DOM System]
29
DOM logical architecture –
integrity and authenticity
Integrity:
System has capability to continuously monitor the object store to detect
object corruption
It would then initiate object recovery
Authenticity:
A process is defined to provide long-term assurance that an object that
is re-presented is as it was when it was ingested
Based on the use of cryptographic signing techniques
Each object is signed when it is ingested
The signature is verified when required
The signing mechanism is “tightly” controlled
30
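The sign-on-ingest, verify-on-demand and monitor-for-corruption steps described on this slide might look roughly like the sketch below. It is only an illustration under stated assumptions: the slides do not name an algorithm, so a keyed SHA-256 digest (HMAC) from the Python standard library stands in for the "cryptographic signing techniques", and `ObjectStore`, `SIGNING_KEY` and the object ids are hypothetical names. A production system would more likely use asymmetric signatures with the private key held under tight control.

```python
import hashlib
import hmac

# Hypothetical demo key; a real signing key would be tightly controlled.
SIGNING_KEY = b"demo-key-not-for-production"


def sign(data: bytes) -> str:
    """Return a keyed SHA-256 signature for the object's bit stream."""
    return hmac.new(SIGNING_KEY, data, hashlib.sha256).hexdigest()


def verify(data: bytes, signature: str) -> bool:
    """Check that the bit stream still matches the signature made at ingest."""
    return hmac.compare_digest(sign(data), signature)


class ObjectStore:
    def __init__(self):
        self._objects = {}     # object id -> bytes
        self._signatures = {}  # object id -> signature recorded at ingest

    def ingest(self, object_id: str, data: bytes) -> None:
        # Each object is signed when it is ingested.
        self._objects[object_id] = data
        self._signatures[object_id] = sign(data)

    def corrupted(self) -> list:
        """One monitoring pass: ids whose stored bytes no longer verify."""
        return [oid for oid, data in self._objects.items()
                if not verify(data, self._signatures[oid])]


store = ObjectStore()
store.ingest("dom:0001", b"page image bytes")
store.ingest("dom:0002", b"sound recording bytes")
store._objects["dom:0002"] = b"sound recording bytez"  # simulate bit rot
```

Running `store.corrupted()` on a schedule would give the continuous monitoring the slide mentions; recovery would then copy a good replica over each object it reports.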
Procuring physical storage in volume
A major cost is in physical storage
The market for storage systems is changing rapidly, and this implies that “lock-in” is
not sensible
We thus need flexibility to change supplier over time
Cost of storage is reducing by 30-40% per year
Hence procure on rolling basis just ahead of demand
Replace storage on a rolling basis on expiry of warranty
The rolling programmes imply the need to be able to support a heterogeneous
product solution
The design of the logical architecture thus supports storage sourced from multiple
storage vendors
31
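The case for buying storage "just ahead of demand" can be checked with a small worked calculation. The demand profile (100 TB per year for five years) and the starting price (£1,000 per TB) are illustrative assumptions, not figures from the slides; only the 30-40% annual price decline comes from the text. The sketch ignores warranty replacement, installation costs and the time value of money.

```python
def upfront_cost(yearly_demand_tb, price_per_tb):
    """Buy all capacity in year 0 at today's price."""
    return sum(yearly_demand_tb) * price_per_tb


def rolling_cost(yearly_demand_tb, price_per_tb, annual_decline):
    """Buy each year's capacity in that year, at that year's price."""
    total = 0.0
    for year, demand in enumerate(yearly_demand_tb):
        total += demand * price_per_tb * (1 - annual_decline) ** year
    return total


demand = [100, 100, 100, 100, 100]  # TB needed per year (illustrative)
price = 1000.0                      # pounds per TB in year 0 (illustrative)

for decline in (0.30, 0.40):
    saving = 1 - rolling_cost(demand, price, decline) / upfront_cost(demand, price)
    print(f"decline {decline:.0%}: rolling purchase saves {saving:.0%}")
```

Under these assumptions the rolling approach costs roughly half as much as a single up-front purchase, which is the economic argument behind supporting several storage vendors and generations side by side.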
Disaster tolerance and the organisation
of storage clusters
One can obtain commercial disaster recovery (DR) solutions for common equipment
configurations
However, one cannot obtain such solutions for systems of several hundred terabytes
So we must build DR into the design of the system
A single site solution, subject to a common-mode disaster, would suffer considerable loss
of availability after a disaster, and so is not acceptable
This implies that we need a multi-site solution
Conventionally these are based on a master-standby model, where only 50% of the kit is delivering
normal service
Our design is based on the use of multiple autonomous independent peer clusters that
cross-synchronise
so 100% of the kit delivers normal service
32
DOM architecture in the context of the
storage solution market
The dominant segment of the market focuses on delivering performance within
a highly resilient single cluster
However:
Many of our objects will be rarely accessed
so we do not want to pay for “maximised” performance we do not need
We have resilience by using multiple clusters, hence we have a reduced
need for resilience within a cluster
so we do not want to pay for “maximised” resilience we do not need
We are using these drivers to design a cost-effective large scale resilient
solution
33
DOM storage subsystem architecture - overview
[Diagram: two parallel stacks, each with a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage (a storage cluster); both use DOM Shared Services]
DOM Shared Services:
• Unique ID
• Signing
• Logging
34
DOM storage subsystem architecture - access
Normal access/delivery is from local storage cluster
[Diagram: two stacks of DOM Storage gateway, DOM Storage Service and DOM Physical Storage (storage cluster), plus DOM Shared central Services: Unique ID, Signing, Logging]
35
DOM storage subsystem architecture - access
When a cluster is off-line then access/delivery is from a remote storage cluster
[Diagram: two stacks of DOM Storage gateway, DOM Storage Service and DOM Physical Storage (storage cluster), plus DOM Shared central Services: Unique ID, Signing, Logging]
36
DOM storage subsystem architecture - ingest
Normal ingest is to the local storage cluster and then the remote cluster is synchronised
[Diagram: two stacks of DOM Storage gateway, DOM Storage Service and DOM Physical Storage (storage cluster), plus DOM Shared central Services: Unique ID, Signing, Logging]
Arrows: Store; Synchronise remote store
37
DOM storage subsystem architecture - ingest
When a cluster is off-line then ingest is managed by the remote storage cluster and the local cluster is synchronised later
[Diagram: two stacks of DOM Storage gateway, DOM Storage Service and DOM Physical Storage (storage cluster), plus DOM Shared central Services: Unique ID, Signing, Logging]
Arrows: Store; Synchronise remote store later
38
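The access and ingest behaviour on the four preceding slides (serve from the local cluster, fall back to a remote peer when it is off-line, synchronise the other cluster afterwards) can be sketched as follows. This is a simplified model under stated assumptions, not the BL design: `Cluster`, `Gateway`, the in-memory dictionaries and the explicit `resync` call are all hypothetical, and synchronisation here is immediate rather than asynchronous.

```python
class Cluster:
    """One autonomous storage cluster (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.objects = {}   # object id -> bytes
        self.pending = []   # (peer, object id) awaiting synchronisation

    def store(self, object_id, data, peers):
        self.objects[object_id] = data
        for peer in peers:
            if peer.online:
                peer.objects[object_id] = data            # synchronise now
            else:
                self.pending.append((peer, object_id))    # synchronise later

    def resync(self):
        """Push queued objects to peers that have come back on-line."""
        still_pending = []
        for peer, object_id in self.pending:
            if peer.online:
                peer.objects[object_id] = self.objects[object_id]
            else:
                still_pending.append((peer, object_id))
        self.pending = still_pending


class Gateway:
    """Routes requests to the local cluster, falling back to a remote peer."""

    def __init__(self, local, remotes):
        self.local = local
        self.remotes = remotes

    def _first_online(self):
        for cluster in [self.local] + self.remotes:
            if cluster.online:
                return cluster
        raise RuntimeError("no storage cluster is on-line")

    def ingest(self, object_id, data):
        target = self._first_online()
        peers = [c for c in [self.local] + self.remotes if c is not target]
        target.store(object_id, data, peers)
        return target.name

    def fetch(self, object_id):
        return self._first_online().objects[object_id]


site_a, site_b = Cluster("A"), Cluster("B")
gateway = Gateway(local=site_a, remotes=[site_b])

gateway.ingest("dom:0001", b"first")   # normal case: stored at A, copied to B
site_a.online = False
gateway.ingest("dom:0002", b"second")  # A is off-line: B takes the ingest
site_a.online = True
site_b.resync()                        # A catches up later
```

Because both clusters accept reads and writes in normal operation, all of the equipment does useful work, in contrast to a master-standby pair. A fuller version would also have to handle a fetch that reaches a cluster before its resynchronisation has completed.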
In conclusion
We plan for generations of physical storage
Migration from one generation to the next
Allow changes of supplier
Purchase incrementally in modest quantities
Move quickly when required
Be cost conscious
We provide assurance that an object is held and re-presented as it was when it was
ingested
We are designing a cost-effective large scale resilient solution
In summary: we take a long term view
39
Major Programmes/4
Web Archiving Programme
40
Structure of Programme
Web Archiving Programme is a collaborative
initiative, implemented broadly across two
consortia
UK Web Archiving Consortium
Developing a selective approach to web archiving, procuring a common
web archiving infrastructure and software to begin archiving activities at
the earliest opportunity
International Internet Preservation Consortium
Developing advanced web archiving technologies for the long-term,
large-scale, continuous crawling requirements enabled through
legislation
41
UK Web Archiving Consortium
Developing a selective approach to web archiving
Licence for PANDAS about to be signed with NLA
Sub-licences with consortium partners and contractor to follow
ITT concluded with Magus Research winning the contract.
Implement a common web archiving infrastructure (lots of Linux
machines + PANDAS)
Provide customisation/development of PANDAS
Provide help desk and support
42
International Internet Preservation
Consortium
Developing advanced web archiving technologies
Smart Crawler
Continuous adaptive crawler, adjusting crawl priority on the fly
Based on IA Heritrix
Working on requirements now
Expect to begin tender process in June
Content Management
Archival formats
Framework
Metrics and Test Bed
43
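The phrase "adjusting crawl priority on the fly" on the Smart Crawler slide can be illustrated with a toy scheduler. This is not how Heritrix or the planned Smart Crawler works; the halve-or-double revisit policy, the `AdaptiveSchedule` class and the example URLs are assumptions chosen only to show the idea of a priority queue whose ordering reacts to what each visit finds.

```python
import heapq


class AdaptiveSchedule:
    """Revisit pages more often when they change, less often when they do not."""

    def __init__(self, min_interval=1.0, max_interval=64.0):
        self.min_interval = min_interval
        self.max_interval = max_interval
        self._queue = []      # heap of (next visit time, url)
        self._interval = {}   # url -> current revisit interval

    def add(self, url, now=0.0, interval=8.0):
        self._interval[url] = interval
        heapq.heappush(self._queue, (now + interval, url))

    def next_due(self):
        """Pop the page whose next visit time is earliest."""
        return heapq.heappop(self._queue)

    def report(self, url, visited_at, changed):
        """Adjust priority on the fly from what the last visit found."""
        interval = self._interval[url]
        interval = interval / 2 if changed else interval * 2
        interval = max(self.min_interval, min(self.max_interval, interval))
        self._interval[url] = interval
        heapq.heappush(self._queue, (visited_at + interval, url))


schedule = AdaptiveSchedule()
schedule.add("http://example.org/news")
schedule.add("http://example.org/about")

due_at, url = schedule.next_due()  # both are due at the same time; ties break on URL order
schedule.report(url, visited_at=due_at, changed=False)
```

After the unchanged page is reported, its revisit interval doubles, so the other page moves ahead of it in the queue.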
External Collaboration
44
Digital Library Collaborations/Partnerships
Current
UK Digital Preservation Coalition
Founder Member
TEL (The European Library Project)
Web Archiving UK
JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library Of Wales
International Internet Preservation Consortium
BNF, Library Of Congress, Internet Archive, National Archives & Library Of Canada, National Library Of
Australia, National Library Of Italy, National Libraries Of Nordic Countries
JISC Funded - Digital Curation Centre
Persistent Identifiers
DOI Foundation, European National Libraries (KB & DDB)
Resource Discovery
Union Catalogues (SUNCAT)
Digital Library Federation
45
Digital Library Collaborations/Partnerships
Potential
Secure Legal Deposit Network
6 Legal Deposit Libraries
Global Digital Format Registry
Potential Partners (National Archives, DLF)
Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
KB (Netherlands National Library & Other Partners – FP6 Bid)
Digital Rights Management
Potential Partners (Publishers, JISC)
Metadata
Publishers, Others ?
Authentication
JISC ?
Resource Discovery
Search Engine Vendors, Researchers
Others ???
46
Conclusions
• Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success
• Can You Work With Us?
47
Slide 43
British Computer Society
North London Branch
Major Programmes
Richard Boulderstone
July 27, 2004
1
Agenda
•
•
•
•
•
•
•
•
•
The British Library
Vision
Our Audiences/Customers
ILS
Digitisation
Digital Object Management
Web Archiving
Collaboration
Conclusions
Magna Carta
www.bl.uk
2
What Is The British Library ?
Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of
Science and Invention (1855), National Central Library (1916), and National
Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India
Office Library and Records in 1982, and British Institute of Recorded Sound in
1983
• Flagship building at St Pancras - largest public building project in Great
Britain in 20th century - opened in 1998
•
www.bl.uk
3
World-Class Research Library
Key Statistics 2002/3
• 150 million items
• 8.2 million items consulted or supplied
• 408,000 reading room visits
• 618,000 catalogue records created
• 554,000 items received on legal deposit
• 651 km shelf capacity, 92% full; adding 12 km each year
• 18.5M Web Site Hits (www.bl.uk)
• 2,400 staff
• £85.2 million Grant in Aid and £27.0 million trading income in 2001/2
• Annual report - http://www.bl.uk/about/annual/latest.html
www.bl.uk
4
‘The World’s Knowledge’
Outcome Based Vision
To help people advance knowledge to enrich lives
… by aiding scientific advances
… by adding commercial value for businesses
… by contributing to UK “knowledge economy”
… through the pursuit of academic excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural wellbeing
… by giving information relevant to their interests
… by helping to find the next medical breakthrough
… by creating a link between the past, present and future
www.bl.uk
5
[Diagram: Our Audiences/Customers, showing five audience groups and the services offered to each.]
Audience groups: Education, Public, Business, Researcher, Libraries
Audiences named: High R+D Industries, Prof. Services, Creative Industries, Publishing Industries, SMEs; School Libraries, Teachers, Students 11>18; Visitors (child + adult); Lifelong Learner; Postgraduate/Undergraduate, Scholars, Librarians, Commercial Researcher; Public Libraries, H.E. Libraries; Broadcasting (e.g. BBC); Publishing (e.g. OED)
Services named: Resource Discovery, Bespoke Services, Research Services, Document Supply, Reprographics, Innovation Centre, On-site Visits, School Tours, Web Learning, Exhibitions, Events, Tours, Publishing, Reading Rooms, Searching Tools, Training, Best Practice
www.bl.uk
6
Major Programmes/1
Da Vinci Notebook
Integrated Library System (ILS)
Programme
7
ILS: Development
Data migration
Due to finish in a few days
16M+ BL records
10M+ records from other sources
Online ILS software
All online changes made (mainly interfaces) – final tests
Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
Most done for go-live
Rest in priority order
www.bl.uk
8
ILS: Implementation
Training
Courses to end-users well underway
‘Practice’ system available
‘Search only’ training also underway
Testing
Functional testing (end to end) nearly complete
Performance poor – OPAC very slow
Automated stress testing (LoadRunner scripts)
eIS trying to locate the problem area
Ex Libris experts flying over
Some security ‘hardening’ needed
www.bl.uk
9
ILS: Cutover from legacy systems
Now:
Temporary Aleph cataloguing
7 June:
Phase 1 – internal processing
Staggered take-on of users to ease cutover problems
Merge ‘temporary’ records
30 June:
Phase 2 – reading rooms
Reading rooms closed for cutover 26-29 June
Mainly brand-new PCs etc rather than XP upgrade
30 July:
Phase 3 – remote users
Could be delayed if there are major problems
www.bl.uk
10
www.bl.uk
11
Future ILS development (ILS/2)
Current ILS development seen just as the start
Extra records
E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
E.g. Preservation records
Links to other new BL systems
E.g. Digital Object Management (images, web pages etc)
New releases of Ex Libris packages
www.bl.uk
12
Major Programmes/2
International Dunhuang Project
Digitisation Programme
13
Background
Digitisation Is The Process Of Converting Existing Physical Items Into Digital
Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character
Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image to Sub £1/Image.
Automatic Devices To Digitise Are Becoming Available But Are Expensive ~£150,000.
BL Has Had Fairly Ad Hoc Approach Driven By
External Funding Opportunities
Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project
Management
BL Has Created About 1.5M Digital Images So Far…
www.bl.uk
14
Digitisation Strategy
Digitisation Strategy Project Was Formally Initiated On February 2, 2004
Key objectives for the project are to define:
Selection Criteria
Uniform Approach
Communications Plan
Sustainability
Intellectual Property Rights
External Relationship Management
Funding
Integration with DOMS
www.bl.uk
15
Project Status Information
Definitive Register of Projects
19 Complete
19 Current
20 Planning
JISC Sound (3,900 Hours)
JISC Newspapers (2M Pages of 750M Pages)
Chopin (Collaborative Project)
Early English Books Online
www.bl.uk
16
Major Programmes/3
Gutenberg Bible
Digital Object Management (DOM)
Programme
17
DOM Programme vision
Our mission is to enable the United Kingdom to preserve
and use its digital intellectual property forever
Our vision is to create a management system for digital objects
that will
store and preserve any type of digital material in perpetuity
provide access to this material to users with appropriate permissions
ensure that the material is easy to find
ensure that users can view the material with contemporary applications
ensure that users can, where possible, experience material with the original look-and-feel
www.bl.uk
18
Introduction - history
Digital Library PFI
Mar 1997 – Dec 1998
Digital Library System
1999 – early 2002
Lessons
DOM Report
Nov 2002
The DOM Programme
Started September 2003
www.bl.uk
19
Drivers for the BL DOM Programme
Legal deposit legislation for non-print material was granted royal assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12T of material per year (with 50 year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….
We need a generic and cost-effective approach for the secure long term
storage of digital material that is produced by numerous initiatives
www.bl.uk
20
DOM – many topics to address
[Chart: DOM components plotted by estimated size of component (low to high) against component ambiguity / complexity (low to high). Legend: Started, Planned, Non-DOM projects, Planned co-operation.]
Components shown: ILS, Web Archiving, Digitisation Programme, LDEP, Workflow, Resource Discovery, LDLSE, VDEP, Rights Management, Metadata Definition, Technical Requirements, File Format Registry, Persistent Identifiers, File Conversion Utilities, Authentication, Prototypes, SDM, RADM, Interfaces, Strategy Development
LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications
www.bl.uk
21
Scope - life cycle of objects
Collection
Selection
Acquisition
Accession
Description
Preservation
Storage
Preservation
Access
Resource discovery
Delivery
Rendering
www.bl.uk
22
Scope – objects and processes
Preservation store
Preserves the bit stream in perpetuity
Access store
Access versions
Limited formats – in the flavour of the era
Metadata to support resource discovery
Descriptive, Administrative, Links with existing tools e.g. Integrated
Library System (ILS)
Workflow
Ingest, e.g. Legal Deposit processing
www.bl.uk
23
[Diagram: DOM and the content streams that feed it.]
Access: Resource Discovery, Delivery, Digital Rights Management
DOM shared services: Storage, Signing, Authentication, Metadata, Persistent ID
Ingest streams: Donations, Document Supply, Legal Deposit, Web Archiving, Digitisation
Other labels: Publishers, Archives, Grey Literature, Non-Serial Store, Archiving, Operational Stores, LDL Secure Environment, Legal Deposit Items, Legal Deposit Processing, St Pancras Studios, Newspapers, NSA
www.bl.uk
24
Timeline
BC: ET approve Business Case & Timeline
R0 (Definition): Prototype will provide a basic preservation-quality digital object storage module
R1 (Operat’l Storage Sub System):
• Consolidate R0 into operational system
• Provide preservation-quality digital store for materials received under Voluntary Deposit of Electronic Publications (VDEP)
• Integrate it with the existing VDEP front-end
R2 (1st Content Stream ingest):
• Support ingest for a major content stream
• Integrate with core Library systems as required
R3 & R4 (LDEP - initial format): Provide functionality for material covered by LDEP secondary legislation
R5+: Open DOM to new projects
Timeline axis: 2003, 2004, 2005
www.bl.uk
25
DOM: Project definition - 1
Functional Architecture “What”
Prototyping – assessing market solutions
Example issues: digital rights, file formats, etc
Logical architecture “how – overall architecture”
Prototyping - basic functioning architecture
Example issues: allow changes to new suppliers, relationships to ILS, other projects etc
Physical architecture “how – storage & specifics”
Prototyping - principal solutions and options
Example issues: how do we build it cost-effectively today, supplier selection criteria
• Business case
• Planning – incremental implementation phases
Cross team workshops – reviewing progress, debating detailed technical
issues, planning immediate priorities, risk management & way forward
www.bl.uk
26
DOM: Project definition - 2
Approach is to be incremental and not ‘Big Bang’
We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a
good solution
A principal goal is to define:
An overall long term “logical architecture”
Within which, there will be successive generations of physical architectures
We are building an understanding of the storage marketplace, and we will use that knowledge to manage
procurement
We are certain that we will need >500T of storage but we are uncertain when – we thus need
flexible scalable procurement
www.bl.uk
27
DOM architecture - overview
[Diagram: layered DOM architecture, from client services down to physical storage.]
Resource Discovery: ILS based RD, Non-cat RD, Others
Other clients: Rights Management, LDEP, Doc supply
DOM Storage Service: Compound objects/relations, Atomic Objects, Integrity, Authenticity
Each object has a unique persistent identifier (DOMID)
DOMID is mapped to node/vol/LRL (Local resource locator)
DOM Physical Storage
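The mapping from a persistent DOMID to node/vol/LRL can be illustrated with a small lookup table. This is only a sketch under stated assumptions: the class names, the identifier format and the example values are hypothetical and are not taken from the DOM design.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Location:
    """Where an object physically lives (all field values are hypothetical)."""
    node: str    # storage node holding the object
    volume: str  # volume on that node
    lrl: str     # local resource locator within the volume


class DomidResolver:
    """Maps a persistent DOMID to its current physical location."""

    def __init__(self) -> None:
        self._table: Dict[str, Location] = {}

    def register(self, domid: str, location: Location) -> None:
        self._table[domid] = location

    def relocate(self, domid: str, new_location: Location) -> None:
        # The DOMID never changes; only the mapping is updated, for example
        # when an object moves to a new generation of physical storage.
        if domid not in self._table:
            raise KeyError(domid)
        self._table[domid] = new_location

    def resolve(self, domid: str) -> Optional[Location]:
        return self._table.get(domid)


resolver = DomidResolver()
resolver.register("domid-0001", Location("node-a", "vol-01", "obj/0001"))
resolver.relocate("domid-0001", Location("node-b", "vol-07", "obj/0001"))
```

Because callers hold only the DOMID, the physical location can change without any client needing to be updated.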
www.bl.uk
28
DOM System (release 3)
[Diagram: DOM System at release 3. Labelled components: Aleph, Mailroom, Publishers, Access, Storage subsystem (shown twice), Shared services, Administration.]
www.bl.uk
29
DOM logical architecture –
integrity and authenticity
Integrity:
System has capability to continuously monitor the object store to detect
object corruption
It would then initiate object recovery
Authenticity:
A process is defined to provide long-term assurance that an object that
is re-presented is as it was when it was ingested
Based on the use of cryptographic signing techniques
Each object is signed when it is ingested
The signature is verified when required
The signing mechanism is “tightly” controlled
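The sign-at-ingest, verify-on-demand cycle and the integrity audit can be sketched as follows. This is an illustration only: HMAC-SHA256 from the Python standard library stands in for whatever cryptographic signing technique the programme selects, and the key, class and object names are hypothetical.

```python
import hashlib
import hmac
from typing import Dict, List, Tuple

# Hypothetical key; in practice the signing mechanism is tightly controlled.
SIGNING_KEY = b"demo-key-held-by-signing-service"


def sign(data: bytes, key: bytes = SIGNING_KEY) -> str:
    """Return a keyed signature over the object's bit stream."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify(data: bytes, signature: str, key: bytes = SIGNING_KEY) -> bool:
    """Check that the bit stream still matches the signature made at ingest."""
    return hmac.compare_digest(sign(data, key), signature)


class ObjectStore:
    def __init__(self) -> None:
        self._objects: Dict[str, Tuple[bytes, str]] = {}

    def ingest(self, object_id: str, data: bytes) -> None:
        # Each object is signed when it is ingested.
        self._objects[object_id] = (data, sign(data))

    def read(self, object_id: str) -> bytes:
        # The signature is verified before the object is re-presented.
        data, signature = self._objects[object_id]
        if not verify(data, signature):
            raise ValueError(f"object {object_id} failed verification")
        return data

    def audit(self) -> List[str]:
        """One pass of integrity monitoring: IDs of corrupted objects."""
        return [oid for oid, (data, sig) in self._objects.items()
                if not verify(data, sig)]


store = ObjectStore()
store.ingest("obj-1", b"Magna Carta scan")
store.ingest("obj-2", b"Sound recording")
# Simulate silent corruption of the stored bit stream.
store._objects["obj-2"] = (b"Sound recXrding", store._objects["obj-2"][1])
corrupted = store.audit()
```

An object flagged by the audit would then be recovered from another copy, which is where the multi-cluster design comes in.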
www.bl.uk
30
Procuring physical storage in volume
A major cost is in physical storage
The market for storage systems is changing rapidly, and this implies that “lock-in” is
not sensible
We thus need flexibility to change supplier over time
Cost of storage is reducing by 30-40% per year
Hence procure on rolling basis just ahead of demand
Replace storage on a rolling basis on expiry of warranty
The rolling programmes imply the need to be able to support a heterogeneous
product solution
The design of the logical architecture thus supports storage sourced from multiple
storage vendors
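The case for buying just ahead of demand can be checked with simple arithmetic. The figures below are assumptions chosen for illustration: a hypothetical starting price of 1000 currency units per TB, demand of 100 TB per year for five years, and a 35% annual price decline (the midpoint of the 30-40% range above). Replacement on warranty expiry is ignored.

```python
def upfront_cost(total_tb: float, price_per_tb: float) -> float:
    """Cost of buying all capacity at today's price."""
    return total_tb * price_per_tb


def rolling_cost(tb_per_year: float, years: int,
                 price_per_tb: float, annual_decline: float) -> float:
    """Cost of buying each year's capacity at that year's price."""
    cost = 0.0
    price = price_per_tb
    for _ in range(years):
        cost += tb_per_year * price
        price *= (1.0 - annual_decline)
    return cost


PRICE = 1000.0   # hypothetical price per TB in year 0
DECLINE = 0.35   # assumed midpoint of the 30-40% range
up = upfront_cost(500, PRICE)
roll = rolling_cost(100, 5, PRICE, DECLINE)
print(f"upfront: {up:.0f}, rolling: {roll:.0f}, saving: {1 - roll / up:.0%}")
```

Under these assumptions the rolling purchases cost roughly half as much as buying the full 500 TB in the first year.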
www.bl.uk
31
Disaster tolerance and the organisation
of storage clusters
One can obtain commercial disaster recovery (DR) solutions for common equipment
configurations
However, one cannot obtain such solutions for multi-100 TB systems
So we must build in the need for DR into the design of the system
A single site solution, subject to a common-mode disaster, would suffer considerable loss
of availability after a disaster, and so is not acceptable
This implies that we need a multi-site solution
Conventionally these are based on a master-standby where only 50% of kit is delivering
normal service
Our design is based on the use of multiple autonomous independent peer clusters that
cross-synchronise
so 100% of the kit delivers normal service
www.bl.uk
32
DOM architecture in the context of the
storage solution market
The dominant segment of the market focuses on delivering performance within
a highly resilient single cluster
However:
Many of our objects will be rarely accessed
so we do not want to pay for “maximised” performance we do not need
We have resilience by using multiple clusters, hence we have a reduced
need for resilience within a cluster
so we do not want to pay for “maximised” resilience we do not need
We are using these drivers to design a cost-effective large scale resilient
solution
www.bl.uk
33
DOM storage subsystem architecture - overview
[Diagram: two storage clusters, each comprising a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage, with DOM Shared Services between them.]
DOM Shared Services:
• Unique ID
• Signing
• Logging
www.bl.uk
34
DOM storage subsystem architecture - access
Normal access/delivery is from local storage cluster
[Diagram: two storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical Storage) and the DOM shared central services (Unique ID, Signing, Logging).]
www.bl.uk
35
DOM storage subsystem architecture - access
When a cluster is off-line then access/delivery is from a remote storage cluster
[Diagram: two storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical Storage) and the DOM shared central services (Unique ID, Signing, Logging).]
www.bl.uk
36
DOM storage subsystem architecture - ingest
Normal ingest is to the local storage cluster and then the remote cluster is synchronised
[Diagram: two storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical Storage) and the DOM shared central services (Unique ID, Signing, Logging). Arrows: Store; Synchronise remote store.]
www.bl.uk
37
DOM storage subsystem architecture - ingest
When a cluster is off-line then ingest is managed by the remote storage cluster and the local cluster is synchronised later
[Diagram: two storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical Storage) and the DOM shared central services (Unique ID, Signing, Logging). Arrows: Store; Synchronise remote store later.]
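The access and ingest behaviour on the four architecture slides can be sketched as a toy simulation of two peer clusters. All class, method and site names are hypothetical, and the sketch ignores signing, logging and conflict handling.

```python
from typing import Dict, List, Optional


class Cluster:
    def __init__(self, name: str) -> None:
        self.name = name
        self.online = True
        self.objects: Dict[str, bytes] = {}
        self.pending: List[str] = []  # IDs still to be copied to the peer

    def store(self, object_id: str, data: bytes) -> None:
        self.objects[object_id] = data


def synchronise(source: Cluster, target: Cluster) -> None:
    """Copy pending objects to the peer; leave them queued if it is off-line."""
    if not target.online:
        return
    for object_id in source.pending:
        target.store(object_id, source.objects[object_id])
    source.pending.clear()


class Gateway:
    """Routes requests to the local cluster, falling back to the remote peer."""

    def __init__(self, local: Cluster, remote: Cluster) -> None:
        self.local = local
        self.remote = remote

    def ingest(self, object_id: str, data: bytes) -> str:
        primary, peer = ((self.local, self.remote) if self.local.online
                         else (self.remote, self.local))
        if not primary.online:
            raise RuntimeError("no storage cluster available")
        primary.store(object_id, data)
        primary.pending.append(object_id)
        synchronise(primary, peer)
        return primary.name

    def fetch(self, object_id: str) -> Optional[bytes]:
        for cluster in (self.local, self.remote):
            if cluster.online and object_id in cluster.objects:
                return cluster.objects[object_id]
        return None


a, b = Cluster("site-A"), Cluster("site-B")
gateway = Gateway(local=a, remote=b)
gateway.ingest("obj-1", b"first")                # stored on A, then synced to B
a.online = False
handled_by = gateway.ingest("obj-2", b"second")  # A off-line: B handles ingest
a.online = True
synchronise(b, a)                                # A catches up later
```

Both clusters serve normal traffic when healthy, which is the contrast with a master-standby arrangement in which half the equipment sits idle.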
www.bl.uk
38
In conclusion
We plan for generations of physical storage
Migration from one generation to the next
Allow changes of supplier
Purchase incrementally in modest quantities
Move quickly when required
Be cost conscious
We provide assurance that an object is held and re-presented as when it was
ingested
We are designing a cost-effective large scale resilient solution
In summary: we take a long term view
www.bl.uk
39
Major Programmes/4
Web Archiving Programme
40
Structure of Programme
Web Archiving Programme is a collaborative
initiative, roughly implemented across two
consortiums
UK Web Archiving Consortium
Developing a selective approach to web archiving, procuring a common
web archiving infrastructure and software to begin archiving activities at
the earliest opportunity
International Internet Preservation Consortium
Developing advanced web archiving technologies for the long-term,
large-scale, continuous crawling requirements enabled through
legislation
www.bl.uk
41
UK Web Archiving Consortium
Developing a selective approach to web archiving
License for PANDAS about to be signed with NLA
Sub-licenses with consortium partners and contractor to follow
ITT concluded with Magus Research winning the contract.
Implement a common web archiving infrastructure (lots of Linux
machines + PANDAS)
Provide customisation/development of PANDAS
Provide help desk and support
www.bl.uk
42
International Internet Preservation
Consortium
Developing advanced web archiving technologies
Smart Crawler
Continuous adaptive crawler, adjusting crawl priority on the fly
Based on IA Heritrix
Working on requirements now
Expect to begin tender process in June
Content Management
Archival formats
Framework
Metrics and Test Bed
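The idea of adjusting crawl priority on the fly can be illustrated with a priority-queue frontier. This is a generic sketch, not the Heritrix API or the consortium's design: the class name, the priority scale and the example URLs are all hypothetical.

```python
import heapq
import itertools
from typing import Dict, List, Optional, Tuple


class CrawlFrontier:
    """Priority queue of URLs whose priorities can change during the crawl."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, str]] = []
        self._current: Dict[str, float] = {}  # live priority per URL
        self._counter = itertools.count()     # tie-breaker for stable ordering

    def add(self, url: str, priority: float) -> None:
        """Add a URL or re-prioritise it; lower numbers are crawled sooner."""
        self._current[url] = priority
        heapq.heappush(self._heap, (priority, next(self._counter), url))

    def pop(self) -> Optional[str]:
        while self._heap:
            priority, _, url = heapq.heappop(self._heap)
            # Skip stale entries left behind by an earlier re-prioritisation.
            if self._current.get(url) == priority:
                del self._current[url]
                return url
        return None


frontier = CrawlFrontier()
frontier.add("http://example.org/news", 5.0)
frontier.add("http://example.org/archive", 9.0)
frontier.add("http://example.org/about", 7.0)
# The archive page is observed to change often, so its priority is raised.
frontier.add("http://example.org/archive", 1.0)
order = [frontier.pop(), frontier.pop(), frontier.pop(), frontier.pop()]
```

Re-prioritising pushes a fresh entry rather than searching the heap, so an adjustment costs the same as an ordinary insert.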
www.bl.uk
43
External Collaboration
44
Digital Library Collaborations/Partnerships
Current
UK Digital Preservation Coalition
Founder Member
TEL (The European Library Project)
Web Archiving UK
JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library Of Wales
International Internet Preservation Consortium
BNF, Library Of Congress, Internet Archive, National Archives & Library Of Canada, National Library Of
Australia, National Library Of Italy, National Libraries Of Nordic Countries
JISC Funded - Digital Curation Centre
Persistent Identifiers
DOI Foundation, European National Libraries (KB & DDB)
Resource Discovery
Union Catalogues (SUNCAT)
Digital Library Federation
www.bl.uk
45
Digital Library Collaborations/Partnerships
Potential
Secure Legal Deposit Network
6 Legal Deposit Libraries
Global Digital Format Registry
Potential Partners (National Archives, DLF)
Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
KB (Netherlands National Library & Other Partners – FP6 Bid)
Digital Rights Management
Potential Partners (Publishers, JISC)
Metadata
Publishers, Others ?
Authentication
JISC ?
Resource Discovery
Search Engine Vendors, Researchers
Others ???
www.bl.uk
46
Conclusions
• Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success
• Can You Work With Us?
www.bl.uk
47
Slide 44
British Computer Society
North London Branch
Major Programmes
Richard Boulderstone
July 27, 2004
1
Agenda
•
•
•
•
•
•
•
•
•
The British Library
Vision
Our Audiences/Customers
ILS
Digitisation
Digital Object Management
Web Archiving
Collaboration
Conclusions
Magna Carta
www.bl.uk
2
What Is The British Library ?
Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of
Science and Invention (1855), National Central Library (1916), and National
Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India
Office Library and Records in 1982, and British Institute of Recorded Sound in
1983
• Flagship building at St Pancras - largest public building project in Great
Britain in 20th century - opened in 1998
•
www.bl.uk
3
World-Class Research Library
Key Statistics 2002/3
•
•
•
•
•
•
•
•
•
•
150 million items
8.2 million items consulted or supplied
408,000 reading room visits
618,000 catalogue records created
554,000 items received on legal deposit
651 km shelf capacity 92% full add 12 km each year
18.5M Web Site Hits (www.bl.uk)
2,400 staff
£85.2 million Grant in Aid and £27.0 million trading income in 2001/2
Annual report - http://www.bl.uk/about/annual/latest.html
www.bl.uk
4
‘The World’s Knowledge’
Outcome Based Vision
… by aiding scientific advances
… by adding commercial value for
businesses
… by contributing to UK “knowledge
economy”
… through the pursuit of academic
excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural
wellbeing
… by giving information relevant to their
interests
… by helping to find the next medical
breakthrough
… by creating a link between the past,
present and future
To help people
advance
knowledge to
enrich lives
www.bl.uk
5
High
R+D
Industries
Prof.
Services
Creative
Industries Publishing
Industries
Resource Discovery
Bespoke Services
Research Services
Document Supply
Reprographics
Innovation Centre
School
Students
Libraries Teachers 11>18
On-site Visits
School Tours
Web Learning
SMEs
EDUCATION
Lifelong
Learner
Exhibitions
Events
Tours
Publishing
Visitors
(child + adult)
Lifelong
Learner
PUBLIC
BUSINESS
Lifelong
Learner
Postgraduate/
Undergraduate
RESEARCHER
Librarians
Scholars
Lifelong
Learner
Reading Rooms
Bespoke Services
Reprographics
Publishing
Document Supply
Searching Tools
LIBRARIES
Commercial
Researcher
Public
Libraries
Broadcasting
e.g. BBC
Publishing
e.g. OED
Document Supply
Resource Discovery
Training
Best Practice
H.E.
Libraries
Public
www.bl.uk
6
2
Major Programmes/1
Da Vinci Notebook
Integrated Library System (ILS)
Programme
7
ILS: Development
Data migration
Due to finish in a few days
16M+ BL records
10M+ records from other sources
Online ILS software
All online changes made (mainly interfaces) – final tests
Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
Most ones done for go live
Rest in priority order
www.bl.uk
8
ILS: Implementation
Training
Courses to end-users well underway
‘Practice’ system available
‘Search only’ training also underway
Testing
Functional testing (end to end) nearly complete
Performance poor – OPAC very slow
Automated stress testing (LoadRunner scripts)
eIS trying to find area of problem
Ex Libris experts flying over
Some security ‘hardening’ needed
www.bl.uk
9
ILS: Cutover from legacy systems
Now:
Temporary Aleph cataloguing
Phase 1 – internal processing
Staggered take-on of users to ease cutover problems
Merge ‘temporary’ records
7 June:
Phase 2 – reading rooms
Reading rooms closed for cutover 26-29 June
Mainly brand-new PCs etc rather than XP upgrade
30 June:
Phase 3 – remote users
Could be delayed major problems
30 July:
www.bl.uk
10
www.bl.uk
11
Future ILS development (ILS/2)
Current ILS development seen just as the start
Extra records
E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
E.g. Preservation records
Links to other new BL systems
E.g. Digital Object Management (images, web pages etc)
New releases of Ex Libris packages
www.bl.uk
12
Major Programmes/2
International Dunhuang Project
Digitisation Programme
13
Background
Digitisation Is The Process Of Converting Existing Physical Items Into Digital
Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character
Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image to Sub £1/Image.
Automatic Devices To Digitise Are Becoming Availalble But Are Expensive ~£150,000.
BL Has Had Fairly Ad Hoc Approach Driven By
External Funding Opportunities
Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project
Management
BL Has Created About 1.5M Digital Images So Far…
www.bl.uk
14
Digitisation Strategy
Digitisation Strategy Project Was Formerly Initiated On February 2, 2004
Key objectives for the project are to define:
Selection Criteria
Uniform Approach
Communications Plan
Sustainability
Intellectual Property Rights
External Relationship Management
Funding
Integration with DOMS
www.bl.uk
15
Project Status Information
Definitive Register of Projects
19 Complete
19 Current
20 Planning
JISC Sound (3,900 Hours)
JISC Newspapers (2M Pages of 750M Pages)
Chopin (Collaborative Project)
Early English Books Online
www.bl.uk
16
Major Programmes/3
Gutenberg Bible
Digital Object Management (DOM)
Programme
17
DOM Programme vision
Our mission is to enable the United Kingdom to preserve
and use its digital intellectual property forever
Our vision is create a management system for digital objects
that will
store and preserve any type of digital material in perpetuity
provide access to this material to users with appropriate permissions
ensure that the material is easy to find
ensure that users can view the material with contemporary applications
ensure that users can, where possible, experience material with the original look-and-feel
www.bl.uk
18
Introduction - history
Digital Library PFI
Mar 1997 – Dec 1998
Digital Library System
1999 – early 2002
Lessons
DOM Report
Nov 2002
The DOM Programme
Started September 2003
www.bl.uk
19
Drivers for the BL DOM Programme
Legal deposit legislation for non-print material was granted royal assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12T of material per year (with 50 year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….
We need a generic and cost-effective approach for the secure long term
storage of digital material that is produced by numerous initiatives
www.bl.uk
20
DOM – many topics to address
HIGH
ILS
WEB
ARCHIVING
DIGITISATION
PROGRAMME
ESTIMATED SIZE OF COMPONENT
LDEP
WORKFLOW
RESOURCE
DISCOVERY
LDLSE
VDEP
RIGHTS
MANAGEMENT
METADATA
DEFINITION
TECHNICAL
REQUIREMENTS
FILE
FORMAT
REGISTRY
PERSISTENT
IDENTIFIERS
FILE
CONVERSION
UTILITIES
AUTHENTICATION
PROTOTYPES
SDM
RADM
INTERFACES
STRATEGY
DEVELOPMENT
LOW
LOW
COMPONENT AMBIGUITY / COMPLEXITY
Started
Planned
Non-DOM projects
Planned co-operation
HIGH
LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications
www.bl.uk
21
Scope - life cycle of objects
Collection
Selection
Acquisition
Accession
Description
Preservation
Storage
Preservation
Access
Resource discovery
Delivery
Rendering
www.bl.uk
22
Scope – objects and processes
Preservation store
Preserves the bit stream in perpetuity
Access store
Access versions
Limited formats – in the flavour of the era
Metadata to support resource discovery
Descriptive, Administrative, Links with existing tools e.g. Integrated
Library System (ILS)
Workflow
Ingest, e.g. Legal Deposit processing
www.bl.uk
23
ACCESS
DOM
Resource Discovery
Delivery
Digital Rights Management
Shared services
DOM
Storage
Signing
Authentication
Metadata
Persistent ID
Ingest
DONATIONS
DOCUMENT SUPPLY
Publishers
Archives
Grey Literature
Non-Serial
Store
Archiving
Operational
Stores
LEGAL DEPOSIT
LDL Secure
Environment
Legal Deposit
Items
Legal Deposit
Processing
WEB
ARCHIVING
DIGITISATION
St Pancras
Studios
Newspapers
NSA
www.bl.uk
24
Timeline
Prototype will provide a
basic preservationquality digital object
storage module
Definition. R0
ET approve
Business Case
& Timeline
•Consolidate R0 into operational
system
•Provide preservation-quality digital
store for materials received under
Voluntary Deposit of Electronic
Publications (VDEP)
•Integrate it with the existing VDEP
front-end
BC
R0
Operat’l Storage
Sub System. R1
•Support ingest for a
major content stream
•Integrate with core
Library systems as
required
R1
1st Content
Stream ingest. R2
R2
LDEP - initial
format. R3 & R4
Provide functionality
for material covered by
LDEP secondary
legislation
R3
R4
Open DOM to new
projects. R5+
2003
2004
2005
www.bl.uk
25
DOM: Project definition - 1
Example issues
Functional Architecture “What”
digital rights, file
formats, etc
Prototyping – assessing market solutions
allow changes to
new suppliers,
relationships to ILS,
other projects etc
Logical architecture “how – overall architecture”
Prototyping - basic functioning architecture
how do we build it
cost-effectively
today, supplier
selection criteria
Physical architecture “how – storage & specifics”
Prototyping - principal solutions and options
• Business case
• Planning – incremental
implementation phases
Cross team workshops – reviewing progress, debating detailed technical
issues, planning immediate priorities, risk management & way forward
www.bl.uk
26
DOM: Project definition - 2
Approach is to be incremental and not ‘Big Bang’
We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a
good solution
A principal goal is to define:
An overall long term “logical architecture”
Within which, there will be successive generations of physical architectures
We are understanding the storage marketplace, and we will use the knowledge to manage
procurement
We are certain that we will need >500T of storage but we are uncertain when – we thus need
flexible scalable procurement
www.bl.uk
27
DOM architecture - overview
Resource Discovery
Non-cat
ILS
based RD
Others
Rights Management
LDEP
DOMID
OBJECT
DOM Storage Service
Compound
objects/relations
Integrity
Authenticity
Local resource locator
Object
Atomic
Objects
Unique persistent
identifier (DOMID)
Doc
supply
DOMID is
mapped to
node/vol/LRL
DOM Physical Storage
www.bl.uk
28
DOM System (release 3)
Aleph
Storage subsystem
Storage
subsystem
Access
Mailroom
Publishers
Shared services
Administration
DOM System
www.bl.uk
29
DOM logical architecture –
integrity and authenticity
Integrity:
System has capability to continuously monitor the object store to detect
object corruption
It would then initiate object recovery
Authenticity:
A process is defined to provide long-term assurance that an object that
is re-presented is as it was when it was ingested
Based on the use of cryptographic signing techniques
Each object is signed when it is ingested
The signature is verified when required
The signing mechanism is “tightly” controlled
www.bl.uk
30
Procuring physical storage in volume
A major cost is in physical storage
The market for storage systems is changing rapidly, and this implies that “lock-in” is
not sensible
We thus need flexibility to change supplier over time
Cost of storage is reducing by 30-40% per year
Hence procure on rolling basis just ahead of demand
Replace storage on a rolling basis on expiry of warranty
The rolling programmes imply the need to be able to support a heterogeneous
product solution
The design of the logical architecture thus supports storage sourced from multiple
storage vendors
www.bl.uk
31
Disaster tolerance and the organisation
of storage clusters
One can obtain commercial disaster recovery (DR) solutions for common equipment
configurations
However one cannot obtain such solutions for systems comprising multi-100 Tb systems
So we must build in the need for DR into the design of the system
A single site solution, subject to a common-mode disaster, would suffer considerable loss
of availability after a disaster, and so is not acceptable
This implies that we need a multi-site solution
Conventionally these are based on a master-standby where only 50% of kit is delivering
normal service
Our design is based on the use of multiple autonomous independent peer clusters that
cross-synchronise
so 100% of the kit delivers normal service
www.bl.uk
32
DOM architecture in the context of the
storage solution market
The dominant segment of the market focuses on delivering performance within
a highly resilient single cluster
However:
Many of our objects will be rarely accessed
so we do not want to pay for “maximised” performance we do not need
We have resilience by using multiple clusters, hence we have a reduced
need for resilience within a cluster
so we do not want to pay for “maximised” resilience we do not need
We are using these drivers to design a cost-effective large scale resilient
solution
www.bl.uk
33
DOM storage subsystem architecture - overview
DOM
Storage
gateway
DOM
Storage
gateway
DOM
Storage
Service
DOM
Storage
Service
DOM
Physical
Storage
Storage cluster
DOM
Shared
Services
• Unique ID
• Signing
• Logging
DOM
Physical
Storage
Storage cluster
www.bl.uk
34
DOM storage subsystem architecture - access
DOM
Storage
gateway
Normal access/delivery
is from local storage
cluster
DOM
Storage
Service
DOM
Physical
Storage
Storage cluster
DOM
Storage
gateway
DOM
Storage
Service
••
••
••
DOM
DOM
Shared
central
Services
Unique
Unique ID
ID
Signing
Signing
Logging
Logging
DOM
Physical
Storage
Storage cluster
www.bl.uk
35
DOM storage subsystem architecture - access
DOM
Storage
gateway
When a cluster is off-line
then access/delivery is
from a remote storage
cluster
DOM
Storage
Service
DOM
Physical
Storage
Storage cluster
DOM
Storage
gateway
DOM
Storage
Service
••
••
••
DOM
DOM
Shared
central
Services
Unique
Unique ID
ID
Signing
Signing
Logging
Logging
DOM
Physical
Storage
Storage cluster
www.bl.uk
36
DOM storage subsystem architecture - ingest
Normal ingest is to the local storage cluster and then the remote cluster is synchronised
[Diagram: two storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical Storage) and the DOM Shared central Services (Unique ID, Signing, Logging); arrows labelled "Store" and "Synchronise remote store"]
www.bl.uk
37
DOM storage subsystem architecture - ingest
When a cluster is off-line then ingest is managed by the remote storage cluster and the local cluster is synchronised later
[Diagram: two storage clusters (DOM Storage gateway, DOM Storage Service, DOM Physical Storage) and the DOM Shared central Services (Unique ID, Signing, Logging); arrows labelled "Store" and "Synchronise remote store later"]
www.bl.uk
38
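The ingest behaviour on the two slides can be sketched as follows: store on the local cluster and synchronise the peer at once, or, if the local cluster is off-line, store on the remote cluster and synchronise later. This is a sketch under stated assumptions. `Cluster`, `ingest`, `catch_up` and the `backlog` list are hypothetical names, and the slides do not say how the real system tracks pending synchronisation.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical stand-in for one storage cluster."""
    name: str
    online: bool = True
    objects: dict = field(default_factory=dict)
    backlog: list = field(default_factory=list)  # identifiers still owed to the peer


def ingest(dom_id, data, local, remote):
    """Store on the local cluster if it is up, otherwise on the remote one."""
    if local.online:
        target, peer = local, remote
    elif remote.online:
        target, peer = remote, local
    else:
        raise RuntimeError("no storage cluster is on-line")
    target.objects[dom_id] = data
    if peer.online:
        peer.objects[dom_id] = data    # synchronise straight away
    else:
        target.backlog.append(dom_id)  # synchronise later
    return target.name


def catch_up(source, peer):
    """Copy the source's backlog to a peer that is back on-line."""
    if not peer.online:
        return 0
    copied = 0
    while source.backlog:
        dom_id = source.backlog.pop(0)
        peer.objects[dom_id] = source.objects[dom_id]
        copied += 1
    return copied
```

A caller would run `catch_up(remote, local)` once the local cluster returns to service, after which both clusters again hold the same objects.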
In conclusion
We plan for generations of physical storage
Migration from one generation to the next
Allow changes of supplier
Purchase incrementally in modest quantities
Move quickly when required
Be cost conscious
We provide assurance that an object is held, and re-presented, exactly as it was when it was
ingested
We are designing a cost-effective large scale resilient solution
In summary: we take a long term view
www.bl.uk
39
Major Programmes/4
Web Archiving Programme
40
Structure of Programme
Web Archiving Programme is a collaborative
initiative, broadly delivered through two
consortia
UK Web Archiving Consortium
Developing a selective approach to web archiving, procuring a common
web archiving infrastructure and software to begin archiving activities at
the earliest opportunity
International Internet Preservation Consortium
Developing advanced web archiving technologies for the long-term,
large-scale, continuous crawling requirements enabled through
legislation
www.bl.uk
41
UK Web Archiving Consortium
Developing a selective approach to web archiving
License for PANDAS about to be signed with NLA
Sub-licenses with consortium partners and contractor to follow
ITT concluded with Magus Research winning the contract.
Implement a common web archiving infrastructure (lots of Linux
machines + PANDAS)
Provide customisation/development of PANDAS
Provide help desk and support
www.bl.uk
42
International Internet Preservation
Consortium
Developing advanced web archiving technologies
Smart Crawler
Continuous adaptive crawler, adjusting crawl priority on the fly
Based on IA Heritrix
Working on requirements now
Expect to begin tender process in June
Content Management
Archival formats
Framework
Metrics and Test Bed
www.bl.uk
43
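The "Smart Crawler" bullet describes a continuous crawler that adjusts crawl priority on the fly. One common way to model that is a priority queue whose entries can be re-prioritised. The sketch below is only an illustration of that idea in Python. `CrawlFrontier` and its methods are hypothetical and are not the Heritrix API or the consortium's design.

```python
import heapq
import itertools


class CrawlFrontier:
    """Hypothetical URL queue in which priorities can change while crawling."""

    def __init__(self):
        self._heap = []
        self._current = {}                 # url -> the one entry that is still valid
        self._counter = itertools.count()  # tie-breaker keeps insertion order

    def schedule(self, url, priority):
        """Add a URL, or change its priority (lower value = crawl sooner)."""
        entry = (priority, next(self._counter), url)
        self._current[url] = entry         # any older entry becomes stale
        heapq.heappush(self._heap, entry)

    def next_url(self):
        """Pop the most urgent URL, skipping stale entries; None when empty."""
        while self._heap:
            entry = heapq.heappop(self._heap)
            url = entry[2]
            if self._current.get(url) is entry:
                del self._current[url]
                return url
        return None


frontier = CrawlFrontier()
frontier.schedule("http://example.org/news", 5)
frontier.schedule("http://example.org/about", 3)
frontier.schedule("http://example.org/news", 1)  # page changes often: promote it
```

Re-scheduling leaves the old entry in the heap and simply ignores it when it surfaces, which avoids searching the heap on every priority change.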
External Collaboration
44
Digital Library Collaborations/Partnerships
Current
UK Digital Preservation Coalition
Founder Member
TEL (The European Library Project)
Web Archiving UK
JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library Of Wales
International Internet Preservation Consortium
BNF, Library Of Congress, Internet Archive, National Archives & Library Of Canada, National Library Of
Australia, National Library Of Italy, National Libraries Of Nordic Countries
JISC Funded - Digital Curation Centre
Persistent Identifiers
DOI Foundation, European National Libraries (KB & DDB)
Resource Discovery
Union Catalogues (SUNCAT)
Digital Library Federation
www.bl.uk
45
Digital Library Collaborations/Partnerships
Potential
Secure Legal Deposit Network
6 Legal Deposit Libraries
Global Digital Format Registry
Potential Partners (National Archives, DLF)
Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
KB (Netherlands National Library & Other Partners – FP6 Bid)
Digital Rights Management
Potential Partners (Publishers, JISC)
Metadata
Publishers, Others ?
Authentication
JISC ?
Resource Discovery
Search Engine Vendors, Researchers
Others ???
www.bl.uk
46
Conclusions
• Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success
• Can You Work With Us?
www.bl.uk
47
Slide 45
British Computer Society
North London Branch
Major Programmes
Richard Boulderstone
July 27, 2004
1
Agenda
• The British Library
• Vision
• Our Audiences/Customers
• ILS
• Digitisation
• Digital Object Management
• Web Archiving
• Collaboration
• Conclusions
Magna Carta
www.bl.uk
2
What Is The British Library ?
• Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of
Science and Invention (1855), National Central Library (1916), and National
Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India
Office Library and Records in 1982, and British Institute of Recorded Sound in
1983
• Flagship building at St Pancras - largest public building project in Great
Britain in 20th century - opened in 1998
www.bl.uk
3
World-Class Research Library
Key Statistics 2002/3
• 150 million items
• 8.2 million items consulted or supplied
• 408,000 reading room visits
• 618,000 catalogue records created
• 554,000 items received on legal deposit
• 651 km shelf capacity, 92% full, adding 12 km each year
• 18.5M Web Site Hits (www.bl.uk)
• 2,400 staff
• £85.2 million Grant in Aid and £27.0 million trading income in 2001/2
• Annual report - http://www.bl.uk/about/annual/latest.html
www.bl.uk
4
‘The World’s Knowledge’
Outcome Based Vision
To help people advance knowledge to enrich lives
… by aiding scientific advances
… by adding commercial value for businesses
… by contributing to UK “knowledge economy”
… through the pursuit of academic excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural wellbeing
… by giving information relevant to their interests
… by helping to find the next medical breakthrough
… by creating a link between the past, present and future
www.bl.uk
5
[Diagram: audience segments grouped under BUSINESS, EDUCATION, PUBLIC, RESEARCHER and LIBRARIES, with the services offered to each, including Resource Discovery, Bespoke Services, Research Services, Document Supply, Reprographics, Innovation Centre, On-site Visits, School Tours, Web Learning, Exhibitions, Events, Tours, Publishing, Reading Rooms, Searching Tools, Training and Best Practice]
www.bl.uk
6
2
Major Programmes/1
Da Vinci Notebook
Integrated Library System (ILS)
Programme
7
ILS: Development
Data migration
Due to finish in a few days
16M+ BL records
10M+ records from other sources
Online ILS software
All online changes made (mainly interfaces) – final tests
Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
Most done for go-live
Rest to follow in priority order
www.bl.uk
8
ILS: Implementation
Training
Courses to end-users well underway
‘Practice’ system available
‘Search only’ training also underway
Testing
Functional testing (end to end) nearly complete
Performance poor – OPAC very slow
Automated stress testing (LoadRunner scripts)
eIS trying to find area of problem
Ex Libris experts flying over
Some security ‘hardening’ needed
www.bl.uk
9
ILS: Cutover from legacy systems
Now:
Temporary Aleph cataloguing
Phase 1 – internal processing
Staggered take-on of users to ease cutover problems
Merge ‘temporary’ records
7 June:
Phase 2 – reading rooms
Reading rooms closed for cutover 26-29 June
Mainly brand-new PCs etc rather than XP upgrade
30 June:
Phase 3 – remote users
Could be delayed if there are major problems
30 July:
www.bl.uk
10
www.bl.uk
11
Future ILS development (ILS/2)
Current ILS development seen just as the start
Extra records
E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
E.g. Preservation records
Links to other new BL systems
E.g. Digital Object Management (images, web pages etc)
New releases of Ex Libris packages
www.bl.uk
12
Major Programmes/2
International Dunhuang Project
Digitisation Programme
13
Background
Digitisation Is The Process Of Converting Existing Physical Items Into Digital
Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character
Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image to Sub £1/Image.
Automatic Devices To Digitise Are Becoming Available But Are Expensive ~£150,000.
BL Has Had Fairly Ad Hoc Approach Driven By
External Funding Opportunities
Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project
Management
BL Has Created About 1.5M Digital Images So Far…
www.bl.uk
14
Digitisation Strategy
Digitisation Strategy Project Was Formally Initiated On February 2, 2004
Key objectives for the project are to define:
Selection Criteria
Uniform Approach
Communications Plan
Sustainability
Intellectual Property Rights
External Relationship Management
Funding
Integration with DOMS
www.bl.uk
15
Project Status Information
Definitive Register of Projects
19 Complete
19 Current
20 Planning
JISC Sound (3,900 Hours)
JISC Newspapers (2M Pages of 750M Pages)
Chopin (Collaborative Project)
Early English Books Online
www.bl.uk
16
Major Programmes/3
Gutenberg Bible
Digital Object Management (DOM)
Programme
17
DOM Programme vision
Our mission is to enable the United Kingdom to preserve
and use its digital intellectual property forever
Our vision is to create a management system for digital objects
that will
store and preserve any type of digital material in perpetuity
provide access to this material to users with appropriate permissions
ensure that the material is easy to find
ensure that users can view the material with contemporary applications
ensure that users can, where possible, experience material with the original look-and-feel
www.bl.uk
18
Introduction - history
Digital Library PFI
Mar 1997 – Dec 1998
Digital Library System
1999 – early 2002
Lessons
DOM Report
Nov 2002
The DOM Programme
Started September 2003
www.bl.uk
19
Drivers for the BL DOM Programme
Legal deposit legislation for non-print material was granted royal assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12T of material per year (with 50 year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….
We need a generic and cost-effective approach for the secure long term
storage of digital material that is produced by numerous initiatives
www.bl.uk
20
DOM – many topics to address
[Chart: components plotted by estimated size of component (low to high) against component ambiguity / complexity (low to high), with a legend distinguishing Started, Planned, Non-DOM projects and Planned co-operation]
Components shown: ILS, Web Archiving, Digitisation Programme, LDEP, Workflow, Resource Discovery, LDLSE, VDEP, Rights Management, Metadata Definition, Technical Requirements, File Format Registry, Persistent Identifiers, File Conversion Utilities, Authentication, Prototypes, SDM, RADM, Interfaces, Strategy Development
LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications
www.bl.uk
21
Scope - life cycle of objects
Collection: Selection, Acquisition, Accession, Description
Preservation: Storage, Preservation
Access: Resource discovery, Delivery, Rendering
www.bl.uk
22
Scope – objects and processes
Preservation store
Preserves the bit stream in perpetuity
Access store
Access versions
Limited formats – in the flavour of the era
Metadata to support resource discovery
Descriptive, Administrative, Links with existing tools e.g. Integrated
Library System (ILS)
Workflow
Ingest, e.g. Legal Deposit processing
www.bl.uk
23
[Diagram: content flows into DOM from Donations, Document Supply, Legal Deposit, Web Archiving and Digitisation. Labelled sources include Publishers, Archives, Grey Literature, Non-Serial Store, Archiving, Operational Stores, LDL Secure Environment, Legal Deposit Items, Legal Deposit Processing, St Pancras Studios, Newspapers and NSA. DOM comprises Ingest, Storage and Shared services (Signing, Authentication, Metadata, Persistent ID). Access comprises Resource Discovery, Delivery and Digital Rights Management]
www.bl.uk
24
Timeline (2003 to 2005)
BC: ET approve Business Case & Timeline
Definition. R0
• Prototype will provide a basic preservation-quality digital object storage module
Operat’l Storage Sub System. R1
• Consolidate R0 into operational system
• Provide preservation-quality digital store for materials received under Voluntary Deposit of Electronic Publications (VDEP)
• Integrate it with the existing VDEP front-end
1st Content Stream ingest. R2
• Support ingest for a major content stream
• Integrate with core Library systems as required
LDEP - initial format. R3 & R4
• Provide functionality for material covered by LDEP secondary legislation
Open DOM to new projects. R5+
www.bl.uk
25
DOM: Project definition - 1
Functional Architecture “What”
Prototyping – assessing market solutions
Example issues: digital rights, file formats, etc
Logical architecture “how – overall architecture”
Prototyping - basic functioning architecture
Example issues: allow changes to new suppliers, relationships to ILS, other projects etc
Physical architecture “how – storage & specifics”
Prototyping - principal solutions and options
Example issues: how do we build it cost-effectively today, supplier selection criteria
• Business case
• Planning – incremental implementation phases
Cross team workshops – reviewing progress, debating detailed technical
issues, planning immediate priorities, risk management & way forward
www.bl.uk
26
DOM: Project definition - 2
Approach is to be incremental and not ‘Big Bang’
We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a
good solution
A principal goal is to define:
An overall long term “logical architecture”
Within which, there will be successive generations of physical architectures
We are building an understanding of the storage marketplace, and we will use that knowledge to manage
procurement
We are certain that we will need >500T of storage but we are uncertain when, so we need
flexible, scalable procurement
www.bl.uk
27
DOM architecture - overview
[Diagram: Resource Discovery (ILS, Non-cat based RD, Others), Rights Management, LDEP and Doc supply sit above the DOM Storage Service and refer to an object by its unique persistent identifier (DOMID). The DOM Storage Service covers compound objects/relations, integrity and authenticity. Each DOMID is mapped to node/vol/LRL (local resource locator), under which DOM Physical Storage holds atomic objects]
www.bl.uk
28
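The overview slide says that a DOMID is mapped to node/vol/LRL. The value of that indirection is that an object can move to a new generation of storage without its identifier changing. A minimal Python sketch of such a mapping follows. The `Location` and `Resolver` names, the identifier format and the example paths are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    """Where an atomic object currently lives (field names are assumptions)."""
    node: str
    volume: str
    lrl: str  # local resource locator


class Resolver:
    """Hypothetical table from persistent DOMID to current location."""

    def __init__(self):
        self._table = {}

    def register(self, dom_id, location):
        if dom_id in self._table:
            raise ValueError(f"{dom_id} is already registered")
        self._table[dom_id] = location

    def resolve(self, dom_id):
        return self._table[dom_id]

    def relocate(self, dom_id, new_location):
        """Point an existing DOMID at new storage; the DOMID itself is unchanged."""
        if dom_id not in self._table:
            raise KeyError(dom_id)
        self._table[dom_id] = new_location


resolver = Resolver()
resolver.register("domid-0001", Location("node-1", "vol-07", "a/b/0001.tif"))
resolver.relocate("domid-0001", Location("node-9", "vol-02", "x/0001.tif"))
print(resolver.resolve("domid-0001").node)
```

Catalogue records and other callers keep only the DOMID, so a storage migration touches the resolver table and nothing else.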
DOM System (release 3)
[Diagram: the DOM System, comprising two Storage subsystems, Access, Shared services and Administration, with connections to Aleph, the Mailroom and Publishers]
www.bl.uk
29
DOM logical architecture –
integrity and authenticity
Integrity:
System has capability to continuously monitor the object store to detect
object corruption
It would then initiate object recovery
Authenticity:
A process is defined to provide long-term assurance that an object that
is re-presented is as it was when it was ingested
Based on the use of cryptographic signing techniques
Each object is signed when it is ingested
The signature is verified when required
The signing mechanism is “tightly” controlled
www.bl.uk
30
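The integrity and authenticity slide describes signing each object at ingest, verifying the signature when required, and monitoring the store for corruption. The slides do not name an algorithm. The sketch below uses an HMAC-SHA256 tag from the Python standard library as a stand-in for a cryptographic signature. A real deployment might well use public-key signatures instead, and the function names and in-memory `store` are hypothetical.

```python
import hashlib
import hmac


def sign_object(data: bytes, key: bytes) -> str:
    """Compute a keyed tag for an object at ingest time."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify_object(data: bytes, key: bytes, signature: str) -> bool:
    """Check that the object still matches the tag recorded at ingest."""
    return hmac.compare_digest(sign_object(data, key), signature)


def find_corrupt(store: dict, signatures: dict, key: bytes) -> list:
    """One monitoring pass: return identifiers whose content no longer verifies."""
    return [
        dom_id
        for dom_id, data in store.items()
        if not verify_object(data, key, signatures[dom_id])
    ]


key = b"held-by-the-signing-service"  # assumption: key is tightly controlled
store = {"domid-1": b"first object", "domid-2": b"second object"}
signatures = {dom_id: sign_object(data, key) for dom_id, data in store.items()}

store["domid-2"] = b"second 0bject"   # simulate bit rot
print(find_corrupt(store, signatures, key))
```

Any identifier returned by `find_corrupt` would then be a candidate for recovery from a peer cluster that still holds a verifiable copy.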
Procuring physical storage in volume
A major cost is in physical storage
The market for storage systems is changing rapidly, and this implies that “lock-in” is
not sensible
We thus need flexibility to change supplier over time
Cost of storage is reducing by 30-40% per year
Hence procure on rolling basis just ahead of demand
Replace storage on a rolling basis on expiry of warranty
The rolling programmes imply the need to be able to support a heterogeneous
product solution
The design of the logical architecture thus supports storage sourced from multiple
storage vendors
www.bl.uk
31
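The case for buying just ahead of demand follows from the price trend on the slide. If cost per unit of storage falls by 30-40% each year, capacity bought in later years costs far less than the same capacity bought up front. The Python sketch below works through that arithmetic. The starting price, the 35% rate and the five-year demand profile are invented figures for illustration, not British Library data.

```python
def unit_price(initial: float, annual_decline: float, years: int) -> float:
    """Price per terabyte after a number of years of steady decline."""
    return initial * (1 - annual_decline) ** years


def upfront_cost(initial: float, yearly_demand: list) -> float:
    """Buy all the capacity in year 0 at today's price."""
    return initial * sum(yearly_demand)


def rolling_cost(initial: float, annual_decline: float, yearly_demand: list) -> float:
    """Buy each year's capacity at that year's price."""
    return sum(
        tb * unit_price(initial, annual_decline, year)
        for year, tb in enumerate(yearly_demand)
    )


demand = [100, 100, 100, 100, 100]  # hypothetical terabytes needed per year
price = 1000.0                      # hypothetical cost per terabyte today

print(upfront_cost(price, demand))
print(rolling_cost(price, 0.35, demand))
```

With these assumed figures the rolling purchase costs roughly half as much as buying everything in year 0, before counting the warranty-driven replacement cycle the slide also mentions.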
Disaster tolerance and the organisation
of storage clusters
One can obtain commercial disaster recovery (DR) solutions for common equipment
configurations
However one cannot obtain such solutions for systems comprising multi-100 TB systems
So we must build in the need for DR into the design of the system
A single site solution, subject to a common-mode disaster, would suffer considerable loss
of availability after a disaster, and so is not acceptable
This implies that we need a multi-site solution
Conventionally these are based on a master-standby arrangement, where only 50% of the kit is delivering
normal service
Our design is based on the use of multiple autonomous independent peer clusters that
cross-synchronise,
so 100% of the kit delivers normal service
www.bl.uk
32
Slide 46
British Computer Society
North London Branch
Major Programmes
Richard Boulderstone
July 27, 2004
1
Agenda
•
•
•
•
•
•
•
•
•
The British Library
Vision
Our Audiences/Customers
ILS
Digitisation
Digital Object Management
Web Archiving
Collaboration
Conclusions
Magna Carta
www.bl.uk
2
What Is The British Library ?
Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of
Science and Invention (1855), National Central Library (1916), and National
Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India
Office Library and Records in 1982, and British Institute of Recorded Sound in
1983
• Flagship building at St Pancras - largest public building project in Great
Britain in 20th century - opened in 1998
•
www.bl.uk
3
World-Class Research Library
Key Statistics 2002/3
•
•
•
•
•
•
•
•
•
•
150 million items
8.2 million items consulted or supplied
408,000 reading room visits
618,000 catalogue records created
554,000 items received on legal deposit
651 km shelf capacity 92% full add 12 km each year
18.5M Web Site Hits (www.bl.uk)
2,400 staff
£85.2 million Grant in Aid and £27.0 million trading income in 2001/2
Annual report - http://www.bl.uk/about/annual/latest.html
www.bl.uk
4
‘The World’s Knowledge’
Outcome Based Vision
… by aiding scientific advances
… by adding commercial value for
businesses
… by contributing to UK “knowledge
economy”
… through the pursuit of academic
excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural
wellbeing
… by giving information relevant to their
interests
… by helping to find the next medical
breakthrough
… by creating a link between the past,
present and future
To help people
advance
knowledge to
enrich lives
www.bl.uk
5
High
R+D
Industries
Prof.
Services
Creative
Industries Publishing
Industries
Resource Discovery
Bespoke Services
Research Services
Document Supply
Reprographics
Innovation Centre
School
Students
Libraries Teachers 11>18
On-site Visits
School Tours
Web Learning
SMEs
EDUCATION
Lifelong
Learner
Exhibitions
Events
Tours
Publishing
Visitors
(child + adult)
Lifelong
Learner
PUBLIC
BUSINESS
Lifelong
Learner
Postgraduate/
Undergraduate
RESEARCHER
Librarians
Scholars
Lifelong
Learner
Reading Rooms
Bespoke Services
Reprographics
Publishing
Document Supply
Searching Tools
LIBRARIES
Commercial
Researcher
Public
Libraries
Broadcasting
e.g. BBC
Publishing
e.g. OED
Document Supply
Resource Discovery
Training
Best Practice
H.E.
Libraries
Public
www.bl.uk
6
2
Major Programmes/1
Da Vinci Notebook
Integrated Library System (ILS)
Programme
7
ILS: Development
Data migration
Due to finish in a few days
16M+ BL records
10M+ records from other sources
Online ILS software
All online changes made (mainly interfaces) – final tests
Web OPAC configuration – tested by staff, HE, expert
Batch imports / exports
Most ones done for go live
Rest in priority order
www.bl.uk
8
ILS: Implementation
Training
Courses to end-users well underway
‘Practice’ system available
‘Search only’ training also underway
Testing
Functional testing (end to end) nearly complete
Performance poor – OPAC very slow
Automated stress testing (LoadRunner scripts)
eIS trying to find area of problem
Ex Libris experts flying over
Some security ‘hardening’ needed
www.bl.uk
9
ILS: Cutover from legacy systems
Now:
Temporary Aleph cataloguing
Phase 1 – internal processing
Staggered take-on of users to ease cutover problems
Merge ‘temporary’ records
7 June:
Phase 2 – reading rooms
Reading rooms closed for cutover 26-29 June
Mainly brand-new PCs etc rather than XP upgrade
30 June:
Phase 3 – remote users
Could be delayed major problems
30 July:
www.bl.uk
10
www.bl.uk
11
Future ILS development (ILS/2)
Current ILS development seen just as the start
Extra records
E.g. Sound archive, Manuscripts, Newspaper issues
Extra functions
E.g. Preservation records
Links to other new BL systems
E.g. Digital Object Management (images, web pages etc)
New releases of Ex Libris packages
www.bl.uk
12
Major Programmes/2
International Dunhuang Project
Digitisation Programme
13
Background
Digitisation Is The Process Of Converting Existing Physical Items Into Digital
Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character
Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image to Sub £1/Image.
Automatic Devices To Digitise Are Becoming Availalble But Are Expensive ~£150,000.
BL Has Had Fairly Ad Hoc Approach Driven By
External Funding Opportunities
Curator Interest
Projects Have Generally Created Their Own Approach, IT Resources, Project
Management
BL Has Created About 1.5M Digital Images So Far…
www.bl.uk
14
Digitisation Strategy
Digitisation Strategy Project Was Formerly Initiated On February 2, 2004
Key objectives for the project are to define:
Selection Criteria
Uniform Approach
Communications Plan
Sustainability
Intellectual Property Rights
External Relationship Management
Funding
Integration with DOMS
www.bl.uk
15
Project Status Information
Definitive Register of Projects
19 Complete
19 Current
20 Planning
JISC Sound (3,900 Hours)
JISC Newspapers (2M Pages of 750M Pages)
Chopin (Collaborative Project)
Early English Books Online
www.bl.uk
16
Major Programmes/3
Gutenberg Bible
Digital Object Management (DOM)
Programme
17
DOM Programme vision
Our mission is to enable the United Kingdom to preserve
and use its digital intellectual property forever
Our vision is create a management system for digital objects
that will
store and preserve any type of digital material in perpetuity
provide access to this material to users with appropriate permissions
ensure that the material is easy to find
ensure that users can view the material with contemporary applications
ensure that users can, where possible, experience material with the original look-and-feel
www.bl.uk
18
Introduction - history
Digital Library PFI
Mar 1997 – Dec 1998
Digital Library System
1999 – early 2002
Lessons
DOM Report
Nov 2002
The DOM Programme
Started September 2003
www.bl.uk
19
Drivers for the BL DOM Programme
Legal deposit legislation for non-print material was granted royal assent in October 2003
Existing voluntary deposit scheme operational since 2000
Storage of digitised masters from early ’90s onwards
New digitisation initiatives: newspapers, sound, etc
Sound archive receives 12T of material per year (with 50 year collection)
Web archiving
Cartography and datasets
Electronic journals, picture library
… and ….
…. and ….
We need a generic and cost-effective approach for the secure long term
storage of digital material that is produced by numerous initiatives
www.bl.uk
20
DOM – many topics to address
HIGH
ILS
WEB
ARCHIVING
DIGITISATION
PROGRAMME
ESTIMATED SIZE OF COMPONENT
LDEP
WORKFLOW
RESOURCE
DISCOVERY
LDLSE
VDEP
RIGHTS
MANAGEMENT
METADATA
DEFINITION
TECHNICAL
REQUIREMENTS
FILE
FORMAT
REGISTRY
PERSISTENT
IDENTIFIERS
FILE
CONVERSION
UTILITIES
AUTHENTICATION
PROTOTYPES
SDM
RADM
INTERFACES
STRATEGY
DEVELOPMENT
LOW
LOW
COMPONENT AMBIGUITY / COMPLEXITY
Started
Planned
Non-DOM projects
Planned co-operation
HIGH
LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications
www.bl.uk
21
Scope - life cycle of objects
Collection
Selection
Acquisition
Accession
Description
Preservation
Storage
Preservation
Access
Resource discovery
Delivery
Rendering
www.bl.uk
22
Scope – objects and processes
Preservation store
Preserves the bit stream in perpetuity
Access store
Access versions
Limited formats – in the flavour of the era
Metadata to support resource discovery
Descriptive, Administrative, Links with existing tools e.g. Integrated
Library System (ILS)
Workflow
Ingest, e.g. Legal Deposit processing
www.bl.uk
23
ACCESS
DOM
Resource Discovery
Delivery
Digital Rights Management
Shared services
DOM
Storage
Signing
Authentication
Metadata
Persistent ID
Ingest
DONATIONS
DOCUMENT SUPPLY
Publishers
Archives
Grey Literature
Non-Serial
Store
Archiving
Operational
Stores
LEGAL DEPOSIT
LDL Secure
Environment
Legal Deposit
Items
Legal Deposit
Processing
WEB
ARCHIVING
DIGITISATION
St Pancras
Studios
Newspapers
NSA
www.bl.uk
24
Timeline
Prototype will provide a
basic preservationquality digital object
storage module
Definition. R0
ET approve
Business Case
& Timeline
•Consolidate R0 into operational
system
•Provide preservation-quality digital
store for materials received under
Voluntary Deposit of Electronic
Publications (VDEP)
•Integrate it with the existing VDEP
front-end
BC
R0
Operat’l Storage
Sub System. R1
•Support ingest for a
major content stream
•Integrate with core
Library systems as
required
R1
1st Content
Stream ingest. R2
R2
LDEP - initial
format. R3 & R4
Provide functionality
for material covered by
LDEP secondary
legislation
R3
R4
Open DOM to new
projects. R5+
2003
2004
2005
www.bl.uk
25
DOM: Project definition - 1
Example issues
Functional Architecture “What”
digital rights, file
formats, etc
Prototyping – assessing market solutions
allow changes to
new suppliers,
relationships to ILS,
other projects etc
Logical architecture “how – overall architecture”
Prototyping - basic functioning architecture
how do we build it
cost-effectively
today, supplier
selection criteria
Physical architecture “how – storage & specifics”
Prototyping - principal solutions and options
• Business case
• Planning – incremental
implementation phases
Cross team workshops – reviewing progress, debating detailed technical
issues, planning immediate priorities, risk management & way forward
www.bl.uk
26
DOM: Project definition - 2
Approach is to be incremental and not ‘Big Bang’
We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a
good solution
A principal goal is to define:
An overall long term “logical architecture”
Within which, there will be successive generations of physical architectures
We are building an understanding of the storage marketplace, and we will use that knowledge to manage procurement
We are certain that we will need more than 500 TB of storage, but we are uncertain when, so we need flexible, scalable procurement
27
DOM architecture - overview
[Layered diagram. Labels shown on the slide:]
Resource Discovery: ILS based RD, Non-cat, Others
Rights Management
LDEP, Doc supply
DOMID, OBJECT
DOM Storage Service: Object, Compound objects/relations, Atomic Objects, Integrity, Authenticity, Unique persistent identifier (DOMID), Local resource locator
DOMID is mapped to node/vol/LRL
DOM Physical Storage
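The note that a DOMID is mapped to node/vol/LRL can be illustrated with a small lookup table. This is only a sketch under stated assumptions: the class names, the field names and the use of a random UUID as the identifier are hypothetical and are not taken from the DOM design itself.

```python
import uuid
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Location:
    node: str    # storage node holding the object (hypothetical naming)
    volume: str  # volume on that node
    lrl: str     # local resource locator within the volume


class DomidResolver:
    """Maps a persistent DOMID to the physical location that holds the object."""

    def __init__(self) -> None:
        self._table: Dict[str, Location] = {}

    def register(self, location: Location) -> str:
        # A random UUID stands in for the real DOMID scheme (assumption).
        domid = uuid.uuid4().hex
        self._table[domid] = location
        return domid

    def relocate(self, domid: str, new_location: Location) -> None:
        # The DOMID stays fixed while the physical location changes,
        # for example when one generation of storage replaces another.
        if domid not in self._table:
            raise KeyError(domid)
        self._table[domid] = new_location

    def resolve(self, domid: str) -> Location:
        return self._table[domid]
```

The point of the indirection is that callers only ever hold the DOMID, so physical storage can be replaced without changing any reference held by catalogue or access systems.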
28
DOM System (release 3)
[Diagram labels: Aleph; Storage subsystem (shown twice); Access; Mailroom; Publishers; Shared services; Administration; DOM System]
29
DOM logical architecture –
integrity and authenticity
Integrity:
System has capability to continuously monitor the object store to detect
object corruption
It would then initiate object recovery
Authenticity:
A process is defined to provide long-term assurance that an object that
is re-presented is as it was when it was ingested
Based on the use of cryptographic signing techniques
Each object is signed when it is ingested
The signature is verified when required
The signing mechanism is “tightly” controlled
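The sign-on-ingest, verify-when-required and continuous-monitoring steps above can be sketched as follows. This is a minimal illustration, not the DOM implementation: HMAC-SHA256 from the Python standard library stands in for whatever signing technique was actually chosen, and the key, class and method names are all hypothetical.

```python
import hashlib
import hmac
from typing import Dict, List, Tuple

# Hypothetical key; in a real system the key would be held by a tightly
# controlled signing service, not embedded in code.
SIGNING_KEY = b"demo-key-held-by-signing-service"


def sign(data: bytes) -> str:
    """Produce a signature for an object's bit stream."""
    return hmac.new(SIGNING_KEY, data, hashlib.sha256).hexdigest()


def verify(data: bytes, signature: str) -> bool:
    """Check that the bit stream still matches the signature made at ingest."""
    return hmac.compare_digest(sign(data), signature)


class ObjectStore:
    def __init__(self) -> None:
        self._objects: Dict[str, Tuple[bytes, str]] = {}

    def ingest(self, object_id: str, data: bytes) -> None:
        # Each object is signed when it is ingested.
        self._objects[object_id] = (data, sign(data))

    def read(self, object_id: str) -> bytes:
        # The signature is verified before the object is re-presented.
        data, signature = self._objects[object_id]
        if not verify(data, signature):
            raise ValueError(f"object {object_id} failed verification")
        return data

    def scan(self) -> List[str]:
        """Integrity sweep: ids of objects that no longer match their signature."""
        return [oid for oid, (data, sig) in self._objects.items()
                if not verify(data, sig)]
```

A corrupted object reported by `scan` would be the trigger for recovery, for instance by copying a good replica from another cluster.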
30
Procuring physical storage in volume
A major cost is in physical storage
The market for storage systems is changing rapidly, and this implies that “lock-in” is
not sensible
We thus need flexibility to change supplier over time
Cost of storage is reducing by 30-40% per year
Hence procure on rolling basis just ahead of demand
Replace storage on a rolling basis on expiry of warranty
The rolling programmes imply the need to be able to support a heterogeneous
product solution
The design of the logical architecture thus supports storage sourced from multiple
storage vendors
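The argument for buying just ahead of demand can be checked with simple arithmetic. The sketch below compares buying all capacity in year 0 with buying each year's tranche at that year's price; the function names and every figure in the usage note are illustrative assumptions, and only the 30-40% annual price decline comes from the slide.

```python
from typing import Sequence


def upfront_cost(total_tb: float, price_per_tb: float) -> float:
    """Cost of buying the whole capacity today at today's price."""
    return total_tb * price_per_tb


def rolling_cost(yearly_tb: Sequence[float], price_per_tb: float,
                 annual_decline: float) -> float:
    """Cost of buying each year's tranche at that year's (lower) price.

    annual_decline is a fraction, e.g. 0.35 for a 35% price fall per year.
    """
    total = 0.0
    for year, tb in enumerate(yearly_tb):
        total += tb * price_per_tb * (1.0 - annual_decline) ** year
    return total


def saving_fraction(yearly_tb: Sequence[float], price_per_tb: float,
                    annual_decline: float) -> float:
    """Fraction of the upfront cost saved by rolling procurement."""
    upfront = upfront_cost(sum(yearly_tb), price_per_tb)
    return 1.0 - rolling_cost(yearly_tb, price_per_tb, annual_decline) / upfront
```

For example, `saving_fraction([100, 100, 100, 100, 100], 1000.0, 0.35)` gives the saving for a hypothetical 500 TB bought in five equal yearly tranches at a 35% annual price decline. The calculation ignores the cost of money and of running a mixed estate.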
31
Disaster tolerance and the organisation
of storage clusters
One can obtain commercial disaster recovery (DR) solutions for common equipment configurations
However, one cannot obtain such solutions for systems holding several hundred terabytes
So we must build DR into the design of the system
A single-site solution, subject to a common-mode disaster, would suffer considerable loss of availability after a disaster, and so is not acceptable
This implies that we need a multi-site solution
Conventionally these are based on a master-standby arrangement, where only 50% of the kit is delivering normal service
Our design is based on the use of multiple autonomous independent peer clusters that cross-synchronise
so 100% of the kit delivers normal service
32
DOM architecture in the context of the
storage solution market
The dominant segment of the market focuses on delivering performance within
a highly resilient single cluster
However:
Many of our objects will be rarely accessed
so we do not want to pay for “maximised” performance we do not need
We have resilience by using multiple clusters, hence we have a reduced
need for resilience within a cluster
so we do not want to pay for “maximised” resilience we do not need
We are using these drivers to design a cost-effective large scale resilient
solution
33
DOM storage subsystem architecture - overview
[Diagram: two storage clusters, each comprising a DOM Storage Gateway, a DOM Storage Service and DOM Physical Storage, together with DOM Shared Services]
DOM Shared Services:
• Unique ID
• Signing
• Logging
34
DOM storage subsystem architecture - access
[Diagram: two storage clusters, each comprising a DOM Storage Gateway, a DOM Storage Service and DOM Physical Storage, together with DOM central Shared Services (Unique ID, Signing, Logging)]
Normal access/delivery is from the local storage cluster
35
DOM storage subsystem architecture - access
[Diagram: two storage clusters, each comprising a DOM Storage Gateway, a DOM Storage Service and DOM Physical Storage, together with DOM central Shared Services (Unique ID, Signing, Logging)]
When a cluster is off-line, access/delivery is from a remote storage cluster
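The access behaviour on the two slides above (deliver from the local cluster, fall back to a remote one when the local cluster is off-line) can be sketched as below. The class and method names are hypothetical and the clusters are simple in-memory stand-ins; this illustrates the fallback rule only, not the DOM gateway itself.

```python
from typing import Dict, List, Tuple


class ClusterOffline(Exception):
    """Raised when a request reaches a cluster that is off-line."""


class StorageCluster:
    def __init__(self, name: str) -> None:
        self.name = name
        self.online = True
        self._objects: Dict[str, bytes] = {}

    def store(self, domid: str, data: bytes) -> None:
        self._objects[domid] = data

    def fetch(self, domid: str) -> bytes:
        if not self.online:
            raise ClusterOffline(self.name)
        return self._objects[domid]


class StorageGateway:
    def __init__(self, local: StorageCluster,
                 remotes: List[StorageCluster]) -> None:
        # The local cluster is always tried first.
        self._order = [local, *remotes]

    def deliver(self, domid: str) -> Tuple[bytes, str]:
        """Return the object and the name of the cluster that served it."""
        for cluster in self._order:
            try:
                return cluster.fetch(domid), cluster.name
            except ClusterOffline:
                continue  # try the next peer cluster
        raise ClusterOffline("no storage cluster is on-line")
```

Because the clusters are peers holding the same objects, the gateway needs no special recovery mode: the same request path serves both the normal and the degraded case.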
36
DOM storage subsystem architecture - ingest
[Diagram: two storage clusters, each comprising a DOM Storage Gateway, a DOM Storage Service and DOM Physical Storage, together with DOM central Shared Services (Unique ID, Signing, Logging). Arrows: “Store” and “Synchronise remote store”]
Normal ingest is to the local storage cluster, and then the remote cluster is synchronised
37
DOM storage subsystem architecture - ingest
[Diagram: two storage clusters, each comprising a DOM Storage Gateway, a DOM Storage Service and DOM Physical Storage, together with DOM central Shared Services (Unique ID, Signing, Logging). Arrows: “Store” and “Synchronise remote store later”]
When a cluster is off-line, ingest is managed by the remote storage cluster and the local cluster is synchronised later
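The ingest rule on the two slides above can be sketched in the same spirit: store on the local cluster and synchronise the peer, or, if the local cluster is off-line, store on the remote cluster and note what the local cluster missed so that it can catch up later. All names here are hypothetical, and the per-cluster list of missed identifiers is an assumed mechanism rather than the DOM design.

```python
from typing import Dict, List


class Cluster:
    def __init__(self, name: str) -> None:
        self.name = name
        self.online = True
        self.objects: Dict[str, bytes] = {}
        # DOMIDs ingested elsewhere while this cluster was off-line.
        self.missed: List[str] = []


def ingest(local: Cluster, remote: Cluster, domid: str, data: bytes) -> str:
    """Store an object and return the name of the cluster that took the ingest."""
    primary, peer = (local, remote) if local.online else (remote, local)
    if not primary.online:
        raise RuntimeError("no storage cluster is on-line")
    primary.objects[domid] = data
    if peer.online:
        peer.objects[domid] = data   # synchronise the peer straight away
    else:
        peer.missed.append(domid)    # synchronise the peer later
    return primary.name


def resynchronise(recovered: Cluster, source: Cluster) -> int:
    """Copy everything the recovered cluster missed; return the number copied."""
    copied = 0
    for domid in recovered.missed:
        recovered.objects[domid] = source.objects[domid]
        copied += 1
    recovered.missed.clear()
    return copied
```

In this sketch synchronisation is a direct copy; a real system would presumably also verify each object's signature after the copy.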
38
In conclusion
We plan for generations of physical storage
Migration from one generation to the next
Allow changes of supplier
Purchase incrementally in modest quantities
Move quickly when required
Be cost conscious
We provide assurance that an object is held and re-presented as it was when it was ingested
We are designing a cost-effective large scale resilient solution
In summary: we take a long term view
39
Major Programmes/4
Web Archiving Programme
40
Structure of Programme
The Web Archiving Programme is a collaborative initiative, implemented broadly across two consortia
UK Web Archiving Consortium
Developing a selective approach to web archiving, procuring a common web archiving infrastructure and software to begin archiving activities at the earliest opportunity
International Internet Preservation Consortium
Developing advanced web archiving technologies for the long-term, large-scale, continuous crawling requirements enabled through legislation
41
UK Web Archiving Consortium
Developing a selective approach to web archiving
Licence for PANDAS about to be signed with NLA
Sub-licences with consortium partners and contractor to follow
ITT concluded with Magus Research winning the contract.
Implement a common web archiving infrastructure (lots of Linux machines + PANDAS)
Provide customisation/development of PANDAS
Provide help desk and support
42
International Internet Preservation
Consortium
Developing advanced web archiving technologies
Smart Crawler
Continuous adaptive crawler, adjusting crawl priority on the fly
Based on IA Heritrix
Working on requirements now
Expect to begin tender process in June
Content Management
Archival formats
Framework
Metrics and Test Bed
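The idea of a continuous crawler that adjusts crawl priority on the fly can be illustrated with a priority queue whose entries can be re-ranked while the crawl is running. This is a generic sketch using Python's standard `heapq` module; it is not Heritrix code and does not reflect the consortium's actual design, and the class and method names are hypothetical.

```python
import heapq
import itertools
from typing import Dict, List, Tuple


class CrawlFrontier:
    """Queue of URLs to crawl; lower priority numbers are crawled first."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, str]] = []
        self._priority: Dict[str, float] = {}
        self._counter = itertools.count()  # tie-breaker for equal priorities

    def schedule(self, url: str, priority: float) -> None:
        """Add a URL, or change the priority of one already queued."""
        self._priority[url] = priority
        heapq.heappush(self._heap, (priority, next(self._counter), url))

    def next_url(self) -> str:
        """Pop the most urgent URL, skipping entries made stale by re-ranking."""
        while self._heap:
            priority, _, url = heapq.heappop(self._heap)
            if self._priority.get(url) == priority:
                del self._priority[url]
                return url
        raise IndexError("frontier is empty")
```

A crawler could call `schedule` again for a page it has observed changing frequently, which moves that page ahead of pages that rarely change without rebuilding the queue.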
43
External Collaboration
44
Digital Library Collaborations/Partnerships
Current
UK Digital Preservation Coalition
Founder Member
TEL (The European Library Project)
Web Archiving UK
JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library Of Wales
International Internet Preservation Consortium
BNF, Library Of Congress, Internet Archive, National Archives & Library Of Canada, National Library Of
Australia, National Library Of Italy, National Libraries Of Nordic Countries
JISC Funded - Digital Curation Centre
Persistent Identifiers
DOI Foundation, European National Libraries (KB & DDB)
Resource Discovery
Union Catalogues (SUNCAT)
Digital Library Federation
45
Digital Library Collaborations/Partnerships
Potential
Secure Legal Deposit Network
6 Legal Deposit Libraries
Global Digital Format Registry
Potential Partners (National Archives, DLF)
Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
KB (Netherlands National Library & Other Partners – FP6 Bid)
Digital Rights Management
Potential Partners (Publishers, JISC)
Metadata
Publishers, Others ?
Authentication
JISC ?
Resource Discovery
Search Engine Vendors, Researchers
Others ???
46
Conclusions
• Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success
• Can You Work With Us?
47