Challenges of Digital Media Preservation Karen Cariani, Director Media Library and Archives Dave MacCarn, Chief Technologist.

Challenges of Digital Media Preservation
Karen Cariani, Director, Media Library and Archives
Dave MacCarn, Chief Technologist
Who we are:
WGBH Media Library and Archives
2
Transition challenges (Analog to Digital)
• Preservation needs are more complicated
— New and changing content formats
— Network connections
— Software
— Storage media
— Hardware
• Access expectations challenging
— Faster access
— Anywhere, anytime
3
Content formats
4
HD Acquisition Codecs
[Table: one row per acquisition format, grouped by codec family. Columns: Mbps; 1080i, 1080p, and 720p support with horizontal resolution (1920, 1440, 1280, or 960); Video Sample; 10 Bit; Audio # and Bits; Container.]
DCT: HDCam [1], DVCProHD@24p, DVCProHD-720p, DVCProHD-1080i, Avid DNxHD (two entries), Apple ProRes, Apple ProRes HQ
Wavelet: Red
MPEG2 I-frame: GFCam
MPEG2 Long GOP: HDV, XDCamHD, XDCam422, XDCamEX (two entries), GFCam, Canon C300
H.264: AVCHD PS, AVCHD, AVCHD@24, Canon 5DMKII, Nikon D800
H.264 I-frame: AVCIntra (two entries)
MPEG4 Studio Profile: HDCamSR [2], HDCamSR-HQ [2]
Video sample values listed: RGB or 4:2:2, 4:2:2/4:4:4, 4:2:2, 4:2:2*, 4:2:0, 4:2:0**, 3:1:1
Containers listed: Tape, DV-AVI, DV-DIF, MXF, QuickTime, REDCODE, M2T, MP4, MTS, DPX
*Sony FS100 HDMI output
** 4:2:2 HDMI output
[1] Tape format for comparison
[2] Tape with DPX file out
D. MacCarn, WGBH
Storage and retrieval
How do we:
• Capture the audio and video generated by myriad cameras
• Store the project information to allow potential re-edit
• Store files with rich, meaningful metadata
• Store born-digital materials
• Display and retrieve born-digital materials
5
Access: Organizational Issues
• Metadata
• Descriptive metadata
— Need description for video to be useful, findable
— How to capture that
— How to make sure it is linked to video files
6
Folder Structure
7
• Create folders by card
— Assign unique number
— Continue numbers
— Add description
— Place ENTIRE card contents into this folder!!
Original footage
© 2011 WGBH
8
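The folder-by-card steps above could be automated along these lines. This is only a sketch under assumptions: the `NNNN_description` folder naming, the function names, and the archive layout are illustrative and are not taken from the WGBH workflow.

```python
import shutil
from pathlib import Path


def next_card_number(archive_root: Path) -> int:
    """Return the next unused card number, continuing from existing folders."""
    numbers = []
    for folder in archive_root.iterdir():
        prefix = folder.name.split("_", 1)[0]
        if folder.is_dir() and prefix.isdigit():
            numbers.append(int(prefix))
    return max(numbers, default=0) + 1


def ingest_card(card_path: Path, archive_root: Path, description: str) -> Path:
    """Copy the ENTIRE card into a new numbered, described folder."""
    archive_root.mkdir(parents=True, exist_ok=True)
    number = next_card_number(archive_root)
    # Hypothetical naming scheme: zero-padded number plus a short description.
    safe_description = "_".join(description.split())
    destination = archive_root / f"{number:04d}_{safe_description}"
    # copytree copies the whole directory tree, so sidecar files and the
    # card's own folder layout are kept alongside the clips.
    shutil.copytree(card_path, destination)
    return destination
```

Copying the whole tree rather than selected clips matters because many camera formats keep clip metadata in sibling folders on the card.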
Proposed tapeless workflow
• Create a mapping document between FileMaker and the DAM
• The mapping is used to generate an XML stylesheet
• Video is ingested simultaneously with the metadata from FileMaker using the XML stylesheet
• Technical metadata is ingested simultaneously with the video and production data using the XML generated from the source digital files
9
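To make the mapping idea concrete, here is a minimal sketch that applies a field mapping to one production database row and emits an XML record tied to a video file. It swaps in a plain Python dictionary for the XML stylesheet described above, and every field name and element name is hypothetical, not the actual FileMaker or DAM schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping document: FileMaker field name -> DAM field name.
FIELD_MAP = {
    "Card Number": "identifier",
    "Shoot Description": "description",
    "Program Title": "title",
}


def build_ingest_record(filemaker_row: dict, video_filename: str) -> str:
    """Return an XML record that ties mapped metadata to one video file."""
    asset = ET.Element("asset", attrib={"file": video_filename})
    for source_field, dam_field in FIELD_MAP.items():
        value = filemaker_row.get(source_field)
        if value:  # skip fields the production database left empty
            ET.SubElement(asset, dam_field).text = str(value)
    return ET.tostring(asset, encoding="unicode")
```

Keeping the file reference and the descriptive fields in one record is what lets the video and its metadata be ingested together rather than linked up afterwards.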
Challenges - again
• Access issues
— File size
— Formats for playback
— Usable: search/findable
• Metadata
• Organize files
• Preservation issues
— Copies
— Formats for migration
— Being able to play again later
— Speed of access (big file size) to use/process
— Migration ease
10
Software/Network
• File management
— Where are the files?
• Needed for access to files
— Large preservation files
— Smaller access, proxy files
• Network speed
— Larger files need a faster network to meet speed expectations
11
Issues with current file mgmt systems/software
• Preservation not a priority
• Interface issues
— Access vs. Preservation
• IT relationship
— Tech support
— Vendor reliance issues
— Need a library-based system for archivist needs rather than traditional IT company needs
• Expense
— License cost
— Development
— Customizations
12
Access
13
• Can find
• Can view
• Can select
• Can get out of system
• Can reuse in editing system
Preservation Needs
• Multiple Copies
• Validity
• Bit quality checks
• Long lasting storage
• Regular migration
• Persistence
14
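Bit quality checks across multiple copies are commonly done with checksums. The sketch below is one possible approach, assuming SHA-256 and an 8 MB read size (neither is stated in the slides); it reads each file in chunks so that very large video files never have to fit in memory.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 8 * 1024 * 1024) -> str:
    """Hash a file in fixed-size chunks so large files are streamed."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(expected_checksum: str, copies: list[Path]) -> dict[Path, bool]:
    """Report, per copy, whether its checksum equals the recorded one."""
    return {copy: sha256_of_file(copy) == expected_checksum for copy in copies}
```

Recording the checksum at ingest and re-running `copies_match` on a schedule, and after every migration, is what turns "multiple copies" into copies that are known to still be valid.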
Challenges of preservation and access
• For preservation
— Want to capture as close to original as possible
— Originals may be many different formats
— Will need to make sure you can export and use different formats in future
— File format issues
— Fixity check big files
• For access
— Want one consistent format for playback/access
— Needs to be easy to migrate, use
15
What makes video different?
• Preservation files are large
— Uncompressed
— Slow to move around
• Need proxy files for viewing
— Smaller size for quick transport over network
• Complicated formats
— Not just one file type
— Codecs, wrappers, frame rate, etc.
16
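A rough calculation shows why uncompressed preservation files are slow to move around. The figures are assumptions chosen for illustration: 1920x1080 active picture, 10-bit 4:2:2 sampling (an average of 20 bits per pixel), 29.97 frames per second, no audio or blanking, and a network link that sustains its full rated speed.

```python
def video_bits_per_second(width: int, height: int,
                          bits_per_pixel: int, frames_per_second: float) -> float:
    """Raw picture data rate, ignoring audio, blanking, and wrapper overhead."""
    return width * height * bits_per_pixel * frames_per_second


def gigabytes_per_hour(bits_per_second: float) -> float:
    """Convert a data rate to decimal gigabytes for one hour of material."""
    return bits_per_second * 3600 / 8 / 1e9


def transfer_hours(size_gigabytes: float, link_megabits_per_second: float) -> float:
    """Hours needed to move a file over a link running at its full rated speed."""
    seconds = size_gigabytes * 8 * 1000 / link_megabits_per_second
    return seconds / 3600


# Assumed format: 10-bit 4:2:2 is 10 bits of luma plus two half-sampled
# 10-bit chroma channels, so 20 bits per pixel on average.
rate = video_bits_per_second(1920, 1080, 20, 29.97)
size = gigabytes_per_hour(rate)          # roughly 559 GB per hour
hours = transfer_hours(size, 1000)       # longer than real time on 1 Gbps
```

Under these assumptions an hour of uncompressed HD takes more than an hour to copy over a gigabit link, which is the practical argument for small proxy files for viewing.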
Technology Mix:
17
Hydra project
• Combine preservation system with access system
• Better interface
• Flexible design
• Easy to evolve
18
Insert graphic
• Blacklight Hydra heads
• Hydra mgmt layer
• Fedora repository
• HSM storage system
19
Fundamental Assumption #1
• No single system can provide the full range of repository-based solutions for a given institution’s needs,
— …yet sustainable solutions require a common repository infrastructure.
20
Fundamental Assumption #2
• No single institution can resource the development of a full range of solutions on its own,
— …yet each needs the flexibility to tailor solutions to local demands and workflows.
21
Hydra Philosophy -- Community
• An open architecture, with many contributors to a common core
• Collaboratively built “solution bundles” that can be adapted and modified to suit local needs
• A community of developers and adopters extending and enhancing the core
• “If you want to go fast, go alone. If you want to go far, go together.”
22
CRUD in Repositories
Create/Submit/Edit (CUD)
Search/View (R)
Repository/Persistent Storage
Major Hydra Components
hydra-head Rails Plugin (CUD)
Blacklight (Read Only) (R)
Solrizer
Fedora
Solr
Hardware/Storage media: HSM
• Access
— Online: XX bytes spinning disk
— Offline
— Nearline
• Preservation (offline)
— Robotic tape library system
— LTO-4 data tapes
— 2 copies
— One stored off site
• Migration needed every 3-5 years
— Both tape migration to newer formats
— Technology migration
New Storage Types and Costs
• Need hierarchical storage (HSM)
— Video files are large
— Spinning disks are expensive
— Tape can help save cost
— Tape copies/migration can be automated
26
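For planning tape storage, a back-of-the-envelope estimate of cartridge counts can help. This sketch assumes LTO-4 cartridges at their native 800 GB capacity (compressed video gains little from tape drive compression) and a hypothetical 90% fill ratio; the archive size in the usage line is made up for illustration.

```python
import math

LTO4_NATIVE_GB = 800  # native (uncompressed) capacity of one LTO-4 cartridge


def tapes_needed(archive_gb: float, copies: int = 2,
                 tape_capacity_gb: float = LTO4_NATIVE_GB,
                 fill_ratio: float = 0.9) -> int:
    """Estimate cartridges for an archive, keeping each copy on its own tapes."""
    # fill_ratio is an assumed safety margin, not a vendor figure.
    usable_gb = tape_capacity_gb * fill_ratio
    return copies * math.ceil(archive_gb / usable_gb)


# Example: a hypothetical 100 TB archive kept as two copies.
cartridges = tapes_needed(100_000)
```

The same function can be rerun with a newer generation's capacity to see how a migration would shrink the cartridge count.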
New Storage Types and Costs
• But HSM has licensing issues
— Some systems charge per gigabyte managed
— Need open source alternative
27
Q&A
• Karen: [email protected]
• Dave: [email protected]
28