How to Face the Challenges of Web Archiving?

The experiences of a small library on the edge.

Chloe Martin, Internet Memory
Catherine Ryan, National Library of Ireland

LIBER 2012 - 1

Context: National Library of Ireland

• Beginnings: Established by the Dublin Science and Art Museum Act, 1877
• Mission: “to collect, preserve, promote and make accessible the documentary and intellectual record of the life of Ireland”.
• The Digital Record: Born Digital Programme established in 2010, covering web archiving.
• Web Archive Projects: 2 pilot projects in 2011

Context: Internet Memory

European Archive / Internet Memory Foundation
• Established in 2004 in Amsterdam (offices also in Paris)
• Mission: to preserve Web content as a new medium for current and future generations
• Actions: awareness-raising, partnerships, R&D
• Open Access Collections: UK National Archives & Parliament, PRONI, CERN and the National Library of Ireland

Internet Memory Research
• Spin-off of IM established in June 2011 in Paris
• Mission: to operate large-scale or selective crawls and develop new technologies (crawl, access, processing and extraction)

Web Archiving Project: Project Origins

National Library of Ireland

Building a 21st Century Library:
– Born Digital
– Digitisation
– Single Integrated Catalogue
– Digital Repository
– OSCAIL, the Digital Library Programme

Web Archiving Project: Project Origins

National Library of Ireland

Born Digital Materials:
• Natural progression for NLI’s strong political, cultural and historical collections
• How best to approach this in a time of unprecedented financial difficulty?
• Born Digital Programme established to examine requirements and produce a policy document for the next steps

Web Archiving Project: Project Origins

National Library of Ireland

The Hand of History:
– Snap General Election
– Five Weeks

Web Archiving Project: Project Origins

National Library of Ireland

Just do it

Web Archiving Project: Project Origins

National Library of Ireland

Just do it

How?

Web Archiving Project: Project Origins

National Library of Ireland

Collaborative Partnership: a partner that suited our requirements and had experience with others in the cultural sector

Requirements:
– Technical skills: available in the NLI but committed to other projects, so needed from a partner
– Leverage NLI’s own strong curatorial experience, esp. in politics
– Fast!

Web Archiving Project: Project Origins

National Library of Ireland

Project phases:
– Project scoping and contract
– Site selection
– Permissions gathering
– QA (look and feel)
– Publication and promotion

Site Selection and Permissions

National Library of Ireland

Selection Criteria:
– Website presence
– Technical reasons
– Cut-off date
– Women candidates

Permissions:
– All sites contacted and provided with a brief
– Pressurised but necessary phase

Scope of projects

National Library of Ireland

General Election:
– Crawl: 200 snapshots
– Scope: 100 seeds
– Frequency: 2 times
– Date: Feb. 2011

Presidential Election:
– Crawl: 80 snapshots
– Scope: 70 seeds
– Frequency: 3 times
– Date: Oct-Nov. 2011

Crawl

Internet Memory

• Seeds Validation: URLs, duplication, redirection, external links, dynamic websites
• Scope Parameters: domain, host and path; Social Web content; frequency; robots.txt file exclusions; politeness
• Specific incidents → technical changes on the fly: modification of scope; pending crawls; adaptation of the politeness
• Improvement of second crawl
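To make the seed validation and scope parameters above more concrete, the following Python sketch normalises and de-duplicates a seed list, then tests whether a discovered URL falls inside a domain, host or path scope. It is an illustration under stated assumptions, not a description of the crawler Internet Memory actually used: the function names (`normalise_seed`, `dedupe_seeds`, `in_scope`), the three scope modes and the example.ie URLs are all hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def normalise_seed(url: str) -> str:
    """Return a canonical form of a seed URL (hypothetical rules)."""
    parts = urlsplit(url.strip())
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"not a crawlable seed: {url!r}")
    host = parts.hostname  # urlsplit already lower-cases the hostname
    if parts.port:
        host = f"{host}:{parts.port}"
    path = parts.path or "/"
    # The fragment is dropped because it is never sent to the server.
    return urlunsplit((parts.scheme, host, path, parts.query, ""))


def dedupe_seeds(urls):
    """Normalise seeds and keep the first occurrence of each, in order."""
    seen, unique = set(), []
    for url in urls:
        canon = normalise_seed(url)
        if canon not in seen:
            seen.add(canon)
            unique.append(canon)
    return unique


def in_scope(url: str, seed: str, mode: str = "host") -> bool:
    """Decide whether `url` belongs to the crawl scope defined by `seed`."""
    u, s = urlsplit(url), urlsplit(seed)
    u_host, s_host = u.hostname or "", s.hostname or ""
    if mode == "domain":
        # Assumption: the seed host minus a leading "www." stands in for
        # the registered domain; a real crawler would use a suffix list.
        domain = s_host[4:] if s_host.startswith("www.") else s_host
        return u_host == domain or u_host.endswith("." + domain)
    if mode == "host":
        return u_host == s_host
    if mode == "path":
        # Compare whole path segments so "/campaign" does not match "/campaigns".
        prefix = s.path.rstrip("/") + "/"
        return u_host == s_host and (u.path.rstrip("/") + "/").startswith(prefix)
    raise ValueError(f"unknown scope mode: {mode!r}")


if __name__ == "__main__":
    seeds = ["http://www.example.ie", "HTTP://WWW.EXAMPLE.IE/", "http://blog.example.ie/"]
    print(dedupe_seeds(seeds))
    print(in_scope("http://blog.example.ie/post", "http://www.example.ie/", "domain"))
```

A domain scope captures sub-sites such as a candidate's blog, while a path scope suits a candidate page hosted inside a party website; which is appropriate depends on the seed.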

Quality Assurance (QA)

National Library of Ireland

• Manual QA
• Jira software
• IM: Technical QA
• NLI: ‘Look and Feel’ QA
• Multiple browsers
• Communication with site owners (building relationships and promotion)

Quality Assurance (QA)

Internet Memory

• Why?
• How?
• Manual and visual method: homepage + 2
• Resolution of issues
• Temporal Coherence

Access

National Library of Ireland

• Available to the public
• Full text search
• IM website: search by keyword, URL
• NLI catalogue: keyword search via a widget developed by the NLI IS team and IM
• Future: access through NLI’s own interfaces, issue of integrating results

Publication and Promotion

National Library of Ireland

• NLI social media initiative (Twitter and blog)
• Project participants
• Print media (esp. in the area of technology)
• And IM!
• Usage figures have increased, but the real value will be more apparent in 5-10 years

Usage Statistics of Web Archive

National Library of Ireland

[Chart: unique visitors per month, vertical axis from 0 to 1000, annotated with the events below]

• 21/09/2011: Official launch of NLI Web archives (Tweets)
• 26/10/2011: Blog post on nli.ie/blog and article in thejournal.ie
• 25/11/2011: Article on irishtimes.com
• 20/01/2012: Article on irishtimes.com
• 17/03/2012: Post on soundofthearchives.wordpress.com
• 04/05/2012: Article on irisheconomy.ie

Advantages of Web Archiving

National Library of Ireland

Web archiving:
– New opportunities for delivery of materials to users
– Work with existing users’ expectations that content be online
– Reach new audiences

Advantages of Web Archiving

National Library of Ireland

Political web archives; Irish General Election:
– Researchers can compare online content pre- and post-election
– Facilitates research into how ‘online’ this election was
– Assess impact of technological developments in campaign communications
– Record of campaign information

Benefits of Working Together

National Library of Ireland

Pilot project for a long-term activity:
– Allowed us to enter a new collecting area despite lack of tech expertise
– Facilitated collection of important material that no one else was collecting
– Collect material quickly
– Leverage curatorial skills
– Gained new technical skills

Benefits of Working Together

Internet Memory

• To support the development of Web archiving initiatives
• To operate rapid deployment of Web archives
• To address new challenges in this area:
– Social media content
– QA
– Automation

Conclusion

General Election:
• 18,495,771 URLs
• 1.14 TB
• 10,405 ARCs

Presidential Election:
• 7,333,399 URLs
• 278.10 GB
• 2,513 ARCs

View the NLI collections at: http://www.nli.ie/en/udlist/digital-collections.aspx

View the Web archive blog entry at: http://www.nli.ie/blog/index.php/2011/10/26/general-election-2011-web-archiving/

View Internet Memory Collections at: http://collections.europarchive.org/
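The collection figures reported on this slide can be turned into per-URL and per-ARC averages, which gives a rough sense of scale for anyone planning a comparable crawl. The sketch below is only back-of-the-envelope arithmetic: it assumes the sizes are decimal units (1 TB = 10^12 bytes, 1 GB = 10^9 bytes), which the slide does not state, and the names `COLLECTIONS` and `summarise` are illustrative.

```python
# Figures as reported on the Conclusion slide.
# Assumption: sizes are decimal units (TB = 1e12 bytes, GB = 1e9 bytes).
COLLECTIONS = {
    "General Election": {"urls": 18_495_771, "bytes": 1.14e12, "arcs": 10_405},
    "Presidential Election": {"urls": 7_333_399, "bytes": 278.10e9, "arcs": 2_513},
}


def summarise(stats):
    """Derive simple averages from the URL, byte and ARC counts of one crawl."""
    return {
        "kb_per_url": stats["bytes"] / stats["urls"] / 1e3,
        "mb_per_arc": stats["bytes"] / stats["arcs"] / 1e6,
        "urls_per_arc": stats["urls"] / stats["arcs"],
    }


if __name__ == "__main__":
    for name, stats in COLLECTIONS.items():
        s = summarise(stats)
        print(f"{name}: {s['kb_per_url']:.1f} KB/URL, "
              f"{s['mb_per_arc']:.1f} MB/ARC, {s['urls_per_arc']:.0f} URLs/ARC")
```

Under these assumptions both collections average close to 110 MB per ARC file, while the average captured URL is noticeably larger in the General Election crawl than in the Presidential Election crawl.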

To be continued…

Questions?

Thanks for your attention!

Catherine Ryan

National Library of Ireland

http://www.nli.ie

[email protected]

@NLIreland

Chloe Martin

Internet Memory

http://internetmemory.org

[email protected]

@InternetMemory
