The Library of Congress eScience Team (eST) Issues & Priorities National Academies Board on Research Data & Information 24 September 2009 ††† Peter R.


The Library of Congress

eScience Team (eST) Issues & Priorities National Academies Board on Research Data & Information 24 September 2009

Peter R. Young, Chief, Asian Division, The Library of Congress

The Library of Congress

• The Library’s universal collections represent the single greatest repository of recorded knowledge in history
• The Library is developing strategies and plans consistent with its global research mission
– The Library’s science collections and services are addressing scientific research data issues
– Digital research resources are within scope of the Library’s historic mission
• New scientific knowledge creation will require linking digital resources to traditional publications in an increasingly competitive world

The Library of Congress

• Mission:
– The Library's mission is to make its resources available and useful to the Congress and the American people, and to sustain and preserve a universal collection of knowledge and creativity for future generations.
• 142 million items in all formats
– 34 million books
– 13.2 million prints and photos
– 5.3 million maps
– 61 million manuscripts
– 13,500 items received daily
• Library Services Unit is responsible for the national library functions of the Library of Congress

The Library of Congress eScience Team (eST)

• Deanna Marcum (LC ALLS) established eST in 2009:
– To develop collection strategies for digital science resources and data appropriate for the national library
• eST 2009 activities:
– Explore & analyze digital challenges & requirements
– Identify digital research projects
– Develop recommendations for the Library’s digital knowledge resources in cooperation with other science research organizations and institutions

The Library of Congress eScience Team (eST)

1. Martha Anderson OSI/NDIIPP
2. Bahadir Akpinar OSI/RDC
3. Ron S. Bluestone LS/S-T-B
4. Leonard Bruno LS/PSCD/MSS
5. Colleen R. Cahill LS/G&M/TSS
6. Babak Hamidzadeh OSI/RDC
7. John R. Hebert LS/CS/G&M
8. William Lefurgy OSI/NDIIPP
9. Debra Ozga LS/POP/FLICC
10. Clay Redding LS/NDMSO
11. Roberta Shaffer LL
12. Peter R. Young LS/CS/AD


eST Team Charge - 2009

– Draft strategy documents for digital science resources

– Recommend digital science collection policies

– Create a planning framework for digital science knowledge resources and infrastructure

– Recommend data management policies and digital knowledge resources to support data-driven science

– Meet with other agencies and individuals about digital science data



eST Team Activities - 2009

Library of Congress eScience Team Meeting Chronology

13 March 2009: Biodiversity Heritage Library – Tom Garnett (BHL) and Martin Kalfatovic (Smithsonian Institution)

Subsequent meeting dates: 8 April 2009, 10 April 2009, 24 April 2009, 1 May 2009, 18 May 2009, 29 May 2009, 5 June 2009, 12 June 2009, 22 June 2009, 29 June 2009, 17 July 2009, 24 July 2009, 9 October 2009

Subsequent presentations and meetings, in the order listed:

• Research Data in Library Catalogues and the Joint Initiative of European Technical Libraries for Data Registration – Jan Brase, German National Library of Science and Technology
• eScience and Data Science: Preparing for the Data Avalanche – Kirk Borne (Department of Computational and Data Sciences, George Mason University), Tim Eastman (Plasmas International – NASA), and Dave Williams (National Space Data Center, NASA)
• USGS Library’s Digital Initiatives – Richard Huffine, National Library Coordinator, U.S. Geological Survey
• Pillbox, eScience, and the Evolution of the Library: Solid Dose Pharmaceutical Photography Project – David Hale, Division of Specialized Information Services, and Terry Yoo, Office of High Performance Computing and Communications, National Library of Medicine, National Institutes of Health
• Office of Strategic Initiatives/National Digital Information Infrastructure and Preservation Program – Laura Campbell and OSI/NDIIPP leadership
• Paul Uhlir (Board on Research Data and Information, National Academies) – Board on Research Data and Information
• Babak Hamidzadeh, OSI Project Manager, Digital Initiatives
• Martha Anderson, Director, NDIIPP Program Management
• Ruth Scovill, Director, Technology Policy, Library Services
• Jane Hunter, Professor of eResearch, School of Information Technology and Electrical Engineering, University of Queensland, Australia
• G. Sayeed Choudhury, Associate Dean of University Libraries, Johns Hopkins University
• Chris L. Greer, Director, National Coordination Office for Networking and Information Technology Research and Development, National Science and Technology Council
• Michael Kurtz, National Archives and Records Administration


The Library of Congress – eST 2009

The Library of Congress and Digital Science Data - September 2009

Advances in digital technology have the potential to transform research and to improve scientific discovery. Computationally intensive research is producing new experimental methods that promise increased levels of collaboration, productivity and progress. However, experimental scientific data of exponentially increasing complexity and volume threaten to overwhelm future digital science achievements. Exploiting and sharing this deluge of research data effectively presents huge challenges that require the development of a digital science data infrastructure to increase knowledge productivity.

Current national policies and initiatives to archive, preserve, manage, and make available digital science and research data are uncoordinated and inadequate. Most Federal science agencies support research programs that are discipline-specific. No Federal agency or institution has the capacity or expertise to address the challenges presented by digital science data and e-science [1], which reflect inter- and multidisciplinary collaborations.

Extensive experience with digital information preservation over the last decade positions the Library of Congress to address this challenge and to lead e-science development. With its comprehensive and universal mandate to make knowledge resources available and useful to Congress and the American public, the Library’s mission provides the structure needed to address this e-science data challenge. No other agency is authorized to develop a comprehensive digital science infrastructure plan. No other agency or organization is positioned to lead the development of a national digital science data infrastructure.

The Library’s record of leadership in the application of digital technology to information services and knowledge resources provides the basis for leading digital e-science development. By supplying the coordination and leadership for digital science development in specific project activities with partner agencies and institutions, the Library is positioned to advance the e-science vision.

To exercise this leadership role in digital science data infrastructure development, the Library plans to create a registry of Federal science data research archives. Experience with an initial “proof-of-concept” project will clarify the Library’s role in developing a Federal science data archive and infrastructure. Such an initiative may involve partnerships focused on specific disciplines. Finally, plans will be developed for the Library to serve as a national digital repository for preserving scientific research data.

[1] E-science research makes use of advanced computing tools to share distributed resources via networks. E-science encapsulates the technologies needed to support the collaborative, multidisciplinary research that is emerging in many fields of science and that is paving the way for the increasing globalization of research.

• The Library’s record of leadership in the application of digital technology to information services and knowledge resources provides the basis for leading digital e-science development.

• By supplying the coordination and leadership for digital science development in specific project activities with partner agencies and institutions, the Library is positioned to advance the e-science vision.

• To exercise this leadership role in digital science data infrastructure development, the Library plans to create a registry of Federal science data research archives.

• Experience with an initial “proof-of-concept” project will clarify the Library’s role in developing a Federal science data archive and infrastructure.

• Such an initiative may involve partnerships focused on specific disciplines. Finally, plans will be developed for the Library to serve as a national digital repository for preserving scientific research data.


Digital Initiatives

• American Memory
– 9 million digitized items of U.S. history and culture in 100 thematic collections
• National Digital Information Infrastructure & Preservation Program (NDIIPP)
– A collaborative network of partnerships for collecting, preserving, and making accessible critical digital content
• World Digital Library
– A collaborative project to digitize and provide access to primary cultural resources from around the world
• E-Deposit of Electronic Journals
– A collaborative project for ingesting electronic journals through copyright deposit and to acquire electronic journal content for the Library’s collections
• National Digital Newspaper Program
– Digital preservation of and access to 1 million historic U.S. newspapers in partnership with NEH


eST 2010 Plans

• Proof of Concept projects

– Geospatial Data Sets
  • NDIIPP Partner data
  • USGS data
  • LC G&M Division data

• Glossary of eScience Terminology

• eScience Bibliography

• Survey/Registry of Federal Science Data Centers

• Library Services – Office of Strategic Initiatives collaboration to characterize the scope and nature of digital science data requirements

• Create supporting documents for presentation to policy makers

LoC Repository Development Environment

[Network diagram. Labeled components: Internet; firewall/VPN; 802.1x/WPA-secured laptops; workstations (Linux/x86 + x86_64); printer(s); development servers; team support server; storage servers (Linux/x86_64); Fibre Channel switches; drive tower(s) (IEEE1394b); storage (SATA, 200TB+); Ethernet switches for the application, management, and storage networks. Labeled networks: GigE application network (copper), management network (copper), GigE storage network (copper), F/C storage network (fiber).]

LoC Repository Service Development

Data Creation

Multi-sector, Multi-lingual, Cross-domain, Interdisciplinary, Incentives, Preservation, Curation, Validation, Provenance, Administration, Use agreements

Digital Data Life-cycle Framework

Data Management

Selection/evaluation, Scalability, Data set integration, Interactive services, Interoperable architecture, Platform independence, Repository infrastructure, Metadata standards, Active data archives, Structured and unstructured data, Maintenance, Security, Classification/tagging, Workflow, Distributed/Federated v. Centralized, Common layered infrastructure, Semantic web

Data Use

Data-driven solutions, Explore/Discover, Search/Retrieve, Data reuse, Data repurpose, Interactive, Authentication/verification, Animation tools, Modeling tools, Simulation tools, Visualization tools, Annotation tools, Analysis tools, Data set/publication links, eResearch, eScience, eKnowledge

Characteristics: Multi-disciplinary, Inter-disciplinary, Trans-disciplinary, Info-Centric, User-Focused services, Workflow, Trust, Ownership/Open, Integrated, Cooperative, Global, eLearning, Citizen Science

Issues: Policy variances, Sustainability, Economics, Technology, Resources, Data Rights/Ownership, Workforce development, User-focused, Multi-National/Global, Open access – Open source, Culture, Trust, Federal/Academic/Commercial

Digital Data Life-cycle Environment

Value Chain Cooperative Confederation Collaborative Content/Context Changing science workflow Evolutionary Solutions to challenges Information integration