Library of Congress eScience Team (eST) Update Board on Research Data and Information National Academies 3 June 2010 ††† Peter R.


Library of Congress
eScience Team (eST)
Update
Board on Research Data and Information
National Academies
3 June 2010
†††
Peter R. Young
Library of Congress
eScience Team (eST)
Update Outline
• LC Science Information and Data
– 1992 LC S&T Initiative
– 2000 LC 21
– 2005 Collections Policy Committee Report
– 2009 Digital archiving
– 2010 Twitter Archive
• eST 2010
– Charge, Composition, Activities
– Proof-of-Concept Geospatial Data Projects
• OSI/NDIIPP Research Data Activities
– Role of Libraries and Archives in Data
Management and Preservation
Library of Congress - 1992
• LC’s Science and Technology Initiative 1992
– 1990 review of the Library’s science and
technology information capabilities
– Special Project Team on a National Center for
Science and Technology Information
– “LC will take the lead in the broader STI
community to make it easier for industrial and
educational institutions to obtain usable
technical information to boost innovation and
success in education.”
– “The Library will continue to serve as ‘America’s
memory’ for scientific publications in electronic,
as well as paper, formats. Libraries have always
played the role of supporting not only today’s
journals, but also yesterday’s.”
Library of Congress - 2000
• LC 21: A Digital Strategy for the Library of
Congress, Committee on an Information
Technology Strategy for the Library of
Congress, Computer Science and
Telecommunications Board, National
Research Council, 2000:
• “The Library now needs to learn from the
[National Digital Library Program] to broaden
and deepen its strategic awareness of how
that project can help lead to the next
generation of substantially more ambitious
involvement with digital information.”
Library of Congress - 2005
– 2005 Report to the Collections Policy Committee
from the Special Committee to Examine the
Potential Role of the Library of Congress in the
Collection, Preservation and Access of Scientific
Databases
• “…the Library may decide that it is not our obligation
to preserve datasets, but to see that they are
preserved.”
• “…the Committee recommends that where the work of
the Congress and those supporting its work require
science datasets, the Library will work to assure access
to these datasets, in consultation with that user
group.”
• “The Committee recommends that LC consider serving
as an archive of last resort – taking responsibility for
collecting/preserving and servicing some smaller
datasets created by individual researchers, which have
been identified by specialists as key research sources
not eligible to be archived elsewhere, and which are in
scope for the Library to collect, regardless of format.”
Library of Congress - 2009
• The Library has been collecting materials from the
web since it began harvesting Congressional and
Presidential campaign websites in 2000. Today the
Library holds more than 167 terabytes of web-based information, including legal blogs, websites
of candidates for national office and websites of
Members of Congress.
• In addition, the Library leads the Congressionally
mandated National Digital Information
Infrastructure and Preservation Program
(NDIIPP, www.digitalpreservation.gov), which is pursuing a
national strategy to collect, preserve and make
available significant digital content, especially
information that is created in digital form only, for
current and future generations.
Library of Congress - 2010
• “The Library looks at this (Twitter Archive) as an
opportunity to add new kinds of information
without subtracting from our responsibility to
manage our overall collection. Working with the
Twitter archive will also help the Library to
extend its capability to provide stewardship for
very large sets of born-digital materials.”
James H. Billington, Librarian of Congress, 15 April 2010
Library of Congress Twitter Archive
• Twitter is donating its digital archive of public tweets to the
Library of Congress. Twitter is a leading social networking
service that enables users to send and receive tweets, which
consist of web messages of up to 140 characters.
• Twitter processes more than 50 million tweets per day from
people around the world. The Library will receive all public
tweets, which number in the billions, from the 2006 inception
of the service to the present.
• "The Twitter digital archive has extraordinary potential for
research into our contemporary way of life," said Librarian of
Congress James H. Billington. "This information provides
detailed evidence about how technology-based social
networks form and evolve over time. The collection also
documents a remarkable range of social trends. Anyone who
wants to understand how an ever-broadening public is using
social media to engage in an ongoing debate regarding social
and cultural issues will have need of this material."
Library of Congress Twitter Archive
“What is pronounced trash to-day
may have an unexpected value
hereafter, and the unconsidered
trifles of the press of the
nineteenth century may prove
highly curious and interesting to
the twentieth, as examples of
what the ancestors of the men of
that day wrote and thought
about.”
Ainsworth R. Spofford, Librarian of
Congress 1864 - 1897
The Library of Congress eScience Team (eST)
• Deanna B. Marcum,
Associate Librarian for
Library Services, charged
eST in 2009:
• To develop collection strategies
for digital science resources and
data appropriate for the national
library
• eST 2010 activities:
• Launch cross-unit proof-of-concept digital data pilot
projects
• Identify opportunities for
initiatives and partnerships
involving digital data sets
eScience Team (eST)
1. Martha Anderson, OSI/NDIIPP
2. Bahadir Akpinar, OSI/RDC
3. Rod Atkinson, CRS
4. Ron S. Bluestone, LS/S-T-B
5. Leonard Bruno, LS/PSCD/MSS
6. Colleen R. Cahill, LS/G&M/TSS
7. Dan Cohen, LS & BRDI
8. Babak Hamidzadeh, OSI/RDC
9. John R. Hebert, LS/CS/G&M
10. Jan Johansson, CRS
11. William Lefurgy, OSI/NDIIPP
12. Debra Ozga, LS/POP/FLICC
13. Clay Readding, LS/NDMSO
14. Roberta Shaffer, LL
15. Peter R. Young, LS/CS/AD
eST Team Charge
– Draft Library strategies for digital
science resources
– Recommend digital science collection
policies and workflows
– Create a planning framework for
digital science knowledge resources
and infrastructure
– Recommend data management
policies and digital knowledge
resources to support data-driven
science
– Meet with other agencies and
organizations involved with digital
science data archives and
preservation
eST Team Activities – 2009-2010
• 13 March 2009: Biodiversity Heritage Library – Tom Garnett (BHL) and Martin Kalfatovic (Smithsonian Institution)
• 10 April 2009: eScience and Data Science: Preparing for the Data Avalanche – Kirk Borne (George Mason University), Tim Eastman (Plasmas International – NASA), and Dave Williams (National Space Science Data Center, NASA)
• 1 May 2009: Pillbox, eScience, and the Evolution of the Library – Solid Dose Pharmaceutical Photography Project – David Hale, Division of Specialized Information Services, and Terry Yoo, National Library of Medicine, National Institutes of Health
• 29 May 2009: Paul Uhlir, Board on Research Data and Information, National Academies
• 17 July 2009: G. Sayeed Choudhury, Associate Dean of University Libraries, Johns Hopkins University
• 24 July 2009: Chris L. Greer, Director, National Coordination Office for Networking and Information Technology Research and Development, National Science and Technology Council
• 9 October 2009: National Archives and Records Administration – Michael Kurtz and Laurence Brewer
• 13 October 2009: Pam Bjornson, Director General, Canada Institute for Scientific and Technical Information
• 15 January 2010: Corporation for National Research Initiatives (CNRI) – Bob Kahn and Allen Sears
• 10 March 2010: UCLA Department of Information Studies videoconference, “The Role of Libraries and Archives in Data Management”
• 23 April 2010: Smithsonian Institution – Len Hirsch, Office of the Under Secretary for Science, and James Smith, Senior Research Analyst
eST 2010 Activities
• Proof-of-Concept projects:
– Geospatial Data Sets
• LC Geography & Map Division data
• Congressional Research Service
– Congressional Geospatial Data System
• NDIIPP Partner data
– University of California, Santa Barbara
– North Carolina State Library
• Library Services – Office of Strategic
Initiatives - CRS collaboration to
characterize digital science data workflow
management and archival requirements
LC-OSI/NDIIPP & Research Data Activities
• National Geospatial Digital Archives
– Stanford University and UC – Santa Barbara
• Geospatial data and images collection preservation and policy
agreements among partners
• North Carolina Geospatial Data Archiving Project
– North Carolina State University Libraries and North Carolina
Center for Geographic Information and Analysis
• State and county agencies partnership for developing data creation
practices for preservation and access of at-risk data
• Geospatial Multistate Archive and Preservation Project
– North Carolina Center for Geographic Information and Analysis
• Expand state government capability to provide long-term access to
geospatial data and test geographically dispersed content exchange
• Geospatial Data Preservation Clearinghouse
– Center for International Earth Science Information Network,
Columbia University
• Develop web-based resource of tools, standards, best practices for
geospatial data preservation
• Data-PASS Project
– Inter-University Consortium for Political and Social Research,
University of Michigan
• Acquire and preserve at-risk social science data and test distributed
network
eST 2010 Geospatial Data Pilot Projects
– Transfer several digital geospatial pilot project data sets in order to investigate and analyze them as possible models for eScience data.
– Study questions about the target digital geospatial data sets to determine the nature and scope of the challenges posed by digital content of this kind.
• The eST study project initiatives will provide a deeper understanding of the Library’s role in eResearch and eScience.
• The eST study project will clarify workflow, policy, and technical issues related to the transfer, ingest, management, and access of digital data sets.
LoC Repository Service Development
The Role of Libraries and Archives in
Data Management & Preservation
• Sustainability
• Scalability
• Workflow integration
• Lifecycle management
– Preservation
– Conservation
– Curation
• Costs
• Access tools
• Use policies
• Interdisciplinary skill requirements
• Links with communities-of-practice