Cyberinfrastructure Components

Data Preservation Imperatives: The Role of the
US National Science Foundation
Lucy Nowell, Ph.D.
Office of Cyberinfrastructure
Conference on Permanent Access to the Records
of Science
Brussels, Belgium
15 November 2007
1
Outline
• NSF Office of Cyberinfrastructure
• Motivation for Data Preservation
• Role of Universities and Academic Libraries
• Characteristics of the Digital Age
• NSF OCI Data Strategic Vision and Goals
2
NSF Act of 1950
• “To promote the progress of science…”
• Encourage & develop a national policy for
the promotion of basic research and
education in the math, physical, medical,
biological, engineering and other sciences
• Initiate & support basic scientific research in
the sciences
4
U.S. President
• Science Advisor
• Office of Science and Technology Policy
• Office of Management and Budget
• Other boards, councils, etc.
Major Departments
• Agriculture
• Health and Human Services
• Interior
• Homeland Security
• Defense
• Energy
• Commerce
Independent Agencies
• National Aeronautics and Space Administration
• Environmental Protection Agency
• Smithsonian Institution
• Nuclear Regulatory Commission
• Other agencies
5
National Science Foundation
Director
Deputy Director
National Science Board
Research Directorates
• Biological Sciences
• Computer & Info. Science & Eng.
• Education & Human Resources
• Engineering
• Geosciences
• Mathematical & Physical Sciences
• Social, Behavioral & Econ. Sciences
Offices
• CyberInfrastructure
• Integrative Activities
• Polar Programs
• International Science and Engineering
6
New Modes of Investigation
The conduct of science and
engineering is changing and evolving.
This is due, in large part, to the
expansion of networked
cyberinfrastructure …
NSF Strategic Plan 2006-2011
7
Office of CyberInfrastructure (OCI)
Dan Atkins, Office Director
José Muñoz, Dep. Office Dir.
Judy Hayden
Mary Daley
Irene Lombardo
Deborah White
Lucy Nowell, Diana Rhoten, Terry Langendoen
Data, Learning & Workforce, Virtual Organizations
Kevin Thompson
Software/Middleware
Steve Meacham, Abani Patra
High Performance Computing
8
Cyberinfrastructure …
… is the organized aggregate of technologies that
enable us to access and integrate today’s
information technology resources—data and
storage, computation, communication,
visualization, networking, scientific instruments,
expertise—to facilitate science and engineering
goals.
- Fran Berman, Director, SDSC
9
CI Vision: 4 Interrelated Perspectives
• Collaboratories, Observatories & Virtual Organizations
• Data, Data Analysis & Visualization
• High Performance Computing
• Learning & Workforce Development
10
The Fragility of Memory
in a Digital Age
“In 1964, the first electronic mail message
was sent from either MIT, the Carnegie
Institute, or Cambridge University. The
message does not survive, however, and
so there is no documentary record to
determine which group sent the
pathbreaking message.”
Report of the Task Force on Archiving of Digital Information
Commission on Preservation and Access and the Research Libraries Group
11
NASA plans new search for
missing moon tapes
Aug. 15, 2006, 5:13PM
Seth Borenstein, Associated Press
WASHINGTON —NASA said today it was
launching an official search for more than
13,000 original tapes of the historic Apollo
moon missions.
12
Study                        Resource type                            Resource half-life
Koehler (1999 and 2002)      Random Web pages                         2.0 years
Nelson and Allen (2002)      Digital Library Object                   24.5 years
Harter and Kim (1996)        Scholarly Article Citations              1.5 years
Rumsey (2002)                Legal Citations                          1.4 years
Markwell and Brooks (2002)   Biological Science Education Resources   4.6 years
Spinellis (2003)             Computer Science Citations               4.0 years
Source: Koehler W. (2004) Information Research, 9 (2), 174
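To make the half-life figures concrete, here is a minimal sketch of what they imply. It assumes resources disappear at a constant rate (simple exponential decay), which is an illustrative assumption and not something the slide or the cited studies state. The function name `fraction_surviving` is hypothetical; the two half-life values are the ones quoted above for random Web pages and digital library objects.

```python
def fraction_surviving(years: float, half_life_years: float) -> float:
    """Fraction of resources still reachable after `years`.

    Assumes constant-rate (exponential) loss, which is an
    illustrative simplification, not a claim from the source.
    """
    return 0.5 ** (years / half_life_years)


# Half-lives quoted on the slide (in years).
WEB_PAGE_HALF_LIFE = 2.0
DIGITAL_LIBRARY_OBJECT_HALF_LIFE = 24.5

if __name__ == "__main__":
    for label, half_life in [
        ("Random Web pages", WEB_PAGE_HALF_LIFE),
        ("Digital Library Object", DIGITAL_LIBRARY_OBJECT_HALF_LIFE),
    ]:
        share = fraction_surviving(10, half_life)
        print(f"{label}: {share:.1%} expected to survive 10 years")
```

Under this assumption, ten years is five half-lives for random Web pages, so only about 3% would remain, while most digital library objects would still be available.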
Replication of Results: A
Cornerstone of Science
“…the results of one scientist's experiment
are not considered reliable until another
scientist has replicated them. The
reproducibility of results plays several
different, crucial roles in science…[but] in
many circumstances, considerations of
time and money often make
reproducibility impractical.”
The Key Role of Replication in Science, Nancy S. Hall, The Chronicle of
Higher Education, 10 November 2000
14
Replication of Results
• First and foremost, scientists attempt to reproduce
someone else's experiment if they doubt that the
results are accurate, or if the results contradict a view
that is widely accepted in the field.
• An experiment is so reproducible that replicating it
becomes a test of the student; if the student cannot
replicate the experiment, it is the student who is at
fault.
• As a training exercise, a new person [in a group] might
be asked to repeat experiments that others have
already performed, both to familiarize the newcomer
with the work of the group and to give the older
members a sense of the newcomer's expertise.
The Key Role of Replication in Science, Nancy S. Hall, The Chronicle of Higher
Education, 10 November 2000
15
Replication of Data Collection
Not Always Feasible
• Medical experiments carried out over
years or decades, involving hundreds or
even thousands of human subjects.
• Events that are singular and beyond the
experimenter's control, like comets,
earthquakes, and volcanic eruptions.
The Key Role of Replication in Science, Nancy S. Hall, The
Chronicle of Higher Education, 10 November 2000
16
A Global Response
“Ensuring research data are easily
accessible, so that they can be used as often
and as widely as possible, is a matter of
sound stewardship of public resources.”
Organization for Economic Cooperation and Development (OECD);
“Promoting Access to Public Research Data for Scientific, Economic,
and Social Development”
17
A Challenge for Society
“If we are effectively to preserve for future
generations the … corpus of information in
digital form that represents our cultural
record, we need … to commit ourselves
technically, legally, economically, and
organizationally to the full dimensions of the
task.”
Report of the Task Force on Archiving of Digital Information, 1996
Commission on Preservation and Access and the Research Libraries Group
18
The Universities
“Ever since their inception, universities have
been occupied with the fundamental
elements of what we now call 'knowledge
management', i.e. the creation, collection,
preservation and dissemination of
knowledge.”
André Oosterlinck, Knowledge Management in
Post-Secondary Education: Universities
19
The distinctive mission of the University is to
serve society as a center of higher learning,
providing long-term societal benefits through
transmitting advanced knowledge,
discovering new knowledge, and functioning
as an active working repository of organized
knowledge.
Mission Statement of the University of California
20
The Academic Libraries
“It is to the research library community that
others will look for the preservation of …
digital assets, as they have looked to us in
the past for reliable, long-term access to the
‘traditional’ resources and products of
research and scholarship.”
Association of Research Libraries (ARL)
Strategic Plan 2005-2009
21
Information is the currency of the
digital age and information
integration is the means for
mobilizing that currency for
discovery, innovation, learning, and
progress.
22
Before the Digital Age: A World
Constrained to 4 Dimensions
[Figure: coordinate frames with axes x, y, z, and t (Time)]
27
5th Dimension
[Figure: coordinate frames with axes x, y, z, and t (Time), labeled "5th Dimension"]
28
Opening a 5th dimension
through cyberinfrastructure
is the revolutionary force of
the digital age …
29
Characteristics of a 5D World:
(in priority order)
1. Time and place are no longer barriers to
participation and interaction
2. Access is open to specialists and non-specialists alike
3. Information is the primary driver for
progress
4. The realm of the possible is expanded
through new capabilities, resources, and
mechanisms
30
Individuals, groups,
organizations, and
nations that don’t
embrace the 5th
dimension will fall
behind in the digital age
31
The World Is Flat
- Thomas Friedman
The flat world is expanding
- Anonymous OCI program director
• More room for innovation
• New spaces for learning and discovery
• Expanded opportunities for collaboration
and interaction
• Greater capabilities for research and
education
32
NSF Draft Strategic Plan
for Data, Data Analysis, and
Visualization
Chapter 3
http://www.nsf.gov/pubs/2007/nsf0728/index.jsp
33
Vision
• “Science and engineering digital data are
routinely deposited in a well-documented
form, are regularly and easily consulted
and analyzed by specialists and non-specialists alike, are openly accessible
while suitably protected, and are reliably
preserved.”
• NSF Cyberinfrastructure Vision for 21st
Century Discovery, Chapter 3
34
Goals
• To catalyze the development of a system of
science and engineering data collections
that is open, extensible and evolvable.
• To support development of a new
generation of tools and services
facilitating data acquisition, mining,
integration, analysis, and visualization.
35
Principles
• Data generated with NSF funding will be
accessible and reliably preserved
• Research/education opportunities
determine investment priorities
• Broad community engagement is
necessary in reviewing and prioritizing
data activities
36
Principles (cont’d)
• Data is only useful if it can be found,
understood, and analyzed
• Legitimate privacy, confidentiality, and
intellectual property rights must be
protected
• International, interagency, and public-private partnerships are essential
37
Digital Data Preservation
and Access Framework
• User-centric
• Multi-Sector
• Sustainable
• Reliable
• Nimble
[Diagram: USER, with sectors University, State, College, Federal, Non-profit, Commercial, Local, International]
38
DataNet
• A robust and resilient national and global digital
data framework for preservation and access to
the resources and products of the digital age
• Provide reliable digital preservation, access, integration
and analysis capabilities for science and/or engineering
over a decades-long timeline: sustainability
• Continuously anticipate and adapt to changes in
technologies & user needs and expectations
• Engage at the frontiers of science & engineering
research & education, with research & development to
drive the leading edge forward
• Serve as component elements of an interoperable data
preservation and access network, spanning national and
international boundaries: shared governance and
standards
• Creation of new types of organizations that fully
integrate all of these capabilities
39
DataNet Partners
• Combine expertise in library and archival sciences;
computer, computational and information sciences;
cyberinfrastructure; and domain sciences and
engineering
• Develop models for economic and technological
sustainability over multiple decades
• Engage at the frontiers of science and engineering
research and education
• Work cooperatively and in coordination to create a
functional data network with revolutionary new
capabilities for information access, use, and
integration without regard to conventional barriers
such as data type and format, discipline or subject
area, and time and place/institution.
40
DataNet Partner
Responsibilities
• Provide for full data management life cycle
  • Data deposition/acquisition/ingest
  • Data curation & metadata management
  • Data protection, including privacy
  • Data discovery, access, use, & dissemination
  • Data interoperability, standards, & integration
  • Data evaluation, analysis, & visualization
• Engage in research central to DataNet responsibilities
• Education & training
• Community & user input assessment
• International engagement – collaborate & coordinate
closely with preservation & access organizations to
catalyze formation of a global data network
  • Foreign collaborators are expected to secure support from
  their own national sources.
41
Summary Strategic Plan
• Promote a change in culture
• Catalyze development of a national digital
data framework
• Support new generations of tools, services,
and capabilities
42
NSFNet Traffic
September 1991
43
The World Wide DataNet @ T=T0
[Legend: symbol = Data point-of-presence]
44
The World Wide DataNet @ T=TN
45
The Whole Is Greater
Than the Sum of Its Parts
• Climate Change
• Pandemic
• Drought and Starvation
• Sustainable Energy
• Aging Populations
• Human Behavior under Stress
• Etc.
46
Thank you!
47