Data and Information Opportunities Board on Research Data and Information Sponsors Meeting September 23rd, 2013 Laura Biven, PhD Senior Science and Technology Advisor Office of.

Data and Information Opportunities
Board on Research Data and Information Sponsors Meeting
September 23rd, 2013
Laura Biven, PhD
Senior Science and Technology Advisor
Office of the Deputy Director for Science Programs
Priorities Challenges Opportunities
Data Management for primary research
The World
Data Management for reuse and repurposing
The World on Big Data
Not only true for astronomy, high energy physics,… biology, climate, materials science,…
2
Quick-Facts about the DOE Office of Science
Advanced Scientific Computing Research
Basic Energy Sciences
Biological and Environmental Research
Fusion Energy Sciences
High Energy Physics
Nuclear Physics
3
The DOE/SC Labs Today – User Facilities
4
Users Come from all 50 States and D.C.
Light Sources: SSRL (SLAC), ALS (LBNL), APS (ANL), NSLS (BNL), LCLS (SLAC)
Neutron Sources: HFIR (ORNL), Lujan (LANL), SNS (ORNL)
Nano Centers (NSRCs): CNM (ANL), Foundry (LBNL), CNMS (ORNL), CINT (SNL/LANL), CFN (BNL)
Computing Facilities: NERSC (LBNL), OLCF (ORNL), ALCF (ANL)
High energy physics facilities: Tevatron (FNAL), B-Factory (SLAC)
Nuclear physics facilities: RHIC (BNL), TJNAF, HRIBF (ORNL), ATLAS (ANL)
Bio & Enviro Facilities: EMSL (PNNL), JGI (LBNL), ARM
FES: DIII-D (GA), Alcator (MIT), NSTX (PPPL)
5
Synchrotron Light Sources
[Chart: Number of Users (0 to 12,000) by year, '82 to '11, for NSLS, SSRL, ALS, APS, and LCLS]
Facilities and years shown: SSRL 1974 & 2004, NSLS 1982, ALS 1993, APS 1996, LCLS 2009, NSLS-II 2015
6
Users by Discipline at the Synchrotron Light Sources
[Chart: % of Users (0% to 100%) by discipline and Total Number of Users (0 to 10,000), by Fiscal Year, 1990 to 2010]
Disciplines: Life Sciences, Chemical Sciences, Geosciences & Ecology, Applied Science/Engineering, Optical/General Physics, Materials Sciences, Other
7
Advanced Light Source Data Rates
Data and Communication in Basic Energy Sciences: Creating a Pathway for Scientific Discovery (2012)
8
ASCR and BES, BER, HEP Data Crosscutting Requirements Review
In April 2013, a diverse group of researchers from the U.S. Department of Energy (DOE) scientific community assembled in Germantown, Maryland to assess data requirements associated with DOE-sponsored scientific facilities and large-scale experiments.
http://science.energy.gov/~/media/ascr/pdf/programdocuments/docs/ASCR_DataCrosscutting2_8_28_13.pdf
9
Crosscutting Requirements Report – Findings
• Many Office of Science experimental facilities anticipate rapid growth in data volume, velocity, and complexity.
User Facilities need end-to-end systems that provide more automated workflows and capabilities to ingest, analyze, and manage much larger and more complex data sets generated at faster rates.
• There is an urgent need for standards and community APIs for storing, annotating, and accessing scientific data.
The development of standards and protocols for distributed data and service interoperability is essential. Furthermore, API standards will enable collaborations and facilitate extensibility, whereby similar, customized services can be developed across science domains. Such standardization will facilitate data reuse and integration from multiple experiments. It also will be needed as part of any move to provide facility-wide data services.
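To make the second finding concrete, here is a minimal sketch of what a shared interface for storing, annotating, and accessing data might look like. Everything in it is an assumption for illustration: the names `DataStore`, `Record`, `InMemoryStore`, and `describe` are hypothetical and do not correspond to any existing DOE or community standard.

```python
from dataclasses import dataclass, field
from typing import Dict, Protocol


@dataclass
class Record:
    """A stored data set plus its free-form annotations (hypothetical shape)."""
    payload: bytes
    annotations: Dict[str, str] = field(default_factory=dict)


class DataStore(Protocol):
    """Hypothetical common interface; not an existing standard."""

    def store(self, dataset_id: str, payload: bytes) -> None: ...
    def annotate(self, dataset_id: str, key: str, value: str) -> None: ...
    def access(self, dataset_id: str) -> Record: ...


class InMemoryStore:
    """Toy implementation that keeps everything in a dictionary."""

    def __init__(self) -> None:
        self._records: Dict[str, Record] = {}

    def store(self, dataset_id: str, payload: bytes) -> None:
        self._records[dataset_id] = Record(payload)

    def annotate(self, dataset_id: str, key: str, value: str) -> None:
        self._records[dataset_id].annotations[key] = value

    def access(self, dataset_id: str) -> Record:
        return self._records[dataset_id]


def describe(store: DataStore, dataset_id: str) -> str:
    """Works with any store that follows the interface, whatever the facility."""
    record = store.access(dataset_id)
    return f"{dataset_id}: {len(record.payload)} bytes, {record.annotations}"


# Illustrative usage with made-up identifiers and values.
example_store = InMemoryStore()
example_store.store("scan-001", b"\x00\x01")
example_store.annotate("scan-001", "facility", "ALS")
summary = describe(example_store, "scan-001")
```

The point of the sketch is that `describe` depends only on the interface, so a tool written once could in principle be reused across facilities that adopt the same calls.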
10
K-Base
http://kbase.us/
11
Administration Directives
The push to consider reuse and repurposing is very timely.
• OSTP Memo: Increasing Access to the Results of Federally Funded Scientific Research
• Open Data Policy – Managing Information as an Asset
12
DOE/SC Interests
• Incentives for sharing: Data rights, licensing, citation, privacy, U.S. research competitiveness
• Sustainability of data
• Maintaining good communication with publishing communities
• Maintaining good communication and coordination with international partners
13
ASCR and BES
The workshop was organized in the context of the impending data tsunami that will be produced by DOE’s BES facilities. Current facilities, like SLAC National Accelerator Laboratory’s Linac Coherent Light Source, can produce up to 18 terabytes (TB) per day, while upgraded detectors at Lawrence Berkeley National Laboratory’s Advanced Light Source will generate ~10TB per hour. The expectation is that these rates will increase by over an order of magnitude in the coming decade. The urgency to develop new strategies and methods in order to stay ahead of this deluge and extract the most science from these facilities was recognized by all.
http://science.energy.gov/~/media/ascr/pdf/research/scidac/ASCR_BES_Data_Report.pdf
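The rates quoted above can be put on a common footing with a little arithmetic. The following is a minimal sketch, assuming decimal terabytes (1 TB = 10^12 bytes) and, hypothetically, that each instrument runs continuously at its quoted rate, which real beamlines do not. The function name is illustrative only.

```python
# Assumption: decimal units, 1 TB = 1e12 bytes.
TB = 1e12
SECONDS_PER_HOUR = 60 * 60
SECONDS_PER_DAY = 24 * SECONDS_PER_HOUR


def sustained_gbit_per_s(terabytes: float, seconds: float) -> float:
    """Average rate in gigabits per second needed to move data as fast as it is produced."""
    return terabytes * TB * 8 / seconds / 1e9


# Figures quoted on the slide.
lcls_gbps = sustained_gbit_per_s(18, SECONDS_PER_DAY)   # up to 18 TB per day
als_gbps = sustained_gbit_per_s(10, SECONDS_PER_HOUR)   # ~10 TB per hour

# Assumption: the ALS rate held for a full 24 hours (an upper bound, not a measurement).
als_tb_per_day = 10 * 24

# "Over an order of magnitude" growth, taken here as exactly tenfold for illustration.
lcls_gbps_future = 10 * lcls_gbps
als_gbps_future = 10 * als_gbps

print(f"LCLS today: {lcls_gbps:.2f} Gbit/s, tenfold: {lcls_gbps_future:.1f} Gbit/s")
print(f"ALS today:  {als_gbps:.2f} Gbit/s, tenfold: {als_gbps_future:.1f} Gbit/s")
print(f"ALS at a sustained 10 TB/hour: {als_tb_per_day} TB/day")
```

Under these assumptions the LCLS figure corresponds to under 2 Gbit/s sustained, while the upgraded ALS detectors would need a little over 22 Gbit/s, more than ten times the LCLS rate even before any future growth.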
14
Reports
DOE ASCR Advisory Committee (ASCAC) Data Subcommittee Report
This new report discusses the natural synergies among the challenges facing data-intensive science and exascale computing, including the need for a new scientific workflow.
http://science.energy.gov/~/media/ascr/ascac/pdf/reports/2013/ASCAC_Data_Intensive_Computing_report_final.pdf
15