Data and Information Opportunities
Board on Research Data and Information Sponsors Meeting
September 23rd, 2013

Laura Biven, PhD
Senior Science and Technology Advisor
Office of the Deputy Director for Science Programs

Priorities, Challenges, Opportunities
Data Management for primary research
Data Management for reuse and repurposing
The World on Big Data
Not only true for astronomy, high energy physics,… biology, climate, materials science,…

Quick-Facts about the DOE Office of Science
• Advanced Scientific Computing Research
• Basic Energy Sciences
• Biological and Environmental Research
• Fusion Energy Sciences
• High Energy Physics
• Nuclear Physics

The DOE/SC Labs Today – User Facilities

Users Come from all 50 States and D.C.
[Figure: map of DOE/SC user facilities, grouped by type.]
• Light Sources: SSRL (SLAC), ALS (LBNL), APS (ANL), NSLS (BNL), LCLS (SLAC)
• Neutron Sources: HFIR (ORNL), Lujan (LANL), SNS (ORNL)
• Nano Centers (NSRCs): CCNM (ANL), Foundry (LBNL), CNMS (ORNL), CINT (SNL/LANL), CFN (BNL)
• Computing Facilities: NERSC (LBNL), OLCF (ORNL), ALCF (ANL)
• High energy physics facilities: Tevatron (FNAL), B-Factory (SLAC)
• Nuclear physics facilities: RHIC (BNL), TJNAF, HRIBF (ORNL), ATLAS (ANL)
• Bio & Enviro Facilities: EMSL (PNNL), JGI (LBNL), ARM
• FES: DIII-D (GA), Alcator (MIT), NSTX (PPPL)

Synchrotron Light Sources
[Figure: number of users per year at each light source, 1982 to 2011; vertical axis: Number of Users, 0 to 12,000. Facilities and years shown: SSRL 1974 & 2004, NSLS 1982, ALS 1993, APS 1996, LCLS 2009, NSLS-II 2015.]

Users by Discipline at the Synchrotron Light Sources
[Figure: % of Users by discipline (0% to 100%) and Total Number of Users (0 to 10,000) by Fiscal Year, 1990 to 2010. Disciplines: Life Sciences, Chemical Sciences, Geosciences & Ecology, Applied Science/Engineering, Optical/General Physics, Materials Sciences, Other.]

Advanced Light Source Data Rates
Source: Data and Communication in Basic Energy Sciences: Creating a Pathway for Scientific Discovery (2012)

ASCR and BES, BER, HEP Data Crosscutting Requirements Review
In April 2013, a diverse group of researchers from the U.S. Department of Energy (DOE) scientific community assembled in Germantown, Maryland to assess data requirements associated with DOE-sponsored scientific facilities and large-scale experiments.
http://science.energy.gov/~/media/ascr/pdf/programdocuments/docs/ASCR_DataCrosscutting2_8_28_13.pdf

Crosscutting Requirements Report – Findings
• Many Office of Science experimental facilities anticipate rapid growth in data volume, velocity, and complexity. User Facilities need end-to-end systems that provide more automated workflows and capabilities to ingest, analyze, and manage much larger and more complex data sets generated at faster rates.
• There is an urgent need for standards and community APIs for storing, annotating, and accessing scientific data. The development of standards and protocols for distributed data and service interoperability is essential. Furthermore, API standards will enable collaborations and facilitate extensibility, whereby similar, customized services can be developed across science domains. Such standardization will facilitate data reuse and integration from multiple experiments. It also will be needed as part of any move to provide facility-wide data services.

K-Base
http://kbase.us/

Administration Directives
Push for consideration of reuse and repurposing is very timely.
• OSTP Memo: Increasing Access to the Results of Federally Funded Scientific Research
• Open Data Policy – Managing Information as an Asset

DOE/SC Interests
• Incentives for sharing: Data rights, licensing, citation, privacy, U.S. research competitiveness
• Sustainability of data
• Maintaining good communication with publishing communities
• Maintaining good communication and coordination with international partners

ASCR and BES
The workshop was organized in the context of the impending data tsunami that will be produced by DOE’s BES facilities. Current facilities, like SLAC National Accelerator Laboratory’s Linac Coherent Light Source, can produce up to 18 terabytes (TB) per day, while upgraded detectors at Lawrence Berkeley National Laboratory’s Advanced Light Source will generate ~10 TB per hour. The expectation is that these rates will increase by over an order of magnitude in the coming decade. The urgency to develop new strategies and methods in order to stay ahead of this deluge and extract the most science from these facilities was recognized by all.
http://science.energy.gov/~/media/ascr/pdf/research/scidac/ASCR_BES_Data_Report.pdf

Reports
DOE ASCR Advisory Committee (ASCAC) Data Subcommittee Report
This new report discusses the natural synergies among the challenges facing data-intensive science and exascale computing, including the need for a new scientific workflow.
http://science.energy.gov/~/media/ascr/ascac/pdf/reports/2013/ASCAC_Data_Intensive_Computing_report_final.pdf
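
Worked example: the LCLS and ALS data rates quoted in the ASCR and BES workshop summary use different time units (per day and per hour). The short Python sketch below puts them on a common footing. It is illustrative only and rests on assumptions that are not in the slides: decimal units (1 TB = 10^12 bytes), continuous operation at the quoted rates, and a growth factor of exactly 10 standing in for "over an order of magnitude". All variable and function names are hypothetical.

```python
# Assumptions (not from the slides): 1 TB = 10**12 bytes, and each
# facility runs around the clock at its quoted rate.
TB = 10**12
SECONDS_PER_HOUR = 3600
HOURS_PER_DAY = 24


def bytes_per_second(terabytes: float, interval_seconds: float) -> float:
    """Convert 'terabytes per interval' into a sustained byte rate."""
    return terabytes * TB / interval_seconds


# LCLS: up to 18 TB per day (figure quoted in the slides).
lcls_rate = bytes_per_second(18, HOURS_PER_DAY * SECONDS_PER_HOUR)

# ALS with upgraded detectors: ~10 TB per hour (figure quoted in the slides).
als_rate = bytes_per_second(10, SECONDS_PER_HOUR)
als_tb_per_day = 10 * HOURS_PER_DAY

# "Over an order of magnitude" growth, taken here as a factor of 10.
GROWTH_FACTOR = 10
projected_als_tb_per_day = GROWTH_FACTOR * als_tb_per_day

print(f"LCLS sustained rate: {lcls_rate / 1e6:.0f} MB/s")
print(f"ALS sustained rate:  {als_rate / 1e9:.2f} GB/s")
print(f"ALS daily volume:    {als_tb_per_day} TB/day")
print(f"ALS after 10x growth: {projected_als_tb_per_day / 1000:.1f} PB/day")
```

Under these assumptions LCLS works out to roughly 208 MB/s sustained, while the upgraded ALS detectors correspond to roughly 2.8 GB/s, or 240 TB per day, which is more than ten times the LCLS daily figure before any further growth.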