Multimedia and Datasets: Providing Access to New Forms of Nuclear Information
BRIAN A. HITSON
UNITED STATES DEPARTMENT OF ENERGY
OFFICE OF SCIENTIFIC AND TECHNICAL INFORMATION
The “Big Data” Era
A definition: “A collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools.” (Wikipedia)
How big is “big data”?
22,700,000 hits on Google.
Everybody Is On Board

Policymakers
• U.S. “Big Data” Initiative - $200M (March 2012)
• European Commission: “Big Data – The Digital Agenda for Europe and Challenges for 2012”


Scientists/Authors
• The Fourth Paradigm – Data-Intensive Scientific Discovery (2009)
• “Sailing on an Ocean of 0s and 1s,” Science, Vol. 327 (2010)
• “A Deluge of Data Shapes a New Era in Computing,” New York Times (14 December 2009)


International/National bodies
• International Council for Science (ICSU)
• World Data System
• CODATA
• U.S. Board on Research Data and Information (BRDI)

Nuclear Data

Nuclear Data*

Types:
• Experimental (e.g., Experimental Nuclear Reaction Data (EXFOR))
• Evaluated (e.g., Evaluated Nuclear Data File (ENDF-6) and Evaluated Nuclear Structure Data File (ENSDF))
• Reaction: incident neutrons, incident charged particles, and photons
• Structure and decay data: half-lives, decay schemes, etc. (Nuclear Data Sheets)

Other data-intensive nuclear fields:
• Nuclear medicine
• Radiation safety
• Waste management and environmental research
• Materials analysis
• Safeguards
• Nuclear astrophysics
* Source: Nuclear Data Section, IAEA, 2000
The Challenges of Numeric Data:
• Data sets are hard to find.
http://nucleardata.nuclear.lu.se/toi/nucSearch.asp
• Data sets are hard to navigate.
• Data sets are hard to cite.
Why Cite Data?
Data should be cited in just the same way that other sources of information, such as articles and books, are cited.
Data citation can help by:
• enabling easy reuse and verification of data
• allowing the impact of data to be tracked
• creating a scholarly structure that recognizes and rewards data producers
One Solution: DataCite
What is DataCite?
• A global consortium composed of local institutions focused on improving the scholarly infrastructure around datasets and other nontextual information.
• A service for assigning Digital Object Identifiers (DOIs) and metadata to data sets.
How Data Citation Works
1. Data Citation metadata submitted to DOE-OSTI (Web Service API, AN 241.6):
• Dataset Title
• Dataset Creator/Author or Principal Investigator
• Dataset Product Number
• DOE Contract/Award Number
• Originating Research Organization
• Dataset Type
• Publication/Issue Date
• Sponsoring Organization
• URL where the Dataset is posted for access
• Contact information
2. DOI assigned by DOE-OSTI.
3. DOE-OSTI submits nightly feed of new DOIs to DataCite.
4. DataCite registers DOI.
5. DataCite validates DOI registration with DOE-OSTI.
6. DOE-OSTI updates metadata record with DOI, creating a full Data Citation.
7. Creator/Author, Primary Investigator, or Submitter notified of Data Citation availability.
8. Data Citation submitted to search engines for indexing.
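The metadata fields and DOI assignment step on this slide can be illustrated with a small sketch. It is not OSTI's or DataCite's actual schema or API: the class name, field names, helper functions, and the DOI prefix/suffix are all hypothetical, chosen only to mirror the fields listed on the slide and to show how a complete record plus a DOI yields a citable string.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class DatasetRecord:
    """Hypothetical record holding the metadata fields named on the slide."""
    title: str
    creator: str
    originating_organization: str
    dataset_type: str
    product_number: str
    contract_number: str
    publication_date: str  # assumed ISO format, e.g. "2012-03-01"
    sponsoring_organization: str
    url: str
    contact: str
    doi: Optional[str] = None  # filled in once a DOI has been assigned


def missing_fields(record: DatasetRecord) -> List[str]:
    """Return the names of metadata fields that are empty (DOI excluded)."""
    return [f.name for f in fields(record)
            if f.name != "doi" and not getattr(record, f.name).strip()]


def assign_doi(record: DatasetRecord, prefix: str, suffix: str) -> DatasetRecord:
    """Attach a DOI; prefix and suffix are placeholders, not real identifiers."""
    gaps = missing_fields(record)
    if gaps:
        raise ValueError("incomplete metadata: " + ", ".join(gaps))
    record.doi = f"{prefix}/{suffix}"
    return record


def format_citation(record: DatasetRecord) -> str:
    """Build a simple citation string (illustrative layout only)."""
    if record.doi is None:
        raise ValueError("no DOI assigned yet")
    year = record.publication_date[:4]
    return (f"{record.creator} ({year}). {record.title} "
            f"[{record.dataset_type}]. {record.originating_organization}. "
            f"https://doi.org/{record.doi}")
```

In this sketch a record with any empty field is rejected before a DOI is attached, which reflects the idea that the DOI and the metadata together make up the full data citation.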
Data Citation Demo
Multimedia…
…an increasingly common form of scientific communication
• Videotaped lectures
• Visualizations
• Experiments/Simulations
YouTube search on “nuclear” has 3,090,000 results
The Challenges with Multimedia Science Information
• Lack of written transcripts, i.e., no “full text” to search
• Metadata, if available, is often minimal
• Scientific, technical, and medical terminology/vocabulary
• Videos can be long, often an hour or more
Access to Multimedia-based
Science & Technology
A Case Study for Enhanced Multimedia
Search & Retrieval
http://www.osti.gov/sciencecinema/
• Partnership between OSTI and Microsoft Research.
• Launched in February 2011; searches ~2,600 multimedia files from
DOE and CERN.
• Utilizes Microsoft Research Audio Video Indexing System (MAVIS).
• Enables searching of digitized spoken content.
• Users can search for a precise term within a video and be directed to the exact point in the video where the term was spoken.
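The idea of jumping to the exact point where a term was spoken can be sketched with a simple time-stamped word index. This is not how MAVIS is implemented; it assumes a speech recognizer has already produced (start time in seconds, word) pairs, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_index(words: Iterable[Tuple[float, str]]) -> Dict[str, List[float]]:
    """Map each normalized spoken word to the times (seconds) it occurs."""
    index: Dict[str, List[float]] = defaultdict(list)
    for start, word in words:
        # Normalize case and trim trailing/leading punctuation.
        index[word.lower().strip(".,;:!?")].append(start)
    return dict(index)


def find_term(index: Dict[str, List[float]], term: str) -> List[float]:
    """Return every start time at which the term was spoken."""
    return index.get(term.lower(), [])


def as_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS for a video player seek position."""
    total = int(seconds)
    return f"{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}"
```

A search result can then link to each returned time, so a viewer of an hour-long lecture lands directly on the relevant passage instead of scanning the whole recording.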
Multimedia Search Demo
Summary
• Big Data is here.
• Data citation makes data:
  • easier to find
  • easier to navigate
• Scientific multimedia is here.
• Speech indexing makes multimedia:
  • easier to search
  • more productive for the scientist and student
Thank You!
Brian A. Hitson
www.osti.gov
865-576-1199