Transcript Document
Multimedia and Datasets: Providing Access to New Forms of Nuclear Information
Brian A. Hitson
United States Department of Energy
Office of Scientific and Technical Information

The "Big Data" Era
A definition: "A collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools." (Wikipedia)
How big is "big data"? 22,700,000 hits on Google.

Everybody Is On Board
Policymakers
• U.S. "Big Data" Initiative: $200M (March 2012)
• European Commission: "Big Data – The Digital Agenda for Europe and Challenges for 2012"
Scientists/Authors
• The Fourth Paradigm: Data-Intensive Scientific Discovery (2009)
• "Sailing on an Ocean of 0s and 1s," Science, Vol. 237 (2010)
• "A Deluge of Data Shapes a New Era in Computing," New York Times (14 December 2009)
International/National bodies
• International Council for Science (ICSU) World Data System
• CODATA
• U.S. Board on Research Data and Information (BRDI)

Nuclear Data
Nuclear data* types:
• Experimental (e.g., Experimental Nuclear Reaction Data (EXFOR))
• Evaluated (e.g., Evaluated Nuclear Data File (ENDF-6) and Evaluated Nuclear Structure Data File (ENSDF))
• Reaction data: incident neutrons, incident charged particles, and photons
• Structure and decay data: half-lives, decay schemes, etc. (Nuclear Data Sheets)
Other data-intensive nuclear fields:
• Nuclear medicine
• Radiation safety
• Waste management and environmental research
• Materials analysis
• Safeguards
• Nuclear astrophysics
* Source: Nuclear Data Section, IAEA, 2000

The Challenges of Numeric Data
• Data sets are hard to find.
  http://nucleardata.nuclear.lu.se/toi/nucSearch.asp
• Data sets are hard to navigate.
• Data sets are hard to cite.

Why Cite Data?
Data should be cited in the same way as other sources of information, such as articles and books.
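To make the comparison with articles and books concrete, here is a minimal Python sketch that renders a few of the data citation metadata fields named in this talk (creator/author, title, publication date, originating organization, DOI) in an author-year style. The DatasetRecord and format_citation names, the chosen field subset, the "[Data set]" label, and the exact punctuation are illustrative assumptions, not OSTI's or DataCite's actual citation format; the example names and DOI are made up.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class DatasetRecord:
    """Hypothetical subset of the data citation metadata fields listed in the talk."""
    creators: List[str]   # Dataset Creator/Author or Principal Investigator
    title: str            # Dataset Title
    issued: date          # Publication/Issue Date
    organization: str     # Originating Research Organization (assumed to act as publisher)
    doi: str              # DOI assigned to the dataset


def format_citation(record: DatasetRecord) -> str:
    """Format a dataset citation in an author-year style, as for an article or book."""
    authors = "; ".join(record.creators)
    return (
        f"{authors} ({record.issued.year}). {record.title} [Data set]. "
        f"{record.organization}. https://doi.org/{record.doi}"
    )


if __name__ == "__main__":
    # Hypothetical example values; the DOI is made up for illustration.
    example = DatasetRecord(
        creators=["Doe, J.", "Roe, R."],
        title="Example Evaluated Cross Sections",
        issued=date(2012, 5, 1),
        organization="Example National Laboratory",
        doi="10.5072/example-dataset",
    )
    # Prints: Doe, J.; Roe, R. (2012). Example Evaluated Cross Sections [Data set].
    #         Example National Laboratory. https://doi.org/10.5072/example-dataset
    print(format_citation(example))
```

Citing through the https://doi.org/ resolver rather than the URL where the dataset is posted keeps the citation valid if the hosting location later changes, since the DOI's registered URL can be updated without changing the identifier.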
Data citation can help by:
• enabling easy reuse and verification of data
• allowing the impact of data to be tracked
• creating a scholarly structure that recognizes and rewards data producers

One Solution: DataCite
What is DataCite?
• A global consortium of local institutions focused on improving the scholarly infrastructure around datasets and other non-textual information.
• A service for assigning Digital Object Identifiers (DOIs) and metadata to data sets.

How Data Citation Works
Data citation metadata submitted to DOE-OSTI (Web Service API; 241.6 AN):
• Originating Research Organization
• Dataset Type
• Dataset Title
• Dataset Creator/Author or Principal Investigator
• Dataset Product Number
• DOE Contract/Award Number
• Publication/Issue Date
• Sponsoring Organization
• URL where the dataset is posted for access
• Contact information
Processing steps:
• DOI assigned by DOE-OSTI
• DOE-OSTI submits a nightly feed of new DOIs to DataCite
• DataCite registers the DOI
• DataCite validates the DOI registration with DOE-OSTI
• DOE-OSTI updates the metadata record with the DOI, creating a full data citation
• Creator/Author, Principal Investigator, or Submitter is notified that the data citation is available
• Data citation is submitted to search engines for indexing

Data Citation Demo

Multimedia: an increasingly common form of scientific communication
• Videotaped lectures
• Visualizations
• Experiments/simulations
A YouTube search on "nuclear" returns 3,090,000 results.

The Challenges with Multimedia Science Information
• Lack of written transcripts, i.e., no "full text" to search
• Metadata, if available, is often minimal
• Specialized scientific, technical, and medical terminology/vocabulary
• Videos can be long, often an hour or more

Access to Multimedia-based Science & Technology
A Case Study for Enhanced Multimedia Search & Retrieval
http://www.osti.gov/sciencecinema/
• Partnership between OSTI and Microsoft Research.
• Launched in February 2011; searches ~2,600 multimedia files from DOE and CERN.
• Uses the Microsoft Research Audio Video Indexing System (MAVIS).
• Enables searching of digitized spoken content.
• Users can search for a precise term within a video and be taken to the exact point in the video where the term was spoken.

Multimedia Search Demo

Summary
Big Data is here. Data citation makes data:
• easier to find
• easier to navigate
Scientific multimedia is here. Speech indexing makes multimedia:
• easier to search
• more productive for the scientist and student

Thank You!
Brian A. Hitson
[email protected]
www.osti.gov
865-576-1199