A new collaborative scientific initiative at Harvard.

One-Slide IIC
• Proposal-driven, from within Harvard: "Projects" focus on areas where computers are key to new science and yield widely applicable results.
• Technical focus "Branches": Instrumentation; Databases & Provenance; Analysis & Simulations; Visualization; Distributed Computing (e.g. GRID, Semantic Web).
• Matrix organization: "Projects" by "Branches."
• Education: train future consumers and producers of computational science.
• Goal: fill the void in, highly value, and learn from the emerging field of "computational science."

"Astronomical Medicine"
A joint venture of FAS-Astronomy and the HMS/BWH Surgical Planning Lab. Work shown here is from the 2005 Junior Thesis of Michelle Borkin, Harvard College.

Filling the "Gap" between Science and Computer Science
• Scientific disciplines: increasingly, core problems in science require computational solution; departments typically hire or "home-grow" computationalists, but often lack the expertise or funding to go beyond the immediate pressing need.
• Computer Science departments: focused on finding elegant solutions to basic computer-science challenges; often see specific, "applied" problems as outside their interests.

"Workflow" & "Continuum"

Workflow examples (Astronomy | Public Health; CAPITALS mark the computationally enabled version of each step):
• Collect: telescope | microscope, stethoscope, survey
• COLLECT: "National Virtual Observatory"/COMPLETE | CDC WONDER
• Analyze: study the density structure of a star-forming glob of gas | find a link between one factory's chlorine runoff and disease
• ANALYZE: study the density structure of all star-forming gas in… | study the toxic effects of chlorine runoff in the U.S.
• Collaborate: work with your student
• COLLABORATE: work with 20 people in 5 countries, in real time
• Respond: write a paper for a journal
• RESPOND: write a paper, the quantitative results of which are shared globally, digitally

IIC branches address shared "workflow" challenges. Challenges common to data-intensive science map onto the IIC branches:
• Data acquisition → Instrumentation
• Data processing, storage, and access → Databases/Provenance
• Deriving meaningful insight from large datasets → Analysis & Simulations
• Maximizing understanding through visual representation → Visualization
• Sharing knowledge and computing resources across geographically dispersed researchers → Distributed Computing

Continuum: "computational science" is missing at most universities, which span "pure" discipline science (e.g. Einstein) to "pure" computer science (e.g. Turing) with little in between.

IIC Organization: Research and Education
• Provost (with the Dean of Physical Sciences and an Assoc Provost) → IIC Director.
• Under the IIC Director: Dir of Admin & Operations (CIO/systems; knowledge mgmt), Dir of Research, and Dir of Education & Outreach (with Education & Outreach staff).
• Under the Dir of Research: Assoc Dirs for Instrumentation, Visualization, Databases/Data Provenance, Analysis & Simulation, and Distributed Computing, plus Project 1 (Proj Mgr 1), Project 2 (Proj Mgr 2), Project 3 (Proj Mgr 3), etc.

[Figure: COMPLETE/IRAS N(dust) map of Barnard's Perseus; 2MASS/NICER extinction; Hα emission from the WHAM/SHASSA surveys (see Finkbeiner 2003).]

Numerical Simulation of Star Formation
• MHD turbulence gives the "t = 0" conditions; Jeans mass = 1 Msun
• 50 Msun in 0.38 pc; n_avg = 3 × 10^5 particles/cc; T = 10 K
• SPH with self-gravity (G), but no magnetic fields (B) or radiative transfer (L)
• Forms ~50 objects
• Movie spans 1.4 free-fall times
Bate, Bonnell & Bromm 2002 (UKAFF)

Goal: Statistical Comparison of "Real" and "Synthesized" Star Formation
Figure based on work of Padoan, Nordlund, Juvela, et al.
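As a sanity check on the simulation's quoted initial conditions, the short sketch below (which assumes a mean molecular weight of μ ≈ 2.33 per particle — a value not stated on the slide) recovers the quoted ~1 Msun Jeans mass and the free-fall time that sets the scale of the "1.4 free-fall times" movie:

```python
import numpy as np

# Physical constants (cgs)
G    = 6.674e-8    # gravitational constant [cm^3 g^-1 s^-2]
k_B  = 1.381e-16   # Boltzmann constant [erg K^-1]
m_H  = 1.673e-24   # hydrogen-atom mass [g]
Msun = 1.989e33    # solar mass [g]

# Conditions quoted on the slide; mu is an assumed mean molecular weight
T  = 10.0          # gas temperature [K]
n  = 3.0e5         # mean particle density [cm^-3]
mu = 2.33          # assumed (molecular gas including He)

rho = mu * m_H * n                       # mass density [g cm^-3]

# Jeans mass of a uniform sphere: (5 k T / (G mu m_H))^(3/2) * (3 / (4 pi rho))^(1/2)
M_J = (5 * k_B * T / (G * mu * m_H))**1.5 * (3 / (4 * np.pi * rho))**0.5

# Gravitational free-fall time: sqrt(3 pi / (32 G rho))
t_ff = np.sqrt(3 * np.pi / (32 * G * rho))

print(f"Jeans mass     ~ {M_J / Msun:.1f} Msun")   # ~1.0 Msun, as quoted
print(f"free-fall time ~ {t_ff / 3.156e10:.0f} kyr")  # ~62 kyr; movie = 1.4 t_ff
```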
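The "real vs. synthesized" comparison can be cast as a two-sample statistical test on distributions of observables. Below is a minimal sketch with hypothetical placeholder arrays; an actual pipeline would compare quantities measured from, e.g., COMPLETE spectral-line maps against synthetic observations of the simulations.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

# Placeholder data: line-of-sight velocity dispersion per map pixel.
# In practice these would come from an observed 13CO cube and from a
# synthetic 13CO cube generated from an MHD simulation.
sigma_v_observed  = rng.lognormal(mean=-0.50, sigma=0.30, size=5000)
sigma_v_simulated = rng.lognormal(mean=-0.45, sigma=0.35, size=5000)

# Two-sample Kolmogorov-Smirnov test: are the two distributions consistent?
stat, p_value = ks_2samp(sigma_v_observed, sigma_v_simulated)
print(f"KS statistic = {stat:.3f}, p = {p_value:.3g}")
```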
[Figure: excerpt from a realization used in Padoan & Goodman 2002.]

Measuring Motions: Molecular Line Maps
Radio spectral-line observations (surveys) of interstellar clouds (Alves, Lada & Lada 1999).

Velocity from Spectroscopy
The telescope feeds a spectrometer, which records the observed spectrum; the line-of-sight velocity of the emitting gas follows from the Doppler shift of the line, v = c (ν0 − ν_obs)/ν0.
[Figure: an observed spectrum, intensity vs. "velocity" — "All thanks to Doppler."]

[Figure: COMPLETE/FCRAO W(13CO) map of Barnard's Perseus.]

"Astronomical Medicine"
Excerpts from the Junior Thesis of Michelle Borkin (Harvard College); IIC contacts: AG (FAS) & Michael Halle (HMS/BWH/SPL). IC 348, before and after "medical treatment."

3D Slicer demo. IIC contacts: Michael Halle & Ron Kikinis.

IIC Research Branches
• Visualization: physically meaningful combination of diverse data types.
• Distributed Computing: e-Science aspects of large collaborations; sharing of data, computational resources, and tools in real time.
• Databases/Provenance: management, and rapid retrieval, of data; "research reproducibility" — where did the data come from, and how?
• Analysis & Simulations: development of efficient algorithms; cross-disciplinary comparative tools (e.g. statistical).
• Instrumentation: improved data acquisition; novel hardware approaches (e.g. GPUs, sensors).
IIC projects will bring together IIC experts from the relevant branches with discipline scientists to address a pressing computing challenge that faces the discipline and has broad application.

Distributed Computing & Large Databases: Large Synoptic Survey Telescope
• Optimized for the time domain (scan mode and deep mode)
• 7-square-degree field; 6.5 m effective aperture
• 24th mag in 20 sec
• >5 TByte/night, with real-time analysis
• Simultaneous multiple science goals
IIC contact: Christopher Stubbs (FAS)

[Figure: relative optical survey power — figure of merit for time-domain (×10), stellar, and Galactic (×2) science — for LSST, SNAP, Pan-STARRS, Subaru, MMT, CFHT, and SDSS; based on the AΩ = 270 LSST design.]

Data volumes and computing loads, astronomy vs. high-energy physics:

| | LSST | SDSS | 2MASS | DLS | MACHO | BaBar | Atlas | RHIC |
|---|---|---|---|---|---|---|---|---|
| First year of operation | 2011 | 1998 | 2001 | 1992 | 1999 | 1998 | 2007 | 1999 |
| Run-time data rate to storage (MB/sec) | 5000 peak / 500 avg | 8.3 | 1 | 1 | 2.7 | 6* (60 zero-suppressed) | 540* | 120* ('03) / 250* ('04) |
| Daily average data rate (TB/day) | 20 | 0.02 | 0.016 | 0.008 | 0.012 | 0.6 | 60.0 | 3 ('03) / 10 ('04) |
| Annual data store (TB) | 2000 | 3.6 | 6 | 1 | 0.25 | 300 | 7000 | 200 ('03) / 500 ('04) |
| Total data store capacity (TB) | 20,000 (10 yrs) | 200 | 24.5 | 8 | 2 | 10,000 | 100,000 (10 yrs) | 10,000 (10 yrs) |
| Peak computational load (GFLOPS) | 140,000 | 100 | 11 | 1.00 | 0.600 | 2,000 | 100,000 | 3,000 |
| Average computational load (GFLOPS) | 140,000 | 10 | 2 | 0.700 | 0.030 | 2,000 | 100,000 | 3,000 |
| Acceptable data release delay | 1 day (moving) / 3 months (static) | 2 months | 6 months | 1 year | 6 hrs (trans) / 1 yr (static) | 1 day (max) / <1 hr (typ) | few days | 100 days |
| Real-time alert of event | 30 sec | none | none | <1 hour | 1 hr | none | none | none |
| Type/number of processors | TBD | 18 × 1 GHz Xeon | 28 × 450 MHz Sparc | 10 × 60–70 MHz Sparc | 5 × 500 MHz | 5000 (mixed) | 10,000 × 20 GHz Pentium | 2500 × Pentium |
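To put the LSST row in context, a quick back-of-envelope rate conversion helps. This is a sketch: the 8-hour usable night and 365-night duty cycle are my assumptions, not slide values.

```python
# Rough rate conversions for the LSST figures quoted above.
TB = 1e12                          # bytes per terabyte

nightly = 5 * TB                   # ">5 TByte/night" (slide value, a lower bound)
night_s = 8 * 3600                 # assumed seconds of observing per night

print(f"within a night  : {nightly / night_s / 1e6:.0f} MB/s sustained")  # ~174 MB/s
print(f"per year        : {nightly * 365 / 1e15:.1f} PB (lower bound)")   # ~1.8 PB

daily = 20 * TB                    # the table's 20 TB/day daily average
print(f"around the clock: {daily / 86400 / 1e6:.0f} MB/s")                # ~231 MB/s
```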
Challenges at the LHC
For each experiment (4 in total):
• 10s of petabytes of data logged per year
• 2000+ collaborators, 40 countries, 160 institutions (universities and national laboratories)
• CPU intensive
• Global distribution of data
• Tested with "Data Challenges"

[Figure: CPU needs vs. collaboration size. The Earth Simulator sits near the top of the CPU axis (~100,000) with a small collaboration; the LHC experiments combine high CPU needs (~10,000) with collaborations of ~2000; gravitational-wave experiments, current accelerator experiments, nuclear experiments, astronomy, and atmospheric-chemistry groups fill the lower decades; collaboration sizes run to ~2500.]

Data Handling and Computation for Physics Analysis (CERN; [email protected])
The detector feeds an event filter (selection & reconstruction), which writes the raw data; event reprocessing and event simulation yield event summary data; batch physics analysis extracts analysis objects by physics topic; and interactive physics analysis works from those objects. A toy sketch of this tiered flow appears at the end of this section.

Workflow, a.k.a. the Scientific Method (in the age of high-speed networks, fast processors, mass storage, and miniature devices). IIC contact: Matt Welsh, FAS.
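To make the tiered data flow concrete, here is the promised toy sketch. The event types, cuts, and function names are hypothetical illustrations, not any experiment's real framework.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RawEvent:
    event_id: int
    hits: list             # detector hits (placeholder)

@dataclass
class EventSummary:
    event_id: int
    n_tracks: int
    total_energy: float    # GeV

def event_filter(raw: RawEvent) -> Optional[EventSummary]:
    """Selection & reconstruction: keep only events passing a trigger cut."""
    n_tracks = len(raw.hits)             # stand-in for real reconstruction
    total_energy = sum(raw.hits)
    if total_energy < 10.0:              # hypothetical trigger threshold
        return None                      # event rejected at the filter
    return EventSummary(raw.event_id, n_tracks, total_energy)

def extract_analysis_objects(summaries, topic):
    """Extract per-topic analysis objects from the event summary data."""
    if topic == "high_energy":
        return [s for s in summaries if s.total_energy > 100.0]
    raise ValueError(f"unknown physics topic: {topic}")

# Batch analysis over a toy run; interactive analysis would start from the
# (much smaller) analysis-object collection, never from the raw data.
raw_events = [RawEvent(i, hits=[5.0] * (i % 40)) for i in range(1000)]
summaries  = [s for r in raw_events if (s := event_filter(r)) is not None]
selected   = extract_analysis_objects(summaries, "high_energy")
print(f"{len(raw_events)} raw -> {len(summaries)} summaries -> {len(selected)} selected")
```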