A Vision for a New Era in Computational Science


Cyberinfrastructure and California

Dr. Francine Berman

Director, San Diego Supercomputer Center

Professor and High Performance Computing Endowed Chair, UC San Diego

SAN DIEGO SUPERCOMPUTER CENTER

Fran Berman

UCSD UNIVERSITY OF CALIFORNIA

The Digital World

Science, Commerce, Information, Entertainment

Today’s Technology is a Team Sport

Today’s “computer” is a coordinated set of hardware, software, data, and services providing an “end-to-end” resource.

Cyberinfrastructure captures the integrated character of today’s IT environment

[Figure: The “computer” as an integrated set of resources: computers, networks (including wireless), data storage, sensors, field instruments, and visualization]

Cyberinfrastructure -- An Integrating Concept

Cyberinfrastructure = Resources

(computers, data storage, networks, scientific instruments, experts, etc.)

+ “Glue”

(integrating software, systems, and organizations)


How does Cyberinfrastructure Work?

Cyberinfrastructure-enabled Neurosurgery

Radiologists and neurosurgeons at Brigham and Women’s Hospital (BWH), Harvard Medical School, are exploring transmission of 30/40 MB brain images (generated during surgery) to SDSC for analysis and alignment.

PROBLEM:

• Neurosurgeons seek to remove as much tumor tissue as possible while minimizing removal of healthy brain tissue
• Brain deforms during surgery
• Surgeons must align the preoperative brain image with intra-operative images to get the best opportunity for intra-surgical navigation

Transmission is repeated every hour during a 6-8 hour surgery. Transmission and output must take on the order of minutes.

A finite element simulation on a biomechanical model for volumetric deformation is performed at SDSC; output results are sent to BWH, where updated images are shown to surgeons.

SDSC is a National Cyberinfrastructure Center

SDSC

• National facility funded by NSF, NIH, DOE, Library of Congress, NARA, etc.
• Employs nearly 400 researchers, staff and students
• National Facility and UCSD Organized Research Unit
• Home to many associated activities including
  • Protein Data Bank
  • Biomedical Informatics Research Network (BIRN) Coordinating Center
  • Geosciences Network (GEON)
  • NEES IT Center, etc.

[Figure labels: Data and Knowledge Systems; Grid and Cluster Computing; SW tools, workbenches, toolkits; Community Databases and Data Collections; High Performance Computing; Data-oriented Science and Engineering; Networking; Computational Science and Engineering]

SDSC Resources Are Available to the Community

COMPUTE SYSTEMS

DataStar
• 2,528 Power4+ processors
• IBM p655 8-way and p690 32-way nodes
• 7 TB total memory
• Up to 3 GBps I/O to disk

TeraGrid Cluster
• 512 Itanium2 IA-64 processors
• 1 TB total memory
• Also 128 2-way data nodes

Blue Gene Data
• First academic IBM Blue Gene system
• 2,048 PowerPC processors
• 128 I/O nodes

http://www.sdsc.edu/user_services/

DATA ENVIRONMENT

• 1.4 PB Storage-area Network (SAN)
• 6 PB StorageTek tape library
• HPSS and SAM-QFS archival systems
• DB2, Oracle, MySQL
• Storage Resource Broker
• 72-CPU Sun Fire 15K
• IBM p690s – HPSS, DB2, etc
• Support for community data collections and databases
• Data management, mining, analysis, and preservation

http://datacentral.sdsc.edu/

SCIENCE and TECHNOLOGY STAFF, SOFTWARE, SERVICES

• User Services
• Application/Community Collaborations
• Education and Training
• SDSC Synthesis Center
• Community SW, toolkits, portals, codes

http://www.sdsc.edu/

Cyberinfrastructure Can Help Harness Today’s Deluge of Data

Over the next decade, data will come from everywhere

• Scientific instruments
• Experiments
• Sensors and sensornets
• New devices (personal digital devices, computer enabled clothing, cars, …)

And be used by everyone

• Scientists
• Consumers
• Educators
• General public

Cyberinfrastructure must support unprecedented diversity, globalization, integration, scale, and use

[Figure labels: Data from simulations; Volunteer Data; Data from sensors; Data from instruments; Data from analysis]

How much Data is there?*

Kilo = 10^3
Mega = 10^6
Giga = 10^9
Tera = 10^12
Peta = 10^15
Exa = 10^18

1 Low Resolution Photo = 100 KiloBytes
1 novel = 1 MegaByte
iPod Shuffle (up to 120 songs) = 512 MegaBytes
Printed materials in the Library of Congress = 10 TeraBytes
1 human brain at the micron level = 1 PetaByte
SDSC HPSS tape archive = 6 PetaBytes
All worldwide information in one year = 2 ExaBytes

* Rough/average estimates
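To make the scale comparison concrete, the following is a minimal Python sketch that converts the slide's rough estimates to bytes and compares them. It assumes decimal (powers-of-ten) prefixes, as listed on the slide; the dictionary keys and the helper names `to_bytes` and `ratio` are illustrative, not part of the original presentation.

```python
# Decimal prefixes as listed on the slide: Kilo = 10^3 ... Exa = 10^18.
PREFIX_BYTES = {
    "KB": 10**3,
    "MB": 10**6,
    "GB": 10**9,
    "TB": 10**12,
    "PB": 10**15,
    "EB": 10**18,
}


def to_bytes(value, unit):
    """Convert a (value, unit) pair such as (10, "TB") to a byte count."""
    return value * PREFIX_BYTES[unit]


# Rough/average estimates copied from the slide (key names are illustrative).
EXAMPLES = {
    "low_res_photo": to_bytes(100, "KB"),
    "novel": to_bytes(1, "MB"),
    "ipod_shuffle": to_bytes(512, "MB"),
    "library_of_congress_print": to_bytes(10, "TB"),
    "human_brain_micron": to_bytes(1, "PB"),
    "sdsc_hpss_archive": to_bytes(6, "PB"),
    "worldwide_info_per_year": to_bytes(2, "EB"),
}


def ratio(larger, smaller):
    """How many whole copies of `smaller` fit into `larger`."""
    return EXAMPLES[larger] // EXAMPLES[smaller]


# Novels per Library of Congress print collection: 10 TB / 1 MB.
print(ratio("library_of_congress_print", "novel"))              # 10000000
# Library of Congress print collections per SDSC tape archive: 6 PB / 10 TB.
print(ratio("sdsc_hpss_archive", "library_of_congress_print"))  # 600
```

Because the slide's figures are rough estimates, these ratios are order-of-magnitude guides only.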

Cyberinfrastructure and Data: Using Data for Analysis and Simulation


Cyberinfrastructure-enabled Disaster Preparedness

• The SCEC TeraShake simulation is the result of more than 10 years of immense effort by the geoscience community
• Focus is on understanding big earthquakes and how they will impact sediment-filled basins
• Simulation combines massive amounts of data, high-resolution models, and large-scale supercomputer runs

How dangerous is the southern San Andreas Fault?

Major Earthquakes on the San Andreas Fault, 1680-present

1680 M 7.7
1857 M 7.8
1906 M 7.8
?

• TeraShake results provide new information enabling better
  • Estimation of seismic risk
  • Emergency preparation, response and planning
  • Design of next generation of earthquake-resistant structures
• Such simulations offer potentially immense benefits, saving many lives and billions in economic losses

Domain: 600 km x 300 km x 80 km
Mesh Dimension: 3000 x 1500 x 400
Spatial resolution = 200 m
Simulated time = 200 s
Number of time steps = 20,000
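The parameters above are mutually consistent, which a short calculation can confirm. The sketch below assumes a uniform mesh (one cell per 200 m along each axis) and evenly spaced time steps; neither assumption is stated on the slide, and the function and variable names are illustrative.

```python
# Parameters copied from the slide.
DOMAIN_KM = (600, 300, 80)   # domain extent along each axis, in km
SPACING_M = 200              # spatial resolution, in meters
SIMULATED_TIME_S = 200       # simulated time, in seconds
TIME_STEPS = 20_000          # number of time steps


def mesh_dimensions(domain_km, spacing_m):
    """Cells along each axis, assuming a uniform mesh (an assumption)."""
    return tuple(extent * 1000 // spacing_m for extent in domain_km)


mesh = mesh_dimensions(DOMAIN_KM, SPACING_M)
grid_points = mesh[0] * mesh[1] * mesh[2]
time_step_s = SIMULATED_TIME_S / TIME_STEPS   # assumes evenly spaced steps
point_updates = grid_points * TIME_STEPS      # grid-point updates in one run

print(mesh)           # (3000, 1500, 400)
print(grid_points)    # 1800000000
print(time_step_s)    # 0.01
print(point_updates)  # 36000000000000
```

Under these assumptions the run covers 1.8 billion mesh cells at a 0.01 s time step, or about 3.6 x 10^13 cell updates per simulation.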

What you’re looking at:

• L.A. experiences strong ground motion from the S->N scenario
• The N->S rupture generates strong reverberations in the Imperial Valley, ultimately hitting Mexicali and other northern Mexico cities
• Large local peaks in ground motion near Palm Springs, resulting in immense damage

Making TeraShake Work -- Resources

Computers and Systems

• 80,000 hours on 240 processors of DataStar
• 256 GB memory p690 used for testing, p655s used for production run, TG used for porting
• 30 TB global parallel file system (GPFS)
• Run-time 100 MB/s data transfer from GPFS to SAM-QFS
• 27,000 hours post-processing for high resolution rendering

People

• 20+ people involved in information technology support
• 20+ people involved in geoscience modeling and simulation

Data Storage

• 47 TB archival tape storage on Sun StorEdge SAM-QFS
• 47 TB backup on High Performance Storage System (HPSS)
• SRB Collection with 1,000,000 files

Funding

• SDSC Cyberinfrastructure resources for TeraShake funded by NSF
• Southern California Earthquake Center is an NSF-funded geoscience research and development center
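As a rough sense of scale for the data movement involved, the sketch below estimates how long it would take to move the 47 TB archive at the quoted 100 MB/s. It assumes decimal units and a perfectly sustained rate, and it assumes the full 47 TB moves at that rate; the slide does not state any of this, so the result is an illustration only. The function name is hypothetical.

```python
SECONDS_PER_DAY = 86_400


def transfer_time_seconds(size_tb, rate_mb_per_s):
    """Idealized transfer time, assuming decimal units and a sustained rate."""
    size_bytes = size_tb * 10**12
    rate_bytes_per_s = rate_mb_per_s * 10**6
    return size_bytes / rate_bytes_per_s


# 47 TB (archive size from the slide) at 100 MB/s (GPFS to SAM-QFS rate).
seconds = transfer_time_seconds(47, 100)
days = seconds / SECONDS_PER_DAY

print(seconds)         # 470000.0
print(round(days, 2))  # 5.44
```

Even under these idealized assumptions, moving the archive once takes more than five days, which is why storage and transfer resources appear alongside compute hours in the resource list.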

Cyberinfrastructure and Data: Preserving our Scientific and Cultural Heritage


Data Preservation

• Many Science, Cultural, and Official Collections must be sustained for the foreseeable future
• Critical collections must be preserved:
  • community reference data collections (e.g. Protein Data Bank)
  • irreplaceable collections (e.g. Shoah collection)
  • longitudinal data (e.g. PSID – Panel Study of Income Dynamics)
• No plan for preservation often means that data is lost or damaged

“… the progress of science and useful arts … depends on the reliable preservation of knowledge and information for generations to come.”

“Preserving Our Digital Heritage”, Library of Congress

Key Challenges for Digital Preservation

What should we preserve?

• What materials must be “rescued”?

• How to plan for preservation of materials by design?

How should we preserve it?

• Formats
• Storage media
• Stewardship – who is responsible?

Who should pay for preservation?

• The content generators?

• The government?

• The users?

Who should have access?

Print media provides easy access for long periods of time but is hard to data-mine

Digital media is easier to data-mine but requires management of evolution of media and resource planning over time

Planning Ahead for Preservation

Comprehensive approach to infrastructure for long-term preservation requires the integration of

• Collection ingestion
• Access and Services
• Research and development for new functionality and adaptation to evolving technologies
• Business model, data policies, and management issues critical to success of the infrastructure

[Figure labels: Ingestion; Services; Policy; R&D; Consortium]

Cyberinfrastructure Resources at SDSC


SDSC Data Central

• First program of its kind to support research and community data collections and databases
• Comprehensive resources
  • Disk: 400 TB accessible via HPC systems, Web, SRB, GridFTP
  • Databases: DB2, Oracle, MySQL
  • SRB: Collection management
  • Tape: 6 PB, accessible via file system, HPSS, Web, SRB, GridFTP
• Data collection and database hosting
• Batch oriented access
• Collection management services
• Collaboration opportunities:
  • Long-term preservation
  • Data technologies and tools

New Allocated Data Collections include

• Bee Behavior (Behavioral Science)
• C5 Landscape DB (Art)
• Molecular Recognition Database (Pharmaceutical Sciences)
• LIDAR (Geoscience)
• LUSciD (Astronomy)
• NEXRAD-IOWA (Earth Science)
• AMANDA (Physics)
• SIO_Explorer (Oceanography)
• Tsunami and Landsat Data (Earthquake Engineering)
• UC Merced Library Japanese Art Collection (Art)
• Terabridge (Structural Engineering)

[email protected]

SDSC Academic Associates Program Targets Enabling Cyberinfrastructure Collaborations

SDSC/UC Academic Associates Program: Cyberinfrastructure and “Seeding” Activities

• Targeted workshops
• Priority SW installation and support
• Priority participation in Summer Institute for Cyberinfrastructure
• Focused assistance with developing successful proposals for national allocation programs
• Targeted user services
• Special UC compute and data allocations
• Priority for “early usage” of new national resources

SDSC Cyberinfrastructure Resources Heavily Used by UC faculty and students

• UC PIs account for 329+ trillion bytes of data stored at SDSC
• In FY05, over 5 million CPU hours on HPC machines at SDSC were used by UC faculty and students at all campuses
• UCSD faculty make up 40% of the top users of SDSC compute resources

Cyberinfrastructure is Fundamental for California

• Cyberinfrastructure captures the practice and potential of modern science and engineering
• Cyberinfrastructure is the focus of an increasing number of federal programs
  • NSF (all directorates), NIH (BISTI, Bioinformatics, Computational Biology, etc.), DOE (Science Grid), etc.
• Cyberinfrastructure is critical for success in modern research and education initiatives
  • Stem cell research
  • Grid computing
  • Multi-disciplinary science and engineering

Leadership in Cyberinfrastructure provides a competitive edge to California researchers, educators, practitioners, and business leaders

Thank You

[email protected]

www.sdsc.edu
