EGEE and the future of Grid Infrastructures


Enabling Grids for E-sciencE
International Symposium on Grid Computing 2007
Academia Sinica, Taipei
26-29 March 2007
Bob Jones
EGEE-II Project Director
CERN
www.eu-egee.org
INFSO-RI-508833
eScience
• Science is becoming increasingly digital and must deal with
growing volumes of data and computational demand
• Simulations get ever more detailed
– Nanotechnology – design of new materials from
the molecular scale
– Modelling and predicting complex systems
(weather forecasting, river floods, earthquake)
– Decoding the human genome
• Experimental Science uses ever more
sophisticated sensors to make precise
measurements
▪ Need high statistics
▪ Huge amounts of data
▪ Serves user communities around the world
ISGC2007
High Energy Physics
Large Hadron Collider (LHC):
• One of the most powerful instruments ever
built to investigate matter
• 40 Million Particle collisions per second
• 4 Experiments: ALICE, ATLAS, CMS, LHCb
• ~15 PetaBytes/year from the 4 experiments
• First beams in 2007
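To put the ~15 PetaBytes/year figure in perspective, it can be converted to an average sustained data rate. The sketch below is a back-of-the-envelope calculation only: it assumes decimal units (1 PB = 10^15 bytes) and data spread evenly over a 365-day calendar year, neither of which the slide states, and the function name is purely illustrative.

```python
# Rough conversion of a yearly data volume into an average sustained rate.
# Assumptions (not from the slide): 1 PB = 10**15 bytes, uniform rate over 365 days.
PETABYTE = 10**15
SECONDS_PER_YEAR = 365 * 24 * 3600


def average_rate_mb_per_s(petabytes_per_year: float) -> float:
    """Return the average data rate in MB/s (1 MB = 10**6 bytes)."""
    return petabytes_per_year * PETABYTE / SECONDS_PER_YEAR / 10**6


lhc_rate = average_rate_mb_per_s(15)          # ~15 PB/year from the 4 experiments
lhc_tb_per_day = 15 * 1000 / 365              # the same volume expressed in TB/day

print(f"Average rate: {lhc_rate:.0f} MB/s, about {lhc_tb_per_day:.0f} TB/day")
```

Real accelerator running is not uniform over the year, so peak rates would be higher than this average.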
HEP track today
Mont Blanc
(4810 m)
Downtown Geneva
In silico drug discovery
• Diseases such as HIV/AIDS, SARS and bird flu are a threat to public
health because of worldwide travel and the movement of people
• Grids open new perspectives for in silico drug discovery
– Reduced cost and a faster search for new drugs
International collaboration
is required for:
• Early detection
• Epidemiological watch
• Prevention
• Search for new drugs
• Search for vaccines
• Avian influenza: bird casualties
Presentation by Ying-Ta WU & Hurng-Chun LEE
in life sciences track on Wednesday
WISDOM
http://wisdom.healthgrid.org/
Mini Workshop on Thursday
Medical image processing: analysing tumours
• Pharmacokinetics: contrast agent diffusion study
– co-registration of a time series of volumetric medical images to analyse the
evolution of the diffusion of contrast agents
• Computational Costs
– 20 Patients: 2623 hours (Co-registration + Parametric Image)
– Using a 20-processor Computing Farm: 146 hours
– Using the Grid: <20 hours
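The computational costs above imply a speedup and a parallel efficiency that are easy to work out. The following is a minimal sketch using only the three figures quoted on the slide; the helper names are illustrative, and because the grid time is given only as an upper bound ("<20 hours"), the grid speedup it yields is a lower bound.

```python
# Wall-clock figures quoted on the slide (hours, 20 patients).
SEQUENTIAL_HOURS = 2623.0
FARM_HOURS = 146.0
FARM_PROCESSORS = 20
GRID_HOURS_UPPER_BOUND = 20.0  # the slide only says "<20 hours"


def speedup(baseline_hours: float, parallel_hours: float) -> float:
    """Speedup relative to the sequential run."""
    return baseline_hours / parallel_hours


def efficiency(baseline_hours: float, parallel_hours: float, processors: int) -> float:
    """Parallel efficiency: speedup divided by the number of processors used."""
    return speedup(baseline_hours, parallel_hours) / processors


farm_speedup = speedup(SEQUENTIAL_HOURS, FARM_HOURS)
farm_efficiency = efficiency(SEQUENTIAL_HOURS, FARM_HOURS, FARM_PROCESSORS)
grid_min_speedup = speedup(SEQUENTIAL_HOURS, GRID_HOURS_UPPER_BOUND)

print(f"Farm: {farm_speedup:.1f}x speedup, {farm_efficiency:.0%} efficiency")
print(f"Grid: more than {grid_min_speedup:.0f}x speedup")
```

The farm figure is close to linear scaling on 20 processors, which suggests the per-patient work is largely independent and therefore well suited to being spread over many grid nodes.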
[Chart: processing time for sequential, HPC and Grid execution]
If you have enough resources: 20 x 12 = 240 computers; EGEE has >30,000
Example: Determining
earthquake mechanisms
• Seismic software application determines epicentre,
magnitude, mechanism
• Analysis of Indonesian earthquake
(28 March 2005)
– Seismic data within 12 hours after the earthquake
– Analysis performed within 30 hours after earthquake occurred
– Results
▪ Not an aftershock of December 2004 earthquake
▪ Different location (different part of fault line further south)
▪ Different mechanism
▪ Rapid analysis of earthquakes important for relief efforts
Earth Science & Astronomy track today
Peru, June 23, 2001 (Mw=8.4)
Sumatra, March 28, 2005 (Mw=8.5)
Bioinformatics
GPS@: bioinformatics portal
– http://gpsa.ibcp.fr/ web portal
– Access up-to-date sequence
and 3D-structure databanks
(EMBL, GenBank, SWISSPROT etc.)
– Tens of legacy bioinformatics
codes
• Convenient easy-to-use
interface with access to
well-known databanks
• Uses grid resources to
analyse the sequences
Data, Data, Data
Slide by Carole Goble
Main trend
The size of data an organization owns, manages,
and depends on is dramatically increasing:
– Ownership cost of storage capacity goes down
– Data generated and consumed goes up
– Network capacity goes up
– Distributed computing technology matures and is
more widely adopted
How e-Infrastructures help e-Science
• e-Infrastructures provide easier access for
– Small research groups
– Scientists from many different fields
– Remote and still-developing countries
• To new technologies
– Produce and store massive amounts of data
– Transparent access to millions of files across different administrative domains
– Low-cost access to resources
▪ Mobilise large amounts of CPU & storage on short notice (PC clusters)
– High-end facilities (supercomputers)
• And help to find new ways to collaborate
– Develop applications using distributed complex workflows
– Ease distributed collaborations
– Provide new ways of community building
– Give easier access to higher education
EGEE
Flagship grid infrastructure project co-funded by the European Commission
Now in 2nd phase with 91 partners in 32 countries
Objectives
• Large-scale, production-quality
grid infrastructure for e-Science
• Attracting new resources and
users from industry as well as
science
• Maintain and further improve
gLite Grid middleware
Applications on EGEE
• Multitude of applications from a growing
number of domains
– Astrophysics
– Computational Chemistry
– Earth Sciences
– Financial Simulation
– Fusion
– Geophysics
– High Energy Physics
– Life Sciences
– Multimedia
– Material Sciences
– …
Keynote by Luigi Fusco on Wednesday
Book of abstracts: http://doc.cern.ch//archive/electronic/egee/tr/egee-tr-2006-005.pdf
Production Usage Status
[Chart: No. Sites, April 2004 to December 2006 (y-axis 0 to 250)]
~17.5 million jobs run (6450 cpu-years) in 2006
Workloads of the non-HEP VOs are now significant: approaching 8-10K jobs per day and 1000 cpu-months/month
• one year ago this was the overall scale of work for all VOs
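The 2006 totals quoted above (~17.5 million jobs, 6450 cpu-years) can be turned into per-day and per-job averages. This is a rough sketch that assumes a 365-day year and that a cpu-year means one CPU busy for a full year; the variable names are illustrative and the results are averages only, saying nothing about how job lengths are distributed.

```python
# Totals quoted on the slide for 2006.
JOBS_2006 = 17.5e6
CPU_YEARS_2006 = 6450

# Assumptions (not from the slide): 365-day year, 1 cpu-year = 365 * 24 cpu-hours.
DAYS_PER_YEAR = 365
HOURS_PER_YEAR = DAYS_PER_YEAR * 24

jobs_per_day = JOBS_2006 / DAYS_PER_YEAR
cpu_hours_per_job = CPU_YEARS_2006 * HOURS_PER_YEAR / JOBS_2006
average_busy_cpus = CPU_YEARS_2006  # cpu-years consumed per calendar year

print(f"Average jobs per day: {jobs_per_day:,.0f}")
print(f"Average CPU time per job: {cpu_hours_per_job:.1f} hours")
print(f"Average number of CPUs kept busy: {average_busy_cpus}")
```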
[Chart: No. CPU, April 2004 to December 2006 (y-axis 0 to 40,000)]
Grid operations & management track on Thursday
EGEE Middleware Distribution
• gLite
– Exploit experience and existing
components from VDT (Condor,
Globus), EDG/LCG, and others
– Develop a lightweight stack of
generic middleware useful to EGEE applications (HEP and Life
Sciences are pilot applications)
▪ Pluggable components: cater for different implementations
▪ Follow SOA approach, WS-I compliant where possible
– Focus is on re-engineering and hardening
– Business-friendly open source license
▪ Moving to Apache-2
Tutorial held yesterday
Grid Middleware
[Diagram: middleware layers, top to bottom]
Applications
Higher-Level Grid Services: Workload Management, Replica Management, Visualization, Workflow, Grid Economies, ...
Foundation Grid Middleware: Security model and infrastructure, Computing (CE) and Storage Elements (SE), Accounting, Information and Monitoring
• Applications have access
both to Higher-level Grid
Services and to Foundation
Grid Middleware
• Higher-Level Grid Services
are intended to help users
build their computing
infrastructure but should not
be mandatory
• Foundation Grid Middleware
will be deployed on the EGEE
infrastructure
– Must be complete and robust
– Should allow interoperation
with other major grid
infrastructures
– Should not assume the use of
Higher-Level Grid Services
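The layering rule described above (applications may use either layer, and the foundation layer must not depend on higher-level services) can be illustrated with a toy model. Everything in this sketch is hypothetical: `ComputingElement`, `WorkloadManager` and the job identifiers are invented for illustration and are not the gLite API.

```python
from dataclasses import dataclass, field


@dataclass
class ComputingElement:
    """Foundation layer (toy model): accepts jobs with no knowledge of any broker."""
    name: str
    queued: list = field(default_factory=list)

    def submit(self, job: str) -> str:
        self.queued.append(job)
        return f"{self.name}:{len(self.queued)}"


class WorkloadManager:
    """Higher-level service (toy model): optional, built only on the foundation API."""

    def __init__(self, elements):
        self.elements = list(elements)

    def submit(self, job: str) -> str:
        # Pick the computing element with the shortest queue.
        target = min(self.elements, key=lambda ce: len(ce.queued))
        return target.submit(job)


ce_a = ComputingElement("ce-a")
ce_b = ComputingElement("ce-b")

# An application may talk to the foundation layer directly...
direct_id = ce_a.submit("analysis-1")

# ...or go through the higher-level service, which is not mandatory.
broker = WorkloadManager([ce_a, ce_b])
brokered_id = broker.submit("analysis-2")

print(direct_id, brokered_id)
```

The dependency points one way only: the broker calls the computing element, never the reverse, so removing the higher-level service leaves the foundation layer fully usable.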
gLite Grid Middleware Services
[Diagram: gLite service groups]
Access: CLI, API
Security: Authorization, Auditing, Authentication
Information & Monitoring: Information & Monitoring, Application Monitoring
Data Management: Metadata Catalog, File & Replica Catalog, Storage Element, Data Movement
Workload Management: Accounting, Job Provenance, Package Manager, Site Proxy, Computing Element, Workload Management
Overview paper http://doc.cern.ch//archive/electronic/egee/tr/egee-tr-2006-001.pdf
Grid of Grids - from Local to Global
[Diagram labels: National, Campus, Community]
OSG sites
Keynote by Ruth Pordes on Wednesday
32 Virtual Organizations (participating groups)
• 3 with >1000 jobs max. (all particle physics)
• 3 with 500-1000 max. (all outside physics)
• 5 with 100-500 max. (particle, nuclear, and astrophysics)
The DEISA supercomputing environment
(21,900 processors and 145 Tf in 2006, more than 190 Tf in 2007)
• IBM AIX Super-cluster
– FZJ-Jülich, 1312 processors, 8.9 teraflops peak
– RZG – Garching, 748 processors, 3.8 teraflops peak
– IDRIS, 1024 processors, 6.7 teraflops peak
– CINECA, 512 processors, 2.6 teraflops peak
– CSC, 512 processors, 2.6 teraflops peak
– ECMWF, 2 systems of 2276 processors each, 33 teraflops peak
– HPCx, 1600 processors, 12 teraflops peak
• BSC, IBM PowerPC Linux system (MareNostrum), 4864 processors, 40 teraflops peak
• SARA, SGI ALTIX Linux system, 416 processors, 2.2 teraflops peak
• LRZ, Linux cluster (2.7 teraflops) moving to SGI ALTIX system (5120 processors and 33 teraflops peak in 2006, 70 teraflops peak in 2007)
• HLRS, NEC SX8 vector system, 646 processors, 12.7 teraflops peak
• Systems interconnected with a dedicated 1 Gb/s network, currently being upgraded to 10 Gb/s, provided by GEANT and NRENs
V. Alessandrini, IDRIS-CNRS; EGEE Workshop on Management of Rights in Production Grids, Paris, June 19th, 2006
National Research Grid Infrastructure
(NAREGI) 2003-2007
• Petascale Grid Infrastructure R&D for Future Deployment
– $45 mil (US) + $16 mil x 5 (2003-2007) = $125 mil total
– Hosted by National Institute of Informatics (NII) and Institute of Molecular Science (IMS)
– PL: Ken Miura (Fujitsu → NII)
• Sekiguchi (AIST), Matsuoka (Titech), Shimojo (Osaka-U), Aoyagi (Kyushu-U)…
– Participation by multiple (>= 3) vendors: Fujitsu, NEC, Hitachi, NTT, etc.
– Follow and contribute to GGF standardization, esp. OGSA
Keynote by Satoshi Matsuoka on Thursday
[Diagram: focused “Grand Challenge” grid application areas (Nanotech Grid Apps: “NanoGrid”, IMS, ~10 TF; Biotech Grid Apps: BioGrid, RIKEN; other apps: other institutions); National Research Grid Middleware R&D (grid middleware; grid and network management); grid R&D infrastructure, 15 TF-100 TF; SuperSINET; participants shown: Fujitsu, NEC, Hitachi, Osaka-U, Titech, U-Tokyo, U-Kyushu, AIST]
Interoperability
• Interoperability between
e-Infrastructures is essential to
provide services to global user
communities
• “Grid-Interoperability-Now” (GIN) group
within the Open Grid Forum is
providing a good environment for
practical developments
Middleware & interoperability track on Wednesday & Thursday
• Experience shows this work is most
successful when it is driven by the
needs of user communities
Collaborating e-Infrastructures
TWGRID
Potential for linking ~80 countries
Middleware Standards
Slide by Dave Snelling
Middleware Concepts
Slide by Dave Snelling
Co-located with OGF 20
www.eu-egee.org
The Future of Grids
• Increasing the number of infrastructure users by increasing awareness
– Dissemination and outreach
– Training and education
– Grids offer new opportunities for collaborative work
Education track on Wednesday
• Increasing the number of applications by improving application support and middleware functionality
– Increase stability, scalability, and usability
▪ Major efforts needed particularly on VO management, security infrastructure, data management, and job management
– High-level grid middleware extensions
• Increasing the grid infrastructure
– Increase manageability of Grid services
– Reduce the cost of operation
– Ensure interoperability between infrastructures
• Protecting user investments
– Better involvement of industry
– Move towards a sustainable grid infrastructure
Industry & Government track today
Sustainability: Beyond EGEE-II
• Need to prepare for permanent Grid infrastructure
– Ensure reliable and adaptive support for all sciences
– Independent of short project funding cycles
– Infrastructure managed in collaboration
with national grid initiatives
Presentation by Dieter Kranzlmueller today
EGEE’07 Conference
Building Bridges…
• Between science and
business
• Between users and
infrastructures
• Between countries
• Between scientific
disciplines
• Between projects
• Etc
http://www.eu-egee.org/egee07
Summary
• Grids are all about sharing – they are a means of working with
groups around the world
– Today we have a window of opportunity to move grids from research
prototypes to permanent production systems (as networks did a
few years ago)
• Interoperability is key to providing the level of support required for
our user communities
• EGEE operates the world’s largest multi-disciplinary grid
infrastructure for scientific research
– In constant and significant production use
• Need to prepare for the long term
– EGEE, collaborating projects, national grid initiatives and user
communities are working to define a model for a sustainable grid
infrastructure that is independent of short project cycles
www.eu-egee.org