The Grid: The Future of HEP Computing?


LHC Scale Physics in 2008:
Grids, Networks and Petabytes
Shawn McKee ([email protected])
May 18th, 2005
Pan-American Advanced Studies Institute (PASI)
Mendoza, Argentina
Acknowledgements
• Much of this talk was constructed from
various sources. I would like to acknowledge:
– Ian Foster (U Chicago/ANL)
– Rob Gardner (U Chicago)
– Harvey Newman (Caltech)
– Paul Avery (U Florida)
– Alan Wilson (Michigan)
– The Globus Team
– The ATLAS Collaboration
– Trillium
Outline
• Large Datasets in High Energy Physics
– Overview of High Energy Physics and the LHC
– The ATLAS Experiment’s Data Model
• Managing LHC Scale Data
– Grids and Networks Computing Model
– Current Planning, Tools, Middleware and Projects
• LHC Scale Physics in 2008
• Grids and Networks at Michigan
• Virtual Data
• The Future of Data Intensive Science
Large Datasets in High Energy Physics
Introduction to High-Energy Physics
• Before I can talk in detail about large datasets I
want to provide a quick context for you to
understand where all this data comes from.
• High Energy physics explores the very small
constituents of nature by colliding “high energy”
particles and reconstructing the zoo of particles
which result.
• One of the most intriguing issues in High Energy
physics we are trying to address is the origin of
mass…
Physics with ATLAS:
The Higgs Particle
• The Riddle of Mass
• One of the main goals of the ATLAS program is to
discover and study the Higgs particle. The Higgs
particle is of critical importance in particle
theories and is directly related to the concept of
particle mass and therefore to all masses.
High-Energy: From an Electron-Volt
to Trillions of Electron-Volts
• Energies are often expressed in units of "electron-volts". An electron-volt (eV) is the energy acquired by an electron (or any particle with the
same charge) when it is accelerated by a potential difference of 1 volt.
• Typical energies involved in atomic processes (processes such as
chemical reactions or the emission of light) are of order a few eV. That
is why batteries typically produce about 1 volt, and have to be
connected in series to get much larger potentials.
• Energies in nuclear processes (like nuclear fission or radioactive
decay) are typically of order one million electron-volts (1 MeV).
• The highest energy accelerator now operating (at Fermilab) accelerates
protons to 1 million million electron volts (1 TeV = 10^12 eV).
• The Large Hadron Collider (LHC) at CERN will accelerate each of
two counter-rotating beams of protons to 7 TeV per proton.
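As a quick back-of-the-envelope check (my own arithmetic, not from the talk), these units can be converted to joules; a minimal Python sketch:

# Rough unit conversion (my own check, not from the original slides):
# 1 eV is about 1.602e-19 joules.
EV_IN_JOULES = 1.602e-19

def ev_to_joules(energy_ev):
    return energy_ev * EV_IN_JOULES

print(ev_to_joules(1.0))    # ~1.6e-19 J, a typical atomic-process energy
print(ev_to_joules(1e6))    # 1 MeV, a typical nuclear process
print(ev_to_joules(7e12))   # 7 TeV, one LHC beam proton: ~1.1e-6 J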
What is an Event?
ATLAS will measure the collisions of 7 TeV protons. Each proton-proton collision, or single-particle decay, is called an “event”.
• In the ATLAS detector there will be about a billion
collision events per second, a data rate equivalent
to twenty simultaneous telephone conversations
by every person on the earth.
How Many Collisions?
• If two bunches of protons meet head on, the number of collisions can be anything
from zero upwards. How often are there actually collisions?
– For a fixed bunch size, this depends on how many protons there are in
each bunch, and how large each proton is.
• A proton can be roughly thought of as being about 10^-15 meter in
radius. If you had bunches 10^-6 meters in radius, and only, say, 10
protons in each bunch, the chance of even one proton-proton collision
when two bunches met would be extremely small.
• If each bunch had a billion-billion (10^18) protons so that its entire cross
section were just filled with protons, every proton from one bunch
would collide with one from the other bunch, and you would have a
billion-billion collisions per bunch crossing.
• The LHC situation is in between these two extremes, a few collisions
(up to 20) per bunch crossing, which requires about a billion protons
in each bunch.
As you will see, this leads to a lot of data to sift through.
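A minimal sketch of the geometric argument above (my own toy calculation using the slide's round numbers; real LHC beam parameters differ):

import math

# Toy estimate of proton-proton collisions per bunch crossing, following
# the slide's reasoning; all inputs are the round numbers quoted above.
r_proton = 1e-15          # effective proton radius, meters
r_bunch = 1e-6            # bunch radius, meters
protons_per_bunch = 1e9   # "about a billion protons in each bunch"

sigma = math.pi * (2 * r_proton) ** 2    # crude geometric p-p cross-section
bunch_area = math.pi * r_bunch ** 2      # transverse area of the bunch

collisions = protons_per_bunch ** 2 * sigma / bunch_area
print(collisions)   # order of a few per crossing, as stated above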
The Large Hadron Collider (LHC)
CERN, Geneva: 2007 start; 27 km tunnel in Switzerland & France.
First beams: April 2007. Physics runs: from summer 2007.
Experiments on the ring: ATLAS (pp, general purpose; heavy ions), CMS (pp, general purpose; heavy ions), ALICE (heavy ions), LHCb (B-physics), TOTEM.
Data Comparison: LHC vs Prior Exp.
[Chart: Level-1 trigger rate (Hz) versus event size (bytes) for the LHC experiments (ATLAS, CMS, LHCb, ALICE) compared with earlier experiments (HERA-B, KLOE, Tevatron Run II CDF/D0, H1, ZEUS, NA49, UA1, LEP). The LHC experiments combine a high Level-1 trigger rate (up to ~1 MHz for LHCb), a high number of channels, high bandwidth (~500 Gbit/s) and petabyte-scale data archives. From Hans Hoffman, DOE/NSF Review, Nov 00.]
The ATLAS Experiment
ATLAS
• A Toroidal LHC ApparatuS
• Collaboration
– 150 institutes
– 1850 physicists
• Detector
– Inner tracker
– Calorimeter
– Magnet
– Muon
• United States ATLAS
– 29 universities, 3 national labs
– 20% of ATLAS
Data Flow from ATLAS
ATLAS: 10 PB/y
(simulated + raw+sum)
LHC Timeline for Service
Challenges
Apr05 – SC2 Complete
Jun05 – Technical Design Report
Jul05 – SC3 Throughput Test   (We are here … not much time to get things ready!)
Sep05 – SC3 Service Phase
Dec05 – Tier-1 Network operational
Apr06 – SC4 Throughput Test
May06 – SC4 Service Phase starts
Sep06 – Initial LHC Service in stable operation
Apr07 – LHC Service commissioned
[Timeline chart spanning 2005-2008: SC2, SC3 (setup/service), preparation, SC4, cosmics running, LHC Service Operation, first beams, first physics, full physics run.]
Managing LHC Scale Data
The Data Challenge for LHC
• There is a very real challenge in managing tens of petabytes of data yearly for a globally distributed collaboration of 2000 physicists!
• While much of the interesting data we seek is small in volume, we must understand and sort through a huge volume of relatively uninteresting “events” to discover new physics.
• The primary (only!) plan for LHC is to utilize Grid middleware and high-performance networks to harness the complete global resources of our collaborations to manage this data analysis challenge.
Managing LHC Scale Data
Grids and Networks Computing Model
The Problem
Petabytes…
The Solution
What is “The Grid”?
• There are many answers and interpretations
• The term was originally coined in the mid-1990s (in analogy with the power grid) and can be described thusly:
“The grid provides flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions and resources (virtual organizations: VOs)”
Grid Perspectives
• User's Viewpoint:
– A virtual computer which minimizes time to completion
for my application while transparently managing access
to inputs and resources
• Programmer's Viewpoint:
– A toolkit of applications and APIs which provide
transparent access to distributed resources
• Administrator's Viewpoint:
– An environment to monitor, manage and secure access
to geographically distributed computers, storage and
networks.
Data Grids for High Energy Physics
CERN/Outside resource ratio ~1:4; Tier0 : (Σ Tier1) : (Σ Tier2) ~ 1:2:2
[Diagram: event data leaves the online system at ~PByte/sec; the Tier 0+1 offline farm at the CERN Computer Centre (~25 TIPS) receives ~100-400 MBytes/sec; 10-40 Gbit/s links connect CERN to Tier 1 national centres (France, Italy, UK, BNL); Tier 2 regional centres connect at ~10+ Gbps; institutes (Tier 3, ~0.25 TIPS, each with a physics data cache) and physicists' workstations (Tier 4) connect at 100-10000 Mbit/s.]
Physicists work on analysis “channels”; each institute has ~10 physicists working on one or more channels.
ATLAS version from Harvey Newman's original.
Managing LHC Scale Data
Current Planning, Tools, Middleware and Testbeds
Grids and Networks: Why Now?
• Moore’s law improvements in computing
produce highly functional end systems
• The Internet and burgeoning wired and
wireless networks provide ~universal connectivity
• Changing modes of working and problem
solving emphasize teamwork, computation
• Network exponentials produce dramatic
changes in geometry and geography
Living in an Exponential World
(1) Computing & Sensors
Moore’s Law: transistor count doubles each ~18 months
[Image: magnetohydrodynamics simulation of star formation.]
Living in an Exponential World:
(2) Storage
• Storage density doubles every ~12 months
• This led to a dramatic growth in HEP online data
(1 petabyte = 1000 terabyte = 1,000,000 gigabyte)
– 2000: ~0.5 petabyte
– 2005: ~10 petabytes
– 2010: ~100 petabytes
– 2015: ~1000 petabytes
• It's transforming entire disciplines in physical and,
increasingly, biological sciences; humanities next?
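A quick consistency check on those projected volumes (my own arithmetic, not the slide's): the quoted numbers imply a doubling time in the 14-18 month range, roughly tracking the ~12-month doubling of raw storage density.

import math

# What doubling time do the quoted HEP data volumes imply?
volumes_pb = {2000: 0.5, 2005: 10, 2010: 100, 2015: 1000}
years = sorted(volumes_pb)

for y0, y1 in zip(years, years[1:]):
    factor = volumes_pb[y1] / volumes_pb[y0]
    doubling_months = 12 * (y1 - y0) / math.log2(factor)
    print(f"{y0}-{y1}: x{factor:g}, doubling every ~{doubling_months:.0f} months")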
Network Exponentials
• Network vs. computer performance
– Computer speed doubles every 18 months
– Network speed doubles every 9 months
– Difference = order of magnitude per 5 years
• 1986 to 2000
– Computers: x 500
– Networks: x 340,000
• 2001 to 2010
– Computers: x 60
– Networks: x 4000
[Graph: Moore’s Law vs. storage improvements vs. optical improvements. From Scientific American (Jan 2001) by Cleo Vilett; source Vinod Khosla, Kleiner, Caufield and Perkins.]
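The "order of magnitude per 5 years" claim above follows directly from the two doubling times; a trivial check (my own arithmetic):

# Gains over 5 years (60 months) from the quoted doubling times.
computer_gain = 2 ** (60 / 18)   # ~10x from 18-month doubling
network_gain = 2 ** (60 / 9)     # ~100x from 9-month doubling

print(round(computer_gain), round(network_gain))
print(round(network_gain / computer_gain), "x gap opening up every five years")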
The Network
• As can be seen in the previous transparency, it can be
argued that it is the evolution of the network which has been
the primary motivator for the Grid.
• Ubiquitous, dependable worldwide networks have
opened up the possibility of tying together
geographically distributed resources
• The success of the WWW for sharing information has
spawned a push for a system to share resources
• The network has become the “virtual bus” of a
virtual computer.
• More on this later…
What Is Needed for LHC-HEP?
• We require a number of high-level capabilities to do High-Energy Physics:
– Data Processing: All data needs to be reconstructed, first into
fundamental components like tracks and energy deposition and
then into “physics” objects like electrons, muons, hadrons,
neutrinos, etc.
• Raw -> Reconstructed -> Summarized
• Simulation, same path. Critical to understanding our detectors and
the underlying physics.
– Data Discovery: We must be able to locate events of interest
– Data Movement: We must be able to move discovered data as
needed for analysis or reprocessing
– Data Analysis: We must be able to apply our analysis to the data to
determine whether it contains the physics we seek
– Collaborative Tools: Vital to maintain our global collaborations
– Policy and Resource Management: Allow resource owners to
specify conditions under which they will share and allow them to
manage those resources as they evolve
Monitoring Example on OSG-ITB
Collaborative Tools Example: EVO
Managing LHC Scale Data
HEP Related Grid/Network Projects
The Evolution of Data Movement
• The recent history of data movement
capabilities exemplifies the evolution of
network capacity.
• NSFNet started with a 56Kbit modem link
as the US network backbone
• Current networks are so fast that end
systems are only able to fully drive them
when storage clusters are used at each end
NSFNET 56 Kb/s Site
Architecture
Bandwidth in terms of burst data transfer and user wait time (VAX and Fuzzball era), moving 1024 MB:
– Across the room (4 MB/s): 256 s (4 min)
– Across the country (1 MB/s): 1024 s (17 min)
– Across the country over the 56 Kb/s backbone (.007 MB/s): 150,000 s (41 hrs)
2002 Cluster-WAN Architecture
OC-48 cloud, OC-12 access, n x GbE (small n). Moving 1 TB:
– Across the room (0.5 GB/s): 2000 s (33 min)
– Across the country (78 MB/s): 13k s (3.6 h)
Distributed Terascale Cluster
OC-192 big fast interconnect, n x GbE (large n). Moving 10 TB between two clusters:
– 2000 s (33 min) at 5 GB/s* (wire-speed limit… not yet achieved)
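The wait times on the last three slides all come from the same size/rate arithmetic; a small sketch reproducing them (my own restatement using the quoted sizes and rates, taking 1 TB as 10^6 MB):

# Transfer time = data size / sustained rate, for the three snapshots above.
scenarios = [
    ("NSFNET era, 1024 MB", 1024, {"across the room (4 MB/s)": 4.0,
                                   "across the country (1 MB/s)": 1.0,
                                   "56 Kb/s backbone (0.007 MB/s)": 0.007}),
    ("2002 cluster-WAN, 1 TB", 1_000_000, {"across the room (0.5 GB/s)": 500.0,
                                           "across the country (78 MB/s)": 78.0}),
    ("terascale cluster, 10 TB", 10_000_000, {"wire-speed OC-192 (5 GB/s)": 5000.0}),
]

for label, size_mb, rates in scenarios:
    for path, mb_per_s in rates.items():
        seconds = size_mb / mb_per_s
        print(f"{label}, {path}: {seconds:,.0f} s ({seconds / 3600:.1f} h)")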
UltraLight Goal (Near Future)
• A more modest goal in terms of bandwidth achieved is
being targeted by the UltraLight collaboration.
• Build, tune and deploy moderately priced servers capable
of delivering 1 GB/s between 2 such servers over the WAN
• Provides the ability to utilize the full capability of
lambdas, as available, without requiring 10s-100s of nodes
at each end.
– Easier to manage, coordinate and deploy a smaller number of
performant servers than a much larger number of less capable ones
• Easier to scale up as needed to match the available
bandwidth
What is UltraLight?
• UltraLight is a program to explore the integration of cutting-edge
network technology with the grid computing and data infrastructure of
HEP/Astronomy
• The program intends to explore network configurations ranging from common
shared infrastructure (current IP networks) through dedicated point-to-point
optical paths.
• A critical aspect of UltraLight is its integration with two driving
application domains in support of their national and international
eScience collaborations: LHC-HEP and eVLBI-Astronomy
• The Collaboration includes:
– Caltech
– Florida Int. Univ.
– MIT
– Univ. of Florida
– Univ. of Michigan
– UC Riverside
– BNL
– FNAL
– SLAC
– UCAID/Internet2
UltraLight Network: PHASE I
• Implementation
via “sharing” with
HOPI/NLR
• MIT not yet
“optically”
coupled
UltraLight Network: PHASE III
By 2008
• Move into production
– Terabyte datasets in
10 minutes
• Optical switching
fully enabled amongst
primary sites
• Integrated
international
infrastructure
LHC Scale Physics in 2008
ATLAS Discovery Potential for
SM Higgs Boson
• Good sensitivity over the full mass range from ~100 GeV to ~1 TeV
[Plot: expected ATLAS signal significance (S/√B) as a function of the Higgs boson mass.]
• For most of the mass range at least two channels available
• Detector performance is crucial: b-tag, leptons, γ, E resolution, γ/jet separation, ...
[ATLAS event displays for the H → γγ and H → ZZ* → e+e− μ+μ− channels.]
Data Intensive
Computing and Grids
• The term “Data Grid” is often used
– Unfortunate as it implies a distinct infrastructure,
which it isn’t; but easy to say
• Data-intensive computing shares numerous
requirements with collaboration,
instrumentation, computation, …
– Security, resource mgt, info services, etc.
• Important to exploit commonalities as very
unlikely that multiple infrastructures can be
maintained
• Fortunately this seems easy to do!
A Model Architecture for Data Grids
[Diagram: an application presents an attribute specification to a metadata catalog, which returns a logical collection and logical file names; a replica catalog maps these to multiple physical locations; replica selection, informed by performance information and predictions from MDS/NWS, picks the selected replica; GridFTP control and data channels then move the data between storage systems (disk caches, disk arrays, tape libraries) at the replica locations.]
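A minimal pseudocode sketch of that flow (my own paraphrase of the diagram; the object and method names are illustrative, not real Globus or MDS/NWS APIs):

def fetch_dataset(attribute_query, metadata_catalog, replica_catalog,
                  performance_service, gridftp):
    # 1. Attribute specification -> logical collection and logical file names
    logical_files = metadata_catalog.lookup(attribute_query)

    local_copies = []
    for lfn in logical_files:
        # 2. Logical file name -> all physical replica locations
        replicas = replica_catalog.locate(lfn)

        # 3. Replica selection, guided by performance information and
        #    predictions (the MDS/NWS role in the diagram)
        best = min(replicas, key=performance_service.predicted_transfer_time)

        # 4. Move the data over GridFTP control/data channels
        local_copies.append(gridftp.copy_to_local(best))

    return local_copies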
Examples of
Desired Data Grid Functionality
• High-speed, reliable access to remote data
• Automated discovery of “best” copy of data
• Manage replication to improve performance
• Co-schedule compute, storage, network
• “Transparency” wrt delivered performance
• Enforce access control on data
• Allow representation of “global” resource allocation policies
• Not there yet! Back to the physics…
Needles in LARGE Haystacks
• When protons collide, some events are "interesting" and
may tell us about exciting new particles or forces, whereas
many others are "ordinary" collisions (often called
"background"). The ratio of their relative rates is about 1
interesting event for 10 million background events. One
of our key needs is to separate the interesting events from
the ordinary ones.
• Furthermore the information must be sufficiently detailed
and precise to allow eventual recognition of certain
"events" that may only occur at the rate of one in one
million-million collisions (10^-12), a very small fraction of
the recorded events, which are a very small fraction of all
events.
• I will outline the steps ATLAS takes in getting to these
interesting particles
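To put those ratios together (my own arithmetic, combining the ~10^9 collisions per second quoted earlier with the 10^-12 selectivity above):

# How often does a one-in-a-million-million process actually occur?
collision_rate_hz = 1e9    # ~a billion collision events per second (earlier slide)
signal_fraction = 1e-12    # "one in one million-million collisions"

seconds_between_signals = 1.0 / (collision_rate_hz * signal_fraction)
print(f"{seconds_between_signals:.0f} s, i.e. about "
      f"{seconds_between_signals / 60:.0f} minutes between such events")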
HEP Data Analysis
• Raw data
– hits, pulse heights
• Reconstructed data (ESD)
– tracks, clusters…
• Analysis Objects (AOD)
– Physics Objects
– Summarized
– Organized by physics topic
• Ntuples, histograms,
statistical data
Production and Analysis Data Flow
[Diagram: physics models, the trigger system and run conditions drive two parallel chains. The real-data chain runs data acquisition and the level-3 trigger, producing raw data and trigger tags, which reconstruction (using calibration data) turns into Event Summary Data (ESD) and event tags. The simulation chain runs detector simulation on Monte Carlo truth data, producing MC raw data, which reconstruction turns into MC Event Summary Data and MC event tags. Coordination is required at the collaboration and group levels.]
Physics Analysis
[Diagram: ESD sets and event tags feed collaboration-wide event selection at Tier 0/1, with access to calibration data and raw data; analysis processing by analysis groups at Tier 2 produces analysis objects (PhysicsObjects); physicists at Tiers 3 and 4 run physics analysis on these, producing further PhysicsObjects and StatObjects.]
LHC pp Running: Data Sizes
Experiment   SIM      SIM ESD   RAW      Trigger   ESD      AOD      TAG
ALICE        400KB    40KB      1MB      100Hz     200KB    50KB     10KB
ATLAS        2MB      500KB     1.6MB    200Hz     500KB    100KB    1KB
CMS          2MB      400KB     1.5MB    150Hz     250KB    50KB     10KB
LHCb         400KB    -         25KB     2KHz      75KB     25KB     1KB
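As a rough cross-check against the ~10 PB/year ATLAS figure quoted earlier (my own arithmetic; it assumes a canonical ~10^7 seconds of running per year and counts only RAW data, with ESD, AOD and simulation adding substantially more):

# Annual RAW data volume = event size x trigger rate x live seconds per year.
seconds_per_year = 1e7
experiments = {            # (RAW event size in MB, trigger rate in Hz)
    "ALICE (pp)": (1.0, 100),
    "ATLAS": (1.6, 200),
    "CMS": (1.5, 150),
    "LHCb": (0.025, 2000),
}

for name, (raw_mb, rate_hz) in experiments.items():
    petabytes = raw_mb * rate_hz * seconds_per_year / 1e9   # MB -> PB
    print(f"{name}: ~{petabytes:.1f} PB/year of RAW data")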
Data Flow Analysis by V. Lindenstruth
Data Estimates From LHC
Data sizes from
the LHC along
with some
estimates about
the tiered
resources
envisioned
Example of (Simulated) Data Sizes
• In advance of getting real data we have very
sophisticated simulation codes which attempt to
model collisions of particles and the
corresponding response of the ATLAS detector.
• These simulations are critical to understanding our
detector design and our analysis codes
• The next slide will show some information about
how much computer time each relevant step takes
and how much data is involved as an example of a
small research group’s requirements
Case Study: Simulating Some
ATLAS Physics Process
Running 1000 Z → μμ events (at Michigan)

Step             Storage    CPU Time
Generation       36 MB      Seconds
Simulation       845 MB     55 Hours
Digitization     1520 MB    9 Hours
Reconstruction   15 MB      10 Hours
This totals ~2.4 GB and 74 CPU-hours on a 2 GHz P4 processor with 1 GB of RAM. Unfortunately, in this study we need approximately 1 million such events, which means we must have 2.4 TB of storage and require 3000 CPU-days of processing time.
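The scaling in that last sentence, spelled out (a trivial sketch of the same arithmetic):

# Scale the 1000-event sample up to the ~1,000,000 events the study needs.
sample_events, needed_events = 1_000, 1_000_000
scale = needed_events / sample_events      # x1000

storage_tb = 2.4 * scale / 1000            # 2.4 GB per 1000 events -> ~2.4 TB
cpu_days = 74 * scale / 24                 # 74 CPU-hours per 1000 events -> ~3000 days
print(f"~{storage_tb:.1f} TB of storage and ~{cpu_days:.0f} CPU-days on one 2 GHz P4")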
Virtual (and Meta) Data
(A very important concept for LHC
Physics Infrastructure)
Programs as Community Resources:
Data Derivation and Provenance
• Most [scientific] data are not simple
“measurements”; essentially all are:
– Computationally corrected/reconstructed
– And/or produced by numerical simulation
• And thus, as data and computers become ever larger
and more expensive:
– Programs are significant community resources
– So are the executions of those programs
• Management of the transformations that map
between datasets is an important problem
Motivations (1)
[Diagram: Data is created-by a Transformation; a Derivation is an execution-of a Transformation; Data is consumed-by / generated-by Derivations.]
“I’ve come across some interesting data, but I need to understand the nature of the corrections applied when it was constructed before I can trust it for my purposes.”
“I’ve detected a calibration error in an instrument and want to know which derived data to recompute.”
“I want to apply a jet analysis program to millions of events. If the results already exist, I’ll save weeks of computation.”
“I want to search an ATLAS event database for events with certain characteristics. If a program that performs this analysis exists, I won’t have to write one from scratch.”
Motivations (2)
• Data track-ability and result audit-ability
– Universally sought by GriPhyN applications
• Repair and correction of data
– Rebuild data products—c.f., “make”
• Workflow management
– A new, structured paradigm for organizing, locating,
specifying, and requesting data products
• Performance optimizations
– Ability to re-create data rather than move it
• And others, some we haven’t thought of
Virtual Data in
Action
• Data request may
– Compute locally
– Compute remotely
– Access local data
– Access remote data
• Scheduling based on
– Local policies
– Global policies
– Cost
• More on this later
[Diagram: requests flow across major facilities and archives, regional facilities and caches, and local facilities and caches, fetching items as needed.]
Chimera Application:
Sloan Digital Sky Survey Analysis
[Plot: galaxy cluster size distribution (number of clusters vs. number of galaxies, log-log scale), produced in answer to "What is the size distribution of galaxy clusters?" using the Chimera Virtual Data System + iVDGL Data Grid (many CPUs).]
Virtual Data Queries
• A query for events implies:
– Really means asking whether an input data sample corresponding to a
set of calibrations, methods, and perhaps Monte Carlo history
matches a set of criteria
• It is vital to know, for example:
– What data sets already exist, and in which formats (ESD,
AOD, Physics Objects)? If a data set does not exist, can it be materialized?
– Was this data calibrated optimally?
– If I want to recalibrate a detector, what is required?
• Methods:
– Virtual data catalogs and APIs
– Data signatures
• Interface to Event Selector Service
Virtual Data Scenario
• A physicist issues a query for events
– Issues:
• How expressive is this query?
• What is the nature of the query?
• What language (syntax) will be supported for the query?
– Algorithms are already available in local shared libraries
– For ATLAS, an Athena service consults an ATLAS Virtual Data
Catalog or Registry Service
• Three possibilities
– File exists on local machine
• Analyze it
– File exists in a remote store
• Copy the file, then analyze it
– File does not exist
• Generate, reconstruct, analyze; possibly done remotely, then copied
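A minimal sketch of that three-way decision (my own illustration; the catalog object and its methods are hypothetical, not the actual Athena or ATLAS Virtual Data Catalog interfaces):

def resolve_dataset(query, virtual_data_catalog):
    entry = virtual_data_catalog.lookup(query)

    if entry.exists_locally():
        # Case 1: the file is already on the local machine -- analyze it.
        return entry.local_path()

    remote = entry.remote_replicas()
    if remote:
        # Case 2: the file exists in a remote store -- copy it, then analyze.
        return entry.copy_from(remote[0])

    # Case 3: the file does not exist -- generate, reconstruct and analyze,
    # possibly remotely, then copy the result back.
    return entry.materialize()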
Virtual Data Summary
• The concept of virtual data is an important
one for LHC computing
• Having the ability to either utilize a local
copy, move a remote copy or regenerate the
dataset (locally or remotely) is very
powerful in helping to optimize the overall
infrastructure supporting LHC physics.
The Future of Data-Intensive
e-Science…
Distributed Computing
Problem Evolution
• Past-present: O(10^2) high-end systems; Mb/s networks; centralized (or entirely local) control
– I-WAY (1995): 17 sites, week-long; 155 Mb/s
– GUSTO (1998): 80 sites, long-term experiment
– NASA IPG, NSF NTG: O(10) sites, production
• Present: O(10^4-10^6) data systems, computers; Gb/s networks; scaling, decentralized control
– Scalable resource discovery; restricted delegation; community policy; Data Grid: 100s of sites, O(10^4) computers; complex policies
• Future: O(10^6-10^9) data, sensors, computers; Tb/s networks; highly flexible policy, control
A “Grid” (Globus) View of the Future:
All Software is Network-Centric
• We don’t build or buy “computers” anymore, we
borrow or lease required resources
– When I walk into a room, need to solve a problem, need to
communicate
• A “computer” is a dynamically, often collaboratively
constructed collection of processors, data sources,
sensors, networks
– Similar observations apply for software
• Pervasive, extremely high-performance networks
provide location independent access to huge datasets
Major Issues for Grids and eScience
• The vision outlined in the previous slide assumes a
level of capability way beyond current grid
technology:
– Current grids allow access to distributed resources in a
secure (authenticated/authorized) way
– However, the grid users are faced with a very limited
and detached view of their “virtual computer”
• Current grid technology and middleware requires
the next level of integration and functionality to
deliver an effective system for e-Science…
The Needed Grid Enhancements
• We need to provide users with the SAME type of
capabilities which exist on their local workstation and
operating systems:
– File “browsing”
– Task debugging
– System monitoring
– Process prioritization and management
– Accounting and auditing
– Fine-grained access control
– Storage access and management
– Error handling/resiliency
• The network has become the virtual bus of our grid
virtual computer… we now need the equivalent of a “grid
operating system” to enable easy, transparent access to our
virtual machine
• This is difficult but very necessary…
Future of the Grid for LHC?
• Grid Optimist
– Best thing since the WWW. Don’t worry, the grid will solve
all our computational and data problems! Just click “Install”
• Grid Pessimist
– The grid is “merely an excuse by computer scientists to milk
the political system for more research grants so they can write
yet more lines of useless code” [The Economist, June 21,
2001]
– “A distraction from getting real science done” [McCubbin]
• Grid Realist
– The grid can solve our problems, because we design it to! We
must work closely with the developers as it evolves, providing
our requirements and testing their deliverables in our
environment.
Conclusions
• We have a significant amount of data to manage for LHC
• Networks are a central component in future e-Science.
• LHC Physics will depend heavily on globally distributed
resources => the NETWORK is critical!
• There are many very interesting projects and concepts in
Grids and Networks working toward dealing with the
massive amounts of distributed data we expect.
• We have a few more years to see how well it will all work!
For More Information…
• The ATLAS Project
– atlas.web.cern.ch/Atlas/
• Grid Forum
– www.gridforum.org
• HENP Internet2 SIG
– henp.internet2.edu
• OSG
– www.opensciencegrid.org/
• Questions?
Questions?