International Networks and the US

Transcript: International Networks and the US

Application-Empowered Networks for
Science and Global Virtual Organizations
Harvey B. Newman
California Institute of Technology
SC2003 Panel, Phoenix
November 20, 2003
The Challenges of Next Generation
Science in the Information Age
Petabytes of complex data explored and analyzed by
1000s of globally dispersed scientists, in hundreds of teams
 Flagship Applications
 High Energy & Nuclear Physics, AstroPhysics Sky Surveys:
TByte to PByte “block” transfers at 1-10+ Gbps
 eVLBI: Many real time data streams at 1-10 Gbps
 BioInformatics, Clinical Imaging: GByte images on demand
 HEP Data Example:
 From Petabytes in 2003, ~100 Petabytes by 2007-8,
to ~1 Exabyte by ~2013-5.
 Provide results with rapid turnaround, coordinating
large but LIMITED computing, data handling, NETWORK resources
 Advanced integrated applications, such as Data Grids,
rely on seamless operation of our LANs and WANs
 With reliable, quantifiable high performance
Large Hadron Collider (LHC)
CERN, Geneva: 2007 Start
 pp s =14 TeV L=1034 cm-2 s-1
 27 km Tunnel in Switzerland & France
 Experiments: CMS and ATLAS (pp, general purpose; HI), ALICE (HI), LHCb (B-physics), TOTEM
 6000 Physicists & Engineers; 60 Countries; 250 Institutions
 First Beams: April 2007; Physics Runs: from Summer 2007
LHC: Higgs Decay into 4 muons
(Tracker only); 1000X LEP Data Rate (+30 minimum bias events)
All charged tracks with pT > 2 GeV; reconstructed tracks with pT > 25 GeV
10⁹ events/sec, selectivity: 1 in 10¹³ (1 person in a thousand world populations)
Application Empowerment
of Networks, and Global Systems
 Effective use of networks is vital for the existence
and daily operation of Global Collaborations
 Scientists face the greatest challenges in terms of
 Data intensiveness; volume and complexity
 Distributed computation and storage resources;
a wide range of facility sizes and network capability
 Global dispersion of many cooperating teams, large and small
 Application and computer scientists have become
leading co-developers of global networked systems
 Strong alliances with network engineers; leading vendors
 Building on a tradition of developing next-generation systems
that harness new technologies in the service of science
 Mission Orientation
Tackle the hardest problems, to enable the science
Maintaining a years-long commitment
LHC Data Grid Hierarchy:
Developed at Caltech
CERN/Outside Resource Ratio ~1:2; Tier0/(ΣTier1)/(ΣTier2) ~1:1:1
Experiment → Online System at ~PByte/sec; Online System → Tier 0+1 CERN Center (PBs of Disk; Tape Robot) at ~100-1500 MBytes/sec
Tier 1: national centers (IN2P3, INFN, RAL, FNAL) at ~2.5-10 Gbps
Tier 2: regional Tier2 Centers at 2.5-10 Gbps
Tier 3: institutes with physics data caches at ~2.5-10 Gbps
Tier 4: workstations at 0.1 to 10 Gbps
Tens of Petabytes by 2007-8; an Exabyte ~5-7 years later.
Emerging Vision: A Richly Structured, Global Dynamic System
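For reference, the hierarchy can be summarized as a small data structure; this is a minimal sketch using only the nominal rates and example sites read from the diagram above, not a configuration used by the experiments.

```python
# Rough summary of the Data Grid hierarchy above as a data structure.
# Rates are the nominal planning figures from the diagram; site lists are
# examples from the slide, not an exhaustive configuration.
TIERS = {
    "Tier 0+1": {"sites": ["CERN Center (PBs of disk; tape robot)"],
                 "link": "~100-1500 MBytes/sec from the Online System"},
    "Tier 1":   {"sites": ["IN2P3", "INFN", "RAL", "FNAL"],
                 "link": "~2.5-10 Gbps to CERN"},
    "Tier 2":   {"sites": ["Regional Tier2 Centers"],
                 "link": "2.5-10 Gbps to a Tier 1"},
    "Tier 3":   {"sites": ["Institute clusters with physics data caches"],
                 "link": "~2.5-10 Gbps to a Tier 2"},
    "Tier 4":   {"sites": ["Workstations"],
                 "link": "0.1 to 10 Gbps"},
}

for tier, info in TIERS.items():
    print(f"{tier:9s} {', '.join(info['sites'])}  [{info['link']}]")
```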
Fall 2003: Transatlantic Ultraspeed TCP Transfers
Throughput Achieved: X50 in 2 years
Terabyte Transfers by the Caltech-CERN Team:
Nov 18: 4.00 Gbps IPv6 Geneva-Phoenix (11.5 kkm)
 Oct 15: 5.64 Gbps IPv4 Palexpo-L.A. (10.9 kkm)
 Across Abilene (Internet2) Chicago-LA,
Sharing with normal network traffic
 Peaceful Coexistence with a Joint Internet2-Telecom World VRVS Videoconference
Nov 19: 23+ Gbps: Caltech, SLAC,
CERN, LANL, UvA, Manchester
Equipment and support: 10 GigE NICs; European Commission, Juniper, Level(3), Telehouse
FAST TCP:
 Fast convergence to equilibrium
 RTT estimation: fine-grain timer
 Delay monitoring in equilibrium
 Pacing: reducing burstiness
Measurements 11/02 (Baltimore-Sunnyvale, 4000 km path; standard packet size; utilization averaged over > 1 hr):
 1, 2, 7, 9 and 10 parallel flows, with average utilization between 88% and 95%
 Fair sharing among flows; fast recovery
 8.6 Gbps sustained; 21.6 TB transferred in 6 hours
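FAST TCP's delay-based control, listed above, can be sketched as a periodic window update, as described in the FAST TCP papers; the parameter values, update period, and toy RTTs below are assumptions for illustration, not the deployed configuration.

```python
# Illustrative sketch of the delay-based window update described in the
# FAST TCP papers. Parameters and the toy RTTs are assumptions, not the
# production implementation.

def fast_window_update(w, base_rtt, rtt, alpha=200.0, gamma=0.5):
    """One periodic congestion-window update.

    w        -- current window (packets)
    base_rtt -- minimum RTT seen on the path (propagation-delay estimate)
    rtt      -- current measured RTT, including queueing delay
    alpha    -- target number of this flow's packets queued in the network
    gamma    -- smoothing factor in (0, 1]
    """
    target = (base_rtt / rtt) * w + alpha        # equilibrium keeps ~alpha packets queued
    return min(2.0 * w, (1.0 - gamma) * w + gamma * target)

# Toy example: 100 ms propagation delay, 5 ms of queueing delay
w = 1000.0
for _ in range(200):
    w = fast_window_update(w, base_rtt=0.100, rtt=0.105)
print(round(w))   # converges toward the equilibrium window (~4200 packets here)
```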
HENP Major Links: Bandwidth Roadmap (Scenario) in Gbps

Year | Production           | Experimental            | Remarks
2001 | 0.155                | 0.622-2.5               | SONET/SDH
2002 | 0.622                | 2.5                     | SONET/SDH; DWDM; GigE Integ.
2003 | 2.5                  | 10                      | DWDM; 1 + 10 GigE Integration
2005 | 10                   | 2-4 X 10                | λ Switch; λ Provisioning
2007 | 2-4 X 10             | ~10 X 10; 40 Gbps       | 1st Gen. λ Grids
2009 | ~10 X 10 or 1-2 X 40 | ~5 X 40 or ~20-50 X 10  | 40 Gbps λ Switching
2011 | ~5 X 40 or ~20 X 10  | ~25 X 40 or ~100 X 10   | 2nd Gen λ Grids; Terabit Networks
2013 | ~Terabit             | ~MultiTbps              | ~Fill One Fiber

Continuing the Trend: ~1000 Times Bandwidth Growth Per Decade;
We are Rapidly Learning to Use Multi-Gbps Networks Dynamically
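A quick sanity check of the "~1000 times per decade" trend, using the production column of the table above and treating "~Terabit" as ~1000 Gbps; illustrative arithmetic only.

```python
# Illustrative check of the "~1000 times per decade" trend, using the
# production column of the roadmap above ("~Terabit" taken as ~1000 Gbps).
start_gbps, end_gbps = 0.155, 1000.0       # 2001 and 2013 entries
years = 2013 - 2001
overall = end_gbps / start_gbps
annual = overall ** (1.0 / years)
print(f"Overall: ~{overall:,.0f}x over {years} years")
print(f"Annual factor: ~{annual:.2f}x (roughly doubling every year)")
print(f"Per decade: ~{annual ** 10:,.0f}x")   # on the order of 1000x per decade
```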
“Private” Grids: Structured P2P
Sub-Communities in Global HEP
HENP Lambda Grids:
Fibers for Physics
 Problem: Extract “Small” Data Subsets of 1 to 100 Terabytes
from 1 to 1000 Petabyte Data Stores
 Survivability of the HENP Global Grid System, with
hundreds of such transactions per day (circa 2007)
requires that each transaction be completed in a
relatively short time.
 Example: Take 800 secs to complete the transaction. Then:

Transaction Size (TB) | Net Throughput (Gbps)
1                     | 10
10                    | 100
100                   | 1000 (Capacity of Fiber Today)
 Summary: Providing Switching of 10 Gbps wavelengths
within ~2-4 years; and Terabit Switching within 5-8 years
would enable “Petascale Grids with Terabyte transactions”,
to fully realize the discovery potential of major HENP programs,
as well as other data-intensive fields.
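The transaction table above follows directly from the arithmetic of moving N terabytes in 800 seconds; a one-function check, for illustration only.

```python
# Worked check of the 800-second transaction example above.
def required_gbps(size_tb, seconds=800):
    """Throughput needed to move size_tb terabytes in the given time."""
    bits = size_tb * 1e12 * 8        # 1 TB = 10^12 bytes = 8x10^12 bits
    return bits / seconds / 1e9      # Gbps

for size in (1, 10, 100):
    print(f"{size:>4} TB in 800 s  ->  {required_gbps(size):,.0f} Gbps")
# 1 TB -> 10 Gbps; 10 TB -> 100 Gbps; 100 TB -> 1,000 Gbps (one full fiber today)
```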
Next Generation Grid Challenges:
Workflow Management & Optimization
 Scaling to Handle Thousands of Simultaneous Requests
 Including the Network as a Dynamic, Managed Resource
 Co-Scheduled with Computing and Storage (see the sketch after this list)
 Maintaining a Global View of Resources and System State
 End-to-end Monitoring
 Adaptive Learning: New paradigms for optimization,
problem resolution
 Balancing Policy Against Moment-to-moment Capability
 High Levels of Usage of Limited Resources Versus
Better Turnaround Times for Priority Tasks
 Strategic Workflow Planning; Strategic Recovery
 An Integrated User Environment
 User-Grid Interactions; Progressive Automation
 Emerging Strategies and Guidelines
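To make the network co-scheduling idea concrete, a toy sketch follows; the site names, capacities, and greedy placement policy are hypothetical illustrations, not an actual Grid scheduler.

```python
# Toy co-scheduler: place a task only where CPU, storage, AND the network
# bandwidth needed to stage its input by the deadline are all available.
# Site names, capacities, and the greedy policy are hypothetical.
from dataclasses import dataclass

@dataclass
class Site:
    name: str
    free_cpu: int         # cores
    free_disk_tb: float   # TB
    wan_gbps: float       # unreserved WAN capacity

@dataclass
class Task:
    cpu: int
    disk_tb: float
    input_tb: float
    deadline_s: float     # time allowed to stage the input data

def needed_gbps(task: Task) -> float:
    return task.input_tb * 8e12 / task.deadline_s / 1e9

def schedule(task: Task, sites):
    """Reserve CPU, disk, and network together at the first site that fits."""
    for s in sites:
        if (s.free_cpu >= task.cpu and s.free_disk_tb >= task.disk_tb
                and s.wan_gbps >= needed_gbps(task)):
            s.free_cpu -= task.cpu
            s.free_disk_tb -= task.disk_tb
            s.wan_gbps -= needed_gbps(task)
            return s.name
    return None   # defer, or renegotiate the deadline/priority

sites = [Site("Tier2-A", 400, 50.0, 2.0), Site("Tier1-B", 2000, 500.0, 10.0)]
task = Task(cpu=300, disk_tb=10.0, input_tb=1.0, deadline_s=800.0)  # needs 10 Gbps
print(schedule(task, sites))   # -> "Tier1-B": the only site with enough WAN headroom
```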
UltraLight
http://ultralight.caltech.edu
 Serving the major LHC experiments; developments
broadly applicable to other data-intensive programs
“Hybrid” packet-switched and circuit-switched,
dynamically managed optical network
 Global services for system management
 Trans-US wavelength riding on NLR: LA-SNV-CHI-JAX
 Leveraging advanced research & production networks
 USLIC/DataTAG, SURFnet/NLlight, UKLight,
Abilene, CA*net4
 Dark fiber to CIT, SLAC, FNAL, UMich; Florida Light Rail
 Intercont’l extensions: Rio de Janeiro, Tokyo, Taiwan
 Flagship Applications with a diverse traffic mix
 HENP: TByte to PByte “block” data transfers at 1-10+ Gbps
 eVLBI: Real time data streams at 1 to several Gbps
 Monitoring CMS farms and WAN traffic
UltraLight Collaboration:
http://ultralight.caltech.edu
 Caltech, UF, FIU, UMich, SLAC, FNAL, MIT/Haystack, CERN, UERJ (Rio), NLR, CENIC, UCAID, Translight, UKLight, Netherlight, UvA, UCLondon, KEK, Taiwan
 Industry: Cisco, Level(3)
 Flagship Applications: HENP, VLBI, Oncology, …
 System layers: Flagship Applications; Application Frameworks; Grid Middleware; Grid/Storage Management; End-to-end Monitoring; Intelligent Agents; Protocols & Optical Paths; Network Fabric (National Lambda Rail)
 Integrated packet-switched and circuit-switched hybrid experimental research network, leveraging transoceanic R&D network partnerships
 NLR Waves: 10 GbE (LAN-PHY) wave across the US; transatlantic extensions; optical paths to Japan, Taiwan, and Brazil
 End-to-end monitoring; real-time tracking and optimization; dynamic bandwidth provisioning
 Agent-based services spanning all layers of the system, from the optical cross-connects to the applications
[Map: National Lambda Rail footprint, with nodes including SEA, POR, SAC, SVL, FRE, LAX, SDG, PHO, OGD, DEN, KAN, DAL, CHI, STR, NAS, CLE, PIT, WDC, RAL, ATL, JAC, OLG, WAL, NYC]
Some Extra Slides
Follow
Computing Model Progress
CMS Internal Review of Software and Computing
Grid Enabled Analysis: View
of a Collaborative Desktop
Building the GAE is the “Acid Test” for Grids; and is
crucial for next-generation experiments at the LHC
 Large, Diverse, Distributed Community of users
 Support hundreds to thousands of analysis tasks,
shared among dozens of sites
 Widely varying task requirements and priorities
 Need Priority Schemes, robust authentication and Security
Relevant to the future needs of research and industry
[Diagram: Grid-Enabled Analysis architecture]
 Clients: Browser, ROOT, PDA, MonALISA
 Clarens services: Authentication, Authorization, Logging, File Access, Key Escrow, Shell
 External services: Storage Resource Broker, CMS ORCA/COBRA, Cluster Schedulers, Iguana, ATLAS DIAL, GriPhyN VDT, MonALISA Monitoring, VO Management
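Clarens exposes such services to lightweight clients over standard web-service protocols. Purely as an illustration of that client/service pattern (the endpoint URL and the use of XML-RPC introspection here are assumptions, not the documented Clarens interface), a thin client might look like this:

```python
# Hypothetical sketch of a thin client talking to a Clarens-style web-services
# server. The endpoint URL and the reliance on standard XML-RPC introspection
# are placeholders/assumptions, NOT the documented Clarens API.
import xmlrpc.client

ENDPOINT = "https://gae-server.example.org:8443/clarens"   # placeholder URL

server = xmlrpc.client.ServerProxy(ENDPOINT)
try:
    # system.listMethods is standard XML-RPC introspection, if the server enables it
    print("Available service methods:", server.system.listMethods())
except (xmlrpc.client.Fault, OSError) as err:
    print("Service discovery failed:", err)
```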
Four LHC Experiments: The
Petabyte to Exabyte Challenge
ATLAS, CMS, ALICE, LHCb
Higgs + New particles; Quark-Gluon Plasma; CP Violation
Data stored: Tens of PB in 2008; to 1 EB by ~2015
CPU: Hundreds of TFlops to PetaFlops
The Move to OGSA and then
Managed Integration Systems
[Diagram: increasing functionality and standardization over time]
 Custom solutions: app-specific services built directly on X.509, LDAP, FTP, …
 Globus Toolkit: de facto standards (GGF: GridFTP, GSI)
 Open Grid Services Arch: Web services + …; GGF: OGSI, … (+ OASIS, W3C); multiple implementations, including the Globus Toolkit
 ~Integrated Systems: stateful; managed
HENP Data Grids Versus
Classical Grids
 The original Computational and Data Grid concepts are
largely stateless, open systems: known to be scalable
 Analogous to the Web
 The classical Grid architecture has a number of implicit
assumptions
 The ability to locate and schedule suitable resources,
within a tolerably short time (i.e. resource richness)
 Short transactions with relatively simple failure modes
 But - HENP Grids are Data Intensive & Resource-Constrained
 1000s of users competing for resources at 100s of sites
 Resource usage governed by local and global policies
 Long transactions; some long queues
 Need Realtime Monitoring and Tracking
 Distributed failure modes; hence strategic task management
Monitoring & Managing VRVS Reflectors
 Global Client / Dynamic Discovery
 VRVS (Version 3); VRVS on Windows
 Meeting in 8 Time Zones
 78 Reflectors Deployed Worldwide; Users in 96 Countries
Production BW Growth of Int’l HENP
Network Links (US-CERN Example)
 Rate of Progress >> Moore’s Law.
 9.6 kbps Analog (1985)
 64-256 kbps Digital (1989-1994) [X 7-27]
 1.5 Mbps Shared (1990-3; IBM) [X 160]
 2-4 Mbps (1996-1998) [X 200-400]
 12-20 Mbps (1999-2000) [X 1.2k-2k]
 155-310 Mbps (2001-2) [X 16k-32k]
 622 Mbps (2002-3) [X 65k]
 2.5 Gbps (2003-4) [X 250k]
 10 Gbps λ (2005) [X 1M]
 A factor of ~1M over a period of 1985-2005
(a factor of ~5k during 1995-2005)
 HENP has become a leading applications driver,
and also a co-developer of global networks
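A quick arithmetic check of the growth factors quoted above (rates in bits/s, measured against the 9.6 kbps baseline of 1985); illustrative only.

```python
# Quick check of the growth factors in the list above, relative to the
# 9.6 kbps analog link of 1985 (labels as on the slide).
baseline = 9.6e3   # bits/s
milestones = {
    "64-256 kbps (1989-94)": 256e3,
    "1.5 Mbps (1990-3)":     1.5e6,
    "155-310 Mbps (2001-2)": 310e6,
    "2.5 Gbps (2003-4)":     2.5e9,
    "10 Gbps (2005)":        10e9,
}
for label, rate in milestones.items():
    print(f"{label:>24}: x{rate / baseline:,.0f}")
# 10 Gbps / 9.6 kbps ~ 1.04 million, consistent with the quoted ~1M for 1985-2005
```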
HEP is Learning How to Use Gbps Networks Fully:
Factor of ~50 Gain in Max. Sustained TCP Thruput
in 2 Years, On Some US+Transoceanic Routes
 9/01: 105 Mbps with 30 Streams SLAC-IN2P3; 102 Mbps with 1 Stream CIT-CERN
 5/20/02: 450-600 Mbps SLAC-Manchester on OC12 with ~100 Streams
 6/1/02: 290 Mbps Chicago-CERN with One Stream on OC12
 9/02: 850, 1350, 1900 Mbps Chicago-CERN with 1, 2, 3 GbE Streams on a 2.5G Link
 11/02: [LSR] 930 Mbps in 1 Stream California-CERN and California-AMS; FAST TCP 9.4 Gbps in 10 Flows California-Chicago
 2/03: [LSR] 2.38 Gbps in 1 Stream California-Geneva (99% Link Utilization)
 5/03: [LSR] 0.94 Gbps IPv6 in 1 Stream Chicago-Geneva
 TW & SC2003: 5.65 Gbps (IPv4), 4.0 Gbps (IPv6) in 1 Stream Over 11,000 km
UltraLight: An Ultra-scale Optical Network
Laboratory for Next Generation Science
http://ultralight.caltech.edu
 Ultrascale protocols and MPLS: Classes of service
used to share the primary 10G λ efficiently
 Scheduled or sudden “overflow” demands handled
by provisioning additional wavelengths:
 N*GE, and eventually 10 GE
 Use path diversity, e.g. across North America and Atlantic
 Move to multiple 10G λ’s (leveraged) by 2005-6
 Unique feature: agent-based, end-to-end monitored,
dynamically provisioned mode of operation
 Agent services span all layers of the system,
communicating application characteristics
and requirements to
 The protocol stacks, MPLS class provisioning
and the optical cross-connects
 Dynamic responses help manage traffic flow
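A minimal sketch of the "overflow onto additional wavelengths" decision described above; the capacities, threshold, and interface are hypothetical, not UltraLight's actual control software.

```python
# Hypothetical sketch of the "overflow onto an extra wavelength" decision
# described above; capacities, thresholds, and the provisioning step are
# illustrative placeholders, not UltraLight's control interface.

SHARED_WAVE_GBPS = 10.0        # primary 10G lambda, shared via classes of service
OVERFLOW_THRESHOLD = 0.7       # keep headroom for production traffic

def plan_transfer(request_gbps: float, current_load_gbps: float) -> str:
    """Decide how to carry a scheduled or sudden bulk transfer."""
    headroom = SHARED_WAVE_GBPS * OVERFLOW_THRESHOLD - current_load_gbps
    if request_gbps <= headroom:
        return "use shared 10G wave (MPLS class of service)"
    if request_gbps <= SHARED_WAVE_GBPS:
        return "provision one additional GbE/10GbE wavelength"
    waves = -(-request_gbps // SHARED_WAVE_GBPS)   # ceiling division, 10G units
    return f"provision {int(waves)} additional 10G wavelengths"

print(plan_transfer(2.0, 3.0))    # fits in the shared wave
print(plan_transfer(8.0, 5.0))    # overflow: one extra wavelength
print(plan_transfer(25.0, 5.0))   # overflow: 3 extra wavelengths
```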
HENP Networks and Grids; UltraLight
 The network backbones and major links used by major HENP
projects are advancing rapidly
 To the 2.5-10 G range in 18 months; much faster than
Moore’s Law
 Continuing a trend: a factor ~1000 improvement per decade
 Transition to a community-owned and operated infrastructure
for research and education is beginning (with NLR, USAWaves)
 HENP is learning to use 10 Gbps networks effectively
over long distances
 Fall 2003 Development: 5 to 6 Gbps flows over 11,000 km
 A new HENP and DOE Roadmap: Gbps to Tbps links in ~10 Years
 UltraLight: A hybrid packet-switched and circuit-switched
network: ultra-protocols (FAST), MPLS + dynamic provisioning
 To serve the major needs of the LHC & Other major programs
 Sharing, augmenting NLR and internat’l optical infrastructures
A cost-effective model for future HENP, DOE networks