Transcript International Networks and the US
Global Lambdas and Grids for Particle Physics in the LHC Era
Harvey B. Newman
California Institute of Technology SC2005 Seattle, November 14-18 2005
Beyond the SM: Great Questions of
Particle Physics and Cosmology
You Are Here.
1. Where does the pattern of particle families and masses come from?
2. Where are the Higgs particles; what is the mysterious Higgs field?
3. Why do neutrinos and quarks oscillate?
4. Is Nature Supersymmetric?
5. Why is any matter left in the universe ?
6. Why is gravity so weak?
7. Are there extra space-time dimensions?
Composition of the universe: dark energy 72%; dark matter 23%; free H and He 4%; stars 0.5%; neutrinos 0.3%; other elements 0.03%.
We do not know what makes up 95% of the universe.
Large Hadron Collider CERN, Geneva: 2007 Start
pp collisions at √s = 14 TeV, L = 10^34 cm^-2 s^-1, in a 27 km tunnel spanning Switzerland & France
CMS, ATLAS: pp, general purpose; heavy ions. TOTEM. ALICE: heavy ions. LHCb: B-physics
5000+ physicists, 250+ institutes, 60+ countries
Physics: Higgs, SUSY, extra dimensions, CP violation, quark-gluon plasma, ... and the Unexpected
LHC Data Grid Hierarchy
Online System (at the experiment, ~PByte/sec) → Tier 0+1 at CERN (PBs of disk; tape robot), at ~150-1500 MBytes/sec, over 10-40 Gbps links
Tier 1 centers (FNAL, IN2P3, RAL, INFN, ...), connected at ~10 Gbps
Tier 2 centers, connected at ~1-10 Gbps
Tier 3: institute clusters; Tier 4: workstations, with physics data caches at 1 to 10 Gbps
CERN/outside resource ratio ~1:2; Tier0 : (sum of Tier1s) : (sum of Tier2s) ~ 1:1:1
Tens of Petabytes by 2007-8. An Exabyte ~5-7 years later.
Emerging Vision: A Richly Structured, Global Dynamic System
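As a back-of-the-envelope check, the tier rates above imply data volumes in the range quoted. A minimal sketch; the effective seconds of running per year (~3e7 s) is an illustrative assumption, not a figure from the talk:

```python
# Quick consistency check on the Tier 0 figures above. The effective
# seconds of running per year (~3e7 s) is an illustrative assumption.
def yearly_volume_pb(rate_mb_per_s, seconds_per_year=3.0e7):
    """Petabytes accumulated in a year at a sustained rate in MBytes/sec."""
    return rate_mb_per_s * seconds_per_year / 1.0e9  # MB -> PB (decimal)

# At the quoted 150-1500 MBytes/sec into the CERN center:
print(f"{yearly_volume_pb(150):.1f} to {yearly_volume_pb(1500):.0f} PB/year")
```

At the upper end this accumulates tens of petabytes per year, consistent with the "Tens of Petabytes by 2007-8" figure above.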
Long Term Trends in Network Traffic Volumes: 300-1000X/10Yrs
May 2005: ESnet accepted traffic, 1990-2005, shows exponential growth of +82%/year for the last 15 years, i.e. ~400X per decade, now at the 10 Gbit/s scale (R. Cottrell, W. Johnston).
Progress in Steps
SLAC Traffic Growth in Steps: ~10X/4 Years.
Projected: ~2 Terabits/s by ~2014. Summer '05: 2 x 10 Gbps links, one for production, one for R&D.
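The two growth figures quoted above can be cross-checked by simple compounding; a quick sketch:

```python
# Compounding check on the growth rates quoted above:
# +82%/year over a decade, and 10X every 4 years over a decade.
esnet_per_decade = 1.82 ** 10        # ~400X per decade
slac_per_decade = 10 ** (10 / 4)     # ~316X per decade
print(round(esnet_per_decade), round(slac_per_decade))
```

Both compound to the same order of magnitude, the ~300-1000X/decade range stated above.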
Internet 2 Land Speed Record (LSR)
IPv4 multi-stream record with FAST TCP: 6.86 Gbps x 27 kkm, Nov. 2004
IPv6 record: 5.11 Gbps between Geneva and Starlight: Jan. 2005
Disk-to-disk Marks: 536 Mbytes/sec (Windows); 500 Mbytes/sec (Linux)
End System Issues: PCI-X Bus, Linux Kernel, NIC Drivers, CPU
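The buffer sizes behind such records follow from the bandwidth-delay product: the TCP window must hold one full round-trip's worth of data. A minimal sketch, where the 300 ms RTT is an assumed figure for a ~20,000 km path, not a measured one:

```python
# Bandwidth-delay product: bytes that must be in flight to sustain a
# given rate. The 300 ms RTT below is an illustrative assumption for
# a very long terrestrial path, not a measured value.
def window_bytes(rate_gbps, rtt_s):
    """TCP window (bytes in flight) required: rate * RTT."""
    return rate_gbps * 1e9 / 8 * rtt_s

# ~7 Gbps at an assumed 300 ms RTT:
print(f"{window_bytes(7.0, 0.300) / 2**20:.0f} MiB in flight")
```

Windows of hundreds of MB are far beyond default kernel settings, which is why the end-system issues listed above (bus, kernel, NIC drivers, CPU) dominate at these speeds.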
Internet2 LSRs (blue = HEP): 7.2 Gbps x 20.7 kkm
Internet2 LSR, single IPv4 TCP stream, progression:
Apr 2002: 0.4 Gbps x 12,272 km
Nov 2002: 0.9 Gbps x 10,978 km
Feb 2003: 2.5 Gbps x 10,037 km
Oct 2003: 5.4 Gbps x 7,067 km
Nov 2003: 5.6 Gbps x 10,949 km
Apr 2004: 4.2 Gbps x 16,343 km
Jun 2004: 6.6 Gbps x 16,500 km
Nov 2004: 7.21 Gbps x 20,675 km (record network)
NB: Manufacturers’ Roadmaps for 2006: One Server Pair to One 10G Link
HENP Bandwidth Roadmap for Major Links (in Gbps)

Year | Production | Experimental | Remarks
2001 | 0.155 | 0.622-2.5 | SONET/SDH
2002 | 0.622 | 2.5 | SONET/SDH; DWDM; GigE integration
2003 | 2.5 | 10 | DWDM; 1 + 10 GigE integration
2005 | 10 | 2-4 X 10 | Switch; provisioning
2007 | 2-4 X 10 | ~10 X 10; 40 Gbps | 1st gen. Grids
2009 | ~10 X 10 or 1-2 X 40 | ~5 X 40 or ~20-50 X 10 | 40 Gbps switching
2011 | ~5 X 40 or ~20 X 10 | ~25 X 40 or ~100 X 10 | 2nd gen. Grids; Terabit networks
2013 | ~Terabit | ~MultiTbps | ~Fill one fiber

Continuing trend: ~1000 times bandwidth growth per decade. HEP: co-developer as well as application driver of global nets.
LHCNet, ESnet Plan 2006-2009: 20-80 Gbps US-CERN, ESnet MANs, IRNC
ESnet 2nd core: 30-50G. LHCNet US-CERN wavelength triangle: 10/05: 10G CHI + 10G NY; 2007: 20G + 20G; 2009: ~40G + 40G. IRNC links and metro rings connect to GEANT2, SURFnet, IN2P3, Japan and Australia.
Map legend: ESnet hubs and new ESnet hubs; Metropolitan Area Rings; major DOE Office of Science sites; high-speed cross connects with Internet2/Abilene; production IP ESnet core (≥10 Gbps, enterprise IP traffic); Science Data Network core (40-60 Gbps circuit transport); lab-supplied links; major international links; LHCNet Data Network (2 to 8 x 10 Gbps US-CERN); NSF/IRNC circuit; GVA-AMS connection via SURFnet or GEANT2; ESnet MANs to FNAL & BNL; dark fiber (60 Gbps) to FNAL.
Global Lambdas for Particle Physics Caltech/CACR and FNAL/SLAC Booths
Preview global-scale data analysis of the LHC Era (2007-2020+), using next-generation networks and intelligent grid systems
Using state of the art WAN infrastructure and Grid-based Web service frameworks, based on the LHC Tiered Data Grid Architecture
Using a realistic mixture of streams: organized transfer of multi-TB event datasets, plus numerous smaller flows of physics data that absorb the remaining capacity.
The analysis software suites are based on the Grid-enabled Analysis Environment (GAE) developed at Caltech and U. Florida, as well as Xrootd from SLAC and dCache from FNAL
Monitored by Caltech’s MonALISA global monitoring and control system
Global Lambdas for Particle Physics Caltech/CACR and FNAL/SLAC Booths
We used Twenty Two [*] 10 Gbps waves to carry bidirectional traffic between Fermilab, Caltech, SLAC, BNL, CERN and other partner Grid Service sites including: Michigan, Florida, Manchester, Rio de Janeiro (UERJ) and Sao Paulo (UNESP) in Brazil, Korea (KNU), and Japan (KEK)
Results
151 Gbps peak, 100+ Gbps of throughput sustained for hours: 475 Terabytes of physics data transported in < 24 hours
131 Gbps measured by SCInet bwc team on 17 of our waves
Using real physics applications and production as well as test systems for data access, transport and analysis: bbcp, Xrootd, dCache, and GridFTP; and grid analysis tool suites
An optimized Linux kernel for TCP-based protocols, including Caltech's FAST TCP
Far surpassing our previous SC2004 BWC Record of 101 Gbps [*] 15 at the Caltech/CACR and 7 at the FNAL/SLAC Booth
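The headline figures above are arithmetically consistent; a quick sketch in decimal units:

```python
# Arithmetic behind the headline numbers: 475 TB moved in 24 hours as
# an average rate, and the TB/day that sustained 100 Gbps corresponds
# to (decimal units throughout).
def tb_per_day(rate_gbps):
    """Terabytes moved per day at a given sustained rate in Gbps."""
    return rate_gbps * 1e9 / 8 * 86400 / 1e12

avg_gbps = 475e12 * 8 / 86400 / 1e9   # average over a full day
print(f"average {avg_gbps:.0f} Gbps over 24 h; "
      f"100 Gbps sustained moves {tb_per_day(100):.0f} TB/day")
```

A sustained 100 Gbps stream corresponds to just over a petabyte per day, the transport scale this exercise set out to demonstrate.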
Monitoring NLR, Abilene/HOPI, LHCNet, USNet, TeraGrid, PWave, SCInet, Gloriad, JGN2, WHREN, other international R&E nets, and 14,000+ Grid nodes simultaneously (I. Legrand)
Switch and Server Interconnections at the Caltech Booth (#428)
15 10G Waves
72 nodes with 280+ Cores
64 10G Switch Ports: 2 Fully Populated Cisco 6509Es
45 Neterion 10 GbE NICs
200 SATA Disks
40 Gbps (20 HBAs) to StorCloud Thursday – Sunday Setup
http://monalisa-ul.caltech.edu:8080/stats?page=nodeinfo_sys
Fermilab
Our BWC data sources are the production storage systems and file servers used by:
CDF
DØ
US CMS Tier 1
Sloan Digital Sky Survey
Each of these produces, stores and moves Multi TB to PB-scale data: Tens of TB per day
~600 gridftp servers (of 1000s) directly involved
bbcp ramdisk-to-ramdisk transfer, CERN to Chicago: 3 TBytes of physics data transferred in 2 hours, using a 16 MB window and 2 streams; the measured rate, plotted in 5-second intervals, held steady throughout the run.
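The average rate implied by that transfer can be checked in one line (treating 3 TB as decimal terabytes):

```python
# Average throughput of the bbcp transfer above: 3 TB in 2 hours.
rate_gbps = 3e12 * 8 / (2 * 3600) / 1e9
print(f"~{rate_gbps:.1f} Gbps memory-to-memory")
```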
Xrootd Server Performance
Single-server linear scaling (A. Hanushevsky): network I/O in MB/sec, percent CPU remaining, and events/sec processed, plotted against the number of concurrent jobs (up to 400).
Excellent Across WANs
Scientific Results
Ad hoc Analysis of Multi TByte Archives
Immediate exploration spurs novel discovery approaches
Linear Scaling
Hardware Performance
Deterministic Sizing
High Capacity
Thousands of clients
Hundreds of Parallel Streams
Very Low Latency
12us + Transfer Cost
Device + NIC Limited
Xrootd Clustering
A client asks the redirector (head node) to open file X and is redirected: "go to C". C, a supervisor (sub-redirector), forwards the "open file X" request among its data servers (D, E, F) and answers "go to F"; the client then reads file X directly from data server F. Together these nodes form the cluster.
Client sees all servers as xrootd data servers
Unbounded Clustering
Self organizing
Total Fault Tolerance
Automatic real time reorganization
Result
Minimum Admin Overhead
Better Client CPU Utilization
More results in less time at lower cost
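The redirection scheme above can be illustrated with a toy head node. This is a conceptual sketch only, not the actual xrootd/cmsd wire protocol or implementation; the server names and file paths are hypothetical:

```python
# Toy sketch of the redirector idea: a head node that asks which of its
# data servers hold a file and redirects the client to one of them.
# Illustration of the concept only; not the real xrootd implementation.
import random

class Redirector:
    def __init__(self, servers):
        # servers: mapping of server name -> set of file paths it holds
        self.servers = servers

    def open(self, path):
        """Return the name of a server the client should contact for `path`."""
        holders = [name for name, files in self.servers.items() if path in files]
        if not holders:
            raise FileNotFoundError(path)
        return random.choice(holders)  # real clusters use load-based selection

# Hypothetical cluster of three data servers:
cluster = Redirector({
    "serverD": {"/store/fileX"},
    "serverE": {"/store/fileY"},
    "serverF": {"/store/fileX", "/store/fileY"},
})
print("go to", cluster.open("/store/fileX"))
```

Because clients only ever see "go to <server>" answers, servers can join, leave or fail without client-side configuration, which is what makes the self-organization and fault tolerance listed above possible.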
Remote sites: Caltech, UFL, Brazil, ...
Authenticated users automatically discover, and initiate multiple transfers of, physics datasets (ROOT files) through secure Clarens-based GAE services.
Transfer is monitored through MonALISA
Once data arrives at the target (remote) sites, analysis can start by authenticated users, using the ROOT analysis framework.
Using COJAC, the Clarens ROOT file viewer, or the event viewer, data from remote sites can be presented transparently to the ROOT analysis user.
SC|05 Abilene and HOPI Waves
GLORIAD: 10 Gbps Optical Ring Around the Globe by March 2007
China, Russia, Korea, Japan, US, Netherlands partnership. GLORIAD circuits today:
10 Gbps Hong Kong-Daejeon-Seattle
10 Gbps Seattle-Chicago-NYC (CANARIE contribution to GLORIAD)
622 Mbps Moscow-AMS-NYC
2.5 Gbps Moscow-AMS
155 Mbps Beijing-Khabarovsk-Moscow
2.5 Gbps Beijing-Hong Kong
1 GbE NYC-Chicago (CANARIE)
US funding: NSF IRNC Program
ESLEA/UKLight SC|05 Network Diagram
6 X 1 GE OC-192
KNU (Korea) Main Goals
Uses the 10 Gbps GLORIAD link from Korea to the US, called BIG-GLORIAD, also part of UltraLight
Try to saturate this BIG-GLORIAD link with servers and cluster storage connected at 10 Gbps
Korea is planning to be a Tier-1 site for the LHC experiments
KEK (Japan) at SC05 10GE Switches on the KEK-JGN2-StarLight Path
JGN2: 10G network research testbed
• Operational since 4/04
• 10 Gbps L2 between Tsukuba and Tokyo Otemachi
• 10 Gbps IP to StarLight since August 2004
• 10 Gbps L2 to StarLight since September 2005
The Otemachi-Chicago OC-192 link was replaced by 10GE WAN-PHY in September 2005.
Brazil HEPGrid: Rio de Janeiro (UERJ) and Sao Paulo (UNESP)
“Global Lambdas for Particle Physics”
A Worldwide Network & Grid Experiment
We have Previewed the IT Challenges of Next Generation Science at the High Energy Frontier (for the LHC and other major programs)
Petabyte-scale datasets
Tens of national and transoceanic links at 10 Gbps (and up)
100+ Gbps aggregate data transport sustained for hours; we reached a Petabyte/day transport rate for real physics data
We set the scale, and learned to gauge the difficulty, of the global networks and transport systems required for the LHC mission
But we set up, shook down and successfully ran the system in under 1 week
We have substantive take-aways from this marathon exercise:
An optimized Linux (2.6.12 + FAST + NFSv4) kernel for data transport; after 7 full kernel-build cycles in 4 days
A newly optimized application-level copy program, bbcp, that matches the performance of iperf under some conditions
Extension of Xrootd, an optimized low-latency file access application for clusters, across the wide area
Understanding of the limits of 10 Gbps-capable systems under stress
“Global Lambdas for Particle Physics”
A Worldwide Network & Grid Experiment
We are grateful to our many network partners: SCInet, LHCNet, Starlight, NLR, Internet2’s Abilene and HOPI, ESnet, UltraScience Net, MiLR, FLR, CENIC, Pacific Wave, UKLight, TeraGrid, Gloriad, AMPATH, RNP, ANSP, CANARIE and JGN2.
And to our partner projects: US CMS, US ATLAS, D0, CDF, BaBar, US LHCNet, UltraLight, LambdaStation, Terapaths, PPDG, GriPhyN/iVDGL, LHCNet, StorCloud, SLAC IEPM, ICFA/SCIC and Open Science Grid
Our Supporting Agencies: DOE and NSF
And for the generosity of our vendor supporters, especially Cisco Systems, Neterion, HP, IBM, and many others, who have made this possible
And the Hudson Bay Fan Company…
Extra Slides Follow
Global Lambdas for Particle Physics Analysis SC|05 Bandwidth Challenge Entry
Caltech, CERN, Fermilab, Florida, Manchester, Michigan, SLAC, Vanderbilt, Brazil, Korea, Japan, et al.
CERN's Large Hadron Collider experiments: data-, compute- and network-intensive. Discovering the Higgs, Supersymmetry, or extra space dimensions, with a Global Grid.
Worldwide collaborations of physicists working together, while developing next-generation global network and Grid systems.
Clarens architecture: an analysis sandbox of third-party application services sits behind the Clarens layer (ACL, X509, discovery) with its catalog and storage, exposed by a web server speaking XML-RPC, SOAP, Java RMI and JSON-RPC over http/https to Clarens clients, which access datasets over the network.
Authentication
Access control on Web Services
Remote file access (and access control on files)
Discovery of Web Services and Software
Shell service: shell-like access to remote machines (managed by access control lists)
Proxy certificate functionality
Virtual Organization management and role management
The user's point of access to a Grid system.
Provides an environment where the user can:
Access Grid resources and services
Execute and monitor Grid applications
Collaborate with other users on analysis
A one-stop shop for Grid needs
Portals can lower the barrier for users to access Web Services and to use Grid-enabled applications
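As an illustration of how a client might reach such XML-RPC-style services, a minimal Python sketch; the endpoint URL and the method name used here are hypothetical placeholders, not the actual Clarens API:

```python
# Sketch of a client call to an XML-RPC-style Grid service, as in the
# Clarens architecture described above. The endpoint URL and the
# `catalog.find` method name are hypothetical placeholders.
import xmlrpc.client

def list_remote_datasets(server_url, pattern):
    """Ask a (hypothetical) catalog service for datasets matching a pattern."""
    proxy = xmlrpc.client.ServerProxy(server_url)
    return proxy.catalog.find(pattern)  # assumed method name

# Usage (requires a running service):
# datasets = list_remote_datasets("https://clarens.example.org:8443/xmlrpc",
#                                 "/store/*.root")
```

Because the transport is plain XML-RPC over http/https, any language with an XML-RPC library can act as a Grid portal client in this model.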