Revolutions in Networks and Grids for HEP and Global Science

Stanford University, SLAC, NIIT, the Digital Divide & Bandwidth Challenge
Prepared by Les Cottrell, SLAC, for NIIT, February 22, 2006
Stanford University
• Location
Some facts
• Founded in the 1890s by Governor Leland Stanford & wife Jane
– In memory of their son, Leland Stanford Jr.
– Apocryphal story of the foundation
• Movies invented at Stanford
• 1,600 freshman entrants/year (12% acceptance), 7:1 student:faculty ratio, students from 53 countries
• 169K living Stanford alumni
Some alumni
• Sports: Tiger Woods, John McEnroe
• Astronauts: Sally Ride
• Vint Cerf, “father of the Internet”
• Industry:
– Hewlett & Packard, Steve Ballmer (CEO, Microsoft), Scott McNealy (Sun) …
• World leaders: Ehud Barak (Israel), Alejandro Toledo (Peru)
• US politics: Condoleezza Rice, George Shultz, President Hoover
Some Startups
• Founded Silicon Valley (turned orchards into companies):
– Started by providing land and encouragement (investment) for companies started by Stanford alumni, such as HP & Varian
– More recently: Sun (Stanford University Network), Cisco, Yahoo, Google
Excellence
• 17 Nobel prizewinners
• Stanford Hospital
• Stanford Linear Accelerator Center (SLAC) – my home:
– National Lab operated by Stanford University, funded by the US Department of Energy
– Roughly 1,400 staff; with contractors & outside users => ~3,000, ~2,000 on site at a given time
– Fundamental research in:
• Experimental particle physics
• Theoretical physics
• Accelerator research
• Astrophysics
• Synchrotron light research
– Has faculty to pursue the above research and awards degrees; 3 Nobel prizewinners
Work with NIIT
• Co-supervision of students, build research capacity, publish etc., for example:
– Quantify the Digital Divide:
• Develop a measurement infrastructure to provide information on the extent of the Digital Divide:
– Within Pakistan, and between Pakistan & other regions
• Improve understanding, provide planning information and expectations, identify needs
• Provide and deploy tools in Pakistan
• MAGGIE-NS collaboration - projects:
– TULIP - Faran
– Network Weather Forecasting – Fawad, Fareena
– Anomaly – Fawad, Adnan, Muhammad Ali
• Detection, diagnosis and alerting
– PingER Management - Waqar
– MTBF/MTTR of networks – Not assigned
– Federating Network monitoring Infrastructures – Asma, Abdullah
• Smokeping, PingER, AMP, MonALISA, OWAMP …
– Digital Divide – Aziz, Akbar, Rabail
Quantifying the Digital Divide: A
scientific overview of the connectivity
of South Asian and African Countries
Les Cottrell (SLAC), Aziz Rehmatullah (NIIT), Jerrod Williams (SLAC), Arshad Ali (NIIT)
Presented at the CHEP06 Meeting, Mumbai, India, February 2006
www.slac.stanford.edu/grp/scs/net/talk05/icfa-chep06.ppt
Introduction
• PingER project originally (1995) for measuring network performance for the US, European and Japanese HEP community
• Extended this century to measure the Digital Divide for the Academic & Research community
• Last year added monitoring sites in S. Africa, Pakistan & India
• Will report on network performance to these regions from the US and Europe - trends, comparisons
• Plus early results within and between these regions
Why does it matter?
• Scientists cannot collaborate as equal partners unless they have connectivity to share data, results, ideas etc.
• Distance education needs good communication for access to libraries, journals, educational materials, video, and access to other teachers and researchers.
PingER coverage
• ~120 countries (99% of the world’s connected population), 35 monitoring sites in 14 countries
• New monitoring sites in Cape Town, Rawalpindi, Bangalore
• Monitor 25 African countries, containing 83% of the African population
Minimum RTT from US
• Indicates best possible, i.e. no queuing
• >600 ms probably geo-stationary satellite (see the check below)
• Only a few places still using satellite, mainly Africa
• Between developed regions min-RTT is dominated by distance
– Little improvement possible
[Maps: minimum RTT from the US, Jan 2000 vs Dec 2003]
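As a rough check on the 600 ms rule of thumb, the sketch below estimates the propagation-only round-trip time through a geostationary satellite. The speed of light and orbit altitude are textbook values, not figures from the talk.

```python
# Rough check of why a minimum RTT above ~600 ms suggests a geostationary
# satellite hop (illustrative constants, not measurements from the talk).
C_KM_PER_S = 299_792          # speed of light in vacuum, km/s
GEO_ALTITUDE_KM = 35_786      # geostationary orbit altitude, km

# A ping traverses the up-link and down-link twice: 4 x altitude in total.
propagation_rtt_ms = 4 * GEO_ALTITUDE_KM / C_KM_PER_S * 1000
print(f"Propagation-only RTT via GEO satellite: ~{propagation_rtt_ms:.0f} ms")
# ~477 ms before any ground segment, queuing or processing delay, so observed
# minimum RTTs above ~600 ms are a strong indicator of a satellite hop.
```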
World throughput seen from US
• Years behind Europe: Russia & Latin America 6 yrs; Mid-East & S.E. Asia 7 yrs; South Asia 10 yrs; Central Asia 11 yrs; Africa 12 yrs
• South Asia, Central Asia, and Africa are in danger of falling even farther behind
• Derived throughput ~ MSS/(RTT*sqrt(loss)), Mathis (see the sketch below)
• Many sites in the Digital Divide have less connectivity than a residence in the US or Europe
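The derived-throughput bullet above uses the Mathis et al. approximation. A minimal sketch of that estimate, with illustrative (not measured) values:

```python
from math import sqrt

def mathis_throughput_mbps(mss_bytes: int, rtt_ms: float, loss_rate: float) -> float:
    """TCP throughput estimate (Mbits/s) from the approximation quoted in the
    slide: throughput ~ MSS / (RTT * sqrt(loss))."""
    rtt_s = rtt_ms / 1000.0
    return (mss_bytes * 8) / (rtt_s * sqrt(loss_rate)) / 1e6

# Illustrative values (not measurements from the talk): a 1460-byte MSS,
# 200 ms RTT and 1% packet loss give roughly 0.6 Mbits/s.
print(mathis_throughput_mbps(1460, 200, 0.01))
```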
S. Asia & Africa from US
• Data very noisy, but there are noticeable trends
• India may be holding its own
• Africa & Pakistan are falling behind
Compare to US residence
• Sites in many countries have bandwidth < a US residence
India to India
• Monitoring host in Bangalore from Oct ’05
– Too early to tell much; also need more sites, have some good contacts
• 3 remote hosts (need to increase):
– R&E sites in Mumbai & Hyderabad
– Government site in AP
• Lot of difference between sites; the Government site sees heavy congestion
[Chart: average minus minimum RTT from Bangalore, Oct ’05 to Feb ’06 (RTT in ms), for the Govt. of AP, Hyderabad (AP) and Mumbai (Maharashtra) hosts]
PERN: Network Architecture
[Diagram: PERN network architecture - core ATM/routers at Karachi, Lahore and Islamabad, each serving a group of universities (counts of 12, 22 and 23 appear in the diagram) through access routers, DXX/DRS equipment, LAN switches and optical-fibre nodes, with 2x2 Mbps university last-mile links, 50 Mbps links within each core city, and international links of a few Mbps per core]
• HEC will invest $4M in the backbone
• 3 to 9 Points-of-Presence (core nodes)
• $2.4M from HEC to public universities for last-mile costs
• Possible dark fiber initiative
Pakistan to Pakistan
• 3 monitoring sites in Islamabad/Rawalpindi
– NIIT via NTC, NIIT via Micronet, NTC (PERN supplier)
– All monitor 7 universities in ISB, Lahore, KHI, Peshawar
• Careful: many university sites have proxies in the US & Europe
• Minimum RTTs: best NTC 6 ms, NIIT/NTC 10 ms - extra 4 ms for the last mile; NIIT/Micronet 60 ms - slower links, different routes
• Queuing = Avg(RTT) - Min(RTT) (see the sketch below)
• 200-400 ms queuing
– NIIT/NTC heavily congested
– Better when students are on holiday
– NIIT/Micronet & NTC OK
– Outages show fragility
[Chart: Avg - Min RTT from the 3 monitoring sites in Pakistan to Pakistan, 1-Dec to 30-Jan (RTT in ms); medians for NIIT N2, NTC and NIIT N4, with the holiday dip marked]
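A minimal sketch of the queuing estimate used above (queuing = avg(RTT) - min(RTT)), computed per day from ping samples. The RTT values here are hypothetical, not PingER data.

```python
# Queuing delay per day = average RTT minus minimum RTT over that day's samples.
from statistics import mean

rtt_samples_ms = {           # hypothetical ping RTTs in ms, keyed by day
    "1-Dec": [62, 310, 455, 270, 198],
    "2-Dec": [60, 95, 120, 88, 73],
}

for day, rtts in rtt_samples_ms.items():
    queuing_ms = mean(rtts) - min(rtts)
    print(f"{day}: min={min(rtts)} ms, avg={mean(rtts):.0f} ms, "
          f"queuing={queuing_ms:.0f} ms")
```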
Pakistan Network Fragility
[Charts from the three monitoring paths (NIIT/NTC, NIIT/Micronet, NTC): remote host outages, an NIIT outage, NIIT/NTC heavily congested while the other sites are OK]
Pakistan International fragility
[Chart: RTT (ms) and loss (%) on Pakistan’s international links over time]
• Typically once a month losses go to 20%
• Feb 05: another fiber outage, this time of 3 hours! Power cable dug up by excavators of the Karachi Water & Sewage Board
• Jun-Jul ’05: fiber cut off Karachi caused a 12-day outage - huge losses of confidence and business
• Infrastructure appears fragile
• Losses to QEA & NIIT are 3-8% averaged over a month
• Many systemic factors: electricity, import duties, skills (M. Jensen)
Users per networked computer and bandwidth per networked computer, by region
[Charts by African region (North, West, East, Central and Southern Africa): average number of users per networked computer (average ~37, n=73) and mean bandwidth per networked computer in kbps (n=66), with minimum/mean/maximum values shown; average cost ~$11/kbps/month (see the cost illustration below)]
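To put the ~$11/kbps/month average cost into perspective, a quick illustration; the link capacities below are assumed for illustration, not taken from the survey.

```python
# What the ~$11/kbps/month average cost implies for some typical link sizes
# (link capacities are assumed for illustration).
COST_PER_KBPS_MONTH = 11  # USD, average cost figure from the charts above

for kbps in (64, 256, 1024):
    print(f"{kbps:>5} kbps link: ~${kbps * COST_PER_KBPS_MONTH:,}/month")
# A 1 Mbps link at this rate costs on the order of $11,000 per month, which
# helps explain the very low bandwidth available per networked computer.
```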
Routing in Africa
• Seen from ZA
• Only Botswana & Zimbabwe are direct
• Most go via Europe or USA
• Wastes costly international bandwidth
Loss within Africa
Satellites vs Terrestrial
• Terrestrial links via SAT3 & SEA-ME-WE (Mediterranean)
• Terrestrial not available to all within countries; EASSy will help
[Map: PingER min-RTT measurements from the S. African TENET monitoring station]
Between Regions
• Red ellipses show within region
• Blue = min(RTT)
• Red = min-avg RTT
• India/Pak green ellipses
• ZA heavy congestion
– Botswana, Argentina, Madagascar, Ghana, BF
• India better off than Pak
Overall
• Sorted by median throughput
• Within-region performance better (blue ellipses)
• Europe, N. America, E. Asia, Russia generally good
• M. East, Oceania, S.E. Asia, L. America acceptable
• Africa, C. Asia, S. Asia poor
Examples
• India got Internet connectivity in 1989, China in 1994
– India’s backbones are 34 Mbits/s, with one possible 622 Mbits/s
– China is deploying multiple 10 Gbits/s links
• Brazil and India had similar international connectivity in 2001; now Brazil is at multi-Gbits/s
• Pakistan’s PERN backbone is 50 Mbits/s, and end sites are ~1 Mbits/s
• Growth in # Internet users (2000-2005): Brazil 420%, China 393%, Pakistan 5000%, India 900% - demand is outstripping growth
– www.internetworldstats.com/stats.htm
Conclusions
• S. Asia and Africa are ~10 years behind and falling further behind, creating a Digital Divide within the Digital Divide
• India appears better off than Africa or Pakistan
• Last-mile problems, and network fragility
• Decreasing use of satellites, though still needed for many remote countries in Africa and C. Asia
– The EASSy project will bring fibre to E. Africa
• Growth in # users 2000-2005: 400% Africa, 5000% Pakistan - networks are not keeping up
• Need more sites in developing regions and a longer time period of measurements
More information
• Thanks to: Harvey Newman & ICFA for encouragement & support; Anil Srivastava (World Bank) & N. Subramanian (Bangalore) for India; NIIT, NTC and PERN for the Pakistan monitoring sites; FNAL for PingER management support; Duncan Martin & TENET (ZA)
• Future: work with VSNL & ERnet for India, Julio Ibarra & Eriko Porto for L. America, NIIT & NTC for Pakistan
• Also see:
• ICFA/SCIC Monitoring report:
– www.slac.stanford.edu/xorg/icfa/icfa-net-paper-jan06/
• Paper on Africa & S. Asia
– www.slac.stanford.edu/grp/scs/net/papers/chep06/paper-final.pdf
• PingER project:
– www-iepm.slac.stanford.edu/pinger/
SC|05 Bandwidth Challenge
ESCC Meeting
9th February ‘06
Yee-Ting Li
Stanford Linear Accelerator Center
LHC Network Requirements
• CERN/Outside resource ratio ~1:2; Tier0 : (Σ Tier1) : (Σ Tier2) ~ 1:1:1
[Diagram: LHC tiered computing model - the online system at the experiment produces ~PByte/sec; ~150-1500 MBytes/sec flows to the Tier 0+1 CERN center (PBs of disk, tape robot); 10-40 Gbps links to Tier 1 centers (IN2P3, INFN, RAL, FNAL); ~10 Gbps to Tier 2 centers; ~1-10 Gbps to Tier 3 institutes with physics data caches and workstations at 1 to 10 Gbps]
• Tens of Petabytes by 2007-8; an Exabyte ~5-7 years later
Overview
• Bandwidth Challenge
– ‘The Bandwidth Challenge highlights the best and brightest in new techniques for creating and utilizing vast rivers of data that can be carried across advanced networks.’
– Transfer as much data as possible using real applications over a 2-hour window
• We did…
– Distributed TeraByte Particle Physics Data Sample Analysis
– ‘Demonstrated high speed transfers of particle physics data between host labs and collaborating institutes in the USA and worldwide. Using state of the art WAN infrastructure and Grid Web Services based on the LHC Tiered Architecture, they showed real-time particle event analysis requiring transfers of Terabyte-scale datasets.’
Overview
• In detail, during the bandwidth challenge (2 hours):
– 131 Gbps measured by the SCInet BWC team on 17 of our waves (15-minute average)
– 95.37 TB of data transferred (see the rate check below)
• (3.8 DVDs per second)
– 90-150 Gbps (peak 150.7 Gbps)
• On the day of the challenge
– Transferred ~475 TB ‘practising’ (waves were shared, still tuning applications and hardware)
– Peak one-way USN utilisation observed on a single link was 9.1 Gbps (Caltech) and 8.4 Gbps (SLAC)
• Also wrote to StorCloud
– SLAC: wrote 3.2 TB in 1,649 files during the BWC
– Caltech: 6 GB/sec with 20 nodes
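A back-of-the-envelope check that the totals and rates above are consistent (decimal terabytes assumed):

```python
# 95.37 TB moved during the two-hour bandwidth-challenge window.
data_bytes = 95.37e12         # decimal terabytes assumed
window_s = 2 * 3600

avg_gbps = data_bytes * 8 / window_s / 1e9
print(f"Average transfer rate over the window: ~{avg_gbps:.0f} Gbits/s")
# ~106 Gbits/s, consistent with the 90-150 Gbits/s range observed
# during the challenge.
```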
Networking Overview
• We had 22 x 10 Gbits/s waves to the Caltech and SLAC/FNAL booths. Of these:
– 15 waves to the Caltech booth (from Florida (1), Korea/GLORIAD (1), Brazil (1 x 2.5 Gbits/s), Caltech (2), LA (2), UCSD, CERN (2), U Michigan (3), FNAL (2))
– 7 x 10 Gbits/s waves to the SLAC/FNAL booth (2 from SLAC, 1 from the UK, and 4 from FNAL)
• The waves were provided by Abilene, Canarie, Cisco (5), ESnet (3), GLORIAD (1), HOPI (1), Michigan Light Rail (MiLR), National Lambda Rail (NLR), TeraGrid (3) and UltraScienceNet (4).
Network Overview
Hardware (SLAC only)
• At SLAC:
– 14 x 1.8 GHz Sun v20z (dual Opteron)
– 2 x Sun 3500 disk trays (2 TB of storage)
– 12 x Chelsio T110 10 Gb NICs (LR)
– 2 x Neterion/S2io Xframe I (SR)
– Dedicated Cisco 6509 with 4 x 4-port 10 Gb blades
• At SC|05:
– 14 x 2.6 GHz Sun v20z (dual Opteron)
– 10 QLogic HBAs for StorCloud access
– 50 TB storage at SC|05 provided by 3PAR (shared with Caltech)
– 12 x Neterion/S2io Xframe I NICs (SR)
– 2 x Chelsio T110 NICs (LR)
– Shared Cisco 6509 with 6 x 4-port 10 Gb blades
Hardware at SC|05
Software
• BBCP ‘Babar File Copy’
– Uses ‘ssh’ for authentication
– Multiple stream capable
– Features ‘rate synchronisation’ to reduce byte retransmissions
– Sustained over 9Gbps on a single session
• XrootD
– Library for transparent file access (standard unix file functions)
– Designed primarily for LAN access (transaction based protocol)
– Managed over 35 Gbits/s (in two directions) on 2 x 10 Gbps waves
– Transferred 18 TBytes in 257,913 files (see the average-file-size check below)
• DCache
– 20Gbps production and test cluster traffic
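For scale, a quick check of the average file size implied by the XrootD figures above (decimal units assumed):

```python
# Average file size implied by the XrootD transfer quoted above.
total_bytes = 18e12          # 18 TBytes, decimal terabytes assumed
n_files = 257_913

avg_mb = total_bytes / n_files / 1e6
print(f"Average file size: ~{avg_mb:.0f} MB")   # roughly 70 MB per file
```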
[Chart: BWC aggregate bandwidth vs. time, with last year’s (SC|04) level and the Bandwidth Challenge period marked]
[Chart: cumulative data transferred over the Bandwidth Challenge period]
[Chart: component traffic, SLAC-FermiLab-UK bandwidth contributions - FNAL-UltraLight, UKLight, FermiLab-HOPI, SLAC-ESnet and SLAC-ESnet-USN, in to and out from the booth]
[Chart: SLAC cluster contributions - ESnet routed and ESnet SDN layer 2 via USN, in to and out from the booth, over the Bandwidth Challenge period]
[Chart: SLAC/FNAL booth aggregate, Mbps per wave]
Problems…
• Managerial/PR
– The initial request for loan hardware took place 6 months in advance!
– Lots and lots of paperwork to keep account of all the loan equipment
• Logistical
– Set up and tore down a pseudo-production network and servers in the space of a week!
– Testing could not begin until the waves were alight
• Most waves were lit the day before the challenge!
– Shipping so much hardware is not cheap!
– Setting up monitoring
Problems…
• Tried to configure hardware and software prior to the show
• Hardware
– NICs
• We had 3 bad Chelsios (bad memory)
• Xframe IIs did not work in UKLight’s Boston machines
– Hard disks
• 3 dead 10K disks (had to ship in a spare)
– 1 x 4-port 10 Gb blade DOA
– MTU mismatch between domains
– Router blade died during stress testing the day before the BWC!
– Cables! Cables! Cables!
• Software
– Used golden disks for duplication (still takes 30 minutes per disk to replicate!)
– Linux kernels:
• Initially used 2.6.14, found severe performance problems compared to 2.6.12
– (New) router firmware caused crashes under heavy load
• Unfortunately, only discovered just before the BWC
• Had to manually restart the affected ports during the BWC
Problems
• Most transfers were from memory to memory (ramdisk etc.)
– Local caching of (small) files in memory
– Reading and writing to disk will be the next bottleneck to overcome
Conclusion
• Previewed the IT challenges of the next generation of data-intensive science applications (high energy physics, astronomy, etc.)
– Petabyte-scale datasets
– Tens of national and transoceanic links at 10 Gbps (and up)
– 100+ Gbps aggregate data transport sustained for hours; we reached a Petabyte/day transport rate for real physics data (see the check below)
• Learned to gauge the difficulty of the global networks and transport systems required for the LHC mission
– Set up, shook down and successfully ran the systems in < 1 week
– Understood and optimized the configurations of various components (network interfaces, routers/switches, OS, TCP kernels, applications) for high performance over the wide area network
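A quick check that the Petabyte/day figure above is of the same order as the 100+ Gbps sustained aggregate (decimal petabyte assumed):

```python
# Convert a sustained 1 PB/day transfer rate into Gbits/s.
PB_BYTES = 1e15              # decimal petabyte assumed
SECONDS_PER_DAY = 86_400

rate_gbps = PB_BYTES * 8 / SECONDS_PER_DAY / 1e9
print(f"1 PB/day sustained = ~{rate_gbps:.0f} Gbits/s")
# ~93 Gbits/s, i.e. roughly the 100+ Gbits/s aggregate sustained for hours.
```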
Conclusion
• Products from this exercise:
– An optimized Linux kernel (2.6.12 + NFSv4 + FAST and other TCP stacks) for data transport, after 7 full kernel-build cycles in 4 days
– A newly optimized application-level copy program, bbcp, that matches the performance of iperf under some conditions
– Extensions of Xrootd, an optimized low-latency file access application for clusters, across the wide area
– Understanding of the limits of 10 Gbps-capable systems under stress
– How to effectively utilize 10 GE and 1 GE connected systems to drive 10-gigabit wavelengths in both directions
– Use of production and test clusters at FNAL reaching more than 20 Gbps of network throughput
• Significant efforts remain from the perspective of high-energy physics
– Management, integration and optimization of network resources
– End-to-end capabilities able to utilize these network resources, including applications and I/O devices (disk and storage systems)
Press and PR
• 11/8/05 - Brit Boffins aim to Beat LAN speed record, from vnunet.com
• SC|05 Bandwidth Challenge, SLAC Interaction Point
• Top Researchers, Projects in High Performance Computing Honored at SC/05 ..., Business Wire (press release) - San Francisco, CA, USA
• 11/18/05 - Official Winner Announcement
• 11/18/05 - SC|05 Bandwidth Challenge Slide Presentation
• 11/23/05 - Bandwidth Challenge Results, from Slashdot
• 12/6/05 - Caltech press release
• 12/6/05 - Neterion Enables High Energy Physics Team to Beat World Record Speed at SC05 Conference, CCN Matthews News Distribution Experts
• High energy physics team captures network prize at SC|05, from SLAC
• High energy physics team captures network prize at SC|05, EurekAlert!
• 12/7/05 - High Energy Physics Team Smashes Network Record, from Science Grid This Week
• Congratulations to our Research Partners for a New Bandwidth