Network Monitoring for SCIC Les Cottrell, SLAC For ICFA meeting September, 2005 Initially funded by DoE Field Work proposal.


Network Monitoring for SCIC
Les Cottrell, SLAC
For ICFA meeting
September, 2005
Initially funded by DoE Field Work proposal. Currently
partially funded by US Department of State/Pakistan Ministry
of Science & Technology
Coverage
• Measure the network performance for developing regions
– From developed to developing & vice versa
– Between developing regions & within developing regions
• Originated in High Energy Physics, now focused on the Digital Divide (DD)
– Adding monitoring sites in: Africa, S. America, Russia, Pakistan, India
– Working with Turkey but ISP blocks pings
• http://www-iepm.slac.stanford.edu/pinger/pingerworld/
• Interactive: zoom/pan, mouseover, clickable
[Figure: PingER coverage map, Aug 2005. Legend: monitoring site, remote site.]
PingER Management
• No funding for ongoing PingER operational management (40% FTE at the moment), so simplify management to make it easier to sustain
– Develop tools to simplify, automate, reduce manual effort
– New installation procedures for monitoring sites
– Assistance in producing executive plots
– Provide alerts for unreachable remote sites
– Provide alerts if unable to gather data from monitoring sites
– Check sanity of data and the configuration database
– Check hosts are where we think they are…
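The two alert bullets above can be sketched as follows. This is only an illustration under stated assumptions: the input format (a mapping from monitor/remote pairs to packet loss fraction), the function names and the host names are all hypothetical and are not taken from the PingER tools.

```python
from collections import defaultdict


def unreachable_remote_sites(loss_by_pair):
    """Return remote sites that every monitor saw with 100% packet loss.

    loss_by_pair maps (monitor, remote) -> loss fraction in [0, 1].
    This input layout is an assumption made for the sketch.
    """
    losses = defaultdict(list)
    for (_monitor, remote), loss in loss_by_pair.items():
        losses[remote].append(loss)
    # A remote site is flagged only if no monitor got any reply from it.
    return sorted(r for r, vals in losses.items() if all(v >= 1.0 for v in vals))


def silent_monitors(expected_monitors, loss_by_pair):
    """Return expected monitors from which no data was gathered at all."""
    reporting = {monitor for (monitor, _remote) in loss_by_pair}
    return sorted(set(expected_monitors) - reporting)


# Hypothetical sample: "hostA" is unreachable from both reporting monitors,
# and "fnal" delivered no data.
sample = {
    ("slac", "niit"): 0.02,
    ("slac", "hostA"): 1.0,
    ("tenet", "hostA"): 1.0,
    ("tenet", "niit"): 1.0,
}
alerts_remote = unreachable_remote_sites(sample)
alerts_monitor = silent_monitors(["slac", "tenet", "fnal"], sample)
```

Requiring 100% loss from every monitor avoids alerting on a problem local to one monitoring site; a production check would probably also require the condition to persist over several measurement cycles.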
Triangulation 1/2
• Web hosts with TLDs in many developing
countries have proxies in developed countries
– E.g. 50% of initially chosen Pakistan Universities
had web proxies outside Pakistan
– Use IP2Location.com & traceroute to verify location
– Working on triangulation
• Make RTTmin measurements to a given host from known landmarks
• Estimate distance from each landmark L using d = a_L * RTTmin + b_L
– Initial a_L ~ 50 km/ms (speed of light in fiber gives ~100 km of distance per ms of RTT; a further factor of 2 allows for right-of-way paths and non-great-circle-route hop locations), b_L = 0
• Optimize a_L, b_L using RTTmin for known PingER pairs
• Locate host lat/long with confidence estimates
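The distance estimate and the calibration step can be sketched as below. The slide does not say how a_L and b_L (the slide's aL and bL) are optimized; ordinary least squares is used here purely as an assumption, and the calibration numbers are made up for illustration.

```python
def estimate_distance_km(rtt_min_ms, a_l=50.0, b_l=0.0):
    """Distance from a landmark using d = a_L * RTTmin + b_L.

    Defaults are the initial values from the slide: 50 km/ms and 0 km.
    """
    return a_l * rtt_min_ms + b_l


def fit_landmark(pairs):
    """Fit a_L and b_L for one landmark by ordinary least squares.

    pairs is a list of (rtt_min_ms, known_distance_km) for hosts whose
    location is already known. Least squares is an assumption of this
    sketch, not necessarily the optimization used by the authors.
    """
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two calibration pairs")
    mean_r = sum(r for r, _ in pairs) / n
    mean_d = sum(d for _, d in pairs) / n
    sxx = sum((r - mean_r) ** 2 for r, _ in pairs)
    if sxx == 0:
        raise ValueError("RTT values must not all be identical")
    sxy = sum((r - mean_r) * (d - mean_d) for r, d in pairs)
    a_l = sxy / sxx
    b_l = mean_d - a_l * mean_r
    return a_l, b_l


# Hypothetical calibration data lying exactly on d = 40 * RTTmin + 50.
a_fit, b_fit = fit_landmark([(10.0, 450.0), (20.0, 850.0), (30.0, 1250.0)])
initial_guess_km = estimate_distance_km(10.0)         # slide defaults
calibrated_km = estimate_distance_km(10.0, a_fit, b_fit)
```

With distances from three or more landmarks, the host position would then be found by intersecting the corresponding circles; that step is not shown here.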
Triangulation 2/2
• Landmarks:
– Using Looking Glass servers (provide pings)
– Install a web-accessible, on-demand ping tool at PingER monitoring sites
– Use GeoLIM landmarks (for US & W. Europe)
• Installing GeoLIM landmark at NIIT
• Will build tool to validate where PingER nodes are really located and fix the database or replace the nodes
Worldwide view
• Developed regions improving by factor 10 in < 6 years
• Developing regions such as India and Africa are 5-10
years behind
• May not be catching up.
[Figure: TCP throughput measured from N. America to world regions, from the PingER project, Jan 2005. Y-axis: derived TCP throughput in KBytes/sec (log scale, 1 to 10000). X-axis: Jan-95 to Dec-04. Regions (number of sites): Edu (141), Europe (150), Canada (27), S.E. Europe (21), Caucasus (8), India (7), Africa (30), Russia (17), China (13), Mid East (16), C. Asia (8), Latin America (37). Annotation: 50% improvement/year ~ factor of 10 in < 6 years.]
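The plot annotation (50% improvement per year, roughly a factor of 10 in under 6 years) can be checked with compound-growth arithmetic. The sketch below also converts "5-10 years behind" into a throughput ratio, assuming (my assumption, not the slide's) that the lagging region improves at the same 50% per year.

```python
import math


def years_to_factor(factor, annual_improvement):
    """Years needed to improve by `factor` at a fixed annual improvement rate."""
    return math.log(factor) / math.log(1.0 + annual_improvement)


def ratio_after_years(years, annual_improvement):
    """Throughput ratio accumulated over `years` of compound improvement."""
    return (1.0 + annual_improvement) ** years


years_for_10x = years_to_factor(10.0, 0.5)   # about 5.7 years
ratio_5_years = ratio_after_years(5, 0.5)    # gap for "5 years behind"
ratio_10_years = ratio_after_years(10, 0.5)  # gap for "10 years behind"
```

Under that assumption, a lag of 5 to 10 years corresponds to a throughput gap of roughly 8x to 58x.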
SCIC Monitoring WG
• PingER measurements from
– 37 monitors in 15 countries
– 726 remote sites in 120 countries; 3700 monitor-remote site pairs
– Measurements go back to ‘95
– Reports on link reliability, quality
– Aggregation in “affinity groups”
• Countries monitored contain 78% of world population, 90% of Internet users
[Figure: world map of countries monitored. Legend: monitoring sites, new.]
• Affinity Groups (Countries): Anglo America (2), Latin America (14), Europe (24), S.E. Europe (9), Africa (26), Mid East (7), Caucasus (3), Central Asia (8), Russia includes Belarus & Ukraine (3), S. Asia (7), China (1) and Australasia (2).
Case study on Pakistan
• Two sites are to join LCG (NUST, QEA/NCP): is connectivity adequate?
• Prompted by two outages of SEA-ME-WE 3
– Fiber cut off Karachi causes 12 day outage Jun-Jul ‘05
• Huge losses of confidence and business
Fiber Outage Jun 27-Jul 8 ‘05
• Looked at 9 sites in Pakistan measured from within and outside Pakistan
– Saw big (300=>600ms) increase in min-RTT as some sites switched to satellite
– Losses 2-3% => >10%
– Unreachability 1-2% => 20%
– Effect varied by site
[Figure: Pakistan loss from SLAC, Jan04 to Jun05. Y-axis: loss %. Curves: 25%, median, 75%.]
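The size of the min-RTT jump is roughly what a switch to a satellite link would produce. As a back-of-the-envelope sketch, assuming a geostationary satellite directly overhead (the slide does not say which kind of satellite was used), the propagation delay alone sets a floor on the round-trip time:

```python
SPEED_OF_LIGHT_KM_PER_S = 299_792.458
GEO_ALTITUDE_KM = 35_786.0  # geostationary altitude above the equator


def geo_min_rtt_ms():
    """Lower bound on ping RTT over one geostationary hop.

    The request goes up and down, and so does the reply: four traversals
    of the altitude. Real paths are longer (slant range, terrestrial legs).
    """
    return 4 * GEO_ALTITUDE_KM / SPEED_OF_LIGHT_KM_PER_S * 1000.0


def consistent_with_geo_hop(min_rtt_ms):
    """True if a measured min-RTT is at least the geostationary floor."""
    return min_rtt_ms >= geo_min_rtt_ms()


floor_ms = geo_min_rtt_ms()
before_outage = consistent_with_geo_hop(300.0)  # pre-outage min-RTT
during_outage = consistent_with_geo_hop(600.0)  # min-RTT during outage
```

The floor is a little under 480 ms, so a min-RTT of 300 ms cannot include a geostationary hop while 600 ms can, which fits the interpretation given on the slide.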
Longer term
• Typically once a month losses go to 20%
• Infrastructure appears fragile
• Losses to QEA & NIIT are 3-8% averaged over month
[Figure: RTT (ms) and loss (%), Feb05 to Jul05. Annotations: "Another fiber outage, this time of 3 hours! Power cable dug up by excavators of Karachi Water & Sewage Board"; "Jun/Jul outage".]
Pakistan: Next steps
• Established contacts with PERN (manages E&R net connections), NTC (carrier, government monopoly) and PIE (Pakistan Internet Exchange - international carrier interface)
– Monitoring PIE backbone router in Karachi
• NTC router deprecates pings so can’t monitor it
– Establishing PingER monitors in PERN and NTC
• Already have one at NIIT.
• Want to pin-point causes of poor performance (losses,
unreachability)
– Monitoring to NIIT via NTC and Broadband/DSL
provider to compare providers.
First results from S. Africa
• Host at Tertiary Education Network (TENET) site at Rondebosch
– TENET secures management of service contracts, operational functions and other value-added services for ZA universities & technical colleges
• Monitoring about 45 beacon sites worldwide
• Land line links to world, min-RTTs:
– Europe: ~215ms; US: ~250ms; Russia: ~235ms;
– L. America: ~415ms; E. Asia: ~450ms; Pakistan: ~465ms; Australia: ~480ms
• Evaluating what sites in Africa to monitor
Africa Coverage
From S. Africa
• Note we now cover most countries with many tertiary education centers (31 countries, 83% of population)
• Recently added monitoring station in South Africa (TENET)
Satellites vs Terrestrial
• Terrestrial links via SAT-3 & SEA-ME-WE (Mediterranean & Red Sea)
• Terrestrial links not available to all within countries
S Africa Connectivity
• Connections are usually
indirect:
– Costly and wastes
international bandwidth
• Color of country indicates
route from S. Africa
– E.g. yellow countries are accessed via Europe
– Purple = some sites via Europe, some via US
– Red routes go via Europe and USA
Collaborations/funding
• Good news:
– Active collaboration with NIIT Pakistan to develop network
monitoring including PingER (in particular management)
• Travel funded by US State Department & Pakistan MOST for 1 year
• Have submitted a follow-on proposal to USAID
– FNAL & SLAC continue support for PingER management
and coordination
• Bad news (currently unfunded, could disappear):
– DoE funding for PingER terminated
– Harder to cover from SLAC HEP budget, given new project
oriented budgeting
– For development look at making part of a tool-kit (e.g. VDT)
– Hard to get funding for operational needs (~0.4 FTE)
• For quality data need constant vigilance (hosts disappear/move, security blocks pings, need to update remote host lists …), harder as more/remoter hosts are added
Overall Situation
• Performance from U.S. & Europe is improving
all over, for losses, RTT & throughput
• Performance to developed countries is orders of magnitude better than to developing countries
• Poorer regions 5-10 years behind
• Poorest regions: Africa, Central & S. Asia
• Some regions are:
– catching up (SE Europe, Russia),
– keeping up (Latin America, Mid East, China),
– falling further behind (e.g. India, Africa)
Future Foci
• First view of Africa from within Africa
• Impact of Gloriad for Russian connectivity
• Impact of new RNP initiatives for Brazil
• More on India (preparation for CHEP06)
• Finish off the study of Pakistan
• Impact of new connectivity in E. Asia
• Others (suggestions welcome…)
Further Information
• PingER project home site
– www-iepm.slac.stanford.edu/pinger/
• PingER methodology (presented at I2 Apr 22 ’04)
– www.slac.stanford.edu/grp/scs/net/talk03/i2-method-apr04.ppt
• ICFA/SCIC Network Monitoring report
– www.slac.stanford.edu/xorg/icfa/icfa-net-paper-jan05/20050206netmon.doc
• ICFA/SCIC home site
– http://icfa-scic.web.cern.ch/ICFA-SCIC/
• SLAC/NIIT collaboration
– http://maggie.niit.edu.pk/
• Pakistan outage
– www.slac.stanford.edu/grp/scs/net/case/pakjul05/jun-july.htm