IEPM/PingER Internet End-to-end Performance Monitoring and the PingER project Warren Matthews and Les Cottrell (SLAC) National Collaboratory Middleware and Network Research Project Review, ANL, August.


IEPM/PingER
Internet End-to-end Performance Monitoring and the PingER project
Warren Matthews and Les Cottrell (SLAC)
National Collaboratory Middleware and Network
Research Project Review, ANL,
August 18-20, 2003.
1
Overview
• A little History
• Evaluate the progress
• Assess the Value
• Interactions with other projects
• Elements that should be added
• Summary
11/7/2015 9:55 AM
2
History
• Ping End-to-end Reporting
• Began early 1995
– Monitor network performance to sites collaborating with
SLAC
• ESnet Network Monitoring Task Force (NMTF)
– Extended to several DoE labs, strong support from FNAL
• 1997 ICFA created Network Task Force
– PingER spreads worldwide
– Funded by DOE/MICS
3
Recently
• In 2001, extended PingER to include
bandwidth testing
– IEPM-BW
• End-to-end user perception for high
performance bulk-transfer
• Iperf, bbftp, GridFTP …
• Heavy network impact compared to
lightweight PingER
4
Current Status
• PingER funding is under Thomas Ndousse
• DoE/MICS funding runs out at end of
year
• Continues to be extremely useful
– Most recently began working with ICTP/eJDS
to quantify the Digital Divide
• MAGGIE proposal to develop/extend high
performance monitoring (with PSC, ICIR, LBNL)
5
Overview
• A little History
• Evaluate the progress
• Assess the Value
• Interactions with other projects
• Elements that should be added
• Summary
6
PingER
• Mature, Successful
• Widely used in HENP
• Utilization has been extended beyond HENP
– EDG, IAEA, XIWT
– ICTP/eJDS
– Many others
• Continues to be extended to meet new needs
– Better visualization, web services access to data …
7
PingER Methodology
• Simple ping monitoring
• 1 ping to prime caches
• Configurable number of pings sent and packet size
– Default is 10x100 Byte pkts, 10x1000 Byte pkts
• Record ping packet loss and RTT
• Derive unreachability, quiescence,
unpredictability, jitter, TCP throughput
• Also out-of-order packets, duplicate
packets
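The derivation step above can be sketched in Python. This is a minimal illustration, not PingER's actual analysis code: the function name `summarize_ping_set`, the use of `None` to mark a lost packet, the rule that a target counts as unreachable when no ping is answered, and the interquartile range as a simple measure of RTT spread are all assumptions made for the example.

```python
import statistics

def summarize_ping_set(rtts_ms):
    """Summarize one set of ping results.

    rtts_ms: list of round-trip times in milliseconds, with None for
    each packet that got no reply (hypothetical input format).
    """
    sent = len(rtts_ms)
    if sent == 0:
        raise ValueError("need at least one ping result")
    replies = [r for r in rtts_ms if r is not None]
    lost = sent - len(replies)
    summary = {
        "sent": sent,
        "loss_pct": 100.0 * lost / sent,
        # Assumption: the target is unreachable if no ping was answered.
        "unreachable": len(replies) == 0,
    }
    if replies:
        summary["rtt_min"] = min(replies)
        summary["rtt_median"] = statistics.median(replies)
        summary["rtt_max"] = max(replies)
    if len(replies) >= 2:
        # Assumption: interquartile range as a simple spread measure.
        q1, _, q3 = statistics.quantiles(replies, n=4)
        summary["rtt_iqr"] = q3 - q1
    return summary

# Ten pings, two of them lost (made-up sample values).
sample = [50.0, 52.0, None, 51.0, 49.0, 53.0, 50.0, None, 54.0, 51.0]
result = summarize_ping_set(sample)
print(result)
```

Running the same function over the 100 Byte set and the 1000 Byte set separately would give one summary per packet size.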
8
PingER Software
• Monitoring
• Analysis
• Visualization
• Available from SLAC/FNAL websites
• Package
9
Using PingER
• Since 1995
• Trouble shooting
• Identify Sites to Upgrade
• Choosing a provider
• Setting expectations for VoIP
• Routing Choices for multihomed sites
• Compare with http, ftp
– Strong correlation
10
PingER Deployment
• Currently 36 monitoring sites in 14
countries
• 473 target sites in 79 countries
– 99% of the world's on-line population
• Most extensive end-to-end active
R&E network monitoring worldwide
• Special groups and pages for BaBar, PPDG,
Digital Divide, etc.
11
End-to-end Monitoring
• In reality, most projects monitor
end-to-end performance
– End host effects are unavoidable
• Internet2 end-to-end Performance
Initiative (e2epi) has recognized this
– Most useful to users.
12
IEPM-BW
• Throughput Monitoring
– Traceroute
– Iperf (+quick iperf), BBftp, BBCP (mem and
disk)
– ABWE (available bandwidth)
– GridFTP, UDPMON
– Web100
– Netflow
• Analysis
13
IEPM-BW Deployment
• Currently 10 monitoring sites
– SLAC, FNAL, GATech (SOX)
– INFN (Milan), NIKHEF, APAN (Japan)
– UMich, Internet2 (Ann Arbor)
– UManchester, UCL (UK)
• 50 unique target sites
14
Using IEPM-BW
• Usual
– Baselines
– Troubleshooting
– Setting expectations
• Also on both testbeds and production nets
– Compare measurement tools (ping vs ABwE vs.
iperf/quick iperf vs bbcp vs GridFTP vs
tsunami)
– Compare advanced TCP stacks
• Eliminate need for multiple streams
– Look at non TCP bulk transfer
15
Overview
• A little History
• Evaluate the progress
• Assess the Value
• Interactions with other projects
• Elements that should be added
• Summary
16
Examples
• Long term trends
• Short term glitches
• Troubleshooting
• Upgrades
• Vacations
• Peering
17
[Figure annotations: 2Mbps; vacation; Multiple OC12s]
Traffic on ESnet has doubled every year
18
[Figure: Packet Loss between DESY and FNAL in February and March 2000. Y axis: Daily Packet Loss (%), 0 to 12; x axis: Day of the Month, February and March. Annotations: "DFN closes Perryman POP and loses direct peering with ESnet"; "Peering re-established via Dante at 60 Hudson".]
19
[Figure: Median Packet Loss Seen From nbi.dk, 11/1/98 to 1/24/99. Y axis: % 100 Byte Packet Loss During Day, 0 to 50. Series: To North America; To Western Europe. Annotations: "Ten-155 became operational on December 11."; "Smurf Filters installed on NORDUnet's US connection."]
20
[Figure: Throughput from SLAC to RAL between May 2002 and February 2003. Series: iperf, bbcpmem, bbcpdisk, bbftp. Y axis: 0 to 250000; x axis: dates from 5/13/2002 to 2/17/2003.]
21
22
Traffic
Typically, Internet traffic is 70% http
23
Conclusions
• Establish layer 3 connectivity exists
• Iperf vs Quick iperf
• BBftp vs BBCP => implementation
• IPERF vs BBftp => CPU, Disk
• Packet Loss < 0.1%
• TCP/IP must be tuned on high-speed long delay paths
• Web100/Net100
24
eJDS
• PingER continues to be useful
• Recently joined with electronic journal
distribution service (eJDS)
• Distribute physics journals to members
around the world
• Particularly concerned with
quantifying the Digital Divide
25
Limitations
• ICMP
• Do not monitor routers
• Rate limiting
• Blocking is common, especially in developing
countries
• However, study indicates low impact from
rate limiting
• Scheduling with cron
26
Overview
• A little History
• Evaluate the progress
• Assess the Value
• Interactions with other projects
• Elements that should be added
• Summary
27
Comparison to Other
Projects
• Surveyor
• RIPE
• AMP
• NIMI
• SCNM
• XIWT
• NetPhysics
28
Comparisons
• Typically results were closely
correlated.
• Often tools complement each other
and combined provide insight into
network behaviour.
• Throughput derived from the equation of
Mathis et al. (BW ~ MSS/(RTT*sqrt(loss)))
shows good agreement
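The Mathis et al. relation quoted above can be evaluated directly from measured RTT and loss. The sketch below is only an illustration: the function name `mathis_throughput_bps` and the example inputs (1460-byte MSS, 100 ms RTT, 1% loss) are assumptions, and the constant factor of order one that appears in the published form is left out, as it is on the slide.

```python
import math

def mathis_throughput_bps(mss_bytes, rtt_s, loss):
    """Estimate TCP throughput in bits/s as MSS / (RTT * sqrt(loss)).

    mss_bytes: maximum segment size in bytes
    rtt_s:     round-trip time in seconds
    loss:      packet loss as a fraction, 0 < loss <= 1
    """
    if rtt_s <= 0:
        raise ValueError("rtt_s must be positive")
    if not 0 < loss <= 1:
        raise ValueError("loss must be a fraction in (0, 1]")
    return 8 * mss_bytes / (rtt_s * math.sqrt(loss))

# Assumed example path: 1460-byte MSS, 100 ms RTT, 1% loss.
estimate = mathis_throughput_bps(1460, 0.1, 0.01)
print(f"{estimate / 1e6:.3f} Mbit/s")
```

Because loss enters under a square root, quadrupling the loss rate halves the estimate, while doubling the RTT halves it directly.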
29
Publishing
• Network Performance information is
critical to the Grid vision
– Application steering
• Working with GGF/NMWG & PPDG
• Monitoring data is available as
prototype Web Service
• OGSI Grid service under development
30
Internet2 PIPES
• E2e pi
• PIPES infrastructure
• IEPM-BW Job manager
• MAGGIE Analysis Engine
31
Available Bandwidth Estimator (ABwE)
• Tool under development by SLAC/Rice
– Part of the DoE/SCIDAC INCITE project
• Light weight
– 60 packets in 1 second
– Compare iperf: 35,000 packets/s for 10-20 seconds
• No need to tune windows/streams
• Replace iperf in test engine
• FreeBSD version created for Abilene
Backbone Measurement Infrastructure
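The "light weight" claim can be made concrete with the packet counts given on the slide. The short sketch below only multiplies those figures out; the names are made up for the example, and since the slide does not give packet sizes, it compares packet counts rather than bytes.

```python
ABWE_PACKETS = 60            # from the slide: 60 packets in 1 second
IPERF_RATE_PPS = 35_000      # from the slide: 35,000 packets/s
IPERF_DURATIONS_S = (10, 20) # from the slide: 10-20 seconds

def iperf_packets(duration_s, rate_pps=IPERF_RATE_PPS):
    """Total packets sent by one iperf run of the given duration."""
    return rate_pps * duration_s

# How many times more packets one iperf run sends than one ABwE probe.
ratios = [iperf_packets(d) / ABWE_PACKETS for d in IPERF_DURATIONS_S]

for duration, ratio in zip(IPERF_DURATIONS_S, ratios):
    print(f"{duration} s iperf run: {iperf_packets(duration)} packets, "
          f"about {ratio:.0f}x one ABwE probe")
```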
32
Quick Iperf
• Iperf is the tool of choice for many
admins.
– Considered accurate but intrusive.
– Errors due to long slow start
• Use web100 to detect end of slow
start. Modify iperf client. Web100
required on client only.
• Measurement within 10%
• Save 94% time, 92% traffic
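To see why slow start can distort a short iperf run, a rough textbook estimate helps: in idealized slow start the congestion window doubles every round trip, so growing from one segment to W segments takes about ceil(log2(W)) round trips. The sketch below applies that to a path's bandwidth-delay product. It is not how Quick Iperf works (the slide says it uses web100 to detect the end of slow start). The function name and the example path (1 Gbit/s, 100 ms RTT, 1460-byte MSS) are assumptions, and real stacks with delayed ACKs or larger initial windows will behave differently.

```python
import math

def slow_start_estimate(bandwidth_bps, rtt_s, mss_bytes=1460):
    """Return (round_trips, seconds) for idealized slow start to fill the path.

    Assumes the window starts at one segment and doubles every round trip
    until it reaches the bandwidth-delay product.
    """
    bdp_bytes = bandwidth_bps * rtt_s / 8
    window_segments = max(1.0, bdp_bytes / mss_bytes)
    round_trips = math.ceil(math.log2(window_segments))
    return round_trips, round_trips * rtt_s

# Assumed example path: 1 Gbit/s bottleneck, 100 ms round-trip time.
round_trips, seconds = slow_start_estimate(1e9, 0.1)
print(f"{round_trips} round trips, about {seconds:.1f} s in slow start")
```

With these assumed figures the ramp-up takes about 1.4 s, a noticeable share of a 10 second test, which is the kind of error a fixed-length run has to absorb.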
33
PingER-6
• SLAC has native IPv6 service from
ESnet
• PingER ported to IPv6
• Monitoring started in November 1999
• 41 Sites in 10 countries
• edu/ac.*, net/net.*, com/co.*
34
PingER -vs- PingER6
[Figure: RTT between SLAC and Purdue in Nov and Dec 1999. Series: IPv6; IPv4. Y axis: 0 to 400; x axis: day of the month.]
35
Overview
• A little History
• Evaluate the progress
• Assess the Value
• Interactions with other projects
• Elements that should be added
• Summary
36
MAGGIE
• Need to further develop IEPM-BW
– On-demand measurements, visualization,
automated trouble shooting
• Measurement and Analysis for the
Global Grid and Internet End-to-end
performance
– A secure, scalable measurement
infrastructure providing measurement,
analysis and access to data.
37
MAGGIE
[Diagram: MAGGIE at the centre. Components: Measurement Engine; Security and scheduling; Publishing; Fault Finding; Analysis Engine; Other tools. Labels: ICIR, PSC, Akenti, IEPM-BW, SLAC, NIMI, NWS, FNAL, SCIDAC, ANL, AMP, NMWG, LBNL, UCL, RIPE.]
38
Overview
• A little History
• Evaluate the progress
• Assess the Value
• Interactions with other projects
• Elements that should be added
• Summary
39
Meeting the Objectives (1/4)
Evaluate the Progress
• The problem: The user cannot assume the network will
be there.
– Even if it is, the user cannot assume it will perform to
their expectation.
• The vision (realized): PingER has set expectations, provides
data for troubleshooting, provides data for
research. Continues to be useful.
• A unique contribution: Probably the largest monitoring
project in the world. IEPM-BW comparing tools,
leveraging other efforts.
40
Meeting the Objectives (2/4)
Assess the value
• PingER is widely used and
continues to be useful.
• Goals get more ambitious
• Challenges remain
41
Meeting the Objectives (3/4)
Interactions Across Projects
• Long history of involvement in
other projects
– HENP, ESnet, Grid, High
Performance, ICFA-SCIC
– Friends, colleagues and contacts
throughout the world (Other worlds
coming soon)
• Bright future for MAGGIE.
42
Meeting the Objectives (4/4)
Assess the Integrated Impact
• The contribution to the big picture by
IEPM-PingER, IEPM-BW and
especially the need for MAGGIE have
been summarized by Mary-Anne and
Thomas
– But they may not have known it
43
The Big Picture
All of the National Collaboratory and Network
Research projects have specific goals and objectives,
but all of you involved in those projects are also part
of a much larger, longer term effort, namely creating
an infrastructure that will enable geographically
separated scientists to effectively work together as
a team and that will facilitate remote access to both
facilities and data.
-Mary-Anne and Thomas
44
Toward a Monitoring
Infrastructure
• Certainly the need
– DOE Science Community, SCIDAC Testbed
– Grid, Large Scale Networking
– Troubleshooting / E2Epi
• Many of the ingredients
– Many monitoring projects
– Many tools
– PIPES, MAGGIE (Cross domain)
45
Summary
“Unfortunately, network management
research has historically been very
under-funded, because it is difficult to
get funding bodies to recognize this as
legitimate networking research.”
Sally Floyd
IAB Concerns & Recommendations Regarding
Internet Research & Evolution.
http://www.ietf.org/internet-drafts/draft-iab-research-funding-00.txt
46
Links
• Accompanying paper
• IEPM-BW Home
• 7 papers and 35 talks in the last 12 months
• ABwE
• RIPE-TT
• E2E PI
• GGF NMWG
• AMP
• TroubleShooting
• Quick Iperf
47
Credits
• Connie Logg, Jerrod Williams (SLAC), Jiri
Navratil (CESnet/SLAC), David Martin,
Frank Nagy, Al Thomas, Maxim Grigoriev
(FNAL), Fabrizio Coccetti (INFN/SLAC).
• Brian Tierney, Eric Boyd, Jeff Boote, Matt
Zekauskas, Matt Mathis, Russ Hobby, Vern
Paxson, Andy Adams, kc Claffy, Iosif
Legrand, Ajay Tirumala, Tom Dunigan.
• Local admins and other volunteers
• DoE/MICS
48