IEPM/PingER Internet End-to-end Performance Monitoring and the PingER project Warren Matthews and Les Cottrell (SLAC) National Collaboratory Middleware and Network Research Project Review, ANL, August.
IEPM/PingER
Internet End-to-end Performance Monitoring and the PingER project
Warren Matthews and Les Cottrell (SLAC)
National Collaboratory Middleware and Network Research Project Review, ANL, August 18-20, 2003.

Overview
• A little History
• Evaluate the progress
• Assess the Value
• Interactions with other projects
• Elements that should be added
• Summary

History
• Ping End-to-end Reporting
• Began early 1995
  – Monitor network performance to sites collaborating with SLAC
• ESnet Network Monitoring Task Force (NMTF)
  – Extended to several DoE labs, strong support from FNAL
• 1997 ICFA created Network Task Force
  – PingER spreads worldwide
  – Funded by DOE/MICS

Recently
• In 2001, extended PingER to include bandwidth testing
  – IEPM-BW
• End-to-end user perception for high performance bulk-transfer
• Iperf, bbftp, GridFTP …
• Heavy network impact compared to lightweight PingER

Current Status
• PingER funding is under Thomas Ndousse
• DoE/MICS funding runs out at end of year
• Continues to be extremely useful
  – Most recently began working with ICTP/eJDS to quantify the Digital Divide
• MAGGIE proposal to develop/extend high performance monitoring (with PSC, ICIR, LBNL)

Overview
• A little History
• Evaluate the progress
• Assess the Value
• Interactions with other projects
• Elements that should be added
• Summary

PingER
• Mature, Successful
• Widely used in HENP
• Utilization has been extended beyond HENP
  – EDG, IAEA, XIWT
  – ICTP/eJDS
  – Many others
• Continues to be extended to meet new needs
  – Better visualization, web services access to data …

PingER Methodology
• Simple ping monitoring
• 1 ping to prime caches
• Send, size
  – Default is 10x100 Byte pkts, 10x1000 Byte pkts
• Record ping packet loss and RTT
• Derive unreachability, quiescence, unpredictability, jitter, TCP throughput
• Also out-of-order packets, duplicate packets

PingER Software
• Monitoring
• Analysis
• Visualization
• Available from SLAC/FNAL
  websites
• Package

Using PingER
• Since 1995
• Troubleshooting
• Identify Sites to Upgrade
• Choosing a provider
• Setting expectations for VoIP
• Routing Choices for multihomed sites
• Compare with http, ftp
  – Strong correlation

PingER Deployment
• Currently 36 monitoring sites in 14 countries
• 473 target sites in 79 countries
  – 99% of the world's on-line population
• Most extensive end-to-end active R&E network monitoring worldwide
• Special BaBar, PPDG, Digital Divide, etc. groups and pages

End-to-end Monitoring
• In reality, most projects monitor end-to-end performance
  – End host effects are unavoidable
• Internet2 end-to-end Performance Initiative (e2epi) has recognized this
  – Most useful to users.

IEPM-BW
• Throughput Monitoring
  – Traceroute
  – Iperf (+quick iperf), BBftp, BBCP (mem and disk)
  – ABwE (available bandwidth)
  – GridFTP, UDPMON
  – Web100
  – Netflow
• Analysis

IEPM-BW Deployment
• Currently 10 monitoring sites
  – SLAC, FNAL, GATech (SOX)
  – INFN (Milan), NIKHEF, APAN (Japan)
  – UMich, Internet2 (Ann Arbor)
  – UManchester, UCL (UK)
• 50 unique target sites

Using IEPM-BW
• Usual
  – Baselines
  – Troubleshooting
  – Setting expectations
• Also on both testbeds and production nets
  – Compare measurement tools (ping vs ABwE vs iperf/quick iperf vs bbcp vs GridFTP vs tsunami)
  – Compare advanced TCP stacks
    • Eliminate need for multiple streams
  – Look at non TCP bulk transfer

Overview
• A little History
• Evaluate the progress
• Assess the Value
• Interactions with other projects
• Elements that should be added
• Summary

Examples
• Long term trends
• Short term glitches
• Troubleshooting
• Upgrades
• Vacations
• Peering

Traffic on ESnet has doubled every year
[Figure annotations: 2Mbps vacation; Multiple OC12s]

Packet Loss between DESY and FNAL in February and March 2000.
[Figure: Daily Packet Loss (%) against Day of the Month, February and March. Annotations: "DFN closes Perryman POP and loses direct peering with ESnet"; "Peering re-established via Dante at 60 Hudson".]

Median Packet Loss Seen From nbi.dk
[Figure: % 100 Byte Packet Loss During Day, weekly from 11/1/98 to 1/24/99. Series: To North America, To Western Europe. Annotations: "Ten-155 became operational on December 11."; "Smurf Filters installed on NORDUnet’s US connection."]

Throughput from SLAC to RAL between May 2002 and February 2003
[Figure: throughput over time, 5/13/2002 to 2/17/2003. Series: iperf, bbcpmem, bbcpdisk, bbftp.]

Traffic
Typically, Internet traffic is 70% http

Conclusions
• Establish layer 3 connectivity exists
• Iperf vs Quick iperf
• BBftp vs BBCP => implementation
• IPERF vs BBftp => CPU, Disk
• Packet Loss < 0.1%
• TCP/IP must be tuned on high-speed long delay paths
• Web100/Net100

eJDS
• PingER continues to be useful
• Recently joined with electronic journal distribution service (eJDS)
• Distribute physics journals to members around the world
• Particularly concerned with quantifying the Digital Divide

Limitations
• ICMP
• Do not monitor routers
• Rate limiting
• Blocking is common, especially in developing countries
• However, study indicates low impact from rate limiting
• Scheduling with cron

Overview
• A little History
• Evaluate the progress
• Assess the Value
• Interactions with other projects
• Elements that should be added
• Summary

Comparison to Other Projects
• Surveyor
• RIPE
• AMP
• NIMI
• SCNM
• XIWT
• NetPhysics

Comparisons
• Typically results were closely correlated.
• Often tools complement each other and combined provide insight into network behaviour.
• Derived throughput from equation of Mathis et al. (BW ~ MSS/(RTT*sqrt(loss))) shows good agreement

Publishing
• Network Performance information is critical to the Grid vision
  – Application steering
• Working with GGF/NMWG & PPDG
• Monitoring data is available as prototype Web Service
• OGSI Grid service under development

Internet2 PIPES
• E2e pi
• PIPES infrastructure
• IEPM-BW Job manager
• MAGGIE Analysis Engine

Available Bandwidth Estimator (ABwE)
• Tool under development by SLAC/Rice
  – Part of the DoE/SCIDAC INCITE project
• Light weight
  – 60 packets in 1 second
  – Iperf 35,000 packets/s for 10-20 seconds
• No need to tune windows/streams
• Replace iperf in test engine
• FreeBSD version created for Abilene Backbone Measurement Infrastructure

Quick Iperf
• Iperf is the tool of choice for many admins.
  – Considered accurate but intrusive.
  – Errors due to long slow start
• Use web100 to detect end of slow start. Modify iperf client. Web100 required on client only.
• Measurement within 10%
• Save 94% time, 92% traffic

PingER-6
• SLAC has native IPv6 service from ESnet
• PingER ported to IPv6
• Monitoring started in November 1999
• 41 Sites in 10 countries
• edu/ac.*, net/net.*, com/co.*

PingER -vs- PingER6
RTT between SLAC and Purdue in Nov and Dec 1999.
[Figure: RTT against day of the month. Series: IPv6, IPv4.]

Overview
• A little History
• Evaluate the progress
• Assess the Value
• Interactions with other projects
• Elements that should be added
• Summary

MAGGIE
• Need to further develop IEPM-BW
  – On-demand measurements, visualization, automated troubleshooting
• Measurement and Analysis for the Global Grid and Internet End-to-end performance
  – A secure, scalable measurement infrastructure providing measurement, analysis and access to data.

MAGGIE
[Diagram labels: ICIR, PSC, Akenti, IEPM-BW, SLAC, Measurement Engine, NIMI, NWS, FNAL, Other tools, Security and scheduling, MAGGIE, SCIDAC, Publishing, ANL, AMP, Fault Finding, Analysis Engine, NMWG, LBNL, UCL, RIPE]

Overview
• A little History
• Evaluate the progress
• Assess the Value
• Interactions with other projects
• Elements that should be added
• Summary

Meeting the Objectives (1/4): Evaluate the Progress
The problem:
• The user cannot assume the network will be there.
  – Even if it is, the user cannot assume it will perform to their expectation.
The vision (realized):
• PingER has set expectations, provides data for troubleshooting, provides data for research. Continues to be useful.
A unique contribution:
• Probably the largest monitoring project in the world. IEPM-BW comparing tools, leveraging other efforts.

Meeting the Objectives (2/4): Assess the Value
• PingER is widely used and continues to be useful.
• Goals get more ambitious
• Challenges remain

Meeting the Objectives (3/4): Interactions Across Projects
• Long history of involvement in other projects
  – HENP, ESnet, Grid, High Performance, ICFA-SCIC
  – Friends, colleagues and contacts throughout the world (Other worlds coming soon)
• Bright future for MAGGIE.

Meeting the Objectives (4/4): Assess the Integrated Impact
• The contribution to the big picture by IEPM-PingER, IEPM-BW and especially the need for MAGGIE have been summarized by Mary-Anne and Thomas
  – But they may not have known it

The Big Picture
All of the National Collaboratory and Network Research projects have specific goals and objectives, but all of you involved in those projects are also part of a much larger, longer term effort, namely creating an infrastructure that will enable geographically separated scientists to effectively work together as a team and that will facilitate remote access to both facilities and data.
-Mary-Anne and Thomas

Toward a Monitoring Infrastructure
• Certainly the need
  – DOE Science Community, SCIDAC Testbed
  – Grid, Large Scale Networking
  – Troubleshooting / E2Epi
• Many of the ingredients
  – Many monitoring projects
  – Many tools
  – PIPES, MAGGIE (Cross domain)

Summary
“Unfortunately, network management research has historically been very under-funded, because it is difficult to get funding bodies to recognize this as legitimate networking research.”
Sally Floyd, IAB Concerns & Recommendations Regarding Internet Research & Evolution.
http://www.ietf.org/internet-drafts/draft-iab-research-funding-00.txt

Links
• Accompanying paper
• IEPM-BW Home
• 7 papers and 35 talks in the last 12 months
• ABwE
• RIPE-TT
• E2E PI
• GGF NMWG
• AMP TroubleShooting
• Quick Iperf

Credits
• Connie Logg, Jerrod Williams (SLAC), Jiri Navratil (CESnet/SLAC), David Martin, Frank Nagy, Al Thomas, Maxim Grigoriev (FNAL), Fabrizio Coccetti (INFN/SLAC).
• Brian Tierney, Eric Boyd, Jeff Boote, Matt Zekauskas, Matt Mathis, Russ Hobby, Vern Paxson, Andy Adams, kc Claffy, Iosif Legrand, Ajay Tirumala, Tom Dunigan.
• Local admins and other volunteers
• DoE/MICS
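
Worked example (illustrative only, not PingER code)
The PingER Methodology slide records packet loss and RTT per set of pings, and the Comparisons slide derives throughput from the relation of Mathis et al., BW ~ MSS/(RTT*sqrt(loss)). The sketch below shows how those two steps might fit together. The function names, the sample values and the 1460 byte MSS are assumptions made for illustration. "Unreachable" is taken here to mean that no ping in the set was answered. The constant factor of the Mathis relation is omitted, as it is on the slide.

```python
import math
import statistics


def summarize_pings(rtts_ms, sent):
    """Summarize one ping set; rtts_ms holds the RTTs (ms) of replies received.

    Hypothetical helper: assumes duplicate replies were already removed.
    """
    if sent <= 0:
        raise ValueError("sent must be positive")
    received = len(rtts_ms)
    if received > sent:
        raise ValueError("more replies than pings sent")
    summary = {
        "loss": 1.0 - received / sent,  # fraction of pings with no reply
        "unreachable": received == 0,   # assumption: no reply to any ping
    }
    if received:
        summary["min_rtt_ms"] = min(rtts_ms)
        summary["avg_rtt_ms"] = statistics.mean(rtts_ms)
        summary["max_rtt_ms"] = max(rtts_ms)
    return summary


def mathis_throughput_bps(mss_bytes, rtt_s, loss):
    """BW ~ MSS / (RTT * sqrt(loss)) in bits per second, constant factor omitted."""
    if rtt_s <= 0 or not 0 < loss <= 1:
        raise ValueError("need rtt_s > 0 and 0 < loss <= 1")
    return 8 * mss_bytes / (rtt_s * math.sqrt(loss))


# Made-up sample: 10 pings sent, 9 replies received.
sample = summarize_pings(
    [100.0, 110.0, 120.0, 130.0, 140.0, 150.0, 160.0, 170.0, 180.0], sent=10
)
estimate = mathis_throughput_bps(
    mss_bytes=1460,                          # assumed MSS
    rtt_s=sample["avg_rtt_ms"] / 1000.0,     # ms to seconds
    loss=sample["loss"],
)
print(sample)
print(f"Estimated TCP throughput: {estimate / 1e6:.2f} Mbit/s")
```

The relation is undefined at zero loss, so the sketch rejects that case rather than returning infinity. A real analysis would need its own rule for loss-free measurement sets.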