Performance Diagnostic Research at PSC
Matt Mathis, John Heffner, Ragu Reddy
5/12/05
http://www.psc.edu/~mathis/papers/ PathDiag20050512.ppt
The Wizard Gap
The non-experts are falling behind:

  Year   Experts    Non-experts   Ratio
  1988   1 Mb/s     300 kb/s      3:1
  1991   10 Mb/s
  1995   100 Mb/s
  1999   1 Gb/s
  2003   10 Gb/s    3 Mb/s        3000:1
  2004   40 Gb/s

Why?

TCP tuning requires expert knowledge
• By design TCP/IP hides the 'net from upper layers
  – TCP/IP provides basic reliable data delivery
  – The "hour glass" between applications and networks
• This is a good thing, because it allows:
  – Old applications to use new networks
  – New applications to use old networks
  – Invisible recovery from data loss, etc.
• But then (nearly) all problems have the same symptom
  – Less than expected performance
  – The details are hidden from nearly everyone

TCP tuning is really debugging
• Application problems:
  – Inefficient or inappropriate application designs
• Operating System or TCP problems:
  – Negotiated TCP features (SACK, WSCALE, etc.)
  – Failed MTU discovery
  – Too-small retransmission or reassembly buffers
• Network problems:
  – Packet losses, congestion, etc.
  – Packets arriving out of order or even duplicated
  – "Scenic" IP routing or excessive round trip times
  – Improper packet size limits (MTU)

TCP tuning is painful debugging
• All problems reduce performance
  – But the specific symptoms are hidden
• Any one problem can prevent good performance
  – Completely masking all other problems
• Trying to fix the weakest link of an invisible chain
  – General tendency is to guess and "fix" random parts
  – Repairs are sometimes "random walks"
  – Repairing one problem at a time is the best case

The Web100 project
• When there is a problem, just ask TCP
  – TCP has the ideal vantage point: in between the application and the network
  – TCP already "measures" key network parameters
    • Round Trip Time (RTT) and available data capacity
    • Can add more
  – TCP can identify the bottleneck
    • Why did it stop sending data?
  – TCP can even adjust itself
    • "Autotuning" eliminates one major class of bugs
• See: www.web100.org

Key Web100 components
• Better instrumentation within TCP
  – 120 internal performance monitors
  – Poised to become an Internet standard "MIB"
• TCP autotuning
  – Selects the ideal buffer sizes for TCP
  – Eliminates the need for user expertise
• Basic network diagnostic tools
  – Require less expertise than prior tools
  – Excellent for network admins
  – But still not useful for end users

Web100 Status
• Two-year no-cost extension
  – Can only push standardization after most of the work is done
  – Ongoing support of research users
• Partial adoption
  – Current Linux includes (most of) autotuning
    • John Heffner is maintaining patches for the rest of Web100
  – Microsoft
    • Experimental TCP instrumentation
    • Working on autotuning (to support FTTH)
  – IBM "z/OS Communications Server"
    • Experimental TCP instrumentation

The next step
• Web100 tools still require too much expertise
  – They are not really end-user tools
  – Too easy to overlook problems
  – Current diagnostic procedures are still cumbersome
• New insight from Web100 experience
  – Nearly all symptoms scale with round trip time
• New NSF funding
  – Network Path and Application Diagnosis (NPAD)
  – 3 years; we are at the midpoint

Nearly all symptoms scale with RTT
• For example: TCP buffer space, network loss and reordering, etc.
• On a short path TCP can compensate for the flaw
  – Local client to server: all applications work, including all standard diagnostics
  – Remote client to server: all applications fail, leading to faulty implication of other components

Examples of flaws that scale
• Chatty application (e.g., 50 transactions per request)
  – On a 1 ms LAN, this adds 50 ms to user response time
  – On a 100 ms WAN, this adds 5 s to user response time
• Fixed TCP socket buffer space (e.g., 32 kBytes)
  – On a 1 ms LAN, limits throughput to 200 Mb/s
  – On a 100 ms WAN, limits throughput to 2 Mb/s
• Packet loss (e.g., 0.1% loss at 1500 bytes)
  – On a 1 ms LAN, models predict 300 Mb/s
  – On a 100 ms WAN, models predict 3 Mb/s
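The arithmetic behind these three examples is easy to check. The Python sketch below is editorial, not part of the original slides: it computes the chatty-application penalty (transactions × RTT), the fixed-buffer ceiling (buffer / RTT), and a Mathis-model loss limit of roughly (MSS/RTT) · C/√p. The constant C = 0.7 is an assumption (published derivations give values up to about 1.22), so treat the outputs as order-of-magnitude estimates.

    from math import sqrt

    # Editorial sketch reproducing the slide's scaling examples; C is assumed.
    def chatty_penalty(transactions, rtt):
        """Extra response time from serialized application round trips (s)."""
        return transactions * rtt

    def window_limit(buffer_bytes, rtt):
        """Throughput ceiling imposed by a fixed socket buffer (bits/s)."""
        return buffer_bytes * 8 / rtt

    def loss_limit(mss_bytes, rtt, loss_rate, c=0.7):
        """Mathis-model throughput ceiling for a given loss rate (bits/s)."""
        return (mss_bytes * 8 / rtt) * c / sqrt(loss_rate)

    for rtt in (0.001, 0.100):   # 1 ms LAN vs 100 ms WAN
        print(f"RTT {rtt * 1000:5.0f} ms: "
              f"chatty app +{chatty_penalty(50, rtt):.2f} s, "
              f"32 kB buffer {window_limit(32 * 1024, rtt) / 1e6:6.1f} Mb/s, "
              f"0.1% loss {loss_limit(1460, rtt, 0.001) / 1e6:6.1f} Mb/s")

For 1 ms and 100 ms RTTs this prints roughly 0.05 s vs 5 s, 262 vs 2.6 Mb/s, and 259 vs 2.6 Mb/s, consistent with the rounded figures on the slide: every limit is a factor of 100 worse on the long path.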
Review
• For nearly all network flaws
  – The only symptom is reduced performance
  – But this reduction is scaled by RTT
• On short paths many flaws are undetectable
  – False pass for even the best conventional diagnostics
  – Leads to faulty inductive reasoning about flaw locations
  – This is the essence of the "end-to-end" problem
  – Current state-of-the-art diagnosis relies on tomography and complicated inference techniques

Our new tool: pathdiag
• Specify the end-to-end application performance goal
  – Round Trip Time (RTT) of the full path
  – Desired application data rate
• Measure the performance of a short path section
  – Use Web100 to collect detailed statistics
  – Loss, delay, queuing properties, etc.
• Use models to extrapolate results to the full path
  – Assume that the rest of the path is ideal
• Pass/fail on the basis of the extrapolated performance

Deploy as a Diagnostic Server
• Use pathdiag in a Diagnostic Server (DS) in the GigaPoP
• Specify end-to-end target performance from server (S) to client (C): RTT and data rate
• Measure the performance from DS to C
  – Use Web100 in the DS to collect detailed statistics
  – Extrapolate performance assuming an ideal backbone
• Pass/fail on the basis of extrapolated performance
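Pathdiag's extrapolation step can be illustrated in the same spirit. The FYI "loss budget" lines in the example outputs below are consistent with inverting a Mathis-style model to find the largest loss rate that still allows the target data rate, which is what the sketch below does. It is an editorial reconstruction: the constant C = 0.7 is inferred from the printed numbers, not a documented pathdiag parameter.

    # Editorial sketch: invert rate = (MSS/RTT) * C / sqrt(p) to get the loss
    # budget for a target rate.  C = 0.7 is inferred from the sample outputs.
    def loss_budget(target_bps, mss_bytes, rtt, c=0.7):
        """Maximum end-to-end loss probability that still permits target_bps."""
        return (c * mss_bytes * 8 / (rtt * target_bps)) ** 2

    # Example output 1 below: 4 Mb/s target, 1448-byte MSS, 200 ms path.
    p = loss_budget(4e6, 1448, 0.200)
    print(f"{p * 100:.6f}% ({1 / p:.0f} pkts between losses)")
    # prints roughly 0.010274% (9733 pkts between losses)

The same formula with a 10 Mb/s target over 10 ms and 50 ms paths reproduces the 0.657526% and 0.026301% budgets quoted in outputs 2 and 3.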
Example diagnostic output 1

Tester at IP address: xxx.xxx.115.170
Target at IP address: xxx.xxx.247.109
Warning: TCP connection is not using SACK
Fail: Received window scale is 0, it should be 2.
Diagnosis: TCP on the test target is not properly configured for this path.
> See TCP tuning instructions at http://www.psc.edu/networking/perf_tune.html
Pass data rate check: maximum data rate was 4.784178 Mb/s
Fail: loss event rate: 0.025248% (3960 pkts between loss events)
Diagnosis: there is too much background (non-congested) packet loss.
The events averaged 1.750000 losses each, for a total loss rate of 0.0441836%
FYI: To get 4 Mb/s with a 1448 byte MSS on a 200 ms path the total end-to-end loss budget is 0.010274% (9733 pkts between losses).
Warning: could not measure queue length due to previously reported bottlenecks
Diagnosis: there is a bottleneck in the tester itself or test target (e.g. insufficient buffer space or too much CPU load)
> Correct previously identified TCP configuration problems
> Localize all path problems by testing progressively smaller sections of the full path.
FYI: This path may pass with a less strenuous application:
  Try rate=4 Mb/s, rtt=106 ms
Or if you can raise the MTU:
  Try rate=4 Mb/s, rtt=662 ms, mtu=9000
Some events in this run were not completely diagnosed.

Example diagnostic output 2

Tester at IP address: 192.88.115.170
Target at IP address: 128.182.61.117
FYI: TCP negotiated appropriate options: WSCALE=8, SACKok, and Timestamps
Pass data rate check: maximum data rate was 94.206807 Mb/s
Pass: measured loss rate 0.004471% (22364 pkts between loss events)
FYI: To get 10 Mb/s with a 1448 byte MSS on a 10 ms path the total end-to-end loss budget is 0.657526% (152 pkts between losses).
FYI: Measured queue size, Pkts: 33 Bytes: 47784 Drain time: 2.574205 ms
Passed all tests!
FYI: This path may even pass with a more strenuous application:
  Try rate=10 Mb/s, rtt=121 ms
  Try rate=94 Mb/s, rtt=12 ms
Or if you can raise the MTU:
  Try rate=10 Mb/s, rtt=753 ms, mtu=9000
  Try rate=94 Mb/s, rtt=80 ms, mtu=9000

Example diagnostic output 3

Tester at IP address: 192.88.115.170
Target at IP address: 128.2.13.174
Fail: Received window scale is 0, it should be 1.
Diagnosis: TCP on the test target is not properly configured for this path.
> See TCP tuning instructions at http://www.psc.edu/networking/perf_tune.html
Test 1a (7 seconds): Coarse Scan
Test 2a (17 seconds): Search for the knee
Test 2b (10 seconds): Duplex test
Test 3a (8 seconds): Accumulate loss statistics
Test 4a (17 seconds): Measure static queue space
The maximum data rate was 8.838274 Mb/s
This is below the target rate (10.000000).
Diagnosis: there seems to be a hard data rate limit
> Double check the path: is it via the route and equipment that you expect?
Pass: measured loss rate 0.012765% (7834 pkts between loss events)
FYI: To get 10 Mb/s with a 1448 byte MSS on a 50 ms path the total end-to-end loss budget is 0.026301% (3802 pkts between losses).
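The "Received window scale is 0, it should be N" failures in outputs 1 and 3 look like a check that the receiver can advertise roughly twice the bandwidth-delay product of the target path. The sketch below reconstructs that check under this assumption; the factor of two (and the helper itself) are inferences from the printed values, not pathdiag's published algorithm.

    from math import ceil, log2

    # Editorial reconstruction: smallest window scale whose advertised window
    # can cover headroom * bandwidth-delay product.  The headroom factor of 2
    # is inferred from the sample outputs above, not a documented rule.
    def required_wscale(target_bps, rtt, headroom=2.0, max_unscaled=65535):
        window_bytes = headroom * target_bps * rtt / 8
        return max(0, ceil(log2(window_bytes / max_unscaled)))

    print(required_wscale(4e6, 0.200))    # output 1: should be 2
    print(required_wscale(10e6, 0.050))   # output 3: should be 1
    print(required_wscale(10e6, 0.010))   # output 2: 0, an unscaled window is enough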
Key DS features
• Nearly complete coverage of OS and network flaws
  – Does not address flawed routing at all
  – May fail to detect flaws that only affect outbound data
    • Unless you have Web100 in the client or a (future) portable DS
  – May fail to detect a few rare corner cases
  – Eliminates all other false pass results
• Tests become more sensitive on shorter paths
  – Conventional diagnostics become less sensitive
  – Depending on the models, perhaps too sensitive
    • The new problem is false fail (queue space tests)
• Flaws no longer completely mask other flaws
  – A single test often detects several flaws
    • E.g. both OS and network flaws in the same test
  – They can be repaired in parallel

Key features, continued
• Results are specific and less geeky
  – Intended for end users
  – Provides a list of action items to be corrected
    • Failed tests are showstoppers for high-performance applications
  – Details for escalation to network or system admins
• Archived results include raw data
  – Can be reprocessed with updated reporting software

The future
• The current service is "pre-alpha"
  – Please use it so we can validate the tool
    • We can often tell when it got something wrong
  – Please report confusing results
    • So we can improve the reports
  – Please get us involved if it is not helpful
    • We need interesting pathologies
• Will soon have another server near FRGP
  – NCAR in Boulder, CO
• Will someday be in a position to deploy more
  – Should there be one at PSU?

What about flaws in applications?
• NPAD is also thinking about applications
• Using an entirely different collection of techniques
  – Symptom scaling still applies
• Tools to emulate ideal long paths on a LAN
  – Prove or bench-test applications in the lab
  – Also checks some OS and TCP features
• If it fails in the lab, it cannot work on a WAN

For example: classic ssh & scp
• Long-known performance problems
• Recently diagnosed
  – Internal flow control for port forwarding
  – NOT encryption
• Chris Rapier developed a patch
  – Updates flow-control windows from the kernel buffer size
  – Already running on most PSC systems
• See: http://www.psc.edu/networking/projects/hpn-ssh/

NPAD Goal
• Build a minimal tool set that can detect "every" flaw
  – pathdiag: all flaws affecting inbound data
  – Web100 in servers or portable diagnostic servers: all flaws affecting outbound data
  – Application bench test: all application flaws
  – traceroute: routing flaws
• We believe that this is a complete set

http://kirana.psc.edu/NPAD/