The Performance Bottleneck: Application, Computer, or Network
Richard Carlson
Outline
• Why there is a problem
• What can be done to find/fix problems
• Tools you can use
Basic Premise
• An application’s performance should meet your expectations!
• If it doesn’t, you should complain!
• But you have to complain effectively.
Questions
• How many times have you said:
  • What’s wrong with the network?
  • Why is the network so slow?
• Do you have any way to find out?
  • Tools to check local host
  • Tools to check local network
  • Tools to check end-to-end path
Unfortunate Reality
• Every problem, regardless of cause, exhibits the same symptom
  • The application’s performance doesn’t meet the user’s expectations!
Possible Bottlenecks
• Network infrastructure
• Host computer/appliance
• Application design
Simple Network Picture
[Figure: Bob’s Host, Network Infrastructure, Carol’s Host]
Network Infrastructure
Network Infrastructure Bottlenecks
• Links too small
  • Using FastEthernet instead of Gigabit Ethernet
• Links congested
  • Too many hosts crossing this link
• Scenic routing
  • End-to-end path is longer than it needs to be
• Broken equipment
  • Bad NIC, broken wire/cable, cross-talk
• Administrative restrictions
  • Firewalls, filters, shapers, restrictors
Host Computer Bottlenecks
• CPU utilization
  • What else is the processor doing?
• Memory limitations
  • Main memory and network buffers
• I/O bus speed
  • Getting data into and out of the NIC
• Disk access speed
Application Behavior Bottlenecks
• Chatty protocol
  • Lots of short messages between peers
• High reliability protocol
  • Send packet and wait for reply before continuing
• No run-time tuning options
  • Use only default settings
• Blaster protocol
  • Ignore congestion control feedback
Problems, Problems, Problems
• Problems can exist at multiple levels
  • Network infrastructure
  • Host computer
  • Application design
• Multiple problems can exist at the same time
• All problems must be found and fixed before things get better
Transport Protocols 101
• Transmission Control Protocol (TCP)
  • Provides applications with a reliable in-order delivery service
  • The most widely used Internet transport protocol
  • Web, file transfers, email, P2P, remote login
• User Datagram Protocol (UDP)
  • Provides applications with an unreliable delivery service
  • RTP, DVTS, DNS
Outline
• Why there is a problem
• What can be done to find/fix problems
• Tools you can use
Remote Image Processing
• Carol is analyzing astronomical images.
• Bob needs to send a data file containing digital images (50 MB per file) to Carol every ½ hour.
• Bob and Carol are 2,000 miles apart.
• How long should each transfer take?
  • 5 minutes?
  • 1 minute?
  • 5 seconds?
What should we expect?
• Assumptions:
  • 100 Mbps Fast Ethernet is the slowest link
  • 50 msec round trip time
• Bob & Carol calculate:
  • 50 MB * 8 b/B = 400 Mbits
  • 400 Mb / 100 Mb/sec = 4 seconds
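Bob and Carol's estimate can be reproduced in a few lines. This is a minimal sketch of the same best-case arithmetic; the function name is made up for illustration, and it ignores protocol overhead and TCP ramp-up just as the slide does.

```python
def ideal_transfer_seconds(file_megabytes: float, link_mbps: float) -> float:
    """Best-case transfer time: file size in megabits divided by link rate.

    Assumes decimal megabytes (1 MB = 8 Mbit) and no protocol overhead.
    """
    file_megabits = file_megabytes * 8  # 8 bits per byte
    return file_megabits / link_mbps


# 50 MB file over a path whose slowest link is 100 Mbps
print(ideal_transfer_seconds(50, 100))  # 4.0
```

Any measured transfer time far above this number is the cue to start looking for a bottleneck.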
Initial Test Results
• 18 minutes!!!
• This is unacceptable!
First look for a network infrastructure problem
• Use the NDT tester to examine both hosts
Initial NDT testing shows Duplex Mismatch at one end
NDT Found Duplex Mismatch
• Investigation finds that the switch port is configured for 100 Mbps Full-Duplex operation.
• The network administrator corrects the configuration and asks for a re-test.
Duplex Mismatch Corrected
SCP results after Duplex Mismatch Corrected
Intermediate Results
• Time dropped from 18 minutes to 40 seconds.
• Is this acceptable???
• Remember, your calculations said it should take 4 seconds.
  • 400 Mb / 40 sec = 10 Mbps
  • Why are we limited to 10 Mbps?
  • Are you satisfied with 1/10th of the possible performance?
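Turning a measured transfer time back into a rate makes the gap easy to see. The sketch below applies that arithmetic to the two measurements reported so far (18 minutes and 40 seconds for the 50 MB file); the helper name and the labels are illustrative only.

```python
def achieved_mbps(file_megabytes: float, seconds: float) -> float:
    """Effective throughput in Mbps for a file moved in the given time."""
    return file_megabytes * 8 / seconds


LINK_MBPS = 100  # slowest link on the path, from the stated assumptions

for label, seconds in [("with duplex mismatch", 18 * 60),
                       ("mismatch corrected", 40)]:
    rate = achieved_mbps(50, seconds)
    print(f"{label}: {rate:.2f} Mbps ({rate / LINK_MBPS:.1%} of the link)")
```

The second figure works out to the 10 Mbps quoted on the slide, one tenth of what the link could carry.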
Default TCP window size
Calculating the Window Size
• Remember, Bob found the round-trip time was 50 msec
• Calculate the throughput limit imposed by the default window size
  • 85.3 KB * 1024 B/KB * 8 b/B = 698777 b
  • 698777 b / .050 s = 13.98 Mbps
• Stated another way
  • 698777 b / 100 Mb/s = 6.99 msec
  • 43 msec of idle time every RTT
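The window-limit arithmetic can be checked with a short script. This is a sketch under the slide's own numbers (85.3 KB window, 50 msec RTT, 100 Mb/s link), assuming 1 KB = 1024 bytes; the function name is hypothetical.

```python
def window_limited(window_kib: float, rtt_s: float, link_bps: float):
    """Return (throughput_bps, send_time_s, idle_s) for a fixed TCP window.

    A sender can have at most one window of data in flight per round trip,
    so throughput is capped at window / RTT regardless of link speed.
    """
    window_bits = window_kib * 1024 * 8
    throughput_bps = window_bits / rtt_s
    send_time_s = window_bits / link_bps  # time to put one window on the wire
    idle_s = rtt_s - send_time_s          # rest of the RTT is spent waiting
    return throughput_bps, send_time_s, idle_s


tput, busy, idle = window_limited(85.3, 0.050, 100e6)
print(f"{tput / 1e6:.2f} Mbps")       # 13.98 Mbps
print(f"{busy * 1e3:.2f} msec busy")  # 6.99 msec busy
print(f"{idle * 1e3:.2f} msec idle")  # 43.01 msec idle
```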
Calculating the Window Size
• Calculate new window size
  • (100 Mb/s * .050 s) / 8 b/B = 625,000 B ≈ 610 KB
  • Use 8 MB for testing purposes
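The new window size is the bandwidth-delay product of the path. A minimal sketch of that calculation follows, again assuming 1 KB = 1024 bytes; the function name is illustrative.

```python
def bdp_bytes(link_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to fill the path."""
    return link_bps * rtt_s / 8


needed = bdp_bytes(100e6, 0.050)
print(f"{needed:.0f} bytes = {needed / 1024:.2f} KB")  # 625000 bytes = 610.35 KB
```

Any window at or above this size lets the sender keep the 100 Mb/s link busy for the whole round trip, which is why a generous 8 MB is a safe test value.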
Resetting Window Buffer
Intermediate Results
• Use application specific options to manually reset buffer size
  • Fixes problem for this application
  • Doesn’t fix problem for other applications
• Need better ‘default behavior’ for all applications
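Under the hood, an application-specific buffer option usually comes down to a pair of socket options. The sketch below shows how a program might request larger buffers through the standard sockets API; the helper name is hypothetical, and it does not show how any particular tool in this talk implements its option. The operating system may clamp or adjust the request, so the code reads the value back rather than assuming it was honored.

```python
import socket


def make_tuned_socket(buffer_bytes: int) -> socket.socket:
    """Create a TCP socket and request send/receive buffers of the given size."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Ask before connect(): TCP window scaling is negotiated in the handshake.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, buffer_bytes)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, buffer_bytes)
    return sock


requested = 8 * 1024 * 1024  # the 8 MB test value from the slides
sock = make_tuned_socket(requested)
granted = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
print(f"requested {requested} bytes, OS reports {granted} bytes")
sock.close()
```

This only helps the one program that makes the call, which is the limitation the slide points out.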
With TCP window size tuned
Steps so far
• Found and fixed Duplex Mismatch
  • Network infrastructure problem
• Found and fixed TCP window size values
  • Host configuration problem
• Are we done yet?
SCP results with auto-tuning enabled
Intermediate Results
• SCP still runs slower than expected
  • Hint: SSH uses internal buffers
• Design choices by application developers limit performance
  • Patch available from PSC
SCP Results with tuned SCP
Final Results
• Fixed infrastructure problem
• Fixed host configuration problem
• Fixed application configuration problem
  • Achieved target time of 4 seconds to transfer a 50 MB file over 2,000 miles
Follow-up questions
• What would have happened if I had tried the patched SCP version before fixing the TCP buffer problem?
  • I would not have been able to see any improvement.
  • Would I have discarded the patch because “it didn’t work”?
Why is it hard to Find/Fix Problems?
• Network infrastructure is complex
• Network infrastructure is shared
• Network infrastructure consists of multiple components
Shared Infrastructure
• Other applications accessing the network
  • Remote disk access
  • Automatic email checking
  • Heartbeat facilities
• Other computers are attached to the closet switch
  • Uplink to facility infrastructure
• Other users on and off site
  • Uplink from facility to gigapop/backbone
Other Network Components
• DHCP (Dynamic Host Configuration Protocol)
  • At least 2 packets exchanged to configure your host
• DNS (Domain Name System)
  • At least 2 packets exchanged to translate FQDN into IP address
  • Multiple addresses require a sequential search
• Network Security Devices
  • Intrusion Detection, VPN, Firewall
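The "sequential search" over multiple addresses can be sketched with the standard resolver call. This is an illustration only: the function name and timeout are assumptions, and real clients may order or race addresses differently. Each unreachable address adds its own delay before the next one is tried, which is one way name resolution shows up as "the network is slow".

```python
import socket


def connect_first_reachable(host: str, port: int, timeout_s: float = 2.0):
    """Resolve host, then try each returned address in order until one connects.

    Returns (connected_socket, address). Raises the last error if all fail.
    """
    last_error = None
    for family, socktype, proto, _canon, addr in socket.getaddrinfo(
            host, port, type=socket.SOCK_STREAM):
        sock = socket.socket(family, socktype, proto)
        sock.settimeout(timeout_s)
        try:
            sock.connect(addr)
            return sock, addr
        except OSError as err:
            last_error = err  # remember the failure, move on to the next address
            sock.close()
    raise last_error or OSError("no addresses returned for host")
```

Python's own `socket.create_connection` performs a similar loop; the explicit version just makes the per-address cost visible.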
Why is it hard to Find/Fix Problems?
• Computers have multiple components
• Each Operating System (OS) has a unique set of tools to tune the network stack
• Network Interface Cards also have tuning options
• Application appliances come with few knobs and limited options
Computer Components
• Main CPU (clock speed)
• Front & back side bus
• Main memory
• I/O bus (ATA, SCSI, SATA)
• Disk (access speed and size)
Computer Issues
• Lots of internal components with a multi-tasking OS
• Lots of tunable TCP/IP parameters that need to be ‘right’ for each possible connection
Why is it hard to Find/Fix Problems?
• Applications depend on default system settings
• Problems scale with distance
• More access to remote resources
  • 80/20 rule: since the early 1990s, 80% of your traffic leaves your local network
Default System Settings
• For Linux 2.6.13 there are:
  • 11 tunable IP parameters
  • 45 tunable TCP parameters
  • 148 Web100 variables (TCP MIB)
  • Currently no OS ships with default settings that work well over trans-continental distances
• Some applications allow run-time setting of some options
  • 30 settable/viewable IP parameters
  • 24 settable/viewable TCP parameters
  • There are no standard ways to set run-time option ‘flags’
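On a Linux host the system-wide TCP tunables can be listed directly from the /proc filesystem. The sketch below reads whatever entries the running kernel exposes; the function name is made up, the count will differ from the Linux 2.6.13 figures above on other kernels, and on non-Linux systems it simply returns an empty result.

```python
from pathlib import Path


def read_tunables(prefix: str = "tcp_", base: str = "/proc/sys/net/ipv4") -> dict:
    """Return {name: value} for kernel tunables under base matching prefix."""
    tunables = {}
    directory = Path(base)
    if not directory.is_dir():  # not Linux, or /proc is not mounted
        return tunables
    for entry in sorted(directory.glob(prefix + "*")):
        try:
            tunables[entry.name] = entry.read_text().strip()
        except OSError:
            continue  # some entries need elevated privileges to read
    return tunables


tcp = read_tunables()
print(len(tcp), "TCP tunables visible on this host")
for name in ("tcp_rmem", "tcp_wmem", "tcp_window_scaling"):
    print(f"{name} = {tcp.get(name, '(not present)')}")
```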
Application Issues
• Setting tunable parameters to the ‘right’ value
• Getting the protocol ‘right’
Outline
• Why there is a problem
• What can be done to find/fix problems
• Tools you can use
Tools, Tools, Tools
• Ping, Traceroute, Iperf, Tcpdump, Tcptrace, BWCTL, NDT, OWAMP
• AMP, Advisor, Thrulay, Web100, MonALISA, pathchar, NPAD, Pathdiag
• Surveyor, Ethereal, CoralReef, MRTG, Skitter, Cflowd, Cricket, Net100
Active Measurement Tools
• Tools that inject packets into the network to measure some value
  • Available bandwidth
  • Delay/jitter
  • Loss
• May require bi-directional traffic or synchronized hosts
• May require running test program on both hosts
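The idea behind an active measurement tool can be shown with a toy probe. The sketch below is not how any of the listed tools work internally; it is an assumption-laden illustration that runs a UDP echo responder on the local host, injects numbered probe packets, and reports mean round-trip delay, jitter (standard deviation of the delay), and loss. Note that it needs a cooperating program on the far end, just as the slide warns.

```python
import socket
import statistics
import threading
import time


def run_echo_server(sock: socket.socket, stop: threading.Event) -> None:
    """Echo every datagram back to its sender until stop is set."""
    sock.settimeout(0.1)
    while not stop.is_set():
        try:
            data, peer = sock.recvfrom(2048)
        except socket.timeout:
            continue
        sock.sendto(data, peer)


def probe(target, count: int = 20, timeout_s: float = 0.5):
    """Send count probes to target; return (mean_rtt_s, jitter_s, loss_fraction)."""
    rtts = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout_s)
        for seq in range(count):
            payload = seq.to_bytes(4, "big")  # sequence number identifies the probe
            start = time.perf_counter()
            sock.sendto(payload, target)
            try:
                data, _ = sock.recvfrom(2048)
            except socket.timeout:
                continue  # no reply in time: counted as lost
            if data == payload:
                rtts.append(time.perf_counter() - start)
    loss = 1 - len(rtts) / count
    jitter = statistics.pstdev(rtts) if len(rtts) > 1 else 0.0
    mean = statistics.mean(rtts) if rtts else float("nan")
    return mean, jitter, loss


def demo():
    """Probe an echo responder started on the loopback interface."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    stop = threading.Event()
    worker = threading.Thread(target=run_echo_server, args=(server, stop), daemon=True)
    worker.start()
    try:
        return probe(server.getsockname())
    finally:
        stop.set()
        worker.join()
        server.close()


mean, jitter, loss = demo()
print(f"mean RTT {mean * 1e6:.0f} usec, jitter {jitter * 1e6:.0f} usec, loss {loss:.0%}")
```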
Passive Measurement Tools
• Tools that monitor existing traffic on the network and extract some information
  • Bandwidth used
  • Jitter
  • Loss rate
• May generate some privacy and/or security concerns
How do you set realistic Expectations?
• Assume network bandwidth exists or find out what the limits are
  • Local LAN connection
  • Site access link
• Monitor the link utilization occasionally
  • Weathermap
  • MRTG graphs
• Look at your host config/utilization
  • What is the CPU utilization?
Distance Matters
• It’s harder to go fast over a long distance
  • TCP congestion control requires numerous round trips to prevent flooding the network
  • TCP buffer limits can stop the sender from injecting new data into the network
  • Applications can exhibit poor behavior when used over long distances
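A rough count of those round trips shows why distance hurts. The sketch below models idealized TCP slow start, where the congestion window doubles every round trip until it covers the bandwidth-delay product. The 1460-byte segment size, the 2-segment initial window, and the absence of loss are all assumptions chosen for illustration, not measurements from this talk.

```python
import math


def slow_start_rtts(link_bps: float, rtt_s: float,
                    mss_bytes: int = 1460, initial_segments: int = 2) -> int:
    """Round trips of idealized slow start needed to fill the path.

    Assumes the window doubles each RTT with no loss and no buffer limit.
    """
    bdp_segments = math.ceil(link_bps * rtt_s / 8 / mss_bytes)
    cwnd = initial_segments
    rtts = 0
    while cwnd < bdp_segments:
        cwnd *= 2
        rtts += 1
    return rtts


rtts = slow_start_rtts(100e6, 0.050)
print(f"{rtts} round trips, about {rtts * 0.050:.2f} s before the link is full")
```

Under these assumptions the 100 Mbps, 50 msec path needs 8 round trips (0.4 s) just to ramp up, a tenth of the whole 4 second target; on a 1 msec LAN the same ramp would take a few milliseconds.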
Ethernet, FastEthernet, Gigabit Ethernet, 10 GE
• 10/100/1000 auto-sensing NICs are common today
• Most facilities have installed 10/100 switched infrastructure
• Access network links are currently the limiting factor in most networks
• Backbone networks are 10 Gigabit/sec
Wireless LANs
• 802.11b - 11 Mbps (expect 5)
• 802.11a - 54 Mbps (expect 15)
• 802.11g - 54 Mbps (expect 25)
• Expect large variations in speed due to radio signal propagation
Focus on 2 tools
• Existing NDT tool
  • Allows users to test network path for a limited number of common problems
• Emerging PerfSonar tool
  • Allows users to retrieve network path data from major national and international RENs
Network Diagnostic Tool (NDT)
• Measure performance to the user’s desktop
• Identify real problems for real users
  • Network infrastructure is the problem
  • Host tuning issues are the problem
• Make tool simple to use and understand
• Make tool useful for users and network administrators
• Web-based Java applet allows testing from any browser
Installing your own server
• All Internet2 tools are FREE
  • Visit http://e2epi.internet2.edu/ for details
  • Workshops are available to help your administrator get them up and running (http://e2epi.internet2.edu/net-perf-wkshp/)
• Encourage your peers to start testing
• Encourage your vendors to include the client programs
NPToolkit Bootable CD
• Knoppix based Live-CD
• Contains listed tools
• Download from Internet2
• Ask for a pre-built CD-ROM
http://e2epi.internet2.edu/network-performance-toolkit/network-performance-toolkit.iso
PerfSonar – Next Steps in Performance Monitoring
• New initiative involving multiple partners
  • ESnet (DOE labs)
  • GEANT (European Research and Education network)
  • Internet2 (Abilene and connectors)
• Sample tool (Joe Metzger, ESnet)
  • https://performance.es.net/cgi-bin/perfsonar-trace.cgi
Traceroute Visualizer
Abilene Weather Map http://loadrunner.uits.iu.edu/weathermaps/abilene/
Windows XP Performance
Google it!
• Enter “tuning tcp” into the Google search engine.
• Top 2 hits are:
  • http://www.psc.edu/networking/perf_tune.html
  • http://www-didc.lbl.gov/TCP-tuning/TCP-tuning.html
PSC Tuning Page
LBNL Tuning Page
Conclusions
• Applications can fully utilize the network
• All problems have a single symptom
  • All problems must be found and fixed before things get better
  • Some people stop investigating before finding all problems
• Tools exist, and more are being developed, to make it easier to find problems