The Performance Bottleneck Application, Computer, or Network

Richard Carlson
eVLBI Workshop – Performance Tuning Tutorial
September 17, 2006

Outline
• Why there is a problem
• What can be done to find/fix problems
• Tools you can use

Basic Premise
• An application’s performance should meet your expectations!
• If it doesn’t, you should complain!
• But you have to complain effectively.

Questions
• How many times have you said:
  • What’s wrong with the network?
  • Why is the network so slow?
• Do you have any way to find out?
  • Tools to check local host
  • Tools to check local network
  • Tools to check end-to-end path

Unfortunate Reality
• Every problem, regardless of cause, exhibits the same symptom
• The application’s performance doesn’t meet the user’s expectations!

Possible Bottlenecks
• Network infrastructure
• Host computer/appliance
• Application design

Simple Network Picture
[Figure: Bob’s Host, Network Infrastructure, Carol’s Host]

Network Infrastructure Bottlenecks
• Links too small
  • Using FastEthernet instead of Gigabit Ethernet
• Links congested
  • Too many hosts crossing this link
• Scenic routing
  • End-to-end path is longer than it needs to be
• Broken equipment
  • Bad NIC, broken wire/cable, cross-talk
• Administrative restrictions
  • Firewalls, filters, shapers, restrictors

Host Computer Bottlenecks
• CPU utilization
  • What else is the processor doing?
• Memory limitations
  • Main memory and network buffers
• I/O bus speed
  • Getting data into and out of the NIC
• Disk access speed

Application Behavior Bottlenecks
• Chatty protocol
  • Lots of short messages between peers
• High reliability protocol
  • Send packet and wait for reply before continuing
• No run-time tuning options
  • Use only default settings
• Blaster protocol
  • Ignore congestion control feedback
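
A rough sketch of why a send-and-wait protocol suffers over distance: with only one packet in flight per round trip, throughput cannot exceed packet size divided by round-trip time. The 1500-byte packet size and the function name are assumptions for illustration; the 50 msec round-trip time is the value used in the worked example in this tutorial.

```python
def stop_and_wait_mbps(packet_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on throughput when one packet is sent per round trip."""
    bits_per_round_trip = packet_bytes * 8
    return bits_per_round_trip / rtt_seconds / 1e6


# Assumed values: 1500-byte packet, 50 msec round-trip time.
print(f"{stop_and_wait_mbps(1500, 0.050):.2f} Mbps")  # 0.24 Mbps
```

Under these assumptions the link speed barely matters: the round-trip time alone caps the protocol at a fraction of a megabit per second.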

Problems, Problems, Problems
• Problems can exist at multiple levels
  • Network infrastructure
  • Host computer
  • Application design
• Multiple problems can exist at the same time
• All problems must be found and fixed before things get better

Transport Protocols 101
• Transmission Control Protocol (TCP)
  • Provides applications with a reliable in-order delivery service
  • The most widely used Internet transport protocol
  • Web, file transfers, email, P2P, remote login
• User Datagram Protocol (UDP)
  • Provides applications with an unreliable delivery service
  • RTP, DVTS, DNS

Outline
• Why there is a problem
• What can be done to find/fix problems
• Tools you can use

Remote Image Processing
• Carol is analyzing astronomical images.
• Bob needs to send a data file containing digital images (50 MB per file) to Carol every ½ hour.
• Bob and Carol are 2,000 miles apart.
• How long should each transfer take?
  • 5 minutes?
  • 1 minute?
  • 5 seconds?

What should we expect?
• Assumptions:
  • 100 Mbps Fast Ethernet is the slowest link
  • 50 msec round trip time
• Bob & Carol calculate:
  • 50 MB * 8 = 400 Mbits
  • 400 Mb / 100 Mb/sec = 4 seconds
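
Bob and Carol's estimate can be reproduced with a few lines of arithmetic. This is only a sketch: the function name is made up for illustration, it treats 1 MB as 10^6 bytes as the slide does, and it ignores protocol overhead and TCP start-up.

```python
def ideal_transfer_seconds(file_megabytes: float, link_mbps: float) -> float:
    """Best-case transfer time: file size in megabits over the slowest link rate."""
    file_megabits = file_megabytes * 8
    return file_megabits / link_mbps


# 50 MB file over a 100 Mbps Fast Ethernet bottleneck.
print(ideal_transfer_seconds(50, 100))  # 4.0
```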

Initial Test Results
• 18 minutes!!!
• This is unacceptable!

First look for a network infrastructure problem
• Use NDT tester to examine both hosts

Initial NDT testing shows Duplex Mismatch at one end

NDT Found Duplex Mismatch
• Investigation finds that the switch port is configured for 100 Mbps Full-Duplex operation.
• Network administrator corrects the configuration and asks for a re-test

Duplex Mismatch Corrected

SCP results after Duplex Mismatch Corrected

Intermediate Results
• Time dropped from 18 minutes to 40 seconds.
• Is this acceptable???
• Remember, your calculations said it should take 4 seconds.
  • 400 Mb / 40 sec = 10 Mbps
  • Why are we limited to 10 Mbps?
  • Are you satisfied with 1/10th of the possible performance?
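
The same arithmetic, run in reverse, turns a measured transfer time into an achieved rate. The sketch below uses the two measured times from the case study (18 minutes and 40 seconds); the function name and the 100 Mbps reference constant are illustrative choices.

```python
LINK_MBPS = 100  # slowest link in the path, per the stated assumptions


def achieved_mbps(file_megabytes: float, elapsed_seconds: float) -> float:
    """Average throughput actually delivered for one file transfer."""
    return file_megabytes * 8 / elapsed_seconds


for label, seconds in (("duplex mismatch", 18 * 60), ("mismatch fixed", 40)):
    rate = achieved_mbps(50, seconds)
    share = rate / LINK_MBPS
    print(f"{label}: {rate:.2f} Mbps, {share:.1%} of the link")
```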

Default TCP window size

Calculating the Window Size
• Remember Bob found the round-trip time was 50 msec
• Calculate window size limit
  • 85.3 KB * 8 b/B = 698777 b
  • 698777 b / .050 s = 13.98 Mbps
• Stated another way
  • 698777 b / 100 Mb/s = 6.99 msec
  • 43 msec of idle time every RTT
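
The window-limit calculation generalizes to any window and round-trip time: a sender can have at most one window of data outstanding per round trip. The sketch below reruns the slide's numbers; the function and constant names are illustrative, and 1 KB is taken as 1024 bytes, which is what reproduces the 698777-bit figure.

```python
def window_limited_mbps(window_bytes: float, rtt_seconds: float) -> float:
    """Throughput ceiling when one full window is sent per round trip."""
    return window_bytes * 8 / rtt_seconds / 1e6


def burst_send_time_ms(window_bytes: float, link_mbps: float) -> float:
    """Time the link is busy transmitting one window's worth of data."""
    return window_bytes * 8 / (link_mbps * 1e6) * 1000


DEFAULT_WINDOW_BYTES = 85.3 * 1024  # default window from the slide
RTT_SECONDS = 0.050

rate = window_limited_mbps(DEFAULT_WINDOW_BYTES, RTT_SECONDS)
busy_ms = burst_send_time_ms(DEFAULT_WINDOW_BYTES, 100)
idle_ms = RTT_SECONDS * 1000 - busy_ms
print(f"{rate:.2f} Mbps ceiling, busy {busy_ms:.2f} ms, idle {idle_ms:.2f} ms per RTT")
```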

Calculating the Window Size
• Calculate new window size
  • (100 Mb/s * .050 s) / 8 b/B = 610.3 KB
  • Use 8MB for testing purposes
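
The new window size is the bandwidth-delay product of the path: the amount of data that must be in flight to keep the link full for one round trip. A minimal sketch, with an illustrative function name and 1 KB taken as 1024 bytes:

```python
def bandwidth_delay_product_bytes(link_mbps: float, rtt_seconds: float) -> float:
    """Bytes that must be in flight to fill the link for one round trip."""
    return link_mbps * 1e6 * rtt_seconds / 8


bdp = bandwidth_delay_product_bytes(100, 0.050)
print(f"{bdp:.0f} bytes, about {bdp / 1024:.0f} KB")
```

For the 100 Mbps, 50 msec path this comes to 625,000 bytes, roughly 610 KB, so the 8 MB test value leaves a wide margin.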

Resetting Window Buffer

Intermediate Results
• Use application specific options to manually reset buffer size
  • Fixes problem for this application
  • Doesn’t fix problem for other applications
• Need better ‘default behavior’ for all applications
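
Inside an application, a buffer-size option of this kind usually ends up as a pair of socket options set before the connection is opened. The sketch below shows the general pattern with the standard `SO_SNDBUF` and `SO_RCVBUF` options; the helper name is hypothetical, this is not how any particular tool implements it, and the operating system may clamp or adjust the requested value according to its own limits.

```python
import socket


def make_tuned_socket(buffer_bytes: int) -> socket.socket:
    """Create a TCP socket and request larger send/receive buffers."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Request the buffers before connecting so the size is in place
    # when the TCP connection is negotiated.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, buffer_bytes)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, buffer_bytes)
    return sock


requested = 8 * 1024 * 1024  # the 8 MB test value from the slides
sock = make_tuned_socket(requested)
# The OS reports what it actually granted, which may differ from the request.
granted = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
print(f"requested {requested} bytes, OS reports {granted} bytes")
sock.close()
```

Comparing the requested and granted values is a quick check on whether a system-wide limit is silently capping the application's setting.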

With TCP window size tuned

Steps so far
• Found and fixed Duplex Mismatch
  • Network Infrastructure problem
• Found and fixed TCP window size values
  • Host configuration problem
• Are we done yet?

SCP results with auto-tuning enabled

Intermediate Results
• SCP still runs slower than expected
  • Hint: SSH uses internal buffers
• Design choices by application developers limit performance
  • Patch available from PSC

SCP Results with tuned SCP

Final Results
• Fixed infrastructure problem
• Fixed host configuration problem
• Fixed application configuration problem
• Achieved target time of 4 seconds to transfer a 50 MB file over 2,000 miles

Follow-up questions
• What would have happened if I tried the patched SCP version before fixing the TCP buffer problem?
  • Would not have been able to see improvement.
  • Discard patch because “it didn’t work”?

Why is it hard to Find/Fix Problems?
• Network infrastructure is complex
• Network infrastructure is shared
• Network infrastructure consists of multiple components

Shared Infrastructure
• Other applications accessing the network
  • Remote disk access
  • Automatic email checking
  • Heartbeat facilities
• Other computers are attached to the closet switch
  • Uplink to facility infrastructure
• Other users on and off site
  • Uplink from facility to gigapop/backbone

Other Network Components
• DHCP (Dynamic Host Configuration Protocol)
  • At least 2 packets exchanged to configure your host
• DNS (Domain Name System)
  • At least 2 packets exchanged to translate FQDN into IP address
  • Multiple addresses require a sequential search
• Network Security Devices
  • Intrusion Detection, VPN, Firewall

Why is it hard to Find/Fix Problems?
• Computers have multiple components
• Each Operating System (OS) has a unique set of tools to tune the network stack
• Network Interface Cards also have tuning options
• Application Appliances come with few knobs and limited options

Computer Components
• Main CPU (clock speed)
• Front & Back side bus
• Main Memory
• I/O Bus (ATA, SCSI, SATA)
• Disk (access speed and size)

Computer Issues
• Lots of internal components with multi-tasking OS
• Lots of tunable TCP/IP parameters that need to be ‘right’ for each possible connection

Why is it hard to Find/Fix Problems?
• Applications depend on default system settings
• Problems scale with distance
• More access to remote resources
  • 80/20 rule: since the early 1990s, 80% of your traffic leaves your local network

Default System Settings
• For Linux 2.6.13 there are:
  • 11 tunable IP parameters
  • 45 tunable TCP parameters
  • 148 Web100 variables (TCP MIB)
  • Currently no OS ships with default settings that work well over trans-continental distances
• Some applications allow run-time setting of some options
  • 30 settable/viewable IP parameters
  • 24 settable/viewable TCP parameters
  • There are no standard ways to set run-time option ‘flags’
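
On Linux, many of these tunable parameters can be read from files under `/proc/sys`. The sketch below looks at two of them, `net.ipv4.tcp_rmem` and `net.ipv4.tcp_wmem`, each of which holds a minimum, default, and maximum buffer size in bytes. The helper names are hypothetical, and the string passed to the parser at the end is only an illustrative example, not a recommended setting.

```python
from pathlib import Path
from typing import Optional


def parse_tcp_mem_triple(text: str) -> dict:
    """Split a 'min default max' sysctl value into named integer fields."""
    minimum, default, maximum = (int(field) for field in text.split())
    return {"min": minimum, "default": default, "max": maximum}


def read_sysctl(name: str) -> Optional[str]:
    """Read a sysctl value from /proc/sys, or return None if unavailable."""
    path = Path("/proc/sys") / name.replace(".", "/")
    try:
        return path.read_text().strip()
    except OSError:
        return None  # not Linux, or the parameter does not exist


for name in ("net.ipv4.tcp_rmem", "net.ipv4.tcp_wmem"):
    raw = read_sysctl(name)
    if raw is not None:
        print(name, parse_tcp_mem_triple(raw))

# Illustrative input only: 87380 bytes is about 85.3 KB.
print(parse_tcp_mem_triple("4096 87380 6291456"))
```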

Application Issues
• Setting tunable parameters to the ‘right’ value
• Getting the protocol ‘right’

Outline
• Why there is a problem
• What can be done to find/fix problems
• Tools you can use

Tools, Tools, Tools
• Ping, Traceroute, Iperf, Tcpdump, Tcptrace, BWCTL, NDT, OWAMP
• AMP, Advisor, Thrulay, Web100, MonaLisa, pathchar, NPAD, Pathdiag
• Surveyor, Ethereal, CoralReef, MRTG, Skitter, Cflowd, Cricket, Net100

Active Measurement Tools
• Tools that inject packets into the network to measure some value
  • Available Bandwidth
  • Delay/Jitter
  • Loss
• May require bi-directional traffic or synchronized hosts
• May require running test program on both hosts
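
Whatever tool sends the probes, the results reduce to a few summary values. The sketch below summarizes a list of round-trip times, with `None` marking a lost probe. The sample values are invented, and jitter is computed here as the mean absolute difference between consecutive answered probes, which is one simple definition among several in use.

```python
from statistics import mean
from typing import Optional, Sequence


def summarize_probes(rtts_ms: Sequence[Optional[float]]) -> dict:
    """Summarize delay, jitter, and loss from per-probe round-trip times."""
    answered = [rtt for rtt in rtts_ms if rtt is not None]
    loss_rate = 1 - len(answered) / len(rtts_ms)
    avg_delay = mean(answered) if answered else None
    # Jitter: mean absolute change between consecutive answered probes.
    changes = [abs(later - earlier) for earlier, later in zip(answered, answered[1:])]
    jitter = mean(changes) if changes else 0.0
    return {"avg_delay_ms": avg_delay, "jitter_ms": jitter, "loss_rate": loss_rate}


# Invented sample: five probes, the third one lost.
print(summarize_probes([50.0, 52.0, None, 51.0, 49.0]))
```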

Passive Measurement Tools
• Tools that monitor existing traffic on the network and extract some information
  • Bandwidth used
  • Jitter
  • Loss rate
• May generate some privacy and/or security concerns

How do you set realistic Expectations?
• Assume network bandwidth exists or find out what the limits are
  • Local LAN connection
  • Site Access link
• Monitor the link utilization occasionally
  • Weathermap
  • MRTG graphs
• Look at your host config/utilization
  • What is the CPU utilization
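
Utilization graphs are typically derived from two readings of an interface byte counter taken a known interval apart. A minimal sketch of that arithmetic follows; the counter values, the 300-second interval, and the function name are assumptions for illustration, and counter wrap-around is ignored.

```python
def link_utilization(bytes_start: int, bytes_end: int,
                     interval_seconds: float, link_mbps: float) -> float:
    """Fraction of link capacity used between two byte-counter readings."""
    bits_transferred = (bytes_end - bytes_start) * 8
    average_bps = bits_transferred / interval_seconds
    return average_bps / (link_mbps * 1e6)


# Assumed readings: 1.5 GB moved in 300 seconds on a 100 Mbps link.
share = link_utilization(0, 1_500_000_000, 300, 100)
print(f"{share:.0%} utilized")  # 40% utilized
```

An average like this hides short bursts, so a link that looks lightly loaded over five minutes can still be congested for seconds at a time.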

Distance Matters
• It’s harder to go fast over a long distance
• TCP congestion control requires numerous round trips to prevent flooding the network
• TCP buffer limits can stop the sender from injecting new data into the network
• Applications can exhibit poor behavior when used over long distances
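
The cost of those round trips can be estimated with an idealized model of TCP slow start, in which the window doubles every round trip until it reaches the target. The sketch below makes strong simplifying assumptions (no loss, no delayed ACKs, a 1460-byte segment, a one-segment initial window), so it gives a rough figure rather than a prediction for any real stack.

```python
import math


def slow_start_round_trips(target_window_bytes: float,
                           segment_bytes: int = 1460,
                           initial_segments: int = 1) -> int:
    """Round trips for an idealized doubling window to reach the target size."""
    target_segments = math.ceil(target_window_bytes / segment_bytes)
    if target_segments <= initial_segments:
        return 0
    return math.ceil(math.log2(target_segments / initial_segments))


# Target: the 625,000-byte window needed to fill 100 Mbps at 50 msec RTT.
rounds = slow_start_round_trips(625_000)
print(f"{rounds} round trips, about {rounds * 50} msec at a 50 msec RTT")
```

The number of round trips depends only on the window needed, but each one costs a full RTT, so the same ramp-up takes ten times longer on a 50 msec path than on a 5 msec one.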

Ethernet, FastEthernet, Gigabit Ethernet, 10 GE
• 10/100/1000 auto-sensing NICs are common today
• Most facilities have installed 10/100 switched infrastructure
• Access network links are currently the limiting factor in most networks
• Backbone networks are 10 Gigabit/sec

Wireless LANs
• 802.11b - 11 Mbps (expect 5)
• 802.11a - 54 Mbps (expect 15)
• 802.11g - 54 Mbps (expect 25)
• Expect large variations in speed due to radio signal propagation

Focus on 2 tools
• Existing NDT tool
  • Allows users to test a network path for a limited number of common problems
• Emerging PerfSonar tool
  • Allows users to retrieve network path data from major national and international research and education networks (RENs)

Network Diagnostic Tool (NDT)
• Measure performance to the user’s desktop
• Identify real problems for real users
  • Network infrastructure is the problem
  • Host tuning issues are the problem
• Make tool simple to use and understand
• Make tool useful for users and network administrators
• Web-based Java applet allows testing from any browser

Installing your own server
• All Internet2 tools are FREE
  • Visit http://e2epi.internet2.edu/ for details
  • Workshops are available to help your administrator get them up and running (http://e2epi.internet2.edu/net-perf-wkshp/)
• Encourage your peers to start testing
• Encourage your vendors to include the client programs

NPToolkit
• Bootable CD
• Knoppix based Live-CD
• Contains listed tools
• Download from Internet2
  • http://e2epi.internet2.edu/network-performance-toolkit/network-performance-toolkit.iso
• Ask for a pre-built CD-ROM

PerfSonar – Next Steps in Performance Monitoring
• New Initiative involving multiple partners
  • ESnet (DOE labs)
  • GEANT (European Research and Education network)
  • Internet2 (Abilene and connectors)
• Sample tool (Joe Metzger, ESnet)
  • https://performance.es.net/cgi-bin/perfsonar-trace.cgi

Traceroute Visualizer

Abilene Weather Map http://loadrunner.uits.iu.edu/weathermaps/abilene/

Windows XP Performance

Google it!
• Enter “tuning tcp” into the Google search engine.
• Top 2 hits are:
  • http://www.psc.edu/networking/perf_tune.html
  • http://www-didc.lbl.gov/TCP-tuning/TCP-tuning.html

PSC Tuning Page

LBNL Tuning Page

Conclusions
• Applications can fully utilize the network
• All problems have a single symptom
  • All problems must be found and fixed before things get better
  • Some people stop investigating before finding all problems
• Tools exist, and more are being developed, to make it easier to find problems