Experience with Loss-Based Congestion Controlled TCP Stacks
Experience with Loss-Based
Congestion Controlled TCP Stacks
Yee-Ting Li
University College London
Introduction
Transport of data for next-generation applications
Network hardware is capable of gigabits per second
Current 'Vanilla' TCP not capable over long distances at high throughput
New TCP stacks have been introduced to rectify the problem
Investigation into the performance, bottlenecks and deployability of the new algorithms
Transmission Control Protocol
Connection oriented
Reliable Transport of Data
Window based
Congestion and Flow Control to prevent network
collapse
Provides ‘fairness’ between competing streams
20 Years old
Originally designed for kbit/sec pipes
TCP Algorithms
Based on two algorithms that determine the rate at which data is sent:
Slowstart: probe for the initial bandwidth
Congestion Avoidance: maintain a steady-state transfer rate
Focus on steady state: probe for increases in available bandwidth, whilst backing off if congestion is detected (through loss)
Maintained through a 'congestion window' cwnd that regulates the number of unacknowledged packets allowed on the connection
Size of window approximately equals the bandwidth-delay product: it determines the window size needed to obtain a given bandwidth at a given delay
Window = Bandwidth x Delay
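The window-sizing rule can be checked with a quick calculation (the link speeds and RTTs are the testbed values quoted later in the talk; the 1500-byte packet size is an assumed Ethernet MTU):

```python
def bdp_packets(bandwidth_bps, rtt_sec, packet_bytes=1500):
    """Bandwidth-delay product expressed as a window of full-size packets."""
    bdp_bits = bandwidth_bps * rtt_sec
    return bdp_bits / (packet_bytes * 8)

# DataTAG-like path: 1 Gbit/sec bottleneck, 120 msec RTT -> ~10000 packets
wan_window = bdp_packets(1e9, 0.120)
# MB-NG-like path: 1 Gbit/sec bottleneck, 6 msec RTT -> ~500 packets
uk_window = bdp_packets(1e9, 0.006)
```

The long-RTT path needs a window twenty times larger for the same bottleneck capacity, which is why the WAN case stresses the congestion window algorithms.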
Algorithms
Congestion Avoidance
For every packet (ack) received by the sender: cwnd = cwnd + 1/cwnd
When loss is detected (through dupacks): cwnd = cwnd / 2
Growth of cwnd is determined by:
The RTT of the connection: when the RTT is high, cwnd grows slowly (because acks clock the increase)
The loss rate on the line: high loss means that cwnd never achieves a large value
The capacity of the link: allows for a large cwnd value (when loss is low)
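The two rules above can be sketched as a toy per-RTT simulation (ignoring slow start, timeouts and delayed acks):

```python
def vanilla_rtt_step(cwnd, loss):
    """One RTT of Vanilla TCP congestion avoidance.

    Each of the cwnd acks in an RTT adds 1/cwnd, so cwnd grows by
    roughly one packet per RTT; a loss halves the window.
    """
    if loss:
        return cwnd / 2
    return cwnd + 1  # cwnd acks x 1/cwnd each

cwnd = 100.0
for rtt in range(50):
    cwnd = vanilla_rtt_step(cwnd, loss=(rtt == 24))
# 25 RTTs of +1 (to 124), one halving (to 62), 25 more RTTs of +1 (to 87)
```

The linear growth is what makes recovery painfully slow at high bandwidth-delay products: regaining 5000 packets of window at 120 msec per RTT takes ten minutes.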
Current Methods of Achieving High Throughput
Advantages
Achieves good throughput
No changes to kernels required
Disadvantages
Have to manually tune the number of flows
May induce extra loss on lossy networks
Need to reprogram/recompile software
New TCP Stacks
Modify the congestion control algorithm to improve response times
All based on modifying the cwnd growth and decrease values
Define:
a = increase in data packets per window of acks
b = decrease factor upon congestion
To maintain compatibility (and hence network stability and fairness), for small cwnd values: mode switch from Vanilla to the new TCP
HSTCP
Designed by Sally Floyd
Determine a and b as a function of cwnd:
a = a(cwnd)
b = b(cwnd)
Gradual improvement in throughput as we approach larger bandwidth-delay products
Current implementation focused on performance up to 10 Gbit/sec: sets a linear (log-log) relation between loss and throughput (the response function)
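A sketch of Floyd's functions, following the formulas in RFC 3649 (the constants Low_Window = 38, High_Window = 83000 and the decrease bounds 0.5 to 0.1 are the RFC's defaults; the actual kernel lookup table may differ):

```python
import math

LOW_W, HIGH_W = 38, 83000   # cwnd range over which HSTCP scales up
B_LOW, B_HIGH = 0.5, 0.1    # decrease factor at each end of the range

def hstcp_b(cwnd):
    """Decrease factor b(cwnd): log-linear from 0.5 down to 0.1."""
    if cwnd <= LOW_W:
        return B_LOW
    frac = (math.log(cwnd) - math.log(LOW_W)) / (math.log(HIGH_W) - math.log(LOW_W))
    return B_LOW + frac * (B_HIGH - B_LOW)

def hstcp_a(cwnd):
    """Increase a(cwnd), derived from the HSTCP response function
    p(w) ~ 0.078 / w^1.2 (RFC 3649, Section 5)."""
    if cwnd <= LOW_W:
        return 1.0
    b = hstcp_b(cwnd)
    p = 0.078 / cwnd ** 1.2
    return cwnd ** 2 * p * 2 * b / (2 - b)
```

At cwnd = 38 this reduces to Vanilla's a = 1, b = 0.5, giving the mode-switch compatibility for small windows; at large windows a grows and b shrinks, which is the gradual improvement mentioned above.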
Scalable TCP
Designed by Tom Kelly
Define a and b to be constant:
a: cwnd = cwnd + a (per ack)
b: cwnd = cwnd - b x cwnd (per loss)
Intrinsic scaling property that has the same
performance over any link (beyond the initial
threshold)
Recommended settings
a = 1/100
b = 1/8
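With the recommended constants the update rules can be written directly (a toy sketch; the real kernel works in integer packet counts):

```python
def scalable_on_ack(cwnd, a=0.01):
    """Per-ack additive constant: cwnd grows by a fixed fraction per RTT
    (cwnd acks x a), so growth is exponential in time."""
    return cwnd + a

def scalable_on_loss(cwnd, b=0.125):
    """Multiplicative decrease by b = 1/8 rather than Vanilla's 1/2."""
    return cwnd - b * cwnd
```

Because each RTT adds roughly a x cwnd, the time to recover the 1/8 lost at a congestion event is a constant number of RTTs (about 13 for these settings), independent of link speed: this is the scaling property.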
H-TCP
Designed by Doug Leith and Robert Shorten
Define a mode switch so that after congestion we do normal Vanilla
After a predefined period ∆L, switch to a high-performance a:
∆i ≤ ∆L: a = 1
∆i > ∆L: a = 1 + (∆ - ∆L) + [(∆ - ∆L)/20]^2
Upon loss, drop by:
|[Bimax(k+1) - Bimax(k)] / Bimax(k)| > 0.2: b = 0.5
Else: b = RTTmin/RTTmax
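The two rules can be sketched as follows (∆L = 1 second is used here as an illustrative default; ∆ is the time since the last congestion event and Bimax the achieved bandwidth in each congestion epoch):

```python
def htcp_a(delta, delta_L=1.0):
    """Increase parameter as a function of time since last loss (seconds).
    Within delta_L of a loss, H-TCP behaves like Vanilla (a = 1)."""
    if delta <= delta_L:
        return 1.0
    t = delta - delta_L
    return 1.0 + t + (t / 20.0) ** 2

def htcp_b(bmax_prev, bmax_curr, rtt_min, rtt_max):
    """Adaptive backoff: fall back to Vanilla's 0.5 when the achieved
    bandwidth changed by more than 20%, otherwise back off only to
    RTTmin/RTTmax (keeping the queue just drained)."""
    if abs((bmax_curr - bmax_prev) / bmax_prev) > 0.2:
        return 0.5
    return rtt_min / rtt_max
```

The quadratic term means a ramps up aggressively only on long-lived, loss-free transfers, so short flows still see Vanilla-like behaviour.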
Implementation
All the new stacks have their own implementation
Small differences between implementations mean we would be comparing kernel differences rather than just the algorithmic differences
Led to the development of a 'test platform' kernel, altAIMD
Implements all three stacks via a simple sysctl switch
Also incorporates switches for certain undesirable kernel 'features':
moderate_cwnd()
IFQ
Added extra features for testing/evaluation purposes:
Appropriate Byte Counting (RFC 3465)
Inducible packet loss (at recv)
Web100 TCP logging (cwnd etc.)
Networks Under Test
Networks:
DataTAG: StarLight to CERN via Cisco 7600s and a Juniper router; bottleneck capacity 1 Gb/sec, RTT 120 msec
MB-NG: Manchester to UCL via Cisco 7600s; bottleneck capacity 1 Gb/sec, RTT 6 msec
Graph/Demo
Mode switch between stacks (Scalable TCP, Vanilla TCP, HS-TCP) under constant packet drop
Comparison against theory: the response function
Self Similar Background Tests
Results skewed
Not comparing differences in TCP algorithms!
Not useful results!
SACK …
Look into what's happening at the algorithmic level:
Scalable TCP on MB-NG with 200 Mbit/sec CBR background
Strange hiccups in cwnd; the only correlation is with SACK arrivals
SACKS
Supply the sender with information about which segments the receiver has
Current implementation in 2.4 and 2.6 walks through the entire SACK list for each SACK
The sender infers the missing packets to resend
Aids recovery during loss and prevents timeouts
Very CPU intensive
Can be interrupted by the arrival of the next SACK, which causes the SACK implementation to misbehave
Tests conducted with Tom Kelly's SACK fast-path patch
Improves SACK processing, but still not sufficient
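The cost problem can be illustrated by counting how many retransmit-queue entries get touched. This is a toy model of the behaviour described above, not the kernel code: the naive walk restarts from the head of the queue for every SACK, while a fast-path keeps the position of the previous (in-order) SACK:

```python
def naive_scans(sack_seqs):
    """Entries visited when each SACK walks the queue from the start."""
    visits = 0
    for seq in sack_seqs:
        visits += seq + 1          # walk from head up to the SACKed entry
    return visits

def fastpath_scans(sack_seqs):
    """Entries visited when in-order SACKs resume from the last position
    (the idea behind the SACK fast-path patch)."""
    visits, pos = 0, 0
    for seq in sorted(sack_seqs):
        visits += seq - pos + 1    # only the new stretch is walked
        pos = seq
    return visits

sacks = list(range(0, 10_000, 2))  # 5000 SACKs into a ~10000-packet queue
slow = naive_scans(sacks)          # 25 million entry visits
fast = fastpath_scans(sacks)       # ~15 thousand entry visits
```

At the ~10000-packet windows needed on the 120 msec path, the quadratic naive cost is what eats the CPU during loss recovery.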
SACK Processing overhead
Periods of web100 silence due to high CPU utilization
Logging done in userspace: kernel time taken up by TCP SACK processing
TCP resets cwnd
Congestion Window Moderation
Linux TCP implementation adds the 'feature' moderate_cwnd()
Idea is to prevent large bursts of data packets under 'dubious' conditions
Triggered when an ACK acknowledges more than 3 packets (typically 2 are expected)
Adjusts cwnd to the known number of packets 'in flight' (plus an extra 3 packets)
Under large cwnd sizes (high bandwidth-delay products), throughput can be diminished as a result
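The moderation step can be sketched as follows (a sketch of the behaviour described above; in the Linux source the corresponding helper is tcp_moderate_cwnd(), with a 3-packet burst allowance):

```python
def moderate_cwnd(cwnd, packets_in_flight, max_burst=3):
    """Clamp cwnd to what is actually in flight plus a small burst
    allowance, discarding any larger window the sender had built up."""
    return min(cwnd, packets_in_flight + max_burst)

# On a high-BDP path the clamp can be drastic: a 10000-packet window
# collapses to just above the in-flight count after one 'dubious' ACK,
# and the sender must grow it back via congestion avoidance.
```

This is why switching the feature off matters most for the new stacks: they build large windows precisely in the regime where stray stretch ACKs trigger the clamp.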
CPU Load and Throughput
(figure: CPU load and throughput, 90% TCP AF background)
moderate_cwnd(): Vanilla TCP
(figure: throughput and cwnd with moderate_cwnd ON vs OFF, 90% TCP AF background)
moderate_cwnd(): HS-TCP
(figure: throughput with moderate_cwnd ON vs OFF, 90% and 70% TCP AF background)
moderate_cwnd(): Scalable-TCP
(figure: throughput with moderate_cwnd ON vs OFF, 90% and 70% TCP AF background; CoV and aggregate BW)
Multiple Streams
(figure: CoV and aggregate BW versus number of streams)
10 TCP Flows versus Self-Similar Background
(figure: background loss per TCP BW)
Impact
Fairness: the ratio of throughput achieved by one stack against another
Means that fairness against Vanilla TCP is defined by how much more throughput a new stack gets than Vanilla
Doesn't really consider the deployability of the stacks in real life: how do these stacks affect the existing traffic (mostly Vanilla TCP)?
Redefine fairness in terms of Impact:
Consider the effect on the background traffic only, under the different stacks
BW impact = (throughput of n Vanilla flows) / (throughput of (n-1) Vanilla flows + 1 new TCP flow)
Vary the number of TCP flows to determine impact(vanilla flows)
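As a sketch, the metric is just a ratio of the aggregate throughputs in the two experiments (the function name and the example numbers here are illustrative, not measured values from the talk):

```python
def bw_impact(vanilla_only_bw, mixed_bw):
    """BW impact = (throughput of n Vanilla flows)
                 / (throughput of (n-1) Vanilla flows + 1 new-TCP flow).

    A value above 1 means the new stack depressed the aggregate more
    than one extra Vanilla flow would have; 1 means it is no more
    aggressive than Vanilla in that setting."""
    return vanilla_only_bw / mixed_bw

# e.g. 10 Vanilla flows achieve 900 Mbit/s in aggregate; swapping one
# flow for a new stack leaves the mix at 750 Mbit/s (made-up numbers)
impact = bw_impact(900.0, 750.0)  # -> 1.2
```

Sweeping n then shows how the impact of a single aggressive flow changes as the Vanilla background gets denser.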
Throughput Impact
(figure: throughput and CoV impact of 1 new TCP flow)
(figure: throughput and CoV impact of 10 TCP flows)
WAN Tests
Summary
Comparison of actual TCP algorithmic differences through the test platform kernel
Problems with SACK implementations mean that it is difficult to maintain high throughput (>500 Mbit/sec) under loss
Other problems exist with the kernel implementation that hinder performance
Compared stacks under different artificial (and hence repeatable) conditions:
Single stream:
Multiple stream:
Need to study over a wider range of networks
Move tests onto real production environments