Experience with Loss-Based Congestion Controlled TCP Stacks

Yee-Ting Li
University College London

Introduction

- Transport of data for next-generation applications
- Network hardware is capable of gigabits per second
- Current 'Vanilla' TCP is not capable over long distances and high throughputs
- New TCP stacks have been introduced to rectify the problem
- Investigation into the performance, bottlenecks and deployability of the new algorithms

Transmission Control Protocol

- Connection oriented
- Reliable transport of data
- Window based
- Congestion and flow control to prevent network collapse
- Provides 'fairness' between competing streams
- 20 years old
  - Originally designed for kbit/sec pipes

TCP Algorithms

- Based on two algorithms to determine the rate at which data is sent:
  - Slow start: probe for initial bandwidth
  - Congestion avoidance: maintain a steady-state transfer rate
- Focus on steady state: probe for increases in available bandwidth, whilst backing off if congestion is detected (through loss)
- Maintained through a 'congestion window' cwnd that regulates the number of unacknowledged packets allowed on the connection
- Size of window approximately equals the bandwidth-delay product:
  - Determines the appropriate window size to set to obtain a bandwidth under a certain delay
  - Window = Bandwidth x Delay (see the sketch below)
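
As a rough illustration of the window sizing above (my own sketch, not from the original slides), the bandwidth-delay product for the two 1 Gb/sec paths used later in the talk works out as follows; the 1500-byte MSS is an assumption:

```python
# Sketch: congestion window needed to fill the pipe, Window = Bandwidth x Delay,
# expressed here in 1500-byte packets (assumed MSS).

def bdp_packets(bandwidth_bps: float, rtt_sec: float, mss_bytes: int = 1500) -> float:
    """Window (in packets) that matches the bandwidth-delay product."""
    return bandwidth_bps * rtt_sec / (8 * mss_bytes)

print(bdp_packets(1e9, 0.006))   # ~500 packets    (1 Gb/sec, 6 msec RTT)
print(bdp_packets(1e9, 0.120))   # ~10000 packets  (1 Gb/sec, 120 msec RTT)
```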

Algorithms

- Congestion avoidance (sketched below):
  - For every packet (ACK) received by the sender: cwnd <- cwnd + 1/cwnd
  - When loss is detected (through dupacks): cwnd <- cwnd / 2
- Growth of cwnd is determined by:
  - The RTT of the connection: when the RTT is high, cwnd grows slowly (because of acking)
  - The loss rate on the line: high loss means that cwnd never achieves a large value
  - The capacity of the link: allows for a large cwnd value (when loss is low)
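
A minimal sketch of the Vanilla congestion-avoidance update above (my own illustration; slow start, timeouts and the receiver window are ignored):

```python
# Sketch of Vanilla TCP congestion avoidance (AIMD).
cwnd = 2.0  # congestion window, in packets

def on_ack():
    """Each ACK adds 1/cwnd, i.e. cwnd grows by ~1 packet per RTT."""
    global cwnd
    cwnd += 1.0 / cwnd

def on_loss():
    """Loss detected via duplicate ACKs: halve the window."""
    global cwnd
    cwnd = max(cwnd / 2.0, 2.0)
```

At a 10000-packet window (1 Gb/sec, 120 msec RTT) a single halving takes roughly 5000 RTTs, about 10 minutes, to recover, which is why Vanilla TCP struggles on long fat pipes.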

Current Methods of Achieving High Throughput

- Use multiple parallel TCP flows
- Advantages:
  - Achieves good throughput
  - No changes to kernels required
- Disadvantages:
  - Have to manually tune the number of flows
  - May induce extra loss on lossy networks
  - Need to reprogram/recompile software

New TCP Stacks

- Modify the congestion control algorithm to improve response times
- All based on modifying the cwnd growth and decrease values
- Define:
  - a = increase in data packets per window of ACKs
  - b = decrease factor upon congestion
- To maintain compatibility (and hence network stability and fairness), for small cwnd values:
  - Mode switch from Vanilla to the new TCP (sketched below)
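
All three stacks below fit the same generalised AIMD template; a sketch of that template with the low-cwnd compatibility switch follows. The threshold value used here is illustrative only, not the one hard-wired into the actual kernels:

```python
# Generalised AIMD sketch: each stack supplies its own a (increase per window
# of ACKs) and b (decrease factor on congestion).  Below LOW_WINDOW the stack
# falls back to Vanilla behaviour (a = 1, b = 0.5) for fairness.
LOW_WINDOW = 38  # packets; illustrative mode-switch point

def on_ack(cwnd: float, a) -> float:
    if cwnd < LOW_WINDOW:
        return cwnd + 1.0 / cwnd        # Vanilla: +1 packet per RTT
    return cwnd + a(cwnd) / cwnd        # new stack: +a(cwnd) packets per RTT

def on_loss(cwnd: float, b) -> float:
    if cwnd < LOW_WINDOW:
        return cwnd / 2.0               # Vanilla: halve
    return cwnd * (1.0 - b(cwnd))       # new stack: back off by factor b(cwnd)
```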

HSTCP

- Designed by Sally Floyd
- Determine a and b as a function of cwnd (sketched below):
  - a <- a(cwnd)
  - b <- b(cwnd)
- Gradual improvement in throughput as we approach larger bandwidth-delay products
- Current implementation focused on performance up to 10 Gb/sec; sets a linear relation between loss and throughput (the response function)
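
A sketch of HSTCP's a(cwnd) and b(cwnd) following the functional form in RFC 3649, using the RFC's default parameters; the deployed kernel code uses a precomputed lookup table rather than evaluating these formulas per ACK, so treat this as illustrative:

```python
import math

# HighSpeed TCP (RFC 3649) sketch: a and b become functions of cwnd.
LOW_WINDOW, HIGH_WINDOW = 38, 83000      # packets
LOW_P, HIGH_P = 1e-3, 1e-7               # loss rates at those window sizes
HIGH_DECREASE = 0.1                      # backoff factor at HIGH_WINDOW

def b(w: float) -> float:
    """Decrease factor: interpolated on a log scale from 0.5 down to 0.1."""
    if w <= LOW_WINDOW:
        return 0.5
    frac = (math.log(w) - math.log(LOW_WINDOW)) / (math.log(HIGH_WINDOW) - math.log(LOW_WINDOW))
    return 0.5 + frac * (HIGH_DECREASE - 0.5)

def a(w: float) -> float:
    """Increase (packets per window of ACKs), derived from the response function."""
    if w <= LOW_WINDOW:
        return 1.0
    # p(w): invert the straight-line (log-log) response function between the anchors
    frac = (math.log(w) - math.log(LOW_WINDOW)) / (math.log(HIGH_WINDOW) - math.log(LOW_WINDOW))
    p = math.exp(math.log(LOW_P) + frac * (math.log(HIGH_P) - math.log(LOW_P)))
    return w * w * p * 2.0 * b(w) / (2.0 - b(w))
```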

Scalable TCP

- Designed by Tom Kelly
- Define a and b to be constant:
  - a: cwnd <- cwnd + a (per ACK)
  - b: cwnd <- cwnd - b x cwnd (on loss)
- Intrinsic scaling property that gives the same performance over any link (beyond the initial threshold)
- Recommended settings (sketched below):
  - a = 1/100
  - b = 1/8
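
A sketch of the Scalable TCP update with Kelly's recommended constants (my own illustration):

```python
# Scalable TCP sketch: constant a and b applied per ACK / per loss event.
A = 1.0 / 100   # added to cwnd for every ACK received
B = 1.0 / 8     # fraction of cwnd removed on congestion

def on_ack(cwnd: float) -> float:
    return cwnd + A            # ~1% growth per RTT: multiplicative increase

def on_loss(cwnd: float) -> float:
    return cwnd - B * cwnd     # back off by 12.5%
```

Because both the per-RTT increase and the decrease are proportional to cwnd, recovery from a single loss takes a fixed number of RTTs (roughly a dozen with these settings) regardless of the link speed, which is the scaling property referred to above.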

H-TCP

- Designed by Doug Leith and Robert Shorten
- Define a mode switch so that after congestion we do normal Vanilla
- After a predefined period ∆L, switch to a high-performance a (sketched below):
  - ∆i ≤ ∆L: a = 1
  - ∆i > ∆L: a = 1 + (∆ - ∆L) + [(∆ - ∆L)/20]²
- Upon loss, drop by:
  - |[Bimax(k+1) - Bimax(k)] / Bimax(k)| > 0.2: b = 0.5
  - Else: b = RTTmin/RTTmax
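
A sketch of those rules (my own illustration; delta is the time since the last congestion event, and the value chosen for DELTA_L below is a placeholder rather than the authors' recommended constant):

```python
# H-TCP sketch: a grows with the time since the last congestion event (delta),
# and the backoff b adapts to the RTT ratio unless throughput shifts sharply.
DELTA_L = 1.0  # seconds; placeholder value for the predefined period

def a(delta: float) -> float:
    """Increase parameter as a function of time since the last congestion event."""
    if delta <= DELTA_L:
        return 1.0                         # behave like Vanilla at first
    t = delta - DELTA_L
    return 1.0 + t + (t / 20.0) ** 2       # high-performance regime

def b(bmax_now: float, bmax_prev: float, rtt_min: float, rtt_max: float) -> float:
    """Decrease factor on loss, per the rule on the slide."""
    if abs((bmax_now - bmax_prev) / bmax_prev) > 0.2:
        return 0.5
    return rtt_min / rtt_max
```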

Implementation

- All the new stacks have their own implementation
- Small differences between implementations mean that we would be comparing kernel differences rather than just the algorithmic differences
- Led to the development of a 'test platform' kernel: altAIMD
  - Implements all three stacks via a simple sysctl switch
- Also incorporates switches for certain undesirable kernel 'features':
  - moderate_cwnd()
  - IFQ
- Added extra features for testing/evaluation purposes:
  - Appropriate Byte Counting (RFC 3465)
  - Inducible packet loss (at the receiver)
  - Web100 TCP logging (cwnd etc.)
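
The talk does not give the actual sysctl names that the altAIMD kernel exposes, so purely as an illustration of the 'simple sysctl switch' idea, selecting a stack might look something like the following; every name below is a hypothetical placeholder:

```python
# Hypothetical illustration only: the real sysctl names in the altAIMD
# test-platform kernel are not given in the talk.

def set_sysctl(name: str, value: str) -> None:
    """Write a value to a (hypothetical) sysctl under /proc/sys/net/ipv4/."""
    with open(f"/proc/sys/net/ipv4/{name}", "w") as f:
        f.write(value)

set_sysctl("tcp_altaimd_stack", "2")     # e.g. 0=Vanilla, 1=HSTCP, 2=Scalable (hypothetical)
set_sysctl("tcp_moderate_cwnd", "0")     # e.g. disable moderate_cwnd() (hypothetical)
```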

Networks Under Test

- DataTAG: StarLight <-> CERN, via Juniper and Cisco 7600 routers; bottleneck capacity 1 Gb/sec, RTT 120 msec
- MB-NG: Manchester <-> UCL, via Cisco 7600 routers; bottleneck capacity 1 Gb/sec, RTT 6 msec

[Network topology diagram in the original slides]

Graph/Demo

[cwnd trace: mode switch between stacks (Vanilla TCP, HS-TCP, Scalable TCP) under constant packet drop]

Comparison against Theory

[Graph: measured response functions compared against theory]

Self-Similar Background Tests

- Results skewed
- Not comparing differences in the TCP algorithms!
- Not useful results!

SACK …

- Look into what's happening at the algorithmic level:
  - Scalable TCP on MB-NG with 200 Mbit/sec CBR background
- Strange hiccups in cwnd; the only correlation is with SACK arrivals

SACKs

- Supply the sender with information about which segments the receiver has:
  - Sender infers the missing packets to resend
  - Aids recovery during loss and prevents timeouts
- Current implementation in 2.4 and 2.6 walks through the entire SACK list for each SACK:
  - Very CPU intensive (see the sketch below)
  - Can be interrupted by the arrival of the next SACK, which causes the SACK implementation to misbehave
- Tests conducted with Tom Kelly's SACK fast-path patch:
  - Improves SACK processing, but still not sufficient
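
A rough illustration (not the kernel code) of why this walk hurts at high bandwidth-delay products: the outstanding-segment list is the size of cwnd, so each arriving SACK triggers work proportional to the window:

```python
# Illustrative sketch of the per-SACK processing cost; not actual kernel code.
from dataclasses import dataclass

@dataclass
class Segment:
    seq: int
    sacked: bool = False

def process_sack(retrans_queue: list[Segment], sack_blocks: list[tuple[int, int]]) -> None:
    """Naive processing: every SACK walks the entire outstanding queue."""
    for seg in retrans_queue:                 # O(cwnd) segments...
        for lo, hi in sack_blocks:            # ...checked against each SACK block
            if lo <= seg.seq < hi:
                seg.sacked = True

# With a ~10000-packet window and a SACK arriving on nearly every ACK during
# recovery, this walk performs millions of comparisons per second, which is
# the CPU load that stalls the rest of TCP (and the web100 logging) below.
```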

SACK Processing Overhead

- Periods of web100 silence due to high CPU utilisation
- Logging is done in userspace; kernel time is taken up by TCP SACK processing
- TCP resets cwnd

Congestion Window Moderation

- The Linux TCP implementation adds the 'feature' of moderate_cwnd()
- Idea is to prevent large bursts of data packets under 'dubious' conditions:
  - When an ACK acknowledges more than 3 packets (typically 2)
  - Adjusts cwnd to the known number of packets 'in flight' (plus an extra 3 packets), as sketched below
- Under large cwnd sizes (high bandwidth-delay products), throughput can be diminished as a result
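
The gist of that moderation step, as I read the description above (a sketch, not the kernel source):

```python
# Sketch of the cwnd moderation described above; not the kernel source.
MAX_BURST = 3  # extra packets allowed beyond what is known to be in flight

def moderate_cwnd(cwnd: int, packets_in_flight: int) -> int:
    """On a 'dubious' ACK, clamp cwnd to limit the burst it could trigger."""
    return min(cwnd, packets_in_flight + MAX_BURST)
```

At high bandwidth-delay products the clamp can pull cwnd well below the pipe size, which is the throughput penalty the following graphs illustrate.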

CPU Load and Throughput

[Graph: CPU load and throughput, 90% TCP AF]

moderate_cwnd(): Vanilla TCP

[Graphs: throughput and cwnd, moderate_cwnd ON vs OFF]

moderate_cwnd(): HS-TCP

[Graphs: moderate_cwnd ON vs OFF, at 90% and 70% TCP AF]

moderate_cwnd(): Scalable TCP

[Graphs: moderate_cwnd ON vs OFF, at 90% and 70% TCP AF]

Multiple Streams

[Graphs: coefficient of variation (CoV) and aggregate bandwidth]

10 TCP Flows versus Self-Similar Background

[Graphs: CoV, aggregate bandwidth, and background loss per TCP bandwidth]

Impact

- Fairness: ratio of the throughput achieved by one stack against another
  - Means that fairness against Vanilla TCP is defined by how much more throughput a new stack gets than Vanilla
  - Doesn't really consider the deployability of the stacks in real life: how do these stacks affect the existing traffic (mostly Vanilla TCP)?
- Redefine fairness in terms of the Impact:
  - Consider the effect on the background traffic only, under the different stacks
  - BW impact = (throughput of n Vanilla flows) / (throughput of (n-1) Vanilla flows + 1 new TCP flow)
  - Vary the number of TCP flows to determine impact(vanilla flows); see the sketch below
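
As a worked reading of that ratio (my own interpretation, with made-up throughput numbers purely for illustration):

```python
# BW impact metric from the slide, with illustrative made-up numbers.
def bw_impact(vanilla_only_bw: float, mixed_bw: float) -> float:
    """Throughput with n Vanilla flows divided by throughput with
    (n-1) Vanilla flows plus 1 new-TCP flow, over the same background."""
    return vanilla_only_bw / mixed_bw

# Example: 600 Mbit/sec measured in the all-Vanilla case, 500 Mbit/sec once
# one flow is replaced by a new stack -> impact = 1.2 (values are invented).
print(bw_impact(600.0, 500.0))
```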

Throughput Impact

[Graphs: throughput and CoV, impact of 1 new TCP flow]
[Graphs: throughput and CoV, impact of 10 TCP flows]

WAN Tests

Summary

- Comparison of the actual TCP differences through the test-platform kernel
- Problems with the SACK implementations mean that it is difficult to maintain high throughput (>500 Mbit/sec) under loss
- Other problems exist with the kernel implementation that hinder performance
- Compared the stacks under different artificial (and hence repeatable) conditions:
  - Single stream:
  - Multiple stream:
- Need to study over a wider range of networks
- Move tests onto real production environments