Congestion control
Lecture 6
CS 653
Why congestion control?
Causes/costs of congestion: scenario 1
two senders, two
receivers
one router, infinite
buffers
no retransmission
Host A
Host B
λ_out
λ_in : original data
unlimited shared
output link buffers
large delays
when congested
throughput
saturates
Causes/costs of congestion: scenario 2
one router, finite buffers
sender retransmission of lost
packet
Host A
Host B
λ_in : original data
λ'_in : original data, plus retransmitted data
finite shared output link buffers
λ_out
Causes/costs of congestion: scenario 2
always: λ_in = λ_out (goodput)
“perfect” retransmission only when loss: λ'_in > λ_out
retransmission of delayed (not lost) packet makes λ'_in larger (than perfect case) for same λ_out
[Figure: three plots of λ_out vs. λ_in, with λ_in up to R/2: (a) λ_out reaches R/2; (b) λ_out reaches R/3; (c) λ_out reaches R/4]
“costs” of congestion:
more work (retransmission) for given “goodput”
unneeded retransmissions: link carries multiple copies of pkt
Causes/costs of congestion: scenario 3
four senders
multihop paths
timeout/retransmit
Q: what happens as λ_in and λ'_in increase?
Host A
λ_in : original data
λ'_in : original data, plus retransmitted data
finite shared output link buffers
Host B
λ_out
Causes/costs of congestion: scenario 3
[Figure: multihop topology with Host A, Host B, and λ_out]
Another “cost” of congestion:
when packet dropped, any “upstream”
transmission capacity used for that packet
was wasted!
Two broad approaches towards congestion control
End-end congestion control:
no explicit feedback from network
congestion inferred from end-system observed loss, delay
approach taken by TCP
Network-assisted congestion control:
routers provide feedback to endhosts
single bit indicating congestion (SNA, DECbit, ATM, TCP/IP ECN)
explicit rate sender should send
recent proposals [XCP] [RCP] revisit ATM ideas
TCP congestion control
Components of TCP congestion control
Slow start
Multiplicatively increase (double) window
Congestion avoidance
Additively increase (by 1 MSS) window
Loss
Multiplicatively decrease (halve) window
Timeout
Set cwnd to 1 MSS
Multiplicatively increase (double) retransmission
timeout upon each further consecutive loss
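The components listed above can be sketched as a small per-RTT state machine. This is only an illustration, not any particular TCP implementation: the `ssthresh` variable, its initial value, the initial `rto`, and the choice to cap slow start at `ssthresh` are assumptions not stated on the slide.

```python
from dataclasses import dataclass


@dataclass
class TcpSender:
    cwnd: float = 1.0        # congestion window, in MSS
    ssthresh: float = 64.0   # slow-start threshold (hypothetical initial value)
    rto: float = 1.0         # retransmission timeout in seconds (hypothetical)

    def on_round_without_loss(self) -> None:
        """Called once per RTT in which no loss was detected."""
        if self.cwnd < self.ssthresh:
            # Slow start: multiplicative increase (double the window).
            self.cwnd = min(self.cwnd * 2, self.ssthresh)
        else:
            # Congestion avoidance: additive increase by 1 MSS.
            self.cwnd += 1

    def on_loss(self) -> None:
        """Loss detected without a timeout: halve the window."""
        self.cwnd = max(self.cwnd / 2, 1.0)
        self.ssthresh = self.cwnd

    def on_timeout(self) -> None:
        """Timeout: window back to 1 MSS, retransmission timeout doubled."""
        self.ssthresh = max(self.cwnd / 2, 1.0)
        self.cwnd = 1.0
        self.rto *= 2
```

Calling `on_timeout()` repeatedly models consecutive losses: each call doubles `rto` again.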
Retransmission timeout estimation
Calculate EstimatedRTT using moving
average
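$\text{EstimatedRTT}_i = (1-\alpha)\cdot\text{EstimatedRTT}_{i-1} + \alpha\cdot\text{SampleRTT}_i$
Calculate deviation wrt moving average
$\text{DevRTT}_i = (1-\beta)\cdot\text{DevRTT}_{i-1} + \beta\cdot|\text{SampleRTT}_i - \text{EstimatedRTT}_{i-1}|$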
Timeout = EstimatedRTT + 4*DevRTT
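A minimal sketch of the estimator above, assuming the commonly used gains 0.125 (for the mean) and 0.25 (for the deviation) and an initial deviation of half the first sample. None of these values appear on the slide; treat them as illustrative defaults.

```python
def make_rto_estimator(first_sample, alpha=0.125, beta=0.25):
    """Return an update(sample) function that yields the current timeout."""
    est = first_sample        # EstimatedRTT
    dev = first_sample / 2    # DevRTT (assumed initialisation)

    def update(sample):
        nonlocal est, dev
        # Deviation is measured against the previous estimate, as on the slide.
        dev = (1 - beta) * dev + beta * abs(sample - est)
        est = (1 - alpha) * est + alpha * sample
        return est + 4 * dev  # Timeout = EstimatedRTT + 4*DevRTT

    return update
```

With steady samples the deviation term decays, so the timeout approaches the RTT itself. A single late sample inflates the timeout through both terms.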
TCP Throughput
TCP throughput: A very very simple
model
What’s the average throughput of TCP as
a function of window size and RTT T ?
Ignore
slow start
Let W be the window size when loss occurs.
When window is W, throughput is W/T
Just after loss, window drops to W/2,
throughput to W/(2T)
Average throughput: 3W/(4T)
TCP throughput: A very simple model
But what is W when loss occurs?
C = link capacity
in packets/sec
Q = queue capacity
in number of packets
When window is w and queue has q packets, TCP is
sending at rate w/(T+q/C)
For maintaining utilization and steady state
Just before loss, rate = W/(T+Q/C) = C
Just after loss, rate = (W/2)/T = C
For Q = CT (a common rule of thumb for setting router buffer
sizes), a loss occurs every (3/4)W·Q = 3W²/8 packets
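As a worked check of the two rate conditions on this slide, the algebra can be written out as follows. It uses only the symbols W, T, Q, C defined here, together with the stated count of 3W²/8 packets per loss.

```latex
\begin{gather*}
\frac{W}{T + Q/C} = C \;\Rightarrow\; W = CT + Q \\
\frac{W/2}{T} = C \;\Rightarrow\; W = 2CT \\
\text{hence } Q = CT,\qquad W = 2Q,\qquad
\frac{3W^2}{8} = \frac{3}{4}W\cdot\frac{W}{2} = \frac{3}{4}W\,Q
\end{gather*}
```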
Deriving TCP throughput/loss relationship
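# packets sent per “period” =
$$\frac{W}{2} + \left(\frac{W}{2}+1\right) + \dots + W = \sum_{n=0}^{W/2}\left(\frac{W}{2}+n\right)$$
$$= \left(\frac{W}{2}+1\right)\frac{W}{2} + \sum_{n=0}^{W/2} n = \left(\frac{W}{2}+1\right)\frac{W}{2} + \frac{(W/2)(W/2+1)}{2}$$
$$= \frac{3}{8}W^2 + \frac{3}{4}W \approx \frac{3}{8}W^2$$
[Figure: TCP window size vs. time (rtt); sawtooth between W/2 and W, one “period” per tooth]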
Deriving TCP throughput/loss relationship
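# packets sent per “period” $\approx \frac{3}{8}W^2$
1 packet lost per “period” implies:
$p_{loss} \approx \frac{8}{3W^2}$ or: $W = \sqrt{\frac{8}{3\,p_{loss}}}$
$B = \text{avg. thruput} = \frac{3}{4}W$ packets/rtt
$B = \text{avg. thruput} = \frac{1.22}{\sqrt{p_{loss}}}$ packets/rtt
[Figure: TCP window size vs. time (rtt); sawtooth between W/2 and W, one “period” per tooth]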
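The throughput/loss relationship above is easy to evaluate numerically. The sketch below transcribes the two formulas directly; the function names are illustrative, and dividing by the RTT converts packets/rtt into packets/sec.

```python
import math


def window_at_loss(p_loss):
    """W = sqrt(8 / (3 * p_loss)), from one loss per 3W^2/8 packets."""
    return math.sqrt(8 / (3 * p_loss))


def avg_throughput_pkts_per_sec(p_loss, rtt):
    """B = (3/4) * W / rtt, which equals about 1.22 / (rtt * sqrt(p_loss))."""
    return 0.75 * window_at_loss(p_loss) / rtt
```

For example, `avg_throughput_pkts_per_sec(0.01, 0.1)` evaluates the model at 1% loss and a 100 ms RTT.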
Alternate fluid model
Rate of change of sending rate = term
inversely proportional to current rate with
probability (1-p) - term proportional to
current rate with probability p
In steady state,
TCP throughput: A better loss rate
based “simple” model [PFTK]
With many flows, loss rate and delay are
not affected much by a single TCP flow
TCP behavior completely specified by loss
and delay pattern along path (bounded by
bottleneck capacity)
Given loss rate p and delay T what is TCP’s
throughput B packets/sec taking timeouts
into account?
What is PFTK modeling?
Independent loss probability p across
rounds
Loss indicated by triple duplicate acks
Bursty loss in a round: if some packet lost,
all following packets in that round also lost
Timeout if < three duplicate acks received
PFTK empirical validation: Low loss
PFTK empirical validation: High loss
Loss-based TCP
Evolution of loss-based TCP
Tahoe
(fast retransmit, without fast recovery)
Reno (triple duplicate acks + fast
retransmit + fast recovery)
NewReno (Reno + handling multiple losses
better)
SACK (selective acknowledgment) common
today
Q: what if loss not due to congestion?
Delay-based TCP Vegas
Uses delay as a signal of congestion
Idea:
try to keep a small constant number of
packets at bottleneck queue
Expected = W/BaseRTT
Actual = W/CurRTT
Diff = Expected - Actual
Try to keep Diff between fixed 1 and 3
More recent FAST TCP based on Vegas.
Delay-based TCP not widely used today
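A rough sketch of the Vegas idea above, applied once per RTT. Multiplying Diff by BaseRTT to express it in packets, and adjusting the window by one packet per RTT, are assumptions made here so that the thresholds 1 and 3 read as "packets queued at the bottleneck"; the slide only states that Diff is kept between 1 and 3.

```python
def vegas_update(w, base_rtt, cur_rtt, low=1.0, high=3.0):
    """Return the next window given the current window and RTT measurements."""
    expected = w / base_rtt               # rate if no queueing
    actual = w / cur_rtt                  # rate actually achieved
    diff = (expected - actual) * base_rtt # assumed conversion to packets
    if diff < low:
        return w + 1                      # too few packets queued: speed up
    if diff > high:
        return w - 1                      # too many packets queued: slow down
    return w                              # within the target band: hold
```

The window changes only when the estimated backlog leaves the band, which is why the scheme can settle at a steady window instead of a sawtooth.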
TCP-Friendliness
Can we try MyFavNew TCP?
Well, is it TCP-friendly?
Any alternative congestion control scheme needs
to coexist with TCP in FIFO queues in the best-effort Internet, or be isolated from TCP.
To co-exist with TCP, it must impose the same
long-term load on the network:
No greater long-term throughput as a function of
packet loss and delay so TCP doesn't suffer
Not significantly less long-term throughput or it's
not too useful
TCP friendly rate control (TFRC)
Use a model of TCP's throughput as a
function of the loss rate and RTT directly in
a congestion control algorithm.
If transmission rate is higher than that
given by the model, reduce the transmission
rate to the model's rate.
Otherwise increase the transmission rate.
E.g., DCCP (Datagram Congestion Control
Protocol), for unreliable congestion control
Q: how to measure/use loss rate and RTT?
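The TFRC rule described above can be sketched as a single rate-update step. This sketch assumes the simple 1.22/(RTT·√p) model as the "model of TCP's throughput" and a fixed additive `increase`, capped at the model rate. Both are simplifications chosen for illustration and are not taken from the slide or from the DCCP specification.

```python
import math


def model_rate(p_loss, rtt):
    """Simple TCP throughput model in packets/sec (assumed model)."""
    return 1.22 / (rtt * math.sqrt(p_loss))


def tfrc_step(rate, p_loss, rtt, increase=1.0):
    """One update of the sending rate (packets/sec)."""
    target = model_rate(p_loss, rtt)
    if rate > target:
        return target                     # above the model: drop to model rate
    return min(rate + increase, target)   # otherwise increase (hypothetical step)
```

The measured loss rate and RTT are inputs here; how to estimate them robustly is exactly the open question the slide ends on.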
High speed TCP
TCP in high speed networks
Example: 1500 byte segments, 100ms RTT, want 10 Gbps
throughput
Requires window size W = 83,333 in-flight segments
Throughput in terms of loss rate: B = 1.22/(RTT·√p) packets/sec
➜ p = 2·10^-10, or equivalently at most one drop every
couple of hours!
New versions of TCP for high-speed networks needed!
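The numbers in this example can be reproduced with a few lines of arithmetic. The sketch below assumes the 1.22/√p throughput relationship derived earlier in the lecture; the function names are illustrative.

```python
def required_window(rate_bps, rtt_s, segment_bytes):
    """Segments that must be in flight to sustain rate_bps."""
    return rate_bps * rtt_s / (segment_bytes * 8)


def required_loss_rate(window):
    """Invert W-per-RTT = 1.22 / sqrt(p) to get the tolerable loss rate."""
    return (1.22 / window) ** 2


w = required_window(10e9, 0.1, 1500)      # about 83,333 segments
p = required_loss_rate(w)                 # about 2e-10
packets_per_sec = w / 0.1
hours_between_drops = (1 / p) / packets_per_sec / 3600
```

`hours_between_drops` comes out between one and two hours, consistent with "one drop every couple of hours".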
TCP’s long recovery delay
More than an hour to recover from a loss
or timeout
~41,000 packets
~60,000 RTTs
~100 minutes
High-speed TCP
Proposals: Scalable TCP, HSTCP, FAST, CUBIC
General idea is to use superlinear window
increase
Particularly useful in high bandwidth-delay
product regimes
Alternate choices of response functions
Scalable TCP - S = 0.15/p
Q: Whatever happened to TCP-friendly?
High speed TCP [Floyd]
additive increase,
multiplicative decrease
increments, decrements
depend on window size
Scalable TCP (STCP) [T. Kelly]
multiplicative increase, multiplicative
decrease
W ← W + a per ACK
W ← W – bW per window with loss
STCP dynamics
From 1st PFLDnet Workshop, Tom Kelly
Active Queue Management
Router Queue Management
normally, packets dropped only when queue overflows
“drop-tail” queueing
[Figure: packets P1 to P6 queued at a FCFS scheduler in an ISP router, between an ISP router and the Internet]
The case against drop-tail queue management
[Figure: packets P1 to P6 queued at a FCFS scheduler]
Large queues in routers are “a bad thing”
Delay: end-to-end latency dominated by length
of queues at switches in network
Allowing queues to overflow is “a bad thing”
Fairness: connections transmitting at high
rates can starve connections transmitting at
low rates
Utilization: connections can synchronize their
response to congestion
Idea: early random packet drop
[Figure: packets P1 to P6 queued at a FCFS scheduler]
When queue length exceeds threshold, drop
packets with queue length dependent
probability
probabilistic packet drop: flows see same loss rate
problem: bursty traffic (burst arrives when
queue is near threshold) can be over penalized
Random early detection (RED) packet drop
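[Figure: average queue length vs. time, with max queue length, max threshold, and min threshold marked; drop probability regions: no drop (below min threshold), probabilistic early drop (between thresholds), forced drop (above max threshold)]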
Use exponential average of queue length to
determine when to drop
avoid overly penalizing short-term bursts
react to longer term trends
Tie drop prob. to weighted avg. queue length
avoids over-reaction to mild overload conditions
Random early detection (RED) packet drop
[Figure: average queue length vs. time, with regions: no drop (below min threshold), probabilistic early drop (between thresholds), forced drop (above max threshold)]
[Figure: drop probability vs. weighted average queue length: zero below min, rising to maxp at max, then 100%]
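The RED behaviour on these slides (exponential average of queue length, then a drop probability tied to that average) can be sketched as below. The linear ramp between the thresholds follows the drop-probability plot; the averaging weight of 0.002 is a hypothetical default, not a value given in the lecture.

```python
def ewma_queue(avg, q, weight=0.002):
    """Exponentially weighted average of the instantaneous queue length q."""
    return (1 - weight) * avg + weight * q


def red_drop_probability(avg_q, min_th, max_th, max_p):
    """Drop probability as a function of the weighted average queue length."""
    if avg_q < min_th:
        return 0.0                 # no drop
    if avg_q >= max_th:
        return 1.0                 # forced drop
    # Probabilistic early drop: ramp from 0 at min_th up to max_p at max_th.
    return max_p * (avg_q - min_th) / (max_th - min_th)
```

Because the probability is driven by the slow-moving average and not by the instantaneous queue, a short burst near the threshold barely changes the drop rate.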
RED summary: why random drop?
Provide gentle transition from no-drop to
all-drop
Provide “gentle” early warning
Avoid synchronized loss bursts among
sources
Provide same loss rate to all sessions:
With tail-drop, low-sending-rate sessions
can be completely starved
Random early detection (RED) today
Many (5) parameters: nontrivial to
tune (at least for HTTP traffic)
Gains over drop-tail FCFS not that
significant
Still not widely deployed …
Why randomization important?
Synchronization of periodic routing updates
Periodic losses observed in end-end Internet
traffic
source: Floyd,
Jacobson 1994
Router update operation:
time spent in state depends on msgs received from others (weak coupling between routers' processing)
[State diagram: wait → (timeout, or link fail) → prepare own routing update (time: TC) → <ready> → send update (time: Td to arrive at dest) → start_timer (uniform: Tp +/- Tr) → wait; while preparing: receive update from neighbor, process (time: TC2); while waiting: receive update from neighbor, process]
Router synchronization
20 (simulated)
routers broadcasting
updates to each other
x-axis: time until
routing update sent
relative to start of
round
By t=100,000 all
router rounds are of
length 120!
synchronization or
lack thereof depends
on system parameters
Avoiding synchronization
Choose random timer component, Tr, large (e.g., several multiples of TC)
Add enough randomization to avoid synchronization
[State diagram: wait → prepare own routing update (time: TC) → <ready> → send update (time: Td to arrive) → start_timer (uniform: Tp +/- Tr) → wait; while preparing: receive update from neighbor, process (time: TC2); while waiting: receive update from neighbor, process]
Randomization
Takeaway message: randomization makes a system simple and robust
Background transport: TCP Nice
What are background transfers?
Data that humans are not waiting for
Non-deadline-critical
Unlimited demand
Examples
Prefetched traffic on the Web
File system backup
Large-scale data distribution services
Background software updates
Media file sharing
Desired Properties
Utilization of spare network capacity
No interference with regular transfers
Self-interference
• applications hurt their own performance
Cross-interference
• applications hurt other applications’ performance
TCP Nice
Goal: abstraction of free infinite bandwidth
Applications say what they want
OS manages resources and scheduling
Self tuning transport layer
Reduces risk of interference with foreground
traffic
Significant utilization of spare capacity by
background traffic
Simplifies application design
Why change TCP?
TCP does network resource management
Need flow prioritization
Alternative: router prioritization
+ More responsive, simple one bit priority
- Hard to deploy
Question:
Can end-to-end congestion control achieve non-interference and utilization?
TCP Nice
Proactively detects congestion
Uses increasing RTT as congestion signal
Congestion ⇒ incr. queue lengths ⇒ incr. RTT
Aggressive responsiveness to congestion
Only modifies sender-side congestion control
Receiver and network unchanged
TCP friendly
TCP Nice
Basic algorithm
1. Early detection: threshold on queue length, observed as incr. in RTT
2. Multiplicative decrease on early congestion
3. Allow cwnd < 1.0 (despite no loss)
per-ack operation:
  if (curRTT > minRTT + threshold*(maxRTT – minRTT))
    numCong++;
per-round operation:
  if (numCong > f*W)
    W ← W/2
  else { … AIMD congestion control }
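A runnable sketch of the per-ack and per-round operations above. Several details are assumptions filled in for illustration: `min_rtt` and `max_rtt` are tracked from observed samples, the default `threshold` and `fraction` (the slide's f) are hypothetical, and the elided AIMD branch is replaced by a plain "halve on loss, otherwise add one" stand-in.

```python
from dataclasses import dataclass


@dataclass
class NiceSender:
    w: float = 1.0                 # congestion window, may fall below 1.0
    threshold: float = 0.2         # hypothetical value
    fraction: float = 0.5          # the slide's f, hypothetical value
    min_rtt: float = float("inf")
    max_rtt: float = 0.0
    num_cong: int = 0

    def on_ack(self, cur_rtt: float) -> None:
        # Assumed: min/max RTT are learned from the samples seen so far.
        self.min_rtt = min(self.min_rtt, cur_rtt)
        self.max_rtt = max(self.max_rtt, cur_rtt)
        limit = self.min_rtt + self.threshold * (self.max_rtt - self.min_rtt)
        if cur_rtt > limit:
            self.num_cong += 1     # this ack saw early congestion

    def on_round_end(self, loss: bool = False) -> None:
        if self.num_cong > self.fraction * self.w:
            self.w /= 2            # multiplicative decrease on early congestion
        elif loss:
            self.w /= 2            # stand-in for the elided AIMD branch
        else:
            self.w += 1            # stand-in for the elided AIMD branch
        self.num_cong = 0
```

Note that nothing clamps `w` at 1.0, which is the third element of the basic algorithm: the window can shrink below one packet per round even without loss.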
Nice: the works
[Figure: Reno vs. Nice window dynamics (Add and Mul phases) against a bottleneck queue; labels t.B, m pkts, B]
minRTT = t
maxRTT = tB/m
Non-interference: getting out of the way in time
Utilization: maintaining a small queue
Network Conditions
[Figure: foreground document latency (sec) vs. spare capacity; curves: V0, Reno, Nice, Vegas, Router Prio]
Nice causes low interference to foreground Web traffic
even when there isn’t much spare capacity.
Scalability
[Figure: document latency (sec) vs. number of BG flows; curves: Vegas, Reno, V0, Nice, Router Prio]
W < 1 allows Nice to scale to any number of
background flows
Utilization
[Figure: BG throughput (KB) vs. number of BG flows; curves: Vegas, Reno, Router Prio, V0, Nice]
Nice utilizes 50-80% of spare capacity w/o stealing
any bandwidth from FG
Wide-area network experiments
What is TCP optimizing?
How does TCP allocate network
resources?
Problem: Given a network and some number of
long-lived TCP connections between different
source-destination routes, can we model the
resulting resource allocation?
How to model the interaction between TCP and
the network?
Recall: PFTK like models assumed network
conditions are not affected by (a single) TCP flow
Optimization-based approach towards congestion
control
Resource allocation as optimization problem:
How to allocate resources (e.g., bandwidth) to
optimize some objective function
Maybe not possible to obtain exact optimality but..
optimization framework as means to explicitly
steer network towards desirable operating point
practical congestion control as distributed
asynchronous implementations of optimization
algorithm
systematic approach towards protocol design
Model
Network: Links l each of capacity cl
Sources s: (L(s), Us(xs))
L(s) - links used by source s
Us(xs) - utility if source rate = xs
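[Figure: example utility function for elastic application, Us(xs) vs. xs]
[Figure: two links with capacities c1 and c2; source x1 uses both links, x2 uses link 1 only, x3 uses link 2 only]
x1 + x2 ≤ c1
x1 + x3 ≤ c2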
Q: What are possible allocations with say unit capacity links?
Optimization Problem
$$\max_{x_s \ge 0} \sum_{s} U_s(x_s) \quad \text{subject to} \quad \sum_{s \in S(l)} x_s \le c_l, \;\; \forall l \in L$$
“system” problem
maximize system utility (note: all sources “equal”)
constraint: bandwidth used less than capacity
centralized solution to optimization impractical
must know all utility functions
impractical for large number of sources
can we view congestion control as distributed
asynchronous algorithms to solve this problem?
The user view
User can choose amount to pay per unit time, ws
Would like allocated bandwidth, xs in proportion to ws
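$$x_s = \frac{w_s}{p_s}$$
ps could be viewed as charge per unit flow for user s
user problem:
$$\max \; U_s\!\left(\frac{w_s}{p_s}\right) - w_s \quad \text{subject to} \quad w_s \ge 0$$
(first term: user’s utility; second term: cost)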
The network view
Suppose network knows vector {ws}, chosen by users
Network wants to maximize logarithmic utility function
network problem:
$$\max_{x_s \ge 0} \sum_{s} w_s \log x_s \quad \text{subject to} \quad \sum_{s \in S(l)} x_s \le c_l$$
Solution existence
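There exist prices, ps, source rates, xs, and amount-to-pay-per-unit-time, ws = ps·xs, such that
{ws} solves the user problem
{xs} solves the network problem
{xs} is the unique solution to the system problem
user problem:
$$\max \; U_s\!\left(\frac{w_s}{p_s}\right) - w_s \quad \text{subject to} \quad w_s \ge 0$$
network problem:
$$\max_{x_s \ge 0} \sum_{s} w_s \log x_s \quad \text{subject to} \quad \sum_{s \in S(l)} x_s \le c_l$$
system problem:
$$\max_{x_s \ge 0} \sum_{s} U_s(x_s) \quad \text{subject to} \quad \sum_{s \in S(l)} x_s \le c_l, \;\; \forall l \in L$$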
Proportional Fairness
Vector of rates, {xs}, proportionally fair if
feasible and for any other feasible vector {xs*}:
$$\sum_{s \in S} \frac{x_s^* - x_s}{x_s} \le 0$$
Result: if wr=1, then {Xs} solves the
network problem IFF it is proportionally
fair
Similar result exists for the case that wr
not equal 1.
Max-min Fairness
Rates {xr} max-min fair if for any other
feasible rates {yr}, if ys > xs, then ∃ p
such that xp ≤ xs and yp < xp
Minimum potential delay fairness
Rates {xr} are minimum potential delay fair
if Ur (xr) = -wr/xr
Interpretation: if wr is file size, then wr/xr
is transfer time; optimization problem is
to minimize sum of transfer delays
Max-min Fairness
rates {xr} max-min fair if for any other
feasible rates {yr}, if ys > xs, then ∃ p such
that xp ≤ xs and yp < xp
What is corresponding utility function?
$$U_r(x_r) = \lim_{\alpha \to \infty} \frac{x_r^{1-\alpha}}{1-\alpha}$$
Solving the network problem
Results so far: existence - solution exists
with given properties
How to compute solution?
Ideally:
distributed solution easily embodied
in protocol
Should reveal insight into existing protocol
Solving the network problem
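$$\frac{d}{dt}x_s(t) = k\left(w_s - x_s(t)\sum_{l \in L(s)} p_l(t)\right)$$
left side: change in bandwidth allocation at s; the $w_s$ term is a linear increase, the $x_s(t)\sum_l p_l(t)$ term a multiplicative decrease
where
$$p_l(t) = g_l\left(\sum_{s:\, l \in L(s)} x_s(t)\right)$$
congestion “signal”: function of aggregate rate
at link l, fed back to s.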
Solving the network problem
$$\frac{d}{dt}x_s(t) = k\left(w_s - x_s(t)\sum_{l \in L(s)} p_l(t)\right)$$
Results:
converges to solution of relaxation of network
problem
$x_s(t)\sum_{l \in L(s)} p_l(t)$ converges to $w_s$
Interpretation: TCP-like algorithm to iteratively solve
optimal rate allocation!
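To make the rate-update rule concrete, here is a small Euler-step simulation on a two-link, three-source network of the kind used as the example earlier (source 0 crosses both links; sources 1 and 2 use one link each). The price function `max(0, load - capacity) / eps` is a hypothetical choice of g_l, and the step size, gain, and starting rates are likewise assumptions; the slides leave all of these unspecified.

```python
def kelly_primal(routes, capacity, w, k=1.0, eps=0.01, dt=0.001, steps=50000):
    """Iterate dx_s/dt = k * (w_s - x_s * sum of link prices on s's path)."""
    x = [0.1] * len(routes)                       # arbitrary starting rates
    for _ in range(steps):
        # Aggregate rate on each link.
        load = [0.0] * len(capacity)
        for s, links in enumerate(routes):
            for l in links:
                load[l] += x[s]
        # Congestion signal per link (assumed penalty-style price function).
        price = [max(0.0, load[l] - capacity[l]) / eps
                 for l in range(len(capacity))]
        # Linear increase (w_s) minus multiplicative decrease (x_s * price).
        for s, links in enumerate(routes):
            path_price = sum(price[l] for l in links)
            x[s] += dt * k * (w[s] - x[s] * path_price)
    return x


routes = [[0, 1], [0], [1]]                       # links used by each source
rates = kelly_primal(routes, [1.0, 1.0], [1.0, 1.0, 1.0])
```

With unit weights and unit capacities the rates settle near 1/3 for the two-link source and 2/3 for the others, which is the proportionally fair allocation. They do not hit it exactly because the penalty price only solves a relaxation of the network problem, in line with the first result on the slide.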
Source Algorithm
Source needs only its path price:
$$\dot{x}_r = k_r(x_r)\left(U_r'(x_r) - q_r\right)$$
kr() nonnegative nondecreasing function
Above algorithm converges to unique
solution for any initial condition
qr interpreted as loss/marking probability
Proportionally-Fair Controller
If utility function is
then a controller that implements it is
given by
Pricing interpretation
Can network choose pricing scheme to
achieve fair resource allocation?
Suppose network charges price qr ($/bit)
where $q_r = \sum_{l \in L(r)} p_l$
User’s strategy: spend wr ($/sec.) to
maximize
Optimal User Strategy
equivalently,
Simplified TCP-Reno
suppose
$$x = \frac{\sqrt{2(1-p)}}{T\sqrt{p}} \approx \frac{\sqrt{2}}{T\sqrt{p}}$$
then,
1
U ( x)
Tx
interpretation: minimize (weighted) delay
Is AIMD special?
Consider a window control as follows
cwnd += a*cwnd^n when no loss
cwnd -= b*cwnd^m when loss
where n<m
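The window control above is a one-line update in each direction. In this sketch the defaults (a = 1, b = 0.5, n = 0, m = 1) are chosen so that the default call reproduces ordinary AIMD. Those defaults, and the choice to apply one update per call, are assumptions for illustration.

```python
def window_step(cwnd, loss, a=1.0, b=0.5, n=0.0, m=1.0):
    """Apply one step of the generalized increase/decrease rule."""
    if loss:
        return cwnd - b * cwnd ** m   # decrease scales as cwnd^m
    return cwnd + a * cwnd ** n       # increase scales as cwnd^n
```

Raising n above 0 makes the increase grow with the window itself, which is the superlinear behaviour the high-speed variants rely on.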
Expected change in congestion window
Expected change in rate per unit time
MIMD (n,m)
Consider the controller
where
Then, at equilibrium
Where α = m-n. For stability
Motivation
Congestion Control:
maximize user
utility
Given routing Rli
how to adapt end
rate xi?
Traffic Engineering:
minimize network
congestion
Given traffic xi
how to perform
routing Rli?
Congestion Control Model
Users are indexed by i
[Figure: utility Ui(xi) vs. source rate xi]
max. ∑i Ui(xi) (aggregate utility)
s.t. ∑i Rlixi ≤ cl (capacity constraints)
var. x
Congestion control provides fair
rate allocation amongst users
Traffic Engineering Model
Links are indexed by l
[Figure: cost f(ul) vs. link utilization ul, with ul = 1 marked]
min. ∑l f(ul) (aggregate cost)
s.t. ul = ∑i Rlixi/cl
var. R
Traffic engineering avoids
bottlenecks in the network
Model of Internet Reality
Congestion Control:
max ∑i Ui(xi),
s.t. ∑i Rlixi ≤ cl
xi
Rli
Traffic Engineering:
min ∑l f(ul),
s.t. ul =∑i Rlixi/cl
System Properties
Convergence
Does it achieve some objective?
Benchmark:
max. ∑i Ui(xi)
s.t. Rx ≤ c
Var. x, R
Utility gap between the joint system
and benchmark
Multipath TCP
Joint routing and congestion control
Multipath TCP controller