Congestion control
Lecture 6
CS 653
Why congestion control?
Causes/costs of congestion: scenario 1
 two senders, two receivers
 one router, infinite buffers
 no retransmission
[Figure: Host A and Host B send original data at rate $\lambda_{in}$ through unlimited shared output link buffers; $\lambda_{out}$ is the delivered rate]
 large delays when congested
 throughput saturates
Causes/costs of congestion: scenario 2
 one router, finite buffers
 sender retransmission of lost packet
[Figure: Host A, Host B; $\lambda_{in}$: original data; $\lambda'_{in}$: original data, plus retransmitted data; finite shared output link buffers; $\lambda_{out}$]
Causes/costs of congestion: scenario 2
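 always: $\lambda_{in} = \lambda_{out}$ (goodput)
 “perfect” retransmission only when loss: $\lambda'_{in} > \lambda_{out}$
 retransmission of delayed (not lost) packet makes $\lambda'_{in}$ larger (than perfect case) for same $\lambda_{out}$
[Figure: three plots of $\lambda_{out}$ versus $\lambda_{in}$, each with the horizontal axis running to R/2: (a) $\lambda_{out}$ up to R/2; (b) $\lambda_{out}$ up to R/3; (c) $\lambda_{out}$ up to R/4]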
“costs” of congestion:
 more work (retransmission) for given “goodput”
 unneeded retransmissions: link carries multiple copies of pkt
Causes/costs of congestion: scenario 3
 four senders
 multihop paths
 timeout/retransmit
Q: what happens as $\lambda_{in}$ and $\lambda'_{in}$ increase?
[Figure: Host A, Host B; $\lambda_{in}$: original data; $\lambda'_{in}$: original data, plus retransmitted data; finite shared output link buffers; $\lambda_{out}$]
Causes/costs of congestion: scenario 3
[Figure: multihop network with Host A and Host B; output rate $\lambda_{out}$]
Another “cost” of congestion:
 when packet dropped, any “upstream” transmission capacity used for that packet was wasted!
Two broad approaches towards congestion control
End-end congestion control:
 no explicit feedback from network
 congestion inferred from end-system observed loss, delay
 approach taken by TCP
Network-assisted congestion control:
 routers provide feedback to endhosts
   single bit indicating congestion (SNA, DECbit, ATM, TCP/IP ECN)
   explicit rate sender should send
 recent proposals [XCP] [RCP] revisit ATM ideas
TCP congestion control
Components of TCP congestion control
 Slow start
   Multiplicatively increase (double) window every RTT
 Congestion avoidance
   Additively increase (by 1 MSS) window every RTT
 Loss
   Multiplicatively decrease (halve) window
 Timeout
   Set cwnd to 1 MSS
   Multiplicatively increase (double) retransmission timeout upon each further consecutive loss

Retransmission timeout estimation
 Calculate EstimatedRTT using moving average
  $\mathrm{EstimatedRTT}_i = (1-\alpha)\,\mathrm{EstimatedRTT}_{i-1} + \alpha\,\mathrm{SampleRTT}_i$
 Calculate deviation wrt moving average
  $\mathrm{DevRTT}_i = (1-\beta)\,\mathrm{DevRTT}_{i-1} + \beta\,|\mathrm{SampleRTT}_i - \mathrm{EstimatedRTT}_{i-1}|$
 Timeout = EstimatedRTT + 4*DevRTT
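The two moving averages and the timeout rule can be sketched as a small estimator. This is a minimal sketch, not a reference implementation: the smoothing weights (called alpha and beta here) default to 0.125 and 0.25, which are commonly used values that the slides do not fix, the initial deviation of zero is an assumption, and the class and method names are illustrative.

```python
class RtoEstimator:
    """Tracks EstimatedRTT and DevRTT as exponentially weighted moving averages."""

    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha = alpha                  # weight of the newest RTT sample (assumed value)
        self.beta = beta                    # weight of the newest deviation (assumed value)
        self.estimated_rtt = first_sample   # seed the average with the first sample
        self.dev_rtt = 0.0                  # assumed starting deviation

    def update(self, sample_rtt):
        # The deviation is measured against the previous estimate,
        # so it is updated before the estimate itself.
        deviation = abs(sample_rtt - self.estimated_rtt)
        self.dev_rtt = (1 - self.beta) * self.dev_rtt + self.beta * deviation
        self.estimated_rtt = (1 - self.alpha) * self.estimated_rtt + self.alpha * sample_rtt
        return self.timeout()

    def timeout(self):
        # Timeout = EstimatedRTT + 4 * DevRTT
        return self.estimated_rtt + 4 * self.dev_rtt


if __name__ == "__main__":
    est = RtoEstimator(first_sample=0.100)
    for sample in (0.100, 0.200, 0.110):
        print(f"sample={sample:.3f}s timeout={est.update(sample):.4f}s")
```

A single slow sample raises the timeout mostly through the deviation term, which is why the factor of 4 gives a safety margin when RTTs fluctuate.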
TCP Throughput
TCP throughput: A very very simple model
 What’s the average throughput of TCP as a function of window size and RTT T?
   Ignore slow start
 Let W be the window size when loss occurs.
 When window is W, throughput is W/T
 Just after loss, window drops to W/2, throughput to W/(2T)
 Average throughput: 3W/(4T)
TCP throughput: A very simple model
 But what is W when loss occurs?
   C = link capacity in packets/sec
   Q = queue capacity in number of packets
 When window is w and queue has q packets, TCP is sending at rate w/(T + q/C)
 For maintaining utilization and steady state
   Just before loss, rate = W/(T + Q/C) = C
   Just after loss, rate = W/(2T) = C
 For Q = CT (a common rule of thumb for sizing router buffers), a loss occurs every $\frac{3W}{4} \cdot Q = \frac{3W^2}{8}$ packets
Deriving TCP throughput/loss relationship
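[Figure: TCP window size versus time (rtt), a sawtooth between W/2 and W; one ramp is one “period”]
# packets sent per “period” =
$$
\begin{aligned}
\frac{W}{2} + \left(\frac{W}{2}+1\right) + \dots + W
&= \sum_{n=0}^{W/2}\left(\frac{W}{2}+n\right) \\
&= \left(\frac{W}{2}+1\right)\frac{W}{2} + \sum_{n=0}^{W/2} n \\
&= \left(\frac{W}{2}+1\right)\frac{W}{2} + \frac{(W/2)(W/2+1)}{2} \\
&= \frac{3}{8}W^2 + \frac{3}{4}W \\
&\approx \frac{3}{8}W^2
\end{aligned}
$$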
Deriving TCP throughput/loss relationship
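[Figure: TCP window size versus time (rtt), a sawtooth between W/2 and W; one ramp is one “period”]
# packets sent per “period” $\approx \frac{3}{8}W^2$
1 packet lost per “period” implies:
$p_{loss} \approx \frac{8}{3W^2}$ or: $W = \sqrt{\frac{8}{3\,p_{loss}}}$
$B = \text{avg. throughput} = \frac{3}{4}W$ packets/rtt
$B = \text{avg. throughput} = \frac{1.22}{\sqrt{p_{loss}}}$ packets/rtt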
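The derivation can be checked numerically. The sketch below, with illustrative function names, counts the packets in one sawtooth period exactly (window growing by one packet per RTT from W/2 to W, W assumed even) and then turns a loss rate into the average throughput in packets per RTT.

```python
import math


def packets_per_period(w):
    """Exact packet count for one period: W/2 + (W/2 + 1) + ... + W (w must be even)."""
    half = w // 2
    return sum(half + n for n in range(half + 1))


def throughput_pkts_per_rtt(p_loss):
    """Average throughput (3/4)W with W = sqrt(8 / (3 p_loss))."""
    w = math.sqrt(8 / (3 * p_loss))
    return 0.75 * w


if __name__ == "__main__":
    w = 100
    exact = packets_per_period(w)
    closed_form = 3 * w * w / 8 + 3 * w / 4
    print(exact, closed_form)                    # both count the same period
    p = 0.01
    print(throughput_pkts_per_rtt(p), 1.22 / math.sqrt(p))
```

The constant 1.22 is the rounded value of (3/4)·sqrt(8/3) = sqrt(3/2).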
Alternate fluid model
 Rate of change of sending rate = term
inversely proportional to current rate with
probability (1-p) - term proportional to
current rate with probability p
 In steady state,
TCP throughput: A better loss rate
based “simple” model [PFTK]
 With many flows, loss rate and delay are not affected much by a single TCP flow
   TCP behavior completely specified by loss and delay pattern along path (bounded by bottleneck capacity)
 Given loss rate p and delay T, what is TCP’s throughput B packets/sec taking timeouts into account?
What is PFTK modeling?
 Independent loss probability p across rounds
 Loss indicated by triple duplicate acks
 Bursty loss in a round: if some packet lost, all following packets in that round also lost
 Timeout if < three duplicate acks received
PFTK empirical validation: Low loss
PFTK empirical validation: High loss
Loss-based TCP
 Evolution of loss-based TCP
   Tahoe (without fast recovery)
   Reno (triple duplicate acks: fast retransmit + fast recovery)
   NewReno (Reno + handling multiple losses better)
   SACK (selective acknowledgment) common today
 Q: what if loss not due to congestion?
Delay-based TCP Vegas
 Uses delay as a signal of congestion
 Idea: try to keep a small constant number of packets at bottleneck queue
   Expected = W/BaseRTT
   Actual = W/CurRTT
   Diff = Expected - Actual
   Try to keep Diff between fixed thresholds 1 and 3
 More recent FAST TCP based on Vegas.
 Delay-based TCP not widely used today
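A rough sketch of the Vegas idea follows. Two things here are assumptions rather than content from the slides: Diff is multiplied by BaseRTT so that the thresholds 1 and 3 are counted in packets queued at the bottleneck, and the window moves by one packet per RTT when Diff leaves the band. The function and parameter names are illustrative.

```python
def vegas_update(cwnd, base_rtt, cur_rtt, low=1.0, high=3.0):
    """Return the next window given the smallest RTT seen and the current RTT."""
    expected = cwnd / base_rtt            # rate if nothing were queued
    actual = cwnd / cur_rtt               # rate actually achieved
    queued = (expected - actual) * base_rtt  # assumed scaling: Diff in packets
    if queued < low:
        return cwnd + 1                   # too little in the queue: speed up
    if queued > high:
        return cwnd - 1                   # queue building up: back off
    return cwnd                           # inside the target band: hold


if __name__ == "__main__":
    print(vegas_update(10, base_rtt=0.100, cur_rtt=0.100))  # no queueing delay
    print(vegas_update(10, base_rtt=0.100, cur_rtt=0.125))  # about 2 packets queued
    print(vegas_update(10, base_rtt=0.100, cur_rtt=0.200))  # about 5 packets queued
```

Unlike loss-based control, this reacts before any packet is dropped, which is also why it tends to lose out when sharing a queue with loss-based flows.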
TCP-Friendliness
 Can we try MyFavNew TCP?
   Well, is it TCP-friendly?
 Any alternative congestion control scheme needs to coexist with TCP in FIFO queues in the best-effort Internet, or be isolated from TCP.
 To co-exist with TCP, it must impose the same long-term load on the network:
   No greater long-term throughput as a function of packet loss and delay so TCP doesn't suffer
   Not significantly less long-term throughput or it's not too useful

TCP friendly rate control (TFRC)
 Use a model of TCP's throughput as a function of the loss rate and RTT directly in a congestion control algorithm.
   If transmission rate is higher than that given by the model, reduce the transmission rate to the model's rate.
   Otherwise increase the transmission rate.
 E.g., DCCP (Datagram Congestion Control Protocol), for congestion control of unreliable flows
Q: how to measure/use loss rate and RTT?
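The rate rule can be sketched with the simple loss-based model from earlier in the lecture, B = sqrt(3/2)/(RTT·sqrt(p)) packets/sec (about 1.22/(RTT·sqrt(p))). This is only an illustration: the additive `step` and the cap at the model rate are assumptions, the function names are hypothetical, and a deployed scheme would use a fuller throughput model that accounts for timeouts.

```python
import math


def model_rate(loss_rate, rtt_s):
    """Simple TCP model: packets/sec as a function of loss rate and RTT."""
    return math.sqrt(1.5) / (rtt_s * math.sqrt(loss_rate))


def adjust_rate(current_rate, loss_rate, rtt_s, step=10.0):
    """Apply the slide's rule; `step` (packets/sec) is an assumed increment."""
    target = model_rate(loss_rate, rtt_s)
    if current_rate > target:
        return target                          # too fast: drop to the model's rate
    return min(current_rate + step, target)    # otherwise probe upward, capped at the model


if __name__ == "__main__":
    print(model_rate(0.01, 0.1))
    print(adjust_rate(200.0, 0.01, 0.1))
    print(adjust_rate(50.0, 0.01, 0.1))
```

The open question on the slide remains the hard part in practice: both inputs must be smoothed estimates, since raw per-packet loss and RTT samples are too noisy to feed into the model directly.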
High speed TCP
TCP in high speed networks
 Example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
 Requires window size W = 83,333 in-flight segments
 Throughput in terms of loss rate: $B = \frac{1.22 \cdot MSS}{RTT\,\sqrt{p}}$
 ➜ $p = 2 \cdot 10^{-10}$ or equivalently at most one drop every couple hours!
 New versions of TCP for high-speed networks needed!
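The numbers in this example can be reproduced from the simple throughput model, B = 1.22/(RTT·sqrt(p)) packets/sec. The sketch below uses illustrative function names and treats the required window as the average number of in-flight segments.

```python
def window_segments(target_bps, rtt_s, segment_bytes):
    """Segments that must be in flight to sustain the target rate."""
    return target_bps * rtt_s / (segment_bytes * 8)


def required_loss_rate(target_bps, rtt_s, segment_bytes):
    """Invert B = 1.22 / (RTT * sqrt(p)), with B measured in packets per RTT."""
    pkts_per_rtt = window_segments(target_bps, rtt_s, segment_bytes)
    return (1.22 / pkts_per_rtt) ** 2


def hours_between_drops(target_bps, rtt_s, segment_bytes):
    """Mean time between losses when one packet in every 1/p is dropped."""
    p = required_loss_rate(target_bps, rtt_s, segment_bytes)
    pkts_per_sec = target_bps / (segment_bytes * 8)
    return 1 / (p * pkts_per_sec) / 3600


if __name__ == "__main__":
    args = (10e9, 0.100, 1500)
    print(round(window_segments(*args)))
    print(required_loss_rate(*args))
    print(hours_between_drops(*args))
```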
TCP’s long recovery delay
 More than an hour to recover from a loss or timeout
[Figure: window recovery after a loss, annotated ~41,000 packets, ~60,000 RTTs, ~100 minutes]
High-speed TCP
 Proposals
   Scalable TCP, HSTCP, FAST, CUBIC
 General idea is to use superlinear window increase
 Particularly useful in high bandwidth-delay product regimes
Alternate choices of response functions
Scalable TCP - S = 0.15/p
Q: Whatever happened to TCP-friendly?
High speed TCP [Floyd]
 additive increase, multiplicative decrease
 increments, decrements depend on window size
Scalable TCP (STCP) [T. Kelly]
 multiplicative increase, multiplicative decrease
   W ← W + a per ACK
   W ← W - bW per window with loss
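A short sketch shows why a constant per-ACK increment amounts to multiplicative increase: a window of W packets produces about W ACKs per RTT, so the window grows by roughly a·W each round. The defaults a = 0.01 and b = 0.125 are values often quoted for Scalable TCP but are assumptions here, since the slides do not give them; the function names are illustrative.

```python
def on_ack(w, a=0.01):
    """W <- W + a, applied once per ACK."""
    return w + a


def on_loss(w, b=0.125):
    """W <- W - b*W, applied once per window that saw a loss."""
    return w - b * w


def round_without_loss(w, a=0.01):
    """One RTT without loss: roughly one ACK per packet in the window."""
    for _ in range(int(w)):
        w = on_ack(w, a)
    return w


if __name__ == "__main__":
    w = 100.0
    print(round_without_loss(w))   # about (1 + a) * w
    print(on_loss(w))              # (1 - b) * w
```

Because both the increase and the decrease scale with W, the recovery time after a loss is roughly independent of the window size, in contrast to the long linear recovery of standard TCP.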
STCP dynamics
From 1st PFLDnet Workshop, Tom Kelly
Active Queue Management
Router Queue Management
 normally, packets dropped only when queue overflows
 “drop-tail” queueing
[Figure: two ISP routers connected across the Internet; packets P1 to P6 wait in the queue of an FCFS scheduler]
The case against drop-tail queue management
[Figure: packets P1 to P6 queued at an FCFS scheduler]
 Large queues in routers are “a bad thing”
   Delay: end-to-end latency dominated by length of queues at switches in network
 Allowing queues to overflow is “a bad thing”
   Fairness: connections transmitting at high rates can starve connections transmitting at low rates
   Utilization: connections can synchronize their response to congestion

Idea: early random packet drop
[Figure: packets P1 to P6 queued at an FCFS scheduler]
 When queue length exceeds threshold, drop packets with queue length dependent probability
   probabilistic packet drop: flows see same loss rate
   problem: bursty traffic (burst arrives when queue is near threshold) can be over penalized
Random early detection (RED) packet drop
[Figure: average queue length versus time, with regions “No drop” below the min threshold, “Probabilistic early drop” between the min and max thresholds, and “Forced drop” between the max threshold and the max queue length; drop probability grows with the average queue length]
 Use exponential average of queue length to determine when to drop
   avoid overly penalizing short-term bursts
   react to longer term trends
 Tie drop prob. to weighted avg. queue length
   avoids over-reaction to mild overload conditions
Random early detection (RED) packet drop
[Figure: average queue length versus time with “No drop”, “Probabilistic early drop”, and “Forced drop” regions; beside it, drop probability versus weighted average queue length, rising from 0 at min to maxp at max, then 100%]
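The drop-probability curve can be sketched as a piecewise function of the weighted average queue length. The linear ramp between the two thresholds is the classic RED shape and is assumed here; the averaging weight default and all names are illustrative rather than taken from the slides.

```python
def update_average(avg, queue_len, weight=0.002):
    """Exponential average of the instantaneous queue length (weight is assumed)."""
    return (1 - weight) * avg + weight * queue_len


def drop_probability(avg, min_th, max_th, max_p):
    """No drop below min_th, linear ramp up to max_p, forced drop at or above max_th."""
    if avg < min_th:
        return 0.0
    if avg >= max_th:
        return 1.0
    return max_p * (avg - min_th) / (max_th - min_th)


if __name__ == "__main__":
    for avg in (5, 10, 20, 29, 30):
        print(avg, drop_probability(avg, min_th=10, max_th=30, max_p=0.1))
```

Using the averaged queue rather than the instantaneous one is what lets a short burst pass through without triggering drops.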
RED summary: why random drop?
 Provide gentle transition from no-drop to all-drop
   Provide “gentle” early warning
 Avoid synchronized loss bursts among sources
 Provide same loss rate to all sessions:
   With tail-drop, low-sending-rate sessions can be completely starved
Random early detection (RED) today
 Many (5) parameters: nontrivial to tune (at least for HTTP traffic)
 Gains over drop-tail FCFS not that significant
 Still not widely deployed …
Why randomization important?
 Synchronization of periodic routing updates
 Periodic losses observed in end-end Internet traffic
source: Floyd, Jacobson 1994
Router update operation: time spent in state depends on msgs received from others (weak coupling between routers processing)
[Figure: router update state machine]
 wait: on timeout, or link fail update, prepare own routing update (time: TC)
 while preparing: receive update from neighbor, process (time: TC2)
 <ready>: send update (time: Td to arrive at dest), then start_timer (uniform: Tp +/- Tr) and wait
 while waiting: receive update from neighbor, process
Router synchronization
 20 (simulated) routers broadcasting updates to each other
 x-axis: time until routing update sent relative to start of round
 By t=100,000 all router rounds are of length 120!
 synchronization or lack thereof depends on system parameters
Avoiding synchronization
 Choose random timer component, Tr large (e.g., several multiples of TC)
[Figure: the router update state machine again, with start_timer (uniform: Tp +/- Tr) annotated “Add enough randomization to avoid synchronization”]
Randomization
 Takeaway message:
   randomization makes a system simple and robust
Background transport: TCP Nice
What are background transfers?
 Data that humans are not waiting for
 Non-deadline-critical
 Unlimited demand
 Examples
   Prefetched traffic on the Web
   File system backup
   Large-scale data distribution services
   Background software updates
   Media file sharing

Desired Properties
 Utilization of spare network capacity
 No interference with regular transfers

Self-interference
• applications hurt their own performance

Cross-interference
• applications hurt other applications’ performance
TCP Nice
 Goal: abstraction of free infinite bandwidth
 Applications say what they want

OS manages resources and scheduling
 Self tuning transport layer
Reduces risk of interference with foreground
traffic
 Significant utilization of spare capacity by
background traffic
 Simplifies application design

Why change TCP?
 TCP does network resource management

Need flow prioritization
 Alternative: router prioritization
  + More responsive, simple one bit priority
  - Hard to deploy
 Question:
   Can end-to-end congestion control achieve non-interference and utilization?
TCP Nice
 Proactively detects congestion
 Uses increasing RTT as congestion signal
   Congestion → incr. queue lengths → incr. RTT
 Aggressive responsiveness to congestion
 Only modifies sender-side congestion control
   Receiver and network unchanged
 TCP friendly

TCP Nice
 Basic algorithm
  1. Early Detection → thresh. queue length incr. in RTT
  2. Multiplicative decrease on early congestion
  3. Allow cwnd < 1.0 (despite no loss)
 per-ack operation:
    if (curRTT > minRTT + threshold*(maxRTT - minRTT))
        numCong++;
 per-round operation:
    if (numCong > f·W)
        W ← W/2
    else { … AIMD congestion control }
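The per-ack and per-round operations can be sketched as follows. The regular AIMD branch is elided on the slide, so the sketch takes it as a caller-supplied function (`aimd_step`, a hypothetical name) instead of guessing at it. The defaults for `threshold` and `fraction` (the slide's f) and the reset of the counter at the end of each round are assumptions made for illustration.

```python
class NiceSketch:
    """Counts delayed ACKs within a round and halves the window on early congestion."""

    def __init__(self, threshold=0.2, fraction=0.5):
        self.threshold = threshold   # assumed: position between minRTT and maxRTT
        self.fraction = fraction     # assumed: the slide's f
        self.num_cong = 0

    def on_ack(self, cur_rtt, min_rtt, max_rtt):
        # per-ack operation: count ACKs whose RTT signals a building queue
        if cur_rtt > min_rtt + self.threshold * (max_rtt - min_rtt):
            self.num_cong += 1

    def on_round_end(self, w, aimd_step):
        # per-round operation: halve on early congestion, else defer to regular AIMD
        early_congestion = self.num_cong > self.fraction * w
        self.num_cong = 0            # assumed: the counter restarts every round
        return w / 2 if early_congestion else aimd_step(w)


if __name__ == "__main__":
    nice = NiceSketch()
    for _ in range(6):               # 6 of 10 ACKs in the round are delayed
        nice.on_ack(cur_rtt=0.19, min_rtt=0.10, max_rtt=0.20)
    print(nice.on_round_end(10.0, aimd_step=lambda w: w + 1))
    print(nice.on_round_end(5.0, aimd_step=lambda w: w + 1))
```

Because the halving needs no packet loss, the window can keep shrinking below one packet, which is the third point of the basic algorithm.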
Nice: the works
[Figure: Reno versus Nice window steps (Add, Mul) against a bottleneck queue of B pkts with threshold t.B and service rate m; labels: minRTT = t, maxRTT = tB/m]
 Non-interference: getting out of the way in time
 Utilization: maintaining a small queue
Network Conditions
[Figure: foreground document latency (sec, 0.1 to 1e3) versus spare capacity (1 to 100) for V0, Reno, Nice, Vegas, and Router Prio]
 Nice causes low interference to foreground Web traffic
even when there isn’t much spare capacity.
Scalability
[Figure: document latency (sec, 0.1 to 1e3) versus number of BG flows (1 to 100) for Vegas, Reno, V0, Nice, and Router Prio]
 W < 1 allows Nice to scale to any number of background flows
Utilization
[Figure: BG throughput (KB, 0 to 8e4) versus number of BG flows (1 to 100) for Vegas, Reno, Router Prio, V0, and Nice]
 Nice utilizes 50-80% of spare capacity w/o stealing
any bandwidth from FG
Wide-area network experiments
What is TCP optimizing?
How does TCP allocate network
resources?
 Problem: Given a network and some number of
long-lived TCP connections between different
source-destination routes, can we model the
resulting resource allocation?
 How to model the interaction between TCP and
the network?

Recall: PFTK like models assumed network
conditions are not affected by (a single) TCP flow
Optimization-based approach towards congestion
control
Resource allocation as optimization problem:
 How to allocate resources (e.g., bandwidth) to
optimize some objective function
 Maybe not possible to obtain exact optimality but..
 optimization framework as means to explicitly
steer network towards desirable operating point
 practical congestion control as distributed
asynchronous implementations of optimization
algorithm
 systematic approach towards protocol design
Model
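 Network: Links l each of capacity $c_l$
 Sources s: $(L(s), U_s(x_s))$
   L(s) - links used by source s
   $U_s(x_s)$ - utility if source rate = $x_s$
[Figure: example utility function $U_s(x_s)$ versus $x_s$ for elastic application]
[Figure: two links with capacities $c_1$ and $c_2$; source 1 crosses both links, source 2 uses link 1, source 3 uses link 2]
$x_1 + x_2 \le c_1$
$x_1 + x_3 \le c_2$
Q: What are possible allocations with say unit capacity links?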
Optimization Problem
$$\max_{x_s \ge 0} \sum_{s} U_s(x_s) \quad \text{subject to} \quad \sum_{s \in S(l)} x_s \le c_l, \ \forall l \in L$$
“system” problem
 maximize system utility (note: all sources “equal”)
 constraint: bandwidth used less than capacity
 centralized solution to optimization impractical
   must know all utility functions
   impractical for large number of sources
 can we view congestion control as distributed asynchronous algorithms to solve this problem?
The user view
 User can choose amount to pay per unit time, $w_s$
 Would like allocated bandwidth $x_s$ in proportion to $w_s$: $x_s = \frac{w_s}{p_s}$
 $p_s$ could be viewed as charge per unit flow for user s
user problem:
$$\max\; U_s\!\left(\frac{w_s}{p_s}\right) - w_s \quad \text{subject to} \quad w_s \ge 0$$
where $U_s(w_s/p_s)$ is the user’s utility and $w_s$ is the cost
The network view
 Suppose network knows vector $\{w_s\}$, chosen by users
 Network wants to maximize logarithmic utility function
network problem:
$$\max_{x_s \ge 0} \sum_{s} w_s \log x_s \quad \text{subject to} \quad \sum_{s \in S(l)} x_s \le c_l$$
Solution existence
 There exist prices $p_s$, source rates $x_s$, and amount-to-pay-per-unit-time $w_s = p_s x_s$ such that
   $\{w_s\}$ solves the user problem:
$$\max\; U_s\!\left(\frac{w_s}{p_s}\right) - w_s \quad \text{subject to} \quad w_s \ge 0$$
   $\{x_s\}$ solves the network problem:
$$\max_{x_s \ge 0} \sum_{s} w_s \log x_s \quad \text{subject to} \quad \sum_{s \in S(l)} x_s \le c_l$$
   $\{x_s\}$ is the unique solution to the system problem:
$$\max_{x_s \ge 0} \sum_{s} U_s(x_s) \quad \text{subject to} \quad \sum_{s \in S(l)} x_s \le c_l, \ \forall l \in L$$
Proportional Fairness
 Vector of rates $\{x_s\}$ is proportionally fair if feasible and for any other feasible vector $\{x_s^*\}$:
$$\sum_{s \in S} \frac{x_s^* - x_s}{x_s} \le 0$$
 Result: if $w_r = 1$, then $\{x_s\}$ solves the network problem IFF it is proportionally fair
 Similar result exists for the case that $w_r$ not equal 1.
Max-min Fairness
Rates $\{x_r\}$ max-min fair if for any other feasible rates $\{y_r\}$, if $y_s > x_s$, then $\exists\, p$ such that $x_p \le x_s$ and $y_p < x_p$
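One standard way to compute a max-min fair allocation is progressive filling: raise all rates together and freeze every source that crosses a link as soon as that link fills up. The slides do not describe this algorithm, so the sketch below is an illustration under that assumption, with hypothetical names, applied to the three-source, two-link example from the model slide. It assumes every source uses at least one link.

```python
def max_min_rates(capacity, routes):
    """capacity: link -> capacity; routes: source -> set of links it uses."""
    rates = {s: 0.0 for s in routes}
    remaining = dict(capacity)
    active = set(routes)
    while active:
        # Equal share each link could still give to its unfrozen sources.
        shares = {}
        for link, cap in remaining.items():
            users = [s for s in active if link in routes[s]]
            if users:
                shares[link] = cap / len(users)
        if not shares:
            break
        increment = min(shares.values())
        for s in active:
            rates[s] += increment
            for link in routes[s]:
                remaining[link] -= increment
        # Links whose share equals the increment are now full: freeze their sources.
        saturated = {l for l, share in shares.items() if share - increment <= 1e-12}
        active = {s for s in active if not (routes[s] & saturated)}
    return rates


if __name__ == "__main__":
    routes = {"s1": {"l1", "l2"}, "s2": {"l1"}, "s3": {"l2"}}
    print(max_min_rates({"l1": 1.0, "l2": 1.0}, routes))
    print(max_min_rates({"l1": 1.0, "l2": 2.0}, routes))
```

With unit capacities every source gets 1/2, whereas the proportionally fair allocation for the same network gives the two-link source 1/3 and the others 2/3.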
Minimum potential delay fairness
 Rates {xr} are minimum potential delay fair
if Ur (xr) = -wr/xr
Interpretation: if wr is file size, then wr/xr
is transfer time; optimization problem is
to minimize sum of transfer delays
Max-min Fairness
rates $\{x_r\}$ max-min fair if for any other feasible rates $\{y_r\}$, if $y_s > x_s$, then $\exists\, p$ such that $x_p \le x_s$ and $y_p < x_p$
What is corresponding utility function?
$$U_r(x_r) = \lim_{\alpha \to \infty} \frac{x_r^{1-\alpha}}{1-\alpha}$$
Solving the network problem
 Results so far: existence - solution exists
with given properties
 How to compute solution?
 Ideally:
distributed solution easily embodied
in protocol
 Should reveal insight into existing protocol
Solving the network problem


$$\frac{d}{dt}x_s(t) = k\left(w_s - x_s(t)\sum_{l \in L(s)} p_l(t)\right)$$
 left-hand side: change in bandwidth allocation at s
 $w_s$ term: linear increase
 $x_s(t)\sum_{l \in L(s)} p_l(t)$ term: multiplicative decrease
where
$$p_l(t) = g_l\!\left(\sum_{s:\, l \in L(s)} x_s(t)\right)$$
congestion “signal”: function of aggregate rate at link l, fed back to s.
Solving the network problem


$$\frac{d}{dt}x_s(t) = k\left(w_s - x_s(t)\sum_{l \in L(s)} p_l(t)\right) \qquad (*)$$
 Results:
   (*) converges to solution of relaxation of network problem
   $x_s(t)\sum_{l \in L(s)} p_l(t)$ converges to $w_s$
 Interpretation: TCP-like algorithm iteratively solves optimal rate allocation!
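The convergence result can be illustrated with a small Euler simulation of the rate equation on the three-source, two-link example. Everything specific here is an assumption chosen for the sketch: the price function g_l(y) = (y/c_l)^4, the gain k, the step size, and the starting rates. With this price function, unit capacities, and unit weights, the fixed point has the single-link sources at exactly twice the rate of the two-link source.

```python
def link_price(load, cap):
    """Assumed price function g_l: grows steeply as the load approaches capacity."""
    return (load / cap) ** 4


def simulate_primal(routes, capacity, weights, k=1.0, dt=0.01, steps=20000):
    """Euler integration of dx_s/dt = k * (w_s - x_s * sum of prices on s's path)."""
    rates = {s: 0.1 for s in routes}
    for _ in range(steps):
        load = {l: sum(rates[s] for s in routes if l in routes[s]) for l in capacity}
        price = {l: link_price(load[l], capacity[l]) for l in capacity}
        path_price = {s: sum(price[l] for l in routes[s]) for s in routes}
        rates = {
            s: rates[s] + dt * k * (weights[s] - rates[s] * path_price[s])
            for s in routes
        }
    return rates


if __name__ == "__main__":
    routes = {"s1": {"l1", "l2"}, "s2": {"l1"}, "s3": {"l2"}}
    capacity = {"l1": 1.0, "l2": 1.0}
    weights = {"s1": 1.0, "s2": 1.0, "s3": 1.0}
    for source, rate in sorted(simulate_primal(routes, capacity, weights).items()):
        print(source, round(rate, 4))
```

The link loads settle slightly above capacity because a smooth price function only solves a relaxation of the network problem; the 1:2 rate ratio matches the proportionally fair allocation (1/3, 2/3, 2/3) of the exact problem.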
Source Algorithm
 Source needs only its path price:
$$\dot{x}_r = k_r(x_r)\left(U_r'(x_r) - q_r\right)$$
 $k_r(\cdot)$ nonnegative nondecreasing function
 Above algorithm converges to unique solution for any initial condition
 $q_r$ interpreted as loss/marking probability
Proportionally-Fair Controller
If utility function is
then a controller that implements it is
given by
Pricing interpretation
 Can network choose pricing scheme to
achieve fair resource allocation?
 Suppose network charges price q_r ($/bit) where q_r = Σ_{l ∈ r} p_l
 User’s strategy: spend wr ($/sec.) to
maximize
Optimal User Strategy
 equivalently,
Simplified TCP-Reno
 suppose
$$x = \frac{\sqrt{2(1-p)}}{T\sqrt{p}} \approx \frac{\sqrt{2}}{T\sqrt{p}}$$
 then,
$$U(x) = -\frac{1}{Tx}$$
 interpretation: minimize (weighted) delay
Is AIMD special?
 Consider a window control as follows
   cwnd += a*cwnd^n when no loss
   cwnd -= b*cwnd^m when loss
   where n<m
 Expected change in congestion window
 Expected change in rate per unit time
MIMD (n,m)
 Consider the controller
 where
 Then, at equilibrium
 Where α = m-n. For stability
Motivation
Congestion Control: maximize user utility
 Given routing $R_{li}$, how to adapt end rate $x_i$?
Traffic Engineering: minimize network congestion
 Given traffic $x_i$, how to perform routing $R_{li}$?
Congestion Control Model
Users are indexed by i
max. $\sum_i U_i(x_i)$ (aggregate utility)
s.t. $\sum_i R_{li} x_i \le c_l$ (capacity constraints)
var. x
[Figure: utility $U_i(x_i)$ versus source rate $x_i$]
Congestion control provides fair rate allocation amongst users
Traffic Engineering Model
Links are indexed by l
min. $\sum_l f(u_l)$ (aggregate cost)
s.t. $u_l = \sum_i R_{li} x_i / c_l$
var. R
[Figure: cost $f(u_l)$ versus link utilization $u_l$, with $u_l = 1$ marked]
Traffic engineering avoids bottlenecks in the network
Model of Internet Reality
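Congestion Control: max $\sum_i U_i(x_i)$, s.t. $\sum_i R_{li} x_i \le c_l$
Traffic Engineering: min $\sum_l f(u_l)$, s.t. $u_l = \sum_i R_{li} x_i / c_l$
[Figure: feedback loop in which congestion control passes rates $x_i$ to traffic engineering and traffic engineering passes routing $R_{li}$ back]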
System Properties
 Convergence
 Does it achieve some objective?
 Benchmark:
max. ∑i Ui(xi)
s.t. Rx ≤ c
Var. x, R
 Utility gap between the joint system
and benchmark
Multipath TCP
Joint routing and congestion control
 Multipath TCP controller
