
FAST TCP: Motivation, Architecture, Algorithms, Performance
Cheng Jin, David Wei, Steven Low
http://netlab.caltech.edu
Difficulties at large window

- Equilibrium problem
  - Packet level: AI too slow, MD too drastic.
  - Flow level: requires very small loss probability.
- Dynamic problem
  - Packet level: must oscillate on a binary signal.
  - Flow level: unstable at large window.
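For scale, apply the standard square-root model of Reno's equilibrium (an assumption of this note, not stated on the slide): sustaining window w requires a loss probability p that shrinks quadratically in w,

\[
w \;\approx\; \frac{1.22}{\sqrt{p}}
\quad\Longrightarrow\quad
p \;\approx\; \left(\frac{1.22}{w}\right)^{2} .
\]

A 10 Gbps flow with 1500-byte packets and 100 ms RTT needs \(w \approx 83{,}000\) packets, hence \(p \approx 2\times10^{-10}\): roughly one loss every couple of hours, far smaller than the loss rates real links exhibit.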
Problem: binary signal
[Plot: TCP window oscillates when driven by a binary loss signal]
Solution: multibit signal
[Plot: FAST window stabilized by a multibit (queueing delay) signal]
Queueing Delay in FAST

- Queueing delay is not used to avoid loss
- Queueing delay defines a target number of packets (α) to be buffered for each flow
- Queueing delay allows FAST to estimate the distance from the target
Problem: no target

- Reno: AIMD(1, 0.5)
  ACK:  W ← W + 1/W
  Loss: W ← W − 0.5 W
- HSTCP: AIMD(a(w), b(w))
  ACK:  W ← W + a(w)/W
  Loss: W ← W − b(w) W
- STCP: MIMD(1/100, 1/8)
  ACK:  W ← W + 0.01
  Loss: W ← W − 0.125 W
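These update rules map directly to code. Below is a minimal Python sketch of the per-ACK and per-loss adjustments using the constants from this slide; the HSTCP gain functions a(w) and b(w) are placeholders, since their actual table-driven definitions are not given here.

    # Minimal sketch of the per-ACK / per-loss window updates above.
    # a(w) and b(w) for HSTCP are placeholders; the real functions
    # are table-driven and not defined on this slide.

    def reno_ack(w):                      # AIMD(1, 0.5): +1 pkt per RTT
        return w + 1.0 / w

    def reno_loss(w):                     # halve the window on loss
        return w - 0.5 * w

    def hstcp_ack(w, a=lambda w: 1.0):    # placeholder a(w)
        return w + a(w) / w

    def hstcp_loss(w, b=lambda w: 0.5):   # placeholder b(w)
        return w - b(w) * w

    def stcp_ack(w):                      # MIMD(1/100, 1/8)
        return w + 0.01

    def stcp_loss(w):
        return w - 0.125 * w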
Solution: estimate target
[Diagram: FAST flow state machine with states Slow Start, Conv(ergence), Equil(ibrium), and Loss Rec(overy); scalable to any target window w*]
Implementation Strategy

- Common flow level dynamics:
  window adjustment = control gain × flow level goal
  \[ \dot{w}_i(t) \;=\; \kappa(t)\left(1 - \frac{q_i(t)}{U_i'(t)}\right) \]
- Small adjustment when close, large far away
- Need to estimate how far the current state is from the target
- Scalable
- Queueing delay easier to estimate compared with extremely small loss probability
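To see "small adjustment when close, large far away" numerically, here is a purely illustrative Euler-step simulation of the dynamic above, assuming FAST's utility U_i(x) = α log x (so U_i'(x) = α/x); the gain, delays, and α are made-up constants.

    # Illustrative Euler simulation of  dw/dt = kappa(t) * (1 - q_i(t)/U_i'(t)),
    # assuming U_i(x) = alpha * log(x), i.e. U_i'(x) = alpha / x.
    # Equilibrium: x * q = alpha packets buffered per flow.

    kappa, alpha = 100.0, 200.0   # gain and target backlog (illustrative)
    d, q = 0.10, 0.01             # propagation and queueing delay (s), fixed here

    w = 100.0                     # initial window (pkts)
    for _ in range(500):
        x = w / (d + q)                        # flow rate (pkts/s)
        w += kappa * (1.0 - q * x / alpha)     # large step far away, small near
    print(round(w))               # -> 2200 = alpha * (d + q) / q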
Window Control Algorithm

- RTT: exponential moving average with weight of min{1/8, 3/cwnd}
- baseRTT: latency, or minimum RTT
- α determines fairness and convergence rate
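A minimal sketch of this estimation plus the periodic window update, assuming the FAST rule W ← (baseRTT/RTT) W + α shown on the "Packet Level" slide at the end of this deck, blended with a smoothing parameter γ (an assumption of this sketch); all names are illustrative.

    # Sketch of FAST estimation + window control (names illustrative).
    # RTT: EMA with weight min(1/8, 3/cwnd); baseRTT: minimum observed RTT.
    # Update W <- (baseRTT/RTT) * W + alpha is from the "Packet Level" slide;
    # the gamma blending is an assumption of this sketch.

    class FastWindow:
        def __init__(self, alpha=200.0, gamma=0.5):
            self.alpha, self.gamma = alpha, gamma
            self.cwnd = 2.0
            self.avg_rtt = None
            self.base_rtt = float("inf")

        def on_rtt_sample(self, rtt):
            self.base_rtt = min(self.base_rtt, rtt)
            if self.avg_rtt is None:
                self.avg_rtt = rtt
            else:
                g = min(1.0 / 8, 3.0 / self.cwnd)        # slide's EMA weight
                self.avg_rtt = (1 - g) * self.avg_rtt + g * rtt

        def update(self):                                # run once per RTT
            if self.avg_rtt is None:
                return
            target = self.base_rtt / self.avg_rtt * self.cwnd + self.alpha
            self.cwnd = (1 - self.gamma) * self.cwnd + self.gamma * target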
FAST and Other DCAs

- FAST is one implementation within the more general primal-dual framework
- Queueing delay is used as an explicit feedback from the network
- FAST does not use queueing delay to predict or avoid packet losses
- FAST may use other forms of price in the future when they become available
Architecture

Each component
- designed independently
- upgraded asynchronously

[Diagram: four components (Data Control, Window Control, Burstiness Control, Estimation) layered above TCP Protocol Processing]
Experiments

- In-house dummynet testbed
- PlanetLab Internet experiments
- Internet2 backbone experiments
- ns-2 simulations
Dummynet Setup

- Single bottleneck link, multiple path latencies
- Iperf for memory-to-memory transfers
- Intra-protocol testing
- Dynamic network scenarios
- Instrumentation on the sender and the router
What Have We Learnt?

- FAST is reasonable under normal network conditions
- Well-known scenarios where FAST doesn't perform well
- Network behavior is important
- Dynamic scenarios are important
- Host implementation (Linux) also important
Dynamic sharing: 3 flows
[Plots: per-flow throughput for FAST, Linux TCP, HSTCP, and STCP as 3 flows dynamically share the link; FAST maintains steady throughput]
Aggregate Throughput
[Plot: aggregate throughput CDF approaches the ideal CDF at large windows]
Dummynet: cap = 800 Mbps; delay = 50-200 ms; #flows = 1-14; 29 expts
Stability
[Plot: throughput stability; FAST is stable under diverse scenarios]
Dummynet: cap = 800 Mbps; delay = 50-200 ms; #flows = 1-14; 29 expts
Fairness
[Plot: fairness index; Reno and HSTCP have similar fairness]
Dummynet: cap = 800 Mbps; delay = 50-200 ms; #flows = 1-14; 29 expts
[Plots: 30-min traces of queue, loss, and throughput for FAST, Linux TCP, HSTCP, and STCP; FAST keeps the queue small, leaving room for mice!]
Known Issues

- Network latency estimation
  - route changes, dynamic sharing
  - does not upset stability
- Small network buffer
  - at least like TCP Reno
  - adapt on slow timescale, but how?
- TCP-friendliness
  - friendly at least at small window
  - how to dynamically tune friendliness?
- Reverse path congestion
Acknowledgments

- Caltech: Bunn, Choe, Doyle, Newman, Ravot, Singh, J. Wang
- UCLA: Paganini, Z. Wang
- CERN: Martin
- SLAC: Cottrell
- Internet2: Almes, Shalunov
- Cisco: Aiken, Doraiswami, Yip
- Level(3): Fernes
- LANL: Wu
http://netlab.caltech.edu/FAST

- FAST TCP: motivation, architecture, algorithms, performance. IEEE Infocom 2004
- Code reorganization, ready for integration with web100.
- β-release: summer 2004
- Inquiry: [email protected]

The End
FAST TCP vs. Buffer Size
Sanjay Hegde & David Wei
Backward Queueing Delay II
Bartek Wydrowski
[Plots: 1 forward flow + 1 backward flow; 2 forward flows + 1 backward flow; 2 forward flows + 2 backward flows]
PlanetLab Internet Experiment
Jayaraman & Wydrowski
[Plot: throughput vs. loss and delay]
- qualitatively similar results
- FAST saw higher loss due to large α value
Linux Related Issues

- Complicated state transition
  - Linux TCP kernel documentation
- Netdev implementation and NAPI
  - frequent delays between dev and TCP layers
- Linux loss recovery
  - too many acks during fast recovery
  - high CPU overhead per SACK
  - very long recovery times
  - Scalable TCP and H-TCP offer enhancements
TCP/AQM

[Diagram: TCP sources (Reno, Vegas, FAST) set rates x_i(t); AQM links (DropTail, RED, REM/PI, AVQ) feed back congestion prices p_l(t)]

- Congestion control has two components
  - TCP: adjusts rate according to congestion
  - AQM: feeds back congestion based on utilization
- Distributed feedback system
  - equilibrium and stability properties determine system performance
Optimization Model

- Network bandwidth allocation as utility maximization
- Optimization problem:
  \[ \max_{x_s \ge 0} \;\sum_s U_s(x_s) \quad \text{subject to} \quad y_l \le c_l, \;\; \forall\, l \in L \]
- Primal-dual components:
  Source: x(t+1) = F(q(t), x(t))
  Link:   p(t+1) = G(y(t), p(t))
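As a concrete, entirely illustrative instance of these components: for a single link of capacity c and U_s(x) = α_s log x, F has the closed form x_s = α_s/q, and a gradient-ascent price update serves as G. The constants and step size below are assumptions, not from the slide.

    # Illustrative primal-dual iteration for one link of capacity c,
    # with U_s(x) = alpha_s * log(x), so F solves U_s'(x) = p, i.e. x = alpha_s / p.
    # G raises the price when demand y exceeds capacity c, lowers it otherwise.

    alphas = [1.0, 2.0, 3.0]      # per-source utility weights (illustrative)
    c = 100.0                     # link capacity (pkts/s)
    p, step = 0.1, 0.001          # initial price, price step size (assumed)

    for _ in range(5000):
        x = [a / p for a in alphas]          # F: maximize U_s(x) - p*x  =>  x = a/p
        y = sum(x)                           # aggregate rate seen by the link
        p = max(p + step * (y - c), 1e-9)    # G: price rises iff y > c

    print([round(v, 1) for v in x])          # -> [16.7, 33.3, 50.0]; rates split c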
Network Model

[Block diagram: TCP sources F_1, ..., F_N send at rates x, mapped by the forward routing matrix R_f(s) into link rates y; AQM links G_1, ..., G_L compute prices p, mapped by the backward routing matrix R_b(s) into aggregate prices q seen by the sources]

- Components: TCP and AQM algorithms, and routing matrices
- Each TCP source sees an aggregate price, q
- Each link sees an aggregate incoming rate
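A tiny numerical sketch of the two aggregation steps, assuming the common static-routing case R_f = R_b = R; the two-link, two-source topology is made up for illustration.

    # Aggregation in the network model, assuming static routing R_f = R_b = R.
    # R[l][s] = 1 iff source s crosses link l; the topology is made up.
    import numpy as np

    R = np.array([[1, 1],        # link 0 carries sources 0 and 1
                  [0, 1]])       # link 1 carries source 1 only
    x = np.array([50.0, 30.0])   # source rates (pkts/s)
    p = np.array([0.2, 0.5])     # link prices (e.g. queueing delays, s)

    y = R @ x       # aggregate rate each link sees:   [80., 30.]
    q = R.T @ p     # aggregate price each source sees: [0.2, 0.7]
    print(y, q)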
Packet Level

- Reno: AIMD(1, 0.5)
  ACK:  W ← W + 1/W
  Loss: W ← W − 0.5 W
- HSTCP: AIMD(a(w), b(w))
  ACK:  W ← W + a(w)/W
  Loss: W ← W − b(w) W
- STCP: MIMD(a, b)
  ACK:  W ← W + 0.01
  Loss: W ← W − 0.125 W
- FAST
  RTT:  W ← (baseRTT/RTT) W + α
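One line worth noting about the FAST rule: its fixed point pins the per-flow backlog at exactly α packets, which is the "target number of packets" from the Queueing Delay slide:

\[
W = \frac{\text{baseRTT}}{\text{RTT}}\,W + \alpha
\;\Longrightarrow\;
W\left(1 - \frac{\text{baseRTT}}{\text{RTT}}\right) = \alpha .
\]

Since \(W(1 - \text{baseRTT}/\text{RTT}) = (W/\text{RTT}) \times \text{queueing delay}\) is the number of the flow's packets sitting in queues, each flow buffers α packets at equilibrium.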