FAST v3 - California Institute of Technology
FAST TCP in Linux
Cheng Jin
David Wei
http://netlab.caltech.edu/FAST/WANinLab/nsfvisit
Outline
Overview of FAST TCP.
Implementation Details.
SC2002 Experiment Results.
FAST Evaluation and WAN-in-Lab.
netlab.caltech.edu
FAST vs. Linux TCP
Distance = 10,037 km; Delay = 180 ms; MTU = 1500 B; Duration = 3,600 s.
FAST runs: Nov 19, 2002. Linux TCP experiments: Jan 28-29, 2003.

Protocol                    Flows   Bmps (Peta)   Throughput (Mbps)   Transfer (GB)
Linux TCP (txqlen=100)        1        1.86             185                78
Linux TCP (txqlen=10000)      1        2.67             266               111
FAST                          1        9.28             925               387
Linux TCP (txqlen=100)        2        3.18             317               133
Linux TCP (txqlen=10000)      2        9.35             931               390
FAST                          2       18.03           1,797               753
Aggregate Throughput
[Chart: average utilization over 1 hr with standard MTU, at 1G (one flow) and 2G (two flows) aggregate capacity. FAST: 95% at 1G, 92% at 2G. Linux TCP, txq=100: 19% at 1G, 16% at 2G. Linux TCP, txq=10000: 27% at 1G, 48% at 2G.]
Summary of Changes
RTT estimation: fine-grain timer.
Fast convergence to equilibrium.
Delay monitoring in equilibrium.
Pacing: reducing burstiness.
FAST TCP Flow Chart
[Flow chart: FAST TCP states: Slow Start, Fast Convergence, Equilibrium, and Loss Recovery, with loss and time-out events triggering recovery before returning to normal operation.]
RTT Estimation
Measure queueing delay.
Kernel timestamp with µs resolution.
Use SACK to increase the number of RTT samples during recovery.
Exponential averaging of RTT samples to increase robustness.
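As a rough illustration of the last two points, the averaging can be sketched in a few lines. The gain ALPHA and all names below are illustrative, not the kernel's actual variables:

```python
# Illustrative sketch (not the kernel code): exponential averaging of
# fine-grain RTT samples, with queueing delay estimated as the averaged
# RTT minus the minimum (propagation) RTT observed so far.

ALPHA = 0.125  # EWMA gain, in the spirit of classic SRTT smoothing (assumed)

class RttEstimator:
    def __init__(self):
        self.base_rtt = float("inf")  # minimum RTT seen: propagation delay
        self.avg_rtt = None           # exponentially averaged RTT

    def sample(self, rtt):
        """Feed one RTT sample (seconds); return current queueing-delay estimate."""
        self.base_rtt = min(self.base_rtt, rtt)
        if self.avg_rtt is None:
            self.avg_rtt = rtt
        else:
            self.avg_rtt = (1 - ALPHA) * self.avg_rtt + ALPHA * rtt
        return self.queueing_delay()

    def queueing_delay(self):
        """Estimated queueing delay = averaged RTT minus propagation delay."""
        return self.avg_rtt - self.base_rtt

est = RttEstimator()
for r in [0.180, 0.181, 0.190, 0.200]:  # a 180 ms path with a growing queue
    q = est.sample(r)
```

The averaging damps single-sample noise, so one delayed ACK does not swing the queueing-delay estimate.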
Fast Convergence
Rapidly increase or decrease cwnd toward equilibrium.
Monitor the per-ack queueing delay to avoid overshoot.
Equilibrium
Vegas-like cwnd adjustment on a large time-scale -- per RTT.
Small step size to maintain stability in equilibrium.
Per-ack delay monitoring to enable timely detection of changes in equilibrium.
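A minimal sketch of this Vegas-like adjustment, following the per-RTT window equation published for FAST TCP, w <- min{2w, (1-gamma)w + gamma((baseRTT/RTT)w + alpha)}. The gamma and alpha values here are illustrative, not the tuned parameters:

```python
# Illustrative per-RTT window update in the spirit of FAST's Vegas-like
# adjustment: move cwnd toward the point where about ALPHA packets are
# buffered along the path. GAMMA and ALPHA values are illustrative.

GAMMA = 0.5   # step size; a small gamma keeps the update stable in equilibrium
ALPHA = 200   # target number of packets queued in the network

def update_cwnd(cwnd, base_rtt, avg_rtt):
    """One per-RTT update: w <- min(2w, (1-g)w + g((base/avg)w + alpha))."""
    target = (base_rtt / avg_rtt) * cwnd + ALPHA
    return min(2 * cwnd, (1 - GAMMA) * cwnd + GAMMA * target)

# With an empty queue (avg == base) the window grows by GAMMA * ALPHA per RTT:
cwnd = update_cwnd(1000.0, 0.180, 0.180)  # -> 1100.0
```

The fixed point is reached when (1 - baseRTT/RTT) * w = alpha, i.e. when exactly alpha packets of the window sit in queues, which is why the per-ack delay monitoring above can detect departures from equilibrium quickly.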
Pacing
What do we pace? Increments to cwnd.
Time-driven vs. event-driven: a trade-off between complexity and performance.
Timer resolution is important.
Time-Based Pacing
cwnd increments are scheduled at fixed intervals.
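A minimal sketch of such a schedule (names and values are illustrative):

```python
# Illustrative sketch of time-based pacing: instead of applying the whole
# per-RTT cwnd increment at once (producing a burst), schedule it in equal
# steps at fixed intervals across the RTT.

def pacing_schedule(increment, rtt, steps):
    """Return (time_offset, cwnd_delta) pairs spreading `increment` over one RTT."""
    interval = rtt / steps
    delta = increment / steps
    return [(i * interval, delta) for i in range(steps)]

# 100-packet increment over a 180 ms RTT: 4 increments of 25, 45 ms apart
sched = pacing_schedule(increment=100, rtt=0.180, steps=4)
```

The finer the timer resolution, the more steps the increment can be split into, which is the complexity/performance trade-off noted above.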
Event-Based Pacing
Detect a sufficiently large gap between consecutive bursts and delay the cwnd increment until the end of each such burst.
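A rough sketch of the gap detection, with an assumed threshold:

```python
# Illustrative sketch of event-based pacing: watch inter-packet gaps and
# release the pending cwnd increment only at the end of a burst, i.e. when
# a gap larger than GAP_THRESHOLD is seen. The threshold is illustrative.

GAP_THRESHOLD = 0.002  # 2 ms: gaps larger than this end a burst (assumed)

def burst_boundaries(send_times):
    """Indices of packets that ended a burst, where an increment may be applied."""
    boundaries = []
    for i in range(1, len(send_times)):
        if send_times[i] - send_times[i - 1] > GAP_THRESHOLD:
            boundaries.append(i - 1)  # packet i-1 ended a burst
    return boundaries

# Two bursts: three packets ~0.1 ms apart, then a 10 ms gap, then two more
times = [0.000, 0.0001, 0.0002, 0.0102, 0.0103]
```

Unlike time-based pacing, this needs no extra timer, only per-packet bookkeeping, which is the other side of the complexity/performance trade-off.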
SCinet Caltech-SLAC experiments
SC2002, Baltimore, Nov 2002
Highlights
FAST TCP, standard MTU; peak window = 14,255 pkts; throughput averaged over > 1 hr.
925 Mbps with a single flow on one GE card: 9.28 petabit-meter/sec, 1.89 times the Internet2 Land Speed Record (I2 LSR).
8.6 Gbps with 10 flows: 34.0 petabit-meter/sec, 6.32 times the I2 LSR.
21 TB transferred in 6 hours with 10 flows.
Implementation
Sender-side modification; delay based.
[Diagram: the Internet as a distributed feedback system, with TCP sources and AQM links coupled through forward delay Rf(s) and backward delay Rb'(s), source rates x, and congestion prices p. Theory and experiment.]
[Map: experiment path: Baltimore; Chicago, 1,000 km; Sunnyvale, 3,000 km; Geneva, 7,000 km.]
C. Jin, D. Wei, S. Low
netlab.caltech.edu/FAST
FAST Team and Partners
Network (Sylvain Ravot, Caltech/CERN)
FAST BMPS
[Chart: bmps achieved by FAST with 1, 2, 7, 9, and 10 flows, compared with the Internet2 Land Speed Record.]
FAST, standard MTU; throughput averaged over > 1 hr.
Aggregate Throughput
[Chart: average utilization with standard MTU, averaged over > 1 hr, for runs of 1, 2, 7, 9, and 10 flows (durations between 1 hr and 6 hr): 88%, 90%, 90%, 92%, and 95%.]
Caltech-SLAC Entry
[Trace: aggregate throughput over time, annotated with a power glitch and reboot, a rapid recovery after a possible hardware glitch, and 100-200 Mbps of ACK traffic.]
Acknowledgments
SCinet Caltech-SLAC experiments, SC2002, Baltimore, Nov 2002
netlab.caltech.edu/FAST
Prototype
C. Jin, D. Wei
Theory
D. Choe (Postech/Caltech), J. Doyle, S. Low, F. Paganini (UCLA), J. Wang, Z. Wang (UCLA)
Experiment/facilities
Caltech: J. Bunn, C. Chapman, C. Hu (Williams/Caltech), H. Newman, J. Pool, S. Ravot (Caltech/CERN), S. Singh
CERN: O. Martin, P. Moroni
Cisco: B. Aiken, V. Doraiswami, R. Sepulveda, M. Turzanski, D. Walsten, S. Yip
DataTAG: E. Martelli, J. P. Martin-Flatin
Internet2: G. Almes, S. Corbato
Level(3): P. Fernes, R. Struble
SCinet: G. Goddard, J. Patton
SLAC: G. Buhrmaster, R. Les Cottrell, C. Logg, I. Mei, W. Matthews, R. Mount, J. Navratil, J. Williams
StarLight: T. deFanti, L. Winkler
TeraGrid: L. Winkler
Major sponsors
ARO, CACR, Cisco, DataTAG, DoE, Lee Center, NSF
Evaluating FAST
End-to-end monitoring doesn't tell the whole story.
Existing network emulation (dummynet) is not always enough.
We can optimize better if we can look inside and understand the real network.
Dummynet and Real Testbed
Dummynet Issues
Not running on a real-time OS, so timing is imprecise.
Lack of priority scheduling of dummynet events.
Bandwidth fluctuates significantly with workload.
Much work is needed to customize dummynet for protocol testing.
10 GbE Experiment
Long-distance testing of Intel 10 GbE cards.
Sylvain Ravot (Caltech) achieved 2.3 Gbps in a single stream with jumbo frames and stock Linux TCP.
Tested HSTCP, Scalable TCP, FAST, and stock TCP under Linux.
1500 B MTU: 1.3 Gbps, SNV -> CHI.
9000 B MTU: 2.3 Gbps, SNV -> GVA.
TCP Loss Mystery
Frequent packet loss with 1500-byte MTU; none with larger MTUs.
Packet loss occurs even when cwnd is capped at 300-500 packets.
Routers have a large queue size of 4,000 packets.
Packets were captured at both sender and receiver using tcpdump.
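Offline, the two captures can be compared to locate losses; a minimal sketch, assuming the sequence numbers have already been extracted from each tcpdump trace:

```python
# Illustrative sketch: given the TCP sequence numbers captured by tcpdump
# at the sender and at the receiver, the set difference identifies which
# packets left the sender but never arrived. The input format is assumed.

def lost_in_network(sent_seqs, received_seqs):
    """Sequence numbers present in the sender trace but absent at the receiver."""
    return sorted(set(sent_seqs) - set(received_seqs))

sent = [1000, 2500, 4000, 5500, 7000]
rcvd = [1000, 2500, 5500, 7000]
# lost_in_network(sent, rcvd) -> [4000]
```

This tells us which packets the network dropped, but not where along the path they were dropped, which motivates WAN-in-Lab below.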
How Did the Loss Happen?
[Trace: packet capture timeline marking the point where the loss was detected.]
How Can WAN-in-Lab Help?
We will know exactly where packets are lost.
We will also know the sequence of events (packet arrivals) that leads to loss.
We can either fix the problem in the network, if any, or improve the protocol.
Conclusion
FAST improves the end-to-end performance of TCP.
Many issues remain to be understood and resolved.
WAN-in-Lab can help make FAST a better protocol.