FAST v3 - California Institute of Technology

FAST TCP in Linux
Cheng Jin
David Wei
http://netlab.caltech.edu/FAST/WANinLab/nsfvisit
Outline

- Overview of FAST TCP.
- Implementation details.
- SC2002 experiment results.
- FAST evaluation and WAN-in-Lab.
netlab.caltech.edu
FAST vs. Linux TCP

Flows  Protocol                  Bmps (petabit-meter/s)  Throughput (Mbps)  Transfer (GB)
1      Linux TCP (txqlen=100)    1.86                    185                78
1      Linux TCP (txqlen=10000)  2.67                    266                111
1      FAST (19.11.2002)         9.28                    925                387
2      Linux TCP (txqlen=100)    3.18                    317                133
2      Linux TCP (txqlen=10000)  9.35                    931                390
2      FAST (19.11.2002)         18.03                   1,797              753

Distance = 10,037 km; Delay = 180 ms; MTU = 1500 B; Duration: 3600 s
Linux TCP experiments: Jan 28-29, 2003
Aggregate Throughput

[Figure: aggregate throughput of FAST vs. Linux TCP (txq=100 and txq=10000), standard MTU, utilization averaged over 1 hr. FAST averages 92-95% utilization (roughly 1 Gbps with one flow, 2 Gbps with two); Linux TCP averages 16-48% depending on txqueuelen.]
Summary of Changes

- RTT estimation: fine-grain timer.
- Fast convergence to equilibrium.
- Delay monitoring in equilibrium.
- Pacing: reducing burstiness.
FAST TCP Flow Chart

[Flow chart: states Slow Start, Fast Convergence, Equilibrium, Normal Recovery, Loss Recovery, and Time-out, with transitions triggered by loss and time-out events.]
RTT Estimation

- Measure queueing delay.
- Kernel timestamp with µs resolution.
- Use SACK to increase the number of RTT samples during recovery.
- Exponential averaging of RTT samples to increase robustness.
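The averaging and queueing-delay steps above can be sketched as follows. This is an illustrative sketch, not the FAST kernel code: the 1/8 smoothing gain mirrors the classic srtt filter and the field names are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Exponentially weighted moving average of RTT samples, in microseconds.
 * Queueing delay is estimated as smoothed RTT minus the minimum RTT seen
 * (taken as the propagation delay). */
struct rtt_est {
    uint32_t srtt_us;   /* smoothed RTT */
    uint32_t base_us;   /* minimum RTT observed */
};

static void rtt_sample(struct rtt_est *e, uint32_t sample_us)
{
    if (e->srtt_us == 0)
        e->srtt_us = sample_us;               /* first sample seeds srtt */
    else
        e->srtt_us += ((int32_t)sample_us - (int32_t)e->srtt_us) / 8;
    if (e->base_us == 0 || sample_us < e->base_us)
        e->base_us = sample_us;
}

/* Queueing delay = smoothed RTT minus baseline propagation delay. */
static uint32_t queue_delay_us(const struct rtt_est *e)
{
    return e->srtt_us > e->base_us ? e->srtt_us - e->base_us : 0;
}
```

With µs-resolution timestamps and one sample per (SACKed) ack, the filter stays responsive while damping measurement noise.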
Fast Convergence

- Rapidly increase or decrease cwnd toward equilibrium.
- Monitor the per-ack queueing delay to avoid overshoot.
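One way to read these two bullets in code: close the gap to the equilibrium window aggressively, but drop to small steps as soon as per-ack queueing delay shows the bottleneck queue filling. The halving step and the delay threshold below are assumptions for illustration, not the FAST kernel's actual rules.

```c
#include <assert.h>
#include <stdint.h>

/* Fast-convergence sketch: move cwnd toward `target` by half the remaining
 * gap per update, unless measured queueing delay already exceeds the
 * threshold, in which case fall back to a one-packet step. */
static uint32_t fast_converge(uint32_t cwnd, uint32_t target,
                              uint32_t qdelay_us, uint32_t qdelay_thresh_us)
{
    if (qdelay_us >= qdelay_thresh_us)
        return cwnd + 1;                      /* queue building: tiny steps */
    if (cwnd < target)
        return cwnd + (target - cwnd) / 2;    /* halve the remaining gap */
    if (cwnd > target)
        return cwnd - (cwnd - target) / 2;
    return cwnd;
}
```

The geometric step gives rapid convergence from far away; the delay check is what prevents the overshoot the slide warns about.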
Equilibrium

- Vegas-like cwnd adjustment on a large time-scale: per RTT.
- Small step-size to maintain stability in equilibrium.
- Per-ack delay monitoring to enable timely detection of changes in equilibrium.
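A Vegas-like per-RTT update, as referenced above, estimates from RTT measurements how many packets the flow keeps queued and nudges cwnd by a single packet per RTT to hold that number in a target band. This is a sketch of the Vegas mechanism the slide alludes to, using the classic Vegas alpha/beta defaults, not FAST's actual update law or constants.

```c
#include <assert.h>
#include <stdint.h>

/* Vegas-style per-RTT adjustment.  diff estimates the packets this flow
 * holds in the bottleneck queue: cwnd * (rtt - base_rtt) / rtt. */
static uint32_t vegas_update(uint32_t cwnd, uint32_t rtt_us,
                             uint32_t base_rtt_us)
{
    const uint32_t alpha = 2, beta = 4;    /* target queued packets */
    uint32_t diff = (uint32_t)((uint64_t)cwnd *
                               (rtt_us - base_rtt_us) / rtt_us);
    if (diff < alpha)
        return cwnd + 1;    /* too little queued: probe for bandwidth */
    if (diff > beta)
        return cwnd - 1;    /* queue building: back off gently */
    return cwnd;            /* in equilibrium: hold */
}
```

The one-packet step-size is what keeps the flow stable once it sits at equilibrium; the per-ack delay monitoring decides when to leave this regime for fast convergence.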
Pacing

- What do we pace? Increments to cwnd.
- Time-driven vs. event-driven.
- Trade-off between complexity and performance.
- Timer resolution is important.
Time-Based Pacing

[Figure: data and ack timelines showing bursty vs. paced transmission.]

cwnd increments are scheduled at fixed intervals.
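Scheduling increments at fixed intervals can be sketched as slicing the per-RTT increment evenly across timer ticks. The slot count of 4 is an assumption for illustration; in practice it is bounded by the timer resolution the earlier slide calls important.

```c
#include <assert.h>
#include <stdint.h>

#define PACE_SLOTS 4   /* increments spread over 4 timer ticks per RTT */

/* Returns the cwnd increment to apply at tick `slot` (0..PACE_SLOTS-1),
 * distributing `increment` packets as evenly as possible so no single
 * tick releases a large burst. */
static uint32_t paced_increment(uint32_t increment, uint32_t slot)
{
    uint32_t base = increment / PACE_SLOTS;
    uint32_t extra = increment % PACE_SLOTS;
    return base + (slot < extra ? 1 : 0);   /* early slots absorb remainder */
}
```

Applied over a full RTT the slices sum back to the original increment, so the window grows by the same amount, just without the burst.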
Event-Based Pacing

Detect a sufficiently large gap between consecutive bursts and delay the cwnd increment until the end of each such burst.
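The event-driven variant can be sketched as follows: track packet-event times, treat an inter-arrival gap above a threshold as a burst boundary, and only release the deferred increment there. The gap threshold and struct are assumptions for this sketch, not values from the talk.

```c
#include <assert.h>
#include <stdint.h>

struct burst_pacer {
    uint64_t last_event_us;   /* time of the previous packet event */
    uint32_t pending_inc;     /* cwnd increment deferred to burst end */
};

/* Called on each packet event; returns the increment to apply now,
 * which is nonzero only when a burst boundary has just been detected. */
static uint32_t on_event(struct burst_pacer *p, uint64_t now_us,
                         uint64_t gap_thresh_us)
{
    uint32_t inc = 0;
    if (p->last_event_us != 0 &&
        now_us - p->last_event_us > gap_thresh_us) {
        inc = p->pending_inc;     /* burst ended: release increment */
        p->pending_inc = 0;
    }
    p->last_event_us = now_us;
    return inc;
}
```

Compared with time-based pacing this needs no extra timer, trading timer complexity for sensitivity to how reliably burst gaps can be detected.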
SC2002 Highlights
Caltech-SLAC experiments, SCinet, SC2002, Baltimore, Nov 2002

FAST TCP
- Standard MTU.
- Peak window = 14,255 pkts.
- Throughput averaged over > 1 hr.
- 925 Mbps with a single flow/GE card: 9.28 petabit-meter/sec, 1.89 times the Internet2 Land Speed Record (I2 LSR).
- 8.6 Gbps with 10 flows: 34.0 petabit-meter/sec, 6.32 times the I2 LSR.
- 21 TB transferred in 6 hours with 10 flows.

Implementation
- Sender-side modification.
- Delay based.

[Figure: the Internet as a distributed feedback system -- TCP sources and AQM links coupled through forward and backward delay operators Rf(s) and Rb'(s) with price signal p, theory vs. experiment; network sites Baltimore, Chicago (1000 km), Sunnyvale (3000 km), Geneva (7000 km).]

C. Jin, D. Wei, S. Low
netlab.caltech.edu/FAST
FAST Team and Partners

Network: Sylvain Ravot (Caltech/CERN)
FAST BMPS

[Figure: FAST bandwidth-meter products vs. #flows (1, 2, 7, 9, 10), compared against the Internet2 Land Speed Record; standard MTU; throughput averaged over > 1 hr.]
Aggregate Throughput

[Figure: FAST average utilization at SC2002, 88-95% across runs of 1, 2, 7, 9, and 10 flows (durations roughly 1-6 hr); standard MTU; utilization averaged over > 1 hr.]
Caltech-SLAC Entry

[Figure: throughput trace of the Caltech-SLAC entry, annotated with a power glitch and reboot, rapid recovery after a possible hardware glitch, and 100-200 Mbps of ACK traffic.]
Acknowledgments
Caltech-SLAC experiments, SCinet, SC2002, Baltimore, Nov 2002
netlab.caltech.edu/FAST

- Prototype: C. Jin, D. Wei
- Theory: D. Choe (Postech/Caltech), J. Doyle, S. Low, F. Paganini (UCLA), J. Wang, Z. Wang (UCLA)
- Experiment/facilities:
  - Caltech: J. Bunn, C. Chapman, C. Hu (Williams/Caltech), H. Newman, J. Pool, S. Ravot (Caltech/CERN), S. Singh
  - CERN: O. Martin, P. Moroni
  - Cisco: B. Aiken, V. Doraiswami, R. Sepulveda, M. Turzanski, D. Walsten, S. Yip
  - DataTAG: E. Martelli, J. P. Martin-Flatin
  - Internet2: G. Almes, S. Corbato
  - Level(3): P. Fernes, R. Struble
  - SCinet: G. Goddard, J. Patton
  - SLAC: G. Buhrmaster, R. Les Cottrell, C. Logg, I. Mei, W. Matthews, R. Mount, J. Navratil, J. Williams
  - StarLight: T. deFanti, L. Winkler
  - TeraGrid: L. Winkler
- Major sponsors: ARO, CACR, Cisco, DataTAG, DoE, Lee Center, NSF
Evaluating FAST

- End-to-end monitoring doesn't tell the whole story.
- Existing network emulation (dummynet) is not always enough.
- Better optimization is possible if we can look inside and understand the real network.
Dummynet and Real Testbed
Dummynet Issues

- Not running on a real-time OS: imprecise timing.
- Lack of priority scheduling of dummynet events.
- Bandwidth fluctuates significantly with workload.
- Much work is needed to customize dummynet for protocol testing.
10 GbE Experiment

- Long-distance testing of Intel 10 GbE cards.
- Sylvain Ravot (Caltech) achieved 2.3 Gbps using a single stream with jumbo frames and stock Linux TCP.
- Tested HSTCP, Scalable TCP, FAST, and stock TCP under Linux.
- 1500 B MTU: 1.3 Gbps SNV -> CHI; 9000 B MTU: 2.3 Gbps SNV -> GVA.
TCP Loss Mystery

- Frequent packet loss with 1500-byte MTU; none with larger MTUs.
- Packet loss even when cwnd is capped at 300-500 packets.
- Routers have a large queue size of 4,000 packets.
- Packets captured at both sender and receiver using tcpdump.
How Did the Loss Happen?

[Figure: packet trace around the point where the loss is detected.]
How Can WAN-in-Lab Help?

- We will know exactly where packets are lost.
- We will also know the sequence of events (packet arrivals) that leads to loss.
- We can either fix the problem in the network, if any, or improve the protocol.
Conclusion

- FAST improves the end-to-end performance of TCP.
- Many issues are still to be understood and resolved.
- WAN-in-Lab can help make FAST a better protocol.