
Experiences in Design and Implementation
of a High Performance Transport Protocol
Yunhong Gu, Xinwei Hong, and Robert L. Grossman
National Center for Data Mining
Outline
• TCP’s inefficiency in grid applications
• UDT
• Design issues
• Implementation issues
• Conclusion and future work
TCP and AIMD
• TCP has been very successful in the Internet
  – AIMD (Additive Increase Multiplicative Decrease)
    • Fair: max-min fairness
    • Stable: globally asymptotically stable
• But it is inefficient and not scalable in grid networks
  (with high bandwidth-delay product)
  – RTT bias: flows with longer RTTs get a smaller share
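For reference, the textbook AIMD window update assumed in the following slides (one step per RTT, with the standard decrease factor of 1/2); this is background, not material from the slides themselves:

W \leftarrow W + 1 \quad \text{(per RTT without loss)}, \qquad
W \leftarrow W / 2 \quad \text{(on a loss event)}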
Efficiency of TCP
On a 1 Gb/s link with 200 ms RTT between Tokyo and Chicago, it takes
28 minutes to recover from a single loss.
On a 10 Gb/s link with 200 ms RTT, it will take 4 hours 43 minutes to
recover from a single loss.
TCP’s throughput model (S: packet size, p: loss rate):
$\text{Throughput} \approx \frac{S}{RTT} \sqrt{\frac{3}{2p}}$
It needs an extremely low loss rate on high bandwidth-delay
product networks.
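A back-of-the-envelope check of the 28-minute figure above, assuming 1500-byte packets and the additive increase of one packet per RTT from the previous slide; the arithmetic is approximate:

W = \frac{10^{9}\,\text{b/s} \times 0.2\,\text{s}}{1500 \times 8\,\text{b}} \approx 16\,667 \text{ packets}

T \approx \frac{W}{2} \cdot RTT \approx 8\,333 \times 0.2\,\text{s} \approx 1\,667\,\text{s} \approx 28 \text{ minutes}

The same arithmetic at 10 Gb/s gives roughly ten times longer, consistent with the hours-long recovery quoted above.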
Fairness of TCP
[Topology: Amsterdam connected over 1 Gb/s with 100 ms RTT; Chicago 1 connected over 1 Gb/s with 1 ms RTT; both send to Chicago 2]
• Merge two real-time data streams
  – From Chicago 1 to Chicago 2: 800 Mb/s
  – From Amsterdam to Chicago 2: 80 Mb/s
• The throughput is limited by the slowest stream!
UDT – UDP-based Data Transfer Protocol
• Application level transport protocol built
above UDP
• Reliable data delivery
• End-to-end approach
• Bi-directional
• General transport API; not a (file transfer)
tool.
• Open source
UDT Architecture
[Architecture diagram: each UDT entity contains a sender and a receiver. DATA packets flow from sender to receiver; the receiver returns ACK and NAK packets, and the sender answers ACKs with ACK2. Timers: packet scheduling, retransmission, rate control, ACK, and NAK]
UDT – Objectives
• Goals
  – Easy to install and use
  – Efficient for bulk data transfer
  – Fair
  – Friendly to TCP
• Non-goals
– TCP replacement
– Messaging service
Design Issues
• Reliability/Acknowledging
• Congestion/Flow Control
• Performance evaluation
– Efficiency
– Fairness and friendliness
– Stability
Reliability/Acknowledging
• Acknowledging is expensive
– Packet processing at end hosts and routers
– Buffer processing
• Timer-based selective acknowledgement
– Send an acknowledgement at most once per constant time interval (if
there are packets to be acknowledged); see the sketch below
• Explicit negative acknowledgement
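A minimal sketch of the timer-based acknowledgement idea, assuming a 10 ms ACK interval and a hypothetical sendAck() callback; this is illustrative only, not the UDT implementation:

#include <atomic>
#include <chrono>
#include <cstdint>
#include <thread>

// Send at most one (cumulative) ACK per fixed interval, and only if new
// data has arrived since the last ACK was sent.
void ackTimerLoop(std::atomic<uint32_t>& largestRecvSeq,   // updated by the receiver
                  void (*sendAck)(uint32_t),               // hypothetical ACK sender
                  std::atomic<bool>& running)
{
    const auto ackInterval = std::chrono::milliseconds(10); // e.g., one SYN time
    uint32_t lastAcked = 0;
    while (running) {
        std::this_thread::sleep_for(ackInterval);
        const uint32_t cur = largestRecvSeq.load();
        if (cur != lastAcked) {        // something new to acknowledge
            sendAck(cur);              // acknowledge everything up to 'cur'
            lastAcked = cur;
        }
    }
}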
Congestion Control
• AIMD with decreasing increases
• Increase formula (applied once per SYN interval)
  $\alpha(x) = 10^{\lceil \log_{10}(L - C(x)) \rceil - 9} \cdot \frac{1500}{S}$ packets per SYN,
  where L is the link capacity and C(x) the current sending rate, both in
  bits per second, and S is the packet size in bytes
• Decrease
  – Sending rate is reduced by 1/9 on a loss event
• Control interval is constant
– SYN = 0.01 second
UDT Algorithm
L = 10 Gbps, S = 1500 bytes
C (Mbps)           L − C (Mbps)      Increment (pkts/SYN)
[0, 9000)          (1000, 10000]     10
[9000, 9900)       (100, 1000]       1
[9900, 9990)       (10, 100]         0.1
[9990, 9999)       (1, 10]           0.01
[9999, 9999.9)     (0.1, 1]          0.001
9999.9+            < 0.1             0.00067
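A small sketch of the increase rule as reconstructed above, with L and C in bits per second and S in bytes; the 1/S floor reproduces the 0.00067 pkts/SYN in the last table row. The function name is illustrative:

#include <algorithm>
#include <cmath>

// Per-SYN increment in packets, following the reconstructed increase formula.
// L_bps: link capacity (bits/s); C_bps: current sending rate (bits/s);
// S: packet size in bytes (1500 in the table above).
double increasePerSYN(double L_bps, double C_bps, double S)
{
    const double minInc = 1.0 / S;          // floor: 1/S packets per SYN
    if (C_bps >= L_bps)
        return minInc;
    const double inc =
        std::pow(10.0, std::ceil(std::log10(L_bps - C_bps)) - 9.0) * (1500.0 / S);
    return std::max(inc, minInc);
}

For example, with L = 10 Gb/s, S = 1500 and a near-zero sending rate it returns 10 packets per SYN, matching the first row of the table.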
UDT: Efficiency and Fairness Characteristics
• Takes 7.5 seconds to reach 90% of the link
capacity, independent of BDP
• Satisfies max-min fairness if all the flows
have the same end-to-end link capacity
– Otherwise, any flow will obtain at least half of
its fair share
• Does not take more bandwidth than a concurrent TCP flow as long as
  $RTT^{2} \cdot L \le SYN^{2} \cdot 10^{8} / 6$
Efficiency
• UDT bandwidth utilization
– 960 Mb/s on a 1 Gb/s link
– 580 Mb/s on an OC-12 (622 Mb/s) link
[Plot: Throughput (Mbps) vs. Time (s), 0–100 s, for three paths: to Chicago (1 Gb/s, 0.04 ms), to Canarie (OC-12, 16 ms), to Amsterdam (1 Gb/s, 110 ms)]
Fairness
• Fair bandwidth sharing between networks with
different RTTs and bottleneck capacities
– 330 Mb/s each for the 3 flows from Chicago to: Chicago local via
  1 Gb/s, Amsterdam via 1 Gb/s, and Ottawa via 622 Mb/s
[Plot: Throughput (Mbps) vs. Time (s), 0–100 s, for the three flows]
Fairness
• Fairness index
– Simulation: Jain’s Fairness Index for 10 UDT and TCP flows over a
  100 Mb/s link with different RTTs
[Plot: Fairness Index (0.8–1) vs. RTT (10⁻²–10³ ms, log scale) for UDT and TCP]
RTT Fairness
• Fairness index of TCP flows with different
RTTs
– 2 flows, one has 1ms RTT, the other varies
from 1ms to 1000ms
[Plot: RTT fairness index (0.9–1) vs. RTT of the second flow (1–1000 ms, log scale)]
Fairness and Friendliness
• 50 TCP flows and 4 UDT flows between SARA and StarLight
• Real-time snapshot of the throughput
• The 4 UDT flows have similar performance and leave enough bandwidth
  for the TCP flows
TCP Friendliness
• Impact on short-lived TCP flows
  – 500 TCP flows of 1 MB each, competing with 1–10 bulk UDT flows,
    over a 1 Gb/s link between Chicago and Amsterdam
[Plot: TCP throughput (Mbps) vs. number of concurrent UDT flows, 0–10]
Stability
• Stability index of UDT and TCP
– Stability: average standard deviation of throughput per unit time
– 10 UDT flows and 10 TCP flows with different RTTs
[Plot: Stability Index (0–0.8) vs. RTT (10⁻²–10³ ms, log scale) for UDT and TCP]
Implementation Issues
• Efficiency and CPU utilization
• Loss information processing
• Memory management
• API
• Conformance
Efficiency and CPU utilization
• Efficiency = Mbps/MHz
• Maximize throughput
  – Use as little CPU time as possible, so that the CPU is not exhausted
    before the network bottleneck is reached
  – Avoid CPU bursts, which can cause packet loss: distribute processing
    evenly over time (see the pacing sketch below)
• Minimize CPU utilization
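A minimal sketch of even packet pacing, one way the packet scheduling timer can spread transmission and processing over time; the target rate, packet size, and sendOnePacket() callback are illustrative assumptions, not the UDT code:

#include <chrono>
#include <cstddef>
#include <thread>

// Space packet transmissions by a fixed inter-packet interval derived from
// the target sending rate, instead of emitting them in bursts.
void pacedSend(double targetRateBps, std::size_t pktBytes, int numPackets,
               void (*sendOnePacket)())
{
    using clock = std::chrono::steady_clock;
    const auto interval = std::chrono::nanoseconds(
        static_cast<long long>(pktBytes * 8 * 1e9 / targetRateBps));
    const auto start = clock::now();
    for (int i = 0; i < numPackets; ++i) {
        sendOnePacket();                                   // transmit one packet
        std::this_thread::sleep_until(start + (i + 1) * interval);
        // Real senders often busy-wait instead of sleeping, for sub-millisecond
        // precision; sleep_until keeps the sketch simple.
    }
}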
Loss Processing
• On high-BDP networks, the number of lost packets in a single loss
  event can be very large
• Accessing the loss information may take a long time
• An acknowledgement may require several packets
[Plot: number of lost packets (0–3000) per loss event, over 100 loss events]
Loss Processing
• UDT loss processing
  – Most losses are contiguous
  – Record loss events (sequence-number ranges) rather than individual
    lost packets; see the sketch below
  – Access time is almost constant
[Plot: loss-information access time (µs, 0–8) per loss event, over 100 loss events]
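A simplified sketch of a range-based loss list, assuming 32-bit sequence numbers without wraparound; UDT’s actual data structure differs, but the idea of storing whole loss events as ranges rather than individual packets is the same:

#include <cstdint>
#include <map>

// Loss list keyed by the first sequence number of each contiguous loss event.
class LossList {
    std::map<uint32_t, uint32_t> ranges_;   // first lost seq -> last lost seq
public:
    // Record one contiguous loss event [first, last]: a single entry,
    // no matter how many packets it covers.
    void insert(uint32_t first, uint32_t last) {
        auto it = ranges_.find(first);
        if (it == ranges_.end() || it->second < last)
            ranges_[first] = last;
    }
    // Remove a retransmitted packet, splitting its range if necessary.
    void remove(uint32_t seq) {
        auto it = ranges_.upper_bound(seq);
        if (it == ranges_.begin()) return;
        --it;                                    // candidate range starting <= seq
        const uint32_t first = it->first, last = it->second;
        if (seq > last) return;                  // seq is not in any recorded range
        ranges_.erase(it);
        if (seq > first) ranges_[first] = seq - 1;   // keep the part before seq
        if (seq < last)  ranges_[seq + 1] = last;    // keep the part after seq
    }
    bool empty() const { return ranges_.empty(); }
};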
Memory Processing
• Memory copy avoidance
• Overlapped IO
• Data scattering/gathering
• Speculation of next packet (see the sketch below)
[Diagram: data placement across the user buffer and the protocol buffers as new data arrives]
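A hedged sketch of how scatter/gather IO can support next-packet speculation: the fixed-size header is received into a small struct while the payload is received directly into the user buffer at the position where the next in-order packet is expected, avoiding a copy through the protocol buffer. The struct layout and function name are illustrative, not the UDT wire format or API:

#include <cstdint>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

struct PacketHeader { uint32_t seq; uint32_t info; };   // illustrative header

// Receive one UDP packet: header into 'hdr', payload straight into the user
// buffer at the offset where the next in-order packet is expected to land.
ssize_t recvSpeculative(int sock, PacketHeader& hdr, char* userBuf,
                        size_t expectedOffset, size_t maxPayload)
{
    iovec vec[2];
    vec[0].iov_base = &hdr;                       // scatter part 1: packet header
    vec[0].iov_len  = sizeof(hdr);
    vec[1].iov_base = userBuf + expectedOffset;   // scatter part 2: payload placed
    vec[1].iov_len  = maxPayload;                 //   speculatively in the user buffer

    msghdr msg = {};
    msg.msg_iov    = vec;
    msg.msg_iovlen = 2;
    // If hdr.seq turns out not to be the expected sequence number, the caller
    // moves the payload into the protocol buffer instead; in the common
    // in-order case no extra copy is needed.
    return recvmsg(sock, &msg, 0);
}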
API
• Socket-like API
• Support overlapped IO
• File transfer API
– sendfile/recvfile
• Thread safe
• Performance monitoring
API - Example
// Standard socket API:
int client = socket(AF_INET, SOCK_STREAM, 0);
connect(client, (sockaddr*)&serv_addr, sizeof(serv_addr));
if (-1 == send(client, data, size, 0))
{
   // error processing
}

// The same logic with the UDT API:
UDTSOCKET client = UDT::socket(AF_INET, SOCK_STREAM, 0);
UDT::connect(client, (sockaddr*)&serv_addr, sizeof(serv_addr));
if (UDTERROR == UDT::send(client, data, size, 0))
{
   // error processing
}
Implementation Efficiency
• CPU usage of UDT and TCP
– UDT uses about 10% more CPU than TCP
– Further code optimizations are ongoing
[Plot: CPU usage (%) over 50 sample events for UDT sending, UDT receiving, TCP sending, and TCP receiving]
Conclusion
• TCP is not suitable for distributed data-intensive applications over
  grid networks
• We introduced a new application-level protocol, UDT, to overcome
  these shortcomings of TCP
• We explained the design rationale and implementation details in this
  paper
Future Work
• Bandwidth Estimation
• CPU utilization
– Self-clocking
– Code optimization
• Theoretical work
References
• More details can be found in our paper.
• UDT specification
  – draft-gg-udt-01.txt
• Congestion control
  – Paper at the GridNets '04 workshop
• UDT open source project
– http://udt.sf.net
Thank you!
Questions and comments are welcome!
For more information, please visit
Booth 653 (UIC/NCDM) on the Exhibition Floor
UDT Project: http://udt.sf.net
NCDM: http://www.ncdm.uic.edu