TCP Variations - University of Delaware

TCP Variations:
Tahoe, Reno, NewReno, Vegas, SACK
Paul D. Amer, Professor
Computer & Information Sciences
University of Delaware
What are TCP Variations?
• Implementations of TCP that use different
algorithms to achieve end-to-end
congestion control.
– Tahoe
– Reno
– NewReno
– Vegas
Evolution of TCP
• 1974: TCP described by Vint Cerf, Bob Kahn in IEEE Trans Comm
• 1975: Three-way handshake, Ray Tomlinson, in SIGCOMM 75
• 1981: TCP & IP, RFC 793 & 791
• 1983: BSD Unix 4.2 supports TCP/IP
• 1984: Nagle’s algorithm to reduce overhead of small packets; predicts congestion collapse
• 1986: Congestion collapse 1st observed
• 1987: Karn’s algorithm to better estimate round-trip time
• 1988: Van Jacobson’s algorithms: slow start, congestion avoidance, fast retransmit (all implemented in 4.3BSD Tahoe), SIGCOMM 88
• 1990: 4.3BSD Reno: fast recovery, delayed ACKs
Evolution of TCP
• 1993: TCP Vegas (not implemented): real congestion avoidance (Brakmo et al)
• 1994: ECN, Explicit Congestion Notification (Floyd)
• 1994: T/TCP, Transaction TCP (Braden)
• 1996: NewReno: modified fast recovery
• 1996: SACK TCP, Selective Ack (Floyd et al)
• 1996: Improving TCP startup (Hoe)
• 1996: FACK TCP, Forward Ack extension to SACK (Mathis et al)
How did TCP cause Congestion?
(original recipe TCP)
• Poor Efficiency
• In telnet-like applications, TCP sends 1 byte of data
with 4000% overhead.
• Sending too much, too soon
• Unnecessary retransmits
• Send window too big
• Very little change in behavior due to congestion
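The 4000% figure can be checked with a little arithmetic. The sketch below assumes a 20-byte IPv4 header and a 20-byte TCP header with no options; those sizes and the function name are assumptions for illustration, not something stated on the slide.

```python
# Assumption: 20-byte IPv4 header + 20-byte TCP header, no options.
IP_HEADER_BYTES = 20
TCP_HEADER_BYTES = 20


def header_overhead_percent(payload_bytes: int) -> float:
    """Header bytes expressed as a percentage of the payload bytes."""
    headers = IP_HEADER_BYTES + TCP_HEADER_BYTES
    return 100.0 * headers / payload_bytes


print(header_overhead_percent(1))     # 4000.0 for a 1-byte keystroke
print(header_overhead_percent(1460))  # under 3 percent for a 1460-byte payload
```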
Self-clocking or ACK Clock
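[Figure: ACK-clock diagram between Sender and Receiver, with packet spacings labeled Pb and Pr and ACK spacings labeled As, Ab and Ar]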
• Self-clocking systems tend to be very stable under a wide
range of bandwidths and delays.
• The principal issue with self-clocking systems is getting
them started.
TCP Algorithms:
• Four intertwined algorithms used commonly in TCP implementations
• Slow Start - Every ack increases the sender’s window (cwnd) by 1 segment
• Congestion Avoidance - Halve the sender’s window when a loss is detected; otherwise increase the window by about one packet per RTT (NOTE: not per ack)
• Fast Retransmit - Don’t wait for the retransmit timer to go off; retransmit the packet if 3 duplicate acks are received
• Fast Recovery - Since a duplicate ack came through, one packet has left the wire. Perform congestion avoidance; don’t jump down to slow start
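Slow Start
new cwnd = old cwnd + # acks recd
• t=0r: cwnd=1, sender sends pkt 0
• t=1r: ack 0 arrives, cwnd=2, sender sends pkts 1,2
• t=2r: acks 1,2 arrive, cwnd=4, sender sends pkts 3,4,5,6
• t=3r: cwnd=8
[Figure: packets and acks exchanged between TCP Sender and TCP Receiver over successive round trips]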
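The slow start rule (new cwnd = old cwnd + # acks recd) can be traced round trip by round trip. This is a minimal sketch that assumes no loss and one ack per segment; the function name is hypothetical.

```python
def slow_start_rounds(rounds: int, initial_cwnd: int = 1) -> list[int]:
    """Return cwnd (in segments) at the start of each round trip."""
    cwnd = initial_cwnd
    history = []
    for _ in range(rounds):
        history.append(cwnd)
        acks_received = cwnd   # assumption: every segment in the window is ACKed
        cwnd += acks_received  # new cwnd = old cwnd + # acks recd
    return history


print(slow_start_rounds(4))  # [1, 2, 4, 8]
```

Because every ack adds one segment, the window doubles each round trip, which is why "slow" start is in fact exponential growth.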
Congestion Avoidance
After cwnd reaches a certain threshold:
new cwnd = old cwnd + # acks/cwnd
• t=xr: cwnd=4
• t=(x+1)r: cwnd=5
• t=(x+2)r: cwnd=6
• t=(x+3)r: cwnd=7
[Figure: packets and acks exchanged between TCP Sender and TCP Receiver over successive round trips]
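The congestion avoidance rule (new cwnd = old cwnd + # acks/cwnd) adds only a fraction of a segment per ack, so a full window of acks adds one segment. The sketch below assumes no loss and keeps the per-ack increment fixed within a round trip; exact fractions are used only to avoid rounding noise, and the function name is hypothetical.

```python
from fractions import Fraction


def congestion_avoidance_rounds(start_cwnd: int, rounds: int) -> list[int]:
    """Return cwnd (in segments) at the start of each round trip."""
    cwnd = Fraction(start_cwnd)
    history = []
    for _ in range(rounds):
        window = int(cwnd)
        history.append(window)
        for _ in range(window):            # one ack per segment in the window
            cwnd += Fraction(1, window)    # each ack adds 1/cwnd of a segment
    return history


print(congestion_avoidance_rounds(4, 4))  # [4, 5, 6, 7]
```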
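Example: Slow Start/Congestion Avoidance
assume ssthresh = 8*MSS
• Slow start: cwnd = 1, cwnd = 2, cwnd = 4, cwnd = 8 (ssthresh reached)
• Congestion avoidance: cwnd = 9, cwnd = 10, cwnd = 11
[Figure: congestion window size (in MSS) vs. transmission number, with the ssthresh level marked, alongside segments exchanged between sender S and receiver R]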
Slow Start & Congestion Avoidance
• Initially:
- cwnd = 1*MSS
- ssthresh = very high
• If a new ACK comes:
- if cwnd < ssthresh → update cwnd according to slow start
- if cwnd > ssthresh → update cwnd according to congestion avoidance
- if cwnd = ssthresh → either
• If timeout (i.e. loss):
- ssthresh = flight size/2;
- cwnd = 1*MSS
[Figure: cwnd vs. time, showing the (initial) ssthresh, a loss (e.g. timeout) and the reduced ssthresh; slow start in green, congestion avoidance in blue]
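The rules on this slide fit in a small state holder. This is a sketch under stated assumptions: cwnd and ssthresh are counted in MSS, the class and method names are hypothetical, and the "either" case (cwnd = ssthresh) is resolved here in favor of congestion avoidance.

```python
class SlowStartCA:
    """Tracks cwnd and ssthresh, both in units of MSS."""

    def __init__(self, ssthresh: float = float("inf")):
        self.cwnd = 1.0           # initially cwnd = 1*MSS
        self.ssthresh = ssthresh  # initially "very high"

    def on_new_ack(self) -> None:
        if self.cwnd < self.ssthresh:
            self.cwnd += 1.0              # slow start: +1 MSS per ack
        else:
            self.cwnd += 1.0 / self.cwnd  # congestion avoidance: about +1 MSS per RTT

    def on_timeout(self, flight_size: float) -> None:
        self.ssthresh = flight_size / 2
        self.cwnd = 1.0


w = SlowStartCA(ssthresh=8)
for _ in range(7):          # 1 + 2 + 4 acks over three round trips
    w.on_new_ack()
print(w.cwnd)               # 8.0, slow start has reached ssthresh
w.on_new_ack()
print(w.cwnd)               # 8.125, now growing by 1/cwnd per ack
w.on_timeout(flight_size=8)
print(w.cwnd, w.ssthresh)   # 1.0 4.0
```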
Fast Retransmit
At a random point in the transfer:
• t=xr
• t=(x+1)r: 3 dup acks arrive → Fast Retransmit
[Figure: packets and acks exchanged between TCP Sender and TCP Receiver]
TCP Tahoe’s Fast Retransmit
1. Sender receives 3 dupACKs.
2. Sender infers that the segment is lost.
3. Sender resends the segment immediately!
4. Sender returns to slow-start.
[Figure: segments exchanged between sender S and receiver R with cwnd = 1, cwnd = 2, cwnd = 4; 3 duplicate ACKs trigger the fast-retransmit of segment 4]
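The four steps above amount to counting duplicate ACKs. The sketch below is illustrative only: the class name, the starting cwnd of 4 and the convention that an ACK number names the next segment the receiver expects are assumptions, and the ssthresh update that accompanies the return to slow start is left out for brevity.

```python
DUP_ACK_THRESHOLD = 3


class TahoeSender:
    """Counts duplicate ACKs and reacts the way Tahoe does."""

    def __init__(self, cwnd: int = 4):
        self.cwnd = cwnd
        self.last_ack = None
        self.dup_acks = 0
        self.retransmitted = []  # segment numbers resent by fast retransmit

    def on_ack(self, ack_no: int) -> None:
        if ack_no == self.last_ack:
            self.dup_acks += 1
            if self.dup_acks == DUP_ACK_THRESHOLD:
                self.retransmitted.append(ack_no)  # resend the missing segment now
                self.cwnd = 1                      # Tahoe: back to slow start
        else:
            self.last_ack = ack_no
            self.dup_acks = 0


s = TahoeSender()
for ack in [2, 3, 4, 4, 4, 4]:  # segment 4 is lost; later segments each trigger "ACK 4"
    s.on_ack(ack)
print(s.retransmitted, s.cwnd)   # [4] 1
```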
TCP Variants :
• TCP-Tahoe:
– implements the slow start, congestion avoidance, and
fast retransmit algorithms
• TCP-Reno:
– implements the slow start, congestion avoidance, fast
retransmit, and fast recovery algorithms
• Among other implementations are Vegas,
NewReno (the most commonly implemented on
webservers today, according to a survey) and
SACK TCP.
TCP Tahoe Trace
(with one dropped segment)
[Figure: Sequence # (380000 to 500000) and window in MSS (0 to 48) vs. Time (4.9 s to 5.7 s); series: Sent Segment, ACK'ed Segment, cwnd, ssthresh; annotations: Lost segment, Fast Retransmit, Begin slow-start, Begin congestion avoidance]
Could Tahoe be Improved?
• Receipt of dupACKs tells the
sender that the receiver is
still getting new segments,
i.e. there is still data flowing
between sender and
receiver
• Why does sender go back to
slow start after fast
retransmit?
• Why does sender let Ack
clock die?
TCP Variation: TCP Reno
• 2nd Improvement was TCP Reno (1990)
– Nagle’s algorithm
– Improved RTO calculation and back-off
– AIMD congestion avoidance with slow-start
– Fast retransmit & fast recovery
Fast Recovery
Concept:
• After fast retransmit, reduce cwnd by half, and continue sending segments at this reduced level.
Problems:
• Sender has too many outstanding segments.
• How does sender transmit packets on a dupACK? Need to use a “trick”: inflate cwnd.
[Figure: cwnd vs. Time through Slow Start and Congestion Avoidance, showing the (initial) ssthresh, fast-retransmit events, new ACKs, a timeout, “inflating” cwnd with dupACKs and “deflating” cwnd with a new ACK]
Fast Retransmit & Fast Recovery
• After receiving 3 dupACKs:
1. Retransmit the lost segment.
2. Set ssthresh = flight size/2.
3. Set cwnd = ssthresh, and ndupacks = 3.
N.B. In Reno: send_win = min ( rwnd, cwnd + ndupacks ).
• If dupACK arrives:
– ++ndupacks
– Transmit new segment, if allowed.
• If new ACK arrives:
– ndupacks = 0
– Exit fast recovery.
• If RTO expires:
– ndupacks = 0
– Perform slow-start ( ssthresh = flight size/2, cwnd = 1 )
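The Reno rules above can be sketched as a small state machine. Everything is counted in segments; the class name, the separate dup_count used before fast recovery starts, and the omission of actual segment transmission are simplifying assumptions made for this illustration.

```python
class RenoSender:
    """Window bookkeeping for Reno fast retransmit and fast recovery."""

    def __init__(self, cwnd: float, rwnd: float):
        self.cwnd = cwnd
        self.rwnd = rwnd
        self.ssthresh = float("inf")
        self.dup_count = 0     # dupACKs seen before fast recovery starts
        self.ndupacks = 0      # window inflation while in fast recovery
        self.in_fast_recovery = False

    def send_win(self) -> float:
        return min(self.rwnd, self.cwnd + self.ndupacks)

    def on_dup_ack(self, flight_size: float) -> None:
        if self.in_fast_recovery:
            self.ndupacks += 1                 # inflate: one more segment left the wire
            return
        self.dup_count += 1
        if self.dup_count == 3:                # fast retransmit happens here (not modelled)
            self.ssthresh = flight_size / 2
            self.cwnd = self.ssthresh
            self.ndupacks = 3
            self.in_fast_recovery = True

    def on_new_ack(self) -> None:
        self.dup_count = 0
        self.ndupacks = 0                      # deflate the window
        self.in_fast_recovery = False          # exit fast recovery

    def on_rto(self, flight_size: float) -> None:
        self.dup_count = 0
        self.ndupacks = 0
        self.in_fast_recovery = False
        self.ssthresh = flight_size / 2
        self.cwnd = 1                          # perform slow start


s = RenoSender(cwnd=16, rwnd=64)
for _ in range(3):
    s.on_dup_ack(flight_size=16)
print(s.cwnd, s.send_win())   # 8.0 11.0
s.on_dup_ack(flight_size=16)
print(s.send_win())           # 12.0, inflated by one more dupACK
s.on_new_ack()
print(s.send_win())           # 8.0, deflated on the new ACK
```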
TCP Reno Trace
(with one dropped segment)
[Figure: Sequence # (380000 to 500000) and window in MSS (0 to 48) vs. Time (4.9 s to 5.7 s); series: Sent Segment, ACK'ed Segment, cwnd, ssthresh, cwnd+ndupacks]
TCP Tahoe & Reno Trace
(with two dropped segments)
[Figure: Sequence # (380000 to 500000) vs. Time (4.9 s to 5.7 s); series: Tahoe, Reno, Tahoe & Reno]
What if There are Multiple
Losses in a Window?
• With two losses in a window, Reno will
occasionally time out.
• With three losses in a window, Reno will usually
time out.
• With four losses in a window, Reno is guaranteed
to time out!
• With three or more losses in a window, Tahoe
typically outperforms Reno!
TCP Reno Trace
(with two dropped segments)
[Figure: Sequence # (380000 to 500000) and window in MSS (0 to 48) vs. Time (4.9 s to 5.7 s); series: Sent Segment, ACK'ed Segment, cwnd, ssthresh, cwnd+ndupacks]
TCP Variation: TCP NewReno
• 3rd Improvement was TCP NewReno (1995)
– Nagle’s algorithm
– Improved RTO calculation and back-off
– AIMD congestion avoidance with slow-start
– Fast retransmit & modified fast recovery
Modifications to Fast Recovery
– Partial ACKs: An ACK that acknowledges some but not
all the segments that were outstanding at the start of
fast recovery. NewReno interprets this as an indication
of multiple loss.
– If partial ACK received, re-transmit the next lost
segment immediately and set ndupacks = 0 (deflate
send_win).
– Sender remains in fast recovery until all data
outstanding when fast recovery was initiated is ACK’ed.
Additional dupACK’s increase ndupacks.
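The partial-ACK rule can be sketched as follows. Assumptions for this illustration: segments are numbered as whole units, an ACK number names the next segment the receiver expects, `recover` is the highest segment outstanding when fast recovery began, and the class name is hypothetical.

```python
class NewRenoRecovery:
    """Tracks one episode of NewReno's modified fast recovery."""

    def __init__(self, recover: int):
        self.recover = recover   # highest segment outstanding when recovery began
        self.ndupacks = 3        # recovery starts after 3 dupACKs
        self.active = True
        self.retransmitted = []  # segments resent because of partial ACKs

    def on_dup_ack(self) -> None:
        if self.active:
            self.ndupacks += 1   # additional dupACKs inflate the window

    def on_new_ack(self, ack_no: int) -> None:
        if not self.active:
            return
        self.ndupacks = 0                      # deflate send_win
        if ack_no > self.recover:
            self.active = False                # full ACK: exit fast recovery
        else:
            self.retransmitted.append(ack_no)  # partial ACK: resend the next hole


# Segments 1..10 outstanding, 3 and 7 lost; segment 3 was already fast-retransmitted.
r = NewRenoRecovery(recover=10)
r.on_new_ack(7)    # partial ACK: everything below 7 arrived, 7 is still missing
print(r.retransmitted, r.active)  # [7] True
r.on_new_ack(11)   # full ACK: all data outstanding at the start is now ACKed
print(r.active)    # False
```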
TCP NewReno Trace
(with two dropped segments)
[Figure: Sequence # (380000 to 500000) and window in MSS (0 to 48) vs. Time (4.9 s to 5.7 s); series: Sent Segment, ACK'ed Segment, cwnd, ssthresh, cwnd+ndupacks]
Tahoe, Reno & NewReno Trace
(with two dropped segments)
[Figure: Sequence # (380000 to 500000) vs. Time (4.9 s to 5.7 s); series: NewReno, Reno, Tahoe, Reno & NewReno, All]
Is There a Better Way?
• The only way Tahoe, Reno and NewReno can
detect congestion is by creating congestion!
– They carefully probe for congestion by slowly
increasing their sending rate.
– When they find (create) congestion, they cut sending
rate at least in half!
• This slow advance and rapid retreat approach
results in a saw-toothed sending rate, and highly
erratic throughput.
• What if TCP could detect congestion without
causing congestion?
TCP Variation: TCP Vegas
(True Congestion Avoidance)
• Introduced by Brakmo and Peterson (1994)
• Three changes to TCP Reno
– Modified congestion avoidance
• Don’t wait for a timeout: if actual throughput < expected
throughput, decrease the congestion window. (AIAD!)
– New retransmission mechanism
• motivation: what if the sender never receives 3 dupACKs (because
segments are lost or the window size is too small)?
• mechanism: sender retransmits after a dupACK is
received, if RTT estimate > timeout.
– Modified slow start
• motivation: sender tries finding correct window size without causing
a loss.
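A rough sketch of the modified congestion avoidance idea, comparing expected and actual throughput once per round trip. The thresholds `alpha` and `beta` (in segments) and their default values are assumptions for illustration and do not come from the slide; `base_rtt` stands for the smallest RTT seen so far.

```python
def vegas_adjust(cwnd: float, base_rtt: float, rtt: float,
                 alpha: float = 1.0, beta: float = 3.0) -> float:
    """Return the next cwnd (in segments) after one round trip."""
    expected = cwnd / base_rtt            # throughput if nothing were queued
    actual = cwnd / rtt                   # throughput actually achieved
    diff = (expected - actual) * base_rtt # roughly, segments sitting in queues
    if diff < alpha:
        return cwnd + 1                   # additive increase
    if diff > beta:
        return cwnd - 1                   # additive decrease (the "AIAD" on the slide)
    return cwnd                           # leave the window alone


print(vegas_adjust(20, base_rtt=0.1, rtt=0.1))   # 21: no queueing, speed up
print(vegas_adjust(20, base_rtt=0.1, rtt=0.2))   # 19: actual well below expected
print(vegas_adjust(20, base_rtt=0.1, rtt=0.11))  # 20: within the target band
```

The point of the design is that the window shrinks as soon as the RTT rises, before any segment has to be dropped.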
Vegas vs. NewReno
TCP NewReno throughput with simulated background traffic
TCP Vegas throughput with simulated background traffic
Source: Brakmo and Peterson, TCP Vegas: End to End Congestion Avoidance
on a Global Internet, IEEE JSAC, Vol 13, No. 8, Oct. 1995, pp. 1465 – 1480
What Variations are being used?
• Experimental results obtained by testing
3728 web servers:
– NewReno: 42%
– Tahoe (w/o Fast Retransmit): 27%
– Reno: 18%
– Other: 8%
– Tahoe: 5%
Source: Padhye and Floyd, SIGCOMM ‘01, August 27-31, San Diego, CA
Summary of TCP Behavior
TCP Variation: Tahoe
– Response to 3 dupACK’s: Do fast retransmit, enter slow start
– Response to Partial ACK of Fast Retransmission: ++cwnd
– Response to “full” ACK of Fast Retransmission: ++cwnd
TCP Variation: Reno
– Response to 3 dupACK’s: Do fast retransmit, enter fast recovery
– Response to Partial ACK of Fast Retransmission: Exit fast recovery, deflate window, enter congestion avoidance
– Response to “full” ACK of Fast Retransmission: Exit fast recovery, deflate window, enter congestion avoidance
TCP Variation: NewReno
– Response to 3 dupACK’s: Do fast retransmit, enter modified fast recovery
– Response to Partial ACK of Fast Retransmission: Fast retransmit and deflate window – remain in modified fast recovery
– Response to “full” ACK of Fast Retransmission: Exit modified fast recovery, deflate window, enter congestion avoidance
• When entering slow start: if connection is new, ssthresh = arbitrarily large value, cwnd = 1; else, ssthresh = max(flight size/2, 2*MSS), cwnd = 1.
• In slow start: ++cwnd on new ACK
• When entering either fast recovery or modified fast recovery: ssthresh = max(flight size/2, 2*MSS), cwnd = ssthresh.
• In congestion avoidance: cwnd += 1*MSS per RTT
TCP without SACK
• TCP uses cumulative ACKs
• Receiver identifies the last byte of data successfully received
• Out of order segments are not ACKed
• Receiver sends duplicate ACKs
• TCP without SACK forces the TCP sender
• Either to wait an RTT to find out a segment was lost
• Or, unnecessarily retransmit data that has been correctly received
• Can result in reduced overall throughput
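A small sketch of why cumulative ACKs leave the sender guessing. Segments are numbered as whole units here (an assumption; real TCP numbers bytes), each ACK names the next segment the receiver expects, and the function name is hypothetical.

```python
def cumulative_acks(arrivals: list[int], first_expected: int = 1) -> list[int]:
    """Return the cumulative ACK sent in response to each arriving segment."""
    expected = first_expected
    buffered = set()
    acks = []
    for seg in arrivals:
        if seg >= expected:
            buffered.add(seg)           # hold the segment, in order or not
        while expected in buffered:     # advance past everything now contiguous
            buffered.remove(expected)
            expected += 1
        acks.append(expected)           # the ACK only names the next segment wanted
    return acks


# Segment 3 is delayed: segments 4, 5 and 6 each produce a duplicate "ACK 3".
print(cumulative_acks([1, 2, 4, 5, 6, 3]))  # [2, 3, 3, 3, 3, 7]
```

The duplicate ACKs say that something arrived, but not what; the sender cannot tell that segments 4 to 6 are already safely buffered.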
TCP with Selective Ack (SACK)
• SACK + Selective Repeat Retransmission Policy allows
• the receiver to inform the sender about all segments that are successfully received.
• the sender to fast retransmit only the missing data segments
• SACK is implemented using two TCP Options
• SACK-Permitted Option
• SACK Option
SACK Example
[Figure: segments exchanged between sender and receiver; the receiver’s buffer holds bytes 1-100 and 101-200, then a gap, then 401-500 and 501-600]
SACK Rules
• With SACKs, the ACK field is still a cum ACK
• A SACK cannot be sent unless the SACK-Permitted option has been
received (in the SYN)
• The 1st SACK block MUST specify the contiguous block of data containing
the segment which triggered this acknowledgment
• If SACKs are sent, SACK option should be included in all ACK’s which do
not ACK the highest sequence number in the data receiver’s queue
Generating SACKs – data receiver behavior
• If the data receiver has not received a SACK-Permitted Option for a given
connection, the receiver must not send SACK options on that connection
• The receiver should send an ACK for every valid segment that arrives
containing new data
• The data receiver should include as many distinct SACK blocks as possible
in the SACK option
• SACK option should be filled out by repeating the most recently reported
SACK blocks
• The data receiver provides the sender with the most up-to-date info about
the state of the network and the receiver’s queue
Interpreting SACKs - Data Sender behavior
• The sender records the SACK for future reference
• Maintains a retransmission queue containing unacknowledged segments
• One possible implementation
• Turns on SACK bit for the segment in retransmission queue when it receives a SACK
• Skips SACKed data during any later fast retransmission
• On fast retransmit, retransmits data not SACKed so far and less than the highest
SACKed data
• Turns off SACK bit after retransmission time out
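The "one possible implementation" above can be sketched as a scoreboard with one SACK bit per outstanding segment. Whole-segment numbering and every name in the block are assumptions made for illustration.

```python
class SackScoreboard:
    """Retransmission queue with a SACK bit per unacknowledged segment."""

    def __init__(self, outstanding):
        self.sacked = {seg: False for seg in outstanding}

    def on_cum_ack(self, ack_no: int) -> None:
        """Drop segments covered by the cumulative ACK (ack_no = next expected)."""
        for seg in [s for s in self.sacked if s < ack_no]:
            del self.sacked[seg]

    def on_sack(self, sacked_segments) -> None:
        """Turn on the SACK bit for each reported segment still in the queue."""
        for seg in sacked_segments:
            if seg in self.sacked:
                self.sacked[seg] = True

    def fast_retransmit_candidates(self) -> list[int]:
        """Segments not SACKed so far and below the highest SACKed segment."""
        sacked = [s for s, flag in self.sacked.items() if flag]
        if not sacked:
            return []
        highest = max(sacked)
        return sorted(s for s, flag in self.sacked.items()
                      if not flag and s < highest)

    def on_rto(self) -> None:
        """After a retransmission timeout, turn every SACK bit off."""
        for seg in self.sacked:
            self.sacked[seg] = False


sb = SackScoreboard(range(1, 9))  # segments 1..8 outstanding; 3 and 6 were lost
sb.on_cum_ack(3)                  # segments 1 and 2 are cumulatively ACKed
sb.on_sack([4, 5, 7, 8])
print(sb.fast_retransmit_candidates())  # [3, 6]
```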
Another SACK Example
[Figure: segments and SACKs exchanged between sender and receiver, with the Receiver Buffer contents; sequence-number labels shown: 100, 299, 300, 500, 699, 900, 1099]
Another SACK Example (cont’d)
[Figure: continuation of the sender and receiver exchange; sequence-number labels shown: 300, 500, 699, 700, 900, 1099, 1100]
Without SACK vs. With SACK
[Figure: two sender and receiver exchanges side by side, TCP without SACK and TCP with SACK, each marking the fast retransmit]
SACK Observations
• SACK TCP follows standard TCP congestion control; Adding SACK to TCP
does not change the basic underlying congestion control algorithms
• SACK TCP has major advantages when compared with TCP Tahoe, Reno,
Vegas and NewReno, because its PDUs carry additional
information in the SACK option
• Difference in behavior when multiple packets are dropped from one window
of data
• SACK information allows the sender to better decide what to retransmit and
what not to
Any questions ? …
Data Receiver Reneging
Reneging – fail to fulfill a promise or obligation
• Data receiver is permitted to discard data in its queue that has not been
acknowledged to the data sender, even if the data has already been SACKed
• Such discarding of SACKed segments is discouraged, but may occur if the
receiver must give buffer space back to the OS
• If reneging occurs
• first SACK should reflect the newest segment even if it’s going to be discarded
• Except for the newest segment, all SACK blocks MUST NOT report any old data
which is no longer actually held by the receiver
Reneging Example
[Figure: segments exchanged between sender and receiver; annotations: “reneg occurs; window decreases” and “window increases”; sequence-number labels shown: 100, 199, 200, 300, 399, 500, 599]
Consequences of Reneging
• Sender must maintain normal TCP timeouts
• Data cannot be considered “communicated” until a cum ACK is sent
• Sender must retransmit the data at the left window edge after a retransmit
timeout, even if that data has been SACKed by the receiver
• Sender MUST NOT discard data before being acked by the Cum Ack
SACK-Permitted Option
• Sack–Permitted option
• is allowed only in a SYN Segment.
• indicates sender handles SACKs, and receiver should send SACKs if possible.
• SACK option can be used once connection is established
[Figure: TCP Header of a SYN segment (Source port address, Destination port address, Sequence Number, Cumulative Ack No., TCP header length, SYN bit, Window size, Checksum, Urgent pointer) whose options field holds NOP (kind=1), NOP (kind=1), SACK-permitted (kind=4, length=2)]
SACK-Permitted Option and SACK
[Figure: exchange between SENDER and RECEIVER; in the TCP connection establishment phase, the SYN and ACK segments carry the options NOP (kind=1), NOP (kind=1), SACK-permitted (kind=4, length=2); the data transfer phase follows]
SACK Option
• Length of SACK with n blocks?
= (2 + 8 * n) bytes
• Max number bytes available for TCP Options?
= 40 bytes
• Max number of SACK blocks possible?
= 4 SACK blocks (barring any other TCP Options)
[Figure: TCP header (Source port address, Destination port address, Sequence Number, Cumulative Ack No., HLEN, Window size, Checksum, Urgent pointer) with options NOP (Kind=1), NOP (Kind=1), SACK (Kind=5, Length=??), followed by Left edge of 1st block, Right edge of 1st block, ..., Left edge of nth block, Right edge of nth block]
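The three answers above follow from simple arithmetic: 1 byte of kind, 1 byte of length, and two 32-bit edges per block. The sketch below also packs an option with Python's `struct` module; the function names and the example edge values are hypothetical.

```python
import struct

MAX_TCP_OPTION_BYTES = 40
SACK_KIND = 5


def sack_option_length(n_blocks: int) -> int:
    """Kind (1 byte) + length (1 byte) + 8 bytes per block."""
    return 2 + 8 * n_blocks


def max_sack_blocks(other_option_bytes: int = 0) -> int:
    """Blocks that fit in the option space left over by other options."""
    return (MAX_TCP_OPTION_BYTES - other_option_bytes - 2) // 8


def encode_sack_option(blocks) -> bytes:
    """blocks: list of (left_edge, right_edge) 32-bit sequence numbers."""
    option = struct.pack("!BB", SACK_KIND, sack_option_length(len(blocks)))
    for left, right in blocks:
        option += struct.pack("!II", left, right)  # network byte order
    return option


print(sack_option_length(4))   # 34
print(max_sack_blocks())       # 4
print(max_sack_blocks(12))     # 3, e.g. when 12 bytes go to another option
print(len(encode_sack_option([(500, 700), (900, 1100)])))  # 18
```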
