Transcript Document

TCP enhancements
Hojun Lee
[email protected]
7/17/2015
Many variants of TCP


Tahoe TCP: Follows a basic go-back-n model using slow start,
congestion avoidance and the Fast Retransmit algorithm. With Fast
Retransmit, after receiving a small number of duplicate ACKs for the same segment,
the sender infers that the packet has been lost and retransmits it
without waiting for the retransmission timer to expire
Reno TCP
– Modification to the Tahoe TCP Fast Retransmit algorithm to include Fast
Recovery; this prevents the pipe from going empty after Fast Retransmit,
thereby avoiding the need to slow start after a single packet loss
– Recovers 1 lost segment every 3 RTTs

New Reno:
– Uses partial acknowledgement to improve loss recovery
– Recovers 1 lost segment every RTT

SACK TCP
– Uses the SACK option to improve loss recovery
– Recovers up to 3 segments per RTT
Other schemes exist (e.g., Vegas)
Outline

LFN
– Needs some TCP options such as
window scale, timestamp and PAWS

Methods to recover from multiple
packet losses in a window
– SACK
– TCP with “partial acknowledgements”

Effects of increasing window size
Long fat pipe problems

The TCP window size is a 16-bit field in the TCP header,
limiting the window to 65,535 bytes
– Can be solved with “window scale option”

Packet loss in an LFN can reduce throughput drastically
Possible solutions for multiple packet losses within a window:
– SACK (Selective acknowledgements)
– New Reno (use partial acknowledgements)

Better RTT measurements are required for operating on an
LFN
– Timestamp option

If the network is so fast that sequence numbers wrap around
in less than the MSL
– PAWS algorithm (Protection Against Wrapped Sequence Numbers)
Window scale option

Format
kind = 3 (1 byte) | len = 3 (1 byte) | shift count (1 byte)

Increases the definition of the TCP window from 16 to 32 bits
The 1-byte shift count is between 0 and 14. The maximum value of 14 gives a window
of 1,073,725,440 bytes (65535 * 2^14)
Appears only in a SYN segment
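The shift arithmetic can be checked with a small Python sketch. It assumes the window field and buffer sizes are in bytes; the helper names (window_scale_option, effective_window, shift_for_buffer) are hypothetical and not taken from any real TCP stack.

```python
MAX_SHIFT = 14  # largest shift count allowed by the window scale option

def window_scale_option(shift: int) -> bytes:
    """Encode kind = 3, len = 3, shift count (one byte each)."""
    if not 0 <= shift <= MAX_SHIFT:
        raise ValueError("shift count must be between 0 and 14")
    return bytes([3, 3, shift])

def effective_window(window_field: int, shift: int) -> int:
    """Scale the 16-bit window field from the header by 2**shift."""
    if not 0 <= window_field <= 0xFFFF:
        raise ValueError("the window field is 16 bits")
    if not 0 <= shift <= MAX_SHIFT:
        raise ValueError("shift count must be between 0 and 14")
    return window_field << shift

def shift_for_buffer(buffer_bytes: int) -> int:
    """Smallest shift count (capped at 14) that lets a full 16-bit
    window field cover buffer_bytes."""
    shift = 0
    while (0xFFFF << shift) < buffer_bytes and shift < MAX_SHIFT:
        shift += 1
    return shift

print(effective_window(0xFFFF, MAX_SHIFT))  # 1073725440
print(shift_for_buffer(1_000_000))          # 4
```

The first call reproduces the 65535 * 2^14 maximum above; the second shows that a 1 MB receive buffer needs a shift count of 4.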
Timestamp option

Format
Kind = 8 (1 byte) | len = 10 (1 byte) | timestamp value (4 bytes) | timestamp echo reply (4 bytes)

The sender places a timestamp value in every segment
The receiver echoes this value in the ACK, allowing the
sender to calculate an RTT for each received ACK
Negotiated in the SYN segment
Larger window sizes require better RTT calculation
Does not require any form of clock synchronization
between the two hosts
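A minimal Python sketch of the option layout and the RTT computation, assuming timestamps are plain 32-bit clock ticks. The function names are hypothetical; only the field layout (kind, len, TSval, TSecr) comes from the slide. The RTT uses only the sender's own clock, which is why no clock synchronization is needed.

```python
import struct
from typing import Tuple

TS_KIND, TS_LEN = 8, 10
MASK32 = 0xFFFFFFFF

def pack_timestamp_option(tsval: int, tsecr: int) -> bytes:
    """kind (1 byte), len (1 byte), TSval (4 bytes), TSecr (4 bytes), network byte order."""
    return struct.pack("!BBII", TS_KIND, TS_LEN, tsval & MASK32, tsecr & MASK32)

def unpack_timestamp_option(option: bytes) -> Tuple[int, int]:
    kind, length, tsval, tsecr = struct.unpack("!BBII", option)
    if kind != TS_KIND or length != TS_LEN:
        raise ValueError("not a timestamp option")
    return tsval, tsecr

def rtt_sample(now: int, echoed: int) -> int:
    """RTT in sender clock ticks: current sender clock minus the echoed TSval
    (modulo 2**32, so a clock wrap between send and ACK is handled)."""
    return (now - echoed) & MASK32

# The sender stamps a segment at tick 1000; the receiver echoes it in TSecr.
sent = pack_timestamp_option(tsval=1000, tsecr=0)
tsval, _ = unpack_timestamp_option(sent)
ack = pack_timestamp_option(tsval=555, tsecr=tsval)  # receiver's own TSval is arbitrary here
_, echoed = unpack_timestamp_option(ack)
print(rtt_sample(now=1042, echoed=echoed))  # 42
```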
PAWS



Largest receiver window = 2^30 bytes = 1 GB
A "lost" segment may reappear before the MSL expires, after
the sequence numbers have wrapped around.
The receiver treats the timestamp as an extension of the
sequence number, and discards out-of-sequence segments based on
both the sequence number and the timestamp.
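The PAWS test can be sketched in Python as follows. This is a simplified sketch: TS.Recent is updated whenever the caller says the segment advances the window (the hypothetical advances_window flag), and details such as invalidating TS.Recent after a long idle period are left out. Timestamps are compared modulo 2^32, so a wrapped timestamp clock is still ordered correctly.

```python
MASK32 = 0xFFFFFFFF

def ts_before(a: int, b: int) -> bool:
    """True if 32-bit timestamp a is earlier than b, comparing modulo 2**32."""
    return ((a - b) & MASK32) >= 0x80000000

class PawsReceiver:
    """Keeps TS.Recent and rejects segments whose timestamp is older."""

    def __init__(self, ts_recent: int) -> None:
        self.ts_recent = ts_recent

    def accept(self, tsval: int, advances_window: bool) -> bool:
        if ts_before(tsval, self.ts_recent):
            return False  # old duplicate: same sequence space, older timestamp
        if advances_window:
            self.ts_recent = tsval
        return True

rx = PawsReceiver(ts_recent=1000)
print(rx.accept(1005, advances_window=True))   # True
print(rx.accept(900, advances_window=False))   # False: a reappearing old segment
```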
Useful terms



LW (Loss Window): size of the congestion window
after a TCP sender detects loss using its
retransmission timer
RW (Restart Window): size of the congestion window
after TCP restarts transmission after an idle period
Flight Size: The amount of data that has been sent
but not yet acknowledged
Two methods to detect segment loss
TO (Timeout)
TD (Triple Duplicate ACK)
Detect a loss by TO


Set cwnd = 1 segment (LW)
Set ssthresh = max(FlightSize/2, 2*MSS)
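The timeout response is two assignments; a small Python sketch, assuming all quantities are counted in bytes (the CongestionState type and function name are hypothetical):

```python
from dataclasses import dataclass

@dataclass
class CongestionState:
    mss: int       # bytes
    cwnd: int      # bytes
    ssthresh: int  # bytes

def on_retransmission_timeout(state: CongestionState, flight_size: int) -> None:
    """Loss detected by TO: shrink ssthresh, then restart from the Loss Window."""
    state.ssthresh = max(flight_size // 2, 2 * state.mss)
    state.cwnd = state.mss  # LW = 1 segment

s = CongestionState(mss=1000, cwnd=20_000, ssthresh=65_535)
on_retransmission_timeout(s, flight_size=16_000)
print(s.cwnd, s.ssthresh)  # 1000 8000
```

The 2*MSS floor matters when little data is in flight: with 1,500 bytes outstanding, ssthresh becomes 2,000 rather than 750.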
Detect a loss by TD (Fast Retransmit
and Fast Recovery procedure)

After receiving 3 duplicate ACKs, TCP retransmits what appears to be
the missing segment, without waiting for the retransmission timer to expire
Good for a single loss within a window but not good for multiple losses
(Figure: segment 10 is lost; segments 11, 12 and 13 each elicit a duplicate "ack 10". After the triple duplicate ACK (TD), the sender retransmits segment 10 and receives "ack 14".)
Fast retransmit and fast recovery algorithms
1. When the third duplicate ACK is received,
   set ssthresh = max(FlightSize/2, 2*MSS)
2. Retransmit the lost segment and then
   set cwnd = ssthresh + 3*MSS (inflating the window)
   The reason: the receiver must generate an ACK for every segment it receives, even out of order,
   so three duplicate ACKs mean that three segments have left the network and are buffered at the receiver.
3. For each additional duplicate ACK received, increase cwnd by 1 MSS.
4. Transmit a segment, if allowed by min(cwnd, receiver's AW)
5. When the next ACK arrives that acknowledges new data,
   set cwnd = ssthresh (the value set in step 1)
   (deflating the window)
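Steps 1 to 5 can be traced with a Python sketch of a Reno-style sender. It is a simplified model under stated assumptions: sequence numbers count bytes, every segment is exactly one MSS, the returned actions are hypothetical labels rather than real packets, and window growth outside Fast Recovery (slow start, congestion avoidance) is not modeled.

```python
from dataclasses import dataclass
from typing import List, Tuple

Action = Tuple[str, int]  # ("retransmit" | "send", sequence number)

@dataclass
class RenoSender:
    mss: int
    cwnd: int
    ssthresh: int
    snd_una: int             # oldest unacknowledged byte
    snd_nxt: int             # next byte to be sent
    dupacks: int = 0
    in_recovery: bool = False

    def _send_new(self, rwnd: int, actions: List[Action]) -> None:
        # Step 4: send new segments while min(cwnd, rwnd) allows it.
        limit = self.snd_una + min(self.cwnd, rwnd)
        while self.snd_nxt + self.mss <= limit:
            actions.append(("send", self.snd_nxt))
            self.snd_nxt += self.mss

    def on_ack(self, ack: int, rwnd: int) -> List[Action]:
        actions: List[Action] = []
        if ack == self.snd_una and self.snd_nxt > self.snd_una:  # duplicate ACK
            self.dupacks += 1
            if self.dupacks == 3 and not self.in_recovery:
                flight = self.snd_nxt - self.snd_una
                self.ssthresh = max(flight // 2, 2 * self.mss)   # step 1
                actions.append(("retransmit", self.snd_una))     # step 2
                self.cwnd = self.ssthresh + 3 * self.mss         # step 2: inflate
                self.in_recovery = True
            elif self.in_recovery:
                self.cwnd += self.mss                            # step 3
        elif ack > self.snd_una:                                 # ACK for new data
            self.snd_una = ack
            self.dupacks = 0
            if self.in_recovery:
                self.cwnd = self.ssthresh                        # step 5: deflate
                self.in_recovery = False
        self._send_new(rwnd, actions)
        return actions

# 8 segments of 1000 bytes are outstanding and the first one (seq 0) is lost.
s = RenoSender(mss=1000, cwnd=8000, ssthresh=64000, snd_una=0, snd_nxt=8000)
for _ in range(5):
    print(s.on_ack(0, rwnd=65535))  # [], [], [('retransmit', 0)], [], [('send', 8000)]
print(s.on_ack(8000, rwnd=65535))   # cwnd deflated to 4000; segments 9000..11000 go out
```

The fifth duplicate ACK shows why inflation helps: cwnd has grown to 9000, so one new segment can be sent while the hole is being repaired.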
Problem occurs if multiple packet losses
happen within a window

Two possible answers
– TCP with SACK
– TCP with partial acknowledgement (New Reno TCP)
(Figure: segments 10 to 14 are sent and segments 10 and 12 are lost. Duplicate "ack 10"s trigger a first TD and the retransmission of 10, which brings only a partial "ack 12"; segments 15 to 18 then elicit duplicate "ack 12"s and a 2nd TD.)
TCP with SACK (Selective
Acknowledgement)

Based on RFC 2018 (TCP Selective
Acknowledgement Options) – standards track
Good for when multiple packets are lost from one
window of data
Gives the sender a view of which segments are queued at
the receiver and which are still in flight
Uses two TCP options
– "SACK permitted" (may be sent in a SYN segment)
– SACK option itself (may be sent over an established
connection)
Format of two SACK options

Sack-permitted option
– Two-byte option
Kind = 4 | Length = 2

Sack option itself (Kind = 5, Length = variable)
Kind = 5 | Length
Left edge of 1st block
Right edge of 1st block
Left edge of nth block
Right edge of nth block
Each edge is a 32-bit sequence number, so Length = 8n + 2 bytes for n blocks
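A Python sketch of encoding and decoding the two SACK options, assuming the edges are already 32-bit sequence numbers and that the right edge is the sequence number just after the last byte of the block. The function names are hypothetical. The 4-block limit reflects the 40-byte TCP option space (2 + 4*8 = 34 bytes); when the timestamp option is also present, only 3 blocks fit.

```python
import struct
from typing import List, Tuple

SACK_PERMITTED = bytes([4, 2])  # kind = 4, length = 2 (sent in SYN segments)

def pack_sack_option(blocks: List[Tuple[int, int]]) -> bytes:
    """kind = 5, length = 8n + 2, then (left edge, right edge) as 32-bit values."""
    if not 1 <= len(blocks) <= 4:
        raise ValueError("a SACK option carries 1 to 4 blocks")
    body = b"".join(struct.pack("!II", left, right) for left, right in blocks)
    return struct.pack("!BB", 5, 2 + len(body)) + body

def unpack_sack_option(option: bytes) -> List[Tuple[int, int]]:
    kind, length = option[0], option[1]
    if kind != 5 or length != len(option) or (length - 2) % 8 != 0:
        raise ValueError("malformed SACK option")
    return [struct.unpack("!II", option[i:i + 8]) for i in range(2, length, 8)]

opt = pack_sack_option([(8000, 8500), (7000, 7500), (6000, 6500)])
print(len(opt), unpack_sack_option(opt))  # 26 [(8000, 8500), (7000, 7500), (6000, 6500)]
```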
SACK option examples
Assume the left edge is 5000 and the transmitter sends a
burst of 8 segments, each containing 500 data bytes
(segment 1 = 5000:5499 through segment 8 = 8500:8999)
Example (1): The first four segments are received but the
last 4 are dropped
- The data receiver will return a normal TCP ACK segment
acknowledging sequence number 7000, with no SACK option
(Figure: segments 1 to 4 (5000:5499 through 6500:6999) reach the receiver, segments 5 to 8 (7000:7499 through 8500:8999) are dropped, and the receiver returns ACK 7000.)
SACK option examples con’t [1]
Example (2): The first segment is lost but the
remaining 7 segments are received.
- The receiver will return a TCP ACK segment that
acknowledges sequence number 5000 and contains a
SACK option specifying one block of queued data
- LE = Left Edge
- RE = Right Edge
(Figure: segment 1 (5000:5499) is lost; segments 2 to 8 (5500:5999 through 8500:8999) each elicit one ACK, in this order:)
ACK 5000; LE 5500; RE 6000
ACK 5000; LE 5500; RE 6500
ACK 5000; LE 5500; RE 7000
ACK 5000; LE 5500; RE 7500
ACK 5000; LE 5500; RE 8000
ACK 5000; LE 5500; RE 8500
ACK 5000; LE 5500; RE 9000
SACK option examples con’t [2]
Example (3): The 2nd, 4th, 6th and 8th (last) segments are dropped.
The receiver gets segments 1, 3, 5 and 7 and replies:

Triggering segment   ACK    First block    Second block   Third block
                            (LE   RE)      (LE   RE)      (LE   RE)
(a) 5000:5499        5500   SACK not used
(b) 6000:6499        5500   6000  6500
(c) 7000:7499        5500   7000  7500     6000  6500
(d) 8000:8499        5500   8000  8500     7000  7500     6000  6500
SACK option examples con’t [3]

Suppose at this point (continuing from the previous
example), the 4th packet is received out of order.
– Receiver replies with the following SACK:
ACK 5500; first block (LE 6000, RE 7500); second block (LE 8000, RE 8500); third block not used
Suppose that the 2nd segment is then received.
– Receiver replies with the following SACK:
ACK 7500; first block (LE 8000, RE 8500)
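The four examples can be reproduced with a receiver-side Python sketch. Assumptions, loudly: sequence-number wraparound is ignored, segments are half-open byte ranges [left, right), and the SackReceiver class is hypothetical. The ordering rule follows the examples: the first block is the one holding the segment that triggered the ACK, followed by previously reported blocks that are still queued, most recent first.

```python
from typing import List, Optional, Tuple

Block = Tuple[int, int]  # (left edge, right edge); right edge = last byte + 1

class SackReceiver:
    """Receiver-side SACK block generation, ignoring sequence-number wrap."""

    def __init__(self, rcv_nxt: int, max_blocks: int = 3) -> None:
        self.rcv_nxt = rcv_nxt          # cumulative ACK point
        self.max_blocks = max_blocks    # 3 when the timestamp option is in use
        self.queued: List[Block] = []   # out-of-order data, sorted and disjoint
        self.recent: List[Block] = []   # blocks, most recently reported first

    def _insert(self, left: int, right: int) -> Block:
        """Add an out-of-order segment and return the block that now holds it."""
        merged: List[Block] = []
        for l, r in sorted(self.queued + [(left, right)]):
            if merged and l <= merged[-1][1]:
                merged[-1] = (merged[-1][0], max(merged[-1][1], r))
            else:
                merged.append((l, r))
        self.queued = merged
        return next(b for b in merged if b[0] <= left and right <= b[1])

    def receive(self, left: int, right: int) -> Tuple[int, List[Block]]:
        """Process segment [left, right); return (ACK, SACK blocks)."""
        newest: Optional[Block] = None
        if left > self.rcv_nxt:
            newest = self._insert(left, right)
        else:
            self.rcv_nxt = max(self.rcv_nxt, right)
            # Pull in queued blocks that are now contiguous with rcv_nxt.
            while self.queued and self.queued[0][0] <= self.rcv_nxt:
                self.rcv_nxt = max(self.rcv_nxt, self.queued.pop(0)[1])
        still_queued = [b for b in self.recent if b in self.queued and b != newest]
        self.recent = ([newest] if newest else []) + still_queued
        return self.rcv_nxt, self.recent[: self.max_blocks]

# Example (3): segments 1, 3, 5 and 7 arrive.
rx = SackReceiver(rcv_nxt=5000)
for seg in (1, 3, 5, 7):
    left = 5000 + 500 * (seg - 1)
    print(rx.receive(left, left + 500))
# Example (4): the 4th segment, then the 2nd segment arrive.
print(rx.receive(6500, 7000))  # (5500, [(6000, 7500), (8000, 8500)])
print(rx.receive(5500, 6000))  # (7500, [(8000, 8500)])
```

The loop prints the four rows (a) to (d) of the Example (3) table.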
New Reno algorithm





Based on RFC 2582 (The NewReno modification to
TCP’s Fast Recovery Algorithm) – experimental
Little information is available to the TCP sender for
making retransmission decisions during Fast Recovery
Uses "partial acknowledgements" (ACKs that cover
new data, but not all the data outstanding when loss
was detected)
Recovers 1 lost segment every RTT
New Reno algorithm con’t [1]




Refer to the fast retransmit and fast recovery algorithm (steps 1 to 5)
Variable "recover": used to record the highest
sequence number transmitted
Add variable "recover" in step 1.
Step 5: when an ACK arrives that acknowledges new
data, this ACK could be the acknowledgement
elicited by the retransmission from step 2, or one elicited
by a later retransmission
New Reno algorithm con’t [2]

Two possibilities
1. If this ACK acknowledges all of the data up to and including
   "recover", then the ACK acknowledges all the intermediate
   segments sent between the original transmission of the lost
   segment and the receipt of the third duplicate ACK
   Set either cwnd = min(ssthresh, FlightSize + MSS)
   or cwnd = ssthresh (set in step 1)
   (FlightSize, in this case, is the amount of data outstanding when
   Fast Recovery is exited)
2. If this ACK does not acknowledge all of the data up to and
   including "recover", then this is a partial ACK
   Retransmit the first unacknowledged segment, then set
   cwnd = (previous cwnd deflated by the amount of new data acknowledged) + MSS
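The two possibilities can be sketched in Python as below. Assumptions: byte sequence numbers, "recover" stored as the highest sequence number sent (snd_nxt - 1), the min(ssthresh, FlightSize + MSS) option chosen for a full ACK, and no new data sent during recovery (so transmission of new segments, covered by step 4, is omitted). The class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Action = Tuple[str, int]  # ("retransmit", sequence number)

@dataclass
class NewRenoSender:
    mss: int
    cwnd: int
    ssthresh: int
    snd_una: int            # oldest unacknowledged byte
    snd_nxt: int            # next byte to be sent
    recover: int = -1       # highest sequence number sent when loss was detected
    dupacks: int = 0
    in_recovery: bool = False

    def on_dupack(self) -> List[Action]:
        self.dupacks += 1
        if self.dupacks == 3 and not self.in_recovery:
            flight = self.snd_nxt - self.snd_una
            self.ssthresh = max(flight // 2, 2 * self.mss)  # step 1
            self.recover = self.snd_nxt - 1                 # step 1: record "recover"
            self.cwnd = self.ssthresh + 3 * self.mss        # step 2: inflate
            self.in_recovery = True
            return [("retransmit", self.snd_una)]           # step 2: retransmit
        if self.in_recovery:
            self.cwnd += self.mss                           # step 3
        return []

    def on_new_ack(self, ack: int) -> List[Action]:
        newly_acked = ack - self.snd_una
        self.snd_una = ack
        self.dupacks = 0
        if not self.in_recovery:
            return []
        if ack > self.recover:
            # Full ACK: everything up to and including "recover" is covered.
            flight = self.snd_nxt - self.snd_una
            self.cwnd = min(self.ssthresh, flight + self.mss)
            self.in_recovery = False
            return []
        # Partial ACK: retransmit the next hole, deflate, add back one MSS.
        self.cwnd = self.cwnd - newly_acked + self.mss
        return [("retransmit", self.snd_una)]

# Segments at 0 and 2000 are lost out of 8 outstanding 1000-byte segments.
s = NewRenoSender(mss=1000, cwnd=8000, ssthresh=64000, snd_una=0, snd_nxt=8000)
for _ in range(6):
    s.on_dupack()               # 3rd dupack retransmits 0; cwnd ends at 10000
print(s.on_new_ack(2000))       # partial ACK -> [('retransmit', 2000)], cwnd 9000
print(s.on_new_ack(8000))       # full ACK -> [], Fast Recovery exited
```

Because this sketch sends no new data, FlightSize is 0 at exit and the min() option leaves cwnd at one MSS; the alternative cwnd = ssthresh would give 4000 bytes here. Both lost segments are repaired without a timeout, one per partial ACK.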
New Reno algorithm con’t [3]

Possible variants to the simple response to
partial acknowledgements
– How many packets to retransmit after each partial ACK?
– When to reset the retransmit timer after a partial ACK?
  • Reset the retransmit timer only after the first partial ACK
    (Impatient variant of NewReno)
  • Reset the retransmit timer after each partial ACK
    (Slow-but-steady variant of NewReno)
– How to avoid multiple Fast Retransmits caused by the
  retransmission of packets already received by the
  receiver?
New Reno algorithm con’t [4]

Avoiding multiple Fast Retransmits
Reason: the TCP data sender is unable to distinguish between a
duplicate ACK that results from a lost or delayed data packet,
and a duplicate ACK that results from the sender’s
retransmission of a data packet that had already been
received at the receiver
Needs a new variable called "send_high": after each retransmit timeout,
the highest sequence number transmitted so far is recorded in "send_high"
New Reno algorithm con’t [5]

Example (a):
Assumption: when the third duplicate ACK is received and the
sender is not already in the Fast Recovery procedure, check
whether those duplicate ACKs cover more than "send_high" or not.
(Figure: send_high = 12 and the sender is not in the Fast Recovery procedure at this point; segment 13 is lost and duplicate "ack 13"s arrive. Two scenarios are shown, one of which waits for the RTO.)
If the duplicate ACKs cover more than "send_high":
• Set ssthresh = max(FlightSize/2, 2*MSS)
• Record the highest sequence number transmitted
  in the variable called "recover"
• Go to step 2
New Reno algorithm con’t [6]

Example (b): when the duplicate
ACKs don’t cover "send_high", then
do nothing.
– Do not enter the fast retransmit and fast
recovery procedure
– Do not change the ssthresh value
– Do not go to step 2 to retransmit the lost
segment
– Do not execute step 3 upon receiving
subsequent duplicate ACKs
– After a retransmit timeout, record the
highest sequence number transmitted in
"send_high" and exit the Fast Recovery
procedure if applicable
(Figure: send_high = 12; segments 13, 14 and 15 elicit duplicate "ack 11"s, which do not cover send_high.)
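Examples (a) and (b) reduce to one comparison on the third duplicate ACK. A Python sketch, assuming segment numbers stand in for sequence numbers (as in the figures) and that "cover more than send_high" means the cumulative ACK is greater than send_high; SendHighGuard and its methods are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class SendHighGuard:
    snd_nxt: int     # next sequence number to be sent
    send_high: int   # initialised to the initial sequence number

    def on_retransmit_timeout(self) -> None:
        # Record the highest sequence number transmitted so far.
        self.send_high = self.snd_nxt - 1

    def third_dupack_starts_fast_retransmit(self, cum_ack: int, in_recovery: bool) -> bool:
        # Enter Fast Retransmit only if the duplicate ACKs cover more than send_high.
        return not in_recovery and cum_ack > self.send_high

g = SendHighGuard(snd_nxt=13, send_high=0)
g.on_retransmit_timeout()
print(g.send_high)                                       # 12
print(g.third_dupack_starts_fast_retransmit(13, False))  # True: example (a)
print(g.third_dupack_starts_fast_retransmit(11, False))  # False: example (b), do nothing
```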
Increase IW (Initial Window) size

Limited to 2 segments (RFC 2581)
– Standards-track RFC (not experimental)

Upper bound for IW is given more precisely:
– IW = min(4*MSS, max(2*MSS, 4380 bytes))

4 segments (RFC 2416) – informational
– A simple experiment with only 3 buffers leading
into a 9600 baud modem at the receiver
– No significant degradation of performance even
when the IW size is 4
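Evaluating the bound for a few MSS values shows that it always allows between 2 and 4 segments. A one-line Python sketch (initial_window is a hypothetical name; all values are in bytes):

```python
def initial_window(mss: int) -> int:
    """IW = min(4*MSS, max(2*MSS, 4380 bytes)), in bytes."""
    return min(4 * mss, max(2 * mss, 4380))

for mss in (536, 1460, 2190, 4000):
    iw = initial_window(mss)
    print(mss, iw, iw // mss)  # MSS, IW in bytes, IW in whole segments
# 536 2144 4
# 1460 4380 3
# 2190 4380 2
# 4000 8000 2
```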
Advantages/disadvantages of
larger IW

Advantages
– With an IW of at least two segments, the receiver will generate an
ACK after the second data segment arrives (this eliminates the wait for
the delayed-ACK timeout of ~200 msec)
– For small file sizes, delay can be improved from 3 RTTs down to 1
(e.g., email and web page transfers of less than 4 Kbytes take 1 RTT)

Disadvantages
– A burst of 4 segments (small burst) may not be "handleable" by a
router
– Slightly increased packet drop rate
Re-starting after idle connections

A known problem with the TCP congestion control algorithm:
– A potentially inappropriate burst of traffic may be transmitted after TCP
has been idle for a relatively long period of time
(a line-rate burst can occur because the source was idle, but also because of ACK losses)
– Idle time: more than one retransmission timeout (RTO)
– When TCP has not received a segment during the idle time, then
set RW (Restart Window) = IW and restart with cwnd = RW
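A Python sketch of the restart rule. It uses RW = min(IW, cwnd), which equals IW whenever cwnd has already grown past IW, and assumes idle_time is the time since the last segment was received. The function and parameter names are hypothetical.

```python
def initial_window(mss: int) -> int:
    """IW = min(4*MSS, max(2*MSS, 4380 bytes)), in bytes."""
    return min(4 * mss, max(2 * mss, 4380))

def cwnd_after_idle(cwnd: int, idle_time: float, rto: float, mss: int) -> int:
    """Before sending after an idle period longer than one RTO,
    cut cwnd to the Restart Window RW = min(IW, cwnd)."""
    if idle_time > rto:
        return min(initial_window(mss), cwnd)
    return cwnd

print(cwnd_after_idle(64_000, idle_time=3.0, rto=1.0, mss=1460))  # 4380
print(cwnd_after_idle(64_000, idle_time=0.5, rto=1.0, mss=1460))  # 64000
```

Without this reset, the first example would allow a 64,000-byte line-rate burst after the idle period.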
Summary
Studied TCP options for LFNs, many
variants of TCP such as SACK TCP and
New Reno, and the effect of increasing the
IW
ECN (Explicit Congestion Notification)
with RED (another TCP enhancement)