TCP: Transmission Control Protocol • Overview • Connection set-up and termination

TCP: Transmission Control Protocol
• Overview
• Connection set-up and termination
• Interactive
• Bulk transfer
• Timers
• Improvements
TCP: Overview
• Connection-oriented, byte-stream service
• Full- or half-duplex service
• Reliability (ARQ)
– Sliding window with a variable-sized window
– The stream is sent in segments, each carried in an IP datagram
– Sequence numbers (SN) count bytes, not segments
– The receiver buffers and reorders bytes
– Checksum covers header and data
– Duplicate data are discarded
– Flow control
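TCP: Overview
[Figure: full-duplex connection between hosts A and B; each side sends its own data and acks for the other side's data.]
Largest TCP payload in one IP datagram: 65535 - 20 (IP header) - 20 (TCP header) = 65495 bytes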
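TCP: Overview
TCP segment = IP header + TCP header + TCP data
TCP header (20 bytes without options):
• Source port # (16 bits) | Destination port # (16 bits)
• Sequence # (32 bits)
• Acknowledgement # (32 bits)
• HL, header length (4 bits) | reserved (6 bits) | flags (6 bits) | Window size (16 bits)
• Checksum (16 bits) | Urgent pointer (16 bits)
• Options, if any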
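The fixed 20-byte header above can be packed and unpacked with Python's struct module. This is a minimal sketch. It assumes no options (HL = 5 words) and the six original flag bits (URG, ACK, PSH, RST, SYN, FIN). The function names are hypothetical, and the checksum is taken as a caller-supplied value rather than computed.

```python
import struct

# Fixed TCP header in network byte order:
# src port (H), dst port (H), seq (I), ack (I),
# HL + reserved (B), reserved + flags (B), window (H), checksum (H), urgent ptr (H)
TCP_HEADER = struct.Struct("!HHIIBBHHH")

# Flag bits in the low six bits of the flags byte
FIN, SYN, RST, PSH, ACK, URG = 0x01, 0x02, 0x04, 0x08, 0x10, 0x20

def pack_tcp_header(src, dst, seq, ack, flags, window, checksum=0, urgent=0):
    hl_words = 5                      # header length in 32-bit words (20 bytes, no options)
    offset_byte = hl_words << 4       # HL in the top 4 bits, reserved bits left at zero
    flags_byte = flags & 0x3F         # keep only the six flag bits
    return TCP_HEADER.pack(src, dst, seq, ack, offset_byte, flags_byte,
                           window, checksum, urgent)

def parse_tcp_header(data):
    (src, dst, seq, ack, offset_byte, flags_byte,
     window, checksum, urgent) = TCP_HEADER.unpack_from(data)
    return {
        "src": src, "dst": dst, "seq": seq, "ack": ack,
        "header_len": (offset_byte >> 4) * 4,   # convert words to bytes
        "flags": flags_byte & 0x3F,
        "window": window, "checksum": checksum, "urgent": urgent,
    }

hdr = pack_tcp_header(1234, 80, seq=1000, ack=0, flags=SYN, window=65535)
print(len(hdr), parse_tcp_header(hdr)["flags"] == SYN)   # 20 True
```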
TCP: Flags
• URG: The urgent pointer is used
• ACK: The acknowledgement number is valid
• PSH: The receiver should pass this data to the
application as soon as possible
• RST: Reset the connection
• SYN: Synchronize sequence numbers to initiate
a connection.
• FIN: The sender is finished sending data
TCP: Flags
SYN
• Set when starting a TCP connection
• Sequence # = initial sequence number (ISN)
URG
• The urgent pointer gives the byte offset at which urgent data are to be found
• SN + urgent pointer = last byte of urgent data
Options
• Example: maximum segment size (MSS), the largest segment each end wants to receive
• Default = 536-byte payload + 20 bytes; SUN: 1500
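A host usually derives the MSS it advertises from the MTU of its outgoing link by subtracting the IP and TCP headers. A small sketch, assuming neither header carries options (the function name is hypothetical):

```python
IP_HEADER = 20   # bytes, no IP options
TCP_HEADER = 20  # bytes, no TCP options

def mss_for_mtu(mtu: int) -> int:
    """Largest TCP payload that fits in one IP datagram of `mtu` bytes."""
    return mtu - IP_HEADER - TCP_HEADER

print(mss_for_mtu(576))    # 536: the default MSS
print(mss_for_mtu(1500))   # 1460: Ethernet MTU
print(mss_for_mtu(65535))  # 65495: largest possible IP datagram
```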
TCP: Set-Up (three-way handshake; the connection is full duplex)
1. A -> B: SYN=1, ACK=0, MSS, SN = ISN of A
2. B -> A: SYN=1, ACK=1 (ack of A's SYN), MSS, SN = ISN of B
3. A -> B: ACK (ack of B's SYN)
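The numbers in the three segments follow one rule. A SYN occupies one sequence number, so each side acknowledges the peer's ISN + 1, and arithmetic is modulo 2^32. The sketch below only computes those numbers. The ISNs are arbitrary example values, not generated the way a real stack would choose them, and `Segment` is a hypothetical helper type.

```python
from dataclasses import dataclass
from typing import List, Optional

MOD = 2 ** 32  # sequence numbers are 32 bits and wrap around

@dataclass
class Segment:
    sender: str
    flags: str
    seq: int
    ack: Optional[int]  # None when the ACK flag is not set

def three_way_handshake(isn_a: int, isn_b: int) -> List[Segment]:
    # The SYN consumes one sequence number, so the next expected byte is ISN + 1
    return [
        Segment("A", "SYN", isn_a, None),                              # SYN=1, ACK=0
        Segment("B", "SYN+ACK", isn_b, (isn_a + 1) % MOD),             # acks A's SYN
        Segment("A", "ACK", (isn_a + 1) % MOD, (isn_b + 1) % MOD),     # acks B's SYN
    ]

for seg in three_way_handshake(isn_a=1000, isn_b=5000):
    print(seg)
```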
TCP: Termination
1. A -> B: FIN
2. B -> A: ack of FIN
3. B -> A: FIN
4. A -> B: ack of FIN
Both sides close.
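TCP: Termination, Half Close
1. A's application calls shutdown: A -> B: FIN
2. B -> A: ack of FIN; B delivers EOF to its application
3. B's application keeps writing: B -> A: data; A -> B: ack of data
4. B's application calls shutdown: B -> A: FIN; A delivers EOF to its application
5. A -> B: ack of FIN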
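A half close can be tried on a real TCP connection over the loopback interface with Python's socket module. `shutdown(SHUT_WR)` sends the FIN, the peer's `recv` then returns b"" (EOF), and the peer can still write data back. This is a sketch that assumes 127.0.0.1 is available; the helper names are hypothetical.

```python
import socket

def recv_until_eof(sock: socket.socket) -> bytes:
    chunks = []
    while True:
        chunk = sock.recv(4096)
        if not chunk:              # b"" means the peer's FIN has arrived (EOF)
            return b"".join(chunks)
        chunks.append(chunk)

def half_close_demo() -> bytes:
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
        listener.listen(1)
        client = socket.create_connection(listener.getsockname())   # host A
        server, _ = listener.accept()                                # host B
        with client, server:
            client.sendall(b"request")
            client.shutdown(socket.SHUT_WR)        # A: appl shutdown -> FIN
            request = recv_until_eof(server)       # B: deliver EOF to appl
            server.sendall(b"reply to " + request) # B: appl still writes data
            server.shutdown(socket.SHUT_WR)        # B: appl shutdown -> FIN
            return recv_until_eof(client)          # A: reads the data, then EOF

print(half_close_demo())  # b'reply to request'
```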
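TCP
• Reset (RST) segments: sent whenever a segment arrives that doesn't appear correct for the referenced connection
– To indicate a wrong port
– To indicate an abortive release
• Window scale option
– The window size field is only 16 bits (at most 65535 bytes)
– On a T3 line (44.736 Mbps), sending 65535 bytes takes about 12 ms
– If RTT = 50 ms, the sender sits idle for most of each round trip
– Fix: the window value is left-shifted by up to 14 bits, by agreement (negotiated in the SYN segments)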
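The T3 numbers above can be checked directly. The sketch computes the time to send one unscaled window, the throughput cap with a 50 ms RTT, the window needed to fill the line, and the smallest shift count (at most 14) that covers it. The variable names are illustrative.

```python
T3_BPS = 44.736e6   # T3 line rate, bits/s
RTT = 0.050         # round-trip time, s
MAX_WINDOW = 65535  # largest value of the 16-bit window field, bytes

# Time to put one full unscaled window onto the T3 line
send_time = MAX_WINDOW * 8 / T3_BPS            # about 11.7 ms

# With a 64 KB window, at most one window is delivered per RTT
capped_bps = MAX_WINDOW * 8 / RTT              # about 10.5 Mbit/s

# Window needed to keep the line busy for a whole RTT (bandwidth x delay)
needed_bytes = T3_BPS * RTT / 8                # 279,600 bytes

# Smallest left shift (0..14) such that 65535 << shift covers the need
shift = next(s for s in range(15) if MAX_WINDOW << s >= needed_bytes)

print(f"{send_time * 1e3:.1f} ms, {capped_bps / 1e6:.1f} Mbit/s, "
      f"need {needed_bytes:.0f} B, shift = {shift}")
```

This prints "11.7 ms, 10.5 Mbit/s, need 279600 B, shift = 3". Without scaling, the link is used at well under a quarter of its rate.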
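TCP: Interactive Data Flow
For each keystroke (remote echo, as in telnet or rlogin):
1. Client -> server: data byte (PSH=1)
2. Server -> client: ack of data byte
3. Server -> client: echo of data byte
4. Client -> server: ack of echoed byte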
TCP: Interactive Data Flow
Telnet and rlogin carry small chunks of data, typically 10 bytes or less:
• IP header = 20 bytes
• TCP header = 20 bytes
• data = 1 byte
Inefficient! 41 bytes are sent to deliver 1 byte of data.
Nagle algorithm: allow only one outstanding (unacknowledged) small segment; in the meantime, bytes are collected. This behaves like stop and wait.
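To see how Nagle's rule batches keystrokes, here is a small discrete-time sketch. It assumes a constant RTT, one byte per application write, and no loss; the function name is hypothetical. Times are integer milliseconds to keep the comparisons exact.

```python
from typing import List, Tuple

def nagle_segments(arrivals: List[int], rtt: int) -> List[Tuple[int, int]]:
    """Simulate Nagle's rule for one-byte writes at the given times (ms, sorted).
    Returns (send_time_ms, n_bytes) for every segment sent."""
    segments = []
    buffered = 0     # bytes collected while a segment is outstanding
    ack_due = None   # time the outstanding segment's ack returns, if any
    i = 0
    while i < len(arrivals) or buffered:
        next_byte = arrivals[i] if i < len(arrivals) else float("inf")
        if ack_due is not None and ack_due <= next_byte:
            # The ack arrives first (ties go to the ack): flush the collected bytes
            t, ack_due = ack_due, None
            if buffered:
                segments.append((t, buffered))
                buffered = 0
                ack_due = t + rtt
        else:
            i += 1
            if ack_due is None:
                segments.append((next_byte, 1))   # nothing outstanding: send at once
                ack_due = next_byte + rtt
            else:
                buffered += 1                     # segment outstanding: collect
    return segments

# Ten keystrokes 10 ms apart over a path with a 50 ms RTT
print(nagle_segments(list(range(0, 100, 10)), rtt=50))
# [(0, 1), (50, 4), (100, 5)]
```

The ten bytes leave in three segments instead of ten. Each new segment waits for the previous ack, which is the stop-and-wait behavior noted above.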
TCP: Interactive Data Flow
Delayed ACKs: acks are delayed by about 200 ms, so they can be accumulated and piggybacked on a data segment (e.g., the echo) instead of being sent alone.
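TCP: Interactive Data Flow
[Figure: with Nagle, incoming bytes are collected while one data segment is outstanding; when its ack returns, the collected bytes go out as the next data segment.]
Self clocking: the data rate follows the rate at which acks return (it is inversely proportional to the round trip time).
Nagle algorithm: default in telnet and rlogin. What about in X Windows?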
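Applications whose small messages must go out immediately, such as X Windows mouse-motion events, usually switch Nagle off per socket with the standard TCP_NODELAY option. A minimal sketch with Python's socket module; the helper name is hypothetical.

```python
import socket

def nodelay_enabled() -> bool:
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # TCP_NODELAY = 1 disables the Nagle algorithm on this socket:
        # small writes are sent at once instead of being collected.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0

print(nodelay_enabled())
```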
TCP: Bulk Data Flow
• TCP is a sliding window protocol
• How big should the window be?
– The bigger the window, the higher the throughput
– But not too big, since it will swamp resources and cause packet loss
– Bandwidth delay product:
capacity (bits) = bw (bits/s) x round trip time (s)
• Start from a small window
• If under the available bw, increase the window size (probing)
• If over the available bw (packets are lost), decrease the window size (backoff)
Bulk Transfer: Dynamic Sliding Window
Offered window:
• Advertised by the receiver in each segment (window size field)
• Amount of buffer space at the receiver
Congestion window (cwnd):
• Set by the sender
• Local to the sender
• Dynamically adjusted to optimize performance
TCP windows
[Figure: bytes 1-11 of the stream]
• 1-3: sent and acked
• 4-6: sent but not acked
• 7-9: can send asap (usable window)
• Offered window (from the receiver): bytes 4-9
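TCP windows
Actually, the sender's window is min{offered window from receiver, cwnd}; the usable window is the part of it that has not been sent yet.
[Figure: bytes 1-11; 1-3 sent and acked, 4-6 sent but not acked, 7-9 can send asap (usable window).]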
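Combining the two windows, the usable window is min{offered window, cwnd} minus the bytes already sent but not yet acked. A sketch in byte counts; the function and parameter names are hypothetical, and the example loosely follows the figure (bytes 1-3 acked, 4-6 in flight).

```python
def usable_window(offered: int, cwnd: int, last_acked: int, last_sent: int) -> int:
    """Bytes that may be sent right now.

    last_acked: highest byte number acknowledged so far
    last_sent:  highest byte number sent so far
    """
    in_flight = last_sent - last_acked          # sent but not acked
    return max(0, min(offered, cwnd) - in_flight)

# Bytes 1-3 acked, 4-6 in flight, offered window of 6 bytes (bytes 4-9)
print(usable_window(offered=6, cwnd=100, last_acked=3, last_sent=6))  # 3 -> bytes 7-9
print(usable_window(offered=6, cwnd=4, last_acked=3, last_sent=6))    # 1 -> cwnd limits
```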
Bulk Transfer: cwnd
Congestion window (cwnd) is dynamically adjusted to optimize performance.
Slow Start:
F = # bytes in a frame (segment), set by the receiver
• Initially, cwnd = F
• Each time an ack is received, cwnd = cwnd + F
Self-clocking: acks are generated at the same rate as data segments are received, with the same spacing in time.
Bulk Transfer: cwnd
[Figure: slow start timeline. cwnd = 1: send segment 1; ack1 -> cwnd = 2: send 2-3; ack2, ack3 -> cwnd = 4: send 4-7.]
Doubling every RTT!
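The per-ack rule (cwnd = cwnd + F) is what produces doubling per round trip. The sketch counts cwnd in segments (units of F) and assumes one ack per segment and no loss. The function name is hypothetical.

```python
from typing import List

def slow_start(rtts: int, cwnd: int = 1) -> List[int]:
    """cwnd (in segments) at the start of each round trip under slow start."""
    history = []
    for _ in range(rtts):
        history.append(cwnd)
        acks = cwnd            # one ack comes back for each segment sent this RTT
        for _ in range(acks):
            cwnd += 1          # slow start: +1 segment per ack
    return history

print(slow_start(5))  # [1, 2, 4, 8, 16]
```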
Bandwidth Delay Product
Recall for a sliding window protocol:
Throughput R = min{C, W/T}
• W = window size (for stop and wait, this used to be 1)
• T = round trip time
• C = bandwidth
We want R = C, thus W/T = C:
W = C x T = bandwidth x delay
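A quick numeric check of R = min{C, W/T} and W = C x T. The 10 Mbit/s link and 100 ms RTT are example values, not from the text.

```python
def throughput(window_bits: float, rtt_s: float, capacity_bps: float) -> float:
    """Sliding-window throughput R = min{C, W/T}, in bits/s."""
    return min(capacity_bps, window_bits / rtt_s)

def bdp_bits(capacity_bps: float, rtt_s: float) -> float:
    """Window needed for R = C: W = C x T, in bits."""
    return capacity_bps * rtt_s

C = 10e6   # 10 Mbit/s link (example value)
T = 0.1    # 100 ms round trip time (example value)

W = bdp_bits(C, T)
print(f"{W:.0f} bits = {W / 8:.0f} bytes")                            # 1000000 bits = 125000 bytes
print(f"{throughput(65535 * 8, T, C) / 1e6:.2f} Mbit/s with 64 KB")  # 5.24 Mbit/s with 64 KB
```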
TCP Timeout and Retransmission
Each segment of data has a retransmission timer:
• It is initialized with the retransmission timeout (RTO) value
• When the timer expires, a timeout occurs and the data is retransmitted
• If a retransmission fails, the timeout doubles, i.e., exponential backoff
It's important to find a good RTO value.
TCP RTO
RTO = R x b
• R = round trip time (RTT) estimate
• b = delay variance factor, recommended to be 2
Original round trip time measurement:
• Update: R = a x R + (1 - a) x M
• a = smoothing fraction, recommended value 0.9
• M = measured RTT
Not a good estimator, because RTT measurements have high variance.
Jacobson RTO estimate
• M = measured RTT
• A = averaged estimate of RTT
• Err = M - A
• D = averaged |Err| value
• A = A + g x Err, with g = 1/8
• D = D + h x (|Err| - D), with h = 1/4
• RTO = A + 4D
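The update rules translate almost line for line into code. The initial values (A = first sample, D = first sample / 2) are an assumption borrowed from the common RFC 6298 initialization; they are not in the slide. The exponential backoff helper is likewise illustrative, and both names are hypothetical.

```python
class JacobsonRTO:
    """Jacobson RTT estimator with g = 1/8 and h = 1/4.
    Assumed initialization: A = first sample, D = first sample / 2."""
    G = 1 / 8
    H = 1 / 4

    def __init__(self, first_rtt: float):
        self.a = first_rtt        # A: averaged RTT
        self.d = first_rtt / 2    # D: averaged |Err|

    def update(self, m: float) -> float:
        err = m - self.a                       # Err = M - A
        self.a += self.G * err                 # A = A + g * Err
        self.d += self.H * (abs(err) - self.d) # D = D + h * (|Err| - D)
        return self.rto()

    def rto(self) -> float:
        return self.a + 4 * self.d             # RTO = A + 4D

def backed_off(rto: float, timeouts: int) -> float:
    """Exponential backoff: the timeout doubles after each failed retransmission."""
    return rto * 2 ** timeouts

est = JacobsonRTO(first_rtt=1.0)
print(est.rto())            # 3.0
print(est.update(1.0))      # 2.5
print(est.update(2.0))      # 3.25
print(backed_off(3.25, 2))  # 13.0
```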
Congestion Avoidance
• Variables:
– cwnd = current window
– ssthresh = estimate of the "best" window, i.e., the largest window that won't cause loss
• Packet loss is indicated by a timeout or by the receipt of duplicate acks (3)
1. Initialization: cwnd = one segment (in bytes); ssthresh = 65535 bytes
Congestion Avoidance
2. When congestion is detected (packet loss, i.e., a timeout or duplicate acks):
• ssthresh = max{current window/2, 2 segments}
• Additionally, on a timeout, cwnd = 1 segment (begin slow start)
3. When new data is acknowledged:
• if cwnd <= ssthresh, do slow start
• if cwnd > ssthresh, do congestion avoidance
Congestion Avoidance
• Slow Start:
• Upon receiving ack, cwnd++
• Exponentially increasing
• Congestion Avoidance:
• Upon receiving ack, cwnd += 1/cwnd
• Linearly increasing
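Example (cwnd in segments): with cwnd = 100, 100 acks raise cwnd to about 101; the next 101 acks raise it to about 102, i.e., about 1 segment per RTT.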
Algorithm | When to run | Window growth | Rate of growth
Slow start | cwnd <= ssthresh | by 1 segment per ACK received | Exponential
Congestion avoidance | cwnd > ssthresh | by 1/cwnd per ACK received | Additive (linear)
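The table's two rules can be combined into one per-ack update. The sketch counts cwnd in segments and uses ssthresh = 8 as an example value. It assumes no loss and one ack per full segment in flight, and the function names are hypothetical.

```python
from typing import List

def on_ack(cwnd: float, ssthresh: float) -> float:
    if cwnd <= ssthresh:
        return cwnd + 1          # slow start: +1 segment per ack
    return cwnd + 1 / cwnd       # congestion avoidance: about +1 segment per RTT

def cwnd_per_rtt(rtts: int, cwnd: float = 1.0, ssthresh: float = 8.0) -> List[float]:
    """cwnd at the start of each round trip."""
    history = []
    for _ in range(rtts):
        history.append(cwnd)
        for _ in range(int(cwnd)):   # one ack per full segment in flight
            cwnd = on_ack(cwnd, ssthresh)
    return history

print([round(c, 2) for c in cwnd_per_rtt(7)])
```

The first four values are 1, 2, 4, 8 (exponential growth). After cwnd passes ssthresh, each later value is only about one segment larger than the one before.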
Fast Retransmit and Fast Recovery
• Three or more duplicate acks: a packet has likely been lost. Should we wait for the RTO?
• One or two duplicate acks: the segments were probably just reordered.
• Recovery continues with congestion avoidance rather than slow start.
Fast Retransmit and Fast Recovery
Fast retransmit: avoid timeouts and slow start.
1. When the third duplicate ack is received:
• set ssthresh = current window/2
• retransmit the missing segment
• set cwnd = ssthresh + 3 x segment size (no slow start; stay in congestion avoidance)
2. Each time another duplicate ack arrives:
• cwnd = cwnd + segment size
• transmit a packet if the window now covers new packets
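[Figure: window over segments 1-11. The window is stuck at segment 3 (lost); segments 3-6 are sent but not acked; as duplicate acks inflate cwnd, segments 7 onward can be sent asap.]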
3. When the ack that acknowledges new data arrives:
• cwnd = ssthresh (deflate the window; continue with congestion avoidance)
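The three steps can be sketched as a small state machine with windows counted in segments. Two assumptions are made here: "current window" is taken to be cwnd (the receiver's window never limits), and the max{., 2 segments} floor from the congestion-avoidance rules is applied. The class is hypothetical and only counts retransmissions; it does not send anything.

```python
class RenoSender:
    """Toy Reno fast retransmit / fast recovery; windows in segments."""

    def __init__(self, cwnd: float, ssthresh: float):
        self.cwnd = cwnd
        self.ssthresh = ssthresh
        self.dup_acks = 0
        self.in_recovery = False
        self.retransmissions = 0

    def on_dup_ack(self) -> None:
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.ssthresh = max(self.cwnd / 2, 2)  # step 1: halve the window
            self.retransmissions += 1              # retransmit the missing segment
            self.cwnd = self.ssthresh + 3          # 3 segments have left the network
            self.in_recovery = True
        elif self.dup_acks > 3 and self.in_recovery:
            self.cwnd += 1                         # step 2: +1 segment per dup ack

    def on_new_ack(self) -> None:
        self.dup_acks = 0
        if self.in_recovery:
            self.cwnd = self.ssthresh              # step 3: deflate, then congestion avoidance
            self.in_recovery = False
        elif self.cwnd <= self.ssthresh:
            self.cwnd += 1                         # slow start
        else:
            self.cwnd += 1 / self.cwnd             # congestion avoidance

s = RenoSender(cwnd=16, ssthresh=64)
for _ in range(5):          # 3 dup acks trigger fast retransmit, 2 more inflate cwnd
    s.on_dup_ack()
print(s.ssthresh, s.cwnd)   # 8.0 13.0
s.on_new_ack()
print(s.cwnd)               # 8.0
```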
TCP
• Slow start: cwnd = 1; cwnd increases exponentially
• Congestion avoidance: once cwnd reaches ssthresh, cwnd increases linearly
TCP
[Figure: packets vs. time during fast recovery, marking the old cwnd.]
• 3 dup acks: fast retransmit of the lost packet
• ssthresh = cwnd/2, cwnd = ssthresh + 3
• cwnd increments by 1 per duplicate ack. Note: no new transmissions while cwnd <= old cwnd. Thus, old cwnd/2 packets are in the pipe.
• When an ack for new data arrives, cwnd = ssthresh -> congestion avoidance
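Additive Increase, Multiplicative Decrease (AIMD)
Helps fairness.
[Figure: two flows with rates R1 and R2 share a link of capacity C; phase plot of R2 vs. R1 with the line R1 + R2 = C, showing alternating additive-increase and multiplicative-decrease steps.]
The rates tend to converge to R1 = R2.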
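The convergence claim can be checked numerically. In the sketch, both flows add alpha each step and both halve whenever R1 + R2 exceeds C. The synchronized losses and the chosen alpha, beta, and starting rates are assumptions, and the function name is hypothetical. Additive increase leaves the gap R1 - R2 unchanged, while every multiplicative decrease halves it.

```python
from typing import Tuple

def aimd(r1: float, r2: float, capacity: float,
         alpha: float = 1.0, beta: float = 0.5, steps: int = 1000) -> Tuple[float, float]:
    for _ in range(steps):
        if r1 + r2 > capacity:         # link overloaded: both flows see loss
            r1, r2 = r1 * beta, r2 * beta
        else:                          # spare capacity: both probe upward
            r1, r2 = r1 + alpha, r2 + alpha
    return r1, r2

r1, r2 = aimd(80.0, 10.0, capacity=100.0)
print(f"R1 = {r1:.2f}, R2 = {r2:.2f}")   # the two rates end up (almost) equal
```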
TCP: Tahoe and Reno
Tahoe: slow start + congestion avoidance + fast retransmit
Reno: Tahoe + fast recovery
Predict Congestion?
Improvements: TCP Vegas
[Figure: a source with window W sends through a single bottleneck router; x = backlog at the router.]
• D = minimum round trip delay (when x = 0); T = measured round trip delay
• R = transmission rate of the source; C = transmission rate of the router
• W = packets in flight + x = RD + x
• D is measured as min{T}
• x = W - RD
Improvements: TCP Vegas
TCP Vegas: keep x at 2 for all flows (x = W - RD)
• if W - RD <= 1 then increase W
• if W - RD >= 3 then decrease W
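The Vegas rule can be tried on a toy single-bottleneck model. The model is an assumption, not from the slide: the source gets R = min(W/D, C) packets per time unit, so the backlog is x = W - RD. The example values C = 10 and D = 5 are also arbitrary. Function names are hypothetical.

```python
def vegas_update(w: int, rate: float, base_rtt: float,
                 alpha: int = 1, beta: int = 3) -> int:
    """One Vegas adjustment: x = W - R*D estimates packets queued at the bottleneck."""
    x = w - rate * base_rtt
    if x <= alpha:
        return w + 1     # too little backlog: increase W
    if x >= beta:
        return w - 1     # too much backlog: decrease W
    return w             # x is about 2: hold

def simulate(w: int, capacity: float, base_rtt: float, rounds: int = 100) -> int:
    for _ in range(rounds):
        rate = min(w / base_rtt, capacity)   # toy model of the bottleneck
        w = vegas_update(w, rate, base_rtt)
    return w

print(simulate(40, capacity=10, base_rtt=5))  # 52: C*D = 50 in flight plus x = 2
print(simulate(60, capacity=10, base_rtt=5))  # 52
```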
Leads to fair bandwidth allocation at the router.
[Figure: two sets of plots of cwnd, E(T), backlog, and traffic over time; the second set is for Vegas.]
Improvements: TCP Reno
Improvements by Janey Hoe (New Reno):
• ssthresh is initially too big: 65K
– ssthresh should be an estimate of bw x delay
• TCP Reno does not work well with multiple losses
Bandwidth estimate: send three closely spaced packets and measure the times between their acks; 1/(time between acks), in packets per second, approximates the bw of the bottleneck link.
Delay estimate: round trip time estimates
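The bandwidth and delay estimates combine into an initial ssthresh. The sketch scales 1/(ack spacing) by the packet size to get bytes/s. The ack times, 1500-byte packets, and 50 ms RTT are example values, and the function names are hypothetical.

```python
from typing import List

def bottleneck_bw(ack_times: List[float], packet_bytes: int) -> float:
    """Bytes/s estimated from the spacing of acks for back-to-back packets."""
    gaps = [b - a for a, b in zip(ack_times, ack_times[1:])]
    return packet_bytes / (sum(gaps) / len(gaps))

def initial_ssthresh(ack_times: List[float], packet_bytes: int, rtt: float) -> float:
    """ssthresh estimated as bw x delay, in bytes."""
    return bottleneck_bw(ack_times, packet_bytes) * rtt

acks = [0.100, 0.101, 0.102]   # acks for three closely spaced packets, 1 ms apart
print(f"{bottleneck_bw(acks, 1500):.0f} bytes/s")            # 1500000 bytes/s
print(f"{initial_ssthresh(acks, 1500, 0.05):.0f} bytes")    # 75000 bytes
```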
Improvements: TCP Reno
Multiple losses:
• TCP Reno retransmits one packet per RTT under fast retransmit.
(Note: a packet is retransmitted only under fast retransmit (3 dup acks) or after a timeout, with slow start.)
Improvement: fast retransmit
• Sometimes it can get stuck: the window doesn't cover new packets, and no more acks arrive.
Improved Fast Retransmit
[Figure: packets vs. time after 3 dup acks, showing the window and the max packet sent, sndMax.]
(i) ssthresh = cwnd/2; cwnd = 1 segment; save_cwnd = ssthresh + 1
(ii) Retransmit everything outstanding (up to sndMax) using slow start. Upon receiving 2 dup acks, send a new packet. This phase is over when an ack is received for sndMax.
TCP Timers
• Keep-alive timer: periodically transmit a message (no data) to see if the other end is alive. Application: telnet (the client may just turn off the PC without closing the connection)
• Persist timer: if the window goes to zero, TCP can get stuck (the receiver's window update may be lost, and acks that carry no data are not retransmitted). Window probes query the receiver to see if the window has increased (1 byte of data beyond the window).
TCP has a 500 ms timer (crude).
Timeouts have exponential backoff.
• Silly window syndrome: avoid small windows