TCP and Congestion Control October 7-9, 2003 10/7/2003-10/9/2003

Assignments
• Finish chapter 3
• Finish up your project!
TCP: Overview
• Point-to-Point
– one sender, one receiver
• Connection-oriented
– handshaking (exchange of control msgs)
• Reliable, in-order data delivery
• Pipelined
• Full duplex data
• Flow controlled
– sender will not overwhelm receiver
TCP Example
[Figure: application writes data through the socket door into the TCP send buffer; a segment carries it to the TCP receive buffer, from which the other application reads data through its socket door.]
• Application data placed in send buffer
• Data taken from send buffer and placed in segment (data + header)
– Max amount of data is the maximum segment size (MSS)
• Data placed in receive buffer
• App reads from receive buffer
• Applet
TCP Segment Structure
• Header is laid out in 32-bit words:
– source port #, dest port #
– sequence number
– acknowledgement number
– head len, not used, flag bits U A P R S F, receive window
– checksum, urg data pointer
– options (variable length)
• Application data (variable length)
• Sequence and acknowledgement numbers: counting by bytes of data (not segments!)
• Receive window: # bytes rcvr willing to accept
• URG: urgent data (generally not used)
• ACK: ACK # valid
• PSH: push data now (generally not used)
• RST, SYN, FIN: connection estab (setup, teardown commands)
• Checksum: Internet checksum (as in UDP)
Seq #’s and ACKs
• Seq #’s
– Byte stream “number” of first byte in segment’s data
– Initial seq # is randomly chosen
• ACKs
– Seq # of next byte expected from other side
– Cumulative ACK
– Piggybacking
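The byte-stream numbering above can be illustrated with a small sketch. The class and method names (SeqNumbers, segmentSeqs, ackFor) are hypothetical, the initial sequence number is passed in rather than chosen randomly so the result is reproducible, and 32-bit wraparound is ignored.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: names are hypothetical, wraparound of the 32-bit seq # is ignored.
final class SeqNumbers {
    /** Seq # of each segment when totalBytes of data are cut into segments of at most mss bytes. */
    static List<Long> segmentSeqs(long initialSeq, int totalBytes, int mss) {
        List<Long> seqs = new ArrayList<>();
        for (int offset = 0; offset < totalBytes; offset += mss) {
            // Seq # = byte-stream number of the first data byte in the segment.
            seqs.add(initialSeq + offset);
        }
        return seqs;
    }

    /** Cumulative ACK sent after an in-order segment of the given length arrives at seq. */
    static long ackFor(long seq, int length) {
        return seq + length; // seq # of the next byte expected from the other side
    }
}
```

For instance, 2500 bytes with an MSS of 1000 and an initial seq # of 0 give segments starting at 0, 1000, and 2000, and an 8-byte segment at seq # 92 is acknowledged with ACK # 100.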
[Figure: simple telnet scenario. User at Host A types ‘C’; Host B ACKs receipt of ‘C’ and echoes back ‘C’; Host A ACKs receipt of the echoed ‘C’. Axis: time.]
Round Trip Time and Timeout
• TCP uses timeout mechanism
• How can we determine the timeout value?
– Must be longer than RTT
– Too short: premature timeouts and unnecessary retransmissions
– Too long: slow reaction to segment loss
• Measure SampleRTT, but RTT varies, so a single sample is not enough
Round Trip Time and Timeout
EstimatedRTT = (1 - α)*EstimatedRTT + α*SampleRTT
• Exponential weighted moving average
• Influence of past sample decreases exponentially fast
• Typical value: α = 0.125
Example RTT Estimation
[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr. x-axis: time (seconds), 1 to 106; y-axis: RTT (milliseconds), 100 to 350. Series: SampleRTT, Estimated RTT.]
Round Trip Time and Timeout
TimeoutInterval = EstimatedRTT + 4*DevRTT
• DevRTT measures how much SampleRTT deviates from EstimatedRTT; the 4*DevRTT term adds a safety margin that grows when RTT fluctuates
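A minimal sketch of the estimator, assuming details the slides do not give: DevRTT is taken to be an exponential weighted moving average of |SampleRTT - EstimatedRTT| with weight 0.25, the first sample seeds EstimatedRTT, and the initial DevRTT is half the first sample (the conventions of RFC 6298). The class name RttEstimator is hypothetical.

```java
// Illustrative only: BETA, the seeding rule, and the update order are assumptions (RFC 6298 style).
final class RttEstimator {
    static final double ALPHA = 0.125; // weight of the newest sample (typical value from the slides)
    static final double BETA = 0.25;   // weight for DevRTT: assumed, not stated in the slides

    double estimatedRtt; // milliseconds
    double devRtt;       // milliseconds

    RttEstimator(double firstSampleMs) {
        estimatedRtt = firstSampleMs;  // assumption: seed with the first sample
        devRtt = firstSampleMs / 2;    // assumption: initial deviation
    }

    void addSample(double sampleMs) {
        // DevRTT is updated first, against the previous EstimatedRTT.
        devRtt = (1 - BETA) * devRtt + BETA * Math.abs(sampleMs - estimatedRtt);
        estimatedRtt = (1 - ALPHA) * estimatedRtt + ALPHA * sampleMs;
    }

    double timeoutInterval() {
        return estimatedRtt + 4 * devRtt;
    }
}
```

Seeding with 100 ms and then adding a 200 ms sample gives EstimatedRTT = 112.5 ms, DevRTT = 62.5 ms, and TimeoutInterval = 362.5 ms.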
TCP Reliable Data Transfer
• TCP creates reliable data transfer (rdt) service on top of IP’s unreliable service
– Reliable, in-order delivery
• Pipelined segments
• Cumulative ACKs
• TCP uses a single retransmission timer
– Retransmissions triggered by timeouts and duplicate ACKs
• Consider first a simplified sender
– Ignores duplicate ACKs, flow control, and congestion control
TCP Sender Events
• Data received from app
– Create segment with seq #
– Seq # is byte-stream number of first data byte in segment
– Start timer if not already running (think of timer as for oldest unacked segment)
– Expiration interval: TimeoutInterval
• Timeout
– Retransmit segment that caused timeout
– Restart timer
• ACK received
– If it acknowledges previously unacked segments
• update what is known to be ACKed
• start timer if there are still outstanding segments
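The three sender events can be sketched as a small state holder. This is a simplified model under stated assumptions, not a real TCP sender: the timer is a boolean flag instead of a real clock, ACKs are assumed to fall on segment boundaries, and the class name SimpleTcpSender and its fields are hypothetical.

```java
import java.util.TreeMap;

// Illustrative only: no real timer, no duplicate ACKs, no flow or congestion control.
final class SimpleTcpSender {
    long sendBase;        // seq # of the oldest unacked byte
    long nextSeqNum;      // seq # of the next byte to send
    boolean timerRunning; // stands in for the single retransmission timer
    final TreeMap<Long, Integer> unacked = new TreeMap<>(); // seq # -> segment length

    SimpleTcpSender(long initialSeq) {
        sendBase = initialSeq;
        nextSeqNum = initialSeq;
    }

    /** Event: data received from app. Returns the seq # of the new segment. */
    long onAppData(int length) {
        long seq = nextSeqNum;
        unacked.put(seq, length);
        nextSeqNum += length;
        if (!timerRunning) {
            timerRunning = true; // timer covers the oldest unacked segment
        }
        return seq;
    }

    /** Event: timeout. Only meaningful while the timer runs, so unacked is non-empty. */
    long onTimeout() {
        timerRunning = true;       // restart timer
        return unacked.firstKey(); // retransmit the oldest unacked segment
    }

    /** Event: ACK received with cumulative acknowledgement number ack. */
    void onAck(long ack) {
        if (ack > sendBase) {
            sendBase = ack;
            unacked.headMap(ack).clear();      // segments starting below ack are now ACKed
            timerRunning = !unacked.isEmpty(); // keep a timer only if data is outstanding
        }
    }
}
```

Starting at seq # 92 and sending 8 bytes and then 20 bytes produces segments at 92 and 100; a timeout retransmits the segment at 92, and a cumulative ACK of 120 moves sendBase to 120 and stops the timer.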
Retransmit Scenarios
[Figure: three timelines between Host A and Host B: lost ACK scenario, premature timeout, and cumulative ACK scenario. Labels: Seq=92 timeout, loss (X), SendBase = 100, SendBase = 120. Axis: time.]
TCP ACK Generation
Event at receiver → TCP receiver action
• Arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed
– Delayed ACK: wait up to 500 ms for next segment; if no next segment, send ACK
• Arrival of in-order segment with expected seq #; one other segment has ACK pending
– Immediately send single cumulative ACK, ACKing both in-order segments
• Arrival of out-of-order segment with higher-than-expected seq #; gap detected
– Immediately send duplicate ACK, indicating seq # of next expected byte
• Arrival of segment that partially or completely fills gap
– Immediately send ACK, provided that segment starts at lower end of gap
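The four rows above can be read as a decision function. The following is a sketch with hypothetical names (AckPolicy, Action, onSegment); it assumes the arriving segment's seq # is never below the expected seq #, a case the table does not cover.

```java
// Illustrative only: models the four table rows, nothing more.
final class AckPolicy {
    enum Action { DELAYED_ACK, IMMEDIATE_CUMULATIVE_ACK, IMMEDIATE_DUPLICATE_ACK, IMMEDIATE_ACK }

    /**
     * segSeq: seq # of the arriving segment; expectedSeq: next in-order byte expected;
     * ackPending: an earlier in-order segment is still waiting for its delayed ACK;
     * gapExists: out-of-order data is already buffered above expectedSeq.
     */
    static Action onSegment(long segSeq, long expectedSeq, boolean ackPending, boolean gapExists) {
        if (segSeq > expectedSeq) {
            return Action.IMMEDIATE_DUPLICATE_ACK;  // gap detected
        }
        if (gapExists) {
            return Action.IMMEDIATE_ACK;            // segment starts at lower end of gap
        }
        if (ackPending) {
            return Action.IMMEDIATE_CUMULATIVE_ACK; // ACKs both in-order segments
        }
        return Action.DELAYED_ACK;                  // wait up to 500 ms for the next segment
    }
}
```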
Fast Retransmit
• Time-out period often relatively long
– Long delay before resending lost packet
• Detect lost segments via duplicate ACKs
– Sender often sends many segments back-to-back
– If a segment is lost, there will likely be many duplicate ACKs
• If sender receives 3 duplicate ACKs for the same data, it supposes that the segment after the ACKed data was lost
– Fast retransmit: resend segment before timer expires
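A minimal sketch of the duplicate-ACK counter, assuming "3 duplicate ACKs" means three ACKs that repeat an ACK # already seen. The class name FastRetransmit and its fields are hypothetical.

```java
// Illustrative only: counts duplicates of the most recent ACK #.
final class FastRetransmit {
    long lastAck = -1; // most recent ACK # seen (-1: none yet)
    int dupCount = 0;  // duplicates of lastAck seen so far

    /** Returns true when this ACK should trigger a fast retransmit of the segment starting at ack. */
    boolean onAck(long ack) {
        if (ack == lastAck) {
            dupCount++;
            return dupCount == 3; // fire once, on the third duplicate
        }
        lastAck = ack;            // new data acknowledged
        dupCount = 0;
        return false;
    }
}
```

Feeding ACK # 100 five times in a row returns false, false, false, true, false: the first ACK is new, the next three are duplicates, and further duplicates do not trigger another retransmit.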
TCP Flow Control
• Need to keep sender from sending too fast for
receiver
• Match sending rate with receiver “drain” rate
Operation
RcvWindow = RcvBuffer-[LastByteRcvd - LastByteRead]
• Rcvr advertises spare room by including value of
RcvWindow in segments
• Sender limits unACKed data to RcvWindow
– guarantees receive buffer doesn’t overflow
• When RcvWindow is 0, the sender still sends 1-byte segments. Why?
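The RcvWindow formula can be traced with a small bookkeeping sketch. The class name ReceiveBuffer and its methods are hypothetical; counters are plain byte counts starting at 0.

```java
// Illustrative only: tracks the two counters used in the RcvWindow formula.
final class ReceiveBuffer {
    final long rcvBuffer; // total buffer size in bytes
    long lastByteRcvd;    // last byte placed in the buffer by TCP
    long lastByteRead;    // last byte read out by the application

    ReceiveBuffer(long rcvBuffer) {
        this.rcvBuffer = rcvBuffer;
    }

    void onSegment(int bytes) { lastByteRcvd += bytes; } // data arrives from the network
    void onAppRead(int bytes) { lastByteRead += bytes; } // application drains the buffer

    /** RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead] */
    long rcvWindow() {
        return rcvBuffer - (lastByteRcvd - lastByteRead);
    }
}
```

With a 4096-byte buffer, receiving 3000 bytes leaves RcvWindow = 1096; after the application reads 1000 bytes, RcvWindow grows to 2096.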
TCP Connection Management
• Sender and receiver establish “connection”
before exchanging data segments
• Initialize TCP variables
– seq. #s
– buffers, flow control info (e.g. RcvWindow)
• Client initiates connection
– Socket clientSocket = new Socket("hostname", portNumber);
• Server accepts connection
– Socket connectionSocket = welcomeSocket.accept();
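The two calls above can be tried end to end on one machine. The sketch below assumes a loopback connection and an ephemeral port chosen by the operating system; the class name LoopbackDemo and the method sendOneByte are hypothetical, while Socket, ServerSocket, and InetAddress are the standard java.net classes.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Illustrative only: client and server run in one process over the loopback interface.
final class LoopbackDemo {
    /** Connects to a local welcome socket, sends one byte, and returns the byte the server read. */
    static int sendOneByte(int value) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        final int[] received = {-1};
        // Port 0 asks the operating system for any free port.
        try (ServerSocket welcomeSocket = new ServerSocket(0, 1, loopback)) {
            Thread server = new Thread(() -> {
                // Server accepts the connection and reads a single byte.
                try (Socket connectionSocket = welcomeSocket.accept()) {
                    received[0] = connectionSocket.getInputStream().read();
                } catch (IOException e) {
                    // received[0] stays -1 to signal failure
                }
            });
            server.start();
            // Client initiates the connection; the handshake happens inside the constructor.
            try (Socket clientSocket = new Socket(loopback, welcomeSocket.getLocalPort())) {
                clientSocket.getOutputStream().write(value);
            }
            server.join();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return received[0];
    }
}
```

Calling LoopbackDemo.sendOneByte('C') should return 67, the byte value of 'C', provided loopback connections are permitted on the machine.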
TCP Connection Management
• Three way handshake
– Step 1: client host sends TCP SYN segment to server
• specifies initial seq #
• no data
– Step 2: server host receives SYN, replies with
SYNACK segment
• server allocates buffers
• specifies server initial seq. #
– Step 3: client receives SYNACK, replies with ACK
segment, which may contain data
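The sequence and acknowledgement numbers exchanged in the three steps can be written out explicitly. One detail below is not on the slides and comes from the TCP specification: a SYN consumes one sequence number, so each side acknowledges the other's initial seq # plus 1. The class name Handshake is hypothetical.

```java
// Illustrative only: returns the numbers carried by the three handshake segments.
final class Handshake {
    /** Rows are SYN, SYNACK, ACK; columns are {seq #, ACK #}. The SYN's ACK # is unused (0). */
    static long[][] segments(long clientIsn, long serverIsn) {
        return new long[][] {
            {clientIsn, 0},                 // Step 1: SYN carries the client's initial seq #
            {serverIsn, clientIsn + 1},     // Step 2: SYNACK carries the server's initial seq #
            {clientIsn + 1, serverIsn + 1}, // Step 3: ACK acknowledges the server's SYN
        };
    }
}
```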
Three-way Handshake
[Figure: three-way handshake timeline between client and server. Labels: connection request, connection granted, ACK.]
TCP Connection Management
• Closing a connection
– Client closes socket: clientSocket.close();
– Step 1: client end system sends TCP FIN control segment to server
– Step 2: server receives FIN, replies with ACK, closes connection, and sends its own FIN
– Step 3: client receives FIN, replies with ACK, and enters “timed wait”, during which it responds with ACK to any received FINs
– Step 4: server receives ACK; connection closed
Teardown
[Figure: teardown timeline between client and server. Labels: close, close, timed wait, closed.]
Principles of Congestion Control
• Congestion
– informally: “too many sources sending too
much data too fast for network to handle”
– different from flow control!
• Manifestations
– lost packets (buffer overflow at routers)
– long delays (queueing in router buffers)
Scenario 1: Queuing Delays
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• maximum achievable throughput
[Figure: Host A and Host B send into a router with unlimited shared output link buffers. Labels: λin (original data), λout.]
Scenario 2: Retransmits
• One router, finite buffers
• Sender retransmission of lost packet
– Original packet dropped
– Original packet delayed
[Figure: Host A and Host B send into a router with finite shared output link buffers. Labels: λin (original data), λ'in (original data, plus retransmitted data), λout.]
Scenario 3: Congestion Near Receiver
• four senders
• multihop paths
• timeout/retransmit
[Figure: Host A and Host B on multihop paths through routers with finite shared output link buffers. Labels: λin (original data), λ'in (original data, plus retransmitted data), λout.]
Approaches
• End-end congestion control
– no explicit feedback from network
– congestion inferred from end-system observed loss, delay
– approach taken by TCP
• Network-assisted congestion control
– routers provide feedback to end systems
• single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
• explicit rate at which sender should send
TCP Congestion Control
• End-end control (no network assistance)
• Sender limits transmission
LastByteSent - LastByteAcked ≤ CongWin
• CongWin is dynamic, a function of perceived network congestion
• Distinct from RcvWindow, which the receiver sets for flow control
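A sketch of how the two windows might combine at the sender. Taking the smaller of CongWin and RcvWindow is an assumption beyond this slide, which states only the CongWin limit; the class name SendLimit and the method allowedBytes are hypothetical.

```java
// Illustrative only: combines the congestion window with the flow-control window.
final class SendLimit {
    /** Bytes the sender may still transmit given the data already in flight. */
    static long allowedBytes(long lastByteSent, long lastByteAcked, long congWin, long rcvWindow) {
        long window = Math.min(congWin, rcvWindow);   // assumption: the tighter limit applies
        long inFlight = lastByteSent - lastByteAcked; // unACKed data
        return Math.max(0, window - inFlight);
    }
}
```

With 3000 bytes in flight, CongWin = 8000, and RcvWindow = 4096, the sender may transmit 1096 more bytes; here the receiver's window, not congestion, is the limit.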
TCP Congestion Control
• How does sender perceive congestion?
– loss event = timeout or 3 duplicate acks
– Is this always a good measure?
– TCP sender reduces rate (CongWin) after
loss event
• Three components of CC algorithm
– AIMD
– slow start
– conservative after timeout events
TCP AIMD – Congestion Avoidance
• Additive increase
– Increase CongWin by 1 MSS every RTT in the absence of loss events: probing
• Multiplicative decrease
– Cut CongWin in half after loss event
[Figure: congestion window of a long-lived TCP connection versus time; y-axis marks at 8, 16, and 24 Kbytes.]
TCP Slow Start
• When connection begins, CongWin = 1 MSS
– Example: MSS = 500 bytes & RTT = 200 msec
– initial rate = 20 kbps
• Available bandwidth may be >> MSS/RTT
– desirable to quickly ramp up to respectable rate
• When connection begins, increase rate exponentially fast until first loss event
– Double CongWin every RTT until a loss
– After a loss, cut CongWin in half and increase linearly
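The arithmetic behind the example can be checked with a short sketch: one window of CongWin bytes is sent per RTT, so the rate is roughly CongWin/RTT. The class name SlowStartRate and its methods are hypothetical, and integer arithmetic is used to keep the results exact.

```java
// Illustrative only: rate and doubling arithmetic for slow start.
final class SlowStartRate {
    /** Approximate sending rate in bits per second when congWinBytes are sent every rttMillis. */
    static long rateBps(long congWinBytes, long rttMillis) {
        return congWinBytes * 8 * 1000 / rttMillis;
    }

    /** Number of loss-free RTTs of doubling for CongWin, starting at 1 MSS, to reach targetBytes. */
    static int rttsToReach(long mssBytes, long targetBytes) {
        int rtts = 0;
        for (long w = mssBytes; w < targetBytes; w *= 2) {
            rtts++;
        }
        return rtts;
    }
}
```

With MSS = 500 bytes and RTT = 200 msec, the initial rate is 500 * 8 / 0.2 = 20,000 bits per second, the 20 kbps on the slide. Doubling reaches 8000 bytes after 4 RTTs, since 500 * 2^4 = 8000.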
Slow Start Example
• Double CongWin every RTT
– Done by incrementing CongWin by 1 MSS for every ACK received
[Figure: slow-start timeline between Host A and Host B with one RTT marked. Axis: time.]
Timeout Events
• After 3 dup ACKs
– CongWin is cut in half
– window then grows linearly
• But, after timeout event
– CongWin instead set to 1 MSS
– window then grows exponentially to a threshold (1/2
previous CongWin size), then grows linearly
• Idea: 3 dup ACKs indicate that the network is still capable of delivering some segments
– A timeout before 3 dup ACKs is “more alarming”
Threshold
• Threshold
– Starts at 65K
– Set to ½ CongWin when loss event occurs
• TCP Tahoe vs TCP Reno
[Figure: congestion window size (segments, 0 to 14) versus transmission round (1 to 15); two series, Series1 and Series2.]
Summary: TCP Congestion Control
• When CongWin is below Threshold, sender in
slow-start phase, window grows exponentially
• When CongWin is above Threshold, sender is in
congestion-avoidance phase, window grows
linearly
• When a triple duplicate ACK occurs, Threshold
set to CongWin/2 and CongWin set to Threshold
• When timeout occurs, Threshold set to
CongWin/2 and CongWin is set to 1 MSS
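The four rules in this summary can be sketched as a small state holder that advances one RTT at a time. Several choices are assumptions: growth is modeled per RTT rather than per ACK, slow start is capped at Threshold so the window does not overshoot it, CongWin equal to Threshold is treated as congestion avoidance, and the class name RenoWindow is hypothetical.

```java
// Illustrative only: per-RTT model of CongWin and Threshold, sizes in bytes.
final class RenoWindow {
    final int mss;
    int congWin;
    int threshold;

    RenoWindow(int mss, int initialThreshold) {
        this.mss = mss;
        this.congWin = mss; // connection begins with CongWin = 1 MSS
        this.threshold = initialThreshold;
    }

    /** One RTT without loss: exponential growth below Threshold, linear growth otherwise. */
    void onRttWithoutLoss() {
        if (congWin < threshold) {
            congWin = Math.min(2 * congWin, threshold); // slow start (cap is an assumption)
        } else {
            congWin += mss;                             // congestion avoidance
        }
    }

    /** Triple duplicate ACK: Threshold = CongWin/2, CongWin = Threshold. */
    void onTripleDupAck() {
        threshold = congWin / 2;
        congWin = threshold;
    }

    /** Timeout: Threshold = CongWin/2, CongWin = 1 MSS. */
    void onTimeout() {
        threshold = congWin / 2;
        congWin = mss;
    }
}
```

Starting with MSS = 1000 and Threshold = 8000, four loss-free RTTs take CongWin through 2000, 4000, 8000, and 9000. A triple duplicate ACK then sets both Threshold and CongWin to 4500, and a later timeout sets Threshold to 2250 and CongWin back to 1000.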