CS352-Internet Technology


TCP and UDP
The Internet Transport Layer

Two transport layer protocols supported by
the Internet:

Reliable:
The Transmission Control Protocol (TCP)

Unreliable:
The User Datagram Protocol (UDP)
2
UDP


UDP is an unreliable transport protocol that
can be used in the Internet
UDP does not provide:




connection management
flow or error control
guaranteed in-order packet delivery
UDP is almost a “null” transport layer
3
Why UDP?




No connection needs to be set up
Throughput may be higher because UDP packets
are easier to process, especially at the source
The user doesn’t care if the data is transmitted
reliably
The user wants to implement his or her own
transport protocol
4
UDP Frame Format
Each row is 32 bits wide:
Source Port | Destination Port
UDP length | UDP checksum (optional)
Data
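The frame layout above can be sketched with Python's standard `struct` module. This is a minimal illustration, not a full UDP implementation: the helper names `build_udp_datagram` and `parse_udp_datagram` are hypothetical, and the checksum is left at 0 here, which in UDP over IPv4 means "not computed".

```python
import struct

# Four 16-bit fields in network byte order:
# source port, destination port, UDP length, UDP checksum.
UDP_HEADER = struct.Struct("!HHHH")


def build_udp_datagram(src_port, dst_port, payload, checksum=0):
    """Prepend an 8-byte UDP header to the payload (hypothetical helper)."""
    length = UDP_HEADER.size + len(payload)  # length counts header plus data
    return UDP_HEADER.pack(src_port, dst_port, length, checksum) + payload


def parse_udp_datagram(datagram):
    """Split a datagram into header fields and payload (hypothetical helper)."""
    src_port, dst_port, length, checksum = UDP_HEADER.unpack_from(datagram)
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "length": length,
        "checksum": checksum,
        "payload": datagram[UDP_HEADER.size:length],
    }
```

For example, `parse_udp_datagram(build_udp_datagram(4706, 7, b"A"))` recovers the two ports, a length of 9 bytes, and the one-byte payload.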
5
UDP checksum
Goal: detect “errors” (e.g., flipped bits) in transmitted
segment
Sender:



treat segment contents as
sequence of 16-bit integers
checksum: 1’s complement of
(1’s complement sum of
segment contents)
sender puts checksum value
into UDP checksum field
Receiver:


compute checksum of received
segment
check if computed checksum
equals checksum field value:


NO - error detected
YES - no error detected.
But maybe errors
nonetheless? More later
….
6
Internet Checksum Example

Note
 When adding numbers, a carryout from the most
significant bit needs to be added to the result

Example: add two 16-bit integers
First integer:             1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0
Second integer:            1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
Wraparound the carry:    1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1
Sum:                       1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0
Checksum (complement):     0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1
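The sender and receiver steps above can be sketched in a few lines of Python. This is a minimal illustration that works on a list of 16-bit integers; the function names are hypothetical, and a real UDP checksum also covers a pseudo-header that is not shown here.

```python
def ones_complement_sum(words):
    """Add 16-bit words, wrapping any carry out of the top bit back in."""
    total = 0
    for word in words:
        total += word
        # Fold the carry-out (bit 16) back into the low 16 bits.
        total = (total & 0xFFFF) + (total >> 16)
    return total


def internet_checksum(words):
    """1's complement of the 1's complement sum of the words."""
    return ~ones_complement_sum(words) & 0xFFFF


def receiver_accepts(words, checksum_field):
    """Receiver side: recompute the checksum and compare with the field."""
    return internet_checksum(words) == checksum_field
```

Applied to the two integers in the example, `ones_complement_sum` gives `0b1011101110111100` and `internet_checksum` gives `0b0100010001000011`, matching the rows labelled Sum and Checksum.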
7
TCP

TCP provides the end-to-end reliable
connection that IP alone cannot support

The protocol





Frame format
Connection management
Retransmission
Flow control
Congestion control
8
TCP Frame Format
Each row is 32 bits wide:
Source Port | Destination Port
Sequence Number
Acknowledgement number
HL | URG ACK PSH RST SYN FIN | Window Size
Checksum | Urgent Pointer
Options (0 or more 32-bit words)
Data
9
TCP Frame Fields

Source & Destination Ports
  16 bit port identifiers for each packet
Sequence number
  The sequence number of the first data byte in the packet
Acknowledgement number
  The sequence number of the next byte expected by the receiver
10
TCP Frame Fields (cont’d)

Window size
  Specifies how many bytes may be sent after the first acknowledged byte
Checksum
  Covers the TCP header, the data, and the IP address fields (through a pseudo-header)
Urgent Pointer
  Points to urgent data in the TCP data field
11
TCP Frame Fields

(cont’d)
Header bits






URG = Urgent pointer field in use
ACK = Indicates whether frame contains
acknowledgement
PSH = Data has been “pushed”. It should be
delivered to higher layers right away.
RST = Indicates that the connection should be
reset
SYN = Used to establish connections
FIN = Used to release a connection
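As a small illustration of the header bits listed above, the sketch below decodes the six flag bits from an integer. The bit positions are the standard ones (FIN is the least significant bit, URG the most significant of the six); the function name `decode_flags` is hypothetical.

```python
# Standard bit masks for the six TCP header flags.
TCP_FLAGS = (
    ("FIN", 0x01),
    ("SYN", 0x02),
    ("RST", 0x04),
    ("PSH", 0x08),
    ("ACK", 0x10),
    ("URG", 0x20),
)


def decode_flags(flag_bits):
    """Return the names of the flags that are set in flag_bits."""
    return [name for name, mask in TCP_FLAGS if flag_bits & mask]
```

For instance, the second segment of a three-way handshake carries both SYN and ACK, so `decode_flags(0x12)` returns `["SYN", "ACK"]`.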
12
TCP Connection Establishment

Three-way Handshake
Host A
Host B
13
TCP Connection Tear-down

Two double handshakes:
Host A
Host B
A->B
torn down
B->A
torn down
14
TCP Retransmission



When a packet remains unacknowledged for a
period of time, TCP assumes it is lost and
retransmits it
TCP tries to calculate the round trip time (RTT) for a
packet and its acknowledgement
From the RTT, TCP can guess how long it should
wait before timing out
15
Round Trip Time (RTT)
Time for data to arrive
Network
Time for ACK to return
RTT = Time for packet to arrive at destination
+
Time for ACK to return from destination
16
RTT Calculation
Sender
Receiver
0.9 sec
RTT
2.2 sec
RTT = 2.2 sec - 0.9 sec. = 1.3 sec
17
Smoothing the RTT
measurement

First, we must smooth the round trip time due
to variations in delay within the network:
$SRTT = a \cdot SRTT + (1 - a) \cdot RTT_{\text{arriving ACK}}$


The smoothed round trip time (SRTT)
weights previously received RTTs by the a
parameter
a is typically equal to 0.875
18
Retransmission Timeout
Interval (RTO)

The timeout value is then calculated by
multiplying the smoothed RTT by some factor
(greater than 1) called b
Timeout = b * SRTT

This coefficient of b is included to allow for
some variation in the round trip times.
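The smoothing and timeout rules above amount to two one-line functions. The sketch below uses the slides' symbols a and b as parameter names, with the typical values a = 0.875 and b = 4.0 taken from the slides as defaults; the function names are hypothetical.

```python
def update_srtt(srtt, rtt_sample, a=0.875):
    """Blend the previous smoothed RTT with the RTT of the arriving ACK."""
    return a * srtt + (1 - a) * rtt_sample


def retransmission_timeout(srtt, b=4.0):
    """Timeout is the smoothed RTT scaled by a factor b greater than 1."""
    return b * srtt
```

Starting from SRTT = 1.50 and feeding in a 1.0 s sample gives 1.4375, which rounds to 1.44.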
19
Example
Initial SRTT = 1.50
a = 0.875, b = 4.0

RTT Meas. | SRTT                                | Timeout
1.5 s     | = 1.50                              | = b × 1.50 = 6.00
1.0 s     | = 1.50 × a + 1.0 × (1 - a) = 1.44   | = b × 1.44 = 5.76
2.2 s     | = 1.44 × a + 2.2 × (1 - a) = 1.54   | = b × 1.54 = 6.16
1.0 s     | = 1.54 × a + 1.0 × (1 - a) = 1.47   | = b × 1.47 = 5.88
0.8 s     | = 1.47 × a + 0.8 × (1 - a) = 1.39   | = b × 1.39 = 5.56
3.1 s     |                                     |
2.0 s     |                                     |
20
Problem with RTT Calculation
Sender
Receiver
Sender Timeout
RTT?
RTT?
21
Karn’s Algorithm

Retransmission ambiguity
  Measure RTT from original data segment
  Measure RTT from most recent segment
  Either way there is a problem in RTT estimate
One solution
  Never update RTT measurements based on acknowledgements from retransmitted packets
Problem: Sudden change in RTT can cause system never to update RTT
  Primary path failure leads to a slower secondary path
22
Karn’s algorithm

Use back-off as part of RTT computation
Whenever a packet is lost, the RTO is increased by a factor
Use this increased RTO as the RTO estimate for the next segment (not the value from SRTT)
Only after an acknowledgment is received for a successful transmission (one that was not retransmitted) is the timer set to the new RTO obtained from SRTT
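The back-off rule above can be sketched as a small timer object. This is an illustration under stated assumptions: the class name `KarnTimer` is hypothetical, the back-off factor of 2 is an assumed value (the slides only say "a factor"), and the smoothing uses the simple a/b scheme from the earlier slides.

```python
class KarnTimer:
    """Sketch of RTO handling with Karn-style back-off (assumed factor 2)."""

    def __init__(self, srtt, a=0.875, b=4.0, backoff=2.0):
        self.srtt = srtt
        self.a = a
        self.b = b
        self.backoff = backoff      # assumption: the slides give no value
        self.rto = b * srtt

    def on_timeout(self):
        # A packet was lost: increase the RTO and keep using it.
        self.rto *= self.backoff

    def on_ack(self, rtt_sample, was_retransmitted):
        if was_retransmitted:
            # Ambiguous sample: do not update SRTT, keep the backed-off RTO.
            return
        # Unambiguous sample: update SRTT and recompute the RTO from it.
        self.srtt = self.a * self.srtt + (1 - self.a) * rtt_sample
        self.rto = self.b * self.srtt
```

With an initial SRTT of 1.5 s the RTO starts at 6.0 s, doubles to 12.0 s on a timeout, stays there when the retransmitted packet is acknowledged, and returns to a value derived from SRTT once a packet that was sent only once is acknowledged.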
23
Another Problem with RTT
Calculation

RTT measurements can sometimes fluctuate
severely


smoothed RTT (SRTT) is not a good reflection of round-trip time in these cases
Solution: Use Jacobson/Karels algorithm:
$Error = RTT - SRTT$
$SRTT = SRTT + (a \times Error)$
$Dev = Dev + d \times (|Error| - Dev)$
$Timeout = SRTT + (b \times Dev)$
24
Jacobson/Karels Algorithm
Example
Initial SRTT = 1.50, Dev = 0
a = 0.125, d = 0.25, b = 4.0
$Error = RTT - SRTT$
$SRTT = SRTT + (a \times Error)$
$Dev = Dev + [d \times (|Error| - Dev)]$
$Timeout = SRTT + (b \times Dev)$

RTT Meas. | Error   | SRTT   | Dev.   | Timeout
1.5 s     | = 0.0   | = 1.50 | = 0.00 | = 1.50
1.0 s     | = -0.50 | = 1.44 | = 0.13 | = 1.94
2.2 s     | = +0.76 | = 1.54 | = 0.28 | = 2.67
1.0 s     | = -0.54 | = 1.47 | = 0.35 | = 2.85
0.8 s     | = -0.67 | = 1.39 | = 0.43 | = 3.09
3.1 s     |         |        |        |
2.0 s     |         |        |        |
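One update step of the Jacobson/Karels algorithm can be sketched as a single function. The parameter names a, d and b follow the slides' symbols and default to the example's values; the function name is hypothetical. The code works with unrounded values, so later rows may differ from the table in the last digit, since the table carries rounded intermediate results.

```python
def jacobson_karels_step(srtt, dev, rtt_sample, a=0.125, d=0.25, b=4.0):
    """Apply one RTT measurement; return the new (SRTT, Dev, Timeout)."""
    error = rtt_sample - srtt
    srtt = srtt + a * error
    dev = dev + d * (abs(error) - dev)
    timeout = srtt + b * dev
    return srtt, dev, timeout
```

Starting from SRTT = 1.50 and Dev = 0, a 1.0 s measurement gives SRTT = 1.4375, Dev = 0.125 and Timeout = 1.9375, which round to the values in the second row of the table.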
25
Example RTT computation
RTT: gaia.cs.umass.edu to fantasia.eurecom.fr
[Figure: RTT (milliseconds) vs. time (seconds), showing SampleRTT and Estimated RTT]
26
TCP Flow Control



TCP uses a modified version of the sliding
window
In acknowledgements, TCP uses the
“Window size” field to tell the sender how
many bytes it may transmit
TCP uses bytes, not packets, as sequence
numbers
27
TCP Flow Control (cont’d)
Important information in TCP/IP packet headers
Contained in IP header:
  N: number of bytes in packet
Contained in TCP header:
  SEQ: sequence number of first data byte in packet
  ACK: sequence number of next expected byte (ACK bit set)
  WIN: window size at the receiver
[Figure: Send and Recv exchange segments labelled N, SEQ, ACK, WIN]
28
Example TCP session
(1)remus:$ tcpdump -S host scully
Kernel filter, protocol ALL, datagram packet socket
tcpdump: listening on all devices
15:15:22.152339 eth0 > remus.4706 > scully.echo: S 1264296504:1264296504(0) win 32120
<mss 1460,sackOK,timestamp 71253512 0,nop,wscale 0>
15:15:22.153865 eth0 < scully.echo > remus.4706: S 875676030:875676030(0) ack
1264296505 win 8760 <mss 1460>
15:15:22.153912 eth0 > remus.4706 > scully.echo: .
1264296505:1264296505(0) ack 875676031 win 32120
remus: telnet scully 7
A <return>
A
29
Example TCP session
Timestamp
Source IP/port
Dest IP/port
Packet 1: 15:15:22.152339
eth0 > remus.4706 > scully.echo: S
1264296504:1264296504(0) win 32120 <mss 1460,sackOK,timestamp
71253512 0,nop,wscale 0> (DF)
Flags
Packet 2: 15:15:22.153865
eth0 < scully.echo > remus.4706: S
875676030:875676030(0) ack 1264296505 win 8760 <mss 1460>
Options
Packet 3: 15:15:22.153912
eth0 > remus.4706 > scully.echo: .
1264296505:1264296505(0) ack 875676031 win 32120
Window
Start Sequence
Number
End Sequence
Number
Acknowledgement
Number
30
TCP data transfer
Packet 4: 15:15:28.591716
eth0 > remus.4706 > scully.echo: P
1264296505:1264296508(3) ack 875676031 win 32120
data
Packet 5: 15:15:28.593255
eth0 < scully.echo > remus.4706: P
875676031:875676034(3) ack 1264296508 win 8760
# bytes
31
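TCP Flow Control (cont’d)
[Figure: sender and receiver timeline alongside the receiver’s buffer, which runs from 0 to 4K]
Receiver’s buffer starts empty
Sender application does a 2K write; receiver’s buffer holds 2K
Sender application does a 3K write; receiver’s buffer becomes full
Sender is blocked
Receiving application reads 2K; receiver’s buffer holds 2K
Sender may send up to 2K
Sender sends the remaining 1K; receiver’s buffer holds the earlier 2K plus the new 1K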
32
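TCP Flow Control (cont’d)
Piggybacking: Allows more efficient bidirectional communication
[Figure: A and B exchange segments carrying N, SEQ, ACK, WIN. A segment from A carries data from A to B together with the ACK for data from B to A; a segment from B carries data from B to A together with the ACK for data from A to B]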
33
TCP Congestion Control



Recall: Network layer is responsible for
congestion control
However, TCP/IP blurs the distinction
In TCP/IP:


the network layer (IP) simply handles routing and
packet forwarding
congestion control is done end-to-end by TCP
34
Self-Clocking Model
[Figure: sender and receiver joined by a fast link and a bottleneck link; data packets are labelled Pb and Pr, acks are labelled Ar and Ab]
1. Send Burst
2. Receive data packet
3. Send Acknowledgement
4. Receive Acknowledgement
5. Send a data packet
Given: Pb = Pr = Ar = Ab = Ar (in units of time)
Sending a packet on each ACK keeps the bottleneck link busy
35
Changing bottleneck bandwidth


one router, finite buffers
sender retransmission of lost packet
[Figure: Host A and Host B send through one router with finite shared output link buffers]
λin: original data
λ'in: original data, plus retransmitted data
λout
36
TCP Congestion Control

Goal: achieve self-clocking state
  Even if we don’t know the bandwidth of the bottleneck
  Bottleneck may change over time
Two phases to keep bottleneck busy:
  Slow-start ramps up to the bottleneck limit
    Packet loss signals we passed the bandwidth of the bottleneck
  Congestion Avoidance tries to maintain self-clocking mode once established
37
TCP Congestion Window

TCP introduces a second window, called the
“congestion window”

This window maintains TCP’s best estimate of
amount of outstanding data to allow in the network
to achieve self-clocking
38
TCP Congestion Window


To determine how many bytes it may send,
the sender takes the minimum of the receiver
window and the congestion window
Example:


If the receiver window says the sender can
transmit 8K, but the congestion window is only
4K, then the sender may only transmit 4K
If the congestion window is 8K but the receiver
window says the sender can transmit 4K, then the
sender may only transmit 4K
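The rule in the two examples above is a single `min`. A minimal sketch, with a hypothetical function name and sizes in bytes:

```python
def sendable_bytes(receiver_window, congestion_window):
    """The sender is limited by whichever window is smaller."""
    return min(receiver_window, congestion_window)
```

Both examples give the same answer: `sendable_bytes(8 * 1024, 4 * 1024)` and `sendable_bytes(4 * 1024, 8 * 1024)` are each 4096 bytes (4K).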
39
TCP Slow Start Phase


TCP defines the “maximum segment size” (MSS) as the maximum amount of data a TCP packet can carry (not including the header)
TCP Slow Start:
  Congestion window starts small, at 1 segment size
  Each time a transmitted segment is acknowledged, the congestion window is increased by one maximum segment size
  On each ack, cwnd = cwnd + 1 (counted in segments)
40
TCP Slow Start (cont’d)

Congestion Window Size | Event
1K                     | A sends 1 segment to B
                       | B ACKs the segment
2K                     | A sends 2 segments to B
                       | B ACKs both segments
4K                     | A sends 4 segments to B
                       | B ACKs all four segments
8K                     | A sends 8 segments to B
                       | B ACKs all eight segments
16K                    | … and so on
41
TCP Slow Start
(cont’d)

Congestion window size grows exponentially (i.e. it
keeps on doubling)

Packet losses indicate congestion
Packet losses are determined by using timers at the
sender
When a timeout occurs, the congestion window is
reduced to one maximum segment size and
everything starts over
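The doubling described above follows directly from adding one segment per ACK. The sketch below simulates it round by round, assuming no losses and that every segment in a window is acknowledged before the next round; the function name is hypothetical.

```python
def slow_start_history(mss, rounds):
    """Congestion window (bytes) at the start of each round trip."""
    cwnd = mss
    history = [cwnd]
    for _ in range(rounds):
        segments_sent = cwnd // mss
        # Each acknowledged segment grows the window by one MSS,
        # so a full window of ACKs doubles it.
        for _ in range(segments_sent):
            cwnd += mss
        history.append(cwnd)
    return history
```

With a 1K segment size, `slow_start_history(1024, 4)` yields 1K, 2K, 4K, 8K and 16K.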


42
TCP Slow Start
When connection begins, increase rate exponentially until first loss event:
  double CongWin every RTT
  done by incrementing CongWin for every ACK received
Summary: initial rate is slow but ramps up exponentially fast
[Figure: Host A and Host B timeline; the number of segments sent doubles each RTT]
43
TCP Slow Start (cont’d)
[Figure: congestion window vs. transmission number; at each timed-out transmission the congestion window falls back to 1 maximum segment size]
44
TCP Slow Start



(cont’d)
TCP Slow Start by itself is inefficient
Although the congestion window builds
exponentially, it drops to 1 segment size every time
a packet times out
This leads to low throughput
45
TCP Linear Increase Threshold


Establish a threshold at which the rate increase is
linear instead of exponential to improve efficiency
Algorithm:



  Start the threshold at 64K (ssthresh)
  Slow start
  Once the threshold is passed, only increase the congestion window size by 1 segment size for each congestion window of data transmitted
    For each ack received, cwnd = cwnd + (mss*mss)/cwnd
If a timeout occurs, reset the congestion window size to 1 segment and set threshold to max(2*mss, 1/2 of MIN(sliding window, congestion window))
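The algorithm above can be sketched as two small functions, one for an ACK and one for a timeout. The function names are hypothetical, sizes are in bytes, and treating a window exactly equal to the threshold as linear mode is a choice made here, since the slides only say the threshold is "passed".

```python
def on_ack(cwnd, ssthresh, mss):
    """Grow the congestion window when new data is acknowledged."""
    if cwnd < ssthresh:
        # Slow start: one MSS per ACK (doubles the window each RTT).
        return cwnd + mss
    # Linear mode: about one MSS per window's worth of ACKs.
    return cwnd + (mss * mss) / cwnd


def on_timeout(cwnd, sliding_window, mss):
    """Return the new (cwnd, ssthresh) after a timeout."""
    ssthresh = max(2 * mss, min(sliding_window, cwnd) / 2)
    return mss, ssthresh
```

With a 1K segment size, a timeout when both windows are 40K resets the congestion window to 1K and sets the threshold to 20K.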
46
TCP Linear Increase Threshold Phase
Example: Maximum segment size = 1K
Assume SSthresh = 32K
Timeout occurs when MIN(sliding window, congestion window) = 40K
Thresholds: 32K before the timeout, 20K after it
[Figure: congestion window vs. transmission number, with levels marked at 40K, 32K, 20K and 1K]
47
TCP Fast Retransmit




Another enhancement to TCP congestion control
Idea: When sender sees 3 duplicate ACKs, it
assumes something went wrong
The packet is immediately retransmitted instead of
waiting for it to timeout
Why?



Note that ACKs are sent by the receiver only when it receives a packet
A duplicate ACK implies something is getting through
Better than a timeout
48
TCP Fast Retransmit Example
MSS = 1K
[Figure: sender and receiver timeline]
ACK of new data
Duplicate ACK #1
Duplicate ACK #2
Duplicate ACK #3
Fast Retransmit occurs (2nd packet is now retransmitted w/o waiting for it to timeout)
49
TCP Fast Recovery



Yet another enhancement to TCP congestion control
Idea: Don’t do a slow start after a fast retransmit
Instead, use this algorithm:




Drop threshold to max(2*mss,1/2 of MIN(sliding window,
congestion window))
Set congestion window to threshold + 3 * MSS
For each duplicate ACK (after the fast retransmit),
increment congestion window by MSS
When next non-duplicate ACK arrives, set congestion
window equal to the threshold
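The four steps above can be sketched as three small functions. The names are hypothetical, sizes are in bytes, and integer division is used for the halving so the results stay whole numbers of bytes.

```python
def enter_fast_recovery(cwnd, sliding_window, mss):
    """After a fast retransmit: return the new (cwnd, threshold)."""
    threshold = max(2 * mss, min(sliding_window, cwnd) // 2)
    return threshold + 3 * mss, threshold


def on_further_duplicate_ack(cwnd, mss):
    """Each duplicate ACK after the fast retransmit adds one MSS."""
    return cwnd + mss


def on_new_ack(threshold):
    """The next non-duplicate ACK sets the window to the threshold."""
    return threshold
```

With MSS = 1K, a sliding window of 28K and a congestion window of 20K, the threshold drops to 10K and the congestion window becomes 13K; one further duplicate ACK makes it 14K, and the next new ACK sets it to 10K.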
50
TCP Fast Recovery Example
Continuing with the Fast Retransmit Example...
MSS = 1K; SW = Sliding Window, TH = Congestion Threshold, CW = Congestion Window
SW=29K, TH=15K, CW=20K
SW=28K, TH=15K, CW=20K
Fast Retransmit Occurs
SW=28K, TH=10K, CW=13K
SW=27K, TH=10K, CW=14K
SW=26K, TH=10K, CW=10K
51
Resulting TCP Sawtooth
In steady state, window oscillates around the bottleneck’s capacity
(I.e. number of outstanding bytes in transit)
[Figure: congestion window vs. transmission number; Slow Start rises toward the bottleneck capacity, then Linear Mode produces a sawtooth; levels marked at 40K, 32K, 20K and 1K]
52
TCP Recap

Timeout Computation


Timeout is a function of 2 values
  the weighted average of sampled RTTs
  the sampled variance of each RTT
Congestion control:
  Goal: Keep the self-clocking pipe full in spite of changing network conditions
  3 key Variables:
    Sliding window (Receiver flow control)
    Congestion window (Sender flow control)
    Threshold (Sender’s slow start vs. linear mode line)
53
TCP Recap (cont)

Slow start

Add 1 segment for each ACK to the congestion window
Doubles the congestion window’s volume each RTT

Linear mode (Congestion Avoidance)


Add 1 segment’s worth of data to each congestion
window
Adds 1 segment per RTT
54
Algorithm Summary: TCP Congestion
Control

When CongWin is below Threshold, sender is in slow-start phase, window grows exponentially.

When CongWin is above Threshold, sender is in congestion-avoidance phase, window grows linearly.

When a triple duplicate ACK occurs, Threshold set to
max(FlightSize/2,2*mss) and CongWin set to
Threshold+3*mss. (Fast retransmit, Fast recovery)

When timeout occurs, Threshold set to
max(FlightSize/2,2*mss) and CongWin is set to 1 MSS.
FlightSize: The amount of data that has been sent but not yet acknowledged.
55
TCP sender congestion control

Event: ACK receipt for previously unacked data
State: Slow Start (SS)
TCP Sender Action: CongWin = CongWin + MSS, If (CongWin > Threshold) set state to “Congestion Avoidance”
Commentary: Resulting in a doubling of CongWin every RTT

Event: ACK receipt for previously unacked data
State: Congestion Avoidance (CA)
TCP Sender Action: CongWin = CongWin + MSS * (MSS/CongWin)
Commentary: Additive increase, resulting in increase of CongWin by 1 MSS every RTT

Event: Loss event detected by triple duplicate ACK
State: SS or CA
TCP Sender Action: Threshold = max(FlightSize/2, 2*mss), CongWin = Threshold + 3*mss, Set state to “Congestion Avoidance”
Commentary: Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS.

Event: Timeout
State: SS or CA
TCP Sender Action: Threshold = max(FlightSize/2, 2*mss), CongWin = 1 MSS, Set state to “Slow Start”
Commentary: Enter slow start

Event: Duplicate ACK
State: SS or CA
TCP Sender Action: Increment duplicate ACK count for segment being acked
Commentary: CongWin and Threshold not changed
56
TCP Fairness
Fairness goal: if K TCP sessions share same
bottleneck link of bandwidth R, each should
have average rate of R/K
TCP connection 1
TCP
connection 2
bottleneck
router
capacity R
57
Why is TCP fair?
Two competing sessions:


Additive increase gives slope of 1, as throughput increases
multiplicative decrease decreases throughput proportionally
[Figure: throughput of the two connections plotted against each other, each axis running up to R, with the equal bandwidth share line; congestion avoidance: additive increase; loss: decrease window by factor of 2]
58
Fairness (more)
Fairness and UDP
  Multimedia apps often do not use TCP
    do not want rate throttled by congestion control
  Instead use UDP:
    pump audio/video at constant rate, tolerate packet loss
  Research area: TCP friendly

Fairness and parallel TCP connections
  nothing prevents app from opening parallel connections between 2 hosts.
  Web browsers do this
  Example: link of rate R supporting 9 connections;
    new app asks for 1 TCP, gets rate R/10
    new app asks for 11 TCPs, gets R/2 !
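The arithmetic behind the example can be checked with a short sketch. It assumes, as the fairness goal states, that every TCP connection on the link gets an equal share; the function name is hypothetical and the result is the fraction of R that the new app receives.

```python
from fractions import Fraction


def new_app_share(existing_connections, new_connections):
    """Fraction of link rate R obtained by an app opening new_connections."""
    total = existing_connections + new_connections
    return Fraction(new_connections, total)
```

With 9 existing connections, one new connection gets 1/10 of R, while 11 new connections get 11/20 of R, which is slightly more than the R/2 quoted above.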
59
Delay modeling
Q: How long does it take to
receive an object from a
Web server after sending
a request?
Ignoring congestion, delay
is influenced by:



TCP connection
establishment
data transmission delay
slow start
Notation, assumptions:




Assume one link between client
and server of rate R
S: MSS (bits)
O: object size (bits)
no retransmissions (no loss, no
corruption)
Window size:


First assume: fixed congestion
window, W segments
Then dynamic window, modeling
slow start
60
Fixed congestion window (1)
First case:
WS/R > RTT + S/R: ACK for
first segment in window
returns before window’s
worth of data sent
delay = 2RTT + O/R
61
Fixed congestion window (2)
Second case:

WS/R < RTT + S/R: wait for ACK after sending window’s worth of data
delay = 2RTT + O/R + (K-1)[S/R + RTT - WS/R]
where K is the number of windows that cover the object
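The two fixed-window cases can be combined into one function. This is a sketch under stated assumptions: the function name is hypothetical, and K is computed as the number of windows of W segments needed to cover the object, rounded up, which the slides do not spell out.

```python
def fixed_window_delay(O, S, R, RTT, W):
    """Delay to fetch an object of O bits with a fixed window of W segments.

    S is the MSS in bits, R the link rate in bits per second.
    """
    # Assumption: K = number of windows (W segments each) covering the object.
    K = -(-O // (W * S))
    # Time the server waits for an ACK after sending a full window.
    stall = S / R + RTT - W * S / R
    if stall <= 0:
        # First case: the ACK returns before the window is exhausted.
        return 2 * RTT + O / R
    # Second case: the server stalls after each window except the last.
    return 2 * RTT + O / R + (K - 1) * stall
```

For example, with O = 16, S = 1, R = 1 and RTT = 2, a window of 4 segments falls in the first case and gives a delay of 20, while a window of 2 segments falls in the second case and gives 27.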
62
TCP Delay Modeling: Slow Start (1)
Now suppose window grows according to slow start
Will show that the delay for one object is:
$$\text{Latency} = 2\,RTT + \frac{O}{R} + P\left[RTT + \frac{S}{R}\right] - (2^P - 1)\frac{S}{R}$$
where P is the number of times TCP idles at server:
P = min{Q, K - 1}
- where Q is the number of times the server idles
if the object were of infinite size.
- and K is the number of windows that cover the object.
63
TCP Delay Modeling: Slow Start (2)
Delay components:
• 2 RTT for connection establishment and request
• O/R to transmit object
• time server idles due to slow start
Server idles: P = min{K-1,Q} times
Example:
• O/S = 15 segments
• K = 4 windows
• Q = 2
• P = min{K-1,Q} = 2
Server idles P=2 times
[Figure: timeline of time at client vs. time at server: initiate TCP connection, request object, first window = S/R, second window = 2S/R, third window = 4S/R, fourth window = 8S/R, complete transmission, object delivered]
64
TCP Delay Modeling (3)
$\frac{S}{R} + RTT$ = time from when server starts to send segment until server receives acknowledgement

$2^{k-1}\frac{S}{R}$ = time to transmit the kth window

$\left[\frac{S}{R} + RTT - 2^{k-1}\frac{S}{R}\right]^{+}$ = idle time after the kth window

$$\text{delay} = \frac{O}{R} + 2\,RTT + \sum_{p=1}^{P} \text{idleTime}_p$$
$$= \frac{O}{R} + 2\,RTT + \sum_{k=1}^{P}\left[\frac{S}{R} + RTT - 2^{k-1}\frac{S}{R}\right]$$
$$= \frac{O}{R} + 2\,RTT + P\left[RTT + \frac{S}{R}\right] - (2^P - 1)\frac{S}{R}$$

[Figure: timeline of time at client vs. time at server: initiate TCP connection, request object, first window = S/R, second window = 2S/R, third window = 4S/R, fourth window = 8S/R, complete transmission, object delivered]
65
TCP Delay Modeling (4)
Recall K = number of windows that cover object
How do we calculate K ?
$$K = \min\{k : 2^0 S + 2^1 S + \cdots + 2^{k-1} S \ge O\}$$
$$= \min\{k : 2^0 + 2^1 + \cdots + 2^{k-1} \ge O/S\}$$
$$= \min\{k : 2^k - 1 \ge O/S\}$$
$$= \min\{k : k \ge \log_2(O/S + 1)\}$$
$$= \left\lceil \log_2\left(\frac{O}{S} + 1\right) \right\rceil$$
Calculation of Q, number of idles for infinite-size object,
is similar (see HW).
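The slow-start latency formula can be evaluated with a short sketch. The function names are hypothetical. K is found by direct search rather than with a logarithm, to avoid floating-point rounding. Q is not derived in these slides, so the version here is an assumption: it counts the windows k for which the idle time S/R + RTT - 2^(k-1) S/R is still positive.

```python
def windows_to_cover(O, S):
    """K: smallest k such that 2^k - 1 segments cover the object."""
    k = 1
    while (2 ** k - 1) * S < O:
        k += 1
    return k


def idles_for_infinite_object(S, R, RTT):
    """Q (assumed derivation): windows after which the server still idles."""
    q = 0
    # Window q+1 takes 2^q * S/R to send; the server idles while that is
    # shorter than S/R + RTT.
    while S / R + RTT - (2 ** q) * S / R > 0:
        q += 1
    return q


def slow_start_latency(O, S, R, RTT):
    """Latency = 2 RTT + O/R + P[RTT + S/R] - (2^P - 1) S/R."""
    K = windows_to_cover(O, S)
    Q = idles_for_infinite_object(S, R, RTT)
    P = min(Q, K - 1)
    return 2 * RTT + O / R + P * (RTT + S / R) - (2 ** P - 1) * S / R
```

Taking O = 15, S = 1, R = 1 and RTT = 2 reproduces the slide example's K = 4, Q = 2 and P = 2, and gives a latency of 22 time units: 4 for the two round trips, 15 to transmit the object, and idle periods of 2 and 1.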
66