Congestion Control - National Tsing Hua University

Chapter 6
Congestion Control and Resource Allocation
Outline
6.1 Issues in Resource Allocation
6.2 Queuing Discipline
6.3 TCP Congestion Control
6.4 Congestion Avoidance Mechanisms
6.5 Quality of Service
6.1 Issues in Resource Allocation
• Two sides of the same coin
– pre-allocate resources so as to avoid congestion
– control congestion if (and when) it occurs
[Figure: Source 1 (10-Mbps Ethernet) and Source 2 (100-Mbps FDDI) connect to a router whose 1.5-Mbps T1 link leads to the Destination]
• Two points of implementation
– hosts at the edges of the network (transport protocol)
– routers inside the network (queuing discipline)
• Underlying service model
– best-effort (assume for now)
– multiple qualities of service (later)
Framework
• Connectionless flows
– sequence of packets sent between source/destination pair
– maintain soft state (state information) at the routers
[Figure: flows from Sources 1, 2 and 3 pass through a set of routers to Destinations 1 and 2]
• Taxonomy for resource allocation mechanisms
– router-centric versus host-centric
– reservation-based versus feedback-based
– window-based versus rate-based
Evaluation Criteria
• Effectiveness and Fairness
• Power (ratio of throughput to delay)
[Figure: power (throughput/delay) versus load, with the optimal load marked on the load axis]
6.2 Queuing Discipline
• First-In-First-Out (FIFO)
– does not discriminate between traffic sources
• Fair Queuing (FQ)
– explicitly segregates traffic based on flows
– ensures no flow captures more than its share of capacity
– variation: weighted fair queuing (WFQ)
• Problem?
[Figure: Flows 1 to 4, each in its own queue, receive round-robin service]
FQ Algorithm
• For a single flow
• Suppose the clock ticks each time a bit is transmitted
• Let P_i denote the length of packet i
• Let S_i denote the time when the router starts to transmit packet i
• Let F_i denote the time when the router finishes transmitting packet i
• F_i = S_i + P_i
• When does the router start transmitting packet i?
– if packet i arrives before the router has finished packet i-1 from this flow, then immediately after the last bit of packet i-1 (F_{i-1})
– if there are no current packets for this flow, then start transmitting when packet i arrives (call this time A_i)
• Thus: F_i = MAX(F_{i-1}, A_i) + P_i
FQ Algorithm (cont)
• For multiple flows
– calculate F_i for each packet that arrives on each flow
– treat all F_i values as timestamps
– next packet to transmit is one with lowest timestamp
• Not perfect: can’t preempt current packet
• Example
[Figure: (a) Flow 1 holds packets with F = 8 and F = 5, Flow 2 holds a packet with F = 10; (b) a Flow 1 packet with F = 2 arrives while a Flow 2 packet with F = 10 is already being transmitted]
• The link is never left idle as long as there is at least one packet in the
queue (work-conserving)
• If the link is fully loaded and there are n flows sending data, then no single flow
can use more than 1/n of the link bandwidth.
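The timestamp rule above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the class name `FairQueue`, the flow names, and the packet lengths are made up for the example, and arrival times are used directly as A_i instead of modeling the router's bit-by-bit clock.

```python
import heapq


class FairQueue:
    """Toy fair-queuing scheduler keyed on per-packet finish timestamps."""

    def __init__(self):
        self.last_finish = {}   # flow id -> F of that flow's previous packet
        self.heap = []          # entries: (finish time, arrival order, flow id)
        self.order = 0          # tie-breaker so equal timestamps stay FIFO

    def arrive(self, flow, arrival_time, length):
        # F_i = MAX(F_{i-1}, A_i) + P_i, computed separately for each flow
        prev_finish = self.last_finish.get(flow, 0)
        finish = max(prev_finish, arrival_time) + length
        self.last_finish[flow] = finish
        heapq.heappush(self.heap, (finish, self.order, flow))
        self.order += 1
        return finish

    def next_packet(self):
        # The next packet to transmit is the one with the lowest timestamp.
        if not self.heap:
            return None
        finish, _, flow = heapq.heappop(self.heap)
        return flow, finish


fq = FairQueue()
fq.arrive("flow1", 0, 5)    # F = 5
fq.arrive("flow1", 0, 3)    # F = MAX(5, 0) + 3 = 8
fq.arrive("flow2", 0, 10)   # F = 10
sent = [fq.next_packet() for _ in range(3)]
```

Because the scheduler only picks among queued packets, a packet that is already being transmitted is never preempted, which matches the "not perfect" caveat above.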
6.3 TCP Congestion Control
• Idea
– TCP assumes best-effort network (FIFO or FQ routers)
– each source determines network capacity for itself
– uses implicit feedback
– ACKs pace transmission (self-clocking)
• Challenge
– determining the available capacity in the first place
– adjusting to changes in the available capacity
6.3.1 Additive Increase/Multiplicative Decrease (AIMD)
• Objective: adjust to changes in the available capacity
• New state variable per connection: CongestionWindow
– limits how much data source has in transit
MaxWin = MIN(CongestionWindow, AdvertisedWindow)
EffWin = MaxWin - (LastByteSent - LastByteAcked)
• Idea:
– increase CongestionWindow when congestion goes down
– decrease CongestionWindow when congestion goes up
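As a small worked example, the window formulas can be written as a Python function. The function name, the byte values, and the clamp at zero are assumptions made for this sketch; the slide gives only the two formulas.

```python
def effective_window(congestion_window, advertised_window,
                     last_byte_sent, last_byte_acked):
    """Bytes the source may still send, per the MaxWin/EffWin formulas."""
    max_win = min(congestion_window, advertised_window)
    in_flight = last_byte_sent - last_byte_acked
    # Assumption: never report a negative window if the limit just shrank.
    return max(0, max_win - in_flight)


# 8000-byte congestion window, 16000-byte advertised window,
# 3000 bytes sent but not yet acknowledged.
room = effective_window(8000, 16000, last_byte_sent=5000, last_byte_acked=2000)
```

Here the congestion window is the tighter limit, so the source may send 5000 more bytes.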
AIMD (cont)
• Question: how does the source determine whether
or not the network is congested?
• Answer: a timeout occurs
– timeout signals that a packet was lost
– packets are seldom lost due to transmission error
– lost packet implies congestion
AIMD (cont)
[Figure: packets in transit between Source and Destination]
• Algorithm
– increment CongestionWindow by one packet per RTT (linear increase)
– divide CongestionWindow by two whenever a timeout occurs (multiplicative decrease)
• In practice: increment a little for each ACK
Increment = (MSS * MSS)/CongestionWindow
CongestionWindow += Increment
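A minimal Python sketch of the per-ACK rule follows. The MSS value, the function names, and the floor of one MSS on the decrease are assumptions for illustration.

```python
MSS = 1000  # bytes; illustrative value only


def on_ack(cwnd):
    # Additive increase: Increment = (MSS * MSS) / CongestionWindow per ACK,
    # which adds up to roughly one MSS per RTT.
    return cwnd + MSS * MSS / cwnd


def on_timeout(cwnd):
    # Multiplicative decrease: halve the window.
    # Assumption: the window is not allowed to drop below one MSS.
    return max(MSS, cwnd / 2)


cwnd = 10 * MSS
for _ in range(10):      # about one window's worth of ACKs
    cwnd = on_ack(cwnd)
grown = cwnd             # a little under 11 * MSS
halved = on_timeout(grown)
```

Ten ACKs grow a 10-packet window by slightly less than one packet, because each increment is computed against a window that is already a bit larger.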
AIMD (cont)
• Trace: sawtooth behavior
[Figure: CongestionWindow versus time (seconds), showing the sawtooth pattern]
6.3.2 Slow Start
• Objective: determine the available capacity in the first place
• Effectively increases the congestion window exponentially, rather than linearly
• Idea:
– begin with CongestionWindow = 1 packet
– double CongestionWindow each RTT (increment by 1 packet for each ACK)
[Figure: packets in transit between Source and Destination]
• Why is it called “slow start”? It is slow compared to the original behavior of TCP, not to the linear mechanism
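The doubling can be checked with a short Python sketch. The function name and the `threshold_packets` stopping point are hypothetical; the slide states only the starting value and the per-ACK increment.

```python
def slow_start_rounds(threshold_packets):
    """Window size in packets at the start of each RTT during slow start."""
    cwnd = 1
    history = [cwnd]
    while cwnd < threshold_packets:
        acks = cwnd     # one ACK comes back for each packet sent this RTT
        cwnd += acks    # +1 packet per ACK, so the window doubles
        history.append(cwnd)
    return history


rounds = slow_start_rounds(16)
```

Starting from one packet, the window reaches 16 packets after four round trips.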
Slow Start (cont)
• Exponential growth, but slower than all at once
• Used in two different situations …
– when first starting connection
– when connection goes dead waiting for timeout
• Trace
[Figure: trace of CongestionWindow versus time (seconds)]
• Colored line = value of CongestionWindow, Solid bullets = timeouts,
Hash marks = time when each packet is transmitted, vertical bars =
time when a packet that was eventually retransmitted was first
transmitted
• Problem: lose up to half a CongestionWindow’s worth of data if the
source is aggressive at the beginning, as TCP is during exponential
growth
6.3.3 Fast Retransmit and Fast Recovery
• Problem: the coarse-grained implementation of TCP timeouts leads to long periods during which the connection goes dead while waiting for a timer to expire
• Fast retransmit: use duplicate ACKs to trigger retransmission
• TCP waits for 3 duplicate ACKs before retransmitting
[Figure: the Sender transmits packets 1 to 6; the Receiver returns ACK 1, ACK 2, and then three duplicate ACK 2s; the Sender retransmits packet 3 and receives ACK 6]
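The duplicate-ACK trigger can be sketched in Python as follows. It assumes the packet-numbered ACKs of the example, where ACK n acknowledges everything up to packet n, so the missing packet is n + 1. The function name is made up for this sketch.

```python
DUP_ACK_THRESHOLD = 3  # TCP waits for 3 duplicate ACKs


def packets_to_retransmit(acks):
    """Return the packet numbers that fast retransmit would resend."""
    retransmits = []
    last_ack = None
    dup_count = 0
    for ack in acks:
        if ack == last_ack:
            dup_count += 1
            if dup_count == DUP_ACK_THRESHOLD:
                # The receiver keeps asking for the packet after `ack`.
                retransmits.append(ack + 1)
        else:
            last_ack = ack
            dup_count = 0
    return retransmits


# ACK sequence from the example: ACK 1, ACK 2, three duplicate ACK 2s, ACK 6.
resent = packets_to_retransmit([1, 2, 2, 2, 2, 6])
```

With this input, the third duplicate of ACK 2 triggers retransmission of packet 3 without waiting for a timeout.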
Results (Fast Retransmission)
[Figure: trace of CongestionWindow versus time (seconds) with fast retransmit]
• The long periods during which the congestion window stays flat and
no packets are sent have been eliminated
• Eliminated about half of the coarse-grained timeouts
• Fast recovery
– skip the slow start phase that happens between when fast
retransmit detects a lost packet and additive increase begins
– go directly to half the last successful CongestionWindow
– Slow start is only used at the beginning of a connection and whenever a
coarse-grained timeout occurs. At all other times, the congestion
window is following a pure AIMD pattern.
6.4 Congestion Avoidance Mechanisms
• TCP’s strategy
– control congestion once it happens, rather than trying to avoid it in the first place
– repeatedly increase load in an effort to find the point at which congestion occurs, and then back off
– TCP needs to create losses to find the available bandwidth of the connection
• Alternative strategy
– predict when congestion is about to happen
– reduce rate before packets start being discarded
– call this congestion avoidance, instead of congestion control
• Two different mechanisms
– router-centric: DECbit and RED Gateways
– host-centric: TCP Vegas
6.4.1 DECbit
• Add binary congestion bit to each packet header
• Router
– monitors average queue length over last busy+idle cycle
– sets congestion bit if average queue length > 1
– attempts to balance throughput against delay
[Figure: queue length versus time; the averaging interval covers the previous cycle and the current cycle up to the current time]
End Hosts
• Destination echoes bit back to source
• Source records how many packets resulted in set bit
• If less than 50% of last window’s worth had bit set
– increase CongestionWindow by 1 packet
• If 50% or more of last window’s worth had bit set
– decrease CongestionWindow to 0.875 times its previous value
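The end-host rule can be sketched in Python as follows. The function name and the list-of-booleans representation of the echoed bits are assumptions made for this example.

```python
def decbit_adjust(cwnd, bits):
    """Adjust the window (in packets) from the last window's congestion bits.

    bits: one True/False value per packet, True if the congestion bit was set.
    """
    if not bits:
        return cwnd                 # assumption: no feedback, no change
    fraction_set = sum(bits) / len(bits)
    if fraction_set < 0.5:
        return cwnd + 1             # additive increase by one packet
    return cwnd * 0.875             # multiplicative decrease


lightly_marked = decbit_adjust(8, [False] * 6 + [True] * 2)   # 25% set
heavily_marked = decbit_adjust(8, [True] * 4 + [False] * 4)   # 50% set
```

With 25% of the bits set the window grows from 8 to 9 packets; with 50% set it shrinks to 7.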
6.4.2 Random Early Detection (RED)
• Invented by Sally Floyd and Van Jacobson in
the early 1990s, differs from the DECbit in
two major ways
• Notification is implicit
– just drop the packet (TCP will time out)
– could make explicit by marking the packet
• Early random drop
– rather than wait for queue to become full, drop
each arriving packet with some drop probability
whenever the queue length exceeds some drop
level
RED Details
• Compute average queue length
AvgLen = (1 - Weight) * AvgLen + Weight * SampleLen
0 < Weight < 1 (usually 0.002)
SampleLen is the queue length each time a packet arrives
[Figure: queue with MinThreshold and MaxThreshold marked, compared against AvgLen]
• The weighted running average calculation tries to detect
long-lived congestion, by filtering out short-term changes
in the queue length
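To see how strongly a small Weight filters out bursts, here is a minimal Python sketch of the running average. The function name and the sample values are illustrative only.

```python
def update_avg_len(avg_len, sample_len, weight=0.002):
    # AvgLen = (1 - Weight) * AvgLen + Weight * SampleLen
    return (1 - weight) * avg_len + weight * sample_len


avg = 0.0
for _ in range(100):        # a burst: 100 arrivals each see 50 queued packets
    avg = update_avg_len(avg, 50)
```

After a 100-packet burst the average is still far below the instantaneous queue length of 50, so only congestion that persists moves AvgLen toward the thresholds.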
RED Details (cont)
• Weighted Running Average Queue Length
RED Details (cont)
• Two queue length thresholds
if AvgLen <= MinThreshold then
    enqueue the packet
if MinThreshold < AvgLen < MaxThreshold then
    calculate probability P
    drop arriving packet with probability P
if MaxThreshold <= AvgLen then
    drop arriving packet
RED Details (cont)
• Computing probability P
TempP = MaxP * (AvgLen - MinThreshold)/(MaxThreshold - MinThreshold)
P = TempP/(1 - count * TempP)
• Drop Probability Curve
[Figure: P(drop) versus AvgLen, with MinThresh and MaxThresh marked on the AvgLen axis and MaxP and 1.0 marked on the P(drop) axis]
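The threshold test and the probability formula can be combined in a short Python sketch. It assumes `count` is the number of packets queued since the last drop while AvgLen has been between the thresholds; the function name, the clamping of P to 1.0, and the numbers are assumptions for illustration.

```python
def red_drop_probability(avg_len, min_th, max_th, max_p, count):
    """Drop probability for an arriving packet under the RED rules above."""
    if avg_len <= min_th:
        return 0.0                  # enqueue the packet
    if avg_len >= max_th:
        return 1.0                  # drop the arriving packet
    temp_p = max_p * (avg_len - min_th) / (max_th - min_th)
    denom = 1 - count * temp_p
    if denom <= 0:
        return 1.0                  # assumption: clamp once count is large
    return min(1.0, temp_p / denom)


# AvgLen halfway between thresholds of 20 and 40 packets, MaxP = 0.02.
p_fresh = red_drop_probability(30, 20, 40, 0.02, count=0)
p_later = red_drop_probability(30, 20, 40, 0.02, count=50)
```

The `count` term raises P the longer it has been since the last drop, which spreads drops out in time instead of clustering them.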
Tuning RED
• Probability of dropping a particular flow’s packet(s) is
roughly proportional to the share of the bandwidth that flow
is currently getting
• MaxP is typically set to 0.02, meaning that when the average
queue size is halfway between the two thresholds, the
gateway drops roughly one out of 50 packets.
• If traffic is bursty, then MinThreshold should be
sufficiently large to allow link utilization to be maintained at
an acceptably high level
• Difference between two thresholds should be larger than the
typical increase in the calculated average queue length in one
RTT; setting MaxThreshold to twice MinThreshold is
reasonable for traffic on today’s Internet
• Penalty Box for Offenders
6.4.3 Source-based Congestion Avoidance: TCP Vegas
• Idea: source watches for some sign that a router's queue is building up and congestion will happen soon; e.g.,
– RTT grows
– Sending rate flattens
Algorithms Based on RTT Growth
• Algorithm 1: The congestion window normally increases as in
TCP, but every two round-trip delays, the algorithm checks to
see if the current RTT is greater than the average of the
minimum and maximum RTTs seen so far. If it is, then the
algorithm decreases the congestion window by one-eighth.
• Algorithm 2: The window is adjusted once every two round-trip delays based on the product
w = (CurrentWindow - OldWindow) x (CurrentRTT - OldRTT)
If w > 0, the source decreases the window size by 1/8.
If w <= 0, the source increases the window by one maximum packet size.
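Algorithm 2 can be sketched in Python as follows. The function name, the byte and second units, and the sample values are assumptions for this example.

```python
def adjust_window(current_window, old_window, current_rtt, old_rtt,
                  max_packet_size):
    """One adjustment step of Algorithm 2 (run once every two RTTs)."""
    w = (current_window - old_window) * (current_rtt - old_rtt)
    if w > 0:
        # Window and RTT moved in the same direction: back off by 1/8.
        return current_window - current_window / 8
    # Otherwise grow by one maximum packet size.
    return current_window + max_packet_size


# Window grew and RTT grew: the queue is probably building up.
shrunk = adjust_window(16000, 12000, 0.12, 0.10, max_packet_size=1000)
# Window grew but RTT fell: keep increasing.
grew = adjust_window(16000, 12000, 0.09, 0.10, max_packet_size=1000)
```

The product is positive only when the window and the RTT changed in the same direction, so a larger window that brings a longer RTT is read as a sign of queuing.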
Algorithms Based on Flattening of the Sending Rate
• Algorithm 3: Every RTT, it increases the window size by one
packet and compares the throughput achieved to the
throughput when the window was one packet smaller.
• If the difference is less than one-half the throughput achieved when
only one packet was in transit, the algorithm decreases the
window by one packet.
• This scheme calculates the throughput by dividing the number
of bytes outstanding in the network by the RTT.
• Algorithm 4: TCP Vegas. The goal is to maintain the “right”
amount of extra data in the network. If a source is sending
too much extra data, it will cause long delays and possibly
lead to congestion. If a connection is sending too little extra
data, it cannot respond rapidly enough to transient increases
in the available network bandwidth.
TCP Vegas
[Figure: three synchronized traces versus time (seconds)]
• Congestion window vs observed throughput rate (the three
graphs are synchronized).
TCP Vegas Algorithm
• Let BaseRTT be the minimum of all measured RTTs
(commonly the RTT of the first packet)
• If not overflowing the connection, then
ExpectedRate = CongestionWindow/BaseRTT
• Source calculates sending rate (ActualRate) once per RTT
• Source compares ActualRate with ExpectedRate
Diff = ExpectedRate - ActualRate
if Diff < a (too little extra data)
    increase CongestionWindow linearly in next RTT
else if Diff > b (too much extra data)
    decrease CongestionWindow linearly in next RTT
else
    leave CongestionWindow unchanged
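The comparison can be sketched in Python as follows. This sketch assumes the window is counted in packets, rates are in packets per second, and the thresholds `a` and `b` are expressed as rates; the function name, the `step` parameter, and the sample numbers are hypothetical.

```python
def vegas_adjust(cwnd, base_rtt, actual_rate, a, b, step=1):
    """One per-RTT TCP Vegas decision.

    cwnd in packets, base_rtt in seconds, rates in packets per second.
    a and b are the lower and upper thresholds on Diff (with a < b).
    """
    expected_rate = cwnd / base_rtt
    diff = expected_rate - actual_rate
    if diff < a:
        return cwnd + step      # too little extra data: linear increase
    if diff > b:
        return cwnd - step      # too much extra data: linear decrease
    return cwnd                 # within the target region: no change


# 20-packet window and 100 ms BaseRTT give an expected rate of 200 packets/s.
up = vegas_adjust(20, 0.1, actual_rate=195, a=10, b=30)
down = vegas_adjust(20, 0.1, actual_rate=150, a=10, b=30)
hold = vegas_adjust(20, 0.1, actual_rate=180, a=10, b=30)
```

Both adjustments are linear, which is what distinguishes this early decrease from the multiplicative decrease applied after a timeout.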
TCP Vegas Algorithm (cont)
• Parameters
– a = 1 packet
– b = 3 packets
[Figure: TCP Vegas traces versus time (seconds)]
• The goal is to keep the ActualRate between these two
thresholds, that is, within the region they bound
• TCP Vegas does use multiplicative decrease when a
timeout occurs; the linear decrease is an early decrease in
the congestion window that, hopefully, happens before
congestion occurs.