Chapter 6: Congestion Control and Resource Allocation

• We have seen enough layers of protocol hierarchy to
understand how data can be transferred among
processes across heterogeneous networks
• Problem
– How to effectively and fairly allocate resources among a
collection of competing users?
• Resources
– Bandwidth of the links
– Buffers at the routers and switches
• What is congestion?
– “too many sources sending too much data too fast for network
to handle”
– Higher rate of inputs to a router than outputs
– Packets compete at a router for the use of a link
– Each competing packet is placed in a queue waiting for its turn
to be transmitted over the link
• When too many packets are contending for the same link:
– Queuing/Buffering at routers
• Long Delays
– The buffer overflows at routers
• Dropped packets
• Result in lost packets
• Network should provide a congestion control
mechanism to deal with such a situation
Congestion Control vs Flow Control
• Congestion control is a global issue
– involves every router and host within the subnet
• Flow control – scope is point-to-point
– involves just sender and receiver
– How did TCP implement this?
• Congestion control and Resource Allocation
– Two sides of the same coin
• If the network takes active role in allocating resources
– The congestion may be avoided
– No need for congestion control
• Allocating resources with any precision is difficult
– Resources are distributed throughout the network
• On the other hand, we can always let the sources send
as much data as they want
– Then recover from the congestion when it occurs
– Easier approach
– But it can be disruptive because many packets may be
discarded by the network before congestion can be controlled
• Congestion control and resource allocations involve
both hosts and network elements such as routers
• What can a host do to control congestion?
• What can a router do to control congestion?
• In network elements (Routers)
– Various queuing disciplines can be used to control:
1. the order in which packets get transmitted and
2. which packets get dropped
• At the hosts’ end
– The congestion control mechanism paces how fast sources
are allowed to send packets
Issues in Resource Allocation
• Network model: packet-switched network (internet)
• Resource allocation takes place across multiple links and switches (routers)
• An intermediate link may be congested.
• In such an environment, a given source may have
more than enough capacity on the immediate
outgoing link to send a packet.
• Bottleneck router: But somewhere in the middle
of a network, its packets encounter a link that is
being used by many different traffic sources
• What are the causes of a bottleneck router?
1. Amount of incoming data is too large AND
2. Speed of data arrival at router too high
- Why?
• Network Model
– Packet Switched Network
A potential bottleneck router.
• Connectionless Flows
• For much of our discussion, we assume that the
network is essentially connectionless
• Any connection-oriented service needed is
implemented in the transport protocol that is
running on the end hosts.
• Why this assumption?
– Because our original assumptions were too strong:
1. Connectionless vs connection-oriented networks
– There IS a grey area in between
2. Datagrams are completely independent in a connectionless network
– Yes, datagrams ARE certainly switched independently
– But usually, stream of datagrams between a particular pair of hosts
flows through a particular set of routers
• Network Model
– Connectionless Flows
Multiple flows passing through a set of routers
• Multiple related packets flow through each router
• Routers maintain soft state (some state information) for each flow that goes through them
• Soft state is refreshed by periodic messages; otherwise it expires.
• Soft state (e.g. PIM) vs hard state information (e.g. RIP)
• Soft state of a flow can be used to make resource
allocation decisions about the packets that belong to
the flow.
• Soft state represents a middle ground between:
– a purely connectionless network that maintains no state at the routers and
– a purely connection-oriented network that maintains hard state at the routers
• In general, the correct operation of the network does not
depend on soft state being present
– Each packet is still routed correctly without regard to this state
• But, when a packet happens to belong to a flow for which the
router is currently maintaining soft state, then the router is
better able to handle the packet.
• There are countless ways in which resource allocation
mechanisms differ
• We will discuss 3 dimensions along which resource
allocation mechanisms can be characterized:
1. Router-centric vs Host-centric
2. Reservation based vs Feedback based
3. Window based vs Rate based
• Router-centric vs. Host-centric
1. Router-centric design:
• Each router takes responsibility for :
– deciding when packets are forwarded and
– selecting which packets are to be dropped, as well as
– informing the hosts that are generating the network traffic how
many packets they are allowed to send.
2. Host-centric design:
• End hosts:
– observe the network conditions
» e.g. observe how many packets they are successfully getting through
the network
– Adjust their behavior accordingly
– Note that these two groups are not mutually exclusive.
• Reservation-based vs. Feedback-based
1. Reservation-based system :
• some entity (e.g., the end host) asks the network for a certain amount of
capacity to be allocated for a flow.
• Each router then allocates enough resources (buffers and/or percentage
of the link’s bandwidth) to satisfy this request.
• If the request cannot be satisfied at some router, because doing so
would overcommit its resources, then the router rejects the reservation.
2. Feedback-based approach:
• the end hosts begin sending data without first reserving any capacity
• End hosts then adjust their sending rate according to the feedback they receive
– EXPLICIT feedback: a router sends a 'slow down' message
– IMPLICIT feedback: the end host adjusts its sending rate according to observed
network conditions, e.g. packet loss
• Window-based versus Rate-based
– Window-based:
• Receiver advertises a window to the sender (window advertisement)
– Rate-based:
• Receiver controls the sender's behavior using a rate:
– how many bits per second the receiver or network is able to absorb
• E.g. multimedia streaming application
Evaluating Resource Allocation Schemes
• Resource allocation schemes can be evaluated based on:
– Effectiveness
– Fairness
Evaluate Effectiveness of Resource Allocation
• Consider the two principal metrics of networking:
– throughput and
– delay.
• Clearly, we want as much throughput and as little delay
as possible.
• Unfortunately, these goals are often somewhat at odds
with each other.
Increase throughput:
– Idle link hurts throughput.
– So, allow as many packets into the network as possible,
– Goal: drive the utilization of all the links up to 100%.
Problem with this strategy:
– increasing the number of packets in the network also
increases the length of the queues at each router.
– Longer queues, in turn, mean packets are delayed longer in
the network
– Throughput and Delay Relationship:
• Described using the throughput to delay ratio.
– Used as a metric for evaluating the effectiveness of a resource
allocation scheme.
• This ratio is sometimes referred to as the
power of the network.
• Power = Throughput/Delay
Ratio of throughput to delay as a function of load
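The shape of the power curve can be illustrated with a small sketch. It assumes a single M/M/1 queue, where mean delay is 1/(capacity − load); that queueing model and the names `power`, `load` and `capacity` are assumptions of this example, not something the slides specify.

```python
def power(load, capacity=1.0):
    """Power = throughput / delay under an assumed M/M/1 model (illustrative)."""
    if not 0 < load < capacity:
        raise ValueError("load must be strictly between 0 and capacity")
    throughput = load                 # below capacity, all offered load gets through
    delay = 1.0 / (capacity - load)   # M/M/1 mean delay (assumption of this sketch)
    return throughput / delay

# Evaluate power at loads 0.1, 0.2, ..., 0.9 and find where it peaks.
loads = [i / 10 for i in range(1, 10)]
best_load = max(loads, key=power)
```

Under this assumed model, power rises with load, peaks, and then falls as queueing delay grows faster than throughput.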
Fair Resource Allocation
• What exactly constitutes fair resource allocation?
• For example, a reservation-based resource allocation
scheme provides an explicit way to create controlled unfairness
• With such a scheme, we might use reservations to enable
a video stream to receive 1 Mbps across some link while
a file transfer receives only 10 Kbps over the same link.
• When several flows share a particular link, we would
like each flow to receive an equal share of the bandwidth
• This definition presumes that a fair share of
bandwidth means an equal share of bandwidth.
• But equal shares may not equate to fair shares.
– Why?
• Should we also consider the length of the paths being compared?
– Consider the figure:
One four-hop flow competing with three one-hop flows
• What does “Fair” mean?
– Equal share of resources for all flows?
– Proportional to how much you pay for service?
– Should we take route length into account?
• Raj Jain’s fairness index:
• Metric to quantify the fairness of a congestion-control mechanism
• Assume: fair implies equal, all paths are of equal length
• Definition:
– Given a set of flow throughputs (x1, x2, . . . , xn), the following function
assigns a fairness index to the flows:
f(x1, x2, . . . , xn) = (x1 + x2 + . . . + xn)^2 / (n × (x1^2 + x2^2 + . . . + xn^2))
– The fairness index always results in a number between 0 and 1, with 1
representing greatest fairness.
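A minimal sketch of Jain's fairness index, (sum of xi)^2 / (n × sum of xi^2), for checking a set of flow throughputs. The function name and the error handling are choices made for this example.

```python
def jain_fairness_index(throughputs):
    """Return (sum x)^2 / (n * sum x^2) for a list of flow throughputs."""
    n = len(throughputs)
    sum_of_squares = sum(x * x for x in throughputs)
    if n == 0 or sum_of_squares == 0:
        raise ValueError("need at least one flow with non-zero throughput")
    total = sum(throughputs)
    return (total * total) / (n * sum_of_squares)

equal_shares = jain_fairness_index([1, 1, 1, 1])  # every flow gets the same
one_hog = jain_fairness_index([4, 0, 0, 0])       # one flow gets everything
```

With n flows, equal shares give an index of 1, and a single flow taking everything gives 1/n.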
Queuing Disciplines
• Each router must implement some queuing discipline
– Scheduling policy
– Drop policy
• Queuing allocates bandwidth and buffer space:
– Bandwidth: which packet to serve (transmit) next
– Buffer space: which packet to drop next (when required)
• Queuing also affects latency
1. FIFO Queuing
2. Priority Queuing
3. Fair Queuing
FIFO Queuing
• FIFO queuing:
– Scheduling discipline
– First-Come-First-Served (FCFS) queuing
– The first packet that arrives at a router is transmitted first
• Tail drop:
– Drop policy
– If a packet arrives and the queue (buffer space) is full,
then the router discards that packet regardless of which flow it
belongs to or how important it is
(a) FIFO queuing
(b) tail drop at a FIFO queue
• What are the problems of FIFO scheduling?
• How suitable is FIFO queuing for sending both
Voice over IP and Emails?
• FIFO: First-in first-out scheduling
– Simple, but restrictive
• Voice over IP and Email:
– Two kinds of traffic (different flows)
– Their transmission requirements differ:
• Voice over IP needs low delay
• E-mail can tolerate delay
• FIFO queue treats all packets the same:
– If voice traffic comes after email traffic, VOIP traffic waits
behind e-mail in FIFO queue (not acceptable!)
FIFO + Drop-tail Problems
• Lock-out problem
– A few flows can monopolize the queue space
• A flow that sends more fills most of the FIFO queue
• Full queues
– Occurs if routers’ queues are often full
– TCP detects congestion from loss
– Forces the network to have long-standing queues in steady state
– Queuing delays: bad for time-sensitive traffic
Priority Queuing
• A simple variation on basic FIFO queuing.
• Mark each packet with a priority
• The routers implement multiple FIFO queues, one per priority class
(e.g. Priority 1, Priority 2 and Priority 3 queues).
• Always transmit high-priority traffic when present
• Possible starvation of lower-priority queues
Fair Queuing
• Solves the main problem with FIFO queuing:
• FIFO queuing does not discriminate between different traffic
sources/ flows
• The fair queuing (FQ) algorithm addresses this problem.
• Maintain a separate queue for each flow currently being
handled by the router.
• The router then services these queues in a round-robin manner
Round-robin service of four flows at a router
• Does fair queuing actually do a fair queuing?
• The main complication with Fair Queuing:
– Packet lengths vary: packets being processed at a router are
not necessarily the same length.
• To truly allocate the bandwidth of the outgoing link in a
fair manner, it is necessary to take packet length into consideration
– E.g. a router is managing two flows,
one with 1000-byte packets and the other with 500-byte packets
– If packets are serviced in simple round-robin order:
the first flow gets 2/3 of the link’s bandwidth and
the second flow only 1/3 of its bandwidth.
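The 2/3 versus 1/3 split can be checked with a short calculation. This sketch assumes every flow always has a packet waiting, so each round sends exactly one packet per flow; the function name is made up for the example.

```python
def packet_round_robin_shares(packet_sizes):
    """Bandwidth share per flow when one packet per flow is sent each round.

    Assumes every flow is always backlogged (an assumption of this sketch).
    """
    bytes_per_round = sum(packet_sizes)
    return [size / bytes_per_round for size in packet_sizes]

# Flow 1 sends 1000-byte packets, flow 2 sends 500-byte packets.
shares = packet_round_robin_shares([1000, 500])
```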
• What we really want is bit-by-bit round-robin
Bit-by-bit round-robin
• Bit-by-bit round-robin:
– Router transmits: a bit from flow 1, then a bit from flow 2,
and so on.
– Clearly, it is not feasible to interleave the bits from
different packets.
– The FQ mechanism therefore simulates this behavior:
1. Determine the finishing time of each packet as if it were transmitted using bit-by-bit round-robin
2. Use this finishing time to sequence the packets for transmission.
• Bit-by-bit round robin:
– To understand the algorithm for approximating bit-by-bit round robin, consider the behavior of a single flow
– For this flow, let
• Pi: packet length: the length of packet i
• Si: start time: time when the router starts to transmit packet i
• Fi: finish time: time when the router finishes transmitting packet i
• Fi = Si + Pi
– Assume Pi is expressed as the time needed to transmit the packet
– (1 bit → 1 second)
• When do we start transmitting packet i?
– Depends on whether packet i arrived before or after the
router finishes transmitting packet i-1 for the flow
• If packet i arrived before router finished transmitting
packet i-1:
– First bit of packet i transmitted immediately after last bit of
packet i-1
• If packet i arrived after router finished transmitting
packet i-1:
– i.e. the router finished sending packet i-1 long before packet i arrived
– i.e. for some time the queue for this flow is empty
– Let Ai denote the time that packet i arrives at the router
– Then Si = max(Fi-1, Ai)
– Fi = max(Fi-1, Ai) + Pi
• Now for every flow, we calculate Fi for each packet that arrives
using our formula
• We then treat all the Fi as timestamps
• Next packet to transmit is always the packet that has the lowest timestamp
– The packet that should finish transmission before all others
– i.e. a shorter packet arriving can be inserted in front of a longer packet
already in queue
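The timestamp rule above can be sketched as follows. This is a simplified illustration, not a production scheduler: it applies Fi = max(Fi-1, Ai) + Pi per flow exactly as stated and uses a heap to pick the lowest finish time. A full fair-queuing implementation also tracks a virtual clock that depends on the number of active flows, which is omitted here. The class and method names are hypothetical.

```python
import heapq

class FairQueue:
    """Orders packets by per-flow finish time Fi = max(Fi-1, Ai) + Pi."""

    def __init__(self):
        self.last_finish = {}  # flow id -> finish time of that flow's previous packet
        self.heap = []         # entries: (finish time, sequence number, flow id)
        self.seq = 0           # tie-breaker so equal finish times stay in arrival order

    def enqueue(self, flow, arrival, length):
        start = max(self.last_finish.get(flow, 0), arrival)  # Si = max(Fi-1, Ai)
        finish = start + length                              # Fi = Si + Pi
        self.last_finish[flow] = finish
        heapq.heappush(self.heap, (finish, self.seq, flow))
        self.seq += 1
        return finish

    def dequeue(self):
        finish, _, flow = heapq.heappop(self.heap)           # lowest timestamp first
        return flow, finish

fq = FairQueue()
fq.enqueue("flow2", arrival=0, length=10)  # finish time 10
fq.enqueue("flow1", arrival=0, length=4)   # finish time 4
fq.enqueue("flow1", arrival=0, length=4)   # finish time 8
order = [fq.dequeue()[0] for _ in range(3)]
```

In this usage both short packets of flow1 have earlier finish times than the long packet of flow2, so they are sent first even though the flow2 packet was queued earlier.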
• E.g. Fair Queuing
(a) Packets with earlier finishing times are sent first: both packets in
flow 1 have earlier finishing times than the packet in flow 2
(b) Sending of a packet already in progress is completed: the router had
already begun sending a packet from flow 2 when a packet from flow 1 arrived
• The implementation does not preempt a packet that is already being
transmitted (so it is not exactly bit-by-bit fair queuing)
TCP Congestion Control
• TCP congestion control was introduced into the
Internet in the late 1980s by Van Jacobson, roughly
eight years after the TCP/IP protocol stack had become operational
• Immediately preceding this time, the Internet was
suffering from congestion collapse—
– hosts send their packets into the Internet as fast as the
advertised window would allow
– congestion would occur at some router (causing packets to
be dropped),
– hosts would time out and retransmit their packets, resulting
in even more congestion
• The idea of TCP congestion control:
– Each source determines how much capacity is available in
the network, so that it knows how many packets it can safely
have in transit.
– Once a given source has this many packets in transit, it uses
the arrival of an ACK as a signal that one of its packets has left
the network, and that it is therefore safe to insert a new
packet into the network without adding to the level of congestion
– By using ACKs to pace the transmission of packets, TCP is
said to be self-clocking.
• There are several TCP congestion control mechanisms in
use today:
1. Additive Increase/ Multiplicative Decrease (AIMD)
2. Slow Start
3. Fast retransmit and Fast Recovery
Additive Increase Multiplicative Decrease (AIMD)
• Congestion Window:
– TCP maintains a new state variable called CongestionWindow for each connection
– The source uses this to limit the amount of data it is allowed to have in
transit at a given time.
• The congestion window is congestion control’s
counterpart to flow control’s advertised window.
• The maximum number of bytes of unacknowledged data
allowed = Minimum(CongestionWindow, AdvertisedWindow)
• TCP’s effective window is revised as follows:
– MaxWindow = MIN(CongestionWindow, AdvertisedWindow)
– EffectiveWindow = MaxWindow − (LastByteSent − LastByteAcked).
– Remember from previous: LastByteSent – LastByteAcked <= AdvertisedWindow
• That is, MaxWindow replaces AdvertisedWindow in the
calculation of EffectiveWindow.
• Thus, a TCP source is allowed to send no faster than the
slowest component—the network or the destination
host—can accommodate.
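The two window formulas above translate directly into code. A minimal sketch, with all quantities in bytes; clamping the result at zero is an extra safety choice of this example rather than part of the formulas.

```python
def effective_window(congestion_window, advertised_window,
                     last_byte_sent, last_byte_acked):
    """EffectiveWindow = MaxWindow - (LastByteSent - LastByteAcked)."""
    max_window = min(congestion_window, advertised_window)
    bytes_in_flight = last_byte_sent - last_byte_acked
    # Clamp at 0 so the caller never sees a negative send allowance
    # (a choice made for this sketch).
    return max(0, max_window - bytes_in_flight)

# The network limits the sender here: CongestionWindow < AdvertisedWindow.
can_send = effective_window(congestion_window=8000, advertised_window=16000,
                            last_byte_sent=5000, last_byte_acked=2000)
```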
• How does TCP come to learn an appropriate value for CongestionWindow?
– Sender needs to know both congestion window and
advertised window.
– AdvertisedWindow is sent by the receiver
– There is no one to send a suitable
CongestionWindow to sender!
• Answer:
– The TCP sender sets the CongestionWindow based on
the level of congestion it perceives to exist in the network
– The sender decreases the congestion window when the
level of congestion goes up
– The sender increases the congestion window when the
level of congestion goes down.
– Taken together, the mechanism is commonly called
additive increase/multiplicative decrease (AIMD)
• How does the source determine that the
network is congested and that it should
decrease the congestion window?
• Answer: Multiplicative Decrease
• The main reason for un-delivered packets, and timeout
occurrences, is that a packet was dropped due to congestion.
• Therefore, TCP interprets timeouts as a sign of congestion and
reduces its current transmission rate.
• Specifically, each time a timeout occurs, the source sets
CongestionWindow to half of its previous value.
• Repetition of this cycle gives the classic sawtooth pattern
• Although CongestionWindow is defined in terms of
bytes, it is easiest to understand multiplicative
decrease if we think in terms of whole packets.
• For example, suppose the current CongestionWindow = 16 packets.
– If a loss is detected, CongestionWindow is set to 8.
– Additional losses cause CongestionWindow to be reduced to
4, then 2, and finally to 1 packet.
– CongestionWindow is not allowed to fall below the MSS, i.e. the
size of a single packet.
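The halving rule with its one-packet floor can be sketched as follows. Here the window is counted in whole packets for readability (an assumption of the example; TCP itself counts bytes), and the function name is illustrative.

```python
def multiplicative_decrease(cwnd_packets, floor_packets=1):
    """Halve the congestion window on loss, never going below one packet."""
    return max(floor_packets, cwnd_packets // 2)

# Repeated losses starting from a 16-packet window.
cwnd = 16
history = []
for _ in range(5):
    cwnd = multiplicative_decrease(cwnd)
    history.append(cwnd)
```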
• Additive Increase:
– Every time the source successfully sends a
CongestionWindow’s worth of packets (that is, each
packet sent out during the last RTT has been
ACKed), it adds the equivalent of 1 packet to CongestionWindow
– i.e. increment CongestionWindow by one packet per
RTT (until a packet loss occurs, in which case AIMD
enters MD mode)
Packets in transit during additive increase, with one packet being added each RTT.
• Each time an ACK arrives, the congestion window is
incremented as follows :
– Increment = MSS × (MSS/CongestionWindow)
– CongestionWindow += Increment
– CongestionWindow >= MSS
– i.e, rather than incrementing CongestionWindow by an entire
MSS bytes each RTT, we increment it by a fraction of MSS
every time an ACK is received.
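The per-ACK increment can be checked numerically. The sketch below applies Increment = MSS × (MSS/CongestionWindow) once per ACK for one window's worth of ACKs. The MSS value of 1000 bytes and the function names are illustrative assumptions.

```python
MSS = 1000  # bytes; illustrative value, not taken from the slides

def on_ack(cwnd, mss=MSS):
    """Grow the congestion window by a fraction of one MSS per ACK."""
    increment = mss * (mss / cwnd)
    return cwnd + increment

def window_after_one_rtt(cwnd, mss=MSS):
    """Apply on_ack once for each full-size packet in the starting window."""
    acks = int(cwnd // mss)
    for _ in range(acks):
        cwnd = on_ack(cwnd, mss)
    return cwnd

grown = window_after_one_rtt(10 * MSS)
```

Because the window grows a little with every ACK, each later increment is slightly smaller, so the total growth over one RTT comes out just under one MSS.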
• Why does TCP decrease congestion window
aggressively and increase it conservatively?
• One intuitive reason is that the consequences of having
too large a window are much worse than those of it
being too small.
• For example, when the window is too large, packets
that are dropped will be retransmitted, making congestion even worse
• Thus, it is important to get out of this state quickly.
• Problem:
– AIMD is good for channels operating close to network capacity
– But AIMD takes a long time to ramp up to full capacity when it
has to start from scratch
– Why? AIMD increases the congestion window linearly.
• Solution:
– Use Slow Start to increase window rapidly from a cold start
Slow Start
• Another congestion control mechanism
• Increase the congestion window rapidly from a
cold start.
• Effectively increases the congestion window
exponentially, rather than linearly.
• Start:
– Set Congestion window = 1 packet
– Send 1 packet
• Upon receipt of ACK:
– Set Congestion window = 1+1 = 2 packets
– Send 2 packets
• Upon receipt of 2 ACKs:
– Set Congestion window = 2+2 = 4 packets
– Send 4 packets
• The end result is that TCP effectively doubles the number
of packets it has in transit every RTT.
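The doubling described above can be sketched per RTT. It assumes no losses and that every packet in the window is ACKed within one RTT; the window is counted in packets and the function name is made up for the example.

```python
def slow_start_windows(rtts, initial=1):
    """Congestion window (in packets) at the start of each RTT during slow start."""
    cwnd = initial
    windows = []
    for _ in range(rtts):
        windows.append(cwnd)
        # Each of the cwnd ACKs received this RTT adds one packet to the window.
        cwnd += cwnd
    return windows

first_five = slow_start_windows(5)
```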
Slow start: exponential increase in sender window
Additive increase: linear increase in sender window
• There are two situations in which slow start runs:
1. At the very beginning of a connection (Cold Start)
2. When the connection goes dead while waiting for a
timeout to occur.
1. At the very beginning of a connection (Cold Start)
• Source doesn’t know how many packets it is going to have in
transit at a given time.
– TCP runs over everything from 9600bps link to 2.4Gbps link
• Slow start continues to double CongestionWindow each RTT
until there is a loss
• When a loss is detected, timeout causes multiplicative decrease
to divide CongestionWindow by 2.
2. When the connection goes dead while waiting for a timeout to occur.
• Original TCP Sliding Window:
– The sender has already sent an advertised window’s worth of data
– A packet gets lost
– The source waits for an ACK; eventually a timeout occurs and it retransmits
the lost packet.
– The source gets a single cumulative ACK that reopens an entire
advertised window
• Slow Start:
– The source then uses slow start to restart the flow of data rather
than dumping a whole window’s worth of data on the network all
at once.
• Target congestion window:
– After detecting packet loss:
• Update congestion window =
CongestionWindow prior to the last packet loss/2
• This is called the target congestion window, as opposed to the current
(actual) congestion window
– Now we need to remember both:
• “target” congestion window:
– resulting from multiplicative decrease
– Called Congestion Threshold (or simply threshold)
• “actual” congestion window:
– used by slow start.
– Simply called Congestion Window (CW).
– How to update current/actual CW?
• Upon packet loss calculate threshold= CW/2
• Reset actual CW = 1
• Use Slow start to rapidly increase the sending rate up to
target CW, and then use additive increase beyond this
– Target congestion window is where Slow start ends
and AIMD begins.
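The interaction between the threshold and the actual congestion window can be sketched as a per-RTT update. Windows are counted in packets, the initial threshold of 16 and the loss position are arbitrary values chosen for illustration, and the function names are hypothetical.

```python
def next_window(cwnd, threshold, loss):
    """Return (cwnd, threshold) for the next RTT, both in packets."""
    if loss:
        # Timeout: remember half the current window as the target, restart from 1.
        return 1, max(1, cwnd // 2)
    if cwnd < threshold:
        # Slow start: double, but do not overshoot the threshold.
        return min(cwnd * 2, threshold), threshold
    # At or above the threshold: additive increase, one packet per RTT.
    return cwnd + 1, threshold

def simulate(rtts, loss_at, initial_threshold):
    cwnd, threshold = 1, initial_threshold
    history = []
    for rtt in range(rtts):
        history.append(cwnd)
        cwnd, threshold = next_window(cwnd, threshold, rtt in loss_at)
    return history

# One loss detected during RTT 5, when the window is 17 packets.
trace = simulate(rtts=12, loss_at={5}, initial_threshold=16)
```

After the loss the threshold becomes 8, so the window climbs 1, 2, 4, 8 by slow start and then grows by one packet per RTT.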
Behavior of TCP congestion control.
Colored line = value of CongestionWindow over time;
solid bullets at top of graph = timeouts;
hash marks at top of graph = time when each packet is transmitted;
vertical bars = time when a packet that was eventually retransmitted was first transmitted.
The trace shows slow start phases, each followed by additive increase.
• Initial slow start:
– Rapid increase due to exponential increase in sender window
• Packets lost. Why?
– TCP attempts to learn the network bandwidth
– It uses exponential growth of the congestion window to do this.
– The source runs the risk of having half a window’s worth of packets dropped by the network.
• E.g. network capacity = 16 packets
• Source sends 1, 2, 4, 8, 16 successfully, so it then bumps the congestion window up to 32
• 32 − 16 = 16 packets are dropped (worst case; some packets might be buffered at some router)
• Congestion window flattens. Why?
– No new packets are sent (notice no hash marks)
– Because several packets were lost.
– So no ACKs arrive at sender
• Timeout occurs.
– Congestion window is divided by 2: ~34 KB → ~17 KB
– Set CongestionThreshold = ~17 KB
– Reset CongestionWindow = 1 KB
– Use exponential increase in CongestionWindow (slow start) to arrive at CongestionThreshold
– Use additive increase afterwards (not apparent in the figure)
• Summary of the cycle: packets lost → timeout occurs → window reset
to 1 packet → slow start runs → additive increase runs
Fast Retransmit and Fast Recovery
• Coarse-grained implementation of TCP timeouts led to long
periods of time during which the connection went dead
while waiting for a timer to expire.
• To solve this, a new mechanism called fast retransmit was
added to TCP.
• Fast retransmit is a heuristic that sometimes triggers the
retransmission of a dropped packet sooner than the regular
timeout mechanism.
• Fast Retransmit
– Receiver sends a duplicate ACK:
• Receiver gets data
• Receiver responds with an acknowledgment, even if this
sequence number has already been acknowledged
(duplicate ACK).
• Out-of-order packet receipt:
– TCP cannot yet acknowledge the data the
packet contains because earlier data has not
yet arrived
– TCP resends the same acknowledgment
(duplicate ACK) it sent the last time.
– Sender receives a duplicate ACK
– So, the sender interprets this as the receiver having got a
packet out of order
• What are the reasons for receiving packets out of order?
• Answer:
– Earlier packet might have been lost or delayed
• Question:
– How does the sender make sure it is actually a lost
packet, not a delayed packet?
– Sender waits until it sees some number of duplicate
ACKs (to make sure it is not a delayed packet but a
lost packet)
– Why does receiver send multiple duplicate ACKs?
– For every single packet sent after the missing packet,
receiver will send a duplicate ACK
– Sender then retransmits the missing packet.
– In practice, TCP waits until it has seen three
duplicate ACKs before retransmitting the packet.
• Figure illustrates how duplicate ACKs lead to a fast retransmit:
– The destination receives packets 1 and 2, but packet 3 is lost in the network
– The destination will send a duplicate ACK for packet 2 when packet 4
arrives, again when packet 5 arrives, and so on
– The sender sees the 3rd duplicate ACK for packet 2 and retransmits packet 3
– The destination then sends a cumulative ACK for everything up to and
including packet 6
– To simplify this example, we think in terms of packets 1, 2, 3, and so on,
rather than worrying about the sequence numbers for each byte.
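The sender-side duplicate-ACK counting can be sketched as follows. As in the example above, ACKs are expressed in whole packet numbers rather than byte sequence numbers; the function name and the list-based interface are choices made for this illustration.

```python
def fast_retransmit_events(acks, dup_threshold=3):
    """Return the packet numbers retransmitted after dup_threshold duplicate ACKs.

    `acks` is the sequence of cumulative ACKs seen by the sender, where ACK n
    means "everything up to and including packet n has arrived".
    """
    last_ack = None
    dup_count = 0
    retransmits = []
    for ack in acks:
        if ack == last_ack:
            dup_count += 1
            if dup_count == dup_threshold:
                retransmits.append(ack + 1)  # first packet the receiver is missing
        else:
            last_ack = ack
            dup_count = 0
    return retransmits

# Packets 1..6 sent, packet 3 lost: packets 4, 5 and 6 each trigger a duplicate ACK 2.
events = fast_retransmit_events([1, 2, 2, 2, 2, 6])
```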
Trace of TCP with fast retransmit. Colored line = CongestionWindow;
solid bullet = timeout; hash marks = time when each packet is transmitted; vertical
bars = time when a packet that was eventually retransmitted was first transmitted.
• Compared with the trace without fast retransmit, fast retransmit improves
throughput by ~20%
• However, the fast retransmit strategy does not eliminate all
coarse-grained timeouts.
– Why?
• This is because for a small window size there will
not be enough packets in transit to cause
enough duplicate ACKs to be delivered.
• Fast Recovery
– Fast retransmit signals congestion.
• due to lost packets
– With fast recovery:
• The sender, instead of returning to slow start, uses pure AIMD
– i.e. the sender simply reduces the congestion window by half
and resumes additive increase.
– Why? Slow start would reset CW = 1 and then have to ramp back up to the threshold.
• Thus, recovery is faster; this is called Fast Recovery.
TCP Reno
• The version of TCP wherein fast retransmit
and fast recovery are added to the
previous congestion control mechanisms is
called TCP Reno.
– Has other features:
• header compression (if ACKs are being received
regularly, omit some fields of the TCP header).
• delayed ACKs: ACK only every other segment.
• Where are we?
• We are done with Section 6.3.
• We now move on to looking at more
sophisticated congestion avoidance mechanisms
Why Congestion Avoidance?
• TCP does congestion control:
– Reactive approach:
• Let congestion occur, then control it
– TCP needs to create losses to determine the
available bandwidth:
• TCP increases load to determine when congestion occurs
and then backs off.
• Packet losses used to determine congestion.
– This is costly : Causes packet losses
Congestion Avoidance Mechanism
• Can we do better?
– Avoid congestion ?
– Prevent losses?
– Need a proactive approach!
• Can we predict the onset of congestion ?
– If so, we can reduce the sending rate just before
packets are dropped.
• We introduce 3 congestion avoidance mechanisms:
1. DECbit
2. Random Early Detection (RED)
3. Source based Congestion Avoidance
• Router-Based Congestion Avoidance: DECbit, RED
– Put additional functionality into the router to help end nodes
detect congestion
• Host-Based Congestion Avoidance: Source-Based Congestion Avoidance
– Avoid congestion purely from the end nodes.
• DECbit evenly splits the responsibility of congestion control
between end hosts and routers.
• At Router:
– Router monitors congestion
– It explicitly notifies end-host when congestion is about to occur
– How to notify?
• Each packet has a “Congestion Notification” bit called the
DECbit in its header.
• If the router is congested, it sets the DECbit of packets that flow
through it.
• At Destination end-host:
– The notification reaches the destination
– Destination copies the bit into the ACK that it sends to the source.
• At Source end-host:
– In response, the source adjusts transmit rate.
• How does a router monitor congestion?
– When does a router set the DECbit of packets flowing through it?
• A router sets the DECbit of a packet if its average queue
length is >= 1 at the time the packet arrives
• Therefore, criterion for congestion:
– average queue length of router >= 1 packet
• How to measure average queue length?
– Queue length is averaged over a time interval that spans the
last busy+idle cycle, plus the current busy cycle.
– Busy: router is sending data
– Idle: router is not sending data
• Source adjusts rate to avoid congestion.
– Counts the fraction of packets with the DECbit set in each congestion window.
– If < 50% set, increase the congestion window additively (by 1 packet).
– If >= 50% set, decrease the congestion window multiplicatively (to 0.875 times its previous value).
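The source's decision rule can be sketched as a single function. The window is counted in packets, `bits` is assumed to hold one 0/1 flag per ACK received for the last window, and the function name is illustrative.

```python
def decbit_adjust(cwnd, bits):
    """Adjust the congestion window from the DECbits echoed in the last window's ACKs."""
    if not bits:
        return cwnd                  # no feedback yet: leave the window alone
    fraction_set = sum(bits) / len(bits)
    if fraction_set < 0.5:
        return cwnd + 1              # additive increase by one packet
    return cwnd * 0.875              # multiplicative decrease

grown = decbit_adjust(8, [0, 0, 1, 0])   # 25% of bits set
shrunk = decbit_adjust(8, [1, 1, 0, 0])  # 50% of bits set
```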
Random Early Detection (RED)
• RED is based on DECbit:
– Each router monitors its queue length
– When a router detects that congestion is about to happen, it
notifies the source to adjust its congestion window.
• RED implicitly notifies sender by dropping packets
– Source notification: timeout/duplicateACK
– So, it works well with TCP
• Early Detection:
– Router drops a few packets before the queue is full.
– This notifies the source to slow down sooner than it would normally have
– So, the router does not have to drop lots of packets later on.
• When to drop a packet?
• What packets to drop?
• Answer: Early Random Drop:
• Drop each arriving packet with some drop probability
whenever the queue length exceeds some drop level.
• Drop probability is increased as the average queue
length increases.
• Computing average queue length:
– Using a weighted running average:
• AvgLen = (1 − Weight) × AvgLen + Weight × SampleLen
• Weight: a value between 0 and 1 (0 < Weight < 1)
• SampleLen: queue length when a sample measurement is made.
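The weighted running average can be sketched in a few lines. The default Weight of 0.002 and the burst of ten samples are illustrative values chosen for this example, not parameters given in the slides.

```python
def update_avg_len(avg_len, sample_len, weight=0.002):
    """AvgLen = (1 - Weight) * AvgLen + Weight * SampleLen."""
    return (1 - weight) * avg_len + weight * sample_len

# A short burst: the instantaneous queue length jumps to 50 packets for 10 samples.
avg = 0.0
for _ in range(10):
    avg = update_avg_len(avg, sample_len=50)
```

With a small Weight the average barely moves during a short burst, which is the filtering effect the averaging is meant to provide.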
• Average queue length captures the notion of congestion (long-lived
congestion) better than an instantaneous measure.
– Bursty traffic: queues become full fast and empty fast.
– Using the instantaneous queue length to decide about congestion is bad:
it may reflect only a small burst of traffic, while the queue is not so
full at other times.
– With the weighted average, short-term changes are filtered out.
• RED has two queue length thresholds that trigger certain activity:
– MinThreshold
– MaxThreshold
• When a packet arrives at the router, RED compares the current
AvgLen with these two thresholds, according to the following rules:
• if AvgLen ≤ MinThreshold
→ queue the packet
• if MinThreshold < AvgLen < MaxThreshold
→ calculate probability P
→ drop the arriving packet with probability P (random drop)
• if MaxThreshold ≤ AvgLen
→ drop the arriving packet
• Drop probability P:
– A function of both AvgLen and how long it has been since the last packet was
dropped
• P is computed as follows:
TempP = MaxP × (AvgLen − MinThreshold)/(MaxThreshold − MinThreshold)
P = TempP/(1 − count × TempP)
– count stands in for the time elapsed since the last packet was dropped.
• Note: Above calculation assumes queue size is measured in packets.
• The extra step that turns TempP into P helps space out drops:
– count is the number of packets queued since the last drop.
– As count increases, P increases.
[Figure: RED thresholds on a FIFO queue]
[Figure: Drop probability function for RED, with MaxP as the maximum drop probability]
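The threshold rules and the two-step probability calculation can be combined into one small sketch. The function names, the example thresholds, and the handling of the case where the denominator reaches zero (treated here as a certain drop) are assumptions of this illustration, not part of the slides.

```python
import random


def red_drop_probability(avg_len: float, min_th: float, max_th: float,
                         max_p: float, count: int) -> float:
    """Drop probability for an arriving packet under RED.

    count is the number of packets queued since the last drop.
    """
    if avg_len <= min_th:
        return 0.0  # below MinThreshold: always queue
    if avg_len >= max_th:
        return 1.0  # at or above MaxThreshold: always drop
    temp_p = max_p * (avg_len - min_th) / (max_th - min_th)
    denom = 1.0 - count * temp_p
    if denom <= 0.0:
        return 1.0  # assumed behaviour: a drop is overdue, so force it
    return min(1.0, temp_p / denom)


def red_should_drop(avg_len: float, min_th: float, max_th: float,
                    max_p: float, count: int, rng=random.random) -> bool:
    """Randomly decide whether to drop, using the probability above."""
    return rng() < red_drop_probability(avg_len, min_th, max_th, max_p, count)
```

With the assumed values MinThreshold = 5, MaxThreshold = 15 and MaxP = 0.02, an AvgLen of 10 gives TempP = 0.01. P is 0.01 right after a drop (count = 0) and rises to 0.02 once 50 packets have been queued, which is how count spreads the drops out in time.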
• Why is spacing out packet drops necessary?
• If TempP were used directly as the drop probability, packet drops would
not be well distributed in time (they would occur in clusters)
• Why is dropping clusters of packets bad?
– Packets from a given connection are likely to arrive in
bursts (clusters)
– Clustering of drops can cause multiple drops in a single
connection
– A packet burst in one connection can push the AvgLen of the queue into the range
between MinThreshold and MaxThreshold; most of the packets dropped will then belong to
that connection
Properties of RED
• Drops packets before queue is full
– In the hope of reducing the rates of some flows
• Drops packets in proportion to each flow’s rate
– High-rate flows have more packets
– Hence, a higher chance of being selected for dropping
• Drops are spaced out in time
– Which should help desynchronize the TCP senders
• Tolerant of burstiness in the traffic
– By basing the decisions on average queue length
Problems With RED
• Hard to get tunable parameters just right
– How early to start dropping packets?
– What slope for increase in drop probability?
– What time scale for averaging queue length?
• RED has mixed adoption in practice
– If parameters aren’t set right, RED doesn’t help
– Hard to know how to set the parameters
Source-based Congestion Avoidance
• Hosts watch for some sign from the network that:
– some router’s queue is building up and
– congestion will happen soon if nothing is done about it.
• How does an end host know that a router’s queue is building up?
– The host notices:
• a measurable increase in the RTT for each successive packet
it sends.
• a flattening of the sending rate
• Example Algorithm 1:
Increase the congestion window normally (like TCP)
For every two RTT delays:
  If current RTT > Average(min RTT seen so far, max RTT seen so far):
    decrease the congestion window by 1/8
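Algorithm 1 can be sketched as a small class that remembers the smallest and largest RTT seen so far. The class and method names are hypothetical, the normal TCP-style window increase is left out, and including the current RTT in the min/max before comparing is an assumption of this sketch.

```python
class RttThresholdSource:
    """Source that shrinks its window when the RTT exceeds the min/max midpoint."""

    def __init__(self, cwnd: float = 8.0):
        self.cwnd = cwnd      # congestion window, in packets
        self.min_rtt = None   # smallest RTT seen so far
        self.max_rtt = None   # largest RTT seen so far

    def check(self, current_rtt: float) -> float:
        """Call once every two RTTs; returns the (possibly reduced) window."""
        if self.min_rtt is None or current_rtt < self.min_rtt:
            self.min_rtt = current_rtt
        if self.max_rtt is None or current_rtt > self.max_rtt:
            self.max_rtt = current_rtt
        threshold = (self.min_rtt + self.max_rtt) / 2
        if current_rtt > threshold:
            self.cwnd -= self.cwnd / 8  # decrease the window by 1/8
        return self.cwnd
```

Starting from a window of 8 packets, RTT samples of 100 ms, 200 ms and 120 ms leave the window at 8, reduce it to 7, and then leave it at 7, because only the 200 ms sample lies above the midpoint of the observed range.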
• Example Algorithm 2 (similar):
– Updating current window size is based on:
• Changes to RTT AND
• Changes to the window size.
For every two RTT delays:
  Calculate (CurrentWindow − OldWindow)×(CurrentRTT − OldRTT)
  If the result is positive:
    the source decreases the window size by 1/8
  If the result is negative or 0:
    the source increases the window by one maximum packet size.
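The second algorithm reduces to a sign test on one product. The sketch below assumes the window is measured in bytes and uses hypothetical names (`adjust_window_by_trend`, `max_packet_size`); it is an illustration of the rule, not a complete sender.

```python
def adjust_window_by_trend(cur_window: float, old_window: float,
                           cur_rtt: float, old_rtt: float,
                           max_packet_size: float) -> float:
    """Return the new window, evaluated once every two RTTs."""
    product = (cur_window - old_window) * (cur_rtt - old_rtt)
    if product > 0:
        # Window and RTT moved in the same direction: back off by 1/8.
        return cur_window - cur_window / 8
    # Negative or zero: grow by one maximum packet size.
    return cur_window + max_packet_size
```

For example, if the window grew from 14000 to 16000 bytes while the RTT rose from 100 ms to 120 ms, the product is positive and the window drops to 14000 bytes. If the RTT fell instead, the window grows by one packet.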
Quality of Service (QoS)
• What is QoS?
– Providing guarantees/bounds on various
network properties:
• Available bandwidth for flows
• Delay bounds
• Jitter (variation in delay)
• Packet loss
• The Internet currently provides one single class of
“best-effort” service
– No assurances about delivery
• Most existing applications:
– E.g. multimedia applications
– Tolerate delays and losses
– Can adapt to congestion
– Can use retransmissions to make sure data arrives correctly.
• Some “real-time” applications:
– E.g. teleconferencing
– Need assurance from the network that data arrives on time
– Retransmission is not acceptable: it adds latency
– Both end hosts and routers are responsible for timeliness of
data delivery
– Therefore, best-effort delivery is not sufficient for real-time
applications
• What we need is a new service model:
– Applications can request higher assurance from the network
– The network treats some packets differently from others
– A network that can provide these different levels of service
is often said to support quality of service (QoS).
• Question:
– Doesn’t the Internet already support real-time applications?
– So why a new service model?
• Answer:
– E.g. Skype (voice/video over the Internet) seems to work OK
– Why? Best-effort service is often quite good.
– However, if you want a reliable service for real-time
applications, best-effort isn’t good enough.
Application Requirements
• What are the different needs of applications?
• Applications are of two types:
– Non-real-time / elastic applications
– Real-time / non-elastic applications
• Non real-time/ Elastic Applications:
– Traditional data applications
– E.g. telnet, FTP, email, web browsing etc.
– Can work without guarantees of timely delivery of data.
– Can gracefully tolerate delay and losses:
• They do not become unusable as delay increases
• Nice if data arrives on time, if not, it is still usable.
– Delay requirements vary by application:
• telnet (very interactive : low delay required)
• FTP (interactive bulk transfer)
• Email (least interactive: delay tolerable)
E.g. Audio application
• Real-time applications:
– If a data packet is delayed, it is unusable.
– E.g. audio application:
• Data: audio samples
• Digital samples are placed in packets and transferred over the network
• Digital samples are received at the other end and converted to analog signals
• Audio samples are played back at some appropriate rate
• Audio application:
– Each audio sample has a particular playback time
– Playback rate = voice sample collection rate
– E.g.
• Sample collection rate at the sender = 1 sample per 125 μs
• Playback time of a sample = 125 μs later than the preceding sample
– What happens if data arrives after its appropriate playback time
(delayed/retransmitted)?
• The packet is useless
• How to make sure packets arrive in time?
– Obviously, different packets in an audio stream may
experience different delays
• Why?
– Packets queued in switches and routers
– Queue lengths vary over time
– So, they may not arrive at expected time.
• Solution: Introduce a playback buffer
• Playback buffer:
– The receiver buffers up some incoming data in reserve
– A store of packets waiting to be played at the right time
– Playback point:
• Add a constant delay offset to the playback time
– Short packet delay: OK
• The packet waits in the buffer until its playback time arrives
– Long packet delay: OK
• The packet goes in the buffer (assuming a non-empty queue) and is played soon after
– Extremely long packet delay: NOT OK
• The packet arrives after its playback time
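The three delay cases above can be sketched with a small buffer keyed on playback time. The class name `PlaybackBuffer`, its methods, and the use of integer milliseconds for timestamps are assumptions of this illustration.

```python
import heapq


class PlaybackBuffer:
    """Holds packets until send time + a constant offset (the playback point)."""

    def __init__(self, offset_ms: int):
        self.offset_ms = offset_ms
        self._heap = []  # entries are (playback_time_ms, sequence_number)
        self.late = 0    # packets that arrived after their playback time

    def receive(self, seq: int, send_ms: int, arrival_ms: int) -> bool:
        """Buffer the packet, or discard it if it arrived too late."""
        playback_ms = send_ms + self.offset_ms
        if arrival_ms > playback_ms:
            self.late += 1  # extremely long delay: the packet is useless
            return False
        heapq.heappush(self._heap, (playback_ms, seq))
        return True

    def due(self, now_ms: int) -> list:
        """Pop every packet whose playback time has arrived, in playback order."""
        ready = []
        while self._heap and self._heap[0][0] <= now_ms:
            ready.append(heapq.heappop(self._heap)[1])
        return ready
```

With a 300 ms offset, a packet sent at 0 ms that arrives at 50 ms waits in the buffer until 300 ms, a packet sent at 20 ms that arrives at 310 ms is still played at 320 ms, and a packet sent at 40 ms that arrives at 400 ms is discarded because its playback time (340 ms) has passed.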
• Operation of a playback buffer
[Figure (from YouTube): the playback point, the part already played at the client/receiver, and the playback buffer]
[Figure: operation of a playback buffer, showing sending time, receiving time, and playback time]
• How far can we delay playback of data?
– For audio applications it is 300 ms.
– i.e. the maximum delay between when you speak and when a
listener hears you that still lets you carry on a conversation.
– The application wants the network to guarantee that all its data
arrives within 300 ms.
[Figure: Delay measured over certain paths of the Internet]
– 97% of packets have a latency of 100 ms or less
– If the audio application sets its playback point at 100 ms,
3 out of every 100 packets will arrive too late
– The distribution has a long tail: we need to set the playback point over 200 ms to
ensure that all packets arrive in time
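The trade-off between playback point and late packets can be made concrete with a small calculation over measured delays. The function names and the synthetic delay samples below are assumptions that only imitate the shape described above (97% at or below 100 ms, plus a long tail).

```python
def late_fraction(delays_ms: list, playback_point_ms: float) -> float:
    """Fraction of packets whose delay exceeds the playback point."""
    late = sum(1 for d in delays_ms if d > playback_point_ms)
    return late / len(delays_ms)


def min_playback_point(delays_ms: list, allowed_late: int) -> float:
    """Smallest observed delay that leaves at most allowed_late packets late."""
    ordered = sorted(delays_ms)
    allowed_late = min(allowed_late, len(ordered) - 1)
    return ordered[len(ordered) - 1 - allowed_late]


# Synthetic sample: 97 packets between 40 and 100 ms, 3 packets in the tail.
delays = [40 + (i % 61) for i in range(97)] + [150, 180, 210]
```

On this sample, a playback point of 100 ms loses 3 packets in 100, while tolerating no late packets at all pushes the playback point out to 210 ms.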
Taxonomy of Real-Time Applications
• Let us look at different classes of applications that serve to
motivate our service model
• Classification 1: Based on tolerance of occasional loss of data
1. Tolerant applications (e.g. audio applications):
• A packet counts as lost in two cases:
– normal packet loss in the network
– the packet arrives too late to be played back
• Small fix (occasional loss):
– Interpolate the lost sample from the surrounding samples (little effect on
audio quality)
• If more and more samples are lost:
– Voice quality declines until speech becomes incomprehensible
2. Intolerant applications (e.g. robot control program):
• A command sent to a robot arm must reach it before the arm crashes
into something
• Losing a packet is unacceptable
• Note that many real-time applications are more
tolerant of occasional loss than many non-real-time
applications
– E.g. audio vs. FTP (losing even 1 bit can make a file completely useless)
• Classification 2: Based on adaptability to delay
1. Delay-adaptive applications (e.g. audio applications):
• The application can adapt to the amount of delay packets experience in
traversing the network
• E.g.
– The application notices that packets are almost always arriving within
300 ms of being sent
– It therefore sets its playback point accordingly, buffering any packets
that arrive in less than 300 ms.
– Suppose that the application subsequently observes that all packets
are arriving within 100 ms of being sent.
– If the application adjusts its playback point to 100 ms:
» users of the application would probably perceive an
improvement
» the shift would require us to play samples at an increased rate for some
period of time
– We should advance the playback point only when:
» We have a perceptible advantage AND
» We have some evidence that the number of lost packets is acceptably
small
2. Rate-adaptive applications:
– Another class of adaptive applications
– E.g. many video coding algorithms can trade off bit rate
versus quality.
• A video is a finite sequence of correlated images.
• A higher bit rate accommodates higher image quality in the video
• Video coding = image compression + temporal component.
– If the network can support a certain bandwidth, we can set
our coding parameters accordingly.
– If more bandwidth becomes available later, we can change
parameters to increase the quality.
• We need a richer service model than best-effort that meets the
needs of any application
• This leads us to a service model with several classes, each
available to meet the needs of some set of applications.
• Approaches have been developed to provide a range of qualities
of service
• Two categories:
1. Fine-grained approaches
2. Coarse-grained approaches
Approaches to QoS Support
1. Fine-grained approaches:
• Provide QoS to individual applications or flows
• Integrated Services:
– Often associated with the Resource Reservation
Protocol (RSVP)
2. Coarse-grained approaches:
• Provide QoS to large classes of data or aggregated traffic
• Differentiated Services:
– Probably the most widely deployed QoS mechanism
Integrated Services (RSVP)
– A body of work IETF produced around 1995-1997
– Defines:
• A number of service classes that meet needs of some
application types.
• How RSVP could be used to make reservations using
service classes.
• Service Classes
1. Guaranteed Service:
• Designed for intolerant applications.
• These applications require that packets never arrive late.
• The network should guarantee that the maximum delay that any packet will
experience has some specified value
2. Controlled Load Service:
• Designed for tolerant, adaptive applications
• These applications run quite well on lightly loaded networks
• The service should emulate a lightly loaded network for those applications that
request the service, even though the network may in fact be heavily
loaded
• Overview of Mechanisms (Steps):
1. The real-time application provides flowspecs to the network:
• Flowspecs specify the type of service required
2. The network performs admission control:
• The network decides if it can provide that service.
• Admission control: the process of deciding when to say no.
3. Network users/components perform resource reservation:
• Users and components of the network exchange information:
– requests for service, flowspecs, admission control decisions.
• Done using RSVP
4. Switches/routers perform packet scheduling:
• Network switches and routers meet the requirements of the flows by
managing the way packets are queued and scheduled for transmission.
1. The real-time application provides flowspecs to the network
A flowspec has 2 parts:
• TSpec: the part that describes the flow’s traffic
– E.g. bandwidth used by the flow
» not always a single number per application
» A video application generates more bits per second when the scene is
changing rapidly than when it is still
» So specifying only the average bandwidth is insufficient
• RSpec: the part that describes the service requested from
the network.
– E.g. a target delay bound (guaranteed service); no additional
parameter (controlled load service)
2. The network performs admission control:
– Admission control looks at the TSpec and RSpec of the flow.
– Then it tries to decide if the desired service can be provided to
the amount of traffic requested, given the currently available
resources, without causing any previously admitted flow to
receive worse service than it had requested.
– If it can provide the service, the flow is admitted.
– If not, the flow is denied.
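A deliberately simplified sketch of the admit/deny decision is shown below. It assumes the TSpec is reduced to a single rate in bits per second and that the only resource checked is the bandwidth of one link; real admission control also has to consider the RSpec (for example a delay bound) and the router's scheduling discipline. The `Link` class and its fields are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    """One outgoing link with a fixed capacity and a table of admitted flows."""
    capacity_bps: float
    reserved_bps: dict = field(default_factory=dict)  # flow id -> reserved rate

    def admit(self, flow_id: str, tspec_rate_bps: float) -> bool:
        """Admit the flow only if existing reservations are left untouched."""
        in_use = sum(self.reserved_bps.values())
        if in_use + tspec_rate_bps > self.capacity_bps:
            return False  # admitting would degrade already admitted flows
        self.reserved_bps[flow_id] = tspec_rate_bps
        return True
```

On a 10 Mbps link, a 6 Mbps flow is admitted, a following 5 Mbps flow is denied, and a 4 Mbps flow is then admitted because it still fits in the remaining capacity.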
3. Network users/components perform resource
reservation (RSVP):
– Connectionless networks like the Internet have had no setup
protocols
– However, real-time applications need to provide a lot more
setup information to the network.
– Resource Reservation Protocol (RSVP) is one of the most
popular setup protocols.
• The connectionless nature of the Internet is robust:
– Connectionless networks rely on little or no
state being stored in the network
– Routers can crash/reboot and links can go up/down
while end-to-end connectivity is still maintained.
• RSVP tries to maintain this robustness by using the idea of
soft state in the routers.
• RSVP also supports multicast flows just as effectively as
unicast flows.
• E.g. unicast: one sender and one receiver trying to get a
reservation for traffic flowing between them
[Figure: Making reservations on a multicast tree]
• The sender sends a PATH message carrying its TSpec to the receiver
– This informs the receiver of the type of traffic and the path used by
the sender
– This helps the receiver make the appropriate resource
reservation at each router on the path
• Each router on the path looks at the PATH message and figures out the
reverse path that will be used to send reservations from the
receiver back to the sender
• The receiver sends a reservation back “up”
the multicast tree in a RESV message, carrying the sender’s TSpec and the receiver’s RSpec
• Each router on the path looks at the
reservation request and tries to allocate
the necessary resources to satisfy it.
– If the reservation can be made, the RESV
request is passed on to the next router.
– If not, an error message is returned to
the receiver who made the request.
• If all goes well, the correct reservation is installed at every router between the sender
and the receiver.
• As long as the receiver wants to retain the reservation, it sends the same RESV message
about once every 30 seconds
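The soft-state idea can be sketched as a table whose entries disappear unless they are refreshed. The class name `SoftStateTable` and the 90-second lifetime are assumptions for illustration; only the roughly 30-second refresh interval comes from the text above.

```python
class SoftStateTable:
    """Per-router reservation state that times out unless it is refreshed."""

    def __init__(self, lifetime_s: float = 90.0):
        self.lifetime_s = lifetime_s  # assumed lifetime, not a protocol constant
        self._expires_at = {}         # flow id -> time at which the entry expires

    def refresh(self, flow_id: str, now: float) -> None:
        """Install or refresh a reservation (called when a RESV arrives)."""
        self._expires_at[flow_id] = now + self.lifetime_s

    def expire(self, now: float) -> list:
        """Remove and return every reservation that was not refreshed in time."""
        stale = [f for f, t in self._expires_at.items() if t <= now]
        for flow_id in stale:
            del self._expires_at[flow_id]
        return stale

    def is_reserved(self, flow_id: str) -> bool:
        return flow_id in self._expires_at
```

If a receiver refreshes at 0 s and 30 s and then goes silent, the reservation is still present at 100 s and is removed at 120 s. No explicit teardown message is needed, which is why a router crash or a lost receiver does not leave stale reservations behind forever.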
4. Switches/routers perform packet scheduling:
– Finally, routers deliver the requested service to the data
packets
– There are two things that need to be done:
1. Classify packets:
– Associate each packet with the appropriate
reservation so that it can be handled correctly.
2. Schedule Packets:
– Manage the packets in the queues so that they
receive the service that has been requested
• Approaches have been developed to provide a range of qualities
of service
• Two categories:
1. Fine-grained approaches
– Integrated Services (RSVP)
2. Coarse-grained approaches
– Differentiated Services
Differentiated Services (DiffServ)
• Integrated Services architecture:
– Allocates resources to individual flows
• Differentiated Services model:
– Allocates resources to a small number of classes of traffic.
• E.g. add one new class, “premium.”
• How to identify packets as premium or not?
– Use a bit in the packet header:
• bit=1 → premium packet
• bit=0 → best-effort packet
• Who sets the premium bit, and under what
circumstances?
• Answer:
– Many possible answers
– A common approach is to set the bit at an administrative
boundary
– E.g. the router at the edge of an ISP’s network might set the bit
for packets arriving on an interface that connects to a
particular company’s network.
– Why?
• The Internet service provider might do this because that company has
paid for a higher level of service than best effort.
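Marking at the edge can be sketched as a lookup on the interface a packet arrived on. The `Packet` type, the interface names, and the choice to overwrite whatever value the bit had on arrival are assumptions of this illustration.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    in_interface: str      # interface on which the edge router received it
    premium: bool = False  # the one-bit class marker in the header


def mark_at_edge(packet: Packet, premium_interfaces: set) -> Packet:
    """Set the premium bit according to the arrival interface only.

    Any value the sender put in the bit is overwritten (an assumption of
    this sketch): the ISP, not the host, decides who gets premium service.
    """
    packet.premium = packet.in_interface in premium_interfaces
    return packet
```

If the paying company is attached to a hypothetical interface `eth1`, packets arriving there leave the edge router marked premium, while a packet arriving on `eth2` is reset to best effort even if it claimed premium.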
• What does a router do differently when it sees a
packet with the bit set?
• Answer:
– The IETF standardized the behavior of routers.
• Called “per-hop behaviors” (PHBs)
– The Expedited Forwarding (EF) PHB:
• One of the simplest PHBs
• If a packet is marked for EF treatment, it should be
forwarded by the router with minimal delay and loss.
• How to guarantee this?
– The router makes sure the arrival rate of EF packets at the router < the rate at which
the router can forward EF packets.
• True or False?
• Suppose host A is sending a large file to host B
over a TCP connection. The number of
unacknowledged bytes that A sends cannot
exceed the size of the advertised receiver buffer.
• Answer: True
• TCP is not permitted to overflow the allocated
receiver buffer.
• Hence, when the sender cannot send any more
data, ReceiverWindow would be 0
• So the whole buffer would be filled with unacknowledged
data
• TCP waits until it has received three duplicate
ACKs before performing a fast retransmit.
• Why do you think the TCP designers chose not
to perform a fast retransmit after the first
duplicate ACK for a segment is received?
• Packets can arrive out of order from the IP layer.
• So whenever an out-of-order packet is
received, it generates a duplicate ACK
– If we retransmitted after the first duplicate
ACK, the sender would introduce too many
redundant packets into the network.
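The effect of the duplicate-ACK threshold can be seen in a small counter. The class name `DupAckCounter` and the ACK numbers used below are hypothetical; the sketch only counts duplicates and does not model the retransmission itself.

```python
class DupAckCounter:
    """Signals a fast retransmit once enough duplicate ACKs have arrived."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.last_ack = None
        self.dup_count = 0

    def on_ack(self, ack_no: int) -> bool:
        """Return True exactly when the duplicate count reaches the threshold."""
        if ack_no == self.last_ack:
            self.dup_count += 1
            return self.dup_count == self.threshold
        self.last_ack = ack_no  # a new ACK resets the count
        self.dup_count = 0
        return False
```

For the ACK sequence 100, 200, 200, 200, 200, the default threshold of 3 triggers only on the last ACK. With a threshold of 1, the very first repeated 200 would trigger a retransmission, even if it was caused by a single reordered packet.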
Quiz This Friday (10/24)!