Flow Control and Reliability Control in WebTP
Ye Xia
10/24/00
Outline
• Motivating problems
• Recall known ideas and go through simple facts about flow control
• Flow control examples: TCP (BSD) and credit-based flow control for ATM
• WebTP challenges and tentative solutions
Motivating Problem
• Suppose a packet is lost at F1’s receive buffer. Should the pipe’s congestion window be reduced?
Answer
• Typically, No.
– Imagine flow F1 fails.
• OK if loss at the receive buffer is rare.
• Essentially, we need flow/congestion control at the flow level. (Note: the pipe’s congestion control is not end-to-end.)
• To address this problem, we need to design a feedback control scheme and a receiver buffer management scheme. Reliability control further complicates the design.
Possible Partial Solutions
• Solution 1
– Distinguish losses at receive buffer from network
losses.
– Cut the pipe’s congestion window only for network
losses.
– Otherwise, slow-down only the corresponding flow.
• Solution 2
– Make sure losses at receiver buffer never happen or
rarely happen.
Two Styles of Flow Control
• The link provides binary information: congested/not congested, or loss/no loss. The source decreases or increases its traffic intensity incrementally. Call it TCPC.
– E.g., TCP congestion control
• The link provides complete buffer information. The source finds the right operating point. Call it CRDT.
– TCP’s window flow control
– Various credit-based flow control schemes
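The contrast between the two styles can be sketched in a few lines of code. This is a hedged illustration, not any protocol’s actual algorithm; the function names and the increase/decrease parameters are ours:

```python
def tcpc_update(cwnd, congested, incr=1.0, decr=0.5):
    """TCPC style: binary feedback drives incremental adjustment
    (linear increase / multiplicative decrease)."""
    return cwnd * decr if congested else cwnd + incr

def crdt_update(free_buffer):
    """CRDT style: the link reports its free buffer exactly, and the
    source jumps straight to that operating point."""
    return free_buffer

w = 10.0
w = tcpc_update(w, congested=False)  # grows linearly to 11.0
w = tcpc_update(w, congested=True)   # halves to 5.5
```

TCPC must search for the operating point and therefore oscillates around it; CRDT is told the operating point directly, which is why it adapts in one step.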
Comparison: TCPC and CRDT
• TCPC
– Lossy
– Traffic intensity varies slowly and oscillates.
• CRDT
– No loss
– Handles bursty traffic well; also handles a bursty receiver link well
Simple Scenario
• If C_s(t) <= C_r(t) for all t, flow control is not needed (B_r = 0).
• Otherwise, a buffer may absorb temporary traffic overload. More likely, feedback-based flow control is necessary.
Flow Control Definition and Classification
• A link, or links, directly or indirectly generates feedback.
• The (virtual) source calculates or adjusts its data forwarding speed to avoid overloading links.
• Known examples in ATM: explicit rate, binary rate, credit-based control. What is TCP?
• New inventions: hop-by-hop explicit rate, end-to-end precise window allocation.
Classification (two alternatives along each axis):
• Source: calculate rate/window, or calculate credit/window
• Calculation style: try-and-see (iterated algorithm, e.g., linear increase/multiplicative decrease), or precise calculation (requires an allocation phase or out-of-band control)
• Feedback: exact feedback (e.g., credit or rate), or binary feedback (e.g., mark or loss ratio)
• Control loop: hop-by-hop, or multi-hop
Flow Control Goals (Kung and Morris 95)
• Low loss ratio
• High link utilization
• Fair
• Robust, e.g. against loss of control information
• Simple control and parameter tuning
• Reasonable cost
• Perhaps we should stress
– Small buffer sizes
What is TCP?
• Control is divided into two parts.
• (Network) congestion control: a (congestion-)window-based algorithm.
– Linear/sublinear increase and multiplicative decrease of the window.
– The congestion window can be thought of as both a rate (Win/RTT) and a credit.
– Uses binary loss information.
– Multi-hop.
• (End-to-end) flow control resembles a credit scheme, with a credit update protocol.
• Can the end-to-end flow control be treated the same as congestion control? Maybe, but…
TCP Receiver Flow Control
• Multiple TCP connections share the same physical buffer: buffer management is needed so that
– one connection does not take all the buffers, effectively shutting out the other connections;
– deadlock is prevented.
• Packet re-assembly
TCP Receiver Buffer Management
• Time-varying physical buffer size B_r(t), shared by n TCP connections.
• BSD implementation: the receiver of connection i can buffer no more than B_i amount of data at any time. Source i tries not to overflow a buffer of size B_i.
• If TCPC-styled control is used, it is hard to guarantee that the buffer size B_i is never exceeded.
• Buffers are not reserved. It is possible that Σ B_i > B_r(t) for some time t.
Possible Deadlock
Connection 1: … 2 3 4 5 6
Connection 2: … 4 5 6 7 8
• Example: two connections, each with B_i = 4.
• Suppose B_r = 4. At this point the physical buffer runs out, and re-assembly cannot continue.
• Deadlock can be avoided if we allow dropping received packets. Implications for reliability control (e.g. connection 1):
– OK with TCP, because packets 4 and 5 have not been ACKed.
– WebTP may have already ACKed 4 and 5.
Deadlock Prevention
• Simple solution: completely partition the buffer, i.e. Σ B_i <= B_r. Inefficient. Also, what if B_r varies?
• When only one packet’s worth of free buffer is left, drop any incoming packet unless it fills the first gap.
• More buffer will be freed later, when the initial segment of data is consumed by the application.
• In the following, the next incoming packet accepted is either 3 for conn. 1, or 5 for conn. 2.
• Performance unclear.
Connection 1: … 2 3 4 5 6
Connection 2: … 4 5 6 7 8
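The acceptance rule above can be sketched as a small predicate. The interface is hypothetical (the slide does not define one); the last free buffer is reserved for the packet that fills the first gap:

```python
def accept_packet(seq, buffered, first_gap, free_buffers):
    """Deadlock-prevention rule: when only one packet's worth of free
    buffer remains, accept an incoming packet only if it fills the
    first gap in the re-assembly sequence."""
    if seq in buffered:
        return False             # duplicate, needs no buffer
    if free_buffers <= 0:
        return False             # no space at all
    if free_buffers == 1:
        return seq == first_gap  # last buffer: only the gap-filler
    return True                  # normal operation

# Connection 1 holds packets {2, 4, 5, 6}; its first gap is 3.
# With one buffer left, 3 is accepted and 7 is dropped.
assert accept_packet(3, {2, 4, 5, 6}, first_gap=3, free_buffers=1)
assert not accept_packet(7, {2, 4, 5, 6}, first_gap=3, free_buffers=1)
```

Once packet 3 arrives, the in-sequence run 2–6 can be delivered to the application, freeing buffers and ending the prevention mode.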
Why do we need receiver buffer?
• Part of flow/congestion control when C_s(t) > C_r(t).
– In TCPC, a certain amount of buffer is needed to get reasonable throughput. (For optimality issues, see [Mitra92] and [Fendick92].)
– In CRDT, also needed for good throughput.
– Question: in CRDT, the complete buffer info is passed to the source. Why don’t we pass the complete rate info, so that buffering is not needed? This is the idea of explicit rate control. But in a non-differentiable system, rate is not well-defined. Somehow, the time interval for defining rate needs to adapt to the control objective. CRDT does that, and therefore handles bursty traffic well.
• Buffering is beneficial for data re-assembly.
Buffering for Flow Control: Example
• Suppose link capacities are constant and C_s >= C_r. To reach throughput C_r, B_r should be
– C_r * RTT, in a naïve but robust CRDT scheme;
– (C_s - C_r) * C_r * RTT / C_s, if C_r is known to the sender;
– 0, if C_r is known to the sender and the sender never sends a burst at a rate greater than C_r.
– Note: upstream can estimate C_r.
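The three cases can be put side by side numerically. A minimal sketch under the slide’s assumptions (constant capacities, C_s >= C_r); the function name and units are ours:

```python
def required_buffer(C_s, C_r, rtt, sender_knows_Cr=False, sender_paces=False):
    """Receive buffer B_r needed to reach throughput C_r, per the three
    cases above. Capacities in bits/s, rtt in seconds, result in bits."""
    assert C_s >= C_r
    if sender_knows_Cr and sender_paces:
        return 0.0                             # sender never bursts above C_r
    if sender_knows_Cr:
        return (C_s - C_r) * C_r * rtt / C_s   # bursts at C_s, average C_r
    return C_r * rtt                           # naive but robust CRDT

# C_s = 100 Mb/s, C_r = 10 Mb/s, RTT = 100 ms:
naive = required_buffer(100e6, 10e6, 0.1)                        # 1.0 Mb
known = required_buffer(100e6, 10e6, 0.1, sender_knows_Cr=True)  # 0.9 Mb
```

Knowing C_r shrinks the requirement by the factor (C_s − C_r)/C_s, and pacing at C_r removes it entirely, which is exactly the progression the slide lists.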
Re-assembly Buffer Sizing
• Without it, throughput can suffer. (By how much?)
• Example: send n packets in a block, with iid delays. If B = 1, roughly ½ + e packets will be received on average. If B = n, all n packets will be received.
• Buffer size depends on network delay, loss, and packet reordering behavior. Can we quantify this?
• Question: how do we put the two together? The re-assembly buffer size can simply be a threshold number, as in TCP. Example: the (actual) buffer size is B = 6, but we allow packets 3 and 12 to coexist in the buffer.
Credit-Based Control (Kung and Morris 95)
• Hop-by-hop, exact feedback, precise calculation at the source.
• Overview of steps:
– Before forwarding packets, the sender needs to receive credits from the receiver.
– At various times, the receiver sends credits to the sender, indicating the available receive buffer size.
– The sender decrements its credit balance after forwarding a packet.
• Typically ensures no buffer overflow.
• Works well over a wide range of network conditions, e.g. bursty traffic.
Credit Update Protocol (Kung and Morris 95)
Credit-Based Control: Buffer Size
• Crd_Bal = Buf_Alloc – (Tx_Cnt – Fwd_Cnt)
• Update credit once every N2 packets.
• The credit computation is for the worst case, but tight: even if the receiver forwards no data on the interval [T_1, T_4], no data will be lost.
• For N connections, maximal bandwidth: BW = Buf_Alloc / (RTT + N2 * N)
• Total buffer size: N * C_r * (RTT + N2 * N)
Timeline of the credit update protocol:
T_1: receiver sends Fwd_Cnt
T_2: sender computes Crd_Bal
T_3: sender finishes sending Crd_Bal worth of data
T_4: all Crd_Bal data received
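The sender side of the credit update protocol reduces to two counters and the Crd_Bal formula above. A minimal sketch (class and method names are ours, not from [Kung and Morris 95]):

```python
class CreditSender:
    """Sender state for credit-based flow control:
    Crd_Bal = Buf_Alloc - (Tx_Cnt - Fwd_Cnt)."""

    def __init__(self):
        self.tx_cnt = 0    # Tx_Cnt: packets this sender has forwarded
        self.crd_bal = 0   # Crd_Bal: packets it may still send

    def on_credit_record(self, buf_alloc, fwd_cnt):
        # The receiver periodically reports Fwd_Cnt, the count of packets
        # it has passed downstream; Tx_Cnt - Fwd_Cnt is the data still
        # in flight or queued at the receiver.
        self.crd_bal = buf_alloc - (self.tx_cnt - fwd_cnt)

    def send_packet(self):
        if self.crd_bal <= 0:
            return False   # must wait for more credit
        self.tx_cnt += 1
        self.crd_bal -= 1
        return True

s = CreditSender()
s.on_credit_record(buf_alloc=4, fwd_cnt=0)     # Crd_Bal = 4
sent = sum(s.send_packet() for _ in range(6))  # only 4 of 6 attempts succeed
```

Because Crd_Bal never exceeds the free space the receiver reported, the receive buffer cannot overflow, which is the "typically ensures no buffer overflow" property.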
Adaptive Credit-Based Control
(Kung and Chang 95)
• Idea: make buffer size proportional to actual bandwidth, for each connection.
• For each connection and on each allocation interval,
Buf_Alloc = (M/2 – TQ – N) * (VU/TU)
M: buffer size; TQ: current buffer occupancy; VU: amount of data forwarded for the connection; TU: amount forwarded for all N connections.
• M = 4 * RTT + 2 * N
• Easy to show no losses. But can allocation be controlled precisely?
• Once bandwidth is introduced, the scheme can no longer handle bursts well.
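The allocation formula is straightforward to evaluate. A sketch of one allocation interval (the zero-clamp and the handling of an idle interval are our additions, not from [Kung and Chang 95]):

```python
def adaptive_buf_alloc(M, TQ, N, VU, TU):
    """Buf_Alloc = (M/2 - TQ - N) * (VU/TU): give each connection a
    share of the uncommitted half of the buffer in proportion to its
    recent bandwidth.
    M:  total buffer size        TQ: current buffer occupancy
    VU: data forwarded for this connection in the last interval
    TU: data forwarded for all N connections in the last interval"""
    if TU == 0:
        return 0.0  # nothing forwarded this interval (our convention)
    return max(0.0, (M / 2 - TQ - N) * (VU / TU))

# M = 100 cells, TQ = 10 queued, N = 4 connections; this connection
# forwarded 30 of the 60 cells in the last interval:
alloc = adaptive_buf_alloc(100, 10, 4, 30, 60)  # (50 - 10 - 4) * 0.5 = 18.0
```

A connection that forwarded nothing gets no allocation, which is how the scheme ties buffer space to actual bandwidth, and also why it loses the burst-friendliness of plain CRDT.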
Comparison with Rate Control
• Requires a large buffer size for high throughput.
– Needs an adaptive buffer allocation scheme.
• Perfect rate control eliminates the buffer requirement.
– Calculate the rate allocation before or at the beginning of transmission.
– Use the allocated rate until the traffic condition changes.
– Works well when the traffic pattern does not change much; otherwise needs at least a delay-bandwidth product amount of buffers to absorb losses.
– Rate allocation algorithms are not easy to design:
• explicit rate calculation;
• linear increase and multiplicative decrease: needs a delay-bandwidth product amount of buffers for good throughput.
• Performance difference depends on the network traffic pattern.
BSD - TCP Flow Control
• Receiver advertises free buffer space: win = Buf_Alloc – Que_siz
• Sender can send [snd_una, snd_una + snd_win – 1]. snd_win = win; snd_una: oldest unACKed number.

snd_win = 6, advertised by the receiver:
  1 2 3 | 4 5 6 | 7 8 9 | 10 11 …
  sent and ACKed | sent, not ACKed | can send ASAP | can’t send until window moves
  (snd_una = 4; snd_nxt = 7, the next send number)
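The usable-window arithmetic in the diagram can be written out directly. A sketch (the function name is ours; BSD keeps these as connection-state variables):

```python
def sendable_now(snd_una, snd_win, snd_nxt):
    """How many new packets the sender may transmit immediately: the
    window covers [snd_una, snd_una + snd_win - 1], and snd_nxt is the
    next new sequence number to send."""
    right_edge = snd_una + snd_win - 1
    return max(0, right_edge - snd_nxt + 1)

# Matching the diagram: snd_una = 4, snd_win = 6, snd_nxt = 7,
# so packets 7, 8 and 9 can be sent ASAP.
assert sendable_now(4, 6, 7) == 3
# Packets 10 and 11 must wait until the window moves.
assert sendable_now(4, 6, 10) == 0
```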
Equivalence of TCP and Credit-Based Flow Control
• The two credit/window update protocols are “equivalent”, assuming no packet losses in the network and no packet mis-ordering.
• TCP is inherently more complicated due to the need for reliability control. The receiver needs to tell the sender HOW MANY and WHICH packets have been forwarded.
• Regarding “WHICH”, TCP takes the simplistic approach of ACKing the first un-received data.
TCP Example
• Receiver: ACKs 4, win = 3. (Total buffer size = 6)
  1 2 3 4 5 6 7 8 9 10 11 …
• Sender: sends 4 again.
  snd_win = 3: 1 2 3 [4 5 6] 7 8 9 10 11 …   (snd_una = 4)
• Sender: after 4 is received at the receiver.
  snd_win = 6: 3 4 5 6 7 [8 9 10 11 12 13] …   (snd_una = 8)
WebTP Packet Header Format
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         Packet Number                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                     Acknowledgment Number                     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      Acknowledged Vector                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           ADU Name                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        Segment Number                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          Source Port          |       Destination Port        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Data  |U|A|R|S|F|R|E|F|P| C | P |                             |
| Offset|R|C|S|Y|I|E|N|A|T| C | C |             RES             |
|       |G|K|T|N|N|L|D|S|Y| A | L |                             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|            Window             |           Checksum            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|            Options            |            Padding            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                             data                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
WebTP: Reliability Control
• A flow can be reliable (in the TCP sense) or unreliable (in the UDP sense).
• Feedback is shared between reliability control and congestion control.
• A reliable flow uses TCP-styled flow control and data re-assembly. A loss at the receiver due to flow-control buffer overflow is not distinguished from a loss at the pipe. But this should be rare.
• Unreliable flow: losses at the receiver due to overflowing B_i are not reported back to the sender. No window flow control is needed, for simplicity. (Is the window information useful?)
WebTP: Buffer Management
• Each flow gets a fixed upper bound on queue size, say B_i. Σ B_i >= B_r is possible.
• Later on, B_i will adapt to the speed of the application.
• The receiver of a flow maintains rcv_nxt and rcv_adv:
B_i = rcv_adv - rcv_nxt + 1
• Packets outside [rcv_nxt, rcv_adv] are rejected.
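The rejection rule above amounts to a one-line range check. A sketch (the function name and the duplicate check are ours; rcv_nxt and rcv_adv are the slide’s variables):

```python
def webtp_accept(pkt, rcv_nxt, rcv_adv, buffered):
    """A packet is kept only if it lies inside [rcv_nxt, rcv_adv]
    (so at most B_i = rcv_adv - rcv_nxt + 1 packets are buffered)
    and is not already in the re-assembly buffer."""
    return rcv_nxt <= pkt <= rcv_adv and pkt not in buffered

# B_i = 6 with rcv_nxt = 4, rcv_adv = 9: 4 is accepted, 10 is rejected
# until the window advances.
assert webtp_accept(4, 4, 9, set())
assert not webtp_accept(10, 4, 9, set())
```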
WebTP Example
• Receiver: (positively) ACKs 5, 6, and 7; win = 3. (B_i = 6)
  1 2 3 [4 5 6 7 8 9] 10 11 …   (rcv_nxt = 4, rcv_adv = 9)
• Sender: can send 4, 8 and 9, subject to congestion control.
  snd_win = 3: 1 2 3 [4 5 6 7 8 9] 10 11 …   (snd_una = 4, snd_nxt = 10)
• Sender: after 4, 8 and 9 are received at the receiver.
  snd_win = 6: 5 6 7 8 9 [10 11 12 13 14 15] …   (snd_una = snd_nxt = 10)
WebTP: Deadlock Prevention (Reliable Flows)
• Deadlock prevention: pre-allocate bN buffer spaces, b >= 1, where N = maximum number of flows allowed.
• When the dynamic buffer runs out, enter deadlock-prevention mode. In this mode,
– each flow accepts only up to b in-sequence packets;
– when a flow uses up its b buffers, it is not allowed any more buffers until b buffers are freed.
• We guard against the case where all but one flow are still responding. In practice, we only need N to be some reasonably large number.
• b = 1 is sufficient, but b can be greater than 1 for performance reasons.
WebTP: Feedback Scheme
• The Window field in the packet header is per-flow.
• As in TCP, it is the current free buffer space for the flow.
• When a flow starts, use the FORCE bit (FCE) to request an immediate ACK from the flow.
• To inform the sender about the window size, a flow generates an ACK for every 2 received packets (MTUs).
• The pipe generates an ACK for every k packets.
• ACKs can be piggybacked on reverse-direction data packets.
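Taken together, the rules above give a simple ACK-generation predicate. This is a hedged sketch of the slide’s scheme, not the WebTP implementation; the signature is ours:

```python
def should_ack(is_first_of_flow, flow_rcvd, pipe_rcvd, k):
    """ACK immediately when a flow's first packet arrives (FCE/FORCE
    bit), on every 2nd packet of a flow, and on every k-th packet of
    the pipe. flow_rcvd / pipe_rcvd count packets received so far."""
    if is_first_of_flow:
        return True                  # FORCE bit: immediate ACK
    return flow_rcvd % 2 == 0 or pipe_rcvd % k == 0

assert should_ack(True, 1, 1, 4)     # flow start forces an ACK
assert should_ack(False, 2, 3, 4)    # second packet of the flow
assert not should_ack(False, 3, 3, 4)
```

With four flows interleaving on one pipe, both per-flow and per-pipe rules fire, which is consistent with the next slide’s count of roughly 62 ACKs per 100 data packets.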
Acknowledgement Example: Four Flows
[Figure: packet arrival pattern at the receiver on the pipe and on flows 1–4, with the ACKs generated by the per-flow and per-pipe rules]
Result: with some randomness in the traffic, 62 ACKs are generated for every 100 data packets.
Summary of Issues
• Find a control scheme suitable for both the pipe level and the flow level.
– Reconcile network control and last-hop control.
– Note that feedback for congestion control and for reliability control is entangled.
• Buffer management at the receiver
– Buffer sizing
• for re-assembly
• for congestion control
– Deadlock prevention
Correctness of Protocol and Algorithm
• Performance typically deals with average cases, and can be studied by model-based analysis or simulation.
• What about correctness?
– Very often in networking, failures are more of a concern than poor performance.
• The correctness of many distributed algorithms in networking has not been proven.
• What can be done?
– Need formal descriptions.
– Need methods of proof.
• Some references for protocol verification: I/O Automata ([Lynch88]), verification of TCP ([Smith97]).
References
[Mitra92] Debasis Mitra, “Asymptotically Optimal Design of Congestion Control for High Speed Data Networks”, IEEE Transactions on Communications, Vol. 10, No. 2, Feb. 1992.
[Fendick92] Kerry W. Fendick, Manoel A. Rodrigues and Alan Weiss, “Analysis of a rate-based feedback control strategy for long haul data transport”, Performance Evaluation 16 (1992), pp. 67-84.
[Kung and Morris 95] H.T. Kung and Robert Morris, “Credit-Based Flow Control for ATM Networks”, IEEE Network Magazine, March 1995.
[Kung and Chang 95] H.T. Kung and Koling Chang, “Receiver-Oriented Adaptive Buffer Allocation in Credit-Based Flow Control for ATM Networks”, Proc. Infocom ’95.
[Smith97] Mark Smith, “Formal Verification of TCP and T/TCP”, PhD thesis, Department of EECS, MIT, 1997.
[Lynch88] Nancy Lynch and Mark Tuttle, “An Introduction to Input/Output Automata”, Technical Memo MIT/LCS/TM-373, Laboratory for Computer Science, MIT, 1988.