pptx - Stanford University Networking Seminar

Multipath TCP
Costin Raiciu, University Politehnica of Bucharest
Christoph Paasch, Université catholique de Louvain
Joint work with: Mark Handley, Damon Wischik, University College London
Olivier Bonaventure, Sébastien Barré, Université catholique de Louvain
and many many others
Thanks to
Networks are becoming multipath
Mobile devices have multiple wireless connections
Datacenters have redundant topologies
Servers are multi-homed
How do we use these networks?
TCP.
Used by most applications,
offers byte-oriented reliable delivery,
adjusts load to network conditions
[Labovitz et al. – Internet Inter-Domain Traffic – Sigcomm 2010]
TCP is single path
A TCP connection
Uses a single path in the network regardless of network topology
Is tied to the source and destination addresses of the endpoints
Mismatch between
network and transport
creates problems
Poor Performance for Mobile Users
3G celltower
Offload to WiFi
All ongoing TCP connections die
Collisions in datacenters
[Al-Fares et al. – A Scalable, Commodity Data Center Network Architecture – Sigcomm 2008]
Single-path TCP collisions reduce throughput
[Raiciu et al. – Sigcomm 2011]
Multipath TCP
Multipath TCP (MPTCP) is an
evolution of TCP that can effectively
use multiple paths within a single
transport connection
• Supports unmodified applications
• Works over today’s networks
• Standardized at the IETF (almost there)
Multipath TCP components
Connection setup
Sending data over multiple paths
Encoding control information
Dealing with (many) middleboxes
Congestion control
[Raiciu et al. – NSDI 2012]
[Wischik et al. – NSDI 2011]
MPTCP Connection Management
[Figure: FLOW Y with SUBFLOW 1 and SUBFLOW 2; each subflow keeps its own CWND, Snd.SEQNO and Rcv.SEQNO]
TCP Packet Header
Bits 0-15: Source Port | Bits 16-31: Destination Port
Sequence Number
Acknowledgment Number
Header Length, Reserved, Code bits | Receive Window
Checksum | Urgent Pointer
(fixed header above: 20 Bytes)
Options (0 - 40 Bytes)
Data
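The header layout above can be decoded in a few lines. This is a minimal sketch, not taken from the talk: the function name `parse_tcp_header` and the example segment (ports, sequence numbers, four NOP option bytes) are made up for illustration.

```python
import struct


def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed 20-byte TCP header and locate options and data."""
    (src_port, dst_port, seq, ack,
     offset_flags, window, checksum, urgent) = struct.unpack("!HHIIHHHH", segment[:20])
    header_len = (offset_flags >> 12) * 4   # the length field counts 32-bit words
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,
        "ack": ack,
        "header_len": header_len,
        "code_bits": offset_flags & 0x3F,   # URG, ACK, PSH, RST, SYN, FIN
        "window": window,
        "checksum": checksum,
        "urgent": urgent,
        "options": segment[20:header_len],  # 0 to 40 bytes
        "data": segment[header_len:],
    }


# Hypothetical segment: 20-byte header, 4 option bytes (NOPs), 5 data bytes.
example = struct.pack("!HHIIHHHH", 43210, 80, 1000, 2000, (6 << 12) | 0x18, 65535, 0, 0)
example += b"\x01\x01\x01\x01" + b"hello"
header = parse_tcp_header(example)
print(header["header_len"], header["options"], header["data"])
```

The options field is the only variable-length part of the header, which is why it is the natural place to carry extra per-segment information.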
Sequence Numbers
Packets travel over multiple paths.
– Need sequence numbers to put them back in sequence.
– Need sequence numbers to infer loss on a single path.
Options:
– One sequence space shared across all paths?
– One sequence space per path, plus an extra one to put data back in the correct order at the receiver?
Sequence Numbers
• One sequence space per path is preferable.
– Loss inference is more reliable.
– Some firewalls/proxies expect to see all the sequence numbers on a path.
• Outer TCP header holds subflow sequence numbers.
– Where do we put the data sequence numbers?
MPTCP Packet Header
Bits 0-15: Subflow Source Port | Bits 16-31: Subflow Destination Port
Subflow Sequence Number
Subflow Acknowledgment Number
Header Length, Reserved, Code bits | Receive Window
Checksum | Urgent Pointer
(fixed header above: 20 Bytes)
Options (0 - 40 Bytes), carrying the Data sequence number and the Data ACK
Data
MPTCP Operation
[Animation: FLOW Y with SUBFLOW 1 and SUBFLOW 2; each subflow keeps its own CWND, Snd.SEQNO and Rcv.SEQNO. The builds show, in order:]
• A segment is sent on subflow 1 with SEQ 1000 and, in its options, DSEQ 10000.
• A segment is sent on subflow 2 with SEQ 5000 and DSEQ 11000.
• An acknowledgment comes back on subflow 1 with ACK 2000 and Data ACK 11000; the subflow 2 segment (SEQ 5000, DSEQ 11000) is still shown in flight.
• A segment is sent on subflow 1 with SEQ 2000 and DSEQ 11000, the data sequence number that was earlier sent on subflow 2.
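The two sequence spaces can be illustrated with a small receiver model. This is a sketch under stated assumptions, not the protocol implementation: the class names are hypothetical, segments are assumed to carry 1000 bytes (consistent with SEQ 1000 being answered by ACK 2000), and the subflow 2 segment is assumed to be lost, which the slides do not state explicitly.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    subflow: int   # subflow that carries the segment
    seq: int       # subflow sequence number (outer TCP header)
    dseq: int      # data sequence number (carried in a TCP option)
    data: bytes


@dataclass
class Receiver:
    next_dseq: int                                   # next in-order data byte expected
    pending: dict = field(default_factory=dict)      # dseq -> data waiting for a gap to fill
    delivered: bytearray = field(default_factory=bytearray)

    def receive(self, seg: Segment) -> int:
        """Accept a segment from any subflow and return the cumulative Data ACK."""
        if seg.dseq >= self.next_dseq:
            self.pending.setdefault(seg.dseq, seg.data)
        # Deliver everything that is now contiguous in the data sequence space.
        while self.next_dseq in self.pending:
            data = self.pending.pop(self.next_dseq)
            self.delivered += data
            self.next_dseq += len(data)
        return self.next_dseq


rx = Receiver(next_dseq=10000)
first = Segment(subflow=1, seq=1000, dseq=10000, data=b"a" * 1000)
lost = Segment(subflow=2, seq=5000, dseq=11000, data=b"b" * 1000)   # assumed never to arrive
resent = Segment(subflow=1, seq=2000, dseq=11000, data=lost.data)   # same data, subflow 1's SEQ

data_ack_1 = rx.receive(first)
data_ack_2 = rx.receive(resent)
print(data_ack_1, data_ack_2)
```

The receiver orders data purely by DSEQ, so it does not matter which subflow finally delivers a given byte range; the subflow SEQ only has to make sense within its own path.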
Multipath TCP
Congestion Control
Packet switching ‘pools’ circuits.
Multipath ‘pools’ links
TCP controls how a link is shared.
How should a pool be shared?
[Figure: two circuits become a link; two separate links become a pool of links]
Design goal 1:
Multipath TCP should be fair to regular TCP at
shared bottlenecks
A multipath
TCP flow with
two subflows
Regular TCP
To be fair, Multipath TCP should take as much capacity as TCP at a bottleneck link, no matter how many paths it is using.
Design goal 2:
MPTCP should use efficient paths
[Figure: three 12Mb/s links; each build labels the three flows with their throughput]
Each flow has a choice of a 1-hop and a 2-hop path. How should it split its traffic?
• If each flow split its traffic 1:1, each flow gets 8Mb/s.
• If each flow split its traffic 2:1, each flow gets 9Mb/s.
• If each flow split its traffic 4:1, each flow gets 10Mb/s.
• If each flow split its traffic ∞:1, each flow gets 12Mb/s.
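The throughput figures for the different split ratios can be reproduced with a short calculation. The figure itself is not preserved, so the topology here is an assumption: three flows around a triangle of three 12Mb/s links, where each link carries one flow's 1-hop traffic plus the 2-hop traffic of the other two flows. The function name is hypothetical.

```python
from fractions import Fraction

LINK_CAPACITY = 12  # Mb/s, from the slides


def per_flow_throughput(one_hop_share: int, two_hop_share: int) -> Fraction:
    """Throughput of each flow when every flow splits its traffic
    one_hop_share : two_hop_share between its 1-hop and its 2-hop path
    (assumed triangle topology, see the lead-in)."""
    total = one_hop_share + two_hop_share
    x = Fraction(one_hop_share, total)   # fraction sent on the 1-hop path
    y = Fraction(two_hop_share, total)   # fraction sent on the 2-hop path
    # Each link carries x*T from one flow and y*T from each of the other two,
    # so the link fills up when (x + 2*y) * T equals the link capacity.
    return LINK_CAPACITY / (x + 2 * y)


for split in [(1, 1), (2, 1), (4, 1), (1, 0)]:   # (1, 0) stands for the ∞:1 case
    print(split, per_flow_throughput(*split))
```

Under this assumption the calculation gives 8, 9, 10 and 12 Mb/s, matching the slides: every byte sent on a 2-hop path consumes capacity on two links, so shifting traffic to the 1-hop path raises everyone's throughput.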
Design goal 3:
MPTCP should get at least as much as TCP on the best path
wifi path: high loss, small RTT
3G path: low loss, high RTT
Design Goal 2 says to send all your traffic on the least congested path, in this case 3G. But this has high RTT, hence it will give low throughput.
How does TCP congestion control work?
Maintain a congestion window w.
Increase w for each ACK, by 1/w
Decrease w for each drop, by w/2
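The two rules can be written out directly. This is a minimal sketch: the window floor of one packet and the periodic loss pattern in `simulate` are assumptions added for illustration, not part of the slide.

```python
def tcp_on_ack(w: float) -> float:
    """Each ACK grows the window by 1/w, i.e. roughly one packet per round trip."""
    return w + 1.0 / w


def tcp_on_drop(w: float) -> float:
    """Each drop removes w/2 (floor of 1 packet is an assumption)."""
    return max(w - w / 2.0, 1.0)


def simulate(w: float, rounds: int, drop_every: int) -> list:
    """Hypothetical loss pattern: one drop every `drop_every` round trips."""
    trace = []
    for rtt in range(1, rounds + 1):
        for _ in range(int(w)):          # roughly w ACKs arrive per round trip
            w = tcp_on_ack(w)
        if rtt % drop_every == 0:
            w = tcp_on_drop(w)
        trace.append(w)
    return trace


trace = simulate(w=10.0, rounds=30, drop_every=10)
print([round(x, 1) for x in trace])
```

The printed trace shows the familiar sawtooth: slow linear growth between drops and a halving at each drop.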
How does MPTCP congestion control work?
Maintain a congestion window wr, one window for each path, where r ∊ R ranges over the set of available paths.
Increase wr for each ACK on path r, by [formula image not preserved; successive builds label its parts "Goal 2" and "Goals 1&3"]
Decrease wr for each drop on path r, by wr /2
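The increase term on these slides did not survive extraction. As a hedged stand-in, the sketch below uses the coupled increase standardized in RFC 6356 for this family of algorithms, expressed in packets; it may differ in form from what the slide showed. Function names and the one-packet floor are assumptions.

```python
def lia_increase(windows, rtts, r):
    """Per-ACK increase of the window on path r, in packets (RFC 6356 form).

    windows[i] is the congestion window of path i in packets,
    rtts[i] is its round-trip time in seconds.
    """
    best = max(w / rtt ** 2 for w, rtt in zip(windows, rtts))
    total_rate = sum(w / rtt for w, rtt in zip(windows, rtts))
    coupled = best / total_rate ** 2          # identical for every path
    return min(coupled, 1.0 / windows[r])     # never more aggressive than TCP on path r


def on_ack(windows, rtts, r):
    windows[r] += lia_increase(windows, rtts, r)


def on_drop(windows, r):
    # Decrease by w_r / 2, as on the slide; the floor of 1 packet is an assumption.
    windows[r] = max(windows[r] - windows[r] / 2.0, 1.0)


print(lia_increase([10.0], [0.1], 0))
print(lia_increase([10.0, 10.0], [0.1, 0.1], 0))
```

With a single path the coupled term collapses to 1/w, so the rule behaves like regular TCP. With two identical paths each ACK adds a quarter of that, which keeps the aggregate from being more aggressive than one TCP at a shared bottleneck.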
Applications
of Multipath TCP
At a multihomed web server, MPTCP tries to share the ‘pooled access capacity’ fairly.
[Figure: two 100Mb/s access links, one carrying 2 TCPs and the other 4 TCPs; MPTCP flows are added one per build]
• No MPTCP: 2 TCPs @ 50Mb/s, 4 TCPs @ 25Mb/s
• 1 MPTCP @ 33Mb/s: 2 TCPs @ 33Mb/s, 4 TCPs @ 25Mb/s
• 2 MPTCPs @ 25Mb/s: 2 TCPs @ 25Mb/s, 4 TCPs @ 25Mb/s. The total capacity, 200Mb/s, is shared out evenly between all 8 flows.
• 3 MPTCPs @ 22Mb/s: 2 TCPs @ 22Mb/s, 4 TCPs @ 22Mb/s. The total capacity, 200Mb/s, is shared out evenly between all 9 flows.
• 4 MPTCPs @ 20Mb/s: 2 TCPs @ 20Mb/s, 4 TCPs @ 20Mb/s. The total capacity, 200Mb/s, is shared out evenly between all 10 flows.
It’s as if they were all sharing a single 200Mb/s link. The two links can be said to form a 200Mb/s pool.
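The per-flow rates in this example follow from a simple idealised model. This is a sketch, not the congestion controller itself: it assumes MPTCP either pools both links perfectly or, when that would require a negative amount of traffic on the busier link, stays entirely on the less crowded link and shares it like one more TCP. The function name and defaults are hypothetical.

```python
def pooled_rates(n_mptcp, tcps_a=2, tcps_b=4, capacity=100.0):
    """Idealised per-flow rates (Mb/s) on two access links of equal capacity.

    Link A carries tcps_a single-path TCPs, link B carries tcps_b, and
    n_mptcp MPTCP flows may use both. Returns a tuple
    (rate of a TCP on A, rate of an MPTCP flow or None, rate of a TCP on B).
    """
    if n_mptcp == 0:
        return capacity / tcps_a, None, capacity / tcps_b
    # Try full pooling: every flow gets an equal share of the two links.
    share = 2 * capacity / (n_mptcp + tcps_a + tcps_b)
    mptcp_on_a = capacity - tcps_a * share   # capacity left on A for MPTCP
    mptcp_on_b = capacity - tcps_b * share   # capacity left on B for MPTCP
    if mptcp_on_a >= 0 and mptcp_on_b >= 0:
        return share, share, share
    # Otherwise MPTCP stays on the less crowded link and shares it like TCP.
    if tcps_a <= tcps_b:
        rate = capacity / (n_mptcp + tcps_a)
        return rate, rate, capacity / tcps_b
    rate = capacity / (n_mptcp + tcps_b)
    return capacity / tcps_a, rate, rate


for n in range(5):
    print(n, pooled_rates(n))
```

With one MPTCP flow the links cannot be equalised (the 4 TCPs alone already hold link B at 25Mb/s each), so MPTCP uses only link A. From two MPTCP flows onward every flow gets 200/n Mb/s.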
[Figure: 5 TCPs on one 100Mb/s access link, 15 TCPs on the other; first 0, then 10 MPTCPs across both. Plot: throughput per flow [Mb/s] against time [min]]
We confirmed in experiments that MPTCP nearly manages to pool the capacity of the two access links.
Setup: two 100Mb/s access links, 10ms delay, first 20 flows, then 30.
MPTCP makes a collection of links behave like a single large pool of capacity: i.e. if the total capacity is C, and there are n flows, each flow gets throughput C/n.
Multipath TCP can pool datacenter networks
Instead of using one path for each flow, use many random paths
Don’t worry about collisions.
Just don’t send (much) traffic on colliding paths
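A rough way to see why many random paths help is to count how many paths get used at all. The sketch below is an assumption-laden model, not the simulation from the talk: every subflow independently picks one of the paths uniformly at random, and the function name and the numbers (100 paths, 100 flows) are made up. It only measures path coverage; it does not model MPTCP moving traffic off colliding paths.

```python
def expected_paths_used(n_paths: int, n_flows: int, subflows: int) -> float:
    """Expected fraction of paths that carry at least one subflow when every
    subflow independently picks one of n_paths uniformly at random."""
    picks = n_flows * subflows
    return 1.0 - (1.0 - 1.0 / n_paths) ** picks


for k in (1, 2, 4, 8):
    print(k, round(expected_paths_used(100, 100, k), 3))
```

With one path per flow, more than a third of the paths are expected to stay idle while others carry several flows. A handful of subflows per flow leaves almost no path unused.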
Multipath TCP in data centers
MPTCP better utilizes the FatTree network
MPTCP on EC2
• Amazon EC2: infrastructure as a service
– We can borrow virtual machines by the hour
– These run in Amazon data centers worldwide
– We can boot our own kernel
• A few availability zones have multipath topologies
– 2-8 paths available between hosts not on the same
machine or in the same rack
– Available via ECMP
Amazon EC2 Experiment
• 40 medium CPU instances running MPTCP
• For 12 hours, we sequentially ran all-to-all
iperf cycling through:
– TCP
– MPTCP (2 and 4 subflows)
MPTCP improves performance on EC2
[Plot annotation: Same Rack]
Implementing
Multipath TCP
in the Linux Kernel
Linux Kernel MPTCP
• About 10000 lines of code in the Linux Kernel
• Initially started by Sébastien Barré
• Now, 3 actively working on Linux Kernel MPTCP
– Christoph Paasch
– Fabien Duchêne
– Gregory Detal
• Freely available at http://mptcp.info.ucl.ac.be
MPTCP-session creation
Application creates regular TCP sockets
The Kernel creates the Meta-socket
MPTCP creating new subflows
The Kernel handles the different MPTCP subflows
MPTCP Performance with apache
100 simultaneous HTTP-Requests, total of 100000
MPTCP on multicore architectures
• Flow-to-core affinity steers all packets from one TCP-flow to the same core.
• MPTCP has lots of L1/L2 cache-misses because the individual subflows are steered to different CPU-cores
• Solution: Send all packets from the same MPTCP-session to the same CPU-core
• Based on Receive-Flow-Steering implementation in Linux (Author: Tom Herbert from Google)
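The difference between the two steering policies can be shown with a toy hash. This is only an illustration: the CRC-based hash, the function names, the addresses and the session identifier are all hypothetical, and this is not how the kernel's Receive-Flow-Steering selects a core.

```python
import zlib


def core_by_flow(four_tuple, n_cores: int) -> int:
    """Flow-to-core affinity: hash each subflow's own 4-tuple."""
    key = "|".join(map(str, four_tuple)).encode()
    return zlib.crc32(key) % n_cores


def core_by_session(session_id, n_cores: int) -> int:
    """Session affinity: hash an identifier shared by all subflows of a session."""
    return zlib.crc32(str(session_id).encode()) % n_cores


# Two subflows of one hypothetical session, each with a different 4-tuple.
subflows = [
    ("192.0.2.10", 40001, "198.51.100.5", 80),
    ("203.0.113.7", 40002, "198.51.100.5", 80),
]
session_id = 0xBEEF
print([core_by_flow(sf, 8) for sf in subflows])
print([core_by_session(session_id, 8) for _ in subflows])
```

Hashing the 4-tuple gives each subflow an independent core choice, so the subflows of one session can land on different cores. Hashing a per-session identifier always yields the same core for all of them.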
Multipath TCP
on
Mobile Devices
MPTCP over WiFi/3G
TCP over WiFi/3G
WiFi to 3G handover with Multipath TCP
• A mobile node may lose its WiFi connection.
• Regular TCP will break!
• Some applications support recovering from a broken TCP (HTTP-Header Range)
• Thanks to the REMOVE_ADDR-option, MPTCP is able to handle this without the need for application support.
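The effect of removing an address can be modelled with a toy session object. This is a sketch under assumptions, not the kernel logic: the class and method names are hypothetical, only the sender's bookkeeping is modelled, and moving unacknowledged data to a surviving subflow is assumed behaviour added for illustration.

```python
class MptcpSession:
    """Toy model of a session that survives losing one of its addresses."""

    def __init__(self):
        self.subflows = {}     # address id -> list of unacknowledged (dseq, data)
        self.next_dseq = 0

    def add_subflow(self, addr_id):
        self.subflows[addr_id] = []

    def send(self, addr_id, data: bytes):
        self.subflows[addr_id].append((self.next_dseq, data))
        self.next_dseq += len(data)

    def remove_addr(self, addr_id):
        """Drop the subflow on addr_id and move its unacknowledged data to a
        surviving subflow, keeping the original data sequence numbers."""
        orphaned = self.subflows.pop(addr_id)
        if not self.subflows:
            raise ConnectionError("no subflow left")
        survivor = next(iter(self.subflows))
        self.subflows[survivor].extend(orphaned)
        self.subflows[survivor].sort()
        return survivor


session = MptcpSession()
session.add_subflow("wifi")
session.add_subflow("3g")
session.send("wifi", b"abc")
session.send("3g", b"de")
session.send("wifi", b"f")
survivor = session.remove_addr("wifi")
print(survivor, session.subflows[survivor])
```

Because the data sequence numbers are independent of any one subflow, the application's byte stream continues on the remaining path without the application noticing the change.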
Related Work
Multipath TCP has been proposed many times before
– First by Huitema (1995), CMT, pTCP, M-TCP, …
You can solve mobility differently
– At a different layer: Mobile IP, HTTP range
– At transport layer: Migrate TCP, SCTP
You can deal with datacenter collisions differently
– Hedera (OpenFlow + centralized scheduling)
Multipath topologies
need multipath transport
Multipath TCP can be used
by unchanged applications
over today’s networks
MPTCP moves traffic away from congestion,
making a collection of links behave like a
single pooled resource
Backup Slides
Packet-level ECMP in datacenters
How does MPTCP congestion control work?
Maintain a congestion window wr, one window for each path, where r ∊ R ranges over the set of available paths.
Increase wr for each ACK on path r, by [formula image not preserved]
Design goals 1&3: At any potential bottleneck S that path r might be in, look at the best that a single-path TCP could get, and compare to what I’m getting.
Decrease wr for each drop on path r, by wr /2
How does MPTCP congestion control work?
Maintain a congestion window wr, one window for each path, where r ∊ R ranges over the set of available paths.
Increase wr for each ACK on path r, by [formula image not preserved]
Design goal 2: We want to shift traffic away from congestion. To achieve this, we increase windows in proportion to their size.
Decrease wr for each drop on path r, by wr /2