Transcript Slide 1

Coflow
A Networking Abstraction for Cluster Applications
Mosharaf Chowdhury
Ion Stoica
UC Berkeley
Cluster Applications
Multi-Stage Data Flows
» Computation interleaved
with communication
Computation
Driver
» Distributed
» Runs on many machines
Communication
» Structured
» Between machine groups
2
Communication Abstraction
A Flow
» Sequence of packets
» Independent
» Often the unit for network
scheduling, traffic engineering,
load balancing etc.
Multiple Parallel Flows
» Independent
» Yet, semantically bound
» Shared objective
Driver
Minimize Completion Time
3
Coflow
A collection of flows
between two groups of
machines that are
bound together by
application-specific
semantics
Captures
1. Structure
2. Shared Objective
3. Semantics
4
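The definition above can be sketched as a small data structure. This is only an illustrative sketch, not the authors' implementation: the names Flow, Coflow, and completion_time are hypothetical, and the finish times are made-up inputs rather than measurements.

```python
from dataclasses import dataclass, field


@dataclass
class Flow:
    src: str            # sending machine
    dst: str            # receiving machine
    finish_time: float  # seconds at which this flow completes (assumed known)


@dataclass
class Coflow:
    pattern: str                               # semantics, e.g. "SHUFFLE"
    flows: list = field(default_factory=list)  # structure: the member flows

    def completion_time(self) -> float:
        # Shared objective: the coflow is done only when its last flow is done.
        return max(f.finish_time for f in self.flows)


shuffle = Coflow("SHUFFLE", [
    Flow("mapper1", "reducer1", 2.0),
    Flow("mapper1", "reducer2", 4.0),
    Flow("mapper2", "reducer1", 7.0),
])
print(shuffle.completion_time())  # 7.0
```

Speeding up the first two flows would not help the application here, because the coflow still completes at 7 seconds.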
We Want To…
Better schedule the network
» Intra-coflow
» Inter-coflow
Write the communication layer of a new application
» Without reinventing the wheel
Add unsupported coflows to an application, or
Replace an existing coflow implementation
» Independent of applications
5
Cluster
Applications
Coflow API
The Network
(Physically or Logically Centralized Controller)
6
Goals
Coflow API
1. Separate intent from mechanisms
2. Convey application-specific semantics to the network
7
Coflow API
MapReduce
Driver
create(SHUFFLE) → handle
put(handle, id, content)
get(handle, id) → content
Shuffle finishes
terminate(handle)
Job finishes
8
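The four calls on this slide can be exercised with a toy, single-process stand-in. This is a sketch under stated assumptions, not the real coflow service: the class name CoflowManager, the integer handles, the dictionary storage, and the parameter name block_id (the slide calls it id) are all hypothetical, and no network transfer happens.

```python
import itertools


class CoflowManager:
    """Toy in-memory stand-in for a coflow service (hypothetical)."""

    def __init__(self):
        self._next_handle = itertools.count(1)
        self._store = {}    # handle -> {block_id: content}
        self._pattern = {}  # handle -> pattern name, e.g. "SHUFFLE"

    def create(self, pattern):
        # Called by the driver; returns a handle for later calls.
        handle = next(self._next_handle)
        self._store[handle] = {}
        self._pattern[handle] = pattern
        return handle

    def put(self, handle, block_id, content):
        # Called by a sender (e.g. a mapper).
        self._store[handle][block_id] = content

    def get(self, handle, block_id):
        # Called by a receiver (e.g. a reducer).
        return self._store[handle][block_id]

    def terminate(self, handle):
        # Called by the driver once the coflow is no longer needed.
        del self._store[handle]
        del self._pattern[handle]


api = CoflowManager()
s = api.create("SHUFFLE")          # driver
api.put(s, "m1-r1", b"partition")  # mapper
data = api.get(s, "m1-r1")         # reducer
api.terminate(s)                   # driver, after the shuffle finishes
```

The application states only what it wants moved and under which pattern; how the bytes travel is left to whatever sits behind these calls.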
Flexibility
Coflow
[Figure: mappers, shuffle, reducers]
Choice of algorithms
» Default
» WSS [1]
Choice of mechanism
» App vs. Network layer
» Pull vs. Push
1. Orchestra, SIGCOMM’2011
9
Flexibility
Coflow
[Figure: driver (JobTracker), broadcast, mappers, shuffle, reducers]
@driver
b ← create(BCAST)
…
put(b, id, content)
…
terminate(b)
@mapper
get(b, id)
…
10
Flexibility
Coflow
[Figure: driver (JobTracker), broadcast, mappers, shuffle, reducers]
@driver
b ← create(BCAST)
s ← create(SHUFFLE, ord=[b ~> s])
put(b, id, content)
…
terminate(b)
terminate(s)
@mapper
get(b, id)
put(s, ids1)
…
11
Throughput-Sensitive
Applications
After 2 seconds
Minimize Completion Time
12
Throughput-Sensitive
Applications
After 4 seconds
After 7 seconds
After 2 seconds
Minimize Completion Time
13
Throughput-Sensitive
Applications
Free up resources without hurting application-perceived communication time
After 7 seconds
After 2 seconds
Minimize Completion Time
14
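One way to read "free up resources without hurting application-perceived communication time" is to pace every flow so it finishes together with the slowest one. The sketch below is an illustration under assumptions, not the algorithm from the talk: the function name coflow_rates, the flow sizes, and the per-flow capacity limits are hypothetical, and each flow is assumed to have its own independent capacity.

```python
def coflow_rates(sizes_mb, caps_mbps):
    """Return (finish time, per-flow rates) that finish all flows together."""
    # The earliest possible finish time is set by the bottleneck flow.
    t = max(size / cap for size, cap in zip(sizes_mb, caps_mbps))
    # Pace every flow to finish exactly at t; unused capacity is freed.
    rates = [size / t for size in sizes_mb]
    return t, rates


sizes = [200.0, 400.0, 700.0]  # MB per flow (made-up values)
caps = [100.0, 100.0, 100.0]   # MB/s available to each flow (made-up values)

t, rates = coflow_rates(sizes, caps)
freed = [cap - rate for cap, rate in zip(caps, rates)]
print(t)      # 7.0
print(freed)  # capacity each flow leaves for other traffic
```

At full speed these flows would finish after 2, 4, and 7 seconds. Pacing the first two does not change the 7 second completion time, but it leaves most of their capacity available to others.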
Latency-Sensitive Applications
HotNets 2012
Top-level Aggregator
Mid-level Aggregators
Workers
15
Latency-Sensitive Applications
HotNets 2012
[Figure: search results page for the query "HotNets 2012", including the result "Coflow accepted at HotNets'2012"]
Top-level Aggregator
Mid-level Aggregators
Workers
Meet Deadline [1, 2]
Limit impact to as few requests as possible
1. D3, SIGCOMM’2011
2. PDQ, SIGCOMM’2012
16
One More Thing…
1. Critical Path Scheduling
2. OpenTCP
3. Structured Streams
4. …
17
Coflow
A semantically-bound collection of flows
Conveys application intent to the network
» Allows better management of network resources
» Provides greater flexibility in designing applications
Mosharaf Chowdhury
http://www.mosharaf.com/
UC Berkeley
Critical Path Scheduling
Communication of a cluster application is
represented by a partially-ordered set of coflows
[Figure: partially-ordered coflows, with nodes labeled S, A, and B]
Network allocation takes place among these
partially-ordered sets of coflows
19
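The idea of a critical path through a partially-ordered set of coflows can be illustrated with a longest-path computation over a small dependency graph. This is a sketch under assumptions, not the scheduler proposed in the talk: the node names, the durations, and the function critical_path_length are hypothetical, and every coflow is assumed to start as soon as all of its predecessors finish.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def critical_path_length(duration, preds):
    """Length of the longest chain of dependent coflows, in seconds."""
    finish = {}
    # static_order() yields every node after all of its predecessors.
    for node in TopologicalSorter(preds).static_order():
        start = max((finish[p] for p in preds.get(node, ())), default=0.0)
        finish[node] = start + duration[node]
    return max(finish.values())


# Hypothetical coflows: one broadcast, two shuffles, one aggregation.
duration = {"B": 1.0, "S1": 4.0, "S2": 2.0, "A": 3.0}
# preds[x] is the set of coflows that must finish before x starts.
preds = {"B": set(), "S1": {"B"}, "S2": {"B"}, "A": {"S1", "S2"}}

print(critical_path_length(duration, preds))  # 8.0, via B, S1, A
```

In this example S2 is off the critical path, so it could run up to 2 seconds slower without delaying the application.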
Coflow API
Operation                                   Caller
create(PATTERN, [opt]) → handle             Driver
put(handle, id, content, [opt]) → result    Sender
get(handle, id, [opt]) → content            Receiver
terminate(handle, [opt]) → result           Driver
20
Throughput-Sensitive
Applications
MapReduce Framework
Data Flow
Map Stage
Reduce Stage
Local shuffle finishes
Local shuffle finishes
Shuffle finishes
Job finishes
Minimize Completion Time [1]
1. Orchestra, SIGCOMM’2011
21
Coflow Resource Allocation
1. Weights [Across Apps]
[Figure: Job 1 (mappers, shuffle1, reducers) and Job 2 (mappers, shuffle2, reducers)]
Weighted sharing between coflows
@driver
shuffle1 ← create(SHUFFLE, weight=1)
shuffle2 ← create(SHUFFLE, weight=2)
…
22
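Weighted sharing can be illustrated by splitting a shared capacity in proportion to the weights. This is a minimal sketch, assuming a single shared bottleneck and that both coflows always have data to send; the function name weighted_share and the capacity value are hypothetical.

```python
def weighted_share(capacity, weights):
    """Split capacity among coflows in proportion to their weights."""
    total = sum(weights.values())
    return {name: capacity * w / total for name, w in weights.items()}


# Weights taken from the slide; the 900 MB/s capacity is a made-up value.
alloc = weighted_share(900.0, {"shuffle1": 1, "shuffle2": 2})
print(alloc)  # {'shuffle1': 300.0, 'shuffle2': 600.0}
```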
Coflow Resource Allocation
2. Priorities [Across Apps]
[Figure: Job 1 (mappers, shuffle1, reducers) and Job 2 (mappers, shuffle2, reducers)]
Strict priorities
@driver
shuffle1 ← create(SHUFFLE, pri=3)
shuffle2 ← create(SHUFFLE, pri=5)
…
23
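Strict priorities can be sketched as serving coflows one at a time in priority order until the capacity runs out. This is an illustration under assumptions: the slide does not say whether a larger pri value means higher or lower priority, so the sketch assumes larger is higher. The function name strict_priority, the demands, and the capacity are hypothetical.

```python
def strict_priority(capacity, demands, priorities):
    """Serve coflows in priority order; ASSUMES larger pri = higher priority."""
    alloc = {}
    remaining = capacity
    for name in sorted(demands, key=lambda n: priorities[n], reverse=True):
        # A coflow gets all it asks for, or whatever is left.
        alloc[name] = min(demands[name], remaining)
        remaining -= alloc[name]
    return alloc


demands = {"shuffle1": 600.0, "shuffle2": 700.0}  # MB/s wanted (made-up)
priorities = {"shuffle1": 3, "shuffle2": 5}       # pri values from the slide
alloc = strict_priority(1000.0, demands, priorities)
print(alloc)  # {'shuffle2': 700.0, 'shuffle1': 300.0}
```

Unlike weighted sharing, the lower-priority coflow only receives what the higher-priority one leaves unused.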
Coflow Resource Allocation
3. Dependencies [Within Apps]
[Figure: Job 1 and Job 2, with driver, broadcast (b), mappers, shuffle1, shuffle2, reducers, and aggregation (agg)]
finishes_before (~>)
@driver
b ← create(BCAST)
shuffle2 ← create(SHUFFLE, ord=[b ~> shuffle2])
agg ← create(AGGR, ord=[shuffle2 ~> agg])
24
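The finishes_before (~>) relation on this slide can be sketched as a dependency graph that releases a coflow only after its predecessors are done. This is a minimal sketch, not the authors' mechanism: the dictionary encoding of ord=[...] and the waves list are hypothetical, and it uses Python's standard graphlib module only to illustrate the ordering.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# finishes_before[x] is the set of coflows that must finish before x,
# mirroring ord=[b ~> shuffle2] and ord=[shuffle2 ~> agg] from the slide.
finishes_before = {
    "b": set(),
    "shuffle2": {"b"},
    "agg": {"shuffle2"},
}

ts = TopologicalSorter(finishes_before)
ts.prepare()

waves = []  # coflows that may proceed at each step
while ts.is_active():
    ready = sorted(ts.get_ready())  # all predecessors have finished
    waves.append(ready)
    ts.done(*ready)                 # mark this wave as finished

print(waves)  # [['b'], ['shuffle2'], ['agg']]
```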
Coflow Resource Allocation
Communication of a cluster application is represented by a partially-ordered set of coflows
[Figure: partially-ordered coflows, with nodes labeled S, A, and B]
Network allocation takes place among these partially-ordered sets of coflows
25