Your Data Center Is a Router:
The Case for Reconfigurable Optical Circuit Switched Paths
Guohui Wang (3), David G. Andersen (2), Michael Kaminsky (1), Michael Kozuch (1), T. S. Eugene Ng (3), Dina Papagiannaki (1), Madeleine Glick (1), and Lily Mummert (1)
1. Intel Labs Pittsburgh
2. Carnegie Mellon University
3. Rice University
1
Data Center Network
 Today’s Data Center Network
[Figure: today's tree-structured data center network with core, end-of-row, and top-of-rack switches. Picture from: James Hamilton, Architecture for Modular Data Centers]
 Data-intensive applications are hitting bandwidth bottlenecks in tree-structured data center networks.
 E.g., video data processing, MapReduce, …
2
Full bisection bandwidth solutions
 Re-structure the data center network to provide full bisection bandwidth among all servers.
 The resulting network structures are complicated, and hard to construct and expand.
[Figure: Tree, FatTree, and BCube topologies. Picture from: Ken Hall, Green Data Centers]
3
Full bisection bandwidth may not be necessary
 Spatial Traffic Locality
– Nodes communicate with only a small number of partners.
– e.g., earthquake simulation
 Temporal Traffic Locality
– Applications may be bound by CPU, disk I/O, or synchronization.
– e.g., MapReduce
 Many measurement studies have reported evidence of traffic locality.
– [SC05][WREN09][IMC09][HotNets09]
Full bisection bandwidth solutions provide more capacity than applications need, at high cost.
4
An alternative design: hybrid data center network
[Figure: racks A-F attached to both an electrical packet-switched network and an optical circuit-switched network]
 A hybrid network may give us the best of both worlds:
– Optical circuit-switched paths for data-intensive transfers.
– Electrical packet-switched paths for timely delivery.
5
Optical Circuit Switching
 MEMS Optical Switching Module
Picture from: http://www.ntt.co.jp/milab/en/project/pr05_3Dmems.html
Rate-transparent switching: the switch passes whatever data rate is modulated on its input/output ports.
Up to tens of ms of physical reconfiguration time.
6
Optical Channels
 Ultra-high bandwidth: 40 Gbps and 100 Gbps technology has been developed; 15.5 Tbps over a single fiber!
 Dropping prices
[Chart: Price of Optical Transceivers; normalized cost vs. year (2000-2012) for OC-192 10 Gbps (Ovum RHK), OC-192 10 Gbps (LightCounting), and OC-768 40 Gbps VSR. Price data from: Joe Berthold, Hot Interconnects'09]
7
Optical circuits in datacenters
[Figure: racks A-F interconnected by an optical circuit switch; example circuit configurations over successive intervals: A-E, B-D, C-F; A-D, B-E, C-F; A-F, B-E, C-D]
 Advantages:
– Simple and flexible: easy to construct, expand, and manage
– Ultra-high bandwidth
– Low power
 Disadvantages:
– Fat pipes are not all-to-all.
– Reconfiguration overhead
8
Research questions
• Is there enough traffic locality in data centers to leverage optical paths?
• Can optical paths be reconfigured fast enough to follow dynamic traffic?
• How to integrate optical circuits into data centers at low cost?
• How to manage and leverage the optical paths?
• How do applications behave over the hybrid network?
9
Is there enough traffic locality?
 Analyzing a production data center traffic trace:
– 7 racks, 155 servers, 1060 cores
– One week of NetFlow traces collected at all servers
– Traffic matrices computed over consecutive 10 s intervals.
– Configure the 3 optical paths, out of 21 total cross-rack paths, that carry the maximum optical traffic; reconfigure every 10 s.
[Chart: average fraction of traffic carried on the optical paths (y-axis 0-0.7): real, skewed traffic vs. evenly distributed traffic]
Traffic locality: a few optical paths have the potential to offload a significant amount of traffic from the electrical network (a sketch of this computation follows).
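To make the offload estimate concrete, here is a minimal sketch, under stated assumptions, of how the optical-traffic fraction for one 10 s traffic matrix might be computed. The function name, the greedy top-3 selection, and the synthetic Pareto-distributed matrix are illustrative; the slides do not specify the exact procedure, and real circuits must additionally form a matching (next slide).

```python
import itertools
import numpy as np

def optical_offload_fraction(tm, num_paths=3):
    """Fraction of cross-rack traffic that num_paths optical circuits could carry.

    tm: R x R traffic matrix (bytes exchanged between racks) over one 10 s interval.
    """
    racks = tm.shape[0]
    # One optical circuit serves both directions of a rack pair.
    pair_volume = {
        (x, y): tm[x, y] + tm[y, x]
        for x, y in itertools.combinations(range(racks), 2)
    }
    total = sum(pair_volume.values())
    if total == 0:
        return 0.0
    # Greedy top-k selection; a simplification of the matching used on the next slide.
    top = sorted(pair_volume.values(), reverse=True)[:num_paths]
    return sum(top) / total

# 7 racks as in the trace (21 cross-rack pairs); a skewed random matrix stands in
# for the real NetFlow data, which is not available here.
rng = np.random.default_rng(0)
tm = rng.pareto(1.5, size=(7, 7)) * 1e6
np.fill_diagonal(tm, 0)
print(f"offload fraction: {optical_offload_fraction(tm):.2f}")
```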
10
Can optical paths be reconfigured fast enough?
- Optical Path Configuration Algorithm
[Figure: cross-rack traffic matrix for racks R1-R8 mapped onto a weighted graph G]
Graph G = (V, E): one vertex per rack; edge weight w_xy = vol(R_x, R_y) + vol(R_y, R_x).
Optical path configuration is a maximum weight perfect matching on graph G, which is solved in polynomial time by Edmonds' algorithm [1] (a sketch follows the reference below).
[1] J. Edmonds, "Paths, trees and flowers," Canadian Journal of Mathematics, pp. 449-467, 1965.
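As a concrete illustration of this step, the following is a minimal sketch, assuming traffic volumes are already aggregated per rack pair, that computes the circuit configuration with the NetworkX implementation of Edmonds' blossom algorithm. The function name and the dictionary layout are illustrative, not part of the system described.

```python
import networkx as nx

def configure_optical_paths(volume):
    """Choose optical circuits as a maximum weight matching over rack-pair traffic.

    volume[(x, y)]: bytes sent from rack x to rack y over the last interval.
    Returns a set of rack pairs, each to be connected by one optical circuit.
    """
    G = nx.Graph()
    for (x, y), v in volume.items():
        if x == y:
            continue
        # w_xy = vol(R_x, R_y) + vol(R_y, R_x): the undirected edge accumulates
        # both directions of traffic between a rack pair.
        if G.has_edge(x, y):
            G[x][y]["weight"] += v
        else:
            G.add_edge(x, y, weight=v)
    # NetworkX's max_weight_matching implements Edmonds' blossom algorithm;
    # maxcardinality=True prefers a perfect matching when one exists.
    return nx.max_weight_matching(G, maxcardinality=True)

# Example with four racks.
vol = {("R1", "R2"): 50, ("R2", "R1"): 10, ("R1", "R4"): 5,
       ("R3", "R4"): 40, ("R2", "R3"): 8}
print(configure_optical_paths(vol))  # e.g. {('R1', 'R2'), ('R4', 'R3')}
```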
11
Can optical paths be reconfigured fast enough?
- Optical Path Configuration Time
 Several time factors:
– Computation time: 640 ms for a 1000-rack data center using Edmonds' algorithm.
– Signaling time: < 1 ms in data centers.
– Physical reconfiguration time: up to tens of ms for MEMS optical switches.
Even in very large data centers, optical paths can
still be reconfigured at small time scales (< 1 sec).
12
How to manage optical paths in data centers?
 Routing over a dynamic dual-path (electrical/optical) network:
• Ethernet Spanning Tree?
– No: one of the two paths would be blocked.
• Link-state routing?
– No: long routing convergence time after each reconfiguration.
13
How to manage optical paths in data centers?
 VLAN-based dual-path routing:
– VLAN 1: electrical paths
– VLAN 2: optical paths
• Advantages:
– Leverage both electrical and optical paths by tagging packets (sketched below)
– No route convergence delay after optical reconfiguration
– No need to modify switches
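To show how the tagging decision might look on a server, here is a minimal sketch under stated assumptions: the VLAN IDs, the current_circuits set, and the vlan_for helper are illustrative names, not part of the described system.

```python
# Hypothetical VLAN IDs: VLAN 1 is the always-on electrical network,
# VLAN 2 carries traffic only while a matching optical circuit is configured.
ELECTRICAL_VLAN = 1
OPTICAL_VLAN = 2

current_circuits = set()   # rack pairs with a live optical circuit

def vlan_for(src_rack: str, dst_rack: str) -> int:
    """Tag traffic for the optical VLAN only if a circuit currently connects
    the two racks; everything else stays on the electrical VLAN."""
    if (src_rack, dst_rack) in current_circuits or (dst_rack, src_rack) in current_circuits:
        return OPTICAL_VLAN
    return ELECTRICAL_VLAN

# After a reconfiguration the manager simply replaces current_circuits; no
# routing protocol has to reconverge, which is the point of the VLAN design.
current_circuits = {("R1", "R2"), ("R3", "R4")}
print(vlan_for("R1", "R2"), vlan_for("R1", "R3"))  # -> 2 1
```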
14
How to manage optical paths in data centers?
 How to measure application traffic demand?
 Extensive buffering at servers:
– Traffic demand measurement
– Aggregate traffic and batch it for optical transfer
 Per-rack virtual output queuing:
– Avoids head-of-line blocking (sketched after the figure note below)
[Figure: server stack with per-rack virtual output queues and a scheduler sitting between the applications (user/kernel) and the network interface]
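The following minimal sketch, with illustrative class and method names, shows the per-rack virtual output queuing idea: one queue per destination rack avoids head-of-line blocking, queue lengths double as the traffic-demand measurement, and a batch is drained when an optical circuit to that rack comes up.

```python
import collections

class PerRackVOQ:
    """Illustrative server-side per-rack virtual output queues."""

    def __init__(self):
        self.queues = collections.defaultdict(collections.deque)

    def enqueue(self, dst_rack, packet):
        # Buffering per destination rack: a backlog toward one rack never
        # blocks traffic headed to a different rack.
        self.queues[dst_rack].append(packet)

    def demand(self):
        # Reported to the configuration manager as this server's traffic demand.
        return {rack: len(q) for rack, q in self.queues.items()}

    def drain(self, dst_rack, batch_size):
        # Called by the scheduler once an optical circuit to dst_rack is live.
        q = self.queues[dst_rack]
        return [q.popleft() for _ in range(min(batch_size, len(q)))]
```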
15
How to manage optical paths in data centers?
 How to configure optical paths and schedule traffic to them?
 A centralized configuration manager controls the optical path configuration.
 A configurable virtual output queue scheduler controls which traffic is sent over the optical paths (see the control-loop sketch below).
[Figure: per-server stats daemons and VOQ schedulers report traffic statistics to the centralized configuration manager, which pushes configuration to the server schedulers, the optical circuit switch, and the VLAN settings on the electrical switches]
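A minimal sketch of the manager's control loop, tying the pieces together; the callables stand in for RPCs the real system would need, and the 10 s interval is taken from the trace analysis rather than a stated design parameter.

```python
import time

RECONFIG_INTERVAL_S = 10  # assumed interval, matching the 10 s traffic matrices

def control_loop(collect_demands, compute_matching, reconfigure_switch, push_voq_config):
    """Hypothetical centralized manager loop; all four arguments are callables
    supplied by the deployment (stats collection, matching, switch and VOQ RPCs)."""
    while True:
        # 1. Gather per-rack traffic demands from every server's stats daemon.
        volume = collect_demands()            # {(src_rack, dst_rack): bytes}
        # 2. Compute the new circuits as a maximum weight matching (slide 11).
        circuits = compute_matching(volume)
        # 3. Push the optical switch configuration and the VOQ scheduler config
        #    so matched traffic is tagged onto the optical VLAN (slide 14).
        reconfigure_switch(circuits)
        push_voq_config(circuits)
        time.sleep(RECONFIG_INTERVAL_S)
```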
16
Challenges
• TCP/IP reacting to optical path reconfiguration.
• Potential long delays caused by extensive queuing at servers.
• Collecting traffic demand from a million servers.
• Choosing the right buffer sizes and reconfiguration intervals.
17
Summary
 Adding optical circuit-switched paths to data centers.
 Potential benefits:
• A simpler, more flexible data center network design.
• Relieving data-intensive applications of network bottlenecks.
18