How to Train your Dragonfly
EE382C FINAL PRESENTATION
MAY 24, 2011
HYUNGMIN CHO
ANDREW DANOWITZ
MARIO FLAJSLIK
AMIMUL IHSAN
Outline
Topology
Routing
Flow Control
Hot Spot Management
Status
Topology
Dragonfly
Minimizes expensive global communication
No more than 4 hops
Modification: each node connected to two routers
Source: Lecture 7 Notes
Topology
Assumptions
Per Node Traffic: 5GB/s/node @10%
All inter-group connections use optical cables
Routers can have up to 107 ports
Resulting design
a=26, p=27, h=4
a | p | h_r | h_g | Nodes | Groups | Cost ($) | Total Bandwidth (GB) | Required Min Global Bandwidth (GB) | Average Router Ports | Max Router Ports | Endpoint Router Ports | Router Ports | Router Ports for global connection
64 | 32 | 1 | 48 | 2048 | 49 | 127,478 | 240 | 1004 | 127 | 128 | 64 | 64 | 48
26 | 40 | 4 | 97 | 1024 | 98 | 478,485 | 485 | 507 | 105 | 106 | 80 | 26 | 97
26 | 30 | 4 | 97 | 1025 | 98 | 478,485 | 485 | 508 | 85 | 86 | 60 | 26 | 97
26 | 27 | 4 | 97 | 1026 | 98 | 478,485 | 485 | 508 | 79 | 80 | 54 | 26 | 97
26 | 13 | 4 | 97 | 1027 | 98 | 478,485 | 485 | 509 | 51 | 52 | 26 | 26 | 97
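Several entries in the design table follow simple arithmetic. The sketch below reproduces three of them under assumptions inferred from the numbers rather than stated on the slide: groups are fully connected (97 global links per group gives 98 groups), uniform random traffic sends a fraction (g - 1)/g of each node's 5 GB/s @ 10% off-group (1,026 nodes in 98 groups gives 508), and a router's average port count is 2p endpoint ports plus a - 1 local ports (a=26, p=27 gives 79). The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: every group has one global link to each other group,
// so the number of groups is h_g + 1.
constexpr int64_t groups_from_global_links(int64_t h_g) { return h_g + 1; }

// Required global bandwidth (GB/s), rounded up, assuming uniform random
// traffic: nodes * 5 GB/s * 10% * (g - 1) / g. Integer math avoids
// floating-point rounding surprises.
int64_t required_global_bw(int64_t nodes, int64_t groups) {
    assert(nodes > 0 && groups > 1);
    const int64_t num = nodes * 5 * 10 * (groups - 1);  // GB/s times percent
    const int64_t den = 100 * groups;
    return (num + den - 1) / den;  // ceiling division
}

// Assumed average router radix: 2p endpoint ports (each node attaches to
// two routers) plus a - 1 local ports to the other routers in the group.
constexpr int64_t avg_router_ports(int64_t a, int64_t p) { return 2 * p + a - 1; }
```

The same formulas also give 1004 for the first row (2,048 nodes, 49 groups) and 127 ports for a=64, p=32.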
Routing
Figure: routing decision in Group 0, with Groups 0, 1 and 2 (Routers 0, 1, 2, … on a local network in each group) connected through the global network by channels h0 to h3, and a point of potential congestion marked.
Figure modified from: Jiang, Dally, Kim: Indirect Adaptive Routing on Large Scale Interconnection Networks
Routing
UGAL-L: globally adaptive routing that chooses between:
MIN: minimal path
VAL: non-minimal path that first routes to a random intermediate group (Valiant load balancing)
Choice made based on local queue information: compare $q_{min} H_{min}$ with $q_{val} H_{val}$ (queue occupancy times hop count) and route minimally if $q_{min} H_{min} \le q_{val} H_{val}$
Problems: limited throughput and higher intermediate latency
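The UGAL-L choice above reduces to one comparison per packet. This is a minimal illustration, not the project's Dragonfly.cpp code; the function name, queue units (flits), and the optional `threshold` bias toward minimal routing are assumptions.

```cpp
#include <cassert>

enum class Path { MIN, VAL };

// q_min / q_val: local output-queue occupancy toward each path's first hop.
// h_min / h_val: hop counts of the minimal and Valiant paths.
// threshold: hypothetical bias term favouring the minimal path (0 = none).
Path ugal_l_choose(int q_min, int h_min, int q_val, int h_val, int threshold = 0) {
    assert(h_min > 0 && h_val >= h_min);  // a Valiant path is never shorter
    // Queue length times hop count estimates the delay along each path.
    return (q_min * h_min <= q_val * h_val + threshold) ? Path::MIN : Path::VAL;
}
```

Because only local queues are visible, congestion further along the minimal path is invisible to this comparison, which is the source of the throughput and latency problems listed above.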
Routing
Problem: limited throughput due to imperfect load balancing in UGAL-L
UGAL-L never routes non-minimally through the same router that is used for minimal routing
Solution: UGAL-L with selective virtual-channel discrimination
Figure modified from: John Kim, William J. Dally, Steve Scott, and Dennis Abts. 2008. Technology-Driven, Highly-Scalable Dragonfly Topology.
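Selective VC discrimination can be sketched as follows, assuming the hybrid rule as we read it from Kim et al.: when the minimal and non-minimal candidates leave through the same output port, compare only the VCs each path would use; otherwise compare whole-port occupancy. `OutputPort` and `route_minimal` are hypothetical names, not BookSim classes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical view of one output port: flits queued in each of its VCs.
struct OutputPort {
    std::vector<int> vc_occupancy;
    int total() const {
        int sum = 0;
        for (int q : vc_occupancy) sum += q;
        return sum;
    }
};

// Returns true to route minimally. If both paths share an output port, the
// port total is the same for both and cannot separate them, so the VCs each
// path would use are compared instead; otherwise port totals are compared.
bool route_minimal(const std::vector<OutputPort>& ports,
                   std::size_t min_port, std::size_t min_vc, int h_min,
                   std::size_t val_port, std::size_t val_vc, int h_val) {
    assert(min_port < ports.size() && val_port < ports.size());
    int q_min = 0;
    int q_val = 0;
    if (min_port == val_port) {
        q_min = ports[min_port].vc_occupancy.at(min_vc);
        q_val = ports[val_port].vc_occupancy.at(val_vc);
    } else {
        q_min = ports[min_port].total();
        q_val = ports[val_port].total();
    }
    return q_min * h_min <= q_val * h_val;
}
```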
Routing
Problem: high intermediate latency, because buffers must fill up before congestion is sensed
Buffers still need to be sized correctly to achieve maximum throughput
Solution: use credit round-trip latency to sense and signal congestion
Figure from: John Kim, William J. Dally, Steve Scott, and Dennis Abts. 2008. Technology-Driven, Highly-Scalable Dragonfly Topology.
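A minimal sketch of measuring credit round-trip latency per VC and turning it into a congestion signal. Assumptions: credits for a VC come back in the order its flits were sent (as with a FIFO VC buffer), the uncongested round-trip time is known in advance, and excess latency is signalled upstream by delaying returned credits by that excess. `CreditRoundTrip` and its methods are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

class CreditRoundTrip {
public:
    explicit CreditRoundTrip(int64_t uncongested_rtt) : base_rtt_(uncongested_rtt) {}

    // Timestamp each flit sent to the downstream VC.
    void on_flit_sent(int64_t now) { sent_times_.push_back(now); }

    // A credit returned: pair it with the oldest outstanding flit.
    int64_t on_credit_received(int64_t now) {
        assert(!sent_times_.empty());
        last_rtt_ = now - sent_times_.front();
        sent_times_.pop_front();
        return last_rtt_;
    }

    // Extra delay before returning credits upstream, so upstream routers
    // sense congestion without waiting for their own buffers to fill.
    int64_t credit_delay() const {
        return last_rtt_ > base_rtt_ ? last_rtt_ - base_rtt_ : 0;
    }

private:
    int64_t base_rtt_;
    int64_t last_rtt_ = 0;
    std::deque<int64_t> sent_times_;
};
```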
Flow Control
Basic virtual-channel flow control with credit-based backpressure
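Credit-based backpressure keeps one counter per downstream VC buffer on the sending side. This sketch is illustrative; `CreditCounter` is a hypothetical name, and the buffer depth would be the VC size in flits.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class CreditCounter {
public:
    // Each counter starts at the downstream VC buffer depth (free slots).
    CreditCounter(std::size_t num_vcs, int buffer_depth)
        : credits_(num_vcs, buffer_depth), depth_(buffer_depth) {}

    bool can_send(std::size_t vc) const { return credits_.at(vc) > 0; }

    // Sending a flit consumes one downstream buffer slot.
    void send_flit(std::size_t vc) {
        assert(can_send(vc));
        --credits_.at(vc);
    }

    // The downstream router freed a slot and returned a credit.
    void receive_credit(std::size_t vc) {
        assert(credits_.at(vc) < depth_);
        ++credits_.at(vc);
    }

    int credits(std::size_t vc) const { return credits_.at(vc); }

private:
    std::vector<int> credits_;
    int depth_;
};
```

With 6 VCs of 256 flits, the counters would be created as `CreditCounter(6, 256)`; a VC with zero credits stalls until the downstream router returns one.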
Virtual Channel Flow Control
6 VCs
3 for standard traffic
3 for hotspot traffic
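One way to lay out the 6 VCs is as two disjoint blocks of 3, one per traffic class. The per-class index here is a hypothetical routing "stage" (0 to 2); how the project actually picks the VC within a class is not stated on the slide.

```cpp
#include <cassert>

enum class TrafficClass { Standard = 0, Hotspot = 1 };

constexpr int kVcsPerClass = 3;            // 3 VCs per class, as on the slide
constexpr int kNumVcs = 2 * kVcsPerClass;  // 6 VCs in total

// The class selects a block of VCs and the (hypothetical) stage selects a VC
// inside it, so standard and hotspot packets never share a VC buffer.
int select_vc(TrafficClass cls, int stage) {
    assert(stage >= 0 && stage < kVcsPerClass);
    const int vc = static_cast<int>(cls) * kVcsPerClass + stage;
    assert(vc < kNumVcs);
    return vc;
}
```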
Exploring Packet Sizes
Running simulations with different packet sizes
Hotspot Management
Tree saturation problem
Worse with more path diversity
Non-interfering networks
Separate VCs for hotspot and non-hotspot traffic
Figure taken from: EE382C: Lecture 15 slides
In this project, hotspot traffic is easily distinguished and hotspot nodes are assigned statically:
Class separation
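With statically assigned hotspots, class separation needs only a lookup on the packet's destination. A minimal sketch; the class name and node IDs are hypothetical.

```cpp
#include <cassert>
#include <unordered_set>
#include <utility>

class StaticHotspotClassifier {
public:
    explicit StaticHotspotClassifier(std::unordered_set<int> hotspots)
        : hotspots_(std::move(hotspots)) {}

    // A packet belongs to the hotspot class iff its destination is a hotspot.
    bool is_hotspot_traffic(int dest) const {
        assert(dest >= 0);
        return hotspots_.count(dest) > 0;
    }

private:
    std::unordered_set<int> hotspots_;
};
```

The result would then choose which VC set the packet may use.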
Hotspot Management
Dynamic hotspot detection
Still use class separation to manage hotspots
Statically assigned (or slowly changing) hotspot nodes
Detect hotspots at last-hop routers (by counting packets) and propagate the information through the network
Inspect queues for multiple packets going to the same destination, which is then likely to be a hotspot
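Packet counting at a last-hop router can be sketched with a fixed window and a threshold. The window length, threshold, and `LastHopHotspotDetector` are hypothetical; propagating the detected hotspots through the network is not shown.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

class LastHopHotspotDetector {
public:
    LastHopHotspotDetector(int64_t window_cycles, int threshold)
        : window_(window_cycles), threshold_(threshold) {
        assert(window_cycles > 0 && threshold > 0);
    }

    // Record a packet ejected to `dest` at cycle `now`. Returns the hotspots
    // found in the window that just closed (usually empty), sorted by node ID.
    std::vector<int> on_packet(int dest, int64_t now) {
        std::vector<int> detected;
        if (now >= window_start_ + window_) {
            for (const auto& [node, count] : counts_)
                if (count >= threshold_) detected.push_back(node);
            std::sort(detected.begin(), detected.end());
            counts_.clear();
            window_start_ = now;  // the next window starts at this packet
        }
        ++counts_[dest];
        return detected;
    }

private:
    int64_t window_;
    int threshold_;
    int64_t window_start_ = 0;
    std::unordered_map<int, int> counts_;
};
```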
Fast-changing hotspot nodes
Assumption: traffic to a node spikes after it becomes a hotspot
Detect spikes by counting packets and looking for per-destination peaks
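For fast-changing hotspots, one way to find per-destination peaks is to compare each interval's count with an exponentially weighted moving average (EWMA) of earlier intervals. This is a swapped-in illustrative technique, not the project's detector; `SpikeDetector` and its parameters are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <unordered_map>
#include <vector>

class SpikeDetector {
public:
    SpikeDetector(double alpha, double spike_factor, int min_count)
        : alpha_(alpha), factor_(spike_factor), min_count_(min_count) {
        assert(alpha > 0.0 && alpha <= 1.0 && spike_factor > 1.0);
    }

    void count_packet(int dest) { ++current_[dest]; }

    // Ends the sampling interval; returns destinations that spiked, sorted.
    std::vector<int> end_interval() {
        std::vector<int> spikes;
        for (auto& [dest, count] : current_) {
            double& avg = average_[dest];  // 0.0 for a newly seen destination
            // min_count_ keeps small counts from being flagged as spikes.
            if (count >= min_count_ && count > factor_ * avg) spikes.push_back(dest);
            avg = alpha_ * count + (1.0 - alpha_) * avg;
            count = 0;  // keep the key so idle intervals also lower the average
        }
        std::sort(spikes.begin(), spikes.end());
        return spikes;
    }

private:
    double alpha_;
    double factor_;
    int min_count_;
    std::unordered_map<int, int> current_;
    std::unordered_map<int, double> average_;
};
```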
Use more virtual channels
One VC per destination would solve the problem, but is impractical
Use higher-level QoS to perform class separation
Status
Bugs squashed to date: 2
Topology
Routing
Flow Control
Status: Topology
In progress
Changing:
Routers per group no longer 2a
Number of groups no longer a*p+1
Each node connected to two routers
Downsized network of 1,024 nodes
Status: Traffic Pattern
4 kinds of traffic patterns to implement
3 patterns complete
bit-reversal traffic pending
Requires the number of nodes to be a power of 2
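Bit-reversal traffic sends node s to the node whose index is s with its log2(N) address bits reversed, which is why N must be a power of 2. A minimal sketch; `bit_reverse_dest` is a hypothetical name, not a TrafficManager method.

```cpp
#include <cassert>

int bit_reverse_dest(int src, int num_nodes) {
    // A power of 2 guarantees every reversed index is a valid node.
    assert(num_nodes > 0 && (num_nodes & (num_nodes - 1)) == 0);
    assert(src >= 0 && src < num_nodes);
    int bits = 0;
    while ((1 << bits) < num_nodes) ++bits;  // bits = log2(num_nodes)
    int dest = 0;
    for (int i = 0; i < bits; ++i)
        if (src & (1 << i)) dest |= 1 << (bits - 1 - i);
    return dest;
}
```

For the downsized 1,024-node network, node 1 sends to node 512 and node 3 to node 768.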
Iterations of 30 request-reply pairs
TrafficManager class has been modified extensively
Status: Routing
UGAL-L algorithm
Default function in Dragonfly.cpp
Minimal routing works on uniform traffic
Working on UGAL-LCR
Credit mechanism needs to be changed
Status: Flow Control
VC buffer size: 256 flits
Non-interfering networks
Separate VC set for the hotspot traffic class
3 VCs are dedicated to hotspot traffic
Exclusively for hotspot traffic
Divide the messages into packets
Started request and reply sizes at {10,10,10}
Increasing the sizes to {20,20,20}, etc.
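Dividing a message into packets amounts to splitting its length by a maximum packet size. A minimal sketch, assuming sizes are counted in flits and the last packet may be shorter; `packetize` is hypothetical and does not model what the {10,10,10} triples encode.

```cpp
#include <cassert>
#include <vector>

// Split a message of `message_flits` flits into packets of at most
// `max_packet_flits` flits each; returns the size of every packet.
std::vector<int> packetize(int message_flits, int max_packet_flits) {
    assert(message_flits > 0 && max_packet_flits > 0);
    std::vector<int> packets;
    for (int left = message_flits; left > 0; left -= max_packet_flits)
        packets.push_back(left < max_packet_flits ? left : max_packet_flits);
    return packets;
}
```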
Questions