Networks: Routing and Design CS 258, Spring 99 David E. Culler Computer Science Division U.C.

Networks: Routing and Design
CS 258, Spring 99
David E. Culler
Computer Science Division
U.C. Berkeley
Outline
• Routing
• Switch Design
• Flow Control
• Case Studies
11/6/2015
CS258 S99
Routing
• Recall: routing algorithm determines
– which of the possible paths are used as routes
– how the route is determined
– R: N x N -> C, which at each switch maps the destination node n_d to the next channel on the route
• Issues:
– Routing mechanism
» arithmetic
» source-based port select
» table driven
» general computation
– Properties of the routes
– Deadlock free
Routing Mechanism
• need to select output port for each input packet
– in a few cycles
• Simple arithmetic in regular topologies
– ex: Dx, Dy routing in a grid
» west (-x): Dx < 0
» east (+x): Dx > 0
» south (-y): Dx = 0, Dy < 0
» north (+y): Dx = 0, Dy > 0
» processor: Dx = 0, Dy = 0
• Reduce relative address of each dimension in order
– Dimension-order routing in k-ary d-cubes
– e-cube routing in n-cube
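The Dx, Dy port selection above can be sketched in a few lines of Python. This is only an illustration of the arithmetic rule on the slide; the function names, the tuple coordinates, and the `MOVES` table are assumptions made for the example, not part of the lecture.

```python
# Hypothetical helper names; coordinates are (x, y) tuples.
MOVES = {"west": (-1, 0), "east": (1, 0), "south": (0, -1), "north": (0, 1)}

def xy_route_step(cur, dest):
    """Return the output port chosen at switch `cur` for a packet headed to `dest`."""
    dx = dest[0] - cur[0]
    dy = dest[1] - cur[1]
    if dx < 0:
        return "west"
    if dx > 0:
        return "east"
    if dy < 0:          # only reached when dx == 0
        return "south"
    if dy > 0:
        return "north"
    return "processor"  # dx == 0 and dy == 0: deliver locally

def xy_route(src, dest):
    """List the ports taken from src to dest, ending with delivery to the processor."""
    cur, ports = src, []
    while True:
        port = xy_route_step(cur, dest)
        ports.append(port)
        if port == "processor":
            return ports
        step = MOVES[port]
        cur = (cur[0] + step[0], cur[1] + step[1])
```

For instance, `xy_route((0, 0), (2, 1))` reduces the x offset to zero before touching y, which is the dimension-order property.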
Routing Mechanism (cont)
• Source-based
– message header carries series of port selects
– used and stripped en route
– CRC? Packet Format?
– CS-2, Myrinet, MIT Arctic
[Figure: message header holding port selects P3, P2, P1, P0]
• Table-driven
– message header carries index for next port at next switch
» o = R[i]
– table also gives index for following hop
» o, i' = R[i]
– ATM, HPPI
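The two mechanisms can be contrasted with a small Python sketch. It is a toy model under stated assumptions: headers are plain lists, each switch's table is a dict mapping an index i to a pair (o, i'), and `None` marks the last hop. None of these representations come from the slides or from any of the named networks.

```python
def source_route(header_ports, hops):
    """Source-based: each switch uses and strips the first port select in the header.

    Returns (ports taken so far, header remaining after `hops` switches).
    """
    header = list(header_ports)
    taken = []
    for _ in range(hops):
        taken.append(header.pop(0))  # used and stripped en route
    return taken, header

def table_route(tables, first_index):
    """Table-driven: at each switch, o, i' = R[i]; i' indexes the next switch's table."""
    outputs = []
    index = first_index
    for table in tables:
        port, index = table[index]
        outputs.append(port)
        if index is None:  # assumed end-of-route marker
            break
    return outputs
```

With source routing the header shrinks at every hop, while with table lookup the header keeps a fixed size and the per-switch tables hold the route.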
Properties of Routing Algorithms
• Deterministic
– route determined by (source, dest), not intermediate state (i.e. traffic)
• Adaptive
– route influenced by traffic along the way
• Minimal
– only selects shortest paths
• Deadlock free
– no traffic pattern can lead to a situation where no packets move forward
Deadlock Freedom
• How can it arise?
– necessary conditions:
» shared resource
» incrementally allocated
» non-preemptible
– think of a channel as a shared resource that is acquired incrementally
» source buffer then dest. buffer
» channels along a route
• How do you avoid it?
– constrain how channel resources are allocated
– ex: dimension order
• How do you prove that a routing algorithm is deadlock free?
Proof Technique
• resources are logically associated with channels
• messages introduce dependences between resources as they move forward
• need to articulate the possible dependences that can arise between channels
• show that there are no cycles in the Channel Dependence Graph
– find a numbering of channel resources such that every legal route follows a monotonic sequence
• => no traffic pattern can lead to deadlock
• network need not be acyclic, only the channel dependence graph
Example: k-ary 2D array
• Thm: x,y routing is deadlock free
• Numbering
– +x channel (i,y) -> (i+1,y) gets i
– similarly for -x with 0 as most positive edge
– +y channel (x,j) -> (x,j+1) gets N+j
– similarly for -y channels
• any routing sequence: x direction, turn, y direction is increasing
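The numbering argument can be checked by brute force on a small array. The sketch below assumes N = k*k as the offset for y channels and reads "similarly" for the -x and -y channels as the mirrored numbering (the channel leaving the most positive node gets the smallest number). Both readings, and all function names, are assumptions made for this example.

```python
from itertools import product

def channel_number(src, dst, k):
    """Number the channel src -> dst between neighbors in a k x k array."""
    n = k * k                      # assumed value of N
    (x0, y0), (x1, y1) = src, dst
    if x1 == x0 + 1:               # +x: channel (i,y) -> (i+1,y) gets i
        return x0
    if x1 == x0 - 1:               # -x: mirrored, most positive edge gets 0
        return (k - 1) - x0
    if y1 == y0 + 1:               # +y: channel (x,j) -> (x,j+1) gets N+j
        return n + y0
    return n + (k - 1) - y0        # -y: mirrored

def xy_path(src, dst):
    """Nodes visited by x,y routing: finish the x dimension, then y."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path

def all_routes_increasing(k):
    """True if every x,y route uses channels in strictly increasing number."""
    nodes = list(product(range(k), repeat=2))
    for src, dst in product(nodes, repeat=2):
        path = xy_path(src, dst)
        nums = [channel_number(a, b, k) for a, b in zip(path, path[1:])]
        if any(a >= b for a, b in zip(nums, nums[1:])):
            return False
    return True
```

Because every route climbs through the numbering, no set of packets can wait on each other in a cycle, which is the content of the theorem.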
[Figure: 4 x 4 array with nodes labeled 00 to 33 and each channel labeled with its number]
Channel Dependence Graph
[Figure: channel dependence graph for the 4 x 4 array, one node per numbered channel]
More examples:
• Why is the obvious routing on X deadlock free?
– butterfly?
– tree?
– fat tree?
• Any assumptions about routing mechanism? amount of buffering?
• What about wormhole routing on a ring?
[Figure: 8-node ring, nodes 0 to 7]
Deadlock free wormhole networks?
• Basic dimension order routing techniques don’t work for k-ary d-cubes
– only for k-ary d-arrays (bi-directional)
• Idea: add channels!
– provide multiple “virtual channels” to break the dependence cycle
– good for BW too!
– Do not need to add links, or xbar, only buffer resources
[Figure: switch with input ports, cross-bar, and output ports]
• This adds nodes to the CDG; does it remove edges?
Breaking deadlock with virtual channels
Packet switches from lo to hi channel
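A small experiment makes the lo-to-hi idea concrete. The sketch assumes a unidirectional k-node ring with wormhole routing, so a packet holds one channel while requesting the next. The rule that a packet moves to the hi channel right after crossing the wraparound link (node k-1 to node 0) is one common way to place the switch point and is an assumption here, as are the function names.

```python
def ring_dependences(k, use_virtual_channels):
    """Edges of the channel dependence graph for a unidirectional k-node ring.

    A resource is (link, vc); link i goes from node i to node (i + 1) % k.
    """
    edges = set()
    for src in range(k):
        for dst in range(k):
            if src == dst:
                continue
            held = []
            node, vc = src, 0                  # every packet starts on the lo channel
            while node != dst:
                held.append((node, vc))
                if use_virtual_channels and node == k - 1:
                    vc = 1                     # crossed the wraparound link: lo -> hi
                node = (node + 1) % k
            edges.update(zip(held, held[1:]))  # each channel waits on the next one
    return edges

def has_cycle(edges):
    """Kahn's algorithm: the graph has a cycle iff some vertex is never released."""
    succ, indeg = {}, {}
    for a, b in edges:
        succ.setdefault(a, []).append(b)
        succ.setdefault(b, [])
        indeg[b] = indeg.get(b, 0) + 1
        indeg.setdefault(a, 0)
    ready = [v for v, d in indeg.items() if d == 0]
    removed = 0
    while ready:
        v = ready.pop()
        removed += 1
        for w in succ[v]:
            indeg[w] -= 1
            if indeg[w] == 0:
                ready.append(w)
    return removed != len(indeg)
```

With a single channel per link the dependences close into a ring. With two virtual channels the graph gains nodes but the edge from the last lo channel back to the first lo channel disappears, so it becomes acyclic.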
Up*-Down* routing
• Given any bidirectional network
• Construct a spanning tree
• Number the nodes increasing from leaves to root
• UP edges increase node numbers
• Any Source -> Dest by UP*-DOWN* route
– up edges, single turn, down edges
• Performance?
– Some numberings and routes much better than others
– interacts with topology in strange ways
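The construction can be sketched as follows. The spanning tree is taken to be a breadth-first tree from a chosen root, and nodes are numbered by decreasing depth so that the root gets the largest number. Both choices are assumptions for this example, since the slide leaves the tree and the numbering open. The route search explores (node, has-turned-down) states so that an up edge is never taken after a down edge.

```python
from collections import deque

def number_nodes(adj, root):
    """BFS depths from root, then number nodes so deeper nodes get smaller numbers."""
    depth = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in depth:
                depth[v] = depth[u] + 1
                queue.append(v)
    order = sorted(depth, key=lambda n: (-depth[n], n))
    return {node: i for i, node in enumerate(order)}

def up_down_route(adj, num, src, dst):
    """Shortest route made of zero or more up edges followed by zero or more down edges."""
    start = (src, False)              # (node, has the route already turned downward?)
    parent = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        node, went_down = state
        if node == dst:
            path = []
            while state is not None:
                path.append(state[0])
                state = parent[state]
            return path[::-1]
        for nxt in adj[node]:
            going_up = num[nxt] > num[node]
            if going_up and went_down:
                continue              # a down -> up turn is forbidden
            new_state = (nxt, went_down or not going_up)
            if new_state not in parent:
                parent[new_state] = state
                queue.append(new_state)
    return None
```

On a 4-node ring `{0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}` rooted at node 0, the route from 1 to 3 goes through the root because the alternative through node 2 would turn from down to up.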
Turn Restrictions in X,Y
[Figure: the turns among the +X, -X, +Y, -Y directions]
• XY routing forbids 4 of 8 turns and leaves no room for adaptive routing
• Can you allow more turns and still be deadlock free?
Minimal turn restrictions in 2D
[Figure: turn restrictions on the +x, -x, +y, -y axes for three models: west-first, north-last, negative-first]
Example legal west-first routes
• Can route around failures or congestion
• Can combine turn restrictions with virtual channels
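A legality check for west-first routes is short enough to write out. The sketch assumes a route is given as a list of hop directions `"N"`, `"S"`, `"E"`, `"W"` (a representation chosen for the example) and encodes the west-first rule as: once a packet has moved in any other direction, it may not head west again.

```python
def is_west_first(hops):
    """True if no hop heads west after a hop in any other direction."""
    left_west = False
    for hop in hops:
        if hop == "W":
            if left_west:
                return False   # turning back toward the west is forbidden
        else:
            left_west = True
    return True
```

A non-minimal detour such as `["E", "N", "E", "S"]` is still legal, which is how west-first routing can steer around a failed or congested link.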
Adaptive Routing
• R: C x N x S -> C
• Essential for fault tolerance
– at least multipath
• Can improve utilization of the network
• Simple deterministic algorithms easily run into bad permutations
• fully/partially adaptive, minimal/non-minimal
• can introduce complexity or anomalies
• little adaptation goes a long way!
Switch Design
[Figure: switch organization: input ports, receiver, input buffer, cross-bar, output buffer, transmitter, output ports; control block for routing and scheduling]
How do you build a crossbar?
[Figure: crossbar designs connecting inputs I0 to I3 with outputs O0 to O3; one variant uses a RAM with phase, addr, Din, and Dout]
Input buffered switch
[Figure: input ports with routing logic R0 to R3 feeding a cross-bar to the output ports, with a scheduling block]
• Independent routing logic per input
– FSM
• Scheduler logic arbitrates each output
– priority, FIFO, random
• Head-of-line blocking problem
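Head-of-line blocking shows up even in a tiny model. The sketch below assumes one FIFO per input, one packet accepted per output per cycle, and a static priority for the lowest input index. The function name and the list-of-lists representation are made up for the example.

```python
from collections import deque

def drain_cycles(input_queues):
    """Cycles needed to drain FIFO input queues through the switch.

    Each queue lists the output port requested by each packet, head first.
    Only the head of each queue competes; ties go to the lowest input index.
    """
    queues = [deque(q) for q in input_queues]
    cycles = 0
    while any(queues):
        granted = set()
        for q in queues:
            if q and q[0] not in granted:
                granted.add(q.popleft())  # this output is now busy for the cycle
        cycles += 1
    return cycles
```

With queues `[[0, 1], [0, 2]]` the second input loses arbitration for output 0, and its packet for output 2 waits behind the blocked head although output 2 is idle. Reordering that queue to `[2, 0]` drains the same traffic one cycle sooner.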
Output Buffered Switch
[Figure: input ports with routing logic R0 to R3, each able to reach a buffer at every output port; control block]
• How would you build a shared pool?
Example: IBM SP vulcan switch
[Figure: input ports (flow control, FIFO, deserializer, CRC check, route control), central queue (RAM 64x128 with In Arb and Out Arb), 8 x 8 crossbar, output ports (XBar Arb, flow control, FIFO, serializer, CRC gen)]
• Many gigabit ethernet switches use similar design without the cut-through
Output scheduling
[Figure: input buffers with routing logic R0 to R3, cross-bar, and output ports O0 to O2]
• n independent arbitration problems?
– static priority, random, round-robin
• simplifications due to routing algorithm?
• general case is max bipartite matching
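When an input could use more than one output, choosing grants becomes a bipartite matching problem between inputs and outputs. The sketch below uses the standard augmenting-path method. The `requests` representation and the function name are assumptions for the example, and a hardware scheduler would normally use a cheaper approximation.

```python
def max_matching(requests, num_outputs):
    """requests[i] lists the outputs input i could use; returns {output: input}."""
    owner = [None] * num_outputs

    def try_assign(i, seen):
        """Give input i an output, moving earlier grants aside if needed."""
        for o in requests[i]:
            if o in seen:
                continue
            seen.add(o)
            if owner[o] is None or try_assign(owner[o], seen):
                owner[o] = i
                return True
        return False

    for i in range(len(requests)):
        try_assign(i, set())
    return {o: i for o, i in enumerate(owner) if i is not None}
```

For `requests = [[0, 1], [0], [1, 2]]`, giving output 0 to input 0 would leave input 1 with nothing. The matching moves input 0 to output 1 so that all three inputs send in the same cycle.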
Stacked Dimension Switches
• Dimension order on 3D cube?
• Cube connected cycles?
[Figure: three stacked 2x2 switches, one per dimension, with ports Host In, Zin, Zout, Yin, Yout, Xin, Xout, Host Out]
Flow Control
• What do you do when push comes to shove?
– ethernet: collision detection and retry after delay
– FDDI, token ring: arbitration token
– TCP/WAN: buffer, drop, adjust rate
– any solution must adjust to output rate
• Link-level flow control
[Figure: link with Ready and Data signals]
Examples
• Short Links
[Figure: short link between Source and Destination, each with an F/E bit, connected by Req, Ready/Ack, and Data signals]
• Long links
– several flits on the wire
Smoothing the flow
[Figure: FIFO between incoming and outgoing phits with levels Full, High Mark, Low Mark, Empty; flow-control symbols Stop at the high mark and Go at the low mark]
• How much slack do you need to maximize bandwidth?
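One way to reason about the slack question is to model the marks directly. In the sketch below the FIFO sends Stop when it fills to the high mark and Go when it drains to the low mark. The class, its method names, and the exact threshold conventions are assumptions for the example. The helper reflects the usual argument that phits keep arriving for one round trip after Stop is sent, so at least that many slots are needed above the high mark.

```python
class WatermarkFifo:
    """Input FIFO that signals Stop at the high mark and Go at the low mark."""

    def __init__(self, capacity, high_mark, low_mark):
        assert 0 <= low_mark < high_mark <= capacity
        self.capacity = capacity
        self.high_mark = high_mark
        self.low_mark = low_mark
        self.occupancy = 0
        self.stopped = False

    def push(self):
        """Accept one incoming phit; return "Stop" if the sender should now pause."""
        if self.occupancy == self.capacity:
            raise OverflowError("phit arrived at a full FIFO: not enough slack")
        self.occupancy += 1
        if not self.stopped and self.occupancy >= self.high_mark:
            self.stopped = True
            return "Stop"
        return None

    def pop(self):
        """Forward one phit; return "Go" if the sender may resume."""
        if self.occupancy == 0:
            return None
        self.occupancy -= 1
        if self.stopped and self.occupancy <= self.low_mark:
            self.stopped = False
            return "Go"
        return None

def min_slack_above_high_mark(round_trip_cycles, phits_per_cycle=1):
    """Phits that can still arrive after Stop is sent (assumed simple link model)."""
    return round_trip_cycles * phits_per_cycle
```

By the same argument, the low mark should leave about a round trip of phits in the FIFO, so the output does not run dry before the sender reacts to Go.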
Link vs global flow control
• Hot Spots
• Global communication operations
• Natural parallel program dependences
Example: T3D
Packet formats (fields in order):
– Read Req (no cache, cache, prefetch, fetch&inc): Route Tag, Dest PE, Command, Addr 0, Addr 1, Src PE
– Read Resp: Route Tag, Dest PE, Command, Word 0
– Read Resp (cached): Route Tag, Dest PE, Command, Word 0, Word 1, Word 2, Word 3
– Write Req (Proc, BLT 1, fetch&inc): Route Tag, Dest PE, Command, Addr 0, Addr 1, Src PE, Word 0
– Write Req (proc 4, BLT 4): Route Tag, Dest PE, Command, Addr 0, Addr 1, Src PE, Word 0, Word 1, Word 2, Word 3
– Write Resp: Route Tag, Dest PE, Command
– BLT Read Req: Route Tag, Dest PE, Command, Addr 0, Addr 1, Src PE, Addr 0, Addr 1
– Command word fields (widths as labeled in the figure): packet type 3, req/resp 1, command 8
• 3D bidirectional torus, dimension order (NIC selected), virtual cut-through, packet sw.
• 16 bit x 150 MHz, short, wide, synch.
• rotating priority per output
• logically separate request/response
• 3 independent, stacked switches
• 8 16-bit flits on each of 4 VC in each direction
Example: SP
[Figure: 16-node rack with a switch board, intra-rack host ports P0 to P15 and inter-rack external switch ports E0 to E15; multi-rack configuration]
• 8-port switch, 40 MB/s per link, 8-bit phit, 16-bit flit, single 40 MHz clock
• packet sw, cut-through, no virtual channel, source-based routing
• variable packet <= 255 bytes, 31 byte fifo per input, 7 bytes per output, 16 phit links
• 128 8-byte ‘chunks’ in central queue, LRU per output
• run in shadow mode
Summary
• Routing Algorithms restrict the set of routes within the topology
– simple mechanism selects turn at each hop
– arithmetic, selection, lookup
• Deadlock-free if channel dependence graph is acyclic
– limit turns to eliminate dependences
– add separate channel resources to break dependences
– combination of topology, algorithm, and switch design
• Deterministic vs adaptive routing
• Switch design issues
– input/output/pooled buffering, routing logic, selection logic
• Flow control
• Real networks are a ‘package’ of design choices