Networks: Routing and Design
CS 258, Spring 99
David E. Culler
Computer Science Division
U.C. Berkeley

Outline
• Routing
• Switch Design
• Flow Control
• Case Studies

11/6/2015 CS258 S99

Routing
• Recall: routing algorithm determines
  – which of the possible paths are used as routes
  – how the route is determined
  – R: N x N -> C, which at each switch maps the destination node n_d to the next channel on the route
• Issues:
  – Routing mechanism
    » arithmetic
    » source-based port select
    » table driven
    » general computation
  – Properties of the routes
  – Deadlock free

Routing Mechanism
• need to select output port for each input packet
  – in a few cycles
• Simple arithmetic in regular topologies
  – ex: Dx, Dy routing in a grid
    » west (-x): Dx < 0
    » east (+x): Dx > 0
    » south (-y): Dx = 0, Dy < 0
    » north (+y): Dx = 0, Dy > 0
    » processor: Dx = 0, Dy = 0
• Reduce relative address of each dimension in order
  – Dimension-order routing in k-ary d-cubes
  – e-cube routing in n-cube

Routing Mechanism (cont)
[Figure: packet header carrying port selects P3 P2 P1 P0]
• Source-based
  – message header carries series of port selects
  – used and stripped en route
  – CRC? Packet Format?
  – CS-2, Myrinet, MIT Arctic
• Table-driven
  – message header carries index for next port at next switch
    » o = R[i]
  – table also gives index for following hop
    » o, i' = R[i]
  – ATM, HIPPI

Properties of Routing Algorithms
• Deterministic
  – route determined by (source, dest), not intermediate state (i.e. traffic)
• Adaptive
  – route influenced by traffic along the way
• Minimal
  – only selects shortest paths
• Deadlock free
  – no traffic pattern can lead to a situation where no packets move forward

Deadlock Freedom
• How can it arise?
  – necessary conditions:
    » shared resource
    » incrementally allocated
    » non-preemptible
  – think of a channel as a shared resource that is acquired incrementally
    » source buffer then dest. buffer
    » channels along a route
• How do you avoid it?
  – constrain how channel resources are allocated
  – ex: dimension order
• How do you prove that a routing algorithm is deadlock free?

Proof Technique
• resources are logically associated with channels
• messages introduce dependences between resources as they move forward
• need to articulate the possible dependences that can arise between channels
• show that there are no cycles in Channel Dependence Graph
  – find a numbering of channel resources such that every legal route follows a monotonic sequence
• => no traffic pattern can lead to deadlock
• network need not be acyclic, only channel dependence graph

Example: k-ary 2D array
• Thm: x,y routing is deadlock free
• Numbering
  – +x channel (i,y) -> (i+1,y) gets i
  – similarly for -x with 0 as most positive edge
  – +y channel (x,j) -> (x,j+1) gets N+j
  – similarly for -y channels
• any routing sequence: x direction, turn, y direction is increasing
[Figure: 4x4 array, nodes 00 to 33, with the channel number marked on each link]

Channel Dependence Graph
[Figure: channel dependence graph for the numbered 4x4 array]

More examples:
• Why is the obvious routing on X deadlock free?
  – butterfly?
  – tree?
  – fat tree?
• Any assumptions about routing mechanism? amount of buffering?
• What about wormhole routing on a ring?
[Figure: ring of 8 nodes, numbered 0 to 7]

Deadlock free wormhole networks?
• Basic dimension order routing techniques don’t work for k-ary d-cubes
  – only for k-ary d-arrays (bi-directional)
• Idea: add channels!
  – provide multiple “virtual channels” to break the dependence cycle
  – good for BW too!
  – Do not need to add links, or xbar, only buffer resources
• This adds nodes to the CDG, remove edges?
[Figure: switch with Input Ports, Cross-Bar, Output Ports]
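The ring question and the virtual-channel idea above can be checked mechanically. The following is a minimal sketch, not taken from the slides: it assumes a unidirectional ring of k nodes with one channel leaving each node, builds the channel dependence graph from every source/destination route, and tests it for cycles. The two-virtual-channel variant assumes a packet starts on the "lo" channel (vc 0) and moves to the "hi" channel (vc 1) once it has crossed the wraparound link. All function names are hypothetical.

```python
def channels_on_route(src, dst, k, use_vcs):
    """Channels (node, vc) held by a packet going src -> dst around the ring."""
    chans, node, vc = [], src, 0
    while node != dst:
        chans.append((node, vc))      # channel leaving `node` on virtual channel `vc`
        node = (node + 1) % k
        if use_vcs and node == 0:     # just crossed the wraparound link: lo -> hi
            vc = 1
    return chans

def dependence_graph(k, use_vcs):
    """Edge a -> b when some route holds channel a and then requests channel b."""
    edges = {}
    for src in range(k):
        for dst in range(k):
            if src != dst:
                chans = channels_on_route(src, dst, k, use_vcs)
                for a, b in zip(chans, chans[1:]):
                    edges.setdefault(a, set()).add(b)
    return edges

def has_cycle(edges):
    """Depth-first search with white/gray/black marking."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {}

    def visit(u):
        color[u] = GRAY
        for v in edges.get(u, ()):
            c = color.get(v, WHITE)
            if c == GRAY or (c == WHITE and visit(v)):
                return True
        color[u] = BLACK
        return False

    return any(color.get(u, WHITE) == WHITE and visit(u) for u in list(edges))

if __name__ == "__main__":
    print("one channel per link, cyclic CDG:", has_cycle(dependence_graph(8, False)))
    print("lo/hi virtual channels, cyclic CDG:", has_cycle(dependence_graph(8, True)))
```

With a single channel per link the dependences wrap all the way around the ring; with the lo/hi split, no dependence leads from a hi channel back to a lo channel, so the channels can be numbered in increasing order along every route.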
Breaking deadlock with virtual channels
[Figure: packet switches from lo to hi channel]

Up*-Down* routing
• Given any bidirectional network
• Construct a spanning tree
• Number the nodes increasing from leaves to roots
• UP edges increase node numbers
• Any Source -> Dest by UP*-DOWN* route
  – up edges, single turn, down edges
• Performance?
  – Some numberings and routes much better than others
  – interacts with topology in strange ways

Turn Restrictions in X,Y
[Figure: the eight turns among the +X, -X, +Y, -Y directions]
• XY routing forbids 4 of 8 turns and leaves no room for adaptive routing
• Can you allow more turns and still be deadlock free?

Minimal turn restrictions in 2D
[Figure: allowed turns among +x, -x, +y, -y for three schemes: West-first, north-last, negative first]

Example legal west-first routes
• Can route around failures or congestion
• Can combine turn restrictions with virtual channels

Adaptive Routing
• R: C x N x S -> C
• Essential for fault tolerance
  – at least multipath
• Can improve utilization of the network
• Simple deterministic algorithms easily run into bad permutations
• fully/partially adaptive, minimal/non-minimal
• can introduce complexity or anomalies
• little adaptation goes a long way!

Switch Design
[Figure: switch organization: Input Ports, Receiver, Input Buffer, Cross-bar, Output Buffer, Transmitter, Output Ports; Control (Routing, Scheduling)]

How do you build a crossbar
[Figure: crossbar implementations with inputs I0 to I3 and outputs O0 to O3, including a RAM-based version (phase, addr, Din, Dout)]

Input buffered switch
[Figure: Input Ports with routing logic R0 to R3, Cross-bar, Output Ports, Scheduling]
• Independent routing logic per input
  – FSM
• Scheduler logic arbitrates each output
  – priority, FIFO, random
• Head-of-line blocking problem

Output Buffered Switch
[Figure: Input Ports with routing logic R0 to R3, a buffer per input at each of the Output Ports, Control]
• How would you build a shared pool?
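The head-of-line blocking problem named on the input buffered switch slide can be seen in a toy model. This is a sketch under stated assumptions, not a model of any real switch: each input holds a queue of packets labeled by the output they want, each input sends at most one packet per cycle, each output accepts at most one, and lower-numbered inputs win ties (static priority). The `fifo_only` flag and the function name are hypothetical; with `fifo_only=False` the scheduler is allowed to look past a blocked head packet.

```python
from collections import deque

def cycles_to_drain(input_queues, fifo_only):
    """Count cycles until every input queue is empty."""
    queues = [deque(q) for q in input_queues]
    cycles = 0
    while any(queues):
        busy_outputs = set()
        for q in queues:                          # static priority: input 0 first
            # A strict FIFO may only offer its head; otherwise any queued packet.
            positions = range(min(1, len(q))) if fifo_only else range(len(q))
            for idx in positions:
                if q[idx] not in busy_outputs:
                    busy_outputs.add(q[idx])      # this output is taken for the cycle
                    del q[idx]                    # packet leaves the input buffer
                    break                         # one packet per input per cycle
        cycles += 1
    return cycles

if __name__ == "__main__":
    # Both inputs hold a packet for output 0 followed by a packet for output 1.
    traffic = [[0, 1], [0, 1]]
    print("strict FIFO inputs:", cycles_to_drain(traffic, fifo_only=True))
    print("look past the head:", cycles_to_drain(traffic, fifo_only=False))
```

In the strict FIFO case, input 1's packet for the idle output 1 waits behind its blocked packet for output 0, which costs an extra cycle on this traffic pattern.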
Example: IBM SP vulcan switch
[Figure: 8 input ports (Deserializer, CRC check, Flow Control, FIFO, Route control), Central Queue (RAM 64x128, In Arb, Out Arb), 8x8 Crossbar with XBar Arb, 8 output ports (Flow Control, FIFO, CRC Gen, Serializer)]
• Many gigabit ethernet switches use similar design without the cut-through

Output scheduling
[Figure: Input Buffers with routing logic R0 to R3, Cross-bar, Output Ports O0, O1, O2]
• n independent arbitration problems?
  – static priority, random, round-robin
• simplifications due to routing algorithm?
• general case is max bipartite matching

Stacked Dimension Switches
• Dimension order on 3D cube?
• Cube connected cycles?
[Figure: Host In, then 2x2 switches for Z (Zin, Zout), Y (Yin, Yout) and X (Xin, Xout), then Host Out]

Flow Control
• What do you do when push comes to shove?
  – ethernet: collision detection and retry after delay
  – FDDI, token ring: arbitration token
  – TCP/WAN: buffer, drop, adjust rate
  – any solution must adjust to output rate
• Link-level flow control
[Figure: link with Ready and Data signals]

Examples
• Short Links
[Figure: Source and Destination, each with an F/E bit, connected by Req, Ready/Ack and Data signals]
• long links
  – several flits on the wire

Smoothing the flow
[Figure: buffer filled by Incoming Phits and emptied by Outgoing Phits, with levels Full, High Mark (Stop), Low Mark (Go), Empty; Flow-control Symbols sent back to the sender]
• How much slack do you need to maximize bandwidth?
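One way to reason about the slack question above is to simulate the Stop side of the scheme. This is a rough sketch under assumptions that are not in the slides: the sender offers one phit per cycle, data and flow-control symbols each take `link_delay` cycles (at least 1) to cross the link, and the receiver is stalled (drains nothing), which is the worst case for overflow. The function name and parameters are hypothetical.

```python
from collections import deque

def peak_occupancy(link_delay, high_mark, cycles=200):
    """Peak buffer occupancy in phits when the receiver drains nothing."""
    data_wire = deque([0] * link_delay)      # phits in flight toward the receiver
    ctrl_wire = deque([True] * link_delay)   # Go (True) / Stop (False) in flight back
    occupancy = peak = 0
    for _ in range(cycles):
        occupancy += data_wire.popleft()     # phit sent link_delay cycles ago arrives
        peak = max(peak, occupancy)
        sender_sees_go = ctrl_wire.popleft() # symbol emitted link_delay cycles ago
        ctrl_wire.append(occupancy < high_mark)  # emit Stop once the high mark is reached
        data_wire.append(1 if sender_sees_go else 0)
    return peak

if __name__ == "__main__":
    for delay in (1, 2, 5):
        print("delay", delay, "high mark 4, peak", peak_occupancy(delay, 4))
```

In this model the buffer overshoots the high mark by roughly one round trip of phits (2 * link_delay - 1 here), so the space between High Mark and Full has to cover about that much. By a symmetric argument, the Low Mark presumably needs about a round trip of phits above Empty so that the buffer does not run dry before Go takes effect, though that side is not simulated here.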
Link vs global flow control
• Hot Spots
• Global communication operations
• Natural parallel program dependences

Example: T3D
[Figure: T3D packet formats (Read Req: no cache, cache, prefetch, fetch&inc; Read Resp; Read Resp cached; Write Req: proc, BLT 1, fetch&inc; Write Req: proc 4, BLT 4; Write Resp; BLT Read Req), each starting with Route Tag, Dest PE, Command, followed by Addr 0, Addr 1, Src PE and data Words 0 to 3 as the packet type requires]
• 3D bidirectional torus, dimension order (NIC selected), virtual cut-through, packet sw.
• 16 bit x 150 MHz, short, wide, synch.
• rotating priority per output
• logically separate request/response
• 3 independent, stacked switches
• 8 16-bit flits on each of 4 VC in each direction

Example: SP
[Figure: 16-node rack and multi-rack configuration; Switch Board with Intra-Rack Host Ports P0 to P15 and Inter-Rack External Switch Ports E0 to E15]
• 8-port switch, 40 MB/s per link, 8-bit phit, 16-bit flit, single 40 MHz clock
• packet sw, cut-through, no virtual channel, source-based routing
• variable packet <= 255 bytes, 31 byte fifo per input, 7 bytes per output, 16 phit links
• 128 8-byte ‘chunks’ in central queue, LRU per output
• run in shadow mode

Summary
• Routing Algorithms restrict the set of routes within the topology
  – simple mechanism selects turn at each hop
  – arithmetic, selection, lookup
• Deadlock-free if channel dependence graph is acyclic
  – limit turns to eliminate dependences
  – add separate channel resources to break dependences
  – combination of topology, algorithm, and switch design
• Deterministic vs adaptive routing
• Switch design issues
  – input/output/pooled buffering, routing logic, selection logic
• Flow control
• Real networks are a ‘package’ of design choices
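As a concrete instance of the arithmetic mechanism listed in the summary, the Dx, Dy rule from the Routing Mechanism slide can be written as a small port-select function. This is a minimal sketch, assuming a 2D array without wraparound links, (x, y) node coordinates, and a relative address recomputed at each hop; the function and port names are hypothetical.

```python
def select_port(dx, dy):
    """Output port for a packet whose remaining offset to the destination is (dx, dy)."""
    if dx < 0:
        return "west"       # reduce the x offset first
    if dx > 0:
        return "east"
    if dy < 0:
        return "south"      # only once dx == 0
    if dy > 0:
        return "north"
    return "processor"      # dx == 0 and dy == 0: deliver locally

STEP = {"west": (-1, 0), "east": (1, 0), "south": (0, -1), "north": (0, 1)}

def route(src, dst):
    """Sequence of port selections taken from src to dst under x-then-y routing."""
    (x, y), ports = src, []
    while True:
        port = select_port(dst[0] - x, dst[1] - y)
        ports.append(port)
        if port == "processor":
            return ports
        x, y = x + STEP[port][0], y + STEP[port][1]

if __name__ == "__main__":
    print(route((0, 0), (2, 1)))
    print(route((3, 2), (1, 0)))
```

Every route produced this way finishes all of its x hops before its first y hop, which is the property the channel-numbering argument for the 2D array relies on.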