
Dynamic Interconnection Networks

CEG 4131 Computer Architecture III
Miodrag Bolic

Quiz 1

• NIOS II processor – basics
• FPGA – basics
• Caches
  – Performance
  – Size, number of bits
  – Block placement
  – Block identification
  – Block replacement
  – Write strategy

Quiz 1 (Cont.)

Key terms:
• Flynn’s taxonomy
• Shared memory architectures
  – Cache coherence
  – NUMA, UMA, COMA
  – Symmetric Multiprocessors
• Distributed memory systems
• Classification based on communication
• Classification based on type of parallelism

Chapter 1 from the textbook

Quiz 1 (Cont.)

• Amdahl's law
• Speedup, Efficiency
• Parallelism profile, average parallelism, MIPS
• Scalability
• Understanding of performance of the program for parallel addition

Chapters 3.1, 3.2.2, 3.4, 3.5

Overview

• Network properties
• Switches
• Single and multistage Interconnection networks
• Crossbar

Network properties

• Node degree d: the number of edges incident on a node.
  – In degree
  – Out degree
• Diameter D of a network is the length of the longest shortest path between any two nodes.
• The network is symmetric if it looks the same from any node.
• The network is scalable if it is expandable with scalable performance when the machine resources are increased.
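The definitions of node degree and diameter can be checked on a small example. The sketch below is only an illustration: the 8-node bidirectional ring and the helper names `ring`, `node_degree`, and `diameter` are assumptions, not part of the slides, and the network is assumed to be connected.

```python
from collections import deque

def ring(n):
    """Adjacency lists of an n-node bidirectional ring (hypothetical example topology)."""
    return {v: [(v - 1) % n, (v + 1) % n] for v in range(n)}

def node_degree(adj, v):
    """Number of edges incident on node v."""
    return len(adj[v])

def diameter(adj):
    """Length (in hops) of the longest shortest path, found by BFS from every node."""
    longest = 0
    for start in adj:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        longest = max(longest, max(dist.values()))
    return longest

net = ring(8)
print(node_degree(net, 0), diameter(net))  # prints "2 4"
```

The ring is also symmetric in the sense used above: every node has the same degree and sees the same distances.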

Bisection width

Bisection width is the minimum number of wires that must be cut to divide the network into two equal halves.

• Small bisection width -> low bandwidth
• Large bisection width -> a lot of extra wires

• A cut of a network C(N1,N2) is a set of channels that partitions the set of all nodes into two disjoint sets N1 and N2. Each element of C(N1,N2) is a channel with a source in N1 and destination in N2 or vice versa.
• A bisection of a network is a cut that partitions the entire network nearly in half, such that |N2| ≤ |N1| ≤ |N2| + 1. Here |N2| means the number of nodes that belong to the partition N2.
• The channel bisection of a network is the minimum channel count over all bisections of the network:

$$B_c = \min_{\text{bisections}} |C(N_1, N_2)|$$
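For very small networks the channel bisection can be found by brute force, trying every way of splitting the nodes nearly in half. This is a minimal sketch under stated assumptions: channels are treated as undirected node pairs, the function name `bisection_width` is hypothetical, and the exhaustive search is only practical for a handful of nodes.

```python
from itertools import combinations

def bisection_width(nodes, channels):
    """Minimum number of channels crossing any bisection, with |N2| = floor(n/2)."""
    nodes = list(nodes)
    half = len(nodes) // 2
    best = None
    for group in combinations(nodes, half):
        n2 = set(group)
        # A channel is in the cut when its two endpoints lie in different partitions.
        cut = sum(1 for a, b in channels if (a in n2) != (b in n2))
        if best is None or cut < best:
            best = cut
    return best

# Hypothetical example: an 8-node ring.
ring8 = [(i, (i + 1) % 8) for i in range(8)]
print(bisection_width(range(8), ring8))  # prints 2
```

Choosing |N2| = floor(n/2) guarantees the condition |N2| ≤ |N1| ≤ |N2| + 1 for both even and odd node counts.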

Factors Affecting Performance

• Functionality – how the network supports data routing, interrupt handling, synchronization, request/message combining, and coherence
• Network latency – worst-case time for a unit message to be transferred
• Bandwidth – maximum data rate
• Hardware complexity – implementation costs for wire, logic, switches, connectors, etc.

2 × 2 Switches

*From Advanced Computer Architecture, K. Hwang, 1993.

Switches

Module size   Legitimate states   Permutation connections
2 × 2         4                   2
4 × 4         256                 24
8 × 8         16,777,216          40,320
N × N         N^N                 N!

• Permutation function: each input can only be connected to a single output.
• Legitimate state: each input can be connected to multiple outputs, but each output can only be connected to a single input.
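The table entries follow directly from the two definitions: in a legitimate state each of the N outputs independently selects one of the N inputs (N^N choices), while a permutation connection is a one-to-one mapping (N! choices). A short sketch that recomputes the rows; the function names are illustrative only.

```python
from math import factorial

def legitimate_states(n):
    """Each of the n outputs independently picks one of the n inputs."""
    return n ** n

def permutation_connections(n):
    """One-to-one mappings of n inputs onto n outputs."""
    return factorial(n)

for n in (2, 4, 8):
    print(f"{n} x {n}: {legitimate_states(n)} states, {permutation_connections(n)} permutations")
```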

Single-stage networks

• Single stage Shuffle-Exchange IN (left)
• Perfect shuffle mapping function (right)
• Perfect shuffle operation: cyclic shift 1 place left, e.g. 101 --> 011
• Exchange operation: invert least significant bit, e.g. 101 --> 100

*From Ben Macey at http://www.ee.uwa.edu.au/~maceyb/aca319-2003
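Both operations are simple bit manipulations on the node address. The sketch below reproduces the two examples above; the function names and the explicit `n_bits` parameter (address width) are assumptions made for illustration.

```python
def perfect_shuffle(addr, n_bits):
    """Cyclic shift of an n_bits-wide address one place to the left."""
    mask = (1 << n_bits) - 1
    msb = (addr >> (n_bits - 1)) & 1
    return ((addr << 1) & mask) | msb

def exchange(addr):
    """Invert the least significant bit of the address."""
    return addr ^ 1

print(format(perfect_shuffle(0b101, 3), "03b"))  # prints 011
print(format(exchange(0b101), "03b"))            # prints 100
```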

Multistage Interconnection Networks

• The capability of single-stage networks is limited, but if we cascade enough of them together, they form a completely connected MIN (Multistage Interconnection Network).
• Switches can perform their own routing or can be controlled by a central router.
• This type of network can be classified into the following categories:
• Nonblocking
  – A network is called strictly nonblocking if it can connect any idle input to any idle output regardless of what other connections are currently in progress.
• Rearrangeable nonblocking
  – In this case a network should be able to establish all possible connections between inputs and outputs by rearranging its existing connections.
• Blocking interconnection
  – A network is said to be blocking if it can perform many, but not all, possible connections between terminals.
  – Example: the Omega network

Omega networks

• A multi-stage IN using 2 × 2 switch boxes and a perfect shuffle interconnect pattern between the stages
• In the Omega MIN there is one unique path from each input to each output.
• No redundant paths → no fault tolerance and the possibility of blocking

Example:
• Connect input 101 to output 001
• Use the bits of the destination address, 001, for dynamically selecting a path
• Routing:
  – 0 means use upper output
  – 1 means use lower output

*From Ben Macey at http://www.ee.uwa.edu.au/~maceyb/aca319-2003
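The destination-tag routing in the example can be traced stage by stage. This is a sketch under two assumptions that the slide does not state explicitly: a perfect shuffle precedes every switch stage, and the destination bits are consumed most significant bit first. The function name `omega_route` is hypothetical.

```python
def omega_route(src, dst, n_bits):
    """Trace a message through an omega network with 2**n_bits inputs.

    Returns the final output position and the switch output taken at each stage.
    """
    mask = (1 << n_bits) - 1
    pos = src
    path = []
    for stage in range(n_bits):
        # Perfect shuffle wiring in front of the stage: rotate left by one bit.
        pos = ((pos << 1) & mask) | (pos >> (n_bits - 1))
        # The switch sets the low bit: 0 = upper output, 1 = lower output.
        bit = (dst >> (n_bits - 1 - stage)) & 1
        pos = (pos & ~1) | bit
        path.append("lower" if bit else "upper")
    return pos, path

final, path = omega_route(0b101, 0b001, 3)
print(format(final, "03b"), path)  # prints: 001 ['upper', 'upper', 'lower']
```

After log2 N stages every original address bit has been shifted out and replaced by a destination bit, which is why the message arrives at the destination regardless of the source.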

Omega networks

• log2 N stages of 2 × 2 switches
• N/2 switches per stage
• S = (N/2) log2(N) switches
• Number of permutations in an omega network: 2^S
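Plugging in numbers shows why the omega network is blocking: the 2^S permutations it can realize are far fewer than the N! permutations of N inputs. A minimal sketch, assuming N is a power of two; the function name `omega_size` is illustrative.

```python
from math import factorial

def omega_size(n):
    """Return (stages, switches per stage, total switches S) for an n-input omega network."""
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two")
    stages = n.bit_length() - 1      # log2(n) for a power of two
    per_stage = n // 2
    return stages, per_stage, stages * per_stage

stages, per_stage, s = omega_size(8)
print(stages, per_stage, s)          # prints: 3 4 12
print(2 ** s, factorial(8))          # prints: 4096 40320
```

For N = 8 only 4096 of the 40,320 possible permutations can be set up in a single pass.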

Baseline networks

• The network can be generated recursively
• The first stage contains one N × N block, the second two (N/2) × (N/2) sub-blocks
• Networks are topologically equivalent if one network can be reproduced from the other by simply rearranging nodes at each stage.

*From Advanced Computer Architecture, K. Hwang, 1993.

Crossbar Network

• Each junction is a switching component – connecting the row to the column.
• Can only have one connection in each column

*From Advanced Computer Architecture, K. Hwang, 1993.

Crossbar Network

• The major advantage of the cross-bar switch is its potential for speed.
• In one clock, a connection can be made between source and destination.
• The diameter of the cross-bar is one.
• Blocking if the destination is in use
• Because of its complexity, the cost of the cross-bar switch can become the dominant factor for a large multiprocessor system.
• Crossbars can be used to implement the a × b switches used in MINs. In this case each crossbar is small so the costs are kept down.
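The "one connection per column" rule and the resulting blocking can be modelled with a few lines of bookkeeping. This is a toy sketch, not a hardware description: the class name `Crossbar` and its methods are hypothetical, and rows are assumed to be sources and columns destinations.

```python
class Crossbar:
    """Toy model of an n x n crossbar: at most one active connection per column."""

    def __init__(self, n):
        self.n = n
        self.column_owner = [None] * n  # row currently connected to each column

    def connect(self, src, dst):
        """Close crosspoint (src, dst); return False if the destination is in use."""
        if self.column_owner[dst] is not None:
            return False
        self.column_owner[dst] = src
        return True

    def release(self, dst):
        """Open whatever crosspoint is active in column dst."""
        self.column_owner[dst] = None

xb = Crossbar(4)
print(xb.connect(0, 2))  # True: column 2 was idle
print(xb.connect(1, 2))  # False: destination 2 is in use
print(xb.connect(1, 3))  # True: a different destination is never blocked
```

A request is refused only when its destination is busy, never because of connections to other destinations.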

Problem

A) Use two-input AND and OR gates to construct an N × N crossbar switch network between N processors and N memory modules. Use the c_ij signal as the enable signal for the switch in the i-th row and j-th column. Let the width of each crosspoint be w bits.

B) Estimate the total number of AND and OR gates needed as a function of N and w.
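One possible way to approach part B is sketched below. It is an estimate under explicit assumptions, not the course's official solution: only the processor-to-memory data path is counted, each crosspoint gates its w data bits with c_ij using w two-input AND gates, each memory column merges its N gated words with N - 1 two-input OR gates per bit, and the address decoders that generate c_ij are excluded. The function name is hypothetical.

```python
def crossbar_gate_estimate(n, w):
    """Estimate two-input gate counts for a one-directional n x n crossbar, w bits wide.

    Assumptions (not from the slides): address decoders are not counted and
    only one data direction is implemented.
    """
    and_gates = n * n * w        # w AND gates at each of the n*n crosspoints
    or_gates = n * (n - 1) * w   # per column and per bit: n inputs need n-1 two-input ORs
    return and_gates, or_gates

print(crossbar_gate_estimate(4, 8))  # prints (128, 96)
```

Under these assumptions both counts grow as O(N^2 w).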

Problem (cont.)

[Figure: crossbar between processors P1, P2, ..., Pn and memory modules M1, M2, ..., Mn, with crosspoints C11 ... Cnn.]

Problem (cont.)

[Figure: the same crossbar, with the crosspoint C11 between P1 and M1 shown in detail.]

Problem (cont.)

[Figure: address decoders for P1 and P2 driving crosspoints C11, C12, C21, C22.]

Performance Comparison

Network    Latency         Switching complexity   Wiring complexity   Blocking
Bus        Constant O(N)   O(1)                   O(w)                yes
MIN        O(log2 N)       O(N log2 N)            O(Nw log2 N)        yes
Crossbar   O(1)            O(N^2)                 O(N^2 w)            no

Some Commercial Solutions [3]

• System-on-chip crossbar networks:
  – Nexus from Fulcrum Microsystems
• The core is used in PMC-Sierra dual MIPS processor RM9000

References

1. Advanced Computer Architecture and Parallel Processing, by Hesham El-Rewini and Mostafa Abd-El-Barr, John Wiley and Sons, 2005.

2. Advanced Computer Architecture: Parallelism, Scalability, Programmability, by K. Hwang, McGraw-Hill, 1993.

3. A. Lines, “Nexus: an asynchronous crossbar interconnect for synchronous system-on-chip designs”, Proc. of High Performance Interconnects, pp. 2-7, 2003.