
“I think there is a world
market for maybe five
computers.”
Thomas Watson Senior,
Chairman of IBM, 1943
7/16/2015
ICSS531 - Parallel Architecture
Architecture Classification
• SISD
– Single Instruction Single Data
• SIMD
– Single Instruction Multiple Data
• MIMD
– Multiple Instruction Multiple Data
• MISD
– Multiple Instruction Single Data
Vector Processors
• The earliest parallel computers
• Pipeline design (MISD)
• Typically viewed as SIMD
• Important machines include
– Cray-1, etc.
– CDC Cyber 205
– IBM 3090 Vector
Seymour Cray (1925-1996)
• Packaging, including heat
removal
• High level bit plumbing…
getting the bits from I/O,
into memory through a
processor and back to
memory and to I/O
• Parallelism
• Programming: O/S and
compiler
• Problems being solved
Cray’s Contributions
• Creative and productive during his entire
career 1951-1996.
• Creator and undisputed designer of
supercomputers from 1960
• Circuits, packaging, and cooling…
• “the mini” as a peripheral computer
• Established the template for vector
supercomputer architecture
Cray’s Attitudes
• Didn’t go with paging & segmentation
because it slowed computation
• In general, would cut loss and move on
when an approach didn’t work…
• Ignored CMOS and microprocessors until
SRC Company design
• Went against conventional wisdom
Computers
• CDC 6600 (6xxx Series)
– Employed “peripheral processors”
– Influenced architecture probably more than any
other computer
• Cray 1 (1/M, 1/S, XMP, YMP, C90, T90)
• Cray 2, then the GaAs-based Cray 3 and Cray 4
Cray XMP/4
Cray 2
Vector Processing
• Vector processors have high-level operations that work on
linear arrays of numbers: vectors
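To make the idea concrete, here is a minimal plain-Python sketch contrasting an element-at-a-time loop with a single whole-array operation. The names scalar_add and vector_add are hypothetical and chosen only for illustration; this models the programming view, not any real vector instruction set.

```python
# Hypothetical helper names; a plain-Python model, not a real vector ISA.

def scalar_add(a, b):
    """Scalar style: one add per loop iteration, with explicit loop control."""
    out = []
    for i in range(len(a)):
        out.append(a[i] + b[i])
    return out


def vector_add(a, b):
    """Vector style: one high-level operation applied to whole arrays."""
    if len(a) != len(b):
        raise ValueError("vector operands must have the same length")
    return [x + y for x, y in zip(a, b)]


x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]
result = vector_add(x, y)
print(result)
```

Both functions compute the same values; the difference a vector machine exploits is that the second form names the whole operation at once, so no per-element branch or index update is needed.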
Styles of Vector Architectures
• Memory-memory vector processors
– All vector operations are memory to memory
• Vector-register processors
– All vector operations between vector registers
– Vector equivalent of load-store architecture
– Includes all vector machines since late 1980s
• Cray, Convex, Fujitsu, Hitachi, NEC
Components of Vector Processor
• Vector Register
– Fixed length bank holding a single vector
• Has at least 2 read and 1 write ports
• Typically 8-32 vector registers, each holding 64-128 64-bit
elements
• Vector Functional Units
– Fully pipelined, start new operation every clock
• Typically 4-8 FUs: FP add, FP mult, FP reciprocal, integer
add, logical, shift
• Scalar Registers
– Single element for FP scalar or address
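Because a vector register has a fixed length, an array longer than one register has to be processed in register-sized pieces. The sketch below illustrates that consequence under the assumption of 64 elements per register (the low end of the range quoted above). The names MVL, strip_mine, and vector_scale are hypothetical, and the technique shown (usually called strip mining) is an illustration rather than something taken from the slides.

```python
MVL = 64  # assumed maximum vector length: elements per vector register


def strip_mine(n, mvl=MVL):
    """Split n elements into (start, length) strips of at most mvl elements."""
    strips = []
    start = 0
    while start < n:
        length = min(mvl, n - start)
        strips.append((start, length))
        start += length
    return strips


def vector_scale(values, a, mvl=MVL):
    """Multiply every element by the scalar a, one register-sized strip at a time."""
    out = [0.0] * len(values)
    for start, length in strip_mine(len(values), mvl):
        vreg = values[start:start + length]   # models a vector load
        vreg = [a * v for v in vreg]          # models one vector multiply
        out[start:start + length] = vreg      # models a vector store
    return out


strips = strip_mine(200)
print(strips)
```

For 200 elements this yields three full strips of 64 and a final strip of 8, so four vector operations replace 200 scalar ones.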
Vector-Register Architecture
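Y=a*X+Y
Scalar code:
      ld    f0,a          ; load scalar a
      addi  r4,rx,#512    ; last address to load (64 elements * 8 bytes)
loop: ld    f2,0(rx)      ; load X(i)
      multd f2,f0,f2      ; a * X(i)
      ld    f4,0(ry)      ; load Y(i)
      addd  f4,f2,f4      ; a * X(i) + Y(i)
      sd    0(ry),f4      ; store into Y(i)
      addi  rx,rx,#8      ; increment index to X
      addi  ry,ry,#8      ; increment index to Y
      sub   r20,r4,rx     ; compute bound
      bnez  r20,loop      ; check if done
Vector code:
      ld    f0,a          ; load scalar a
      lv    v1,rx         ; load vector X
      multv v2,f0,v1      ; vector-scalar multiply
      lv    v3,ry         ; load vector Y
      addv  v4,v2,v3      ; vector add
      sv    ry,v4         ; store the result into Y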
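Y=a*X+Y
Scalar code:
      ld    f0,a          ; load scalar a
      addi  r4,rx,#512    ; last address to load (64 elements * 8 bytes)
loop: ld    f2,0(rx)      ; load X(i)
      multd f2,f0,f2      ; a * X(i)
      ld    f4,0(ry)      ; load Y(i)
      addd  f4,f2,f4      ; a * X(i) + Y(i)
      sd    0(ry),f4      ; store into Y(i)
      addi  rx,rx,#8      ; increment index to X
      addi  ry,ry,#8      ; increment index to Y
      sub   r20,r4,rx     ; compute bound
      bnez  r20,loop      ; check if done
Vector code, with both loads issued first:
      ld    f0,a          ; load scalar a
      lv    v1,rx         ; load vector X
      lv    v3,ry         ; load vector Y
      multv v2,f0,v1      ; vector-scalar multiply
      addv  v4,v2,v3      ; vector add
      sv    ry,v4         ; store the result into Y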
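The payoff of the vector version can be worked out by counting dynamic instructions. The sketch below does that arithmetic for a 64-element vector (the length implied by the #512 byte bound with 8-byte elements) and also computes the result itself. The constant and function names are hypothetical, and the count covers only instructions issued, not cycles or memory traffic.

```python
N = 64          # elements per vector: 512 bytes / 8 bytes per double
SETUP = 2       # scalar instructions executed once before the loop
LOOP_BODY = 9   # scalar instructions executed on every loop iteration


def daxpy(a, x, y):
    """Return a*x + y computed element by element."""
    return [a * xi + yi for xi, yi in zip(x, y)]


scalar_instructions = SETUP + LOOP_BODY * N   # 2 + 9 * 64
vector_instructions = 6                       # the whole vector sequence

x = [float(i) for i in range(N)]
y = [1.0] * N
result = daxpy(2.0, x, y)

print(scalar_instructions, vector_instructions)
```

Under these assumptions the scalar loop issues 578 instructions against 6 for the vector sequence, and the vector sequence contains no branch at all.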
CM2
Basic Organization
[Diagram: Host Computer, Microcontroller, CM Processors and Memories]
• Host sends commands & data to microcontroller
• Microcontroller broadcasts control signals, data to
array
• Microcontroller collects data from processor array
CM Processors and Memories
• Processors and memories are 1 bit wide,
memory is bit-addressable
• Operation is bit-serial
• Fields may be any number of bits, start
anywhere
• Context bit (flag) of processor determines
whether processor is active
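The points above can be sketched as a small simulation: each processor owns a bit-addressable memory, an addition proceeds one bit position per step, and the context flag decides whether a processor takes part. All names here (read_field, write_field, bit_serial_add) are hypothetical, the little-endian field layout is an assumption, and this is not PARIS or any actual CM instruction.

```python
def read_field(mem, start, width):
    """Read an unsigned field of `width` bits starting at bit `start` (LSB first)."""
    return sum(mem[start + i] << i for i in range(width))


def write_field(mem, start, width, value):
    """Write `value` into a field of `width` bits starting at bit `start`."""
    for i in range(width):
        mem[start + i] = (value >> i) & 1


def bit_serial_add(memories, context, a, b, dst, width):
    """Add fields a and b into dst, one bit per step, on active processors only."""
    carries = [0] * len(memories)
    for i in range(width):                   # one broadcast step per bit position
        for p, mem in enumerate(memories):   # conceptually simultaneous
            if not context[p]:
                continue                     # inactive processors leave memory untouched
            s = mem[a + i] + mem[b + i] + carries[p]
            mem[dst + i] = s & 1
            carries[p] = s >> 1


memories = [[0] * 32 for _ in range(4)]
for p, (u, v) in enumerate([(3, 4), (5, 6), (7, 8), (9, 1)]):
    write_field(memories[p], 0, 5, u)   # 5-bit field starting at bit 0
    write_field(memories[p], 5, 5, v)   # 5-bit field starting at bit 5
context = [1, 1, 0, 1]                  # processor 2 is inactive
bit_serial_add(memories, context, 0, 5, 11, 5)
sums = [read_field(m, 11, 5) for m in memories]
print(sums)
```

The destination field starts at bit 11 simply to show that fields need not be aligned; the inactive processor's destination stays at its initial zero.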
Programming Languages
• PARIS - PArallel Instruction Set, similar to
assembly language
• *LISP - Common Lisp extension with
explicit parallel operations
• C* - C extension with explicit parallel data,
implicit parallel operations
• CM-Fortran - Fortran 90 variant
implemented on CM
CM2
• The heart of the CM2 is the parallel processing
unit
– Consists of up to 64K processors
• Each processor has up to 128KB RAM
• Processors are bit serial!!
– An interprocessor communications network
– One or more sequencers
– An interface to one or more front-end computers
– Zero or more I/O controllers and/or framebuffers
CM2 System Organization
[Figure: Front End, Nexus, Sequencers 0-3, and four blocks of Connection Machine Processors]
Interprocessor Network
• Each node of the network is a cluster (“chip”)
– 16 data processors on the chip
– Memory
– One router node
• The nodes are connected using a 12D hypercube
– 4096 nodes, each directly connected to 12 other nodes
– Thus the maximum size of a CM is 16 times 4096 or
64K processors
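The hypercube arithmetic can be checked with a short sketch. In a binary hypercube, two nodes are neighbors exactly when their addresses differ in one bit, so a 12-dimensional cube has 2^12 nodes with 12 neighbors each. The function names below are hypothetical, and the sketch covers only the topology, not the CM router's actual routing algorithm.

```python
DIMENSIONS = 12            # hypercube dimensions
PROCESSORS_PER_NODE = 16   # data processors on each chip


def neighbors(node, dims=DIMENSIONS):
    """Addresses that differ from `node` in exactly one bit."""
    return [node ^ (1 << d) for d in range(dims)]


def hops(src, dst):
    """Minimum number of links between two nodes: the count of differing bits."""
    return bin(src ^ dst).count("1")


node_count = 2 ** DIMENSIONS
total_processors = node_count * PROCESSORS_PER_NODE

print(node_count, len(neighbors(0)), total_processors, hops(0, node_count - 1))
```

The worst-case distance equals the number of dimensions, which is why any two of the 4096 nodes are at most 12 links apart.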
Arith.cs
/* Simple arithmetic demonstration - file arith.cs */
#include <stdio.h>
#define NPROCS 1048576
shape [NPROCS]A;        /* one-dimensional shape with NPROCS positions */
float:A s, x, y;        /* parallel variables, one element per position */
void main() {
    int k, i;
    with ( A ) {
        x = (rand()/1.0e7) - 60.0;
        y = (rand()/1.0e7) - 60.0;
        for ( i = 0; i < 3; i++ ) {        /* three timed trials */
            CM_start_timer(1);
            with ( A ) for ( k = 0; k < 200; k++ )
                s = x * y;                 /* parallel multiply over shape A */
            CM_stop_timer(1);
            CM_reset_timer();
        }
    }
}
CM5