Transcript slides

Vector Processors
Ryan McPherson
ELEC 6200
Fall 2007
ELEC 6200, Fall 07,
Oct 29
McPherson: Vector Processors
Overview
• History
• Description
• Advantages
• Disadvantages
• Applications
• Conclusions
What is a Vector Processor?
• Also called an Array Processor.
• Runs mathematical operations on multiple data elements
simultaneously.
• Common in supercomputers of the 1970s, 1980s, and 1990s.
• Today most CPU designs contain at least some vector
processing instructions, typically referred to as SIMD.
• Typically operates on a few vector elements per clock
cycle in a pipeline, versus SIMD units, which operate on
all elements at once.
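A minimal sketch of the contrast described above, assuming a toy model in which a scalar machine issues one add instruction per element while a vector machine issues a single instruction and streams a few elements through its pipeline each cycle. The function names and the `lanes` parameter are hypothetical and chosen only for illustration.

```python
def scalar_add(a, b):
    """Scalar model: one add instruction is issued per element."""
    out = []
    instructions = 0
    for x, y in zip(a, b):
        out.append(x + y)
        instructions += 1
    return out, instructions


def vector_add(a, b, lanes=4):
    """Vector model: one instruction, `lanes` elements processed per cycle."""
    out = []
    cycles = 0
    for i in range(0, len(a), lanes):
        # Each loop iteration stands for one clock cycle of the pipeline.
        out.extend(x + y for x, y in zip(a[i:i + lanes], b[i:i + lanes]))
        cycles += 1
    instructions = 1  # the whole vector is covered by a single instruction
    return out, instructions, cycles


a = list(range(64))
b = [10] * 64
s_out, s_instr = scalar_add(a, b)
v_out, v_instr, v_cycles = vector_add(a, b, lanes=4)
print(s_instr, v_instr, v_cycles)
```

Both functions produce the same sums; only the instruction and cycle counts differ, which is the point of the comparison.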
History
• (1962) University of Illinois ILLIAC IV: completed in 1972 with
64 ALUs, 100-150 MFlops (massively parallel computer)
• (1973) TI’s Advanced Scientific Computer (ASC): 20-80
MFlops
• (1975) Cray-1: first to have vector registers instead of
keeping data in memory (8 registers with 64 64-bit words
in each)
• The Cray-1 had separate pipelines for different instruction
types, allowing vector chaining: 80-240 MFlops
How It Works
• Typical Vector Processor (Cell Processor) [3]
Cell Processor
• Result of a partnership between IBM, SCEI/Sony, and Toshiba
• Parallelism at all levels
  - Thread: multicore design (8 processors)
  - Instruction: statically scheduled and power aware
  - Data: data-parallel instructions
• Contains a data processor instead of a control system
• Statistics:
  - Observed clock speed: > 4 GHz
  - Peak performance (single precision): > 256 GFlops
  - Peak performance (double precision): > 26 GFlops
  - Local storage size per SPU: 256 KB
  - Total number of transistors: 234M
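The single-precision peak figure can be roughly reproduced with back-of-the-envelope arithmetic. This is a sketch under stated assumptions, not a figure taken from the slides: it assumes each of the 8 processors works on four 32-bit floats at a time and that a fused multiply-add counts as two floating-point operations.

```python
spus = 8             # from the slide: multicore design (8 processors)
lanes = 4            # assumption: four 32-bit floats per 128-bit register
flops_per_lane = 2   # assumption: a fused multiply-add counts as 2 flops
clock_ghz = 4.0      # from the slide: observed clock speed > 4 GHz

# Peak rate = processors x lanes x flops per lane per cycle x cycles per second
peak_gflops = spus * lanes * flops_per_lane * clock_ghz
print(peak_gflops)   # 256.0
```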
How It Works (cont’d) [3]
• The vector register file (VRF) is dynamic: 128 entries, each 128 bits wide, and each entry can be partitioned as 128x1, 64x2, 32x4, 16x8, 8x16, or 1x128
• Stores scalar and vector data
• Computes all answers, then sorts them to reduce latency
• Accesses memory in blocks
• Operates on low-latency SRAM
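The partitioning of a 128-bit register entry can be illustrated by viewing the same 16 bytes at different element widths. This is only a sketch: the register contents are made up, big-endian byte order is an assumption, and the 1-bit view is omitted because `struct` has no 1-bit format.

```python
import struct

# A hypothetical 128-bit register value, stored as 16 raw bytes.
reg = bytes(range(16))

# The same bits reinterpreted at different element widths (big-endian assumed).
views = {
    "16 x 8-bit": struct.unpack(">16B", reg),
    "8 x 16-bit": struct.unpack(">8H", reg),
    "4 x 32-bit": struct.unpack(">4I", reg),
    "2 x 64-bit": struct.unpack(">2Q", reg),
    "1 x 128-bit": (int.from_bytes(reg, "big"),),
}

for name, elements in views.items():
    print(name, len(elements))
```

In every view the element width times the element count is 128 bits; only the interpretation of the register changes, not its contents.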
Advantages
• Each result is independent of previous results, allowing deep pipelines
and high clock rates.
• A single vector instruction performs a great deal of work, meaning
fewer instruction fetches and fewer branches (and in turn fewer mispredictions).
• Vector instructions access memory a block at a time, which allows
memory latency to be amortized over many elements.
• Vector instructions access memory with known patterns, which allows
multiple memory banks to simultaneously supply operands.
• Fewer memory accesses mean faster processing.
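The amortization of memory latency can be made concrete with a deliberately simple cost model. The numbers and function names below are hypothetical: the model assumes a scalar access pays the full latency for every element, while a block access pays it once and then delivers a fixed number of elements per cycle.

```python
def scalar_cycles(n, latency):
    # Assumption: each element pays the full memory latency on its own.
    return n * latency


def vector_cycles(n, latency, per_cycle=1):
    # Assumption: the block load pays the latency once, then elements
    # stream out at `per_cycle` elements per cycle (ceiling division).
    return latency + -(-n // per_cycle)


n, latency = 64, 100
print(scalar_cycles(n, latency))        # 6400
print(vector_cycles(n, latency))        # 164
print(vector_cycles(n, latency) / n)    # 2.5625 cycles per element
```

Under these assumptions the per-element cost drops from 100 cycles to under 3, because the fixed latency is spread across all 64 elements.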
Disadvantages
• Not as fast with scalar instructions
• Complexity of the multi-ported VRF
• Difficulties implementing precise exceptions
• High price of on-chip vector memory systems
• Increased code complexity
Applications
• Servers
• Home Cinema
• Super Computing
• Cluster Computing
• Mainframes
• “Astrophysicist Replaces Supercomputer With 8 PS3’s” [2]
References
1. Kozyrakis, C.; Patterson, D. Overcoming the limitations of conventional vector processors. Computer Architecture, 2003. Proceedings. 30th Annual International Symposium on, 9-11 June 2003, pp. 399-409.
2. Astrophysicist Replaces Supercomputer with Eight PlayStation 3. Wired Magazine, October 17, 2007.
3. A novel SIMD architecture for the Cell heterogeneous chip-multiprocessor. Hot Chips 17 (Aug 15, 2005).
4. Patterson, David and Hennessy, John. Computer Organization and Design. Chapter 9.11, pp. 48-51. Morgan Kaufmann Publishers, 2005.
5. Gschwind, M. Chip Multiprocessing and the Cell Broadband Engine. Computing Frontiers 2006. ©2006 IBM Corporation.