Single-Chip Heterogeneous Computing
Does the Future Include Custom Logic, FPGAs, and GPGPUs?
Presented by Kittisak Sajjapongse
Introduction to the study
Objective of the study

Observe the trends of integrating
unconventional cores (U-cores) into
single-chip multicores

Identify the factors that affect the decision
to include U-cores
Model in the study
Symmetric
- Multiple fast complex cores (FastCore)
- Highly optimized to minimize single-thread latency
Asymmetric
- One fast complex core (FastCore)
- Multiple simple cores (BCE)
- Intended for applications that have parallelism
Heterogeneous
- One fast complex core (FastCore)
- U-cores: ASICs, FPGAs, GPGPUs
- U-cores are the focus of this study
ASIC, FPGA, and GPGPU

ASIC (Application-Specific Integrated Circuit)
◦ An integrated circuit customized for a specific application
domain, e.g. H.264 codec, JPEG codec, etc.

FPGA (Field-Programmable Gate Array)
◦ A configurable digital integrated circuit capable of supporting different
hardware architectures

GPGPU (General-Purpose Graphics Processing Unit)
◦ Graphics devices that provide APIs (Application Programming
Interfaces) for use with parallelizable applications
Features             | ASIC                                              | FPGA                                | GPGPU
---------------------|---------------------------------------------------|-------------------------------------|---------------------------
Design/Program       | CAD/CAM, EDA (Electronic Design Automation) tools | Hardware Description Language (HDL) | OpenCL, CUDA, etc.
Design controls      | Transistors, standard cells                       | Logic components, RTL               | Processors, cache, memory
Flexibility          | Fixed-function (1)                                | Configurable (2)                    | Programmable (3)
Level of abstraction | Low (1)                                           | Medium (2)                          | High (3)
Power efficiency     | Extremely high (3)                                | High (2)                            | Moderate (1)
All three are used to exploit parallelism.
What is the study about?

Constraints
◦ Power
◦ Bandwidth

Questions posed
Under bandwidth and power constraints:
◦ Would single-chip multicores benefit significantly from U-cores?
◦ Would ASICs be the best choice?
Model for U-core
What is BCE?

Baseline Core Equivalent
◦ Refers to a basic processor core
◦ Used as the baseline reference for performance
and power consumption

Two parameters used later
◦ n: total number of BCEs available
◦ r: resources dedicated to the
complex core (in units of BCE)
Amdahl’s Law
Reference: http://en.wikipedia.org/wiki/Amdahl_law
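In its standard form, Amdahl's Law gives Speedup = 1 / ((1 - f) + f / n), where f is the parallelizable fraction of the workload and n is the number of cores working on that fraction. A minimal Python sketch follows; the function name and the sample values are illustrative assumptions, not taken from the study.

```python
def amdahl_speedup(f: float, n: float) -> float:
    """Standard Amdahl's Law.

    f: parallelizable fraction of the workload (0..1)
    n: number of cores applied to the parallel fraction
    """
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be in [0, 1]")
    if n <= 0:
        raise ValueError("n must be positive")
    # Serial part runs at speed 1; parallel part is divided across n cores.
    return 1.0 / ((1.0 - f) + f / n)


if __name__ == "__main__":
    # Illustrative values only (not measurements from the study).
    for f in (0.5, 0.9, 0.99):
        print(f"f={f}: speedup on 16 cores = {amdahl_speedup(f, 16):.2f}")
```

Even with many cores, the serial fraction (1 - f) caps the achievable speedup at 1 / (1 - f).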
Hill & Marty’s extended Amdahl’s Law
Reference: M. D. Hill et al., “Amdahl’s Law in the Multicore Era,” Computer
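Hill and Marty extend Amdahl's Law to a chip with a budget of n BCEs, where r BCEs of resources are combined into one faster core with sequential performance perf(r). The sketch below implements their symmetric and asymmetric forms. It assumes perf(r) = sqrt(r), which is the assumption Hill and Marty use in their paper; the function names are illustrative.

```python
import math


def perf(r: float) -> float:
    """Sequential performance of a core built from r BCEs.

    Assumption: perf(r) = sqrt(r), as in Hill and Marty's paper.
    """
    return math.sqrt(r)


def symmetric_speedup(f: float, n: float, r: float) -> float:
    """n/r identical cores, each built from r BCEs."""
    serial = (1.0 - f) / perf(r)
    parallel = f * r / (perf(r) * n)
    return 1.0 / (serial + parallel)


def asymmetric_speedup(f: float, n: float, r: float) -> float:
    """One core of r BCEs plus (n - r) single-BCE cores.

    In the parallel phase the big core and all small cores work together.
    """
    serial = (1.0 - f) / perf(r)
    parallel = f / (perf(r) + n - r)
    return 1.0 / (serial + parallel)


if __name__ == "__main__":
    # Illustrative values only: f = 0.9, n = 16 BCEs.
    print(symmetric_speedup(0.9, 16, 1))   # 16 single-BCE cores
    print(asymmetric_speedup(0.9, 16, 4))  # one 4-BCE core + 12 BCEs
```

With r = 1 the symmetric form reduces to plain Amdahl's Law on n cores.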
What about heterogeneous architectures?
Speedup_heterogeneous(?) = ?
under power and bandwidth constraints
Deriving model for U-core
Speedup_Amdahl = function of (f, n)
Speedup_Hill&Marty = function of (f, n, r)
Speedup_het(U-core) = function of (f, n, r, B, P, µ, φ)
New parameters:
B: memory bandwidth of U-core (in units of BCE compulsory bandwidth)
P: active power of U-core relative to BCE
µ: performance of U-core relative to BCE
φ: power efficiency of U-core relative to BCE
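Asymmetric (Hill & Marty):

$$\text{Speedup}_{\text{asymmetric}} = \frac{1}{\dfrac{1-f}{\text{perf}(r)} + \dfrac{f}{\text{perf}(r) + n - r}}$$

Asymmetric with offload (parallel work runs only on the n - r BCEs):

$$\text{Speedup}_{\text{asym(offload)}} = \frac{1}{\dfrac{1-f}{\text{perf}(r)} + \dfrac{f}{n - r}}$$

Heterogeneous (the n - r BCEs are replaced by U-cores with relative performance µ):

$$\text{Speedup}_{\text{het(U-core)}} = \frac{1}{\dfrac{1-f}{\text{perf}(r)} + \dfrac{f}{\mu\,(n - r)}}$$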
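The offload and heterogeneous forms, 1 / ((1 - f) / perf(r) + f / (n - r)) and 1 / ((1 - f) / perf(r) + f / (µ (n - r))), can be sketched in Python as follows. This is a minimal sketch: perf(r) = sqrt(r) is an assumption carried over from Hill and Marty, the function names are illustrative, and the power (P, φ) and bandwidth (B) limits are not modeled here.

```python
import math


def perf(r: float) -> float:
    """Assumption: sequential performance of an r-BCE core is sqrt(r)."""
    return math.sqrt(r)


def heterogeneous_speedup(f: float, n: float, r: float, mu: float) -> float:
    """Speedup with one r-BCE fast core and (n - r) BCEs of U-core area.

    f:  parallelizable fraction of the workload
    mu: U-core performance relative to a BCE of the same size
    The serial phase runs on the fast core; the parallel phase runs
    only on the U-cores.
    """
    if n <= r:
        raise ValueError("n must exceed r so that some area is left for U-cores")
    if mu <= 0:
        raise ValueError("mu must be positive")
    serial = (1.0 - f) / perf(r)
    parallel = f / (mu * (n - r))
    return 1.0 / (serial + parallel)


def offload_speedup(f: float, n: float, r: float) -> float:
    """Asymmetric-with-offload case: the special case mu = 1."""
    return heterogeneous_speedup(f, n, r, mu=1.0)


if __name__ == "__main__":
    # Illustrative values only (not the parameters measured in the study).
    print(offload_speedup(0.9, 16, 4))
    print(heterogeneous_speedup(0.9, 16, 4, mu=2.0))
```

Setting mu = 1 recovers the offload model, which makes the role of µ as a pure scaling factor on the parallel phase explicit.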
Obtaining µ, φ for U-core
Devices & Workload
Device:
Device Ref.    | Device
---------------|-------------------------
BCE            | Intel Atom
Symmetric CMP  | Intel Core i7-960
ASIC (U-core)  | 65nm technology (1.1V)
FPGA (U-core)  | V6-LX760
GPU (U-core)   | GTX285, GTX480, R5870
Workload:
- Dense Matrix Multiplication (MMM)
- Fast Fourier Transform (FFT with input sizes from 2^4 to 2^20)
- Black-Scholes (BS)
Deriving µ for ASIC in FFT-1024 (case study)
[Figure; surviving values: 350, 0.5]
Deriving φ for ASIC in FFT-1024 (case study)
[Figure; surviving values: 100, 0.8]
Obtained Parameters
Applying the Model for Results
Scaling Projection
Budget and Constraints
Results for FFT-1024
Results for MMM
Results for Black-Scholes
Answering the questions
◦ Would single-chip multicores benefit significantly from U-cores?
- Yes, if the application has enough parallelism (>90%) to exploit.
◦ Would ASICs be the best choice?
- It depends on the application: if there is not much parallelism, an ASIC might not be
worth implementing.
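The dependence on parallelism follows from the heterogeneous speedup model: as µ grows, the parallel term f / (µ (n - r)) vanishes and the speedup approaches perf(r) / (1 - f), however fast the U-core is. The sketch below illustrates this bound. It assumes perf(r) = sqrt(r), and all numeric values (n = 16, r = 4, the µ values) are hypothetical, chosen only for illustration.

```python
import math


def heterogeneous_speedup(f: float, n: float, r: float, mu: float) -> float:
    """Heterogeneous model; perf(r) = sqrt(r) is an assumption."""
    serial = (1.0 - f) / math.sqrt(r)
    parallel = f / (mu * (n - r))
    return 1.0 / (serial + parallel)


def serial_bound(f: float, r: float) -> float:
    """Limit of the speedup as mu grows without bound (requires f < 1)."""
    if f >= 1.0:
        raise ValueError("bound is unbounded when f = 1")
    return math.sqrt(r) / (1.0 - f)


if __name__ == "__main__":
    n, r = 16, 4  # hypothetical chip budget
    for f in (0.5, 0.9, 0.99):
        row = [round(heterogeneous_speedup(f, n, r, mu), 2) for mu in (1, 10, 100)]
        print(f"f={f}: mu=1,10,100 -> {row}, bound={serial_bound(f, r):.1f}")
```

At f = 0.5 the bound is only 4 for this configuration, so a much faster U-core adds little; the gap between U-core types only opens up at high f.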
Conclusions

Sufficient parallelism must exist to obtain a significant performance
improvement from U-cores

Flexible U-cores tend to be competitive with ASICs under limited bandwidth
and limited parallelism

U-cores such as ASICs are useful when power is the primary goal