Many-core Computing: A Disruptive Technology Enabling Low-cost, Low-power Desktop Supercomputing

David Kaeli
Department of Electrical and Computer Engineering
Northeastern University
Boston, MA

Systor Keynote, May 24, 2010

Current trends in Many-core Computing
• The CPU industry has elected to jump off the cycle-time scaling bandwagon
  - Power/thermal constraints have become a limiting factor
• We now see CPU vendors placing multiple (10s of) cores on a single chip
  - Clock speeds have not changed
• The memory wall persists, and multiple cores that assume a shared-memory model place further pressure on this problem
• Software vendors are looking for new parallelization technology
  - Multi-core aware operating systems
  - Semi-automatic parallelizing compilers

Current trends in Many-core Computing
• There has been a renewed interest in parallel computing paradigms and languages
• Existing many-core architectures are being considered for general-purpose platforms (e.g., Cell, GPUs, DSPs)
• Heterogeneous systems are becoming a common theme
  - Are we returning to the days of the x87 co-processor?
• How can we combine multi-core and many-core systems into a single design?

Graphics Processors
• Graphics Processing Units
  - More than 64% of Americans played a video game in 2009
  - High-end: primarily used for 3-D rendering for videogame graphics and movie animation
  - Mid/low-end: primarily used for computer displays
• Manufacturers include NVIDIA, AMD/ATI, IBM-Cell
• Very competitive commodities market

GPU Performance
• GPUs provide a path for performance growth
• Cost and power usage numbers are also impressive
[Figure: near-exponential growth in performance for GPUs. Source: NVIDIA 2009]

Comparison of CPU and GPU Hardware Architectures
• CPU: cache heavy, focused on individual thread performance
• GPU: ALU heavy, massively parallel, throughput-oriented

CPU/GPU Relationship
[Figure: CPU (host) connected to a GPU with local DRAM (device)]

A wide range of GPU apps
3D image analysis, Adaptive radiation therapy, Acoustics, Astronomy, Audio, Automobile vision,
Bioinformatics, Biological simulation, Broadcast, Cellular automata, Fluid dynamics, Computer vision,
Cryptography, CT reconstruction, Data mining, Digital cinema / projections, Electromagnetic simulation,
Equity trading, Film, Protein folding, Financial, Quantum chemistry, Languages, Ray tracing, GIS, Radar,
Holographics cinema, Reservoir simulation, Machine learning, Robotic vision / AI, Mathematics research,
Robotic surgery, Military, Satellite data analysis, Mine planning, Seismic imaging, Molecular dynamics,
Surgery simulation, MRI reconstruction, Surveillance, Multispectral imaging, Ultrasound, N-body simulation,
Video conferencing, Network processing, Telescope, Neural network, Oceanographic research, Video,
Visualization, Optical inspection, Wireless, Particle physics, X-Ray

GPU as a General Purpose Computing Platform
Speedups are impressive and ever increasing!
• Genetic Algorithm: 2600X
• Real Time Elimination of Undersampling Artifacts: 2300X
• Lattice-Boltzmann Method for Numerical Fluid Mechanics: 1840X
• Total Variation Modeling: 1000X
• Fast Total Variation for Computer Vision: 1000X
• Monte Carlo Simulation of Photon Migration: 1000X
• Stochastic Differential Equations: 675X
• K-Nearest Neighbor Search: 470X
Source: CUDA Zone at www.nvidia.com/cuda/

GPGPU is becoming mainstream research
• Research activities are expanding significantly
[Figure: search results for keyword “GPGPU” in IEEE and ACM]

NVIDIA GT200 architecture
• 240 shader cores
• 1.4B transistors
• Up to 2GB onboard memory
• ~150GB/sec BW
• 1.06 SP TFLOPS
• CUDA and OpenCL support
• Programmable memory spaces
• Tesla S1070 provides 4 GPUs in a 1U unit
[Figure: the Streaming Processor Array (a grid of thread blocks) is built from 10 TPCs (Texture Processor Clusters). Each TPC holds a texture unit and several SMs (Streaming Multiprocessors), which run multiple thread blocks and many warps of threads. Each SM holds SPs, which run individual threads, and SFUs.]

AMD/ATI Radeon HD 5870
• Codename “Evergreen”
• 1600 SIMD cores
• L1/L2 memory architecture
• 153GB/sec memory bandwidth
• 2.72 TFLOPS SP
• OpenCL and DirectX11
• Hidden memory microarchitecture
• Provides for vectorized operation

Comparison of CPU and GPU Hardware Architectures

CPU/GPU             Single-precision TFLOPs   Cores   GFLOPs/Watt   $/GFLOP
NVIDIA 285          1.06                      240     5.8           $3.12
NVIDIA 295          1.79                      480     6.2           $3.80
AMD HD 5870         2.72                      1600    14.5          $0.16
AMD HD 4890         1.36                      800     7.2           $0.18
Intel Core i7-965   0.051                     4       0.39          $11.02

Source: NVIDIA, AMD and Intel

How to exploit GPUs
• Identify hot kernels
  - Eliminate kernels that have inherent data sharing (e.g., loop-carried dependence)
  - Identify the right grain of parallelism (i.e., the loop nest) to expose parallelism
  - Generate thousands of independent threads…
• Identify appropriate GPU memory subsystems for storage of data used by kernel…
• Consider additional performance optimizations
  - Warp occupancy (NVIDIA)
  - Vectorization (AMD)
  - Register usage (both)

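To make the "loop-carried dependence" criterion concrete, here is a minimal C sketch contrasting a loop whose iterations are independent with one whose iterations are not. The function names are hypothetical and do not come from the talk; the point is only which kind of loop maps naturally onto thousands of independent threads.

```c
#include <assert.h>
#include <stddef.h>

/* Independent iterations: element i depends only on in[i], so each
 * iteration could be handed to its own GPU thread. */
void scale_independent(const float *in, float *out, size_t n, float s) {
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; i++)
        out[i] = s * in[i];
}

/* Loop-carried dependence: iteration i reads out[i - 1], which was
 * written by iteration i - 1, so the iterations cannot simply be run
 * side by side without restructuring the algorithm. */
void running_sum_dependent(const float *in, float *out, size_t n) {
    assert(in != NULL && out != NULL);
    if (n == 0)
        return;
    out[0] = in[0];
    for (size_t i = 1; i < n; i++)
        out[i] = out[i - 1] + in[i];
}
```

The first loop is a good GPU candidate as written. The second would need a different formulation (for example a parallel scan) before it could be spread across threads.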
AMD vs. NVIDIA

                        AMD                      NVIDIA
Hardware architecture   Vector                   Scalar
Programming language    Brook+, IL, OpenCL       CUDA, OpenCL
Programming model       SIMD vector              SIMT
Thread hierarchy        Single level             Two level
Memory exposure         Uniform space            Multiple space
Source of horsepower    Vectorization and        Memory spaces utilization
                        multiple output          including shared memory
Pros                    Easier programming       More flexible programming
Challenges              Harnessing the potential horsepower (both)

Vector Addition Example (CPU)

#include <stdlib.h>

// Computational kernel
void vecAdd(float *A, float *B, float *C, int N) {
    for (int i = 0; i < N; i++)
        C[i] = A[i] + B[i];
}

int main() {
    int N = 4096;

    // Allocate memory
    float *A = (float *)malloc(sizeof(float)*N);
    float *B = (float *)malloc(sizeof(float)*N);
    float *C = (float *)malloc(sizeof(float)*N);

    // Initialize memory (init() is the slide's helper; its body is not shown)
    init(A); init(B);

    // Call kernel
    vecAdd(A, B, C, N);

    // Deallocate memory
    free(A); free(B); free(C);
}

Vector Addition Example (GPU)
• Map multi-dimensional data structures to sequential GPU threads

// GPU computational kernel
__global__
void vecAdd(float *A, float *B, float *C) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    C[tid] = A[tid] + B[tid];
}

[Figure: a GRID of BLOCKs (0,0), (1,0), ...; each block holds threads (0,0), (1,0), (2,0), ... (31,0); blockDim.x = 32]

Vector Addition Example (GPU)

int main() {
    int N = 4096;
    float *A = (float *)malloc(sizeof(float)*N);
    float *B = (float *)malloc(sizeof(float)*N);
    float *C = (float *)malloc(sizeof(float)*N);
    init(A); init(B);

    // Allocate memory on GPU
    float *d_A, *d_B, *d_C;
    cudaMalloc(&d_A, sizeof(float)*N);
    cudaMalloc(&d_B, sizeof(float)*N);
    cudaMalloc(&d_C, sizeof(float)*N);

    // Initialize memory on GPU
    cudaMemcpy(d_A, A, sizeof(float)*N, cudaMemcpyHostToDevice);
    cudaMemcpy(d_B, B, sizeof(float)*N, cudaMemcpyHostToDevice);

    // Configure threads
    dim3 blocks(N/32, 1);
    dim3 threads(32, 1);

    // Run kernel (on GPU)
    vecAdd<<<blocks, threads>>>(d_A, d_B, d_C);

    // Copy results back to CPU
    cudaMemcpy(C, d_C, sizeof(float)*N, cudaMemcpyDeviceToHost);

    // Deallocate memory on GPU
    cudaFree(d_A);
    cudaFree(d_B);
    cudaFree(d_C);

    // Deallocate memory on CPU
    free(A);
    free(B);
    free(C);
}

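The launch above uses N/32 blocks and no bounds check, which works because 4096 is a multiple of 32. As a hedged illustration of the same index arithmetic for arbitrary N, the following plain-C sketch emulates the grid serially. The helper names (blocks_for, vec_add_thread, emulate_vec_add) are hypothetical; on a real GPU the (block, thread) pairs would run concurrently rather than in nested loops.

```c
#include <assert.h>

/* Number of blocks needed to cover n elements: ceiling division, so a
 * final partial block is included when n is not a multiple of the
 * block size. */
int blocks_for(int n, int threads_per_block) {
    assert(n >= 0 && threads_per_block > 0);
    return (n + threads_per_block - 1) / threads_per_block;
}

/* Body of one "thread": the same global index computation as the
 * kernel, plus a bounds check so threads past the end do nothing. */
static void vec_add_thread(int block_idx, int block_dim, int thread_idx,
                           const float *A, const float *B, float *C, int n) {
    int tid = block_idx * block_dim + thread_idx;
    if (tid < n)
        C[tid] = A[tid] + B[tid];
}

/* Serial stand-in for a <<<blocks, threads>>> launch: visit every
 * (block, thread) pair exactly once. */
void emulate_vec_add(const float *A, const float *B, float *C, int n,
                     int threads_per_block) {
    int blocks = blocks_for(n, threads_per_block);
    for (int b = 0; b < blocks; b++)
        for (int t = 0; t < threads_per_block; t++)
            vec_add_thread(b, threads_per_block, t, A, B, C, n);
}
```

For N = 4096 and 32 threads per block this gives the same 128 blocks as the slide; for N = 100 it gives 4 blocks, with the last 28 threads masked off by the bounds check.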
Experiences with migrating applications to a GPU
• 3-D Cardiac CT Imaging
  - Iterative Least Squares Back Projection
• 3-D Breast Cancer Screening
  - Maximum Likelihood Estimation
• Intrusion Detection Systems
  - K-Nearest Neighbor Outlier Detection
• Physics-based Simulation
  - Surgical simulation
NUIC Technologies

Trends in Medical Imaging
• The Medical Imaging field is rapidly deploying new 3-D and 4-D imaging technologies to improve patient outcomes
• This move has created an avalanche of image data
  - Image reconstruction and image analysis have become major bottlenecks
  - Accurate 3-D and 4-D image reconstruction requires compute-intensive algorithms
  - The use of multi-modality imaging (e.g., CT and Ultrasound) further exacerbates this problem
• Many-core computing can play a large role in addressing these challenges

Developing a suite of Biomedical Image Reconstruction Libraries (CUDA/OpenCL)
• Target applications:
  - Deformable registration: radiation oncology
  - 3-D iterative reconstruction: cardiovascular imaging
  - Maximum likelihood estimation: Digital Breast Tomosynthesis
  - Motion compensation in PET/CT images: cardiovascular imaging
  - Hyperspectral imaging: skin cancer screening
  - Image segmentation: brain imaging
• $1.3M NSF Award EEC-0946463

State-of-the-art in Cardiac Imaging
• Currently, coronary heart disease (CHD) is the single leading cause of death in America
  - Health care costs related to CHD >$150B/year
• U.S. in 2006 (American Heart Association)
  - Approximately 1,255,000 coronary attacks
  - Approximately 425,425 deaths
• Invasive coronary angiography is the state-of-the-art for assessing coronary blockages
  - Inject dye into the bloodstream and then X-ray the heart
  - 8% complication rate
  - 0.2% mortality rate

3-D Cardiovascular Plaque Imaging
• 3-D CT imaging can be used to identify vulnerable plaque
  - A helical scan of the body is performed
  - Provides for more accurate imaging of the cardiovascular system
  - Produces a detailed 3-D view of the blockage
  - Possesses few negative side effects
• Scanning geometry produces a tremendous amount of data to process
• Image reconstruction can take days to generate a single view!!

Impacting heart disease with GPUs
• Speedup of a single 8800 GTX (128 cores) versus multi-threaded dual-core Intel CPU execution: 20.3x forward / 17.8x backward
• After a series of optimizations, including the use of multiple GPUs: 71.3x forward / 137x backward
[Figures: forward projection, backward projection]
* Collaboration with Synho Do (MGH), Clem Karl (BU) and Homer Pien (MGH)

3-D Cardiovascular Plaque Imaging
• 3-D Spiral Cone-Beam Cardiac Image Reconstruction
  - Reconstruction performance is a barrier to improving image quality
• A single NVIDIA 285 GPU card ($300) can speed up 3-D reconstruction performance by 700X
  - Processing time reduced from 10s of hours to seconds
• Supercomputing performance in a single GPU card

Digital Breast Tomosynthesis (DBT)
• A new technology developed at MGH to:
  - Produce a 3-D image of the breast utilizing 15 or more 2-D projections
• 3-D imagery can help address the following issues related to 3-D mammography
  - Increase the correct detection rate of cancers
  - Reduce the rate of misdiagnosed cancers and avoid unneeded biopsies
[Figures: 2-D versus DBT images of a cancer (increase correct detection rate) and of a hamartoma (decrease false positive rate)]

Tomosynthesis Image Reconstruction
• Utilizes a limited angle tomography approach using many 2-D images to generate a 3-D image
• Performs an iterative Maximum Likelihood Estimation for 3-D image reconstruction
• Reconstruction time is a barrier to image-guided biopsy
[Figure: an X-ray source (15 views) produces X-ray projections on the detector (1196x2304). Iteration loop: set 3D volume (guess), compute projections (forward), correct 3D volume (backward). The 3D volume is 1196x2304x45.]

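The guess / forward-project / correct loop above can be sketched on a toy problem. The talk does not say which update rule the DBT code uses, so the sketch below swaps in the textbook ML-EM multiplicative update as an assumption; the dense row-major matrix A standing in for the scanner geometry and the names mlem_step and mlem are likewise hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* One ML-EM update for the model y ~ A x, with A stored row-major
 * (m measurements, n unknowns).
 * Forward step: project the current volume estimate x.
 * Backward step: backproject the measured/estimated ratio to correct x. */
void mlem_step(const double *A, const double *y, double *x, int m, int n) {
    double *ratio = malloc((size_t)m * sizeof *ratio);
    assert(ratio != NULL);

    for (int i = 0; i < m; i++) {            /* forward projection */
        double proj = 0.0;
        for (int j = 0; j < n; j++)
            proj += A[i * n + j] * x[j];
        ratio[i] = proj > 0.0 ? y[i] / proj : 0.0;
    }
    for (int j = 0; j < n; j++) {            /* backward projection */
        double back = 0.0, sens = 0.0;
        for (int i = 0; i < m; i++) {
            back += A[i * n + j] * ratio[i];
            sens += A[i * n + j];            /* column sum (sensitivity) */
        }
        if (sens > 0.0)
            x[j] *= back / sens;
    }
    free(ratio);
}

/* Run a fixed number of iterations starting from the caller's guess x. */
void mlem(const double *A, const double *y, double *x, int m, int n,
          int iters) {
    for (int k = 0; k < iters; k++)
        mlem_step(A, y, x, m, n);
}
```

Both inner loops are independent across measurements (forward) and across voxels (backward), which is the structure that makes the forward and backward projections attractive GPU kernels.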
Reconstruction Computing Performance
What can a GPU buy you?
VGUARD - Intrusion Detection on Virtualized Servers
Targeted platforms
• Server application appliances
• Embedded systems
Two-tiered architecture
• Front-end subsystem
  - Profiling integrated in a virtualization layer
  - Monitors run-time environment below the OS
• Back-end subsystem
  - Utilizes machine learning and pattern classification algorithms
  - Identifies potential malicious behavior through modeling normal behavior
[Figure: apps run on Guest OS1 and Guest OS2 (Linux/Win) above the VMM layer (execution profiling) and the hardware; protection is embedded in the virtualization layer]

Intrusion Detection on Virtualized Servers
• VGUARD utilizes virtualization to obtain feature-rich execution profiles
• 300+ real-world malware samples tested
[Figure: MySQL, Exchange, SpamAssassin and Apache workloads on Windows, Ubuntu and RedHat guests, running on VMware ESX + VGUARD on a Dell PowerEdge server]

Intrusion Detection on Virtualized Servers
[Figure: data sets and streaming data pass through feature construction to anomaly detection with the LOF method, which raises an alarm]

Local Outlier Factor uses density: computes multiple K-nearest neighbors
[Figure: clusters C1 and C2 with outlier points p1 and p2]

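As a rough illustration of why LOF leans so heavily on K-nearest-neighbor searches, here is a minimal brute-force sketch in C. It is not the VGUARD implementation: the 2-D Point type, the Manhattan distance, and the assumption of distinct points are all simplifications chosen to keep the example short.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { double x, y; } Point;

/* Manhattan distance (an assumption; any metric works for LOF). */
static double dist(Point a, Point b) {
    double dx = a.x > b.x ? a.x - b.x : b.x - a.x;
    double dy = a.y > b.y ? a.y - b.y : b.y - a.y;
    return dx + dy;
}

/* Fill nbr[0..k-1] with the indices of the k nearest neighbors of
 * point p (excluding p), nearest first. Brute force over all points. */
static void knn(const Point *pts, int n, int p, int k, int *nbr) {
    for (int found = 0; found < k; found++) {
        int best = -1;
        for (int q = 0; q < n; q++) {
            int taken = (q == p);
            for (int t = 0; t < found; t++)
                if (nbr[t] == q) taken = 1;
            if (taken) continue;
            if (best < 0 || dist(pts[p], pts[q]) < dist(pts[p], pts[best]))
                best = q;
        }
        nbr[found] = best;
    }
}

/* LOF score for every point; requires 1 <= k < n and distinct points.
 * Scores near 1 mean "as dense as the neighbors"; larger means outlier. */
void lof_scores(const Point *pts, int n, int k, double *lof) {
    assert(k >= 1 && k < n);
    int *nbr = malloc((size_t)n * k * sizeof *nbr);
    double *kdist = malloc((size_t)n * sizeof *kdist);
    double *lrd = malloc((size_t)n * sizeof *lrd);
    assert(nbr != NULL && kdist != NULL && lrd != NULL);

    for (int p = 0; p < n; p++) {            /* one KNN search per point */
        knn(pts, n, p, k, &nbr[p * k]);
        kdist[p] = dist(pts[p], pts[nbr[p * k + k - 1]]);
    }
    for (int p = 0; p < n; p++) {            /* local reachability density */
        double sum = 0.0;
        for (int t = 0; t < k; t++) {
            int o = nbr[p * k + t];
            double d = dist(pts[p], pts[o]);
            sum += d > kdist[o] ? d : kdist[o];   /* reachability distance */
        }
        lrd[p] = k / sum;
    }
    for (int p = 0; p < n; p++) {            /* mean neighbor lrd / own lrd */
        double sum = 0.0;
        for (int t = 0; t < k; t++)
            sum += lrd[nbr[p * k + t]];
        lof[p] = sum / (k * lrd[p]);
    }
    free(nbr);
    free(kdist);
    free(lrd);
}
```

The per-point neighbor search dominates the cost and is independent across points, so it is the natural part to hand to GPU threads.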
Intrusion Detection on Virtualized Servers
• Detects >95% of all malware with a low false positive rate
• Moving KNN/LOF to a GPU makes this algorithm feasible to deploy in practice

Physics Simulation on a GPU

GPUs in Surgical Simulation
• Goal:
  - Provide highly accurate physics simulation in near real time
• Target operations:
  - Surgical training for Army doctors
  - Modeling deformation of soft tissues and cutting
• Improves parallel performance within a dynamic environment designed for physics-modeling
• Solves iterative systems expressed in terms of Physics and not matrix operations

Architecture Aware Data Structures
• Bridging the gap between the convenience of “Physics Derived” data structures and “Architecture Aware” data structures
  - Abstract architecture-specific requirements like alignment and data layouts within an intermediate object layer
  - Allow rapid development and code reuse across algorithms
• Minimal changes to pre-existing GPU kernel
  - Our data structures are slid in from underneath
• Applied to a range of algorithms for surgical simulation for tissue deformation studies
  - QuasiStatic Simulation
  - Backward Euler Systems
  - Multigrid Simulation

PhysBAM Data Structure Performance
[Figure: speedups of 8.32x, 4.31x, 4.67x and 6.95x]

GPU Strengths
• Supercomputing on the desktop
• Easy to program (small learning curve)
• Already have demonstrated success with several complex biomedical imaging and machine learning applications
• Impressive speedups competitive with Beowulf clusters
• Excellent cost/performance
• CUDA allows us to read and write data at any location in the device memory
• Memory close to the processors (registers + shared memory)

GPU Limitations
• Porting applications to the latest-and-greatest hardware becomes a time-consuming task
  - Suggests we need to raise the abstraction level
• Some hardwired graphic components are hidden
• Performance optimization requires deep knowledge of the microarchitecture
• Better tools are needed
  - Register usage
  - Memory blocking and layout
  - Aggressive threading schemes
  - Multi-GPU exploration
• What do researchers want??
  - Semi-automatic tuning

GPU Optimizations
• Vectorization
• Memory space selection and memory transformation
• Multi-GPU exploitation
• GPU Virtualization
• PTX Optimization

GPU Vectorization
• Loop vectorization: targeting the vector architecture provided on the AMD FireStream platform
• Targets linearizing data to increase the number of loops that can be vectorized on AMD GPUs

A[0:N][0:M]
B[0:N][0:M]
for (i1 = 0; i1 < N; i1++)
    for (i2 = 0; i2 <= M; i2++)
        A[i1][i2] = B[i1][M-i2] + 1;

Transform array B

A[0:N][0:M]
B[0:N][M:0]
for (i1 = 0; i1 < N; i1++)
    for (i2 = 0; i2 <= M; i2++)
        A[i1][i2] = B[i1][i2] + 1;

Current GPU Optimizations
• The array B transformation from the GPU Vectorization example obtained up to 11X speedup over scalar code for Livermore Loops
* “Data transformations enabling loop vectorization on multithreaded data parallel architectures.” Jang et al., PPOPP 2010.

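The array B transformation can be checked on flat row-major arrays. The sketch below is an illustration only, not the compiler pass from the cited paper; the function names are hypothetical and each row is assumed to hold M + 1 elements, matching the inclusive bound i2 <= M.

```c
#include <assert.h>

/* Original loop nest: the inner loop walks B backwards (index M - i2)
 * while it walks A forwards, so the two accesses run in opposite
 * directions. */
void add_one_reversed(int N, int M, float *A, const float *B) {
    for (int i1 = 0; i1 < N; i1++)
        for (int i2 = 0; i2 <= M; i2++)
            A[i1 * (M + 1) + i2] = B[i1 * (M + 1) + (M - i2)] + 1.0f;
}

/* Data transformation: store each row of B in reverse order, once. */
void reverse_rows(int N, int M, const float *B, float *Bt) {
    assert(B != Bt);
    for (int i1 = 0; i1 < N; i1++)
        for (int i2 = 0; i2 <= M; i2++)
            Bt[i1 * (M + 1) + i2] = B[i1 * (M + 1) + (M - i2)];
}

/* Transformed loop nest: A and Bt are accessed with the same forward,
 * unit-stride index, which is the form a vectorizer can handle. */
void add_one_forward(int N, int M, float *A, const float *Bt) {
    for (int i1 = 0; i1 < N; i1++)
        for (int i2 = 0; i2 <= M; i2++)
            A[i1 * (M + 1) + i2] = Bt[i1 * (M + 1) + i2] + 1.0f;
}
```

The reversal is paid once up front; it is worthwhile when the transformed loop runs many times or when the layout can be chosen as the data is produced.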
Memory Access Pattern Modeling
• Classification of memory access patterns
[Figure: classification of memory access patterns and the mathematical representation in the model, with the best memory access pattern for GPUs marked. Gray color: elements being accessed, C: constant number, Z: random number]

Current GPU Optimizations
• Memory Selection and Coalescing on NVIDIA GPUs
• Multiple memory spaces are exposed to the program on NVIDIA GPUs, a remnant of graphics

Memory     Location   Cached   Access   Scope
Global     Off chip   No       R/W      Thread grid
Constant   Off chip   Yes      R        Thread grid
Texture    Off chip   Yes      R        Thread grid
Local      Off chip   No       R/W      Thread
Shared     On chip    N/A      R/W      Thread block
Register   On chip    N/A      R/W      Thread

• Mathematical framework developed that characterizes loop-based array iteration spaces
• Applied mapping framework to Parboil and PhysBAM programs
• Speedups ranged from 1.3X to 15X versus a baseline GPU implementation

Multi-GPU Design Space Exploration
• Predict performance for GPU programs while scaling either the number of GPUs or the input data size
• Select the optimal configuration of GPUs (distributed/multi-system or shared-memory/multi-processing, and how many) without having to purchase hardware
• Avoid architecture-specific optimizations which limit scalability and portability to future generations of hardware

Requirements for Performance Prediction
System-specific inputs
• Network bandwidth
• PCIe bandwidth to GPU
• Disk throughput
• RAM size
Algorithm-specific inputs
• Communication requirements
• Reference (single-GPU) implementation
Variables
• Number of GPUs
• Data set sizes
• GPU configurations
[Figure: the inputs and variables feed the model, which outputs predicted execution times]

Multi-GPU Modeling
• Developed quantitative GPU models to capture various aspects of multi-GPU execution and communication
• Allows for accurate performance prediction of “traditional” CUDA applications across multiple GPUs
  - Average difference between modeled and actual is 11%
  - Able to account for changes in number of GPUs and changes in data set sizes
• Future work explores modeling larger numbers of GPUs
  - Ongoing work with UIUC/NCSA and 3Leaf Systems
• D. Schaa and D. Kaeli, “Exploring the Multiple-GPU Design Space,” IEEE International Parallel and Distributed Processing Symposium, Best Paper Award, May 2009.

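To show how system-specific and algorithm-specific inputs might combine into a predicted execution time, here is a deliberately simple toy model in C. It is an assumption made for illustration and is not the model from the cited paper: compute time and PCIe traffic are assumed to divide evenly across GPUs, and any multi-GPU run is assumed to pay one fixed network exchange. All type, field and function names are hypothetical.

```c
#include <assert.h>

typedef struct {
    double pcie_bytes_per_s;      /* PCIe bandwidth to each GPU */
    double net_bytes_per_s;       /* network bandwidth between systems */
} SystemInputs;

typedef struct {
    double single_gpu_compute_s;  /* kernel time of the single-GPU reference */
    double data_bytes;            /* total data moved over PCIe */
    double exchange_bytes;        /* data exchanged between GPUs per run */
} AlgorithmInputs;

/* Toy prediction: perfect scaling of compute and PCIe transfers, plus a
 * fixed exchange cost whenever more than one GPU is used. */
double predict_time_s(SystemInputs sys, AlgorithmInputs alg, int n_gpus) {
    assert(n_gpus >= 1);
    double compute = alg.single_gpu_compute_s / n_gpus;
    double pcie = (alg.data_bytes / n_gpus) / sys.pcie_bytes_per_s;
    double comm = n_gpus > 1 ? alg.exchange_bytes / sys.net_bytes_per_s : 0.0;
    return compute + pcie + comm;
}

/* Pick the GPU count in [1, max_gpus] with the lowest predicted time. */
int best_gpu_count(SystemInputs sys, AlgorithmInputs alg, int max_gpus) {
    assert(max_gpus >= 1);
    int best = 1;
    for (int n = 2; n <= max_gpus; n++)
        if (predict_time_s(sys, alg, n) < predict_time_s(sys, alg, best))
            best = n;
    return best;
}
```

Even this crude form shows the trade-off the design-space exploration is after: once the exchange cost outweighs the compute saved, adding GPUs stops paying off.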
GPU Virtualization
• GPUs on many heterogeneous systems can be accessed from a remote system over the network
[Figure: the user's machine sees virtual GPUs that are backed, over the network, by GPUs in a server, a workstation and a cluster]

Exploit Virtualization to Access Remote GPUs
• Each system presents this abstraction of GPUs
  - Multiple virtual machines can be present per system
• Manager module
  - Determines availability
  - Schedules execution
  - Communicates data and commands to other manager units
• Any number of virtual GPUs can be connected to a system
• No change to GPU programming model (no distributed coding)
• Access to remote (diverse) resources
• No local GPUs required!
[Figure: a virtual machine (guest OS, CUDA app, CUDA driver, PCIe interface) runs on a hypervisor whose PCIe interface leads to the manager; the manager drives physical GPUs and coordinates with other manager modules over the network]

Improving the Open64 Backend for GPUs
• Goal: Reduce register pressure in PTX using instruction scheduling and rematerialization
• Benefits:
  - More blocks per core (higher occupancy)
  - More threads per block
  - Fewer register spills (to local memory)
• Challenges:
  - PTX is only an intermediate representation
  - Little to no visibility into the NVIDIA backend device driver

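The link between register pressure and occupancy can be worked through with a small calculation. This is a simplified sketch, not NVIDIA's occupancy calculator: it considers only the register file, the resident-thread cap and the resident-block cap, and ignores shared memory and register allocation granularity. The function names are hypothetical and the hardware limits are passed in as parameters rather than assumed.

```c
#include <assert.h>

/* Resident blocks per multiprocessor under three limits only:
 * register file size, resident-thread cap, resident-block cap. */
int resident_blocks(int regs_per_thread, int threads_per_block,
                    int regs_per_sm, int max_threads_per_sm,
                    int max_blocks_per_sm) {
    assert(regs_per_thread > 0 && threads_per_block > 0);
    int by_regs = regs_per_sm / (regs_per_thread * threads_per_block);
    int by_threads = max_threads_per_sm / threads_per_block;
    int blocks = by_regs < by_threads ? by_regs : by_threads;
    return blocks < max_blocks_per_sm ? blocks : max_blocks_per_sm;
}

/* Occupancy: resident threads divided by the hardware thread cap. */
double occupancy(int regs_per_thread, int threads_per_block,
                 int regs_per_sm, int max_threads_per_sm,
                 int max_blocks_per_sm) {
    int blocks = resident_blocks(regs_per_thread, threads_per_block,
                                 regs_per_sm, max_threads_per_sm,
                                 max_blocks_per_sm);
    return (double)(blocks * threads_per_block) / max_threads_per_sm;
}
```

With illustrative limits of 8192 registers, 768 resident threads and 8 resident blocks per multiprocessor (treat these figures as assumptions), a 256-thread block at 16 registers per thread fits 2 blocks, while trimming the kernel to 10 registers per thread fits 3 and fills the multiprocessor. That is the "more blocks per core" benefit listed above.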
Improving the Open64 Backend for GPUs
• Target system: NVIDIA GeForce 8800 GTX
• SDK: CUDA Toolkit 2.3
• Target applications: CUDA SDK 2.3 examples, CUBLAS, PhysX
[Figure: correlation = 0.94]

OpenCL – The future for many-core computing
• Open Computing Language
  - A framework for writing programs that execute on heterogeneous systems
  - Presently runs on NVIDIA GPUs and AMD multi-core CPUs/GPUs, and on the Apple Snow Leopard OS
  - LLVM compiler
• Being developed by Khronos Group, a non-profit
• Modeled as four parts
  - Platform Model
  - Execution Model
  - Memory Model
  - Programming Model

Fermi and Fusion – The next steps in GPGPU
• Fermi (4/15/10)
  - 512 CUDA cores
  - 8X the current double precision FP performance
  - 16 concurrent kernels
  - ECC support
  - Power issues delaying the system
• Fusion
  - CPU/GPU integration
  - Details later this year

Summary and Future Work
• GPUs are revolutionizing desktop supercomputing
• A number of critical applications have been migrated successfully
• We will see shortly if heterogeneous CPU/GPU systems will be adopted as the status quo for the desktop market
  - The key will be power/performance/cost
  - GPUs have already demonstrated their value in selected domains
• OpenCL is the future for heterogeneous computing
• The low-end and the high-end are meeting in the middle!

For more info on GPGPUs
• IEEE Transactions on Parallel and Distributed Systems special issues on Hardware Accelerators, focused on GPUs
• Proceedings for GPGPU 1, 2, and 3
  - Published in JPDC and ACM digital library
• Also check out: http://www.gpgpu.org
