Department of Computer Science
iGPU: Exception Support and Speculative Execution on GPUs
Jaikrishnan Menon, Marc de Kruijf, Karthikeyan Sankaralingam
Vertical Research Group
University of Wisconsin−Madison
Presented at ISCA 2012
1
Executive Summary
Compiler/hardware co-design for efficient, general-purpose GPUs
Exception support with 1.5% overhead (no more than 4%)
Demand paging
Context switch support with 2.5% overhead (no more than 4%)
Exploiting speculation provides > 10% energy savings
2
Outline
Motivation and Background
iGPU Mechanisms: General exception handling, Context switching, Speculation support
iGPU Architecture: Software, Hardware
Evaluation
Conclusion
3
CPU Evolution Retrospective
IBM 360 era: precise exceptions as a performance tradeoff
However, two key shifts in processor design:
Virtual memory no longer optional
Speculative execution on ILP processors
4
Precise exception handling and speculation were key enablers for modern CPUs
5
GPU Architectural trends
A single unified CPU-GPU address space
Significant interest in supporting demand paging
Emerging necessity for supporting speculation
More workloads – “irregular” workloads
Handling reliability problems
6
GPUs need general-purpose exception and speculation support
7
Why not just borrow CPU ideas?
CPUs use buffering to preserve architectural state
Future file, History file, Re-order Buffer …
But GPUs have 1000x as many registers
Not practical!
8
Fundamental Challenges
1. Well-defined restart point in program
GPU pipeline and SIMT model make this hard
2. Preserving architectural state prior to restart
Need to save 1000s of registers
9
Key Ideas of our Solution
1. Well-defined restart point in program
Idempotent code regions: restartable regions producing the same effect
Creation of restart points
2. Preserving architectural state prior to restart
Regions constructed with small live state: 1 to 3 registers
Save only this live state
Preservation of necessary state
10
Exception Support
Idempotent regions mark restart points
Register file provides all the required state!
Idempotence guarantees correctness
[Figure, labeled "Creation idea": regions A and B with an exception handler; region B is drawn twice]
Implicit checkpoints using idempotence
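As a rough illustration of why an idempotent region can serve as an implicit checkpoint, the sketch below scans a straight-line instruction list for a write that overwrites a value the region read as input. Such a write would make re-execution from the region's start see different inputs. The `(dst, srcs)` instruction encoding and the name `is_idempotent` are hypothetical and not taken from the talk.

```python
def is_idempotent(region):
    """Return True if re-running `region` from its start gives the same result.

    `region` is a list of (dst, srcs) tuples, a hypothetical encoding of
    straight-line register instructions. The region is treated as
    non-idempotent if it overwrites a register that it read before writing
    (a live-in input), because a restart would then see the clobbered value.
    """
    written = set()   # registers defined inside the region so far
    live_in = set()   # registers read before any write inside the region
    for dst, srcs in region:
        for src in srcs:
            if src not in written:
                live_in.add(src)
        if dst in live_in:
            return False  # an input was clobbered: restart is not safe
        written.add(dst)
    return True


# r2 = r0 + r1; r3 = r2 * r0 : inputs r0 and r1 are never overwritten
safe = [("r2", ("r0", "r1")), ("r3", ("r2", "r0"))]
# r0 = r0 + r1 : overwrites its own input
unsafe = [("r0", ("r0", "r1"))]

print(is_idempotent(safe), is_idempotent(unsafe))
```

Under this simplified model, a compiler could cut a new region just before any instruction that would clobber an input, which is one way to read "idempotent regions mark restart points".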
12
Context Switch
Exception is page fault
[Figure: regions A and B, with "?" annotations]
Page-fault handling
1. Cleanly remove process 1
2. Start another process and execute
3. Get page from disk concurrently
4. Restore process 1
5. Restart process 1
14
Context Switch
Must save and restore architectural state
But GPUs have megabytes of register state
Save only live state
Save state at points of minimal live state
[Figure, labeled "Preserve idea": regions A and B and the exception handler, with candidate cut points annotated by number of live registers (22, 4, 9, 23)]
Implicit minimum live state checkpoints using idempotence
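To make the "save state at points of minimal live state" idea concrete, here is a small sketch that runs a backward liveness pass over straight-line code and picks the program point with the fewest live registers as the cut point. The instruction format, the register names, and the functions `live_sets` and `min_live_cut` are illustrative assumptions. The slides do not describe the actual iGPU compiler pass.

```python
def live_sets(instrs, live_out):
    """Backward liveness over straight-line code.

    `instrs` is a list of (dst, srcs) tuples; `live_out` is the set of
    registers needed after the last instruction. Entry i of the result is
    the set of registers live just before instruction i; the final entry
    is `live_out` itself.
    """
    live = set(live_out)
    result = [None] * (len(instrs) + 1)
    result[len(instrs)] = set(live)
    for i in range(len(instrs) - 1, -1, -1):
        dst, srcs = instrs[i]
        live.discard(dst)   # the value defined here is not needed earlier
        live.update(srcs)   # operands must be live before this point
        result[i] = set(live)
    return result


def min_live_cut(instrs, live_out):
    """Return (index, live_set) for the point with the fewest live registers."""
    sets = live_sets(instrs, live_out)
    best = min(range(len(sets)), key=lambda i: len(sets[i]))
    return best, sets[best]


program = [
    ("t0", ("a", "b")),    # t0 = a + b
    ("t1", ("t0", "c")),   # t1 = t0 * c
    ("x", ("t1",)),        # x  = f(t1)
    ("y", ("x",)),         # y  = g(x)
    ("z", ("x", "y")),     # z  = x + y
]
cut, live = min_live_cut(program, live_out={"x", "y", "z"})
print(cut, sorted(live))
```

In this toy program only `t1` is live just before the third instruction, so a context switch taken there would need to save a single register.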
Speculation
Speculation generates state that is wrong
Need even more buffers
Recall: buffers are impractical for GPUs
Use idempotence! (tuning the Creation idea)
Reduce re-execution cost by sub-dividing regions
Implicit checkpoints with low re-execution overhead using idempotence
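A back-of-the-envelope sketch of why sub-dividing regions lowers re-execution cost, under a deliberately simple and hypothetical model: a misspeculation is equally likely to be detected at any instruction of a region, and recovery restarts at that region's first instruction. The region lengths (12 and 4) and the function name `mean_reexecution` are made up for illustration and do not come from the talk.

```python
def mean_reexecution(region_len):
    """Mean number of already-executed instructions discarded per restart.

    Assumes (hypothetically) that detection is uniformly distributed over
    the region's instructions, so a fault at position k discards k
    instructions of completed work.
    """
    return sum(range(region_len)) / region_len


whole = mean_reexecution(12)   # one 12-instruction region
split = mean_reexecution(4)    # same code cut into three 4-instruction regions
print(whole, split)
```

The model ignores the cost of the extra region boundaries, so it only shows the direction of the tradeoff.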
19
Speculation
[Figure: regions A, B, and C, with B sub-divided into B1 and B2 and a misspeculation marked; # live registers: 2]
* Region construction details: Idempotent Processing, PLDI '12
20
iGPU Architecture
Application
Compiler
Hardware
22
iGPU Architecture - Software
Creation idea: form regions (region formation) using region marker instructions
Preserve idea: preserve state (state preservation) using register reassignment, moves, and spills
[Diagram labels: region formation, register pressure, state preservation]
23
iGPU Architecture - Software
Kernel Source Code
Source Code Compiler
Device Code Generator
Device Code
24
iGPU Architecture - Software
Kernel Source Code
Source Code Compiler
Device Code Generator
Region formation
State preservation
Idempotent Device Code
26
iGPU Architecture - Hardware
[Figure (not to scale), labeled "Creation idea": multiple cores sharing an L2 cache; a core is shown with a SIMD processor, fetch unit, decode, general-purpose registers, L1 cache & TLB, and RPCs]
27
iGPU Architecture - Hardware
(to scale)
[Figure: Restart PC Register area shown next to General Purpose Registers area]
2 RPCs per warp, one each for Sparse and Short regions
Compare to 1024 GPRs per warp (32 x 32)
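The role of the restart PC (RPC) register can be sketched with a toy interpreter: a region-marker instruction records the current PC in the RPC, and when an instruction faults, execution resumes from the RPC instead of restoring buffered state. The instruction set (`mark`, `add`, `load`), the fault model, and all names are hypothetical. For simplicity the sketch keeps a single RPC for one thread rather than the two per warp mentioned above.

```python
def run(program, regs, memory, faults):
    """Execute `program`, restarting at the last region marker on a fault.

    program: list of tuples, each one of
        ("mark",)                 start of an idempotent region
        ("add", dst, a, b)        regs[dst] = regs[a] + regs[b]
        ("load", dst, addr_reg)   regs[dst] = memory[regs[addr_reg]]
    faults: set of addresses that fault on first access (a hypothetical
        fault model); the "handler" just makes the page resident.
    Returns the number of restarts taken.
    """
    pc = 0
    rpc = 0        # restart PC: where to resume after an exception
    restarts = 0
    while pc < len(program):
        op = program[pc]
        if op[0] == "mark":
            rpc = pc                  # region boundary becomes restart point
        elif op[0] == "add":
            _, dst, a, b = op
            regs[dst] = regs[a] + regs[b]
        elif op[0] == "load":
            _, dst, addr_reg = op
            addr = regs[addr_reg]
            if addr in faults:
                faults.discard(addr)  # handler services the fault
                restarts += 1
                pc = rpc              # re-execute the region from its start
                continue
            regs[dst] = memory[addr]
        pc += 1
    return restarts


program = [
    ("mark",),
    ("add", "r2", "r0", "r1"),   # r2 = r0 + r1 (inputs are not overwritten)
    ("load", "r3", "r2"),        # r3 = memory[r2], may fault
    ("mark",),
    ("add", "r4", "r3", "r3"),   # r4 = 2 * r3
]
regs = {"r0": 1, "r1": 2}
memory = {3: 10}
restarts = run(program, regs, memory, faults={3})
print(restarts, regs["r4"])
```

Because the first region never overwrites `r0` or `r1`, re-running it after the fault produces the same `r2`, and the final register values match a fault-free run.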
28
iGPU Architecture - Hardware
Preserve idea
State preservation handled purely by compiler!
Not hardware’s responsibility
29
Evaluation
[Figure: iGPU Runtime Overhead. Series: Region Creation; Context Switch and Speculation support overhead. Y-axis: % Overhead, 0 to 4.5]
31
Evaluation – Voltage Speculation
[Figure: Energy Savings on iGPU with Voltage Emergency Prediction. Y-axis: % energy savings, 0 to 20]
Vdd reduction: 10%
Error rate: 0.01%
32
Executive Summary
Compiler/hardware co-design for efficient, general-purpose GPUs
Exception support with 1.5% overhead (no more than 4%)
Demand paging
Context switch support with 2.5% overhead (no more than 4%)
Exploiting speculation provides > 10% energy savings
34
Conclusions
Exception support for GPUs is practical
Enables better integration with CPUs in CPU-GPU architectures
Speculative execution on GPUs, both for performance and reliability, presents interesting possibilities in the context of “irregular” workloads
35
Questions
36