iGPU - Computer Sciences Dept. - University of Wisconsin–Madison


Department of Computer Science
iGPU: Exception Support and Speculative Execution on GPUs
Jaikrishnan Menon, Marc de Kruijf, Karthikeyan Sankaralingam
Vertical Research Group
University of Wisconsin−Madison
Presented at ISCA 2012
Executive Summary

Compiler/hardware co-design for efficient, general-purpose GPUs
Exception support with 1.5% overhead (no more than 4%)
    Demand paging
    Context switch support with 2.5% overhead (no more than 4%)
Exploiting speculation provides > 10% energy savings
Outline


Motivation and Background
iGPU Mechanisms
    General exception handling
    Context switching
    Speculation support
iGPU Architecture
    Software
    Hardware
Evaluation
Conclusion
CPU Evolution Retrospective

IBM 360 era: precise exceptions treated as a performance tradeoff
However, two key shifts in processor design:
    Virtual memory no longer optional
    Speculative execution on ILP processors
Precise exception handling and speculation were key enablers for modern CPUs
GPU Architectural Trends
A single unified CPU-GPU address space
    Significant interest in supporting demand paging
Emerging necessity for supporting speculation
    More workloads, including “irregular” workloads
    Handling reliability problems
Need general-purpose exception and speculation support for GPUs
Why not just borrow CPU ideas?

CPUs use buffering to preserve architectural state
    Future file, history file, reorder buffer, …
But GPUs have 1000x as many registers
    Buffering is not practical!
Fundamental Challenges
1. Well-defined restart point in the program
    The GPU pipeline and SIMT model make this hard
2. Preserving architectural state prior to restart
    Need to save 1000s of registers
Key Ideas of Our Solution
1. Well-defined restart point in the program
    Idempotent code regions: restartable regions that produce the same effect when re-executed
    Creation idea: creation of restart points
2. Preserving architectural state prior to restart
    Regions constructed with small live state: 1 to 3 registers
    Save only this live state
    Preserve idea: preservation of necessary state
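The idempotence condition behind the Creation idea can be sketched as a simple check. This is a minimal sketch. It assumes straight-line code that touches registers only, whereas real region construction must also handle memory and control flow. The `(dst, srcs)` instruction tuples and the helper names are hypothetical. Under these assumptions, a region is restartable if no instruction overwrites one of the region's own live-in registers, because re-running it from the start then reads the same inputs.

```python
def live_ins(region):
    """Registers the region reads before writing them (its inputs)."""
    inputs, written = set(), set()
    for dst, srcs in region:
        inputs.update(s for s in srcs if s not in written)
        written.add(dst)
    return inputs


def is_idempotent(region):
    """True if no instruction clobbers a live-in register of the region."""
    inputs = live_ins(region)
    return all(dst not in inputs for dst, _ in region)


# Hypothetical regions: each instruction is (destination, source registers).
OK = [("r2", ("r0", "r1")), ("r3", ("r2", "r0"))]   # r2 = f(r0, r1); r3 = g(r2, r0)
BAD = [("r0", ("r0", "r1")), ("r2", ("r0", "r1"))]  # r0 = f(r0, r1) overwrites input r0

print(is_idempotent(OK), is_idempotent(BAD))  # True False
```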
Exception Support



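Idempotent regions mark restart points
The register file already provides all the required state!
Idempotence guarantees correctness
[Figure (Creation idea): regions A and B; an exception in region B invokes the exception handler, after which region B is re-executed]
Implicit checkpoints using idempotence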
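The restart mechanism can be sketched with a toy interpreter. The sketch assumes a hypothetical `mark` pseudo-instruction at each region boundary that copies the next PC into a restart PC (RPC). On an exception, the handler resumes at the RPC. No explicit checkpoint is taken, because the register file already holds every live-in of an idempotent region. The encoding and names are illustrative only and are not iGPU's ISA.

```python
import operator


def run(program, regs, fault_at=None):
    """Execute `program`; optionally take one exception before instruction `fault_at`."""
    regs = dict(regs)                  # work on a copy of the register file
    pc, rpc, faulted = 0, 0, False
    while pc < len(program):
        inst = program[pc]
        if inst[0] == "mark":          # region boundary: record the restart PC
            rpc = pc + 1
            pc += 1
            continue
        if pc == fault_at and not faulted:
            faulted = True             # handler runs, then execution resumes at the RPC
            pc = rpc
            continue
        dst, fn, srcs = inst
        regs[dst] = fn(*(regs[s] for s in srcs))
        pc += 1
    return regs


# Idempotent regions: no region overwrites its own live-in registers.
GOOD = [
    ("mark",),
    ("r2", operator.add, ("r0", "r1")),  # r2 = r0 + r1
    ("r3", operator.mul, ("r2", "r0")),  # r3 = r2 * r0
    ("mark",),
    ("r0", operator.add, ("r3", "r1")),  # r0 = r3 + r1 (r0 is not live-in here)
]
# Not idempotent: the first instruction clobbers live-in r0.
BAD = [
    ("mark",),
    ("r0", operator.add, ("r0", "r1")),
    ("r2", operator.mul, ("r0", "r1")),
]

init = {"r0": 2, "r1": 3}
print(run(GOOD, init) == run(GOOD, init, fault_at=2))  # True
print(run(BAD, init) == run(BAD, init, fault_at=2))    # False
```

In the non-idempotent region, the restart re-reads an `r0` that was already incremented, so the result differs. The iGPU compiler rules out exactly this case when it forms regions.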
Context Switch
Exception is a page fault
[Figure: execution of regions A and B, interrupted by the page fault]
Page-fault handling:
1. Cleanly remove process 1
2. Start another process and execute it
3. Get the page from disk concurrently
4. Restore process 1
5. Restart process 1
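Steps 1, 4, and 5 can be sketched as follows. The sketch assumes that the compiler reports the set of registers live at each restart point, and that the faulting warp's restart PC (RPC) points at the start of its current idempotent region. Only the RPC and those live registers are saved. After the page arrives, execution restarts at the RPC from an otherwise empty register file. `KERNEL`, `switch_out`, and `switch_in` are hypothetical names, and the toy kernel stands in for a real warp.

```python
import operator

# Hypothetical straight-line kernel: (dst, op, srcs).
KERNEL = [
    ("r2", operator.add, ("r0", "r1")),  # 0: region A
    ("r3", operator.mul, ("r2", "r2")),  # 1: region B starts here, so RPC = 1
    ("r4", operator.sub, ("r3", "r0")),  # 2: page fault raised here
]


def execute(regs, start, stop=None):
    """Run KERNEL[start:stop] on register file `regs` (updated in place)."""
    for dst, fn, srcs in KERNEL[start:stop]:
        regs[dst] = fn(*(regs[s] for s in srcs))
    return regs


def switch_out(regs, rpc, live_at_rpc):
    """Step 1: keep only the RPC and the registers live at the RPC."""
    return rpc, {r: regs[r] for r in live_at_rpc}


def switch_in(context):
    """Steps 4-5: restore the saved live state and restart at the RPC."""
    rpc, saved = context
    return execute(dict(saved), rpc)


regs_at_fault = execute({"r0": 2, "r1": 3}, 0, stop=2)  # r0..r3 hold values
ctx = switch_out(regs_at_fault, rpc=1, live_at_rpc={"r0", "r2"})
# Steps 2-3: another process runs while the page is fetched; the warp's
# register file is not preserved, only `ctx` is.
print(switch_in(ctx)["r4"])  # 23
```

Only two of the four registers holding values at the fault are saved. The others are either dead (`r1`) or recomputed by re-executing region B (`r3`).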
Context Switch


Must save and restore architectural state
But... GPUs have megabytes of register state
    Save only live state
    Save state only at points of minimal live state
[Figure: regions A and B with an exception handler; candidate cut points are annotated with their number of live registers (22, 4, 9, 23)]
Implicit minimum-live-state checkpoints using idempotence (Preserve idea)
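Finding points of minimal live state amounts to a standard backward liveness analysis. The following is a minimal sketch over straight-line code, assuming the same hypothetical `(dst, srcs)` encoding. It counts the registers live immediately before each instruction and picks the minimum as the candidate cut point. A real region former would also weigh region length and control flow, which this sketch ignores.

```python
def live_counts(code, live_out=()):
    """Number of registers live immediately before each instruction."""
    live = set(live_out)
    counts = [0] * len(code)
    for i in range(len(code) - 1, -1, -1):  # backward dataflow walk
        dst, srcs = code[i]
        live.discard(dst)                   # the definition kills dst
        live.update(srcs)                   # the uses make srcs live
        counts[i] = len(live)
    return counts


def best_cut(code, live_out=()):
    """Index of the first instruction with minimal live state before it."""
    counts = live_counts(code, live_out)
    return min(range(len(code)), key=counts.__getitem__)


# Hypothetical straight-line code: (destination, source registers).
CODE = [
    ("r2", ("r0", "r1")),
    ("r3", ("r2", "r0")),
    ("r4", ("r3",)),
    ("r5", ("r4", "r4")),
    ("r6", ("r4", "r5")),
    ("r7", ("r6", "r5")),
]
print(live_counts(CODE, live_out={"r7"}))  # [2, 2, 1, 1, 2, 2]
print(best_cut(CODE, live_out={"r7"}))     # 2
```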
Speculation

Speculation generates state that may turn out to be wrong
    Would need even more buffers
    Recall: buffers are impractical for GPUs
Use idempotence! (Tuning the Creation idea)
    Reduce re-execution cost by subdividing regions
Implicit checkpoints with low re-execution overhead using idempotence
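A back-of-the-envelope cost model shows why subdividing regions helps. The model below is an assumption and does not come from the slides. Each instruction misspeculates independently with probability `p`. A misspeculation is detected at the end of its sub-region, and that sub-region is then re-executed once from its restart point. Splitting a long region into shorter ones lowers the chance that any given piece must re-run. It also reduces the work repeated when a piece does re-run.

```python
def expected_reexecution(region_len, num_subregions, p):
    """Expected number of re-executed instructions under a first-order model.

    Hypothetical assumptions: independent misspeculation with per-instruction
    probability p; detection at the end of the sub-region; one re-execution.
    """
    sub_len = region_len / num_subregions
    q = 1.0 - (1.0 - p) ** sub_len       # P(a given sub-region must re-run)
    return num_subregions * sub_len * q  # expected wasted instructions


for k in (1, 2, 10):
    print(k, expected_reexecution(1000, k, 1e-4))
```

With these made-up parameters, the expected wasted work falls roughly in proportion to the sub-region length: about 95, 49, and 10 instructions for 1, 2, and 10 sub-regions. The model omits the cost of the extra region markers and boundary live state, which is why subdivision is tuned rather than maximized.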
Speculation
[Figure: misspeculation example; region B, between regions A and C, is subdivided into B1 and B2; # live registers: 2]
* Region construction details: Idempotent Processing, PLDI ’12
iGPU Architecture
[Figure: system stack of Application, Compiler, and Hardware layers]
iGPU Architecture - Software


Form regions (Creation idea): region formation inserts region marker instructions
Preserve state (Preserve idea): state preservation uses register reassignment, moves, and spills
[Figure: region formation and state preservation interact through register pressure]
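The register-reassignment part of state preservation can be sketched as a renaming pass. Any write that would overwrite a live-in register of the region is redirected to a spare register, and later reads in the region follow the rename. This is a minimal sketch, not iGPU's actual compiler pass. It assumes straight-line code that uses registers only. It also assumes the caller supplies a list of free registers (`free_regs`, hypothetical) that the region does not otherwise use.

```python
def preserve_live_ins(region, free_regs):
    """Rename writes that would clobber live-in registers of the region."""
    # Live-ins: registers read before being written inside the region.
    live_in, written = set(), set()
    for dst, srcs in region:
        live_in.update(s for s in srcs if s not in written)
        written.add(dst)

    free = list(free_regs)
    rename, out = {}, []
    for dst, srcs in region:
        srcs = tuple(rename.get(s, s) for s in srcs)  # reads see earlier renames
        if dst in live_in:                            # write would clobber an input
            if dst not in rename:
                rename[dst] = free.pop(0)             # take a spare register
            dst = rename[dst]
        out.append((dst, srcs))
    return out, rename


REGION = [("r0", ("r0", "r1")), ("r2", ("r0", "r1"))]  # r0 = f(r0, r1) clobbers r0
new_region, renames = preserve_live_ins(REGION, ["r9", "r10"])
print(new_region)  # [('r9', ('r0', 'r1')), ('r2', ('r9', 'r1'))]
print(renames)     # {'r0': 'r9'}
```

After this pass, code following the region must read `r9` where it previously read `r0`, or a move back into `r0` can be placed in the next region. Each spare register consumed raises register pressure. When no spare register is left, spilling is the fallback.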
iGPU Architecture - Software
Kernel Source Code
    → Source Code Compiler
    → Device Code Generator (with region formation and state preservation)
    → Idempotent Device Code
iGPU Architecture - Hardware
[Figure (not to scale), Creation idea: SIMD processor with fetch unit, decode, restart PC registers (RPCs), general-purpose registers, multiple cores, and L1 cache & TLB, backed by an L2 cache]
iGPU Architecture - Hardware
[Figure (to scale): restart PC registers compared with the general-purpose register file]
2 RPCs per warp: one each for sparse and short regions
Compare to 1024 GPRs per warp (32 x 32)
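For scale, here is a quick count using the numbers on the slide. It assumes each RPC is no wider than a GPR, since register widths are not stated here:

```latex
\frac{\text{RPCs per warp}}{\text{GPRs per warp}}
  = \frac{2}{32 \times 32}
  = \frac{2}{1024}
  \approx 0.2\%
```

Under that assumption, the added hardware state is roughly 0.2% of the per-warp register storage.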
iGPU Architecture - Hardware
Preserve idea
State preservation is handled purely by the compiler!
Not the hardware’s responsibility
Evaluation
iGPU Runtime Overhead
[Chart: % overhead (axis 0 to 4.5%); series: region creation overhead, and context switch and speculation support overhead]
Evaluation – Voltage Speculation
Energy Savings on iGPU with Voltage Emergency Prediction
Vdd reduction: 10%
Error rate: 0.01%
[Chart: % energy savings (axis 0 to 20%)]
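A first-order bound helps in reading this chart. It assumes that dynamic energy per operation scales as C·Vdd² at fixed switched capacitance C, and it ignores leakage and frequency effects. Under those assumptions, a 10% Vdd reduction gives:

```latex
\frac{E'_{\text{dyn}}}{E_{\text{dyn}}}
  = \left(\frac{0.9\,V_{dd}}{V_{dd}}\right)^{2}
  = 0.81
\quad\Longrightarrow\quad
\text{savings} \le 1 - 0.81 = 19\%
```

Re-executing regions after the 0.01% of errors, along with other overheads, eats into this bound. That is consistent with the > 10% savings reported.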
Conclusions

Exception support for GPUs is practical
    Enables better integration with CPUs in CPU-GPU architectures
Speculative execution on GPUs presents interesting possibilities in the context of “irregular” workloads
    Both for performance and for reliability
Questions