
A Unified Model for Timing Speculation: Evaluating the Impact of Technology Scaling, CMOS Design Style, and Fault Recovery Mechanism
Marc de Kruijf
Shuou Nomura
Karu Sankaralingam
UW-Madison Computer Sciences Vertical Research Group
© 2010
From Hard to Harder
[Figure: from "Hard" to "Harder"; labels: 10000nm, 720nm, 4000um, 360nm, 1500um, 180nm, 90nm, 45nm & beyond]
DSN 2010 - 2
What is the Problem?
• Non-ideal transistor scaling
  • Transistor wear-out
  • Process, voltage, and temperature (PVT) variations
  • Errors due to particle interference
  • Noise coupling & crosstalk
What is the Problem?
NEED HIGH-LEVEL ANALYSIS TOOLS
[Figure: Performance Toolbox and Reliability Toolbox; label: Dynamic verification]
Our Contribution
A model for timing speculation
• Unifies hardware + system
• Small set of high-level inputs
Also…
[Figure: processor designer]
Q. What is the impact of technology scaling?
A. Further benefits are small to none.
Q. What is the impact of CMOS design style?
A. Very low power designs benefit most.
Q. What is the impact of the fault recovery mechanism?
A. Fine-grained recovery is key to high efficiencies.
Outline
• Timing Speculation
• Model Overview
• Hardware Efficiency Model
• System Recovery Model
• Results
• Conclusion
Timing Speculation
[Figure: clock period (= 1/frequency) vs. circuit delay. Variations can push the circuit delay past the clock period, causing a timing failure (detect & recover); with a slower clock, the same delay is OK.]
Model Overview
[Figure: three plots against error rate: Hardware Efficiency (Energy), System Recovery (Time), and Overall Efficiency (Energy)]
Model Inputs
1. A hardware path delay distribution
2. Effect of variations on path delay as N(μ,σ)
3. The time between recovery checkpoints
4. The time to restore a checkpoint
Hardware Efficiency Model
Input 1: Path delay distribution
Input 2: Path delay variation (σ)
[Figure: plots of # paths vs. path delay, error prob. vs. clock period, energy vs. clock period (e.g. frequency scaling), and energy vs. error rate]
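A minimal sketch of how the two hardware inputs could combine into an error probability. It assumes each path's actual delay is normally distributed around its nominal delay, with σ given as a fraction of μ, and that paths fail independently. The function names, the independence assumption, and the sample path delays are illustrative and not taken from the talk.

```python
from statistics import NormalDist

def path_error_prob(nominal_delay, clock_period, sigma_frac):
    """P(actual delay > clock period) for delay ~ N(mu, sigma_frac * mu)."""
    dist = NormalDist(mu=nominal_delay, sigma=sigma_frac * nominal_delay)
    return 1.0 - dist.cdf(clock_period)

def error_rate(path_delays, clock_period, sigma_frac):
    """Probability that at least one path misses the clock edge.

    Assumption (not from the slides): paths vary independently.
    """
    p_all_ok = 1.0
    for delay in path_delays:
        p_all_ok *= 1.0 - path_error_prob(delay, clock_period, sigma_frac)
    return 1.0 - p_all_ok

# Hypothetical nominal path delays, in units of the longest path.
paths = [0.60, 0.80, 0.95, 1.00]
for period in (1.00, 1.05, 1.10, 1.20):
    rate = error_rate(paths, period, 0.046)
    print(f"clock period {period:.2f}: error rate {rate:.4f}")
```

Shrinking the clock period raises the error rate, which is the trade-off that timing speculation exploits.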
System Recovery Model
(applies to all backward error recovery systems)
overhead(rate) = failures(rate) x ( waste(rate) + restore )
System Recovery Model Inputs
1. The time between recovery checkpoints (cycles)
2. The time to restore a checkpoint (restore)
[Figure: Time vs. error rate]
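One way to make the terms of the overhead equation concrete, offered as a worked sketch rather than the authors' exact formulation. Assume errors strike independently in each cycle with probability r (the error rate), a checkpoint interval is c cycles long, an error is detected in the cycle where it occurs, and R is the restore time. The symbols r, c, R, and p, and the independence and immediate-detection assumptions, are not from the slides.

```latex
\begin{align*}
p &= (1-r)^{c}
  && \text{probability an interval completes without error} \\
\mathrm{failures}(r) &= \frac{1-p}{p}
  && \text{expected failed attempts before a success} \\
\mathrm{waste}(r) &= \frac{\sum_{k=1}^{c} k\, r\,(1-r)^{k-1}}{1-p}
  && \text{expected cycles executed in a failed attempt} \\
\mathrm{overhead}(r) &= \mathrm{failures}(r)\,\bigl(\mathrm{waste}(r) + R\bigr)
\end{align*}
```

For c = 1 the waste term is exactly 1 cycle, so under these assumptions the overhead reduces to (r / (1 − r)) × (1 + R).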
Results
Is the model useful?
What can we learn?
Technology Node: 45nm, 11nm
CMOS Design Style: High Performance CMOS, Low Power CMOS, Ultra-low Power CMOS
Recovery System: Razor, Reunion, Paceline
Results
[Figure: Hardware Efficiency (Energy), System Recovery (Time), and Overall Efficiency (Energy), each plotted against error rate]
Hardware Model Inputs
1. Path delay distribution
   • Application: H.264 decoding
   • Hardware: OpenRISC processor
2. Effect of process variations as N(μ,σ) using ITRS data
   • High Performance CMOS: 45nm σ = 0.046μ, 11nm σ = 0.051μ
   • Low Power CMOS: 45nm σ = 0.029μ, 11nm σ = 0.042μ
   • Ultra-low Power CMOS: 45nm σ = 0.196μ
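To give a feel for what these σ values mean, the sketch below computes how much longer than the nominal delay μ a clock period must be for a single path to violate timing with at most a chosen probability. The target probability of 1e-3 and the function name are illustrative assumptions. The σ/μ values are the ones listed above, with the 11nm figures assigned to design styles in the order they appear on the slide.

```python
from statistics import NormalDist

# sigma/mu per (design style, technology node), as listed on the slide.
SIGMA_OVER_MU = {
    ("High Performance CMOS", "45nm"): 0.046,
    ("High Performance CMOS", "11nm"): 0.051,
    ("Low Power CMOS", "45nm"): 0.029,
    ("Low Power CMOS", "11nm"): 0.042,
    ("Ultra-low Power CMOS", "45nm"): 0.196,
}

def guard_band(sigma_frac, target_error_prob):
    """Clock period, as a multiple of mu, at which a path with
    delay ~ N(mu, sigma_frac * mu) exceeds the period with the
    given probability (0 < target_error_prob < 1)."""
    z = NormalDist().inv_cdf(1.0 - target_error_prob)
    return 1.0 + z * sigma_frac

for (style, node), sigma_frac in SIGMA_OVER_MU.items():
    period = guard_band(sigma_frac, 1e-3)
    print(f"{style} @ {node}: clock period >= {period:.3f} x mu")
```

The much larger σ for ultra-low power CMOS translates into a much larger guard band, which is the margin timing speculation tries to reclaim.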
Hardware Efficiency
Energy = Power × Time
EDP = Power × Time²
[Figure: Normalized EDP vs. error rate; results for High Performance CMOS]
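The two definitions above can be written out directly. This is a small sketch; the normalization against a baseline run and the example numbers are assumptions for illustration, not results from the talk.

```python
def energy(power, time):
    """Energy = Power x Time."""
    return power * time

def edp(power, time):
    """Energy-delay product: EDP = Energy x Time = Power x Time^2."""
    return energy(power, time) * time

def normalized_edp(power, time, base_power=1.0, base_time=1.0):
    """EDP relative to a baseline design point (values below 1 are better)."""
    return edp(power, time) / edp(base_power, base_time)

# Hypothetical: 10% shorter execution time at unchanged power.
print(normalized_edp(power=1.0, time=0.9))
```

Because time enters squared, EDP rewards a speedup more strongly than energy alone does.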
Recovery Model Inputs
1. The time between recovery checkpoints &
2. The time to restore a checkpoint
   • Razor: Latch-level detection + pipeline rollback
     1 cycle checkpoint size & 5 cycle recovery cost
   • Reunion: DMR detection + checkpoint
     100 cycle checkpoint size & 100 cycle recovery cost
   • Paceline: DMR detection + checkpoint + flush
     100 cycle checkpoint size & 1000 cycle recovery cost
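A rough sketch of how these two inputs could feed the recovery overhead equation, overhead(rate) = failures(rate) x (waste(rate) + restore). It assumes errors are independent per cycle and are detected in the cycle where they occur. Those assumptions, the function names, and the sample error rates are illustrative and may differ from the authors' exact formulation.

```python
def recovery_overhead(rate, cycles, restore):
    """Expected extra cycles per checkpoint interval, for 0 <= rate < 1.

    Assumptions (not from the slides): errors are independent per cycle
    and are detected in the cycle where they occur.
    """
    if rate <= 0.0:
        return 0.0
    q = 1.0 - rate
    p_ok = q ** cycles              # interval completes error-free
    failures = (1.0 - p_ok) / p_ok  # expected failed attempts before success
    # Expected cycles executed in a failed attempt (first error at cycle k).
    waste = sum(k * rate * q ** (k - 1) for k in range(1, cycles + 1)) / (1.0 - p_ok)
    return failures * (waste + restore)

def normalized_time(rate, cycles, restore):
    """Execution time relative to an error-free run."""
    return (cycles + recovery_overhead(rate, cycles, restore)) / cycles

# (checkpoint size, recovery cost) in cycles, from the list above.
SYSTEMS = {"Razor": (1, 5), "Reunion": (100, 100), "Paceline": (100, 1000)}
for name, (cycles, restore) in SYSTEMS.items():
    times = [round(normalized_time(r, cycles, restore), 3) for r in (1e-5, 1e-3, 1e-2)]
    print(name, times)
```

Under these assumptions, the coarser checkpoints and costlier restores of Reunion and Paceline make their slowdown grow much faster with error rate than Razor's.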
System Recovery
[Figure: Normalized Time vs. error rate]
Overall Efficiency
[Figure: EDP vs. error rate]
1. High Performance CMOS
2. Low Power CMOS
3. Ultra-low Power CMOS
Overall Efficiency
[Figure: Normalized EDP vs. error rate, High Performance CMOS]
Overall Efficiency
[Figure: Normalized EDP vs. error rate, Low Power CMOS]
Overall Efficiency
[Figure: Normalized EDP vs. error rate, Ultra-low Power CMOS]
Conclusions
• A High-level Model
• Results
  • Efficiency gains improve only minimally with scaling
  • Ultra-low power (sub-threshold) CMOS benefits most
  • Fine-grained recovery is key
• Future Work
  • Incorporate more sources of variation
  • A tool for processor designers?
  • Under development at http://www.cs.wisc.edu/vertical
Questions?
Timing Speculation
[Figure: Source of Timing Variation (Manufacturing, Runtime, Application); labels: Process, Speed Binning, Online Timing Analysis, Timing Speculation]
Figure adapted from Greskamp et al., Paceline: [...]. In PACT ’07.
System Recovery Model
System Recovery Model Inputs
1. The time between recovery checkpoints (cycles)
2. The time to restore a checkpoint (restore)
failures(rate): expected # failures before success
waste(rate): expected # cycles executed upon failure
Overall Inputs
1. Path delay distribution
   • Application: H.264 decoding
   • Hardware: OpenRISC processor
2. Effect of process variations on path delay as N(μ,σ) using ITRS data
   • High Performance CMOS @45nm σ = 0.046μ
   • Low Power CMOS @45nm σ = 0.029μ
   • Ultra-low Power CMOS @45nm σ = 0.196μ
3. The time between recovery checkpoints &
4. The time to restore a checkpoint
   • Razor – Latch-level detection + pipeline rollback (1 & 5 cycles)
   • Reunion – DMR detection + checkpoint (100 & 100 cycles)
   • Paceline – DMR detection + checkpoint + flush (100 & 1000 cycles)