Rumba: An Online Quality Management
System for Approximate Computing
Daya S Khudia, Babak Zamirai,
Mehrzad Samadi and Scott Mahlke
June 17, 2015
Computer Engineering Laboratory
University of Michigan
Approximate Computing
[Figure: Pareto frontier of design points trading accuracy for efficiency — a design point at 90% accuracy (10% accuracy loss) can deliver up to 10x speedup and energy reduction. Axes: performance/energy vs. accuracy. [Esmaeilzadeh, CACM'15]]
Soft Applications
• Image Processing
• Data Analytics
• Media Applications
• Computer Vision
• Robotics
100% accuracy is not always required.
Quality is Important
[Figure: the same image rendered at 100%, 90%, and 80% accuracy.]
• Building acceptable systems out of inexact
hardware/software components
Approximation Challenge - 1
• Large errors in output elements are critical
[Figure: two error distributions with the same average error — 100% of the pixels each with 10% error (low-error elements) vs. 10% of the pixels each with 100% error (high-error elements).]
Requirement: Large errors should be detected; small errors can be ignored.
Approximation Challenge - 2
• Output quality is input dependent
[Figure: output error (%) per input image, varying between 0 and ~25% across 800 images.]
Requirement: Online detection of errors
Approximation Challenge - 3
• Measuring output quality is expensive
– Usually done by sampling over time
[Figure: quality over time drifting outside the target ± delta band between sampled quality checks.]
Requirement: Inexpensive continuous monitoring
Rumba: Solution Overview
[Diagram: the application and its inputs feed an approximate accelerator; Rumba detects errors in the accelerator's results, the CPU recovers the flagged elements, and the system emits robust approximate results.]
• Rumba solves the approximation challenges by being
– Lightweight (detection and recovery)
– Continuous and online
– Tunable with quality feedback
Inexact Hardware: Neural Processing Unit
Offline:
• Find approximable code in the program
• Train a neural network
– Find a suitable number of layers and neurons in each layer
– Get the neural network weights
Runtime:
• Transfer the accelerator configuration
• Execute, exchanging inputs/outputs with the neural accelerator
Inexact, but 2.3x faster and 3x more energy efficient [Esmaeilzadeh, Micro'12]
Rumba: Design
[Diagram: the CPU feeds an input queue and a config queue to the approximate accelerator; Rumba detection checks the accelerator's outputs, an online tuner applies the user's quality specification, elements placed on the recovery queue are recomputed by Rumba recovery, and an output merger combines accelerator and recovered results into the output queue.]
• Detection contributions
– Output-based methods: calculate errors by observing the approximate output
– Input-based methods: predict errors based on the current inputs
Output-Based Methods
• Example: gamma correction on images
• Pixels are read row-wise
– They exhibit temporal similarity
– Drastic changes occur only at edges and endpoints
[Flowchart: inputs are approximated on the accelerator; if the output is temporally similar to its predecessors it is kept as useful output; otherwise the accelerator output is discarded and the element is recomputed.]
Output-Based Method: Using Anomalies
• Compare each element with the history of previously computed elements
[Figure: a time series of output elements; points far from the recent history are flagged as anomalies, occasionally producing a false positive.]
• The detection system exploits temporal similarity in output elements
– Many applications exhibit temporal similarity
Lightweight Detection
• Maintain a moving average
• Compare the accelerator output with the moving average
Exponential Moving Average (EMA):
EMA = (α × e) + ((1 − α) × PreviousEMA)
α = smoothing factor, e = current element
Limitation: what if there is no temporal similarity?
Solution: input-based methods
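The EMA check above can be sketched in a few lines of Python. This is a minimal illustration, not the talk's implementation: the smoothing factor ALPHA and the anomaly THRESHOLD are illustrative values chosen here, and the relative-deviation test is one plausible way to compare an element against the moving average.

```python
# Minimal sketch of EMA-based anomaly detection for a stream of
# accelerator outputs. ALPHA and THRESHOLD are illustrative values,
# not taken from the talk.
ALPHA = 0.3        # smoothing factor (alpha)
THRESHOLD = 0.2    # relative deviation from the EMA that flags an anomaly

def ema_update(prev_ema, e, alpha=ALPHA):
    """EMA = (alpha * e) + ((1 - alpha) * PreviousEMA)."""
    return alpha * e + (1.0 - alpha) * prev_ema

def detect(outputs):
    """Yield (element, is_anomaly) pairs for a stream of outputs."""
    ema = outputs[0]                # seed the history with the first element
    for e in outputs:
        # Flag elements that deviate strongly from the recent history.
        anomaly = abs(e - ema) > THRESHOLD * max(abs(ema), 1e-9)
        yield e, anomaly
        ema = ema_update(ema, e)    # fold the element into the history
```

Note that right after a genuine spike the EMA is pulled toward it, so the next normal element may also be flagged — exactly the kind of false positive the previous slide depicts.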
Observation: Input-Dependent Error
[Figure: inverse kinematics error over the two-dimensional input space (input1 from -0.5 to 1.0, input2 from -0.2 to 1.2); errors concentrate around specific input regions.]
Input-Based Methods
[Flowchart: inputs go to both the approximate accelerator and an error predictor; if high error is predicted, the accelerator output is discarded and the element is recomputed; otherwise the approximate output is used.]
• Error prediction need not be very accurate
– We only need to know whether the error is high or not
• Prediction should be low-cost
– So that the gains of approximation are not nullified
Error Prediction: Decision Tree
[Figure: an example tree — the root splits on X[0] <= 0.01; its left child splits on X[1] <= 0.15 with leaf errors 0.19 and 0.40; its right child splits on X[0] <= 0.50 with leaf errors 0.01 and 0.21.]
• Cost depends on the number of levels
– More levels => more accuracy but higher cost
• The best configuration is found by limited exhaustive search
– Tree depth is limited to a maximum of 7 levels
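The example tree from the figure can be written out directly; the sketch below hand-codes its thresholds and leaf errors and gates recomputation on a quality threshold. The 0.1 threshold is an illustrative assumption, not a value from the talk.

```python
# Hand-rolled version of the example decision tree from the slide
# (split thresholds and leaf error values taken from the figure).
def predict_error(x):
    """Predict output error for input vector x using the slide's tree."""
    if x[0] <= 0.01:
        return 0.19 if x[1] <= 0.15 else 0.40
    else:
        return 0.01 if x[0] <= 0.50 else 0.21

def should_recompute(x, threshold=0.1):
    """Flag inputs whose predicted error exceeds the quality threshold.
    The 0.1 default is illustrative, not from the talk."""
    return predict_error(x) > threshold
```

In a real deployment the tree would be trained offline (the next slide shows the training flow) and the depth — capped at 7 levels here — picked by the limited exhaustive search the slide describes.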
Training Error Predictors
[Flow: a first training set (train1 inputs/outputs) trains the NPU model; a second set (train2 inputs) runs through both the NPU model and the exact model; the approximate and exact outputs are compared to calculate per-element errors, which are used to train the error predictor models.]
Recovery: Re-computation
• Recovery by re-computation
– Possible due to data-parallel computations
• The CPU re-computes elements with large errors
– The accelerator and the CPU work in tandem
[Diagram: the input queue feeds the approximate accelerator; Rumba detection places flagged elements on the recovery queue, Rumba recovery on the CPU recomputes them, and the results are merged into the output queue.]
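The merge step above can be sketched as a simple loop. This is an illustrative sequential sketch, not Rumba's queue-based implementation: `exact_fn`, `approx_out`, and `flagged` are hypothetical names standing in for the exact CPU computation, the accelerator's results, and the detection flags.

```python
# Sketch of recover-by-recomputation. Assumptions: exact_fn is the precise
# CPU implementation of the approximated kernel, approx_out holds the
# accelerator's per-element results, and flagged marks the elements that
# Rumba detection wants recomputed.
def merge_with_recovery(inputs, approx_out, flagged, exact_fn):
    """Replace flagged approximate elements with exact CPU recomputation."""
    out = list(approx_out)
    for i, bad in enumerate(flagged):
        if bad:                            # large predicted/observed error
            out[i] = exact_fn(inputs[i])   # CPU recomputes this element
    return out
```

Because the computation is data-parallel, each element can be recomputed independently, which is what lets the CPU work in tandem with the accelerator.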
Experimental Setup/Benchmarks
• Neural Processing Unit (NPU) accelerator
– Modeled using the PyBrain (artificial neural network) library
• Energy modeling
– gem5 + McPAT for applications + CACTI

Application   | Domain             | Error metric
blackscholes  | Financial analysis | Mean relative error
fft           | Signal processing  | Mean relative error
inversek2j    | Robotics           | Mean relative error
jmeint        | 3D gaming          | # of mismatches
jpeg          | Compression        | Mean pixel difference
kmeans        | Machine learning   | Mean output difference
sobel         | Image processing   | Mean pixel difference
Quality vs. Recomputations (inversek2j)
[Figure: average normalized error (%) vs. recomputed elements (%) for the Ideal, Uniform, EMA, LinErrors, and TreeErrors schemes, with the unchecked accelerator error as a baseline; the plot annotates points at 15% and 62% recomputed elements.]
Performance Improvements
[Figure: speedup w.r.t. CPU and error reduction w.r.t. no checks for NPU, Ideal, Uniform, EMA, linearErrors, and treeErrors across blackscholes, fft, inversek2j, jmeint, jpeg, kmeans, sobel, and the geomean.]
Quality monitoring has no performance overhead (average case) while reducing errors.
Energy Savings
[Figure: energy reduction relative to CPU for NPU, Uniform, EMA, linearErrors, treeErrors, and Ideal across the benchmarks; several bars are clipped at 11x, 12x, and 24x.]
Energy savings are determined by the number of fixes.
False Positives
• A detected error that is actually not a large error
[Figure: false positive rate (%) for Uniform, EMA, linearErrors, and treeErrors across the benchmarks and geomean, ranging up to ~40%.]
Low false positives → no unnecessary recoveries.
Conclusion
• Quality is important: detect errors and re-execute to produce robust results
• Output-based methods are inexpensive
• Input-based methods are broadly applicable
• Rumba provides quality control in approximate computing and reduces average output error by 2x