Rumba: An Online Quality Management System for Approximate Computing
Daya S Khudia, Babak Zamirai, Mehrzad Samadi and Scott Mahlke
June 17, 2015
Computer Engineering Laboratory, University of Michigan
Approximate Computing
[Figure: Pareto frontier of design points, trading accuracy for performance; accepting ~10% lower accuracy can buy up to 10x speedup and energy reduction. [Esmaeilzadeh, CACM'15]]
• Soft applications: image processing, data analytics, media applications, computer vision, robotics
• 100% accuracy is not always required

Quality is Important
[Figure: the same image rendered at 100%, 90%, and 80% accuracy]
• Challenge: building acceptable systems out of inexact hardware/software components

Approximation Challenge - 1
• Large errors in output elements are critical
[Figure: percentage of output elements by error class; most elements have no or low error, while a small fraction are high-error. An output where 100% of the pixels have 10% error can look acceptable, but one where 10% of the pixels have 100% error does not.]
• Requirement: large errors should be detected; small errors can be ignored

Approximation Challenge - 2
• Output quality is input dependent
[Figure: output error (%) per input image, varying between 0 and ~25% across ~800 images]
• Requirement: online detection of errors

Approximation Challenge - 3
• Measuring output quality is expensive
  – Usually done by sampling over time
[Figure: output quality over time against a target band (target +/- delta); periodic sampling checks the quality only occasionally]
• Requirement: inexpensive continuous monitoring

Rumba: Solution Overview
[Diagram: the application and its inputs run on an approximate accelerator; Rumba detects large errors in the accelerator's results and the CPU recovers them, yielding robust approximate results]
• Rumba addresses the approximation challenges by being:
  – Lightweight (detection and recovery)
  – Continuous and online
  – Tunable with quality feedback

Inexact Hardware: Neural Processing Unit
• Offline:
  – Find approximable code in the program
  – Train a neural network to mimic it
  – Find a suitable number of layers and neurons in each layer
  – Get the neural network weights
• Runtime execution:
  – Transfer the accelerator configuration
  – Exchange inputs/outputs with the accelerator
• Inexact, but 2.3x faster and 3x more energy efficient [Esmaeilzadeh, Micro'12]
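The offline phase above can be sketched as follows. This is a minimal illustration, not the paper's implementation: plain NumPy stands in for the PyBrain setup used in the evaluation, and the target function, the one-hidden-layer topology with 8 neurons, and the learning rate are all assumptions made for the example.

```python
# Illustrative sketch of the NPU offline phase: train a small neural
# network to mimic an approximable function, then package its weights
# as the "accelerator configuration". Target function, topology, and
# learning rate are assumptions for the example.
import numpy as np

rng = np.random.default_rng(0)

def target(x):
    # Stand-in for an approximable region of code.
    return np.sin(x)

# Training set: sampled inputs and exact outputs of the target code.
X = rng.uniform(-np.pi, np.pi, size=(1024, 1))
Y = target(X)

# One hidden layer with 8 tanh neurons (the paper searches over the
# number of layers and neurons; this topology is fixed for brevity).
W1 = rng.normal(0.0, 0.5, size=(1, 8)); b1 = np.zeros(8)
W2 = rng.normal(0.0, 0.5, size=(8, 1)); b2 = np.zeros(1)

def predict(X):
    return np.tanh(X @ W1 + b1) @ W2 + b2

mse0 = float(np.mean((predict(X) - Y) ** 2))  # error before training

lr = 0.05
for _ in range(2000):            # plain batch gradient descent
    H = np.tanh(X @ W1 + b1)     # hidden activations
    E = (H @ W2 + b2) - Y        # prediction error
    gW2 = H.T @ E / len(X)       # backpropagated gradients
    gb2 = E.mean(axis=0)
    dH = (E @ W2.T) * (1.0 - H ** 2)
    gW1 = X.T @ dH / len(X)
    gb1 = dH.mean(axis=0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

mse = float(np.mean((predict(X) - Y) ** 2))   # error after training
config = {"W1": W1, "b1": b1, "W2": W2, "b2": b2}  # shipped to the NPU
```

At runtime, `config` would be transferred to the accelerator, which then evaluates the network in place of the original code, inexactly but cheaply.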
Rumba: Design
[Diagram: the CPU feeds an input queue to the approximate accelerator; Rumba detection checks the accelerator's outputs, an online tuner adjusts detection against the user specification via a config queue, flagged elements enter a recovery queue for Rumba recovery on the CPU, and an output merger combines recovered and approximate results into the output queue]

Detection Contributions
• Output-based methods
  – Calculate errors by observing the approximate output
• Input-based methods
  – Predict errors based on the current inputs

Output-Based Methods
• Example: gamma correction on images
• Pixels are read row-wise
  – They exhibit temporal similarity
  – Drastic changes occur only at edges and endpoints
[Flowchart: inputs are approximated on the accelerator; if an output is temporally similar to its predecessors it is kept as useful output, otherwise the accelerator output is discarded and the element is recomputed]

Output-Based Method: Using Anomalies
• Compare each element with the history of previously computed elements
[Figure: output elements over time compared against their history; occasional false positives are flagged]
• The detection system exploits temporal similarity in output elements
  – Many applications exhibit temporal similarity

Lightweight Detection
• Maintain a moving average
• Compare the accelerator output with the moving average
• Exponential Moving Average (EMA):
    EMA = (e * α) + (PreviousEMA * (1 - α))
  where α is the smoothing factor and e is the current element
• Limitation: what if there is no temporal similarity?
• Solution: input-based methods

Observation: Input-Dependent Error
[Figure: inverse kinematics, error plotted over the (input1, input2) plane; only specific input regions show increasing error]

Input-Based Methods
[Flowchart: the error is predicted from the current inputs; if the predicted error is low, the element is approximated on the accelerator and kept as useful output; if it is high, the accelerator output is discarded and the element is recomputed]
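The EMA-based lightweight detection described above can be sketched as follows. It is a minimal illustration: the smoothing factor, the anomaly threshold, and the choice to leave the EMA unchanged on a flagged element (pending exact recomputation) are assumptions, not details from the paper.

```python
# Sketch of output-based detection: flag an accelerator output as a
# likely large error when it deviates from an exponential moving
# average (EMA) of the recent, trusted outputs. alpha and threshold
# are illustrative values.
def make_ema_detector(alpha=0.3, threshold=0.5):
    state = {"ema": None}

    def is_anomalous(e):
        if state["ema"] is None:
            state["ema"] = e               # seed the history
            return False
        if abs(e - state["ema"]) > threshold:
            return True                    # flagged: CPU recomputes;
                                           # keep the trusted history
        # EMA = (e * alpha) + (PreviousEMA * (1 - alpha))
        state["ema"] = e * alpha + state["ema"] * (1 - alpha)
        return False

    return is_anomalous

detect = make_ema_detector()
stream = [1.0, 1.1, 0.9, 1.0, 5.0, 1.0]    # 5.0 breaks temporal similarity
flags = [detect(e) for e in stream]
# flags == [False, False, False, False, True, False]
```

Only the outlier is flagged; the small fluctuations around 1.0 pass, which is exactly the "detect large errors, ignore small ones" requirement from Challenge 1.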
• Error prediction need not be very accurate
  – We only need to know whether the error is high or not
• Prediction should be low-cost
  – So that the gains of approximation are not nullified

Error Prediction: Decision Tree
[Figure: a decision tree over the inputs; the root tests X[0] <= .01, leading either to a test of X[1] <= .15 with leaves Error = .19 and Error = .40, or to a test of X[0] <= .50 with leaves Error = .01 and Error = .21]
• Cost depends on the number of levels
  – More levels give more accuracy at higher cost
• Best configuration is found by limited exhaustive search
  – Tree depth is limited to a maximum of 7 levels

Training Error Predictors
[Diagram: a first training set (train1 inputs/outputs) trains the NPU, producing the NPU model; a second set (train2 inputs) is run through both the NPU model and the exact model, the approximate and exact outputs are compared to calculate errors, and those errors train the error predictor models]

Recovery: Re-computation
• Recovery by re-computation
  – Possible because the computations are data parallel
• The CPU re-computes elements with large errors
  – The accelerator and the CPU work in tandem
[Diagram: Rumba detection routes flagged accelerator outputs into a recovery queue, which the CPU drains for Rumba recovery before merging into the output queue]

Experimental Setup / Benchmarks
• Neural Processing Unit (NPU) accelerator
  – Modeled using the PyBrain artificial neural network library
• Energy modeling
  – GEM5 + McPAT for the applications, plus Cacti

  Application    Domain              Error metric
  ------------   ------------------  ----------------------
  blackscholes   Financial analysis  Mean Relative Error
  fft            Signal processing   Mean Relative Error
  inversek2j     Robotics            Mean Relative Error
  jmeint         3D gaming           Number of Mismatches
  jpeg           Compression         Mean Pixel Difference
  kmeans         Machine learning    Mean Output Difference
  sobel          Image processing    Mean Pixel Difference
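The input-based path and its recovery can be sketched as follows, hard-coding the small decision tree shown on the Error Prediction slide. The 0.2 recomputation threshold and the function names (`predict_error`, `rumba_element`) are illustrative assumptions, not names from the paper.

```python
# Sketch of input-based detection with recovery: a decision tree
# predicts the approximation error from the inputs alone, and elements
# predicted to be high-error are re-executed exactly on the CPU.
def predict_error(x):
    # Tree reconstructed from the slide: internal nodes test the
    # inputs, leaves hold the expected approximation error.
    if x[0] <= 0.01:
        return 0.19 if x[1] <= 0.15 else 0.40
    else:
        return 0.01 if x[0] <= 0.50 else 0.21

def rumba_element(x, approx_fn, exact_fn, threshold=0.2):
    # High predicted error: discard/skip the accelerator output and
    # recompute exactly; otherwise keep the approximate result.
    if predict_error(x) > threshold:
        return exact_fn(x)
    return approx_fn(x)

# Example: inputs in a high-error region fall back to the CPU.
out = rumba_element([0.0, 0.2], lambda x: "approx", lambda x: "exact")
# out == "exact", since the tree predicts an error of 0.40 here
```

Because the prediction only has to separate "high error" from "not high error", even a shallow tree like this one is useful, and its handful of comparisons keeps the check cheap enough not to erase the accelerator's gains.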
Quality vs. Recomputations (inversek2j)
[Figure: average normalized error (%) vs. recomputed elements (%) for the Ideal, Uniform, EMA, LinErrors, and TreeErrors schemes, compared against the unchecked accelerator error; annotated points at 15% and 62% recomputation]

Performance Improvements
[Figure: speedup with respect to the CPU and error reduction with respect to no checks for NPU, Ideal, Uniform, EMA, linearErrors, and treeErrors on blackscholes, fft, inversek2j, jmeint, jpeg, kmeans, sobel, and their geomean]
• Quality monitoring has no performance overhead in the average case while reducing errors

Energy Savings
[Figure: energy reduction relative to the CPU for NPU, Ideal, Uniform, EMA, linearErrors, and treeErrors on the same benchmarks and their geomean]
• Energy savings are determined by the number of fixes

False Positives
• A false positive is a detected error that is not actually a large error
[Figure: false positive rate (%) for Uniform, EMA, linearErrors, and treeErrors on the benchmarks and their geomean, ranging up to ~40%]
• Low false positives mean no unnecessary recoveries

Conclusion
• Quality is important: detect errors, re-execute, and produce robust results
• Output-based methods are inexpensive
• Input-based methods are broadly applicable
• Rumba provides quality control in approximate computing and reduces average output error by 2x