Low Cost Control Flow Protection Using Abstract Control Signatures
Daya S Khudia and Scott Mahlke
University of Michigan, Electrical Engineering and Computer Science

Soft Errors
• Soft errors, also called single-event upsets (SEUs)
  – Occur because of high-energy particle strikes or electrical noise
• Parameters affecting soft error rates
  – Shrinking dimensions, voltage scaling
• 100 times increase from 180nm to 16nm (Borkar, Micro’05)
• One failure per day per chip at 16nm (Feng et al., ASPLOS’10)
(Image credit: Certichip)

Soft Error Detection
Techniques in order of increasing overhead, for data flow and control flow protection:
• ~100-200%: DMR, TMR (both data flow and control flow)
  – Dual/triple modular redundancy, for mission-critical reliability
• ~30-70%:
  – Data flow: instruction duplication (SWIFT, EDDI). Redundant execution in a single-threaded context; the compiler interleaves original and redundant instructions
  – Control flow: signature/assertion based (CFCSS, ACFC). Traditional software-based control flow protection by embedding signatures/assertions in basic blocks
• ~10-30%:
  – Data flow: instruction duplication + hardware symptoms (Shoestring, profile-based). Combine duplication and symptoms; comparable coverage, improved by using profiling
  – Control flow: target solution. Our target is a low-overhead control flow protection solution

Why Control Flow Errors?
• More than 70% of transient faults lead to control flow errors (Vahdatpour et al.)
• Faults in hardware components manifest as control flow errors
  – Program counter
  – Address circuitry
Figure: % of runs ending in correct vs. incorrect executions, for control flow target errors and for data flow errors.
• Errors in branch targets are 2.5x more likely than data flow errors to result in incorrect executions

Outline
• Background
• Software-based control flow checking
• Abstract Control Signatures (ACS)
• Experimental evaluation
• Conclusions

Control Flow Checking
• Two steps for control flow checking
  – Compute a signature at run time
  – Compare it with an expected correct value
• In case of an illegal control flow transfer, the signature check fails
Figure: each basic block (BB1, BB2) first updates the signature variable, then checks it.

Signature-Based Control Flow Checking
• Software-based control flow checking
  – Update the signature in each basic block
  – Check the signature in each basic block
• Can only handle errors in branch targets
  – Errors in branch directions (conditions) are not covered
Figure: BB1 has signature s1 and executes G = G xor d1, then checks G == s1?. BB2 has signature s2 and difference d2 = s1 xor s2; it executes G = G xor d2, then checks G == s2?. On a legal transfer from BB1 to BB2: G = s1 xor s1 xor s2 = s2.

Signature-Based Control Flow Checking (fan-in nodes)
• For branch fan-in nodes, extra updates (a dynamically adjusting signature) are required
Figure: BB2 (s2, d2 = s1 xor s2) has predecessors BB1 (s1) and BB3 (s3). BB1 sets D1 = 0 and BB3 sets D1 = s1 xor s3; BB2 executes G = G xor d2 and G = G xor D1, then checks G == s2?.
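A minimal Python sketch of the XOR signature scheme above, including the run-time adjusting signature D1 for the fan-in node. Assumptions (not from the slides): the CFG is BB1 -> BB2 and BB1 -> BB3 -> BB2, BB1 is the entry block, the signature values are arbitrary, and `SIG`, `DIFF`, and `run` are hypothetical names. It illustrates the CFCSS-style checks described on these slides, not the authors' implementation.

```python
# Minimal CFCSS-style signature checking for the fan-in example (illustrative sketch).
# Assumed CFG: BB1 -> BB2, BB1 -> BB3 -> BB2; signature values are arbitrary.
SIG = {"BB1": 0b001, "BB2": 0b010, "BB3": 0b100}

# Difference d_i = s_pred xor s_i, taken w.r.t. one designated predecessor.
# BB1 is the entry block, so d1 = s1 turns the initial G = 0 into s1.
DIFF = {
    "BB1": SIG["BB1"],
    "BB3": SIG["BB1"] ^ SIG["BB3"],
    "BB2": SIG["BB1"] ^ SIG["BB2"],  # designated predecessor of BB2 is BB1
}


def run(path):
    """Execute the instrumented blocks in order.

    Returns the first block whose signature check fails, or None if all pass.
    """
    g = 0  # global run-time signature G
    d = 0  # run-time adjusting signature D1 for the fan-in node BB2
    for block in path:
        g ^= DIFF[block]                 # G = G xor d_i
        if block == "BB1":
            d = 0                        # D1 = 0: correct when BB2 follows BB1 directly
        elif block == "BB3":
            d = SIG["BB1"] ^ SIG["BB3"]  # D1 = s1 xor s3
        elif block == "BB2":
            g ^= d                       # G = G xor D1
        if g != SIG[block]:              # G == s_i ?
            return block
    return None


print(run(["BB1", "BB2"]))         # None: legal path
print(run(["BB1", "BB3", "BB2"]))  # None: legal path through the fan-in node
print(run(["BB1", "BB2", "BB3"]))  # BB3: illegal transfer BB2 -> BB3 is detected
```

`run(["BB1", "BB2"])` also passes when the correct direction was BB1 -> BB3: a wrong branch direction still follows a legal CFG edge, so the signatures stay consistent and the error goes unnoticed.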

Abstract Control Signatures
• Sources of overhead
  – Signature updates
  – Signature checks
• Form regions
  – Abstract away the details of control flow inside a region
• Optimize signature updates
• Optimize checks
  – Check simple run-time properties
  – Insert checks at region boundaries
Figure: a CFCSS-instrumented CFG (BB1-BB5) in which every block has its own update and check (e.g., BB4: G = G xor d4, D2 = s2 xor s6, G == s4?; BB5: G = G xor d5, D3 = s4 xor s7, G == s5?) is grouped into regions; the blocks keep only a signature update, and a single signature check is placed at the region boundary.

Insight 1: Optimized updates
• Signature checking
  – Makes sure that control flow transfer took place from a legal predecessor
• Check counters (path length)
  – Makes sure that the proper number of BBs in the predecessor region were visited
Figure: region bb1-bb6 in which bb1 initializes C1 = 1, the other blocks execute C1 = C1 + 1, and exit checks compare C1 with the expected path length (e.g., C1 == 4?, C1 == 5?).

Insight 2: Optimized checks
• Sufficient to have a single check for a group of basic blocks
• Requirement on regions
  – The header block of a region should dominate all the BBs in that region (single entry point)
  – Nested loops should not be contained in a region
Figure: blocks bb1-bb4 and latch blocks bb_latch1, bb_latch2 grouped into Interval 1 and Interval 2.

Balancing Increments
• Naively inserting checks
  – Multiple counter value checks would be required at exits (e.g., C1 == 4 or 5?)
  – Insert extra increments along selected edges so that a single value is checked (e.g., C1 == 5?)
• Developed an algorithm (details are in the paper) to determine
  – Increment edges
  – Increment amounts
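A minimal Python sketch of the path-length counter with balanced increments. Assumptions: a hypothetical acyclic, single-entry region bb1-bb5 with one increment per block, and a simple longest-path balancing rule chosen for illustration; the paper's own algorithm for choosing increment edges and amounts is not reproduced here. It shows why the naive exit check needs two values and how one extra edge increment reduces it to a single check.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical single-entry, acyclic region (no nested loops); bb5 is the exit
# where the single counter check is placed.
SUCC = {
    "bb1": ["bb2", "bb4"],
    "bb2": ["bb3"],
    "bb3": ["bb5"],
    "bb4": ["bb5"],
    "bb5": [],
}
ENTRY, EXIT = "bb1", "bb5"


def predecessors(succ):
    preds = {b: [] for b in succ}
    for u, vs in succ.items():
        for v in vs:
            preds[v].append(u)
    return preds


def balance(succ, entry):
    """Longest-path balancing (an illustrative rule, not the paper's algorithm).

    dist[v] is the largest block count on any entry->v path. Giving edge u->v
    dist[v] - dist[u] - 1 extra increments makes the counter equal dist[v] on
    every legal path that reaches v, so the exit needs a single check.
    """
    preds = predecessors(succ)
    dist = {}
    for v in TopologicalSorter(preds).static_order():
        dist[v] = 1 if v == entry else max(dist[u] for u in preds[v]) + 1
    extra = {(u, v): dist[v] - dist[u] - 1 for u, vs in succ.items() for v in vs}
    return extra, dist


def run_counter(path, extra):
    """Counter C1 after executing `path`: +1 per block plus any edge increment."""
    c = 0
    for prev, block in zip([None] + path, path):
        c += extra.get((prev, block), 0)  # jumps along non-CFG edges get no increment
        c += 1
    return c


LEGAL = [["bb1", "bb2", "bb3", "bb5"], ["bb1", "bb4", "bb5"]]
print(sorted({run_counter(p, {}) for p in LEGAL}))  # [3, 4]: naive check needs "C1 == 3 or 4?"

extra, dist = balance(SUCC, ENTRY)
print(extra[("bb4", "bb5")], dist[EXIT])            # 1 4: one extra increment on bb4 -> bb5
print([run_counter(p, extra) for p in LEGAL])       # [4, 4]: single check "C1 == 4?"
print(run_counter(["bb1", "bb5"], extra))           # 2: an illegal jump bb1 -> bb5 fails the check
```

Because only a count is compared, an illegal path that happens to visit the same number of blocks would still pass; the counter trades some precision for far fewer updates and checks than per-block XOR signatures.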

Optimization for Loops
• Move checks out of the loop
• Insert increments
  – Such that the counter value is a power of two (facilitates a remainder operation instead of costly division)
Figure: a loop bb1-bb4 with preheader bbN (C1 = 0); the in-loop check C1 == 3? is replaced by a check after the loop, C1 % 4 == 0?, after one increment is raised to C1 = C1 + 2.

Handling Call and Return Insts
• Every function in the program is assigned a unique path length
• The global signature variable is
  – Updated before and inversely updated after a call
  – Inversely updated and updated inside the callee
Figure: the caller updates the signature variable with the call-specific length, executes call foo, applies the inverse update with the same length, and checks the signature variable; in foo, Entry_BB applies the inverse update and Ret_BB applies the update before return.

System Overview
Figure: runtime compilation (inserts signature updates and checks) above the operating system and physical hardware.
• Collect required program information
• Analyze program structure
• Insert signature updates and checks
• Trigger lightweight recovery based on
  – Selective symptoms (hardware exceptions)
  – Signature comparison failures

Evaluation Methodology
• Program analysis and signature updates/checks
  – Implemented as a compiler pass in the LLVM compiler
• SPECINT2K benchmarks
• Statistical fault injection (SFI) experiments
  – GEM5 simulator in ARM syscall emulation mode
  – Random (single) bit flip in a control flow target
  – Entire benchmarks simulated after fault injection
  – Log files analyzed for results classification
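A minimal Python sketch of the two mechanisms above. Assumptions: a legal loop iteration executes 3 blocks (matching the C1 == 3? check in the figure), the path lengths of `foo` and `bar` are hypothetical values, and "update" adds the call-specific length while "inverse update" subtracts it. All names (`loop_exit_ok`, `signature_after_call`, `PATH_LEN`) are illustrative, not the authors' code.

```python
# Part 1: loop check moved out of the loop (illustrative sketch).
PER_ITER = 3  # hypothetical: blocks executed per legal iteration


def next_power_of_two(n):
    p = 1
    while p < n:
        p <<= 1
    return p


PADDED = next_power_of_two(PER_ITER)  # 4
PAD = PADDED - PER_ITER               # extra increment per iteration, e.g. C1 = C1 + 2 in one block


def loop_exit_ok(blocks_per_iteration):
    """Single check after the loop instead of one check per iteration."""
    c = 0  # preheader: C1 = 0
    for executed in blocks_per_iteration:
        c += executed + PAD
    # Equivalent to c % PADDED == 0, computed with a mask because PADDED is a power of two
    return (c & (PADDED - 1)) == 0


# Part 2: call/return updates, assuming "update" adds and "inverse update" subtracts.
PATH_LEN = {"foo": 5, "bar": 9}  # hypothetical unique per-function path lengths


def signature_after_call(sig, callee, entered, returned_from):
    sig += PATH_LEN[callee]         # caller: update with call-specific length
    sig -= PATH_LEN[entered]        # Entry_BB of the function actually entered: inverse update
    sig += PATH_LEN[returned_from]  # Ret_BB of the function actually left: update
    sig -= PATH_LEN[callee]         # caller: inverse update after the call
    return sig                      # caller checks that this equals the value before the call


print(loop_exit_ok([3, 3, 3]))                       # True
print(loop_exit_ok([3, 2, 3]))                       # False: a skipped block is caught
print(signature_after_call(0, "foo", "foo", "foo"))  # 0: legal call and return
print(signature_after_call(0, "foo", "foo", "bar"))  # 4: control left through bar's Ret_BB
```

Checking only the remainder means a miscount that is a multiple of `PADDED` would pass; in exchange, the loop needs one mask-and-compare at its exit instead of a check in every iteration.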

Performance Overhead
Figure: runtime performance overhead (0-160%) of CFCSS, CFCSS_ivl, ACS, and ACS + calls_rets.
• The performance overhead is down from 75% to 11%

Fault Coverage
Figure: % of runs classified as Masked, CFDetects, HWDetects, Failures, and SDCs for CFCSS, CFCSS_ivl, ACS, and ACS + calls_rets on 164.gzip, 175.vpr, 176.gcc, 181.mcf, 186.crafty, 197.parser, 253.perl, 254.gap, 255.vortex, 256.bzip2, 300.twolf, and the average.
• On average, the fault coverage of ACS is comparable to CFCSS, with an almost 7x reduction in overhead

Fault Detection Latency
Figure: normalized detection latency (0.80-1.25) of CFCSS and ACS for faults detected within 2K, 10K, and 100K, per benchmark and on average.
• ACS fault detection latency is affected by a maximum of 5%

Conclusions
• We propose Abstract Control Signatures (ACS)
  – Signature checking at coarse grain
  – Simplified signature updates
• In comparison to a traditional signature-based scheme (CFCSS)
  – Reduces performance overhead from 75% down to 11%
  – Fault coverage is comparable
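The "almost 7x" reduction follows directly from the two average overheads reported above, 75% for CFCSS and 11% for ACS:

```latex
\frac{75\%}{11\%} \approx 6.8
```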

Fault Injection Outcome Classification
• Masked – No corruption in the program output
• CFDetects – Detected by control flow checking
• Covered by symptoms (HWDetects) – Produces a symptom, such as a page fault, within 2000 cycles of fault injection
• Failures – Fail status on program termination, or an infinite loop
• SDCs (Silent Data Corruptions) – Fault injections that result in user-visible corruptions
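A minimal Python sketch of how one fault-injection run could be mapped to the categories above. Assumptions: the `RunLog` fields are a hypothetical summary of a run's log file, the 2000-cycle window is treated as inclusive, and the precedence (detections first, then failures, then the output comparison) is not stated on the slide.

```python
from dataclasses import dataclass
from typing import Optional

SYMPTOM_WINDOW = 2000  # cycles after injection, per the HWDetects definition (inclusive: an assumption)


@dataclass
class RunLog:
    """Hypothetical summary of one fault-injection run's log (field names are assumptions)."""
    cf_check_failed: bool          # a control flow signature check fired
    symptom_cycle: Optional[int]   # cycles from injection to first hardware symptom, if any
    failed_or_hung: bool           # fail status on termination, or infinite loop
    output_matches_golden: bool    # program output identical to a fault-free run


def classify(run: RunLog) -> str:
    # Assumed precedence: detections first, then failures, then output comparison.
    if run.cf_check_failed:
        return "CFDetects"
    if run.symptom_cycle is not None and run.symptom_cycle <= SYMPTOM_WINDOW:
        return "HWDetects"
    if run.failed_or_hung:
        return "Failures"
    if run.output_matches_golden:
        return "Masked"
    return "SDCs"


print(classify(RunLog(False, 1500, True, False)))  # HWDetects: symptom inside the window
print(classify(RunLog(False, None, False, False)))  # SDCs: silent output corruption
```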