Low Cost Control Flow Protection Using Abstract Control Signatures Daya S Khudia and Scott Mahlke University of Michigan University of Michigan Electrical Engineering and Computer.


Low Cost Control Flow Protection
Using Abstract Control Signatures
Daya S Khudia and Scott Mahlke
University of Michigan
University of Michigan
Electrical Engineering and Computer Science
Soft Errors
• Soft errors, also called single-event upsets (SEUs)
– Occur because of
• High-energy particle strikes or electrical noise
• Parameters affecting soft error rates
– Shrinking dimensions, voltage scaling
• 100 times increase from 180nm to 16nm (Borkar, Micro’05). One failure per day per chip at 16nm (Feng et al., ASPLOS’10)
Image credit: Certichip
Soft Error Detection
[Figure: soft error detection techniques grouped by what they protect (data flow, control flow) and ordered by increasing overhead (~10-30%, ~30-70%, ~100-200%)]
• DMR, TMR (data flow and control flow)
– Traditional dual/triple modular redundancy (redundant execution)
– Mission-critical reliability
• Instruction duplication (SWIFT, EDDI)
– Software-based duplication in a single-threaded context
– Compiler interleaves original and redundant instructions
• Instruction duplication + hardware symptoms (Shoestring, profile-based)
– Combine duplication and symptoms
– Comparable coverage, improved by using profiling
• Signature/assertion based (CFCSS, ACFC)
– Control flow protection by embedding signatures/assertions in basic blocks
• Target solution
– Our target is a low-overhead control flow protection solution
Why Control Flow Errors?
• More than 70% of transient faults lead to control flow errors (Vahdatpour et al.)
• Faults in hardware components manifest as control flow errors
– Program counter
– Address circuitry
[Figure: % of runs (0% to 100%) ending in correct vs. incorrect executions, for control flow target errors and data flow errors]
Errors in branch targets are 2.5x more likely to result in incorrect executions
Outline
• Background
• Software-based control flow checking
• Abstract Control Signatures (ACS)
• Experimental evaluation
• Conclusions
Control Flow Checking
• Two steps for control flow checking
– Compute signature at runtime
– Compare with an expected correct value
• In case of illegal control flow transfer, the signature check fails
[Diagram: BB1 and BB2, each performing "update sig var" followed by "check sig var"]
Signature-Based Control Flow Checking
• Software-based control flow checking
– Update signature in each basic block
– Check signature in each basic block
• Can only handle errors in branch targets
– Errors in branch directions (conditions) are not covered
[Diagram: BB1 followed by BB2]
BB1 (signature s1, d1 = - - -): G = G xor d1; G == s1?
BB2 (signature s2, d2 = s1 xor s2): G = G xor d2; G == s2?
On the transfer from BB1 to BB2:
G = G xor d2
G = s1 xor s1 xor s2
G = s2
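The XOR update and check above can be tried out with a minimal Python sketch. The signature values and the extra block BB3 (used only as the source of an illegal jump) are hypothetical; only the relations d2 = s1 xor s2 and the check G == s2 come from the slide.

```python
# Hypothetical signatures; any distinct integers would do.
SIG = {"BB1": 0b0001, "BB2": 0b0010, "BB3": 0b0100}

# d[b] = s[pred] xor s[b] for the single legal predecessor of b.
DIFF = {"BB2": SIG["BB1"] ^ SIG["BB2"]}


def enter_block(G, block):
    """Apply the signature update for `block` and report whether the check passes."""
    G ^= DIFF[block]
    return G, G == SIG[block]


# Legal transfer BB1 -> BB2: G = s1 xor (s1 xor s2) = s2, so the check passes.
G, ok = enter_block(SIG["BB1"], "BB2")

# Illegal transfer BB3 -> BB2 (BB3 is a hypothetical block): G = s3 xor s1 xor s2,
# which differs from s2, so the check fails.
G_bad, ok_bad = enter_block(SIG["BB3"], "BB2")
```

Because the update depends only on which block is entered, a wrong branch direction that still lands on a legal successor leaves the check satisfied, which matches the limitation noted above.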
Signature-Based Control Flow Checking
For branch fan-in nodes
• Extra updates are required
• Dynamically adjusting signatures are required
[Diagram: BB2 is a fan-in node with predecessors BB1 and BB3]
BB1 (signature s1, d1 = - - -): G = G xor d1; G = G xor D2; G == s1?; D1 = 0
BB3 (signature s3, d3 = s- xor s3): G = G xor d3; G == s3?; D1 = s1 xor s3
BB2 (signature s2, d2 = s1 xor s2): G = G xor d2; G = G xor D1; G == s2?
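A small sketch may help show why the adjusting signature D1 is needed at a fan-in node. It assumes hypothetical signature values and a hypothetical block BB4 that is not a legal predecessor of BB2; the formulas d2 = s1 xor s2, D1 = 0 (set in BB1) and D1 = s1 xor s3 (set in BB3) are the ones in the diagram.

```python
# Hypothetical signatures; BB4 is a made-up block that may not jump to BB2.
SIG = {"BB1": 0b001, "BB2": 0b010, "BB3": 0b100, "BB4": 0b111}

d2 = SIG["BB1"] ^ SIG["BB2"]  # static difference, computed against BB1 only

# Adjusting signature each legal predecessor leaves behind before branching.
D1_AT_EXIT = {"BB1": 0, "BB3": SIG["BB1"] ^ SIG["BB3"]}


def enter_bb2(pred):
    """Simulate arriving at BB2 from `pred` and return the result of G == s2."""
    G = SIG[pred]                  # G holds the predecessor's signature
    D1 = D1_AT_EXIT.get(pred, 0)   # assumption: D1 is 0 for an illegal source
    G ^= d2                        # regular update
    G ^= D1                        # extra update for the fan-in node
    return G == SIG["BB2"]


from_bb1 = enter_bb2("BB1")  # s1 ^ (s1 ^ s2) ^ 0 = s2
from_bb3 = enter_bb2("BB3")  # s3 ^ (s1 ^ s2) ^ (s1 ^ s3) = s2
from_bb4 = enter_bb2("BB4")  # no adjustment makes this equal s2
```

Every fan-in node costs an extra XOR plus an assignment in each predecessor, which is the overhead the following slides aim to reduce.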
Abstract Control Signatures
• Sources of overhead
– Signature updates
– Signature checks
• Form regions
– Abstract away the details of control flow inside a region
[Diagram: CFG of BB1 to BB5, each block carrying signature updates and a check]
BB1: G = G xor d1; G = G xor D2; D1 = 0; G == s1?
BB3: G = G xor d3; D1 = s1 xor s3; G == s3?
BB2: G = G xor d2; G = G xor D1; G == s2?
BB4: G = G xor d4; D2 = s2 xor s6; G == s4?
BB5: G = G xor d5; D3 = s4 xor s7; G == s5?
Abstract Control Signatures
• Sources of overhead
– Signature updates
– Signature checks
• Form regions
– Abstract away the details of control flow inside a region
• Optimize signature updates
• Optimize checks
• Check simple run-time properties
• Insert checks at region boundaries
[Diagram: the same CFG of BB1 to BB5, with the per-block XOR updates relabeled "Sig update" and the per-block checks replaced by a "Sig check"]
Insight 1: Optimized updates
• Signature checking
– Make sure that control flow transfer took place from a legal predecessor
• Check counters (path length)
– Makes sure that proper number of BBs in predecessor region were visited
[Diagram: CFG of bb1 to bb6; the entry block sets C1 = 1, later blocks execute C1 = C1 + 1, and the checks are C1 == 5? and C1 == 4?]
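The counter idea can be illustrated with a short sketch. The block names and the two example paths are hypothetical; the sketch only assumes what the slide shows, namely that the first block sets C1 = 1, every later block adds 1, and a check compares C1 with the expected path length.

```python
def path_length_check(path, expected_length):
    """Walk the blocks on `path` and return the result of the counter check."""
    C1 = 0
    for index, _block in enumerate(path):
        # The region's entry block initializes the counter; others increment it.
        C1 = 1 if index == 0 else C1 + 1
    return C1 == expected_length


# A legal path through four blocks satisfies C1 == 4.
legal = path_length_check(["bb1", "bb2", "bb4", "bb6"], 4)

# A faulty jump that skips bb2 leaves C1 at 3, so the same check fails.
skipped = path_length_check(["bb1", "bb4", "bb6"], 4)
```

An increment is cheaper than the XOR-plus-adjustment sequence, at the price of checking a property of the path (its length) rather than the identity of each predecessor.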
Insight 2: Optimized checks
• Sufficient to have a single check for a group of basic blocks
• Requirement on regions
– The header block of a region should dominate all the BBs in that region (single entry point)
– Nested loops should not be contained in a region
[Diagram: CFG of bb1, bb2, bb3, bb4, bb_latch1 and bb_latch2, partitioned into Interval 1 and Interval 2]
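The single-entry requirement can be checked with a standard iterative dominator computation. The sketch below is not taken from the talk: the CFG, the function names and the region choices are hypothetical, and it covers only the dominance requirement, not the nested-loop one.

```python
def dominators(cfg, entry):
    """Return, for every block, the set of blocks that dominate it.

    cfg maps each block to its list of successors and must list every block as a key.
    """
    nodes = set(cfg)
    preds = {n: set() for n in nodes}
    for n, succs in cfg.items():
        for s in succs:
            preds[s].add(n)
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:  # iterate to a fixed point
        changed = False
        for n in nodes - {entry}:
            incoming = [dom[p] for p in preds[n]]
            new = {n} | (set.intersection(*incoming) if incoming else set())
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


def header_dominates_region(cfg, entry, header, region):
    """True if `header` dominates every block in `region`."""
    dom = dominators(cfg, entry)
    return all(header in dom[block] for block in region)


# Hypothetical diamond-shaped CFG.
CFG = {"bb1": ["bb2", "bb3"], "bb2": ["bb4"], "bb3": ["bb4"], "bb4": []}

whole = header_dominates_region(CFG, "bb1", "bb1", {"bb1", "bb2", "bb3", "bb4"})
partial = header_dominates_region(CFG, "bb1", "bb2", {"bb2", "bb4"})
```

In the second call bb4 can be reached through bb3 without passing bb2, so {bb2, bb4} has two entry points and would not qualify as a region under the requirement above.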
Balancing Increments
• Naively inserting checks
– Multiple counter value checks would be required at exits
• Insert extra increment along these edges
• Developed an algorithm to get (details are in paper)
– increment edges
– increment amounts
[Diagram: CFG of bb1 to bb5 with C1 = 1 in bb1 and C1 = C1 + 1 in the later blocks. Before balancing, the exit checks are C1 == 4 or 5? and C1 == 3 or 4?. After an extra C1 = C1 + 1 is inserted along the marked edges, an exit check tests a single value such as C1 == 5?]
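One simple way to obtain increment edges and amounts is a longest-path computation over an acyclic region. This is an illustrative stand-in, not the algorithm from the paper (which the slide does not describe); the CFG and all names are hypothetical.

```python
def balance_increments(cfg, order):
    """Pick extra per-edge increments so every path reaches a block with the same C1.

    cfg maps each block to its successors (acyclic, all blocks reachable);
    order is a topological order with the entry block first.
    """
    longest = {order[0]: 1}  # the entry block sets C1 = 1
    for u in order:
        for v in cfg[u]:
            longest[v] = max(longest.get(v, 0), longest[u] + 1)
    extra = {}
    for u in order:
        for v in cfg[u]:
            amount = longest[v] - longest[u] - 1
            if amount > 0:
                extra[(u, v)] = amount  # edge needing an extra increment
    return longest, extra


def counter_along(path, extra):
    """Value of C1 after following `path` with the extra edge increments applied."""
    C1 = 1
    for u, v in zip(path, path[1:]):
        C1 += extra.get((u, v), 0) + 1  # edge increment, then the block's own +1
    return C1


# Hypothetical region: a short path and a long path both reach bb5.
CFG = {"bb1": ["bb2", "bb3"], "bb2": ["bb4"], "bb3": ["bb5"], "bb4": ["bb5"], "bb5": []}
ORDER = ["bb1", "bb2", "bb3", "bb4", "bb5"]

longest, extra = balance_increments(CFG, ORDER)
short_path = counter_along(["bb1", "bb3", "bb5"], extra)
long_path = counter_along(["bb1", "bb2", "bb4", "bb5"], extra)
```

With the short path padded, both paths arrive at bb5 with the same counter value, so one equality check replaces the "4 or 5"-style checks.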
Optimization for Loops
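[Diagram: a loop over bb1, bb2, bb3 and bb4, entered from bbN, shown before and after the optimization. Labels recoverable from the slide: C1 = 0 in bbN, C1 = 1 in bb1, C1 = C1 + 1 in the loop blocks, one increment changed to C1 = C1 + 2, an in-loop check C1 == 3?, and a check C1 % 4 == 0?]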
• Move checks out of the loop
• Insert increments
• Such that counter value is a power of two (facilitates
remainder operation instead of costly division)
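The benefit of a power-of-two path length can be sketched as follows. The loop shape (three incrementing blocks, one of them padded by an extra increment so an iteration adds 4) is a hypothetical example; the point is that the remainder test placed after the loop reduces to a bit mask.

```python
PER_ITERATION = 4  # hypothetical: 3 block increments + 1 padding increment


def loop_check(iterations, skip_block_in=None):
    """Run the loop and return the result of the single check placed after it."""
    C1 = 0
    for i in range(iterations):
        increments = [1, 1, 2]  # the last block carries the padding increment
        if i == skip_block_in:
            increments = increments[1:]  # model a faulty jump that skips a block
        C1 += sum(increments)
    # For a power of two, C1 % PER_ITERATION equals C1 & (PER_ITERATION - 1).
    return C1 & (PER_ITERATION - 1) == 0


clean = loop_check(5)                    # C1 = 20, remainder 0
faulty = loop_check(5, skip_block_in=2)  # C1 = 19, remainder 3
```

The check runs once after the loop instead of once per iteration, and the mask avoids a division.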
Handling Call and Return Insts
• Every function in the program is assigned a unique path length
• Global signature variable is
– Updated before and inversely updated after call
– Inversely updated and updated inside callee
[Diagram: caller and callee foo]
Caller:
update sig var with call specific length
call foo;
inverse update with call specific length
check sig var
foo:
Entry_BB: inverse update sig var
Ret_BB: update sig var
return;
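The update and inverse-update pairing around a call can be sketched with plain addition and subtraction. Everything concrete here is an assumption for illustration: the numeric lengths, the use of + and - as the update and its inverse, and the extra in-callee comparison (the slide shows only the caller-side check).

```python
def simulate_call(call_length, callee_length, return_site_length, base=5):
    """Follow the signature variable through one call and return.

    call_length: value the caller adds before the call.
    callee_length: value the entered function subtracts on entry and adds on return.
    return_site_length: value subtracted at the site control returns to.
    Returns (signature consistent inside callee, caller check passes).
    """
    sig = base
    sig += call_length              # caller: update before the call
    sig -= callee_length            # Entry_BB: inverse update
    inside_ok = sig == base         # assumption: a check inside the callee
    sig += callee_length            # Ret_BB: update
    sig -= return_site_length       # caller: inverse update after the call
    return inside_ok, sig == base   # caller: check sig var


normal = simulate_call(7, 7, 7)         # matching call, callee and return site
wrong_callee = simulate_call(7, 11, 7)  # call lands in a different function
wrong_return = simulate_call(7, 7, 11)  # return lands at a different call site
```

In this model a wrong callee is visible only to a check inside the callee, while a return to the wrong site is caught by the caller-side check.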
System Overview
Compilation
• Collect required program information
• Analyze program structure
• Insert signature updates and checks
Runtime
• Trigger lightweight recovery based on
– selective symptoms (hardware exceptions)
– signature comparison failures
[Diagram: compilation step "Insert signature updates and checks" feeding the runtime stack of Operating System and Physical Hardware]
Evaluation Methodology
• Program analysis and signature updates/checks
– Implemented as compiler pass in the LLVM compiler
• SPECINT2K Benchmarks
• Statistical fault injection (SFI) experiments
– GEM5 simulator in ARM syscall emulation mode
• Random (single) bit flip in control flow target
– Simulated entire benchmarks after fault injection
– Log files analyzed for results classification
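The fault model, a random single bit flip in a control flow target, can be sketched in a few lines. The 32-bit width, the function name and the example address are assumptions; the slide does not say how the injection is implemented inside the simulator.

```python
import random


def flip_random_bit(target, width=32, rng=random):
    """Return (faulty_target, bit_index) with exactly one bit of `target` flipped."""
    bit = rng.randrange(width)
    return target ^ (1 << bit), bit


# Example with a seeded generator so the experiment is repeatable.
original = 0x00008000  # hypothetical branch target address
faulty, bit = flip_random_bit(original, rng=random.Random(0))
```

Seeding the generator per experiment makes it possible to re-run a specific injection when a log file needs a closer look.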
Performance Overhead
[Figure: performance overhead (runtime), axis from 0% to 160%, for CFCSS, CFCSS_ivl, ACS, and ACS + calls_rets]
The performance overhead is down from 75% to 11%
Fault Coverage
[Figure: % of runs (0% to 100%) classified as Masked, CFDetects, HWDetects, Failures, and SDCs for CFCSS, CFCSS_ivl, ACS, and ACS + calls_rets on 164.gzip, 175.vpr, 176.gcc, 181.mcf, 186.crafty, 197.parser, 253.perl, 254.gap, 255.vortex, 256.bzip2, 300.twolf, and the average]
On average, fault coverage of ACS is comparable to CFCSS with almost 7x reduction in overhead
Fault Detection Latency
[Figure: normalized detection latency (axis from 0.80 to 1.25) of CFCSS and ACS on 164.gzip, 175.vpr, 176.gcc, 181.mcf, 186.crafty, 197.parser, 253.perl, 254.gap, 255.vortex, 256.bzip2, 300.twolf, and the average; series WithIn2K, Within10K, WithIn100K]
Fault detection latency is affected by a maximum of 5%
Conclusions
• We propose Abstract Control Signatures (ACS)
– Signature checking at coarse-grain
– Simplified signature updates
• In comparison to a traditional signature based
scheme (CFCSS)
– Reduces performance overhead from 75% down to 11%
– Fault coverage is comparable
Fault Injection Outcome Classification
• Masked
– No corruption in the program output
• CFDetects
– Detected by control flow checking
• Covered by symptoms (HWDetects)
– Produces a symptom such as a page fault within 2000 cycles of fault injection
• Failures
– Fail status on program termination, or an infinite loop
• SDCs (Silent Data Corruptions)
– Fault injections that result in user-visible corruptions
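The categories above can be expressed as a small classifier over the facts extracted from a run's log. The field names and the order in which the categories are tested are assumptions made for illustration; the slides do not specify how overlapping cases are resolved.

```python
def classify(cf_check_fired, symptom_cycles, failed_or_hung, output_correct,
             symptom_window=2000):
    """Map one fault-injection run to an outcome category.

    symptom_cycles: cycles between injection and a hardware symptom, or None.
    """
    if cf_check_fired:
        return "CFDetects"
    if symptom_cycles is not None and symptom_cycles <= symptom_window:
        return "HWDetects"
    if failed_or_hung:
        return "Failures"
    return "Masked" if output_correct else "SDCs"


examples = [
    classify(True, None, False, True),    # control flow check fired
    classify(False, 150, False, False),   # page fault 150 cycles after injection
    classify(False, None, True, False),   # crash or infinite loop
    classify(False, None, False, True),   # output unaffected
    classify(False, None, False, False),  # silent output corruption
]
```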