How much does Exception Handling cost, really?

Kevin Frei
Visual C++ Code Generation & Tools
http://blogs.msdn.com/freik
Reasons for this talk
(too many assumptions)

Pros of EH I’ve heard



More centralized error
handling & recovery
More robust code
More readable code

Cons of EH I’ve heard



Can result in people not
thinking about error
conditions
Can make error recovery
difficult (must put handler
in the “right” place)
Enables abuse of
exceptions
Summary of the previous Pros & Cons

They can all be dealt with




Coding Convention enforcement
Code Reviews
Good initial architecture
Consistent API designs
#1 reason I hear to not use EH:

“Exception handling makes my code too slow”


May be true, but may also be masking a more serious
problem
Some Facts:



EH performance cost is dependent on the runtime, CPU
architecture, and ABI/OS specifics.
You can’t simply examine source code to determine
performance impact.
Deciding whether to use EH should depend on the team,
the libraries you’re using, and a myriad of other issues.
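Since source inspection cannot settle the question, measuring is the practical route. The following is a minimal sketch of such a measurement, not a rigorous benchmark: the workloads `step_with_code` and `step_with_throw`, the failure rate, and the `Result` type are all hypothetical, and the timings it reports will vary with compiler, options, CPU, and ABI.

```cpp
#include <cassert>
#include <chrono>
#include <stdexcept>

// Hypothetical workload: reports a rare failure through a return code.
int step_with_code(int i) { return (i % 1000 == 0) ? -1 : 0; }

// Hypothetical workload: reports the same rare failure by throwing.
void step_with_throw(int i) {
    if (i % 1000 == 0) throw std::runtime_error("rare failure");
}

struct Result {
    int failures;      // number of failures observed
    long long micros;  // elapsed wall-clock time in microseconds
};

// Runs body(i) for i in [0, iterations) and times the whole loop.
template <typename Body>
Result timed(int iterations, Body body) {
    assert(iterations > 0);
    auto start = std::chrono::steady_clock::now();
    int failures = 0;
    for (int i = 0; i < iterations; ++i) failures += body(i);
    auto stop = std::chrono::steady_clock::now();
    auto us = std::chrono::duration_cast<std::chrono::microseconds>(stop - start);
    return {failures, us.count()};
}

Result run_with_codes(int n) {
    return timed(n, [](int i) -> int { return step_with_code(i) != 0 ? 1 : 0; });
}

Result run_with_exceptions(int n) {
    return timed(n, [](int i) -> int {
        try { step_with_throw(i); return 0; }
        catch (const std::runtime_error&) { return 1; }
    });
}
```

Comparing `run_with_codes(n).micros` with `run_with_exceptions(n).micros` at several failure rates gives a first impression; a real study would also need warm-up runs and a look at the generated code.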
Classes of Code Quality impact

Usage Penalty [EH tax]
  General overhead of a function with any EH construct
  Cost of entering a protected region
    __try{}, try{}, C++ object with a destructor
  Cleanup costs
    __finally invocation
    C++ object destructors
  Optimization constraints
Cost of actually handling an exception
  If you’re really concerned about this, you’re probably abusing exceptions.
EH tax for Structured Exception Handling

X86


All functions with SEH contain a complex prolog &
epilog
X64

No required cost to the function itself
EH tax for C++ exception handling

X86


All functions with C++ EH contain a complex
prolog & epilog
X64

1 additional DWORD allocated on stack, initialized
to -2


never again used in the function’s code
It’s used by the C++ runtime in the event of an exception
being thrown or caught.
Protected Region entry & exit costs

X86

Entry & exit from any protected region requires a
1 or 4 byte constant value written to the stack



/EHs can reduce this cost
/EHa may be required by your code base, though
X64

If an entry or an exit is preceded by a call, there is
a single byte NOP to properly identify region
boundaries

Entry preceded by a call is pretty common for C++ EH
(constructors)
Non-exception cleanup costs

X86

SEH: __finally clause is called [current implementation, not required]
  call/ret overhead
  Some other minor register allocation issues
C++EH: Destructor invoked inline [C++ standard]
  Destructor can be inlined, based on compiler (& user) decision



X64

SEH: __finally clause inlined [zero overhead]


[again, current implementation, not required]
C++EH: same as x86
Optimization Constraints Disclaimer

Consider the complete alternative solution!



HRESULT checking is messy, and error prone
The goto solution to handle termination can
result in pessimized dataflow
Most optimizations that must be constrained for EH
should also be constrained for equivalent
implementations that don’t use EH.
Optimization constraints

Mandatory optimization constraints



Limitations required by the language standard
ABI specific limitations
Current Implementation constraints

I’ll focus on UTC (current optimizer) in VC8


Code base from VC5 origins.
Many constraints that exist in earlier versions
have been removed
Mandatory optimization constraints:
Language specific limitations

The C++ language standard does not specify
anything about non-C++ throw exceptions!

The C language standard does not specify
anything about exceptions at all, really.

[I know nothing about C99]
Language specific limitations: C++

Flow from try blocks to catch blocks (and out):
  Results in additional flow edges at call sites that may throw exceptions
    Variable values must be updated accordingly
    Slightly less constant propagation, common subexpression elimination, dead store elimination, etc.
/EHs – assume only the C++ throw statement can cause an exception
  Prior to VC8.0, you could compile /EHs, and even with an AV, most destructors would be invoked.
  For VC8.0 /EHs:
    If you throw a C++ exception, destructors will be run.
    If any other exception occurs, no destructors will run.
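The C++ half of that rule can be observed directly. The sketch below uses only standard C++; the names `Guard`, `may_throw`, `run`, and the global `g_log` are hypothetical and exist only to record the order of events. It does not attempt to show the non-C++ case (an AV under /EHs), which is specific to the Microsoft toolchain.

```cpp
#include <stdexcept>
#include <string>

// Records the order of events so the unwinding sequence is visible.
std::string g_log;

struct Guard {
    Guard()  { g_log += "init;"; }
    ~Guard() { g_log += "done;"; }  // runs during stack unwinding
};

void may_throw(bool fail) {
    g_log += "work;";
    if (fail) throw std::runtime_error("failure");
}

// Returns true if an exception reached the handler.
bool run(bool fail) {
    g_log.clear();
    try {
        Guard guard;
        may_throw(fail);
    } catch (const std::runtime_error&) {
        g_log += "caught;";
        return true;
    }
    return false;
}
```

After `run(true)` the log shows the destructor entry before the handler entry: the object is destroyed while the stack unwinds, before the catch block body executes.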
Language specific limitations: /EHa


/EHa – all exceptions should be considered
when destroying C++ objects
Results in far more potential flow from a try
block to a catch block


Less stack packing (no stack pack prior to VC8)
Much less constant propagation, common
subexpression elimination, etc…
Quick /EHc description


Only has impact with /EHs
Tells the compiler that any extern “C”
function will not throw any C++ exceptions



Win32 API calls fall under this class
Sometimes true, sometimes not – be careful.
Only side effect is pruning a few additional
edges in the flow graph

A few more opportunities for optimization
Mandatory Optimization Constraints:
Win32/Win64 ABI specific limitations

Tail-call (call/return -> jump) is illegal inside a
protected region



Instruction level performance hit is typically
negligible
Stack usage increase (can be serious)
Instruction scheduling constraints

Scheduling into & out of handler regions is limited

rarely worth doing, even if it is legal
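To make the tail-call restriction concrete, here is a small sketch in standard C++. The functions `compute`, `forward_plain`, and `forward_guarded` are hypothetical. Whether a given compiler actually turns the first call into a jump depends on its options and heuristics; the point is only that the second form rules it out.

```cpp
#include <stdexcept>

// Hypothetical callee; throws for negative input.
int compute(int x) {
    if (x < 0) throw std::invalid_argument("negative");
    return x * 2;
}

// The call is the last action and sits outside any protected region,
// so an optimizer is free to replace call/ret with a jump.
int forward_plain(int x) {
    return compute(x);
}

// The call sits inside a try block. This frame has to stay on the
// stack until compute() returns so that the handler can still run,
// which means the call cannot become a jump.
int forward_guarded(int x) {
    try {
        return compute(x);
    } catch (const std::invalid_argument&) {
        return -1;
    }
}
```

In deeply recursive or long forwarding chains, the frames kept alive by the second form are where the stack usage increase comes from.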
VC8.0 optimization constraints

No impact on any functions that do not contain
some EH construct


Sometimes requires the programmer to add volatile to get the
required constraints in a function invoked inside a try
Exception handling is only one of a large number of
things that can artificially constrain optimizations






setjmp/longjmp (old school EH in C)
_alloca
__declspec’s
/GS
/fp:except, /fp:precise, /fp:strict
Many many more.
VC8.0 optimization constraints:
Specifics

Late flow optimizations for x64
  Primarily head & tail merging
Loop optimizer disabled (all platforms) for any function with a try/__try
  Loop unrolling/peeling
  Induction variable creation
  Some strength reduction
  Doesn’t impact functions with only C++ objects!
Stack Packing restrictions
  Prior to VC8, all variables inside a try block were written back to the stack whenever their values were updated
  With VC8, only variable values that may be visible outside of the try are written back to the stack.
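One way to live with the loop-optimizer restriction is to keep hot loops in functions that contain no try block and put the try in a thin wrapper. The sketch below shows that arrangement in standard C++; `sum_checked` and `try_sum` are hypothetical names. It is an assumption that the compiler keeps the two functions separate: if the loop function is inlined into the wrapper, the loop ends up inside a function with a try after all.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hot loop lives in a function with no try block of its own.
long long sum_checked(const std::vector<int>& values) {
    long long total = 0;
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (values[i] < 0) throw std::domain_error("negative input");
        total += values[i];
    }
    return total;
}

// Thin wrapper that owns the try block and contains no loop.
// Returns false (leaving `out` untouched) if the input was rejected.
bool try_sum(const std::vector<int>& values, long long& out) {
    try {
        out = sum_checked(values);
        return true;
    } catch (const std::domain_error&) {
        return false;
    }
}
```

The wrapper pays the protected-region cost once per call instead of constraining every iteration of the loop.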
Source code used for samples

SEH Version
    void seh_finally() {
        init();
        __try {
            foo();
            bar();
            blah();
        } __finally {
            done();
        }
    }

C++ Version
    struct obj {
        obj() { init(); }
        ~obj() { done(); }
    };
    void cpp_dtor() {
        obj a;
        foo();
        bar();
        blah();
    }

No EH Version
    int noeh_cleanup() {
        int result = 0;
        init();
        result = foo_err();
        if (result)
            goto fail;
        result = bar_err();
        if (result)
            goto fail;
        result = blah_err();
    fail:
        done();
        return result;
    }
Generated code for x86 SEH /O2
    push    ebp
    mov     ebp, esp
    push    -1
    push    OFFSET __sehtable$?seh_finally@@YAXXZ
    push    OFFSET __except_handler3
    mov     eax, DWORD PTR fs:0
    push    eax
    mov     DWORD PTR fs:0, esp
    sub     esp, 8                              ;End Prologue
    call    init
    mov     DWORD PTR __$SEHRec$[ebp+20], 0     ;Enter __try
    call    foo
    call    bar
    call    blah
    mov     DWORD PTR __$SEHRec$[ebp+20], -1    ;Exit __try
    call    $seh_finally_funclet                ;Invoke __finally
    mov     ecx, DWORD PTR __$SEHRec$[ebp+8]    ;Begin Epilogue
    mov     DWORD PTR fs:0, ecx
    mov     esp, ebp
    pop     ebp
    ret     0
$seh_finally_funclet:
    call    done
    ret     0
Generated code for x86 SEH /O1
    push    8
    push    OFFSET __sehtable$seh_finally
    call    __SEH_prolog                        ;End Prologue
    call    init
    and     __$SEHRec$[ebp+20], 0               ;Enter __try
    call    foo
    call    bar
    call    blah
    or      __$SEHRec$[ebp+20], -1              ;Exit __try
    call    $seh_finally_funclet                ;Invoke __finally
    call    __SEH_epilog                        ;Begin Epilogue
    ret     0
$seh_finally_funclet:
    call    done
    ret     0
Generated code for x86 C++ /O2
    push    -1
    push    __ehhandler$?cpp_dtor@@YAXXZ
    mov     eax, DWORD PTR fs:0
    push    eax
    mov     DWORD PTR fs:0, esp                 ;End Prologue
    push    ecx                                 ;allocate space for obj
    call    init                                ;obj() inlined
    mov     DWORD PTR __$EHRec$[esp+24], 0      ;Enter try
    call    foo
    call    bar
    call    blah
    mov     DWORD PTR __$EHRec$[esp+24], -1     ;Exit try
    call    done                                ;~obj() inlined
    mov     ecx, DWORD PTR __$EHRec$[esp+16]    ;Begin Epilogue
    mov     DWORD PTR fs:0, ecx
    add     esp, 16
    ret     0
Generated code for x86 C++ /O1
    mov     eax, __ehhandler$?cpp_dtor@@YAXXZ
    call    __EH_prolog                         ;End Prologue
    push    ecx                                 ;allocate space for obj
    call    init                                ;obj() inlined
    and     DWORD PTR __$EHRec$[ebp+8], 0       ;Enter try
    call    foo
    call    bar
    call    blah
    or      DWORD PTR __$EHRec$[ebp+8], -1      ;Exit try
    call    done                                ;~obj() inlined
    mov     ecx, DWORD PTR __$EHRec$[ebp]       ;Begin Epilogue
    mov     DWORD PTR fs:0, ecx
    leave
    ret     0
Generated code for x86 No EH (/O1 & /O2 are basically identical)
    push    esi                                 ;Save nonvolatile register for result
    call    init
    call    foo_err
    mov     esi, eax                            ;Save return code
    test    esi, esi                            ;Return code check
    jne     SHORT $fail
    call    bar_err
    mov     esi, eax                            ;Save return code
    test    esi, esi                            ;Return code check
    jne     SHORT $fail
    call    blah_err
    mov     esi, eax                            ;Save return code
$fail:
    call    done
    mov     eax, esi                            ;Return result
    pop     esi
    ret     0
Generated code for x64 SEH
    sub     rsp, 40                             ;End Prologue
    call    init
    nop                                         ;First instruction of __try
    call    foo
    call    bar
    call    blah
    nop                                         ;Last instruction of __try
    call    done                                ;__finally invoked inline
    add     rsp, 40                             ;Begin Epilogue
    ret     0
Generated code for x64 C++ EH
    sub     rsp, 56                             ;End Prologue
    mov     QWORD PTR $T[rsp], -2               ;C++ setup
    call    init
    nop                                         ;First instruction of try
    call    foo
    call    bar
    call    blah
    nop                                         ;Last instruction of try
    add     rsp, 56                             ;Begin Epilogue
    jmp     done                                ;~obj() inlined & tail called
Generated code for x64 No EH
    push    rbx                                 ;Save nonvolatile register for result
    sub     rsp, 32                             ;End Prologue
    call    init
    call    foo_err
    mov     ebx, eax                            ;Save return code
    test    eax, eax                            ;Return code check
    jne     SHORT $fail
    call    bar_err
    mov     ebx, eax                            ;Save return code
    test    eax, eax                            ;Return code check
    jne     SHORT $fail
    call    blah_err
    mov     ebx, eax                            ;Save return code
$fail:
    call    done
    mov     eax, ebx                            ;Get return code
    add     rsp, 32
    pop     rbx                                 ;Restore nonvolatile register
    ret     0
Costs of handling an exception
Disclaimer:
If you are really concerned about this, there is a
good chance you’re abusing or misusing
exceptions.
Exceptions are not for dealing with standard
scenarios! Performance of exceptions is
generally stacked in favor of the non-exceptional case
There’s a reason the term is “exception”!
Costs of handling an exception:
X86 – Win32 – SEH & C++ EH


Without /SAFESEH (this is a big no-no – potential security hole)
O(n)
  n is the number of frames on the stack with a protected region between throw & catch
  Walk a linked list of elements on [fs:0]
  Invoke filters to determine handler
    C++ type check is just a special filter
  Walk the list again, invoking __finally funclets & destructors
  Finally, jump to __except block or call catch block
With /SAFESEH (this is good)
O(n log(m))
  n is the number of frames on the stack with a protected region between throw & catch
  m is the number of EH entry points in the entire program
    For SEH, only 1. For C++ EH, one for each function!
  Walk a linked list of elements on [fs:0]
  For each element, verify the callback is in a list [O(log(m))]
  Invoke the filter to determine the handler
  Walk the list again, invoking __finally’s, with callback verification [O(log(m))]
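The two walks over the handler chain can be modeled in a few lines of portable C++. This is a toy model only: the `Registration` struct and its field names are hypothetical and do not match the real Win32 registration record, filters are reduced to a yes/no answer on an integer code, and the /SAFESEH callback verification step is left out.

```cpp
#include <functional>

// Toy model of one entry in the per-thread handler chain.
struct Registration {
    Registration* next;               // next outer frame, or nullptr
    std::function<bool(int)> filter;  // empty: frame has no handler
    std::function<void()> cleanup;    // empty: nothing to clean up
};

// Pass 1: walk outward from the innermost frame until a filter accepts.
Registration* find_handler(Registration* head, int code) {
    for (Registration* r = head; r != nullptr; r = r->next)
        if (r->filter && r->filter(code)) return r;
    return nullptr;
}

// Pass 2: walk the same frames again, running cleanup for every frame
// inside the one that will handle the exception. `target` must be
// reachable from `head`.
void unwind_to(Registration* head, Registration* target) {
    for (Registration* r = head; r != target; r = r->next)
        if (r->cleanup) r->cleanup();
}

// Returns the frame whose handler would run, or nullptr if unhandled.
Registration* dispatch(Registration* head, int code) {
    Registration* target = find_handler(head, code);
    if (target != nullptr) unwind_to(head, target);
    return target;
}
```

Both passes touch each of the n registered frames at most once, which is where the O(n) comes from; only frames that registered a protected region appear in the list at all.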
Costs of handling an exception:
x64 – Win64 – SEH & C++ EH

O(n log(m))
  n is the number of functions on the stack between throw & catch (not just the number with EH code in them!)
  m is the number of distinct regions in the image [.pdata size]
    Not just a function count – hot/cold sections and register allocation regions can increase this pretty dramatically (1-4x)
Walk each function frame on the stack [O(n)]
  Find its .pdata entry to get its unwind information [O(log(m))]
  If it has a filter, call it to determine the handler
  Restore nonvolatile registers as described in the unwind information
Once a handler has been determined
  Walk the stack again (using .pdata lookup)
  For each frame that has cleanup code, invoke the finally’s or destructors
  Jump to handler (or call catch)
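The log(m) factor comes from looking up each frame's address in a sorted table. The sketch below illustrates that kind of lookup with a binary search; `FunctionEntry` is a simplified stand-in whose fields are hypothetical and do not reproduce the real .pdata record layout, and `lookup` is not the system's actual routine.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified stand-in for one table record.
struct FunctionEntry {
    std::uint32_t begin;   // first address covered (inclusive)
    std::uint32_t end;     // one past the last address covered
    std::uint32_t unwind;  // hypothetical id of the unwind information
};

// `table` must be sorted by `begin` and hold non-overlapping ranges.
// Returns the entry covering `pc`, or nullptr if none does.
const FunctionEntry* lookup(const std::vector<FunctionEntry>& table,
                            std::uint32_t pc) {
    assert(std::is_sorted(table.begin(), table.end(),
                          [](const FunctionEntry& a, const FunctionEntry& b) {
                              return a.begin < b.begin;
                          }));
    // First entry whose range starts after pc.
    auto it = std::upper_bound(
        table.begin(), table.end(), pc,
        [](std::uint32_t value, const FunctionEntry& e) { return value < e.begin; });
    if (it == table.begin()) return nullptr;  // pc precedes every entry
    --it;                                     // last entry with begin <= pc
    return (pc < it->end) ? &*it : nullptr;
}
```

One such search per frame, repeated for n frames in each of the two stack walks, gives the O(n log(m)) total; the sortedness assertion is a debugging aid and would not be part of a production lookup.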
Cost of handling an exception:
x86 – WoW64 – SEH & C++EH

There is some degree of thunking between
the 64 bit kernel and 32 bit subsystem, so
performance really varies.



Worst case, it’s as slow as x64 on Win64.
Best case it’s about the same as x86 on Win32.
If you use exception handling in performance
sensitive areas of code, you may notice a
difference in your application

If you do notice a difference, this should be a red
flag regarding your use of exceptions.
Final gotchas (non-standard C++!)

Some optimizations that are constrained inside of a try result
in observable differences, based on program structure,
compiler settings, and compiler implementation.

    #include <stdio.h>

    int g; // add a volatile to fix the problem
    int *p;

    void func1() {
        g = 0;
        __try {
            g = 1;
            *p = 0;
            g = 2;
        } __except(1) {
            printf("%d\n", g);
        }
    }

    void update() {
        g = 1;
        *p = 0;
        g = 2;
    }

    void func2() {
        g = 0;
        __try {
            update();
        } __except(1) {
            printf("%d\n", g);
        }
    }
Summary & Conclusions


Do not use exceptions for normal program flow.
Exception handling does have a performance cost
  Not always measurable
  Cost really depends on usage
  Frequently similar to what correct code would be, without EH [at least in VC8]
Do not use exceptions for normal program flow.
C++ is cheaper than SEH for cleanup in VC8.
Use common sense, and knowledge of your team’s
strengths/weaknesses if you’re mandating SEH/C++ EH/No EH
  New hires rarely know about SEH.
  Source level readability & visibility of performance
And finally, do not use exceptions for normal program flow.
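As an illustration of that rule, an outcome that is expected in ordinary operation can travel in the return value instead of being thrown. The sketch below is one possible shape for this in standard C++; `Settings`, `find_setting`, and `setting_or` are hypothetical names chosen for the example.

```cpp
#include <string>
#include <unordered_map>

using Settings = std::unordered_map<std::string, int>;

// A missing key is an expected outcome, so it is reported through the
// return value instead of an exception. `out` is written only on success.
bool find_setting(const Settings& settings, const std::string& key, int& out) {
    auto it = settings.find(key);
    if (it == settings.end()) return false;
    out = it->second;
    return true;
}

// The caller treats "not found" as ordinary control flow.
int setting_or(const Settings& settings, const std::string& key, int fallback) {
    int value = fallback;
    find_setting(settings, key, value);
    return value;
}
```

Exceptions remain available for the cases that really are exceptional, such as the settings store itself being unreadable.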
More info



If you’re looking for detailed ABI docs for
X64, check my blog.
http://blogs.msdn.com/freik
Herb Sutter’s got some good books on using
exceptions with C++

He doesn’t give me kickbacks :-)