How much does Exception Handling cost, really?
Kevin Frei
Visual C++ Code Generation & Tools
http://blogs.msdn.com/freik
Reasons for this talk
(too many assumptions)
Pros of EH I’ve heard
More centralized error handling & recovery
More robust code
More readable code
Cons of EH I’ve heard
Can result in people not thinking about error conditions
Can make error recovery difficult (must put handler in the “right” place)
Enables abuse of exceptions
Summary of the previous Pros & Cons
They can all be dealt with
Coding Convention enforcement
Code Reviews
Good initial architecture
Consistent API designs
#1 reason I hear to not use EH:
“Exception handling makes my code too slow”
May be true, but may also be masking a more serious problem
Some Facts:
EH performance cost is dependent on the runtime, CPU architecture, and ABI/OS specifics.
You can’t simply examine source code to determine performance impact.
Deciding whether to use EH should depend on the team, the libraries you’re using, and a myriad of other issues.
Classes of Code Quality impact
Usage Penalty [EH tax]
Cost of entering a protected region
__try{}, try{}, C++ object with a destructor
Cleanup costs
General overhead of a function with any EH construct
__finally invocation
C++ object destructors
Optimization constraints
Cost of actually handling an exception
If you’re really concerned about this, you’re probably abusing exceptions.
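To make these cost classes concrete, here is a small portable C++ sketch (not from the talk) that marks where each one shows up in source. The names Resource, process, and g_cleanups are hypothetical, and the comments describe the categories above rather than measured code generation for any particular compiler.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical counter so the cleanup path can be observed.
static int g_cleanups = 0;

struct Resource {
    // Non-trivial destructor: the compiler must arrange for it to run on every exit path.
    ~Resource() { ++g_cleanups; }
};

int process(bool fail) {
    Resource r;              // object with a destructor: implicit protected region (usage penalty)
    try {                    // explicit protected region: entry/exit bookkeeping (usage penalty)
        if (fail)
            throw std::runtime_error("failed");  // handling cost: paid only when this executes
        return 0;            // normal exit: r's destructor still runs (cleanup cost)
    } catch (const std::runtime_error&) {
        return 1;            // handler reached; r's destructor runs when process() returns
    }
}
```

Calling process(false) and then process(true) runs the destructor once per call, on the normal path and after the handler respectively.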
EH tax for Structured Exception Handling
x86
All functions with SEH contain a complex prolog & epilog
x64
No required cost to the function itself
EH tax for C++ exception handling
x86
All functions with C++ EH contain a complex prolog & epilog
x64
1 additional QWORD allocated on the stack, initialized to -2, and never again used in the function’s code
It’s used by the C++ runtime in the event of an exception being thrown or caught.
Protected Region entry & exit costs
x86
Entry & exit from any protected region requires a 1- or 4-byte constant value written to the stack
/EHs can reduce this cost
/EHa may be required by your code base, though
x64
If an entry or an exit is preceded by a call, there is a single-byte NOP to properly identify region boundaries
Entry preceded by a call is pretty common for C++ EH (constructors)
Non-exception cleanup costs
x86
SEH: __finally clause is called [current implementation, not required]
call/ret overhead
Some other minor register allocation issues
C++ EH: Destructor invoked inline [C++ standard]
Destructor can be inlined, based on compiler (& user) decision
x64
SEH: __finally clause inlined [zero overhead]
[again, current implementation, not required]
C++ EH: same as x86
Optimization Constraints Disclaimer
Consider the complete alternative solution!
HRESULT checking is messy and error-prone
The goto solution to handle termination can result in pessimized dataflow
Most optimizations that must be constrained for EH should also be constrained for implementations that don’t use EH.
Optimization constraints
Mandatory optimization constraints
Limitations required by the language standard
ABI-specific limitations
Current implementation constraints
I’ll focus on UTC (current optimizer) in VC8
Code base originated in VC5.
Many constraints that exist in earlier versions have been removed
Mandatory optimization constraints:
Language-specific limitations
The C++ language standard does not specify anything about non-C++ exceptions (anything not raised by a C++ throw)!
The C language standard does not specify anything about exceptions at all, really.
[I know nothing about C99]
Language-specific limitations: C++
Flow from try blocks to catch (and out):
Results in additional flow edges at call sites that may throw exceptions
Variable values must be updated accordingly
Slightly less constant propagation, common subexpression elimination, dead store elimination, etc.
/EHs – assume only the C++ throw statement can cause an exception
Prior to VC8.0, you could compile /EHs, and even with an AV (access violation), most destructors would be invoked.
For VC8.0 /EHs:
If you throw a C++ exception, destructors will be run.
If any other exception occurs, no destructors will run.
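As an illustration of the extra flow edges from a try to its catch (a portable C++ sketch with a hypothetical may_throw helper, not code from the talk): without the try, the store x = 1 below would be dead and removable. The edge from the call into the catch keeps it live, because the handler must observe 1 if the call throws.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical callee that throws when asked to.
void may_throw(bool fail) {
    if (fail)
        throw std::runtime_error("boom");
}

int observe(bool fail) {
    int x = 0;
    try {
        x = 1;               // dead on the normal path (overwritten below), but live on the
        may_throw(fail);     // exceptional edge from this call into the catch block
        x = 2;
    } catch (const std::runtime_error&) {
        return x;            // reached only via the exceptional edge: sees 1
    }
    return x;                // normal path: sees 2
}
```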
Language-specific limitations: /EHa
/EHa – all exceptions should be considered when destroying C++ objects
Results in far more potential flow from a try block to a catch block
Less stack packing (no stack packing prior to VC8)
Much less constant propagation, common subexpression elimination, etc.
Quick /EHc description
Only has impact with /EHs
Tells the compiler that any extern “C” function will not throw any C++ exceptions
Win32 API calls fall under this class
Sometimes true, sometimes not – be careful.
Only side effect is pruning a few additional edges in the flow graph
A few more opportunities for optimization
Mandatory Optimization Constraints:
Win32/Win64 ABI-specific limitations
Tail call (call/return -> jump) is illegal inside a protected region
Instruction-level performance hit is typically negligible
Stack usage increase (can be serious)
Instruction scheduling constraints
Scheduling into & out of handler regions is limited
Rarely worth doing, even if it is legal
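A minimal C++ sketch of why a call inside a protected region cannot become a jump. The names leaf, plain, and guarded are hypothetical, and whether a given compiler actually emits a tail call for plain depends on its optimization settings.

```cpp
#include <cassert>
#include <stdexcept>

int leaf(int x) {
    if (x < 0)
        throw std::invalid_argument("negative");
    return x * 2;
}

// The call is in tail position, so an optimizer may emit `jmp leaf`
// instead of `call leaf` followed by `ret`.
int plain(int x) { return leaf(x); }

// Also syntactically a tail position, but this frame must stay on the stack
// so an exception from leaf() can be caught here: it has to remain a real call.
int guarded(int x) {
    try {
        return leaf(x);
    } catch (const std::invalid_argument&) {
        return -1;
    }
}
```

The same rule is behind the stack usage note: a recursive function whose recursive call sits inside a try keeps one frame per level, where an unprotected tail-recursive version could reuse a single frame.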
VC8.0 optimization constraints
No impact on any functions that do not contain some EH construct
Sometimes requires the programmer to add volatile to get the required constraints in a function invoked inside a try
Exception handling is only one of a large number of things that can artificially constrain optimizations
setjmp/longjmp (old-school EH in C)
_alloca
__declspec’s
/GS
/fp:except, /fp:precise, /fp:strict
Many, many more.
VC8.0 optimization constraints: Specifics
Late flow optimizations for x64 (primarily head & tail merging)
Loop optimizer disabled (all platforms) for any function with a try/__try
Loop unrolling/peeling
Induction variable creation
Some strength reduction
Doesn’t impact functions with only C++ objects!
Stack packing restrictions
Prior to VC8, all variables inside a try block were written back to the stack whenever their values were updated
With VC8, only variable values that may be visible outside of the try are written back to the stack.
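One possible workaround for the loop optimizer restriction, offered as an assumption rather than advice from the talk: keep the hot loop in a helper function that contains no try/__try, and leave the protected region in the caller. This can only help if the helper is not inlined back into the function with the try (with MSVC, __declspec(noinline) is one way to ensure that). The names sum_squares and checked_sum_squares are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hot loop in a helper with no try/__try, so the restriction above does not apply to it.
long long sum_squares(const std::vector<int>& v) {
    long long total = 0;
    for (std::size_t i = 0; i < v.size(); ++i)
        total += static_cast<long long>(v[i]) * v[i];
    return total;
}

// The protected region stays in the caller; assumes sum_squares is not inlined here.
long long checked_sum_squares(const std::vector<int>& v) {
    try {
        if (v.empty())
            throw std::invalid_argument("empty input");
        return sum_squares(v);
    } catch (const std::invalid_argument&) {
        return -1;
    }
}
```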
Source code used for samples
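SEH Version
// __try/__finally is a Microsoft C/C++ extension (MSVC only)
void init(void); void foo(void); void bar(void); void blah(void); void done(void);
void seh_finally() {
    init();
    __try {
        foo();
        bar();
        blah();
    } __finally {
        done();   // runs whether the __try block exits normally or via an exception
    }
}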
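C++ Version
void init(); void foo(); void bar(); void blah(); void done();
struct obj {
    obj() { init(); }
    ~obj() { done(); }   // runs on normal exit and during unwinding
};
void cpp_dtor() {
    obj a;
    foo();
    bar();
    blah();
}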
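No EH Version
// The *_err functions return nonzero on failure
void init(); void done(); int foo_err(); int bar_err(); int blah_err();
int noeh_cleanup() {
    int result = 0;
    init();
    result = foo_err();
    if (result)
        goto fail;
    result = bar_err();
    if (result)
        goto fail;
    result = blah_err();
fail:
    done();
    return result;
}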
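The C++ and No EH samples can be exercised side by side with stub helpers. This portable sketch is an assumption, not the talk's test code: step, step_err, g_log, g_fail_at, run_cpp, and run_noeh are hypothetical stand-ins that log each call, so you can check that both versions call done() exactly once whether or not a step fails. The SEH version is left out because __try/__finally only compiles with MSVC.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical call log and failure switch.
static std::string g_log;
static const char* g_fail_at = "";   // name of the step that should fail, or "" for none

void init() { g_log += "init "; }
void done() { g_log += "done "; }

// Exception-based stand-in for foo/bar/blah: throws on failure.
void step(const char* name) {
    g_log += std::string(name) + " ";
    if (std::string(name) == g_fail_at)
        throw std::runtime_error(name);
}

// Error-code stand-in for foo_err/bar_err/blah_err: nonzero means failure.
int step_err(const char* name) {
    g_log += std::string(name) + " ";
    return std::string(name) == g_fail_at ? 1 : 0;
}

struct obj {
    obj() { init(); }
    ~obj() { done(); }   // runs on both the normal and the exceptional path
};

void cpp_dtor() {
    obj a;
    step("foo");
    step("bar");
    step("blah");
}

int noeh_cleanup() {
    int result = 0;
    init();
    result = step_err("foo");
    if (result) goto fail;
    result = step_err("bar");
    if (result) goto fail;
    result = step_err("blah");
fail:
    done();
    return result;
}

// Run one variant with `fail_at` as the failing step and return the call log.
std::string run_cpp(const char* fail_at) {
    g_log.clear();
    g_fail_at = fail_at;
    try { cpp_dtor(); } catch (const std::runtime_error&) { g_log += "caught"; }
    return g_log;
}

std::string run_noeh(const char* fail_at) {
    g_log.clear();
    g_fail_at = fail_at;
    if (noeh_cleanup()) g_log += "failed";
    return g_log;
}
```

With "bar" as the failing step, both variants log init, foo, bar, and then done before reporting the failure; blah is never reached.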
Generated code for x86 SEH /O2
push  ebp
mov   ebp, esp
push  -1
push  OFFSET __sehtable$?seh_finally@@YAXXZ
push  OFFSET __except_handler3
mov   eax, DWORD PTR fs:0
push  eax
mov   DWORD PTR fs:0, esp
sub   esp, 8                                ;End Prologue
call  init
mov   DWORD PTR __$SEHRec$[ebp+20], 0       ;Enter __try
call  foo
call  bar
call  blah
mov   DWORD PTR __$SEHRec$[ebp+20], -1      ;Exit __try
call  $seh_finally_funclet                  ;Invoke __finally
mov   ecx, DWORD PTR __$SEHRec$[ebp+8]      ;Begin Epilogue
mov   DWORD PTR fs:0, ecx
mov   esp, ebp
pop   ebp
ret   0
$seh_finally_funclet:
call  done
ret   0
Generated code for x86 SEH /O1
push  8
push  OFFSET __sehtable$seh_finally
call  __SEH_prolog                          ;End Prologue
call  init
and   __$SEHRec$[ebp+20], 0                 ;Enter __try
call  foo
call  bar
call  blah
or    __$SEHRec$[ebp+20], -1                ;Exit __try
call  $seh_finally_funclet                  ;Invoke __finally
call  __SEH_epilog                          ;Begin Epilogue
ret   0
$seh_finally_funclet:
call  done
ret   0
Generated code for x86 C++ /O2
push  -1
push  __ehhandler$?cpp_dtor@@YAXXZ
mov   eax, DWORD PTR fs:0
push  eax
mov   DWORD PTR fs:0, esp                   ;End Prologue
push  ecx                                   ;allocate space for obj
call  init                                  ;obj() inlined
mov   DWORD PTR __$EHRec$[esp+24], 0        ;Enter try
call  foo
call  bar
call  blah
mov   DWORD PTR __$EHRec$[esp+24], -1       ;Exit try
call  done                                  ;~obj() inlined
mov   ecx, DWORD PTR __$EHRec$[esp+16]      ;Begin Epilogue
mov   DWORD PTR fs:0, ecx
add   esp, 16
ret   0
Generated code for x86 C++ /O1
mov   eax, __ehhandler$?cpp_dtor@@YAXXZ
call  __EH_prolog                           ;End Prologue
push  ecx                                   ;allocate space for obj
call  init                                  ;obj() inlined
and   DWORD PTR __$EHRec$[ebp+8], 0         ;Enter try
call  foo
call  bar
call  blah
or    DWORD PTR __$EHRec$[ebp+8], -1        ;Exit try
call  done                                  ;~obj() inlined
mov   ecx, DWORD PTR __$EHRec$[ebp]         ;Begin Epilogue
mov   DWORD PTR fs:0, ecx
leave
ret   0
Generated code for x86 No EH (/O1 & /O2 are basically identical)
push  esi                                   ;Save nonvolatile register for result
call  init
call  foo_err
mov   esi, eax                              ;Save return code
test  esi, esi                              ;Return code check
jne   SHORT $fail
call  bar_err
mov   esi, eax                              ;Save return code
test  esi, esi                              ;Return code check
jne   SHORT $fail
call  blah_err
mov   esi, eax                              ;Save return code
$fail:
call  done
mov   eax, esi                              ;Return result
pop   esi
ret   0
Generated code for x64 SEH
sub   rsp, 40                               ;End Prologue
call  init
nop
call  foo                                   ;First instruction of __try
call  bar
call  blah
nop                                         ;Last instruction of __try
call  done                                  ;__finally invoked inline
add   rsp, 40                               ;Begin Epilogue
ret   0
Generated code for x64 C++ EH
sub   rsp, 56                               ;End Prologue
mov   QWORD PTR $T[rsp], -2                 ;C++ setup
call  init
nop
call  foo                                   ;First instruction of try
call  bar
call  blah
nop                                         ;Last instruction of try
add   rsp, 56                               ;Begin Epilogue
jmp   done                                  ;~obj() inlined & tail called
Generated code for x64 No EH
push  rbx                                   ;Save nonvolatile register for result
sub   rsp, 32                               ;End Prologue
call  init
call  foo_err
mov   ebx, eax                              ;Save return code
test  eax, eax                              ;Return code check
jne   SHORT $fail
call  bar_err
mov   ebx, eax                              ;Save return code
test  eax, eax                              ;Return code check
jne   SHORT $fail
call  blah_err
mov   ebx, eax                              ;Save return code
$fail:
call  done
mov   eax, ebx                              ;Get return code
add   rsp, 32
pop   rbx                                   ;Restore nonvolatile register
ret   0
Costs of handling an exception
Disclaimer:
If you are really concerned about this, there is a good chance you’re abusing or misusing exceptions.
Exceptions are not for dealing with ordinary scenarios! Exception performance is generally stacked in favor of the non-exceptional case
There’s a reason the term is “exception”!
Costs of handling an exception:
x86 – Win32 – SEH & C++ EH
Without /SAFESEH (this is a big no-no – potential security hole)
O(n)
Walk a linked list of elements on [fs:0]
Invoke filters to determine handler
n is the number of frames on the stack with a protected region between throw & catch
C++ type check is just a special filter
Walk the list again, invoking __finally funclets & destructors
Finally, jump to __except block or call catch block
With /SAFESEH (this is good)
O(n log(m))
n is the number of frames on the stack with a protected region between throw & catch
m is the number of EH entry points in the entire program
For SEH, only 1. For C++ EH, one for each function with C++ EH!
Walk a linked list of elements on [fs:0]
For each element, verify the callback is in a list [O(log(m))]
Invoke the filter to determine the handler
Walk the list again, invoking __finally’s, with callback verification [O(log(m))]
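The /SAFESEH verification step can be modeled in a few lines of portable C++. This is a simplified sketch, not the OS implementation: Registration is a hypothetical stand-in for the fs:0 chain records, and the sorted safe_handlers vector stands in for the image's table of registered handlers. The binary search is what makes each check O(log(m)), so one pass over n records costs O(n log(m)). Filters, cleanup, and the second pass are omitted.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified stand-in for an fs:0 registration record (not the real OS layout).
struct Registration {
    std::uintptr_t handler;      // address of the record's exception handler
    const Registration* next;    // next record toward the outer frames, or nullptr
};

// Verification part of the first pass: for each record (O(n)), binary-search the
// sorted table of registered handlers (O(log(m))). Returns how many records passed
// before reaching the end of the chain or an unregistered handler, which the real
// dispatcher would refuse to call.
int verify_chain(const Registration* head, const std::vector<std::uintptr_t>& safe_handlers) {
    assert(std::is_sorted(safe_handlers.begin(), safe_handlers.end()));
    int verified = 0;
    for (const Registration* r = head; r != nullptr; r = r->next) {
        if (!std::binary_search(safe_handlers.begin(), safe_handlers.end(), r->handler))
            break;
        ++verified;
    }
    return verified;
}

// Usage: a three-record chain whose middle handler is not in the table.
int demo() {
    const Registration outer{0x3000, nullptr};
    const Registration middle{0x2500, &outer};   // not registered
    const Registration inner{0x1000, &middle};
    const std::vector<std::uintptr_t> table{0x1000, 0x2000, 0x3000};
    return verify_chain(&inner, table);          // stops at the middle record
}
```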
Costs of handling an exception:
x64 – Win64 – SEH & C++ EH
O(n log(m))
n is the number of functions on the stack between throw & catch (not just the number with EH code in them!)
m is the number of distinct regions in the image [.pdata size]
Not just a function count – hot/cold sections and register allocation regions can increase this pretty dramatically (1-4x)
Walk each function frame on the stack [O(n)]
Find its .pdata entry to get its unwind information [O(log(m))]
If it has a filter, call it to determine the handler
Restore nonvolatile registers as described in the unwind information
Once a handler has been determined
Walk the stack again (using .pdata lookup)
For each frame that has cleanup code, invoke the __finally’s or destructors
Jump to handler (or call catch)
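The O(log(m)) per-frame lookup on x64 amounts to a binary search over code ranges sorted by start address. The sketch below is a simplified model: FunctionEntry is modeled on, but not identical to, the Win64 RUNTIME_FUNCTION record, and lookup is a hypothetical helper rather than the OS routine (RtlLookupFunctionEntry). A function split into hot and cold parts simply contributes more than one range, which is why m can exceed the function count.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified stand-in for one .pdata entry (modeled on, not identical to, RUNTIME_FUNCTION).
struct FunctionEntry {
    std::uint32_t begin;   // first byte of the code range (image-relative)
    std::uint32_t end;     // one past the last byte of the range
    int unwind_info;       // hypothetical handle to the range's unwind data
};

// Finds the entry whose range contains pc, or nullptr if none does (for example, a
// leaf function that needs no unwind data). `pdata` must be sorted by `begin` with
// non-overlapping ranges; the binary search is the O(log(m)) step per frame.
const FunctionEntry* lookup(const std::vector<FunctionEntry>& pdata, std::uint32_t pc) {
    // First entry that starts after pc; the candidate is the one just before it.
    auto it = std::upper_bound(pdata.begin(), pdata.end(), pc,
                               [](std::uint32_t value, const FunctionEntry& e) { return value < e.begin; });
    if (it == pdata.begin())
        return nullptr;
    --it;
    return pc < it->end ? &*it : nullptr;
}
```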
Costs of handling an exception:
x86 – WoW64 – SEH & C++ EH
There is some degree of thunking between the 64-bit kernel and the 32-bit subsystem, so performance really varies.
Worst case, it’s as slow as x64 on Win64.
Best case, it’s about the same as x86 on Win32.
If you use exception handling in performance-sensitive areas of code, you may notice a difference in your application
If you do notice a difference, this should be a red flag regarding your use of exceptions.
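Final gotchas (non-standard C++!)
Some optimizations that are constrained inside of a try result in observable differences, based on program structure, compiler settings, and compiler implementation.
#include <stdio.h>

int g;    // declare g volatile to fix the problem
int *p;   // zero-initialized global, so *p = 0 raises an access violation

void func1() {
    g = 0;
    __try {
        g = 1;
        *p = 0;                // faults here
        g = 2;
    } __except(1) {            // 1 == EXCEPTION_EXECUTE_HANDLER
        printf("%d\n", g);     // stores inside the __try are kept, so expect 1
    }
}

void update() {                // no EH construct here, so nothing constrains the optimizer
    g = 1;                     // dead on the normal path (g = 2 follows); may be removed
    *p = 0;
    g = 2;
}

void func2() {
    g = 0;
    __try {
        update();
    } __except(1) {
        printf("%d\n", g);     // may print 0 if the store g = 1 in update() was eliminated
    }
}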
Summary & Conclusions
Do not use exceptions for normal program flow.
Exception handling does have a performance cost
Not always measurable
Cost really depends on usage
Frequently similar to the cost of equivalent correct code written without EH [at least in VC8]
Do not use exceptions for normal program flow.
C++ is cheaper than SEH for cleanup in VC8.
Use common sense, and knowledge of your team’s strengths/weaknesses, if you’re mandating SEH/C++ EH/No EH
New hires rarely know about SEH.
Source level readability & visibility of performance
And finally, do not use exceptions for normal program flow.
More info
If you’re looking for detailed ABI docs for x64, check my blog.
http://blogs.msdn.com/freik
Herb Sutter’s got some good books on using exceptions with C++
He doesn’t give me kickbacks