Visual C++ 2005 New Optimizations Ayman Shoukry Program Manager Visual C++ Microsoft Corporation How can your application run faster? ► Maximize optimization for each file. ► Whole Program.
Visual C++ 2005 New Optimizations
Ayman Shoukry
Program Manager, Visual C++
Microsoft Corporation
How can your application run faster?
► Maximize optimization for each file.
► Whole Program Optimization (WPO) goes beyond individual files.
► Profile Guided Optimization (PGO) specializes optimizations specifically for your application.
► New Floating Point Model.
► OpenMP.
► 64-bit Code Generation.
Maximum Optimization for Each File
► The compiler optimizes each source code file to get the best runtime performance
    The only type of optimization available in Visual C++ 6
► Visual C++ 2005 has better optimization algorithms
    Specialized support for newer processors such as Pentium 4
    Improved speed and better precision of floating point operations
    New optimization techniques like loop unrolling
Whole Program Optimization
► Typically Visual C++ optimizes programs by generating code for each object file separately
► Introducing whole program optimization
    First introduced with Visual C++ 2002 and improved since
    Compiler and linker are set with new options (/GL and /LTCG)
    The compiler has freedom to do additional optimizations
      ► Cross-module inlining
      ► Custom calling conventions
    Visual C++ 2005 supports this on all platforms
    Whole program optimization is widely used for Microsoft products.
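A minimal sketch of the kind of call that can benefit from cross-module inlining. The file names, the functions scale and sum_scaled, and the build line in the comment are illustrative assumptions, not taken from the talk. The two halves are shown in one listing for brevity but are meant to live in separate source files.

```cpp
#include <cassert>

// Build sketch (assumed file names):
//   cl /O2 /GL scale.cpp sum.cpp /link /LTCG
// Without /GL and /LTCG the compiler sees one file at a time, so the call
// to scale() below stays a real call when scale() lives in another .cpp.

// ---- scale.cpp (hypothetical) ----
int scale(int value, int factor) {
    return value * factor;
}

// ---- sum.cpp (hypothetical) ----
int scale(int value, int factor);  // declaration only; the body is in scale.cpp

int sum_scaled(const int* values, int count, int factor) {
    assert(count >= 0);
    int total = 0;
    for (int i = 0; i < count; ++i) {
        total += scale(values[i], factor);  // candidate for cross-module inlining
    }
    return total;
}
```

Whether the call is actually inlined is up to the optimizer; the sketch only shows where the opportunity sits.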
Profile Guided Optimization
► Static analysis leaves many open optimization questions for the compiler, leading to conservative optimizations
► Visual C++ programs can be tuned for expected user scenarios by collecting information from the running application
► Introducing profile guided optimization
    Optimizes code by using the program the way its customers use it
    Runs optimizations at link time, like whole program optimization
    Available in Visual Studio 2005
    Widely adopted in Microsoft

Is it common for p to be NULL? If it is not common for p to be NULL, the error code should be collected with other infrequently used code.

if (p != NULL) {
    /* Perform action with p */
} else {
    /* Error code */
}
PGO: Instrumentation
► We instrument with “probes” inserted into the code
► Two main types of probes
    Value probes
      ► Used to construct a histogram of values
    Count (simple/entry) probes
      ► Used to count the number of times a path is taken
► We try to insert the minimum number of probes to get full coverage
    Minimizes the cost of instrumentation
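The two probe kinds can be pictured with a hand-written sketch. CountProbe, ValueProbe and handle are hypothetical names invented for illustration. The real probes are generated by the toolchain and are not written by hand, and their actual data layout is not described in the talk.

```cpp
#include <cassert>
#include <map>

// Hypothetical stand-in for a count probe: counts how often a path is taken.
struct CountProbe {
    unsigned long hits;
    CountProbe() : hits(0) {}
    void record() { ++hits; }
};

// Hypothetical stand-in for a value probe: builds a histogram of values.
struct ValueProbe {
    std::map<int, unsigned long> histogram;  // value -> times observed
    void record(int value) { ++histogram[value]; }
};

CountProbe null_path_probe;     // counts executions of the error path
ValueProbe switch_value_probe;  // histogram of the switch operand

int handle(const int* p, int selector) {
    switch_value_probe.record(selector);
    if (p == 0) {
        null_path_probe.record();
        return -1;  // error path
    }
    switch (selector) {
        case 1:  return *p + 1;
        case 2:  return *p + 2;
        default: return *p;
    }
}
```

After a training run, the histogram answers "which switch values dominate?" and the counter answers "how often is p null?", which are the questions the later optimizations rely on.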
PGO Optimizations
► Switch expansion
► Better inlining decisions
► Cold code separation
► Virtual call speculation
► Partial inlining
Profile Guided Optimization
[Diagram: PGO build flow]
Source -> Compile with /GL and optimizations on (e.g. /O2) -> Object files
Object files -> Link with /LTCG:PGI -> Instrumented image
Instrumented image + Scenarios -> Output + Profile data
Object files + Profile data -> Link with /LTCG:PGO -> Optimized image
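The build flow can be sketched for a single-file program. The file name app.cpp and the function checked_length are assumptions made for illustration. The switches are the ones named on the slide, and app.cpp is assumed to also contain a main() that exercises the function.

```cpp
#include <cassert>
#include <cstring>

// Build sketch (app.cpp is an assumed name):
//   1. cl /c /O2 /GL app.cpp     -> app.obj
//   2. link /LTCG:PGI app.obj    -> instrumented image
//   3. run the instrumented image on representative scenarios
//      to collect profile data
//   4. link /LTCG:PGO app.obj    -> optimized image

// Returns the length of p, or -1 for a null pointer. If the training
// scenarios rarely pass null, the profile tells the optimizer that the
// else-branch is cold.
int checked_length(const char* p) {
    if (p != 0) {
        return static_cast<int>(std::strlen(p));
    } else {
        return -1;  // error path
    }
}
```

The quality of the result depends on step 3: scenarios that do not resemble real usage produce a profile that optimizes for the wrong paths.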
PGO: Inlining Sample
► Profile Guided uses call graph path profiling.
[Figure: call graph with nodes a, foo, bat, bar, baz]
PGO: Inlining Sample (Cont)
► Profile Guided uses call graph path profiling.
[Figure: the call graph expanded by call path (nodes a, foo, bat, bar, baz) and annotated with call counts 10, 75, 20, 50, 100, 15, 15]
PGO – Inlining Sample (cont)
► Inlining decisions are made at each call site.
[Figure: the call graph (nodes a, foo, bat, bar, baz) with per-call-site counts 10, 20, 125, 100, 15, 15]
PGO – Switch Expansion
► Most frequent values are pulled out.

Before:
// 90% of the time i = 10;
switch (i) {
    case 1: …
    case 2: …
    case 3: …
    default: …
}

After:
if (i == 10)
    goto default;
switch (i) {
    case 1: …
    case 2: …
    case 3: …
    default: …
}
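A compilable, hand-written equivalent of the transformation, as a sketch only. The function names and the return values are invented for illustration, and the hot value 10 is the figure from the slide. The optimizer performs this rewrite itself from profile data; nobody is expected to write it by hand.

```cpp
#include <cassert>

// Plain switch: 10 is not a listed case, so it falls into default.
int classify(int i) {
    switch (i) {
        case 1:  return 100;
        case 2:  return 200;
        case 3:  return 300;
        default: return -1;
    }
}

// Expanded form: the most frequent value is tested first, so the common
// case costs a single compare instead of the full switch dispatch.
int classify_expanded(int i) {
    if (i == 10) {
        return -1;  // same result as the default case
    }
    switch (i) {
        case 1:  return 100;
        case 2:  return 200;
        case 3:  return 300;
        default: return -1;
    }
}
```

Both functions return the same value for every input; only the order of the tests changes.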
PGO – Code Separation
► Basic blocks are ordered so that the most frequent path falls through.
[Figure: control flow graph of basic blocks A, B, C, D with edge counts 100 and 10. Default layout: A, B, C, D. Optimized layout: A, B, D, C.]
PGO – Virtual Call Speculation
► The profiles show that the type of object A in function Func was almost always Foo

class Base {
    …
    virtual void call();
};
class Foo : Base {
    …
    void call();
};
class Bar : Base {
    …
    void call();
};

Before:
void Func(Base *A)
{
    …
    while (true)
    {
        …
        A->call();
        …
    }
}

After:
void Func(Base *A)
{
    …
    while (true)
    {
        …
        if (type(A) == Foo:Base)
        {
            // inline of A->call();
        }
        else
            A->call();
        …
    }
}
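The speculation can be imitated by hand in standard C++. This is a sketch under assumptions: typeid stands in for whatever type check the compiler actually emits, the int return values and the iteration count are added so the result can be observed, and func_plain and func_speculated are hypothetical names.

```cpp
#include <cassert>
#include <typeinfo>

class Base {
public:
    virtual ~Base() {}
    virtual int call() { return 0; }
};
class Foo : public Base {
public:
    virtual int call() { return 1; }
};
class Bar : public Base {
public:
    virtual int call() { return 2; }
};

// Every iteration goes through the virtual dispatch.
int func_plain(Base* a, int iterations) {
    int total = 0;
    for (int i = 0; i < iterations; ++i) {
        total += a->call();
    }
    return total;
}

// Speculated form: guess the common type (Foo), call it directly so the
// body can be inlined, and fall back to the virtual call otherwise.
// a must not be null.
int func_speculated(Base* a, int iterations) {
    int total = 0;
    for (int i = 0; i < iterations; ++i) {
        if (typeid(*a) == typeid(Foo)) {
            total += static_cast<Foo*>(a)->Foo::call();  // non-virtual call
        } else {
            total += a->call();
        }
    }
    return total;
}
```

The fallback branch keeps the code correct for any other derived type; the guess only affects speed.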
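PGO – Partial Inlining
[Figure: flow graph: Basic Block 1 -> Cond -> Hot Code or Cold Code -> More Code]
PGO – Partial Inlining (cont)
► The hot path is inlined, but NOT the cold
[Figure: the same flow graph (Basic Block 1, Cond, Hot Code, Cold Code, More Code) with the hot path marked as inlined]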
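A hand-written sketch of what partial inlining amounts to. All function names, the threshold 16 and the arithmetic are invented for illustration; the compiler makes this split itself from profile data.

```cpp
#include <cassert>

// Cold code: stays an out-of-line function.
int lookup_slow(int key) {
    int result = 0;
    for (int i = 0; i < key; ++i) {
        result += i % 7;
    }
    return result;
}

// Original callee: a condition, a cheap hot path and a cold path.
int lookup(int key) {
    if (key < 16) {
        return key * 2;       // hot code
    }
    return lookup_slow(key);  // cold code
}

// Caller before the optimization: always pays for the call.
int caller_plain(int key) {
    return lookup(key) + 1;
}

// Caller after partial inlining, written by hand: the condition and the
// hot code are copied in, the cold code is still reached through a call.
int caller_partially_inlined(int key) {
    int value;
    if (key < 16) {
        value = key * 2;
    } else {
        value = lookup_slow(key);
    }
    return value + 1;
}
```

The caller grows only by the small hot path, which is the point of not inlining the cold block.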
Demo
Optimizing applications with VC++ 2005
New Floating Point Model
► /Op made your code run slow
    No intermediate switch
► New Floating Point Model
    /fp:fast
    /fp:precise (default)
    /fp:strict
    /fp:except
/fp:precise
► The default floating point switch
► Performance and Precision
► IEEE Conformant
► Round to the appropriate precision
    At assignments, casts and function calls
/fp:fast
► When performance matters most
► You know your application does simple floating point operations
► What can /fp:fast do?
    Association
    Distribution
    Factoring inverse
    Scalar reduction
    Copy propagation
    And others…
/fp:except
► Reliable floating point exceptions
► Thrown and not thrown when expected
    Faults and traps, when reliable, should occur at the line that causes the exception
    FWAITs on x86 might be added
► Cannot be used with /fp:fast or in managed code
/fp:strict
► The strictest FP option
    Turns off contractions
    Assumes the floating point control word can change or that the user will examine flags
► /fp:except is implied
► Low double digit percent slowdown versus /fp:fast
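"Examining flags" can be illustrated with the standard floating point environment functions. This is a sketch under assumptions: it uses <cfenv>, which is standard C++11 and postdates Visual C++ 2005, and the function name is hypothetical. The volatile variables keep the division at run time so that the flag is actually raised.

```cpp
#include <cassert>
#include <cfenv>

// Returns true if dividing numerator by denominator raised the
// divide-by-zero floating point status flag.
bool division_raises_divbyzero(double numerator, double denominator) {
    volatile double n = numerator;    // volatile prevents constant folding
    volatile double d = denominator;
    std::feclearexcept(FE_ALL_EXCEPT);
    volatile double quotient = n / d;
    (void)quotient;
    return std::fetestexcept(FE_DIVBYZERO) != 0;
}
```

Code like this only gives dependable answers when the compiler does not move or remove floating point operations around the flag checks, which is the guarantee a strict model is meant to provide.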
What is the output?
#include <stdio.h>
int main()
{
    double x, y, z;
    double sum;
    x = 1e20;
    y = -1e20;
    z = 10.0;
    sum = x + y + z;
    printf("sum=%f\n", sum);
}
/fp:fast /O2 = 0.000
/fp:strict /O2 = 10.0
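The two results come from the order in which the additions are evaluated. A small sketch that makes both orders explicit; the function names are hypothetical, and the volatile temporaries are an assumption added to force each partial sum to be rounded to double.

```cpp
#include <cassert>

// Source order: (x + y) + z. The large values cancel first, so z survives.
double sum_left_to_right(double x, double y, double z) {
    volatile double partial = x + y;
    return partial + z;
}

// Reassociated order that a fast model is allowed to pick: x + (y + z).
// z is far below the rounding granularity of y, so y + z rounds back to y.
double sum_reassociated(double x, double y, double z) {
    volatile double partial = y + z;
    return x + partial;
}
```

With x = 1e20, y = -1e20 and z = 10.0, the first function returns 10.0 and the second returns 0.0, matching the two outputs on the slide.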
OpenMP
► A specification for writing multithreaded programs
► It consists of a set of simple #pragmas and runtime routines
► Makes it very easy to parallelize loop-based code
► Helps with load balancing, synchronization, etc…
► In Visual Studio, only available in C++
OpenMP Parallelization
► Can parallelize loops and straight-line code
► Includes synchronization constructs
void test(int first, int last) {
    #pragma omp parallel for
    for (int i = first; i <= last; ++i) {
        a[i] = b[i] + c[i];
    }
}
[Figure: with first = 1 and last = 1000, the iterations are split across four threads: 1 ≤ i ≤ 250, 251 ≤ i ≤ 500, 501 ≤ i ≤ 750, 751 ≤ i ≤ 1000]
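A self-contained sketch of the same loop, plus a reduction, which is one way OpenMP handles an accumulator shared between threads. The function names and the use of std::vector are assumptions. In Visual C++ the pragmas take effect when compiling with /openmp; without OpenMP support they are ignored and the loops run serially with the same result.

```cpp
#include <cassert>
#include <vector>

// Element-wise add: iterations are independent, so they can be divided
// among threads.
void add_arrays(const std::vector<double>& b,
                const std::vector<double>& c,
                std::vector<double>& a) {
    assert(a.size() == b.size() && a.size() == c.size());
    const int n = static_cast<int>(a.size());
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        a[i] = b[i] + c[i];
    }
}

// Sum with a reduction clause: each thread keeps a private partial sum
// and the partial sums are combined when the loop finishes.
double sum_array(const std::vector<double>& values) {
    double total = 0.0;
    const int n = static_cast<int>(values.size());
    #pragma omp parallel for reduction(+:total)
    for (int i = 0; i < n; ++i) {
        total += values[i];
    }
    return total;
}
```

Without the reduction clause, concurrent updates to total would race. For general floating point data the parallel sum may differ slightly from the serial one because the addition order changes.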
64-bit Compiler in VC2005
► 64-bit Compiler Cross Tools
    Compiler is 32-bit but the resulting image is 64-bit
► 64-bit Compiler Native Tools
    Compiler and resulting image are 64-bit binaries.
► All previous optimizations apply to 64-bit as well.
Resources
► Visual C++ Dev Center
    http://msdn.microsoft.com/visualc
    This is the place to go for all our news and whitepapers
    Also VC2005 specific forums at http://forums.microsoft.com
► Myself
    http://blogs.msdn.com/aymans