Transcript: Loop Optimization Methods for Programs (PPT)
Improving Loop-Level Parallelism
Chen Jian (陈健)
2002/11
Copyright © 2002 Intel Corporation
Agenda
Introduction
Who Cares?
Definition
Loop Dependence and Removal
Dependency Identification Lab
Summary
Introduction
Loops must meet certain criteria…
–Iteration Independence
–Memory Disambiguation
–High Loop Count
–Etc…
Who Cares?
Achieving true parallelism:
– OpenMP
– Auto Parallelization…
Explicit instruction-level parallelism (ILP):
– Streaming SIMD (MMX, SSE, SSE2, …)
– Software Pipelining on Intel® Itanium™ Processor
– Remove Dependencies for the Out-of-Order Core
– More instructions run in parallel on Intel® Itanium™ Processor
Automatic compiler parallelization:
– High Level Optimizations
Definition
Loop Independence: iteration Y of a loop is independent of when, or whether, iteration X happens.

int a[MAX];
for (J=0; J<MAX; J++) {
    a[J] = b[J];
}
Legend
OpenMP: True Parallelism
SIMD: Vectorization
SWP: Software Pipelining
OOO: Out-of-Order Core
ILP: Instruction Level Parallelism
Green: Benefits from concept
Yellow: Some benefit from concept
Red: No benefit from concept
Agenda
Definition
Who Cares?
Loop Dependence and Removal
–Data Dependencies
–Removing Dependencies
–Data Ambiguity and the Compiler
Dependency Removal Lab
Summary
Flow Dependency
Read After Write, Cross-Iteration
Flow Dependence: a variable is written in one iteration, then read in a later iteration.

for (J=1; J<MAX; J++) {
    A[J]=A[J-1];
}

First iterations:
A[1]=A[0];
A[2]=A[1];
Anti-Dependency
Write After Read, Cross-Iteration
Anti-Dependence: a variable is read in one iteration, then written in a later iteration.

for (J=1; J<MAX; J++) {
    A[J]=A[J+1];
}

First iterations:
A[1]=A[2];
A[2]=A[3];
Output Dependency
Write After Write, Cross-Iteration
Output Dependence: a variable is written in one iteration, then written again in a different iteration.

for (J=1; J<MAX; J++) {
    A[J]=B[J];
    A[J+1]=C[J];
}

First iterations (note A[2] is written twice):
A[1]=B[1];
A[2]=C[1];
A[2]=B[2];
A[3]=C[2];
Intra-Iteration Dependency
A dependency within a single iteration.
Hurts ILP.
May be automatically removed by the compiler.

K = 1;
for (J=1; J<MAX; J++) {
    A[J]=A[J] + 1;
    B[K]=A[K] + 1;
    K = K + 2;
}

First iteration:
A[1] = A[1] + 1;
B[1] = A[1] + 1;
Remove Dependencies
The best choice, and a requirement for true parallelism.
Not all dependencies can be removed.

Before:
for (J=1; J<MAX; J++) {
    A[J]=A[J-1] + 1;
}

After:
for (J=1; J<MAX; J++) {
    A[J]=A[0] + J;
}
Increasing ILP Without Removing Dependencies
Good: unroll the loop.
Make sure the compiler can't, or didn't already, do this for you.
The compiler should not apply common subexpression elimination.
Also note that if this is floating-point data, precision could be altered, because the additions are re-associated.

Before:
for (J=1; J<MAX; J++) {
    A[J]=A[J-1] + B[J];
}

After (unrolled by two):
for (J=1; J<MAX; J+=2) {
    A[J]=A[J-1] + B[J];
    A[J+1]=A[J-1] + (B[J] + B[J+1]);
}
Induction Variables
Induction variables are incremented on each trip through the loop.
Fix by replacing the increment expressions with a pure function of the loop index.

Before:
i1 = 0;
i2 = 0;
for (J=0; J<MAX; J++) {
    i1 = i1 + 1;
    B(i1) = …
    i2 = i2 + J;
    A(i2) = …
}

After (inside iteration J, i1 = J + 1 and i2 = (J² + J)/2):
for (J=0; J<MAX; J++) {
    B(J+1) = ...
    A((J**2 + J)/2) = ...
}
Reductions
Reductions collapse array data to scalar data via associative operations:

for (J=0; J<MAX; J++)
    sum = sum + c[J];

Take advantage of associativity: compute partial sums, or a local maximum, in private storage.
Next, combine the partial results into the shared result, taking care to synchronize access.
Data Ambiguity and the Compiler
Are the loop iterations independent? The C++ compiler has no idea.

void func(int *a, int *b) {
    for (J=0; J<MAX; J++) {
        a[J] = b[J];
    }
}

No chance for optimization: to run error-free, the compiler must assume that a and b may overlap.
Function Calls
Generally, function calls inhibit ILP.

for (J=0; J<MAX; J++) {
    compute(a[J], b[J]);
    a[J][1]=sin(b[J]);
}

Exceptions:
– Transcendentals
– IPO-compiled calls
Function Calls with State
Many routines maintain state across calls:
– Memory allocation
– Pseudo-random number generators
– I/O routines
– Graphics libraries
– Third-party libraries
Parallel access to such routines is unsafe unless synchronized.
Check the documentation for specific functions to determine thread-safety.
A Simple Test
1. Reverse the loop order and rerun in serial.
2. If the results are unchanged, the loop is independent.*
*Exception: loops with induction variables.

Original:
for (J=0; J<MAX; J++) {
    <...>
    compute(J, ...)
    <...>
}

Reversed:
for (J=MAX-1; J>=0; J--) {
    <...>
    compute(J, ...)
    <...>
}
Summary
Loop Independence: loop iterations are independent of each other.
Explained its importance:
– ILP and parallelism
Identified common causes of loop dependence:
– Flow dependency, anti-dependency, output dependency
Taught some methods of fixing loop dependence.
Reinforced the concepts through a lab.