chapter5.pptx

Chapter 5 Optimizing
Program Performance
Guobao Jiang (蒋国宝)
[email protected]
Problem 5.1 (P381)
• The following problem illustrates how
memory aliasing can cause unexpected
program behavior. Consider the following
procedure to swap two values:
void swap(int *xp, int *yp)
{
    *xp = *xp + *yp; /* x+y */
    *yp = *xp - *yp; /* x+y-y = x */
    *xp = *xp - *yp; /* x+y-x = y */
}
If this procedure is called with xp equal to yp,
what effect will it have?
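The aliased case can be tried directly. The sketch below repeats the procedure from the slide and adds `swap_checked`, a hypothetical guarded wrapper that is not part of the original problem.

```c
/* Arithmetic swap from the problem statement (no temporary variable). */
void swap(int *xp, int *yp)
{
    *xp = *xp + *yp; /* x+y */
    *yp = *xp - *yp; /* x+y-y = x */
    *xp = *xp - *yp; /* x+y-x = y */
}

/* Hypothetical helper: skips the swap when both pointers alias. */
void swap_checked(int *xp, int *yp)
{
    if (xp != yp)
        swap(xp, yp);
}
```

With distinct pointers the values are exchanged. With `swap(&a, &a)` the second statement subtracts the location from itself, so `a` ends up as 0 regardless of its starting value.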
2016/8/6
Problem 5.2 (P384)
• Later in this chapter we will take a single
function and generate many different variants
that preserve the function’s behavior, but with
different performance characteristics. For
three of these variants, we found that the run
times (in clock cycles) can be approximated by
the following functions:
• Version 1 60 + 35n
• Version 2 136 + 4n
• Version 3 157 + 1.25n
For what values of n would each version be the
fastest of the three? Remember that n will
always be an integer.
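One way to explore the question is to evaluate the three estimates for each integer n. The function names below are illustrative and not from the slides; the coefficients are the ones given above.

```c
/* Estimated run times in clock cycles, from the problem statement. */
double cycles_v1(int n) { return 60.0 + 35.0 * n; }
double cycles_v2(int n) { return 136.0 + 4.0 * n; }
double cycles_v3(int n) { return 157.0 + 1.25 * n; }

/* Return 1, 2, or 3: the version with the lowest estimate for this n.
 * Ties would go to the lower-numbered version (an arbitrary choice). */
int fastest_version(int n)
{
    int best = 1;
    double best_t = cycles_v1(n);

    if (cycles_v2(n) < best_t) {
        best = 2;
        best_t = cycles_v2(n);
    }
    if (cycles_v3(n) < best_t)
        best = 3;
    return best;
}
```

Calling `fastest_version` for n = 0, 1, 2, ... shows the result changing from 1 to 2 at n = 3 and from 2 to 3 at n = 8.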
Problem 5.3 (P391)
• Consider the following functions:
int min(int x, int y) { return x < y ? x : y; }
int max(int x, int y) { return x < y ? y : x; }
void incr(int *xp, int v) { *xp += v; }
int square(int x) { return x * x; }
• The following three code fragments call
these functions:
Problem 5.3+ (P391)
A. for (i = min(x, y); i < max(x, y); incr(&i, 1))
       t += square(i);
B. for (i = max(x, y) - 1; i >= min(x, y); incr(&i, -1))
       t += square(i);
C. int low = min(x, y);
   int high = max(x, y);
   for (i = low; i < high; incr(&i, 1))
       t += square(i);
Assume x=10 and y=100. Fill in the table:
Code   min   max   incr   square
A.     1     91    90     90
B.     91    1     90     90
C.     1     1     90     90
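The call counts can be checked by instrumenting the four functions. The counters and the `fragment_a`, `fragment_b`, and `fragment_c` wrappers below are hypothetical additions for this sketch; the loop bodies are the fragments from the slide.

```c
/* Call counters: hypothetical instrumentation added for this sketch. */
int n_min, n_max, n_incr, n_square;

int min(int x, int y) { n_min++; return x < y ? x : y; }
int max(int x, int y) { n_max++; return x < y ? y : x; }
void incr(int *xp, int v) { n_incr++; *xp += v; }
int square(int x) { n_square++; return x * x; }

void reset_counts(void) { n_min = n_max = n_incr = n_square = 0; }

/* Fragment A: max() is re-evaluated on every loop test. */
int fragment_a(int x, int y)
{
    int i, t = 0;
    reset_counts();
    for (i = min(x, y); i < max(x, y); incr(&i, 1))
        t += square(i);
    return t;
}

/* Fragment B: min() is re-evaluated on every loop test. */
int fragment_b(int x, int y)
{
    int i, t = 0;
    reset_counts();
    for (i = max(x, y) - 1; i >= min(x, y); incr(&i, -1))
        t += square(i);
    return t;
}

/* Fragment C: both bounds are computed once, before the loop. */
int fragment_c(int x, int y)
{
    int i, t = 0;
    int low, high;
    reset_counts();
    low = min(x, y);
    high = max(x, y);
    for (i = low; i < high; incr(&i, 1))
        t += square(i);
    return t;
}
```

After each call with x = 10 and y = 100, the counters hold the row of the table for that fragment. The loop test runs once more than the body, which is where the 91 comes from.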
Problem 5.4 (P415)
• At times, GCC does its own …
.L6:
    addl (%eax), %edx
    addl 4(%eax), %edx
    addl 8(%eax), %edx
    addl 12(%eax), %edx
    addl 16(%eax), %edx
    addl 20(%eax), %edx
    addl 24(%eax), %edx
    addl 28(%eax), %edx
    addl $32, %eax
    addl $8, %ecx
    cmpl %esi, %ecx
    jl .L6
• Question:
Write C code for a procedure combine5px8
that shows how pointers, loop variables, and
termination conditions are being computed by
this code. Show the general form with arbitrary
data and combining operation in the style of
Figure 5.19 (P392). Describe how it differs from
our handwritten pointer code (Figure 5.22).
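One possible reading of the loop is sketched below. It is not the book's solution: a plain int array stands in for the vector type, the combining operation is fixed to addition, and the bound `length - 7` is an assumption, since the assembly only shows a comparison against a value held in %esi.

```c
/* Sketch only: plain int array and integer addition are assumptions. */
void combine5px8(const int *data, int length, int *dest)
{
    int limit = length - 7; /* assumed loop bound, compared with i (%esi) */
    int x = 0;              /* accumulator (%edx) */
    int i;                  /* element index, stepped by 8 (%ecx) */

    /* data plays the role of %eax: fixed offsets, then advance 8 ints. */
    for (i = 0; i < limit; i += 8) {
        x = x + data[0] + data[1] + data[2] + data[3]
              + data[4] + data[5] + data[6] + data[7];
        data += 8;
    }

    /* Finish any remaining elements one at a time. */
    for (; i < length; i++) {
        x += *data;
        data++;
    }
    *dest = x;
}
```

The point of the sketch is that the pointer is advanced but never tested. An integer index, kept alongside the pointer, decides when the loop ends.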
Problem 5.5 (P421)
• The following shows the code generated
from a variant of combine6 (P416) that uses
eight-way loop unrolling and four-way
parallelism.
.L152:
    addl (%eax), %ecx
    addl 4(%eax), %esi
    addl 8(%eax), %edi
    addl 12(%eax), %ebx
    addl 16(%eax), %ecx
    addl 20(%eax), %esi
    addl 24(%eax), %edi
    addl 28(%eax), %ebx
    addl $32, %eax
    addl $8, %edx
    cmpl -8(%ebp), %edx
    jl .L152
• Questions:
A. What program variable is being spilled onto the stack?
B. At what location on the stack?
C. Why is this a good choice of which value to spill?
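For orientation, a C loop of roughly this shape could produce such code. This is a sketch under assumptions, not the book's combine6 variant: the function name, the plain int array, and the `length - 7` bound are all illustrative.

```c
/* Sketch: eight-way unrolling with four independent accumulators. */
void combine_8x4(const int *data, int length, int *dest)
{
    int limit = length - 7;          /* assumed loop bound */
    int x0 = 0, x1 = 0, x2 = 0, x3 = 0;
    int i;

    for (i = 0; i < limit; i += 8) {
        /* Each accumulator receives two of the eight elements. */
        x0 += data[0];
        x1 += data[1];
        x2 += data[2];
        x3 += data[3];
        x0 += data[4];
        x1 += data[5];
        x2 += data[6];
        x3 += data[7];
        data += 8;
    }

    /* Remaining elements go into the first accumulator. */
    for (; i < length; i++) {
        x0 += *data;
        data++;
    }
    *dest = x0 + x1 + x2 + x3;
}
```

The loop keeps seven values live at once: four accumulators, the data pointer, the index, and the bound. That is more than the six registers the assembly uses, which is why one value has to live in memory.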
Problem 5.6 (P422)
• Consider the following function for computing the product
of an array of n integers. We have unrolled the loop by a
factor of 3.
int aprod(int a[], int n)
{
    int i, x, y, z;
    int r = 1;
    for (i = 0; i < n-2; i += 3) {
        x = a[i]; y = a[i+1]; z = a[i+2];
        r = r*x*y*z; /* Product computation */
    }
    for (; i < n; i++)
        r *= a[i];
    return r;
}
Problem 5.6+ (P422)
• For the line labeled Product computation, we
can use parentheses to create five different
associations of the computation, as follows:
r = ((r * x) * y) * z; /* A1 */
r = (r * (x * y)) * z; /* A2 */
r = r * ((x * y) * z); /* A3 */
r = r * (x * (y * z)); /* A4 */
r = (r * x) * (y * z); /* A5 */
• Recall from Figure 5.12 that the integer
multiplication operation on this machine has a
latency of 4 cycles and an issue time of 1 cycle.
• The table that follows shows some
values of the CPE, with other values
missing. Fill in the missing entries.
Version   Measured CPE   Theoretical CPE
A1        4.00           12/3 = 4
A2        2.67           8/3 = 2.67
A3        1.67           4/3 = 1.33
A4        1.67           4/3 = 1.33
A5        2.67           8/3 = 2.67
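The theoretical column follows from counting how many multiplications lie on the chain from the old r to the new r in each association, then multiplying by the 4-cycle latency and dividing by the 3 elements handled per iteration. The sketch below encodes that count by hand; the names are illustrative.

```c
#define MUL_LATENCY 4 /* cycles, from Figure 5.12 as quoted above */
#define UNROLL 3      /* elements processed per loop iteration */

/* Multiplications on the r-to-r dependency chain, counted by hand:
 *   A1 ((r*x)*y)*z : 3      A2 (r*(x*y))*z : 2
 *   A3 r*((x*y)*z) : 1      A4 r*(x*(y*z)) : 1
 *   A5 (r*x)*(y*z) : 2
 * Products of x, y, z alone do not depend on r, so they are off the chain. */
static const int chain_len[5] = { 3, 2, 1, 1, 2 };

/* Theoretical CPE for association A1..A5 (version = 1..5). */
double theoretical_cpe(int version)
{
    return (double)(chain_len[version - 1] * MUL_LATENCY) / UNROLL;
}
```

For example, `theoretical_cpe(1)` evaluates 12/3 and `theoretical_cpe(3)` evaluates 4/3.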
Problem 5.7 (P428)
• A friend of yours has written …
int deref(int *xp)
{
    return xp ? *xp : 0;
}
The compiler generates the following code for the
body of the procedure.
    movl 8(%ebp), %edx      Get xp
    movl (%edx), %eax       Get *xp as result
    testl %edx, %edx        Test xp
    cmovzl %edx, %eax       If 0, copy 0 to result
Explain why this code does not provide a valid
implementation of deref.
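To make the instruction order easier to see, the generated code can be written back out in C. `deref_as_compiled` is a hypothetical name and is only an illustration of the ordering; it must not be called with a null pointer.

```c
#include <stddef.h>

/* Source-level behavior: the pointer is tested before it is dereferenced. */
int deref(int *xp)
{
    return xp ? *xp : 0;
}

/* Literal rendering of the generated instruction order (illustration only).
 * The load runs first, so passing NULL here dereferences a null pointer. */
int deref_as_compiled(int *xp)
{
    int result = *xp;   /* movl (%edx), %eax : executed unconditionally */
    if (xp == NULL)     /* testl %edx, %edx */
        result = 0;     /* cmovzl %edx, %eax */
    return result;
}
```

For a valid pointer the two functions agree. They differ only in the null case, which is the case the conditional expression was written to protect.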
Problem 5.8 (P436)
• As another example of code with potential
load-store interactions, consider the following
function to copy the contents of one array to
another:
void copy_array(int *src, int *dest, int n)
{
    int i;
    for (i = 0; i < n; i++)
        dest[i] = src[i];
}
Suppose a is an array of length 1000 initialized
so that each element a[i] equals i.
Problem 5.8+ (P436)
• A. What would be the effect of the call
copy_array(a+1, a, 999)?
• B. What would be the effect of the call
copy_array(a, a+1, 999)?
• C. Our performance measurements indicate
that the call of part A has a CPE of 3.00,
while the call of part B has a CPE of 5.00.
To what factor do you attribute this
performance difference?
• D. What performance would you expect for
the call copy_array(a, a, 999)?
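The functional effect of the overlapping calls in parts A and B can be observed with a small harness. `init_ramp` is a hypothetical helper that sets up the array described in the problem; timing is not measured here.

```c
/* Element-by-element copy, as given in the problem. */
void copy_array(int *src, int *dest, int n)
{
    int i;
    for (i = 0; i < n; i++)
        dest[i] = src[i];
}

/* Hypothetical helper: initialize a[i] = i for 0 <= i < n. */
void init_ramp(int *a, int n)
{
    int i;
    for (i = 0; i < n; i++)
        a[i] = i;
}
```

With a ramp of length 1000, call A reads each element before it is overwritten, so the contents shift down by one position. In call B every load reads the location written by the previous iteration's store, so a[0] is propagated through the whole array.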
Problem 5.9 (P443)
• Suppose you work as a truck driver, and you have
been hired to carry a load of potatoes from Boise,
Idaho to Minneapolis, Minnesota, a total distance of
2500 kilometers. You estimate you can average 100
km/hr driving within the speed limits, requiring a
total of 25 hours for the trip.
• A. You hear on the news that Montana has just
abolished its speed limit, which constitutes 1500
km of the trip. Your truck can travel at 150 km/hr.
What will be your speedup for the trip?
• B. You can buy a new turbocharger for your truck
at www.fasttrucks.com. They stock a variety of
models, but the faster you want to go, the more it
will cost. How fast must you travel through
Montana to get an overall speedup for your trip of
5/3?
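Both parts reduce to comparing trip times, which can be sketched as follows. The function and parameter names are illustrative; distances are in kilometers and speeds in km/hr.

```c
/* Hours for a trip where fast_km of total_km is driven at fast_kmh
 * and the remainder at base_kmh. */
double trip_hours(double total_km, double base_kmh,
                  double fast_km, double fast_kmh)
{
    return (total_km - fast_km) / base_kmh + fast_km / fast_kmh;
}

/* Overall speedup relative to driving the whole trip at base_kmh. */
double trip_speedup(double total_km, double base_kmh,
                    double fast_km, double fast_kmh)
{
    return (total_km / base_kmh)
         / trip_hours(total_km, base_kmh, fast_km, fast_kmh);
}

/* Speed needed on the fast stretch to reach a target overall speedup.
 * Assumes the target is reachable, i.e. the time left for the fast
 * stretch is positive. */
double required_fast_kmh(double total_km, double base_kmh,
                         double fast_km, double target_speedup)
{
    double allowed_hours = (total_km / base_kmh) / target_speedup;
    double slow_hours = (total_km - fast_km) / base_kmh;
    return fast_km / (allowed_hours - slow_hours);
}
```

For part A, `trip_speedup(2500, 100, 1500, 150)` compares 25 hours with 10 + 10 hours. For part B, `required_fast_kmh(2500, 100, 1500, 5.0 / 3.0)` leaves 15 - 10 = 5 hours for the 1500 km through Montana.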
Problem 5.10 (P444)
• The marketing department at your
company has promised your customers that
the next software release will show a 2X
performance improvement. You have been
assigned the task of delivering on that
promise. You have determined that only
80% of the system can be improved. How
much (i.e., what value of k) would you need
to improve this part to meet the overall
performance target?
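This is an application of Amdahl's law: if a fraction alpha of the run time is sped up by a factor k, the overall speedup is 1 / ((1 - alpha) + alpha / k). The sketch below evaluates that formula and inverts it for k; the function names are illustrative.

```c
/* Amdahl's law: overall speedup when a fraction alpha of the original
 * run time is improved by a factor k. */
double amdahl_speedup(double alpha, double k)
{
    return 1.0 / ((1.0 - alpha) + alpha / k);
}

/* Factor k needed on the improvable fraction alpha to reach a target
 * overall speedup. Assumes the target is below the limit 1 / (1 - alpha). */
double required_k(double alpha, double target)
{
    return alpha / (1.0 / target - (1.0 - alpha));
}
```

With alpha = 0.8 and a target of 2, `required_k` evaluates 0.8 / (0.5 - 0.2), which is about 2.67.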
Summary
• 1. Optimization blockers
  A. memory aliasing
  B. function calls
• 2. Performance improvement techniques
  A. High-level design
     algorithms and data structures
  B. Basic coding principles
     eliminate excessive function calls
     eliminate unnecessary memory references
  C. Low-level optimizations
     pointer versus array code
     reduce loop overhead by unrolling loops
     make use of the pipelined functional units by
     iteration splitting
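As a small illustration of two items from this list, the sketch below contrasts a sum that updates memory on every iteration with one that accumulates in a local variable. The function names are hypothetical; they are in the spirit of the chapter's combining functions but are not taken from it.

```c
/* Reads and writes *dest on every iteration. */
void sum_via_memory(const int *data, int n, int *dest)
{
    int i;
    *dest = 0;
    for (i = 0; i < n; i++)
        *dest += data[i];
}

/* Accumulates in a local variable and stores the result once. */
void sum_via_local(const int *data, int n, int *dest)
{
    int i;
    int acc = 0;
    for (i = 0; i < n; i++)
        acc += data[i];
    *dest = acc;
}
```

The two versions agree when `dest` is separate from the array. If `dest` points at the last element of `data`, they can produce different results, which is why a compiler cannot make this transformation on its own: memory aliasing blocks it.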
Assignments
• 5.15 (P448)
• 5.17 (P448)
• 5.19 (P450)
• Note: Due next Monday (May 28, 2012)
• These slides will be uploaded to ftp:10.141.247.12
Q&A ?
Thank you!
website
• http://jpkc.fudan.edu.cn/s/258/main.htm
• http://10.108.0.74/s/258/main.jspy
• ftp: 10.141.247.12 usr:ics2012 pwd:ics2012