Chapter 1 -- Introduction


CSE 522
WCET Analysis
Computer Science & Engineering Department
Arizona State University
Tempe, AZ 85287
Dr. Yann-Hang Lee
(480) 727-7507
Some of the slides were based on the lecture by G. Fainekos (ASU)
Execution Time – WCET & BCET
(Figure from R. Wilhelm et al., ACM Trans. Embed. Comput. Syst., 2007.)
The WCET Problem
 Given
 the code for a software task
 the platform (OS + hardware) that it will run on
 Determine the WCET of the task.
 Why is this problem important?
 The WCET is central to the design of real-time computing systems
 Can the WCET always be found?
 For tasks that are guaranteed to terminate (bounded loops and recursion), this is not a decidability problem but a complexity problem
 Approach: compute bounds on the execution times of instructions and
basic blocks, then determine a longest path in the basic-block graph of the program
Components of Execution Time Analysis
 Program path (Control flow) analysis
 Want to find longest path through the program
 Identify feasible paths through the program
 Find loop bounds
 Identify dependencies amongst different code fragments
 Processor behavior analysis
 For small code fragments (basic blocks), generate bounds on
run-times on the platform
 Model details of architecture, including cache behavior,
pipeline stalls, branch prediction, etc.
 The outputs of both analyses feed into each other
Program Path Analysis: Overall Approach (1)
 Construct Control-Flow Graph (CFG) for the task
 Nodes represent Basic Blocks of the task
 Basic block: a sequence of consecutive program statements
where there is no possibility of branching
 We have a single entry and a single exit node
 Edges represent flow of control (jumps, branches, calls, …)
 The problem is to identify the longest path in the CFG
 Note: the CFG can contain loops, so we need to infer loop bounds and
unroll the loops
 Unrolling gives a directed acyclic graph (DAG). How do we find
the longest path in this DAG?
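One way to answer the question above is a single pass over the blocks in topological order. The following is a minimal sketch, not the method of any particular WCET tool. It assumes the blocks are already numbered in topological order with block 0 as the entry and block n-1 as the exit; the names `struct dag`, `MAX_BLOCKS`, and `longest_path` are illustrative and do not come from the slides.

```c
#include <stddef.h>

#define MAX_BLOCKS 16  /* illustrative limit, not from the slides */

/* A loop-free CFG. Blocks are assumed to be numbered in topological
 * order, so every edge goes from a lower index to a higher one. */
struct dag {
    size_t n;                           /* number of blocks */
    long cost[MAX_BLOCKS];              /* c_i: worst-case time of block i */
    int edge[MAX_BLOCKS][MAX_BLOCKS];   /* edge[u][v] != 0 if u -> v */
};

/* Returns the cost of the most expensive path from block 0 (entry)
 * to block n-1 (exit), or -1 if the exit is unreachable. */
long longest_path(const struct dag *g)
{
    long best[MAX_BLOCKS];

    if (g->n == 0 || g->n > MAX_BLOCKS)
        return -1;
    for (size_t v = 0; v < g->n; v++)
        best[v] = -1;                   /* -1 marks "not reached yet" */
    best[0] = g->cost[0];

    /* Relax the outgoing edges of each block in topological order. */
    for (size_t u = 0; u < g->n; u++) {
        if (best[u] < 0)
            continue;
        for (size_t v = u + 1; v < g->n; v++) {
            if (g->edge[u][v] && best[u] + g->cost[v] > best[v])
                best[v] = best[u] + g->cost[v];
        }
    }
    return best[g->n - 1];
}
```

The pass treats every path in the DAG as feasible, so its result can be pessimistic when some paths can never execute.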
Program Path Analysis: Overall Approach (2)
 In a CFG
 Bi = basic block i
 xi = number of times the block Bi is executed
 dj = number of times edge j is executed
 ci = worst case running time of block Bi
 Objective: maximize $\sum_i c_i x_i$
 How to get xi?
 Structural constraints
 Functionality constraints
 Loop bounds -- need to be known
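Put together, the counts and constraints above form an integer linear program. The block below is a sketch of its general shape; the sets $\mathrm{in}(B_i)$ and $\mathrm{out}(B_i)$, the names $d_{\text{entry}}$, $x_{\text{body}}$, $x_{\text{pre}}$, and the bound $n$ are notation assumed here for illustration.

```latex
\begin{aligned}
\text{maximize}\quad & \sum_{i} c_i x_i \\
\text{subject to}\quad
 & x_i = \sum_{j \in \mathrm{in}(B_i)} d_j = \sum_{j \in \mathrm{out}(B_i)} d_j
   && \text{(structural: flow into } B_i \text{ equals flow out)} \\
 & d_{\text{entry}} = 1
   && \text{(the task is entered once)} \\
 & x_{\text{body}} \le n \cdot x_{\text{pre}}
   && \text{(loop bound: at most } n \text{ iterations per loop entry)} \\
 & x_i \ge 0,\ d_j \ge 0,\ \text{integer}
\end{aligned}
```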
CFG Example
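Code:
    N = 10;
    q = 0;
    while (q < N)
        q++;
    q = r;
CFG (xi = execution count of block Bi, dj = execution count of edge j):
 B1: N = 10; q = 0; (in: d1, out: d2)
 B2: while (q < N) (in: d2, d4; out: d3 when the test is 1, d5 when it is 0)
 B3: q++; (in: d3, out: d4)
 B4: q = r; (in: d5, out: d6)
Want to maximize $\sum_i c_i x_i$ subject to the constraints
    x1 = d1 = d2
    d1 = 1
    x2 = d2 + d4 = d3 + d5
    x3 = d3 = d4 = 10
    x4 = d5 = d6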
Example due to Y.T. Li and S. Malik
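In this example the constraints leave no freedom, so the maximization can be carried out by substitution. The worked steps below use only the constraints and symbols of the example.

```latex
\begin{aligned}
d_1 = 1 \;&\Rightarrow\; x_1 = d_2 = 1 \\
x_3 = d_3 = d_4 = 10 \;&\Rightarrow\; x_2 = d_2 + d_4 = 11 \\
x_2 = d_3 + d_5 \;&\Rightarrow\; d_5 = 11 - 10 = 1 \\
x_4 = d_5 = d_6 \;&\Rightarrow\; x_4 = 1 \\
\max \sum_i c_i x_i \;&=\; c_1 + 11\,c_2 + 10\,c_3 + c_4
\end{aligned}
```

The loop test B2 runs once more than the loop body B3 because of the final failing test.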
CFG – Another example
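Code:
    /* k >= 0 */
    s = k;
    while (k < 10) {
        if (ok)
            j++;
        else {
            j = 0;
            ok = true;
        }
        k++;
    }
    r = j;
CFG (xi = execution count of block Bi; edges d1 to d10):
 B1: s = k;
 B2: while (k < 10) {
 B3: if (ok)
 B4: j++;
 B5: j = 0; ok = true;
 B6: k++;
 B7: r = j;
 Edges: d1 (entry to B1), d2 (B1 to B2), d3 (B2 to B3), d4 (B3 to B4), d5 (B3 to B5), d6 (B4 to B6), d7 (B5 to B6), d8 (B6 to B2), d9 (B2 to B7), d10 (B7 to exit)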
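The structural constraints for this second example can be written down block by block. The sketch below assumes the edge numbering d1 (entry to B1), d2 (B1 to B2), d3 (B2 to B3), d4 (B3 to B4), d5 (B3 to B5), d6 (B4 to B6), d7 (B5 to B6), d8 (B6 to B2), d9 (B2 to B7), d10 (B7 to exit); the last two lines are read off the code rather than the graph.

```latex
\begin{aligned}
d_1 &= 1 \\
x_1 &= d_1 = d_2 \\
x_2 &= d_2 + d_8 = d_3 + d_9 \\
x_3 &= d_3 = d_4 + d_5 \\
x_4 &= d_4 = d_6 \\
x_5 &= d_5 = d_7 \\
x_6 &= d_6 + d_7 = d_8 \\
x_7 &= d_9 = d_{10} \\
x_3 &\le 10\,x_1 && \text{(loop bound: } k \ge 0 \text{, so at most 10 iterations)} \\
x_5 &\le x_1 && \text{(once ok is true, the else branch is not taken again)}
\end{aligned}
```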
Functionality Constraints
check_data() {                               /* x1 */
    int i, morecheck, wrongone;
    morecheck = 1; i = 0; wrongone = -1;     /* x2 */
    while (morecheck) {                      /* x3 */
        if (data[i] < 0) {                   /* x4 */
            wrongone = i; morecheck = 0;     /* x5 */
        }
        else
            if (++i >= 10)                   /* x6 */
                morecheck = 0;               /* x7 */
    }
    if (wrongone >= 0)                       /* x8 */
        return 0;                            /* x9 */
    else
        return 1;                            /* x10 */
}
Constraints
x2 ≤ x4
x4 ≤ 10 x2
(x5 = 0 & x7 = 1) |
(x5 = 1 & x7 = 0)
x5 = x9
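The functionality constraints can be checked empirically by counting how often each block runs. The sketch below instruments check_data with counters. It assumes the labeling x1 = function entry, x2 = initialization, x3 = loop test, x4 = first test in the loop body, x5 = the wrongone assignment, x6 = the ++i test, x7 = the morecheck reset, x8 = the final test, x9 = return 0, x10 = return 1, chosen to agree with the constraints; `DATASIZE`, the counter array `x`, and `constraints_hold` are illustrative names.

```c
#include <stddef.h>

#define DATASIZE 10

/* x[1]..x[10] count how often each labeled block runs; x[0] is unused.
 * The block numbering is an assumption chosen to match the constraints. */
static long x[11];

int check_data(const int data[DATASIZE])
{
    int i, morecheck, wrongone;

    x[1]++;                                       /* function entry */
    x[2]++; morecheck = 1; i = 0; wrongone = -1;
    while (x[3]++, morecheck) {                   /* count every loop test */
        x[4]++;
        if (data[i] < 0) {
            x[5]++; wrongone = i; morecheck = 0;
        } else {
            x[6]++;
            if (++i >= DATASIZE) {
                x[7]++; morecheck = 0;
            }
        }
    }
    x[8]++;
    if (wrongone >= 0) {
        x[9]++;
        return 0;
    }
    x[10]++;
    return 1;
}

/* Runs check_data once with fresh counters and reports whether the
 * three functionality constraints held for that run. */
int constraints_hold(const int data[DATASIZE])
{
    for (size_t k = 0; k < 11; k++)
        x[k] = 0;
    (void)check_data(data);

    int loop_bound = x[2] <= x[4] && x[4] <= 10 * x[2];
    int exclusive  = (x[5] == 0 && x[7] == 1) || (x[5] == 1 && x[7] == 0);
    int same_count = x[5] == x[9];
    return loop_bound && exclusive && same_count;
}
```

Passing runs do not prove the constraints; they only fail to refute them. In the ILP formulation the constraints are supplied by the analyst or by a separate program analysis.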
Micro-architectural Modeling -- Cache
 Modify cost function (cache hit and miss have different
costs)
 Add linear constraints to describe relationship between
cache hits and misses
 Basic idea
 Basic blocks assumed to be smaller than entire cache
 Subdivide instruction counts ($x_i$) into counts of cache hits ($x_i^{hit}$)
and misses ($x_i^{miss}$)
 A line-block (or l-block) is a contiguous sequence of code within
the same basic block that is mapped to the same cache line in
the instruction cache
 Within an l-block, the instructions either all hit or all miss
Basic Blocks to Line Blocks (Direct-Mapped Cache)
[Figure: basic blocks B1 (l-blocks B1.1, B1.2, B1.3), B2 (l-blocks B2.1, B2.2), and B3 (l-blocks B3.1, B3.2); each l-block is colored by the cache set (0 to 3) it maps to]
Cache Constraints:
 No conflicting l-blocks: $x_{2.2}^{miss} \le 1$ (only the first execution has a miss)
 Two nonconflicting l-blocks mapped to the same cache line: $x_{1.3}^{miss} + x_{2.1}^{miss} \le 1$
 Conflicting l-blocks: the hit and miss counts are affected by the execution sequence
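The mapping from basic blocks to l-blocks and cache sets can be computed from code addresses. The sketch below assumes a direct-mapped instruction cache described only by a line size and a number of sets; `struct cache_geom` and the function names are hypothetical, and the addresses are byte addresses of the code.

```c
#include <stdint.h>

/* Hypothetical direct-mapped instruction cache geometry. */
struct cache_geom {
    uint32_t line_size;   /* bytes per cache line */
    uint32_t num_sets;    /* number of lines in the direct-mapped cache */
};

/* Memory line that contains byte address addr. */
uint32_t mem_line(const struct cache_geom *c, uint32_t addr)
{
    return addr / c->line_size;
}

/* Cache set (the "color" in the figure) that addr maps to. */
uint32_t cache_set(const struct cache_geom *c, uint32_t addr)
{
    return mem_line(c, addr) % c->num_sets;
}

/* Number of l-blocks of a basic block occupying bytes [start, end):
 * one l-block per memory line the block touches. */
uint32_t count_lblocks(const struct cache_geom *c, uint32_t start, uint32_t end)
{
    if (end <= start)
        return 0;
    return mem_line(c, end - 1) - mem_line(c, start) + 1;
}

/* Two l-blocks conflict when they map to the same cache set but come
 * from different memory lines, so loading one evicts the other.
 * l-blocks in the same memory line share a set without conflicting. */
int lblocks_conflict(const struct cache_geom *c, uint32_t addr_a, uint32_t addr_b)
{
    return cache_set(c, addr_a) == cache_set(c, addr_b)
        && mem_line(c, addr_a) != mem_line(c, addr_b);
}
```

With 16-byte lines and 4 sets, a block at bytes [0, 40) splits into three l-blocks and a following block at [40, 56) into two. The last l-block of the first and the first l-block of the second share a memory line, which is the nonconflicting case listed above.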
Cache Conflict Graph
 Build one graph for every cache set containing two or more conflicting l-blocks
 Nodes: a start node, an end node, and a node Bk.l for every l-block in the cache set
 Edge from Bk.l to Bm.n: control can pass between them without
passing through any other l-block of the same cache set
 p(i.j, u.v): the number of times that control passes through the edge from Bi.j to Bu.v
[Figure: cache conflict graph with nodes start, Bk.l, Bm.n, and end, and edge counts p(s,k.l), p(s,m.n), p(s,e), p(k.l,k.l), p(k.l,m.n), p(m.n,k.l), p(m.n,m.n), p(k.l,e), p(m.n,e)]
Cache Constraints Example (1)
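[Figure: the CFG of the example with blocks B1 to B7 (counts x1 to x7, edges d1 to d10), with each basic block forming a single l-block, shown next to the cache that the l-blocks map to]
 B1.1: s = k;
 B2.1: while (k < 10) {
 B3.1: if (ok)
 B4.1: j++;
 B5.1: j = 0; ok = true;
 B6.1: k++;
 B7.1: r = j;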
Cache Constraints Example (2)
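$$\text{Total execution time} = \sum_{i}^{N} \sum_{j}^{n_i} \left( c_{i.j}^{hit}\, x_{i.j}^{hit} + c_{i.j}^{miss}\, x_{i.j}^{miss} \right)$$
$$x_i = \sum_{u.v} p(u.v,\, i.j) = \sum_{u.v} p(i.j,\, u.v)$$
$$\sum_{u.v} p(s,\, u.v) = \sum_{u.v} p(u.v,\, e) = 1$$
$$p(i.j,\, i.j) \le x_{i.j}^{hit} \le p(s,\, i.j) + p(i.j,\, i.j)$$
[Figure: cache conflict graph for l-blocks B4.1 and B5.1, with nodes S, B4.1, B5.1, E and edge counts p(s,4.1), p(s,5.1), p(4.1,4.1), p(4.1,5.1), p(5.1,4.1), p(5.1,5.1), p(4.1,e), p(5.1,e), p(s,e)]
[Figure: cache conflict graph for l-blocks B1.1 and B6.1, with nodes S, B1.1, B6.1, E and edge counts p(s,1.1), p(1.1,6.1), p(6.1,6.1), p(1.1,e), p(6.1,e)]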
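The bound on the hit count of an l-block can be illustrated on a concrete run. The sketch below takes the sequence of l-blocks of one cache set that a run passes through and counts hits, entries from the start node, and self-loops for one l-block. It assumes the cache line is empty when the run begins, so the first access always misses; `struct lblock_counts`, `count_lblock`, and `hit_bounds_hold` are illustrative names.

```c
#include <stddef.h>

/* Counts gathered for one l-block i.j within one cache set. */
struct lblock_counts {
    long hits;        /* x^hit of the l-block                 */
    long from_start;  /* p(s, i.j): l-block is first in trace */
    long self_loops;  /* p(i.j, i.j): l-block follows itself  */
};

/* trace[] lists, in execution order, the ids of the l-blocks of a
 * single cache set that one run passes through. The cache line is
 * assumed to be empty at the start, so the first access misses. */
struct lblock_counts count_lblock(const int *trace, size_t len, int id)
{
    struct lblock_counts c = {0, 0, 0};

    for (size_t t = 0; t < len; t++) {
        if (trace[t] != id)
            continue;
        if (t == 0) {
            c.from_start++;          /* entered from the start node */
        } else if (trace[t - 1] == id) {
            c.self_loops++;          /* edge from the l-block to itself */
            c.hits++;                /* line still holds this l-block */
        }
        /* otherwise a conflicting l-block ran in between: a miss */
    }
    return c;
}

/* Checks p(i.j,i.j) <= x^hit <= p(s,i.j) + p(i.j,i.j). */
int hit_bounds_hold(const struct lblock_counts *c)
{
    return c->self_loops <= c->hits
        && c->hits <= c->from_start + c->self_loops;
}
```

Under the empty-cache assumption the hit count equals the self-loop count, so the lower bound is met with equality. The extra p(s, i.j) term in the upper bound covers the case where the line already holds the l-block when the task starts.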
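Progress During the Past 10 Years
The explosion of penalties has been
compensated by a reduction of uncertainties!
[Figure: over-estimation and cache-miss penalty of WCET analyses over time; data points labeled 1995 (Lim et al.), 2002 (Thesing et al.), and 2005 (Souyris et al.); cache-miss penalty values 4, 25, 60, 200; over-estimation values 20-30%, 15%, 30-50%, 25%, 10%]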
Open Problems
 Architectures are getting much more complex.
 Can we create processor behavior models without the pain?
 Can we change the architecture to make timing analysis easier?
 Small changes to code and/or architecture require
completely re-doing the WCET computation
 Use robust techniques that learn about processor/platform
behavior
 Need more reliable ways to measure execution time
 References:
 Li, Malik, and Wolfe, “Cache Modeling for Real-Time Software:
Beyond Direct Mapped Instruction Caches”
 Wilhelm, “Determining bounds on execution times,” Handbook on
Embedded Systems, CRC Press, 2005