Chapter 1 -- Introduction
CSE 522
WCET Analysis
Computer Science & Engineering Department
Arizona State University
Tempe, AZ 85287
Dr. Yann-Hang Lee
(480) 727-7507
Some of the slides were based on the lecture by G. Fainekos (ASU)
Execution Time – WCET & BCET
(Figure from R. Wilhelm et al., ACM Trans. Embed. Comput. Syst., 2007.)
The WCET Problem
Given
the code for a software task
the platform (OS + hardware) that it will run on
Determine the WCET of the task.
Why is this problem important?
The WCET is central in the design of real-time computing
Can the WCET always be found?
For real-time tasks, whose loops and recursion are bounded, not a decidability problem, but a complexity problem
Compute bounds for the execution times of instructions and
basic blocks and determine a longest path in the basic-block graph of the program.
Components of Execution Time Analysis
Program path (Control flow) analysis
Want to find longest path through the program
Identify feasible paths through the program
Find loop bounds
Identify dependencies amongst different code fragments
Processor behavior analysis
For small code fragments (basic blocks), generate bounds on
run-times on the platform
Model details of architecture, including cache behavior,
pipeline stalls, branch prediction, etc.
Outputs of both analyses feed into each other
Program Path Analysis: Overall Approach (1)
Construct Control-Flow Graph (CFG) for the task
Nodes represent Basic Blocks of the task
Basic block: a sequence of consecutive program statements
where there is no possibility of branching
We have a single entry and a single exit node
Edges represent flow of control (jumps, branches, calls, …)
The problem is to identify the longest path in the CFG
Note: CFG can have loops, so need to infer loop bounds and
unroll them
This gives us a directed acyclic graph (DAG). How do we find
the longest path in this DAG?
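One way to answer the question above is dynamic programming over a topological order, which visits every node and edge once. The sketch below is a minimal illustration, assuming each node carries a worst-case cost; the function name, the graph, the node names, and the costs in the usage lines are hypothetical.

```python
from graphlib import TopologicalSorter

def longest_path(costs, edges, entry, exit_node):
    """Return (total cost, path) of the costliest entry-to-exit path in a DAG.

    costs: dict mapping node -> worst-case cost of that basic block
    edges: list of (src, dst) control-flow edges; the graph must be acyclic
    """
    preds = {n: [] for n in costs}
    for src, dst in edges:
        preds[dst].append(src)

    best = {entry: costs[entry]}   # best known cost of a path entry -> node
    parent = {}                    # predecessor on that best path

    # static_order() yields every node after all of its predecessors.
    for node in TopologicalSorter(preds).static_order():
        for p in preds[node]:
            if p in best and best[p] + costs[node] > best.get(node, float("-inf")):
                best[node] = best[p] + costs[node]
                parent[node] = p

    # Walk the parent links back from the exit to recover the path.
    path = [exit_node]
    while path[-1] != entry:
        path.append(parent[path[-1]])
    return best[exit_node], path[::-1]

# Hypothetical if-then-else: B1 branches to B2 or B3, both rejoin at B4.
costs = {"B1": 2, "B2": 5, "B3": 3, "B4": 1}
edges = [("B1", "B2"), ("B1", "B3"), ("B2", "B4"), ("B3", "B4")]
print(longest_path(costs, edges, "B1", "B4"))  # (8, ['B1', 'B2', 'B4'])
```

The sketch never enumerates paths, so it stays cheap even when the number of paths is exponential. It does not exclude infeasible paths, so its result is an upper bound on the longest feasible path.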
Program Path Analysis: Overall Approach (2)
In a CFG
$B_i$ = basic block i
$x_i$ = number of times block $B_i$ is executed
$d_j$ = number of times edge j is executed
$c_i$ = worst-case running time of block $B_i$
Objective: find $\max \sum_i c_i x_i$
How to get $x_i$?
Structural constraints
Functionality constraints
Loop bounds -- need to be known
CFG Example
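N = 10;
q = 0;
while (q < N)
    q++;
q = r;
(Figure: CFG of this code, with execution counts x1 to x4 on the blocks and d1 to d6 on the edges.)
B1 (x1): N = 10; q = 0;   [in: d1; out: d2]
B2 (x2): while (q < N)    [in: d2, d4; out: d3, d5]
B3 (x3): q++;             [in: d3; out: d4]
B4 (x4): q = r;           [in: d5; out: d6]
Want to maximize $\sum_i c_i x_i$ subject to the constraints
$x_1 = d_1 = d_2$
$d_1 = 1$
$x_2 = d_2 + d_4 = d_3 + d_5$
$x_3 = d_3 = d_4 = 10$
$x_4 = d_5 = d_6$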
Example due to Y.T. Li and S. Malik
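In this example the constraints leave no freedom: together with the loop bound they fix every count, so the maximization can be checked by substitution. The sketch below does that in Python. The function name and the block costs `c` are hypothetical, chosen only for illustration; a real analysis would hand the constraints to an ILP solver instead.

```python
def solve_example(loop_bound=10):
    """Derive the execution counts of the four-block example by substitution."""
    d1 = 1                        # d1 = 1: the task is entered once
    x1 = d2 = d1                  # x1 = d1 = d2
    x3 = d3 = d4 = loop_bound     # x3 = d3 = d4 = 10 (loop bound)
    x2 = d2 + d4                  # x2 = d2 + d4
    d5 = x2 - d3                  # rearranged from x2 = d3 + d5
    x4 = d5                       # x4 = d5 (= d6)
    return {"x1": x1, "x2": x2, "x3": x3, "x4": x4}

counts = solve_example()
c = {"x1": 2, "x2": 1, "x3": 1, "x4": 1}      # hypothetical costs in cycles
wcet = sum(c[k] * counts[k] for k in counts)  # objective: sum of c_i * x_i
print(counts)  # {'x1': 1, 'x2': 11, 'x3': 10, 'x4': 1}
print(wcet)    # 24
```

The loop test in B2 runs 11 times, once more than the body, because the final evaluation is the one that exits the loop.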
CFG – Another example
/* k >= 0 */
s = k;
while (k < 10) {
    if (ok)
        j++;
    else {
        j = 0;
        ok = true;
    }
    k++;
}
r = j;
(Figure: CFG of this code, with execution counts x1 to x7 on the blocks and d1 to d10 on the edges.)
B1: s = k;
B2: while (k < 10) {
B3: if (ok)
B4: j++;
B5: j = 0; ok = true;
B6: k++;
B7: r = j;
Functionality Constraints
(Figure: execution counts x1 to x10 annotate the statements of check_data.)
int check_data() {
    int i, morecheck, wrongone;
    morecheck = 1; i = 0; wrongone = -1;
    while (morecheck) {
        if (data[i] < 0) {
            wrongone = i; morecheck = 0;
        }
        else
            if (++i >= 10)
                morecheck = 0;
    }
    if (wrongone >= 0)
        return 0;
    else
        return 1;
}
Constraints
$x_2 \le x_4$
$x_4 \le 10\,x_2$
$(x_5 = 0 \land x_7 = 1) \lor (x_5 = 1 \land x_7 = 0)$
$x_5 = x_9$
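Functionality constraints like these can be sanity-checked by instrumenting the routine and counting how often each statement runs. The sketch below ports check_data to Python for that purpose. The port, the counter names, and the test inputs are illustrative assumptions. It also assumes that the four constraints refer, in order, to the loop bound on the body, the two mutually exclusive ways of leaving the loop, and the link between finding a negative entry and returning 0.

```python
def check_data(data):
    """Python port of check_data that also returns execution counts."""
    n = {"init": 0, "body": 0, "found": 0, "exhausted": 0, "ret0": 0}
    morecheck, i, wrongone = 1, 0, -1
    n["init"] += 1
    while morecheck:
        n["body"] += 1            # the test data[i] < 0 runs once per iteration
        if data[i] < 0:
            wrongone, morecheck = i, 0
            n["found"] += 1
        else:
            i += 1
            if i >= 10:
                morecheck = 0
                n["exhausted"] += 1
    if wrongone >= 0:
        n["ret0"] += 1
        return 0, n
    return 1, n

def satisfies_constraints(n):
    """Check the functionality constraints on the counts of one run."""
    loop_bound = n["init"] <= n["body"] <= 10 * n["init"]
    one_exit = (n["found"], n["exhausted"]) in {(0, 1), (1, 0)}
    same_path = n["found"] == n["ret0"]
    return loop_bound and one_exit and same_path

inputs = [
    [1] * 10,          # no negative entry: the loop body runs 10 times
    [-1] + [1] * 9,    # negative first entry: the loop body runs once
    [1] * 9 + [-1],    # negative last entry: the loop body runs 10 times
]
for data in inputs:
    result, n = check_data(data)
    print(result, n["body"], satisfies_constraints(n))
```

Passing on sample inputs does not prove the constraints; it only guards against writing down a constraint that some execution violates, which would make the computed bound unsafe.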
Micro-architectural Modeling -- Cache
Modify cost function (cache hit and miss have different
costs)
Add linear constraints to describe relationship between
cache hits and misses
Basic idea
Basic blocks assumed to be smaller than entire cache
Subdivide instruction counts ($x_i$) into counts of cache hits ($x_i^{hit}$)
and misses ($x_i^{miss}$)
Line-block (or l-block) is a contiguous sequence of code within
the same basic block that is mapped to the same cache line in
the instruction cache
Either all hit or all miss in an l-block
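To make the l-block definition concrete, the sketch below splits one basic block into l-blocks for a direct-mapped instruction cache. It is an illustration under assumed parameters: the function name, the byte addresses, the 16-byte line size, and the 4-set cache are all hypothetical.

```python
def split_into_lblocks(start, end, line_size, num_sets):
    """Split the half-open address range [start, end) of one basic block
    into l-blocks, returned as (first_addr, end_addr_exclusive, cache_set)."""
    lblocks = []
    addr = start
    while addr < end:
        line = addr // line_size              # memory line holding addr
        line_end = (line + 1) * line_size     # first address of the next line
        chunk_end = min(end, line_end)
        # Direct-mapped: the set index is the line number modulo the set count.
        lblocks.append((addr, chunk_end, line % num_sets))
        addr = chunk_end
    return lblocks

# A 40-byte basic block starting at address 8, 16-byte lines, 4 cache sets.
print(split_into_lblocks(8, 48, line_size=16, num_sets=4))
# [(8, 16, 0), (16, 32, 1), (32, 48, 2)]
```

Each tuple is one l-block; l-blocks from different memory lines that land in the same set are the ones that can evict each other.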
Basic Blocks to Line Blocks (Direct-mapped cache)
(Figure: basic blocks B1, B2, and B3 split into l-blocks B1.1, B1.2, B1.3, B2.1, B2.2, B3.1, B3.2, each colored by the cache set, 0 to 3, that it maps to.)
Cache constraints:
No conflicting l-blocks (only the first execution has a miss): $x_{2.2}^{miss} \le 1$
Two nonconflicting l-blocks are mapped to the same cache line: $x_{1.3}^{miss} + x_{2.1}^{miss} \le 1$
Conflicting l-blocks: affected by the execution sequence
Cache Conflict Graph
For every cache set containing two or more conflicting l-blocks
Nodes: a start node, an end node, and a node Bk.l for every l-block in the cache set
Edge from Bk.l to Bm.n: control can pass between them without
passing through any other l-blocks of the same cache set.
p(i.j,u.v): the number of times that control passes through the edge from Bi.j to Bu.v.
(Figure: conflict graph with nodes start, Bk.l, Bm.n, end and edge counts p(s,k.l), p(s,m.n), p(s,e), p(k.l,k.l), p(k.l,m.n), p(m.n,k.l), p(m.n,m.n), p(k.l,e), p(m.n,e).)
Cache Constraints Example (1)
(Figure: the CFG with execution counts x1 to x7 and edges d1 to d10, each basic block forming a single l-block, drawn next to the cache.)
B1.1: s = k;
B2.1: while (k < 10) {
B3.1: if (ok)
B4.1: j++;
B5.1: j = 0; ok = true;
B6.1: k++;
B7.1: r = j;
Cache Constraints Example (2)
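Total execution time $= \sum_{i}^{N} \sum_{j}^{n_i} \left( c_{i.j}^{hit}\, x_{i.j}^{hit} + c_{i.j}^{miss}\, x_{i.j}^{miss} \right)$
$x_i = \sum_{u.v} p(u.v,\, i.j) = \sum_{u.v} p(i.j,\, u.v)$
$\sum_{u.v} p(s,\, u.v) = \sum_{u.v} p(u.v,\, e) = 1$
$p(i.j,\, i.j) \le x_{i.j}^{hit} \le p(s,\, i.j) + p(i.j,\, i.j)$
(Figure: conflict graph with nodes S, B4.1, B5.1, E and edges p(s,4.1), p(s,5.1), p(s,e), p(4.1,4.1), p(4.1,5.1), p(5.1,4.1), p(5.1,5.1), p(4.1,e), p(5.1,e).)
(Figure: conflict graph with nodes S, B1.1, B6.1, E and edges p(s,1.1), p(1.1,6.1), p(6.1,6.1), p(1.1,e), p(6.1,e).)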
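The hit-count bound, p(i.j,i.j) ≤ x_hit(i.j) ≤ p(s,i.j) + p(i.j,i.j), can be checked on a concrete access sequence. The sketch below replays the l-blocks that map to one set of a direct-mapped cache, counts the conflict-graph edge traversals, and compares them with the simulated hits. The function names and the trace are hypothetical, and the model assumes, as the slides do, that an l-block hits or misses as a whole.

```python
from collections import Counter

def simulate_set(trace, initial=None):
    """Replay l-block executions that all map to one direct-mapped cache set.

    trace: l-block names in execution order
    initial: l-block resident before the task starts (None if none of them)
    Returns (hits, p), where p[(u, v)] counts traversals of edge u -> v,
    with "s" and "e" standing for the start and end nodes.
    """
    hits, p = Counter(), Counter()
    resident, prev = initial, "s"
    for blk in trace:
        p[(prev, blk)] += 1
        if resident == blk:        # hit only if the set still holds this l-block
            hits[blk] += 1
        resident, prev = blk, blk
    p[(prev, "e")] += 1
    return hits, p

def bound_holds(trace, initial=None):
    """Check p(b,b) <= hits(b) <= p(s,b) + p(b,b) for every l-block b."""
    hits, p = simulate_set(trace, initial)
    return all(
        p[(b, b)] <= hits[b] <= p[("s", b)] + p[(b, b)]
        for b in set(trace)
    )

trace = ["B4.1", "B4.1", "B5.1", "B4.1"]
print(simulate_set(trace)[0]["B4.1"])          # 1 hit with an empty cache
print(simulate_set(trace, "B4.1")[0]["B4.1"])  # 2 hits if B4.1 is preloaded
print(bound_holds(trace), bound_holds(trace, "B4.1"))
```

With an empty cache the lower bound is tight; the extra p(s,i.j) term in the upper bound covers the case where the l-block is already resident when the task starts.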
Progress During the Past 10 Years
The explosion of penalties has been
compensated by a reduction of uncertainties!
(Figure: chart of over-estimation and cache-miss penalty for Lim et al. (1995), Thesing et al. (2002), and Souyris et al. (2005). Numeric labels on the chart: 4, 25, 60, 200. Percentage labels on the chart: 20-30%, 15%, 30-50%, 25%, 10%.)
Open Problems
Architectures are getting much more complex.
Can we create processor behavior models without the pain?
Can we change the architecture to make timing analysis easier?
Small changes to code and/or architecture require
completely re-doing the WCET computation
Use robust techniques that learn about processor/platform
behavior
Need more reliable ways to measure execution time
References:
Li, Malik, and Wolfe, “Cache Modeling for Real-Time Software: Beyond Direct Mapped Instruction Caches”
Wilhelm, “Determining Bounds on Execution Times,” Handbook on Embedded Systems, CRC Press, 2005