A Unified WCET Analysis Framework for Multi-core Platforms
Sudipta Chattopadhyay, Chong Lee Kee, Abhik Roychoudhury
National University of Singapore
Timon Kelter, Peter Marwedel
TU Dortmund, Germany
Heiko Falk
Ulm University, Germany
RTAS 2012, Beijing

Timing Analysis
Hard real-time systems require absolute timing guarantees
System level analysis
Single task analysis
Worst case execution time (WCET) analysis
An upper bound on execution time for all possible inputs
A sound over-approximation is obtained by static analysis

WCET Analysis
[Figure: WCET analysis flow. Labels: Program; Micro-architectural modeling; WCET of basic blocks; Path analysis (loop bound, control flow graph constraints, infeasible path constraints); IPET]
IPET = Implicit Path Enumeration Technique

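As a rough illustration of IPET, the sketch below maximises total execution time over basic-block execution counts on a tiny, made-up control flow graph. The block names, per-block WCETs, loop bound and the infeasible-path limit are all assumptions for illustration, and exhaustive search stands in for the ILP solver a real analysis would use.

```python
from itertools import product

# Hypothetical per-block WCETs in cycles, as micro-architectural modeling would supply.
WCET = {"B0": 5, "B1": 2, "B2": 10, "B3": 4, "B4": 3}
LOOP_BOUND = 4   # assumed bound on executions of the loop body
MAX_B2 = 3       # assumed infeasible-path constraint: B2 runs at most 3 times


def ipet_wcet():
    """Maximise sum(WCET[b] * count[b]) subject to the flow constraints."""
    best_total, best_counts = 0, None
    for x2, x3 in product(range(LOOP_BOUND + 1), repeat=2):
        if x2 + x3 > LOOP_BOUND:   # loop bound constraint
            continue
        if x2 > MAX_B2:            # infeasible path constraint
            continue
        # Control flow constraints: entry B0 and exit B4 run once; the loop
        # header B1 runs once per body execution plus the final exit test.
        counts = {"B0": 1, "B1": x2 + x3 + 1, "B2": x2, "B3": x3, "B4": 1}
        total = sum(WCET[b] * n for b, n in counts.items())
        if total > best_total:
            best_total, best_counts = total, counts
    return best_total, best_counts


bound, counts = ipet_wcet()
print(bound, counts)
```

The path is never enumerated explicitly: only execution counts are chosen, which is what makes the technique "implicit".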
Architecture
[Figure: Core 1 ... Core n, each with its own L1 cache, connected through a shared bus to a shared L2 cache and memory]

Micro-architectural Modeling
[Figure: modeled components and their interactions, grouped under Single Core and Multi Core. Components: branch predictor, cache, pipeline, shared cache, shared bus]
Cited work: Li et al., RTSS'09; Chattopadhyay et al., SCOPES'10; Kelter et al., ECRTS'11; Rosen et al., RTSS'07
This work: unified multi-core timing analysis

Timing Anomaly (Shared Cache)
[Figure: tree of possible hit/miss outcomes for a sequence of shared cache accesses. Annotation on the figure: May not be the worst case path]

Timing Anomaly (Shared Bus)
[Figure: tree of possible bus delays (delaymax or delaymin) for a sequence of shared bus accesses. Annotation on the figure: May not be the worst case path]

Background
Representing each pipeline stage as a timing interval
start [3,7], latency [1,3], finish [4,10]
[Figure: execution graph over the pipeline stages IF, ID, EX, WB, CM of consecutive instructions, including R1 := R2 + 5, R5 := R1 * R7 and R3 := R5 * 5. Edge labels: structural dependency, contention]
A fixed-point analysis derives the timing of each stage as an interval

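The interval on the slide follows from adding bounds pairwise: the earliest finish is the earliest start plus the shortest latency, and the latest finish is the latest start plus the longest latency. A minimal sketch, where the `Interval` class and the `join_max` helper are illustrative names rather than part of the tool:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    lo: int
    hi: int

    def __add__(self, other):
        # earliest + shortest, latest + longest
        return Interval(self.lo + other.lo, self.hi + other.hi)


def join_max(a, b):
    """Assumed rule: a stage waiting on two predecessors starts at the later one."""
    return Interval(max(a.lo, b.lo), max(a.hi, b.hi))


start = Interval(3, 7)
latency = Interval(1, 3)
finish = start + latency
print(finish)  # Interval(lo=4, hi=10)
```

A fixed-point analysis would repeat such updates over all stages until no interval changes.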
Shared Cache + Pipeline
Abstract interpretation: each access is classified as hit, miss or unclear, which gives a timing interval
L1 hit: T := T + [1, 1]
L1 miss, L2 (shared) hit: T := T + [miss1 + 1, miss1 + 1]
L1 miss, L2 (shared) unclear: T := T + [miss1 + 1, miss1 + miss2 + 1]
L1 unclear, L2 (shared) unclear: T := T + [1, miss1 + miss2 + 1]
hit latency = 1 cycle
miss1 = L1 cache miss penalty
miss2 = L2 cache miss penalty

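The update rules above can be folded into one small function. This is a sketch under assumptions: the function name and the string constants are illustrative, and the cases the slide does not list (an L2 miss, or L1 unclear with an L2 hit) are extrapolated in the obvious way rather than taken from the source.

```python
HIT, MISS, UNCLEAR = "hit", "miss", "unclear"


def access_interval(l1, l2, miss1, miss2, hit_latency=1):
    """Return the (min, max) latency of one memory access.

    l1 and l2 are the abstract classifications of the access in the L1 cache
    and in the shared L2 cache. l2 is ignored when the access surely hits L1.
    """
    if l1 == HIT:
        return (hit_latency, hit_latency)
    # Latency once the access has missed L1, depending on the L2 outcome.
    l2_min = miss1 + hit_latency + (miss2 if l2 == MISS else 0)
    l2_max = miss1 + hit_latency + (0 if l2 == HIT else miss2)
    if l1 == MISS:
        return (l2_min, l2_max)
    # L1 unclear: the access may still hit L1, so the lower bound is a hit.
    return (hit_latency, l2_max)


# T := T + [miss1 + 1, miss1 + miss2 + 1] for an L1 miss with L2 unclear
T = (0, 0)
lo, hi = access_interval(MISS, UNCLEAR, miss1=6, miss2=30)
T = (T[0] + lo, T[1] + hi)
```

The penalties 6 and 30 are placeholder values chosen only to make the intervals easy to read.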
Shared Bus Analysis
Time Division Multiple Access (TDMA)
Offset abstraction
[Figure: two timelines of TDMA rounds with alternating slots for Core 0 and Core 1. A request from core 1 at time T is at an offset within the round and incurs a delay. A request from core 0 at time T' is at an offset with delay = 0]

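To make the offset abstraction concrete, the sketch below computes the bus delay from the offset of a request within the TDMA round. The schedule layout (equal slots in core order), the rule that an access must fit entirely within the core's slot, and all names and numbers are assumptions for illustration, not details taken from the slides.

```python
def tdma_delay(offset, core, n_cores, slot_len, access_len):
    """Waiting time before `core` may start a bus access.

    `offset` is the request time modulo the round length. Assumes equal
    slots in core order and that an access must fit entirely in one slot.
    """
    assert access_len <= slot_len
    round_len = n_cores * slot_len
    offset %= round_len
    slot_start = core * slot_len
    slot_end = slot_start + slot_len
    if slot_start <= offset and offset + access_len <= slot_end:
        return 0  # the request falls inside the core's own slot
    # Otherwise wait for the next start of the core's slot.
    return (slot_start - offset) % round_len


def delay_interval(offsets, core, n_cores, slot_len, access_len):
    """(min, max) delay over a set of possible offsets."""
    delays = [tdma_delay(o, core, n_cores, slot_len, access_len) for o in offsets]
    return (min(delays), max(delays))
```

With two cores, 10-cycle slots and 4-cycle accesses, core 0 at offset 3 proceeds at once, while core 1 at the same offset waits 7 cycles for its slot.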
Shared bus + pipeline
[Figure: execution graph with stages IF1, ID1, IF2, ID2, IF3, ID3; O1 and O2 are outgoing bus offsets and Oin is an incoming bus offset]
Order known (IF2 finishes after ID1, or ID1 finishes after IF2): Oin = O1 or Oin = O2 accordingly
Order of IF2 and ID1 unknown (approximate timing by static analysis): Oin = O1 U O2
Property: Offset content monotonically decreases over different iterations

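The case split on Oin can be sketched as a function of the predecessors' finish intervals. The sketch assumes that the incoming offset is inherited from whichever predecessor surely finishes last, and that offsets are kept as sets; the function and parameter names are hypothetical.

```python
def incoming_offsets(fin_a, offs_a, fin_b, offs_b):
    """Offsets entering a stage with two predecessors a and b.

    fin_* are (earliest, latest) finish times; offs_* are sets of bus offsets.
    """
    if fin_a[0] > fin_b[1]:
        return set(offs_a)             # a surely finishes after b
    if fin_b[0] > fin_a[1]:
        return set(offs_b)             # b surely finishes after a
    return set(offs_a) | set(offs_b)   # order unknown: keep both
```

The union in the last case is where static analysis gives up precision in exchange for soundness.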
Loop Construct
Ci = bus context of the loop body at the i-th iteration
Bus contexts: C1, C2, C3, ..., C100
Unrolling loop iterations: EXPENSIVE

Loop Construct
Bus context flow graph
[Figure: flow graph over bus contexts C1, C2, C3, C4; C5 = C3, so the graph loops back to C3]
How do we define bus context?
Property: If Ci = Cj, then Ci+k = Cj+k for any k > 0

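Because a repeated context forces every later context to repeat as well, the flow graph can be built by iterating until the first repeat instead of unrolling the whole loop. A minimal sketch, in which the context representation (a frozen set of offsets) and the transfer function `step` are stand-ins invented for illustration:

```python
def bus_context_graph(initial, step, loop_bound):
    """Iterate contexts until one repeats or the loop bound is reached.

    Returns (contexts, back_edge), where back_edge is the index of the
    context that the last one flows back to, or None if nothing repeated.
    """
    contexts = [initial]
    seen = {initial: 0}
    while len(contexts) < loop_bound:
        nxt = step(contexts[-1])
        if nxt in seen:
            return contexts, seen[nxt]
        seen[nxt] = len(contexts)
        contexts.append(nxt)
    return contexts, None


# Hypothetical transfer function: one iteration takes 8 cycles on a
# 12-cycle TDMA round, so every offset advances by 8 modulo 12.
def step(ctx):
    return frozenset((o + 8) % 12 for o in ctx)


contexts, back = bus_context_graph(frozenset({0}), step, loop_bound=100)
print(len(contexts), back)
```

Here three contexts cover a loop with a bound of 100 iterations, since the fourth context equals the first.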
Loop Construct
Bus context flow graph
[Figure: flow graph over bus contexts C1, C2, C3, C4]
How do we define bus context?
Bus offsets of all pipeline stages of all instructions? There could be thousands of nodes

Loop Construct
[Figure: pipeline stages IF, ID, EX, WB, CM of instructions in the previous iteration and in the current iteration, connected by cross-iteration edges]
How do we define bus context?
Property: If the bus offsets of the cross-iteration edges do not change, the WCET of the loop iteration cannot change

Loop Construct
Bus context flow graph
[Figure: flow graph over bus contexts C1, C2, C3, C4]
Compute WCET for each bus context
Generate ILP flow constraints:
  E(C1) + E(C2) + E(C3) + E(C4) ≤ loop bound
  E(C1) ≥ E(C2)
E(C1) = number of times context C1 is executed

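The effect of such flow constraints can be seen on a toy instance. In the sketch below the per-context WCETs and the loop bound are invented, two extra constraints are assumed in order to give the flow graph a shape (C1 is entered at most once; C3 and C4 alternate in a cycle), and exhaustive search replaces the ILP solver.

```python
from itertools import product

WCET = {"C1": 50, "C2": 40, "C3": 30, "C4": 35}  # hypothetical cycles per context
LOOP_BOUND = 10


def feasible(e1, e2, e3, e4):
    return (
        e1 + e2 + e3 + e4 <= LOOP_BOUND  # constraint from the slide
        and e1 >= e2                     # constraint from the slide
        and e1 <= 1                      # assumption: C1 is the entry context
        and e4 <= e3 <= e4 + 1           # assumption: C3 and C4 alternate in a cycle
    )


def loop_wcet():
    """Maximise the summed WCET over all feasible execution counts."""
    best_cost, best_counts = 0, (0, 0, 0, 0)
    for counts in product(range(LOOP_BOUND + 1), repeat=4):
        if feasible(*counts):
            cost = sum(w * n for w, n in zip(WCET.values(), counts))
            if cost > best_cost:
                best_cost, best_counts = cost, counts
    return best_cost, best_counts


cost, counts = loop_wcet()
print(cost, counts)
```

The loop is charged per context rather than with the costliest context on every iteration, which is where the bus context constraints recover precision.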
Branch prediction + Cache
[Figure: two scenarios for an access to memory block m. Branch correctly predicted: cache hit on m. Branch incorrectly predicted: a block m' causes a cache conflict, m is evicted from the cache, and the access to m is a cache miss]

Branch prediction + Cache
[Figure: at the branch location, the cache content is JOINed with the cache content obtained after the maximum number of speculated instructions (which access m'); the access to m becomes an unclear cache access]

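The slide does not say which JOIN operator is used. The sketch below assumes the usual must-analysis join for LRU caches (keep only blocks present in both states, with the larger age bound), and the cache states are made up for a 2-way set to mirror the m and m' scenario.

```python
def must_join(a, b):
    """Join two must-cache states (block -> upper bound on LRU age).

    A block survives only if both states guarantee it, with the older age.
    """
    return {blk: max(a[blk], b[blk]) for blk in a.keys() & b.keys()}


def classify(block, state):
    """A block guaranteed by the must state is a hit, otherwise unclear."""
    return "hit" if block in state else "unclear"


# Hypothetical 2-way set; age 0 is the most recently used block.
correct_path = {"x": 0, "m": 1}   # branch correctly predicted: m still cached
speculated = {"m'": 0, "x": 1}    # speculation fetched m', evicting m
joined = must_join(correct_path, speculated)

print(classify("m", correct_path), classify("m", joined))
```

After the join, m is no longer guaranteed to be cached, so its later access has to be treated as unclear.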
Overall Picture
[Figure: overall analysis flow. Micro-architectural modeling (branch predictor, cache, pipeline, shared cache, shared bus; Multi Core) yields the WCET of basic blocks. Path analysis contributes loop bound, infeasible path constraints and bus context constraints. Both feed IPET]

Experimental Setup (Chronos Toolkit)
[Figure: tool flow. Labels: C source; GCC (simplescalar); binary code; CFG; flow constraints; micro-architectural modeling (private cache, shared cache, pipeline, branch prediction, shared bus); micro-architectural constraints; ILP; WCET]

Cache Sharing vs Cache Partitioning
[Figure: three cache configurations for Core 1 and Core 2: shared cache between 2 cores, vertically partitioned, horizontally partitioned. The figure is annotated with the values 4 and 8]

Evaluation (Cache + pipeline)
[Figure: evaluation chart. Annotation: imprecision of shared cache analysis. Benchmarks labeled: jfdctint, statemate]

Evaluation (Cache + pipeline + Speculation)
[Figure: evaluation chart. Annotation: imprecision of modeling speculation]

Evaluation (Bus + pipeline)
[Figure: evaluation chart. Annotations: imprecision of path analysis; imprecision of shared bus analysis]

Evaluation (Bus + pipeline + Speculation)
[Figure: evaluation chart. Annotations: imprecision of path analysis; imprecision of shared bus analysis]

Conclusion
A unified WCET analysis framework
Handles the interaction of shared cache and bus with pipeline and branch prediction
Timing anomalies are possible; state explosion is handled by the timing interval abstraction
Detailed information on the tool and extensive results are available at:
http://www.comp.nus.edu.sg/~rpembed/chronos-multi-core.html

Questions
Thank You