Measurement based WCET Analysis for Multicore Architectures
Hardik Shah, Andrew Coombes, Andreas Raabe,
Kai Huang and Alois Knoll
Technische Universität München, Rapita Systems Ltd
and fortiss GmbH
Multi-cores in safety critical systems
 Safety critical semiconductor industry
o ~2% of the whole market
o Customized multi-cores an expensive solution
o Even a trivial change (change of an arbiter) could
be prohibitively expensive
o Cycle-accurate simulators for extracting a “recorded trace”
are unavailable or closed
7/16/2015
Goal
 Provide an inexpensive multi-core solution
which is WCET analyzable
o Use unmodified production chips
o Use unmodified measurement based WCET
analysis tool suitable for single-core architectures
 Preserve cost, performance and time-to-market benefits
Agenda
 Related work
 Background
 Worst case interference aware WCET analysis
o Optimizations
 Test cases
 Demo
 Conclusion
Related work – I (WCET analysis)
 Static
o Abstract architecture and application models
 Hybrid measurement based [13 – Kirner et al.]
o RapiTime (more in background section)
 Measurements in the presence of stress patterns
o Only valid under PD arbiter [25]
Related work – II (WCET analysis)
 Others
o Real-time calculus [19 - Pellizzoni et al.]
o Model checking [15 – Lv et al.]
 Closest
o [16 – Nowotsch et al.] Monitoring and suspension
mechanism on shared resource usage
• Limited accesses in a unit time
 [15, 16, 19] Holistic approaches
Related work – III (Tailored architectures)
 Time analyzable multi-cores
o MERASA [29], parMERASA [28]
 Repeatable time machines
o PRET [14], CoMPSoC [11]
 Probabilistic timing analysis
o PROARTIS [7], PROXIMA [12, 20]
Background: Emulation devices
 Test chips with enhanced debug facilities
 Produced in low numbers and supplied to OEMs
before the production chips are sold
 Much cheaper to modify
Background: Hybrid measurement based WCET analysis
[Figure label: analyzed by RapiTime]
 On target measurements
 Complex architectures are analyzable
 Intrusive
Background: Hybrid measurement based WCET analysis
 Used in RapiTime
timing analyzer
 Instrumentation points
o Time stamp trace
o ET profile
 Critical path detection
using MOETs of BBs
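The hybrid flow above can be illustrated with a minimal sketch. It is not RapiTime's actual algorithm: it assumes each iPoint marks the start of one basic block (BB), that the control-flow graph is acyclic, and that the time between two consecutive iPoints belongs to the block that starts at the first one. The names `moet_per_block` and `worst_case_path`, the block labels, and the timestamps are hypothetical.

```python
def moet_per_block(runs):
    """Maximum observed execution time (MOET) per basic block.

    runs: list of traces; each trace is a list of (ipoint_id, timestamp)
    pairs in execution order. The last iPoint of a trace only closes the
    preceding block, so it gets no MOET entry of its own.
    """
    moet = {}
    for trace in runs:
        for (bb, t0), (_, t1) in zip(trace, trace[1:]):
            moet[bb] = max(moet.get(bb, 0), t1 - t0)
    return moet


def worst_case_path(cfg, moet, entry, exit_):
    """Longest path through an acyclic CFG, weighting blocks by MOET.

    cfg maps each block to the list of its successors.
    Returns (total_time, path).
    """
    memo = {}

    def longest(bb):
        if bb in memo:
            return memo[bb]
        if bb == exit_:
            result = (moet.get(bb, 0), [bb])
        else:
            best_cost, best_path = max(
                (longest(succ) for succ in cfg[bb]), key=lambda r: r[0]
            )
            result = (moet.get(bb, 0) + best_cost, [bb] + best_path)
        memo[bb] = result
        return result

    return longest(entry)


# Two hypothetical test runs through an if/else (A -> B or C -> D -> END).
runs = [
    [("A", 0), ("B", 10), ("D", 25), ("END", 30)],
    [("A", 100), ("C", 108), ("D", 120), ("END", 140)],
]
cfg = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["END"]}

moet = moet_per_block(runs)
print(moet)                                      # {'A': 10, 'B': 15, 'C': 12, 'D': 20}
print(worst_case_path(cfg, moet, "A", "END"))    # (45, ['A', 'B', 'D', 'END'])
```

The computed bound of 45 exceeds both observed end-to-end times (30 and 40), because the per-block maxima come from different runs. Loops and infeasible paths, which a real tool has to handle, are left out of this sketch.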
Background: Round robin arbiter
 WL_rr = N × SS, BL_rr = SS (N = total number of masters)
 Experienced latency ∈ [BL, WL]
 RapiTime approach is invalid for multi-cores
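As a small worked example of the round-robin bounds, the sketch below evaluates WL and BL for a given configuration. It assumes SS is the slot size in clock cycles, which the slide does not spell out; the function name `rr_latency_bounds` and the numbers are hypothetical.

```python
def rr_latency_bounds(num_masters, slot_size):
    """Return (best_latency, worst_latency) under round-robin arbitration.

    Best case: the requesting master is served in the current slot (SS).
    Worst case: it waits for every master's slot, its own included (N * SS).
    """
    if num_masters < 1 or slot_size < 1:
        raise ValueError("num_masters and slot_size must be positive")
    best = slot_size
    worst = num_masters * slot_size
    return best, worst


# Hypothetical quad-core system with a 10-cycle slot.
print(rr_latency_bounds(4, 10))  # (10, 40)
```

Any single measurement can see a latency anywhere between these two values, depending on what the other masters do. A trace taken on a multi-core therefore need not contain the worst case.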
Worst case interference augmented tracing
 Adds a cache observer module to the emulation device
of the production chip
 Occurrence times of cache misses and their
experienced latencies are saved in a trace
Offline trace manipulation
[Figure label: artificial delay in occurrence]
 Artificially inflates the MOET of BB by appending
each cache miss with WL
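The offline manipulation can be sketched as follows. This is one possible reading of the slide, not the authors' tool: each cache miss's experienced latency is replaced by WL, so every later timestamp is shifted by the difference. If the slide instead means adding a full WL on top of the experienced latency, the shift per miss would simply be WL. The event layout and the name `inflate_trace` are hypothetical.

```python
def inflate_trace(events, worst_latency):
    """Rewrite an iPoint trace as if every cache miss had taken WL.

    events: list in time order, each either
        ("ipoint", ipoint_id, timestamp) or
        ("miss", timestamp, experienced_latency).
    Returns the manipulated iPoint trace as (ipoint_id, timestamp) pairs.
    """
    shift = 0  # accumulated artificial delay so far
    manipulated = []
    for event in events:
        if event[0] == "miss":
            _, _, latency = event
            if latency > worst_latency:
                raise ValueError("experienced latency exceeds WL")
            shift += worst_latency - latency
        else:
            _, ipoint_id, timestamp = event
            manipulated.append((ipoint_id, timestamp + shift))
    return manipulated


# Hypothetical trace with WL = 40 cycles.
events = [
    ("ipoint", "A", 0),
    ("miss", 3, 12),    # served quickly: 28 cycles of delay are added
    ("ipoint", "B", 20),
    ("miss", 25, 40),   # already worst case: nothing to add
    ("ipoint", "C", 70),
]
print(inflate_trace(events, 40))  # [('A', 0), ('B', 48), ('C', 98)]
```

The output has the same shape as a plain single-core iPoint trace, which is why an unmodified single-core analysis tool can consume it.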
WCET calculation from manipulated trace
[Figure labels: input to RapiTime; single-core worst-case path]
Worst case interference augmented tracing
 Benefits
o Does not alter production chips
• Cost and performance benefits are preserved
o Unmodified single-core tools
o WCET of application under complex arbiters, e.g. CCSP
[4], PBS [27], can be measured
o Analysis in isolation (incremental certification)
Worst case interference augmented tracing
 Drawbacks
o Additional master interface - lower operating frequency
o Trace size
Optimized solution
 No master interface
 Only iPoint trace
o Same as single core measurements
Optimized solution
 Benefits
o Simple architecture and ultra low area footprint
o No capacitive loading
• emulation device can run at same frequency as the
production device
 Drawbacks
o WCET under complex arbiters, e.g. CCSP, PBS, is high
due to the assumption of WL for each cache miss
Overestimation of our approach
 Intrusive (same as single core technique)
o Measured WCET includes the impact of iPoint() executions
o iPoint() modifies cache state
• Deducting the iPoint() execution time is not enough!
o Impacts history based branch predictors
 Assumption of WL under RR is not pessimistic
o Highly interference vulnerable applications [26]
Area overhead
Architecture                  LEs
With cache observer           13555
Without the cache observer    14272
 Test architecture
o NIOS II-F quad-core processors, 512 B (4 KB) I$ and D$
o Optimized cache observer
o On-chip shared SRAM
o @ 125 MHz, Cyclone III FPGA
o 5% increase in area of emulation device (basic)
Test results – 512 B I$ and D$
[Figure labels: WCET multi-core; WCET single-core; instrumentation overhead; cost of porting from single-core to multi-core]
 Multi-path applications from the Mälardalen
benchmark suite
Test results – 4 KB I$ and D$
 avg(WCETni/WCETnis) decreases as cache size
increases because there are fewer cache misses
Demo:
 www6.in.tum.de/Main/Shah
Future extension
 Partitioned L2 cache
o Observe dedicated
partition as well as the
shared partition
o May not be considered
a COTS design
Conclusion
 A novel technique to measure WCET of
applications on multi-core architectures
 Existing single-core analysis tools
 Unmodified production-chips
o Preserves cost and performance benefits
o Trivial addition to emulation chips is required
 Incremental certification
Thank you. Questions?