Transcript Slide 1
Measurement based WCET Analysis for Multicore Architectures
Hardik Shah, Andrew Coombes, Andreas Raabe, Kai Huang and Alois Knoll
Technische Universität München, Rapita Systems Ltd and fortiss GmbH
7/16/2015

Multi-cores in safety critical systems
Safety-critical semiconductor industry
  o ~2% of the whole market
  o Customized multi-cores are an expensive solution
  o Even a trivial change (e.g. changing an arbiter) can be prohibitively expensive
  o Cycle-accurate simulators for extracting a “recorded trace” are unavailable or closed

Goal
Provide an inexpensive multi-core solution that is WCET analyzable
  o Use unmodified production chips
  o Use unmodified measurement-based WCET analysis tools designed for single-core architectures
Preserve cost, performance and time-to-market benefits

Agenda
Related work
Background
Worst case interference aware WCET analysis
  o Optimizations
Test cases
Demo
Conclusion

Related work – I (WCET analysis)
Static
  o Abstract architecture and application models
Hybrid measurement based [13 – Kirner et al.]
  o RapiTime (more in the background section)
Measurements in the presence of stress patterns
  o Only valid under a PD arbiter [25]

Related work – II (WCET analysis)
Others
  o Real-time calculus [19 – Pellizzoni et al.]
  o Model checking [15 – Lv et al.]
Closest
  o [16 – Nowotsch et al.]
    Monitoring and suspension mechanism on shared resource usage
    • Limited number of accesses per unit time [15, 16, 19]
Holistic approaches

Related work – III (Tailored architectures)
Time-analyzable multi-cores
  o MERASA [29], parMERASA [28]
Repeatable time machines
  o PRET [14], CoMPSoC [11]
Probabilistic timing analysis
  o PROARTIS [7], PROXIMA [12, 20]

Background: Emulation devices
Test chips with enhanced debug facilities
Produced in low numbers and supplied to OEMs before the production chips are sold
Much cheaper to modify

Background: Hybrid measurement based WCET analysis
Analyzed by RapiTime
On-target measurements
Complex architectures are analyzable
Intrusive

Background: Hybrid measurement based WCET analysis
Used in the RapiTime timing analyzer
Instrumentation points
  o Timestamp trace
  o Execution time (ET) profile
Critical path detection using the maximum observed execution times (MOETs) of basic blocks (BBs)

Background: Round robin arbiter
$WL_{rr} = N \times SS$, $BL_{rr} = SS$ (N: total number of masters; SS: slot size)
Experienced latency $\in [BL_{rr}, WL_{rr}]$
Hence the plain RapiTime approach is invalid for multi-cores: the latency observed during a measurement depends on interference from the other masters and need not be the worst case

Worst case interference augmented tracing
Adds a cache observer module to the emulation device of the production chip
The occurrence times of cache misses and their experienced latencies are saved in a trace

Offline trace manipulation
Artificial delay of event occurrence times
Artificially inflates the MOET of each BB by charging every cache miss the worst-case latency WL

WCET calculation from manipulated trace
The manipulated trace is the input to RapiTime
Single-core worst-case path

Worst case interference augmented tracing
Benefits
  o Does not alter production chips
    • Cost and performance benefits are preserved
  o Unmodified single-core tools
  o WCET of applications under complex arbiters, e.g.
    CCSP [4] or PBS [27], can be measured
  o Analysis in isolation (incremental certification)

Worst case interference augmented tracing
Drawbacks
  o Additional master interface leads to a lower operating frequency
  o Trace size

Optimized solution
No master interface
Only iPoint trace
  o Same as single-core measurements

Optimized solution
Benefits
  o Simple architecture and ultra-low area footprint
  o No capacitive loading
    • The emulation device can run at the same frequency as the production device
Drawbacks
  o WCET under complex arbiters, e.g. CCSP or PBS, is high due to the assumption of WL for each cache miss

Overestimation of our approach
Intrusive (same as the single-core technique)
  o Measured WCET includes the impact of iPoint() executions
  o iPoint() modifies the cache state
    • Deducting the iPoint() execution time is not enough!
  o Impacts history-based branch predictors
Assumption of WL under RR is not pessimistic
  o for highly interference-vulnerable applications [26]

Area overhead
Architecture                  LEs
With cache observer           14272
Without the cache observer    13555
Test architecture
  o Quad-core NIOS II-F processor system, 512 B (or 4 KB) I$ and D$
  o Optimized cache observer
  o On-chip shared SRAM
  o @ 125 MHz, Cyclone III FPGA
  o 5% increase in area of the emulation device (basic)

Test results – 512 B I$ and D$
[Chart: WCET multi-core, WCET single-core, instrumentation overhead and cost of porting from single-core to multi-core, for multi-path applications from the Mälardalen benchmark suite]

Test results – 4 KB I$ and D$
$\mathrm{avg}(WCET_{ni}/WCET_{nis})$ decreases as the cache size increases, due to the smaller number of cache misses

Demo: www6.in.tum.de/Main/Shah

Future extension
Partitioned L2 cache
  o Observe the dedicated partition as well as the shared partition
  o May not be considered a COTS design

Conclusion
A novel technique to measure the WCET of applications on multi-core architectures
Existing single-core analysis tools
Unmodified production chips
  o Preserves
    cost and performance benefits
  o Trivial addition to emulation chips is required
Incremental certification

Thank you
Questions?
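The round robin bounds and the offline trace manipulation from the slides can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' tooling: the event classes (`IPoint`, `CacheMiss`) and functions (`rr_latency_bounds`, `inflate_trace`, `moet_per_segment`) are hypothetical, the in-memory trace format does not reproduce the real cache observer or RapiTime formats, and "appending each cache miss with WL" is interpreted here as charging each miss WL, i.e. delaying all later events by WL minus the latency actually observed. The MOET between consecutive iPoints stands in for per-BB MOETs; RapiTime's critical path computation is not reproduced, and the numbers in the usage example are illustrative only.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass(frozen=True)
class IPoint:
    """Hypothetical instrumentation-point event: timestamp and iPoint id."""
    time: int
    ipoint_id: int


@dataclass(frozen=True)
class CacheMiss:
    """Hypothetical cache-observer event: miss time and observed latency (cycles)."""
    time: int
    latency: int


Event = Union[IPoint, CacheMiss]


def rr_latency_bounds(n_masters: int, slot_size: int) -> Tuple[int, int]:
    """Round robin bounds from the slides: BL_rr = SS, WL_rr = N x SS."""
    return slot_size, n_masters * slot_size


def inflate_trace(trace: List[Event], wl: int) -> List[IPoint]:
    """Charge every cache miss the worst-case latency wl (assumed interpretation).

    Each miss delays all later events by (wl - observed latency); the shift
    accumulates, so later iPoint timestamps move further out. Only the shifted
    iPoint trace is returned, as input to a single-core MOET/path analysis.
    """
    shift = 0
    out: List[IPoint] = []
    for ev in sorted(trace, key=lambda e: e.time):
        if isinstance(ev, CacheMiss):
            # Under RR the observed latency must lie in [BL_rr, WL_rr].
            if ev.latency > wl:
                raise ValueError("observed latency exceeds the assumed WL")
            shift += wl - ev.latency
        else:
            out.append(IPoint(ev.time + shift, ev.ipoint_id))
    return out


def moet_per_segment(ipoints: List[IPoint]) -> Dict[Tuple[int, int], int]:
    """Maximum observed execution time between each pair of consecutive iPoints."""
    moet: Dict[Tuple[int, int], int] = {}
    for a, b in zip(ipoints, ipoints[1:]):
        key = (a.ipoint_id, b.ipoint_id)
        moet[key] = max(moet.get(key, 0), b.time - a.time)
    return moet


if __name__ == "__main__":
    bl, wl = rr_latency_bounds(n_masters=4, slot_size=10)  # (10, 40)
    trace = [IPoint(0, 1), CacheMiss(5, 10), IPoint(20, 2),
             CacheMiss(25, 30), IPoint(50, 3)]
    inflated = inflate_trace(trace, wl)
    # iPoint 2 moves from 20 to 50 (+30); iPoint 3 from 50 to 90 (+30 +10)
    print(moet_per_segment(inflated))  # {(1, 2): 50, (2, 3): 40}
```

In this sketch only the iPoint timestamps are kept after manipulation, which mirrors the idea that an unmodified single-core analysis only ever sees an ordinary timestamp trace, now with each cache miss already charged the worst-case latency.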