Transcript Slide 1
The Priority Division Arbiter for low WCET and high Resource Utilization in Multi-core Architectures
Hardik Shah, Kai Huang and Alois Knoll
Department of Informatics – VI
Technische Universität München

Slide 2 (7/16/2015)
Shared resource arbiters
Shared resources to reduce cost
Conflict on the shared memory
Latency bounds on shared memory accesses

Slide 3
Traditional arbiters
Statically scheduled
  o No interference, low efficiency
Dynamically scheduled
  o High interference, high efficiency
Resource utilization

Slide 4
Goal
Priority Division arbiter [16]: hybrid, efficient, worst-case latencies equal to TDMA
  o Study the effects of an arbitration scheme on applications’ WCET and resource utilization
  o Compare the SP, TDMA, RR and PD arbiters
Exclusive focus on the shared memory interference
  o Cache replacement policy, branch predictors etc. are assumed to have a constant effect
    • Well addressed single-core problems

Slide 6
Agenda
Related work
Background
Priority Division Arbiter
Comparison of arbiters
Conclusion

Slide 7
Related work - I
Special arbiters:
  o Derived from traditional arbiters
    • dTDMA [14], IABA [11], Slot reservation [13], Priority Division [16]
    • On corner cases, all behave as the parent arbiter
  o Randomized arbiters (probabilistic analysis)
    • Lottery arbiter [10, 7], RT_Lottery [4]
  o Budget based arbiters (our previous work, DATE ‘12, ‘13)
    • CCSP [2], PBS [20], MBBA [3], Deficit round robin [19]

Slide 8
Related work - II
Arbiter comparisons
  o [12 – Pitter et al] and [9 – Kopetz et al] found TDMA to be the most predictable arbitration scheme
  o [8 – Kelter et al] static WCET analysis approach for comparing SP, RR, TDMA and PD

Slide 9
Background: latency and utilization
Cache-line fill using non-preemptive burst access
Access latency:
  o Depends on the employed arbitration scheme, the instantaneous activity of co-existing masters and memory responsiveness
Utilization:
  o Ability of an arbiter to utilize the shared resource when only the “test-master” is active
    • The efficiency of an arbiter in low load conditions

Slide 10
Background: latency and utilization
Utilization:
    • The right side of ‘|’ is only valid if the left side is true

Slide 11
Background: Computation trace
Cache miss as a timeless event on an execution path
  o A cache miss latency delays subsequent cache misses
Can be used to calculate the BCET or the WCET

Slide 12
Background: Static (fixed) priority arbiter
Each master is granted access to the shared memory according to its priority
  o A higher priority master cannot preempt an ongoing lower priority burst access
  o WLsp = 2 x SS, BLsp = SS for the highest priority master
  o WL = ∞ for lower priority masters
Work conserving: Usp = 100%

Slide 13
Background: TDMA arbiter - I
Each master is granted a fixed exclusive window to access the shared memory (no interference)
No effect of co-existing applications
Poor resource utilization

Slide 14
Background: TDMA arbiter - II
Latency:
  N = total number of masters
Utilization:
Rotation of the wheel is fixed
  o Latency and utilization depend on the time between two cache misses, the “computation time” ci

Slide 15
Background: Round robin arbiter
As soon as an active master is encountered, its slot is started: “greedy TDMA”
WLrr = N x SS, BLrr = SS, Urr = 100%

Slide 16
Priority Division arbiter - I
Mix of TDMA and SP
Fixed slots, priorities inside slots
  o Starvation free if every master has at least one slot where it has the highest priority

Slide 17
Priority Division arbiter - II
Latency:
Utilization:

Slide 18
Priority Division arbiter - Benefits
Same WCET as TDMA at higher resource utilization
Simple architecture
  o Additional complexity compared to the complete system is negligible
Incremental certification
  o Using stress patterns on co-existing cores (m2 – m4)

Slide 19
Priority Division arbiter – h1 configuration
Only one HRT master has the highest priority in all slots (mixed-criticality system with a single HRT master)
Latency:
  WLsp = 2 x SS
Produces a lower WCET bound for the highest priority master than SP, at a utilization penalty

Slide 20
Arbiter comparison: Complexity

  Arbiter          SP    TDMA   RR    PD
  Number of LEs    281   277    288   285

Dynamically scheduled arbiters (SP, RR and PD) are slightly more complex than the statically scheduled arbiter (TDMA)
Measured at 125 MHz on a Cyclone III FPGA, for a 4-port arbiter

Slide 21
Arbiter comparison: Test architecture
Quad-core processor built using NIOS II F cores with 512 Bytes I$ and D$
On-chip memory as a shared main memory
Test applications from the Mälardalen WCET benchmark suite
  o Recorded traces were extracted by probing the test master’s (m1’s) interface with the arbiter
  o Utilization was measured by observing busy and idle cycles while keeping co-existing masters (m2 – m4) off

Slide 22
Arbiter comparison: TDMA vs RR vs PD

Slide 23
Arbiter comparison: TDMA vs RR vs PD
Advantage of PD over TDMA and RR
Drawback of PD compared to RR

Slide 24
Arbiter comparison: SP vs PDh1
Advantage of PDh1 over SP
Drawback of PDh1 compared to SP

Slide 25
Conclusion
Priority Division is a promising arbitration scheme for predictable and high performance multi-core architectures
Enables incremental certification and increases resource utilization at a minor increase in complexity compared to TDMA and SP (in h1 mode)
PDh1 produces a lower WCET than SP for the highest priority master
Thank you
Questions?
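
The "fixed slots, priorities inside slots" idea from the Priority Division slides can be illustrated with a minimal Python sketch. It is not the authors' hardware design: the slot-aligned grant decision, the rotating priority order, the h1 table layout and all function names (pd_grant, tdma_table, pd_table, pd_h1_table, is_starvation_free, lone_master_utilization) are assumptions made for illustration only. The utilization helper also assumes that the lone master requests in every slot, which is a simplification of the computation-time-dependent utilization on the slides.

```python
from typing import List, Optional, Sequence, Set

Table = List[List[int]]  # one priority list per slot, highest priority first


def pd_grant(slot: int, table: Sequence[Sequence[int]], requesting: Set[int]) -> Optional[int]:
    """Grant the slot to the highest-priority requesting master, or None if idle.

    Assumption: arbitration is decided once per slot (slot-aligned requests).
    """
    for master in table[slot % len(table)]:
        if master in requesting:
            return master
    return None


def tdma_table(n: int) -> Table:
    """TDMA as a degenerate case: each slot lists only its owner."""
    return [[m] for m in range(n)]


def pd_table(n: int) -> Table:
    """Hypothetical PD layout: slot s is headed by master s, the rest follow in rotation."""
    return [[(s + k) % n for k in range(n)] for s in range(n)]


def pd_h1_table(n: int) -> Table:
    """Hypothetical h1 layout: master 0 heads every slot, the others rotate below it."""
    return [[0] + [1 + (s + k) % (n - 1) for k in range(n - 1)] for s in range(n)]


def is_starvation_free(table: Table) -> bool:
    """Slide condition: every master has the highest priority in at least one slot."""
    masters = {m for row in table for m in row}
    heads = {row[0] for row in table}
    return masters <= heads


def lone_master_utilization(table: Table, master: int = 0) -> float:
    """Fraction of slots used when only `master` is active and always requesting."""
    granted = sum(1 for s in range(len(table)) if pd_grant(s, table, {master}) == master)
    return granted / len(table)
```

Under these assumptions and with four masters, lone_master_utilization(tdma_table(4)) gives 0.25 while lone_master_utilization(pd_table(4)) gives 1.0, which mirrors the claim that PD keeps the TDMA slot structure but reclaims idle slots. The h1 table fails is_starvation_free, since only master 0 ever heads a slot.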