Transcript Slide 1
The Priority Division Arbiter for low WCET and
high Resource Utilization in Multi-core
Architectures
Hardik Shah, Kai Huang and Alois Knoll
Department of Informatics – VI
Technische Universität München
Shared resource arbiters
Shared resources to reduce cost
Conflict on the shared memory
Latency bounds on shared memory accesses
7/16/2015
Traditional arbiters
Statically scheduled
o No interference, low efficiency
Dynamically scheduled
o High interference, high efficiency
Goal
Priority Division arbiter [16]: hybrid, efficient,
worst-case latencies equal to TDMA
o Study effects of an arbitration scheme on applications’
WCET and resource utilization
o Compare the SP, TDMA, RR and PD arbiters
Exclusive focus on the shared memory
interference
o Cache replacement policy, branch predictors etc. are
assumed to have a constant effect
• Well-addressed single-core problems
Agenda
Related work
Background
Priority Division Arbiter
Comparison of arbiters
Conclusion
Related work - I
Special arbiters:
o Derived from traditional arbiters
• dTDMA [14], IABA [11], Slot reservation [13], Priority
Division [16],
• On corner cases, all behave as the parent arbiter
o Randomized arbiters
Probabilistic analysis
• Lottery arbiter [10, 7], RT_Lottery [4]
o Budget based arbiters
Our previous work: DATE ‘12, ‘13
• CCSP [2], PBS [20], MBBA [3], Deficit round robin [19]
Related work - II
Arbiter comparisons
o [12 – Pitter et al] and [9 – Kopetz et al] found TDMA to
be the most predictable arbitration scheme
o [8 – Kelter et al] static WCET analysis approach for
comparing SP, RR, TDMA and PD
Background: latency and utilization
Cache-line fill using non-preemptive burst access
Access latency:
o Employed arbitration scheme, instantaneous activity
of co-existing masters and memory responsiveness
Utilization:
o Ability of an arbiter to utilize the shared resource
when only the “test-master” is active
• The efficiency of an arbiter in low load conditions
• The right side of ‘|’ is only valid if the left side is true
Background: Computation trace
Cache miss as a timeless event on an execution path
o A cache miss latency delays subsequent cache misses
Can be used to calculate the BCET or the WCET
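The trace idea above can be sketched in a few lines. This is only an illustration under assumptions: the trace is a hypothetical list of computation times between consecutive cache misses, every miss is charged one constant latency, and the values of SS and N are made up for the example. A constant per-miss latency fits bounds such as BL = SS or WLrr = N x SS; it does not capture arbiters whose latency depends on when the miss arrives.

```python
def trace_execution_time(computation_times, miss_latency):
    """Execution time of a computation trace.

    computation_times: cycles of computation preceding each cache miss
                       (hypothetical values, one entry per miss)
    miss_latency:      cycles charged for every miss (best or worst case)
    """
    # Each miss is a timeless event on the path; its latency simply
    # pushes every later miss further out in time.
    return sum(c + miss_latency for c in computation_times)


SS = 8                   # assumed slot size / burst length in cycles
N = 4                    # assumed number of masters
trace = [10, 3, 25, 7]   # hypothetical computation times c_i

bcet = trace_execution_time(trace, SS)         # every miss served in SS
wcet_rr = trace_execution_time(trace, N * SS)  # every miss takes N x SS
```

With these numbers the best case is 77 cycles and the round-robin worst case is 173 cycles.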
Background: Static (fixed) priority arbiter
Each master is granted an access to the shared
memory according to its priority
o A higher-priority master cannot preempt an ongoing
lower-priority burst access
o WLsp = 2 x SS, BLsp = SS for the highest-priority master
o WL = ∞ for lower-priority masters
Work conserving – Usp = 100%
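A minimal sketch of the grant decision described above, assuming masters are indexed by priority (index 0 is the highest) and that an ongoing burst is tracked by the hypothetical `ongoing` argument. The function name and the value of SS are illustrative and not taken from the slides.

```python
def sp_grant(requests, ongoing=None):
    """Pick the master to serve under static (fixed) priority.

    requests: list of booleans, index 0 = highest priority
    ongoing:  master whose burst is in progress, or None
    """
    if ongoing is not None:
        return ongoing          # bursts are non-preemptive
    for master, requesting in enumerate(requests):
        if requesting:
            return master       # highest-priority requester wins
    return None                 # nobody is requesting


SS = 8  # assumed burst length in cycles

# Worst case for master 0: its request arrives just after a lower-priority
# burst was granted, so it waits up to SS cycles and then needs SS cycles
# for its own burst.
worst_latency_m0 = SS + SS
```

As long as master 0 keeps requesting, `sp_grant` never returns a lower-priority master once the bus is free, which is why no finite bound exists for the other masters.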
Background: TDMA arbiter - I
Each master is granted a fixed exclusive window
to access the shared memory (no interference)
No effect of co-existing applications
Poor resource utilization
Background: TDMA arbiter - II
Latency: (N = total number of masters)
Utilization:
Rotation of the wheel is fixed
o Latency and utilization depend on time between two
cache misses, “computation time”- ci
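The dependence on the computation time can be illustrated with a small model. It is a sketch under stated assumptions, not the formula from the slide: each master owns one slot of SS cycles in a wheel of N slots, a burst takes exactly SS cycles, and a burst can only start at the beginning of the owner's slot. All function names and numbers are hypothetical.

```python
def tdma_latency(t, master, n_masters, ss):
    """Latency of a miss issued at absolute cycle t by `master`."""
    wheel = n_masters * ss
    slot_start = master * ss
    wait = (slot_start - t) % wheel   # cycles until the owner's slot begins
    return wait + ss                  # waiting time plus the burst itself


def tdma_trace_time(computation_times, master, n_masters, ss):
    """Run a computation trace against the fixed wheel, starting at cycle 0."""
    t = 0
    for c in computation_times:
        t += c                                        # compute until the miss
        t += tdma_latency(t, master, n_masters, ss)   # stall for the miss
    return t


N, SS = 4, 8  # assumed values
aligned = tdma_trace_time([0, 24], master=0, n_masters=N, ss=SS)
one_cycle_late = tdma_trace_time([0, 25], master=0, n_masters=N, ss=SS)
```

In this model a single extra cycle of computation (25 instead of 24) makes the second miss just miss its slot, so the trace takes 72 cycles instead of 40.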
Background: Round robin arbiter
As soon as an active master is encountered, its
slot is started – “greedy TDMA”
WLrr = N x SS, BLrr = SS, Urr = 100%
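The "greedy TDMA" behaviour can be sketched as a search for the next active master after the one served last. The function name and the request encoding are assumptions made for this illustration.

```python
def rr_next(requests, last_granted):
    """Return the next requesting master after `last_granted`, or None.

    requests:     list of booleans, one per master
    last_granted: index of the master that was served most recently
    """
    n = len(requests)
    for k in range(1, n + 1):
        master = (last_granted + k) % n   # scan in circular order
        if requests[master]:
            return master                 # its slot starts immediately
    return None                           # no master is active
```

Because idle masters are skipped, the memory is never left unused while a request is pending, and a master waits for at most the N - 1 other masters before its own burst, which agrees with WLrr = N x SS.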
Priority Division arbiter - I
Mix of TDMA and SP
Fixed slots, priorities inside slots
o Starvation free if every master has at least one slot
where it has the highest priority
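A rough sketch of "fixed slots, priorities inside slots", assuming the arbiter is configured with a hypothetical table that lists, for every slot, the masters in descending priority. The details of the arbiter in [16], such as when inside a slot a grant may start, may differ from this simplification.

```python
def pd_grant(slot_priorities, slot, requests):
    """Grant the current slot to its highest-priority requesting master."""
    for master in slot_priorities[slot]:
        if requests[master]:
            return master
    return None   # slot stays idle


def is_starvation_free(slot_priorities, n_masters):
    """True if every master has the highest priority in at least one slot."""
    slot_owners = {order[0] for order in slot_priorities}
    return set(range(n_masters)) <= slot_owners


# Hypothetical 4-master table: master i has the highest priority in slot i.
rotating = [[0, 1, 2, 3], [1, 2, 3, 0], [2, 3, 0, 1], [3, 0, 1, 2]]

# Hypothetical table where master 0 has the highest priority in every slot.
single_top = [[0, 1, 2, 3], [0, 2, 3, 1], [0, 3, 1, 2], [0, 1, 2, 3]]
```

With the rotating table, a slot whose top-priority master is idle is handed to another requester instead of being wasted, which is where the utilization gain over plain TDMA comes from in this model.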
Priority Division arbiter - II
Latency:
Utilization:
Priority Division arbiter - Benefits
Same WCET as TDMA at higher resource utilization
Simple architecture
o Additional complexity compared to the complete
system is negligible
Incremental certification
o Using stress patterns on co-existing cores (m2 – m4)
Priority Division arbiter – h1 configuration
Only one HRT master has the highest priority in all
slots (mixed critical system with single HRT)
Latency:
WLSP = 2 x SS
Produces a lower WCET bound for the highest-priority
master than SP, at a utilization penalty
Arbiter comparison: Complexity
Arbiter    Number of LEs
SP         281
TDMA       277
RR         288
PD         285
Dynamically scheduled arbiters (SP, RR and PD) are
slightly more complex than the statically scheduled
arbiter (TDMA)
@125 MHz, Cyclone III FPGA, for a 4 port arbiter
Arbiter comparison: Test architecture
Quad-core processor built using NIOS II F cores
with 512 Bytes I$ and D$
On-chip memory as a shared main memory
Test applications from the Mälardalen WCET
benchmark suite
o Recorded traces were extracted by probing the test
master’s (m1’s) interface with the arbiter
o Utilization was measured by observing busy and idle
cycles keeping co-existing masters (m2 – m4) off
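One way to turn such busy/idle observations into a number is sketched below. It assumes a hypothetical per-cycle log of two flags and assumes that only cycles in which the test master has an outstanding request count towards the denominator, which is consistent with work-conserving arbiters reaching 100%. The exact definition used for the measurements is not recoverable from the slides.

```python
def utilization(samples):
    """Fraction of demanded cycles in which the memory was busy.

    samples: iterable of (request_outstanding, memory_busy) per cycle
             (hypothetical log format)
    """
    busy = 0
    outstanding = 0
    for request_outstanding, memory_busy in samples:
        if request_outstanding:
            outstanding += 1      # the test master wants the memory
            if memory_busy:
                busy += 1         # and is actually being served
    return busy / outstanding if outstanding else 0.0


# Hypothetical logs: 3 cycles waiting for a slot, 8 cycles of burst,
# 5 cycles of computation with no request.
tdma_like = [(True, False)] * 3 + [(True, True)] * 8 + [(False, False)] * 5
work_conserving = [(True, True)] * 8 + [(False, False)] * 5
```

In this example the first log gives 8/11 and the second gives 1.0, since cycles spent purely on computation are not held against the arbiter.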
Arbiter comparison: TDMA vs RR vs PD
[Figure omitted. Annotations: advantage of PD over TDMA and RR; drawback of PD compared with RR]
Arbiter comparison: SP vs PDh1
[Figure omitted. Annotations: advantage of PDh1 over SP; drawback of PDh1 compared with SP]
Conclusion
Priority Division is a promising arbitration scheme
for predictable and high performance multi-core
architectures
Enables incremental certification and increases
resource utilization at a minor increase in complexity
compared to TDMA and SP (in h1 mode)
PDh1 produces lower WCET than SP for the highest
priority master
Thank you. Questions?