The Priority Division Arbiter for Low WCET and High Resource Utilization in Multi-core Architectures
Hardik Shah, Kai Huang and Alois Knoll
Department of Informatics – VI
Technische Universität München
Shared resource arbiters
▪ Resources are shared to reduce cost
▪ Concurrent accesses conflict on the shared memory
▪ Shared-memory accesses need latency bounds
7/16/2015
Traditional arbiters
▪ Statically scheduled (e.g., TDMA)
o No interference, low efficiency
▪ Dynamically scheduled (e.g., SP, RR)
o High interference, high efficiency (resource utilization)
Goal
▪ Priority Division (PD) arbiter [16]: hybrid and efficient, with worst-case latencies equal to those of TDMA
o Study the effects of an arbitration scheme on applications’ WCET and resource utilization
o Compare the static priority (SP), TDMA, round robin (RR) and PD arbiters
▪ Exclusive focus on shared-memory interference
o Cache replacement policy, branch predictors, etc. are assumed to have a constant effect
• These are well-addressed single-core problems
Agenda
▪ Related work
▪ Background
▪ Priority Division Arbiter
▪ Comparison of arbiters
▪ Conclusion
Related work - I
▪ Special arbiters:
o Derived from traditional arbiters
• dTDMA [14], IABA [11], Slot reservation [13], Priority Division [16]
• In corner cases, all behave like the parent arbiter
o Randomized arbiters (probabilistic analysis)
• Lottery arbiter [10, 7], RT_Lottery [4]
o Budget-based arbiters (our previous work, DATE ’12, ’13)
• CCSP [2], PBS [20], MBBA [3], Deficit round robin [19]
Related work - II
▪ Arbiter comparisons
o Pitter et al. [12] and Kopetz et al. [9] found TDMA to be the most predictable arbitration scheme
o Kelter et al. [8]: static WCET analysis approach for comparing SP, RR, TDMA and PD
Background: latency and utilization
▪ Cache-line fills use non-preemptive burst accesses
▪ Access latency depends on:
o The employed arbitration scheme, the instantaneous activity of co-existing masters and the memory responsiveness
▪ Utilization:
o Ability of an arbiter to utilize the shared resource when only the “test-master” is active
• The efficiency of an arbiter in low-load conditions
• In the utilization expression, the right side of ‘|’ is valid only if the left side is true
Background: Computation trace
▪ A cache miss is treated as a timeless event on an execution path
o The latency of a cache miss delays all subsequent cache misses
▪ The trace can be used to calculate the BCET or the WCET
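The trace model can be illustrated with a short Python sketch. It assumes that `computation_times[i]` is the computation time c_i (in cycles) that precedes the i-th cache miss and that every miss is charged one fixed latency; the name `replay_trace` and all cycle values are hypothetical. Charging every miss the best-case latency BL yields the BCET of the path, and charging the worst-case latency WL yields its WCET estimate.

```python
from typing import List, Tuple


def replay_trace(computation_times: List[int], miss_latency: int) -> Tuple[List[int], int]:
    """Replay a computation trace with a fixed latency per cache miss.

    computation_times[i] is the computation time c_i (cycles) executed before
    the i-th cache miss. Returns the cycle at which each miss is issued and the
    total execution time of the path.
    """
    issue_times = []
    now = 0
    for c in computation_times:
        now += c                 # compute until the next cache miss
        issue_times.append(now)  # the miss itself is a timeless event on the path
        now += miss_latency      # the core stalls, so every later miss shifts by this much
    return issue_times, now


SS = 8                               # hypothetical burst duration in cycles
trace = [10, 4, 7]                   # hypothetical c_i values
print(replay_trace(trace, SS))       # BL = SS:     ([10, 22, 37], 45)
print(replay_trace(trace, 2 * SS))   # WL = 2 x SS: ([10, 30, 53], 69)
```

The second call shows the effect stated on the slide: a longer latency on the first miss pushes the later misses from cycles 22 and 37 to 30 and 53. A constant per-miss latency is a simplification; under TDMA, for example, the latency of each miss depends on c_i.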
Background: Static (fixed) priority arbiter
▪ Each master is granted access to the shared memory according to its priority
o A higher-priority master cannot preempt an ongoing lower-priority burst access
o WLsp = 2 x SS, BLsp = SS for the highest-priority master
o WL = ∞ for lower-priority masters (they can starve)
▪ Work-conserving – Usp = 100%
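A minimal discrete-event sketch of the SP rule follows. It assumes one outstanding request per entry, that a lower master index means a higher priority, and that SS is the number of cycles one burst occupies the memory; `simulate_sp` and the request format are hypothetical. The example reproduces the 2 x SS case: the highest-priority master arrives one cycle after a lower-priority burst has started and must wait for that burst to finish.

```python
from typing import Dict, List, Tuple


def simulate_sp(requests: List[Tuple[int, int]], ss: int) -> Dict[int, int]:
    """Non-preemptive static-priority arbitration of burst accesses.

    requests[i] = (arrival_cycle, master); a lower master index means a higher
    priority. Every burst occupies the memory for ss cycles.
    Returns {request index: completion cycle}.
    """
    done: Dict[int, int] = {}
    waiting = list(range(len(requests)))
    now = 0
    while waiting:
        ready = [i for i in waiting if requests[i][0] <= now]
        if not ready:
            # the memory idles until the next request arrives
            now = min(requests[i][0] for i in waiting)
            continue
        # the highest-priority ready request wins; FIFO among equal priorities
        winner = min(ready, key=lambda i: (requests[i][1], requests[i][0]))
        now += ss  # the burst runs to completion and cannot be preempted
        done[winner] = now
        waiting.remove(winner)
    return done


SS = 8
# lowest-priority master 3 starts a burst at cycle 0; master 0 arrives one cycle later
done = simulate_sp([(0, 3), (1, 0)], SS)
print(done[1] - 1)  # latency of master 0: 15 cycles, just under WLsp = 2 x SS
```

If the highest-priority master finds the memory idle, it is served within SS cycles (BLsp). The sketch does not model sustained higher-priority traffic, which is what leaves WL unbounded for lower-priority masters.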
Background: TDMA arbiter - I
▪ Each master is granted a fixed, exclusive window to access the shared memory (no interference)
▪ Co-existing applications have no effect on a master's access timing
▪ Poor resource utilization
Background: TDMA arbiter - II
▪ Latency:
N = total number of masters
▪ Utilization:
▪ The rotation of the wheel is fixed
o Latency and utilization depend on the time between two cache misses, the “computation time” ci
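The dependence on ci can be made concrete with a simplified, slot-aligned TDMA model, assumed purely for illustration: one slot of `ss` cycles per master, slot k starting at k x ss in every rotation of N x ss cycles, and a burst allowed to start only at the beginning of the master's own slot. The functions `tdma_latency` and `tdma_replay` are hypothetical, and the model is not the latency expression from the slide.

```python
from typing import List, Tuple


def tdma_latency(t: int, master: int, n: int, ss: int) -> int:
    """Latency of a burst issued at cycle t by `master` on the assumed TDMA wheel."""
    period = n * ss
    offset = (t - master * ss) % period   # cycles since the master's slot last began
    wait = 0 if offset == 0 else period - offset
    return wait + ss                      # wait for the own slot, then one burst


def tdma_replay(computation_times: List[int], master: int, n: int, ss: int) -> Tuple[int, float]:
    """Replay a trace with only the test master active.

    Returns (total cycles, utilization = busy cycles / total cycles).
    """
    now, busy = 0, 0
    for c in computation_times:
        now += c                                  # computation time ci
        now += tdma_latency(now, master, n, ss)   # the slot position decides the wait
        busy += ss
    return now, busy / now


N, SS = 4, 8
print(tdma_latency(1, 0, N, SS))       # 39: the own slot began one cycle earlier
print(tdma_latency(32, 0, N, SS))      # 8: the miss coincides with the slot start
print(tdma_replay([0, 0, 0], 0, N, SS))  # back-to-back misses: 72 cycles, utilization 1/3
```

Because the wheel keeps rotating whether or not a slot is used, back-to-back misses leave most of each rotation idle, which is the poor utilization noted above.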
Background: Round robin arbiter
▪ As soon as an active master is encountered, its slot is started – “greedy TDMA”
▪ WLrr = N x SS, BLrr = SS, Urr = 100%
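A minimal sketch of greedy round robin in the same style, assuming `n` masters numbered 0..n-1, a grant pointer that moves to the master after the last winner, and bursts of `ss` cycles; `simulate_rr` is a hypothetical name. In the first case all masters are pending, so the last master in the rotation waits N x SS; in the second only the test master is active and the memory never idles between its back-to-back requests.

```python
from typing import Dict, List, Tuple


def simulate_rr(requests: List[Tuple[int, int]], n: int, ss: int, start_ptr: int = 0) -> Dict[int, int]:
    """Greedy round-robin arbitration of burst accesses among n masters.

    requests[i] = (arrival_cycle, master). Returns {request index: completion cycle}.
    """
    done: Dict[int, int] = {}
    waiting = list(range(len(requests)))
    now, ptr = 0, start_ptr
    while waiting:
        ready = [i for i in waiting if requests[i][0] <= now]
        if not ready:
            now = min(requests[i][0] for i in waiting)  # idle until the next arrival
            continue
        # the first ready master at or after the pointer is granted immediately
        winner = min(ready, key=lambda i: ((requests[i][1] - ptr) % n, requests[i][0]))
        now += ss
        done[winner] = now
        waiting.remove(winner)
        ptr = (requests[winner][1] + 1) % n  # continue the rotation after the winner
    return done


N, SS = 4, 8
print(simulate_rr([(0, 0), (0, 1), (0, 2), (0, 3)], N, SS))  # master 3 completes at 32 = N x SS
print(simulate_rr([(0, 3), (8, 3), (16, 3)], N, SS))         # 24 busy cycles out of 24: Urr = 100%
```

Unlike the TDMA wheel, the pointer skips idle masters, which is why a lone master is served back to back.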
Priority Division arbiter - I
▪ Mix of TDMA and SP
▪ Fixed slots, priorities inside slots
o Starvation-free if every master has at least one slot where it has the highest priority
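A simplified sketch of the PD rule, assuming that every slot lasts `ss` cycles, that a burst can start only at a slot boundary and fills that slot, and that `prio_table[s]` lists all masters in decreasing priority for slot s with the slot owner first. `simulate_pd` and the rotating priority table are hypothetical. In this model the owner always ranks first in its own slot, so it is never delayed longer than under TDMA, while a slot whose owner is idle goes to the next pending master in that slot's priority list.

```python
from typing import Dict, List, Tuple


def simulate_pd(requests: List[Tuple[int, int]], prio_table: List[List[int]], ss: int) -> Dict[int, int]:
    """Slot-aligned Priority Division arbitration (simplified model).

    requests[i] = (arrival_cycle, master). Returns {request index: completion cycle}.
    """
    n_slots = len(prio_table)
    done: Dict[int, int] = {}
    waiting = list(range(len(requests)))
    slot_start = 0
    while waiting:
        s = (slot_start // ss) % n_slots  # fixed TDMA-like slot sequence
        ready = [i for i in waiting if requests[i][0] <= slot_start]
        if ready:
            rank = {m: r for r, m in enumerate(prio_table[s])}  # priorities inside slot s
            winner = min(ready, key=lambda i: (rank[requests[i][1]], requests[i][0]))
            done[winner] = slot_start + ss
            waiting.remove(winner)
        slot_start += ss  # the next slot begins whether or not this one was used
    return done


PRIO = [[0, 1, 2, 3], [1, 2, 3, 0], [2, 3, 0, 1], [3, 0, 1, 2]]  # master k owns slot k
SS = 8
# only the test master m3 is active: it also uses the idle slots 0-2
print(simulate_pd([(0, 3), (8, 3), (16, 3)], PRIO, SS))
# m0 misses its slot by one cycle while m1-m3 keep the memory busy: done at 40, as with TDMA
others = [(0, m) for m in (1, 2, 3)] * 2
print(simulate_pd([(1, 0)] + others, PRIO, SS)[0])
```

Every master heads exactly one row of `PRIO`, which satisfies the starvation-freedom condition on the slide.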
Priority Division arbiter - II
▪ Latency:
▪ Utilization:
Priority Division arbiter - Benefits
▪ Same WCET as TDMA at higher resource utilization
▪ Simple architecture
o Additional complexity compared to the complete system is negligible
▪ Incremental certification
o Using stress patterns on co-existing cores (m2 – m4)
Priority Division arbiter – h1 configuration
▪ Only one HRT master has the highest priority in all slots (mixed-criticality system with a single HRT master)
▪ Latency:
WLSP = 2 x SS
▪ Produces a lower WCET bound for the highest-priority master than SP, at a utilization penalty
Arbiter comparison: Complexity
Arbiter | Number of logic elements (LEs)
SP      | 281
TDMA    | 277
RR      | 288
PD      | 285
▪ Dynamically scheduled arbiters (SP, RR and PD) are slightly more complex than the statically scheduled arbiter (TDMA)
▪ Figures for a 4-port arbiter at 125 MHz on a Cyclone III FPGA
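The size of "slightly more complex" can be quantified directly from the LE counts in the table. The short Python computation below only reuses those four numbers; `LES` and `overhead_vs` are illustrative names.

```python
from typing import Dict

# LE counts from the complexity comparison (4-port arbiter, Cyclone III, 125 MHz)
LES: Dict[str, int] = {"SP": 281, "TDMA": 277, "RR": 288, "PD": 285}


def overhead_vs(baseline: str, table: Dict[str, int]) -> Dict[str, float]:
    """Relative LE overhead (percent) of each arbiter over `baseline`."""
    base = table[baseline]
    return {name: 100.0 * (les - base) / base for name, les in table.items()}


ov = overhead_vs("TDMA", LES)
for name, pct in ov.items():
    print(f"{name}: {pct:+.1f}%")  # SP: +1.4%, TDMA: +0.0%, RR: +4.0%, PD: +2.9%
```

The largest difference, RR over TDMA, is 11 LEs (about 4%); PD costs 8 LEs (about 2.9%) more than TDMA.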
Arbiter comparison: Test architecture
▪ Quad-core processor built from Nios II/f cores with 512-byte I$ and D$
▪ On-chip memory used as the shared main memory
▪ Test applications from the Mälardalen WCET benchmark suite
o Traces were recorded by probing the test master’s (m1’s) interface to the arbiter
o Utilization was measured by counting busy and idle cycles while the co-existing masters (m2 – m4) were kept off
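The utilization measurement reduces to counting busy and idle cycles. The sketch assumes a hypothetical per-cycle log `bus_busy` taken at the arbiter interface while m2 – m4 are off; `measured_utilization` is an illustrative name and does not reproduce the authors' measurement setup.

```python
from typing import Sequence


def measured_utilization(bus_busy: Sequence[bool]) -> float:
    """Share of observed cycles in which the shared memory was busy.

    bus_busy[k] is True if the memory served the test master in cycle k.
    """
    if not bus_busy:
        raise ValueError("empty observation window")
    busy = sum(1 for b in bus_busy if b)
    idle = len(bus_busy) - busy
    return busy / (busy + idle)


# hypothetical log: one 8-cycle burst per 32-cycle window, repeated three times
log = ([True] * 8 + [False] * 24) * 3
print(measured_utilization(log))  # 0.25
```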
Arbiter comparison: TDMA vs RR vs PD
(Figure annotations: advantage of PD over TDMA and RR; drawback of PD over RR)
Arbiter comparison: SP vs PDh1
(Figure annotations: advantage of PDh1 over SP; drawback of PDh1 over SP)
Conclusion
▪ Priority Division is a promising arbitration scheme for predictable, high-performance multi-core architectures
▪ Enables incremental certification and increases resource utilization at a minor increase in complexity compared to TDMA and SP (in h1 mode)
▪ PDh1 produces a lower WCET than SP for the highest-priority master
Thank you! Questions?