Transcript Slide 1

Predictable Programming on a Precision Timed Architecture
Hiren D. Patel
UC Berkeley
Joint work with:
Ben Lickly, Isaac Liu, Edward A. Lee - UC Berkeley
Sungjun Kim, Stephen A. Edwards - Columbia University
Edwards and Lee - Case for PRET
• 2007 – Edwards and Lee made a case for
precision timed computers (PRET machines)
– Predictability
– Repeatability
S. A. Edwards and E. A. Lee, The case for the precision timed (PRET) machine. In Proceedings of
the 44th Annual Conference on Design Automation (San Diego, California, June 04 - 08, 2007).
DAC '07. ACM, New York, NY, 264-265.
Patel, UC Berkeley, PRET
Edwards and Lee - Case for PRET
• Unpredictability
– Difficulty in determining timing behavior through analysis
• Non-repeatability
– Lack of guarantee that every execution yields the same timing behavior
• Brittleness
– Small changes have big effects on timing behavior
Brittleness
• Expensive affair
• Tight coupling of software and hardware
• Reliance on testing for validation
• Upgrading difficult
• Solution: stockpile
[Image source: www.skycontrol.net]
But wait …
• Real-time scheduling
– Worst-case execution time
• Detailed model of hardware
• Large engineering effort
• Valid for particular hardware models
– Interrupts, interprocess communication, locks …
• Bench testing
– Brittle
Sebastian Altmeyer, Christian Hümbert, Björn Lisper, and
Reinhard Wilhelm. Parametric Timing Analysis for Complex
Architectures. In Proceedings of the 14th IEEE
International Conference on Embedded and Real-Time
Computing Systems and Applications (RTCSA'08), pages
367-376, Kaohsiung, Taiwan, August 2008. IEEE Computer
Society.
Precise Timing and High Performance
Traditional                    Alternative
Caches                         Scratchpads
Deep out-of-order pipelines    Thread-interleaved pipelines
Function-only ISAs             ISAs with timing instructions
Function-only languages        Languages and programming models with timing
Best-effort communication      Fixed-latency communication
Time-sharing                   Multiple independent processors
Outline
• Introduction
• Related Work
• PRET Machine
• Programming Example
• Future Work
• Conclusion
Related Work
• Java Optimized Processor
– Schoeberl et al. [2003]
• Timing instructions
– Ip and Edwards [2006]
• Reactive processors
– Von Hanxleden et al. [2005]
– Salcic et al. [2005]
• Virtual Simple Architecture
– Mueller et al. [2003]
Semantics of Timing Instructions
• Deadline instructions
– Denote the required execution time of a block
• When decoded
– Stall instruction if timer value is not 0
– Otherwise set timer value to new value
deadi $t0, 10
…                   (Straight Line Block 0)
deadi $t0, 8
…                   (Straight Line Block 1)
deadi $t0, 0
…
L0:
…
deadi $t0, 10       (Loop Block)
b L0
…
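The deadline semantics above can be sketched in a few lines of Python. This is an illustrative model only, not the PRET implementation: it assumes a single timer that decrements once per tick and that every instruction (including `deadi` itself) takes one tick. The function name `deadi_issue_ticks` and the `(deadline, body_len)` encoding are hypothetical.

```python
def deadi_issue_ticks(blocks):
    """Return the tick at which each block's deadi takes effect.

    blocks: list of (deadline, body_len) pairs, where body_len is the
    number of ordinary instructions that follow the deadi.
    Assumption: one instruction per tick, one timer decrement per tick.
    """
    tick = 0
    timer = 0
    issued = []
    for deadline, body_len in blocks:
        tick += timer            # stall until the previous deadline has expired
        issued.append(tick)      # timer is 0 here, so deadi loads the new value
        timer = deadline
        elapsed = 1 + body_len   # the deadi itself plus the block body
        tick += elapsed
        timer = max(0, timer - elapsed)
    return issued


# Two straight-line blocks with deadlines 10 and 8, then a deadi of 0.
print(deadi_issue_ticks([(10, 4), (8, 3), (0, 2)]))
```

Under these assumptions the first block occupies exactly 10 ticks and the second exactly 8, regardless of how short their bodies are. A body that overruns its deadline is not stalled; the next `deadi` simply takes effect immediately.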
Tracing A Program Fragment
A: deadi $t0, 6
B: sethi %hi(0x3f800000), %g1
C: or %g1, 0x200, %g1
D: st %g1, [ %fp + -12 ]
E: deadi $t0, 8
F: …
[Figure: table tracing the cycle count and the $t0 timer value as the fragment executes]
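The values of the trace table did not survive in this transcript, so the following is a hedged reconstruction of what such a trace could look like, not the slide's actual numbers. It assumes one instruction per tick and one decrement of `$t0` per tick; in the real thread-interleaved pipeline the relation between clock cycles and timer decrements differs. The `trace` function and its row format are hypothetical.

```python
def trace(program):
    """Return (tick, timer, label) rows for a list of (label, op, arg) tuples."""
    timer, tick, rows = 0, 0, []
    for label, op, arg in program:
        if op == "deadi":
            # Stall until the previous deadline has counted down to 0.
            while timer > 0:
                rows.append((tick, timer, label + " (stall)"))
                timer -= 1
                tick += 1
            timer = arg
        rows.append((tick, timer, label))
        tick += 1
        timer = max(0, timer - 1)
    return rows


fragment = [
    ("A", "deadi", 6),
    ("B", "sethi", None),
    ("C", "or", None),
    ("D", "st", None),
    ("E", "deadi", 8),
]

for tick, timer, label in trace(fragment):
    print(f"tick {tick}: $t0 = {timer}  {label}")
```

In this model, instruction E stalls for two ticks, so the block from A to D takes the 6 ticks requested by the first `deadi`.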
Precision Timed Architecture
• Round-robin thread scheduling
• Thread-interleaved pipeline
• Scratchpad memories
• Time-triggered main memory access
Memory Hierarchy
• Clocks
– Main clock
– Derived clocks
• Instruction and data scratchpad memories
– 1 cycle access latency
• Main memory
– 16MB size
– Latency of 50ns
– Frequency: 250 MHz
– ~13 cycles latency
[Figure: block diagram with core, six scratchpad memories (SPM), DMA, and main memory]
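The "~13 cycles" figure follows from the two numbers quoted for main memory, assuming the latency is counted in cycles of the 250 MHz clock and rounded up to a whole cycle. A small sketch of that arithmetic (the helper name `latency_cycles` is hypothetical):

```python
import math


def latency_cycles(latency_ns, clock_mhz):
    """Whole clock cycles needed to cover a latency given in nanoseconds."""
    # ns * MHz / 1000 = number of cycles (50 ns * 250 MHz = 12.5 cycles)
    return math.ceil(latency_ns * clock_mhz / 1000)


print(latency_cycles(50, 250))
```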
Thread-interleaved Pipeline
• Thread stalls
– Main memory access
– Multi-cycle operations
– Deadline instructions
• Replay mechanism
– Execute same PC next iteration
– Multi-cycle ALU ops replay instructions
[Figure: pipeline stages Fetch, Decode, Reg. Access, Execute, Memory, WriteBack, separated by the F/D, D/R, R/E, E/M, and M/W registers; annotations: decrement deadline timers, stall if deadline instruction, check main memory access, increment PC]
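A minimal sketch of round-robin interleaving with replay, under simplifying assumptions: each thread gets one pipeline slot per round, and an instruction that needs k slots is replayed (same PC) k-1 times before it commits. The function `run_interleaved` and the `(name, slots_needed)` encoding are hypothetical and ignore the actual pipeline stages.

```python
def run_interleaved(programs, n_cycles):
    """Simulate round-robin thread interleaving with instruction replay.

    programs: one list per thread of (name, slots_needed) pairs.
    Returns a list of (cycle, thread_id, instruction, action) tuples.
    """
    n = len(programs)
    pc = [0] * n
    remaining = [prog[0][1] if prog else 0 for prog in programs]
    trace = []
    for cycle in range(n_cycles):
        tid = cycle % n                      # round-robin thread selection
        if pc[tid] >= len(programs[tid]):
            trace.append((cycle, tid, None, "idle"))
            continue
        name, _ = programs[tid][pc[tid]]
        remaining[tid] -= 1
        if remaining[tid] > 0:
            # Not finished: keep the same PC and replay in the next round.
            trace.append((cycle, tid, name, "replay"))
        else:
            trace.append((cycle, tid, name, "commit"))
            pc[tid] += 1
            if pc[tid] < len(programs[tid]):
                remaining[tid] = programs[tid][pc[tid]][1]
    return trace


demo = run_interleaved([[("add", 1), ("mul", 3)], [("ld", 2)]], 8)
for row in demo:
    print(row)
```

In this model a stalled thread only consumes its own slots: thread 1's `ld` commits in the same cycle whether or not thread 0 is replaying `mul`.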
Time-Triggered Access through Memory Wheel
• Decouple thread’s access pattern
• Time-triggered access
• Best-case access time
– If accessed 1st cycle of window
• Worst-case access time
– If accessed 2nd cycle of window
[Figure: memory wheel with one window per thread (thread0 to thread5), each access marked "On time"; 90 cycles until thread0 completes]
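The best and worst cases can be sketched with a simple model of the wheel. The parameters are assumptions read off the slides rather than confirmed values: six threads (thread0 to thread5), a 13-cycle window per thread (from the ~13-cycle main memory latency), and a request being served only if it arrives in the first cycle of the thread's own window. The function `wheel_latency` is hypothetical.

```python
def wheel_latency(offset, n_threads=6, window=13):
    """Cycles from a memory request until it completes.

    offset: cycles elapsed since the thread's own window opened.
    Assumption: a request is served only when issued in the first cycle
    of the window; otherwise it waits for the next rotation.
    """
    rotation = n_threads * window
    offset %= rotation
    wait = 0 if offset == 0 else rotation - offset
    return wait + window


print(wheel_latency(0))   # request in the 1st cycle of the window
print(wheel_latency(1))   # request in the 2nd cycle of the window
```

With these assumed parameters the second call yields 90 cycles, which agrees with the "90 cycles until thread0 completes" annotation in the figure.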
Tool Flow
• GCC 3.4.4, SystemC 2.2, Python 2.4
[Figure: tool flow; labels: boot code, C programs, timing instructions, GCC to compile boot code and program code, Motorola SREC files]
Simple Mutual Exclusion Example
• Producer followed by Consumer and Observer
– Consumer and Observer execute together
• Loop rate of two rotations of memory wheel
– 1st for Producer to write
– 2nd for Consumer and Observer to read
[Figure: annotated code; labels: write to shared data, read from shared data, write to output]
Video Game Example
[Figure: block diagram of the video game example]
• Threads: MainControl Thread, Graphic Thread, VGADriver Thread
• Command queues: Command Even Queue, Command Odd Queue
– Swap (when sync requested and when odd queue empty)
• Pixel data buffers: Pixel Data Even Buffer, Pixel Data Odd Buffer
– Swap (when sync requested and when vertical blank)
• Signals
– Update Screen (sync request)
– Refresh (sync request)
– Sync (after queue swapped)
– Sync (after buffer swapped)
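The swap condition on the command queues can be illustrated with a small double-buffering sketch. The slide does not say which thread fills or drains which queue, so the roles below (producer fills the even queue, consumer drains the odd queue) are assumptions, and the class `SwappableQueues` is hypothetical.

```python
from collections import deque


class SwappableQueues:
    """Even/odd queue pair that swaps only under the slide's stated condition."""

    def __init__(self):
        self.even = deque()          # assumed: filled by the producer
        self.odd = deque()           # assumed: drained by the consumer
        self.sync_requested = False

    def request_sync(self):
        self.sync_requested = True

    def try_swap(self):
        """Swap when a sync was requested and the odd queue is empty."""
        if self.sync_requested and not self.odd:
            self.even, self.odd = self.odd, self.even
            self.sync_requested = False
            return True              # stands in for "Sync (after queue swapped)"
        return False


queues = SwappableQueues()
queues.even.append("draw sprite")
print(queues.try_swap())   # no sync requested yet
queues.request_sync()
print(queues.try_swap())   # sync requested and odd queue empty
```

The pixel data buffers would follow the same pattern with "vertical blank" in place of "odd queue empty" as the second condition.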
Timing Requirements
Signal             Timing Requirement    Pixel Cycles
V. Sync            64µs                  1611
V. Back-porch      1.02ms                25679
Draw 480 lines     15.25ms
V. Front-porch     350µs                 8811
H. Sync            3.77µs                96
H. Back-porch      1.89µs                48
Draw 640 pixels    25.42µs
H. Front-porch     0.64µs                16
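The pixel-cycle column can be approximated by multiplying each duration by the pixel clock. The sketch below assumes the 25.175 MHz pixel clock quoted elsewhere in these slides. Because the listed durations are rounded, the computed counts can differ from the table by a cycle or so. The helper `pixel_cycles` is hypothetical.

```python
PIXEL_CLOCK_HZ = 25.175e6  # pixel clock quoted on the slides

# (signal, duration in seconds) as listed in the timing table
REQUIREMENTS = [
    ("V. Sync", 64e-6),
    ("V. Back-porch", 1.02e-3),
    ("Draw 480 lines", 15.25e-3),
    ("V. Front-porch", 350e-6),
    ("H. Sync", 3.77e-6),
    ("H. Back-porch", 1.89e-6),
    ("Draw 640 pixels", 25.42e-6),
    ("H. Front-porch", 0.64e-6),
]


def pixel_cycles(seconds, clock_hz=PIXEL_CLOCK_HZ):
    """Nearest whole number of pixel-clock cycles in the given duration."""
    return round(seconds * clock_hz)


for name, seconds in REQUIREMENTS:
    print(f"{name:16s} {pixel_cycles(seconds):7d}")
```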
Timing Implementation
• Pixel-clock using derived clock
– 25.175 MHz
– ~39.72ns cycle period
• Drawing 16 pixels
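As a quick check of the numbers above, the cycle period is the reciprocal of the pixel clock, and a 16-pixel burst lasts 16 periods. This is plain arithmetic on the quoted frequency, and the variable names are illustrative only.

```python
PIXEL_CLOCK_HZ = 25.175e6            # derived pixel clock from the slide

period_ns = 1e9 / PIXEL_CLOCK_HZ     # one pixel-clock period in nanoseconds
burst_ns = 16 * period_ns            # time to draw 16 pixels

print(f"period: {period_ns:.2f} ns, 16 pixels: {burst_ns:.1f} ns")
```

Sixteen periods come to roughly 0.64µs.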
Future Work
• Architecture
– DMA
– DDR2 main memory model
– Thread synchronization primitives
– Shared data between threads
• Real-time Benchmarks
– With timing requirements
• Programming models
– Memory allocation schemes
– Synchronizations
Conclusion
• What we want …
– Time as a first-class citizen of embedded computing
– Predictability
– Repeatability
• Where we are at …
– PRET cycle-accurate simulator
– Release …
Extras
More on Brittleness
• Small changes may have big effects on
timing behavior
Theorem (Richard’s anomalies):
If a task set with fixed priorities, execution times, and
precedence constraints is optimally scheduled on a fixed
number of processors, then increasing the number of
processors, reducing execution times, or weakening
precedence constraints can increase the schedule length.
Richard L. Graham, “Bounds on the performance of scheduling algorithms”, in E. G. Coffman,
Jr.(ed.), Computer and Job-Shop Scheduling Theory, John Wiley, New York, 1975.
Richard’s Anomalies
• 9 tasks, 3 processors, priority list,
precedence order, execution times.
Tasks (name/execution time): T1/3, T2/2, T3/2, T4/2, T5/4, T6/4, T7/4, T8/4, T9/9
[Figure: schedule diagram for tasks 1 to 9; time axis marked 0, 3, 12]
Richard’s Anomalies: Reducing Execution Times
• eTime’ = eTime - 1
Tasks (name/execution time): T1/2, T2/1, T3/1, T4/1, T5/3, T6/3, T7/3, T8/3, T9/8
[Figure: schedule diagram for tasks 1 to 9; time axis marked 0, 3, 12]
Richard’s Anomalies: More Processors
• 4 processors
Tasks (name/execution time): T1/3, T2/2, T3/2, T4/2, T5/4, T6/4, T7/4, T8/4, T9/9
[Figure: schedule diagram for tasks 1 to 9; time axis marked 0, 3, 12, 15]
Richard’s Anomalies: Changing Priority List
• L = (T1,T2,T4,T5,T6,T3,T9,T7,T8)
Tasks (name/execution time): T1/3, T2/2, T3/2, T4/2, T5/4, T6/4, T7/4, T8/4, T9/9
[Figure: schedule diagram for tasks 1 to 9; time axis marked 0, 3, 12]
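The anomalies can be reproduced with a small list-scheduling simulation. The execution times and priority lists come from the slides, but the precedence graph did not survive in this transcript. The sketch therefore assumes the constraints of the classic textbook instance (T1 before T9; T4 before T5, T6, T7, and T8), which may differ from what the slide showed. The function `list_schedule` is a hypothetical helper, not code from the talk.

```python
def list_schedule(exec_time, preds, priority, n_procs):
    """Greedy list scheduling; returns the schedule length (makespan).

    Whenever a processor is free, it takes the first task in the priority
    list whose predecessors have all completed.
    """
    finish = {}                      # task -> completion time
    busy_until = [0] * n_procs       # time at which each processor frees up
    pending = list(priority)
    t = 0
    while pending:
        for p in range(n_procs):
            if busy_until[p] > t:
                continue
            ready = next(
                (x for x in pending
                 if all(finish.get(q, float("inf")) <= t
                        for q in preds.get(x, ()))),
                None,
            )
            if ready is None:
                break                # nothing is ready for any free processor
            pending.remove(ready)
            finish[ready] = t + exec_time[ready]
            busy_until[p] = finish[ready]
        if pending:
            t = min(f for f in finish.values() if f > t)  # next completion
    return max(finish.values())


TIMES = {"T1": 3, "T2": 2, "T3": 2, "T4": 2, "T5": 4,
         "T6": 4, "T7": 4, "T8": 4, "T9": 9}
# Assumed precedence constraints (classic textbook instance).
PREDS = {"T9": ["T1"], "T5": ["T4"], "T6": ["T4"], "T7": ["T4"], "T8": ["T4"]}
ORDER = ["T1", "T2", "T3", "T4", "T5", "T6", "T7", "T8", "T9"]
REORDERED = ["T1", "T2", "T4", "T5", "T6", "T3", "T9", "T7", "T8"]
SHORTER = {task: time - 1 for task, time in TIMES.items()}

print("baseline, 3 processors:", list_schedule(TIMES, PREDS, ORDER, 3))
print("execution times - 1:   ", list_schedule(SHORTER, PREDS, ORDER, 3))
print("4 processors:          ", list_schedule(TIMES, PREDS, ORDER, 4))
print("changed priority list: ", list_schedule(TIMES, PREDS, REORDERED, 3))
```

Under the assumed constraints, the baseline finishes at time 12, while each of the three "improvements" produces a longer schedule. The 4-processor result of 15 matches the extra tick on that slide's time axis.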
Brittleness Again…
• In general, all task scheduling strategies
are brittle