Embedded Code Generation from High

Model-Based Design: an instance
Stavros Tripakis
Cadence Berkeley Labs
Talk at EE249, Nov 2006
Model based design: what and why?

[Diagram: application design in high-level languages (Stateflow, UML, Simulink, …) is mapped to an implementation on an execution platform: single-processor/single-task, single-processor/multi-task, or multi-processor (TTA, CAN, …).]
Model based design: benefits and challenges

• Benefits:
  – Increase level of abstraction => ease of design
  – Abstract from implementation details => platform-independence
  – Earlier verification => bugs cheaper to fix
  – Design space exploration (at the “algorithmic” level)
  – Consistent with history (e.g., of programming languages)
• Challenges:
  – High-level languages include powerful features, e.g.,
    • concurrency, synchronous (“0-time”) computation/communication, …
  – How to implement these features?
    • Do we even have to?
Model based design – the Verimag approach
(joint work with P. Caspi, C. Sofronis and A. Curic)

[Diagram: application design (Stateflow, UML, Simulink, …) is translated to Lustre [EMSOFT’03, EMSOFT’04]; validation/verification on Lustre [classic]; implementation on single-processor/single-task [LCTES’03], on single-processor/multi-task and multi-processor TTA [ECRTS’04, EMSOFT’05, ’06]; other platforms: CAN, …]
Lego robots - movie
Agenda (flexible)

• Part I – from synchronous models to implementations
  – Lustre and synchronous programming
  – Single-processor/single-task code generation
  – Multi-task code generation: the RTW solution
  – Multi-task code generation: a general solution
  – Implementation on a distributed platform: TTA (not today)
• Part II – handling Simulink/Stateflow
  – Simulink: type/clock inference and translation to Lustre
  – Stateflow: static checks and translation to Lustre
Synchronous programming

• A French specialty, it seems…
  – Esterel [Berry, circa 1985]
  – Lustre [Caspi, Halbwachs, circa 1987]
  – Signal [Le Guernic et al, circa 1991]
• Lots of mythology…
• The simple truth:
  – Assume that the program is fast enough to keep up with changes in the environment (the “synchrony hypothesis”)
  – Sometimes called the “zero-time hypothesis”
  – No different from the model of a Mealy machine, Verilog, etc.!
Lustre

• “functional”, “dataflow” style
• Basic entities are “flows”: infinite sequences of values
• Time is discrete and “logical”:
  – 1, 2, 3, … does not mean 1ms, 2ms, 3ms, …
  – In fact time instants can very well be events
• Flows have associated “clocks”:
  – The clock tells whether a flow value is defined or not at the current instant
  – Can “sample” or “hold” flows
  – Synchronous: cannot combine flows with different clocks (why? I can do that in Kahn process networks)
• See tutorial (in English!):
  – http://www-verimag.imag.fr/~halbwach/lustre-tutorial.html
• See Pascal Raymond’s course slides (in French!):
  – http://www-verimag.imag.fr/~raymond/edu/lustre.pdf
Code generation: single-processor, single-task

• Code that implements a state machine:

[Diagram: inputs and memory (state) feed a step function (transition), which produces outputs and the next state.]

  initialize;
  repeat forever
    await trigger;
    read inputs;
    compute next state and outputs;
    write outputs;
    update state;
  end repeat;

• See Pascal Raymond’s course slides (in French!):
  – http://www-verimag.imag.fr/~raymond/edu/compil-lustre.pdf
Single-processor, single-tasking (1)

• One computer, no RTOS (or minimal), one process running
• Process has the following structure:

  initialize state;
  repeat forever
    await trigger;
    read inputs;
    compute new state and outputs;
    update state;
    write outputs;
  end repeat;

[Diagram: dataflow network of blocks A, B, C, where A and C feed B; compute =]

  a := A(inputs);
  c := C(inputs);
  out := B(a, c);

• Trigger may be periodic or event-based
• Compute = “fire” all blocks in order (no cycles are allowed)
• Some major issues:
  – Estimate WCET (worst-case execution time)
    • “Hot” research topic, some companies also (e.g., AbsInt, Rapita, …)
  – Check that WCET <= trigger period (or minimum inter-arrival time)
Single-processor, single-tasking (2)

• One computer, no RTOS (or minimal), one process running
• Process has the following structure:

  initialize state;
  repeat forever
    await trigger;
    write (previous) outputs;  /* reduce jitter */
    read inputs;
    compute new state and outputs;
    update state;
  end repeat;

• Other major issues:
  – Moving from floating-point to fixed-point arithmetic
  – Evaluating the effect of jitter in outputs
  – Program size vs. memory
• Yet another issue:
  – Causality: how to handle dependency cycles
  – Different approaches
  – Unclear how important in practice
Code generation: single-processor, multi-task

• Multiple processes (tasks) running on the same computer
• Real-time operating system (RTOS) handles scheduling:
  – Usually fixed-priority scheduling:
    • Each task has a fixed priority; higher-priority tasks preempt lower-priority tasks
  – Sometimes other scheduling policies
    • E.g., EDF = earliest deadline first
• Questions:
  – Why bother with single-processor, multi-tasking?
  – What are the challenges?
Single-processor, multi-tasking: why bother?

• Why bother?
  – For multi-rate applications: blocks running at different rates (triggers)
  – Example: block A runs at 10 ms, block B runs at 40 ms

[Timeline figures: ideally, every instance of A and B completes within its period. With single-tasking, a long execution of B delays instances of A. With multi-tasking, B is preempted by A and resumes afterwards, so A always meets its rate.]

WHAT IF TASKS COMMUNICATE?
Single-processor, multi-tasking issues

• Fast-to-slow transition (high-to-low priority) problems:

[Figure: writer and reader communicating through 1 register.]

What would be the standard solution to this?

(*) Figures are cut-and-pasted from RTW User’s Guide
Single-processor, multi-tasking issues

• Fast-to-slow transition (high-to-low priority) problems:

[Figure: writer and reader communicating through 2 registers.]

• RTW solution:
  – RT (rate transition) block
  – High priority
  – Low rate
Bottom-line: reader copies value locally when it starts

Does it work for more general arrival patterns?
• No:
  – Must know when to execute the RT block
  – Depends on relative periods of writer/reader clocks
• More serious problems:
  – See examples later in this talk
• Inefficiency:
  – Copying large data can take time…
State of the art: Real Time Workshop (RTW)

• Simulink/Stateflow’s code generator
• “deterministic” option in rate transition blocks
• Limited solution:
  – Only in the multi-periodic harmonic case (periods are multiples)
  – Rate-monotonic priorities (faster task gets higher priority)
• Not memory efficient:
  – Separate buffers for each writer/reader pair
• Other related work:
  – Baleani et al. @ PARADES: upper/lower bounds on #buffers, applicable to general architectures
A better, general solution [ECRTS’04, EMSOFT’05, ’06]

• The Dynamic Buffering Protocol (DBP)
  – Synchronous semantics preservation
  – Applicable to any arrival pattern
    • Known or unknown
    • Time- or event-triggered
  – Memory optimal in all cases
  – Known worst-case buffer requirements (for static allocation)
• Starting point: abstract synchronous model
  – Set of tasks
  – Independently triggered
  – Communicating
  – Synchronous (“zero-time”) semantics
The model: an abstraction of Simulink, Lustre, etc.

• A set of communicating tasks
• Time- or event-triggered

[Diagram: task graph with tasks T1–T5 connected by communication links.]
The model: semantics

• Zero-time => “freshest” value

[Diagram: the T1–T5 task graph, and a timeline of arrivals T1, T3, T1, T2, T3, T4; each task reads the freshest value produced before its arrival.]
Execution on a real platform

• Execution takes time
• Pre-emption occurs

[Diagram: the same task graph and arrival timeline, now with non-zero execution times; T1 pre-empts T3.]
Assumption: schedulability

• When a task arrives, all previous instances have finished execution.

[Timeline: a second instance of T1 arrives before the first finishes – not schedulable.]

• How to check schedulability? Use scheduling theory!
• (Will have to make assumptions on task arrivals.)
Issues with a “naïve” implementation (1)

• Static-priority, priority of T2 > priority of T1

[Timelines – ideal vs. real: T1 is pre-empted, so T2 gets the wrong value.]

(*) “naïve” = atomic copy locally when task starts
Issues with a “naïve” implementation (1)

• Static-priority, priority of T2 > priority of T1

[Timeline: with a unit-delay (“pre”) between T1 and T2, the ideal semantics is preserved.]

• Assumption: if the reader has higher priority than the writer, then there is a unit-delay (“pre”) between them.
• (RTW makes the same assumption.)
Issues with a “naïve” implementation (2)

[Diagram – ideal semantics: task graph Q, A, B, and an arrival timeline Q, A, A, B, A.]
Issues with a “naïve” implementation (2)

[Diagram – real implementation, with PrioQ > PrioA > PrioB: on the same arrival pattern Q, A, A, B, A, reader B copies the wrong value – ERROR.]
The DBP protocols
• Basic principle:
– “Memorize” (implicitly) the arrival order of tasks
• Special case: one writer/one reader
• Generalizable to one writer/many readers (same data)
• Generalizable to general task graphs
One writer/one reader (1)

• Low-to-high case: writer L, reader H, with a unit-delay (“pre”) between them
  – L keeps a double buffer B[0,1]
  – Two bits: current, previous
  – L writes to: B[current]
  – H reads from: B[previous]
  – When L arrives: current := not current
  – When H arrives: previous := not current
  – Initially: current = 0, B[0] = B[1] = default
One writer/one reader (2)

• High-to-low case: writer H, reader L
  – L keeps a double buffer B[0,1]
  – Two bits: current, next
  – H writes to: B[next]
  – L reads from: B[current]
  – When L arrives: current := next
  – When H arrives: if (current = next) then next := not next
  – Initially: current = next = 0, B[0] = B[1] = default
Dynamic Buffering Protocol (DBP)

• N1 lower-priority readers
• N2 lower-priority readers with unit-delay
• M higher-priority readers (with unit-delay by default)
• Unit-delay: a delay to preserve the semantics
  – Read the previous input
The DBP protocol (1)

• Writer maintains:
  – Buffer array: B[1..N+2]
  – Pointer array: P[1..M]
  – Pointer array: R[1..N]
  – Two pointers: current, previous
• Writer:
  – Release:
    previous := current
    current := some j ∈ [1..N+2] such that free(j)
  – Execution:
    write on B[current]
The DBP protocol (2)

• Lower-priority reader:
  – Release:
    if unit-delay then R[i] := previous
    else R[i] := current
  – Execution:
    read from B[R[i]]
• Higher-priority reader:
  – Release:
    P[i] := previous
  – Execution:
    read from B[P[i]]
Example of usage of DBP

[Figure sequence, three steps: a writer w with low- and high-priority readers; buffers y0, y1, y2, y3 are taken as writer instances arrive, while the prev and curr pointers move so that each reader reads from the buffer current at its release.]
Savings in memory

• One buffer per writer/reader pair: 14 buffers
• DBP:
  – 1 → 2 buffers
  – 3 → 4 buffers
  – 4 → 2 buffers
• Total: 8 buffers
Worst-case buffer consumption

• DBP never uses more than N1 + N2 + 2 buffers
  – N1 lower-priority readers
  – N2 lower-priority readers with a unit-delay
  – M higher-priority readers
• If N2 = M = 0 then the upper bound is N1 + 1
  – There is no previous to remember
Optimality

• DBP is memory optimal in any execution
• Let σ be some execution
  – maybeneeded(σ, t):
    • buffers used now
    • buffers that may be used until the next execution of the writer
  – DBP_used(σ, t):
    • buffers used by the DBP protocol
• Theorem: for all σ, t: DBP_used(σ, t) ≤ maybeneeded(σ, t)
Optimality for known arrival pattern

• DBP is non-clairvoyant
  – Does not know future arrivals of tasks
  – => it may keep info for a reader that will not arrive until the next execution of the writer: redundant
• How to make DBP optimal when task arrivals are known? (e.g., multi-periodic tasks)
• Two solutions:
  – Dynamic: for every writer, store output only if it will be needed (known, since the readers’ arrivals are known)
  – Static: simulate task arrivals until the hyper-period (if possible)
• Standard time vs. memory trade-off
Conclusions and perspectives (part I)

• Dynamic Buffering Protocol
  – Synchronous semantics preservation
  – Applicable to any arrival pattern
    • Known or unknown
    • Time- or event-triggered
  – Memory optimal in all cases
  – Known worst-case buffer requirements (for static allocation)
• Relax the schedulability assumption
• More platforms (in the model based approach)
  – CAN, FlexRay, …
• Implement the protocols and experiment
• BIG QUESTION: how much does all this matter for control???
Agenda (flexible)

• Part I – from synchronous models to implementations
  – Lustre and synchronous programming
  – Single-processor/single-task code generation
  – Multi-task code generation: the RTW solution
  – Multi-task code generation: a general solution
  – Implementation on a distributed platform: TTA (not today)
• Part II – handling Simulink/Stateflow
  – Simulink: type/clock inference and translation to Lustre
  – Stateflow: static checks and translation to Lustre
Simulink™

• Designed as a simulation tool, not a programming language
• No formal semantics:
  – Semantics depend on simulation parameters
  – No timing modularity
  – Typing depends on simulation parameters
• We translate only discrete-time Simulink (with no causality cycles)
From Simulink/Stateflow to Lustre

• Main issues:
  – Understand/formalize Simulink/Stateflow
  – Solve specific technical problems
    • Some are Lustre-specific, many are not
  – Implement
    • Keep up with The Mathworks’ changes
A strange Simulink behavior

[Figure: two parts of a model, sampled at 2 ms and at 5 ms, connected through a Gain block.]

• With Gain: model rejected by Simulink
• Without Gain: model accepted!
Translating Simulink to Lustre

• 3 steps:
  – Type inference:
    • Find whether signal x is “real”, “integer”, or “boolean”
  – Clock inference:
    • Find whether x is periodic (and its period/phase) or triggered/enabled
  – Block-by-block, bottom-up translation:
    • Translate basic blocks (adder, unit delay, transfer function, etc.) as predefined Lustre nodes
    • Translate meta-blocks (subsystems) hierarchically
Simulink type system

• Polymorphic types:
  – “parametric” polymorphism (e.g., “Unit Delay” block)
  – “ad-hoc” polymorphism (e.g., “Adder” block)
• Basic block type signatures:
  – Constant: α, α ∈ {double, single, int32, int16, …}
  – Adder: α × … × α → α, α ∈ {double, …}
  – Relation: α × α → boolean, α ∈ {double, …}
  – Logical Operator: boolean × … × boolean → boolean
  – Disc. Transfer Function: double → double
  – Unit Delay: α → α
  – Data Type Converter: α → β
• Type-inference algorithm: unification [Milner]
Time in Simulink

• Simulink has two timing mechanisms:
  – Sample times: (period, phase)
    • Can be set in blocks: in-ports, UD, ZOH, DTF, …
    • Define when the output of a block is updated.
    • Can be inherited from inputs or parent system.
  – Triggers (or “enables”):
    • Set in subsystems.
    • Define when a subsystem is “active” (outputs updated).
    • The sample times of all children blocks are inherited.

[Diagram: a triggered subsystem B inside system A, with signals x, y, z, w and trigger s.]

Simulink triggers = Lustre clocks
Sample times in Simulink

• Greatest-common-divisor (GCD) rule:
  – A block fed with inputs with different rates runs at their GCD:
    x at 2 ms, y at 3 ms => output z at 1 ms
• Other timing rules, e.g.:
  – Insert a unit delay when passing from a “slow” block to a “fast” block.
Formalization

• Sample time signatures of basic blocks:

[Table of sample-time signatures omitted in transcript.]
Sample time inference algorithm

• Sample times = types = terms:
  – α (unknown)
  – (1, 0)
  – (2, 1)
  – GCD(t1, t2)
• Terms simplify to a canonical form:
  – GCD(α, (2,0), (3,0), β) → GCD((1,0), α, β)
• Term unification, e.g.:
  – From the equations z = GCD(x, y) and x = z
  – we get x = GCD(x, y),
  – thus x = GCD(y),
  – thus x = y = z.
Overview of clock inference algorithm

• Infer the sample time of every Simulink signal.
• Check Simulink’s timing rules.
• Create Lustre clocks for Simulink sample times and triggers:
  – Basic clock: GCD of all sample times, e.g., 1 ms.
  – Other clocks: multiples of the basic clock, e.g., the flow true, false, true, false, … for 2 ms.
Stateflow

• Main problem: “unsafe” features
  – Non-termination of the simulation cycle
  – Stack overflow
  – Backtracking without “undo”
  – Semantics depends on graphical layout
• Other problems:
  – “Early return logic”: returning to an invalid state
  – Inter-level transitions
  – …
Stateflow problems: non-terminating loops

• Junction networks:

[Figure: a junction network whose simulation cycle loops forever.]
Stateflow problems: stack overflow

• When an event is broadcast:
  – Recursion and run-to-completion
• Stack overflow:

[Figure: an event broadcast causing unbounded recursion.]
Stateflow problems: backtracking without “undo”

[Figure omitted in transcript.]
Stateflow problems: semantics depends on layout

• “top-to-bottom, left-to-right” rule for states
• “12 o’clock” rule for transitions
Stateflow problems: “early return logic”

• Return to a non-active state:

[Figure omitted in transcript.]
A “safe” subset of Stateflow

• Safe = terminating, bounded-memory, “clean”
• Problem undecidable in general
• Different levels of “safeness”:
  – Static checks (cheap but strict)
  – Dynamic verification (heavy but less strict)
A statically safe subset of Stateflow

• Static checks include:
  – Absence of multi-segment loops
  – Acyclicity of triggering/emitted events
  – No assignments in intermediate segments
  – Outgoing junction conditions form a cover (implies no deadlocks)
  – Outgoing junction conditions are disjoint (implies determinism)
From Stateflow to Lustre
• Main difficulty:
– Translating state-machines into dataflow
• Approach:
– Encode states with Boolean variables
– Encode execution order by “dummy” dependencies
Clock Inference

• Sampling: x has period 1, y has period 2:

  cl_1_2 = make_cl_1_2();
  y = x when cl_1_2;
  -- cl_1_2 = {true, false, true, false, …}

• Zero-order hold: x has period 2, y has period 3; block A produces z at period 1:

  xc = current(x);
  yc = current(y);
  z = A(xc, yc);
Translation to Lustre

• Encoding of states and events as boolean flows
• “Mono-clock”

[Figure: a state machine with states Off and On, and transitions on events Set and Reset.]

node SetReset0(Set, Reset: bool)
returns (sOff, sOn: bool);
let
  sOff = true ->
    if pre sOff and Set then false
    else if (pre sOn and Reset) then true
    else pre sOff;
  sOn = false ->
    if pre sOn and Reset then false
    else if (pre sOff and Set) then true
    else pre sOn;
tel
End…
“hi2low” protocol demonstration

[Figure sequence, four steps, with PrioQ > PrioA > PrioB: high-priority writer Q and lower-priority readers A, B; as instances of Q, A, B arrive, the next and current pointers move over buffers y1, y2, so each reader keeps reading from a stable buffer despite preemption.]
Execution model

[Figure omitted in transcript.]

Example of usage

[Figure omitted in transcript.]
Readings
• Overall approach – LCTES’03 paper:
– http://www-verimag.imag.fr/~tripakis/papers/lctes03.ps
• Simulink to Lustre - ACM TECS’05 paper:
– http://www-verimag.imag.fr/~tripakis/papers/acm-tecs.pdf
• Stateflow to Lustre – EMSOFT’04 paper
– http://www-verimag.imag.fr/~tripakis/papers/emsoft04.pdf
• Multi-task implementations – ECRTS’04, EMSOFT’05,’06 papers:
– http://www-verimag.imag.fr/TR/TR-2004-12.pdf
– http://www-verimag.imag.fr/~tripakis/papers/emsoft05.pdf
– http://www-verimag.imag.fr/~tripakis/papers/emsoft06.pdf
• A tutorial chapter on synchronous programming:
– http://www-verimag.imag.fr/~tripakis/papers/handbook07.pdf