Transcript Slide 1
Memory Oriented System-level Optimizations
for Scripting Enabled Embedded Systems
Jiwon Hahn
PhD Qualifying Exam
University of California, Irvine
March 2006
Motivation
▶ Embedded system development
Growing challenges
Increasing end-user’s expectation
More functionality
Higher performance
Cheaper
Smaller
[Figure: eco node; applications shown: motion sensing, physiological sensing, structural health monitoring, preterm infant monitoring]
Very short time-to-market
Wide gap between available techniques and user
satisfaction
Need new tools and methodology!
Jiwon Hahn, UC Irvine
2
Strategies
Speed up the development!
Need better programming/debugging methodology
and tool
Improve the current system’s bottleneck!
Memory unit is one of the most costly components,
and affects system’s performance, power, and
overall application range
Maximize the system’s capability!
Since an embedded system is resource-constrained, it
helps to partition the system workload and offload part of it to the host
About My Research
Framework
Enhanced programming/debugging methodology
Host-assisting runtime environment
Optimization
Reducing data memory requirements and
increasing memory utilization
Power and performance co-optimization
Outline
Scripting Framework
Memory-oriented Optimization
Implementation
Experimental Platforms
Summary & Research Plan
Outline
▶ Scripting Framework
⊳ Scripting Engine Synthesis
⊳ Runtime Environment
⊳ Preliminary Results
Memory-oriented Optimization
Implementation
Experimental Platforms
Summary & Research Plan
Motivating Example
▶ Building a small embedded system
Application
sense temperature,
send to the host every 5 min.
Hardware
temperature sensor
Solder RF module
Software (or Firmware)
no OS support!
no interactivity
no partial testing
Platform
TecO particle
17 x 35 mm
PIC18LF452 at 20 MHz
32KB program Flash
1.5KB RAM
32KB external EEPROM
temperature sensor
RF interface
Etc.
Development steps (repeat)
1. Write the FW (C/assembly)
2. Compile
3. Connect board to the host
4. Enter the bootloading mode
5. Erase/Load/Verify Program
6. Restart the board
7. Run
Motivation
▶ Alternative approach: Scripting!
Environment Setup
1. Generate the FW
(Scripting engine synthesis)
2. Compile
3. Connect board to the host
4. Enter the bootloading mode
5. Erase/Load/Verify Program
6. Restart the board
7. Run
Scripting (repeat)
1. Write the script
2. Connect board to the host
3. Load & Run
Scripting Engine Synthesis + Runtime
Motivation
▶ Scripting vs. Traditional Programming
Aspects | Traditional | Scripting
Language | C, Assembly; less human readable | Python, Tcl, Perl, …; higher level
System Query | No interactivity; need oscilloscope, multimeter to check the status | Instant feedback
System Update | Recompile, reboot required | On-the-fly
Code Size | 5x~10x more lines | Shorter [J. Ousterhout ’98]
Performance Overhead | None | Scripting engine dependent (could be None or less)
Related Work
▶ Frameworks for runtime support
Name | high level (language) | interactivity | reconfigurability | kernel synthesis | hetero. sys. | code size
SOS | no (C) | no | yes* | yes | yes | 20K
Mate | no (asm-like) | no | yes* | no | no | 39K
TinyOS | no (nesC) | no | yes | yes* | no | 18K
Agilla | no (asm-like) | yes | yes* | no | no | 55K
Pushpin | no (C-subset) | no | yes* | no (berthaOS) | no | 34K
Sensorware | yes* (Tcl) | yes | yes* | no | no | >237K
Actornet | yes* (S-expression) | N/A | yes | no | no | <128K
VM* | yes (java) | no | yes* | yes | N/A | 25K
Our work | yes (python-like) | yes | yes | yes | yes | <17K
Our Framework: Rappit
▶ Overview
Host
Rappit S/W
>> readTemperature()
52
Wired/Wireless link
Target System
Application Script
Rappit F/W
Receive packets
Interpret the command
Execute primitives (e.g., ADC read)
Return the result
Device Drivers
H/W Device
[Figure inset: C source listing]
#include <stdio.h>
void main(void)
{
int a;
...
for(i=0;i<2;i++)
{
...
a = b * c;
}
...
return;
}
Framework to provide the user with an integrated scripting environment
for the host and target systems
Rappit
▶ Scripting engine synthesis
System Description (Host)
Architecture
# example: pin mapping for an RF module
mcu = MCU(ATmega169) # instantiate an atmega169 MCU
import RF # load a transceiver module
rf = RF(nRF2401) # instantiate nRF2401
rf.CS = mcu.PORTB[0] # connect the chip select pin
rf.CE = mcu.PORTB[1] # connect the chip enable pin
rf.DR1 = mcu.PORTB[2] # connect the data ready pin
rf.CLK1 = mcu.PORTF[1] # connect the clock pin
rf.DOUT1 = mcu.PORTF[2] # connect the data pin
Application
Interactive Language
Communication
Message format
# example: packet format
c_format = src(1),dst(1),msgID(1),opcode(1),arg(3),crc(1)
r_format = src(1),dst(1),msgID(1),mtype(1),dtype(1),\
data(v), crc(1),eop(1)
Code Synthesis
Component Library
// part of primitives
char ADC_read(void)
{
…
}
void RF_send(char pck)
{
…
}
Binary Executable
Compatible Host S/W (Parser, MsgGen, GUI, …)
Target F/W (Scripting Engine, Primitives, …)
// part of Scripting engine
switch (opcode)
{
case 0x00:
val = ADC_read();
case 0x01:
RF_send(val);
case 0x02:
RF_packetize(val);
…
}
Target System
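The command packet layout on this slide, c_format = src(1),dst(1),msgID(1),opcode(1),arg(3),crc(1), can be illustrated with a short host-side sketch. This is not Rappit's actual message generator: the function names are hypothetical, and because the slides do not say how the crc byte is computed, a simple XOR checksum stands in for it.

```python
import struct

# Field widths in bytes, copied from the c_format line on the slide.
C_FORMAT = [("src", 1), ("dst", 1), ("msgID", 1), ("opcode", 1), ("arg", 3), ("crc", 1)]
PACKET_SIZE = sum(width for _, width in C_FORMAT)  # 8 bytes in total


def xor_checksum(payload: bytes) -> int:
    """Stand-in 1-byte check value (assumption: the real CRC is not specified)."""
    value = 0
    for byte in payload:
        value ^= byte
    return value


def pack_command(src, dst, msg_id, opcode, args=(0, 0, 0)):
    """Build one fixed-size command packet (hypothetical helper)."""
    if len(args) != 3:
        raise ValueError("arg field is exactly 3 bytes")
    body = struct.pack("BBBB3B", src, dst, msg_id, opcode, *args)
    return body + bytes([xor_checksum(body)])


def unpack_command(packet: bytes):
    """Parse a packet as a resource-constrained node might: fixed offsets only."""
    if len(packet) != PACKET_SIZE:
        raise ValueError("unexpected packet length")
    body, crc = packet[:-1], packet[-1]
    if xor_checksum(body) != crc:
        raise ValueError("checksum mismatch")
    src, dst, msg_id, opcode, a0, a1, a2 = struct.unpack("BBBB3B", body)
    return {"src": src, "dst": dst, "msgID": msg_id, "opcode": opcode, "arg": (a0, a1, a2)}


packet = pack_command(src=1, dst=2, msg_id=7, opcode=0x00)
decoded = unpack_command(packet)
```

Because every field sits at a fixed offset, the target only has to index into the buffer, which is the "easy to parse at node" property the host-assistance design relies on.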
Rappit
▶ Runtime environment
[Figure: runtime environment block diagram. Host (host-assisting modules): GUI, Parser, Optimizer, Msg Generator, Component Library, Packet Manager, Pck Buffer, Pcktzer/Depcktzer, Pcktzer/Dispatcher. Target System: Admission Controller, Scripting Engine, Native Routines. The host sends commands and the target returns responses.]
Rappit
▶ Host assistance
Script Parsing (Parser)
“readTemp()”
• User friendly syntax
→ Host Parser, Msg. generator → To target node
“0x4A0x01”
• Easy to parse at node
• Compact and efficient representation
Memory Management (Optimizer)
Raw script
• Written by user
→ Script Scheduler, Buffer Mapper → To target node
Optimized script
• Minimal script size
• Minimized memory usage
• Minimized runtime overhead
(Fixed schedule and buffer usage)
Rappit
▶ Scripting examples
Interactive port-setting
>> PORTA[1] = 1 # set port A pin 1
>> PORTA[2] = 1
>> PORTA[2] = 0 # toggle clock
>> PORTA[0] # read input pin
0
>> PORTA[2] = 1
>> PORTA[2] = 0 # toggle clock
>> PORTA[0] # read input pin
1
System configuration
>> mcu.sysclock = 1 MHz
>> uart.baudrate = 9600 bps
>> rf.power = -5 db
>> rf.speed = 1 Mbps
>> rf.config # query
{’payload’: 1, ’power’: -5,
’speed’: 1000000,
’channel’: 100, ’mode’: ’TX’}
Periodic-task scheduling
>> s = (every 50 ms: sample())
>> s.start()
>> s.stop()
Rappit
▶ Experimental platform
AVR Butterfly Board
Atmel ATmega169
8-bit MCU @ 8MHz, 512B
EEPROM, 1KB SRAM,
16KB program flash
Includes dataflash,
speaker, sensors, joystick, LCD
USART serial link at 9600 baud
[Figures: AVR Butterfly; AVR Butterfly w/ Wireless module]
Rappit
▶ Experimenting metrics and modality
Observation Metrics
Metric | Unit
Code size | Bytes
Execution Speed | Cmds/sec
Execution Modality
Modality | Approach | Programming Method
Native | Compiled | Program the firmware onto the Flash
Batch | Scripting | Preload a script program onto the RAM
Interactive | Scripting | Send one line of command to the RAM
Rappit
▶ Preliminary results
Code size reduction
61.8 – 66.3% reduction
Scripting engine consists of a thin layer
Most reduction in application code size
Performance overhead
Batch mode scripting can be faster than native!
Observed up to 25.7% speed-up
Outline
Scripting Framework
▶ Memory-oriented Optimization
⊳ Memory Optimization
⊳ Multi-metric Optimization
Implementation
Experimental Platforms
Summary & Research Plan
Motivating Example
▶ Installing Rappit primitives on Butterfly
Problem Arises
Choose primitives
ADC_read, RF_send,
RF_read, SD_write,
SD_read, …
Compile & Install
Runtime Error!
Why?
Problem Analysis
static unsigned char sd_buffer[512];
static unsigned char rf_buffer[30];
static unsigned char ADC_buffer[30];
char error_msg1[] = "No SD Card detected!";
char error_msg2[] = "Card Read Error!";
…
[Figure: 1KB SRAM map with .data (static strings), .bss (SD_buffer 512B, RF_buffer, ADC_buffer), heap, and stack]
exceeded 1KB RAM usage
Solution
Sharing memory space
Mapping static data to dataflash
[Figure: 1KB SRAM map after memory sharing, with Shared_buffer (600B ?), heap, and stack; static strings mapped to dataflash]
Result
Increased board capability
Increased application range
Data Memory Minimization
▶ Assumptions and Approach
Assumptions
Optimizing scripts
script size buffer size
Optimizing at runtime
Need low complexity algorithm
Approach
High-level optimization
Using scheduling and buffer mapping techniques
Priority on data memory minimization
Based on model of computation (MoC)
Models of Computation (MoC)
Synchronous Dataflow (SDF) [E. Lee ’87]
Extensively used as a specification for block-diagram-based programming environments for
signal processing
Special case of dataflow
No notion of time
The number of tokens (=data) consumed and produced
by each actor (=node) during each firing (=invocation)
cycle is statically fixed.
Fractional Rate Dataflow (FRDF) [H. Oh, S. Ha ’02]
Extension of SDF that allows fractional flow of I/O
samples of the original SDF
Why SDF?
Formal representation for optimization, simulation
and analysis
System-level optimization
Application flow of various primitives
Static scheduling
Minimize runtime overhead for resource constrained
embedded systems
Deadlock detection
Bounding the memory requirements
Good match for sensor applications
collect data, process, transmit
SDF
▶ Notations
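SDF graph G = (V, E, p, c)
V: {v1, v2, … v|V|}
E: {e1, e2, … e|E|}
src(e): source node
snk(e): sink node
p(e): produce rate
c(e): consume rate
[Figure: chain v1 → v2 → v3 → … → v|V|, where e1 has p = 1 and c = 2, e2 has p = 2 and c = 1, e3 has p = 3, and e|E| has c = 5; edge e1 is annotated with src(e1), p(e1), c(e1), snk(e1)]
T(e,v): topology matrix
\[
T(e,v) =
\begin{cases}
p(e) & \text{if } v = src(e), \\
-c(e) & \text{if } v = snk(e), \\
0 & \text{otherwise}
\end{cases}
\]
\[
T =
\begin{array}{c|ccccc}
 & v_1 & v_2 & v_3 & \cdots & v_{|V|} \\ \hline
e_1 & 1 & -2 & 0 & \cdots & 0 \\
e_2 & 0 & 2 & -1 & \cdots & 0 \\
e_3 & 0 & 0 & 3 & \cdots & \\
\vdots & & & & \ddots & \\
e_{|E|} & 0 & 0 & 0 & \cdots & -5
\end{array}
\]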
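As an illustration of the notation, the sketch below builds T(e,v) for a small graph and solves the balance condition T · q = 0 for the smallest positive integer firing counts q. It uses the A → B → C graph with rates (2,1) and (1,1) that appears later in the deck; the function names and the edge-tuple layout are assumptions made for this example, and the solver only handles connected graphs.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def topology_matrix(nodes, edges):
    """Rows are edges, columns are nodes: +p(e) at src(e), -c(e) at snk(e)."""
    index = {v: i for i, v in enumerate(nodes)}
    rows = []
    for src, snk, produce, consume in edges:
        row = [0] * len(nodes)
        row[index[src]] += produce
        row[index[snk]] -= consume
        rows.append(row)
    return rows


def repetition_vector(nodes, edges):
    """Smallest positive integer q with T q = 0 (connected graphs only)."""
    rate = {nodes[0]: Fraction(1)}
    changed = True
    while changed:
        changed = False
        for src, snk, produce, consume in edges:
            if src in rate and snk not in rate:
                rate[snk] = rate[src] * produce / consume
                changed = True
            elif snk in rate and src not in rate:
                rate[src] = rate[snk] * consume / produce
                changed = True
            elif src in rate and snk in rate:
                if rate[src] * produce != rate[snk] * consume:
                    raise ValueError("inconsistent rates: no valid schedule exists")
    if len(rate) != len(nodes):
        raise ValueError("graph is not connected")
    # Scale the rational solution to integers, then reduce to the minimum.
    scale = reduce(lambda a, b: a * b // gcd(a, b), (r.denominator for r in rate.values()), 1)
    q = [int(rate[v] * scale) for v in nodes]
    common = reduce(gcd, q)
    return [x // common for x in q]


NODES = ["A", "B", "C"]
EDGES = [("A", "B", 2, 1), ("B", "C", 1, 1)]  # (src, snk, produce, consume)

T = topology_matrix(NODES, EDGES)
q = repetition_vector(NODES, EDGES)
balance = [sum(t * x for t, x in zip(row, q)) for row in T]
```

Here q says that one iteration fires A once and B and C twice each, which matches schedules such as A B B C C.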
SDF
▶ Example
Surge Application
[Figure: A (ADC read) → x → B (RF pack) → y → C (RF send), with produce/consume rates of 1 on both x and y]
Actors: A, B, C
Buffers: x, y
Schedule: ABC
Rappit Script (4L):
every 2048:
x = ADC.read()
y = RF.pack(x)
RF.send(y)
SDF
▶ Example (cont’d)
Same code in Java (20L) [J. Koshy ’05]:
SurgePacket sgPkt;
char eList, eVector;
byte sHandle;
sgPkt = new SurgePacket();
eList = Select.setEventId( eList, Events.TIMEOUT | Events.RADIO_RECV );
sHandle = Select.requestSelectHandle();
char val;
Clock.startTimeout( 2048 );
while (true) {
eVector = Select.select(sHandle, eList);
if (Select.eventOccurred( eVector, Events.TIMEOUT )) {
val = PhotoSensor.sense();
sgPkt.setReading( val );
Surge.sendPacket( sgPkt );
Clock.startTimeout( 2048 );
}
else if (Select.eventOccurred( eVector, Events.RADIO_RECV)) {
handleRadioEvent( sgPkt ); // if base, forward to uart
}
}
Problem Statements
1. Find the best schedule and buffer mapping
that minimizes the buffer size requirement
Goal-oriented
Previous work
2. Find the best schedule and buffer mapping
that fits into, and maximizes the utilization of
a given memory size
Constraint-driven
Novel
Practical
Buffer Mapping Problem
▶ Spatial representation
Token-lifetime chart (t-chart)
row: token’s lifetime (produced → placed → consumed)
column: fixed number of token changes caused by firing event
[Figure: t-chart for the firing sequence A B B C C along the time axis; tokens t1 and t2 occupy local buffer x, tokens t3 and t4 occupy local buffer y]
Buffer Mapping Problem
▶ Spatial representation (cont’d)
Memory-usage profile (m-profile)
[Figure: m-profile plotting memory against time for the firing sequence A B B C C]
Metrics
Msize = 4, Mtotal = 20, Mused = 11, Mwasted = 9, Mutil = 55%
T=5
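The metrics on this slide follow from Msize, T, and the per-column memory occupancy: Mtotal = Msize × T, Mwasted = Mtotal − Mused, and Mutil = Mused / Mtotal. The sketch below recomputes them. The occupancy profile [2, 3, 3, 2, 1] is an assumption chosen to be consistent with Mused = 11 for the firing sequence A B B C C; it is not read off the original figure, and the function name is hypothetical.

```python
def memory_metrics(profile, m_size):
    """Compute m-profile metrics.

    profile: occupied memory units in each time column (one per firing).
    m_size:  memory units reserved for buffers.
    """
    if max(profile) > m_size:
        raise ValueError("profile exceeds the reserved memory size")
    t = len(profile)
    m_total = m_size * t
    m_used = sum(profile)
    return {
        "Msize": m_size,
        "T": t,
        "Mtotal": m_total,
        "Mused": m_used,
        "Mwasted": m_total - m_used,
        "Mutil": m_used / m_total,
    }


# Assumed occupancy for A B B C C with unshared buffers |x| = 2 and |y| = 2.
metrics = memory_metrics([2, 3, 3, 2, 1], m_size=4)
```

With these inputs the function reproduces the slide's numbers: Mtotal = 20, Mused = 11, Mwasted = 9, and a utilization of 0.55.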
Related Work
▶ Data memory optimization based on MoC
Technique | Group | Idea
Optimal Scheduling | [Bhattacharyya et al] in Ptolemy Group | Buffer minimized by optimal scheduling, optimize each local buffer
Buffer sharing by lifetime analysis | [Bhattacharyya et al] in Ptolemy Group, [Ha et al] in PeaCE group, [Ritz et al] in Meyr Group | Local buffer lifetime is analyzed to share global buffers
Buffer merging | [Bhattacharyya et al] in Ptolemy Group | Input/output buffer is shared (finer grain than buffer sharing)
Model checking | [Geilen et al] in Eindhoven Univ. | Reduced the problem to a model-checking problem on the state-space of SDF graph
Etc. (MBRO, PAPS, MRSP, …) | [Govindarajan et al] in Gao Group, [Peperstraete et al], [Goddard et al], [Ade et al] in GRAPE group | Rate-optimal / Vectorization / Application to real-time systems / etc.
Memory Optimization Techniques
1) *Scheduling w/ Unshared Buffer
2) *Buffer Sharing
3) *I/O Buffer Merging
4a) **Fractionizing
4b) Rate Selection (new)
5) Pipelining (new)
* Well established previous work
** Recently proposed
Memory Optimization Techniques
▶ 1) Scheduling with unshared buffer
[Figure: A → x → B → y → C, with rates (2,1) on x and (1,1) on y]
Schedule 1: A B B C C
x = A()
repeat 2:
y = B(x)
repeat 2:
C(y)
Expanded:
x[0..1] = A()
y[0] = B(x[0])
y[1] = B(x[1])
C(y[0])
C(y[1])
Buffer requirement:
|x| + |y| = 2 + 2 = 4
Schedule 2: A B C B C
x = A()
repeat 2:
y = B(x)
C(y)
Expanded:
x[0..1] = A()
y[0] = B(x[0])
C(y[0])
y[0] = B(x[1])
C(y[0])
Buffer requirement:
|x| + |y| = 2 + 1 = 3
By efficient ordering of actors, buffer requirement is reduced!
Each edge is directly mapped to its dedicated buffer space
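The effect of actor ordering on unshared buffers can be checked by replaying a schedule and recording the peak token count on each edge. The sketch below does this for the two schedules on this slide; the function name and the edge dictionary layout are assumptions for illustration, and the simulation is a plain token counter rather than any of the published scheduling algorithms.

```python
def unshared_buffer_sizes(edges, schedule):
    """Peak token count per edge when every edge has its own dedicated buffer.

    edges:    dict name -> (src, snk, produce, consume)
    schedule: iterable of actor names in firing order
    """
    tokens = {name: 0 for name in edges}
    peak = dict(tokens)
    for actor in schedule:
        # Consume the inputs of the firing actor.
        for name, (src, snk, produce, consume) in edges.items():
            if snk == actor:
                if tokens[name] < consume:
                    raise ValueError(f"{actor} fired without enough tokens on {name}")
                tokens[name] -= consume
        # Produce its outputs and track the peak occupancy.
        for name, (src, snk, produce, consume) in edges.items():
            if src == actor:
                tokens[name] += produce
                peak[name] = max(peak[name], tokens[name])
    return peak


EDGES = {"x": ("A", "B", 2, 1), "y": ("B", "C", 1, 1)}

schedule1 = unshared_buffer_sizes(EDGES, "ABBCC")
schedule2 = unshared_buffer_sizes(EDGES, "ABCBC")
total1 = sum(schedule1.values())
total2 = sum(schedule2.values())
```

Running it gives |x| + |y| = 2 + 2 = 4 for A B B C C and 2 + 1 = 3 for A B C B C, in line with the buffer requirements stated on the slide.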
Memory Optimization Techniques
▶ Comparing 1), 2), 3)
[Figure: A → x → B → y → C, with rates (2,1) on x and (1,1) on y]
x = A()
repeat 2:
y = B(x)
repeat 2:
C(y)
Schedule: A B B C C
1) Unshared Buffer
x[0..1] = A()
y[0] = B(x[0])
y[1] = B(x[1])
C(y[0])
C(y[1])
Buffer requirement:
|x| + |y| = 2 + 2 = 4
2) Shared Buffer
Data tokens consumed… Reuse the available space!
x[0..1] = A()
y[0] = B(x[0])
x[0] = B(x[1])
C(y[0])
C(x[0])
Buffer requirement:
|x| + |y| = 2 + 1 = 3
3) Merged I/O Buffer
Use the same space for the input/output
Assuming the token is consumed before output is produced…
x[0..1] = A()
x[0] = B(x[0])
x[1] = B(x[1])
C(x[0])
C(x[1])
Buffer requirement:
|x| + |y| = 2 + 0 = 2
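The difference between a shared buffer and a merged I/O buffer comes down to whether an actor's input space is released before or after its output is stored. The sketch below counts the peak number of live tokens in one common pool under both assumptions for the schedule A B B C C. It is a simplified counting model with hypothetical names, not the lifetime-analysis or buffer-merging algorithms from the cited work.

```python
def peak_live_tokens(edges, schedule, merge_io=False):
    """Peak number of tokens held in one common pool while running a schedule.

    edges:    dict name -> (src, snk, produce, consume)
    merge_io: False -> outputs need space while inputs are still held (shared buffer)
              True  -> inputs are released before outputs are stored (merged I/O)
    """
    counts = {name: 0 for name in edges}
    peak = 0
    for actor in schedule:
        live_before = sum(counts.values())
        consumed = 0
        produced = 0
        for name, (src, snk, produce, consume) in edges.items():
            if snk == actor:
                if counts[name] < consume:
                    raise ValueError(f"{actor} fired without enough tokens on {name}")
                counts[name] -= consume
                consumed += consume
        for name, (src, snk, produce, consume) in edges.items():
            if src == actor:
                counts[name] += produce
                produced += produce
        live_after = live_before - consumed + produced
        during = live_after if merge_io else live_before + produced
        peak = max(peak, during)
    return peak


EDGES = {"x": ("A", "B", 2, 1), "y": ("B", "C", 1, 1)}

shared_size = peak_live_tokens(EDGES, "ABBCC", merge_io=False)
merged_size = peak_live_tokens(EDGES, "ABBCC", merge_io=True)
```

Under this model the shared pool peaks at 3 tokens and the merged I/O pool at 2, matching the buffer requirements listed for techniques 2) and 3).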
Memory Optimization Techniques
▶ Comparing 1), 2), 3) (cont’d)
Metric | 1) Unshared Buffer | 2) Shared Buffer | 3) Merged I/O Buffer
|x|+|y| | 4 | 3 | 2
Mtotal | 20 | 15 | 10
Mused | 11 | 11 | 9
Mwasted | 9 | 4 | 1
Mutil | 55% | 73% | 90%
[Figure: t-charts of tokens t1 to t4 for the three techniques over the firing sequence A B B C C along the time axis]
Memory Optimization Techniques
▶ 4a) Fractionizing
Idea:
[Figure: original graph w → A → x → B with rate labels 1, 3, 1; Schedule: A 3(B)]
[Figure: fractionized graph w → A’ → x → B with rate labels 1/3, 1, 1; Schedule: 2(AB)]
Don’t wait until A produces a big chunk of data
Modify actor A to process only a fractional amount of the
original data at a time
Trade-off
Local effect
Possible time and energy overhead
e.g., resource’s access time, packet overhead
Global effect
Reduced bottleneck: shorter processing interval of A
Reduced buffer size: min|x|: 2 → 1
Memory Optimization Techniques
▶ 4b) Rate Selection
Idea
Generalize fractionizing
[Figure: w → A → x → B with rate ranges (1,3) on w, (2,6) on A’s output to x, and (4,4) on B’s input from x]
Schedule1: 2(A)B
Schedule2: AB
Schedule3: 2(A)3(B)
Not only allow fractions but also multiples
Rate is defined as range, but fixed before schedule finalizes
Each actor is modeled with timing and power function with
respect to the I/O range
Benefits
Combines the power of flexibility and static determinism
Increases buffer reduction opportunity
Challenge
Need an efficient way to handle considerably increased
exploration space at runtime
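To show why rate selection enlarges the exploration space, the sketch below enumerates every integer produce rate in a range and derives the minimal firing counts that balance the edge. It assumes, as a reading of the figure, that (2,6) is the range of A's produce rate on x and (4,4) is B's fixed consume rate; the function names are hypothetical and this is not the rate-selection algorithm proposed in the talk.

```python
from math import gcd


def min_repetitions(produce, consume):
    """Smallest (q_src, q_snk) with q_src * produce == q_snk * consume."""
    g = gcd(produce, consume)
    return consume // g, produce // g


def explore_rates(produce_range, consume_range):
    """Enumerate all integer rate pairs in the given inclusive ranges."""
    results = []
    for p in range(produce_range[0], produce_range[1] + 1):
        for c in range(consume_range[0], consume_range[1] + 1):
            q_src, q_snk = min_repetitions(p, c)
            results.append({
                "produce": p,
                "consume": c,
                "q_src": q_src,
                "q_snk": q_snk,
                # Tokens moved across the edge in one balanced iteration.
                "tokens_per_iteration": q_src * p,
            })
    return results


candidates = explore_rates((2, 6), (4, 4))
by_rate = {c["produce"]: (c["q_src"], c["q_snk"]) for c in candidates}
```

Produce rates 2, 4, and 6 yield the firing counts (2, 1), (1, 1), and (2, 3), which correspond to Schedule1 2(A)B, Schedule2 AB, and Schedule3 2(A)3(B). Even this single edge has five candidate rates, so a multi-edge graph grows quickly, which is the runtime exploration challenge noted above.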
Memory Optimization Techniques
▶ 5) Pipelining
Idea
Allow multiple actor firing at once
Benefits
Reduced buffer requirement
Higher memory utilization
Increased throughput
Challenges
Need multiprocessors
Need to resolve resource conflict
Need to consider synchronization problem
Memory Optimization Techniques
▶ Comparing 1), 4), 5)
1) Unshared Buffer
[Figure: graph A → x → B → y → C and its t-chart of tokens t1 to t4 over 5 firing units]
4) Fractionized / Rate Selected
[Figure: graph A’ → x → B → y → C with A’ at fractional rate 1/2, and its t-chart]
Buffer Size: 33% reduction
5) Pipelined
[Figure: graph A → x → B → y → C and its t-chart with overlapped firings]
Utilization: 66.7% → 100%
Time: 5 → 4 firing units
Memory Optimization Techniques
▶ Summary
Technique | 0 | 1 | 1+2 | 1+2+3 | 1+4 | 1+2+4 | 1+2+3+4 | 1+4+5
M_size | 4 | 3 | 3 | 2 | 2 | 2 | 1 | 2
M_used | 11 | 10 | 10 | 9 | 8 | 8 | 6 | 8
M_wasted | 9 | 5 | 5 | 1 | 4 | 4 | 0 | 0
T | 5 | 5 | 5 | 5 | 6 | 6 | 6 | 4
M_utilization | 55% | 66.7% | 66.7% | 90% | 66.7% | 66.7% | 100% | 100%
0: None (baseline) 1: Unshared Scheduling 2: Shared Buffer
3: Merged I/O
4: Fractionized
5: Pipelined
[Figure: t-charts of tokens t1 to t4 in a global buffer for actors A, B, C]
Multi-metric Optimization
Trade-offs
From the actor's point of view (local),
processing a large amount of
data at once tends to reduce
time and energy overhead
From the SDF-flow point of view
(global), processing a small
amount of data at once
reduces buffer requirement
[Figure: energy, data memory, and execution time plotted against data-flow rate]
Goal
Find a Pareto-optimal point that
resides in a range of solution
set that satisfies constraints
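The goal of picking a Pareto-optimal point inside the constraint region can be sketched as a two-step filter: drop candidates that violate a constraint, then drop candidates dominated by another feasible one. The candidate tuples (energy, memory, time) and the limits below are invented for illustration only, and the function names are hypothetical; the talk's actual cost functions are listed as ongoing work.

```python
def dominates(a, b):
    """True if a is no worse than b in every metric and better in at least one.

    Metrics are costs, so lower is better.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points, limits=None):
    """Keep feasible points that no other feasible point dominates."""
    feasible = [
        p for p in points
        if limits is None or all(x <= limit for x, limit in zip(p, limits))
    ]
    return [p for p in feasible if not any(dominates(o, p) for o in feasible)]


# Hypothetical (energy, memory, time) costs of four schedule/rate choices.
CANDIDATES = [(5, 4, 5), (6, 2, 6), (7, 2, 6), (4, 3, 7)]

unconstrained = pareto_front(CANDIDATES)
constrained = pareto_front(CANDIDATES, limits=(10, 3, 10))  # memory budget of 3
```

Without constraints three of the four candidates survive; adding a memory limit of 3 removes the fastest option and leaves two trade-off points to choose from.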
Applying it to Rappit
▶ Quasi-static optimization
Rappit Flow (Host, compile-time → Target, run-time) | Performed Tasks
Compile | Kernel and primitives compiled and installed
Load script | SDF defined
Preprocess (Optimization) | Actor-to-processor assignment, Actor ordering (scheduling), Buffer mapping
Load script code | Static schedule loaded
Execute | Deterministic execution w/o runtime overhead
Outline
Scripting Framework
Memory-oriented Optimization
▶ Implementation
⊳ Synthesis Tool
⊳ Simulator
⊳ Runtime Host-assisting Tool (GUI)
Experimental Platforms
Summary & Research Plan
Implementation
▶ Scripting engine synthesis tool
System Template
GUI-based check-box approach
easily capture existing systems
model new systems for simulation and design
space exploration
includes communication description
Component Library
binds according to template configuration
consists of MCU, on-chip devices, off-chip
peripherals
each component has I/O pins and driver modules
Implementation
▶ Memory simulator
Implementation
▶ Interactive runtime tool
Implementation
▶ Tool integration
[Figure: integrated host tool with GUI, Parser, Scheduler, Memory Optimizer, Dispatcher, and Node Manager, connected to Node 1, Node 2, Node 3, …, Node N]
Outline
Scripting Framework
Memory-oriented Optimization
Implementation
▶ Experimental Platforms
Summary & Research Plan
HW Platforms and Real-world Applications
Eco
ultra-compact sensor node
pre-term infant monitoring
dancing motion detection
Mini-FDPM
active laser sensing device
breast cancer detection
DuraNode
real-time data acquisition system
structural health monitoring
Butterfly
low-power, i/o rich development board
prototyping (SD-card, speaker, sensors, RF)
Outline
Scripting Framework
Memory-oriented Optimization
Implementation
Experimental Platforms
▶ Summary & Research Plan
Summary
A novel scripting framework for embedded
systems
Scripting engine synthesis
Host assisting runtime environment
Memory optimization techniques
Comparison of techniques
Integration and multi-objective problem
Tool Implementations
Rappit GUI, memory simulator
Contributions
Empowered Embedded Systems
Unleashing the severely constrained embedded
systems
SDF Extensions
Extension of SDF model
Extending the application area of SDF
Memory Savings
Reduced memory requirement by integration of
policies, including new techniques
Research Plan
▶ finished, ongoing, future work
Framework
Language definition*
Initial implementation and
prototyping
Component library
generation*
Code generation
Overhead analysis
Tool integration
Test on multinode scenario
Optimization
Survey and comparison
Simulator implementation
Integrating techniques
SDF extension on rate
Rate-selection algorithm
Buffer-mapping protocol
Cost function modeling of
multi-metric optimization
SDF extension on timing
Case Study
AVR butterfly
mini-FDPM
eco
DuraNode
*with Qiang Xie & Jinfeng Liu
Publications
Jiwon Hahn, Qiang Xie, and Pai H. Chou, "Rappit: A
Framework for the Synthesis of Host-Assisted Light-Weight
Scripting Engines for Adaptive Embedded
Systems," in Proc. International Conference on
Hardware Software Codesign and System Synthesis
(CODES+ISSS), 2005.
Jiwon Hahn, Dexin Li, Qiang Xie, Pai H. Chou, Nader
Bagherzadeh, David W. Jensen, Alan C. Tribble, "Power
Reduction in JTRS Radios with ImpacctPro," in Proc.
IEEE Military Communication Conference (MILCOM),
2004.
Bibliography
Murthy PK, Shuvra S. Bhattacharyya, Buffer merging - a powerful technique for reducing
memory requirements of synchronous dataflow specifications. ACM Transactions on Design
Automation of Electronic Systems (TODAES), 2004.
Murthy PK, Shuvra S. Bhattacharyya, Shared buffer implementations of signal processing
systems using lifetime analysis techniques, IEEE Transactions on Computer-Aided Design of
Integrated Circuits & Systems (TCADICS), 2001.
Shuvra S. Bhattacharyya., Murthy PK, Edward A. Lee, APGAN and RPMC: Complementary
Heuristics for Translating DSP Block Diagrams into Efficient Software Implementations, Design
Automation for Embedded Systems (DAES), 1997
Shuvra S. Bhattacharyya, Murthy PK, Edward A. Lee, Joint Minimization of Code and Data for
Synchronous Dataflow Programs, 1997.
Hyunok Oh, Soonhoi Ha, Fractional rate dataflow model and efficient code synthesis for
multimedia applications, SIGPLAN Not, 2002.
Hyunok Oh, Soonhoi Ha, Data memory minimization by sharing large size buffers, Asia and
South Pacific Design Automation Conference (ASPDAC), 2000.
Hyunok Oh, Soonhoi Ha, Efficient Code synthesis from extended dataflow graphs for multimedia
applications, Design Automation Conference (DAC), 2002.
Geilen M, Basten T, Stuijk S, Minimising buffer requirements of synchronous dataflow graphs
with model checking, 42nd Design Automation Conference (DAC), 2005.
Eckart Zitzler and Jurgen Teich and Shuvra S. Bhattacharyya, Multidimensional Exploration of
Software Implementations for DSP Algorithms, Journal of VLSI Signal Processing (JVLSI), 1999
John K. Ousterhout, Scripting: Higher Level Programming for the 21st Century, IEEE Computer
magazine, 1998
TecO Home, http://particle.teco.edu/
Acknowledgements
This work is sponsored in part by the National
Science Foundation grant CCR-0205712 and
NSF CAREER Award CNS-0448668
Professor Pai Chou
Qiang Xie
Jinfeng Liu
Backup Slides
Scripting Overhead
Scripting for General Purpose Computers
Assume unlimited resources
Full feature scripting engine for convenience
Slower than system programming language
Scripting for Embedded Systems
Limited memory, CPU, power, …
Need scripting engine optimization
Host assist
Language subsetting
Library subsetting
Efficient memory usage
Scripting may be even faster than compiled code!
Rappit
▶ Packet format example
Command Packet Format
Dst. | Msg ID | Opcode | Input[3] | Output[3] | CRC
Command Message Format
Opcode | In_addr | In_start | In_size | Out_addr | Out_start | Out_size
Response Packet Format
Src. | Msg ID | Msg Type | Data Type | Payload | CRC | EOP
Response Message Format
Rappit
▶ Scripting engine optimization in code synthesis
Language subsetting
eg., assignment (=), loop (repeat)
Library subsetting
customized for target applications and platform
Full-Featured Component Library
MCU, RF, Dataflash, SPI, UART, GPIO, Interrupts, ADC, LCD, Joystick, Sensor1, Sensor2
Customized subset
Interrupts, RF, GPIO, ADC, UART, Sensor1
Memory Organizations
▶ Comparing previous work and Rappit
Previous approaches consider both data and code memory
minimization, but prioritize code size*
We mainly focus on data size** minimization
Previous work
On-chip Flash or EEPROM: Application Code*
RAM: Buffer
Our work
On-chip Flash or EEPROM: Rappit Kernel, Primitives
RAM: Script Code, Buffer**
Data Flash
Rappit
▶ Code size of runtime components
Host Code (.py) | Lines | Size (KB)
GUI | 644 | 21.8
Parser & Msg Generator | 127 / 221 | 2.87 / 4.97
Library | 263 | 6.396
Packetizer & Depacketizer | 82 | 2.0
Packet Mgr | 42 | 0.92
Total | 1379 | 38.96
MCU Code (.c) | Lines | Size (KB)
Cmd Interpreter | 260 | -
Primitives | 90 | -
Packetizer & Depacketizer | 300 | -
Total | 750 | 1.484
Rappit
▶ Summary of results
Code size reduction
Application | Native | Rappit | Reduction
Reg setting | 4.356 KB | 1.664 KB | 61.8%
LCD usage | 12.45 KB | 4.2 KB | 66.3%
Performance overhead components analysis
Component | Native | Interactive | Batch
Communication | 1 | 3 (bottleneck) | 1
RAM Access | 3 | 1 | 1
ROM Access | 3 | 1 | 1
Packetization | 1 | 2 | 2
Interpretation | 1 | 2 | 2
Total cmd/sec | 92 | 4.75 | 111
1: fast, 2: tolerable, 3: slow
Rappit
▶ Subset of primitives
Device | Primitive | Device | Primitive | Device | Primitive
MCU | reset | GPIO | set pin | Timer | register fcn
MCU | power save | GPIO | get pin | Timer | remove fcn
MCU | initialize | GPIO | clear pin | RTC | set clock
MCU | get sys clock | USART | TX | RTC | read clock
MCU | set sys clock | USART | RX | LCD | clear
RF | INIT | SD | read | LCD | write
RF | set channel | SD | write | LCD | set contrast
RF | set power | ADC | read | Joystick | get key
RF | set frequency | Sensor1 | read | Speaker | set volume
RF | send | Sensor2 | read | Speaker | play tone
RF | receive | Sensor3 | read | Speaker | play song
Rappit
▶ Language
key | Usage | Example
import | import methods of each device | from RF import *
doc, dict | look up documentation, included methods | RF.__doc__; RF.__dict__
open, close | open/close a connection to a target system | node1 = open(MCU1, uart1); node1.close()
ls | list all connected instances | ls
every, start, stop | schedule events with certain period | s1 = (every 30ms: a+= ADC1.read()); s1.start(); s1.stop()
repeat | looping | repeat 3: SD.write(a)
def | define a function with a series of methods | def readTemperature(): ...
=, + | assign/configure or add value | a = SD.read(10); a+=SD.read(20)
SDF
▶ Strength and limitations
Strength
Ability to express multi-rate systems, parallelism
Deadlock detection and scheduling can be determined at
compile-time
Bounded memory requirements
No runtime supervisory overhead
Limitations
Lack of conditional control flow
Does not model asynchronous nodes
Does not adequately address the real-time nature of
connections to the outside world
Does not address data-dependent run times
Superset of SDF
▶ Dynamic dataflow (DDF)
Allows asynchronous actors with non-fixed rate
of each actor
Captures dynamic constructs
if/else
for-loop
do/while loop
recursion
SDF
▶ Notations
Firing & Tokens
f(n): nth firing vector
tk(n): number of live tokens after nth firing
$tk(n+1) = tk(n) + G \cdot f(n)$
$f = \sum_{n=0}^{T} f(n)$: firing frequency
$q = f_{min}$: firing vector (minimum # of firings)
$q(src(e_i)) \times p(src(e_i)) = q(snk(e_i)) \times c(snk(e_i))$: balance equation
Consistent SDF
$rank(G) = |N| - 1$
$G \cdot q = 0$
Scheduling
Given G, tk(0), and q, find a firing order which satisfies $tk(n) \ge 0$
and $q = \sum_{n=0}^{T} f(n)$
Deadlocked if no node can be fired before reaching $q = \sum_{n=0}^{T} f(n)$
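The scheduling statement above can be turned into a small greedy procedure: keep firing any actor that still has firings left in q and whose firing keeps every token count non-negative, and report deadlock if none qualifies. The sketch below is one such procedure under stated assumptions: the matrix has one row per edge and one column per node, there are no self-loops, and the function name is hypothetical. It is not claimed to be the scheduler used in Rappit.

```python
def find_schedule(nodes, matrix, initial_tokens, q):
    """Return a firing order that fires node v exactly q[v] times.

    matrix[e][v] is the token change on edge e when node v fires
    (+produce at the source, -consume at the sink).
    Raises RuntimeError when no remaining node can fire (deadlock).
    """
    tokens = list(initial_tokens)
    remaining = list(q)
    order = []
    while any(remaining):
        for v, name in enumerate(nodes):
            if remaining[v] == 0:
                continue
            after = [tokens[e] + matrix[e][v] for e in range(len(matrix))]
            if all(count >= 0 for count in after):  # tk(n) >= 0 must hold
                tokens = after
                remaining[v] -= 1
                order.append(name)
                break
        else:
            raise RuntimeError("deadlock: no node can fire before reaching q")
    return order, tokens


NODES = ["A", "B", "C"]
MATRIX = [[2, -1, 0], [0, 1, -1]]  # edges x (A to B) and y (B to C)

order, final_tokens = find_schedule(NODES, MATRIX, [0, 0], [1, 2, 2])

# A two-node cycle with no initial tokens cannot start firing.
try:
    find_schedule(["A", "B"], [[1, -1], [-1, 1]], [0, 0], [1, 1])
    cycle_deadlocks = False
except RuntimeError:
    cycle_deadlocks = True
```

The token counts return to their initial values after one full iteration, so the resulting order can be repeated indefinitely with bounded memory.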
SDF
▶ Our extensions
SDF previously used in multimedia-oriented
applications targeting DSPs and FPGAs
To target more general types of applications,
non-buffered edges (dummy channels), which only
denote precedence, should be added
The produce/consume rate of each actor is not
given as fixed, but as a range
Add timing (future work)
SDF
▶ Another example
Extended Surge Application
[Figure: SDF graph with actors A (ADC read), B (SD store), C (LCD show), D (SD read), E (Kernel pack), F (RF send) and buffers a, b, c, d, e, f; rate labels include (1, 10), (1, 3), 10, and (1, 1)]
Valid Schedules:
30(A) 3(B) 3(C) D 10(E) 10(F) – Flat SAS
3 (10(A) BC) D 10(EF) – SAS
30(A) 2(BC) BCD 10(EF) – Non SAS
SDF
▶ Another example (cont’d)
Script (SAS)
enable Timer1, RF, SD, LCD
every 2048:
repeat 10:
repeat 10:
a = ADC.read()
LCD.show(a)
SD.store(a)
repeat 10:
b = SD.read()
repeat 3:
c = Kernel.pack(b)
RF.send(c)
Script-to-SDF Transform
User script
x = A()
repeat 2:
y = B(x)
C(y)
V = { A, B, C }
E = { x, y } = {eAB, eBC}
πinit = A 2(BC)
eAB: p(A) = (2, 3), c(B) = (1, 1)
eBC: p(B) = (1, 1), c(C) = (1, 2)
[Figure: A → x → B → y → C, with rate ranges 2/3 and 1/1 on x, and 1/1 and 1/2 on y]
Multimetric Optimization
▶ Cost function modeling
Constraints
Energy
Battery lifetime or other source of power budget
Time
Deadline in given real-time application
Memory
Given memory size for a platform
Each node is modeled with:
Pv(c,p): power consumption w.r.t. consume/produce rate (i.e.,
input/output data size)
Tv(c,p): execution delay w.r.t. consume/produce rate