Transcript Slides
‘C’ for Microcontrollers,
Just Being Efficient
Lloyd Moore, President
[email protected]
www.CyberData-Robotics.com
Agenda
Microcontroller Resources
Knowing Your Environment
Memory Usage
Code Structure
Interrupts
Math Tricks
Optimization
Disclaimer
Some microcontroller techniques necessarily trade one benefit for another, typically lower resource usage for maintainability
The point of this presentation is to show various techniques that can be used as needed
Use these suggestions when necessary
Feel free to suggest better solutions as we go along
Microcontroller Resources
EVERYTHING resides on one die inside one package: RAM, Flash, Processor, I/O
Cost is a MAJOR design consideration
Typical costs are $0.25 to $25 each (in quantities of 1000s)
RAM: 16 BYTES to 32K Bytes typical
Flash/ROM: 384 BYTES to 256K Bytes
Clock Speed: 4 MHz to 80 MHz typical
Much lower for battery saving modes (32 kHz)
Bus is 8, 16, or 32 bits wide (just like the old days)
Other Considerations
Specialized resources often present
Counters, UART, USB PHY, LCD Controller
May have other math hardware (MAC, CRC)
Portability inside families a big concern
Across families, not so much
May have a hardware-centric API, or just raw registers!
No floating point hardware
Typically no operating system present
No protected memory / MMU
Do have specialized memory segments
Power Consumption
Microcontrollers are typically used in battery-operated devices
Power requirements can be EXTREMELY tight
Energy harvesting applications
Long term battery installations (remote controls, hard to reach devices, etc.)
EVERY instruction executed consumes power, even if you have the time!
Know Your Environment
Traditionally we ignore hardware details
Need to tailor code to the hardware available
Specialized hardware MUCH more efficient
Compilers typically have extensions
Interrupt: marks a function as an ISR
Memory model: may handle banked memory and/or simultaneously accessible banks
Multiple data pointers / address generators
Debugger may use some resources
Memory Usage
Use ‘const’ to put data into program memory
Alignment / padding issues
Typically NOT an issue; non-aligned access OK
Avoid dynamic memory allocation
Takes extra space and processing time
Memory fragmentation a big issue
Use and reuse static buffers
Reduces variable passing overhead
Allows for smaller / faster code due to reduced indirection
Does bring back overwrite bugs if not done carefully
Use the appropriate variable type
Don’t use int and double for everything!!
Affects processing time as well as storage
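The memory advice above can be sketched in portable C: a `const` lookup table, one statically allocated scratch buffer reused instead of `malloc()`, and `uint8_t` where a byte is enough. The names `g_segment_map`, `g_scratch` and `value_to_segments()` are hypothetical, and the table assumes a common-cathode 7-segment display with segment a in bit 0. Whether `const` alone lands the table in flash depends on the toolchain.

```c
#include <stdint.h>

/* 'const' data: many microcontroller toolchains place this in flash instead of
   copying it into RAM. Some 8051 compilers also need a memory-space keyword
   (for example Keil C51's 'code'); check your compiler manual. */
const uint8_t g_segment_map[10] = {
    0x3F, 0x06, 0x5B, 0x4F, 0x66, /* digits 0-4 */
    0x6D, 0x7D, 0x07, 0x7F, 0x6F  /* digits 5-9 */
};

/* One statically allocated scratch buffer, reused by several routines instead
   of malloc(). The next user overwrites whatever is left in it, so callers
   must finish with the data first (the "overwrite bug" risk). */
#define SCRATCH_SIZE 5u
uint8_t g_scratch[SCRATCH_SIZE];

/* Convert a 16-bit value (at most 5 decimal digits) to 7-segment codes,
   least significant digit first. Returns the number of digits written.
   uint8_t is wide enough for the count; an int would cost extra code on an
   8-bit core. */
uint8_t value_to_segments(uint16_t value) {
    uint8_t count = 0;
    do {
        g_scratch[count++] = g_segment_map[value % 10u];
        value /= 10u;
    } while (value != 0u && count < SCRATCH_SIZE);
    return count;
}
```

For example, `value_to_segments(42)` leaves the codes for 2 and then 4 in `g_scratch` and returns 2.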
Char vs. Int Increment on 8051
int iX;
iX++;
0000  900000  MOV    DPTR,#iX
0003  E4      CLR    A
0004  75F001  MOV    B,#01H
0007  120000  LCALL  ?C?IILDX
10 Bytes of Flash + subroutine overhead
Many more than 4 instruction cycles with an LCALL
char cX;
cX++;
000A  900000  MOV    DPTR,#cX
000D  E0      MOVX   A,@DPTR
000E  04      INC    A
000F  F0      MOVX   @DPTR,A
6 Bytes of Flash
4 Instruction cycles
Code Structure
Count down instead of up
Saves a subtraction on all processors
DJNZ style instruction on some processors
Pointers vs. array notation
Generally better using pointers
Bit Shifting
May not always generate what you think
May or may not have barrel shifter hardware
May or may not have logical vs. arithmetic shifts
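A minimal sketch of the count-down and pointer advice, using a hypothetical `sum_bytes()` helper. The loop tests a down-counter against zero and walks a pointer instead of indexing. Whether a given compiler actually emits a DJNZ-style decrement-and-branch here is compiler dependent, so check the listing.

```c
#include <stdint.h>

/* Sum 'len' bytes starting at 'p'.
   - The counter runs down to zero, so the loop test is "is it zero?",
     which some cores can fold into one decrement-and-branch instruction.
   - The pointer is advanced each pass instead of recomputing base + index. */
uint16_t sum_bytes(const uint8_t *p, uint8_t len) {
    uint16_t total = 0;
    while (len != 0u) {
        total += *p++;
        len--;
    }
    return total;
}
```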
Shifting Example
cX = cX << 3;
0006  33      RLC    A
0007  33      RLC    A
0008  33      RLC    A
0009  54F8    ANL    A,#0F8H
Constants turn into separate statements
cA = 3;
cX = cX << cA;
000B  900000  MOV    DPTR,#cA
000E  E0      MOVX   A,@DPTR
000F  FE      MOV    R6,A
0010  EF      MOV    A,R7
0011  A806    MOV    R0,AR6
0013  08      INC    R0
0014  8002    SJMP   ?C0005
0016          ?C0004:
0016  C3      CLR    C
0017  33      RLC    A
0018          ?C0005:
0018  D8FC    DJNZ   R0,?C0004
Variables turn into loops
Both of these can be one instruction with a barrel shifter
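A small sketch of writing shifts so their behavior does not depend on the hardware, with hypothetical helpers `times8()` and `div4()`. Constant shift counts give the compiler the chance to emit a fixed sequence like the RLC/ANL listing rather than a loop, and unsigned operands guarantee a logical (zero-fill) shift. In C, right-shifting a negative signed value is implementation-defined, so code should not rely on it being arithmetic.

```c
#include <stdint.h>

/* Multiply by 8 with a constant left shift. The operand is promoted to int,
   shifted, then cast back, so bits shifted past bit 7 are discarded. */
uint8_t times8(uint8_t x) {
    return (uint8_t)(x << 3);
}

/* Divide by 4 with a constant right shift. For an unsigned operand this is
   always a logical shift: zeros fill in from the top. */
uint8_t div4(uint8_t x) {
    return (uint8_t)(x >> 2);
}
```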
More Code Structure
Actual parameters are typically passed in registers if available
Keep function parameters to fewer than 3
May also be passed on the stack or in a special parameter area
May be more efficient to pass a pointer to a struct
Global variables
While generally frowned upon for most code, can be very helpful here
Typically ends up being a direct access
Read the assembly code for critical areas
Know which optimizations are present
Small compilers do not always have common optimizations
Inlining, loop unrolling, loop-invariant code motion, pointer conversion
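To illustrate "pass a pointer to a struct", here is a sketch with a hypothetical `PwmConfig` record and `pwm_compare_value()` function. The caller passes one pointer, which is more likely to fit in the registers a small compiler reserves for arguments than four separate parameters. The field names and the compare-value formula are assumptions for illustration only.

```c
#include <stdint.h>

/* Hypothetical PWM settings grouped into one record. */
typedef struct {
    uint8_t  channel;
    uint8_t  duty_percent;  /* assumed to be 0..100 */
    uint16_t period_ticks;
    uint8_t  invert;
} PwmConfig;

/* Compute the compare value a (hypothetical) timer would be loaded with.
   One pointer argument replaces four separate parameters. */
uint16_t pwm_compare_value(const PwmConfig *cfg) {
    /* 32-bit intermediate so period * percent cannot overflow 16 bits. */
    uint32_t on_ticks = (uint32_t)cfg->period_ticks * cfg->duty_percent / 100u;
    if (cfg->invert) {
        on_ticks = cfg->period_ticks - on_ticks;
    }
    return (uint16_t)on_ticks;
}
```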
Indexed Array vs Pointer on M8C
ucMode = g_Channels[uc_Channel].ucMode;
01DC  52FC    mov    A,[X-4]
01DE  5300    mov    [__r1],A
01E0  5000    mov    A,0
01E2  08      push   A
01E3  5100    mov    A,[__r1]
01E5  08      push   A
01E6  5000    mov    A,0
01E8  08      push   A
01E9  5007    mov    A,7
01EB  08      push   A
01EC  7C0000  xcall  __mul16
01EF  38FC    add    SP,-4
01F1  5F0000  mov    [__r1],[__rX]
01F4  5F0000  mov    [__r0],[__rY]
01F7  060000  add    [__r1],<_g_Channels
01FA  0E0000  adc    [__r0],>_g_Channels
01FD  3E00    mvi    A,[__r1]
01FF  5403    mov    [X+3],A
ucMode = pChannel->ucMode;
01ED  5201    mov    A,[X+1]
01EF  5300    mov    [__r1],A
01F1  3E00    mvi    A,[__r1]
01F3  5405    mov    [X+5],A
Both versions do the same thing
Saves 29 bytes of memory AND a call to a 16-bit multiplication routine!
The pointer version will be at least 4x faster to execute as well, maybe 10x
Most compilers are not this bad, but you do find some!
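The same idea applied to a loop: step a pointer through the array instead of indexing it. The `Channel` layout below is hypothetical, since the slides do not show the real `g_Channels` record. Indexing `g_Channels[i]` may cost a multiply by `sizeof(Channel)` on every access with a weak compiler, while advancing the pointer costs one add per iteration.

```c
#include <stdint.h>

/* Hypothetical channel record; the real g_Channels layout is not shown. */
typedef struct {
    uint8_t  ucMode;
    uint8_t  ucGain;
    uint16_t uiOffset;
    uint8_t  ucFlags[3];
} Channel;

#define NUM_CHANNELS 4u
Channel g_Channels[NUM_CHANNELS];

/* Count channels currently in 'mode' by walking a pointer from the first
   element to one past the last. */
uint8_t count_in_mode(uint8_t mode) {
    const Channel *pChannel = g_Channels;
    const Channel *const pEnd = g_Channels + NUM_CHANNELS;
    uint8_t count = 0;
    for (; pChannel != pEnd; ++pChannel) {
        if (pChannel->ucMode == mode) {
            count++;
        }
    }
    return count;
}
```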
Interrupts
Generally implemented as individual hardware vectors with a small amount of program memory at each location
ISR is what you get: no OS, no threads, no IST (interrupt service thread)
Can use a flag with the main loop to get IST behavior for less time-critical code
Also very common to use interrupts to simulate threads
The interrupt itself takes the place of the WaitFor_XXX or signal
Follows very naturally for hardware tasks and timers
Generally an “interrupt” keyword is provided
Interrupt Example
static volatile unsigned char g_TimerTriggered; //volatile: shared with the ISR
void main()
{
  ConfigureTimer0();
  g_TimerTriggered = 0;
  GlobalEnableInterrupt();
  while(1)
  {
    if(g_TimerTriggered)
    {
      //Could also disable the timer interrupt here
      //to avoid a race condition resetting g_TimerTriggered
      g_TimerTriggered = 0;
      DoTimerTask();
    }
    //Can put optional sleep here, interrupts can wake up processor
  }
}
void Timer0ISR(void) interrupt 1 using 2
{
  g_TimerTriggered = 1;
  //Can put other small, quick work here
}
//Keil C51 syntax: "interrupt 1" = interrupt number 1 (Timer 0 on the 8051),
//"using 2" = run the ISR with register bank 2
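The flag pattern above extends to several interrupt sources by using one bit per event. This is a minimal sketch under assumptions: `g_events`, the `EVT_*` bits and the `*_isr_body()` functions are hypothetical, and the real ISRs would carry the compiler's interrupt keyword. `ENTER_CRITICAL()`/`EXIT_CRITICAL()` are also hypothetical; they are empty here so the sketch builds on a PC, but on a real part they would disable and re-enable interrupts around the read-and-clear.

```c
#include <stdint.h>

/* Hypothetical event bits, one per interrupt source. */
#define EVT_TIMER0 0x01u
#define EVT_UART   0x02u

/* volatile: written by ISRs, read by the main loop. */
volatile uint8_t g_events = 0;

/* Hypothetical critical-section hooks (empty in this PC sketch). */
#define ENTER_CRITICAL() do { } while (0)
#define EXIT_CRITICAL()  do { } while (0)

/* Stand-ins for ISR bodies: set a bit and return, nothing else. */
void timer0_isr_body(void) { g_events |= EVT_TIMER0; }
void uart_isr_body(void)   { g_events |= EVT_UART; }

unsigned timer_runs = 0, uart_runs = 0;

/* Called from the main loop: snapshot and clear the pending bits inside the
   critical section, then do the slower work with interrupts enabled. */
void process_events(void) {
    uint8_t pending;
    ENTER_CRITICAL();
    pending = g_events;
    g_events = 0;
    EXIT_CRITICAL();

    if (pending & EVT_TIMER0) { timer_runs++; }  /* e.g. DoTimerTask() */
    if (pending & EVT_UART)   { uart_runs++;  }
}
```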
Switch Statement Implementation
Switch statements can be implemented in various ways
Sequential compares
Inline table lookup for the case block
Special function with a lookup table
Specific implementation can also vary based on the case clauses
Clean sequence (1, 2, 3, 4, 5)
Gaps in sequence (1, 10, 30, 255)
Ordering of sequence (5, 4, 1, 2, 3)
Knowing which method gets implemented is critical to optimizing!
Switch Statement Example
switch(cA)
{
  case 0:
    cX = 4;
    break;
  case 1:
    cX = 10;
    break;
  case 2:
    cX = 30;
    break;
  default:
    cX = 0;
    break;
}
0006  900000  MOV    DPTR,#cA
0009  E0      MOVX   A,@DPTR
000A  FF      MOV    R7,A
000B  EF      MOV    A,R7
000C  120000  LCALL  ?C?CCASE
000F  0000    DW     ?C0003
0011  00      DB     00H
0012  0000    DW     ?C0002
0014  01      DB     01H
0015  0000    DW     ?C0004
0017  02      DB     02H
0018  0000    DW     00H
001A  0000    DW     ?C0005
001C          ?C0002:
001C  900000  MOV    DPTR,#cX
001F  7404    MOV    A,#04H
0021  F0      MOVX   @DPTR,A
0022  8015    SJMP   ?C0006
...More blocks follow for each case
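When the case values form a clean sequence starting at 0, as in this example, a `const` table can replace the switch entirely: one bounds check plus one table read, with no `?C?CCASE`-style helper call. This is a sketch; `g_cXForCase` and `map_case()` are hypothetical names, and the table values mirror the switch above.

```c
#include <stdint.h>

/* cX values for case 0, 1, 2 of the switch. */
const uint8_t g_cXForCase[3] = { 4, 10, 30 };

/* Table-driven equivalent of the switch statement. */
uint8_t map_case(uint8_t cA) {
    if (cA < sizeof g_cXForCase) {  /* element type is 1 byte, so sizeof == 3 */
        return g_cXForCase[cA];
    }
    return 0;  /* default: */
}
```

With gaps or unusual ordering in the case values, a table like this wastes space or no longer applies, which is why the compiler's choice varies with the case clauses.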
Bit Variables
Some processors have special memory areas and op-codes for single bit storage
Saves overhead of masking operations
Some compilers key off bit-field notation, others need a keyword (frequently ‘bit’)
struct {
  unsigned int foo : 1;
} flags;
unsigned int my_bit : 1; //outside a struct: not standard C, compiler-specific
bit my_bit;              //keyword form, e.g. Keil C51
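For comparison, a portable sketch of the two standard-C ways to store single-bit flags: bit-fields inside a struct and explicit masks on a byte. The flag names are hypothetical. On a core without bit-addressable memory both forms compile to masking operations; a compiler `bit` type on a part that has bit memory can avoid that.

```c
#include <stdint.h>

/* Standard C bit-fields: several one-bit flags packed into one object. */
struct {
    unsigned int timer_expired : 1;
    unsigned int rx_ready      : 1;
    unsigned int error         : 1;
} g_flags;

/* Equivalent hand-written masks on a single byte. */
#define FLAG_TIMER_EXPIRED 0x01u
#define FLAG_RX_READY      0x02u
#define FLAG_ERROR         0x04u
uint8_t g_flagByte;

/* Set the "receive ready" flag both ways. */
void set_rx_ready(void) {
    g_flags.rx_ready = 1;          /* bit-field notation */
    g_flagByte |= FLAG_RX_READY;   /* explicit mask */
}
```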
Math Tricks
Floating point math VERY expensive on microcontrollers
No hardware support
Typically 32 bits for float, 64 bits for double
Support provided by a BIG library
Can use fixed point math in many cases
Basically the same as integer math, however the binary point moves inside the integer
An 8-bit binary number is really: $b_7 2^7 + b_6 2^6 + \dots + b_2 2^2 + b_1 2^1 + b_0 2^0$
To make a fixed point number just adjust the exponents: $b_7 2^6 + b_6 2^5 + \dots + b_2 2^1 + b_1 2^0 + b_0 2^{-1}$
Note: $2^{-1} = 0.5$
Assume 8 bit value: Range = [0, 255]
Assume one fractional bit (binary point before the last bit): XXXXXXX.X
Range is now [0, 127.5]
Addition and subtraction stay exactly the same so long as only fixed point numbers with the same binary point location are used together! Multiplication and division need a shift afterward to restore the binary point.
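A sketch of the 8-bit, one-fractional-bit format from this slide (XXXXXXX.X), where the stored integer is the real value times 2. The type and function names are hypothetical. Addition is plain integer addition; multiplication produces two fractional bits, so the product is shifted right once. Overflow is not checked here.

```c
#include <stdint.h>

/* Unsigned fixed point, 7 integer bits + 1 fractional bit.
   Stored value = real value * 2; range [0, 127.5] in steps of 0.5. */
typedef uint8_t ufix7_1;

#define FIX_FRAC_BITS 1u

/* Convert a whole number (0..127) to fixed point. */
ufix7_1 fix_from_int(uint8_t whole) {
    return (ufix7_1)(whole << FIX_FRAC_BITS);
}

/* Same binary point on both operands: ordinary integer addition. */
ufix7_1 fix_add(ufix7_1 a, ufix7_1 b) {
    return (ufix7_1)(a + b);
}

/* The raw product has 2 fractional bits; shift right once to get back to 1.
   A 16-bit intermediate avoids overflow before the shift; the result truncates. */
ufix7_1 fix_mul(ufix7_1 a, ufix7_1 b) {
    uint16_t wide = (uint16_t)a * (uint16_t)b;
    return (ufix7_1)(wide >> FIX_FRAC_BITS);
}
```

For example, 2.5 is stored as 5 and 1.5 as 3: `fix_add(5, 3)` gives 8 (4.0), and `fix_mul(5, 3)` gives 7 (3.5), the exact 3.75 truncated to the 0.5 resolution of this format.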
More Math Tricks
You may not have multiply and/or divide ops!
Decomposing operations can help
X*5 = X*4 + X
(X * 4) can become 2 shift left operations
Formulas should also be restructured for the math available:
Y = ax^2 + bx + c: 1 Pow or Mult, 2 Mult, 2 Add
Y = x(ax + b) + c: 2 Mult, 2 Add
Lookup tables can be great for limited domain problems
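Both tricks from this slide in C: X*5 as two left shifts plus an add, and the quadratic rewritten in nested (Horner) form. The function names are hypothetical, and `poly_horner()` assumes the result fits in `int32_t`.

```c
#include <stdint.h>

/* X*5 = X*4 + X: two left shifts and one add, no multiply instruction.
   16-bit result so an 8-bit input (max 255 * 5 = 1275) cannot overflow. */
uint16_t times5(uint8_t x) {
    uint16_t x16 = x;
    return (uint16_t)((x16 << 2) + x16);
}

/* y = a*x^2 + b*x + c evaluated as y = x*(a*x + b) + c:
   2 multiplies and 2 adds instead of 3 multiplies and 2 adds. */
int32_t poly_horner(int16_t a, int16_t b, int16_t c, int16_t x) {
    int32_t t = (int32_t)a * x + b;  /* a*x + b */
    return t * x + c;                /* x*(a*x + b) + c */
}
```

For a = 2, b = 3, c = 4, x = 5 both forms give 2·25 + 15 + 4 = 69.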
Optimization
Step 0 – Before coding anything, think about risk points and prototype unknowns!!!
Step 1 – Get it working!!
Fast but wrong is of no use to anyone
Optimization will typically reduce readability
Step 2 – Profile to know where to optimize
Usually only one or two routines are critical
You need to have specific performance metrics to target
Optimization
Step 3 – Let the tools do as much as they can
Turn off debugging!
Select the correct memory model
Select the correct optimization level
Step 4 – Do it manually
Read the generated code! You might be able to make a simple code or structure change.
Last – think about assembly coding
Summary
Microcontroller hardware is much simpler than most of us are used to
Be familiar with the hardware in your microcontroller
Be familiar with your compiler options and how it translates your code
For time or space critical code, look at the assembly listing from time to time
Questions?