ARM7TDMI Processor Introduction

ARM Processor Introduction
ARM Architecture
Key Features
Data Size and Instruction Sets
 ARM is a 32-bit RISC architecture
 (Reduced Instruction Set Computer)
 ARM uses a 32-bit load/store architecture
 When used in relation to the ARM:
 Byte means 8-bit
 Halfword means 16-bit (two bytes)
 Word means 32-bit (four bytes)
 Most ARMs implement two instruction sets
 32-bit ARM Instruction Set
 16-bit Thumb Instruction Set
17/07/2015
3
ARM Operating States
 The ARM processor has two operating states:
 ARM state which executes 32-bit, word aligned ARM instructions
 THUMB state which can execute 16-bit, halfword aligned THUMB
instructions
 Switching state
 Entering THUMB state
- BX instruction with the state bit (bit 0) set in the operand register.
- Automatically on return from an exception (IRQ, FIQ, ABORT, SWI,…), if the
exception was entered with the processor in THUMB state.
 Entering ARM state
- BX instruction with the state bit clear in the operand register.
- Automatically on the processor taking an exception. In this case, the PC is
placed in the exception mode’s link register.
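The BX rule above can be sketched in Python. This is only an illustrative model, assuming a hypothetical helper named `bx` that looks at nothing but the operand register value:

```python
def bx(operand: int) -> tuple[str, int]:
    """Model BX Rm: bit 0 of the operand selects the state, the rest is the target."""
    state = "THUMB" if operand & 1 else "ARM"
    # Bit 0 is the state bit, not part of the address, so it is cleared for the PC.
    target = operand & 0xFFFFFFFE
    return state, target


print(bx(0x00008001))  # state bit set: enter THUMB state
print(bx(0x00008000))  # state bit clear: enter ARM state
```

The same target address is reached in both calls; only the resulting state differs.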
ARM Operating Modes
 The ARM supports seven modes of operation:
 User (usr): The normal ARM program execution state
 FIQ (fiq): Designed to support a data transfer or channel process
 IRQ (irq): Used for general-purpose interrupt handling
 Supervisor (svc): Protected mode for the operating system
 Abort mode (abt): Entered after a data or instruction prefetch abort
 System (sys): A privileged user mode for the operating system
 Undefined (und): Entered when an undefined instruction is executed
 Mode changes may be made under software control, or may be
brought about by external interrupts or exception processing.
 Most application programs will execute in User mode. The non-user
modes, known as privileged modes, are entered in order to service
interrupts or exceptions, or to access protected resources.
Registers
 37 registers, each 32 bits long, shared across the 7 processor modes
 18 visible 32-bit registers in exception modes (17 in User and System modes, which have no SPSR)
 r0-r13 = general purpose registers (r13 = Stack Pointer (SP))
 r14 = Link Register (LR)
 r15 = Program Counter (PC)
 CPSR = Current Program Status Register
 SPSR = Saved Program Status Register (only accessible in the privileged
exception modes, not in User or System mode)
 Availability of the register banks depends on the current
processor mode (user, supervisor and others)
 Maximum of 13 general purpose registers in user mode (r0-r12)
 In high priority interrupts (FIQ), a different bank with general purpose
registers becomes available
 Each exception mode has its own SPSR, SP and LR
Register Organization Summary
Register    User / System   FIQ        Supervisor   Abort      IRQ        Undefined
Low Registers
r0-r7       r0-r7           r0-r7      r0-r7        r0-r7      r0-r7      r0-r7
High Registers
r8          r8              r8_fiq     r8           r8         r8         r8
r9          r9              r9_fiq     r9           r9         r9         r9
r10         r10             r10_fiq    r10          r10        r10        r10
r11         r11             r11_fiq    r11          r11        r11        r11
r12         r12             r12_fiq    r12          r12        r12        r12
r13 (sp)    r13             r13_fiq    r13_svc      r13_abt    r13_irq    r13_undef
r14 (lr)    r14             r14_fiq    r14_svc      r14_abt    r14_irq    r14_undef
r15 (pc)    r15             r15        r15          r15        r15        r15
Program Status Registers
cpsr        cpsr            cpsr       cpsr         cpsr       cpsr       cpsr
spsr        (none)          spsr_fiq   spsr_svc     spsr_abt   spsr_irq   spsr_undef
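As a cross-check of the register counts, the banking scheme can be modelled in a few lines of Python. This is a sketch only: the helper names and the short mode suffixes (`und` rather than `undef`) are assumptions made for illustration.

```python
MODES = ["usr", "sys", "fiq", "svc", "abt", "irq", "und"]


def physical_register(mode: str, n: int) -> str:
    """Return the physical register that rN refers to in the given mode."""
    if n == 15:
        return "r15"                      # the PC is never banked
    if mode == "fiq" and 8 <= n <= 14:
        return f"r{n}_fiq"                # FIQ banks r8-r14
    if mode in ("svc", "abt", "irq", "und") and n in (13, 14):
        return f"r{n}_{mode}"             # other exception modes bank SP and LR
    return f"r{n}"                        # User and System share the unbanked set


def status_registers(mode: str) -> list[str]:
    """The CPSR is shared; each exception mode adds its own SPSR."""
    if mode in ("usr", "sys"):
        return ["cpsr"]
    return ["cpsr", f"spsr_{mode}"]


all_registers = set()
for mode in MODES:
    all_registers.update(physical_register(mode, n) for n in range(16))
    all_registers.update(status_registers(mode))

print(len(all_registers))
```

Counting the distinct names gives 31 general-purpose registers plus the CPSR and five SPSRs, which matches the total of 37.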
THUMB State Registers Set
Relationship between ARM and
THUMB state registers
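 The THUMB state registers relate to the ARM state
registers in the following way:
 THUMB state r0-r7 and ARM state r0-r7 are identical
 THUMB state CPSR and SPSRs and ARM state CPSR and SPSRs are identical
 THUMB state SP maps onto ARM state r13
 THUMB state LR maps onto ARM state r14
 THUMB state PC maps onto ARM state PC (r15)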
Program Status Registers (1/3)
 The ARM contains a Current Program Status Register
(CPSR), plus five Saved Program Status Registers
(SPSRs) for use by exception handlers.
 These registers:
 Hold information about the most recently performed ALU
operation
 Control the enabling and disabling of interrupts
 Set the processor operating mode
Program Status Registers (2/3)
 Condition Code Flags
 The N, Z, C and V bits may be changed as a result of arithmetic and
logical operations, and may be tested to determine whether an instruction
should be executed
- In ARM state, all instructions may be executed conditionally.
- In THUMB state, only the Branch instruction is capable of conditional
execution.
 Control Bits
 The I, F, T and M[4:0] bits will be changed when an exception arises. If
the processor is operating in a privileged mode, they can also be
manipulated by software.
 T bit:
- This reflects the operating state. When this bit is set, the processor is
executing in THUMB state, otherwise it is executing in ARM state. This is
reflected on the TBIT external signal.
- Note that software must never change the state of the T bit in the CPSR directly. If
this happens, the processor will enter an unpredictable state.
Program Status Registers (3/3)
 Control Bits
 Interrupt disable bits:
- The I and F bits are the interrupt disable bits. When set, these
disable the IRQ and FIQ interrupts respectively.
 Mode bits:
- The M4, M3, M2, M1 and M0 bits (M[4:0]) are the mode bits. These
determine the processor's operating mode. Not all combinations of
the mode bits define a valid processor mode. Only those explicitly
described shall be used. The user should be aware that if any illegal
value is programmed into the mode bits, M[4:0], then the processor
will enter an unrecoverable state. If this occurs, reset should be
applied.
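The CPSR fields described on these slides can be decoded with a short Python sketch. The function name `decode_cpsr` is hypothetical; the bit positions (N, Z, C, V in bits 31 to 28, I, F, T in bits 7 to 5, M[4:0] in bits 4 to 0) and the mode encodings follow the ARM architecture definition.

```python
MODE_NAMES = {
    0b10000: "User",
    0b10001: "FIQ",
    0b10010: "IRQ",
    0b10011: "Supervisor",
    0b10111: "Abort",
    0b11011: "Undefined",
    0b11111: "System",
}


def decode_cpsr(cpsr: int) -> dict:
    """Split a 32-bit CPSR value into condition flags, control bits and mode name."""
    return {
        "N": bool((cpsr >> 31) & 1),
        "Z": bool((cpsr >> 30) & 1),
        "C": bool((cpsr >> 29) & 1),
        "V": bool((cpsr >> 28) & 1),
        "I": bool((cpsr >> 7) & 1),   # set: IRQ disabled
        "F": bool((cpsr >> 6) & 1),   # set: FIQ disabled
        "T": bool((cpsr >> 5) & 1),   # set: THUMB state
        # None marks an M[4:0] combination that defines no valid mode.
        "mode": MODE_NAMES.get(cpsr & 0x1F),
    }


print(decode_cpsr(0x600000D3))
```

For the sample value 0x600000D3 the Z and C flags are set, both interrupts are disabled, the core is in ARM state and the mode is Supervisor.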
Hardware Interrupts
 ARM cores do not include an interrupt controller to
support and distinguish many interrupt sources
 A memory-mapped interrupt controller is required for devices with
many peripherals / interrupt sources
 There are two levels of hardware interrupt. Both levels
get access to their own SP and LR registers and a copy of
the original Program Status Register
 FIQ – High priority (fast) hardware interrupt
- Disables all other interrupts
- Provides a new bank with 5 general purpose registers
 IRQ – Regular hardware interrupt
Exceptions (1/6)
 Exceptions arise whenever the normal flow of a program
has to be halted temporarily
 For example to service an interrupt from a peripheral.
 ARM supports 7 types of exception and handles each one in a
privileged processor mode.
 ARM Exception vectors
Address       Exception               Mode on Entry
0x00000000    Reset                   Supervisor
0x00000004    Undefined instruction   Undefined
0x00000008    Software Interrupt      Supervisor
0x0000000C    Abort (prefetch)        Abort
0x00000010    Abort (data)            Abort
0x00000014    Reserved                Reserved
0x00000018    IRQ                     IRQ
0x0000001C    FIQ                     FIQ
Exceptions (2/6)
 When handling an exception, the ARM:
 Preserves the address of the next instruction in the appropriate
Link Register
 Copies the CPSR into the appropriate SPSR
 Forces the CPSR mode bits to a value which depends on the
exception
 Forces the PC to fetch the next instruction from the relevant
exception vector
 It may also set the interrupt disable flags to prevent otherwise
unmanageable nestings of exceptions.
 If the processor is in THUMB state when an exception occurs, it
will automatically switch into ARM state when the PC is loaded
with the exception vector address.
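The entry sequence listed above can be modelled as a small Python function. This is a sketch under stated assumptions: `enter_exception` and its dictionary result are hypothetical names, banked registers are represented only by the returned `lr` and `spsr` values, and for Reset the saved values carry no meaning on real hardware.

```python
# kind -> (vector address, mode bits M[4:0], whether F is also set on entry)
VECTORS = {
    "reset":          (0x00, 0b10011, True),
    "undefined":      (0x04, 0b11011, False),
    "swi":            (0x08, 0b10011, False),
    "prefetch_abort": (0x0C, 0b10111, False),
    "data_abort":     (0x10, 0b10111, False),
    "irq":            (0x18, 0b10010, False),
    "fiq":            (0x1C, 0b10001, True),
}
I_BIT, F_BIT, T_BIT = 1 << 7, 1 << 6, 1 << 5


def enter_exception(cpsr: int, kind: str, return_address: int) -> dict:
    """Return the register values produced by taking an exception."""
    vector, mode, sets_f = VECTORS[kind]
    new_cpsr = (cpsr & ~0x1F) | mode   # force the mode bits
    new_cpsr |= I_BIT                  # IRQ is disabled on every exception entry
    if sets_f:
        new_cpsr |= F_BIT              # FIQ is disabled on Reset and FIQ entry
    new_cpsr &= ~T_BIT                 # handlers always start in ARM state
    return {
        "lr": return_address,          # banked link register of the new mode
        "spsr": cpsr,                  # old CPSR saved in the new mode's SPSR
        "cpsr": new_cpsr & 0xFFFFFFFF,
        "pc": vector,
    }


# User mode, THUMB state (CPSR = 0x30), interrupted by an IRQ.
print(enter_exception(0x00000030, "irq", 0x8004))
```

In the example the new CPSR is 0x92: IRQ mode, I set, T cleared, while the original 0x30 is kept in the SPSR so that the return sequence can restore THUMB state.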
Exceptions (3/6)
 On completion, the exception handler:
 Moves the Link Register, minus an offset where appropriate, to
the PC. (The offset will vary depending on the type of exception.)
 Copies the SPSR back to the CPSR
 Clears the interrupt disable flags, if they were set on entry
Exceptions (4/6)
 Reset
 When the processor’s Reset input is asserted
- CPSR ← Supervisor + I + F
- PC ← 0x00000000
 Undefined Instruction
 If an attempt is made to execute an instruction that is undefined
- LR_undef ← Undefined Instruction Address + #4, SPSR_undef ← CPSR
- PC ← 0x00000004, CPSR ← Undefined + I
- Return with : MOVS pc, lr
 Prefetch Abort
 Instruction fetch memory abort, invalid fetched instruction
- LR_abt ← Aborted Instruction Address + #4, SPSR_abt ← CPSR
- PC ← 0x0000000C, CPSR ← Abort + I
- Return with : SUBS pc, lr, #4
Exceptions (5/6)
 Data Abort
 Data access memory abort, invalid data
- LR_abt ← Aborted Instruction Address + #8, SPSR_abt ← CPSR
- PC ← 0x00000010, CPSR ← Abort + I
- Return with : SUBS pc, lr, #8 (retry the aborted instruction) or SUBS pc, lr, #4 (skip it)
 Software Interrupt
 Enters Supervisor mode
- LR_svc ← SWI Address + #4, SPSR_svc ← CPSR
- PC ← 0x00000008, CPSR ← Supervisor + I
- Return with : MOVS pc, lr
Exceptions (6/6)
 Interrupt Request
 Externally generated by asserting the processor’s IRQ input
- LR_irq ← PC - #4, SPSR_irq ← CPSR
- PC ← 0x00000018, CPSR ← IRQ + I
- Return with : SUBS pc, lr, #4
 Fast Interrupt Request
 Externally generated by asserting the processor’s FIQ input
- LR_fiq ← PC - #4, SPSR_fiq ← CPSR
- PC ← 0x0000001C, CPSR ← FIQ + I + F
- Return with : SUBS pc, lr, #4
- 0x1C is the last vector, so the handler can start there directly, which speeds up the response time
Memory Organization
 ARM derivatives up to ARM7 are based on the von
Neumann model
 Shared, single memory space for code AND data
 Linear 32-bit address space (4 GByte)
 ARM derivatives of ARM9 and up support the Harvard
model
 Separated memory ports for code AND data
 Offers simultaneous access to code AND data
The 3-stage Instruction Pipeline
 Up to the ARM7, ARM processors have a 3-stage
instruction pipeline
[Figure: 3-stage pipeline. FETCH: instruction fetch. DECODE: Thumb-to-ARM decompress, ARM decode, register select. EXECUTE: register read, shift, ALU, register write.]
 The three stages are:
 1.Fetch
- Fetching an instruction from the memory containing the code
 2.Decode
- Decoding the instruction and preparing data path control
signals for the next cycle
 3.Execute
- The instruction gets executed on the data path specified and
the result is written back to the destination
Execution Of Single Cycle Instructions
 It takes 3 cycles to completely process an instruction
 However: Once the pipeline is filled, one instruction
completes every cycle
Cycle (time)     1        2        3         4         5
Instruction 1    Fetch    Decode   Execute
Instruction 2             Fetch    Decode    Execute
Instruction 3                      Fetch     Decode    Execute
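The overlap of the three stages can be reproduced with a short Python sketch. It assumes ideal single-cycle instructions with no stalls, and the function name `pipeline_schedule` is hypothetical.

```python
STAGES = ["Fetch", "Decode", "Execute"]


def pipeline_schedule(n_instructions: int) -> list[list[str]]:
    """Row i lists, per cycle, the stage occupied by instruction i ('' when idle)."""
    total_cycles = n_instructions + len(STAGES) - 1
    rows = []
    for i in range(n_instructions):
        row = [""] * total_cycles
        for s, stage in enumerate(STAGES):
            row[i + s] = stage        # instruction i enters the pipeline in cycle i
        rows.append(row)
    return rows


for number, row in enumerate(pipeline_schedule(3), start=1):
    print(f"Instruction {number}: " + " | ".join(f"{cell:7}" for cell in row))
```

With n instructions the model needs n + 2 cycles in total: 3 cycles for the first instruction, then one more cycle for each instruction that follows.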
Execution Of Multi Cycle Instructions
 The pipeline will be halted/delayed for one cycle if
multiple memory accesses have to be made in order to
execute the instruction
 For example any instruction requiring access to an operand
stored in memory (and not a register)
 NOTE: Branch instructions flush and refill the pipeline
 Upon execution of a branch instruction, the current fetch and
decode actions of the pipeline are aborted and a new fetch from
the branch location gets started
The 5-stage Instruction Pipeline
 Higher performance ARM derivatives use a 5-stage
pipeline to compensate for the memory access bottleneck
of the 3-stage pipeline
 The five stages are:
 1.Fetch: Fetch next instruction from memory
 2.Decode: Decode instruction and read register operands
 3.Execute: Execute instruction
 4.Data: Access data memory, if required
 5.Write-back: Write the result of the instruction back to the
destination register
Pipelining Comparison
 ARM7TDMI (3 stages)
FETCH: instruction fetch
DECODE: Thumb-to-ARM decompress, ARM decode, register select
EXECUTE: register read, shift, ALU, register write
 ARM9TDMI (5 stages)
FETCH: instruction fetch
DECODE: ARM or Thumb instruction decode, register decode, register read
EXECUTE: shift + ALU
MEMORY: memory access
WRITE: register write
Reducing Code Size: Thumb
 Compressed subset of the 32-bit ARM instruction set
 Requires lower bus bandwidth from narrow external memory
 Improves already outstanding code density
 A Thumb enabled ARM:
 Executes both 32-bit ARM and 16-bit Thumb instructions
 Allows runtime interworking between ARM and Thumb code
 State change performed via branch with exchange (BX)
instruction
 Thumb reduces 32-bit system to 16-bit cost
 Consumes less power
 Requires less external memory
ARM and Thumb Performance
 Thumb programs typically are:
 30% smaller than ARM programs
 30% faster when accessing 16-bit memory
[Chart: ARM and Thumb performance, axis labels MIPS and MHz]
ARM-Based System
[Block diagram components: ARM core, 8-bit ROM, 16-bit RAM, 32-bit RAM, peripherals, I/O, interrupt controller, nIRQ and nFIQ signals]
Advanced Microcontroller Bus Architecture
 AMBA was introduced in 1996 and is widely used as the
on-chip bus for ARM processors
 AMBA is an open standard that describes the
interconnection and management of the functional blocks that
make up a System-on-Chip
Advanced System Bus (ASB)
 First Generation of AMBA system bus
 Implements features required for high performance
 Multiple Bus Masters
 Optimizes system performance by sharing resources between
different bus masters such as the CPU, DMA controller, etc.
 Pipelined and burst transfers
 Allows high speed memory and peripheral access without the
requirement for additional cycles on the bus
Advanced High-Performance Bus (AHB)
 Multiple Bus Masters
 Optimizes system performance by sharing resources
between different bus masters such as the CPU, DMA
controller, etc.
 Pipelined and burst transfers
 Allows high speed memory and peripheral access without
the requirement for additional cycles on the bus
 Split transactions supported
 Enables high latency slaves to release the system bus while
completing a transaction
Advanced Peripheral Bus (APB)
 Ideal for general purpose peripherals such as timers,
UARTs, IOs, etc.
 Simple bus
 Non-pipelined architecture
 Easy to implement with all peripherals acting as slaves
 Simpler interface means low gate count
 Low power
 Isolating peripherals behind the bridge reduces load on the main
system bus
Atmel AT91
Architecture
AT91 Architecture
 The Atmel AT91 Series of microcontrollers are based upon
the powerful ARM7TDMI or ARM920T processors
 New products are based upon the powerful ARM926EJ-S
processor
 Atmel has taken these cores, added a wide range of
peripherals and advanced power management systems, to
give the design engineer the best of both worlds – a high
performance peripheral set with very low power
consumption
 It gives the buyer a 32-bit processor at 16-bit cost!
ARM Cores used by Atmel
ARM7TDMI
 30 MIPS @ 33 MHz in 0.35 µm or 60 MIPS @ 66 MHz in 0.18 µm
 -40 to 85 °C
 Products: AT91 series; SAM7S, 7A, 7X, 7SE, 7L; ARM7 ASSP & ASICs
ARM920T
 200 MIPS @ 180 MHz in 0.18 µm
 16KB Instruction and 16KB Data Cache, MMU, ETM
 -40 to 85 °C
 Products: AT91RM9200
ARM926EJ-S
 200 MIPS @ 180 MHz in 0.13 µm
 Extended DSP instructions, JAVA, MMU, 16KB Instruction and 16KB Data Cache, TCM
 -40 to 85 °C
 Products: SAM9260, SAM9261, SAM9262, SAM9263; ARM9 ASSP & ASICs
ARM946E-S
 130 MIPS @ 120 MHz in 0.18 µm
 Extended DSP instructions
 Configurable instruction and data cache
 -40 to 85 °C
 Products: ASICs
ARM7
 Established, high-volume 32-bit RISC Architecture
 Small die size and very low power consumption
 3-stage instruction pipeline
 Von Neumann memory layout with linear 32-bit address
space (4GByte)
 32-bit data bus
 Supports little- and big-endian
 Performance 0.9 MIPS/MHz
ARM7 Thumb Family
ARM7TDMI processor
 The ARM7TDMI processor is a member of the Advanced
RISC Machines (ARM) family of general-purpose 32-bit
microprocessors
 What does ARM7TDMI mean?
ARM7 - 32-bit Advanced RISC Machine
T - Thumb architecture extension
- Two separate instruction sets, 32-bit ARM instructions and 16-bit Thumb instructions
D - Debug extension
M - Enhanced multiplier
I - EmbeddedICE macrocell extension
ARM7TDMI Processor Features
 32/16-bit RISC architecture version 4T (ARM v4T)
 3-stage pipeline
 Unified bus architecture
 32-bit ARM Instruction Set + 16-bit Thumb extension
 Forward compatible code
 EmbeddedICE on-chip debug
 Smallest Die Size: 0.53mm² on 0.18µm process
 Industry leading 0.25mW/MHz
ARM7TDMI Block Diagram
 Von Neumann Architecture
 3-stage pipeline
 fetch, decode, execute
 32-bit Data Bus
 32-bit Address Bus
 37 32-bit registers
 32-bit ARM instruction set
 16-bit THUMB instruction set
 32x8 Multiplier
 Barrel Shifter
ARM9
 32-bit RISC processor core with ARM and Thumb instruction
sets
 5-stage instruction pipeline
 Harvard memory layout with two 32-bit linear address spaces,
one for code and one for data
 Integrated instruction and data caches
 Double-bandwidth memory access
 Reduced CPI (Clocks Per Instruction)
 Performance 1.1 MIPS/MHz
ARM926EJ-S Processor Features
 32/16-bit RISC architecture version 5TE
 Harvard 5-stage pipeline
 Separate instruction and data AHB buses
 32-bit ARM Instruction Set plus 16-bit Thumb extension
 DSP instruction extensions
 ARM Jazelle technology Java bytecode acceleration
 MMU to support Symbian OS, Windows CE, Linux and Palm OS
 Selectable size instruction and data caches
 Instruction and data TCM interfaces with wait state support
 EmbeddedICE on-chip debug
 ETM interface for Real-time trace capability with ETM9 macrocell
 Power: 0.5mW/MHz
ARM926EJ-S Block Diagram
[Block diagram components: ARM9EJ-S core, ETM interface, instruction and data TCM interfaces, instruction and data caches, MMU, write buffer, control logic and bus interface unit, coprocessor interface, AMBA AHB interface with separate instruction and data ports]
AT91 ARM7-Based Architecture
[Block diagram components: ARM7TDMI core, on-chip SRAM / ROM, ASB, 16-bit External Bus Interface (to external memories and external peripherals), Advanced Interrupt Controller, AMBA bridge, APB, on-chip peripherals]
AT91R40008 (58A03) Block Diagram
AT91 ARM7-based MCP Architecture
[Block diagram components: ARM7TDMI core, on-chip SRAM / ROM, ASB, 16-bit External Bus Interface (to external memories and external peripherals), Advanced Interrupt Controller, on-chip 16-bit Flash (stacked dies), AMBA bridge, APB, on-chip peripherals]
AT91FR40162 Block Diagram
AT91 ARM9-Based Architecture
[Block diagram components: ARM920T core with MMU, Advanced Interrupt Controller, memory controller, on-chip SRAM / ROM, ASB, 32-bit External Bus Interface (to external memories and external peripherals), AMBA bridge, APB, on-chip peripherals]
AT91RM9200 (58A07) Block Diagram
AT91 SAM7S Architecture
[Block diagram components: ARM7TDMI core, on-chip SRAM, ASB, Advanced Interrupt Controller, on-chip 32-bit Flash, AMBA bridge, APB, on-chip peripherals]
AT91SAM7S64 (58814) Block Diagram
AT91 SAM9 Architecture
[Block diagram components: ARM926EJ-S core with MMU, TCM and a bus interface unit (BIU) with instruction (I) and data (D) ports, Advanced Interrupt Controller, on-chip SRAM / ROM, 5-layer AHB matrix, 32-bit External Bus Interface (to external memories and external peripherals), AMBA bridge, APB, on-chip peripherals]
AT91SAM9261 (59002) Block Diagram
AT91 SAM7SE Architecture
[Block diagram components: ARM7TDMI core, on-chip SRAM, ASB, on-chip 32-bit Flash, Advanced Interrupt Controller, 32-bit External Bus Interface (to external memories and external peripherals), AMBA bridge, APB, on-chip peripherals]
Appendix
ARM Instruction Set
Summary
Condition Field (1/2)
 All ARM instructions can be conditionally executed, which
means that their execution may or may not take place
depending on the values of the N, Z, C and V
flags in the CPSR
 Every instruction contains a 4-bit condition code field in
bits 31 to 28
Condition Field (2/2)
 There are fifteen different conditions, each represented by
a two-character suffix that can be appended to the
instruction's mnemonic.
 A Branch (B in assembly) becomes BEQ for "Branch if Equal",
which means the Branch will only be taken if the Z flag is set.
Code   Suffix   Flags                         Meaning
0000   EQ       Z set                         Equal
0001   NE       Z clear                       Not equal
0010   CS       C set                         Unsigned higher or same
0011   CC       C clear                       Unsigned lower
0100   MI       N set                         Negative
0101   PL       N clear                       Positive or zero
0110   VS       V set                         Overflow
0111   VC       V clear                       No overflow
1000   HI       C set and Z clear             Unsigned higher
1001   LS       C clear or Z set              Unsigned lower or same
1010   GE       N equals V                    Greater or equal
1011   LT       N not equal to V              Less than
1100   GT       Z clear AND (N equals V)      Greater than
1101   LE       Z set OR (N not equal to V)   Less than or equal
1110   AL       (ignored)                     Always
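The flag tests in the condition table can be written out as a Python sketch. The names `CONDITIONS` and `condition_passed` are hypothetical; the predicates are a direct transcription of the Flags column.

```python
# Insertion order matches the 4-bit codes 0000 to 1110.
CONDITIONS = {
    "EQ": lambda n, z, c, v: z,
    "NE": lambda n, z, c, v: not z,
    "CS": lambda n, z, c, v: c,
    "CC": lambda n, z, c, v: not c,
    "MI": lambda n, z, c, v: n,
    "PL": lambda n, z, c, v: not n,
    "VS": lambda n, z, c, v: v,
    "VC": lambda n, z, c, v: not v,
    "HI": lambda n, z, c, v: c and not z,
    "LS": lambda n, z, c, v: (not c) or z,
    "GE": lambda n, z, c, v: n == v,
    "LT": lambda n, z, c, v: n != v,
    "GT": lambda n, z, c, v: (not z) and n == v,
    "LE": lambda n, z, c, v: z or n != v,
    "AL": lambda n, z, c, v: True,
}
SUFFIXES = list(CONDITIONS)


def condition_passed(code: int, n: bool, z: bool, c: bool, v: bool) -> bool:
    """Evaluate the 4-bit condition field (bits 31 to 28) against the flags."""
    return bool(CONDITIONS[SUFFIXES[code]](n, z, c, v))


# BEQ is taken only when Z is set.
print(condition_passed(0b0000, n=False, z=True, c=False, v=False))
```

An instruction whose condition evaluates to false is simply not executed, which is how a BEQ falls through when Z is clear.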
Branch Instructions (1/2)
 All ARM processors support a branch instruction that allows a
conditional branch forwards or backwards up to 32Mbytes.
 As the Program Counter (PC) is one of the general-purpose registers (register 15), a
branch or jump can also be generated by writing a value to register 15.
 A subroutine call is a variant of the standard branch. The Branch with
Link instruction preserves the address of the instruction after the branch
(the return address) in register 14 (link register or LR).
 A load instruction provides a way to branch anywhere in the 4Gbyte
address space. A 32-bit value is loaded directly from memory into the
PC, causing a branch.
 ARM processors that support the Thumb instruction set also support
a branch instruction (BX) that jumps to a given address, and optionally
switches to executing Thumb instructions.
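The 32Mbyte limit follows from the instruction encoding: B and BL carry a signed 24-bit word offset relative to the PC, which in ARM state reads as the branch address plus 8. A Python sketch of that encoding, with `encode_branch` as a hypothetical helper name:

```python
def encode_branch(pc: int, target: int, link: bool = False, cond: int = 0b1110) -> int:
    """Encode an ARM B (or BL when link=True) located at address pc."""
    offset = target - (pc + 8)          # the PC is two instructions ahead
    if offset % 4:
        raise ValueError("target must be word aligned relative to the branch")
    imm24 = offset >> 2                 # offsets are counted in words
    if not -(1 << 23) <= imm24 < (1 << 23):
        raise ValueError("target is outside the +/-32 Mbyte branch range")
    return (cond << 28) | (0b101 << 25) | (int(link) << 24) | (imm24 & 0xFFFFFF)


print(hex(encode_branch(0x8000, 0x8000)))             # branch to itself
print(hex(encode_branch(0x8000, 0x8008, link=True)))  # BL with zero offset
```

A branch to its own address needs an offset of -8 bytes, which gives the familiar encoding 0xEAFFFFFE.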
Branch Instructions (2/2)
 List of branch instructions
B, BL    Branch, and branch with link
BX       Branch and exchange instruction set
 Examples
        B    label      ; branch unconditionally to label
        BCC  label      ; branch to label if carry flag is clear
        BEQ  label      ; branch to label if zero flag is set
        MOV  PC, #0     ; R15 = 0, branch to location zero
        BL   func       ; subroutine call to function
func
        MOV  PC, LR     ; R15=R14, return to instruction after the BL
        MOV  LR, PC     ; store the address of the instruction after the next one into R14
        LDR  PC, =func  ; load a 32-bit value into the program counter
Data Processing (1/2)
 ARM has 16 data processing instructions. Most data processing
instructions take two source operands (Move and Move Not
have only one operand) and store a result in a register (except
for the Compare and Test instructions which only update the
condition codes)
 Of the two source operands, one is always a register, the other is called a
shifter operand, and is either an immediate value or a register. If the
second operand is a register value, it may have a shift applied to it before
it is used as the operand to the ALU
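The shifter operand can be illustrated with a Python sketch of the four shift types applied to the second operand before it reaches the ALU. The helper names are hypothetical, and shift amounts are restricted to 1 to 31 so that the special cases of the real encoding do not arise.

```python
MASK = 0xFFFFFFFF


def shift(value: int, kind: str, amount: int) -> int:
    """Apply LSL, LSR, ASR or ROR by 1..31 bits to a 32-bit value."""
    assert 1 <= amount <= 31
    value &= MASK
    if kind == "LSL":
        return (value << amount) & MASK
    if kind == "LSR":
        return value >> amount
    if kind == "ASR":
        result = value >> amount
        if value >> 31:                               # negative: replicate the sign bit
            result |= (MASK << (32 - amount)) & MASK
        return result
    if kind == "ROR":
        return ((value >> amount) | (value << (32 - amount))) & MASK
    raise ValueError(f"unknown shift type: {kind}")


def add_shifted(rn: int, rm: int, kind: str, amount: int) -> int:
    """Model ADD Rd, Rn, Rm, <kind> #amount."""
    return (rn + shift(rm, kind, amount)) & MASK


# ADD R0, R0, R0, LSL #2 multiplies R0 by 5 in a single instruction.
print(add_shifted(7, 7, "LSL", 2))
```

Combining a shift with an add or reverse subtract in this way is what makes multiplication by many small constants possible without a multiply instruction.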
Data Processing (2/2)
 List of data processing instructions
Assembler Mnemonic   OP Code   Action
AND                  0000      Operand1 AND operand2
EOR                  0001      Operand1 EOR operand2
SUB                  0010      Operand1 – operand2
RSB                  0011      Operand2 – operand1
ADD                  0100      Operand1 + operand2
ADC                  0101      Operand1 + operand2 + carry
SBC                  0110      Operand1 – operand2 + carry – 1
RSC                  0111      Operand2 – operand1 + carry – 1
TST                  1000      As AND, but result is not written
TEQ                  1001      As EOR, but result is not written
CMP                  1010      As SUB, but result is not written
CMN                  1011      As ADD, but result is not written
ORR                  1100      Operand1 OR operand2
MOV                  1101      Operand2 (operand1 is ignored)
BIC                  1110      Operand1 AND NOT operand2 (Bit clear)
MVN                  1111      NOT operand2 (operand1 is ignored)
Multiply Instructions (1/2)
 ARM has two classes of multiply instruction
 normal, 32-bit result
 long, 64-bit result
 All multiply instructions take two register operands as the input
to the multiplier
 ARM does not directly support a multiply by constant instruction due to the
efficiency of shift and add, or shift and reverse subtract instructions
 There are two multiply instructions that produce 32-bit results
 MUL, multiplies the values of two registers together, truncates the result to
32 bits, and stores the result in a third register.
 MLA, multiplies the values of two registers together, adds the value of a
third register, truncates the result to 32 bits, and stores the result into a
fourth register (multiply and accumulate)
        MUL   R4, R2, R1      ; Set R4 to value of R2 multiplied by R1
        MULS  R4, R2, R1      ; R4 = R2xR1, set N and Z flags
        MLA   R7, R8, R9, R3  ; R7 = R8xR9 + R3
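The truncation to 32 bits can be made concrete with a small Python model. The function names mirror the mnemonics but are otherwise hypothetical helpers.

```python
MASK32 = 0xFFFFFFFF


def mul(rm: int, rs: int) -> int:
    """MUL: keep only the low 32 bits of the product."""
    return (rm * rs) & MASK32


def mla(rm: int, rs: int, rn: int) -> int:
    """MLA: multiply, add the accumulator, keep the low 32 bits."""
    return (rm * rs + rn) & MASK32


def nz_flags(result: int) -> dict:
    """Flags that MULS or MLAS would set from a 32-bit result."""
    return {"N": bool(result >> 31), "Z": result == 0}


print(mla(3, 4, 5))
print(mul(0x10000, 0x10000), nz_flags(mul(0x10000, 0x10000)))
```

The second call shows why truncation matters: 0x10000 times 0x10000 is 2^32, so the 32-bit result is zero and Z would be set.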
Multiply Instructions (2/2)
 There are four multiply instructions that produce 64-bit
results (long multiply)
 Two of the variants multiply the values of two registers together and
store the 64-bit result in a third and fourth register. There are
signed (SMULL) and unsigned (UMULL) variants.
 The remaining two variants multiply the values of two registers
together, add the 64-bit value from a third and fourth register and
store the 64-bit result back into those registers (third and fourth).
There are also signed (SMLAL) and unsigned (UMLAL) variants.
These instructions perform a long multiply and accumulate
        SMULL R4, R8, R2, R3 ; R4 = bits 0 to 31 of R2xR3
                             ; R8 = bits 32 to 63 of R2 x R3
        UMULL R6, R8, R0, R1 ; R6, R8 = R0 x R1
        UMLAL R5, R8, R0, R1 ; R5, R8 = R0 x R1 + R5, R8
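How the 64-bit result is split across two registers, and how the signed and unsigned variants differ, can be sketched in Python. The helpers below are hypothetical models that return a (low word, high word) pair.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def to_signed32(x: int) -> int:
    """Interpret a 32-bit register value as a two's complement number."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def split64(value: int) -> tuple[int, int]:
    """Return (low 32 bits, high 32 bits) of a 64-bit result."""
    value &= MASK64
    return value & MASK32, value >> 32


def umull(rm: int, rs: int) -> tuple[int, int]:
    return split64((rm & MASK32) * (rs & MASK32))


def smull(rm: int, rs: int) -> tuple[int, int]:
    return split64(to_signed32(rm) * to_signed32(rs))


def umlal(lo: int, hi: int, rm: int, rs: int) -> tuple[int, int]:
    """Unsigned multiply, then add the existing 64-bit value held in (lo, hi)."""
    return split64(((hi << 32) | lo) + (rm & MASK32) * (rs & MASK32))


print(umull(0xFFFFFFFF, 2))   # operand read as 4294967295
print(smull(0xFFFFFFFF, 2))   # same bits read as -1
```

The same register contents give different high words, because SMULL sign-extends its operands while UMULL does not.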
Load and Store Instructions (1/2)
 Load and store instructions come in three types:
 load or store the value of a single register
 load or store multiple register values
 swap a register value with the value of a memory location
 Load and store single register
 Load register instructions can load a 32-bit word, a 16-bit
halfword or an 8-bit byte from memory into a register.
 Store register instructions can store a 32-bit word, a 16-bit
halfword or an 8-bit byte from a register to memory.
 List of load and store single register:
- LDR/STR, Load/Store word
- LDRB/STRB, Load/Store byte
- LDRH/STRH, Load/Store unsigned halfword
- LDRSB, Load signed byte
- LDRSH, Load signed halfword
Load and Store Instructions (2/2)
 Load and Store multiple registers
 Load and Store multiple instructions perform a block transfer of any
number of the general purpose registers to or from memory
 Four addressing modes are provided:
- pre-increment
- post-increment
- pre-decrement
- post-decrement
 List of load and store multiple instructions
- LDM, Load multiple
- STM, Store multiple
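The four addressing modes differ only in which addresses are touched and where the base ends up. A Python sketch, using the assembler suffixes IA (post-increment), IB (pre-increment), DA (post-decrement) and DB (pre-decrement); the function name is a hypothetical helper:

```python
def block_transfer_addresses(base: int, count: int, mode: str) -> tuple[list[int], int]:
    """Return (addresses accessed, lowest first, and the written-back base)."""
    size = 4 * count                       # one word per register
    if mode == "IA":
        start, new_base = base, base + size
    elif mode == "IB":
        start, new_base = base + 4, base + size
    elif mode == "DA":
        start, new_base = base - size + 4, base - size
    elif mode == "DB":
        start, new_base = base - size, base - size
    else:
        raise ValueError(f"unknown addressing mode: {mode}")
    return [start + 4 * i for i in range(count)], new_base


for mode in ("IA", "IB", "DA", "DB"):
    addresses, new_base = block_transfer_addresses(0x1000, 3, mode)
    print(mode, [hex(a) for a in addresses], hex(new_base))
```

In every mode the lowest-numbered register goes to the lowest address, so only the address window moves.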
 Swap a register value with the value of a memory location
 Swap can load a value from a register-specified memory location, store
the contents of a register to the same memory location, then write the
loaded value to a register.
 List of semaphore instructions
- SWP, Swap
- SWPB, Swap Byte
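Why SWP is listed as a semaphore instruction can be shown with a Python sketch. Memory is modelled as a plain dictionary and the lock convention (0 = free, 1 = taken) is an assumption made for illustration; on hardware the load and store happen as one indivisible bus operation.

```python
def swp(memory: dict, address: int, rm: int) -> int:
    """Model SWP Rd, Rm, [Rn]: store rm at address and return the old contents."""
    loaded = memory[address]
    memory[address] = rm
    return loaded


def try_lock(memory: dict, lock_address: int) -> bool:
    """Take the lock by swapping in 1; it was free only if the old value was 0."""
    return swp(memory, lock_address, 1) == 0


memory = {0x100: 0}
print(try_lock(memory, 0x100))   # first caller gets the lock
print(try_lock(memory, 0x100))   # second caller finds it taken
```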
SWI : Software Interrupt
 The Software Interrupt instruction enters supervisor mode
in a controlled manner:
 The instruction causes the software interrupt trap to be taken,
which effects the mode change
 If the SWI vector address is suitably protected (by external
memory management hardware) from modification by the user, a
fully protected operating system may be constructed.
 The bottom 24 bits of the instruction are ignored by the
processor, and may be used to communicate information
to the supervisor code.
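A supervisor typically recovers the 24-bit field by reading the SWI instruction word itself. A Python sketch of that extraction, with `swi_number` as a hypothetical helper and the example encodings assuming the "always" condition:

```python
def swi_number(instruction: int) -> int:
    """Return the bottom 24 bits of an ARM SWI instruction word."""
    if ((instruction >> 24) & 0xF) != 0xF:     # bits 27 to 24 are 1111 for SWI
        raise ValueError("not a SWI instruction")
    return instruction & 0x00FFFFFF


print(hex(swi_number(0xEF000011)))   # SWI 0x11
```

The processor ignores this field, so its meaning is whatever convention the supervisor code defines.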
THUMB Instruction Set
Summary
How Does Thumb Work ?
 The Thumb instruction set is a subset of the ARM
instruction set, optimized for code density.
 Almost every Thumb instruction has an ARM
instruction equivalent:
 ADD Rd, #Offset8 <> ADDS Rd, Rd, #Offset8
 Inline expansion of Thumb Instruction to ARM Instruction
 Real time decompression
 Thumb instructions are not actually executed on the core
 The core needs to know whether it is reading Thumb
instructions or ARM instructions.
 Core has two execution states - ARM and Thumb
 Core does not have a mixed 16 and 32 bit instruction set.
Thumb Instruction Set Decompression
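THUMB: ADD Rd,#Constant (16 bits, bit 15 to bit 0)
 Bits 15-13: major opcode 001
 Bits 12-11: minor opcode 10
 Bits 10-8: Rd (destination & source register)
 Bits 7-0: Constant
ARM: ADDS Rd, Rd, #Constant (32 bits, bit 31 to bit 0)
 Bits 31-28: condition 1110 (always)
 Bits 27-26: 00
 Bit 25: I = 1 (immediate operand)
 Bits 24-21: opcode 0100 (op1+op2)
 Bit 20: S = 1
 Bits 19-16: 0 followed by Rd (source register)
 Bits 15-12: 0 followed by Rd (destination register)
 Bits 11-8: 0000
 Bits 7-0: Constant (zero extended)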
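The mapping from the 16-bit Thumb encoding to the 32-bit ARM encoding can be expressed as a Python sketch. It covers only this one instruction form, and `thumb_add_imm_to_arm` is a hypothetical name, not part of any ARM tool.

```python
def thumb_add_imm_to_arm(thumb: int) -> int:
    """Expand Thumb 'ADD Rd, #imm8' into the ARM encoding of 'ADDS Rd, Rd, #imm8'."""
    if (thumb >> 11) != 0b00110:        # major opcode 001, minor opcode 10
        raise ValueError("not a Thumb ADD Rd, #imm8 instruction")
    rd = (thumb >> 8) & 0x7             # 3-bit register field, r0 to r7
    imm8 = thumb & 0xFF
    return (
        (0b1110 << 28)    # condition: always
        | (1 << 25)       # I: second operand is an immediate
        | (0b0100 << 21)  # opcode: ADD
        | (1 << 20)       # S: update the condition flags
        | (rd << 16)      # first operand register, zero extended to 4 bits
        | (rd << 12)      # destination register, zero extended to 4 bits
        | imm8            # rotate field 0000, constant zero extended
    )


print(hex(thumb_add_imm_to_arm(0x3105)))   # Thumb ADD r1, #5
```

For Thumb ADD r1, #5 (0x3105) the expansion is 0xE2911005, the ARM encoding of ADDS r1, r1, #5.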
Branch Instructions
 Thumb supports four types of branch instruction:
 an unconditional branch that allows a forward or backward branch
of up to 2Kbytes
 a conditional branch to allow forward and backward branches of up
to 256 bytes
 a branch with link is supported with a pair of instructions that allow
forward and backwards branches of up to 4Mbytes
 a branch and exchange instruction branches to an address in a
register and optionally switches to ARM code execution
 List of branch instructions
 B: conditional branch
 B: unconditional branch
 BL: Branch with link
 BX: Branch and exchange instruction set
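The branch ranges quoted above follow from the width of the signed offset field and the instruction size. A Python sketch, assuming the offset widths of the Thumb encodings (11 bits unconditional, 8 bits conditional, 22 bits for the BL pair) and a hypothetical helper name:

```python
def branch_range_bytes(offset_bits: int, unit_bytes: int = 2) -> tuple[int, int]:
    """Byte range reachable with a signed offset counted in instruction units."""
    lowest = -(1 << (offset_bits - 1)) * unit_bytes
    highest = ((1 << (offset_bits - 1)) - 1) * unit_bytes
    return lowest, highest


print(branch_range_bytes(11))      # Thumb unconditional B: about +/-2 Kbytes
print(branch_range_bytes(8))       # Thumb conditional B: about +/-256 bytes
print(branch_range_bytes(22))      # Thumb BL pair: about +/-4 Mbytes
print(branch_range_bytes(24, 4))   # ARM B/BL for comparison: about +/-32 Mbytes
```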
Data Processing Instructions
 Thumb data-processing instructions are a subset of the
ARM data-processing instructions
 All Thumb data-processing instructions set the condition codes
 List of data-processing instructions
 ADC, Add with Carry
 MOV, Move
 ADD, Add
 MUL, Multiply
 AND, Logical AND
 MVN, Move NOT
 ASR, Arithmetic shift right
 NEG, Negate
 BIC, Bit clear
 ORR, Logical OR
 CMN, Compare negative
 ROR, Rotate Right
 CMP, Compare
 SBC, Subtract with Carry
 EOR, Exclusive OR
 SUB, Subtract
 LSL, Logical shift left
 TST, Test
 LSR, Logical shift right
Load and Store Register Instructions
 Thumb supports 8 types of load and store register
instructions
 List of load and store register instructions
 LDR: Load word
 LDRB: Load unsigned byte
 LDRH: Load unsigned halfword
 LDRSB: Load signed byte
 LDRSH: Load signed halfword
 STR: Store word
 STRB: Store byte
 STRH: Store halfword
Load and Store Multiple Instructions
 Thumb supports four types of load and store multiple
instructions
 Two (a load and store) are designed to support block copy
 The other two instructions (called PUSH and POP)
implement a full descending stack, and the stack pointer
is used as the base register
 List of load and store multiple instructions
 LDM: Load multiple
 POP: Pop multiple
 PUSH: Push multiple
 STM: Store multiple
ARM Instruction Set Advantages
 All instructions are 32 bits long.
 Most instructions are executed in one single cycle.
 Every instruction can be conditionally executed.
 A load/store architecture
 Data processing instructions act only on registers
- Three operand format
- Combined ALU and shifter for high speed bit manipulation
 Specific memory access instructions with powerful auto-indexing
addressing modes
 32-bit, 16-bit and 8-bit data types
 Flexible multiple register load and store instructions
Thumb Instruction Set Advantages
 All instructions are exactly 16 bits long to improve code
density over other 32-bit architectures
 The Thumb architecture still uses a 32-bit core, with:
 32-bit address space
 32-bit registers
 32-bit shifter and ALU
 32-bit memory transfer
 This gives:
 Long branch range
 Powerful arithmetic operations
 Large address space