Transcript ARM

ARM7 Microprocessor
Contents




System overview
Introduction to ARM
ARM7 Instruction Set Architecture
ARM7 Microarchitecture
[Figure: Hardware/Software System Architecture. The SW system (application, OS & middleware) sits on the HW system (microprocessor, memory system, peripherals controller). Design factors shown: code density, code execution speed, size, power consumption, throughput.]
Microprocessor

Factors in deciding processor architecture for a system
Operating environment
  - General purpose system
  - Special / limited purpose system (embedded system)

Required performance
  - Is high throughput required? (e.g. clock speed, pipeline depth)
  - Are optimized functionalities required? (e.g. communication)
  - Is power consumption control critical?

Tradeoffs
  - High performance = high power
  - Many functionalities = high power & size
Microprocessor (cont’d)

High performance, general purpose microprocessor
Processor architecture & performance: general purpose processors
  - Very high performance (e.g. throughput, clock speed, etc.)
  - Provide various functionalities (e.g. multimedia instruction set)
  - High throughput at cost of high power
Software vs. Hardware
  - Implementation overhead is in software
  - Software optimization is not critical

Examples
  - Intel Pentium class
  - AMD processors
Microprocessor (cont’d)

Medium performance, embedded processors
Processor architecture & performance: embedded processors
  - Relatively high performance
  - Provide limited but specialized functionalities (e.g. low power)
  - Architecture is decided by its main application environment
Software vs. Hardware
  - Implementation overhead is balanced between HW and SW
  - Hardware is optimized for a limited range of tasks
  - Software optimization in terms of hardware utilization is critical

Examples
  - ARMx processor
  - MIPSx processor
Microprocessor (cont’d)

Low performance, cost-effective processors
Processor architecture & performance
  - Low performance
  - Provide basic functionalities
  - Used in simple systems where cost is critical

Examples
  - 8051, 8086, 8088
  - Motorola 68k series
ARM (Advanced RISC Machines)

Strength
  - High performance
  - Low price
  - Very low power consumption
  - Good development environment
Weakness
  - Lack of DSP operations
Opportunity
  - Mobile computing trend
  - Coming of the post-PC age
Threat
  - None at present
ARM processor overview

What is the ARMx processor?
  - Designed by ARM (Advanced RISC Machines)
  - Standard 32-bit SoC processor (most widely used)
  - Balanced performance & size / power
ARM(T) Architecture
  - Supports THUMB mode (16-bit instructions)
  - Load-store architecture
  - Data processing operations only operate on register contents, not directly on memory contents
  - Powerful load & store instructions (e.g. indexing)
  - Conditional execution of all instructions (condition flags)
  - Memory-mapped I/O
  - Four-word-deep write buffer
  - Two-way set-associative, unified 8K-byte cache (one cache shared by instructions and data)
Load/store architecture

Access to memory is provided through a pair of dedicated instructions:
  - load: copy a value from memory into a register
  - store: copy a value from a register into memory
The alternative to load/store is found in CISC processors
  - They offer a variety of addressing modes, so that all instructions (for example, arithmetic instructions) can use operands that are directly in memory. Since every operation can reach memory directly, there is no need for special load and store instructions.
Eliminating these addressing modes is one of the ways that RISC processors are able to simplify the instruction set.
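The load/store idea can be illustrated with a toy model. This is only a sketch: the `memory` and `regs` containers, the addresses, and the helper names are hypothetical and do not correspond to any real ARM tool or API. It shows that an addition of two memory operands needs two loads, one register-only ADD, and one store.

```python
# Toy load/store machine (hypothetical names, not a real ARM simulator).
memory = {0x100: 7, 0x104: 5, 0x108: 0}   # word-addressed toy memory
regs = {"r0": 0, "r1": 0, "r2": 0}         # a few general purpose registers

def load(rd, addr):
    """load: copy a value from memory into a register."""
    regs[rd] = memory[addr]

def store(rs, addr):
    """store: copy a value from a register into memory."""
    memory[addr] = regs[rs]

def add(rd, rn, rm):
    """Data processing touches register contents only (32-bit wraparound)."""
    regs[rd] = (regs[rn] + regs[rm]) & 0xFFFFFFFF

# memory[0x108] = memory[0x100] + memory[0x104], written load/store style
load("r0", 0x100)
load("r1", 0x104)
add("r2", "r0", "r1")
store("r2", 0x108)
```

A CISC-style machine with memory operands could express the same work as a single instruction; the load/store version trades instruction count for a simpler instruction set.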
ARM7 Programmer’s model



Overview
Operational Modes
Exceptions
Overview

From the programmer’s point of view, the ARM can be in one of two states
  - Normal (ARM) state: executes 32-bit, word-aligned ARM instructions
  - THUMB state: operates with 16-bit, half-word-aligned THUMB instructions
Transition between these two states does not affect the processor mode or the contents of the registers
THUMB instructions are one-half the bit width of normal ARM instructions
  - Produce very high-density code
  - If the memory bus width is 16-bit or 8-bit, THUMB instructions will perform better than the normal instruction set
Overview

Memory formats
  - Memory is viewed as a linear collection of bytes numbered upwards from zero
  - Bytes 0 to 3 hold the first stored word, bytes 4 to 7 the second, and so on
  - Words in memory can be treated as being stored in either Big-Endian or Little-Endian format
  - Big-Endian format: the most significant byte of a word is stored at the lowest numbered byte and the least significant byte at the highest numbered byte (byte 0 of the memory system is therefore connected to data lines 31 through 24)
  - Little-Endian format: the lowest numbered byte in a word is considered the word’s least significant byte, and the highest numbered byte the most significant (byte 0 of the memory system is therefore connected to data lines 7 through 0)
[Figure: Little- and big-endian memory organizations]
Unaligned instruction fetches or data accesses will incur errors
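The two byte orders can be checked with Python's standard `struct` module. This is a small sketch; the example word `0x11223344` and the helper `is_aligned` are illustrative choices, not taken from the slides.

```python
import struct

word = 0x11223344

# Big-endian: most significant byte at the lowest numbered byte.
big = struct.pack(">I", word)
# Little-endian: least significant byte at the lowest numbered byte.
little = struct.pack("<I", word)

def is_aligned(addr, size):
    """True if addr is a multiple of the access size in bytes
    (4 for ARM words, 2 for THUMB half-words)."""
    return addr % size == 0

byte0_big = big[0]        # 0x11, the byte carried on data lines 31..24
byte0_little = little[0]  # 0x44, the byte carried on data lines 7..0
```

The same four bytes appear in both layouts; only their order within the word differs.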
ARM7 Operational Modes

Table of ARM7 operational modes
Mode            Abbrev.  Description
User            USR      Normal application execution environment*
Fast Interrupt  FIQ      Response-time critical interrupt
Interrupt       IRQ      General purpose interrupt
Supervisor      SVC      Protected mode for operating system
Abort           ABT      Virtual memory protection & management
Undefined       UND      Undefined instruction (reserved for coprocessor)
System          SYS      Privileged user mode for operating system
*User mode is subdivided into ARM and THUMB mode
IRQ Mode

When the nIRQ signal is asserted, the ARM changes to IRQ mode
FIQ Mode

When the nFIQ signal is asserted, the ARM enters FIQ mode
Supervisor mode

On reset or on an SWI instruction, the ARM enters Supervisor mode
Abort Mode

On a fetch of a non-existent instruction or an access to an illegal memory address, the ARM enters Abort mode
The programmer can use the BKPT instruction to enter Abort mode
System mode and undefined mode

System mode
  - It is not entered by any exception
  - Intended for use by operating system tasks which need access to system resources
  - Entered under software control
Undefined mode
  - When the ARM CPU tries to decode an illegal instruction, it enters Undefined mode
Register File Structure

The ARM processor has a total of 37 registers
General Purpose Register File (GPR)
  - 31 general-purpose registers, including a program counter
  - These registers are 32 bits wide

Program Status Register File (PSR)
  - 6 status registers
  - These registers are also 32 bits wide
Register File Structure (cont’d)

Table of ARM7 general purpose register (GPR) file
Purpose         Register   USR/SYS  ABT     UND     SVC     IRQ     FIQ
                R0 to R12  R0-R12   R0-R12  R0-R12  R0-R12  R0-R12  R0-R12
Stack Pointer   R13        R13      R13     R13     R13     R13     R13
Link Register   R14        R14      R14     R14     R14     R14     R14
PC              R15        R15      R15     R15     R15     R15     R15
ARM7 GPR (cont’d)

Visible register set
  - Registers that are visible during a specific mode
  - 16 x 32-bit registers are visible in any mode
  - Some registers are shared, some are not
Banked register
  - Registers that share the same index
  - Only one of the banked registers is visible in each mode
  - R13 (SP) and R14 (LR) are banked
  - FIQ has 5 additional banked registers
  - Register dump overhead is reduced at context switch
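The register totals quoted in this section (31 general purpose + 6 status = 37) can be reproduced by counting the banked copies. This is a sketch under one assumption that the slides do not spell out: that the five additional FIQ registers are R8 to R12, which is the standard ARM arrangement. The dictionary and variable names are hypothetical.

```python
# Registers shared by USR/SYS: R0..R15 (R15 is the program counter).
shared = [f"R{i}" for i in range(16)]

# Extra physical copies per exception mode.
# Assumption: FIQ's 5 additional banked registers are R8..R12.
banked = {
    "FIQ": ["R8", "R9", "R10", "R11", "R12", "R13", "R14"],
    "IRQ": ["R13", "R14"],
    "SVC": ["R13", "R14"],
    "ABT": ["R13", "R14"],
    "UND": ["R13", "R14"],
}

total_gpr = len(shared) + sum(len(r) for r in banked.values())  # 16 + 7 + 8
total_psr = 1 + len(banked)          # one CPSR plus one SPSR per exception mode
total_registers = total_gpr + total_psr

# In an exception mode the program sees 16 GPRs, the CPSR, and that mode's SPSR.
visible_in_exception_mode = len(shared) + 2
```

Because each exception mode has its own R13 and R14, a handler gets a private stack pointer and return address without first saving the interrupted task's copies.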
ARM7 GPR (cont’d)

Banked Register
R13: Stack Pointer
[Figure: the CPSR acts as the selector that decides which physical register (R13_USER, R13_SVC, R13_ABORT, R13_UNDEF) is seen as R13.]
ARM7 GPR (cont’d)
[Figure: register set seen in each mode (USR/SYS, ABT, UND, SVC, IRQ, FIQ): R0 to R14, R15 (PC), CPSR, and SPSR.]
In total there are 37 registers, of which 18 are visible
ARM7 GPR (cont’d)

R13, Stack Pointer
  - Used when a stack is implemented
  - Used when a context switch occurs
  - Stores the stack pointer value of tasks
R14, Link Register
  - Used when a mode change with return occurs
  - Stores the return address (current PC)
R15, Program Counter
  - Used to store the current instruction address
  - A write to R15 is equivalent to a branch instruction
Instruction Pipeline

A three-stage pipeline is used
  - Fetch, Decode, Execute
The program counter points to the instruction being fetched rather than to the instruction being executed
The Program Counter (PC) value used in an executing instruction is always two instructions ahead of the address of that instruction
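The "two instructions ahead" rule translates into a fixed offset that depends on the instruction width. The sketch below uses a hypothetical helper, `pc_read_value`, and hypothetical example addresses; it is an illustration of the rule, not part of any ARM toolchain.

```python
def pc_read_value(instr_addr, thumb=False):
    """Value an executing instruction sees when it reads R15.

    The PC is two instructions ahead of the executing instruction:
    2 x 4 bytes in ARM state, 2 x 2 bytes in THUMB state.
    """
    width = 2 if thumb else 4
    return instr_addr + 2 * width

arm_pc = pc_read_value(0x8000)                # ARM state
thumb_pc = pc_read_value(0x8000, thumb=True)  # THUMB state
```

This offset is why PC-relative address calculations and exception return addresses need an adjustment relative to the executing instruction's own address.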
[Figure: The relationship between pipeline and PC, normal ARM mode]
[Figure: The relationship between pipeline and PC, THUMB mode]
[Figure: Pipeline and return address]
Program Status Register Files (PSR)

Table of ARM7 program status register file
Register  USR/SYS  ABT   UND   SVC   IRQ   FIQ
CPSR      CPSR     CPSR  CPSR  CPSR  CPSR  CPSR
SPSR      -        SPSR  SPSR  SPSR  SPSR  SPSR
CPSR
  - Stores current processor state
  - Contains condition flags and control bits
SPSR
  - Stores processor state before entering exception mode
  - Structure is identical to CPSR
ARM7 PSR (cont’d)

ARM7 CPSR / SPSR Format
ARM7 PSR (cont’d)

Control Bits I: Interrupt Mask bits (I, F)
  - Can be set or reset in privileged mode
  - If ‘1’, IRQ or FIQ requests are ignored
Control Bits II: THUMB Bit (T)
  - Must not be altered directly by software
  - Is set or reset by H/W
  - If ‘1’, processor is running in THUMB state, else ARM state
Control Bits III: Mode Bits (M4 ~ M0)
  - Mode bits reflect current processor mode
  - Can be changed in privileged mode (results in mode change)
  - In user mode, are changed only automatically by H/W
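The control bits described above can be pulled out of a CPSR value with shifts and masks. This is a sketch with some details that the slides do not state: the T bit position (bit 5), the condition flag positions (N, Z, C, V in bits 31 to 28), and the mode bit encodings are the standard ARM values. The function name `decode_cpsr` and the sample value are hypothetical.

```python
# Standard ARM mode encodings for M4..M0 (not listed in the slides).
MODE_NAMES = {
    0b10000: "USR", 0b10001: "FIQ", 0b10010: "IRQ", 0b10011: "SVC",
    0b10111: "ABT", 0b11011: "UND", 0b11111: "SYS",
}

def decode_cpsr(cpsr):
    """Split a 32-bit CPSR/SPSR value into flags, control bits, and mode."""
    return {
        "N": (cpsr >> 31) & 1,   # condition flags
        "Z": (cpsr >> 30) & 1,
        "C": (cpsr >> 29) & 1,
        "V": (cpsr >> 28) & 1,
        "I": (cpsr >> 7) & 1,    # 1 = IRQ requests ignored
        "F": (cpsr >> 6) & 1,    # 1 = FIQ requests ignored
        "T": (cpsr >> 5) & 1,    # 1 = THUMB state, 0 = ARM state
        "mode": MODE_NAMES.get(cpsr & 0x1F, "unknown"),
    }

# Example value: Z and C set, both interrupts masked, ARM state, Supervisor mode.
state = decode_cpsr(0x600000D3)
```

Since the SPSR has the same structure as the CPSR, the same decoder applies to both.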
Exceptions

Mode changes can be made under
  - Software control
  - External interrupts
  - Exception processing
The modes other than user mode are privileged modes
  - Have full access to system resources
  - Can change mode freely
Exception modes
  - FIQ
  - IRQ
  - Supervisor mode
  - Abort: data abort and instruction prefetch abort
  - Undefined
Exception
[Figure: Task flow]
Class      Cause
Interrupt  External stimulus
Fault      Internal cause
Trap       Trap instruction
Exception (cont’d)
ARM7 (ISA v4) Exceptions
Type                   Class      Description (Cause)
Reset                  -          Power up
Undefined Instruction  FAULT      Invalid / coprocessor instruction
Prefetch Abort         FAULT      TLB miss for instruction
Data Abort             FAULT      TLB miss for data access
IRQ                    INTERRUPT  Normal interrupt
FIQ                    INTERRUPT  Fast interrupt (no context switch)
SW Interrupt           TRAP       SWI instruction
Exception (cont’d)
ARM7 (ISA v4) Exception Vectors
Exception              Address     Mode on Entry
Reset                  0x00000000  Supervisor
Undefined Instruction  0x00000004  Undefined
SW Interrupt           0x00000008  Supervisor
Prefetch Abort         0x0000000C  Abort
Data Abort             0x00000010  Abort
Reserved               0x00000014  Reserved
IRQ                    0x00000018  IRQ
FIQ                    0x0000001C  FIQ
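The vector table above maps naturally onto a lookup structure. The sketch below simply restates the table as data; the names `VECTORS` and `vector_for` are hypothetical and chosen for illustration.

```python
# Exception name -> (vector address, mode on entry), as listed in the table.
VECTORS = {
    "Reset":                 (0x00000000, "Supervisor"),
    "Undefined Instruction": (0x00000004, "Undefined"),
    "SW Interrupt":          (0x00000008, "Supervisor"),
    "Prefetch Abort":        (0x0000000C, "Abort"),
    "Data Abort":            (0x00000010, "Abort"),
    "Reserved":              (0x00000014, "Reserved"),
    "IRQ":                   (0x00000018, "IRQ"),
    "FIQ":                   (0x0000001C, "FIQ"),
}

def vector_for(exception):
    """Return the address the PC is forced to when the exception is taken."""
    return VECTORS[exception][0]

addresses = sorted(addr for addr, _ in VECTORS.values())
```

Each entry is one word (4 bytes) apart, so a vector slot holds a single ARM instruction, typically a branch to the real handler. FIQ is the last entry, so its handler can start directly at 0x1C without a branch.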
ARM Exceptions (cont’d)

On entry (automatically done by ARM)

1) Completes the current instruction (except for the reset exception)
2) Changes to the operating mode corresponding to the particular exception
3) Saves the address of the following instruction in r14 of the new mode
4) Saves the old value of the CPSR in the SPSR of the new mode
5) Disables IRQ exceptions; sets bit 7 of the CPSR
6) If it is a FIQ exception, disables further FIQs; sets bit 6 of the CPSR
7) Forces the PC to the address of the exception handler
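The entry steps above can be sketched as a small state update. This is a simplified model, not a cycle-accurate description: the CPU is a plain dictionary, the banked r14 and SPSR are keyed by mode number, the return address is passed in by the caller, and only three exception kinds are modelled. All names (`enter_exception`, `ENTRY`, `cpu`) are hypothetical; the mode encodings are the standard ARM values.

```python
I_BIT = 1 << 7       # IRQ disable (bit 7 of the CPSR)
F_BIT = 1 << 6       # FIQ disable (bit 6 of the CPSR)
MODE_MASK = 0x1F     # M4..M0

# kind -> (mode bits on entry, vector address)
ENTRY = {"SWI": (0x13, 0x08), "IRQ": (0x12, 0x18), "FIQ": (0x11, 0x1C)}

def enter_exception(cpu, kind, next_instr_addr):
    """Apply entry steps 2) to 7) to a toy CPU state."""
    mode, vector = ENTRY[kind]
    old_cpsr = cpu["cpsr"]
    cpu["lr"][mode] = next_instr_addr      # 3) return address into banked r14
    cpu["spsr"][mode] = old_cpsr           # 4) old CPSR into banked SPSR
    new_cpsr = (old_cpsr & ~MODE_MASK) | mode | I_BIT   # 2) and 5)
    if kind == "FIQ":
        new_cpsr |= F_BIT                  # 6) mask further FIQs
    cpu["cpsr"] = new_cpsr & 0xFFFFFFFF
    cpu["pc"] = vector                     # 7) jump to the handler vector

# User-mode task (CPSR = 0x10) interrupted by an IRQ.
cpu = {"cpsr": 0x10, "pc": 0x8000, "lr": {}, "spsr": {}}
enter_exception(cpu, "IRQ", 0x8004)
```

After the call, the toy CPU is in IRQ mode with IRQs masked, and the interrupted state is recoverable from the IRQ-mode r14 and SPSR.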
ARM Exceptions (cont’d)

On exit

1) Restores user registers
2) Restores the CPSR from the SPSR
3) Sets the proper return address in the PC

!! Conflict in performing steps 2) and 3)

  - If step 2) is performed prior to step 3), then since the lower bits of the CPSR determine the operating mode, restoring the CPSR makes it impossible to access the banked r14
  - If step 3) is performed prior to step 2), the exception handler loses control and the code to perform step 2) is never reached
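One way to see how the conflict is avoided is to perform steps 2) and 3) as a single indivisible operation. On ARM this is what a data processing instruction with the S bit set and the PC as destination does (for example `MOVS pc, lr` or `SUBS pc, lr, #4`): it copies the SPSR to the CPSR while loading the PC. The sketch below models that behaviour on a toy CPU dictionary; the names `return_from_exception`, `lr_adjust`, and `cpu` are hypothetical.

```python
def return_from_exception(cpu, lr_adjust=0):
    """Model of an atomic exception return.

    Both banked values are read while still in the exception mode,
    then the CPSR and the PC are updated together.
    """
    mode = cpu["cpsr"] & 0x1F
    target = cpu["lr"][mode] - lr_adjust   # banked r14 of the current mode
    saved = cpu["spsr"][mode]              # banked SPSR of the current mode
    cpu["cpsr"], cpu["pc"] = saved, target # steps 2) and 3) in one operation

# Toy state: in IRQ mode (0x12) with IRQs masked; the task was in User mode.
cpu = {"cpsr": 0x92, "pc": 0x18, "lr": {0x12: 0x8008}, "spsr": {0x12: 0x10}}
return_from_exception(cpu, lr_adjust=4)
```

The `lr_adjust` parameter stands in for the subtraction some return instructions apply to r14; the value 4 here is only an example for this toy state.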
ARM7 Instruction Set Overview





A load-store architecture
Auto-increment/decrement addressing
Load/store multiple
64-bit multiplication / MAC operations
Conditional execution (not a strictly RISC feature)
ARM Instruction Set Format
Condition Code
ARM7 Core
[Figure: block diagram with the ARM7 core, wrapper, arbiter, debugger, ICache, DCache, and SRAM]
ARM7 Datapath

  - Pipeline Model
  - Datapath Overview
  - Clock Scheme
  - IF Stage: Address MUX. & Incremental Block
  - ID Stage: Register File
  - EXE Stage
      - Overview of EXE Stage
      - ALU
      - Multiplier
Datapath - Microprocessor
[Figure: ARM7 datapath. Blocks: address register (A[31:0]), PC incrementer, register bank (general purpose registers and PC), multiply register, barrel shifter, ALU, data out register, data in register (D[31:0]), and instruction decode & control logic, connected by the A bus, B bus, and ALU bus. Labels include the control logic and the process unit enable signal.]
ARM7 Pipeline Model
ARM7: standard 3-stage pipelined architecture

FETCH: Fetch Instruction
  - Select/Increment PC
  - Read next instruction
  - Related blocks: Address Selector, Address Incrementer, Address Register

DECODE: Decode Instruction
  - Generate control signals
  - Generate immediate
  - Read from register file
  - Related blocks: Control Logic (Decoder), Register File

EXECUTE: Execute Instruction
  - Arithmetic / Logic
  - Calculate branch address
  - Load / Store
  - Related blocks: Shifter, Multiplier, ALU

*Register write back (WB) is hidden
ARM7 Pipeline Model (cont’d)

Normal Instruction Flow
[Figure: instructions 1 to 3 against time; each passes through fetch, decode, and execute, starting one cycle after the previous instruction.]

Stalls Needed for Longer Instructions
[Figure: instructions 1 to 5 (ADD, STR, ADD, ADD, ADD) against time. The STR spends two cycles in execute (calc. addr., data xfer), so the instructions behind it are held for one cycle.]
Data Hazard on r1:
[Figure: pipeline diagram over time (clock cycles) with stages IF, ID/RF, EX, MEM, WB, drawn with instruction memory (Im), register file (Reg), ALU, and data memory (Dm). Instructions in order:]
add r1,r2,r3
sub r4,r1,r3
and r6,r1,r7
or r8,r1,r9
xor r10,r1,r11
ARM7 Pipeline Model (cont’d)

CISC Behavior of ARM7
  - Many ARM instructions are complex
  - An instruction consists of 1 or more microcode steps
  - Execution time is not equally distributed among instructions

ARM7 pipeline is unbalanced
  - The execute stage of ARM7 is the bottleneck
  - Shift, ALU, and ICACHE access are done in a single stage
  - ARM9 expanded EXE to EXE-MEM (thus IF-ID-EXE-MEM-WB)

Instructions that take more than 1 execute cycle
  - All multiply instructions (due to complexity)
  - All instructions that read 3 register values
  - All LOAD/STORE instructions
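The effect of multi-cycle execute stages on throughput can be estimated with a very simple model. This is an idealized sketch under stated assumptions: a three-stage pipeline needs two cycles to fill, after which each instruction occupies the execute stage for its own number of cycles; memory wait states and branches are ignored. The cycle counts in `EXECUTE_CYCLES` are illustrative (ADD as a single-cycle instruction, STR as two cycles for address calculation and data transfer), and the names are hypothetical.

```python
# Illustrative execute-stage cycle counts (assumed values for this sketch).
EXECUTE_CYCLES = {"ADD": 1, "STR": 2}

PIPELINE_FILL = 2  # fetch + decode of the first instruction

def total_cycles(program):
    """Cycles to run a straight-line program on the idealized 3-stage pipeline."""
    return PIPELINE_FILL + sum(EXECUTE_CYCLES[op] for op in program)

all_adds = total_cycles(["ADD"] * 5)
with_store = total_cycles(["ADD", "STR", "ADD", "ADD", "ADD"])
stall_cycles = with_store - all_adds
```

In this model every extra execute cycle delays all later instructions by one cycle, which is why the execute stage is described as the bottleneck.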
ARM7 Datapath Overview
[Figure: ARM7 datapath (address register with A[31:0], PC incrementer, register bank, instruction decode & control, multiply register, barrel shifter, ALU, data out register, data in register with D[31:0]; A bus, B bus, ALU bus) annotated with the stage labels FETCH, DECODE, EXECUTE, and (WB).]
*Pipeline registers are omitted
ARM7 Clock Scheme

[Figure: ARM7 clock phases; one clock cycle consists of phase 1 followed by phase 2]

ARM7 generates two non-overlapping internal clock phases
Some data blocks operate during phase 1 or phase 2 only
  - E.g. Shifter (phase 1), ALU (phase 2)
ARM7 IF Stage

Instruction Fetch Stage Diagram (example)
[Figure: ARM7 datapath with the instruction fetch blocks marked. Labels: Address Mux. + Reg., To ICache, Exception, ALU, Next instruction address, +2/4 Increment, Incrementer bus.]
*The PC value stored in R15 should always be the EXE-stage address + 8
ARM7 ID Stage

Instruction Decode Stage Diagram (example)
[Figure: ARM7 datapath with the register file (PSR and GPR) marked. Control inputs derived from the OP code: PSR read, PSR write, GPR read1, GPR read2, GPR write. Outputs: PSR out, read1 data, and read2 data, driven onto data bus A and data bus B.]
Read port: sampled at start of phase 1
Write port: sampled at start of phase 2
*PC port is omitted for simplicity
ARM7 Execution Stage

Execute Stage Diagram (example)
[Figure: ARM7 datapath (address register with A[31:0], PC incrementer, register bank, instruction decode & control, multiply register, barrel shifter, ALU, data out register, data in register with D[31:0]; A bus, B bus, ALU bus) as used in the execute stage.]