SHARC programming model
Embedded System HW
Why use microprocessors?
Alternatives: field-programmable gate arrays (FPGAs), custom logic, etc.
Microprocessors are often very efficient: they can use the same logic to perform many different functions.
Microprocessors simplify the design of families of products.
The performance paradox
Microprocessors use much more logic to implement a function than does custom logic.
But microprocessors are often at least as fast: heavily pipelined; large design teams; aggressive VLSI technology.
Power
Custom logic is a clear winner for low-power devices.
Modern microprocessors offer features to help control power consumption.
Software design techniques can help reduce power consumption.
Microprocessor varieties
Microcontroller: includes I/O devices and on-board memory.
Digital signal processor (DSP): microprocessor optimized for digital signal processing.
Typical embedded word sizes: 8-bit, 16-bit, 32-bit.
Many Types of Programmable Processors
Past: microprocessor, microcontroller, DSP, graphics processor.
Now/future: network processor, sensor processor, cryptoprocessor, game processor, wearable processor, mobile processor.
Application-Specific Instruction Processors (ASIPs)
Processors with instruction sets tailored to specific applications or application domains; instruction-set generation is part of synthesis.
Pluses: customization yields lower area, power, etc.
Minuses: higher hardware and software development overhead (design, compilers, debuggers); longer time to market.
Reconfigurable SoC
Other Examples
Atmel’s FPSLIC (AVR + FPGA); Altera’s Nios (configurable RISC on a PLD); Triscend’s A7 CSoC.
Instruction Sets
von Neumann architecture
Memory holds data, instructions.
Central processing unit (CPU) fetches instructions from memory.
Separating the CPU from memory distinguishes a programmable computer.
CPU registers help out: program counter (PC), instruction register (IR), general purpose registers, etc.
CPU + memory
[Figure: the CPU connects to memory through address and data lines; memory location 200 holds the instruction ADD r5,r1,r3.]
Harvard architecture
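[Figure: the CPU connects to a data memory and a separate program memory, each through its own address and data lines; the PC supplies the program-memory address.]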
von Neumann vs. Harvard
Harvard can't use self-modifying code.
Harvard allows two simultaneous memory fetches.
Most DSPs use Harvard architecture for streaming data: greater memory bandwidth; more predictable bandwidth.
RISC vs. CISC
Complex instruction set computer (CISC): many addressing modes; many operations.
Reduced instruction set computer (RISC): load/store; pipelinable instructions.
Instruction set characteristics
Fixed vs. variable length.
Addressing modes.
Number of operands.
Types of operands.
Programming model
Programming model : registers visible to the programmer.
Some registers are not visible (IR).
Multiple implementations
Successful architectures have several implementations: varying clock speeds; different bus widths; different cache sizes; etc.
ARM Architecture
Advanced RISC Machines (1990) (Acorn and Apple Computer)
ARM Architecture
ARM versions.
ARM assembly language.
ARM programming model.
ARM versions
ARM architecture has been extended over several versions.
We will concentrate on ARMv5
Evolution of the ARM architecture versions
ARMv6 improvements
Memory management; multiprocessing; multimedia support (SIMD capability).
Evolution of the ARM architecture ARM11
Introduction
Goal: allow very small, yet high-performance implementations.
RISC features: large uniform register file; load/store architecture; simple addressing modes; uniform, fixed-length instruction fields.
Additional features: auto-increment and auto-decrement addressing modes; conditional execution of all instructions.
ARM assembly language
Fairly standard assembly language:
        LDR r0,[r8]   ; a comment
label   ADD r4,r0,r1
Programming Model
ARM data types
Byte: 8 bits.
Halfword: 16 bits; must be aligned to two-byte boundaries.
Word: 32 bits; must be aligned to four-byte boundaries.
ARM addresses can be 32 bits long.
Address refers to byte.
Address 4 starts at byte 4.
Can be configured at power-up as either little- or big-endian mode.
Processor modes
User (usr): normal program execution mode.
FIQ (fiq): supports high-speed data transfer or channel processing.
IRQ (irq): used for general-purpose interrupt handling.
Supervisor (svc): a protected mode for the operating system.
Abort (abt): implements virtual memory and/or memory protection.
Undefined (und): supports software emulation of hardware coprocessors.
System (sys): runs privileged OS tasks.
fiq, irq, svc, abt, and und are the exception modes.
Registers
[Figure: registers r0 to r15, divided into unbanked and banked registers, with r14 as the link register and r15 as the PC; the 32-bit CPSR (bits 31 to 0) holds the N, Z, C, V flags in its top bits.]
Endianness
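Relationship between bit and byte/word ordering defines endianness:
little-endian: bits numbered 31 (left) down to 0 (right), bytes ordered byte 3, byte 2, byte 1, byte 0, so byte 0 (the lowest address) holds the least significant bits;
big-endian: bits numbered 0 (left) up to 31 (right), bytes ordered byte 0, byte 1, byte 2, byte 3, so byte 0 holds the most significant bits.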
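The two layouts can be illustrated with a short C sketch that writes a 32-bit word into a byte array in each order using shifts, so the result does not depend on the host's own byte order. The helper names store_word_le and store_word_be are hypothetical.

```c
#include <stdint.h>

/* Little-endian: byte 0 (lowest address) receives the least significant byte. */
void store_word_le(uint32_t word, uint8_t mem[4]) {
    for (int i = 0; i < 4; i++)
        mem[i] = (uint8_t)(word >> (8 * i));
}

/* Big-endian: byte 0 (lowest address) receives the most significant byte. */
void store_word_be(uint32_t word, uint8_t mem[4]) {
    for (int i = 0; i < 4; i++)
        mem[i] = (uint8_t)(word >> (8 * (3 - i)));
}
```

With the word 0x11223344, little-endian byte 0 holds 0x44, while big-endian byte 0 holds 0x11.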
ARM status bits
Every arithmetic, logical, or shifting operation may set the CPSR (current program status register) bits: N (negative), Z (zero), C (carry), V (overflow).
Examples: -1 + 1 = 0: NZCV = 0110.
(2^31 - 1) + 1 = -2^31: NZCV = 1001.
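To check flag values like these, here is a minimal C sketch of how an ADDS (ADD with the S suffix) would set N, Z, C, and V. The function name adds_nzcv is hypothetical, and it models only 32-bit addition, packing the flags as a 4-bit value with N in bit 3 and V in bit 0.

```c
#include <stdint.h>

unsigned adds_nzcv(uint32_t a, uint32_t b) {
    uint64_t wide = (uint64_t)a + b;        /* keep the carry out of bit 31 */
    uint32_t r = (uint32_t)wide;            /* the 32-bit result written to Rd */
    unsigned n = r >> 31;                   /* N: result is negative */
    unsigned z = (r == 0);                  /* Z: result is zero */
    unsigned c = (unsigned)(wide >> 32);    /* C: unsigned carry out */
    /* V: operands have the same sign but the result's sign differs */
    unsigned v = ((~(a ^ b) & (a ^ r)) >> 31) & 1u;
    return (n << 3) | (z << 2) | (c << 1) | v;
}
```

For example, adds_nzcv(0xFFFFFFFF, 1) gives 0x6 (0110), and adds_nzcv(0x7FFFFFFF, 1) gives 0x9 (1001): the result 0x80000000 is negative and the signed addition overflowed, but there is no unsigned carry out.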
ARM data processing – operand addressing
Instruction syntax
Condition field
Almost all ARM instructions can be conditionally executed.
ARM data processing – operand addressing
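Data processing, immediate shift: cond [31:28] | 000 [27:25] | opcode [24:21] | S [20] | Rn [19:16] | Rd [15:12] | shift amount [11:7] | shift [6:5] | 0 [4] | Rm [3:0]
Data processing, register shift: cond [31:28] | 000 [27:25] | opcode [24:21] | S [20] | Rn [19:16] | Rd [15:12] | Rs [11:8] | 0 [7] | shift [6:5] | 1 [4] | Rm [3:0]
Data processing, 32-bit immediate: cond [31:28] | 001 [27:25] | opcode [24:21] | S [20] | Rn [19:16] | Rd [15:12] | rotate [11:8] | immediate-8 [7:0]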
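A minimal C sketch that pulls these fields out of a 32-bit instruction word may make the layout concrete. The struct and function names are hypothetical; the sketch only splits bits and does not interpret the shifter operand in bits 11-0.

```c
#include <stdint.h>

typedef struct {
    unsigned cond, imm_flag, opcode, s, rn, rd, operand2;
} DpFields;

DpFields decode_dp(uint32_t insn) {
    DpFields f;
    f.cond     = (insn >> 28) & 0xF;   /* bits 31-28: condition, 1110 = always */
    f.imm_flag = (insn >> 25) & 0x1;   /* bit 25: 1 = 32-bit immediate form */
    f.opcode   = (insn >> 21) & 0xF;   /* bits 24-21: e.g. 0100 = ADD */
    f.s        = (insn >> 20) & 0x1;   /* bit 20: update CPSR flags */
    f.rn       = (insn >> 16) & 0xF;   /* bits 19-16: first operand register */
    f.rd       = (insn >> 12) & 0xF;   /* bits 15-12: destination register */
    f.operand2 = insn & 0xFFF;         /* bits 11-0: shifter operand */
    return f;
}
```

For instance, 0xE0835002 decodes as ADD r5, r3, r2 (register form, no shift), and 0xE2899001 as ADD r9, r9, #1 (immediate form, rotate 0).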
Shifter operand
Immediate: an 8-bit constant rotated right by an even amount (0, 2, 4, 6, ..., 30), encoded as a 4-bit rotate field:
        mov r0, #0
        add r9, r9, #1
Register operand:
        mov r2, r0
Shifted register operand: ASR, LSL, LSR, ROR, or RRX (RRX shifts by one bit only):
        mov r2, r0, LSL #2       ; shift r0 left by 2, write to r2 (r2 = r0 x 4)
        sub r10, r9, r8, LSR #4  ; r10 = r9 - r8/16 (unsigned)
        sub r10, r9, r8, ROR r3  ; r10 = r9 - (r8 rotated right by the value of r3)
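Because an immediate is an 8-bit value rotated right by twice the 4-bit rotate field, not every 32-bit constant fits. A minimal C sketch of the check an assembler has to make (arm_encode_imm is a hypothetical name) simply tries all 16 rotations:

```c
#include <stdbool.h>
#include <stdint.h>

static uint32_t rotl32(uint32_t x, unsigned n) {
    n &= 31;
    return n ? (x << n) | (x >> (32 - n)) : x;   /* avoid the undefined shift by 32 */
}

/* Try to write value as imm8 rotated right by 2*rot, with rot the 4-bit field. */
bool arm_encode_imm(uint32_t value, unsigned *rot, unsigned *imm8) {
    for (unsigned r = 0; r < 16; r++) {
        uint32_t candidate = rotl32(value, 2 * r);  /* undo a rotate right by 2*r */
        if (candidate <= 0xFF) {
            *rot = r;
            *imm8 = candidate;
            return true;
        }
    }
    return false;   /* needs a literal-pool load or several instructions */
}
```

Under this scheme 0xFF000000 is encodable (0xFF rotated right by 8, rotate field 4), while 0x101 is not, because its set bits span more than 8 bit positions.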
ARM data-processing
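AND, EOR (exclusive OR)
SUB: Rd := Rn - shifter operand
RSB (reverse subtract): Rd := shifter operand - Rn
ADD, ADC (add with carry), SBC (subtract with carry), RSC (reverse subtract with carry)
TST: update flags after Rn AND shifter operand
TEQ (test equivalence), CMP (compare), CMN (compare negated)
ORR (logical OR), MOV, BIC (bit clear), MVN (move not)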
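A minimal C sketch of the value each of these instructions writes to Rd, given Rn and an already-shifted shifter operand, may help keep SUB and RSB (and BIC and MVN) apart. The enum and function names are hypothetical; carry-using variants (ADC, SBC, RSC) are left out, and TST, TEQ, CMP, and CMN are omitted because they compute AND, EOR, SUB, and ADD respectively but only set flags.

```c
#include <stdint.h>

typedef enum { OP_AND, OP_EOR, OP_SUB, OP_RSB, OP_ADD, OP_ORR, OP_MOV, OP_BIC, OP_MVN } DpOp;

uint32_t dp_result(DpOp op, uint32_t rn, uint32_t op2) {
    switch (op) {
    case OP_AND: return rn & op2;
    case OP_EOR: return rn ^ op2;
    case OP_SUB: return rn - op2;    /* Rd := Rn - shifter operand */
    case OP_RSB: return op2 - rn;    /* Rd := shifter operand - Rn */
    case OP_ADD: return rn + op2;
    case OP_ORR: return rn | op2;
    case OP_MOV: return op2;         /* Rn is not used */
    case OP_BIC: return rn & ~op2;   /* clear the bits set in op2 */
    case OP_MVN: return ~op2;        /* move the bitwise complement */
    }
    return 0;
}
```

RSB is useful with an immediate, e.g. RSB r0, r1, #0 computes -r1, which SUB cannot do directly because only the second operand may be an immediate.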
ARM data-processing
Shift, rotate? There are no separate shift instructions; shifts and rotates are applied to the shifter operand:
LSL, LSR: logical shift left/right
ASR: arithmetic shift right
ROR: rotate right
RRX: rotate right extended, through the C flag
Data operation varieties
Logical shift: fills with zeroes.
Arithmetic shift: fills with sign extension.
RRX performs a 33-bit rotate, including the C bit from the CPSR above the sign bit.
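The fill rules above can be sketched in C on unsigned 32-bit values. The function names are hypothetical, shift amounts are assumed to be 1 to 31, and ASR fills the sign explicitly because right-shifting a negative signed value in C is implementation-defined.

```c
#include <stdint.h>

uint32_t arm_lsl(uint32_t x, unsigned n) { return x << n; }   /* zeros enter from the right */
uint32_t arm_lsr(uint32_t x, unsigned n) { return x >> n; }   /* zeros enter from the left */

uint32_t arm_asr(uint32_t x, unsigned n) {
    uint32_t r = x >> n;
    if (x & 0x80000000u)                /* negative: copy the sign bit into vacated bits */
        r |= ~(0xFFFFFFFFu >> n);
    return r;
}

uint32_t arm_ror(uint32_t x, unsigned n) { return (x >> n) | (x << (32 - n)); }

/* RRX: 33-bit rotate right by one; the old C enters bit 31, bit 0 becomes the new C. */
uint32_t arm_rrx(uint32_t x, unsigned carry_in, unsigned *carry_out) {
    *carry_out = x & 1u;
    return (x >> 1) | ((uint32_t)(carry_in & 1u) << 31);
}
```

For example, arm_asr(0xFFFFFFF0, 2) gives 0xFFFFFFFC (-16 / 4 = -4), while arm_lsr on the same input gives 0x3FFFFFFC.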
Load and Store instructions
Two types:
32-bit word or 8-bit unsigned byte;
halfword load/store and signed-byte load.
Addressing modes:
base register: any general-purpose register (including the PC);
offset: three formats.
Addressing modes
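Offset:
immediate: an unsigned number (12 bits for word and unsigned-byte transfers, 8 bits for halfword and signed-byte transfers);
register: a general-purpose register (not the PC);
scaled register: a register shifted by an immediate value (LSL, LSR, ASR, ROR, RRX).
Three ways to form the memory address, where EA := base register ± offset:
offset: the EA is used; the base register is unchanged;
pre-indexed: the EA is used and then written back to the base register;
post-indexed: the base register itself is the address; base ± offset is then written back.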
Addressing modes
Base-plus-offset addressing:
        LDR r0,[r1,#16]    ; loads r0 from location r1+16
Pre-indexing also updates the base register:
        LDR r0,[r1,#16]!   ; loads r0 from r1+16, then sets r1 = r1+16
Post-indexing fetches first, then applies the offset:
        LDR r0,[r1],#16    ; loads r0 from r1, then adds 16 to r1
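The difference between the three forms is only which address is used and whether the base is written back. A minimal C model, assuming a hypothetical byte-array memory read as little-endian and a 16-entry register array, with no bounds checking:

```c
#include <stdint.h>

typedef enum { MODE_OFFSET, MODE_PRE_INDEXED, MODE_POST_INDEXED } AddrMode;

/* Read a little-endian 32-bit word; the caller keeps ea within the array. */
static uint32_t read_word_le(const uint8_t *mem, uint32_t ea) {
    return (uint32_t)mem[ea] | ((uint32_t)mem[ea + 1] << 8) |
           ((uint32_t)mem[ea + 2] << 16) | ((uint32_t)mem[ea + 3] << 24);
}

/* Model of LDR rd with base register rn and an immediate offset (may be negative). */
void ldr_word(uint32_t regs[16], const uint8_t *mem, unsigned rd, unsigned rn,
              int32_t offset, AddrMode mode) {
    uint32_t base = regs[rn];
    uint32_t updated = base + (uint32_t)offset;                  /* base +/- offset */
    uint32_t ea = (mode == MODE_POST_INDEXED) ? base : updated;  /* post: use base as-is */
    uint32_t value = read_word_le(mem, ea);
    if (mode != MODE_OFFSET)
        regs[rn] = updated;   /* "!" (pre-indexed) or post-indexed: write back */
    regs[rd] = value;
}
```

Calling it with r1 = 0 and offset 16 mirrors the three LDR lines above: the offset form leaves r1 at 0, pre-indexing sets r1 to 16, and post-indexing loads from the old r1 and then adds 16.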
Load and store
LDR, LDRB, LDRH, LDRSB (signed byte), LDRSH (signed halfword), STR, STRB, STRH
Examples:
        LDR  R1, [R0]               ; load R1 from the address in R0
        LDR  R8, [R3, #4]           ; EA = [R3] + 4
        LDR  R8, [R3, #-4]          ; EA = [R3] - 4
        STRB R10, [R7, -R4]         ; EA = [R7] - [R4]
        LDR  R11, [R3, R5, LSL #2]  ; EA = [R3] + ([R5] x 4)
        LDR  R3, [R9], #4           ; EA = [R9], then R9 = [R9] + 4 (post-indexed)
        LDR  R1, [R0, #2]!          ; EA = [R0] + 2, then R0 = [R0] + 2 (pre-indexed)
        LDR  R0, [PC, #40]          ; load R0 from PC + 40 (= address of the instruction + 8 + 40)
Load and store multiple
Addressing modes:
IA: increment after
IB: increment before
DA: decrement after
DB: decrement before
Load and store multiple
LDM, STM
Examples:
        LDMIA r0, {r5-r8}                ; load multiple r5-r8 from the address in r0
        STMDA r1!, {r2, r5, r7-r9, r11}  ; store multiple, decrement after; update r1
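The four modes differ only in where the block of words starts and which way the base moves; the lowest-numbered register always goes to the lowest address. A minimal C sketch of the address calculation (names hypothetical) for n registers:

```c
#include <stdint.h>

typedef enum { LDM_IA, LDM_IB, LDM_DA, LDM_DB } LdmMode;

/* Fills addr[0..n-1] (lowest register first) and returns the written-back base. */
uint32_t ldm_addresses(uint32_t base, unsigned n, LdmMode mode, uint32_t addr[]) {
    uint32_t start;
    switch (mode) {
    case LDM_IA: start = base;             break;  /* first word at base */
    case LDM_IB: start = base + 4;         break;  /* increment before first access */
    case LDM_DA: start = base - 4 * n + 4; break;  /* last word at base */
    default:     start = base - 4 * n;     break;  /* LDM_DB: last word at base - 4 */
    }
    for (unsigned i = 0; i < n; i++)
        addr[i] = start + 4 * i;
    return (mode == LDM_IA || mode == LDM_IB) ? base + 4 * n : base - 4 * n;
}
```

For the STMDA example with six registers, this places r2 at r1 - 20 and r11 at r1, and the "!" leaves r1 decreased by 24.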
Branch instructions
B, BL: conditional branch forwards or backwards, up to approximately ±32 MB.
The target address is formed by:
sign-extending the 24-bit imm_data to 32 bits;
shifting the result left two bits;
adding this to the PC (the address of the branch + 8).
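The three steps translate directly into C. In this sketch (branch_target is a hypothetical name) the sign extension is done with a mask rather than a signed shift, and unsigned wraparound performs the signed addition:

```c
#include <stdint.h>

/* Target of a B/BL at address branch_addr, given its 24-bit immediate field. */
uint32_t branch_target(uint32_t branch_addr, uint32_t imm24) {
    uint32_t ext = imm24 & 0x00FFFFFFu;
    if (ext & 0x00800000u)          /* bit 23 set: negative offset */
        ext |= 0xFF000000u;         /* sign-extend to 32 bits */
    return branch_addr + 8 + (ext << 2);   /* PC reads as branch address + 8 */
}
```

For instance, the word 0xEAFFFFFE (B with imm24 = 0xFFFFFE) branches to itself: 0xFFFFFE sign-extends to -2, times 4 gives -8, which cancels the +8.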
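Examples:
        B    label       ; unconditional branch
        BCC  label       ; branch if carry flag is clear
        BEQ  label       ; branch if zero flag is set
        MOV  PC, #0      ; branch to location zero
        BL   func        ; subroutine call
        MOV  PC, LR      ; return
        MOV  LR, PC      ; save return address (PC reads as this instruction + 8)
        LDR  PC, =func   ; with the MOV above: call func anywhere in the address space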
ARM ADR pseudo-op
Cannot refer to an address directly in an instruction.
Generate value by performing arithmetic on PC.
ADR pseudo-op generates instruction required to calculate address: ADR r1,FOO
Example:
start   MOV r0, #10
        ADR r4, start     ; => SUB r4, pc, #0xc
The PC reads as the address of the ADR plus 8, and start is 4 bytes before the ADR, so start = pc - 4 - 8 = pc - 12 = pc - 0xc.
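The assembler's arithmetic for ADR can be sketched in C: it picks ADD or SUB from the PC and computes the distance to the label. The function name is hypothetical, and a real assembler must also confirm the distance is an encodable rotated immediate.

```c
#include <stdbool.h>
#include <stdint.h>

/* For "ADR rd, label" at adr_addr: returns the immediate and sets *use_sub
   to choose between ADD rd, pc, #imm and SUB rd, pc, #imm. */
uint32_t adr_offset(uint32_t adr_addr, uint32_t label_addr, bool *use_sub) {
    uint32_t pc = adr_addr + 8;          /* PC reads 8 bytes ahead */
    *use_sub = label_addr < pc;          /* label behind the PC: subtract */
    return *use_sub ? pc - label_addr : label_addr - pc;
}
```

With the MOV at 0x1000 and the ADR at 0x1004, this yields SUB with 0xC, matching the example.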
Example: C assignments
C: x = (a + b) - c;
Assembler:
        ADR r4,a        ; get address for a
        LDR r0,[r4]     ; get value of a
        ADR r4,b        ; get address for b, reusing r4
        LDR r1,[r4]     ; get value of b
        ADD r3,r0,r1    ; compute a+b
        ADR r4,c        ; get address for c
        LDR r2,[r4]     ; get value of c
C assignment, cont'd.
        SUB r3,r3,r2    ; complete computation of x
        ADR r4,x        ; get address for x
        STR r3,[r4]     ; store value of x
Example: C assignment
C: y = a*(b+c);
Assembler:
        ADR r4,b        ; get address for b
        LDR r0,[r4]     ; get value of b
        ADR r4,c        ; get address for c
        LDR r1,[r4]     ; get value of c
        ADD r2,r0,r1    ; compute partial result
        ADR r4,a        ; get address for a
        LDR r0,[r4]     ; get value of a
C assignment, cont'd.
        MUL r2,r0,r2    ; compute final value for y (before ARMv6, Rd must differ from Rm)
        ADR r4,y        ; get address for y
        STR r2,[r4]     ; store y
Example: C assignment
C: z = (a << 2) | (b & 15);
Assembler:
        ADR r4,a          ; get address for a
        LDR r0,[r4]       ; get value of a
        MOV r0,r0,LSL #2  ; perform shift
        ADR r4,b          ; get address for b
        LDR r1,[r4]       ; get value of b
        AND r1,r1,#15     ; perform AND
        ORR r1,r0,r1      ; perform OR
C assignment, cont'd.
        ADR r4,z          ; get address for z
        STR r1,[r4]       ; store value for z
Example: if statement
C:
if (a < b) { x = 5; y = c + d; }
else x = c - d;
Assembler:
        ; compute and test condition
        ADR r4,a        ; get address for a
        LDR r0,[r4]     ; get value of a
        ADR r4,b        ; get address for b
        LDR r1,[r4]     ; get value for b
        CMP r0,r1       ; compare a < b
        BGE fblock      ; if a >= b, branch to false block
If statement, cont'd.
        ; true block
        MOV r0,#5       ; generate value for x
        ADR r4,x        ; get address for x
        STR r0,[r4]     ; store x
        ADR r4,c        ; get address for c
        LDR r0,[r4]     ; get value of c
        ADR r4,d        ; get address for d
        LDR r1,[r4]     ; get value of d
        ADD r0,r0,r1    ; compute y
        ADR r4,y        ; get address for y
        STR r0,[r4]     ; store y
        B after         ; branch around false block
If statement, cont'd.
        ; false block
fblock  ADR r4,c        ; get address for c
        LDR r0,[r4]     ; get value of c
        ADR r4,d        ; get address for d
        LDR r1,[r4]     ; get value for d
        SUB r0,r0,r1    ; compute c-d
        ADR r4,x        ; get address for x
        STR r0,[r4]     ; store value of x
after   ...
Example: Conditional instruction implementation
        ; true block
        MOVLT r0,#5     ; generate value for x
        ADRLT r4,x      ; get address for x
        STRLT r0,[r4]   ; store x
        ADRLT r4,c      ; get address for c
        LDRLT r0,[r4]   ; get value of c
        ADRLT r4,d      ; get address for d
        LDRLT r1,[r4]   ; get value of d
        ADDLT r0,r0,r1  ; compute y
        ADRLT r4,y      ; get address for y
        STRLT r0,[r4]   ; store y
Conditional instruction implementation, cont'd.
        ; false block
        ADRGE r4,c      ; get address for c
        LDRGE r0,[r4]   ; get value of c
        ADRGE r4,d      ; get address for d
        LDRGE r1,[r4]   ; get value for d
        SUBGE r0,r0,r1  ; compute c-d
        ADRGE r4,x      ; get address for x
        STRGE r0,[r4]   ; store value of x
Example: FIR filter
C: for (i=0, f=0; i<N; i++) f = f + c[i]*x[i];
FIR filter, cont'd.
        ADR r3,c          ; load r3 with base of c
        ADR r5,x          ; load r5 with base of x
        ; loop body
loop    LDR r4,[r3,r8]    ; get c[i]
        LDR r6,[r5,r8]    ; get x[i]
        MUL r4,r6,r4      ; compute c[i]*x[i] (before ARMv6, Rd must differ from Rm)
        ADD r2,r2,r4      ; add into running sum
        ADD r8,r8,#4      ; add one word offset to array index
        ADD r0,r0,#1      ; add 1 to i
        CMP r0,r1         ; exit?
        BLT loop          ; if i < N, continue
Nested subroutine calls
Nesting/recursion requires a coding convention (here, a full-descending stack in r13):
f1      LDR r0,[r13]        ; load arg into r0 from stack
        ; call f2()
        STR r14,[r13,#-4]!  ; push f1's return address
        STR r0,[r13,#-4]!   ; push arg to f2 on stack
        BL f2               ; branch and link to f2
        ; return from f1()
        ADD r13,r13,#4      ; pop f2's arg off stack
        LDR r15,[r13],#4    ; pop return address into PC and return
Summary
Load/store architecture.
Most instructions are RISCy and operate in a single cycle.
Some multi-register operations take longer.
All instructions can be executed conditionally.
MPC850 Integrated Communication Microprocessor
Reference Manuals
MPC850 Family User Manual
PowerPC Programming Environment Manual
Course Home Page: http://calab.kaist.ac.kr/~maeng/cs310/micro02.htm
Motorola Home Page: http://e-www.motorola.com
Overview
Versatile, one-chip, integrated communication processor: embedded PowerPC core; versatile memory controller; communication processor module (CPM); serial communication controllers (SCCs); one USB; etc.
Embedded PowerPC core
Single-issue, 32-bit version.
Branch folding and prediction.
2-Kbyte I-cache and 1-Kbyte D-cache, 2-way set-associative.
Physical MMUs with 8-entry TLBs; 4K, 16K, 256K, 512K, and 8M page sizes.
Other Features
Dynamic data bus sizing: 8-, 16-, 32-bit.
CPU clock: 0-80 MHz.
System Integration Unit (SIU); memory controller; general-purpose timer; CPM, SCCs, SMCs, etc.
PowerPC Architecture
PowerPC instruction set overview: operand conventions; PowerPC registers and programming model; addressing modes; instruction set; cache model; exception model; memory management model.
PowerPC Architecture
Motorola, IBM, Apple Computer.
POWER architecture: RS/6000 family.
64-bit architecture with a 32-bit subset.
Three levels of the architecture, for flexibility (degrees of software compatibility):
UISA (user instruction set architecture)
VEA (virtual environment architecture)
OEA (operating environment architecture)
Features not defined by the PowerPC Architecture, for flexibility: system bus interface signals; cache design; the number and nature of execution units; other internal microarchitecture issues.
Endianness
Little-endian (byte 0 holds the least significant bits): ARM, Intel.
Big-endian (byte 0 holds the most significant bits): PowerPC, IBM, Motorola.
Programming Model – Registers
PowerPC programming model: register set
User model, UISA (32-bit architecture):
general-purpose registers GPR0-GPR31 (32 bits each)
floating-point registers FPR0-FPR31 (64 bits each)
condition register CR (32 bits)
FP status and control register FPSCR (32 bits)
XER register XER (32 bits)
link register LR (64/32 bits)
count register CTR (64/32 bits)
Condition Register (CR)
Used for testing and branching. Bits 0-31 form eight 4-bit fields, CR0-CR7; CR0 reflects integer instructions and CR1 floating-point instructions.
CR0 field bits: bit 0 negative (LT); bit 1 positive (GT); bit 2 zero (EQ); bit 3 summary overflow (SO).
Condition register CRn field: compare instruction
XER Register (XER)
XER Register (XER), cont'd
Link Register (LR), Count Register (CTR)
bclrx (branch conditional to link register)
Branch with link update
Count register: loop count
VEA Register Set – Time Base
OEA Register Set
Machine State Register (MSR)
Addressing Modes
Effective address calculation:
register indirect with immediate index mode
register indirect with index mode
register indirect mode
Register Indirect with Immediate Index Addressing
Register Indirect with Index
Register Indirect
Instruction Formats
4 bytes long and word-aligned.
Bits 0-5 always specify the primary opcode; some instructions also have an extended opcode.
Instruction set
Integer; floating-point; load and store; flow control; processor control; memory synchronization; memory control; external control.
Integer Instructions
Arithmetic, compare, logical, rotate, and shift.
Integer arithmetic, shift, rotate, and string move instructions may update or read values from the XER.
The CR may be updated if the Rc bit is set, e.g. addic (CR unchanged) vs. addic. (updates CR0).
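How CR0 is filled can be sketched in C: the result is compared with zero as a signed value, and SO is copied from XER[SO]. The function name is hypothetical; the 4-bit field is packed with LT (IBM bit 0) as its most significant bit.

```c
#include <stdbool.h>
#include <stdint.h>

/* CR0 after an integer instruction with Rc = 1, e.g. addic. */
unsigned cr0_field(int32_t result, bool xer_so) {
    unsigned lt = result < 0;    /* bit 0: negative */
    unsigned gt = result > 0;    /* bit 1: positive */
    unsigned eq = result == 0;   /* bit 2: zero */
    return (lt << 3) | (gt << 2) | (eq << 1) | (xer_so ? 1u : 0u);  /* bit 3: SO */
}
```

Exactly one of LT, GT, and EQ is set for any result; SO only records whether an overflow has occurred since XER[SO] was last cleared.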
Integer Compare
Algebraic (signed) and logical (unsigned) compares.
crfD can be omitted if the result is to be placed in CR0.
crfD field: the target CR field.
The L bit has no effect on 32-bit operations.
Integer compare, cont'd
Integer Logical
Integer Logical, cont'd
Rotate and Shift Instructions
SH: specifies the number of bits to rotate.
MB: mask start.
ME: mask stop.
Integer Rotate
Integer Shift
Load and Store
Integer load and store
Integer load and store with byte reverse
Integer load and store multiple
FP load and store
Memory synchronization
Branch and Flow Control
EA calculation: branch relative; branch conditional to relative address; branch to absolute address; branch conditional to absolute address; branch conditional to link register; branch conditional to count register.
Branch Relative
Branch Conditional to Relative
Branch to Absolute
Branch Conditional to Absolute
Branch Conditional to LR
Branch Conditional to Count Register
Conditional Branch Control
Branch Instructions
CR Logical Instructions
Trap, System Linkage
Processor Control
Memory Synchronization Example: Test and Set
loop:   lwarx  r5,0,r3   # load and reserve
        cmpwi  r5,0      # done if word
        bne    $+12      # not equal to 0
        stwcx. r4,0,r3   # try to store non-zero
        bne    loop      # loop if lost reservation
Summary
UISA, VEA, OEA.
Register set.
Fixed-size instructions: RISC.
Load and store architecture.
3 addressing modes.
Condition register update: Rc field.
8 condition register fields.
Branch addressing modes: BO, BI fields; relative, absolute, LR, CTR.
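As a portable analogue of the test-and-set loop, here is a C11 sketch using <stdatomic.h>. It is a swapped-in technique for illustration, not the manual's code: atomic_compare_exchange_weak plays the role of the lwarx/stwcx. pair and, like a lost reservation, may fail spuriously, so it is retried. The function name test_and_set is hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

/* If *word is 0, atomically set it to newval. Returns the value seen (0 = we set it). */
uint32_t test_and_set(_Atomic uint32_t *word, uint32_t newval) {
    uint32_t old = atomic_load(word);             /* like lwarx: read the current value */
    while (old == 0) {                            /* like cmpwi/bne: non-zero means done */
        /* like stwcx.: store only if the word is still 0; may fail spuriously */
        if (atomic_compare_exchange_weak(word, &old, newval))
            return 0;
        /* on failure, old now holds the current value; loop re-checks it */
    }
    return old;
}
```

A first call on a zero word returns 0 and stores newval; later calls return that non-zero value and leave the word unchanged.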