SHARC programming model


Embedded System HW

Why use microprocessors?

 Alternatives: field-programmable gate arrays (FPGAs), custom logic, etc.

 Microprocessors are often very efficient: can use same logic to perform many different functions.

 Microprocessors simplify the design of families of products.

The performance paradox

 Microprocessors use much more logic to implement a function than does custom logic.

 But microprocessors are often at least as fast:  heavily pipelined;  large design teams;  aggressive VLSI technology.

Power

 Custom logic is a clear winner for low power devices.

 Modern microprocessors offer features to help control power consumption.

 Software design techniques can help reduce power consumption.

Microprocessor varieties

 Microcontroller: includes I/O devices, on board memory.

 Digital signal processor (DSP): microprocessor optimized for digital signal processing.

 Typical embedded word sizes: 8-bit, 16-bit, 32-bit.

Many Types of Programmable Processors

 Past: Microprocessor, Microcontroller, DSP, Graphics Processor
 Now / Future: Network Processor, Sensor Processor, Cryptoprocessor, Game Processor, Wearable Processor, Mobile Processor

Application-Specific Instruction Processors (ASIPs)

 Processors with instruction sets tailored to specific applications or application domains
   instruction-set generation as part of synthesis
 Pluses:
   customization yields lower area, power, etc.
 Minuses:
   higher h/w & s/w development overhead (design, compilers, debuggers)
   higher time to market

Reconfigurable SoC

Other Examples

Atmel's FPSLIC (AVR + FPGA)
Altera's Nios (configurable RISC on a PLD)
Triscend's A7 CSoC

Instruction Sets

von Neumann architecture

 Memory holds data, instructions.

 Central processing unit (CPU) fetches instructions from memory.

 Separate CPU and memory distinguishes programmable computer.

 CPU registers help out: program counter (PC), instruction register (IR), general purpose registers, etc.

CPU + memory

[Figure: a CPU and a memory connected by address and data lines; labels include the value 200 and the instruction ADD r5,r1,r3.]

Harvard architecture

[Figure: a CPU (with PC) connected by separate address and data lines to a data memory and a program memory.]

von Neumann vs. Harvard

 Harvard can't use self-modifying code.

 Harvard allows two simultaneous memory fetches.

 Most DSPs use Harvard architecture for streaming data:  greater memory bandwidth;  more predictable bandwidth.

RISC vs. CISC

 Complex instruction set computer (CISC):  many addressing modes;  many operations.

 Reduced instruction set computer (RISC):  load/store;  pipelinable instructions.

Instruction set characteristics

 Fixed vs. variable length.

 Addressing modes.

 Number of operands.

 Types of operands.

Programming model

 Programming model: registers visible to the programmer.

 Some registers are not visible (IR).

Multiple implementations

 Successful architectures have several implementations:  varying clock speeds;  different bus widths;  different cache sizes;  etc.

ARM Architecture

Advanced RISC Machines (1990) (Acorn and Apple Computer)

ARM Architecture

 ARM versions.

 ARM assembly language.

 ARM programming model.

ARM versions

 ARM architecture has been extended over several versions.

 We will concentrate on ARMv5

Evolution of the ARM architecture versions

ARMv6 Improvement

 Memory management
 Multiprocessing
 Multimedia support: SIMD capability

Evolution of the ARM architecture ARM11

Introduction

 To allow very small, yet high-performance implementations
 RISC
   Large uniform register file
   Load/store architecture
   Simple addressing modes
   Uniform and fixed-length instruction fields
 Auto-increment and auto-decrement addressing modes
 Conditional execution of almost all instructions

ARM assembly language

 Fairly standard assembly language:
label LDR r0,[r8] ; a comment
      ADD r4,r0,r1

Programming Model

ARM data types

 Byte: 8 bits
 Halfword: 16 bits
   Must be aligned to two-byte boundaries
 Word: 32 bits
   Must be aligned to four-byte boundaries
 ARM addresses can be 32 bits long.
 Address refers to byte.
   Address 4 starts at byte 4.
 Can be configured at power-up as either little- or big-endian mode.

Processor modes

 User (usr): normal program execution mode
 FIQ (fiq): supports a high-speed data transfer or channel process
 IRQ (irq): used for general-purpose interrupt handling
 Supervisor (svc): a protected mode for the OS
 Abort (abt): implements VM and/or memory protection
 Undefined (und): supports software emulation of HW coprocessors
 System (sys): runs privileged OS tasks
 fiq, irq, svc, abt, und: exception modes

Registers

[Figure: general-purpose registers r0 to r15, some unbanked and some banked by mode; r14 is the link register and r15 is the PC. The 32-bit CPSR (bits 31 down to 0) holds the N, Z, C, V flags in its top bits.]

Endianness

 Relationship between bit and byte/word ordering defines endianness:
   little-endian: bit 31 | byte 3, byte 2, byte 1, byte 0 | bit 0
   big-endian: bit 0 | byte 0, byte 1, byte 2, byte 3 | bit 31

ARM status bits

 Every arithmetic, logical, or shifting operation may set CPSR (current program status register) bits:
   N (negative), Z (zero), C (carry), V (overflow).
 Examples:
   -1 + 1 = 0: NZCV = 0110.
   2^31 - 1 + 1 = -2^31: NZCV = 1001.
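The flag rules for a 32-bit addition can be checked with a small model. This is only a sketch in Python: the function name `add_with_flags` is made up for illustration, and it models a plain ADD with the S bit set, nothing more.

```python
MASK32 = 0xFFFFFFFF

def add_with_flags(a, b):
    """Add two 32-bit values; return (result, flags) with flags as an 'NZCV' string."""
    a &= MASK32
    b &= MASK32
    full = a + b                      # unbounded sum, so the carry is visible
    result = full & MASK32
    n = result >> 31                  # N: top bit of the result
    z = int(result == 0)              # Z: result is zero
    c = int(full > MASK32)            # C: carry out of bit 31
    # V: both operands have the same sign and the result's sign differs
    v = ((~(a ^ b) & (a ^ result)) >> 31) & 1
    return result, f"{n}{z}{c}{v}"

print(add_with_flags(-1, 1))          # zero result with a carry out
print(add_with_flags(2**31 - 1, 1))   # signed overflow into the sign bit
```

In the second call the result is negative and nonzero, so N and V are set while Z and C stay clear.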

ARM data processing – operand addressing

 Instruction syntax
   <opcode>{<cond>}{S} <Rd>, <Rn>, <shifter_operand>
   <shifter_operand> has 11 options

Condition field

Almost all ARM instructions can be conditionally executed.

ARM data processing – operand addressing

Data processing, immediate shift:
  bits 31-28 cond | 27-25 000 | 24-21 opcode | 20 S | 19-16 Rn | 15-12 Rd | 11-7 shift amount | 6-5 shift | 4 0 | 3-0 Rm
Data processing, register shift:
  bits 31-28 cond | 27-25 000 | 24-21 opcode | 20 S | 19-16 Rn | 15-12 Rd | 11-8 Rs | 7 0 | 6-5 shift | 4 1 | 3-0 Rm
Data processing, 32-bit immediate:
  bits 31-28 cond | 27-25 001 | 24-21 opcode | 20 S | 19-16 Rn | 15-12 Rd | 11-8 rotate | 7-0 immediate-8

Shifter operand

 Immediate
   8-bit constant and a 4-bit rotate (0, 2, 4, 6, …, 30)
   mov r0, #0
   add r9, r9, #1
 Register operand
   mov r2, r0
 Shifted register operand
   ASR, LSL, LSR, ROR, RRX (by one bit)
   mov r2, r0, LSL #2 ; shift r0 left by 2, write to r2 (r2 = r0 x 4)
   sub r10, r9, r8, LSR #4 ; r10 = r9 - r8/16
   sub r10, r9, r8, ROR r3 ; r10 = r9 - (r8 rotated by value of r3)
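Because an immediate is an 8-bit constant rotated right by an even amount, not every 32-bit value can be written as `#imm`. The following Python sketch searches for an encoding; the helper names `ror32` and `encode_immediate` are hypothetical and chosen only for this illustration.

```python
def ror32(x, n):
    """Rotate a 32-bit value right by n bits."""
    n %= 32
    x &= 0xFFFFFFFF
    return ((x >> n) | (x << (32 - n))) & 0xFFFFFFFF

def encode_immediate(value):
    """Return (rotate_field, imm8) if value is encodable, else None.

    The encoded constant is imm8 rotated right by 2 * rotate_field.
    """
    value &= 0xFFFFFFFF
    for rot in range(16):
        # Undo the right rotation by rotating left by the same amount.
        imm8 = ror32(value, (32 - 2 * rot) % 32)
        if imm8 <= 0xFF:
            return rot, imm8
    return None

print(encode_immediate(0xFF000000))   # 0xFF rotated right by 8
print(encode_immediate(0x101))        # bits too far apart: not encodable
```

A value that comes back as `None` has to be built another way, for example loaded from memory.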

ARM data-processing

 AND
 EOR
 SUB: Rd := Rn - shifter operand
 RSB: Rd := shifter operand - Rn
 ADD
 ADC (with carry)
 SBC
 RSC (reverse SBC)
 TST: update flags after Rn AND shifter operand
 TEQ
 CMP
 CMN: compare negated
 ORR (logical OR)
 MOV
 BIC
 MVN (mov not)

ARM data-processing

 Shift, rotate: specified through the shifter operand
   LSL, LSR: logical shift left/right
   ASR: arithmetic shift right
   ROR: rotate right
   RRX: rotate right extended with C

Data operation varieties

 Logical shift:
   fills with zeroes.
 Arithmetic shift:
   fills with sign extension.
 RRX performs 33-bit rotate, including C bit from CPSR above sign bit.
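The difference between the shift varieties is easiest to see on a value with the sign bit set. This is a minimal Python sketch of the five operations on 32-bit values, assuming shift amounts between 0 and 31; the function names simply mirror the mnemonics.

```python
MASK32 = 0xFFFFFFFF

def lsl(x, n):
    """Logical shift left: zeroes enter from the right."""
    return (x << n) & MASK32

def lsr(x, n):
    """Logical shift right: zeroes enter from the left."""
    return (x & MASK32) >> n

def asr(x, n):
    """Arithmetic shift right: copies of the sign bit enter from the left."""
    x &= MASK32
    if x & 0x80000000:
        x -= 1 << 32              # reinterpret as a negative Python int
    return (x >> n) & MASK32

def ror(x, n):
    """Rotate right: bits leaving on the right re-enter on the left."""
    x &= MASK32
    n %= 32
    return ((x >> n) | (x << (32 - n))) & MASK32

def rrx(x, carry):
    """33-bit rotate right by one through C; returns (result, new_carry)."""
    x &= MASK32
    return (carry << 31) | (x >> 1), x & 1

print(hex(lsr(0x80000000, 4)), hex(asr(0x80000000, 4)))
```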

Load and Store instructions

 Two types
   32-bit word or an 8-bit unsigned byte
   Load and store halfword and load signed byte
 Addressing modes
   Base register
     Any one of the GPRs (including the PC)
   Offset
     Three formats

Addressing modes

 Offset
   Immediate: unsigned number (12 bits or 8 bits)
   Register: GPR (not the PC)
   Scaled register: shifted by an immediate value
     LSL, LSR, ASR, ROR, RRX
 Three ways to form the memory address
   Offset: EA := base register + or - offset
   Pre-indexed
   Post-indexed

Addressing modes

 Base-plus-offset addressing: LDR r0,[r1,#16]
   Loads from location r1+16.
 Pre-indexing increments base register: LDR r0,[r1,#16]!
   Loads from location r1+16, then writes r1+16 back to r1.
 Post-indexing fetches, then does offset: LDR r0,[r1],#16
   Loads r0 from r1, then adds 16 to r1.
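The three forms differ only in which address is used and whether the base register is written back. A rough Python model, assuming registers are held in a dict and memory is a dict keyed by byte address (both are illustrative choices, as is the `ldr` helper itself):

```python
def ldr(regs, mem, rd, rn, offset, mode):
    """Model LDR rd,[rn,#offset] in 'offset', 'pre', or 'post' mode."""
    base = regs[rn]
    # Post-indexing uses the unmodified base; the other two add the offset first.
    addr = base if mode == "post" else base + offset
    regs[rd] = mem[addr]
    # Pre- and post-indexing both write base+offset back to the base register.
    if mode in ("pre", "post"):
        regs[rn] = base + offset

mem = {0x100: 11, 0x110: 22}

regs = {"r0": 0, "r1": 0x100}
ldr(regs, mem, "r0", "r1", 16, "offset")   # LDR r0,[r1,#16]
print(regs)

regs = {"r0": 0, "r1": 0x100}
ldr(regs, mem, "r0", "r1", 16, "pre")      # LDR r0,[r1,#16]!
print(regs)

regs = {"r0": 0, "r1": 0x100}
ldr(regs, mem, "r0", "r1", 16, "post")     # LDR r0,[r1],#16
print(regs)
```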

Load and store

 LDR
 LDRB
 LDRH
 LDRSB (signed byte)
 LDRSH (signed halfword)
 STR
 STRB
 STRH

Examples
LDR R1, [R0] ; load R1 from the address in R0
LDR R8, [R3, #4] ; EA = [R3] + 4
LDR R8, [R3, #-4] ; EA = [R3] - 4
STRB R10, [R7, -R4] ; EA = [R7] - [R4]
LDR R11, [R3, R5, LSL #2] ; EA = [R3] + ([R5] x 4)
LDR R3, [R9], #4 ; EA = [R9], R9 = [R9] + 4, post-indexed
LDR R1, [R0, #2]! ; EA = [R0] + 2, R0 = [R0] + 2, pre-indexed
LDR R0, [PC, #40] ; load R0 from PC + 40 (= address of the instruction + 8 + 40)

Load and store multiple

 Addressing modes
   IA: increment after
   IB: increment before
   DA: decrement after
   DB: decrement before

Load and store multiple

 LDM
 STM
 Examples
LDMIA r0, {r5-r8} ; load multiple r5-r8 from the address in r0
STMDA r1!, {r2, r5, r7-r9, r11} ; update r1
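The four modes only change which block of word addresses is touched and what value is written back when `!` is used. A small Python sketch of that address arithmetic follows; `block_addresses` is a hypothetical helper, and it relies on the rule that the lowest-numbered register goes to the lowest address.

```python
def block_addresses(base, count, mode):
    """Return (word addresses, lowest first) and the written-back base value."""
    size = 4 * count
    if mode == "IA":                       # increment after
        start, new_base = base, base + size
    elif mode == "IB":                     # increment before
        start, new_base = base + 4, base + size
    elif mode == "DA":                     # decrement after
        start, new_base = base - size + 4, base - size
    elif mode == "DB":                     # decrement before
        start, new_base = base - size, base - size
    else:
        raise ValueError(f"unknown mode {mode!r}")
    return [start + 4 * i for i in range(count)], new_base

# LDMIA r0, {r5-r8} with r0 = 0x1000: four words starting at 0x1000
print([hex(a) for a in block_addresses(0x1000, 4, "IA")[0]])

# STMDA r1!, {r2, r5, r7-r9, r11} with r1 = 0x2000: six words ending at 0x2000
addrs, new_r1 = block_addresses(0x2000, 6, "DA")
print([hex(a) for a in addrs], hex(new_r1))
```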

Branch instructions

 Conditional branch forwards or backwards up to 32 MB
 Target address is formed by:
   Sign-extending the 24-bit imm_data to 32 bits
   Shifting the result left two bits
   Adding this to the PC (the address of the branch + 8)
 Range: approximately ±32 MB
 B, BL

Examples
B label
BCC label ; branch if carry flag is clear
BEQ label ; branch if zero flag is set
MOV PC, #0 ; branch to location zero
BL func ; subroutine call
MOV PC, LR ; return
MOV LR, PC
LDR PC, =func
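The target calculation for B and BL (sign-extend, shift left two, add to PC) can be written out directly. A minimal Python sketch, with `branch_target` as an illustrative name and `instr_addr` standing for the address of the branch instruction itself:

```python
def branch_target(instr_addr, imm24):
    """Compute the target of B/BL from the 24-bit immediate field."""
    offset = imm24 & 0xFFFFFF
    if offset & 0x800000:          # sign-extend the 24-bit field
        offset -= 1 << 24
    # The PC reads as the branch address + 8; the offset counts words.
    return (instr_addr + 8 + (offset << 2)) & 0xFFFFFFFF

print(hex(branch_target(0x8000, 0)))          # skips one instruction
print(hex(branch_target(0x8000, 0xFFFFFE)))   # offset -2 words: branch to itself
```

The largest positive field value, 0x7FFFFF, gives a byte offset of 0x1FFFFFC, which is where the roughly 32 MB range comes from.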

ARM ADR pseudo-op

 Cannot refer to an address directly in an instruction.

 Generate value by performing arithmetic on PC.

 ADR pseudo-op generates instruction required to calculate address: ADR r1,FOO

Examples
start MOV r0, #10
      ADR r4, start ; => SUB r4, pc, #0xc
start = pc - 4 - 8 = pc - 12 = pc - 0xc

Example: C assignments

 C: x = (a + b) - c;
 Assembler:
ADR r4,a ; get address for a
LDR r0,[r4] ; get value of a
ADR r4,b ; get address for b, reusing r4
LDR r1,[r4] ; get value of b
ADD r3,r0,r1 ; compute a+b
ADR r4,c ; get address for c
LDR r2,[r4] ; get value of c

C assignment, cont'd.

SUB r3,r3,r2 ; complete computation of x
ADR r4,x ; get address for x
STR r3,[r4] ; store value of x

Example: C assignment

 C: y = a*(b+c);
 Assembler:
ADR r4,b ; get address for b
LDR r0,[r4] ; get value of b
ADR r4,c ; get address for c
LDR r1,[r4] ; get value of c
ADD r2,r0,r1 ; compute partial result
ADR r4,a ; get address for a
LDR r0,[r4] ; get value of a

C assignment, cont'd.

MUL r2,r0,r2 ; compute final value for y (Rd and Rm kept distinct)
ADR r4,y ; get address for y
STR r2,[r4] ; store y

Example: C assignment

 C: z = (a << 2) | (b & 15);
 Assembler:
ADR r4,a ; get address for a
LDR r0,[r4] ; get value of a
MOV r0,r0,LSL #2 ; perform shift
ADR r4,b ; get address for b
LDR r1,[r4] ; get value of b
AND r1,r1,#15 ; perform AND
ORR r1,r0,r1 ; perform OR

C assignment, cont'd.

ADR r4,z ; get address for z
STR r1,[r4] ; store value for z

Example: if statement

 C: if (a < b) { x = 5; y = c + d; } else x = c - d;
 Assembler:
; compute and test condition
ADR r4,a ; get address for a
LDR r0,[r4] ; get value of a
ADR r4,b ; get address for b
LDR r1,[r4] ; get value for b
CMP r0,r1 ; compare a < b
BGE fblock ; if a >= b, branch to false block

If statement, cont'd.

; true block
MOV r0,#5 ; generate value for x
ADR r4,x ; get address for x
STR r0,[r4] ; store x
ADR r4,c ; get address for c
LDR r0,[r4] ; get value of c
ADR r4,d ; get address for d
LDR r1,[r4] ; get value of d
ADD r0,r0,r1 ; compute y
ADR r4,y ; get address for y
STR r0,[r4] ; store y
B after ; branch around false block

If statement, cont'd.

; false block
fblock ADR r4,c ; get address for c
       LDR r0,[r4] ; get value of c
       ADR r4,d ; get address for d
       LDR r1,[r4] ; get value for d
       SUB r0,r0,r1 ; compute c-d
       ADR r4,x ; get address for x
       STR r0,[r4] ; store value of x
after ...

Example: Conditional instruction implementation

; true block
MOVLT r0,#5 ; generate value for x
ADRLT r4,x ; get address for x
STRLT r0,[r4] ; store x
ADRLT r4,c ; get address for c
LDRLT r0,[r4] ; get value of c
ADRLT r4,d ; get address for d
LDRLT r1,[r4] ; get value of d
ADDLT r0,r0,r1 ; compute y
ADRLT r4,y ; get address for y
STRLT r0,[r4] ; store y

Conditional instruction implementation, cont'd.

; false block
ADRGE r4,c ; get address for c
LDRGE r0,[r4] ; get value of c
ADRGE r4,d ; get address for d
LDRGE r1,[r4] ; get value for d
SUBGE r0,r0,r1 ; compute c-d
ADRGE r4,x ; get address for x
STRGE r0,[r4] ; store value of x
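The LT and GE suffixes work because CMP leaves the flags of a subtraction in the CPSR: LT passes when N differs from V, and GE passes when they are equal. The Python sketch below models that decision; `cmp_flags` and `condition_passed` are illustrative names, and only the condition codes listed in the table are covered.

```python
M = 0xFFFFFFFF

def cmp_flags(a, b):
    """Flags (n, z, c, v) after CMP a, b, i.e. after computing a - b."""
    a &= M
    b &= M
    result = (a - b) & M
    n = result >> 31
    z = int(result == 0)
    c = int(a >= b)                           # carry set means no borrow
    v = (((a ^ b) & (a ^ result)) >> 31) & 1  # signed overflow on subtract
    return n, z, c, v

def condition_passed(cond, n, z, c, v):
    """Decide whether an instruction with this condition suffix executes."""
    table = {
        "EQ": z == 1, "NE": z == 0,
        "CS": c == 1, "CC": c == 0,
        "MI": n == 1, "PL": n == 0,
        "VS": v == 1, "VC": v == 0,
        "HI": c == 1 and z == 0, "LS": c == 0 or z == 1,
        "GE": n == v, "LT": n != v,
        "GT": z == 0 and n == v, "LE": z == 1 or n != v,
        "AL": True,
    }
    return table[cond]

flags = cmp_flags(3, 7)                       # a = 3, b = 7, so a < b
print(condition_passed("LT", *flags), condition_passed("GE", *flags))
```

For any pair of inputs exactly one of LT and GE passes, which is why the true and false blocks above can be laid out back to back without branches.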

Example: FIR filter

 C: for (i=0, f=0; i

FIR filter, cont'd.

ADR r3,c ; load r3 with base of c
ADR r5,x ; load r5 with base of x
; loop body
loop LDR r4,[r3,r8] ; get c[i]
     LDR r6,[r5,r8] ; get x[i]
     MUL r4,r6,r4 ; compute c[i]*x[i] (Rd and Rm kept distinct)
     ADD r2,r2,r4 ; add into running sum
     ADD r8,r8,#4 ; add one word offset to array index
     ADD r0,r0,#1 ; add 1 to i
     CMP r0,r1 ; exit?
     BLT loop ; if i < N, continue
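The loop can be traced in a higher-level form. This Python sketch assumes, from the comments in the assembly, that the loop accumulates c[i]*x[i] for i from 0 up to N; the variable names mirror the registers, and the function name `fir` is made up for illustration.

```python
def fir(c, x, n):
    """Sum c[i] * x[i] for i in range(n), mirroring the register usage."""
    r0 = 0            # i, the loop counter
    r2 = 0            # f, the running sum
    r8 = 0            # byte offset into both arrays (4 bytes per word)
    # The assembly tests at the bottom of the loop; testing at the top here
    # keeps the sketch safe for n == 0.
    while r0 < n:
        r4 = c[r8 // 4]       # get c[i]
        r6 = x[r8 // 4]       # get x[i]
        r2 += r4 * r6         # add c[i]*x[i] into the running sum
        r8 += 4               # advance one word
        r0 += 1               # i = i + 1
    return r2

print(fir([1, 2, 3], [4, 5, 6], 3))
```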

Nested subroutine calls

 Nesting/recursion requires coding convention:
f1 LDR r0,[r13] ; load arg into r0 from stack
   ; call f2()
   STR r14,[r13]! ; store f1's return adrs
   STR r0,[r13]! ; store arg to f2 on stack
   BL f2 ; branch and link to f2
   ; return from f1()
   SUB r13,#4 ; pop f2's arg off stack
   LDR r15,[r13]! ; restore register and return

Summary

 Load/store architecture
 Most instructions are RISCy, operate in single cycle.
   Some multi-register operations take longer.
 Almost all instructions can be executed conditionally.

MPC850

Integrated Communication Microprocessor

Reference Manuals

 MPC850 Family User Manual
 PowerPC Programming Environment Manual
 Course Home Page: http://calab.kaist.ac.kr/~maeng/cs310/micro02.htm
 Motorola Home Page: http://e-www.motorola.com

Overview

 Versatile, one-chip, integrated communication processor
 Embedded PowerPC core
 Versatile memory controller
 Communication processor module (CPM)
 Serial communication controllers (SCCs)
 One USB
 Etc.

Embedded PowerPC core

 Single issue, 32-bit version
 Branch folding and prediction
 2K-byte I-cache, 1K-byte D-cache
   2-way set-associative
   Physical
 MMUs with 8-entry TLBs
   4K, 16K, 256K, 512K, and 8MB page sizes

Other Features

 Dynamic data bus sizing: 8-, 16-, 32-bit
 CPU clock: 0-80 MHz
 System Integration Unit (SIU)
 Memory Controller
 General Purpose timer
 CPM, SCCs, SMCs, etc.

PowerPC Architecture

PowerPC instruction set

 Overview
 Operand Conventions
 PowerPC Registers and programming model
 Addressing Modes
 Instruction Set
 Cache model
 Exception Model
 Memory management model

PowerPC Architecture

 Motorola, IBM, Apple Computer
 Power Architecture: RS/6000 family
 64-bit architecture with a 32-bit subset
 Three levels of the architecture
   Flexibility: degrees of SW compatibility
   UISA (User instruction set architecture)
   VEA (Virtual environment architecture)
   OEA (Operating environment architecture)

Features not defined by the PowerPC Architecture

 For flexibility
   System bus interface signals
   Cache design
   The number and the nature of execution units
   Other internal micro-architecture issues

Endianness

 Relationship between bit and byte/word ordering defines endianness:
   little-endian (ARM, Intel): bit 31 | byte 3, byte 2, byte 1, byte 0 | bit 0
   big-endian (PowerPC, IBM, Motorola): bit 0 | byte 0, byte 1, byte 2, byte 3 | bit 31

Programming Model – Registers

PowerPC programming model - Register Set

 User Model: UISA (32-bit architecture)
   General-purpose registers: GPR0 (32), GPR1 (32), ..., GPR31 (32)
   Floating-point registers: FGPR0 (64), FGPR1 (64), ..., FGPR31 (64)
   Condition register: CR (32)
   FP status and control register: FPSCR (32)
   XER register: XER (32)
   Link register: LR (64/32)
   Count register: CTR (64/32)

Condition Registers (CR)

 For testing and branching
 Eight 4-bit fields, CR0 through CR7, spanning bits 0 to 31
 CR0: for all integer instructions
   Bit0: Negative (LT)
   Bit1: Positive (GT)
   Bit2: Zero (EQ)
   Bit3: Summary Overflow (SO)
 CR1: FP
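Since PowerPC numbers bits from the most significant end, CR0 occupies the top four bits of the 32-bit CR. The following Python sketch pulls out one field and names its bits; `cr_field` and `describe` are hypothetical helpers, and the LT/GT/EQ/SO naming is the integer-compare interpretation given above.

```python
def cr_field(cr, n):
    """Return the 4-bit field CRn (n = 0..7); bit 0 of the CR is the MSB."""
    return (cr >> (28 - 4 * n)) & 0xF

def describe(field):
    """List the names of the bits set in a 4-bit CR field."""
    names = ["LT", "GT", "EQ", "SO"]   # field bit 0 (MSB) .. field bit 3 (LSB)
    return [name for i, name in enumerate(names) if field & (8 >> i)]

cr = 0x80000002                        # CR0 = 0b1000, CR7 = 0b0010
print(describe(cr_field(cr, 0)))       # flags recorded in CR0
print(describe(cr_field(cr, 7)))       # flags recorded in CR7
```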

Condition register CRn Field – Compare Instruction


XER Register (XER)


XER Register (XER), cont’d

Link Register (LR), Count Register (CTR)

bclrx (bc to link register) Branch with link update

Counter Register

 Loop count

VEA Register Set – Time Base

OEA Register Set

Machine State Register (MSR)

Addressing Modes

 Effective Address Calculation
   Register indirect with immediate index mode
   Register indirect with index mode
   Register indirect mode

Register Indirect with Immediate Index Addressing


Register Indirect with Index


Register Indirect
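The three effective-address calculations can be summarized in a few lines. This Python sketch assumes the usual PowerPC convention that a base register field of 0 means the value zero rather than the contents of GPR0; the function names are made up for illustration, and `gpr` is a plain list of 32 register values.

```python
M = 0xFFFFFFFF

def sign_extend16(d):
    """Sign-extend a 16-bit displacement."""
    d &= 0xFFFF
    return d - 0x10000 if d & 0x8000 else d

def base_value(gpr, ra):
    """(rA|0): register field 0 selects the constant 0, not GPR0."""
    return 0 if ra == 0 else gpr[ra]

def ea_immediate_index(gpr, ra, d):
    """Register indirect with immediate index: EA = (rA|0) + EXTS(d)."""
    return (base_value(gpr, ra) + sign_extend16(d)) & M

def ea_index(gpr, ra, rb):
    """Register indirect with index: EA = (rA|0) + rB."""
    return (base_value(gpr, ra) + gpr[rb]) & M

def ea_register_indirect(gpr, ra):
    """Register indirect: EA = (rA|0)."""
    return base_value(gpr, ra) & M

gpr = [0] * 32
gpr[3], gpr[4] = 0x1000, 0x20
print(hex(ea_immediate_index(gpr, 3, 0xFFFC)))   # displacement of -4
print(hex(ea_index(gpr, 3, 4)))
print(hex(ea_register_indirect(gpr, 3)))
```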


Instruction Formats

 4 bytes long and word-aligned
 Bits 0-5 always specify the primary opcode
 Extended opcode

Instruction set

 Integer
 Floating-point
 Load and store
 Flow control
 Processor control
 Memory synchronization
 Memory control
 External control

Integer Instructions

 Arithmetic, compare, logical, rotate and shift
 Integer arithmetic, shift, rotate, and string move
   May update or read values from the XER
   The CR may be updated if the Rc bit is set.
   addic / addic. (the dot form records into CR0)

Integer Compare

 Algebraically, logically
 crfD can be omitted if the result is to be placed in CR0
 crfD field: the target CR
 The L bit has no effect on 32-bit operations

Integer compare, cont’d

Integer Logical

Integer Logical, cont’d

Rotate and Shift Instructions

 SH: specify the number of bits to rotate
 MB: mask start
 ME: mask stop
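How SH, MB, and ME combine is easiest to see with a rotate-then-mask operation in the style of rlwinm. The Python sketch below uses PowerPC bit numbering (bit 0 is the most significant); the helper names are illustrative, and the wrap-around mask for MB greater than ME is included as an assumption about the general case.

```python
M = 0xFFFFFFFF

def rotl32(x, n):
    """Rotate a 32-bit value left by n bits."""
    x &= M
    n %= 32
    return ((x << n) | (x >> (32 - n))) & M

def mask(mb, me):
    """Ones from bit mb through bit me, with bit 0 as the MSB."""
    from_mb = M >> mb                     # ones in bits mb..31
    up_to_me = (M << (31 - me)) & M       # ones in bits 0..me
    # When mb > me the mask wraps around the end of the word.
    return (from_mb & up_to_me) if mb <= me else (from_mb | up_to_me)

def rlwinm(rs, sh, mb, me):
    """Rotate left by sh, then AND with the mask selected by mb and me."""
    return rotl32(rs, sh) & mask(mb, me)

# Extract the most significant byte: rotate it into the low 8 bits, keep bits 24-31.
print(hex(rlwinm(0x12345678, 8, 24, 31)))
```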

Integer Rotate

Integer Shift

Load and Store

 Integer load and store
 Integer load and store with byte-reverse
 Integer load and store multiple
 FP load and store
 Memory synchronization

Branch and Flow Control

 EA calculation
   Branch relative
   Branch conditional to relative address
   Branch to absolute address
   Branch conditional to absolute address
   Branch conditional to link register
   Branch conditional to count register

Branch Relative

Branch conditional to relative

Branch to Absolute

Branch conditional to absolute

Branch conditional to LR

Branch conditional to count register

Conditional Branch control

Branch Instructions

CR logical Instructions

Trap, System Linkage

Processor Control

Memory Synchronization

Example

 Test and Set loop:
loop: lwarx r5,0,r3 # load and reserve
      cmpwi r5,0 # done if word
      bne $+12 # not equal to 0
      stwcx. r4,0,r3 # try to store non-zero
      bne loop # loop if lost reservation

Summary

 UISA, VEA, OEA
 Register set
 Fixed size instruction - RISC
 Load and store architecture
   3 addressing modes
 Condition Register Update: Rc field
   8 condition registers
 Branch addressing modes
   BO, BI fields
   Relative, absolute, LR, CTR