SHARC programming model


Embedded System HW
Processor Technology
Department of Computer Science, KAIST
Seungryoul Maeng
Processor Technology
■ General Purpose (“software”)
■ Application Specific
■ Single Purpose (“Hardware”)
■ IC technology
• Full Custom/VLSI
• Semi-custom ASIC (gate-array, standard cell)
• PLD
2004 Junior College Faculty Training, Fall 2004, SEP561 Embedded Computing
Custom single-purpose
processors: “Hardware”
Outline
■ Introduction
■ Combinational logic
■ Sequential logic
■ Custom single-purpose processor design
■ RT-level custom single-purpose processor design
* Read chapter 2 in “Embedded System Design: A Unified Hardware/Software Introduction,” Frank Vahid and Tony Givargis.
Introduction
■ Processor
• Digital circuit that performs computation tasks
• Controller and datapath
• General-purpose: variety of computation tasks
• Single-purpose: one particular computation task
• Custom single-purpose: non-standard task
■ A custom single-purpose processor may be
• Fast, small, low power
• But, high NRE, longer time-to-market, less flexible
[Figure: digital camera chip. Blocks: lens, CCD, A2D, CCD preprocessor, pixel coprocessor, D2A, JPEG codec, microcontroller, multiplier/accum, DMA controller, memory controller, display ctrl, ISA bus interface, UART, LCD ctrl]
Custom single-purpose processor basic model
[Figure: controller and datapath. The controller has external control inputs, datapath control inputs, external control outputs and datapath control outputs; the datapath has external data inputs and external data outputs.]
[Figure: a view inside the controller and datapath. Controller: next-state and control logic, state register. Datapath: registers, functional units.]
Example: greatest common divisor
■ First create algorithm
■ Convert algorithm to “complex” state machine
• Known as FSMD: finite-state machine with datapath
• Can use templates to perform such conversion
(a) black-box view: GCD block with inputs go_i, x_i, y_i and output d_o
(b) desired functionality
0: int x, y;
1: while (1) {
2:   while (!go_i);
3:   x = x_i;
4:   y = y_i;
5:   while (x != y) {
6:     if (x < y)
7:       y = y - x;
       else
8:       x = x - y;
     }
9:   d_o = x;
   }
(c) state diagram: one state per numbered statement, plus join states (2-J, 6-J, 5-J, 1-J).
1: goes to 2 on 1, leaves the loop on !1
2: goes to 2-J on !go_i (2-J returns to 2), goes to 3 on !(!go_i)
3: x = x_i
4: y = y_i
5: goes to 6 on x!=y, goes to 9 on !(x!=y)
6: goes to 7 on x<y, goes to 8 on !(x<y)
7: y = y - x, then 6-J
8: x = x - y, then 6-J
6-J: then 5-J, then back to 5
9: d_o = x, then 1-J, then back to 1
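The FSMD above can be walked through in software. The following is a minimal sketch in C, not the course's own code: the function name gcd_fsmd and the enum names are made up for illustration, go_i is assumed to be held high, and the function returns after state 9 has written d_o once instead of looping forever as the hardware would.

```c
#include <stdint.h>

/* States named after the statement numbers in listing (b). */
enum gcd_state { S1, S2, S2J, S3, S4, S5, S6, S7, S8, S6J, S5J, S9, S1J };

/* Steps the FSMD one state at a time and returns d_o.
 * Assumptions: go_i is held high; x_i and y_i are both positive. */
uint32_t gcd_fsmd(uint32_t x_i, uint32_t y_i)
{
    const int go_i = 1;
    uint32_t x = 0, y = 0, d_o = 0;
    enum gcd_state s = S1;

    for (;;) {
        switch (s) {
        case S1:  s = S2; break;                  /* 1: while (1)       */
        case S2:  s = go_i ? S3 : S2J; break;     /* 2: while (!go_i);  */
        case S2J: s = S2; break;
        case S3:  x = x_i; s = S4; break;         /* 3: x = x_i         */
        case S4:  y = y_i; s = S5; break;         /* 4: y = y_i         */
        case S5:  s = (x != y) ? S6 : S9; break;  /* 5: while (x != y)  */
        case S6:  s = (x < y) ? S7 : S8; break;   /* 6: if (x < y)      */
        case S7:  y = y - x; s = S6J; break;      /* 7: y = y - x       */
        case S8:  x = x - y; s = S6J; break;      /* 8: x = x - y       */
        case S6J: s = S5J; break;
        case S5J: s = S5; break;
        case S9:  d_o = x; s = S1J; break;        /* 9: d_o = x         */
        case S1J: return d_o;  /* hardware would go back to state 1 */
        }
    }
}
```

Each case performs one state's action and picks the next state, which is the same split that the later slides make between datapath and controller.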
State diagram templates
Assignment statement
  a = b
  next statement
  State diagram: state “a = b”, then next statement.
Loop statement
  while (cond) {
    loop-body-statements
  }
  next statement
  State diagram: condition state C. On cond, run loop-body-statements, pass through join state J and return to C. On !cond, go to next statement.
Branch statement
  if (c1)
    c1 stmts
  else if c2
    c2 stmts
  else
    other stmts
  next statement
  State diagram: condition state C. On c1 run c1 stmts; on !c1*c2 run c2 stmts; on !c1*!c2 run others. All paths meet at join state J, then next statement.
Creating the datapath
■ Create a register for any declared variable
■ Create a functional unit for each arithmetic operation
■ Connect the ports, registers and functional units
• Based on reads and writes
• Use multiplexors for multiple sources
■ Create unique identifier
• for each datapath component control input and output
[Figure: the GCD state diagram next to the datapath. Datapath: inputs x_i and y_i feed two n-bit 2x1 multiplexors (selects x_sel, y_sel) in front of registers 0: x and 0: y (loads x_ld, y_ld). Functional units: comparator 5: x!=y (output x_neq_y), comparator 6: x<y (output x_lt_y), subtractor 8: x-y, subtractor 7: y-x. Register 9: d (load d_ld) drives d_o.]
Creating the controller’s FSM
■ Same structure as FSMD
■ Replace complex actions/conditions with datapath configurations
[Figure: FSMD, controller FSM and datapath side by side. Controller states with their encodings and control outputs:]
0000 1:
0001 2:    branches on go_i
0010 2-J:
0011 3:    x_sel = 0, x_ld = 1
0100 4:    y_sel = 0, y_ld = 1
0101 5:    branches on x_neq_y
0110 6:    branches on x_lt_y
0111 7:    y_sel = 1, y_ld = 1
1000 8:    x_sel = 1, x_ld = 1
1001 6-J:
1010 5-J:
1011 9:    d_ld = 1
1100 1-J:
Splitting into a controller and datapath
[Figure: (a) Controller: the same 13-state FSM (0000 1: through 1100 1-J:), with transitions labeled x_neq_y=0 / x_neq_y=1 and x_lt_y=0 / x_lt_y=1. Controller implementation model: combinational logic with inputs go_i, x_neq_y, x_lt_y and state bits Q3 Q2 Q1 Q0; outputs x_sel, y_sel, x_ld, y_ld, d_ld and next-state bits I3 I2 I1 I0 into the state register. (b) Datapath: multiplexors, registers x, y and d, comparators and subtractors, with inputs x_i, y_i and output d_o.]
Controller state table for the GCD example
Inputs: Q3 Q2 Q1 Q0 (current state), x_neq_y, x_lt_y, go_i
Outputs: I3 I2 I1 I0 (next state), x_sel, y_sel, x_ld, y_ld, d_ld
(* and X mark don't-care entries)

Q3 Q2 Q1 Q0  x_neq_y x_lt_y go_i | I3 I2 I1 I0  x_sel y_sel x_ld y_ld d_ld
0  0  0  0   *       *      *    | 0  0  0  1   X     X     0    0    0
0  0  0  1   *       *      0    | 0  0  1  0   X     X     0    0    0
0  0  0  1   *       *      1    | 0  0  1  1   X     X     0    0    0
0  0  1  0   *       *      *    | 0  0  0  1   X     X     0    0    0
0  0  1  1   *       *      *    | 0  1  0  0   0     X     1    0    0
0  1  0  0   *       *      *    | 0  1  0  1   X     0     0    1    0
0  1  0  1   0       *      *    | 1  0  1  1   X     X     0    0    0
0  1  0  1   1       *      *    | 0  1  1  0   X     X     0    0    0
0  1  1  0   *       0      *    | 1  0  0  0   X     X     0    0    0
0  1  1  0   *       1      *    | 0  1  1  1   X     X     0    0    0
0  1  1  1   *       *      *    | 1  0  0  1   X     1     0    1    0
1  0  0  0   *       *      *    | 1  0  0  1   1     X     1    0    0
1  0  0  1   *       *      *    | 1  0  1  0   X     X     0    0    0
1  0  1  0   *       *      *    | 0  1  0  1   X     X     0    0    0
1  0  1  1   *       *      *    | 1  1  0  0   X     X     0    0    1
1  1  0  0   *       *      *    | 0  0  0  0   X     X     0    0    0
1  1  0  1   *       *      *    | 0  0  0  0   X     X     0    0    0
1  1  1  0   *       *      *    | 0  0  0  0   X     X     0    0    0
1  1  1  1   *       *      *    | 0  0  0  0   X     X     0    0    0
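The state table can be exercised by pairing it with a software model of the datapath. The following is a sketch in C under stated assumptions: the names controller, gcd_split and struct ctrl are hypothetical, don't-care outputs (X) are modeled as 0, go_i is held high, and the loop stops after d is loaded once.

```c
#include <stdint.h>

/* One row of controller outputs: next state plus datapath control signals.
 * A select value of 0 also stands in for the table's don't-care (X). */
struct ctrl { uint8_t next, x_sel, y_sel, x_ld, y_ld, d_ld; };

/* Next-state and control logic, transcribed from the state table. */
static struct ctrl controller(uint8_t q, int x_neq_y, int x_lt_y, int go_i)
{
    struct ctrl c = {0, 0, 0, 0, 0, 0};
    switch (q) {
    case 0x0: c.next = 0x1; break;                            /* 1:   */
    case 0x1: c.next = go_i ? 0x3 : 0x2; break;               /* 2:   */
    case 0x2: c.next = 0x1; break;                            /* 2-J: */
    case 0x3: c.next = 0x4; c.x_sel = 0; c.x_ld = 1; break;   /* 3:   */
    case 0x4: c.next = 0x5; c.y_sel = 0; c.y_ld = 1; break;   /* 4:   */
    case 0x5: c.next = x_neq_y ? 0x6 : 0xB; break;            /* 5:   */
    case 0x6: c.next = x_lt_y ? 0x7 : 0x8; break;             /* 6:   */
    case 0x7: c.next = 0x9; c.y_sel = 1; c.y_ld = 1; break;   /* 7:   */
    case 0x8: c.next = 0x9; c.x_sel = 1; c.x_ld = 1; break;   /* 8:   */
    case 0x9: c.next = 0xA; break;                            /* 6-J: */
    case 0xA: c.next = 0x5; break;                            /* 5-J: */
    case 0xB: c.next = 0xC; c.d_ld = 1; break;                /* 9:   */
    default:  c.next = 0x0; break;   /* 1-J: and the unused encodings */
    }
    return c;
}

/* Clocks controller and datapath together until d is loaded once.
 * Assumptions: go_i is held high; x_i and y_i are both positive. */
uint32_t gcd_split(uint32_t x_i, uint32_t y_i)
{
    uint32_t x = 0, y = 0, d = 0;
    uint8_t q = 0;
    for (;;) {
        /* Status signals x_neq_y and x_lt_y come from the comparators. */
        struct ctrl c = controller(q, x != y, x < y, 1);
        /* Multiplexors: select 0 picks the external input, 1 the subtractor. */
        uint32_t x_next = c.x_sel ? x - y : x_i;
        uint32_t y_next = c.y_sel ? y - x : y_i;
        if (c.d_ld) d = x;
        if (c.x_ld) x = x_next;
        if (c.y_ld) y = y_next;
        if (c.d_ld) return d;
        q = c.next;
    }
}
```

The controller function only sees status bits and only produces control bits, so it could be replaced by the combinational logic and state register of the implementation model without touching the datapath code.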
Completing the GCD custom single-purpose processor design
■ We finished the datapath
■ We have a state table for the next state and control logic
• All that’s left is combinational logic design
■ This is not an optimized design, but we see the basic steps
[Figure: a view inside the controller and datapath. Controller: next-state and control logic, state register. Datapath: registers, functional units.]
Summary
■ Custom single-purpose processors
• Straightforward design techniques
• Can be built to execute algorithms
• Typically start with FSMD
• CAD tools can be of great assistance
General-Purpose Processors:
“Software”
Introduction
■ General-Purpose Processor
• Processor designed for a variety of computation tasks
• Low unit cost, in part because manufacturer spreads NRE over large numbers of units
– Motorola sold half a billion 68HC05 microcontrollers in 1996 alone
• Carefully designed since higher NRE is acceptable
– Can yield good performance, size and power
• Low NRE cost, short time-to-market/prototype, high flexibility
– User just writes software; no processor design
• a.k.a. “microprocessor” – “micro” used when they were implemented on one or a few chips rather than entire rooms
Why use microprocessors?
■ Alternatives: field-programmable gate arrays (FPGAs), custom logic, etc. (Custom Single-purpose Processor or HW Logic)
■ Microprocessors are often very efficient: can use same logic to perform many different functions.
■ Microprocessors simplify the design of families of products.
The performance paradox
■ Microprocessors use much more logic to implement a function than does custom logic.
■ But microprocessors are often at least as fast:
• heavily pipelined;
• large design teams;
• aggressive VLSI technology.
Power
■ Custom logic is a clear winner for low power devices.
■ Modern microprocessors offer features to help control power consumption.
■ Software design techniques can help reduce power consumption.
Basic Architecture
Basic Architecture
■ Control unit and datapath
• Note similarity to single-purpose processor
■ Key differences
• Datapath is general
• Control unit doesn’t store the algorithm – the algorithm is “programmed” into the memory
[Figure: processor containing a control unit (controller, PC, IR) and a datapath (ALU, registers) linked by control/status signals, connected to I/O and memory]
Pipelining: Increasing Instruction Throughput
[Figure: non-pipelined dish cleaning and pipelined dish cleaning (wash, dry) for dishes 1 to 8 over time; pipelined instruction execution for instructions 1 to 8 over time through the stages fetch-instr., decode, fetch ops., execute, store res.]
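The gain shown in the pipelining figure can be put into numbers. The sketch below assumes an ideal pipeline in which every stage takes one time unit and nothing stalls; the function names are made up for illustration.

```c
/* Time units to push n items through k one-unit stages, one item at a time. */
unsigned long cycles_non_pipelined(unsigned long n, unsigned long k)
{
    return n * k;
}

/* Ideal pipeline: k units until the first item finishes,
 * then one more item finishes every unit. */
unsigned long cycles_pipelined(unsigned long n, unsigned long k)
{
    return n == 0 ? 0 : k + (n - 1);
}
```

For the 8 instructions and 5 stages in the figure this gives 40 units without pipelining and 12 with it; for the 8 dishes with wash and dry stages it gives 16 and 9.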
Superscalar and VLIW Architectures
■ Performance can be improved by:
• Faster clock (but there’s a limit)
• Pipelining: slice up instruction into stages, overlap stages
• Multiple ALUs to support more than one instruction stream
– Superscalar
» Scalar: non-vector operations
» Fetches instructions in batches, executes as many as possible
» May require extensive hardware to detect independent instructions
– VLIW: each word in memory has multiple independent instructions
» Currently growing in popularity
» Relies on the compiler to detect and schedule instructions
Two Memory Architectures
■ Princeton
• Fewer memory wires
■ Harvard
• Simultaneous program and data memory access
[Figure: Harvard: processor connected to separate program memory and data memory. Princeton: processor connected to a single memory (program and data).]
Princeton vs. Harvard
■ Harvard can’t use self-modifying code.
■ Harvard allows two simultaneous memory fetches.
■ Most DSPs use Harvard architecture for streaming data:
• greater memory bandwidth;
• more predictable bandwidth.
Cache Memory
■ Memory access may be slow
■ Cache is small but fast memory close to processor
• Holds copy of part of memory
• Hits and misses
[Figure: processor, cache (fast/expensive technology, usually on the same chip), memory (slower/cheaper technology, usually on a different chip)]
Application-Specific
Instruction-Set Processors
(ASIPs)
Application-Specific Instruction-Set Processors (ASIPs)
■ General-purpose processors
• Sometimes too general to be effective in demanding application
– e.g., video processing – requires huge video buffers and operations on large arrays of data, inefficient on a GPP
• But single-purpose processor has high NRE, not programmable
■ ASIPs – targeted to a particular domain
• Contain architectural features specific to that domain
– e.g., embedded control, digital signal processing, video processing, network processing, telecommunications, etc.
• Still programmable
Microprocessor varieties
■ Microcontroller: includes I/O devices, on-board memory.
■ Digital signal processor (DSP): microprocessor optimized for digital signal processing.
■ Typical embedded word sizes: 8-bit, 16-bit, 32-bit.
Embedded Processors
■ Embedded processor
• Originally meant a microcontroller
• Also used as a broader concept that extends the microcontroller
• A CPU core, memory, peripherals and I/O devices, with various kinds of network devices added
[Figure: Netsilicon NET+ARM Embedded Processor]
Many Types of Programmable Processors
■ Past
• Microprocessor
• Microcontroller
• DSP
• Graphics Processor
■ Now / Future
• Network Processor
• Sensor Processor
• Cryptoprocessor
• Game Processor
• Wearable Processor
• Mobile Processor
A Common ASIP: Microcontroller
■ For embedded control applications
• Reading sensors, setting actuators
• Mostly dealing with events (bits): data is present, but not in huge amounts
• e.g., VCR, disk drive, digital camera (assuming SPP for image compression), washing machine, microwave oven
■ Microcontroller features
• On-chip peripherals
– Timers, analog-digital converters, serial communication, etc.
– Tightly integrated for programmer, typically part of register space
• On-chip program and data memory
• Direct programmer access to many of the chip’s pins
• Specialized instructions for bit-manipulation and other low-level operations
Another Common ASIP: Digital Signal Processors (DSP)
■ For signal processing applications
• Large amounts of digitized data, often streaming
• Data transformations must be applied fast
• e.g., cell-phone voice filter, digital TV, music synthesizer
■ DSP features
• Several instruction execution units
• Multiply-accumulate single-cycle instruction, other instrs.
• Efficient vector operations – e.g., add two arrays
– Vector ALUs, loop buffers, etc.
Trend: Even More Customized ASIPs
■ In the past, microprocessors were acquired as chips
■ Today, we increasingly acquire a processor as Intellectual Property (IP)
• e.g., synthesizable VHDL model
■ Opportunity to add custom datapath hardware and a few custom instructions, or delete a few instructions
• Can have significant performance, power and size impacts
• Problem: need compiler/debugger for customized ASIP
– Remember, most development uses structured languages
– One solution: automatic compiler/debugger generation
» e.g., www.tensillica.com
– Another solution: retargetable compilers
» e.g., www.improvsys.com (customized VLIW architectures)
Reconfigurable SoC
Other Examples
Atmel’s FPSLIC
(AVR + FPGA)
Altera’s Nios
(configurable
RISC on a PLD)
Triscend’s A7 CSoC
Selecting a Microprocessor
■ Issues
• Technical: speed, power, size, cost
• Other: development environment, prior expertise, licensing, etc.
■ Speed: how to evaluate a processor’s speed?
• Clock speed – but instructions per cycle may differ
• Instructions per second – but work per instr. may differ
• Dhrystone: Synthetic benchmark, developed in 1984. Dhrystones/sec.
– MIPS: 1 MIPS = 1757 Dhrystones per second (based on Digital’s VAX 11/780). A.k.a. Dhrystone MIPS. Commonly used today.
» So, 750 MIPS = 750*1757 = 1,317,750 Dhrystones per second
• SPEC: set of more realistic benchmarks, but oriented to desktops
• EEMBC – EDN Embedded Benchmark Consortium, www.eembc.org
– Suites of benchmarks: automotive, consumer electronics, networking, office automation, telecommunications
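The Dhrystone MIPS conversion above is a single multiplication or division. A small sketch in C follows; the function and macro names are made up for illustration, and only the 1757 factor comes from the slide.

```c
/* 1 Dhrystone MIPS = 1757 Dhrystones per second (VAX 11/780 baseline). */
#define DHRYSTONES_PER_MIPS 1757UL

/* Converts a Dhrystone MIPS rating to Dhrystones per second. */
unsigned long mips_to_dhrystones_per_sec(unsigned long mips)
{
    return mips * DHRYSTONES_PER_MIPS;
}

/* Converts a measured Dhrystones-per-second score to Dhrystone MIPS. */
double dhrystones_per_sec_to_mips(double dhrystones_per_sec)
{
    return dhrystones_per_sec / (double)DHRYSTONES_PER_MIPS;
}
```

With the slide's figure, mips_to_dhrystones_per_sec(750) yields 1,317,750.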
Processor comparison
Processor | Clock speed | Periph. | Bus Width | MIPS | Power | Trans. | Price
General Purpose Processors
Intel PIII | 1GHz | 2x16 K L1, 256K L2, MMX | 32 | ~900 | 97W | ~7M | $900
IBM PowerPC 750X | 550 MHz | 2x32 K L1, 256K L2 | 32/64 | ~1300 | 5W | ~7M | $900
MIPS R5000 | 250 MHz | 2x32 K, 2 way set assoc. | 32/64 | NA | NA | 3.6M | NA
StrongARM SA-110 | 233 MHz | None | 32 | 268 | 1W | 2.1M | NA
Microcontroller
Intel 8051 | 12 MHz | 4K ROM, 128 RAM, 32 I/O, Timer, UART | 8 | ~1 | ~0.2W | ~10K | $7
Motorola 68HC811 | 3 MHz | 4K ROM, 192 RAM, 32 I/O, Timer, WDT, SPI | 8 | ~.5 | ~0.1W | ~10K | $5
Digital Signal Processors
TI C5416 | 160 MHz | 128K, SRAM, 3 T1 Ports, DMA, 13 ADC, 9 DAC | 16/32 | ~600 | NA | NA | $34
Lucent DSP32C | 80 MHz | 16K Inst., 2K Data, Serial Ports, DMA | 32 | 40 | NA | NA | $75
Sources: Intel, Motorola, MIPS, ARM, TI, and IBM Website/Datasheet; Embedded Systems Programming, Nov. 1998
Summary
■ General-purpose processors
• Good performance, low NRE, flexible
■ Controller, datapath, and memory
■ Structured languages prevail
• But some assembly level programming still necessary
■ Many tools available
• Including instruction-set simulators, and in-circuit emulators
■ ASIPs
• Microcontrollers, DSPs, network processors, more customized ASIPs
■ Choosing among processors is an important step
■ Designing a general-purpose processor is conceptually the same as designing a single-purpose processor
Instruction Sets
RISC vs. CISC
■ Complex instruction set computer (CISC):
• many addressing modes;
• many operations.
■ Reduced instruction set computer (RISC):
• load/store;
• pipelinable instructions.
CISC processors
■ Types and history of Intel microprocessors
Year | Processor | Transistors | Notes
1971 | 4004 | 2,250 | Intel's first microprocessor; used in the Busicom calculator
1972 | 8008 | 2,500 | Used in the Mark-8, the first home computer
1974 | 8080 | 5,000 | Used in the Altair
1978 | 8086/8088 | 29,000 | Used in the IBM-PC XT; Intel grows into a major company
1982 | 80286 | 120,000 | Used in the IBM-PC AT; 15 million units sold in 6 years
1985 | 80386 | 275,000 | 32-bit, multitasking support
1989 | 80486 | 1,180,000 | Built-in math coprocessor
1993 | Pentium | 3,100,000 | Enhanced speech and image processing
1995 | Pentium Pro | 5,500,000 | Adopts the Dynamic Execution architecture
1997 | Pentium 2 | 7,500,000 | MMX technology support
1999 | Pentium 3 | 24,000,000 | SIMD support, 12-stage pipeline
2001 | Itanium | 25,000,000 | 64-bit, Explicitly Parallel Instruction Computing (EPIC)
2002 | Pentium 4 | 55,000,000 | 20-stage hyper-pipeline, Hyper-Threading
2003 | Itanium 2 | 410,000,000 | Machine Check Architecture, EPIC, 6MB L3 cache
CISC - History: evolution of packaging technology
CISC - History
Instruction set characteristics
■ Fixed vs. variable length.
■ Addressing modes.
■ Number of operands.
■ Types of operands.
Programming model
■ Programming model: registers visible to the programmer.
■ Some registers are not visible (IR).
Multiple implementations
■ Successful architectures have several implementations:
• varying clock speeds;
• different bus widths;
• different cache sizes;
• etc.
ARM Architecture
Advanced RISC Machines(1990)
(ACORN and Apple Computer)
ARM Architecture
■ ARM versions.
■ ARM assembly language.
■ ARM programming model.
ARM versions
■ ARM architecture has been extended over several versions.
■ We will concentrate on ARMv5
Evolution of the ARM architecture
versions
ARMv6 Improvement
■ Memory management
■ Multiprocessing
■ Multimedia support: SIMD capability
Evolution of the ARM architecture
ARM11
Introduction
■ To allow very small, yet high-performance implementations
■ RISC
• Large uniform register file
• Load/store architecture
• Simple addressing modes
• Uniform and fixed-length instruction fields
• Auto-increment and auto-decrement addressing modes
• Conditional execution of all instructions
ARM assembly language
■ Fairly standard assembly language:
label   LDR r0,[r8] ; a comment
        ADD r4,r0,r1
Programming Model
ARM data types
■ Byte : 8 bits
■ Halfword : 16 bits
• Must be aligned to two-byte boundaries
■ Word : 32 bits
• Must be aligned to four-byte boundaries
■ ARM addresses can be 32 bits long.
■ Address refers to byte.
• Address 4 starts at byte 4.
■ Can be configured at power-up as either little- or big-endian mode.
Processor modes
■ User: usr – Normal program execution mode
■ FIQ: fiq – Supports a high-speed data transfer or channel process
■ IRQ: irq – Used for general-purpose interrupt handling
■ Supervisor: svc – A protected mode for OS
■ Abort: abt – Implements VM and/or memory protection
■ Undefined: und – Supports software emulation of HW coprocessors
■ System: sys – Runs privileged OS tasks
■ fiq, irq, svc, abt, und – exception modes
Registers
[Figure: registers r0 to r15 (r15 is the PC), each with bits 31 down to 0, and the CPSR with the N, Z, C, V flags. Labels in the figure: link register, unbanked registers, banked registers.]
Endianness
■ Relationship between bit and byte/word ordering defines endianness:
little-endian: bit 31 [ byte 3 | byte 2 | byte 1 | byte 0 ] bit 0
big-endian:    bit 0 [ byte 0 | byte 1 | byte 2 | byte 3 ] bit 31
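In terms of byte addresses, little-endian stores the least significant byte of a word at the lowest address and big-endian stores the most significant byte there. The sketch below illustrates this in C; the function names are made up for illustration and the code is not ARM-specific.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if the host stores the least significant byte at the lowest address. */
int host_is_little_endian(void)
{
    uint32_t word = 0x03020100u;
    uint8_t first;
    memcpy(&first, &word, 1);   /* read the byte at the word's lowest address */
    return first == 0x00;
}

/* Byte found at byte offset addr (0..3) inside a word, little-endian order. */
uint8_t byte_at_little(uint32_t word, unsigned addr)
{
    return (uint8_t)(word >> (8 * addr));
}

/* Byte found at byte offset addr (0..3) inside a word, big-endian order. */
uint8_t byte_at_big(uint32_t word, unsigned addr)
{
    return (uint8_t)(word >> (8 * (3 - addr)));
}
```

For the word 0x11223344, offset 0 holds 0x44 in little-endian order and 0x11 in big-endian order.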
ARM status bits
■ Every arithmetic, logical, or shifting operation may set CPSR (current program status register) bits:
• N (negative), Z (zero), C (carry), V (overflow).
■ Examples:
• -1 + 1 = 0: NZCV = 0110.
• 2^31 - 1 + 1 = -2^31: NZCV = 1001.
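The flag values for an addition can be checked with a few lines of C. This is a sketch of the standard definitions of N, Z, C and V for a 32-bit add, not code from the course; the function name nzcv_add is made up for illustration.

```c
#include <stdint.h>

/* Returns NZCV packed into 4 bits (N is bit 3, V is bit 0) for a + b. */
unsigned nzcv_add(uint32_t a, uint32_t b)
{
    uint32_t r = a + b;                         /* wraps modulo 2^32 */
    unsigned n = r >> 31;                       /* result is negative */
    unsigned z = (r == 0);                      /* result is zero */
    unsigned c = (r < a);                       /* unsigned carry out */
    unsigned v = (~(a ^ b) & (a ^ r)) >> 31;    /* signed overflow */
    return (n << 3) | (z << 2) | (c << 1) | v;
}
```

Adding 1 to 0xFFFFFFFF (that is, -1 + 1) sets Z and C, giving 0110. Adding 1 to 0x7FFFFFFF (2^31 - 1) produces 0x80000000, which sets N and V and leaves Z and C clear, giving 1001.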
ARM data processing – operand addressing
■ Instruction syntax
• <opcode>{<cond>}{S} <Rd>, <Rn>, <shifter-operand>
■ <shifter-operand> has 11 options
Condition field
■ Almost all ARM instrs. – conditionally executed
ARM data processing – operand addressing
Data processing immediate shift
[31:28] cond | [27:25] 000 | [24:21] opcode | [20] S | [19:16] Rn | [15:12] Rd | [11:7] shift amount | [6:5] shift | [4] 0 | [3:0] Rm
Data processing register shift
[31:28] cond | [27:25] 000 | [24:21] opcode | [20] S | [19:16] Rn | [15:12] Rd | [11:8] Rs | [7] 0 | [6:5] shift | [4] 1 | [3:0] Rm
Data processing 32-bit immediate
[31:28] cond | [27:25] 001 | [24:21] opcode | [20] S | [19:16] Rn | [15:12] Rd | [11:8] rotate | [7:0] immediate-8
Shifter operand
■ Immediate
• 8-bit constant and a 4-bit rotate (0, 2, 4, 6, …, 30)
– mov r0, #0
– add r9, r9, #1
■ Register operand
– mov r2, r0
■ Shifted register operand
• ASR, LSL, LSR, ROR, RRX (by one bit)
– mov r2, r0, LSL #2 ; shift r0 left by 2, write to r2 (r2 = r0 x 4)
– sub r10, r9, r8, LSR #4 ; r10 = r9 - r8/16
– sub r10, r9, r8, ROR r3 ; r10 = r9 - (r8 rotated by value of r3)
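The immediate form means that only some 32-bit constants fit in an instruction: the value must be an 8-bit constant rotated right by an even amount. The sketch below decodes the rotate and immediate-8 fields and searches for an encoding of a given constant; the function names are made up for illustration.

```c
#include <stdint.h>

/* 32-bit rotate right by n (taken modulo 32). */
static uint32_t ror32(uint32_t v, unsigned n)
{
    n &= 31;
    return n ? (v >> n) | (v << (32 - n)) : v;
}

/* Value of the immediate operand: imm8 rotated right by 2 * rot4. */
uint32_t arm_imm_decode(unsigned rot4, unsigned imm8)
{
    return ror32(imm8 & 0xFFu, 2 * (rot4 & 0xFu));
}

/* Returns 1 and fills *rot4 and *imm8 if value fits the immediate form, else 0. */
int arm_imm_encode(uint32_t value, unsigned *rot4, unsigned *imm8)
{
    for (unsigned r = 0; r < 16; r++) {
        /* Undo a right rotation by 2r with a left rotation by 2r. */
        uint32_t v = ror32(value, (32 - 2 * r) & 31);
        if (v <= 0xFFu) {
            *rot4 = r;
            *imm8 = v;
            return 1;
        }
    }
    return 0;
}

/* Convenience wrapper: 1 if value can be written as an immediate operand. */
int arm_imm_encodable(uint32_t value)
{
    unsigned rot4, imm8;
    return arm_imm_encode(value, &rot4, &imm8);
}
```

For example, 0xFF000000 is encodable (0xFF rotated right by 8) while 0x101 is not, because its two set bits do not fit in any 8-bit window at an even rotation.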
ARM data-processing
■ AND
■ EOR
■ SUB : Rd := Rn - shifter operand
■ RSB : Rd := shifter operand - Rn
■ ADD
■ ADC (with carry)
■ SBC
■ RSC (reverse SBC)
■ TST : update flags after Rn AND shifter operand
■ TEQ
■ CMP
■ CMN : compare negated
■ ORR (logical OR)
■ MOV
■ BIC
■ MVN (move NOT)
ARM data-processing
■ Shift, rotate? Use the shifter operand
• LSL, LSR : logical shift left/right
• ASR : arithmetic shift right
• ROR : rotate right
• RRX : rotate right extended with C
Data operation varieties
■ Logical shift:
• fills with zeroes.
■ Arithmetic shift:
• fills with sign extension
■ RRX performs 33-bit rotate, including C bit from CPSR above sign bit.
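The difference between the shift varieties is easiest to see on concrete bit patterns. The following C sketch models each operation on a 32-bit value; the function names are made up for illustration, and the carry handling of RRX is split into two small functions to keep them simple.

```c
#include <stdint.h>

/* Logical shifts: vacated bits are filled with zeroes. */
uint32_t lsl(uint32_t v, unsigned n) { return n < 32 ? v << n : 0; }
uint32_t lsr(uint32_t v, unsigned n) { return n < 32 ? v >> n : 0; }

/* Arithmetic shift right: vacated bits are filled with the sign bit. */
uint32_t asr(uint32_t v, unsigned n)
{
    uint32_t sign_fill = (v >> 31) ? 0xFFFFFFFFu : 0u;
    if (n == 0) return v;
    if (n >= 32) return sign_fill;
    return (v >> n) | (sign_fill << (32 - n));
}

/* Rotate right: bits shifted out on the right re-enter on the left. */
uint32_t ror(uint32_t v, unsigned n)
{
    n &= 31;
    return n ? (v >> n) | (v << (32 - n)) : v;
}

/* RRX: 33-bit rotate right by one; the old carry enters at bit 31. */
uint32_t rrx(uint32_t v, unsigned carry_in)
{
    return (v >> 1) | ((uint32_t)(carry_in & 1u) << 31);
}

/* Bit 0 of the operand is what RRX shifts out into the carry. */
unsigned rrx_carry_out(uint32_t v) { return v & 1u; }
```

Shifting 0x80000000 right by 4 gives 0x08000000 with lsr and 0xF8000000 with asr, which is the sign extension the slide refers to.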
Load and Store instructions
■ Two types
• 32-bit word or an 8-bit unsigned byte
• Load and store halfword and load signed byte
■ Addressing modes
• Base register
– Any one of GPR (including the PC)
• Offset
– Three formats
Addressing modes
■ Offset
• Immediate: unsigned number (12 bits or 8 bits)
• Register: GPR (not the PC)
• Scaled register: shifted by an immediate value
– LSL, LSR, ASR, ROR, RRX
■ Three ways to form the memory address
– EA := Base register + or – Offset
• Offset
• Pre-indexed
• Post-indexed
Addressing modes
■ Base-plus-offset addressing:
LDR r0,[r1,#16]
• Loads from location r1+16
■ Pre-indexing increments base register:
LDR r0,[r1,#16]!
■ Post-indexing fetches, then does offset:
LDR r0,[r1],#16
• Loads r0 from r1, then adds 16 to r1.
Load and store
■ LDR
■ LDRB
■ LDRH
■ LDRSB (signed byte)
■ LDRSH (signed halfword)
■ STR
■ STRB
■ STRH
Examples
LDR R1, [R0]               ; load R1 from the address in R0
LDR R8, [R3, #4]           ; EA = [R3] + 4
LDR R8, [R3, #-4]          ; EA = [R3] – 4
STRB R10, [R7, -R4]        ; EA = [R7] – [R4]
LDR R11, [R3, R5, LSL #2]  ; EA = [R3] + ([R5]x4)
LDR R3, [R9], #4           ; EA = [R9], R9 = [R9] + 4 (post-indexed)
LDR R1, [R0, #2]!          ; EA = [R0] + 2, R0 = [R0] + 2 (pre-indexed)
LDR R0, [PC, #0x40]        ; load R0 from PC+0x40 (= address of the instruction + 8 + 0x40)
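The three ways of forming the address differ only in which value is used for the access and whether the base register is written back. The C sketch below models that for an immediate offset; the type and function names are made up for illustration.

```c
#include <stdint.h>

enum index_mode { OFFSET, PRE_INDEXED, POST_INDEXED };

struct access {
    uint32_t ea;          /* address used for the memory access */
    uint32_t base_after;  /* base register value after the instruction */
};

/* Models [Rn, #off], [Rn, #off]! and [Rn], #off for a signed offset. */
struct access form_address(uint32_t base, int32_t offset, enum index_mode mode)
{
    struct access a;
    uint32_t sum = base + (uint32_t)offset;   /* wraps modulo 2^32 */
    switch (mode) {
    case PRE_INDEXED:  a.ea = sum;  a.base_after = sum;  break;
    case POST_INDEXED: a.ea = base; a.base_after = sum;  break;
    default:           a.ea = sum;  a.base_after = base; break;  /* OFFSET */
    }
    return a;
}
```

With base 0x1000 and offset 16, plain offset addressing accesses 0x1010 and leaves the base at 0x1000, pre-indexing accesses 0x1010 and updates the base to 0x1010, and post-indexing accesses 0x1000 and then updates the base to 0x1010.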
Load and store multiple
■ Addressing modes
• IA : increment after
• IB : increment before
• DA : decrement after
• DB : decrement before
Load and store multiple
■ LDM
■ STM
■ Examples
• LDMIA r0, {r5 – r8} ; load multiple r5-r8 from the address in r0
• STMDA r1!, {r2, r5, r7 – r9, r11} ; update r1
Branch instructions
■ Conditional branch forwards or backwards up to 32 MB
• Sign-extending the 24-bit imm_data to 32 bits
• Shifting the result left two bits
• Adding this to the PC (the addr of branch +8)
• Approximately ± 32MB
■ B, BL
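The three steps listed for computing the branch target translate directly into code. The following C sketch assumes a 32-bit ARM-state branch; the function name branch_target is made up for illustration.

```c
#include <stdint.h>

/* Target of a B/BL located at branch_addr with 24-bit field imm24. */
uint32_t branch_target(uint32_t branch_addr, uint32_t imm24)
{
    uint32_t off = imm24 & 0x00FFFFFFu;
    if (off & 0x00800000u)          /* sign-extend bit 23 to 32 bits */
        off |= 0xFF000000u;
    off <<= 2;                      /* shift left two bits: words to bytes */
    return branch_addr + 8 + off;   /* PC reads as branch address + 8 */
}
```

An offset field of 0xFFFFFE (that is, -2) makes the branch jump to itself, since -8 bytes cancels the +8 of the PC.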
Examples
B label
BCC label       ; branch if carry flag is clear
BEQ label       ; if zero flag is set
MOV PC, #0      ; branch to location zero
BL func         ; subroutine call
MOV PC,LR       ; return
MOV LR, PC
LDR PC, =func   ;
ARM ADR pseudo-op
■ Cannot refer to an address directly in an instruction.
■ Generate value by performing arithmetic on PC.
■ ADR pseudo-op generates instruction required to calculate address:
ADR r1,FOO
Examples
start   MOV r0, #10
        ADR r4, start ; => SUB r4,pc,#0xc
start = pc - 4 - 8 = pc - 12 = pc - 0xc
Example: C assignments
■ C:
x = (a + b) - c;
■ Assembler:
ADR r4,a      ; get address for a
LDR r0,[r4]   ; get value of a
ADR r4,b      ; get address for b, reusing r4
LDR r1,[r4]   ; get value of b
ADD r3,r0,r1  ; compute a+b
ADR r4,c      ; get address for c
LDR r2,[r4]   ; get value of c
C assignment, cont’d.
SUB r3,r3,r2  ; complete computation of x
ADR r4,x      ; get address for x
STR r3,[r4]   ; store value of x
Example: C assignment
■ C:
y = a*(b+c);
■ Assembler:
ADR r4,b      ; get address for b
LDR r0,[r4]   ; get value of b
ADR r4,c      ; get address for c
LDR r1,[r4]   ; get value of c
ADD r2,r0,r1  ; compute partial result
ADR r4,a      ; get address for a
LDR r0,[r4]   ; get value of a
C assignment, cont’d.
MUL r2,r2,r0 ; compute final value for y
ADR r4,y ; get address for y
STR r2,[r4] ; store y
Example: C assignment
■ C:
z = (a << 2) | (b & 15);
■ Assembler:
ADR r4,a         ; get address for a
LDR r0,[r4]      ; get value of a
MOV r0,r0,LSL #2 ; perform shift
ADR r4,b         ; get address for b
LDR r1,[r4]      ; get value of b
AND r1,r1,#15    ; perform AND
ORR r1,r0,r1     ; perform OR
C assignment, cont’d.
ADR r4,z ; get address for z
STR r1,[r4] ; store value for z
Example: if statement
■ C:
if (a < b) { x = 5; y = c + d; } else x = c - d;
■ Assembler:
; compute and test condition
ADR r4,a ; get address for a
LDR r0,[r4] ; get value of a
ADR r4,b ; get address for b
LDR r1,[r4] ; get value for b
CMP r0,r1 ; compare a < b
BGE fblock ; if a >= b, branch to false block
If statement, cont’d.
; true block
MOV r0,#5 ; generate value for x
ADR r4,x ; get address for x
STR r0,[r4] ; store x
ADR r4,c ; get address for c
LDR r0,[r4] ; get value of c
ADR r4,d ; get address for d
LDR r1,[r4] ; get value of d
ADD r0,r0,r1 ; compute y
ADR r4,y ; get address for y
STR r0,[r4] ; store y
B after ; branch around false block
If statement, cont’d.
; false block
fblock ADR r4,c ; get address for c
LDR r0,[r4] ; get value of c
ADR r4,d ; get address for d
LDR r1,[r4] ; get value for d
SUB r0,r0,r1 ; compute c-d
ADR r4,x ; get address for x
STR r0,[r4] ; store value of x
after ...
Example: Conditional instruction implementation
; true block
MOVLT r0,#5    ; generate value for x
ADRLT r4,x     ; get address for x
STRLT r0,[r4]  ; store x
ADRLT r4,c     ; get address for c
LDRLT r0,[r4]  ; get value of c
ADRLT r4,d     ; get address for d
LDRLT r1,[r4]  ; get value of d
ADDLT r0,r0,r1 ; compute y
ADRLT r4,y     ; get address for y
STRLT r0,[r4]  ; store y
Conditional instruction implementation, cont’d.
; false block
ADRGE r4,c ; get address for c
LDRGE r0,[r4] ; get value of c
ADRGE r4,d ; get address for d
LDRGE r1,[r4] ; get value for d
SUBGE r0,r0,r1 ; compute c-d
ADRGE r4,x ; get address for x
STRGE r0,[r4] ; store value of x
Example: FIR filter
■ C:
for (i=0, f=0; i<N; i++)
f = f + c[i]*x[i];
■ Assembler
; loop initiation code
MOV r0,#0   ; use r0 for i
MOV r8,#0   ; use separate index for arrays
ADR r2,N    ; get address for N
LDR r1,[r2] ; get value of N
MOV r2,#0   ; use r2 for f
FIR filter, cont’d.
ADR r3,c ; load r3 with base of c
ADR r5,x ; load r5 with base of x
; loop body
loop LDR r4,[r3,r8] ; get c[i]
LDR r6,[r5,r8] ; get x[i]
MUL r4,r4,r6 ; compute c[i]*x[i]
ADD r2,r2,r4 ; add into running sum
ADD r8,r8,#4 ; add one word offset to array index
ADD r0,r0,#1 ; add 1 to i
CMP r0,r1 ; exit?
BLT loop ; if i < N, continue
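For comparison with the assembly, here is a compilable C version of the same loop. It is a sketch: the function name fir and the choice of 32-bit integer samples are assumptions, and the byte offset variable is kept only to mirror r8 in the assembly.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the sum of c[i] * x[i] for i in [0, n). */
int32_t fir(const int32_t *c, const int32_t *x, size_t n)
{
    int32_t f = 0;                       /* r2 in the assembly */
    size_t byte_off = 0;                 /* r8: byte offset into both arrays */
    for (size_t i = 0; i < n; i++) {     /* r0 counts up to N held in r1 */
        const int32_t ci = *(const int32_t *)((const char *)c + byte_off);
        const int32_t xi = *(const int32_t *)((const char *)x + byte_off);
        f += ci * xi;                    /* MUL, then ADD into running sum */
        byte_off += sizeof(int32_t);     /* ADD r8,r8,#4 */
    }
    return f;
}
```

One difference is worth noting: the assembly tests i < N only at the bottom of the loop, so it runs the body at least once, whereas this C loop runs zero times when n is 0.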
Nested subroutine calls
■ Nesting/recursion requires coding convention:
f1
LDR r0,[r13] ; load arg into r0 from stack
; call f2()
STR r14,[r13]! ; store f1’s return adrs
STR r0,[r13]! ; store arg to f2 on stack
BL f2 ; branch and link to f2
; return from f1()
SUB r13,#4 ; pop f2’s arg off stack
LDR r15,[r13]! ; restore register and return
Summary
■ Load/store architecture
■ Most instructions are RISCy, operate in single cycle.
• Some multi-register operations take longer.
■ All instructions can be executed conditionally.
MPC850
Reference Manuals
 MPC850 Family User Manual
 PowerPC Programming Environment Manual
• Course Home Page
http://calab.kaist.ac.kr/~maeng/cs310/micro02.htm
• Motorola Home Page
http://e-www.motorola.com
Overview
 Versatile, one-chip, integrated communication
processor
• Embedded PowerPC core
• Versatile memory controller
• Communication processor module (CPM)
– Serial communication controllers (SCCs)
– One USB
– Etc.
Embedded PowerPC core
 Single issue, 32-bit version
 Branch folding and prediction
 2-Kbyte I-cache, 1-Kbyte D-cache
• 2-way set-associative
• Physically addressed
 MMUs with 8-entry TLBs
 4K, 16K, 256K, 512K, and 8MB page sizes
Other Features
 Dynamic data bus sizing: 8-, 16-, 32-bit
 CPU clock: 0-80 MHz
 System Integration Unit (SIU)
 Memory Controller
 General Purpose timer
 CPM, SCCs, SMCs, etc.
PowerPC Architecture
PowerPC instruction set
 Overview
 Operand Conventions
 PowerPC Registers and programming model
 Addressing Modes
 Instruction Set
 Cache model
 Exception Model
 Memory management model
PowerPC Architecture
 Motorola, IBM, Apple Computer
 Power Architecture: RS/6000 family
 64-bit architecture with a 32-bit subset
 Three levels of the architecture
• Flexibility – degrees of SW compatibility
– UISA (User instruction set architecture)
– VEA (Virtual environment architecture)
– OEA (Operating environment architecture)
Features not defined by the PowerPC Architecture
 For flexibility
 System bus interface signals
 Cache design
 The number and the nature of execution units
 Other internal micro-architecture issues
Endianness
 Relationship between bit and byte/word ordering defines endianness:
little-endian (ARM, Intel): bit 31 | byte 3 | byte 2 | byte 1 | byte 0 | bit 0
big-endian (PowerPC, IBM, Motorola): bit 0 | byte 0 | byte 1 | byte 2 | byte 3 | bit 31
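The difference shows up in software whenever a multi-byte value is read from memory one byte at a time. The C sketch below shows both interpretations of the same four bytes, plus a host check. The function names are hypothetical and not taken from the slides.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 on a little-endian host, 0 on a big-endian host. */
int host_is_little_endian(void)
{
    uint32_t word = 0x01020304u;
    uint8_t first;
    memcpy(&first, &word, 1);   /* byte at the lowest address */
    return first == 0x04;
}

/* Big-endian: the byte at the lowest address is the most significant. */
uint32_t load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Little-endian: the byte at the lowest address is the least significant. */
uint32_t load_le32(const uint8_t *p)
{
    return ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16) |
           ((uint32_t)p[1] << 8)  |  (uint32_t)p[0];
}
```

The two load functions give the same result on any host because they use shifts instead of pointer casts. This is the usual way to exchange data between, for example, an ARM system and a PowerPC system.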
Programming Model – Registers
PowerPC programming model - Register Set
 User Model – UISA (32-bit architecture)
General-purpose registers: GPR0(32), GPR1(32), ..., GPR31(32)
Floating-point registers: FPR0(64), FPR1(64), ..., FPR31(64)
Condition register: CR(32)
FP status and control register: FPSCR(32)
XER register: XER(32)
Link register: LR(64/32)
Count register: CTR(64/32)
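As a data-structure view of the same register set, a simulator or context-switch routine might hold the user-level state in a struct like the one below. This is a sketch for the 32-bit case only. The type `ppc_uisa_regs` and the function `ppc_regs_reset` are hypothetical names, and the floating-point registers are stored as raw 64-bit patterns.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the user-level (UISA) registers, 32-bit case. */
typedef struct {
    uint32_t gpr[32];   /* GPR0..GPR31, 32 bits each */
    uint64_t fpr[32];   /* 32 floating-point registers, 64 bits each */
    uint32_t cr;        /* condition register */
    uint32_t fpscr;     /* FP status and control register */
    uint32_t xer;       /* XER register */
    uint32_t lr;        /* link register (32 bits in this model) */
    uint32_t ctr;       /* count register (32 bits in this model) */
} ppc_uisa_regs;

/* Clear every register in the model. */
void ppc_regs_reset(ppc_uisa_regs *r)
{
    memset(r, 0, sizeof *r);
}
```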
Condition Registers (CR)
 For testing and branching
32-bit register (bits 0-31) divided into eight 4-bit fields, CR0-CR7
CR0: for all integer instrs.
CR1: FP
Bit0: Negative (LT)
Bit1: Positive (GT)
Bit2: Zero (EQ)
Bit3: Summary Overflow (SO)
Condition register CRn field – compare instruction
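The bit assignments above can be sketched in C as follows: a signed compare produces a 4-bit field, which is then placed into one of the eight CR fields. PowerPC numbers bits from the most significant end, so CR0 occupies the top four bits of the 32-bit register. The names `cr_compare` and `cr_set_field` are hypothetical, and the `so` argument stands in for the summary-overflow bit that a real processor copies from the XER.

```c
#include <stdint.h>

/* Field bits, with Bit0 (LT) as the most significant of the four. */
enum { CR_LT = 0x8, CR_GT = 0x4, CR_EQ = 0x2, CR_SO = 0x1 };

/* 4-bit field produced by a signed compare of a and b. */
unsigned cr_compare(int32_t a, int32_t b, unsigned so)
{
    unsigned field = (a < b) ? CR_LT : (a > b) ? CR_GT : CR_EQ;
    return field | (so ? CR_SO : 0u);
}

/* Write a 4-bit field into CRn (n = 0..7) of a 32-bit CR image. */
uint32_t cr_set_field(uint32_t cr, unsigned n, unsigned field)
{
    unsigned shift = 28u - 4u * n;   /* CR0 sits in the top nibble */
    return (cr & ~((uint32_t)0xFu << shift)) |
           ((uint32_t)(field & 0xFu) << shift);
}
```

A conditional branch then only needs to test one bit of the chosen field.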
XER Register (XER)
XER Register (XER), cont’d
Link Register (LR), Count Register (CTR)
bclrx (bc to link register)
Branch with link update
Counter Register
 Loop count
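A loop driven by the count register has a simple shape: load the trip count once, then decrement it and branch while it is non-zero (the mtctr and bdnz instructions). The C sketch below mirrors that shape. The function name `sum_with_ctr` is hypothetical, and the zero-count guard is added here because a decrement-and-branch loop tests at the bottom.

```c
#include <stdint.h>

/* Sums n words using a down-counting loop in the style of CTR. */
uint32_t sum_with_ctr(const uint32_t *v, uint32_t n)
{
    uint32_t ctr = n;      /* mtctr: load the loop count */
    uint32_t sum = 0;

    if (ctr == 0)
        return 0;          /* guard: the loop body would run once otherwise */

    do {
        sum += *v++;
    } while (--ctr != 0);  /* bdnz: decrement CTR, branch if not zero */

    return sum;
}
```

Keeping the count in a dedicated register lets the loop-closing branch avoid a separate compare instruction.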
VEA Register Set – Time Base
OEA Register Set
Machine State Register (MSR)
Addressing Modes
 Effective Address Calculation
• Register indirect with immediate index mode
• Register indirect with index mode
• Register indirect mode
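The three effective-address calculations can be sketched in C as shown below. The sketch assumes the usual PowerPC load/store rule that a base-register field of 0 means the constant 0 rather than the contents of GPR0, written (rA|0) in the manuals. The function names and the `gpr` array are hypothetical, and 32-bit addresses are assumed.

```c
#include <stdint.h>

/* (rA|0): contents of rA, or the constant 0 when the rA field is 0. */
static uint32_t base_or_zero(const uint32_t gpr[32], unsigned ra)
{
    return (ra == 0) ? 0u : gpr[ra];
}

/* Register indirect with immediate index: EA = (rA|0) + sign-extended d */
uint32_t ea_imm_index(const uint32_t gpr[32], unsigned ra, int16_t d)
{
    return base_or_zero(gpr, ra) + (uint32_t)(int32_t)d;
}

/* Register indirect with index: EA = (rA|0) + (rB) */
uint32_t ea_index(const uint32_t gpr[32], unsigned ra, unsigned rb)
{
    return base_or_zero(gpr, ra) + gpr[rb];
}

/* Register indirect: EA = (rA|0) */
uint32_t ea_indirect(const uint32_t gpr[32], unsigned ra)
{
    return base_or_zero(gpr, ra);
}
```

The immediate is a signed 16-bit displacement, so it can reach 32 KB on either side of the base address.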
Register Indirect with Immediate Index Addressing
Register Indirect with Index
Register Indirect
Instruction Formats
 4 bytes long and word-aligned
 Bits 0-5 always specify the primary opcode
• Extended opcode
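Because PowerPC numbers bits from the most significant end, bits 0-5 are the top six bits of the 32-bit instruction word. The C sketch below extracts the primary opcode and, as an assumption not stated on the slide, the 10-bit extended opcode at bits 21-30, which is where the X-form instructions keep it. The function names are hypothetical.

```c
#include <stdint.h>

/* Instructions are 4 bytes long and must sit on a 4-byte boundary. */
int ppc_is_word_aligned(uint32_t addr)
{
    return (addr & 3u) == 0;
}

/* Primary opcode: bits 0-5 in PowerPC numbering, the top six bits. */
uint32_t ppc_primary_opcode(uint32_t insn)
{
    return insn >> 26;
}

/* Extended opcode, assuming the X-form layout (bits 21-30). */
uint32_t ppc_extended_opcode(uint32_t insn)
{
    return (insn >> 1) & 0x3FFu;
}
```

A decoder would switch on the primary opcode first and consult the extended opcode only for those primary opcodes that use one.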
Instruction set
 Integer
 Floating-point
 Load and store
 Flow control
 Processor control
 Memory synchronization
 Memory control
 External control
Summary
 UISA, VEA, OEA
• Register set
 Fixed size instruction - RISC
 Load and store architecture
• 3 addressing modes
 Condition Register Update – Rc field
• 8 condition registers
 Branch addressing modes
• BO, BI fields
• Relative, absolute, LR, CTR
RISC – XScale Microarchitecture Features
 Compatible with the ARM Architecture Version 5TE ISA
 Low power and high performance (up to 400 MHz)
 Modified Harvard Architecture
• Separate instruction cache and data cache (2 caches)
• 32KB Instruction Cache
• 32KB Data Cache
 Intel Media Processing Technology
 Instruction and Data Memory Management Unit
 Branch Target Buffer
 Debug Capability via JTAG Port
 0.35μm 3-layer metal CMOS, 2.6 million transistors
 256 PBGA package (17 x 17mm)
RISC – XScale System Integration Features
 Memory controller
 Power management controller
• Supports normal, idle, and sleep modes
 USB client
 Multi-channel DMA controller
• Software programmable; supports external DMA
 LCD controller
 AC97 codec
 Multimedia card: serial interface to standard memory card, includes FIFO
 FIR communication: infrared communication port
 Synchronous serial protocol port
 I2C
RISC – XScale System Integration Features, cont’d
 85 GPIO ports
• Generate IRQ and “wake up” interrupts
 UART
 Real-time clock and timer
• 32-bit counter, 32.768 kHz crystal, accuracy +/- 5 sec/month
 OS timer with alarm register
 Pulse width modulation
 Interrupt controller
• Routes all system interrupts
RISC – XScale Block Diagram
 Architecture: evolved to V5TE
Internal Structure
RISC – XScale Example
 Palm-size device example
PXA255 Pin
Intel® XScale PXA250 [256 pins]
Serial Channel 0 (USB): UDC-, UDC+
Serial Channel 1: RXD_1, TXD_1
Serial Channel 2 (IrDA): RXD_2, TXD_2
Serial Channel 3 (UART): RXD_3, TXD_3
Serial Channel 4 (CODEC): TXD_C, RXD_C, SFRM_C, SCLK_C
Power Management: BATT_FAULT, VDD_FAULT, PWR_EN
Clocks, Reset and Test: TCK_BYP, TESTCLK, PEXTAL, PXTAL, TEXTAL, TXTAL, nRESET, nRESET_OUT, SMROM_EN, ROM_SEL
JTAG: TCK, TDI, TDO, TMS, nTRST
LCD Control: L_DD(15:0), L_FCLK, L_LCLK, L_PCLK, L_BIAS
GPIO Ports: GP(27:0)
Memory Control: nCAS/DQM(3:0), nRAS/nSDCS(3:0), nOE, nWE, nCS(5:0), RDY, nSDRAS, nSDCAS, SDCKE<1:0>, SDCLK<2:0>
Transceiver Control: RD/nWR
PCMCIA Bus Signals: nPOE, nPWE, nPIOR, nPIOW, nPCE<2:1>, PSKTSEL, nPREG, nPWAIT, nIOIS16
Address Bus: A<25:0>
Data Bus: D<31:0>
Supply: VDD, VDDX, VSS/VSSX
RISC – XScale running modes
 PXA255 operating modes
PXA255 Processor
 XScale Core
• 32-bit RISC
• 32-bit registers
• 32-bit instructions
– Longword aligned
• 32-bit datapaths
• 7~8 stage pipeline
Main execution pipeline: F1 (Instruction Fetch 1, PC) → F2 (Instruction Fetch 2, PC - 4) → ID (Instruction Decode, PC - 8) → RF (Register File / Operand Shifter, PC - 12) → X1 (ALU Execute, PC - 16) → X2 (State Execute) → XWB (Write Back)
Memory pipeline: D1 (Data Cache Access) → D2 (Data Cache Access) → DWB (Data Cache Writeback)
MAC pipeline: M1 (Multiplier Stage 1) → M2 (Multiplier Stage 2) → Mx (Multiplier Stage X)