Lectures for 2nd Edition

Chapter 5: Data Path
Before we get to the datapath…
2008 Introduction to Computers
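ALU (Arithmetic Logic Unit): the first step in building a CPU
• Goal: understand how the logic/arithmetic unit inside the CPU is organized.
• First, let's review Boolean logic and build the ALU we'll need (material from Appendix B).
[Figure: ALU block with two 32-bit inputs a and b, an operation control input, and a 32-bit result output; each data line is a 32-bit bus]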
Basic building blocks (gates) for the ALU

AND gate: c = a·b
OR gate:  c = a + b

 a  b | c = a·b | c = a + b
 0  0 |    0    |    0
 0  1 |    0    |    1
 1  0 |    0    |    1
 1  1 |    1    |    1

Inverter: c = NOT a

 a | c
 0 | 1
 1 | 0

Multiplexor (selector): the control input d chooses which data input reaches c

 d | c
 0 | a
 1 | b
Adder (1-bit full adder)
Inputs: a, b, CarryIn
Outputs: CarryOut, Sum
[Figure: 1-bit full adder block with inputs a, b, CarryIn and outputs Sum, CarryOut]

 a  b  CarryIn | CarryOut  Sum
 0  0     0    |    0       0
 0  0     1    |    0       1
 0  1     0    |    0       1
 0  1     1    |    1       0
 1  0     0    |    0       1
 1  0     1    |    1       0
 1  1     0    |    1       0
 1  1     1    |    1       1

cout = a·b + a·cin + b·cin + a·b·cin
     = a·b + a·cin + b·cin
sum  = a·b'·cin' + a'·b·cin' + a'·b'·cin + a·b·cin   (x' denotes NOT x)
     = a xor b xor cin
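The two full-adder equations can be checked against the truth table with a few lines of Python. This is a minimal sketch; the function names full_adder and truth_table are illustrative and not from the slides.

```python
from itertools import product


def full_adder(a, b, carry_in):
    """1-bit full adder built directly from the two slide equations."""
    # sum = a xor b xor cin
    sum_bit = a ^ b ^ carry_in
    # cout = a·b + a·cin + b·cin
    carry_out = (a & b) | (a & carry_in) | (b & carry_in)
    return carry_out, sum_bit


def truth_table():
    """Rows of (a, b, CarryIn, CarryOut, Sum) for all eight input combinations."""
    return [(a, b, c) + full_adder(a, b, c) for a, b, c in product((0, 1), repeat=3)]


if __name__ == "__main__":
    for row in truth_table():
        print(*row)
```

Each row satisfies 2·CarryOut + Sum = a + b + CarryIn, which is just another way of saying that the circuit adds three bits.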
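Review: The Multiplexor
• Selects one of the inputs to be the output, based on a control input
- The select line S determines which of the inputs A and B is connected to the output C.
- When S is 0, input A is connected to output C; when S is 1, input B is connected.
[Figure: multiplexor with data inputs A (selected by 0) and B (selected by 1), select input S, and output C]
• Note: we call this a 2-input mux even though it has 3 inputs!
• Let's build our ALU using a MUX: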
Different Implementations
• Not easy to decide the "best" way to build something
– Don't want too many inputs to a single gate
– Don't want to have to go through too many gates
– For our purposes, ease of comprehension is important
• Let's look at a 1-bit ALU for addition:
[Figure: 1-bit adder with inputs a, b, CarryIn and outputs Sum, CarryOut]
cout = a b + a cin + b cin
sum = a xor b xor cin
• How could we build a 1-bit ALU for add, and, and or?
• How could we build a 32-bit ALU?
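Building a 32-bit ALU from 1-bit ALUs
[Figure: 1-bit ALU. Inputs a, b, and CarryIn feed an AND gate, an OR gate, and a 1-bit adder; the Operation lines drive a multiplexor that selects input 0 (AND), 1 (OR), or 2 (adder sum) as Result; the adder also produces CarryOut]
[Figure: 32-bit ripple carry ALU. ALU0 through ALU31 share the Operation lines; ALUi takes ai, bi, and CarryIn and produces Resulti, and the CarryOut of each ALU feeds the CarryIn of the next]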
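The ripple-carry construction can be sketched in Python by chaining 32 copies of a 1-bit slice. This is an illustrative model only: the function names and the encoding of Operation (0 = AND, 1 = OR, 2 = add, following the multiplexor inputs in the figure) are assumptions, and a Python loop says nothing about gate delays.

```python
def alu_1bit(a, b, carry_in, operation):
    """One ALU slice: a multiplexor picks AND (0), OR (1), or the adder sum (2)."""
    and_out = a & b
    or_out = a | b
    sum_out = a ^ b ^ carry_in
    carry_out = (a & b) | (a & carry_in) | (b & carry_in)
    result = (and_out, or_out, sum_out)[operation]  # the multiplexor
    return result, carry_out


def ripple_alu(a, b, operation, width=32):
    """Chain `width` slices; each CarryOut becomes the next slice's CarryIn."""
    carry = 0  # CarryIn of ALU0
    result = 0
    for i in range(width):
        bit, carry = alu_1bit((a >> i) & 1, (b >> i) & 1, carry, operation)
        result |= bit << i
    return result, carry  # carry is the CarryOut of the last slice


if __name__ == "__main__":
    print(ripple_alu(5, 7, 2))
    print(ripple_alu(0xFFFFFFFF, 1, 2))
```

The loop makes the "ripple" visible: slice i cannot finish until slice i-1 has produced its carry.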
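What about subtraction (a – b)?
• Two's complement approach: just negate b and add.
• How do we negate?
• A very clever solution: set Binvert = 1 and CarryIn = 1 (at the LSB of the 32-bit adder)
[Figure: the LSB slice of the 32-bit ALU. A multiplexor controlled by Binvert selects b (0) or NOT b (1) as the second operand; the Operation multiplexor selects AND (0), OR (1), or the adder sum (2) as Result; the adder produces CarryOut]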
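The Binvert/CarryIn trick follows from the two's complement identity -b = (NOT b) + 1. A minimal sketch, assuming 32-bit words; add32, sub32, and to_signed are hypothetical helper names.

```python
MASK = 0xFFFFFFFF  # 32-bit word


def add32(a, b, carry_in=0):
    """32-bit adder: returns (result, carry_out)."""
    total = (a & MASK) + (b & MASK) + carry_in
    return total & MASK, total >> 32


def sub32(a, b):
    """a - b computed as a + (NOT b) + 1."""
    b_inverted = ~b & MASK                    # Binvert = 1
    return add32(a, b_inverted, carry_in=1)   # CarryIn = 1 at the LSB


def to_signed(x):
    """Interpret a 32-bit pattern as a two's complement integer."""
    return x - (1 << 32) if x & 0x80000000 else x


if __name__ == "__main__":
    result, carry_out = sub32(5, 7)
    print(hex(result), to_signed(result), carry_out)
```

No separate subtractor is needed: the same adder is reused, and only two control values change.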
Tailoring the ALU to the MIPS
• Need to support the set-on-less-than instruction (slt)
– remember: slt is an arithmetic instruction
– produces a 1 if rs < rt and 0 otherwise
– use subtraction: (a-b) < 0 implies a < b
• Need to support test for equality (beq $t5, $t6, Label)
– use subtraction: (a-b) = 0 implies a = b
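Supporting slt
• Can we figure out the idea?
[Figure: 1-bit ALU with Ainvert, Binvert, CarryIn, and Operation controls; the Operation multiplexor selects AND (0), OR (1), the adder sum (2), or the Less input (3) as Result; the slice also produces Set and has overflow detection producing Overflow]
Use this ALU for most significant bit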
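Supporting slt
[Figure: 1-bit ALU with Ainvert, Binvert, CarryIn, and Operation controls; the Operation multiplexor selects AND (0), OR (1), the adder sum (2), or the Less input (3) as Result; the adder produces CarryOut]
Use this ALU for all other bits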
Supporting slt
[Figure: 32-bit ALU built from ALU0 through ALU31, sharing the Ainvert, Binvert, and Operation lines. ALUi takes ai, bi, CarryIn, and Less and produces Resulti and CarryOut; the Less inputs of ALU1 through ALU31 are 0; ALU31 produces Set and Overflow]
Test for equality
• Notice control lines:
  0000 = and
  0001 = or
  0010 = add
  0110 = subtract
  0111 = slt
  1100 = NOR
• Note: zero is a 1 when the result is zero!
[Figure: 32-bit ALU built from ALU0 through ALU31 with Ainvert, Bnegate, and Operation control lines. ALUi takes ai, bi, CarryIn, and Less and produces Resulti and CarryOut; the Less inputs of ALU1 through ALU31 are 0; Result0 through Result31 feed the Zero output; ALU31 produces Set and Overflow]
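The control-line table can be read as a function from a 4-bit code to an operation. The sketch below models the behavior only, not the gate structure; the name alu and the direct signed comparison used for slt are assumptions (the hardware derives slt from the sign of a - b).

```python
MASK = 0xFFFFFFFF  # 32-bit word


def to_signed(x):
    """Interpret a 32-bit pattern as a two's complement integer."""
    return x - (1 << 32) if x & 0x80000000 else x


def alu(control, a, b):
    """Behavioral model of the ALU: returns (result, zero)."""
    a &= MASK
    b &= MASK
    if control == 0b0000:        # and
        result = a & b
    elif control == 0b0001:      # or
        result = a | b
    elif control == 0b0010:      # add
        result = (a + b) & MASK
    elif control == 0b0110:      # subtract
        result = (a - b) & MASK
    elif control == 0b0111:      # slt: 1 if a < b as signed values
        result = 1 if to_signed(a) < to_signed(b) else 0
    elif control == 0b1100:      # NOR
        result = ~(a | b) & MASK
    else:
        raise ValueError("unsupported ALU control value")
    zero = 1 if result == 0 else 0   # zero is 1 when the result is zero
    return result, zero


if __name__ == "__main__":
    print(alu(0b0110, 9, 9))   # equal operands: subtraction sets zero
```

A beq comparison is then a subtract followed by a look at the zero output.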
Conclusion
• We can build an ALU to support the MIPS instruction set
– key idea: use multiplexor to select the output we want
– we can efficiently perform subtraction using two's complement
– we can replicate a 1-bit ALU to produce a 32-bit ALU
• Important points about hardware
– all of the gates are always working
– the speed of a gate is affected by the number of inputs to the gate
– the speed of a circuit is affected by the number of gates in series (on the "critical path" or the "deepest level of logic")
• Our primary focus: comprehension, however,
– Clever changes to organization can improve performance (similar to using better algorithms in software)
– We saw this in multiplication, let's look at addition now
Problem: ripple carry adder is slow
• Is a 32-bit ALU as fast as a 1-bit ALU?
• Is there more than one way to do addition?
– two extremes: ripple carry and sum-of-products
Can you see the ripple? How could you get rid of it?
c1 = b0c0 + a0c0 + a0b0
c2 = b1c1 + a1c1 + a1b1      c2 =
c3 = b2c2 + a2c2 + a2b2      c3 =
c4 = b3c3 + a3c3 + a3b3      c4 =
Not feasible! Why?
Carry-lookahead adder
• An approach in-between our two extremes
• Motivation:
– If we didn't know the value of carry-in, what could we do?
– When would we always generate a carry?    gi = ai bi
– When would we propagate the carry?    pi = ai + bi
• Did we get rid of the ripple?
c1 = g0 + p0c0
c2 = g1 + p1c1      c2 =
c3 = g2 + p2c2      c3 =
c4 = g3 + p3c3      c4 =
Feasible! Why?
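One way to see why lookahead removes the ripple is to substitute each carry into the next, so that every carry depends only on the generate and propagate bits and c0. The sketch below does that for 4 bits; cla_4bit is a hypothetical name, and the expanded expressions are the standard substitution, written out here as an illustration.

```python
def cla_4bit(a, b, c0=0):
    """4-bit carry-lookahead adder: returns (4-bit sum, c4)."""
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(4)]  # gi = ai bi
    p = [((a >> i) & 1) | ((b >> i) & 1) for i in range(4)]  # pi = ai + bi

    # Every carry is a two-level expression in g, p, and c0 only.
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))

    carries = [c0, c1, c2, c3]
    total = 0
    for i in range(4):
        bit = (((a >> i) ^ (b >> i)) & 1) ^ carries[i]  # sum_i = ai xor bi xor ci
        total |= bit << i
    return total, c4


if __name__ == "__main__":
    print(cla_4bit(0b1011, 0b0110))
```

The expressions grow with each bit position, which is why the same idea is applied hierarchically rather than flattened across 16 or 32 bits.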
Use principle to build bigger adders
• Can't build a 16 bit adder this way... (too big)
• Could use ripple carry of 4-bit CLA adders
• Better: use the CLA principle again!
[Figure: 16-bit adder built from four 4-bit ALUs and a carry-lookahead unit. ALU0 takes a0–a3 and b0–b3 and produces Result0–3, P0, G0; ALU1 takes a4–a7 and b4–b7 and produces Result4–7, P1, G1; ALU2 takes a8–a11 and b8–b11 and produces Result8–11, P2, G2; ALU3 takes a12–a15 and b12–b15 and produces Result12–15, P3, G3. The carry-lookahead unit takes CarryIn and the P and G signals and produces C1, C2, and C3 as the CarryIn of ALU1 through ALU3, and C4 as CarryOut]
ALU Summary
• We can build an ALU to support MIPS addition
• Our focus is on comprehension, not performance
• Real processors use more sophisticated techniques for arithmetic
• Where performance is not critical, hardware description languages allow designers to completely automate the creation of hardware!
Data Path and Control
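The Processor: Datapath & Control
• Building on the MIPS instructions covered so far and the ALU material from the previous chapter, we learn how the MIPS instructions "lw" and "add" are implemented.
• A rough outline of how an instruction executes:
1. The program counter (PC) keeps track of the memory address of the instruction to execute.
2. The instruction is fetched from memory into the CPU.
3. Data is read from the registers named by fields in the instruction (recall that the MIPS ISA is built around registers).
4. The appropriate next action is performed, depending on the instruction (OP field).
- The ALU is used once in every case (lw: memory address computation; add: arithmetic; beq: comparison)
• Keep in mind that every instruction discussed here passes through the ALU at least once after the register read completes.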
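Basic structure of the data path
[Figure: abstract datapath. The PC supplies an Address to the instruction memory, which outputs an Instruction; the instruction supplies three Register # inputs to the Registers; register Data feeds the ALU; the ALU output supplies the Address to the data memory, which exchanges Data with the registers]
• Two types of functional units:
– Combinational circuit: a digital circuit whose output depends only on the current input values
– Sequential circuit: a circuit whose output depends not only on the current inputs but also on the circuit's current state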
When state changes in a sequential circuit
• Clocks used in synchronous logic
– when should an element that contains state be updated?
[Figure: clock waveform marking the rising edge, the falling edge, and the cycle time]
• State changes when the clock signal above changes, that is, on an edge (rising or falling).
• In a sequential circuit, state can be thought of as the stored data.
D flip-flop
• Output changes only on the clock edge
[Figure: D flip-flop built from two D latches, each with inputs D and C and outputs Q and NOT Q, followed by a timing diagram of D, C, and Q]
Register file: a structure that groups the 32 MIPS registers so they can be read and written easily
• Built using D flip-flops: a register file that supports 2 read ports and 1 write port has 5 inputs and 2 outputs.
[Figure: read-port implementation. Register 0 through Register n feed two multiplexors; Read register number 1 selects Read data 1 and Read register number 2 selects Read data 2]
[Figure: register file block with inputs Read register number 1, Read register number 2, Write register, Write data, and Write, and outputs Read data 1 and Read data 2]
<Register file with two read ports>
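Register file operation
[Figure: register file with 5-bit inputs Read register number 1, Read register number 2, and Write register, a 32-bit Write data input, a Write control input, and 32-bit outputs Read data 1 and Read data 2]
1. When register number 17 is applied to the "Read register number 1" port, the 32-bit value held in register 17 appears on the "Read data 1" output port.
2. With register 21 specified on the "Write register" port and the data to be stored placed on "Write data", asserting "Write" stores the data in register 21.
3. Because Write takes effect on the clock edge, the same register can be read and written within one clock cycle.
* MIPS R-type instructions (add, sub, ...) need two registers to read and one register to write, so a register file like the one shown is useful.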
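The three behaviors above can be mimicked with a small Python class. This is a behavioral sketch under stated assumptions: the class and method names are invented for illustration, reads are modeled as combinational, the write happens only when clock_edge is called, and the special treatment of MIPS register 0 is not modeled.

```python
class RegisterFile:
    """32 registers of 32 bits with two read ports and one write port."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, read_register_1, read_register_2):
        """Combinational read: returns (Read data 1, Read data 2)."""
        return self.regs[read_register_1], self.regs[read_register_2]

    def clock_edge(self, write_register, write_data, write):
        """On the clock edge, store write_data if the Write signal is asserted."""
        if write:
            self.regs[write_register] = write_data & 0xFFFFFFFF


if __name__ == "__main__":
    rf = RegisterFile()
    rf.clock_edge(write_register=21, write_data=99, write=True)
    # Read and write the same register within one cycle:
    old_value, _ = rf.read(21, 17)          # read happens during the cycle
    rf.clock_edge(21, old_value + 1, True)  # write takes effect at the edge
    print(rf.read(21, 17))
```

Because the read sees the value from before the edge, an instruction such as add $t0, $t0, $t1 can use a register as both source and destination.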
Datapath components we will see often
[Figure: a. Instruction memory (Instruction address in, Instruction out); b. Program counter (PC); c. Adder (Add, Sum)]
[Figure: a. Data memory unit (Address and Write data in, Read data out, MemWrite and MemRead controls); b. Sign-extension unit (16 bits in, 32 bits out)]
[Figure: a. Registers (three 5-bit register numbers: Read register 1, Read register 2, Write register; Write data in; Read data 1 and Read data 2 out; RegWrite control); b. ALU (3-bit ALU control; Zero and ALU result outputs)]
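(Basic) structure needed for R-type instructions (add, etc.)
[Figure: the PC supplies the Read address to the instruction memory; the Instruction supplies Read register 1, Read register 2, and Write register to the Registers; Read data 1 and Read data 2 feed the ALU (3-bit ALU operation control, Zero and ALU result outputs); the ALU result returns to Write data; RegWrite controls the write]
1. The instruction (add) pointed to by the PC is read from the instruction memory.
2. The two source register operands named in the add instruction are read from the register file, and their values appear on the "Read data" output ports.
3. The two register values are added in the ALU and the result goes to Write data; when RegWrite is asserted, the result is written to the register specified by "Write register".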
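(Basic) structure needed for lw and sw instructions
[Figure: the PC supplies the Read address to the instruction memory; the Instruction supplies the register numbers to the Registers and a 16-bit field to the Sign extend unit (16 to 32 bits); Read data 1 and the sign-extended value feed the ALU (3-bit ALU operation control, Zero and ALU result outputs); the ALU result is the Address of the data memory; Read data 2 is the memory's Write data; the memory's Read data returns to the register file's Write data; RegWrite, MemWrite, and MemRead are the control signals]
1. Suppose the instruction is "lw $t1, offset_value($t2)".
2. The value in register $t2 is read while the 16-bit offset_value is sign-extended to 32 bits; the two are added to obtain the memory address of the data to load.
3. Once computed, the address is applied to the Address input port of the data memory; when MemRead is asserted, the data at that memory location appears on the Read data output port.
4. This data loops back to the Write data input of the register file; Write register holds $t1, which selects the destination register, and when RegWrite is asserted the data read from data memory is written into $t1.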
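The address calculation for lw can be traced with a short Python sketch. Everything here is a simplified model: regs is a list of 32 register values, memory is a dictionary from byte address to word, and the names sign_extend16 and lw are illustrative.

```python
def sign_extend16(value):
    """Sign-extend a 16-bit field to a Python integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def lw(regs, memory, rt, rs, offset):
    """Model of lw rt, offset(rs): returns the address that was read."""
    # The ALU adds the base register to the sign-extended offset.
    address = (regs[rs] + sign_extend16(offset)) & 0xFFFFFFFF
    # MemRead: the word at that address appears on Read data,
    # then RegWrite stores it into the destination register.
    regs[rt] = memory[address]
    return address


if __name__ == "__main__":
    regs = [0] * 32
    regs[10] = 0x1000                 # $t2 is register 10
    memory = {0x0FFC: 42}
    address = lw(regs, memory, rt=9, rs=10, offset=0xFFFC)  # offset -4, $t1 is register 9
    print(hex(address), regs[9])
```

The offset 0xFFFC shows why sign extension matters: without it the address would be 0x1000 + 0xFFFC rather than 0x1000 - 4.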
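Structure that can handle lw, sw, ALU operations, and beq
beq $s1, $s2, 25 ; if ($s1 == $s2) go to PC+4+25*4
• Multiplexors act as switches that operate appropriately depending on the instruction.
[Figure: combined datapath. One adder computes PC + 4; a second adder adds PC + 4 to the sign-extended 16-bit field shifted left by 2 (that is, multiplied by 4); a multiplexor controlled by PCSrc selects the next PC. A multiplexor controlled by ALUSrc selects Read data 2 or the sign-extended value as the second ALU operand (3-bit ALU operation control, Zero and ALU result outputs). The ALU result is the Address of the data memory (MemWrite, MemRead); a multiplexor controlled by MemtoReg selects the memory's Read data or the ALU result as the register file's Write data (RegWrite)]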
Control
• Selecting the operations to perform (ALU, read/write, etc.)
• Controlling the flow of data (multiplexor inputs)
• Information comes from the 32 bits of the instruction
• Example: add $8, $17, $18
• Instruction Format:
  op      rs     rt     rd     shamt  funct
  000000  10001  10010  01000  00000  100000
• ALU's operation based on instruction type and function code
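The field layout can be checked by slicing the 32-bit word for add $8, $17, $18 with shifts and masks. A minimal sketch; decode_r_type is a hypothetical helper name.

```python
def decode_r_type(word):
    """Split a 32-bit MIPS R-format instruction into its six fields."""
    return {
        "op":    (word >> 26) & 0x3F,  # bits 31-26
        "rs":    (word >> 21) & 0x1F,  # bits 25-21
        "rt":    (word >> 16) & 0x1F,  # bits 20-16
        "rd":    (word >> 11) & 0x1F,  # bits 15-11
        "shamt": (word >> 6) & 0x1F,   # bits 10-6
        "funct": word & 0x3F,          # bits 5-0
    }


if __name__ == "__main__":
    # add $8, $17, $18 written field by field, as on the slide
    word = 0b000000_10001_10010_01000_00000_100000
    print(hex(word), decode_r_type(word))
```

The decoded fields give rs = 17 and rt = 18 as the sources and rd = 8 as the destination, matching the assembly operands in reverse order.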
Data Path and Control-2
Five Execution Steps
• Instruction Fetch
• Instruction Decode and Register Fetch
• Execution, Memory Address Computation, or Branch Completion
• Memory Access or R-type instruction completion
• Write-back step
INSTRUCTIONS TAKE FROM 3 - 5 CYCLES!
Step 1: Instruction Fetch
• Use PC to get instruction and put it in the Instruction Register.
• Increment the PC by 4 and put the result back in the PC.
• Can be described succinctly using RTL "Register-Transfer Language"
IR = Memory[PC];
PC = PC + 4;
Can we figure out the values of the control signals?
What is the advantage of updating the PC now?
Step 2: Instruction Decode and Register Fetch
• Read registers rs and rt in case we need them
• Compute the branch address in case the instruction is a branch
• RTL:
A = Reg[IR[25-21]];
B = Reg[IR[20-16]];
ALUOut = PC + (sign-extend(IR[15-0]) << 2);
• We aren't setting any control lines based on the instruction type (we are busy "decoding" it in our control logic)
Step 3 (instruction dependent)
• ALU is performing one of three functions, based on instruction type
• Memory Reference:
ALUOut = A + sign-extend(IR[15-0]);
• R-type:
ALUOut = A op B;
• Branch:
if (A==B) PC = ALUOut;
Step 4 (R-type or memory-access)
• Loads and stores access memory
MDR = Memory[ALUOut];
or
Memory[ALUOut] = B;
• R-type instructions finish
Reg[IR[15-11]] = ALUOut;
The write actually takes place at the end of the cycle on the edge
Write-back step
• Reg[IR[20-16]] = MDR;
What about all the other instructions?
Control: when is each signal asserted?
[Figure: single-cycle datapath with a Control unit. Instruction [31–26] feeds Control, which drives RegDst, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, and RegWrite; Instruction [25–21] and Instruction [20–16] select the read registers; a multiplexor chooses Instruction [20–16] or Instruction [15–11] as Write register; Instruction [15–0] is sign-extended from 16 to 32 bits; Instruction [5–0] feeds ALU control; multiplexors select the second ALU operand, the register write data, and the next PC]

 Instruction | RegDst | ALUSrc | MemtoReg | RegWrite | MemRead | MemWrite | Branch | ALUOp1 | ALUOp0
 R-format    |   1    |   0    |    0     |    1     |    0    |    0     |   0    |   1    |   0
 lw          |   0    |   1    |    1     |    1     |    1    |    0     |   0    |   0    |   0
 sw          |   X    |   1    |    X     |    0     |    0    |    1     |   0    |   0    |   0
 beq         |   X    |   0    |    X     |    0     |    0    |    0     |   1    |   0    |   1
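The control table is a pure lookup from instruction class to signal values, which a dictionary captures directly. A minimal sketch: the names SIGNALS, CONTROL_TABLE, and control are illustrative, and the don't-care entries (X) are represented as None.

```python
X = None  # don't care

SIGNALS = ("RegDst", "ALUSrc", "MemtoReg", "RegWrite",
           "MemRead", "MemWrite", "Branch", "ALUOp1", "ALUOp0")

# One row per instruction class, in the same column order as SIGNALS.
CONTROL_TABLE = {
    "R-format": (1, 0, 0, 1, 0, 0, 0, 1, 0),
    "lw":       (0, 1, 1, 1, 1, 0, 0, 0, 0),
    "sw":       (X, 1, X, 0, 0, 1, 0, 0, 0),
    "beq":      (X, 0, X, 0, 0, 0, 1, 0, 1),
}


def control(instruction_class):
    """Return the control signal settings for one instruction class."""
    return dict(zip(SIGNALS, CONTROL_TABLE[instruction_class]))


if __name__ == "__main__":
    for name in CONTROL_TABLE:
        print(name, control(name))
```

Reading a row explains the instruction: for lw, ALUSrc = 1 picks the sign-extended offset, MemRead = 1 reads memory, and MemtoReg = 1 with RegWrite = 1 sends the loaded word to the register file.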
Summary:
Step: Instruction fetch (all instructions)
  IR = Memory[PC]
  PC = PC + 4
Step: Instruction decode/register fetch (all instructions)
  A = Reg[IR[25-21]]
  B = Reg[IR[20-16]]
  ALUOut = PC + (sign-extend(IR[15-0]) << 2)
Step: Execution, address computation, branch/jump completion
  R-type instructions:           ALUOut = A op B
  Memory-reference instructions: ALUOut = A + sign-extend(IR[15-0])
  Branches:                      if (A == B) then PC = ALUOut
  Jumps:                         PC = PC[31-28] || (IR[25-0] << 2)
Step: Memory access or R-type completion
  R-type instructions:           Reg[IR[15-11]] = ALUOut
  Memory-reference instructions: Load: MDR = Memory[ALUOut] or Store: Memory[ALUOut] = B
Step: Memory read completion
  Memory-reference instructions: Load: Reg[IR[20-16]] = MDR
How the control circuit is built
• The values of the control signals depend on:
– which instruction is being executed
– which step of the instruction is being performed
• Draw the FSM (Finite State Machine) given in the book to work out which signal must be generated when, then implement it as one of:
– PLA
– ROM
– microprogramming
ROM Implementation
• ROM = "Read Only Memory"
– values of memory locations are fixed ahead of time
• A ROM can be used to implement a truth table
– if the address is m bits, we can address 2^m entries in the ROM.
– our outputs are the bits of data that the address points to.
[Figure: ROM block with an m-bit address input and an n-bit data output]

 address (m = 3) | data (n = 4)
 0 0 0           | 0 0 1 1
 0 0 1           | 1 1 0 0
 0 1 0           | 1 1 0 0
 0 1 1           | 1 0 0 0
 1 0 0           | 0 0 0 0
 1 0 1           | 0 0 0 1
 1 1 0           | 0 1 1 0
 1 1 1           | 0 1 1 1

m is the "height", and n is the "width"
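A ROM used as a truth table is simply an array indexed by the address bits. The sketch below stores the eight 4-bit entries from the table; ROM and rom_read are illustrative names.

```python
# One n = 4 bit data word per address, for m = 3 address bits.
ROM = [
    0b0011,  # address 000
    0b1100,  # address 001
    0b1100,  # address 010
    0b1000,  # address 011
    0b0000,  # address 100
    0b0001,  # address 101
    0b0110,  # address 110
    0b0111,  # address 111
]


def rom_read(address, m=3):
    """Return the data word stored at an m-bit address."""
    if not 0 <= address < 2 ** m:
        raise ValueError("address does not fit in m bits")
    return ROM[address]


if __name__ == "__main__":
    for address in range(2 ** 3):
        print(format(address, "03b"), format(rom_read(address), "04b"))
```

Every possible input combination needs its own entry, so the storage doubles with each additional address bit.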
ROM Implementation
• How many inputs are there?
  6 bits for opcode, 4 bits for state = 10 address lines (i.e., 2^10 = 1024 different addresses)
• How many outputs are there?
  16 datapath-control outputs, 4 state bits = 20 outputs
• ROM is 2^10 x 20 = 20K bits (and a rather unusual size)
• Rather wasteful, since for lots of the entries, the outputs are the same
– i.e., opcode is often ignored
Microprogramming
[Figure: microprogrammed control unit. The microcode memory outputs the datapath control signals PCWrite, PCWriteCond, IorD, MemRead, MemWrite, IRWrite, BWrite, MemtoReg, PCSource, ALUOp, ALUSrcB, ALUSrcA, RegWrite, and RegDst, plus AddrCtl; its input comes from the microprogram counter, which is updated through an adder (adding 1) and the address select logic driven by Op[5–0], the opcode field of the instruction register]
• What are the microinstructions?
Microcode: Trade-offs
• Distinction between specification and implementation is sometimes blurred
• Specification Advantages:
– Easy to design and write
– Design architecture and microcode in parallel
• Implementation (off-chip ROM) Advantages
– Easy to change since values are in memory
– Can emulate other architectures
– Can make use of internal registers
• Implementation Disadvantages, SLOWER now that:
– Control is implemented on same chip as processor
– ROM is no longer faster than RAM
– No need to go back and make changes