
Computer Organization
Chapter 4
Prof. Qi Tian
Fall 2013
1
Topics
• Dec. 6 (Friday)
– Final Exam Review
– Record Check
• Dec. 4 (Wednesday)
– 5 variable Karnaugh Map
– Quiz 5
• Dec. 2 (Monday)
– 3, 4 variables Karnaugh Map
– Reminder:
  • Assignment 6 is due (extended) on Wednesday, Dec. 4.
  • Last quiz on Wednesday, Dec. 4.
  • Final exam review on Friday.
  • Course evaluation: complete ASAP, by Dec. 2.
2
Topics
• Nov. 27 (Wednesday)
– Minimum sum-of-product solution
– 2 variable Karnaugh map
• Nov. 25 (Monday)
– Truth Table
– Minterm and Maxterm
• Nov. 22 (Friday)
– Practice Problems 4.2
– Digital Logic Design
• Function Complete
3
Topics
• Nov. 20 (Wednesday)
– Midterm Exam Two
– Practice Problems 4.1
• Nov. 18 (Monday)
– Guest Lecture by Prof. Dakai Zhu
• Nov. 13 (Wednesday)
– Y86 Instruction Set
– Slides 1-13
4
Section 4.1 The Y86 Instruction Set Architecture
• We will look at an assembly-language instruction set, Y86
– Simpler than IA32 but similar to it
• Compared to IA32, Y86 has fewer data types, instructions, and addressing modes.
• Y86 is inspired by the IA32 instruction set, which is colloquially referred to as “x86”
– Goal: understand how it is encoded, and how you would build hardware to implement it.
5
Section 4.1.1 Programmer-Visible State
[Figure: Y86 programmer-visible state]
RF (program registers): %eax, %ecx, %edx, %ebx, %esi, %edi, %esp, %ebp
CC (condition codes): ZF, SF, OF
Stat: program status
DMEM: memory
PC: program counter
The Y86 has
• 8 32-bit registers with the same names as the IA32 32-bit registers
• 3 condition codes: ZF, SF, OF (no carry flag; integers are interpreted as signed)
• A program counter (PC)
• A program status byte: AOK, HLT, ADR, INS
• Memory: up to 4 GB to hold program and data
The Y86 does not have
• A carry flag
• Floating point registers
6
Section 4.1.1 Programmer-Visible State
• Register %esp is used as the stack pointer by the push, pop, call, and return instructions.
• Other registers do not have fixed meanings or values.
• The single-bit condition codes ZF, SF, and OF store information about the effect of the most recent arithmetic or logical instruction.
• The program counter (PC) holds the address of the instruction currently being executed.
• Memory is conceptually a large array of bytes, holding both program and data.
• The status code Stat indicates the overall state of program execution: either normal operation, or that some sort of exception has occurred.
7
Section 4.1.2 Y86 instruction
Y86 instruction set
• Instruction encodings range between 1 and 6 bytes
• An instruction consists of
– a 1-byte instruction specifier
– possibly a 1-byte register specifier
– possibly a 4-byte constant word
• Field fn specifies a particular integer operation (OP1), data movement condition (cmovXX), or branch condition (jXX).
• All numeric values are shown in hexadecimal.
8
Section 4.1.3 Instruction Encoding
• rA or rB represent one of the registers, encoded as follows:
Number  Register Name
0       %eax
1       %ecx
2       %edx
3       %ebx
4       %esp
5       %ebp
6       %esi
7       %edi
F       No register
• Different opcodes for 4 types of moves:
o (rr) Register to register
o (ir) immediate to register
o (rm) register to memory
o (mr) memory to register
9
Section 4.1.3 Instruction Encoding
• The only memory addressing mode is base register + displacement
– No second register and no scaling factor
• Memory operations always move 4 bytes (no byte or 2-byte word memory operations)
• Source or destination of memory move must be a register.
• The operations supported (OP1) are:
fn operation
0 addl
1 subl
2 andl
3 xorl
• Only 32-bit operations; there is no "or" and no "not".
• These take only registers as operands and work only on 32 bits.
10
Section 4.1.3 Instruction Encoding
• 7 jump instructions:
fn jump
0 jmp
1 jle
2 jl
3 je
4 jne
5 jge
6 jg
• 6 conditional move instructions with encodings similar to the conditional jump instructions.
– Similar to IA32
– Note that rrmovl is a special case: it shares the move code with fn = 0 (unconditional).
• You can tell the type of instruction and how many bytes it has by looking at the first byte of the instruction.
11
Figure 4.3. Function codes for Y86 instruction set
Operations    Branches               Moves
addl 6 0      jmp 7 0    jne 7 4     rrmovl 2 0    cmovne 2 4
subl 6 1      jle 7 1    jge 7 5     cmovle 2 1    cmovge 2 5
andl 6 2      jl  7 2    jg  7 6     cmovl  2 2    cmovg  2 6
xorl 6 3      je  7 3                cmove  2 3
• The code specifies a particular integer operation, branch condition, or data
transfer condition.
• These instructions are shown as OP1, jXX, and cmovXX in Figure 4.2
12
Summary of Section 4.1.2-4.1.3: Y86 instruction set
Program register identifiers
Number  Register Name
0       %eax
1       %ecx
2       %edx
3       %ebx
4       %esp
5       %ebp
6       %esi
7       %edi
F       No register

Operations supported
fn  operation
0   addl
1   subl
2   andl
3   xorl

7 jump functions
fn  jump
0   jmp
1   jle
2   jl
3   je
4   jne
5   jge
6   jg

Function codes
Operations    Branches               Moves
addl 6 0      jmp 7 0    jne 7 4     rrmovl 2 0    cmovne 2 4
subl 6 1      jle 7 1    jge 7 5     cmovle 2 1    cmovge 2 5
andl 6 2      jl  7 2    jg  7 6     cmovl  2 2    cmovg  2 6
xorl 6 3      je  7 3                cmove  2 3
Section 4.1.2 Y86 instruction
• Y86 is largely a subset of the IA32 instruction set.
• It includes only 4-byte integer operations, has fewer addressing modes, and includes a smaller set of operations.
• Since we only use 4-byte data, we can refer to these as “words” without ambiguity.
14
Instruction Encoding Examples
1. rrmovl %eax, %ecx
The encoding is: 2001
This would be stored in 2 bytes of memory, the first containing 0x20 and the second containing 0x01.
2. rmmovl %ecx, 0x24(%ebp)
The encoding is: 401524000000
The first two bytes are 0x40 and 0x15, and the displacement is 0x24. On a little-endian machine, the next byte would be 0x24, followed by 3 bytes of 0.
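The two encodings above can be reproduced with a short Python sketch. The helper names `encode_rrmovl` and `encode_rmmovl` are hypothetical (they are not part of any Y86 toolchain); the opcode bytes 0x20 and 0x40 and the register numbers are taken from the slides.

```python
import struct

# Register IDs from the slides' table (0xF would mean "no register").
REG = {"%eax": 0, "%ecx": 1, "%edx": 2, "%ebx": 3,
       "%esp": 4, "%ebp": 5, "%esi": 6, "%edi": 7}

def encode_rrmovl(ra: str, rb: str) -> bytes:
    """rrmovl rA, rB: byte 0x20, then one byte holding rA:rB."""
    return bytes([0x20, (REG[ra] << 4) | REG[rb]])

def encode_rmmovl(ra: str, disp: int, rb: str) -> bytes:
    """rmmovl rA, D(rB): byte 0x40, rA:rB, then D as 4 little-endian bytes."""
    return bytes([0x40, (REG[ra] << 4) | REG[rb]]) + struct.pack("<i", disp)

print(encode_rrmovl("%eax", "%ecx").hex())        # 2001
print(encode_rmmovl("%ecx", 0x24, "%ebp").hex())  # 401524000000
```

Note that the displacement is written in hexadecimal here: a decimal displacement of 24 would be stored as 18000000 instead.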
15
Practice Problem 4.1
Determine the byte encoding of the Y86 instruction sequence that follows. The line “.pos 0x100” indicates that the starting address of the object code should be 0x100.
.pos 0x100                   # start code at address 0x100
    irmovl $15, %ebx         # load 15 into %ebx
    rrmovl %ebx, %ecx        # copy 15 to %ecx
loop:                        # loop
    rmmovl %ecx, -3(%ebx)    # save %ecx at address 15-3=12
    addl %ebx, %ecx          # increment %ecx by 15
    jmp loop                 # goto loop
16
Practice Problem 4.1 - Solution
.pos 0x100                                        # start code at address 0x100
0x100: 30f30f000000      irmovl $15, %ebx         # load 15 into %ebx
0x106: 2031              rrmovl %ebx, %ecx        # copy 15 to %ecx
0x108:                 loop:                      # loop
0x108: 4013fdffffff      rmmovl %ecx, -3(%ebx)    # save %ecx at address 15-3=12
0x10e: 6031              addl %ebx, %ecx          # increment %ecx by 15
0x110: 7008010000        jmp loop                 # goto loop
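The addresses and constant fields in this solution follow from the instruction lengths and from little-endian two's-complement encoding. The following is a minimal Python sketch of that arithmetic; the `LENGTH` table and the `layout` helper are illustrative names, not part of any assembler.

```python
import struct

# Instruction lengths in bytes: 1 specifier byte, plus an optional
# register byte and an optional 4-byte constant word.
LENGTH = {"irmovl": 6, "rrmovl": 2, "rmmovl": 6, "addl": 2, "jmp": 5}

def layout(start: int, mnemonics: list) -> list:
    """Return the starting address of each instruction in sequence."""
    addrs, pc = [], start
    for m in mnemonics:
        addrs.append(pc)
        pc += LENGTH[m]
    return addrs

addrs = layout(0x100, ["irmovl", "rrmovl", "rmmovl", "addl", "jmp"])
print([hex(a) for a in addrs])

# Displacement -3 as a 4-byte little-endian two's-complement word.
print(struct.pack("<i", -3).hex())        # fdffffff
# Jump target: the address of "loop" (the rmmovl instruction).
print(struct.pack("<I", addrs[2]).hex())  # 08010000
```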
17
Practice Problem 4.2
For each byte sequence listed, determine the Y86 instruction sequence it encodes. If there is some invalid byte in the sequence, show the instruction sequence up to that point and indicate where the invalid value occurs. For each sequence, we show the starting address, then a colon, and then the byte sequence.
A. 0x100: 30f3fcffffff40630008000000
B. 0x200: a06f80080200000030f30a00000090
18
Practice Problem 4.2 - Solution
A. 0x100: 30f3fcffffff40630008000000
Note: -4 = 0xfffffffc
0x100: irmovl $-4, %ebx
0x106: rmmovl %esi, 0x800(%ebx)
0x10c: halt
B. 0x200: a06f80080200000030f30a00000090
0x200: pushl %esi
0x202: call proc
0x207: halt
0x208: proc:
0x208: irmovl $10, %ebx
0x20e: ret
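The decoding can be checked mechanically. Below is a minimal Python sketch of a disassembler for only the six instructions that occur in this problem; the function name `disassemble` is hypothetical, the opcode bytes are read off the encodings shown in these slides, and the sketch assumes the byte string is not truncated mid-instruction.

```python
import struct

REGS = ["%eax", "%ecx", "%edx", "%ebx", "%esp", "%ebp", "%esi", "%edi"]

def disassemble(start: int, hex_bytes: str) -> list:
    """Decode halt, ret, irmovl, rmmovl, call, and pushl; stop at any other byte."""
    code = bytes.fromhex(hex_bytes)
    out, i = [], 0
    while i < len(code):
        addr, op = start + i, code[i]
        if op == 0x00:
            text, size = "halt", 1
        elif op == 0x90:
            text, size = "ret", 1
        elif op == 0x30:                       # irmovl V, rB
            rb = code[i + 1] & 0xF
            (v,) = struct.unpack_from("<i", code, i + 2)
            text, size = f"irmovl ${v}, {REGS[rb]}", 6
        elif op == 0x40:                       # rmmovl rA, D(rB)
            ra, rb = code[i + 1] >> 4, code[i + 1] & 0xF
            (d,) = struct.unpack_from("<i", code, i + 2)
            text, size = f"rmmovl {REGS[ra]}, {d:#x}({REGS[rb]})", 6
        elif op == 0x80:                       # call Dest
            (dest,) = struct.unpack_from("<I", code, i + 1)
            text, size = f"call {dest:#x}", 5
        elif op == 0xA0:                       # pushl rA
            text, size = f"pushl {REGS[code[i + 1] >> 4]}", 2
        else:
            out.append((addr, f"invalid byte {op:#04x}"))
            break
        out.append((addr, text))
        i += size
    return out

for addr, text in disassemble(0x200, "a06f80080200000030f30a00000090"):
    print(f"{addr:#05x}: {text}")
```

The call target prints as a raw address (0x208); the label name proc in the solution is a human-chosen name for that address.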
19
Y86 vs IA32
• Encodings of the Y86 are simpler than those of IA32, but not as compact.
• IA32 is sometimes labeled as CISC and is deemed to be the opposite of RISC.
• RISC and CISC
– RISC = reduced instruction set computer
– CISC = complex instruction set computer
– Basic ideas of RISC
  • Small number of instructions
  • Most instructions have the same length
  • Simple addressing formats
  • Arithmetic and logical operations only work on registers
  • Memory operations only move between register and memory
  • No condition codes: test instructions store results in registers.
– Long controversy between RISC and CISC since the 1980s (read textbook pp. 342-344)
– Which is better?
Answer: A combination
– Which is Y86? It includes attributes of both RISC and CISC
  • On the CISC side, it has condition codes, variable-length instructions, and stack-intensive procedure linkages.
  • On the RISC side, it uses a load-store architecture and a regular encoding.
  • It can be viewed as taking IA32 and simplifying it by applying the principles of RISC.
20
Section 4.1.4 Y86 Exceptions
• What happens when an invalid assembly instruction is found?
– This generates an exception.
– In Y86, an exception halts the machine: it stops executing.
– What are some possible causes of exceptions?
  • Invalid operation
  • Divide by 0
  • Sqrt of negative number
  • Memory access error (e.g., address too large)
  • Hardware error
21
Section 4.1.4 Y86 Exceptions
Value  Name  Meaning
1      AOK   Normal operation
2      HLT   Halt instruction encountered
3      ADR   Invalid address encountered
4      INS   Invalid instruction encountered
Y86 status codes. In our design, the processor halts for any code other than AOK.
22
Y86 Examples
• Example 1:
– IA32: addl (%ecx), %eax
– Y86: cannot be done in one instruction; 2 instructions are needed:
    mrmovl (%ecx), %esi
    addl %esi, %eax
• Example 2:
– IA32: addl $4, %ecx
– Y86:
    irmovl $4, %ebx
    addl %ebx, %ecx
• Example 3:
– IA32: addl (%ebx, %edx, 4), %eax
– Y86: How many Y86 instructions are needed to do this?
23
Section 4.2 Logic Design
• Section 4.2.1 Logic gates
[Figure: AND, OR, and NOT gate symbols]
– Logic gate:
  • Simplest building block, 1-2 inputs and 1 output
  • Computes a Boolean function such as AND, OR, or NOT
• Hardware Description Language (HDL)
– Currently, circuits are designed using an HDL.
– Much like C code: for example, an AND gate is represented by a && b
24
Section 4.2.2 Combinational Circuits
• Combinational circuits
– Have no memory, in contrast to clocked sequential circuits, which have memory
– Building blocks: logic gates
– Design an economical circuit
  • Algebraic methods for simplification
  • Karnaugh maps
25
Section 4.2.2 Combinational Circuits
• Example: bit equal
bool eq = (a && b) || (!a && !b)
26
Section 4.2.2 Combinational Circuits
• We can make a multi-bit equality circuit out of 1-bit equality circuits
[Figure: block diagram]
27
Example: 1-bit multiplexer
• It allows you to select one of two one-bit inputs (a data selector) and is described by:
bool out = (s && a) || (!s && b)
• Behavior: s = 1, out = a; s = 0, out = b.
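The mux expression can be checked against its intended behavior by enumerating all eight input combinations. This is a small Python sketch; the function name `mux1` is chosen here for illustration.

```python
from itertools import product

def mux1(s: bool, a: bool, b: bool) -> bool:
    """1-bit multiplexer: out = (s && a) || (!s && b)."""
    return (s and a) or ((not s) and b)

# Exhaustive truth-table check: the output equals a when s is 1, else b.
for s, a, b in product([False, True], repeat=3):
    expected = a if s else b
    assert mux1(s, a, b) == expected
    print(int(s), int(a), int(b), "->", int(mux1(s, a, b)))
```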
28
Example: a multi-bit multiplexer
• We can make a multi-bit (word-level) mux out of 1-bit muxes.
HCL description of the mux:
int Out = [
    s: A;
    1: B;
];
[…] is like a select: if s is true, the result is A. Otherwise, we check the next case; 1 is always true, so we select B.
29
Example: 4-word MUX
• Here is a 4-word mux (4-way mux)
HCL description:
int Out4 = [
    !s1 && !s0 : A;
    !s1        : B;
    !s0        : C;
    1          : D;
];
s1s0  out
00    A
01    B
10    C
11    D
Question: How many control inputs would be needed for a 7-way mux?
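The first-match semantics of the HCL case expression for the 4-way mux can be imitated in Python. This is only a sketch of the selection logic, not HCL itself; the helper names `case` and `mux4` are made up for this example.

```python
def case(*arms):
    """Evaluate an HCL-style case expression: the first true selector wins."""
    for selector, value in arms:
        if selector:
            return value
    raise ValueError("no arm selected")

def mux4(s1, s0, A, B, C, D):
    return case(
        (not s1 and not s0, A),   # s1s0 = 00
        (not s1, B),              # s1s0 = 01 (00 was already handled)
        (not s0, C),              # s1s0 = 10
        (True, D),                # s1s0 = 11 (default arm, like "1" in HCL)
    )

for s1 in (0, 1):
    for s0 in (0, 1):
        print(s1, s0, mux4(s1, s0, "A", "B", "C", "D"))
```

Because arms are tested in order, the later selectors can be simpler than a full decode of s1s0.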
30
Other Gates and Basic Building Blocks
• XOR gate:
Out = a^b = !a && b || a && !b
31
Function Complete
• Functionally complete:
– Any circuit that can be made from AND, OR, and NOT gates can also be made using just AND and NOT gates, or just OR and NOT gates
– Because: a || b = !(!a && !b); a && b = !(!a || !b)
– The functionally complete sets: (and, or, not), (and, not), (or, not)
– Can any single gate be used as a functionally complete set?
Ans: Yes, the NAND gate and the NOR gate.
Questions:
Prove that {NAND} and {NOR} are each functionally complete.
32
Function Complete
• Proof of functional completeness for {NAND}
• Proof of functional completeness for {NOR}
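One standard way to prove completeness is to build NOT, AND, and OR out of the single gate. The Python sketch below states those constructions and verifies them by exhaustive truth table; the function names are chosen here for illustration.

```python
from itertools import product

def nand(a, b): return not (a and b)
def nor(a, b): return not (a or b)

# NOT, AND, OR built from NAND only.
def not_nand(a): return nand(a, a)
def and_nand(a, b): return nand(nand(a, b), nand(a, b))
def or_nand(a, b): return nand(nand(a, a), nand(b, b))

# NOT, OR, AND built from NOR only.
def not_nor(a): return nor(a, a)
def or_nor(a, b): return nor(nor(a, b), nor(a, b))
def and_nor(a, b): return nor(nor(a, a), nor(b, b))

# Verify every construction against Python's own operators.
for a, b in product([False, True], repeat=2):
    assert not_nand(a) == (not a) and not_nor(a) == (not a)
    assert and_nand(a, b) == (a and b) and and_nor(a, b) == (a and b)
    assert or_nand(a, b) == (a or b) and or_nor(a, b) == (a or b)
print("NAND and NOR each reproduce NOT, AND, and OR")
```

Since (and, or, not) is functionally complete, reproducing all three from one gate type shows that gate alone is functionally complete.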
33
Adders
• 1-bit Half Adder: 2 inputs (A, B) and 2 outputs (S, C)
Truth Table
A  B  C  S  Note
0  0  0  0  0+0=0
0  1  0  1  0+1=1
1  0  0  1  1+0=1
1  1  1  0  1+1=2
S = A^B
C = A&&B
34
1-bit Full Adders
• 1-bit Full Adder: 3 inputs (A,B,Cin) and 2 outputs (S, Cout)
Truth Table
A  B  Cin  Cout  S
0  0  0    0     0
0  0  1    0     1
0  1  0    0     1
0  1  1    1     0
1  0  0    0     1
1  0  1    1     0
1  1  0    1     0
1  1  1    1     1
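The full-adder truth table can be regenerated from two half adders and an OR gate, which is one common construction (the slides do not say which circuit Assignment 6 expects, so treat this as an illustrative sketch). Bits are represented as Python ints 0 and 1.

```python
from itertools import product

def half_adder(a: int, b: int):
    """Return (S, C) with S = A ^ B and C = A && B."""
    return a ^ b, a & b

def full_adder(a: int, b: int, cin: int):
    """Return (S, Cout), built from two half adders and an OR gate."""
    s1, c1 = half_adder(a, b)
    s, c2 = half_adder(s1, cin)
    return s, c1 | c2

print("A B Cin Cout S")
for a, b, cin in product([0, 1], repeat=3):
    s, cout = full_adder(a, b, cin)
    # The two output bits together encode the arithmetic sum.
    assert a + b + cin == 2 * cout + s
    print(a, b, cin, " ", cout, "  ", s)
```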
35
1-bit Full Adder
• Assignment 6
36
Class Notes
Topics:
• Minterm mi
• Maxterm Mi
• Standard sum-of-product
• Standard product-of-sum
• Karnaugh-Map
– Minimum sum-of-product
– Minimum product-of-sum
Note: See the class notes on the course web page under the Resources link.
37
Karnaugh Maps
• Design:
– Start from truth table => Karnaugh map => Boolean expressions
• Karnaugh map
– Useful tool for simplifying and manipulating switching functions of three or four variables.
– Similar to truth tables, but in a different representation.
38
4-bit Full Adder
• A 4-bit full adder takes as input two 4-bit numbers and a carry-in, and produces 5 bits of output.
– Input: 9 bits
– Output: 5 bits
• How to design it?
– Using a truth table? How big is it?
– Not efficient for a Karnaugh map.
39
4-bit Full Adder
• A 4-bit full adder takes as input two 4-bit numbers and a carry-in, and produces 5 bits of output.
• It can be designed as a cascade of 1-bit full adders!
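The cascade idea can be sketched in Python as a ripple-carry adder: each 1-bit full adder passes its carry-out to the next stage. The helper names (`add4`, `to_bits`, `from_bits`) are illustrative. With 9 input bits, a direct truth table would have 2^9 = 512 rows, which is exactly the number of cases the check below enumerates.

```python
def full_adder(a: int, b: int, cin: int):
    """1-bit full adder: returns (S, Cout)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout

def add4(a_bits, b_bits, cin=0):
    """Cascade four full adders. Bit lists are least-significant bit first.
    Returns 5 output bits: four sum bits followed by the final carry."""
    carry, out = cin, []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        out.append(s)
    return out + [carry]

def to_bits(n: int, width: int = 4):
    return [(n >> i) & 1 for i in range(width)]

def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))

# Exhaustive check over all 16 * 16 * 2 = 512 input combinations.
for x in range(16):
    for y in range(16):
        for cin in (0, 1):
            assert from_bits(add4(to_bits(x), to_bits(y), cin)) == x + y + cin
print(from_bits(add4(to_bits(9), to_bits(7))))  # 16
```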
40
A little more about Logic Design
• Propagation Delay
– Real gates are made from transistors; voltages are used to represent the Boolean values true (1) and false (0).
– A voltage greater than a true-threshold is true, and a voltage less than a false-threshold is false.
– Voltages between these two thresholds give undefined results.
– When you change the input from high to low, it takes some time, called the propagation delay or gate delay, for the output voltage to reach its correct value.
– Propagation delay determines how fast your CPU can run.
41
ALU (Arithmetic Logic Unit)
• An ALU is a circuit that can produce one of several arithmetic (add, subtract, etc.) or logical (and, or, etc.) functions.
[Figure: ALU design, block diagram of this ALU]
42
Section 4.2.5 Memory and Clocking
• So far, we have talked about combinational
circuits
• Clocked sequential circuits:
– Have memory and a clock input
– Flip-flops
  • S-R flip-flop, D flip-flop, J-K flip-flop, T flip-flop, edge-triggered D flip-flop, and the building block of a multi-bit register
43
Section 4.3.1 Organizing Processing
Steps into Stages
• SEQ: a “sequential” processor
• Processing an instruction involves a number of
operations, and we organize them in a particular
sequence of stages, attempting to make all
instructions follow a uniform sequence.
– Design a processor that makes best use of the
hardware.
44
SEQ Hardware Structure
• The computations required to implement all of the Y86 instructions can be organized into six basic stages: fetch, decode, execute, memory, write back, and PC update.
• See Figure 4.22 for a better-quality diagram.
45
An Informal Description
• Fetch
– Read the instruction from memory using the address in the PC.
• Decode
– If possible, read the values from the register file and set valA and valB.
– The registers are specified by rA and rB, except for push and pop, which use %esp in place of rB.
• Execute
– What it does depends on the icode.
– Some instructions feed values into the ALU to obtain a valE and possibly set the condition codes, e.g., OP1, rmmovl, mrmovl.
– Some instructions check the condition codes to decide whether the next PC is valP.
• Memory
– May read from or write to memory.
• Write back
– May write up to two values to the register file.
– Pop will update both the stack pointer and the register popped into.
• PC update
– PC is set to the address of the next instruction: valP, or valC or valM for control transfers.
46
Sample Y86 instruction sequence
Stage       OP1 rA, rB              rrmovl rA, rB           irmovl V, rB
Fetch       icode:ifun ← M1[PC]     icode:ifun ← M1[PC]     icode:ifun ← M1[PC]
            rA:rB ← M1[PC+1]        rA:rB ← M1[PC+1]        rA:rB ← M1[PC+1]
                                                            valC ← M4[PC+2]
            valP ← PC+2             valP ← PC+2             valP ← PC+6
Decode      valA ← R[rA]            valA ← R[rA]
            valB ← R[rB]
Execute     valE ← valB OP valA     valE ← 0 + valA         valE ← 0 + valC
            Set CC
Memory
Write back  R[rB] ← valE            R[rB] ← valE            R[rB] ← valE
PC update   PC ← valP               PC ← valP               PC ← valP
Figure 4.18 Computations in sequential implementation of Y86 instructions OP1, rrmovl, and irmovl.
OP1: integer and logical operations; rrmovl: register-to-register move; irmovl: immediate-to-register move.
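The OP1 column can be walked through in Python. This is a sketch rather than the SEQ hardware: registers are a plain dict keyed by register number, `run_op1` and `to_signed32` are hypothetical helper names, and the way OF is computed here (exact result differs from the wrapped 32-bit result) is one assumed way to model signed overflow.

```python
import operator

# OP1 function codes from the slides: 0 addl, 1 subl, 2 andl, 3 xorl.
ALU_OPS = {0: operator.add, 1: operator.sub, 2: operator.and_, 3: operator.xor}

def to_signed32(x: int) -> int:
    """Wrap an integer into the signed 32-bit range."""
    return (x + 2**31) % 2**32 - 2**31

def run_op1(ifun: int, ra: int, rb: int, regs: dict, pc: int):
    """Process one OP1 instruction; returns (new PC, (ZF, SF, OF))."""
    valP = pc + 2                              # Fetch: 2-byte instruction
    valA, valB = regs[ra], regs[rb]            # Decode
    exact = ALU_OPS[ifun](valB, valA)          # Execute: valE <- valB OP valA
    valE = to_signed32(exact)
    cc = (valE == 0, valE < 0, exact != valE)  # Set CC (assumed overflow model)
    regs[rb] = valE                            # Write back: R[rB] <- valE
    return valP, cc                            # PC update: PC <- valP

regs = {2: 9, 3: 21}                           # %edx = 9, %ebx = 21
new_pc, cc = run_op1(1, 2, 3, regs, 0x00c)     # subl %edx, %ebx at 0x00c
print(regs[3], hex(new_pc), cc)
```

Note the operand order: subl %edx, %ebx computes %ebx - %edx, because valB is the left operand.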
47
Sample Y86 instruction sequence
1.  0x000: 30f209000000 |   irmovl $9, %edx
2.  0x006: 30f315000000 |   irmovl $21, %ebx
3.  0x00c: 6123         |   subl %edx, %ebx
4.  0x00e: 30f480000000 |   irmovl $128, %esp
5.  0x014: 404364000000 |   rmmovl %esp, 100(%ebx)
6.  0x01a: a02f         |   pushl %edx
7.  0x01c: b00f         |   popl %eax
8.  0x01e: 7328000000   |   je done
9.  0x023: 8029000000   |   call proc
10. 0x028:              | done:
11. 0x028: 00           |   halt
12. 0x029:              | proc:
13. 0x029: 90           |   ret
Questions: We will trace the processing of these instructions.
48
Practice Problem
• Fill in the right-hand column of the following table to describe the processing of the irmovl instruction on line 4 of the object code in the previous slide.
Stage       Generic: irmovl V, rB     Specific: irmovl $128, %esp
Fetch       icode:ifun ← M1[PC]
            rA:rB ← M1[PC+1]
            valC ← M4[PC+2]
            valP ← PC+6
Decode
Execute     valE ← 0 + valC
Memory
Write back  R[rB] ← valE
PC update   PC ← valP
49
Practice Problem - solution
• Fill in the right-hand column of the following table to describe the processing of the irmovl instruction on line 4 of the object code in the previous slide.
Stage       Generic: irmovl V, rB     Specific: irmovl $128, %esp
Fetch       icode:ifun ← M1[PC]       icode:ifun ← M1[0x00e] = 3:0
            rA:rB ← M1[PC+1]          rA:rB ← M1[0x00f] = f:4
            valC ← M4[PC+2]           valC ← M4[0x010] = 128
            valP ← PC+6               valP ← 0x00e + 6 = 0x014
Decode
Execute     valE ← 0 + valC           valE ← 0 + 128 = 128
Memory
Write back  R[rB] ← valE              R[%esp] ← 128
PC update   PC ← valP                 PC ← 0x014
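The same trace can be replayed in Python. This is a sketch under simple assumptions: memory is a bytearray, registers are a dict keyed by register number, and the function name `step_irmovl` is made up for this example.

```python
import struct

def step_irmovl(mem, pc: int, regs: dict) -> int:
    """Process one irmovl V, rB instruction and return the new PC."""
    # Fetch
    icode, ifun = mem[pc] >> 4, mem[pc] & 0xF
    ra, rb = mem[pc + 1] >> 4, mem[pc + 1] & 0xF
    (valC,) = struct.unpack_from("<i", mem, pc + 2)
    valP = pc + 6
    if (icode, ifun, ra) != (3, 0, 0xF):
        raise ValueError("not an irmovl instruction")
    # Decode: nothing to read for irmovl
    valE = 0 + valC        # Execute
    # Memory: no access for irmovl
    regs[rb] = valE        # Write back: R[rB] <- valE
    return valP            # PC update: PC <- valP

mem = bytearray(0x20)
mem[0x00e:0x014] = bytes.fromhex("30f480000000")   # line 4 of the listing
regs = {}
new_pc = step_irmovl(mem, 0x00e, regs)
print(regs[4], hex(new_pc))                        # register 4 is %esp
```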
50
Sample Y86 instruction sequence
Stage       rmmovl rA, D(rB)        mrmovl D(rB), rA
Fetch       icode:ifun ← M1[PC]     icode:ifun ← M1[PC]
            rA:rB ← M1[PC+1]        rA:rB ← M1[PC+1]
            valC ← M4[PC+2]         valC ← M4[PC+2]
            valP ← PC+6             valP ← PC+6
Decode      valA ← R[rA]
            valB ← R[rB]            valB ← R[rB]
Execute     valE ← valB + valC      valE ← valB + valC
Memory      M4[valE] ← valA         valM ← M4[valE]
Write back                          R[rA] ← valM
PC update   PC ← valP               PC ← valP
Figure 4.19 Computations in sequential implementation of Y86 instructions rmmovl and mrmovl. These instructions read or write memory.
51
Sample Y86 instruction sequence
Stage       pushl rA                popl rA
Fetch       icode:ifun ← M1[PC]     icode:ifun ← M1[PC]
            rA:rB ← M1[PC+1]        rA:rB ← M1[PC+1]
            valP ← PC+2             valP ← PC+2
Decode      valA ← R[rA]            valA ← R[%esp]
            valB ← R[%esp]          valB ← R[%esp]
Execute     valE ← valB + (-4)      valE ← valB + 4
Memory      M4[valE] ← valA         valM ← M4[valA]
Write back  R[%esp] ← valE          R[%esp] ← valE
                                    R[rA] ← valM
PC update   PC ← valP               PC ← valP
Figure 4.20 Computations in sequential implementation of Y86 instructions pushl and popl. These instructions push and pop the stack.
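The push and pop stages can be traced with a small Python sketch, using the values from lines 6 and 7 of the sample listing (%edx = 9 and %esp = 128 at that point). Memory is modeled as a bytearray and registers as a dict keyed by register number; the helper names are illustrative, and the fetch and PC update stages are left out.

```python
import struct

ESP = 4  # register number of %esp

def pushl(mem, regs, ra):
    valA, valB = regs[ra], regs[ESP]              # Decode
    valE = valB + (-4)                            # Execute
    struct.pack_into("<i", mem, valE, valA)       # Memory: M4[valE] <- valA
    regs[ESP] = valE                              # Write back

def popl(mem, regs, ra):
    valA, valB = regs[ESP], regs[ESP]             # Decode
    valE = valB + 4                               # Execute
    (valM,) = struct.unpack_from("<i", mem, valA) # Memory: valM <- M4[valA]
    regs[ESP] = valE                              # Write back: stack pointer
    regs[ra] = valM                               # Write back: popped value

mem = bytearray(256)
regs = {0: 0, 2: 9, ESP: 128}     # %eax = 0, %edx = 9, %esp = 128
pushl(mem, regs, 2)               # pushl %edx
print(regs[ESP], mem[124:128].hex())
popl(mem, regs, 0)                # popl %eax
print(regs[0], regs[ESP])
```

Note that popl reads memory at the old stack pointer (valA), while pushl writes at the decremented one (valE).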
52
Sample Y86 instruction sequence
Stage       jXX Dest                call Dest               ret
Fetch       icode:ifun ← M1[PC]     icode:ifun ← M1[PC]     icode:ifun ← M1[PC]
            valC ← M4[PC+1]         valC ← M4[PC+1]
            valP ← PC+5             valP ← PC+5             valP ← PC+1
Decode                                                      valA ← R[%esp]
                                    valB ← R[%esp]          valB ← R[%esp]
Execute     Cnd ← Cond(CC, ifun)    valE ← valB + (-4)      valE ← valB + 4
Memory                              M4[valE] ← valP         valM ← M4[valA]
Write back                          R[%esp] ← valE          R[%esp] ← valE
PC update   PC ← Cnd ? valC : valP  PC ← valC               PC ← valM
Figure 4.21 Computations in sequential implementation of Y86 instructions jXX, call, and ret. These instructions cause control transfers.
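The jXX column depends on Cond(CC, ifun), which the table does not spell out. The Python sketch below assumes the usual IA32 signed-comparison conditions for the seven jump function codes; the function names are hypothetical. The example uses line 8 of the sample listing (je done at 0x01e), which follows subl producing 12, so ZF = SF = OF = 0.

```python
def cond(zf: bool, sf: bool, of: bool, ifun: int) -> bool:
    """Cnd <- Cond(CC, ifun), assuming IA32-style signed conditions."""
    lt = sf != of                  # signed "less than" after a subtraction
    table = {
        0: True,                   # jmp
        1: lt or zf,               # jle
        2: lt,                     # jl
        3: zf,                     # je
        4: not zf,                 # jne
        5: not lt,                 # jge
        6: not lt and not zf,      # jg
    }
    return table[ifun]

def jxx_next_pc(pc: int, valC: int, cc: tuple, ifun: int) -> int:
    valP = pc + 5                  # Fetch: 1 specifier byte + 4-byte destination
    cnd = cond(*cc, ifun)          # Execute
    return valC if cnd else valP   # PC update: PC <- Cnd ? valC : valP

# je done (ifun = 3) with ZF = 0: the jump is not taken.
print(hex(jxx_next_pc(0x01e, 0x028, (False, False, False), 3)))
```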
53