Multicycle Datapath Design
CMSC411/Computer Architecture
These slides and all associated material are
© 2003 by J. Six and are available only for
students enrolled in CMSC411.
I once put instant coffee in a microwave and went back in time.
CMSC411 – Computer Architecture / © 2003 J. Six
Use and Distribution Notice
Possession of any of these files implies understanding and
agreement to this policy.
The slides are provided for the use of students enrolled in Jeff
Six's Computer Architecture class (CMSC 411) at the University of
Maryland Baltimore County. They are the creation of Mr. Six and
he reserves all rights as to the slides. These slides are not to be
modified or redistributed in any way. All of these slides may only
be used by students for the purpose of reviewing the material
covered in lecture. Any other use, including but not limited to, the
modification of any slides or the sale of any slides or material, in
whole or in part, is expressly prohibited.
Most of the material in these slides, including the examples, is
derived from Computer Organization and Design, Second Edition.
Credit is hereby given to the authors of this textbook for much of
the content. This content is used here for the purpose of
presenting this material in CMSC 411, which uses this textbook.
Multicycle Implementations
Previously, we broke each instruction into a series of
steps that correspond to the functional units that
were necessary.
In a multicycle implementation, each step in the
execution of an instruction takes one clock cycle.
This allows a functional unit to be used more
than once per instruction. We require only that
each unit be used at most once per clock cycle.
This allows instructions to take different numbers
of cycles and lets functional units be shared;
this turns out to be a major advantage.
The Multicycle Datapath
Here is an abstract representation of a
multicycle datapath.
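[Figure: abstract multicycle datapath. The PC supplies the address to a single memory (instructions or data). The memory's data output goes to the instruction register and the memory data register. The register file's outputs go to the A and B registers, which feed the ALU; the ALU's output goes to ALUOut.]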
The Multicycle Datapath
This multicycle datapath has three
key differences…
- A single memory unit is used instead of separate units for instructions and data.
- The ALU and two adders have been replaced with a single ALU.
- After every major functional unit, we have added registers. These registers hold the output of that unit until the value is used in the next clock cycle. This is key to the reuse capability of the functional units in the datapath.
Data Storage
Data that is used by later instructions must be
stored in a programmer-accessible state
element. This includes the register file, the PC,
and the memory.
Data that is used later in the same instruction
must be stored in one of these additional
registers that appear after each functional unit.
These include…
- The instruction register (IR) and memory data register (MDR) save the output of a memory access.
- The A and B registers hold the output of reads from the register file.
- The ALUOut register holds the output of the ALU.
Sharing Functional Units
Since we are now reusing functional units, we
need multiplexors in front of our functional units to
select the appropriate inputs for the current use.
- For example, a multiplexor is needed in front of the memory address input of the memory unit to select between the PC (for an instruction access) and the output of the ALU (for a data access).
We end up needing two more changes…
- The first input into the ALU needs a mux (to select the A register or the PC).
- The mux on the second ALU input expands from two inputs to four (we need to add the constant 4, for PC incrementing, and the sign-extended and shifted offset field, for branch address calculation).
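The two ALU input muxes described above can be sketched in Python. This is only an illustration: the function names are made up for this example, and the select encodings (0 = PC, 1 = A for the first input; 00 = B, 01 = 4, 10 = sign-extended offset, 11 = sign-extended and shifted offset for the second) are assumed to match the control values used in the FSM slides of this deck.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def select_alu_inputs(alu_src_a, alu_src_b, pc, a, b, imm16):
    """Model the two ALU input multiplexors (hypothetical helper)."""
    # First input: the PC or the A register.
    first = pc if alu_src_a == 0 else a
    # Second input: B, the constant 4, the offset, or the shifted offset.
    offset = sign_extend16(imm16)
    second = {0b00: b, 0b01: 4, 0b10: offset, 0b11: offset << 2}[alu_src_b]
    return first, second


# PC increment: the ALU sees (PC, 4).
print(select_alu_inputs(0, 0b01, pc=100, a=7, b=9, imm16=0))
```

Only the select values change from step to step; the same ALU then does the work of the adders that the single-cycle design needed.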
The New Datapath
Here is the datapath with all of these changes.
Notice that by adding some registers and some
multiplexors, we reduce the number of memory
units to one and eliminate two adders.
[Figure: the multicycle datapath. It shows the PC, the memory address mux, the single memory (Address, Write data, MemData), the instruction register (fields [25-21], [20-16], [15-11], and [15-0]), the memory data register, the register file with its write-register and write-data muxes, the A and B registers, sign extend (16 to 32 bits), shift left 2, the two-input mux on the first ALU input, the four-input mux on the second ALU input (including the constant 4), the ALU with its Zero and result outputs, and ALUOut.]
Control Signals
We have changed the way our datapath
works; it needs new control signals.
- The programmer-visible state units need a write signal, as does the IR (it needs to hold the instruction until the end of that instruction's execution, so we do not want to write to it on every cycle like the other internal registers we added).
- The memory needs a read signal.
- The ALU control signals stay the same (lucky us).
- The newly added multiplexors need control signals.
The New Datapath
(with control signals)
Here’s the datapath with the control signals…
[Figure: the same multicycle datapath with its control signals labeled: IorD, MemRead, MemWrite, IRWrite, RegDst, RegWrite, MemtoReg, ALUSrcA, ALUSrcB, and ALUOp (into the ALU control unit, together with Instruction [5-0]).]
Adding Branches and Jumps
Our new datapath does not yet support branch and
jump instructions.
There are now three possible PC values…
- The output of the ALU. This is the value PC+4 during the instruction fetch stage.
- The ALUOut register. This is the address of a branch target, after it is computed.
- The lower 26 bits of the IR shifted left by two and concatenated with the upper 4 bits of the incremented PC. This is for a jump instruction.
Note that the PC is written unconditionally (normal
increment and jumps) or conditionally (conditional
branch). We need two new control signals, PCWrite
and PCWriteCond.
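The three PC sources listed above can be sketched as a small Python function. The names here are hypothetical, and the PCSource encodings (00 = ALU output, 01 = ALUOut, 10 = jump address) are assumed from the FSM slides in this deck.

```python
def jump_address(incremented_pc, ir):
    """Upper 4 bits of the incremented PC, then the low 26 bits of the IR shifted left by 2."""
    return (incremented_pc & 0xF0000000) | ((ir & 0x03FFFFFF) << 2)


def select_next_pc(pc_source, alu_result, alu_out, incremented_pc, ir):
    """Model the PC source mux (hypothetical helper)."""
    if pc_source == 0b00:
        return alu_result          # PC + 4 during instruction fetch
    if pc_source == 0b01:
        return alu_out             # branch target computed earlier
    if pc_source == 0b10:
        return jump_address(incremented_pc, ir)
    raise ValueError("unsupported PCSource value")


# A jump whose 26-bit target field is 0x10, fetched from 0x10000000.
print(hex(jump_address(0x10000004, 0x08000010)))
```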
The PC Write Control Signal
We can compute a PC write control
signal using these two new control
signals, the ALU Zero signal, and a
couple gates.
We can AND the Zero signal with
PCWriteCond and then OR that with
PCWrite. This will give us the PC write
control signal.
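The gate arrangement just described amounts to one Boolean expression. A minimal Python sketch (the function name is made up for illustration):

```python
def pc_write_enable(pc_write, pc_write_cond, zero):
    """PC write control: PCWrite OR (PCWriteCond AND Zero)."""
    return bool(pc_write or (pc_write_cond and zero))


# A conditional branch writes the PC only when the ALU reports Zero.
print(pc_write_enable(pc_write=0, pc_write_cond=1, zero=1))
print(pc_write_enable(pc_write=0, pc_write_cond=1, zero=0))
```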
The Complete
Multicycle Datapath
[Figure: the complete multicycle datapath. The control unit takes Op (Instruction [31-26]) as input and produces PCWriteCond, PCWrite, PCSource, IorD, MemRead, MemWrite, MemtoReg, IRWrite, ALUOp, ALUSrcA, ALUSrcB, RegWrite, and RegDst. A three-input PCSource mux selects among the ALU result, ALUOut, and the jump address [31-0], which is formed from Instruction [25-0] shifted left 2 (28 bits) concatenated with PC [31-28].]
Breaking Down an Instruction
The multicycle datapath is complete.
Now we need to look at what should happen
in each clock cycle – we should attempt to
balance the amount of work done in each
cycle so that we can minimize the clock cycle
time.
We will do this by breaking down instruction
execution into a series of steps, each taking
one clock cycle and balanced in length.


Each step contains, at most, one ALU op, one
register file access, or one memory access.
One clock cycle could be as short as the longest of
these operations.
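As a rough illustration of that last point, the minimum clock period is the maximum of the per-step operation latencies. The latency numbers below are hypothetical values chosen for the example, not figures from the lecture or the text.

```python
# Hypothetical latencies in nanoseconds (assumptions for illustration only).
unit_latency_ns = {"memory": 2.0, "alu": 2.0, "register_file": 1.0}


def min_clock_period(latencies):
    """Each step uses at most one of these units, so the slowest unit bounds the clock period."""
    return max(latencies.values())


period = min_clock_period(unit_latency_ns)
print(period)        # clock period in ns
print(5 * period)    # time for an instruction that needs five steps
```

This is why balancing the steps matters: a single slow step stretches every cycle of every instruction.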
Clocking Considerations
Remember that our implementation is edge-triggered;
we can continue to read the current
value of a register, since the new value will not appear
until the next clock cycle.
All of the operations grouped into one step happen
in parallel within the same clock cycle.
Successive steps occur in successive clock
cycles.
Reading/writing the PC or a standalone
register is slightly different from reading/writing
the register file. The former occurs immediately;
the latter takes a clock cycle (due to additional
control constraints).
Instruction Execution Steps
We will now break instruction execution
into five steps…
1. Instruction Fetch
2. Instruction Decode and Register Fetch
3. Execution, Memory Address Computation, or Branch Completion
4. Memory Access or R-type Completion
5. Memory Read Completion
Each instruction will take either three,
four, or five of these steps to complete.
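Since each step takes one clock cycle, the step count is also the cycle count for an instruction. The sketch below assumes the per-class counts implied by the step names (branches and jumps finish in step 3, stores and R-type instructions in step 4, loads in step 5); the instruction mix is a made-up example, not measured data.

```python
# Steps (clock cycles) needed by each instruction class.
STEPS = {"load": 5, "store": 4, "r_type": 4, "branch": 3, "jump": 3}


def average_cpi(instruction_counts):
    """Average cycles per instruction for a given mix of instruction counts."""
    total_instructions = sum(instruction_counts.values())
    total_cycles = sum(STEPS[cls] * n for cls, n in instruction_counts.items())
    return total_cycles / total_instructions


# Hypothetical mix out of 100 instructions (an assumption for illustration).
mix = {"load": 25, "store": 10, "r_type": 45, "branch": 15, "jump": 5}
print(average_cpi(mix))
```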
Instruction Fetch Stage
In this stage, we fetch the instruction from
memory and compute the address of the next
sequential instruction.




Send the PC to the memory as the address, perform
a read, and write the instruction into the IR.
Increment the PC by four.
Store the incremented PC into the PC. This new
value will not be visible until the next clock cycle.
Note that we can increment the PC and read the
instruction from memory in parallel.
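The fetch step can be sketched in Python as follows. This is a simplified model, not the actual hardware: the state is a plain dictionary, memory is a dictionary keyed by byte address, and the function name is hypothetical. Both actions read the old PC, which mimics the edge-triggered behavior described above.

```python
def instruction_fetch(state, memory):
    """Return the state after the fetch cycle; the input state is left untouched."""
    next_state = dict(state)
    # Both right-hand sides use the PC value from before the clock edge.
    next_state["IR"] = memory[state["PC"]]   # IR = Memory[PC]
    next_state["PC"] = state["PC"] + 4       # PC = PC + 4
    return next_state


before = {"PC": 0x40, "IR": 0}
after = instruction_fetch(before, {0x40: 0x8D090004})
print(hex(after["PC"]), hex(after["IR"]))
```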
Instruction Decode and
Register Fetch Stage
Here (and in the previous stage), we do not yet
know what type of instruction we’re dealing with,
so we can only do tasks that are common to all
instructions or that are harmless (it does not
matter if they are done and turn out to be
unnecessary).
- We can read the registers specified by rs and rt and store them in registers A and B.
- We can compute the branch target address with the ALU and save it in ALUOut.
- These are harmless actions (and can be performed in parallel): if the instruction does not require them, no harm has come; if it does, we’ve gotten some work done early!
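A minimal Python sketch of this step, assuming the standard MIPS field positions (rs in bits 25-21, rt in bits 20-16, offset in bits 15-0) and a register file modeled as a 32-entry list. The function names are hypothetical.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def decode_and_register_fetch(ir, pc, registers):
    """Return (A, B, ALUOut) for the decode step; pc is the already incremented PC."""
    rs = (ir >> 21) & 0x1F
    rt = (ir >> 16) & 0x1F
    a = registers[rs]
    b = registers[rt]
    # Speculative branch target: PC + (sign-extended offset << 2).
    alu_out = pc + (sign_extend16(ir & 0xFFFF) << 2)
    return a, b, alu_out


regs = [0] * 32
regs[8], regs[9] = 5, 7
# 0x1109FFFE encodes beq $8, $9 with an offset field of -2.
print(decode_and_register_fetch(0x1109FFFE, 0x44, regs))
```

If the instruction turns out not to be a branch, the value in ALUOut is simply overwritten in a later step.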
Execution, Memory Address Computation,
or Branch Completion Stage
In this stage, the datapath operation is finally
determined by the instruction class.
In all cases, the ALU is operating on the operands
prepared in the previous step, performing one of
three functions.
The action taken during this stage is dependent
on the instruction class.
For memory reference instructions…
- The ALU adds the operands (register A and the sign-extended version of bits 15-0 of the instruction) to form the memory address.
Execution, Memory Address Computation,
or Branch Completion Stage
For R-type instructions…
- The ALU performs the operation specified by the function code on the two values read from the register file in the previous stage.
For branch instructions…
- The ALU compares the two registers read in the previous stage (by subtracting them) and asserts the Zero signal if they are equal. If the conditional branch is taken, the target address (already in ALUOut) is written to the PC and is used for the next instruction fetch.
For jump instructions…
- The PC is replaced by the jump address.
Memory Access or R-type
Completion Stage
For memory access instructions…
- In a load instruction, a data word is retrieved from memory and is written into the MDR.
- In a store instruction, the data is written into memory.
- In both cases, the address used is the one computed in the previous step and stored in ALUOut.
For R-type instructions…
- The contents of ALUOut (the output of the ALU from the previous stage) are written into the result register, that is, the destination register in the register file.
Memory Read
Completion Stage
In this stage, memory loads complete
by writing the value that was read from
memory in the previous stage from the
MDR into the register file.
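Putting the five steps together, here is a simplified Python walk-through of a load word (lw) instruction. It is a sketch under stated assumptions: memory is a dictionary keyed by byte address, the register file is a 32-entry list, the function name is hypothetical, and the speculative branch target computed in step 2 is left out for brevity.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def run_load_word(pc, registers, memory):
    """Execute one lw instruction in five steps; return (new PC, new register file)."""
    # Step 1: instruction fetch.
    ir = memory[pc]
    pc = pc + 4
    # Step 2: instruction decode and register fetch.
    rs = (ir >> 21) & 0x1F
    rt = (ir >> 16) & 0x1F
    a = registers[rs]
    # Step 3: memory address computation.
    alu_out = a + sign_extend16(ir & 0xFFFF)
    # Step 4: memory access (read into the MDR).
    mdr = memory[alu_out]
    # Step 5: memory read completion (MDR into the register file).
    registers = list(registers)
    registers[rt] = mdr
    return pc, registers


regs = [0] * 32
regs[8] = 0x100
# 0x8D090004 encodes lw $9, 4($8).
new_pc, new_regs = run_load_word(0, regs, {0: 0x8D090004, 0x104: 42})
print(new_pc, new_regs[9])
```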
Now … Onto the Control Unit
When we designed the control unit for the
single cycle datapath, we used truth tables.
For our multicycle datapath, the control unit
is more complex because the instruction is
executed in a series of steps – therefore, it
must specify both the control signals and the
next step in the instruction execution
sequence.
We will discuss multicycle control unit design using
two different techniques…
- Finite State Machines (FSMs)
- Microprogramming
Finite State Machine Design
A finite state machine (FSM) consists of a set of
states and directions on how to change states.
An FSM maps the current state and the inputs
into a new state using a next-state function.
Each state also specifies a set of outputs that
are asserted when the FSM is in that state.
- If a specific output is not asserted, we can assume that it is deasserted.
Let’s look at the FSM implementation of the
multicycle control unit.
Understanding FSM Diagrams
The FSMs we will now look at include state
transitions (to other FSMs) and a set of
output signals.
The output signals are control signals. They
specify which control signals must be
asserted for that class of instruction at that
stage.
- To best understand what the FSM diagrams are showing you, it’s a good idea to have a copy of the multicycle datapath with the control signals labeled alongside the FSM diagram. This should enable you to easily follow what the FSM diagrams are telling you.
Generic FSM Flow
An FSM can be constructed for the generic execution flow.
We will break all of these boxes into their own FSMs on
the next couple of slides (the figure numbers refer to the
text).
Start -> Instruction fetch/decode and register fetch (Figure 5.37), which then leads to one of:
- Memory access instructions (Figure 5.38)
- R-type instructions (Figure 5.39)
- Branch instruction (Figure 5.40)
- Jump instruction (Figure 5.41)
Instruction Fetch and Decode FSM
This stage’s FSM is identical for all instruction
classes (again, all numbers refer to the text).
This is Figure 5.37.
State 0 (Instruction fetch), entered from Start:
  MemRead, ALUSrcA = 0, IorD = 0, IRWrite, ALUSrcB = 01, ALUOp = 00, PCWrite, PCSource = 00
  Next: state 1
State 1 (Instruction decode/Register fetch):
  ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00
  Next, depending on the opcode:
  - Memory reference FSM (Figure 5.38)
  - R-type FSM (Figure 5.39)
  - Branch FSM (Figure 5.40)
  - Jump FSM (Figure 5.41), when Op = 'JMP'
Instruction class-specific FSMs
After the instruction fetch and decode
FSM, control passes to one of four
FSMs, based on which instruction class
is being executed.
The figure number for each of the four
instruction class-specific FSMs is
provided, so control flow can easily be
followed.
Memory Reference Instruction FSM
(Figure 5.38)
From state 1, when (Op = 'LW') or (Op = 'SW'):
State 2 (Memory address computation):
  ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
  Next: state 3 when Op = 'LW'; state 5 when Op = 'SW'
State 3 (Memory access, load):
  MemRead, IorD = 1
  Next: state 4
State 4 (Write-back step):
  RegWrite, MemtoReg = 1, RegDst = 0
  Next: state 0 (Figure 5.37)
State 5 (Memory access, store):
  MemWrite, IorD = 1
  Next: state 0 (Figure 5.37)
R-Type Instruction FSM
(Figure 5.39)
From state 1, when Op = R-type:
State 6 (Execution):
  ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10
  Next: state 7
State 7 (R-type completion):
  RegDst = 1, RegWrite, MemtoReg = 0
  Next: state 0 (Figure 5.37)
Branch Instruction FSM
(Figure 5.40)
From state 1, when Op = 'BEQ':
State 8 (Branch completion):
  ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCWriteCond, PCSource = 01
  Next: state 0 (Figure 5.37)
Jump Instruction FSM
(Figure 5.41)
From state 1, when Op = 'J':
State 9 (Jump completion):
  PCWrite, PCSource = 10
  Next: state 0 (Figure 5.37)
Building a Multicycle Datapath FSM
Now that we have seen the common
FSM for the first stages and the
instruction-specific FSMs for the
following stages, we can build a
common FSM for the multicycle
datapath.
The Multicycle Datapath FSM
Start -> state 0
State 0 (Instruction fetch): MemRead, ALUSrcA = 0, IorD = 0, IRWrite, ALUSrcB = 01, ALUOp = 00, PCWrite, PCSource = 00. Next: state 1
State 1 (Instruction decode/register fetch): ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00. Next: state 2 (Op = 'LW' or 'SW'), state 6 (Op = R-type), state 8 (Op = 'BEQ'), or state 9 (Op = 'J')
State 2 (Memory address computation): ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00. Next: state 3 (Op = 'LW') or state 5 (Op = 'SW')
State 3 (Memory access): MemRead, IorD = 1. Next: state 4
State 4 (Write-back step): RegDst = 0, RegWrite, MemtoReg = 1. Next: state 0
State 5 (Memory access): MemWrite, IorD = 1. Next: state 0
State 6 (Execution): ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10. Next: state 7
State 7 (R-type completion): RegDst = 1, RegWrite, MemtoReg = 0. Next: state 0
State 8 (Branch completion): ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCWriteCond, PCSource = 01. Next: state 0
State 9 (Jump completion): PCWrite, PCSource = 10. Next: state 0
FSM Logic Implementation
FSMs are often implemented in hardware using
a state register (to hold the current state) and a
big block of combinational logic with…
- Inputs: the current state and the instruction register opcode field
- Outputs: the next state and the datapath control signals
This is not the only way to implement an FSM in
hardware, but it is typical. This implementation
is known as a Moore machine, as the control outputs
depend only on the current state.
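To make the Moore-machine structure concrete, here is a sketch of the control FSM in Python. The state numbers and control values follow the FSM slides in this deck; the table and function names are hypothetical, and opcodes are modeled as symbolic strings rather than 6-bit fields.

```python
# Moore outputs: signals asserted in each state (unlisted signals are deasserted).
OUTPUTS = {
    0: {"MemRead": 1, "ALUSrcA": 0, "IorD": 0, "IRWrite": 1,
        "ALUSrcB": 0b01, "ALUOp": 0b00, "PCWrite": 1, "PCSource": 0b00},
    1: {"ALUSrcA": 0, "ALUSrcB": 0b11, "ALUOp": 0b00},
    2: {"ALUSrcA": 1, "ALUSrcB": 0b10, "ALUOp": 0b00},
    3: {"MemRead": 1, "IorD": 1},
    4: {"RegWrite": 1, "MemtoReg": 1, "RegDst": 0},
    5: {"MemWrite": 1, "IorD": 1},
    6: {"ALUSrcA": 1, "ALUSrcB": 0b00, "ALUOp": 0b10},
    7: {"RegDst": 1, "RegWrite": 1, "MemtoReg": 0},
    8: {"ALUSrcA": 1, "ALUSrcB": 0b00, "ALUOp": 0b01,
        "PCWriteCond": 1, "PCSource": 0b01},
    9: {"PCWrite": 1, "PCSource": 0b10},
}

# Next-state function: only states 1 and 2 look at the opcode.
FIXED_NEXT = {0: 1, 3: 4, 4: 0, 5: 0, 6: 7, 7: 0, 8: 0, 9: 0}
DISPATCH_FROM_1 = {"LW": 2, "SW": 2, "R-type": 6, "BEQ": 8, "J": 9}
DISPATCH_FROM_2 = {"LW": 3, "SW": 5}


def next_state(state, op):
    if state == 1:
        return DISPATCH_FROM_1[op]
    if state == 2:
        return DISPATCH_FROM_2[op]
    return FIXED_NEXT[state]


def state_sequence(op):
    """List the states visited by one instruction, starting at state 0."""
    states = [0]
    while True:
        following = next_state(states[-1], op)
        if following == 0:
            return states
        states.append(following)


for op in ("LW", "SW", "R-type", "BEQ", "J"):
    print(op, state_sequence(op))
```

The length of each sequence is the number of clock cycles that instruction class takes.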
FSM Logic Implementation
(Moore Machine)
[Figure: a block of combinational control logic. Its inputs are the current state (from the state register) and the instruction register opcode field. Its outputs are the datapath control outputs and the next state, which is written back into the state register.]
Limitations of FSM-based
Control Unit Design
Control design using an FSM proved successful
with our (very) limited implementation
(although the FSM took up a whole slide).
Imagine designing (or understanding) an FSM
for a complete MIPS implementation (100+
instructions, ranging from 1 to 20 clock cycles)
or IA-32 (many times that number of
instructions plus a lot of addressing modes).
Even after such a complex FSM is designed,
how can it be translated into hardware? The
potential for errors is huge.
FSM control design does not scale well.
Microinstructions
Think of what defines the behavior of the
datapath during a stage of instruction
execution: control signals.
Enter the microinstruction: a low-level
control instruction that consists of the set
of control signals that are asserted at that
stage of instruction execution.
Microinstructions must also include some
notion of sequencing…
- Is the next sequential microinstruction executed?
- Do we transfer control to a different microinstruction?
Microprogramming
The design of a program that
implements the machine instructions in
terms of microinstructions is known as
microprogramming.
Just as we design assembly languages
to represent machine instructions as a
series of fields, we can do the same for
microinstructions.
Now, let’s turn to designing a
microinstruction format.
Microinstruction Format Design
The microprogram is just a symbolic
representation of the control that is converted
into a series of control signals – the format
chosen for this should simplify the
representation, making it as easy as possible to
write and understand a microprogram.
We also do not want to allow inconsistent
microinstructions…



We cannot let a microinstruction require that a control
signal be set to two different values.
To do this, we make each microinstruction field specify
a non-overlapped set of control signals.
Signals that are never asserted at the same time may
share the same field.
Our Microinstruction Format
We can define a simple microinstruction format
for our MIPS implementation.
- The first six fields control the datapath.
- The last field specifies how to select the next microinstruction.

Field             Function of Field
ALU control       operation being done by ALU
SRC1              source for first ALU operand
SRC2              source for second ALU operand
Register Control  read or write for register file, source of value for a write
Memory            read or write for memory, source for memory, dest register
PCWrite Control   writing of the PC
Sequencing        how to choose the next microinstruction
Microinstruction Addressing
Typically, microinstructions are stored in a ROM
or PLA and are given (sequential) addresses.
There are three methods to choose the next
microinstruction (MI)…



Increment the address of the current MI. (Seq in the
Sequencing field)
Branch to the MI that begins execution of the next
MIPS instruction. (Fetch in the Sequencing field)
Choose the next MI based on control unit input
(called a dispatch). A dispatch table of addresses of
target microinstructions is stored in a ROM/PLA.
There can be >1 dispatch tables. (Dispatch x in the
Sequencing field; x = # of dispatch table to use).
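The three sequencing choices can be sketched as a small Python function. The microinstruction addresses and dispatch table contents below are hypothetical (chosen to mirror the FSM state numbers), as is the assumption that the fetch microinstruction lives at address 0.

```python
FETCH_ADDRESS = 0  # assumed address of the first microinstruction of every MIPS instruction

# Hypothetical dispatch tables: opcode class -> microinstruction address.
DISPATCH_TABLES = {
    1: {"LW": 2, "SW": 2, "R-type": 6, "BEQ": 8, "J": 9},
    2: {"LW": 3, "SW": 5},
}


def next_micro_address(current, sequencing, opcode):
    """Choose the next microinstruction address from the Sequencing field."""
    if sequencing == "Seq":
        return current + 1
    if sequencing == "Fetch":
        return FETCH_ADDRESS
    if sequencing.startswith("Dispatch "):
        table_number = int(sequencing.split()[1])
        return DISPATCH_TABLES[table_number][opcode]
    raise ValueError("unknown Sequencing value: " + sequencing)


print(next_micro_address(0, "Seq", "LW"))
print(next_micro_address(1, "Dispatch 1", "BEQ"))
print(next_micro_address(8, "Fetch", "BEQ"))
```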
Possible Field Values
Let’s look at the possible values for some of
the microinstruction fields.
ALU Control (operation for the ALU)
- Add, Subt, or Func Code (use the funct field)
SRC1 (first ALU input)
- PC, A (register A)
SRC2 (second ALU input)
- B, 4, Extend (sign-ext), Extshft (sign-ext + shift)
Register Control (read/write for reg file + number)
- Read (read into A and B), Write ALU (write ALUOut to register file), Write MDR (write MDR to register file)
Possible Field Values

Memory (read/write and address for mem)
- Read PC (read mem[PC] into IR), Read ALU (read mem[ALUOut] into MDR), Write ALU (write B into mem[ALUOut])
PCWrite Control (writing of the PC)
- ALU (write ALU output into PC), ALUOut-cond (if Zero, write ALUOut into PC), Jump address (write jump address into PC)
Sequencing
- Seq, Fetch, Dispatch x
Example
The first two steps of execution of an instruction are
the same. Here’s the microprogram…

Label  ALU Control  SRC1  SRC2     Reg. Ctrl.  Memory   PCWrite Ctrl.  Sequencing
Fetch  Add          PC    4                    Read PC  ALU            Seq
       Add          PC    Extshft  Read                                Dispatch 1
In MI 1, we compute PC+4, fetch the instruction into
the IR, cause the output of the ALU to be written into
the PC, and then go to the next MI.
In MI 2, we store PC+ sign extended and shifted
IR[15-0] into ALUOut, read values into A & B, and use
dispatch table 1 to find out where to go next
(switching based on instruction class).
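One way to see how the symbolic microprogram turns into control signals is to translate each field value through a lookup table. The sketch below does this for the two microinstructions above. The mapping table is an assumption made for this example (it was chosen so that the results match the control values of FSM states 0 and 1), and all names are hypothetical.

```python
# Assumed translation from (field, symbolic value) to control signal settings.
FIELD_SIGNALS = {
    ("ALU control", "Add"): {"ALUOp": 0b00},
    ("SRC1", "PC"): {"ALUSrcA": 0},
    ("SRC2", "4"): {"ALUSrcB": 0b01},
    ("SRC2", "Extshft"): {"ALUSrcB": 0b11},
    ("Register control", "Read"): {},  # reading A and B needs no control signal
    ("Memory", "Read PC"): {"MemRead": 1, "IorD": 0, "IRWrite": 1},
    ("PCWrite control", "ALU"): {"PCSource": 0b00, "PCWrite": 1},
}

MI_1 = {"ALU control": "Add", "SRC1": "PC", "SRC2": "4",
        "Memory": "Read PC", "PCWrite control": "ALU", "Sequencing": "Seq"}
MI_2 = {"ALU control": "Add", "SRC1": "PC", "SRC2": "Extshft",
        "Register control": "Read", "Sequencing": "Dispatch 1"}


def control_signals(microinstruction):
    """Collect the control signals asserted by one symbolic microinstruction."""
    signals = {}
    for field, value in microinstruction.items():
        if field == "Sequencing":
            continue  # handled by the address select logic, not the datapath
        signals.update(FIELD_SIGNALS[(field, value)])
    return signals


print(control_signals(MI_1))
print(control_signals(MI_2))
```

Because each field drives its own set of signals, no microinstruction can ask for two different values of the same signal.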
Implementing
Microcode Controllers
A common way to implement microcode
controllers (hardware to run
microprograms/microcode) stores the code in a
read-only memory (ROM/PLA) and implements
the sequencing function separately.
The microcode store specifies the control lines
and how to select the next MI, not the next MI
itself. A separate piece of logic, the address
select logic, takes the sequencing information
from the microcode and inputs from the opcode
of the current instruction to select the next
microinstruction.
A Common Implementation
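[Figure: microcode storage, addressed by the microprogram counter, produces the datapath control outputs and the sequencing control. The address select logic takes the sequencing control, the instruction register opcode field, and the output of an adder (microprogram counter + 1), and selects the next value of the microprogram counter.]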
Hardware Support for Exceptions
We have one last item to discuss concerning
multicycle datapath implementations.
One of the hardest parts of microprocessor
control is handling exceptions and interrupts:
events other than branches and jumps that
change the normal flow of instruction execution.
- An exception is an unexpected event from within the processor, such as an arithmetic overflow.
- An interrupt is an unexpected event from outside the processor; these are typically caused by I/O devices that require attention.
Interrupts vs. Exceptions
The terms exception and interrupt are
frequently confused and intermixed…


Intel x86 uses the word interrupt for all of these
events (internal or external).
PowerPC uses the word exception for an unusual
event and the word interrupt to indicate the change
in control flow.
We will use the term exception to refer to any
(internal or external) unexpected change in
control flow and the term interrupt only when
the event is externally caused (this is the
convention that MIPS uses).
Exception Support
in the Design Process
We will now look at control unit support for
two types of exceptions that arise from the
instructions we have already implemented.
Determining exceptional conditions (and taking
the appropriate action) is often on the critical
path of a design (the longest path through the
datapath) and is thus vital to computing the
clock cycle time.

- Adding support for exceptions after the design of a datapath has been finalized often significantly reduces performance and complicates the effort required to correctly implement the design.
Exception Processing
The two exceptions we will support are…
- an undefined instruction
- arithmetic overflow
When we encounter one of these conditions, we will
store the address of the offending instruction in
the exception program counter (EPC) and
transfer control to the operating system at
some specified address (the exception handler).
Normally, this exception handler deals with the
error and then either terminates the program or
continues its execution (using the EPC value).
Expressing the Reason
for the Exception
For the OS to handle the exception, it needs to
know what caused it. There are two main
methods used for this…
- Method 1 (used in the MIPS architecture): Add a register (called the Cause register) that stores a field that indicates the reason for the exception. Here there is one major exception handler.
- Method 2 (used in the Intel x86 architecture): Known as vectored interrupts, the address to which control is transferred is determined by the cause of the exception (many exception handlers, called from a table, sometimes called the interrupt vector jump table).
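The difference between the two methods can be sketched in software terms. This Python model is only an analogy: the handler functions, the cause codes (0 for an undefined instruction, 1 for arithmetic overflow), and the table are illustrative assumptions, not real MIPS or x86 mechanisms.

```python
def handle_undefined_instruction():
    return "undefined instruction"


def handle_overflow():
    return "arithmetic overflow"


# Method 1: a single entry point that inspects the Cause register.
def single_handler(cause_register):
    if cause_register == 0:
        return handle_undefined_instruction()
    if cause_register == 1:
        return handle_overflow()
    raise ValueError("unknown cause")


# Method 2: the cause selects the entry point directly from a table.
VECTOR_TABLE = {0: handle_undefined_instruction, 1: handle_overflow}


def vectored_dispatch(cause):
    return VECTOR_TABLE[cause]()


print(single_handler(1))
print(vectored_dispatch(1))
```

With Method 1 the decision is made by handler code after control has transferred; with Method 2 it is made as part of the transfer itself.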
Implementing Exception Support
Since we are implementing MIPS, we need to
add two registers to our design…


EPC: this stores the address of the offending
instruction (this would also be needed for vectored
interrupts).
Cause: records the cause of the exception. For our
implementation, 0 will represent an undefined
instruction and 1 will represent an arithmetic overflow.
We need new signals…
- EPCWrite: write to the EPC register?
- CauseWrite: write to the Cause register?
- IntCause: write a zero or a one (which exception?)
- exception address: a new PC source (C0 00 00 00 in the datapath figure), the address to which control is transferred for exception handling
The New Datapath
[Figure: the complete multicycle datapath extended for exceptions. The control unit gains the outputs CauseWrite, IntCause, and EPCWrite. The EPC register is written from the ALU result, the Cause register is written with 0 or 1 through a mux selected by IntCause, and the PCSource mux gains a fourth input (input 3), the constant exception address C0 00 00 00 (hex).]
Support for Exceptions
Since the PC has already been incremented
during the first cycle of instruction execution,
we cannot just write the PC into the EPC.

- We use the ALU to subtract 4 from the PC and write the result to the EPC.
- We already have 4 as an input. Great!
This requires no additional control signals or
paths.
If an exception is detected, we will follow an
FSM to implement the handling.
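The register updates on exception entry can be sketched as follows. The function name is hypothetical, and the exception address 0xC0000000 is an assumption read off the constant shown in the datapath figure in this deck.

```python
EXCEPTION_ADDRESS = 0xC0000000  # assumed from the constant in the datapath figure


def enter_exception(incremented_pc, int_cause):
    """Return (EPC, Cause, new PC) when an exception is taken."""
    epc = incremented_pc - 4      # undo the PC + 4 done during instruction fetch
    cause = int_cause             # 0 = undefined instruction, 1 = arithmetic overflow
    new_pc = EXCEPTION_ADDRESS    # selected through the fourth PC source
    return epc, cause, new_pc


epc, cause, new_pc = enter_exception(0x44, 1)
print(hex(epc), cause, hex(new_pc))
```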
Exception FSM
If an exception occurs, we follow a simple FSM…
State 10: IntCause = 0, CauseWrite, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 01, EPCWrite, PCWrite, PCSource = 11
State 11: IntCause = 1, CauseWrite, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 01, EPCWrite, PCWrite, PCSource = 11
Both states go to state 0 to begin the next instruction.
Exception Detection
So how does the control unit detect an
exception?


An undefined instruction is detected when
no next state is defined for the opcode
value. If the opcode is not for one of the
instructions we support, an undefined
instruction exception has occurred.
When the ALU Overflow signal is asserted,
an overflow exception has occurred.
We can then add the exception FSM to
our generic FSM.
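The two detection rules can be sketched in Python. The set of supported opcodes reflects the instruction classes implemented in this deck; the function names and the symbolic opcode strings are hypothetical, and the overflow check models signed 32-bit addition only.

```python
SUPPORTED_OPS = {"LW", "SW", "R-type", "BEQ", "J"}


def add32_overflows(a, b):
    """True if adding two signed 32-bit values leaves the signed 32-bit range."""
    total = a + b
    return not (-2**31 <= total <= 2**31 - 1)


def detect_exception(op, alu_overflow):
    """Return the cause code (0 or 1), or None if no exception is detected."""
    if op not in SUPPORTED_OPS:
        return 0   # undefined instruction: no next state defined for this opcode
    if op == "R-type" and alu_overflow:
        return 1   # arithmetic overflow
    return None


print(detect_exception("MULT", False))
print(detect_exception("R-type", add32_overflows(2**31 - 1, 1)))
print(detect_exception("LW", False))
```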
Our Final Multicycle Datapath FSM
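Start -> state 0
State 0 (Instruction fetch): MemRead, ALUSrcA = 0, IorD = 0, IRWrite, ALUSrcB = 01, ALUOp = 00, PCWrite, PCSource = 00. Next: state 1
State 1 (Instruction decode/Register fetch): ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00. Next: state 2 (Op = 'LW' or 'SW'), state 6 (Op = R-type), state 8 (Op = 'BEQ'), state 9 (Op = 'J'), or state 10 (any other opcode)
State 2 (Memory address computation): ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00. Next: state 3 (Op = 'LW') or state 5 (Op = 'SW')
State 3 (Memory access): MemRead, IorD = 1. Next: state 4
State 4 (Write-back step): RegWrite, MemtoReg = 1, RegDst = 0. Next: state 0
State 5 (Memory access): MemWrite, IorD = 1. Next: state 0
State 6 (Execution): ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10. Next: state 7
State 7 (R-type completion): RegDst = 1, RegWrite, MemtoReg = 0. Next: state 11 on Overflow, otherwise state 0
State 8 (Branch completion): ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCWriteCond, PCSource = 01. Next: state 0
State 9 (Jump completion): PCWrite, PCSource = 10. Next: state 0
State 10 (undefined instruction): IntCause = 0, CauseWrite, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 01, EPCWrite, PCWrite, PCSource = 11. Next: state 0
State 11 (overflow): IntCause = 1, CauseWrite, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 01, EPCWrite, PCWrite, PCSource = 11. Next: state 0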
One More Issue
Close examination of the final multicycle
datapath FSM reveals that if an arithmetic
overflow occurs, the instruction completes
and writes its (incorrect) result, because the
overflow branch occurs when the instruction
is in the write stage.
Is this the correct behavior? Maybe – it’s a
design decision!

Some architectures – including MIPS – specify that
an instruction has no effect if it causes an
exception (so the FSM is not quite accurate to the
MIPS specification).
Case Study:
The IA-32/x86 Control
The Intel IA-32 microprocessor line, since the
80486, uses a combination of hardwired control
and microprogrammed control.


Hardwired control is used for simple instructions
(single path through the datapath and a small
number of clock cycles).
Microprogrammed control is used for complex
instructions.
This allows IA-32 to achieve low cycle counts for
simple instructions while not requiring such
instructions to pass through the complex
datapath that handles the very complex general
instructions (make the common case fast!).
The Pentium Pro Datapath
The Pentium and all of its successors are
superscalar processors (we will discuss this in
detail later).



More than one instruction is executed per clock cycle
– this requires a duplication of datapath resources.
Think of the PPro as having multiple datapaths, each
tailored for a specific type of instruction.
Using such a design, we can execute a memory load
at the same time as we execute a computation
instruction, for example.
Microoperations
The PPro executes simple microoperations
(another name for microinstructions), much like
MIPS instructions.
Microoperations (MOs) are self-contained
operations that are 72 bits wide internally –
these 72 bits are expanded to 120 (for the
integer datapath) or 285 (for the floating-point
datapath) control signals.
The control unit that executes the MOs is a
hardwired control unit (like the FSM model we
have already seen).
Generating Microoperations
The instruction->MO generation is done
in one of two ways.
- For x86 instructions that require four or fewer MOs, the instruction is directly decoded into one to four MOs by hardwired PLAs.
- For other x86 instructions, the microprocessor uses a traditional microcode sequencer (the design we saw for MIPS with address select logic) and microcode control store to generate the sequence of five or more MOs.
Pentium Pro Performance
The Pentium Pro’s datapath combines many
features together…


simple, low-level hardwired control and simple
datapaths for executing MOs
translation process that has a hardwired control for
simple instructions and microcoded control for
complex instructions
These features allow the PPro to perform very
well in terms of CPI for integer instructions
compared with traditional RISC
microprocessors – something that was a long
time coming from a traditional CISC design.
Summary
Datapath design can be vastly improved
by moving from a single cycle to a
multicycle design.
Control unit design can be accomplished
using hardwired control (FSM) or
microprogramming.
Now that we have covered the basics of
datapath and control unit design, we will
move to pipelining, the key to modern
microprocessor performance.
