CS 61C:
Great Ideas in Computer Architecture
Control and Pipelining, Part I
Instructors:
Krste Asanovic, Randy H. Katz
http://inst.eecs.Berkeley.edu/~cs61c/fa12
11/6/2015
Fall 2012 -- Lecture #27
1
You Are Here!
• Parallel Requests: assigned to computer, e.g., search "Katz" (Warehouse Scale Computer)
• Parallel Threads: assigned to core, e.g., lookup, ads (harness parallelism & achieve high performance)
• Parallel Instructions: >1 instruction @ one time, e.g., 5 pipelined instructions (Today's Lecture)
• Parallel Data: >1 data item @ one time, e.g., add of 4 pairs of words (A0+B0, A1+B1, A2+B2, A3+B3)
• Hardware descriptions: all gates @ one time
• Programming Languages
[Figure: the software/hardware stack, from Smart Phone and Warehouse Scale Computer at the top, through Computer, Cores (instruction unit(s), functional unit(s)), Memory (Cache), Input/Output, and Main Memory, down to Logic Gates]
Levels of Representation/Interpretation
High Level Language Program (e.g., C):
    temp = v[k];
    v[k] = v[k+1];
    v[k+1] = temp;
Compiler
Assembly Language Program (e.g., MIPS):
    lw  $t0, 0($2)
    lw  $t1, 4($2)
    sw  $t1, 0($2)
    sw  $t0, 4($2)
Assembler
Machine Language Program (MIPS):
    0000 1001 1100 0110 1010 1111 0101 1000
    1010 1111 0101 1000 0000 1001 1100 0110
    1100 0110 1010 1111 0101 1000 0000 1001
    0101 1000 0000 1001 1100 0110 1010 1111
Anything can be represented as a number, i.e., data or instructions
Machine Interpretation
Hardware Architecture Description (e.g., block diagrams)
Architecture Implementation
Logic Circuit Description (Circuit Schematic Diagrams)
Agenda
• Pipelined Execution
• Administrivia
• Pipelined Datapath
Review: Single-Cycle Processor
• Five steps to design a processor:
  1. Analyze instruction set => datapath requirements
  2. Select set of datapath components & establish clock methodology
  3. Assemble datapath meeting the requirements: re-examine for pipelining
  4. Analyze implementation of each instruction to determine the settings of control points that effect the register transfer
  5. Assemble the control logic
     • Formulate logic equations
     • Design circuits
[Figure: processor block diagram with Control and Datapath inside the Processor, connected to Memory, Input, and Output]
Pipeline Analogy: Doing Laundry
• Ann, Brian, Cathy, Dave (A, B, C, D) each have one load of clothes to wash, dry, fold, and put away
  – Washer takes 30 minutes
  – Dryer takes 30 minutes
  – "Folder" takes 30 minutes
  – "Stasher" takes 30 minutes to put clothes into drawers
Sequential Laundry
[Figure: task-order chart from 6 PM to 2 AM; loads A, B, C, D each occupy four 30-minute stages back-to-back, sixteen 30-minute slots in all]
• Sequential laundry takes 8 hours for 4 loads
Pipelined Laundry
[Figure: task-order chart from 6 PM onward; loads A, B, C, D overlap, each starting 30 minutes after the previous, seven 30-minute slots in all]
• Pipelined laundry takes 3.5 hours for 4 loads!
Pipelining Lessons (1/2)
[Figure: the pipelined laundry chart again, loads A, B, C, D over seven 30-minute slots]
• Pipelining doesn't help latency of a single task, it helps throughput of the entire workload
• Multiple tasks operate simultaneously using different resources
• Potential speedup = Number of pipe stages (4 in this case)
• Time to fill the pipeline and time to drain it reduce speedup: 8 hours/3.5 hours or 2.3X v. potential 4X in this example
Pipelining Lessons (2/2)
[Figure: the pipelined laundry chart again, loads A, B, C, D over seven 30-minute slots]
• Suppose a new Washer takes 20 minutes and a new Stasher takes 20 minutes. How much faster is the pipeline?
• Pipeline rate is limited by the slowest pipeline stage
• Unbalanced lengths of pipe stages reduce speedup
Agenda
• Pipelined Execution
• Administrivia
• Pipelined Datapath
Administrivia
• Project #4, Labs #10 and #11, (Last) HW #6 posted
  – Project due 11/11 (Sunday after next)
  – Project is not difficult, but it is long: don't wait for the weekend when it is due! Start early!
  – Look at the Logisim labs before you start on Project #4
    • Lots of useful tips and tricks for using Logisim in those labs
• TA Sung Roa will hold extended office hours on the project this coming weekend
Administrivia
• Reweighting of Projects
  – Still 40% of grade overall
    • Project 1: 5%
    • Project 2: 12.5%
    • Project 3: 10%
    • Project 4: 12.5%
    • Optional Extra Credit Project 5: up to 5% extra
  – Gold, Silver, and Bronze medals to the three fastest projects
  – Code size and aesthetics are also criteria for recognition
  – Winning projects as selected by the TAs will be highlighted in class
Agenda
• Pipelined Execution
• Administrivia
• Pipelined Datapath
Review: RISC Design Principles
• "A simpler core is a faster core"
• Reduction in the number and complexity of instructions in the ISA => simplifies pipelined implementation
• Common RISC strategies:
  – Fixed instruction length, generally a single word (MIPS = 32b); simplifies the process of fetching instructions from memory
  – Simplified addressing modes (MIPS: just register + offset); simplifies the process of fetching operands from memory
  – Fewer and simpler instructions in the instruction set; simplifies the process of executing instructions
  – Simplified memory access: only load and store instructions access memory
  – Let the compiler do it: use a good compiler to break complex high-level language statements into a number of simple assembly language statements
Review: Single-Cycle Datapath
• I-format instruction fields: op <31:26>, rs <25:21>, rt <20:16>, immediate <15:0>
• Register transfer for sw: Data Memory {R[rs] + SignExt[imm16]} = R[rt]
[Figure: single-cycle datapath. The instruction fetch unit (nPC_sel, clk) produces Instruction<31:0>; fields Rs <21:25>, Rt <16:20>, Rd <11:15>, and Imm16 <0:15> are extracted. A RegDst mux picks Rd or Rt as the write register Rw; the RegFile (Ra, Rb, Rw, busA, busB, busW, RegWr) drives busA into the ALU and busB into an ALUSrc mux, whose other input is imm16 through the Extender (ExtOp). The ALU (ALUctr, zero) drives the Data Memory (WrEn from MemWr, Adr, Data In from busB), and a MemtoReg mux returns either the ALU result or the memory data to busW.]
Steps in Executing MIPS
1) IF: Instruction Fetch, Increment PC
2) ID: Instruction Decode, Read Registers
3) EX: Execution
   Mem-ref: Calculate Address
   Arith-log: Perform Operation
4) Mem:
   Load: Read Data from Memory
   Store: Write Data to Memory
5) WB: Write Data Back to Register
Redrawn Single-Cycle Datapath
[Figure: PC feeds the instruction memory (with a +4 incrementer); the instruction's rs, rt, rd, and imm fields feed the register file; register values and imm feed the ALU; the ALU feeds the data memory; results return to the registers. Stages labeled 1. Instruction Fetch, 2. Decode/Register Read, 3. Execute, 4. Memory, 5. Write Back]
Pipelined Datapath
[Figure: the same five-stage datapath (PC, instruction memory with +4, register file with rs/rt/rd, imm, ALU, data memory) with the same stage labels]
• Add registers between stages
  – Hold information produced in the previous cycle
• 5-stage pipeline; clock rate potentially 5X faster
More Detailed Pipeline
• Registers named for adjacent stages, e.g., IF/ID
[Figure: datapath with pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB inserted between the stages]

IF for Load, Store, …
• Highlight combinational logic components used, plus right half of state logic on read, left half on write
[Figure: datapath with the instruction fetch hardware highlighted]

ID for Load, Store, …
[Figure: datapath with the decode/register-read hardware highlighted]

EX for Load
[Figure: datapath with the address-calculation hardware highlighted]

MEM for Load
[Figure: datapath with the data-memory read highlighted]
WB for Load
• Has a bug that was in the 1st edition of the textbook: wrong register number (the write-register field is read from the current instruction in decode rather than carried along with the load)

Corrected Datapath for Load
• Correct register number (carried forward through the pipeline registers to WB)
Pipelined Execution Representation
Time →
IF  ID  EX  Mem WB
    IF  ID  EX  Mem WB
        IF  ID  EX  Mem WB
            IF  ID  EX  Mem WB
                IF  ID  EX  Mem WB
                    IF  ID  EX  Mem WB
• Every instruction must take the same number of steps, also called pipeline stages, so some will go idle sometimes
Graphical Pipeline Diagrams
[Figure: the five-stage datapath again (PC, instruction memory with +4, registers with rs/rt/rd, imm, ALU, data memory), labeled 1. Instruction Fetch, 2. Decode/Register Read, 3. Execute, 4. Memory, 5. Write Back]
• Use the datapath figure below to represent the pipeline:
  IF (I$)  ID (Reg)  EX (ALU)  Mem (D$)  WB (Reg)
Graphical Pipeline Representation
(In Reg, right half highlights read, left half write)
[Figure: time (clock cycles) runs horizontally; instructions Load, Add, Store, Sub, Or run vertically in program order, each starting one cycle after the previous and flowing through I$, Reg, ALU, D$, Reg]
Pipeline Performance
• Assume time for stages is
  – 100ps for register read or write
  – 200ps for other stages
• What is the pipelined clock rate?
  – Compare pipelined datapath with single-cycle datapath

Instr     Instr fetch  Register read  ALU op  Memory access  Register write  Total time
lw        200ps        100ps          200ps   200ps          100ps           800ps
sw        200ps        100ps          200ps   200ps                          700ps
R-format  200ps        100ps          200ps                  100ps           600ps
beq       200ps        100ps          200ps                                  500ps
Pipeline Performance
• Single-cycle (Tc = 800ps)
• Pipelined (Tc = 200ps)
[Figure: timing diagrams comparing single-cycle execution, one 800ps cycle per instruction, with pipelined execution, 200ps cycles with instructions overlapped]
Pipeline Speedup
• If all stages are balanced
  – i.e., all take the same time
  – Time between instructions (pipelined) = Time between instructions (nonpipelined) / Number of stages
• If not balanced, speedup is less
• Speedup is due to increased throughput
  – Latency (time for each instruction) does not decrease
Instruction Level Parallelism (ILP)
• Another parallelism form to go with Request Level Parallelism and Data Level Parallelism
  – RLP, e.g., Warehouse Scale Computing
  – DLP, e.g., SIMD, Map-Reduce
• ILP, e.g., Pipelined Instruction Execution
  – 5-stage pipeline => 5 instructions executing simultaneously, one at each pipeline stage
And in Conclusion, …
The BIG Picture
• Pipelining improves performance by increasing instruction throughput: it exploits ILP
  – Executes multiple instructions in parallel
  – Each instruction has the same latency
• Key enabler is placing registers between pipeline stages