Transcript org2-2
Part 2
A five-level memory hierarchy.
Tanenbaum, Structured Computer Organization, Fifth Edition, (c) 2006
Pearson Education, Inc. All rights reserved. 0-13-148521-0
Note cost vs. size.
1. All instructions are directly executed by hardware.
2. Maximize the rate at which instructions are issued.
3. Instructions should be easy to decode.
4. Only loads and stores should reference memory.
5. Provide many registers.
1. All instructions are directly executed by hardware.
   Eliminate the microcode interpreter.
2. Maximize the rate at which instructions are issued.
   If you issue 500 million instructions per second, you have a 500 MIPS machine.
   Use parallelism.
3. Instructions should be easy to decode.
   Made possible by regular, fixed-length instructions with a small number of fields.
   Fewer instructions are better.
   Fewer instruction formats are better.
4. Only loads and stores should reference memory.
   Memory access takes a long time.
   Most instructions should use registers.
   Separate ops for load and store, which can be done in parallel with other instructions.
5. Provide many registers.
   At least 32!
   It is time consuming to save registers temporarily and reload them later.
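To illustrate principle 3 (easy decoding with fixed-length instructions and few fields), here is a minimal sketch. The 32-bit layout below is hypothetical and chosen only for illustration; it is not taken from any real instruction set. The 5-bit register fields tie in with principle 5, since 5 bits address 32 registers.

```python
# Hypothetical 32-bit instruction format (an assumption, not a real ISA):
#   bits 31-26: opcode (6 bits)
#   bits 25-21: destination register (5 bits, enough for 32 registers)
#   bits 20-16: first source register (5 bits)
#   bits 15-11: second source register (5 bits)
#   bits 10-0 : unused in this sketch

def encode(opcode: int, rd: int, rs1: int, rs2: int) -> int:
    """Pack the four fields into one 32-bit word."""
    return (opcode << 26) | (rd << 21) | (rs1 << 16) | (rs2 << 11)

def decode(word: int) -> dict:
    """Every field sits at a fixed position, so decoding is shifts and masks."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "rd": (word >> 21) & 0x1F,
        "rs1": (word >> 16) & 0x1F,
        "rs2": (word >> 11) & 0x1F,
    }

word = encode(opcode=1, rd=3, rs1=1, rs2=2)
fields = decode(word)
print(fields)
```

Because the field positions never change, no field has to be decoded before another can be located, which is what keeps the decode step simple.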
Ways to increase speed:
a. Increase the clock speed.
b. Parallelism. Types:
   1. processor/core level
   2. instruction level
Fetching instructions from memory is slow.
So use a prefetch buffer: a set of registers (memory) containing
instructions to be executed.
Fetch and execution can now be done in parallel!
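The benefit of overlapping fetch and execution can be sketched with a small timing model. The fetch and execute times used here are made-up numbers in arbitrary time units, and the model assumes the prefetch buffer is large enough that the fetch unit never has to wait for space.

```python
def serial_time(n: int, fetch: int, execute: int) -> int:
    """No prefetch buffer: each instruction is fetched, then executed."""
    return n * (fetch + execute)

def overlapped_time(n: int, fetch: int, execute: int) -> int:
    """With a prefetch buffer: fetching runs ahead while earlier instructions execute."""
    fetch_done = 0  # time at which the most recent fetch finished
    exec_done = 0   # time at which the most recent execution finished
    for _ in range(n):
        fetch_done += fetch                 # fetches run back to back
        start = max(fetch_done, exec_done)  # need the instruction and a free execution unit
        exec_done = start + execute
    return exec_done

# Hypothetical example: 4 instructions, fetch = 2 units, execute = 1 unit.
print(serial_time(4, 2, 1))      # 12
print(overlapped_time(4, 2, 1))  # 9
```

In this model the slower of the two activities sets the steady-state rate, so the saving grows with the number of instructions.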
A five-stage pipeline. The state of each stage as a function of time.
Nine clock cycles are illustrated.
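The stage-versus-time picture can be reproduced with a short sketch. It assumes an ideal pipeline: one new instruction enters stage S1 every clock cycle and nothing stalls. The function and variable names are illustrative only.

```python
from typing import List, Optional

NUM_STAGES = 5
NUM_CYCLES = 9

def pipeline_state(num_stages: int, num_cycles: int) -> List[List[Optional[int]]]:
    """Row s, column c holds the instruction number in stage s+1 during cycle c+1."""
    table = []
    for stage in range(num_stages):
        row = []
        for cycle in range(num_cycles):
            # Instruction i reaches stage s at cycle i + s - 1 (all counted from 1),
            # so with 0-based indices the instruction number is cycle - stage + 1.
            instr = cycle - stage + 1
            row.append(instr if instr >= 1 else None)  # None = stage still empty
        table.append(row)
    return table

table = pipeline_state(NUM_STAGES, NUM_CYCLES)
for stage, row in enumerate(table, start=1):
    cells = " ".join(f"{i:>2}" if i is not None else " ." for i in row)
    print(f"S{stage}: {cells}")
```

Each row is the row above shifted right by one cycle: instruction 1 reaches S5 in cycle 5, and from then on one instruction completes every cycle.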
Latency = time to execute one instruction
Bandwidth = instructions per second, typically in millions (MIPS)
Cycle time = time to move through 1 stage of
the pipeline = 1 clock cycle = 1 / clock rate
Problem: Let the cycle time = 3 nsec/stage, and let
the execution of each instruction require 6
stages or steps.
a. What is the bandwidth in MIPS for a machine
   without any pipeline (i.e., without any instruction-level parallelism)?
b. What is the bandwidth in MIPS for a machine with
   a pipeline?
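Solution to (a):
6 stages/inst x 3 x 10^-9 sec/stage = 18 x 10^-9 sec/inst
1 inst / (18 x 10^-9 sec) ≈ 56 MIPS
$$\frac{6\ \text{stages}}{1\ \text{instruction}} \times \frac{3 \times 10^{-9}\ \text{seconds}}{1\ \text{stage}} = \frac{18 \times 10^{-9}\ \text{seconds}}{1\ \text{instruction}}$$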
Solution to (b): once the pipeline is full, one instruction completes every cycle.
3 x 10^-9 sec/inst
1 inst / (3 x 10^-9 sec) ≈ 333 MIPS
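The arithmetic in this problem can be checked with a few lines of code. This is only a sketch: the function name and parameters are illustrative, and the pipelined case assumes the pipeline stays full, with no stalls.

```python
def bandwidth_mips(cycle_time_ns: float, stages: int, pipelined: bool) -> float:
    """Return the instruction rate in millions of instructions per second."""
    # Without a pipeline, one instruction completes every `stages` cycles.
    # With a full pipeline, one instruction completes every cycle.
    ns_per_instruction = cycle_time_ns if pipelined else cycle_time_ns * stages
    instructions_per_second = 1e9 / ns_per_instruction
    return instructions_per_second / 1e6

latency_ns = 3 * 6  # time for one instruction to pass through all 6 stages
no_pipe = bandwidth_mips(3, 6, pipelined=False)
with_pipe = bandwidth_mips(3, 6, pipelined=True)

print(f"latency:     {latency_ns} nsec")
print(f"no pipeline: {no_pipe:.1f} MIPS")
print(f"pipeline:    {with_pipe:.1f} MIPS")
```

Note that the latency is 18 nsec in both cases. The pipeline raises bandwidth; it does not shorten the time a single instruction takes.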
Dual five-stage pipelines with a common instruction fetch unit.
The instruction fetch unit fetches pairs of instructions.
Note: Since 2 instructions can be executed at the same time (S4),
they must not conflict over resource usage (e.g., registers),
and neither may depend on the result of the other.
How can we ensure this? (1) hardware, (2) compiler
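Whether the check is done in hardware or by the compiler, the idea can be sketched as a test on a pair of instructions. The `Instr` record and `can_issue_together` function are hypothetical names for this sketch. The model only looks at registers (not other resources such as functional units) and is deliberately conservative.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction: one destination register and some source registers."""
    op: str
    dest: str
    srcs: Tuple[str, ...]

def can_issue_together(first: Instr, second: Instr) -> bool:
    """True if the pair has no register conflict in this simplified model."""
    if first.dest in second.srcs:
        return False  # second needs the result of first
    if first.dest == second.dest:
        return False  # both write the same register
    if second.dest in first.srcs:
        return False  # second overwrites a register that first reads (conservative)
    return True

a = Instr("add", "r1", ("r2", "r3"))
b = Instr("sub", "r4", ("r1", "r5"))  # reads r1, which `a` writes
c = Instr("mul", "r6", ("r7", "r8"))  # shares no registers with `a`

print(can_issue_together(a, b))  # False
print(can_issue_together(a, c))  # True
```

When a pair fails the check, the two instructions have to be issued one after the other instead of side by side.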
386 – no pipeline
486 – one pipeline
First-generation Pentium – two 5-stage pipelines:
1. u pipeline – can execute any instruction
2. v pipeline – limited; only integer instructions or FXCH
P4 – 20 stages
“The later "Prescott" and "Cedar Mill" Pentium 4 cores (and their
Pentium D derivatives) had a 31-stage pipeline, the longest in
mainstream consumer computing.” (http://en.wikipedia.org/wiki/Instruction_pipeline)
Nehalem (16 pipeline stages), Enhanced Core, and Sandy Bridge
microarchitectures (next few slides; see
http://www.intel.com/content/dam/doc/manual/64-ia-32architectures-optimization-manual.pdf)
S3 issues an instruction every clock cycle.
S4 may require more than 1 clock cycle per instruction.
A superscalar processor with five functional units.