It’s the end of the class as we know it

It’s the end of the class as we know it
Misc. Status issues
• HW4 & HW5 available outside my office in
the next week.
• HW5 answers posted tonight
Stuff still to do
• Oral report
– Don’t forget to be there for the whole “group”
– PowerPoint or other slides
• Bring your own computer ready to go or use mine if needed
• Written report
– Due 9pm Wednesday.
• Please e-mail it to the entire staff.
• Review session
– 1-3pm on Sunday 12/15, 1670 Beyster
• Exam Thursday 4-6pm
– 1670 and 1690 Beyster
Exam review & office hours
• Question/answer
– Sunday 1-3, 1690 Beyster (as on prev. page)
• Office hours
– Thursday 12/12, 4-5
– Monday 12/16, 10:30-noon
– Thursday 12/19, 9:30-10:30am
Class summary
• Major topics
– ILP in hardware (Out-of-order processors)
• How they work AND why we use them
– Caches and Virtual Memory
– Multi-processor
– ILP in software (Compiler, IA-64)
– Power
• Less major topics
– Branch prediction
• Direction and target
– LSQ, Superscalar
– Buses/interconnect
The big questions
• What is computer architecture?
• What are the metrics of performance?
• What are the techniques we use to
maximize these metrics?
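For reference, the usual textbook formulas behind the performance question (standard forms, not exam-specific notation):

\[
T_{\mathrm{CPU}} = \mathrm{IC} \times \mathrm{CPI} \times t_{\mathrm{cycle}},
\qquad
\mathrm{Speedup}_{\mathrm{overall}} = \frac{1}{(1-f) + f/s}
\]

where IC is the dynamic instruction count, f is the fraction of execution an enhancement touches, and s is the speedup of that fraction (Amdahl's law).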
ILP in hardware (1/2)
• ILP definitions
– Hazards vs dependencies
• Data, Name and Control dependencies
– What ILP means and finding it.
• Dynamic Scheduling
– Tomasulo’s (three versions!)
• You can be promised a question on this!
• Branch Prediction
– Local, global, hybrid/correlating
• Tournament and gshare
– BTBs
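As a concrete reminder of gshare, a minimal sketch in C: a table of 2-bit saturating counters indexed by the branch PC XORed with a global history register. The table size and history length here are illustrative choices, not values from lecture.

#include <stdbool.h>
#include <stdint.h>

#define GH_BITS  12                  /* history length = index width (illustrative) */
#define PHT_SIZE (1u << GH_BITS)     /* pattern history table of 2-bit counters */

static uint8_t  pht[PHT_SIZE];       /* counters start at 0 = strongly not-taken */
static uint32_t ghr;                 /* global history register */

/* Predict: index the PHT with (PC XOR global history); counter >= 2 means taken. */
bool gshare_predict(uint32_t pc)
{
    uint32_t idx = ((pc >> 2) ^ ghr) & (PHT_SIZE - 1);
    return pht[idx] >= 2;
}

/* Update: train the 2-bit saturating counter, then shift the outcome into the history. */
void gshare_update(uint32_t pc, bool taken)
{
    uint32_t idx = ((pc >> 2) ^ ghr) & (PHT_SIZE - 1);
    if (taken  && pht[idx] < 3) pht[idx]++;
    if (!taken && pht[idx] > 0) pht[idx]--;
    ghr = ((ghr << 1) | (taken ? 1u : 0u)) & (PHT_SIZE - 1);
}

A tournament predictor wraps two such component predictors (e.g., a local and a global one) and uses another counter table to choose between them per branch.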
ILP in hardware (2/2)
• Multiple Issue
– Static
• Static Superscalar
• VLIW
– Dynamic superscalar
• Speculation
– Branch, data
• ILP limit studies
ILP in hardware: Questions
• True or False:
1. The original Tomasulo's algorithm only allows
reordering within basic blocks
2. In P6, if it weren’t for precise interrupts, it would be
okay to retire instructions out-of-order as long as
they had finished executing and no branch was being
skipped over.
3. ILP in hardware is limited in scope due to the
“instruction window” which is basically the size of
the RS.
Quick idea: SMT
• One processor, two threads.
Caching (1/2)
• There is a huge amount of stuff associated
with caching. The important stuff
– Locality
• Temporal/Spatial
• 3’Cs model
• stack distance model
– Nuts-and-bolts
• Replacement policies (LRU, pseudo-LRU)
• Performance (hit rate, T_hit, T_miss, average access time)
• Write back/Write thru
• Block size
– Basic improvement
• Multi-level cache
• Critical word first
• Write buffers
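In formula form, the average-access-time bullet above (standard definitions; the numbers in the example are made up for illustration):

\[
\mathrm{AMAT} = T_{\mathrm{hit}} + \mathrm{MissRate} \times T_{\mathrm{miss}}
\]

and with a second-level cache,

\[
\mathrm{AMAT} = T_{\mathrm{hit,L1}} + MR_{\mathrm{L1}} \left( T_{\mathrm{hit,L2}} + MR_{\mathrm{L2}} \times T_{\mathrm{mem}} \right).
\]

Example: T_hit,L1 = 1 cycle, MR_L1 = 5%, T_hit,L2 = 10 cycles, local MR_L2 = 20%, T_mem = 100 cycles gives AMAT = 1 + 0.05 × (10 + 0.2 × 100) = 2.5 cycles.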
Caching (2/2)
• Non-standard caches
– Victim
– Skew
– Trace (just know very basics)
• Misc.
– Virtual addresses and caching
– Impact of prefetching
– Latency hiding with OO execution
Cache: Questions (1/2)
• Changing __________ has an impact on
compulsory misses.
• A victim cache is more likely to help with
________ than ________ though it can help
both (3’Cs)
• At least _____ bits are required to keep exact
track of LRU in a 5-way associative cache.
Cache: Questions (2/2)
• A ____________ cache has a number of
sets equal to the number of lines in the
cache.
• A fully-associative cache with N lines will
miss an access that has a stack distance
of ________ (state the largest range you
can).
Multi-processor
• Amdahl’s law as it applies to MP.
• Bus-based multi-processor
– Snooping
– MESI
– Bus transaction types (BRL etc.)
• Distributed-shared
– Directory schemes
• Synchronization
– Critical sections
– Spin-locks
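To tie the spin-lock bullet to code, a minimal sketch using C11 atomics (the lecture version may be written directly in terms of a test-and-set or LL/SC instruction; the names here are illustrative):

#include <stdatomic.h>

static atomic_flag lock_flag = ATOMIC_FLAG_INIT;

/* Acquire: atomically test-and-set; keep spinning while the flag was already set. */
void spin_lock(void)
{
    while (atomic_flag_test_and_set_explicit(&lock_flag, memory_order_acquire))
        ;  /* busy-wait */
}

/* Release: clear the flag so one waiting processor wins the next test-and-set. */
void spin_unlock(void)
{
    atomic_flag_clear_explicit(&lock_flag, memory_order_release);
}

A test-and-test-and-set variant first spins on an ordinary load, so waiting processors spin out of their own caches (Shared under MESI) instead of generating a bus transaction on every attempt.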
Multi-processor: Question
• Under the MESI protocol what is the
advantage of having a distinct clean and
dirty exclusive state?
• For the access sequence below, fill in the Hit/Miss result, whether HIT/HITM is asserted, the bus transaction(s) generated, and the “4C” miss type (if any), and track the cache contents (Address, State) of Proc 1 and Proc 2:

Processor   Address   Load/store   Hit/Miss   HIT/HITM   Bus transaction(s)   “4C” miss type (if any)
1           0x100     Load
1           0x200     Load
1           0x100     Load
1           0x110     Store
2           0x200     Load
1           0x200     Store
1           0x110     Load
1           0x110     Store
2           0x110     Load
2           0x110     Store

Proc 1 cache: Set 0 (Address, State), Set 1 (Address, State)
Proc 2 cache: Set 0 (Address, State), Set 1 (Address, State)
Software techniques for ILP (1/2)
• Pipeline scheduling
– Reordering instructions in a basic block to remove
pipe stalls
– Loop unrolling
• Static information passed to processor
– Static branch prediction
– Static dependence information
• Loop issues
– Detecting loop dependencies
– Software pipelining
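To make the loop-unrolling bullet above concrete, a C-level sketch (the lecture examples were at the assembly level; an unroll factor of 4 is chosen purely for illustration):

/* Original loop: one add per iteration, plus index/branch overhead every iteration. */
void add_scalar(double *a, int n, double s)
{
    for (int i = 0; i < n; i++)
        a[i] = a[i] + s;
}

/* Unrolled by 4 (assumes n is a multiple of 4; otherwise a cleanup loop is needed).
 * The loop overhead is amortized over four independent adds, which the scheduler
 * (or an out-of-order core) can overlap to hide latency. The cost is larger code
 * size and more registers live at once. */
void add_scalar_unrolled(double *a, int n, double s)
{
    for (int i = 0; i < n; i += 4) {
        a[i]     = a[i]     + s;
        a[i + 1] = a[i + 1] + s;
        a[i + 2] = a[i + 2] + s;
        a[i + 3] = a[i + 3] + s;
    }
}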
Software techniques for ILP (2/2)
• Global code scheduling
– Predicated instruction and CMOV
– Memory reference speculation
– Issues with preserving exception behavior
• IA-64 as a case study of hardware support for
software ILP techniques
– Speculative loads
– Advanced loads
– Software pipelining optimizations
Software techniques for ILP: Questions
• What is the most significant disadvantage of
loop unrolling?
• Using CMOV, rewrite the following code snippet,
removing the branch. Don’t change exception
behavior, and assume DIV only causes an exception
if R3=0.
BNE R1 R2 skip
R1=R2/R3
skip: nop
Power
• Understand why it’s important
• Power vs. Energy
• How it’s related to the existence of multicore
• Understand voltage scaling issues
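The standard relations behind these bullets (the usual approximations; exact exponents depend on the technology):

\[
P_{\mathrm{dynamic}} \approx \alpha\, C\, V_{dd}^{2}\, f,
\qquad
E \approx P \times t_{\mathrm{exec}}
\]

Because the achievable clock frequency scales roughly with V_dd, lowering the voltage (and frequency) cuts power roughly cubically while energy per task falls roughly quadratically, at the cost of longer execution time. Running several cores at lower voltage/frequency is one way to keep raising throughput within a fixed power budget, which is the multicore connection.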
Power: questions
• How could a processor use less power but
more energy for a given task?
• Explain how power issues drove
multiprocessors to the desktop.
And some random questions
• Why is wake-up/select a critical part of an OoO
machine?
– What are the advantages and disadvantages of each of the
following:
• Have wakeup/select and execute take one cycle (total) for most
instructions
• Use speculative wakeup
– What other options do you have?
• How are memory dependencies different than register
dependencies?
– How do we resolve both?
– In what way do their differences demand different solutions?
Random questions
• Give an example where an LSQ could
double the CPI of a given piece of code
(on an OoO machine) where that code
contains only one load and one store.
• Our mantra has been “make the common
case fast”
– Where have we used that?