Project Presentation
Fetch Directed Prefetching - a Study
CS 752 Project
Gokul Nadathur
Nitin Bahadur
Sambavi Muthukrishnan
Gokul, Nitin, Sambavi
Motivation
Execution engine limited by fetch bandwidth
effect of memory latency on fetch
correlation between i-cache stalls and branch predictor
rate at which branch predictor and BTB can be cycled
With increase in ILP, there is a need to increase fetch performance
Fetch Directed Architecture
[Figure: block diagram of the fetch directed architecture. Components: Branch Predictor, Fetch Target Buffer, Fetch Target Queue, Instruction Fetch, Prefetch Filtration Mechanism, Prefetch Instruction Queue, Prefetch, Prefetch Buffer, L2 Cache]
Decoupled Branch Predictor
has its own PC
runs independently of the fetch pipeline stage
makes a prediction each cycle
unaffected by i-cache stalls
Problem: the predictor may not have up-to-date branch history
Fetch Target Buffer and Fetch Target Queue
Fetch Target Buffer
Stores fall-through and target addresses for taken branches
Accessed with a prediction from the branch predictor each cycle
Fills single or multiple cache line blocks into the FTQ
Fetch Target Queue
Contains blocks of instruction addresses to be executed next
FTQ entries are dequeued by fetch engine
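
The interaction between the decoupled predictor, the FTB and the FTQ can be sketched roughly as follows. This is a minimal illustration, not the simulator used in the project: the names `FTBEntry`, `FetchBlock`, `DecoupledFrontEnd`, `predict_cycle`, `fetch_cycle` and `DEFAULT_BLOCK_BYTES` are hypothetical, and the sequential fallback on an FTB miss is an assumed policy that the slides do not describe.

```python
from collections import deque
from dataclasses import dataclass

DEFAULT_BLOCK_BYTES = 32  # assumed block size used on an FTB miss


@dataclass
class FTBEntry:
    fall_through: int  # address just past the fetch block
    target: int        # target address if the ending branch is taken


@dataclass
class FetchBlock:
    start: int
    end: int  # exclusive


class DecoupledFrontEnd:
    """Predictor with its own PC that fills an FTQ ahead of fetch."""

    def __init__(self, ftb, predict_taken, ftq_size=8, start_pc=0):
        self.ftb = ftb                      # dict: block start address -> FTBEntry
        self.predict_taken = predict_taken  # callable: pc -> bool
        self.ftq = deque()
        self.ftq_size = ftq_size
        self.pc = start_pc                  # the predictor's own PC

    def predict_cycle(self):
        """One prediction per cycle, regardless of i-cache stalls."""
        if len(self.ftq) >= self.ftq_size:
            return
        entry = self.ftb.get(self.pc)
        if entry is None:
            # FTB miss: assume a fixed-size sequential block (hypothetical).
            block = FetchBlock(self.pc, self.pc + DEFAULT_BLOCK_BYTES)
            next_pc = block.end
        else:
            block = FetchBlock(self.pc, entry.fall_through)
            taken = self.predict_taken(self.pc)
            next_pc = entry.target if taken else entry.fall_through
        self.ftq.append(block)
        self.pc = next_pc

    def fetch_cycle(self, icache_stalled):
        """The fetch engine dequeues one FTQ entry unless it is stalled."""
        if icache_stalled or not self.ftq:
            return None
        return self.ftq.popleft()
```

While `fetch_cycle` is stalled, repeated calls to `predict_cycle` keep adding blocks to the FTQ until it is full, which is what gives the prefetcher addresses to work ahead on.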
Prefetch Filter and Prefetch Instruction Queue
Prefetch Instruction Queue
Contains queue of cache blocks to be prefetched
Prefetch mechanism dequeues PIQ entries and performs the prefetching
Prefetch Filter
Takes entries from the FTQ, filters them and inserts them into the PIQ
Enables intelligent prefetching!
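
A rough sketch of the FTQ-to-PIQ path is given below. The slides do not say which filtering criterion is used, so this example assumes one plausible rule: skip cache lines that are already in the I-cache or already queued in the PIQ. The names `PrefetchFilter`, `lines_in_block`, `scan`, `next_prefetch` and the 32-byte `CACHE_LINE_BYTES` are hypothetical.

```python
from collections import deque

CACHE_LINE_BYTES = 32  # assumed cache line size


def lines_in_block(start, end, line_bytes=CACHE_LINE_BYTES):
    """Return the cache-line addresses touched by the byte range [start, end)."""
    first = start - start % line_bytes
    return list(range(first, end, line_bytes))


class PrefetchFilter:
    """Moves useful cache-line addresses from FTQ blocks into the PIQ."""

    def __init__(self, icache_lines, piq_size=16):
        self.icache_lines = icache_lines  # set of line addresses in the I-cache
        self.piq = deque()
        self.piq_size = piq_size

    def scan(self, ftq_blocks):
        """Filter (start, end) fetch blocks and enqueue the surviving lines."""
        for start, end in ftq_blocks:
            for line in lines_in_block(start, end):
                if len(self.piq) >= self.piq_size:
                    return
                # Assumed filter rule: drop lines that are cached or queued.
                if line in self.icache_lines or line in self.piq:
                    continue
                self.piq.append(line)

    def next_prefetch(self):
        """The prefetch mechanism dequeues one PIQ entry at a time."""
        return self.piq.popleft() if self.piq else None
```

Filtering before enqueueing keeps the PIQ from filling with requests that would not bring anything new into the prefetch buffer.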
Stream Buffers
[Figure: stream buffer placed between the L1 I-cache and the L2 I-cache. The buffer is a FIFO of (tag, cache block) entries with a head and a tail, plus a tag and comparator]
Gokul, Nitin, Sambavi
7
Prefetching in the Fetch Directed Architecture
Similar to stream buffers
Addresses given by PIQ
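
The difference between the two schemes can be illustrated with one FIFO buffer and two address sources. This is only a sketch under stated assumptions: a conventional stream buffer is modelled as prefetching the sequential lines that follow a miss, with only the head entry compared on lookup, while the fetch directed variant takes its addresses from the PIQ. The names `FifoPrefetchBuffer`, `sequential_source`, `piq_source` and the 32-byte `LINE` are hypothetical, and the actual data transfer from L2 is not modelled.

```python
import itertools
from collections import deque

LINE = 32  # assumed cache line size in bytes


def sequential_source(miss_line):
    """Stream-buffer style: successive lines after the missing line."""
    addresses = itertools.count(miss_line + LINE, LINE)
    return lambda: next(addresses)


def piq_source(piq):
    """Fetch directed style: addresses are taken from the PIQ."""
    return lambda: piq.popleft() if piq else None


class FifoPrefetchBuffer:
    """FIFO of prefetched line addresses; only the head is compared."""

    def __init__(self, next_address, depth=4):
        self.next_address = next_address  # callable giving the next line or None
        self.depth = depth
        self.fifo = deque()

    def refill(self):
        """Issue prefetches until the buffer is full or no address is left."""
        while len(self.fifo) < self.depth:
            address = self.next_address()
            if address is None:
                break
            self.fifo.append(address)

    def lookup(self, line):
        """On an I-cache miss, hit only if the head entry matches."""
        if self.fifo and self.fifo[0] == line:
            self.fifo.popleft()
            return True
        return False
```

With `sequential_source` the buffer can only follow straight-line code, whereas with `piq_source` it follows whatever path the branch predictor has laid out, including taken branches.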
Simulation Results
[Figure: Stream Buffers Prefetching Performance. Y-axis: IPC (0.8 to 1.9); x-axis: Benchmarks (compress, perl, li, vortex); series: No prefetching, Stream Buffers w/ FIFO]
Simulation Results
[Figure: Study of stalled i-fetches due to prefetch. Y-axis: % of total cycles (0 to 20); x-axis: Benchmark (compress, li, vortex, perl); series: Cycles lost due to harmful prefetch, Cycles gained due to prefetch hits]
Simulation Results
[Figure: IPC Speedup vs Number of Stream Buffers (FIFO). Y-axis: % Speedup over no Stream Buffer (0 to 6); x-axis: Number of Stream Buffers (2, 4, 8, 16); series: perl, li, vortex, compress]
Simulation Results
[Figure: BTB vs FTB. Y-axis: IPC (0 to 1.2); x-axis: Benchmark (li, perl, vortex); series: BTB, FTB]
Simulation Results
[Figure: Improvement in IPC using fetch directed prefetching. Y-axis: % IPC increase (0 to 10); x-axis: Benchmark (compress, li, perl, vortex); series: IPC improvement]
Simulation Results
[Figure: Improvement in IPC using different techniques. Y-axis: % IPC improvement (0 to 10); x-axis: Benchmark (compress, li, perl, vortex); series: Stream Buffers over No Stream Buffers, Fetch Directed Prefetching over no prefetching]
Conclusions
Prefetching definitely helps
Fetch directed architecture aids prefetching
Optimal results require a sophisticated memory hierarchy