Project Presentation

Fetch Directed Prefetching: a Study
CS 752 Project
Gokul Nadathur
Nitin Bahadur
Sambavi Muthukrishnan
Gokul, Nitin, Sambavi
1
Motivation

Execution engine limited by fetch bandwidth
  effect of memory latency on fetch
  correlation between i-cache stalls and branch predictor
  rate at which branch predictor and BTB can be cycled
With increase in ILP, there is a need to increase fetch performance
Fetch Directed Architecture
[Figure: block diagram of the fetch directed architecture. Labeled components: Branch Predictor, Fetch Target Buffer, Fetch Target Queue, Instruction Fetch, Prefetch Filtration Mechanism, Prefetch Instruction Queue, Prefetch, Prefetch Buffer, L2 Cache.]
Decoupled Branch Predictor





has its own PC
runs independently of the fetch pipeline stage
makes a prediction each cycle
unaffected by i-cache stalls
Problem: may not have updated branch history
Fetch Target Buffer and Fetch Target Queue
Fetch Target Buffer
  Stores fall-through and target addresses for taken branches
  Accessed with a prediction from the branch predictor each cycle
  Fills single or multiple cache line blocks into the FTQ
Fetch Target Queue
  Contains blocks of instruction addresses to be executed next
  FTQ entries are dequeued by the fetch engine
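The interaction between the FTB and the FTQ can be sketched in a few lines of Python. This is only an illustrative model, not the simulator used in the project: the class and function names, the dictionary-based FTB, the FTQ capacity, and the fixed-size sequential block used on an FTB miss are all assumptions made for the example.

```python
from collections import deque
from dataclasses import dataclass

DEFAULT_BLOCK_BYTES = 32  # assumed block size used on an FTB miss (hypothetical)


@dataclass
class FTBEntry:
    fall_through: int  # address just past the fetch block
    target: int        # target address if the ending branch is taken


class FetchTargetBuffer:
    """Toy FTB indexed by fetch-block start address (hypothetical model)."""

    def __init__(self):
        self.entries = {}

    def insert(self, start, fall_through, target):
        self.entries[start] = FTBEntry(fall_through, target)

    def lookup(self, pc, predict_taken):
        """Return ((block_start, block_end), next_pc) for one prediction."""
        entry = self.entries.get(pc)
        if entry is None:
            # FTB miss: assume a fixed-size sequential block (assumption).
            end = pc + DEFAULT_BLOCK_BYTES
            return (pc, end), end
        next_pc = entry.target if predict_taken else entry.fall_through
        return (pc, entry.fall_through), next_pc


def fill_ftq(ftb, ftq, pc, predictions, capacity=4):
    """One FTB access per prediction; stop when the FTQ is full."""
    for taken in predictions:
        if len(ftq) >= capacity:
            break
        block, pc = ftb.lookup(pc, taken)
        ftq.append(block)
    return pc


ftb = FetchTargetBuffer()
ftb.insert(0x100, fall_through=0x110, target=0x200)
ftb.insert(0x200, fall_through=0x220, target=0x100)

ftq = deque()
next_pc = fill_ftq(ftb, ftq, 0x100, predictions=[True, False, True])

# The fetch engine consumes the oldest fetch block first.
fetched = ftq.popleft()
```

Keeping the predictor's PC (`next_pc` here) separate from the fetch engine's position is what lets the FTQ run ahead of fetch.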
Prefetch Filter and Prefetch Instruction Queue
Prefetch Instruction Queue
  Contains queue of cache blocks to be prefetched
  Prefetch mechanism dequeues PIQ and performs the prefetching
Prefetch Filter
  Takes entries from FTQ, filters them and inserts them into PIQ
  Enables intelligent prefetching
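A minimal sketch of the FTQ-to-PIQ path follows. The slides do not state the filtering criterion, so the one used here (skip cache lines already present in the i-cache or already queued in the PIQ) is an assumption, as are the line size, the PIQ capacity, the function names, and the choice to scan the FTQ without removing entries.

```python
from collections import deque

LINE_BYTES = 32  # assumed cache line size (hypothetical)


def lines_in_block(start, end):
    """Line-aligned addresses covered by the fetch block [start, end)."""
    first = start // LINE_BYTES
    last = (end - 1) // LINE_BYTES
    return [n * LINE_BYTES for n in range(first, last + 1)]


def filter_into_piq(ftq, piq, icache_lines, piq_capacity=8):
    """Scan FTQ entries and enqueue only lines worth prefetching.

    Assumed filter: drop lines already in the i-cache or already in the PIQ.
    The FTQ is only read here; the fetch engine is what dequeues it.
    """
    for start, end in ftq:
        for line in lines_in_block(start, end):
            if len(piq) >= piq_capacity:
                return
            if line in icache_lines or line in piq:
                continue
            piq.append(line)


ftq = deque([(0x100, 0x110), (0x200, 0x248), (0x100, 0x110)])
icache_lines = {0x100}  # lines assumed to be resident in the i-cache
piq = deque()
filter_into_piq(ftq, piq, icache_lines)
```

In this example the two blocks at 0x100 are filtered out because their line is already cached, while the block starting at 0x200 spans three lines and contributes three PIQ entries.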
Stream Buffers
[Figure: a stream buffer placed between the L1 I-cache and the L2 I-cache. The buffer is a FIFO of (tag, cache block) entries with a head and a tail, and a tag and comparator at the head.]
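The FIFO behaviour in the figure can be modeled roughly as below. This sketch follows the conventional sequential stream buffer design (compare only the head tag, refill at the tail with the next sequential line, flush and restart on a mismatch); the slides show only the diagram, so the class name, depth, line size, and restart policy are assumptions for illustration.

```python
from collections import deque


class StreamBuffer:
    """Toy sequential stream buffer; only tags (line addresses) are modeled."""

    def __init__(self, depth=4, line_bytes=32):
        self.depth = depth
        self.line_bytes = line_bytes
        self.fifo = deque()    # head is the leftmost entry
        self.next_line = None  # next sequential line to prefetch

    def _refill(self):
        # Prefetch sequential lines into the tail until the FIFO is full.
        while len(self.fifo) < self.depth:
            self.fifo.append(self.next_line)
            self.next_line += self.line_bytes

    def on_l1_miss(self, line_addr):
        """Return True if the head entry supplies the missing line."""
        if self.fifo and self.fifo[0] == line_addr:
            self.fifo.popleft()
            self._refill()
            return True
        # Head mismatch: flush and restart the stream after the miss address.
        self.fifo.clear()
        self.next_line = line_addr + self.line_bytes
        self._refill()
        return False


sb = StreamBuffer(depth=4, line_bytes=32)
results = [sb.on_l1_miss(addr) for addr in (0x100, 0x120, 0x180)]
```

The third access misses even though 0x180 is in the buffer, because this model compares only the head tag, as the single comparator in the figure suggests.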
Prefetching in the Fetch Directed Architecture


Similar to stream buffers
Addresses given by PIQ
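The difference from a sequential stream buffer can be illustrated with a small model in which the prefetch addresses come from the PIQ rather than from a next-line counter. The buffer size, the associative lookup on an i-cache miss, and all names are assumptions made for this sketch, not details taken from the slides.

```python
from collections import deque


class PrefetchBuffer:
    """Toy prefetch buffer filled with addresses dequeued from the PIQ."""

    def __init__(self, entries=4):
        self.entries = entries
        self.lines = deque()  # prefetched line addresses, oldest first

    def issue_prefetches(self, piq):
        """Dequeue PIQ addresses while the buffer has room; return them."""
        issued = []
        while piq and len(self.lines) < self.entries:
            line = piq.popleft()
            self.lines.append(line)
            issued.append(line)
        return issued

    def on_icache_miss(self, line_addr):
        """Assumed associative lookup: any buffered line can satisfy a miss."""
        if line_addr in self.lines:
            self.lines.remove(line_addr)
            return True
        return False


piq = deque([0x200, 0x220, 0x240, 0x300, 0x320])
pb = PrefetchBuffer(entries=4)
first_issue = pb.issue_prefetches(piq)   # fills the four free entries
hit = pb.on_icache_miss(0x240)           # frees one entry
miss = pb.on_icache_miss(0x400)          # never prefetched
second_issue = pb.issue_prefetches(piq)  # uses the freed entry
```

Because the PIQ is fed from predicted fetch blocks, the buffered lines (0x300 after 0x240 here) need not be sequential.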
Simulation Results
[Figure: Stream Buffers Prefetching Performance. Y-axis: IPC (0.8 to 1.9). X-axis: Benchmarks (compress, perl, li, vortex). Series: No prefetching; Stream Buffers w/ FIFO.]
Simulation Results
[Figure: Study of stalled i-fetches due to prefetch. Y-axis: % of total cycles (0 to 20). X-axis: Benchmark (compress, li, vortex, perl). Series: Cycles lost due to harmful prefetch; Cycles gained due to prefetch hits.]
Simulation Results
[Figure: IPC Speedup vs Number of Stream Buffers (FIFO). Y-axis: % Speedup over no Stream Buffer (0 to 6). X-axis: Number of Stream Buffers (2, 4, 8, 16). Series: perl, li, vortex, compress.]
Simulation Results
[Figure: BTB vs FTB. Y-axis: IPC (0 to 1.2). X-axis: Benchmark (li, perl, vortex). Series: BTB; FTB.]
Simulation Results
[Figure: Improvement in IPC using fetch directed prefetching. Y-axis: % IPC increase (0 to 10). X-axis: Benchmark (compress, li, perl, vortex). Series: IPC improvement.]
Simulation Results
[Figure: Improvement in IPC using different techniques. Y-axis: % IPC improvement (0 to 10). X-axis: Benchmark (compress, li, perl, vortex). Series: Stream Buffers over No Stream Buffers; Fetch Directed Prefetching over no prefetching.]
Conclusions



Prefetching definitely helps
Fetch directed architecture aids prefetching
Optimal results require a sophisticated memory hierarchy