Defining Wakeup Width for
Efficient Dynamic Scheduling
A. Aggarwal, O. Ergin – Binghamton University
M. Franklin – University of Maryland
Presented by: Deniz Balkan
Dynamic Scheduler
• Workings of a dynamic scheduler
– Wakeup dependent instructions
– Select instructions from a pool of ready instructions
• Both these operations form a critical path
• Increase of a single cycle in this critical path
impacts performance
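The wakeup and select steps listed above can be sketched in software as follows. This is a minimal illustrative model, not the hardware described in the talk: the `Entry` class, its fields, and the oldest-first selection policy are assumptions introduced here.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Entry:
    """One issue-queue entry (hypothetical model for illustration)."""
    age: int                          # smaller value = older instruction
    dest_tag: Optional[int]           # None for instructions producing no result
    waiting_on: Set[int] = field(default_factory=set)  # source tags not yet seen

    @property
    def ready(self) -> bool:
        return not self.waiting_on


def wakeup(queue, broadcast_tags):
    """Wakeup: compare broadcast tags against every entry's pending sources."""
    tags = set(broadcast_tags)
    for entry in queue:
        entry.waiting_on -= tags


def select(queue, issue_width):
    """Select: pick up to issue_width ready entries, oldest first (assumed policy)."""
    ready = sorted((e for e in queue if e.ready), key=lambda e: e.age)
    return ready[:issue_width]
```

In hardware both steps must complete back to back for dependent instructions to issue in consecutive cycles, which is why the pair forms a critical path.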
Implications of a large Dynamic
Scheduler
• Large dynamic scheduler has the potential to
exploit more ILP
– Larger issue queue
– Larger issue width
• Implications
– Longer wire delays associated with driving register tags
– Longer wire delays in driving tag comparison results
– Longer select logic latency
• Overall increased scheduler latency, resulting
in slower clock speed
Contributions of this paper
• Wakeup width definition – effective number of
results used for instruction wakeup
– Usually equal to the issue width
• Reduced wakeup width dynamic scheduler
– Issue width remains the same
– Reduces instruction wakeup latency, energy
consumption, and area
– Less than 2% reduction in IPC
Program Behavior Study
• Not all instructions produce a result
– Branch and store instructions form about 30%
• Entire issue width of the processor not used
in every cycle
• Average number of tags generated per cycle
considerably less than the processor issue
width
Tags generated in a cycle
• To generate more tags per cycle, a fetch, issue, and commit width of 12 was used
• Almost 50% of cycles have either 0 or 1 tag generated, even with a large
issue width
• About 80% of the cycles have 3 or fewer tags generated
Useful tags
• Not all the generated tags are immediately
useful
– Branch mispredictions lead to tags generated along
wrong path, and tags not immediately required
– Dependent instructions not present in issue queue or
waiting for other operands
• Average number of useful tags in a cycle
even less than the average number of tags
generated in a cycle
Useful tags
Only about 50-60% of instructions produce a tag that is immediately required
Reduced Wakeup Width
Dynamic Scheduler
• Wakeup width reduced while retaining the
issue width intact
– Some tags may have to wait before waking up the
dependent instructions
• Performance impact is not expected to be
high
– Soon there will be cycles with fewer tags
– Waiting tags can use the available wakeup slots
– Delays in not immediately useful tags may not have any
performance impact
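The argument that waiting tags soon find a free wakeup slot can be checked with a small queueing sketch. This is an idealized model: it assumes one shared FIFO of waiting tags and ignores both the pairing of FUs onto tag-lines and the issue restrictions that waiting tags cause. The function names and the example trace are hypothetical.

```python
from collections import deque


def broadcast_schedule(tags_per_cycle, wakeup_width):
    """Return (cycle_produced, cycle_broadcast) for every tag, FIFO order."""
    waiting = deque()
    schedule = []
    cycle = 0
    while cycle < len(tags_per_cycle) or waiting:
        if cycle < len(tags_per_cycle):
            # Tags produced in this cycle join the waiting queue.
            waiting.extend([cycle] * tags_per_cycle[cycle])
        # At most wakeup_width tags are broadcast per cycle.
        for _ in range(min(wakeup_width, len(waiting))):
            schedule.append((waiting.popleft(), cycle))
        cycle += 1
    return schedule


def total_delay(schedule):
    """Sum of cycles each tag waited before being broadcast."""
    return sum(sent - produced for produced, sent in schedule)


# Hypothetical trace: a burst of 4 tags followed by quieter cycles.
trace = [4, 0, 1, 0]
print(total_delay(broadcast_schedule(trace, 6)))  # full wakeup width
print(total_delay(broadcast_schedule(trace, 3)))  # halved wakeup width
```

With this trace, halving the wakeup width delays only the one tag that overflows the burst, and it is broadcast in the following idle cycle.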
Hardware Implementation –
Conventional DS
• Select logic decides which instruction
executes on which FU
• Register tags of issued instructions
placed in tag-latches
• Enable signals controlled to enable
the drivers that drive the tags across
the instruction window
Hardware Implementation –
RWW DS
• Wakeup width reduced to half the
issue width
• Two tag latches/FUs share common
tag-lines
• If both tag-latches hold tags, only one
of them is driven, the other remains
in the tag-latch
• To prevent overwriting, 1-bit indicator
latch used to control the selection
process
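The shared tag-line arrangement can be modeled behaviorally as below. This is a sketch under stated assumptions: the class and method names are invented here, and the choice of which latch is driven when both hold tags (a tag that has already waited goes first, otherwise FU 0) is an assumption, since the slides do not specify it.

```python
class FUGroup:
    """Two FUs whose tag-latches share one set of tag-lines (hypothetical model)."""

    def __init__(self):
        self.latch = [None, None]        # tag held in each FU's tag-latch
        self.indicator = [False, False]  # True: latch holds a tag still waiting

    def can_issue(self, fu):
        """A tag-producing instruction may use this FU only if no tag is waiting."""
        return not self.indicator[fu]

    def issue(self, fu, tag):
        """Place the destination tag of an issued instruction in the FU's latch."""
        if not self.can_issue(fu):
            raise ValueError("issuing here would overwrite a waiting tag")
        self.latch[fu] = tag

    def drive(self):
        """Drive at most one tag onto the shared tag-lines; return it or None."""
        holders = [fu for fu in (0, 1) if self.latch[fu] is not None]
        if not holders:
            return None
        # Assumed policy: tags that already waited go first; ties go to FU 0.
        holders.sort(key=lambda fu: not self.indicator[fu])
        winner = holders[0]
        tag = self.latch[winner]
        self.latch[winner] = None
        self.indicator[winner] = False
        for fu in holders[1:]:
            self.indicator[fu] = True    # this tag stays in its latch
        return tag
```

The indicator bit is what the select logic consults, so a latch holding an undriven tag is never overwritten by a newly issued instruction.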
FU arbiter
• Decides the instruction to be executed on the
FU
• Conventional arbiter giving priority to oldest
instruction
Grant1 = NOT req0 AND req1 AND enable
• Arbiter with RWW dynamic scheduler, where
“a” is the value of the indicator latch for the
arbiter
Grant1 = NOT req0 AND NOT a AND req1 AND enable
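The grant logic can be written out as a small truth function. Two readings are assumed here and are not confirmed by the slides: req0 is the older, higher-priority request, so Grant1 requires req0 to be inactive; and a = 1 means the FU's tag-latch still holds a waiting tag, which blocks every grant for that FU. The function name and the treatment of Grant0 are illustrative.

```python
def grants(req0, req1, enable, a=False):
    """Two-request arbiter cell; a=False gives the conventional behavior.

    Assumptions: req0 has priority (oldest instruction), and a=True means
    a tag is waiting in the FU's tag-latch, so nothing may be granted.
    """
    grant0 = req0 and enable and not a
    grant1 = (not req0) and req1 and enable and not a
    return grant0, grant1


print(grants(True, True, True))           # older request wins
print(grants(False, True, True))          # younger request granted
print(grants(False, True, True, a=True))  # waiting tag blocks the FU
```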
Experimental Setup
• Simulator based on Simplescalar to collect
the performance statistics
• Delay, energy, and area estimation from the
actual VLSI layouts using SPICE, in a 0.18
micron 6 metal layer CMOS process (TSMC)
• Dynamic scheduler size – 128-entry issue
queue, 6-way issue width
Performance Results
• Compared to I6W6 (Issue Width 6, Wakeup
Width 6) configuration
– I6W3 has 15% lower wakeup logic latency
• IPC impact about 5% for I6W3
– Higher for high IPC FP benchmarks
– Significantly better than I3W3, with the same wakeup logic
latency as I6W3
IPC of FP benchmarks with
RWW
Reasons for IPC impact
• Instructions delayed due to waiting tags
• Issue slots wasted because of waiting tags
Reasons for IPC impact
• Delayed register tags have more impact than issue slot wastage
• As the wakeup width is reduced, the impact of delayed register tags increases
dramatically
Area and Energy Results
• Activation statistics obtained through
simulations, and the energy consumption
values from our detailed layouts
– I6W3 reduced wakeup logic energy consumption by 10%
• Area of the CAM cells (tag part of the
instruction window) reduces by about 30% for
I6W3
Reduced Wastage of Issue Slots
(RWIS)
• Issue slots wasted because no instructions
issued to FUs with already waiting tags
• Classified instructions into
– Tag-producing instructions
– Non-tag-producing instructions
• Can still issue non-tag-producing instructions
to FUs with waiting tags without overwriting
the tag value
• Type bit included with the instruction to
control issue
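The RWIS issue rule can be expressed as a simple predicate on the type bit and the FU's indicator bit. This is a minimal sketch: the function names and the representation of ready instructions as (name, produces_tag) pairs are assumptions made for illustration.

```python
def can_issue_rwis(produces_tag, indicator_set):
    """An FU with a waiting tag accepts only non-tag-producing instructions."""
    return not (produces_tag and indicator_set)


def pick_for_fu(ready, indicator_set):
    """Return the oldest admissible instruction name, or None.

    ready: list of (name, produces_tag) pairs, oldest first (assumed format).
    """
    for name, produces_tag in ready:
        if can_issue_rwis(produces_tag, indicator_set):
            return name
    return None


ready = [("add", True), ("store", False)]
print(pick_for_fu(ready, indicator_set=False))  # add
print(pick_for_fu(ready, indicator_set=True))   # store
```

Without RWIS the second call would leave the issue slot empty; with the type bit, the store fills it because it has no tag to place in the latch.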
Reduced Tag Delays (RTD)
• Register tags delayed when multiple tag-producing instructions issued to the FUs
sharing the tag-lines (FU-group)
• RTD limits the number of tag-producing
instructions issued to an FU-group
– Waiting tags of the previous cycle used for this purpose
• Non-tag-producing instructions can still be
issued to FUs with indicator bits set
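One possible reading of the RTD limit is sketched below. It assumes that RTD-N caps the number of tags left waiting in an FU-group at N, so the budget of tag-producing instructions in a cycle is the group's tag-line count plus N, minus the tags still waiting from the previous cycle. This formula, the function names, and the simplification that any FU in the group can take any instruction are assumptions, not details given in the slides.

```python
def max_tag_producers(tag_lines, max_waiting, waiting_prev):
    """Budget of tag-producing instructions for one FU-group this cycle (assumed rule)."""
    return max(0, tag_lines + max_waiting - waiting_prev)


def issue_to_group(ready, num_fus, tag_lines, max_waiting, waiting_prev):
    """Issue oldest-first, skipping tag producers once the budget is spent.

    ready: list of (name, produces_tag) pairs, oldest first (assumed format).
    """
    budget = max_tag_producers(tag_lines, max_waiting, waiting_prev)
    issued = []
    for name, produces_tag in ready:
        if len(issued) == num_fus:
            break
        if produces_tag:
            if budget == 0:
                continue  # would create too many waiting tags
            budget -= 1
        issued.append(name)  # non-tag-producing instructions are never limited
    return issued


ready = [("add", True), ("mul", True), ("store", False), ("sub", True)]
print(issue_to_group(ready, num_fus=3, tag_lines=1, max_waiting=1, waiting_prev=1))
print(issue_to_group(ready, num_fus=3, tag_lines=1, max_waiting=1, waiting_prev=0))
```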
Enhanced Performance
• RTD-1 (with a maximum of 1 waiting tag) is the most effective
• RWIS reduces the wastage of issue slots, RTD also reduces waiting register tags
• RTD-2 results in more instructions getting delayed (compared to RTD-1) due to
waiting register tags
Conclusions
• Larger dynamic schedulers can exploit more ILP, thus
increasing performance
• Larger dynamic scheduler results in longer scheduler
latency
• Reduced wakeup width (RWW) dynamic scheduler
exploits the property that the number of useful tags
generated per cycle is significantly less than the
issue width
• Significant reduction in wakeup logic latency and
dynamic scheduler area and energy consumption with
minimal IPC impact