Quinn Gaumer Duke University

Outline
 Motivation
 Previous SC optimizations
 SC++ Implementation
 SC++ Analysis
SC vs. RC
 SC
 Easy programming model (no different than uniprocessor)
 Slower programs
 RC
 Faster programs (20%)
 Software Assistance
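To make the SC side of this comparison concrete, here is a minimal sketch (not from the slides) that enumerates every sequentially consistent interleaving of the classic store-buffering litmus test. The programs `P0` and `P1` and the helper names are illustrative assumptions. Under SC the outcome (0, 0) never appears; a model that lets a load bypass an earlier pending store can additionally produce it, which is the kind of reordering SC gives up in exchange for the simpler programming model.

```python
from itertools import combinations

# Hypothetical litmus test: each operation is (kind, variable, value-or-register).
P0 = [("store", "x", 1), ("load", "y", "r0")]
P1 = [("store", "y", 1), ("load", "x", "r1")]


def interleavings(a, b):
    """Yield every merge of a and b that preserves each thread's program order."""
    n = len(a) + len(b)
    for slots in combinations(range(n), len(a)):
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in slots else next(ib) for i in range(n)]


def run(schedule):
    """Execute one global order against a single shared memory (the SC view)."""
    mem = {"x": 0, "y": 0}
    regs = {}
    for kind, var, arg in schedule:
        if kind == "store":
            mem[var] = arg
        else:
            regs[arg] = mem[var]
    return regs["r0"], regs["r1"]


def sc_outcomes():
    """Set of (r0, r1) results reachable under sequential consistency."""
    return {run(s) for s in interleavings(P0, P1)}


if __name__ == "__main__":
    print(sorted(sc_outcomes()))
```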
Previous SC optimizations
 Load forwarding
 Loads can return values even if other mem ops are pending
 When is this good?
 As long as it's not exposed to other processors
 When is it wrong?
 When invalidations are received before the speculative load is retired
 Problem: ROB can still fill up due to a store at the head…
Previous SC optimizations
 Store Buffering
 Waiting Stores moved to LSQ
 Problem: the reorder buffer must still stop retiring loads if stores are pending.
SC++
 Store-Store bypassing
 Speculative State for Memory
 Speculation Support
 Rollbacks infrequent
Store-Store Bypassing
 Speculative History Queue (SHiQ)
 Holds stores and completed instructions
 Also holds the information needed to roll back operations
[Figure: SHiQ diagram; labels: Store, Head, OP]
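The queue behaviour described on this slide might be sketched as follows. This is a simplified software model under stated assumptions, not the hardware design: the `Entry` fields, the `capacity` parameter, and the method names are all hypothetical. Instructions leave the reorder buffer into the queue even while an older store is pending, and entries are released only from the head once the store there has been performed.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    kind: str                        # "store" or "op" (any other completed instruction)
    addr: Optional[int] = None       # address, for stores only
    old_value: Optional[int] = None  # value the store overwrote (rollback information)
    performed: bool = True           # False while a store is still pending


class SHiQ:
    """Toy speculative history queue (all names here are illustrative assumptions)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = deque()

    def retire(self, entry):
        """Accept an instruction from the ROB; False means the queue is full."""
        if len(self.entries) >= self.capacity:
            return False
        self.entries.append(entry)
        return True

    def store_performed(self, addr):
        """Mark the oldest pending store to addr as performed, then drain the head.

        Returns the number of entries that stopped being speculative.
        """
        for e in self.entries:
            if e.kind == "store" and e.addr == addr and not e.performed:
                e.performed = True
                break
        drained = 0
        while self.entries and self.entries[0].performed:
            self.entries.popleft()
            drained += 1
        return drained


if __name__ == "__main__":
    q = SHiQ(capacity=3)
    q.retire(Entry("store", addr=0x40, old_value=0, performed=False))
    q.retire(Entry("op"))
    q.retire(Entry("store", addr=0x80, old_value=7, performed=False))
    print(q.store_performed(0x80))  # younger store finishes first; head still pending
    print(q.store_performed(0x40))  # head store finishes; the queue drains
```

In this model a full queue makes `retire` return False, which corresponds to retirement stalling again once the speculative history runs out of space.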
Memory Order Violations
 When is SC Violated?
 A block accessed by a speculative load or store is invalidated, replaced, or read by another processor.
 How is Violation Detected?
 Block Lookup Table (BLT) contains addresses of speculative memory ops
 Invalidations, replacements, and downgrades cause a search of the BLT for the address
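The detection step above can be sketched as a small lookup structure. This is an illustrative model only: the 64-byte block size, the reference counting, and the method names are assumptions rather than details from the slides. A coherence event signals a violation when its address falls in a block that a speculative memory operation has touched.

```python
BLOCK_BYTES = 64  # assumed cache block size
COHERENCE_EVENTS = {"invalidation", "replacement", "downgrade"}


class BlockLookupTable:
    """Toy BLT: tracks which blocks have outstanding speculative accesses."""

    def __init__(self):
        self._refs = {}  # block base address -> number of speculative ops

    @staticmethod
    def block_of(addr):
        """Round an address down to the base of its cache block."""
        return addr - (addr % BLOCK_BYTES)

    def add(self, addr):
        """Record a speculative load or store to addr."""
        block = self.block_of(addr)
        self._refs[block] = self._refs.get(block, 0) + 1

    def remove(self, addr):
        """Drop one speculative access once it is no longer speculative."""
        block = self.block_of(addr)
        self._refs[block] -= 1
        if self._refs[block] == 0:
            del self._refs[block]

    def violates(self, event, addr):
        """True if a coherence event hits a block with speculative accesses."""
        if event not in COHERENCE_EVENTS:
            raise ValueError(f"unknown coherence event: {event}")
        return self.block_of(addr) in self._refs


if __name__ == "__main__":
    blt = BlockLookupTable()
    blt.add(0x1004)
    print(blt.violates("invalidation", 0x1030))  # same 64-byte block as 0x1004
    print(blt.violates("invalidation", 0x1040))  # next block
```

Because the check is per block rather than per word, an event for a different word in the same block still counts as a hit in this sketch.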
Rollback
 Processor and memory state must be rolled back to the first memory operation that accessed the offending block
 Guarantee Forward Progress?
 Speculation is prohibited until all pending stores are performed.
 Rollback can be slow
 Requires flushing the pipeline and moving data between local caches.
 Optimizations
 Rollback multiple instructions/cycle
 Sending responses to invalidations immediately
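The memory side of the rollback rule on this slide might look like the sketch below. It is a simplification under stated assumptions: memory is a plain dictionary, the history is a list of hypothetical `(kind, addr, old_value)` tuples, blocks are assumed to be 64 bytes, and processor (register) state is not modeled. Undoing entries youngest-first restores the values that were in memory just before the first operation that touched the offending block.

```python
BLOCK_BYTES = 64  # assumed cache block size


def spec_store(memory, history, addr, value):
    """Perform a speculative store, logging the value it overwrites."""
    history.append(("store", addr, memory[addr]))
    memory[addr] = value


def spec_load(memory, history, addr):
    """Perform a speculative load, logging only that the block was accessed."""
    history.append(("load", addr, None))
    return memory[addr]


def rollback(memory, history, bad_addr):
    """Undo back to the first logged operation that touched bad_addr's block.

    Returns the history index to restart from, or None if no logged
    operation touched that block.
    """
    bad_block = bad_addr // BLOCK_BYTES
    first = next(
        (i for i, (_, addr, _) in enumerate(history)
         if addr // BLOCK_BYTES == bad_block),
        None,
    )
    if first is None:
        return None
    # Undo youngest-first so repeated stores to one address restore correctly.
    for kind, addr, old in reversed(history[first:]):
        if kind == "store":
            memory[addr] = old
    del history[first:]
    return first


def demo():
    """Illustrative run: a violation on block 0x80 undoes three operations."""
    memory = {0x40: 0, 0x80: 0, 0xC0: 0}
    history = []
    spec_store(memory, history, 0x40, 1)
    spec_load(memory, history, 0x80)
    spec_store(memory, history, 0xC0, 5)
    spec_store(memory, history, 0x80, 9)
    restart = rollback(memory, history, 0x80)
    return memory, history, restart


if __name__ == "__main__":
    print(demo())
```

In `demo`, the store to 0x40 is older than the first access to the offending block, so it survives the rollback while the later load and both later stores are discarded.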
Qualitative Analysis
 Must hold state to allow rollback of both processor and memory
 Detect rollbacks quickly
 Rollbacks are extremely slow… Does it matter?
 Data Races
 False Sharing
 Cache Conflicts
Results
 SC++ theoretically performs as well as RC
 SC++ can be physically limited in several ways
 Network Latency
 SHiQ Size
 Cache Size
 Why does each affect the speed of SC++ (relative to SC or RC)?