OOO vs. EPIC
Yingmin Li
Ting Yan
Qi Zhao
Outline



“Advantages” of EPIC
Critique
Conclusion
EPIC: Main Idea


“Smart compiler, dumb machine”
Finding parallelism
– Processor → compiler
– Software/hardware synergy

Processor design
– Avoid complexity and difficulty

ILP, SMT & CMP
EPIC: Predication




In OOO: dynamic branch prediction.
Larger basic blocks.
Control dep. → Data dep.
Eliminate misprediction & penalties.
EPIC: Speculation




OOO: dynamic hardware
Data speculation & control speculation
Bigger window
Reduce impact of memory latencies
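
As a rough illustration of control speculation, the sketch below hoists a load above the branch that guards it and defers any fault until the value is actually needed. It is a toy Python model, not IA-64 code: `NAT`, `speculative_load`, `check`, and `hoisted` are hypothetical names loosely modeled on the ld.s / chk.s pair, and a missing dictionary key stands in for a faulting address.

```python
NAT = object()  # assumed stand-in for a deferred-exception token


def speculative_load(memory, addr):
    """ld.s-style load: on a 'fault' (missing key) return NAT instead of raising."""
    return memory.get(addr, NAT)


def check(value, recover):
    """chk.s-style check: run the recovery code only if the speculation failed."""
    return recover() if value is NAT else value


def hoisted(memory, guard, addr):
    """Load moved above the branch that originally protected it."""
    value = speculative_load(memory, addr)  # issued early to hide memory latency
    if guard:                               # the original branch
        # Recovery code repeats the load non-speculatively; a real fault surfaces here.
        return check(value, lambda: memory[addr])
    return None                             # load result unused, so no exception is raised
```

With `memory = {8: 3}`, calling `hoisted(memory, True, 8)` returns 3, while `hoisted(memory, False, 999)` returns `None` without raising, even though address 999 is invalid.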
EPIC: Large Register Set



OOO: register renaming.
Easier to design than register renaming.
“Real” registers benefit some apps.
– Encryption alg., Numerical alg.

Avoid loss of invisible registers.
– Interruptions in OOO.
EPIC: Unique Features

Register Stack Engine (RSE).
– To deal with call/ return costs.
– Appears as an unlimited stack of physical registers.

Rotating register file.
– Software pipelining.
• Multiple loop iterations in flight at the same time.
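
A minimal sketch of how a rotating register file supports software pipelining, assuming a three-stage loop (load, add, store) that computes y[i] = x[i] + 1. The class `RotatingRegs` and the function `pipelined_add_one` are hypothetical names for illustration. Plain `if` tests stand in for the stage predicates, and the stages run in reverse order inside one kernel iteration so that each reads the value produced in the previous iteration.

```python
class RotatingRegs:
    """Toy rotating register file: logical names shift by one on every rotation."""

    def __init__(self, size):
        self.phys = [None] * size
        self.base = 0

    def _index(self, logical):
        return (self.base + logical) % len(self.phys)

    def read(self, logical):
        return self.phys[self._index(logical)]

    def write(self, logical, value):
        self.phys[self._index(logical)] = value

    def rotate(self):
        # After rotation, the value written as r[n] is visible as r[n + 1].
        self.base = (self.base - 1) % len(self.phys)


def pipelined_add_one(xs):
    """Software-pipelined y[i] = x[i] + 1 with up to three iterations in flight."""
    n = len(xs)
    regs = RotatingRegs(3)
    out = []
    for k in range(n + 2):                      # n kernel iterations plus 2 to drain
        if 0 <= k - 2 < n:                      # stage 3: store for iteration k - 2
            out.append(regs.read(2))
        if 0 <= k - 1 < n:                      # stage 2: add for iteration k - 1
            regs.write(1, regs.read(1) + 1)
        if k < n:                               # stage 1: load for iteration k
            regs.write(0, xs[k])
        regs.rotate()                           # rename registers for the next iteration
    return out
```

Because the renaming is done by rotation, the loop body never copies values between registers, and the same kernel code serves every iteration.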
Function Call

Register saving/restoring
– Processor?
– Compiler?

Register file
– Expensive
– Always idle
Predication




Computation of the branch condition is
on the critical path
Increases ICache footprint
Only half of the functional units are
effectively used if both “then” and
“else” are scheduled
Hard to implement out-of-order with full
predication
Predication
To compute if (a) x = t+1:
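
The original figure for this slide is not in the transcript. As a stand-in, here is a hedged Python sketch of if-conversion for this statement. The function names are hypothetical, and the model only shows the dependence structure: in the predicated form the add is always issued, and the predicate decides whether its result is committed.

```python
def branch_version(a, x, t):
    """Control-dependent form: the add executes only when the branch falls through."""
    if a:
        x = t + 1
    return x


def predicated_version(a, x, t):
    """If-converted form: no branch, the update is guarded by a predicate."""
    p = bool(a)           # compare: set predicate p from the condition
    tmp = t + 1           # the add is issued regardless of p
    x = tmp if p else x   # (p) x = tmp: result is committed only when p is true
    return x
```

Both functions return the same value for any inputs. The predicated form trades a possible misprediction for an instruction that occupies a functional unit even when its result is discarded, and the update of x now waits on the computation of p.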
Control Speculation


Why not just use prefetch, which will not
cause unexpected exceptions?
Techniques that exploit control speculation,
such as superblocks, increase code
length
Control prediction
Data Speculation

Moving a load above a possibly
conflicting store
– An advanced load and a checking load
(IA64)
– A run-time predictor
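
The advanced load / checking load pair can be sketched as follows. This is a toy model under stated assumptions, not the IA-64 definition: the `Alat` class is a hypothetical simplification of the advanced-load address table, tracking one address per register and matching on exact addresses only.

```python
class Alat:
    """Toy advanced-load address table: register name -> speculatively loaded address."""

    def __init__(self):
        self.entries = {}

    def advanced_load(self, memory, reg, addr):
        """Load early (above a possibly conflicting store) and record the address."""
        self.entries[reg] = addr
        return memory[addr]

    def store(self, memory, addr, value):
        """Perform the store and drop every entry that matches its address."""
        memory[addr] = value
        self.entries = {r: a for r, a in self.entries.items() if a != addr}

    def check_load(self, memory, reg, addr, speculative_value):
        """Keep the early value if the entry survived; otherwise reload."""
        if self.entries.get(reg) == addr:
            return speculative_value
        return memory[addr]


def run(store_addr):
    """Hoist a load of 0x10 above a store to store_addr, then check it."""
    memory = {0x10: 5, 0x20: 7}
    alat = Alat()
    value = alat.advanced_load(memory, "r1", 0x10)
    alat.store(memory, store_addr, 42)
    return alat.check_load(memory, "r1", 0x10, value)
```

Here `run(0x20)` returns 5 because the store did not conflict and the early value is kept, while `run(0x10)` returns 42 because the store invalidated the entry and the check reloads.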
Data speculation
Software Pipelining

For high performance technical
computing
– High trip-count loops

For commercial applications
– Low trip-count loops
EPIC: at least not a breakthrough

Design objective of EPIC:
– Moving hardware complexity to compiler
EPIC: at least not a breakthrough

The failure of EPIC:
– The compilation techniques developed for
EPIC apply almost equally well to OOO
– The hardware simplicity gained does not
obviously offset EPIC’s overhead
– Without dynamic information, the compiler
essentially cannot do some things well enough
The tragedy of cycle time

Why is there no obvious improvement in cycle
time?
– Mechanisms like the RSE increase die
complexity
– Compare and dependent branch in one
cycle
– Predicated execution depends on the
availability of many functional units
Dynamic path length: hey, IA64,
you wasted too much here






Speculation
Half of the predicated instructions
discarded
Restricted bundling
One base register
No sign-extended loads
No integer multiply or divide on the general
registers
CPI



No dynamic prediction
Larger object code (more GRs,
predicate registers, template bits,
restricted bundling, recovery code)
burdens instruction fetch
Recovery code may cause ICache
pollution or even a page fault