
OOE vs. EPIC
Emily Evans
Prashant Nagaraddi
Lin Gu
Objective

Our objective is to evaluate the claims
and counterclaims about OOE and
EPIC made in:
– “Is Out-of-Order Out of Date?” by
William S. Worley and Jerry Huck
– “A Critical Look at IA-64” by Martin
Hopkins
Outline
Analysis of ILP
 Analysis of Code Size
 Analysis of Hardware Complexity
 Analysis of Compiler Complexity
 Analysis of Power Consumption
 Comparison Methodology
 Conclusion

What is EPIC?
“One of our goals for EPIC was to retain VLIW's philosophy
of statically constructing the POE, but to augment it with
features, akin to those in a superscalar processor, that
would permit it to better cope with these dynamic factors.
The EPIC philosophy has the following key aspects to it.”
“Providing the ability to design the desired POE at compile time.”
“Providing features that permit the compiler to "play" the
statistics.”
“Providing the ability to communicate the POE to the
hardware.”
*From EPIC: An architecture for instruction-level parallel processors by Michael S.
Schlansker and B. Ramakrishna Rau.
Analysis of ILP

MH: Hardware provides good ILP because it
dynamically adjusts the instruction schedule based
on the actual execution path and cache misses, with
the use of:
– Large reorder buffers
– Register renaming
– Branch prediction
– Alias detection
WW & JH: Compiler can exploit ILP more effectively
with the use of:
– Massive resources -- large register set, more functional units
– Predication
– Speculation
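To make the predication bullet concrete, here is a minimal sketch of if-conversion, modeled in plain Python rather than IA-64 assembly. The function names are made up for illustration. The idea shown is only the general one: a compare produces a predicate and its complement, both arms are computed, and the predicates select which result is kept, so no branch is needed.

```python
# Illustrative model of if-conversion (predication). This is NOT IA-64 code;
# all names here are hypothetical and chosen only for this sketch.

def branchy_abs_diff(a, b):
    # Conventional version: a branch decides which subtraction runs.
    if a > b:
        return a - b
    return b - a


def predicated_abs_diff(a, b):
    # A single compare yields a predicate and its complement.
    p_true = a > b
    p_false = not p_true
    # Both arms are evaluated unconditionally; each result is gated by its
    # predicate, so there is no control-flow branch in this function.
    return int(p_true) * (a - b) + int(p_false) * (b - a)


# Both versions agree on every input in a small test range.
for x in range(-5, 6):
    for y in range(-5, 6):
        assert branchy_abs_diff(x, y) == predicated_abs_diff(x, y)
```

The cost is visible in the sketch: both subtractions always execute, which is why predication goes hand in hand with the "massive resources" bullet above.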
Analysis of ILP (cont.)

Our observation:
– From H&P book:


The SPECint benchmark shows that the Alpha 21264
and Pentium 4 considerably outperform the Itanium.
The SPECfp benchmark shows that the Itanium slightly
outperforms the Alpha 21264 and Pentium 4.
– These diagrams are not an absolute measurement
of the performance of OOE and EPIC.


Different implementations of the architectures may
perform differently.
As EPIC compilers improve over time, these
performance figures will change.
Analysis of Code Size

MH: Code size for IA-64 could be as much as
4 times that of x86 to perform the same work.

WW & JH: Code size will be larger, but the
instruction stream will contain fewer
branches. Also, there are mechanisms to
efficiently deliver instructions to the
processor.
Analysis of Code Size (cont.)

Our observation:
– Both sides agree that code size increases overall;
however, they disagree on the extent to which it
affects performance.
– EPIC code size will expand dramatically in some
cases.
– EPIC code size can also be smaller than OOE
code size in some cases.
– We expect that a mature optimizing compiler will
be able to deliver code of reasonable size. In any
case, a larger code size does not necessarily translate
linearly into performance loss.
Analysis of Hardware
Complexity

MH: To support features for greater ILP, EPIC
hardware will be quite complex.
– Predication requires more functional units
– NaT bits to allow deferring exceptions
– ALAT to allow loads before stores

WW & JH: IA-64 makes the hardware less
complex because it is not responsible for
detecting and scheduling the parallelism.
– No need for a reorder buffer, register renaming, etc.
Analysis of Hardware
Complexity (cont.)

Our observation:
– Is an EPIC processor more complex than an OOE
processor?

Example: Alpha 21264, two stages fewer (but more
stages don't necessarily mean more complexity)
– As mentioned in the H&P book, good techniques from
the ‘enemy camp’ are often borrowed. EPIC
processors are expected to be simple. However,
to support better ILP, they also rely on
hardware support, which makes them more
complex than expected.
Analysis of Compiler
Complexity

MH: It is very difficult to write a good EPIC compiler.
Profiling is also a burden:
– Not welcomed by programmers
– Hard to get and maintain a test suite
– Formidable task for large programs

WW & JH: OOE compilers are difficult to write as
well.
– OOE processors still need good compilers to ensure
performance gains.
– OOE compiler writers must understand the limitations of the
hardware and figure out how to work around them.
– Code profiling is only “slightly” more important for EPIC
processors.
Analysis of Compiler
Complexity (cont.)

Our observation:
– An optimizing compiler can help performance for both
OOE and EPIC processors.
– Profiling, which is a non-trivial task, adds
complexity to the compiler.
– An EPIC compiler has much more responsibility
than an OOE compiler, so it is likely to be more
complex.
– The EPIC philosophy aims to trade compiler
complexity for hardware simplicity. Whether this is
a critical disadvantage must be considered in the
context of overall system complexity and
performance.
Analysis of Power
Consumption

MH: Massive resources consume lots of
power.
– “Thus, IA-64 gambles that, in the future, power will
not be the critical limitation, …”

WW & JH: They left this issue out, perhaps
because they do not think it is a big problem.
Analysis of Power
Consumption (cont.)

Our observation:
– The use of massive resources is likely to consume
more power.
– Whether or not this will be a problem depends on
the target application area of the EPIC technology.


For servers and high-end workstations, power
consumption is less important.
For embedded systems, power consumption is likely to be
a critical issue.
– For EPIC really to be a ‘general purpose’
technology, power consumption control must be
considered.
Comparison Methodology

MH: Accumulating “facts” supporting a skeptical view
of EPIC.
– Example: EPIC stalls when OOE proceeds

WW & JH: Accumulating “facts” supporting an
optimistic view of EPIC.
– Example: Dynamic translation

Architecture design is a balance of CPI, frequency,
instruction count, application limitation, and cost.
There are always cases and countercases for every
solution. They need to be considered in an integrated
context.
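The balance of CPI, frequency, and instruction count can be sketched with the standard processor performance equation: execution time = instruction count × CPI / clock frequency. The two designs below are hypothetical and their numbers are invented purely for illustration. They are not measurements of any EPIC or OOE processor.

```python
# Sketch of the classic performance equation:
#   time = instruction_count * cycles_per_instruction / clock_frequency
# All inputs below are made-up values for illustration only.

def execution_time(instruction_count, cpi, frequency_hz):
    """Return execution time in seconds."""
    return instruction_count * cpi / frequency_hz


# Hypothetical design A: fewer instructions, higher CPI.
time_a = execution_time(instruction_count=1.0e9, cpi=1.0, frequency_hz=1.0e9)

# Hypothetical design B: 30% more instructions, but lower CPI.
time_b = execution_time(instruction_count=1.3e9, cpi=0.7, frequency_hz=1.0e9)

# A larger instruction count does not by itself decide the outcome;
# the product of all three factors does.
print(f"design A: {time_a:.2f} s, design B: {time_b:.2f} s")
```

With these invented inputs, design B finishes sooner despite executing more instructions, which is why no single factor can settle the comparison.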
Comparison Methodology
(cont.)

EPIC stalls when OOE proceeds
– This will happen in some cases.
– But, we must determine how this case actually
hurts performance.



A cache miss is not the common case.
Speculation makes this case even less common.
On a cache miss, an OOE processor is not expected to
proceed very far either.
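One way to determine how much this case hurts is a simple stall model: effective CPI = base CPI + misses per instruction × exposed miss penalty. The sketch below is a simplified assumption, not a model taken from either paper. The `overlap_fraction` parameter is a hypothetical knob for how much of the miss latency a processor hides by continuing to execute, and all numbers are invented.

```python
# Simplified stall model (an assumption for illustration, not from the papers):
#   effective_cpi = base_cpi + misses_per_instruction * exposed_penalty
# where exposed_penalty is the part of the miss latency that is not hidden.

def effective_cpi(base_cpi, misses_per_instruction, miss_penalty_cycles,
                  overlap_fraction):
    """overlap_fraction in [0, 1]: share of the miss latency that is hidden."""
    exposed_penalty = miss_penalty_cycles * (1.0 - overlap_fraction)
    return base_cpi + misses_per_instruction * exposed_penalty


# Hypothetical inputs: 1 miss per 100 instructions, 100-cycle penalty.
stalls_fully = effective_cpi(1.0, 0.01, 100, overlap_fraction=0.0)
hides_some = effective_cpi(1.0, 0.01, 100, overlap_fraction=0.3)

print(f"no overlap: {stalls_fully:.2f} CPI, 30% overlap: {hides_some:.2f} CPI")
```

The gap between the two results shrinks as misses become rarer or as the hidden fraction gets smaller, which is the substance of the three points above.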
Comparison Methodology
(cont.)

Dynamic translation
– It rarely gives much performance gain with highly
optimized code.
– Dynamo example:
Conclusion
– Both sides make claims about the EPIC
architecture without providing any quantitative
evidence.
– Quantitative evidence is necessary to conclude
that one architecture is superior to another.
– EPIC is a useful effort in the exploration of higher
ILP.


When evaluating it, we need to isolate the usefulness of
the architectural approach from a single specific
implementation of it.
The idea behind EPIC is good, but more time, effort, and
calm calculation are needed to know whether it works.