The Alpha 21364 Network Architecture Shubhendu S. Mukherjee, Peter Bannon, Steven Lang, Aaron Spink, and David Webb Alpha Development Group, Compaq HOT Interconnects 9 (2001) Presented.


The Alpha 21364
Network Architecture
Shubhendu S. Mukherjee, Peter Bannon, Steven Lang,
Aaron Spink, and David Webb
Alpha Development Group, Compaq
HOT Interconnects 9 (2001)
Presented by John Ingalls
ECE 259 - March 22, 2010
Summary of Features





Alpha 21364 is a 21264 core plus 1.75MB L2
on-die cache, 2-channel Rambus DRAM, I/O
controller, and router at 1.2GHz on 180nm.
Up to 128 processors in a system. All can
access others’ memory and I/O.
Directory cache coherence protocol.
2-D Torus interconnection network with
adaptive routing and deadlock-free fallback.
Request packets generally 3 flits in size, data
packets generally 18 flits in size. Flits have ECC.
Notable Features: Network Routing



Network is 2-D Torus.
Virtual Cut-Through Routing: A blocked packet’s
flits accumulate in the buffer of the router
where it is blocked.
Adaptive Routing: Packets stay within the minimum
rectangle between source and destination. Source
picks either dimension to send on; the algorithm
then prefers to keep packets on that dimension.
Fig. 3:
(pg. 2)
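A minimal sketch of minimum-rectangle routing on a 2-D torus, to make the idea above concrete. The coordinate scheme, the direction names (E/W/N/S), the tie-break on even-sized rings, and both function names are assumptions for illustration; the slides do not specify how the 21364 implements this.

```python
def minimal_offset(src, dst, size):
    """Signed shortest offset from src to dst on a ring of `size` nodes."""
    forward = (dst - src) % size      # hops going in the + direction
    backward = forward - size         # same destination via the wraparound link
    # Assumption: ties on even-sized rings go to the + direction.
    return forward if forward <= -backward else backward


def productive_directions(src, dst, width, height):
    """Directions that keep a packet inside the minimum rectangle."""
    dx = minimal_offset(src[0], dst[0], width)
    dy = minimal_offset(src[1], dst[1], height)
    dirs = []
    if dx > 0:
        dirs.append("E")
    elif dx < 0:
        dirs.append("W")
    if dy > 0:
        dirs.append("N")
    elif dy < 0:
        dirs.append("S")
    return dirs


def choose_direction(candidates, current_axis, blocked=()):
    """Prefer to stay on the current dimension; otherwise take any free one."""
    axis_of = {"E": "x", "W": "x", "N": "y", "S": "y"}
    free = [d for d in candidates if d not in blocked]
    for d in free:
        if axis_of[d] == current_axis:
            return d
    return free[0] if free else None


# Example on a hypothetical 4x4 torus: (0,0) -> (3,1) is one hop west
# (via wraparound) and one hop north.
candidates = productive_directions((0, 0), (3, 1), 4, 4)
next_hop = choose_direction(candidates, current_axis="y")
```

A packet with no free productive direction gets `None` here; in the real router that is where the deadlock-free channels come in.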
Notable Features: Deadlock Avoidance




Avoiding Coherence Deadlock: Separate virtual
channels for responses and requests.
Preserving I/O Consistency: Packets of the same
class (i.e. read or write) must use the same
virtual channel, and thus the same route, so
they retain their order within that class.
3 Virtual Channels per Dimension per Class:
Adaptive, VC0, and VC1.
Adaptive for bulk of traffic, VC0 and VC1 are
fixed-route deadlock-free “drain” for blocked
adaptive packets.
Notable Features: Deadlock Avoidance
Fig. 5:
(pg. 3)

VC0 and VC1 are mapped at boot time to prohibit
cyclic dependencies. Packets on VC0/1 can only
turn if they are at a corner of the minimum rectangle;
the adaptive virtual channel has no such restriction.
Packets can return to the adaptive channel once
it is no longer congested.
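A rough sketch of the fallback behavior described on these two slides: use the adaptive channel when it has room, drop to the fixed-route drain channel when blocked, and return to adaptive at a later hop. The `Channel` enum, the function names, and the idea of passing the boot-time VC0/VC1 choice in as a parameter are all assumptions; the slides do not say how that choice is made.

```python
from enum import Enum


class Channel(Enum):
    ADAPTIVE = "adaptive"
    VC0 = "VC0"
    VC1 = "VC1"


def select_channel(adaptive_has_space, drain_channel):
    """Prefer the adaptive channel; fall back to the deadlock-free drain.

    `drain_channel` stands in for the boot-time VC0/VC1 mapping
    (hypothetical parameter, not taken from the paper).
    """
    if adaptive_has_space:
        return Channel.ADAPTIVE
    return drain_channel


def channels_along_path(congested_hops, drain_channel):
    """Channel used at each hop, given per-hop adaptive congestion flags."""
    return [select_channel(not congested, drain_channel)
            for congested in congested_hops]


# A packet that hits congestion at its second hop and then recovers.
path = channels_along_path([False, True, False], Channel.VC0)
```

Here `path` moves from adaptive to VC0 and back to adaptive, matching the "packets can return" bullet.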
Technical Details: Router Architecture






13 cycles pin-to-pin, any input to any output.
Pipeline clocked at 1.2GHz, links at 800MHz.
Link clock sent with outgoing packet.
ECC is recomputed at every hop. Single-bit errors are recoverable.
Arbitration: Each input “local” arbiter nominates a
buffered packet that is ready and not blocked to the
“global” arbiters for possible dispatch. Output
global arbiters select from the input local arbiters’
nominations. Selection uses a Least-Recently-Selected
policy. Also, the Rotary Rule prioritizes older packets
from the network. A Coherence Dependence Priority rule
also applies.
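A small sketch of a Least-Recently-Selected arbiter, to show the selection policy named above. It models only that one policy; the Rotary Rule and the Coherence Dependence Priority rule are left out, and the class and method names are assumptions rather than anything from the paper.

```python
class LeastRecentlySelectedArbiter:
    """Grants the requester that has waited longest since its last grant."""

    def __init__(self, num_inputs):
        # Front of the list = least recently selected = highest priority.
        self.order = list(range(num_inputs))

    def grant(self, requests):
        """Pick a winner from `requests` (a set of input indices), or None."""
        for candidate in self.order:
            if candidate in requests:
                # Move the winner to the back so it has lowest priority next time.
                self.order.remove(candidate)
                self.order.append(candidate)
                return candidate
        return None


arbiter = LeastRecentlySelectedArbiter(3)
first = arbiter.grant({0, 1, 2})   # all inputs request
second = arbiter.grant({0, 2})     # input 1 has nothing ready
third = arbiter.grant({0, 2})
```

Because the winner always moves to the back of the priority order, no requesting input can be passed over indefinitely.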
Good


This was built and
shipped (albeit late),
which immediately
lends it credibility.
Simple introduction to
interconnection
networks: the 5-page
length makes the authors
explain everything
clearly and concisely.
Bad




No evaluation of
performance.
No comparison against
competitors (“ours is
better” would help sales).
Configuration around
faulty routers is
mentioned but never
explained.
5 pages isn’t enough to
explain the edge cases.
Conclusion / Further Questions


Keywords: 2-D torus. Adaptive routing with
deadlock-free fixed-route virtual channels to
prevent network deadlock. Separate virtual
channels for requests and responses to
prevent coherence deadlock.
This was the last major iteration of the Alpha
architecture. Why? What competing product
replaced it? How was that competitor better?
How could the 21364 have been improved to
stay competitive (features, performance)?