Firefly: Illuminating Future Network-on-Chip with Nanophotonics
Yan Pan, Prabhat Kumar, John Kim†, Gokhan Memik, Yu Zhang, Alok Choudhary
EECS Department, Northwestern University, Evanston, IL, USA
{panyan,prabhat-kumar,g-memik,yu-zhang,a-choudhary}@northwestern.edu
† CS Department, KAIST, Daejeon, Korea
[email protected]

On-Chip Network Topologies
– Mesh [MIT RAW] [TILE64] [Teraflops]
– C-Mesh [Balfour’06] [Cianchetti’09]
– Crossbar [Vantrease’08] [Kirman’06]
– Others: Torus [Shacham’07], Flattened Butterfly [Kim’07], Dragonfly [Kim’08], Hierarchical (Bus & Mesh) [Das’08], Clos [Joshi’09], Ring [Larrabee], ……
► Network-on-chip is critical for performance.
Motivation | Architecture of Firefly | Evaluation | Conclusion
Yan Pan
ISCA 2009
2/25

Signaling technologies
► Electrical signaling
– Repeater insertion needed
– Bandwidth density (up to 8 Gbps/μm) [Chang HPCA’08]
► Nanophotonics
– Bandwidth density ~100 Gbps/μm !!! [Batten HOTI’08]
– Generally distance-independent power consumption
– Speed of light → low latency
• Propagation
• Switching [Cianchetti ISCA’09]

Nanophotonic components
► Basic components
[Figure: nanophotonic link. Labels: off-chip laser source, coupler, waveguide, resonant modulators, resonant detectors, Ge-doped]

Resonant Rings
► Selective
– Couple optical energy of a specific wavelength
Radius r → Baseline wavelength
Temperature t → Manufacturing error correction
Carrier density d → Fast tuning by charge injection

Putting it together
► Modulation & detection
– 64 wavelengths DWDM
– 3 ~ 5 μm waveguide pitch
– 10 Gbps per link
– ~100 Gbps/μm bandwidth density [Batten HOTI’08]
[Figure: modulation and detection of the bit streams 10001011 and 11010101]
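The figures on this slide can be combined into a back-of-the-envelope check. The sketch below assumes that "10 Gbps per link" means 10 Gbps per wavelength and that bandwidth density is the aggregate bandwidth of one waveguide divided by its pitch; both readings are assumptions, and the function name is illustrative.

```python
def bandwidth_density_gbps_per_um(wavelengths, gbps_per_wavelength, pitch_um):
    """Aggregate bandwidth of one waveguide divided by its pitch (assumed definition)."""
    return wavelengths * gbps_per_wavelength / pitch_um

# Slide values: 64-wavelength DWDM, 10 Gbps each, 3 to 5 um waveguide pitch.
for pitch_um in (3.0, 5.0):
    density = bandwidth_density_gbps_per_um(64, 10.0, pitch_um)
    print(f"pitch {pitch_um} um -> {density:.0f} Gbps/um")
```

Under these assumptions the raw quotient is about 128 to 213 Gbps/μm, the same order of magnitude as the ~100 Gbps/μm quoted from [Batten HOTI’08]; the slide does not say what accounts for the difference.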




What’s the catch?
► Power Cost
– Ring heating
– Laser power
– E/O & O/E conversions
– Distance insensitive
► For short links vs. long links (2.5mm)
– Nanophotonics
• Cost stays the same
– Electrical
• RC lines with repeater insertion
• Cost increases
[Figure: bar chart of per-bit energy (fJ/b), axis 0 to 700, comparing Nanophotonics [Batten HOTI’08] with an RC line [Cheng ISCA’06]; legend: Optical Components, Ring Heating, Laser, Electrical]

Here is the idea ……
► Design an architecture that differentiates traffic.
– Use electrical signaling for short links.
– Use nanophotonics only for long range traffic.
► What do we gain?
– Low latency
– High bandwidth density
– High power efficiency
– Localized arbitration
– Scalability
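The traffic split can be read as a per-link energy comparison: electrical energy grows with wire length, while nanophotonic energy stays roughly flat. A minimal sketch of that decision rule follows. The linear electrical model and every coefficient are placeholder assumptions chosen only for illustration; they are not values from the talk.

```python
# All constants are hypothetical placeholders, not measured values.
ELECTRICAL_FJ_PER_BIT_PER_MM = 100.0  # assumed: repeated RC wire, cost grows with length
OPTICAL_FJ_PER_BIT = 200.0            # assumed: laser + ring heating + E/O and O/E, flat

def electrical_energy_fj(distance_mm):
    """Per-bit energy of an electrical link under the assumed linear model."""
    return ELECTRICAL_FJ_PER_BIT_PER_MM * distance_mm

def optical_energy_fj(distance_mm):
    """Per-bit energy of a nanophotonic link; distance-insensitive by assumption."""
    return OPTICAL_FJ_PER_BIT

def pick_signaling(distance_mm):
    """Return the cheaper signaling technology for a link of this length."""
    if electrical_energy_fj(distance_mm) <= optical_energy_fj(distance_mm):
        return "electrical"
    return "nanophotonic"

for distance_mm in (0.5, 1.0, 10.0):
    print(distance_mm, "mm ->", pick_signaling(distance_mm))
```

The crossover length depends entirely on the two placeholder constants, so the sketch shows the shape of the trade-off rather than where it falls for any real technology.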




Outline
► Motivation
► Architecture of Firefly
► Evaluation
► Conclusion

Layout View of 64-core Firefly
► Concentration
– 4 cores share a router
– 16 routers
[Figure: layout of 16 routers (R), each serving four cores P0 to P3]

Layout View of 64-core Firefly
► Concentration
► Clusters
– Electrically connected
– Mesh topology
– 4 routers per cluster
– 4 clusters
[Figure: Cluster 0 (C0) to Cluster 3 (C3), with routers labeled C0R0 to C3R3]

Layout View of 64-core Firefly
► Concentration
► Clusters
► Assemblies
– Routers from different clusters
– Optically connected
– Logical crossbars
[Figure: routers C0R0 to C3R3 grouped into assemblies A0 to A3]
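The three layout levels (concentration, clusters, assemblies) can be written down as index arithmetic. The sketch below assumes cores are numbered 0 to 63 in router-major order, and that assembly Ak groups router Rk of every cluster; the second point is consistent with the CxRy and Ax labels in the figure but is otherwise an assumption. The function names are illustrative.

```python
CORES_PER_ROUTER = 4     # concentration: 4 cores share a router
ROUTERS_PER_CLUSTER = 4  # 4 routers per cluster
NUM_CLUSTERS = 4         # 4 clusters -> 16 routers, 64 cores

def locate_core(core_id):
    """Map a core id (0..63, assumed numbering) to (cluster, router-in-cluster, port)."""
    total = CORES_PER_ROUTER * ROUTERS_PER_CLUSTER * NUM_CLUSTERS
    if not 0 <= core_id < total:
        raise ValueError(f"core_id must be in 0..{total - 1}")
    router_global, port = divmod(core_id, CORES_PER_ROUTER)
    cluster, router = divmod(router_global, ROUTERS_PER_CLUSTER)
    return cluster, router, port

def assembly_members(assembly):
    """Routers on one optical crossbar (assumed: same router index in every cluster)."""
    return [f"C{c}R{assembly}" for c in range(NUM_CLUSTERS)]

print(locate_core(0))       # (0, 0, 0)
print(locate_core(63))      # (3, 3, 3)
print(assembly_members(0))  # ['C0R0', 'C1R0', 'C2R0', 'C3R0']
```

With this mapping, two cores share an electrical cluster when their cluster indices match, and their routers share an optical crossbar when their router indices match.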




Layout View of 64-core Firefly
► Clusters
– Electrical CMESH
► Assemblies
– Nanophotonic crossbars
Efficient nanophotonic crossbars needed!
[Figure: routers C0R0 to C3R3 connected by nanophotonic crossbars A0 to A3]

Nanophotonic crossbars
► Single-Write-Multiple-Read (SWMR) [Kirman’06] (CMXbar†)
– Dedicated sending channel
– Multicast in nature
– Receiver compare & discard
– High fan-out → laser power
† [Joshi NOCS’09]
[Figure: SWMR crossbar. Routers R0, R1, ..., RN-1; data channels CH0, CH1, ..., CH(N-1), each of width w]

Nanophotonic crossbars
► Multiple-Write-Single-Read (MWSR) [Vantrease’08] (DMXbar†)
– Dedicated receiving channel
– Demux to channel
– Global arbitration needed!
† [Joshi NOCS’09]
[Figure: MWSR crossbar. Routers R0, R1, ..., RN-1; data channels CH0, CH1, ..., CH(N-1), each of width w]

Reservation-assisted SWMR
► Goal
– Avoid global arbitration
– Reduce power
► Proposed design
– Reservation channels
• Narrow
• Destination ID
• Packet length
– Multicast to reserve
– Uni-cast data packet
[Figure: R-SWMR crossbar. Routers R0, R1, ..., RN-1; data channels CH0, CH1, ..., CH(N-1), each of width w; reservation channels CH0a, CH1a, ..., CH(N-1)a, each of width log(Ns)]
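The reservation handshake can be sketched as a small simulation: the sender multicasts a narrow reservation flit carrying the destination ID and packet length, and only the named destination then listens to the sender's data channel for that many flits. The class and field names below are illustrative, and reading the log(Ns) label as ⌈log2(N·s)⌉ bits for N routers and s packet sizes is an assumption.

```python
import math
from dataclasses import dataclass

@dataclass
class Reservation:
    dest: int    # destination router ID
    length: int  # packet length in flits

def reservation_width_bits(num_routers, num_packet_sizes):
    """Assumed reading of log(Ns): bits to encode a destination and a packet size."""
    return math.ceil(math.log2(num_routers * num_packet_sizes))

class RSWMRChannel:
    """One sender's dedicated data channel plus its reservation channel."""

    def __init__(self, sender, num_routers):
        self.sender = sender
        # Flits each receiver still expects on this sender's data channel.
        self.listening = {r: 0 for r in range(num_routers) if r != sender}

    def reserve(self, res):
        # The reservation flit is multicast: every receiver decodes it,
        # but only the named destination enables its detectors.
        for receiver in self.listening:
            if receiver == res.dest:
                self.listening[receiver] = res.length

    def send_flit(self):
        # The data flit is effectively unicast: only listening receivers absorb it.
        receivers = [r for r, n in self.listening.items() if n > 0]
        for r in receivers:
            self.listening[r] -= 1
        return receivers

ch = RSWMRChannel(sender=0, num_routers=4)
ch.reserve(Reservation(dest=2, length=3))
print([ch.send_flit() for _ in range(4)])  # [[2], [2], [2], []]
print(reservation_width_bits(8, 2))        # 4
```

Because each sender owns its channel, no arbitration across senders appears anywhere in the sketch, which is the property the slide contrasts with MWSR.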




Router Microarchitecture
► Virtual-channel router
– Added optical link ports and extra buffer.
– Dedicated sending channel for all traffic.
– Separate receiving channels from other clusters.
[Figure: router microarchitecture. Routing computation, VC allocator, switch allocator, and crossbar switch; electrical ports Inject (Input 1) to Input k and Eject (Output 1) to Output k, each input with VC 1 to VC v; global inputs 1 to g (O/E, input buffer); global output (E/O); arbiter]

Routing
– Intra-cluster routing
– Traversing optical link
[Figure: pipeline timing of head, body, and tail flits from FIREFLY_src to FIREFLY_dest, with stages labeled RT, RB, LT, and OA; shown with the router microarchitecture and R-SWMR crossbar diagrams]
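The two routing steps can be sketched as a path computation. The sketch follows the ordering given in the evaluation setup (intra-cluster routing in the source cluster before traversing nanophotonics) and assumes, in addition, that the optical hop connects routers with the same index in different clusters. The tuple representation and function names are illustrative.

```python
def firefly_route(src, dst):
    """Return router-level hops from src to dst.

    src and dst are (cluster, router) pairs. Each hop is
    (kind, from_router, to_router), with kind "electrical" or "optical".
    An electrical hop may span more than one mesh link inside the cluster.
    """
    src_cluster, _ = src
    dst_cluster, dst_router = dst
    hops = []
    if src_cluster == dst_cluster:
        # Local traffic never touches the optical crossbars.
        if src != dst:
            hops.append(("electrical", src, dst))
        return hops
    # Step 1: intra-cluster routing to the router that shares the
    # destination's index (assumed to sit on the destination's assembly).
    gateway = (src_cluster, dst_router)
    if src != gateway:
        hops.append(("electrical", src, gateway))
    # Step 2: a single traversal of the optical link.
    hops.append(("optical", gateway, dst))
    return hops

def name(router):
    return f"C{router[0]}R{router[1]}"

for kind, a, b in firefly_route((0, 1), (2, 3)):
    print(kind, name(a), "->", name(b))
```

For C0R1 to C2R3 this prints an electrical hop to C0R3 followed by one optical hop to C2R3, so the packet crosses nanophotonics exactly once.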




Firefly – another look
► Clusters
– Short electrical links
– Concentrated mesh
► Assemblies
– Long nanophotonic links
– Partitioned crossbars
► Benefits
– Traffic locality
– Reduced hardware
– Localized arbitration
– Distributed inter-cluster bandwidth
[Figure: clusters C0 to C3 of processors (P) and routers, regrouped by router index into assemblies A0 to A3]

Outline
► Motivation
► Architecture of Firefly
► Evaluation
► Conclusion

Evaluation Setup
Code Name | Type | Topology | Global Routing | Min #VC
CMESH | Electrical | Concentrated mesh | Dimension-ordered routing | 1
DFLY_MIN | Hybrid | Dragonfly topology mapped to on-chip network | Minimal routing, traversing nanophotonics at most once | 2
DFLY_VAL | Hybrid | Dragonfly topology mapped to on-chip network | Nonminimal routing, traversing nanophotonics up to twice | 3
OP_XBAR | Optical | All-optical crossbar using token-based global arbitration | Destination-based routing | 1
FIREFLY | Hybrid | Proposed hybrid architecture with multiple logical optical inter-cluster crossbars | Intra-cluster routing in the source cluster before traversing nanophotonics | 1
► Cycle-accurate simulator (Booksim)
► Firefly vs. CMESH, Dragonfly† and OP_XBAR
► Synthetic traffic patterns and traces
† [Kim et al, ISCA’08]

Load / Latency Curve
[Figure: latency (#cycles) vs. injection rate. (a) Bitcomp, 1-cycle; (b) Uniform, 1-cycle. Annotations: 4.8x, 70%]
► Throughput
– Up to 4.8x over OP_XBAR
– At least +70% over Dragonfly

Energy Breakdown
[Figure: average per-packet energy (nJ), axis 0 to 1.2, for CMESH, DFLY_MIN, DFLY_VAL, OP_XBAR, and FIREFLY under Bitcomp and Taper_L0.7D7 traffic; components: Router / DEMUX, Electrical Link, Optical Link, Laser, Ring Heating]
► Reduced hardware by partitioning
– Reduced heating
Data Path | # Rings
1 radix-64 crossbar | 1024K
8 radix-8 crossbars | 128K
► Throughput impact
► Locality
– 34% energy reduction over OP_XBAR with locality
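The two ring counts in the data-path table are consistent with a simple scaling rule, sketched below. The formula (rings ≈ crossbars × radix² × data-path width) and the 256-bit width are assumptions, chosen because they reproduce both table entries; the slide does not state how the counts were derived.

```python
def ring_count(num_crossbars, radix, width_bits=256):
    """Assumed model: each of `radix` channels has `radix` ports of `width_bits` rings."""
    return num_crossbars * radix * radix * width_bits

single = ring_count(1, 64)      # one radix-64 crossbar
partitioned = ring_count(8, 8)  # eight radix-8 crossbars
print(single // 1024, "K")       # 1024 K
print(partitioned // 1024, "K")  # 128 K
print(single // partitioned)     # 8
```

Under this model the ring count grows with the square of the radix, so splitting one radix-64 crossbar into eight radix-8 crossbars cuts the count by a factor of 8, in line with the "reduced hardware by partitioning" point.
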
Technology Sensitivity
[Figure: technology sensitivity plots for bitcomp and taper_L0.7D7 traffic]
► α is heating ratio and β is laser ratio.
► Firefly favors traffic locality.

Conclusion
► Technology impacts architecture
– New opportunities in nanophotonics
• Low latency, high bandwidth density
– Tailored architectures needed
► Firefly benefits from nanophotonics by providing
– Power Efficiency
• Hybrid signaling
• Partitioned R-SWMR crossbars
→ Reduced hardware/power
– Scalability
• Scalable inter-cluster bandwidth
• Low-radix routers/crossbars