ONR GET Repository - MIT Lincoln Laboratory

Enabling High Performance Embedded Computing through Memory Access via Photonic Interconnects
Gilbert Hendry
Eric Robinson
Vitaliy Gleyzer
Johnnie Chan
Luca P. Carloni
Nadya Bliss
Keren Bergman
MIT Lincoln Laboratory
HPEC 2010 - 1
Hendry, et al. 7/21/2015
* This work is sponsored by the Defense Advanced Research Projects Agency (DARPA) under Air Force contract FA8721-05-C-0002. Opinions,
interpretations, conclusions, and recommendations are those of the author and are not necessarily endorsed by the United States Government.
Photonics: Advantages and Disadvantages
Advantages
• Very fast transfer rate
• Very low latency for long distances
• Low power
Disadvantages
• High upfront cost in time to send a packet
• High upfront cost in power to send a packet
Photonic interconnects hold potential for on-chip computing. However, the target applications must be considered to determine whether photonics will benefit them.
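The trade-off above can be made concrete with a simple first-order model: total transfer time is a fixed setup cost plus message size divided by bandwidth. The sketch below is illustrative only. The setup times and bandwidths are hypothetical placeholder values, not measurements from this work, and `transfer_time` and `break_even_bytes` are names introduced here.

```python
def transfer_time(size_bytes, setup_s, bandwidth_bytes_per_s):
    """First-order model: fixed setup cost plus serialization time."""
    return setup_s + size_bytes / bandwidth_bytes_per_s


def break_even_bytes(setup_a, bw_a, setup_b, bw_b):
    """Message size at which link B (higher setup cost, higher bandwidth)
    becomes as fast as link A (lower setup cost, lower bandwidth)."""
    if setup_b <= setup_a or bw_b <= bw_a:
        raise ValueError("expects B to have both higher setup cost and higher bandwidth")
    # Solve setup_a + s / bw_a == setup_b + s / bw_b for s.
    return (setup_b - setup_a) / (1.0 / bw_a - 1.0 / bw_b)


# Hypothetical numbers, chosen only to show the shape of the trade-off.
ELECTRONIC = {"setup_s": 10e-9, "bandwidth_bytes_per_s": 10e9}
PHOTONIC = {"setup_s": 100e-9, "bandwidth_bytes_per_s": 100e9}

crossover = break_even_bytes(
    ELECTRONIC["setup_s"], ELECTRONIC["bandwidth_bytes_per_s"],
    PHOTONIC["setup_s"], PHOTONIC["bandwidth_bytes_per_s"],
)
```

Under these assumed numbers, messages larger than `crossover` bytes finish sooner on the high-setup, high-bandwidth link, while smaller messages are dominated by the setup cost.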
Embedded Computing: ISR Applications
• Image Registration: Where is the image in relation to other images already taken?
• SAR Image Formation: How many pulses can feasibly be combined, and what size of image can we take?
• Image Sharpening: Can image fidelity be improved by using additional information or multiple pictures?
Image Registration
Image Registration Involves:
• Image Orientation and Scaling
• Image Alignment
Produces an image that “fits” properly with other registered images to get a global view of the area.
Image Sharpening
• Image Fusion: Fuses two low-resolution images to form a high-resolution result.
• Filtering: Enhances image fidelity by combining filters with the original image (Bicubic, Bilinear, Halfband...)
SAR Image Formation
Synthetic Aperture Radar (SAR) is an imaging technique that uses RADAR pulses rather than photography
SAR Processing:
• Image formation nontrivial, requires combining pulses
• The more pulses that can be processed, the higher the image resolution
• SAR can operate in conditions where traditional photography fails (low light, cloud cover)
ISR Application Kernels
ISR Kernels:
• Matrix Multiply, Projective Transform, Fourier Transform
• Used in a broad range of ISR applications
• Typically a performance bottleneck
• Demand high throughput from the memory and network modules
[Figure panels: Matrix Multiply, Fourier Transform, Projective Transform]
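As an illustration of one of these kernels, a projective transform maps each coordinate through a 3x3 homography matrix followed by a perspective divide. The sketch below is a minimal pure-Python version operating on a list of points. The function name `projective_transform` and the example matrix are assumptions made for illustration, not the implementation benchmarked in this work.

```python
def projective_transform(h, points):
    """Apply a 3x3 homography h (row-major nested lists) to (x, y) points."""
    out = []
    for x, y in points:
        xh = h[0][0] * x + h[0][1] * y + h[0][2]
        yh = h[1][0] * x + h[1][1] * y + h[1][2]
        w = h[2][0] * x + h[2][1] * y + h[2][2]
        if w == 0:
            raise ValueError("point maps to infinity (w == 0)")
        # Perspective divide turns homogeneous coordinates back into 2-D.
        out.append((xh / w, yh / w))
    return out


# Hypothetical example: scale by 2 and translate by (1, -1). The last row
# [0, 0, 1] makes this an affine special case of a projective transform.
H = [[2.0, 0.0, 1.0],
     [0.0, 2.0, -1.0],
     [0.0, 0.0, 1.0]]
corners = projective_transform(H, [(0, 0), (1, 0), (0, 1), (1, 1)])
```

When this mapping is applied to a whole image, every output pixel needs one such evaluation plus reads of the corresponding input pixels, which is consistent with the memory throughput demand noted above.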
Characteristics of ISR Applications
ISR Applications: Ideal Candidates for Photonic Interconnects
• Large Memory Access Size
• Low Power Requirements
• High Memory Access to Compute Ratio
• High Throughput Requirements
Ring Resonators
• Modulator/filter

[Figure labels: Broadband, λ]
Circuit-switched P-NoCs
[Figure labels: Electronic Control (p-region, n-region, 0V, 1V); Thermal Control (Ohmic Heater, 0V, 1V); Transmission (Off-resonance profile, On-resonance profile); Injected Wavelengths]
Peripheral Memory Access
[Figure legend: Processor Core, Network Router, Memory Access Point]
Memory Access Point
[Figure labels: On Chip, Chip Boundary, Off Chip, Data plane, Control plane, To/From Network-on-Chip, Modulators, To Memory Module, From Memory Module, Memory Control]
[V. R. Almeida et al. Cornell]
Photonic TDM Network
• Mesh topology
• Distributed switch control
• Single dimension transmission
• Controlled by fixed time slots
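To illustrate how fixed time slots can stand in for per-message path setup, the sketch below uses a generic round-robin schedule along one dimension of a mesh: in each slot every node has exactly one permitted destination, and each node derives it locally from a shared slot counter. This is a generic TDM illustration, not the slot assignment used in this work. `slot_destination`, `slots_until`, and the offset rule are assumptions introduced here.

```python
def slot_destination(src, slot, n_nodes):
    """Destination that node `src` may send to during time slot `slot`.

    Assumed rule (hypothetical): the offset cycles through 1 .. n_nodes-1,
    so each slot is a conflict-free permutation of the nodes in one row.
    """
    if n_nodes < 2:
        raise ValueError("need at least two nodes")
    offset = 1 + slot % (n_nodes - 1)
    return (src + offset) % n_nodes


def slots_until(src, dst, current_slot, n_nodes):
    """Number of slots `src` waits before its slot for `dst` comes up."""
    if src == dst:
        raise ValueError("source and destination must differ")
    for wait in range(n_nodes - 1):
        if slot_destination(src, current_slot + wait, n_nodes) == dst:
            return wait
    raise AssertionError("every destination appears once per frame")


# With 4 nodes in a row, node 0 reaches node 3 two slots after slot 0.
wait = slots_until(0, 3, current_slot=0, n_nodes=4)
```

Because every node computes the same schedule from the slot counter, no central arbiter is needed in this sketch, and the worst-case wait is bounded by the frame length of `n_nodes - 1` slots.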
Vertical Memory Access
Vertical Coupler
[J. Schrauwen et al. U of Ghent.]
SDRAM DIMM Anatomy
[Figure panels: DRAM_Bank, DRAM_Chip, DRAM_DIMM. Labels: data, IO, Cntrl, Banks (usually 8), Row addr/en, Row Decoder, Col addr/en, Col Decoder, Sense Amps, DRAM cell arrays, Addr/cntrl, Ranks, SDRAM device]
Optical Circuit Memory (OCM) Anatomy
[Figure panels: Mux Chip, DRAM Chip. Labels: Laser In, Laser Source, Waveguide, Waveguide Coupling, drivers, To Mux Chip, From Mux Chip, Addr/cntrl, IO, Cntrl, IO Gating, Bank, data, Rx Dec., AWG (repeated), VDD, Gnd]
Results: Circuit Switched Application Performance
[Bar chart: Performance (GOPS) by Network Type (Emesh, EmeshCS, PS-1, PS-2) for Projective Transform, Matrix Multiply, and FFT. Bar labels in extraction order, bar assignment not recoverable: 47.3, 31.82, 27.8, 26.51, 17.76, 13.48, 1.04, 0.78, 4.74, 1.75, 4.32, 3.12]
EmeshCS yields the best performance, but PS-1 and PS-2 are competitive
Results: Circuit Switched Power
[Bar chart: Network Power (W) by Network Type (Emesh, EmeshCS, PS-1, PS-2) for Projective Transform, Matrix Multiply, and FFT. Bar labels in extraction order, bar assignment not recoverable: 19, 15.8, 11.2, 11.1, 11.4, 11.2, 4.37, 4.35, 4.28, 2.21, 2.17, 2.15]
PS-1 and PS-2 use much less power than electronic alternatives
Results: Circuit Switched Performance/Watt Comparison
[Bar chart: Performance per Watt Improvement (Improvement Factor) by Network Type (Emesh, EmeshCS, PS-1, PS-2) for Projective Transform, Matrix Multiply, and FFT. Bar labels in extraction order, bar assignment not recoverable: 86.7, 89.33, 87.64, 68.6, 26.9, 29.01, 1, 1, 1, 9.67, 6.72, 2.82]
PS-1 and PS-2 give the best performance per unit of power
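The chart reports an improvement factor with three bars equal to 1, which suggests (as an assumption here) that each network's performance per watt is normalized to the Emesh baseline. A minimal sketch of that calculation follows. The function name `perf_per_watt_improvement` and the input numbers are hypothetical and are not the values from this work.

```python
def perf_per_watt_improvement(results, baseline="Emesh"):
    """Map {network: (gops, watts)} to {network: improvement factor}.

    The factor is (GOPS / W) of each network divided by (GOPS / W) of the
    baseline network, so the baseline itself always comes out as 1.0.
    """
    base_gops, base_watts = results[baseline]
    base_efficiency = base_gops / base_watts
    return {
        name: (gops / watts) / base_efficiency
        for name, (gops, watts) in results.items()
    }


# Hypothetical placeholder numbers, NOT taken from the charts in this deck.
example = {"Emesh": (10.0, 20.0), "PS-1": (8.0, 2.0)}
factors = perf_per_watt_improvement(example)
```

In this made-up example the second network is slower in absolute terms yet eight times more efficient, which is the kind of outcome the improvement factor is designed to expose.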
Results: Circuit Switched Power Budget Breakdown
Projective Transform
• Electronic components dominate the power of all the systems in question
• PS-1 and PS-2 both dominated by Electronic Crossbar
• Emesh dominated by Electronic Buffer
• EmeshCS dominated by Crossbar and Electronic Wire
The Electronic Crossbar requires a significant amount of power. However, in the Electronic Mesh, the Electronic Buffers dominate the energy consumption.
Results: TDM
Projective Transform
[Three bar charts by Network Type (Emesh, Pmesh, P-TDM, P-ETDM): Performance (GOPS), Network Power (W), and Performance per Watt Improvement (Improvement Factor). Value labels in extraction order, assignment to charts and bars not recoverable: 23.97, 15.49, 16.02, 11.22; 51.04, 20.87, 7.55, 1.11. Improvement factor labels: 5x, 13x, 22x]
TDM Results:
• Performed on a smaller image (256x256)
• Yields the best performance when packets can be sent in a single time slice
• Constant setup cost means smaller packets can be sent with less overhead
TDM yields advantages when message sizes are smaller
Conclusions
• ISR front-end application performance is of increasing importance in the community
• These applications put large demands on the memory and network subsystems
• Photonics offers a low-powered approach to meeting these performance demands
For the full details on these photonic architectures, see our other publications in the Journal of Parallel and Distributed Computing (JPDC) 2011 and Supercomputing (SC) 2010
References
• TDM Arbitration in a Silicon Nanophotonic Network-On-Chip for High Performance CMPs
Gilbert Hendry, Eric Robinson, Vitaliy Gleyzer, Johnnie Chan, Luca P. Carloni, Nadya Bliss, Keren Bergman
Journal of Parallel and Distributed Computing 2011
• Circuit-Switched Memory Access in Photonic Networks-on-Chip for High Performance Embedded Computing
Gilbert Hendry, Eric Robinson, Vitaliy Gleyzer, Johnnie Chan, Luca P. Carloni, Nadya Bliss, Keren Bergman
Supercomputing 2010