Slides - LPGPU.org

A Close Look at the Epiphany-IV
28nm 64-core Coprocessor
Andreas Olofsson
PEGPUM 2013
1
Adapteva Achieves 3 “World Firsts”
1. First processor company to reach 50 GFLOPS/W
2. First FOSS OpenCL SDK in the mobile market
3. First semiconductor company to successfully crowd-source a project
Copyright © Adapteva. All rights reserved.
2
The Future is…
Efficient…
Heterogeneous…
Open…
Task Parallel…
Grand Challenges Ahead…
Rebuild the computer ecosystem!
Rewrite billions of lines of code!
Retrain millions of programmers!
Rewrite the education curriculum!
3
Our Vision: True Heterogeneous Computing
[Diagram: SYSTEM-ON-CHIP with four BIG CPUs, an FPGA, a GPU, an Analog block, and 100’s of small Epiphany RISC CPUs/DSPs]
4
Architecture Comparison
Technology       FPGA     DSP         GPU        CPU         Manycore
Process          28nm     40nm        28nm       32nm        28nm
Programming      VHDL     OCL/C++/C   CUDA/OCL   OCL/C/C++   OCL/C/C++
Area (mm^2)      590      108         294        216         10
Chip Power (W)   40       22          135        130         2
CPUs             n/a      8           16         4           64
Max GFLOPS       1500     160         3000       115         102
GHz * Cores      n/a      12          16         14.4        51.2
Compile Time     Hours    Minutes     Minutes    Minutes     Minutes
L1 Memory        6MB      512KB       2.5MB      256KB       2MB

Efficiency is everything
Peak performance means very little
No magic bullet!
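The "efficiency is everything" point can be checked directly from the comparison table by dividing Max GFLOPS by Chip Power. The sketch below uses only the figures from the slide; the dictionary layout and function name are illustrative choices, not part of the original deck.

```python
# (max GFLOPS, chip power in watts), copied from the Architecture Comparison slide.
chips = {
    "FPGA": (1500, 40),
    "DSP": (160, 22),
    "GPU": (3000, 135),
    "CPU": (115, 130),
    "Manycore": (102, 2),
}


def gflops_per_watt(max_gflops, watts):
    """Peak throughput delivered per watt of chip power."""
    return max_gflops / watts


efficiency = {name: gflops_per_watt(*figures) for name, figures in chips.items()}

# Most efficient architecture first.
ranking = sorted(efficiency, key=efficiency.get, reverse=True)

for name in ranking:
    print(f"{name:9s} {efficiency[name]:6.2f} GFLOPS/W")
```

On these numbers the GPU has the highest peak but the manycore column has the highest GFLOPS/W, which is consistent with the 50 GFLOPS/W claim on the "World Firsts" slide.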
5
Epiphany: Massive Task-Parallelism
[Diagram: each core tile combines a 1 GHz RISC Core, Local Memory, and a Multicore Framework Router]
Coprocessor to ARM/Intel CPU
25mW per core
C/C++ programmable
6
Programming Models
MODEL #1: TASK QUEUE MODEL
• Great for up to 2GFLOPS
• Supports standard C/C++
• “Cloud on a chip”
[Diagram: X86/ARM/FPGA Host with Task1, Task2, Task3, Task4]
MODEL #2: DATA PARALLEL MODEL
• OpenCL programmable
• Easy integration of C/C++
• OpenMP/MPI roadmap
[Diagram: X86/ARM/FPGA Host with Task1 spread across 32 MINI CPUs]
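The difference between the two programming models can be sketched on an ordinary host machine. The code below is a conceptual illustration only: Python threads stand in for Epiphany cores, and `run_task_queue` and `data_parallel` are hypothetical names, not functions from the Epiphany SDK or from OpenCL.

```python
import queue
import threading


def run_task_queue(tasks, num_workers=4):
    """Task queue model: workers pull independent tasks until none remain."""
    pending = queue.Queue()
    for index, task in enumerate(tasks):
        pending.put((index, task))
    results = [None] * len(tasks)

    def worker():
        while True:
            try:
                index, task = pending.get_nowait()
            except queue.Empty:
                return  # queue drained, this worker is done
            results[index] = task()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


def data_parallel(kernel, data, num_workers=4):
    """Data parallel model: one kernel applied to different slices of the data."""
    chunk = max(1, (len(data) + num_workers - 1) // num_workers)
    slices = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    # Each slice becomes one unit of work running the same kernel.
    work = [lambda s=s: [kernel(x) for x in s] for s in slices]
    partials = run_task_queue(work, num_workers)
    return [y for part in partials for y in part]


if __name__ == "__main__":
    print(run_task_queue([lambda: "Task1", lambda: "Task2", lambda: "Task3"]))
    print(data_parallel(lambda x: x * x, list(range(8))))
```

In the first model each task is a different piece of work; in the second a single task is replicated over many small cores, each handling its own portion of the data.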
7
CPU Architecture Tradeoffs
Feature                                     In   Out   Why
Single Precision Floating Point             X          Programming Efficiency
64 Entry Register File                      X          GCC
Byte Addressable                            X          GCC
64bit load/store                            X          Data movement is king
Dual Issue In Order Scheduling              X
4-Bank Local Memory                         X          Multicore data movement
Interrupts, breakpoints, timers             X          Inexpensive
Hardware Cache                                   X     Too expensive, software managed
Double Precision Floating Point                  X     Overkill for initial market
VLIW                                             X     GCC, Complexity
Instr: Not, Mask, Rotate, Add-Carry, Ones        X     Not important
Data Type ISA Orthogonality:                     X     Expensive. Focus was on floating point
  (Signed, unsigned) & (8b, 16b, 32b)
8
Network-On-Chip Tradeoffs
Feature                           In   Out   Why
64-Bit Transfer/Cycle             X          Maximize efficiency
Send address on every cycle       X          “Wires are free”, simplicity
Round robin arbitration           X          Easy, good enough
Distributed routing               X          Scalable
Robust flow control               X          Must have
Bidirectional Mesh                X          Well known
Address mapped packet switching   X          Ease of Use
Single cycle message transfer     X          Short messaging important
3 separate Networks               X          No deadlock, QoS
Extensive QoS features                 X     Deterministic traffic
Torus wraparound connections           X     Too expensive in CMOS
Circuit switching                      X     Not general purpose
Large routing buffers                  X     Too expensive
Multilayer Mesh                        X     Too expensive (for now)
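Three of the features listed above (address mapped packet switching, distributed routing, bidirectional mesh) can be illustrated together with a small model. The slide does not give the address layout or the routing algorithm, so both are assumptions here: a 32-bit address split into a 6-bit row, a 6-bit column, and a 20-bit local offset, and dimension-ordered routing (column first, then row). All function names are hypothetical.

```python
ROW_BITS = 6      # assumed width of the destination row field
COL_BITS = 6      # assumed width of the destination column field
OFFSET_BITS = 20  # assumed width of the offset into a core's local memory


def make_address(row, col, offset):
    """Pack a destination core and local offset into one flat address."""
    return (row << (COL_BITS + OFFSET_BITS)) | (col << OFFSET_BITS) | offset


def decode_address(address):
    """Recover (row, col, offset) from a flat address."""
    offset = address & ((1 << OFFSET_BITS) - 1)
    col = (address >> OFFSET_BITS) & ((1 << COL_BITS) - 1)
    row = (address >> (OFFSET_BITS + COL_BITS)) & ((1 << ROW_BITS) - 1)
    return row, col, offset


def next_hop(here, address):
    """Decision made locally by one router, using only the packet's address."""
    dest_row, dest_col, _ = decode_address(address)
    row, col = here
    if col != dest_col:  # assumed order: fix the column first
        return (row, col + (1 if dest_col > col else -1))
    if row != dest_row:  # then fix the row
        return (row + (1 if dest_row > row else -1), col)
    return here  # packet has arrived


def route(source, address):
    """Follow the per-router decisions from source to destination."""
    path = [source]
    while True:
        hop = next_hop(path[-1], address)
        if hop == path[-1]:
            return path
        path.append(hop)


if __name__ == "__main__":
    target = make_address(row=2, col=3, offset=0x100)
    print(route((0, 0), target))
```

Because the destination is encoded in the address itself, a write to a remote core looks like an ordinary store, and no router needs a global routing table.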
9
Epiphany Implementation Methodology
Epiphany-IV
• Scalable
• 24 hrs from RTL to GDS
• Reusable tiles
• No long wires
• Pattern density
• Fault tolerant
• Thermal balancing
• Easy clocking
[Diagram: grid of tiles numbered 0,0 through 7,7, surrounded by IO PADS; each tile contains LOGIC, SRAM, and GLUE]
10
Epiphany-IV Area Breakdown
[Pie chart segments: IO, Memory, FPU, Register File, Everything Else]
Memory/RF/FPU >65% of silicon die
Make every transistor count
MIMD efficiency “good enough”
11
Epiphany-IV Specifications
• 64 CPUs
• IEEE Floating Point (SP)
• 800 MHz Max Frequency
• 100 GFLOPS Performance
• 6.4 GB/s IO BW
• 200 GB/s peak NOC BW
• 1.6 TB/sec on chip memory BW
• 25 Billion Messages/sec
• 2MB on chip memory
• 10 mm^2 total silicon area in 28nm
• 2 Watt total chip power
• 324 ball 15x15mm BGA
• Sampling since July, 2012
[Diagram: each tile pairs a 1 GHz High Performance RISC CPU and 32KB+ Distributed Local Memory with a Multicore Communication Framework Router; four eLink IO blocks sit at the chip edges]
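Several of the headline specifications can be reproduced from the per-core figures elsewhere in the deck (64 CPUs, 800 MHz, 32KB per core, 4-bank local memory, 2 W). The sketch below is a back-of-the-envelope check, not a derivation given by the author: the two operations per cycle and the 8 bytes per bank per cycle are assumptions chosen because they match the published totals.

```python
CORES = 64
CLOCK_HZ = 800e6
LOCAL_MEM_KB = 32
BANKS = 4
CHIP_POWER_W = 2

FLOPS_PER_CYCLE = 2          # assumption: one multiply-add per core per cycle
BYTES_PER_BANK_PER_CYCLE = 8  # assumption: each bank delivers 64 bits per cycle

ghz_times_cores = CORES * CLOCK_HZ / 1e9
peak_gflops = CORES * CLOCK_HZ * FLOPS_PER_CYCLE / 1e9
total_memory_mb = CORES * LOCAL_MEM_KB / 1024
memory_bw_tb_s = CORES * CLOCK_HZ * BANKS * BYTES_PER_BANK_PER_CYCLE / 1e12
gflops_per_watt = peak_gflops / CHIP_POWER_W
mw_per_core = CHIP_POWER_W * 1000 / CORES

print(f"GHz * cores      : {ghz_times_cores:.1f}")
print(f"Peak GFLOPS      : {peak_gflops:.1f}")
print(f"On-chip memory   : {total_memory_mb:.1f} MB")
print(f"Memory bandwidth : {memory_bw_tb_s:.2f} TB/s")
print(f"GFLOPS per watt  : {gflops_per_watt:.1f}")
print(f"Power per core   : {mw_per_core:.2f} mW")
```

Under these assumptions the totals line up with the slide's rounded values of 100 GFLOPS, 2MB, and 1.6 TB/sec. Dividing the 2 W chip power evenly across 64 cores gives a somewhat higher figure than the 25mW per core quoted earlier, so the two power numbers were presumably measured on different bases.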
12
Epiphany System Examples
• Programmable and flexible
• Easy to develop for
• Efficient and powerful
• Scalable
[Diagrams: (1) an ARM SOC with SDIO, FLASH, USB, SDRAM, ETH, etc. next to a SMALL FPGA; (2) FPGAs with JESD204 and DMA blocks sitting between ADCs, DACs, and multiple E16G301 chips linked by eLink, with links labeled 2GB/s and 4GB/s]
13
Parallella Computing Project
• Open (and “free”):
  • Documentation
  • Board design files
  • Drivers
  • Software Tools
• Accessible (NO NDAs!)
• $100 entry point
• ~4000 devs signed up in 4 weeks
[Block diagram: ZYNQ7010 with Dual-Core ARM A9 Processor and Programmable Logic; SD CARD, Gb Ethernet, USB OTG, USB 2.0, 1GB LPDDR2, UART, I2C, GPIO; links labeled 2.4GB/s and 1.4GB/s; Epiphany (E16|E64)]
[Board layout: RJ45, uSD, ZYNQ (ARM) CPU, HDMI, 1GB SDRAM, GPIO, IO, E64, USB]
14
Parallella Architecture
[Block diagram. Zynq “Hard”: Dual Core ARM A9 (800MHz), MIO (SD-CARD, I2C, UART, Ethernet, USB OTG, USB 2.0), MEM-CTRL. Zynq FPGA: AXI BUS, AXI-MASTER, AXI-SLAVE, “Glue-Logic”, eLink, GPIO, HDMI Controller. Off-Chip: Epiphany Daughter Card, plus DRAM divided into “O/S” DRAM and “Sandbox” SHARED DRAM]
15
Experimental Features On The Way
• Network Traffic and Congestion Monitors
• Multicore Hardware Synchronization
• Active Message
• Multicore Breakpoint
• Hardware Loops
• Multicast Network Transactions
Already inside Epiphany-III/IV silicon, but need more testing!
16
MIMD Manycore IS the Future!
2012: ~10mm, 1024 CPUs, 32KB/core, 0.2W-20W
2022: ~10mm, 16K CPUs, 1MB/core (3D), 0.2W-20W
17