SLAAC: Systems Level Applications of Adaptive Computing

DARPA/ITO Adaptive Computing Systems PI Meeting
Napa, California, April 13-14

Presented by: Bob Parker, Deputy Director, Information Sciences Institute
System Level Applications of Adaptive Computing

Utilizing three phases of adaptive computing components:
• Large current-generation FPGAs
• Rapidly reconfigurable and/or fine-grain FPGAs
• Hybrid FPGAs

Integrating multiple constituent technologies:
• Scalable embedded baseboard
• Gigabit/sec networking
• Modular adaptive compute modules
• Smart network-based control software
• Algorithm mapping tools

Developing reference platforms:
• Flight-worthy deployable system
• Low-cost researchers' kit

[Roadmap, '97-'01: lab demo of an ACS-implemented SAR ATR algorithm; first generation of reference platforms; embedded SAR ATR demo of ACS hardware (clear, 1 Mpixel/s, 6TT); embedded SAR ATR demo (CC&D, 1 Mpixel/s, 6TT); embedded SAR ATR demo (CC&D, 10 Mpixel/s, 6TT)]

Goal: significant reduction in power, weight, volume, and cost for several challenging DoD embedded applications:
• SAR ATR
• Sonar beamforming
• IR ATR
• Others
Team Members: USC/ISI (Lead), BYU, UCLA, Sandia National Labs
SLAAC Objectives

• Define a system-level, open, distributed, heterogeneous adaptive systems architecture
• Design, develop, and evolve scalable reference platforms implementing the adaptive systems architecture
• Validate the approach by deploying reference platforms in multiple defense application domains:
  - SAR ATR
  - Sonar beamforming
  - IR ATR
  - Others
SLAAC Affiliates

[Diagram: application affiliates surrounding the ACS research community. Affiliates: LANL (ultra-wideband coherent RF), NUWC (sonar beamforming), NVL (IR ATR), Sandia (SAR/ATR). ACS research community: ISI, BYU, UCLA, Sandia, LANL]
SLAAC Architecture

[Diagram: a network connecting a sensor host, a network host, and processing nodes. Each node pairs a control processor with an ACS device, a DSP device, or a network interface processor. Example hardware: Myricom L5 baseboard; SLAAC1 board (FPGAs X0, X1, X2 linked by left/right and crossbar paths, with PMC and PCI buses, clock, and configuration PROM); Myricom L4 Orca board; UCLA board]
SLAAC Programming Model

A single host program controls a distributed system of nodes and channels:
• the system is dynamically allocated at runtime
• multiple hosts compete for nodes
• channels stream data between the host and nodes

[Diagram: a host connected over the network to nodes 1, 2, and 3]
Runtime System

Software stack: Application / Runtime System / Messaging Layer / Network Layer.

• System Layer - high-level programming interface (see the sketch below)
  (e.g., ACS_Create_System(Node_list, Channel_list))
• Node Layer - hides device-specific information
  (e.g., FPGA configuration)
• Control Layer - node-independent communication commands
  (i.e., blocking and non-blocking message-passing primitives)
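As a rough illustration of the system-layer interface, here is a minimal C sketch of a host program. ACS_Create_System (corrected from the slide's "ACS_Create_Sytem") is the only name taken from this deck; the handle type, channel names, and the ACS_Send/ACS_Recv/ACS_Free_System calls are hypothetical placeholders, not the actual SLAAC API.

    /* Sketch of a SLAAC system-layer host program. Only
     * ACS_Create_System is named in the deck; everything else
     * here is a hypothetical placeholder for illustration. */
    #include <stdio.h>
    #include <stdlib.h>

    typedef struct AcsSystem AcsSystem;  /* opaque handle (assumed) */

    /* Hypothetical prototypes for the assumed API. */
    AcsSystem *ACS_Create_System(const char **nodes, const char **channels);
    int  ACS_Send(AcsSystem *s, const char *ch, const void *buf, size_t n); /* non-blocking (assumed) */
    int  ACS_Recv(AcsSystem *s, const char *ch, void *buf, size_t n);       /* blocking (assumed)     */
    void ACS_Free_System(AcsSystem *s);

    int main(void)
    {
        /* Nodes and channels requested from the runtime; allocation is
         * dynamic, and another host may win the nodes instead. */
        const char *nodes[]    = { "fpga0", "fpga1", "fpga2", NULL };
        const char *channels[] = { "host->fpga0", "fpga2->host", NULL };
        unsigned char in[4096] = {0}, out[4096];

        AcsSystem *sys = ACS_Create_System(nodes, channels);
        if (!sys) {
            fprintf(stderr, "node allocation failed\n");
            return 1;
        }
        ACS_Send(sys, "host->fpga0", in, sizeof in);   /* stream data to the nodes */
        ACS_Recv(sys, "fpga2->host", out, sizeof out); /* collect the results      */
        ACS_Free_System(sys);
        return 0;
    }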
Remote Node Processing Alternatives

[Diagram: two placements of the application. (a) The application and runtime run on the host (Application / Runtime / Messaging / Network) and drive a remote node that runs only the communication and runtime layers over the FPGA (Network / Messaging / Runtime / FPGA). (b) The application and runtime run on the remote node itself, directly over the FPGA.]

• Host-side application (a): less power required from the compute node
• Node-side application (b): less latency between the application and low-level control
Runtime and Debugging

• Interactive debugger
  - all system-layer C functions provided in a command-line interface
  - symbolic VHDL debugging support using readback
  - single-step clock
  - scriptable
• SLAAC Runtime
  - monitor system state
  - hardware diagnostics
• Other tools
  - network traffic monitors (MPI based?)
  - load balancing
  - visualization tools
Runtime Status

• Complete
  - System Layer API specification
  - Control Layer API specification, partially simulated
• Scheduled
  - May: VHDL simulation of SLAAC board
  - June: implementation of basic runtime system functions
Development Platform Path

[Diagram: four platforms, each running the same SLAAC runtime stack (System Layer / Node Layer / Control Layer), trading improved development environment at one end for improved compute density at the other:
1. Low-cost COTS development platform - Myrinet with SLAAC1 boards/PMC cards
2. SBC with external network - SBCs with SLAAC1 PMC cards on Myrinet
3. SBC with embedded network - SBCs with a SLAAC double-wide PMC card on P0/Myrinet
4. Fully embedded platform - SLAAC L5 baseboards on P0/Myrinet]
Hardware Platforms and Software Development

[Diagram: two software configurations. With a node O.S., both host and node run Application / Runtime on a COTS OS. With no node O.S., the host runs Application / Runtime on a COTS OS while the node runtime runs directly on a custom network interface.]

Hardware Platform          | Node O.S.              | No Node O.S.
---------------------------|------------------------|-------------
Cluster of workstations    | MPI, Linux or NT, PCI  | GM, PCI
SBC w/ external network    | MPI, VxWorks, PMC      | MPI, PMC
SBC w/ embedded network    | MPI, VxWorks, VME P0   | GM, VME P0
Fully embedded             | ?                      | GM, VME P0

With a node O.S.:
• low-risk development path
• standards compliant (MPI, VxWorks; see the MPI sketch below)
• recompile to change platforms
• general-purpose programming environment at node level
• bandwidth limited by MPI

With no node O.S.:
• custom network interface program (exploits GM)
• direct network/compute connection
• immature development environment
• SLAAC provides the programming environment
• maximum bandwidth
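Since the node-O.S. path is standards-compliant MPI, the control layer's blocking and non-blocking primitives map onto familiar MPI calls. The sketch below is plain MPI, not SLAAC code.

    /* Blocking vs. non-blocking message passing in standard MPI -
     * the primitive styles the control layer exposes. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, buf = 42, in = 0;
        MPI_Request req;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            /* Non-blocking send: start the transfer, then wait. */
            MPI_Isend(&buf, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
            /* ...computation could overlap the transfer here... */
            MPI_Wait(&req, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            /* Blocking receive: returns only when the data arrives. */
            MPI_Recv(&in, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("node %d received %d\n", rank, in);
        }
        MPI_Finalize();
        return 0;
    }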
BYU/UCLA Domain-Specific Compilers for ATR

[Diagram: two domain-specific tool flows.
BYU: Focus of Attention (FOA) / image morphology "CYTO" code -> neighborhood processor generator (optimization here, drawing on hand-optimized neighborhood operators from a Viewlogic library) -> structural VHDL -> Synopsys logic synthesis -> Xilinx place & route -> FPGA.
UCLA: template matching; templates -> optimization here -> correlator generator -> structural VHDL -> Synopsys logic synthesis -> Xilinx place & route -> FPGA.]

• BYU: map "CYTO" neighborhood operations to pre-defined FPGA blocks (see the sketch below); high packing density to enable a single configuration
• UCLA: optimize VHDL using template overlap; creates an optimized template subset with a minimum number of reconfigurations
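For concreteness, the sketch below shows the kind of 3x3 neighborhood operation (here a binary erosion, a standard image-morphology primitive) that the BYU generator maps onto pre-defined FPGA blocks. It is a generic illustration, not the actual CYTO code.

    /* Illustrative 3x3 binary erosion - a generic image-morphology
     * neighborhood operation, not the actual CYTO algorithm. */
    #include <string.h>

    #define W 64
    #define H 64

    /* out(x,y) = 1 only if every pixel in the 3x3 neighborhood is 1. */
    static void erode3x3(const unsigned char in[H][W],
                         unsigned char out[H][W])
    {
        memset(out, 0, H * W);              /* border stays zero */
        for (int y = 1; y < H - 1; y++) {
            for (int x = 1; x < W - 1; x++) {
                unsigned char all = 1;
                for (int dy = -1; dy <= 1; dy++)
                    for (int dx = -1; dx <= 1; dx++)
                        all &= in[y + dy][x + dx];
                out[y][x] = all;
            }
        }
    }

In hardware the same 3x3 window is typically built from line buffers and a small logic tree, which is what makes a library of fixed, densely packed neighborhood blocks practical.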
The UCLA Testbench

[Diagram: the Mojave board, containing a static FPGA, sitting in a PCI slot of the host processor, with a bus connector to an external system processor.]

• The Mojave board can interface to the i960 development system for in-house testing (as shown), or to the Myricom LANai board.
SLAAC1 Board

[Diagram: three user FPGAs (X0, X1, X2) connected by left/right neighbor paths and crossbars (X0_XBAR, XP_XBAR), with PMC and PCI buses, clock, configuration PROM, and a jumper block. Key interfaces: FIFO data (64 pins), FIFO control (~16 pins), clock/configuration/inhibit signals, and an external memory bus to 256Kx18 SRAM.]
Surveillance Challenge

Problem: SAR/ATR at 40,000 sqnm/day @ 1 ft resolution.

System Parameter                              | Current | Challenge | Scale Factor
----------------------------------------------|---------|-----------|------------------------------
SAR area coverage rate (sqnm/day @ 1 ft res.) | 1,000   | 40,000*   | 40X (FOA, Indexer, Ident.)
Number of target classes                      | 6       | 30        | 5X (Indexer, Ident.)
Level/difficulty of CC&D                      | Low     | High      | 100X (Indexer), 10X (Ident.)

*Corresponds to a data rate of 40 Megapixels/sec.
Project Benefit Includes Improved Compute Density
(Scaled to challenge size; assuming FOA, indexer, and 1 identifier)

[Chart: number of 6U VME chassis (log scale, 0.01-1,000,000; one VME chassis = 3.5 cft, 80 lbs, 700 W, $400,000) versus year, '91-'01. Plotted: past STARLOS systems; the DARPA EHPC program (early two-level multicomputers + algorithms) with the JSTARS '96 and '97 demos (clear and CC&D); and the DARPA ACS program with the '98 demo (clear), the '99 demo range (ACS large FPGAs with on-chip SRAM blocks + algorithms, 5X over Moore's Law), and the '01 demo range (ACS hybrid chips + algorithms, 10X over Moore's Law).]
ATR Flight Demo System
For 1 Mpixel/sec with 6 target configurations (targets-in-the-clear scenario)

Baseline 1996 system hardware architecture:
- Systolic: 3 algorithm modules
- SIMD: 1 algorithm module
- Early 2-level multiprocessors/DSP: 3 algorithm modules

1997 flight demo system hardware architecture:
- 2-level multiprocessor/DSP: 8 algorithm modules (1 additional algorithm module implemented over the baseline system)

                                         | Baseline 1996       | 1997 Flight Demo
-----------------------------------------|---------------------|--------------------
Weight (lbs.)                            | 354 (5 VME chassis) | 124 (2 VME chassis)
Volume (ft3)                             | 17.5                | 7
Power (W)                                | 1680                | 453
Power-volume-weight product (W-ft3-lbs.) | 10,407,600          | 393,204

Ratio: 26.47

The 2-level multiprocessor/DSP configuration implements the algorithms (with an additional algorithm module) with better performance and significantly lower power, size, and weight than the baseline implementation.
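As a consistency check (arithmetic mine, not on the slide), the products and the ratio follow directly from the three measured quantities:

    354 lbs x 17.5 ft3 x 1680 W = 10,407,600 W-ft3-lbs
    124 lbs x  7   ft3 x  453 W =    393,204 W-ft3-lbs
    10,407,600 / 393,204        =  26.47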
Common SAR/ATR - DARPA ACS & EHPC FY97 Testbed

[Diagram: a laboratory development element linked by Myrinet and HIPPI to a real-time deployable element (Common ATR Model Year 1, Joint STARS). Laboratory element: systolic Datacube/custom CYTO, systolic SBC, SIMD CNAPS, SIMD CPP DAP, MIMD Intel Paragon, MIMD SGI, SUN/SGI/PC workstations, and RAID data storage. Deployable element: MIMD PowerPC/HPSC multicomputers on Myrinet - the next generation of embeddable HPC technologies.]
JSTARS ATR Processor

• PowerPC multicomputer
  - 13 commercial Motorola VMEbus CPU boards
  - 200 MHz 603e PowerPC per board
  - 5.2 GFLOPS peak
• SHARC multicomputer
  - 4 Sanders HPSC processor boards
  - 8 33 MHz Analog Devices SHARC DSP processors per board
  - 3.2 GFLOPS peak
• Commercial Myrinet high-speed communications
  - 1.28 Gbits/sec full duplex
  - cross-point topology
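The peak figures are consistent with straightforward per-chip arithmetic (mine, assuming 2 floating-point ops/cycle for the 603e and roughly 100 MFLOPS peak per 33 MHz SHARC):

    13 boards x 200 MHz x 2 FLOPs/cycle = 5.2 GFLOPS
    4 boards x 8 SHARCs x ~100 MFLOPS   = 3.2 GFLOPS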
DARPA SAR ATR EHPC Testbed - Experiments in Action

• TMD/RTM real-time ATR delivered 6/97
  - FOA, SLD, MPM, MSE, CRM, LPM, & PGA
• Supported 5 real-time ESAR/ATR airborne flight exercises
  - 2 engineering check-out flights
  - 3 Phase III evaluation flights
• Features
  - 1 Mpixel/sec, 6 target configurations, targets-in-the-clear scenario
  - large-scale dynamic range capability
  - modular, scalable, plug & play architecture
  - 2-VMEbus-chassis ATR system
  - heterogeneous two-level multicomputer, COTS PowerPC and Sanders SHARC
• RTM ATR Advanced Technology Demonstration

This work was performed under the sponsorship of the Air Force Aeronautical Systems Center and the Air Force Research Laboratory (formerly Rome Laboratory).
Joint STARS SAR/ATR Transition

Description:
• Jointly managed, USAF ASC/FBXT and AFRL/IF
• Provided JSTARS with a real-time ATR capability
• Leveraged prior Service & DARPA investments
• Sandia developed the ATR system; Northrop Grumman developed the ESAR system and led the integration of both systems onto the aircraft
• The ATR system enables image analysts to identify threats in real time by prescreening large amounts of data for potential targets

Accomplishments:
• Developed an airborne real-time SAR/ATR system
• Demonstrated initial system at the Pentagon (Sep 96)
• All-COTS system implementation (Apr 97)
• Full system integrated on T3 aircraft (Aug 97)
• Engineering/integration flights completed with fully operational system (Sep 97)
• Three real-time demonstration flights (Oct 97)
• Operationally significant Pd/FAR performance
BYU - FOA and Indexing

[Diagram: SAR image processing chain. The sensor feeds a preprocessor; coarse data drives the focus-of-attention detection stage, which passes target locations to the indexer (angle estimate); fine data then drives identification, producing a target ID with a confidence measure.]

• Superquant FOA
  - 1-pass adaptive threshold technique (a generic sketch follows below)
  - produces ROI blocks for indexing
  - >7.8 Gbops/second/image
  - 1 Mpixel/second, FY98
  - 40 Mpixel/second, FY01
• CC&D indexing
  - algorithm definition in process
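Here is a minimal sketch of what a one-pass adaptive threshold can look like on a streaming row of pixels, using a running exponential mean. It is generic and integer-only (the style that maps well onto FPGA fabric) and does not claim to reproduce the actual Superquant algorithm.

    /* One-pass adaptive threshold sketch: flag pixels brighter than
     * (k/8) times a running local mean. Generic illustration only -
     * not the Superquant FOA algorithm. */
    #include <stdint.h>

    #define W 1024

    static void adaptive_threshold_row(const uint8_t in[W],
                                       uint8_t out[W],
                                       int k /* e.g., 12 => 1.5x mean */)
    {
        int32_t avg = in[0];                      /* running local mean */
        for (int x = 0; x < W; x++) {
            /* detection test against the mean seen so far */
            out[x] = (8 * (int32_t)in[x] > k * avg) ? 1 : 0;
            /* exponential moving average, updated in the same pass */
            avg += ((int32_t)in[x] - avg) / 16;
        }
    }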
BYU - SAR ATR Status

• Non-adaptive FOA
  - Wildforce PCI platform
  - 3 months to retarget to the SLAAC board
• Compilation strategies
  - Current approach: "VHDL synthesis from scratch", traditional tool flow
  - Planned approach (July 1999): "gate array" approach with a fixed chip floorplan of regular arrays; ~30x speedup, compile time < 1 hour, ~10% efficiency loss
Sonar Beamforming with 96-Element TB-23

• Goals:
  - first real-time matched-field algorithm deployment
  - 1000x: 51 beams -> 51,000 beams
  - ranging + "look forward" capability
  - demonstrate adaptation among algorithms at sea
  - validate FPGAs for signal processing
• Computation:
  - 2 stages (coarse and fine)
  - 16 Gop/sec, 2.5 GB memory, 80 MB/sec I/O
• Approach (a generic beamforming sketch follows below):
  - use k-omega + matched-field algorithms
  - leverage ACS to provide coarse-grain RTR
    - environmental adaptability
    - multiple-resolution processing
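For orientation, here is a minimal time-domain delay-and-sum beamformer over a 96-element array. The deployed system uses k-omega and matched-field processing, which are considerably more involved; this generic sketch only shows why beam count drives the Gop/sec budget.

    /* Generic delay-and-sum beamformer - orientation sketch only,
     * not the k-omega or matched-field algorithms named above. */
    #define ELEMENTS 96
    #define SAMPLES  4096

    /* Sum each element's samples at per-element steering delays and
     * return the average beam power for one candidate direction. */
    static float beam_power(const float x[ELEMENTS][SAMPLES],
                            const int delay[ELEMENTS])
    {
        float power = 0.0f;
        for (int t = 0; t < SAMPLES; t++) {
            float sum = 0.0f;
            for (int e = 0; e < ELEMENTS; e++) {
                int idx = t - delay[e];        /* delayed sample index */
                if (idx >= 0 && idx < SAMPLES)
                    sum += x[e][idx];
            }
            power += sum * sum;                /* accumulate energy */
        }
        return power / SAMPLES;
    }

This inner loop runs once per candidate steering direction, which is why scaling from 51 to 51,000 beams pushes the computation toward the 16 Gop/sec figure quoted above.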
STEALTH ADVANCEMENTS - BROADBAND QUIETING COMPARISON

[Chart, credit NUWC: broadband noise levels versus year (1960-2010) for Soviet/Russian and US submarine classes. Soviet/Russian: ALFA, VICTOR I, VICTOR III, improved VICTOR III, AKULA, improved AKULA, SEVERODVINSK (lead ship keel laid Dec 93). US: 594, 637, 688, 688I, SSN-21, NSSN. Noise levels decline steadily across both fleets.]
Sonar Status

• Algorithm identified by NUWC 4/3/98
  - validation in process
• BYU mapping of NUWC algorithm
  - underway for Wildforce board
  - map to SLAAC boards when available
• Sonar module generation
  - operational generators include pipelined multipliers & CORDIC units
  - C/C++ programs generating VHDL code (see the sketch below)
  - generators used in Wildforce mapping
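In the spirit of the module generators described above, here is a toy C program that emits structural VHDL for a pipelined multiplier. The entity name, ports, and coding style are invented for illustration; this is not the BYU generator.

    /* Toy module generator: prints VHDL for a width-bit multiplier
     * followed by `stages` pipeline registers (synthesis tools can
     * retime these into the multiplier). Illustration only. */
    #include <stdio.h>

    static void emit_pipelined_mult(int width, int stages)
    {
        printf("library ieee;\n"
               "use ieee.std_logic_1164.all;\n"
               "use ieee.numeric_std.all;\n\n");
        printf("entity pmult is\n"
               "  port (clk  : in  std_logic;\n"
               "        a, b : in  std_logic_vector(%d downto 0);\n"
               "        p    : out std_logic_vector(%d downto 0));\n"
               "end pmult;\n\n", width - 1, 2 * width - 1);
        printf("architecture rtl of pmult is\n");
        for (int s = 0; s <= stages; s++)
            printf("  signal r%d : unsigned(%d downto 0);\n",
                   s, 2 * width - 1);
        printf("begin\n  process (clk) begin\n"
               "    if rising_edge(clk) then\n");
        printf("      r0 <= unsigned(a) * unsigned(b);\n");
        for (int s = 1; s <= stages; s++)
            printf("      r%d <= r%d;\n", s, s - 1); /* pipeline regs */
        printf("    end if;\n  end process;\n");
        printf("  p <= std_logic_vector(r%d);\nend rtl;\n", stages);
    }

    int main(void)
    {
        emit_pipelined_mult(16, 3); /* 16x16 multiply, 3 pipeline stages */
        return 0;
    }

Emitting HDL from a small parameterized program like this is what lets a mapping assemble pipelined multipliers and CORDIC stages of arbitrary width instead of hand-writing each one.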
Timeline - Baseline + Option

[Gantt chart, FY98-FY01, two tracks. FY98 (feasibility study): NUWC specifies algorithm; track 1: BYU preliminary mapping to ACS; track 2: submarine I/F specified and I/F construction begun. FY99 (1st mapping): 1st SLAAC ACS board available; track 2: sonar subcompilers and sonar module generators; BYU end-to-end lab demo complete and delivered to ISI. FY00-FY01: track 1: advanced algorithm development, then advanced algorithm development on RRP; top-level compiler; advanced SLAAC ACS boards available; ISI delivers demo system to NUWC for testing; SEA TEST (summer 2000).]
SLAAC Conclusions

• Great early success in deployed capability
• Interesting runtime tradeoffs
• Significant risk reduction through COTS standards
• Promising simulation results - headed for hardware
• Adaptive systems are hard - but getting easier

http://www.east.isi.edu/SLAAC