Aerospace Data Storage and Processing Systems Implementation of High-Rate JPEG2000 Coding on a Virtex-2 Pro Reconfigurable Computing Board Presented by Damon Van Buren SEAKR Engineering MAPLD.

Aerospace Data Storage and Processing Systems
Implementation of High-Rate
JPEG2000 Coding on a Virtex-2 Pro
Reconfigurable Computing Board
Presented by Damon Van Buren
SEAKR Engineering
MAPLD 2004
Submission 133
The Sensor Bandwidth Problem
 Commercial satellite imaging systems are experiencing growth in imaging
capability...
• Higher resolution: < 1 m
• Larger images: >10k image width and height
• More spectral components
– Panchromatic
– Red/Green/Blue
– Multi-spectral
 Improved capabilities are leading to high sensor data rates
• Data output rates > 2 Gbps for some systems
 Providing storage and downlink bandwidth for the data is becoming a
significant challenge for system designers
• The largest data recorders can store less than 20 minutes of data at 2 Gbps
• Downlinks must be several hundred Mbps to downlink 15 minutes of data in under
an hour
• Data storage and high-bandwidth downlinks require lots of power
 By reducing the amount of image data, compression provides a solution to
the bandwidth problem!
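The storage and downlink figures on this slide can be checked with some quick arithmetic. The sketch below is only an illustration: it assumes decimal units (1 Gbps = 10^9 bit/s), and the helper names are hypothetical, not part of the presentation.

```python
GBPS = 1e9  # bits per second, decimal units (assumption)


def recorded_bytes(rate_bps: float, minutes: float) -> float:
    """Bytes produced by a sensor running at rate_bps for the given time."""
    return rate_bps * minutes * 60 / 8


def required_downlink_bps(rate_bps: float, record_minutes: float,
                          downlink_minutes: float) -> float:
    """Downlink rate needed to return the recorded data in downlink_minutes."""
    return rate_bps * record_minutes / downlink_minutes


if __name__ == "__main__":
    # 20 minutes of data at 2 Gbps, expressed in gigabytes
    print(recorded_bytes(2 * GBPS, 20) / 1e9)
    # Rate needed to downlink 15 minutes of 2 Gbps data in one hour, in Mbps
    print(required_downlink_bps(2 * GBPS, 15, 60) / 1e6)
```

Under these assumptions, 20 minutes at 2 Gbps is about 300 GB, and returning 15 minutes of data within an hour needs roughly 500 Mbps, which matches the "several hundred Mbps" on the slide.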
Desired Compressor Features
 Real Time
• Compression must be performed in real time, prior to storage.
• High throughput (> 2 Gbps)
 Excellent Performance in Lossy and Lossless Modes
• Purchasers of satellite imagery are sensitive to reductions in image quality
caused by lossy compression.
• Scientific users prefer undistorted data (bit true).
 Space-Qualified
• Must survive hazards of launch and space operation, including radiation.
 Low Risk
• Satellite imaging companies seek high-reliability solutions.
 Low Cost
• Commercial customers require cost effective solutions.
 Flexible
• The ability to support varying compression ratios and contents would allow
more effective use of available storage and bandwidth.
JPEG2000 Algorithm
 JPEG2000 is an excellent choice for satellite image compression.
• Latest still image compression standard from the JPEG committee
 Meets two key requirements for satellite image compression:
• Excellent performance in both lossy and lossless modes.
– ~1.7 to 1 lossless compression for typical satellite imagery - 70% improvement!
– Visually lossless compression > 2 to 1 - 100% improvement in storage and downlink
performance.
• Very flexible:
– Many options for compressed images.
 Other advantages:
• International Standard
• Wavelet based
– High quality lossy images with comp. ratios > 100:1
• Packet oriented
– Allows random access to the compressed code stream.
– Makes compressed data more robust in the presence of bit errors.
– Allows selection of image quality, spatial region, resolution, and color component after
compression.
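The "70%" and "100%" improvements quoted above follow directly from the compression ratios. A minimal sketch of that relationship, with hypothetical function names and assuming the ratio applies uniformly to all image data:

```python
def extra_capacity_percent(compression_ratio: float) -> float:
    """Percent more image data that fits in the same storage or downlink."""
    return (compression_ratio - 1.0) * 100.0


def compressed_rate_bps(sensor_rate_bps: float, compression_ratio: float) -> float:
    """Data rate after compression for a given sensor output rate."""
    return sensor_rate_bps / compression_ratio


if __name__ == "__main__":
    print(extra_capacity_percent(1.7))      # lossless case from the slide
    print(extra_capacity_percent(2.0))      # visually lossless case
    print(compressed_rate_bps(2e9, 2.0))    # 2 Gbps sensor at 2:1
```

A 1.7:1 ratio lets the same recorder hold about 70% more imagery, and a 2:1 ratio doubles it.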
JPEG2000 Implementation Challenges
 JPEG2000 is a very complex algorithm.
• More Features = More Complexity.
 Operation intensive
• Several hundred operations per pixel, because each bit must be processed many
times, for the wavelet transform, entropy coding, MQ coding, packet generation,
etc.
 Complex
• Many different stages to produce compressed output.
– Wavelet transform.
– Quantization.
– Context generation.
– Arithmetic coding.
– Packet generation.
• Many parameters must be tracked individually for each code block (64x64).
 Memory intensive
• Each pixel must be accessed many times, so many small buffers are needed to get
good throughput.
 Few processors are capable of implementing JPEG2000 at high rates!
High-Performance Processing Using Xilinx FPGAs
 Xilinx FPGAs have many advantages for fast parallel processing:
• Millions of gates.
• System clocks of several hundred MHz.
• High speed I/O
– 622 Mbps LVDS
– Multi-Gigabit serial I/O
• Hundreds of internal block RAMS.
• Hundreds of internal 18 bit multipliers.
 Xilinx FPGAs are available in space-qualified versions:
• Radiation testing is complete on the Virtex and Virtex-II devices.
– ~200 kRad total dose, latchup immune.
• Radiation testing to begin on the Virtex-II Pro devices soon.
 Xilinx FPGAs are very flexible, reducing risk:
• May be reprogrammed an effectively unlimited number of times.
• Configurations may be uploaded at any time during the mission to fix errors or add
new capability.
 Xilinx FPGAs are the best solution for fast compression in space!
Challenges for Xilinx Use in Space
 The effects of radiation in spacecraft electronics are well known.
• Caused primarily by charged particles.
• May cause permanent damage over time by ionizing SiO2 (total dose).
• May also cause errors in digital logic by upsetting registers (single event
effects).
• Mitigation techniques are used to reduce or eliminate the effect of radiation
upsets.
– Triple Modular Redundancy (TMR) uses voting to select the correct output from 3 separate instances of
the design.
 Mitigation of radiation effects in SRAM-based FPGAs presents an
additional challenge:
• As with other digital electronics, the functional logic of the device is susceptible
to upset, however...
• Another layer of logic (configuration logic) controls the routing of the part,
giving the device its capability to be reprogrammed to perform different
functions.
• Configuration logic is also susceptible to radiation upsets.
 Xilinx FPGAs require system level mitigation strategies in addition to the
device level mitigation techniques (such as TMR) that are commonly used
for space electronics.
• Configuration data must be continuously re-written, or scrubbed using a read-and-correct approach.
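The voting step in Triple Modular Redundancy can be illustrated in software. This is only a sketch of the majority function applied bitwise to three copies of a word; in the FPGA the voters are logic inserted in the design, and the function name here is hypothetical.

```python
def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise majority vote across three redundant copies of a word."""
    return (a & b) | (a & c) | (b & c)


if __name__ == "__main__":
    good = 0b1011_0010
    upset = good ^ 0b0000_1000  # a single bit flipped in one copy
    # The two unaffected copies outvote the upset one.
    assert tmr_vote(good, upset, good) == good
    print(bin(tmr_vote(good, upset, good)))
```

The vote masks an upset in any one copy, but it does nothing to repair the upset itself, which is why configuration scrubbing is needed as well.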
SEAKR’s RCC Board Processing Solutions
 SEAKR has developed a line of Reconfigurable Computing (RCC)
products based on the Xilinx FPGAs.
• RCC 1 – 4x Virtex 1000s
• RCC 2 – 4x Virtex II 6000s
• RCC 3 (NTRCC) – 4x Virtex II Pro 70/100s
 Boards include system-level upset mitigation (scrub) for the Xilinx
devices.
• Configuration data is continuously read and checked for errors.
• Errors are corrected by overwriting the corrupted frames, without interrupting
the operation of the device.
 Other devices on board employ radiation mitigation strategies as well:
• Radiation hardened
• EDAC
 Boards also have dedicated resources to support high-performance
processing:
• High speed I/O.
• External memories.
 Industry standard form-factor: 6U Compact PCI.
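The read-and-correct scrub described on this slide can be modeled as a loop over configuration frames. The sketch below is a software analogy under stated assumptions: errors are detected by comparing each frame against a known-good shadow copy (the slides do not say how the check is done), and `scrub_pass`, `live_frames`, and `shadow_frames` are hypothetical names.

```python
from typing import List


def scrub_pass(live_frames: List[bytearray],
               shadow_frames: List[bytes]) -> List[int]:
    """Read every frame, compare it with the shadow copy, rewrite mismatches.

    Returns the indices of the frames that were corrected.
    """
    corrected = []
    for index, golden in enumerate(shadow_frames):
        if bytes(live_frames[index]) != golden:
            # Overwrite only the corrupted frame; other frames are untouched.
            live_frames[index][:] = golden
            corrected.append(index)
    return corrected


if __name__ == "__main__":
    shadow = [bytes([i] * 4) for i in range(8)]
    live = [bytearray(frame) for frame in shadow]
    live[5][2] ^= 0x10            # simulate a single-event upset
    print(scrub_pass(live, shadow))
    print(scrub_pass(live, shadow))
```

In this model the first pass reports frame 5 as corrected and the second pass finds nothing to fix. Rewriting only the affected frame mirrors the slide's point that the device keeps operating during correction.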
Network RCC (NTRCC)
 Four Xilinx XC2VP70-6FF1704 FPGA coprocessors (COPs)
• Design compatible with XC2VP100-6FF1704 and V2P-X
 (4) banks of 1Mx36 Quad Data Rate (QDR) SRAMs for each COP
 512MB of DDRII Shared SDRAM memory for prototype
• 1GB of 128M x 64 EDAC (R-S) Protected DDRII SDRAM shared memory (19.2Gbps @150MHz)
using 1Gbit memory
 Network IF
• (2) parallel 16bit RapidIO ports to front panel (8 Gbps)
• (1) 4x3.125 Gbps serial port to front panel (>10Gbps)
• 4x3.125 Gbps ports from NIC to each COP (>10Gbps)
• 4x3.125 Gbps ports from each COP to each neighbor COP (>10Gbps)
 Shared Data Buses
• COP Interconnect Bus (~4.224 Gbps)
• cPCI, 32-bit, 33 MHz
 Read and write COP configurations via cPCI
 Extended 6U form factor
 Configuration RAM SEU detection and correction
• DDRII SDRAM on configuration controller for shadow config program storage
 Non-Volatile memory for 16 different configurations (1 Gbit Flash)
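Several of the bandwidth figures on this slide can be reproduced from bus width and clock rate. The sketch below assumes that the DDRII memory transfers data on both clock edges and that the serial port figure is the raw line rate summed over four lanes; the function names are hypothetical.

```python
def ddr_bandwidth_bps(data_bits: int, clock_hz: float) -> float:
    """Peak rate of a double-data-rate bus: two transfers per clock."""
    return data_bits * clock_hz * 2


def serial_port_raw_bps(lanes: int, lane_rate_bps: float) -> float:
    """Raw line rate of a multi-lane serial port, before any coding overhead."""
    return lanes * lane_rate_bps


def parallel_bus_peak_bps(width_bits: int, clock_hz: float) -> float:
    """Peak rate of a single-data-rate parallel bus."""
    return width_bits * clock_hz


if __name__ == "__main__":
    print(ddr_bandwidth_bps(64, 150e6) / 1e9)        # shared DDRII SDRAM, Gbps
    print(serial_port_raw_bps(4, 3.125e9) / 1e9)     # 4x3.125 Gbps port, Gbps
    print(parallel_bus_peak_bps(32, 33e6) / 1e9)     # cPCI 32-bit 33 MHz, Gbps
```

With these assumptions the 64-bit memory at 150 MHz gives the 19.2 Gbps quoted above, a four-lane port gives 12.5 Gbps raw (consistent with ">10Gbps"), and the cPCI bus peaks at about 1 Gbps.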
Network RCC Block Diagram
NTRCC Layout
24 Layer board
MicroVias, blind vias, via-in-pad
High speed 3.125 Gbps Serial links
82 pages of schematic capture
10 weeks of PCB layout time
Implementation of the JPEG2000 Algorithm
 The JPEG2000 core has been in development for over a year.
• Eventual target data rate 600 Mbps/device.
• Written in VHDL.
• Simulations performed in Modelsim.
• Synthesis in Synplify_Pro.
 Targeted to the NTRCC-R in summer 2004.
• Targeted to a reduced version of the NTRCC with a single coprocessor.
• Take advantage of improved external memory throughput.
• Ultimately use the high-speed serial I/O to move image information on the
board.
 Designed for high throughput.
• Cycle efficient coding style.
• Highly parallel design.
• Pipelined architecture.
• Rolling wavelet transform.
 Designed for flexible output file format.
• Output is divided into quality layers for easy selection of compression ratio.
JPEG2000 Block Diagram
JPEG2000 Coding Steps
 Image is broken into tiles
 Tiles are wavelet transformed
• 5/3 reversible or 9/7 irreversible, also user defined.
• Selectable number of transform levels.
 Each subband from the transform is further broken up into code blocks
(typically 32x32 or 64x64) for entropy coding.
 Each code block is entropy coded, starting from the top bit plane and
working down.
• The current bit of each pixel is passed to an arithmetic coder, along with context
information.
• The MQ encoder takes advantage of any skewing of the probability for each
context, and adapts contexts as the coding progresses.
 Packets are formed by combining the entropy coder outputs from a single
resolution.
 Tile parts are formed from all the packets in a given bit plane.
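To make the wavelet step concrete, here is a minimal sketch of one level of the 5/3 reversible transform in one dimension, written with integer lifting steps and symmetric extension at the edges. It is an illustration in Python, not the VHDL core described in this presentation; it assumes an even-length row of integers, and the function names are hypothetical. A 2-D transform would apply the same steps to rows and then columns.

```python
from typing import List, Tuple


def forward_53(x: List[int]) -> Tuple[List[int], List[int]]:
    """One level of the 1-D 5/3 reversible lifting transform (even length)."""
    assert len(x) >= 2 and len(x) % 2 == 0
    even, odd = x[0::2], x[1::2]
    half = len(even)
    high = []
    for i in range(half):
        # Predict step: mirror the last even sample at the right edge.
        right = even[i + 1] if i + 1 < half else even[i]
        high.append(odd[i] - ((even[i] + right) >> 1))
    low = []
    for i in range(half):
        # Update step: mirror the first high-pass sample at the left edge.
        left = high[i - 1] if i > 0 else high[0]
        low.append(even[i] + ((left + high[i] + 2) >> 2))
    return low, high


def inverse_53(low: List[int], high: List[int]) -> List[int]:
    """Undo forward_53 exactly by running the lifting steps in reverse."""
    half = len(low)
    even = []
    for i in range(half):
        left = high[i - 1] if i > 0 else high[0]
        even.append(low[i] - ((left + high[i] + 2) >> 2))
    x = []
    for i in range(half):
        right = even[i + 1] if i + 1 < half else even[i]
        x.append(even[i])
        x.append(high[i] + ((even[i] + right) >> 1))
    return x


if __name__ == "__main__":
    row = [12, 15, 9, 200, 198, 197, 40, 3]
    low, high = forward_53(row)
    assert inverse_53(low, high) == row  # bit-true reconstruction
    print(low, high)
```

Because every step uses integer arithmetic that the inverse undoes exactly, the round trip is bit-true, which is what makes the 5/3 filter suitable for the lossless mode.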
JPEG2000 Architecture Drivers
 To achieve high data rates, the processing must be parallelized as much
as possible.
 The “tall pole in the tent” is the arithmetic coding, because the coding of
a single data bit with its context can take several clock cycles.
 Significance propagation coding is also a challenge, because each
coefficient must be accessed many times, as each bit plane is
processed.
 Other operations, such as wavelet transform, code block loading, and
packet generation are much more efficient, and require fewer parallel
paths.
 A pipelined architecture with many entropy coders in parallel was used
to achieve the required throughput.
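The cost of bit-plane coding can be seen by splitting coefficient magnitudes into planes, most significant first. The sketch below only shows the decomposition and a visit count; it does not implement context generation or the MQ coder. The names are hypothetical, and the 13-plane figure in the example is an assumption borrowed from the input data precision, since transformed coefficients can need more bits.

```python
from typing import List, Tuple


def bit_planes(magnitudes: List[int]) -> List[Tuple[int, List[int]]]:
    """Return (plane_index, bits) pairs from the top bit plane down to plane 0."""
    top = max(magnitudes, default=0).bit_length()
    return [(plane, [(m >> plane) & 1 for m in magnitudes])
            for plane in range(top - 1, -1, -1)]


def coefficient_visits(block_size: int, planes: int) -> int:
    """Lower bound on coefficient reads: one visit per coefficient per plane."""
    return block_size * block_size * planes


if __name__ == "__main__":
    for plane, bits in bit_planes([5, 0, 12, 3]):
        print(plane, bits)
    print(coefficient_visits(64, 13))
```

In JPEG2000 each plane is itself coded in up to three passes (significance propagation, magnitude refinement, cleanup), so the real number of accesses per coefficient is higher than this lower bound.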
Architecture Description
 Processes 256x256 tiles.
 Pipelined architecture, using separate external memories for image,
tile, and compressed data storage.
 19 Entropy coders working in parallel to improve throughput, one for
each code block.
• 64x64 code blocks.
 FIFO buffering between the stages improves data flow efficiency.
 A rolling wavelet transform is used to reduce memory accesses and
improve efficiency.
 Entropy coder outputs are formed into layers, giving each tile a
progressive output format.
 Tile parts are interleaved as the image tiles are processed.
 Performs lossy or lossless compression.
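The figure of 19 entropy coders can be related to the tile and code block sizes by counting code blocks per tile. The slides do not state the number of wavelet decomposition levels, so the sketch below treats it as a parameter; it also assumes a square power-of-two tile and that a subband smaller than a code block still occupies one code block. The function name is hypothetical.

```python
def code_blocks_per_tile(tile_size: int, levels: int, block_size: int) -> int:
    """Count code blocks in a square tile after a dyadic wavelet transform."""
    def blocks(side: int) -> int:
        per_side = -(-side // block_size)  # ceiling division
        return per_side * per_side

    total = 0
    for level in range(1, levels + 1):
        # Each level contributes three detail subbands (HL, LH, HH).
        total += 3 * blocks(tile_size >> level)
    # Plus the single low-pass (LL) subband left at the deepest level.
    total += blocks(tile_size >> levels)
    return total


if __name__ == "__main__":
    for levels in range(1, 5):
        print(levels, code_blocks_per_tile(256, levels, 64))
```

Under these assumptions, three decomposition levels of a 256x256 tile with 64x64 code blocks give 19 code blocks, which is consistent with one entropy coder per code block. This is an inference from the numbers on the slide, not a stated design parameter.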
NTRCC-R Implementation Results
 The JPEG2000 encoder was targeted to the V2Pro 70 FPGA on the NTRCC-R.
• Lossless or Lossy compression.
• Data precision up to 13 bits.
 Simulation and Routing Results:
• Slices: 30043 out of 33088, 90%
• Block RAMS: 148 out of 328, 45%
• Max system clock ~43 MHz without optimization.
 Hardware Throughput:
• ~140 Mbps with a 33 MHz clock (depending on the image).
• ~180 Mbps with a 43 MHz clock.
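The two throughput figures are consistent with throughput scaling linearly with clock rate. A rough sketch of that scaling follows; it assumes the number of bits processed per clock stays constant, which is an approximation since throughput depends on the image, and the function names are hypothetical.

```python
def bits_per_clock(throughput_bps: float, clock_hz: float) -> float:
    """Average input bits processed per clock cycle."""
    return throughput_bps / clock_hz


def scaled_throughput_bps(measured_bps: float, measured_clock_hz: float,
                          new_clock_hz: float) -> float:
    """Throughput at a new clock rate, assuming constant bits per clock."""
    return measured_bps * new_clock_hz / measured_clock_hz


def clock_needed_hz(target_bps: float, measured_bps: float,
                    measured_clock_hz: float) -> float:
    """Clock rate needed for a target throughput at constant bits per clock."""
    return target_bps / bits_per_clock(measured_bps, measured_clock_hz)


if __name__ == "__main__":
    print(bits_per_clock(140e6, 33e6))
    print(scaled_throughput_bps(140e6, 33e6, 43e6) / 1e6)   # Mbps at 43 MHz
    print(clock_needed_hz(600e6, 140e6, 33e6) / 1e6)        # MHz for 600 Mbps
```

Scaling 140 Mbps from 33 MHz to 43 MHz gives about 182 Mbps, in line with the ~180 Mbps above. At the same bits per clock, 600 Mbps would need a clock of roughly 141 MHz, so reaching that target depends on processing more bits per clock as well as on a faster clock.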
JPEG2000 Floorplan
 The Pro 70 Device is quite full!
Planned Improvements
 Optimize design to hit 66 MHz.
• Un-optimized design will operate at up to 43 MHz.
• Use of asynchronous FIFOs will allow optimal clocking of various parts of
the design.
 Improve pipelining of code block loader and wavelet transform.
• Allow “autonomous” operation of each stage, so that operations take
place as soon as input data and output buffers are ready.
 Make use of additional QDR SRAMs available to each coprocessor
by creating separate buffers for wavelet transform and packetizer
output.
• NTRCC has 4 QDR memories for each coprocessor.
 Arithmetic coder bypass.
• Arithmetic coder requires > 2 cycles per bit coded, on average.
 9/7 wavelet transform with quantization.
• Use of the 9/7 wavelet results in better SNR and max error performance
for lossy compression.
 Add RapidIO serial interface to Network Interface Chip (NIC).
Conclusions
 The JPEG2000 core is expected to provide a valuable option for
satellite imagery systems.
• Compression will result in a dramatic improvement in system
performance.
• Lossless compression will allow ~70% more image data to be stored and
downlinked by a system.
• Lossy compression will allow even greater improvements.
 NTRCC hardware is an excellent platform for the compressor.
• High bandwidth interconnect and I/O (several Gbps).
• High bandwidth external memories.
• Excellent processing capability with the Virtex-II Pro devices.
 The sky’s the limit!
• Target rate of 600 Mbps per device appears to be a realistic goal.
• Some improvements are left to be made to the clock rate and pipelining
of the design.