Massively LDPC Decoding on Multicore Architectures

Presented by: fakewen
Authors
• Gabriel Falcao
• Leonel Sousa
• Vitor Silva
Outline
• Introduction
• BELIEF PROPAGATION
• DATA STRUCTURES AND PARALLEL COMPUTING MODELS
• PARALLELIZING THE KERNELS EXECUTION
• EXPERIMENTAL RESULTS
Introduction
• LDPC decoding on multicore architectures
• LDPC decoders were developed on recent multicores, such as off-the-shelf general-purpose x86 processors, Graphics Processing Units (GPUs), and the CELL Broadband Engine (CELL/B.E.).
BELIEF PROPAGATION
• Belief propagation, also known as the sum-product algorithm (SPA), is an iterative algorithm for the computation of joint probabilities.
LDPC Decoding
• LDPC decoding exploits the probabilistic relationships between nodes imposed by the parity-check conditions, which allow inferring the most likely transmitted codeword.
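The parity-check conditions mentioned above can be illustrated with a minimal sketch. The matrix `H` below is a hypothetical toy example (3 checks, 6 bits) and is not taken from the slides; `syndrome` and `is_codeword` are illustrative helper names.

```python
# Hypothetical toy parity-check matrix (3 check nodes, 6 bit nodes).
# It is an assumption for illustration, not the code used in the slides.
H = [
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 1],
]


def syndrome(H, word):
    """Return one parity bit per check node (row of H), computed modulo 2."""
    return [sum(h * b for h, b in zip(row, word)) % 2 for row in H]


def is_codeword(H, word):
    """A word is a valid codeword when every parity check is satisfied."""
    return not any(syndrome(H, word))


valid = [1, 0, 0, 1, 0, 1]      # satisfies all three checks
corrupted = [1, 1, 0, 1, 0, 1]  # same word with bit 1 flipped
print(is_codeword(H, valid), syndrome(H, corrupted))
```

A nonzero syndrome tells the decoder which checks fail; the iterative decoder uses that information to infer which bits are most likely wrong.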
LDPC Decoding(cont.)
White Gaussian noise
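As a rough illustration of how white Gaussian noise enters the decoder, the sketch below assumes BPSK modulation (bit 0 sent as +1, bit 1 as -1) and a log-likelihood ratio (LLR) representation. Both are assumptions for illustration; the transcript does not show which modulation or message domain the slides use. Under these assumptions the channel LLR of a received sample y is 2y/σ².

```python
import random


def bpsk_awgn(bits, sigma, rng):
    """Map bits to +1/-1 (assumed BPSK) and add white Gaussian noise."""
    return [(1.0 - 2.0 * b) + rng.gauss(0.0, sigma) for b in bits]


def channel_llr(received, sigma):
    """LLR ln(P(bit=0 | y) / P(bit=1 | y)) for BPSK over AWGN: 2*y / sigma^2."""
    return [2.0 * y / (sigma ** 2) for y in received]


def hard_decision(llrs):
    """A non-negative LLR favours bit 0, a negative LLR favours bit 1."""
    return [0 if llr >= 0 else 1 for llr in llrs]


rng = random.Random(0)            # fixed seed so the sketch is repeatable
sent = [0, 1, 1, 0, 1, 0]
received = bpsk_awgn(sent, sigma=0.8, rng=rng)
print(hard_decision(channel_llr(received, sigma=0.8)))
```

These channel values are what an iterative decoder would start from before exchanging messages between nodes.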
LDPC Decoding(cont.)
DATA STRUCTURES AND PARALLEL COMPUTING MODELS
• Compact data structures to represent the H matrix
Data Structures
• Separately code the information about H in two independent data streams
Reminder
• r_mn: message sent from check node CN_m to bit node BN_n
• q_nm: message sent from bit node BN_n to check node CN_m
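One way to picture a compact representation of H is as two adjacency lists, one per message direction. The sketch below is a guess at the general idea, not the slides' actual layout; the names `build_streams`, `cn_to_bn`, and `bn_to_cn` are hypothetical.

```python
def build_streams(H):
    """Return two edge lists for parity-check matrix H.

    cn_to_bn[m] lists the bit nodes n connected to check node m
    (the edges along which r_mn messages travel).
    bn_to_cn[n] lists the check nodes m connected to bit node n
    (the edges along which q_nm messages travel).
    """
    n_checks, n_bits = len(H), len(H[0])
    cn_to_bn = [[n for n in range(n_bits) if H[m][n]] for m in range(n_checks)]
    bn_to_cn = [[m for m in range(n_checks) if H[m][n]] for n in range(n_bits)]
    return cn_to_bn, bn_to_cn


# Hypothetical toy matrix, used only to exercise the function.
H_toy = [
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 1],
]
cn_to_bn, bn_to_cn = build_streams(H_toy)
print(cn_to_bn)
print(bn_to_cn)
```

Storing only the nonzero positions keeps memory proportional to the number of edges rather than to the full size of H, which matters because LDPC matrices are sparse.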
Parallel Computational Models
• Parallel Features of the General-Purpose Multicores
• Parallel Features of the GPU
• Parallel Features of the CELL/B.E.
Parallel Features of the General-Purpose Multicores
• OpenMP directive: #pragma omp parallel for
Parallel Features of the GPU
Throughput
Parallel Features of the CELL/B.E.
Throughput
PARALLELIZING THE KERNELS EXECUTION
• The Multicores Using OpenMP
• The GPU Using CUDA
• The CELL/B.E.
The Multicores Using OpenMP
The GPU Using CUDA
• Programming the Grid Using a Thread per Node Approach
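The thread-per-node idea can be emulated sequentially to show what each thread would compute. The sketch below is not CUDA and not the slides' kernel: it assumes LLR-domain messages and the standard tanh rule for the check-node update, and `check_node_kernel` and `launch` are hypothetical names. Messages are keyed as `r[(m, n)]` (CN_m to BN_n) and `q[(n, m)]` (BN_n to CN_m).

```python
import math


def check_node_kernel(m, cn_to_bn, q, r):
    """Work of one hypothetical thread: update every r[(m, n)] of check node m."""
    for n in cn_to_bn[m]:
        prod = 1.0
        for other in cn_to_bn[m]:
            if other != n:  # exclude the destination bit node itself
                prod *= math.tanh(q[(other, m)] / 2.0)
        # Clamp so that atanh stays finite for very confident inputs.
        prod = max(min(prod, 0.999999), -0.999999)
        r[(m, n)] = 2.0 * math.atanh(prod)


def launch(kernel, n_threads, *args):
    """Sequential stand-in for a grid launch: one kernel call per thread index."""
    for tid in range(n_threads):
        kernel(tid, *args)


# One check node connected to three bit nodes (toy example).
cn_to_bn = [[0, 1, 2]]
q = {(0, 0): 2.0, (1, 0): 2.0, (2, 0): -2.0}
r = {}
launch(check_node_kernel, len(cn_to_bn), cn_to_bn, q, r)
print(r)
```

Because each call writes only the messages of its own check node, the calls are independent of one another, which is what makes a one-thread-per-node mapping possible.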
The GPU Using CUDA(cont.)
• Coalesced Memory Accesses
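On CUDA devices, global memory accesses can be coalesced when neighbouring threads touch neighbouring addresses. The sketch below is a simplified index model, not the memory layout used in the slides: it compares an array-of-structures layout with a structure-of-arrays layout and checks which one gives consecutive threads consecutive indices. All function names are hypothetical.

```python
def aos_index(tid, field, n_fields):
    """Array-of-structures: each thread's fields are stored next to each other."""
    return tid * n_fields + field


def soa_index(tid, field, n_threads):
    """Structure-of-arrays: the same field of all threads is stored together."""
    return field * n_threads + tid


def is_contiguous(indices):
    """True when consecutive threads access consecutive element indices."""
    return all(b - a == 1 for a, b in zip(indices, indices[1:]))


n_threads, n_fields = 16, 4
aos = [aos_index(t, 0, n_fields) for t in range(n_threads)]
soa = [soa_index(t, 0, n_threads) for t in range(n_threads)]
print(is_contiguous(aos), is_contiguous(soa))
```

In this model the structure-of-arrays layout yields stride-1 accesses across threads, which is the access pattern that coalescing rewards.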
The CELL/B.E.
• Small Single-SPE Model(A B C)
• Large Single-SPE Model
Why the Single-SPE Model?
• In the single-SPE model, the number of communications between the PPE and the SPEs is minimal, and the PPE is relieved of the costly task of reorganizing data (the sorting procedure in Algorithm 4) between data transfers to the SPE.
EXPERIMENTAL RESULTS
• LDPC Decoding on the General-Purpose x86 Multicores Using OpenMP
• LDPC Decoding on the CELL/B.E.
– Small Single-SPE Model
– Large Single-SPE Model
• LDPC Decoding on the GPU Using CUDA
LDPC Decoding on the General-Purpose x86 Multicores Using OpenMP
LDPC Decoding on the CELL/B.E.
LDPC Decoding on the CELL/B.E.(cont.)
LDPC Decoding on the CELL/B.E.(cont.)
LDPC Decoding on the GPU Using CUDA
The end
Thank you~