Parallel Scalability and Efficiency of HEVC

Parallel Scalability and Efficiency of
HEVC Parallelization Approaches
Chi Ching Chi, Mauricio Alvarez-Mesa, Ben
Juurlink, Gordon Clare, Félix Henry, Stéphane
Pateux and Thomas Schierl
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS
FOR VIDEO TECHNOLOGY
Outline
• Introduction
• Video codec parallelization approaches
• Coding efficiency analysis
• Experimental evaluation
• Conclusions
Introduction
• While a single-core processor can decode a
1080p H.264/AVC video in real time, it is very
unlikely that single-core performance will be sufficient
to decode a 2160p50 HEVC video in real time.
• To obtain real-time HEVC decoding
performance, parallelism is no longer an
option but a necessity.
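A back-of-the-envelope calculation shows the size of the gap. The sketch below compares raw luma sample throughput, assuming 1080p at 50 fps as the H.264/AVC baseline (the slide does not state that frame rate) and ignoring any difference in per-pixel decoding cost between HEVC and H.264/AVC. The helper name `pixel_rate` is hypothetical.

```python
def pixel_rate(width: int, height: int, fps: int) -> int:
    """Luma samples per second a decoder must reconstruct (hypothetical helper)."""
    return width * height * fps


rate_1080p50 = pixel_rate(1920, 1080, 50)  # 103,680,000 samples/s
rate_2160p50 = pixel_rate(3840, 2160, 50)  # 414,720,000 samples/s

# Work ratio from resolution alone; codec complexity is not modelled.
ratio = rate_2160p50 / rate_1080p50
# Time available to decode one frame at 50 fps, in milliseconds.
frame_budget_ms = 1000 / 50

print(ratio, frame_budget_ms)  # 4.0 20.0
```

Even before accounting for codec complexity, a 2160p50 stream needs four times the sample throughput within the same 20 ms per-frame budget.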
Introduction
• H.264/AVC supports slice parallelization.
• However, slice parallelism may not achieve real-time
performance if the decoder receives a video with only
one or a few slices per frame.
• The HEVC draft currently includes two main
parallelization approaches: Tiles and Wavefront
Parallel Processing (WPP).
• This paper proposes an approach called
Overlapped Wavefront (OWF).
Previous parallelization strategies
• Frame-level parallelism
• Slice-level parallelism
• Macroblock-level parallelism
Frame-level parallelism
• Frame-level parallelism consists of processing
multiple frames at the same time.
• Frame-level parallelism is sufficient for
multicore systems with just a few cores.
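• If motion vectors are long, for example due to fast
motion, there is little parallelism, because a frame can
only start once the area it references in its reference
frames has been decoded.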
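To see why frame-level parallelism only feeds a few cores, the sketch below models each frame as depending on its complete reference frames (the worst case that long motion vectors approach) and assigns every frame the earliest "wave" in which it could be decoded. The GOP structure, frame names and helper functions are hypothetical illustrations, not taken from the paper.

```python
from collections import defaultdict
from typing import Dict, List


def decode_waves(refs: Dict[str, List[str]], order: List[str]) -> Dict[str, int]:
    """Earliest wave per frame, assuming a frame may start only after all of
    its reference frames are fully decoded (hypothetical helper)."""
    wave: Dict[str, int] = {}
    for frame in order:  # decode order: references always come first
        wave[frame] = 1 + max((wave[r] for r in refs[frame]), default=-1)
    return wave


def frames_per_wave(wave: Dict[str, int]) -> Dict[int, List[str]]:
    # Frames in the same wave can be decoded concurrently.
    groups: Dict[int, List[str]] = defaultdict(list)
    for frame, w in wave.items():
        groups[w].append(frame)
    return dict(groups)


# Hypothetical hierarchical-B GOP, listed in decode order.
refs = {"I0": [], "P4": ["I0"], "B2": ["I0", "P4"], "B1": ["I0", "B2"], "B3": ["B2", "P4"]}
order = ["I0", "P4", "B2", "B1", "B3"]

waves = decode_waves(refs, order)
print(frames_per_wave(waves))  # {0: ['I0'], 1: ['P4'], 2: ['B2'], 3: ['B1', 'B3']}
```

In this example at most two frames (B1 and B3) can be decoded at the same time, so additional cores would stay idle.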
Slice-level Parallelism
• Each frame can be partitioned into one or
more slices.
• Slices in a frame are completely independent
from each other and therefore they can also
be used for parallel processing.
• It is useful when a frame contains several slices, but
offers no parallelism when there is only one slice per frame.
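A minimal sketch of slice-level parallelism, assuming each slice can be decoded independently of the others in the same frame. `decode_slice` is a hypothetical placeholder (it only returns the payload length); a real decoder would parse and reconstruct the slice. Python threads are used only to show the structure, since CPU-bound pure-Python work would not actually run in parallel under the GIL.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def decode_slice(slice_payload: bytes) -> int:
    """Hypothetical placeholder for a real slice decoder.

    Returns the payload length so the example runs end to end.
    """
    return len(slice_payload)


def decode_frame(slices: List[bytes], cores: int) -> List[int]:
    # Slices of one frame are treated as independent, so each can be
    # handed to its own worker; results come back in slice order.
    with ThreadPoolExecutor(max_workers=cores) as pool:
        return list(pool.map(decode_slice, slices))


def usable_parallelism(num_slices: int, cores: int) -> int:
    # At most one worker per slice can be busy at any time.
    return min(num_slices, cores)


print(decode_frame([b"slice-0", b"slice-1", b"slice-2"], cores=4))  # [7, 7, 7]
print(usable_parallelism(1, 12), usable_parallelism(4, 12))  # 1 4
```

With one slice per frame, `usable_parallelism` returns 1 regardless of the core count, which is exactly the limitation noted above.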
Macroblock-level Parallelism
Parallelization Strategies in HEVC
• Tiles
• Wavefront Parallel Processing (WPP)
• Overlapped Wavefront (OWF)
Tiles
• The number of tiles and the location of their
boundaries can be defined for the entire
sequence or changed from picture to picture.
• Compared to slices, Tiles have better coding
efficiency.
• The rate-distortion loss increases with the
number of tiles.
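A uniform tile grid can be sketched as follows, assuming the integer-division rule the final HEVC specification uses when `uniform_spacing_flag` is 1, 64×64 CTBs, and a hypothetical 4×3 tile layout for a 1080p picture (12 tiles, one per core in a 12-core setup). The helper names are illustrative.

```python
from typing import List


def ctbs(pixels: int, ctb_size: int = 64) -> int:
    # Number of CTBs needed to cover one picture dimension (rounded up).
    return (pixels + ctb_size - 1) // ctb_size


def uniform_tile_sizes(pic_size_in_ctbs: int, num_tiles: int) -> List[int]:
    """Size in CTBs of each tile column (or row) under uniform spacing."""
    return [
        ((i + 1) * pic_size_in_ctbs) // num_tiles - (i * pic_size_in_ctbs) // num_tiles
        for i in range(num_tiles)
    ]


cols = uniform_tile_sizes(ctbs(1920), 4)  # 30 CTB columns split into 4 tile columns
rows = uniform_tile_sizes(ctbs(1080), 3)  # 17 CTB rows split into 3 tile rows
print(cols, rows)  # [7, 8, 7, 8] [5, 6, 6]
```

Tile sizes differ by at most one CTB. Every additional tile boundary stops intra prediction, motion vector prediction and CABAC context adaptation from crossing it, which is why the rate-distortion loss grows with the number of tiles.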
Wavefront Parallel Processing (WPP)
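WPP processes each CTB row in its own thread; a row can start once the row above has finished its first two CTBs, because CABAC contexts are inherited after the second CTB of the row above and prediction uses the top-right neighbour. The sketch below computes finish times under simplifying assumptions: one thread per row, one time unit per CTB, and 64×64 CTBs for 1080p (30×17 CTBs). The function name is hypothetical.

```python
from typing import List


def wpp_finish_times(width_ctbs: int, height_ctbs: int) -> List[List[int]]:
    """Time step at which each CTB finishes, assuming unit decode time per CTB,
    one thread per CTB row, and WPP dependencies on the left CTB and the
    top-right CTB of the row above (hypothetical helper)."""
    t = [[0] * width_ctbs for _ in range(height_ctbs)]
    for r in range(height_ctbs):
        for c in range(width_ctbs):
            left = t[r][c - 1] if c > 0 else 0
            # Top-right neighbour; at the right edge fall back to the CTB above.
            up_right = t[r - 1][min(c + 1, width_ctbs - 1)] if r > 0 else 0
            t[r][c] = max(left, up_right) + 1
    return t


t = wpp_finish_times(30, 17)
last = t[-1][-1]   # time step when the picture is complete
total = 30 * 17    # time steps needed by a single thread
print(last, total, round(total / last, 2))  # 62 510 8.23
```

The 510 CTBs finish after 62 steps, an average parallelism of only about 8.2 although 17 rows exist: rows start with a two-CTB lag, so threads ramp up at the beginning of each picture and sit idle at its end. That idle tail is what OWF targets.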
Overlapped Wavefront (OWF)
• When a thread has finished a CTB row in the
current picture and no more rows are
available it can start processing the next
picture instead of waiting for the current
picture to finish.
• To support this approach, the vertical motion vector
range is constrained to ¼ of the picture height, which bounds
how far into the previous picture a CTB row can reference.
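The ¼-picture-height constraint translates into a fixed dependency between consecutive pictures. The sketch below computes, for a CTB row of the next picture, the last CTB row of the current picture that must already be decoded. It assumes 1080p with 64×64 CTBs, adds a 4-line margin below the reference block for HEVC's 8-tap luma interpolation filter, and ignores the extra lag introduced by deblocking and SAO; the function and constant names are hypothetical.

```python
def required_ref_row(row: int, pic_height: int, ctb_size: int,
                     max_mv_y: int, interp_margin: int = 4) -> int:
    """Last CTB row of the current picture that must be reconstructed before
    CTB row `row` of the next picture may start (hypothetical helper).

    Assumes vertical motion vectors never exceed `max_mv_y` luma lines and
    adds `interp_margin` lines for the 8-tap luma interpolation filter.
    In-loop filtering delays are not modelled.
    """
    bottom_line = (row + 1) * ctb_size - 1             # last luma line of this CTB row
    furthest = bottom_line + max_mv_y + interp_margin  # lowest line it may reference
    furthest = min(furthest, pic_height - 1)           # lines below the picture reuse padded edge samples
    return furthest // ctb_size


PIC_HEIGHT = 1080
CTB_SIZE = 64
MAX_MV_Y = PIC_HEIGHT // 4  # the 1/4-picture-height constraint: 270 lines

for r in (0, 8, 16):
    print(r, required_ref_row(r, PIC_HEIGHT, CTB_SIZE, MAX_MV_Y))
# 0 5
# 8 13
# 16 16
```

Under these assumptions the first CTB row of the next picture only needs rows 0 to 5 of the current picture, so it can begin while the lower rows of the current picture are still being decoded. Without a bound on vertical motion vectors, a decoder would in the worst case have to wait for the whole current picture.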
Coding efficiency analysis
Experimental evaluation
• Environment
Conclusions
• We present a detailed performance
comparison of the main approaches, namely
WPP, Tiles and OWF.
• At 12 cores, Tiles achieve on average 7% higher
performance than WPP.
• The proposed OWF achieves on average 28% higher
performance than Tiles.
• Real-time performance is achieved for 1080p50
videos, but “only” 25.4 fps for 2160p videos.