PowerPoint Presentation - National Tsing Hua University


Towards Efficient Wavefront Parallel Encoding of HEVC: Parallelism Analysis and Improvement
Keji Chen, Yizhou Duan, Jun Sun, Zongming Guo
2014 IEEE 16th International Workshop on Multimedia Signal Processing (MMSP)
Outline





Introduction
Parallelism Evaluation of HEVC Encoding
Proposed Method
Experimental Results
Conclusion
Introduction


The large increase in computational complexity introduced by the enhanced coding tools makes HEVC difficult to deploy in practical applications.
Exploiting the parallelism among the encoding tasks can significantly improve the encoding speed.
Introduction


Compared with slices, Wavefront Parallel Processing (WPP) achieves similar parallelism with a smaller loss of coding efficiency.
In [11], Chi et al. proposed an Overlapped WaveFront (OWF) method based on WPP.
• [11] C. C. Chi, M. Alvarez-Mesa, B. Juurlink, G. Clare, F. Henry, S. Pateux, and T. Schierl, "Parallel Scalability and Efficiency of HEVC Parallelization Approaches," IEEE Trans. Circuits Syst. Video Technol., vol. 22, pp. 1827-1838, Dec. 2012.
Parallelism Evaluation of HEVC Encoding(1/3)

$T_{i,j,k}$: Self Encoding Complexity (SEC) of $C_{i,j,k}$.



SEC can be evaluated by the encoding time.
It is determined by the frame content and the RDO design, and does not change with the parallel method.
$ET_F(C_{i,j,k})$: Required Encoding Complexity (REC) to encode $C_{i,j,k}$ using parallel method $F$.

REC can be regarded as the earliest ending time of $C_{i,j,k}$.
It is affected by the data dependence.
Parallelism Evaluation of HEVC Encoding(2/3)


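$$ET_F(C_{i,j,k}) = T_{i,j,k} + \max\{ET_F(C_{i_1,j_1,k_1}) \mid C_{i_1,j_1,k_1} \in DEP_{F,\mathrm{intra}}(C_{i,j,k}) \cup DEP_{F,\mathrm{inter}}(C_{i,j,k})\} \tag{1}$$
$$\mathrm{Parallelism}_F = \frac{\sum_{i,j,k} T_{i,j,k}}{\max_{i,j,k}\{ET_F(C_{i,j,k})\}} \tag{2}$$
• $i, j, k$: order of frame, line, and CTU.
• $DEP_{F,\mathrm{intra}}(C_{i,j,k})$, $DEP_{F,\mathrm{inter}}(C_{i,j,k})$: CTBs in the same frame and in other frames, respectively, that $C_{i,j,k}$ depends on when using parallel encoding method $F$.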
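Equations (1) and (2) amount to a longest-path (critical-path) computation over the CTB dependence graph. The sketch below evaluates them for an arbitrary method F. It assumes that CTB ids are hashable tuples, that the max term in (1) is 0 for a CTB with no dependences, and that the numerator of (2) is the total SEC of all CTBs; the function name `parallelism` is hypothetical.

```python
from functools import lru_cache
from typing import Callable, Dict, Hashable, Iterable


def parallelism(sec: Dict[Hashable, float],
                deps: Callable[[Hashable], Iterable[Hashable]]) -> float:
    """Evaluate Eq. (1) and Eq. (2) for one parallel method F (sketch).

    sec  -- SEC T of every CTB, keyed by a CTB id such as (i, j, k)
    deps -- returns DEP_{F,intra}(C) | DEP_{F,inter}(C) for a CTB id C
    """
    @lru_cache(maxsize=None)
    def et(c: Hashable) -> float:
        # Eq. (1): own SEC plus the latest REC among the CTBs that C depends on.
        # Assumption: with no dependences the max term is taken as 0.
        return sec[c] + max((et(d) for d in deps(c)), default=0.0)

    # Eq. (2): total SEC divided by the largest REC (the critical path length).
    return sum(sec.values()) / max(et(c) for c in sec)


# Usage: three CTBs of one line with SEC 1, 2 and 3.
sec = {(0, 0, 0): 1.0, (0, 0, 1): 2.0, (0, 0, 2): 3.0}


def sequential(c):
    # Each CTB waits for its left neighbour.
    i, j, k = c
    return [(i, j, k - 1)] if k > 0 else []


print(parallelism(sec, sequential))    # 1.0  (6 / 6)
print(parallelism(sec, lambda c: []))  # 2.0  (6 / 3)
```

Memoizing `et` makes each CTB's REC computed once, so the cost is linear in the number of dependence edges.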
Parallelism Evaluation of HEVC Encoding(3/3)


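From (1) and (2), the parallelism of different parallel methods can be compared: if, for every CTB $C_{i,j,k}$,
$$DEP_{F_1,\mathrm{intra}}(C_{i,j,k}) \cup DEP_{F_1,\mathrm{inter}}(C_{i,j,k}) \subseteq DEP_{F_2,\mathrm{intra}}(C_{i,j,k}) \cup DEP_{F_2,\mathrm{inter}}(C_{i,j,k}) \tag{3}$$
then
$$\mathrm{Parallelism}_{F_1} \ge \mathrm{Parallelism}_{F_2} \tag{4}$$
This criterion follows directly from (1) and (2): a maximum taken over a smaller dependence set cannot be larger, so, proceeding in coding order, $ET_{F_1}(C_{i,j,k}) \le ET_{F_2}(C_{i,j,k})$ for every CTB, while the numerator of (2) does not depend on the method.
Put simply, the less dependence in HEVC encoding, the higher the parallelism that can be obtained.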
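The criterion can be checked numerically. In this sketch, F1 lets each CTB of a single frame depend on its left, top-left, top and top-right neighbours, and F2 additionally depends on the previous CTB in raster order, so F1's dependence set is a subset of F2's for every CTB, as (3) requires. The frame size, the random SEC values and all function names are illustrative assumptions.

```python
import random
from functools import lru_cache

H, W = 4, 6  # illustrative frame size: CTB lines and CTBs per line


def in_frame(cands):
    return [(a, b) for a, b in cands if 0 <= a < H and 0 <= b < W]


def f1_deps(c):
    # F1: top-left, top, top-right and left neighbours (WPP-style)
    j, k = c
    return in_frame([(j - 1, k - 1), (j - 1, k), (j - 1, k + 1), (j, k - 1)])


def f2_deps(c):
    # F2: F1's dependences plus the previous CTB in raster order,
    # so DEP_F1(C) is a subset of DEP_F2(C) for every C
    j, k = c
    prev = [(j, k - 1)] if k > 0 else ([(j - 1, W - 1)] if j > 0 else [])
    return sorted(set(f1_deps(c)) | set(prev))


def parallelism(sec, deps):
    # Eq. (1) memoized per CTB, then Eq. (2)
    @lru_cache(maxsize=None)
    def et(c):
        return sec[c] + max((et(d) for d in deps(c)), default=0.0)
    return sum(sec.values()) / max(et(c) for c in sec)


for seed in range(5):
    rng = random.Random(seed)
    sec = {(j, k): rng.uniform(0.5, 2.0) for j in range(H) for k in range(W)}
    p1, p2 = parallelism(sec, f1_deps), parallelism(sec, f2_deps)
    assert p1 >= p2  # (3) implies (4)
    print(f"seed {seed}: Parallelism_F1 = {p1:.2f} >= Parallelism_F2 = {p2:.2f}")
```

Since F2 chains every CTB in raster order, it is fully sequential and its parallelism is 1, while F1 keeps the wavefront overlap between lines.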
Data Dependence Analysis of WPP and OWF Methods(1/4)

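For intra:
$$DEP_{WPP,\mathrm{intra}}(C_{i,j,k}) = DEP_{OWF,\mathrm{intra}}(C_{i,j,k}) = \{C_{i,j-1,k-1},\ C_{i,j-1,k},\ C_{i,j-1,k+1},\ C_{i,j,k-1}\} \tag{5}$$
i.e., each CTB depends on its top-left, top, top-right, and left neighbours in the same frame.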
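Under (5), a CTB line can only advance while the line above it stays two CTBs ahead, which produces the familiar wavefront. The sketch below evaluates (1) for a single frame with every SEC set to 1; the function names and the 3×5 frame are illustrative. With unit SEC the REC of CTB (j, k) is k + 2j + 1, so for a frame of H lines and W ≥ 2 CTBs per line the parallelism of (2) is HW / (W + 2H - 2).

```python
from functools import lru_cache


def wpp_intra_deps(j, k, height, width):
    """Eq. (5) inside one frame: top-left, top, top-right and left neighbours."""
    cands = [(j - 1, k - 1), (j - 1, k), (j - 1, k + 1), (j, k - 1)]
    return [(a, b) for a, b in cands if 0 <= a < height and 0 <= b < width]


def wpp_finish_times(sec):
    """REC of every CTB of one frame under WPP via Eq. (1); sec[j][k] is the SEC."""
    height, width = len(sec), len(sec[0])

    @lru_cache(maxsize=None)
    def et(j, k):
        return sec[j][k] + max(
            (et(a, b) for a, b in wpp_intra_deps(j, k, height, width)), default=0)

    return [[et(j, k) for k in range(width)] for j in range(height)]


unit = [[1] * 5 for _ in range(3)]  # 3 CTB lines x 5 CTBs, SEC = 1 each
for row in wpp_finish_times(unit):
    print(row)
# [1, 2, 3, 4, 5]
# [3, 4, 5, 6, 7]
# [5, 6, 7, 8, 9]
```

Here the largest REC is 9 against a total SEC of 15, so this small frame reaches a parallelism of 15 / 9.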
Data Dependence Analysis of WPP and OWF Methods(2/4)



The SEC differs significantly from CTB to CTB.
The variance of the SEC in inter frames is much greater than in intra frames.
For a given encoding algorithm, this unbalanced SEC is fixed, and it therefore becomes the bottleneck of intra-frame parallelism.
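The effect of unbalanced SEC can be illustrated with the intra dependence of (5). The sketch below keeps the total SEC of a 3×4 frame at 12 but moves work onto the first CTB; the frame size and SEC values are made-up assumptions, not measurements, and `wpp_parallelism` is a hypothetical helper. The longer critical path lowers the parallelism of (2) from 1.5 to 12 / 9.

```python
from functools import lru_cache


def wpp_parallelism(sec):
    """Eq. (1)-(2) for one frame under the intra dependence of Eq. (5)."""
    height, width = len(sec), len(sec[0])

    @lru_cache(maxsize=None)
    def et(j, k):
        cands = [(j - 1, k - 1), (j - 1, k), (j - 1, k + 1), (j, k - 1)]
        return sec[j][k] + max((et(a, b) for a, b in cands
                                if 0 <= a < height and 0 <= b < width), default=0.0)

    total = sum(sum(row) for row in sec)
    return total / max(et(j, k) for j in range(height) for k in range(width))


balanced = [[1.0] * 4 for _ in range(3)]   # 3 x 4 CTBs, total SEC = 12
unbalanced = [row[:] for row in balanced]
unbalanced[0][0] = 2.0                     # one expensive CTB ...
unbalanced[0][2] = unbalanced[0][3] = 0.5  # ... two cheap ones; total still 12
print(wpp_parallelism(balanced))    # 1.5  (12 / 8)
print(wpp_parallelism(unbalanced))  # about 1.33  (12 / 9)
```

The expensive CTB sits at the start of every dependence chain, so its extra cost delays the whole frame, while the savings on the cheap CTBs fall off the critical path.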
Data Dependence Analysis of WPP and OWF Methods(3/4)
Data Dependence Analysis of WPP and OWF Methods(4/4)

For inter:
$$DEP_{WPP,\mathrm{inter}}(C_{i,j,k}) = \{C_{i_1,j_1,k_1} \mid i_1 < i\} \tag{6}$$
$$DEP_{OWF,\mathrm{inter}}(C_{i,j,k}) = \{C_{i-1,\,j+L_{OWF},\,W-1} \mid i > 0\} \tag{7}$$
• $i, j, k$: order of frame, line, and CTU.
• $W$: the width of a frame measured in CTBs.
• $L_{OWF}$: a positive integer parameter denoting the safe range.
• In [11], $L_{OWF}$ is set to roughly one quarter of the frame height measured in CTBs, rounded up.
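The difference between (6) and (7) can be seen by evaluating (1) and (2) over several frames. This sketch assumes unit SEC for every CTB and three 4×4-CTB frames coded in sequence. It also assumes that row j + L_OWF is clamped to the last row when it falls outside the frame, a case (7) does not spell out. The helper names are hypothetical.

```python
from functools import lru_cache


def multi_frame_parallelism(n_frames, height, width, inter_deps):
    """Eq. (1)-(2) over several frames, with SEC = 1 for every CTB (sketch).

    The intra dependence is that of Eq. (5); inter_deps(i, j, k) returns the
    inter-frame dependence set of C_{i,j,k} for the method being evaluated.
    """
    @lru_cache(maxsize=None)
    def et(i, j, k):
        intra = [(i, a, b)
                 for a, b in [(j - 1, k - 1), (j - 1, k), (j - 1, k + 1), (j, k - 1)]
                 if 0 <= a < height and 0 <= b < width]
        return 1 + max((et(*c) for c in intra + inter_deps(i, j, k)), default=0)

    ctbs = [(i, j, k) for i in range(n_frames)
            for j in range(height) for k in range(width)]
    return len(ctbs) / max(et(*c) for c in ctbs)


N, H, W = 3, 4, 4    # three 4x4-CTB frames (illustrative)
L_OWF = -(-H // 4)   # ceil(H / 4) = 1, following the rule quoted from [11]


def wpp_inter(i, j, k):
    # Eq. (6): every CTB of every earlier frame
    return [(i1, a, b) for i1 in range(i) for a in range(H) for b in range(W)]


def owf_inter(i, j, k):
    # Eq. (7); assumption: the row index is clamped to the last row of the frame
    return [(i - 1, min(j + L_OWF, H - 1), W - 1)] if i > 0 else []


print(multi_frame_parallelism(N, H, W, wpp_inter))  # 1.6  (48 / 30)
print(multi_frame_parallelism(N, H, W, owf_inter))  # about 2.18  (48 / 22)
```

Under WPP each frame starts only after the previous one is complete, whereas under OWF a frame can start as soon as the safe range of the previous frame has been encoded.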
Proposed Method(1/5)

To best exploit the inter-frame parallelism, we
designed a new Inter-frame Wavefront (IFW)
coding order.
Proposed Method(2/5)

For intra:
$$DEP_{IFW,\mathrm{intra}}(C_{i,j,k}) = DEP_{WPP,\mathrm{intra}}(C_{i,j,k}) \tag{8}$$
For inter:
$$DEP_{IFW,\mathrm{inter}}(C_{i,j,k}) = \{C_{i_1,\,j+L_{IFW},\,W-1} \mid \text{frame } i_1 \text{ is referenced by frame } i\} \tag{9}$$
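IFW replaces the dependence on the previous frame in coding order with a dependence on the frames that are actually referenced. The sketch below compares (7) and (9) on a hypothetical four-frame group in coding order: frame 0 is intra, frame 1 references frame 0, and frames 2 and 3 are non-reference B-frames that both reference frames 0 and 1. It assumes unit SEC, L_IFW = L_OWF = 1, and that row j + L is clamped to the last row of the frame; the reference structure and helper names are illustrative.

```python
from functools import lru_cache


def multi_frame_parallelism(n_frames, height, width, inter_deps):
    """Eq. (1)-(2) over several frames, SEC = 1 per CTB, intra dependence of Eq. (5)."""
    @lru_cache(maxsize=None)
    def et(i, j, k):
        intra = [(i, a, b)
                 for a, b in [(j - 1, k - 1), (j - 1, k), (j - 1, k + 1), (j, k - 1)]
                 if 0 <= a < height and 0 <= b < width]
        return 1 + max((et(*c) for c in intra + inter_deps(i, j, k)), default=0)

    ctbs = [(i, j, k) for i in range(n_frames)
            for j in range(height) for k in range(width)]
    return len(ctbs) / max(et(*c) for c in ctbs)


H, W, L = 4, 4, 1  # frame size in CTBs; L_IFW = L_OWF = L (illustrative)
# Hypothetical coding-order reference structure: frame 0 is intra, frame 1
# references frame 0, frames 2 and 3 are non-reference B-frames using 0 and 1.
REFS = {0: [], 1: [0], 2: [0, 1], 3: [0, 1]}


def owf_inter(i, j, k):
    # Eq. (7): always wait for the previous frame in coding order
    return [(i - 1, min(j + L, H - 1), W - 1)] if i > 0 else []


def ifw_inter(i, j, k):
    # Eq. (9): wait only for the frames that frame i references
    return [(i1, min(j + L, H - 1), W - 1) for i1 in REFS[i]]


print(multi_frame_parallelism(4, H, W, owf_inter))  # about 2.29  (64 / 28)
print(multi_frame_parallelism(4, H, W, ifw_inter))  # about 2.91  (64 / 22)
```

Because frames 2 and 3 no longer wait for each other, the critical path in this toy example shrinks from 28 to 22 time units.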
Proposed Method(3/5)


A Frame Thread (FT) is assigned to each frame to exploit inter-frame parallelism.
Wavefront Threads (WTs) are assigned to each frame to exploit intra-frame parallelism.
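The wavefront side of this thread structure can be pictured with ordinary threads. The sketch below uses a hypothetical helper `encode_frame_wavefront` that runs one Python thread per CTB line of a single frame and enforces (5) with `threading.Event` objects. The one-thread-per-line mapping is an assumption, frame threads and inter-frame waits are not modelled, and Python threads show the synchronization only, not real speedup.

```python
import threading


def encode_frame_wavefront(height, width, encode_ctb):
    """Encode one frame with one wavefront thread per CTB line (hypothetical helper).

    CTB (j, k) waits until CTB (j-1, min(k+1, width-1)) is finished. Because
    each line is encoded left to right, this single wait also covers the
    top-left, top and left neighbours of Eq. (5).
    """
    done = [[threading.Event() for _ in range(width)] for _ in range(height)]

    def line_worker(j):
        for k in range(width):
            if j > 0:
                done[j - 1][min(k + 1, width - 1)].wait()
            encode_ctb(j, k)   # stand-in for the real CTB encoding
            done[j][k].set()   # signal the line below

    threads = [threading.Thread(target=line_worker, args=(j,)) for j in range(height)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


# Usage: record the order in which CTBs are encoded.
order, lock = [], threading.Lock()


def fake_encode(j, k):
    with lock:
        order.append((j, k))


encode_frame_wavefront(3, 5, fake_encode)
pos = {c: n for n, c in enumerate(order)}
print(order)  # the exact interleaving depends on thread scheduling
```

Whatever the interleaving, every CTB appears after its top-left, top, top-right and left neighbours.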
Proposed Method(4/5)

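If $L_{IFW}$ is no greater than $L_{OWF}$, then for any $i, j, k$ we can deduce that:
$$DEP_{IFW,\mathrm{inter}}(C_{i,j,k}) \subseteq DEP_{OWF,\mathrm{inter}}(C_{i,j,k}) \subseteq DEP_{WPP,\mathrm{inter}}(C_{i,j,k}) \tag{12}$$
Since the three methods share the same intra dependence ((5) and (8)), applying the criterion of (3) and (4) to (12) gives:
$$\mathrm{Parallelism}_{IFW} \ge \mathrm{Parallelism}_{OWF} \ge \mathrm{Parallelism}_{WPP} \tag{13}$$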
Proposed Method(5/5)


The results also confirm that the unbalanced SEC is a bottleneck for intra-frame parallelism.
The parallelism of IFW increases significantly as the number of B-frames increases, because the reduced inter-frame dependence contributes much more to the overall parallelism.
Experimental Results


The experiments follow the common test conditions and software reference configurations [12].
The hardware platform is a shared-memory system with two AMD Opteron 6272 processors.
Experimental Results(2/)
Experimental Results

Frame Thread = 9, Wavefront Thread = 8
x265
Conclusion


A parallelism evaluation criterion and an IFW method are proposed to improve the encoding speed of HEVC.
The IFW method achieves significant speedup on various sequences, making it a promising technology for large-scale HEVC video applications.