Transcript 4C8
4C8 Dr. David Corrigan
JPEG and the DCT

2D DCT

DCT Basis Functions 𝜇𝑘𝑙 (𝑥, 𝑦)
[Figure: DCT basis functions]

2D DCT
Qstep = 15
Each band is the same size and there are 64 bands in total, so the entropy is
$H = \frac{\sum \text{band entropies}}{64} = 1.36$ bits/pixel

Optimum Block Size is 8!

Slow DCT
• Sledgehammer implementation for 8 point DCT
• Each row multiply requires 8 MADDs (approx)
• So all 8 rows require 64 MADDs (approx)

Fast DCT
• Exploit symmetry

Fast DCT
• So split matrix T into two parts...
[Matrix figure: T applied to the inputs in their natural order y(1), y(2), ..., y(8)]

Fast DCT
• Split matrix T into two parts and change the order of y...
[Matrix figure: two sub-matrices acting on the inputs paired as y(1), y(8); y(2), y(7); y(3), y(6); y(4), y(5)]

Fast DCT
4 “adds” and 16 MADDs for each operation = 8 adds and 32 MADDs = 40 ops
Compare with 64 MADDs from before.

Fast DCT
This sub-matrix can be simplified with symmetry again!
4 “adds” and 8 MADDs in total = 12 ops (down from 20)
So now we are at 20 (for the first sub-matrix) + 12 (for these two) = 32 ops
So we have saved about a factor of 2!

JPEG and Colour Images
• JPEG uses the YCbCr colourspace.
• The chrominance channels are usually downsampled.
• There are 3 commonly used modes
– 4:4:4 – no chrominance subsampling
– 4:2:2 – every 2nd column in the chrominance channels is dropped.
– 4:2:0 – every 2nd column and every 2nd row is dropped.
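The three chrominance modes listed above can be sketched in a few lines. This is only an illustration of "dropping" samples as the slide describes it: the function name `subsample_chroma` and the list-of-rows layout are assumptions made for the example, and a real encoder may filter the chrominance before discarding samples.

```python
def subsample_chroma(plane, mode):
    """Drop chrominance samples according to the named mode.

    plane: a Cb or Cr channel as a list of rows (hypothetical layout).
    mode:  "4:4:4", "4:2:2" or "4:2:0".
    """
    if mode == "4:4:4":
        # No chrominance subsampling: keep every sample.
        return [row[:] for row in plane]
    if mode == "4:2:2":
        # Keep every row, drop every 2nd column.
        return [row[::2] for row in plane]
    if mode == "4:2:0":
        # Drop every 2nd row and every 2nd column.
        return [row[::2] for row in plane[::2]]
    raise ValueError("unknown mode: " + mode)


# A small 4x4 test channel whose values encode (row, column).
cb = [[10 * r + c for c in range(4)] for r in range(4)]
for mode in ("4:4:4", "4:2:2", "4:2:0"):
    out = subsample_chroma(cb, mode)
    print(mode, len(out), "rows x", len(out[0]), "columns")
```

In 4:2:0 each chrominance channel keeps a quarter of its samples, so the two chrominance channels together hold half as many samples as the luminance channel.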
Subjectively Weighted Quantisation
• In JPEG it is standard to apply different thresholds to different bands

Qlum
16  11  10  16  24  40  51  61
12  12  14  19  26  58  60  55
14  13  16  24  40  57  69  56
14  17  22  29  51  87  80  62
18  22  37  56  68 109 103  77
24  35  55  64  81 104 113  92
49  64  78  87 103 121 120 101
72  92  95  98 112 100 103  99

Qchr
17  18  24  47  99  99  99  99
18  21  26  66  99  99  99  99
24  26  56  99  99  99  99  99
47  66  99  99  99  99  99  99
99  99  99  99  99  99  99  99
99  99  99  99  99  99  99  99
99  99  99  99  99  99  99  99
99  99  99  99  99  99  99  99

Subjectively Weighted Quantisation
• These values are obtained by perceptual tests.
• A user is asked to view an image of a particular size at a specified distance from the screen.
– Usually a multiple of the screen height.
• The user is presented with an image and is asked to increase the gain of a given band until he/she just notices a difference in the image.
  $I_{vis}(x, y) = I_{orig}(x, y) + \gamma_{kl}\,\mu_{kl}(x, y)$
– Note: typically a flat grey image is used to avoid masking effects caused by edges and texture
• The set of 𝛾𝑘𝑙 form the quantisation matrix.

Subjectively Weighted Quantisation
(Qlum and Qchr as shown above)
• Lower frequency bands are assigned lower step sizes.
• There is a slight drop-off in step size from the DC coefficient to the low frequency coefficients.
• The step sizes for the chrominance channels increase faster than for luminance.
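To make the effect of per-band step sizes concrete, here is a minimal sketch of quantising one 8x8 block of DCT coefficients with the luminance table. `QLUM` holds the Qlum step sizes row by row; the helper names `quantise` and `dequantise`, the `scale` parameter (for scaled versions of the table) and the flat test block are assumptions made for illustration, not part of the lecture.

```python
import math

QLUM = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]


def quantise(coeffs, q, scale=1.0):
    """Divide each coefficient by its band's step size and round half up."""
    return [[math.floor(c / (scale * step) + 0.5) for c, step in zip(crow, qrow)]
            for crow, qrow in zip(coeffs, q)]


def dequantise(levels, q, scale=1.0):
    """Multiply each quantised level by its band's step size."""
    return [[lv * scale * step for lv, step in zip(lrow, qrow)]
            for lrow, qrow in zip(levels, q)]


# Hypothetical block in which every band has the same amplitude of 20.
block = [[20.0] * 8 for _ in range(8)]
levels = quantise(block, QLUM)
for row in levels:
    print(row)
```

With equal amplitude in every band, the low frequency bands (small step sizes, top left) keep non-zero levels while the high frequency bands (large step sizes, bottom right) are quantised to zero.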
We have seen this before

Comparing Different Quantisations
[Image comparison: uncompressed image vs JPEG with Qstep = Qlum, PSNR = 32.9 dB]

Comparing Different Quantisations
[Image comparison: uncompressed image vs JPEG with Qstep = 2 * Qlum, PSNR = 30.6 dB]

Comparing Different Quantisations
[Image comparison: Qstep = 15 (PSNR = 37.6 dB) vs Qstep = Qlum]

Comparing Different Quantisations
[Image comparison: Qstep = 30 (PSNR = 33.4 dB) vs Qstep = Qlum]
PSNR indicates better quality for Qstep = 30 over Qstep = Qlum, but this clearly is not true from a subjective analysis.

Comparing Different Quantisations
Quantisation   PSNR (dB)   Subjective Ranking   Entropy (bits/pel)
15             37.6        2                    1.36
30             33.4        4                    0.82
0.5 * Qlum     35.6        1                    1.28
Qlum           32.9        3                    0.86
2 * Qlum       30.6        5                    0.55
Using the subjectively weighted quantisation achieves much higher levels of compression for equivalent levels of quality.

JPEG Coding
• The most obvious way might seem to be to code each band separately
– i.e. Huffman with RLC, like we suggested with the Haar Transform.
– We could get close to the entropy
• This is not the way it is coded because
– It would require 64 different codes. High cost in computation and storage of codebooks.
– It ignores the fact that the zero coefficients occur at the same positions in multiple bands.

JPEG Coding
• Instead we code each block separately
– A block contains 64 coefficients, one from each band.
• Each block contains 1 DC coefficient (from the top left band) and 63 AC coefficients
• Two codebooks are used in total for all the blocks, one for the DC coefficients and the other for the AC coefficients.
• At the end of each block we insert an End Of Block (EOB) symbol in the datastream

Data Ordering
• Each block is an 8x8 grid of coeffs
– A Zig-Zag scan converts them into a 1D stream.
– As most non-zero values occur in the top left corner, using a Zig-Zag scan maximises the lengths of the zero runs and so improves the efficiency of RLC

Zig-Zag Scan Example
Non-zero values are at the top left corner of the block
[Figure: typical 8x8 block of DCT coefficients]
Zig-Zag scan concentrates the non-zero coefficients at the start of the stream
-13, -3, 6, 0, 0, 2, 0, 0, 0, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 36 more zeros, the end

Coding the DC Coefficients
Differential Coding

Coding the DC Coefficients
[Figure: typical 8x8 block of DCT coefficients]
The DC value shown in the block (-13) is actually the difference between the DC coefficients of the current and previous blocks

Coding DC Coefficients
• There is potentially a large number of levels to encode.
– Up to 4096 depending on the quantisation step size.
• We break down the symbol value into a size/index pair

Coding DC Coefficients
• So if the DC value is -13
– The size is 4
– The index is 0010
• In JPEG only the size is encoded using Huffman
– The index is uncoded; efficiency is not dramatically affected.
– Only 12 codes are required in the Huffman table
– Table size is 16 + 12 = 28 bytes

More examples of coefficient to size/index pair conversions
Value   Size   Index
-7      3      000
-6      3      001
-5      3      010
-4      3      011
-3      2      00
-2      2      01
-1      1      0
 0      0      -
 1      1      1
 2      2      10
 3      2      11
 4      3      100
 5      3      101
 6      3      110
 7      3      111

Coding the AC Coefficients
[Figure: typical 8x8 block of DCT coefficients]
4 0010, -3, 6, 2 zeros, 2, 3 zeros, -1, 17 zeros, 1, 36 more zeros, the end
• "4 0010" is the size/index pair for the DC coefficient
• The length of the run and the value of the coeff after it are strongly correlated
• The block usually ends with a long run of zeros

Coding the AC Coefficients
• Run/Size Correlations
– High coeffs follow short runs and low coeffs follow long runs
• Final run of zeros
– These don’t need to be coded
– Just tell the decoder that there are no more non-zero coefficients and move on to the next block.

Symbols
Run/Coefficient Symbols
e.g. 0, 0, 9 is a run of 2 zeros followed by a 9
However we represent 9 using the size/index format from the DC coeffs
9 has a size of 4 and an index of 1001
So we code the run/size pair (2,4) and the index 1001 is appended to the stream

Symbols
• Run/Size Symbols
– All possible combinations of runs from 0 to 15 and sizes from 1 to 10
– 160 total symbols
– Huffman codes are used for each symbol
– Index values are not coded further

Special Symbols
• ZRL
– Used to represent a run of 16 zeros
– Used when the run of zeros is greater than 15
– E.g. 17 zeros followed by 14 is coded as (ZRL) (1,4) 1110
• EOB
– Inserted when a block ends with a run of zeros
In total there are 160 run/size symbols and 2 special symbols: 162 symbols to encode
The codetable is 16 + 162 = 178 bytes

Coding Example
[Figure: typical 8x8 block of DCT coefficients]
-13, -3, 6, 2 zeros, 2, 3 zeros, -1, 17 zeros, 1, 36 more zeros, the end
DC Coefficient is -13.
The size is 4 and the index is 0010
Current Stream State: 4 0010

Coding Example
-13, -3, 6, 2 zeros, 2, 3 zeros, -1, 17 zeros, 1, 36 more zeros, the end
The first AC value is -3. That is a run of 0 zeros followed by -3.
-3 has size 2 and index 00
Therefore the run/size pair is (0,2)
Current Stream State: 4 0010 (0,2) 00

Coding Example
-13, -3, 6, 2 zeros, 2, 3 zeros, -1, 17 zeros, 1, 36 more zeros, the end
The next AC value is 6. That is a run of 0 zeros followed by 6.
6 has size 3 and index 110
Therefore the run/size pair is (0,3)
Current Stream State: 4 0010 (0,2) 00 (0,3) 110

Coding Example
-13, -3, 6, 2 zeros, 2, 3 zeros, -1, 17 zeros, 1, 36 more zeros, the end
The next AC value to encode is a run of 2 zeros followed by an AC coefficient of 2.
2 has size 2 and index 10
Therefore the run/size pair is (2,2)
Current Stream State: 4 0010 (0,2) 00 (0,3) 110 (2,2) 10

Coding Example
-13, -3, 6, 2 zeros, 2, 3 zeros, -1, 17 zeros, 1, 36 more zeros, the end
The next AC value to encode is a run of 3 zeros followed by an AC coefficient of -1.
-1 has size 1 and index 0
Therefore the run/size pair is (3,1)
Current Stream State: 4 0010 (0,2) 00 (0,3) 110 (2,2) 10 (3,1) 0

Coding Example
-13, -3, 6, 2 zeros, 2, 3 zeros, -1, 17 zeros, 1, 36 more zeros, the end
The next AC value to encode is a run of 17 zeros followed by an AC coefficient of 1.
As the run is > 15 zeros we have to use the ZRL symbol to code the first 16 zeros.
The remaining run length consists of (17 - 16) = 1 zero.
An AC coefficient of 1 has size 1 and index 1
Therefore we insert the run/size pair (1,1) after the ZRL marker
Current Stream State: 4 0010 (0,2) 00 (0,3) 110 (2,2) 10 (3,1) 0 ZRL (1,1) 1

Coding Example
-13, -3, 6, 2 zeros, 2, 3 zeros, -1, 17 zeros, 1, 36 more zeros, the end
The remaining coeffs are all 0. Therefore the EOB marker is used.
If the last AC coeff is non-zero, then the EOB marker is not used.
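The steps of this coding example can be reproduced with a short sketch. It assumes the 64 coefficients are already in zig-zag order with the DC difference first. The function names `size_index` and `block_symbols` are hypothetical, and the output is the symbolic stream used on the slides, before any Huffman coding.

```python
def size_index(value):
    """Map a coefficient to its (size, index) pair."""
    if value == 0:
        return 0, ""
    size = abs(value).bit_length()
    # Positive values are written in plain binary; negative values are
    # offset by 2**size - 1, which gives the bit-inverted pattern.
    code = value if value > 0 else value + (1 << size) - 1
    return size, format(code, "0{}b".format(size))


def block_symbols(zz):
    """Turn 64 zig-zag ordered coefficients into the symbolic stream."""
    size, index = size_index(zz[0])           # zz[0] is the DC difference
    tokens = [str(size), index]
    # Position of the last non-zero AC coefficient (0 if there is none).
    last = max((i for i in range(1, 64) if zz[i] != 0), default=0)
    run = 0
    for value in zz[1:last + 1]:
        if value == 0:
            run += 1
            continue
        while run > 15:                       # each ZRL stands for 16 zeros
            tokens.append("ZRL")
            run -= 16
        size, index = size_index(value)
        tokens += ["({},{})".format(run, size), index]
        run = 0
    if last < 63:                             # trailing zeros collapse to EOB
        tokens.append("EOB")
    return " ".join(t for t in tokens if t)


zz = [-13, -3, 6, 0, 0, 2, 0, 0, 0, -1] + [0] * 17 + [1] + [0] * 36
print(block_symbols(zz))
# prints: 4 0010 (0,2) 00 (0,3) 110 (2,2) 10 (3,1) 0 ZRL (1,1) 1 EOB
```

The EOB test mirrors the last point above: if the 64th coefficient is non-zero, `last` is 63 and no EOB is appended.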
Current Stream State: 4 0010 (0,2) 00 (0,3) 110 (2,2) 10 (3,1) 0 ZRL (1,1) 1 EOB

Huffman Coding
Final Stream: 4 0010 (0,2) 00 (0,3) 110 (2,2) 10 (3,1) 0 ZRL (1,1) 1 EOB
– The size (4) is encoded using the DC codetable
– The run/size pairs, ZRL and EOB are encoded using the AC codetable
– The index bits receive no further encoding
• The best solution is to define the 2 Huffman codes for each image during compression
• However a default Huffman codetable is defined in the JPEG standard.

Default Codetables
Final Stream:         4 0010 (0,2) 00 (0,3) 110 (2,2) 10 (3,1) 0 ZRL (1,1) 1 EOB
Fully Encoded Stream: 101 0010 01 00 100 110 11111001 10 111010 0 11111111001 1100 1 1010
(The first codeword comes from the DC table, the remaining codewords from the AC table.)
54 bits to encode 64 coefficients ≈ 0.84 bits/coefficient

How good is this scheme?
Should we use default codetables?
Even though doubling the quantisation step sizes reduces the number of events, the distribution of those events doesn’t change much. Only the EOB probability changes significantly.
Therefore using the same codetable for both cases is reasonable

How good is this scheme?
Efficiency when the default codetable is used: 97.35% and 95.74%
In fact using the same codetable for multiple images doesn’t reduce the efficiency of the code much.

Special Markers
Synchronisation Markers
• There are 8 synch markers, FFD0 -> FFD7
• They can be placed at intervals which can be specified by using the DRI (FFDD) marker
• Each marker is sent sequentially, so if any marker is corrupted its absence can be easily detected.

Summary
• We have covered the basics of the JPEG standard
• The standard specifies a syntax rather than specifying exactly how it is implemented
• Most implementations use the recommended settings provided by the JPEG community.
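As a check on the Huffman coding slide, the final step from symbols to bits can be sketched as a table lookup. The dictionaries below contain only the codewords that appear in the encoded stream on the Default Codetables slide, not the full default tables; the names `DC_CODES`, `AC_CODES` and `encode` are assumptions for the example. With the index for -3 written as 00, as in the size/index table, the total comes to 54 bits.

```python
# Codewords taken from the encoded stream shown on the slide; the full
# default tables in the JPEG standard contain many more entries.
DC_CODES = {4: "101"}
AC_CODES = {
    (0, 2): "01",
    (0, 3): "100",
    (2, 2): "11111001",
    (3, 1): "111010",
    (1, 1): "1100",
    "ZRL": "11111111001",
    "EOB": "1010",
}

# (symbol, index bits) pairs for the example block; ZRL and EOB carry no index.
DC = (4, "0010")
AC = [((0, 2), "00"), ((0, 3), "110"), ((2, 2), "10"), ((3, 1), "0"),
      ("ZRL", ""), ((1, 1), "1"), ("EOB", "")]


def encode(dc, ac):
    """Replace each symbol by its Huffman codeword; index bits pass through."""
    size, index = dc
    parts = [DC_CODES[size], index]
    for symbol, index in ac:
        parts += [AC_CODES[symbol], index]
    return [p for p in parts if p]


parts = encode(DC, AC)
bits = "".join(parts)
print(" ".join(parts))
print(len(bits), "bits for 64 coefficients =", len(bits) / 64, "bits/coefficient")
```

The long ZRL codeword (11 bits) and the short EOB codeword (4 bits) reflect how often each event occurs: almost every block ends in EOB, while runs longer than 15 zeros inside a block are rare.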