Transcript 4C8

4C8
Dr. David Corrigan
JPEG and the DCT
2D DCT
DCT Basis Functions
𝜇𝑘𝑙 (𝑥, 𝑦)
2D DCT
Qstep = 15
Each band is the same size and there are 64 bands in total, so the entropy is
$$H = \frac{\sum \text{band entropies}}{64} = 1.36 \text{ bits/pixel}$$
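As a rough illustration of the averaging above, the sketch below computes a first-order entropy for each band and takes the mean. The helper names and the toy data (4 bands instead of 64) are hypothetical and only serve to show the arithmetic.

```python
import math
from collections import Counter

def entropy_bits(symbols):
    """First-order Shannon entropy, in bits per symbol, of a sequence."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def mean_band_entropy(bands):
    """Mean of the per-band entropies; valid because every band has the same size."""
    return sum(entropy_bits(band) for band in bands) / len(bands)

# Hypothetical toy data: 4 bands of 4 quantised coefficients each.
toy_bands = [
    [0, 0, 0, 0],  # a single level: 0 bits
    [0, 0, 1, 1],  # two equally likely levels: 1 bit
    [0, 1, 2, 3],  # four equally likely levels: 2 bits
    [0, 0, 0, 0],
]
print(mean_band_entropy(toy_bands))
```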
Optimum Block Size is 8!
Slow DCT
• Sledgehammer implementation for 8 point DCT
• Each row multiply requires 8 MADDs (approx)
• So for all 8 rows requires 64 MADDs (approx)
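The count above can be checked with a minimal sketch of the direct matrix multiply. It assumes the orthonormal DCT-II definition for the matrix T; the function names are illustrative and not taken from the slides.

```python
import math

def dct_matrix(n=8):
    """Orthonormal n-point DCT-II matrix; each row is one basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def slow_dct(y):
    """Direct multiply by T, counting one multiply-add (MADD) per matrix entry."""
    out, madds = [], 0
    for row in dct_matrix(len(y)):
        acc = 0.0
        for coeff, sample in zip(row, y):
            acc += coeff * sample
            madds += 1
        out.append(acc)
    return out, madds

coeffs, madds = slow_dct([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
print(madds)  # 8 rows x 8 MADDs each
```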
Fast DCT
• Exploit Symmetry
Fast DCT
• So split Matrix T into two parts...
y = [y(1), y(2), y(3), y(4), y(5), y(6), y(7), y(8)]^T
Fast DCT
• Split Matrix T into two parts and change y...
[y(1) + y(8), y(2) + y(7), y(3) + y(6), y(4) + y(5)]^T (input to the symmetric rows of T)
[y(1) - y(8), y(2) - y(7), y(3) - y(6), y(4) - y(5)]^T (input to the antisymmetric rows of T)
These two 4-element vectors replace y = [y(1), y(2), y(3), y(4), y(5), y(6), y(7), y(8)]^T.
Fast DCT
Each half of T is a 4x4 sub-matrix acting on one of the two folded vectors:
[y(1) + y(8), y(2) + y(7), y(3) + y(6), y(4) + y(5)]^T
[y(1) - y(8), y(2) - y(7), y(3) - y(6), y(4) - y(5)]^T
4 “adds” and 16 MADDs for each operation = 8 adds and 32 MADDs = 40 ops.
Compare with 64 MADDs from before.
Fast DCT
[y(1) + y(8), y(2) + y(7), y(3) + y(6), y(4) + y(5)]^T
This sub-matrix can be simplified with symmetry again!
4 “adds”, 8 MADDS in total = 12 ops (down from 20)
So now we are at 20 (for the first sub matrix) + 12 (for these two) = 32 ops
So we have saved about a factor of 2!
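The first level of the symmetry trick can be sketched as follows. This is a minimal illustration assuming the orthonormal DCT-II matrix; `fast_dct_stage1` is a hypothetical name, and only the first split (8 adds + 32 MADDs = 40 ops) is shown, not the second split that brings the total down to 32.

```python
import math

def dct_matrix(n=8):
    """Orthonormal n-point DCT-II matrix; each row is one basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def fast_dct_stage1(y):
    """Fold y into sums and differences, then use only the left half of T."""
    n, half = len(y), len(y) // 2
    t = dct_matrix(n)
    sums = [y[i] + y[n - 1 - i] for i in range(half)]    # 4 adds
    diffs = [y[i] - y[n - 1 - i] for i in range(half)]   # 4 "adds" (subtractions)
    out, madds = [0.0] * n, 0
    for k in range(n):
        # With 0-based row index k, even rows are symmetric, odd rows antisymmetric.
        folded = sums if k % 2 == 0 else diffs
        for i in range(half):
            out[k] += t[k][i] * folded[i]
            madds += 1
    return out, madds, 2 * half

y = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
coeffs, madds, adds = fast_dct_stage1(y)
direct = [sum(c * s for c, s in zip(row, y)) for row in dct_matrix(8)]
print(madds, adds)
```

Comparing `coeffs` with `direct` confirms that the folded computation gives the same coefficients as the full 8x8 multiply.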
JPEG and Colour Images
• JPEG uses the YCbCr colourspace.
• The chrominance channels are usually
downsampled.
• There are 3 commonly used modes
– 4:4:4 – no chrominance subsampling
– 4:2:2 – Every 2nd column in the chrominance
channels is dropped.
– 4:2:0 – Every 2nd column and row is dropped.
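The three modes can be sketched on a chrominance channel stored as a list of rows. This follows the slide's description literally (dropping samples); practical encoders often low-pass filter or average before discarding samples, which is not shown here. The `subsample` helper is hypothetical.

```python
def subsample(channel, mode):
    """Drop chrominance samples according to the named mode."""
    if mode == "4:4:4":
        return [row[:] for row in channel]          # no subsampling
    if mode == "4:2:2":
        return [row[::2] for row in channel]        # drop every 2nd column
    if mode == "4:2:0":
        return [row[::2] for row in channel[::2]]   # drop every 2nd column and row
    raise ValueError("unknown mode: " + mode)

# Toy 4x4 chrominance channel with distinct values for easy inspection.
cb = [[10 * r + c for c in range(4)] for r in range(4)]
print(subsample(cb, "4:2:2"))
print(subsample(cb, "4:2:0"))
```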
Subjectively Weighted Quantisation
• In JPEG it is standard to apply different thresholds to different
bands
Qlum =
 16  11  10  16  24  40  51  61
 12  12  14  19  26  58  60  55
 14  13  16  24  40  57  69  56
 14  17  22  29  51  87  80  62
 18  22  37  56  68 109 103  77
 24  35  55  64  81 104 113  92
 49  64  78  87 103 121 120 101
 72  92  95  98 112 100 103  99

Qchr =
 17  18  24  47  99  99  99  99
 18  21  26  66  99  99  99  99
 24  26  56  99  99  99  99  99
 47  66  99  99  99  99  99  99
 99  99  99  99  99  99  99  99
 99  99  99  99  99  99  99  99
 99  99  99  99  99  99  99  99
 99  99  99  99  99  99  99  99
Subjectively Weighted Quantisation
• These values are obtained by perceptual tests.
• A user is asked to view an image of a particular size at a specified
distance from the screen.
– Usually a multiple of the screen height.
• User is presented with an image and is asked to increase the gain of a
given band until he/she just notices a difference in the image.
$$I_{vis}(x, y) = I_{orig}(x, y) + \gamma_{kl}\,\mu_{kl}(x, y)$$
– Note typically a flat grey image is used to avoid masking effects caused by edges
and texture
• The set of 𝛾𝑘𝑙 form the quantisation matrix.
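To show how such a matrix is applied, here is a minimal sketch that divides each DCT coefficient by its band's step size and rounds. The `scale` parameter is a hypothetical convenience for step sizes such as 0.5 * Qlum or 2 * Qlum; Python's `round` sends exact ties to the nearest even integer, which may differ from a particular codec's rounding rule.

```python
QLUM = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]

def quantise(coeffs, q, scale=1.0):
    """Map each coefficient to an integer level using its band's step size."""
    return [[int(round(c / (scale * step))) for c, step in zip(crow, qrow)]
            for crow, qrow in zip(coeffs, q)]

def dequantise(levels, q, scale=1.0):
    """Reconstruct approximate coefficients from the integer levels."""
    return [[level * scale * step for level, step in zip(lrow, qrow)]
            for lrow, qrow in zip(levels, q)]

# Hypothetical block where every band holds the same coefficient value.
flat = [[100.0] * 8 for _ in range(8)]
levels = quantise(flat, QLUM)
print(levels[0])  # low-frequency bands keep more levels than high-frequency ones
```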
Subjectively Weighted Quantisation
[Qlum and Qchr matrices, repeated from the previous Subjectively Weighted Quantisation slide]
• Lower Frequency Bands are
assigned lower step sizes.
• There is a slight drop in step
size from the DC coefficient to
the low frequency coefficients.
• The step sizes for the
chrominance channels increase
faster than for luminance.
We have seen this before
Comparing Different Quantisations
[Figure: uncompressed image alongside JPEG with Qstep = Qlum, PSNR = 32.9 dB]
Comparing Different Quantisations
[Figure: uncompressed image alongside JPEG with Qstep = 2 * Qlum, PSNR = 30.6 dB]
Comparing Different Quantisations
[Figure: Qstep = Qlum alongside Qstep = 15, PSNR = 37.6 dB]
Comparing Different Quantisations
[Figure: Qstep = Qlum alongside Qstep = 30, PSNR = 33.4 dB]
Comparing Different Quantisations
[Figure: Qstep = Qlum alongside Qstep = 30, PSNR = 33.4 dB]
PSNR indicates better quality for Qstep = 30 over Qstep = Qlum,
but this clearly is not true from a subjective analysis.
Comparing Different Quantisations
Quantisation   PSNR (dB)   Subjective Ranking   Entropy (bits/pel)
15             37.6        2                    1.36
30             33.4        4                    0.82
0.5 * Qlum     35.6        1                    1.28
Qlum           32.9        3                    0.86
2 * Qlum       30.6        5                    0.55
Using the subjectively weighted Quantisation achieves much
higher levels of compression for equivalent levels of quality.
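For reference, the PSNR figures quoted above are of the kind produced by the standard definition, sketched here for 8-bit images held as lists of rows. The `psnr` helper and the toy data are illustrative assumptions; the slides do not state how their values were computed.

```python
import math

def psnr(original, decoded, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    flat_o = [p for row in original for p in row]
    flat_d = [p for row in decoded for p in row]
    mse = sum((a - b) ** 2 for a, b in zip(flat_o, flat_d)) / len(flat_o)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)

# Toy example: every pixel is off by 5 grey levels, so the MSE is 25.
orig = [[100] * 4 for _ in range(4)]
dec = [[105] * 4 for _ in range(4)]
print(round(psnr(orig, dec), 2))
```

Because PSNR weights every error equally regardless of frequency, it cannot reflect the perceptual weighting built into Qlum, which is why it disagrees with the subjective ranking.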
JPEG Coding
• The most obvious way might seem to code each band
separately
– i.e. Huffman with RLC, as we suggested with the Haar Transform.
– We could get close to the entropy
• This is not the way it is coded because
– It would require 64 different codes. High cost in computation and
storage of codebooks.
– It ignores the fact that the zero coefficients occur at the same
positions in multiple bands.
JPEG Coding
• Instead we code each block separately
– A block contains 64 coefficients, one from each band.
• Each block contains 1 DC coefficient (from the top left band)
and 63 AC coefficients
• Two codebooks are used in total for all the blocks, one for the
DC coefficients and the other for the AC coefficients.
• At the end of each Block we insert an End Of Block (EOB)
symbol in the datastream
Data Ordering
• Each block is an 8x8 grid of coeffs
– A Zig-Zag scan converts them into a 1D stream.
– As most non-zero values occur in the top left corner, using a Zig-Zag
scan maximises the lengths of the zero runs and so improves the efficiency of
RLC
Zig-Zag Scan Example
Typical DCT Block Coefficients
-13  -3   2   0   0   0   1   0
  6   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
 -1   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
Non-Zero values are at the top left corner of the block.
Zig-Zag scan concentrates the non-zero coefficients at the start of the stream:
-13, -3, 6, 0, 0, 2, 0, 0, 0, -1, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 36 more zeros, the end
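The scan order can be generated rather than tabulated. The sketch below sorts positions by anti-diagonal and alternates the direction on each diagonal; it is one possible way to produce the order, checked against the example block above, and the function names are illustrative.

```python
def zigzag_order(n=8):
    """Return (row, col) positions of an n x n block in zig-zag order."""
    def key(pos):
        r, c = pos
        diagonal = r + c
        # Odd diagonals run downwards (row increasing), even ones upwards.
        return (diagonal, r if diagonal % 2 else -r)
    return sorted(((r, c) for r in range(n) for c in range(n)), key=key)

def zigzag_scan(block):
    """Flatten a square block into a 1D list following the zig-zag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]

BLOCK = [
    [-13, -3, 2, 0, 0, 0, 1, 0],
    [6, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [-1, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
stream = zigzag_scan(BLOCK)
print(stream[:10])
```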
Coding the DC Coefficients
Differential Coding
Coding the DC Coefficients
Typical DCT Block Coefficients
-13  -3   2   0   0   0   1   0
  6   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
 -1   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
The top left value (-13) is actually the difference between the
DC coefficient of the current and previous blocks.
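Differential coding of the DC terms can be sketched as below. The quantised DC values are hypothetical, chosen so that the second block yields the -13 used in the example, and the predictor is assumed to start at zero for the first block.

```python
def dc_differences(dc_values):
    """Replace each block's DC coefficient by its difference from the previous block's."""
    diffs = []
    previous = 0  # assumption: the predictor starts at zero for the first block
    for dc in dc_values:
        diffs.append(dc - previous)
        previous = dc
    return diffs

def dc_reconstruct(diffs):
    """Invert dc_differences by accumulating the differences."""
    values, previous = [], 0
    for d in diffs:
        previous += d
        values.append(previous)
    return values

dcs = [50, 37, 40]  # hypothetical quantised DC coefficients of three blocks
print(dc_differences(dcs))
```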
Coding DC Coefficients
• There is potentially a large number of levels to encode.
– Up to 4096 depending on the quantization step size.
• We break down the symbol value into a size/index pair
Coding DC Coefficients
• So if the DC value is -13
– The size is 4
– The index is 0010
• In JPEG only the size is encoded using Huffman
– The index is uncoded, efficiency is not dramatically
affected.
– Only 12 codes required in the Huffman table
– Table size is 16 + 12 = 28 bytes
More examples of Coefficient to size/index pair conversions
Value   Size   Index
-7      3      000
-6      3      001
-5      3      010
-4      3      011
-3      2      00
-2      2      01
-1      1      0
 0      0      -
 1      1      1
 2      2      10
 3      2      11
 4      3      100
 5      3      101
 6      3      110
 7      3      111
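The table follows a simple rule that can be sketched in a few lines: the size is the number of bits needed for the magnitude, positive values use their own binary form as the index, and negative values are offset by 2^size - 1. The `size_index` helper is an illustrative name.

```python
def size_index(value):
    """Return (size, index bit string) for a coefficient value."""
    if value == 0:
        return 0, ""  # size 0 carries no index bits
    size = abs(value).bit_length()
    # Negative values map to the lower half of the size-bit range.
    code = value if value > 0 else value + (1 << size) - 1
    return size, format(code, "0{}b".format(size))

for v in (-13, -3, 6, 9):
    print(v, size_index(v))
```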
Coding the AC Coefficients
Typical DCT Block Coefficients
-13  -3   2   0   0   0   1   0
  6   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
 -1   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
4 0010, -3, 6, 2 zeros, 2, 3 zeros, -1, 17 zeros,
1, 36 more zeros, the end
– "4 0010" is the size/index pair for the DC coefficient.
– The length of the run and the value of the coeff after it are strongly correlated.
– The block usually ends with a long run of zeros.
Coding the AC Coefficients
• Code/Size Correlations
– High coeffs follow short runs and low coeffs follow
long runs
• Final run of zeros
– These don’t need to be coded
– Just tell the decoder that there are no more non-zero coefficients and move on to the next block.
Symbols
Run/Coefficient Symbols
e.g. 0, 0, 9 is a run of 2 zeros followed by a 9.
However, we represent 9 using the size/index format from the DC
coeffs:
9 has a size of 4 and an index of 1001.
So we code the run/size pair (2,4), and the index 1001 is
appended to the stream.
Symbols
• Run/Size Symbols
– All possible combinations of runs from 0->15 and
size from 1->10
– 160 total symbols
– Huffman Codes are used for each symbol
– Index values are not coded further
Special Symbols
• ZRL
– Used to represent a run of 16 zeros
– Used when the run of zeros is greater than 15
– E.g. 17 zeros followed by 14 is coded as (ZRL) (1,4) 1110
• EOB
– Inserted when a block ends with a run of zeros
In total there are 160 run/size symbols and 2 special symbols:
162 symbols to encode.
The codetable is 16 + 162 = 178 bytes.
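The run/size symbols and the two special symbols can be produced from the 63 zig-zag ordered AC coefficients as sketched below. This is a minimal illustration with hypothetical names; symbols are kept as Python tuples and strings rather than Huffman codes.

```python
def size_index(value):
    """Return (size, index bit string) for a non-zero coefficient value."""
    size = abs(value).bit_length()
    code = value if value > 0 else value + (1 << size) - 1
    return size, format(code, "0{}b".format(size))

def ac_symbols(ac):
    """Turn zig-zag ordered AC coefficients into run/size symbols plus index bits."""
    last = max((i for i, v in enumerate(ac) if v != 0), default=-1)
    symbols, run = [], 0
    for v in ac[:last + 1]:
        if v == 0:
            run += 1
            continue
        while run > 15:            # each ZRL stands for 16 zeros
            symbols.append("ZRL")
            run -= 16
        size, index = size_index(v)
        symbols.append(((run, size), index))
        run = 0
    if last < len(ac) - 1:         # trailing zeros are replaced by EOB
        symbols.append("EOB")
    return symbols

# AC part of the example block: -3, 6, 2 zeros, 2, 3 zeros, -1, 17 zeros, 1, 36 zeros.
ac = [-3, 6, 0, 0, 2, 0, 0, 0, -1] + [0] * 17 + [1] + [0] * 36
print(ac_symbols(ac))
```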
Coding Example
Typical DCT Block Coefficients
-13  -3   2   0   0   0   1   0
  6   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
 -1   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
-13, -3, 6, 2 zeros, 2, 3 zeros, -1, 17 zeros, 1,
36 more zeros, the end
DC Coefficient is -13. The size is 4 and the index is
0010
Current Stream
State:
4 0010
Coding Example
-13, -3, 6, 2 zeros, 2, 3 zeros, -1, 17 zeros, 1, 36 more zeros, the end
The first ac value is -3. That is a run of 0 zeros followed by -3.
-3 has size 2 and index 00
Therefore the run/size pair is (0,2)
Current Stream
State:
4 0010 (0,2) 00
Coding Example
-13, -3, 6, 2 zeros, 2, 3 zeros, -1, 17 zeros, 1, 36 more zeros, the end
The next ac value is 6. That is a run of 0 zeros followed by 6.
6 has size 3 and index 110
Therefore the run/size pair is (0,3)
Current Stream
State:
4 0010 (0,2) 00 (0,3) 110
Coding Example
-13, -3, 6, 2 zeros, 2, 3 zeros, -1, 17 zeros, 1, 36 more zeros, the end
The next ac value to encode is a run of 2 zeros followed by an ac coefficient 2.
2 has size 2 and index 10
Therefore the run/size pair is (2,2)
Current Stream
State:
4 0010 (0,2) 00 (0,3) 110 (2,2) 10
Coding Example
-13, -3, 6, 2 zeros, 2, 3 zeros, -1, 17 zeros, 1, 36 more zeros, the end
The next ac value to encode is a run of 3 zeros followed by an ac coefficient -1.
-1 has size 1 and index 0
Therefore the run/size pair is (3,1)
Current Stream
State:
4 0010 (0,2) 00 (0,3) 110 (2,2) 10 (3,1) 0
Coding Example
-13, -3, 6, 2 zeros, 2, 3 zeros, -1, 17 zeros, 1, 36 more zeros, the end
The next ac value to encode is a run of 17 zeros followed by an ac coefficient 1.
As the run is > 15 zeros we have to use the ZRL symbol to code the first 16 zeros. The
remaining run length consists of (17 - 16) = 1 zero.
An ac coefficient of 1 has size 1 and index 1
Therefore we insert the run/size pair (1,1) after the ZRL marker
Current Stream
State:
4 0010 (0,2) 00 (0,3) 110 (2,2) 10 (3,1) 0 ZRL (1,1) 1
Coding Example
-13, -3, 6, 2 zeros, 2, 3 zeros, -1, 17 zeros, 1, 36 more zeros, the end
The remaining coeffs are all 0. Therefore the EOB marker is used.
If the last ac coeff is non-zero, then the EOB marker is not used.
Current Stream
State:
4 0010 (0,2) 00 (0,3) 110 (2,2) 10 (3,1) 0 ZRL (1,1) 1 EOB
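Huffman Coding
Final Stream:
4 0010 (0,2) 00 (0,3) 110 (2,2) 10 (3,1) 0 ZRL (1,1) 1 EOB
– The DC size (4) is encoded using the dc codetable.
– The run/size pairs, ZRL and EOB are encoded using the ac codetable.
– The index bits (0010, 00, 110, ...) receive no further encoding.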
• Best Solution is to define the 2 Huffman codes for each image
during compression
• However a default Huffman codetable is defined in the JPEG
standard.
Default Codetables
[Tables: default DC codetable and default AC codetable]
Final Stream:
4 0010 (0,2) 00 (0,3) 110 (2,2) 10 (3,1) 0 ZRL (1,1) 1 EOB
Fully Encoded Stream:
101 0010 01 00 100 110 11111001 10 111010 0
11111111001 1100 1 1010
54 bits to encode 64 coefficients = 0.84 bits/coefficient
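The final assembly can be sketched by concatenating each Huffman code with its uncoded index bits. Only the handful of codes needed for this block are listed; they are the ones appearing in the encoded stream above, and the index bits are taken from the size/index table (0010 for -13, 00 for -3). The dictionary names are hypothetical and this is not a complete default codetable.

```python
# Subset of Huffman codes, limited to the symbols this block needs.
DC_SIZE_CODES = {4: "101"}
AC_CODES = {
    (0, 2): "01",
    (0, 3): "100",
    (2, 2): "11111001",
    (3, 1): "111010",
    (1, 1): "1100",
    "ZRL": "11111111001",
    "EOB": "1010",
}

# (Huffman code, raw index bits) for each symbol in the final stream.
pieces = [
    (DC_SIZE_CODES[4], "0010"),   # DC difference -13
    (AC_CODES[(0, 2)], "00"),     # -3
    (AC_CODES[(0, 3)], "110"),    # 6
    (AC_CODES[(2, 2)], "10"),     # 2 zeros, then 2
    (AC_CODES[(3, 1)], "0"),      # 3 zeros, then -1
    (AC_CODES["ZRL"], ""),        # 16 zeros
    (AC_CODES[(1, 1)], "1"),      # 1 zero, then 1
    (AC_CODES["EOB"], ""),        # remaining zeros
]
bits = "".join(code + index for code, index in pieces)
print(bits)
print(len(bits), "bits,", len(bits) / 64, "bits/coefficient")
```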
How good is this scheme?
Should we use default codetables?
Even though doubling the quantisation sizes reduces the number of events the
distribution of those events doesn’t change much. Only the EOB probability changes
significantly.
Therefore using the same codetable for both cases is reasonable
How good is this scheme?
Efficiency when the
default codetable is
used
97.35%
95.74%
In fact using the same codetable for multiple images doesn’t
reduce the efficiency of the code much.
Special Markers
Synchronisation Markers
• There are 8 synch markers,
FFD0 -> FFD7.
They can be placed at intervals which can be
specified by using the DRI (FFDD) marker.
The markers are sent in sequence, so if any marker
is corrupted its absence can be easily detected.
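The sequential numbering can be sketched as follows. The helper names are hypothetical, and the check assumes the received list starts with the first marker (FFD0) of a scan; a real decoder would also have to resynchronise after detecting the gap.

```python
def restart_marker(interval_index):
    """Marker expected after the given interval: FFD0..FFD7, cycling modulo 8."""
    return 0xFFD0 + (interval_index % 8)

def first_out_of_sequence(received):
    """Return the position of the first marker that breaks the cycle, or None."""
    for position, marker in enumerate(received):
        if marker != restart_marker(position):
            return position
    return None

sent = [restart_marker(i) for i in range(10)]
print([hex(m) for m in sent])

# If the third marker (FFD2) is lost, the next one seen is FFD3 and the gap shows up.
damaged = [0xFFD0, 0xFFD1, 0xFFD3]
print(first_out_of_sequence(damaged))
```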
Summary
• We have covered the basics of JPEG standard
• The standard specifies a syntax rather than
specifying exactly how it is implemented
• Most implementations use the recommended
settings provided by the JPEG community.