Introduction to Wavelet Transform
Time Series are Ubiquitous!
A random sample of 4,000 graphics from 15
of the world’s newspapers published from
1974 to 1989 found that more than 75% of
all graphics were time series (Tufte, 1983).
Why is Working With Time Series so
Difficult?
Answer: We are dealing with subjective notions of
similarity.
The definition of similarity depends on the user, the domain and
the task at hand. We need to be able to handle this subjectivity.
Wavelet Transform - Overview
History
• Fourier (1807)
• Haar (1910)
• Math World
Given an orthonormal basis {ψ_i}:
f = Σ_i ⟨f, ψ_i⟩ ψ_i
ψ_{mn}(t) = 2^(−m/2) ψ( (t − 2^m n) / 2^m )
Wavelet Transform - Overview
• What kind of ψ(t) could be useful?
– Impulse Function (Haar): Best time resolution
– Sinusoids (Fourier): Best frequency resolution
– We want both of the best resolutions
• Heisenberg (1930)
– Uncertainty Principle
• There is a lower bound for Δt · Δω
(an intuitive proof is given in [Mac91])
Wavelet Transform - Overview
• Gabor (1945)
– Short Time Fourier Transform (STFT)
• Disadvantage: Fixed window size
Wavelet Transform - Overview
• Constructing Wavelets
– Daubechies (1988)
• Compactly Supported Wavelets
• Computation of WT Coefficients
– Mallat (1989)
• A fast algorithm using filter banks
[Figure: Mallat's two-channel filter bank. The signal S passes through analysis filters h_a(k) and h_d(k), each followed by downsampling by 2, giving the first-level approximation w1A and detail w1D coefficients; upsampling by 2 and synthesis filters g_a(k) and g_d(k) reconstruct S.]
Discrete Fourier
Transform I
Basic Idea: Represent the time
series as a linear combination of
sines and cosines, but keep only the
first n/2 coefficients.
[Figure: a time series X (samples 0 to 140) and its DFT approximation X', built from the first few coefficients.]
Why n/2 coefficients? Because each sine wave requires 2 numbers, for the phase (w) and amplitude (A, B).
Jean Fourier (1768-1830)
C(t) = Σ_{k=1}^{n} ( A_k cos(2π w_k t) + B_k sin(2π w_k t) )
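The slide's basic idea can be sketched in numpy (the signal and the number of kept coefficients below are made-up illustration values):

```python
import numpy as np

# Keep only the first n_coeff complex DFT coefficients of a real time series
# and reconstruct the approximation X'. Signal and n_coeff are illustrative.
def dft_approx(x, n_coeff):
    spectrum = np.fft.rfft(x)           # complex coefficients (amplitude + phase)
    spectrum[n_coeff:] = 0              # truncate: keep only the first n_coeff
    return np.fft.irfft(spectrum, n=len(x))

t = np.linspace(0, 1, 128, endpoint=False)
x = np.sin(2 * np.pi * 3 * t) + 0.5 * np.cos(2 * np.pi * 5 * t)
x_approx = dft_approx(x, 8)             # 8 complex coeffs = 16 real numbers
print(np.max(np.abs(x - x_approx)))     # tiny: the signal has few harmonics
```

Because the test signal contains only harmonics 3 and 5, the first 8 coefficients capture it exactly.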
Excellent free Fourier primer:
Hagit Shatkay, "The Fourier Transform - a Primer", Technical Report CS-95-37, Department of Computer Science, Brown University, 1995.
http://www.ncbi.nlm.nih.gov/CBBresearch/Postdocs/Shatkay/
Discrete Fourier
Transform II
Pros and Cons of DFT as a time series
representation.
• Good ability to compress most natural signals.
• Fast, off-the-shelf DFT algorithms exist: O(n log n).
• (Weakly) able to support time warped queries.
• Difficult to deal with sequences of different lengths.
• Cannot support weighted distance measures.
Note: The related transform DCT, uses only cosine
basis functions. It does not seem to offer any
particular advantages over DFT.
History…
In the late forties, stalwarts like John von Neumann, Dennis Gabor, Léon Brillouin and Eugene Wigner addressed the shortcomings of Fourier analysis and advocated a windowed Fourier transform. In 1946 Gabor [Theory of Communication, J. IEE 93 (1946), 429-457] introduced the concept of the short-time Fourier transform, now known as the Windowed Fourier Transform or Gabor Transform; for current developments see Mallat [A Wavelet Tour of Signal Processing, Academic Press, 1998]. The computational problems were realized by Gabor himself and his coworkers, von Neumann et al.
Discrete Wavelet
Transform I
Basic Idea: Represent the time
series as a linear combination of
Wavelet basis functions, but keep
only the first N coefficients.
[Figure: a time series X (samples 0 to 140), its DWT, and the approximation X' built from the first Haar coefficients.]
Although there are many different
types of wavelets, researchers in
time series mining/indexing
generally use Haar wavelets.
Alfred Haar (1885-1933)
Haar wavelets seem to be as
powerful as the other wavelets for
most problems and are very easy to
code.
Excellent free Wavelets Primer
Stollnitz, E., DeRose, T., & Salesin, D. (1995). Wavelets for computer graphics: A primer. IEEE Computer Graphics and Applications.
Wavelet Series
ψ(t) = (1 − t²) e^(−t²/2) is a wavelet, as it is the second derivative of the Gaussian e^(−t²/2) with a minus sign. It is known as the Mexican Hat wavelet.
[Figure: the Mexican Hat wavelet, plotted for −4 ≤ t ≤ 4 with peak value 1.]
Discrete Wavelet
Transform III
Pros and Cons of Wavelets as a time series
representation.
• Good ability to compress stationary signals.
• Fast linear time algorithms for DWT exist.
• Able to support some interesting non-Euclidean
similarity measures.
• Signals must have a length n = 2^(some integer).
• Works best if N = 2^(some integer); otherwise wavelets approximate the left side of the signal at the expense of the right side.
• Cannot support weighted distance measures.
Singular Value
Decomposition I
Basic Idea: Represent the time
series as a linear combination of
eigenwaves but keep only the first
N coefficients.
[Figure: a time series X (samples 0 to 140), its SVD, and the approximation X' built from the first eigenwaves.]
SVD is similar to Fourier and
Wavelet approaches, we represent
the data in terms of a linear
combination of shapes (in this case
eigenwaves).
James Joseph Sylvester (1814-1897)
SVD differs in that the eigenwaves
are data dependent.
Camille Jordan (1838-1921)
SVD has been successfully used in the text processing community (where it is known as Latent Semantic Indexing) for many years.
Good free SVD primer:
Sonia Leach, Singular Value Decomposition - A Primer.
Eugenio Beltrami (1835-1899)
Singular Value
Decomposition II
How do we create the eigenwaves?
We have previously seen that
we can regard time series as
points in high dimensional
space.
We can rotate the axes such
that axis 1 is aligned with the
direction of maximum
variance, axis 2 is aligned with
the direction of maximum
variance orthogonal to axis 1
etc.
Since the first few eigenwaves
contain most of the variance of
the signal, the rest can be
truncated with little loss.
A  UV
T
This process can be achieved by factoring a M
by n matrix of time series into 3 other matrices,
and truncating the new matrices at size N.
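A sketch of this factor-and-truncate step in numpy (the data matrix below is synthetic illustration data):

```python
import numpy as np

# Factor an M-by-n matrix A of time series as A = U S V^T, keep the first N
# eigenwaves (rows of V^T), and represent each series by N coefficients.
rng = np.random.default_rng(0)
M, n, N = 20, 64, 4
t = np.linspace(0, 1, n)
A = rng.normal(size=(M, 1)) * np.sin(2 * np.pi * t) \
  + rng.normal(size=(M, 1)) * np.cos(2 * np.pi * 2 * t)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
eigenwaves = Vt[:N]                     # data-dependent basis shapes
coeffs = A @ eigenwaves.T               # N coefficients per series
A_approx = coeffs @ eigenwaves          # reconstruction from N coefficients
print(np.max(np.abs(A - A_approx)))     # tiny here: A has rank 2 <= N
```

Because the synthetic data has rank 2, keeping N = 4 eigenwaves reconstructs it essentially exactly.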
Singular Value
Decomposition III
Pros and Cons of SVD as a time series
representation.
• Optimal linear dimensionality reduction technique.
• The eigenvalues tell us something about the
underlying structure of the data.
• Computationally very expensive.
• Time: O(Mn²)
• Space: O(Mn)
• An insertion into the database requires recomputing
the SVD.
• Cannot support weighted distance measures or non-Euclidean measures.
Note: There has been some promising research into
mitigating SVDs time and space complexity.
Piecewise Linear
Approximation I
Basic Idea: Represent the time
series as a sequence of straight
lines.
Karl Friedrich Gauss (1777-1855)
[Figure: a time series X (samples 0 to 140) and its piecewise linear approximation X'.]
Lines could be connected, in
which case we are allowed
N/2 lines
Each line segment has
• length
• left_height
(right_height can
be inferred by looking at
the next segment)
If lines are disconnected, we
are allowed only N/3 lines
Personal experience on dozens of datasets suggests disconnected is better. Also, only disconnected segments allow a lower-bounding Euclidean approximation.
Each line segment has
• length
• left_height
• right_height
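A sketch of the disconnected variant: equal-length segments with a least-squares line each, storing (length, left_height, right_height) per segment. The segmentation strategy and data are illustration choices; the slides do not fix how segments are chosen.

```python
import numpy as np

# Disconnected piecewise linear approximation: equal-length segments, each
# fitted with a least-squares line and stored as 3 numbers.
def pla_disconnected(x, n_segments):
    segments = []
    for seg in np.array_split(np.arange(len(x)), n_segments):
        t = seg - seg[0]
        slope, intercept = np.polyfit(t, x[seg], 1)   # least-squares line
        left = intercept
        right = intercept + slope * (len(seg) - 1)
        segments.append((len(seg), left, right))
    return segments

x = np.linspace(0, 1, 120) ** 2          # made-up smooth series
segs = pla_disconnected(x, 6)
print(len(segs))                         # 6 segments, 3 numbers each (N = 18)
```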
Problem with Fourier
Fourier analysis breaks down a signal into constituent sinusoids of different frequencies.
A serious drawback is that in transforming to the frequency domain, time information is lost. When looking at the Fourier transform of a signal, it is impossible to tell when a particular event took place.
Function Representations
• sequence of samples (time domain)
– finite difference method
• pyramid (hierarchical)
• polynomial
• sinusoids of various frequency (frequency
domain)
– Fourier series
• piecewise polynomials (finite support)
– finite element method, splines
What Are Wavelets?
In general, a family of representations using:
• hierarchical (nested) basis functions
• finite (“compact”) support
• basis functions often orthogonal
• fast transforms, often linear-time
Function Representations –
Desirable Properties
• generality – approximate anything well
– discontinuities, nonperiodicity, ...
• adaptable to application
– audio, pictures, flow field, terrain data, ...
• compact – approximate function with few
coefficients
– facilitates compression, storage, transmission
• fast to compute with
Wavelet History, Part 1
• 1807 Fourier analysis developed
• 1965 Fast Fourier Transform (FFT) algorithm
…
• 1980’s beginnings of wavelets in physics,
vision, speech processing (ad hoc)
• … little theory … why/when do wavelets work?
• 1985 Morlet & Grossman continuous wavelet transform … asking: how can you get perfect reconstruction without redundancy?
• 1986 Mallat unified the above work
Wavelet History, Part 2
• 1985 Meyer tried to prove that no
orthogonal wavelet other than Haar exists,
found one by trial and error!
• 1987 Mallat developed multiresolution
theory, DWT, wavelet construction
techniques (but still noncompact)
• 1988 Daubechies added theory: found
compact, orthogonal wavelets with arbitrary
number of vanishing moments!
Time-Frequency Analysis
• For many applications, you want to analyze
a function in both time and frequency
• Analogous to a musical score
• Fourier transforms give you frequency
information, smearing time.
• Samples of a function give you temporal
information, smearing frequency.
Comparison to Fourier Analysis
• Fourier analysis
– Basis is global
– Sinusoids with frequencies in arithmetic progression
• Short-time Fourier Transform (& Gabor filters)
– Basis is local
– Sinusoid times Gaussian
– Fixed-width Gaussian “window”
• Wavelet
– Basis is local
– Frequencies in geometric progression
– Basis has constant shape independent of scale
Wavelets are faster than FFTs!
Fast Fourier Transform (FFT) (1965)
Highly cited work: Cooley-Tukey
Computational cost: O(N log N)
Fast Wavelet Transform:
Computational cost: O(N).
For details, see Mallat [A Wavelet Tour of Signal Processing, Academic Press, 1998] or Neunzert and Siddiqi [Topics in Industrial Mathematics, Kluwer Academic Publishers, 2000].
The results of the CWT are many wavelet coefficients, which are
a function of scale and position
Gabor’s Proposal: Short Time Fourier
Transform
STFT_x(t, f) = ∫ [ x(t′) g(t′ − t) ] e^(−j2πf t′) dt′
Requirements:
Signal in time domain: requires a short time window to depict features of the signal.
Signal in frequency domain: requires a short frequency window (long time window) to depict features of the signal.
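Gabor's fixed-window STFT can be sketched directly from the formula above; the Gaussian window shape, width and hop size below are made-up illustration choices:

```python
import numpy as np

# STFT_x(t, f) = sum over t' of x(t') g(t' - t) exp(-j 2 pi f t'),
# implemented as a windowed FFT with a fixed Gaussian window g.
def stft(x, window_len=64, hop=32):
    tprime = np.arange(window_len)
    g = np.exp(-0.5 * ((tprime - window_len / 2) / (window_len / 8)) ** 2)
    frames = []
    for start in range(0, len(x) - window_len + 1, hop):
        frames.append(np.fft.rfft(x[start:start + window_len] * g))
    return np.array(frames)              # rows: time positions, cols: frequencies

x = np.sin(2 * np.pi * 0.05 * np.arange(512))
S = stft(x)
print(S.shape)                           # (frames, window_len // 2 + 1)
```

The fixed window size is exactly the disadvantage noted earlier: every frame has the same time and frequency resolution.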
What are wavelets?
Wavelets are functions defined over a finite interval and having
an average value of zero.
Haar wavelet
What is wavelet transform?
• The wavelet transform is a tool for carving up functions,
operators, or data into components of different frequency,
allowing one to study each component separately.
The basic idea of the wavelet transform is to represent any
arbitrary function ƒ(t) as a superposition of a set of such
wavelets or basis functions.
These basis functions or baby wavelets are obtained from a
single prototype wavelet called the mother wavelet, by
dilations or contractions (scaling) and translations (shifts).
The continuous wavelet transform (CWT)
• Fourier Transform:
F(ω) = ∫_{−∞}^{∞} f(t) e^(−jωt) dt
The FT is the sum over all time of the signal f(t) multiplied by a complex exponential.
Similarly, the Continuous Wavelet Transform (CWT) is defined as the sum over all time of the signal multiplied by a scaled, shifted version of the wavelet function ψ_{s,τ}(t):
γ(s, τ) = ∫ f(t) ψ*_{s,τ}(t) dt
where * denotes complex conjugation (for z = r + iy, z* = r − iy). This equation shows how a function f(t) is decomposed into a set of basis functions ψ_{s,τ}(t), called the wavelets.
The variables s and τ are the new dimensions, scale and translation (position), after the wavelet transform.
• The wavelets are generated from a single basic wavelet ψ(t), the so-called mother wavelet, by scaling and translation:
ψ_{s,τ}(t) = (1/√s) ψ( (t − τ) / s )
s is the scale factor, τ is the translation factor, and the factor s^(−1/2) is for energy normalization across the different scales.
• It is important to note that in the above transforms the wavelet basis functions are not specified.
• This is a difference between the wavelet transform and the Fourier transform, or other transforms.
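A direct (deliberately slow) numerical sketch of the CWT, using the Mexican Hat as the mother wavelet; the grids, scales and test signal are illustration values:

```python
import numpy as np

# gamma(s, tau) = integral of f(t) * psi_{s,tau}(t) dt, with
# psi_{s,tau}(t) = (1/sqrt(s)) * psi((t - tau) / s); psi is real here,
# so no complex conjugation is needed.
def mexican_hat(t):
    return (1 - t ** 2) * np.exp(-t ** 2 / 2)

def cwt(f, t, scales, taus):
    dt = t[1] - t[0]
    out = np.empty((len(scales), len(taus)))
    for i, s in enumerate(scales):
        for j, tau in enumerate(taus):
            psi = mexican_hat((t - tau) / s) / np.sqrt(s)
            out[i, j] = np.sum(f * psi) * dt    # Riemann sum for the integral
    return out

t = np.linspace(-10, 10, 400)
f = np.exp(-t ** 2)                             # a bump centred at t = 0
coeffs = cwt(f, t, scales=[0.5, 1, 2], taus=np.linspace(-5, 5, 21))
print(coeffs.shape)                             # one row per scale
```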
Scale
• Scaling a wavelet simply means stretching (or compressing) it.
Scale and Frequency
• Low scale a ⇒ compressed wavelet ⇒ rapidly changing details ⇒ high frequency
• High scale a ⇒ stretched wavelet ⇒ slowly changing details ⇒ low frequency
Translation (shift)
• Translating a wavelet simply means delaying (or hastening) its onset.
Haar wavelet
Discrete Wavelets
• The discrete wavelet is written as
ψ_{j,k}(t) = (1/√(s₀^j)) ψ( (t − kτ₀ s₀^j) / s₀^j )
j and k are integers and s₀ > 1 is a fixed dilation step. The translation factor τ₀ depends on the dilation step. The effect of discretizing the wavelet is that the time-scale space is now sampled at discrete intervals. We usually choose s₀ = 2.
∫ ψ_{j,k}(t) ψ*_{m,n}(t) dt = 1 if j = m and k = n, and 0 otherwise.
A band-pass filter
• The wavelet has a band-pass like spectrum.
From Fourier theory we know that compression in time is equivalent to stretching the spectrum and shifting it upwards:
F{ f(at) } = (1/|a|) F(ω/a)
Suppose a = 2.
This means that a time compression of the wavelet by a factor
of 2 will stretch the frequency spectrum of the wavelet by a
factor of 2 and also shift all frequency components up by a
factor of 2.
Subband coding
• If we regard the wavelet transform as a filter bank, then we can consider wavelet transforming a signal as passing the signal through this filter bank.
• The outputs of the different filter stages are the wavelet- and scaling function transform coefficients.
• In general we will refer to this kind of analysis as a multiresolution analysis.
• That is called subband coding.
• Splitting the signal spectrum with an iterated filter bank:
[Figure: a spectrum of width 8B is split by a low-pass (LP) and high-pass (HP) filter into two 4B bands; the low band is split again into two 2B bands, and once more into two B bands.]
• Summarizing, if we implement the wavelet transform as an iterated filter bank, we do not have to specify the wavelets explicitly! This is a remarkable result.
The Discrete Wavelet Transform
• Calculating wavelet coefficients at every possible scale is a fair amount of work, and it generates an awful lot of data. What if we choose only a subset of scales and positions at which to make our calculations?
• It turns out, rather remarkably, that if we choose scales and positions based on powers of two -- so-called dyadic scales and positions -- then our analysis will be much more efficient and just as accurate. We obtain just such an analysis from the discrete wavelet transform (DWT).
Approximations and Details
• The approximations are the high-scale, low-frequency components of the signal. The details are the low-scale, high-frequency components. The filtering process, at its most basic level, looks like this:
• The original signal, S, passes through two complementary filters and emerges as two signals.
Downsampling
• Unfortunately, if we actually perform this operation on a real digital signal, we wind up with twice as much data as we started with. Suppose, for instance, that the original signal S consists of 1000 samples of data. Then the approximation and the detail will each have 1000 samples, for a total of 2000.
• To correct this problem, we introduce the notion of downsampling. This simply means throwing away every second data point.
An example:
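A minimal numpy sketch of this filter-plus-downsample step; the signal values are made up, and the normalised Haar pair stands in for whichever complementary filters are used:

```python
import numpy as np

# One analysis stage: filter with complementary low-pass and high-pass
# filters, then downsample by keeping every second output sample.
h = np.array([1, 1]) / np.sqrt(2)        # low-pass  -> approximation
g = np.array([-1, 1]) / np.sqrt(2)       # high-pass -> detail

S = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])   # made-up signal
approx = np.convolve(S, h)[1::2]         # filter, then drop every second point
detail = np.convolve(S, g)[1::2]
print(len(approx), len(detail))          # 1000 samples would give 500 + 500
```

With an orthonormal filter pair the two half-length outputs together preserve the signal's energy, so nothing is lost by the downsampling.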
Reconstructing Approximation and Details
Upsampling
Wavelet Decomposition
Multiple-Level Decomposition
The decomposition process can be iterated, with successive
approximations being decomposed in turn, so that one signal
is broken down into many lower-resolution components. This
is called the wavelet decomposition tree.
• Scaling function (two-scale relation):
φ(2^j t) = Σ_k h_{j+1}(k) φ(2^{j+1} t − k)
• Wavelet:
ψ(2^j t) = Σ_k g_{j+1}(k) φ(2^{j+1} t − k)
The signal f(t) can be expressed as
f(t) = Σ_k λ_{j−1}(k) φ(2^{j−1} t − k) + Σ_k γ_{j−1}(k) ψ(2^{j−1} t − k)
λ_{j−1}(k) = Σ_m h(m − 2k) λ_j(m)
γ_{j−1}(k) = Σ_m g(m − 2k) λ_j(m)
DWT
λ_{j−1}(k) = Σ_m h(m − 2k) λ_j(m)
γ_{j−1}(k) = Σ_m g(m − 2k) λ_j(m)
Equivalently, writing h̄(k) = h(−k) and ḡ(k) = g(−k), analysis is filtering with the time-reversed filters followed by downsampling:
λ_{j−1}(k) = Σ_m h(−(2k − m)) λ_j(m) = ( h̄ ∗ λ_j )(2k)
γ_{j−1}(k) = Σ_m g(−(2k − m)) λ_j(m) = ( ḡ ∗ λ_j )(2k)
Worked example, with the Haar filters h = (1/√2, 1/√2), g = (1/√2, −1/√2) and λ_{j−1} = (5/√2, −3/√2):
λ_{j−2}(0) = Σ_{m=0}^{1} h(m) λ_{j−1}(m) = h(0) λ_{j−1}(0) + h(1) λ_{j−1}(1) = (1/√2)(5/√2) + (1/√2)(−3/√2) = 1
γ_{j−2}(0) = Σ_{m=0}^{1} g(m) λ_{j−1}(m) = g(0) λ_{j−1}(0) + g(1) λ_{j−1}(1) = (1/√2)(5/√2) − (1/√2)(−3/√2) = 4
Wavelet Reconstruction (Synthesis)
Perfect reconstruction: G G′ + H H′ = 1
g′(n) = (−1)^(n+1) h(n)
h′(n) = (−1)^(n+1) g(n)
Worked example, continuing from λ_{j−2}(0) = 1 and γ_{j−2}(0) = 4:
x1(k) = Σ_m h′(k − m) λ_{j−2}(m)
x1(0) = Σ_m h′(m) λ_{j−2}(m) = h′(0) λ_{j−2}(0) = 1/√2
x1(1) = Σ_m h′(1 − m) λ_{j−2}(m) = h′(1) λ_{j−2}(0) + h′(0) λ_{j−2}(1) = 1/√2
x1(2) = 0
y1(k) = Σ_m g′(k − m) γ_{j−2}(m)
y1(0) = Σ_m g′(m) γ_{j−2}(m) = g′(0) γ_{j−2}(0) = 4/√2
y1(1) = Σ_m g′(1 − m) γ_{j−2}(m) = g′(1) γ_{j−2}(0) + g′(0) γ_{j−2}(1) = −4/√2
y1(2) = 0
λ_{j−1} = x1 + y1 = (5/√2, −3/√2)
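The one-level Haar analysis/synthesis example on these slides can be checked numerically. This sketch assumes the orthonormal Haar filters h = h′ = (1/√2, 1/√2) and g = g′ = (1/√2, −1/√2), a reading reconstructed from the garbled slide, so treat the specific numbers as an assumption:

```python
import numpy as np

# Numerical check of one Haar analysis step followed by synthesis.
r2 = np.sqrt(2.0)
lam_j1 = np.array([5 / r2, -3 / r2])     # lambda_{j-1}: input approximation

# Analysis: lambda_{j-2}(k) = sum_m h(m - 2k) lambda_{j-1}(m), same with g.
lam_j2 = (lam_j1[0] + lam_j1[1]) / r2    # approximation coefficient
gam_j2 = (lam_j1[0] - lam_j1[1]) / r2    # detail coefficient

# Synthesis: filter the coefficients with h', g' and add the two branches.
x1 = np.array([lam_j2, lam_j2]) / r2     # contribution of the approximation
y1 = np.array([gam_j2, -gam_j2]) / r2    # contribution of the detail
recon = x1 + y1                          # should equal lambda_{j-1} again
print(lam_j2, gam_j2, recon)
```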
2-D Discrete Wavelet Transform
• A 2-D DWT can be done as follows:
Step 1: Replace each row with its 1-D DWT;
Step 2: Replace each column with its 1-D DWT;
Step 3: Repeat steps (1) and (2) on the lowest subband for the next scale;
Step 4: Repeat step (3) until as many scales as desired have been completed.
[Figure: the original image is split into L and H bands, then into LL, HL, LH and HH subbands (one scale); the LL subband is split again to give two scales.]
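The row-then-column recipe can be sketched with a 1-D Haar step applied along each axis; the 8x8 "image" below is a made-up illustration:

```python
import numpy as np

# 1-D Haar DWT step: averages (low-pass) in the first half,
# differences (high-pass) in the second half.
def haar_1d(v):
    a = (v[0::2] + v[1::2]) / np.sqrt(2)
    d = (v[0::2] - v[1::2]) / np.sqrt(2)
    return np.concatenate([a, d])

def dwt2_one_scale(img):
    rows = np.apply_along_axis(haar_1d, 1, img)     # Step 1: transform rows
    return np.apply_along_axis(haar_1d, 0, rows)    # Step 2: transform columns

img = np.arange(64, dtype=float).reshape(8, 8)
out = dwt2_one_scale(img)
LL = out[:4, :4]      # lowest subband: repeat the transform here for scale 2
print(out.shape, LL.shape)
```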
Image at different scales
Correlation between features at
different scales
Wavelet construction – a simplified
approach
• Traditional approaches to wavelets have used a
filterbank interpretation
• Fourier techniques required to get synthesis
(reconstruction) filters from analysis filters
• Not easy to generalize
Wavelet construction – lifting
3 steps
• Split
• Predict (P step)
• Update (U step)
Example – the Haar wavelet
• S step
Splits the signal s_n into even samples e_{n−1} and odd samples o_{n−1}.
Example – the Haar wavelet
• P step
Predict the odd samples from the even samples
For the Haar wavelet, the prediction for the odd sample is the previous even sample:
ŝ_{n,2l+1} = s_{n,2l}
Example – the Haar wavelet
Detail signal: d_{n−1,l} = s_{n,2l+1} − s_{n,2l}
[Figure: the signal s_{n,l} and its detail signal d_{n−1,l}, each plotted against l.]
Example – the Haar wavelet
• U step
Update the even samples to produce the next coarser
scale approximation
sn1,l  sn,2l  d n1,l / 2
The signal average is maintained :
2 n1 1
s
l 0
2 n 1
n 1,l
 1 / 2  sn ,l
l 0
Summary of the Haar wavelet
decomposition
d n1,l  sn, 2l 1  sn, 2l
sn1,l  sn,2l  d n1,l / 2
Can be computed ‘in place’ :
….. sn, 2l 1
sn , 2 l
-1
sn, 2l 1 …..
-1
dn1,l 1
sn , 2 l
1/2
P step
d n1,l
1/2
dn1,l 1
sn1,l
d n1,l
U step
Inverse Haar wavelet transform
• Simply run the forward Haar wavelet transform
backwards!
s_{n,2l} = s_{n−1,l} − d_{n−1,l} / 2
s_{n,2l+1} = s_{n,2l} + d_{n−1,l}
Then merge the even samples s_{n,2l} and odd samples s_{n,2l+1} to recover s_{n,l}.
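The forward and inverse Haar lifting steps can be sketched as follows (the signal values are made up):

```python
import numpy as np

# Haar lifting: d = s_odd - s_even (P step), c = s_even + d/2 (U step).
# The inverse simply runs the same operations backwards.
def haar_lift(s):
    even, odd = s[0::2].copy(), s[1::2].copy()     # split
    d = odd - even                                 # P: predict odds by evens
    c = even + d / 2                               # U: coarse approximation
    return c, d

def haar_unlift(c, d):
    even = c - d / 2                               # undo U step
    odd = even + d                                 # undo P step
    s = np.empty(2 * len(c))
    s[0::2], s[1::2] = even, odd                   # merge
    return s

s = np.array([2.0, 4.0, 8.0, 6.0, 5.0, 9.0, 3.0, 3.0])
c, d = haar_lift(s)
print(np.allclose(haar_unlift(c, d), s))           # perfect reconstruction
```

Note that c = (even + odd)/2, so the sum of the coarse signal is half the sum of the original, matching the "signal average is maintained" relation above.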
General lifting stage of wavelet
decomposition
e j 1
+
sj
P
Split
-
o j 1
s j 1
U
d j 1
Multi-level wavelet decomposition
• We can produce a multi-level decomposition by
cascading lifting stages
[Figure: cascade of lifting stages: s_n → lift → (s_{n−1}, d_{n−1}) → lift → (s_{n−2}, d_{n−2}) → … → lift → (s_0, d_0).]
General lifting stage of inverse
wavelet synthesis
s j 1
e j 1
-
U
d j 1
P
Merge
+
o j 1
sj
Multi-level inverse wavelet synthesis
• We can produce a multi-level inverse wavelet
synthesis by cascading lifting stages
[Figure: cascade of inverse lifting stages: (s_0, d_0) → lift → s_1 → lift → s_2 → … → s_{n−1} → lift → s_n, consuming d_1, d_2, …, d_{n−1} along the way.]
Advantages of the lifting implementation
• Inverse transform
– Inverse transform is trivial – just run the code backwards
– No need for Fourier techniques
• Generality
– The design of the transform is performed without reference to particular forms for the predict and update operators
– Can even include non-linearities (for integer wavelets)
Example 2 – the linear spline wavelet
• A more sophisticated wavelet – uses slightly more
complex P and U operators
• Uses linear prediction to determine odd samples
from even samples
The linear spline wavelet
• P-step – linear prediction
Original signal
Linear prediction at
odd samples
Detail signal (prediction error
at odd samples)
The linear spline wavelet
• The prediction for the odd samples is based on the
two even samples either side :
sˆn,2l 1  1/ 2sn, 2l  sn, 2l 2 
dn1,l  sn,2l 1 1/ 2sn,2l  sn,2l 2 
The linear spline wavelet
• The U step – use current and previous detail
signal sample
sn1,l  sn, 2l  1/ 4(dn1,l 1  dn1,l )
The linear spline wavelet
• Preserves signal average and first-order moment
(signal position) :
2n1 1
s
l 0
2 n 1
n 1,l
2 n 1 1
 ls
l 0
 1 / 2  sn ,l
l 0
2 n 1
n 1,l
 1 / 2  ls n,l
l 0
The linear spline wavelet
• Can still implement 'in place':
[Figure: in the P step each odd sample s_{n,2l+1} is combined with its even neighbours s_{n,2l} and s_{n,2l+2} (weights −1/2) to give the detail d_{n−1,l}; in the U step each even sample is combined with the details d_{n−1,l−1} and d_{n−1,l} (weights 1/4) to give s_{n−1,l}.]
Summary of linear spline wavelet
decomposition
dn1,l  sn,2l 1 1/ 2sn,2l  sn,2l 2 
sn1,l  sn, 2l  1/ 4(dn1,l 1  dn1,l )
Computing the inverse is trivial :
sn, 2l  sn1,l 1/ 4(dn1,l 1  dn1,l )
sn, 2l 1  dn1,l  1/ 2sn,2l  sn,2l 2 
The even and odd samples are then merged as before
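The linear spline lifting step and its inverse can be sketched as below. Boundary samples are handled by edge replication, which is an assumption: the slides do not specify boundary handling.

```python
import numpy as np

# Linear spline lifting: predict each odd sample from its two even
# neighbours, then update the evens from the two surrounding details.
def spline_lift(s):
    even, odd = s[0::2].copy(), s[1::2].copy()
    even_next = np.append(even[1:], even[-1])      # s_{n,2l+2}, edge-replicated
    d = odd - (even + even_next) / 2               # P step
    d_prev = np.insert(d[:-1], 0, d[0])            # d_{n-1,l-1}, edge-replicated
    c = even + (d_prev + d) / 4                    # U step
    return c, d

def spline_unlift(c, d):
    d_prev = np.insert(d[:-1], 0, d[0])
    even = c - (d_prev + d) / 4                    # undo U
    even_next = np.append(even[1:], even[-1])
    odd = d + (even + even_next) / 2               # undo P
    s = np.empty(2 * len(c))
    s[0::2], s[1::2] = even, odd
    return s

s = np.array([1.0, 3.0, 4.0, 2.0, 0.0, 1.0, 5.0, 4.0])
c, d = spline_lift(s)
print(np.allclose(spline_unlift(c, d), s))         # perfect reconstruction
```

Because the prediction is linear, the detail signal is exactly zero on any linearly varying stretch of the input, which is what makes this wavelet better than Haar on smooth data.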
Wavelet decomposition applied to a 2D
image
[Figure: lifting applied along each row of the image splits it into approximation and detail channels.]
Wavelet decomposition applied to a 2D
image
[Figure: lifting applied along the columns of each channel yields one approximation image i_{n−1,n−1} and three detail images d¹_{n−1,n−1}, d²_{n−1,n−1}, d³_{n−1,n−1}.]
Why is wavelet-based compression
effective?
• Allows for intra-scale prediction (like many other
compression methods) – equivalently the wavelet
transform is a decorrelating transform just like the
DCT as used by JPEG
• Allows for inter-scale (coarse-fine scale)
prediction
Why is wavelet-based compression
effective?
[Figure: an original image alongside its 1-level Haar, 1-level linear spline, and 2-level Haar decompositions.]
Why is wavelet-based compression
effective?
• Wavelet coefficient histogram
[Figure: histograms (counts up to 25000) of the original image values and of the Haar wavelet coefficients, over the value range −255 to 245.]
Why is wavelet-based compression
effective?
• Coefficient entropies
Original image: 7.22
1-level Haar wavelet: 5.96
1-level linear spline wavelet: 5.53
2-level Haar wavelet: 5.02
2-level linear spline wavelet: 4.57
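Entropies like those in the table can be computed as the first-order entropy of the value histogram. The arrays below are synthetic stand-ins (the slide's numbers come from a real image); a peaked, wavelet-like distribution needs fewer bits than flat image data:

```python
import numpy as np

# First-order entropy, in bits, of the histogram of integer-rounded values.
def entropy_bits(values):
    _, counts = np.unique(np.round(values).astype(int), return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

rng = np.random.default_rng(1)
coeffs = rng.laplace(scale=4.0, size=10000)      # peaked, wavelet-like values
pixels = rng.uniform(0, 255, size=10000)         # flat, image-like values
print(entropy_bits(coeffs) < entropy_bits(pixels))   # peaked -> fewer bits
```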
Why is wavelet-based compression
effective?
• Wavelet coefficient dependencies
[Figure: a coefficient X in a fine subband and its parent P(X) in the next coarser subband.]
Why is wavelet-based compression
effective?
• Let's define sets S (small) and L (large) of wavelet coefficients.
• The following two probabilities describe interscale dependencies:
P( X ∈ S | P(X) ∈ S )
P( X ∈ L | P(X) ∈ L )
Why is wavelet-based compression
effective?
• Without interscale dependencies:
P( X ∈ S | P(X) ∈ S ) = #S / N²
P( X ∈ L | P(X) ∈ L ) = #L / N²
Why is wavelet-based compression
effective?
• Measured dependencies from Lena:
P( X ∈ S | P(X) ∈ S ) = 0.886, compared with #S / N² = 0.781
P( X ∈ L | P(X) ∈ L ) = 0.529, compared with #L / N² = 0.219
Why is wavelet-based compression
effective?
• Intra-scale dependencies
[Figure: a coefficient X and its eight spatial neighbours X1 … X8.]
c̄ = (1/8) Σ_{n=1}^{8} c(X_n)
X_n ∈ S if c̄ < T
X_n ∈ L if c̄ ≥ T
Why is wavelet-based compression
effective?
• Measured dependencies from Lena:
P( X ∈ S | X_n ∈ S ) = 0.912, compared with #S / N² = 0.781
P( X ∈ L | X_n ∈ L ) = 0.623, compared with #L / N² = 0.219
Why is wavelet-based compression
effective?
• Have to use a causal neighbourhood for spatial
prediction
Example image compression algorithms
• We will look at three state-of-the-art algorithms
– Set Partitioning In Hierarchical Trees (SPIHT)
– Significance-Linked Connected Component Analysis (SLCCA)
– Embedded Block Coding with Optimal Truncation (EBCOT), which is the basis of JPEG2000
The SPIHT algorithm
• Coefficients transmitted in partial order
Coeff. number:  1  2  3  4  5  6  7  8  9 10 11 12 13 14 …
msb 5           1  1  0  0  0  0  0  0  0  0  0  0  0  0 …
    4           x  x  1  1  0  0  0  0  0  0  0  0  0  0 …
    3           x  x  x  x  1  1  1  0  0  0  0  0  0  0 …
    2           x  x  x  x  x  x  x  1  1  1  1  1  1  0 …
    1           x  x  x  x  x  x  x  x  x  x  x  x  x  1 …
lsb 0           x  x  x  x  x  x  x  x  x  x  x  x  x  x …
The SPIHT algorithm
• 2 components to the algorithm
– Sorting pass
• Sorting information is transmitted on the basis of the most significant bit-plane
– Refinement pass
• Bits in bit-planes lower than the most significant bit-plane are transmitted
The SPIHT algorithm
N = msb of( max( abs( wavelet coefficients ) ) )
for (bit-plane counter) = N downto 1
    transmit significance/insignificance wrt the bit-plane counter
    transmit refinement bits of all coefficients that are already significant
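A greatly simplified sketch of this bit-plane loop, with significance and refinement passes but without SPIHT's zerotree set partitioning (the coefficient values are made up):

```python
import numpy as np

# Encode coefficients bit-plane by bit-plane: a coefficient first emits
# significance symbols, then refinement bits once it has become significant.
def bitplane_encode(coeffs):
    coeffs = np.asarray(coeffs)
    N = int(np.floor(np.log2(np.max(np.abs(coeffs)))))   # msb bit-plane
    significant = np.zeros(len(coeffs), dtype=bool)
    symbols = []
    for n in range(N, -1, -1):
        threshold = 2 ** n
        for i, c in enumerate(coeffs):
            if not significant[i]:                  # significance (sorting) pass
                if abs(c) >= threshold:
                    significant[i] = True
                    symbols.append((i, 'sig', int(c < 0)))   # plus a sign bit
                else:
                    symbols.append((i, 'insig', None))
            else:                                   # refinement pass
                symbols.append((i, 'refine', (abs(int(c)) >> n) & 1))
    return symbols

stream = bitplane_encode([34, -21, 10, 5, -3, 2, 1, 0])
print(len(stream))                                  # one symbol per coeff per plane
```

Stopping this loop early at any point yields a coarser but valid approximation, which is the sense in which the bitstream is embedded.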
The SPIHT algorithm
• Insignificant coefficients (with respect to
current bitplane counter) organised into
zerotrees
The SPIHT algorithm
• Groups of coefficients made into zerotrees by set partitioning
The SPIHT algorithm
• SPIHT produces an embedded bitstream
bitstream
….1100101011100101100011………01011100010111011011101101….
The SLCCA algorithm
[Flowchart: Wavelet transform → Quantise coefficients → Cluster and transmit significance map → Bit-plane encode significant coefficients.]
The SLCCA algorithm
• The significance map is grouped into clusters
The SLCCA algorithm
• Clusters grown out from a seed
Seed
Significant coeff
Insignificant coeff
The SLCCA algorithm
• Significance link symbol
Significance
link
Image compression results
• Evaluation
– Mean squared error
– Human visual-based metrics
– Subjective evaluation
Image compression results
• Mean-squared error
mse = (1/N²) Σ_{r,c=0}^{N−1} ( I(r,c) − Î(r,c) )²
Usually expressed as peak signal-to-noise ratio (in dB):
PSNR(dB) = 10 log₁₀ ( 255² / mse )
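The PSNR computation for 8-bit images can be sketched directly from the definition (the arrays are illustration values):

```python
import numpy as np

# mse over all pixels, then PSNR(dB) = 10 log10(255^2 / mse).
def psnr(original, reconstructed):
    mse = np.mean((original.astype(float) - reconstructed.astype(float)) ** 2)
    return 10 * np.log10(255.0 ** 2 / mse)

I = np.tile(np.arange(256, dtype=float), (256, 1))   # simple test "image"
I_hat = I + 4.0                                      # constant error of 4
print(round(psnr(I, I_hat), 2))                      # mse = 16 -> about 36.09 dB
```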
Image compression results
[Graph: PSNR (dB), 25 to 43, against bit-rate (bits/pixel), 0.2 to 1.2, for SPIHT, SLCCA and JPEG.]
Image compression results
[Graph: PSNR (dB), 25 to 43, against bit-rate (bits/pixel), 0.2 to 1.2, for the Haar, linear spline and Daubechies 9-7 wavelets.]
Image compression results
SPIHT 0.2 bits/pixel
JPEG 0.2 bits/pixel
Image compression results
SPIHT
JPEG
EBCOT, JPEG2000
• JPEG2000, based on embedded block coding with optimal truncation, is the state-of-the-art compression standard
• Wavelet-based
• It addresses the key issue of scalability
– SPIHT is distortion scalable, as we have already seen
– JPEG2000 introduces both resolution and spatial scalability also
• An excellent reference to JPEG2000 and compression in general is "JPEG2000" by D. Taubman and M. Marcellin
EBCOT, JPEG2000
• Resolution scalability is the ability to extract from
the bitstream the sub-bands representing any
resolution level
bitstream
….1100101011100101100011………01011100010111011011101101….
EBCOT, JPEG2000
• Spatial scalability is the ability to extract from the
bitstream the sub-bands representing specific
regions in the image
– Very useful if we want to selectively decompress
certain regions of massive images
bitstream
….1100101011100101100011………01011100010111011011101101….
Introduction to EBCOT
• JPEG2000 is able to implement this general
scalability by implementing the EBCOT paradigm
• In EBCOT, the unit of compression is the
codeblock which is a partition of a wavelet subband
• Typically, following the wavelet transform, each sub-band is partitioned into small blocks (typically 32x32)
Introduction to EBCOT
• Codeblocks – partitions of wavelet subbands
codeblock
Introduction to EBCOT
• A simple bit stream organisation could
comprise concatenated code block bit streams
[Figure: bitstream layout L0, CB0, L1, CB1, L2, CB2, …, where each Li gives the length of the next code-block stream CBi.]
Introduction to EBCOT
• This simple bit stream structure is resolution and
spatially scalable but not distortion scalable
• Complete scalability is obtained by introducing
quality layers
– Each code block bitstream is individually (optimally)
truncated in each quality layer
– Loss of parent-child redundancy more than
compensated by ability to individually optimise
separate code block bitstreams
Introduction to EBCOT
• Each code block bit stream partitioned into
a set of quality layers
[Figure: each code-block bitstream CB_i is split across quality layers Q0, Q1, …, with a length L_i recorded for each truncated segment in each layer.]
EBCOT advantages
• Multiple scalability
– Distortion, spatial and resolution scalability
• Efficient compression
– This results from independent optimal truncation of
each code block bit stream
• Local processing
– Independent processing of each code block allows for
efficient parallel implementations as well as hardware
implementations
EBCOT advantages
• Error resilience
– Again this results from independent code block
processing which limits the influence of errors
Performance comparison
• A performance comparison with other wavelet-based coders is not straightforward, as it would depend on the target bit rates for which the bit streams were truncated
– With SPIHT, we simply truncate the bit stream when
the target bit rate has been reached
– However, we only have distortion scalability with
SPIHT
• Even so, we still get favourable PSNR (dB) results when comparing EBCOT (JPEG2000) with SPIHT
Performance comparison
• We can understand this more fully by
looking at graphs of distortion (D) against
rate (R) (bitstream length)
[Graph: distortion D against rate R; the truncation points lie on the R-D curve obtained with a continuously modulated quantisation step size.]
Performance comparison
• Truncating the bit stream to some arbitrary
rate will yield sub-optimal performance
[Graph: distortion D against rate R; an arbitrary truncation point falls above the optimal R-D curve.]
Performance comparison
[Graph: PSNR (dB), 25 to 43, against bit-rate (bits/pixel), 0.0625 to 1, for SPIHT and EBCOT/JPEG2000.]
Performance comparison
• Comparable PSNR (dB) results between
EBCOT and SPIHT even though:
– Results for EBCOT are for 5 quality layers (5
optimal bit rates)
– Intermediate bit rates sub-optimal
• We have resolution, spatial, distortion
scalability in EBCOT but only distortion
scalability in SPIHT