Digital Media Basics CMPT 771 Internet Architecture and Protocols
CMPT 771 Internet Architecture
and Protocols
Digital Media Basics
CMPT771 Digital Media Basics
Media Basics
Contents:
Brief introduction to digital media
Audio/Video
Digitization
Representation
Compression
Audio Digitization (PCM)
A few words about digital audio
Sampling theory – Nyquist theorem
The discrete-time sequence { V(t_n) } of a sampled continuous
function contains enough information to reproduce the function
V = V(t) exactly, provided that the sampling rate is at least
twice the highest frequency contained in the original signal V(t)
Analog signal sampled at constant rate
telephone: 8,000 samples/sec
CD music: 44,100 samples/sec
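As a small illustration of the sampling theorem, the sketch below computes the frequency at which a pure tone appears after uniform sampling. The function names `nyquist_limit` and `alias_frequency` are illustrative and not part of the course material. Tones at or below half the sampling rate are preserved; higher tones fold back into that range.

```python
def nyquist_limit(sample_rate_hz):
    """Highest frequency that can be represented at this sampling rate."""
    return sample_rate_hz / 2.0


def alias_frequency(tone_hz, sample_rate_hz):
    """Apparent frequency of a pure tone after uniform sampling."""
    # Fold the tone into the band [0, sample_rate / 2].
    folded = tone_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)


# Telephone-rate sampling (8,000 samples/sec) keeps a 3 kHz tone intact,
# but a 5 kHz tone is indistinguishable from a 3 kHz tone after sampling.
print(nyquist_limit(8000))          # 4000.0
print(alias_frequency(3000, 8000))  # 3000
print(alias_frequency(5000, 8000))  # 3000
```

With CD-rate sampling (44,100 samples/sec) the limit is 22.05 kHz, which covers the range of human hearing.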
Audio Digitization (Pulse Code Modulation)
Sound in analogue formats must be digitized
at every time interval the sound is converted to a
digital equivalent
using 2 bits the following sound can be digitized
Digitize audio
Each sample is quantized, i.e., rounded
e.g., 2^8 = 256 possible quantized values
Each quantized value is represented by bits
8 bits for 256 values
Example: 8,000 samples/sec, 256 quantized values --> 64,000 bps
Receiver converts it back to an analog signal: some quality reduction
Example rates
CD: 1.411 Mbps
MP3: 96, 128, 160 kbps
Internet telephony: 5.3 - 13 kbps
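The quantization and bit-rate arithmetic above can be sketched as follows. This is a minimal uniform quantizer, assuming samples are normalized to [-1.0, 1.0]; the helper names are hypothetical, and real telephony codecs use non-uniform (companded) quantization instead.

```python
import math


def quantize(sample, bits):
    """Map a sample in [-1.0, 1.0] to one of 2**bits integer levels."""
    levels = 2 ** bits
    # Scale to [0, levels - 1] and round to the nearest level.
    index = round((sample + 1.0) / 2.0 * (levels - 1))
    return max(0, min(levels - 1, index))


def dequantize(index, bits):
    """Receiver side: map a level index back to an amplitude in [-1.0, 1.0]."""
    levels = 2 ** bits
    return index / (levels - 1) * 2.0 - 1.0


def pcm_bit_rate(samples_per_sec, bits_per_sample, channels=1):
    """Uncompressed PCM bit rate in bits per second."""
    return samples_per_sec * bits_per_sample * channels


# One millisecond of a 1 kHz tone sampled at telephone rate with 8 bits.
rate, bits = 8000, 8
tone = [math.sin(2 * math.pi * 1000 * n / rate) for n in range(8)]
codes = [quantize(s, bits) for s in tone]
restored = [dequantize(c, bits) for c in codes]
worst_error = max(abs(a - b) for a, b in zip(tone, restored))

print(codes)
print(worst_error)                 # at most half a quantization step
print(pcm_bit_rate(8000, 8))       # 64000
print(pcm_bit_rate(44100, 16, 2))  # 1411200
```

The last line reproduces the 1.411 Mbps CD rate: 44,100 samples/sec, 16 bits, two channels.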
Approximate file sizes for 1 second of audio
Channels  Resolution  Fs        File Size
Mono      8 bit       8 kHz     64 kbit
Stereo    8 bit       8 kHz     128 kbit
Mono      16 bit      8 kHz     128 kbit
Stereo    16 bit      16 kHz    512 kbit
Stereo    16 bit      44.1 kHz  1411 kbit*
Stereo    24 bit      44.1 kHz  2116 kbit
* 1 CD: 700 MB, 70-80 mins
Psychoacoustic: Perceptual Coding
Hide errors where humans will not see or hear them
Study the hearing and vision systems to understand how we
see/hear
Masking refers to one signal overwhelming/hiding another
(e.g., loud siren or bright flash)
Natural Bandlimiting
Audio perception spans 20 Hz to 20 kHz, but most sounds are in low
frequencies (e.g., 2 kHz to 4 kHz)
Low frequencies may be encoded as a single channel
Human ear can tolerate 200ps second delay
Psychoacoustic
-Human aural response
Psychoacoustic Model
Basically: If you can’t hear the sound, don’t encode it
Human frequency response:
Frequency masking: If within a critical band a stronger sound
and weaker sound compete, you can’t hear the weaker sound.
Don’t encode it.
Temporal masking: After a loud sound, there’s a while before
we can hear a soft sound.
Stereo redundancy: At low frequencies, we can’t detect where
the sound is coming from. Encode it mono.
Critical band
Masking threshold
dictates how much quantization noise can be injected
Inaudible
Audio Compression
Makes use of psychoacoustic knowledge to reduce the
amount of information required to achieve the same
perceived quality (lossy compression)
MP3 = MPEG 1/2 layer 3 audio; achieves CD quality in about
192 kbps (roughly a 7.3:1 compression ratio relative to the
1.411 Mbps CD rate); higher compression possible
Sony MiniDisc uses Adaptive TRansform Acoustic Coding (ATRAC) to
achieve a 5:1 compression ratio (about 282 kbps)
http://www.mpeg.org
http://www.minidisc.org/aes_atrac.html
Transform Coding
Frequency analysis ?
Time domain ? Not easy!
Time domain -> Transform domain
Sequence to be coded is converted into new sequence
using transformation rule.
New sequence - transform coefficients.
Process is reversible - get back to original sequence using
inverse transformation.
Example - the Fourier transform.
Coefficients represent proportion of energy
contributed by different frequencies.
Transform Coding (Cont…)
In transform coding - choose transformation such that only
subset of coefficients have significant values.
Energy confined to subset of ‘important’ coefficients.
Known as ‘energy compaction’.
Example - FT of bandlimited signal:
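Energy compaction can be checked numerically. The sketch below is an illustration only: it builds a hypothetical band-limited signal from two low-frequency cosines, applies a naive DFT, and lists the coefficients that carry a non-negligible share of the energy. The signal, the window length N = 64, and the threshold are assumptions chosen for the demo.

```python
import cmath
import math


def dft(signal):
    """Naive discrete Fourier transform (O(N^2)), adequate for a short demo."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


N = 64
# Band-limited test signal: two cosines with 2 and 5 cycles per window.
signal = [
    math.cos(2 * math.pi * 2 * t / N) + 0.5 * math.cos(2 * math.pi * 5 * t / N)
    for t in range(N)
]

coeffs = dft(signal)
energy = [abs(c) ** 2 for c in coeffs]
significant = [k for k, e in enumerate(energy) if e > 1e-6 * sum(energy)]

print(significant)       # [2, 5, 59, 62]
print(len(significant))  # 4 of 64 coefficients
```

Only 4 of the 64 coefficients are significant (each frequency and its mirror image), so a coder needs to keep just those.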
Artefacts of compression
Mp3 encoded recordings rarely sound identical to
original uncompressed audio files
Whole areas of the spectrum are lost in the
encoding process
On small domestic ‘hi-fi’ or PC speakers, however,
mp3 compressed audio can be acceptable
WAV File (34Mb)
Mp3 file (3Mb)
Video Digitization and Compression
Video is sequence of images (frames) displayed at
constant frame rate
e.g. 24 images/sec
Digital image is a 2-D array of pixels
Sampling theory
Each pixel represented by bits
R:G:B
Y:U:V
Y = 0.299R + 0.587G + 0.114B (Luminance or Brightness)
U = B - Y (Chrominance 1, color difference)
V = R - Y (Chrominance 2, color difference)
Redundancy
spatial
Temporal
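The R:G:B to Y:U:V conversion listed above can be written directly from the three formulas. This is a minimal sketch with illustrative function names; practical systems also scale and offset U and V, which is omitted here.

```python
def rgb_to_yuv(r, g, b):
    """Convert one RGB pixel to (Y, U, V) using the formulas above."""
    y = 0.299 * r + 0.587 * g + 0.114 * b  # luminance
    u = b - y                               # chrominance 1 (B - Y)
    v = r - y                               # chrominance 2 (R - Y)
    return y, u, v


def yuv_to_rgb(y, u, v):
    """Invert the conversion: recover B and R first, then solve for G."""
    b = u + y
    r = v + y
    g = (y - 0.299 * r - 0.114 * b) / 0.587
    return r, g, b


print(rgb_to_yuv(255, 255, 255))  # white: Y close to 255, U and V close to 0
print(rgb_to_yuv(255, 0, 0))      # pure red: large positive V
```

Any gray pixel (R = G = B) has U = V = 0, which is why the chrominance planes carry little information in low-color regions.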
JPEG (Joint Photographic Experts Group)
Transform
Quantize
Encode
JPEG Lossy Sequential Mode
JPEG Compression Ratios:
30:1 to 50:1 compression is possible with small to moderate defects.
100:1 compression is quite feasible for very-low-quality purposes .
JPEG Steps
1 Block Preparation: From RGB to YUV (YIQ) planes
2 Transform: Two-dimensional Discrete Cosine
Transform (DCT) on 8x8 blocks.
3 Quantization: Compute Quantized DCT Coefficients
(lossy).
4 Encoding of Quantized Coefficients :
Zigzag Scan
Differential Pulse Code Modulation (DPCM) on DC
component
Run Length Encoding (RLE) on AC Components
Entropy Coding: Huffman or Arithmetic
JPEG Overview
Compression: Block Preparation -> Transform -> Quantize -> Encode
Decompression: Reverse the order
JPEG: Block Preparation
RGB Input Data
After Block Preparation
Input image: 640 x 480 RGB (24 bits/pixel) transformed to three planes:
Y: (640 x 480, 8 bits/pixel) Luminance (brightness) plane.
U, V: (320 x 240, 8 bits/pixel) Chrominance (color) planes.
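The halving of the chrominance planes can be sketched as below. Averaging each 2x2 block is an assumption made for illustration (the slide does not say how the 320 x 240 planes are derived), and the helper names are hypothetical. The plane dimensions are assumed to be even.

```python
def subsample_2x2(plane):
    """Halve a plane in both dimensions by averaging each 2x2 block."""
    rows, cols = len(plane), len(plane[0])
    return [
        [
            (plane[r][c] + plane[r][c + 1]
             + plane[r + 1][c] + plane[r + 1][c + 1]) / 4.0
            for c in range(0, cols, 2)
        ]
        for r in range(0, rows, 2)
    ]


def plane_bytes(width, height, bits_per_pixel=8):
    """Storage needed for one plane, in bytes."""
    return width * height * bits_per_pixel // 8


u_plane = [
    [10, 20, 30, 40],
    [10, 20, 30, 40],
    [50, 50, 70, 70],
    [50, 50, 70, 70],
]
print(subsample_2x2(u_plane))  # [[15.0, 35.0], [50.0, 70.0]]

rgb_size = plane_bytes(640, 480, 24)
yuv_size = plane_bytes(640, 480) + 2 * plane_bytes(320, 240)
print(rgb_size, yuv_size)      # 921600 460800
```

Block preparation alone therefore halves the data volume before any transform or quantization is applied.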
Discrete Cosine Transform (DCT)
A transformation from spatial domain to frequency domain (similar to FFT)
Definition of 8-point DCT:
F[0,0] is the DC component and other F[u,v] define AC components of DCT
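A minimal sketch of the 8x8 forward DCT, assuming the standard JPEG normalization: F[u,v] = (1/4) C(u) C(v) sum over x, y of f(x,y) cos((2x+1)u pi/16) cos((2y+1)v pi/16), with C(0) = 1/sqrt(2) and C(k) = 1 otherwise. The direct four-loop form is used for clarity, not speed.

```python
import math


def dct_8x8(block):
    """Forward 2-D DCT of an 8x8 block (list of 8 rows of 8 numbers)."""
    def c(k):
        return 1.0 / math.sqrt(2.0) if k == 0 else 1.0

    coeffs = [[0.0] * 8 for _ in range(8)]
    for u in range(8):
        for v in range(8):
            total = 0.0
            for x in range(8):
                for y in range(8):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
            coeffs[u][v] = 0.25 * c(u) * c(v) * total
    return coeffs


flat = [[100] * 8 for _ in range(8)]
F = dct_8x8(flat)
print(round(F[0][0], 6))  # 800.0: all energy lands in the DC component
print(max(abs(F[u][v]) for u in range(8) for v in range(8) if (u, v) != (0, 0)))
```

For a constant block every AC coefficient is zero up to floating-point error, so only F[0,0] needs to be coded.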
The 64 (8 x 8) DCT Basis Functions
[Figure: the 64 basis functions on axes u and v, with the DC component marked]
8x8 DCT Example
[Figure: original values of an 8x8 block (in spatial domain) and the corresponding DCT coefficients (in frequency domain), with the DC component marked]
JPEG: Quantized DCT Coefficients
Uniform quantization:
Divide by constant N and round result.
In JPEG, each DCT coefficient F[u,v] is divided by
a constant q(u,v).
The table of q(u,v) is called the quantization table.
Quantized coefficient = rounded F[u,v] / q(u,v)
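The divide-and-round step can be sketched as follows. The coefficient values and table entries are hypothetical, chosen only to show the arithmetic, and the helper names are illustrative. Ties are rounded away from zero here, which is an assumption about the rounding rule.

```python
import math


def round_half_away(x):
    """Round to the nearest integer, with ties going away from zero."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def quantize_block(coeffs, qtable):
    """Divide each coefficient by its table entry and round."""
    return [
        [round_half_away(f / q) for f, q in zip(f_row, q_row)]
        for f_row, q_row in zip(coeffs, qtable)
    ]


def dequantize_block(levels, qtable):
    """Decoder side: multiply back. The rounding loss is not recoverable."""
    return [
        [lv * q for lv, q in zip(l_row, q_row)]
        for l_row, q_row in zip(levels, qtable)
    ]


# Hypothetical 2x2 corner of a coefficient block and of a quantization table.
coeffs = [[236.0, -23.0], [-11.0, 4.0]]
qtable = [[16, 11], [12, 12]]

levels = quantize_block(coeffs, qtable)
print(levels)                            # [[15, -2], [-1, 0]]
print(dequantize_block(levels, qtable))  # [[240, -22], [-12, 0]]
```

The decoded values differ from the originals (236 becomes 240, 4 becomes 0): this is the lossy step of JPEG, and larger q(u,v) entries discard more detail.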
JPEG: Zigzag Scan
Maps an 8x8 block into a 1 x 64 vector
The zigzag pattern groups low-frequency coefficients at the top of the vector.
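One way to generate the zigzag order is to sort block positions by anti-diagonal and alternate the direction on each diagonal. The sketch below takes that approach; the function names are illustrative.

```python
def zigzag_order(n=8):
    """Return (row, col) pairs of an n x n block in zigzag scan order."""
    def key(pos):
        row, col = pos
        diagonal = row + col
        # Odd diagonals run top-right to bottom-left (row increasing),
        # even diagonals run bottom-left to top-right (col increasing).
        return (diagonal, row if diagonal % 2 else col)

    return sorted(((r, c) for r in range(n) for c in range(n)), key=key)


def zigzag_scan(block):
    """Flatten a square block into a 1-D list in zigzag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]


print(zigzag_order()[:6])
# [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]

small = [[1, 2, 6],
         [3, 5, 7],
         [4, 8, 9]]
print(zigzag_scan(small))  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Because quantization zeroes mostly the high-frequency coefficients, this ordering pushes the zeros toward the end of the vector, where run-length coding handles them cheaply.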
JPEG: Encoding of Quantized
DCT Coefficients
DC Components:
DC component of a block is large and varied, but often
close to the DC value of the previous block.
Encode the difference of DC component from previous 8x8
blocks using Differential Pulse Code Modulation (DPCM).
AC components:
The 1x64 vector has lots of zeros in it.
Using RLE, encode as (skip, value) pairs, where skip is the
number of zeros and value is the next non-zero component.
Send (0,0) as end-of-block value.
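Both encodings can be sketched in a few lines. The DC values and AC vector below are hypothetical, and the function names are illustrative. Baseline JPEG additionally caps each run at 15 zeros and entropy-codes the pairs; those details are left out.

```python
def dpcm_encode(dc_values):
    """Replace each DC value by its difference from the previous block's DC."""
    diffs, previous = [], 0
    for dc in dc_values:
        diffs.append(dc - previous)
        previous = dc
    return diffs


def rle_encode_ac(ac_coeffs):
    """Encode the AC coefficients as (skip, value) pairs, ending with (0, 0)."""
    pairs, zeros = [], 0
    for coeff in ac_coeffs:
        if coeff == 0:
            zeros += 1
        else:
            pairs.append((zeros, coeff))
            zeros = 0
    pairs.append((0, 0))  # end-of-block; trailing zeros are implied
    return pairs


def rle_decode_ac(pairs, length=63):
    """Rebuild the AC vector from (skip, value) pairs."""
    out = []
    for skip, value in pairs:
        if (skip, value) == (0, 0):
            break
        out.extend([0] * skip)
        out.append(value)
    return out + [0] * (length - len(out))


print(dpcm_encode([12, 15, 14, 14]))  # [12, 3, -1, 0]

ac = [34, 87, 16, 0, 0, 54, 0, 0, 0, 0, 0, 0, 12, 0, 0, 3] + [0] * 47
pairs = rle_encode_ac(ac)
print(pairs)
# [(0, 34), (0, 87), (0, 16), (2, 54), (6, 12), (2, 3), (0, 0)]
```

Seven pairs stand in for 63 coefficients, and decoding them with `rle_decode_ac` restores the vector exactly, so this stage is lossless.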
Intra-Frame Coding (JPEG)
Block-based 2-D DCT (Discrete Cosine Transform)
• Karhunen-Loeve (KL) transform ?
• 8x8 blocks
Frequency domain compression -> Run-length coding ->
Entropy (Huffman) coding
A typical 8x8 block of quantized DCT coefficients.
Most of the higher order coefficients have been quantized to 0.
12 34  0 54  0  0  0  0
87  0  0 12  3  0  0  0
16  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
Zig-zag scan: the sequence of DCT coefficients to be transmitted:
12 34 87 16 0 0 54 0 0 0 0 0 0 12 0 0 3 0 0 0 .....
DC coefficient (12) is sent via a separate Huffman table.
Run-length coding of the remaining coefficients:
34 | 87 | 16 | 0 0 54 | 0 0 0 0 0 0 12 | 0 0 3 | 0 0 0 .....
JPEG: Runlength Coding
Further compress: statistical (entropy) coding
A few words about Entropy
Entropy
A measure of information content
Entropy of the English Language
How much information does each character
in “typical” English text contain?
From a probability view
If the probability of a binary event is 0.5 (like a
coin), then, on average, you need one bit to
represent the result of this event.
As the probability of a binary event increases
or decreases, the number of bits you need, on
average, to represent the result decreases
The figure shows that unless
an event is totally random, you can
convey the information of the event in
fewer bits, on average, than it might
first appear
Entropy (Shannon 1948)
For a set of messages S with probability p(s), s ∈ S,
the self-information of s is:
$$i(s) = \log \frac{1}{p(s)} = -\log p(s)$$
Measured in bits if the log is base 2.
The lower the probability, the higher the information.
Entropy is the weighted average of self-information:
$$H(S) = \sum_{s \in S} p(s) \log \frac{1}{p(s)}$$
Entropy Example
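$$p(S) = \{0.25, 0.25, 0.25, 0.125, 0.125\}$$
$$H(S) = 3 \times 0.25 \log_2 4 + 2 \times 0.125 \log_2 8 = 2.25$$
$$p(S) = \{0.5, 0.125, 0.125, 0.125, 0.125\}$$
$$H(S) = 0.5 \log_2 2 + 4 \times 0.125 \log_2 8 = 2$$
$$p(S) = \{0.75, 0.0625, 0.0625, 0.0625, 0.0625\}$$
$$H(S) = 0.75 \log_2 \frac{4}{3} + 4 \times 0.0625 \log_2 16 \approx 1.3$$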
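The three examples can be checked with a direct implementation of the entropy definition, using base-2 logarithms so the result is in bits. The function names are illustrative.

```python
import math


def self_information(p):
    """Information content, in bits, of an outcome with probability p."""
    return -math.log2(p)


def entropy(probabilities):
    """Shannon entropy in bits: probability-weighted average self-information."""
    return sum(p * self_information(p) for p in probabilities if p > 0)


print(entropy([0.25, 0.25, 0.25, 0.125, 0.125]))                  # 2.25
print(entropy([0.5, 0.125, 0.125, 0.125, 0.125]))                 # 2.0
print(round(entropy([0.75, 0.0625, 0.0625, 0.0625, 0.0625]), 3))  # 1.311
print(entropy([0.5, 0.5]))                                        # 1.0 (fair coin)
```

The more skewed the distribution, the lower the entropy, which is what makes variable-length coding worthwhile.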
Entropy Coding
Entropy Coding (variable-length coding, statistical coding)
• Lossless coding
• Takes advantage of the probabilistic nature of information
• Example: Huffman coding, arithmetic coding
• Theorem (Shannon) (lower bound): For any probability distribution p(S)
with associated uniquely decodable code C,
$$H(S) \le l_a(C)$$
where l_a(C) is the average codeword length of C.
Recall Huffman coding…
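As a refresher, the sketch below builds Huffman codeword lengths with a priority queue and compares the average length against the entropy bound. It tracks only lengths, not the actual bit patterns, and the function names are illustrative.

```python
import heapq
import math


def huffman_code_lengths(probabilities):
    """Return the Huffman codeword length for each symbol."""
    # Heap entries: (probability, tie-breaker, symbols in this subtree).
    heap = [(p, i, [i]) for i, p in enumerate(probabilities)]
    heapq.heapify(heap)
    lengths = [0] * len(probabilities)
    next_id = len(probabilities)
    while len(heap) > 1:
        p1, _, syms1 = heapq.heappop(heap)
        p2, _, syms2 = heapq.heappop(heap)
        # Merging two subtrees adds one bit to every symbol beneath them.
        for s in syms1 + syms2:
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, next_id, syms1 + syms2))
        next_id += 1
    return lengths


def average_length(probabilities, lengths):
    """Expected codeword length in bits per symbol."""
    return sum(p * n for p, n in zip(probabilities, lengths))


def entropy(probabilities):
    """Shannon entropy in bits."""
    return sum(-p * math.log2(p) for p in probabilities if p > 0)


probs = [0.5, 0.125, 0.125, 0.125, 0.125]
lengths = huffman_code_lengths(probs)
print(lengths)                         # [1, 3, 3, 3, 3]
print(average_length(probs, lengths))  # 2.0
print(entropy(probs))                  # 2.0
```

Here the average length meets the bound exactly because every probability is a power of 1/2. For other distributions the Huffman average is strictly above H(S), by less than one bit per symbol.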
JPEG Example
[Figure: the original image and four compressed images, each shown with the quantization table used; compression ratios 7.7, 12.3, 33.9, and 60.1]
Produced using the interactive JPEG Java applet at:
http://www.cs.sfu.ca/undergrad/CourseMaterials/CMPT365/material/misc/interactive_jpeg/Ijpeg.html
Inter-Frame Prediction
Intra-coded
I-frame
Predicted
P-frame
Motion Estimation and Compensation
Video compression: A big picture
Bi-Directional Prediction
Intra-coded I-frame
Bi-directionally predicted B-frame
I B B P B B P B B P B B I
Group of frames (GOF)
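Because a B-frame is predicted from both an earlier and a later reference frame, the encoder must send that later I- or P-frame first. The sketch below derives such a coding order from the display order shown above; it is a simplified illustration (one common reordering rule), and the function name is hypothetical.

```python
def coding_order(display_order):
    """Reorder frame types from display order into a decodable coding order."""
    ordered, waiting_b = [], []
    for index, frame_type in enumerate(display_order):
        if frame_type == "B":
            # A B-frame needs the next I/P frame before it can be decoded.
            waiting_b.append((index, frame_type))
        else:
            ordered.append((index, frame_type))
            ordered.extend(waiting_b)
            waiting_b = []
    return ordered + waiting_b


display = "I B B P B B P B B P B B I".split()
print(" ".join(f"{t}{i}" for i, t in coding_order(display)))
# I0 P3 B1 B2 P6 B4 B5 P9 B7 B8 I12 B10 B11
```

The numbers are display positions: the decoder receives P3 before B1 and B2, then reorders the frames for display, which adds buffering delay.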
VBR vs CBR: Rate Control
Variable-Bit-Rate (VBR)
Fixed quantizer Qp
“Constant” quality
E.g. RMVB
Constant-Bit-Rate (CBR)
Adaptive quantizer
“Constant” rate – easier control
• Difference (compared to target rate) can be 0.5% or less
• E.g. RM, MPEG-1
[Figure: Raw Video, Encoder (VBR output), Smoothing Buffer (CBR output), and a Rate Controller that sets Qp]
Rate-distortion optimization
Recall that transport layer also has
rate control …
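The feedback idea behind CBR can be illustrated with a toy loop. Everything here is an assumption made for the demo: the model bits = complexity / Qp, the multiplicative update with a gain of 0.5, and the Qp range of 1 to 51. Real rate controllers use buffer-fullness and rate-distortion models that are considerably more elaborate.

```python
def simulate_rate_control(complexities, target_bits, qp=16.0, gain=0.5):
    """Toy CBR loop: adjust Qp after each frame to steer bits toward the target."""
    history = []
    for complexity in complexities:
        bits = complexity / qp  # assumed model: coarser quantizer, fewer bits
        history.append((qp, bits))
        # Raise Qp when the frame overshot the target, lower it when it undershot.
        qp *= (bits / target_bits) ** gain
        qp = max(1.0, min(51.0, qp))
    return history


frames = [1600.0] * 12  # constant-complexity content
history = simulate_rate_control(frames, target_bits=50.0)
for qp, bits in history[::4]:
    print(round(qp, 2), round(bits, 2))
```

With a fixed Qp (the VBR case) the same model would emit 100 bits for every frame of this content. The adaptive loop instead drives the per-frame output toward the 50-bit target.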
Standardization Organizations
ITU-T VCEG (Video Coding Experts Group)
standards for advanced moving image coding methods appropriate for
conversational and non-conversational audio/visual applications.
ISO/IEC MPEG (Moving Picture Experts Group)
standards for compression and coding, decompression, processing, and coded
representation of moving pictures, audio, and their combination
Relation
ITU-T H.262 ~ ISO/IEC 13818-2 (MPEG-2): Generic Coding of Moving Pictures and
Associated Audio.
ITU-T H.263 ~ ISO/IEC 14496-2 (MPEG-4)
WG - working group
SG - study group
ISO/IEC JTC 1/SC 29/WG 1: Coding of Still Pictures
ISO/IEC JTC 1/SC 29/WG 11 (MPEG)
Coding Rate and Standards
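[Figure: applications and standards placed along a bit-rate axis]
Bit-rate axis: 8, 16, 64, 384 kbit/s; 1.5, 5, 20 Mbit/s
Applications: mobile videophone, videophone over PSTN, ISDN videophone, Video CD (1.5 Mbit/s), Digital TV (5 Mbit/s), HDTV (20 Mbit/s)
Bit-rate classes: very low bitrate, low bitrate, medium bitrate, high bitrate
Standards: MPEG-4, H.263, H.261, MPEG-1, MPEG-2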
ISO MPEG-1 (Moving Pictures Experts Group).
MPEG-1
Progressively scanned video for
multimedia applications, at a bit
rate of about 1.5 Mb/s, matched to the
data rate of CD-ROM players.
Video format: near VHS quality
ISO MPEG-2
MPEG-2
Standard for Digital Television, DVD
4 to 8 Mb/s / 10 to 15 Mb/s >> MPEG-1
Supports various modes of scalability (spatial, temporal, SNR)
There are differences in quantization and better variable-length
code tables for progressive video sequences.
MPEG-3?
Originally envisioned for high bit rate applications
such as HDTV
Cancelled since the target rate can be handled by
MPEG-2.
J. Liang SFU ENSC861
2016/5/23
ISO MPEG-4
A much broader standard.
MPEG-4 was aimed primarily at low bit rate video
communication, but is not limited to it
Applications:
1. Digital television
2. Interactive graphics applications
3. Interactive multimedia (World Wide Web)
Two versions: DivX 3 and DivX 4 (Internet world)
Important concept
Video object
MPEG-4 Structure
[Figure: Bitstream -> MUX -> one Decoder per A/V object -> Compositor -> Audio/Video scene]
MPEG-4 Object Video
Instead of "frames": Video Object Planes (VOPs)
Shape Adaptive DCT (SA DCT)
[Figure: a video frame, its alpha map, the resulting VOPs and background VOP, and SA DCT coding]
Example
Object 3
Object 1
Object 4
Object 2
Problems, comments?
Another Example
Status
Microsoft, RealVideo,
QuickTime, ...
But only rectangular
frame based
MPEG-4 Part 2
DivX/Xvid
QuickTime 6
H.264 = MPEG-4 Part 10
(2003)
iTunes video store
YouTube HD video
Blu-ray (later version)
Next Step: H.26L->H.264 - Done
ITU-T Recommendations: Real time video communication
applications.
MPEG Standards : Video storage, broadcast video, video
streaming applications
H.26L = ITU-T + MPEG = JVT coding
Current project of the Joint Video Team formed by ITU-T SG16
Q6 (VCEG) and ISO/IEC JTC 1/SC 29 WG 11 (MPEG)
Basic configuration similar to H.263 and MPEG-4 Part 2
Coding Evolution
H.264/AVC
H.264 History
1998: Call for proposals for H.26L issued by ITU-T VCEG (Video Coding
Experts Group)
Oct. 1999: First draft design
Dec. 2001: ITU and ISO formed the Joint Video Team (JVT)
Mar. 2003: approved as ITU-T H.264 and ISO/IEC MPEG-4 Part 10
Advanced Video Coding (AVC)
Jul 2004: Fidelity Range Extensions (FRExt)
Current: Scalability Extensions
Default YouTube format
Objectives:
50% bit rate savings compared to MPEG-2
High quality video at both low and high bit rates:
64kbps to 240Mbps
Network-friendly: more error resilient tools
Support both conversational and non-conversational applications:
Conversational: video conference
Non-conversational: storage, broadcast, streaming
H.264 Design
Current Status: Draft design adopted in Aug 1999 and has
evolved into a test model long term (TML) reference design
Goals
Enhanced compression performance
Provision of network-friendly, packet-based video
representation addressing the conversational and non-conversational applications
Conceptual separation between Video Coding Layer (VCL)
and Network Adaptation Layer (NAL)
H.264 Design (Contd.)
[Figure: Video Coding Layer (macro-block), Data Partitioning (slice/partition), and Network Adaptation Layer, with Control Data]
H.264 Design ( Contd.)
Video Coding Layer
Core High compression representation
Block based motion compensated transform video coder
New features enabled to achieve significant improvement in
coding efficiency.
Network Adaptation Layer
Provides the ability to customize the format of the VCL data
over a variety of networks
Unique packet based interface
Packetisation and appropriate signaling are part of the NAL
specification
(not necessarily the H.264 specification)
New Developments for/beyond H.264
H.265/MPEG-H: High Efficiency Video Coding (HEVC)
50% goal (bitrate reduction)
Started in 2010
February 2012: Committee Draft (complete draft of standard)
July 2012: Draft International Standard
January 2013: Final Draft International Standard
July 7: Formal release
Scalable video coding
Multiview video/3D video
4K UHD