• Media Coding and Content Processing:
Chapter 3 (Sections 3.1 … 3.4)
• Multimedia Signals and Systems (Chapter 2)
• Fundamentals of Multimedia (Chapter 6)
• The self-reading material, Section 2.2 of Multimedia Signals and Systems (Human Auditory System), will be on Quiz 2.
• Audio is a wave resulting from air-pressure disturbances that reach our eardrums, generating the sound we hear.
– Humans can hear frequencies in the range 20 – 20,000 Hz.
• ‘Acoustics’ is the branch of physics that studies sound.
Characteristics of Audio
• Audio has normal wave properties
• A sound wave has several different attributes:
– Amplitude (loudness/intensity)
– Frequency (pitch)
– Envelope (waveform)
• Audio amplitude is often expressed in decibels (dB).
• Sound pressure levels (loudness or volume) are measured on a logarithmic scale (decibel, dB) used to describe a ratio
– Suppose we have two loudspeakers, the first playing a sound with power P1, and another playing a louder version of the same sound with power P2, but everything else (how far away, frequency) is kept the same.
– The difference in decibels between the two is defined as
10 log10 (P2/P1) dB
• In microphones, audio is captured as analog signals (continuous in amplitude and time) that respond proportionally to the sound pressure, p.
• The power in a sound wave, all else equal, goes as the square of the pressure.
– Sound pressure is expressed in dynes/cm2.
• Since power goes as the square of pressure, the difference in sound pressure level between two sounds with pressures p1 and p2 is therefore 10 log10 ((p2/p1)^2) = 20 log10 (p2/p1) dB
• The “acoustic amplitude” of sound is measured in
reference to p1 = pref = 0.0002 dynes/cm2.
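As a quick sanity check of the two decibel formulas, here is a minimal MATLAB sketch (all values are made up for illustration):
P1 = 1.0; P2 = 4.0;              % hypothetical acoustic powers
dB_power = 10 * log10(P2 / P1)   % difference in dB, about 6 dB
p_ref = 0.0002;                  % reference pressure in dynes/cm2
p = 0.02;                        % hypothetical measured pressure
SPL = 20 * log10(p / p_ref)      % sound pressure level: 40 dB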
– The human ear is insensitive to sound pressure levels below the threshold of hearing (0 dB).
• Typical sound pressure levels range from the threshold of hearing (0 dB), through rustling paper, recording-studio, residential and office ambient levels, home audio listening levels (60 - 70 dB) and heavy road traffic, up to the threshold of pain (120 - 130 dB) and a rock singer screaming into a microphone.
• Audio frequency is the number of high-to-low pressure cycles that occur per second.
– In music, frequency is referred to as pitch.
• Different living organisms have different abilities to hear high-frequency sounds:
– Dogs: up to 50 kHz
– Cats: up to 60 kHz
– Bats: up to 120 kHz
– Dolphins: up to 160 kHz
• The range 20 Hz – 20 kHz is called the audible band.
• The exact audible band differs from one person to another and deteriorates with age.
• The frequency range of sounds can be divided into bands:
– Infrasound: 0 Hz – 20 Hz
– Audible sound: 20 Hz – 20 kHz
– Ultrasound: 20 kHz – 1 GHz
– Hypersound: 1 GHz – 10 GHz
• Sound waves propagate at a speed of around 344 m/s in humid air at room temperature (20 °C).
– Hence, since wavelength = speed/frequency, audio wavelengths typically vary from 17 m (corresponding to 20 Hz) to 1.7 cm (corresponding to 20 kHz).
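The wavelength arithmetic follows directly in MATLAB (speed and band edges taken from above):
v = 344;                 % speed of sound in m/s
f = [20, 20000];         % audible band edges in Hz
lambda = v ./ f          % about 17.2 m down to about 1.72 cm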
• Sound can be divided into periodic (e.g. whistling wind, bird songs, sound from music) and nonperiodic (e.g. speech, sneezes and rushing water).
• Most sounds are combinations of different frequencies and wave shapes. Hence, the spectrum of a typical audio signal contains one or more fundamental frequencies, their harmonics, and possibly a few cross-modulation products.
– The fundamental frequency is the lowest, dominant frequency in the spectrum.
• The harmonics and their amplitudes determine the tone quality, or timbre.
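To hear the effect of harmonics on timbre, the following MATLAB sketch builds a tone from a fundamental plus two harmonics; the harmonic amplitudes (1.0, 0.5, 0.25) are arbitrary choices, and changing them changes the timbre while the pitch stays the same:
fs = 8000;               % sampling rate in Hz (assumed)
t = 0:1/fs:1;            % one second of samples
f0 = 440;                % fundamental frequency (concert A)
y = 1.0*sin(2*pi*f0*t) + 0.5*sin(2*pi*2*f0*t) + 0.25*sin(2*pi*3*f0*t);
sound(y / max(abs(y)), fs);   % normalize and play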
• When sound is generated, it does not last
forever. The rise and fall of the intensity of
the sound is known as the envelope.
• A typical envelope consists of four sections:
attack, decay, sustain and release.
• Attack: The intensity of a note increases from silence to a peak level.
• Decay: The intensity decreases to a middle level.
• Sustain: The middle level is sustained for a short period of time.
• Release: The intensity drops from the sustain level to zero.
• Different instruments have different envelopes:
– Violin notes have slower attacks but a longer sustain.
– Guitar notes have quick attacks and a slower decay.
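A piecewise-linear ADSR envelope is easy to sketch in MATLAB; the segment durations and sustain level below are arbitrary, illustrative values:
fs = 8000;                              % sampling rate (assumed)
t = 0:1/fs:1;                           % one second
tone = sin(2*pi*440*t);                 % raw note, constant intensity
nA = round(0.1*fs);                     % attack: 0.1 s
nD = round(0.1*fs);                     % decay: 0.1 s
nR = round(0.2*fs);                     % release: 0.2 s
nS = numel(t) - nA - nD - nR;           % sustain fills the rest
env = [linspace(0,1,nA), linspace(1,0.6,nD), 0.6*ones(1,nS), linspace(0.6,0,nR)];
sound(tone .* env, fs);                 % play the enveloped note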
Audio Signal Representation
• Waveform representation
– Focuses on the exact representation of the
produced audio signal.
• Parametric form representation
– Focuses on modeling how the signal is generated
– Two major forms
• Music synthesis (MIDI Standard)
• Speech synthesis
Audio Generation and Playback
• To get audio (or video for that matter) into a
computer, we must digitize it (convert it into
a stream of numbers).
• This is achieved through sampling,
quantization, and coding.
• Sampling: The process of converting continuous time into discrete time instants.
1. Time axis divided into fixed intervals
2. A reading of the instantaneous value of the analog signal is taken at the beginning of each time interval (each interval is marked off by a clock pulse)
3. The frequency of the clock is called the sampling rate or sampling frequency
• The sampled value is held constant for the next time interval (sample-and-hold circuit)
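These steps can be visualized with a short MATLAB sketch (signal frequency and clock rate are assumed values); stairs() mimics the sample-and-hold output:
f = 5;                            % analog signal frequency in Hz
fs = 50;                          % sampling rate (clock frequency) in Hz
tc = 0:0.0001:1;                  % dense grid standing in for continuous time
ts = 0:1/fs:1;                    % sampling instants (one per clock pulse)
xs = sin(2*pi*f*ts);              % instantaneous readings of the signal
plot(tc, sin(2*pi*f*tc)); hold on;
stairs(ts, xs);                   % value held until the next clock pulse
hold off;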
• Quantization: The process of converting continuous sample values into discrete values.
– The size of the quantization interval is called the quantization step.
– How many values can 4-bit quantization represent? (2^4 = 16.) 8-bit? (256.) 16-bit? (65,536.)
• The higher the quantization resolution, the closer the resulting sound is to the original.
• Coding: The process of representing quantized values in binary form.
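A minimal sketch of uniform quantization in MATLAB (bit depth and input signal are assumed): an n-bit quantizer divides the amplitude range into 2^n steps and rounds each sample to the nearest level:
n = 4;                            % bits per sample, so 2^4 = 16 levels
x = sin(2*pi*5*(0:1/1000:1));     % sample values in [-1, 1]
step = 2 / 2^n;                   % quantization interval over [-1, 1]
xq = step * round(x / step);      % round each sample to the nearest level
plot(x); hold on; plot(xq); hold off;   % original vs. quantized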
Analog to Digital Conversion
• Musical sound, unlike other types of sound, can be generated synthetically.
• Therefore, the Musical Instrument Digital Interface (MIDI) standard has been developed.
– The standard emerged in its final form in 1983.
– A music description language in binary form
• A given piece of music is represented by a sequence of numbers that specify how the musical instruments are to be played at different time instants.
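As an illustration of such a “sequence of numbers”, the MATLAB sketch below writes a minimal format-0 MIDI file containing a single middle-C quarter note; the byte layout follows the standard MIDI file format, while the file name, tick resolution, and note values are made-up choices:
header = [uint8('MThd'), 0 0 0 6, 0 0, 0 1, 0 96];  % header chunk: format 0, 1 track, 96 ticks/quarter
track = [0 hex2dec('C0') 0, ...                     % delta 0: program change to instrument 0
         0 hex2dec('90') 60 64, ...                 % delta 0: note-on, middle C (60), velocity 64
         96 hex2dec('80') 60 64, ...                % delta 96 ticks: note-off
         0 hex2dec('FF') hex2dec('2F') 0];          % delta 0: end-of-track meta event
chunk = [uint8('MTrk'), 0 0 0 numel(track), uint8(track)];
fid = fopen('one_note.mid', 'w');                   % hypothetical output file
fwrite(fid, [header, chunk], 'uint8');
fclose(fid);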
• A MIDI studio consists of:
– Controller: A musical performance device that generates a MIDI signal when played.
• MIDI Signal: A sequence of numbers representing a certain musical performance.
– Synthesizer: A piano-style keyboard musical instrument
that simulates the sound of real musical instruments
– Sequencer: A device or a computer program that
records a MIDI signal.
– Sound Module: A device that produces pre-recorded samples when triggered by a MIDI controller or sequencer.
MIDI File Organization
• A MIDI file contains control information (e.g. the start/end of a score) along with the actual music data.
• The MIDI standard specifies 16 channels
– A MIDI device is mapped onto one channel
• E.g. MIDI guitar controller, MIDI wind machine, etc.
– 128 instruments are identified by the MIDI standard, e.g.:
• Electric grand piano (2)
• Telephone ring (124)
• A synthesizer can play a single score (e.g. flute) vs. multiple scores (e.g. organ).
• The maximum number of scores that can be played concurrently is an important property of a synthesizer.
– Typically 3 to 16 scores per channel.
3D Sound Projection
• Experimentation has shown that the use of two-channel sound (stereo) produces the best hearing effects for most listeners.
• Direct sound path: The shortest path between the sound
source and the auditor.
– Carries the first sound waves to the auditor's head.
• All other sound paths are reflected
• All sound paths leading to the human ear are influenced by the auditor's individual head-related transfer function (HRTF)
– HRTF is a function of the path’s direction (horizontal and vertical
angles) to the auditor.
Pulse response in a closed room
• What determines the quality of the sound heard in a room?
Basic Types of a Digital Signal
• Unit Impulse Function δ[n]: equal to 1 at n = 0, and 0 otherwise
• Unit Step Function u[n]: equal to 1 for n ≥ 0, and 0 otherwise
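Both functions are one-liners in MATLAB using logical comparisons (a small sketch over an assumed index range):
n = -5:5;                 % sample indices
delta = (n == 0);         % unit impulse: 1 at n = 0, else 0
u = (n >= 0);             % unit step: 1 for n >= 0, else 0
stem(n, delta); hold on; stem(n, u); hold off;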
• To plot the sinc function in MATLAB (sinc is provided by the Signal Processing Toolbox):
x = linspace(-5,5);   % 100 points spanning [-5, 5]
y = sinc(x);          % normalized sinc: sin(pi*x)/(pi*x)
plot(x, y);           % display the curve
Determining the Sampling Rate
• Suppose we are sampling a sine wave. How often do we need to sample it to be able to reconstruct it exactly?
• If the highest frequency contained in an analog signal is B and the signal is sampled at a rate F > 2B, then the signal can be exactly recovered from its sample values.
• F=2B is called the Nyquist Rate.
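The theorem can be seen in a small MATLAB experiment (the frequencies are assumed values): sampled above 2B the sine keeps its shape, while sampled below 2B the samples trace out a lower-frequency alias:
B = 100;                          % highest (and only) frequency in the signal
t = 0:0.00001:0.05;               % dense grid approximating continuous time
fs_ok = 2.5 * B;                  % above the Nyquist rate 2B
fs_bad = 1.2 * B;                 % below the Nyquist rate
n_ok = 0:1/fs_ok:0.05;
n_bad = 0:1/fs_bad:0.05;
plot(t, sin(2*pi*B*t)); hold on;
stem(n_ok, sin(2*pi*B*n_ok));     % enough samples per cycle to recover
stem(n_bad, sin(2*pi*B*n_bad));   % too few samples: aliasing
hold off;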
• Quantization determines the amplitude fidelity of the digital signal relative to the original analog signal.
• Quantization error (noise) is the maximum difference between the quantized sample values and the analog signal values.
• The digital signal quality relative to the original signal is
measured by the signal to noise ratio (SNR).
– SNR = 20 log10 (S/N), where S is the maximum signal amplitude and N is the quantization noise (= the quantization step).
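Evaluating the formula for an n-bit quantizer in MATLAB (a sketch with an assumed bit depth, taking the full-scale amplitude as S = 1):
n = 8;                   % bits per sample (assumed)
S = 1;                   % maximum signal amplitude
N = 2*S / 2^n;           % quantization step over the range [-S, S]
SNR = 20 * log10(S / N)  % about 42 dB for 8 bits with this definition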