Auditory Perception: Lecture for IEOR 170 Spring 2006

Auditory Perception
Sound Models
Cecilia R. Aragon
IEOR 170
UC Berkeley
Spring 2006
• “How the Ear Functions,”
• Brian Bailey,
• Dan Russell,
• James Hillenbrand,
• Lawrence Rosenblum, (McGurk effect)
• Andrew Green,
How the Ear Functions
Physical Dimensions of Sound
Perceptual Dimensions of Sound
Sound Intensity and the Decibel Scale
Pitch Perception
Loudness Perception
Timbre Perception
Digitization of Sound
How the Ear Functions
Physical Dimensions of Sound
• Periodic disturbances that travel through a
medium (e.g. air or water)
• Transport energy
• “What is a Wave?” Dan Russell,
• A longitudinal, mechanical wave
– caused by a vibrating source
• Pack molecules at different densities
– cause small changes in pressure
• Model pressure differences as sine waves
Sound Waves
• Pure Tones - simple waves
• Harmonics - complex waves consisting of
combinations of pure tones (Fourier analysis)
– the quality of tone or its timbre (i.e. the
difference between a given note on a trumpet
and the same note on a violin) is given by the
harmonics
Changes in Air Pressure
Process of Hearing (Transduction)
Frequency (temporal) Theory
• Periodic stimulation of membrane matches
frequency of sound
– one electrical impulse at every peak
– maps time differences of pulses to pitch
• Firing rate of neurons is far below the frequencies that
a person can hear
– Volley theory: groups of neurons fire in a well-coordinated sequence
Place Theory
• Waves move down basilar membrane
– stimulation increases, peaks, and quickly tapers
– location of peak depends on frequency of the sound, lower
frequencies being further away
Physical Dimensions of Sound
• Amplitude
– height of a cycle
– relates to loudness
• Wavelength (λ)
– distance between peaks
• Frequency (f)
– cycles per second
– relates to pitch
– f × λ = velocity
• Most sounds mix many
frequencies & amplitudes
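The relation frequency × wavelength = velocity can be checked numerically. The sketch below is illustrative only: the speed of sound (343 m/s, roughly air at 20 °C) is an assumed value that does not appear in the slides, and the function name is made up for this example.

```python
SPEED_OF_SOUND_AIR = 343.0  # m/s; assumed value for air at about 20 °C


def wavelength_m(frequency_hz, velocity=SPEED_OF_SOUND_AIR):
    """Wavelength in metres, from velocity = frequency * wavelength."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return velocity / frequency_hz


# Wavelengths at the edges of the audible range and at 440 Hz.
for f in (20.0, 440.0, 20000.0):
    print(f"{f:8.0f} Hz -> {wavelength_m(f):.4f} m")
```

Under this assumption, audible wavelengths in air span roughly 17 m down to under 2 cm.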
Sound is repetitive changes in air pressure over time
Perceptual Dimensions of Sound
Auditory Perception
Auditory perception is a branch of psychophysics.
Psychophysics studies relationships between perception and
physical properties of stimuli.
Physical dimensions: Aspects of a physical stimulus that can
be measured with an instrument (e.g., a light meter, a sound
level meter, a spectrum analyzer, a fundamental frequency
meter, etc.)
Perceptual dimensions: These are the mental experiences
that occur inside the mind of the observer. These experiences
are actively created by the sensory system and brain based on
an analysis of the physical properties of the stimulus. Perceptual
dimensions can be measured, but not with a meter. Measuring
perceptual dimensions requires an observer (e.g., a listener).
Visual Psychophysics:
Perceptual Dimensions      Physical Properties of Light
Auditory Psychophysics:
Perceptual Dimensions      Physical Properties of Sound
Pitch                      Fundamental Frequency
Loudness                   Intensity
Timbre (sound quality)     Spectrum Envelope/Amp Env
The Three Main Perceptual Attributes of Sound
•Pitch (not fundamental frequency)
•Loudness (not intensity)
•Timbre (not spectrum envelope or amplitude envelope)
The terms pitch, loudness, and timbre refer not to the physical
characteristics of sound, but to the mental experiences that
occur in the minds of listeners.
Perceptual Dimensions
• Pitch
– higher frequencies perceived as higher pitch
– humans hear sounds in 20 Hz to 20,000 Hz range
• Loudness
– higher amplitude results in louder sounds
– measured in decibels (dB); 0 dB represents the threshold of hearing
Perceptual Dimensions (cont.)
• Timbre
– complex patterns added to the lowest, or
fundamental, frequency of a sound,
referred to as spectrum envelope
– spectrum envelopes enable us to
distinguish musical instruments
• Multiples of fundamental frequency give
• Multiples of unrelated frequencies give
Sound Intensity and the Decibel Scale
Sound Intensity
• Intensity (I) of a wave is the rate at which sound energy
flows through a unit area (A) perpendicular to the
direction of travel
$$I = \frac{1}{A}\,\frac{E}{t} = \frac{P}{A}$$
P measured in watts (W), A measured in $\text{m}^2$
• Threshold of hearing $I_0$ is at $10^{-12}$ W/$\text{m}^2$
• Threshold of pain is at 1 W/$\text{m}^2$
Decibel Scale
• Describes intensity relative to threshold of
hearing based on multiples of 10
$$\text{dB} = 10 \log_{10}\left(\frac{I}{I_0}\right)$$
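The decibel definition can be turned into a pair of conversion helpers. This is a minimal sketch: the function names are hypothetical, and the reference intensity is the threshold of hearing given earlier (10^-12 W/m^2).

```python
import math

I0 = 1e-12  # W/m^2, threshold of hearing (reference intensity)


def intensity_to_db(intensity, reference=I0):
    """Sound level in dB for an intensity given in W/m^2."""
    if intensity <= 0:
        raise ValueError("intensity must be positive")
    return 10.0 * math.log10(intensity / reference)


def db_to_intensity(level_db, reference=I0):
    """Inverse conversion: intensity in W/m^2 for a level in dB."""
    return reference * 10.0 ** (level_db / 10.0)


# Levels for the two thresholds and one intermediate intensity.
for intensity in (1e-12, 1e-6, 1.0):
    print(f"{intensity:.0e} W/m^2 -> {intensity_to_db(intensity):.1f} dB")
```

Each factor of 10 in intensity adds 10 dB, so the twelve orders of magnitude between the two thresholds map onto a 0 to 120 dB scale.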
Decibels of Everyday Sounds
• Rustling leaves
• Ambient office noise
• Auto traffic
• Jet motor
• Spacecraft launch
Interpretation of Decibel Scale
• 0 dB = threshold of hearing (TOH)
• 10 dB = 10 times more intense than TOH
• 20 dB = 100 times more intense than TOH
• 30 dB = 1000 times more intense than TOH
• An increase of 10 dB means that the intensity of the
sound increases by a factor of 10
• If a sound is 10^x times more intense than another, then it
has a sound level that is 10·x decibels higher than the
less intense sound
Loudness from Multiple Sources
• Use energy combination equation
$$L = 10 \log_{10}\left(10^{L_1/10} + 10^{L_2/10} + \dots + 10^{L_n/10}\right)$$
where L1, L2, …, Ln are in dB
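The energy combination equation converts each level back to a relative intensity, sums the intensities, and converts the total back to decibels. A minimal sketch, with a hypothetical function name and illustrative input levels:

```python
import math


def combine_levels_db(levels_db):
    """Combined level in dB of several sources, each given in dB."""
    if not levels_db:
        raise ValueError("need at least one level")
    # 10 ** (L / 10) is the intensity relative to the reference.
    total_relative_intensity = sum(10.0 ** (level / 10.0) for level in levels_db)
    return 10.0 * math.log10(total_relative_intensity)


# Two equal 60 dB sources do not add up to 120 dB.
print(round(combine_levels_db([60.0, 60.0]), 2))  # 63.01
```

Decibel values are never added directly: two equal sources raise the level by only about 3 dB.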
• Show that the threshold of hearing is at 0 dB
• Show that the threshold of pain is at 120 dB
• Suppose an electric fan produces an intensity of 40 dB. How
many times more intense is the sound of a conversation if it
produces an intensity of 60 dB?
• One guitar produces 45 dB while another produces 50 dB. What
is the dB reading when both are played?
• If you double the physical intensity of a sound, how many more
decibels is the resulting sound?
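One possible way to work through these exercises, using only the decibel definition and the energy combination equation (offered as a sketch, not as the official solutions):

```latex
% 1. Threshold of hearing: I = I_0
10 \log_{10}\left(\frac{I_0}{I_0}\right) = 10 \log_{10}(1) = 0 \text{ dB}

% 2. Threshold of pain: I = 1 W/m^2, I_0 = 10^{-12} W/m^2
10 \log_{10}\left(\frac{1}{10^{-12}}\right) = 10 \times 12 = 120 \text{ dB}

% 3. Fan at 40 dB vs. conversation at 60 dB
\frac{I_{60}}{I_{40}} = 10^{(60 - 40)/10} = 10^{2} = 100

% 4. Two guitars at 45 dB and 50 dB
L = 10 \log_{10}\left(10^{4.5} + 10^{5}\right) \approx 51.2 \text{ dB}

% 5. Doubling the intensity
10 \log_{10}(2) \approx 3.01 \text{ dB}
```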
Pitch Perception
Pitch and Fundamental Frequency
All else being equal, the higher the F0, the higher the perceived pitch.
Lower F0, lower pitch
Higher F0, higher pitch
Pitch Perception
The ear is more sensitive to F0 differences in the low
frequencies than the higher frequencies. This means that:
300 vs. 350 Hz ≠ 3000 vs. 3050 Hz
That is, the difference in perceived pitch (not F0) between
300 and 350 Hz is NOT the same as the difference in pitch
between 3000 and 3050 Hz, even though the physical
differences in F0 are the same.
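One way to see why equal differences in Hz are not equal differences in pitch is to compare frequency ratios instead. The sketch below assumes that pitch intervals roughly follow frequency ratios. That is a common approximation which the slides do not state. It uses the musical cent (1200 cents per octave), and the function name is illustrative.

```python
import math


def interval_cents(f_low_hz, f_high_hz):
    """Size of the interval between two frequencies, in cents."""
    return 1200.0 * math.log2(f_high_hz / f_low_hz)


# Same 50 Hz difference, very different intervals.
print(round(interval_cents(300.0, 350.0)))    # 267
print(round(interval_cents(3000.0, 3050.0)))  # 29
```

Under this assumption the 300 to 350 Hz step is nearly three semitones, while the 3000 to 3050 Hz step is well under half a semitone.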
Music Perception
•Tone height: A sound
quality whereby a sound is
heard to be of higher or
lower pitch; monotonically
related to frequency
•Tone chroma: A sound
quality shared by tones
that are separated by one
or more octaves
•Musical helix: Can help
visualize musical pitch
Harmonic Frequencies
• Strings or pipes (trombone,
flute, organ) all have
resonant frequencies.
• They may vibrate at that
frequency or some multiple
of it
• All instruments and voices
carry some harmonics and
dampen others
[Figure labels: 1 octave, 2 octaves, 3 octaves; axis: Length of string or pipe]
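The link between multiples of a resonant frequency and octaves can be sketched as follows. The 110 Hz fundamental and the function names are illustrative assumptions. Each doubling of frequency is one octave, so harmonics 2, 4, and 8 lie 1, 2, and 3 octaves above the fundamental.

```python
import math


def harmonics(fundamental_hz, count):
    """First `count` integer multiples of the fundamental frequency."""
    return [n * fundamental_hz for n in range(1, count + 1)]


def octaves_above(fundamental_hz, frequency_hz):
    """Distance from the fundamental in octaves (log base 2 of the ratio)."""
    return math.log2(frequency_hz / fundamental_hz)


# List each harmonic with its distance from the fundamental.
for f in harmonics(110.0, 8):
    print(f"{f:6.1f} Hz  {octaves_above(110.0, f):.3f} octaves")
```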
Loudness Perception
Loudness and Intensity
All else being equal, the higher the intensity, the greater the loudness.
Higher intensity, higher loudness
Lower intensity, lower loudness
The relationship between intensity and loudness
Doubling intensity does not double loudness. In order to double loudness,
intensity must be increased by a factor of 10, or by 10 dB [10 x log10 (10)
= 10 x 1 = 10 dB]. This is called the 10 dB rule.
Two signals differing by 10 dB:
(500 Hz sinusoids)
Note that the more intense sound is NOT 10 times louder even though it
is 10 times more intense.
The 10 dB rule means that a 70 dB signal is twice as loud as a 60 dB
signal, four times as loud as a 50 dB signal, eight times as loud as a 40
dB signal, etc.
A 30 dB hearing loss is considered mild -- just outside the range of
normal hearing. Based on the 10 dB rule, how much is loudness affected
by a 30 dB hearing loss?
(Answer: 1/8th. But note that this does not mean that someone with a 30 dB loss will have 8 times more difficulty with
speech understanding than someone with normal hearing.)
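The 10 dB rule can be written as a one-line formula: every 10 dB change doubles or halves loudness. A minimal sketch with a hypothetical function name:

```python
def loudness_ratio(delta_db):
    """Approximate loudness ratio for a level change, per the 10 dB rule."""
    return 2.0 ** (delta_db / 10.0)


print(loudness_ratio(10.0))   # 2.0
print(loudness_ratio(30.0))   # 8.0
print(loudness_ratio(-30.0))  # 0.125, i.e. 1/8th for a 30 dB loss
```

This is a rule of thumb for perceived loudness only. The same 30 dB change corresponds to a factor of 1000 in physical intensity.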
Loudness Perception
Loudness is strongly affected by the frequency of
the signal. If intensity is held constant, a mid-frequency
signal (in the range from ~1000-4000 Hz) will be louder
than lower or higher frequency signals.
125 Hz, 3000 Hz, 8000 Hz
The 3000 Hz signal should appear louder than the
125 Hz or the 8000 Hz signal, despite the fact that their
intensities are equal.
Loudness and Pitch
• More sensitive to loudness at mid frequencies
than at other frequencies
– intermediate frequencies at [500 Hz, 5000 Hz]
• Perceived loudness of a sound changes based
on the frequency of that sound
– basilar membrane reacts more to intermediate
frequencies than other frequencies
Audibility Thresholds
Fletcher-Munson Contours
Each contour represents an equal perceived loudness
Human Auditory Spectrum
• < 20 Hz - infrasound
• > 20 kHz - ultrasound
• human auditory range decreases with age
• TV 17.7 kHz horizontal scanning
• “ultrasonic” cleaning devices, burglar alarms (20-40 kHz)
• CD 20 kHz cutoff, LP 60-80 kHz
Exposure to Loud Noise
Timbre Perception
Timbre, also known as sound quality or tone color, is oddly defined in
terms of what it is not:
When two sounds are heard that match for pitch, loudness, and duration,
and a difference can still be heard between the two sounds, that
difference is called timbre.
For example: a clarinet, a saxophone, and a piano all play a middle C at
the same loudness and same duration. Each of these instruments has a
unique sound quality. This difference is called timbre, tone color, or
simply sound quality.
There are also many examples of timbre difference in speech. For
example, two vowels (e.g., /ɑ/ and /i/) spoken at the same loudness and
same pitch differ from one another in timbre.
There are two physical correlates of timbre:
• spectrum envelope
• amplitude envelope
Timbre and Spectrum Envelope
Timbre differences between one musical instrument and another are
partly related to differences in spectrum envelope -- differences in the
relative amplitudes of the individual harmonics. In the examples above,
we would expect all of these sounds to have the same pitch because the
harmonic spacing is the same in all cases. The timbre differences that
you would hear are controlled in part by the differences in the shape of
the spectrum envelope.
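The idea of equal harmonic spacing with different relative harmonic amplitudes can be sketched with simple additive synthesis. Everything specific here is an assumption made for illustration: the 200 Hz fundamental, the 8000 Hz sample rate, the two amplitude lists, and the function name.

```python
import math


def additive_tone(f0_hz, harmonic_amps, sample_rate_hz=8000, n_samples=80):
    """Sum sine harmonics of f0; harmonic_amps[k] scales harmonic k + 1."""
    samples = []
    for n in range(n_samples):
        t = n / sample_rate_hz
        value = sum(
            amp * math.sin(2.0 * math.pi * (k + 1) * f0_hz * t)
            for k, amp in enumerate(harmonic_amps)
        )
        samples.append(value)
    return samples


# Same F0 and harmonic spacing, two made-up spectrum envelopes.
bright = additive_tone(200.0, [1.0, 0.8, 0.6, 0.4])
mellow = additive_tone(200.0, [1.0, 0.3, 0.1, 0.0])

# Both waveforms repeat every 40 samples (8000 / 200), but their shapes differ.
print(abs(bright[5] - bright[45]) < 1e-9, bright != mellow)
```

Both tones share the same period, which is why they would be expected to have the same pitch. Only the waveform shape, and hence the timbre, changes.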
Six Synthesized Sounds Differing in Spectrum Envelope
Note the similarities in pitch (due to constant F0/harmonic spacing) and the
differences in timbre or sound quality.
Vowels Also Differ in Spectrum Envelope
Shown here are the smoothed envelopes only (i.e., the harmonic fine structure is not
shown) of 10 American-English vowels.* Note that each vowel has a unique shape to its
spectrum envelope. Perceptually, these sounds differ from one another in timbre.
Purely as a matter of convention, the term timbre is seldom used by phoneticians, although
it applies just as well here as it does in musical acoustics. In phonetics, timbre differences
among vowels are typically referred to as differences in vowel quality or vowel color.
*Hillenbrand, J.M., Houde, R.A., Clark, M.J., and Nearey, T.M. Vowel recognition from harmonic spectra. Acoustical Society of America, Berlin, March,
Aperiodic sounds can also differ in spectrum envelope, and the perceptual
differences are properly described as timbre differences.
Amplitude Envelope
• Timbre is also affected by amplitude envelope
– sometimes called the amplitude contour or energy contour of the
sound wave
– the way sounds are turned on and turned off
Leading edge = attack
Trailing edge = decay
The attack especially has a large effect on timbre.
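An amplitude envelope can be sketched as a gain curve that ramps up during the attack and down during the decay. The linear ramps, the parameter names, and the sample counts below are simplifying assumptions made for illustration. Real instrument envelopes are not linear.

```python
def linear_envelope(n_samples, attack_samples, decay_samples):
    """Gain per sample: linear rise (attack), flat top, linear fall (decay)."""
    envelope = []
    for i in range(n_samples):
        rise = (i + 1) / attack_samples if attack_samples > 0 else 1.0
        fall = (n_samples - i) / decay_samples if decay_samples > 0 else 1.0
        envelope.append(min(1.0, rise, fall))
    return envelope


def apply_envelope(samples, envelope):
    """Multiply a waveform by an envelope, sample by sample."""
    return [s * g for s, g in zip(samples, envelope)]


abrupt = linear_envelope(10, 1, 5)   # full gain from the first sample
gradual = linear_envelope(10, 5, 5)  # gain builds up over five samples
print(abrupt[:3], gradual[:3])
```

Applying `abrupt` and `gradual` to the same sustained waveform leaves its spectrum envelope untouched while changing how the sound is turned on.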
Music examples
(timbre differences related to amplitude envelope)
Plucked vs. bowed stringed instruments
The damping pedal on a piano
The difference in sound quality between a hammered string (e.g.,
a piano) and a string that is plucked by a quill (e.g., a
harpsichord)
The timbre differences that distinguish one musical instrument from
another appear to be more closely related to differences in amplitude
envelope -- and especially the attack -- than to the shape of the
spectrum envelope (although both play a role). For example, when the
amplitude contour of an oboe tone is imposed on a violin tone, the
resulting tone sounds more like an oboe than a violin.*
*White, G.D. The Audio Dictionary, 1987, Seattle: University of Washington Press.
Same melody, same spectrum envelope (if sustained), different
amplitude envelopes (i.e., different attack and decay characteristics).
Note differences in timbre or sound quality as the amplitude envelope
changes.
Timbre differences related to amplitude envelope also play a role in
speech. Note the differences in the shape of the attack for /b/ vs.
/w/ and /tʃ/ vs. /ʃ/.
/b/: abrupt attack
/w/: more gradual attack
/tʃ/: abrupt attack
/ʃ/: more gradual attack
Hearing Lips and Seeing Voices
(The McGurk Effect)
Digitization of Sound
[Steinmetz and Nahrstedt]
Microphones, video cameras produce analog signals
(continuous-valued voltages)
To get audio or video into a computer, we must digitize it
(convert it into a stream of numbers)
So, we have to understand discrete sampling (both time
and voltage)
Discrete Sampling
• Sampling -- divide the horizontal axis (the time
dimension) into discrete pieces. Uniform sampling is
• Quantization -- divide the vertical axis (signal strength)
into pieces. Sometimes, a non-linear function is applied.
– 8-bit quantization divides the vertical axis into 256
levels; 16-bit gives 65,536 levels.
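Quantization can be sketched as mapping each sample onto one of 2^bits equally spaced levels. This sketch assumes a uniform (linear) quantizer and samples normalized to the range [-1.0, 1.0]. The function names are hypothetical, and the non-linear variants mentioned above are not covered.

```python
def quantize(sample, bits):
    """Map a sample in [-1.0, 1.0] to an integer level in [0, 2**bits - 1]."""
    levels = 2 ** bits
    clipped = max(-1.0, min(1.0, sample))
    index = int((clipped + 1.0) / 2.0 * levels)
    return min(index, levels - 1)  # sample == 1.0 falls into the top level


def dequantize(index, bits):
    """Return the mid-point of the given level, back in [-1.0, 1.0]."""
    levels = 2 ** bits
    return (index + 0.5) / levels * 2.0 - 1.0


for bits in (8, 16):
    level = quantize(0.3, bits)
    error = abs(dequantize(level, bits) - 0.3)
    print(bits, 2 ** bits, level, f"{error:.6f}")
```

The rounding error shrinks as the number of bits grows, since each extra bit halves the width of a level.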
Sampling (in time)
• Measure amplitude at regular intervals
• How many times should we sample?
Nyquist Theorem
• Suppose we are sampling a sine wave. How often do we need to
sample it to figure out its frequency?
– If we sample at 1 time per cycle, we may mistake it for a constant.
Nyquist Rate
– If we sample at 1.5 times per cycle, we may mistake it for a
lower frequency sine wave.
• Nyquist rate -- "For lossless digitization, the sampling
rate should be at least twice the maximum frequency."
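Both under-sampling cases can be reproduced numerically. In this sketch the 100 Hz test tone, the sample rates, and the function names are illustrative assumptions. The folding formula is the standard aliasing relation for a real sinusoid, which the slides do not spell out.

```python
import math


def sample_sine(freq_hz, sample_rate_hz, n_samples, phase=0.0):
    """Samples of sin(2*pi*f*t + phase) taken at the given sample rate."""
    return [
        math.sin(2.0 * math.pi * freq_hz * n / sample_rate_hz + phase)
        for n in range(n_samples)
    ]


def alias_frequency(freq_hz, sample_rate_hz):
    """Apparent frequency after sampling, folded into [0, sample_rate / 2]."""
    folded = freq_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)


# 1 sample per cycle: every sample hits the same point on the wave.
once_per_cycle = sample_sine(100.0, 100.0, 8, phase=1.0)
print(max(once_per_cycle) - min(once_per_cycle) < 1e-9)  # True

# 1.5 samples per cycle: the tone appears at a lower frequency.
print(alias_frequency(100.0, 150.0))  # 50.0

# At more than 2 samples per cycle the frequency is preserved.
print(alias_frequency(100.0, 250.0))  # 100.0
```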
Digital Audio
• Standard music CD:
Sampling Rate: 44.1 kHz
16-bit samples
2-channel stereo
Data transfer rate = 2 × 16 × 44,100 = 1,411,200 bits/s ≈ 1.4 Mbits/s
1 hour of music = 1.4112 Mbits/s × 3,600 s ≈ 5,080 Mbits ≈ 635 MB
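The CD arithmetic can be reproduced in a few lines. The constant names are illustrative, and megabytes are taken as 10^6 bytes, an assumption that matches the 635 MB figure.

```python
SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16
CHANNELS = 2
SECONDS_PER_HOUR = 3_600

# Raw (uncompressed) audio data rate and storage for one hour.
bits_per_second = CHANNELS * BITS_PER_SAMPLE * SAMPLE_RATE_HZ
bytes_per_hour = bits_per_second * SECONDS_PER_HOUR // 8

print(bits_per_second)       # 1411200
print(bytes_per_hour / 1e6)  # 635.04
```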