Why do concert halls sound different – and how can we

Frequency response
adaptation in binaural
hearing
David Griesinger
Cambridge MA USA
www.DavidGriesinger.com
Introduction
• This paper proposes fundamental questions about the
properties of human hearing (the topic of this
conference)
– How do we localize sounds in the up/down and front/back
planes?
• Are the methods used different for different individuals?
• Can binaural recordings made for one individual be made to work
for another individual without head-tracking?
– Given the extremely non-uniform transfer of sound pressure from
the soundfield to a human eardrum, how can we accurately
perceive a frequency balance as “flat”?
– Does a frequency balanced pink noise from a frontal
loudspeaker sound balanced in frequency?
– If not, are commercial recordings, which are equalized using
loudspeakers, actually frequency balanced?
– If not – in what ways are they biased?
Better binaural technology
To answer these questions the author constructed
an accurate physical model of his own hearing, all
the way to the eardrum. The pinna compliance is
modeled by cutting away the inside of the casting.
The eardrum impedance is modeled with a
resistance tube. The ear canal is an accurate
silicone cast all the way to the eardrum.
Tiny probe microphones
were also built with a
very soft tip. This allows
binaural recording of
performances at the
author’s eardrums,
correct headphone
calibration, and
verification of the
accuracy of the dummy
head model.
A perplexing Discrepancy
• Recordings made with this technology provide
excellent localization accuracy.
– But at least initially the timbre of the playback through
carefully calibrated headphones seems incorrect.
– The frequencies around 3kHz seem too strong, and the
bass is usually weaker than my memory of the
performance.
• Checking and re-checking the calibrations has
convinced me the recordings and the playback are
correct.
– It is my memory of the performance that is flawed.
– The most reasonable explanation is that we continuously
adapt to the frequency balance of sounds around us. We
remember the timbre after such adaptation has taken
place.
A simple model of human hearing
Over a long period of time the brain builds spectral maps for the features that
define up/down and back/front information in HRTFs. When a sound is heard
these features are compared to the maps, and a localization is found.
A simple model of human hearing-2
When a match has been found, the perceptible features of the particular HRTF are
removed, again from a fixed spectral map.
But this spectrum is altered by a relatively short time constant adaptive equalizer,
which acts to make all frequency bands equally perceived.
The time constant of this mechanism for the author is about 5 minutes. It may be
shorter for some individuals.
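One way to picture the adaptive equalizer described above is as a per-band correction that slowly converges on the long-term spectral imbalance. The following is a minimal sketch only: the first-order (exponential) form, the function name `adapt_bands`, and the three-band stimulus are assumptions made for illustration. The slides give a time constant of about 5 minutes but no mathematical form.

```python
import numpy as np

def adapt_bands(band_levels_db, dt_s, tau_s=300.0):
    """Toy first-order adaptive equalizer (an assumed form, not the author's model).

    band_levels_db: array of shape (n_steps, n_bands), input band levels in dB.
    dt_s: time step in seconds. tau_s: time constant (about 5 minutes).
    Returns the band levels after the adaptive correction, same shape.
    """
    levels = np.asarray(band_levels_db, dtype=float)
    alpha = 1.0 - np.exp(-dt_s / tau_s)       # per-step smoothing factor
    # Deviation of each band from the across-band mean at each time step.
    deviation = levels - levels.mean(axis=1, keepdims=True)
    estimate = np.zeros(levels.shape[1])      # learned imbalance, starts at zero
    perceived = np.empty_like(levels)
    for i, dev in enumerate(deviation):
        perceived[i] = levels[i] - estimate   # apply what has been learned so far
        estimate += alpha * (dev - estimate)  # then move toward the current imbalance
    return perceived

# Hypothetical stimulus: a steady +6 dB excess in the middle band for 30 minutes.
steps = 1800
stimulus = np.tile([60.0, 66.0, 60.0], (steps, 1))
out = adapt_bands(stimulus, dt_s=1.0)
excess_start_db = out[0, 1] - out[0, 0]
excess_end_db = out[-1, 1] - out[-1, 0]
```

In this toy model the 6 dB excess is heard in full at first and has almost vanished after several time constants, which is the qualitative behavior described on the slide.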
An example
• The author once noticed a gliding whistle while walking
under an overhead ventilator slot that emitted broadband
noise.
– Walking rapidly (~3.5mph) under that noise source produced a
gliding whistle, somewhat like a Doppler shift.
– This is the uncorrected sound of the vertical HRTFs
– In spite of the lack of timbre correction the sound was correctly
localized – even at much higher speeds.
• No timbre shift was perceived when walking slowly under
the slot (<2mph).
– When there is sufficient time our brains correct the timbre – but
this correction takes time – in this case a fraction of a second.
Headphone listening
When we listen to binaural recordings with headphones the whole process
is broken. Headphones match individuals very poorly (as we will see).
None of the spectral features match the fixed HRTF maps. The brain is
confused, and the subject perceives the sound inside the head.
But the adaptive equalizer is still active – and after a time period the
sound is perceived as frequency balanced.
Consequences of adaptation for
sound engineers.
• Tonmeisters talk about “being familiar” with a particular
loudspeaker or studio.
– They claim they can make an accurately balanced recording with
these tools.
• A logical conclusion is that the timbre of loudspeakers or
playback equipment is irrelevant.
– As long as you are “familiar” with it everything is fine.
• But the conclusion is clearly false.
– A recent book by Floyd Toole details the changes in the
frequency content of popular records as fashion in monitor
loudspeakers changed.
– All sound reinforcement engineers are aware of how much
intelligibility can increase when a sound system is equalized.
This typically involves a treble boost above 1000Hz.
– Absolute frequency balance matters.
Upward Masking
Sound enters the basilar membrane at the oval
window. High frequencies excite the
membrane near the entrance, passing
through it and exiting through the second
window below.
Low frequencies travel further down the
spiral, until they excite the membrane and
pass through.
Strong low frequencies disturb the high
frequency portion of the membrane, causing
the well-known phenomenon of upward
masking.
Upward masking is a purely mechanical effect, and it cannot be compensated by
adaptive equalization. The high frequencies are simply not detected.
Intelligibility is frequently low in acoustic spaces because there is little low frequency
absorption, and the LF acoustic power is boosted.
We adapt to the frequency imbalance, and say the sound is OK – but unintelligible.
Upward masking and mixing
• A consequence of upward masking is that elements in a
mix that are audible in one studio or set of loudspeakers
may be masked in another.
• Recordings mixed over headphones can be seriously in
error.
– Most headphones boost the treble, raising the apparent clarity.
– As an engineer I learned early to mistrust the balance between
direct sound and reverberation over headphones.
– The best I could do was make the recording much drier than I,
or my clients, preferred and hope for the best.
• One can always make the recording more reverberant.
• Making it drier is much more difficult!
• Can we find a way to correct headphone errors?
Accurate binaural recordings
If safe, comfortable probe microphones are available, it is possible to make
accurate binaural recordings. First we measure the headphone response at
the eardrum – response H. We can then record with the same probe
microphones. If we equalize the recording with the inverse of H, H’, the
recording will play back with perfect fidelity.
Playback of binaural over speakers
If we want to play back the binaural recording over speakers, or if we want to
play loudspeaker music over headphones, we need to measure the spectrum of
a carefully equalized loudspeaker at the eardrums of the listener. This is the
spectrum S. We then equalize the binaural recording with S’, and we can play it
over speakers. Equalizing the phones with H’S allows playback of both binaural
and loudspeaker-mixed music. H’S is the inverse of the free-field earphone
response.
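The bookkeeping of the two schemes above is easiest to follow in decibels, where cascading responses means adding them and inverting means changing sign. The sketch below is illustrative only: the four-point frequency grid and all the dB values are made up, and it tracks magnitude only, ignoring phase.

```python
import numpy as np

# Hypothetical magnitude responses in dB on a shared frequency grid.
freqs_hz = np.array([250.0, 1000.0, 3000.0, 8000.0])
H_db = np.array([1.0, 0.0, 9.0, -4.0])   # headphone measured at the eardrum
S_db = np.array([0.0, 1.0, 12.0, 3.0])   # equalized frontal loudspeaker at the eardrum

H_inv_db = -H_db   # H': inverse of the headphone response
S_inv_db = -S_db   # S': inverse of the loudspeaker response

# Eardrum recording played on headphones equalized with H': the chain is flat.
binaural_chain_db = H_inv_db + H_db

# Headphones equalized with H'S present the loudspeaker spectrum S at the eardrum,
# so loudspeaker-mixed music sounds as it would over the loudspeaker.
phones_eq_db = H_inv_db + S_db
speaker_music_at_eardrum_db = phones_eq_db + H_db

# A binaural recording pre-equalized with S' for loudspeakers, played on the same
# H'S-equalized headphones, also arrives at the eardrum unaltered.
binaural_for_speakers_chain_db = S_inv_db + phones_eq_db + H_db
```

With these definitions the probe microphone response does not appear anywhere, since it would be common to H, S, and the recording.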
Binaural equalization in practice
• Note the two previous slides made no attempt to
equalize the probe microphone(s).
– With those schemes, the response of the probe cancels in the
final result.
• In practice, the probe response is complicated and
difficult to invert.
– The author carefully measures the impulse response of the
probes with a B&K 4133 as a reference.
– The responses are inverted in the frequency domain with Matlab.
With care minimal pre-echo is produced.
• All measurements with the probes are first convolved
with this inverse function.
– Second order parametric filters are combined to produce the
other equalization filters.
– Parametric filters can be easily inverted, and to the author they sound better than
mathematical inverse filters.
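The claim that parametric filters are easy to invert can be checked numerically. The sketch below assumes the widely used peaking-filter form from Robert Bristow-Johnson's Audio EQ Cookbook; the slides do not say which filter form the author uses, and the 3 kHz, 6 dB, Q = 2 section is a made-up example. In this form, a section with the opposite gain is the exact inverse, in magnitude and in phase.

```python
import numpy as np

def peaking_biquad(f0_hz, gain_db, q, fs_hz):
    """Second-order peaking filter (Audio EQ Cookbook form), normalized so a[0] == 1."""
    A = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * np.pi * f0_hz / fs_hz
    alpha = np.sin(w0) / (2.0 * q)
    b = np.array([1.0 + alpha * A, -2.0 * np.cos(w0), 1.0 - alpha * A])
    a = np.array([1.0 + alpha / A, -2.0 * np.cos(w0), 1.0 - alpha / A])
    return b / a[0], a / a[0]

def response(b, a, freqs_hz, fs_hz):
    """Complex frequency response of a biquad at the given frequencies."""
    zinv = np.exp(-2j * np.pi * np.asarray(freqs_hz, dtype=float) / fs_hz)
    return (b[0] + b[1] * zinv + b[2] * zinv**2) / (a[0] + a[1] * zinv + a[2] * zinv**2)

fs_hz = 48000.0
freqs_hz = np.geomspace(20.0, 20000.0, 200)
b_cut, a_cut = peaking_biquad(3000.0, -6.0, 2.0, fs_hz)      # hypothetical 6 dB cut
b_boost, a_boost = peaking_biquad(3000.0, +6.0, 2.0, fs_hz)  # same section, opposite gain

# Cascading the two sections should give a flat, zero-phase result.
cascade = response(b_cut, a_cut, freqs_hz, fs_hz) * response(b_boost, a_boost, freqs_hz, fs_hz)
max_error_db = np.max(np.abs(20.0 * np.log10(np.abs(cascade))))
```

Because the inverse is just another second-order section, it adds no pre-echo, which may be part of why such filters are preferred over a computed inverse.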
Probe Equalization
This graph shows the frequency response and time response of the digital
inverse of the two probes as measured against a B&K 4133 microphone.
Matlab is used to construct the precise digital inverse of the probe response,
both in frequency and in time. The resulting probe response is flat from ~25Hz
to 17kHz. In general, I prefer NOT to use a mathematical inverse response, as
these frequently contain audible artifacts. I minimized these artifacts here by
carefully truncating the measured response as a function of frequency.
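For readers who want to experiment with frequency-domain inversion, here is a minimal Python sketch. It is not the author's Matlab procedure: it swaps in simple Tikhonov-style regularization (the `beta` term) instead of frequency-dependent truncation of the measured response, and the three-tap "probe" impulse response is made up. The function name `regularized_inverse` is hypothetical.

```python
import numpy as np

def regularized_inverse(h, n_fft, beta=1e-4, delay=None):
    """Inverse filter for impulse response h, computed in the frequency domain.

    beta limits the gain where |H| is small; delay (in samples) shifts the
    result so that any pre-ringing fits inside the returned filter.
    """
    H = np.fft.rfft(h, n_fft)
    inv = np.conj(H) / (np.abs(H) ** 2 + beta)
    if delay is None:
        delay = n_fft // 2
    k = np.arange(H.size)
    inv = inv * np.exp(-2j * np.pi * k * delay / n_fft)  # apply the modeling delay
    return np.fft.irfft(inv, n_fft)

# Hypothetical probe response: direct sound plus two small later taps.
h = np.zeros(64)
h[0], h[3], h[7] = 1.0, 0.4, -0.2

g = regularized_inverse(h, n_fft=1024)
combined = np.convolve(h, g)                 # probe followed by its inverse
peak_index = int(np.argmax(np.abs(combined)))
residual = np.delete(np.abs(combined), peak_index).max()
```

If the inversion works, `combined` is close to a single delayed impulse and `residual` is small. Larger `beta` values trade accuracy for less extreme gain, which is one common way to limit audible artifacts.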
Adaptive Timbre – how do we perceive
pink noise as “flat”
• Pink noise sounds plausibly pink even on this sound
system.
• Let’s add a single reflection – and listen for a few
minutes without other sounds:
– The result at first sounds colored, with an identifiable pitch
component.
– The pitch component gradually reduces its loudness.
• But now play the unaltered noise again.
– The unaltered noise now has a pitch, complementary to the
pitch from the reflection.
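The coloration in this demonstration follows from the comb-filter response of a direct sound plus one delayed copy. The sketch below computes that response and, assuming for illustration that adaptation fully cancels it, the complementary coloration that would then be imposed on unaltered noise. The 2 ms delay and 0.7 reflection gain are made-up values, and `single_reflection_db` is a hypothetical helper name.

```python
import numpy as np

def single_reflection_db(freqs_hz, delay_s, gain):
    """Magnitude in dB of direct sound plus one reflection: |1 + g*exp(-j*2*pi*f*T)|."""
    f = np.asarray(freqs_hz, dtype=float)
    h = 1.0 + gain * np.exp(-2j * np.pi * f * delay_s)
    return 20.0 * np.log10(np.abs(h))

delay_s = 0.002                      # hypothetical 2 ms reflection delay
gain = 0.7                           # hypothetical reflection strength
freqs_hz = np.arange(0.0, 2001.0, 250.0)

comb_db = single_reflection_db(freqs_hz, delay_s, gain)

# Assumption: after long exposure the adaptive correction equals the inverse comb.
adapted_correction_db = -comb_db

# Flat (unaltered) noise heard through that correction: peaks become dips and
# dips become peaks, i.e. a coloration complementary to the reflection.
aftereffect_db = np.zeros_like(comb_db) + adapted_correction_db
```

The peaks of `comb_db` fall at multiples of `1 / delay_s` (500 Hz here) and the dips fall halfway between them.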
Some demos of eardrum
recordings
• These recordings have been equalized for loudspeaker
reproduction. You may be able to judge clarity and intelligibility over
near-field loudspeakers.
– Accurate headphone reproduction requires headphone equalization.
– If probes are available the method described here will work.
– A method which uses equal loudness curves will be described later in
this paper.
• opera balcony 2, seat 11
– Moderate intelligibility, reverberant sound
• opera balcony 3, seat 12
– Poor intelligibility, very reverberant
• opera standing room
– Deep under balcony 2 – good intelligibility
• A concert hall – row 8 (quite close)
– Very good sound. Not so good further back.
The need for eardrum
measurements
• Almost all current binaural research uses HRTF and
headphones with a blocked or partially blocked ear
canal.
– There is an assumption (without proof) that such measurements
accurately reproduce the sound pressure at the eardrum.
• The assumption is blatantly false. To quote Hammershoi
and Moller:
– “The most immediate observation is that the variation [in sound
transmission from the entrance of the ear canal to the eardrum]
from subject to subject is rather high…The presence of individual
differences has the consequence that for a certain frequency the
transmission differs as much as 20dB between subjects.”
– 20dB is a significant difference in response!
• In spite of the data, Hammershoi and Moller recommend
using measurements at the entrance to the ear canal!
– The recommendation can be disproved by a single subject…
HRTFs from blocked ear canals
Here are pictures of a partially blocked canal and a fully blocked canal. The
following data applies to the fully blocked measurements, but the partially
blocked measurements are similar.
Blocked measurements vs eardrum
• To compare the two measurement methods, I equalize the blocked
measurement of a single HRTF to the same HRTF measured at the
eardrum. I chose the HRTF at azimuth 15 degrees left, and 0
degrees elevation.
• The needed equalization requires at least 3 parametric sections.
– Red is the right ear, blue is the left ear
HRTF differences blocked to eardrum
Twenty different HRTFs were measured with a blocked canal, equalized by
the above EQ, and the differences between them and the open ear canal measurements are
plotted. This data supports Hammershoi and Moller’s contention that the
directional properties of the measured HRTFs are preserved by the blocked
measurement, at least to a frequency of ~7kHz.
Note the vertical scale is +-30dB. The errors at 7-10k are significant.
Headphone response differences
Using the same method, I measured three headphones. Blue is the AKG 701,
red is the AKG 240, and cyan is the Sennheiser 250. The curves plot the
difference between the blocked and unblocked measurement, with the
measured HRTF at azimuth 15, elevation 0 as a reference. The vertical scale
is +-30dB. Errors of at least 10dB exist at midband.
More headphones
Blue – an old but excellent noise protection earphone by Sharp. Red
– iPod earbuds. The errors in the blocked measurements are large
enough to prevent accurate localization of binaural recordings.
Analysis
• The previous curves are NOT the frequency response of the headphones under test.
They show the ERRORs that occur when a blocked ear canal measurement is used
instead of the eardrum pressure.
• Because the scale of the plots is +-30dB the difference curves look better than they
really are. Errors of 10dB in frequency ranges vital for timbre are present for almost
all the examples shown.
• We can conclude that it is possible to use recordings from dummy heads that lack
accurate ear canals IF AND ONLY IF it is possible to equalize them, either by
comparison to a reference with ear canals, or by equalizing them to sort-of flat for a
frontal sound source. If this is done, we must also equalize the headphones at the
eardrum for the same source.
• We can with more assurance conclude that it is NOT possible to equalize
headphones with a measurement system that does NOT include an accurate ear
canal model.
– Neither KEMAR nor HATS qualifies.
• Measurement systems with true ear canals are a very good thing.
– In addition I have found that for many earphones it is vital to have a pinna model with
identical compliance to a human ear.
– Particularly on-ear headphones alter the concha volume – and drastic changes in the
frequency response can result if the compliance is not accurate.
– Pinnae are complex structures with variable compliance – so this is tricky!
Headphone calibration through
equal loudness contours
• There is a non-invasive method of headphone calibration
to an individual.
• IEC publication 268-7 and German Standard DIN 45-619
recommend loudness comparison using 1/3 octave noise
instead of physical measurement for headphones.
• These recommendations were superseded by diffuse
field measurements as suggested by Theile.
• Should these methods be revived? – I believe the
answer is yes.
Equal Loudness
Top – ISO equal loudness curves for
80dB and 60dB SPL. These are the
average from many individuals, so
features in them are broadened.
Bottom – (blue/red) averaged frontal
response over a +-5 degree cone in
front of the author, measured at the
eardrums. The loudspeaker was
equalized to 200Hz.
Bottom - black/cyan – the same
measurement for the author’s dummy
head with no equalization. The
difference in eardrum impedance
above 8kHz boosts the response of
the dummy – but this can be removed
by equalization.
Equal Loudness 2
• We can measure equal loudness curves because the ear
does not adapt when the stimulus is narrow band –
either noise or tone.
• The differences between the top and bottom curves in
the previous slide can be attributed to the properties of
the middle ear and the inner ear.
• Thus equal loudness curves are a method of measuring
the effective frequency response of an individual’s hearing
system in the absence of short-term adaptation to the
environment.
• They represent our sensitivity to timbre in a quiet
environment, or before adaptation takes place.
• Their extreme lack of flatness is proof of the existence,
and effectiveness, of adaptation.
Loudness matching experiments
• The author wrote a Windows program that presents a subject with
alternating bands of 1/3 octave noise, one at 500Hz, and the other
at a test frequency
– The subject matches the loudness of the two bands by adjusting the test
band up and down.
– In use, the equal loudness curves from 500Hz to 12kHz for a carefully
equalized frontal loudspeaker are obtained for this subject.
– The subject then repeats the experiment with a pair of headphones over
a frequency range of 30Hz to 12kHz.
• In this case the balance between the two ears is also tested and corrected.
– The difference of the loudspeaker and headphone measurements
becomes the ideal headphone correction for this individual.
• This program can be used to test the variation in response of a
particular headphone over a wide range of individuals.
– Subjects report that the resulting equalization is very pleasant, and
binaural recordings made with the author’s ears reproduce well without
head tracking.
– Music recorded for loudspeakers is judged identical in timbre in both the
headphones and the loudspeaker.
– The equalization is also identical in timbre to a large high-quality stereo
sound system.
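The arithmetic behind the loudness-matching correction can be sketched as follows. This is not the author's Windows program: the helper names, the base-2 third-octave band centers, and all the dB values are assumptions for illustration. The sketch also assumes each recorded value is the drive level, in dB relative to the 500Hz reference band, that the subject needed for equal loudness, so the headphone equalization is the headphone setting minus the loudspeaker setting.

```python
import numpy as np

def third_octave_centers(f_lo_hz, f_hi_hz, ref_hz=1000.0):
    """Base-2 third-octave band centers ref * 2**(n/3) lying within [f_lo, f_hi]."""
    n_lo = int(np.ceil(3.0 * np.log2(f_lo_hz / ref_hz)))
    n_hi = int(np.floor(3.0 * np.log2(f_hi_hz / ref_hz)))
    return ref_hz * 2.0 ** (np.arange(n_lo, n_hi + 1) / 3.0)

def headphone_correction_db(speaker_match_db, headphone_match_db, ref_index=0):
    """EQ in dB to apply to the headphones, normalized to 0 dB at the reference band."""
    diff = np.asarray(headphone_match_db, dtype=float) - np.asarray(speaker_match_db, dtype=float)
    return diff - diff[ref_index]

centers_hz = third_octave_centers(500.0, 12000.0)

# Made-up loudspeaker result: less level needed near 3 kHz (the dip in the curve).
octaves_from_3k = np.log2(centers_hz / 3000.0)
speaker_match_db = -5.0 * np.exp(-(octaves_from_3k ** 2))

# Made-up headphone that is progressively weaker toward high frequencies, so the
# subject has to turn each band up by this much more than on the loudspeaker.
headphone_deficit_db = 2.0 * np.log2(centers_hz / 500.0)
headphone_match_db = speaker_match_db + headphone_deficit_db

correction_db = headphone_correction_db(speaker_match_db, headphone_match_db)
```

Where the subject needed more level on the headphones than on the loudspeaker, the correction is a boost, and the subject's own equal loudness curve cancels out of the difference.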
Results for ~10 individuals
About 10 students from
Helsinki University
participated in the test.
The top left graph shows
the equal loudness
contours from the
loudspeaker for each
subject.
The other curves show the
difference between this
curve and the equal
loudness curves for four
different headphones.
It was hoped that the Stax
303 phones would show
less individual variation.
This was not the case.
The Philips phones were an insert type. These
also showed large variation among individuals.
(blue = left ear, red = right ear,
cyan = author’s left ear)
The dip at 3kHz for all subjects
• All subjects show a dip in the loudspeaker equal
loudness curve at 3kHz.
• This corresponds to a universal peak in the
response of the concha and ear canal at this
frequency.
• It is this ear sensitivity peak that causes the
most trouble with our memory of timbre.
• When we first play an accurately calibrated
binaural recording – particularly of a speaking
voice or a chorus – this peak in the loudness is
highly noticeable and unpleasant.
– Once we adapt, everything is OK again.
Comments on these results.
• The experiment is equivalent to equalizing headphones for a frontal,
free-field response.
– This is at variance with the current standard for “diffuse field”
equalization.
– In the author’s experience the free field equalization is far more useful
than the diffuse field equalization, and gives better results on
loudspeaker recorded music.
– These recordings are intended to be heard in a room where the direct
sound is frontal, and dominant.
• After doing the experiment the subjects were given the opportunity
to listen to music both with the frontal equalization and with their
own equal loudness equalization. (the speaker curves were not
subtracted)
– The author’s binaural recordings were perceived with better localization
with the free-field equalization. (These recordings were equalized for
free-field reproduction.)
– Many subjects preferred their own equal loudness equalization for other
material.
• This equalization requires no adaptation to a recording that has an
accurately flat frequency response.
• The sound can be quite seductive.
Some Speculation
• Equal loudness curves have two prominent features: the increase
in sensitivity around 3kHz, and the decrease in sensitivity at low
frequencies.
• Music that has been recorded with frequency linear microphones
and not post-processed often seems lacking in bass and harsh in
the midrange – both on loudspeakers and on eardrum-equalized
headphones.
• The author speculates that an unconscious collusion between
loudspeaker designers and recording engineers routinely boosts the
bass, and tweaks the 3kHz region on commonly available
recordings.
– It is common to boost the bass 10dB at 60Hz in automobiles.
• Floyd Toole’s findings that the loudspeakers that are closest to
frequency linear are preferred in blind listening tests may be biased
by the choice of recordings used in the tests.
– The spectrum of choral music in the author’s unprocessed recordings
shows a ~3dB peak around 3kHz.
– This peak is generally absent in vocalists on pop music. Perhaps they
use a different singing technique – and perhaps the equalization has
been adjusted closer to an equal-loudness curve.
Conclusions
• Experiments and observation suggest that human hearing uses a
combination of fixed spectral maps to perceive the localization of a
sound, and then corrects the HRTF timbre with a similar map.
• These fixed maps are combined with a relatively rapid AGC system
that tends to equalize loudness across frequency bands.
• The existence of equal loudness curves shows that for narrow band
signals adaptation does not take place. When a new, unknown
broadband signal is first heard, the ear hears the timbre that reflects
the equal loudness calibration. But this timbre is replaced in a short
time with a more balanced timbre, and this balanced timbre is
remembered.
• It is likely that given the opportunity to equalize a recording to their
own taste using loudspeakers with a flat frequency response,
recording engineers will be sorely tempted to move toward their own
equal loudness curve.
– The temptation is dangerous – but probably harmless. We can see that
individual loudness curves can be rather different – particularly at low
frequencies.
– But adaptation will continue to work when the recording is played back,
and if the response does not match that of the listener, they will soon
not notice the difference.