The Physics and Psycho-Acoustics of Surround Recording

The Physics and Psycho-Acoustics
of Surround Recording Part 2
David Griesinger
Lexicon
www.world.std.com/~griesngr
Introduction
• We all know how to make a good recording
– We need good music
– A very good performance
– And satisfactory balance between the solos and the instruments.
• But we want to make a great recording
– How do we do it?
– How do we know when a recording is great?
• We must learn how to hear the technical quality of a great
recording,
– And learn how to achieve the best result.
• The talk is based on classical music – but the techniques and
perceptions apply to all recordings.
The recording space is very important!
• It is much easier to achieve a great result in a large hall.
– But large halls with great acoustics are rare.
– Our job is to make a great result in the hall we have available (usually
small).
• This talk will tell you how to do it.
– And help you hear the difference.
• We will not talk about issues such as instrumental balance
– or the differences between microphones or sample rates.
– We will talk about basic sound properties:
• The clarity and localization of the direct sound
• The perceived distance between the sound source and the listener (depth)
• The recording and reproduction of the sound of the hall.
Major Goals
• To review the physical and psychoacoustic properties that make a great
recording (or a great performance space).
– The clarity of the direct sound (the absence of muddiness)
– The creation of a large listening area and a stable front image – using three front
speakers in a 5.1 recording.
– The blending together of the different instruments into a whole acoustic scene
through early reflections.
– The re-creation of the acoustic space of the performance, through late reflections
and envelopment.
• To show how muddiness occurs when there are too many early
reflections
• To show how we perceive muddiness through our perception of pitch.
• To show how the loudspeaker positions in the playback room
influence the envelopment at low frequencies.
• To play as many musical examples as possible!
Localization – a stable front image
over a large listening area
• In a high-quality recording the front image does not
greatly change when a listener moves away from the
sweet spot.
– Image stability requires using the center channel speaker in
a 5.1 recording.
– Even without the center speaker some two channel
recordings are more stable than others.
– Popular music recordings are often better than classical
recordings in image stability.
• The secret is Amplitude Panning
– Which is almost universally used in popular music
recording.
Time delay panning
• Many engineers attempt to record a broad sound
source with closely spaced microphones
– Omni microphones are often used in a so-called “Decca
Tree”.
– Cardioid microphones are often used in the “ORTF”
configuration
• Both these techniques rely on time delay differences
to spread the front image
– Time delay spreading only works when the listener is in the
sweet spot.
– The front image is not stable over a large area.
Training to hear localization
• The importance of ignoring the sweet spot
– Most research tests of localization use a single listener, who is strictly
restricted to the sweet spot.
– Your customers will not listen this way!
• How do you know if the recording has a stable front image?
– Move laterally in front of the loudspeakers. Does the sound image stay
wide and fixed to the loudspeakers, or does it follow you?
– Do the soloists in the center follow you left or right? If they do they are
recorded with too much phantom center.
• Since most 5 channel recording methods are derived from stereo
techniques almost all have too much phantom center.
• A center image that follows a listener who moves laterally out of
the sweet spot is the most common failing of even the best five
channel recordings.
» Play examples
Example: Time delay panning outside
the sweet spot.
Record the orchestra with a
“Decca Tree” - three omni
microphones separated by one
meter. A source on the left will
be picked up with equal level in
all three microphones. The time
delays will be different by ±3ms.
On playback, a listener on the far right
will hear this instrument coming from the
right loudspeaker. This listener will hear
every instrument coming from the right.
Amplitude panning outside the sweet
spot.
If you record with three widely spaced
microphones, an instrument on the left
will have high amplitude in the left
microphone. The time delay will also
be much shorter.
A listener on the far right will hear the
instrument on the left. Now the
orchestra spreads out across the entire
loudspeaker basis, even when the
listener is not in the sweet spot.
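The contrast between the two microphone spacings can be illustrated with simple geometry. The sketch below is a minimal free-field model, assuming a point source, 1/r level loss, and a speed of sound of 343 m/s. The function name `mic_pickup` and the example coordinates are hypothetical and are not taken from the talk.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def mic_pickup(source, mics):
    """Return (delay_ms, level_db) per microphone, relative to the nearest one.

    source and mics are (x, y) positions in meters.
    """
    distances = [math.dist(source, mic) for mic in mics]
    nearest = min(distances)
    result = []
    for d in distances:
        delay_ms = (d - nearest) / SPEED_OF_SOUND * 1000.0
        level_db = -20.0 * math.log10(d / nearest)  # 1/r law
        result.append((delay_ms, level_db))
    return result


source = (-6.0, 2.0)  # an instrument well to the left (hypothetical)
close_array = [(-1.0, 0.0), (0.0, 0.0), (1.0, 0.0)]   # 1 m spacing
wide_array = [(-5.0, 0.0), (0.0, 0.0), (5.0, 0.0)]    # 5 m spacing

close = mic_pickup(source, close_array)
wide = mic_pickup(source, wide_array)
```

In this example the closely spaced array sees the source at nearly equal level in all three microphones (within about 3dB) with only time differences to separate them, while the widely spaced array gives the right microphone a level about 14dB below the left.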
WARNING!!!
• In the author’s experience a front image that
is not stable when you walk in front of the
speakers will never make a great recording.
– regardless of how beautiful it is in the sweet
spot.
• This is my FIRST test of a recording, either
two channel or surround.
Summary of acoustic perceptions in a
recording
• 1. Clarity – the lack of muddiness
– Clarity is perceived through the direct sound – sound that travels
directly from the instrument to the microphone.
• A clear direct sound requires that the microphone be relatively close to the
instrument!
• 2. Blend and depth
– Blend and depth are perceived through early reflections that arrive from
all around the listener.
• The total energy in these early reflections must be less than the energy in
the direct sound!
• In a surround recording these reflections should come equally from all the
loudspeakers (except the center), and they must be decorrelated (different).
• 3. Envelopment (reverberation)
– Envelopment is perceived through late reflected energy that arrives
from all around the listener. (Not just from the rear!)
• The energy must be decorrelated in each loudspeaker
Clarity
• Clarity to an acoustician is determined through
intelligibility – the ability to understand speech
or a musical line.
• For this talk I will use a different meaning:
– For me clarity is the perception that the sound
source is acoustically close to the listener.
– While this definition may seem vague, almost
everyone agrees on the optimal acoustic distance
for a recorded sound source.
– We can demonstrate this perception:
Muddiness: Dry Speech + 40ms
reflections
Mono speech:
The sound is clear,
but much too close
to the loudspeaker.
Speech with ~40ms
allpass reflections
and no direct sound.
Mono:
Stereo:
Note both the mono
and the stereo
version sound
muddy and distant.
There is no phantom
image in the stereo
version.
Reflections used in these experiments
The reflections used in these experiments form a decaying burst which peaks about
25ms after the direct sound, and has largely decayed away by 50ms.
The reflections are different in the two channels, and have a flat frequency response.
Optimum level for Early Reflections
• Recorded sound consists of a mix of direct sound and
reflections
– Too many reflections and muddiness results.
– But reflections add a sense of blend and depth.
– An optimum mix must be found.
• The optimum level for early reflections is -4 to -6dB relative to
the direct sound.
– This level is preferred by almost every listener.
• In a surround recording the reflections should come equally
from all directions (except the center), and be decorrelated.
• The perceived result is independent of the precise delay time
and the pattern of the reflections.
– It is the total energy which determines the perception.
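Because it is the total energy that matters, individual reflection levels can be summed as powers. This is a minimal sketch, assuming uncorrelated reflections whose levels are given in dB relative to the direct sound. The helper names and the four example levels are hypothetical.

```python
import math


def total_reflection_level_db(levels_db):
    """Power-sum uncorrelated reflections given in dB re the direct sound."""
    total_power = sum(10.0 ** (level / 10.0) for level in levels_db)
    return 10.0 * math.log10(total_power)


def in_optimum_range(total_db, low_db=-6.0, high_db=-4.0):
    """True if the total early energy lies in the -4 to -6dB range."""
    return low_db <= total_db <= high_db


four_reflections = [-11.0, -11.0, -11.0, -11.0]  # hypothetical levels
total = total_reflection_level_db(four_reflections)
```

Four reflections at -11dB each sum to roughly -5dB, inside the preferred range, even though each one alone is far below it.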
Depth without Muddiness
• Dry speech
– Note the sound is uncomfortably close
• Mix of dry with early reflections at -5dB.
– The mix has distance (depth), and is not muddy!
– Note there is no apparent reverberation, just depth.
• Same but with the reflections delayed 20ms at -5dB.
– Note also that with the additional delay the reflections begin to be heard as discrete
echoes.
• But the apparent distance remains the same.
• Same but with the reflections delayed 50ms at -3dB
– Now the sound is becoming garbled. These reflections are undesirable!
– If the speech were faster it would be difficult to understand.
• Same but with reflections delayed 150ms at -12dB
– I also added a few reflections between 20 and 80ms at a level of -8dB to
smooth the decay.
– Note the strong hall sense, and the lack of muddiness.
The ideal mix
• We see from the previous slide that the ideal acoustic mix has
three independent perceptual requirements:
– 1. The direct sound dominates the total energy by at least 4dB.
– 2. There are early reflections that add blend, distance, and depth to the
sound.
• These should come equally from all directions in a surround recording
• And they should avoid adding energy in the 50ms to 100ms time region.
– 3. There should be reflections (reverberation) with time delays greater
than 150ms to provide the impression of the hall.
• To make a great recording we must separately capture all
three!
Direction of early reflections
• It is not possible to detect whether the reflections come from the front
or the rear when they arrive between 20ms and 50ms after the end of a
sound.
• But it is more natural if they come from both front and rear.
• Using all four speakers also results in the largest sweet spot - demo
Muddiness is hard to avoid in small
spaces!
• We are attempting to show that the optimum total energy for
all reflections is at least 4dB less than the direct sound.
• The total reflected energy sum does not include the floor reflection.
– I will explain why later if there is time.
• The direct sound must dominate the total sound picture
– The reverberation radius of a small hall or church is usually below 2m,
and may be as low as 1m.
– Every microphone used in the recording picks up both direct sound and
reverberation.
• But only the microphone closest to the sound source picks up true direct
sound.
• Direct sound into all the other microphones is perceived as a reflection,
and adds to the potential distance and muddiness.
Muddiness also comes from the
playback room!
In this room there is no absorption in the front,
and thus the reverberation radius is small,
perhaps as low as 2.5m.
The distance from the front loudspeakers to
the listeners is greater than the reverberation
radius.
So the reverberation will be stronger than the
direct sound.
We are trying to keep the direct sound stronger
than the reflections by 4dB.
This goal is probably not possible to achieve
in this room! (Except at frequencies above
1000Hz, where the side curtains begin to be
absorptive.)
Always mix your recordings in an
absorbent space!
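The relation between listening distance and the direct/reverberant ratio can be sketched with the standard diffuse-field approximation, in which the two are equal at the reverberation radius and the direct sound falls 6dB per doubling of distance. This assumes an omnidirectional source and an ideal diffuse field, which a real playback room only approximates. The function names are hypothetical.

```python
import math


def direct_to_reverb_db(distance_m, reverb_radius_m):
    """Direct/reverberant ratio in dB at a given distance from the source."""
    return 20.0 * math.log10(reverb_radius_m / distance_m)


def max_distance_m(reverb_radius_m, required_db):
    """Largest distance at which the direct sound leads by required_db."""
    return reverb_radius_m * 10.0 ** (-required_db / 20.0)


limit = max_distance_m(2.5, 4.0)                   # 2.5m radius, 4dB goal
ratio_at_three_radii = direct_to_reverb_db(3.0, 1.0)
```

Under these assumptions a 2.5m reverberation radius requires the listener to sit within roughly 1.6m of the loudspeakers to keep the direct sound 4dB stronger, and a listener three radii away hears the direct sound about 9.5dB below the reverberation.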
Boston Cantata Singers Cantata #76
Die Himmel erzählen die Ehre Gottes
Performance in
Jordan Hall, January
23, 2004.
Reverberation time
in Jordan ~1.4
seconds at 1000Hz.
This is similar to the
Semperoper
Dresden.
The typical audience
member is ~ 3
reverb radii from this
singer.
The dramatic
consequences are
highly audible.
Although Jordan is beloved as a chamber music hall, the stage house is deep and reverberant.
When the hall is full, the sound in the audience can be dry and muddy.
The recording engineer must overcome these obstacles.
Cantata Singers Bach BWV 76
Multimiked recording. Note the clarity of
vocal timbre (low sonic distance).
Recording simulating the sound in the
hall. Note the timbre coloration and the
sense of distance to the performers.
With the picture and after adaptation the
performance is quite enjoyable.
The Ideal Reverberation
– has 20ms to 50ms reflections with a total energy -4dB
to -6dB
– has relatively little energy from 50 to 150ms.
Most small rooms – (including playback rooms)
– Have exponential decay
– If we pick up enough late reflections to hear the hall, we
will get too many early reflections.
• We will get coloration and poor intelligibility.
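The trade-off follows from the shape of an exponential decay. The sketch below is a minimal model, assuming an ideal exponential energy decay that starts at the direct sound and falls 60dB in one reverberation time. The 1.8 second value is only an illustrative choice, and the function name is hypothetical.

```python
import math


def energy_fraction(rt60_s, start_s, end_s=None):
    """Fraction of the total reverberant energy arriving in a time window.

    The energy envelope is exp(-k t) with k chosen for a 60dB drop at rt60_s.
    end_s=None means the window extends to infinity.
    """
    k = math.log(1e6) / rt60_s
    upper = 0.0 if end_s is None else math.exp(-k * end_s)
    return math.exp(-k * start_s) - upper


rt60 = 1.8
early = energy_fraction(rt60, 0.0, 0.050)     # 0 to 50ms
medial = energy_fraction(rt60, 0.050, 0.150)  # 50 to 150ms
late = energy_fraction(rt60, 0.150)           # after 150ms
```

In this model the 50 to 150ms region carries slightly more energy than everything after 150ms, so raising the microphone level to hear the hall raises the unwanted medial energy along with it.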
Example of a small recording space:
Swedenborg Chapel, Cambridge
Oriana Consort in Swedenborg Chapel
Oriana Setup
Recording in Swedenborg Chapel,
Cambridge
• The chapel holds perhaps 200 people, but when it is empty the
RT is ~ 1.8 seconds.
– And the reverberation radius is ~ 1.5m
• The picture shows four supercardioid microphones about 1m
from the chorus. These provide the direct sound.
– With the supercardioid pattern we have a 6dB direct/reverberant ratio,
so the reverberation is less than the direct sound by about 6dB.
– Note that in this space we must add hall sound and early reflections
very carefully, or the sound will become muddy!
– In addition the early reflections and reverberation arrive soon after the
direct sound. The sound seems small and cramped. There is no sense
of space around the direct sound.
• The chorus microphones are as close as they can be to the
chorus without creating balance problems.
– We cannot exclude the early reverberation by moving the mikes closer.
Main microphones in Swedenborg Chapel
• The picture also shows two variable pattern
microphones about 2m from the chorus.
– I put these there for an experiment. The sound is not very
good…
• The problem with a “main microphone” pair in this
space is that it must be placed too far from the
singers!
• A main pair must be at least 2.5m away or there will
be balance problems.
– This distance is beyond the reverberation radius, and the
sound will be muddy.
Hall Sound in Swedenborg
• The chapel is reverberant – with a high
reverberation level
• But the reverberation is too strong in the 10-150ms time range.
• Using cardioid microphones pointing away
from the sound source reduces the early
reverberation energy and maximizes the late
energy.
• The hall sounds larger and better.
Distance Perception and MUD
• Reflections during the sound event and up to 150ms after
it ends create the perception of distance
• But there is a price to pay:
– Reflections from 10-50ms do not impair intelligibility.
• The fluctuations they produce are perceived as an acoustic “halo” or
“air” around the original sound stream. (ESI)
– Reflections from 50-150ms contribute to the perception of distance
– but they degrade both timbre and intelligibility, producing the
perception of sonic MUD.
• We will have many examples of mud in this talk!
Training to hear MUD
• Mud occurs when the reverberant decay of the recording venue has too
much reflected energy in the 10-150ms region of the decay curve.
– This is true of nearly all sound stages, small auditoria, and churches.
• If you are recording in such a space with a relatively large ensemble,
you are in trouble.
• The perception of mud can be tricky, because our hearing mechanism
adapts to a muddy environment, and the sonic degradation becomes
inaudible after about 10 minutes.
– It is easy to convince yourself the recording is excellent when you have
been listening to it all day.
• This is why we can enjoy a concert even when we are sitting far from the
instruments.
– You MUST compare your recording to a reference recording in a short
time A/B test.
Example: John Eargle at Skywalker Ranch
• John Eargle has made wonderful recordings, particularly those
with the Dallas Symphony on Delos Records
– But even he can be fooled by a small space
– As I said, you adapt quickly to such a space, and no longer hear the
mud that it produces.
• John Eargle recently made a 5.1 channel DVD audio recording
at Skywalker Ranch in Marin County, California.
– He was very excited by it – but listen and compare to Dallas.
• Skywalker is a large sound stage with controllable acoustics.
It is not a concert hall.
• As a consequence the reverberation radius is relatively short.
By my estimate (without having seen it) the radius is less than
3.5 meters.
• It is very easy to record mud in such a space.
– Many instruments are beyond the reverb radius.
– Adding more microphones only increases the reverberant pickup.
Recording in a large space is much easier!
Covenant church is a very large space, holding more than 1000 people. It is damped by
pew cushions and acoustic treatment on the walls, yielding a RT of 2.5 seconds and a large
reverberation radius – probably above 3m.
The microphones can be quite distant without picking up early reflections or reverberation.
It is a very good place to record! (And it is exceptionally beautiful visually…)
Example – depth perspective through
mike technique:
• When the reverberation radius is large enough we can use an
extra pair of microphones to create a single early reflection.
– This can provide the needed perspective and depth
Direct sound:
Mike
480L
Early reflection:
Late reverberation:
Mike
Direct + Early -5dB:
Direct + Early + Late -8dB:
480L
The depth impression is greatly improved
in surround
• I will run the same experiment, but use all five speakers.
• The early reflections will come from both the front and rear
equally, but different delay patterns will be used for each
speaker.
– This means the reflections are decorrelated.
• The late (hall) reflections will also come equally but
decorrelated in the front and rear speakers.
– This will create a large and uniform sweet spot for the acoustics.
The Polyhymnia Pentangle
• The Polyhymnia engineers employ a surround array of spaced omni
microphones, at a spacing similar to the ITU playback array.
In practice the Polyhymnia engineers
often pick up the direct sound with
accent microphones.
In this case the front microphones
provide a first reflection to the front
speakers.
The center microphone is also often
moved closer to the sound sources, so it
picks up mostly direct sound.
• The technique works well in spaces where the reverberation radius is
equal to or greater than the microphone spacing!
• In this case the direct sound picked up by the rear microphones is
perceived as an early lateral reflection and adds distance to the
front image.
• Caution!! In a small hall this array will be TOO MUDDY!!!
Boston Symphony Hall
• 2631 seats, 662,000ft^3, 18700m^3, RT 1.9s
– It’s enormous!
– One of the greatest concert halls in the world – maybe the
best.
– Recording here is almost too easy!
– Working here is a rare privilege
• Sufficiently rare I do not do it. (It’s a union shop.)
– The recording in this talk is courtesy of Alan McClellan of
WGBH Boston. (Mixed from 16 tracks by the presenter)
– Reverb Radius is >20’ (>6.1m) even on stage.
– The stage house is enormous and NOT reverberant. With
the orchestra in place, stage house RT = ~1 sec
Boston Symphony Hall, occupied, stage
to front of balcony, 1000Hz
This picture compares favorably to our picture of the ideal reverberation on a recording.
But this is what an audience member hears 100 feet from the stage!
Boston Symphony Orchestra in Symphony Hall
Boston Cantata Singers in Symphony Hall. March 17, 2002
Microphone Array (WGBH)
Beware the “main microphone” array
• Nearly all engineers will provide a “main microphone” usually a
“Decca Tree”, or a pair of omni or cardioid microphones.
• Almost always the sound from this array is only acceptable for
instruments close to the microphones.
– Most of the instruments are far beyond the reverberation radius.
– The more distant instruments must be spot-miked.
• A cardioid pair (ORTF) has too much phantom center for an acceptable
surround recording. (this is a two-channel technique only.)
• Very frequently time delay panning (for a Decca Tree or spaced omnis)
makes the sound unusable in a high-quality mix.
– Time delay panning makes the front image unstable
– Closely spaced microphones yield high correlation at low frequencies,
which degrades the sense of space.
• It is better to simply turn off the main microphone (even if your
instructor insists you install one.)
Front pair
Front pair LF
• In our Boston Symphony Hall recording a pair of B&K omnis spaced
~25cm was hung behind the conductor by the WGBH engineer.
Correlation in the “main microphone:” two omnis
spaced by ~25cm, just behind the conductor.
___ = measured correlation; - - - = calculated, assuming d=25cm
The high correlation in this pair makes the sound unusable in a stereo or surround
mix. It sounds unpleasant even in this lecture room, as the audio demo makes clear.
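A calculated curve of this kind can be obtained from the standard diffuse-field model, in which the correlation between two omnidirectional microphones spaced d apart is sin(kd)/(kd) with k = 2πf/c. The slide does not state which formula was used, so treat this as an assumed model. The function name and the 343 m/s speed of sound are also assumptions.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def diffuse_field_correlation(freq_hz, spacing_m):
    """Correlation of two omni microphones in an ideal diffuse field."""
    kd = 2.0 * math.pi * freq_hz * spacing_m / SPEED_OF_SOUND
    if kd == 0.0:
        return 1.0
    return math.sin(kd) / kd


spacing = 0.25  # the 25cm pair described above
corr_100 = diffuse_field_correlation(100.0, spacing)
first_zero_hz = SPEED_OF_SOUND / (2.0 * spacing)
corr_at_zero = diffuse_field_correlation(first_zero_hz, spacing)
```

For a 25cm spacing this model keeps the correlation above 0.95 at 100Hz and does not reach its first zero until about 686Hz, which is consistent with the high low-frequency correlation described for this pair.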
Beware the exclusive use of spaced front
microphones
• In our recording the wide front orchestra pick-up is fine for the first
row of the strings.
• But nearly all the orchestra is beyond the reverberation radius for these
microphones.
• If we want good balance and clarity, we must use additional
microphones over the orchestra
– And treat these microphones as part of our “main” array.
• Using cardioid microphones in front will help a lot.
– The cardioid is 4.7 dB less sensitive to reverberation, which will pick out
more distant instruments with clarity.
• Using super cardioid microphones will help a little bit more.
– But if the stage house is reverberant the improvement is minimal.
• The author greatly prefers to use (equalized) directional microphones
for orchestra and chorus pick-up.
– After equalization the bass performance is adequate.
– There is better control of leakage, and less MUD.
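The advantage of directional microphones follows from the random energy efficiency of a first-order pattern a + (1 - a)cos θ, which is a² + (1 - a)²/3. The sketch below uses that standard model. The pattern coefficients (0.5 for cardioid, 0.366 for supercardioid) are textbook values rather than measurements of any particular microphone, and the function name is hypothetical.

```python
import math


def directivity_index_db(a):
    """Reverberation rejection in dB for the pattern a + (1 - a) * cos(theta)."""
    random_energy_efficiency = a * a + (1.0 - a) ** 2 / 3.0
    return -10.0 * math.log10(random_energy_efficiency)


omni = directivity_index_db(1.0)
cardioid = directivity_index_db(0.5)
supercardioid = directivity_index_db(0.366)
```

The cardioid result of about 4.8dB is close to the 4.7dB quoted above, and the supercardioid gains only about 1dB more, which matches the remark that it helps a little bit more.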
Balance and distance come first
• In any recording the balance between the musical forces
should reflect the needs of the music.
• In this recording, even with 120 singers the chorus is
nearly inaudible in the hall.
– So we must heavily use the chorus accent microphones.
– In the final mix MOST of the energy in the recording will come
from these. In practice, these are our MAIN microphones!
• However, if we heavily use the chorus microphones, the
chorus will sound too close to the loudspeakers
– And in front of the orchestra.
• To correct this distance problem we MUST use electronic
early reflections.
– There is no other possible solution.
» Play example
Let’s build the hall sound
• We need decorrelated reverberation in both the front and
the rear with equal level
• Test just the hall microphones to see if the reverberation is
enveloping and uniform.
• Then add the front microphones for the direct sound.
– Where the hall balance is not correct you MUST augment the
natural reverberation with electronics.
• In this recording the orchestra is much stronger than the
chorus – even with 120 singers – and there is too little
chorus in the natural reverberation!!
– When we add the accent microphones the chorus will sound as if
they are in a smaller space.
• So we add electronic reverberation from the chorus
(equally in all four outer speakers) from the surround
reverberator.
Final Mix
• The final mix uses the three omni microphones over the
chorus as the main microphones. They are simply patched
to left, center, and right.
• The spot microphones for the soloists are mostly mixed to
the center, with some panning to the left or right. (No
divergence was used.)
• The orchestra is a combination of two wide spaced omnis
patched to left front and right front.
– Augmented by spot microphones over the woodwinds and the
more distant strings.
– the center channel was provided automatically through leakage
from the soloists’ microphones.
• The rear channels come from a widely spaced pair of
omnis about 20 feet behind the conductor,
– Extensively augmented by electronic early reflections and late
reverberation.
Hall sound: decorrelation at low
frequencies.
• It is widely believed that localization is impossible below
100Hz.
• So a single subwoofer has become the standard for
reproducing low frequencies.
• Although localization below 100Hz is difficult in a small
room, there is a large difference between a single subwoofer
and an independently driven pair.
– We have turned off the subwoofer in this room and we are running the
other speakers full-range.
• A great recording will easily demonstrate the difference
between a single subwoofer and full-range discrete speakers.
• As a consequence you must be sure the hall sound in your
recordings is decorrelated at low frequencies!
– Both in the front and in the rear of a surround recording.
– Most single microphone array surround techniques fail for this reason.
Conclusions
• A great recording:
– Has a stable front image over a large listening area.
– Has direct sound stronger than early reflections,
microphone leakage, and reverberation.
• So it is not MUDDY!
– Has decorrelated early reflections both in the front speakers
and in the rear speakers.
• These provide a sense of blend and depth to the recording. But be
sure to mix in an absorbent space!
– Has decorrelated late reverberation in both the front and the
back speakers.
• The decorrelation must be active for low frequencies
– It is possible to make a great recording in a small space
• But if the group is physically larger than the reverberation radius,
electronic early reflections and reverberation will probably be
necessary.
Medial Reflections – the detection of
muddiness.
• Medial reflections can cause clear differences in quality.
• We can measure medial energy through an analysis of
pitch.
• Pitch information is available in each critical band, even
those above the frequency of auditory phase-locking.
• Here is an example of speech filtered into a 1000Hz 1/3
octave band.
The waveform appears to
be a series of decaying tone
bursts, repeating at the
fundamental frequency.
When this signal is
rectified, there is
substantial energy at the
fundamental frequency.
Waveform of speech formants
The waveform of the word “five” in the
2kHz 1/3 octave band.
The same, but convolved with a 20ms
windowed burst of white noise, simulating
a diffuse reflection, or the sound of a small
reverberant room.
Non-reverberant speech has a clear repeating pattern in the waveform. Reverberant speech
does not. We can devise a measurement system around this difference.
The plus/minus pitch detector
The pitch detector operates separately on each third octave band. Each band is rectified and
low-pass filtered. The output is delayed, and then added and subtracted from the undelayed
signal. The logs of the “plus” signal and the “minus” signal are then subtracted from each
other. The result has a high sensitivity to fundamental pitch.
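The detector described above can be sketched for a single band as follows. This is an illustrative outline, not the presenter's implementation: the one-pole low-pass filter, its 400Hz cutoff, the summing of the plus and minus signals as energies over the whole segment, and all function names are assumptions. The test signal is synthetic: decaying 1kHz tone bursts repeating every 128 samples (125Hz at a 16kHz sample rate).

```python
import math


def envelope(band, fs, cutoff_hz=400.0):
    """Rectify the band signal and smooth it with a one-pole low-pass."""
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / fs)
    y, out = 0.0, []
    for sample in band:
        y += a * (abs(sample) - y)
        out.append(y)
    return out


def plus_minus_db(env, delay):
    """Log of the 'plus' energy minus log of the 'minus' energy, in dB."""
    plus = sum((env[n] + env[n - delay]) ** 2 for n in range(delay, len(env)))
    minus = sum((env[n] - env[n - delay]) ** 2 for n in range(delay, len(env)))
    eps = 1e-12  # avoids log of zero for a perfectly periodic input
    return 10.0 * math.log10((plus + eps) / (minus + eps))


def best_period(env, min_delay, max_delay):
    """Delay (in samples) that maximizes the plus/minus output."""
    return max(range(min_delay, max_delay + 1),
               key=lambda d: plus_minus_db(env, d))


fs = 16000
period = 128  # samples, i.e. a 125Hz fundamental
band = [math.exp(-(n % period) / 20.0) * math.sin(2.0 * math.pi * 1000.0 * n / fs)
        for n in range(8 * period)]
env = envelope(band, fs)
found = best_period(env, 40, 200)
fundamental_hz = fs / found
```

When the delayed and undelayed envelopes line up at the pitch period, the minus signal nearly cancels and the log difference peaks sharply, which is why the output is so sensitive to the fundamental.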
Example – “one, two” 2500Hz 1/3 octave
band.
Pitch detector output with dry speech – the syllables “one, two” with no added
reverberation. Note the high accuracy of the fundamental extraction and the >15dB S/N
Same – but convolved with 20ms of white
noise
Convolving with white noise does not change the intelligibility, nor the C80, but
dramatically changes the sound – and the pitch coherence. By chance the second syllable
is not seriously degraded, but the first one is – at least in this 1/3 octave band
The sound quality is markedly degraded. We need a measure for this perception.
“one,two” 2500Hz band – equal mix of
direct and one diffuse reflection at 30ms.
The high pitch coherence and high direct/reverberant ratio in the first 30ms is easily seen at
the start of each syllable.
Segment of opera – old Bolshoi
Segment from
the old
Bolshoi
Segment from
the new
Bolshoi. (I
was unable to
produce a
similar plot.)
Segment of Verdi – pitch coherence of the 2500Hz 1/3 octave band. F, F, glide to A.
Recording from the back of the first balcony. There is no obvious gap before reflections
arrive, and the pitch coherence appears relatively high.
Sound examples – syllables “one,two,three” with no
reverberation
[Plots for the 1kHz, 1.25kHz, 1.6kHz, 2kHz, 2.5kHz, and 3.2kHz 1/3 octave bands]
Note the height and frequency of the pitch coherence peaks are (almost) uniform through all
bands.
Maximum pitch coherence vs 1/3 octave band
for non-reverberant speech
The syllables “one
two three four five
six seven” are
analyzed.
Note that the
maximum pitch
coherence is
relatively constant
across all 1/3 octave
bands, although the
value depends on
the particular vowel
“one,two,three” convolved with 20ms noise
[Plots for the 1kHz, 1.25kHz, 1.6kHz, 2kHz, 2.5kHz, and 3.2kHz 1/3 octave bands]
Note that most of the pitch coherence has been eliminated
Maximum pitch coherence vs 1/3 octave bands
for speech convolved with 20ms noise.
The syllables “one
two three four five
six seven” are
analyzed.
Note the pitch
coherence is low and
not constant across
third octave bands.
Pitch coherence of speech with a
diffuse reflection at a level of 0dB
[Plots for the 1kHz, 1.25kHz, 1.6kHz, 2kHz, and 2.5kHz 1/3 octave bands]
Note the low pitch coherence for some of the syllables in several bands
Maximum pitch coherence vs 1/3
octave bands for direct + reverb at 0dB
Analysis of the
syllables “one two
three four five six
seven.”
Note the low and
noise-like coherence
for most of the
syllables.
Pitch coherence of speech with a diffuse
reflection at a level of -4dB (optimum)
[Plots for the 1kHz, 1.25kHz, 1.6kHz, 2kHz, 2.5kHz, and 3.2kHz 1/3 octave bands]
Note the high pitch coherence on most syllables in most bands. This reflection level is usually
chosen as optimum.
Max pitch coherence vs 1/3 octave
band for direct and reflected at -4dB
Analysis of the syllables
“one two three four five
six seven.”
Note the pitch coherence
is both high and uniform
across 1/3 octave bands
Teatro Alla Scala, Milan
Echograms from La Scala.
(From Hidaka and Beranek)
illustrate these profiles:
Top curve - 2kHz octave band,
0-200ms
At 2kHz note the high direct
sound and low level of
reflections in the 50-150ms
time range.
Bottom curve - 500Hz octave
band 0-200ms
Note the high reverberation
level – and short critical
distance.
Let’s listen to Alla Scala!
• Matlab can be used to read these printed impulse responses and
convert them into real impulse responses.
– 1. First we read the .bmp file from a scan, and convert the peaks in the file
to delta functions with identical time delay, and an amplitude equivalent to
the peak height.
• All the direct sound energy is combined into a single delta function, and the
level of the direct sound is normalized (relative to the rest of the decay), so the
2kHz and 500Hz impulses can be accurately combined.
– 2. We then apply a random variable of about ±5ms to the delay time to correct
for the quantization in the scan.
– 3. We then extend the echogram to higher times by tacking on an
exponentially decaying segment of white noise, with a decay rate equal to
the published data for the hall.
– 4. We then filter the result for the 2kHz echogram with a 1kHz high-pass
filter, and combine it with the 500Hz echogram low-pass filtered at 1kHz.
– 5. If desired we can create a “right channel” and a “left channel”
reverberation by using a different set of random variables in steps 2 and 3.
– 6. We convolve a segment of dry sound with the new impulse response.
– The result is sonically quite convincing!
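The steps above can be sketched in code. The talk used Matlab; the Python below is only an illustrative outline with hypothetical function names, and it assumes the echogram peaks have already been read from the scan as (delay in ms, amplitude) pairs. It covers steps 2, 3, 5, and 6 and omits the scan reading of step 1 and the band filters of step 4. The level at which the noise tail starts (the amplitude of the last peak) is an assumption, since the slide does not specify it, and the peak values are made up for the example.

```python
import random


def build_impulse_response(peaks, fs, rt60_s, length_s, jitter_ms=5.0, seed=0):
    """Turn echogram peaks into an impulse response (list of samples).

    peaks: (delay_ms, amplitude) pairs; the first pair is the direct sound.
    """
    rng = random.Random(seed)
    n = int(length_s * fs)
    ir = [0.0] * n
    last = 0
    for i, (delay_ms, amp) in enumerate(peaks):
        if i > 0:
            # Step 2: randomize reflection delays to undo scan quantization.
            delay_ms += rng.uniform(-jitter_ms, jitter_ms)
        index = int(round(delay_ms * fs / 1000.0))
        # Keep reflections after the direct sound and inside the buffer.
        index = min(n - 1, max(1 if i > 0 else 0, index))
        ir[index] += amp
        last = max(last, index)
    # Step 3: exponentially decaying white noise, 60dB down after rt60_s.
    tail_start = abs(peaks[-1][1])  # assumed starting level of the tail
    for k in range(last + 1, n):
        decay = 10.0 ** (-3.0 * (k - last) / (fs * rt60_s))
        ir[k] = tail_start * decay * rng.gauss(0.0, 1.0)
    return ir


def convolve(dry, ir):
    """Step 6: direct-form convolution of a dry signal with the response."""
    out = [0.0] * (len(dry) + len(ir) - 1)
    for i, x in enumerate(dry):
        if x != 0.0:
            for j, h in enumerate(ir):
                out[i + j] += x * h
    return out


peaks = [(0.0, 1.0), (20.0, 0.5), (45.0, 0.3)]  # made-up example values
# Step 5: different random seeds give decorrelated left and right channels.
left = build_impulse_response(peaks, fs=8000, rt60_s=1.2, length_s=0.5, seed=1)
right = build_impulse_response(peaks, fs=8000, rt60_s=1.2, length_s=0.5, seed=2)
wet_left = convolve([1.0, 0.0, 0.5], left)
```

The direct-form convolution is slow for long signals; it is used here only to keep the sketch free of external libraries.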
Alla Scala at 500Hz – reading the plot
Top curve – 500Hz measured
impulse response as given by
Beranek. JASA Vol. 107 #1,
Jan 2000, pp 356-367
Bottom curve – impulse
response as regenerated from
delta functions, passed through
a 500Hz 6th order 1 octave
filter.
Note the correspondence is
more than plausible.
Alla Scala 500Hz – randomizing and extending
Top graph: Alla Scala published data
Bottom graph: regenerated impulse
response after randomization and
extension.
Pitch coherence of speech in La Scala
[Plots for the 1kHz, 1.25kHz, 1.6kHz, 2kHz, 2.5kHz, and 3.2kHz 1/3 octave bands]
Note the excellent sharpness of the pitch peaks, and good consistency across bands.
Maximum coherence vs 1/3 octave
bands La Scala, Milan
Pitch coherence is
similar to our
example where the
direct/reverberant
ratio ~=4dB
While not as clear as
in some examples,
fundamental pitch is
easily extracted
using this simple
detector.
Listen to Alla Scala, NNT Tokyo, Semperoper
2kHz
500Hz
2kHz and 500Hz
Impulse
responses from
Scala Milan
NNT Theater
Tokyo
Semper Oper
Dresden
Original Sound
(All data from
Hidaka and
Beranek)
Pitch Coherence – NNT opera house, Tokyo
[Plots for the 1kHz, 1.25kHz, 1.6kHz, 2kHz, 2.5kHz, and 3.2kHz 1/3 octave bands]
Note the peaks – where they exist – are very broad, indicating inexact pitch extraction. For most
bands, there is no extracted pitch for all syllables.
Maximum coherence vs 1/3 octave
band NNT Opera Theater, Tokyo
Fundamental pitch is
not extractable using
this simple detector.
Binaural Examples in Opera Houses
It is very difficult to study opera acoustics, as the
sound changes drastically depending on:
1. the set design,
2. the position of the singers (actors),
3. the presence of the audience, and
4. the presence of the orchestra.
Binaural recordings made during performances
give us the only clues.
Here is a sound bite from a famous German opera
house:
Note the excessive distance of
the singers, and the low intelligibility. This
is MUD in action!
And here is an example from another famous
German opera house:
Note the increase
in intelligibility, reduced distance, and the
improvement in dramatic connection
between the singer and the audience.
Synthetic Opera House Study
• We can use MC12 Logic 7 to separate the orchestra from the singers on
commercial recordings, and test different theories of balance and reverberation.
• From Elektra – Barenboim. Balance in original is OK by Barenboim.
Original
Orchestra Left&Right
Vocals
Downmix - No reverb
on the singers
Reverb from orchestra
Reverb from singers
Downmix with reverb
on the singers.
Muddiness: Dry Speech + 20ms
noise
Mono speech signal
Convolved with noise
– (diffuse reflections)
Mono:
Stereo:
Note the reflections
increase muddiness
and distance.
The stereo version is
more natural than the
mono, but equally
distant.
Recorded speech in Covenant
Voice segment
recorded at 1.5m
with a supercardioid
mike
The same segment
with the reflections
below.
Note the muddiness
increases
dramatically
A frequency-flat
reflection pattern
with peak energy
about 30ms after the
direct sound
Demo 1: Clarity
• Demonstrate dry sound
• Demonstrate muddy sound by adding reflections in
monaural.
– Note that adding only very early reflections does not
decrease the intelligibility.
– But it increases the perceived distance of the source.
• Demonstrate adding reflections in surround
– Note that adding the reflections in surround increases the
perceived distance more effectively.
– Less reflected energy is needed, and the direct sound
remains clear.
• The optimum early energy is between -4dB and -6dB