The Physics and Psychophysics of Surround Recording

Multichannel solutions for sound
enhancement and acoustic
conditions in concert halls and operas
David Griesinger
Lexicon
www.world.std.com/~griesngr
Major Goals
• To explain and demonstrate the degree to which the “acoustics” of
halls and operas may not be the same as the “sound” in these spaces.
– The dependence of acoustics on visual aspects of architecture and on the
expectations of the listeners may be underappreciated.
• To show how physics and psychoacoustics combine to produce
absolute standards of acoustic quality for sound in opera houses and
concert halls.
– To suggest that “sonic distance” – the perceived audible distance between
a performer and a listener – is the major descriptor of this acoustic quality
in an opera house.
• To explain and demonstrate how electronic acoustic enhancement can
be used to achieve higher sonic quality in some halls.
• To play as many musical examples as possible – using multichannel
discrete surround and two channel to five channel conversion.
What constitutes good sound?
Leo Beranek [JASA 107
pp368-383 Jan. 2000] –
rank ordered houses by
asking conductors to fill
out a questionnaire.
Semperoper Dresden is
ranked nearly at the top, as
is the Teatro alla Scala.
But the SOUND of these
two theaters is extremely
different. Semperoper is
highly reverberant, and La
Scala is highly damped.
In practice the remembered
“sound” of an opera house
can depend strongly on
non-sonic factors.
High-Definition Demo
• Brahms F minor Piano
Quintet
– Performed by the faculty of
the Point-Counter-Point
Summer camp.
– Video is high-definition
(with some artifacts.)
– Audio is two channel,
single microphone pick-up.
– Played here (after
post-production) with two-channel
to five-channel processing.
Why is there so much confusion?
• 1. Research methods based on questionnaires suffer from
two fundamental properties of acoustic perception:
– The suppression of acoustic perception after a short time period.
– The inability to accurately remember the sound quality.
• 2. We might be asking the wrong questions to the wrong
people
– The conductor is only one of the many people who work to present
opera to the public
– For most of these people the music is secondary to the drama.
Their job is to get the story and the emotion to the audience.
– To most people involved in opera production the Clarity of the
singers and the balance between them and the orchestra is of the
utmost importance.
Measurement methods for halls and operas
are inadequate and often misleading
• Sabine’s reverberation time is useful, but it is the combination of
reverberation time and reverberation level that we perceive.
• Jordan’s EDT measure was intended to measure the direct/reverberant
ratio.
– But EDT is based on the decay of very long sounds, and does not measure
the hall response to short sounds.
• Schroeder’s method of measuring EDT (which is now an international
standard) gives results that are independent of the direct/reverberant
ratio.
– Schroeder misunderstood the purpose of the measure.
– His method yields results essentially identical to the reverberation time.
• C80 and related measures use 80ms as a division point between early
and late.
– But in fact human perception utilizes THREE time regions: 0-50ms, 50-150ms,
and 150ms+. Intelligibility correlates best with C50, not C80, and reverberance
correlates best with the ratio between the energy from 0-50ms to the energy 150ms
and greater.
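The measures named above can be sketched in a few lines. This is a minimal illustration, not a standards-compliant implementation: it assumes a broadband impulse response whose first sample is the direct sound, and the function names and the toy impulse response are hypothetical.

```python
import math

def band_energy(ir, fs, t_start, t_end=None):
    """Energy (sum of squares) between t_start and t_end seconds; ir[0] is the direct sound."""
    i0 = int(round(t_start * fs))
    i1 = len(ir) if t_end is None else int(round(t_end * fs))
    return sum(x * x for x in ir[i0:i1])

def clarity_db(ir, fs, split_ms):
    """C50 or C80 style index: energy before the split over energy after it, in dB."""
    split = split_ms / 1000.0
    return 10.0 * math.log10(band_energy(ir, fs, 0.0, split) / band_energy(ir, fs, split))

def early_vs_late_db(ir, fs):
    """Ratio of the 0-50ms energy to the energy at 150ms and later, in dB."""
    return 10.0 * math.log10(band_energy(ir, fs, 0.0, 0.050) / band_energy(ir, fs, 0.150))

# Toy impulse response (an assumption): unit direct sound plus a smooth 1.4s decay.
FS = 8000
RT = 1.4
toy_ir = [0.05 * math.exp(-6.908 * n / (FS * RT)) for n in range(2 * FS)]
toy_ir[0] = 1.0

print(clarity_db(toy_ir, FS, 50), clarity_db(toy_ir, FS, 80), early_vs_late_db(toy_ir, FS))
```

The third function skips the 50-150ms region entirely, which is the point of the three-region view: energy in that region counts neither as useful early energy nor as reverberance.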
It is difficult to remember the sound of
acoustics
• Human physiology suppresses acoustic perception.
– After 5 to 10 minutes in a particular space we lose the ability to
perceive its acoustic properties.
– Work by Shinn-Cunningham suggests that the process of extracting
speech information from acoustic interference is adaptive.
• We adapt to a particular situation in 5 to 10 minutes, and the
adaptation is unconscious.
• After the adaptation period, muddiness (mulmig or
glauque) becomes difficult to perceive and to remember.
• As a consequence, it is difficult to remember the properties
of an acoustic space, particularly for speech.
– Unless intelligibility is seriously compromised.
• We need to compare acoustic sounds BEFORE our
physiology adapts to them.
– We need relatively rapid A/B comparisons to accurately rank
acoustic quality.
Boston Cantata Singers in Jordan Hall
Cantata Singers Rake’s Progress
Performance in
Jordan Hall, January
26, 2003.
Reverberation time
in Jordan ~1.4
seconds at 1000Hz.
This is similar to the
Semperoper
Dresden.
The typical audience
member is ~ 3
reverb radii from this
singer. (reverb 10dB
stronger than direct)
The dramatic
consequences are
highly audible.
It is amazing that in spite of the enormous acoustic distance, the performers still manage to
project emotion to the listener. The performance received fabulous reviews. But the situation is
not ideal. One reviewer commented on the regrettable lack of surtitles. The opera is in English.
Distance in Jordan Hall
• Reverberation time (full) measured as ~1.4 seconds at
1000Hz.
• Reverberation radius ~ 10 feet inside the stage house, ~14
feet in the hall.
• Thus a typical listener will be ~ 3 reverberation radii away
from a singer who is fully upstage. This implies a
direct/reflected ratio of –10dB.
• Jordan Hall is not renowned as an opera venue – perhaps
we are hearing why.
• But the size and reverberation time are almost identical to
the Semperoper Dresden, which is currently regarded as
one of the best!
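The –10dB figure can be checked with a short calculation. This sketch assumes the idealised diffuse-field model: the direct sound falls 6dB per doubling of distance and equals the reverberant level at exactly one reverberation radius. The function name is illustrative.

```python
import math

def direct_to_reverb_db(distance, reverb_radius):
    """Direct/reverberant ratio in dB for a listener at `distance` from the source."""
    return -20.0 * math.log10(distance / reverb_radius)

# A listener 3 reverberation radii away, using the ~14 foot radius quoted for the hall.
print(round(direct_to_reverb_db(3 * 14.0, 14.0), 1))
```

Three radii give about –9.5dB, which the slide rounds to –10dB.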
Binaural Recordings
• Manfred Schröder suggested that Binaural recordings
could be used to compare different concert halls in the
laboratory.
• The method has many difficulties
– Matching of pinnae shape of the microphone to the listener.
– Matching of the playback equipment to the listener.
– These difficulties are particularly acute in studying concert hall
acoustics.
– Schröder suggested the use of a “cross-talk canceller” to solve
some of these problems.
• However, in our experience the differences between opera
houses are so large that relatively simple recording and
playback equipment can capture the essential aspects of the
sound.
– And that these differences can easily be heard even with
loudspeaker playback.
Glasses microphones
“dual” lavaliere
microphones from
Radio Shack plug
directly into a minidisk recorder.
The result is free of
diffraction from the
pinnae of the person
making the recording,
which is an
advantage.
When combined with a calibrated pair of headphones, this system reproduces sonic
distance, intelligibility, and envelopment quite well.
Binaural Examples in Opera Houses
It is very difficult to study opera acoustics, as the
sound changes drastically depending on:
1. the set design,
2. the position of the singers (actors),
3. the presence of the audience, and
4. the presence of the orchestra.
Binaural recordings made during performances
can give us important clues.
Here is a short example from the Semper Oper
Dresden. This hall was rebuilt in 1983, and
considerable effort was expended to increase
the reverberation time. The RT is over 1.5
seconds at 1000Hz, which implies a
reverberation radius of under 14’.
This hall is ranked nearly the best in Leo’s
survey. Note the excessive distance of the
singers, and the low intelligibility
Staatsoper “unter den Linden” Berlin
The Staatsoper Berlin
is similar in size to the
Semperoper, and the
acoustics in Berlin are
probably much closer
to the original acoustics
in Dresden
RT at 1000Hz ~0.9s
(without LARES).
With LARES the RT at
1000Hz is ~1.1s, but
the RT is ~1.7s at
200Hz.
Here is a recording
made from the parquet,
about two-thirds of the way
to the back wall.
Although this hall does not even appear in Leo’s survey, it is
currently by far the most vital of the Berlin Opera houses.
Deutsche Oper, Berlin
In spite of the
impressive wood
paneling, the sound
in this hall is rated
between “pretty
poor” and “ghastly”
by the people I
interviewed during a
site visit.
It is perhaps significant that this hall is moribund. They are searching for both a new music
director and a new general manager. Concerning the acoustics, I was told that they are just
waiting for the architect to die, so they can re-design it.
But how should it be redesigned? Just what is wrong with it as it is?
Bolshoi
The old Bolshoi
in Moscow is
similar in
design to the
Staatsoper but
larger. The
recording was
made from the
back of the
second ring,
and is
monaural.
RT ~ 1.1
seconds at
1000Hz, rising
at low
frequencies.
In my opinion the sound in this hall is extremely good. The dramatic impact of the
singers is phenomenal for such a large hall, and envelopment in the parquet is high.
This theater is extremely popular – nearly impossible to get into without paying a
scalper ~$100.
New Bolshoi
The New Bolshoi
is very similar to
the Semperoper
Dresden. The
Semperoper was
the primary
model for the
design.
RT ~1.3 seconds
at 1000Hz.
The general manager views this theater as unsuccessful acoustically. There
have been many complaints – the singers are both too loud and too hard to
hear. This theater suffers greatly from having the old Bolshoi next door!
What is it about
the SOUND of
this theater that
makes
communication
with the singers
so difficult?
The Sound of Opera – the blind opera fan.
• What distinguishes the SOUND of the New Bolshoi from the
Staatsoper Berlin, or the Royal Theater, Copenhagen?
– Reverberation time?
– Intelligibility?
– Envelopment?
– Balance?
– All might be involved
• An informal poll of acousticians gave the result that EVERY ONE
thought 1.5 seconds was the ideal reverberation time.
– And yet the two Bolshoi theaters dramatically contradict this idea.
• Intelligibility in ALL the theaters I have visited is satisfactory. Here is
dialog from the Semperoper:
• Envelopment in the parquet of the old Bolshoi is high, even with a low
reverberation time. Here is a segment from “Giselle”
• Balance IS important – but it is not sufficient to explain the differences
we hear.
Balance between the orchestra and the soloists
Reverberation time affects
balance, due to the
directional properties of the
human voice. Note that the
loudness of the orchestra
increases about 1.5dB as
RT rises from 1s to 1.5s.
This rise is not sufficient to
explain the large dramatic
differences between
Semperoper Dresden and
Staatsoper Berlin.
Sonic Distance
• Even casual listening to the examples in this paper reveals
that the most obvious difference is how far away the voices
seem.
• Loudness is a primary distance cue.
– This distance cue can be overcome by trained actors and singers,
who know how to project their voices with sufficient energy.
– If you have the money you can hire singers with more vocal power.
• The main secondary cue for distance is the ratio between
the loudness of the direct sound and reflected energy that
arrives between 50 and 150ms after the direct sound.
• When this energy is excessive the singers can sound loud,
but muddled and far away.
– Dramatic connection between the actors and the audience suffers.
Human sound perception – Separation of the
sound field into foreground streams.
• Acousticians are entranced with reflections – rather arbitrarily divided
into “early” and “late”.
– But human perception works differently.
– Human brains evolved to understand speech, and to ignore reflections.
Third-octave filtered speech.
Blue 500Hz.
Red 800Hz
Speech consists of a series of
foreground sound events
separated by periods of
relative silence, in which the
background sound can be
heard.
One of the most important preliminary functions of
human hearing is stream formation
• Foreground sound events (phones or notes) must be
separated from a total sound field containing both
foreground and background sounds (reverberation, noise).
• Foreground events are then assembled into streams of common
direction and/or timbre.
• A set of events from a single source becomes a sound stream, or a
sound object. A stream consists of many sound events.
– Meaning is assigned to the stream through higher level neural functions,
including phoneme recognition and the combination of phonemes into
words.
• Stream separation is essential for understanding speech
– When the separation of sound streams from noise is easy,
intelligibility is high.
• Separation is degraded by noise and reverberation.
• This degradation can be measured by computer analysis of binaural
speech recordings.
• Stream formation is entirely sub-conscious.
– We can consciously choose which stream to listen to, but we can not
influence the separation process.
Separation of binaural speech through analysis of
amplitude modulations
[Figure panels: reverb forward; reverb backward]
Analysis into 1/3 octave bands,
followed by envelope
detection.
Green = envelope
Yellow = edge detection
By counting edges above a
certain threshold we can
reliably count syllables in
reverberant speech.
This process yields a measure
of intelligibility – not distance.
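The edge-counting idea can be sketched as follows. This is a simplified single-band version, so the 1/3 octave filter bank is omitted, and the smoothing window, rise threshold and dead time are assumed values chosen for illustration rather than those used in the analysis shown on the slide.

```python
import math

def envelope(signal, fs, window_ms=20.0):
    """Rectify and smooth with a moving average to approximate the band envelope."""
    n = max(1, int(fs * window_ms / 1000.0))
    env, acc = [], 0.0
    for i, x in enumerate(signal):
        acc += abs(x)
        if i >= n:
            acc -= abs(signal[i - n])
        env.append(acc / n)
    return env

def count_onsets(env, fs, rise_db=6.0, span_ms=30.0, dead_ms=100.0):
    """Count rising edges: points where the envelope climbs rise_db within span_ms."""
    span = max(1, int(fs * span_ms / 1000.0))
    dead = int(fs * dead_ms / 1000.0)
    floor = 1e-6  # keeps the ratio finite during silence
    count, last = 0, -dead
    for i in range(span, len(env)):
        rise = 20.0 * math.log10((env[i] + floor) / (env[i - span] + floor))
        if rise >= rise_db and i - last >= dead:
            count += 1
            last = i
    return count

# Synthetic "syllables": five 100ms tone bursts at 500Hz separated by 150ms of silence.
FS = 4000
burst = [math.sin(2 * math.pi * 500 * k / FS) for k in range(400)]
bursts = [0.0] * 400
for _ in range(5):
    bursts += burst + [0.0] * 600
steady = [0.0] * 400 + burst * 5

print(count_onsets(envelope(bursts, FS), FS), count_onsets(envelope(steady, FS), FS))
```

When reverberation fills the gaps between syllables the envelope no longer rises sharply, so fewer edges cross the threshold. That is the effect the forward and backward reverberation examples illustrate.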
Analysis of binaural speech
• We can then plot the syllable onsets as a function of
frequency and time, and count them.
Reverberation forward
Note many syllables are detected (~30)
Reverberation backwards
Notice hardly ANY are detected (~2)
RASTI will give an identical value for both cases!!
How do we perceive distance and space?
• Reflected energy interferes with itself at the listener’s ears,
producing fluctuations in the sound pressure.
• We perceive fluctuations in level during a sound event and
up to 150ms after the end of the sound as a sense of
distance from the sound source.
• If the reflections are spatially diffuse (from all directions)
the fluctuations will be different in each ear.
– Fluctuations that occur during the sound event and within 50ms
after the end of the event produce both a sense of distance and the
perception of a space around the source.
• This is Early Spatial Impression (ESI)
• The listener is outside the space – and the sound is not enveloping
• But the sense of distance is natural and pleasant.
– Spatially diffuse reflections later than 50ms after the direct sound
produce a sense of space around the listener.
• This can be perceived as envelopment. (Umgebung)
The downside of Distance Perception
• Reflections during the sound event and up to 150ms after
it ends create the perception of distance
• But there is a price to pay:
– Reflections from 10-50ms do not impair intelligibility.
• The fluctuations they produce are perceived as an acoustic “halo” or
“air” around the original sound stream. (ESI)
– Reflections from 50-150ms contribute to the perception of distance
– but they degrade both timbre and intelligibility, producing the
perception of sonic MUD. (Mulmig, Glauque)
• The addition of mud to a speech or singing voice has
serious dramatic consequences
Distance and Drama: Copenhagen New Stage
We were asked to
improve speech
intelligibility in this
theater, specifically
for drama.
Using some
extraordinary
technology we
succeeded.
But we also increased
the sense of sonic
distance.
The theater directors
decided to fix the
intelligibility problems
by improving the
diction of the actors.
We completely
agreed!
Example of reflections in the 50-150ms range
Balloon burst in the
New Bolshoi.
Source was on the
forestage, and the
receiver was in the
stalls at row 10.
Note the HUGE burst
of energy about 50ms
after the direct sound.
The 1000Hz octave
band shows the
combined reflections
to be 6dB stronger
than the direct sound.
The sound clip
shows the result of
this impulse
response on speech.
The result (in this case) is a decrease in intelligibility and an increase in distance
Human Perception – the background sound stream
• We perceive the background sound stream in the spaces
between the individual sound events.
• The background stream is perceived as continuous, even though it
may be rapidly fluctuating.
• When masking by foreground sounds is low the background stream is
perceived at an absolute level, not as a ratio to the foreground sound.
– This is why playing a recording at a higher level causes the perceived
amount of reverberation to increase.
• Perception of the background stream is inhibited for 50ms after the
end of a sound event, and reaches full sensitivity only after 150ms.
Example of foreground/background
perception (as a cooledit mix)
Series of tone
bursts (with a
slight vibrato)
increasing in
level by 6dB
Reverberation
at constant
level
Mix with
direct
increasing 6dB
Result: background tone seems continuous and at constant level
Example of background loudness as
a function of Reverberation Time
Tone bursts at
constant level,
mixed with
reverberation
switching from 0.7s
RT to 2.0s RT, and
reducing in level
~8dB
Output – perceived
background is
constant! (But the
first half is
perceived as farther
away!)
Note the reverb level in the mix is the same at 150ms and greater. One gets the same
results with speech.
Summary: Perceptions relating to stream
separation
• First is the creation of the foreground stream itself. The major
perception is intelligibility
• Second is the formation of the background sound stream from sounds
which occur mostly 150ms after the direct sound ends. The perception
is reverberance
• Third is the perception of Early Spatial Impression (ESI) from
reflections arriving between 10-15ms and 50ms after the end of the
direct sound. The perception is of distance and acoustic space
around the source.
• Fourth is the timbre alteration and reduction of intelligibility due to
reflections from 50 to 150ms after the end of the direct sound event.
The perception is MUD and distance.
• Human hearing has been designed to suppress the perception of ESI
and of mud. As long as intelligibility is more or less satisfactory, after
an adaptation period we no longer hear these properties of the room.
– And we usually can not remember them.
– This does NOT mean they are dramatically or artistically unimportant!
Synthetic Opera House Study
• Berlin
• Dresden
We can use MC12 Logic 7 to separate the orchestra from the singers on
commercial recordings, and test different theories of balance and reverberation.
From Elektra – Barenboim. Balance and reverb in the original are OK by Barenboim.
Original Mix
Vocals
Downmix – with
reverb on the
orchestra, but not on
the singers
Reverb from orchestra
Reverb from singers
Downmix with reverb
on the singers. Note
the result is MUDDY!
Localization
• Localization is related to stream formation. It depends
strongly on the onset of sound events.
– IF the rise-time of the sound event is more rapid than the rise-time
of the reverberation:
– then during the rise time the IID (Interaural Intensity Difference)
and the ITD (Interaural Time Difference) are unaffected by
reflections.
• We can detect the direction of the sound source during this brief
interval.
• Once detected, the brain HOLDS the detected direction during the
reverberant part of the sound.
• And gives up the assigned direction very reluctantly.
– The conversion between IID and ITD and the perceived direction
is simple in natural hearing, but complex (and unnatural) when
sound is panned between two loudspeakers.
• Sound panning only works because localization detection is both
robust and resistant to change.
• A sound panned between two loudspeakers is profoundly unnatural.
Detection of lateral direction through
Interaural Cross Correlation (IACC)
Start with
binaurally recorded
speech from an
opera house,
approximately 10
meters from the
live source.
We can decompose
the waveform into
1/3 octave bands
and look at level
and IACC as a
function of
frequency and
time.
[Figure panels: Level and IACC; x = time in ms, y = 1/3 octave bands from 640Hz to 4kHz]
Notice that there is NO information in the IACC below 1000Hz!
Position determination by IACC
We can make a histogram of
the time offset between the
ears during periods of high
IACC.
For the segment of natural
speech in the previous slide, it
is clear that localization is
possible – but somewhat
difficult.
Position determination by IACC (continued)
Level displayed in 1/3 octave bands (640Hz to 4kHz)
IACC in 1/3 octave bands
We can duplicate the sound of the previous example by adding reverberation to dry
speech, and giving it a 5 sample time offset to localize it to the right.
As can be seen in the picture, the direct sound is stronger in the simulation than in the
original, and the IACCs - plotted as 10*log10(1-(1/IACC)) - are stronger.
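The time-offset detection can be sketched as a search for the lag that maximises the normalised interaural cross-correlation. In this illustration a short burst of noise stands in for dry speech, the 5 sample offset follows the text above, and the added uncorrelated noise is only a crude stand-in for diffuse reverberation. The function name and the sign convention (positive lag means the left ear is delayed, so the source is to the right) are assumptions.

```python
import math
import random

def iacc_peak(left, right, max_lag):
    """Return (lag, value) of the largest normalised cross-correlation within +/- max_lag samples."""
    norm = math.sqrt(sum(x * x for x in left) * sum(y * y for y in right))
    best_lag, best_val = 0, -1.0
    for lag in range(-max_lag, max_lag + 1):
        acc = 0.0
        for n in range(len(left)):
            m = n - lag  # compare left[n] with the right ear signal `lag` samples earlier
            if 0 <= m < len(right):
                acc += left[n] * right[m]
        if acc / norm > best_val:
            best_lag, best_val = lag, acc / norm
    return best_lag, best_val

rng = random.Random(0)
dry = [rng.gauss(0.0, 1.0) for _ in range(2000)]
right = dry
left = [0.0] * 5 + dry[:-5]  # left ear hears the source 5 samples later
clean = iacc_peak(left, right, 20)

# Uncorrelated noise in each ear lowers the peak without moving it.
noisy_left = [x + rng.gauss(0.0, 1.0) for x in left]
noisy_right = [x + rng.gauss(0.0, 1.0) for x in right]
noisy = iacc_peak(noisy_left, noisy_right, 20)

print(clean, noisy)
```

Collecting the detected lag over many short frames, and keeping only frames with a high peak value, would give a histogram of the kind described in these slides.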
Position determination by IACC (continued)
Histogram of the time
offset in samples for each
of the IACC peaks
detected, using the
synthetically constructed
speech signal in slide 2.
Not surprisingly, due to the higher direct sound level and the artificially
stable source the lateral direction of the synthetic example is extremely clear
and sharply defined.
Summary so far…
• Rank ordering opera houses or concert halls through the
memory of conductors is probably not very useful.
• When the sounds of a house can be compared rapidly
(through electronic enhancement or recording) there is
almost unanimous agreement on the best sound, and this
sound is highly articulate.
• The conductor will insist on some low-frequency
envelopment on the orchestra, as long as vocal clarity is
not compromised.
• Considerable experimentation has found that there is an
ideal reverberation profile for opera performances.
– This profile is based on the physiological properties of human
hearing
– And is thus the same profile as we need on a good recording.
The Ideal Reverberation above 1000Hz.
The ideal profile has three
distinct slopes.
1. Reflections in the 20ms to
50ms time range with a total
energy of -4dB to -6dB
relative to the direct sound
combine with the direct
sound to produce a decay
rate under 1 second RT.
2. Reflections in the 50ms to
150ms time range decay
much more gradually – with
a slope greater than 2
seconds RT.
3. Reflections after 150ms
produce our perception of
reverberance, and should
decay at a rate appropriate to
the music.
Aside – this profile is a bit of a theoretical concept.
Measurement data in halls is sufficiently chaotic
and place dependent to prevent one from actually
observing a triple slope!
Most real rooms (at all frequencies) have
exponential decay
Exponential decay produces
a single-slope.
If the direct sound is strong
enough the effective early
decay can be short.
- But then there will be too
few early reflections and the
late reverberation will be
weak.
If the direct sound is weak,
there will be too much energy
between 50 and 150ms, and
the sound will be MUDDY.
The ideal reverberation profile is
frequency dependent
• For frequencies above 1kHz (speech) the ideal profile has
three distinct slopes
– 1. The early slope – consisting of the direct sound and the 0-50ms
reflections. This slope is steeply down – less than 1 sec RT.
– 2. The middle slope – 50 to 150ms – is relatively flat – can have
an RT of 3s or more. This flat section of the profile maximizes the
late reverberant level while minimizing the muddiness.
– 3. The slope of the decay beyond 150ms can be around 1.3
seconds RT for opera and up to 2 seconds RT for orchestra (if the
early slope is short enough to maintain clarity.)
• Below 500Hz the decay probably should be single sloped,
with RT of 1.7s or higher.
– This is because in our experience a single slope decay at low
frequencies produces the most pleasing sound on an orchestra.
• Thus in a hall with natural acoustics the reverberation time
and reverberation level should increase below 500Hz.
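The high-frequency profile can be written down as a piecewise-linear decay level. This is a sketch only: the specific values (0.8s for the early slope, 3s for the middle, 1.3s for the late slope) are picked from the ranges quoted above, and the function name is hypothetical.

```python
def decay_level_db(t, rt_early=0.8, rt_mid=3.0, rt_late=1.3):
    """Level in dB relative to t = 0 for a three-slope decay.

    A reverberation time of x seconds corresponds to a slope of 60/x dB per second.
    """
    segments = (
        (0.0, 0.050, rt_early),          # direct sound and 0-50ms reflections
        (0.050, 0.150, rt_mid),          # 50-150ms: relatively flat
        (0.150, float("inf"), rt_late),  # late reverberation
    )
    level = 0.0
    for start, end, rt in segments:
        if t > start:
            level -= 60.0 / rt * (min(t, end) - start)
    return level

for t in (0.0, 0.050, 0.150, 0.500, 1.0):
    print(t, round(decay_level_db(t), 2))
```

With these values the level drops steeply in the first 50ms, only about 2dB more across the 50-150ms region, and then follows the late slope.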
Teatro alla Scala, Milan
Echograms from La Scala
(from Beranek) illustrate these
profiles:
Top curve - 2kHz octave band,
0-200ms
At 2kHz note the high direct
sound and low level of
reflections in the 50-150ms
time range.
Bottom curve - 500Hz octave
band 0-200ms
Note the high reverberation
level – and short critical
distance.
Let’s listen to Alla Scala!
• Matlab can be used to read these printed impulse responses and convert
them into real impulse responses.
– 1. First we read the .bmp file from a scan, and convert the peaks in the file
to delta functions with identical time delay, and an amplitude equivalent to
the peak height.
• All the direct sound energy is combined into a single delta function, and the
level of the direct sound is normalized (relative to the rest of the decay), so the
2kHz and 500Hz impulses can be accurately combined.
– 2. We then apply a random variable ~+- 5ms to the delay time to correct
for the quantization in the scan.
– 3. We then extend the echogram to higher times by tacking on an
exponentially decaying segment of white noise, with a decay rate equal to
the published data for the hall.
– 4. We then filter the result for the 2kHz echogram with a 1k high-pass
filter, and combine it with the 500Hz echogram low-pass filtered at 1kHz.
– 5. If desired we can create a “right channel” and a “left channel”
reverberation by using a different set of random variables in steps 2 and 3.
– 6. We convolve a segment of dry sound with the new impulse response.
– The result is sonically quite convincing!
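Steps 1, 2, 3 and 5 of the procedure above can be sketched in Python rather than Matlab. Reading the scanned .bmp, the band filtering in step 4 and the final convolution are left out. The peak list, the 1.2s decay time and the tail level below are hypothetical placeholder values, not data from the published echograms.

```python
import math
import random

def rebuild_ir(peaks, fs, rt, total_s, tail_level, jitter_ms=5.0, seed=0):
    """Rebuild an impulse response from (delay_s, amplitude) peaks read off a plot."""
    rng = random.Random(seed)
    ir = [0.0] * int(fs * total_s)
    for delay, amp in peaks:
        if delay == 0.0:
            ir[0] += amp  # direct sound stays a single delta at time zero
            continue
        # Step 2: randomise each reflection delay by up to +/- jitter_ms.
        jitter = rng.uniform(-jitter_ms, jitter_ms) / 1000.0
        idx = int(round((delay + jitter) * fs))
        ir[min(len(ir) - 1, max(1, idx))] += amp
    # Step 3: extend with exponentially decaying white noise (60dB drop in rt seconds).
    tail_start = int(round(max(d for d, _ in peaks) * fs))
    for n in range(tail_start, len(ir)):
        decay = math.exp(-6.908 * (n - tail_start) / (fs * rt))
        ir[n] += tail_level * decay * rng.gauss(0.0, 1.0)
    return ir

# Hypothetical peaks: (delay in seconds, linear amplitude).
peaks = [(0.0, 1.0), (0.020, 0.5), (0.045, 0.4), (0.120, 0.2)]
# Step 5: different random seeds give decorrelated left and right channels.
left = rebuild_ir(peaks, 8000, 1.2, 1.0, 0.05, seed=1)
right = rebuild_ir(peaks, 8000, 1.2, 1.0, 0.05, seed=2)
print(len(left), left[0], left != right)
```

Keeping the direct sound out of the randomisation preserves its level relative to the rest of the decay, which matters when the two frequency bands are later combined.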
Alla Scala at 500Hz – reading the plot
Top curve – 500Hz measured
impulse response as given by
Beranek. JASA Vol. 107 #1,
Jan 2000, pp 356-367
Bottom curve – impulse
response as regenerated from
delta functions, passed through
a 2kHz 6th order 1 octave filter.
Note the correspondence is
more than plausible.
Alla Scala 500Hz – randomizing and extending
Top graph: Alla Scala published data
Bottom graph: regenerated impulse
response after randomization and
extension.
Listen to Alla Scala, NNT Tokyo, Semperoper
2kHz
500Hz
2kHz and 500Hz
Impulse
responses from
Scala Milan
NNT Theater
Tokyo
Semper Oper
Dresden
(All data from
Beranek)
Original Sound
How can we make a room ideal for opera?
• A conventional opera house can be made to approach the
sonic ideal by MAXIMIZING the reverb radius for the
soloists, for frequencies above 700Hz.
– This involves arranging the audience and reflectors around the
stage to direct the sound of the singers directly into the audience.
– These architectural features increase the very early energy while
decreasing the sound power available to the middle and late
reverberation.
– At the same time, we should try to maximize the reverberation
time below 500Hz.
• To some degree, the success of a design can be seen
immediately in a picture taken from the stage.
– We need only notice how much absorption we see in front of us.
The more absorption and less bare wall we see, the higher the
clarity.
Pictures from the stage
Deutsche Oper – might as well tear it down.
New Bolshoi – just add curtains on the back wall.
Deutsche Staatsoper – vital,
exciting, and alive – with or without
the LARES.
Compromises
• The fight between those who like clarity and those who
like reverberance is relatively recent.
– Reverberance currently has the upper hand.
– One of the purposes of this talk is to suggest that the emphasis on
reverberance is misguided.
– In every case where the author has worked closely with a music
director, the director has wanted a more reverberant sound, “like
the Semperoper.”
• However, when given the opportunity to hear what “Semperoper”
reverberation actually sounds like, the director invariably prefers a
much less reverberant sound.
• In fact, it is my observation that the difference between the
reverberance the conductor wants, and the natural reverberance of a
dry opera house is extremely subtle.
• In a controlled test at the Royal Theater in Copenhagen (set up by
Anders Gade) 80% of the test subjects could hear no difference at all.
– In every case where we have had the opportunity to increase
clarity, or improve the balance between the singers and the
orchestra, the improvement has been noticed immediately, and
appreciated, by everyone, including the conductor.
Ideal sound through electronics
• Electronic enhancement has the potential to
create ideal opera acoustics
– But only if the system is capable of creating a
triple-slope decay at high frequencies, and a
single-slope decay at low frequencies.
– This combination is not common with currently
available systems!
Acoustic Feedback – bane or boon?
• All enhancement systems have significant feedback between the
loudspeakers and the microphones.
– A single slope decay with an RT of 1.7 seconds MUST create a
reverberation radius which is relatively small – usually under four meters
in a typical opera house.
– If the pickup microphones are separated from each sound source by more
than this distance, they MUST pick up more reverberation than direct
sound.
• Current enhancement systems divide into two types:
– Those that utilize the acoustic feedback to increase the reverberation time
directly.
• Philips MCR
• Carmen
– And those that include a reverberation device in the electronics, and
couple this device electronically to the hall.
• Lares
• Paoletti (Stagetec)
• ACS, SIAP
• Only the second type is capable of creating a dual or triple-slope
decay
Feedback and coloration
• Any time there is significant acoustic feedback there will
be coloration.
– Acoustic feedback paths have complex frequency response, and
this response is audible.
• This coloration must be minimized in a successful design.
– There are no easy solutions. Almost all systems start with a
multichannel design.
• With many channels the individual response variations in each
channel tend to average out.
• But each channel must have its own microphone and speaker, and all
devices must be separated physically by the reverberation radius.
– This physical separation is tricky to realize in practice.
– Alas, most available systems minimize the amount of coloration by
minimizing the system gain.
• Most available systems are not capable of doing very much at all.
• This is sometimes an advantage, as Eckhard will tell.
– Some available systems minimize the coloration by denying that
feedback exists (ACS, and to some degree SIAP)
Lares System
• Lares uses a multichannel concept
– But it uses an electronic trick to allow a single
pair of microphones to drive a large number of
output channels (typically four or eight)
– As a result it becomes practical to place the
microphones close to the performers.
• The result is a cleaner pickup. The pickup
microphones contain less coloration and
reverberation.
• The energy content in the 50 to 150ms time range
can be minimized this way (and only this way).
Lares Block Diagram
A typical Lares installation includes
two pickup microphones and eight
separate output channels.
Each microphone is connected to each
output channel through a separate,
independently time varying
reverberation device.
The frequency dependence of the
reverberant level, and the frequency
dependence of the reverberation time
can be separately adjusted.
Lares also includes a noise generator
and 1/3 octave analyzer for setting and
verifying the overall system gain.
Lares is highly resistant to coloration
• This is achieved through the multichannel design,
and the independent time variance.
• The type of time variance used minimizes the
pitch-shift, which is not audible when the system
is correctly adjusted.
• As a result a high reverberant level can be
achieved, even when the pickup microphones are
far from the sound sources.
– And this is sometimes a problem. Customers turn the
system up too high, or insist on placing the
microphones too far away.
– The result can be both muddiness and excessive
coloration (at least to my ears.)
– There are way too many existing Lares installations that
have these problems!
Demonstrations of Lares
Exponential Decay
• Sabine’s breakthrough
– Extensively studied by Morse, Beranek, Eyring, etc.
– In rooms where the absorption is relatively uniformly distributed
the decay of sound follows a straight line when plotted
logarithmically.
– When the decay is exponential we can precisely predict the ratio
between the direct sound and the reflected sound in the 50-150ms
time range.
• For computing sonic distance the direct sound may be
augmented by reflections that arrive before 50ms.
– At very short reverberation times the reflected energy is
concentrated into times less than 50ms after the direct sound, and
perceived distance is low, regardless of the direct/reflected ratio.
– Moderate reverberation times (1.2 – 1.6 seconds) concentrate the
energy between 50 and 150ms. Halls with these reverberation
times can easily sound muddy ("mulmig" in German, "glauque" in French).
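As a worked illustration of the prediction above, the following sketch assumes an ideal exponential decay (60dB of energy decay in one reverberation time) and computes how much of the reverberant energy arrives before 50ms and between 50 and 150ms. The function name and the sample reverberation times are illustrative and are not taken from the talk.

```python
import math

def band_fraction(rt, t_start, t_end):
    """Fraction of the reverberant energy of an ideal exponential decay
    (reverberation time rt, in seconds) arriving between t_start and t_end."""
    return 10 ** (-6 * t_start / rt) - 10 ** (-6 * t_end / rt)

for rt in (0.5, 1.0, 1.4, 2.0, 3.0):
    early = band_fraction(rt, 0.0, 0.050)
    middle = band_fraction(rt, 0.050, 0.150)
    print(f"RT {rt:.1f}s: {early:5.1%} before 50ms, {middle:5.1%} in 50-150ms")

# Reverberation time that maximizes the 50-150ms share (coarse 10ms grid search).
best_rt = max((k / 100 for k in range(20, 401)),
              key=lambda rt: band_fraction(rt, 0.050, 0.150))
print(f"50-150ms share peaks near RT = {best_rt:.2f}s")
```

Under this idealized model the 50-150ms share is largest at a reverberation time of roughly 1.26 seconds, and at 0.5 seconds about three quarters of the reflected energy arrives before 50ms, which is consistent with the two bullets above.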
Acoustic research through synthesis
• We do not need to use reflections to generate the
perception of acoustics!
– It is the total reflected energy in different time bands that matters,
along with the spatial and frequency distribution of that energy.
– We can synthesize reverberation by convolving an input signal
with an impulse response sculpted from noise.
• This technique allows us to investigate the effects of
different energy profiles.
– I decided to convolve four identically shaped noise bursts, each
46ms long, with a segment of the Rake’s Progress.
– These segments can then be strung together with different delays
and amplitudes to form an arbitrary reverberation.
• For example, let’s synthesize an exponential decay of 1.4
seconds RT, with a variable direct/reverberant ratio:
Synthetic impulse response
linear amplitude scale
log amplitude scale
Synthetic impulse response from noise 1.4s exponential decay
This is the “sound” of a one sample click at 22050 samples/sec.
This is NOT music or speech.
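The construction described above can be sketched as follows. This is a minimal illustration, not the author's code: it builds the impulse response directly (by linearity, equivalent to convolving each burst with the music and then summing). The Hann burst shape, the Gaussian noise, the function names, and the seed are assumptions. The 40ms spacing and 20ms first onset are borrowed from the time ranges in the comments of the MATLAB listing in this talk. Only the 46ms burst length, the 22050 samples/sec rate, and the 1.4 second RT come from the slides.

```python
import math
import random

FS = 22050        # samples/sec, the rate quoted for the synthetic response
BURST_S = 0.046   # burst length quoted in the talk
HOP_S = 0.040     # assumed spacing between burst onsets

def noise_burst(n, rng):
    """Hann-shaped Gaussian noise burst of n samples, scaled to unit energy."""
    raw = [rng.gauss(0.0, 1.0) * 0.5 * (1 - math.cos(2 * math.pi * i / (n - 1)))
           for i in range(n)]
    norm = math.sqrt(sum(x * x for x in raw))
    return [x / norm for x in raw]

def exponential_levels_db(rt, n_bursts, hop_s=HOP_S):
    """Energy of each hop-long slice of an ideal exponential decay,
    in dB relative to the total reverberant energy."""
    levels = []
    for k in range(n_bursts):
        frac = 10 ** (-6 * k * hop_s / rt) - 10 ** (-6 * (k + 1) * hop_s / rt)
        levels.append(10 * math.log10(frac))
    return levels

def build_impulse_response(direct_db, burst_levels_db, first_onset_s=0.020, seed=0):
    """Direct impulse followed by noise bursts at the given energy levels (dB)."""
    rng = random.Random(seed)
    n_burst = int(round(BURST_S * FS))
    hop = int(round(HOP_S * FS))
    onset = int(round(first_onset_s * FS))
    ir = [0.0] * (onset + hop * (len(burst_levels_db) - 1) + n_burst)
    ir[0] = 10 ** (direct_db / 20)            # direct sound as a single sample
    for k, level_db in enumerate(burst_levels_db):
        gain = 10 ** (level_db / 20)          # energy level in dB -> amplitude gain
        start = onset + k * hop
        for i, x in enumerate(noise_burst(n_burst, rng)):
            ir[start + i] += gain * x
    return ir

levels = exponential_levels_db(rt=1.4, n_bursts=35)
ir = build_impulse_response(direct_db=0.0, burst_levels_db=levels)
reverb_energy = sum(x * x for x in ir[1:])
print(f"{len(ir)} samples, direct/reverb = {-10 * math.log10(reverb_energy):.2f} dB")
```

Changing `direct_db` varies the direct/reverberant ratio without touching the tail, and passing a different list of levels produces a different decay profile.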
Window averaging, direct/reverb = 0dB
25ms averaging window
100ms averaging window
We can average the impulse response over a selected time period. Mathematically this is the
same as the average response of the system to an input signal (phone or note) with a duration of
the averaging period. The first window represents the response of the room to a 25ms sound, and
the second to a 100ms sound.
Note the EDT we perceive is HIGHLY dependent on the length of the note!
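The window averaging described above can be sketched as a sliding sum of the squared impulse response. The sketch below is an assumption-laden illustration: it uses a smooth exponential tail as a stand-in for noise, a 2000 samples/sec rate to keep it light, and hypothetical function names. It reports how long the windowed level takes to fall 6dB from its peak for a 25ms and a 100ms window at direct/reverb = 0dB.

```python
import math

def ideal_ir(rt, fs, seconds, direct_to_reverb_db=0.0):
    """Unit-energy direct impulse plus a smooth exponential tail."""
    n = int(seconds * fs)
    tail = [10 ** (-6 * (i / fs) / rt) for i in range(1, n)]
    scale = 10 ** (-direct_to_reverb_db / 10) / sum(tail)
    return [1.0] + [math.sqrt(e * scale) for e in tail]

def windowed_level_db(ir, fs, window_s):
    """Energy of ir in a sliding window of window_s seconds, in dB re the peak."""
    n = max(1, int(round(window_s * fs)))
    csum = [0.0]
    for x in ir:
        csum.append(csum[-1] + x * x)
    energy = [csum[i + 1] - csum[max(0, i + 1 - n)] for i in range(len(ir))]
    peak = max(energy)
    return [10 * math.log10(e / peak) if e > 0 else float("-inf") for e in energy]

def time_to_drop(levels_db, fs, drop_db):
    """Seconds from the peak until the level first falls drop_db below it."""
    start = levels_db.index(max(levels_db))
    for i in range(start, len(levels_db)):
        if levels_db[i] <= -drop_db:
            return (i - start) / fs
    return None

FS = 2000
ir = ideal_ir(rt=1.4, fs=FS, seconds=3.0)
for window_s in (0.025, 0.100):
    levels = windowed_level_db(ir, FS, window_s)
    t6 = time_to_drop(levels, FS, 6.0)
    print(f"{window_s * 1000:.0f}ms window: -6dB reached {t6 * 1000:.1f}ms after the peak")
```

With the short window the level falls by more than 6dB as soon as the direct sound leaves the window, while the long window has accumulated enough reverberant energy that reaching -6dB takes tens of milliseconds. This is the note-length dependence the slide points to.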
Schroeder Integration, direct/reverb = 0dB
Schroeder Integration –
reverse integration –
represents the response of the
room to a note of infinite
duration.
Jordan’s method of
determining EDT takes some
account of the strength of the
direct sound.
Schroeder’s method for EDT
completely ignores the
strength of the direct sound.
Neither method is likely to
predict the response of the
room to speech or normal
music.
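Reverse integration itself is simple to sketch. The example below is illustrative only: it uses a smooth exponential tail at direct/reverb = 0dB, a 2000 samples/sec rate, and a hypothetical function name. It does not implement either Jordan's or Schroeder's EDT procedure. It shows only the integrated curve, where the direct sound appears as a single step at the start, followed by a straight line whose slope is set by the reverberation time.

```python
import math

def schroeder_db(ir):
    """Backward-integrated energy decay curve, in dB re the total energy."""
    remaining = 0.0
    curve = [0.0] * len(ir)
    for i in range(len(ir) - 1, -1, -1):
        remaining += ir[i] * ir[i]      # energy still to arrive from sample i on
        curve[i] = remaining
    total = curve[0]
    return [10 * math.log10(e / total) for e in curve]

FS = 2000
RT = 1.4
tail = [10 ** (-6 * (i / FS) / RT) for i in range(1, 3 * FS)]
tail_total = sum(tail)
# Direct impulse with energy 1, reverberant tail with total energy 1.
ir = [1.0] + [math.sqrt(e / tail_total) for e in tail]

curve = schroeder_db(ir)
slope = (curve[1 + FS // 2] - curve[1]) / 0.5    # dB per second over 0.5s
print(f"step as the direct sound leaves the integral: {curve[1]:.2f} dB")
print(f"slope after the step: {slope:.1f} dB/s (60/RT = {60 / RT:.1f})")
```

A fit to the straight part of this curve returns the late reverberation time whatever the strength of the direct sound, which only changes the size of the initial step.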
Window Averaging, direct/reverb = -3dB
25ms Averaging Window
100ms Averaging Window
For a 25ms sound the effective reverberation time is 0.9 seconds, so at least these sounds are
heard with high articulation. 100ms sounds, on the other hand, are smoothed to nearly the same
slope as the late reverberation.
Schroeder Integration, direct/reverb = -3dB
Very long notes still show some dual-slope decay. Jordan’s method for EDT
is sensitive to this difference; Schroeder’s is not.
Examples
• See surround encoded DTS exponential
decay
Non-exponential decay direct/reverb = -3dB
It is interesting to ask what happens when there is a high burst of very early reflections,
followed by a relatively level energy curve out to beyond 160ms.
This type of decay minimizes sonic distance, while maintaining reverberance and envelopment
Non-exponential decay direct/reverb = -3dB
% amplitudes of the different time periods in dB
% all dB values correspond to the energy content of the mix
d1 = -1.7;   % direct sound
l1 = -1.7;   % 20ms-60ms
l2 = -8.5;   % 60ms-100ms
l3 = -8.5;   % 100ms-140ms
l4 = -8.5;   % 140ms-180ms
l5 = -8.5;   % 180ms-220ms
l6 = -10.2;  % 220ms-260ms
l7 = -11.9;  % 260ms-300ms
l8 = -13.6;  % 300ms-340ms
This is the MATLAB code that sets up the non-exponential reverberation. Note that for
this example, the early reflections have the same energy as the direct sound.
Sonically, it is much better if the early energy is -4dB to -6dB relative to direct.
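As a quick check of the levels in the listing, the sketch below (in Python, with hypothetical variable names) converts the dB values to energies and computes two quantities: the ratio of the direct sound to the sum of the listed reflected segments, and the reverberation time implied by the 1.7dB-per-segment fall of the last three segments. It assumes each segment spans 40ms, as the comments indicate, and it counts only the segments shown; whether the author's tail continues beyond 340ms is not stated.

```python
import math

HOP_S = 0.040    # each reflected segment spans 40ms, per the listing's comments
direct_db = -1.7                                                   # d1
segments_db = [-1.7, -8.5, -8.5, -8.5, -8.5, -10.2, -11.9, -13.6]  # l1..l8

def db_to_energy(db):
    """Convert an energy level in dB to a linear energy."""
    return 10 ** (db / 10)

reflected = sum(db_to_energy(x) for x in segments_db)
direct_to_reflected_db = 10 * math.log10(db_to_energy(direct_db) / reflected)

# l6..l8 fall by a constant step per segment; extrapolate that rate to 60dB.
late_step_db = segments_db[-2] - segments_db[-1]
implied_rt = 60 * HOP_S / late_step_db

print(f"direct / listed reflections: {direct_to_reflected_db:.1f} dB")
print(f"RT implied by the late slope: {implied_rt:.2f} s")
```

Under these assumptions the listed levels give a direct/reflected ratio of about -3.3dB, close to the -3dB in the slide title, and a late slope equivalent to roughly 1.4 seconds RT, matching the exponential example used earlier.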
Non-exponential decay direct/reverb = -3dB
25 ms averaging window
100ms averaging window
With this non-exponential decay both 25ms sounds and 100ms sounds are perceived with high
articulation. Longer notes and sounds also have high reverberance.
Once again, it would be sonically more pleasant if the early reflections were reduced.
Examples
• See surround encoded DTS non-linear
decay
Frequency Dependence
• We have so far been studying broadband reverberation.
• However human perception is highly frequency dependent.
• As a consequence, our perceptions of intelligibility,
articulation, loudness, and sonic distance are primarily
influenced by frequencies above 700Hz.
• However the perception of reverberance, warmth, and
envelopment primarily arises from frequencies below
500Hz.
• It is possible to have both high clarity and high
envelopment at the same time by carefully controlling the
frequency dependence of the reflected energy.
The frequency transmission of the pinnae and
middle ear
From: B. C. J. Moore, B. R.
Glasberg and T. Baer, “A model
for the prediction of thresholds,
loudness and partial loudness,” J.
Audio Eng. Soc., vol. 45, pp.
224-240 (1997).
The intensity of nerve firings is concentrated in the frequency range of human
speech signals, about 700Hz to 4kHz. With a broad-band source, the ITD and IID
at these frequencies will dominate the apparent direction.
Boston Symphony Hall, occupied, stage
to front of balcony, 1000Hz
Boston Symphony Hall, occupied, stage
to front of balcony, 250Hz
Adelaide - Festival Center Theater
Conclusions
• There is an ideal acoustic profile for opera performance.
• This profile may or may not be achievable through
conventional acoustics.
– Our goal is not ideal acoustics, it is ideal SOUND.
• When restricting the design to conventional acoustics, the
optimal sound as determined by a rapid A/B test is less
reverberant than most conductors think they want in the
absence of an A/B test, at least above 700Hz.
• An optimal design will maximize the reverb radius above
700Hz, aiming for a strongly dual-slope decay as measured
by the decay time to –6dB of a 50ms to 100ms sound.
– This goal is best achieved by directing the direct sound (and first
reflections) from the soloists into the audience.
• The optimal design will maximize the reverberation time
and the reverberant level below 500Hz.
• Given the choice between high clarity and a compromise
that reduces clarity somewhat in favor of more
reverberance for the orchestra, CHOOSE CLARITY!