Spatial Sound Localization for Motion Planning


Sound 101: What is it, Why is it, Where is it?
Nikunj Raghuvanshi
University of North Carolina at Chapel Hill
What is Sound?
- Waves? Particles? If waves, in what medium?
- Not an obvious answer in the 19th century
- Interesting read: "A short history of bad acoustics", M.C.M. Wright, The Journal of the Acoustical Society of America (JASA), 2006
- Now we do know the answer: waves in air
Waves of what?
- Air pressure
- Compressions and rarefactions

[Figure: a pressure wave, with a compression, a rarefaction, and the wavelength (λ) marked]
How fast does sound travel?
- Newton did it first (as with everything else)
- But he made a mistake (not a cyborg, after all)
- Laplace corrected it
- Accepted value today: c = 343 m/s (~770 mph) at room temperature
- Compare this to light's 300,000,000 m/s; the difference has very interesting consequences
Frequency
- What is frequency?
- Given the wavelength λ and the speed c, can you find the frequency ν?

  c = νλ

- Humans can hear frequencies from 20 to 20,000 Hz. Trivia: in music, the frequency doubles every octave
- Range of wavelengths? (Use the formula above; a worked version follows below)
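Working out that last question (a quick check, not in the original slides): rearranging c = νλ to λ = c/ν over the audible range gives

```latex
\lambda_{\max} = \frac{c}{\nu_{\min}} = \frac{343\ \text{m/s}}{20\ \text{Hz}} \approx 17\ \text{m}
\qquad
\lambda_{\min} = \frac{c}{\nu_{\max}} = \frac{343\ \text{m/s}}{20000\ \text{Hz}} \approx 1.7\ \text{cm}
```

So audible wavelengths span roughly 1.7 cm to 17 m, a range that makes diffraction very important for sound.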
Phase
[Figure: pressure vs. distance for a sinusoid, with the wavelength (λ) marked and phase values π/2, π, 3π/2, and 2π labeled along one cycle]
- Phase (θ): measures the progression of the pressure at a point between a crest and a trough
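To make this concrete (my notation, not the slide's): at a fixed instant, the pressure along the wave can be written as

```latex
p(x) = A \sin(\theta), \qquad \theta = \frac{2\pi x}{\lambda}
```

so the phase θ advances by 2π over one wavelength, passing through the π/2, π, and 3π/2 markings of the figure above.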
Loudness
- What range of sound amplitudes (pressures) can we hear?
- A huge, huge range (100,000,000 pressure levels). The human ear is an amazing organ
- Loudness is measured on a log scale (decibels):

  Loudness, dB = 20 log(p/p0)

  where p0 is the threshold of hearing
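A minimal sketch of this formula in Python (the reference value p0 = 1e-6 is taken from the table on the next slide; everything else is illustrative):

```python
import math

P0 = 1e-6  # reference pressure at the threshold of hearing (units as in the table below)

def loudness_db(p: float) -> float:
    """Loudness in decibels: 20 * log10(p / p0)."""
    return 20.0 * math.log10(p / P0)

# A whisper at p = 1e-5 is ten times the threshold pressure: 20 dB.
print(loudness_db(1e-5))  # -> 20.0
```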
Loudness
Source                           Pressure      Loudness   # of Times Greater Than TOH
Threshold of Hearing (TOH)       1*10^-6       0 dB       10^0
Whisper                          1*10^-5       20 dB      10^1
Normal Conversation              1*10^-3       60 dB      10^3
Busy Street Traffic              1*10^-2.5     70 dB      10^3.5
Vacuum Cleaner                   1*10^-2       80 dB      10^4
Large Orchestra                  6.3*10^-1.5   98 dB      10^4.9
Front Rows of Rock Concert       1*10^-0.5     110 dB     10^5.5
Threshold of Pain                1*10^0.5      130 dB     10^6.5
Military Jet Takeoff             1*10^1        140 dB     10^7
Instant Perforation of Eardrum   1*10^2        160 dB     10^8
Source: http://www.glenbrook.k12.il.us/gbssci/Phys/Class/sound/u11l2b.html
Why is sound produced?
- A vibrating surface creates pressure fluctuations
- Pressure waves are sensed by the ear as sound
- Pressure fluctuation ∝ surface velocity
[Diagram: Vibration → Pressure Wave → Perception]
Modeling surface vibration
For a surface with displacement r, the vibration obeys a damped oscillator equation (Rayleigh damping, written here with coefficients α and β):

M d²r/dt² + (αM + βK) dr/dt + K r = F(t)

The four terms are, in order: inertia, damping, elasticity, and the external force.
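As an illustration (not from the slides), here is a minimal sketch that integrates this single-mode oscillator; the parameter values M, K, ALPHA, and BETA are assumptions chosen only so the mode rings at an audible pitch:

```python
# Assumed single-mode parameters (illustrative only, not from the slides).
M = 1.0        # modal mass
K = 1.0e6      # modal stiffness (sqrt(K/M)/(2*pi) ~ 159 Hz, an audible pitch)
ALPHA = 1.0    # Rayleigh damping coefficient on the mass term
BETA = 1e-7    # Rayleigh damping coefficient on the stiffness term
C = ALPHA * M + BETA * K   # total damping coefficient

def simulate_mode(f_impulse: float, dt: float = 1.0 / 44100, n: int = 1000):
    """Integrate M r'' + (alpha*M + beta*K) r' + K r = F(t) for an
    impulsive force at t = 0, using semi-implicit Euler."""
    r = 0.0
    v = f_impulse / M * dt        # the impulse gives the mode an initial velocity
    samples = []
    for _ in range(n):
        a = (-C * v - K * r) / M  # acceleration from the oscillator equation
        v += a * dt               # update velocity first...
        r += v * dt               # ...then position (semi-implicit Euler)
        samples.append(r)
    return samples

samples = simulate_mode(f_impulse=1.0)  # a decaying ~159 Hz sinusoid
```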
Where does Sound go?
- All waves travel in much the same way (ripples in a pond, sound, light, seismic waves, etc.)
- So how is sound different?
  - Coherent (interference)
  - Wavelength (diffraction)
  - Speed (transient phenomena are observable)
Interference
- The resultant pressure at P due to two waves is simply their sum
- Phase is crucial (a worked example follows the figure)
[Figure: two sources A and B and a receiver P; the traces for signal A, signal B, and A+B show that in-phase signals add and out-of-phase signals cancel]
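As a worked example (not in the original slide): two equal-amplitude waves with a phase difference θ sum to

```latex
A\sin(\omega t) + A\sin(\omega t + \theta) = 2A\cos\!\left(\tfrac{\theta}{2}\right)\sin\!\left(\omega t + \tfrac{\theta}{2}\right)
```

so θ = 0 (in phase) doubles the amplitude, while θ = π (out of phase) cancels it completely.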
Diffraction
- A wave bends around obstacles of size approximately its wavelength, i.e. when λ ~ s
- P will have appreciable reception only if there is a good amount of diffraction
- This is the reason sound gets everywhere

[Figure: a wave of wavelength λ diffracting around an obstacle of size s to reach the receiver P]
Overview
Background on Sound
Sound localization in humans
Sound localization for robots
Results
Before we start…
- This is a different connotation of "localization" than the one used in motion planning
- Sound localization is much easier if the number of sound sensors is large, by measuring the inter-arrival time differences between neighboring sensors
- There have been numerous such approaches
- However, the localization performance of humans clearly shows that just two ears are sufficient
- The work I discuss is the first to effectively use just two sensors to accurately find the direction to the sound source
Sound Localization
The sound localization facility at Wright Patterson Air Force
Base in Dayton, Ohio, is a geodesic sphere, nearly 5 m in
diameter, housing an array of 277 loudspeakers. Listeners in
localization experiments indicate perceived source directions by
placing an electromagnetic stylus on a small globe.
Sound Localization: ILD
- Idea: a sound source on the right will be perceived to have more intensity at the right ear
- The head casts an acoustical or sound shadow
- The difference of the intensities at the two ears is the Interaural Level Difference (ILD)
Sound Localization: ILD
- The ILD depends on the angle as well as the frequency
- Different frequencies diffract differently
- In general, higher frequencies diffract less, leading to a sharper shadow and a higher ILD
- Assume the head has a diameter of ~17 cm
- ILD becomes useless for f < 500 Hz (λ = 69 cm)
- Accurate for f > 3000 Hz
Sound Localization: ITD
- Idea: sound has a longer path (by a distance d) to the farther ear, and hence takes more time to reach it
- This too depends on both the angle and the frequency of the sound
- Measured as the Interaural Time Difference (ITD)

[Figure: the extra path length d to the farther ear]
ITD: Range of usefulness
- If the signal is periodic (e.g. a pure tone), the ITD is useless if the path difference is much greater than the wavelength
- For the human head size, the ITD is useful for f < 1000 Hz

[Figure: (a) peak 1 arrives properly in sequence at the two ears and there is no confusion; (b) peaks 1 and 2 arrive closely at the ears and cause confusion]
Finding the ITD
- Use a pattern matcher to find the position of MAXIMUM similarity
- The independent sound signals g(t) and h(t) are 'slid' across each other (a sliding window)
- A correlation vector is returned, showing the delay between the signals g(t) and h(t), i.e. the ITD (a sketch follows below)
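A minimal sketch of this cross-correlation approach in Python/NumPy (not the paper's code; the sample rate and test signals are illustrative assumptions):

```python
import numpy as np

def estimate_itd(left: np.ndarray, right: np.ndarray, fs: float) -> float:
    """Estimate the ITD by sliding the two signals across each other and
    locating the lag of maximum correlation. A positive result means the
    right signal lags the left (the source is closer to the left ear)."""
    corr = np.correlate(right, left, mode="full")   # correlation at every lag
    lags = np.arange(-len(left) + 1, len(right))    # lag in samples per entry
    return lags[np.argmax(corr)] / fs               # best lag, in seconds

# Illustrative test: the right ear hears the same noise burst 0.5 ms late.
fs = 44100.0
rng = np.random.default_rng(0)
left = rng.standard_normal(4096)
d = int(0.0005 * fs)                                # ~22 samples
right = np.concatenate([np.zeros(d), left[:-d]])    # right lags left by d
print(estimate_itd(left, right, fs))                # ~ +0.0005 s
```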
Front-back ambiguity
- The theory of humans using only the ITD and ILD has a big hole: the formulation has an inherent symmetry which creates a front-back ambiguity (points 2 and 3 in the figure)
- The ITD and ILD for 2 and 3 will be identical (right?)
Front-back ambiguity
There is a simple way to break this symmetry: move the head!

This approach is used in the paper I discuss later.

Interestingly, a moving source alone may not be enough to break the ambiguity; it is important to move the head.

But humans can do it without even moving. How?
The HRTF
- There is no symmetry in reality, because of the structure of the external ear and scattering by the shoulders and head
- The Head-Related Transfer Function (HRTF) measures the amounts by which different frequencies are amplified by the head for different source positions
- This works well only when the sound is broad-band
Summary
- Sound provides two cues: the ILD and the ITD
- The ILD measures the intensity difference between the two ears at a given point in time
- The ITD measures the difference in arrival time for the same sound at the two ears
- The ILD is useful for frequencies > 3000 Hz
- The ITD is useful for frequencies < 1000 Hz
- There is a front-back ambiguity when using the ITD and ILD alone, which head motion resolves
Overview
Background on Sound
Sound localization in humans
Sound localization for robots
Results
Sound Localization for robots
The papers I will discuss:
A Biomimetic Apparatus for Sound-source Localization. Amir A. Handzel, Sean B. Andersson, Martha Gebremichael, and P.S. Krishnaprasad. IEEE CDC 2003.

Robot Phonotaxis with Dynamic Sound-source Localization. Sean B. Andersson, Amir A. Handzel, Vinay Shah, and P.S. Krishnaprasad. IEEE ICRA 2004.
Sound Localization
- As discussed, to resolve the front-back ambiguity, we have two options:
  - Use a spherical head, and use head motion to resolve the front-back ambiguity
  - Use an asymmetric head, compute the HRTF, and use that, like humans do
- The first approach is much simpler and is the one used in this paper
The “head”
Sound Localization
[Figure: the robot's trajectory from Start to End]
A simple ITD-based method
- A much simpler method is commonly in use
- Consider a distant source, so that the impinging wave is nearly planar
- The path difference between the left and right ears is given by the length of the bent path l(ABC) around the head
- By correlating the left and right sound signals, suppose the ITD is found; then a = c·ITD
- Solve the above relation for θ (an illustrative sketch follows below)
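The slide's exact expression for l(ABC) did not survive the transcript, so the sketch below uses the simplest free-field stand-in instead: it ignores the head and takes the path difference between two microphones a distance d apart to be a = d·sin(θ). Both that formula and the spacing d are assumptions for illustration, not the paper's model:

```python
import math

C = 343.0  # speed of sound, m/s

def itd_to_angle(itd: float, d: float = 0.17) -> float:
    """Free-field estimate: path difference a = c*ITD = d*sin(theta),
    so theta = arcsin(c*ITD / d). The spacing d is an assumed value."""
    a = C * itd                                    # path difference in meters
    return math.asin(max(-1.0, min(1.0, a / d)))   # clamp to handle noisy ITDs

# A 0.25 ms ITD with 17 cm spacing puts the source ~30 degrees off center.
print(math.degrees(itd_to_angle(0.00025)))  # ~30.3
```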
The IPD-ILD algorithm
- Solve for the scattering from a hard spherical head; this is a more realistic physical model
- Two microphones are placed at the poles
- The wave equation is

  ∇²ψ = (1/c²) ∂²ψ/∂t²

  where c = 344 m/s is the speed of sound, ψ is the velocity potential, and ∇² is the Laplacian
Mathematical Formulation
- Basic idea for the solution: solve in spherical coordinates. The solution is well known, using separation of variables
- The only place where scattering from a hard sphere is invoked is in satisfying the rigid-surface boundary condition, namely that the total potential has zero normal derivative on the sphere:

  ∂(ψinc + ψscat)/∂r = 0 on the sphere's surface

- In the above, ψinc and ψscat are the incident potential (from the source) and the scattered potential (from the sphere), respectively
- The solution has the following important properties:
  - It depends only on the angle between the source and the receiver
  - It is independent of the source distance: only the direction can be localized
Mathematical Formulation
- It is assumed that the sound source, the center of the head, and the ears lie in the same plane, i.e. localization is performed only in the horizontal plane
- The pressure p measured at a microphone is given by

  p = A e^(i(ωt + φ))

  where φ is the geometry- and frequency-dependent phase shift, and ω is the angular frequency (ω = 2πf)
- It is important to note that both A and φ depend on the frequency ω, due to differential scattering
The IPD and ILD
- The Interaural Phase Difference (IPD) is the same concept as the ITD, except that it measures the phase difference rather than the time difference. Specifically, IPD = ITD · ω
- The IPD and ILD can be computed as

  ILD = log A_L − log A_R
  IPD = φ_L − φ_R

- At a given source angle θ, these theoretical formulas let us calculate IPD(ω) and ILD(ω)
- Our job is to invert this operation: given the IPD and ILD at different frequencies, we need to find θ (a sketch of the measurement step follows below)
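A minimal sketch (an illustration, not the paper's code) of measuring ILD(ω) and IPD(ω) from one windowed frame of left/right microphone samples with an FFT:

```python
import numpy as np

def measure_ild_ipd(left: np.ndarray, right: np.ndarray, fs: float):
    """Per-frequency ILD and IPD from one frame of left/right samples.
    Returns (freqs, ild, ipd) where ild = log|A_L| - log|A_R| and
    ipd = phase_L - phase_R, wrapped to (-pi, pi]."""
    L = np.fft.rfft(left * np.hanning(len(left)))    # windowed left spectrum
    R = np.fft.rfft(right * np.hanning(len(right)))  # windowed right spectrum
    freqs = np.fft.rfftfreq(len(left), d=1.0 / fs)
    eps = 1e-12                                      # avoid log(0) in silent bins
    ild = np.log(np.abs(L) + eps) - np.log(np.abs(R) + eps)
    ipd = np.angle(L * np.conj(R))                   # phase difference, wrapped
    return freqs, ild, ipd
```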
Localization Metric
- Sample and store the values of IPD(ω, θ) and ILD(ω, θ) in a table
- Collect data from the microphones and find the closest theoretical curve
- Apply an FFT to gather the ILD and IPD values for the different ω
- Distance metric: the L2-norm distance between the predicted and observed IPD and ILD curves
- Final distance: the IPD and ILD distances combined into a single metric
- Minimize over θ to get the source direction (a sketch follows below)
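A hedged sketch of the lookup-and-minimize step (the tables of theoretical curves are assumed to be precomputed; the equal weighting of the two terms is my assumption, since the combined metric did not survive the transcript):

```python
import numpy as np

def localize(ipd_obs, ild_obs, ipd_table, ild_table, angles):
    """Pick the source angle whose stored theoretical IPD/ILD curves are
    closest (in L2 norm over frequency) to the observed curves.

    ipd_table, ild_table: arrays of shape (n_angles, n_freqs) sampled
    from the theoretical solution; angles: the n_angles sample angles."""
    d_ipd = np.sum((ipd_table - ipd_obs) ** 2, axis=1)  # squared L2 per angle
    d_ild = np.sum((ild_table - ild_obs) ** 2, axis=1)
    total = d_ipd + d_ild              # assumed equal weighting of the cues
    return angles[np.argmin(total)]    # minimize over theta
```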
Resolving front-back ambiguity
- Even though the IPD and ILD are the same for any two angles θ and π − θ, their derivatives with respect to θ, IPD′ and ILD′, are not
- Since the IPD and ILD are theoretically known, their derivatives may be calculated, sampled, and stored just like the IPD and ILD values
- The observed difference between the IPD values at two consecutive samples (as the head moves) provides an approximation of IPD′
- Define a similar L2-norm metric for IPD′ and ILD′
- Augmented distance function to minimize: the IPD/ILD distance plus the corresponding IPD′/ILD′ distance (a sketch follows below)
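Extending the previous sketch with the derivative cues (again an illustration; the weighting and the dict-of-tables layout are my assumptions):

```python
import numpy as np

def localize_with_motion(obs_prev, obs_curr, tables, d_tables, angles, d_theta):
    """Disambiguate front/back by also matching the *change* in the cues
    as the head rotates by d_theta between two measurements.

    obs_prev/obs_curr: dicts with 'ipd' and 'ild' observation arrays;
    tables/d_tables: dicts of (n_angles, n_freqs) arrays holding the
    theoretical cues and their derivatives with respect to theta."""
    total = np.zeros(len(angles))
    for key in ("ipd", "ild"):
        total += np.sum((tables[key] - obs_curr[key]) ** 2, axis=1)
        # finite-difference approximation of the observed derivative
        d_obs = (obs_curr[key] - obs_prev[key]) / d_theta
        total += np.sum((d_tables[key] - d_obs) ** 2, axis=1)
    return angles[np.argmin(total)]
```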
Overview
Background on Sound
Sound localization in humans
Sound localization for robots
Results
Results: Accuracy of theoretical ILD
- Curve: the theoretically computed ILD
- Dots: actual values measured from the microphones
Results: Accuracy of theoretical IPD
- Much more accurate than the ILD
Localization Performance
- Sharp minima at small angles; not so sharp at large angles
Localization Performance
[Plots: the IPD/ILD algorithm vs. the simple ITD-based algorithm]
Front-back ambiguity resolution
[Figure: without ambiguity resolution the metric is symmetric; with ambiguity resolution the symmetry is broken]
Conclusion/Discussion
- The IPD/ITD is a much stronger cue than the ILD; that is why the simple ITD algorithm also gives decent performance
- Overall, they are the first to demonstrate a real working robot with good sound localization, so presumably this works well in practice
- The method is theoretically well motivated, and shows that good localization can be achieved with just isotropic microphones
- They also claim that it works well in a laboratory environment with some noise (CPU fans, etc.) and reflections from the walls
Video
Thanks
Questions?
Summary
Reflective environments and the precedence effect
Longitudinal vs. Transverse Waves
- Sound is a longitudinal wave, meaning that the motion of the particles is along the direction of propagation
- Transverse waves (water waves, light) have motion perpendicular to the direction of propagation