Speech
Speech Recognition Front End
This week’s focus is on spectral analysis.
Figure (front-end stages): pre-emphasis, windowing, spectral analysis, enhance features (temporal features, frequency features), consolidate features, feature vectors
Spectral Analysis
This week’s emphasis will be on Fourier analysis.
• Goal: Find useful frequency-related features
• Approaches
– Without Fourier analysis:
  • Apply a recursive bank of band-pass filters
  • Use linear predictive coding (LPC)
– With Fourier analysis:
  • Calculate a Fourier transform
  • Warp the results based on the Mel scale
• Applications
– Auditory models mimicking human hearing
– Eliminate noise by removing non-voice frequencies
– Detect formants present in the signal
– Perform cepstral analysis to detect pitch and recognize speech
– Auditory nerves stop responding to extended occurrences of the same frequency
  • Idea: Deemphasize frequencies present for extended periods
  • Results: Effective for speech recognition in noisy environments
The Fourier Transform Family
Fourier Series
A weighted sum of sinusoidal functions that models an arbitrary continuous function that repeats periodically forever
Fourier Transform
A linear operation that maps an arbitrary function, defined from −∞ to +∞, into a spectrum of its frequency components
Discrete Fourier Transform (DFT)
A Fourier transform applied to a discrete, periodic series of complex numbers (in practice, N samples treated as one period of an infinitely repeating series)
Discrete Time Fourier Transform (DTFT)
A Fourier transform applied to an aperiodic discrete series of complex numbers that extends from −∞ to +∞
Fast Fourier Transform (FFT): A fast way to calculate the DFT
The number e
• $e = \lim_{n \to \infty} \left(1 + \frac{1}{n}\right)^n$
• When n = 1: $e \approx (1 + 1)^1 = 2$
• When n = 2: $e \approx (1 + \tfrac{1}{2})^2 = 9/4 = 2.25$
• When n = 3: $e \approx (1 + \tfrac{1}{3})^3 = 64/27 \approx 2.37037$
• When n is extremely large, the value approaches e = 2.718281828…
What does this have to do with sound?
Answer: The future slides will tell.
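To see the limit numerically, here is a minimal Java sketch (the class `ELimit` and method `approx` are hypothetical names) that evaluates (1 + 1/n)^n for growing n and compares it with the library constant `Math.E`.

```java
// Hypothetical demo class: evaluates (1 + 1/n)^n for growing n to show the approach to e.
class ELimit {
    // (1 + 1/n)^n for a positive integer n
    static double approx(long n) {
        return Math.pow(1.0 + 1.0 / n, n);
    }

    public static void main(String[] args) {
        long[] ns = {1, 2, 3, 10, 100, 10_000, 1_000_000};
        for (long n : ns) {
            System.out.printf("n = %-9d (1 + 1/n)^n = %.9f%n", n, approx(n));
        }
        System.out.printf("Math.E = %.9f%n", Math.E);
    }
}
```

The convergence is slow: the error shrinks roughly like e/(2n), so even n = 1,000,000 agrees with e only to about five decimal places.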
Quick Calculus Review
• The derivative of a function at a point is the slope of the function at that point (change in y over change in x).
• The derivative of x² is 2x (notation: $\frac{d}{dx}x^2 = 2x$):
$\lim_{\Delta x \to 0} \frac{(x+\Delta x)^2 - x^2}{\Delta x} = \lim_{\Delta x \to 0} \frac{x^2 + 2x\Delta x + \Delta x^2 - x^2}{\Delta x} = \lim_{\Delta x \to 0} (2x + \Delta x) = 2x$
• Tables of derivatives proved by mathematicians exist
• We will need these:
– $\frac{d}{dx}x^n = nx^{n-1}$
– $\frac{d}{dx}\sin x = \cos x$, $\frac{d}{dx}\cos x = -\sin x$
– $\frac{d}{dx}e^x = e^x$, $\frac{d}{dx}e^{ax} = ae^{ax}$
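The limit can be watched numerically. This Java sketch (class and method names are hypothetical) evaluates the difference quotient for f(x) = x² at x = 3 with shrinking Δx; by the algebra above the quotient equals 2x + Δx, so the printed slopes approach 6.

```java
// Hypothetical demo: the difference quotient ((x + dx)^2 - x^2) / dx for shrinking dx.
class DerivativeDemo {
    // Slope of the secant line of f(x) = x^2 between x and x + dx.
    static double slope(double x, double dx) {
        return ((x + dx) * (x + dx) - x * x) / dx;
    }

    public static void main(String[] args) {
        double x = 3.0;                        // exact derivative: 2x = 6
        for (int p = 0; p <= 6; p++) {
            double dx = Math.pow(10, -p);      // dx = 1, 0.1, ..., 0.000001
            System.out.printf("dx = %-8g slope = %.8f%n", dx, slope(x, dx));
        }
    }
}
```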
Complex Numbers
• Extends the number line to a plane
– Horizontal axis: real numbers
– Vertical axis: imaginary numbers
– Rectangular notation: a + bi
  • a along the real axis
  • b along the imaginary axis
• Operations
– Addition: (a+bi) + (c+di) = (a+c) + (b+d)i
– Multiplication: (a+bi)(c+di) = (ac − bd) + (ad + bc)i
– Division: (a+bi)/(c+di) is solved by multiplying the numerator and denominator by the conjugate of c+di, which is c−di
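The three operations translate directly into code. Below is a minimal, hypothetical Java value class, named `Cplx` so it is not mistaken for the `Complex` class used in later slides; division follows the conjugate trick exactly.

```java
// Hypothetical minimal complex-number value class illustrating the three rules above.
final class Cplx {
    final double re, im;

    Cplx(double re, double im) { this.re = re; this.im = im; }

    // (a+bi) + (c+di) = (a+c) + (b+d)i
    Cplx plus(Cplx o) { return new Cplx(re + o.re, im + o.im); }

    // (a+bi)(c+di) = (ac - bd) + (ad + bc)i
    Cplx times(Cplx o) { return new Cplx(re * o.re - im * o.im, re * o.im + im * o.re); }

    // The conjugate of c+di is c-di.
    Cplx conj() { return new Cplx(re, -im); }

    // Multiply numerator and denominator by the conjugate of the divisor;
    // the denominator becomes the real number c^2 + d^2.
    Cplx divide(Cplx o) {
        Cplx num = times(o.conj());
        double den = o.re * o.re + o.im * o.im;
        return new Cplx(num.re / den, num.im / den);
    }

    @Override public String toString() {
        return re + (im < 0 ? " - " : " + ") + Math.abs(im) + "i";
    }

    public static void main(String[] args) {
        Cplx a = new Cplx(1, 2), b = new Cplx(3, -1);
        System.out.println(a.plus(b));     // 4.0 + 1.0i
        System.out.println(a.times(b));    // 5.0 + 5.0i
        System.out.println(a.divide(b));   // 0.1 + 0.7i
    }
}
```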
Polar Notation
Distance and angle from the origin
• Rectangular form: 4 + 3i
• Convert to polar form: (5, 36.87°)
– $M = \sqrt{4^2 + 3^2} = 5$
– $\theta = \arctan(3/4) \approx 36.87°$
• Convert back to rectangular form
– $a + ib = M(\cos\theta + i\sin\theta)$
Note:
At 90 and 270 degrees the real part a is 0, so computing θ = arctan(b/a) divides by zero.
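A Java sketch of both conversions (class and method names are hypothetical). It uses the standard library's `Math.hypot` and `Math.atan2` instead of arctan(b/a); `atan2` avoids the divide by zero at 90 and 270 degrees and also picks the correct quadrant when a < 0, which plain arctan(b/a) does not. For 4 + 3i it recovers M = 5 and θ ≈ 36.87°. Note that `atan2` reports angles in (−180°, 180°], so 270° appears as −90°.

```java
// Hypothetical helpers for converting between rectangular and polar form.
class PolarDemo {
    // Returns {M, theta in degrees}. Math.atan2(b, a) never divides by a, so a = 0
    // (90 or 270 degrees) is fine, and it also picks the right quadrant when a < 0.
    static double[] toPolar(double a, double b) {
        return new double[] { Math.hypot(a, b), Math.toDegrees(Math.atan2(b, a)) };
    }

    // Returns {a, b} from a + ib = M(cos theta + i sin theta), theta in degrees.
    static double[] toRect(double m, double thetaDeg) {
        double t = Math.toRadians(thetaDeg);
        return new double[] { m * Math.cos(t), m * Math.sin(t) };
    }

    public static void main(String[] args) {
        double[] p = toPolar(4, 3);
        System.out.printf("M = %.2f, theta = %.2f degrees%n", p[0], p[1]);
        double[] r = toRect(p[0], p[1]);
        System.out.printf("a = %.2f, b = %.2f%n", r[0], r[1]);
        double[] up = toPolar(0, 2);            // arctan(2/0) would divide by zero here
        System.out.printf("M = %.2f, theta = %.2f degrees%n", up[0], up[1]);
    }
}
```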
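Maclaurin Series for e, sin, cos
• A Maclaurin series estimates any well-behaved function in terms of polynomials:
$f(x) = \frac{f(0)x^0}{0!} + \frac{f'(0)x^1}{1!} + \dots + \frac{f^{(n)}(0)x^n}{n!} + \dots$
• Try it out, say for the third derivative at x = 0. Differentiating the series three times and setting x = 0 leaves
$0 + 0 + 0 + \frac{3 \cdot 2 \cdot 1 \, f^{(3)}(0)}{3 \cdot 2 \cdot 1} + 0 + 0 + \dots = f^{(3)}(0)$
(the lower-order terms differentiate to 0, and the higher-order terms still contain a factor of x, so they vanish at x = 0). All the derivatives match at x = 0.
• Series that we will need:
– $e^x = 1 + x + x^2/2! + x^3/3! + x^4/4! + \dots$
– $\sin x = x - x^3/3! + x^5/5! - x^7/7! + \dots$
– $\cos x = 1 - x^2/2! + x^4/4! - x^6/6! + \dots$
Note: 0! = 1
Another way to calculate e: e = 1 + 1 + 1/2! + 1/3! + …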
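The series can be evaluated efficiently by building each term from the previous one instead of computing powers and factorials separately. A minimal Java sketch (the class `Maclaurin` and its methods are hypothetical names); calling `exp(1, terms)` is exactly the "another way to calculate e" sum.

```java
// Hypothetical demo: partial sums of the Maclaurin series for e^x, sin x, cos x.
class Maclaurin {
    // e^x = 1 + x + x^2/2! + ...; term holds x^n / n!, starting at n = 0 (0! = 1).
    static double exp(double x, int terms) {
        double sum = 0, term = 1;
        for (int n = 0; n < terms; n++) {
            sum += term;
            term *= x / (n + 1);                          // x^(n+1)/(n+1)! from x^n/n!
        }
        return sum;
    }

    // sin x = x - x^3/3! + x^5/5! - ...
    static double sin(double x, int terms) {
        double sum = 0, term = x;                         // first term: x^1/1!
        for (int n = 0; n < terms; n++) {
            sum += term;
            term *= -x * x / ((2 * n + 2) * (2 * n + 3));
        }
        return sum;
    }

    // cos x = 1 - x^2/2! + x^4/4! - ...
    static double cos(double x, int terms) {
        double sum = 0, term = 1;                         // first term: x^0/0!
        for (int n = 0; n < terms; n++) {
            sum += term;
            term *= -x * x / ((2 * n + 1) * (2 * n + 2));
        }
        return sum;
    }

    public static void main(String[] args) {
        for (int terms = 1; terms <= 10; terms++) {
            System.out.printf("%2d terms: e ~ %.10f%n", terms, exp(1, terms));
        }
        System.out.printf("sin(pi/6) ~ %.10f, cos(pi/3) ~ %.10f%n", sin(Math.PI / 6, 8), cos(Math.PI / 3, 8));
    }
}
```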
Sine, Cosine and e
From Previous Slide
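$e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \frac{x^4}{4!} + \cdots$
$\sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \cdots$
$\cos x = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \cdots$
$e^{i\theta} = 1 + i\theta + \frac{(i\theta)^2}{2!} + \frac{(i\theta)^3}{3!} + \frac{(i\theta)^4}{4!} + \frac{(i\theta)^5}{5!} + \frac{(i\theta)^6}{6!} + \frac{(i\theta)^7}{7!} + \cdots$
(Multiply out the powers of i, using i² = −1, i³ = −i, i⁴ = 1)
$= 1 + i\theta - \frac{\theta^2}{2!} - i\frac{\theta^3}{3!} + \frac{\theta^4}{4!} + i\frac{\theta^5}{5!} - \frac{\theta^6}{6!} - i\frac{\theta^7}{7!} + \cdots$
(Gather the real and imaginary terms together)
$= \left(1 - \frac{\theta^2}{2!} + \frac{\theta^4}{4!} - \frac{\theta^6}{6!} + \cdots\right) + i\left(\theta - \frac{\theta^3}{3!} + \frac{\theta^5}{5!} - \frac{\theta^7}{7!} + \cdots\right)$
(Substitute cos and sin for the series)
$e^{i\theta} = \cos\theta + i\sin\theta$ (This is called Euler’s formula.)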
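Euler's formula can be checked numerically by summing the series for e^{iθ} with complex arithmetic, where each term is the previous term times iθ/(n+1). This Java sketch (hypothetical names; complex numbers stored as {real, imaginary} pairs) compares the partial sum with cos θ + i sin θ.

```java
// Hypothetical check of Euler's formula: sum (i*theta)^n / n! term by term.
class EulerSeries {
    // Returns {real part, imaginary part} of the first `terms` terms of e^{i*theta}.
    static double[] expI(double theta, int terms) {
        double re = 0, im = 0;
        double tRe = 1, tIm = 0;                 // current term (i*theta)^n / n!, starting at n = 0
        for (int n = 0; n < terms; n++) {
            re += tRe;
            im += tIm;
            // Next term = current term * (i*theta) / (n + 1);
            // multiplying x + iy by i gives -y + ix.
            double nRe = -tIm * theta / (n + 1);
            double nIm = tRe * theta / (n + 1);
            tRe = nRe;
            tIm = nIm;
        }
        return new double[] { re, im };
    }

    public static void main(String[] args) {
        double theta = 1.0;
        double[] z = expI(theta, 20);
        System.out.printf("series:          %.12f + %.12fi%n", z[0], z[1]);
        System.out.printf("cos + i*sin:     %.12f + %.12fi%n", Math.cos(theta), Math.sin(theta));
    }
}
```

With θ = π the sum approaches −1 + 0i, that is, e^{iπ} = −1.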
Key Formulae and Identities
Euler's Formula:
$e^{ix} = \cos(x) + i\sin(x)$
Trigonometric Identities:
• cos(x) = cos(−x) and sin(x) = −sin(−x)
• $\cos(x) = \frac{e^{ix} + e^{-ix}}{2}$ and $\sin(x) = \frac{e^{ix} - e^{-ix}}{2i}$
• sin²(x) + cos²(x) = 1
• sin(x+y) = sin(x)cos(y) + cos(x)sin(y)
• cos(x+y) = cos(x)cos(y) − sin(x)sin(y)
Quick Linear Algebra Review
• Linear algebra extends Euclidean space beyond three dimensions.
• <3,4,5> represents a vector going from the point (0,0,0) to the point (3,4,5).
• Two vectors are orthogonal (roughly speaking, perpendicular) if their inner (dot) product equals 0.
– Example: <1,0,0> • <0,1,0> = 1*0 + 0*1 + 0*0 = 0
– Example: <3,1> • <-1,3> = 3*(-1) + 1*3 = 0
• Two functions are orthogonal between a and b if $\int_a^b f(x)g(x)\,dx = 0$
• A set of functions is mutually orthogonal if $\int_a^b f_i(x)f_j(x)\,dx = 0$ when i ≠ j and $\int_a^b f_i(x)f_j(x)\,dx = c > 0$ when i = j.
• Why do we need this?
– Orthogonal function sets can be used to decompose or construct signals.
Inner Product:
sum the products of corresponding coordinates
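Both kinds of inner product can be tried in code. In this Java sketch (hypothetical helper names), `dot` sums the products of corresponding coordinates, and `innerProduct` approximates ∫ f(x)g(x)dx over [a, b] with a simple midpoint rule (an assumed choice of numerical integration; any reasonable rule works).

```java
import java.util.function.DoubleUnaryOperator;

// Hypothetical helpers for checking orthogonality numerically.
class Orthogonality {
    // Inner (dot) product: sum the products of corresponding coordinates.
    static double dot(double[] u, double[] v) {
        double s = 0;
        for (int i = 0; i < u.length; i++) s += u[i] * v[i];
        return s;
    }

    // Approximates the integral of f(x) g(x) over [a, b] with the midpoint rule (n slices).
    static double innerProduct(DoubleUnaryOperator f, DoubleUnaryOperator g, double a, double b, int n) {
        double h = (b - a) / n, s = 0;
        for (int i = 0; i < n; i++) {
            double x = a + (i + 0.5) * h;
            s += f.applyAsDouble(x) * g.applyAsDouble(x);
        }
        return s * h;
    }

    public static void main(String[] args) {
        System.out.println(dot(new double[] {1, 0, 0}, new double[] {0, 1, 0}));  // 0.0
        System.out.println(dot(new double[] {3, 1}, new double[] {-1, 3}));       // 0.0
        // sin and cos over [-pi, pi]: orthogonal, so the result is close to 0.
        System.out.println(innerProduct(Math::sin, Math::cos, -Math.PI, Math.PI, 10_000));
        // sin with itself: the "c > 0" case, close to pi.
        System.out.println(innerProduct(Math::sin, Math::sin, -Math.PI, Math.PI, 10_000));
    }
}
```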
Basis to span a space
• Consider the orthogonal basis <1,0,0>, <0,1,0>, <0,0,1>
– These form a basis for three-dimensional space.
– Why? Any three-dimensional vector is a linear combination of these.
– Example: <4,3,2> = 4 * <1,0,0> + 3 * <0,1,0> + 2 * <0,0,1>
• Consider the orthogonal basis vectors <1,2>, <-2,1>
– They are orthogonal because <1,2> • <-2,1> = -2 + 2 = 0
• Consider the basis vectors <1/√5, 2/√5>, <-2/√5, 1/√5>
– Also orthogonal, because their inner (dot) product is 0
– <1/√5, 2/√5> has unit length: ((1/√5)² + (2/√5)²)^½ = (1/5 + 4/5)^½ = 1
– <-2/√5, 1/√5> also has unit length (same distance calculation)
Orthonormal basis vectors: orthogonal and of unit length
Orthogonal and Orthonormal
• Experiment (intuitive example, not mathematically precise)
• Goal: construct <4,7> from basis vectors
– Orthogonal basis: <1,2> and <-2,1>
– <1,2> • <4,7> = 18 and <-2,1> • <4,7> = -1
– 18<1,2> + (-1)<-2,1> = <20,35>, which is five times <4,7>
– The factor of five is the squared length of each basis vector: <1,2> • <1,2> = <-2,1> • <-2,1> = 5
• Another experiment
– Orthonormal basis: <1/√5, 2/√5>, <-2/√5, 1/√5>
– <1/√5, 2/√5> • <4,7> = 18/√5 and <-2/√5, 1/√5> • <4,7> = -1/√5
– (18/√5)<1/√5, 2/√5> + (-1/√5)<-2/√5, 1/√5> = <20/5, 35/5> = <4,7>
• Conclusion: Correlating a vector with each orthonormal basis vector (taking their dot product) gives exactly the multiple of that basis vector needed to reconstruct the vector.
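Both experiments fit in one Java helper (hypothetical names) that correlates the target vector with each basis vector and adds up the scaled basis vectors.

```java
import java.util.Arrays;

// Hypothetical helper for the two experiments: each coefficient is the dot product
// (correlation) of the target vector with one basis vector.
class Projection {
    static double dot(double[] u, double[] v) { return u[0] * v[0] + u[1] * v[1]; }

    // Returns the sum over the basis of (v . b) * b, for 2-D vectors.
    static double[] rebuild(double[] v, double[]... basis) {
        double[] out = new double[2];
        for (double[] b : basis) {
            double c = dot(v, b);          // correlation of v with this basis vector
            out[0] += c * b[0];
            out[1] += c * b[1];
        }
        return out;
    }

    public static void main(String[] args) {
        double[] v = {4, 7};
        double r5 = Math.sqrt(5);
        // Orthogonal (not unit-length) basis: the result is 5 times too large.
        System.out.println(Arrays.toString(rebuild(v, new double[] {1, 2}, new double[] {-2, 1})));  // [20.0, 35.0]
        // Orthonormal basis: recovers <4,7> up to rounding.
        System.out.println(Arrays.toString(rebuild(v, new double[] {1 / r5, 2 / r5}, new double[] {-2 / r5, 1 / r5})));
    }
}
```

Dividing each orthogonal-basis coefficient by that basis vector's squared length (5 here) would give the same answer as the orthonormal basis.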
Fourier Series
• A Fourier series is a sum (possibly, but not necessarily, infinite) of sines and cosines that models a continuous signal.
• Fourier modeling allows us to decompose a signal, perform processing, and recombine the results to solve the original problem.
Fourier Decomposition
The top signal decomposes into nine cosine and sine waves
Fourier Square Wave Synthesis
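The synthesis idea can be reproduced with partial sums. This Java sketch assumes a unit-amplitude square wave with period 2π, whose standard Fourier series contains only odd sine harmonics: (4/π)(sin x + sin 3x/3 + sin 5x/5 + …). The amplitude and period are assumptions; the slide's figure does not state them, and the names are hypothetical.

```java
// Hypothetical sketch: partial Fourier sums of a unit square wave with period 2*pi,
// using the standard series (4/pi) * sum over odd n of sin(n x) / n.
class SquareWaveSynthesis {
    static double partialSum(double x, int harmonics) {
        double s = 0;
        for (int k = 1; k <= harmonics; k++) {
            int n = 2 * k - 1;                  // only odd harmonics contribute
            s += Math.sin(n * x) / n;
        }
        return 4 / Math.PI * s;
    }

    public static void main(String[] args) {
        double x = Math.PI / 2;                 // middle of the positive half-cycle (target value 1)
        for (int h : new int[] {1, 3, 10, 100}) {
            System.out.printf("%3d harmonics: %.4f%n", h, partialSum(x, h));
        }
    }
}
```

At x = π/2 the partial sums oscillate around 1 and settle toward it as harmonics are added.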
Fourier Cosine Series
Figure: cos(πx/3), cos(2πx/3), and the integral of their product cos(πx/3) · cos(2πx/3)
• The set of functions {cos(2πkF₀t)}, where k is an integer ≥ 0 and F₀ = 1/T
– Mutually orthogonal from −T to T (T > 0)
– $\int_{-T}^{T} \cos(2\pi k_1 F_0 t)\cos(2\pi k_2 F_0 t)\,dt = 0$ if k₁ ≠ k₂; ≠ 0 if k₁ = k₂
– Proof requires some calculus: namely, integration
• $x(t) = a_0\cos(0 \cdot 2\pi F_0 t) + a_1\cos(1 \cdot 2\pi F_0 t) + a_2\cos(2 \cdot 2\pi F_0 t) + \dots$
• $x(t) = a_0 + \sum_{k=1}^{\infty} a_k\cos(2\pi k F_0 t)$, where $F_0 = 1/T$ (equivalently, angular frequency $\omega_0 = 2\pi/T$)
Comment:
The series doesn’t include phases; if we add phases, we have twice as many unknowns to compute.
A General Orthogonal Function Set
• Euler equation: $e^{i\varphi} = \cos(\varphi) + i\sin(\varphi)$
– Radius = magnitude (always unity); φ = phase
• Consider the function set $\{e^{i\omega_k t}\}$
– Angular frequency: $\omega_k = 2\pi k F_0 = 2\pi k/T_0$
– F₀, T₀: fundamental frequency and period
– k determines the speed at which $e^{i\omega_k t}$ traverses the unit circle
• Orthogonal because, over one period,
$\int_0^{T_0} e^{i\omega_n t}e^{i\omega_m t}\,dt = 0$ whenever n ≠ −m
(equivalently, $\int_0^{T_0} e^{i\omega_n t}e^{-i\omega_m t}\,dt = 0$ whenever n ≠ m)
Notes
1. The book uses j instead of i.
2. Electrical engineers prefer j; mathematicians prefer i. Get used to both!
3. In the diagram, φ = 2πF₀.
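A discrete check of this orthogonality, as a Java sketch with hypothetical names: sample e^{i2πkn/N} at N points over one period and sum it against the conjugate of e^{i2πmn/N}. For 0 ≤ k, m < N the sum is N when k = m and, up to rounding, 0 otherwise.

```java
// Hypothetical check: sampled complex exponentials e^{i 2 pi k n / N} are orthogonal
// under the inner product that conjugates the second function.
class ExpOrthogonality {
    // Returns {re, im} of sum_{n=0}^{N-1} e^{i 2 pi k n/N} * conj(e^{i 2 pi m n/N})
    //                  = sum_{n=0}^{N-1} e^{i 2 pi (k - m) n / N}.
    static double[] inner(int k, int m, int N) {
        double re = 0, im = 0;
        for (int n = 0; n < N; n++) {
            double angle = 2 * Math.PI * (k - m) * n / N;
            re += Math.cos(angle);
            im += Math.sin(angle);
        }
        return new double[] { re, im };
    }

    public static void main(String[] args) {
        int N = 8;
        for (int k = 0; k < 3; k++) {
            for (int m = 0; m < 3; m++) {
                double[] z = inner(k, m, N);
                System.out.printf("k=%d m=%d: %.3f + %.3fi%n", k, m, z[0], z[1]);
            }
        }
    }
}
```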
Orthogonality Example
Left: Correlating the top signal with the middle signal produces the bottom signal, whose area is ≠ 0.
Right: Correlating the top signal with the middle signal produces the bottom signal, whose area = 0 (the two signals are orthogonal).
Putting it all together
• $\{e^{i\omega_k t}\}$ is an orthogonal basis for signals
– Each function $e^{i\omega_k t}$ is a basis function
– We can use the basis functions to synthesize signals
• Synthesize (Fourier series)
– Source: frequency magnitudes; sink: time signal
– $x(t) = \sum_{k} a_k e^{ik\omega_0 t}$, where x(t) = signal at time t, the sum runs over the basis functions (possibly infinitely many), and $a_k$ = magnitude of the component at frequency $k\omega_0$
• For computer processing, we need a discrete counterpart
– Why? We don't want to deal with infinitely many points or basis functions
– $x[n] = \frac{1}{N}\sum_{k=0}^{N-1} X[k]e^{i2\pi kn/N}$
– k determines how fast each term traverses the circle (higher k, faster)
– N basis functions and N frequencies
• Note: For periodic functions, we can use [0, T] instead of [−∞, ∞]
Fourier Analysis
Goal: Compute coefficients given the signal.
• Synthesis equation: $x(t) = \sum_{m=-\infty}^{\infty} a_m e^{im\omega_0 t}$
• Multiply both sides by $e^{-ik\omega_0 t}$:
$x(t)e^{-ik\omega_0 t} = \sum_{m=-\infty}^{\infty} a_m e^{im\omega_0 t}e^{-ik\omega_0 t} = \sum_{m=-\infty}^{\infty} a_m e^{i(m-k)\omega_0 t}$
• Integrate over one period, [0, T₀]:
$\int_0^{T_0} x(t)e^{-ik\omega_0 t}\,dt = \sum_{m=-\infty}^{\infty} a_m \int_0^{T_0} e^{i(m-k)\omega_0 t}\,dt$
• By orthogonality, each integral on the right is zero except when m = k, where the integrand is $e^0 = 1$:
$\int_0^{T_0} x(t)e^{-ik\omega_0 t}\,dt = a_k\int_0^{T_0} dt = a_k T_0$
• The answer (value of coefficient k): $a_k = \frac{1}{T_0}\int_0^{T_0} x(t)e^{-ik\omega_0 t}\,dt$
• Note: 1/T₀ is simply a constant that scales the result
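The coefficient formula can be approximated numerically by replacing the integral with a midpoint-rule sum. In this Java sketch (hypothetical names), the test signal 3 + 2cos(ω₀t) + sin(2ω₀t) and the 10 ms period are assumptions chosen so the answer is known in advance: a₀ = 3, a₁ = 1, a₂ = −0.5i, and a₃ = 0.

```java
import java.util.function.DoubleUnaryOperator;

// Hypothetical sketch: estimate a_k = (1/T0) * integral over [0, T0] of x(t) e^{-i k w0 t} dt
// with the midpoint rule.
class FourierCoefficients {
    // Returns {re, im} of a_k for a real signal x with period t0.
    static double[] coefficient(DoubleUnaryOperator x, double t0, int k, int steps) {
        double w0 = 2 * Math.PI / t0, h = t0 / steps, re = 0, im = 0;
        for (int s = 0; s < steps; s++) {
            double t = (s + 0.5) * h;
            double v = x.applyAsDouble(t);
            // e^{-i k w0 t} = cos(k w0 t) - i sin(k w0 t)
            re += v * Math.cos(k * w0 * t);
            im -= v * Math.sin(k * w0 * t);
        }
        return new double[] { re * h / t0, im * h / t0 };
    }

    public static void main(String[] args) {
        double t0 = 0.01;                        // assumed period: 10 ms (F0 = 100 Hz)
        double w0 = 2 * Math.PI / t0;
        // Test signal: 3 + 2cos(w0 t) + sin(2 w0 t), so a0 = 3, a1 = 1, a2 = -0.5i.
        DoubleUnaryOperator x = t -> 3 + 2 * Math.cos(w0 * t) + Math.sin(2 * w0 * t);
        for (int k = 0; k <= 3; k++) {
            double[] a = coefficient(x, t0, k, 1000);
            System.out.printf("a_%d = %.4f %+.4fi%n", k, a[0], a[1]);
        }
    }
}
```

Replacing the integral by a finite sum in this way is exactly the step the next slide takes to reach the DFT.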
Discrete Version
• Definition: continuous Fourier transform and inverse
– Transform: $X(\omega) = \int_{-\infty}^{\infty} x(t)e^{-i\omega t}\,dt$
– Inverse: $x(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} X(\omega)e^{i\omega t}\,d\omega$
• Convert from the continuous version:
– Evaluate at N equally spaced points (the period is now N samples)
– Use sums to approximate the integrals
– Note: x(t) = value at time t; x[n] is x(t) evaluated at the nth sample point, where the kth basis function has turned through the angle 2πkn/N
• Discrete Fourier transform and inverse
– Transform: $X[k] = \sum_{n=0}^{N-1} x[n]e^{-i2\pi kn/N}$
– Inverse: $x[n] = \frac{1}{N}\sum_{k=0}^{N-1} X[k]e^{i2\pi kn/N}$
• Note: X[k] is a complex number representing magnitude and phase
• Conclusion: We can go back and forth between the time and frequency domains
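A direct O(N²) Java transcription of the DFT pair (hypothetical class `DftPair`; real and imaginary parts kept in separate arrays) shows that the inverse restores the time-domain samples. For x = {1, 2, 3, 4} the transform is X = {10, −2+2i, −2, −2−2i}.

```java
// Hypothetical O(N^2) transcription of the DFT pair; real and imaginary parts are separate arrays.
class DftPair {
    // Forward: X[k] = sum_n x[n] e^{-i 2 pi k n / N}
    // Inverse: x[n] = (1/N) sum_k X[k] e^{+i 2 pi k n / N}
    static double[][] transform(double[] re, double[] im, boolean inverse) {
        int N = re.length;
        double sign = inverse ? 1 : -1;
        double[] outRe = new double[N], outIm = new double[N];
        for (int k = 0; k < N; k++) {
            for (int n = 0; n < N; n++) {
                double angle = sign * 2 * Math.PI * k * n / N;
                double c = Math.cos(angle), s = Math.sin(angle);
                outRe[k] += re[n] * c - im[n] * s;   // real part of (re + i im)(c + i s)
                outIm[k] += re[n] * s + im[n] * c;   // imaginary part
            }
            if (inverse) {                           // the 1/N factor of the inverse
                outRe[k] /= N;
                outIm[k] /= N;
            }
        }
        return new double[][] { outRe, outIm };
    }

    public static void main(String[] args) {
        double[] re = {1, 2, 3, 4}, im = new double[4];
        double[][] X = transform(re, im, false);
        double[][] back = transform(X[0], X[1], true);
        for (int k = 0; k < re.length; k++) {
            System.out.printf("X[%d] = %6.2f %+6.2fi    restored x[%d] = %.2f%n", k, X[0][k], X[1][k], k, back[0][k]);
        }
    }
}
```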
Signal Plot
The phases are shown in the spectrum plot in the complex plane.
The phase affects how the time-domain signal looks.
The amplitude of the spectrum plot remains constant regardless of phase.
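A small Java experiment (hypothetical names) illustrates this: it samples cos(2π·3n/16 + φ) for several phases φ and computes DFT bin 3. The magnitude of the bin stays at N/2 = 8 while its angle equals φ.

```java
// Hypothetical demo: shifting the phase of a cosine changes the phase of its DFT bin, not its magnitude.
class PhaseDemo {
    // Returns {re, im} of bin k of the DFT of a real signal x.
    static double[] bin(double[] x, int k) {
        int N = x.length;
        double re = 0, im = 0;
        for (int n = 0; n < N; n++) {
            double angle = -2 * Math.PI * k * n / N;
            re += x[n] * Math.cos(angle);
            im += x[n] * Math.sin(angle);
        }
        return new double[] { re, im };
    }

    // N samples of cos(2 pi * cycles * n / N + phase).
    static double[] cosine(int N, int cycles, double phase) {
        double[] x = new double[N];
        for (int n = 0; n < N; n++) x[n] = Math.cos(2 * Math.PI * cycles * n / N + phase);
        return x;
    }

    public static void main(String[] args) {
        int N = 16, k = 3;
        for (double phase : new double[] {0, Math.PI / 4, Math.PI / 2}) {
            double[] X = bin(cosine(N, k, phase), k);
            System.out.printf("phase %.3f: |X[3]| = %.3f, angle = %.3f rad%n",
                    phase, Math.hypot(X[0], X[1]), Math.atan2(X[1], X[0]));
        }
    }
}
```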
Fourier Transform of Square Wave
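Figure: square (rectangular) pulse between t = −1/2 and t = 1/2
• Fourier transforms exhibit the property of duality
– A square (rectangular) window in frequency corresponds to a sinc function in time, and vice versa
– Convolution in time = multiplication in frequency, and vice versa
• Proof with calculus, for the pulse x(t) = 1 on [−1/2, 1/2] and 0 elsewhere:
$X(\omega) = \int_{-\infty}^{\infty} x(t)e^{-j\omega t}\,dt = \int_{-1/2}^{1/2} e^{-j\omega t}\,dt = \left.\frac{e^{-j\omega t}}{-j\omega}\right|_{-1/2}^{1/2} = \frac{1}{j\omega}\left(e^{j\omega/2} - e^{-j\omega/2}\right) = \frac{2\sin(\omega/2)}{\omega} = \frac{\sin(\omega/2)}{\omega/2}$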
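The closed form can be checked by integrating the pulse numerically. This Java sketch (hypothetical names; midpoint-rule integration is an assumed method) compares the numerical transform with sin(ω/2)/(ω/2), whose value at ω = 0 is the limit 1. The imaginary part vanishes because the pulse is symmetric about t = 0.

```java
// Hypothetical check: numerically integrate the unit pulse on [-1/2, 1/2] against e^{-j w t}
// and compare with the closed form sin(w/2)/(w/2).
class PulseTransform {
    // Midpoint-rule estimate of X(w) = integral over [-1/2, 1/2] of e^{-j w t} dt; returns {re, im}.
    static double[] numeric(double w, int steps) {
        double h = 1.0 / steps, re = 0, im = 0;
        for (int s = 0; s < steps; s++) {
            double t = -0.5 + (s + 0.5) * h;
            re += Math.cos(w * t);
            im -= Math.sin(w * t);
        }
        return new double[] { re * h, im * h };
    }

    // Closed form; the limit at w = 0 is 1.
    static double closedForm(double w) {
        return w == 0 ? 1.0 : Math.sin(w / 2) / (w / 2);
    }

    public static void main(String[] args) {
        for (double w : new double[] {0, Math.PI, 2 * Math.PI, 3 * Math.PI}) {
            double[] X = numeric(w, 10_000);
            System.out.printf("w = %6.3f  numeric = %.6f %+.6fi  closed form = %.6f%n",
                    w, X[0], X[1], closedForm(w));
        }
    }
}
```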
Complex DFT by Correlation
    double[] DFT(double[] time, int N) {
        double[] f = new double[2 * N];   // even indices = real part, odd indices = imaginary part
        double w = 2 * Math.PI / N;
        for (int k = 0; k < N; k++) {
            double real = 0, imag = 0;
            for (int n = 0; n < N; n++) {
                double om = w * k * n;
                real += time[n] * Math.cos(om);   // correlate with cosine
                imag -= time[n] * Math.sin(om);   // correlate with -sine (e^{-i om})
            }
            f[2 * k] = real;
            f[2 * k + 1] = imag;
        }
        return f;
    }
• Complexity: O(N²) because of the double loop of N iterations each
• Example: For 512 samples, the inner loop body runs 262,144 times
• Evaluation: Too slow, but the FFT is O(N lg N)
The FFT Algorithm
• The FFT algorithm is based on divide-and-conquer
Figure: recursion tree splitting n points into two halves of n/2, then four quarters of n/4; O(n) work per level and O(log n) levels
• The running time complexity is O(n log n)
Why do we need FFT?
• The correlation algorithm is O(N²)
• Too slow to be practical even on today's processors
• The optimized FFT is O(N lg N), which is orders of magnitude faster
• Assume 512 elements in a window
– O(N) = C * 512
– O(N²) = C * 512 * 512 = C * 262,144
– O(N lg N) = C * 512 * 9 = C * 4,608
Theory for Optimization
• Base case: x[0] (a one-point DFT is the point itself)
• Recursive relationship (split into even- and odd-indexed samples):
$\sum_{t=0}^{N-1} x[t]e^{-i2\pi kt/N} = \sum_{t=0}^{N/2-1} x[2t]e^{-i2\pi k(2t)/N} + \sum_{t=0}^{N/2-1} x[2t+1]e^{-i2\pi k(2t+1)/N}$
$= \sum_{t=0}^{N/2-1} x[2t]e^{-i2\pi kt/(N/2)} + e^{-i2\pi k/N}\sum_{t=0}^{N/2-1} x[2t+1]e^{-i2\pi kt/(N/2)} = F_k^{even} + e^{-i2\pi k/N}F_k^{odd}$
• Note: The work at each level is O(N), and there are lg N levels
Simple Recursive FFT Solution
    Complex[] fft(Complex[] x) {
        int N = x.length;
        Complex[] y = new Complex[N];
        if (N == 1) { y[0] = x[0]; return y; }   // base case
        Complex[] even = new Complex[N/2];
        Complex[] odd = new Complex[N/2];
        for (int m = 0; m < N/2; m++) { even[m] = x[2*m]; odd[m] = x[2*m+1]; }
        Complex[] q = fft(even), r = fft(odd);    // recursively transform each half
        // Combine with the Complex operations, for k = 0 .. N/2-1:
        //   t = e^{-i2πk/N} * r[k];   y[k] = q[k] + t;   y[k+N/2] = q[k] - t
        return y;
    }
• Note: $e^{-i2\pi(k+N/2)/N} = -e^{-i2\pi k/N}$, so one twiddle factor serves both y[k] and y[k+N/2]
Inefficiencies
• The computations still are an order of magnitude slower than needed
• The Complex class causes many jumps and puts pressure on the hardware cache
• Declaring and copying arrays at every step slows things down by at least half
• Repetitive calculations of sines and cosines are extremely slow
• N>>1 is ten times faster than N/2
• The overhead of creating activation records for the recursive calls is very high
Eliminating the Recursion
• Butterfly algorithm: the original array indices as we pass through each level of recursion
    Level 0: 000 001 010 011 100 101 110 111
    Level 1: 000 010 100 110 | 001 011 101 111
    Level 2: 000 100 | 010 110 | 001 101 | 011 111
• Can you see a pattern? The final order flips each index's bits from left to right (bit reversal): the element at index 001 ends up at position 100
Butterfly Code
• Flip bits from left to right, swapping entries into bit-reversed order:
    int j = N>>1, k;                      // j = bit reversal of i = 1
    for (int i = 1; i < N-1; i++) {
        if (i < j) {                      // swap complex entries i and j (pairs of doubles)
            double tr = complex[2*i], ti = complex[2*i+1];
            complex[2*i] = complex[2*j]; complex[2*i+1] = complex[2*j+1];
            complex[2*j] = tr; complex[2*j+1] = ti;
        }
        k = N>>1;                         // advance j to the bit reversal of i+1
        while (k <= j) { j -= k; k >>= 1; }
        j += k;
    }
Sin and Cosine Table Look Up
• Compute the values ahead of time and save repetitive calculations
• $e^{i2\pi k/N} = \cos(2\pi k/N) + i\sin(2\pi k/N)$
• We can store in an array (sinX[]): sin(2π·0/N), sin(2π·1/N), sin(2π·2/N), sin(2π·3/N), …, sin(2π(N−1)/N)
• cos(2πk/N) = sinX[(k+(N>>2))%N]
Optimized FFT – after butterfly code
    // Perform the fft calculations.
    for (int stage=1; stage<=M; stage++)   // M = lg N
    {
        // Remember that complex numbers require pairs of doubles
        fftSubGroupGap = 2<…
        // Outer loop: each sub-fft group; inner loop: combine group elements
        for (int even=0; even…
            for (int element=even; element<(even+gap); element+=2)
            {
                // ***** See Next Slide *****
                k += kInc;   // position for next look up.
            }
        }
        kInc >>= 1;
    }
Multiplication Portion
    // Look up e^{-2PIik/N}, avoiding trig calculations here.
    realW = sines[(k+(N>>2))%N];   // cos(2PIk/N)
    imagW = -sines[k%N];           // -sin(2PIk/N)
    // Complex multiplication of the odd entry of the subgroup
    // with (e^{-2PIi/N})^k = (cos(2PI/N) - i*sin(2PI/N))^k
    j = (element + gap);
    tempReal = realW * complex[j] - imagW * complex[j+1];
    tempImag = realW * complex[j+1] + imagW * complex[j];
    // Adjust the odd entry (subtract: the fft is periodic).
    complex[j] = complex[element] - tempReal;
    complex[j+1] = complex[element+1] - tempImag;
    // Adjust the even entry.
    complex[element] += tempReal;
    complex[element+1] += tempImag;
Final Notes
• Standard Fast Fourier Transform
– Requires N to be a power of 2 for the recursion to work
– Can pad the array with zeros to extend the frequency domain
• Can it work if N is not a power of 2?
– Yes, but special, slower processing is needed
• How do we know if it works?
– For a real input signal, point N/2−1 matches point N/2+1, point N/2−2 matches point N/2+2, and in general point N/2−k matches point N/2+k (they are complex conjugates, so their magnitudes are equal)
– Note: Points 0 and N/2 have no matching partner, so don't check these
– The FFT inverse should restore the time-domain signal
– Compare with the slower correlation DFT calculation
– Try some simple impulses and check the results
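Putting the optimized pieces together, here is a self-contained Java sketch of an iterative radix-2 FFT in the style of these slides: interleaved real/imaginary doubles, bit-reversal reordering, a precomputed sine table read with the (k + N/4) % N cosine trick, and in-place butterflies. It is not the slides' exact code (several of its loop headers were lost), and the names `IterativeFft`, `fft`, and `dft` are hypothetical. Following the final notes, `main` compares the result against the O(N²) correlation DFT on a 512-sample window.

```java
// Hypothetical sketch of an iterative radix-2 FFT: interleaved doubles (even index = real,
// odd index = imaginary), bit-reversal reordering, a sine lookup table, in-place butterflies.
class IterativeFft {
    // In-place forward FFT. N = data.length / 2 complex points; N must be a power of 2 and >= 4
    // so that the quarter-period lookup cos(2*pi*k/N) = sines[(k + N/4) % N] is exact.
    static void fft(double[] data) {
        int N = data.length / 2;
        if (N < 4 || Integer.bitCount(N) != 1) {
            throw new IllegalArgumentException("N must be a power of 2 and at least 4");
        }
        // Sine table computed once: sines[k] = sin(2*pi*k/N).
        double[] sines = new double[N];
        for (int k = 0; k < N; k++) sines[k] = Math.sin(2 * Math.PI * k / N);

        // Bit-reversal reordering: j tracks the bit-reversed value of i.
        for (int i = 1, j = 0; i < N; i++) {
            int bit = N >> 1;
            for (; (j & bit) != 0; bit >>= 1) j ^= bit;   // "add 1" to j from the left
            j ^= bit;
            if (i < j) {                                   // swap complex entries i and j
                double tr = data[2 * i], ti = data[2 * i + 1];
                data[2 * i] = data[2 * j];
                data[2 * i + 1] = data[2 * j + 1];
                data[2 * j] = tr;
                data[2 * j + 1] = ti;
            }
        }

        // Butterfly stages: merge pairs of size-(len/2) sub-FFTs into size-len sub-FFTs.
        for (int len = 2; len <= N; len <<= 1) {
            int half = len >> 1, step = N / len;           // table stride for this stage
            for (int start = 0; start < N; start += len) {
                for (int m = 0; m < half; m++) {
                    int k = m * step;                       // twiddle factor e^{-i*2*pi*k/N}
                    double wr = sines[(k + (N >> 2)) % N];  // cos(2*pi*k/N)
                    double wi = -sines[k];                  // -sin(2*pi*k/N)
                    int e = 2 * (start + m), o = 2 * (start + m + half);
                    double tr = wr * data[o] - wi * data[o + 1];
                    double ti = wr * data[o + 1] + wi * data[o];
                    data[o] = data[e] - tr;                 // odd entry: q - w*r
                    data[o + 1] = data[e + 1] - ti;
                    data[e] += tr;                          // even entry: q + w*r
                    data[e + 1] += ti;
                }
            }
        }
    }

    // O(N^2) correlation DFT on the same interleaved layout, used only as a reference.
    static double[] dft(double[] data) {
        int N = data.length / 2;
        double[] out = new double[2 * N];
        for (int k = 0; k < N; k++) {
            for (int n = 0; n < N; n++) {
                double a = -2 * Math.PI * k * n / N, c = Math.cos(a), s = Math.sin(a);
                out[2 * k] += data[2 * n] * c - data[2 * n + 1] * s;
                out[2 * k + 1] += data[2 * n] * s + data[2 * n + 1] * c;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int N = 512;                                        // one 512-sample window
        double[] x = new double[2 * N];                     // imaginary parts stay 0
        for (int n = 0; n < N; n++) {
            x[2 * n] = Math.sin(2 * Math.PI * 5 * n / N) + 0.5 * Math.cos(2 * Math.PI * 40 * n / N);
        }
        double[] expected = dft(x);
        double[] actual = x.clone();
        fft(actual);
        double maxErr = 0;
        for (int i = 0; i < 2 * N; i++) maxErr = Math.max(maxErr, Math.abs(actual[i] - expected[i]));
        System.out.println("max |FFT - DFT| = " + maxErr);
    }
}
```

An impulse at index 0 should transform to all ones, which is another of the suggested checks; the sketch deliberately rejects N < 4 because the quarter-period cosine lookup needs N to be divisible by 4.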