Transcript Slide 1

Speech

Speech Recognition Front End

This week's focus is on spectral analysis.

[Diagram: front-end stages: Pre-emphasis, Windowing, Spectral Analysis, Enhance Features, Temporal Features, Frequency Features, Consolidate Features, Feature Vectors]

Spectral Analysis

This week's emphasis will be on Fourier analysis.

• Goal: Find useful frequency-related features
• Approaches
  – Without Fourier analysis:
    • Apply a recursive band-pass filter bank
    • Use linear predictive coding (LPC)
  – With Fourier analysis:
    • Calculate a Fourier transform
    • Warp results based on the mel scale
• Applications
  – Auditory models mimicking human hearing
  – Eliminate noise by removing non-voice frequencies
  – Detect formants present in the signal
  – Perform cepstral analysis to detect pitch and recognize speech
  – Auditory nerves stop responding to extended occurrences of the same frequency
    • Idea: Deemphasize frequencies present for extended periods
    • Results: Effective for speech recognition in noisy environments

The Fourier Transform Family


Fourier Series

A decomposed weighted sum of sinusoidal functions that models an arbitrary infinitely periodic continuous function

Fourier Transform

A linear operation that maps an arbitrary function with infinite range into a spectrum of its frequency components

Discrete Fourier Transform (DFT)

A Fourier Transform applied to a discrete infinitely repeating periodic series of complex numbers.

Discrete Time Fourier Transform (DTFT)

A Fourier Transform applied to an aperiodic discrete series of complex numbers that extends from -∞ to +∞.

Fast Fourier Transform: Fast way to calculate DFT

The number e

$e = \lim_{n \to \infty} (1 + 1/n)^n$

• When n = 1, e ≈ 2
• When n = 2, e ≈ $(1 + 1/2)^2$ = 9/4 = 2.25
• When n = 3, e ≈ $(1 + 1/3)^3$ = 64/27 ≈ 2.37037
• When n is extremely large, it approaches the value e = 2.718281828…

What does this have to do with sound?

Answer: The future slides will tell.
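The limit definition can be checked numerically. The following is a minimal sketch, not part of the original slides; the class name EulerLimit and the method approxE are hypothetical names chosen for illustration.

```java
// Illustrative sketch: approximate e with (1 + 1/n)^n for growing n.
// EulerLimit and approxE are hypothetical names, not from the slides.
final class EulerLimit {
    // Evaluates (1 + 1/n)^n in double precision.
    static double approxE(long n) {
        return Math.pow(1.0 + 1.0 / n, n);
    }

    public static void main(String[] args) {
        for (long n : new long[] {1, 2, 3, 1_000, 1_000_000}) {
            System.out.println("n = " + n + " -> " + approxE(n));
        }
        System.out.println("Math.E = " + Math.E);
    }
}
```

The values creep toward Math.E slowly: the error shrinks roughly in proportion to 1/n.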

Quick Calculus Review

• The derivative of a function at a point is the slope of the function at that point (change in y over change in x).
• The derivative of $x^2$ is $2x$ (notation: $(x^2)' = 2x$):
  $\lim_{\Delta x \to 0} \frac{(x+\Delta x)^2 - x^2}{\Delta x} = \lim_{\Delta x \to 0} \frac{x^2 + 2x\Delta x + \Delta x^2 - x^2}{\Delta x} = \lim_{\Delta x \to 0} (2x + \Delta x) = 2x$
• Tables of derivatives proved by mathematicians exist.
• We will need these:
  – $(x^n)' = n x^{n-1}$
  – $(\sin x)' = \cos x$, $(\cos x)' = -\sin x$
  – $(e^x)' = e^x$, $(e^{ax})' = a e^{ax}$

Complex Numbers

• Extends the number line to a plane
  – Horizontal axis: real numbers
  – Vertical axis: imaginary numbers
  – Rectangular notation: a + bi
    • a along the real axis
    • b along the imaginary axis
• Operations
  – Addition: (a+bi) + (c+di) = (a+c) + (b+d)i
  – Multiplication: (a+bi) * (c+di) = (ac – bd) + (ad + bc)i
  – Division: (a+bi)/(c+di) is solved by multiplying numerator and denominator by the conjugate of c+di, which equals c–di
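The three operations can be sketched in Java with each complex number stored as a two-element array {real, imaginary}. The class name ComplexOps and this array layout are assumptions made for illustration, not something the slides prescribe.

```java
// Illustrative sketch: complex arithmetic on {real, imaginary} pairs.
// ComplexOps is a hypothetical name, not from the slides.
final class ComplexOps {
    static double[] add(double[] x, double[] y) {
        return new double[] { x[0] + y[0], x[1] + y[1] };
    }

    // (a+bi)(c+di) = (ac - bd) + (ad + bc)i
    static double[] multiply(double[] x, double[] y) {
        return new double[] { x[0] * y[0] - x[1] * y[1], x[0] * y[1] + x[1] * y[0] };
    }

    // Multiply numerator and denominator by the conjugate of the divisor.
    static double[] divide(double[] x, double[] y) {
        double[] conjugate = { y[0], -y[1] };
        double[] numerator = multiply(x, conjugate);
        double denominator = y[0] * y[0] + y[1] * y[1]; // (c+di)(c-di) = c^2 + d^2
        return new double[] { numerator[0] / denominator, numerator[1] / denominator };
    }

    public static void main(String[] args) {
        double[] p = multiply(new double[] {1, 2}, new double[] {3, 4});
        System.out.println(p[0] + " + " + p[1] + "i");
    }
}
```

Dividing the product by one factor should return the other factor, which makes a convenient self-check.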

Polar Notation

Distance and angle from the origin

• Rectangular form: 4 + 3i
• Convert to polar form: (5, 36.87°)
  – M = $\sqrt{4^2 + 3^2}$ = 5
  – Ө = arctan(3/4) ≈ 36.87°
• Convert to rectangular
  – a + ib = M(cos Ө + i sin Ө)

Note: At 90 and 270 degrees the real part is zero, so arctan(b/a) divides by zero.
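A minimal Java sketch of both conversions follows. It uses Math.atan2, which takes the imaginary and real parts as separate arguments, so the divide by zero at 90 and 270 degrees never occurs. The class name PolarForm is hypothetical, and reporting the angle in degrees is a choice made to match the slide.

```java
// Illustrative sketch: rectangular <-> polar conversion.
// PolarForm is a hypothetical name, not from the slides.
final class PolarForm {
    // Returns {magnitude, angle in degrees} for a + bi.
    static double[] toPolar(double a, double b) {
        double magnitude = Math.hypot(a, b);              // sqrt(a^2 + b^2)
        double degrees = Math.toDegrees(Math.atan2(b, a)); // safe even when a == 0
        return new double[] { magnitude, degrees };
    }

    // Returns {a, b} for M(cos theta + i sin theta).
    static double[] toRectangular(double magnitude, double degrees) {
        double theta = Math.toRadians(degrees);
        return new double[] { magnitude * Math.cos(theta), magnitude * Math.sin(theta) };
    }

    public static void main(String[] args) {
        double[] polar = toPolar(4, 3);
        System.out.println("M = " + polar[0] + ", angle = " + polar[1] + " degrees");
    }
}
```

Note that atan2 reports angles in the range -180 to 180 degrees, so 270 degrees comes back as -90.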

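Maclaurin Series for e, sin, cos

• A Maclaurin series estimates any well-behaved function in terms of polynomials:
  $f(x) = f(0)\frac{x^0}{0!} + f'(0)\frac{x^1}{1!} + \dots + f^{(n)}(0)\frac{x^n}{n!} + \dots$
• Try it out, say for the third derivative at x = 0. Differentiating the series three times and setting x = 0 leaves only the $x^3$ term:
  $0 + 0 + 0 + 3 \cdot 2 \cdot 1 \cdot \frac{f^{(3)}(0)}{3 \cdot 2 \cdot 1} + 0 + 0 + \dots = f^{(3)}(0)$
• All the derivatives match at x = 0.
• Series that we will need
  – $e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \frac{x^4}{4!} + \dots$
  – $\sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \dots$
  – $\cos x = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \dots$
• Note: 0! = 1
• Another way to calculate e: $e = 1 + 1 + \frac{1}{2!} + \frac{1}{3!} + \dots$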
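The three series can be summed term by term. The sketch below builds each term from the previous one so that no factorial is computed explicitly; the class name Maclaurin and the terms parameter (how many terms to add) are assumptions for illustration.

```java
// Illustrative sketch: partial sums of the Maclaurin series for e^x, sin x, cos x.
// Maclaurin is a hypothetical name, not from the slides.
final class Maclaurin {
    static double exp(double x, int terms) {
        double term = 1.0, sum = 1.0;           // first term: x^0/0!
        for (int n = 1; n < terms; n++) {
            term *= x / n;                      // x^n/n! from x^(n-1)/(n-1)!
            sum += term;
        }
        return sum;
    }

    static double sin(double x, int terms) {
        double term = x, sum = x;               // first term: x
        for (int n = 1; n < terms; n++) {
            term *= -x * x / ((2.0 * n) * (2.0 * n + 1)); // next odd power, sign flipped
            sum += term;
        }
        return sum;
    }

    static double cos(double x, int terms) {
        double term = 1.0, sum = 1.0;           // first term: 1
        for (int n = 1; n < terms; n++) {
            term *= -x * x / ((2.0 * n - 1) * (2.0 * n)); // next even power, sign flipped
            sum += term;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println("e ~ " + exp(1.0, 20) + " (Math.E = " + Math.E + ")");
    }
}
```

For small x a handful of terms already agrees with Math.exp, Math.sin and Math.cos to many digits; larger x needs more terms.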

Sine, Cosine and e

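From the previous slide:

$e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \frac{x^4}{4!} + \dots$
$\sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \dots$
$\cos x = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \dots$

Substitute $x = i\theta$ into the series for $e^x$:

$e^{i\theta} = 1 + i\theta + \frac{(i\theta)^2}{2!} + \frac{(i\theta)^3}{3!} + \frac{(i\theta)^4}{4!} + \frac{(i\theta)^5}{5!} + \frac{(i\theta)^6}{6!} + \frac{(i\theta)^7}{7!} + \cdots$

(Multiply terms to eliminate higher powers of i)

$= 1 + i\theta - \frac{\theta^2}{2!} - i\frac{\theta^3}{3!} + \frac{\theta^4}{4!} + i\frac{\theta^5}{5!} - \frac{\theta^6}{6!} - i\frac{\theta^7}{7!} + \cdots$

(Gather real and imaginary terms together)

$= \left(1 - \frac{\theta^2}{2!} + \frac{\theta^4}{4!} - \frac{\theta^6}{6!} + \cdots\right) + i\left(\theta - \frac{\theta^3}{3!} + \frac{\theta^5}{5!} - \frac{\theta^7}{7!} + \cdots\right)$

(Substitute cos and sin for the series)

$e^{i\theta} = \cos(\theta) + i\sin(\theta)$ (This is called Euler's formula)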
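The derivation can be checked numerically by summing the powers of iӨ directly in complex arithmetic and comparing the result with cos(Ө) and sin(Ө). This is a sketch under stated assumptions: the class name EulerSeries and the {real, imaginary} return layout are hypothetical.

```java
// Illustrative sketch: sum (i*theta)^n / n! and compare with cos + i sin.
// EulerSeries is a hypothetical name, not from the slides.
final class EulerSeries {
    // Returns {real, imaginary} of the partial sum with the given number of terms.
    static double[] expI(double theta, int terms) {
        double termRe = 1.0, termIm = 0.0;   // (i*theta)^0 / 0! = 1
        double sumRe = 1.0, sumIm = 0.0;
        for (int n = 1; n < terms; n++) {
            // Multiply the previous term by (i*theta)/n:
            // (re + i*im) * i = -im + i*re
            double nextRe = -termIm * theta / n;
            double nextIm = termRe * theta / n;
            termRe = nextRe;
            termIm = nextIm;
            sumRe += termRe;
            sumIm += termIm;
        }
        return new double[] { sumRe, sumIm };
    }

    public static void main(String[] args) {
        double theta = 1.0;
        double[] z = expI(theta, 30);
        System.out.println("series: " + z[0] + " + " + z[1] + "i");
        System.out.println("direct: " + Math.cos(theta) + " + " + Math.sin(theta) + "i");
    }
}
```

The real part collects the even powers (the cosine series) and the imaginary part collects the odd powers (the sine series), exactly as in the gathering step.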

Key Formulae and Identities

Euler's Formula: $e^{ix} = \cos(x) + i \sin(x)$

Trigonometric Identities:

• $\cos(x) = \cos(-x)$ and $\sin(x) = -\sin(-x)$
• $\cos(x) = (e^{ix} + e^{-ix})/2$ and $\sin(x) = (e^{ix} - e^{-ix})/(2i)$
• $\sin^2(x) + \cos^2(x) = 1$
• $\sin(x+y) = \sin(x)\cos(y) + \cos(x)\sin(y)$
• $\cos(x+y) = \cos(x)\cos(y) - \sin(x)\sin(y)$

Quick Linear Algebra Review

• Linear algebra extends Euclidean space beyond three dimensions.
• <3,4,5> represents a vector going from the point (0,0,0) to the point (3,4,5).
• Inner product: sum the products of corresponding coordinates.
• Two vectors are orthogonal (perpendicular, roughly speaking) if their inner (dot) product equals 0.
  – Example: <1,0,0> • <0,1,0> = 1*0 + 0*1 + 0*0 = 0
  – Example: <3,1> • <-1,3> = 3*-1 + 1*3 = 0
• Two functions are orthogonal between a and b if $\int_a^b f(x) g(x)\,dx = 0$.
• A set of functions is mutually orthogonal if $\int_a^b f_i(x) f_j(x)\,dx = 0$ when i ≠ j, and equals some c > 0 when i = j.
• Why do we need this? Orthogonal function sets can be used to decompose or construct signals.

Basis to span a space

• Consider the orthogonal basis <1,0,0>, <0,1,0>, <0,0,1>
  – These form a basis for a three-dimensional space.
  – Why? Any three-dimensional vector is a linear combination of these.
  – Example: <4,3,2> = 4 * <1,0,0> + 3 * <0,1,0> + 2 * <0,0,1>
• Consider the orthogonal basis vectors <1,2>, <-2,1>
  – They are orthogonal because <1,2> • <-2,1> = 0
• Consider the basis vectors <1/√5, 2/√5>, <-2/√5, 1/√5>
  – Also orthogonal, because the inner (dot) product is 0
  – <1/√5, 2/√5> has a length of unity: $\sqrt{(1/\sqrt{5})^2 + (2/\sqrt{5})^2} = 1$
  – <-2/√5, 1/√5> also has a length of unity (same distance calculation)
• Orthonormal basis vectors: orthogonal and have unity length

Orthogonal and Orthonormal

• Experiment (intuitive example, not mathematically precise)
  – Goal: construct <4,7> from basis vectors
  – Orthogonal basis: <1,2> and <-2,1>
  – <1,2> • <4,7> = 18 and <-2,1> • <4,7> = -1
  – 18<1,2> + (-1)<-2,1> = <20,35>, which is five times <4,7>
• Another experiment
  – Orthonormal basis: <1/√5, 2/√5>, <-2/√5, 1/√5>
  – <1/√5, 2/√5> • <4,7> = 18/√5 and <-2/√5, 1/√5> • <4,7> = -1/√5
  – (18/√5)<1/√5, 2/√5> + (-1/√5)<-2/√5, 1/√5> = <20/5, 35/5> = <4,7>
• Conclusion: Correlating a vector with an orthonormal basis vector gives the multiple of that basis vector needed to rebuild the vector.
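Both experiments can be reproduced in a few lines of Java. The sketch below correlates the target with each basis vector and adds up the scaled basis vectors; the class name BasisProjection and its method names are hypothetical.

```java
// Illustrative sketch: rebuild a vector from its dot products with basis vectors.
// BasisProjection is a hypothetical name, not from the slides.
final class BasisProjection {
    // Inner (dot) product: sum of products of corresponding coordinates.
    static double dot(double[] u, double[] v) {
        double sum = 0.0;
        for (int i = 0; i < u.length; i++) {
            sum += u[i] * v[i];
        }
        return sum;
    }

    // Weights each basis vector by its correlation with the target and sums them.
    static double[] reconstruct(double[][] basis, double[] target) {
        double[] result = new double[target.length];
        for (double[] b : basis) {
            double weight = dot(b, target);
            for (int i = 0; i < result.length; i++) {
                result[i] += weight * b[i];
            }
        }
        return result;
    }

    public static void main(String[] args) {
        double s = 1.0 / Math.sqrt(5.0);
        double[][] orthonormal = { { s, 2 * s }, { -2 * s, s } };
        double[] rebuilt = reconstruct(orthonormal, new double[] { 4, 7 });
        System.out.println("<" + rebuilt[0] + ", " + rebuilt[1] + ">");
    }
}
```

With the orthogonal but unnormalized basis the result comes out scaled by the squared length of the basis vectors (5 here); with the orthonormal basis no rescaling is needed.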

Fourier Series

• A Fourier series is a sum (possibly, but not necessarily, infinite) of sines and cosines that models a continuous signal.
• Fourier modeling allows us to decompose a signal, perform processing, and recombine the results to solve an original problem.

Fourier Decomposition

The top signal decomposes into nine cosine and sine waves

Fourier Square Wave Synthesis

Fourier Cosine Series

[Figure: cos(πx/3) and cos(2πx/3), with the integral of their product cos(πx/3) * cos(2πx/3)]

• The set of functions $\{\cos(k 2\pi F_0 t)\}$, where k is an integer > 0
  – Mutually orthogonal from –T to T for 0 ≤ t < ∞; T > 0
  – $\int_{-L}^{L} \cos(k_1 2\pi x/P)\cos(k_2 2\pi x/P)\,dx = 0$ if $k_1 \ne k_2$; $\ne 0$ if $k_1 = k_2$
  – Proof requires some calculus, namely integration
• $x(t) = a_0\cos(0 \cdot 2\pi F_0 t) + a_1\cos(1 \cdot 2\pi F_0 t) + a_2\cos(2 \cdot 2\pi F_0 t) + \dots$
• $x(t) = a_0 + \sum_{k=1}^{\infty} a_k \cos(k 2\pi F t)$, where F = 1/T

Comment: The series doesn't include phases. If we add phases, we have twice as many unknowns to compute.
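The orthogonality integral can be estimated numerically instead of being proved with calculus. The sketch below uses the midpoint rule for the pair from the figure, cos(πx/3) and cos(2πx/3), which corresponds to P = 6 and L = 3. The class name CosineOrthogonality, the parameter names, and the choice of the midpoint rule are assumptions for illustration.

```java
// Illustrative sketch: midpoint-rule estimate of the cosine inner product.
// CosineOrthogonality is a hypothetical name, not from the slides.
final class CosineOrthogonality {
    // Estimates the integral of cos(k1*2*pi*x/P) * cos(k2*2*pi*x/P) over [-L, L].
    static double innerProduct(int k1, int k2, double period, double halfWidth, int steps) {
        double dx = 2.0 * halfWidth / steps;
        double sum = 0.0;
        for (int s = 0; s < steps; s++) {
            double x = -halfWidth + (s + 0.5) * dx; // midpoint of slice s
            sum += Math.cos(k1 * 2 * Math.PI * x / period)
                 * Math.cos(k2 * 2 * Math.PI * x / period);
        }
        return sum * dx;
    }

    public static void main(String[] args) {
        System.out.println("k1=1, k2=2: " + innerProduct(1, 2, 6.0, 3.0, 10_000));
        System.out.println("k1=1, k2=1: " + innerProduct(1, 1, 6.0, 3.0, 10_000));
    }
}
```

Different harmonics integrate to (numerically) zero, while a harmonic correlated with itself gives a positive value, here half the interval width.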


A General Orthogonal Function Set

• Euler equation: $e^{i\varphi} = \cos(\varphi) + i\sin(\varphi)$
  – Radius = magnitude (always unity); φ = phase.
• Consider the function set $\{e^{i\omega_k t}\}$
  – Angular frequency: $\omega_k = 2\pi k F_0 = 2\pi k / T_0$
  – $F_0$, $T_0$: fundamental frequency and period.
  – k = speed at which $e^{i\omega_k t}$ traverses the circle
  – Orthogonal because $\int_{-\infty}^{\infty} e^{j\omega_n t} e^{j\omega_m t}\,dt = 0$ whenever n ≠ -m

Notes

1. The book uses j instead of i
2. Electrical engineers prefer j
3. Mathematicians prefer i
4. Get used to both!
5. In the diagram, φ = 2πF_0


Orthogonality Example

Left: Correlating the top signal with the middle signal gives the bottom signal, whose area ≠ 0
Right: Correlating the top signal with the middle signal gives the bottom signal, whose area = 0


Putting it all together

• $\{e^{i\omega_k t}\}$ is an orthogonal basis for signals
  – Each function $e^{i\omega_k t}$ is a basis function
  – We can use the basis functions to synthesize signals
• Synthesize (Fourier series)
  – Source: frequency magnitudes; sink: time signal
  – $x(t) = (1/T)\sum_{k=0}^{T} a_k e^{ik\omega_0 t}$, where x(t) = signal at time t
  – T = number of basis functions (possibly infinite); $a_k$ = magnitude of $\omega_k$
• For computer processing, we need a discrete counterpart
  – Why? We don't want to deal with infinite points or basis functions
  – $x[n] = (1/N)\sum_{k=0}^{N-1} X[k] e^{i2\pi kn/N}$
  – k determines how fast the sum traverses the circle (higher k, faster)
  – N basis functions and N frequencies
• Note: For periodic functions, we can use [0,T] instead of [-∞,∞]


Fourier Analysis

Goal: Compute coefficients given the signal.

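• Synthesis equation: $x(t) = \sum_{m=-\infty}^{\infty} a_m e^{im\omega_0 t}$
• Multiply both sides by $e^{-ik\omega_0 t}$:
  $x(t)e^{-ik\omega_0 t} = \left(\sum_{m=-\infty}^{\infty} a_m e^{im\omega_0 t}\right)e^{-ik\omega_0 t}$
• Integrate over the period $[0, T_0]$:
  $\int_0^{T_0} x(t)e^{-ik\omega_0 t}\,dt = \int_0^{T_0} \left(\sum_{m=-\infty}^{\infty} a_m e^{im\omega_0 t}\right)e^{-ik\omega_0 t}\,dt$
• Move the sum and the coefficients outside the integral:
  $\int_0^{T_0} x(t)e^{-ik\omega_0 t}\,dt = \sum_{m=-\infty}^{\infty} a_m \int_0^{T_0} e^{im\omega_0 t}e^{-ik\omega_0 t}\,dt = \sum_{m=-\infty}^{\infty} a_m \int_0^{T_0} e^{i(m-k)\omega_0 t}\,dt$
• The integral is zero except when m = k, so only one term of the sum survives:
  $\int_0^{T_0} x(t)e^{-ik\omega_0 t}\,dt = a_k \int_0^{T_0} dt = a_k\, t\,\Big|_0^{T_0} = a_k T_0$
• The answer (value of coefficient k): $a_k = (1/T_0)\int_0^{T_0} x(t)e^{-ik\omega_0 t}\,dt$
• Note: $1/T_0$ is simply a constant that scales the result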
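The analysis formula for a coefficient can be approximated with a numerical integral. The sketch below is an illustration only: the class name FourierCoefficient, the use of the midpoint rule, and the test signal are all assumptions. For x(t) = 3cos(2ω₀t), Euler's identity for the cosine gives coefficients of 1.5 at k = 2 and k = -2 and zero elsewhere.

```java
// Illustrative sketch: a_k = (1/T0) * integral over [0, T0] of x(t) e^{-i k w0 t} dt,
// estimated with the midpoint rule. FourierCoefficient is a hypothetical name.
final class FourierCoefficient {
    // Returns {real, imaginary} of coefficient k for the signal x with period t0.
    static double[] coefficient(java.util.function.DoubleUnaryOperator x,
                                int k, double t0, int steps) {
        double w0 = 2 * Math.PI / t0;   // fundamental angular frequency
        double dt = t0 / steps;
        double re = 0.0, im = 0.0;
        for (int s = 0; s < steps; s++) {
            double t = (s + 0.5) * dt;  // midpoint of slice s
            double value = x.applyAsDouble(t);
            // e^{-i k w0 t} = cos(k w0 t) - i sin(k w0 t)
            re += value * Math.cos(k * w0 * t);
            im -= value * Math.sin(k * w0 * t);
        }
        return new double[] { re * dt / t0, im * dt / t0 };
    }

    public static void main(String[] args) {
        java.util.function.DoubleUnaryOperator x = t -> 3.0 * Math.cos(2 * 2 * Math.PI * t);
        double[] a2 = coefficient(x, 2, 1.0, 1_000);
        System.out.println("a_2 = " + a2[0] + " + " + a2[1] + "i");
    }
}
```

Correlating with any other harmonic gives a value that is zero up to rounding, which is the orthogonality argument of the derivation in numerical form.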

Discrete Version

• Definition: Continuous Fourier transform and inverse
  – Transform: $X(\omega) = \int_{-\infty}^{\infty} x(t)e^{-i\omega t}\,dt$
  – Inverse: $x(t) = (1/2\pi)\int_{-\infty}^{\infty} X(\omega)e^{i\omega t}\,d\omega$
• Convert from the continuous version:
  – Evaluate at N equally spaced points (period now is N)
  – Use sums to approximate the integral
  – Note: x(t) = value at time t; x[n] is x(t) evaluated at sample n, and the exponent ωt becomes 2πkn/N
• Discrete Fourier transform and inverse
  – Transform: $X[k] = \sum_{n=0}^{N-1} x[n] e^{-i2\pi kn/N}$
  – Inverse: $x[n] = (1/N)\sum_{k=0}^{N-1} X[k] e^{i2\pi kn/N}$
• Note: X[k] is a complex number representing magnitude/phase
• Conclusion: We can go between time and frequency domains

Signal Plot

• The phases are shown in the spectrum plot in the complex plane.
• The phase affects how the time domain signal looks.
• The amplitude of the spectrum plot remains constant regardless of phase.

Fourier Transform of Square Wave

[Figure: square pulse spanning -1/2 to 1/2]

• Fourier transforms exhibit the property of duality
  – A square wave in frequency corresponds to a windowed sinc function in time, and vice versa
  – Convolution in time = multiplication in frequency, and vice versa
• Proof with calculus, for a square pulse x(t) = 1 on [-1/2, 1/2] and $\omega = k\omega_0$:
  $\int_{-\infty}^{\infty} x(t)e^{-j\omega t}\,dt = \int_{-1/2}^{1/2} e^{-j\omega t}\,dt = \left(-\frac{1}{j\omega}\right)e^{-j\omega t}\,\Big|_{-1/2}^{1/2} = \frac{1}{j\omega}\left(e^{j\omega/2} - e^{-j\omega/2}\right) = \frac{\sin(\omega/2)}{\omega/2}$

Complex DFT by Correlation

double[] DFT( double[] time, int N) { double[] f[2*N], real, imag; double om, w = 2 * Math.PI / time.length; for (k=0; k

Note:

even indices = real part, odd indices = imaginary part }

Complexity: O(N^2) because of the double loop of N each

Example: For 512 samples, loops 262,144 times

Evaluation: Too slow, but FFT is O(N lg N)
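Since the listing on this slide survives only in part, the following is a hedged sketch of a correlation-style DFT rather than a reconstruction of the original code. It keeps the slide's convention that even indices hold real parts and odd indices hold imaginary parts; the class name CorrelationDft and the single-argument signature are assumptions.

```java
// Illustrative sketch: O(N^2) DFT by correlating the signal with each basis function.
// CorrelationDft is a hypothetical name; this is not the slide's original listing.
final class CorrelationDft {
    // Returns 2N doubles: index 2k = real part of X[k], index 2k+1 = imaginary part.
    static double[] dft(double[] time) {
        int n = time.length;
        double[] f = new double[2 * n];
        double w = 2 * Math.PI / n;
        for (int k = 0; k < n; k++) {          // one pass per frequency
            double real = 0.0, imag = 0.0;
            for (int t = 0; t < n; t++) {      // correlate with e^{-i 2 pi k t / N}
                double om = w * k * t;
                real += time[t] * Math.cos(om);
                imag -= time[t] * Math.sin(om);
            }
            f[2 * k] = real;
            f[2 * k + 1] = imag;
        }
        return f;
    }

    public static void main(String[] args) {
        double[] spectrum = dft(new double[] { 1, 0, 0, 0 });
        for (int k = 0; k < 4; k++) {
            System.out.println("X[" + k + "] = " + spectrum[2 * k] + " + " + spectrum[2 * k + 1] + "i");
        }
    }
}
```

The nested loops make the N * N cost visible: every output frequency touches every input sample.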

The FFT Algorithm

• The FFT algorithm is based on divide-and-conquer

[Diagram: recursion tree. A problem of size n splits into two of size n/2, then four of size n/4; each level does O(n) work and there are O(log n) levels]

• The running time complexity is O(n log n)

Why do we need FFT?

Correlation algorithm is O(N^2)

Too slow to be practical even on today's processors

Optimized FFT is O(N lg N), which is orders of magnitude faster

Assume 512 elements in a window
– O(N) = C * 512
– O(N^2) = C * 512 * 512 = C * 262,144
– O(N lg N) = C * 512 * 9 = C * 4,608

Theory for Optimization

Base Case: for N = 1, the transform of x[0] is x[0]

Recursive Relationship

$\sum_{t=0}^{N-1} x[t]\, e^{-i2\pi kt/N}$
$= \sum_{t=0}^{N/2-1} x[2t]\, e^{-i2\pi k(2t)/N} + \sum_{t=0}^{N/2-1} x[2t+1]\, e^{-i2\pi k(2t+1)/N}$
$= \sum_{t=0}^{N/2-1} x[2t]\, e^{-i2\pi kt/(N/2)} + e^{-i2\pi k/N} \sum_{t=0}^{N/2-1} x[2t+1]\, e^{-i2\pi kt/(N/2)}$
$= F_k^{even} + e^{-i2\pi k/N} \cdot F_k^{odd}$

Note: work at each step is O(N); there are lg(N) levels

Simple Recursive FFT Solution

Complex[] fft(Complex[] x) { int N = x.length; Complex[] y = new Complex[N]; if (x.length==1) {y[0] = x[0]; return y; } Complex[] even = new Complex[N/2]; Complex[] odd = new Complex[N/2]; for (int m=0; m

Note: $e^{-i2\pi k/N} = -e^{-i2\pi (k+N/2)/N}$
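Because the listing on this slide is cut off, here is a hedged sketch of a recursive radix-2 FFT that follows the even/odd split of the recursive relationship. It is not the original code: it uses parallel real and imaginary arrays rather than a Complex class, and the class name RecursiveFft is an assumption. The input length must be a power of 2.

```java
// Illustrative sketch: recursive radix-2 FFT on parallel real/imaginary arrays.
// RecursiveFft is a hypothetical name; this is not the slide's original listing.
final class RecursiveFft {
    // Returns {real[], imag[]} of the DFT. Length must be a power of 2.
    static double[][] fft(double[] re, double[] im) {
        int n = re.length;
        if (n == 1) {                          // base case: X[0] = x[0]
            return new double[][] { { re[0] }, { im[0] } };
        }
        int half = n / 2;
        double[] evenRe = new double[half], evenIm = new double[half];
        double[] oddRe = new double[half], oddIm = new double[half];
        for (int m = 0; m < half; m++) {       // split into even and odd samples
            evenRe[m] = re[2 * m];     evenIm[m] = im[2 * m];
            oddRe[m] = re[2 * m + 1];  oddIm[m] = im[2 * m + 1];
        }
        double[][] even = fft(evenRe, evenIm);
        double[][] odd = fft(oddRe, oddIm);
        double[] outRe = new double[n], outIm = new double[n];
        for (int k = 0; k < half; k++) {
            double angle = -2 * Math.PI * k / n;           // twiddle e^{-i 2 pi k / N}
            double wRe = Math.cos(angle), wIm = Math.sin(angle);
            double tRe = wRe * odd[0][k] - wIm * odd[1][k]; // twiddle * F_k^odd
            double tIm = wRe * odd[1][k] + wIm * odd[0][k];
            outRe[k] = even[0][k] + tRe;        outIm[k] = even[1][k] + tIm;
            outRe[k + half] = even[0][k] - tRe; outIm[k + half] = even[1][k] - tIm;
        }
        return new double[][] { outRe, outIm };
    }

    public static void main(String[] args) {
        double[][] y = fft(new double[] { 0, 1, 0, -1 }, new double[4]);
        for (int k = 0; k < 4; k++) {
            System.out.println("X[" + k + "] = " + y[0][k] + " + " + y[1][k] + "i");
        }
    }
}
```

The subtraction for index k + N/2 is the sign identity in the note: the twiddle factor half a period later is the negative of the one at k, so each product is computed once and used twice.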

Inefficiencies

The computations are still an order of magnitude slower than needed

• The Complex class causes many jumps and puts pressure on the hardware cache
• Declaring and copying arrays at every step slows things down by at least half
• Repetitive calculations of sines and cosines are extremely slow
• N>>1 is ten times faster than N/2
• Overhead associated with activation record creation due to the recursive calls is very slow

Eliminating the Recursion

Butterfly algorithm

Original order:    000 001 010 011 100 101 110 111
After first split: 000 010 100 110 001 011 101 111
After second split: 000 100 010 110 001 101 011 111

• The numbers in the rectangles are the array indices
• You see the original indices as we pass through each level of recursion
• Can you see a pattern?

Butterfly Code

Flip bits from left to right

int j = N>>1, k; for (int i=1;i>1; while (k>=2 & j>=k) { j -= k; k >>= 1; } j += k; }

• Most significant bit: SwapBit(x, x + lgN)
• Second most significant bit: SwapBit(x, x + lg(N/2))
• Third most significant bit: SwapBit(x, x + lg(N/4))
• kth most significant bit: SwapBit(x, x + lg(N/2^k))
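The final ordering in the butterfly diagram is each index with its bits reversed (001 becomes 100, 011 becomes 110, and so on). The sketch below produces that ordering with Integer.reverse from the Java standard library instead of the incremental counter used on the slide, so it is a swapped-in technique, not the original code. The class name BitReversal is hypothetical and the array length must be a power of 2.

```java
// Illustrative sketch: bit-reversal permutation using Integer.reverse.
// BitReversal is a hypothetical name; the slide's own loop works differently.
final class BitReversal {
    // Reorders data in place so that the element at index i ends up at the
    // index whose lg(N)-bit binary representation is i's bits reversed.
    static void permute(double[] data) {
        int n = data.length;
        int bits = Integer.numberOfTrailingZeros(n); // lg N when N is a power of 2
        for (int i = 0; i < n; i++) {
            // Reverse all 32 bits, then shift so only the low lg(N) bits remain.
            int j = (bits == 0) ? 0 : Integer.reverse(i) >>> (32 - bits);
            if (i < j) {                              // swap each pair only once
                double tmp = data[i];
                data[i] = data[j];
                data[j] = tmp;
            }
        }
    }

    public static void main(String[] args) {
        double[] data = { 0, 1, 2, 3, 4, 5, 6, 7 };
        permute(data);
        System.out.println(java.util.Arrays.toString(data));
    }
}
```

Once the data is in this order, the combine steps of the FFT can run bottom-up in loops, with no recursion and no temporary arrays.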

Sin and Cosine Table Look Up

Compute the values ahead of time and save repetitive calculations

• $e^{i2\pi k/N} = \cos(2\pi k/N) + i\sin(2\pi k/N)$
• We can store in an array (sinX[]): sin(2π0/N), sin(2π1/N), sin(2π2/N), sin(2π3/N), …, sin(2π(N-1)/N)
• cos(2πk/N) = sinX[(k+(N>>2))%N]
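A minimal sketch of the lookup table follows. The cosine is read from the same table a quarter period ahead, since cos(a) = sin(a + π/2); this requires N to be a multiple of 4. The class name SineTable is hypothetical, and only the array name sinX comes from the slide.

```java
// Illustrative sketch: precomputed sine table with cosine read a quarter period ahead.
// SineTable is a hypothetical name; sinX follows the slide.
final class SineTable {
    private final double[] sinX;
    private final int n;

    // n is assumed to be a multiple of 4 (true for any FFT size of 4 or more).
    SineTable(int n) {
        this.n = n;
        this.sinX = new double[n];
        for (int k = 0; k < n; k++) {
            sinX[k] = Math.sin(2 * Math.PI * k / n);
        }
    }

    double sin(int k) {
        return sinX[k % n];
    }

    // cos(2*pi*k/N) = sin(2*pi*(k + N/4)/N)
    double cos(int k) {
        return sinX[(k + (n >> 2)) % n];
    }

    public static void main(String[] args) {
        SineTable table = new SineTable(16);
        System.out.println("cos(2*pi*2/16) = " + table.cos(2));
    }
}
```

The table is built once per FFT size, so the trig functions are never called inside the transform loops.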

Optimized FFT – after butterfly code

} // Perform the fft calculations.

for (int stage=1; stage<=M; stage++) // M = lg N { // Remember that complex numbers require pairs of doubles fftSubGroupGap = 2<>1; // 2, 4, 8, ... – odd/even distance kInc = N>>1; // Number of 2PIki/N steps for odd/even entries.

// Outer loop: each sub-fft group; inner loop: combine group elements for (int even=0; even

for (int element=even; element<(even+gap); element+=2) {

// ***** See Next Slide *****

k += kInc; // position for next look up.

} } kInc >>= 1;

Multiplication Portion

// Look up e^(-2PIki/N), avoiding trig calculations here.
realW = sines[(k+(N>>2))%N];   // cos(2PIk/N)
imagW = -sines[k%N];           // -sin(2PIk/N)

// Complex multiplication of the odd entry of the subgroup
// with (e^(-2PIi/N))^k = (cos(2PI/N) - i * sin(2PI/N))^k
j = (element + gap);
tempReal = realW * complex[j] - imagW * complex[j+1];
tempImag = realW * complex[j+1] + imagW * complex[j];

// Adjust the odd entry (subtract: the fft is periodic).
complex[j] = complex[element] - tempReal;
complex[j+1] = complex[element+1] - tempImag;

// Adjust the even entry.
complex[element] += tempReal;
complex[element+1] += tempImag;

Final Notes

• Standard Fast Fourier Transform
  – Requires N to be a power of 2 for the recursion to work
  – Can pad the array with zeroes to extend the frequency domain
• Can it work if N is not a power of 2?
  – Yes, but special slower processing is needed
• How do we know if it works?
  – Point N/2-1 = Point N/2+1, Point N/2-2 = Point N/2+2, Point N/2-k = Point N/2+k, etc.
  – Note: Points 0 and N/2 don't match, so don't check these
  – The FFT inverse should restore the time domain signal
  – Compare to the slower correlation DFT calculation
  – Try some simple impulses and check the results