New Algorithms in Acoustical Test Technique


Alternative Algorithmic
Methods in Acoustical
Noise and Function Testing
(alternatives to the classical Fourier transformation)
Christoph Lauer – www.christoph-lauer.de
Alternative Signal-Analysis
Algorithms and Methods
Major:
• "Linear Predictive Coding" (LPC) and allied techniques
• Algorithms and statistical methods from "Natural Language Processing" (NLP)
• Wavelet transformation
• Jitter analysis
• Autocorrelation-based signal "information content" extraction
Minor:
• Magnitude spectrum
• Box-car smoothing, adjustable logarithmization, various normalizations, zero-crossing rate, polygon chain...
Linear Predictive Coding
• What is LPC?
• How to extract the LPC parameters
• The LPC error and its application in the acoustical function test technique
• Alternative methods compared with the LPC-error method
• Spectral analysis via LPC
What is LPC
• An "autoregressive Gaussian process" can be described with:
  s[n] = Σ_{k=1..p} a_k · s[n−k] + e[n]
• Future samples are predicted with the LPC coefficients:
  ŝ[n] = Σ_{k=1..p} a_k · s[n−k]
• Generative LP parameters: the predictor coefficients a_1 … a_p.
• LPC gets its name from the fact that it predicts the current sample as a linear combination of its past p samples.
How to estimate the predictor
coefficients and predict the future
• Coefficient estimation: starting from a given set of input samples, we extract the coefficients that minimize the sum of squared errors. A complex method; code adapted from "Numerical Recipes". A standard Levinson-Durbin recursion can be used to solve for the LP coefficients (via the Yule-Walker equations).
• Future prediction: given the LP coefficients, an IIR (all-pole) filter predicts the future samples.
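The coefficient-estimation step above can be sketched in C++. This is a minimal illustration of the Levinson-Durbin recursion on the Yule-Walker equations, not the code from the algorithm collection; the function names and the simple autocorrelation window are assumptions of this sketch.

```cpp
#include <vector>
#include <cmath>

// Autocorrelation r[0..p] of the input signal (illustrative helper).
std::vector<double> autocorr(const std::vector<double>& s, int p) {
    std::vector<double> r(p + 1, 0.0);
    for (int lag = 0; lag <= p; ++lag)
        for (std::size_t n = lag; n < s.size(); ++n)
            r[lag] += s[n] * s[n - lag];
    return r;
}

// Levinson-Durbin: returns coefficients a[1..p] such that
// s[n] ≈ sum_k a[k] * s[n-k] (a[0] is implied to be 1).
std::vector<double> levinsonDurbin(const std::vector<double>& r, int p) {
    std::vector<double> a(p + 1, 0.0);
    double err = r[0];
    for (int i = 1; i <= p; ++i) {
        double acc = r[i];
        for (int j = 1; j < i; ++j) acc -= a[j] * r[i - j];
        double k = acc / err;                  // reflection coefficient
        std::vector<double> aPrev(a);
        a[i] = k;
        for (int j = 1; j < i; ++j) a[j] = aPrev[j] - k * aPrev[i - j];
        err *= (1.0 - k * k);                  // residual prediction error
    }
    return a;
}
```

For a first-order decaying signal s[n] = 0.9 · s[n−1] the recursion recovers the generating coefficient 0.9, which is the sanity check used below.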
The LPC Error
• The forward prediction error for the pth-order prediction can be written as:
  e[n] = s[n] − Σ_{k=1..p} a_k · s[n−k]
• The prediction error is the difference between the predicted future samples and the real samples.
Application of the LPC Error in the
acoustical noise and function testing
• Based on the LPC error equation, we implement a window-shifted version.
• Base parameters of the implementation:
  previous prediction points: samples used to extract the coefficients
  future prediction points: samples used to predict the future
  number of coefficients: the number of coefficients to compute
  window shift: the step size
Results with the LPC-Error
Method (defect Gear Motor)
previous prediction points: 50 ms -> 1250 samples
future prediction points: 10 ms -> 250 samples
number of coefficients: 1249 LP coefficients
window shift: 1 ms -> 250 samples
Advantages and Disadvantages
of the LPC-Error Method
• Advantages: seems to be a very robust method.
• Disadvantages: silent parts run the algorithm into artificial wrong predictions, so a pre-cut signal is necessary. Slow at high precision!
• Notes: the number of coefficients cannot be greater than the number of previous samples from which the coefficients are extracted. The best precision is obtained when the number of coefficients is at the maximum possible. Because we coded the prediction by hand, compiler optimizations speed it up by a factor of about 5-6.
LPC-Error compared with
alternative methods
• Coming from the so-far unsolved problems in acoustical noise and function testing, we developed two further methods:
1. Autocorrelation-based information-content extraction.
2. Micro changes in the frequency domain, called jitter analysis.
Autocorrelation based
“Information-Content” Extraction
• We implement our own standard autocorrelation with a variable inner summation loop length M:
• We then generate the autocorrelation result over the known time span:
• The autocorrelation corresponds to the information content ("Informationsgehalt" in German) over time. Places where a lot happens show high values.
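One plausible reading of the slide can be sketched as a short-time autocorrelation: a window slides over the signal, and an inner loop of length M accumulates lagged products. The slide does not document the exact lag handling of the original implementation, so the single fixed lag used here is an assumption of this sketch.

```cpp
#include <vector>
#include <cstddef>

// Hedged sketch: short-time autocorrelation with a variable inner
// summation loop of length M. One output value per window position n;
// the inner loop accumulates s[n+k] * s[n+k+lag].
std::vector<double> shortTimeAutocorr(const std::vector<double>& s,
                                      std::size_t M, std::size_t lag) {
    std::vector<double> r;
    for (std::size_t n = 0; n + M + lag <= s.size(); ++n) {
        double acc = 0.0;
        for (std::size_t k = 0; k < M; ++k)
            acc += s[n + k] * s[n + k + lag];
        r.push_back(acc);
    }
    return r;
}
```

The complexity per output value is O(M), which matches the slide's later remark that this algorithm is very fast compared with the LPC-error method.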
Results from the
Autocorrelation
Inner summation loop length = 128 samples
Advantages Disadvantages of
the Autocorrelation Method
• Disadvantages: we see empirically that the autocorrelation does not give the best results.
• Advantages: this algorithm is very, very fast.
The LPC-Spectrum
Envelope
• The predicted future samples can be transformed into the frequency domain.
• The LPC spectrum interpolates an envelope of the power spectrum.
LPC-Spectrum Example with
10 and 54 Filter Coefficients
Jitter-Analysis based
on the LPC-Spectrum
• Jitter is defined as variations of the whole signal in the frequency domain.
• We track the peak of the signal in the frequency domain!
How to compute the Jitter
• There is no real specification or calculation rule; our method:
(1) bandpass filter the signal
(2) track the peak frequency
(3) compute the derivative of the peak-frequency function
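The three steps above can be sketched as follows. A naive band-restricted DFT peak search stands in for steps (1) and (2), and a first difference stands in for the derivative in step (3); frame layout, function names, and this particular bandpass realization are assumptions of the sketch, not the original implementation.

```cpp
#include <vector>
#include <cmath>

const double kPi = 3.14159265358979323846;

// Peak frequency of one frame via a naive DFT restricted to [fLo, fHi];
// restricting the bin range acts as the bandpass step of the method.
double peakFrequency(const std::vector<double>& frame, double sampleRate,
                     double fLo, double fHi) {
    const int N = static_cast<int>(frame.size());
    const int kLo = static_cast<int>(fLo * N / sampleRate);
    const int kHi = static_cast<int>(fHi * N / sampleRate);
    double bestMag = -1.0;
    int bestK = kLo;
    for (int k = kLo; k <= kHi; ++k) {
        double re = 0.0, im = 0.0;
        for (int n = 0; n < N; ++n) {
            double w = -2.0 * kPi * k * n / N;
            re += frame[n] * std::cos(w);
            im += frame[n] * std::sin(w);
        }
        double mag = re * re + im * im;
        if (mag > bestMag) { bestMag = mag; bestK = k; }
    }
    return bestK * sampleRate / N;
}

// Jitter = discrete derivative (first difference) of the peak track.
std::vector<double> jitter(const std::vector<double>& peakTrack) {
    std::vector<double> d;
    for (std::size_t i = 1; i < peakTrack.size(); ++i)
        d.push_back(peakTrack[i] - peakTrack[i - 1]);
    return d;
}
```

A stable tone yields a flat peak track and near-zero jitter, while a defective part that wobbles in frequency produces large first differences.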
Results from the Jitter Analyze
previous prediction points: 8192 samples
future prediction points: 1000 samples
number of coefficients: 900 LP coefficients
window shift: 100 samples
lower band border: 3000 Hz
upper band border: 3500 Hz
Advantages and Disadvantages
of the Jitter-Analyze
• Advantages: very precise for signals with a constrained narrow-band character.
• Disadvantages: the signal must be pre-filtered to prevent disruptions. The signal must lie in the selected band. Not so fast.
• Notes: a successful method for the detection of defective motor transmissions.
Three Techniques
LPC Error
Autocorrelation
Jitter Analysis
Example: a defective gear motor
Wavelet Transformation
• The classical FT differs from the WT in the time-localization capability, which arises from the kernel function. Compared with the STFT, the wavelet transformation scales octave-wise in the frequency domain and doubles the time resolution for each octave.
• We currently have a multiresolution version with 4 different base kernels (Daubechies, Coiflet, Beylkin, Vaidyanathan) running. A problem could be finding a proper time start point for a classifier.
Wavelet Package
Decomposition
• Unlike the classical multiresolution wavelet pyramid algorithm, the highpass (detail) results can also be decomposed for the further analysis.
• We avoid the acoustical uncertainty relation!
Magnitude Spectrum
• DC subtraction
• Peak-spectrum extraction
• A real resampler with LP filtering for the demodulation routines
• Power spectrum
• Logarithmized level spectrum
• Own independent implementation of the core FFT algorithm
• Reusable/structured/documented source code
Three Resampling
Algorithms
• Zero-Order-Hold Converter
• Linear Interpolation Converter
• Sinc Bandlimited Interpolator
Zero-Order-Hold Converter
• While upsampling, the interpolated value is equal to the last value; downsampling is comb filtering.
• Poor quality, but blindingly fast.
Linear Converter
• Included/excluded samples are linearly interpolated.
• No antialiasing post-filtering!
• The conversion speed is blindingly fast.
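The linear converter described above can be sketched in a few lines. This is an illustrative minimal version, not the library's resampler; note the slide's own caveat that no antialiasing filter is applied, so downsampling with this sketch will alias.

```cpp
#include <vector>
#include <cmath>

// Hedged sketch of a linear-interpolation sample-rate converter: each
// output sample is linearly interpolated between its two neighboring
// input samples. Names and the boundary handling are assumptions.
std::vector<double> resampleLinear(const std::vector<double>& in,
                                   double srIn, double srOut) {
    std::vector<double> out;
    const double step = srIn / srOut;  // input-index advance per output sample
    for (double pos = 0.0; pos < in.size() - 1; pos += step) {
        std::size_t i = static_cast<std::size_t>(pos);
        double frac = pos - i;
        out.push_back((1.0 - frac) * in[i] + frac * in[i + 1]);
    }
    return out;
}
```

Doubling the rate of the ramp {0, 1, 2, 3} interleaves the midpoints 0.5, 1.5, 2.5, which is the property checked below.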
Sinc Bandlimited Interpolation in Theory
• Perfect reconstruction corresponds to applying a perfect lowpass filter with cutoff at the Nyquist frequency, i.e. it corresponds to convolving with a sinc function.
• The sinc function has a response that extends from −∞ to +∞, so it cannot be used in practice, except for periodic signals.
• Multiplication with a lowpass filter in the frequency domain corresponds to a convolution in the time domain with a sinc function.
Sinc Bandlimited Interpolation
in Practice
• The precision of the convolution rises and falls with the number of convolution coefficients.
• We currently have three filter banks, generated with lengths of 2464, 22438 and 340239 coefficients, for low, medium and high quality.
Antialiasing Filter Results of
the SINC Interpolators
Low quality with 2464 supporting points
Medium quality with 22438 supporting points
High quality with 340239 supporting points
Speed Comparization
SRC TYPE        48000 -> 64000    48000 -> 8000    Factor
ZOH             0.0066            0.0061           163
LINEAR          0.0064            0.0064           156
LOW_SINC        0.12              0.12             8.33
MEDIUM_SINC     0.29              0.29             3.45
HIGH_SINC       1.0               1.0              1.0
Additional Algorithms in the
algorithm Collection
• Own FFT
• Own autocorrelation
• Linear Predictive Coding (LPC spectrum, prediction and prediction error)
• Jitter analysis
• Sample-rate converter
• Lin/log function with scalable reference point and log base (in-place and out-of-place)
• Wave-file writer/reader
• Zero-crossing rate
• Polygonal chain
• Wavelet multiresolution and packet (in preparation) transformation
• Smoothing (boxcar algorithm)
• Normalization (to AVG, RMS and interval)
• Automatic zero-padding to a power-of-two length
• Asynchronous exponential window function
The Alogrithm Collection –
Programming Notes
• All algorithms are implemented platform-independently in C++ with GCC/Make.
• Generic typing (templates) is used where possible.
• Clearly structured, well-documented and reusable code pieces.
• Namespaces clauer::math:: and clauer::io::
Algorithms Borrowed from Natural
Language Processing for the Acoustical
Noise and Function Test Technique
Roadmap:
• Introduction to speech-recognition techniques.
• Speech-recognition algorithms applied to noise and function testing.
• Problems with the resonance analysis.
• Alternatives for the feature extraction.
• Other tools developed.
HMM/GMM Speech Recognition Overview
Feature Extraction and MFCC‘s
• After the power-spectrum estimation of the windowed signal, the logarithmic Mel filter bank best matches the distribution of the cilia in the cochlea.
• The last DCT is done to decorrelate the speech spectrum, to achieve easier post-processing with Gaussian mixtures (more on this later).
• Filter-bank frequencies: [100, 200, 300, 400, 500, 600, 700, 800, 900, 1200, 1500, 1800, 2400, 2900, 2600] Hz
Speech Features
• We have 12 band-energy representations corresponding to the Mel spectrum.
• Adding the summed energy as band C0 gives 13 features.
• Plus the Δ and ΔΔ of this representation.
• This yields a 39-dimensional feature vector.
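The stacking of static, Δ, and ΔΔ features into 39 dimensions can be sketched as below. Real recognizers usually compute deltas with a regression window; the plain frame-to-frame difference here is a simplifying assumption, and the function name is illustrative.

```cpp
#include <vector>

// Hedged sketch: per frame, append the Δ (first difference between
// neighboring frames) and ΔΔ (difference of the deltas) of the 13
// static coefficients, yielding 13 * 3 = 39 dimensions.
std::vector<std::vector<double>>
addDeltas(const std::vector<std::vector<double>>& mfcc) {
    auto diff = [](const std::vector<std::vector<double>>& x) {
        std::vector<std::vector<double>> d(
            x.size(), std::vector<double>(x.empty() ? 0 : x[0].size(), 0.0));
        for (std::size_t t = 1; t < x.size(); ++t)
            for (std::size_t i = 0; i < x[t].size(); ++i)
                d[t][i] = x[t][i] - x[t - 1][i];
        return d;
    };
    auto d1 = diff(mfcc);
    auto d2 = diff(d1);
    std::vector<std::vector<double>> out;
    for (std::size_t t = 0; t < mfcc.size(); ++t) {
        std::vector<double> v = mfcc[t];           // 13 static coefficients
        v.insert(v.end(), d1[t].begin(), d1[t].end());  // + 13 deltas
        v.insert(v.end(), d2[t].begin(), d2[t].end());  // + 13 delta-deltas
        out.push_back(v);                          // 39-dimensional vector
    }
    return out;
}
```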
(1) Gaussian Mixture Models (GMM‘s)
• The normal distribution.
• The multivariate normal distribution.
(2) Gaussian Mixture Models (GMM‘s)
• During training we have groups of well-known time windows, and for each a 39-dimensional vector.
• For each group we cluster a Gaussian probability distribution, called a Gaussian mixture model, which consists of a mean vector and a covariance matrix per mixture component.
(3) Gaussian Mixture Models (GMM‘s)
• The need for the last decorrelating DCT in the feature extraction can be seen in the covariance matrix.
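The point about the decorrelating DCT can be made concrete: after decorrelation, the covariance matrix is close to diagonal, so each mixture component only needs the cheap diagonal-covariance density sketched below. This is a generic illustration with assumed names, not the recognizer's code.

```cpp
#include <vector>
#include <cmath>

const double kPi = 3.14159265358979323846;

// Hedged sketch: log-density of a diagonal-covariance Gaussian, the
// per-component building block of a GMM. The diagonal assumption
// (off-diagonal covariance terms ≈ 0) is what the DCT makes reasonable.
double logGaussDiag(const std::vector<double>& x,
                    const std::vector<double>& mean,
                    const std::vector<double>& var) {
    double lp = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        double d = x[i] - mean[i];
        lp += -0.5 * (std::log(2.0 * kPi * var[i]) + d * d / var[i]);
    }
    return lp;
}
```

With a full covariance matrix the density would need a matrix inverse and determinant per component; the diagonal form reduces this to one multiply-add per dimension.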
Hidden Markov Models (HMM‘s)
• The progression of the GMMs over time.
Training
• For the extraction of the transition probabilities of the HMM, the Viterbi/Baum-Welch algorithm is used.
• The extraction of the GMM parameters can be done with the forward-backward algorithm.
• This is the most complex part of the implementation; over 20 years of development have gone into these algorithms.
• The result is the so-called acoustic model.
The Acoustic Model
Decoding/Classification
Language Model
•
• The language model is usually based on so-called statistical N-grams.
• Bigram and trigram statistical models.
• In the case of our simplified model, we have only a dictionary of 2 words (NOK and OK), so we do not need a language model or linguist, because we do not want to detect any concatenation of words into sentences.
Implementation
• Two fully implemented recognizers are available: HTK (Cambridge University; the license forbids redistribution) and Sphinx (Carnegie Mellon University; redistribution for commercial purposes is allowed).
• HTK has a cleaner structure and is easier to modify for my purposes, but the license does not match our needs.
• Sphinx was originally developed in (D)ARPA-funded research and is available in 4 versions: Sphinx 1/2/3/4.
• We use the newer versions, 3 and 4.
Sphinx III
• Programmed in C.
• 5-10× real-time processing.
• 3- or 5-state continuous HMM topologies.
• Live and batch/file operation.
• Statistical and binary models.
• FST decoding, re-factoring, re-architecting.
Sphinx IV
• Programmed in Java.
• Faster than Sphinx III.
• Continuous and semi-continuous density acoustic models with an arbitrary number of states.
• Bigram, trigram or finite-state-grammar language model.
• Fast Viterbi search.
Sphinx Train
• The Sphinx 3/4 decoders have no trainer included.
• For Sphinx 3 the SphinxTrain package is available; it allows the training of acoustic models with the Baum-Welch algorithm.
• Sphinx 3 uses the same feature extraction as the SphinxTrain package.
• The Sphinx 4 trainer is not finished so far, but there is a wrapper for Sphinx-3 models trained with the SphinxTrain package.
Research-Recognizer
• Our research prototype is implemented on MS Windows with CYGWIN; it should run on other systems too.
• Programming languages: Python, Perl, C++, Bash, Ruby…
• The whole recognizer is a script-controlled collection of over 200 small command-line programs.
• The trainer and the decoder are separate programs.
• The whole recognizer environment has a size of 700 MB.
• We use phoneme models instead of word models!
Research-Recognizer
• Phoneme models instead of word models.
• Acoustic-model training with 3-state HMMs and the standard MFCCs with 39 features ((12+1) × 3).
• Use of the Sphinx 4 (Java) and Sphinx 3 (C) decoders.
Live Demo and Paper
IO
NIO
Results
• The spectrum of the asparagus matches the Mel scale very well.
• The length of the input samples (50 ms) matches the 3-state HMM phoneme.
Applied to Acoustical
Noise Testing
Our research recognition system works very well for the problems in acoustical noise testing where the spectral distribution of the impulse responses matches the MFCC spectral distribution, for example the impulse response of asparagus.
Problems
• In the acoustical resonance analysis we have to concentrate only on small regions of interest.
• We run into a problem with the frequency-time resolution, because for the feature extraction the time window is only 10 ms long. With the classical Fourier transformation we run into problems with the uncertainty principle.
• We have to modify the feature extraction, because in the acoustical resonance analysis we have to exchange the MFCCs for our special-case features, i.e. build our own feature extraction.
Acoustic Uncertainty Principle Example
1. ws = 10 ms, sr = 8000 Hz, nf = 4000 Hz → 80 samples → 40 spectral points → Δf = 4000 Hz / 40 = 100 Hz
2. ws = 10 ms, sr = 50000 Hz, nf = 25000 Hz → 500 samples → 250 spectral points → Δf = 25000 Hz / 250 = 100 Hz
Modification
Sphinx Train
• In the case of the resonance analysis we need to modify the feature extraction, because unlike human voice recordings we have only a few samples from the impulse responses.
• In the case of the Sphinx 4 decoder we have to modify the feature extraction at two different places, because of the lack of a Sphinx 4 trainer.
Avoid the Uncertainty Principle
• We need an alternative to the Fourier transformation to avoid the uncertainty principle, because we have only a few samples.
• Two possible methods to avoid the uncertainty principle:
1.) The wavelet packet decomposition
2.) The Wigner-Ville distribution
Wigner-Ville Distribution
• Time-frequency distribution of a signal with very high time and frequency resolution.
• The Wigner-Ville transformation came from physics in 1932, to add quantum corrections to classical mechanics.
X_WVD(t, ω) = ∫_{−∞}^{+∞} s(t + τ/2) · s*(t − τ/2) · e^{−jωτ} dτ

X_WVD[n, m] = 2 · Σ_{k=−N/2}^{N/2−1} s[n+k] · s*[n−k] · e^{−j4πmk/N}
Pseudo-Smoothed Wigner-Ville Distribution
PSWVD Implementation
DISLIN Data-Plotter-Tool
• A data-visualization programming library from the MPI in Lindau, for almost any language (C/C++, Fortran, Java, Perl, Python…).
• Prints to the screen, to the printer and to any image format (PNG, GIF, WMF, JPEG, BMP, XFIG).
• Commercial license: 120 €.
Data Plotter
Contour Plot
Waterfall Plot
Wavelet Transformation
• The difference between the wavelet transformation and the classical FT is the localization, which arises from a different kernel function. Compared with the STFT, the wavelet transformation scales the time resolution automatically, depending on the frequency.
• We currently have a multiresolution version with 4 different base kernels (Daubechies, Coiflet, Beylkin, Vaidyanathan), but currently no application for the wavelet transformation, because we need a time calibration/reference point to be able to build a classifier.
Wavelet Packet
Decomposition
• The DWT is based on orthogonal filter banks, where the lowpass result is used to calculate the next wavelet octave.
• We have a multiresolution pyramid implementation.
Wavelet Packet
Decomposition
Conclusion
Development Roadmap
• Nice to have: a fully automatic classification system based on language technology.
• A need for new algorithms, because of the constraints of the Fourier transformation.
• An alternative to the mechanical impulse-response extraction, e.g. for the asparagus project → the sine-sweep technique.
Basic Building Blocks
Damping Factor
• Calculation of the envelope via the Hilbert transformation.
• Extraction of the damping constant via exponential regression.
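The exponential-regression step can be sketched as ordinary linear regression on the logarithm of the envelope, since env(t) ≈ A · e^{−λt} implies log(env) = log(A) − λt. The Hilbert-transform envelope is assumed to be computed already; function and variable names are illustrative.

```cpp
#include <vector>
#include <cmath>

// Hedged sketch: fit env(t) ≈ A * exp(-lambda * t) by linear least
// squares on y = log(env) and return the damping constant lambda.
double dampingConstant(const std::vector<double>& env, double dt) {
    double n = static_cast<double>(env.size());
    double st = 0.0, sy = 0.0, stt = 0.0, sty = 0.0;
    for (std::size_t i = 0; i < env.size(); ++i) {
        double t = i * dt;
        double y = std::log(env[i]);   // assumes a strictly positive envelope
        st += t; sy += y; stt += t * t; sty += t * y;
    }
    double slope = (n * sty - st * sy) / (n * stt - st * st);
    return -slope;                     // lambda = negative slope of log(env)
}
```

A strongly damped impulse response yields a large λ; comparing λ against a reference band is one plausible pass/fail criterion for the resonance test.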
Sine Sweep
• The impulse response can also be extracted via the recorded noise response.
• With a sine sweep as the actuator signal and a special deconvolution, the nonlinearities can be separated.
Thank You
Q&A