Facial Image Retrieval Through Compound Queries Using

Md. Zia Uddin
Bio-Imaging Lab, Department of Biomedical Engineering
Kyung Hee University
Previous Lecture
FIR & IIR
Fourier-Based Filter
Bipolar Filtering
Common Average Reference
Laplace Filtering
PCA
ICA
Spatial Filter: Common Spatial Pattern
First, the normalized covariance matrix of each single-trial raw EEG X is
determined as
R = X X' / trace(X X')
where X is an N x T matrix, with N the number of channels and T the number of
samples in the time interval of interest.
The covariance matrices averaged over the trials of class a and class b (Ra and Rb) are
summed to produce a composite covariance matrix Rc = Ra + Rb. The eigenvectors Bc
and eigenvalues λ of this composite covariance matrix yield the whitening transform
P = sqrt(λ^-1) Bc'
Now, when Ra and Rb are transformed by
Sa = P Ra P'  and  Sb = P Rb P'
then Sa and Sb share the same eigenvectors, such that Sa = U Da U' and Sb = U Db U', where
U holds the common orthonormal eigenvectors of Sa and Sb.
Da and Db are the corresponding diagonal matrices of eigenvalues, whose entries pairwise sum up to 1.
Spatial Filter: Common Spatial Pattern (2)
So, U'SaU = Da, U'SbU = Db, U'(Sa + Sb)U = I and Da + Db = I.
Assume the eigenvectors in U are sorted in descending order with respect to the
eigenvalues in Da = (Da,1, Da,2, ..., Da,N), i.e. Da,1 >= Da,2 >= ... >= Da,N (and hence in
ascending order with respect to Db = (Db,1, Db,2, ..., Db,N), i.e. Db,1 <= Db,2 <= ... <= Db,N).
Now, when class a and class b are both projected onto the first eigenvector U1, class a
yields the maximal variance and class b the minimal variance.
Conversely, when the classes are projected onto the last eigenvector UN, class a yields
the minimal variance and class b the maximal variance.
In general, the projections onto the eigenvectors Um and U(N-m+1) yield the m-th largest
variance for class a and for class b, respectively, while yielding a correspondingly small
variance for the other class. Projecting data samples onto these first and last directions is
therefore optimal for discriminating the two classes; a minimal sketch of the computation follows below.
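A minimal numpy/scipy sketch of this CSP computation (names such as csp_filters, trials_a, trials_b are illustrative, not from the lecture); it assumes each trial is an N x T array and returns 2m spatial filters:

    import numpy as np
    from scipy.linalg import eigh

    def csp_filters(trials_a, trials_b, m=2):
        # Average the trace-normalized covariance matrices of each class.
        def avg_cov(trials):
            covs = [X @ X.T / np.trace(X @ X.T) for X in trials]
            return np.mean(covs, axis=0)
        Ra, Rb = avg_cov(trials_a), avg_cov(trials_b)
        # Whitening followed by joint diagonalization is equivalent to the
        # generalized eigenvalue problem  Ra w = lambda (Ra + Rb) w.
        evals, evecs = eigh(Ra, Ra + Rb)
        W = evecs[:, np.argsort(evals)[::-1]].T      # filters sorted by descending eigenvalue
        return np.vstack([W[:m], W[-m:]])            # first m and last m filters

The log-variances of a filtered trial, np.log(np.var(W @ X, axis=1)), are the features typically passed on to a classifier.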
Discriminability of features: Fisher Score
Given data with class labels, for example two classes, the Fisher score of a single
feature becomes
S = |mean1 - mean2| / (std1 + std2)
where mean_i and std_i are the mean and standard deviation of that feature in class i.
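As a small sketch, the score above can be computed for one feature and two classes as follows (x1 and x2 are illustrative names for the feature values of each class):

    import numpy as np

    def fisher_score(x1, x2):
        # |mean1 - mean2| / (std1 + std2)
        return abs(np.mean(x1) - np.mean(x2)) / (np.std(x1) + np.std(x2))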
Discriminant Analysis
The purpose of discriminant analysis is to assign objects to one of several (K) groups
based on a set of measurements X = (X1, X2, ..., Xp) which are obtained from each
object.
Each object is assumed to be a member of one (and only one) group k, 1 <= k <= K.
An error is incurred if the object is assigned to the wrong group.
The measurements of all objects of one class k are characterized by a probability density fk(X).
We would like to find a rule that decides, for every object, to which class it belongs.
Example
A group of people consists of male and female persons → K = 2
From each person the data of their weight and height are collected → p = 2
The gender is unknown in the data set
We try to classify the gender of each person from the weight and height
→ discriminant analysis
A classification rule (discriminant function) is needed to choose the group for each
person
Quadratic Discriminant Analysis
Given is a set of observation vectors x of an event, each of which has a known class y.
For a new observation vector, the class y is to be determined; the decision rule may be
assumed to be quadratic in the measurements.
In quadratic discriminant analysis (QDA) it is assumed that there are only two classes of
points (so y ∈ {0,1}) and that the measurements of each class are normally distributed.
Suppose the means of the two classes are known to be μ0 and μ1 and the covariances Σ0 and
Σ1. Then the likelihood ratio is given by
likelihood ratio = [|2π Σ1|^(-1/2) exp(-1/2 (x - μ1)' Σ1^-1 (x - μ1))] / [|2π Σ0|^(-1/2) exp(-1/2 (x - μ0)' Σ0^-1 (x - μ0))]
and the decision is made by comparing it to some threshold t.
After some rearrangement, it can be shown that the resulting separating surface between
the classes is a quadratic.
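A small numpy sketch of this decision rule, comparing the log of the likelihood ratio to log(t); the shared constant of the Gaussian densities cancels in the ratio (function and argument names are illustrative):

    import numpy as np

    def qda_log_likelihood_ratio(x, mu0, cov0, mu1, cov1):
        # Log Gaussian density up to the constant that cancels in the ratio.
        def log_gauss(x, mu, cov):
            d = x - mu
            _, logdet = np.linalg.slogdet(cov)
            return -0.5 * (logdet + d @ np.linalg.solve(cov, d))
        # Predict class 1 when this value exceeds log(t).
        return log_gauss(x, mu1, cov1) - log_gauss(x, mu0, cov0)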
Linear Discriminant Analysis
LDA approaches the problem by assuming that the conditional probability density
functions p(x | y = 0) and p(x | y = 1)
are both normally distributed, with means and covariances (μ0, Σ0) and (μ1, Σ1).
Under this assumption, the Bayes optimal solution is to predict points as being from the
second class if the log-likelihood ratio exceeds some threshold T, so that
(x - μ0)' Σ0^-1 (x - μ0) + ln|Σ0| - (x - μ1)' Σ1^-1 (x - μ1) - ln|Σ1| > T
Without any further assumptions, the resulting classifier is referred to as QDA (quadratic
discriminant analysis).
LDA additionally makes the simplifying homoscedastic assumption (i.e. that the class covariances
are identical, so Σ0 = Σ1 = Σ). In this case, several terms cancel and the decision
criterion becomes a threshold on the linear combination w · x, with w = Σ^-1 (μ1 - μ0).
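Under the shared-covariance assumption the rule therefore reduces to a linear score; a minimal sketch (equal class priors assumed, names illustrative):

    import numpy as np

    def lda_score(x, mu0, mu1, cov):
        w = np.linalg.solve(cov, mu1 - mu0)     # w = Sigma^-1 (mu1 - mu0)
        b = -0.5 * (mu0 + mu1) @ w              # midpoint threshold for equal priors
        return x @ w + b                        # predict class 1 if positive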
Regularized Linear Discriminant Analysis
The covariance matrices are controlled by two parameters λ and γ.
If λ = 0, class-specific covariances are kept (QDA).
If λ = 1, the pooled covariance is used (LDA).
γ shrinks the covariance eigenvalues: it decreases the larger eigenvalues and increases the
smaller ones, until at γ = 1 all eigenvalues are equal.
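One common parameterization consistent with this description (a Friedman-style regularization; treat it as an assumed formulation rather than the lecture's exact one):

    import numpy as np

    def regularized_cov(class_cov, pooled_cov, lam, gamma):
        # lam blends the class covariance with the pooled covariance (lam = 1 -> LDA-like).
        cov = (1 - lam) * class_cov + lam * pooled_cov
        d = cov.shape[0]
        # gamma shrinks the eigenvalues toward their mean; gamma = 1 gives a spherical covariance.
        return (1 - gamma) * cov + gamma * (np.trace(cov) / d) * np.eye(d)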
Fisher's Linear Discriminant Analysis
Two scatter matrices are defined:
Between-class scatter matrix: SB = Σk Nk (μk - μ)(μk - μ)'
Within-class scatter matrix: SW = Σk Σ(x in class k) (x - μk)(x - μk)'
The goal is to maximize the ratio between SB and SW.
Thus the main objective is to maximize
J(w) = (w' SB w) / (w' SW w)
It can be solved as a generalized eigenvalue problem, SB w = λ SW w,
where w and λ are the eigenvectors and eigenvalues of SW^-1 SB, respectively.
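A sketch of the whole procedure with numpy/scipy, solving SB w = λ SW w directly (names illustrative; X is an n_samples x n_features array, y the label vector, and SW is assumed invertible):

    import numpy as np
    from scipy.linalg import eigh

    def fisher_lda(X, y):
        mu = X.mean(axis=0)
        d = X.shape[1]
        SB, SW = np.zeros((d, d)), np.zeros((d, d))
        for c in np.unique(y):
            Xc = X[y == c]
            mc = Xc.mean(axis=0)
            SB += len(Xc) * np.outer(mc - mu, mc - mu)   # between-class scatter
            SW += (Xc - mc).T @ (Xc - mc)                # within-class scatter
        evals, evecs = eigh(SB, SW)                      # generalized eigenvalue problem
        return evecs[:, np.argsort(evals)[::-1]]         # directions sorted by discriminability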
Nearest Neighbor classifiers: k Nearest Neighbors
The aim of this technique is to assign to an unseen point the dominant class
among its k nearest neighbors within the training set.
For BCI, these nearest neighbors are usually obtained using a distance
metric. kNN algorithms are not very popular in the BCI community,
probably because
they are known to be very sensitive to the curse of dimensionality, which
made them fail in several BCI experiments.
However, when used in BCI systems with low-dimensional feature vectors,
kNN may prove to be efficient.
Two famous k-NN algorithms
K-means
LBG
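A minimal kNN sketch for such low-dimensional feature vectors, using the Euclidean metric as a common (assumed) choice; names are illustrative:

    import numpy as np

    def knn_predict(x, X_train, y_train, k=3):
        dists = np.linalg.norm(X_train - x, axis=1)          # distance to every training point
        nearest = y_train[np.argsort(dists)[:k]]             # labels of the k nearest neighbors
        labels, counts = np.unique(nearest, return_counts=True)
        return labels[np.argmax(counts)]                     # dominant class wins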
Mahalanobis Distance-Based Classifiers
Mahalanobis distance based classifiers assume a Gaussian distribution
N(μc,Mc) for each prototype of the class c.
Then, a feature vector x is assigned to the class that corresponds to the
nearest prototype, according to the so-called Mahalanobis distance
dc(x) = sqrt((x - μc)' Mc^-1 (x - μc))
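A short sketch of this classifier; prototypes is assumed to be a list of (mu_c, M_c) pairs (names illustrative):

    import numpy as np

    def mahalanobis_classify(x, prototypes):
        dists = [np.sqrt((x - mu) @ np.linalg.solve(M, x - mu)) for mu, M in prototypes]
        return int(np.argmin(dists))     # index of the nearest prototype/class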
SVM
An SVM also uses a discriminant hyperplane to identify classes.
However, for an SVM the selected hyperplane is the one that
maximizes the margin, i.e., the distance to the nearest training points.
Maximizing the margin is known to increase the generalization capabilities.
An SVM uses a regularization parameter C that enables accommodation of
outliers and allows errors on the training set.
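In practice this is rarely implemented by hand; a minimal scikit-learn sketch (X_train and y_train are assumed feature vectors and labels, not data from the lecture):

    from sklearn.svm import SVC

    def train_svm(X_train, y_train, C=1.0):
        # C trades margin width against errors on the training set.
        clf = SVC(kernel="linear", C=C)
        clf.fit(X_train, y_train)
        return clf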
MultiLayer Perceptron
An MLP is composed of several layers of neurons: an input layer, possibly one
or several hidden layers, and an output layer.
Each neuron's input is connected with the output of the previous layer's
neurons whereas the neurons of the output layer determine the class of the
input feature vector.
A MultiLayer Perceptron without hidden layers is known as a perceptron.
Interestingly enough, a perceptron is equivalent to LDA and, as such, has
sometimes been used for BCI applications.
Multi-Layer-Perceptron structure
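A minimal scikit-learn sketch of such a network (the single hidden layer of 20 units is an arbitrary illustrative choice; X_train, y_train, X_test are assumed given):

    from sklearn.neural_network import MLPClassifier

    # One hidden layer; with no hidden layer the model reduces to a perceptron-like linear classifier.
    mlp = MLPClassifier(hidden_layer_sizes=(20,), max_iter=1000)
    # mlp.fit(X_train, y_train); predictions = mlp.predict(X_test)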
Other NN architectures
Learning Vector Quantization (LVQ) Neural Network
Fuzzy ARTMAP Neural Network
Finite Impulse Response Neural Network (FIRNN)
Time-Delay Neural Network (TDNN)
Gamma dynamic Neural Network (GDNN)
RBF Neural Network
Bayesian Logistic Regression Neural Network (BLRNN)
Hidden Markov Model (1)
Hidden Markov Models (HMM) are popular dynamic classifiers in the field of
speech recognition.
A Hidden Markov Model (HMM) is a statistical model in which the system being
modeled is assumed to be a Markov process with unknown parameters, and the
challenge is to determine the hidden parameters from the observable
parameters.
An HMM is a kind of probabilistic automaton that can provide the probability of
observing a given sequence of feature vectors.
Each state of the automaton can model the probability of observing a given
feature vector.
HMMs are well suited to the classification of time series. As the
EEG components used to drive BCIs have specific time courses, HMMs have been
applied to the classification of temporal sequences of BCI features.
Hidden Markov Model (2)
HMM parameters
A generic HMM can be expressed as λ={S,π,A,B} where
S denotes possible states
π the initial probability of the states
A the transition probability matrix between hidden states
B the observation symbol probability distribution of every state.
HMM for T observations
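A sketch of the usual classify-by-likelihood scheme with one HMM per class, here using Gaussian emissions from the hmmlearn library (the lecture's discrete observation symbols would need a discrete emission model instead; all names are illustrative):

    import numpy as np
    from hmmlearn.hmm import GaussianHMM

    def train_class_hmms(sequences_per_class, n_states=2):
        # sequences_per_class: {label: list of (T_i x d) feature sequences}
        models = {}
        for label, seqs in sequences_per_class.items():
            X = np.vstack(seqs)
            lengths = [len(s) for s in seqs]
            models[label] = GaussianHMM(n_components=n_states).fit(X, lengths)
        return models

    def classify(models, seq):
        # Assign the class whose HMM gives the highest log-likelihood for the sequence.
        return max(models, key=lambda label: models[label].score(seq))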
HMM-based Mental Task Classification (1)
imagery left or right hand movements
The task was to control a feedback bar by imagining left or right hand
movements according to the cue shown to the subject.
Several studies show that for imagined hand movements, 2-3 channels of data
over the motor cortex are enough to be processed.
Data is collected from 3 channels C3, C4, and Cz.
Sampling frequency is 128Hz
Filtered between 0.5 and 30 Hz.
140 trials of 9 second length.
In each trial,
the first 2s was quiet
at t = 2s an acoustic stimulus indicated the beginning of the trial, and
a cross ‘+’ was displayed for 1s; then at t = 3s an arrow (left or right)
was displayed as the cue.
At the same time the subject was asked to move the bar in the
direction of the cue. Therefore, only the period between t = 4s and t =
9s of each trial was considered for the classification study.
HMM-based Mental Task Classification (2)
imagery left or right hand movements
100 randomly chosen trials were used as the training set and 40 trials as the test set.
This process was repeated 20 times on randomly separated training and
test data, and finally the results of all these runs were averaged to
give the total classification accuracy percentage.
The best classification accuracy, 77.13%, was obtained by the classifier with 2 states
and 16 observable symbols per state on the second 0.5s segment of data.
Reference: S. Solhjoo, A. M. Nasrabadi, and M. R. H. Golpayegani. Classification of chaotic signals using HMM classifiers: EEG-based mental task classification. In Proceedings of the European Signal Processing Conference, 2005.
PCA+HMM+SVM FOR EEG PATTERN CLASSIFICATION (1)
imagery left or right hand movements
First Approach
Principal component features extracted separately from C3 and C4 channels are
concatenated
Then they are fed into the corresponding HMM (which models either left-movement or
right-movement) for training.
Accuracy: 75.70%
PCA+HMM+SVM FOR EEG PATTERN CLASSIFICATION (2)
imagery left or right hand movements
Second Approach
Principal component features from each channel (C3 and C4) are fed into two HMMs
separately, which results in four HMMs in total:
HMMC3L
HMMC4L
HMMC3R
HMMC4R
The SVM is employed to make a final decision from the likelihood scores computed by
HMMs.
Accuracy: 78.15%
Reference: H. Lee and S. Choi. PCA+HMM+SVM for EEG pattern classification. In Proceedings of the
Seventh International Symposium on Signal Processing and Its Applications, 2003.
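A sketch of how this second approach could be wired together, assuming the four HMMs have already been trained (e.g. as in the earlier HMM sketch) and that trial_c3 / trial_c4 hold one trial's principal-component feature sequences; all names here are illustrative, not from the paper:

    import numpy as np

    def likelihood_features(trial_c3, trial_c4, hmms):
        # hmms: dict of trained models keyed 'C3L', 'C4L', 'C3R', 'C4R'.
        return np.array([hmms["C3L"].score(trial_c3), hmms["C4L"].score(trial_c4),
                         hmms["C3R"].score(trial_c3), hmms["C4R"].score(trial_c4)])

    # The SVM then makes the final decision from these 4-dimensional score vectors, e.g.
    # from sklearn.svm import SVC; svm = SVC().fit(np.vstack(train_scores), train_labels)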
Classifying EEG signals based HMM-AR (2)
ICA+AR+HMM for imagery left or right hand movements
AR Model
An AR model is a linear predictor used for modeling time series: each sample is
predicted from a weighted combination of the previous samples.
The so-called Kalman-filter AR model estimates these coefficients adaptively,
using a state that is the vector of AR coefficients.
Basic Steps
Consider the 8 Hz to 30 Hz frequency band
Apply ICA to C3 and C4 and take the first independent component as the source
Extract AR coefficients from the component (see the sketch after the reference below)
Train and test HMMs on the two actions
Reference: Tang Yan, Tang Jingtian, Gong Andong, and Wang Wei. Classifying EEG signals based HMM-AR. In
Proceedings of The 2nd International Conference on Bioinformatics and Biomedical Engineering, pp. 2111-2114 ,
2008.
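A least-squares sketch of the AR step referenced above (a plain batch fit rather than the Kalman-filter variant; names illustrative):

    import numpy as np

    def ar_coefficients(signal, order=6):
        # Predict x[t] from the previous `order` samples; the weights are the AR coefficients.
        X = np.column_stack([signal[order - k - 1: len(signal) - k - 1] for k in range(order)])
        y = signal[order:]
        coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
        return coeffs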
Conclusion

Some basic signal processing and classification techniques for BCI
have been discussed here.
Thank you