
Signature recognition
Dr. Marina Gavrilova
Topics

- Definitions
- Feature extraction
- Signature Forgery
- Signature models
- Advantages and disadvantages of signature as biometrics
Signatures

- Off-line or static signatures are scanned from paper documents, where they were written in the conventional way. Off-line signature analysis can be carried out with a scanned image of the signature using a standard camera or scanner.
- On-line or dynamic signatures are written with an electronically instrumented device, and the dynamic information (pen tip location through time) is usually available at high resolution, even when the pen is not in contact with the paper.
PDA
Super PDA

The Delivery Information Acquisition Device (that Brown tablet you sign when receiving a package) is one hell of a PDA. The fourth version can handle personal (Bluetooth), local (802.11b), and wide (GPRS or CDMA) area wireless networking. It also has a dialup modem, an infrared scanner, GPS capability, 128 Mbytes of RAM, and a color screen. All UPS drivers carry the device for capturing and transmitting parcel pickup and delivery data.
Signature in Identity Documents

Digitizing Tablet

Pen-Tablet System (WACOM)

Pen-Tablet System – How It Works?


The WACOM stylus looks and feels like a pen yet contains no batteries or magnets. Instead it takes advantage of electromagnetic resonance technology, in which radio waves are sent to the stylus and returned for position analysis.

In operation, a grid of wires below the screen alternates between transmit and receive modes (about every 20 μs):

- In transmit mode, the electromagnetic signal stimulates oscillation in the coil-and-capacitor resonant circuit in the pen.
- In receive mode, the energy of the resonant circuit oscillation in the pen is detected by the antenna grid. This is then analysed to determine position and other information, including pressure.

Since the grid provides the power to the pen through resonant coupling, no batteries are required. Thus there are no consumables that will run down and need to be replaced or that would make the pen top-heavy.
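As a brief aside (my addition, not on the original slide): the coil-and-capacitor circuit in the pen is an ordinary LC resonator, so its resonant frequency follows the standard relation

$$ f_0 = \frac{1}{2\pi\sqrt{LC}} $$

where L is the coil inductance and C is the capacitance; the tablet's transmit signal is tuned near this frequency so that the pen's circuit rings strongly and can be detected during the receive phase.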
Automatic identification

Traditional means of automatic identification:

- Possession-based (credit card, smart card): use "something that you have"
- Knowledge-based (password, PIN): use "something that you know"
- Biometrics-based (biometric identifier): rely on "what you are"

Signature is inherently a combination of knowledge and biometrics; the knowledge component (what is written and how it is written) can be chosen, and indeed changed, by the user.
Pre-processing

- Smoothing: the input signal from a digitizing pen can be very jagged. The pen used can affect the smoothness and the size of the signature.
- Segmentation: determination of the beginning and ending of signing.
  - Signature beginning: first sample where the pressure information is not null (first pen-down).
  - Signature ending: last pen-up. Because a few pen-ups can occur within the signature, we have to establish a maximum pen-up duration (e.g. 3 s).
Pre-processing

- Initial point alignment: all the signatures have to be aligned with respect to the initial point (e.g. the coordinate origin) to make the information independent of the position on the tablet (a minimal sketch of these pre-processing steps is given below).
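A minimal Python sketch of the segmentation and initial-point-alignment steps described above, assuming the samples arrive as arrays of x, y, pressure and time values (the array names and the 3 s pen-up limit are illustrative, not taken from the lecture):

```python
import numpy as np

def preprocess(x, y, p, t, max_penup_s=3.0):
    """Segment a signing record and align it to the origin.

    x, y : pen coordinates per sample
    p    : pen pressure per sample (0 means pen-up)
    t    : timestamps in seconds
    """
    x, y, p, t = map(np.asarray, (x, y, p, t))

    # Signature beginning: first sample with non-null pressure (first pen-down).
    down = np.flatnonzero(p > 0)
    if down.size == 0:
        raise ValueError("no pen-down samples found")
    start = down[0]

    # Signature ending: last pen-up, but stop earlier if a pen-up gap
    # exceeds the maximum allowed pen-up duration (e.g. 3 s).
    end = down[-1]
    for a, b in zip(down[:-1], down[1:]):
        if t[b] - t[a] > max_penup_s:
            end = a
            break

    xs, ys, ps = x[start:end + 1], y[start:end + 1], p[start:end + 1]

    # Initial point alignment: translate so the first point is the origin,
    # making the data independent of where the signature sat on the tablet.
    return xs - xs[0], ys - ys[0], ps
```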
Principal structure of signature recognition systems

Dynamic Signature Feature Vectors – Template

- acquisition area: 127 × 106 mm
- pressure levels: 1024
- resolution: 2540 lines/inch (100 lines/mm)
- precision: ±0.25 mm
- detection height: 10 mm
- sampling frequency: 100 pps (points per second)
Local and Global Features

Local features:

- x, y coordinates
- Velocity (v)
- Acceleration (A)
- Trajectory angle (Θ)
- Azimuth
- Elevation
- First derivative of a feature (Δ)
- Second derivative of a feature (ΔΔ)
Local and Global Features

Global features:

- Signature length, height, weight
- Total signature time
- Total pen-down time
- Total pen-up time
- Average velocity
- Maximum velocity
- Minimum velocity
- etc.
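As an illustrative sketch (my own helper, not part of the lecture), a few of these global features could be computed from one pre-processed signature as follows; the array names and dictionary keys are assumptions:

```python
import numpy as np

def global_features(x, y, p, t):
    """Compute a few global features from one pre-processed signature
    given per-sample x, y, pressure and timestamps (seconds)."""
    x, y, p, t = map(np.asarray, (x, y, p, t))
    dt = np.diff(t)
    dist = np.hypot(np.diff(x), np.diff(y))
    v = dist / np.where(dt > 0, dt, np.nan)   # instantaneous velocity per segment

    pen_down = p[:-1] > 0                     # segment counted as pen-down if pressure > 0
    return {
        "width": x.max() - x.min(),
        "height": y.max() - y.min(),
        "total_time": t[-1] - t[0],
        "pen_down_time": dt[pen_down].sum(),
        "pen_up_time": dt[~pen_down].sum(),
        "avg_velocity": np.nanmean(v),
        "max_velocity": np.nanmax(v),
        "min_velocity": np.nanmin(v),
    }
```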
Examples of local features

Derivative functions are approximated by second-order regressions instead of differences between consecutive samples. The general expression of the order-N regression, calculated at instant t for parameter q, is given below; velocity and acceleration for parameter q at instant t can then be computed from it.
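The regression formula itself did not survive in the transcript. A standard form of the order-N regression used for such delta features (my reconstruction of the missing equation, common in on-line signature and speech front-ends, not necessarily identical to the original slide) is

$$
\Delta q(t) \;=\; \frac{\sum_{i=1}^{N} i\,\bigl(q(t+i)-q(t-i)\bigr)}{2\sum_{i=1}^{N} i^{2}} .
$$

Under this definition the velocity of parameter q is v_q(t) = Δq(t), and the acceleration is obtained by applying the same regression to the velocity, a_q(t) = ΔΔq(t).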
Examples of local features (figure: genuine signatures vs. forgery signatures)

Signatures: authentic and skilled forgery (figure)

Features (figure)
Models in Signature Recognition

Statistical methods:
- Gaussian Mixture Model
- Hidden Markov Model

Deterministic methods:
- Dynamic Time Warping
- Vector Quantization
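The slides only name these methods; as an illustration (my own sketch, not from the lecture), Dynamic Time Warping between two signatures represented as sequences of local-feature vectors can be computed like this:

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic Time Warping distance between two feature sequences.

    a, b : arrays of shape (len_a, n_features) and (len_b, n_features),
           e.g. per-sample [x, y, pressure, velocity] vectors.
    """
    a, b = np.asarray(a, float), np.asarray(b, float)
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])   # local distance between samples
            # allow match, insertion, or deletion
            D[i, j] = cost + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    return D[n, m]

# A claimed signature would be accepted if its DTW distance to the
# enrolled reference(s) falls below a user- or system-specific threshold.
```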
Gaussian Mixture Model (GMM)

Univariate Gaussian distribution
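The corresponding formulas are not in the transcript; the standard definitions behind these two slides (general background, not copied from the original) are

$$
\mathcal{N}(x;\mu,\sigma^{2}) = \frac{1}{\sqrt{2\pi\sigma^{2}}}\exp\!\Bigl(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\Bigr),
\qquad
p(x) = \sum_{k=1}^{K} w_{k}\,\mathcal{N}(x;\mu_{k},\sigma_{k}^{2}),\quad \sum_{k=1}^{K} w_{k}=1,
$$

i.e. a GMM is a weighted sum of Gaussian densities, each with its own mean and variance.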
Signature Recognition Advantages

- Signature is a man-made biometric where forgery has been studied extensively
- Enrollment (training) is intuitive and fast
- Signature verification in general has a fast response and low storage requirements
- Signature verification is "independent" of the user's native language
- Very high compression rates do not affect the shape of the signature (100-150 bytes)
Signature Recognition Disadvantages

- There is much precedent for using signatures to authenticate documents, but not for security applications
- A five-dimensional pen may be needed to arrive at the desired accuracy, which makes the hardware costly
- Some people have palsies, while others do not have enough fine motor coordination to write consistently
Voice recognition
Dr. Marina Gavrilova

Topics

- Definitions
- Feature extraction
- Speaker verification methodology
- Advantages and disadvantages of voice as biometrics
Introduction

Early automatic speaker verification and identification in the 1960s saw:
- the emergence of various pattern recognition techniques
- statistical methods
- exploration of features and feature selection methods.

However, theories were not well supported by experimental evidence, and the selected features were insufficient.

With the advent of more powerful computer resources and larger, better-annotated databases such as the Switchboard Corpus, there has been steady progress over the years, using more sophisticated statistical models and training methods.
Speaker Verification Problem

The problem of speaker verification is that of recognizing the identity of speakers from their voices.

The basic assumption is that the voice production apparatus singularly links the acoustic waves and emanations that we interpret as words and sounds to the identity of the speaker.

In order to establish an identity from these acoustic events, we need a model that characterizes a speaker's voice.

We then verify an utterance from an enrolled or target speaker (a speaker for whom we have training data) by evaluating the speech against the speaker model.

If the collected speech data fits the model, then we affirm the speech as having come from our enrollee; if not, we reject the claim.
Speaker Verification Problem

The choice of representation for the speech defines a feature space. The training data fills up the feature space with feature vectors. When we are given a speech utterance that is asserted to be from a particular enrolled speaker, the question becomes that of determining whether the collection of features generated by the utterance in question "matches" the training data.

The speaker verification problem can thus be formulated as deciding which of two hypotheses better explains the test data:
1. The test data matches the training data from the target speaker.
2. The test data more closely matches the data from other speakers.
Speaker Verification Methods

The theoretical justification is a consequence of Bayes' Theorem. That is, if we want to determine whether our observation x was generated by target speaker T, we need to compute P(T|x), the probability of speaker T having produced the utterance that generated the observed features x. From Bayes' Theorem, we can write

P(T|x) = p_T(x) P(T) / p(x)

where p_T(x) is the probability of observing features x given the model for speaker T, P(T) is the prior probability that speaker T has spoken, and p(x) is the probability of observing features x irrespective of who was speaking.
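In practice (a standard consequence of this formula, added here for completeness rather than taken from the slide), the accept/reject decision is usually based on a likelihood ratio between the target model and a background or cohort model:

$$
\Lambda(x) = \frac{p_T(x)}{p_{\bar T}(x)} \;\gtrless\; \theta ,
$$

where the claim is accepted if Λ(x) exceeds the threshold θ and rejected otherwise, and p_T̄(x) denotes the likelihood of the features under the background/cohort model.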
Speaker Verification Methods

The accompanying figure (an illustration of feature space for speaker verification) shows that the target speaker fills up part of the feature space and that other speakers (often called the cohort speakers or the universal background speakers) will overlap the target speaker. The different regions of feature space correspond to different phonetic units, and we may find that two speakers overlap in one part of the space while they do not overlap very much in another.
Threshold Estimates
The setting of the threshold determines the size of the errors that
the verification system will make. There are two types of error to
be concerned with:
1. The error of missing a target, PM, when the target speaker is rejected
incorrectly.
2. The error of false acceptance or false alarm, PFA, when an impostor speaker is
accepted as the target speaker.
The adjustment of the threshold enables the trade-off between the
two types of error. Increasing the threshold will reduce the
frequency of false acceptances by demanding an increase in the
difference in the scores of the target and the alternative models.
Threshold Estimates
The price now paid for such a change in threshold is that it is now
more difficult for acceptance to occur when the target is actually
present, resulting in an increase in the false dismissals.
This trade-off between the two errors is characterized by a ROC (Receiver Operating Characteristic) curve, which is a plot of the probability of correct acceptance, 1 - PM, versus PFA.
Quite often a variant of the ROC is employed: the DET (Detection Error Trade-off) curve. It allows for a more direct comparison of errors by plotting PM versus PFA, on a logarithmic scale for easier reading of the smaller-error region.
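As an illustration (my own sketch, not part of the lecture), PM, PFA and the equal error rate discussed later can be estimated from sets of target and impostor scores by sweeping the decision threshold:

```python
import numpy as np

def det_points(target_scores, impostor_scores):
    """Sweep the decision threshold over all observed scores and return
    (threshold, P_miss, P_fa) arrays; higher score = more target-like."""
    target = np.sort(np.asarray(target_scores, float))
    impostor = np.sort(np.asarray(impostor_scores, float))
    thresholds = np.unique(np.concatenate([target, impostor]))
    p_miss = np.array([(target < th).mean() for th in thresholds])   # rejected targets
    p_fa = np.array([(impostor >= th).mean() for th in thresholds])  # accepted impostors
    return thresholds, p_miss, p_fa

def equal_error_rate(target_scores, impostor_scores):
    """Approximate EER: the operating point where P_miss and P_fa are closest."""
    _, p_miss, p_fa = det_points(target_scores, impostor_scores)
    i = np.argmin(np.abs(p_miss - p_fa))
    return (p_miss[i] + p_fa[i]) / 2.0
```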
Threshold Estimates
Example of a DET curve based on scores from multiple speakers.
Feature selection
It is in the selection of the features that we encounter the
distinctive characteristics of the speech process in defining
this speaker verification pattern recognition problem.
At the heart of the problem is distinguishing one person's speech
production mechanism from another person's.
Given that speech is considered to be a time-varying linear filter
(our vocal tract) operating on signal sources (glottal pulses or
noise generators), a reasonable feature set would characterize
the signal produced by the filter acting on the signal source.
The time-varying character of the speech process is captured by
performing the spectral analysis at short time intervals and
repeating the analysis periodically.
Feature selection

The features used are the Mel-frequency cepstral coefficients, usually referred to as MFCCs. These speech features seem particularly well suited to density-function models such as mixture models or Hidden Markov Models (HMMs).

The Mel frequency scale represents a warping of the frequency band that is linear from 0 to 1 kHz and logarithmic at the higher frequencies, and the analysis is performed over this warped axis. This warping function has its origins in the human perception of tones.

The speech signal is constrained to a bandwidth of about 150-3500 Hz for telephone speech and about 50 Hz to 8 kHz for broadband communications.

The cepstral features are obtained by analyzing the Mel-scaled speech every 0.01 seconds over a window of about 20 milliseconds in duration; this yields the Mel-scale spectrum. Depending on the system, about 12-20 such coefficients are produced every 0.01 seconds. These coefficients, the MFCCs, are used to generate cepstral derivatives by differencing techniques, resulting in about 24 to 40 features, and sometimes second derivatives are added as well.
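A rough sketch of this front-end using the librosa library (library choice is mine, the parameter values mirror the numbers quoted above, and the file name is a placeholder):

```python
import librosa
import numpy as np

# Load speech (placeholder path); sr=8000 roughly matches telephone bandwidth.
y, sr = librosa.load("utterance.wav", sr=8000)

# 20 ms analysis windows every 10 ms, 13 MFCCs per frame.
mfcc = librosa.feature.mfcc(
    y=y, sr=sr, n_mfcc=13,
    n_fft=int(0.020 * sr), hop_length=int(0.010 * sr),
)

# First time derivatives (deltas), giving ~26 features per frame;
# second derivatives could be stacked in the same way.
delta = librosa.feature.delta(mfcc)
features = np.vstack([mfcc, delta]).T   # shape: (n_frames, 26)
```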
Speaker Models
Speaker models are used to generate the scores from which we
will make decisions. As in any pattern recognition problem, the
choices are numerous, and include the likes of neural networks,
Support Vector Machines, Gaussian Mixture Models (GMMs)
and Hidden Markov Models (HMMs), as well as combinations of the
various approaches.
While some models are capable of exploiting dependency in the
sequence of feature vectors from the target speakers, the current
state-of-the-art modeling approaches, based on Gaussian Mixture
Models, treat the sequence of feature vectors as independent
random vectors.
The Gaussian Mixture Model
At the heart of today's state-of-the-art speaker verification systems
is the Gaussian Mixture Model (GMM). This is a probability density
function (PDF) that itself consists of a sum of multivariate Gaussian
density functions. Being a mixture of Gaussians, it has the ability to
place mass where collections of training data exist.
Additionally, its parameters can be readily estimated by Maximum
Likelihood (ML) training procedures, including the EM (Expectation-
Maximization) algorithm, a well-known iterative algorithm for finding
the ML estimate.
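A compact sketch of GMM-based scoring with scikit-learn (library choice and parameter values are mine, not the lecture's), enrolling a target model and scoring a test utterance as an average log-likelihood ratio against a background model:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def enroll(features, n_components=64):
    """Fit a diagonal-covariance GMM to a speaker's MFCC frames
    (features: array of shape (n_frames, n_dims)) via EM."""
    gmm = GaussianMixture(n_components=n_components, covariance_type="diag")
    return gmm.fit(features)

def score(gmm_target, gmm_background, test_features):
    """Average per-frame log-likelihood ratio: target vs. background model."""
    return (gmm_target.score_samples(test_features)
            - gmm_background.score_samples(test_features)).mean()

# Accept the identity claim if score(...) exceeds a tuned threshold.
```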
Speaker Model Training
The process of creating speaker models via adaptation starts with a
generic model for a speaker and uses the data collected from the
target speaker to tailor the generic model to the speaker. If we
consider the generic model as a prior probability distribution we
can view the process of model adaptation in the Bayesian
framework of parameter estimation.
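As a concrete illustration of such Bayesian adaptation (the standard GMM-UBM MAP mean update, stated here from general background rather than taken from the slide), each mixture mean of the generic model can be shifted toward the target speaker's data:

$$
\hat\mu_k = \alpha_k\,E_k(x) + (1-\alpha_k)\,\mu_k,
\qquad
\alpha_k = \frac{n_k}{n_k + r},
$$

where E_k(x) is the mean of the adaptation data softly assigned to component k, n_k is the corresponding effective count, μ_k is the prior (generic model) mean, and r is a fixed relevance factor; components that see little target data stay close to the prior.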
Speaker Model Training
The discriminative training of models is concerned with the
important and relevant question: "How should we select the
parameters of our model or models so as to maximize
performance on our goal of speaker separability?"
In order to accomplish this we need to move away from the ML
criterion and employ as a training criterion the actual loss function
that will be employed in the evaluation of an SV system or some
other criterion that is highly correlated with our performance
metric.
Speaker Model Training
Class conditional models that have discriminative training imposed on them via the
discriminative training criterion form one class of discriminatively trained
models.
The other class is that for which discrimination is inherent in the approach: neural
networks, Support Vector Machines (SVM), and classification trees.
What these approaches have in common is that the training process involves the
separation of two or more classes from each other (e.g. the target speaker from
the non-target speaker).
Model selection
The selection of the appropriate model for a particular application is still an art.
GMMs have shown themselves to be adaptable to a wide variety of situations and
should be one of the contenders in almost any application.
However, any of the discriminative approaches, such as neural networks, offer the
possibility of being more effective in situations with limited training data, since
they use all the training data to find the decision boundary between classes
rather than modeling classes separately and then combining the separate
models.
Performance measures
We have already noted that we can measure the performance of a
speaker verification system by its probability of a false dismissal
versus the probability of false acceptance at a given threshold
setting.
As we change the system's detection threshold,
we end up with a collection of values that give the DET (Detection
Error Trade-off) curve. While this is a fairly straightforward
concept, how one should proceed from the collection of scores for
individuals to the system DET curve is not uniquely defined.
Performance measures
For any individual, since there are only a few target-speaker scores, a DET curve will be highly discontinuous. The figure (a DET curve for a single target speaker) is based on a small number of target test samples and significantly more impostor scores.

One approach, used by the National Institute of Standards and Technology, is to treat all the individual scores from the individual speakers as if they were from one speaker and create a composite DET curve. This requires that great emphasis be placed on normalizing scores, in order to avoid severe degradations in measured performance due to the incommensurate scores from different target speakers.
Example of a DET curve for a single speaker.
Performance measures
While the DET or ROC curves provide a great deal of information
about SV system performance, it is often desirable to characterize
the performance of a system by a single number in order to
facilitate system comparisons. The number often employed is the
equal error rate (EER), the point on the DET or ROC curve where the
probability of a miss is equal to the probability of false acceptance.
The factors affecting performance include the amounts of training
data, the duration of the evaluation data, the variability and types
of channels used, etc.
Alternative approaches
The approaches that we have considered thus far have treated the data extracted from
the speech as independent events in time, and have not exploited any of the higher
level information in the speech process, relying primarily on the general
characteristics of an individual's vocal production mechanism.
In contrast, the Dragon Systems approach focused on the recognizer's ability to provide higher-level acoustic knowledge (context-independent phonetic units).

Words (and phonetic units) count: the natural extension of the preceding work was to combine language information with acoustic information.

Models exploring the shape of feature space: this approach ignores the fine structure of the speech, working with the shape that a speaker's features form in feature space.
Voice Recognition Advantages

- Voice is a physiological and behavioral characteristic where forgery has been studied extensively
- Enrollment (training) is relatively fast
- Inexpensive to obtain
- Provides vast grounds for physiological and psychological studies
Voice Recognition Disadvantages

- Voice verification is highly dependent on the user's native language
- Very high compression rates are not easy to achieve
- A majority of research is on automated voice recognition, and not on biometric applications
References and Links

- Signal Processing Institute, Swiss Federal Institute of Technology: http://scgwww.epfl.ch/
- Biometric Systems Lab, University of Bologna: http://bias.csr.unibo.it/research/biolab/
- Course textbook