
Signature with Text-Dependent
and Text-Independent Speech for
Robust Identity Verification
B. Ly-Van*, R. Blouet**, S. Renouard**
S. Garcia-Salicetti*, B. Dorizzi*, G. Chollet**
* GET/INT, dept EPH, 9 rue Charles Fourier, 91011 EVRY France;
**GET/ENST, Lab. CNRS-LTCI, 46 rue Barrault, 75634 Paris
Emails: {Bao.Ly_van, Sonia.Salicetti, Bernadette.dorizzi}@int-evry.fr;
{Blouet, Renouard, Chollet}@tsi.enst.fr
Outline
• Introduction: Why Speech and Signature?
• BIOMET database: brief description
– Signature data
– Speech data
• Writer verification systems
• Speaker verification systems
• Fusion systems
• Results and Conclusions
Introduction
• Multimodality to improve biometric authentication
• Two well-accepted, non-intrusive modalities: speech and signature
• Easy to implement on mobile devices such as PDAs or mobile phones
• Verification systems were already available in our respective teams
The BIOMET Database
• 5 modalities: hand-shape, fingerprints, on-line signatures, and talking faces
  (video with digits and sentences, i.e. face and voice)
• 131 persons: 50% male, 50% female
• Data from 68 persons used for fusion (the remaining persons were used to build
  a world model for speaker verification)
• Time variability: two sessions spaced 5 months apart
  – S. Garcia-Salicetti, C. Beumier, G. Chollet, B. Dorizzi, J. Leroux-Les Jardins,
    J. Lunter, Y. Ni, D. Petrovska-Delacretaz, "BIOMET: a Multimodal Person
    Authentication Database Including Face, Voice, Fingerprint, Hand and Signature
    Modalities", 4th International Conference on Audio- and Video-Based Biometric
    Person Authentication, 2003.
Signature capture
• Captured on a digitizer: 200 Hz
  – WACOM Intuos2 A6
• 5 parameters:
  – Coordinates
  – Axial pressure
  – Azimuth and altitude
    [Figure: pen orientation angles, azimuth 0°-359° and altitude 0°-90°]
• 15 genuine trials per person
• 12 impostor trials per person
Signature modeling
• Preprocessing (filtering)
• Feature extraction: 12 parameters
• Modeling signature: continuous HMM
– 2 states, 3 Gaussians per state
– Bagging: 10 models combined into an «aggregated» model (average score)
– Training: 10 signatures from one session
• Normalized score: |Si(O) - Si*|
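The scoring step above can be illustrated with a short Python sketch. It assumes hmmlearn's GMMHMM for the 2-state, 3-Gaussian-per-state continuous HMM; the bootstrap resampling of the 10 training signatures and the choice of Si* as the average training score are assumptions about details the slide leaves open, not the authors' exact recipe.

    import numpy as np
    from hmmlearn.hmm import GMMHMM   # continuous HMM with Gaussian-mixture emissions

    def train_bagged_models(train_signatures, n_models=10, seed=0):
        """Train 10 HMMs on bootstrap resamples of the training signatures (bagging)."""
        rng = np.random.default_rng(seed)
        models = []
        for _ in range(n_models):
            idx = rng.integers(0, len(train_signatures), size=len(train_signatures))
            sample = [train_signatures[i] for i in idx]      # bootstrap resample
            X = np.vstack(sample)                            # all frames, 12 features each
            lengths = [len(s) for s in sample]
            hmm = GMMHMM(n_components=2, n_mix=3, covariance_type="diag", n_iter=20)
            hmm.fit(X, lengths)
            models.append(hmm)
        return models

    def aggregated_score(models, signature):
        """Average log-likelihood of one signature over the bagged models."""
        return float(np.mean([m.score(signature) for m in models]))

    def normalized_score(models, signature, s_star):
        """|Si(O) - Si*|; Si* is assumed here to be the average training score."""
        return abs(aggregated_score(models, signature) - s_star)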
Speech
• Two verification systems:
– Data:
• Text-dependent: only a sequence of 4 digits among
the 10 available digits (5 templates per speaker)
• Text-independent: sentences extracted from the
original data:
– client model: trained on digits (15 seconds) and tested on
sentences
– world model: trained on all the data available from the remaining persons
  (131 - 68 = 63)
– Methods:
• Text-dependent: DTW (Dynamic Time Warping)
• Text-independent: GMM (Gaussian Mixture Model)
Text-dependent (DTW)
• The DTW algorithm computes the spectral distance between a template speech
  signal and a sample speech signal
  [Figure: template and sample speech signals aligned by DTW to yield a DTW score]
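As a rough illustration of this matching step, the sketch below computes a plain DTW distance between two feature sequences (e.g. frames of spectral features). It is a textbook DTW with a Euclidean local distance, not necessarily the authors' exact variant.

    import numpy as np

    def dtw_distance(template, sample):
        """Dynamic Time Warping distance between two feature sequences.

        template, sample: arrays of shape (n_frames, n_features).
        Returns the accumulated distance along the optimal alignment path.
        """
        n, m = len(template), len(sample)
        cost = np.full((n + 1, m + 1), np.inf)
        cost[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                d = np.linalg.norm(template[i - 1] - sample[j - 1])  # local spectral distance
                # Standard DTW recursion: insertion, deletion, match
                cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
        return cost[n, m]

In verification, the score of a test utterance would then typically be the minimum (or average) DTW distance to the speaker's 5 templates.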
Text-independent (GMM)
[Figure: baseline GMM method: front-end feature extraction, training of a world
GMM model, then GMM model adaptation to obtain the target GMM model]

Verification: the front-end features x are scored against the hypothesized
target GMM model and the world GMM model; the decision score is the
log-likelihood ratio

    LLR(x) = log [ P(x | target GMM) / P(x | world GMM) ]
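A compact sketch of this world-model / target-model scoring, using scikit-learn's GaussianMixture. For simplicity the target model is fitted directly on the client's enrollment features instead of being MAP-adapted from the world model, so the adaptation box in the diagram is only approximated here.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    def train_world_model(background_features, n_components=64):
        """World (background) GMM trained on pooled features from many speakers."""
        world = GaussianMixture(n_components=n_components, covariance_type="diag")
        world.fit(background_features)
        return world

    def train_target_model(client_features, n_components=64):
        """Target GMM; a real system would MAP-adapt the world model instead."""
        target = GaussianMixture(n_components=n_components, covariance_type="diag")
        target.fit(client_features)
        return target

    def llr_score(x, target, world):
        """Average per-frame log-likelihood ratio: log P(x|target) - log P(x|world)."""
        return target.score(x) - world.score(x)   # .score() returns mean log-likelihood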
Fusion systems
• Additive Tree Classifier (ATC)
– Boosting techniques on binary trees
independently trained with the CART algorithm
• Support Vector Machine (SVM)
– Linear kernel
• Inputs:
– Normalized signature score
– Text-dependent LLR score
– Text-independent LLR score
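The SVM branch of the fusion can be sketched as follows: a linear-kernel SVM trained on 3-dimensional vectors of (normalized signature score, text-dependent LLR, text-independent LLR). The scikit-learn pipeline, the score scaling, and the synthetic stand-in data are illustrative assumptions, not the authors' exact setup.

    import numpy as np
    from sklearn.svm import SVC
    from sklearn.preprocessing import StandardScaler
    from sklearn.pipeline import make_pipeline

    rng = np.random.default_rng(0)

    # Each fusion example is one access attempt:
    # [normalized signature score, text-dependent LLR, text-independent LLR]
    # Synthetic stand-in scores; the real ones come from the Fusion Learning Base.
    X_genuine = rng.normal(loc=+1.0, scale=0.5, size=(170, 3))   # 34 people x 5 genuine
    X_impostor = rng.normal(loc=-1.0, scale=0.5, size=(408, 3))  # 34 people x 12 impostor
    X_train = np.vstack([X_genuine, X_impostor])
    y_train = np.concatenate([np.ones(len(X_genuine)), np.zeros(len(X_impostor))])

    fusion_svm = make_pipeline(StandardScaler(), SVC(kernel="linear"))
    fusion_svm.fit(X_train, y_train)

    # The signed distance to the separating hyperplane acts as the fused score;
    # the accept/reject threshold is then chosen on the learning base.
    fused_scores = fusion_svm.decision_function(X_train)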
Tree-based Approach for score fusion
• Goal: find an optimal partition R = {Rk}, 1 ≤ k ≤ K, of the score space
  S = (s1, s2, s3) according to an Information Theory criterion
• A sub-optimal solution, based on CART:
  – Best partition: R* = arg minR C(R)
  – Score estimation based on P(client|Rk) and P(world|Rk) at each leaf of a
    given tree
  – Use of Real AdaBoost to build 50 trees per client and to obtain a robust
    estimation of P(client|Rk) and P(world|Rk)
Verification based on ATC
• A score S = (s1, s2, s3) is presented to the system composed of 50 trees:
  – each tree outputs a decision score based on the corresponding region Rk:
    the LLR score is computed from P(client|Rk) and P(world|Rk)
  – an average score is then computed over the 50 tree scores
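A rough sketch of this boosted-tree fusion, using scikit-learn's AdaBoostClassifier over CART decision trees as a stand-in for the ATC. The Real AdaBoost variant and the per-leaf estimation of P(client|Rk) and P(world|Rk) are approximated by the trees' class-probability outputs, so this is an assumption-laden approximation rather than the authors' exact classifier; it also uses the current scikit-learn `estimator=` API.

    import numpy as np
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.tree import DecisionTreeClassifier

    def train_atc(client_scores, impostor_scores, n_trees=50):
        """Boosted CART trees (50 per client) on 3-D score vectors (s1, s2, s3)."""
        X = np.vstack([client_scores, impostor_scores])
        y = np.concatenate([np.ones(len(client_scores)), np.zeros(len(impostor_scores))])
        atc = AdaBoostClassifier(
            estimator=DecisionTreeClassifier(max_depth=3),  # shallow CART trees
            n_estimators=n_trees,
        )
        return atc.fit(X, y)

    def atc_score(atc, s):
        """Average per-tree LLR-like score for one score vector s = (s1, s2, s3)."""
        s = np.asarray(s).reshape(1, -1)
        tree_scores = []
        for tree in atc.estimators_:
            p = tree.predict_proba(s)[0]             # [P(world|Rk), P(client|Rk)] at the leaf
            p = np.clip(p, 1e-6, 1 - 1e-6)
            tree_scores.append(np.log(p[1] / p[0]))  # per-leaf LLR
        return float(np.mean(tree_scores))           # average over the 50 trees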
SVM principles
[Figure: separating hyperplane H and margin hyperplane H0; a sample X is mapped
to a decision value y(X), which determines Class(X)]
Fusion experiments
• The 68-person database is split into two equal parts:
  – 34 people: Fusion Learning Base (also used for threshold estimation of the
    unimodal systems with the minimum total error (TE) criterion)
  – 34 people: Fusion Test Base (also used to test the unimodal systems)
• Per person:
– 5 genuine bimodal values
– 12 impostor bimodal values
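The min-TE threshold selection on the learning base can be sketched as below. TE is assumed here to be the overall error rate over all genuine and impostor trials (FA and FR being the false acceptance and false rejection rates); this is an illustrative reading of the criterion, not necessarily the authors' exact weighting.

    import numpy as np

    def error_rates(genuine_scores, impostor_scores, threshold):
        """FA, FR and TE (overall error over all trials) at a given accept threshold."""
        genuine = np.asarray(genuine_scores)
        impostor = np.asarray(impostor_scores)
        fa = np.mean(impostor >= threshold)                # impostors wrongly accepted
        fr = np.mean(genuine < threshold)                  # clients wrongly rejected
        n_gen, n_imp = len(genuine), len(impostor)
        te = (fa * n_imp + fr * n_gen) / (n_gen + n_imp)   # overall error rate
        return fa, fr, te

    def min_te_threshold(genuine_scores, impostor_scores):
        """Pick, on the learning base, the threshold that minimizes TE."""
        candidates = np.unique(np.concatenate([genuine_scores, impostor_scores]))
        return min(candidates,
                   key=lambda t: error_rates(genuine_scores, impostor_scores, t)[2])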
Fusion Performances
Condition              Model       TE (%)        FA (%)       FR (%)
Speech without noise   Signature   11.9 [±2.7]   8.9 [±2.9]   20.1 [±6.0]
                       TI Speech    6.3 [±2.0]   2.0 [±1.4]   16.0 [±5.5]
                       TD Speech   10.3 [±2.6]   7.6 [±2.7]   17.0 [±5.7]
                       ATC          2.8 [±1.4]   1.7 [±1.3]    5.2 [±3.3]
                       SVM          2.7 [±1.4]   1.3 [±1.1]    5.9 [±3.6]
Speech SNR: 10 dB      TI Speech    8.0 [±2.3]   2.0 [±1.4]   23.2 [±6.4]
                       TD Speech   11.9 [±2.7]   7.8 [±2.7]   22.1 [±6.3]
                       ATC          2.9 [±1.4]   2.5 [±1.6]    3.9 [±2.9]
                       SVM          2.9 [±1.4]   1.9 [±1.4]    5.3 [±3.4]
Speech SNR: 0 dB       TI Speech   17.0 [±3.1]   6.0 [±2.4]   45.0 [±7.5]
                       TD Speech   16.5 [±3.1]   6.3 [±2.4]   42.0 [±7.4]
                       ATC          6.7 [±2.1]   4.7 [±2.1]   11.2 [±4.8]
                       SVM          5.8 [±2.0]   2.4 [±1.5]   13.6 [±5.2]
Conclusions
• ATC and SVM give equivalent results:
  – role of Boosting (ATC)
• Fusion improves performance by a factor of 2 relative to the best unimodal
  system (in clean or noisy environments)
• Other ways of creating noisy environments should be tested (real noise rather
  than Gaussian white noise!)
• Fusion performance should also be studied on the two speech verification
  systems alone, since no noise was introduced into the signature modality