Secure contracts signed by mobile Phone
IST-2002-506883
Multi-modal Biometric Verification
for Small and Very Small Devices
Jacques Koreman, NTNU
Andrew Morris, Spinvox
International Workshop on
Verbal and Nonverbal Communication Behaviours
Vietri sul Mare, 29-31 March 2007
Overview
• Background and application: SecurePhone
• Multimodal biometric recognition
– face, voice, signature: natural
• For small devices: PDA
– Good performance, short verification time
– Security problem
• For very small devices: SIM card
– Global features to run on slow CPU
– Short verification time, acceptable performance
• Conclusion
– Further improvements by glottal feature fusion?
– Relevance for COST2102
Int’l Workshop on Verbal and Nonverbal Communication Behaviours, Vietri sul Mare, 29-31 March 2007, slide 2
Background: SecurePhone project
• Aim: “a mobile phone with biometric authentication and e-signature support for dealing secure transactions on the fly”
• Duration: 01.01.2004 – 30.11.2006
• SecurePhone consortium:
– Management
– Research
– Implementation
– Exploitation
• Financing: EU 6th framework IST
[Diagram: SecurePhone components: video camera, touch screen and microphone (each with data capture), biometric preprocessor, biometric recogniser, e-signature manager, PIN number, SIM card, GPRS/UMTS]
Multimodal biometric recogniser
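[Diagram: the face (Haar LL4 wavelets, later HL4; subbands LL, HL, LH, HH), voice (MFCCs) and signature modalities, with geometric features also shown, each feed a GMM. The “biometric recogniser” uses the user profile and the world model to either accept the user and release the private key, or reject the user.]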
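As a rough illustration of the accept/reject decision on this slide, the sketch below scores feature vectors against a client GMM (the user profile) and a world model, then compares the average log-likelihood ratio with a threshold. The log-likelihood-ratio rule, the diagonal covariances, the function names and the threshold value are all assumptions made for illustration; the slides do not specify them.

```python
import math

def gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of one vector x under a diagonal-covariance GMM."""
    log_terms = []
    for w, mu, var in zip(weights, means, variances):
        log_p = math.log(w)
        for xi, m, v in zip(x, mu, var):
            log_p += -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        log_terms.append(log_p)
    # Log-sum-exp over the mixture components, for numerical stability.
    top = max(log_terms)
    return top + math.log(sum(math.exp(t - top) for t in log_terms))

def verify(vectors, client, world, threshold=0.0):
    """Return (score, accepted); threshold=0.0 is an arbitrary placeholder."""
    score = sum(
        gmm_log_likelihood(x, *client) - gmm_log_likelihood(x, *world)
        for x in vectors
    ) / len(vectors)
    return score, score > threshold

# Toy one-dimensional models: (weights, means, variances), hypothetical values.
client_model = ([1.0], [[0.0]], [[1.0]])
world_model = ([1.0], [[3.0]], [[1.0]])
print(verify([[0.0]], client_model, world_model))
```

A vector close to the client mean gives a positive score and is accepted; a vector close to the world-model mean gives a negative score and is rejected.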
PDA: fusion results for PDAtabase
Modality         5-digit   10-digit   Phrase
Voice              7.21       3.24      5.54
Face              28.40      27.55     28.33
Signature          8.01
Fusion (mean)      2.39       1.54      2.30
Fusion (sd)        0.96       0.83      1.85
Marcos Faundez-Zanuy: “Face recognition: an unsolved problem”
DET curves/result table (EER, percent) for 5-digit (left), 10-digit (middle) and phrase prompts (right)
From small to very small devices: problem
• Biometric data must not be stored or processed on the PDA, because impostors could steal it.
• Therefore storage and processing must be on the SIM card, which self-destructs when physically tampered with.
• Instead of a few seconds on the PDA, verification on the SIM card takes one hour!
• Bottleneck: large number of comparisons in voice and signature verification (for client model and UBM)
– for large number of frames per prompt
– for large number of Gaussian mixtures in GMM
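The bottleneck can be made concrete with a back-of-the-envelope count: each frame is compared with every Gaussian in both the client model and the UBM. The sketch below is only that arithmetic; the frame and mixture counts are hypothetical values chosen for illustration, not figures from the SecurePhone system.

```python
def gaussian_evaluations(n_frames, n_mixtures, n_models=2):
    """Frame-vs-Gaussian comparisons for one prompt.

    n_models=2 stands for the client model plus the UBM.
    """
    return n_frames * n_mixtures * n_models

# Hypothetical sizes, for illustration only.
baseline = gaussian_evaluations(n_frames=400, n_mixtures=32)
halved = gaussian_evaluations(n_frames=200, n_mixtures=16)
print(baseline, halved, baseline / halved)
```

Halving both the number of frames and the number of mixtures gives only a factor of 4, whereas going from about an hour to a few seconds needs a speed-up on the order of a thousand.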
From small to very small devices: solution
• Reducing the frame rate or the number of GMM mixtures cannot reduce the processing time by a sufficient order of magnitude
Marcos Faundez-Zanuy: “Open your mind: sometimes a simple solution can give a good result” (and sometimes you cannot get around it)
• Drastic solution: globalised features (idea taken from static signature representations)
– Means (cf. Long-Term Average Spectrum for voice) and standard deviations per vector parameter across all frames; also greatly reduced number of Gaussians required for modelling the vectors
– To counteract the effect of averaging, compute globalised features for subparts of the signal
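A minimal sketch of the globalised features described above: the per-parameter mean (and optionally standard deviation) across all frames, computed either for the whole signal or for equal subparts. The function name, the use of the population standard deviation, and returning one vector per subpart are assumptions; the slides do not say how subpart vectors are combined or which standard deviation estimator was used.

```python
from statistics import fmean, pstdev

def global_features(frames, n_parts=1, use_sd=True):
    """Collapse a sequence of frame vectors into one global vector per subpart.

    Each global vector holds the per-parameter means, followed by the
    per-parameter standard deviations when use_sd is True.
    """
    n = len(frames)
    if n_parts < 1 or n_parts > n:
        raise ValueError("n_parts must be between 1 and the number of frames")
    vectors = []
    for p in range(n_parts):
        # Split the signal into (nearly) equal consecutive subparts.
        part = frames[p * n // n_parts:(p + 1) * n // n_parts]
        columns = list(zip(*part))  # one tuple of values per vector parameter
        feats = [fmean(c) for c in columns]
        if use_sd:
            feats += [pstdev(c) for c in columns]
        vectors.append(feats)
    return vectors

# Toy input: four frames with two parameters each (made-up numbers).
frames = [[1.0, 10.0], [3.0, 14.0], [5.0, 10.0], [7.0, 14.0]]
print(global_features(frames, use_sd=False))  # means only, whole signal
print(global_features(frames, n_parts=2))     # means + sd, two subparts
```

However many frames the prompt contains, only n_parts vectors per prompt remain to be scored against the models, which is where the saving in comparisons comes from.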
PDA results
Global feat.   Means only                      Means + sd
#Gauss.            1      2      4      8          1      2      4      8
Voice          28.20  30.08  30.36  32.08      22.78  22.55  24.41  25.71
Face           32.26  31.78  29.06  29.19      32.26  31.78  29.06  29.19
Signature      37.26  29.28  27.15  26.25      28.34  26.60  21.27  19.21
Fused          17.95  17.16  14.83  15.01      13.68  12.35  10.05  10.31
EER (percent) for globalised means (columns 2-5) and means plus standard deviations (columns 6-9)
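For readers unfamiliar with the metric, the EER is the operating point at which the false acceptance rate equals the false rejection rate. The sketch below approximates it from two lists of verification scores; the function name, the threshold sweep and the toy scores are illustrative assumptions, not the evaluation code behind the table.

```python
def equal_error_rate(genuine, impostor):
    """Approximate EER in percent; higher scores mean 'more client-like'."""
    best_gap, eer = None, None
    for t in sorted(set(genuine) | set(impostor)):
        frr = sum(s < t for s in genuine) / len(genuine)     # clients rejected
        far = sum(s >= t for s in impostor) / len(impostor)  # impostors accepted
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return 100.0 * eer

# Made-up scores for illustration only.
genuine_scores = [0.9, 0.8, 0.7, 0.4]
impostor_scores = [0.6, 0.3, 0.2, 0.1]
print(equal_error_rate(genuine_scores, impostor_scores))
```

With these toy scores, one of four clients is rejected and one of four impostors is accepted at the threshold 0.6, so the sketch reports 25 percent.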
SIM card results
Global feat.   Means only                      Means + sd
#Gauss.            1      2      4      8          1      2      4      8
Voice          22.13  21.09  20.87  21.86      20.88  19.72  17.68  18.49
Face           32.26  31.78  29.06  29.19      32.26  31.78  29.06  29.19
Signature      38.29  27.58  22.58  17.86      28.14  22.16  17.59  16.45
Fused          12.89  12.48  10.49   9.32      12.56  10.48   8.28   9.15
EER (percent) for globalised means (columns 2-5) and means plus standard deviations (columns 6-9) for voice and signature divided into two equal subparts
Improvement needed
• Performance drop:
– PDA EER 2.39% (meanwhile improved to 0.9%)
– SIM EER 10.05% (8.28% for two equal subparts)
• Performance can be improved if we do not constrain the GMM models to be the same across all modalities
• Otherwise: use of complementary features within a modality
– Face: simple face geometric variables
– Voice: parameter values of LF model fitted to glottal flow derivative, obtained from inverse filtering of mic signal
Relevance to this COST action
• Interest in glottal flow derivative for speaker recognition stems from
– expected complementarity to MFCC representation of spectrum
– suitability for applications which use very little training data (as in SecurePhone, for user-friendliness)
• But can also be useful for other classification problems, like “the recognition of emotional states, gesture, speech and facial expressions, in anticipation of the implementation of useful application such as intelligent avatars and interactive dialog systems” (quote from aims website of this workshop)
Last night’s addendum: speech & gestures
• Source signal parameters can also be used together with other spectral parameters as well as F0, duration and loudness measures to signal prominence.
• In speech, these signals can be used differently across languages (syllable-timed vs. stress-timed) and speakers (German Research Council “rhythm project” led by Bill Barry, Saarland University, to which NTNU contributes with Norwegian database recordings and analyses).
• Prominence is also signalled by extent/size as well as acceleration of gestures.
• To what extent do gestures and speech signal parameters correlate? When are they used as complementary/alternative strategies for signalling prominence?
Thank you for your attention.