Biometrics (WP3)

SecurePhone
A Multi-Modal Biometric Verifier
for constrained devices
Sabah Jassim
University of Buckingham, UK.
BioSecure & COST 2101 – Smart Cards and Biometric – Lausanne, 2007
Outline
• The SecurePhone project
• Fusion approaches to biometric-based identification
• SecurePhone multi-modal biometric verifier
  • PDA implementation constraints
  • Modalities
  • Fusion strategy
• Performance: Match on Host (MoH) & Match on Card (MoC)
• Challenges and potential solutions
• Conclusion
BioSecure + COST 2101 - March 2007
The SecurePhone Project

Aims to produce a prototype of a new mobile communication
system that enables biometrically authenticated users to conclude
legally binding m-contracts during a mobile phone call in an
easy yet highly dependable and secure way, using a biometric
recogniser that fuses face, voice and handwritten
signature.
The SP consortium
SecurePhone aim 1: secure exchange
Secure PKI (Public Key Infrastructure)
Deal secure m-contracts during a mobile phone call
• secure: private key stored on SIM card
• user-friendly: intuitive, non-intrusive
• flexible: legally binding text/audio transactions
• dynamic: mobile e-signing “on the fly”
Project aim 2: biometric verification
[Diagram: face, voice and signature inputs each pass through preprocessing and modelling; the results are fused using client & impostor joint-score models, leading to either "reject user" or "accept user" and "release private key".]
• Zero-Knowledge Authentication.
Implementation constraints
• The PDA's main processor is much slower than a PC's,
so verification must be very efficient even on the PDA.
• Inadequate audio-visual sample rates using the device
applications (only 8 kHz for audio and 10 fps for video).
This has been improved: current SP sampling and real-time
pre-processing run at 22 kHz audio and 20 fps video.
• Only data on the SIM is secure, so the biometric
models/templates must be stored and processed on the SIM. Yet the SIM has very
limited computational resources and processing support.
SIM model storage is limited to 40 K, hence text-dependent prompts.
Note: text-independent prompts or varied text-dependent
prompts are more secure, but would require 200-400 K.
• Enrolment should be based on a short session (acceptability)
Voice verification (SU / GET ENST)
• Fixed 5-digit prompt – conceptually neutral, easily
extendable, requires few Gaussians
• 22 kHz sampling
• Online energy-based non-speech frame removal
• MFCCs with online CMS and first time-difference features –
slow to compute, but fixed-point is faster than floating-point
• Features modelled by 100-Gaussian GMM pdf,
with UBM for model initialisation and score normalisation
• Training on data from 2 indoor and 2 outdoor recordings
from one session. Testing on similar data from another
session
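The online CMS and first time-difference steps listed above can be illustrated with a minimal sketch. It assumes a running-mean form of cepstral mean subtraction and a plain one-frame difference; the function names and the update scheme are hypothetical and are not taken from the SecurePhone implementation.

```python
def online_cms(frames):
    """Subtract a running (causal) mean from each feature frame."""
    out = []
    mean = None
    for n, frame in enumerate(frames, start=1):
        if mean is None:
            mean = list(frame)
        else:
            # Incremental update of the per-coefficient mean.
            mean = [m + (x - m) / n for m, x in zip(mean, frame)]
        out.append([x - m for x, m in zip(frame, mean)])
    return out


def add_deltas(frames):
    """Append first time differences; the first frame gets zero deltas."""
    out = []
    prev = None
    for frame in frames:
        if prev is None:
            delta = [0.0] * len(frame)
        else:
            delta = [x - p for x, p in zip(frame, prev)]
        out.append(list(frame) + delta)
        prev = frame
    return out
```

Both functions work frame by frame, so they could run while the audio is still being captured; a fixed-point variant would replace the float arithmetic with scaled integers.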
Signature verification (GET INT)
• 2D coordinates (100 Hz) augmented by time difference
features, curvature, etc. – total 19 features
Note: no pressure or angles available, since the data come from
the PDA’s touch screen, not from a writing pad
• Shift normalisation, but no rotation or scaling normalisation
• Features modelled by a 100-Gaussian GMM pdf – UBM used for
model initialisation and score normalisation
• Fast to compute
• Training and testing on data from one session
Face Wavelet feature Representation (BU)
• The Discrete Wavelet Transform (DWT)
decomposes an image into a set of different
frequency subbands with different resolutions,
each consisting of
• At a resolution depth of k, the pyramidal scheme
decomposes an image I into 3k + 1 subbands:
(LLk, HLk, LHk, HHk, . . . , HL1, LH1, HH1).
The lowest-pass subband LLk represents the k-level
resolution approximation of the image I. The
subbands HL1, LH1 and HH1 contain the finest-scale
wavelet coefficients, and the coefficients get
coarser as the level increases, LLk being the coarsest.
• Each subband of a DWT-decomposed face image
represents the person’s face at a different frequency
range and scale (i.e. each is a distinct stream for
face recognition, with varying accuracy rates, and the streams can be
fused for improved accuracy).
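As an illustration of the pyramidal scheme, the sketch below computes a depth-k Haar decomposition in plain Python and returns the 3k + 1 subbands by name. It assumes the averaging (unnormalised) form of the Haar filter and one common HL/LH naming convention; both choices, and the function names, are assumptions made for this example.

```python
def haar_step(img):
    """One 2D Haar level; img is a list of rows with even height and width."""
    ll, hl, lh, hh = [], [], [], []
    for r in range(0, len(img), 2):
        ll_row, hl_row, lh_row, hh_row = [], [], [], []
        for c in range(0, len(img[0]), 2):
            a, b = img[r][c], img[r][c + 1]
            d, e = img[r + 1][c], img[r + 1][c + 1]
            ll_row.append((a + b + d + e) / 4)  # low-pass in both directions
            hl_row.append((a - b + d - e) / 4)  # difference along each row
            lh_row.append((a + b - d - e) / 4)  # difference along each column
            hh_row.append((a - b - d + e) / 4)  # diagonal detail
        ll.append(ll_row)
        hl.append(hl_row)
        lh.append(lh_row)
        hh.append(hh_row)
    return ll, hl, lh, hh


def haar_pyramid(img, depth):
    """Return a dict with the 3*depth + 1 subbands LLk, HLk, LHk, HHk, ..., HH1."""
    subbands = {}
    ll = img
    for level in range(1, depth + 1):
        ll, hl, lh, hh = haar_step(ll)  # only the LL band is decomposed further
        subbands[f"HL{level}"] = hl
        subbands[f"LH{level}"] = lh
        subbands[f"HH{level}"] = hh
    subbands[f"LL{depth}"] = ll
    return subbands
```

Each level halves both image dimensions, so at depth 4 a 160x192 face area yields a 10x12 LL4 subband, i.e. 120 coefficients.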
Face verification (BU)
• Static face recognition – 10 grey-scale images selected at
random from a video, face area 160x192 pixels
• Histogram equalisation and z-score standardisation of features
are applied as a simple, fast lighting normalisation.
• Haar wavelet low-low-4 (or low-high) subband used as the feature vector.
Other wavelet filters were tested, but Haar is the fastest to compute
• Features modelled by a GMM pdf with only 4 Gaussians – UBM used for
model initialisation and score normalisation
• Training on data from 2 indoor and 2 outdoor recordings from
one session, testing on similar data from another session
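The two lighting-normalisation steps can be sketched as follows. This is a minimal textbook version, assuming integer grey levels in a flat pixel list and a population standard deviation; the function names are hypothetical and the project's exact variants are not given in the slides.

```python
import math


def equalise_histogram(pixels, levels=256):
    """Map integer grey levels through the normalised cumulative histogram."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    cdf, total = [], 0
    for h in hist:
        total += h
        cdf.append(total)
    cdf_min = next(c for c in cdf if c > 0)
    n = len(pixels)
    if n == cdf_min:  # constant image: nothing to stretch
        return list(pixels)
    return [round((cdf[p] - cdf_min) / (n - cdf_min) * (levels - 1))
            for p in pixels]


def z_score(features):
    """Standardise a feature vector to zero mean and unit variance."""
    mean = sum(features) / len(features)
    sd = math.sqrt(sum((x - mean) ** 2 for x in features) / len(features))
    if sd == 0:
        return [0.0] * len(features)
    return [(x - mean) / sd for x in features]
```

Both steps are a single pass over the data plus a table lookup or a division, which fits the requirement for fast processing on the PDA.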
Fusion (GET INT)
• For each modality S(i) = log p(Xi|C) - log p(Xi|I)
• Score fusion was tested by:
  • Optimal linear weighted sum:
    Fused-score = Σ_i w(i) * S(i),
    where the sum is taken over the 3 modalities
  • GMM score modelling, i.e. modelling both the client
    and the impostor joint-score pdfs by diagonal
    covariance GMMs:
    Fused-score = log p(S|C) - log p(S|I)
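The two fusion rules can be sketched as below. The GMM representation (a list of (weight, means, variances) tuples) and all function names are assumptions made for this example; the weights and GMM parameters would come from training, which is not shown.

```python
import math


def weighted_sum(scores, weights):
    """Linear fusion: sum over modalities of w(i) * S(i)."""
    return sum(w * s for w, s in zip(weights, scores))


def log_gmm_diag(x, gmm):
    """log p(x) under a diagonal-covariance GMM.

    gmm is a list of (weight, means, variances) tuples, one per Gaussian.
    """
    logs = []
    for weight, means, variances in gmm:
        ll = math.log(weight)
        for xi, m, v in zip(x, means, variances):
            ll += -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        logs.append(ll)
    peak = max(logs)  # log-sum-exp for numerical stability
    return peak + math.log(sum(math.exp(l - peak) for l in logs))


def gmm_fused_score(scores, client_gmm, impostor_gmm):
    """Fused score = log p(S|C) - log p(S|I) for the joint score vector S."""
    return log_gmm_diag(scores, client_gmm) - log_gmm_diag(scores, impostor_gmm)
```

For example, with single-Gaussian client and impostor models centred at (1, 1, 1) and (-1, -1, -1) with unit variances, the score vector (1, 1, 1) gives a fused score of 6, and (0, 0, 0) gives 0.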
User verification system
• User requests PDA to verify their identity
• PDA requests user to
  • read prompt, e.g. 79851 (face in box)
  • sign signature
• Feature processing applied to each modality
[silence removal, histogram equalisation, MFCC or
Haar wavelets, online CMS, delta features, etc.]
• for each modality S(i) = log p(Xi|C) - log p(Xi|I)
• if S(i) < θ(i) for any i: user is asked to repeat
else fused-score = log p(S|C) - log p(S|I)
• if fused-score > φ: user accepted
else user rejected
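The decision flow above amounts to a two-stage check, sketched here under the assumption that the per-modality scores have already been computed. The function name, the string outcomes and the `fuse` callable are hypothetical placeholders.

```python
def verify(scores, thetas, fuse, phi):
    """Return 'repeat', 'accept' or 'reject' for one verification attempt.

    scores: per-modality scores S(i)
    thetas: per-modality quality thresholds theta(i)
    fuse:   callable mapping the score list to a fused score
    phi:    accept/reject threshold on the fused score
    """
    # Stage 1: any single modality scoring below its threshold triggers a retry.
    if any(s < t for s, t in zip(scores, thetas)):
        return "repeat"
    # Stage 2: accept only if the fused score exceeds phi.
    return "accept" if fuse(scores) > phi else "reject"
```

Passing the fusion rule in as a callable keeps the decision logic independent of whether a weighted sum or a GMM-based fused score is used.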
Speaking face & Forgery (GET ENST)
• Investigated possible attacks and forgery scenarios:
  • Using synthesised voice and face:
    difficult to create – synchronisation problems
  • Replay attacks – devised a successful attack
    that uses the client's voice and face images, but
    not taken from the same video.
    Using a coupled HMM for voice and face greatly
    reduced the effect of this attack.
PDA Database (PDAtabase)
• Initial development used many databases [TIMIT(V), CSLU(V),
BANCA(V,F), ORL(F), BIOMET(V,F,S), NIST(V)]
• A CSLU/BANCA-like database was then recorded on a Qtek2020 PDA for realistic
conditions (sensors, environment)
• 60 English subjects: 24 for UBM, 18 for g1, 18 for g2. Accept/reject
threshold optimised on g1 and evaluated on g2, and vice versa
• Video (voice + face): 18 prompts (5-digit, 10-digit and phrase);
3 sessions, with 2 inside and 2 outside recordings per session
• Signatures in one session, with 20 expert impostorisations for each
• Virtual couplings of audio-visual with signature data (independent)
• Automatic test script allows many possible configurations to be tested
• User just provides executables for feature modelling, score generation
and score fusion
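The g1/g2 protocol (tune the accept/reject threshold on one group, measure errors on the other, then swap) can be sketched as follows. The slides do not state the tuning criterion; this example assumes the threshold is chosen where false rejection and false acceptance rates are closest, and all names are hypothetical.

```python
def error_rates(clients, impostors, threshold):
    """Return (FRR, FAR) when a score is accepted if it exceeds threshold."""
    frr = sum(s <= threshold for s in clients) / len(clients)
    far = sum(s > threshold for s in impostors) / len(impostors)
    return frr, far


def tune_threshold(clients, impostors):
    """Pick the observed score at which FRR and FAR are closest."""
    candidates = sorted(set(clients) | set(impostors))

    def gap(t):
        frr, far = error_rates(clients, impostors, t)
        return abs(frr - far)

    return min(candidates, key=gap)


def cross_evaluate(g1, g2):
    """Each group is (client_scores, impostor_scores).

    Returns [(FRR, FAR) on g2 with g1's threshold,
             (FRR, FAR) on g1 with g2's threshold].
    """
    results = []
    for tune, test in ((g1, g2), (g2, g1)):
        threshold = tune_threshold(*tune)
        results.append(error_rates(*test, threshold))
    return results
```

Because the threshold is never tuned on the group it is evaluated on, the reported error rates are not biased by fitting the threshold to the test subjects.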
Match on Host (MoH): complementarity of modalities
Modality         5 digits   10 digits
Voice (V)           6.1        3.4
Face (F)           28.6       29.9
Signature (S)       6.2        6.2
V+F                 4.8        3.0
V+S                 1.1        0.7
S+F                 4.8        4.7
V+F+S               0.9        0.6
Note: for the LL subband. Improved results are already available for the LH subband!
Result table with improved results for 5-digit and 10-digit prompts
in PDAtabase (SPIE 2006)
Match on Card (MoC)
Implementation of the MoH system on the SIM card (MoC)
• No problem in terms of storage
• But not feasible because of verification time
(matching plus host/SIM communication = one hour)
A reduction of the verification time can be attained by
• reducing the vector size
• reducing the frame rate
• reducing the number of Gaussians of the client and
background models
Matching time was still not acceptable
MoC bottleneck

• Not in preprocessing, since this is still all done on the
PDA, as in the MoH system.
• Not in face:
  • Although feature vectors are
  • Only a few (10) of them in testing
  • and only 4 Gaussians needed (client model and UBM)
• Bottleneck caused by voice and signature data:
  • Vectors are relatively small,
  • large number of frames
  • large number of Gaussians
MoC solution
Only a drastic measure can solve the problem:
• Globalised features:
  • Features to represent the whole signature: a single vector
    of 41 parameters representing correlation and variation in
    x-y coordinates, velocity and acceleration parameters
  • Idea generalised to voice: use of means (cf. Long-Term Average
    Spectrum) and standard deviations per vector parameter across
    all frames
  • Works well for signature
• Improvement:
  • use up to four equal subparts of the signature/voice signal
  • Implementation: 2 equal subparts
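The voice variant of the globalised features (means and standard deviations per parameter, computed over equal subparts of the signal) can be sketched as below. The function name, the population standard deviation and the handling of a remainder frame are assumptions for this example; the 41-parameter signature vector is a different feature set and is not reproduced here.

```python
import math


def global_features(frames, parts=2, use_sd=True):
    """Collapse a frame sequence into one vector of per-part statistics.

    frames: list of equal-length feature vectors, with len(frames) >= parts
    parts:  number of equal subparts (the last part absorbs any remainder)
    use_sd: append per-parameter standard deviations after the means
    """
    size = len(frames) // parts
    vector = []
    for p in range(parts):
        start = p * size
        chunk = frames[start:] if p == parts - 1 else frames[start:start + size]
        dims = len(chunk[0])
        means = [sum(f[d] for f in chunk) / len(chunk) for d in range(dims)]
        vector.extend(means)
        if use_sd:
            vector.extend(
                math.sqrt(sum((f[d] - means[d]) ** 2 for f in chunk) / len(chunk))
                for d in range(dims)
            )
    return vector
```

The output length depends only on the feature dimension and the number of subparts, not on the number of frames, so the card matches a single short vector instead of scoring every frame against every Gaussian.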
MoC-emulated results
Global feat.   Means only                    Means + sd
#Gauss.          1      2      4      8        1      2      4      8
Voice          22.13  21.09  20.87  21.86    20.88  19.72  17.68  18.49
Face           32.26  31.78  29.06  29.19    32.26  31.78  29.06  29.19
Signature      38.29  27.58  22.58  17.86    28.14  22.16  17.59  16.45
Fused          12.89  12.48  10.49   9.32    12.56  10.48   8.28   9.15
EER (percent) for globalised means (columns 2-5) and means plus standard
deviations (columns 6-9) for voice and signature divided into two equal subparts
Solving the capacity problem
Possible options for improving performance of the SecurePhone:
• Use match-on-server (MoS) – raises security and privacy concerns.
• Implement the biometric recogniser and encryption on a
chip (more costly than the current solution).
• Build a secure PDA with sufficient storage and processing
power (a dedicated device that would be more costly and less
ubiquitous).
• Split matching (hybrid MoC/MoH): considered but not
implemented. Initial work is being done and results are
encouraging. Promising implications for security and privacy
of biometric data (templates/models) without cryptography.
Conclusion and Future Work
• Natural, non-intrusive biometrics guarantee high user acceptance
• Biometric data never leave the SIM card: high security
• Fusion of multiple streams of a single trait can lead to improved
performance (a pilot for face was tested but not implemented in SP)
• MoH is efficient with high accuracy, but vulnerable.
• MoC is secure, but efficiency and high accuracy cannot be achieved together!
Future work includes:
• Designing hybrid mixed client-server matching.
• Investigating the privacy and security of biometric data, using
cancellable biometrics, especially for “Match on Server”
• Improving performance of single modalities through multi-classifier
& multi-stream strategies,
e.g. face by mixing a larger number of subbands at different depths
Acknowledgement
Thanks to the EU for funding this research through the
SecurePhone (IST-2002-506883) project.