Keyboard Acoustics Emanations Revisited
Li Zhuang, Feng Zhou, J. D. Tygar, {zl,zf,tygar}@cs.berkeley.edu, University of California, Berkeley
Motivation
•Emanations of electronic devices leak information
•How much information is leaked by emanations?
•Apply statistical learning methods to security
•What can be learned from the sound of typing on a keyboard?
(System diagram)
•Initial training: Sample Collector (wave signal) → Feature Extraction → Unsupervised Learning → Language Model Correction → Classifier Builder → keystroke classifier
•Subsequent recognition: Sample Collector (wave signal) → Feature Extraction → Keystroke Classifier (use trained classifiers for each key to recognize sound samples) → Language Model Correction → recovered keystrokes
Acoustic Information: Previous and Ours

                          Asonov and Agrawal (SSP’04)                Ours
Requirement               Text-labeling                              Direct recovery
Analogy in Crypto         Known-plaintext attack                     Known-ciphertext attack
Feature Extraction        FFT                                        Cepstrum
Initial training          Supervised learning with Neural Networks   Clustering (K-means, Gaussian), EM algorithm
Language Model            /                                          HMMs at different levels
Feedback-based Training   /                                          Self-improving feedback
•Frequency information in the sound of each typed key
•Why do keystrokes make different sounds?
  •Different locations on the supporting plate
  •Each key is slightly different
Key Observation
•Build acoustic model for keyboard & typist
•Non-random typed text (English)
  •Limited number of words
  •Limited letter sequences (spelling)
  •Limited word sequences (grammar)
•Build language model
  •Statistical learning theory
  •Natural language processing
(Figure: recovered text before and after spelling and grammar correction)
http://redtea.cs.berkeley.edu/~zl/keyboard
Feature Extraction
•How to represent a keystroke?
•Vector of features:
  •FFT, Cepstrum
•Cepstrum features are better
  •Also used in speech recognition
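A minimal sketch of what computing cepstrum features for one keystroke could look like, assuming the keystroke has already been isolated as a short mono sample window; the window length and number of coefficients here are illustrative, not the poster's exact parameters:

```python
import numpy as np

def cepstrum_features(window, n_coeffs=20):
    """Cepstrum of one keystroke window: inverse FFT of the log power spectrum.

    `window` is a 1-D array of audio samples covering the keystroke;
    the low-order coefficients form the feature vector.
    """
    spectrum = np.fft.rfft(window * np.hanning(len(window)))
    log_power = np.log(np.abs(spectrum) ** 2 + 1e-12)  # small offset avoids log(0)
    cepstrum = np.fft.irfft(log_power)
    return cepstrum[:n_coeffs]

# Illustrative use on a fake 10 ms keystroke at 44.1 kHz
rng = np.random.default_rng(0)
fake_keystroke = rng.standard_normal(441)
features = cepstrum_features(fake_keystroke)
print(features.shape)  # (20,)
```

The FFT features used in prior work would simply be `np.abs(spectrum)` over selected bins; the extra log-and-inverse-FFT step is what makes this a cepstrum.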
Unsupervised Learning
•Group keystrokes into N clusters
•Assign each keystroke a label, 1, …, N
•Find best mapping from cluster labels to characters
  •Some character combinations are more common
    •“th” vs. “tj”
  •Hidden Markov Models (HMMs)
(HMM example: hidden characters “t”, “h”, “e” emitting cluster labels 5, 11, 2)
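The mapping step can be sketched as Viterbi decoding over an HMM whose hidden states are characters and whose observations are cluster labels. All probabilities below are made up for illustration (the real ones come from an English corpus and the EM algorithm); they only encode that "th" is far more likely than "tj":

```python
import numpy as np

# Hypothetical tiny HMM: hidden states are characters, observations are
# cluster labels produced by clustering keystroke sounds.
chars = ["t", "h", "e", "j"]
start = np.array([0.4, 0.2, 0.2, 0.2])          # P(first char)
trans = np.array([[0.05, 0.80, 0.10, 0.05],     # P(next char | char):
                  [0.10, 0.05, 0.80, 0.05],     # "th" common, "tj" rare
                  [0.40, 0.20, 0.20, 0.20],
                  [0.25, 0.25, 0.25, 0.25]])
emit = np.array([[0.7, 0.2, 0.1],               # P(cluster label | char)
                 [0.1, 0.8, 0.1],               # for labels 0..2
                 [0.1, 0.1, 0.8],
                 [0.5, 0.3, 0.2]])

def viterbi(obs):
    """Most likely character sequence for a sequence of cluster labels."""
    logp = np.log(start) + np.log(emit[:, obs[0]])
    back = []
    for o in obs[1:]:
        scores = logp[:, None] + np.log(trans)   # prev state x next state
        back.append(scores.argmax(axis=0))       # best predecessor per state
        logp = scores.max(axis=0) + np.log(emit[:, o])
    path = [int(logp.argmax())]
    for bp in reversed(back):                    # trace best path backwards
        path.append(int(bp[path[-1]]))
    return "".join(chars[i] for i in reversed(path))

print(viterbi([0, 1, 2]))  # → "the"
```

Even though cluster label 1 is ambiguous on its own, the bigram prior pulls the middle character toward "h", which is the whole point of language model correction.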
Language Model Correction
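The poster's before/after figure shows recovered text being repaired by spelling correction. A minimal dictionary-based sketch of that idea, with a hypothetical tiny vocabulary (the real system uses full spelling and grammar models):

```python
# Hypothetical dictionary-based spelling correction: snap each recovered
# word to the nearest vocabulary word by edit distance.
DICTIONARY = ["the", "secret", "keyboard", "password", "attack"]

def edit_distance(a, b):
    """Classic Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,            # deletion
                           cur[j - 1] + 1,         # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def correct(word):
    return min(DICTIONARY, key=lambda w: edit_distance(word, w))

print(correct("kwyboard"))  # → "keyboard"
```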
Feedback-based Training
•Feedback for more rounds of training
•Output: keystroke classifier
  •Language independent
  •Can be used to recognize random sequences of keys
    •E.g. passwords
•Representation of keystroke classifier
  •Neural networks, linear classification, Gaussian mixtures
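The self-improving feedback loop can be sketched as: treat the language-model-corrected output as labels, retrain the classifier supervised, and repeat. The nearest-centroid classifier below is a hypothetical stand-in for the NN / linear / Gaussian-mixture classifiers named above:

```python
import numpy as np

def train(features, labels):
    """Fit one centroid per class (stand-in for a real classifier)."""
    classes = sorted(set(labels))
    labels = np.array(labels)
    return {c: features[labels == c].mean(axis=0) for c in classes}

def classify(model, x):
    return min(model, key=lambda c: np.linalg.norm(x - model[c]))

def feedback_rounds(features, initial_labels, correct_fn, rounds=3):
    """Each round: supervised training, then language-model correction
    of the predictions, which become the next round's labels."""
    labels = initial_labels
    for _ in range(rounds):
        model = train(features, labels)
        predicted = [classify(model, x) for x in features]
        labels = correct_fn(predicted)   # language model correction step
    return model

# Illustrative use on two well-separated fake keystroke clusters,
# with an identity correction function standing in for the language model.
rng = np.random.default_rng(1)
feats = np.vstack([rng.normal(0, 0.1, (10, 2)), rng.normal(5, 0.1, (10, 2))])
labels = ["a"] * 10 + ["b"] * 10
model = feedback_rounds(feats, labels, correct_fn=lambda p: p)
print(classify(model, np.array([5.0, 5.0])))  # → "b"
```

The resulting classifier depends only on acoustic features, which is why it stays language independent and can recognize random key sequences such as passwords.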
Some Experiment Results

4 data sets (12–27 mins of recordings):

           Set 1 (%)     Set 2 (%)     Set 3 (%)     Set 4 (%)
           Word  Char    Word  Char    Word  Char    Word  Char
Initial     35    76      39    80      32    73      23    68
Final       90    96      89    96      83    95      80    92
3 different models of keyboards (12 mins recording):

           Keyboard 1 (%)   Keyboard 2 (%)   Keyboard 3 (%)
           Word   Char      Word   Char      Word   Char
Initial     31     72        20     62        23     64
Final       82     93        82     94        75     90
3 different supervised learning methods in feedback:

(Bar chart: word and character recognition rates for Neural Networks (NN), Linear Classification (LC), and Gaussian Mixtures (GM))
4/26/2006