Transcript Part1
Introduction
C.V.
Juan Arturo Nolazco-Flores
Associate Professor, Computer Science Department, ITESM, Campus Monterrey, México
Courses: Speech Processing, Computer Networks.
Ph.D. in Speech Recognition
M.Phil. in Speech and Language Processing
M.Sc. in Control Engineering
B.Sc. in Electronic Systems
Useful Information
E-mail: [email protected]
Office: VII-426 Phone: 83-582000, ext. 4535, subext. 114
Plan of Work
Fundamentals of Speech Science (8 hours)
– Speech System Production
– Acoustic-Phonetic Characterisation
Modelling Speech Production
Signal Processing and Analysis Methods (8 hours)
– Bank-of-filters
– Windowing
– LPC
– Cepstral Coefficients
– Vector Quantization
Speech Recognition (12 hours)
– Distances
– Time Alignment and Normalisation
– DTW
– Discrete HMM
– Continuous HMM
Speech Coding
Speech Recognition: Chapter 1 (Rabiner & Juang)
Introduction
What is speech recognition?
– It is the identification of the words in an utterance (speech -> orthographic transcription).
– It is based on pattern matching techniques.
– Knowledge is learned from data, usually using stochastic techniques.
– It uses powerful algorithms to optimise a mathematical model for a given task.
Notes
Do not confuse with speech understanding, which is the identification of the meaning of the utterance.
Do not confuse with speaker recognition, which is the identification of a speaker within a set of speakers.
– Main Problem: The speaker may not want to be recognised.
Do not confuse with speaker verification, which verifies whether a speaker is who he or she claims to be.
– Main Problem: The speaker may have a pharyngeal problem that alters the voice.
Word Speech Recognition
[Block diagram. Labels: User, Speech, Word Recognition System, Set of valid words, Syntax.]
Model for Speech Understanding
[Block diagram. Labels: User, Speech, Word Recognition Model, Higher Level Processing, Voice Output, Syntax, Semantics, Pragmatics, Task Description.]
Speech Recognition Disciplines
Signal Processing: Spectral analysis.
Physics (Acoustics): Human hearing studies.
Pattern Recognition: Data clustering.
Communication and Information Theory: Statistical models, Viterbi algorithms, etc.
Linguistics: Grammar and language parsing.
Physiology: Knowledge-based systems.
Computer Science: Efficient algorithms, UNIX, C language.
History (50’s)
Speaker Dependent Isolated Digit Recognition System (Bell Labs, 1952).
Phone recogniser (4 vowels and 9 consonants) (UCL, 1959).
– Statistical recognition
Speaker-independent recognition of 10 vowels (MIT, 1959).
History (60’s)
Hardware vowel recogniser (Radio Research Lab. in Tokyo, 1960).
Hardware phoneme recogniser (Kyoto University, 1962).
Realistic solution to the problem of nonuniformity of time scales in speech events (RCA Labs, 1964).
DTW (Soviet Union, 1968). Rediscovered in the West in the 80’s.
Continuous tracking of phonemes (CMU, 1966).
History (70’s)
Research effort in isolated-word recognition.
Dynamic programming methods successfully applied to speech recognition.
Use of LPC in speech recognition.
Work starts on speaker-independent speech recognition.
History (80’s)
Research effort in connected-word recognition.
Shift from the template-based approach to statistical modelling methods (especially HMMs).
Applications of neural networks to speech recognition.
Large impetus to large-vocabulary, continuous speech recognition.
DARPA (Defense Advanced Research Projects Agency) sponsored a large research program to obtain high recognition performance on a 1000-word database.
History (90’s)
DARPA project.
Emphasis on natural language.
– Spontaneous speech
Speech technology used within telephone networks.
Why is it difficult?
Speech is a complex combination of information from different levels that is used to convey a message.
Signal variability:
– Intra-speaker variability: emotional state, environment (Lombard effect).
– Inter-speaker variability: physiological differences, accent, dialect, etc.
– Acoustic channel: telephone channel, background noise/speech, etc.
Task classification
Mode of speaking: isolated word, connected word, continuous.
Speaker set: speaker dependent, multi-speaker, speaker independent.
Environment: noise free, office, telephone, high noise.
Vocabulary: small (<50), medium (<500), large (<5000), very large (>5000).
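The vocabulary size classes listed above can be written as a small lookup. This is only an illustrative sketch: the function name `vocabulary_class` is hypothetical, and because the slide leaves a vocabulary of exactly 5000 words unassigned, the sketch assumes it counts as "very large".

```python
def vocabulary_class(n_words: int) -> str:
    """Map a vocabulary size to the class names used on the slide."""
    if n_words < 0:
        raise ValueError("vocabulary size cannot be negative")
    if n_words < 50:
        return "small"
    if n_words < 500:
        return "medium"
    if n_words < 5000:
        return "large"
    # Assumption: the slide does not say where exactly 5000 belongs;
    # here it is grouped with the sizes above 5000.
    return "very large"


if __name__ == "__main__":
    # A digit recogniser has a 10-word vocabulary.
    print(vocabulary_class(10))    # small
    # The 1000-word task mentioned in the history slides.
    print(vocabulary_class(1000))  # large
```

Under these thresholds a digit recogniser is a small-vocabulary task, while the 1000-word task from the 80's falls in the large class.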