Neuro-IT Roadmap: Successful in the Physical World
• Robust perception
• Image processing
• Speech recognition
• Multimodal human machine interaction
• System integration
• Scene analysis and representation
Dr. Werner Hemmert, CPR ST
2003-12-02 Page 1
Automotive: Overtake-Checker and Door-Opener Assistant
[Figure: processing chain, panels a to f]
• Image
• Lane
• Lane-based transformation
• Contour extraction
• Motion estimation along contours
• Temporal feedback
• Temporally stabilized motion segmentation
• Vehicle detection
• Vehicles
Dr. Axel Techmer
Infineon Technologies
Security: Face Detection & Recognition
Leading-edge approach to face detection (University of Bochum)
Detection of face regions (a)
Pre-selection of frontal faces (b)
Face recognition (c,d)
Elastic graph matching
Gabor Wavelet Transform
[Figure: panels a to d]
Ruhr University Bochum
Vision Instruction Processor (VIP)
Infineon Technologies, Corporate Research, Systems Technology
Prototype available since May 2001:
• SIMD architecture
• 16 parallel processing elements
• 204 instructions
• 10 million logic transistors
• On-chip memory: 37 KB
• Technology: 0.35 µm
• Clock: 100 MHz
• Peak performance: 53 GOPS
• Power consumption: 100 µW/MOPS
• Die size: 22 mm x 23 mm
PCI board with VIP and camera submodules
In 0.13 µm CMOS technology:
• Clock: 200 MHz
• Peak performance: 106 GOPS
• Die size: 70 mm²
• Power consumption: 700 mW
Software tools for VIP:
• Compiler, debugger, profiler
Software tools on host:
• MS Visual C++ with VPL++ library
Application demonstrators:
• Car vision, face recognition, MPEG2, graphics
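The VIP figures quoted on this slide can be cross-checked with a little arithmetic. The sketch below is only a back-of-the-envelope calculation: the function names are made up for illustration, and the power-at-peak value assumes that the quoted 100 µW/MOPS still holds at full throughput, which the slide does not state.

```python
# Figures quoted on the slide for the two VIP variants.
PROTOTYPE = {"clock_mhz": 100, "peak_gops": 53, "uw_per_mops": 100}
SHRINK_013 = {"clock_mhz": 200, "peak_gops": 106, "power_mw": 700}


def ops_per_cycle(peak_gops, clock_mhz):
    """Peak operations completed per clock cycle."""
    return peak_gops * 1e9 / (clock_mhz * 1e6)


def power_at_peak_w(peak_gops, uw_per_mops):
    """Power in watts at peak throughput (assumes the per-MOPS cost is constant)."""
    return peak_gops * 1e3 * uw_per_mops * 1e-6


def implied_uw_per_mops(power_mw, peak_gops):
    """Power cost per MOPS implied by total power and peak throughput."""
    return power_mw * 1e3 / (peak_gops * 1e3)


prototype_ops = ops_per_cycle(PROTOTYPE["peak_gops"], PROTOTYPE["clock_mhz"])
shrink_ops = ops_per_cycle(SHRINK_013["peak_gops"], SHRINK_013["clock_mhz"])
prototype_watts = power_at_peak_w(PROTOTYPE["peak_gops"], PROTOTYPE["uw_per_mops"])
shrink_cost = implied_uw_per_mops(SHRINK_013["power_mw"], SHRINK_013["peak_gops"])
```

Both variants come out at the same number of operations per cycle, which is consistent with the 0.13 µm version being the same architecture at twice the clock rate.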
Car Vision Components - Hardware
[Block diagram with the labels: CPU, vehicle control, other sensors]
Dr. Axel Techmer
Infineon Technologies
Neuro-IT Roadmap: Successful in the Physical World
Robust perception
Image processing
• Speech recognition
• Multimodal human machine interaction
• System integration
• Scene analysis and representation
Classical Sound Processing for Speech Recognition
[Block diagram of the processing chain]
• Microphone → filter → A/D (8 kHz)
• |FFT|: 25 ms window every 10 ms, 100 frequencies (40 Hz to 4 kHz)
• LOG & threshold
• Mel transformation: 24 channels
• Smoothed cepstrum & loudness, normalized: 12 components
• Features: components, first derivatives (d/dt), second derivatives (d/dt): 36 features
• Hidden Markov Model
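The numbers on this slide fit together: an 8 kHz sampling rate and a 25 ms window give 200 samples per frame, hence 100 frequency bins spaced 40 Hz apart up to 4 kHz, and 12 components plus their first and second derivatives give 36 features. The sketch below covers only the framing and |FFT| stage. It is a minimal illustration, not the recognizer's actual code: it uses a naive DFT from the standard library instead of a real FFT, and all names are invented for the example.

```python
import cmath
import math

SAMPLE_RATE_HZ = 8000   # A/D rate from the slide
WINDOW_MS = 25          # analysis window from the slide
HOP_MS = 10             # a new frame every 10 ms

WINDOW_LEN = SAMPLE_RATE_HZ * WINDOW_MS // 1000   # samples per frame
HOP_LEN = SAMPLE_RATE_HZ * HOP_MS // 1000         # samples between frame starts
BIN_SPACING_HZ = SAMPLE_RATE_HZ / WINDOW_LEN      # spacing of the frequency bins
NUM_BINS = WINDOW_LEN // 2                        # bins from 40 Hz up to 4 kHz
NUM_FEATURES = 12 * 3                             # components + 1st + 2nd derivatives


def frames(signal):
    """Split a signal into overlapping 25 ms frames taken every 10 ms."""
    last_start = len(signal) - WINDOW_LEN
    return [signal[s:s + WINDOW_LEN] for s in range(0, last_start + 1, HOP_LEN)]


def magnitude_spectrum(frame):
    """Return |DFT| for bins 1..NUM_BINS of one frame (DC bin excluded)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(1, NUM_BINS + 1)]


# Example input: a 1 kHz tone lasting 100 ms.
tone = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE_HZ) for t in range(800)]
tone_frames = frames(tone)
spectrum = magnitude_spectrum(tone_frames[0])
peak_hz = (spectrum.index(max(spectrum)) + 1) * BIN_SPACING_HZ
```

A production front end would apply a window function and use an FFT; both are left out here to keep the frame arithmetic visible.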
Speech production: time waveform
|FFT| resolves neither frequency nor temporal structure
20 ms window, |FFT|:
• frequency resolution: 50 Hz
• temporal resolution: 20 ms
Classical Sound Processing for Speech Recognition
The time structure of the speech signal (<20 ms) is lost in the magnitude spectrum (|FFT|).
Humans extract both temporal and spectral information for robust speech recognition.
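The loss of time structure inside one analysis window can be shown with a toy example. The sketch below builds two 20 ms frames that differ only in when a click occurs and compares their magnitude spectra. The 8 kHz rate is the A/D rate quoted for the classical front end; the click positions and all names are arbitrary choices for illustration.

```python
import cmath
import math

FRAME_LEN = 160  # 20 ms at 8 kHz


def dft_magnitudes(frame):
    """Magnitude of every DFT bin of one analysis frame."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n)]


# Two frames that differ only in the timing of a single click.
early_click = [0.0] * FRAME_LEN
late_click = [0.0] * FRAME_LEN
early_click[10] = 1.0    # click 1.25 ms into the frame
late_click[130] = 1.0    # click 16.25 ms into the frame

early_mag = dft_magnitudes(early_click)
late_mag = dft_magnitudes(late_click)

# Largest difference between the two magnitude spectra.
max_diff = max(abs(a - b) for a, b in zip(early_mag, late_mag))
```

The two magnitude spectra agree to within rounding error although the clicks are 15 ms apart: the timing is carried entirely by the phase, which |FFT| discards.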
Auditory Sound Processing
[Figure: sound signal → ear canal → middle ear]
Auditory Sound Processing
[Figure: sound signal → ear canal → middle ear → inner ear hydrodynamics; scale bar: 100 µm]
Dynamic Compression in the Inner Ear
Inner ear model responses to 1 kHz tones
[Figure: BM displacement (m), from 10^-10 to 10^-6, versus cochlear location (mm), from 0 (basal) to 35 (apical), for levels from 0 to 120 dB SPL in 20 dB steps; annotations: BW, speech range]
Auditory Sound Processing
[Figure: sound signal → ear canal → middle ear → inner ear hydrodynamics → sensory cell → synaptic mechanisms]
Coding of Sound into Action Potentials
Regular firing pattern (Δt = 10 ms, f0 = 100 Hz)
[Figure: firing pattern over time (ms), from 0 to 100, versus cochlear location (mm), from 5 to 30, ordered from high to low frequency; F3, F2, F1 and F0 marked]
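The relation between the firing interval and the fundamental frequency on this slide is f0 = 1/Δt. A minimal sketch of reading f0 off a spike train follows; the spike times are hypothetical and the function name is invented for the example.

```python
def fundamental_from_spikes(spike_times_ms):
    """Estimate f0 in Hz as the inverse of the mean inter-spike interval."""
    intervals = [b - a for a, b in zip(spike_times_ms, spike_times_ms[1:])]
    if not intervals:
        raise ValueError("need at least two spikes")
    mean_interval_ms = sum(intervals) / len(intervals)
    return 1000.0 / mean_interval_ms


# Hypothetical spike train: one spike every 10 ms over a 100 ms span.
spikes_ms = [10.0 * i for i in range(11)]
f0_hz = fundamental_from_spikes(spikes_ms)
```

With a spike every 10 ms the estimate is 100 Hz, matching the values on the slide.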
Spectral and Temporal Sound Processing in the Auditory Pathway
Neuro-IT Roadmap: Successful in the Physical World
Robust perception
Image processing
Speech recognition
• Multimodal human machine interaction
• System integration
• Scene analysis and representation
Audio-Visual Speech Recognition
Tracking of lip motion with sub-pixel precision
“two - one - seven - three - five - nine - eight - zero - four - six”
[Figure: variation of mouth width, mouth height and nose-to-chin distance over time (s), from 0 to 12, with a 10-pixel scale bar; the tracks feed a Hidden-Markov speech recognizer]
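The three visual tracks named on this slide (mouth width, mouth height, nose-to-chin distance) are simple distances between tracked points. The sketch below shows how such a per-frame feature vector could be computed. The landmark set, its field names and the example coordinates are hypothetical; the slides do not describe how the tracker represents its output.

```python
import math
from typing import NamedTuple, Tuple

Point = Tuple[float, float]  # (x, y) in pixels; fractional values allow sub-pixel positions


class LipFrame(NamedTuple):
    """Hypothetical landmark set for one video frame."""
    left_corner: Point
    right_corner: Point
    upper_lip: Point
    lower_lip: Point
    nose: Point
    chin: Point


def distance(a: Point, b: Point) -> float:
    """Euclidean distance between two image points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def visual_features(frame: LipFrame) -> Tuple[float, float, float]:
    """Mouth width, mouth height and nose-to-chin distance, in pixels."""
    return (distance(frame.left_corner, frame.right_corner),
            distance(frame.upper_lip, frame.lower_lip),
            distance(frame.nose, frame.chin))


# Made-up coordinates for a single frame.
example = LipFrame(left_corner=(100.0, 200.0), right_corner=(140.0, 200.0),
                   upper_lip=(120.0, 195.0), lower_lip=(120.0, 205.0),
                   nose=(120.0, 160.0), chin=(120.0, 240.0))
width_px, height_px, nose_chin_px = visual_features(example)
```

One such triple per video frame would form the visual feature stream that accompanies the audio features.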
Multimodal: pointing, gaze, gestures, facial expressions, …
Dr. Axel Steinhage, Infineon Technologies AG
Neuro-IT Roadmap: Successful in the Physical World
Robust perception
Image processing
Speech recognition
Audio-visual speech recognition
Multimodal human machine interaction
• System integration
• Scene analysis and representation
Man-Machine-Interaction based on natural communication channels
Dr. Axel Steinhage, Infineon Technologies
• Virtual Personal Assistant (VPA)
• Items presented by VPA
• Cheap sensors (webcam, microphone)
• Natural channels: speech, lip motion, gestures, ...
• Interactive communication between user and VPA
Man-Machine-Interaction based on natural communication channels
Advanced Videophone
• Human expert via Advanced Videophone (HHI)
What do we gain from Neuro-IT?
• Sensitive Sensors
• Robust perception
Robust processing
“Tools for Neuroscience”
• Speech recognition
“Successful in the Physical World”
• Image processing
• Scene analysis and representation
World knowledge
“Constructed brain”
• Intelligent human-machine interaction
• Natural feedback
• Intelligent virtual person
“Conscious Machines”
• Self-learning software “Factor 10”
• Massively parallel processing hardware
Digital and/or analog neuronal networks
Neuro-IT Roadmap: Successful in the Physical World
Werner Hemmert
Infineon Technologies AG
CPR-ST
Prof. Dr. Dr. h.c. H.-P. Zenner
Prof. Dr. A.W. Gummer
Prof. Dr. D.M. Freeman
Dr. M. Mermelstein, B. Tsai
MIT Micromechanics Group
U. Dürig, M. Despont, G. Genolet,
U. Drechsler, P. Vettiger, G. Binnig
Explore the Future Corporate Research
Prof. Dr. U. Ramacher
J.-P. de la Cruz-Guiterrez, M. Holmberg
Dr. A. Steinhage, Dr. A. Techmer