UTI Ultrasound Tongue Imaging


LOT summer school
Ultrasound, phonetics, phonology:
Articulation for Beginners!
James M Scobbie
CASL Research Centre
With special thanks to collaborators
Jane Stuart-Smith & Eleanor Lawson
Joanne Cleland & Zoe Roxburgh
Natasha Zharkova, Laura Black, Steve Cowen
Reenu Punnoose, Koen Sebregts
Sonja Schaeffler & Ineke Mennen
Conny Heyde
Alan Wrench (aka Articulate Instruments Ltd) for AAA software and UTI hardware
Various funding – thank you to ESRC, EPSRC, QMU
June 2013
Introduction to articulation
Brief overview of techniques
Ultrasound tongue imaging
Technical issues and the nitty gritty of data
Maybe a linguistic illustration
– Malayalam liquids
Why study articulation?
• It underlies acoustic and visual elements of
speech, but is only a means to an end
• It tells you about what speakers actually do
• It might be what the speaker intends to control
• Some bits of speech are silent
• It is interesting in its own right because it is a
complex multichannel linguistic phenomenon
• Applications – clinical, military, HCI, L2
• Comparison with sign language and gesture
• Provides evidence for phonological patterns
Why study articulation?
• Silent articulations
– Pre-speech, post-speech
– Speech errors
– Voiceless stops
– Listening, turn taking
• Covert contrasts
– In acquisition, therapy, L2 learning, sociolinguistics
• Covert errors
– In acquisition, therapy, L2 learning
• Articulation / acoustics relationship in
segmental and prosodic speech
Easy topics for research
Techniques for speech analysis?
• Flesh-point tracking
– EMA Electromagnetic Articulography
– Motion-capture
• Constriction tracking
– EPG Electropalatography
– EGG / Laryngograph & transglottal illumination
• Parameter tracking / indirect analysis
– Airflow
– Intraoral pressure sensing
– EMG Electromyography or muscular measures
– Acoustic analysis
• formants, pitch, voice quality, constriction types, VOT etc.
Quantitative articulatory approaches
• Video (including basic photos/still frames)
– Regular video (25fps PAL or 30fps NTSC), often
underlyingly a higher image / refresh rate (de-interlaceable)
– From cheap camcorders to endoscopy for internal images
• X-ray stills and X-ray cinematography
• MRI Magnetic Resonance Imaging & CT scanning
– Excellent resolution of superficial and deep features in 3D
for static images
– Bigger voxels, more grainy, more processing at faster frame
rates, great prospects in next few years
• UTI Ultrasound Tongue Imaging
– Regular video outputs or
– Digital (“high speed”) cineloop outputs
• EMA: gluing and sitting in the cube: 2h+ data
• UTI: probe fitting and the headset: “30m” data
– Short sessions, outputs are image sequences,
needs synchronisation, captures root
What hi-tech speakers go through
• Video playback in PowerPoint / media players doesn’t
really convey the spatio-temporal nature of the data
Vid UTI: ECB08 spontaneous dialogue
• What do you do when you “say hello”…?
• Description…?
What about something easier?
• Same speaker?
• What do you do when you “say hello”…?
Video can tell you a lot
• Easy-to-get images are only 2 dimensional
– The head and vocal tract are 3D objects
– Which 2 dimensional plane do you want to study?
• Speaker & camera move relative to each other
– False motion of articulators within the plane
– Towards or away from the camera, changing scale
– And rotations mean a different plane is shown
• Not many frames per second
– Potential for smearing in time
– Missing key events completely
– Weak and/or variable synch with acoustics
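To put rough numbers on "not many frames per second", here is a minimal Python sketch that converts a frame rate into the time between frames and into the average number of frames falling inside a short articulatory event. The function names and the 30 ms event duration are illustrative assumptions, not values from the slides.

```python
def frame_interval_ms(fps: float) -> float:
    """Time between successive frames, in milliseconds."""
    return 1000.0 / fps


def frames_spanning(event_ms: float, fps: float) -> float:
    """Average number of frames that fall within an event of this duration."""
    return event_ms * fps / 1000.0


if __name__ == "__main__":
    EVENT_MS = 30.0  # hypothetical short event, chosen only for illustration
    for fps in (25, 30, 60, 120):
        print(f"{fps:>4} fps: {frame_interval_ms(fps):5.1f} ms per frame, "
              f"{frames_spanning(EVENT_MS, fps):.2f} frames per {EVENT_MS:.0f} ms event")
```

When the average falls below one frame per event, some events will be missed completely, which is the risk the slide points to.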
Drawbacks are not unique
• To get data in more than one plane, let alone
enough to make a 3D image that moves in
… means sacrifices
– Lower spatial resolution
– Lower temporal resolution
3D / 4D?
Ultrasound Tongue Imaging
• Ultrasound as a tongue imaging technique
• Relatively cheap, non-invasive and accessible
– Fieldwork
– Clinical diagnostics
– Child language acquisition
– Standard laboratory phonetics & phonology
• Real time visual biofeedback
– Phonetics and linguistics teaching
– Clinical intervention
– L2 teaching and personal training
UTI applications
• Quick, portable, cheap, live/realtime, “comfyish”
• Synchronisation with audio, probe movement
• Applications
– Clinic
– Teaching
– Piloting
– Outreach
– Fieldwork
– Discourse
– Infants
– pT Research!
hand held & live video-mode
• Articulate Assistant Advanced (AAA)
• ~120fps hs-UTI: raw probe echo-location data
is stored and re-imaged on the fly
– Up to ~400fps available
• 135° Field of View
• ~60fps de-interlaced lip camera
stored as uncompressed bitmap
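"Re-imaged on the fly" can be pictured as mapping each echo, stored per scanline and depth, onto an x/y image. The sketch below is not the AAA implementation. It assumes an idealised fan with evenly spaced scanlines and the probe at the origin. The 135° field of view is from the slide, but the scanline count is a hypothetical value and all function names are made up for illustration.

```python
import math

FIELD_OF_VIEW_DEG = 135.0  # from the slide
N_SCANLINES = 64           # hypothetical: the slides do not give a scanline count


def scanline_angle(i, n_lines=N_SCANLINES, fov_deg=FIELD_OF_VIEW_DEG):
    """Angle (radians) of scanline i, measured from the centre line of the fan."""
    step_deg = fov_deg / (n_lines - 1)
    return math.radians(-fov_deg / 2 + i * step_deg)


def echo_to_xy(i, depth_mm, n_lines=N_SCANLINES, fov_deg=FIELD_OF_VIEW_DEG):
    """Map an echo at depth_mm along scanline i to (x, y) in mm, probe at the origin."""
    theta = scanline_angle(i, n_lines, fov_deg)
    return depth_mm * math.sin(theta), depth_mm * math.cos(theta)


def line_spacing_mm(depth_mm, n_lines=N_SCANLINES, fov_deg=FIELD_OF_VIEW_DEG):
    """Arc distance between neighbouring scanlines at a given depth."""
    return depth_mm * math.radians(fov_deg / (n_lines - 1))


if __name__ == "__main__":
    for depth in (20, 50, 80):
        print(f"depth {depth} mm: neighbouring scanlines are "
              f"{line_spacing_mm(depth):.2f} mm apart")
```

Under these assumptions the gap between scanlines grows in proportion to depth, so spatial detail is coarser for surfaces far from the probe.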
QMU AAA multichannel lab set-up
MC suburb
and the AAA multichannel system
• Data collection and analysis in a fast single
dedicated software environment
– Ultrasonix high speed UTI (no post-processing)
– Various video UTI or camcorder systems
– Same annotation & display software for EPG, EMA
• Custom-made multichannel synchronisation
– Video via “synch-brightup” clapperboard on AD of
video images, with built in batch-processing
• De-interlacing from ~30 to ~60fps
• Offset of clapper-frame to adjust for UTI creation (~20ms)
• Semi-automatic edge-detection
– Smoothing & confidence-rating over 42 points
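The de-interlacing and clapper-frame offset steps above can be sketched as follows. This is a hedged illustration rather than the AAA batch process: it treats a frame as a list of pixel rows, assumes the even-row field is the earlier one, and assumes the ~20ms offset is subtracted from image times. The field order, the sign of the offset and all function names are assumptions.

```python
def deinterlace(frame):
    """Split one interlaced frame (a list of pixel rows) into its two fields."""
    return frame[0::2], frame[1::2]  # even rows, odd rows


def field_times_ms(n_frames, frame_rate=30.0, clapper_frame=0, offset_ms=20.0):
    """Time of every de-interlaced field relative to the synch event.

    clapper_frame: index of the frame on which the synch flash appears.
    offset_ms: assumed image-creation delay, subtracted from each field time.
    """
    field_period = 1000.0 / (2 * frame_rate)  # two fields per interlaced frame
    return [
        (2 * (f - clapper_frame) + k) * field_period - offset_ms
        for f in range(n_frames)
        for k in (0, 1)
    ]


if __name__ == "__main__":
    toy_frame = [[10, 10], [20, 20], [30, 30], [40, 40]]
    even_field, odd_field = deinterlace(toy_frame)
    print(even_field, odd_field)
    print(field_times_ms(3))
```

Each interlaced frame yields two images half a frame period apart, which is how ~30fps video becomes ~60fps data.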
• Ultrasound gives rise to
– Artefacts from parallel tongue & echo pulse beams
– Missing data
• Between scan lines, or beyond the scan area
• Behind bones
• Above a sublingual cavity (aka losing the tip)
– Grainy data or poor resolution
• Older speakers, dry mouth, beards, etc.
• Tongue surface when it’s far from the probe
• When tongue is parallel to the scanlines
• Problems with stabilisation & synchronisation
– Technical solutions can lead to speaker fatigue
Some problems
• We only have mid-sagittal tongue curves
– Not passive articulators
– Not all the tongue surface
– Not all the internal tongue tissue
– Not lips
• But unlike EMA
– We are not limited to 3 or 4 anterior points
• And unlike MRI
– UTI is cheap, non-invasive, portable and quick
– For small datasets analysis is quick… we can
collect & trace 12 tokens of 5 vowels in half a day
With UTI…
Real time visual
• Have a go…
• Front/back vowels
• Dutch & English /r/
• Dutch & English /s/
and English “sh”
• Swallowing
Annotation and analysis
• Basic overview, using
some good data
• Annotation and
• Drawing some splines