ViSiCAST 2002 Technical Audit
4 October 2002, Brussels
Michele Wakefield - Project Manager, ITC
The ViSiCAST Project
Virtual Signing: Capture, Animation, Storage and Transmission
Aims of ViSiCAST Project
“…support improved access by deaf citizens to information and services in sign language”
by successfully developing:
signing systems for broadcast, WWW & ‘over the counter’ type applications
user-friendly methods to capture & generate signs
a machine-readable system to describe gestures
for people whose preferred medium is sign language
ViSiCAST Consortium
Hamburg University
University of East Anglia
Televirtual
Institut National des Télécommunications
Independent Television Commission
Institut für Rundfunktechnik
Instituut voor Doven
The Post Office
Royal National Institute for Deaf People
Project Dimensions
Duration: 36 months
Start: January 2000
Finish: December 2002
Total Costs
3770 kECU total
2876 kECU funding from EC
ViSiCAST Project Highlights
Signing transmissions demonstrated at IBC 2002
MPEG-4 compliant INT-IRT demonstrator to deliver an open signing service for broadcast DTV
BBC demonstrator to deliver closed DTV signing service
Translate simple sentences in real time to sign animation
WWW Weather-forecaster launched in the Netherlands
Interactive sign language learning tool
2nd trial of TESSA system now nationwide and RNID re-promoting ViSiCAST
after success of pilot at Science Museum, London
encouraging national media coverage
ViSiCAST Project Structure
Technology: Animation, Linguistics
User Application: Broadcast, Internet, Community, Evaluation
Exploitation & Dissemination
Presentations by Core Streams
Technology: Animation & Linguistics
WP4 Animation
WP5 Linguistics
User: Applications
WP2 Sign Tutor
WP1 Broadcast
WP2 WWW
WP3 Face to Face
WP6 Usability
Exploitation & Dissemination
Presentation by Streams: Animation
WP4 Animation
Increased realism in sign generation
Enhanced signing experience
WP5 Sign Language Linguistics
Use of natural sign language
Synthesis of sign language gestures
Animation Work: Objectives
WP4:
Develop high-resolution avatars with related capture and animation
To enable and support application development in WPs 1, 2 and 3 using WP4 (& WP5) products
To further develop, compare and integrate both proprietary and standard solutions, where appropriate, in networked environments
Technology: WP4, Animation
At start of year:
Visia 2 running in Mask 1
Using motion capture data only
Reasonable animation, expression etc.
Technology: WP4, Animation
Visia 2 in MPEG-4
Mesh partitioned into anatomical segments
MPEG-4 compliant authoring tool
Animation editing tool
Server-client tool for TX of animation parameters
MPEG-4 SNHC player (<25 fps)
Embedded within an MPEG-4 set-top box
Technology: WP4, Animation
Visia 3
Updated virtual human
Higher resolution & polygon count, more realistic photographic textures
Improved articulation
Mesh distortion applied to garments
Facial expression via skeleton manipulation & morphs
Speech enabled
Technology: WP4, Animation
Visia 3
New host software: Mask TNG
Writing new ActiveX controls
Superior functionality, lighting and camera FX, image quality, frame rate, flexibility etc.
>75 fps
Technology: WP4, Animation
Visia 3 running in Mask TNG graph
Technology: WP4, Animation
Facial Morphs
Created in Maya, exported to Mask TNG
Based on sign language expressions (BSL Dictionary)
Inter-operable
Variable weighting (<100%+)
May be used with mo-cap data or for synthetic sign
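Facial morphs of this kind are commonly applied as linear blend shapes: each vertex moves from the neutral mesh towards each morph target in proportion to that target's weight, and several morphs can be combined. The sketch below assumes that standard linear formulation (it is not taken from Mask TNG) and reads "variable weighting (<100%+)" as weights that may lie below or above 1.0. The morph names are hypothetical.

```python
Vertex = tuple[float, float, float]


def blend_morphs(base: list[Vertex],
                 targets: dict[str, list[Vertex]],
                 weights: dict[str, float]) -> list[Vertex]:
    """Linear blend shapes: v = base + sum_k w_k * (target_k - base).

    Weights are not clamped, so a weight of 1.2 exaggerates a morph
    beyond its sculpted shape (assumed reading of "<100%+").
    """
    result = [list(v) for v in base]
    for name, weight in weights.items():
        target = targets[name]
        if len(target) != len(base):
            raise ValueError(f"morph {name!r} has a different vertex count")
        for i, (b, t) in enumerate(zip(base, target)):
            for axis in range(3):
                # Add this morph's weighted offset from the neutral vertex.
                result[i][axis] += weight * (t[axis] - b[axis])
    return [(v[0], v[1], v[2]) for v in result]


if __name__ == "__main__":
    neutral = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    morphs = {  # hypothetical morph names
        "brow_raise": [(0.0, 1.0, 0.0), (1.0, 1.0, 0.0)],
        "jaw_drop": [(0.0, -1.0, 0.0), (1.0, 0.0, 0.0)],
    }
    print(blend_morphs(neutral, morphs, {"brow_raise": 0.5}))
```

Because offsets add linearly, combining morphs is order-independent, which is one reason this formulation suits both mo-cap-driven and synthetic expressions.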
Technology: WP4, Animation
Facial Animation - Experimental Work
Tracking of Active Shape Models
Tracking of Active Appearance Models
Technology: WP4, Animation
Facial Animation - Experimental Work
Vision-based motion capture of facial expressions using MPEG-4 compliant templates.
WP4: Synthetic Animation Introduction
Task:
Make avatar do signing synthetically as specified by ViSiCAST’s Signing Gesture Markup Language (SiGML)
Motive:
Synthetic animation is more flexible than animation via motion capture: “just write some more SiGML”
Support Natural-Language-to-Animation strategy of WP4-5
In broadcasting applications: put synthetic player on receiver and transmit SiGML, at very low bandwidth
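To illustrate why transmitting a gesture description is far cheaper than transmitting animation frames, the sketch below builds a tiny posture-plus-motion description with Python's standard `xml.etree.ElementTree` and compares its size with raw per-frame bone data. The element and attribute names (`signing`, `sign`, `posture`, `handshape`, `motion`, etc.) are hypothetical placeholders, not the real SiGML schema. The comparison assumes the "4 Kb per bone-set" figure means 4 kilobytes, 25 fps, and a one-second sign.

```python
import xml.etree.ElementTree as ET


def build_sign(gloss: str, handshape: str, palm: str, finger: str,
               location: str, motion: str) -> ET.Element:
    # Hypothetical element/attribute names, for illustration only.
    sign = ET.Element("sign", gloss=gloss)
    posture = ET.SubElement(sign, "posture")
    ET.SubElement(posture, "handshape", value=handshape)
    ET.SubElement(posture, "orientation", palm=palm, extended_finger=finger)
    ET.SubElement(posture, "location", value=location)
    ET.SubElement(sign, "motion", kind=motion)
    return sign


doc = ET.Element("signing")
doc.append(build_sign("HELLO", "flat", "away", "up", "forehead", "straight"))
markup = ET.tostring(doc, encoding="unicode")
markup_bytes = len(markup.encode("utf-8"))

# Assumed: 4 KB per bone-set, 25 frames per second, 1-second sign.
bone_bytes = 4000 * 25 * 1

if __name__ == "__main__":
    print(markup)
    print(f"markup: {markup_bytes} bytes, bone data: {bone_bytes} bytes")
```

The receiver-side synthetic player would expand such a description into frames locally, so only the short markup crosses the broadcast channel.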
WP4: Synthetic Animation Context
Televirtual avatar is a deformable textured mesh
Mesh shape and position are determined by configuration of underlying skeleton
Skeleton configuration: a.k.a. “Bone-Set”
To animate avatar: need to generate stream of Bone-Sets, one per frame of animation
i.e. BAF data stream (BAF = “Bones Animation Format”)
Data intensive: 4 Kb per Bone-Set
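The per-frame size above becomes a stream rate once a frame rate is fixed. The sketch below assumes 25 fps (the PAL frame rate) and, because "Kb" can mean kilobytes or kilobits, computes both readings with 1 k = 1000. Neither the frame rate nor the unit is stated on the slide.

```python
def raw_stream_kbit_s(frame_size_k: int, fps: int, unit_is_bytes: bool) -> int:
    """Uncompressed bit rate, in kbit/s, of a stream sending one frame_size_k block per frame."""
    kbits_per_frame = frame_size_k * 8 if unit_is_bytes else frame_size_k
    return kbits_per_frame * fps


if __name__ == "__main__":
    # Assumed 25 fps; "4Kb" read both ways.
    print("4 kilobytes/frame:", raw_stream_kbit_s(4, 25, unit_is_bytes=True), "kbit/s")
    print("4 kilobits/frame: ", raw_stream_kbit_s(4, 25, unit_is_bytes=False), "kbit/s")
```

Either reading gives a continuous stream in the hundreds of kbit/s range before any compression, which is why sending a compact gesture description and synthesising Bone-Sets at the receiver is attractive.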
WP4: Synthetic Animation Technical Approach
SiGML specifies gestures through:
Postures:
hand shape
hand orientation - palm and extended finger direction
position of hand(s) in signing space
Motions - straight-line, circular, zig-zag etc.
Synthetic Animation Engine:
specifies hand bone configuration for given posture
configures arm/shoulder bones using Inverse Kinematics
implements transition from one posture to the next using nonlinear interpolation, often via control system modelling
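Two of the engine steps above, solving arm angles by inverse kinematics and moving between postures with control-system-style interpolation, can be illustrated on a simplified arm. This is a minimal sketch under loud assumptions: a planar two-link arm (upper arm and forearm, shoulder at the origin) rather than the avatar's 3D skeleton, and a critically damped second-order response as the nonlinear interpolation profile. Link lengths, frame counts and the stiffness `omega` are hypothetical values.

```python
import math


def forward(theta1: float, theta2: float, l1: float, l2: float) -> tuple[float, float]:
    """Wrist position of a planar two-link arm with the shoulder at the origin."""
    x = l1 * math.cos(theta1) + l2 * math.cos(theta1 + theta2)
    y = l1 * math.sin(theta1) + l2 * math.sin(theta1 + theta2)
    return x, y


def inverse(x: float, y: float, l1: float, l2: float) -> tuple[float, float]:
    """Analytic IK: shoulder and elbow angles placing the wrist at (x, y)."""
    c2 = (x * x + y * y - l1 * l1 - l2 * l2) / (2 * l1 * l2)
    # Clamp: out-of-reach targets give a fully stretched (or folded) arm.
    c2 = max(-1.0, min(1.0, c2))
    theta2 = math.acos(c2)  # picks one of the two elbow solutions
    theta1 = math.atan2(y, x) - math.atan2(l2 * math.sin(theta2), l1 + l2 * math.cos(theta2))
    return theta1, theta2


def critically_damped(t: float, omega: float) -> float:
    """Progress 0 -> 1 of a critically damped second-order system starting at rest."""
    return 1.0 - (1.0 + omega * t) * math.exp(-omega * t)


def transition(start, end, l1, l2, n_frames=12, fps=25.0, omega=12.0):
    """Arm angles for each frame as the wrist target moves from start to end."""
    poses = []
    for k in range(n_frames + 1):
        s = critically_damped(k / fps, omega)
        x = start[0] + s * (end[0] - start[0])
        y = start[1] + s * (end[1] - start[1])
        poses.append(inverse(x, y, l1, l2))
    return poses


if __name__ == "__main__":
    for theta1, theta2 in transition((0.35, 0.2), (0.1, 0.4), 0.30, 0.25):
        print(f"shoulder {theta1:+.3f} rad, elbow {theta2:+.3f} rad")
```

A critically damped profile accelerates smoothly out of the first posture and settles into the next without overshoot, a simple stand-in for the kind of nonlinear, physically motivated transition the slide describes.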
WP4: Synthetic Animation Progress (i)
Initial prototype (D4-2) delivered 2001-12
Supported most of manual SiGML
Implemented in Perl (interpreted scripting language)
BAF/VRML output to file, and then to avatar
Relatively slow: often < 15 fps
Perl module packaged as ActiveX control
relatively unwieldy architecture
Enhancements for 2002-02 (M5-11)
BAF data stream cached in memory, fed directly to avatar
Front-end (for WP5): HamNoSys input server, with built-in HamNoSys-to-SiGML translation
WP4: Synthetic Animation Progress (ii)
HamNoSys-to-Signing (Fast) 2002-06
Synthetic Animation Engine re-implemented in C++
50 times faster: generates approx. 1000 fps, supporting real-time streamed input (e.g. Broadcast, WWW)
More flexible framework: basis for improved authenticity
Modular system architecture: supports flexible application development, scripting in WWW pages, etc.
Upgrade to Mask2 2002-09
Interface to new primitive Mask2 ActiveX control
allows better control of animation frame scheduling
BAF replaced by VBM (ViSiCAST Bones and Morphs), which provides a framework for supporting non-manual SiGML
Presentation by Streams: Linguistics
WP5: Language Technology
Goal within the project:
To provide semi-automatic translation from English into BSL, DGS, NGT
Can also be used to assist the user in monolingual language input
No writing system for sign languages established
Presentation by Streams
Animation and Linguistics
User Applications
Exploitation and Dissemination
Presentation by Streams: Sign Tutor
WP2 Sign Tutor
WP1 Television: closed signing for broadcast DTT
WP2 Internet: information and education for sign language learners
WP3 Face to Face: High Street Post Office counter services
WP6 Comparison of virtual signing with video-recorded human signing
Presentation by Streams: Television
WP2 Sign Tutor
WP1 Television: closed signing for broadcast DTT
Enhanced signing experience
Regulation and Standards
WP2 Internet: information and education for sign language learners
WP3 Face to Face
WP6 Comparison of virtual signing
Virtual Humans on TV: The Advantages
Low transmission rate < 25 kbit/s
Compatibility with signing on other media and sign languages
Precise, sharp representation of signer
Open display options
Compliance with international standards: MPEG, DVB
Future-proof:
cost saving
allows a vast number of signed programmes
unified framework from video-based to VH signing
Broadcast VH Signing: Achievements
Integrated TX system for broadcast to STBs
Implementing virtual human s/w in STB
MPEG-2 delivery layer for maximum compliance:
with existing hardware
with MPEG & DVB standards
with proprietary formats
MPEG-4 Audio-Video codec and player
MPEG-4 compliant virtual human
MPEG-4 SNHC virtual human codec and player
MPEG-4 based closed signing service demonstrated at IBC 2002
Broadcast VH Signing: Functional architecture
[Figure: normative and proprietary paths. Encoder system: MPEG-2 AV encoder, MPEG-4 video encoder, MPEG-4 SNHC encoder and BAF encoder, followed by Packet and MUX; delivery as MPEG-2 TS. Decoder system: deMUX and dePacket, then MPEG-2 AV decoder, MPEG-4 video decoder, MPEG-4 SNHC decoder and BAF decoder, feeding the compositors of the MPEG-4 multimedia player and the proprietary multimedia player.]
Broadcast VH Signing: System layer implementation
[Figure: encoder system (UDP/TCP packetiser, IRT-DSP MPEG encoder, DVB modulator, TS) → delivery (RF) → system decoder (MPEG-2 receiver card, IP filter) → compositor]
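The delivery layer above carries data in an MPEG-2 transport stream, whose packets are 188 bytes long with a 4-byte header (sync byte 0x47, a 13-bit PID, a payload-unit-start flag and a 4-bit continuity counter). The sketch below packetises an arbitrary payload (for example, a block of animation parameters) into TS packets on one PID. It is deliberately simplified and is not the IRT-DSP system: it pads the last packet with 0xFF bytes instead of using an adaptation field or the DVB multiprotocol encapsulation sections a real IP-over-DVB data service would use. The PID value is hypothetical.

```python
TS_PACKET_SIZE = 188
TS_HEADER_SIZE = 4
SYNC_BYTE = 0x47


def packetise(payload: bytes, pid: int, start_cc: int = 0) -> list[bytes]:
    """Split payload into 188-byte TS packets on a single PID (simplified)."""
    if not 0 <= pid <= 0x1FFF:
        raise ValueError("PID must fit in 13 bits")
    chunk = TS_PACKET_SIZE - TS_HEADER_SIZE
    packets = []
    cc = start_cc & 0x0F
    for offset in range(0, max(len(payload), 1), chunk):
        body = payload[offset:offset + chunk]
        pusi = 1 if offset == 0 else 0  # payload_unit_start_indicator on first packet
        header = bytes([
            SYNC_BYTE,
            (pusi << 6) | ((pid >> 8) & 0x1F),  # TEI=0, PUSI, priority=0, PID high bits
            pid & 0xFF,                          # PID low bits
            (0b01 << 4) | cc,                    # not scrambled, payload only, continuity counter
        ])
        packets.append(header + body.ljust(chunk, b"\xff"))  # simplified stuffing
        cc = (cc + 1) & 0x0F
    return packets


def parse_header(packet: bytes) -> dict[str, int]:
    """Decode the header fields written by packetise()."""
    return {
        "pid": ((packet[1] & 0x1F) << 8) | packet[2],
        "pusi": (packet[1] >> 6) & 1,
        "cc": packet[3] & 0x0F,
    }


if __name__ == "__main__":
    for pkt in packetise(bytes(400), pid=0x100):  # hypothetical PID
        print(len(pkt), parse_header(pkt))
```

The continuity counter lets the receiver detect lost packets per PID, which matters when the signing data travels alongside the main programme in the same multiplex.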
Broadcast VH Signing: Perspectives
Advanced TX system for broadcast to MHP compliant STBs
Open, MPEG & DVB compliant architecture
Improved synchronisation layer
Integrating a compositing layer
Implementing an enriched MPEG-4 multimedia authoring tool
Integrating SiGML stream
Demonstration
Presentation by Streams: WWW (web pages with signing, field trials)
WP2 Sign Tutor
WP1 Television: closed signing for broadcast DTT
WP2 Internet: information and education for sign language learners
Web pages with signing
WP3 Face to Face: High Street Post Office counter services
WP6 Comparison of virtual signing
Weather Forecast Application
[Figure: content provider with forecast creation tool (1st demo); ‘play list’ (2nd demo); Internet; user web browser + plug-in; weather signs; avatar]
Demo
The Field Trials with Deaf Users
Hosting at site of Dutch Deaf organisation Dovenschap: www.dovenschap.org
Running from end-June until end-October
Deaf users can join the field trial by filling in a form on the website
CD-ROM with necessary software sent to users
Field Trial Promotion
70 e-mails to webmasters of Deaf clubs, Deaf schools, Deaf organisations and private sites of Deaf persons
promotion on Teletext (TV)
on informative websites for Deaf people
visit at meeting of national Deaf organisation with 12 member organisations
article in magazine for sign language interpreters
30 CD-ROMs sent to Deaf clubs and schools
Trial Feedback
Helpdesk, contacted by e-mail
Discussion page on website
Evaluation form on software and installation, included with the software received
Evaluation form on avatar and sign language, to be sent end of October 2002
Present Situation
Field trial still running
News slowly spreading
Positive reactions
Results at the end of November
Presentation by Streams – Face to Face
WP2 Sign Tutor
WP1 Television: closed signing for broadcast DTT
WP2 Internet
WP3 Face to Face: High Street Post Office counter services
Close involvement with RNID
WP6 Comparison of virtual signing with video-recorded human signing
WP3 Overview
Evaluation – October 2001
New TESSA system – Mar 2002
Post Office Trial – May 2002 – Present
Sign Recognition – April 2002 – Present
Evaluation – October 2001
Evaluation conducted at PO concept store using TESSA V3.
10 Deaf people and 5 counter clerks participated over 10 days.
Mirror of previous evaluation + some comparative tests of virtual signing with a video-recorded human signer (full details in WP6 presentation)
Evaluation – Observations
Clerks complained about the speed of transactions
Caused by:
Toggle switch for recogniser
Mis-recognitions caused by large vocabulary
Poor mapping from recognised speech to phrases
Cumbersome graphical interface
TESSA V4 – Recognition System
‘Bag of words’ language model: Hello, Goodbye, Going, Where, First, Second, Class, …
– Only words relevant to post office phrases recognised
– Many fewer insertion errors
– More resilient to external noise
TESSA V4 – Phrase Mapping
[Figure: word/phrase co-occurrence matrix; rows are vocabulary words (A, About, Access, Account, …, You, you’ve, Your), columns are Phrase 1, Phrase 2, Phrase 3, …, Phrase N; entries are word counts]
Phrase mapping system derived from work on Automatic Call Routing
Represent each of the signed phrases and the test phrase as vectors in a co-occurrence matrix
TESSA V4 – Phrase Mapping
Weight the entry W(i,j) such that:
$$W(i,j) \leftarrow \bigl(1 + \log W(i,j)\bigr)\cdot\left(-\frac{1}{\log N}\sum_{j=1}^{N}\Pr(p_j \mid w_i)\,\log \Pr(p_j \mid w_i)\right)$$
• Calculate the distance between the vectors representing each canonical phrase and the input phrase.
More details in
S. Cox. “Speech and Language Processing for a Constrained Speech Translation System”. In Proc. Int. Conf. on Spoken Language Processing, October 2002.
M. Lincoln and S. Cox. “A Comparison of Language Processing Techniques for a Constrained Speech Translation System”. Submitted to ICASSP 2003.
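The phrase mapping described above can be sketched as a vector-space matcher. This is a minimal illustration under stated assumptions, not the TESSA implementation: W(i,j) is taken to be the count of word w_i in phrase p_j, Pr(p_j | w_i) is that count divided by the word's total count, and each word's global weight is 1 − ε_i, where ε_i = −(1/log N) Σ_j Pr(p_j | w_i) log Pr(p_j | w_i) is its normalised entropy over the N canonical phrases. That is the common log-entropy scheme, which gives words spread evenly across all phrases a weight near zero. Cosine similarity stands in for the distance measure, and the example phrases are hypothetical.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    return text.lower().split()


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class PhraseMapper:
    """Maps free-form input to the closest canonical phrase (log-entropy weighting)."""

    def __init__(self, phrases: list[str]):
        if len(phrases) < 2:
            raise ValueError("need at least two canonical phrases")
        self.phrases = list(phrases)
        counts = [Counter(tokenize(p)) for p in self.phrases]
        totals: Counter = Counter()
        for c in counts:
            totals.update(c)
        log_n = math.log(len(self.phrases))
        self.global_weight: dict[str, float] = {}
        for word, total in totals.items():
            # Normalised entropy of the word's distribution over phrases, in [0, 1].
            entropy = -sum((c[word] / total) * math.log(c[word] / total)
                           for c in counts if c[word])
            self.global_weight[word] = 1.0 - entropy / log_n
        self.vectors = [self._vector(c) for c in counts]

    def _vector(self, counts: Counter) -> dict[str, float]:
        # Local weight 1 + log(count) times global weight; out-of-vocabulary words are dropped.
        return {w: (1.0 + math.log(k)) * self.global_weight[w]
                for w, k in counts.items() if w in self.global_weight}

    def map(self, text: str) -> tuple[int, float]:
        """Index of the best-matching canonical phrase and its cosine score."""
        query = self._vector(Counter(tokenize(text)))
        scores = [cosine(query, v) for v in self.vectors]
        best = max(range(len(scores)), key=lambda j: scores[j])
        return best, scores[best]


if __name__ == "__main__":
    mapper = PhraseMapper(["hello how can i help", "first or second class",
                           "do you want a receipt", "where do you want to send it"])
    print(mapper.map("second class please"))
```

Dropping out-of-vocabulary words mirrors the restricted recognition vocabulary: an input such as "second class please" still lands on the right canonical phrase because "please" simply contributes nothing.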
TESSA V4 - Mapping Evaluation
Subset of 155 phrases.
5 talkers, each asked to:
write down another way of expressing the phrase
be recorded saying this phrase
Recognise speech (NB no adaptation)
75.1% correct; 49.8% accurate
Test phrase mapping on both text and recognised speech
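The gap between 75.1% correct and 49.8% accurate comes from insertion errors: in the usual speech-recognition scoring convention, percent correct is (N − S − D)/N and accuracy is (N − S − D − I)/N, where N is the number of reference words and S, D, I are substitutions, deletions and insertions. The sketch below assumes that convention and a unit-cost word alignment; scoring tools may use different alignment penalties, so their counts can differ slightly.

```python
def align_counts(ref: list[str], hyp: list[str]) -> tuple[int, int, int]:
    """(substitutions, deletions, insertions) from a unit-cost minimum edit alignment."""
    n, m = len(ref), len(hyp)
    # Each cell holds (total errors, S, D, I); tuples compare total errors first.
    dp = [[(0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = (i, 0, i, 0)
    for j in range(1, m + 1):
        dp[0][j] = (j, 0, 0, j)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            e, s, d, ins = dp[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (e, s, d, ins)
            else:
                diag = (e + 1, s + 1, d, ins)
            e, s, d, ins = dp[i - 1][j]
            delete = (e + 1, s, d + 1, ins)
            e, s, d, ins = dp[i][j - 1]
            insert = (e + 1, s, d, ins + 1)
            dp[i][j] = min(diag, delete, insert)
    _, s, d, ins = dp[n][m]
    return s, d, ins


def percent_correct_and_accuracy(ref: list[str], hyp: list[str]) -> tuple[float, float]:
    if not ref:
        raise ValueError("reference must contain at least one word")
    n = len(ref)
    s, d, ins = align_counts(ref, hyp)
    correct = 100.0 * (n - s - d) / n          # insertions are not penalised
    accuracy = 100.0 * (n - s - d - ins) / n   # insertions are penalised
    return correct, accuracy


if __name__ == "__main__":
    ref = "where do you want to send it".split()
    hyp = "where do you you want to send it please".split()
    print(percent_correct_and_accuracy(ref, hyp))
```

Under this convention the reported figures imply insertions amounting to about 25% of the reference words (75.1 − 49.8 = 25.3 percentage points).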
TESSA V4 – Mapping Evaluation
TESSA V4 – User Interface
• Push to talk (automatic end-of-speech detection)
• Larger buttons
• Common phrases which don’t need to be spoken
• Continually updated list of top 5 most used signs
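A continually updated "top 5" list like the one above needs only running usage counts. The sketch below is an assumption about how such a list could be maintained, using Python's `collections.Counter`; the class name and phrase labels are hypothetical and not taken from TESSA.

```python
from collections import Counter


class MostUsedPhrases:
    """Running usage counts with a 'top N' view (hypothetical helper)."""

    def __init__(self, size: int = 5):
        self.size = size
        self.counts = Counter()

    def record(self, phrase: str) -> None:
        # Called each time the clerk selects or speaks a phrase.
        self.counts[phrase] += 1

    def top(self) -> list[str]:
        # most_common orders by count, highest first.
        return [phrase for phrase, _ in self.counts.most_common(self.size)]


if __name__ == "__main__":
    usage = MostUsedPhrases()
    for phrase in ["hello", "first class", "hello", "receipt", "hello", "first class"]:
        usage.record(phrase)
    print(usage.top())
```

Recomputing the list after every transaction keeps the on-screen shortcuts aligned with what is actually used at that counter.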
Post Office Trial - Set-up
TESSA V4 used
5 Post Offices: London, Bristol, Derby, Liverpool, Wolverhampton
Known Deaf communities in each area
3 months duration
Equipment given Health & Safety approval
Trained 19 counter clerks
Provided help desk support
Post Office Trial - Survey
Independent survey of customers by RNID
Independent survey of counter clerks
All users given RNID questionnaire
All counter clerks interviewed
Post Office Trial - Publicity
BBC See Hear – early Oct
Channel 4 – Documentary on BSL
Disability Times – 1 October 2002
BBC Worldwide – 24 August 2002
ITV London Tonight – 21 August 2002
Liverpool Echo – 1 August 2002
Camden Chronicle – 1 August 2002
Wolverhampton Chronicle – 25 July 2002
Post Office Trial - Publicity
Bristol Evening Post – 22 July
Liverpool Echo – 19 July
Derby Evening Telegraph – 18 July
Wolverhampton Express and Star – 17 July
Sign Language Recognition
Preliminary investigation
6 gestures, 10 training and 5 testing examples
Single user
Motion captured data
HMM recognition system
Initial results – 95% accuracy
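An HMM recogniser of this kind typically trains one model per sign and labels a new gesture with the model that gives it the highest likelihood. The sketch below shows only that classification step, using the scaled forward algorithm on discrete observations. It assumes the motion-capture features have already been quantised into symbols, and the two toy left-to-right models and their probabilities are hypothetical; the project's actual features and model topology are not described here.

```python
import math
from dataclasses import dataclass


@dataclass
class DiscreteHMM:
    pi: list[float]          # initial state probabilities
    A: list[list[float]]     # A[i][j] = P(state j at t+1 | state i at t)
    B: list[list[float]]     # B[i][o] = P(symbol o | state i)


def log_likelihood(model: DiscreteHMM, obs: list[int]) -> float:
    """log P(obs | model) via the forward algorithm with per-step scaling."""
    n = len(model.pi)
    log_prob = 0.0
    alpha: list[float] = []
    for t, o in enumerate(obs):
        if t == 0:
            alpha = [model.pi[i] * model.B[i][o] for i in range(n)]
        else:
            alpha = [sum(alpha[i] * model.A[i][j] for i in range(n)) * model.B[j][o]
                     for j in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")
        log_prob += math.log(scale)       # product of scales equals P(obs)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_prob


def classify(models: dict[str, DiscreteHMM], obs: list[int]) -> str:
    """Name of the model with the highest likelihood for obs."""
    return max(models, key=lambda name: log_likelihood(models[name], obs))


LEFT_TO_RIGHT = [[0.7, 0.3], [0.0, 1.0]]
MODELS = {  # hypothetical two-state sign models over 3 quantised symbols
    "sign_a": DiscreteHMM([1.0, 0.0], LEFT_TO_RIGHT, [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]]),
    "sign_b": DiscreteHMM([1.0, 0.0], LEFT_TO_RIGHT, [[0.1, 0.1, 0.8], [0.1, 0.8, 0.1]]),
}

if __name__ == "__main__":
    print(classify(MODELS, [0, 0, 1, 1]), classify(MODELS, [2, 2, 1]))
```

Training (for example with Baum-Welch on the 10 training examples per sign) would estimate `pi`, `A` and `B`; the held-out examples are then scored exactly as above.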
Sign Language Recognition
Comparison of recognition using motion captured data and video.
Collaboration with EU ‘WISDOM’ project.
Currently recording and editing multi-user database.
10 signs, 10 training and 5 testing examples
5 users
Motion captured and video
RNID to make independent evaluation of recognition accuracy.
Presentation by Streams – Usability of Virtual Signing
WP2 Sign Tutor
WP1 Television: closed signing for broadcast DTT
WP2 Internet: information and education for Deaf people
WP3 Face to Face: High Street Post Office counter services
WP6 Comparison of virtual signing with video-recorded human signing
Methods
60 phrases from the PO TESSA system signed by human interpreter on video
120 phrases signed by the virtual human
10 profoundly deaf people whose first language is BSL
Outcome measures:
Accuracy of identification
Subjective ratings for each phrase
Overall subjective ratings
Accuracy of identification
[Figure: accuracy of identification (%) for whole phrases and sign units, virtual vs human signing]
Subjective Ratings
[Figure: ratings (%) for ease of identification (1 = very difficult to 5 = very easy) and acceptability (low to high), virtual vs human signing]
Visual Analogue Scales
[Figure: clarity and acceptability ratings and accuracy (%) for phrases and sign units, virtual vs human signing]
Usability Conclusions
Higher accuracy of identification for human than for virtual signed phrases (by 20%)
Some improvements in intelligibility of virtual signing required
Non-ceiling benchmark of accuracy determined
60% of virtual signed phrases judged as good as human signed phrases
Greater scope for improvements in terms of subjective views of virtual signing
Impressive results for virtual signing
Exploitation and Dissemination Highlights
TESSA IT Awards & success in the community
WWW Weather Forecaster launched in 2 European sign languages & encouraging feedback
IvD & RNID hosting in the Netherlands and the UK
Close involvement of Deaf people
RNID promoting ViSiCAST nationally
BBC collaboration on a closed signing solution for broadcasting DTV, for bandwidth efficiency
Increasing amount of in-vision signing disliked by hearing people
Impacts on DTT multiplexes where bit-rate is already at a premium
Exploitation & Dissemination
UK Government 10-year target: 5% of programmes on DTT services to be signed
Today, services use ‘open signing’
Hearing viewers can find it distracting
Seldom transmitted at peak viewing times
Closed signing offers freedom
for viewers: to turn on and off
scheduling freedom for broadcasters
but needs extra transmission feed
ViSiCAST uses ‘virtual human’
reducing bandwidth needs by a factor of ten compared to video
Closed Signing – Why an avatar-based solution?
MPEG-2 coding (0.5-1 Mbit/s)
only 1 service signed per multiplex, if at all
MPEG-4 coding (<350 kbit/s)
no more than 2 services signed per multiplex
more efficient compression, and ability to code non-rectangular objects
Animated avatars (<100 kbit/s)
may be possible to sign all services in a multiplex
need new techniques to capture motion of real signers
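The per-service bit rates above translate into how many signed services fit into whatever capacity a multiplex can spare. The sketch below is illustrative only: the 700 kbit/s of spare capacity is a hypothetical figure not given on the slide, and the "<350" and "<100" kbit/s values are treated as worst-case rates, so the resulting counts are lower bounds.

```python
# Per-service signing bit rates from the slide, in kbit/s (upper bounds taken as worst case).
RATES_KBIT_S = {
    "MPEG-2 video (best case)": 500,
    "MPEG-2 video (worst case)": 1000,
    "MPEG-4 video": 350,
    "Animated avatar": 100,
}

SPARE_KBIT_S = 700  # hypothetical spare capacity in one DTT multiplex


def services_that_fit(spare_kbit_s: int, per_service_kbit_s: int) -> int:
    """Whole number of signed services that fit into the spare capacity."""
    return spare_kbit_s // per_service_kbit_s


if __name__ == "__main__":
    for name, rate in RATES_KBIT_S.items():
        print(f"{name}: {services_that_fit(SPARE_KBIT_S, rate)} signed service(s)")
```

With this assumed budget the pattern on the slide reappears: at most one MPEG-2 signed service (none in the worst case), two MPEG-4 services, and several avatar-based services.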
Closed Signing Requirements for the Broadcaster
Be compatible with existing studio, distribution & monitoring infrastructures
Maintain freedom to schedule as needed
Accommodate live signing and reactive scheduling
Allow for regional content insertion and time-shifting
Cope with the variety of picture display formats
Avatar Signing Developments for Broadcasting
Motion capture needs to be efficient and signer-independent
enabling signing of live and reactive broadcast material
best suited for offline broadcasting today
Facial motion capture needs refinements
Increasing realism makes avatars more acceptable
Signing Capture - Studio Implementation
[Figure: original programme from tape (SDI); signer, camera and motion capture feeding coding/compression; signing data sent over Ethernet to an SDI inserter; SDI with embedded signing data to monitor and video server]
Studio and distribution issues
Provision of television programme material with associated signing
Development of equipment for conveying signing data within studio infrastructure
We have developed hardware to add signing or motion capture data to an SDI video stream.
The main programme video/audio and the corresponding data can then be routed via standard studio infrastructure.
The combined A/V and signing data can also be stored on a server or video tape.
Development of a DVB inserter agnostic of the signing signal coding method
Development of end-to-end DT demonstrator