ViSiCAST 2002 Technical Audit
4 October 2002, Brussels
Michele Wakefield - Project Manager, ITC
The ViSiCAST Project
Virtual Signing: Capture, Animation, Storage and Transmission
Aims of ViSiCAST Project
“…support improved access by deaf citizens to information and services in sign language”
• by successfully developing signing systems for broadcast, WWW & ‘over the counter’ type applications
• user-friendly methods to capture & generate signs
• machine-readable system to describe gestures
• ... preferred medium is sign language
ViSiCAST Consortium
Hamburg University
University of East Anglia
Televirtual
Institut National des Télécommunications
Independent Television Commission
Institut für Rundfunktechnik
Instituut voor Doven
The Post Office
Royal National Institute for Deaf People
Project Dimensions
• Duration
  - Start: January 2000
  - Finish: December 2002
  - 36 months
• Total Costs
  - 3770 kECU total
  - 2876 kECU funding from EC
ViSiCAST Project Highlights
• Signing transmissions demonstrated at IBC 2002
  - MPEG-4 compliant INT-IRT demonstrator to deliver an open signing service for broadcast DTV
  - BBC demonstrator to deliver closed DTV signing service
• Translate simple sentences in real time to sign animation
• WWW Weather-forecaster launched in the Netherlands
• Interactive sign language learning tool
• 2nd trial of TESSA system now nationwide and RNID re-promoting ViSiCAST
  - after success of pilot at Science Museum, London
  - encouraging national media coverage
ViSiCAST Project Structure
[Diagram: project structure. Box labels: Technology, Animation, Linguistics, User Application, Broadcast, Internet, Community, Evaluation, Exploitation, Exploitation & Dissemination]
Presentations by Core Streams
• Technology: Animation & Linguistics
  - WP4 Animation
  - WP5 Linguistics
• User: Applications
  - WP2 Sign Tutor
  - WP1 Broadcast
  - WP2 WWW
  - WP3 Face to Face
  - WP6 Usability
• Exploitation & Dissemination
Presentation by Streams Animation
• WP4 Animation
  - Increased realism in sign generation
  - Enhanced signing experience
• WP5 Sign Language Linguistics
  - Use of natural sign language
  - Synthesis of sign language gestures
Animation Work: Objectives
• WP4:
  - Develop Hi-Resolution Avatars + related capture and animation
  - To enable and support application development in WPs 1-2-3 using WP4 (& WP5) Product
  - To further develop, compare and integrate both proprietary and standard solutions, where appropriate, in networked environments
Technology: WP4, Animation
• At start of Year:
  - Visia 2
  - Running in Mask 1
  - Using Motion Capture Data only
  - Reasonable animation, expression etc.
Technology: WP4, Animation
• Visia 2 in MPEG-4
  - Mesh partitioned into anatomical segments
  - MPEG-4 compliant authoring tool
  - Animation editing tool
  - Server-client tool for TX of animation parameters
  - MPEG-4 SNHC player <25fps
  - Embedded within an MPEG-4 set-top box
Technology: WP4, Animation
• Visia 3
  - Updated Virtual Human
  - Higher resolution & polygon count, more realistic photographic textures
  - Improved articulation
  - Mesh distortion applied to garments
  - Facial expression via skeleton manipulation & morphs
  - Speech Enabled
Technology: WP4, Animation
• Visia 3
  - New host software - Mask TNG
  - Writing new ActiveX Controls
  - Superior functionality, lighting and Camera FX, image quality, frame rate, flexibility etc.
  - >75 FPS
Technology: WP4, Animation
• Visia 3
  - Running in Mask TNG Graph
Technology: WP4, Animation
• Facial Morphs
  - Created in Maya, exported to Mask TNG
  - Based on Sign Language expressions (BSL Dictionary)
  - Inter-operable
  - Variable weighting (<100%+)
  - May be used with Mo-Cap data or for synthetic sign
Technology: WP4, Animation
• Facial Animation - Experimental Work
  - Tracking of Active Shape Models
  - Tracking of Active Appearance Models
Technology: WP4, Animation
• Facial Animation - Experimental Work
  - Vision-based motion capture of facial expressions using MPEG-4 compliant templates.
WP4: Synthetic Animation Introduction
• Task:
  - Make avatar do signing synthetically
  - as specified by ViSiCAST’s Signing Gesture Markup Language - SiGML
• Motive:
  - Synthetic animation is more flexible than animation via motion-capture - “just write some more SiGML”
  - Support Natural-Language-to-Animation strategy of WP4-5
  - In broadcasting applications: put synthetic player on receiver and transmit SiGML - very low bandwidth
WP4: Synthetic Animation Context
• Televirtual Avatar is a deformable textured Mesh
• Mesh shape and position are determined by configuration of underlying Skeleton
  - skeleton configuration: a.k.a. “Bone-Set”
• To animate avatar: need to generate stream of Bone-Sets - one per frame of animation
  - i.e. BAF data stream - BAF = “Bones Animation Format”
  - Data intensive: 4Kb per bone-set
WP4: Synthetic Animation Technical Approach
• SiGML specifies gestures through:
  - Postures:
    - hand shape
    - hand orientation - palm and extended finger direction
    - position of hand(s) in signing space
  - Motions - straight-line, circular, zig-zag etc.
• Synthetic Animation Engine:
  - specifies hand bone configuration for given posture
  - configures arm/shoulder bones using Inverse Kinematics
  - implements transition from one posture to next using nonlinear interpolation - often via control system modelling
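The posture-to-posture transition described above can be sketched roughly as follows. This is only an illustration, not the project's engine: a posture is simplified to a dictionary of scalar joint angles (a hypothetical stand-in for a Bone-Set), and a smoothstep easing curve stands in for the nonlinear interpolation; the control system modelling mentioned on the slide is not reproduced. The function and joint names are invented for the example.

```python
def smoothstep(t):
    """Ease-in/ease-out curve on [0, 1]: slow start, fast middle, slow end."""
    return t * t * (3.0 - 2.0 * t)


def interpolate_postures(start, end, n_frames):
    """Return n_frames postures moving nonlinearly from start to end.

    start and end map joint names to angles in degrees (an assumed,
    simplified posture representation with identical keys).
    """
    if n_frames < 2:
        raise ValueError("need at least two frames (start and end)")
    frames = []
    for k in range(n_frames):
        s = smoothstep(k / (n_frames - 1))
        frames.append({joint: start[joint] + s * (end[joint] - start[joint])
                       for joint in start})
    return frames


# Hypothetical postures: one frame per bone configuration, as in a BAF stream.
rest = {"elbow": 0.0, "wrist": 10.0}
raised = {"elbow": 90.0, "wrist": 30.0}
frames = interpolate_postures(rest, raised, 5)
```

The easing curve makes the hand accelerate out of one posture and decelerate into the next, which is the qualitative effect a nonlinear transition is meant to give compared with constant-speed linear blending.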
WP4: Synthetic Animation Progress (i)
• Initial Prototype (D4-2) delivered 2001-12
  - Supported most of manual SiGML
  - Implemented in Perl (interpreted scripting language)
  - BAF/VRML output to file - and then to avatar
  - Relatively slow - often < 15 fps
  - Perl module packaged as ActiveX control
  - relatively unwieldy architecture
• Enhancements for 2002-02 (M5-11)
  - BAF data stream cached in memory - fed directly to avatar
  - Front-end (for WP5): HamNoSys input server, with built-in HamNoSys-to-SiGML translation
WP4: Synthetic Animation Progress (ii)
• HamNoSys-to-Signing (Fast) 2002-06
  - Synthetic Animation Engine re-implemented in C++
    - 50 times faster - generates approx. 1000 fps, supporting real-time streamed input (e.g. Broadcast, WWW)
  - More flexible framework - basis for improved authenticity
    - Modular system architecture - supports flexible application development, scripting in WWW pages, etc.
• Upgrade to Mask2 2002-09
  - Interface to new primitive Mask2 ActiveX control
    - allows better control of animation frame scheduling
  - BAF replaced by VBM (ViSiCAST Bones and Morphs), which provides framework for support of non-manual SiGML
Presentation by Streams Linguistics
• WP4 Animation
  - Increased realism in sign generation
  - Enhanced signing experience
• WP5 Sign Language Linguistics
  - Use of natural sign language
  - Synthesis of sign language gestures
WP 5: Language Technology
• Goal within the project:
  - To provide semi-automatic translation from English into BSL, DGS, NGT
• Can also be used to assist the user in monolingual language input
  - No writing system for sign languages established
Presentation by Streams
• Animation and Linguistics
• User Applications
• Exploitation and Dissemination
Presentation by Streams Sign Tutor
• WP2 Sign Tutor
• WP1 Television
  - Closed signing for Broadcast DTT
• WP2 Internet
  - Information and Education for Sign Language Learners
• WP3 Face to Face
  - High Street Post Office Counter Services
• WP6 Comparison of virtual signing
  - with video-recorded Human Signing
Presentation by Streams Television
• WP2 Sign Tutor
• WP1 Television
  - Closed signing for Broadcast DTT
  - Enhanced signing experience
  - Regulation and Standards
• WP2 Internet
  - Information and Education for Sign Language Learners
• WP3 Face to Face
• WP6 Comparison of virtual signing
Virtual Humans on TV: The Advantages
• Low transmission rate < 25 kbit/s
• Compatibility with signing on other media and sign languages
• Precise, sharp representation of signer
• Open display options
• Compliance with international standards: MPEG, DVB
• Future-proof:
  - cost saving
  - allows vast no. of signed programmes
  - unified framework from video-based to VH signing
Broadcast VH Signing: Achievements
• Integrated TX system for broadcast to STBs
  - Implementing virtual human s/w in STB
  - MPEG-2 delivery layer for maximum compliance:
    - with existing hardware
    - with MPEG & DVB standards
    - with proprietary formats
• MPEG-4 Audio-Video codec and player
  - MPEG-4 compliant virtual human
  - MPEG-4 SNHC virtual human codec and player
• MPEG-4 based closed signing service demonstrated at IBC 2002
Broadcast VH Signing: Functional architecture
[Diagram: processing chain through Encoder, System, Delivery, System Decoder and Compositor stages, with components marked normative or proprietary. Labels: MPEG-2 AV encoder, MPEG-4 video encoder, MPEG-4 SNHC encoder, BAF encoder, Packet, MUX, MPEG-2 TS, deMUX, dePacket, MPEG-2 AV decoder, MPEG-4 video decoder, MPEG-4 SNHC decoder, BAF decoder, MPEG-4 multimedia player, Proprietary Multimedia player, Compositor.]
Broadcast VH Signing: System layer implementation
[Diagram: system layer chain through Encoder, System, Delivery, System Decoder and Compositor stages. Labels: IRT-DSP MPEG encoder, UDP/TCP packetiser, IP filter, MPEG-2 TS, DVB RF modulator, receiver card.]
Broadcast VH Signing: Perspectives
• Advanced TX system for broadcast to MHP compliant STBs
  - Open, MPEG & DVB compliant architecture
  - Improved synchronisation layer
  - Integrating a compositing layer
• Implementing an enriched MPEG-4 multimedia authoring tool
• Integrating SiGML stream
Demonstration
Presentation by Streams WWW - Web pages with signing Field trials
• WP2 Sign Tutor
• WP1 Television
  - Closed signing for Broadcast DTT
• WP2 Internet
  - Information and Education for sign language learners
  - Web-pages with signing
• WP3 Face to Face
  - High Street Post Office Counter Services
• WP6 Comparison of virtual signing
Weather Forecast Application
[Diagram: weather forecast application delivered over the Internet. 1st demo labels: content provider, forecast creation tool. 2nd demo labels: ‘play list’, user web-browser + plug-in, weather signs, avatar.]
Demo
The field trials with Deaf users
• Hosting at site of Dutch Deaf organisation Dovenschap: www.dovenschap.org
• Running from end-June until end-October
• Deaf users can join the field trial by filling in a form on the website
• CD-ROM with necessary software sent to users
Field Trial Promoted
• 70 e-mails to webmasters of Deaf clubs, Deaf schools, Deaf organisations and private sites of Deaf persons
• promotion on Teletext (T.V.)
• on informative websites for Deaf people
• visit at meeting of national Deaf organisation with 12 member organisations
• article in magazine for sign language interpreters
• 30 CD-ROMs sent to Deaf clubs and schools
Trial Feedback
• Helpdesk, contacted by e-mail
• Discussion page on website
• Evaluation form: software and installation, included with the software
• Evaluation form: avatar and sign language, to be sent at the end of October 2002
Present Situation
• Field trial still running
• News slowly spreading
• Positive reactions
• Results at the end of November
Presentation by Streams – Face to Face
• WP2 Sign Tutor
• WP1 Television
  - Closed signing for Broadcast DTT
• WP2 Internet
• WP3 Face to Face
  - High Street Post Office Counter Services
  - Close involvement with RNID
• WP6 Comparison of virtual signing
  - with video-recorded Human Signing
WP3 Overview
• Evaluation – October 2001
• New TESSA system – Mar 2002
• Post Office Trial – May 2002 – Present
• Sign Recognition – April 2002 – Present
Evaluation – October 2001
• Evaluation conducted at PO concept store using TESSA V3.
• 10 Deaf People and 5 Counter Clerks participated over 10 days.
• Mirror of previous evaluation + some comparative tests of virtual signing with a video-recorded human signer (full details in WP6 presentation)
Evaluation – Observations
Clerks complained about the speed of transactions
Caused by:
• Toggle switch for recogniser
• Mis-recognitions caused by large vocabulary
• Poor mapping from recognised speech to phrases
• Cumbersome graphical interface
Tessa V4 – Recognition System
• ‘Bag of words’ language model.
  - Only words relevant to post office phrases recognised
  - Many fewer insertion errors
  - More resilient to external noise
[Diagram: unordered word list: Hello, Goodbye, Going, Where, First, Second, Class, …]
[Figure: word-by-phrase co-occurrence matrix. Rows are vocabulary words (A, About, Access, Account, …, You, you’ve, Your); columns are Phrase 1, Phrase 2, Phrase 3, …, Phrase N; entries are word counts such as 0, 1, 2.]
TESSA V4 – Phrase Mapping
[Figure: example word-count column vector (1, 0, 0, 0, …, 0, 1, 0)]
• Phrase mapping system derived from work on Automatic Call Routing
• Represent each of the signed phrases and the test phrase as vectors in a co-occurrence matrix
TESSA V4 – Phrase Mapping
• Weight the entry W(i,j) such that:
  $$W(i,j) \rightarrow \bigl(1 + \log W(i,j)\bigr) \cdot \frac{-1}{\log N_j} \sum_{j=1}^{N_j} \Pr(p_j \mid w_i)\,\log \Pr(p_j \mid w_i)$$
• Calculate distance between vectors representing each canonical phrase and input phrase.
More details in:
S. Cox. “Speech and Language Processing for a Constrained Speech Translation System”. In Proc. Int. Conf. on Spoken Language Processing, October 2002.
M. Lincoln and S. Cox. “A Comparison of Language Processing Techniques for a Constrained Speech Translation System” (submitted to ICASSP 2003).
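The phrase mapping idea on these slides (word-count vectors, a per-word weight, then a distance to each canonical phrase) can be sketched as below. Several things here are assumptions rather than the TESSA implementation: the weighting is the conventional log-entropy scheme (log of the count times one minus the normalised entropy of the word across phrases), which may differ in detail from the slide's formula; the distance is cosine distance, since the slide only says "distance"; and the three canonical phrases and all function names are invented for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def entropy_weights(canonical_phrases):
    """Per-word weight: 1 minus normalised entropy of the word over phrases.

    Words concentrated in one phrase get weights near 1; words spread
    evenly over all phrases get weights near 0. (Assumed standard scheme.)
    """
    counts = [Counter(tokenize(p)) for p in canonical_phrases]
    vocab = sorted(set().union(*counts))
    n = len(counts)
    weights = {}
    for word in vocab:
        total = sum(c[word] for c in counts)
        entropy = -sum((c[word] / total) * math.log(c[word] / total)
                       for c in counts if c[word])
        normalised = entropy / math.log(n) if n > 1 else 0.0
        weights[word] = 1.0 - normalised
    return weights


def vectorise(text, weights):
    """Sparse weighted bag-of-words vector; out-of-vocabulary words are dropped."""
    counts = Counter(tokenize(text))
    return {w: (1.0 + math.log(c)) * weights[w]
            for w, c in counts.items() if w in weights}


def cosine_distance(a, b):
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def map_to_canonical(text, canonical_phrases):
    """Return the canonical phrase closest to the input text."""
    weights = entropy_weights(canonical_phrases)
    query = vectorise(text, weights)
    return min(canonical_phrases,
               key=lambda p: cosine_distance(query, vectorise(p, weights)))


# Hypothetical canonical phrases, not taken from the TESSA phrase set.
PHRASES = ["first class stamp please",
           "second class stamp please",
           "where are you going"]
best = map_to_canonical("a second class stamp", PHRASES)
```

Because word order is ignored, a clerk can phrase a request in several ways and still land on the same signed phrase, as long as the distinguishing words are recognised.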
TESSA V4 - Mapping Evaluation
• Subset of 155 phrases.
• 5 talkers, each asked to
  - write down another way of expressing the phrase
  - record speaker saying this phrase
• Recognise speech (NB no adaptation)
  - 75.1% Correct; 49.8% Accurate
• Test phrase mapping on both text and recognised speech
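The gap between the "Correct" and "Accurate" figures is what one would expect if they follow the usual speech recognition scoring convention, in which Correct ignores inserted words and Accuracy penalises them. The slide does not define the two measures, so this is an assumption; the counts in the example are made up purely to show the arithmetic and are not the trial's error breakdown.

```python
def percent_correct(n_ref, deletions, substitutions):
    """Share of reference words recognised correctly; insertions ignored."""
    return 100.0 * (n_ref - deletions - substitutions) / n_ref


def percent_accuracy(n_ref, deletions, substitutions, insertions):
    """As percent_correct, but inserted words also count as errors."""
    return 100.0 * (n_ref - deletions - substitutions - insertions) / n_ref


# Illustrative counts only (hypothetical): 100 reference words.
correct = percent_correct(100, deletions=5, substitutions=20)
accuracy = percent_accuracy(100, deletions=5, substitutions=20, insertions=25)
```

Under these definitions the difference between the two scores equals the insertion rate, so a large gap points to the recogniser inserting many spurious words.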
TESSA V4 – Mapping Evaluation
TESSA V4 – User Interface
• Push to talk (automatic end of speech detection)
• Larger buttons
• Common phrases which don’t need to be spoken
• Continually updated list of top 5 most used signs
Post Office Trial - Set-up
• Tessa V4 used
• 5 Post Offices
  - London, Bristol, Derby, Liverpool, Wolverhampton
• Known Deaf communities in each area
• 3 months duration
• Equipment given health & safety approval
• Trained 19 counter clerks
• Provided help desk support
Post Office Trial - Survey
• Independent survey of customers by RNID
• Independent survey of counter clerks
• All users given RNID questionnaire
• All counter clerks interviewed
Post Office Trial - Publicity
• BBC See Hear – early Oct
• Channel 4 – Documentary on BSL
• Disability Times – 1 October 2002
• BBC Worldwide – 24 August 2002
• ITV London Tonight – 21 August 2002
• Liverpool Echo – 1 August 2002
• Camden Chronicle – 1 August 2002
• Wolverhampton Chronicle – 25 July 2002
Post Office Trial - Publicity
• Bristol Evening Post – 22 July
• Liverpool Echo – 19 July
• Derby Evening Telegraph – 18 July
• Wolverhampton Express and Star – 17 July
Sign Language recognition
Preliminary investigation
• 6 gestures, 10 training and 5 testing examples
• Single user
• Motion captured data
• HMM recognition system
Initial results – 95% accuracy
Sign Language Recognition
• Comparison of recognition using motion captured data and video.
• Collaboration with EU ‘WISDOM’ project.
• Currently recording and editing multi-user database.
  - 10 signs, 10 training and 5 testing examples
  - 5 users
  - Motion captured and video
• RNID to make independent evaluation of recognition accuracy.
Presentation by Streams – Usability of Virtual Signing
• WP2 Sign Tutor
• WP1 Television
  - Closed signing for Broadcast DTT
• WP2 Internet
  - Information and Education for Deaf People
• WP3 Face to Face
  - High Street Post Office Counter Services
• WP6 Comparison of virtual signing
  - with video-recorded Human Signing
Methods
• 60 phrases from the PO TESSA system signed by human interpreter on video
• 120 phrases signed by the virtual human
• 10 profoundly deaf people whose first language is BSL
• Outcome measures:
  - Accuracy of identification
  - Subjective ratings for each phrase
  - Overall subjective ratings
Accuracy of identification
[Bar chart: accuracy (%), 0 to 100, for whole phrases and sign units, comparing virtual and human signing]
Subjective Ratings
[Bar charts: ratings (%), 0 to 100, comparing virtual and human signing. Panel 1: ease of identification, rated from 1 (very difficult) to 5 (very easy). Panel 2: acceptability, rated from 1 (low) to 3 (high).]
Visual Analogue Scales
[Bar chart: rating / accuracy (%), 0 to 100, comparing virtual and human signing for phrases, sign units, clarity and acceptability]
Usability Conclusions
• Higher accuracy of identification for human than virtual signed phrases (20%)
• Some improvements in intelligibility of virtual signing required
• Non-ceiling benchmark of accuracy determined
• 60% virtual signed phrases judged as good as human signed phrases
• Greater scope for improvements in terms of subjective views of virtual signing
• Impressive results for virtual signing
Exploitation and Dissemination Highlights
• TESSA IT Awards & success in the community
• WWW Weather Forecaster launched in 2 European Sign Languages & encouraging feedback
  - IvD & RNID host in UK and the Netherlands
• Close involvement of Deaf people
  - RNID promoting ViSiCAST nationally
• BBC collaboration for closed signing solution for broadcasting DTV for bandwidth efficiency
  - Increasing amount of in-vision signing disliked by hearing people
  - Impacts on DTT multiplexes where bit-rate is already at a premium
Exploitation & Dissemination


• UK Government 10 year target - 5% of programmes on DTT services to be signed
• Today, services use ‘open signing’
  - Hearing viewers can find it distracting
  - Seldom transmitted at peak viewing times
• Closed signing offers freedom
  - for viewers - to turn on and off
  - scheduling freedom for broadcasters
  - but needs extra transmission feed
• ViSiCAST uses ‘virtual human’
  - reducing bandwidth needs by factor of ten compared to video
Closed Signing – Why an avatar-based solution?
• MPEG2 coding (0.5-1 Mbit/s)
  - only 1 service signed per multiplex if at all
• MPEG4 coding (<350 Kbit/s)
  - no more than 2 services signed per multiplex
  - more efficient compression, and ability to code non-rectangular objects
• Animated Avatars (<100 Kbit/s)
  - may be possible to sign all services in a multiplex
  - need new techniques to capture motion of real signers
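The service counts above follow from simple division of spare multiplex capacity by the bit rate of one signing stream. The sketch below assumes a spare capacity of 1000 kbit/s per multiplex, a figure that is not stated on the slide and is chosen only because it is consistent with the quoted counts; the per-stream rates are the upper bounds quoted on the slide.

```python
def signed_services_that_fit(spare_kbit_s, stream_kbit_s):
    """Number of whole signing streams that fit into the spare capacity."""
    return int(spare_kbit_s // stream_kbit_s)


# Assumed spare capacity per multiplex (hypothetical, not from the slides).
SPARE_KBIT_S = 1000

# Upper-bound bit rates per signing stream, as quoted on the slide.
RATES_KBIT_S = {"MPEG-2 video": 1000, "MPEG-4 video": 350, "animated avatar": 100}

capacity = {name: signed_services_that_fit(SPARE_KBIT_S, rate)
            for name, rate in RATES_KBIT_S.items()}
```

With these assumed numbers the avatar stream fits ten times where MPEG-2 video fits once, in line with the factor-of-ten bandwidth saving claimed for the virtual human.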
Closed Signing Requirements for the Broadcaster
• Be compatible with existing studio, distribution & monitoring infrastructures
• maintain freedom to schedule as needed
• accommodate live signing and reactive scheduling
• allow for regional content insertion and time-shifting
• cope with the variety of picture display formats
Avatar Signing developments for broadcasting
• Motion capture needs to be efficient and signer-independent
  - enabling signing of live and reactive broadcast material
  - best suited for offline broadcasting today
• Facial motion capture needs refinements
  - Increasing realism makes avatars more acceptable
Signing Capture - Studio Implementation
[Diagram: studio implementation of signing capture. Labels: Tape (Original Programme), SDI, Signer, Camera, Motion capture, Coding / Compression, Signing Data, Ethernet, SDI inserter, SDI with embedded Signing Data, Monitor, Video Server.]
Studio and distribution issues
• Provision of television programme material with associated signing
• Development of equipment for conveying signing data within studio infrastructure
  - We have developed hardware to add signing or motion capture data to an SDI video stream.
  - The main programme video/audio and the corresponding data can then be routed via standard studio infrastructure.
  - The combined A/V and signing data can also be stored on server or video tape.
• Development of DVB inserter agnostic of signing signal coding method
• Development of end-to-end DT demonstrator
