Language Technologies for the Mobile Internet Era


Cyber Assist Consortium Second International Symposium
- Information Environment for Mobile and Ubiquitous Computing Era Tokyo, 25 March 2003
SmartKom: Modality Fusion for
a Mobile Companion based on
Semantic Web Technologies
Wolfgang Wahlster
German Research Center for Artificial Intelligence
DFKI GmbH
Stuhlsatzenhausweg 3
66123 Saarbruecken, Germany
phone: (+49 681) 302-5252/4162
fax: (+49 681) 302-5341
e-mail: [email protected]
WWW: http://www.dfki.de/~wahlster
Intelligent Interaction with Mobile Internet Services
Localization
Access to web content
and web services anywhere
and anytime
Access to edutainment
and infotainment services
Access to all messages
(voice, email, multimedia, MMS)
from any single device
Multimodal
UMTS Systems
Access to
corporate networks
and virtual private networks
from any device
Personalization
© W. Wahlster
SmartKom: A Highly Portable Multimodal Dialogue System
Application Layer:
SmartKom-Mobile (Mobile): Car and Pedestrian Navigation
SmartKom-Public (Public): Cinema, Phone, Fax, Mail, Biometrics
SmartKom-Home/Office (Home): Consumer Electronics, EPG
MM Dialogue BackBone
A Demonstration of SmartKom’s Multimodal
Interface for the Federal President of Germany Dr. Rau
SmartKom's SDDP Interaction Metaphor
SDDP = Situated Delegation-oriented Dialogue Paradigm
Anthropomorphic Interface = Dialogue Partner
Webservices
Service 1
User
specifies goal
delegates task
Personalized
Interaction Agent
cooperate
on problems
Service 2
asks questions
presents results
Service 3
See: Wahlster et al. 2001 , Eurospeech
SmartKom's Use of Semantic Web Technology
Three Layers of Annotations (layer, language, degree of personalized presentation):
Content: M3L, high
Structure: XML, medium
Layout: HTML, low
SmartKom: Intuitive Multimodal Interaction
Project Budget: € 25.5 million, funded by BMBF (Dr. Reuse) and industry
Project Duration: 4 years (September 1999 – September 2003)
The SmartKom Consortium:
Main Contractor
Scientific Director
W. Wahlster
DFKI
Saarbrücken
Univ. of
Munich
MediaInterface
Dresden
Berkeley
Saarbrücken
European Media Lab
Heidelberg
Univ. of
Erlangen
Univ. of
Stuttgart
Munich
Aachen
Ulm
Stuttgart
Outline of the Talk
1. The Markup Language Layer Model of SmartKom
2. Modality Fusion in SmartKom
3. The Role of the Semantic Web Language M3L
4. Providing Coherence in Multimodal Dialogs by
Ontology-based Overlay
5. Conclusions
Mapping Web Content Onto a Variety of
Structures and Layouts
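[Figure: content in M3L is mapped onto structures XML1 ... XMLn, which are in turn mapped onto layouts HTML11 ... HTML1m, HTML21 ... HTML2o, HTML31 ... HTML3p; personalization spans the Content, Structure, and Layout levels.]
From the "one-size-fits-all" approach of static webpages to the
"perfect personal fit" approach of adaptive webpages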
The Markup Language Layer Model of SmartKom
M3L: MultiModal Markup Language
OIL: Ontology Inference Layer
XMLS: eXtensible Markup Language Schema
XML: eXtensible Markup Language
RDFS: Resource Description Framework Schema
RDF: Resource Description Framework
HTML: Hypertext Markup Language
SmartKom: Merging Various User Interface Paradigms
Graphical User Interfaces
Spoken
Dialogue
Gestural
Interaction
Multimodal
Interaction
Biometrics
Facial
Expressions
Multimodal Input and Output in SmartKom
Fusion and Fission of Multiple Modalities
Input by the User: Speech + Gesture + Facial Expressions
Output by the Presentation Agent: Speech + Gesture + Facial Expressions
Symbolic and Subsymbolic Fusion of Multiple Modes
Facial
Expression
Recognition
Gesture
Recognition
Speech
Recognition
Prosody
Recognition
Lip
Reading
Subsymbolic
Fusion
Symbolic
Fusion
- Neural Networks
- Hidden Markov Models
- Graph Unification
- Bayesian Networks
Reference Resolution and Disambiguation
Modality-Free Semantic Representation
Personalized Interaction with WebTVs
via SmartKom (DFKI with Sony, Philips, Siemens)
Example: Multimodal Access to Electronic Program Guides for TV
User: Switch on the TV.
Smartakus: Okay, the TV is on.
User: Which channels are presenting the latest news right now?
Smartakus: CNN and NTV are presenting news.
User: Please record this news channel on a videotape.
Smartakus: Okay, the VCR is now recording the selected program.
Using Facial Expression Recognition for
Affective Personalization
Processing ironic or sarcastic comments
(1) Smartakus: Here you see the CNN program for tonight.
(2) User: That's great.
(3) Smartakus: I'll show you the program of another channel for tonight.
(2') User: That's great.
(3') Smartakus: Which of these features do you want to see?
The SmartKom Demonstrator System
Multimodal
Control of
TV-Set
Camera for
Gestural Input
Multimodal
Control of
VCR/DVD
Player
Microphone
Camera for
Facial Analysis
Unification of Scored Hypothesis Graphs for
Modality Fusion in SmartKom
Scored
Hypotheses
about the User's
Emotional State
Clause and
Sentence
Boundaries
with Prosodic
Scores
Word Hypothesis
Graph with
Acoustic Scores
Gesture Hypothesis
Graph with Scores
of Potential
Reference Objects
Modality Fusion
Mutual Disambiguation
Reduction of Uncertainty
Intention Hypotheses Graph
Intention Recognizer
Selection of Most Likely Interpretation
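The idea of mutual disambiguation can be illustrated with a minimal sketch. It is not SmartKom's graph-unification mechanism: the tuple layout, the type-compatibility test, the product of scores, and all sample values are assumptions made only for illustration.

```python
from itertools import product

def fuse(speech_hyps, gesture_hyps):
    """Pair speech and gesture hypotheses whose referent types agree.

    Each speech hypothesis is (utterance, expected_referent_type, score);
    each gesture hypothesis is (object_id, object_type, score).
    Both layouts are hypothetical. Pairs are ranked by the product of scores.
    """
    fused = []
    for (utt, wanted, s_score), (obj, obj_type, g_score) in product(speech_hyps, gesture_hyps):
        if wanted == obj_type:  # incompatible pairs are discarded
            fused.append((s_score * g_score, utt, obj))
    return sorted(fused, reverse=True)

# Hypothetical scored hypotheses from the two recognizers.
speech = [("reserve this film", "film", 0.6), ("reserve this seat", "seat", 0.4)]
gesture = [("seat_12", "seat", 0.3), ("film_3", "film", 0.7)]

best = fuse(speech, gesture)[0]
print(best[1], "->", best[2])
```

Each modality prunes readings of the other: a gesture hypothesis only survives if some speech hypothesis expects an object of its type, and vice versa.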
SmartKom's Computational Mechanisms for
Modality Fusion and Fission
Modality Fusion
Modality Fission
Planning
Unification
Overlay
Operations
Ontological
Inferences
Constraint
Propagation
M3L: Modality-Free Semantic Representation
The Role of the Semantic Web Language M3L
• M3L (Multimodal Markup Language) defines the data exchange formats used
for communication between all modules of SmartKom
• M3L is partitioned into 40 XML schema definitions covering SmartKom's
discourse domains
• The XML schema event.xsd captures the semantic representation of concepts
and processes in SmartKom's multimodal dialogs
OIL2XSD: Using XSLT Stylesheets to Convert an OIL
Ontology to an XML Schema
Using Ontologies to Extract Information from the Web
Film.de-Movie
MyOnto-Movie
:o-title
:title
:description
:title
:description
:actors
MyOnto-Person
:director
Kinopolis.de-Movie
:name
:critics
:main actor
:name
:birthday
Mapping of Metadata
M3L as a Meaning Representation Language for
the User's Input
I would like to send
an email to Koiti.
<domainObject>
<sendTelecommunicationProcess>
<sender>....................</sender>
<receiver>..............</receiver>
<document>..........</document>
<email>...........</email>
</sendTelecommunicationProcess>
</domainObject>
Exploiting Ontological Knowledge to Understand
and Answer the User's Queries
Which movies with Schwarzenegger are shown on the Pro7 channel?
<domainObject>
<epg>
<broadcastDefault>
<avMedium>
<actors>
<beginTime>
<time>
<function>
<at>
2002-05-10T10:25:46
</at>
</function>
</beginTime>
<actor><name>Schwarzenegger</name></actor>
</actors>
</avMedium>
<channel><name>Pro7</name></channel>
</broadcastDefault>
</epg>
</domainObject>
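As a rough illustration of how a downstream module might read such a representation, the sketch below parses a simplified copy of the query (the beginTime block is omitted) with Python's standard library. The function name and the returned dictionary layout are assumptions, not part of M3L or SmartKom.

```python
import xml.etree.ElementTree as ET

# Simplified copy of the query structure shown on the slide.
M3L_QUERY = """
<domainObject>
  <epg>
    <broadcastDefault>
      <avMedium>
        <actors>
          <actor><name>Schwarzenegger</name></actor>
        </actors>
      </avMedium>
      <channel><name>Pro7</name></channel>
    </broadcastDefault>
  </epg>
</domainObject>
"""

def extract_constraints(m3l_text):
    """Return actor names and the channel name found in an EPG query (hypothetical helper)."""
    root = ET.fromstring(m3l_text)
    actors = [name.text for name in root.findall(".//actors/actor/name")]
    channel = root.findtext(".//broadcastDefault/channel/name")
    return {"actors": actors, "channel": channel}

constraints = extract_constraints(M3L_QUERY)
print(constraints)
```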
SmartKom’s Multimodal Dialogue Back-Bone
Analyzers: Speech, Gestures, Facial Expressions
Generators: Speech, Graphics, Gestures
Dialogue Manager: Modality Fusion, Discourse Modeling, Action Planning, Modality Fission
External Services
Communication Blackboards
Legend: Data Flow, Context Dependencies
SmartKom's Three-Tiered Discourse Model
[Figure: three layers of linked objects. Domain Layer: DomainObject1, DomainObject2. Discourse Layer: DO1, DO2, DO3, ..., DO9, DO10, DO11, DO12. Modality Layer: VO1, GO1, LO2, LO3, LO4, LO5, LO6, ... Word labels: list, heidelberg, reserve ticket first.]
System: This [] is a list of films showing in Heidelberg.
User: Please reserve a ticket for the first one.
DO = Discourse Object, LO = Linguistic Object
GO = Gestural Object, VO = Visual Object
cf. M. Löckelt et al. 2002, N. Pfleger 2002
SmartKom’s Domain Model based on M3L
• Used for communication in the back-bone
• Frame-based ontology; representation as Typed Feature Structures in M3L (XML)
CinemaReservation
  theater: MovieTheater
    address: Address
    seats: SeatStructure…
  movie: Movie
    name: String
    director: Person
      firstName: String
      lastName: String…
    cast: PersonList
    yearOfProduction: PositiveInteger…
  reservationNumber: PositiveInteger
• Application objects composed of subobjects
• Slots: Feature paths meaningful for the dialogue (entities that can be talked about
/ referred to); e.g. movie:director:lastName in a CinemaReservation object
• Slots can recursively contain other slots
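The notion of a slot as a feature path can be sketched with nested dictionaries standing in for typed feature structures. The helper name `get_slot` and all concrete values are hypothetical; only the feature names come from the slide.

```python
def get_slot(frame, path):
    """Follow a colon-separated feature path through nested dicts.

    Returns None when any feature on the path is missing.
    """
    value = frame
    for feature in path.split(":"):
        if not isinstance(value, dict) or feature not in value:
            return None
        value = value[feature]
    return value

# Nested dicts stand in for typed feature structures; values are invented.
reservation = {
    "type": "CinemaReservation",
    "theater": {"type": "MovieTheater"},
    "movie": {
        "type": "Movie",
        "name": "Example Film",
        "director": {"type": "Person", "firstName": "Jane", "lastName": "Doe"},
    },
    "reservationNumber": 42,
}

print(get_slot(reservation, "movie:director:lastName"))
print(get_slot(reservation, "movie:cast"))
```

Because slots can contain other slots, the same lookup works at any depth, for example `movie:director` returns the whole Person frame.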
Overlay Operations Using the Discourse Model
Augmentation and Validation
– compare with a number of previous discourse states:
  • fill in consistent information
  • compute a score
– for each hypothesis-background pair: Overlay(covering, background)
[Figure: the Intention Hypothesis Lattice supplies the covering, the discourse model supplies the background, and the output is the Selected Augmented Hypothesis Sequence.]
The Overlay Operation Versus the Unification Operation
• Nonmonotonic and noncommutative unification-like operation
• Inherit (non-conflicting) background information
• Two sources of conflicts:
  – conflicting atomic values: overwrite background (old) with covering (new)
  – type clash: assimilate background to the type of covering; recursion
[Figure: Unification versus Overlay]
cf. J. Alexandersson, T. Becker 2001
Example for Overlay
User: "What films are on TV tonight?"
System: [presents list of films]
User: "That's a boring program, I'd rather go to the movies."
How do we inherit "tonight"?
Domain Model: A Type Hierarchy of TFS
Entertainment [beginTime: Time, title: String, ...]: a named entertainment at some time
  Broadcast [channel: Channel, ...]: a named TV program at some time on some channel
  Performance [cinema: Cinema, ...]: a named movie at some time at some cinema
Unification Simulation
Type hierarchy:
Entertainment [beginTime: Time, title: String, ...]
  Broadcast [channel: Channel, ...]
  Performance [cinema: Cinema, ...]
Films on TV tonight:
Broadcast [beginTime: Time "tonight", channel: any, ...]
Fail – type clash
Performance [beginTime: Time "tonight", cinema: any, ...]
Overlay Simulation
Type hierarchy:
Entertainment [beginTime: Time, title: String, ...]
  Broadcast [channel: Channel, ...]
  Performance [cinema: Cinema, ...]
Films on TV tonight:
Broadcast [beginTime: Time "tonight", channel: any, ...]
Assimilation:
Background: Performance [beginTime: Time "tonight", ...]
Go to the movies:
Covering: Performance [beginTime: Time, cinema: any, ...]
Overlay:
Performance [beginTime: Time "tonight", cinema: any, ...]
"Formal" Definition Overlay
• Let
  – co be covering
  – bg be background
• Step 1:
  – Assimilate(co,bg)
• Step 2:
  – Overlay(co,assimilate(co,bg))
• If co and bg are frames: recursion
• If co is empty: use bg
• If bg is empty: use co
• If conflict: use co
[Figure: type hierarchy with top element T and the positions of co and bg]
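The two steps can be sketched in Python under strong simplifying assumptions: a single-inheritance hierarchy with one root, frames encoded as dicts with "type" and "feats" keys, and None standing for an empty value. The encoding, the helper names, and the sample values are hypothetical; this is an illustration of the rules above, not the SmartKom implementation.

```python
# Hypothetical single-inheritance hierarchy (child -> parent) and the
# features each type introduces; subtypes inherit them.
PARENT = {"Entertainment": None, "Broadcast": "Entertainment", "Performance": "Entertainment"}
FEATURES = {
    "Entertainment": {"beginTime", "title"},
    "Broadcast": {"channel"},
    "Performance": {"cinema"},
}

def ancestors(t):
    """Return t and its supertypes, most specific first."""
    chain = []
    while t is not None:
        chain.append(t)
        t = PARENT[t]
    return chain

def assimilate(co, bg):
    """Step 1: generalize bg to the most specific supertype shared with co,
    dropping features that are not defined at that type."""
    co_chain = ancestors(co["type"])
    common = next(t for t in ancestors(bg["type"]) if t in co_chain)
    keep = set().union(*(FEATURES[a] for a in ancestors(common)))
    return {"type": common, "feats": {f: v for f, v in bg["feats"].items() if f in keep}}

def is_frame(x):
    return isinstance(x, dict) and "type" in x and "feats" in x

def overlay(co, bg):
    """Step 2: lay co over the assimilated bg."""
    bg = assimilate(co, bg)
    feats = dict(bg["feats"])                    # inherit background information
    for f, co_val in co["feats"].items():
        bg_val = feats.get(f)
        if co_val is None:                       # co is empty: use bg
            feats.setdefault(f, None)
        elif is_frame(co_val) and is_frame(bg_val):
            feats[f] = overlay(co_val, bg_val)   # both are frames: recursion
        else:
            feats[f] = co_val                    # bg is empty or conflict: use co
    return {"type": co["type"], "feats": feats}

background = {"type": "Broadcast", "feats": {"beginTime": "tonight", "channel": "any"}}
covering = {"type": "Performance", "feats": {"beginTime": None, "cinema": "any"}}
result = overlay(covering, background)
print(result)
```

In this toy run the background is generalized to Entertainment, so "channel" is dropped while "tonight" is inherited by the Performance frame.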
Domain Models with Multiple Inheritance
• Assimilate(co,bg)
  – Compute the set of minimal upper bounds (MUB)
  – Specialize the MUBs
  – Unify the specialized MUBs
• Overlay remains untouched
[Figure: type lattice with top element T, two MUBs, and the types of co and bg below them]
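Only the first of these steps, computing the minimal upper bounds, is sketched below. The lattice, its type names, and the dictionary encoding are invented for illustration; specializing and unifying the MUBs are not shown.

```python
# Hypothetical type lattice with multiple inheritance:
# each type maps to the set of its direct supertypes.
SUPERTYPES = {
    "Top": set(),
    "A": {"Top"},
    "B": {"Top"},
    "C": {"A", "B"},
    "D": {"A", "B"},
}

def upper_bounds(t):
    """All types that subsume t, including t itself."""
    seen = {t}
    stack = [t]
    while stack:
        for parent in SUPERTYPES[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def minimal_upper_bounds(co_type, bg_type):
    """Common supertypes that have no more specific common supertype below them."""
    common = upper_bounds(co_type) & upper_bounds(bg_type)
    return {
        t for t in common
        if not any(u != t and t in upper_bounds(u) for u in common)
    }

print(minimal_upper_bounds("C", "D"))
```

With single inheritance the result always has one element; with multiple inheritance, as in this toy lattice, two types can have several incomparable MUBs.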
Overlay - Scoring
• Four fundamental scoring parameters:
  – Number of features from Covering (co)
  – Number of features from Background (bg)
  – Number of type clashes (tc)
  – Number of conflicting atomic values (cv)
\[ \mathrm{score}(co, bg, tc, cv) = \frac{(co + bg) - (tc + cv)}{(co + bg) + (tc + cv)} \]
Codomain [-1,1]
Higher score indicates better fit (a score of 1 means that overlay(c,b) coincides with unify(c,b))
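A small sketch of the score, assuming the formula is (co + bg - (tc + cv)) / (co + bg + (tc + cv)), which reproduces the 0.8666 value quoted in the deck for co = 2, bg = 12, tc = 1, cv = 0. The function name and the error handling for the all-zero case are assumptions.

```python
def overlay_score(co, bg, tc, cv):
    """Score an overlay result from its four counts.

    co, bg: number of features taken from covering and background
    tc, cv: number of type clashes and conflicting atomic values
    """
    information = co + bg
    conflicts = tc + cv
    if information + conflicts == 0:
        raise ValueError("score is undefined when all four counts are zero")
    return (information - conflicts) / (information + conflicts)

print(overlay_score(2, 12, 1, 0))   # example counts from the deck
print(overlay_score(3, 4, 0, 0))    # no conflicts at all
```

With no type clashes and no conflicting values the score is 1, the case in which overlay and unification give the same result; conflicts pull the score toward -1.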
Example: Enrichment and Validation
U4: What’s on TV tonight?
S5: [Displays a list of films] Here you see a list of films running tonight.
U6: That seems not very interesting, show me the cinema program.
Analysis of U4:
Broadcast
  beginTime: time:
    analysis: instant:
      day: daydeictic: tonight
      daytime: evening
    function: between:
      from: 2001-10-26T18:00:00
      to: 2001-10-26T23:59:00
  avMedium: [ ]
  channel: any
Discourse context:
MotionDirectedTransliterated
  meansOfTransportation: car
  target: Town
    name: Heidelberg
Broadcast
  beginTime: time:
    analysis: instant:
      day: daydeictic: tonight
      daytime: evening
    function: between:
      from: 2001-10-26T18:00:00
      to: 2001-10-26T23:59:00
  avMedium: [ ]
  ...
Example: Enrichment and Validation
U4: What’s on TV tonight?
S5: [Displays a list of films] Here you see a list of films running tonight.
U6: That seems not very interesting, show me the cinema program.
Analysis of U6:
Performance
  beginTime: [ ]
  avMedium: [ ]
  ...
Overlay (U6, U4)
Discourse context:
MotionDirectedTransliterated
  meansOfTransportation: car
  target: Town
    name: Heidelberg
Broadcast
  beginTime: time:
    analysis: instant:
      day: daydeictic: tonight
      daytime: evening
    function: between:
      from: 2001-10-26T18:00:00
      to: 2001-10-26T23:59:00
  avMedium: [ ]
  ...
Result: (Score: 0.8666)
Performance
  beginTime: time:
    analysis: instant:
      day: daydeictic: tonight
      daytime: evening
    function: between:
      from: 2001-10-26T18:00:00
      to: 2001-10-26T23:59:00
  avMedium: [ ]
  cinema: any
Example: Enrichment and Validation
U4: What’s on TV tonight?
S5: [Displays a list of films] Here you see a list of films running tonight.
U6: That seems not very interesting, show me the cinema program.
Analysis of U6:
Performance
  beginTime: [ ]
  avMedium: [ ]
  ...
Overlay (U6, U2)
Discourse context:
MotionDirectedTransliterated
  meansOfTransportation: car
  target: Town
    name: Heidelberg
Broadcast
  beginTime: time:
    analysis: instant:
      day: daydeictic: tonight
      daytime: evening
    function: between:
      from: 2001-10-26T18:00:00
      to: 2001-10-26T23:59:00
  avMedium: [ ]
  ...
Result: (Score: -1)
Performance
  beginTime: [ ]
  avMedium: [ ]
  ...
Animation of Scoring Parameters
Background:
Broadcast
  beginTime: time:
    analysis: instant:
      day: daydeictic: tonight
      daytime: evening
    function: between:
      from: 2001-10-26T18:00:00
      to: 2001-10-26T23:59:00
  avMedium: [ ]
  channel: any
Covering:
Performance
  beginTime: [ ]
  avMedium: [ ]
  ...
Number of features from Covering (co): 2
Number of features from Background (bg): 12
Number of type clashes (tc): 1
Number of conflicting atomic values (cv): 0
Result:
\[ \mathrm{score}(co, bg, tc, cv) = \frac{(2 + 12) - (1 + 0)}{(2 + 12) + (1 + 0)} \approx 0.8666 \]
The High-Level Control Flow of SmartKom
M3L Specification of a Presentation Task
<presentationTask>
<subTask>
<presentationGoal>
<inform> ... </inform>
<abstractPresentationContent>
...
<result>
<broadcast id="bc1">
<channel> <name>EuroSport</name> </channel>
<beginTime>
<time> <at>2000-12-05T14:00:00</at> </time>
</beginTime>
<endTime>
<time> <at>2000-12-05T15:00:00</at> </time>
</endTime>
<avMedium>
<title>Sport News</title>
<avType>sport</avType>
...
</abstractPresentationContent>
<interactionMode>leanForward</interactionMode>
<goalID>APGOAL3000</goalID>
<source>generatorAction</source>
<realizationType>GraphicsAndSpeech</realizationType>
SmartKom's Presentation Planner
The Presentation Planner generates a Presentation Plan by
applying a set of Presentation Strategies to the Presentation Goal.
[Figure: presentation plan tree with the nodes GlobalPresent, Present, AddSmartakus, DoLayout, PersonaAction, EvaluatePersonaNode, Inform, Speak, SendScreenCommand, TryToPresentTVOverview, ShowTVOverview, SetLayoutData, and GenerateText; two regions of the tree are labelled "Smartakus Actions" and "Generation of Layout".]
cf. J. Müller, P. Poller, V. Tschernomas 2002
Salient Characteristics of SmartKom
• Seamless integration and mutual disambiguation of multimodal input
and output on semantic and pragmatic levels
• Situated understanding of possibly imprecise, ambiguous, or incomplete multimodal input
• Context-sensitive interpretation of dialog interaction on the basis of
dynamic discourse and context models
• Adaptive generation of coordinated, cohesive and coherent
multimodal presentations
• Semi- or fully automatic completion of user-delegated tasks through
the integration of information services
• Intuitive personification of the system through a presentation agent
Conclusions
• Various types of unification, overlay, constraint processing,
planning and ontological inferences are the fundamental
processes involved in SmartKom's modality fusion and fission
components.
• The key function of modality fusion is the reduction of the
overall uncertainty and the mutual disambiguation of the
various analysis results based on a three-tiered representation
of multimodal discourse.
• We have shown that a multimodal dialogue system must not
only understand and represent the user's input, but also its own
multimodal output.
First International Conference on Perceptive &
Multimodal User Interfaces (PMUI’03)
November 5-7th, 2003 Delta Pinnacle Hotel, Vancouver,
B.C., Canada
Conference Chair
Sharon Oviatt, Oregon Health & Science Univ., USA
Program Chairs
Wolfgang Wahlster, DFKI, Germany
Mark Maybury, MITRE, USA
PMUI’03 is sponsored by ACM, and will be co-located in Vancouver
with ACM’s UIST’03. This meeting follows three successful Perceptive
User Interface Workshops (with PUI’01 held in Florida) and three
International Multimodal Interface Conferences initiated in Asia (with
ICMI’02 held in Pittsburgh).
March 2003
ISBN 0-262-06232-1
8 x 9, 392 pp., 98 illus.
$40.00/£26.95 (CLOTH)
Edited by
Dieter Fensel,
James A. Hendler,
Henry Lieberman and
Wolfgang Wahlster
Foreword by
Tim Berners-Lee
http://smartkom.dfki.de/