
USI module U1-5
Multimodal interaction
Jacques Terken
USI module U1, lecture 5
Contents
• Demos and video clips
• Multimodal behaviour
• Multimodal interaction, architecture and multimodal fusion
• Design heuristics, guidelines and tools
• http://www.nuance.com/xmode/demo/#
• http://www.csee.ogi.edu/CHCC/ (QuickSet video)
• RASA (combination of tangible and multimodal interaction)
• Also possibly of interest:
  – http://www.gvu.gatech.edu/gvu/events/demodays/2001/demos010930.html
  – http://ligwww.epfl.ch/~thalmann/research.html
QuickSet on an iPAQ (OGI – CHCC)
Multimodal behaviour
• The development of multimodal systems depends on knowledge about the natural integration patterns that characterize the combined use of different modalities
• Dealing with myths about multimodal interaction:
  – Oviatt, S.L., "Ten myths of multimodal interaction", Communications of the ACM 42(11), 1999, pp. 74-81
Myth 1: If you build a multimodal system, users will interact multimodally.
Dependent on domain:
• Spatial domain: 95-100% of users prefer multimodal interaction
• Other domains: 20% of commands are multimodal
Dependent on type of action:
• High MM: adding, moving, modifying objects, calculating distances between objects
• Low MM: printing, scrolling, etc.
• Distinction between general, selective and spatial actions
• General: non-object-directed actions (printing, etc.)
• Selective: choosing objects
• Spatial: manipulation of objects (adding, etc.)
Myth 2: Speech and pointing is the dominant multimodal integration pattern.
• Central in Bolt's speak-and-point interface ("put that there")
• Speak-and-point accounts for only 14% of spontaneous multimodal actions
• In human communication, pointing accounts for approx. 20% of all gestures
• Other actions: handwriting, hand gestures, facial expressions ("rich" interaction)
Myth 3: Multimodal input involves simultaneous signals.
• Information from different modalities is often sequential
• Often gestures precede speech
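Because the signals need not overlap, a fusion engine typically pairs speech and gesture events that fall within a temporal integration window rather than requiring simultaneity. Below is a minimal sketch of such pairing; the event format, all names, and the 4-second window are illustrative assumptions, not part of the lecture material.

```python
# Sketch: pairing speech and gesture events that need not overlap in time.
# Names and the window size are invented; real systems tune the window
# empirically (and, per myth 3, often find gestures preceding speech).
from dataclasses import dataclass

@dataclass
class InputEvent:
    modality: str   # "speech" or "gesture"
    content: str
    t_start: float  # seconds
    t_end: float

def pair_events(events, max_gap=4.0):
    """Pair each speech event with the closest overlapping or preceding
    gesture that ended at most max_gap seconds earlier."""
    speech = [e for e in events if e.modality == "speech"]
    gestures = [e for e in events if e.modality == "gesture"]
    pairs = []
    for s in speech:
        candidates = [g for g in gestures
                      if s.t_start - g.t_end <= max_gap and g.t_start <= s.t_end]
        if candidates:
            pairs.append((min(candidates, key=lambda g: abs(s.t_start - g.t_end)), s))
    return pairs

events = [
    InputEvent("gesture", "circle on map", 0.0, 1.2),
    InputEvent("speech", "zoom in here", 1.8, 3.0),  # follows the gesture
]
print(pair_events(events))
```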
Myth 4: Speech is the primary input mode in any multimodal system that includes it, and gestures, head and body movement, gaze direction and other input are secondary.
• Often speech cannot carry all the information (cf. the combination of pen + speech)
• Gestures are better for some kinds of information
• Often gestures provide the context for speech
Myth 5: Multimodal language does not differ linguistically from unimodal language.
• Users often avoid complicated commands in multimodal interaction
• Multimodal language is often shorter, syntactically simpler, and more fluent
  – Unimodal: "place a boat dock on the east, no, west end of Reward Lake"
  – Multimodal: [draws rectangle] "add rectangle"
• Multimodal language is easier to process
  – Less anaphora and indirectness
Myth 6: Multimodal integration involves redundancy of content between modes.
• Different modalities contribute complementary information:
  – Speech: subject, object, verb (objects, actions/operations)
  – Gesture: location (spatial information)
• Even in the case of corrections, only 1% redundancy
Myth 7: Individual error-prone recognition technologies combine multimodally to produce even greater unreliability.
• Combining inputs enables mutual disambiguation
• Users choose the least error-prone modality ("leveraging from users' natural intelligence about when and how to deploy input modes effectively")
• Combining error-prone modalities in fact yields a more stable system
Myth 8: All users' multimodal commands are integrated in a uniform way.
• Integration patterns differ between users
• Individual users are consistent in their own pattern
• Detecting a user's integration pattern in advance can improve recognition
Myth 9: Different input modes are capable of transmitting comparable content (the "alt-mode" hypothesis).
• Modalities differ in:
  – Type of information
  – Functionality during communication
  – Accuracy of expression
  – Manner of integration with other modalities
Myth 10: Enhanced speed and efficiency are the main advantages of multimodal systems.
This applies (to a limited extent) in the spatial domain:
• In multimodal pen/speech interaction, speed increases by approx. 10%
More important advantages lie elsewhere:
• Errors and disfluent speech decrease by 35-50%
• Users can choose their input modality:
  – Less fatigue per modality
  – Better opportunities for repair
  – Larger range of users
Advantages: Robustness
• Individual signal-processing technologies are error-prone
• Integrating complementary modalities yields synergy, capitalizing on the strengths of each modality and overcoming weaknesses in the other:
  – Users select the input mode that they consider less error-prone for particular lexical content
  – Users' language is simplified when interacting multimodally
  – Users tend to switch modes after system errors, facilitating error recovery
  – Users report less frustration when interacting multimodally (greater sense of control)
  – Mutual compensation/disambiguation
Technologies: Types of multimodality
W3C (see http://www.w3.org/TR/mmi-reqs/ ), seen from the perspective of the system (i.e. how the input is handled):
• Sequential multimodal input
  Modality A for action a, then modality B for action b; each event is handled as a separate event
• Simultaneous (uncoordinated) multimodal input
  Each event is handled as a separate event; choice between different modalities at each moment in time
• Composite (coordinated simultaneous) multimodal input
  Events are integrated into a single event before interpretation ("true" multimodality)
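The three categories differ mainly in whether events are interpreted one at a time or merged before interpretation. The sketch below illustrates that difference; the event format, the time-window merging rule, and all function names are assumptions for illustration (the W3C document states requirements, not an API).

```python
# Illustrative sketch of the three W3C input categories.
# Event format, window size, and function names are invented.

def interpret(event):
    print("interpreting:", event)

def merge(group):
    # Combine co-occurring events into one composite event.
    return {"type": "composite", "parts": group}

def handle_sequential(events):
    # One modality at a time; every event interpreted on its own.
    for e in events:
        interpret(e)

def handle_simultaneous(events):
    # Modalities may be active at the same time, but each event is still
    # interpreted separately (uncoordinated).
    for e in sorted(events, key=lambda e: e["time"]):
        interpret(e)

def handle_composite(events, window=2.0):
    # Events close together in time are merged BEFORE interpretation
    # ("true" multimodality).
    events = sorted(events, key=lambda e: e["time"])
    group = [events[0]]
    for e in events[1:]:
        if e["time"] - group[-1]["time"] <= window:
            group.append(e)
        else:
            interpret(merge(group))
            group = [e]
    interpret(merge(group))

handle_composite([{"modality": "pen", "time": 0.2, "content": "circle"},
                  {"modality": "speech", "time": 0.5, "content": "zoom in here"}])
```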
Coutaz & Nigay taxonomy, with W3C equivalents (dimensions: temporal use × coordination):
• Sequential + non-coordinated (W3C: supplementary): Exclusive (W3C: sequential)
• Sequential + coordinated (W3C: complementary): Alternate
• Simultaneous + non-coordinated: Concurrent (W3C: simultaneous)
• Simultaneous + coordinated: Synergistic (W3C: composite)
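Read as a lookup over its two dimensions, the taxonomy could be encoded as follows; this is a plain restatement of the table above, not code from Coutaz & Nigay or the W3C.

```python
# The Coutaz & Nigay taxonomy as a lookup over (temporal use, coordination).
TAXONOMY = {
    ("sequential", "non-coordinated"): "exclusive (W3C: sequential)",
    ("sequential", "coordinated"): "alternate",
    ("simultaneous", "non-coordinated"): "concurrent (W3C: simultaneous)",
    ("simultaneous", "coordinated"): "synergistic (W3C: composite)",
}

print(TAXONOMY[("simultaneous", "coordinated")])  # synergistic (W3C: composite)
```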
Mutual disambiguation (MD)
• Speech input: n-best list
  1. Ditch
  2. Ditches
• Gestural input
• Joint interpretation:
  1. Ditches
• The benefit may depend on the situation (e.g. larger for non-native speakers)
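A minimal sketch of how joint interpretation might re-rank the speech n-best list: the gesture (two marks drawn on the map) is compatible only with the plural hypothesis, so "ditches" wins even though speech ranked "ditch" first. The scores and the compatibility rule are invented for illustration.

```python
# Sketch of mutual disambiguation: the gesture re-ranks the speech n-best
# list. Scores and the compatibility rule are made-up illustrations.

speech_nbest = [("ditch", 0.55), ("ditches", 0.45)]  # hypotheses with scores
gesture = {"type": "area", "marks": 2}               # two marks drawn on map

def compatible(word, gesture):
    # Illustrative rule: a plural noun fits a gesture with multiple marks.
    wants_plural = gesture["marks"] > 1
    return word.endswith("s") == wants_plural

joint = [(w, s) for (w, s) in speech_nbest if compatible(w, gesture)]
best = max(joint, key=lambda ws: ws[1]) if joint else max(speech_nbest, key=lambda ws: ws[1])
print(best)  # ('ditches', 0.45): the gesture overrides the speech ranking
```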
Early fusion
• Closely coupled and synchronized modalities, such as speech and lip movements
• "Feature-level" fusion
• Based on multiple hidden Markov models or temporal neural networks; the correlation structure between modes can be taken into account automatically via learning
• Problems: modelling complexity, computational intensity, training difficulty
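At the feature level, the streams are combined before any recognition takes place, for example by concatenating synchronized audio and lip-movement feature frames into one observation vector for a single joint model (an HMM or temporal neural network). A toy sketch, with invented feature dimensions:

```python
# Sketch of feature-level (early) fusion: synchronized audio and lip-movement
# feature frames are concatenated into one observation vector for a single
# joint model. Shapes are illustrative only.
import numpy as np

audio_frames = np.random.rand(100, 13)  # e.g. 13 MFCCs per 10 ms frame
lip_frames = np.random.rand(100, 6)     # e.g. 6 lip-shape parameters, same rate

fused = np.concatenate([audio_frames, lip_frames], axis=1)  # shape (100, 19)
print(fused.shape)  # one stream, so one model must cover both modalities
```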
Late fusion
• "Semantic-level" fusion
• Individual recognizers
• Sequential integration
• Advantage: scalable – individual recognizers don't need to be retrained
• Early approaches: a multimodal command's posterior probability is the cross-product of the posterior probabilities of the associated constituents
→ No advantage taken of the mutual-compensation phenomenon
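A worked toy example of the early cross-product approach; the numbers are invented. Note how it inherits the weakness named above: a constituent's low posterior cannot be compensated by evidence from the other modality.

```python
# Sketch of the early late-fusion approach: the posterior of a multimodal
# command is the product of its constituents' posteriors. Numbers invented.
speech_posteriors = {"add rectangle": 0.6, "add triangle": 0.4}
gesture_posteriors = {"rectangle-at-(10,20)": 0.7, "triangle-at-(10,20)": 0.3}

joint = {
    (c, g): ps * pg
    for c, ps in speech_posteriors.items()
    for g, pg in gesture_posteriors.items()
}
best = max(joint, key=joint.get)
print(best, joint[best])  # ('add rectangle', 'rectangle-at-(10,20)') 0.42
```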
Architectural requirements for late semantic fusion
• Fine-grained timestamping
• Handling of both sequentially integrated and simultaneously delivered input
• Common representational format for the different modalities
• Frame-based representation (multimodal fusion through unification of feature structures)
→ Mutual disambiguation
Unification
[Figure: feature structures derived from the utterance and the gesture are unified into a single interpretation]
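A minimal sketch of feature-structure unification, the mechanism behind frame-based fusion; plain dicts stand in for typed feature structures, and the example frames are illustrative.

```python
# Minimal sketch of feature-structure unification. Dicts stand in for
# typed feature structures; the example frames are invented.

def unify(a, b):
    """Merge two partial frames; fail (return None) on conflicting values."""
    result = dict(a)
    for key, value in b.items():
        if key not in result:
            result[key] = value
        elif isinstance(result[key], dict) and isinstance(value, dict):
            sub = unify(result[key], value)
            if sub is None:
                return None
            result[key] = sub
        elif result[key] != value:
            return None  # conflict: unification fails
    return result

utterance = {"action": "create", "object": {"type": "rectangle"}}
gesture = {"object": {"location": (10, 20)}}
print(unify(utterance, gesture))
# {'action': 'create', 'object': {'type': 'rectangle', 'location': (10, 20)}}
```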
Design of multimodal interfaces
1. Task analysis
   What are the actions that need to be performed?
2. Task allocation
   Which party is the most suitable candidate for performing particular actions?
3. Modality allocation
   Which modality or combination of modalities is best suited for performing particular actions?
The current presentation focuses on step 3.
Definition of 'modality'
• Modality as sensory channel
  However, stating that particular numeric information should be presented in the visual modality provides little guidance
• Hence the notion of 'representational modality' has been proposed (Bernsen), which distinguishes e.g. a table and a graph as two different modalities
• For the time being, we use 'modality' in the more restricted sense of sensory channel, and look for mappings between actions and modalities
Relevant dimensions
• Nature of the information
• Interaction paradigm
• Physical and dialogue context
• Platform
• Accessibility
• Multitasking
Rules of thumb, heuristics
• Michaelis and Wiggins (1982)
• Cohen and Oviatt (1994)
• Suhm (2000)
• Larsson (2003)
• Reeves, Lai et al. (2004)
• For references, see Terken, J., "Guidelines and Tools for the Design of Multimodal Interfaces", Workshop ASIDE 2005, Aalborg (DK)
Michaelis and Wiggins (1982)
• Speech generation is preferable when the
  – message is short.
  – message will not be referred to later.
  – message deals with events in time.
  – message requires an immediate response.
  – visual channels of communication are overloaded.
  – environment is too brightly lit, too poorly lit, subject to severe vibration, or otherwise unsuitable for transmission of visual information.
  – user must be free to move around.
  – user is subjected to high G-forces or anoxia.
• Tentative guidelines for when NOT to use speech may be derived from these suggestions through negation.
Cohen and Oviatt (1994)
• Spoken communication with machines (both input and output) may be advantageous:
  – when the user's hands or eyes are busy
  – when only a limited keyboard and/or screen is available
  – when the user is disabled
  – when pronunciation is the subject matter of computer use
  – when natural language interaction is preferred
Suhm (2000)
Principles for choosing the set of modalities:
1. Consider speech input for entry of textual data, dialogue-oriented tasks, and command & control. Speech input is generally less efficient for navigation, manipulation of image data, and resolution of object references.
2. Consider written input for corrections, entry of digits, and entry of graphical data (formulas, sketches, etc.).
3. Consider gesture input for indicating the scope or type of commands, and for resolving deictic object references.
4. Consider the traditional modalities (keyboard and mouse input) as the alternative, unless the superiority of novel modalities (speech, pen input) is proven.
Further sets of principles:
• Principles to circumvent limitations of recognition technology
• Principles for the implementation of pen-speech interfaces
Larsson (2003)
• Satisfy Real-world Constraints
– Task-oriented Guidelines
– Physical Guidelines
– Environmental Guidelines
• Communicate Clearly, Concisely, and Consistently with Users
– Consistency Guidelines
– Organizational Guidelines
• Help Users Recover Quickly and Efficiently from Errors
– Conversational Guidelines
– Reliability Guidelines
• Make Users Comfortable
– System Status
– Human-memory Constraints
– Social Guidelines
– …
Reeves, Lai et al. (2004)
Propose a set of multimodal design principles that are grounded in the science of perception and cognition (though the motivation remains implicit).
General areas:
• Designing multimodal input and output
• Adaptivity
• Consistency
• Feedback
• Error prevention/handling
Designing Multimodal Input and Output
• Maximize human cognitive and physical abilities. Designers need to determine how to support intuitive, streamlined interactions based on users' human information-processing abilities (including attention, working memory, and decision making). For example:
  – Avoid unnecessarily presenting information in two different modalities when the user must simultaneously attend to both sources to comprehend the material being presented; such redundancy can increase cognitive load at the cost of learning the material.
  – Maximize the advantages of each modality to reduce the user's memory load in certain tasks and situations:
    • system visual presentation coupled with user manual input for spatial information and parallel processing;
    • system auditory presentation coupled with user speech input for state information, serial processing, attention alerting, or issuing commands.
• Integrate modalities in a manner compatible with user preferences, context, and system functionality. Additional modalities should be added to the system only if they improve satisfaction, efficiency, or other aspects of performance for a given user and context. When using multiple modalities:
  – Match output to acceptable user input style (for example, if the user is constrained by a set grammar, do not design a virtual agent to use unconstrained natural language);
  – Use multimodal cues to improve collaborative speech (for example, a virtual agent's gaze direction or gesture can guide user turn-taking);
  – Ensure system output modalities are well synchronized temporally (for example, map-based display and spoken directions, or virtual display and non-speech audio);
  – Ensure that the current system interaction state is shared across modalities and that appropriate information is displayed in order to support:
    • users in choosing alternative interaction modalities;
    • multidevice and distributed interaction.
3. Theoretical approaches
• Modality theory (Bernsen et al.)
  'Modality' defined as 'representational modality'
Modality theory (Bernsen)
Aim
• Given any particular class of task-domain information which needs to be exchanged between user and system during task performance, identify the set of input/output modalities which constitute an optimal solution to the representation and exchange of that information (Bernsen, 2001).
• Taxonomic analyses: (representational) input and output modalities are characterized in terms of a limited number of basic features, such as
  – linguistic/non-linguistic,
  – analogue/non-analogue,
  – arbitrary/non-arbitrary,
  – static/dynamic.
• Modality properties can then be applied according to the following procedure:
  1. Requirements specification →
  2. Modality properties + natural intelligence →
  3. Advice/insight with respect to modality choice.
• [MP1] Linguistic input/output modalities have interpretational scope, which makes them eminently suited for conveying abstract information. They are therefore unsuited for conveying high-specificity information, including detailed information on spatial manipulation and location.
• [MP2] Linguistic input/output modalities, being unsuited for specifying detailed information on spatial manipulation, lack an adequate vocabulary for describing the manipulations.
• [MP3] Arbitrary input/output modalities impose a learning overhead which increases with the number of arbitrary items to be learned.
• [MP4] Acoustic input/output modalities are omnidirectional.
• [MP5] Acoustic input/output modalities do not require limb (including haptic) or visual activity.
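The procedure above can be read as rule application: each modality property is a predicate over the requirements specification that yields advice when it fires. A toy sketch; the requirement keys are invented, and the advice strings paraphrase MP1, MP3 and MP5 above.

```python
# Sketch of the procedure as rule application. Requirement keys are
# invented; the advice strings paraphrase the modality properties above.

requirements = {"hands_busy": True, "info_type": "spatial", "arbitrary_items": 0}

modality_properties = [
    (lambda r: r["info_type"] == "spatial",
     "MP1/MP2: avoid purely linguistic modalities for detailed spatial information"),
    (lambda r: r["arbitrary_items"] > 10,
     "MP3: this many arbitrary items would impose a learning overhead"),
    (lambda r: r["hands_busy"],
     "MP5: acoustic modalities need no limb or visual activity; consider speech"),
]

advice = [text for rule, text in modality_properties if rule(requirements)]
print("\n".join(advice))
```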
4. Tools
• SMALTO (Bernsen)
• Multimodal property flowchart (Williams et al., 2002)
SMALTO
• Addresses the "speech functionality problem"
• SMALTO was created by taking a large number of claims and findings from the literature on designing speech or speech-centric interfaces, and casting these claims into a structured representation expressing the speech functionality problem
The structured claim format:
• [Combined speech input/output, speech output, or speech input modalities M1, M2 and/or M3 etc.] or [speech modality M1, M2 and/or M3 etc. in combination with non-speech modalities NSM1, NSM2 and/or NSM3 etc.]
• are [useful or not useful]
• for [generic task: GT]
• and/or [speech act type: SA]
• and/or [user group: UG]
• and/or [interaction mode: IM]
• and/or [work environment: WE]
• and/or [generic system: GS]
• and/or [performance parameter: PP]
• and/or [learning parameter: LP]
• and/or [cognitive property: CP]
• and/or [preferable or non-preferable] to [alternative modalities AM1, AM2 and/or AM3 etc.]
• and/or [useful on conditions] C1, C2 and/or C3 etc.
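The template is essentially a structured record. Below is a sketch of how a literature claim might be cast into it, as a dataclass whose fields follow the slots above (only a subset shown); the example claim is invented, not a quote from the SMALTO database.

```python
# Sketch: the SMALTO claim template as a structured record. Field names
# follow the slide's slots (subset); the example claim is illustrative.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class SpeechFunctionalityClaim:
    modalities: list            # speech modalities, possibly with non-speech ones
    useful: bool
    generic_task: Optional[str] = None      # [GT]
    speech_act: Optional[str] = None        # [SA]
    user_group: Optional[str] = None        # [UG]
    work_environment: Optional[str] = None  # [WE]
    conditions: list = field(default_factory=list)

claim = SpeechFunctionalityClaim(
    modalities=["speech input"],
    useful=False,
    generic_task="navigation in image data",
    conditions=["unless hands and eyes are busy"],
)
print(claim)
```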
• SMALTO has been evaluated within the framework of projects involving its creators, and in the DISC project
• Informal evidence indicates that it is difficult to apply for "linguistically naïve" designers because of the way the modality properties are formulated
• This was also the motivation for the Modality Property Flowchart (Williams et al., 2002)
Multimodal property flowchart
• Multimodal interfaces are a particular type of interface
→ The multimodal property flowchart needs to be combined with general usability heuristics for interface design (e.g. Nielsen)
Main points
• Multimodal interfaces match the natural expressivity of human beings
• Taxonomy of multimodal interaction
• Limitations of signal processing in one modality can be overcome by taking into consideration input from another modality (mutual disambiguation)
• Mapping of functionalities onto modalities is not always straightforward
→ support from guidelines and tools