Transcript: Slide 1
Work-Package 5:
Multimodal Processing and Interaction
E-TEAMS Overview
Leaders:
Petros Maragos, ICCS-NTUA
Alexandros Potamianos, TSI-TUC
WP5 Outline: Description of Work in JPA3
T1. Book on Multimodal Processing and Interaction
T2. Audio-Visual Speech Analysis and Recognition
T2.1: Audio-Visual Feature Extraction and Fusion
T2.2: Dynamic Models for AV-ASR, Evaluation
T2.3: Audio-Visual to Articulatory Speech Inversion
T3. Multimodal Integration for MM Analysis & Recognition
T3.1: Video Analysis & Integration of Asynchronous Time-evolving
Modalities
T3.2: Multimodal Saliency
T3.3: Integrated Multimedia Content Analysis
T4. Interfaces to Multimedia
T4.1: Multimodal Dialogue Interfaces
T4.2: Eye-tracking Interfaces for Information Retrieval
T4.3: Mobile Interfaces
T5. Coordination of research and Dissemination of results
WP5-e-Teams
MUSCLE Plenary, Dec. 2006, France
MUSCLE
e-Teams: Goals & Objectives
E-Team 10: Audio-Visual Speech Analysis & Recognition
AV Feature Extraction and Feature Fusion
Dynamical Models for AV-ASR, Evaluation
Audio-Visual to Articulatory Speech Inversion.
E-Team 11: Multimodal Processing & Multimedia Understanding
Video Analysis and Integration of Asynchronous Time-evolving Modalities
Audio-Visual Attention Modeling and Salient Event Detection
Integrated Multimedia Content Analysis
E-Team 12: Multimodal Interfaces
Multimodal Recognition and Dialogue Systems
Mobile Services
Novel Interfaces (Eye-tracking)
e-Team 10: AV Speech Analysis & Recogn.
Partners
P. Maragos, G. Papandreou, A. Katsamanis, V. Pitsikalis (ICCS-NTUA)
Alex Potamianos (TSI-TUC)
Khalid Daoudi, Eduardo Sanchez-Soto (IRIT)
Yves Laprie (INRIA-Parole)
Guillaume Gravier, Patrick Gros (INRIA-Texmex)
Costas Kotropoulos, N. Nikolaidis, I. Pitas (AUTH)
Ron Kimmel (Technion)
e-Team 10: AV Speech Analysis & Recogn.
Research areas include:
Active-Appearance (and other Deformable) Models and Statistical
Approaches for Face (or only mouth area) detection, modelling and
feature extraction
Nonlinear Speech Modelling for better audio & articulatory feature
extraction
A-V Feature Fusion
Audio-visual to Articulatory Speech Inversion
Application areas include:
Audio-Visual Automatic Speech Recognition (including Lip-Reading)
Collection of AV Databases and Evaluations
Applications of AV articulatory Speech Inversion.
e-Team 10: AV Speech Analysis & Recogn.
The main goals of e-team 10 are:
Goal 1: Contribute to the Update of the State-of-Art Surveys of
the WP5 MUSCLE Book
Goal 2: Co-Author New Research Chapters of the WP5
MUSCLE Book
Goal 3: Co-author conference and journal Papers on some focus
theme with multiple MUSCLE partners (improve integration)
Goal 4: Collaboration on common research agendas for AV-ASR
and AV speech inversion
e-Team 10: AV Speech Analysis & Recogn.
Recent Work
Audio-Visual Speech Recognition (TUC, NTUA)
Multimodal Feature Fusion (TUC, IRIT, NTUA)
Audio-Visual Speech Inversion (INRIA-Parole, NTUA, KTHSpeech)
Contribution to MUSCLE Book
AV-ASR showcase proposal
Future Plans
Continued collaboration in aforementioned research areas
Book project: first draft by June
Workshop in Athens: April 2007 (joint with e-team 11,12)
e-Team 11: Multimodal Proc. & Understanding
Partners
P. Maragos, G. Evangelopoulos, K. Rapantzikos, S. Kollias (NTUA)
Patrick Gros, Ewa Kijak, Guillaume Gravier (INRIA-Texmex)
Costas Kotropoulos, N. Nikolaidis, I. Pitas (AUTH)
Andreas Rauber (TU Wien)
Alex Potamianos (TUC)
Sanni Siltanen (VTT)
Fred Stentiford, Wole Oyekoya (UCL)
Enis Cetin (Bilkent)
e-Team 11: Multimodal Proc. & Understanding
Research areas include:
Stochastic modeling with several data streams / several temporal rates /
weakly synchronized data
Audio-Visual Cooperative Feature Extraction and Salient Event
Detection
Audio-visual Dialogue Understanding
Image + Text Integration
Audio + Text integration
Application areas include:
Understand (= structure) TV and other MM documents, and Prepare
these documents for applications (repurposing, archiving)
Event Detection and Segmentation in Sports videos
Salient Event Detection and Dialogue Detection in Movies
Speech Transcription and NLP
Music genre analysis and music retrieval
e-Team 11: Multimodal Proc. & Understanding
The main goals of e-team 11 are:
Goal 1: Contribute to the Update of the State-of-Art Surveys of
the WP5 MUSCLE Book.
Goal 2: Co-Author New Research Chapters of the WP5
MUSCLE Book.
Goal 3: Co-author conference and journal Papers on some focus
theme with multiple MUSCLE partners (improve integration).
Goal 4: Collaboration on a common research agenda for
multimodal feature fusion, saliency detection and multimodal
processing.
e-Team 11: Multimodal Proc. & Understanding
Recent Work
Annotated Movie Information Database (AUTH)
Audio-Visual Saliency Detection (AUTH, INRIA-Texmex,
NTUA, TUC)
Contribution to MUSCLE Book (NTUA, TUC, AUTH, INRIA-Texmex, TU Wien, Bilkent)
Movie summarization showcase proposal
Future Plans
Closer collaboration between partners on common movie DB
Book project: first draft by June
Workshop in Athens: April 2007 (joint with e-teams 10, 12)
e-Team 12: Multimodal Interfaces
Partners:
Alex Potamianos, Manolis Perakakis, Michalis Toutoudakis, TUC
Petros Maragos, Nassos Katsamanis, George Papandreou, NTUA
Sanni Siltanen, Santtu Toivonen, VTT
Fred Stentiford, UCL
Ugur Gudukbay, Ozgur Ulusoy, Enis Cetin, Yigithan Dedeoglu, Serkan
Genc, Bilkent University
Costas Kotropoulos, AUTH
Andreas Rauber, TU Wien
e-Team 12: Multimodal Interfaces
Research areas:
multimodality
annotation of multimedia databases
search
interface efficiency
eye-tracking interfaces
speech interfaces
mobile interfaces
Application areas:
search/information retrieval on image and video databases
search/information retrieval on the web
information-seeking spoken dialogue systems
mobile services portal/applications
search/information retrieval for audio data
e-Team 12: Multimodal Interfaces
The main goals of e-team 12 are:
Goal 1: Contribute to the Update of the State-of-Art Surveys of
the WP5 MUSCLE Book.
Goal 2: Co-Author New Research Chapters of the WP5
MUSCLE Book.
Goal 3: Co-author conference and journal Papers on some focus
theme with multiple MUSCLE partners (improve integration).
Goal 4: Collaboration on a common research agenda for
multimodal feature fusion, saliency detection and multimodal
processing.
e-Team 12: Multimodal Interfaces
Recent Work
Multimodal Spoken Interfaces (TUC, NTUA)
Mobile Interfaces (TUC, VTT)
Contribution to MUSCLE Book (TUC, UCL, VTT)
“Augmented assembly using a multimodal interface” showcase
proposal
Future Plans
Improve integration/collaboration between partners
Book project: first draft by June
Workshop in Athens: April 2007 (joint with e-teams 10, 11)
BOOK
Title: Multimodal Processing and Interaction: Audio, Video, Text
Contents:
State-of-Art Reviews of WP6 + WP10 (updated)
Contributed Research Chapters: New Work
Agenda:
Scope and Thematic Areas discussed during Audio-Conf & Meetings
Each interested participant emails preliminary title + abstract
Table-of-Contents of selected chapters is discussed with all participants
Publisher is contacted
Multimodal Processing and Interaction:
Audio, Video, Text
PART I: Review of the State-of-the-Art
Cross-Modal Integration for Performance Improving in
Multimedia: State-of-the-Art Review
Human-Computer Interfaces for Multimedia Retrieval:
State-of-the-Art Review
PART II: New Research Directions
Integrated Multimedia Analysis and Recognition
1. Stochastic Models for Multimodal Video Analysis
2. Adaptive Multimodal Fusion by Uncertainty Compensation
with Application to Audiovisual Speech Recognition
3. Movie Analysis with Emphasis to Dialogue Detections
4. Using HMM for Action Recognition in Audio-Visual streams
5. Surveillance Using Both Video and Audio
6. Audiovisual Attention Modeling and Salient Event Detection
Multimodal Processing and Interaction:
Audio, Video, Text
PART II (cont.): New Research Directions
Searching Multimedia Content
1. Interactive Image Retrieval using a Hybrid Visual and
Conceptual Content Representation
2. Multi-Modal Analysis of Text and Audio Features for Music
Information Retrieval
3. Toward the Integration of NLP and ASR: POS Tagging and
Transcription
Interfaces to Multimedia Content
1. Design Principles for Multimodal Spoken Dialogue Systems
2. Eye Tracking for Image Retrieval
3. Natural/ Novel User Interfaces for Mobile Devices
WP5 e-Team Scientific Talks
WP5 e-team 10 scientific talk: "Stream weight
computation for Audio-Visual Speech Recognition", by
Eduardo Sanchez-Soto, IRIT (duration 15')
WP5 e-team 11 scientific talk: "Dialogue Detection in
Movies", by D. Ververidis, AUTH (duration 15')
WP5 e-team 12 scientific talk: "Augmented reality
visualization: Constructing the mobile user interface",
by Sanni Siltanen, VTT (duration 15')
WP5 Scientific Talks (FRIDAY)
WP5 scientific talk: "Multimodal Fusion: Application to
AV-ASR and AV Speech Inversion", by George Papandreou,
NTUA (duration 15')
WP5 scientific talk: "A Natural Language Interface for a
Video Database Management System", by Ugur Gudukbay,
Bilkent U. (duration 15')
WP5 scientific talk: "Modality selection in Multimodal
Dialogue Systems", by Alex Potamianos, TUC (duration 15')