Work-Package 5:
Multimodal Processing and Interaction
E-TEAMS Overview
Leaders:
Petros Maragos, ICCS-NTUA
Alexandros Potamianos, TSI-TUC
WP5 Outline: Description of Work in JPA3
 T1. Book on Multimodal Processing and Interaction
 T2. Audio-Visual Speech Analysis and Recognition
 T2.1: Audio-Visual Feature Extraction and Fusion
 T2.2: Dynamic Models for AV-ASR, Evaluation
 T2.3: Audio-Visual to Articulatory Speech Inversion
 T3. Multimodal Integration for MM Analysis & Recognition
 T3.1: Video Analysis & Integration of Asynchronous Time-evolving
Modalities
 T3.2: Multimodal Saliency
 T3.3: Integrated Multimedia Content Analysis
 T4. Interfaces to Multimedia
 T4.1: Multimodal Dialogue Interfaces
 T4.2: Eye-tracking Interfaces for Information Retrieval
 T4.3: Mobile Interfaces
 T5. Coordination of research and Dissemination of results
WP5-e-Teams
MUSCLE Plenary, Dec. 2006, France
MUSCLE
e-Teams: Goals & Objectives
 E-Team 10: Audio-Visual Speech Analysis & Recognition
 AV Feature Extraction and Feature Fusion
 Dynamical Models for AV-ASR, Evaluation
 Audio-Visual to Articulatory Speech Inversion.
 E-Team 11: Multimodal Processing & Multimedia Understanding
 Video Analysis and Integration of Asynchronous Time-evolving Modalities
 Audio-Visual Attention Modeling and Salient Event Detection
 Integrated Multimedia Content Analysis
 E-Team 12: Multimodal Interfaces
 Multimodal Recognition and Dialogue Systems
 Mobile Services
 Novel Interfaces (Eye-tracking)
e-Team 10: AV Speech Analysis & Recogn.
 Partners
P. Maragos, G. Papandreou, A. Katsamanis, V. Pitsikalis (ICCS-NTUA)
Alex Potamianos (TSI-TUC)
Khalid Daoudi, Eduardo Sanchez-Soto (IRIT)
Yves Laprie (INRIA-Parole)
Guillaume Gravier, Patrick Gros (INRIA-Texmex)
Costas Kotropoulos, N. Nikolaidis, I. Pitas (AUTH)
Ron Kimmel (Technion)
e-Team 10: AV Speech Analysis & Recogn.
 Research areas include:
 Active-Appearance (and other Deformable) Models and Statistical
Approaches for Face (or only mouth area) detection, modelling and
feature extraction
 Nonlinear Speech Modelling for better audio & articulatory feature
extraction
 A-V Feature Fusion
 Audio-visual to Articulatory Speech Inversion
 Application areas include:
 Audio-Visual Automatic Speech Recognition (including Lip-Reading)
 Collection of AV Databases and Evaluations
 Applications of AV articulatory Speech Inversion.
e-Team 10: AV Speech Analysis & Recogn.
 The main goals of e-team 10 are:
 Goal 1: Contribute to the Update of the State-of-Art Surveys of
the WP5 MUSCLE Book
 Goal 2: Co-Author New Research Chapters of the WP5
MUSCLE Book
 Goal 3: Co-author conference and journal Papers on some focus
theme with multiple MUSCLE partners (improve integration)
 Goal 4: Collaboration on common research agendas for AV-ASR
and AV speech inversion
e-Team 10: AV Speech Analysis & Recogn.
 Recent Work
 Audio-Visual Speech Recognition (TUC, NTUA)
 Multimodal Feature Fusion (TUC, IRIT, NTUA)
 Audio-Visual Speech Inversion (INRIA-Parole, NTUA, KTH-Speech)
 Contribution to MUSCLE Book
 AV-ASR showcase proposal
 Future Plans
 Continued collaboration in aforementioned research areas
 Book project: first draft by June
 Workshop in Athens: April 2007 (joint with e-team 11,12)
e-Team 11: Multimodal Proc. & Understanding
 Partners
P. Maragos, G. Evangelopoulos, K. Rapantzikos, S. Kollias (NTUA)
Patrick Gros, Ewa Kijak, Guillaume Gravier (INRIA-Texmex)
Costas Kotropoulos, N. Nikolaidis, I. Pitas (AUTH)
Andreas Rauber (TU Wien)
Alex Potamianos (TUC)
Sanni Siltanen (VTT)
Fred Stentiford, Wole Oyekoya (UCL)
Enis Cetin (Bilkent)
e-Team 11: Multimodal Proc. & Understanding
 Research areas include:
 Stochastic modeling with several data streams / several temporal rates /
weakly synchronized data
 Audio-Visual Cooperative Feature Extraction and Salient Event
Detection
 Audio-visual Dialogue Understanding
 Image + Text Integration
 Audio + Text integration
 Application areas include:
 Understand (= structure) TV and other MM documents, and Prepare
these documents for applications (repurposing, archiving)
 Event Detection and Segmentation in Sports videos
 Salient Event Detection and Dialogue Detection in Movies
 Speech Transcription and NLP
 Music genre analysis and music retrieval
e-Team 11: Multimodal Proc. & Understanding
 The main goals of e-team 11 are:
 Goal 1: Contribute to the Update of the State-of-Art Surveys of
the WP5 MUSCLE Book.
 Goal 2: Co-Author New Research Chapters of the WP5
MUSCLE Book.
 Goal 3: Co-author conference and journal Papers on some focus
theme with multiple MUSCLE partners (improve integration).
 Goal 4: Collaboration on a common research agenda for
multimodal feature fusion, saliency detection and multimodal
processing.
e-Team 11: Multimodal Proc. & Understanding
 Recent Work
 Annotated Movie Information Database (AUTH)
 Audio-Visual Saliency Detection (AUTH, INRIA-Texmex,
NTUA, TUC)
 Contribution to MUSCLE Book (NTUA, TUC, AUTH, INRIA-Texmex, TU Wien, Bilkent)
 Movie summarization showcase proposal
 Future Plans
 Closer collaboration between partners on common movie DB
 Book project: first draft by June
 Workshop in Athens: April 2007 (joint with e-team 10,12)
e-Team 12: Multimodal Interfaces
 Partners:
Alex Potamianos, Manolis Perakakis, Michalis Toutoudakis, TUC
Petros Maragos, Nassos Katsamanis, George Papandreou, NTUA
Sanni Siltanen, Santtu Toivonen, VTT
Fred Stentiford, UCL
Ugur Gudukbay, Ozgur Ulusoy, Enis Cetin, Yigithan Dedeoglu, Serkan
Genc, Bilkent University
 Costas Kotropoulos, AUTH
 Andreas Rauber, TU Wien
e-Team 12: Multimodal Interfaces
 Research areas:
multimodality
annotation of multimedia databases
search
interface efficiency
eye-tracking interfaces
speech interfaces
mobile interfaces
 Application areas:
search/information retrieval on image and video databases
search/information retrieval on the web
information-seeking spoken dialogue systems
mobile services portal/applications
search/information retrieval for audio data
e-Team 12: Multimodal Interfaces
 The main goals of e-team 12 are:
 Goal 1: Contribute to the Update of the State-of-Art Surveys of
the WP5 MUSCLE Book.
 Goal 2: Co-Author New Research Chapters of the WP5
MUSCLE Book.
 Goal 3: Co-author conference and journal Papers on some focus
theme with multiple MUSCLE partners (improve integration).
 Goal 4: Collaboration on a common research agenda for
multimodal feature fusion, saliency detection and multimodal
processing.
e-Team 12: Multimodal Interfaces
 Recent Work
 Multimodal Spoken Interfaces (TUC, NTUA).
 Mobile Interfaces (TUC, VTT)
 Contribution to MUSCLE Book (TUC, UCL, VTT)
 “Augmented assembly using a multimodal interface” showcase
proposal
 Future Plans
 Improve integration/collaboration between partners
 Book project: first draft by June
 Workshop in Athens: April 2007 (joint with e-team 10,11)
BOOK
 Title: Multimodal Processing and Interaction: Audio, Video, Text
 Contents:
 State-of-Art Reviews of WP6 + WP10 (updated)
 Contributed Research Chapters: New Work
 Agenda:
 Scope and Thematic Areas discussed during Audio-Conf & Meetings
 Each interested participant emails preliminary title + abstract
 Table-of-Contents of selected chapters is discussed with all participants
 Publisher is contacted
Multimodal Processing and Interaction:
Audio, Video, Text
 PART I: Review of the State-of-the-Art
 Cross-Modal Integration for Performance Improving in
Multimedia: State-of-the-Art Review
 Human-Computer Interfaces for Multimedia Retrieval:
State-of-the-Art Review
 PART II: New Research Directions
 Integrated Multimedia Analysis and Recognition
1. Stochastic Models for Multimodal Video Analysis
2. Adaptive Multimodal Fusion by Uncertainty Compensation
with Application to Audiovisual Speech Recognition
3. Movie Analysis with Emphasis to Dialogue Detections
4. Using HMM for Action Recognition in Audio-Visual streams
5. Surveillance Using Both Video and Audio
6. Audiovisual Attention Modeling and Salient Event Detection
Multimodal Processing and Interaction:
Audio, Video, Text
 PART II (cont.): New Research Directions
 Searching Multimedia Content
1. Interactive Image Retrieval using a Hybrid Visual and
Conceptual Content Representation
2. Multi-Modal Analysis of Text and Audio Features for Music
Information Retrieval
3. Toward the Integration of NLP and ASR: POS Tagging and
Transcription
 Interfaces to Multimedia Content
1. Design Principles for Multimodal Spoken Dialogue Systems
2. Eye Tracking for Image Retrieval
3. Natural/ Novel User Interfaces for Mobile Devices
WP5 e-Team Scientific Talks
 WP5 e-team 10 scientific talk: "Stream weight
computation for Audio-Visual Speech Recognition", by
Eduardo Sanchez-Soto, IRIT (duration 15')
 WP5 e-team 11 scientific talk: "Dialogue Detection in
Movies", by D. Ververidis, AUTH (duration 15')
 WP5 e-team 12 scientific talk: "Augmented reality
visualization: Constructing the mobile user interface",
by Sanni Siltanen, VTT (duration 15')
WP5 Scientific Talks (FRIDAY)
 WP5 scientific talk: "Multimodal Fusion: Application to
AV-ASR and AV Speech Inversion", by George Papandreou,
NTUA (duration 15')
 WP5 scientific talk: "A Natural Language Interface for a
Video Database Management System", by Ugur Gudukbay,
Bilkent U. (duration 15')
 WP5 scientific talk: "Modality selection in Multimodal
Dialogue Systems", by Alex Potamianos, TUC (duration 15')