Multimedia Information for
Personal and Mobile Environments
© Copyright 2002 Michael G. Christel and Alexander G. Hauptmann
Carnegie Mellon
Outline
• Technological Background
• Informedia Digital Video Library
• Digital Human Memory
• Information in Mobile Environments
• Prototype Demonstration
• Conclusions
Technological Impetus
• Standardized digital formats
• Increased bandwidth and interconnectivity
• Faster and cheaper computers
• Efficient data compression
• Strong digital security
• Ubiquitous mobility of data
• Inexpensive data storage
Media and Information Convergence
Television
Radio
Print
Telephony
Internet
Shared Media
Informedia System Overview
[Figure: Informedia system diagram]
Library Creation (offline)
– Inputs: Video, Audio, Text
– Processing: Digital Compression, Speech Recognition, Image Extraction, Natural Language Interpretation, Segmentation
– Output: Indexed Database (Indexed Transcript; Segmented Compressed Audio/Video)
Library Exploration (online)
– Spoken Natural Language Query, Semantic Expansion, Story Choices, Requested Segment
– Served from the Indexed Database (Indexed Transcript; Segmented Compressed Audio/Video)
DISTRIBUTION TO USERS
Digital Human Memory
– Technology for creating a continuously recorded,
digital, high fidelity record of one’s whole life in video
form
– Personal, wearable units which record audio, video,
GPS and electronic communications; capturing all that
is heard, seen & experienced
– Transforming this personal history into a meaningful,
accessible information resource with auto-search and
auto-summarization
Overview of Human Experience Capture
[Figure: overview diagram of human experience capture]
EXPERIENCE COLLECTION: GPS, Video, Biometrics, Sensor, PDA
INFORMATION EXTRACTION: Location, Transcript, Event, Topics, Signage, People, Organizations, Objects, Time, Sounds
Database: Searching, Filtering, Browsing, Exploring Experience
INFORMATION SYNTHESIS: Panorama, Faces & Names (George Smith, Susan White), Social and Organizational Networks, Composite Map, Time-based Analysis, Casualty report, Sensor Alert, Building search
PRIVACY TECHNOLOGY & POLICY
Informedia Digital Human Memory
– Extension of information extraction and search
technology to self-recorded video
– Adds position (GPS) as an information dimension
– Decreases data quality from broadcast TV to field
capture
– Transitions from structured content to unbounded
continuous media
– Utilizing mobile, wearable devices
for capturing images, video, audio and location
Feasible Goal? - How Much Data?
Estimated lifetime storage requirements (Bell and Gray, Microsoft Research)

Data type                      Rate (bytes/hour)   Per day / per 3 years   Lifetime amount
read text, few pictures        200 KB              2–10 MB / GB            60–300 GB
Email, papers, written text    -                   0.5 MB / GB             15 GB
photos w/voice @100KB          200 KB              2 MB / GB               60 GB
photos @200 KB                 Ten images/day      2 MB / GB               150 GB
spoken text @120wpm            43 KB               0.5 MB / GB             15 GB
spoken text @8Kbps             3.6 MB              40 MB / GB              1.2 TB
music or high quality sound    60 MB               60 MB / GB              1.8 TB
video-lite 50Kb/s POTS         22 MB               0.25 GB / TB            7.5 TB
video 200Kb/s VHS-lite         90 MB               1 GB / TB               30 TB
DVD video 4.3Mb/s              1.8 GB              20 GB / TB              600 TB
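As a rough check on the Rate column, a bit rate converts to bytes per hour by dividing by 8 and multiplying by 3,600. The sketch below is illustrative only: the function names are made up here, and the 6 bytes per spoken word is an assumption chosen because it reproduces the 43 K figure, not a value stated on the slide.

```python
def bytes_per_hour(bits_per_second):
    """Convert a constant bit rate to bytes recorded in one hour."""
    return bits_per_second / 8 * 3600


def spoken_text_bytes_per_hour(words_per_minute, bytes_per_word=6):
    """Bytes of transcript per hour; bytes_per_word is an assumed average."""
    return words_per_minute * 60 * bytes_per_word


if __name__ == "__main__":
    # Bit rates taken from the data-type labels in the table.
    for label, bps in [
        ("spoken text @8Kbps", 8_000),
        ("video-lite 50Kb/s", 50_000),
        ("video 200Kb/s", 200_000),
        ("DVD video 4.3Mb/s", 4_300_000),
    ]:
        print(f"{label}: {bytes_per_hour(bps):,.0f} bytes/hour")
    print(f"spoken text @120wpm: {spoken_text_bytes_per_hour(120):,} bytes/hour")
```

The 8 Kbps, 50 Kb/s and 200 Kb/s rows come out at 3.6 MB, 22.5 MB and 90 MB per hour, in line with the table. The DVD row comes out at about 1.9 billion bytes per hour, close to the 1.8 G listed.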
DHMM Auto-Generated Timeline Biography
[Figure: auto-generated timeline. Dated entries include June 1972 (Honeymoon in St. Thomas), September 1980 (Susan’s first day of preschool), May 1986, and June 1996 (Family reunion in Pennsylvania). Other labels: Planning for a bigger house; Paul graduates from college; Paul’s college years; The early years of our marriage.]
Wearable Experience Capture System
Camera
Microphones
GPS
Battery
Notebook
Remembering Conversations
• A wearable, personalized Informedia system
– listens to and transcribes the wearer’s part of a
conversation
– recognizes the face of the current dialog partner and
– remembers his/her voice
• The next time it encounters the same person’s face or voice, it
– replays the last conversation in compressed form: the
names and major issues that were mentioned
– All of this happens unobtrusively, a first step towards
an intelligent assistant
Remembering Conversations
• Record
– The audio and video of the conversation
• Index
– The face features of the conversation partner
– The voice features of the conversation partner
• Retrieve
– Query with the face and/or the voice
– Return the summary of the last conversations via headset
• Silence removal, emphasis detection
• Sort segments using TF.IDF scheme
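The slide does not say how the TF.IDF scheme is applied to conversation segments. One plausible reading, sketched below, scores each transcribed segment by the summed TF-IDF weight of its words and plays back the highest-scoring segments first. The whitespace tokenizer, the function names, and the sample sentences are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_scores(segments):
    """Score each text segment by the sum of TF-IDF weights of its words."""
    tokenized = [seg.lower().split() for seg in segments]
    n = len(tokenized)
    # Document frequency: in how many segments each word appears.
    df = Counter()
    for words in tokenized:
        df.update(set(words))
    scores = []
    for words in tokenized:
        if not words:
            scores.append(0.0)
            continue
        tf = Counter(words)
        scores.append(sum((count / len(words)) * math.log(n / df[w])
                          for w, count in tf.items()))
    return scores


def rank_segments(segments):
    """Return segments sorted from most to least distinctive."""
    scores = tfidf_scores(segments)
    order = sorted(range(len(segments)), key=lambda i: scores[i], reverse=True)
    return [segments[i] for i in order]


if __name__ == "__main__":
    talk = ["the the the", "the budget deadline", "the"]
    print(rank_segments(talk)[0])
```

Words that occur in every segment get an inverse document frequency of zero, so filler contributes nothing and segments that mention distinctive names or issues rise to the top.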
Data Example
[Figure: example data, with panels labeled Training and Testing]
Results
Method                                   Average Rank
Schneiderman + normalized eigenfaces     3.42
Visionics face recognition               3.33
Speaker identification by similarity     3.92
Speaker identification by pitch          6.22
Summing up every classifier              3.87
SVM meta-classifier                      2.61
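To make the "summing up every classifier" row and the average-rank measure concrete, here is a minimal sketch. It assumes each classifier returns a score per candidate identity (higher is better, on comparable scales) and that "average rank" means the mean position of the correct person in the sorted candidate list. Neither assumption is confirmed by the slides; the names and scores are invented, and the SVM meta-classifier is not reproduced here.

```python
def sum_fusion(score_tables):
    """Combine classifiers by adding their per-identity scores."""
    fused = {}
    for table in score_tables:
        for identity, score in table.items():
            fused[identity] = fused.get(identity, 0.0) + score
    return fused


def rank_of(identity, scores):
    """1-based rank of identity when candidates are sorted by descending score."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return ordered.index(identity) + 1


def average_rank(trials):
    """Mean rank of the true identity over (true_identity, scores) trials."""
    return sum(rank_of(truth, scores) for truth, scores in trials) / len(trials)


if __name__ == "__main__":
    # Hypothetical scores for one query against three enrolled people.
    face = {"alice": 0.6, "bob": 0.3, "carol": 0.1}
    voice = {"alice": 0.3, "bob": 0.5, "carol": 0.2}
    fused = sum_fusion([face, voice])
    print(rank_of("alice", voice), rank_of("alice", fused))
```

A lower average rank is better: a value of 1.0 would mean the correct person was always the top candidate.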
Problem Dimensions
• Personal Memory Collection
– Unobtrusive devices with sufficient bandwidth and storage capacity
• Personal Memory Analysis
– ‘Hostile’ environment speech recognition & image extraction
– Language processing for information filtering and segmentation
– What is important, possible to recognize & understand and how?
• (Collective) Summarization, Learning & Presentation
– Data mining over datasets with different qualities of metadata
– Generalization and formation of knowledge
– Interactive, dynamic information structured in real time
• Data Security and Rights
Missing Science
• Dealing with highly errorful data!
– Rules don’t scale well, statistics do better
– Learning to live with a 30% accuracy in underlying data
– Integrate analysis and coaching systems, not independent modules
• Robust Video Understanding
– Face and Object Detection and Recognition
– Scene Analysis
– Improved Speeds: Currently minutes per video frame
• Robust Audio Analysis
– Non-speech sounds
– Speaker ID and conversational speech recognition ‘at a distance’
• Interfaces
– Unobtrusive, usable while mobile, partial attention
– Sustainable over long periods of time
Outcomes
• Perfect memory and digital immortality for an
individual
– A form of personal memory serving as a personal assistant
– Sharing memories and experiences
• Expertise synthesized across individuals and maintained
over generations
– Enables the creation of example-based learning environments
and expert archives
• The establishment of a historical sense of truth
– Re-establishing a previous event from multiple captured
personal histories
Collaborative Perspectives
View B
Panoramic View
Collaborative Perspectives
• Collect data from multiple cameras in an
asynchronous fashion
• Create collaborative views (panorama of a
virtual camera) using the data from multiple
cameras
• Communicate the composite view back to the
participating viewers
• Enable the user to change the position and
viewing angle of the virtual camera
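The slides do not describe how the virtual camera is rendered. As one simple way to picture the last step, the sketch below assumes a full 360° cylindrical panorama has already been stitched, with column 0 facing north and angles increasing toward the east. It then maps a viewing direction and field of view to the pixel columns to display. The function name and the panorama width are hypothetical.

```python
def view_columns(pano_width, heading_deg, fov_deg):
    """Return (start, end) column ranges, end exclusive, covering the view.

    Two ranges are returned when the view wraps past the panorama's seam.
    """
    px_per_deg = pano_width / 360.0
    left_edge_deg = (heading_deg - fov_deg / 2.0) % 360.0
    start = int(round(left_edge_deg * px_per_deg)) % pano_width
    width = min(pano_width, int(round(fov_deg * px_per_deg)))
    end = start + width
    if end <= pano_width:
        return [(start, end)]
    return [(start, pano_width), (0, end - pano_width)]


if __name__ == "__main__":
    # Looking 60 degrees east of north with a 40-degree field of view.
    print(view_columns(3600, 60, 40))
    # Zooming out corresponds to a wider field of view.
    print(view_columns(3600, 70, 80))
```

Changing the heading pans the view along the panorama, and widening the field of view plays the role of zooming out.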
Sequence 1: Mount Washington
Panorama of Sequence 1
Sequence 2: Mount Washington
Panorama of Sequence 2
Sequence 3: Mount Washington
Panorama of Sequence 3
Panorama 1 of Virtual Camera
(60° East from North)
Panorama 1 of Virtual Camera
(50° East from North)
Panorama 1 of Virtual Camera
(70° East from North, Zoom Out)
Mobile Multimedia Search and Retrieval
• Informational and educational multimedia resources
– Museums and libraries
– Schools
– Tourist information
– Advertising
– Personal video
• Indexed and searchable multimedia content
– Location, text, pattern-matching and spoken queries
• Delivered on-demand to mobile users
Searching Multimedia Content While in a Car
– Use Informedia technology:
• Index video into searchable video library
• User-directed, manual geo-spatial searches
• Allow keyword searches for video content
• Include produced video AND self-recorded video
– Combine with on-demand video broadcasting
• Alternatively use multimedia ‘fillup’
– GPS location of car
– Automatic geo-spatial search by location/route
– Search updated by time/distance traveled
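The automatic geo-spatial search could work roughly as follows: keep a latitude/longitude with each indexed clip, return the clips within some radius of the car's GPS fix, and rerun the search once the car has moved far enough. The sketch below is not the Informedia implementation. The clip records, the coordinates (approximate and illustrative), the radius, and the refresh threshold are all assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearby_clips(clips, car_lat, car_lon, radius_km):
    """Titles of clips within radius_km of the car, nearest first."""
    hits = []
    for clip in clips:
        d = haversine_km(car_lat, car_lon, clip["lat"], clip["lon"])
        if d <= radius_km:
            hits.append((d, clip["title"]))
    return [title for _, title in sorted(hits)]


def needs_refresh(last_fix, current_fix, min_km=1.0):
    """True once the car has moved at least min_km since the last search."""
    return haversine_km(*last_fix, *current_fix) >= min_km


if __name__ == "__main__":
    clips = [
        {"title": "Mount Washington overlook", "lat": 40.4313, "lon": -80.0170},
        {"title": "Far-away documentary", "lat": 40.0, "lon": -75.0},
    ]
    print(nearby_clips(clips, 40.4406, -79.9959, radius_km=5.0))
```

A route-based search would apply the same distance test to points sampled along the planned route instead of the single current position.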
Content Threshold by Location
Demonstration Project with General Motors
[Figure: network diagram. Components: Video Client; two Wireless Access Points on high-speed 802.11a (30 Mbps, 2.4 GHz); Gateway; Multimedia Gateway; Multimedia Server; Web Server; Internet]
Virtual On-the-Road Demonstration … in Pittsburgh
Sources of Location-based Information
• Internet
– Usually only in text form, no video or audio
– Need Text-to-Speech (TTS), commercially available
– Frequent crawling of time-sensitive sites
• Local television
• Documentaries covering the region
• In the future: special production of location-relevant multimedia
– for advertising
– other information, e.g. traffic cameras
Location Information - Pittsburgh
• 20 web sites with calendars, events
– Convert using text-to-speech technology
• 5 NPR Documentaries about Pittsburgh
with many interesting details
• Privately recorded video with geographically
coded location from GPS
Current Work (with General Motors)
• Merge location information with
news, music, audio books
– Only one audio/video channel for all information
• User preferences to select categories of
information
– Learn preferences by observation
• Emphasize audio to avoid distracting driver
• Require very simple interface
Open Questions
• A business model for mobile multimedia?
– Who will pay? Advertisers, drivers, car makers?
• For which people is it most useful?
• Infrastructure development?
• Interface issues
– Head-up display on the windshield
– Buttons
– Speech or gesture interface
Conclusions
• Mobile technology is advancing rapidly
• Location is a central aspect of mobility
• Multimedia data can be transmitted to mobile users
• Collecting and searching multimedia while mobile
• Prototype for search and retrieval of multimedia
information in a mobile environment
– Based on location
– Based on active queries