Multimedia Information for Personal and Mobile Environments
© Copyright 2002 Michael G. Christel and Alexander G. Hauptmann, Carnegie Mellon

Outline
• Technological Background
• Informedia Digital Video Library
• Digital Human Memory
• Information in Mobile Environments
• Prototype Demonstration
• Conclusions

Technological Impetus
• Standardized digital formats
• Increased bandwidth and interconnectivity
• Faster and cheaper computers
• Efficient data compression
• Strong digital security
• Ubiquitous mobility of data
• Inexpensive data storage

Media and Information Convergence
[Diagram labels: Television, Radio, Print, Telephony, Internet, Shared Media]

Informedia System Overview
[Diagram]
– Library Creation (offline): Video, Audio, Text; Digital Compression, Speech Recognition, Image Extraction, Natural Language Interpretation, Segmentation; Indexed Database, Indexed Transcript, Segmented Compressed Audio/Video
– Distribution to users
– Library Exploration (online): Spoken Natural Language Query, Semantic Expansion, Story Choices, Requested Segment; Indexed Database, Indexed Transcript, Segmented Compressed Audio/Video

Digital Human Memory
– Technology for creating a continuously recorded, digital, high-fidelity record of one’s whole life in video form
– Personal, wearable units that record audio, video, GPS and electronic communications, capturing all that is heard, seen and experienced
– Transforming this personal history into a meaningful, accessible information resource with auto-search and auto-summarization
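As a rough feasibility check on recording a whole life, the sketch below converts a constant bit rate into bytes per hour and scales a daily volume to a lifetime. The bit rates are the ones quoted in this deck's lifetime storage estimates (8 Kbps speech, 50 Kb/s and 200 Kb/s video, 4.3 Mb/s DVD video). The function names and the 80-year lifetime are assumptions made for illustration and are not taken from Bell and Gray.

```python
SECONDS_PER_HOUR = 3600


def bytes_per_hour(bits_per_second: float) -> float:
    """Convert a constant bit rate into bytes stored per hour."""
    return bits_per_second / 8 * SECONDS_PER_HOUR


def lifetime_bytes(bytes_per_day: float, years: float = 80) -> float:
    """Scale a daily volume to a lifetime (80 years is an assumed figure)."""
    return bytes_per_day * 365 * years


if __name__ == "__main__":
    # Bit rates as quoted in the deck's storage estimates.
    rates_bps = {
        "spoken text @8 Kbps": 8_000,
        "video-lite 50 Kb/s": 50_000,
        "video 200 Kb/s": 200_000,
        "DVD video 4.3 Mb/s": 4_300_000,
    }
    for name, bps in rates_bps.items():
        print(f"{name}: {bytes_per_hour(bps) / 1e6:.1f} MB/hour")

    # 20 GB/day is the per-day figure quoted for DVD-quality video.
    print(f"20 GB/day for 80 years: {lifetime_bytes(20e9) / 1e12:.0f} TB")
```

Under these assumptions the hourly volumes come out at 3.6 MB, 22.5 MB, 90 MB and about 1.9 GB, close to the rounded rates in the estimates, and 20 GB per day for 80 years is about 584 TB, the same order as the quoted 600 TB.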
Overview of Human Experience Capture
[Diagram, arranged along a TIME axis]
– Experience collection: GPS, Video, Biometrics, Sensor, PDA
– Information extraction: Location, Transcript, Event, Topics, Signage, People, Organizations, Objects, Time, Sounds; Database; Searching, Filtering, Browsing, Exploring Experience
– Information synthesis: Panorama, Faces & Names (George Smith, Susan White), Social and Organizational Networks, Casualty report, Composite Map, Time-based Analysis, Sensor Alert, Building search
– Privacy technology & policy

Informedia Digital Human Memory
– Extends information extraction and search technology to self-recorded video
– Adds position (GPS) as an information dimension
– Data quality decreases from broadcast TV to field capture
– Transitions from structured content to unbounded continuous media
– Uses mobile, wearable devices to capture images, video, audio and location

Feasible Goal? How Much Data?
Estimated lifetime storage requirements (Bell and Gray, Microsoft Research)

Data type                      Rate (bytes/hour)   Per day / per 3 years   Lifetime amount
read text, few pictures        200 KB              2–10 MB / GB            60–300 GB
email, papers, written text    -                   0.5 MB / GB             15 GB
photos w/voice @100 KB         200 KB              2 MB / GB               60 GB
photos @200 KB                 ten images/day      2 MB / GB               150 GB
spoken text @120 wpm           43 KB               0.5 MB / GB             15 GB
spoken text @8 Kbps            3.6 MB              40 MB / GB              1.2 TB
music or high-quality sound    60 MB               60 MB / GB              1.8 TB
video-lite 50 Kb/s (POTS)      22 MB               0.25 GB / TB            7.5 TB
video 200 Kb/s (VHS-lite)      90 MB               1 GB / TB               30 TB
DVD video 4.3 Mb/s             1.8 GB              20 GB / TB              600 TB

DHMM Auto-Generated Timeline Biography
[Timeline figure. Labels: June 1972, Honeymoon in St. Thomas; September 1980, Susan’s first day of preschool; May 1986; June 1996, Family reunion in Pennsylvania; Planning for a bigger house; Paul graduates from college; Paul’s college years; The early years of our marriage]

Wearable Experience Capture System
[Photo labels: Camera, Microphones, GPS, Battery, Notebook]

Remembering Conversations
• A wearable, personalized Informedia system
  – listens to and transcribes the wearer’s part of a conversation
  – recognizes the face of the current dialog partner
  – remembers his/her voice
• The next time the same person’s face/voice is encountered, it
  – replays the last conversation in compressed form: the names and major issues that were mentioned
• All of this happens unobtrusively, a first step towards an intelligent assistant

Remembering Conversations
• Record
  – The audio and video of the conversation
• Index
  – The face features of the conversation partner
  – The voice features of the conversation partner
• Retrieve
  – Query with the face and/or the voice
  – Return the summary of the last conversations via headset
    • Silence removal, emphasis detection
    • Sort segments using TF.IDF scheme

Data Example
[Figure labels: Training, Testing]

Results

Method                                     Average Rank
Schneiderman + normalized eigenfaces       3.42
Visionics face recognition                 3.33
Speaker identification by similarity      3.92
Speaker identification by pitch            6.22
Summing up every classifier                3.87
SVM meta-classifier                        2.61
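The "Sort segments using TF.IDF scheme" step on the Remembering Conversations slide can be sketched as follows. This is a minimal illustration and not the Informedia implementation: the punctuation-stripping tokenizer, the log(N/df) weighting, the scoring of a segment by the sum of its term weights, and the names `tokenize`, `tfidf_scores` and `rank_segments` are all assumptions.

```python
import math
from collections import Counter

PUNCTUATION = ".,!?;:"


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    words = (w.strip(PUNCTUATION).lower() for w in text.split())
    return [w for w in words if w]


def tfidf_scores(segments):
    """Score each segment as the sum over its terms of tf * log(N / df)."""
    token_lists = [tokenize(s) for s in segments]
    n = len(token_lists)
    # Document frequency: in how many segments does each term occur?
    df = Counter()
    for tokens in token_lists:
        df.update(set(tokens))
    scores = []
    for tokens in token_lists:
        tf = Counter(tokens)
        scores.append(sum(count * math.log(n / df[term])
                          for term, count in tf.items()))
    return scores


def rank_segments(segments):
    """Return the segments ordered from highest to lowest TF-IDF score."""
    scores = tfidf_scores(segments)
    order = sorted(range(len(segments)), key=lambda i: scores[i], reverse=True)
    return [segments[i] for i in order]


if __name__ == "__main__":
    # Hypothetical transcript segments, already split at silences.
    conversation = [
        "yes yes okay",
        "the budget review with Susan is on Friday",
        "okay yes the review",
    ]
    for segment in rank_segments(conversation):
        print(segment)
```

Terms that occur in every segment get a weight of zero, so filler such as "yes" and "okay" contributes little, while segments carrying rarer words (names, topics) rise to the top. A plain sum favors longer segments; a length-normalized score would be one alternative design.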
Problem Dimensions
• Personal Memory Collection
  – Unobtrusive devices with sufficient bandwidth and storage capacity
• Personal Memory Analysis
  – ‘Hostile’ environment speech recognition & image extraction
  – Language processing for information filtering and segmentation
  – What is important, what is possible to recognize and understand, and how?
• (Collective) Summarization, Learning & Presentation
  – Data mining over datasets with different qualities of metadata
  – Generalization and formation of knowledge
  – Interactive, dynamic information structured in real time
• Data Security and Rights

Missing Science
• Dealing with highly errorful data!
  – Rules don’t scale well, statistics do better
  – Learning to live with 30% accuracy in the underlying data
  – Integrate analysis and coaching systems, not independent modules
• Robust Video Understanding
  – Face and object detection and recognition
  – Scene analysis
  – Improved speeds: currently minutes per video frame
• Robust Audio Analysis
  – Non-speech sounds
  – Speaker ID and conversational speech recognition ‘at a distance’
• Interfaces
  – Unobtrusive, usable while mobile, partial attention
  – Sustainable over long periods of time

Outcomes
• Perfect memory and digital immortality for an individual
  – A form of personal memory serving as a personal assistant
  – Sharing memories and experiences
• Expertise synthesized across individuals and maintained over generations
  – Enables the creation of example-based learning environments and expert archives
• The establishment of a historical sense of truth
  – Re-establishing a previous event from multiple captured personal histories
Collaborative Perspectives
[Figure labels: View B, Panoramic View]

Collaborative Perspectives
• Collect data from multiple cameras in an asynchronous fashion
• Create collaborative views (panorama of a virtual camera) using the data from multiple cameras
• Communicate the composite view back to the participating viewers
• Enable the user to change the position and viewing angle of the virtual camera

[Image slides]
– Sequence 1: Mount Washington; Panorama of Sequence 1
– Sequence 2: Mount Washington; Panorama of Sequence 2
– Sequence 3: Mount Washington; Panorama of Sequence 3
– Panorama 1 of Virtual Camera (60° East from North)
– Panorama 1 of Virtual Camera (50° East from North)
– Panorama 1 of Virtual Camera (70° East from North, Zoom Out)

Mobile Multimedia Search and Retrieval
• Informational and educational multimedia resources
  – Museums and libraries
  – Schools
  – Tourist information
  – Advertising
  – Personal video
• Indexed and searchable multimedia content
  – Location, text, pattern-matching and spoken queries
• Delivered on-demand to mobile users
Searching Multimedia Content While in a Car
• Use Informedia technology:
  – Index video into a searchable video library
  – User-directed, manual geo-spatial searches
  – Allow keyword searches for video content
• Combine with on-demand video broadcasting
• Include produced video AND self-recorded video
• Alternatively use multimedia ‘fillup’
• GPS location of car
  – Automatic geo-spatial search by location/route
  – Search updated by time/distance traveled

Content Threshold by Location
[Image slide]

Demonstration Project with General Motors
[Diagram labels: High-speed 802.11a (30 Mbps, 2.4 GHz); Wireless Access Point (two); Multimedia Server Gateway; Multimedia Gateway; Web Server; Video Client; Internet]

Virtual On-the-Road Demonstration … in Pittsburgh
[Image slide]

Sources of Location-based Information
• Internet
  – Usually only in text form, no video or audio
  – Need Text-to-Speech (TTS), commercially available
  – Frequent crawling of time-sensitive sites
• Local television
• Documentaries covering the region
• In the future: special production of location-relevant multimedia
  – for advertising
  – other information, e.g. traffic cameras

Location Information - Pittsburgh
• 20 web sites with calendars, events
  – Convert using text-to-speech technology
• 5 NPR documentaries about Pittsburgh with many interesting details
• Privately recorded video with geographically coded location from GPS
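The automatic geo-spatial search described for the in-car scenario (search by the car's GPS location, updated by distance traveled) might look like the sketch below. It assumes each clip carries a latitude/longitude tag, uses the haversine great-circle distance, and re-runs the query only after the car has moved a threshold distance. The `Clip` record, the `RouteSearcher` class, the radius and refresh thresholds, and the sample coordinates are hypothetical and are not taken from the prototype.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0


@dataclass
class Clip:
    """A video or audio clip tagged with the GPS position it refers to."""
    title: str
    lat: float
    lon: float


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearby_clips(clips, lat, lon, radius_km=2.0):
    """Return clips within radius_km of the position, nearest first."""
    hits = [(haversine_km(lat, lon, c.lat, c.lon), c) for c in clips]
    hits.sort(key=lambda hit: hit[0])
    return [clip for dist, clip in hits if dist <= radius_km]


class RouteSearcher:
    """Re-run the location query only after the car has moved far enough."""

    def __init__(self, clips, radius_km=2.0, refresh_km=0.5):
        self.clips = clips
        self.radius_km = radius_km
        self.refresh_km = refresh_km
        self._last = None  # position of the most recent search

    def update(self, lat, lon):
        """Return fresh results, or None if the car has barely moved."""
        if (self._last is not None
                and haversine_km(*self._last, lat, lon) < self.refresh_km):
            return None
        self._last = (lat, lon)
        return nearby_clips(self.clips, lat, lon, self.radius_km)


if __name__ == "__main__":
    # Illustrative coordinates only.
    library = [Clip("Clip A", 40.441, -80.001), Clip("Clip B", 40.44, -79.95)]
    searcher = RouteSearcher(library)
    for position in [(40.44, -80.00), (40.4401, -80.00), (40.44, -79.95)]:
        print(position, searcher.update(*position))
```

A time-based refresh, which the slide also mentions, could be added by storing a timestamp alongside the last search position.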
Current Work (with General Motors)
• Merge location information with news, music, audio books
  – Only one audio/video channel for all information
• User preferences to select categories of information
  – Learn preferences by observation
• Emphasize audio to avoid distracting the driver
• Require a very simple interface

Open Questions
• A business model for mobile multimedia?
  – Who will pay? Advertisers, drivers, car makers?
• For which people is it most useful?
• Infrastructure development?
• Interface issues
  – Head-up display on the windshield
  – Buttons
  – Speech or gesture interface

Conclusions
• Mobile technology is advancing rapidly
• Location is a central aspect of mobility
• Multimedia data can be transmitted to mobile users
• Collecting and searching multimedia while mobile
• Prototype for search and retrieval of multimedia information in a mobile environment
  – Based on location
  – Based on active queries