Overview of current problems in multimedia signal processing


Overview of interesting problems in multimedia signal processing
Multimedia Signal Processing
• Deals mostly with information related to the
human sensory system
• We do not have general methods for solving
information processing problems that the
human information processing system
solves very well
• Multimedia signal processing is
currently very important
Overview of current problems
• In this final lecture we present some current
problems that are very important for
industry and that industry is working to solve.
Industrial work is very often secret,
but these problems are so hard that companies
encourage outside work by publishing them and even
offering prizes for solutions
DEMOLA PROBLEMS
• Demola is an activity in Tampere: Demola is an opportunity
for students to contribute real-life innovations with end-users
and globally connected organisations.
WWW.DEMOLA.FI
For students Demola provides
• Collaboration with Finland’s top companies
• Training and guidance from top professionals
• Real world project experience
• Credit points and an opportunity to do a thesis on learning by doing
• Multidisciplinary and international teamwork
• IPR and business opportunities
• Enriching interaction in Demola’s premises in Finlayson
• A reward for a job well done
DEMOLA CURRENT PROGRAM
You will work for a real company as part of a project team.
You will develop something that you have always wanted and get
paid for the job and learn something totally new.
R&D specialists from the enabling companies will guide your
work.
You can get credit for Multimedia Project (5 cp)
You can write your Master's thesis in connection with the project
Time period
The project constitutes full-time work for the project team between 24.5 - 31.8.2010.
Project areas & topics offered
• Sensor Assisted Image Tagging
• Wearable Computing
• Music Space
• Haptic Paintball
• Killer Mixed Reality Application
• One-eyed Wonder
We give an overview of these topics and consider
possible solutions
Project 1: Sensor Assisted Image Tagging
• Background and Motivation
Current camera applications on mobile phones allow
adding textual tags to images after taking a picture. In
addition, users may be able to add geotags which
describe the location. This project will look beyond the
conventional textual tags. In the future, it is possible
that some information of your environment and
activity can be obtained automatically from sensory
data collected by the mobile phone.
Sensor Assisted Image Tagging
• What does this project mean?
We mentioned that cameras can record the time and
place (GPS) at which a picture was taken.
In this project the goal is to use other types of
information, such as camera position, acceleration,
sound and the content of the picture (e.g. place, type of
scene)
• Project Goal
This project will implement a camera application on Nokia
N900 which tags images with rich context information
obtained from a context recognition engine. The team's
goal is to implement a mobile camera application which
suggests tags based on rich context information collected
from mobile phone sensors and, optionally, to carry out user
trials on the application.
Development tools, environments and standards
The company partner Nokia Research Center will provide
a context recognition engine software package that detects
the user's environment (restaurant, street, nature) and
activity (idle, walking, in a train, in a car). The software is
implemented on the Nokia N900 using e.g. C language and
Qt. Development phones will also be provided.
Potential solution approach
• We showed an example of ambulatory video earlier,
but this is something different: only still pictures are taken.
In this project a context recognition engine is
available; the question is what its capabilities are.
One could also look into some kind of training
system for identifying scenes. A literature review
can be made for this.
The system should be simple but work relatively
well. How to build it? Think about it!
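The tag suggestion step of this project can be sketched in a few lines. This is only an illustration under stated assumptions: the real interface of the context recognition engine is not described here, so the `ContextEstimate` record, its confidence score and the thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextEstimate:
    """One guess from a context recognition engine (hypothetical format)."""
    kind: str          # "environment" or "activity"
    label: str         # e.g. "street", "walking"
    confidence: float  # assumed to lie in [0, 1]


def suggest_tags(estimates, min_confidence=0.6, max_tags=3):
    """Return the most confident context labels as tag suggestions."""
    confident = [e for e in estimates if e.confidence >= min_confidence]
    # Most confident guesses first, so the user sees the best suggestion on top.
    confident.sort(key=lambda e: e.confidence, reverse=True)
    return [f"{e.kind}:{e.label}" for e in confident[:max_tags]]


if __name__ == "__main__":
    readings = [
        ContextEstimate("environment", "street", 0.82),
        ContextEstimate("environment", "nature", 0.10),
        ContextEstimate("activity", "walking", 0.71),
        ContextEstimate("activity", "idle", 0.20),
    ]
    print(suggest_tags(readings))
```

Keeping the threshold and the number of suggestions as parameters makes it easy to tune them in user trials.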
Project 2: Wearable Computing
• Background and Motivation
Wearable computing is one of the future trends,
where development in areas such as smart
materials, miniaturization, sensor systems and
flexible displays will open new opportunities.
While these technologies are still emerging, we can
now look at creative ways to use the currently
available components. This project is an
exploratory hands-on project which aims to
demonstrate what we can do now with creativity
and existing, everyday technology.
• Project Goal
The goal of the project is to invent creative
wearable computing items (e.g. integrated into t-shirts, hats, sunglasses, shoes) and build them
with existing technical components. In addition to
electronics kits, the material can be
extracted from toys, Lego bricks, Christmas lights, game
boards, etc., whatever you may find useful. The
resulting demonstrators can be useful, but they can
also be purely fun. Let your creativity rule.
• Desired skills/competencies
The students should have creative minds, a positive
can-do spirit, and earlier hands-on experience in
building things, whether it is hacking electronics
or playing with Mindstorms kits.
• Development tools, environments and standards
The students can decide what kits and components
to use within the given budget frame (estimated to
be a few hundred euros).
Potential solution approach
• This may be something fun, for example e-fashion: the jewelry blinks when
somebody calls or a message is sent (via Bluetooth?)
• A company, www.ifmachines.com, makes Electropuff, a lamp
dimmer for kids
Potential solution approach
• It is impossible to hand out an idea that is original,
so one has to think about this. Important
considerations may be power supply and water
resistance. Creative thinking helps:
Swiss Army knife ???
Sunglasses ???
Something new is needed!
Gloves???
Makeup???
Project 3: Music Space
• Background and Motivation
New service offerings such as Comes with Music have
enabled users to store a huge selection of music on their
mobile devices. Selecting and finding a desired item is
getting more difficult when the user needs to browse
through long lists of items.
The conventional access to a music catalog through
hierarchical lists indexed by the name of the
artist, album, composer and genre should be replaced
by more intuitive search methods.
Music Space
• Project Goal
Design a media player user interface using touch
screen, available sensors, 3D graphics and 3D
audio rendering. Let the user navigate in audio
visual 3D space filled with music
Development tools, environments and standards
Qt environment, GStreamer media framework
Potential solution approach
• It seems that, instead of a list, the user would use a
3D graphics model with navigation by touch. The
problem is how to map music pieces to the model.
One could imagine e.g. a street with different
clubs ’Jazz’, ’Pop’, ’Techno’…, with the user
navigating between them.
The model cannot be too complicated; it must be very clear,
and it should associate graphics with music style
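One possible way to map music pieces to the street model is sketched below. It is an assumption for illustration only, not the project's design: each genre becomes a "club" at a fixed distance along the street, and the tracks of that genre get slots inside the club. The function name, spacings and track titles are hypothetical.

```python
from collections import defaultdict


def layout_music_street(tracks, club_spacing=10.0, slot_spacing=1.0):
    """Place tracks in 3D space: one 'club' per genre along the x axis.

    tracks: iterable of (title, genre) pairs.
    Returns a dict mapping title -> (x, y, z) coordinates.
    """
    by_genre = defaultdict(list)
    for title, genre in tracks:
        by_genre[genre].append(title)

    positions = {}
    # Sorting keeps the layout stable, so clubs do not move between sessions.
    for club_index, genre in enumerate(sorted(by_genre)):
        x = club_index * club_spacing
        for slot, title in enumerate(sorted(by_genre[genre])):
            # Tracks of one genre line up behind each other inside the club.
            positions[title] = (x, 0.0, slot * slot_spacing)
    return positions


if __name__ == "__main__":
    library = [("Track A", "Jazz"), ("Track B", "Jazz"), ("Track C", "Pop")]
    for title, position in layout_music_street(library).items():
        print(title, position)
```

The coordinates could then be handed to the 3D graphics and 3D audio renderer, so that a track is both seen and heard at its place on the street.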
Project 4: Haptic Paintball
• Background and Motivation
Location based applications are one of the rapidly growing areas for
mobile devices. Typically the location sensing is based on GPS satellites
for outdoors and WiFi networks for indoors. At the same time mixed
reality applications are gaining speed as well. Augmentation of reality
is typically based on sophisticated video capturing and rendering
technologies, however new eyes-free haptic and/or audio methods are
entering the field currently.
NRC Helsinki has demonstrated their indoor positioning system, which
could be combined with NRC Tampere's mixed reality expertise to
create a multi-player “paintball” game. Alternatively, GPS could be used
in outdoor settings. NRC Tampere has experience in using haptic
pointing devices which could use orientation and simple gestures for
pointing at each other and shooting. Tactile feedback representing
capturing the target could be delivered by the same hand-held
pointers. Additional audio feedback could be used for enhancing the
experience.
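The idea of pointing at another player and shooting can be illustrated with a small geometric test. This is a sketch under assumptions: player positions are taken as 2D coordinates in metres from the positioning system, the pointer reports a compass heading in degrees, and the tolerance and range values are hypothetical.

```python
import math


def bearing_deg(shooter_xy, target_xy):
    """Compass-style bearing from shooter to target: 0 = +y (north), 90 = +x (east)."""
    dx = target_xy[0] - shooter_xy[0]
    dy = target_xy[1] - shooter_xy[1]
    return math.degrees(math.atan2(dx, dy)) % 360.0


def is_hit(shooter_xy, heading_deg, target_xy, tolerance_deg=10.0, max_range=30.0):
    """Return True if the pointer heading is aimed at the target within tolerance."""
    distance = math.hypot(target_xy[0] - shooter_xy[0], target_xy[1] - shooter_xy[1])
    if distance == 0.0 or distance > max_range:
        return False
    # Smallest signed difference between the two angles, in [-180, 180).
    diff = (bearing_deg(shooter_xy, target_xy) - heading_deg + 180.0) % 360.0 - 180.0
    return abs(diff) <= tolerance_deg


if __name__ == "__main__":
    print(is_hit((0.0, 0.0), 45.0, (5.0, 5.0)))
    print(is_hit((0.0, 0.0), 0.0, (5.0, 5.0)))
```

A hit detected this way could trigger the tactile feedback in the hand-held pointers of both players.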
Project Goal
• Explore the possibilities for such a mixed reality multiplayer game in both indoor and outdoor
environments and implement (at least) one of them.
Creativity in interaction design using the specified
multiple modalities and finding suitable metaphors
are an essential part of the main outcome.
• Development tools, environments and standards
Existing technology descriptions (keywords:
augmented reality, haptics, tactile feedback, wearable
computing)
Potential Solution Method
• It is not very clear what the role of positioning is
in this type of hand-held paintball.
Maybe there would be some devices (robots?)
that shoot when they know the positions of the players?
Or part of the game is played on computers?
Or there is ”intelligence” about where the other players are?
It seems the goal is to invent an attractive scenario
Project 5: Killer Mixed Reality Application
• Background and Motivation
Mobile Mixed Reality promises to “fuse” the real world with digital
information, creating the real-world-web. This is possible as mobile
phones can sense the real world environment (via camera and other
sensors), and on top of that overlay information (Augmented Reality)
that has been downloaded from the Internet.
Project Goal
• We would like you to explore, concept, design and prototype the next
killer mixed reality application building on top of our many existing
components (mobile mixed reality browsers, mash-up APIs, unique
geo-content). If you had all the needed technology components, and
could mash-up content from anywhere, what would you make to
change the everyday life of millions of people?
Killer Mixed Reality Application
• Development tools, environments and standards
We will provide access to the Mixed Reality
Solutions Web service platform that allows you to
easily build mixed reality services and also access
to unique Navteq geo-content like maps, street-view panoramas, POIs and 3D building models
through a ReSTful API.
• Relevant technologies include HTTP, ReSTful
Web services, XML/JSON, MySQL, Qt, etc.
Potential Solution Method
• The project aims to extend current
multiplayer games and virtual environments like
Second Life with data from the real world: maps, GPS
location etc.
• How to do this is a big question. Maybe by building
a world model and putting your own information on
it?
For example, getting contacts based on
geolocation?
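The last idea, finding contacts based on geolocation, can be sketched with the haversine great-circle distance. This sketch does not use the Mixed Reality Solutions platform or its API. The contact names, coordinates and search radius are hypothetical illustration values.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def contacts_nearby(me, contacts, radius_km=1.0):
    """Return names of contacts within radius_km of `me`, nearest first.

    me: (lat, lon); contacts: dict mapping name -> (lat, lon).
    """
    distances = [(haversine_km(*me, *pos), name) for name, pos in contacts.items()]
    return [name for dist, name in sorted(distances) if dist <= radius_km]


if __name__ == "__main__":
    my_position = (61.4978, 23.7610)  # illustrative coordinates
    friends = {"anna": (61.4980, 23.7620), "ben": (60.1699, 24.9384)}
    print(contacts_nearby(my_position, friends))
```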
Project 6: One-eyed Wonder
• Background and Motivation
Imagine controlling your home media and devices by
pointing your smart phone (camera) at meaningful
objects like the TV, or turning your coffee mug into a
magic wand control. And then performing complex
combinations of content acrobatics with ease: playing
out photos, music and video around your lounge and
home. This would change the way people interact with
their home electronics and digital content. And this is
what we want you to do by combining networked
smart devices and computer vision.
One-eyed Wonder
• Project Goal
We propose three progressively more involved challenges:
• Create a simple N900 Maemo app to understand what device (or
object) a user points their phone camera at (the user focus), and
initiate basic multimedia apps – like playing music from a phone to a
networked stereo.
• Create an "eye in the sky" app (on Maemo or Ubuntu connected to a
webcam "embedded in the environment") to recognise when a user is
pointing at an object in a room by: recognising pointing and target
objects, recognising pointer activation (a user taking it in hand) and
recognise user focus (the line/cone/plane the user is pointing at, to see
what target objects are in "the line of focus"). Add this to the
multimedia app so that it no longer needs an active networked
electronic device in hand to work.
• Innovate a way to also identify which user is performing which
pointing (e.g. for multiple people in the same room, and for
personalised effects to the multimedia app).
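The notion of user focus as a cone can be illustrated as follows. This is a sketch under assumptions, not the method the project prescribes: the pointer origin, the pointing direction and the object positions are taken as known 3D coordinates (in practice they would come from computer vision), and the cone half-angle is a hypothetical parameter.

```python
import math


def angle_between_deg(u, v):
    """Angle in degrees between two non-zero 3D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to guard against rounding errors before acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))


def objects_in_focus(origin, direction, objects, half_angle_deg=15.0):
    """Return names of objects inside the pointing cone, best aligned first.

    origin, direction: 3D tuples; objects: dict mapping name -> (x, y, z).
    """
    if not any(direction):
        raise ValueError("direction must be a non-zero vector")
    hits = []
    for name, position in objects.items():
        to_object = tuple(p - o for p, o in zip(position, origin))
        if not any(to_object):
            continue  # object coincides with the pointer; direction undefined
        angle = angle_between_deg(direction, to_object)
        if angle <= half_angle_deg:
            hits.append((angle, name))
    return [name for angle, name in sorted(hits)]


if __name__ == "__main__":
    room = {"tv": (3.0, 0.2, 1.0), "stereo": (0.0, 3.0, 1.0), "lamp": (2.0, 0.5, 1.0)}
    print(objects_in_focus((0.0, 0.0, 1.0), (1.0, 0.0, 0.0), room))
```

Sorting by angle gives a natural tie-break when several target objects fall inside the cone: the one closest to the pointing axis comes first.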
One-eyed Wonder
• Development tools, environments and standards
Probably: Python, OpenCV, Maemo5/N900, Ubuntu
9.10, pen and paper.
Possibly: Qt Designer, USB webcams, GStreamer and
web services (possibly Eclipse)
Also: support from the NRC Tampere 3D Platform
team with distributed/cloud and proximity/smart
space access to user content and networking aspects.
And: anything that comes with a good reason why it
should be used.
Potential solution approach
• We have seen devices like the Nintendo Wii in this course.
Wii controllers have motion sensors.
In this case the goal is to use a mobile device instead of a controller;
the mobile device has the same kind of sensor. But the device should
also select a specific action, for example one related to the TV, so the system
should know whether the action is related to the TV. This can be discovered
by a camera installed on the ceiling, but that is not easy.
Maybe some kind of positioning system can be used?
International Projects
• These are available at
www.Multimediagrandchallenge.com
They are offered by companies looking for solutions,
perhaps aimed more at researchers than at students
Nokia Challenge
Where was this Photo Taken, and How?
• The problem can be stated simply: try to derive exact
camera poses (location and orientation) of given
photos that are lacking location annotation. This kind
of technology could potentially be used to add
metadata to existing or newly captured photos.
• Assumptions: You can assume the availability of
nearby photos/video with known location that can be
used to derive unknown camera poses; other ideas
that do not require existing content will be welcome.
While a “clean” solution is ideal, other models that
help could be used, for example, exploiting inertia
sensor data, properties of personal collections, or the
presence of textual
Google Challenge
Robust, As-Accurate-As-Human Genre
Classification for Video
• Having videos classified into a pre-existing
hierarchy of genres is one way to make the
browsing task easier. The goal of this task would
be to take user generated videos (along with their
sparse and noisy metadata) and automatically
classify them into genres
Google Challenge
Indexing and Fast Interactive Searching in
Personal Diaries
• Diaries can be any combination of audio, video,
geographic location, photos, phone logs, and whatever
other multimedia data the user generates or accesses.
To make the data accessible, it needs to be parsed into
indexable, browsable, and searchable structures such
as places, environments, episodes, actions, and events
of various sorts, and clustered and tagged with
categories, identities, and tags of whatever sort the
user proposes. The challenge is to develop good
schema, algorithms, UI, etc., that will be useful for
diaries from audio-only through full-featured
multimedia.
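One small piece of this task, grouping diary items into episodes, can be sketched with a simple time-gap rule. This is an illustrative assumption rather than the approach the challenge asks for: items are taken to be (timestamp, description) pairs, and a new episode starts whenever the pause since the previous item exceeds a hypothetical threshold.

```python
from datetime import datetime, timedelta


def split_into_episodes(events, max_gap=timedelta(minutes=30)):
    """Group (timestamp, description) events into episodes by time gaps.

    A new episode starts when the gap to the previous event exceeds max_gap.
    Returns a list of episodes, each a list of events in time order.
    """
    episodes = []
    for event in sorted(events, key=lambda e: e[0]):
        if episodes and event[0] - episodes[-1][-1][0] <= max_gap:
            episodes[-1].append(event)   # close enough: same episode
        else:
            episodes.append([event])     # long pause: start a new episode
    return episodes


if __name__ == "__main__":
    diary = [
        (datetime(2010, 5, 24, 9, 0), "photo"),
        (datetime(2010, 5, 24, 9, 10), "phone call"),
        (datetime(2010, 5, 24, 12, 0), "audio note"),
        (datetime(2010, 5, 24, 12, 5), "photo"),
    ]
    for number, episode in enumerate(split_into_episodes(diary), start=1):
        print(number, [description for _, description in episode])
```

Episodes found this way could then be tagged with places or categories and indexed for searching.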
3DLife Challenge
Sports Activity Analysis in Camera Networks
• This challenge focuses on exploring the limits of what
is possible in terms of 2D and 3D data extraction from
a low-cost camera network for sports. Tennis is chosen
as a case study as it is a sporting environment that is
relatively easy to instrument with cheap cameras and
features a small number of actors (players) who
exhibit explosive and rapid sophisticated motion.
• The goal is to facilitate coaches and mentors to
provide better feedback to athletes based on recorded
competitive training matches, training drills or any
prescribed set of activities.
Radvision Challenge
Video Conferencing To Surpass “In-Person”
Meeting Experience
• This challenge focuses on developing new technologies
and ideas to surpass the “in-person” meeting
experience. In the process a set of subjective and
objective measures to evaluate “meeting” experience
will be developed. With these measures, alternative
solutions could be compared to each other and to in-person meetings, and optimized accordingly.
• It is assumed that once the meeting experience is
good enough, or even better, the technology could
potentially minimize the need for “physical” meetings
(at least for business purposes).
Conclusions
• Multimedia signal processing is very important
in many practical applications
• Problems in this area are difficult to solve, although many
of them are already solved by biological systems
• We may expect progress in the future since the
processing power of computers grows very quickly
• Currently, solutions have to be sought
by inventing clever algorithms matched to specific
tasks