MPEG-4 & MHEG-5 (UK)
Aleksi Lindblad
Mika Linnanoja
Marko Luukkainen
Zhenbo Zhang
22.11.2005
MPEG4 & MHEG5
Basics, objects, BIFS (Zhenbo)
XMT (Aleksi)
Delivery (Mika)
MHEG5 (Marko)
MPEG-4 Overview
Definition: A family of open international standards that provide
tools for the delivery of multimedia
Tools
- codecs for compressing conventional audio and video
- form a framework for rich multimedia, i.e. combination of audio,
video, graphics and interactive features
Excellent Conventional Codecs
Highest quality and compression efficiency
Foundation of many new media products and services
Latest video codec: Advanced Video Coding (AVC)
- compression rate half of MPEG-2 for similar perceived quality
- new standard for video transmission
- new HDTV, satellite broadcasting, DSL video services, Sony
PlayStation Portable, Apple QuickTime 7 Player will utilize AVC
Framework for Rich Interactive Media
Rich media tools:
- combining audio and video with text, still images, animations,
and 2D & 3D vector graphics into interactive and personalized
media experiences
MPEG-4 includes:
- scripting language for simple interaction
- MPEG-J for more elaborate programming
Why have manufacturers and operators chosen MPEG-4?
Excellent Performance
Open, Collaborative Development to Select the Best
Technologies
Competitive but Compatible Implementations
Lack of Strategic Control by a Supplier
Public, Known Development Roadmap
Encode Once, Play Anywhere
Flexible Integration with Transport Networks
Established Terms and Venues for Patent Licensing
Object Description
Object description: enumerates only the streams in a
presentation and specifies how they relate to media objects
Scene description: assemble those media objects into a specific
audiovisual scene
Object descriptor: a container aggregating all the useful
information about the corresponding object
Information is structured in a hierarchical manner
Through a set of sub descriptors
Synchronization of streams
Time: the most natural thing in the world
Yet it needs a lot of thought in the context of multimedia
streaming
Time in MPEG-4 is always relative
Finding a simple temporal reference point
Example: play back from a local file or unicast streaming
The presentation is processed from its start
The start of the presentation makes a great reference point
In the case of broadcast or multicast playback
The client may not be aware of the start of the presentation
The only known point: when the client tunes into the broadcast
This point is different for each client and unknown to the sender
The point when a portion of scene description data is received by the terminal is
taken as reference
Time stamps and access units
How do we know that two events in two different streams are
supposed to happen at the same time?
Answer: time stamps
Discrete portions of data related to a specific point in time exist
in all stream types
These portions of data are called Access Units (AUs)
Each ES is actually modeled as a sequence of Access Units
Size and contents of AUs depend on the media coder used
AUs are the data elements to which time stamps can be attached
Time stamps
Two different types
Decoding time: indicates the point in time at which all data of an AU has
to be available in the receiver and ideally be decoded at once
Composition time: indicates the time at which the decoded AU
becomes available for composition and subsequent presentation
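The two kinds of time stamps can be illustrated with a minimal Python sketch. The `AccessUnit` class, its field names and the sample values are assumptions made for illustration only; they are not taken from the MPEG-4 specification. The sample video stream deliberately has one unit that is decoded before it is composed, to show why the two orderings can differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessUnit:
    """Hypothetical access unit: time stamps are relative to the reference point."""
    stream_id: str
    dts: float  # decoding time stamp, in seconds
    cts: float  # composition time stamp, in seconds
    payload: bytes = b""


def decoding_order(units):
    """Order in which the receiver must have the data available."""
    return sorted(units, key=lambda au: au.dts)


def composition_order(units):
    """Order in which decoded units become available for composition."""
    return sorted(units, key=lambda au: au.cts)


def composed_together(units, t, tolerance=1e-6):
    """Units from any stream whose composition time equals t."""
    return [au for au in units if abs(au.cts - t) <= tolerance]


# Illustrative values only (not from the slides).
units = [
    AccessUnit("video", dts=0.00, cts=0.04),
    AccessUnit("video", dts=0.04, cts=0.12),  # decoded early, shown late
    AccessUnit("video", dts=0.08, cts=0.08),
    AccessUnit("audio", dts=0.07, cts=0.08),
]
```

Calling `composed_together(units, 0.08)` returns one video and one audio unit, which is how events in two different streams are matched up.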
BIFS
Acronym for BInary Format for Scenes
Provides a complete framework for the presentation engine of
MPEG-4 terminals
Makes it possible to mix various MPEG-4 media together with 2D and 3D
graphics and to handle interactivity
Designed as an extension of the VRML 2.0 (Virtual Reality
Modeling Language) specification in a binary form
Scene and Nodes
Scene is what the user of the MPEG-4 terminal sees and hears
Beneficial to build the scene as a hierarchical structure or scene
tree
Visible or audible objects are leaf nodes
Multiple references to the same node are allowed
=> the scene is not really a tree but a directed acyclic graph
Simplified scene tree
Fields and Routes
Fields
- attributes and interface of the nodes
- a field has a value, a type of the value, a type of behavior and a name
Routes
Events are usually generated by sensor nodes
They must be connected to an event listener in order to modify the scene
This connection is called a route
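The relationship between fields, events and routes can be sketched in Python. The `Node` class and `add_route` helper below are hypothetical and only mimic the idea; a real BIFS engine also handles typed fields, time stamps and loop protection.

```python
class Node:
    """Hypothetical scene node: named fields plus outgoing routes."""

    def __init__(self, name, **fields):
        self.name = name
        self.fields = dict(fields)
        self.routes = []  # list of (from_field, target_node, to_field)

    def set_field(self, field, value):
        """Store the value and forward it along every route leaving this field."""
        self.fields[field] = value
        for from_field, target, to_field in self.routes:
            if from_field == field:
                # Note: a cycle of routes would recurse forever in this toy.
                target.set_field(to_field, value)


def add_route(source, from_field, target, to_field):
    """Connect an event source field to a listening field."""
    source.routes.append((from_field, target, to_field))


sensor = Node("Button", isActive=False)  # plays the role of a sensor node
lamp = Node("Lamp", on=False)            # plays the role of the listener
add_route(sensor, "isActive", lamp, "on")
sensor.set_field("isActive", True)       # the event travels along the route
```

After the last line, `lamp.fields["on"]` is `True`: the scene changed because the sensor's event was routed to the listening field.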
MPEG4 & MHEG5
Basics, objects, BIFS (Zhenbo)
XMT (Aleksi)
Delivery (Mika)
MHEG5 (Marko)
XMT
Overview
XMT-Ω
How it works?
XMT-Ω and SMIL
XMT-A
How it works?
XMT-A and X3D
What is XMT?
Extensible MPEG-4 Textual Format
XML-based coding language for MPEG-4
systems
No explicit way to use the more elaborate video or audio tools defined in MPEG-4
Designed for human- or computer-generated
content creation and representation
What is XMT? (contd.)
Compatible with other XML-based multimedia
languages
SMIL
X3D
Can also contain javascript (MPEG-J)
Divided into two formats
High-level XMT-Ω
Low-level XMT-A
XMT-Ω
Easy to use and clear high-level language for
content creation
Divided into modules that realize certain
functionalities
For example animation and layout
Can also contain XMT-A nodes
No one-to-one mapping to MPEG-4 systems
or XMT-A
XMT-Ω and SMIL
XMT-Ω is based on SMIL
Self-describing, extensible and familiar to content
producers
However some of SMIL’s modules are not
appropriate for MPEG-4 systems
For example layout
These are redesigned for or added to XMT-Ω
"in the spirit of" SMIL
XMT-Ω functionality
Timing, synchronization and time
manipulation
Time containers <par> and <seq> play their
contents parallel or in sequence
Elements have time attributes such as duration,
beginning time and ending time
Timing can also be tied to an event
Time can be sped up or slowed down
Events
Basic input events (mouse click, mouse over…)
More elaborate events such as object collisions
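The behavior of the <par> and <seq> time containers can be sketched in Python. This is a simplified model under stated assumptions: elements are plain dictionaries (a hypothetical representation), every media element has a finite duration, and begin offsets, events and time manipulation are ignored.

```python
def duration(element):
    """Total duration of a toy timing tree.

    'media' elements carry their own 'dur'; a 'seq' plays its children one
    after another; a 'par' plays them in parallel and, in this sketch, ends
    when the longest child ends.
    """
    kind = element["kind"]
    if kind == "media":
        return element["dur"]
    child_durations = [duration(child) for child in element["children"]]
    if kind == "seq":
        return sum(child_durations)
    if kind == "par":
        return max(child_durations, default=0)
    raise ValueError(f"unknown element kind: {kind!r}")


# Illustrative tree: a 2 s clip followed by a 3 s clip, in parallel with 4 s of audio.
timeline = {
    "kind": "par",
    "children": [
        {"kind": "seq", "children": [
            {"kind": "media", "dur": 2},
            {"kind": "media", "dur": 3},
        ]},
        {"kind": "media", "dur": 4},
    ],
}
```

Here `duration(timeline)` is 5, because the sequence (2 + 3) outlasts the parallel audio (4).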
XMT-Ω functionality (contd.)
Animation
<set> element simply changes the values of the
fields
Different <animate> elements can be used for
sliding changes
Spatial layout
<transform> element can be used to place
elements
Layout module which works in a similar way as in
SMIL can also be used
XMT-Ω code example
…
<head>
<layout metrics="pixel" type="xmt/xmt-basic-layout">
<topLayout width="300" height="300" backgroundColor="white">
<region id="video_region">
<region id="watermark_region" translation="100 -90" size="91 27"/>
</region>
</topLayout>
</layout>
</head>
<body>
<par>
<video src="rainier_hike.mp4#video" region="video_region" begin="0s"
dur="indefinite"/>
<audio src="rainier_hike.mp4#audio" begin="0s" dur="indefinite"/>
<img src="emedia_icon91x27.jpg" id="sm_mark" region="watermark_region"
begin="0s" dur="indefinite" >
…
XMT-A
More powerful low-level language
A direct textual representation of MPEG-4
systems and BIFS
XMT-Ω code can be mapped to XMT-A in
several different ways
XMT-A and X3D
XMT-A is based on X3D
X3D is an XML representation of VRML, on
which MPEG-4 systems is based
Therefore XMT-A and X3D are highly similar
and interoperable with only small syntactic
differences
Object descriptor framework is unique to
MPEG-4 and XMT-A
Some XMT-A elements
Routes
Bind the values of two fields together
BIFS-Commands
Insert, Delete, Replace
Can be used on fields, nodes or routes
Object descriptors
Describe Elementary Streams that contain media
such as video or audio
XMT-A code example
…
<Transform2D DEF="Transformation">
<children>
<TouchSensor DEF="Button"/>
<Shape>
<geometry>
<Rectangle size="50 40"/>
</geometry>
</Shape>
</children>
</Transform2D>
<Conditional DEF="ButtonPressed">
<buffer>
<Replace atNode="Mover" atField="key" position="1" value="0.2"/>
</buffer>
</Conditional>
<PositionInterpolator2D DEF="Mover" key="0 0.5 1" keyValue="-100 0 100 0 -100 0"/>
<TimeSensor DEF="AnimationTimer" cycleInterval="2" loop="TRUE"/>
…
<ROUTE fromNode="Button" fromField="isActive" toNode="ButtonPressed" toField="activate"/>
<ROUTE fromNode="AnimationTimer" fromField="fraction_changed" toNode="Mover" toField="set_fraction"/>
<ROUTE fromNode="Mover" fromField="value_changed" toNode="Transformation" toField="translation"/>
…
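What the "Mover" interpolator in the example computes can be sketched in Python. The sketch assumes the usual VRML-style piecewise-linear interpolation between key values; the helper names are hypothetical, and only the `key` and `keyValue` strings come from the example above.

```python
def parse_floats(text):
    """Turn an attribute string such as "0 0.5 1" into a list of floats."""
    return [float(token) for token in text.split()]


def interpolate_2d(keys, key_values, fraction):
    """Piecewise-linear 2D interpolation, clamped outside the key range."""
    if fraction <= keys[0]:
        return key_values[0]
    if fraction >= keys[-1]:
        return key_values[-1]
    for i in range(len(keys) - 1):
        k0, k1 = keys[i], keys[i + 1]
        if k0 <= fraction <= k1:
            t = (fraction - k0) / (k1 - k0)
            (x0, y0), (x1, y1) = key_values[i], key_values[i + 1]
            return (x0 + t * (x1 - x0), y0 + t * (y1 - y0))
    raise ValueError("keys must be sorted in increasing order")


# Attribute values copied from the PositionInterpolator2D element.
keys = parse_floats("0 0.5 1")
flat = parse_floats("-100 0 100 0 -100 0")
key_values = list(zip(flat[0::2], flat[1::2]))  # pair up x and y
```

With these values the rectangle starts at x = -100, reaches x = 100 halfway through the 2-second cycle and returns, so `interpolate_2d(keys, key_values, 0.25)` gives `(0.0, 0.0)`.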
Overview of XMT
[Diagram relating XMT-Ω, SMIL, XML, X3D, XMT, XMT-A, BIFS, VRML and MPEG-4 systems]
Node Types
Shape nodes
Geometry field – contains a geometry node, e.g. Rectangle,
Circle, Box, Bitmap
Appearance field – contains an Appearance node
Interpolator nodes
Conditional nodes
Further expands the possibilities of interaction
Script nodes, PROTO nodes, etc
Scene Changes
BIFS-Commands
Single changes to the scene, e.g. of color or position
e.g. insert, delete, replace
Packaged in AUs of the scene description ES
BIFS-Anim streams
separate streams containing structured changes to a scene
Framework, three elements
Animation Mask
Animation Frames
AnimationStream
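How a sequence of BIFS-Commands changes a scene can be sketched in Python. The dictionary-based scene and the command layout are hypothetical simplifications: only node-level insert and delete and field-level replace are modeled, and one list of commands stands in for one access unit of the scene description stream.

```python
def apply_command(scene, command):
    """Apply one toy command to a scene stored as {node_id: {field: value}}."""
    op = command["op"]
    node_id = command["node"]
    if op == "insert":
        scene[node_id] = dict(command["fields"])
    elif op == "delete":
        del scene[node_id]
    elif op == "replace":
        scene[node_id][command["field"]] = command["value"]
    else:
        raise ValueError(f"unknown command: {op!r}")


def apply_access_unit(scene, commands):
    """Apply all commands carried in one (toy) access unit, in order."""
    for command in commands:
        apply_command(scene, command)
    return scene


access_unit = [
    {"op": "insert", "node": "rect", "fields": {"color": "red"}},
    {"op": "replace", "node": "rect", "field": "color", "value": "blue"},
    {"op": "insert", "node": "circle", "fields": {"radius": 10}},
    {"op": "delete", "node": "circle"},
]
```

Applying `access_unit` to an empty scene leaves a single blue rectangle.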
MPEG4 & MHEG5
Basics, objects, BIFS (Zhenbo)
XMT (Aleksi)
Delivery (Mika)
MHEG5 (Marko)
MPEG-4 Delivery & misc
Topics
MPEG-4 content delivery
MP4 file format
Interoperability: profiles & levels
Video coding (if time allows)
MPEG-4 Content Delivery
Delivery - Storing and Transporting of MPEG-4
compositions
MPEG-4 content must be delivered to many and very
different audiences
Interworking with current delivery mechanisms
Internet (MPEG-4 over IP)
Broadcasting (MPEG-4 over MPEG-2 Transport & Program
Stream)
Abstraction of content delivery in the MPEG-4 part Delivery
Multimedia Integration Framework (DMIF)
MPEG-4 File Format based on Apple’s QuickTime design
Delivery Multimedia Integration
Framework, DMIF
OSI session layer service providing a mechanism for
hiding technology details from upper layer applications
DMIF concepts
Users (applications)
Sessions (presentation level)
Channels (stream level)
DMIF instance – implementation of delivery layer
Basically different MPEG-4 Elementary Streams (ES) are
multiplexed with timing information to the delivery
network
Stack ideology with multiple layers
Illustration: MPEG-4 delivery structure (User Plane)
[Diagram: elementary streams pass through the Synchronization Layer (SL) and become SL-packetized streams; below the DMIF Application Interface the FlexMux tool combines them over FlexMux channels into FlexMux streams; the Delivery Layer (UDP, MPEG-2 TS, ATM etc) carries these over TransMux channels as TransMux streams]
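The multiplexing step in the illustration can be sketched in Python. This is a toy framing invented for illustration (a one-byte channel index and a one-byte length in front of each packet); the normative FlexMux syntax is defined in the MPEG-4 Systems standard and is not reproduced here.

```python
def mux(packets):
    """Interleave (channel, payload) pairs into one byte stream."""
    out = bytearray()
    for channel, payload in packets:
        if not 0 <= channel <= 255:
            raise ValueError("channel index must fit in one byte")
        if len(payload) > 255:
            raise ValueError("payload must fit a one-byte length field")
        out += bytes([channel, len(payload)]) + payload
    return bytes(out)


def demux(stream):
    """Split the byte stream back into per-channel packet lists."""
    channels = {}
    pos = 0
    while pos < len(stream):
        channel, length = stream[pos], stream[pos + 1]
        payload = stream[pos + 2:pos + 2 + length]
        channels.setdefault(channel, []).append(payload)
        pos += 2 + length
    return channels


# Two hypothetical SL-packetized streams sharing one multiplexed stream.
packets = [(1, b"video-0"), (2, b"audio-0"), (1, b"video-1")]
muxed = mux(packets)
```

`demux(muxed)` recovers the packets of each channel in their original order, which is all a receiver needs before handing them to the synchronization layer.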
DMIF functionality
In principle works like FTP
Application opens session
Decides which ES need to be transported (or saved)
Creates channels for the streams
Channels also carry instructions for interactivity (play,
pause, stop)
Quality of Service parameters can be assigned
to the delivery channels and monitored,
although advanced QoS handling is not included
in the standard
DMIF Application Interface
Defines functions offered by DMIF
DAI Primitives, only semantics defined
Service (create, destroy)
Channel (create, destroy)
QoS monitoring (setup, control)
User commands (user interaction)
Data (actual media content)
DMIF user calls these "functions" to establish a
connection and convey media and interaction
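The order in which an application might use the primitive groups listed above can be sketched in Python. Since the DAI defines only semantics, every class, method and parameter name below is a hypothetical stand-in, and the service URL is a made-up placeholder.

```python
class ToyDmifSession:
    """Hypothetical stand-in for a DMIF instance; not the normative DAI."""

    def __init__(self, service_url):
        # Service create
        self.service_url = service_url
        self.channels = {}  # channel id -> {"es_id", "qos", "data"}
        self._next_id = 1
        self.attached = True

    def add_channel(self, es_id, qos=None):
        """Channel create: one channel per elementary stream."""
        channel_id = self._next_id
        self._next_id += 1
        self.channels[channel_id] = {"es_id": es_id, "qos": qos or {}, "data": []}
        return channel_id

    def send_data(self, channel_id, payload):
        """Data: the actual media content."""
        self.channels[channel_id]["data"].append(payload)

    def user_command(self, command):
        """User commands: interaction such as play, pause, stop."""
        if command not in ("play", "pause", "stop"):
            raise ValueError(f"unsupported command: {command!r}")
        return (self.service_url, command)

    def delete_channel(self, channel_id):
        """Channel destroy."""
        del self.channels[channel_id]

    def detach(self):
        """Service destroy: drops any remaining channels."""
        self.channels.clear()
        self.attached = False


session = ToyDmifSession("example://hypothetical/service")
video = session.add_channel(es_id=101, qos={"max_delay_ms": 200})
audio = session.add_channel(es_id=102)
session.send_data(video, b"frame-0")
```

The application never sees whether the channels end up on a file, a broadcast or an IP connection, which is the abstraction the delivery layer is meant to provide.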
DMIF Network Interface
Used for determining and sharing the needed information between
DAI peers over a transmission channel
Multiplexing of many DAI sessions to single TransMux
(ATM/UDP/MPEG-2) channel
Does not define “bits on the wire” itself
Concepts from other peer-to-peer protocols
Similar primitives as in DAI
Session
Service
TransMux
Channel
User commands
DMIF implementations
Mappings to real existing transport protocols
- ATM Q.2931: no changes needed to the ATM protocol
- ITU-T H.245: additions in H.245 v.6
- Real Time Streaming Protocol (RTSP): does not support all MPEG-4 functionality
MPEG-4 over MPEG-2 (broadcasting and authoring)
- Offering better quality via established transport means (MPEG-2 TS used in DVB and PS
used in DVD), "alternative codec" thinking
- Special amendment in MPEG-2 Systems standard
- Transfer either scene-based or stream-based
MPEG-4 over IP
- Uses Real-time Transport Protocol (RTP), which already encompasses timing information
- MPEG-4 as payload in RTP, specified in RFC 3016
- Special care with packet alignment, so that dropped (single) RTP packets do not cause
problems
- Mainly work-in-progress in 2001/2002
- Commercial solutions available now
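The timing information that RTP carries can be seen in its fixed 12-byte header, sketched below in Python. The header layout follows the general RTP specification; the function name, the dynamic payload type 96 and the sample values are assumptions for illustration, and the MPEG-4 specific payload rules of RFC 3016 are not modeled.

```python
import struct


def rtp_packet(payload, seq, timestamp, ssrc, payload_type=96, marker=False):
    """Prefix a payload with a minimal RTP fixed header.

    Version 2, no padding, no header extension, no CSRC entries.
    """
    byte0 = 2 << 6                                      # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)  # M bit + payload type
    header = struct.pack(
        "!BBHII",
        byte0,
        byte1,
        seq & 0xFFFF,            # sequence number, lets receivers spot losses
        timestamp & 0xFFFFFFFF,  # media time stamp in clock ticks
        ssrc & 0xFFFFFFFF,       # synchronization source identifier
    )
    return header + payload


# Illustrative values: the tick count assumes a 90 kHz clock and 25 frames/s.
packet = rtp_packet(b"abc", seq=1, timestamp=3600, ssrc=0x1234, marker=True)
```

The sequence number and time stamp are what allow a receiver to detect a dropped packet and still place the following data correctly in time.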
MPEG-4 File Format, mp4
Based on Apple Computer’s QuickTime
Not just stream ready to be delivered as with MPEG-1 and MPEG-2
Editing and reuse possible without quality reductions (lossy
decoding-recoding process eliminated)
Life-cycle file format, used in capturing, editing and combining
File stores stream data (video/audio) separately from the metadata
describing it
Hints to help fragmenting the frames for streaming
Possibly many tracks of video and audio
Relative timing, frame sizes et cetera in structural tables
Nonframing format
Sample descriptors in tracks to identify required decoder
Handy tool to compose mp4 files: GPAC/MP4Box
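The structural layout of an mp4 file can be explored with a few lines of Python. The sketch relies on the box layout of the QuickTime-derived format (a 32-bit big-endian size followed by a four-character type, with size 1 meaning a 64-bit size follows and size 0 meaning "to the end"); the function name and the synthetic sample bytes are assumptions, and nested boxes are not descended into.

```python
import struct


def iter_boxes(data):
    """Yield (type, body) for each top-level box in a byte string."""
    pos = 0
    while pos + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, pos)
        header = 8
        if size == 1:    # 64-bit size stored right after the type
            (size,) = struct.unpack_from(">Q", data, pos + 8)
            header = 16
        elif size == 0:  # box extends to the end of the data
            size = len(data) - pos
        if size < header:
            raise ValueError("corrupt box size")
        yield box_type.decode("ascii", "replace"), data[pos + header:pos + size]
        pos += size


# Synthetic two-box sample, not a playable file.
sample = (
    struct.pack(">I4s", 16, b"ftyp") + b"isom" + b"\x00\x00\x00\x00"
    + struct.pack(">I4s", 12, b"mdat") + b"DATA"
)
```

On a real file one would read the bytes from disk and typically see the metadata tables and the media data in separate top-level boxes, matching the separation of stream data and metadata described above.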
MPEG-4 Profiles 1/2
Ideas
Ensuring interoperability – allow manufacturers to only use
subset of available tools
Conformance to the standard testable
Profiles available for video, audio, graphics, scene
description, MPEG-J, object descriptor
Levels defined within each profile for further discrete
parameter limitations (bitrates etc)
Restrictions
Encoder: bitstream complexity not exceeded at defined
profile@level
Decoder: able to handle most complex bitstream at certain
profile@level
MPEG-4 Profiles 2/2
Object based approach
How many objects must be decoded simultaneously at a given time
greatly affects decoder’s required performance
Audio / Video profiles
List of allowed techniques and object types
Graphics profiles
Allowed BIFS nodes (’tags’ in XMT realization)
Advanced Simple (video) profile: I-VOP, P-VOP, B-VOP, GMC, QPEL, up
to 8 Mbit/s @ level 5
Simple 2D profile: Appearance, Bitmap, Shape
Development
New technologies introduced in new profiles, old ones unchanged
to preserve interoperability
Only new profiles/levels if they provide major changes
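A profile@level check can be sketched in Python as a set comparison plus a parameter limit. The table below holds only the Advanced Simple figures quoted on this slide (its tool list and 8 Mbit/s at level 5); the data layout and function name are hypothetical, and real conformance involves many more parameters.

```python
PROFILES = {
    "Advanced Simple": {
        "tools": {"I-VOP", "P-VOP", "B-VOP", "GMC", "QPEL"},
        "max_bitrate_bps": {5: 8_000_000},  # only the level quoted above
    },
}


def conforms(profile, level, used_tools, bitrate_bps):
    """True if a stream stays within the tools and bitrate of profile@level."""
    spec = PROFILES[profile]
    tools_ok = set(used_tools) <= spec["tools"]
    bitrate_ok = bitrate_bps <= spec["max_bitrate_bps"][level]
    return tools_ok and bitrate_ok
```

An encoder would run such a check on what it produces, while a decoder claiming the same profile@level must cope with any stream for which the check passes.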
MPEG-4 Video Coding
Main goal to provide superb quality and innovative video
compression techniques that produce content requiring less storage
space
Old coding and compression techniques such as MPEG-2 only use
rectangular frame models
Handled in MPEG-4 Visual standard
Arbitrarily shaped objects
Wide range of bitrates (handhelds vs studio)
Spatial, temporal and quality scalability
Error-prone transmission abilities
Only decoder and bitstreams specified, encoders left to industry
Profiles and Levels defined to limit implementation difficulties, ”use
what you need” mentality
Both video and still images (textures)
Video shapes
MPEG-4 video scenes are composed of Visual Objects (VO),
which are sequences of Video Object Planes (VOP); a VOP can
be thought of as a frame
For each VOP an alpha plane is also defined, making
it possible to have transparent parts in the video and
therefore to code arbitrary shapes
Each object has a bounding box that includes the object
Bounding boxes consist of macroblocks (16x16 pixels)
Macroblocks can be either transparent, opaque or border type
Opaque blocks coded with hybrid DCT/motion compensation
techniques like in MPEG-2
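The three macroblock types can be derived from an alpha plane as in the following Python sketch. It assumes a binary alpha plane stored as a list of rows, where 0 means transparent and any other value means opaque; the function name is hypothetical.

```python
MB = 16  # macroblock edge length in pixels


def classify_macroblocks(alpha):
    """Label each 16x16 block of an alpha plane.

    Returns a grid of 'transparent', 'opaque' or 'border' strings.
    """
    height, width = len(alpha), len(alpha[0])
    grid = []
    for top in range(0, height, MB):
        row = []
        for left in range(0, width, MB):
            block = [
                alpha[y][x]
                for y in range(top, min(top + MB, height))
                for x in range(left, min(left + MB, width))
            ]
            if all(v == 0 for v in block):
                row.append("transparent")  # nothing of the object inside
            elif all(v != 0 for v in block):
                row.append("opaque")       # fully inside the object
            else:
                row.append("border")       # the object contour crosses it
        grid.append(row)
    return grid


# One row of three macroblocks: empty, half covered, fully covered.
alpha_plane = [[0] * 16 + [0] * 8 + [255] * 8 + [255] * 16 for _ in range(16)]
```

Only the border blocks need explicit shape information; transparent blocks carry no texture at all.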
Rectangular video coding
Hybrid, block-based compression scheme
Basic principles
- Motion compensation: only changes are saved, to reduce storage or transmission
capacity
- Discrete Cosine Transform (DCT): used to remove content that is indistinguishable
by humans
New inventions
- Quarter-pixel motion compensation: motion vector resolution increased to
decrease prediction errors
- Global motion compensation: motion data for a complete VOP (frame) instead of
macroblocks only, also viewed as "dynamic sprite coding"
- Direct mode bidirectional prediction: motion vectors of neighbor blocks used
Innovations realized in the new Advanced Video Coding (AVC) codec, also
known as H.264 (ITU-T term)
Open-source encoder available at
http://developers.videolan.org/x264.html
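Why the DCT helps compression can be shown with a small pure-Python sketch. It implements a textbook orthonormal DCT-II on an 8x8 block and is not the integer transform of any particular codec; the function names are illustrative.

```python
import math


def dct_1d(values):
    """Orthonormal DCT-II of a list of numbers."""
    n = len(values)
    out = []
    for k in range(n):
        total = sum(
            values[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n)
        )
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * total)
    return out


def dct_2d(block):
    """Separable 2D DCT: transform every row, then every column."""
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]  # transpose back


flat_block = [[10] * 8 for _ in range(8)]  # a uniformly gray 8x8 block
coefficients = dct_2d(flat_block)
```

For the flat block all the energy lands in the single top-left (DC) coefficient and the other 63 are numerically zero, so the block can be described with one number. Real image blocks are less extreme, but most of their energy still gathers in a few low-frequency coefficients, and the small high-frequency ones are what quantization then discards.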
MPEG-4 Video Coding Tools
Special tools intended for certain specific uses of video
Interlaced coding
- For TV broadcasting needs, also HDTV formats like 1080i
- Frame/field DCT, transforms on fields rather than frames for better quality
- Field motion compensation using 16x8 top and bottom fields
Error-resilient coding
- Goal is to reduce overhead in the introduction of redundant data
- Packet-based periodic resynchronization, data partitioning, NEWPRED
Reduced resolution coding
Sprite coding
- Unchanging parts of video content coded separately as static sprites (textures)
Texture coding for studio applications
- Higher precision and lossless ability
- Uncompressed PCM coding
MPEG4 & MHEG5
Basics, objects, BIFS (Zhenbo)
XMT (Aleksi)
Delivery (Mika)
MHEG5 (Marko)
4th Part
Digital Terrestrial Television
MHEG-5 Specification
Multimedia and Hypermedia information
coding Experts Group
MHEG-5 DTT UK
Object model of multimedia presentation
Audio, video, text and graphics
Broadcasting applications and their data into TV
networks.
Optional return channel
MHEG-5 Engine profile
Based on ISO/IEC 13522-5
Defines the set of classes that the profile must
implement
Some features modified, some added, some
optional/removed
Examples : Variable, Slider, Video
Features :
Caching, Cloning, Video scaling and Stacking of
Applications
The User Experience
Visual Appearances
Conventional TV
TV with Visual prompt of available information
TV with information overlaid
Information with video or picture inset
Just information
Visual appearances
Remote controller
MHEG-5 Graphics Model
720 x 576 pixels with 256 colors
- 64 colors defined by DVB subtitle stream
- 4 colors defined by receiver manufacturer
- 188 colors defined by MHEG-5 application
632 x 518 safe area due to overscan
Three levels of transparency required
- 0% (opaque), 30% and 100% (fully transparent)
Bitmaps
Full PNG 1.0 support
MPEG I Frames
Text and Interactibles
Character encoding standards : ISO 10646-1
and UTF-8
Supported set of characters is defined
Tiresias (DTG/RNIB) font must be supported
Current profile doesn't support font
downloading
Interactibles
EntryField: input of text and numbers
HyperText: links in text
Slider: adjusting a value
Application life-cycle
Only one application running at a time
Application may launch another application
- Original application is destroyed in the process
Auto-boot application
- Launched when service is selected or when other
applications have quit
Applications are loaded from DSM-CC Object
carousel
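The life-cycle rules above can be sketched as a small state machine in Python. The class and method names are hypothetical, applications are represented by plain strings, and loading from the object carousel is not modeled.

```python
class ToyMhegEngine:
    """Hypothetical engine that keeps at most one application running."""

    def __init__(self, auto_boot_app):
        self.auto_boot_app = auto_boot_app
        self.running = None
        self.events = []  # history of (event, application) pairs

    def _start(self, app):
        # Starting an application destroys whichever one was running.
        if self.running is not None:
            self.events.append(("destroyed", self.running))
        self.running = app
        self.events.append(("started", app))

    def select_service(self):
        """Selecting a service launches the auto-boot application."""
        self._start(self.auto_boot_app)

    def launch(self, app):
        """The running application launches another one and is destroyed."""
        self._start(app)

    def quit_current(self):
        """When the application quits, the auto-boot application returns."""
        self.events.append(("quit", self.running))
        self.running = None
        self._start(self.auto_boot_app)


engine = ToyMhegEngine("boot")
engine.select_service()
engine.launch("teletext")
```

At this point `engine.running` is `"teletext"` and the history shows that the auto-boot application was destroyed when it launched its successor.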
MHEG-5 System Overview
[Diagram: information server, carousel generation & transmission, broadcast file system, TV with MHEG engine, remote, optional return channel]
MHEG-5 Summary
Offers lower cost interactive TV than MHP
Low hardware requirements
Coexistence and migration to MHP possible
Applications
Digital Teletext
Program guides
Interactive advertising
Educational
Games