Transcript MPEG-4


• Objective
  • Standardize algorithms for audiovisual coding in multimedia applications, allowing for
    • Interactivity
    • High compression
    • Scalability of audio and video content
    • Support for natural and synthetic audio and video
• The Idea
  • An audiovisual scene is a coded representation of audiovisual objects related in space and time

MPEG-4: Scenario

• A/V object
  • A video object within a scene
  • The background
  • An instrument or voice
  • Coded independently
• A/V scene
  • Mixture of natural or synthetic objects
  • Individual bitstreams multiplexed and transmitted
  • One or more channels
  • Each channel may have its own quality of service

MPEG-4: Video Object Plane

• Video frame = sum of segmented regions with arbitrary shape (VOP)
• Shape, motion, and texture information of VOPs belonging to the same video object is encoded into a video object layer (VOL)
• Encode
  • VOL identifiers
  • Composition information
• Overlapping configuration of VOPs
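As a rough illustration of the "frame = sum of VOPs" idea above, the sketch below rebuilds a frame by painting arbitrarily shaped VOPs in order, so that later VOPs overlap earlier ones. The tuple layout `(top, left, alpha, texture)` and the function name `compose_scene` are hypothetical and chosen only for this example; the standard's actual composition syntax is not shown in the slides.

```python
def compose_scene(width, height, vops, background=0):
    """Paint VOPs back to front; a later VOP overwrites earlier ones where they overlap.

    Each VOP is a hypothetical tuple (top, left, alpha, texture):
    alpha is a binary shape mask, texture holds the pixel values.
    """
    frame = [[background] * width for _ in range(height)]
    for top, left, alpha, texture in vops:
        for dy, alpha_row in enumerate(alpha):
            for dx, a in enumerate(alpha_row):
                y, x = top + dy, left + dx
                # Only pixels inside the VOP shape (alpha != 0) are painted.
                if a and 0 <= y < height and 0 <= x < width:
                    frame[y][x] = texture[dy][dx]
    return frame


square = (0, 0, [[1, 1], [1, 1]], [[5, 5], [5, 5]])
wedge = (1, 1, [[1, 0], [1, 1]], [[9, 9], [9, 9]])
frame = compose_scene(3, 3, [square, wedge])
```

Here the list order stands in for the overlapping configuration: swapping `square` and `wedge` changes which object wins at the shared pixel.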

MPEG-4: Coding

• Shape coding
  • Shape information in alpha planes
  • Transparency of shape encoded
  • Inter and intra shape coding functions
  • After shape coding, each VOP in a VO is partitioned into non-overlapping macroblocks
• Motion coding
  • Shift parameter with respect to the reference window
  • Standard macroblock
  • Contour macroblock
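The partition into macroblocks can be sketched as follows, assuming a binary alpha plane stored as a list of rows. The labels "standard" and "contour" follow the slide; the "transparent" label for blocks with no object pixels and the function name `classify_macroblocks` are assumptions made for this example.

```python
def classify_macroblocks(alpha, mb=16):
    """Label each mb x mb block of a binary alpha plane.

    'standard'    : every pixel belongs to the object
    'contour'     : the block straddles the object boundary
    'transparent' : no pixel belongs to the object (assumed label)
    """
    height, width = len(alpha), len(alpha[0])
    labels = []
    for y0 in range(0, height, mb):
        row = []
        for x0 in range(0, width, mb):
            block = [alpha[y][x]
                     for y in range(y0, min(y0 + mb, height))
                     for x in range(x0, min(x0 + mb, width))]
            inside = sum(1 for a in block if a)
            if inside == 0:
                row.append("transparent")
            elif inside == len(block):
                row.append("standard")
            else:
                row.append("contour")
        labels.append(row)
    return labels


alpha = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 1, 1, 1],
         [0, 0, 1, 1]]
labels = classify_macroblocks(alpha, mb=2)  # tiny blocks, just for the demo
```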

MPEG-4: Coding

• Texture coding
  • Intra-VOPs and residual errors from motion compensation are DCT coded as in MPEG-1
    • 4 luminance and 2 chrominance blocks in a macroblock
    • P-VOPs (prediction error blocks) may not conform to the VOP boundary
      • Pixels outside the active area are set to a constant value
      • Standard compression
      • Efficient prediction of DC and AC components from intra and inter coded blocks
• Multiplexing
  • Shape, motion, texture coded data
  • Motion and DCT coefficients can be jointly (H.263) or individually coded
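A minimal sketch of the padding step for a boundary block followed by a block DCT is given below. The slides only say that pixels outside the active area are set to a constant; using the mean of the active pixels as that constant is an assumption made here, and `pad_block` and `dct2` are illustrative names. The transform is the textbook orthonormal 2-D DCT-II written out directly, not an optimized codec routine.

```python
import math


def pad_block(block, alpha, fill=None):
    """Set pixels outside the active area (alpha == 0) to a constant.

    If no constant is given, the mean of the active pixels is used (an assumption).
    """
    active = [p for prow, arow in zip(block, alpha)
              for p, a in zip(prow, arow) if a]
    if fill is None:
        fill = sum(active) / len(active) if active else 0
    return [[p if a else fill for p, a in zip(prow, arow)]
            for prow, arow in zip(block, alpha)]


def dct2(block):
    """Orthonormal 2-D DCT-II of a square block, computed by direct summation."""
    n = len(block)

    def scale(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for y in range(n):
                for x in range(n):
                    s += (block[y][x]
                          * math.cos((2 * y + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * x + 1) * v * math.pi / (2 * n)))
            out[u][v] = scale(u) * scale(v) * s
    return out


# Boundary block: left half belongs to the object, right half is outside it.
alpha = [[1, 1, 1, 1, 0, 0, 0, 0] for _ in range(8)]
block = [[10 if a else 255 for a in row] for row in alpha]
coeffs = dct2(pad_block(block, alpha))
```

With this padding the block becomes flat, so all energy lands in the DC coefficient; without padding, the artificial 10/255 step would spread energy across many AC coefficients.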

MPEG-4 Video Object Segmentation-I

• Construct a video object
  • User selects start frame, outlines polygon designating rough object boundary
  • Refine boundary using snake algorithm, if needed
  • Compute a k-pixel bounding box around the object
  • Within bounding box compute
    • Edge map: bit plane, after thresholding the output of a convolution kernel
    • Color map: compute luminance and chrominance, quantize by k-means clustering, keep quantization table
    • Motion field: block-based motion vectors
  • Segment into regions with no significant edge, smooth color, and smooth motion
  • Intersect segments and initial object boundary to determine foreground and background regions
  • Estimate the motion of regions in the next frame with an affine motion model
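Two of the feature maps above can be sketched in a few lines. The slides do not name the convolution kernel or the matching criterion, so the Sobel operator, the sum of absolute differences (SAD), and the function names `edge_map` and `block_motion` are all assumptions made for illustration.

```python
def edge_map(luma, threshold):
    """Binary edge plane: threshold the Sobel gradient magnitude (|gx| + |gy|).

    Border pixels are left at 0 because the 3x3 kernel does not fit there.
    """
    h, w = len(luma), len(luma[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (luma[y - 1][x + 1] + 2 * luma[y][x + 1] + luma[y + 1][x + 1]
                  - luma[y - 1][x - 1] - 2 * luma[y][x - 1] - luma[y + 1][x - 1])
            gy = (luma[y + 1][x - 1] + 2 * luma[y + 1][x] + luma[y + 1][x + 1]
                  - luma[y - 1][x - 1] - 2 * luma[y - 1][x] - luma[y - 1][x + 1])
            if abs(gx) + abs(gy) > threshold:
                edges[y][x] = 1
    return edges


def block_motion(prev, curr, y0, x0, size, search):
    """Motion vector (dy, dx) for the block of curr at (y0, x0).

    Exhaustive search over +/- search pixels; the best match in prev
    minimizes the sum of absolute differences.
    """
    h, w = len(prev), len(prev[0])
    best, best_sad = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            py, px = y0 + dy, x0 + dx
            if py < 0 or px < 0 or py + size > h or px + size > w:
                continue  # candidate block would leave the frame
            sad = sum(abs(curr[y0 + j][x0 + i] - prev[py + j][px + i])
                      for j in range(size) for i in range(size))
            if best_sad is None or sad < best_sad:
                best_sad, best = sad, (dy, dx)
    return best


# Vertical step edge: columns 0-1 dark, columns 2-4 bright.
step = [[0, 0, 100, 100, 100] for _ in range(5)]
edges = edge_map(step, threshold=200)

# A bright 2x2 patch moves one pixel down and two pixels right between frames.
prev = [[0] * 8 for _ in range(8)]
curr = [[0] * 8 for _ in range(8)]
for j in range(2):
    for i in range(2):
        prev[2 + j][2 + i] = 100
        curr[3 + j][4 + i] = 100
vector = block_motion(prev, curr, y0=3, x0=4, size=2, search=2)
```

The returned vector points from the current block back to its best match in the previous frame.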

MPEG-4 Video Object Segmentation-II

• Track object
  • Locate estimated position of foreground and background regions from previous frame. Call this the object mask.
  • Generate the same three feature maps with the quantization table; requantize if the error is large
  • Classify regions into foreground/background and new regions
  • Compute the intersection ratio r of each region with the object mask
  • For foreground regions, if r > 80% OR foreground  mask, mark as foreground; label foreground - mask as new
  • For new regions, if r < 30% mark as new; if r > 80% mark as foreground; else find the nearest motion-similar neighbor. If it is in the foreground, do the previous step, else keep the region as new
  • Iterate until stable
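The thresholds for new regions can be sketched as below, with regions and the object mask represented as sets of (y, x) pixel coordinates. This is an interpretation: the slides do not define r precisely, so r is taken here as the fraction of the region's pixels inside the mask, and "do previous step" is read as treating the region like a foreground region. The names `intersection_ratio`, `classify_new_region`, and `split_foreground` are hypothetical.

```python
def intersection_ratio(region, mask):
    """Fraction of the region's pixels that fall inside the object mask (assumed definition of r)."""
    return len(region & mask) / len(region) if region else 0.0


def classify_new_region(region, mask, neighbor_is_foreground):
    """Apply the 30% / 80% rule for a region tentatively labelled 'new'."""
    r = intersection_ratio(region, mask)
    if r < 0.30:
        return "new"
    if r > 0.80:
        return "foreground"
    # In between: defer to the nearest motion-similar neighbor.
    return "foreground" if neighbor_is_foreground else "new"


def split_foreground(region, mask):
    """For a region marked foreground, the part outside the mask is labelled new."""
    return region & mask, region - mask


mask = {(0, 0), (0, 1), (1, 0), (1, 1)}
mostly_inside = {(0, 0), (0, 1), (1, 0), (5, 5)}  # r = 0.75
label_a = classify_new_region(mostly_inside, mask, neighbor_is_foreground=True)
label_b = classify_new_region(mostly_inside, mask, neighbor_is_foreground=False)
fg_part, new_part = split_foreground(mostly_inside, mask)
```

In a full tracker these functions would be applied to every region repeatedly until no label changes, which corresponds to the "iterate until stable" step.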