MPEG
• MPEG-Video: deals with the compression of video signals to about 1.5 Mbits/s.
• MPEG-Audio: deals with the compression of digital audio signals at rates of 64, 128 or 192 kbits/s per channel.
• MPEG-System: deals with the synchronisation and multiplexing of multiple compressed audio and video bit streams.
MPEG Versions
• MPEG-1 for CD-ROM.
• MPEG-2 for broadcast quality.
• Some other MPEG standards (e.g. MPEG-7) are not compression systems.
• We will be talking about MPEG-2.
Data and Compression rates
• MPEG-1 video has data rates up to 1.5 Mbits/s.
• MPEG-2 video has data rates of 3 to 15 Mbits/s for broadcast and 15 to 30 Mbits/s for high definition.
• The commercial uncompressed digital video data stream (SDI) has a data rate of 270 Mbits/s, although this includes audio and provision for 10-bit video coding.
• However, consider transmitting even monochrome television pictures of size 576 x 720 at 25 frames per second and 8 bits resolution: the data rate is 576 x 720 x 25 x 8 = 82.944 Mbits/s.
• We have to double this (at least) for colour, giving nearly 166 Mbits/s.
• Therefore MPEG can give compression of 100:1 or more (166 Mbits/s down to 1.5 Mbits/s is about 110:1).
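The arithmetic above can be checked with a short sketch. The function name `raw_rate_mbits` and its `components` multiplier are illustrative assumptions (1 for monochrome, 2 for the "at least double" colour estimate); the frame size, frame rate and bit depth are the figures quoted on the slide.

```python
# Raw (uncompressed) video data rate, using the figures quoted above.
WIDTH, HEIGHT = 720, 576        # pixels per line, lines per frame
FRAMES_PER_SECOND = 25
BITS_PER_SAMPLE = 8


def raw_rate_mbits(components=1):
    """Raw rate in Mbits/s.

    `components` is a hypothetical multiplier: 1 for monochrome,
    2 for the rough 'double it for colour' estimate used in the text.
    """
    bits_per_second = (WIDTH * HEIGHT * FRAMES_PER_SECOND
                       * BITS_PER_SAMPLE * components)
    return bits_per_second / 1_000_000


mono = raw_rate_mbits(1)
colour = raw_rate_mbits(2)
ratio_mpeg1 = colour / 1.5      # against the 1.5 Mbits/s MPEG-1 rate

print(f"monochrome: {mono:.3f} Mbits/s")
print(f"colour:     {colour:.3f} Mbits/s")
print(f"ratio at 1.5 Mbits/s: about {ratio_mpeg1:.0f}:1")
```

Note that the ratio above 100:1 applies to the 1.5 Mbits/s rate; dividing the same raw figure by a higher output rate gives a correspondingly smaller ratio.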
Spatial and temporal redundancy
• MPEG makes use of temporal and spatial redundancy.
• Temporal redundancy means that we are unnecessarily transmitting the same information (data) over time.
– E.g. backgrounds do not need to be sent every frame.
• Spatial redundancy means we are unnecessarily transmitting detail (spatial information) which cannot be perceived by the eye.
• This is what JPEG exploits in still images.
• By not carrying this unnecessary (redundant) information we can achieve compression.
• Note that while the spatial compression is lossy, the temporal compression is not.
Comparison with JPEG
• MPEG uses the same spatial compression method as JPEG.
• The temporal compression uses other techniques.
Overview of MPEG
• MPEG takes incoming frames and produces a spatially compressed image.
• MPEG also predicts motion in the scene and estimates where blocks of pixels have moved to in another frame.
• MPEG can then transmit vector (or motion) information only to predict the next frame.
• However since the prediction can be inaccurate MPEG also transmits an error picture (spatially compressed) with the vector predictions.
Prediction and macroblocks
• MPEG divides each frame into blocks of size 16 x 16 pixels called macroblocks.
• The idea is to find where in the predicted frame the pixels of each block in the reference frame have moved to.
• This can be done by comparing each macroblock in the reference frame with possible positions in the predicted frame and finding the closest match.
• We then send a prediction “vector” which describes the movement of each block.
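The matching step described above can be sketched as an exhaustive block search. This is a minimal illustration, not the method any particular encoder uses: the sum of absolute differences (SAD) as the "closest match" measure, the search range, the helper names and the tiny 12 x 12 test frames with 4 x 4 blocks are all assumptions made to keep the example small.

```python
def sad(ref, cur, rx, ry, cx, cy, size):
    """Sum of absolute differences between the block of `ref` at (rx, ry)
    and the block of `cur` at (cx, cy)."""
    total = 0
    for dy in range(size):
        for dx in range(size):
            total += abs(ref[ry + dy][rx + dx] - cur[cy + dy][cx + dx])
    return total


def find_motion_vector(ref, cur, bx, by, size=16, search=4):
    """Try every offset within +/- `search` pixels and return the offset
    (dx, dy) at which the reference block at (bx, by) best matches `cur`,
    together with the matching cost."""
    height, width = len(cur), len(cur[0])
    best = (0, 0)
    best_cost = sad(ref, cur, bx, by, bx, by, size)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            cx, cy = bx + dx, by + dy
            # Skip candidate positions that fall outside the frame.
            if cx < 0 or cy < 0 or cx + size > width or cy + size > height:
                continue
            cost = sad(ref, cur, bx, by, cx, cy, size)
            if cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best, best_cost


def make_frame(width, height, px, py, size):
    """Hypothetical test frame: black background with one patterned block."""
    frame = [[0] * width for _ in range(height)]
    for y in range(size):
        for x in range(size):
            frame[py + y][px + x] = 10 + size * y + x
    return frame


reference = make_frame(12, 12, 2, 2, 4)   # block sits at (2, 2)
predicted = make_frame(12, 12, 5, 3, 4)   # same block, now at (5, 3)
vector, cost = find_motion_vector(reference, predicted, 2, 2, size=4, search=3)
print(vector, cost)   # (3, 1) 0
```

The search here follows the description above (reference block compared against positions in the predicted frame). Practical encoders commonly search the other way round, looking in the reference frame for the best match to each block of the frame being coded.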
Difference pictures
• Unfortunately, 16 x 16 blocks are quite large so it is unlikely that all the pixels in one block will have moved to another.
• There will generally, therefore, be errors in the prediction made by moving blocks around.
• We know the error at the sending end, because it is simply the difference between the actual picture and the predicted picture.
• So if we send the error as well as the prediction, we can reconstruct the actual picture.
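As a small illustration of the last two points, the sketch below forms the error picture at the sending end and adds it back at the receiving end. The 3 x 3 pixel values and the helper names are made up for the example, and the error picture is kept exact here; in MPEG it is spatially compressed before sending, so the real reconstruction is only approximate.

```python
# Hypothetical 3 x 3 patches of pixel values.
actual = [[52, 55, 61],
          [59, 79, 61],
          [62, 59, 55]]
predicted = [[50, 55, 60],
             [60, 80, 60],
             [62, 60, 55]]


def subtract(a, b):
    """Element-wise a - b for two equally sized pictures."""
    return [[x - y for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


def add(a, b):
    """Element-wise a + b for two equally sized pictures."""
    return [[x + y for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


# Sending end: the error is the actual picture minus the prediction.
error = subtract(actual, predicted)

# Receiving end: the prediction plus the error gives back the actual picture.
reconstructed = add(predicted, error)

print(error)
print(reconstructed == actual)
```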
I-Frames
• I (Intrapicture) – I-frames do not use any motion prediction. They use spatial compression only; that is, the complete frame is transmitted in a JPEG-like form.
• They are needed for several reasons, including:
– To start an MPEG sequence off (since there is nothing to predict the first frame from).
– So that an MPEG stream may be joined at a point other than the start.
– To recover from errors and degradation caused by repeated reference to previous frames.
• Sometimes called keyframes.
P-Frames
• P (Predicted Picture) – P-frames send only motion prediction information and a spatially compressed error picture.
• The actual frame is constructed from a previous frame, with the pixels in the “macroblocks” moved to their new locations.
• Since this may be far from perfect, the compressed error picture is added to compensate.
• The previous frame could be an I frame or another P-frame.
• If nothing moves in the scene, the P-frame information is zero and the constructed frame is the same as the previous one (maximum compression).
B-Frames
• Imagine the situation where an object moves to reveal a (stationary) background.
• Since this background may be fully revealed in a later frame, we could use this future frame as a reference and predict earlier frames backwards from it.
• Also, if we know the positions of blocks in future and previous frames, we can predict intermediate frames.
• B (Bi-directional prediction) – Allows interpolation and prediction from both previous and future (I and P) frames.
• B-frames allow the most compression.
• There are clearly associated problems with bi-directional frames.
• We have to wait for future incoming video before they can be coded. This causes delay.
• We have to transmit future frames before intermediate B-frames so that the decoder has the future and previous references available to construct the actual frame from the B-frames.
Groups of pictures (GOP)
• The MPEG sequence therefore consists of a combination of I-, P- and B-frames.
• This sequence is called a group of pictures (GOP).
• Usually the group repeats (but it does not have to). For example, a typical group of 12 frames:
– B1 B2 I3 B4 B5 P6 B7 B8 P9 B10 B11 P12
– The numbers indicate the original video frame order.
Groups of pictures (GOP) (order of sending)
• However, as indicated above, the order is different in the actual bit stream because frames cannot be predicted without the appropriate reference.
• The corresponding sending order (bitstream) would therefore be:
– I3 B1 B2 P6 B4 B5 P9 B7 B8 P12 B10 B11
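One simple rule that reproduces this reordering is to hold back each B-frame until the next I- or P-frame has been sent. The sketch below applies that rule to frame labels written as a letter plus the original frame number (for example `I3`); the function name `to_bitstream_order` and the label format are assumptions for illustration only.

```python
def to_bitstream_order(display_order):
    """Reorder frame labels from original (display) order to sending order.

    B-frames are held back until the following reference frame (I or P)
    has been emitted, because they cannot be decoded without it.
    """
    bitstream = []
    pending_b = []
    for frame in display_order:
        if frame.startswith("B"):
            pending_b.append(frame)       # wait for the next reference
        else:
            bitstream.append(frame)       # send the reference first
            bitstream.extend(pending_b)   # then the B-frames that need it
            pending_b = []
    # Any trailing B-frames would need a reference from the next group;
    # this sketch simply appends them.
    bitstream.extend(pending_b)
    return bitstream


gop = "B1 B2 I3 B4 B5 P6 B7 B8 P9 B10 B11 P12".split()
sent = to_bitstream_order(gop)
print(" ".join(sent))   # I3 B1 B2 P6 B4 B5 P9 B7 B8 P12 B10 B11
```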
Exercise
• A video sequence is coded using the following GOP:
– B3 B4 P1 P2 I5
• Suggest a suitable corresponding bitstream sequence.
Quality of service and variable quantisation.
• The amount of redundancy (both spatial and temporal) in moving video pictures varies, depending on the programme content.
• Sometimes almost no data is transmitted, for example for a still frame, while in action sequences the amount of data produced is large.
• It is desirable to produce a constant data rate.
• The data is therefore buffered (stored) and often transmitted at a constant rate.
• This allows the system to nearly fill the buffer when the data produced is large, but operate with an empty buffer when little data is produced.
• Sometimes, when there is a lot of change between one frame and the next, the buffer would overflow if some action were not taken to prevent this from happening.
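The buffering behaviour described above can be illustrated with a toy simulation. Everything numeric here is hypothetical (the per-frame bit counts, the constant drain rate and the buffer capacity, all in arbitrary units), and the function name `simulate_buffer` is an assumption; the sketch only shows how a burst of large frames pushes the fullness past the capacity when nothing is done about it.

```python
def simulate_buffer(bits_per_frame, drain_per_frame, capacity):
    """Track buffer fullness frame by frame.

    Each frame adds the bits the encoder produced; the channel then removes
    a constant amount. Returns a list of (fullness, overflowed) pairs.
    """
    fullness = 0
    history = []
    for produced in bits_per_frame:
        fullness += produced
        fullness = max(0, fullness - drain_per_frame)   # cannot go below empty
        history.append((fullness, fullness > capacity))
    return history


# Hypothetical output sizes: two near-still frames, then an action sequence.
produced = [10, 10, 200, 400, 600, 50, 10]
history = simulate_buffer(produced, drain_per_frame=120, capacity=500)

for frame_number, (fullness, overflowed) in enumerate(history, start=1):
    flag = "OVERFLOW" if overflowed else ""
    print(frame_number, fullness, flag)
```

With these numbers the buffer sits empty during the still frames and first exceeds its capacity on the fifth frame.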
• When this happens, the system therefore applies larger quantisation steps to the DCT coefficients (rejecting more high-frequency components) to prevent system failure.
• Sometimes only the dc component remains.
• This results in poorer quality pictures (blocking and smearing) at times of low spatial and temporal redundancy.
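The effect of a larger quantisation step can be sketched as follows. This is a simplification under stated assumptions: a single uniform step is applied to every coefficient, the eight coefficient values are invented (listed from the dc term towards higher frequencies), and the `choose_step` mapping from buffer fullness to step size is hypothetical.

```python
def quantise(coefficients, step):
    """Divide each coefficient by the step and round to the nearest integer."""
    return [round(c / step) for c in coefficients]


def dequantise(levels, step):
    """Decoder side: scale the integer levels back up by the step."""
    return [level * step for level in levels]


def choose_step(buffer_fraction):
    """Hypothetical rule: the fuller the buffer, the coarser the step."""
    if buffer_fraction < 0.5:
        return 8
    if buffer_fraction < 0.8:
        return 64
    return 256


# Invented DCT coefficients, dc term first, higher frequencies later.
coefficients = [624, -95, 48, -30, 14, -9, 5, -2]

for fraction in (0.2, 0.7, 0.95):
    step = choose_step(fraction)
    levels = quantise(coefficients, step)
    nonzero = sum(1 for level in levels if level != 0)
    print(f"buffer {fraction:.0%} full, step {step}: {levels} ({nonzero} non-zero)")
```

At the coarsest step in this example only the first (dc) value survives, and `dequantise` would return a block containing just that average level, which is the blocking effect referred to above.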
• This can be seen on most digital television systems.
• Therefore the quality of service depends on the (previously agreed) output data rate.
Further reading.
• www.mpeg.org
• Art of Digital Video, Watkinson, Focal Press.
• www.snellwilcox.com/reference/pdfs/ecomp.pdf