Media Data and File Formats
Howell Istance
© De Montfort University, 2001
Text Files
• 1. Plain text (unformatted)
– ASCII Character set is most common, 7 bits are used
– This can represent 128 Code words
• A = 1000001
• a = 1100001
• Computers store data in bytes
• The extra bit can be used for Parity / Extended Character Sets :
– Error detection
• A parity bit is used
• 10000011 (Odd Parity)
– Extend codewords to 256
• IBM’s EBCDIC
• 2. Formatted Text
• Characters used to give text and formatting information
• Bold, Italic, Position, etc
• Also contains information on page numbers, version, index, etc
– Formatted files are usually much larger than their plain text equivalent
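The 7-bit ASCII code words and the odd-parity example on this slide can be reproduced with a short Python sketch. The function names are illustrative, and placing the parity bit after the 7 data bits is an assumption chosen to match the 10000011 example.

```python
def ascii_bits(ch: str) -> str:
    """Return the 7-bit ASCII code word for a single character."""
    code = ord(ch)
    if code > 127:
        raise ValueError("not a 7-bit ASCII character")
    return format(code, "07b")


def with_odd_parity(bits7: str) -> str:
    """Append a parity bit so that the total number of 1s is odd.

    Assumption: the parity bit is written after the 7 data bits.
    """
    parity = "0" if bits7.count("1") % 2 == 1 else "1"
    return bits7 + parity


print(ascii_bits("A"))                   # 1000001
print(ascii_bits("a"))                   # 1100001
print(with_odd_parity(ascii_bits("A")))  # 10000011
```

A receiver can detect a single-bit error by checking that each received byte still holds an odd number of 1s.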
Vector Graphics and Bitmapped Images
• Vector Graphics: image represented and stored as a
collection of shapes, together with data (parameters)
defining how the shapes will be produced and where they
will be located
• Bitmapped Images: image represented and stored as a
collection of pixels which, when displayed, make up the image
Bitmapped images
Pixels make up an image
Colour Depth
#FF00A3 (hexadecimal)
255,0,163 (decimal)
11111111,00000000,10100011 (binary)
Each pixel has a colour depth. A
certain number of ‘bits’ are used to
define a pixel’s colour, e.g. held as an
RGB value.
For 256 (or fewer) colours a palette is
used as an index describing which
256 actual colours to use in an image.
240,200,171
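A small Python sketch shows how one 24-bit RGB colour can be written in hexadecimal, decimal and binary, and how colour depth relates to the number of available colours. The function names are illustrative only.

```python
def hex_to_rgb(colour: str) -> tuple:
    """Split a '#RRGGBB' string into three 0-255 integers."""
    value = colour.lstrip("#")
    if len(value) != 6:
        raise ValueError("expected six hexadecimal digits")
    return tuple(int(value[i:i + 2], 16) for i in (0, 2, 4))


def rgb_to_binary(rgb: tuple) -> str:
    """Write each 8-bit channel as a binary string."""
    return ",".join(format(channel, "08b") for channel in rgb)


def colours_for_depth(bits_per_pixel: int) -> int:
    """Number of distinct values a pixel of the given depth can hold."""
    return 2 ** bits_per_pixel


rgb = hex_to_rgb("#FF00A3")
print(rgb)                    # (255, 0, 163)
print(rgb_to_binary(rgb))     # 11111111,00000000,10100011
print(colours_for_depth(8))   # 256
print(colours_for_depth(24))  # 16777216
```

With 8 bits per pixel, each pixel value is an index into a 256-entry palette rather than a colour in its own right.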
Models for bitmapped images
• Model consists of a 2-D array of pixel values
• May be of a different size and colour depth from the image
which will be finally displayed.
• A view of the image displayed on screen in an image editor
is not the model, the view has been transformed and
clipped and displayed
• You do not see the model
Models in Vector graphics
[Figure: x and y axes with origin at (0,0)]
• Model is a series of mathematical constructs, together with data to
define the location, size and attributes of each, such as colour, line
style…
• Constructs include
– shapes (rectangle, oval, lines),
– curves
– polygons (sets of points, coordinate pairs, with lines joining consecutive
points), polylines
– polygon meshes (set points with instructions to show which points are to
be joined by lines)
Storing models….as bitmapped images
4.5 cm
If the rectangles measure 4.5 cm, then on a
72 dpi (dots per inch) monitor, each side will
contain 128 pixels
• The image will contain 128 * 128 = 16384 pixels
• If 3 bytes are used to store each pixel value, then
16384 * 3 bytes = 49152 bytes are required
• Size is constant regardless of complexity of image within
the 128 pixel square
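The arithmetic on this slide can be checked with a few lines of Python. This is a sketch with illustrative function names; it assumes the pixel count is rounded to the nearest whole pixel.

```python
CM_PER_INCH = 2.54


def pixels_per_side(length_cm: float, dpi: int) -> int:
    """Convert a physical length to a whole number of pixels."""
    return round(length_cm / CM_PER_INCH * dpi)


def bitmap_bytes(width_px: int, height_px: int, bytes_per_pixel: int = 3) -> int:
    """Uncompressed storage needed for a bitmapped image."""
    return width_px * height_px * bytes_per_pixel


side = pixels_per_side(4.5, 72)
print(side)                      # 128
print(side * side)               # 16384
print(bitmap_bytes(side, side))  # 49152
```

The result depends only on the dimensions and the colour depth, which is why the size stays constant however simple or complex the picture is.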
Storing models….as vector graphics
4.5 cm
(PostScript)
0 1 0 setrgbcolor
0 0 128 128 rectfill
1 0 1 setrgbcolor
32 32 64 64 rectfill
• 78 bytes required!
• But a PostScript renderer is required, which slows the display
process and has to be available on the host machine
• Size increases as complexity of image increases, as more
instructions are needed to define the image
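The 78-byte figure can be checked by counting the characters of the four-line program. The sketch below assumes the standard PostScript operator spelling setrgbcolor, plain ASCII text and one newline character at the end of each line.

```python
# The four drawing instructions, one per line (assumed ASCII, "\n" line endings).
program = (
    "0 1 0 setrgbcolor\n"
    "0 0 128 128 rectfill\n"
    "1 0 1 setrgbcolor\n"
    "32 32 64 64 rectfill\n"
)

vector_bytes = len(program.encode("ascii"))
bitmap_bytes = 128 * 128 * 3  # same picture stored at 3 bytes per pixel

print(vector_bytes)                  # 78
print(bitmap_bytes)                  # 49152
print(bitmap_bytes // vector_bytes)  # 630
```

Each extra shape adds another line or two to the program, so the vector file grows with the complexity of the picture while the bitmap does not.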
Representation as vector graphics…
• Vector graphics enable images to be
composed of filled shapes
• Each object can be manipulated
individually
• Scaling objects is easy (by applying
mathematical transforms to the
object definition)
Distorted poppy…
Easy to manipulate individual elements of image here…
Vector representation of complex images
• To approach realistic
image, complex
definition of gradient
meshes is required
• File size approx. 10
MB
• Generated in
Illustrator
• Taken from Wiley
book site
Rendered as a bitmapped image
• File size of this image is
152K
• No longer possible to interact
with separate components
• Edits and application of
effects are done on the vector
version and the end result is
saved as a bitmapped image.
GIF Files
• Image files hold a lot of data
– Image files tend to be large files
• To reduce storage space COMPRESSION
techniques are used
• One solution is RUN LENGTH ENCODING
– Count the number of pixels that are the same
– Decoder uses this count to copy the original pixel
X times
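Run length encoding as described above can be sketched in Python. The (value, count) pair layout and the function names are illustrative assumptions, not a particular file format.

```python
def rle_encode(pixels: list) -> list:
    """Collapse consecutive equal pixels into (value, count) pairs."""
    runs = []
    for pixel in pixels:
        if runs and runs[-1][0] == pixel:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([pixel, 1])   # start a new run
    return [(value, count) for value, count in runs]


def rle_decode(runs: list) -> list:
    """Copy each pixel value 'count' times to rebuild the original row."""
    pixels = []
    for value, count in runs:
        pixels.extend([value] * count)
    return pixels


row = [2] * 6 + [3] * 4 + [2] * 3   # palette indices for one row of pixels
encoded = rle_encode(row)
print(encoded)                      # [(2, 6), (3, 4), (2, 3)]
print(rle_decode(encoded) == row)   # True
```

The saving depends on long runs: a row in which every pixel differs from its neighbour would grow rather than shrink under this scheme.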
GIF Files
– Developed by CompuServe
– Used for single or multiple images
– Based on LZW compression
• Lempel and Ziv invented the original algorithm
• Welch developed it further
– Replaces repeated strings of data with a TOKEN (a dictionary code)
– LZW can give reasonable compression (about 50%)
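The token idea behind LZW can be illustrated with a textbook version of the algorithm in Python. This is a sketch only: it emits a plain list of integer codes, whereas the GIF format packs variable-width codes and adds control codes, none of which is modelled here.

```python
def lzw_compress(data: bytes) -> list:
    """Replace repeated byte strings with integer tokens (dictionary codes)."""
    table = {bytes([i]): i for i in range(256)}  # start with all single bytes
    next_code = 256
    current = b""
    codes = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            current = candidate               # keep growing the match
        else:
            codes.append(table[current])      # emit token for longest match
            table[candidate] = next_code      # remember the new string
            next_code += 1
            current = bytes([byte])
    if current:
        codes.append(table[current])
    return codes


def lzw_decompress(codes: list) -> bytes:
    """Rebuild the data; the decoder reconstructs the same table as it goes."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    next_code = 256
    previous = table[codes[0]]
    output = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:
            entry = previous + previous[:1]   # string defined by this very step
        else:
            raise ValueError("invalid LZW code")
        output.append(entry)
        table[next_code] = previous + entry[:1]
        next_code += 1
        previous = entry
    return b"".join(output)


sample = b"TOBEORNOTTOBEORTOBEORNOT"
codes = lzw_compress(sample)
print(len(sample), len(codes))
print(lzw_decompress(codes) == sample)  # True
```

The table is never stored in the file, because the decoder can rebuild it from the code stream alone.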
Compression - LZW / RLE
Runs of colour can be defined in a
simple Run / Colour / Number format.
e.g. R0206 for black, R0304 for gray,
RF005 for green, R04FE for red,
R0203 for black, R0614 for blue taken from the palette below.
[Figure: the palette, colours numbered 1 to 6]
GIF Files
• Decompression is fairly quick
• Universal standard
• Not optimised for image compression
• UNISYS hold patent on LZW so there may be a problem
with royalties
Compression - LZW / RLE
TIF uncompressed = 289k
TIF lzw compressed = 248k
TIF uncompressed = 90k
TIF lzw compressed = 5k
JPEG Files
• Joint Photographic Experts Group
• Uses a Fourier-related transform (the discrete cosine
transform, DCT) to reduce high frequency components in
the image
• Uses several algorithms including run-length
encoding
• Can be lossy
– blockiness
– posterisation
– ringing
Digital Video
• We see a sequence of still images as a continuous
movement if rate of presentation is greater than critical
flicker fusion frequency (about 40 images/sec)
• Film is shot at 24 frames/second but each frame effectively
shown twice in projection giving refresh rate of 48
frames/second
• 3 broadcast TV and video standards
– NTSC – US, Canada
– PAL – Europe, Australia
– SECAM – France, Eastern Europe
Interlacing and Frame rates
• For each system, refresh rate is double broadcast frame
rate, by showing half of one frame followed by the other
half
• NTSC (30 frames/second), PAL and SECAM (25
frames/second)
Image sizes and Data Rates…
• Consider the amount of data to represent a sequence of
digital images at NTSC broadcast rate, if 3 bytes used to
represent a pixel value
– 640 * 480 * 3 = 900 KB per image
– 1 second at 30 frames/second = 26 MB
– 1 minute at 30 frames/second = 1.6 GB
• For PAL/SECAM, similar
– 768 * 576 * 3 = 1296 KB per image
– 1 second at 25 frames/second = 31 MB
– 1 minute at 25 frames/second = 1.85 GB
• Data (transfer) rate = data amount per unit time (e.g. 26
MB/sec for NTSC, 31 MB/sec for PAL)
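The uncompressed data rates can be recomputed in Python. The sketch below uses illustrative function names and assumes 1 KB = 1024 bytes and 1 MB = 1024 KB when converting units.

```python
KB = 1024
MB = 1024 * KB


def frame_bytes(width: int, height: int, bytes_per_pixel: int = 3) -> int:
    """Storage for one uncompressed frame."""
    return width * height * bytes_per_pixel


def data_rate(width: int, height: int, frames_per_second: int) -> int:
    """Bytes needed for one second of uncompressed video."""
    return frame_bytes(width, height) * frames_per_second


ntsc = data_rate(640, 480, 30)
pal = data_rate(768, 576, 25)

print(frame_bytes(640, 480) // KB)  # 900
print(frame_bytes(768, 576) // KB)  # 1296
print(round(ntsc / MB, 1), "MB per second (NTSC)")
print(round(pal / MB, 1), "MB per second (PAL/SECAM)")
print(round(ntsc * 60 / MB), "MB per minute (NTSC)")
print(round(pal * 60 / MB), "MB per minute (PAL/SECAM)")
```

Rates of this size are the reason the compression techniques on the following slides are needed before video can be stored or delivered.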
Compression techniques
• Can’t assume that devices (camera, video card) used to
digitise images will be available for playback on end user
machine
• Need to provide software codec to apply a compression
technique suited to capabilities of end user machine
• All techniques operate on a sequence of bitmapped images
• Video data is normally compressed twice:
– when captured (hardware codec), where real-time compression is needed
– in order to be transmitted (software codec)
Intra- and Inter-frame compression
• Spatial (intra-frame) compression compresses each frame
in isolation
– Lossy techniques applied, leading to some loss of image quality
• Temporal (inter-frame) compression calculates and
compresses differences between successive frames
– 1 key frame + (succession of usually) 6 difference frames
– Difference frame contains difference between original frame and
preceding key frame or preceding difference frame
• Time to compress may be (much) longer than time to
decompress (asymmetric codec)
• Fast decompression times important
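The key frame plus difference frame idea can be sketched in Python with frames held as flat lists of pixel values. This is a lossless toy model with illustrative names: real codecs also compress the key frames and the differences, usually lossily, and that step is not shown.

```python
def encode_sequence(frames: list, group_size: int = 7) -> list:
    """Store 1 key frame followed by difference frames in each group.

    group_size=7 gives 1 key frame + 6 difference frames; each difference
    is taken against the frame immediately before it.
    """
    encoded = []
    for index, frame in enumerate(frames):
        if index % group_size == 0:
            encoded.append(("key", list(frame)))
        else:
            previous = frames[index - 1]
            diff = [cur - prev for cur, prev in zip(frame, previous)]
            encoded.append(("diff", diff))
    return encoded


def decode_sequence(encoded: list) -> list:
    """Rebuild every frame by adding differences to the previous frame."""
    frames = []
    for kind, data in encoded:
        if kind == "key":
            frames.append(list(data))
        else:
            previous = frames[-1]
            frames.append([prev + d for prev, d in zip(previous, data)])
    return frames


frames = [[10, 10, 10], [10, 12, 10], [11, 12, 10]]
encoded = encode_sequence(frames)
print(encoded[1])                          # ('diff', [0, 2, 0])
print(decode_sequence(encoded) == frames)  # True
```

When little changes between frames, the difference frames are mostly zeros, which a later compression stage can store very compactly.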
Software codecs
• Four main contenders for compressing video for
delivery on CD-ROM or via the internet: Cinepak, Intel
Indeo, Sorenson and MPEG-1
• Cinepak, Intel Indeo, Sorenson all use vector quantisation
• Frame divided into small blocks (vectors),
• Code book contains typical block patterns
• closest approximation to code book entry worked out and
index to code book is stored instead of original vector
• Decompression (fast) obtained by replacing indices from
data stream with code book entries
• Compression (slow) can take as much as 150 times the decompression time
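Vector quantisation as outlined above can be sketched in Python. The tiny code book, the four-value blocks and the squared-difference distance are illustrative assumptions; they are not taken from Cinepak, Indeo or Sorenson.

```python
def nearest_index(block: tuple, code_book: list) -> int:
    """Find the code book entry closest to the block (squared difference)."""
    def distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(code_book)), key=lambda i: distance(block, code_book[i]))


def vq_compress(blocks: list, code_book: list) -> list:
    """Slow step: search the code book for every block, store only indices."""
    return [nearest_index(block, code_book) for block in blocks]


def vq_decompress(indices: list, code_book: list) -> list:
    """Fast step: a simple table lookup per index."""
    return [code_book[i] for i in indices]


# Hypothetical code book of typical 2x2 block patterns (4 grey values each).
code_book = [(0, 0, 0, 0), (255, 255, 255, 255), (0, 255, 0, 255)]
blocks = [(10, 5, 0, 3), (250, 251, 255, 249), (3, 240, 8, 250)]

indices = vq_compress(blocks, code_book)
print(indices)                            # [0, 1, 2]
print(vq_decompress(indices, code_book))
```

The asymmetry is visible in the code: compression compares each block with every code book entry, while decompression is one lookup per block.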
Sound Files
• Two main types
• WAV files
– Digital samples of analogue waveforms
• Midi Files
– Set of instructions to control computer
WAV Files
• Sound is sampled according to Nyquist Sampling Theorem
• SAMPLE RATE – at least 2 X Highest frequency
• Range of frequencies occurring in human voice is 300 - 3400 Hz
– Telephone sampling rate is at least 6800 Hz; it is actually 8 kHz
• Range of frequencies ear can detect is 20 - 20,000 Hz
– Sampling rate is at least 40,000 Hz; it is actually 44,100 Hz
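The Nyquist rule on this slide reduces to a one-line calculation, sketched here in Python with illustrative function names.

```python
def minimum_sample_rate(highest_frequency_hz: float) -> float:
    """Nyquist: sample at least twice as fast as the highest frequency."""
    return 2 * highest_frequency_hz


def is_sufficient(sample_rate_hz: float, highest_frequency_hz: float) -> bool:
    """Check a chosen sample rate against the Nyquist minimum."""
    return sample_rate_hz >= minimum_sample_rate(highest_frequency_hz)


print(minimum_sample_rate(3400))    # 6800
print(minimum_sample_rate(20000))   # 40000
print(is_sufficient(8000, 3400))    # True
print(is_sufficient(44100, 20000))  # True
print(is_sufficient(8000, 20000))   # False
```

The last line shows why a telephone-grade sample rate cannot capture the full range of human hearing.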
[Figure: sound level in volts (-15 to +15) plotted against time]
The sound is sampled at regular intervals
Conversion to digital
• There are 21 signal levels
– -10 to 0 to +10
• We need 5 bits to represent this range
• Note 5 bits gives 32 combinations
– Use 0XXXX for Positive values
– Use 1XXXX for negative values
3 volts is represented by 00011
7 volts is represented by 00111
10 volts is represented by 01010
-3 volts is represented by 10011
-7 volts is represented by 10111
-10 volts is represented by 11010
0 volts is represented by 00000
• Each sample is transmitted to an output device sequentially
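The 5-bit scheme (one sign bit followed by a 4-bit magnitude) can be sketched in Python. The function names are illustrative, and whole-volt samples in the range -10 to +10 are assumed.

```python
def encode_sample(volts: int) -> str:
    """Encode a whole-volt sample as sign bit + 4-bit magnitude."""
    if not -10 <= volts <= 10:
        raise ValueError("sample outside the -10 to +10 volt range")
    sign = "1" if volts < 0 else "0"
    return sign + format(abs(volts), "04b")


def decode_sample(bits: str) -> int:
    """Recover the voltage from a 5-bit code word."""
    magnitude = int(bits[1:], 2)
    return -magnitude if bits[0] == "1" else magnitude


for volts in (3, 7, 10, -3, -7, -10, 0):
    print(volts, encode_sample(volts))
# 3 00011, 7 00111, 10 01010, -3 10011, -7 10111, -10 11010, 0 00000
```

Only 21 of the 32 possible bit patterns are used, so some capacity is left over in each 5-bit sample.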
Quantisation noise
• The example uses a 1 volt step range
• What if the audio sample is 7.5 volts?
• The encoder gives a value of 8 volts
• The decoder outputs an 8 volt signal
• This error is called QUANTISATION NOISE
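Quantisation error can be demonstrated with a short Python sketch. It assumes the encoder rounds to the nearest step, with halves rounded up, which matches the 7.5 volt example; the function names are illustrative.

```python
import math


def quantise(volts: float, step: float = 1.0) -> float:
    """Round a sample to the nearest multiple of the step (halves round up)."""
    return math.floor(volts / step + 0.5) * step


def quantisation_error(volts: float, step: float = 1.0) -> float:
    """Difference between the decoded value and the original sample."""
    return quantise(volts, step) - volts


print(quantise(7.5))            # 8.0
print(quantisation_error(7.5))  # 0.5
print(quantise(7.5, step=0.5))  # 7.5
```

The error never exceeds half a step, so a smaller step (more bits per sample) gives less quantisation noise.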
Companding
• Most audio signals are quiet
– more signals at lower levels than high levels
• Companding means using a non-linear scale
– For example, 0-5 volts might have 20 values
– 5- 8 volts might have 8 values
– 8-10 volts might have 2 values
• This gives better resolution at lower levels at the
expense of high signal levels
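The non-linear scale in the example (20 values for 0-5 volts, 8 for 5-8 volts, 2 for 8-10 volts) can be sketched as a piecewise quantiser in Python. This is an illustration of the slide's example only: it handles magnitudes from 0 to 10 volts, assumes the sign is stored separately, and decodes each level to the middle of its interval. Standard companding laws such as mu-law use a smooth curve instead.

```python
# (low volts, high volts, number of levels) for each band of the example.
BANDS = [(0.0, 5.0, 20), (5.0, 8.0, 8), (8.0, 10.0, 2)]


def compand(volts: float) -> int:
    """Map a magnitude in 0-10 volts to one of 30 level numbers (0-29)."""
    if not 0 <= volts <= 10:
        raise ValueError("magnitude outside 0-10 volts")
    base = 0
    for index, (low, high, count) in enumerate(BANDS):
        is_last = index == len(BANDS) - 1
        if volts < high or is_last:
            step = (high - low) / count
            offset = min(int((volts - low) / step), count - 1)
            return base + offset
        base += count


def expand(level: int) -> float:
    """Map a level number back to the middle of its voltage interval."""
    base = 0
    for low, high, count in BANDS:
        if level < base + count:
            step = (high - low) / count
            return low + (level - base + 0.5) * step
        base += count
    raise ValueError("level outside 0-29")


for volts in (1.0, 6.0, 9.0):
    level = compand(volts)
    print(volts, level, expand(level))
```

Quiet samples fall in 0.25 volt steps while the loudest fall in 1 volt steps, so the rounding error is smallest where most of the signal lies.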
CD Quality WAV files
• Use 16 bits per sample on each of 2 (stereo) channels to represent the audio signal
• This gives 65536 “steps” per channel
– Quantisation noise is low
– A lot of bits will carry no information (low sound levels)
– This means a lot of data redundancy
– WAV file size becomes large
– 1 Mbyte holds about 6 seconds of sound (1 Mbit holds about 0.7 seconds)
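The storage needed for CD quality sound can be worked out in Python. The sketch assumes a 44,100 Hz sample rate, 16 bits per sample and 2 channels, and takes 1 Mbyte as 1024 * 1024 bytes and 1 Mbit as 1,000,000 bits.

```python
SAMPLE_RATE_HZ = 44100
BITS_PER_SAMPLE = 16
CHANNELS = 2


def bytes_per_second(rate: int = SAMPLE_RATE_HZ,
                     bits: int = BITS_PER_SAMPLE,
                     channels: int = CHANNELS) -> int:
    """Uncompressed audio data produced each second."""
    return rate * (bits // 8) * channels


rate = bytes_per_second()
print(2 ** BITS_PER_SAMPLE)   # 65536 steps per channel
print(rate)                   # 176400 bytes per second
print(1024 * 1024 / rate)     # seconds of sound in 1 Mbyte (about 6)
print(1_000_000 / (rate * 8)) # seconds of sound in 1 Mbit (about 0.7)
```

At this rate one minute of stereo sound needs roughly 10 Mbytes, which is why the compression methods on the later slides matter.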
MIDI Files
• These are digital sound files
• Control computers, sequencers, etc
• Each bit in the signal is used
• Must have a MIDI player to hear the sound
• File size is very small compared to WAV files
Audio Compression
• ADPCM
– Predicts next sample value
• TrueSpeech
– Based on mathematical model of airflow over vocal tract
– Highly efficient (1/16th)
• MPEG Audio
– Fits with MPEG Video files
Psychoacoustic model
Throw away samples which will not be perceived,
i.e. those under the curve
Zip Files
• Popular file compression utility
– Based on the Lempel-Ziv family of algorithms (the usual DEFLATE method combines LZ77 with Huffman coding)
• Used to transfer or store large files
• Zipped files give good results for text and WAV files
• Poor results for graphics / video (typically 3%)
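The difference between compressible and incompressible data can be seen with Python's standard zlib module, which implements DEFLATE, the method most Zip tools use. The sketch makes two assumptions: repeated English text stands in for a text file, and pseudo-random bytes stand in for graphics or video data that has already been compressed.

```python
import random
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (smaller is better)."""
    return len(zlib.compress(data)) / len(data)


# Highly repetitive text: long repeated strings compress very well.
text = b"the quick brown fox jumps over the lazy dog. " * 200

# Pseudo-random bytes: no repeated patterns to exploit (fixed seed for repeatability).
rng = random.Random(0)
noise = bytes(rng.getrandbits(8) for _ in range(len(text)))

print(round(compression_ratio(text), 3))
print(round(compression_ratio(noise), 3))
```

The random data comes out at roughly its original size or slightly larger, which mirrors the poor results seen when zipping files that are already compressed.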
File Size / Performance
• There is a trade-off between:
Speed of loading
File size
Quality
• There is no one correct solution for all multimedia
applications