Slide 1

Chapter 3
Data Representation


Slide 2

Data and Computers


Computers are multimedia devices, dealing
with many categories of information.
Computers store, present, and help us modify:

Numbers
Text
Audio
Images and graphics
Video


Slide 3

Analog and Digital Information


Computers are finite. Computer memory and
other hardware devices have only so much
room to store and manipulate data. The goal
of data representation is to represent enough
of the world to satisfy our computational needs
and our senses of sight and sound.


Slide 4

Analog and Digital Information


Information can be represented in one of two ways:
analog or digital.
Analog data: A continuous representation, analogous to
the actual information it represents.
Digital data: A discrete representation, breaking the
information up into separate elements.


Slide 5

Analog and Digital Information
A mercury thermometer exemplifies
analog data as it continually rises
and falls in direct proportion to the
temperature.
Digital displays only show discrete
information.



Slide 6

Analog and Digital Information




Computers cannot work well with analog
information, so we digitize information by breaking it
into pieces and representing those pieces
separately.
Why do we use binary? Modern computers are
designed to use and manage binary values because
the devices that store and manage the data are far
less expensive and far more reliable if they only
need to represent one of two possible values.



Slide 7

Electronic Signals




An analog signal continually fluctuates up
and down in voltage. But a digital signal has
only a high or low state, corresponding to the
two binary digits.
All electronic signals (both analog and digital)
degrade as they move down a line. That is,
the voltage of the signal fluctuates due to
environmental effects.



Slide 8

Electronic Signals (Cont’d)

An analog and a digital signal

Degradation of analog and digital signals

Periodically, a digital signal can be reclocked to
regain its original shape. No such process is
available for analog signals.
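The idea behind reclocking can be sketched in a few lines. This is a minimal model, not how real line drivers work: it assumes hypothetical levels of 0 V for a 0 bit and 5 V for a 1 bit, models degradation as random noise added on each hop, and pictures regeneration simply as snapping every received voltage back to the nearest valid level. An analog signal has no fixed set of valid levels, so there is nothing to snap back to and the noise accumulates.

```python
import random

LOW, HIGH = 0.0, 5.0            # assumed voltage levels for bits 0 and 1
THRESHOLD = (LOW + HIGH) / 2    # decision point halfway between the two levels


def degrade(voltages, noise, rng):
    """One hop down the line: add random noise in [-noise, +noise] volts to every sample."""
    return [v + rng.uniform(-noise, noise) for v in voltages]


def regenerate(voltages):
    """Snap each sample back to LOW or HIGH; possible only because just two levels are valid."""
    return [HIGH if v >= THRESHOLD else LOW for v in voltages]


def send(voltages, hops, noise, rng, reclock):
    """Send a signal over several hops, optionally regenerating it after each one."""
    for _ in range(hops):
        voltages = degrade(voltages, noise, rng)
        if reclock:
            voltages = regenerate(voltages)
    return voltages


bits = [1, 0, 1, 1, 0, 0, 1, 0]
digital = [HIGH if b else LOW for b in bits]

received = send(digital, hops=10, noise=1.0, rng=random.Random(7), reclock=True)
print(received == digital)  # prints True: each hop starts again from clean levels
```

Because the noise per hop (at most 1 V) never pushes a sample across the 2.5 V threshold, the regenerated signal is exact after any number of hops; running the same `send` with `reclock=False` leaves the accumulated noise in place.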


Slide 9

Representing Text








To represent a text document in digital form, we
need to be able to represent every possible
character that may appear.
There is a finite number of characters to represent,
so the general approach is to list them all and
assign each a binary string.
A character set is a list of characters and the codes
used to represent each one.
By agreeing to use a particular character set,
computer manufacturers have made the processing
of text data easier.


Slide 10

The ASCII Character Set



ASCII stands for American Standard Code for
Information Interchange.
The ASCII character set originally used seven
bits to represent each character, allowing for
128 unique characters.



Wikipedia has an excellent entry on ASCII.





Slide 11

The ASCII Character Set



Slide 12

The ASCII Character Set



Notice the organisation of the ASCII table.
The table divides in half according to the most significant bit (MSB).

Letters are all in the second half, so all codes for alphabetic
characters start with 1.

This second half of the table divides in half again according to
the next bit:
UPPERCASE letters start with 10.
lowercase letters start with 11.

The first half of the table also divides in half according to the
next bit:
Control characters start with 00.
Numerals and punctuation start with 01.
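One way to check this organisation is to print the leading two bits of a few 7-bit codes. The sketch below uses only Python's built-in `ord` and `format`; the helper name `leading_bits` is hypothetical. Note that the two letter quarters also hold a few punctuation marks such as `[` and `{`, so "starts with 10" means "in the uppercase quarter", not "is a letter".

```python
def leading_bits(ch: str) -> str:
    """Return the two most significant bits of a character's 7-bit ASCII code."""
    code = ord(ch)
    if code > 127:
        raise ValueError(f"{ch!r} is not a 7-bit ASCII character")
    return format(code, "07b")[:2]


# Tab is a control character, '7' and '?' are in the numerals/punctuation quarter,
# 'K' is uppercase and 'k' is lowercase.
for ch in ["\t", "7", "?", "K", "k"]:
    print(repr(ch), format(ord(ch), "07b"), leading_bits(ch))
```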



Slide 13

The ASCII Character Set




Note that control characters (the first 32 in
the ASCII character set) do not have simple
character representations that you could
print to the screen.
Some, however, perform actions with which
you are familiar, such as tab (code 9), line feed
(code 10), and carriage return (code 13).


Slide 14

The ASCII Character Set
Coding letters in ASCII is easy.
Let’s look at ‘j’ as an example:
Since ‘j’ is a letter, its code starts with a 1.
Since it’s lowercase, the next bit is also a 1.
Since it’s the tenth letter of the alphabet the
rest of the code is 01010.
The complete ASCII code for ‘j’ is 1101010.
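The rule for letters translates directly into code. This is a small sketch with a hypothetical helper `ascii_letter_code`; it builds the code from the case bits and the letter's position in the alphabet, then compares the result with Python's built-in `ord`.

```python
def ascii_letter_code(letter: str) -> str:
    """Build the 7-bit ASCII code of a letter from its case and alphabet position."""
    if len(letter) != 1 or not letter.isascii() or not letter.isalpha():
        raise ValueError("expected a single ASCII letter")
    case_bits = "11" if letter.islower() else "10"   # lowercase 11, uppercase 10
    position = ord(letter.lower()) - ord("a") + 1     # a=1, b=2, ..., z=26
    return case_bits + format(position, "05b")        # position written in five bits


print(ascii_letter_code("j"))     # 1101010
print(format(ord("j"), "07b"))    # 1101010, the built-in code agrees
print(format(ord("j"), "08b"))    # 01101010, the same code padded to eight bits
```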



Slide 15

The ASCII Character Set



ASCII evolved so that eight bits were used (this
8-bit version is often called extended ASCII).
The 7-bit codes were simply prefixed with
another bit, giving another natural doubling.

The original 7-bit codes were padded with 0,
so the code for ‘j’ became 01101010.

128 new characters were added.
The codes for this alternate character set start
with 1.


Slide 16

The Unicode Character Set






Even the extended version of the ASCII character
set is not enough for international use.
The Unicode character set was originally designed
to use 16 bits per character, so it could represent
$2^{16}$, or 65,536, characters. It has since been
extended to more than a million code points.
Unicode was designed to be a superset of ASCII.
That is, the first 128 characters in the Unicode
character set correspond exactly to ASCII, and the
first 256 correspond to the ISO 8859-1 (Latin-1)
extended ASCII character set.
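Python strings are sequences of Unicode code points, so the superset relationship can be observed directly. This sketch uses the built-ins `ord`, `str.encode`, and `sys.maxunicode`; the helper `describe` is hypothetical.

```python
import sys


def describe(ch: str) -> str:
    """Show a character's Unicode code point in decimal and in U+ hex notation."""
    cp = ord(ch)
    return f"{ch!r}: {cp} (U+{cp:04X})"


for ch in ["j", "é", "€", "日"]:
    print(describe(ch))

print(ord("j") == 106)                              # same value as the ASCII code for 'j'
print("é".encode("latin-1") == bytes([ord("é")]))   # Latin-1 byte equals the code point
print(hex(sys.maxunicode))                          # 0x10ffff, far beyond 16 bits
```

Characters such as '€' and '日' have code points above 255, so they are outside both ASCII and Latin-1.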



Slide 17

The Unicode Character Set

Figure 3.6 A few characters in the Unicode character set



Slide 18

Data Compression


It is important that we find ways to store and
transmit data efficiently, which leads computer
scientists to find ways to compress it.



Data compression is a reduction in the amount
of space needed to store a piece of data.



Compression ratio is the size of the
compressed data divided by the size of the
original data.
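The compression ratio can be computed for any compressor. The sketch below uses Python's standard `zlib` module (DEFLATE, a lossless technique not covered on these slides) only as a convenient example; the helper name `compression_ratio` is hypothetical.

```python
import zlib


def compression_ratio(original: bytes, compressed: bytes) -> float:
    """Size of the compressed data divided by the size of the original data."""
    return len(compressed) / len(original)


data = b"ABABABAB" * 500            # 4,000 highly repetitive bytes
packed = zlib.compress(data)        # lossless compression
ratio = compression_ratio(data, packed)

print(f"{len(data)} -> {len(packed)} bytes, ratio {ratio:.3f}")
print(zlib.decompress(packed) == data)  # True: the original is recovered exactly
```

A smaller ratio means better compression; highly repetitive input like this compresses far better than typical text.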


Slide 19

Data Compression


A data compression technique can be:
lossless, which means the data can be retrieved
without any loss of the original information, or
lossy, which means some information may be lost
in the process of compression.

As examples, consider these three techniques:
keyword encoding
run-length encoding
Huffman encoding


Slide 20

Keyword Encoding




Frequently used words
are replaced with a
single character.
For example…
Note that the characters
used to encode cannot
be part of the original
text.


Slide 21

Keyword Encoding


Consider the following paragraph:
The human body is composed of many
independent systems, such as the circulatory
system, the respiratory system, and the
reproductive system. Not only must each system
work independently, they must interact and
cooperate as well. Overall health is a function of
the well-being of separate systems, as well as
how these separate systems work in concert.


Slide 22

Keyword Encoding


This version highlights the words that can be
replaced.
The human body is composed of many
independent systems, such as the circulatory
system, the respiratory system, and the
reproductive system. Not only must each system
work independently, they must interact and
cooperate as well. Overall health is a function of
the well-being of separate systems, as well as
how these separate systems work in concert.


Slide 23

Keyword Encoding


This is the encoded paragraph:
The human body is composed of many
independent systems, such ^ ~ circulatory system,
~ respiratory system, + ~ reproductive system.
Not only & each system work independently, they
& interact + cooperate ^ %. Overall health is a
function of ~ %-being of separate systems, ^ % ^
how # separate systems work in concert.



Slide 24

Keyword Encoding






There are a total of 349 characters in the
original paragraph including spaces and
punctuation.
The encoded paragraph contains 314
characters, resulting in a savings of 35
characters.
The compression ratio for this example is
314/349 or approximately 0.9.
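The keyword substitution can be sketched as follows. The substitution table is an assumption: the slide's own table is not reproduced in this transcript, so it is inferred from the encoded paragraph (^ = as, ~ = the, + = and, & = must, % = well, # = these). Only whole, lowercase words are replaced, which is why "The" at the start of the paragraph survives.

```python
import re

# Inferred from the encoded paragraph (assumption, not the slide's original table).
KEYWORDS = {"as": "^", "the": "~", "and": "+", "must": "&", "well": "%", "these": "#"}


def keyword_encode(text: str) -> str:
    """Replace each whole, lowercase keyword with its single-character code."""
    for symbol in KEYWORDS.values():
        if symbol in text:  # codes must not already appear in the original text
            raise ValueError(f"{symbol!r} already appears in the text")
    pattern = re.compile(r"\b(" + "|".join(KEYWORDS) + r")\b")
    return pattern.sub(lambda m: KEYWORDS[m.group(1)], text)


def keyword_decode(text: str) -> str:
    """Map each code character back to the keyword it stands for."""
    reverse = {symbol: word for word, symbol in KEYWORDS.items()}
    return "".join(reverse.get(ch, ch) for ch in text)


paragraph = (
    "The human body is composed of many independent systems, such as the "
    "circulatory system, the respiratory system, and the reproductive system. "
    "Not only must each system work independently, they must interact and "
    "cooperate as well. Overall health is a function of the well-being of "
    "separate systems, as well as how these separate systems work in concert."
)
encoded = keyword_encode(paragraph)
print(encoded)
print(len(paragraph) - len(encoded))         # 35 characters saved
print(keyword_decode(encoded) == paragraph)  # True
```

The saving of 35 characters matches the slide; the absolute lengths depend on exactly how spaces between sentences are counted.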



Slide 25

Keyword Encoding




A compression ratio of .9 (90%) is NOT very
good. The compressed file is 90% the size of
the original.
However, there are several ways this can be
improved. Can you think of some?



Slide 26

Run-Length Encoding




A single character may be repeated over and
over again in a long sequence. This type of
repetition doesn’t generally take place in
English text, but often occurs in large data
streams.
In run-length encoding, a sequence of
repeated characters is replaced by:





a flag character,
followed by the repeated character,
followed by a single digit that indicates how many
times the character is repeated.


Slide 27

Run-Length Encoding
Some examples:
AAAAAAA
would be encoded as
*A7
*n5*x9ccc*h6 some other text *k8eee
can be decoded into the following original text:
nnnnnxxxxxxxxxccchhhhhh some other text kkkkkkkkeee
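Here is a minimal sketch of the scheme above, using `*` as the flag and a single ASCII digit for the count. One assumption is labelled in the code: runs shorter than four are left as they are, because "*c3" is no shorter than "ccc"; this is consistent with the examples, which leave "ccc" and "eee" alone. Runs longer than nine are split.

```python
FLAG = "*"    # flag character; must not occur in the original text
MIN_RUN = 4   # assumption: shorter runs are copied as-is ("*c3" saves nothing over "ccc")
MAX_RUN = 9   # a single digit can count at most nine repetitions


def rle_encode(text: str) -> str:
    if FLAG in text:
        raise ValueError("the flag character cannot appear in the text")
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        run = 1
        while i + run < len(text) and text[i + run] == ch and run < MAX_RUN:
            run += 1
        out.append(f"{FLAG}{ch}{run}" if run >= MIN_RUN else ch * run)
        i += run
    return "".join(out)


def rle_decode(encoded: str) -> str:
    out = []
    i = 0
    while i < len(encoded):
        if encoded[i] == FLAG:                        # flag, character, count digit
            ch, count = encoded[i + 1], int(encoded[i + 2])
            out.append(ch * count)
            i += 3
        else:                                         # ordinary character
            out.append(encoded[i])
            i += 1
    return "".join(out)


print(rle_encode("AAAAAAA"))   # *A7
original = "nnnnnxxxxxxxxxccchhhhhh some other text kkkkkkkkeee"
encoded = rle_encode(original)
print(encoded)                 # *n5*x9ccc*h6 some other text *k8eee
print(rle_decode(encoded) == original, len(encoded), len(original))  # True 35 51
```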



Slide 28

Run-Length Encoding




In the second example, the original text
contains 51 characters, and the encoded string
contains 35 characters, giving us a
compression ratio of 35/51 or approximately
0.69.
Since we are using one character for the
repetition count, it seems that we can’t encode
repetition lengths greater than nine. However,
instead of interpreting the count character as
an ASCII digit, we could interpret it as a binary
number.


Slide 29

Huffman Encoding






Why should the blank, which is used very
frequently, take up the same number of bits
as the character “X”, which is seldom used in
text?
Huffman codes use variable-length bit strings
to represent each character.
A few characters may be represented by five
bits, and another few by six bits, and yet
another few by seven bits, and so forth.



Slide 30

Huffman Encoding


If we use only a few bits to represent
characters that appear often and reserve
longer bit strings for characters that don’t
appear often, the overall size of the
document being represented will be smaller.
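Huffman's algorithm builds such a code from character frequencies by repeatedly merging the two least frequent groups. The sketch below is one standard way to implement it with Python's `heapq`; the function name `huffman_codes` and the sample sentence are illustrative only, and ties between equal frequencies may be broken differently by other implementations, giving different but equally short codes.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_codes(text: str) -> dict:
    """Build a Huffman code table: frequent characters get shorter bit strings."""
    if not text:
        return {}
    freq = Counter(text)
    order = count()  # tie-breaker so the heap never has to compare two dicts
    heap = [(f, next(order), {ch: ""}) for ch, f in freq.items()]
    heapq.heapify(heap)
    if len(heap) == 1:  # a one-symbol text still needs a 1-bit code
        return {ch: "0" for ch in heap[0][2]}
    while len(heap) > 1:
        # Merge the two least frequent groups: prefix 0 to one side and 1 to the other.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {ch: "0" + code for ch, code in left.items()}
        merged.update({ch: "1" + code for ch, code in right.items()})
        heapq.heappush(heap, (f1 + f2, next(order), merged))
    return heap[0][2]


text = "this is an example of a huffman tree"
codes = huffman_codes(text)
bits = "".join(codes[ch] for ch in text)
for ch, code in sorted(codes.items(), key=lambda kv: (len(kv[1]), kv[0])):
    print(repr(ch), code)
print(len(bits), "bits versus", 8 * len(text), "bits at 8 bits per character")
```

The blank, which occurs seven times in the sample, receives one of the shortest codes, while rare letters such as "x" receive longer ones.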



Slide 31

Huffman Encoding


An example of a
Huffman alphabet



Slide 32

Huffman Encoding







DOORBELL would be encoded in binary as
1011110110111101001100100.
If we used a fixed-size bit string to represent each
character (say, 8 bits), then the binary form of the
original string would be 64 bits.
The Huffman encoding for that string is 25 bits long,
giving a compression ratio of 25/64, or
approximately 0.39.
An important characteristic of any Huffman encoding
is that no bit string used to represent a character is
the prefix of any other bit string used to represent a
character.
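The alphabet figure is not reproduced in this transcript, so the table below is an assumption: it is one prefix-free code assignment, recovered by splitting the slide's bit string, that reproduces the DOORBELL encoding exactly. The decoder shows why the prefix property matters: it can emit a character as soon as the bits read so far match a code, without any separators.

```python
# Assumed code table, consistent with the DOORBELL bit string above.
CODE = {"E": "01", "L": "100", "O": "110", "R": "111", "B": "1010", "D": "1011"}


def huffman_encode(word: str) -> str:
    return "".join(CODE[ch] for ch in word)


def huffman_decode(bits: str) -> str:
    lookup = {code: ch for ch, code in CODE.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:  # safe because no code is a prefix of another
            out.append(lookup[current])
            current = ""
    if current:
        raise ValueError("bit string ended in the middle of a code")
    return "".join(out)


bits = huffman_encode("DOORBELL")
print(bits, len(bits))        # 1011110110111101001100100 25
print(huffman_decode(bits))   # DOORBELL
print(len(bits) / (8 * 8))    # 0.390625, the 25/64 ratio
```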


Slide 33

Representing Audio Information




We perceive sound when a series of air
compressions vibrate a membrane in our ear,
which sends signals to our brain.
A stereo sends an electrical signal to a
speaker to produce sound. This signal is an
analog representation of the sound wave.
The voltage in the signal varies in direct
proportion to the sound wave.



Slide 34

Representing Audio Information






To digitize the signal we periodically measure
the voltage of the signal and record the
appropriate numeric value. The process is
called sampling.
In general, a sampling rate of around 40,000
times per second is enough to create a
reasonable sound reproduction.
The standard sampling rate for CDs is 44.1
kHz. The Pro Audio standard is 48 kHz.
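Sampling can be simulated on a pure tone. In this sketch the signal is an ideal 440 Hz sine wave (an assumption standing in for a real voltage), measured at the CD rate and rounded to 16-bit signed integers, which is the sample size CD audio uses. The function name `sample_tone` is hypothetical.

```python
import math


def sample_tone(freq_hz, rate_hz, n_samples, bits=16):
    """Measure a sine tone n_samples times at rate_hz and round each value to a signed integer."""
    peak = 2 ** (bits - 1) - 1  # 32767 for 16-bit samples
    return [round(peak * math.sin(2 * math.pi * freq_hz * n / rate_hz))
            for n in range(n_samples)]


samples = sample_tone(440, 44_100, 8)  # the first 8 samples of an A440 tone
print(samples)

# One second of 16-bit (2-byte) stereo audio at 44.1 kHz:
print(44_100 * 2 * 2, "bytes per second")  # 176400 bytes per second
```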


Slide 35

Representing Audio Information

Figure 3.8 Sampling an audio signal



Slide 36

Representing Audio Information
It should be noted that the potential loss of
peak values suggested in the previous slide
is a myth, provided the sampling rate is high enough.
The human ear hears sounds between 20 Hz
and 20,000 Hz. Sampling at more than twice the
highest frequency (more than 40,000 samples per
second; CDs use 44,100) allows a signal containing
only those frequencies to be reconstructed exactly,
so sampling itself loses no audible information.
For a complete explanation refer to the
Nyquist–Shannon sampling theorem.
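The "more than twice" condition can be seen by sampling a tone that breaks it. In this sketch a 30 kHz tone, above half of the 44,100 Hz CD rate, is sampled alongside a 14,100 Hz tone (44,100 minus 30,000): every sample of the first equals the negated sample of the second, so once sampled the two cannot be told apart. This effect is called aliasing; the tone frequencies are chosen only for illustration.

```python
import math

RATE = 44_100        # samples per second (the CD rate)
NYQUIST = RATE / 2   # 22,050 Hz: the highest frequency this rate can capture


def samples(freq_hz, n_samples):
    """Sample a sine tone of freq_hz at RATE."""
    return [math.sin(2 * math.pi * freq_hz * n / RATE) for n in range(n_samples)]


too_high = samples(30_000, 50)        # 30 kHz is above NYQUIST
alias = samples(RATE - 30_000, 50)    # 14,100 Hz

# sin(2*pi*n - x) == -sin(x), so each 30 kHz sample is the negated 14.1 kHz sample.
print(all(abs(a + b) < 1e-9 for a, b in zip(too_high, alias)))  # True
```

Since human hearing tops out around 20 kHz, which is below 22,050 Hz, the CD rate captures every audible frequency without this ambiguity.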


Slide 37

Representing Audio Information


A compact disc (CD) stores audio information
digitally. On the surface of the CD are
microscopic pits that represent binary digits.
A low intensity laser is pointed at the disc.
The laser light reflects strongly if the surface
is smooth and reflects poorly if the surface is
pitted.



Slide 38

Representing Audio Information

Figure 3.9
A CD player reading
binary information



Slide 39

Audio Formats


Audio formats include WAV, AU, AIFF, VQF, and MP3.
MP3 is dominant.

MP3 is short for MPEG-1 (or MPEG-2) Audio Layer III.
MP3 employs both lossy and lossless compression.

First it analyses the frequency spread and compares it to
mathematical models of human psychoacoustics (the study of
the interrelation between the ear and the brain), and it
discards information that can’t be heard by humans.
Then the bit stream is compressed using a form of Huffman
encoding to achieve additional compression.


Slide 40

Representing Images and Graphics




Colour is our perception of the various
frequencies of light that reach the retinas of
our eyes.
Our retinas have three types of colour
photoreceptor cones which respond to
different sets of frequencies. These
photoreceptor categories correspond to the
colours of red, green, and blue.



Slide 41

Representing Images and Graphics




Colour is often expressed in a computer as
an RGB (red, green, blue) value, which is
actually three numbers that indicate the
relative contribution of each of these three
primary colours.
For example, an RGB value of (255, 255, 0)
maximizes the contribution of red and green,
and minimizes the contribution of blue. The
resulting colour is a bright yellow.


Slide 42

Representing Images and Graphics

Figure 3.10 Three-dimensional color space



Slide 43

Representing Images and Graphics



Slide 44

Representing Images and Graphics






The amount of data that is used to represent
a colour is called the colour depth.
HiColor is a term that indicates a 16-bit colour
depth. Five bits are used for each number in
an RGB value and the extra bit is sometimes
used to represent transparency.
TrueColor indicates a 24-bit colour depth.
Therefore, each number in an RGB value
gets eight bits.
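The two colour depths can be illustrated by packing an RGB value into a single integer. The bit layouts are assumptions for illustration: TrueColor as 0xRRGGBB, and HiColor as one transparency bit followed by five bits each of red, green and blue, with each 8-bit component reduced to 5 bits by dropping its three low bits. The helper names are hypothetical.

```python
def _check(*components):
    for v in components:
        if not 0 <= v <= 255:
            raise ValueError("components must be in 0..255")


def pack_truecolor(r: int, g: int, b: int) -> int:
    """24-bit colour: eight bits per component, laid out as 0xRRGGBB."""
    _check(r, g, b)
    return (r << 16) | (g << 8) | b


def pack_hicolor(r: int, g: int, b: int, transparent: bool = False) -> int:
    """16-bit colour: assumed layout T RRRRR GGGGG BBBBB (T is the spare transparency bit)."""
    _check(r, g, b)
    r5, g5, b5 = r >> 3, g >> 3, b >> 3  # keep the 5 most significant bits of each
    return (int(transparent) << 15) | (r5 << 10) | (g5 << 5) | b5


yellow = (255, 255, 0)
print(f"{pack_truecolor(*yellow):06X}")  # FFFF00
print(f"{pack_hicolor(*yellow):016b}")   # 0111111111100000
```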



Slide 45

Representing Images and Graphics






HiColor uses 5 bits for each number.
Since $2^5 = 32$, there are 32 different levels for each of
the 3 primary colours. So there are $32^3$ (or $2^{15}$)
possible colours.
This is a total of 32,768 different colours.
TrueColor uses eight bits for each colour component.
$2^8 \times 2^8 \times 2^8 = 2^{24}$, or 16,777,216 colours.
Some monitors can use as many as 32 bits for colour
depth.
This is potentially $2^{32}$, or 4,294,967,296, colours!
In practice, 32-bit modes usually combine 24-bit colour
with 8 bits of transparency or padding.


Slide 46

Representing Images and Graphics




The human eye is able to distinguish about
200 intensity levels in each of the three
primaries red, green, and blue. All in all, up to
10 million different colours can be
distinguished.
So modern monitors are examples of
solutions without a problem.


If the human eye can distinguish only 10 million
colours, why develop monitors that can display
over 4 billion?


Slide 47

Indexed Colour


A particular application such as a browser
may support only a certain number of specific
colours, creating a palette from which to
choose. For example, Netscape Navigator’s
colour palette has only 216 colours.
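Indexed colour stores a small palette plus one index per pixel. This sketch assumes the 216-colour palette is the well-known "web-safe" palette, built from six evenly spaced levels (0, 51, 102, 153, 204, 255) per component, since $6^3 = 216$. Each pixel is mapped to its nearest palette entry and stored as one byte instead of three; the helper names are hypothetical.

```python
LEVELS = (0, 51, 102, 153, 204, 255)  # six levels per component (0x00 to 0xFF in steps of 0x33)

# Assumption: the 216-colour palette is every combination of LEVELS.
PALETTE = [(r, g, b) for r in LEVELS for g in LEVELS for b in LEVELS]


def nearest_index(rgb):
    """Return the palette index of the colour closest to rgb (squared Euclidean distance)."""
    return min(range(len(PALETTE)),
               key=lambda i: sum((p - q) ** 2 for p, q in zip(PALETTE[i], rgb)))


def to_indexed(pixels):
    """Store each pixel as a one-byte palette index instead of three bytes of RGB."""
    return bytes(nearest_index(p) for p in pixels)


image = [(250, 10, 10), (0, 0, 0), (200, 200, 60)]
indexed = to_indexed(image)
print(len(PALETTE))                    # 216
print([PALETTE[i] for i in indexed])   # [(255, 0, 0), (0, 0, 0), (204, 204, 51)]
```

The trade-off is visible in the last pixel: colours that are not in the palette are shifted to the closest one that is.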



Slide 48

Digitized Images and Graphics








Digitizing a picture is the act of representing it
as a collection of individual dots, called
pixels.
The number of pixels used to represent an
image is called the resolution.
As an example, the resolution of many
monitors is 1024 × 768, or 786,432 pixels.
If the colour of each pixel is stored as 24 bits
(3 bytes) of data, the screen alone requires
2,359,296 bytes (2.25 megabytes, taking a megabyte
as $2^{20}$ bytes) of memory.
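The arithmetic generalises to any resolution and colour depth. This is a small sketch with a hypothetical helper `framebuffer_bytes`; it only handles colour depths that are a whole number of bytes.

```python
def framebuffer_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Bytes needed to hold one screen of pixels (bits_per_pixel must be a multiple of 8)."""
    if bits_per_pixel % 8:
        raise ValueError("this sketch only handles whole-byte pixel sizes")
    return width * height * (bits_per_pixel // 8)


size = framebuffer_bytes(1024, 768, 24)
print(f"{1024 * 768:,} pixels")           # 786,432 pixels
print(f"{size:,} bytes")                  # 2,359,296 bytes
print(size / 2**20, "MB (2^20 bytes)")    # 2.25 MB (2^20 bytes)
print(f"{framebuffer_bytes(1024, 768, 16):,} bytes at HiColor depth")  # 1,572,864 bytes
```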


Slide 49

Digitized Images and Graphics

Figure 3.12 A digitized picture composed of few individual pixels


Slide 50

Digitized Images and Graphics

Figure 3.12 A digitized picture composed of many individual pixels



Slide 51

Digitized Images and Graphics




The storage of image information on a pixel-by-pixel basis is called a raster-graphics
format.
There are several popular raster file formats,
including:
BMP (bitmap)
GIF (Graphics Interchange Format)
JPEG (Joint Photographic Experts Group)


Slide 52

Vector Graphics




Instead of assigning colours to pixels as we
do in raster graphics, a vector-graphics
format describes an image in terms of lines
and geometric shapes.
A vector graphic is a series of commands that
describe a line’s direction, thickness, and
colour. The file size for these formats tends to
be small because not every pixel needs to be
represented.


Slide 53

Vector Graphics






Vector graphics can be resized
mathematically, and these changes can be
calculated dynamically as needed.
This makes them particularly useful for
defining scalable fonts.
However, vector graphics is not a good
technique for representing real-world images.
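The contrast between commands and pixels can be sketched as follows. The `Line` record (endpoints, thickness, colour) and the helpers `scale` and `rasterize` are hypothetical illustrations, not any real vector format. Resizing only multiplies a few numbers in the command; the pixels are produced afterwards at whatever size is needed. The rasterizer ignores thickness to keep the sketch short.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Line:
    """One drawing command: a straight line with a thickness and an RGB colour."""
    x1: float
    y1: float
    x2: float
    y2: float
    thickness: float
    colour: tuple


def scale(shapes, factor):
    """Resize a drawing by multiplying every coordinate (and the thickness) by factor."""
    return [replace(s, x1=s.x1 * factor, y1=s.y1 * factor,
                    x2=s.x2 * factor, y2=s.y2 * factor,
                    thickness=s.thickness * factor) for s in shapes]


def rasterize(shapes, width, height):
    """Turn the commands into a width x height pixel grid (thickness ignored for brevity)."""
    grid = [[None] * width for _ in range(height)]
    for s in shapes:
        steps = max(1, round(max(abs(s.x2 - s.x1), abs(s.y2 - s.y1))))
        for i in range(steps + 1):
            t = i / steps
            x = round(s.x1 + t * (s.x2 - s.x1))
            y = round(s.y1 + t * (s.y2 - s.y1))
            if 0 <= x < width and 0 <= y < height:
                grid[y][x] = s.colour
    return grid


drawing = [Line(0, 0, 10, 10, 1, (255, 0, 0))]  # one red diagonal line
small = rasterize(drawing, 11, 11)
large = rasterize(scale(drawing, 3), 31, 31)
print(sum(p is not None for row in small for p in row))  # 11
print(sum(p is not None for row in large for p in row))  # 31
```

The drawing is one command at both sizes, while the raster grids grow from 121 to 961 pixels.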



Slide 54

Representing Video




A video codec (COmpressor/DECompressor)
refers to the methods used to shrink the size of
a movie to allow it to be played on a computer
or over a network.
Almost all video codecs use lossy compression
to minimize the huge amounts of data
associated with video.



Slide 55

Representing Video


Codecs use two types of compression: spatial and temporal.

Spatial compression: A technique based on
removing redundant information within a frame.
This problem is essentially the same as that faced
when compressing still images.
Temporal compression: A technique based on
differences between consecutive frames. If most of
an image in two frames hasn’t changed, why
should we waste space to duplicate all of the
similar information?
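A very simple form of temporal compression stores the first frame whole and, for each later frame, only the pixels that changed. This is a minimal frame-differencing sketch under stated assumptions: frames are flat lists of pixel values of equal length, and the helper names are hypothetical. Real codecs go much further, for example with motion-compensated prediction.

```python
def encode_frames(frames):
    """Keep the first frame whole; for later frames keep only (index, new value) changes."""
    if not frames:
        return None, []
    key = list(frames[0])
    deltas = []
    prev = frames[0]
    for frame in frames[1:]:
        deltas.append([(i, new) for i, (old, new) in enumerate(zip(prev, frame)) if old != new])
        prev = frame
    return key, deltas


def decode_frames(key, deltas):
    """Rebuild every frame by applying each list of changes to the previous frame."""
    frames = [list(key)]
    for changes in deltas:
        frame = list(frames[-1])
        for i, value in changes:
            frame[i] = value
        frames.append(frame)
    return frames


# A single bright pixel moving one step to the right across a 16-pixel frame.
background = [0] * 16
f1 = background.copy(); f1[5] = 9
f2 = background.copy(); f2[6] = 9
f3 = background.copy(); f3[7] = 9

key, deltas = encode_frames([f1, f2, f3])
print(deltas)                                       # [[(5, 0), (6, 9)], [(6, 0), (7, 9)]]
print(decode_frames(key, deltas) == [f1, f2, f3])   # True
```

Each later frame costs two small entries instead of 16 pixel values, because only the moving pixel differs.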