Comp201 Computer Systems
Data Representation
1
Data Representation

Describes the methods by which data can be
represented and transmitted in a computer.

Reading: Chapter Three – Englander
2
Data Representation

Alphanumeric data
Big Endian vs Little Endian
Images
➢ Bit map
➢ Vector
Audio
3
Example Data Representations

Type              Standard
Alphanumeric      ASCII, EBCDIC, Unicode
Image (bit map)   GIF, PCX, TIFF, BMP
Image (object)    PICT, PostScript
Sound             WAV, AVI, MP3, MIDI
Page description  PDF, HTML
Video             QuickTime, MPEG-2, RealVideo
4
Alphanumeric Data

Many applications process text (e.g. compilers and word processors)
Coding schemes include ASCII, EBCDIC and Unicode

ASCII table (in hex)
00 nul  10 dle  20 sp   30 0    40 @    50 P    60 `    70 p
01 soh  11 dc1  21 !    31 1    41 A    51 Q    61 a    71 q
02 stx  12 dc2  22 "    32 2    42 B    52 R    62 b    72 r
03 etx  13 dc3  23 #    33 3    43 C    53 S    63 c    73 s
04 eot  14 dc4  24 $    34 4    44 D    54 T    64 d    74 t
05 enq  15 nak  25 %    35 5    45 E    55 U    65 e    75 u
06 ack  16 syn  26 &    36 6    46 F    56 V    66 f    76 v
07 bel  17 etb  27 '    37 7    47 G    57 W    67 g    77 w
08 bs   18 can  28 (    38 8    48 H    58 X    68 h    78 x
09 ht   19 em   29 )    39 9    49 I    59 Y    69 i    79 y
0a nl   1a sub  2a *    3a :    4a J    5a Z    6a j    7a z
0b vt   1b esc  2b +    3b ;    4b K    5b [    6b k    7b {
0c np   1c fs   2c ,    3c <    4c L    5c \    6c l    7c |
0d cr   1d gs   2d -    3d =    4d M    5d ]    6d m    7d }
0e so   1e rs   2e .    3e >    4e N    5e ^    6e n    7e ~
0f si   1f us   2f /    3f ?    4f O    5f _    6f o    7f del
5

Characters

Example: what does the following ASCII string (given in hex) represent?
48 65 6C 6C 6F 20 32 30 31
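
The example above can be checked with a few lines of Python. This is only a sketch: the function names decode_ascii_hex and encode_ascii_hex are made up for illustration and are not part of the course material.

```python
# Decode a string of hexadecimal ASCII codes into text, and back again.
hex_codes = "48 65 6C 6C 6F 20 32 30 31"

def decode_ascii_hex(codes: str) -> str:
    """Turn space-separated hex byte values into the characters they encode."""
    return bytes.fromhex(codes).decode("ascii")

def encode_ascii_hex(text: str) -> str:
    """Inverse operation: show each character as a two-digit hex code."""
    return " ".join(f"{b:02X}" for b in text.encode("ascii"))

decoded = decode_ascii_hex(hex_codes)
print(decoded)                    # Hello 201
print(encode_ascii_hex(decoded))  # 48 65 6C 6C 6F 20 32 30 31
```

Each pair of hex digits is one row/column lookup in the ASCII table: 48 is H, 65 is e, 20 is the space, and 32 30 31 are the digit characters 2, 0 and 1.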
6
Sorting ASCII Characters

ASCII and EBCDIC codes are assigned so that the letters appear in alphabetical order, so the computer can do alphabetic comparisons by comparing character codes.

In Windows, comparisons are case insensitive (in most instances)…
In Unix, comparisons are generally case sensitive
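
A small Python sketch of what comparing by character code means in practice. The word list is made up for illustration. Because every uppercase ASCII letter (hex 41 to 5A) has a smaller code than every lowercase letter (hex 61 to 7A), a plain sort is case sensitive; a case-insensitive sort has to compare normalised copies instead.

```python
words = ["banana", "Apple", "cherry", "apple", "Banana"]

# Case-sensitive: strings are compared code by code, so "B" (0x42)
# sorts before "a" (0x61).
case_sensitive = sorted(words)

# Case-insensitive: compare lower-cased copies of each word instead.
case_insensitive = sorted(words, key=str.lower)

print(case_sensitive)    # ['Apple', 'Banana', 'apple', 'banana', 'cherry']
print(case_insensitive)  # ['Apple', 'apple', 'banana', 'Banana', 'cherry']
```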
7
EBCDIC codes
Reference: Englander, chapter 3
8
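UNICODE

Originally a 16 bit code (able to encode 65,536 characters); the current standard extends the code space to 1,114,112 code points (U+0000 to U+10FFFF)
Modelled on the ASCII character set (the first 128 codes are the same as ASCII)
Encodes most characters currently in use
Uses scripts to define characters in a particular language
Examples shown: Greek, Tibetan, Dingbats, Katakana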
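
A short Python sketch of how Unicode characters map to code points and to bytes. The sample characters and the helper name describe are chosen for illustration only. It shows that ASCII characters keep their ASCII value, and that a character beyond the original 16 bit range needs two 16 bit units in UTF-16.

```python
def describe(ch: str) -> tuple:
    """Return (code point, UTF-8 bytes, UTF-16 big endian bytes) for one character."""
    return ord(ch), ch.encode("utf-8"), ch.encode("utf-16-be")

samples = {"Latin": "A", "Greek": "Ω", "Katakana": "ア", "Emoji": "😀"}

for name, ch in samples.items():
    code_point, utf8, utf16 = describe(ch)
    print(f"{name:9} U+{code_point:04X}  UTF-8: {utf8.hex(' ')}  UTF-16: {utf16.hex(' ')}")
```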
9
Big Endian vs Little Endian

On most computers the storage unit is a byte (8 bits, numbered 0 to 7).

Address  Contents
100      byte
101      byte
102      byte
103      byte

Multiple bytes are required to store most data types (e.g. integers => 4 bytes).

MSB                          LSB
31 - 24 | 23 - 16 | 15 - 8 | 7 - 0
10
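Big Endian vs Little Endian

How do we pack words into a byte addressable memory?

Word (32 bits):  MSB 31 - 24 | 23 - 16 | 15 - 8 | 7 - 0 LSB

Address  Big Endian         Little Endian
100      bits 31-24 (MSB)   bits 7-0 (LSB)
101      bits 23-16         bits 15-8
102      bits 15-8          bits 23-16
103      bits 7-0 (LSB)     bits 31-24 (MSB)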
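
The two packings can be reproduced in Python as a rough sketch. The word value 0x12345678 and the helper name memory_layout are assumptions made for illustration; the starting address 100 follows the diagram.

```python
value = 0x12345678   # a 32 bit word: 0x12 is the MSB, 0x78 is the LSB
base_address = 100   # same starting address as in the diagram

def memory_layout(word: int, byteorder: str) -> dict:
    """Map each address to the byte stored there for the given byte order."""
    raw = word.to_bytes(4, byteorder)
    return {base_address + i: raw[i] for i in range(4)}

big = memory_layout(value, "big")
little = memory_layout(value, "little")

for addr in range(base_address, base_address + 4):
    print(f"{addr}: big endian = {big[addr]:02x}   little endian = {little[addr]:02x}")
```

Big Endian puts the most significant byte (12) at address 100; Little Endian puts the least significant byte (78) there.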
11
Question: Does it matter?

Answer: yes, of course… but not much! The differences in performance are minor, but (of course) you must choose one. It is possible to convert between them in software.
Intel processors use Little Endian.
Motorola processors use Big Endian.
Some software (e.g. Windows) likewise insists on a particular format… the Windows .bmp format is Little Endian, for instance.
Internet protocols are Big Endian. Conversion is required on Little Endian processors.
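
A minimal sketch of the software conversion, using Python's standard struct module. The port number 8080 and the helper name swap32 are made up for illustration. The "!" format prefix asks struct for network (Big Endian) byte order regardless of the processor the code runs on.

```python
import struct
import sys

port = 8080  # hypothetical 16 bit value to be sent over a network

network_bytes = struct.pack("!H", port)  # "!" = network (Big Endian) order
little_bytes = struct.pack("<H", port)   # explicit Little Endian, for contrast

def swap32(word: int) -> int:
    """Reverse the byte order of a 32 bit word in software."""
    return int.from_bytes(word.to_bytes(4, "big"), "little")

print(sys.byteorder)             # byte order of the machine running this code
print(network_bytes.hex())       # 1f90
print(little_bytes.hex())        # 901f
print(hex(swap32(0x12345678)))   # 0x78563412
```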
12
Some common file formats
(reference: www.cs.umass.edu, Dr. William Verts)

Big Endian
➢ Adobe Photoshop
➢ JPEG
➢ MacPaint
➢ SGI (Silicon Graphics)
➢ Sun Raster
➢ WPG (WordPerfect graphics metafile)

Little Endian
➢ BMP
➢ GIF
➢ PCX (Paintbrush)
➢ QTM (QuickTime)
➢ Microsoft RTF

And some can be either, selected by codes in the file…
13
Pictures


Many different formats are used to store images in a computer
Two main categories:
➢ Bit map images
    E.g. photographs, paintings
    Characterised by continuous variations in shading, colour, shape and texture
    Necessary to store information about each point
➢ Vector images
    Made up of geometrical shapes (e.g. lines, circles etc.)
    Sufficient to store geometrical detail plus its position
14
Bit map Images

Many different formats:

bit map e.g. GIF, TIFF, ...
[Figure: a sample image and its binary representation, a grid of 0s and 1s with one value stored per pixel]
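
A rough Python sketch of the same idea. The 5 x 5 image below is a made-up example (a letter T), not the image from the slide, and the helper names render and pack_rows are hypothetical.

```python
# A hypothetical 5 x 5 one-bit image: 1 = black pixel, 0 = white pixel.
image = [
    "11111",
    "00100",
    "00100",
    "00100",
    "00100",
]

def render(rows):
    """Draw each 1 bit as '#' and each 0 bit as '.'."""
    return "\n".join(row.replace("1", "#").replace("0", ".") for row in rows)

def pack_rows(rows):
    """Store each row of pixels as one binary number."""
    return [int(row, 2) for row in rows]

print(render(image))
print(pack_rows(image))  # [31, 4, 4, 4, 4]
```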
15
Bit map storage



Consider an image with 600 rows of 800 pixels, with one byte used to store each of the three colours of each pixel
➢ Total memory = 600 * 800 * 3 = 1,440,000 bytes ≈ 1.4 MB
Alternative representation is to use a palette: a lookup table which defines the colours in the image
➢ An index into this table is then stored for each pixel
Can also reduce the size by reducing the resolution (i.e. increasing the size of each pixel) or by employing various compression algorithms (p 78, 79) to lower storage requirements.
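
The storage figures can be worked through in Python as a sketch. The 256-entry palette and the halved resolution are assumptions chosen for illustration; the slide does not fix either value.

```python
rows, cols = 600, 800
pixels = rows * cols

# One byte each for the three colours of every pixel.
true_colour_bytes = pixels * 3

# Palette version: one index byte per pixel plus the lookup table itself.
palette_entries = 256                       # assumed palette size
palette_bytes = pixels * 1 + palette_entries * 3

# Halving the resolution in both directions (each pixel covers 4x the area).
half_res_bytes = (rows // 2) * (cols // 2) * 3

print(true_colour_bytes)  # 1440000
print(palette_bytes)      # 480768
print(half_res_bytes)     # 360000
```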
16
Example – GIF






Graphics Interchange Format
A proprietary format developed by CompuServe in 1987
GIF89a definition: http://www.dcs.ed.ac.uk/home/mxr/gfx/2d/GIF89a.txt
Assumes a rectangular screen containing a number of images
Areas not filled with images are filled with a background colour
Uses a palette to store up to 256 colours
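
A minimal Python sketch of reading the start of a GIF file: the 6 byte signature followed by the 7 byte logical screen descriptor from the GIF89a definition. The function name parse_gif_header and the sample header bytes are made up for illustration. Note the "<" in the format string: GIF stores its 16 bit values in Little Endian order.

```python
import struct

def parse_gif_header(data: bytes) -> dict:
    """Read the signature and logical screen descriptor (first 13 bytes)."""
    signature, width, height, packed, background, _aspect = struct.unpack_from(
        "<6sHHBBB", data
    )
    if signature not in (b"GIF87a", b"GIF89a"):
        raise ValueError("not a GIF file")
    has_palette = bool(packed & 0x80)           # global colour table flag
    palette_size = 2 ** ((packed & 0x07) + 1) if has_palette else 0
    return {
        "version": signature[3:].decode("ascii"),
        "screen_width": width,
        "screen_height": height,
        "palette_size": palette_size,
        "background_index": background,
    }

# Hypothetical header: 320 x 200 screen with a 256-colour global palette.
sample = b"GIF89a" + struct.pack("<HHBBB", 320, 200, 0xF7, 0, 0)
info = parse_gif_header(sample)
print(info)
```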
17
Gif Screen
18
GIF File Format
19
Vector Graphics

A series of objects such as lines and circles, e.g. PICT, PostScript, ...

Example: two lines and the character A
line 0,50,100,50
line 50,0,50,100
char A, 75, 25
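
A rough Python sketch of why storing objects is convenient. The three commands are taken from the example above; the helper names parse_object and scale are hypothetical, and the command syntax is treated as illustrative rather than as any real file format. Scaling the picture only changes a handful of numbers, with no loss of detail.

```python
def parse_object(command: str):
    """Split e.g. 'line 0,50,100,50' into ('line', ['0', '50', '100', '50'])."""
    kind, _, args = command.partition(" ")
    return kind, [a.strip() for a in args.split(",")]

def scale(obj, factor: int):
    """Multiply every numeric argument by factor; leave text (e.g. 'A') alone."""
    kind, args = obj
    return kind, [str(int(a) * factor) if a.isdigit() else a for a in args]

drawing = ["line 0,50,100,50", "line 50,0,50,100", "char A, 75, 25"]
objects = [parse_object(c) for c in drawing]
doubled = [scale(o, 2) for o in objects]

print(objects[0])  # ('line', ['0', '50', '100', '50'])
print(doubled[2])  # ('char', ['A', '150', '50'])
```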
20
Example Postscript




A page description language
An image consists of a program written in the PostScript language
Encoded in ASCII or Unicode
Contains functions to
➢ Draw lines
➢ Draw Bezier curves
➢ Join simple objects into more complex ones
➢ Translate or scale an object
➢ Fill an object
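
To make the idea of an image being a program concrete, here is a small Python sketch that writes out a PostScript program as plain text. The function name postscript_cross and the drawing itself (two crossing lines and the letter A) are made up for illustration; the PostScript operators used (newpath, moveto, lineto, stroke, findfont, scalefont, setfont, show, showpage) are standard ones.

```python
def postscript_cross(size: int = 100) -> str:
    """Return the text of a PostScript program that draws two lines and a letter."""
    half = size // 2
    program = [
        "%!PS",
        "newpath",
        f"0 {half} moveto {size} {half} lineto",   # horizontal line
        f"{half} 0 moveto {half} {size} lineto",   # vertical line
        "stroke",
        "/Helvetica findfont 12 scalefont setfont",
        f"{half + half // 2} {half // 2} moveto (A) show",
        "showpage",
    ]
    return "\n".join(program)

print(postscript_cross())
```

The output is ordinary ASCII text; a PostScript interpreter or printer executes it to produce the picture.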
21
Figure 3.13 A PostScript program
22
Figure 3.14 Another PostScript program
23
Audio Data





Sound is normally digitised from an audio source
Analog waveform is sampled at regular time intervals
The amplitude at each interval is recorded using an A-to-D converter
Most positive peak is set to the maximum binary number
Most negative peak is set to zero
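
The sampling steps above can be sketched in Python, with a computed sine wave standing in for the audio source. The 1 kHz tone, the 8 kHz sample rate and the function name digitise are assumptions made for illustration.

```python
import math

def digitise(frequency_hz, sample_rate_hz, n_samples, bits=8):
    """Sample a sine wave at regular intervals and quantise each sample."""
    max_code = 2 ** bits - 1   # most positive peak maps to this value
    samples = []
    for n in range(n_samples):
        t = n / sample_rate_hz                                  # time of sample n
        amplitude = math.sin(2 * math.pi * frequency_hz * t)    # -1.0 .. +1.0
        samples.append(round((amplitude + 1) / 2 * max_code))   # 0 .. max_code
    return samples

codes = digitise(frequency_hz=1000, sample_rate_hz=8000, n_samples=8)
print(codes)
```

With 8 bits the most positive peak becomes 255 and the most negative peak becomes 0, matching the last two points above.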
24
Figure 3.15 Digitizing an audio waveform
25
Wave (.WAV) Sound format





Designed by Microsoft
Supports 8 or 16 bit sound samples
Sample rates 11.025 kHz, 22.05 kHz or 44.1 kHz
Supports stereo or mono
Very simple format

Chunk ID "RIFF" | Length of chunk | "WAVE" | DATA
26
Wave Format

Chunk ID "RIFF" | Length of chunk | "WAVE" | DATA (Format Chunk, Data Chunk)

Format Chunk (all values Little Endian):
Field                    Bytes  Value
Chunk ID                 4      "fmt "
Length of format chunk   4      10 00 00 00 (always 0x10)
Format code              2      01 00 (always 0x01)
Channel numbers          2      0x01 = Mono, 0x02 = Stereo
Sample rate              4      binary, in Hz
Bytes per second         4
Bytes per sample         2      1 = 8 bit Mono, 2 = 8 bit Stereo or 16 bit Mono, 4 = 16 bit Stereo
Bits per sample          2
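
A minimal Python sketch of this header layout, using the standard struct module. It assumes the simplest case, where the format chunk is immediately followed by the data chunk (44 header bytes in total); real files may contain additional chunks. The function names build_wav_header and parse_wav_header are made up for illustration.

```python
import struct

# RIFF id, length, WAVE, fmt id, fmt length, format code, channels,
# sample rate, bytes/second, bytes/sample, bits/sample, data id, data length
HEADER = "<4sI4s4sIHHIIHH4sI"   # "<" = Little Endian, 44 bytes in total

def build_wav_header(sample_rate, channels, bits_per_sample, data_length):
    bytes_per_sample = channels * bits_per_sample // 8
    return struct.pack(
        HEADER,
        b"RIFF", 36 + data_length, b"WAVE",
        b"fmt ", 0x10, 0x01, channels, sample_rate,
        sample_rate * bytes_per_sample, bytes_per_sample, bits_per_sample,
        b"data", data_length,
    )

def parse_wav_header(header: bytes) -> dict:
    fields = struct.unpack(HEADER, header[:44])
    if fields[0] != b"RIFF" or fields[2] != b"WAVE":
        raise ValueError("not a WAVE file")
    return {
        "channels": fields[6],
        "sample_rate": fields[7],
        "bytes_per_second": fields[8],
        "bytes_per_sample": fields[9],
        "bits_per_sample": fields[10],
        "data_length": fields[12],
    }

header = build_wav_header(44100, 2, 16, data_length=1000)
print(header[16:22].hex(" "))   # 10 00 00 00 01 00
print(parse_wav_header(header))
```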
27
Wave Format

Chunk ID "RIFF" | Length of chunk | "WAVE" | DATA (Format Chunk, Data Chunk)

Data Chunk:
Chunk ID "data" | Length of data | Data
28
Some statistics:

If we encode sound at 44.1 kHz, each sample at 16 bits, stereo (2 channels), this amounts to about 1.4 Mbits/sec, and three minutes will take about 32 Mbytes of space!
If we only encode the most important features, it is termed data compression, and can reduce file size by about 10:1
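
The arithmetic can be checked with a short Python sketch. It assumes the CD sample rate of 44,100 Hz; the variable names are made up for illustration. Three minutes of uncompressed stereo works out to roughly 32 million bytes.

```python
sample_rate = 44_100     # samples per second (assumed CD rate)
bits_per_sample = 16
channels = 2
seconds = 3 * 60

bits_per_second = sample_rate * bits_per_sample * channels
total_bytes = bits_per_second // 8 * seconds
compressed_bytes = total_bytes // 10   # the roughly 10:1 reduction mentioned above

print(bits_per_second)    # 1411200   (about 1.4 Mbits/sec)
print(total_bytes)        # 31752000  (about 32 Mbytes)
print(compressed_bytes)   # 3175200   (about 3 Mbytes)
```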
29
Two popular methods



RealAudio is one method used for data compression.
MP3 is another.
Comparative file sizes:
➢ WAV file at 44 kHz, 16 bit: 5 MB
➢ RealAudio will take 304 KB
➢ MP3 will take 409 KB

Source: www.howstuffworks.com
30