CS Topic 1 - Data Representation v2

Data representation considers how a computer uses numbers to represent data inside the computer. Three types of data are considered at this stage:
1. Numbers, including positive numbers, negative numbers and fractions.
2. Text.
3. Graphics.

Binary (Base 2)
The binary system only requires two symbols: 0 and 1. The columns in an 8-bit binary number represent:

2^7    2^6   2^5   2^4   2^3   2^2   2^1   2^0
128s   64s   32s   16s   8s    4s    2s    units

e.g. the binary number 00010101 is equal to 16 + 4 + 1 = 21 in decimal.
The number 1110 = 8 + 4 + 2 = 14 in decimal.

Try the following. Show your working:
1. 0110 = 4 + 2 = 6
2. 1001 = 8 + 1 = 9
3. 0101 = 4 + 1 = 5
4. 1111 = 8 + 4 + 2 + 1 = 15
5. 0010 = 2
6. 1101 = 8 + 4 + 1 = 13

Try to learn the following powers of 2 by heart:
2^8 = 256
2^10 = 1024 = 1 K
2^16 = 65,536 = 64 K
2^20 = 1,048,576 = 1 MB
2^24 = 16 MB
2^30 = 1 GB
2^32 = 4 GB
2^40 = 1 TB (1 Terabyte)

Remember the units used in the binary system:
1 byte = 8 bits
1 Kilobyte = 1024 bytes
1 Megabyte = 1024 Kilobytes
1 Gigabyte = 1024 Megabytes
1 Terabyte = 1024 Gigabytes

2048 Kilobytes = ?
A. 1024 Megabytes
B. 1 Gigabyte
C. 2 Megabytes
D. 4096 bytes
Answer: C (2048 / 1024 = 2 Megabytes)

3 Gigabytes = ?
A. 24 Terabytes
B. 3072 Megabytes
C. 24 Kilobytes
D. 3072 Terabytes
Answer: B (3 x 1024 = 3072 Megabytes)

Here are some useful terms used in binary:
Bit: binary digit (1 or 0)
Byte: group of 8 bits, giving 2^8 = 256 values
Least significant bit (LSB): the bit furthest to the right (units)
Most significant bit (MSB): the bit furthest to the left

The computer is a two-state (binary) machine. All components inside a computer and all backing storage devices have only two states, e.g.
• a switch is on or off.
• a transistor conducts or does not conduct.
• a signal is a pulse of electricity or no pulse.
• an area of a magnetic disk is positive or negative.
• with laser technology, light can reflect in two different directions.
Binary, using the digits 0 and 1, can be represented by a two-state system.

Advantages of using Binary
1. A simple two-state system is less complex to represent using electrical signals than our decimal ten-state system. Degradation in signal levels does not corrupt the information as easily, so there is less chance of errors.
2. A two-state system is easy to store magnetically and optically.
3. Calculations are simpler. There are only four rules for addition, and these can easily be built into electronic circuits:
0 + 0 = 0
0 + 1 = 1
1 + 0 = 1
1 + 1 = 0 carry 1

The disadvantages of using binary are that:
1. A binary number has more digits than its decimal equivalent, i.e. it will be longer. This is not a problem for the computer, but it makes it harder for us to read and work with.
2. Binary is more difficult than decimal for us to read, as we are more used to decimal.

An integer is a whole number, positive or negative. Every integer stored in the computer is allocated the same amount of space, whether it is a large integer or a small integer. The number of bits allocated determines the range of numbers which can be stored.
If one byte was allowed, then the largest integer would be 11111111, which is 255 in decimal, or 2^8 - 1.
Two bytes would allow 2^16 = 65,536 different values, a range from 0 to 65,535 (2^16 - 1).

If a computer only had to store positive integers, then we could easily convert each number into its binary equivalent, as you saw in the examples earlier. However, negative numbers have to be stored too, and we need a method of representing a negative sign using 1s and 0s. Modern computers use the two's complement method to represent integers.
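The column method for reading binary numbers, the four addition rules and the 2^n - 1 limit for unsigned integers can all be checked with a short program. This is a minimal Python sketch for practice, not part of the course notes: the function names are hypothetical, and numbers are handled as strings of "0" and "1" characters so the working stays visible. The adder also handles a carry coming in from the column to the right (1 + 1 + 1 = 1 carry 1), which the four rules need once carries start to appear.

```python
# Hypothetical helper names, written for practice only.

def binary_to_decimal(bits: str) -> int:
    """Add the column values (1, 2, 4, 8, ...) wherever there is a 1."""
    total = 0
    column_value = 1                  # units column (least significant bit)
    for bit in reversed(bits):        # work from right to left
        if bit == "1":
            total += column_value
        column_value *= 2             # each column is worth twice the one to its right
    return total


def add_binary(a: str, b: str) -> str:
    """Add two unsigned binary numbers column by column, carrying as needed."""
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)     # pad with leading zeros
    digits = []
    carry = 0
    for bit_a, bit_b in zip(reversed(a), reversed(b)):
        column_total = int(bit_a) + int(bit_b) + carry   # 0, 1, 2 or 3
        digits.append(str(column_total % 2))              # digit written in this column
        carry = column_total // 2                          # e.g. 1 + 1 = 0 carry 1
    if carry:
        digits.append("1")                                 # a final carry lengthens the answer
    return "".join(reversed(digits))


def largest_unsigned(num_bits: int) -> int:
    """Largest positive integer that fits in num_bits bits: 2^n - 1."""
    return 2 ** num_bits - 1


print(binary_to_decimal("00010101"))              # 21
print(add_binary("0110", "0101"))                 # 1011 (6 + 5 = 11)
print(largest_unsigned(8), largest_unsigned(16))  # 255 65535
```

Python's built-in int(bits, 2) gives the same answers as binary_to_decimal, so it can be used to check your working once you are confident with the column method.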
With this method we take the most significant bit (the one on the far left) and treat its column value as negative. The following examples illustrate the principle using 4-bit numbers to help you understand. A modern computer would use 32-bit numbers for integers. In your NABs and final exams you are likely to be asked to use 8-bit numbers, and you will practise with these later.

Two's Complement (4-bit)
Binary   Decimal
1000     -8
1001     -7
1010     -6
1011     -5
1100     -4
1101     -3
1110     -2
1111     -1
0000      0
0001     +1
0010     +2
0011     +3
0100     +4
0101     +5
0110     +6
0111     +7

In this table the 1 at the far left represents -8 (negative 8), so, for example, 1101 = -8 + 4 + 1 = -3. Make sure that you understand this concept.
Note that the range still contains 2^4 = 16 numbers: -8 to +7.

Range and Accuracy of Two's Complement
1. The range of numbers which can be stored depends on the number of bits being used:
4-bit numbers have a range of -8 to +7.
8-bit numbers have a range of -128 to +127.
2. In a modern computer, 32 bits are used to store integers. This gives 2^32 different values, from -2,147,483,648 to +2,147,483,647.
3. Any integer within the range is stored exactly, so numbers stored using two's complement are always 100% accurate.

8 Bit Two's complement numbers
Here is an example of how to work out the two's complement of the number -80:

-128   64   32   16   8   4   2   1
   1    0    1    1   0   0   0   0

1. The number is negative, so put a 1 in the -128 column.
2. Subtract 80 from 128: 128 - 80 = 48.
3. Now make 48 from the remaining columns using normal binary rules: 48 = 32 + 16.
So -80 = 10110000 (check: -128 + 32 + 16 = -80).

Express the following numbers using 8-bit two's complement (columns -128, 64, 32, 16, 8, 4, 2, 1):
1. -45 = 11010011
2. -21 = 11101011
3. -16 = 11110000
4. 127 = 01111111
5. -129: number out of range

Real numbers (numbers with a decimal point in them) are stored using floating point representation.
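Before moving on to real numbers, the 8-bit two's complement method from the worked example for -80 and the exercises can be sketched as code. This is a minimal Python sketch, not the course's own method or any standard library routine: the function names are hypothetical, bit patterns are strings, and the bit width defaults to 8 but can be set to 4 to reproduce the table of 4-bit values. Raising an error for values such as -129 mirrors the "number out of range" answer.

```python
# Hypothetical helper names, written for practice only.

def to_twos_complement(value: int, num_bits: int = 8) -> str:
    """Encode an integer using the column method from the worked example."""
    negative_column = 2 ** (num_bits - 1)        # 128 for 8 bits, 8 for 4 bits
    if not -negative_column <= value <= negative_column - 1:
        raise ValueError(f"{value}: number out of range for {num_bits} bits")
    if value < 0:
        bits = "1"                               # 1. negative, so use the negative column
        remainder = negative_column + value      # 2. e.g. 128 - 80 = 48
    else:
        bits = "0"
        remainder = value
    # 3. make the remainder from the other columns using normal binary rules
    for power in range(num_bits - 2, -1, -1):
        column_value = 2 ** power
        if remainder >= column_value:
            bits += "1"
            remainder -= column_value
        else:
            bits += "0"
    return bits


def from_twos_complement(bits: str) -> int:
    """Decode: the leftmost column is negative, the others are normal columns."""
    num_bits = len(bits)
    total = -int(bits[0]) * 2 ** (num_bits - 1)  # the negative column
    for position, bit in enumerate(bits[1:], start=1):
        total += int(bit) * 2 ** (num_bits - 1 - position)
    return total


print(to_twos_complement(-80))            # 10110000
print(from_twos_complement("11010011"))   # -45
print(to_twos_complement(-3, num_bits=4)) # 1101
```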
Floating point representation is similar to standard form (scientific notation) used in decimal. For example:
1101.101 = .1101101 x 2^100
1. The binary point is moved to the far left.
2. The point has been moved 4 places to the left, so we need to multiply by 2^4. The power 4 is 100 in binary.

The general form of this representation is m x b^e, where:
m = mantissa (the number)
b = base
e = exponent (the power)
As the base is always 2 and the point is always at the far left, we only need to store the mantissa and the exponent, so the number 1101.101 becomes:
Mantissa: 1101101
Exponent: 100

Range and Precision of Floating Point numbers
1. The range of numbers which can be stored depends on the number of bits being used for the exponent. The exponent has no effect on precision.
2. The precision of the numbers being stored depends on the number of bits being used for the mantissa. The mantissa has no effect on range.
3. In a modern computer, floating point allows:
a 4-byte mantissa: -2^31 to +2^31 - 1
a 1-byte exponent: -128 to +127
In decimal this means accuracy to about 9 significant figures and a range from about 10^-38 to 10^38.

Text is made up of characters, and each character is allocated its own binary code. The set of characters that can be represented by a computer is known as the character set.
Western alphabets need around 80 characters: 26 upper case letters, 26 lower case letters, 10 digits (0-9) and around 20 punctuation marks. 80 characters would need a 7-bit code, because 6 bits give only 2^6 = 64 codes, while 7 bits allow 2^7 = 128 different codes.

It is useful to have a standard code so that text can be transferred between different types of computer easily, without the need for translation. ASCII and Unicode are two of the most common codes in use today.
ASCII (American Standard Code for Information Interchange) is a 7-bit code allowing 128 characters.
These include 95 displayable characters and 33 control characters (codes 0 to 31, and 127), which control devices such as the display or printer rather than showing a symbol. Examples of these include:
Code 8 = Backspace
Code 9 = TAB
Code 10 = Line feed
Code 13 = Carriage return

ASCII is often extended to 8 bits, which allows 2^8 = 256 different characters. These include alphabetic characters in foreign languages and accented characters. This standard became known as extended ASCII and then ISO 8859.
ASCII and its extensions were designed to cope with Western character sets such as English, French and German, but did not include Japanese or Arabic symbol shapes. The increase in worldwide communication led to a need for a larger standard code to cope with other alphabets, technical symbols, etc.

Unicode
"Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language." (www.unicode.org)
Unicode uses a 16-bit code for each character. This provides a unique code for up to 2^16 = 65,536 characters. (Later versions of Unicode extended the code space beyond 16 bits, to over a million possible characters.)
Unicode includes all the ASCII character codes to ensure compatibility.
Advantage: Unicode can represent many more characters than ASCII.
Disadvantage: Unicode takes up more space to store than ASCII.

In a bit-mapped graphic, the graphic is seen as a matrix of pixels (picture elements), and the colour of each pixel is represented by a binary code. This simple graphic of a matchstick man could be stored as a series of binary numbers. In black and white mode, each pixel requires a one-bit code: 0 for white, 1 for black.

000111000
000111000
111111111
000111000
001000100
110000011

Resolution refers to the number of pixels in the width and height of the image. The more pixels there are in the image, the higher the resolution.
A typical 15-inch TFT screen could have a resolution of 1024 x 768 = 786,432 pixels.

Bit depth refers to the number of bits needed to represent the colour of each pixel. Greyscale simply means shades of grey, and so each shade needs its own code.
A 2-colour image would require a 1-bit code, e.g. 0 = red, 1 = green.
A 16-colour image would need a 4-bit code (2^4 = 16), e.g. 0000 = red, 0001 = green, 0010 = blue, 0011 = yellow, 0100 = orange, and so on up to 1111.
Increasing the number of colours that are available increases the size of the code for each colour:

Bit depth x (no. of bits in code)   No. of colours available = 2^x
1                                   2
4                                   16
8                                   256
16                                  65,536
24 (true colour)                    16 million (16,777,216)

Here is an example of how to calculate the memory requirements for an image on a screen of 800 x 600 using 16 million colours.
Number of pixels = 800 x 600 = 480,000 pixels
Bit depth: 2^x = 16 million, so the bit depth is 24 bits, i.e. you need a 24-bit code to represent the colour of each pixel.
File size = 480,000 x 24 bits = 11,520,000 bits
Divide by 8 to find the number of bytes: 11,520,000 / 8 = 1,440,000 bytes
Keep dividing by 1024 until you have an appropriate unit:
1,440,000 / 1024 = 1406.25 KB
1406.25 / 1024 ≈ 1.37 MB (about 1.4 MB)

Remember that the size of an image depends on the number of pixels and the bit depth.
1. Find the number of pixels.
2. Find the bit depth (express the answer in bits).
3. Multiply the number of pixels by the bit depth to give an answer in bits.
4. Divide by 8 to give the answer in bytes.
5. Keep dividing by 1024 to find the answer in KB, MB or GB.

Resolution    No. of colours   File size
640 x 420     16               131.25 KB
800 x 600     65,536           937.5 KB
1024 x 768    256              768 KB

Sometimes you are given the bit depth in the question, e.g. 24-bit colour. This makes the question easier.
If you are only told how many colours can be represented, then you have to calculate the bit depth using the equation 2^x = number of colours, where x is the bit depth. Use a calculator to do this if necessary.

A higher bit depth allows more colours, so the quality of photographs etc. will improve. Disadvantage: the file size will increase.

If asked to work out how many images can be stored on a backing storage medium, remember to round your answer down, as you would want to store complete images. Here is a worked example:

How many 8.4 MB images can be stored on a 1 GB memory stick?
1. Make sure that each number is using the same units: 1 GB = 1024 MB.
2. Divide the capacity by the size of one image: 1024 / 8.4 = 121.904...
3. Round the answer down (remember that you wouldn't store part of an image!).
You can store 121 images on a 1 GB memory stick.

Bit Map graphics - Advantages
1. You can edit individual pixels in the image.
2. It is easy to draw freehand shapes.
Bit Map graphics - Disadvantages
1. File sizes are large, as the content of every pixel has to be stored (even blank background pixels).
2. Resolution dependent: when a graphic is created at a particular resolution, it cannot then take advantage of a higher resolution device. It becomes "blocky" if enlarged.
3. It is difficult to manipulate shapes on the screen (e.g. move, scale, rotate or layer).

In vector graphics, a graphic is seen as being made up of a series of objects. A mathematical description of each object is stored as a set of instructions or formulae.
A straight line can be stored as two co-ordinate pairs (its start and end points), a line colour, thickness, pattern and layer.
A square can be stored as four co-ordinate pairs (one for each corner), a line colour, thickness, pattern, fill pattern and layer.
This information allows the objects to be represented accurately.
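The image-size steps, the equation 2^x = number of colours and the round-down rule for storage can be combined into a few small functions. This is a minimal Python sketch, not part of the course notes: the function names are hypothetical, sizes use 1 KB = 1024 bytes as in these notes, images are assumed to be uncompressed, and bit_depth_for rounds up when the number of colours is not an exact power of 2 (an assumption, since the examples here only use exact powers).

```python
import math

# Hypothetical helper names, written for practice only.

def bit_depth_for(colours: int) -> int:
    """Solve 2^x = colours for x, rounding up if colours is not a power of 2."""
    # (colours - 1).bit_length() is the smallest x with 2**x >= colours
    return (colours - 1).bit_length()


def image_size_bytes(width: int, height: int, bit_depth: int) -> float:
    """Uncompressed bitmap size, following steps 1-4 of the method."""
    pixels = width * height          # 1. number of pixels
    bits = pixels * bit_depth        # 2-3. multiply by the bit depth (answer in bits)
    return bits / 8                  # 4. divide by 8 to get bytes


def images_that_fit(capacity_mb: float, image_mb: float) -> int:
    """How many complete images fit: divide, then always round down."""
    return math.floor(capacity_mb / image_mb)


# Worked example: 800 x 600 with 16 million (2^24) colours
size = image_size_bytes(800, 600, bit_depth_for(16_777_216))
print(size, size / 1024, size / 1024 / 1024)   # 1440000.0 1406.25 1.373291015625

# Memory stick example: 1 GB = 1024 MB, 8.4 MB per image
print(images_that_fit(1024, 8.4))              # 121
```

Step 5 (keep dividing by 1024) is left to the caller here, so the same function serves whether the answer is wanted in KB, MB or GB.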
Vector graphics - Advantages
1. Resolution independent: a graphic created at a particular resolution can take advantage of a higher resolution device. It will still look in proportion.
2. It is easy to manipulate shapes on the screen (e.g. move, scale, rotate or layer).
3. File sizes are generally smaller, as values do not need to be held for every pixel.
4. Objects can be grouped to form larger objects that can then be manipulated as a single object.
Vector graphics - Disadvantages
1. It is difficult to represent freehand shapes, as the computer needs to describe them mathematically.
2. You cannot edit individual pixels.

Bit mapped & vector graphics - File size
Vector: the more objects there are on the screen, the bigger the file size will be.
Bit-mapped: at any given resolution and bit depth, the (uncompressed) file size will be the same. It doesn't matter what is actually on the screen, as the content of every pixel has to be stored.

Graphics on screen and at the printer
Bit-mapped and vector are different ways of representing graphics in RAM and on disk. It is important to remember that monitors and printers always display a graphic as a bitmap. A vector graphic has to be converted into a bitmap before it is displayed on the screen or printed out. This is called rasterising or rendering.
Bit-mapped packages often have the word Paint or Photo associated with them, e.g. Adobe Photoshop. Vector packages often contain the words Draw or Design, e.g. CorelDRAW.
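Rasterising can be illustrated for the simplest vector object, a straight line stored as two co-ordinate pairs. This is a minimal Python sketch under stated assumptions, not how any particular package or printer works: colour, thickness, pattern and layer are ignored, pixel (0, 0) is taken to be the top-left corner, the function name is hypothetical, and the pixels are chosen with Bresenham's line algorithm, one well-known method among several. The output uses the same 1-bit convention as the matchstick man (1 = black, 0 = white).

```python
# Hypothetical helper name; Bresenham's line algorithm is used to pick the pixels.

def rasterise_line(x0, y0, x1, y1, width, height):
    """Convert a line given by two co-ordinate pairs into a 1-bit bitmap.

    1 = black pixel, 0 = white pixel. Both end points must lie inside
    the width x height grid; (0, 0) is the top-left pixel.
    """
    bitmap = [[0] * width for _ in range(height)]  # start with an all-white grid
    dx = abs(x1 - x0)
    dy = -abs(y1 - y0)
    step_x = 1 if x0 < x1 else -1
    step_y = 1 if y0 < y1 else -1
    error = dx + dy
    x, y = x0, y0
    while True:
        bitmap[y][x] = 1          # colour in the pixel the line passes through
        if x == x1 and y == y1:
            break
        doubled = 2 * error
        if doubled >= dy:         # move one pixel across
            error += dy
            x += step_x
        if doubled <= dx:         # move one pixel up or down
            error += dx
            y += step_y
    return bitmap


# A line from the top-left corner to (8, 2) on a 9 x 3 grid
for row in rasterise_line(0, 0, 8, 2, 9, 3):
    print("".join(str(pixel) for pixel in row))
```

Only the pixels under the line are set to 1, but every 0 in the grid still has to be stored once the line has become a bitmap, which is why the bitmap's size depends on resolution and bit depth rather than on what is drawn.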