Data Description


TOPIC 6 : DATA DESCRIPTION

6.1 – Introduction to Data
6.2 – Measures of Location
6.3 – Measures of Dispersion

6.1 – Introduction to Data

Learning outcomes: At the end of this topic, students should be able to:
(a) identify discrete and continuous data
(b) identify ungrouped and grouped data
(c) construct and interpret stem-and-leaf diagrams

Statistics involves collecting, organizing, presenting and analyzing data in order to obtain useful information for decision making.

POPULATION is a collection of all elements whose characteristics are being studied.

SAMPLE is a group of elements drawn from a population and representative of that population. A sample is a subset of a population.

Parameter is a numerical measurement describing some characteristic of a population, e.g. the mean, median or mode.

Variable is a characteristic or attribute that can take different values, e.g. the height of a student.

Variables can be classified into discrete data and continuous data.

DATA: a collection of observations, measurements or information obtained from a study that has been carried out.

Data
• Quantitative data: data that can be measured numerically
   • Discrete data
   • Continuous data
• Qualitative data: data that cannot assume a numerical value but can be classified into categories

Discrete data are data that assume integer values. E.g. the number of teachers in a school.

Continuous data are data that can assume any numerical value in a certain interval on the real line. E.g. the heights of students in KMM: 130.2 cm, 132.5 cm, 131.8 cm, ...

Raw data can be represented as ungrouped data or grouped data.

(a) UNGROUPED DATA: data which are listed as a sequence or in the form of a frequency table, but without the use of intervals
(b) GROUPED DATA: data which are categorized into class intervals

Example

The following are the lengths of 12 leaves collected from a garden, measured to the nearest cm.

10 9 11 11 14 10 12 11 13 13 9 12

These data are called raw data.

The data can be summarized as a FREQUENCY DISTRIBUTION TABLE.

Length of leaf (cm) | 9 | 10 | 11 | 12 | 13 | 14
Frequency           | 2 | 2  | 3  | 2  | 2  | 1

The data shown in the frequency distribution above are known as ungrouped data.

The frequency distribution below shows the same data grouped into intervals.

Length of leaf (cm) | Frequency
9-10                | 4
11-12               | 5
13-14               | 3

Data in the form of the frequency distribution table shown above are known as grouped data.
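The two frequency tables for the leaf lengths can be rebuilt with a short Python sketch. The variable names (`lengths`, `ungrouped`, `grouped`) are illustrative and not part of the original slides; the class intervals are the ones used in the example.

```python
from collections import Counter

# Raw data: lengths of 12 leaves, to the nearest cm.
lengths = [10, 9, 11, 11, 14, 10, 12, 11, 13, 13, 9, 12]

# Ungrouped frequency table: one entry per distinct value.
ungrouped = dict(sorted(Counter(lengths).items()))

# Grouped frequency table: one entry per class interval (inclusive limits).
classes = [(9, 10), (11, 12), (13, 14)]
grouped = {
    f"{lo}-{hi}": sum(lo <= x <= hi for x in lengths)
    for lo, hi in classes
}

print(ungrouped)
print(grouped)
```

Both tables describe the same 12 observations; grouping only trades detail for a shorter table.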


Stem and leaf diagram

• A stem and leaf diagram is another technique for illustrating quantitative data.
• Each value is divided into two parts, the stem and the leaf.
• The digit(s) in the greatest place value(s) of the data values are the stems.
• The digits in the next greatest place value are the leaves.
• For example, if all the data are two-digit numbers, the digit in the tens place is used for the stem and the digit in the ones place is used for the leaf.

Example 1

Construct a stem-and-leaf diagram for the data below:
12, 13, 21, 27, 33, 34, 35, 37, 40, 41

Steps for constructing a stem and leaf diagram.

• Step 1: Separate each value into two parts, i.e. the stem and the leaf. Since each given value consists of two digits, the first digit is used as the stem and the second digit as the leaf. (When the values are large, the stem can consist of several digits.)
• Step 2: Draw a vertical line and list the stems on the left in order of magnitude, starting from the smallest.
• Step 3: List the leaves, i.e. the corresponding second digits, on the right of the vertical line.

Solution

Stem | Leaf
1    | 2 3
2    | 1 7
3    | 3 4 5 7
4    | 0 1
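The three construction steps can be sketched in Python as follows. This is a minimal illustration, assuming non-negative whole-number data where the last digit is the leaf; the function name `stem_and_leaf` is hypothetical, and stems with no leaves are simply omitted.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return {stem: [leaves]}, using the ones digit as the leaf."""
    rows = defaultdict(list)
    for v in sorted(values):          # sorting keeps stems and leaves in order
        stem, leaf = divmod(v, 10)    # Step 1: split each value
        rows[stem].append(leaf)
    return dict(rows)

data = [12, 13, 21, 27, 33, 34, 35, 37, 40, 41]
diagram = stem_and_leaf(data)

# Steps 2 and 3: stems on the left of the bar, leaves on the right.
for stem, leaves in diagram.items():
    print(f"{stem} | {' '.join(map(str, leaves))}")
```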

Example 2

Construct a stem-and-leaf diagram for the test scores of a group of students:
92, 92, 96, 98, 83, 85, 72, 74, 76, 78, 78, 79, 61, 64, 64, 67, 68, 50, 50, 52, 58, 58

Solution

Test scores out of 100

Stem | Leaf
9    | 2 2 6 8
8    | 3 5
7    | 2 4 6 8 8 9
6    | 1 4 4 7 8
5    | 0 0 2 8 8

Based on the stem and leaf diagram:
• 4 students got a mark in the 90s on their test out of 100.
• 2 students received the same mark of 92.
• No marks below 50 were received.
• No mark of 100 was received.

Counting the total number of leaves gives the number of students who took the test (here, 22).

Exercise

Construct your own stem and leaf diagram for the following June temperatures:
77 80 82 68 65 59 61 57 50 62 61 70 69 64 67 70 62 65 65 73 76 87 80 82 83 79 79 71 80 77

Solution

Temperatures

Stem | Leaf
5    | 0 7 9
6    | 1 1 2 2 4 5 5 5 7 8 9
7    | 0 0 1 3 6 7 7 9 9
8    | 0 0 0 2 2 3 7

6.2 – Measures of Location

Learning outcomes: At the end of this topic, students should be able to:
(a) Find and interpret the mean, mode and median for ungrouped data.
(b) Find and interpret the mean, mode, median, quartiles and percentiles for grouped data.
(c) Construct and interpret box-and-whisker plots.

Data
• UNGROUPED DATA: data which are listed as a sequence or in the form of a frequency table, but without the use of intervals. Measures: mean, mode, median.
• GROUPED DATA: data which are categorized into class intervals. Measures: mean, mode, median, quartiles and percentiles.

Ungrouped Data

Mean

• The sum of the values of all observations divided by the total number of observations.
• Using the symbol $\bar{x}$:

$$\bar{x} = \frac{\sum x}{n} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n}$$

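Example 1

a) Find the mean of the set of numbers 3, 5, 7, 4, 5, 9, 6.
b) Find the mean of the set of data below.

Number of male children | 0 | 1 | 2 | 3 | 4 | 5
Frequency               | 2 | 5 | 7 | 3 | 2 | 1

Solution 1(a)

$$\bar{x} = \frac{3 + 5 + 7 + 4 + 5 + 9 + 6}{7} = \frac{39}{7} \approx 5.57$$

Solution 1(b)

$$\bar{x} = \frac{\sum fx}{\sum f} = \frac{0(2) + 1(5) + 2(7) + 3(3) + 4(2) + 5(1)}{20} = \frac{41}{20} = 2.05$$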
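The two means in Example 1 can be checked with a short Python sketch. The helper names `mean` and `mean_from_table` are illustrative only; the second assumes the data arrive as a value-to-frequency mapping.

```python
def mean(values):
    """Sum of the observations divided by the number of observations."""
    return sum(values) / len(values)

def mean_from_table(table):
    """table maps each value x to its frequency f; returns sum(f*x) / sum(f)."""
    total_f = sum(table.values())
    return sum(x * f for x, f in table.items()) / total_f

mean_a = mean([3, 5, 7, 4, 5, 9, 6])
mean_b = mean_from_table({0: 2, 1: 5, 2: 7, 3: 3, 4: 2, 5: 1})
print(round(mean_a, 2), round(mean_b, 2))
```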

Median

• The middle value when a set of data is arranged in order of magnitude (ascending or descending).
• For a set of data $x_1, x_2, x_3, \dots, x_n$ arranged in order of magnitude, there are two cases.

CASE 1: the number of data (n) is odd

$$\text{Median} = \left(\frac{n + 1}{2}\right)\text{th value}$$

CASE 2: the number of data (n) is even

Median = mean of the two middle values

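Example 2

Find the median for the following sets of data.

a) 180 186 191 201 209 219 220
b) 17 24 21 28 36 32 20
c) 3.56 2.71 5.48 8.61 4.35 6.22

Solution 2(a)

The data are already in ascending order and n = 7 is odd, so the median is the ((7 + 1)/2)th = 4th value: median = 201.

Solution 2(b)

Arranged in ascending order: 17 20 21 24 28 32 36. Since n = 7 is odd, the median is the 4th value: median = 24.

Solution 2(c)

Arranged in ascending order: 2.71 3.56 4.35 5.48 6.22 8.61. Since n = 6 is even, the median is the mean of the 3rd and 4th values: median = (4.35 + 5.48)/2 = 4.915.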
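The odd and even cases for the median can be written as one small Python function. This is a sketch under the definitions above; the function name `median` is chosen here for illustration.

```python
def median(values):
    """Middle value of the ordered data (mean of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                      # the ((n + 1)/2)th value, counting from 1
    return (ordered[mid - 1] + ordered[mid]) / 2

median_a = median([180, 186, 191, 201, 209, 219, 220])
median_b = median([17, 24, 21, 28, 36, 32, 20])
median_c = median([3.56, 2.71, 5.48, 8.61, 4.35, 6.22])
print(median_a, median_b, round(median_c, 3))
```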

Mode

• The mode of a set of data is the value that occurs most frequently.

Example 3

Find the mode for the following sets of data.

a) 5, 2, 3, 3, 5, 4, 28, 5
b) 2, 3, 5, 8, 10
c) 0.2, 0.4, 0.4, 0.4, 0.5, 0.7, 0.7, 0.7, 0.5

Example 4

Find the mode for the following data:

x | 20 | 33 | 40 | 52
f | 4  | 10 | 6  | 7

Solution

The highest frequency is 10, so the mode is 33.
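Modes for Examples 3 and 4 can be found by counting occurrences, as in the Python sketch below. The function name `modes` is hypothetical, and the convention that a data set in which every value occurs once has no mode is an assumption made here.

```python
from collections import Counter

def modes(values):
    """Return every value with the highest frequency, or [] if all values occur once."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []                                 # assumed convention: no mode
    return sorted(x for x, c in counts.items() if c == top)

modes_a = modes([5, 2, 3, 3, 5, 4, 28, 5])
modes_b = modes([2, 3, 5, 8, 10])
modes_c = modes([0.2, 0.4, 0.4, 0.4, 0.5, 0.7, 0.7, 0.7, 0.5])

# Example 4: with a frequency table, the mode is the x with the largest f.
table = {20: 4, 33: 10, 40: 6, 52: 7}
mode_4 = max(table, key=table.get)

print(modes_a, modes_b, modes_c, mode_4)
```

A data set can therefore have one mode, several modes, or none.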

Grouped Data

Mean

For grouped data, the mean is given by

$$\bar{x} = \frac{\sum fx}{\sum f}$$

• f = the frequency of each class
• x = class mark, i.e. the mid-point of each class

Example 1

The table shows the distribution of the fat content of 40 pieces of food. Find the mean for the following distribution.

Fat content | 0.1 - 1.0 | 1.1 - 2.0 | 2.1 - 3.0 | 3.1 - 4.0 | 4.1 - 5.0
Frequency   | 4         | 5         | 7         | 13        | 11

Solution 1

Fat content | Frequency, f | Class mark, x | fx
0.1 - 1.0   | 4            | 0.55          | 2.20
1.1 - 2.0   | 5            | 1.55          | 7.75
2.1 - 3.0   | 7            | 2.55          | 17.85
3.1 - 4.0   | 13           | 3.55          | 46.15
4.1 - 5.0   | 11           | 4.55          | 50.05
Total       | 40           |               | 124.00

$$\bar{x} = \frac{\sum fx}{\sum f} = \frac{124}{40} = 3.1$$
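The grouped mean for the fat-content table can be reproduced in Python. This is a sketch only: the variable names are illustrative, and each class mark is taken as the midpoint of the stated class limits.

```python
# Class limits and frequencies from the fat-content table.
classes = [(0.1, 1.0), (1.1, 2.0), (2.1, 3.0), (3.1, 4.0), (4.1, 5.0)]
freqs = [4, 5, 7, 13, 11]

# Class mark x = midpoint of each class.
marks = [(lo + hi) / 2 for lo, hi in classes]

sum_fx = sum(f * x for f, x in zip(freqs, marks))
grouped_mean = sum_fx / sum(freqs)
print(round(grouped_mean, 2))
```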

Median

The median of a frequency distribution for grouped data can be estimated by using the formula

$$\text{Median} = L_k + \left(\frac{\frac{n}{2} - F_{k-1}}{f_k}\right) C$$

$L_k$ = lower boundary of the median class
$n$ = number of data, or the sum of the frequencies
$F_{k-1}$ = cumulative frequency before the median class
$f_k$ = frequency of the median class
$C$ = class width

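Example 2

Find the median given that the lengths of a sample of 90 leaves from a tree are recorded in the table (Figure 1):

Lengths (cm) | 4 – 5 | 6 – 7 | 8 – 9 | 10 – 11 | 12 – 13 | 14 – 15
Frequency    | 2     | 6     | 14    | 31      | 30      | 7

Figure 1

Solution 2

Length (cm) | Frequency | Cumulative frequency
4 – 5       | 2         | 2
6 – 7       | 6         | 8
8 – 9       | 14        | 22
10 – 11     | 31        | 53
12 – 13     | 30        | 83
14 – 15     | 7         | 90

Since n/2 = 45, the median class is 10 – 11, with $L_k = 9.5$, $F_{k-1} = 22$, $f_k = 31$ and $C = 2$.

$$\text{Median} = 9.5 + \left(\frac{45 - 22}{31}\right)(2) \approx 10.98 \text{ cm}$$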
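A Python sketch of the grouped-median estimate, L + ((n/2 - F)/f) × C, applied to the leaf-length table. The function name `grouped_median` and the `gap` parameter are assumptions introduced here: `gap` is the distance between one class's upper limit and the next class's lower limit (1 cm for these data), so each class boundary lies half a gap outside its limits.

```python
def grouped_median(limits, freqs, gap=1.0):
    """Estimate the median from (lower, upper) class limits and frequencies."""
    n = sum(freqs)
    cum = 0                                        # cumulative frequency before the class
    for (lo, hi), f in zip(limits, freqs):
        if cum + f >= n / 2:                       # first class reaching n/2 is the median class
            lower_boundary = lo - gap / 2
            width = (hi + gap / 2) - lower_boundary
            return lower_boundary + (n / 2 - cum) / f * width
        cum += f

limits = [(4, 5), (6, 7), (8, 9), (10, 11), (12, 13), (14, 15)]
freqs = [2, 6, 14, 31, 30, 7]
leaf_median = grouped_median(limits, freqs)
print(round(leaf_median, 2))
```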

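Mode

$$\text{Mode} = L_B + \left(\frac{d_1}{d_1 + d_2}\right) C$$

$L_B$ = lower class boundary of the mode class
$d_1$ = the difference between the mode class frequency and the frequency of the class before it
$d_2$ = the difference between the mode class frequency and the frequency of the class after it
$C$ = class width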

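Example 3

The table below shows the distribution of the heights of 30 plants of type B which have been planted for 6 weeks. These heights are measured to the nearest cm. Estimate the mode of this distribution.

Heights (cm) | 3 – 5 | 6 – 8 | 9 – 11 | 12 – 14 | 15 – 17 | 18 – 20
f            | 1     | 2     | 11     | 10      | 5       | 1

The mode class is 9 – 11, the class with the highest frequency.

Solution 3

$L_B = 8.5$, $d_1 = 11 - 2 = 9$, $d_2 = 11 - 10 = 1$, $C = 3$

$$\text{Mode} = L_B + \left(\frac{d_1}{d_1 + d_2}\right) C = 8.5 + \left(\frac{9}{9 + 1}\right)(3) = 11.2 \text{ cm}$$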
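The grouped-mode estimate for the plant heights can be sketched in Python. The function name `grouped_mode` and the `gap` parameter (distance between adjacent class limits, 1 cm here) are assumptions for illustration; a missing neighbouring class is treated as having frequency 0, and ties for the highest frequency are resolved by taking the first such class.

```python
def grouped_mode(limits, freqs, gap=1.0):
    """Estimate the mode as L_B + d1 / (d1 + d2) * C."""
    k = freqs.index(max(freqs))                    # mode class (first one if tied)
    before = freqs[k - 1] if k > 0 else 0
    after = freqs[k + 1] if k < len(freqs) - 1 else 0
    d1 = freqs[k] - before
    d2 = freqs[k] - after
    lo, hi = limits[k]
    lower_boundary = lo - gap / 2
    width = (hi + gap / 2) - lower_boundary
    return lower_boundary + d1 / (d1 + d2) * width

limits = [(3, 5), (6, 8), (9, 11), (12, 14), (15, 17), (18, 20)]
freqs = [1, 2, 11, 10, 5, 1]
height_mode = grouped_mode(limits, freqs)
print(round(height_mode, 1))
```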

Quartiles

$$Q_k = L_k + \left(\frac{\frac{kn}{4} - F_{k-1}}{f_k}\right) C_k, \quad k = 1, 2, 3$$

$L_k$ = lower class boundary of the class containing the quartile
$F_{k-1}$ = cumulative frequency before the class containing the quartile
$n$ = the number of data
$f_k$ = frequency of the class containing the quartile
$C_k$ = class width of the class containing the quartile

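Example 4

The table shows the marks of 250 pre-university students in an examination.

Marks           | 0-9 | 10-19 | 20-29 | 30-39 | 40-49 | 50-59 | 60-69 | 70-79
No. of students | 15  | 20    | 25    | 24    | 12    | 31    | 71    | 52

Estimate the:
a) First quartile
b) Third quartile

Solution 4(a)

The cumulative frequencies are 15, 35, 60, 84, 96, 127, 198, 250. Since n/4 = 62.5, $Q_1$ lies in the class 30-39, with $L_1 = 29.5$, $F = 60$, $f = 24$ and $C = 10$.

$$Q_1 = 29.5 + \left(\frac{62.5 - 60}{24}\right)(10) \approx 30.54$$

Solution 4(b)

Since 3n/4 = 187.5, $Q_3$ lies in the class 60-69, with $L_3 = 59.5$, $F = 127$, $f = 71$ and $C = 10$.

$$Q_3 = 59.5 + \left(\frac{187.5 - 127}{71}\right)(10) \approx 68.02$$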
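The quartile estimates for the examination marks can be checked with the Python sketch below. The name `grouped_quartile` and the `gap` parameter (distance between adjacent class limits, 1 mark here) are hypothetical; the code locates the class where the cumulative frequency first reaches kn/4 and interpolates inside it.

```python
def grouped_quartile(limits, freqs, k, gap=1.0):
    """Estimate Q_k (k = 1, 2, 3) from class limits and frequencies."""
    n = sum(freqs)
    target = k * n / 4
    cum = 0                                        # cumulative frequency before the class
    for (lo, hi), f in zip(limits, freqs):
        if cum + f >= target:
            lower_boundary = lo - gap / 2
            width = (hi + gap / 2) - lower_boundary
            return lower_boundary + (target - cum) / f * width
        cum += f

limits = [(lo, lo + 9) for lo in range(0, 80, 10)]   # 0-9, 10-19, ..., 70-79
freqs = [15, 20, 25, 24, 12, 31, 71, 52]
q1 = grouped_quartile(limits, freqs, 1)
q3 = grouped_quartile(limits, freqs, 3)
print(round(q1, 2), round(q3, 2))
```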

Percentiles

Percentiles divide the data set into 100 equal parts.

The kth percentile can be obtained by the formula below:

$$P_k = L_k + \left(\frac{\frac{kn}{100} - F_{k-1}}{f_k}\right) C_k, \quad k = 1, 2, 3, \dots, 99$$

where
$L_k$ = lower class boundary of the class containing the percentile
$F_{k-1}$ = cumulative frequency BEFORE the class containing the percentile
$n$ = the number of data
$f_k$ = frequency of the class containing the percentile
$C_k$ = class width of the class containing the percentile

NOTES!!

1. The 25th percentile is called the 1st quartile: $P_{25} = Q_1$.
2. The median is the 50th percentile and is also called the second quartile: $P_{50} = Q_2$.
3. The 75th percentile is called the 3rd quartile: $P_{75} = Q_3$.

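Example 5

The following table shows the weekly pocket money of 50 students in a secondary school.

Pocket money (RM) | f
20 < x < 25       | 10
25 < x < 30       | 15
30 < x < 35       | 16
35 < x < 40       | 5
40 < x < 45       | 4

Find the 40th and 90th percentiles respectively.

Solution 5

Pocket money (RM) | f  | Cumulative frequency
20 < x < 25       | 10 | 10
25 < x < 30       | 15 | 25
30 < x < 35       | 16 | 41
35 < x < 40       | 5  | 46
40 < x < 45       | 4  | 50

For $P_{40}$: 40(50)/100 = 20, so the percentile lies in the class 25 < x < 30.

$$P_{40} = 25 + \left(\frac{20 - 10}{15}\right)(5) \approx 28.33$$

For $P_{90}$: 90(50)/100 = 45, so the percentile lies in the class 35 < x < 40.

$$P_{90} = 35 + \left(\frac{45 - 41}{5}\right)(5) = 39$$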
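The percentile formula can be sketched in Python for the pocket-money table. Here the classes are already given by their boundaries, so no half-unit adjustment is needed; the function name `grouped_percentile` is illustrative and not from the slides.

```python
def grouped_percentile(boundaries, freqs, k):
    """Estimate P_k (k = 1..99) from (lower, upper) class boundaries and frequencies."""
    n = sum(freqs)
    target = k * n / 100
    cum = 0                                        # cumulative frequency BEFORE the class
    for (lo, hi), f in zip(boundaries, freqs):
        if cum + f >= target:
            return lo + (target - cum) / f * (hi - lo)
        cum += f

boundaries = [(20, 25), (25, 30), (30, 35), (35, 40), (40, 45)]
freqs = [10, 15, 16, 5, 4]
p40 = grouped_percentile(boundaries, freqs, 40)
p90 = grouped_percentile(boundaries, freqs, 90)
print(round(p40, 2), round(p90, 2))
```

Calling the same function with k = 25, 50 and 75 gives the three quartiles.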

Box and Whisker Plots

A box plot summarizes data using the median, the quartiles, and the extreme (least and greatest) values. It is used to provide a graphical display of the center and variation of a data set.

Construction of Box and Whisker Plots

Step 1: Arrange the data in order from least to greatest.

Step 2: Find the median, the quartiles, and the extreme (least and greatest) values.

Step 3: Connect the quartiles to each other to make a box, and then connect the box to the minimum and maximum with lines.

Example 1

Draw a box-and-whisker plot for the following set of data.

3, 5, 4, 2, 1, 6, 8, 11, 14, 13, 6, 9, 10, 7

Solution 1

Step 1: Arrangement of data

Arrange the numbers from the least to the greatest:
1, 2, 3, 4, 5, 6, 6, 7, 8, 9, 10, 11, 13, 14

Step 2: Find the median, quartile 1 and quartile 3

• Find the median from the ordered list: cross off one number from each side until you reach the middle number (or numbers).
• If there are two numbers in the middle, add those 2 middle numbers together: 6 + 7 = 13
• Then divide by 2: 13 ÷ 2 = 6.5
• The median is 6.5.
• Then split the numbers into the left and right sides of the median:
  1, 2, 3, 4, 5, 6, 6 │ 7, 8, 9, 10, 11, 13, 14
• Find the median of each half: left median = 4, right median = 10.
• The left median is called the LOWER QUARTILE, $Q_1$.
• The right median is called the UPPER QUARTILE, $Q_3$.

Step 3: Connect the quartiles to each other to make a box, and then connect the box to the minimum and maximum with lines.

• Draw a number line from the smallest to the largest number without skipping any numbers:
  1 2 3 4 5 6 7 8 9 10 11 12 13 14
• Put circles at the LOWER and UPPER quartiles (4 and 10).
• Draw a box connecting the circles at the LOWER and UPPER quartiles.
• Put a circle at the median (6.5).
• Draw a line at the median across the box.
• Put circles at the high and low points (14 and 1).
• Draw lines that connect the high and low points to the box.

[Figure: completed box and whisker plot on a number line from 1 to 14, with minimum 1, $Q_1$ = 4, median 6.5, $Q_3$ = 10 and maximum 14.]
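The five values needed for the plot (minimum, lower quartile, median, upper quartile, maximum) can be computed with the Python sketch below. It follows the median-of-halves method used in this example; the function names are illustrative, and leaving the middle value out of both halves when n is odd is an assumption, since the example only covers an even n.

```python
def median(ordered):
    """Median of an already ordered list."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

def five_number_summary(values):
    ordered = sorted(values)                       # Step 1: arrange the data
    n = len(ordered)
    lower_half = ordered[: n // 2]                 # values to the left of the median
    upper_half = ordered[(n + 1) // 2 :]           # values to the right of the median
    return (ordered[0], median(lower_half), median(ordered),
            median(upper_half), ordered[-1])

data = [3, 5, 4, 2, 1, 6, 8, 11, 14, 13, 6, 9, 10, 7]
summary = five_number_summary(data)
print(summary)
```

The box spans the second and fourth values, the line inside it sits at the third, and the whiskers reach out to the first and last.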

Symmetry and Skewness

1. Symmetrical distribution: The whiskers are the same length and the median is at the centre of the box.

[Figure: box plot with $Q_1$, $Q_2$ and $Q_3$ marked, symmetrical case]

2. Positively skewed distribution: The left whisker is shorter than the right whisker and the median is nearer to $Q_1$.

[Figure: box plot with $Q_1$, $Q_2$ and $Q_3$ marked, positively skewed case]

3. Negatively skewed distribution: The left whisker is longer than the right whisker and the median is nearer to $Q_3$.

[Figure: box plot with $Q_1$, $Q_2$ and $Q_3$ marked, negatively skewed case]

6.3 – Measures of Dispersion

Learning outcomes: At the end of this topic, students should be able to:
a) Find and interpret variance and standard deviation for ungrouped data.
b) Find and interpret variance and standard deviation for grouped data.
c) Find and interpret Pearson's Coefficient of Skewness.

Data
• UNGROUPED DATA: variance and standard deviation
• GROUPED DATA: variance and standard deviation

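Variance and standard deviation for ungrouped data

For ungrouped data:

$$\text{Mean, } \bar{x} = \frac{\sum x}{n}$$

$$\text{Variance, } s^2 = \frac{\sum x^2 - \frac{\left(\sum x\right)^2}{n}}{n - 1}$$

$$\text{Standard deviation, } s = \sqrt{s^2}$$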

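Example 1

Find the mean, variance and standard deviation for the data below.

2, 7, 10, 9, 2, 5, 16

Solution 1

Here $n = 7$, $\sum x = 51$ and $\sum x^2 = 519$.

$$\bar{x} = \frac{51}{7} \approx 7.29$$

$$s^2 = \frac{519 - \frac{51^2}{7}}{7 - 1} \approx 24.57$$

$$s = \sqrt{24.57} \approx 4.96$$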
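The variance formula with the n - 1 divisor can be checked in Python. This is a minimal sketch; the function name `sample_variance` is chosen here for illustration.

```python
from math import sqrt

def sample_variance(values):
    """s^2 = (sum of x^2 - (sum of x)^2 / n) / (n - 1)."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)

data = [2, 7, 10, 9, 2, 5, 16]
variance = sample_variance(data)
sd = sqrt(variance)
print(round(sum(data) / len(data), 2), round(variance, 2), round(sd, 2))
```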

Exercise

Find the mean and standard deviation of the set of numbers 5, 2, 3, 8, 6.

Answer:
Mean = 4.8
Standard deviation = 2.39

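Example 2

A set of numbers {1, 6, 3, 2, 8, 5, x, y} has a mean of 4 and a variance of 36/7. Show that x + y = 7 and hence find the values of x and y.

Solution 2

From the mean: (1 + 6 + 3 + 2 + 8 + 5 + x + y)/8 = 4, so 25 + x + y = 32 and x + y = 7.

From the variance, with $n = 8$ and $\sum x = 32$:

$$\frac{\sum x^2 - \frac{32^2}{8}}{7} = \frac{36}{7} \quad\Rightarrow\quad \sum x^2 = 164$$

Since 1 + 36 + 9 + 4 + 64 + 25 = 139, this gives $x^2 + y^2 = 25$. Substituting y = 7 - x:

$$x^2 + (7 - x)^2 = 25 \quad\Rightarrow\quad x^2 - 7x + 12 = 0 \quad\Rightarrow\quad x = 3 \text{ or } x = 4$$

Hence x = 3, y = 4 or x = 4, y = 3.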

Variance and standard deviation for grouped data

$$\text{Variance, } s^2 = \frac{\sum fx^2 - \frac{\left(\sum fx\right)^2}{n}}{n - 1}$$

with x = class midpoint, f = frequency and $n = \sum f$

$$\text{Standard deviation, } s = \sqrt{s^2}$$

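Example 3

Find the mean, variance and standard deviation for the data below.

Marks        | f
0 ≤ x < 20   | 9
20 ≤ x < 40  | 29
40 ≤ x < 60  | 42
60 ≤ x < 80  | 26
80 ≤ x < 100 | 14

Solution 3

Marks        | f   | Midpoint, x | fx   | fx²
0 ≤ x < 20   | 9   | 10          | 90   | 900
20 ≤ x < 40  | 29  | 30          | 870  | 26100
40 ≤ x < 60  | 42  | 50          | 2100 | 105000
60 ≤ x < 80  | 26  | 70          | 1820 | 127400
80 ≤ x < 100 | 14  | 90          | 1260 | 113400
Total        | 120 |             | 6140 | 372800

$$\bar{x} = \frac{6140}{120} \approx 51.17$$

$$s^2 = \frac{372800 - \frac{6140^2}{120}}{119} \approx 492.75$$

$$s = \sqrt{492.75} \approx 22.20$$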
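The grouped variance for the marks table can be reproduced with the Python sketch below. Variable names are illustrative; the midpoints are those of the classes 0 to 20, 20 to 40, and so on.

```python
from math import sqrt

midpoints = [10, 30, 50, 70, 90]
freqs = [9, 29, 42, 26, 14]

n = sum(freqs)
sum_fx = sum(f * x for f, x in zip(freqs, midpoints))
sum_fx2 = sum(f * x * x for f, x in zip(freqs, midpoints))

mean = sum_fx / n
variance = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)   # divisor n - 1, as in the formula
sd = sqrt(variance)
print(round(mean, 2), round(variance, 2), round(sd, 2))
```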

Pearson Coefficient of Skewness

The Pearson coefficient of skewness, written here as $S_k$, provides a numerical measure of the skewness of a distribution.

$$S_k = \frac{3(\text{mean} - \text{median})}{\text{standard deviation}} \quad \text{OR} \quad S_k = \frac{\text{mean} - \text{mode}}{\text{standard deviation}}$$

The two forms are equal when 3(mean - median) = mean - mode.

Note:
• If $S_k > 0$, the distribution is positively skewed.
• If $S_k < 0$, the distribution is negatively skewed.

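Example 4

Find the Pearson's coefficient of skewness.

1.2, 1.5, 1.9, 2.4, 2.4, 2.5, 2.6, 3.0, 3.5, 3.8

Solution 4

Here $n = 10$, $\sum x = 24.8$ and $\sum x^2 = 67.52$.

mean = 24.8/10 = 2.48
median = (2.4 + 2.5)/2 = 2.45

$$s^2 = \frac{67.52 - \frac{24.8^2}{10}}{9} \approx 0.6684, \qquad s \approx 0.8176$$

$$\frac{3(\text{mean} - \text{median})}{\text{standard deviation}} = \frac{3(2.48 - 2.45)}{0.8176} \approx 0.110$$

The coefficient is positive, so the distribution is positively skewed.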
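Pearson's coefficient in its median form, 3(mean - median)/standard deviation, can be computed for the data of Example 4 with the Python sketch below. The function name `pearson_skewness` is hypothetical, and the standard deviation uses the n - 1 divisor.

```python
from math import sqrt

def pearson_skewness(values):
    """3 * (mean - median) / s, with s the sample standard deviation."""
    ordered = sorted(values)
    n = len(ordered)
    mean = sum(ordered) / n
    mid = n // 2
    median = ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2
    variance = (sum(x * x for x in ordered) - sum(ordered) ** 2 / n) / (n - 1)
    return 3 * (mean - median) / sqrt(variance)

data = [1.2, 1.5, 1.9, 2.4, 2.4, 2.5, 2.6, 3.0, 3.5, 3.8]
sk = pearson_skewness(data)
print(round(sk, 3))
```

A positive result indicates a positively skewed distribution and a negative result a negatively skewed one.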

Example 5

In an exam, the marks of 120 students are summarized as below.

Σfx = 3108, Mode = 27.6

Find the mean, standard deviation and Pearson's coefficient of skewness for the distribution and interpret the result.

Solution 5

Exercise

The marks for 400 KMM students in the first quiz are given below.

Marks              | 0-9 | 10-19 | 20-29 | 30-39 | 40-49 | 50-59 | 60-69 | 70-79 | 80-89
Number of students | 44  | 56    | 64    | 78    | 60    | 40    | 36    | 18    | 4

Estimate the mean, median and standard deviation for the above sample. By calculating Pearson's coefficient of skewness, state the type of distribution for the above data.

Answers:
mean = 35.3
median = 34.1
standard deviation = 20.1
Pearson's coefficient of skewness = 0.179 (skewed to the right)