Transcript Document

Lesson 1 - 2 Describing Distributions with Numbers

parts from Mr. Molesky’s Statmonkey website

Knowledge Objectives

What is meant by a resistant measure?

Two reasons why we use squared deviations rather than just average deviations from the mean

What is meant by degrees of freedom?

Construction Objectives

Identify situations in which the mean is the most appropriate measure of center and situations in which the median is the most appropriate measure

Given a data set:

– Find the quartiles
– Find the five-number summary
– Compute the mean and median as measures of center
– Compute the interquartile range (IQR)
– Use the 1.5 × IQR rule to identify outliers
– Compute the standard deviation and variance as measures of spread

Construction Objectives cont

Identify situations in which the standard deviation is the most appropriate measure of spread and situations in which the interquartile range is the most appropriate measure

Explain the effect of a linear transformation of a data set on the mean, median, and standard deviation of the set

Use numerical and graphical techniques to compare two or more data sets

Vocabulary

• Mean – the average value
• Median – the middle value (in an ordered list)
• Resistant measure – a measure (statistic or parameter) that is not sensitive to the influence of extreme observations
• Mode – the most frequent data value
• Range – the difference between the largest and smallest observations
• Pth percentile – P percent of the observations (in an ordered list) fall at or below this number
• Quartile – multiples of the 25th percentile (Q1 – 25th; Q2 – 50th or median; Q3 – 75th)
• Five-number summary – the minimum, Q1, median, Q3, maximum

Vocabulary cont

• Boxplot – graphs the five-number summary and any outliers
• Interquartile range (IQR) – IQR = Q3 – Q1
• Outlier – a data value that lies outside the interval [Q1 – 1.5 × IQR, Q3 + 1.5 × IQR]
• Variance – the average of the squares of the deviations from the mean
• Standard deviation – the square root of the variance
• Degrees of freedom – the number of independent pieces of information that are included in your measurement
• Linear transformation – changes the data in the form x_new = a + bx
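The variance, standard deviation, and linear transformation entries can be illustrated with a short Python sketch. The data set and the constants a and b are hypothetical, chosen only to keep the arithmetic simple. The sketch assumes the sample convention, in which the sum of squared deviations is divided by n − 1 (the degrees of freedom) rather than n.

```python
import math
import statistics

# Hypothetical data, chosen only so the arithmetic is easy to follow.
data = [1, 2, 3, 4, 5]

mean = statistics.mean(data)                      # 3
squared_devs = [(x - mean) ** 2 for x in data]    # 4, 1, 0, 1, 4

# Sample variance (assumed convention): divide by n - 1, not n.
variance = sum(squared_devs) / (len(data) - 1)
std_dev = math.sqrt(variance)

# Linear transformation x_new = a + b*x, with hypothetical a and b.
a, b = 10, 2
new_data = [a + b * x for x in data]
new_mean = statistics.mean(new_data)
new_median = statistics.median(new_data)
new_std_dev = statistics.stdev(new_data)

print("variance:", variance, "standard deviation:", std_dev)
print("after transformation:", new_mean, new_median, new_std_dev)
```

In this sketch the mean and median both move to a + b times their old values, while the standard deviation is only multiplied by |b|; adding the constant a shifts the data without changing its spread.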

Measures of Center

Numerical descriptions of a distribution begin with a measure of its “center.” If you could summarize the data with one number, what would it be?

Mean: $\bar{x} = \dfrac{x_1 + x_2 + \cdots + x_n}{n} = \dfrac{\sum x_i}{n}$

Median: the “middle” value of an ordered data set. Arrange the observations in order from minimum to maximum.
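The two definitions above can be sketched in a few lines of Python. The function names and both data sets are hypothetical. The sketch assumes the usual convention that, when the number of observations is even, the median is the average of the two middle values.

```python
def mean(values):
    """Sum of the observations divided by the number of observations."""
    return sum(values) / len(values)


def median(values):
    """Middle value of the ordered data.

    Assumed convention: with an even number of observations,
    average the two middle values.
    """
    ordered = sorted(values)          # arrange from minimum to maximum
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_data = [7, 3, 9, 1, 5]            # hypothetical, n = 5
even_data = [7, 3, 9, 1, 5, 11]       # hypothetical, n = 6

print(mean(odd_data), median(odd_data))
print(mean(even_data), median(even_data))
```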

Mean vs Median

The mean and the median are the most common measures of center.

If a distribution is perfectly symmetric, the mean and the median are the same.

The mean is not resistant to outliers.

The mode, the data value that occurs the most often, is a common measure of center for categorical data.

You must decide which number is the most appropriate description of the center...

Mean Median Applet: http://bcs.whfreeman.com/tps3e/content/cat_020/applets/meanmedian.html

Use the mean on symmetric data and the median on skewed data or data with outliers
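A small Python sketch can show what "not resistant" means in practice. The values below are hypothetical: the second list is the first with its largest value replaced by an extreme one.

```python
import statistics

# Hypothetical data: five observations clustered around 51.
original = [48, 50, 51, 52, 54]

# Same data, but the largest value is replaced by an extreme observation.
with_outlier = [48, 50, 51, 52, 540]

mean_before = statistics.mean(original)
mean_after = statistics.mean(with_outlier)
median_before = statistics.median(original)
median_after = statistics.median(with_outlier)

print("mean:  ", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
```

The single extreme value drags the mean far above every other observation, while the median does not move.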

Distributions Parameters

[Figure: left-skewed distribution with the mean, median, and mode marked]

Skewed left (tail to the left): Mean < Median < Mode

Mean substantially smaller than median (the tail pulls the mean toward it)

Distributions Parameters

[Figure: symmetric distribution with the mean, median, and mode marked]

Symmetric: Mean ≈ Median ≈ Mode

Mean roughly equal to median

Distributions Parameters

[Figure: right-skewed distribution with the mean, median, and mode marked]

Skewed right (tail to the right): Mean > Median > Mode

Mean substantially greater than median (the tail pulls the mean toward it)
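The orderings on these three slides can be checked on a small example. The Python sketch below uses a hypothetical right-skewed data set and its mirror image (each value subtracted from 18) as the left-skewed case. The ordering is a rule of thumb for typical skewed distributions, not a guarantee for every data set.

```python
import statistics

# Hypothetical right-skewed data: most values are small, with a long right tail.
right_skewed = [1, 2, 2, 2, 3, 4, 5, 9, 17]

# Mirror image of the same data (18 - x), which is skewed left.
left_skewed = [18 - x for x in right_skewed]


def center_summary(values):
    """Return (mean, median, mode) for a data set with a single mode."""
    return (
        statistics.mean(values),
        statistics.median(values),
        statistics.mode(values),
    )


right_mean, right_median, right_mode = center_summary(right_skewed)
left_mean, left_median, left_mode = center_summary(left_skewed)

print("skewed right:", right_mean, right_median, right_mode)
print("skewed left: ", left_mean, left_median, left_mode)
```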

Central Measures Comparisons

Measure of Central Tendency: Mean
– Computation: μ = (∑x_i) / N (population); x̄ = (∑x_i) / n (sample)
– Interpretation: Center of gravity
– When to use: Data are quantitative and frequency distribution is roughly symmetric

Measure of Central Tendency: Median
– Computation: Arrange data in ascending order and divide the data set into half
– Interpretation: Divides into bottom 50% and top 50%
– When to use: Data are quantitative and frequency distribution is skewed

Measure of Central Tendency: Mode
– Computation: Tally data to determine most frequent observation
– Interpretation: Most frequent observation
– When to use: Data are categorical or the most frequent observation is the desired measure of central tendency

Example 1

Which of the following measures of central tendency are resistant?

1. Mean: Not resistant

2. Median: Resistant

3. Mode: Resistant

Example 2

Given the following set of data: 70, 56, 48, 48, 53, 52, 66, 48, 36, 49, 28, 35, 58, 62, 45, 60, 38, 73, 45, 51, 56, 51, 46, 39, 56, 32, 44, 60, 51, 44, 63, 50, 46, 69, 53, 70, 33, 54, 55, 52

What is the mean? 51.125

What is the median? 51

What is the mode? 48, 51, 56

What is the shape of the distribution? Symmetric (tri-modal)
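The answers to Example 2 can be checked with Python's standard `statistics` module. This is only one way to verify the arithmetic; `statistics.multimode` needs Python 3.8 or later, and its result is sorted here because it lists the modes in order of first appearance.

```python
import statistics

data = [70, 56, 48, 48, 53, 52, 66, 48, 36, 49,
        28, 35, 58, 62, 45, 60, 38, 73, 45, 51,
        56, 51, 46, 39, 56, 32, 44, 60, 51, 44,
        63, 50, 46, 69, 53, 70, 33, 54, 55, 52]

mean = statistics.mean(data)
median = statistics.median(data)           # n = 40, so the two middle values are averaged
modes = sorted(statistics.multimode(data)) # every value tied for most frequent

print("n =", len(data))
print("mean =", mean)
print("median =", median)
print("modes =", modes)
```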

Example 3

Given the following types of data and sample sizes, list the measure of central tendency you would use and explain why.

Variable              Sample of 50    Sample of 200
Hair color            mode            mode
Height                mean            mean
Weight                mean            mean
Parent’s income       median          median
Number of siblings    mean            mean
Age                   mean            mean

Does sample size affect your decision?

Not in this case, but a larger sample size might allow us to use the mean instead of the median.

Sample Data

Consider the following test scores for a small class: 75 76 82 93 45 68 74 82 91 98

Plot the data and describe the SOCS:

Shape?

Outliers?

Center?

Spread?

What number best describes the “center”?

What number best describes the “spread”?
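One way to explore these questions numerically is to compute the five-number summary and apply the 1.5 × IQR rule to the test scores. The Python sketch below assumes a common textbook convention in which Q1 and Q3 are the medians of the lower and upper halves of the ordered data. Calculators and software may use other quartile conventions and give slightly different values.

```python
import statistics

scores = [75, 76, 82, 93, 45, 68, 74, 82, 91, 98]
ordered = sorted(scores)
n = len(ordered)

# Assumed quartile convention: Q1 and Q3 are the medians of the lower and
# upper halves of the ordered data (the middle value is left out when n is odd).
lower_half = ordered[: n // 2]
upper_half = ordered[(n + 1) // 2:]

q1 = statistics.median(lower_half)
med = statistics.median(ordered)
q3 = statistics.median(upper_half)
iqr = q3 - q1

# 1.5 x IQR rule: values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] are flagged.
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
outliers = [x for x in ordered if x < low_fence or x > high_fence]

five_number_summary = (ordered[0], q1, med, q3, ordered[-1])

print("five-number summary:", five_number_summary)
print("IQR:", iqr, "fences:", low_fence, high_fence)
print("outliers:", outliers)
print("mean:", statistics.mean(scores), "median:", med)
```

Under this convention the score of 45 falls below the lower fence, so the median and IQR are reasonable candidates for describing the center and spread of this class.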

Day 1 Summary and Homework

Summary

Three characteristics must be used to describe distributions (from histograms or similar charts):

– Shape (uniform, symmetric, bi-modal, etc.)
– Center (mean, median, mode measures)
– Spread (variance – next lesson)

Median is resistant to outliers; mean is not!

• Use Mean for symmetric data
• Use Median for skewed data (or data with outliers)
• Use Mode for categorical data

Homework

pg 74 – 75: problems 27-31