Stats Review - Orem High School

Stats Review
Chapter 1
1.1 Displaying Distributions
Definitions:
Individuals – objects described by a set of data
Variable – any characteristic of an individual
Categorical – places an individual into a
group
Quantitative – numerical data about the
individual.
Examining a Distribution
• Look for the overall pattern and for deviations
from that pattern
• Describe using shape (symmetric, skewed),
center (median, mode, mean) and spread
(variation, standard deviation, IQR)
• Look for outliers and skewness
Stemplots:
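1. Separate each observation into a stem (all but
the final digit) and a leaf (the final digit)
2. Write the stems in a vertical column, smallest
at the top
3. Write each leaf as a single digit in the row to
the right of its stem, in increasing order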
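The stemplot steps can be sketched in Python. This is a minimal illustration, assuming non-negative whole-number data and using the final digit as the leaf; the function name `stemplot` and the scores are made up for the example.

```python
from collections import defaultdict

def stemplot(values):
    """Return stemplot rows for non-negative integers (leaf = final digit)."""
    leaves = defaultdict(list)
    for v in sorted(values):          # sorting puts leaves in increasing order
        stem, leaf = divmod(v, 10)    # stem = all but the final digit
        leaves[stem].append(leaf)
    rows = []
    # List every stem between the smallest and largest, even if it has no leaves.
    for stem in range(min(leaves), max(leaves) + 1):
        digits = "".join(str(d) for d in leaves.get(stem, []))
        rows.append(f"{stem:>2} | {digits}".rstrip())
    return rows

scores = [12, 15, 21, 23, 23, 38, 40, 47]  # made-up data
print("\n".join(stemplot(scores)))
```

Stems with no leaves are kept so that gaps in the distribution stay visible.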
Histograms:
1. Separate the data into different classes of
equal width
2. Write the classes along the horizontal axis
3. Mark the frequency (count) or relative
frequency (percentage) along the vertical axis
4. Draw a bar for each class, as tall as its count
or percentage, with no space between bars
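A short Python sketch of the histogram steps, printing bars as text. The helper `class_counts`, the class width of 5 and the data are assumptions chosen only for illustration.

```python
def class_counts(values, start, width, n_classes):
    """Count how many values fall in each class [start + k*width, start + (k+1)*width)."""
    counts = [0] * n_classes
    for v in values:
        k = int((v - start) // width)  # index of the class containing v
        if 0 <= k < n_classes:
            counts[k] += 1
    return counts

data = [2, 3, 7, 8, 8, 11, 14, 19]  # made-up observations
counts = class_counts(data, start=0, width=5, n_classes=4)
percents = [100 * c / len(data) for c in counts]

# Text version of the histogram: one bar of '#' per class.
for k, (c, p) in enumerate(zip(counts, percents)):
    low = 5 * k
    print(f"{low:>2} to <{low + 5:<2} | {'#' * c} ({c}, {p:.1f}%)")
```

Each class includes its lower endpoint and excludes its upper endpoint, so every observation lands in exactly one class.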
Time plots:
1. Write the time or order along the horizontal
axis
2. Mark the scale of the measured variable along
the vertical axis
3. Plot each observation's value in the order
it occurred
1.2 Number Summaries
Measures of Center:
Mean – Average. Susceptible to influence by
outliers and skewness
Median – The middle value (the average of the
middle two if n is even). Not greatly affected
by outliers.
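A quick Python check of how one outlier moves the mean but not the median. The data are made up for illustration.

```python
from statistics import mean, median

data = [2, 3, 4, 5, 6]            # made-up observations
with_outlier = [2, 3, 4, 5, 60]   # same data with the largest value replaced by an outlier

print("mean:  ", mean(data), "->", mean(with_outlier))
print("median:", median(data), "->", median(with_outlier))
```

The mean jumps from 4 to 14.8, while the median stays at 4.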
Quartiles:
1. Arrange the data in increasing order and
locate the median M.
2. The first quartile Q1 is the median of the
observations that lie below the median M.
3. The third quartile Q3 is the median of the
observations that lie above the median M.
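The quartile steps can be sketched in Python as follows. The function name `quartiles` is an assumption for this example; when n is odd the median itself is left out of both halves. Calculators and software often use slightly different quartile rules, so results may not match exactly.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, M, Q3) using medians of the lower and upper halves."""
    xs = sorted(values)
    n = len(xs)
    half = n // 2
    lower = xs[:half]              # observations below the median's position
    upper = xs[half + n % 2:]      # for odd n, skip the median itself
    return median(lower), median(xs), median(upper)

print(quartiles([1, 2, 3, 4, 5, 6, 7]))      # odd n
print(quartiles([1, 2, 3, 4, 5, 6, 7, 8]))   # even n
```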
Interquartile Range (IQR)
The interquartile range is the distance between
the first and third quartiles.
IQR=Q3 – Q1
1.5 x IQR Criterion for Outliers
Call an observation a suspected outlier if it falls
more than 1.5 x IQR above Q3 or below Q1.
• Observations below Q1 – (1.5 x IQR)
• Observations above Q3 + (1.5 x IQR)
are considered possible outliers
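A small Python sketch of the 1.5 x IQR criterion. The function names are hypothetical, the data are made up, and Q1 and Q3 are passed in already computed (here 2.5 and 6.5, taken as the medians of the lower and upper halves).

```python
def outlier_fences(q1, q3):
    """Return the (lower, upper) cutoffs of the 1.5 x IQR criterion."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def suspected_outliers(values, q1, q3):
    """Return the observations that fall outside the fences."""
    low, high = outlier_fences(q1, q3)
    return [v for v in values if v < low or v > high]

data = [1, 2, 3, 4, 5, 6, 7, 30]   # made-up observations
q1, q3 = 2.5, 6.5                  # quartiles of the data above

print(outlier_fences(q1, q3))
print(suspected_outliers(data, q1, q3))
```

Here IQR = 4, so the cutoffs are -3.5 and 12.5, and only 30 is flagged.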
5 Number Summaries
Minimum, Q1, Median, Q3, Maximum
Boxplots
[Figure: modified boxplot, labeled from top to bottom: suspected outliers (1.5 x IQR rule), maximum non-outlier, third quartile, median, first quartile, minimum non-outlier]
Variance
The average of the squares of the differences between
the observations and their mean, using n - 1 as the divisor.
FORMULA: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$
Standard Deviation s
The square root of the variance.
FORMULA: $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$
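A worked Python example of the variance and standard deviation. It assumes the sample versions, which divide by n - 1 (the usual convention when the standard deviation is written s); the data and the function name `sample_variance` are made up for illustration, and the result is compared with Python's `statistics` module.

```python
from math import sqrt
from statistics import stdev, variance

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    xbar = sum(values) / n
    return sum((x - xbar) ** 2 for x in values) / (n - 1)

data = [2, 4, 4, 6, 9]   # made-up observations, mean = 5
s2 = sample_variance(data)
s = sqrt(s2)

print(s2, variance(data))   # both use the n - 1 divisor
print(s, stdev(data))
```

The squared deviations are 9, 1, 1, 1 and 16, which sum to 28, so the variance is 28 / 4 = 7 and s is the square root of 7.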
Properties of the Standard Deviation
• s is a measure of spread about the mean
• Use s only when the mean is the measure of center
• s = 0 only when there is no spread, that is, when
all observations have the same value
• s is not resistant: a few outliers can make it
very large
Linear Transformations
• Multiplying each observation by a positive
number b multiplies the mean, median, IQR
and standard deviation by b.
• Adding the same number a to each
observation adds a to the mean and median but
does not change the IQR or standard deviation.
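Both rules can be checked numerically in Python. The data and the values a = 10 and b = 3 are made up, and the helper `iqr` assumes quartiles taken as medians of the lower and upper halves.

```python
from math import isclose
from statistics import mean, median, stdev

def iqr(values):
    """IQR with quartiles taken as medians of the lower and upper halves."""
    xs = sorted(values)
    n = len(xs)
    return median(xs[n // 2 + n % 2:]) - median(xs[:n // 2])

def summary(values):
    """Return (mean, median, IQR, standard deviation)."""
    return mean(values), median(values), iqr(values), stdev(values)

data = [2, 4, 4, 6, 9]   # made-up observations
a, b = 10, 3             # hypothetical shift and scale

original = summary(data)
scaled = summary([b * x for x in data])    # every measure is multiplied by b
shifted = summary([x + a for x in data])   # only mean and median move by a

print("original:", original)
print("scaled:  ", scaled)
print("shifted: ", shifted)
```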
1.3 Normal Distributions
Strategies For Exploring Quantitative Data
1. Always plot your data (usually a stemplot or
histogram).
2. Look for overall pattern and for striking
deviations such as outliers.
3. Calculate a numerical summary to briefly
describe center and spread (five-number summary,
or mean and standard deviation).
Density Curves
• Always on or above the horizontal axis
• Area under the curve always equals one
Skew
• Skew refers to the direction of the long tail, not the bump
• In a skewed distribution, the mean (balance point)
is pulled farther toward the long tail than the median
(which cuts the area in half).
Standard Deviation of Normal Density curves
• Points of inflection on the normal density
curve lie 1 σ away from the mean on each
side
68-95-99.7 Rule
 68% of observations fall with in σ of the mean
μ.
 95% of observations fall with in 2σ of the
mean μ.
 99.7% of observations fall with in 3σ of the
mean μ.
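The three percentages can be checked against the standard normal curve in Python. This sketch uses `statistics.NormalDist` (Python 3.8+); the function name `area_within` is made up for the example.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

def area_within(k):
    """Area under the standard normal curve between -k and +k."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} sigma of the mean: {area_within(k):.4f}")
```

The areas come out to about 0.6827, 0.9545 and 0.9973, which the rule rounds to 68%, 95% and 99.7%.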
Z-score
Normal distributions can be standardized by the
following formula:
$z = \frac{x - \mu}{\sigma}$
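A small Python example of standardizing, where z is the observation minus the mean, divided by the standard deviation. The mean of 64.5 and standard deviation of 2.5 are made-up values, and `NormalDist().cdf` stands in for looking up the area in Table A.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

mu, sigma = 64.5, 2.5        # made-up mean and standard deviation
z = z_score(68, mu, sigma)   # (68 - 64.5) / 2.5
area_below = NormalDist().cdf(z)   # proportion of observations below x

print(f"z = {z:.2f}, proportion below = {area_below:.4f}")
```

An observation of 68 sits 1.4 standard deviations above the mean, with roughly 92% of the distribution below it.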
Normal Quantile Plots
Used to assess the normality of a distribution
1. Arrange the data from smallest to largest. Record what percentile of the
data each value occupies. For example, the smallest observation of a
set of 20 is at the 5% point.
2. Find the z-score from Table A that corresponds to each percentile.
For example, z = -1.645 for the 5% point.
3. Plot each data point x against its corresponding z-score.
• If the plotted points lie close to a straight line, then the distribution is
approximately normal.
• If the pattern bends up at the right, then the distribution is skewed right.
If it bends down on the left, then the distribution is skewed left.
• Outliers appear as points far away from the overall pattern of points.
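The points of a normal quantile plot can be computed in Python. One assumption here differs from the hand method: the i-th smallest of n observations is placed at percentile (i - 0.5)/n rather than i/n, because i/n puts the largest value at 100%, which has no finite z-score. The function name and data are made up, and `NormalDist().inv_cdf` replaces the Table A lookup.

```python
from statistics import NormalDist

def normal_quantile_points(values):
    """Return (z, x) pairs: each sorted observation x with its normal z-score."""
    xs = sorted(values)
    n = len(xs)
    std_normal = NormalDist()
    # Assumed convention: the i-th smallest value sits at percentile (i - 0.5) / n.
    return [(std_normal.inv_cdf((i - 0.5) / n), x)
            for i, x in enumerate(xs, start=1)]

data = [4.1, 5.0, 5.2, 5.9, 6.3, 6.4, 7.0, 7.8]   # made-up observations
for z, x in normal_quantile_points(data):
    print(f"z = {z:6.3f}   x = {x}")
```

Plotting x against z and checking for a roughly straight line is then the visual test for normality.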
Review Exercises:
1.106, 1.114, 1.116, 1.119, 1.123