Numerical Measures - De La Salle University


Fundamental Sampling Distributions and Data Descriptions
ENGSTAT Notes of AM Fillone, De La Salle University-Manila
Population
– the totality of observations with which we are concerned, whether their number be
finite or infinite
- Statisticians use the term to refer to observations relevant to anything of interest, whether it be groups of people, animals, or all possible outcomes from some complicated biological or engineering system
Definition 8.1 A population consists of the totality of the observations with which we
are concerned.
Definition 8.2 A sample is a subset of a population.
Definition 8.3 Let X1, X2, …, Xn be n independent random variables, each having the
same probability distribution f(x). Define X1, X2, …, Xn to be a random sample of size n
from the population f(x) and write its joint probability distribution as
$$f(x_1, x_2, \ldots, x_n) = f(x_1)\, f(x_2) \cdots f(x_n)$$
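A minimal Python sketch of Definition 8.3, assuming (for illustration only) a normal population; the helper names `normal_pdf` and `joint_density` are hypothetical. Because the X_i are independent with the same density f(x), the joint density of the sample is the product of the individual densities.

```python
import math


def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Density f(x) of an assumed normal population with mean mu and s.d. sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def joint_density(sample, pdf) -> float:
    """Joint density of a random sample: f(x1) f(x2) ... f(xn)."""
    return math.prod(pdf(x) for x in sample)


# Two observations at 0 from N(0, 1): (1/sqrt(2*pi))**2 = 1/(2*pi)
print(joint_density([0.0, 0.0], normal_pdf))
```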
Some Important Statistics
Definition 8.4: Any function of the random variables constituting a random sample is
called a statistic.
Definition 8.5: If X1, X2, …, Xn represent a random sample of size n, then the sample mean is defined by the statistic
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$
Definition 8.6: If X1, X2, …, Xn represent a random sample of size n, then the sample variance is defined by the statistic
$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$$
Theorem 8.1: If S² is the variance of a random sample of size n, we may write
$$S^2 = \frac{1}{n(n-1)}\left[n\sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2\right]$$
Definition 8.7: The sample standard deviation, denoted by S, is the positive square root of the sample variance.
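The definitions above, and the computing formula of Theorem 8.1, can be sketched in Python as follows. The function names and the sample data are hypothetical; the point is that both variance formulas give the same S².

```python
import math


def sample_mean(xs):
    """Definition 8.5: xbar = (1/n) * sum(x_i)."""
    return sum(xs) / len(xs)


def sample_variance(xs):
    """Definition 8.6: S^2 = sum((x_i - xbar)^2) / (n - 1)."""
    xbar = sample_mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)


def sample_variance_shortcut(xs):
    """Theorem 8.1: S^2 = (n * sum(x_i^2) - (sum x_i)^2) / (n (n - 1))."""
    n = len(xs)
    return (n * sum(x * x for x in xs) - sum(xs) ** 2) / (n * (n - 1))


def sample_sd(xs):
    """Definition 8.7: S is the positive square root of S^2."""
    return math.sqrt(sample_variance(xs))


data = [3, 4, 5, 6, 6, 7]   # hypothetical sample
print(sample_mean(data), sample_variance(data), sample_variance_shortcut(data), sample_sd(data))
```

The shortcut formula is convenient by hand, but in floating-point arithmetic it can lose precision when the values are large relative to their spread, so the definition-based version is usually preferred in code.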
Other Statistics
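The sample median reflects the central tendency of the sample in such a way that it is uninfluenced by extreme values or outliers.
Given that the observations in a sample are x1, x2, …, xn, arranged in increasing order of magnitude, the sample median is
$$\tilde{x} = \begin{cases} x_{(n+1)/2}, & \text{if } n \text{ is odd},\\ \tfrac{1}{2}\left(x_{n/2} + x_{n/2+1}\right), & \text{if } n \text{ is even}. \end{cases}$$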
Example: Mean, median, mode, and standard deviation
According to ecology writer Jacqueline Killeen, phosphates contained in household
detergents pass right through our sewer systems, causing lakes to turn into swamps that
eventually dry up into deserts. The following data show the amount of phosphates per
load of laundry, in grams, for a random sample of various types of detergents used
according to the prescribed directions:
Laundry Detergent        Phosphates per Load (grams)
A & P Blue Sail          48
Dash                     47
Concentrated All         42
Cold Water All           42
Breeze                   41
Oxydol                   34
Ajax                     31
Sears                    30
Fab                      29
Cold Power               29
Bold                     29
Rinso                    26
For the given phosphate data, find: (a) the mean; (b) the median; (c) the mode; and
(d) the standard deviation.
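Solution:
(a) Mean: $\bar{x} = \frac{48 + 47 + 42 + 42 + 41 + 34 + 31 + 30 + 29 + 29 + 29 + 26}{12} = \frac{428}{12} \approx 35.67$ grams
(b) Arrange the data in increasing order: 26, 29, 29, 29, 30, 31, 34, 41, 42, 42, 47, 48. Since n = 12 is even, the median is the average of the 6th and 7th values:
$\tilde{x} = \frac{1}{2}(31 + 34) = 32.5$ grams
(c) Mode = 29 grams (it occurs three times)
(d) Standard deviation: with $\sum x_i = 428$ and $\sum x_i^2 = 15938$,
$$s^2 = \frac{n\sum x_i^2 - \left(\sum x_i\right)^2}{n(n-1)} = \frac{(12)(15938) - 428^2}{(12)(11)} = \frac{8072}{132} \approx 61.15,$$
so $s \approx 7.82$ grams.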
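The arithmetic for this example can be checked with Python's standard `statistics` module; `statistics.stdev` uses the n - 1 divisor of Definition 8.6, so it matches the sample standard deviation used here.

```python
import statistics

# Phosphates per load (grams) for the 12 detergents in the table above
phosphates = [48, 47, 42, 42, 41, 34, 31, 30, 29, 29, 29, 26]

mean = statistics.mean(phosphates)      # (a) 428 / 12
median = statistics.median(phosphates)  # (b) average of the 6th and 7th ordered values
mode = statistics.mode(phosphates)      # (c) most frequent value
sd = statistics.stdev(phosphates)       # (d) sample standard deviation, divisor n - 1

print(f"mean={mean:.2f} median={median} mode={mode} sd={sd:.2f}")
# mean=35.67 median=32.5 mode=29 sd=7.82
```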
Data Displays and Graphical Methods
Box-and-Whisker Plot or Box Plot
• This plot encloses the interquartile range of the data in a box that has the
median displayed within
• The extremes of the interquartile range are the 75th percentile (upper quartile) and the 25th percentile (lower quartile)
• “Whiskers” extend from the box to show the extreme observations in the sample
• A variation called a box plot can provide the viewer information regarding
which observations may be outliers
• Outliers are observations that are considered to be unusually far from the bulk
of the data
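Table 8.1 is not reproduced in these notes, so the sketch below uses the phosphate data from the earlier example to compute the numbers a box plot is drawn from. Assumptions, not stated in the notes: quartiles are taken as medians of the lower and upper halves of the ordered data (one of several conventions, so software may report slightly different quartiles), and outliers are flagged with the common 1.5 × IQR rule. All function names are hypothetical.

```python
import statistics

phosphates = [48, 47, 42, 42, 41, 34, 31, 30, 29, 29, 29, 26]


def five_number_summary(xs):
    """(minimum, lower quartile, median, upper quartile, maximum); quartiles = medians of the halves."""
    s = sorted(xs)
    n = len(s)
    half = n // 2
    lower = s[:half]               # values below the median position
    upper = s[half + n % 2:]       # values above it (the middle value is excluded when n is odd)
    return s[0], statistics.median(lower), statistics.median(s), statistics.median(upper), s[-1]


def outlier_fences(xs, k=1.5):
    """Assumed rule: points more than k * IQR beyond the box are flagged (k = 1.5 is a common choice)."""
    _, q1, _, q3, _ = five_number_summary(xs)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def outliers(xs, k=1.5):
    lo, hi = outlier_fences(xs, k)
    return [x for x in xs if x < lo or x > hi]


print(five_number_summary(phosphates))   # (26, 29.0, 32.5, 42.0, 48)
print(outlier_fences(phosphates))        # (9.5, 61.5)
print(outliers(phosphates))              # []
```

The box spans 29 to 42 grams with the median line at 32.5; the whiskers reach the minimum and maximum, since no observation lies beyond the fences.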
Example: Consider the data in Table 8.1 about the nicotine content in a random sample of
40 cigarettes. Develop a box-and-whisker plot of the data.
Example: Constructing a Stem-and-Leaf Plot
Consider the data of Table 1.4, which specifies the “life” of 40 similar car batteries
recorded to the nearest tenth of a year. The batteries are guaranteed to last 3 years.
Table 1.4: Car Battery Life
2.2  4.1  3.5  3.4  1.6  3.1  2.5  4.3
3.4  3.3  3.1  3.7  4.7  3.8  3.2  4.5
3.3  3.6  4.4  2.6  3.2  3.8  2.9  3.2
3.9  3.7  3.1  3.3  4.1  3.0  3.0  4.7
3.9  1.9  4.2  2.6  3.7  3.1  3.4  3.5
Process:
1. Split each observation into two parts, a stem and a leaf, such that the stem represents the digit preceding the decimal point and the leaf corresponds to the decimal part of the number.
2. For example, for the number 3.7, the digit 3 is designated the stem and the digit 7 is the leaf.
3. The four stems 1, 2, 3, and 4 are listed vertically on the left side in Table 1.5; the
leaves are recorded on the right side opposite the appropriate stem value.
Table 1.5: Stem-and-Leaf Plot of Battery Life
Stem   Leaf                          Frequency
1      69                            2
2      25669                         5
3      0011112223334445567778899     25
4      11234577                      8
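The three-step process above can be sketched in Python. The function name `stem_and_leaf` is hypothetical; values are converted to whole tenths before splitting so that floating-point artefacts (for example 2.9 * 10 evaluating to 29.000000000000004) do not produce a wrong leaf.

```python
from collections import defaultdict

# Table 1.4: battery life in years (40 observations)
battery_life = [
    2.2, 4.1, 3.5, 3.4, 1.6, 3.1, 2.5, 4.3,
    3.4, 3.3, 3.1, 3.7, 4.7, 3.8, 3.2, 4.5,
    3.3, 3.6, 4.4, 2.6, 3.2, 3.8, 2.9, 3.2,
    3.9, 3.7, 3.1, 3.3, 4.1, 3.0, 3.0, 4.7,
    3.9, 1.9, 4.2, 2.6, 3.7, 3.1, 3.4, 3.5,
]


def stem_and_leaf(data):
    """Map each stem (digit before the decimal point) to its sorted leaves (tenths digit)."""
    rows = defaultdict(list)
    for x in data:
        stem, leaf = divmod(round(x * 10), 10)   # 3.7 -> 37 -> stem 3, leaf 7
        rows[stem].append(leaf)
    return {stem: "".join(map(str, sorted(leaves))) for stem, leaves in sorted(rows.items())}


for stem, leaves in stem_and_leaf(battery_life).items():
    print(f"{stem} | {leaves:<26} {len(leaves)}")
```

The printed rows reproduce the leaves and frequencies of Table 1.5.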
Stem-and-Leaf Plot
1. The stem-and-leaf plot of Table 1.5 contains only four stems and consequently does not provide an adequate picture of the distribution.
2. To remedy the problem, the number of stems can be increased.
3. One way to accomplish this is to write each stem value twice and then record the leaves 0, 1, 2, 3, and 4 opposite the appropriate stem value where it appears for the first time, and the leaves 5, 6, 7, 8, and 9 opposite this same stem value where it appears for the second time. In Table 1.6, the line holding leaves 0-4 is marked with an asterisk (*).
Table 1.6: Double-Stem-and-Leaf Plot of Battery Life
Stem   Leaf               Frequency
1      69                 2
2*     2                  1
2      5669               4
3*     001111222333444    15
3      5567778899         10
4*     11234              5
4      577                3
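The double-stem rule can be sketched as a small variation of the ordinary stem-and-leaf split. The function name is hypothetical, and the labels follow Table 1.6 as shown here: "stem*" holds leaves 0-4 and the plain stem holds leaves 5-9; a line with no leaves (such as 1*) is simply omitted.

```python
# Table 1.4: battery life in years (40 observations)
battery_life = [
    2.2, 4.1, 3.5, 3.4, 1.6, 3.1, 2.5, 4.3,
    3.4, 3.3, 3.1, 3.7, 4.7, 3.8, 3.2, 4.5,
    3.3, 3.6, 4.4, 2.6, 3.2, 3.8, 2.9, 3.2,
    3.9, 3.7, 3.1, 3.3, 4.1, 3.0, 3.0, 4.7,
    3.9, 1.9, 4.2, 2.6, 3.7, 3.1, 3.4, 3.5,
]


def double_stem_and_leaf(data):
    """Split each stem in two: 'stem*' gets leaves 0-4, the plain stem gets leaves 5-9."""
    rows = {}
    for tenths in sorted(round(x * 10) for x in data):   # sorting keeps lines and leaves in order
        stem, leaf = divmod(tenths, 10)
        label = f"{stem}*" if leaf <= 4 else str(stem)
        rows.setdefault(label, []).append(str(leaf))
    return {label: "".join(leaves) for label, leaves in rows.items()}


for label, leaves in double_stem_and_leaf(battery_life).items():
    print(f"{label:<3}| {leaves:<16} {len(leaves)}")
```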
Frequency Distribution
- A frequency distribution groups the data into different classes or intervals; it can be constructed by counting the leaves belonging to each stem, noting that each stem defines a class interval.
- Dividing each class frequency by the total number of observations gives the relative frequency; a table listing relative frequencies is called a relative frequency distribution.
- The relative frequency distribution of battery life is given in Table 1.7 below.
Table 1.7: Relative Frequency Distribution of Battery Life
Class Interval   Class Midpoint   Frequency, f   Relative Frequency
1.5-1.9          1.7              2              0.050
2.0-2.4          2.2              1              0.025
2.5-2.9          2.7              4              0.100
3.0-3.4          3.2              15             0.375
3.5-3.9          3.7              10             0.250
4.0-4.4          4.2              5              0.125
4.5-4.9          4.7              3              0.075
Total                             40             1.000
Figure 1.6: Relative frequency histogram of battery life (vertical axis: relative frequency, 0 to 0.4; horizontal axis: battery life in years, class midpoints 1.7 to 4.7)
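A minimal sketch of how Table 1.7 can be built from the raw battery data, assuming one-decimal observations; the function name and its parameters (class boundaries expressed in tenths) are hypothetical. Working in whole tenths avoids floating-point boundary errors.

```python
# Table 1.4: battery life in years (40 observations)
battery_life = [
    2.2, 4.1, 3.5, 3.4, 1.6, 3.1, 2.5, 4.3,
    3.4, 3.3, 3.1, 3.7, 4.7, 3.8, 3.2, 4.5,
    3.3, 3.6, 4.4, 2.6, 3.2, 3.8, 2.9, 3.2,
    3.9, 3.7, 3.1, 3.3, 4.1, 3.0, 3.0, 4.7,
    3.9, 1.9, 4.2, 2.6, 3.7, 3.1, 3.4, 3.5,
]


def relative_frequency_table(data, first_lower=15, width=5, n_classes=7):
    """Group one-decimal data into equal classes measured in tenths.

    Class i covers first_lower + i*width .. first_lower + i*width + width - 1 tenths
    (e.g. 15-19 tenths = 1.5-1.9). Returns rows (lower, upper, midpoint, f, relative f).
    """
    counts = [0] * n_classes
    for x in data:
        i = (round(x * 10) - first_lower) // width
        if not 0 <= i < n_classes:
            raise ValueError(f"{x} falls outside the class intervals")
        counts[i] += 1
    rows = []
    for i, f in enumerate(counts):
        lo = first_lower + i * width
        hi = lo + width - 1
        rows.append((lo / 10, hi / 10, (lo + hi) / 20, f, f / len(data)))
    return rows


for lo, hi, mid, f, rel in relative_frequency_table(battery_life):
    print(f"{lo:.1f}-{hi:.1f}  {mid:.1f}  {f:>2}  {rel:.3f}")
```

Plotting the relative frequencies against the class midpoints gives the histogram of Figure 1.6.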
Quantile Plot
Definition 8.8. A quantile of a sample, q(f), is a value for which a specified fraction f of the
data values is less than or equal to q(f).
Detection of Deviations from Normality: Normal Quantile-Quantile Plot
Definition 8.9: The normal quantile-quantile plot is a plot of y(i) (ordered observations) against q_{0,1}(f_i), where f_i = (i – 3/8)/(n + 1/4),
- where a good approximation of the quantile for the N(0,1) random variable is
$$q_{0,1}(f) = 4.91\left[f^{0.14} - (1 - f)^{0.14}\right]$$
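A sketch of the normal quantile-quantile computation, using f_i = (i - 3/8)/(n + 1/4) and the approximation q_{0,1}(f) ≈ 4.91[f^0.14 - (1 - f)^0.14]. The phosphate data from the earlier example serve only as an illustration, and the function names are hypothetical. If the plotted points fall close to a straight line, the data are consistent with a normal population; the second loop compares the approximation with the exact inverse CDF from Python's `statistics.NormalDist`.

```python
from statistics import NormalDist


def normal_quantile_approx(f: float) -> float:
    """Approximate N(0,1) quantile: q_{0,1}(f) = 4.91 [f^0.14 - (1 - f)^0.14]."""
    return 4.91 * (f ** 0.14 - (1 - f) ** 0.14)


def normal_qq_points(data):
    """Pairs (q_{0,1}(f_i), y_(i)) with f_i = (i - 3/8) / (n + 1/4), i = 1..n."""
    y = sorted(data)
    n = len(y)
    return [(normal_quantile_approx((i - 3 / 8) / (n + 1 / 4)), yi)
            for i, yi in enumerate(y, start=1)]


phosphates = [48, 47, 42, 42, 41, 34, 31, 30, 29, 29, 29, 26]
for q, y in normal_qq_points(phosphates):
    print(f"{q:6.3f}  {y}")

# Approximation versus the exact standard normal quantile
for f in (0.05, 0.25, 0.75, 0.95):
    print(f, round(normal_quantile_approx(f), 3), round(NormalDist().inv_cdf(f), 3))
```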
Sampling Distributions
Definition 8.10: The probability distribution of a statistic is called a sampling distribution.
Sampling Distribution of $\bar{X}$
Theorem 8.2 (Central Limit Theorem): If $\bar{X}$ is the mean of a random sample of size n taken from a population with mean μ and finite variance σ², then the limiting form of the distribution of
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}},$$
as n → ∞, is the standard normal distribution n(z; 0, 1).
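A quick simulation sketch of Theorem 8.2. As an assumption for illustration, the population is exponential with μ = σ = 1, a deliberately skewed choice; the function name and the seed are arbitrary. If the standardized means are close to standard normal, about 95% of them should fall within ±1.96.

```python
import math
import random


def standardized_means(n: int, reps: int, seed: int = 1):
    """Z = (Xbar - mu) / (sigma / sqrt(n)) for samples from an exponential population with mu = sigma = 1."""
    rng = random.Random(seed)
    mu = sigma = 1.0
    zs = []
    for _ in range(reps):
        xbar = sum(rng.expovariate(1.0) for _ in range(n)) / n
        zs.append((xbar - mu) / (sigma / math.sqrt(n)))
    return zs


zs = standardized_means(n=40, reps=20000)
share = sum(abs(z) <= 1.96 for z in zs) / len(zs)
print(f"fraction with |Z| <= 1.96: {share:.3f} (standard normal value: 0.950)")
```

Rerunning with a smaller n (say 2) shows the approximation degrading, which is the "limiting form" part of the theorem.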
Sampling Distribution of the Difference between Two Averages
Theorem 8.3: If independent samples of size n1 and n2 are drawn at random from two populations, discrete or continuous, with means μ1 and μ2 and variances σ1² and σ2², respectively, then the sampling distribution of the differences of means, $\bar{X}_1 - \bar{X}_2$, is approximately normally distributed with mean and variance given by
$$\mu_{\bar{X}_1 - \bar{X}_2} = \mu_1 - \mu_2 \quad\text{and}\quad \sigma^2_{\bar{X}_1 - \bar{X}_2} = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}.$$
Hence,
$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$$
is approximately a standard normal variable.
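A sketch of how Theorem 8.3 is applied, with hypothetical population parameters chosen so that the standard deviation of the difference is exactly 1; the function names are also hypothetical. The normal CDF comes from Python's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist


def diff_of_means(mu1, sigma1, n1, mu2, sigma2, n2):
    """Mean and standard deviation of Xbar1 - Xbar2 (Theorem 8.3)."""
    mean = mu1 - mu2
    sd = math.sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
    return mean, sd


def prob_diff_greater(d, mu1, sigma1, n1, mu2, sigma2, n2):
    """Normal approximation to P(Xbar1 - Xbar2 > d)."""
    mean, sd = diff_of_means(mu1, sigma1, n1, mu2, sigma2, n2)
    z = (d - mean) / sd
    return 1 - NormalDist().cdf(z)


# Hypothetical populations: mu1 = 50, sigma1 = 3, n1 = 18; mu2 = 48, sigma2 = 2, n2 = 8
mean, sd = diff_of_means(50, 3, 18, 48, 2, 8)      # sd = sqrt(9/18 + 4/8) = 1
p = prob_diff_greater(3.96, 50, 3, 18, 48, 2, 8)   # z = (3.96 - 2) / 1 = 1.96
print(mean, sd, round(p, 4))
```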
Sampling Distribution of S²
Theorem 8.4: If S² is the variance of a random sample of size n taken from a normal population having the variance σ², then the statistic
$$\chi^2 = \frac{(n-1)S^2}{\sigma^2} = \sum_{i=1}^{n} \frac{(X_i - \bar{X})^2}{\sigma^2}$$
has a chi-squared distribution with v = n – 1 degrees of freedom.
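A simulation sketch of Theorem 8.4, assuming a hypothetical normal population with μ = 10 and σ = 2 and samples of size n = 6. A chi-squared variable with v degrees of freedom has mean v and variance 2v, so the simulated values of (n - 1)S²/σ² should average about 5 with variance about 10.

```python
import random


def chi2_statistics(n: int, mu: float, sigma: float, reps: int, seed: int = 2):
    """Simulate (n - 1) S^2 / sigma^2 for `reps` samples of size n from N(mu, sigma^2)."""
    rng = random.Random(seed)
    values = []
    for _ in range(reps):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(xs) / n
        s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)   # sample variance S^2
        values.append((n - 1) * s2 / sigma ** 2)
    return values


vals = chi2_statistics(n=6, mu=10.0, sigma=2.0, reps=20000)
m = sum(vals) / len(vals)
var = sum((v - m) ** 2 for v in vals) / (len(vals) - 1)
print(f"mean ≈ {m:.2f} (theory 5), variance ≈ {var:.2f} (theory 10)")
```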
Degrees of Freedom
When μ is not known and one considers the distribution of
$$\sum_{i=1}^{n} \frac{(X_i - \bar{X})^2}{\sigma^2},$$
there is 1 less degree of freedom, or a degree of freedom is lost in the estimation of μ (i.e., when μ is replaced by $\bar{x}$).
- In other words, there are n degrees of freedom, or independent pieces of information, in the random sample from the normal distribution.
- When the data (the values in the sample) are used to compute the mean, there is 1 less degree of freedom in the information used to estimate σ².
Examples: Chi-squared Distribution
Ex. For the chi-squared distribution, find
1. Answer: 27.488 (Table A.5)
2. Answer: 18.475
3. Answer: 36.415
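The answers above are upper-tail critical values χ²_α read from Table A.5; for instance, 27.488 is χ²_0.025 with v = 15, 18.475 is χ²_0.01 with v = 7, and 36.415 is χ²_0.05 with v = 24. A standard-library sketch that reproduces such entries: the chi-squared CDF equals the regularized lower incomplete gamma function P(v/2, x/2), evaluated here by its power series, and the critical value is found by bisection. The function names and the search bracket are assumptions.

```python
import math


def chi2_cdf(x: float, v: float) -> float:
    """P(X^2 <= x) with v d.f.: regularized lower incomplete gamma P(v/2, x/2) via its power series."""
    if x <= 0:
        return 0.0
    a, y = v / 2, x / 2
    term = total = 1.0 / a
    k = 1
    while term > 1e-16 * total:
        term *= y / (a + k)
        total += term
        k += 1
    return total * math.exp(a * math.log(y) - y - math.lgamma(a))


def chi2_critical(alpha: float, v: float) -> float:
    """chi^2_alpha: the value with area alpha to its right (as tabulated in Table A.5)."""
    lo, hi = 0.0, 1000.0          # bracket adequate for moderate v
    for _ in range(100):
        mid = (lo + hi) / 2
        if 1 - chi2_cdf(mid, v) > alpha:
            lo = mid              # too much area to the right: move right
        else:
            hi = mid
    return (lo + hi) / 2


for alpha, v in [(0.025, 15), (0.01, 7), (0.05, 24)]:
    print(f"chi2_{alpha} with v = {v}: {chi2_critical(alpha, v):.3f}")
```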
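t-Distribution
Theorem 8.5: Let Z be a standard normal random variable and V a chi-squared random variable with v degrees of freedom. If Z and V are independent, then the distribution of the random variable T, where
$$T = \frac{Z}{\sqrt{V/v}},$$
is given by the density function
$$h(t) = \frac{\Gamma[(v+1)/2]}{\Gamma(v/2)\sqrt{\pi v}}\left(1 + \frac{t^2}{v}\right)^{-(v+1)/2}, \quad -\infty < t < \infty.$$
This is known as the t-distribution with v degrees of freedom.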
Shape of t-Distribution
• The distribution of T is similar to the distribution of Z in that they both are symmetric
about the mean zero.
• Both distributions are bell shaped, but the t-distribution is more variable, owing to the fact that the T-values depend on the fluctuations of two quantities, $\bar{X}$ and $S^2$, whereas the Z-values depend only on the changes of $\bar{X}$ from sample to sample.
• The distribution of T differs from that of Z in that the variance of T depends on the sample size n and is always greater than 1 (for v > 2 it equals v/(v – 2)).
• Only when the sample size n → ∞ will the two distributions become the same.
Figure 8.14: Symmetry property of the t-distribution
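The shape properties above can be checked numerically by evaluating the Theorem 8.5 density next to the standard normal density. This is a sketch with hypothetical function names; the variance formula Var(T) = v/(v - 2), valid for v > 2, is a standard result included for comparison. The table shows a lower peak and heavier tails for small v, and convergence toward the normal values as v grows.

```python
import math


def t_pdf(t: float, v: float) -> float:
    """Density of T with v degrees of freedom (Theorem 8.5), computed on the log scale."""
    log_c = math.lgamma((v + 1) / 2) - math.lgamma(v / 2) - 0.5 * math.log(math.pi * v)
    return math.exp(log_c - (v + 1) / 2 * math.log1p(t * t / v))


def z_pdf(z: float) -> float:
    """Standard normal density n(z; 0, 1)."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)


def t_variance(v: float) -> float:
    """Var(T) = v / (v - 2), defined for v > 2."""
    return v / (v - 2)


print("v     h(0)    h(3)    Var(T)")
for v in (3, 5, 10, 30, 1000):
    print(f"{v:<5} {t_pdf(0, v):.4f}  {t_pdf(3, v):.4f}  {t_variance(v):.3f}")
print(f"z     {z_pdf(0):.4f}  {z_pdf(3):.4f}  1.000")
```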
Example: t-Distribution
Solution:
From the t-distribution table, Table A.4:
Hence, the claim is supported by the data, since the computed T value lies between –t0.025 and t0.025.
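The critical value t0.025 read from Table A.4 can also be approximated numerically. This sketch integrates the Theorem 8.5 density with Simpson's rule and solves for the upper-tail point by bisection. The degrees of freedom (v = 24) and the observed T = 1.5 are purely illustrative, and the function names are hypothetical.

```python
import math


def t_pdf(t: float, v: float) -> float:
    """Density of T with v degrees of freedom (Theorem 8.5)."""
    log_c = math.lgamma((v + 1) / 2) - math.lgamma(v / 2) - 0.5 * math.log(math.pi * v)
    return math.exp(log_c - (v + 1) / 2 * math.log1p(t * t / v))


def t_upper_tail(t: float, v: float, steps: int = 2000) -> float:
    """P(T > t) for t >= 0: 0.5 minus the area on [0, t], by Simpson's rule (steps must be even)."""
    h = t / steps
    s = t_pdf(0.0, v) + t_pdf(t, v)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(i * h, v)
    return 0.5 - s * h / 3


def t_critical(alpha: float, v: float) -> float:
    """t_alpha: the value leaving area alpha in the upper tail (0 < alpha < 0.5), by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, v) > alpha:
            lo = mid      # too much area to the right: move right
        else:
            hi = mid
    return (lo + hi) / 2


v = 24                    # illustrative: a sample of n = 25
t_crit = t_critical(0.025, v)
print(f"t_0.025 with {v} d.f. ≈ {t_crit:.3f}")

t_obs = 1.5               # hypothetical computed T value
print("inside (-t_0.025, t_0.025):", -t_crit < t_obs < t_crit)
```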
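Corollary 8.1: Let X1, X2, …, Xn be independent random variables that are all normal with mean μ and standard deviation σ. Let
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i \quad\text{and}\quad S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2.$$
Then the random variable
$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$
has a t-distribution with v = n – 1 degrees of freedom.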
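A minimal sketch of the statistic in Corollary 8.1, using Python's `statistics` module (whose `stdev` uses the n - 1 divisor). The function name, the sample, and the hypothesized mean are hypothetical.

```python
import math
import statistics


def t_statistic(sample, mu):
    """Return (T, v) with T = (xbar - mu) / (S / sqrt(n)) and v = n - 1 (Corollary 8.1)."""
    n = len(sample)
    xbar = statistics.mean(sample)
    s = statistics.stdev(sample)     # divisor n - 1, as in the definition of S^2
    return (xbar - mu) / (s / math.sqrt(n)), n - 1


t, v = t_statistic([2, 4, 6, 8], mu=4)   # hypothetical sample and hypothesized mean
print(f"T = {t:.4f} with {v} degrees of freedom")   # T = 0.7746 with 3 degrees of freedom
```

The computed T is then compared with critical values from Table A.4 for v = n - 1 degrees of freedom.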
F-Distribution
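Theorem 8.6: Let U and V be two independent random variables having chi-squared distributions with v1 and v2 degrees of freedom, respectively. Then the distribution of the random variable F = (U/v1)/(V/v2) is given by the density
$$h(f) = \begin{cases} \dfrac{\Gamma[(v_1+v_2)/2]\,(v_1/v_2)^{v_1/2}}{\Gamma(v_1/2)\,\Gamma(v_2/2)} \cdot \dfrac{f^{v_1/2-1}}{(1 + v_1 f/v_2)^{(v_1+v_2)/2}}, & f > 0,\\ 0, & f \le 0. \end{cases}$$
This is known as the F-distribution with v1 and v2 degrees of freedom (d.f.).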
Theorem 8.7: Writing f_α(v1, v2) for f_α with v1 and v2 degrees of freedom, we obtain
$$f_{1-\alpha}(v_1, v_2) = \frac{1}{f_\alpha(v_2, v_1)}.$$
Theorem 8.8: If S1² and S2² are the variances of independent random samples of size n1 and n2 taken from normal populations with variances σ1² and σ2², respectively, then
$$F = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} = \frac{\sigma_2^2 S_1^2}{\sigma_1^2 S_2^2}$$
has an F-distribution with v1 = n1 – 1 and v2 = n2 – 1 degrees of freedom.
Use of the F-Distribution
The F-distribution is used in two-sample situations to draw inferences about the population variances.
The F-distribution is also called the variance ratio distribution.
Solution:
(a) 2.71
(b) 2.92
(c) 0.345
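The values in the solution above are F-table entries; for instance, f_0.05(7, 15) ≈ 2.71 and f_0.01(24, 19) ≈ 2.92. The sketch below evaluates the Theorem 8.6 density numerically (Simpson's rule plus bisection) to reproduce such entries and to check the reciprocal relation of Theorem 8.7 for degrees of freedom chosen for illustration. Assumptions: v1 > 2, so the density vanishes at 0, and critical values below 50; function names are hypothetical.

```python
import math


def f_pdf(x: float, v1: float, v2: float) -> float:
    """Density of F with v1 and v2 degrees of freedom (Theorem 8.6); zero for x <= 0."""
    if x <= 0:
        return 0.0
    log_c = (math.lgamma((v1 + v2) / 2) - math.lgamma(v1 / 2) - math.lgamma(v2 / 2)
             + (v1 / 2) * math.log(v1 / v2))
    return math.exp(log_c + (v1 / 2 - 1) * math.log(x)
                    - (v1 + v2) / 2 * math.log1p(v1 * x / v2))


def f_upper_tail(x: float, v1: float, v2: float, steps: int = 2000) -> float:
    """P(F > x) = 1 - area on [0, x], by Simpson's rule (assumes v1 > 2)."""
    h = x / steps
    s = f_pdf(0.0, v1, v2) + f_pdf(x, v1, v2)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * f_pdf(i * h, v1, v2)
    return 1 - s * h / 3


def f_critical(alpha: float, v1: float, v2: float) -> float:
    """f_alpha(v1, v2): the value with area alpha to its right, by bisection on [0, 50]."""
    lo, hi = 0.0, 50.0
    for _ in range(45):
        mid = (lo + hi) / 2
        if f_upper_tail(mid, v1, v2) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(f"f_0.05(7, 15)  ≈ {f_critical(0.05, 7, 15):.2f}")
print(f"f_0.01(24, 19) ≈ {f_critical(0.01, 24, 19):.2f}")
# Theorem 8.7: f_0.99(28, 12) should equal 1 / f_0.01(12, 28)
print(f"{f_critical(0.99, 28, 12):.4f} vs {1 / f_critical(0.01, 12, 28):.4f}")
```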