
Estimation in Sampling

GTECH 201 Lecture 15

Conceptual Setting

- How do we come to conclusions from empirical evidence?
- Isn’t common sense enough? Why?
- We need systematic methods for drawing conclusions from data: statistical inference.
- Inductive versus deductive reasoning

Drawing Conclusions

- Statistical inference is based on the laws of probability. It asks: what would happen if
  - you ran your experiment hundreds of times?
  - you repeated your survey over and over again?

Statistic and Parameter

- The proportion of the population with a given characteristic is usually denoted by $p$.
- In a simple random sample (SRS) of 1000 people, the proportion of the sample with that characteristic is usually denoted by $\hat{p}$ (p-hat).

Estimating with Confidence

Say you are conducting an opinion poll:
- You take an SRS of 1000 adult television viewers.
- You ask them whether they trust Walter Cronkite when he delivers the nightly news.
- Out of 1000, 570 say they trust him, so 57% of the sample trusts Walter Cronkite.
- $\hat{p}$ is 0.57.

If you collect another set of 1000 television viewers, what will the rating be?

Confidence Statement

- We need to add a confidence statement; that is, we need to say something about the margin of error.
- Confidence statements are based on the distribution of the values of the sample statistic that would result if many independent SRSs were taken from the same population.
- This is the sampling distribution of the statistic.
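The "what would happen if you repeated your survey" idea can be simulated. The sketch below is a minimal illustration, assuming (hypothetically) that the true population proportion equals the sample value of 0.57 and that each respondent is an independent draw; the function name `simulate_phat` is made up for this example. It draws many independent SRSs of 1000 viewers and keeps each sample's $\hat{p}$; the spread of those values is the sampling distribution of the statistic.

```python
import random
import statistics


def simulate_phat(p_true: float, n: int, reps: int, seed: int = 1) -> list[float]:
    """Return the sample proportion from each of `reps` simulated SRSs of size n.

    Assumes a very large population whose true proportion is p_true, so each
    respondent is an independent draw that answers "trust" with probability p_true.
    """
    rng = random.Random(seed)
    phats = []
    for _ in range(reps):
        yes = sum(1 for _ in range(n) if rng.random() < p_true)
        phats.append(yes / n)
    return phats


# Hypothetical: pretend the population proportion really is 0.57.
phats = simulate_phat(p_true=0.57, n=1000, reps=2000)
print("mean of p-hat values:", round(statistics.mean(phats), 3))
print("std. dev. of p-hat values:", round(statistics.stdev(phats), 4))
```

The mean of the simulated values should sit near 0.57 and their standard deviation near $\sqrt{0.57 \times 0.43 / 1000} \approx 0.016$; a different seed gives slightly different numbers.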

Terminology Review

- Sample
- Population
- Statistic: a numerical characteristic associated with a sample
- Parameter: a numerical characteristic associated with the population
- Sampling error: the difference between a sample statistic and the corresponding population parameter, which creates the need for interval estimation

Point Estimation

- A point estimate of a parameter is the value of a statistic that is used to estimate the parameter:
  - Compute a statistic (e.g., the mean).
  - Use it to estimate the corresponding population parameter.
- Point estimators of population parameters are summarized in the next table.

Point Estimators for Population Parameters

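Population parameter | Sample statistic | Calculating formula
Mean, $\mu$ | $\bar{x}$ | $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Total, $T$ | $\hat{T}$ | $\hat{T} = N\bar{x}$
Standard deviation, $\sigma$ | $s$ | $s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}}$
Proportion, $p$ | $\hat{p}$ | $\hat{p} = \frac{x}{n}$, where $x$ is the number of sample members with the characteristic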
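A minimal sketch of the four point estimators in the table, using a small hypothetical sample of household sizes. The data, the population size N = 100, and the "three or more persons" characteristic used for $\hat{p}$ are all invented for illustration, and `point_estimates` is an illustrative helper name.

```python
import math


def point_estimates(sample: list[float], N: int, has_trait) -> dict[str, float]:
    """Compute the sample statistics that estimate mu, T, sigma and p."""
    n = len(sample)
    mean = sum(sample) / n                      # x-bar estimates mu
    total = N * mean                            # T-hat = N * x-bar estimates T
    s = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))  # s estimates sigma
    x = sum(1 for v in sample if has_trait(v))  # sample members with the characteristic
    p_hat = x / n                               # p-hat = x / n estimates p
    return {"mean": mean, "total": total, "s": s, "p_hat": p_hat}


# Hypothetical sample of 5 household sizes from a population of N = 100 households.
est = point_estimates([2, 3, 1, 4, 5], N=100, has_trait=lambda size: size >= 3)
print(est)
```

Note that $s$ divides by $n - 1$, the same convention as Python's `statistics.stdev`.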

Interval Estimation

- Point estimates from a sample are rarely exactly equal to the population parameter.
- How close to, or how far from, the population parameter is the calculated sample statistic?
- We can say that the sample statistic is within a certain range or interval of the population parameter.

The determination of this range is the basis for interval estimation

Interval Estimation (2)

- A confidence interval (CI) represents the level of precision associated with a population estimate.
- The width of the interval is determined by:
  - the sample size,
  - the variability of the population, and
  - the probability level, or level of confidence, selected.

Sampling Distribution of the Mean

- The sampling distribution of the mean is the distribution of all possible sample means for samples of a given size.
- We use the mean of a sample to estimate, and draw conclusions about, the mean of the entire population.
- So, for samples of a particular size, we need formulas for the mean and the standard deviation of all possible sample means drawn from a population.

Sample and Population Mean

- For samples of size $n$, the mean of the variable $\bar{x}$ is equal to the mean of the variable under consideration.
- That is, the mean of all possible sample means equals the population mean:

$\mu_{\bar{x}} = \mu$

Standard Deviation of the Sample Mean

- For samples of size $n$, the standard deviation of the variable $\bar{x}$ is equal to the standard deviation of the variable under consideration divided by the square root of the sample size.
- For each sample size, the standard deviation of all possible sample means equals the population standard deviation divided by the square root of the sample size:

$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$

Central Limit Theorem

- Suppose all possible random samples of size $n$ are drawn from an infinitely large, normally distributed population with mean $\mu$ and standard deviation $\sigma$.
- The frequency distribution of these sample means will have:
  - a mean of $\mu$ (the population mean), and
  - a standard deviation of $\sigma/\sqrt{n}$.
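The two properties above (mean $\mu$, standard deviation $\sigma/\sqrt{n}$) can be checked by simulation. This is a sketch under assumed values: a hypothetical normal population with $\mu = 9.6$ and $\sigma = 3$ (borrowed from the commuting example later in the lecture) and samples of size $n = 50$; `sample_means` is an illustrative helper, not a library function.

```python
import math
import random
import statistics


def sample_means(mu: float, sigma: float, n: int, reps: int, seed: int = 7) -> list[float]:
    """Means of `reps` random samples of size n from a normal(mu, sigma) population."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.gauss(mu, sigma) for _ in range(n)) for _ in range(reps)]


mu, sigma, n = 9.6, 3.0, 50   # hypothetical population values
means = sample_means(mu, sigma, n, reps=4000)
print("mean of sample means:", round(statistics.fmean(means), 3), "vs mu =", mu)
print("std. dev. of sample means:", round(statistics.stdev(means), 3),
      "vs sigma/sqrt(n) =", round(sigma / math.sqrt(n), 3))
```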



Sampling Error

- The standard error of the mean (SEM) is a basic measure of the amount of sampling error:

$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$

- The SEM indicates how much a typical sample mean is likely to differ from the true population mean.
- Both the sample size and the population standard deviation affect the sampling error.

Sampling Error (2)

- The larger the sample size, the smaller the amount of sampling error.
- The larger the standard deviation, the greater the amount of sampling error.

[Figure: amount of sampling error plotted against sample size ($n$) and population standard deviation ($\sigma$), each ranging from small to large]
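A small sketch of how the SEM, $\sigma/\sqrt{n}$, responds to the two factors; the $\sigma$ and $n$ values in the loop are arbitrary illustrations, and `standard_error` is an illustrative helper name. Within each $\sigma$, the SEM shrinks as $n$ grows; at a fixed $n$, it grows with $\sigma$.

```python
import math


def standard_error(sigma: float, n: int) -> float:
    """Standard error of the mean for an (effectively) infinite population."""
    return sigma / math.sqrt(n)


for sigma in (1.0, 3.0):            # small vs. large population standard deviation
    for n in (10, 50, 200):         # small to large sample sizes
        print(f"sigma = {sigma}, n = {n:3d}: SEM = {standard_error(sigma, n):.3f}")
```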

Finite Population Correction Factor

- The frequency distribution of the sample means is approximately normal if the sample size is large: $n < 30$ is a small sample; $n \ge 30$ is a large sample.
- If you have a finite population, you need to introduce a correction, the finite population correction (fpc) factor, into the estimation process:

$fpc = \sqrt{\frac{N - n}{N - 1}}$

where fpc = finite population correction, $n$ = sample size, and $N$ = population size.

Standard Error of the Mean for Finite Populations

When including the fpc, $\sigma_{\bar{x}}$ should be:

$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}\,(fpc)$

In general, you include the fpc in population estimates only when the ratio of sample size to population size exceeds 5%, that is, when $n/N > 0.05$.
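The fpc and the 5% rule can be combined in one helper. This is an illustrative sketch (the name `sem_with_fpc` and the population sizes in the usage lines are hypothetical), using the fpc defined as $\sqrt{(N-n)/(N-1)}$.

```python
import math


def sem_with_fpc(sigma: float, n: int, N: int) -> float:
    """Standard error of the mean, applying the fpc only when n / N > 0.05."""
    sem = sigma / math.sqrt(n)
    if n / N > 0.05:
        fpc = math.sqrt((N - n) / (N - 1))   # finite population correction
        sem *= fpc
    return sem


# Hypothetical: 50 commuters sampled from populations of 500 and 5,000.
print(round(sem_with_fpc(3.0, 50, 500), 4))    # n/N = 0.10, so the fpc is applied
print(round(sem_with_fpc(3.0, 50, 5000), 4))   # n/N = 0.01, so no correction
```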

Constructing Confidence Intervals

- A random sample of 50 commuters reveals that their average journey-to-work distance was 9.6 miles.
- A recent study has determined that the standard deviation of journey-to-work distance is approximately 3 miles.
- What is the CI around this sample mean of 9.6 miles that we can be 90% confident encloses the true population mean?

Confidence Interval for the Mean

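- The sample mean is the best estimate of the true population mean.
- The Z value associated with a 90% confidence level is $Z = 1.65$.

$CI = \bar{x} \pm Z\frac{\sigma}{\sqrt{n}} = 9.6 \pm 1.65\left(\frac{3}{\sqrt{50}}\right)$

- Upper limit: $9.6 + 1.65\left(3/\sqrt{50}\right) = 10.30$ miles
- Lower limit: $9.6 - 1.65\left(3/\sqrt{50}\right) = 8.90$ miles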
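The commuter calculation above, as a short sketch; `z_interval` is a hypothetical helper name, and $Z = 1.65$ is the slide's rounded value for 90% confidence (the more precise value is about 1.645).

```python
import math


def z_interval(xbar: float, sigma: float, n: int, z: float) -> tuple[float, float]:
    """Confidence interval x-bar +/- Z * sigma / sqrt(n) when sigma is known."""
    half_width = z * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width


low, high = z_interval(9.6, 3.0, 50, z=1.65)
print(f"{low:.2f} to {high:.2f} miles")  # 8.90 to 10.30 miles
```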

Confidence Interval

- We say that the sample statistic is within a certain range or interval of the population parameter. For example:
  - In our sample, 57% of the viewers thought Walter Cronkite is trustworthy; in the general population, between 54% and 60% of viewers think that Walter Cronkite is trustworthy.
  - In our sample, the average commuting distance was 9.6 miles; in the population, we calculated that the average commute is likely to be somewhere between 8.9 miles and 10.3 miles.
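The 54% to 60% range can be reproduced with the usual large-sample (normal approximation) interval for a proportion, $\hat{p} \pm Z\sqrt{\hat{p}(1-\hat{p})/n}$. That formula is a standard textbook one rather than something stated in this lecture, and the slide does not give the confidence level, so 95% ($Z = 1.96$) is an assumption here; `proportion_interval` is a hypothetical helper name.

```python
import math


def proportion_interval(successes: int, n: int, z: float) -> tuple[float, float]:
    """Normal-approximation CI for a population proportion."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)   # estimated standard error of p-hat
    return p_hat - z * se, p_hat + z * se


low, high = proportion_interval(570, 1000, z=1.96)
print(f"{low:.1%} to {high:.1%}")  # 53.9% to 60.1%
```

With $Z = 1.65$ (90%) the limits also round to 54% and 60%, so the slide's figures are consistent with either level.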

Confidence Level

- The confidence level tells you how reliable your statement about the confidence interval is.
- It is the probability that the interval actually includes the population parameter.
- For example, the confidence level refers to the probability that the interval (8.9 miles to 10.3 miles) actually encompasses the TRUE population mean.
- Common confidence levels include 90%, 95%, and 99.7%.
- The confidence level probability is $1 - \alpha$.

Significance Level ($\alpha$)

- The probability that the interval surrounding the sample statistic DOES NOT include the population parameter.
- E.g., the probability that the true average commuting distance does not fall between 8.9 miles and 10.3 miles.
- $\alpha$ = 0.10 (90% confidence); 0.05 (95%); 0.01 (99%); 0.003 (99.7%).
- As $\alpha$ decreases (i.e., as the confidence level increases), the confidence interval width increases.

Sampling Error

- Total sampling error = $\alpha$.
- The probability that the sample statistic will fall into either one of the two tails of the distribution is $\alpha/2$.
- If you want 99.7% confidence (i.e., low error), then you have to settle for a less precise estimate (the CI is wider).
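The trade-off between confidence and precision can be seen by recomputing the commuter interval at several confidence levels. This sketch uses Python's `statistics.NormalDist` to find $Z$ for a two-tailed $\alpha$ (each tail holding $\alpha/2$); `z_critical` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def z_critical(confidence: float) -> float:
    """Z value leaving alpha/2 in each tail, where alpha = 1 - confidence."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


for conf in (0.90, 0.95, 0.99, 0.997):
    z = z_critical(conf)
    half_width = z * 3.0 / math.sqrt(50)     # commuter example: sigma = 3, n = 50
    print(f"{conf:.1%} confidence: Z = {z:.3f}, CI = 9.6 +/- {half_width:.2f} miles")
```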

If the Standard Deviation is Unknown

- If we don’t know the population mean, it’s likely we don’t know the population standard deviation either.
- What you are likely to have is the variance and standard deviation of your sample, $s^2$ and $s$.
- If you also have a small population ($n/N > 0.05$), you have to use the finite population correction factor discussed earlier.
- Once you have the formula for the standard error, you can proceed as before to determine the confidence interval.

Standard Error

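$s_{\bar{x}} = \frac{s}{\sqrt{n}}\,(fpc)$, where $fpc = \sqrt{\frac{N - n}{N - 1}}$

Equivalently, $s_{\bar{x}} = \sqrt{\frac{s^2}{n}\left(\frac{N - n}{N - 1}\right)}$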







Student’s t Distribution

- William Gosset (1876-1937) published his contributions to statistical theory under the pseudonym "Student".
- Student’s t distribution is used in performing inferences for a population mean when:
  - the population being sampled is approximately normally distributed,
  - the population standard deviation is unknown, and
  - the sample size is small ($n < 30$).

Characteristics of the t Distribution

- A t curve is symmetric and bell-shaped.
- The exact shape of the distribution varies with sample size.
- As $n$ nears 30, the value of t approaches the standard normal Z value.
- A particular t distribution is identified by its degrees of freedom (df).
- For a t distribution, df = $n - 1$.







Properties of t Curves

- The total area under a t curve is 1.
- A t curve extends indefinitely in both directions, approaching, but never touching, the horizontal axis.
- A t curve is symmetric about 0.
- As the degrees of freedom become larger, t curves look increasingly like the standard normal curve.
- To determine the confidence interval, we look up values of t in a t-table instead of values of Z.
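Where no t-table is at hand, t critical values can be computed numerically. The sketch below is one possible approach, not a library routine: it integrates the Student's t density with Simpson's rule and finds the upper $\alpha/2$ point by bisection, using only the Python standard library (the helper names are hypothetical). The printout shows t approaching the standard normal Z as the degrees of freedom grow.

```python
import math
from statistics import NormalDist


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x: float, df: int, steps: int = 1000) -> float:
    """P(T <= x) for x >= 0: 0.5 plus a Simpson's-rule integral of the density on [0, x]."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    return 0.5 + total * h / 3


def t_critical(confidence: float, df: int) -> float:
    """Upper alpha/2 point of the t distribution, found by bisection on [0, 100]."""
    target = 1 - (1 - confidence) / 2
    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


for df in (4, 9, 24, 29, 200):
    print(f"df = {df:3d}: t = {t_critical(0.90, df):.3f}")
print(f"standard normal: Z = {NormalDist().inv_cdf(0.95):.3f}")
```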

Calculating various CIs

- Sampling: SRS, systematic, or stratified
- Parameters: mean, total, or proportion
- Six situations
- Consider whether to use the fpc: use it when $n/N > 0.05$.
- Consider whether to use Z or t: use t when $n < 30$.

If Random or Systematic Sample

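- Estimate of the population mean: the best estimate is the sample mean, $\bar{x}$.
- Estimate of sampling error: the standard error of the mean (including the fpc):

$s_{\bar{x}} = \frac{s}{\sqrt{n}}\sqrt{\frac{N - n}{N - 1}}$

- Confidence interval: $CI = \bar{x} \pm Z s_{\bar{x}}$ (use t in place of Z when $n < 30$)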
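The decision rules listed under "Calculating various CIs" and the formulas above can be combined into one function for a random or systematic sample. This is a hypothetical sketch: `mean_interval` and its arguments are illustrative names, the 5% fpc rule and the $n < 30$ switch to t follow the lecture, and because the Python standard library has no t distribution, the caller supplies the t-table value when $n < 30$.

```python
import math
from statistics import NormalDist
from typing import Optional


def mean_interval(xbar: float, s: float, n: int, N: int, confidence: float,
                  t_value: Optional[float] = None) -> tuple[float, float]:
    """CI for the population mean from a random or systematic sample.

    Applies the fpc when n / N > 0.05. When n < 30 the caller must pass a
    t critical value for df = n - 1 (and `confidence` is then not used);
    otherwise Z is computed from `confidence`.
    """
    se = s / math.sqrt(n)                     # standard error without correction
    if n / N > 0.05:
        se *= math.sqrt((N - n) / (N - 1))    # finite population correction
    if n < 30:
        if t_value is None:
            raise ValueError("n < 30: supply a t value for df = n - 1")
        crit = t_value
    else:
        crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return xbar - crit * se, xbar + crit * se


# Hypothetical survey: 40 of 400 households, x-bar = 2.5 persons, s = 1.2, 95% CI.
low, high = mean_interval(2.5, 1.2, 40, 400, confidence=0.95)
print(f"{low:.2f} to {high:.2f} persons per household")
```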







If Stratified Sample

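- Estimate of the population mean: still equal to a sample mean, but now each stratum's sample mean is weighted by that stratum's share of the population:

$\bar{x}_{st} = \frac{1}{N}\sum_{i=1}^{m} N_i \bar{x}_i$

where $m$ = number of strata and $i$ refers to a particular stratum ($N_i$ = stratum population size, $n_i$ = stratum sample size, $\bar{x}_i$ and $s_i$ = stratum sample mean and standard deviation).

- Standard error of the mean (including the fpc):

$s_{\bar{x}_{st}} = \sqrt{\frac{1}{N^2}\sum_{i=1}^{m} N_i^2 \frac{s_i^2}{n_i}\left(\frac{N_i - n_i}{N_i}\right)}$
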
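A sketch of the stratified formulas, with two hypothetical strata (all stratum sizes, means and standard deviations are invented). Each stratum is passed as a tuple $(N_i, n_i, \bar{x}_i, s_i)$, and `stratified_mean_se` is an illustrative helper name.

```python
import math


def stratified_mean_se(strata: list[tuple[int, int, float, float]]) -> tuple[float, float]:
    """Stratified estimate of the mean and its standard error (with per-stratum fpc).

    Each stratum is (N_i, n_i, xbar_i, s_i): stratum population size, sample size,
    sample mean and sample standard deviation.
    """
    N = sum(Ni for Ni, _, _, _ in strata)
    mean = sum(Ni * xbar for Ni, _, xbar, _ in strata) / N
    var = sum(Ni ** 2 * (s ** 2 / ni) * ((Ni - ni) / Ni)
              for Ni, ni, _, s in strata) / N ** 2
    return mean, math.sqrt(var)


# Hypothetical population of 1,000 households split into two strata.
mean, se = stratified_mean_se([(600, 20, 2.0, 1.0), (400, 20, 3.5, 1.5)])
print(f"mean = {mean:.2f}, standard error = {se:.3f}")
```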
Minimum Sample Size

- Before going out into the field, you want to know how big the sample ought to be for your research problem.
- The sample must be large enough to achieve the precision and CI width that you desire.
- With random sampling, there are formulas that give the minimum sample size for each of the three basic population parameters (mean, total, and proportion).

Sample Size Selection - Mean

- Your goal is to determine the minimum sample size.
- You want to situate the estimated population mean in a specified CI.
- $E$ = the amount of error you are willing to tolerate:

$E = Z\sigma_{\bar{x}} = Z\frac{\sigma}{\sqrt{n}} \quad\Rightarrow\quad n = \left(\frac{Z\sigma}{E}\right)^2$
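The minimum-sample-size formula, rounded up to the next whole case, as a short sketch (`min_sample_size` is a hypothetical helper name). The usage line reuses the commuter example's $Z = 1.65$ and $\sigma = 3$ miles with a hypothetical tolerance of $E = 0.5$ mile.

```python
import math


def min_sample_size(z: float, sigma: float, E: float) -> int:
    """Smallest n for which Z * sigma / sqrt(n) <= E, i.e. n = (Z * sigma / E)^2 rounded up."""
    return math.ceil((z * sigma / E) ** 2)


n = min_sample_size(z=1.65, sigma=3.0, E=0.5)
print(n)  # 99
```

Rounding up matters: the formula gives 98.01 here, and a sample of 98 would leave the error slightly above $E$.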





Example 1

- We are looking at Neighborhood X:
  - 3,500 households
  - Sample size = 25 households
  - Sample mean = 2.73 persons per household
  - Sample variance = 2.6
- Using a 90% CI, estimate the mean number of people per household.
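One way to work Example 1, sketched under stated assumptions: household size is treated as approximately normal so the t distribution applies ($n = 25 < 30$, df = 24), the variance given is the sample variance $s^2$, and $t = 1.711$ is the standard t-table value for 90% confidence with 24 degrees of freedom. Since $n/N = 25/3500 \approx 0.007$, the 5% rule says the fpc can be omitted; the code still checks the rule.

```python
import math

N, n = 3500, 25           # households in Neighborhood X, households sampled
xbar, s2 = 2.73, 2.6      # sample mean and sample variance
t_90 = 1.711              # t-table value: 90% confidence, df = n - 1 = 24

se = math.sqrt(s2 / n)    # standard error of the mean, s / sqrt(n)
if n / N > 0.05:          # apply the fpc only if the sample is over 5% of the population
    se *= math.sqrt((N - n) / (N - 1))

low, high = xbar - t_90 * se, xbar + t_90 * se
print(f"90% CI: {low:.2f} to {high:.2f} persons per household")
```

Applying the fpc anyway would not change these limits at two decimal places.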

Example 2

- A sample of 30 households has a sample standard deviation of 1.25.
- What sample size is needed to estimate the mean number of persons per household in Neighborhood X and be 90% confident that your estimate will be within 0.3 persons of the true population mean?
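Example 2 with the minimum-sample-size formula, sketched under the assumption that the sample standard deviation of 1.25 stands in for the unknown $\sigma$ and that $Z = 1.65$ (the lecture's value for 90% confidence) is used. The result is rounded up, since rounding down would leave the error above 0.3 persons.

```python
import math

z_90 = 1.65   # Z for 90% confidence, as used earlier in the lecture
s = 1.25      # standard deviation from the sample of 30 households (stands in for sigma)
E = 0.3       # tolerated error, in persons per household

n_required = math.ceil((z_90 * s / E) ** 2)
print(n_required)  # 48
```

With the more precise $Z = 1.645$ the same formula gives 47.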
