Introduction to Hypothesis Testing

Bivariate Statistics

GTECH 201 Lecture 17

Overview of Today’s Topic

- Two-Sample Difference of Means Test
- Matched Pairs (Dependent Sample) Tests
- Chi-Square Goodness of Fit Test
- Kolmogorov-Smirnov Test

Differences Between Two Samples

- Are there significant differences between the two samples?
- If the sample differences are significant, then we can infer that the samples were drawn from truly different populations, and vice versa
- Extending hypothesis testing
  - Statistic (mean)
  - Relationship between samples
    - Independent
    - Dependent

Two Sample Difference Of Means

$$ Z = \frac{\bar{X}_1 - \bar{X}_2}{\sigma_{\bar{X}_1 - \bar{X}_2}} $$

- Numerator: actual difference between the sample means
- Denominator: standard error of the difference of the means (a measure of expected sampling error)
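As a rough illustration, the statistic can be computed in a few lines of Python. This is a minimal sketch, not the lecture's own procedure: the function name `two_sample_z` is hypothetical, and the standard error is assumed to be the separate-variance estimate, with sample variances standing in for the unknown population variances.

```python
import math
import statistics

def two_sample_z(sample1, sample2):
    """Return (mean1 - mean2) divided by the standard error of the difference."""
    # Numerator: actual difference between the two sample means.
    mean_diff = statistics.mean(sample1) - statistics.mean(sample2)
    # Denominator: separate-variance standard error (an assumption of this
    # sketch); sample variances estimate the population variances.
    std_error = math.sqrt(
        statistics.variance(sample1) / len(sample1)
        + statistics.variance(sample2) / len(sample2)
    )
    return mean_diff / std_error
```

A large absolute value means the observed difference is large relative to the sampling error expected by chance.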

Pooled Variance/Separate Variance

- If the population variances are equal, use the pooled variance (PV) estimate
- If the population variances are unknown but assumed equal, use the modified formula
- If the population variances are assumed to be unequal, use the separate variance (SV) estimate
- Sample variances are considered the best estimators of the population variances
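The two choices differ only in how the standard error of the difference is estimated. The sketch below uses the common textbook forms of both estimates; the lecture's exact formulas are not shown in the transcript, so treat these as assumptions. The function names `pooled_se` and `separate_se` are hypothetical.

```python
import math

def pooled_se(var1, n1, var2, n2):
    """Standard error assuming the two population variances are equal."""
    # Weighted average of the two sample variances (weights = degrees of freedom).
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    return math.sqrt(pooled_var * (1 / n1 + 1 / n2))

def separate_se(var1, n1, var2, n2):
    """Standard error assuming the two population variances are unequal."""
    # Each sample contributes its own variance, scaled by its own size.
    return math.sqrt(var1 / n1 + var2 / n2)
```

With equal sample sizes the two estimates coincide; they diverge as the sample sizes and variances become more unequal.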

Matched Pairs (Dependent Sample)

- One set of observations (units)
  - Same location and/or same individuals
  - One variable, two time periods
  - Two variables, one time period
- Absence of two independent samples
- When two sets of data are collected for one group of observations, the samples are dependent
  - The matched pairs difference test is the appropriate inferential test
  - Each unit in the sample has two values (a matched pair)
  - Parametric, non-parametric

Wilcoxon Matched-pairs Signed Ranks Test

- Random sample
- Data are ordinal, or downgraded to ordinal
- H_0: the ranked matched-pair differences are equal
- Test statistic:

$$ Z_W = \frac{T - \dfrac{n(n+1)}{4}}{\sqrt{\dfrac{n(n+1)(2n+1)}{24}}} $$

with T = rank sum and n = number of matched pairs
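A minimal Python sketch of the signed-ranks calculation, using the standard large-sample normal approximation Z = (T - n(n+1)/4) / sqrt(n(n+1)(2n+1)/24). Several details are assumptions, since the slide does not spell them out: T is taken as the sum of the ranks attached to positive differences, zero differences are dropped, and tied absolute differences share their average rank. The function names are hypothetical.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1  # ranks are 1-based
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks

def wilcoxon_signed_rank_z(first, second):
    """Return (T, Z) for paired observations."""
    # Differences within each matched pair; zero differences are dropped.
    diffs = [b - a for a, b in zip(first, second) if b != a]
    n = len(diffs)
    ranks = average_ranks([abs(d) for d in diffs])
    # T: sum of the ranks belonging to positive differences (assumed convention).
    t = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean_t = n * (n + 1) / 4
    sd_t = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return t, (t - mean_t) / sd_t
```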

Matched Pairs t Test

- The samples are not independent of each other
- In this situation, the t-test considers the difference between the values for each matched pair
- The greater the difference (d), the more dissimilar the two values within the matched pair

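$$ t = \frac{\bar{d}}{s_d / \sqrt{n}} $$

where \bar{d} = mean of the matched-pair differences, s_d = standard deviation of the differences, and n = number of matched pairs in the sample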
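A short sketch of the matched pairs t statistic in its standard form, t = mean(d) / (s_d / sqrt(n)), where d is the within-pair difference and n the number of matched pairs. The function name `matched_pairs_t` is hypothetical, and the direction of subtraction (second minus first) is an arbitrary choice for illustration.

```python
import math
import statistics

def matched_pairs_t(first, second):
    """Return the t statistic for paired observations."""
    # One difference per matched pair.
    d = [b - a for a, b in zip(first, second)]
    n = len(d)
    d_bar = statistics.mean(d)   # mean of the differences
    s_d = statistics.stdev(d)    # sample standard deviation of the differences
    return d_bar / (s_d / math.sqrt(n))
```

The result is compared against a t distribution with n - 1 degrees of freedom.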

Wilcoxon Rank Sum W

- Non-parametric difference of means test
- Measures the magnitude of the differences in ranked positions

$$ W_i = \sum \text{ranks}_i $$
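A sketch of the rank sum calculation: both samples are ranked together, and W is the sum of the ranks that fall in the first sample. The normal approximation in the second half (mean n1(n1+n2+1)/2, variance n1*n2*(n1+n2+1)/12) is the standard large-sample form and is an addition not shown on the slide. Ties receive average ranks, and the function name `rank_sum_w` is hypothetical.

```python
import math

def rank_sum_w(sample1, sample2):
    """Return (W1, Z): rank sum of sample1 and its normal approximation."""
    # Pool the observations, remembering which sample each came from.
    combined = [(v, 1) for v in sample1] + [(v, 2) for v in sample2]
    combined.sort(key=lambda pair: pair[0])
    w1 = 0.0
    pos = 0
    while pos < len(combined):
        end = pos
        # Group tied values so they share the same average rank.
        while end + 1 < len(combined) and combined[end + 1][0] == combined[pos][0]:
            end += 1
        mean_rank = (pos + end) / 2 + 1  # ranks are 1-based
        in_sample1 = sum(1 for k in range(pos, end + 1) if combined[k][1] == 1)
        w1 += mean_rank * in_sample1
        pos = end + 1
    n1, n2 = len(sample1), len(sample2)
    expected = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return w1, (w1 - expected) / sd
```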

Goodness of Fit Tests

- Compare an actual or observed frequency distribution to some expected frequency distribution
- Used to test the hypothesis that a set of data has a particular frequency distribution
- Confirm or deny the relevance or validity of a particular theory
- Verify assumptions about samples

Chi Squared Distributions

- The total area under a chi-squared curve is equal to 1
- A chi-squared curve starts at 0 on the horizontal axis and extends indefinitely to the right, approaching but never touching the horizontal axis
- A chi-squared curve is right skewed
- As the number of degrees of freedom becomes larger, chi-squared curves look increasingly like normal curves

χ² Function

[Animation: shape of the chi-square distribution as the degrees of freedom increase (1, 2, 5, 10, 25, and 50)]

Goodness of Fit Tests

- Characteristics of the expected frequency distribution
  - Uniform or equal
  - Proportional or unequal
  - Normal (theoretical)
- The chi-squared statistic compares
  - Observed frequency counts of a single variable (organized into nominal or ordinal categories)
  - An expected distribution of frequency counts organized in the same categories
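For the uniform and proportional cases, the expected counts follow directly from the sample size. A minimal sketch, with hypothetical function names and made-up inputs in the usage note:

```python
def uniform_expected(total, k):
    """Expected counts when all k categories are equally likely."""
    return [total / k] * k

def proportional_expected(total, proportions):
    """Expected counts when each category has its own expected share."""
    # The proportions are assumed to sum to 1.
    return [total * p for p in proportions]
```

For example, 120 observations spread uniformly over 4 categories give 30 expected per category, while 200 observations with expected shares of 0.5, 0.25, and 0.25 give 100, 50, and 50.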

Rules for Using χ² Test

- Samples must be taken at random
- Variables must be organized in nominal or ordinal categories
- Must use absolute frequency counts
  - Cannot be applied if the observations or sampling units are relative frequencies such as percentages, proportions, or rates
- If there are 2 nominal/ordinal categories, then
  - Both expected frequency counts must be at least five
- If there are 3 or more nominal/ordinal categories, then
  - No expected frequency should be less than two
  - At most one-fifth of the expected frequency counts can be less than five
  - This may be a reason to combine or reorganize categories
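The expected-frequency rules can be checked mechanically before running the test. A small sketch, assuming the "one-fifth" rule refers to the expected counts; the function name `expected_counts_ok` is hypothetical.

```python
def expected_counts_ok(expected):
    """Return True if the expected counts satisfy the size rules listed above."""
    k = len(expected)
    if k < 2:
        raise ValueError("need at least two categories")
    if k == 2:
        # Two categories: both expected counts must be at least five.
        return all(e >= 5 for e in expected)
    # Three or more categories: none below two ...
    if any(e < 2 for e in expected):
        return False
    # ... and at most one-fifth of them below five.
    small = sum(1 for e in expected if e < 5)
    return small <= k / 5
```

A False result suggests combining or reorganizing categories before applying the test.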

Test Statistic

$$ \chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} $$

where

- i = 1, 2, ..., k (i.e., the different categories)
- O_i is the observed frequency in a particular category
- E_i is the expected frequency in that same category
- k is the total number of categories
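The statistic sums, over all k categories, the squared gap between observed and expected counts divided by the expected count. A minimal sketch; the function name and the counts in the usage note are made up for illustration.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))
```

For observed counts of 18, 22, 30, and 30 against a uniform expectation of 25 per category, the contributions are 49/25, 9/25, 25/25, and 25/25, which sum to 4.32.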

Interpreting the value of χ²

- Null and alternative hypotheses
- If the chi-squared value is small (i.e., the observed and expected frequencies are similar), then the goodness of fit is strong
  - Do not reject the null hypothesis
- Vice versa: if the chi-squared value is large, the goodness of fit is weak
  - Reject the null hypothesis
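"Small" and "large" are judged against a critical value from the chi-squared distribution. The sketch below assumes a 0.05 significance level and k - 1 degrees of freedom (no parameters estimated from the data); neither choice is stated on the slide. The critical values are the usual table entries, and the names `CRITICAL_005` and `reject_null` are hypothetical.

```python
# Upper-tail critical values of the chi-squared distribution at alpha = 0.05,
# keyed by degrees of freedom (standard table values).
CRITICAL_005 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}

def reject_null(chi_square, k):
    """Return True if the null hypothesis of a good fit is rejected."""
    df = k - 1  # assumed degrees of freedom for k categories
    return chi_square > CRITICAL_005[df]
```

With four categories, a statistic of 4.32 falls below 7.815, so the null hypothesis is not rejected.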

Kolmogorov-Smirnov

- Goodness of fit test
- Uses data in ordinal categories, or interval/ratio data downgraded to ordinal categories
- Population is continuously distributed
- Null and alternative hypotheses
- Cumulative relative frequencies are compared with the cumulative frequencies expected for a normal distribution
- The K-S test statistic (D) is the maximum absolute difference between the two sets of cumulative values
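A sketch of the D calculation for data grouped into ordered classes. It assumes the normal distribution's mean and standard deviation are supplied by the caller and that each class is identified by its upper limit; the function name `ks_statistic` and its parameters are hypothetical.

```python
from itertools import accumulate
from statistics import NormalDist

def ks_statistic(upper_limits, counts, mu, sigma):
    """Maximum absolute gap between observed and expected cumulative values."""
    total = sum(counts)
    # Observed cumulative relative frequency at each class upper limit.
    observed_cum = [c / total for c in accumulate(counts)]
    # Expected cumulative frequency under a normal distribution.
    normal = NormalDist(mu, sigma)
    expected_cum = [normal.cdf(u) for u in upper_limits]
    return max(abs(o - e) for o, e in zip(observed_cum, expected_cum))
```

D is then compared with a critical value that depends on the sample size and the chosen significance level.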