IS 4800 Empirical Research Methods for Information Science
Class Notes, March 16, 2012
Instructor: Prof. Carole Hafner, 446 WVH
[email protected]  Tel: 617-373-5116
Course Web site: www.ccs.neu.edu/course/is4800sp12/

Outline
• Sampling and statistics (cont.)
• T test for paired samples
• T test for independent means
• Analysis of Variance
• Two way analysis of Variance

Relationship Between Population and Samples When a Treatment Had No Effect
[Figure: one population; Sample 1 (mean M1) and Sample 2 (mean M2) are both drawn from it.]

Relationship Between Population and Samples When a Treatment Had An Effect
[Figure: control group population (mean $\mu_c$) and treatment group population (mean $\mu_t$); control group sample (mean $M_c$) and treatment group sample (mean $M_t$).]

Sampling
                                          Mean?                Variance?
Population                                $\mu$                $\sigma^2$
Sample of size N                          $M = \Sigma X / N$   $SD^2 = \Sigma (X - M)^2 / N$
Mean values from all possible samples     $\mu_M = \mu$        $\sigma_M^2 = \sigma^2 / N$
of size N, aka “distribution of means”

$Z_M = (M - \mu) / \sigma_M$

Z tests and t-tests
t is like Z:
  $Z = (M - \mu) / \sigma_M$
  $t = (M - \mu) / S_M$
$\mu = 0$ for paired samples
We use a stricter criterion (t) instead of Z because $S_M$ is based on an estimate of the population variance, while $\sigma_M$ is based on a known population variance.
  $S^2 = \Sigma (X - M)^2 / (N - 1) = SS / (N - 1)$
  $S_M^2 = S^2 / N$

T-test with paired samples
Given info about the population of change scores and the sample size we will be using (N), we can compute the distribution of means.
• Population of change scores: $\mu = 0$; $S^2$ (estimate of $\sigma^2$ from the sample) $= SS / df$
• Distribution of means: $S_M^2 = S^2 / N$
Now, given a particular sample of change scores of size N, we compute its mean and finally determine the probability that this mean occurred by chance:
  $t = M / S_M$, with $df = N - 1$

t test for independent samples
Given two samples:
• Estimate population variances (assume same)
• Estimate variances of distributions of means
• Estimate variance of differences between means (mean = 0)
This is now your comparison distribution.

Estimating the Population Variance
$S^2$ is an estimate of $\sigma^2$
$S^2 = SS / (N - 1)$ for one sample (take the square root for $S$)
For two independent samples, the “pooled estimate”:
  $S^2_{Pooled} = \frac{df_1}{df_{Total}} S_1^2 + \frac{df_2}{df_{Total}} S_2^2$
  $df_{Total} = df_1 + df_2 = (N_1 - 1) + (N_2 - 1)$
From this, calculate the variance of the sample means, $S_M^2 = S^2 / N$, which is needed to compute the t statistic:
  $S^2_{Difference} = S^2_{Pooled} / N_1 + S^2_{Pooled} / N_2$

t test for independent samples, continued
Distribution of differences between means
• This is your comparison distribution
• NOT normal; it is a ‘t’ distribution
• Shape changes depending on df: $df = (N_1 - 1) + (N_2 - 1)$
Compute $t = (M_1 - M_2) / S_{Difference}$
Determine whether t is beyond the cutoff score for the test parameters (df, significance level, tails) from a lookup table.

ANOVA: When to use
• Categorical IV, numerical DV (same as t-test)
• HOWEVER:
  – There are more than 2 levels of the IV, so:
  – $(M_1 - M_2) / S_M$ won’t work

ANOVA Assumptions
• Populations are normal
• Populations have equal variances
• More or less..

Basic Logic of ANOVA
• Null hypothesis
  – Means of all groups are equal.
• Test: do the means differ more than expected given the null hypothesis?
• Terminology
  – Group = Condition = Cell

Accompanying Statistics
• Experimental
  – Between-subjects
    • Single factor, N-level (for N>2)
      – One-way Analysis of Variance (ANOVA)
    • Two factor, two-level (or more!)
      – Factorial Analysis of Variance
      – AKA N-way Analysis of Variance (for N IVs)
      – AKA N-factor ANOVA
  – Within-subjects
    • Repeated-measures ANOVA (not discussed)
      – AKA within-subjects ANOVA

ANOVA: Single factor, N-level (for N>2)
• The Analysis of Variance is used when you have more than two groups in an experiment
  – The F-ratio is the statistic computed in an Analysis of Variance and is compared to critical values of F
  – The analysis of variance may be used with unequal sample sizes (weighted or unweighted means analysis)
  – When there are just 2 groups, ANOVA is equivalent to the t test for independent means

One-Way ANOVA – Assuming Null Hypothesis is True…
[Figure: the within-group estimate of the population variance, $\sigma^2_{within\,est}$, combines the per-group estimates $\sigma^2_{est1}$, $\sigma^2_{est2}$, $\sigma^2_{est3}$; the between-group estimate of the population variance, $\sigma^2_{between\,est}$, is based on the group means $M_1$, $M_2$, $M_3$.]
  $F = \sigma^2_{between\,est} / \sigma^2_{within\,est}$

Justification for F statistic
Calculating F
Example
Example

Using the F Statistic
• Use a table for F(BDF, WDF)
  – And also α
BDF = between-groups degrees of freedom = number of groups – 1
WDF = within-groups degrees of freedom = Σ df for all groups = N – number of groups

One-way ANOVA in SPSS
Data
[Figure: chart of mean Performance (y-axis, 0 to 6) for the 1 Day, 2 Day, and 3 Day groups.]

Analyze/Compare Means/One Way ANOVA…

SPSS Results…
ANOVA: Performance
                  Sum of Squares   df   Mean Square   F       Sig.
Between Groups    24.813            2   12.406        9.442   .001
Within Groups     27.594           21    1.314
Total             52.406           23
F(2,21)=9.442, p<.05

Factorial Designs
• Two or more nominal independent variables, each with two or more levels, and a numeric dependent variable.
• Factorial ANOVA teases apart the contribution of each variable separately.
• For N IVs, aka “N-way” ANOVA

Factorial Designs
• Adding a second independent variable to a single-factor design results in a FACTORIAL DESIGN
• Two components can be assessed
  – The MAIN EFFECT of each independent variable
    • The separate effect of each independent variable
    • Analogous to separate experiments involving those variables
  – The INTERACTION between independent variables
    • When the effect of one independent variable changes over levels of a second
    • Or, when the effect of one variable depends on the level of the other variable.

Example
Wait Time Sign in Student Center vs. No Sign
Satisfaction

Example of An Interaction - Student Center Sign – 2 Genders x 2 Sign Conditions
[Figure: line plot of the value of the dependent variable (y-axis, 0 to 12) against the level of independent variable A (Level 1 = No Sign, Level 2 = Sign), with separate lines for F and M.]

Two-way ANOVA in SPSS

Analyze/General Linear Model/Univariate

Results
Tests of Between-Subjects Effects
Dependent Variable: Performance
Source                    Type III Sum of Squares   df   Mean Square   F         Sig.
Corrected Model            26.507a                   5     5.301         3.685   .018
Intercept                 210.855                    1   210.855       146.547   .000
Training Days              20.728                    2    10.364         7.203   .005
Trainer                      .002                    1      .002          .001   .974
Training Days * Trainer     1.680                    2      .840          .584   .568
Error                      25.899                   18     1.439
Total                     401.250                   24
Corrected Total            52.406                   23
a. R Squared = .506 (Adjusted R Squared = .369)

Results

Degrees of Freedom
• df for between-group variance estimates for main effects
  – Number of levels – 1
• df for between-group variance estimates for interaction effect
  – Total num cells – df for both main effects – 1
  – e.g.
    2x2 => 4 – (1+1) – 1 = 1
• df for within-group variance estimate
  – Sum of df for each cell = N – num cells
• Report: “F(bet-group, within-group)=F, Sig.”

Publication format
Tests of Between-Subjects Effects
Dependent Variable: Performance
Source                    Type III Sum of Squares   df   Mean Square   F         Sig.
Corrected Model            26.507a                   5     5.301         3.685   .018
Intercept                 210.855                    1   210.855       146.547   .000
Training Days              20.728                    2    10.364         7.203   .005
Trainer                      .002                    1      .002          .001   .974
Training Days * Trainer     1.680                    2      .840          .584   .568
Error                      25.899                   18     1.439
Total                     401.250                   24
Corrected Total            52.406                   23
a. R Squared = .506 (Adjusted R Squared = .369)
N=24, 2x3=6 cells => df TrainingDays=2, df within-group variance=24-6=18
=> F(2,18)=7.20, p<.05

Reporting rule
• IF you have a significant interaction
• THEN
  – If 2x2 study: do not report main effects, even if significant
  – Else: must look at patterns of means in cells to determine whether to report main effects or not.

Results?
  Source:  TrainingDays  Trainer  TrainingDays * Trainer
  Sig.:    0.34          0.12     0.41
n.s.

Results?
  Source:  TrainingDays  Trainer  TrainingDays * Trainer
  Sig.:    0.34          0.12     0.02
Significant interaction between TrainingDays and Trainer, F(2,22)=.584, p<.05

Results?
  Source:  TrainingDays  Trainer  TrainingDays * Trainer
  Sig.:    0.34          0.02     0.41
Main effect of Trainer, F(1,22)=.001, p<.05

Results?
  Source:  TrainingDays  Trainer  TrainingDays * Trainer
  Sig.:    0.04          0.12     0.01
Significant interaction between TrainingDays and Trainer, F(2,22)=.584, p<.05
Do not report TrainingDays as significant

Results?
  Source:  TrainingDays  Trainer  TrainingDays * Trainer
  Sig.:
           0.04          0.02     0.41
Main effects for both TrainingDays, F(2,22)=7.20, p<.05, and Trainer, F(1,22)=.001, p<.05

“Factorial Design”
• Not all cells in your design need to be tested
  – But if they are, it is a “full factorial design”, and you do a “full factorial ANOVA”
[Figure: design grid crossing Real-Time / Retrospective with Agent / Text; one cell is marked X.]

Higher-Order Factorial Designs
• More than two independent variables are included in a higher-order factorial design
  – As factors are added, the complexity of the experimental design increases
    • The number of possible main effects and interactions increases
    • The number of subjects required increases
    • The volume of materials and amount of time needed to complete the experiment increases
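
Worked sketch: t test for independent means vs. one-way ANOVA
The notes state that with just 2 groups, ANOVA is equivalent to the t test for independent means. The Python sketch below is one way to check this numerically. It follows the pooled-variance formulas from the notes for t. For F it assumes the usual sum-of-squares form (between-groups mean square divided by within-groups mean square), which is what the SPSS table reports. The function names and the two small data sets are made up for illustration and do not come from the course materials.

```python
from statistics import mean


def sum_of_squares(xs):
    """SS: sum of squared deviations of the scores from their own mean."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)


def independent_t(g1, g2):
    """t for independent means, using the pooled variance estimate."""
    df1, df2 = len(g1) - 1, len(g2) - 1
    df_total = df1 + df2
    s2_1 = sum_of_squares(g1) / df1          # S^2 = SS / (N - 1)
    s2_2 = sum_of_squares(g2) / df2
    s2_pooled = (df1 / df_total) * s2_1 + (df2 / df_total) * s2_2
    s2_difference = s2_pooled / len(g1) + s2_pooled / len(g2)
    t = (mean(g1) - mean(g2)) / s2_difference ** 0.5
    return t, df_total


def one_way_f(groups):
    """F = between-groups mean square / within-groups mean square."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    bdf = len(groups) - 1                    # number of groups - 1
    wdf = n_total - len(groups)              # N - number of groups
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum(sum_of_squares(g) for g in groups)
    f = (ss_between / bdf) / (ss_within / wdf)
    return f, bdf, wdf


# Hypothetical scores, not taken from the notes.
group_a = [4.0, 5.0, 6.0, 5.5]
group_b = [2.0, 3.5, 3.0, 2.5, 4.0]

t, df = independent_t(group_a, group_b)
f, bdf, wdf = one_way_f([group_a, group_b])
print(f"t({df}) = {t:.3f}, t squared = {t * t:.3f}")
print(f"F({bdf},{wdf}) = {f:.3f}")
```

With two groups, the printed F should match t squared up to rounding, and WDF should equal the t test's df. The cutoff score still has to come from a t or F table for the chosen significance level.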
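
Worked sketch: degrees of freedom in a two-way design
The degrees-of-freedom rules from the notes can be checked against the SPSS table for the Training Days by Trainer example. The sketch below assumes 3 levels of Training Days and 2 levels of Trainer, which is what the "2x3=6 cells" annotation and the df column imply. The function name is hypothetical. The sums of squares are copied from the Tests of Between-Subjects Effects table.

```python
def factorial_df(levels_a, levels_b, n_total):
    """Degrees of freedom for a two-way between-subjects design."""
    cells = levels_a * levels_b
    df_a = levels_a - 1                        # main effect: levels - 1
    df_b = levels_b - 1
    df_interaction = cells - (df_a + df_b) - 1 # cells - df of both main effects - 1
    df_within = n_total - cells                # N - number of cells
    return {
        "a": df_a,
        "b": df_b,
        "interaction": df_interaction,
        "within": df_within,
    }


# Training Days (3 levels, assumed) x Trainer (2 levels, assumed), N = 24.
dfs = factorial_df(3, 2, 24)
print(dfs)

# F for Training Days from the table's sums of squares:
# mean square for the effect divided by the error mean square.
ss_training_days = 20.728
ss_error = 25.899
f_training_days = (ss_training_days / dfs["a"]) / (ss_error / dfs["within"])
print(f"F({dfs['a']},{dfs['within']}) = {f_training_days:.2f}")

# The 2x2 case from the notes: 4 - (1 + 1) - 1 = 1 (N = 20 is arbitrary here).
print(factorial_df(2, 2, 20)["interaction"])
```

The computed values should agree with the df column of the table (2, 1, 2, and 18) and with the reported F(2,18)=7.20 for Training Days.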