Transcript: The Basics of Social Research, 2/e
Regression: Single and Multiple
Overview
• Defined: A model for predicting one variable from other variable(s).
• Variables: IV(s) is continuous, DV is continuous
• Relationship: relationship amongst variables
• Example: Can we predict height from weight? (Or weight from height? Or weight from multiple variables? Or height from multiple variables?)
• Assumptions: normality, linearity, no (excessive) multicollinearity
Regression is about finding the best straight line
The best straight line is the one that minimizes S, the sum of the squared vertical distances (residuals) between the data points and the line
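In symbols (a minimal restatement of the criterion above, with a as the intercept and b as the slope, matching the notation used later in these slides):

```latex
% S is the sum of squared vertical distances (residuals) from each point to the line
S(a, b) = \sum_{i=1}^{n} \bigl( Y_i - (a + b X_i) \bigr)^2
```

The "best" line is the pair (a, b) that makes S as small as possible.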
Once we find the best straight line, we know the “intercept” and the “slope”:
Same Intercept, Different slope
[Figure: two regression lines with the same intercept but different slopes; x-axis: Number of Pints]
Same slope, Different Intercept
[Figure: two regression lines with the same slope but different intercepts; x-axis: Number of Pints]
Relationship between correlation and regression
• Correlation expresses the strength and direction of the relationship between two variables.
• Regression is an extension of correlation, and allows you to make predictions about one variable from other variable(s).
• Bivariate regression (1 IV and 1 DV) produces the same result as correlation.
• Multiple regression (2+ IVs and 1 DV) goes a step farther than correlation.
Relationship between correlation and regression
Research question: What is the relationship between gun ownership and murder rate within a city?
Correlation: Imagine you are a researcher interested in the relationship between the number of registered weapons ("weapons") and the murder rate ("murder"), so you collect data on those two variables from many different cities. You find a strong positive relationship between the two variables (r = .885) that is statistically significant (p = .003).
Correlations
                                              Automatic weapons    Murder rate
                                              (in thousands)       (per 100,000)
Automatic weapons      Pearson Correlation          1                 .885**
(in thousands)         Sig. (2-tailed)              .                 .003
                       N                            8                 8
Murder rate            Pearson Correlation          .885**            1
(per 100,000)          Sig. (2-tailed)              .003              .
                       N                            8                 8
**. Correlation is significant at the 0.01 level (2-tailed).
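A quick way to reproduce this kind of correlation output in Python is scipy.stats.pearsonr; the weapons and murders arrays below are hypothetical stand-ins for the eight-city data, not the actual values behind the table.

```python
from scipy.stats import pearsonr

# Hypothetical stand-ins for eight cities (thousands of weapons, murders per 100,000)
weapons = [1.2, 2.5, 3.1, 4.0, 5.2, 6.3, 7.1, 8.4]
murders = [5.0, 6.1, 7.0, 7.4, 8.3, 9.6, 10.2, 11.5]

r, p_value = pearsonr(weapons, murders)   # Pearson correlation and two-tailed p-value
print(f"r = {r:.3f}, p = {p_value:.3f}")  # the slides report r = .885, p = .003
```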
Relationship between correlation and regression
• Regression: Now, imagine you are the Mayor of Los Angeles. You are considering lifting the ban on automatic weapons. You want to predict whether lifting the ban (so increasing the number of automatic weapons on the streets) will impact the murder rate. You are going to use the data (from the above 8 cities) to PREDICT the relationship for a 9th city – Los Angeles. You find a strong positive relationship (.885) between the two variables that is statistically significant (p=.003).
Coefficients(a)
Model 1                              B        Std. Error    Beta     t        Sig.
(Constant)                           4.047    1.089                  3.715    .010
Automatic weapons in thousands       .853     .183          .885     4.656    .003
a. Dependent Variable: Murder rate (in murders per 100,000)
Relationship between correlation and regression
• Regression: We can now use numbers from the output to create a "regression line." The regression line is:
  Y = a + bX
  where:
  Y = the unknown score on the variable you are predicting
  a = the Y-intercept of the regression line
  b = the slope of the regression line
  X = the known score on the other variable you are using to make a prediction
Y = a + b * X
Murders = 4.047 + .853 * Weapons
Relationship between correlation and regression
• Regression:
  Y = a + b * X
  Murders = 4.047 + .853 * Weapons
  If you are the Mayor of Los Angeles, simply insert into the regression equation the number of weapons on the street in Los Angeles (X), and you can predict the number of murders (Y):
  If 1,000 weapons, then predicted murders ≈ 857
  If 2,000 weapons, then predicted murders ≈ 1,710
  If 3,000 weapons, then predicted murders ≈ 2,563
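As a minimal sketch, the slide's plug-in arithmetic can be scripted; the intercept 4.047 and slope .853 come straight from the output above, while the function name and example inputs are just illustrative.

```python
# Bivariate regression prediction: Murders = a + b * Weapons
# Intercept (a) and slope (b) are taken from the coefficients table on the slide.
INTERCEPT = 4.047
SLOPE = 0.853

def predict_murders(weapons: float) -> float:
    """Predicted murder rate for a given number of weapons, per the fitted line."""
    return INTERCEPT + SLOPE * weapons

for weapons in (1000, 2000, 3000):
    print(f"{weapons} weapons -> about {predict_murders(weapons):.0f} murders")
# Matches the slide: roughly 857, 1710, and 2563.
```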
Multiple Regression
• Using several "predictors" simultaneously
• Example: Study about internalizing violence (DV), with three predictors:
  X1 = degree of witnessing violence
  X2 = measure of life stress
  X3 = measure of social support
[Diagram: X1, X2, and X3 each pointing to the DV]
Multiple Regression
• Given this diagram, what would you want to know?
  (1) When all three predictors are entered, the overall prediction (variance explained) of the DV:

Model Summary
R = .37(a)   R Square = .135   Adjusted R Square = .108   Std. Error of the Estimate = 2.2198
a. Predictors: (Constant), Social support, Current stress, Amount of violence witnessed
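One way to get this kind of model summary in Python is with statsmodels; the DataFrame below is a hypothetical stand-in for the study's data, with made-up column names that mirror the slide's variables.

```python
import pandas as pd
import statsmodels.api as sm

# Hypothetical stand-in data mirroring the slide's variables
df = pd.DataFrame({
    "witnessed":   [2, 5, 1, 7, 3, 6, 4, 8, 2, 5],
    "stress":      [1, 4, 2, 6, 3, 5, 2, 7, 1, 4],
    "support":     [8, 3, 7, 2, 6, 4, 5, 1, 9, 3],
    "internalize": [3, 7, 2, 9, 5, 8, 4, 10, 2, 6],
})

X = sm.add_constant(df[["witnessed", "stress", "support"]])  # adds the intercept column
model = sm.OLS(df["internalize"], X).fit()

print(model.rsquared)      # R Square (overall prediction of the DV)
print(model.rsquared_adj)  # Adjusted R Square
print(model.params)        # unstandardized B coefficients, including the constant
print(model.pvalues)       # significance of each predictor
```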
Multiple Regression
(2) The unique prediction of each variable:

Coefficients(a)
                                   B        Std. Error    Beta     t       Sig.
(Constant)                         .477     1.289                  .37     .712
Amount of violence witnessed       .038     .018          .201     2.1     .039
Current stress                     .273     .106          .247     2.6     .012
Social support                    -.074     .043         -.166    -2       .087
a. Dependent Variable: Internalizing symptoms on CBCL
Ŷ (DV) = b0 + b1*X1 + b2*X2 + b3*X3
       = 0.477 + 0.038*(Witnessed) + 0.273*(Stress) - 0.074*(SocSupp)
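As a minimal illustration of plugging into this multiple-regression equation (the coefficients come from the slide's output; the function name and example values are hypothetical):

```python
# Predicted internalizing score from the slide's multiple-regression equation:
# DV-hat = 0.477 + 0.038*witnessed + 0.273*stress - 0.074*support
def predict_internalizing(witnessed: float, stress: float, support: float) -> float:
    return 0.477 + 0.038 * witnessed + 0.273 * stress - 0.074 * support

# Hypothetical child: witnessed = 10, stress = 5, support = 8
print(round(predict_internalizing(10, 5, 8), 2))  # 0.477 + 0.38 + 1.365 - 0.592 = 1.63
```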
Correlations
                                 Current stress   Social support   Internalizing symptoms on CBCL
Amount of violence witnessed          .050             .080               .200*
Current stress                                        -.080               .270**
Social support                                                           -.170
*. Correlation is significant at the 0.05 level (2-tailed).
**. Correlation is significant at the 0.01 level (2-tailed).
Multiple Regression
• The three things you typically want to know are:
  - Overall effect (of all variables) = R²
  - Unique effect of each variable, while controlling for the others = Beta
  - Unique effect of each variable, without controlling for the others = correlation matrix (same as separate bivariate regressions)
Multiple Regression
• What we have just talked about is: Entry (all variables entered simultaneously)
• But you have other options as well:
  - Hierarchical (you specify the order)
  - Stepwise (the computer chooses based on criteria):
    • Backward
    • Forward
    • Stepwise
Hierarchical
• You enter the variables in a specified order (called steps or blocks).
• Block 1 tells you the unique effect of the variable(s) entered first.
• Block 2 tells you the unique effect of the new variable(s).
• And so forth.
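A sketch of how hierarchical entry might look in Python with statsmodels, assuming a DataFrame df like the hypothetical one above; which variables go in which block is the researcher's call, and the split below is purely illustrative.

```python
import statsmodels.api as sm

# Block 1: the first predictor; Block 2 adds the remaining predictors.
block1 = sm.add_constant(df[["witnessed"]])
block2 = sm.add_constant(df[["witnessed", "stress", "support"]])

m1 = sm.OLS(df["internalize"], block1).fit()
m2 = sm.OLS(df["internalize"], block2).fit()

print(m1.rsquared)                # variance explained by Block 1 alone
print(m2.rsquared - m1.rsquared)  # R-squared change: unique contribution of Block 2
```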
Forward
• The computer first enters the predictor with the highest correlation with the DV.
• The computer then enters the predictor with the highest semi-partial correlation with the DV.
  - (If V1 explained 40% of the DV, then 60% is unexplained, so which variable is the best explainer of that 60%?)
• The computer then enters the predictor with the highest semi-partial correlation with the DV.
  - (If V1 and V2 explained 80%, then which variable best explains the remaining 20%? And so on.)
• And so forth…
• Stops when no new variable significantly explains the residual variation.
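A rough sketch of this forward-entry logic with statsmodels, using the p-value of each candidate (once added) as the entry criterion rather than the semi-partial correlation itself; df and the column names are the hypothetical ones from earlier.

```python
import statsmodels.api as sm

def forward_select(df, dv, candidates, alpha=0.05):
    """Greedy forward entry: add the candidate with the smallest p-value until none is significant."""
    selected = []
    while candidates:
        best_p, best_var = None, None
        for var in candidates:
            X = sm.add_constant(df[selected + [var]])
            p = sm.OLS(df[dv], X).fit().pvalues[var]  # p-value of the newly added predictor
            if best_p is None or p < best_p:
                best_p, best_var = p, var
        if best_p >= alpha:   # no remaining variable significantly explains the residual variation
            break
        selected.append(best_var)
        candidates = [v for v in candidates if v != best_var]
    return selected

print(forward_select(df, "internalize", ["witnessed", "stress", "support"]))
```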
Backward
• The computer enters all variables and calculates the unique contribution of each.
• A removal criterion is set, and if variable(s) don't meet the criterion, they are removed from the analysis.
• The new model is then analyzed; if variable(s) don't meet the criterion, they are removed from the analysis.
• Stops when no remaining variable meets the removal criterion.
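And a mirror-image backward sketch under the same assumptions (the hypothetical df and columns, with the p-value standing in as the removal criterion):

```python
import statsmodels.api as sm

def backward_eliminate(df, dv, predictors, alpha=0.05):
    """Start with all predictors; repeatedly drop the least significant one until all remaining meet the criterion."""
    remaining = list(predictors)
    while remaining:
        model = sm.OLS(df[dv], sm.add_constant(df[remaining])).fit()
        p_values = model.pvalues.drop("const")  # ignore the intercept
        worst_var = p_values.idxmax()
        if p_values[worst_var] < alpha:         # every remaining variable meets the criterion: stop
            break
        remaining.remove(worst_var)             # remove the variable that fails the criterion
    return remaining

print(backward_eliminate(df, "internalize", ["witnessed", "stress", "support"]))
```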
Stepwise
• Combination of Forward and Backward.
• Similar to Forward in that:
  - The computer first enters the predictor with the highest correlation with the DV.
  - The computer then enters the predictor with the highest semi-partial correlation with the DV.
• Similar to Backward in that:
  - A removal criterion is set, and if variable(s) don't meet the criterion, they are removed from the analysis.
How to choose which variables, and how to enter them
• Correlation matrix
  - Choose IVs that are somewhat correlated with the DV.
  - Choose IVs that are not too correlated with the other IVs.
• Regression
  - Analyze your hypothesis first.
  - Then start "exploratory" analysis.
    • Statisticians frown upon too much exploratory work as "fishing."
  - Entry and Hierarchical are preferred over Stepwise; if Stepwise, Backward is preferred over the others.
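A quick way to do this screening step in practice is simply to inspect the correlation matrix, e.g. with pandas; df and the column names are the same hypothetical stand-ins used above.

```python
# Screen predictors before regression: look for IVs correlated with the DV
# but not too highly correlated with each other.
corr = df[["witnessed", "stress", "support", "internalize"]].corr()

print(corr["internalize"])  # each IV's correlation with the DV
print(corr.loc[["witnessed", "stress", "support"],
               ["witnessed", "stress", "support"]])  # IV-to-IV correlations
```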