DATA VISUALIZATION - Texas A&M University


Assumptions of Multiple Regression
• 1. Form of relationship:
– linear vs. nonlinear
– main effects vs. interaction effects
• 2. All relevant variables present
– Perfect reliability of predictors
• 3. Homoscedasticity (homogeneity) of errors of prediction
• 4. Independence of errors of prediction
• 5. Normality of errors of prediction (residuals)
REMEDIES 1
• Linear vs. nonlinear modeling:
– Linearity means, for our purposes, that the terms of a model are added together and that each term carries only one parameter
– E.g., y = b1x1 + b2x2 + b0 + e is linear because each variable (x1, x2, the intercept, e) has only one parameter with it: b1, b2, b0, 1
– An alternative definition is that x1 (or another variable) is actually best modeled by some transformation, such as x1²
FORM OF RELATIONSHIP
• Linearity- use lowess line to provide evidence
[Scatterplot: ANXIETY (y-axis, 40 to 70) against DEPRESSION (x-axis, 50 to 80), with an LLR Smoother (lowess) fit through the points]
Relationship problems in MR
• Counseling
• Linearity: transform using power functions:
– If X is curvilinear in relation to Y, change X to X²: Y = b1X² + e
– If there is a possible interaction effect of two predictors, X1 and X2, create a new variable equal to their product (chapter 7)
– Transform Y using log(Y) or SQRT(Y) in SPSS: Compute
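The transformation remedies above can be sketched in Python rather than SPSS Compute; the data and all variable names here are simulated and purely illustrative:

```python
import numpy as np

# Simulated data; X, Y and all names here are illustrative, not the course dataset.
rng = np.random.default_rng(0)
X = rng.uniform(1, 10, size=100)
Y = 2 * X**2 + rng.normal(0, 0.5, size=100)   # Y is curvilinear in X

# Remedy 1: square the predictor so a linear model captures the curve
X_sq = X**2

# Remedy 2: an interaction term is just the product of two predictors
X2 = rng.uniform(1, 10, size=100)
interaction = X * X2

# Remedy 3: transform the outcome (the SPSS Compute log/SQRT step)
Y_log = np.log(Y)
Y_sqrt = np.sqrt(Y)

# The squared term tracks Y more closely than the raw predictor does
r_raw = np.corrcoef(X, Y)[0, 1]
r_sq = np.corrcoef(X_sq, Y)[0, 1]
```

With a genuinely curvilinear relation, the correlation of Y with X² exceeds the correlation with raw X, which is the evidence the lowess plot is meant to reveal visually.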
REMEDIES 1
• x1² is still linear in the sense of the first definition, but when we plot it we see a different prediction line:

y    x1   x1²
1    1    1
4    2    4
9    3    9
16   4    16
9    5    25
4    6    36

[Plot: the data points with two fitted prediction lines, blinear (fit on x1) and bquadratic (fit on x1²)]
Main Effects and Interactions
• Main effects: the effect of one predictor is consistent across all values of the second predictor
• Standard regression model: y = b1x1 + b2x2 + b0
• Interaction: an effect additional to the main effects
– Defined as the product of two variables, x1*x2, entered as a new predictor variable
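A minimal sketch of fitting the interaction model, with the product column entered as the new predictor (simulated data; coefficient values chosen for illustration):

```python
import numpy as np

# Fitting y = b1*x1 + b2*x2 + b3*(x1*x2) + b0 by least squares on simulated
# data; the product column is the "new predictor variable" for the interaction.
rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 * x1 + 2.0 * x2 + 0.5 * x1 * x2 + 3.0 + rng.normal(0, 0.1, size=n)

# Design matrix: main effects, the interaction product, and the intercept
X = np.column_stack([x1, x2, x1 * x2, np.ones(n)])
b1, b2, b3, b0 = np.linalg.lstsq(X, y, rcond=None)[0]
```

If b3 is near zero, the effect of x1 is consistent across values of x2 (main effects only); a nonzero b3 is the interaction.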
2. The Missing Variable
• ALL models can change seriously if we are missing an important variable Z
– b-weights will usually change if Z is correlated with the predictors already in the model
– The standard error of estimate and the standard errors of the b-weights will be reduced if Z is related to Y but not to the other predictors
– All of the above will change depending on the combination of relationships
Relevant Variables
• Theory, theory, theory
• Test additional variables for change in R-square, change in b- and beta weights of the remaining predictors, and change in residual characteristics.
Omitted relevant variable
• Effect of an omitted predictor on beta weights, R-square, path coefficients
• Added-variable plots: graph the errors from predicting the new variable from the other predictors against the errors of prediction of the original model
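The added-variable logic can be sketched numerically: residualize both Y and the candidate predictor Z on the predictors already in the model, then relate the two residual sets. All data below are simulated and the names illustrative:

```python
import numpy as np

# Added-variable logic: residualize both Y and the candidate predictor Z on
# the predictors already in the model, then relate the two sets of residuals.
rng = np.random.default_rng(2)
n = 300
X = rng.normal(size=(n, 2))                 # predictors already in the model
z = 0.5 * X[:, 0] + rng.normal(size=n)      # omitted predictor, correlated with X
y = X @ np.array([1.0, -1.0]) + 2.0 * z + rng.normal(0, 0.5, size=n)

def residualize(v, X):
    # Residuals of v after OLS regression on X plus an intercept
    D = np.column_stack([X, np.ones(len(v))])
    return v - D @ np.linalg.lstsq(D, v, rcond=None)[0]

e_y = residualize(y, X)    # errors of prediction from the original model
e_z = residualize(z, X)    # part of Z the original predictors cannot explain

# Plotting e_y against e_z is the added-variable plot; its slope recovers
# Z's unique coefficient (2.0 in this simulation)
slope = (e_z @ e_y) / (e_z @ e_z)
```

A clear trend in this residual-by-residual plot is exactly the evidence that a relevant variable has been omitted.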
Measurement error in IVs
• Validity (the correlation between predictor and outcome) is attenuated if the predictor is measured unreliably
• The disattenuated correlation between predictor and dependent variable is estimated by dividing by the square root of the reliability of the predictor (and of the dependent variable, if we want a construct estimate of the correlation)
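The disattenuation formula in one line, with illustrative numbers (not from the lecture):

```python
import math

def disattenuate(r_xy, rel_x, rel_y=1.0):
    # Classical correction for attenuation: r_true = r_xy / sqrt(rel_x * rel_y).
    # Pass rel_y < 1 only when a construct-level estimate is wanted.
    return r_xy / math.sqrt(rel_x * rel_y)

# Illustrative numbers: observed r = .42, predictor reliability .70,
# outcome reliability .80
r_pred_only = disattenuate(0.42, 0.70)        # corrects for predictor only
r_construct = disattenuate(0.42, 0.70, 0.80)  # corrects both sides
```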
Measurement error in IVs
• Structural Equation Modeling (SEM) correctly estimates the standard errors of the parameters under ML estimation; it assumes large samples (at least 100)
• OLS does not correctly adjust the standard error of the b-weight for the disattenuated correlation
Omitted relevant variable
Error of Initial Prediction by Error of Social Stress Prediction

[Scatterplot: Residual of Social Stress (x-axis, -5 to 10) against error of initial prediction (y-axis, -10 to 20), with an LLR Smoother fit. Predictors: Sensation Seeking, Locus of Control, Self Reliance]
Omitted relevant variable
Error of Initial Prediction by Error of Social Stress Prediction

[Same scatterplot with a Linear Regression fit: Unstandardized Residual = -0.00 + 0.25 * RES_2, R-Square = 0.01. Residual of Social Stress (x-axis, -5 to 10); predictors: Sensation Seeking, Locus of Control, Self Reliance]
Measurement Error
• Requires a Structural Equation Model approach to include measurement error in the predictors
– Requires an independent estimate of the reliability of each predictor, appropriate for the sample

[Path diagram: T1 and T2 predict T3 via paths β31 and β32; error terms e1 and e2 load on T1 and T2 with weights √(1 − Rel. T1) and √(1 − Rel. T2); e3 is the disturbance on T3]
MRA ASSUMPTIONS-3
• Homoscedasticity: the errors around the prediction hyperplane are assumed homogeneous across the hyperplane (think of normal distributions of errors sticking out above and below the hyperplane):

[Diagram: a regression surface for y over predictor values of x1 and x2, with normal error distributions of constant spread above and below it]
3. Nonhomogeneity of Variance
• If variances are unequal due to known groupings or clustering, each group's variance can be estimated separately in SEM to fit a regression model correctly
• If variances are changing linearly over a predictor (or set of predictors):
– Weighted least squares can be used
– Nonlinear modeling of the variance (e.g., a hierarchical linear model) can be conducted using SEM
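A weighted-least-squares sketch of the first remedy (simulated data; in practice the variance function must be estimated, not assumed known as it is here):

```python
import numpy as np

# Weighted least squares sketch: when the error SD grows with a predictor,
# weight each case by the inverse of its (assumed known) error variance.
rng = np.random.default_rng(3)
n = 500
x = rng.uniform(1, 10, size=n)
y = 2.0 + 1.5 * x + rng.normal(0, 1, size=n) * x   # error SD proportional to x

X = np.column_stack([np.ones(n), x])
w = 1.0 / x**2                                     # inverse-variance weights

# WLS normal equations: (X' W X) b = X' W y
W = np.diag(w)
b_wls = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]       # unweighted, for comparison
```

Both estimators are unbiased here, but WLS downweights the noisy high-x cases and so gives more precise coefficients and correct standard errors.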
Homoscedasticity of residuals: SPSS Regression: SAVE:
Unstandardized Predicted Values, Unstandardized Residuals
SPSS Graph: Interactive: Scatterplot: Fit: Smoother
Error of Initial Prediction by Error of Social Stress Prediction

[Two residual scatterplots of Unstandardized Residual (y-axis, -5 to 5) against Unstandardized Predicted Value (x-axis, 2 to 6): one with an LLR Smoother, one with the Linear Regression fit Unstandardized Residual = 0.00 + -0.00 * PRE_1, R-Square = 0.00]
[Error-bar plot: ANXIETY (y-axis, 30 to 60) by age (x-axis, 10 to 18); error bars show the 95.0% CI of the mean. The error bars suggest increasing variance with age, and poor estimates for ages 11 and 19]
Nonhomogeneous Variance Correction
• Levene's test and the Brown-Forsythe correction are available for ANOVA
• There is no comparable correction for regression; divide the predictor into 5 subgroups based on its distribution and use ANOVA
• Use Weighted Least Squares (see text)
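The 5-subgroup workaround can be sketched as follows: split the predictor into quintile groups and run a Brown-Forsythe-style test, i.e. an ANOVA on absolute deviations from each group's median (simulated data with variance growing in x):

```python
import numpy as np

# Brown-Forsythe idea: ANOVA on absolute deviations from each group's median.
rng = np.random.default_rng(4)
n = 500
x = rng.uniform(0, 10, size=n)
y = x + rng.normal(0, 1, size=n) * (1 + x)     # heteroscedastic outcome

# Divide the predictor into 5 subgroups based on its distribution
groups = np.digitize(x, np.quantile(x, [0.2, 0.4, 0.6, 0.8]))
devs = [np.abs(y[groups == g] - np.median(y[groups == g])) for g in range(5)]

# One-way ANOVA F statistic on the absolute deviations
all_d = np.concatenate(devs)
ss_between = sum(len(d) * (d.mean() - all_d.mean()) ** 2 for d in devs)
ss_within = sum(((d - d.mean()) ** 2).sum() for d in devs)
k, N = len(devs), len(all_d)
F = (ss_between / (k - 1)) / (ss_within / (N - k))
# A large F flags unequal variances across the subgroups
```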
Dependent Errors
• Clustering: data occur in groups (classrooms, clinics, etc.), so that errors are more similar within clusters than between clusters
• Autocorrelation: data are correlated within person over time (or between clustered pairs, triplets, etc.)
Dependent Errors
• Clustering becomes a category variable in the analysis through dummy coding; separate group models can be constructed through SEM, or the category variable can be included in the path model (which is better depends on whether variance is homogeneous)
Dependent Errors
• Autocorrelation is computed for time data by correlating pairs of data; each pair is the data at one time point with its successor, e.g. (x1, x2), (x2, x3), (x3, x4), etc.
• The regression of y on itself uses the autocorrelation r: y*t = yt − r·yt−1
• The new dependent variable y*t is then modeled in a regression
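The successive-pairs computation and the y* transformation, sketched on simulated AR(1) data with true autocorrelation 0.6:

```python
import numpy as np

# Lag-1 autocorrelation from successive pairs, then the transformed series
# y*_t = y_t - r * y_(t-1). Simulated AR(1) data.
rng = np.random.default_rng(5)
n = 1000
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.6 * y[t - 1] + rng.normal()

# Correlate each value with its successor: (y1, y2), (y2, y3), ...
r = np.corrcoef(y[:-1], y[1:])[0, 1]

# New dependent variable with the serial dependence removed
y_star = y[1:] - r * y[:-1]
r_star = np.corrcoef(y_star[:-1], y_star[1:])[0, 1]   # near zero afterwards
```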
Dependent Errors
• Means correlation among errors in sets of the data (e.g., some siblings within an adolescent BASC sample)
– Likely a problem only if such sets form a meaningful fraction of the total sample size
• Difficult to determine, or to separate from random pairings of errors
• Other data (cluster information) are needed, such as age for BASC Anxiety
Dependent Errors
• Time-related or longitudinal data may have autocorrelation (correlation of residuals over time); the Durbin-Watson test gives an omnibus test. An ARMA model is diagrammed below:

[Path diagram: Time1 → Time2 → Time3 → …, with error terms e2, e3, … on the later time points]
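The Durbin-Watson omnibus statistic is simple to compute directly from residuals (simulated here); d near 2 suggests independent errors, d well below 2 suggests positive autocorrelation:

```python
import numpy as np

# Durbin-Watson statistic: sum of squared successive differences of the
# residuals divided by their sum of squares.
def durbin_watson(e):
    e = np.asarray(e, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

rng = np.random.default_rng(6)
white = rng.normal(size=1000)                 # independent errors
ar = np.zeros(1000)
for t in range(1, 1000):
    ar[t] = 0.7 * ar[t - 1] + rng.normal()    # positively autocorrelated errors

d_white = durbin_watson(white)   # near 2
d_ar = durbin_watson(ar)         # well below 2, roughly 2 * (1 - 0.7)
```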
MRA ASSUMPTIONS-5
• NORMALITY OF RESIDUALS: violation of this assumption is not a problem
– unless SKEWNESS is severe (> ±4 or 5, maybe even larger) and
– KURTOSIS is severe (> 3); combinations of skewness and kurtosis at the edge of these values may cause problems
• Effects of violation:
– Type I error rates increase, sometimes greatly
– Estimates can be biased
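The skewness and kurtosis checks can be computed directly (the slide's kurtosis cutoff is read here as excess kurtosis, which is 0 for a normal distribution; simulated residuals):

```python
import numpy as np

# Sample skewness and excess kurtosis of residuals, for checking against
# the rough cutoffs on the slide.
def skewness(e):
    z = (e - e.mean()) / e.std()
    return np.mean(z ** 3)

def excess_kurtosis(e):
    z = (e - e.mean()) / e.std()
    return np.mean(z ** 4) - 3.0

rng = np.random.default_rng(7)
resid = rng.normal(size=5000)
s, k = skewness(resid), excess_kurtosis(resid)   # both near 0 for normal data
```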
Normality of residuals

[Histogram: Residual of Predicted Attitude to School (x-axis, -2.5 to 5), frequency (y-axis, 0 to 75)]
Q-Q Plot
• Plots the quantiles of a variable's distribution against the quantiles of any of a number of test distributions. Probability plots are generally used to determine whether the distribution of a variable matches a given distribution. If the selected variable matches the test distribution, the points cluster around a straight line.
• Available test distributions include beta, chi-square, exponential, gamma, half-normal, Laplace, logistic, lognormal, normal, Pareto, Student's t, Weibull, and uniform. Depending on the distribution selected, you can specify degrees of freedom and other parameters.
• You can obtain probability plots for transformed values. Transformation options include natural log, standardize values, difference, and seasonally difference.
• You can specify the method for calculating expected distributions, and for resolving "ties" (multiple observations with the same value).
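The Q-Q coordinates can be built by hand: sorted residuals against expected normal quantiles. The theoretical quantiles below come from a large simulated normal reference sample, an illustrative shortcut rather than the method SPSS uses; the residuals are simulated too:

```python
import numpy as np

# Q-Q plot coordinates: observed quantiles vs. expected normal quantiles.
rng = np.random.default_rng(8)
resid = rng.normal(50, 5, size=400)       # residuals to check for normality

p = (np.arange(1, len(resid) + 1) - 0.5) / len(resid)   # plotting positions
ref = rng.normal(size=200_000)
qq_x = np.quantile(ref, p)                # expected (theoretical) quantiles
qq_y = np.sort(resid)                     # observed quantiles

# For normal residuals the points fall near a straight line whose slope and
# intercept estimate the residual SD and mean
slope, intercept = np.polyfit(qq_x, qq_y, 1)
```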
Q-Q plot: SPSS GRAPH: Q-Q

[Normal Q-Q Plot of ATTITUDE TO SCHOOL residuals of prediction: Expected Value (y-axis, 40 to 70) against Observed Value (x-axis, 40 to 70)]
P-P plots
• Plots a variable's cumulative proportions against the cumulative proportions of any of a number of test distributions. Probability plots are generally used to determine whether the distribution of a variable matches a given distribution. If the selected variable matches the test distribution, the points cluster around a straight line.
• Available test distributions include beta, chi-square, exponential, gamma, half-normal, Laplace, logistic, lognormal, normal, Pareto, Student's t, Weibull, and uniform. Depending on the distribution selected, you can specify degrees of freedom and other parameters.
• You can obtain probability plots for transformed values. Transformation options include natural log, standardize values, difference, and seasonally difference.
• You can specify the method for calculating expected distributions, and for resolving "ties" (multiple observations with the same value).
P-P Plot: SPSS: Graph: P-P

[Normal P-P Plot of ATTITUDE TO SCHOOL residuals of prediction: Expected Cum Prob (y-axis, 0.0 to 1.0) against Observed Cum Prob (x-axis, 0.0 to 1.0)]