SW388R7 Data Analysis & Computers II Computing Transformations Slide 1 Transforming variables Transformations for normality Transformations for linearity.


SW388R7
Data Analysis &
Computers II
Computing Transformations
Slide 1
Transforming variables
Transformations for normality
Transformations for linearity
Slide 2
Transformations:
Transforming variables to satisfy assumptions


When a metric variable fails to satisfy the
assumption of normality, homogeneity of variance, or
linearity, we may be able to correct the deficiency
by using a transformation.
We will consider three transformations for normality,
homogeneity of variance, and linearity:

  the logarithmic transformation,
  the square root transformation, and
  the inverse transformation,

plus a fourth that may be useful for problems of linearity:

  the square transformation
Slide 3
Transformations change the measurement
scale
In the diagram to the right, the values of
5 through 20 are plotted on the different
scales used in the transformations. These
scales would be used in plotting the
horizontal axis of the histogram depicting
the distribution.
When comparing values measured on the
decimal scale to which we are
accustomed, we see that each
transformation changes the distance
between the benchmark measurements.
All of the transformations increase the
distance between small values and
decrease the distance between large
values. This has the effect of moving the
positively skewed values to the left,
reducing the effect of the skewing and
producing a distribution that more closely
resembles a normal distribution.
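The change in spacing can be checked numerically. The sketch below uses Python as a stand-in for SPSS (an assumption; the slides use SPSS only) and places the benchmark values 5, 10, 15, and 20 on each scale, then prints the distance between neighbouring values. The helper name `gaps` is hypothetical.

```python
import math

# Benchmark values on the original (decimal) scale.
values = [5, 10, 15, 20]

def gaps(points):
    """Distances between neighbouring points on a scale."""
    return [b - a for a, b in zip(points, points[1:])]

# The same benchmarks expressed on each transformed scale.
scales = {
    "original": values,
    "log10": [math.log10(v) for v in values],
    "square root": [math.sqrt(v) for v in values],
    "inverse (-1/x)": [-1 / v for v in values],
}

for name, points in scales.items():
    print(name, [round(g, 3) for g in gaps(points)])
```

On the original scale the gaps are equal; on each transformed scale the gap shrinks as the values get larger, which is what pulls a long right tail in toward the rest of the distribution.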
Transformations:
Computing transformations in SPSS
Slide 4


In SPSS, transformations are obtained by computing a
new variable. SPSS functions are available for the
logarithmic (LG10) and square root (SQRT)
transformations. There is no function for the inverse
transformation; it uses a formula which divides negative one
by the value for each case, so that the cases keep their
original order.
For each of these calculations, there may be data
values which are not mathematically permissible.
For example, the log of zero is not defined
mathematically, division by zero is not permitted,
and the square root of a negative number results in
an “imaginary” value. We will usually adjust the
values passed to the function to make certain that
these illegal operations do not occur.
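To see why the adjustment is needed, here is a small Python sketch (Python is an assumption used only for illustration; SPSS may report these cases differently). It attempts each illegal operation and then the adjusted version of the first one. The helper `try_call` is hypothetical.

```python
import math

def try_call(label, func):
    """Run func and report either its result or the error it raises."""
    try:
        return f"{label}: {func()}"
    except (ValueError, ZeroDivisionError) as err:
        return f"{label}: not permitted ({type(err).__name__})"

results = [
    try_call("log10(0)", lambda: math.log10(0)),          # log of zero
    try_call("-1 / 0", lambda: -1 / 0),                   # division by zero
    try_call("sqrt(-4)", lambda: math.sqrt(-4)),          # root of a negative
    try_call("log10(0 + 1)", lambda: math.log10(0 + 1)),  # adjusted argument
]

for line in results:
    print(line)
```

The first three calls fail, while the adjusted argument in the last call is valid.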
Transformations:
Two forms for computing transformations
Slide 5



There are two forms for each of the transformations
to induce normality, depending on whether the
distribution is skewed negatively to the left or
skewed positively to the right.
Both forms use the same SPSS functions and formula
to calculate the transformations.
The two forms differ in the value or argument passed
to the functions and formula. The argument to the
functions is an adjustment to the original value of
the variable to make certain that all of the
calculations are mathematically correct.
Transformations:
Functions and formulas for transformations
Slide 6

Symbolically, if we let x stand for the argument
passed to the function or formula, the calculations
for the transformations are:

  Logarithmic transformation: compute log = LG10(x)
  Square root transformation: compute sqrt = SQRT(x)
  Inverse transformation: compute inv = -1 / (x)
  Square transformation: compute s2 = x * x
For all transformations, the argument must be
greater than zero to guarantee that the calculations
are mathematically legitimate.
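For readers who want to check the arithmetic outside SPSS, the four calculations can be sketched as Python functions. This is an illustration only; the function names are hypothetical and x is assumed to be already adjusted so that it is greater than zero.

```python
import math

def log_transform(x):
    """Counterpart of: compute log = LG10(x)"""
    return math.log10(x)

def sqrt_transform(x):
    """Counterpart of: compute sqrt = SQRT(x)"""
    return math.sqrt(x)

def inverse_transform(x):
    """Counterpart of: compute inv = -1 / (x)"""
    return -1 / x

def square_transform(x):
    """Counterpart of: compute s2 = x * x"""
    return x * x

x = 100  # an argument that is already greater than zero
print(log_transform(x), sqrt_transform(x), inverse_transform(x), square_transform(x))
```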
Slide 7
Transformations:
Transformation of positively skewed variables



For positively skewed variables, the argument is an
adjustment to the original value based on the
minimum value for the variable.
If the minimum value for a variable is zero, the
adjustment requires that we add one to each value,
e.g. x + 1.
If the minimum value for a variable is a negative
number (e.g., –6), the adjustment requires that we
add the absolute value of the minimum value (e.g. 6)
plus one (e.g. x + 6 + 1, which equals x +7).
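The adjustment rule can be written as a short Python sketch (Python and the function name are assumptions for illustration). The slides do not say what to do when the minimum is already greater than zero; the sketch assumes no adjustment is needed in that case.

```python
def positive_skew_argument(values):
    """Adjust values so the smallest argument passed to a transformation is at least 1."""
    minimum = min(values)
    if minimum > 0:
        # Assumption: the slides do not cover this case; values are left unchanged.
        constant = 0
    else:
        # Minimum of zero -> add 1; negative minimum -> add |minimum| + 1.
        constant = abs(minimum) + 1
    return [v + constant for v in values]

print(positive_skew_argument([0, 2, 9]))   # minimum is 0, so 1 is added
print(positive_skew_argument([-6, 0, 4]))  # minimum is -6, so 7 is added
```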
Transformations:
Example of positively skewed variable
Slide 8



Suppose our dataset contains the number of books
read (books) for 5 subjects: 1, 3, 0, 5, and 2, and the
distribution is positively skewed.
The minimum value for the variable books is 0. The
adjustment for each case is books + 1.
The transformations would be calculated as follows:
  Compute logBooks = LG10(books + 1)
  Compute sqrBooks = SQRT(books + 1)
  Compute invBooks = -1 / (books + 1)
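The same three computations can be reproduced for the five cases in Python (an illustrative stand-in for the SPSS compute commands; the variable names are hypothetical):

```python
import math

books = [1, 3, 0, 5, 2]            # number of books read by each subject
adjusted = [b + 1 for b in books]  # minimum is 0, so the argument is books + 1

log_books = [math.log10(a) for a in adjusted]  # LG10(books + 1)
sqr_books = [math.sqrt(a) for a in adjusted]   # SQRT(books + 1)
inv_books = [-1 / a for a in adjusted]         # -1 / (books + 1)

for b, lg, sq, inv in zip(books, log_books, sqr_books, inv_books):
    print(f"books={b}  log={lg:.3f}  sqrt={sq:.3f}  inverse={inv:.3f}")
```

The case with zero books now yields a valid value for every transformation, because its argument is 1 rather than 0.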
Transformations:
Transformation of negatively skewed variables
Slide 9



If the distribution of a variable is negatively skewed,
the adjustment of the values reverses, or reflects,
the distribution so that it becomes positively skewed.
The transformations are then computed on the
values in the positively skewed distribution.
Reflection is computed by subtracting all of the
values for a variable from one plus the absolute
value of the maximum value for the variable. This results
in a positively skewed distribution with all values
larger than zero.
When an analysis uses a transformation involving
reflection, we must remember that this will reverse
the direction of all of the relationships in which the
variable is involved. Our interpretation of
relationships must be adjusted accordingly.
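Reflection can be sketched in Python as follows (an illustration only; the function name and the sample scores are hypothetical and do not come from the course dataset):

```python
def reflect(values):
    """Subtract each value from one plus the absolute value of the maximum."""
    constant = 1 + abs(max(values))
    return [constant - v for v in values]

scores = [2, 8, 9, 10, 10]  # hypothetical scores bunched at the high end
reflected = reflect(scores)
print(reflected)

# The case with the highest original score now has the lowest reflected score,
# which is why the direction of every relationship is reversed.
print(scores.index(max(scores)) == reflected.index(min(reflected)))
```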
Transformations:
Example of negatively skewed variable
Slide 10



Suppose our dataset contains the number of books
read (books) for 5 subjects: 1, 3, 0, 5, and 2, and the
distribution is negatively skewed.
The maximum value for the variable books is 5. The
adjustment for each case is 6 - books.
The transformations would be calculated as follows:
  Compute logBooks = LG10(6 - books)
  Compute sqrBooks = SQRT(6 - books)
  Compute invBooks = -1 / (6 - books)
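As a check on the reflected example, the computations can be repeated in Python (used here only as an illustrative stand-in for SPSS; the variable names are hypothetical):

```python
import math

books = [1, 3, 0, 5, 2]
constant = 1 + abs(max(books))            # 1 + 5 = 6
reflected = [constant - b for b in books]  # the argument 6 - books

log_books = [math.log10(r) for r in reflected]  # LG10(6 - books)
sqr_books = [math.sqrt(r) for r in reflected]   # SQRT(6 - books)
inv_books = [-1 / r for r in reflected]         # -1 / (6 - books)

for b, r, lg in zip(books, reflected, log_books):
    print(f"books={b}  reflected={r}  log={lg:.3f}")
```

The subject who read the most books (5) ends up with the smallest reflected value (1), so larger transformed values now mean fewer books read.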
Transformations:
The Square Transformation for Linearity
Slide 11



The square transformation is computed by
multiplying the value for the variable by itself.
It does not matter whether the distribution is
positively or negatively skewed.
It does matter if the variable has negative values,
since we would not be able to distinguish their
squares from the square of a comparable positive
value (e.g. the square of -4 is equal to the square of
+4). If the variable has negative values, we add the
absolute value of the minimum value to each score
before squaring it.
Transformations:
Example of the square transformation
Slide 12



Suppose our dataset contains change scores (chg) for
5 subjects that indicate the difference between test
scores at the end of a semester and test scores at
mid-term: -10, 0, 10, 20, and 30.
The minimum score is -10. The absolute value of the
minimum score is 10.
The transformation would be calculated as follows:
  Compute squarChg = (chg + 10) * (chg + 10)
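A short Python sketch (an illustration, not SPSS syntax; the variable names are hypothetical) shows both the problem and the fix for the change scores:

```python
chg = [-10, 0, 10, 20, 30]  # change scores from the example

# Without an adjustment, -10 and +10 cannot be told apart after squaring.
unadjusted = [c * c for c in chg]

# Add the absolute value of the minimum (10) before squaring.
shift = abs(min(chg))
squar_chg = [(c + shift) * (c + shift) for c in chg]

print(unadjusted)
print(squar_chg)
```

After the adjustment every case has a distinct squared value, and the cases keep their original order.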
Transformations:
Transformations for normality
Slide 13
Both the histogram and the normality plot for Total
Time Spent on the Internet (netime) indicate that the
variable is not normally distributed.
[Figure: Histogram of TOTAL TIME SPENT ON THE INTERNET (vertical axis: Frequency; Mean = 10.7, Std. Dev = 15.35, N = 93) and Normal Q-Q Plot of TOTAL TIME SPENT ON THE INTERNET (Expected Normal against Observed Value)]
Slide 14
Transformations:
Determine whether reflection is required
Descriptives: TOTAL TIME SPENT ON THE INTERNET

                                      Statistic   Std. Error
Mean                                      10.73         1.59
95% Confidence Interval for Mean
    Lower Bound                            7.57
    Upper Bound                           13.89
5% Trimmed Mean                            8.29
Median                                     5.50
Variance                                235.655
Std. Deviation                            15.35
Minimum                                       0
Maximum                                     102
Range                                       102
Interquartile Range                       10.20
Skewness                                  3.532         .250
Kurtosis                                 15.614         .495

Skewness, in the table of Descriptive Statistics,
indicates whether or not reflection (reversing the values)
is required in the transformation.
If Skewness is positive, as it is in this problem,
reflection is not required. If Skewness is negative,
reflection is required.
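The decision rule can be sketched in Python. The slides do not state which skewness formula SPSS reports; the sketch assumes the adjusted sample formula that statistical packages commonly use, and both function names are hypothetical.

```python
import math

def sample_skewness(values):
    """Adjusted sample skewness (assumed formula; needs at least 3 cases)."""
    n = len(values)
    mean = sum(values) / n
    # Sample standard deviation (divides by n - 1).
    s = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    third = sum((v - mean) ** 3 for v in values)
    return (n / ((n - 1) * (n - 2))) * third / s ** 3

def reflection_required(skewness):
    """Reflection is required only when skewness is negative."""
    return skewness < 0

print(reflection_required(3.532))  # the skewness reported for netime
print(reflection_required(sample_skewness([1, 17, 18, 19, 20])))  # long left tail
```

For the netime variable the reported skewness of 3.532 is positive, so the rule says no reflection is needed.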
Slide 15
Transformations:
Compute the adjustment to the argument
Descriptives: TOTAL TIME SPENT ON THE INTERNET

                                      Statistic   Std. Error
Mean                                      10.73         1.59
95% Confidence Interval for Mean
    Lower Bound                            7.57
    Upper Bound                           13.89
5% Trimmed Mean                            8.29
Median                                     5.50
Variance                                235.655
Std. Deviation                            15.35
Minimum                                       0
Maximum                                     102
Range                                       102
Interquartile Range                       10.20
Skewness                                  3.532         .250
Kurtosis                                 15.614         .495

In this problem, the minimum value is 0, so 1 will be
added to each value in the formula, i.e. the argument
to the SPSS functions and formula for the inverse will
be: netime + 1.
Slide 16
Transformations:
Computing the logarithmic transformation
To compute the transformation,
select the Compute… command
from the Transform menu.
Slide 17
Transformations:
Specifying the transform variable name and function
First, in the Target Variable text box, type a
name for the log transformation variable, e.g.
“lgnetime”.
Second, scroll down the list of functions to
find LG10, which calculates logarithmic
values using a base of 10. (The logarithmic
values are the power to which 10 is raised
to produce the original number.)
Third, click on the up arrow button to move the
highlighted function to the Numeric Expression
text box.
Slide 18
Transformations:
Adding the variable name to the function
First, scroll down the list of
variables to locate the
variable we want to transform.
Click on its name so that it is
highlighted.
Second, click on the right arrow
button. SPSS will replace the
highlighted text in the function (?)
with the name of the variable.
Slide 19
Transformations:
Adding the constant to the function
Following the rules stated for determining the constant
that needs to be included in the function either to
prevent mathematical errors, or to do reflection, we
include the constant in the function argument. In this
case, we add 1 to the netime variable.
Click on the OK
button to complete
the compute request.
Slide 20
Transformations:
The transformed variable
The transformed variable which we
requested SPSS compute is shown in the
data editor in a column to the right of the
other variables in the dataset.
Slide 21
Transformations:
Computing the square root transformation
To compute the transformation,
select the Compute… command
from the Transform menu.
Slide 22
Transformations:
Specifying the transform variable name and function
First, in the Target Variable text box, type a
name for the square root transformation
variable, e.g. “sqnetime”.
Second, scroll down the list of functions to
find SQRT, which calculates the square root
of a variable.
Third, click on the up arrow button to move the
highlighted function to the Numeric Expression
text box.
Slide 23
Transformations:
Adding the variable name to the function
First, scroll down the list of
variables to locate the
variable we want to transform.
Click on its name so that it is
highlighted.
Second, click on the right arrow
button. SPSS will replace the
highlighted text in the function (?)
with the name of the variable.
Slide 24
Transformations:
Adding the constant to the function
Following the rules stated for determining the constant
that needs to be included in the function either to
prevent mathematical errors, or to do reflection, we
include the constant in the function argument. In this
case, we add 1 to the netime variable.
Click on the OK
button to complete
the compute request.
Slide 25
Transformations:
The transformed variable
The transformed variable which we
requested SPSS compute is shown in the
data editor in a column to the right of the
other variables in the dataset.
Slide 26
Transformations:
Computing the inverse transformation
To compute the transformation,
select the Compute… command
from the Transform menu.
Slide 27
Transformations:
Specifying the transform variable name and formula
First, in the Target Variable text box, type a
name for the inverse transformation variable,
e.g. “innetime”.
Second, there is not a function for
computing the inverse, so we type
the formula directly into the
Numeric Expression text box.
Third, click on the OK button to
complete the compute request.
Slide 28
Transformations:
The transformed variable
The transformed variable which we
requested SPSS compute is shown in the
data editor in a column to the right of the
other variables in the dataset.
Slide 29
Transformations:
Adjustment to the argument for the square
transformation
It is mathematically correct to square a value of zero, so the
adjustment to the argument for the square transformation is
different. What we need to avoid are negative numbers,
since the square of a negative number produces the same
value as the square of a positive number.
Descriptives: TOTAL TIME SPENT ON THE INTERNET

                                      Statistic   Std. Error
Mean                                      10.73         1.59
95% Confidence Interval for Mean
    Lower Bound                            7.57
    Upper Bound                           13.89
5% Trimmed Mean                            8.29
Median                                     5.50
Variance                                235.655
Std. Deviation                            15.35
Minimum                                       0
Maximum                                     102
Range                                       102
Interquartile Range                       10.20
Skewness                                  3.532         .250
Kurtosis                                 15.614         .495

In this problem, the minimum value is 0, so no adjustment
is needed for computing the square. If the minimum
were a number less than zero, we would add the
absolute value of the minimum (dropping the sign) as
an adjustment to the variable.
Slide 30
Transformations:
Computing the square transformation
To compute the transformation,
select the Compute… command
from the Transform menu.
Slide 31
Transformations:
Specifying the transform variable name and formula
First, in the Target Variable text box, type a
name for the square transformation variable,
e.g. “s2netime”.
Second, there is not a function for
computing the square, so we type
the formula directly into the
Numeric Expression text box.
Third, click on the OK button to
complete the compute request.
Slide 32
Transformations:
The transformed variable
The transformed variable which we
requested SPSS compute is shown in the
data editor in a column to the right of the
other variables in the dataset.
Using the script to compute transformations
Slide 33
When the script tests
assumptions, it will create the
transformations that are
checked.
If you want to retain the transformed
variable to use in an analysis, clear the
checkbox that tells the script to delete the
transformed variables it created.
The transformed variables
Slide 34
The transformed variables are
added to the data editor. The
variable names attempt to
identify the transformation in
the variable name.
The variable labels fully
identify the transformation,
including the function and
formula used to compute it.
Which transformation to use
Slide 35
The recommendation of which transform to use is often summarized in a
pictorial chart like the above. In practice, it is difficult to determine which
distribution is most like your variable. It is often more efficient to compute
all transformations and examine the statistical properties of each.
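The "compute them all and compare" approach can be sketched in Python. This is an illustration under stated assumptions: the data are hypothetical (not the netime values), only skewness is compared, and the skewness formula is an assumed adjusted sample formula. In practice kurtosis, normality plots, and tests of normality would be examined as well.

```python
import math

def sample_skewness(values):
    """Adjusted sample skewness (assumed formula; needs at least 3 cases)."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    third = sum((v - mean) ** 3 for v in values)
    return (n / ((n - 1) * (n - 2))) * third / s ** 3

# Hypothetical positively skewed data; the minimum is 0, so the argument is value + 1.
hours = [0, 1, 1, 2, 2, 3, 4, 5, 8, 12, 20, 45]
adjusted = [h + 1 for h in hours]

candidates = {
    "original": hours,
    "logarithmic": [math.log10(a) for a in adjusted],
    "square root": [math.sqrt(a) for a in adjusted],
    "inverse": [-1 / a for a in adjusted],
}

skews = {name: sample_skewness(vals) for name, vals in candidates.items()}
for name, skew in skews.items():
    print(f"{name:12s} skewness = {skew:6.3f}")

# Pick the version whose skewness is closest to zero.
best = min(skews, key=lambda name: abs(skews[name]))
print("closest to symmetric:", best)
```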