Assumption and Data Transformation
Assumptions of ANOVA
The error terms are randomly, independently,
and normally distributed
The variances of different samples are
homogeneous
Variances and means of different samples are
not correlated
The main effects are additive
Randomly, Independently and
Normally Distributed Errors
Departures from the assumption of normality do not affect the
validity of the analysis of variance too seriously
There are tests for normality, but it is rather pointless to
apply them unless the number of samples we are dealing with is
fairly large
Independence implies that there is no relation between the size
of the error terms and the experimental grouping to which they
belong
It is important to avoid having all plots receiving a given
treatment occupying adjacent positions in the field
The best insurance against seriously violating the first
assumption of the analysis of variance is to carry out the
randomization appropriate to the particular design
Normality Tests
Shapiro-Wilk test
Lilliefors (Kolmogorov-Smirnov) test
Graphical methods based on residual error
(residual plots)
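The graphical approach can be sketched as follows: compute each residual as the observation minus its own treatment mean, then pair the sorted residuals with standard normal quantiles, as one would for a normal Q-Q plot. This is only an illustrative sketch for a one-way layout. The data, the function names, and the plotting positions (i - 0.5)/n are assumptions, not taken from the slides.

```python
from statistics import NormalDist, mean

def residuals(groups):
    """Residual of each observation from its own treatment mean (one-way layout)."""
    out = []
    for values in groups.values():
        m = mean(values)
        out.extend(v - m for v in values)
    return out

def normal_qq_pairs(resid):
    """Pair each sorted residual with the standard normal quantile for its rank."""
    n = len(resid)
    ordered = sorted(resid)
    # Plotting positions (i - 0.5) / n are one common convention among several.
    theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))

# Hypothetical data: three treatments, four plots each.
data = {
    "A": [20.1, 21.3, 19.8, 20.6],
    "B": [23.4, 22.9, 24.1, 23.0],
    "C": [18.2, 19.0, 17.5, 18.9],
}
pairs = normal_qq_pairs(residuals(data))
for quantile, resid in pairs:
    print(f"{quantile:6.2f}  {resid:6.2f}")
```

If the points fall roughly on a straight line when plotted, the residuals are consistent with a normal distribution; strong curvature suggests otherwise.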
Homogeneity of Variance
Unequal variances can have a marked effect on
the level of the test, especially if smaller sample
sizes are associated with groups having larger
variances
Unequal variances will lead to biased conclusions
Ways to Solve the Problem of
Heterogeneous Variances
We can separate the data into groups such that
the variances within each group are
homogeneous
We can use a more advanced statistical test rather than
the analysis of variance
We might be able to transform the data in such
a way that the variances will be homogeneous
Tests for Homogeneity of Variance
Hartley's F-max test
Bartlett’s test
Residual plot for checking the equal variance
assumption
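As a rough illustration of the two tests named above, the sketch below computes Hartley's F-max ratio (largest sample variance divided by smallest) and Bartlett's test statistic using only the standard library. The data and function names are hypothetical. No p-values are computed: F-max would be compared with a tabulated critical value (and assumes equal group sizes), while Bartlett's statistic is approximately chi-square with k - 1 degrees of freedom under equal variances.

```python
import math
from statistics import variance

def hartley_fmax(groups):
    """Ratio of the largest to the smallest sample variance."""
    variances = [variance(g) for g in groups]
    return max(variances) / min(variances)

def bartlett_statistic(groups):
    """Bartlett's statistic for k groups (approx. chi-square, k - 1 df)."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    variances = [variance(g) for g in groups]
    n_total = sum(sizes)
    # Pooled variance: within-group variances weighted by their degrees of freedom.
    pooled = sum((n - 1) * s2 for n, s2 in zip(sizes, variances)) / (n_total - k)
    numerator = (n_total - k) * math.log(pooled) - sum(
        (n - 1) * math.log(s2) for n, s2 in zip(sizes, variances)
    )
    correction = 1 + (sum(1 / (n - 1) for n in sizes) - 1 / (n_total - k)) / (3 * (k - 1))
    return numerator / correction

# Hypothetical samples: the second and third are more variable than the first.
samples = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 6.0, 8.0, 10.0],
    [1.0, 3.0, 5.0, 7.0, 9.0],
]
print("F-max:", hartley_fmax(samples))
print("Bartlett statistic:", bartlett_statistic(samples))
```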
Independence of Means and
Variance
A relation between means and variances is a special case and
the most common cause of heterogeneity of variance
A positive correlation between means and variances is
often encountered when there is a wide range of
sample means
Data that often show a relation between variances and
means are data based on counts and data consisting of
proportions or percentages
Transforming the data can frequently solve the problem
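A quick way to look for such a relation is to tabulate the mean and variance of each sample side by side. The sketch below does this for hypothetical count data; the group names and values are invented purely for illustration.

```python
from statistics import mean, variance

# Hypothetical counts from three groups with very different mean levels.
counts = {
    "low": [2, 4, 1, 3, 5],
    "medium": [12, 18, 9, 15, 21],
    "high": [40, 55, 31, 48, 66],
}

# Mean and sample variance for each group.
summary = {name: (mean(values), variance(values)) for name, values in counts.items()}

for name, (m, s2) in summary.items():
    print(f"{name:7s} mean={m:6.1f}  variance={s2:7.1f}")
```

When the variance climbs steadily with the mean, as it does in this made-up example, the independence of means and variances is doubtful and a transformation is worth considering.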
The Main Effects Are Additive
For each design, there is a mathematical model called a
linear additive model.
It means that the value of an experimental unit is made up
of a general mean plus main effects plus an error term
When the effects are not additive, the treatment effects
are multiplicative
In the case of multiplicative treatment effects, there are
again transformations that will change the data to fit the
additive model
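As an illustration, the additive model for a completely randomized design, and the way a logarithm turns a multiplicative model into an additive one, can be written as below. The symbols (Y for the observation, mu for the general mean, tau for the treatment effect, epsilon for the error term) are conventional notation assumed here, not taken from the slides.

```latex
% Additive model: observation j under treatment i
Y_{ij} = \mu + \tau_i + \varepsilon_{ij}

% Multiplicative treatment effects
Y_{ij} = \mu \, \tau_i \, \varepsilon_{ij}

% Taking logarithms restores additivity
\log Y_{ij} = \log \mu + \log \tau_i + \log \varepsilon_{ij}
```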
Data Transformation
There are two ways in which the ANOVA assumptions can be
violated:
1. Data may consist of measurements on an ordinal or a nominal
scale
2. Data may not satisfy at least one of the four requirements
Two options are available to analyze such data:
1. Use non-parametric data analysis
2. Transform the data before analysis
Logarithmic Transformation
It is used when the standard deviations of samples are
roughly proportional to the means
There is evidence of multiplicative rather than additive effects
Data with negative values or zero cannot be transformed directly.
When the data contain zeros, it is suggested to add 1 to each
value before transformation
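A minimal sketch of the logarithmic transformation follows. The function name, the `add_one` flag, the choice of base 10, and the data are all assumptions made for illustration. Note that adding 1 only rescues zeros; values of -1 or below still cannot be transformed.

```python
import math
from statistics import stdev

def log_transform(values, add_one=False):
    """Base-10 log of each value, optionally after adding 1 to handle zeros."""
    shift = 1 if add_one else 0
    if any(v + shift <= 0 for v in values):
        raise ValueError("log transform needs strictly positive values after shifting")
    return [math.log10(v + shift) for v in values]

# Hypothetical samples whose standard deviation is proportional to the mean.
small = [8.0, 10.0, 12.0]
large = [80.0, 100.0, 120.0]

print("raw sd:", stdev(small), stdev(large))
print("log sd:", stdev(log_transform(small)), stdev(log_transform(large)))

# Data containing a zero need the add-one shift.
print(log_transform([0, 9], add_one=True))
```

In this made-up example the raw standard deviations differ tenfold, while on the log scale they coincide, which is the behaviour the transformation is meant to produce.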
Square Root Transformation
It is used when we are dealing with counts of rare events
The data tend to follow a Poisson distribution
If there are counts less than 10, it is better to add 0.5 to
each value before taking the square root
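The rule above might be sketched as follows. The function name and the decision to apply the 0.5 offset to the whole sample whenever any count is below 10 are assumptions about how the rule would be implemented; the counts are hypothetical.

```python
import math

def sqrt_transform(counts):
    """Square-root transform; add 0.5 to every value if any count is below 10."""
    offset = 0.5 if min(counts) < 10 else 0.0
    return [math.sqrt(c + offset) for c in counts]

# Hypothetical counts of a rare event, including a zero.
rare = [0, 3, 8, 2, 5]
print(sqrt_transform(rare))

# Larger counts are transformed without the offset.
print(sqrt_transform([16, 25, 36]))
```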
Arcsine or Angular
Transformation
It is used when we are dealing with counts expressed as
percentages or proportions of the total sample
Such data generally have a binomial distribution
Such data normally show typical characteristics in which
the variances are related to the means
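The slides do not spell out the formula. The sketch below assumes the usual form of the angular transformation, the arcsine of the square root of the proportion, reported in degrees. The function name and the percentages are hypothetical.

```python
import math

def arcsine_transform(percentages):
    """Angular transformation: arcsin(sqrt(p)) in degrees, with p = percentage / 100."""
    transformed = []
    for pct in percentages:
        if not 0 <= pct <= 100:
            raise ValueError("percentages must lie between 0 and 100")
        transformed.append(math.degrees(math.asin(math.sqrt(pct / 100))))
    return transformed

# Hypothetical germination percentages.
germination = [0, 5, 25, 50, 75, 95, 100]
for pct, angle in zip(germination, arcsine_transform(germination)):
    print(f"{pct:5.1f}%  ->  {angle:5.1f} degrees")
```

The transformation stretches the scale near 0% and 100%, where binomial variances are smallest, and changes it least around 50%.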