
So far...

• We have been estimating differences caused by application of various treatments, and determining the probability that an observed difference was due to chance
• The presence of interactions may indicate that two or more treatment factors have a joint effect on a response variable
• But we have not learned anything about how two (or more) variables are related

Types of Variables in Crop Experiments

• Treatments, such as fertilizer rates, varieties, and weed control methods, which are the primary focus of the experiment
• Environmental factors, such as rainfall and solar radiation, which are not within the researcher's control
• Responses, which represent the biological and physical features of the experimental units that are expected to be affected by the treatments being tested

What is Regression?

• The way that one variable is related to another
• As you change one, how are others affected?

[Example plot: yield vs. grain protein %]

• May want to:
  – Develop and test a model for a biological system
  – Predict the values of one variable from another

Usual associations within ANOVA...

• Agronomic experiments frequently consist of different levels of one or more quantitative variables:
  – Varying amounts of fertilizer
  – Several different row spacings
  – Two or more depths of seeding
• Would be useful to develop an equation to describe the relationship between plant response and treatment level
  – the response could then be specified not only for the treatment levels actually tested, but for all intermediate points within the range of those treatments
• Simplest form of response is a straight line

Fitting the Linear Regression Model

[Figure: wheat yield (Y) plotted against applied N level (X) for observations (X1, Y1) … (X4, Y4)]

$Y = \beta_0 + \beta_1 X + \varepsilon$

where:
– $Y$ = wheat yield
– $X$ = nitrogen level
– $\beta_0$ = yield with no N
– $\beta_1$ = change in yield per unit of applied N
– $\varepsilon$ = random error

• Choose a line that minimizes the deviation of observed values from the line (predicted values)
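As a minimal sketch of this idea in Python: fit the least-squares line, which is exactly the line that minimizes the squared deviations of observed values from predicted values. All data values here are invented for illustration.

```python
# Minimal sketch: fit Y = b0 + b1*X by least squares.
# All data values are hypothetical.
import numpy as np

X = np.array([0.0, 40.0, 80.0, 120.0])   # applied N levels (hypothetical)
Y = np.array([2.1, 3.0, 3.8, 4.1])       # wheat yields (hypothetical)

b1, b0 = np.polyfit(X, Y, deg=1)         # returns highest power first
resid = Y - (b0 + b1 * X)                # the deviations the fit minimizes
print(f"Yhat = {b0:.3f} + {b1:.4f}*X, residual SS = {np.sum(resid**2):.4f}")
```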

Types of regression models

• Model I
  – Values of the independent variable X are controlled by the experimenter
  – Assumed to be measured without error
  – We measure the response of the dependent variable Y to changes in X
• Model II
  – Both the X and the Y variables are measured and subject to error (e.g., in an observational study)
  – Either variable could be considered as the independent variable; the choice depends on the context of the experiment
  – Often interested in correlations between variables
  – May be descriptive, but might not be reliable for prediction

Sums of Squares due to Regression

$\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X = a + bX$

• Because the line passes through $(\bar{X}, \bar{Y})$:

$\hat{Y} = \bar{Y} - b\bar{X} + bX$

$b = \frac{\sum_j (X_j - \bar{X})(Y_j - \bar{Y})}{\sum_j (X_j - \bar{X})^2} = \frac{SCP_{XY}}{SS_X}$

$SSR = \frac{\left[\sum_j (X_j - \bar{X})(Y_j - \bar{Y})\right]^2}{\sum_j (X_j - \bar{X})^2}$
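A small sketch of these definitional sums in Python, reusing the hypothetical data from the fitting slide; the slope matches what np.polyfit returns.

```python
# Slope and SS due to regression from the definitional sums:
# b = SCP_XY / SS_X and SSR = SCP_XY^2 / SS_X. Data are hypothetical.
import numpy as np

X = np.array([0.0, 40.0, 80.0, 120.0])
Y = np.array([2.1, 3.0, 3.8, 4.1])

scp_xy = np.sum((X - X.mean()) * (Y - Y.mean()))  # sum of cross products
ss_x   = np.sum((X - X.mean()) ** 2)              # sum of squares for X
b   = scp_xy / ss_x                               # slope
ssr = scp_xy ** 2 / ss_x                          # SS due to regression, 1 df
print(b, ssr)
```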

Partitioning SST

Sums of Squares for Treatments (SST) contains:

– SS_LIN = sum of squares associated with the linear regression of Y on X (with 1 df)
– SS_LOF = sum of squares for the failure of the regression model to describe the relationship between Y and X (lack of fit) (with t−2 df)

One way:

• Find a set of coefficients that define a linear contrast
  – use the deviations of the treatment levels from the mean level of all treatments
  – so that $k_j = X_j - \bar{X}$
• Therefore $L_{LIN} = \sum_j (X_j - \bar{X}) \, Y_j$
• The sum of the coefficients will be zero, satisfying the definition of a contrast

Computing SS_LIN

• $SS_{LIN} = r \, L_{LIN}^2 \,/\, \sum_j (X_j - \bar{X})^2$
  – really no different from any other contrast
  – df is always 1
• SS_LOF (sum of squares for lack of fit) is computed by subtraction: SS_LOF = SST − SS_LIN (df is df for treatments − 1)
• Not to be confused with SSE, which is still the SS for pure error (experimental error)
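A sketch of both computations, with all values (treatment means, r, SST) as hypothetical placeholders:

```python
# Sketch of SS_LIN (a 1-df contrast) and SS_LOF (by subtraction).
# Treatment means, r, and SST below are hypothetical placeholders.
import numpy as np

levels = np.array([10.0, 30.0, 50.0, 70.0, 90.0])  # N levels
means  = np.array([20.1, 25.4, 28.9, 30.2, 30.0])  # treatment means
r, sst = 4, 250.0                                  # reps, SS for treatments

k = levels - levels.mean()               # linear contrast coefficients
L_lin  = np.sum(k * means)               # contrast value
ss_lin = r * L_lin**2 / np.sum(k**2)     # 1 df
ss_lof = sst - ss_lin                    # df = t - 2
print(ss_lin, ss_lof)
```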

F Ratios and their meaning

• All F ratios have MSE as the denominator
• F_T = MST/MSE
  – tests the significance of differences among the treatment means
• F_LIN = MS_LIN/MSE
  – tests H₀: no linear relationship between X and Y ($\beta_1 = 0$)
  – against Hₐ: there is a linear relationship between X and Y ($\beta_1 \neq 0$)
• F_LOF = MS_LOF/MSE
  – tests H₀: the simple linear regression model describes the data, $E(Y) = \beta_0 + \beta_1 X$
  – against Hₐ: there is significant deviation from a linear relationship between X and Y, $E(Y) \neq \beta_0 + \beta_1 X$
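A sketch of the two regression-related F tests with SciPy; the mean squares and degrees of freedom are illustrative stand-ins, not values from a real analysis.

```python
# Sketch of the F tests (mean squares are illustrative only).
from scipy import stats

mse, df_error  = 3.55, 10
ms_lin         = 651.47     # 1 df
ms_lof, df_lof = 87.36, 3

F_lin = ms_lin / mse                        # H0: beta_1 = 0
F_lof = ms_lof / mse                        # H0: E(Y) = beta_0 + beta_1*X
print(stats.f.sf(F_lin, 1, df_error))       # p-value, linear term
print(stats.f.sf(F_lof, df_lof, df_error))  # p-value, lack of fit
```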

The linear relationship

• The expected value of Y given X is described by the equation:

$E(Y_j) = \bar{Y} + b_1 (X_j - \bar{X})$

where:
– $\bar{Y}$ = grand mean of Y
– $X_j$ = value of X (treatment level) at which Y is estimated

$L_{LIN} = \sum_j (X_j - \bar{X}) \, Y_j \qquad SS_{LIN} = \frac{r \, L_{LIN}^2}{\sum_j (X_j - \bar{X})^2} \qquad b_1 = \frac{L_{LIN}}{\sum_j (X_j - \bar{X})^2}$

Orthogonal Polynomials

• If the relationship is not linear, we can simplify curve fitting within the ANOVA with the use of orthogonal polynomial coefficients, under these conditions:
  – equal replication
  – the levels of the treatment variable must be equally spaced
    • e.g., 20, 40, 60, 80, 100 kg of fertilizer per plot

Curve fitting

• Model: $E(Y) = \beta_0 + \beta_1 X + \beta_2 X^2 + \beta_3 X^3 + \ldots$
• Determine the coefficients for 2nd order and higher polynomials from a table
• Use the F ratio to test the significance of each contrast

• Unless there is prior reason to believe that the equation is of a particular order, it is customary to fit the terms sequentially
• Include all terms in the equation up to and including the term at which lack of fit first becomes nonsignificant (see the sketch below)
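A sketch of that sequential strategy, using the 5-level orthogonal polynomial coefficients; the treatment means, r, MSE, and error df are all hypothetical placeholders.

```python
# Sketch of sequential fitting: test each polynomial term in turn and
# stop once lack of fit is nonsignificant. All numbers are hypothetical.
import numpy as np
from scipy import stats

coef = {  # orthogonal polynomial coefficients for 5 equally spaced levels
    "linear":    np.array([-2, -1,  0,  1,  2]),
    "quadratic": np.array([ 2, -1, -2, -1,  2]),
    "cubic":     np.array([-1,  2,  0, -2,  1]),
}
means = np.array([20.1, 25.4, 28.9, 30.2, 30.0])  # hypothetical means
r, mse, df_error = 4, 1.2, 15                     # hypothetical error line

ss_remaining = r * np.sum((means - means.mean())**2)  # SS treatments
df_lof = len(means) - 1
for name, k in coef.items():
    ss = r * np.sum(k * means)**2 / np.sum(k**2)      # 1-df contrast SS
    ss_remaining -= ss
    df_lof -= 1
    F = (ss_remaining / df_lof) / mse                 # lack-of-fit test
    p = stats.f.sf(F, df_lof, df_error)
    print(f"after {name}: LOF F = {F:.2f} (p = {p:.3f})")
```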

Table of coefficients

Where do linear contrast coefficients come from? (revisited)

• $L_{LIN} = \sum_j (X_j - \bar{X}) \, Y_j$
• Assume 5 nitrogen levels: 30, 60, 90, 120, 150
  – $\bar{X} = 90$, so k₁ = (−60, −30, 0, 30, 60)
• If we code the treatments as 1, 2, 3, 4, 5
  – $\bar{x} = 3$, so k₁ = (−2, −1, 0, 1, 2)
• $b_1 = L_{LIN} \,/\, [r \sum_j (x_j - \bar{x})^2]$ on the coded scale, but must be decoded back to the original scale
  – the coded coefficients are $k_1 = (X - \bar{X})/d$, where d is the spacing between treatment levels (here d = 30)
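A quick numerical check of this coding, using the nitrogen levels above:

```python
# Deviations on the original scale are the coded coefficients times d = 30.
import numpy as np

X = np.array([30, 60, 90, 120, 150])
print(X - X.mean())              # [-60. -30.   0.  30.  60.]
x = np.array([1, 2, 3, 4, 5])    # coded treatment levels
print(x - x.mean())              # [-2. -1.  0.  1.  2.]
print((X - X.mean()) / 30)       # identical to the coded coefficients
```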

Consider an experiment

• Five levels of N (10, 30, 50, 70, 90) with four replications
• Linear contrast
  – $L_{LIN} = (-2)Y_1 + (-1)Y_2 + (0)Y_3 + (1)Y_4 + (2)Y_5$
  – $SS_{LIN} = r \, L_{LIN}^2 / \sum_j k_j^2 = 4 \, L_{LIN}^2 / 10$
• Quadratic
  – $L_{QUAD} = (2)Y_1 + (-1)Y_2 + (-2)Y_3 + (-1)Y_4 + (2)Y_5$
  – $SS_{QUAD} = 4 \, L_{QUAD}^2 / 14$

LOF still significant? Keep going…

• Cubic
  – $L_{CUB} = (-1)Y_1 + (2)Y_2 + (0)Y_3 + (-2)Y_4 + (1)Y_5$
  – $SS_{CUB} = 4 \, L_{CUB}^2 / 10$
• Quartic
  – $L_{QUAR} = (1)Y_1 + (-4)Y_2 + (6)Y_3 + (-4)Y_4 + (1)Y_5$
  – $SS_{QUAR} = 4 \, L_{QUAR}^2 / 70$
• Each contrast has 1 degree of freedom
• Each F has MSE in the denominator
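A quick check that these four contrast vectors really are mutually orthogonal, and that the divisors in the SS formulas are just the sums of squared coefficients:

```python
# Off-diagonal entries of k @ k.T are zero (orthogonal contrasts); the
# diagonal holds the divisors 10, 14, 10, 70 used in the SS formulas.
import numpy as np

k = np.array([
    [-2, -1,  0,  1,  2],   # linear
    [ 2, -1, -2, -1,  2],   # quadratic
    [-1,  2,  0, -2,  1],   # cubic
    [ 1, -4,  6, -4,  1],   # quartic
])
print(k @ k.T)
```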

Numerical Example

• An experiment to determine the effect of nitrogen on the yield of sugarbeet roots:
  – RBD with three blocks
  – 5 levels of N (0, 35, 70, 105, and 140 kg/ha)
• Meets the criteria
  – N is a quantitative variable
  – levels are equally spaced
  – equally replicated
• Significant SST, so we go to contrasts

Orthogonal Partition of SST

N level (kg/ha):        0      35      70     105     140
Mean yield:           28.4    66.8    87.0    92.0    85.7

Order       Coefficients (k_j)       L_i      Σ_j k_j²   SS(L)_i
Linear      −2  −1   0  +1  +2      46.60       10       651.4680
Quadratic   +2  −1  −2  −1  +2     −34.87       14       260.5038
Cubic       −1  +2   0  −2  +1       2.30       10         1.5870
Quartic     +1  −4  +6  −4  +1       0.30       70         0.0039

Sequential Test of Nitrogen Effects

Source          df      SS          MS          F
(1) Nitrogen     4   913.5627    228.3907     64.41**
(2) Linear       1   651.4680    651.4680    183.73**
    Dev (LOF)    3   262.0947     87.3649     24.64**
(3) Quadratic    1   260.5038    260.5038     73.47**
    Dev (LOF)    2     1.5909      0.7955      0.22 ns
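The Dev (LOF) rows follow by simple subtraction from the SS values in the table:

```python
# Arithmetic behind the Dev (LOF) rows, using the table's SS values.
ss_nitrogen, ss_linear, ss_quadratic = 913.5627, 651.4680, 260.5038

print(ss_nitrogen - ss_linear)                 # 262.0947 (3 df, after linear)
print(ss_nitrogen - ss_linear - ss_quadratic)  # 1.5909   (2 df, after quadratic)
```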

Choose a quadratic model

– First point at which the LOF is not significant
– Implies that a cubic term would not be significant

Regression Equation

• $b_i = L_{REG} \,/\, \sum_j k_j^2$

Coefficient:   b₀ = 23.99    b₁ = 4.66    b₂ = −2.49

• Useful for prediction:

$\hat{Y}_j = 23.99 + 4.66 \, k_{1j} - 2.49 \, k_{2j}$

(for example, at 0 kg N/ha, the tabled coefficients are $k_1 = -2$ and $k_2 = +2$)

• To scale to the original X values:

$k_1 = \lambda_1 \left( \frac{X - \bar{X}}{d} \right) \qquad k_2 = \lambda_2 \left[ \left( \frac{X - \bar{X}}{d} \right)^2 - \frac{t^2 - 1}{12} \right]$

where $d$ = spacing between levels, $t$ = number of levels, and $\lambda_1 = \lambda_2 = 1$ here

Easier way

1) Use contrasts to find the best model and estimate pure error
2) Get the equation from a graph or from regression analysis
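The regression-analysis step, sketched with NumPy: fit a quadratic directly to the example's treatment means on the original N scale (note the coefficients come out on the original scale, unlike the coded ones above).

```python
# Fit a quadratic to the sugarbeet treatment means from the example.
import numpy as np

N     = np.array([0.0, 35.0, 70.0, 105.0, 140.0])
means = np.array([28.4, 66.8, 87.0, 92.0, 85.7])

c2, c1, c0 = np.polyfit(N, means, deg=2)  # highest power first
print(f"Yhat = {c0:.2f} + {c1:.3f}*N + {c2:.5f}*N^2")
```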

Common misuse of regression...

• Broad generalization
  – Extrapolating the result of a regression line outside the range of X values tested
  – Don't go beyond the highest nitrogen rate tested, for example
  – Or don't generalize over all varieties when you have just tested one
• Do not over-interpret higher order polynomials
  – with t−1 df, they will explain all of the variation among treatments, whether there is any meaningful pattern to the data or not

Class vs nonclass variables

• General linear model in matrix notation:

$Y = X\beta + \varepsilon$

• X is the design matrix
  – Assume a CRD with 3 fertilizer treatments, 2 replications
  – One column is dropped; it provides no additional information (it is a linear combination of the other columns)

ANOVA (class variables):

  1   x₁  x₂  x₃
  1    1   0   0
  1    1   0   0
  1    0   1   0
  1    0   1   0
  1    0   0   1
  1    0   0   1

Orthogonal polynomials:          Regression (continuous variables):

  1   L₁  L₂                       b₀    x     x²
  1   −1  +1                        1   30    900
  1   −1  +1                        1   30    900
  1    0  −2                        1   60   3600
  1    0  −2                        1   60   3600
  1   +1  +1                        1   90   8100
  1   +1  +1                        1   90   8100
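A sketch of building these three codings in NumPy for the same six plots (the fertilizer amounts 30, 60, 90 follow the regression matrix above):

```python
# Three design-matrix codings for a CRD with 3 treatments x 2 reps.
import numpy as np

trt = np.repeat([0, 1, 2], 2)            # treatment index for the 6 plots

# ANOVA: intercept plus one 0/1 indicator column per treatment
X_anova = np.column_stack(
    [np.ones(6)] + [(trt == i).astype(float) for i in range(3)])

# Orthogonal polynomials: intercept, linear (L1), quadratic (L2)
X_poly = np.column_stack(
    [np.ones(6), np.repeat([-1, 0, 1], 2), np.repeat([1, -2, 1], 2)])

# Regression: intercept, x, and x^2 on the original scale
x = np.repeat([30.0, 60.0, 90.0], 2)
X_reg = np.column_stack([np.ones(6), x, x**2])

print(X_anova, X_poly, X_reg, sep="\n\n")
```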