Accounting for heterogeneous variances in beef cattle breeding


Slides available at http://www.msu.edu/~tempelma/nbcec1.pdf
Accounting for heterogeneous variances
(heteroskedasticity) in genetic evaluations
National Animal Breeding Seminar Series
Fall Semester 2004
Robert J. Tempelman
Michigan State University
A typical genetic evaluation model for
postweaning gain (PWG)
y = X1b1 + X2b2 + Z1u1 + Z2u2 + e
Fixed effects
– Non-genetic effects: b1 (age of dam, length of PW period, calf sex)
– Genetic effects: b2 (Breed and dominance and recombination loss effects)
Random effects
– Random contemporary group effects: u1
  Var(u1) -> autoregressive ys within herds or NIID
– Random additive genetic effects: u2
  Var(u2) -> function of one or more (multibreed) components
In compact form: y = Xb + Zu + e
7/18/2015
Homoskedastic error models
$e \sim N(0, I\sigma_e^2)$
A common $\sigma_e^2$ across environments, factors, etc. may not be a suitable
assumption.
Example of Heterogeneous
Variances
• Garrick et al. (1989)
– Separate genetic ($\sigma_g^2$) and residual ($\sigma_e^2$) variances estimated by
%Simmental and sex for postweaning gain.
[Figure: genetic ($\sigma_g^2$) and residual ($\sigma_e^2$) variance estimates]
Structural (mixed effects) modeling
of variances (Foulley et al., 1992)
– model residual and genetic variances as a
function of fixed and random effects
– Example: Consider the residual variance $\sigma^2_{e_{jk}}$
unique to fixed calf sex j and random CG k.
Log linear "mixed effects" model on log variance:
$\log \sigma^2_{e_{jk}} = \text{baseline} + \text{sex}_j + \text{cg}_k$; $\text{cg}_k \sim$ IID with mean 0
Antilog both sides (multiplicative model):
$\sigma^2_{e_{jk}} = e^{\text{baseline} + \text{sex}_j}\, e^{\text{cg}_k} = \sigma^2_{e_j} v_k$; $v_k \sim$ IID with mean 1 and squared coefficient of variation $CV^2$, a function of the variance of $\text{cg}_k$
First known application of structural
variance model to beef cattle data
• San Cristobal et al. (1993) analyzing
muscular development scores in French
Maine Anjou cattle
– Scored on 0 to 100 scale.
• Considered structural variance model on
both residual AND genetic variances.
– Effects considered:
• classifier (random), condition score (fixed), year
(random), month (random) for residual variance
• Sex for genetic variance
Representative results from San
Cristobal (multiplicative scale)

Factor            Level   Estimate
Baseline          1       97.57
Classifier        1       1.17
                  2       1.07
                  3       1.06
Condition Score   1       1
                  2       0.74
                  3       0.65
Year              1       0.98
                  2       1.14
Month             1       0.94
                  2       1.02
                  3       1.00

For example, an animal evaluated by Classifier 2 with condition score 2,
born in year 1 and month 2, has a residual variance of:
97.57 x 1.07 x 0.74 x 0.98 x 1.02 = 77.23
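The multiplicative example from San Cristobal et al. (1993) can be checked with a few lines of Python. This is only a sketch: the dictionary layout and the names `baseline`, `factors` and `residual_variance` are hypothetical, while the numbers are the estimates shown on the slide.

```python
from math import prod

# Estimates on the multiplicative scale, as listed on the slide.
baseline = 97.57
factors = {
    "classifier": {1: 1.17, 2: 1.07, 3: 1.06},
    "condition_score": {1: 1.00, 2: 0.74, 3: 0.65},
    "year": {1: 0.98, 2: 1.14},
    "month": {1: 0.94, 2: 1.02, 3: 1.00},
}

def residual_variance(levels):
    """Baseline variance times the multiplier of each factor level."""
    return baseline * prod(factors[name][level] for name, level in levels.items())

example = residual_variance(
    {"classifier": 2, "condition_score": 2, "year": 1, "month": 2}
)
print(round(example, 2))  # 77.23
```

On the log scale the same calculation is a sum of effects, which is why the model is called log linear.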
The underlying model for calving
ease (1-5 scale)
1 = Unassisted calving
5 = Caesarean section
[Figure: distribution of the underlying liability (l), divided into categories 1 to 5; colored areas = probability of occurrence]
Heterogeneous variances for
calving ease (CE)?
• Genetic evaluations based on threshold mixed
effects model.
– Underlying liability (l) is typically modeled as a function
of fixed (e.g. calf sex) and random effects (herd-year-season) + IID residual (e); i.e.
$l_i = x_i'\beta + z_i'u + e_i$; $\mathrm{var}(e_i) = 1$, $i = 1, 2, \ldots, n$.
– Heteroskedastic theory provided by Foulley and
Gianola (1996)
• Demonstrated that statistically significant calf sex by age of
dam interactions for CE in homoskedastic error threshold
models may be an artifact of heterogeneous residual variances
ALLOWING FOR HETEROGENEOUS RESIDUAL
VARIANCES IN THRESHOLD MODELS
[Figure: liability distributions with categories 1 to 5]
Note how the probability of extreme outcomes particularly depends on
residual variance.
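A small Python sketch can illustrate why extreme categories are sensitive to residual variance in a threshold model. The thresholds, the liability mean and the two residual variances below are hypothetical values chosen for illustration; they are not estimates from any of the cited evaluations.

```python
from math import erf, sqrt

def normal_cdf(x, mean=0.0, sd=1.0):
    """Cumulative distribution function of a normal variate."""
    return 0.5 * (1.0 + erf((x - mean) / (sd * sqrt(2.0))))

def category_probs(thresholds, mean, residual_var):
    """Probabilities of the ordered categories cut out by the thresholds."""
    sd = sqrt(residual_var)
    cumulative = [normal_cdf(t, mean, sd) for t in thresholds] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)
        previous = c
    return probs

# Hypothetical thresholds separating calving ease scores 1 to 5.
thresholds = [0.0, 1.0, 1.8, 2.5]
p_low = category_probs(thresholds, mean=0.0, residual_var=1.0)
p_high = category_probs(thresholds, mean=0.0, residual_var=1.5)

for score, (a, b) in enumerate(zip(p_low, p_high), start=1):
    print(f"score {score}: var=1.0 -> {a:.4f}, var=1.5 -> {b:.4f}")
```

With the same mean liability, the larger residual variance puts noticeably more probability on score 5, the rarest category.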
Genetic evaluations accounting for
calving ease
• French Holstein, Normande, and Montbeliarde
breeds (Ducrocq, 2000)
– Heteroskedasticity is breed dependent:
– ~15% lower residual variance in winter versus
summer.
– Larger residual variance (1.07-1.18x) for male calves.
• Italian Holsteins (Canavesi et al., 2003)
– Larger residual variance (1.03) for males
– Regional differences for residual variance
• Both evaluations only consider fixed effects
models for residual variances
Fixed and random effects for log residual variances in
threshold models for calving ease
Kizilkaya and Tempelman (2005; GSE)
First parity Italian Piedmontese cattle

Parameter                                  Linear Mixed Model Analysis   Threshold Mixed Model
                                           of Birth Weights              Analysis of Calving Ease
                                           Estimate ± SE                 Estimate ± SE
Sire Variance                              1.13 ± 0.20                   0.13 ± 0.02
MGS Variance                               0.50 ± 0.11                   0.02 ± 0.01
Sire-MGS covariance                        0.35 ± 0.11                   0.02 ± 0.01
CG variance                                1.68 ± 0.19                   0.13 ± 0.02
Male residual variance                     14.44 ± 1.03                  1.09 ± 0.09
Female residual variance                   10.19 ± 0.73                  0.71 ± 0.06
Sex difference in residual variances       4.26 ± 0.53                   0.38 ± 0.05
CV for herd-specific residual variances    0.60 ± 0.09                   0.74 ± 0.14

Fixed effects and Random effects for Residual Heteroskedasticity
[Figure: estimates and 95% credible sets for herd-specific variances for CE relative to baseline (1.0); CV = 0.74]
Note: Because a sire-MGS model was used, residual
heteroskedasticity may be partly genetic
Impact on calving ease EPDs?
[Figure: sire EPDs under heteroskedastic vs. homoskedastic error; $\sqrt{\hat{\sigma}^2_{Sire}} = 0.34$]
Impact of residual heteroskedasticity across CG on Sire
EPDs for birthweights
(Kizilkaya and Tempelman, 2005)
[Figure: sire EPDs for birthweights; CV = 0.60; $\sqrt{\hat{\sigma}^2_{sire}} = 1.06$; Herd 66 and Sire A highlighted]
All of Sire A's progeny were from Herd 66
Implications of ranking herds for product uniformity!
Multiple Breed Populations
• Might naturally expect heterogeneous
genetic variances (for different
breed groups and different levels of
heterozygosity)
Multibreed genetic modeling
• Additive model (Lo et al., 1993)
• For any individual j, its additive genetic
effect aj has variance:
$$\mathrm{Var}(a_j) = \sum_{b=1}^{B} f_b^{j}\,\sigma^2_{A_b} + 2\sum_{b=1}^{B-1}\sum_{b'>b}^{B}\left(f_b^{s_j} f_{b'}^{s_j} + f_b^{d_j} f_{b'}^{d_j}\right)\sigma^2_{S_{bb'}} + 0.5\,\mathrm{cov}\left(a_{s_j}, a_{d_j}\right)$$
$f_b^{j}$: Expected allelic contribution due to Breed b in individual/parent j
$\sigma^2_{A_b}$: Additive genetic variance of Breed b
$\sigma^2_{S_{bb'}}$: Variance due to genetic segregation between Breeds b and b'
Simple two breed example
Suppose $\sigma^2_{S_{12}} = 20$
P1: $\sigma^2_{g(P_1)} = 100$
P2: $\sigma^2_{g(P_2)} = 50$
F1: $\sigma^2_{g(F_1)} = 75$
F2: $\sigma^2_{g(F_2)} = 95$
Theory used for QTL mapping in pig breed crosses: better power
than Haley-Knott regression (Perez-Enciso and Varona, 2000)
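The two-breed numbers can be reproduced with a short Python sketch. It assumes the Lo et al. (1993) expression is read as: breed-fraction-weighted additive variances, plus twice the within-parent products of breed fractions times the segregation variance, plus half the covariance between the parents' additive effects (taken as 0 here for unrelated parents). The function and argument names are hypothetical, and the assignment of 100 and 50 to P1 and P2 is an assumption that does not affect the F1 and F2 results.

```python
def additive_variance(f_ind, f_sire, f_dam, var_a, var_seg, cov_parents=0.0):
    """Additive genetic variance of an individual in a multibreed population."""
    breeds = sorted(var_a)
    # Breed-specific additive variances weighted by the individual's breed fractions.
    total = sum(f_ind[b] * var_a[b] for b in breeds)
    # Segregation variance arises from breed mixtures within each parent.
    for i, b in enumerate(breeds):
        for b2 in breeds[i + 1:]:
            within_parents = f_sire[b] * f_sire[b2] + f_dam[b] * f_dam[b2]
            total += 2.0 * within_parents * var_seg[(b, b2)]
    return total + 0.5 * cov_parents

var_a = {1: 100.0, 2: 50.0}      # additive variances of the two parental breeds
var_seg = {(1, 2): 20.0}         # segregation variance between breeds 1 and 2

pure1, pure2 = {1: 1.0, 2: 0.0}, {1: 0.0, 2: 1.0}
half = {1: 0.5, 2: 0.5}

var_f1 = additive_variance(half, pure1, pure2, var_a, var_seg)  # purebred parents
var_f2 = additive_variance(half, half, half, var_a, var_seg)    # F1 parents
print(var_f1, var_f2)  # 75.0 95.0
```

The F1 has no segregation term because each parent carries a single breed; the F2 picks up the full segregation variance of 20 because both parents are half of each breed.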
Application: Nelore-Hereford data
(Fernando Cardoso PhD)
• Data set:
– 22,717 post-weaning gain (PWG) records
on Hereford and Nelore x Hereford calves
raised in Brazil (from 1974-2000)
– 40,082 animals (including ancestors in
pedigree file)
– Breed compositions of animals with
records ranged from purebred Hereford to
7/8 Nelore
– Purebred Herefords and F1's represent 90%
of the data
But maybe the residual variances are
heterogeneous too!
• Beef cattle performance is recorded across diverse
production systems and environments, with data
quality often compromised by, e.g., recording error,
preferential treatment, disease, etc.
• Hierarchical model constructions have been
independently used to address
– heteroskedasticity (Foulley et al., 1992;
San Cristobal et al., 1993) and
– robustness to outliers (Stranden and Gianola,
1998, 1999).
• Important to discern outliers from high-variance subclasses
First stage: Specify the Linear
Mixed Model
y = X1b1 + X2b2 + Z1u1 + Z2u2 + e
Fixed effects
– Non-genetic effects: b1 (age of dam, length of PW period, calf sex)
– Genetic effects: b2 (Breed additive, dominance and recombination loss effects)
Random effects
– Random contemporary group effects: u1
– Random additive genetic effects: u2
In compact form: y = Xb + Zu + e
OR
$y_j = x_j'\beta + z_j'u + e_j$, $j = 1, 2, \ldots, n$.
Second stage: Structural
variance model
$e_j \sim N(0, \sigma^2_{e_j})$
$$\log \sigma^2_{e_j} = \log \sigma^2_e + \sum_{m=1}^{o} p_{mj} \log(\gamma_m) + \sum_{k=1}^{r} \log(\eta_{k(j)}) + \sum_{l=1}^{s} \log(v_{l(j)}) + \log(w_j^{-1})$$
Terms and EXAMPLES:
– $\sigma^2_e$: baseline
– $\gamma_m$: regression parameters (Breed proportion, Breed heterozygosity)
– $\eta_{k(j)}$: fixed classification effects (Calf sex)
– $v_{l(j)}$: random classification effects (CG)
– $w_j$: lack-of-fit term with mean 0
On the multiplicative scale:
$$\sigma^2_{e_j} = \sigma^2_e \left(\prod_{m=1}^{o} \gamma_m^{p_{mj}}\right) \left(\prod_{k=1}^{r} \eta_{k(j)}\right) \left(\prod_{l=1}^{s} v_{l(j)}\right) \Big/ w_j$$
Distributional assumptions on
random effects
• Location parameters:
– u includes 940 CG ($u_{CG}$) and 40,082 additive genetic
effects ($u_A$):
• $u_{CG} \sim N(0, I\sigma^2_{CG})$
• $u_A \sim N(0, G(\varphi))$ where $\varphi$ includes breed specific variances and
segregation variances.
• Residual variance
– $v = [v_1\ v_2\ \ldots\ v_{940}]$ includes random relative variances
for 940 CG
• $v_i \sim$ IID Inverted-gamma with mean 1 and standard deviation
$\sigma_v$
Need to consider one more thing
• Recall $e_j \sim N(0, \sigma^2_{e_j})$ where
$$\log \sigma^2_{e_j} = \log \sigma^2_e + \sum_{m=1}^{o} p_{mj} \log(\gamma_m) + \sum_{k=1}^{r} \log(\eta_{k(j)}) + \sum_{l=1}^{s} \log(v_{l(j)}) + \log(w_j^{-1})$$
• What about $w_j$?
– Lack-of-fit term
• 1) If $w_j \sim \mathrm{Gamma}(\nu/2, \nu/2)$ then this is
equivalent to specifying:
$e_j \sim t(0, \sigma^2_{e_j}, \nu)$, i.e. Student t error
Demonstrated to be resistant to outliers:
Stranden and Gianola (1998; 1999)
• 2) If $w_j = 1$ for all j, then
$e_j \sim N(0, \sigma^2_{e_j})$
Many other options!!!
See Rosa et al. (2003)
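The equivalence between the gamma-weighted normal and the Student t residual can be illustrated by simulation. The sketch below assumes Gamma(ν/2, ν/2) is parameterized by shape and rate (so its mean is 1) and that the weight divides the variance, as implied by a log(1/w) term on the log variance. The names `draw_residual`, `nu` and `sigma2` and the chosen values are hypothetical.

```python
import random
from statistics import fmean

def draw_residual(rng, sigma2, nu):
    """Draw w ~ Gamma(shape=nu/2, rate=nu/2), then e | w ~ N(0, sigma2 / w)."""
    # random.gammavariate takes shape and SCALE, so scale = 1 / rate = 2 / nu.
    w = rng.gammavariate(nu / 2.0, 2.0 / nu)
    return rng.gauss(0.0, (sigma2 / w) ** 0.5)

rng = random.Random(2004)   # fixed seed so the run is repeatable
nu, sigma2 = 8.0, 1.0       # illustrative degrees of freedom and scale
draws = [draw_residual(rng, sigma2, nu) for _ in range(100_000)]

# A Student t with nu > 2 degrees of freedom has variance sigma2 * nu / (nu - 2).
sample_var = fmean(d * d for d in draws)
theory_var = sigma2 * nu / (nu - 2.0)
print(f"sample variance {sample_var:.3f} vs Student t variance {theory_var:.3f}")
```

Setting every weight to 1 instead of drawing it from the gamma distribution recovers the ordinary normal residual.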
Now (at least) four distributional
possibilities!
• 2 × 2 factorial based on distribution
(normal versus Student t) and
homoskedastic versus heteroskedastic
residuals:
1. Homoskedastic normal: $e_j \sim N(0, \sigma^2_e)$
2. Homoskedastic Student t: $e_j \sim t(0, \sigma^2_e, \nu)$
3. Heteroskedastic normal: $e_j \sim N(0, \sigma^2_{e_j})$
4. Heteroskedastic Student t: $e_j \sim t(0, \sigma^2_{e_j}, \nu)$
Some results
• Based on Pseudo Bayes Factors (PBF), the
Student t heteroskedastic model provided the
best data fit; the homoskedastic normal model
the worst data fit.
• The heteroskedastic Student t error model was
the best fit:
– The posterior mean of the degrees of freedom
parameter ($\nu$) was 7.33 ± 0.48, indicating a heavier
tailed residual distribution than normal ($\nu = \infty$) for
PWG data
Heteroskedastic residual variance
results

                 Parameter                         EST.   SE     95%PPI
Fixed effects    Gender ($\eta_1$)                 1.13   0.09   (0.97, 1.31)
                 Nelore proportion ($\gamma_1$)    1.15   0.45   (0.48, 2.20)
                 Heterozygosity ($\gamma_2$)       0.70   0.16   (0.46, 1.06)
Random effects   CG ($\sigma_v$)                   0.72   0.06   (0.62, 0.86)

$\Pr(\eta_1 > 1) = 0.9378$
$\Pr(\gamma_2 > 1) = 0.0449$
Evidence of genetic homeostasis? (Lerner, 1954)
What do these estimates mean
again?
• Example: a male F1 calf in a herd (Herd 5)
with above average variability ($\hat{v}_5 = 1.2$)
– Nelore proportion: $f_{1j} = 0.50$
– Heterozygosity: $f_{12j} = 1.00$
• Estimated residual variability:
$$\hat{\sigma}^2_e\, \hat{\eta}_1\, (\hat{\gamma}_1)^{0.50}\, (\hat{\gamma}_2)^{1.00}\, \hat{v}_5 = \hat{\sigma}^2_e\, (1.13)\, (1.15)^{0.50}\, (0.70)^{1.00}\, (1.2) = 1.02\, \hat{\sigma}^2_e$$
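The arithmetic for the male F1 calf in Herd 5 can be verified in Python. The variable names are hypothetical; the values are the posterior estimates and covariate values quoted for this example.

```python
# Multiplicative effects on the residual variance (estimates from the results table).
sex_male = 1.13              # male calves relative to the baseline sex
nelore_effect = 1.15         # effect per unit of Nelore proportion
heterozygosity_effect = 0.70 # effect per unit of breed heterozygosity
herd5_relative_var = 1.2     # relative variance of Herd 5

# Covariate values for an F1 calf.
nelore_proportion = 0.50
heterozygosity = 1.00

# Regression-type effects enter as powers, classification effects as plain factors.
multiplier = (
    sex_male
    * nelore_effect ** nelore_proportion
    * heterozygosity_effect ** heterozygosity
    * herd5_relative_var
)
print(round(multiplier, 2))  # 1.02, i.e. 1.02 times the baseline residual variance
```

The reduction from heterozygosity (0.70) almost exactly offsets the increases from sex, Nelore proportion and the more variable herd.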
Posterior densities of heritabilities under
homoskedastic normal error model
[Figure: a) Gaussian homoskedastic model; posterior density against heritability (0 to 0.5) for Nelore, Hereford, F1 and A38; Cardoso and Tempelman, 2004]
Posterior densities of heritabilities under
heteroskedastic normal error model
[Figure: c) Gaussian heteroskedastic model; posterior density against heritability (0 to 0.5) for Nelore, Hereford, F1 and A38]
Why the "flip flop" from homoskedastic normal error?
-> Some of the most variable herds were exclusively Herefords
Posterior densities look very similar under Student t heteroskedastic
Where do we go from here?
• Genetic evaluation for residual variability?
– Relevance: Uniformity of product premium.
– San Cristobal-Gaudy et al. (1998, 2001);
Sorensen and Waagepetersen (2003)
$y_i = x_i'\beta + z_i'u + e_i$; $e_i \sim N(0, \sigma^2_{e_i})$, $i = 1, 2, \ldots, n$,
$\ln \sigma^2_{e_i} = p_i'\delta + q_i'v$
$$\begin{bmatrix} u \\ v \end{bmatrix} \Big|\, \sigma^2_u, \sigma^2_v, r \sim N\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} A\sigma^2_u & Ar\sigma_u\sigma_v \\ Ar\sigma_u\sigma_v & A\sigma^2_v \end{bmatrix} \right)$$
A: numerator relationship matrix
r: genetic correlation between location and log variance effects
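A minimal simulation sketch of a model with genetically structured residual variance follows. It makes a strong simplifying assumption that is not part of the cited work: animals are unrelated, so the numerator relationship matrix A is the identity and each animal's pair (u, v) can be drawn independently. All parameter values and function names are hypothetical.

```python
import random
from math import exp, sqrt

def simulate(n, sigma_u, sigma_v, r, mu=0.0, delta=0.0, seed=1):
    """Simulate (u, v, y) for n unrelated animals with one record each."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        u = sigma_u * z1                                   # effect on the mean
        v = sigma_v * (r * z1 + sqrt(1.0 - r * r) * z2)    # effect on log variance
        residual_var = exp(delta + v)                      # ln(variance) = delta + v
        y = mu + u + rng.gauss(0.0, sqrt(residual_var))
        records.append((u, v, y))
    return records

def correlation(xs, ys):
    """Pearson correlation between two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

records = simulate(n=20_000, sigma_u=1.0, sigma_v=0.3, r=-0.5)
r_hat = correlation([rec[0] for rec in records], [rec[1] for rec in records])
print(f"simulated corr(u, v) = {r_hat:.3f}")
```

With a negative r, animals with higher genetic merit for the trait tend to have less variable records; a real evaluation would replace the identity matrix with the pedigree-based A.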
[Figure: Sire EPD for litter size variability (v) against Sire EPD for litter size (u); litter size in sheep (San Cristobal et al., 2003); $\hat{r} = \mathrm{corr}(u, v) = 0.19$]
For litter size in pigs, a negative $\hat{r}$ was estimated
(Sorensen and Waagepetersen, 2003)
Multiple trait analysis?
• The standard for genetic evaluations today
• Perhaps genetic covariances/correlations
between traits are heterogeneous across
environments too.
• Hopefully, these issues will be investigated
further.
References
Cardoso, F.F., and R.J. Tempelman. 2004. Hierarchical Bayes multiple-breed inference with an application to genetic
evaluation of a Nelore-Hereford population. Journal of Animal Science 82:1589-1601.
Canavesi F., Biffani S., Samore A.B., Revising the genetic evaluation for calving ease in the Italian Holstein Friesian.
Interbull Bulletin 30 (2003) 82-85 http://www-interbull.slu.se/bulletins/framesida-pub.htm.
Ducrocq V., Calving ease evaluation of French dairy bulls with a heteroskedastic threshold model with direct and
maternal effects, Interbull Bulletin 30 (2000) 82-85 http://www-interbull.slu.se/bulletins/framesida-pub.htm.
Foulley, J.L. 1997. ECM approaches to heteroskedastic mixed models with constant variance ratios. Genetics,
Selection, Evolution 29:297-315.
Foulley, J. L., M. S. Cristobal, D. Gianola, and S. Im. 1992. Marginal likelihood and Bayesian approaches to the
analysis of heterogeneous residual variances in mixed linear Gaussian models. Computational Statistics & Data
Analysis 13: 291-305.
Foulley J.L., Gianola D., Statistical analysis of ordered categorical data via a structural heteroskedastic threshold
model, Genetics Selection Evolution 28 (1996) 249-273.
Garrick, D.J., E.J. Pollak, R.L. Quaas, and L.D. Van Vleck. 1989. Variance heterogeneity in direct and maternal weight
traits by sex and percent purebred for Simmental-sired calves. Journal of Animal Science 67: 2515-2528.
Kachman, S.D. and R.W. Everett. 1993. A multiplicative model when the variances are heterogeneous. Journal of
Dairy Science 76:859-867.
Kizilkaya, K., and R.J. Tempelman. 2005. A general approach to mixed effects modeling of residual variances in
generalized linear mixed models. Genetics, Selection, Evolution (in press)
Lo, L. L., R. L. Fernando, and M. Grossman. 1993. Covariance between relatives in multibreed populations - additive
model. Theoretical and Applied Genetics 87: 423-430.
Mark, T. 2004. Applied genetic evaluations for production and functional traits in dairy cattle. Journal of Dairy Science
87: 2641-2652.
Meuwissen, T.H.E., G. DeJong, and B. Engel. 1996. Joint estimation of breeding values and heterogeneous variances
of large data files. Journal of Dairy Science 79:310-316.
Perez-Enciso, M., and L. Varona. 2000. Quantitative Trait Loci Mapping in F2 Crosses Between Outbred Lines.
Genetics 155:391-405.
Robinson G.K., 1991. That BLUP is a good thing - the estimation of random effects, Statistical Science 6 15-51.
Robert-Granie, C., B. Bonati, D. Boichard, and A. Barbat. 1999. Accounting for variance heterogeneity in French dairy
cattle genetic evaluation. Livestock Production Science 60: 343-357.
Robert-Granie, C., B. Heude, and J.L. Foulley. 2002. Modeling the growth curve of Maine-Anjou beef cattle using
heteroskedastic random coefficients models. Genetics, Selection, Evolution 34:423-445.
Rodriguez-Almeida, F. A., L. D. Vanvleck, L. V. Cundiff, and S. D. Kachman. 1995. Heterogeneity of variance by sire
breed, sex, and dam breed in 200-day and 365-day weights of beef-cattle from a top cross experiment. Journal of
Animal Science 73: 2579-2588.
Rosa, G. J. M., C. R. Padovani, and D. Gianola. 2003. Robust linear mixed models with normal/independent
distributions and Bayesian mcmc implementation. Biometrical Journal 45: 573-590.
San Cristobal, M., J. L. Foulley, and E. Manfredi. 1993. Inference about multiplicative heteroskedastic components of
variance in a mixed linear gaussian model with an application to beef-cattle breeding. Genetics Selection Evolution
25: 3-30.
San Cristobal-Gaudy, M., J.M. Elsen, L. Bodin, and C. Chevalet. 1998. Prediction of the response to a selection for
canalisation of a continuous trait in animal breeding. Genetics, Selection, Evolution 30: 423-451.
San Cristobal-Gaudy, M., Bodin, L., Elsen, J-.M., Chevalet, C. 2001. Genetic components of litter size variability in
sheep, Genetics Selection Evolution 33: 249-271
Sorensen D.A., Waagepetersen R., 2003. Normal linear models with genetically structured residual heterogeneity: a
case study. Genetical Research Cambr. 82 207-222.
Stranden, I. and D. Gianola. 1998. Attenuating effects of preferential treatment with Student t mixed linear models: A
simulation study. Genetics, Selection, Evolution 30: 565-583.
Stranden, I. and D. Gianola, 1999. Mixed effects linear models with t-distributions for quantitative genetic analysis: A
Bayesian approach. Genetics, Selection, Evolution 31:25-42.