Transcript Slide 1

A Bayesian χ² test for goodness of fit
10/23/09, Multilevel RIT

Overview

• Talk about the basic χ² test; review with some examples.

• Talk about the paper, with examples.

Basic χ² test

• The χ² test is used to test whether a sample of data y_1, y_2, …, y_n came from a population with a specific distribution.

• An attractive feature of the χ² goodness-of-fit test is that it can be applied to any univariate distribution for which you can calculate the CDF.

The value of the χ² statistic depends on how you partition the support.

The sample size needs to be sufficiently large for the χ² approximation to be valid.

The χ² statistic, in the case of the simple hypothesis, is

$$\chi^2 = \sum_{k=1}^{K} \frac{(m_k - n\,p_k)^2}{n\,p_k},$$

which has K − 1 degrees of freedom as n goes to infinity, where

• m_k is the number of observations within the k-th bin,
• K is the number of partitions or bins specified over the sample space,
• n is the sample size,
• p_k is the probability assigned by the null model to the k-th bin.
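As a minimal sketch of this formula in code (the function and variable names below are my own, not from the slides), the statistic can be computed directly from the bin counts and the null bin probabilities:

```python
import numpy as np

def pearson_chi2(m_k, n, p_k):
    """Pearson chi-square statistic for a simple null hypothesis.

    m_k : observed counts in each of the K bins
    n   : total sample size
    p_k : probabilities the null model assigns to the K bins
    """
    m_k = np.asarray(m_k, dtype=float)
    expected = n * np.asarray(p_k, dtype=float)   # expected counts under the null
    return np.sum((m_k - expected) ** 2 / expected)
```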

4 examples

We generate 4 sets of RVs:
1) 1000 normal
2) 1000 double exponential
3) 1000 t distribution with 3 degrees of freedom
4) 1000 lognormal

We use the chi-square test to see if each of the data sets fits a normal distribution.
H_0: the data come from a normal distribution.
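A rough sketch of this experiment (assuming, for simplicity, a fully specified standard normal null with equiprobable bins; the slides' version may instead estimate the normal parameters, which changes the degrees of freedom as discussed next):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, K = 1000, 20                               # sample size, number of equiprobable bins
p_k = np.full(K, 1.0 / K)                     # null probability of each bin

samples = {
    "normal":             rng.standard_normal(n),
    "double exponential": rng.laplace(size=n),
    "t, 3 df":            rng.standard_t(3, size=n),
    "lognormal":          rng.lognormal(size=n),
}

for name, y in samples.items():
    # Bin each observation by its value under the null CDF (equiprobable bins).
    u = stats.norm.cdf(y)
    m_k = np.bincount(np.minimum((u * K).astype(int), K - 1), minlength=K)
    chi2 = np.sum((m_k - n * p_k) ** 2 / (n * p_k))
    p_val = stats.chi2.sf(chi2, df=K - 1)
    print(f"{name:20s}  chi2 = {chi2:8.2f}   p = {p_val:.4f}")
```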

The χ² statistic, in the case of the composite hypothesis, is

$$\chi^2 = \sum_{k=1}^{K} \frac{(m_k - n\,\hat{p}_k)^2}{n\,\hat{p}_k},$$

with K − s − 1 degrees of freedom as n goes to infinity, where the p̂_k are the estimates of the bin probabilities based on either the MLE for the grouped data or the minimum χ² method, and s is the dimension of the underlying parameter vector.


The MLE for the grouped data means maximizing the grouped-data (multinomial) likelihood with respect to θ, while minimum χ² estimation involves finding the value of θ that minimizes a function related to R_g.
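A rough sketch of the minimum χ² idea for a normal model (the bin choice, the log-sigma parameterization, and all names here are my own assumptions, not the slides' formulation):

```python
import numpy as np
from scipy import stats, optimize

def min_chi2_normal(y, n_bins=10):
    """Minimum chi-square estimate of (mu, sigma) for a normal model.

    Bins are fixed from data quantiles; the two outer bins are unbounded,
    so the model's bin probabilities always sum to one.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    cuts = np.quantile(y, np.linspace(0, 1, n_bins + 1)[1:-1])   # interior cut points
    m_k = np.bincount(np.searchsorted(cuts, y), minlength=n_bins)

    def objective(theta):
        mu, log_sigma = theta
        cdf = stats.norm.cdf(cuts, loc=mu, scale=np.exp(log_sigma))
        p_k = np.clip(np.diff(np.concatenate(([0.0], cdf, [1.0]))), 1e-12, None)
        return np.sum((m_k - n * p_k) ** 2 / (n * p_k))

    res = optimize.minimize(objective, x0=[y.mean(), np.log(y.std())],
                            method="Nelder-Mead")
    return res.x[0], np.exp(res.x[1]), res.fun   # mu_hat, sigma_hat, minimized chi-square
```

Grouped-data maximum likelihood would instead maximize Σ_k m_k log p_k(θ) over the same binned counts.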

A Bayesian χ² statistic

Let y_1, …, y_n (= y) denote the scalar-valued, continuous, identically distributed, conditionally independent observations drawn from the pdf f(y|θ), which is indexed by an s-dimensional parameter vector θ ∈ R^s.

We want to generate a sampled value θ̃ from the posterior p(θ | y). To do that, we can apply the inverse of the probability integral transform method: set up the corresponding integrals and then solve for the θ̃'s. Generally, in practice, the θ̃'s are calculated using the Gibbs sampler.
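For a scalar parameter, the inverse probability integral transform amounts to drawing u ~ Uniform(0, 1) and solving F(θ | y) = u for θ. A minimal sketch (the posterior CDF and the root-finding bracket are assumed inputs, not anything specified on the slides); in multi-parameter models this kind of draw is typically done coordinate-wise inside a Gibbs sampler:

```python
import numpy as np
from scipy import optimize

def draw_from_posterior(posterior_cdf, rng, bracket=(-50.0, 50.0)):
    """Draw one value from a univariate posterior by inverting its CDF.

    posterior_cdf : callable giving P(theta <= t | y)
    bracket       : interval assumed to contain essentially all posterior mass
    """
    u = rng.uniform()
    # Solve posterior_cdf(t) = u; brentq needs a sign change over the bracket.
    return optimize.brentq(lambda t: posterior_cdf(t) - u, *bracket)
```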

Notation considerations: θ̃ denotes a value of θ sampled from the posterior distribution based on y; θ̂ denotes the MLE.

This is interesting because if you contrast R_B with R̂ (the classical statistic evaluated at the MLE), we see that R̂ has K − s − 1 degrees of freedom while R_B has K − 1 degrees of freedom: R_B is independent of the number of parameters.
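For reference, the form of R_B implied by the bin-count construction described in the example below (bins defined on the probability scale by cut points 0 = a_0 < a_1 < … < a_K = 1, with p_k = a_k − a_{k−1}); this is my reconstruction of the formula, which appeared as an image on the slide, not a verbatim copy:

$$R_B(\tilde{\theta}) = \sum_{k=1}^{K} \frac{\big(m_k(\tilde{\theta}) - n\,p_k\big)^2}{n\,p_k},
\qquad m_k(\tilde{\theta}) = \#\{\, i : F(y_i \mid \tilde{\theta}) \in (a_{k-1}, a_k] \,\}.$$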

The process is:
1) Have data y_1, …, y_n.
2) Generate θ̃ from the data y_1, …, y_n (by integral transform or Gibbs sampler).
3) Create the bin counts m_k(θ̃).
4) Calculate R_B.
5) Repeat steps 2 to 4 to get many R_B's.
6) By the LLN,

$$\frac{1}{N}\sum_{i=1}^{N} I\big[R_B^{(i)} \in (a, b)\big] \;\xrightarrow[N \to \infty]{}\; P\big(\chi^2 \in (a, b)\big).$$

We can then report the proportion of R_B values that exceeded the 95th percentile of the reference χ² distribution with K − 1 degrees of freedom. If the R_B values did represent independent draws from that χ² distribution, then the proportion of values falling in the critical region of the test would exactly equal the size of the test.

If the proportion is higher than expected, the excess can be attributed to dependence between the R_B values or to lack of fit.
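A small helper for the summary just described (the names are mine), given an array of sampled R_B values and the number of bins K:

```python
import numpy as np
from scipy import stats

def exceedance_proportion(r_b_values, K):
    """Proportion of sampled R_B values above the 0.95 quantile of the
    reference chi-square distribution with K - 1 degrees of freedom."""
    cutoff = stats.chi2.ppf(0.95, df=K - 1)
    return np.mean(np.asarray(r_b_values) > cutoff)
```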

The statistic A is used in the event that formal significance tests must be performed to assess model adequacy.

A is related to a commonly used quantity in signal detection theory and represents the area under the ROC curve [e.g., Hanley and McNeil (1982)] for comparing the joint posterior distribution of R_B values to a χ²_{K−1} random variable.

The expected value of A, if taken with respect to the joint sampling distribution of y and the posterior distribution of θ given y, would be 0.5. Large deviations in the expected value of A from 0.5, when the expectation is taken with respect to the posterior distribution of θ for a fixed value of y, indicate model lack of fit.
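A rough Monte Carlo sketch of an area-under-the-ROC-curve estimate comparing R_B draws to the χ²_{K−1} reference (I am not certain this matches the paper's exact construction of A; the helper name and the use of simulated reference draws are my own choices):

```python
import numpy as np

def a_statistic(r_b_values, K, n_ref=100_000, seed=0):
    """Estimate P(R_B > X) for X ~ chi-square(K - 1), i.e. the area under
    the ROC curve separating the R_B draws from the reference distribution."""
    rng = np.random.default_rng(seed)
    ref = np.sort(rng.chisquare(df=K - 1, size=n_ref))
    r_b = np.asarray(r_b_values, dtype=float)
    # For each R_B draw, the fraction of reference draws it exceeds; the mean is the AUC.
    return np.mean(np.searchsorted(ref, r_b) / n_ref)
```

A value near 0.5 indicates the R_B draws look like the reference; values far from 0.5 suggest lack of fit (or dependence), consistent with the bullet above.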

Some things to keep in mind

• Unfortunately, approximating the sampling distribution of A can be a lot of trouble.

• How do you decide how many bins to make and how to assign probabilities to these bins? Consistency of tests against general alternatives requires that K → ∞ as n → ∞.

• Having too many bins can result in loss of power.

• Mann and Wald suggested using 3.8(n − 1)^0.4 equiprobable cells (see the quick check below).
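As a quick arithmetic check of the Mann-Wald suggestion (the rounding choice is mine):

```python
import math

def mann_wald_bins(n):
    """Mann-Wald suggestion of 3.8 * (n - 1)**0.4 equiprobable cells."""
    return math.ceil(3.8 * (n - 1) ** 0.4)

print(mann_wald_bins(1000))   # -> 61 cells for a sample of size 1000
```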

Example

Let y = (y_1, …, y_n) denote a random sample from a normal distribution with unknown μ and σ². Let us assume a joint prior for (μ, σ²) proportional to 1/σ².
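Under this prior the posterior is available in closed form (a standard result, not shown explicitly on the slides): σ² | y follows a scaled inverse-χ² distribution with n − 1 degrees of freedom and scale s², and μ | σ², y is normal with mean ȳ and variance σ²/n. A minimal sketch of one posterior draw (μ̃, σ̃):

```python
import numpy as np

def draw_mu_sigma(y, rng):
    """One posterior draw of (mu, sigma) under the prior p(mu, sigma^2) ∝ 1/sigma^2."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    ybar, s2 = y.mean(), y.var(ddof=1)
    sigma2 = (n - 1) * s2 / rng.chisquare(df=n - 1)   # scaled inverse-chi-square draw
    mu = rng.normal(ybar, np.sqrt(sigma2 / n))        # normal draw given sigma^2
    return mu, np.sqrt(sigma2)
```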

For a given data vector y and posterior sample (μ̃, σ̃), the bin counts m_k(μ̃, σ̃) are determined by counting the number of observations y_i that fall into the interval (σ̃ Φ⁻¹(a_{k−1}) + μ̃, σ̃ Φ⁻¹(a_k) + μ̃), where Φ⁻¹(·) denotes the standard normal quantile function. Based on these counts, R_B(μ̃, σ̃) is calculated according to the R_B formula above.
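Putting the pieces together for this example — a sketch only: the equiprobable cut points a_k = k/K, the Mann-Wald bin count, and all names are my own choices, and binning Φ((y_i − μ̃)/σ̃) into (a_{k−1}, a_k] is equivalent to the interval condition stated above:

```python
import numpy as np
from scipy import stats

def bayesian_chi2_draws(y, n_draws=1000, K=None, seed=0):
    """Draw (mu, sigma) from the posterior repeatedly and compute R_B each time."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    n = len(y)
    if K is None:
        K = int(np.ceil(3.8 * (n - 1) ** 0.4))        # Mann-Wald number of bins
    p_k = np.full(K, 1.0 / K)                         # equiprobable bins: a_k = k / K
    ybar, s2 = y.mean(), y.var(ddof=1)

    r_b = np.empty(n_draws)
    for i in range(n_draws):
        # Posterior draw under the 1/sigma^2 prior (see the sketch above).
        sigma2 = (n - 1) * s2 / rng.chisquare(df=n - 1)
        mu = rng.normal(ybar, np.sqrt(sigma2 / n))
        # Bin counts m_k(mu, sigma): bin Phi((y - mu)/sigma) into (a_{k-1}, a_k].
        u = stats.norm.cdf((y - mu) / np.sqrt(sigma2))
        m_k = np.bincount(np.minimum((u * K).astype(int), K - 1), minlength=K)
        r_b[i] = np.sum((m_k - n * p_k) ** 2 / (n * p_k))
    return r_b, K
```

The proportion of these draws exceeding stats.chi2.ppf(0.95, K - 1) can then be reported as in the earlier helper, and the A statistic can be computed from the same draws.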


Power Calculation

• The next figure displays the proportion of times, in 10,000 draws of t samples, that the test statistic A was larger than the 0.95 quantile of the sampled values of A_pp (A_pp comes from posterior predictive observations of y).

Main advantages:

Goodness-of-fit tests based on the statistic R_B provide a simple way of assessing the adequacy of model fit in many Bayesian models.

Essentially, the only requirement for their use is that observations be conditionally independent. From a computational perspective, such statistics can be calculated in a straightforward way using output from existing MCMC algorithms.

Values of R_B generated from a posterior distribution may prove useful both as a convergence diagnostic for MCMC algorithms and for detecting errors in the computer code written to implement these algorithms.

There is a later paper written in 2007 that uses the same methodology, but applied to censored data.

Bayesian Chi-square TTE fit: Using Bayesian chi-square tests to assess goodness of fit for time-to-event data

This software computes the Bayesian chi square test of Valen Johnson [1] for right-censored time-to-event data. It tests the goodness of fit of the best fit to the data from each of the following distribution families: exponential, gamma, inverse gamma, Weibull, log normal, log logistic, and log odds rate.

Bayesian chi square test results

Input options
File: sample1.txt
Number of bins: 16 (default)
Discrete time: yes
RNG seed: from system time
Notation: 0 for alive and 1 for dead

Bayesian chi square and related statistics

Distribution    mean X2   var X2    95th percentile  p-value bound  BIC      DIC      DIC # parameters
Gamma           11.2919   6.20126   15.7188          1              9009.4   8997.49  0.973041
LogOddsRate     11.9972   12.7518   18.875           1              9019.83  9002.04  1.49818
LogLogistic     20.9959   32.4916   31.75            0.136506       9027.91  9016.12  1.03674
LogNormal       25.9143   35.2128   37.0938          0.0240434      9042.18  9030.31  0.996002
Weibull         29.3764   9.01371   34.6563          0.0539539      9035     9023.08  0.97273
InverseGamma    113.822   145.183   133.813          0              9210     9198.14  1.00249
Exponential     379.835   75.5927   397.438          0              9469.93  9463.99  0.493292

mean X2 is the Bayesian chi square (BCS) value, the mean of the chi-square values from 1000 samples from the posterior.

var X2 is the corresponding sample variance of the chi-square values.

95th percentile is this order statistic of the chi-square samples.

p-value bound is the upper bound on the p-value corresponding to the order statistic, using Rychlik's inequality.

BIC is the 'Bayesian' information criterion.

DIC is the deviance information criterion.

DIC # parameters is the number of effective parameters as measured by the DIC.

Distribution parameters

Distribution    param1    param2     param3
Gamma           2.97519   17.4145
LogOddsRate     2.31743   49.9121    0.481747
LogLogistic     2.73695   -10.4045
LogNormal       3.77847   0.644426
Weibull         1.88126   58.0321
InverseGamma    2.18072   75.8742
Exponential     54.1108

This output was produced by BCSTTE, Bayesian Chi-Square TTE fit, available at http://biostatistics.mdanderson.org/SoftwareDownload/ .

Bayesian chi square test results

Input options
File: sample2.txt
Number of bins: 5
Discrete time: no
RNG seed: 12345
Notation: 0 for uncensored and 1 for censored

Bayesian chi square and related statistics

Distribution    mean X2   var X2    95th percentile  p-value bound  BIC      DIC      DIC # parameters
Gamma           4.04367   7.75087   8.66667          1              1075.5   1067.84  0.952195
LogLogistic     4.44592   11.2346   13.0833          0.213249       1081.61  1074.01  0.987576
LogOddsRate     4.58767   6.40555   8.91667          1              1079.92  1068.04  1.19743
LogNormal       4.83717   10.6833   12.3333          0.294848       1085.41  1077.74  0.950352
Weibull         5.2845    6.15882   9.75             0.879533       1075.42  1067.83  0.990863
InverseGamma    22.4472   86.2438   37.5833          2.6779e-006    1115.82  1108.23  0.99144
Exponential     31.9989   6.84955   37.6667          2.57403e-006   1107.98  1104.22  0.508292

Distribution parameters

Distribution    param1    param2     param3
Gamma           2.34858   20.0585
LogLogistic     2.3886    -8.79073
LogOddsRate     1.79345   49.7335    0.134152
LogNormal       3.63348   0.753531
Weibull         1.68663   52.402
InverseGamma    1.55293   42.0575
Exponential     48.4923

That’s most of it…

• Here is the math.

Thanks for coming to the talk.

Cao, J., Moosman, A., and Johnson, V. E. (2008). 'A Bayesian Chi-Squared Goodness-of-Fit Test for Censored Data Models.' UT MD Anderson Cancer Center Department of Biostatistics Working Paper Series.