Efficiency and Relative Efficiency
of Tests. Chi-Square Tests
Scientific Seminar
“Asymptotic Statistics”
Olena Korzhevska
Table of Contents
• Relative Efficiency of Tests
– Asymptotic Power Functions. Consistency. Asymptotic
Relative Efficiency
• Efficiency of Tests
– Asymptotic Representation Theorem. Testing Normal
Means. Local Asymptotic Normality. One-Sample Location.
Two-Sample Problems
• Chi-Square Tests
– Quadratic Forms in Normal Vectors. Pearson Statistic.
Testing Independence. Goodness-of-Fit Tests. Asymptotic
Efficiency
14/02/2015 · Olena Korzhevska. Asymptotic Statistics Seminar
1. Relative Efficiency of Tests
Asymptotic Power Functions
• The relative efficiency of two sequences of tests is the
quotient of the numbers of observations needed with the two
tests to obtain the same level and power.
• Testing problem:
  H₀: θ ∈ Θ₀ vs. H₁: θ ∈ Θ₁
• The power function of a test that rejects H₀ if a test statistic Tₙ falls into a critical region Kₙ:
  θ ↦ πₙ(θ) = P_θ(Tₙ ∈ Kₙ)
• The test is of level α if its size sup{πₙ(θ): θ ∈ Θ₀} does not exceed α.
• The sequence of tests is asymptotically of level α if
  limsup_{n→∞} sup_{θ∈Θ₀} πₙ(θ) ≤ α.
Asymptotic Power Functions
• The test with power function πₙ is better than the test with power function π′ₙ if both
  πₙ(θ) ≤ π′ₙ(θ), θ ∈ Θ₀,
  πₙ(θ) ≥ π′ₙ(θ), θ ∈ Θ₁.
• Aim: to compare tests asymptotically.
• Consider two sequences of tests, with power functions πₙ and π′ₙ (the tests within each sequence are of the same type).
Asymptotic Power Functions
• First idea: compare limiting power functions of the form
  π(θ) = lim_{n→∞} πₙ(θ).
• Example (sign test). X₁, X₂, …, Xₙ are random variables from a distribution with unique median θ.
  - Test: H₀: θ = 0 vs. H₁: θ > 0,
  - Test statistic: Sₙ = n⁻¹ Σ_{i=1}^n 1{Xᵢ > 0},
  - With distribution function F(x − θ) of the observations:
    μ(θ) = 1 − F(−θ),  σ²(θ) = (1 − F(−θ)) F(−θ),
  - √n(Sₙ − μ(θ)) ⇝ N(0, σ²(θ)) asymptotically,
  - under H₀: √n(Sₙ − 1/2) ⇝ N(0, 1/4), since μ(0) = 1/2 and σ²(0) = 1/4.
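The convergence √n(Sₙ − μ(θ)) ⇝ N(0, σ²(θ)) can be checked by simulation. A minimal sketch, assuming f is the standard normal density (so F = Φ) and illustrative values n = 400, θ = 0.3:

```python
from math import erf, sqrt
import numpy as np

def Phi(x):
    """Standard normal cdf."""
    return 0.5 * (1 + erf(x / sqrt(2)))

rng = np.random.default_rng(0)
n, reps, theta = 400, 10000, 0.3

# Draws from the shifted density f(x - theta), with f standard normal (an assumption)
X = rng.normal(loc=theta, size=(reps, n))
S = (X > 0).mean(axis=1)                    # sign statistic S_n, one value per replication

mu = 1 - Phi(-theta)                        # mu(theta) = 1 - F(-theta)
sigma2 = (1 - Phi(-theta)) * Phi(-theta)    # sigma^2(theta) = (1 - F(-theta)) F(-theta)

Z = sqrt(n) * (S - mu)                      # empirical law should approach N(0, sigma^2(theta))
print(round(Z.mean(), 3), round(Z.var(), 3), round(sigma2, 3))
```

The sample mean of Z should be near 0 and its variance near σ²(θ) = Φ(θ)Φ(−θ).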
Asymptotic Power Functions
• Example (sign test, continued).
  - The test that rejects H₀ if √n(Sₙ − 1/2) > z_α/2 has power function
    πₙ(θ) = P_θ(√n(Sₙ − μ(θ)) > z_α/2 − √n(μ(θ) − μ(0)))
          = 1 − Φ((z_α/2 − √n(F(0) − F(−θ)))/σ(θ)) + o(1).
  - Since F(0) − F(−θ) > 0 for every θ > 0, it follows that for α = αₙ → 0 sufficiently slowly
    πₙ(θ) → 0 if θ = 0,  πₙ(θ) → 1 if θ > 0.
  - In this case the limiting power function corresponds to the perfect test with all error probabilities equal to zero.
Asymptotic Power Functions
How do we compare tests?
We need to make the problem of discriminating between the null and the alternative hypotheses harder as n increases. It is natural to consider a shrinking alternative that converges to the null:
  H₀: θ = 0 vs. H₁: θ = θₙ, with θₙ > 0 and θₙ → 0.
Example (sign test, continued): on the board.
In this situation a reasonable method for the asymptotic comparison of two sequences of tests is to consider local limiting power functions:
  π(h) = lim_{n→∞} πₙ(h/√n),  h ≥ 0.
Asymptotic Power Functions
Theorem: Suppose that Tₙ, μ, and σ are such that, for all h and θₙ = h/√n,
  √n(Tₙ − μ(θₙ))/σ(θₙ) ⇝ N(0, 1) under θₙ,
μ is differentiable at 0, and σ is continuous at 0. Then the tests that reject H₀: θ = 0 for large values of Tₙ and are asymptotically of level α satisfy, for all h,
  πₙ(h/√n) → 1 − Φ(z_α − h μ′(0)/σ(0)).
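The theorem can be illustrated numerically for the sign test: under θₙ = h/√n the rejection frequency should approach 1 − Φ(z_α − h·2f(0)). A sketch, assuming a standard normal f and the illustrative choices h = 1, α = 0.05:

```python
from math import erf, sqrt, pi
import numpy as np

def Phi(x):
    """Standard normal cdf."""
    return 0.5 * (1 + erf(x / sqrt(2)))

rng = np.random.default_rng(1)
n, reps, h, z_alpha = 1000, 4000, 1.0, 1.6449       # alpha = 0.05
theta_n = h / sqrt(n)                               # local alternative theta_n = h / sqrt(n)

X = rng.normal(loc=theta_n, size=(reps, n))         # f standard normal (an assumption)
S = (X > 0).mean(axis=1)
reject = sqrt(n) * (S - 0.5) > z_alpha * 0.5        # sigma(0) = 1/2 for the sign test

slope = 2 / sqrt(2 * pi)                            # mu'(0)/sigma(0) = 2 f(0)
limit_power = 1 - Phi(z_alpha - h * slope)
print(round(reject.mean(), 3), "vs limit", round(limit_power, 3))
```

The empirical rejection frequency should be close to the limiting local power, here roughly 0.2.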
Asymptotic Power Functions
Proof:
Substituting h = 0 shows that the asymptotic level of the test is α iff H₀: θ = 0 is rejected for
  √n(Tₙ − μ(0))/σ(0) > z_α.
Thus,
  πₙ(θₙ) = P_{θₙ}(√n(Tₙ − μ(0)) > σ(0) z_α)
         = P_{θₙ}(√n(Tₙ − μ(θₙ))/σ(θₙ) > (σ(0) z_α − √n(μ(θₙ) − μ(0)))/σ(θₙ))
         → 1 − Φ(z_α − h μ′(0)/σ(0)).
Asymptotic Power Functions
• μ′(0)/σ(0) is called the slope of the sequence of tests.
• Example (sign test): the sign test has slope μ′(0)/σ(0) = 2f(0).
• Example (t-test): Tₙ = X̄ₙ/Sₙ; reject H₀ if √n Tₙ > z_α.
  Since √n(X̄ₙ − θ)/Sₙ ⇝ N(0, 1), under θₙ = h/√n
    √n(Tₙ − θₙ/σ) = √n(X̄ₙ − h/√n)/Sₙ + h(1/Sₙ − 1/σ) ⇝ N(0, 1),
  so μ(θ) = θ/σ and σ(θ) = 1, giving slope μ′(0)/σ(0) = 1/σ.
Asymptotic Power Functions
Example (Sign test vs. t-test):
• X₁, X₂, …, Xₙ is a random sample from an f(x − θ)-density, with f symmetric about 0, having a unique median and a finite second moment.
• Test H₀: θ = 0, i.e., that the observations are symmetrically distributed around 0. Compare the performance of the sign and t-tests.
• It suffices to compare the slopes of the two tests:
  2f(0) and 1/σ, respectively.
• For N(0, 1) the slopes are √(2/π) and 1.
Asymptotic Power Functions
Relative efficiency of the sign test versus the t-test for some distributions:

  Distribution | Efficiency (sign/t-test)
  Logistic     | π²/12
  Normal       | 2/π
  Laplace      | 2
  Uniform      | 1/3
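The table entries follow directly from the slopes: the efficiency is the squared slope ratio (2f(0)·σ)², with σ² the variance of f. A sketch recomputing the four entries from f(0) and σ² (density normalizations noted in the comments):

```python
from math import pi, sqrt

# Relative efficiency of the sign test vs. the t-test: (2 f(0))^2 * var,
# i.e. the squared ratio of the slopes 2 f(0) and 1/sigma.
densities = {
    "Logistic": (0.25, pi**2 / 3),        # standard logistic: f(0) = 1/4, var = pi^2/3
    "Normal":   (1 / sqrt(2 * pi), 1.0),  # standard normal
    "Laplace":  (0.5, 2.0),               # density e^{-|x|}/2
    "Uniform":  (0.5, 1 / 3),             # uniform on (-1, 1)
}
eff = {name: (2 * f0) ** 2 * var for name, (f0, var) in densities.items()}
for name, e in eff.items():
    print(f"{name:8s} {e:.4f}")
```

The printed values match the table: π²/12 ≈ 0.8225, 2/π ≈ 0.6366, 2, and 1/3.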
Consistency
• Definition: A sequence of tests with power functions θ ↦ πₙ(θ) is asymptotically consistent at level α against the alternative θ if it is asymptotically of level α and πₙ(θ) → 1.
• If a family of sequences of tests contains, for every level α, a sequence that is consistent against every alternative, then the corresponding tests are simply called consistent.
Consistency
Lemma 1: Let Tₙ be a sequence of statistics with Tₙ → μ(θ) in P_θ-probability for every θ. Then the family of tests that reject the null hypothesis H₀: θ = 0 for large values of Tₙ is consistent against every θ such that μ(θ) > μ(0).
Lemma 2: Suppose that Tₙ, μ, and σ are such that, for all h and θₙ = h/√n,
  √n(Tₙ − μ(θₙ))/σ(θₙ) ⇝ N(0, 1) under θₙ,
with μ′(0) > 0, σ continuous at 0, and σ(0) > 0. Suppose that the tests that reject H₀ for large values of Tₙ have nondecreasing power functions θ ↦ πₙ(θ). Then this family of tests is consistent against every alternative θ > 0.
Moreover, if πₙ(0) → α, then πₙ(θₙ) → α when √n θₙ → 0, and πₙ(θₙ) → 1 when √n θₙ → ∞.
Consistency
• Example (t-test):
The two-sample t-statistic (Ȳₙ − X̄ₙ)/S converges in probability to E(Y − X)/σ, where
  σ² = lim_{n→∞} var(Ȳₙ − X̄ₙ).
If the null hypothesis postulates that EY = EX, then the test that rejects the null hypothesis for large values of the t-statistic is consistent against every alternative for which EY > EX.
Asymptotic relative efficiency
• Sequences of tests can be ranked in quality by comparing their asymptotic power functions.
• For the test statistics we have seen so far, this comparison involves the “slopes” of the tests.
• The concept of relative efficiency yields a method to quantify the interpretation of the slopes.
Asymptotic relative efficiency
• Sequence of testing problems: H₀: θ = 0 vs. H₁: θ = θ_ν.
• Requirement: the tests need to attain asymptotic level α and power γ ∈ (α, 1).
• Let πₙ be the power function of a test when n observations are available, and let n_ν be the minimal number of observations such that both
  π_{n_ν}(0) ≤ α and π_{n_ν}(θ_ν) ≥ γ.
• The limit (if it exists) lim_{ν→∞} n_{ν,2}/n_{ν,1} is called the (asymptotic) relative efficiency or Pitman efficiency of the first sequence of tests with respect to the second one.
• A relative efficiency larger than 1 indicates that fewer observations are needed with the first sequence of tests, which may then be considered the better one.
Asymptotic relative efficiency
Theorem: Consider statistical models (P_{n,θ}: θ ≥ 0) such that ‖P_{n,θ} − P_{n,0}‖ → 0 as θ → 0, for every n. Let T_{n,1} and T_{n,2} be sequences of statistics such that, for every sequence θₙ ↓ 0,
  √n(T_{n,i} − μᵢ(θₙ))/σᵢ(θₙ) ⇝ N(0, 1) under θₙ,
where μᵢ is differentiable at 0 with μ′ᵢ(0) > 0, and σᵢ is continuous at 0 with σᵢ(0) > 0, for i ∈ {1, 2}. Then the relative efficiency of the tests that reject H₀: θ = 0 for large values of T_{n,i} is equal to
  ((μ′₁(0)/σ₁(0)) / (μ′₂(0)/σ₂(0)))²
for every sequence θ_ν ↓ 0, independently of α > 0 and γ ∈ (α, 1).
If the power functions of the tests based on T_{n,i} are nondecreasing for every n, then the assumption of asymptotic normality of T_{n,i} can be relaxed to asymptotic normality under every sequence θₙ = O(1/√n) only.
2. Efficiency of Tests
Asymptotic Representation Theorem
• A randomized test (test function) φ in an experiment (𝒳, 𝒜, P_h: h ∈ H) is a measurable map φ: 𝒳 → [0, 1] on the sample space.
• The power function of a test φ is the function h ↦ π(h) = E_h φ(X).
Theorem: Let the sequence of experiments ℰₙ = (P_{n,h}: h ∈ H) converge to a dominated experiment ℰ = (P_h: h ∈ H). Suppose that a sequence of power functions πₙ of tests in ℰₙ converges pointwise, i.e., πₙ(h) → π(h) for every h and some arbitrary function π. Then π is a power function in the limit experiment, i.e., there exists a test φ in ℰ with π(h) = E_h φ(X) for every h.
Testing Normal Means
• Suppose X is N_k(h, Σ)-distributed, with Σ known and h unknown.
• Test: H₀: cᵀh = 0 vs. H₁: cᵀh > 0, for a known vector c with cᵀΣc > 0.
Proposition: The test that rejects H₀ if cᵀX > z_α √(cᵀΣc) is uniformly most powerful at level α for testing H₀: cᵀh = 0 vs. H₁: cᵀh > 0 based on X.
Local Asymptotic Normality
• If the model (P_θ: θ ∈ Θ) is differentiable in quadratic mean, then the local experiments converge to a Gaussian experiment (recall yesterday’s last talk!):
  (Pⁿ_{θ₀+h/√n}: h ∈ ℝᵏ) → (N(h, I_{θ₀}⁻¹): h ∈ ℝᵏ).
• The sequence of power functions θ ↦ πₙ(θ) in the original experiments induces the sequence of power functions h ↦ πₙ(θ₀ + h/√n) in the local experiments. Suppose πₙ(θ₀ + h/√n) → π(h) for every h and some π. Then, by the asymptotic representation theorem, this limit π is a power function in the Gaussian limit experiment.
Local Asymptotic Normality
• Suppose θ is real and πₙ is of asymptotic level α for testing
  H₀: θ ≤ θ₀ vs. H₁: θ > θ₀.
Then π(0) = lim_{n→∞} πₙ(θ₀) ≤ α, and hence π corresponds to a level-α test for
  H₀: h = 0 vs. H₁: h > 0
in the limit experiment.
• By the proposition for testing normal means, π must be bounded by the power function of the uniformly most powerful level-α test in the limit experiment. Thus, for every h (take c = 1 and Σ = I_{θ₀}⁻¹ in the proposition),
  lim_{n→∞} πₙ(θ₀ + h/√n) ≤ 1 − Φ(z_α − h √I_{θ₀}).
Local Asymptotic Normality
• As stated earlier, a sequence of power functions with
  πₙ(θ₀ + h/√n) → 1 − Φ(z_α − hs)
for every h has slope s. From the upper bound, √I_{θ₀} is the largest possible slope.
• The relative efficiency of the best test and a test with slope s is
  I_{θ₀}/s²,
which can be interpreted as the number of observations needed with the given sequence of tests with slope s divided by the number of observations needed with the best test to obtain the same power.
Local Asymptotic Normality
Theorem 15.4:
Let Θ ⊂ ℝᵏ be open and ψ: Θ → ℝ differentiable at θ₀, with ψ̇_{θ₀} ≠ 0 and ψ(θ₀) = 0. Let the models (P_{n,θ}: θ ∈ Θ) be locally asymptotically normal at θ₀ with nonsingular Fisher information I_{θ₀} and norming rate rₙ → ∞.
Then the power functions θ ↦ πₙ(θ) of any sequence of level-α tests for testing H₀: ψ(θ) ≤ 0 vs. H₁: ψ(θ) > 0 satisfy, for every h with ψ̇_{θ₀}h > 0:
  limsup_{n→∞} πₙ(θ₀ + h/rₙ) ≤ 1 − Φ(z_α − ψ̇_{θ₀}h / √(ψ̇_{θ₀} I_{θ₀}⁻¹ ψ̇_{θ₀}ᵀ)).
Local Asymptotic Normality
Addendum:
Let Tₙ be statistics such that
  Tₙ = ψ̇_{θ₀} I_{θ₀}⁻¹ Δ_{n,θ₀} / √(ψ̇_{θ₀} I_{θ₀}⁻¹ ψ̇_{θ₀}ᵀ) + o_{P_{n,θ₀}}(1).
Then the sequence of tests that reject H₀ for values Tₙ > z_α is asymptotically optimal, in the sense that for every h
  P_{θ₀ + rₙ⁻¹h}(Tₙ ≥ z_α) → 1 − Φ(z_α − ψ̇_{θ₀}h / √(ψ̇_{θ₀} I_{θ₀}⁻¹ ψ̇_{θ₀}ᵀ)).
*(Δ_{n,θ₀} is a sequence of statistics that converges in distribution under θ₀ to a normal N_k(0, I_{θ₀})-distribution.)
Local Asymptotic Normality
• The point θ₀ in the theorem is on the boundary of H₀ and H₁.
• If the dimension k > 1, then this boundary is (k − 1)-dimensional, and there are many possible values for θ₀.
• If the dimension k = 1, the boundary point θ₀ is typically unique and hence known, and we can use Tₙ = I_{θ₀}^{−1/2} Δ_{n,θ₀} to construct an optimal sequence of tests for the problem H₀: θ = θ₀. These are known as score tests.
One-Sample Location
• X₁, X₂, …, Xₙ is a sample from an f(x − θ)-density, with f symmetric about 0 and with finite Fisher information I_f; f may be known or (partially) unknown.
• To test: H₀: θ = 0 vs. H₁: θ > 0.
• For fixed f, the models (∏_{i=1}^n f(xᵢ − θ): θ ∈ ℝ) are locally asymptotically normal at θ = 0 with Δ_{n,0} = −n^{−1/2} Σ_{i=1}^n (f′/f)(Xᵢ), norming rate √n, and Fisher information I_f.
• From the preceding sections, the best asymptotic level-α power function for known f is 1 − Φ(z_α − h√I_f).
• Tₙ = −(1/√n)(1/√I_f) Σ_{i=1}^n (f′/f)(Xᵢ) + o_{P₀}(1).
• Then, according to Theorem 15.4, the sequence of tests that reject H₀ if Tₙ > z_α attains the bound and hence is asymptotically optimal.
One-Sample Location
Example (t-test):
The standard normal density f₀ possesses score function (f₀′/f₀)(x) = −x and I_{f₀} = 1. Consequently, if the underlying distribution is normal, then the optimal test statistics should satisfy
  Tₙ = √n X̄ₙ/σ + o_{P₀}(1).
The t-statistic √n X̄ₙ/Sₙ fulfills this requirement. That is no surprise, because for normally distributed observations the t-test is uniformly most powerful for every finite n and hence is certainly asymptotically optimal.
*The t-statistic simply replaces the unknown standard deviation σ by the estimate Sₙ.
One-Sample Location
In this example, the t-statistic simply replaces the unknown standard deviation σ by an estimate. This approach can be followed for most scale families. Under some regularity conditions, the statistic
  Tₙ = −(1/√n)(1/√I_{f₀}) Σ_{i=1}^n (f₀′/f₀)(Xᵢ/σ̂ₙ)
should yield asymptotically optimal tests, given a consistent sequence of scale estimators σ̂ₙ.
3. Chi-Square Tests
Quadratic Forms in Normal Vectors
• χ²_k ≝ the distribution of Σ_{i=1}^k Zᵢ² for i.i.d. N(0, 1)-distributed Z₁, Z₂, …, Z_k.
• Σ_{i=1}^k Zᵢ² = ‖Z‖² for the standard normal vector Z = (Z₁, …, Z_k).
Lemma: If the vector X is N_k(0, Σ)-distributed, then ‖X‖² is distributed as Σ_{i=1}^k λᵢZᵢ² for i.i.d. N(0, 1)-distributed Z₁, …, Z_k and λ₁, …, λ_k the eigenvalues of Σ.
Proof: There exists an orthogonal matrix O with OΣOᵀ = diag(λᵢ). Then the vector OX is N_k(0, diag(λᵢ))-distributed, which is the same as the distribution of the vector (√λ₁Z₁, …, √λ_kZ_k). Now ‖X‖² = ‖OX‖² has the same distribution as Σ_{i=1}^k λᵢZᵢ².
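The lemma is easy to check by simulation: draw X ~ N_k(0, Σ) and compare ‖X‖² with Σλᵢ Zᵢ² through their moments (E = tr Σ, Var = 2Σλᵢ²). A sketch with an arbitrary illustrative Σ:

```python
import numpy as np

rng = np.random.default_rng(2)
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])            # an arbitrary illustrative covariance matrix
lam = np.linalg.eigvalsh(Sigma)           # eigenvalues lambda_i of Sigma

reps = 200_000
X = rng.multivariate_normal([0.0, 0.0], Sigma, size=reps)
lhs = (X ** 2).sum(axis=1)                # ||X||^2

Z = rng.standard_normal((reps, 2))
rhs = (lam * Z ** 2).sum(axis=1)          # sum_i lambda_i Z_i^2

# Equality in distribution implies matching moments: mean tr(Sigma), var 2 sum lambda_i^2
print(round(lhs.mean(), 2), round(rhs.mean(), 2), round(float(np.trace(Sigma)), 2))
```

Both sample means should agree with tr(Σ) = 3, and both sample variances with 2Σλᵢ².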
Pearson Statistics
• Suppose we observe Xₙ = (X_{n,1}, …, X_{n,k}) with a multinomial distribution corresponding to n trials and k classes having probabilities p = (p₁, …, p_k).
• The Pearson statistic for testing H₀: p = a is given by
  Cₙ(a) = Σ_{i=1}^k (X_{n,i} − naᵢ)²/(naᵢ).
Theorem: If the vector Xₙ is multinomially distributed with parameters n and a = (a₁, …, a_k) > 0, then under a the sequence Cₙ(a) converges in distribution to the χ²_{k−1}-distribution.
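A quick simulation illustrates the theorem: under H₀ the Pearson statistic Cₙ(a) should behave like χ²_{k−1}, with mean k − 1 and variance about 2(k − 1). A sketch with illustrative a and n:

```python
import numpy as np

rng = np.random.default_rng(3)
a = np.array([0.2, 0.3, 0.5])             # H0 cell probabilities, k = 3 (illustrative)
n, reps = 500, 20_000

counts = rng.multinomial(n, a, size=reps)  # reps multinomial vectors X_n
expected = n * a
C = ((counts - expected) ** 2 / expected).sum(axis=1)   # Pearson statistic C_n(a)

# chi^2_{k-1} with k - 1 = 2: mean 2, variance 4
print(round(C.mean(), 2), round(C.var(), 2))
```

The sample mean and variance of C should be close to 2 and 4, respectively.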
Pearson Statistics
• The Pearson statistic is oddly asymmetric in the observed and true frequencies (which is motivated by the form of the asymptotic covariance matrix).
• One method to symmetrize the statistic leads to the Hellinger statistic
  Hₙ²(a) = 4 Σ_{i=1}^k (X_{n,i} − naᵢ)²/(√X_{n,i} + √(naᵢ))² = 4 Σ_{i=1}^k (√X_{n,i} − √(naᵢ))².
• Up to a multiplicative constant, it is the Hellinger distance between the discrete probability distributions on {1, …, k} with probability vectors a and Xₙ/n, respectively.
• As Xₙ/n − a → 0 in probability, Hₙ²(a) is asymptotically equivalent to Cₙ(a).
Testing Independence
• Suppose that each element of a population can be classified by two characteristics, having k and r levels, respectively:

  N₁₁ … N₁ᵣ  N₁.
   ⋮   ⋱  ⋮    ⋮
  Nₖ₁ … Nₖᵣ  Nₖ.
  N.₁ … N.ᵣ  N

• The classification of a random sample of size n from the population gives a matrix (X_{n,ij}), multinomially distributed with parameters n and probabilities p_ij = N_ij/N.
• H₀: p_ij = aᵢbⱼ (the categories are independent), for unknown probability vectors (aᵢ) and (bⱼ).
Testing Independence
• The ML estimators of a and b under H₀:
  âᵢ = X_{n,i.}/n and b̂ⱼ = X_{n,.j}/n.
• The modified Pearson statistic with these estimators:
  Cₙ(â⊗b̂) = Σ_{i=1}^k Σ_{j=1}^r (X_{n,ij} − nâᵢb̂ⱼ)²/(nâᵢb̂ⱼ).
Corollary: If the (k × r) matrices Xₙ are multinomially distributed with parameters n and p_ij = aᵢbⱼ > 0, then the sequence Cₙ(â⊗b̂) converges in distribution to the χ²_{(k−1)(r−1)}-distribution.
Testing Independence
Example: Google wants to test the performance of new search algorithms. Google might test three algorithms using a sample of 10,000 google.com search queries.

                  current  test 1  test 2   Total
  No new search     3511    1749    1818     7078
  New search        1489     751     682     2922
  Total             5000    2500    2500    10000

• To test:
  H₀: The algorithms each perform equally well.
  H₁: The algorithms do not perform equally well.
Testing Independence
Example: ML estimators for a and b: âᵢ = X_{n,i.}/n, b̂ⱼ = X_{n,.j}/n,
• nâᵢb̂ⱼ = X_{n,i.} · X_{n,.j}/n is the expected count of each cell (i, j).

                  current      test 1         test 2        Total
  No new search  3511 (3539)  1749 (1769.5)  1818 (1769.5)   7078
  New search     1489 (1461)   751 (730.5)    682 (730.5)    2922
  Total          5000         2500           2500           10000

• Cₙ(â⊗b̂) = Σ (observed count − expected count)²/expected count = 6.120
• df = (k − 1)(r − 1) = (2 − 1)(3 − 1) = 2
• p-value = 0.047, thus we reject H₀ at significance level α = 0.05. That is, the data provide convincing evidence that there is some difference in performance among the algorithms.
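The numbers above are easy to reproduce. A sketch computing the modified Pearson statistic, degrees of freedom, and p-value for the 2 × 3 table (the closed-form p-value exp(−C/2) is valid only because df = 2):

```python
import numpy as np
from math import exp

# Observed counts (rows: outcome, columns: current / test 1 / test 2)
obs = np.array([[3511, 1749, 1818],
                [1489,  751,  682]])
n = obs.sum()

a_hat = obs.sum(axis=1) / n                 # row ML estimators a_i
b_hat = obs.sum(axis=0) / n                 # column ML estimators b_j
expected = n * np.outer(a_hat, b_hat)       # expected counts n * a_i * b_j

C = ((obs - expected) ** 2 / expected).sum()
df = (obs.shape[0] - 1) * (obs.shape[1] - 1)
p_value = exp(-C / 2)                       # chi^2_2 survival function at C
print(round(float(C), 3), df, round(p_value, 3))   # → 6.12 2 0.047
```

This recovers C ≈ 6.120, df = 2, and the p-value 0.047 from the slide.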
Goodness-of-Fit Tests
• Given a random sample X₁, X₂, …, Xₙ from a distribution P, we want to test H₀: P ∈ 𝒫₀.
• Testing goodness-of-fit typically focuses on no particular alternative, which is why χ² statistics are reasonable.
• Partition the sample space into finitely many sets: 𝒳 = ∪ⱼ𝒳ⱼ.
• ℙₙ(A) = n⁻¹ #{1 ≤ i ≤ n: Xᵢ ∈ A} is the fraction of observations in A.
• The vector n(ℙₙ𝒳₁, …, ℙₙ𝒳ₖ) is multinomially distributed, and the modified chi-squared statistic is given by
  Σ_{i=1}^k n(ℙₙ𝒳ᵢ − P(𝒳ᵢ))²/P(𝒳ᵢ).
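A minimal sketch of the construction, assuming P is the standard normal distribution under H₀ and an illustrative partition into four cells at −1, 0, 1:

```python
import numpy as np
from math import erf, sqrt

def Phi(x):
    """Standard normal cdf."""
    return 0.5 * (1 + erf(x / sqrt(2)))

rng = np.random.default_rng(5)
n = 1000
X = rng.standard_normal(n)                      # sample; here H0: P = N(0,1) is true

edges = [-np.inf, -1.0, 0.0, 1.0, np.inf]       # partition into k = 4 cells
counts = np.histogram(X, bins=edges)[0]         # n * P_n(cell_j)
p = np.diff([0.0, Phi(-1), Phi(0), Phi(1), 1.0])  # P(cell_j) under H0

chi2 = ((counts - n * p) ** 2 / (n * p)).sum()  # the modified chi-squared statistic
print(round(float(chi2), 3))  # compare with the chi^2_3 0.95-quantile, 7.815
```

Since H₀ holds for this sample, the statistic should usually fall below the χ²₃ critical value.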
Asymptotic Efficiency
• The asymptotic null distributions of the various versions of the Pearson statistic enable us to set critical values, but by themselves give no information on the asymptotic power of the tests.
• The asymptotic power can be measured in various ways:
  – the most important method is to consider local limiting power functions (discussed earlier);
  – a second method to evaluate the asymptotic power is by Bahadur efficiencies.
Thank you for your attention.