Document 7699178

Download Report

Transcript Document 7699178

Computer algebra and
rank statistics
Alessandro Di Bucchianico
HCM Workshop Coimbra
November 5, 1997
How to run this presentation?
•
•
•
•
the presentation runs itself most of the time
click the mouse if you want to continue
type S to stop or restart the presentation
underlined items are hyperlinks to files on
the World Wide Web (usually Postscripts
files of technical reports)
Enjoy my presentation!
2
Outline
•
•
•
•
•
•
General remarks on nonparametric methods
What is computer algebra?
Case study: the Mann-Whitney statistic
Critical values of rank test statistics
Moments of the Mann-Whitney statistic
Conclusions
3
General remarks on
nonparametric methods
Practical problems
• tables (limited, errors, not exact,…)
• limited availability in statistical software
• procedures in statistical software often only
based on asymptotics
4
General remarks on
nonparametric methods
Mathematical problems
• in general no closed expression for
distribution function
• direct enumeration only feasible for small
sample sizes
• recurrences are time-consuming
5
What is computer algebra?
A
HLI M
E
@
ђH
L
8
<
8
<
D
HL B@
DHLђF
Sample session in Mathematica 3.0
Expand
1+ x
2
1+ x
1 + x + x2 + x3
<< "DiscreteMath`RSolve`"
ik 8 <
y
{
1
S e ri e s T e rm
1-
,
x4
x, 0 , n
ђ@D
BH
LH
L
FH
L@D
@D
If E v e n n ,
1
-
1
2
1
4
If n
і
0, 1, 0
n 2
If n
і
0, 1, 0 , 0
1
+
-
4
1
n
If n
і
0, 1, 0
+
SeriesTerm 1
Assumptions ->
1
4
-
+
1
4
1 - x ^4 ,
n і 0
n
+
x, 0, n ,
If Even n ,
1
2
-
1
n 2
,0
6
Case study: Mann-Whitney
statistic
independent samples X1,…,Xm and Y1,…,Yn
continuous distribution functions F, G resp.
(hence, no ties with probability one)
order the pooled sample from small to large
7
Mann-Whitney (continued)
Wilcoxon: Wm,n= Si rank(Xi)
Mann-Whitney: Mm,n = #{(i,j) | Yj < Xi}
Wm,n = Mm,n + ½ m (m+1)
What is the distribution of Mm,n under H0:F=G?
8
Under H0, we have:
mn
mn
 P ( M m ,n
k 0
1
 k) x 
m  n


 n 
i
(
1

x
)

k
i  n 1
m
j
(
1

x
)

j 1
Ы
H
L
AA@
E
E
DЫ
HL
@
D
:
>
Mann-Whitney generating function in Mathematica 3.0
1
Expand Simplify
1
10
x2
x
+
10
+
5
Binomial 5, 3
x3
+
5
x4
+
5
x5
+
10
5
i= 3
3
j= 1
1 - xi
1 - xj
x6
+
10
CoefficientList %, x
1
10
,
1
10
,
1
5
,
1
5
,
1
5
,
1
10
,
1
10
9
@
D
A
8
<
A
A
H
L
AAђ @DЫ
E
E
E
E
E
Ы
H
L
@
D
@
@
D
D
@
D
@
8@ @
@
@
D
D
D
<
@
@D
D
D
D
MannWhitneyCumFreq m_, n_
:= Module
x, i, j ,
FoldList Plus, 0, CoefficientList
Expand Factor 1 Binomial m + n, n *
MannWhitneyLeftSigValue m_, n_, k_
MannWhitneyLeftCritValue m_, n_, a _
m+ n
i= n+ 1
m
j= 1
1 - xi
1 - xj
,x
:= MannWhitneyCumFreq
:= Module
value = Length Select MannWhitneyCumFreq m, n , # Ј a &
If NonNegative N value
k+ 2
- 2 ,
, value, "no critical value exists"
10
Computational speed (Pentium 133 MHz)
Exact: P(M5,5  4) = 1/21  0.0476
computing time: 0.05 sec (generating function: degree 25)
P(M5,5  4)  0.0384
Exact: P(M20,20  138) = 0.0482 (rounded)
computing time: 8.5 sec (generating function: degree 400)
P(M20,20  138)  0.0475
Asymptotics and exact calculations are both useful!
11
Other examples of nonparametric test statistics
with closed form for generating function include:
•
•
•
•
•
Wilcoxon signed rank statistic
Kendall rank correlation statistic
Kolmogorov one-sample statistic
Smirnov two-sample statistic
Jonckheere-Terpstra statistic
Consult the combinatorial literature!
What to do if there is no generating function?
12
Linear rank statistics
Tm , n 
mn
 a ( ) Z

 1
Z = 1 if th order statistic in the pooled sample is an X-observation, and 0 otherwise
Streitberg & Röhmel 1986 (cf. Euler 1748):
N 
k m
a (1)
a( N )
Pr(
T
m
,
n

k
)
x
y

(
1

x
y
)
...
(
1

x
y)




m 
mn  N k 

Branch-and-bound algorithm (Van de Wiel)
13
Moments of Mann-Whitney statistic
Mann and Whitney (1947) calculated 4th central moment
Fix and Hodges (1955) calculated 6th central moment
Computations are based on recurrences
Can we improve?
solution: computer
algebra and generating functions
14
Computing moments of Mm,n
E X (n)
 dn
  n
 dx

k P( X  k ) x 
 x 1
k
recompute E(Mm,n) (following René Swarttouw)
n 1
nm
(1  x )...(1  x )
Gm ,n ( x ) :
m
(1  x )...(1  x )
m
m
k 1
k 1
log Gm ,n ( x )   log( 1  x n  k )   log( 1  x k )
15
m
m
d
k x k 1
( n  k ) x n  k 1
log Gm ,n ( x )  

k
n k
dx
1

x
1

x
k 1
k 1
d
G m ,n ( x )
d
dx

log Gm ,n ( x ) 
G m ,n ( x )
dx
k x k 1 (1  x n  k )  ( n  k ) x n  k 1 (1  x k )

k
nk
(
1

x
)
(
1

x
)
k 1
m
Gm ,n is a polynomial , hence G ' m ,n (1)  lim G ' m ,n ( x )
x 1
n  m
Fact : lim Gm ,n ( x )  
 .
x 1
 n 
16
Hence, it remains to calculate for 1  k  m :
lim
x 1
kx
k 1
nk
n  k 1
(1  x )  ( n  k ) x
k
nk
(1  x ) (1  x )
(1  x )
k
k (1  x n )  n x n (1  x k )
lim
k
n k
x 1
(1  x ) (1  x )
After some simplifications:
n 1
k (1  ...  x )  n (1  ...  x
lim
x 1
k (n  k ) (1  x )
n  k 1
)
17

L’Hôpital’s rule yields that the limit equals:
1  1
1
 n
 kn( n  1)  n( kn  k ( k  1)  
k (n  k )  2
2
 2
It is tedious to perform these computations by hand.
Alternative:
compute moments using Mathematica.
18
Mathematica procedures for moments of Mm,n:
A
@D
A
E
A
E
@
D
@
8
<
@@
@@@
@
D
8
<
D
D
D
D
8 <D
D@
D
@
8
<
@
@
@
@
@
@
D
D
8
<
D
D
@
D
D
@
D
8
<
D
@
@
@
@
@
D
8
<
D
8
<
D
@ @
@
@
D8<D8 <D
D
D
D
D
Log G k_, n_, x_
:= Log 1 - x n+ k - Log 1 - x k
DerivativeOfLogG r_
:= Module
j, der ,
Sum Simplify r! Coefficient
Normal Series LogG k, n, x , x, 1, r + 1
, x - 1, r
,
k, 1, m
FactorialMoments r_
:= Module
i, j, equations ,
equations = Table
ReplaceAll Simplify Together D, Log G z
, z, j
G z ® 1 == DerivativeOfLogG j , j, 1, r
,
;
Flatten ReplaceAll Table D G z , z, i
, i, 1, r
Solve equations, Table D G z , z, i
, i, 1, r
,
19
I H LI
8th central moment of Mm,n
1
m n 1+ m + n
2
3
4
- 96 m + 96 m + 240 m - 240 m -
34560
432 m 5 - 144 m 6 - 96 n + 192 m n + 224 m 2 n - 540 m 3 n 100 m 4 n + 780 m 5 n + 404 m 6 n + 96 n2 + 224 m n2 - 600 m 2 n2 200 m 3 n2 + 900 m 4 n2 - 48 m 5 n2 - 420 m 6 n2 + 240 n3 - 540 m n3 200 m 2 n3 + 1095 m 3 n3 - 395 m 4 n3 - 735 m 5 n3 + 175 m 6 n3 240 n4 - 100 m n4 + 900 m 2 n4 - 395 m 3 n4 - 630 m 4 n4 + 525 m 5 n4 -
M
432 n5 + 780 m n5 - 48 m 2 n5 - 735 m 3 n5 + 525 m 4 n5 - 144 n6 +
404 m n6 - 420 m 2 n6 + 175 m 3 n6
20
Conclusions
• generating functions are also useful in
nonparametric statistics
• computer algebra is a natural tool for
mathematicians
• asymptotics and exact calculations
complement each other
21
Topics under investigation
• tests for censored data
• power calculations
• nonparametric ANOVA (Kruskal-Wallis, block
designs, multiple comparisons)
• Spearman’s  (rank correlation)
• multimedia/ World Wide Web implementation
Click on underlined items to obtain Postscript file of technical report
22
References
• A. Di Bucchianico, Combinatorics,
computer algebra and the Wilcoxon-MannWhitney test, to appear in J. Stat. Plann. Inf.
• B. Streitberg and J. Röhmel, Exact
distributions for permutation and rank tests:
An introduction to some recently published
algorithms, Stat. Software Newsletter 12
(1986), 10-18
23
References (continued)
• M.A. van de Wiel, Exact distributions of
nonparametric statistics using computer
algebra, Master’s Thesis, TUE, 1996
• M.A. van de Wiel and A. Di Bucchianico,
The exact distribution of Spearman’s rho,
technical report
• M.A. van de Wiel, A. Di Bucchianico and P.
van der Laan, Exact distributions of
nonparametric test statistics using computer
24
algebra, technical report
References (continued)
• M.A. van de Wiel, Edgeworth expansions
with exact cumulants for two-sample linear
rank statistics , technical report
• M.A. van de Wiel, Exact distributions of
two-sample rank statistics and block rank
statistics using computer algebra , technical
report
25
The End
26