Diapositiva 1 - Bicocca Applied Statistics Center


Singly and doubly ordered
cumulative correspondence
analysis.
L. D’Ambra*, E. Beh** and I. Camminatiello*
*University of Naples Federico II (Italy)
** University of Newcastle (Australia)
Outline
 A short review
 Singly ordered cumulative correspondence analysis: methodology
and application in industrial experiment
 Doubly ordered cumulative correspondence analysis: some
developments and application with Van Rijckevorsel’s data and
Monitoring Automobile Pollution Emissions (ARTEMIS Project)
 We also propose a unified approach
Review (1/3)
 In multidimensional data analysis, Correspondence Analysis (CA)
is one of the most popular tools for studying the
association between two categorical variables.
 This method is based on the chi-squared (phi-squared) statistic
 Drawback
It does not take into consideration the ordered
nature of the categories
Review (2/3)
 There are some contributions that deal with ordinal
categorical variables, including those of Parsa and Smith
(1993), Ritov and Gilula (1993) and Schriever (1983)
 These procedures involve constraining the output
obtained from applying singular value decomposition
(SVD) so that the coordinates in the first dimension have
an ordered structure.
 An alternative approach applies moment decomposition
(MD - Beh, 1997) or hybrid decomposition (HD - Beh,
2004), which use orthogonal polynomials in
order to detect linear, quadratic and cubic components
Review (3/3)
 In some industrial experiments the output consists of
categorical data (a contingency table) with ordered
categories.
 For analyzing such data, Taguchi (1966, 1974) proposed the
Accumulation Analysis method as an alternative to Pearson's chi-squared test.
 His motivation for recommending this technique appears to be its
similarity to ANOVA for quantitative variables.
 More recently, Light and Margolin (1971) proposed a method called
CATANOVA by defining an appropriate measure of variation for
categorical data.
 Unlike these methods Taguchi considers situations with ordered
categories and does ANOVA on the cumulative frequencies
Aim of our paper
 In this paper we explore the development of
correspondence analysis which takes into
account the presence of ordered variables by
considering the cumulative sum of cell
frequencies across the variables.
Singly ordered cumulative correspondence
analysis
 Beh, D’Ambra, Simonetti (Carme Rotterdam 2007;
Communication in Statistics 2011) performed
correspondence analysis when cross-classified variables
have an ordered structure by considering Taguchi’s
statistic.
 Taguchi’s statistic is an appropriate measure of non-symmetric association for two categorical variables of
which one is on an ordinal scale.
 It takes into account the presence of an ordered variable
by considering the cumulative sum of cell frequencies
across the variable.
Notation (1/3)
 $N = (n_{ij})$: the absolute two-way contingency
table that cross-classifies n units
according to I ordered row categories
and J ordered column categories
 $P = \frac{1}{n} N$: the relative two-way contingency table
 $n_{i\cdot}$, $n_{\cdot j}$: the row and column marginals
 $M_J$: a triangular matrix of 1’s with the last (J-th) row
removed, so that it is of dimension (J-1)×J
 $M$: a triangular matrix of 1’s of dimension J x J
 $L$: a triangular matrix of 1’s of dimension I x I
Notation (2/3)
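 $r$ and $c$: the vectors with the marginal frequencies of P
 $D_r$ and $D_c$: the diagonal matrices with the marginal
frequencies of P
 the cumulative frequencies
$$z_{i1} = n_{i1}, \quad z_{i2} = n_{i1} + n_{i2}, \quad \ldots, \quad z_{iJ} = n_{i1} + \cdots + n_{iJ}$$
 the cumulative column proportions
$$d_1 = \frac{n_{\cdot 1}}{n}, \quad d_2 = \frac{n_{\cdot 1} + n_{\cdot 2}}{n}, \quad \ldots, \quad d_J = \frac{n_{\cdot 1} + \cdots + n_{\cdot J}}{n}$$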
Taguchi’s statistic (1/2)
Taguchi (1966) proposed the following statistic
$$T = \sum_{j=1}^{J-1} w_j \left[ \sum_{i=1}^{I} n_{i\cdot} \left( \frac{z_{ij}}{n_{i\cdot}} - d_j \right)^2 \right]$$
$w_1, \ldots, w_{J-1}$ are weights > 0. Two choices are possible:
$$w_j = \left[ d_j (1 - d_j) \right]^{-1} \quad \text{or} \quad w_j = \frac{1}{J}$$
The weighting
$$w_j = \frac{1}{J - j}$$
gives more weight to the latter categories
Taguchi’s statistic (2/2).
 Taguchi's (1966, 1974) "cumulative-sums" statistic T is obtained by assigning
to each column a weight that is inversely proportional
to the conditional expectation of the j-th term
(conditional on the given marginals):
$$w_j = \left[ d_j (1 - d_j) \right]^{-1}, \quad j = 1, \ldots, J-1$$
In this paper we use this weighting system.
 A simpler statistic T assigns each column
the weights 1/J or 1/(J-j)
The Pearson chi-squared statistic and
Taguchi’s statistic
Nair (1987) demonstrated that the link between the Pearson
chi-squared statistic and Taguchi’s statistic is
$$T = \sum_{j=1}^{J-1} \chi_j^2$$
where $\chi_j^2$ is the Pearson chi-squared statistic for the contingency
table obtained by aggregating column categories 1 to j, and
aggregating the column categories j+1 to J.
For this reason, it is also referred to as the cumulative chi-squared statistic.
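The identity above can be checked numerically. The following is a minimal sketch, assuming NumPy and the weights $w_j = [d_j(1-d_j)]^{-1}$; it uses the cumulative counts of Phadke's data tabulated later in these slides (18 factor levels, 54 observations each). The helper name `pearson_chi2` is illustrative, not taken from the slides.

```python
import numpy as np

# Cumulative counts z_ij for Phadke's data (rows A1..F3; columns (I), (I+II),
# (I+II+III), (I+II+III+IV)), copied from the slides; every row totals 54.
Z = np.array([
    [34, 40, 51, 53], [7, 22, 34, 41], [8, 14, 19, 32],    # A1-A3
    [25, 40, 46, 51], [20, 28, 36, 43], [4, 8, 22, 32],    # B1-B3
    [19, 30, 32, 39], [11, 20, 28, 39], [19, 26, 44, 48],  # C1-C3
    [20, 25, 34, 41], [13, 31, 42, 44], [16, 20, 28, 41],  # D1-D3
    [21, 27, 38, 43], [16, 29, 36, 42], [12, 20, 30, 41],  # E1-E3
    [21, 23, 26, 34], [21, 30, 40, 46], [7, 23, 38, 46],   # F1-F3
], dtype=float)
row_tot = np.full(Z.shape[0], 54.0)   # n_i.
n = row_tot.sum()

d = Z.sum(axis=0) / n                 # cumulative column proportions d_1..d_{J-1}
w = 1.0 / (d * (1.0 - d))             # weights w_j = [d_j (1 - d_j)]^{-1}

# Taguchi's statistic: sum_j w_j sum_i n_i. (z_ij / n_i. - d_j)^2
inner = np.sum(row_tot[:, None] * (Z / row_tot[:, None] - d) ** 2, axis=0)
T = float(np.sum(w * inner))

def pearson_chi2(table):
    """Pearson chi-squared statistic of a two-way table of counts."""
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / table.sum()
    return float(np.sum((table - expected) ** 2 / expected))

# One I x 2 table per cut point j: categories 1..j versus j+1..J.
chi2 = [pearson_chi2(np.column_stack([Z[:, j], row_tot - Z[:, j]]))
        for j in range(Z.shape[1])]

print(round(T, 4), [round(c, 3) for c in chi2])
```

The four components and their sum can be compared with the values reported for Phadke's data in the example slides.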
Taguchi’s statistic in matrix notation (1/2)
Taguchi’s statistic may be expressed in
matrix notation by
$$T = \frac{1}{n} \, \mathrm{trace}\left( D_r^{-1/2} N A^T W A N^T D_r^{-1/2} \right)$$
W (J-1, J-1) is the diagonal matrix of weights
A (J-1, J) is the matrix involving the cumulative column proportions
$$A = \begin{pmatrix}
1-d_1 & -d_1 & \cdots & -d_1 & -d_1 \\
1-d_2 & 1-d_2 & \cdots & -d_2 & -d_2 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
1-d_{J-1} & 1-d_{J-1} & \cdots & 1-d_{J-1} & -d_{J-1}
\end{pmatrix}$$
Taguchi’s statistic in matrix notation (2/2)
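Considering that
$$A = M_J - d \, 1^T, \qquad d = M_J c$$
Taguchi’s statistic after some algebra may be rewritten as
$$\begin{aligned}
T &= \frac{1}{n} \, \mathrm{trace}\left( D_r^{-1/2} (N M_J^T - n \, r d^T) \, W \, (N M_J^T - n \, r d^T)^T D_r^{-1/2} \right) \\
  &= \frac{1}{n} \, \mathrm{trace}\left( D_r^{-1/2} (n P M_J^T - n \, r d^T) \, W \, (n P M_J^T - n \, r d^T)^T D_r^{-1/2} \right) \\
  &= n \, \mathrm{trace}\left( D_r^{-1/2} (P - r c^T) M_J^T W M_J (P - r c^T)^T D_r^{-1/2} \right) \\
  &= n \, \mathrm{trace}\left( D_r^{1/2} (D_r^{-1} P - 1 c^T) M_J^T W M_J (P - r c^T)^T D_r^{-1/2} \right)
\end{aligned}$$
In classical correspondence analysis (C.A.) the corresponding expression is
$$n \, \mathrm{trace}\left( D_r^{-1/2} (P - r c^T) D_c^{-1} (P - r c^T)^T D_r^{-1/2} \right)$$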
Approach proposed
 Beh, D’Ambra, Simonetti (Carme 2007 and Communication in
Statistics 2011) carried out CA when cross-classified variables
have an ordered structure by considering Taguchi’s statistic.
 In terms of Taguchi's statistic, Beh et al. (2010) perform SVD
on
$$X = D_r^{-1/2} (P - r c^T) M_J^T W^{1/2} = D_r^{1/2} (D_r^{-1} P - 1_r c^T) M_J^T W^{1/2}$$
Matrix X is centered
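As a numerical illustration of this decomposition, the sketch below builds X with NumPy for a small table and checks that n times the sum of its squared singular values reproduces Taguchi's statistic. The 3 x 5 table is the block of rows A1-A3 of Phadke's data (counts per category obtained by differencing the cumulative counts in the slides); taking $M_J$ as the lower-triangular matrix of ones without its last row, and the weights $w_j = [d_j(1-d_j)]^{-1}$, are assumptions of this sketch.

```python
import numpy as np

# Rows A1-A3 of Phadke's table (counts in categories I..V), obtained by
# differencing the cumulative counts on the slides; illustration only.
N = np.array([[34, 6, 11, 2, 1],
              [7, 15, 12, 7, 13],
              [8, 6, 5, 13, 22]], dtype=float)
n = N.sum()
P = N / n
r = P.sum(axis=1)                 # row marginals
c = P.sum(axis=0)                 # column marginals
I, J = N.shape

# M_J: (J-1) x J, row j has ones in its first j+1 positions, so M_J @ c = d.
M_J = np.tril(np.ones((J, J)))[:-1, :]
d = M_J @ c                       # cumulative column proportions d_1..d_{J-1}
w = 1.0 / (d * (1.0 - d))         # weights (diagonal of W)

# X = D_r^{-1/2} (P - r c^T) M_J^T W^{1/2}
X = np.diag(r ** -0.5) @ (P - np.outer(r, c)) @ M_J.T @ np.diag(np.sqrt(w))
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Taguchi's statistic computed directly from the cumulative counts.
Z = N @ M_J.T
n_i = N.sum(axis=1)
T = float(np.sum(w * np.sum(n_i[:, None] * (Z / n_i[:, None] - d) ** 2, axis=0)))

print(T, n * np.sum(s ** 2))      # the two values agree
```

The check `sqrt(r) @ X = 0` corresponds to the statement that X is centered with respect to the row weights.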
Special Cases: Properties of Cumulative Correspondence Analysis
In the case of a 2xJ table we have two components:
 the first component (linear) of Taguchi’s statistic is equivalent to the Wilcoxon statistic
 the second component (quadratic) is equivalent to Mood’s test (1954) (see Nair 1987)
For I > 2 and in the case of EQUIPROBABLE categories the eigenvectors are given by
CHEBYSHEV POLYNOMIALS
For I > 2, and in the equiprobable case, the first component (location or linear) is proportional to the
Kruskal-Wallis statistic for contingency tables
Similarly the second component (dispersion or quadratic) is the generalization of the grouped
data version of Mood's (1954) statistic.
In the general case this is not true
See Beh, D’Ambra, Simonetti in Communication in Statistics 2011 for:
Coordinates
Distances
Properties of decomposition of Taguchi’s Statistic and Non Symmetrical Correspondence Analysis
(NSCA)
The Wilcoxon signed-rank test
 The Wilcoxon signed-rank test is a non-parametric
statistical hypothesis test used when comparing two
related samples or repeated measurements on a single
sample to assess whether their population mean ranks
differ (i.e. it's a paired difference test).
 It can be used as an alternative to the paired
Student's t-test when the population cannot be
assumed to be normally distributed or the data is
on the ordinal scale.
Distribution
If we consider the eigenvectors U of X
and the eigenvalues $\lambda_j$ we have:
$$T = \sum_{j=1}^{J-1} \lambda_j \left( \sum_{i=1}^{I} U_{ij}^2 \right)$$
The sums
$$\sum_{i=1}^{I} U_{ij}^2, \quad j = 1, \ldots, J-1$$
converge in distribution to independent central chi-squared RV’s with (I-1) d.o.f.,
so the U’s are uncorrelated and asymptotically distributed as i.i.d. N(0,1) as
n tends to infinity
Relationship between the Points in a Cumulative Plot and a
Classical Plot

For their cumulative CA approach, Beh, D’Ambra and Simonetti (2011) derive the following
row and column profile coordinates to visualise the association between nominal row
categories and ordered column categories
$$\tilde{F} = D_r^{-1/2} \tilde{A} D, \qquad \tilde{G} = W^{-1/2} \tilde{B} D.$$
Suppose we consider for now the matrix of row profile coordinates, $\tilde{F}$. These coordinates may be alternatively
expressed by
$$\tilde{F} = (D_r^{-1} P - 1_r c^T) M_J^T W^{1/2} \tilde{B}$$
while for classical CA the matrix of row coordinates is defined by
$$F = (D_r^{-1} P - 1_r c^T) D_c^{-1/2} B \qquad (9)$$
Relationship between the Points in a Cumulative Plot and
a Classical Plot

As B is not of full rank, we consider the Moore-Penrose generalised
inverse of B, which is equal to $B^T$, and we obtain
$$D_r^{-1} P - 1_r c^T = F B^T D_c^{1/2}$$
Therefore
$$\tilde{F} = F B^T D_c^{1/2} M_J^T W^{1/2} \tilde{B}$$
This demonstrates that one may derive the cumulative row profile coordinates based on the coordinates obtained from
a classical CA and that the two sets of points are equivalent when
$$B^T D_c^{1/2} M_J^T W^{1/2} \tilde{B} = I.$$
Relationship between the Points in a Cumulative Plot and
a Classical Plot

We may also derive the classical CA row profile coordinates given the cumulative coordinates. To do so, we
distinguish two cases. If $\tilde{B}$ is of full rank and $\tilde{B}\tilde{B}^T = I$, then
$$\tilde{F}\tilde{B}^T = (D_r^{-1} P - 1_r c^T) M_J^T W^{1/2}$$
Post-multiplying by $(M_J^T W^{1/2})^{+}$, which is the Moore-Penrose generalised inverse of $M_J^T W^{1/2}$, we get
$$D_r^{-1} P - 1_r c^T = \tilde{F}\tilde{B}^T (M_J^T W^{1/2})^{+}$$
Substituting this into (9) we obtain the classical row profile coordinates from the row cumulative coordinates such that
$$F = \tilde{F}\tilde{B}^T (M_J^T W^{1/2})^{+} D_c^{-1/2} B$$
Relationship between the Points in a Cumulative Plot and
a Classical Plot

If $I \le J - 1$, $\tilde{B}$ is not of full rank and $\tilde{B}\tilde{B}^T \ne I$; then we consider the Moore-Penrose generalised inverse of
$\tilde{B}$, which is equal to $\tilde{B}^T$, and we obtain the classical coordinates from the cumulative coordinates from the
same relationship.
Example: Phadke’s data (1/12)

To illustrate the cumulative correspondence analysis using the
Taguchi’s statistic, D’Ambra, Köksoy, Simonetti (Journal of
Applied Statistics 2010) use Phadke’s data (1989).
The control factors (6) and their levels (3) of the polysilicon deposition
process

Factor                            Level 1    Level 2    Level 3
A. Deposition temperature (°C)    T0-25      T0         T0+25
B. Deposition pressure (mtorr)    P0-200     P0         P0+200
C. Nitrogen flow (sccm)           N0         N0-150     N0-75
D. Silane flow (sccm)             S0-100     S0-50      S0
E. Setting time (min)             t0         t0+8       t0+16
F. Cleaning method                None       CM2        CM3
Example: Phadke’s data (2/12)
Categories of product’s quality

Category   Defects                  Description
I          0 to 3 defects           No surface defect
II         4 to 30 defects          Very few defects
III        31 to 300 defects        Some defects
IV         301 to 1000 defects      Many defects
V          1001 and more defects    Too many defects

Cumulative categories
(I) = I (0 to 3 defects)
(II) = I+II (0 to 30 defects)
(III) = I+II+III (0 to 300 defects)
(IV) = I+II+III+IV (0 to 1000 defects)
(V) = I+II+III+IV+V (0 to ∞ defects)
Example: Phadke’s data (3/12)
Factor effects for the categorized surface defect data

          Number of observations             Probabilities for the
          by cumulative categories           cumulative categories
Level     (I)   (II)  (III)  (IV)  (V)       (I)    (II)   (III)  (IV)   (V)
A1        34    40    51     53    54        0.63   0.74   0.94   0.98   1.00
A2        7     22    34     41    54        0.13   0.41   0.63   0.76   1.00
A3        8     14    19     32    54        0.15   0.26   0.35   0.59   1.00
B1        25    40    46     51    54        0.46   0.74   0.85   0.94   1.00
B2        20    28    36     43    54        0.37   0.52   0.67   0.80   1.00
B3        4     8     22     32    54        0.07   0.15   0.41   0.59   1.00
C1        19    30    32     39    54        0.35   0.56   0.59   0.72   1.00
C2        11    20    28     39    54        0.20   0.37   0.52   0.72   1.00
C3        19    26    44     48    54        0.35   0.48   0.81   0.89   1.00
D1        20    25    34     41    54        0.37   0.46   0.63   0.76   1.00
D2        13    31    42     44    54        0.24   0.57   0.78   0.81   1.00
D3        16    20    28     41    54        0.30   0.37   0.52   0.76   1.00
E1        21    27    38     43    54        0.39   0.50   0.70   0.80   1.00
E2        16    29    36     42    54        0.30   0.54   0.67   0.78   1.00
E3        12    20    30     41    54        0.22   0.37   0.56   0.76   1.00
F1        21    23    26     34    54        0.39   0.43   0.48   0.63   1.00
F2        21    30    40     46    54        0.39   0.56   0.74   0.85   1.00
F3        7     23    38     46    54        0.13   0.43   0.70   0.85   1.00
Example: Phadke’s data (4/12)
 Taguchi’s statistic: T = 318.5669
 The Pearson chi-squared statistics for the four
contingency tables obtained by aggregating column
categories 1 to j, and aggregating the column
categories j+1 to J:

Aggregated column categories    Chi-squared
I | II+III+IV+V                 83.209
I+II | III+IV+V                 79.265
I+II+III | IV+V                 95.8786
I+II+III+IV | V                 60.2143
The partition of Taguchi’s statistic from the contingency table into
Pearson chi-squared statistics

Aggregated Column Categories
Factor  (I)  (II+III+IV+V)  (I+II)  (III+IV+V)  (I+II+III)  (IV+V)  (I+II+III+IV)  (V)
A1      34   20             40      14          51          3       53             1
A2      7    47             22      32          34          20      41             13
A3      8    46             14      40          19          35      32             22
B1      25   29             40      14          46          8       51             3
B2      20   34             28      26          36          18      43             11
B3      4    50             8       46          22          32      32             22
C1      19   35             30      24          32          22      39             15
C2      11   43             20      34          28          26      39             15
C3      19   35             26      28          44          10      48             6
D1      20   34             25      29          34          20      41             13
D2      13   41             31      23          42          12      44             10
D3      16   38             20      34          28          26      41             13
E1      21   33             27      27          38          16      43             11
E2      16   38             29      25          36          18      42             12
E3      12   42             20      34          30          24      41             13
F1      21   33             23      31          26          28      34             20
F2      21   33             30      24          40          14      46             8
F3      7    47             23      31          38          16      46             8

Chi-squared for each pair of aggregated columns:
$\chi_1^2 = 83.2$, $\chi_2^2 = 79.3$, $\chi_3^2 = 95.9$, $\chi_4^2 = 60.2$
Total: $\chi_{TOT}^2 = 318.6$
Example: Phadke’s data (5/12)
The table shows the Accumulation Analysis
(AA); this is an ANOVA-like procedure (*),
following Taguchi (see Nair, Technometrics 1987, Testing in
Industrial experiments with ordered categorical data)

Source   F (*)
A        14.3
B        10.8
C        2.3
D        1.8
E        1.2
F        2.9

F(A) = MSA/MSe
with (I-1)(J-1) dof
and I(n-1)(J-1) dof
A and B are the two most important
factors affecting product’s quality
(*) This approach does not have ANOVA’s property
of independent sums of squares.
Example: Phadke’s data (6/12)
Figure shows the graphical representation of the results.
Table shows the distances from the origin
to the column points in the Figure above.

Column point        Distance from origin
I (II+III+IV+V)     30.618
I+II (III+IV+V)     4.995
I+II+III (IV+V)     1.658
I+II+III+IV (V)     0.023

We note that the point “I vs (II+III+IV+V)” is the
most important because it represents the largest
contribution (30.618), which is measured by
the distance from the origin.
Example: Phadke’s data (7/12)
Figure shows the row and column categories of
singly ordered cumulative correspondence analysis, the
supplementary points of the factors (A, B, C, D, E, F)
and the column categories of classical analysis.
Legend:
Row coordinates of cumulative ordinal correspondence analysis
Column coordinates of cumulative ordinal correspondence analysis
Supplementary points of factors: A, B, C, D, E, F
Categories “I”, “II”, “III”, “IV” and “V” from correspondence analysis
A and B are important factors
Example: Phadke’s data (8/12)
First factorial axis
Level   A        B        C       D       E       F
1       5.197    2.262    3.353   3.315   1.963   5.749
2       5.536    2.882    6.519   1.715   2.836   0.787
3       10.233   11.153   0.700   5.542   5.772   4.035

Second factorial axis
Level   A        B        C       D       E       F
1       5.345    4.143    5.777   5.328   4.854   7.438
2       2.192    4.908    4.088   2.476   4.139   4.228
3       5.259    3.745    2.931   3.538   3.803   1.056

The two tables show the distances between
the row points and the “I vs (II+III+IV+V)”
column point on the first and second
factorial axes.
Example: Phadke’s data (9/12)
“I vs (II+III+IV+V)”

Factor   1° Axis   2° Axis
A        A1        A2
B        B1        B3
C        C3        C3
D        D2        D2
E        E1        E3
F        F2        F3

Table shows the first and second axes solutions based on
the minimum distances reported.
So we choose this optimal
combination
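The minimum-distance rule behind these solutions can be sketched in a few lines of Python. The distances are those reported in the two tables of the previous slide; the function name `closest_levels` is a hypothetical helper introduced only for this illustration.

```python
# Distances between the row points and the "I vs (II+III+IV+V)" column point,
# copied from the slides (rows: levels 1-3; columns: factors A..F).
factors = ["A", "B", "C", "D", "E", "F"]
axis1 = [[5.197, 2.262, 3.353, 3.315, 1.963, 5.749],
         [5.536, 2.882, 6.519, 1.715, 2.836, 0.787],
         [10.233, 11.153, 0.700, 5.542, 5.772, 4.035]]
axis2 = [[5.345, 4.143, 5.777, 5.328, 4.854, 7.438],
         [2.192, 4.908, 4.088, 2.476, 4.139, 4.228],
         [5.259, 3.745, 2.931, 3.538, 3.803, 1.056]]

def closest_levels(dist):
    """For each factor, pick the level with the smallest distance."""
    setting = ""
    for k, name in enumerate(factors):
        column = [dist[level][k] for level in range(3)]
        setting += f"{name}{column.index(min(column)) + 1}"
    return setting

print(closest_levels(axis1))  # A1B1C3D2E1F2
print(closest_levels(axis2))  # A2B3C3D2E3F3
```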
Example: Phadke’s data (10/12)
Comparative results for the optimal factor settings

                                  Probabilities for the cumulative categories
Method           Solution         (I)     (II)    (III)   (IV)    (V)
MEL              A1B1C3D2E1F2     0.875   0.959   0.998   0.999   1.000
SCORE            A1B1C3D2E2F2     0.822   0.964   0.998   0.999   1.000
WSNR             A1B1C1D1E2F2     0.896   0.960   0.986   0.996   1.000
AA               A1B2C1D3E2F2     0.814   0.863   0.942   0.983   1.000
MSD              A1B1C3D2E1F3     0.617   0.931   0.998   0.999   1.000
STARTING         A2B2C1D3E1F1     0.363   0.435   0.394   0.554   1.000
PROPOSED:
1° Axis          A1B1C3D2E1F2     0.875   0.959   0.998   0.999   1.000
2° Axis          A2B3C3D2E3F3     0.090   0.590   0.880   0.820   1.000
Plane solution   A2B1C3D2E2F3     0.090   0.800   0.975   0.984   1.000

MEL=Asiabar and Ghomi (2006), SCORE=Nair (1986), WSNR=Wu and Yeh (2006),
AA=Phadke’s Accumulation Analysis (1989), MSD=Jeng and Guo (1996), STARTING=Starting,
PROPOSED=D’Ambra, Köksoy and Simonetti
Example: Phadke’s data (11/12)
The last table shows the comparative results for the solution methods to optimize factor
settings according to their predicted probabilities for the cumulative categories.
To calculate the optimal probabilities for the cumulative categories Taguchi uses the
omega transform, also known as the logit transform. The omega transform for
probability p is defined by
$$\omega(p) = 10 \log_{10}\left( \frac{p}{1-p} \right)$$
The optimum settings recommended by the first factorial axis solution of cumulative
correspondence analysis are A1, B1, C3, D2, E1, F2.
By the inverse omega transform, the predicted probability for category (I) is 0.875.
( The procedure to compute the probability is in D’Ambra et al J.of Applied Statistics 2009)
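For readers who want to experiment with the transform, here is a minimal sketch of the omega transform and its inverse in Python. The function names are illustrative; the full prediction procedure of D’Ambra et al. is not reproduced here.

```python
import math

def omega(p: float) -> float:
    """Omega (logit, in decibels) transform of a probability 0 < p < 1."""
    return 10.0 * math.log10(p / (1.0 - p))

def inverse_omega(w: float) -> float:
    """Map an omega value back to a probability."""
    return 1.0 / (1.0 + 10.0 ** (-w / 10.0))

w = omega(0.875)               # 10 * log10(7), about 8.45
print(w, inverse_omega(w))     # round trip returns 0.875 up to rounding
```

Working on the omega scale keeps predicted values inside (0, 1) once they are transformed back, which is the usual reason for applying it to cumulative probabilities.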
The second axis solution (i.e., A2, B3, C3, D2, E3, F3) does not seem so powerful and the
probabilities for the cumulative categories are not high enough.
The plane solution (i.e., A2, B1, C3, D2, E2, F3) especially provides a very low probability in
category (I).
As a result, we suggest picking the first axis solution as the optimal solution for Phadke’s
polysilicon deposition process.
Example: Phadke’s data (12/12)
We observe that the first axis solution is equivalent to the MEL (i.e., minimization of
expected loss) solution proposed by Asiabar and Ghomi (2006). The solution seems
a nice candidate among the others since the probabilities for the cumulative
categories are higher.
Asiabar and Ghomi (2006) suggested a technique, which is called MEL that minimizes
the expected loss for the analysis of ordered categorical data. After an experiment
and data collection authors define a probability distribution function of data in
categories. In the final step of MEL algorithm, expected loss in each level of factors is
calculated and the decision is made by the fact that the optimum level of a factor is
the one where the expected loss is lower than the expected loss at other levels of that
factor.
Doubly ordered cumulative correspondence analysis (submitted to
Communication in Statistics)
 Now , we explore a generalization of
Taguchi’s statistic which takes into account
the presence of both ordered variables by
considering the cumulative sum of cell
frequencies across the variables.
Approach of Cuadras (1/2)
Cuadras (2002) proposed the following approach based
on double cumulative frequencies
$$D_r^{-1/2} L (P - r c^T) M^T W_J^{1/2} = U S V^T$$
U is the matrix containing the left singular vectors
S is the diagonal matrix containing the singular values
V is the matrix containing the right singular vectors.
$W_J$ is the J x J diagonal matrix of weights 1/J
L is a lower triangular matrix
M is an upper triangular matrix
Approach of Cuadras (2/2)
 Disadvantages:
This approach does not decompose any known
index.
This approach does not have the property of being the
sum of the Pearson chi-squared statistics for the
contingency tables obtained by partitioning and
pooling the original data (see Taguchi)
Doubly Cumulative Correspondence
Analysis (1/4)
 Starting from the proposal of Beh et al. (2007-2011), we present
a more general approach based on double cumulative
frequencies which overcomes these problems and has
some interesting properties.
 Notation
 R the 2(I-1)xI matrix obtained by alternating the rows of an (I-1)xI
lower triangular matrix of ones without the row of all ones and
the rows of an (I-1)xI upper triangular matrix of ones without the
row of all ones.
 C the Jx2(J-1) matrix obtained by alternating the columns of an
Jx(J-1) upper triangular matrix of ones without the column of all
ones and the columns of an Jx(J-1) lower triangular matrix of
ones without the column of all ones.
 DR and DC the diagonal matrices with the marginal frequencies of
doubly cumulative table .
Doubly Cumulative Correspondence
Analysis (2/4)
 The CA can be approached by using
cumulative frequencies for rows and columns
$$D_R^{-1/2} R (P - r c^T) C D_C^{-1/2} = U S V^T$$
 The row and column coordinates are
respectively
$$G_r = D_R^{-1} R (P - r c^T) C D_C^{-1/2} V, \qquad G_c = D_C^{-1} C^T (P - r c^T)^T R^T D_R^{-1/2} U$$
Doubly Cumulative Correspondence
Analysis (3/4)
 The inertia
$$Q = n (I-1)(J-1) \, \mathrm{trace}\left( D_R^{-1/2} R (P - r c^T) C D_C^{-1} C^T (P - r c^T)^T R^T D_R^{-1/2} \right) = n (I-1)(J-1) \sum_k s_k^2$$
can be considered a generalization of Taguchi’s statistic because it
takes into account the presence of both ordinal variables.
 It is easy to verify that Q is identical to the doubly
cumulative chi-squared statistic $\chi^{**2}$ defined by Hirotsu (1986)
(used for comparing treatments and change point analysis)
This approach preserves the same property as Taguchi’s statistic:
$$\chi^{**2} = \sum_{i=1}^{I-1} \sum_{j=1}^{J-1} \chi_{ij}^2$$
where $\chi_{ij}^2$ is the Pearson chi-squared statistic for the 2x2 contingency table
obtained by partitioning and pooling the original table
Null distribution
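 Hirotsu (1994) showed that the null
distribution of the statistic $\chi^{**2}$ is approximated
by $d \, \chi_v^2$ with $d = d_1 d_2$ and $v = (I-1)(J-1) / (d_1 d_2)$, where
$$d_1 = 1 + \frac{2}{J-1} \left( \frac{\lambda_1}{\lambda_2} + \frac{\lambda_1 + \lambda_2}{\lambda_3} + \cdots + \frac{\lambda_1 + \cdots + \lambda_{J-2}}{\lambda_{J-1}} \right)$$
$$d_2 = 1 + \frac{2}{I-1} \left( \frac{\gamma_1}{\gamma_2} + \frac{\gamma_1 + \gamma_2}{\gamma_3} + \cdots + \frac{\gamma_1 + \cdots + \gamma_{I-2}}{\gamma_{I-1}} \right)$$
$$\lambda_j = \frac{n_{\cdot 1} + \cdots + n_{\cdot j}}{n_{\cdot, j+1} + \cdots + n_{\cdot J}}, \qquad \gamma_i = \frac{n_{1 \cdot} + \cdots + n_{i \cdot}}{n_{i+1, \cdot} + \cdots + n_{I \cdot}}$$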
Doubly Cumulative Correspondence
Analysis (4/4)
 The CA on the doubly cumulative table, which we call Doubly
Cumulative Correspondence Analysis, presents the following
properties
 The approach maximizes the phi-squared statistic of each 2 by 2
table and, apart from the constant 2(I-1)x2(J-1), of the doubly cumulative
table.
 All the weighted row and column coordinates are centred
 The weighted row and column coordinates are centred for the
2 by 2 tables
 This approach allows the variations of row and
column categories, rather than the categories themselves, to be represented in the space
generated by cumulative frequencies. Subsequently, it is possible
to project onto the same space the row and column categories as
supplementary points.
A unified approach
In order to represent the rows and columns of N we can consider the following SVD depending
on four matrices F, B, D, E and the vector a
$$F B (P - a c^T) D^T E = U S V^T$$
Overall approach
1. $F = D_r^{-1/2}$, $B = I$, $a = r$, $D^T = I$, $E = D_c^{-1}$: Correspondence Analysis
2. $F = D_r^{-1/2}$, $B = I$, $a = r$, $D^T = M^T$, $E = W_J$: Cuadras approach
3. $F = D_r^{-1/2}$, $B = I$, $a = r$, $D^T = M_J^T$, $E = W$: Taguchi decomposition (Beh, D’Ambra, Simonetti, 2007-2011)
4. $F = D_r^{-1/2}$, $B = L$, $a = r$, $D^T = M^T$, $E = W_J$: Doubly Cumulative Correspondence Analysis (Cuadras approach)
5. $F = D_R^{-1/2}$, $B = R$, $a = r$, $D^T = C$, $E = D_C^{-1}$: Doubly Cumulative Correspondence Analysis (our approach, Hirotsu decomposition)
6. $F = D_r^{-1/2}$, $B = I$, $a = 1_r$, $D^T = I$, $E = D_c$: Non Symmetrical Correspondence Analysis
Monitoring Automobile Pollution Emissions
 The data reflect 2141 NOx emission/acceleration measurements for
one driving cycle and one car on an urban road, as collected by the Italian
Research Council (NCR). (NOx: nitrogen oxides)
        Acc1   Acc2   Acc3   Acc4   Acc5   Acc6   Acc7   Total
Nox1    91     135    146    229    120    129    32     882
Nox2    61     114    102    179    111    106    46     719
Nox3    13     31     41     78     99     119    13     394
Nox4    3      4      6      12     18     45     7      95
Nox5    2      3      3      5      10     20     8      51
Total   170    287    298    503    358    419    106    2141
Monitoring Automobile Pollution Emissions
 Simple correspondence plot of NCR
[Figure: symmetric plot of the row points (Nox1 to Nox5) and column points (Acc1 to Acc7); axes F1 (83.66 %) and F2 (11.81 %), together 95.46 %]
Monitoring Automobile Pollution Emissions
 There is no real apparent difference between Acc1 and Acc2. Nor is
there any great difference between Acc3 and Acc4. In fact these two
small clusters appear rather similar as well which can be seen based on
their close proximity to each other in the plot. Acc5, Acc6 and Acc7 all
appear to be different from each other and the categories reflecting low
values of Acc.
 Similarly, Nox1 and Nox 2 appear to be similarly distributed. They are
also quite different to Nox 3, Nox4 and Nox5 which are quite different.
 Figure also suggests that the association between the Nox and Acc
variables is such that low levels of acceleration (up to, and including,
Acc4) are associated with the two lowest levels of nitrogen oxide
emissions. Certainly accelerations not exceeding 0.2m/s2 are
associated with log(NOx) emissions not exceeding -0.225g/s. Similarly
relatively heavy acceleration, exceeding 0.2m/s2, is associated with
high nitrogen oxide emissions, although this association is not as strong
as the link between Acc and Nox at the lower levels.
 The quality of the display is very good and represents 95.46% of the
association that exists between the row and column variables.
Monitoring Automobile Pollution Emissions
 While the simple correspondence analysis provides
some revealing insight into the link between car
acceleration and pollution emission, it does not reflect
the ordered structure of the Acc and Nox categories.
 Therefore one may take into consideration this structure
by considering the doubly-ordered cumulative
correspondence analysis technique.
Monitoring Automobile Pollution Emissions
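Calculating the doubly cumulative table R × N × C
$$R = \begin{pmatrix}
1 & 0 & 0 & 0 & 0 \\
0 & 1 & 1 & 1 & 1 \\
1 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 1 & 1 \\
1 & 1 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 1 \\
1 & 1 & 1 & 1 & 0 \\
0 & 0 & 0 & 0 & 1
\end{pmatrix}
\qquad
N = \begin{pmatrix}
91 & 135 & 146 & 229 & 120 & 129 & 32 \\
61 & 114 & 102 & 179 & 111 & 106 & 46 \\
13 & 31 & 41 & 78 & 99 & 119 & 13 \\
3 & 4 & 6 & 12 & 18 & 45 & 7 \\
2 & 3 & 3 & 5 & 10 & 20 & 8
\end{pmatrix}$$
$$C = \begin{pmatrix}
1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 \\
0 & 1 & 1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 \\
0 & 1 & 0 & 1 & 1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 \\
0 & 1 & 0 & 1 & 0 & 1 & 1 & 0 & 1 & 0 & 1 & 0 \\
0 & 1 & 0 & 1 & 0 & 1 & 0 & 1 & 1 & 0 & 1 & 0 \\
0 & 1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 & 1 & 1 & 0 \\
0 & 1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 & 1
\end{pmatrix}$$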
         Acc1  Acc2-7  Acc1-2  Acc3-7  Acc1-3  Acc4-7  Acc1-4  Acc5-7  Acc1-5  Acc6-7  Acc1-6  Acc7
Nox1     91    791     226     656     372     510     601     281     721     161     850     32
Nox2-5   79    1180    231     1028    383     876     657     602     895     364     1185    74
Nox1-2   152   1449    401     1200    649     952     1057    544     1288    313     1523    78
Nox3-5   18    522     56      484     106     434     201     339     328     212     512     28
Nox1-3   165   1830    445     1550    734     1261    1220    775     1550    445     1904    91
Nox4-5   5     141     12      134     21      125     38      108     66      80      131     15
Nox1-4   168   1922    452     1638    747     1343    1245    845     1593    497     1992    98
Nox5     2     49      5       46      8       43      13      38      23      28      43      8
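The product R × N × C can be reproduced with a short NumPy sketch. The helper `split_matrix` is a hypothetical name introduced here; it builds the alternating "first m categories / remaining categories" indicator rows described in the notation for R and C.

```python
import numpy as np

# NOx (rows) by acceleration (columns) counts from the slides.
N = np.array([[91, 135, 146, 229, 120, 129, 32],
              [61, 114, 102, 179, 111, 106, 46],
              [13, 31, 41, 78, 99, 119, 13],
              [3, 4, 6, 12, 18, 45, 7],
              [2, 3, 3, 5, 10, 20, 8]])

def split_matrix(k: int) -> np.ndarray:
    """2(k-1) x k matrix alternating 'categories 1..m' and 'categories m+1..k' rows."""
    lower = np.tril(np.ones((k, k), dtype=int))[:-1]   # rows: categories 1..m
    out = np.empty((2 * (k - 1), k), dtype=int)
    out[0::2], out[1::2] = lower, 1 - lower            # interleave both kinds of rows
    return out

R = split_matrix(N.shape[0])        # 8 x 5
C = split_matrix(N.shape[1]).T      # 7 x 12
table = R @ N @ C                   # doubly cumulative table, 8 x 12

print(table)
```

Each of the (I-1)(J-1) = 24 embedded 2 by 2 tables sums to n = 2141, so the grand total of the doubly cumulative table is 51384.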
Monitoring Automobile Pollution Emissions

 The doubly cumulative chi-squared
statistic defined by Hirotsu
$$\chi^{**2} = \sum_{i=1}^{4} \sum_{j=1}^{6} \chi_{ij}^2 = 807.816$$
 It is easy to verify
Pearson chi-squared statistic for the 2 by 2 tables

                  Acc1|Acc2-7  Acc1-2|Acc3-7  Acc1-3|Acc4-7  Acc1-4|Acc5-7  Acc1-5|Acc6-7  Acc1-6|Acc7
Nox1|Nox2-5       11.596       16.353         31.399         54.492         31.831         5.577
Nox1-2|Nox3-5     20.967       51.807         77.321         138.202        84.748         0.084
Nox1-3|Nox4-5     4.371        16.079         29.924         69.265         77.585         9.434
Nox1-4|Nox5       1.154        4.145          8.772          23.860         26.054         12.795

$$\max \chi_{ij}^2 = 138.202$$
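The table of 2 by 2 chi-squared statistics and their sum can be recomputed directly from N. This is a minimal sketch assuming NumPy and the shortcut formula for a 2 by 2 table, $\chi^2 = n (ad - bc)^2 / [(a+b)(c+d)(a+c)(b+d)]$; variable names are illustrative.

```python
import numpy as np

# NOx (rows) by acceleration (columns) counts from the slides.
N = np.array([[91, 135, 146, 229, 120, 129, 32],
              [61, 114, 102, 179, 111, 106, 46],
              [13, 31, 41, 78, 99, 119, 13],
              [3, 4, 6, 12, 18, 45, 7],
              [2, 3, 3, 5, 10, 20, 8]], dtype=float)
n = N.sum()
I, J = N.shape

chi2 = np.zeros((I - 1, J - 1))
for i in range(1, I):            # cut between row categories i and i+1
    for j in range(1, J):        # cut between column categories j and j+1
        a = N[:i, :j].sum()      # low rows, low columns
        b = N[:i, j:].sum()      # low rows, high columns
        c = N[i:, :j].sum()      # high rows, low columns
        d = N[i:, j:].sum()      # high rows, high columns
        chi2[i - 1, j - 1] = n * (a * d - b * c) ** 2 / (
            (a + b) * (c + d) * (a + c) * (b + d))

hirotsu = chi2.sum()             # doubly cumulative chi-squared statistic
print(np.round(chi2, 3))
print(round(hirotsu, 3))
```

The position of the largest entry identifies the pair of cut points (here between Nox2 and Nox3, and between Acc4 and Acc5) with the strongest 2 by 2 association.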
Monitoring Automobile Pollution Emissions
 Eigenvalues and percentages of inertia of doubly cumulative
correspondence analysis

Principal Axis   Principal Inertia   % Cont. to X2/n   Cumulative %
1                0.015290426         97.26             97.26
2                0.000374297         2.38              99.64
3                0.000054826         0.35              99.99
4                0.000001907         0.01              100
Total            0.015721156         100
 It is easy to verify that, apart from the constant
$$n (I-1)(J-1) = 2141 \times 4 \times 6 = 51384,$$
the total inertia is identical to the doubly cumulative chi-squared
statistic defined by Hirotsu:
$$n (I-1)(J-1) \sum_k s_k^2 = \chi^{**2}, \qquad 51384 \times 0.015721156 \approx 807.816$$
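This relationship can be checked with a short sketch that runs an ordinary correspondence analysis on the doubly cumulative table. It assumes NumPy and that the reported inertias come from the table rescaled to proportions summing to one (an assumption about the normalisation, consistent with the constant n(I-1)(J-1)); `split_matrix` is a hypothetical helper.

```python
import numpy as np

# NOx (rows) by acceleration (columns) counts from the slides.
N = np.array([[91, 135, 146, 229, 120, 129, 32],
              [61, 114, 102, 179, 111, 106, 46],
              [13, 31, 41, 78, 99, 119, 13],
              [3, 4, 6, 12, 18, 45, 7],
              [2, 3, 3, 5, 10, 20, 8]], dtype=float)
n = N.sum()
I, J = N.shape

def split_matrix(k):
    """2(k-1) x k matrix alternating 'categories 1..m' and 'categories m+1..k' rows."""
    lower = np.tril(np.ones((k, k)))[:-1]
    out = np.empty((2 * (k - 1), k))
    out[0::2], out[1::2] = lower, 1 - lower
    return out

table = split_matrix(I) @ N @ split_matrix(J).T     # doubly cumulative table

# Ordinary CA of the doubly cumulative table: SVD of standardised residuals.
Pt = table / table.sum()
r_t, c_t = Pt.sum(axis=1), Pt.sum(axis=0)
S = (Pt - np.outer(r_t, c_t)) / np.sqrt(np.outer(r_t, c_t))
s = np.linalg.svd(S, compute_uv=False)

total_inertia = float(np.sum(s ** 2))
hirotsu = n * (I - 1) * (J - 1) * total_inertia     # rescaled total inertia
print(total_inertia, hirotsu)
```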
Monitoring Automobile Pollution Emissions

By considering the first singular value, we obtain the approximations $\chi_{ij}^{**2}(1)$ of the
Pearson chi-squared statistics of the 2 by 2 contingency tables.

                  Acc1|Acc2-7  Acc1-2|Acc3-7  Acc1-3|Acc4-7  Acc1-4|Acc5-7  Acc1-5|Acc6-7  Acc1-6|Acc7
Nox1|Nox2-5       6.816        16.480         27.806         54.339         41.184         2.809
Nox1-2|Nox3-5     16.679       40.330         68.047         132.976        100.784        6.875
Nox1-3|Nox4-5     9.199        22.243         37.529         73.339         55.585         3.791
Nox1-4|Nox5       3.141        7.595          12.815         25.043         18.980         1.285

$$\max \chi_{ij}^{**2}(1) = 132.976$$

The 2x2 contingency table that is formed by creating two dichotomous variables by considering
the two sets of accumulated row categories (Nox1, Nox2) and (Nox3, Nox4, Nox5) and the two
sets of accumulated column categories (Acc1, Acc2, Acc3, Acc4) and (Acc5, Acc6, Acc7)
displays the greatest variation along the first axis, and accounts for 17% of its total inertia:
(132.976/785.67)x100 ≈ 17%, where 785.67 is the sum of the first-axis approximations (97.26% of 807.816)
 This shows that the greatest discrimination between the ordered row categories is between
Nox2 and Nox3 while the greatest discrimination between the ordered column categories is
between Acc4 and Acc5.
Monitoring Automobile Pollution Emissions
 Row coordinates and their squared Euclidean distance
from the origin

Row        Profile Coordinates                        Distance (origin)
Category   Axis 1    Axis 2    Axis 3    Axis 4      Axes 1 & 2
Nox1       -0.1289   0.0003    -0.0141   0.0013      0.017
Nox2-5     0.0903    -0.0002   0.0099    -0.0009     0.008
Nox1-2     -0.0980   0.0139    0.0017    -0.0005     0.010
Nox3-5     0.2905    -0.0411   -0.0052   0.0015      0.086
Nox1-3     -0.0339   -0.0049   0.0021    0.0003      0.001
Nox4-5     0.4632    0.0665    -0.0290   -0.0046     0.219
Nox1-4     -0.0114   -0.0038   -0.0005   -0.0003     0.000
Nox5       0.4687    0.1573    0.0189    0.0111      0.244

The biggest variation (distance) is between the
row categories Nox3-5 and Nox4-5
Monitoring Automobile Pollution Emissions
 Column coordinates and their squared Euclidean
distance from the origin

Column     Profile Coordinates                        Distance (origin)
Category   Axis 1    Axis 2    Axis 3    Axis 4      Axes 1 & 2
Acc1       -0.2203   0.0439    -0.0329   0.0064      0.0504
Acc2-7     0.0190    -0.0038   0.0028    -0.0006     0.0004
Acc1-2     -0.1931   0.0271    -0.0010   -0.0031     0.0380
Acc3-7     0.0524    -0.0074   0.0003    0.0008      0.0028
Acc1-3     -0.1770   0.0152    -0.0055   0.0007      0.0316
Acc4-7     0.0964    -0.0083   0.0030    -0.0004     0.0094
Acc1-4     -0.1530   0.0030    -0.0006   -0.0010     0.0234
Acc5-7     0.2180    -0.0042   0.0009    0.0014      0.0475
Acc1-5     -0.0906   -0.0099   0.0063    0.0007      0.0083
Acc6-7     0.2790    0.0308    -0.0193   -0.0022     0.0788
Acc1-6     -0.0095   -0.0086   -0.0023   -0.0001     0.0002
Acc7       0.1820    0.1660    0.0431    0.0021      0.0607

The biggest variation (distance) between the
column categories is between
Acc1-4 and Acc1-5
Note that Acc1, Acc1-2,
Acc1-3 and Acc1-4 lie
close to one another in
the plot, indicating that
there is very little
difference in the relative
distribution of the cars
for the first four Acc
levels.
Interpretation
 To help with the interpretation of plot, the squared Euclidean
distance of each of the row and column categories from the
origin of the plot is summarised in the last column of Table 7 and
8, respectively.
 These distances show that the biggest variations exist between
the row categories Nox2-5 and Nox3-5 and the column
categories Acc1-4 and Acc1-5.
If we consider the column variable, the greatest discrimination
between its categories along the first axis exists between the
adjacent categories Acc4 and Acc5; this can be visually verified
by considering the distance of the points between Acc1-4 and
Acc5-6. Such conclusions are consistent with those described
above.
Monitoring Automobile Pollution Emissions
Plot of doubly ordered cumulative correspondence analysis
[Figure: symmetric plot of the cumulative row points (Nox1, Nox2-5, ..., Nox5) and cumulative column points (Acc1, Acc2-7, ..., Acc7); axes F1 (97.26 %) and F2 (2.38 %), together 99.64 %]
Monitoring Automobile Pollution Emissions
 Acc1, Acc1-2, Acc1-3 and Acc1-4 lie close to one another in the plot
indicating that there is very little difference in the relative distribution of the
cars for the first four Acc levels. That is, the four lowest acceleration levels
play an equivalent, and similar, role in the association structure between
the rows and columns; this is consistent with the conclusions reached from
CA plot. The relative similarity of Acc2, Acc3 and Acc4 is also evident
from the close proximity of Acc2-7, Acc3-7 and Acc4-7 on the right hand
side of the plot.
 Figure 2 also shows that Acc5 and Acc6 are quite different from one
another as are Acc4 and Acc5 due to the relatively large distance between
Acc1-5, Acc1-6 which lies on the left hand side of the plot. By considering
the proximity of the points Acc4-7, Acc5-7, Acc6-7 and Acc7 from one
another we can see that there appears to be quite a different relative
distribution of cars for the Acc levels of 4, 5, 6 and 7.
Monitoring Automobile Pollution Emissions:
the key difference when interpreting the configuration of points of
the two plots
 The configurations of points obtained from classical correspondence
analysis and from doubly ordered cumulative correspondence analysis
point to similar graphical conclusions regarding the association between the
Acc and Nox variables.
 However, the classical correspondence plot only highlights that low levels of Acc
and low levels of Nox are strongly associated, while the association is less
obvious for higher levels of Acc and Nox.
 While the doubly ordered cumulative correspondence plot also reflects this
association structure, it does so by taking into consideration the ordered
structure of each categorical variable.
 Doubly ordered cumulative correspondence analysis also provides a clear
discrimination of the association between cumulative subsets of categories, thereby
identifying those subsets of categories with the most significant association and
those with no statistically significant association. That is, the analysis detects a
large difference between Acc4 and Acc5, and between Nox2 and Nox3; moreover,
performing a cumulative correspondence analysis reveals that this difference is
statistically significant, as can be seen by considering the chi-squared statistics.
Example: Van Rijckevorsel’s data (1/8)
 A data matrix that is both RR (row regression dependence) and
CR (column regression dependence), following Schriever (1983) and Warrens and Heiser (2009).
The appreciations of five red Bordeaux wines by 200 judges using a four-category
system, from excellent to boring (Van Rijckevorsel, 1987, p. 60):
                          C1          C2     C3         C4
                          excellent   good   mediocre   boring   Total
R1  grand cru classé      87          93     19         1        200
R2  cru Bourgeois         45          126    24         5        200
R3  Bordeaux d'Origine    36          68     74         22       200
R4  vin de marque         0           30     111        59       200
R5  vin de table          0           0      52         148      200
    Total                 168         317    280        235      1000

 The rows and columns of the table have been permuted using the scores of the first
CA dimension.
 Since the table is both RR and CR, there exists a strong ordinal association
between the categories of the two variables, and the five wines can be perfectly
ordered from excellent to boring. We therefore use this table to illustrate
doubly cumulative correspondence analysis.
Example: Van Rijckevorsel’s data (2/8)
Calculating the doubly cumulative table
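The doubly cumulative table is the product R x N x C, with R of size 8 x 5, N of size 5 x 4 and C of size 4 x 6.
The rows of R correspond to R1, R2-R5, R1-R2, R3-R5, R1-R3, R4-R5, R1-R4 and R5;
the columns of C correspond to C1, C2-C4, C1-C2, C3-C4, C1-C3 and C4.
\[
R = \begin{pmatrix}
1 & 0 & 0 & 0 & 0 \\
0 & 1 & 1 & 1 & 1 \\
1 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 1 & 1 \\
1 & 1 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 1 \\
1 & 1 & 1 & 1 & 0 \\
0 & 0 & 0 & 0 & 1
\end{pmatrix},
\qquad
N = \begin{pmatrix}
87 & 93 & 19 & 1 \\
45 & 126 & 24 & 5 \\
36 & 68 & 74 & 22 \\
0 & 30 & 111 & 59 \\
0 & 0 & 52 & 148
\end{pmatrix},
\qquad
C = \begin{pmatrix}
1 & 0 & 1 & 0 & 1 & 0 \\
0 & 1 & 1 & 0 & 1 & 0 \\
0 & 1 & 0 & 1 & 1 & 0 \\
0 & 1 & 0 & 1 & 0 & 1
\end{pmatrix}
\]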
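The product R x N x C can be sketched in a few lines of plain Python. This is only an illustration of the aggregation step: the helper names (`split_indicators`, `matmul`, `transpose`) are hypothetical and not taken from the slides, and the indicator matrices are generated from the split-and-complement pattern visible in R and C.

```python
# Wine appreciation table N (rows R1..R5, columns C1..C4), copied from the slide.
N = [
    [87, 93, 19, 1],
    [45, 126, 24, 5],
    [36, 68, 74, 22],
    [0, 30, 111, 59],
    [0, 0, 52, 148],
]

def split_indicators(size):
    """Indicator rows for each ordered split and its complement:
    (1), (2..size), (1..2), (3..size), ..., (1..size-1), (size)."""
    rows = []
    for cut in range(1, size):
        rows.append([1 if i < cut else 0 for i in range(size)])
        rows.append([0 if i < cut else 1 for i in range(size)])
    return rows

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

R = split_indicators(5)             # 8 x 5, rows R1, R2-R5, R1-R2, ...
C = transpose(split_indicators(4))  # 4 x 6, columns C1, C2-C4, C1-C2, ...
doubly_cumulative = matmul(matmul(R, N), C)  # 8 x 6

for row in doubly_cumulative:
    print(row)
```

Each consecutive pair of rows and pair of columns of the result forms one of the 2 by 2 tables on which the doubly cumulative analysis is built.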
Example: Van Rijckevorsel’s data (3/8)
 The doubly cumulative chi-squared statistic defined by Hirotsu
 It is easy to verify that
\[
\chi^2 = \sum_{i=1}^{4} \sum_{j=1}^{3} \chi^2_{ij} = 2609.089
\]
Pearson chi-squared statistic \chi^2_{ij} for the 2 by 2 tables

                   C1 | C2-C4     C1-C2 | C3-C4    C1-C3 | C4
R1    | R2-R5      127.505795     172.3801421      73.56417744
R1-R2 | R3-R5      125.1717033    411.1867347      179.4836138
R1-R3 | R4-R5      134.6153846    448.6704701      295.9486395
R1-R4 | R5         50.48076923    235.4368932      354.6446948

max \chi^2_{ij} = 448.6704701 (R1-R3 | R4-R5 with C1-C2 | C3-C4)
min \chi^2_{ij} = 50.48076923 (R1-R4 | R5 with C1 | C2-C4)
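The twelve 2 by 2 statistics and their sum can be checked with a short script. This is a minimal sketch, assuming the usual closed form of the Pearson statistic for a 2 by 2 table; the function names are illustrative and not from the slides.

```python
# Wine appreciation table N (rows R1..R5, columns C1..C4), copied from the slide.
N = [
    [87, 93, 19, 1],
    [45, 126, 24, 5],
    [36, 68, 74, 22],
    [0, 30, 111, 59],
    [0, 0, 52, 148],
]

def chi2_2x2(a, b, c, d):
    """Pearson chi-squared statistic of the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def block_sum(table, rows, cols):
    """Total count of the cells in the given rows and columns."""
    return sum(table[i][j] for i in rows for j in cols)

n_rows, n_cols = len(N), len(N[0])
chi2 = []
for r in range(1, n_rows):                  # row split R1..Rr | R(r+1)..R5
    top, bottom = range(0, r), range(r, n_rows)
    line = []
    for c in range(1, n_cols):              # column split C1..Cc | C(c+1)..C4
        left, right = range(0, c), range(c, n_cols)
        line.append(chi2_2x2(
            block_sum(N, top, left), block_sum(N, top, right),
            block_sum(N, bottom, left), block_sum(N, bottom, right),
        ))
    chi2.append(line)

total = sum(sum(line) for line in chi2)
for line in chi2:
    print([round(value, 4) for value in line])
print("doubly cumulative chi-squared:", round(total, 3))
```

The nested loops visit the row splits in the order R1 | R2-R5, ..., R1-R4 | R5 and the column splits in the order C1 | C2-C4, ..., C1-C3 | C4, so `chi2[i][j]` lines up with the layout of the table of statistics.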
Example: Van Rijckevorsel’s data (4/8)
Eigenvalues and percentages of inertia of doubly cumulative
correspondence analysis

                 Eigenvalue   Cumulative %
F1               0.213        97.749
F2               0.004        99.764
F3               0.001        100.000
Total inertia    0.217

 It is easy to verify that, apart from the constant
n(I - 1)(J - 1) = 1000 x 4 x 3 = 12000,
the total inertia is identical to the doubly cumulative chi-squared statistic defined by Hirotsu:
\[
n (I-1)(J-1) \sum_k s_k^2 = \chi^2, \qquad 12000 \times 0.217 \approx 2609.089
\]
(the agreement is exact up to the rounding of the total inertia to three decimals).
Example: Van Rijckevorsel’s data (5/8)
Plot of Doubly ordered cumulative C.A.
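 The maximum distance from the origin is found for C1-C2, C3-C4, R1-R3 and R4-R5;
these categories correspond to the maximum chi-squared value, 448.67.
 The variation among C1, C1-C2 and C1-C3 differs from that among R1, R1-R2 and R1-R3.
 The variation among R5, R4-R5 and R3-R5 differs from that among C4, C3-C4 and C2-C4.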
[Figure: symmetric plot of the doubly ordered cumulative correspondence analysis, rows and columns; axes F1 (97.75 %) and F2 (2.01 %), together 99.76 %]
Plot of correspondence analysis
[Figure: correspondence plot (profile coordinates); Principal Axis 1 (80.7 %) and Principal Axis 2 (15.05 %), total 2D association 95.75 %]
Example: Van Rijckevorsel’s data (6/8)
 Let's look first at the position of C1 and R1. Since they are situated near each other in this plot, this row category and
column category are associated with each other, so in the classical plot they would also be located near each other.
 Looking at the position of C1 and C1-C2: these two points are situated fairly close to one another, indicating that there is a
small difference between C1 and C2. Since C1-C2 is slightly closer to the origin than C1, this suggests that C2 is also
slightly closer to the origin (in the classical CA plot) than C1.
 Similar comments can be made by considering the relatively short distance between R1 and R1-R2. Such a distance
implies that, in the classical CA plot, R1 and R2 are located near each other.
 If we consider the relative distances between (C1, C1-C2) and (R1, R1-R2), we can see that these two distances are similar.
Since C1 is associated with R1, these similar distances imply that C2 and R2 are also similarly
positioned in the classical CA plot.
 The relatively similar distances between R1, R1-R2 and R1-R3 suggest that the relative distances between R1, R2 and R3
in the classical CA plot are the same.
 Let's look at the right-hand side of the cumulative plot. C4 and R5 are situated close to each other, implying that in the
classical CA plot they will also be situated close to one another.
 The distance between R5 and R4-R5 indicates that R4 is quite different from R5. Since R4-R5 is situated closer to the origin
than R5, R4 will be situated closer to the origin than R5.
 The roughly equal distances between R2-R5 (closer to the origin), R3-R5 and R4-R5 (further from the origin) indicate that
R2, R3 and R4 are roughly the same distance apart from each other in the classical CA plot.
 What is interesting is that the distances between the pairs (R1, R2-R5), (R1-R2, R3-R5), (R1-R3, R4-R5) and (R1-R4, R5)
are about the same, indicating that the cumulative nature of the analysis preserves the relative difference (or similarity) of
R1, R2, R3, R4 and R5 that the classical CA plot would reflect.
 All of these conclusions regarding the interpretation of the cumulative correspondence plot are reflected in the classical CA
plot.
Application
 Cumulative Correspondence Analysis as a
tool for optimizing factor settings in public
transport
Objective
To organise a public transport service by defining the levels
of the factors that make it up (economically optimal scenario),
through the use of Cumulative Correspondence
Analysis (D’Ambra et al., 2009)
Keywords
Public transport service, economically optimal scenario,
Cumulative Correspondence Analysis, design of
experiments, Taguchi's index
Research Phases
Presentation of the statistical methodology
Breakdown of the transport service: definition of the factors and levels
Identification of the reduced experimental design (Taguchi)
Construction and administration of the survey form
Cumulative correspondence analysis
Decision-making aspects
Case Study
Service: rail passenger transport, Naples-Rome route
Factors influencing the evaluation of the service: Cost, Frequency, Comfort and Travel
Duration
Levels: 3 (Low - Medium - High)
Chosen experimental design: L9
Scale for evaluating the service: 5-point Likert

Taguchi's L9 fractional factorial design

Duration   Comfort   Cost     Frequency   Rating
High       Low       Low      Low         1 to 5
High       Medium    Medium   Medium      1 to 5
High       High      High     High        1 to 5
Medium     Low       Medium   High        1 to 5
Medium     Medium    High     Low         1 to 5
Medium     High      Low      Medium      1 to 5
High       Low       High     Medium      1 to 5
High       Medium    Low      High        1 to 5
High       High      Medium   Low         1 to 5

Construction of the table for
Cumulative Correspondence Analysis
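Construction of the table for Cumulative Correspondence Analysis
Computation of the chi-squared statistic for
each of the 4 (K-1) tables of size 12x2 (Ix2)
\[
\chi^2_{1 \text{ vs } 2345} = 93.66, \quad
\chi^2_{12 \text{ vs } 345} = 135.49, \quad
\chi^2_{123 \text{ vs } 45} = 171.51, \quad
\chi^2_{1234 \text{ vs } 5} = 221.31
\]
\[
T = 93.66 + 135.49 + 171.51 + 221.31 = 621.97
\]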
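The construction of T as a sum of K-1 chi-squared statistics, one per collapsed I x 2 table, can be sketched as follows. The 12 x 5 transport table is not reproduced on the slides, so this sketch reuses the wine appreciation table from the Van Rijckevorsel example as a stand-in (an assumption made purely for illustration); the function names are hypothetical.

```python
# Stand-in data: the 5 x 4 wine table from the Van Rijckevorsel example.
# The transport survey table (12 x 5) is not given in the slides.
N = [
    [87, 93, 19, 1],
    [45, 126, 24, 5],
    [36, 68, 74, 22],
    [0, 30, 111, 59],
    [0, 0, 52, 148],
]

def pearson_chi2(table):
    """Pearson chi-squared statistic of a two-way table of counts."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

def cumulative_chi2(table):
    """Chi-squared of each I x 2 table: categories 1..s versus s+1..K."""
    n_categories = len(table[0])
    stats = []
    for s in range(1, n_categories):
        collapsed = [[sum(row[:s]), sum(row[s:])] for row in table]
        stats.append(pearson_chi2(collapsed))
    return stats

stats = cumulative_chi2(N)
T = sum(stats)
print([round(value, 2) for value in stats], round(T, 2))
```

With K ordered response categories there are K-1 cut points, which is why the transport example, with a 5-point scale, yields four statistics.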
[Figure: cumulative correspondence plot of the factor levels (A1-A3, B1-B3, C1-C3, D1-D3) and the rating categories I-V; both axes range from -0.3 to 0.3]
For each level of overall rating, a scenario of the service provided was constructed.
Scenario for rating V: A3 (low cost), B1 (high frequency), C1 (high comfort), D3 (low travel time).
Other legend entries, listed for ratings I, II, III and IV: A2, A1 (high, medium); B2, B3 (low, medium); C3 (low); D2, D1 (high, medium).
In summary:
The scenario for the Very Low rating level does not differ from the
Low scenario.
The service rating can move from Low to Medium if the Cost, Frequency and
Duration factors are improved simultaneously.
To move from a Medium to a High rating, the customer requires a clear
improvement in the Comfort factor.
Bootstrap
To assess the significance of the categories, confidence regions
were computed using the bootstrap procedure.
[Figure: plot of the factor levels A1-A3, B1-B3, C1-C3 and D1-D3 with bootstrap confidence regions; both axes range from -0.3 to 0.3]
Results
- The medium levels of all factors are not statistically significant
for predicting the level of overall satisfaction.
- The low and high levels of each factor are significant for predicting
the level of overall satisfaction.
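The slides do not detail the bootstrap procedure, so the following is only a sketch of its resampling step, under the assumption of a nonparametric (multinomial) bootstrap of the contingency table. The wine table is reused as stand-in data and `bootstrap_tables` is a hypothetical helper; re-analysing each replicate and drawing the regions around the replicated coordinates is not shown.

```python
import random

# Stand-in data: the 5 x 4 wine table from the Van Rijckevorsel example.
N = [
    [87, 93, 19, 1],
    [45, 126, 24, 5],
    [36, 68, 74, 22],
    [0, 30, 111, 59],
    [0, 0, 52, 148],
]

def bootstrap_tables(table, n_boot, seed=0):
    """Yield n_boot tables, each obtained by redrawing the n observations
    with probabilities proportional to the observed cell counts."""
    rng = random.Random(seed)
    n_cols = len(table[0])
    cells = [(i, j) for i in range(len(table)) for j in range(n_cols)]
    weights = [table[i][j] for i, j in cells]
    n = sum(weights)
    for _ in range(n_boot):
        resampled = [[0] * n_cols for _ in table]
        for i, j in rng.choices(cells, weights=weights, k=n):
            resampled[i][j] += 1
        yield resampled

replicates = list(bootstrap_tables(N, n_boot=200))
print(replicates[0])
```

Each replicate keeps the grand total of the original table, and cells that are empty in the data stay empty in every replicate, since their sampling weight is zero.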
Main References
Beh, E. J., D'Ambra, L., Simonetti, B. (2007). Cumulative correspondence analysis for ordered
categorical data using Taguchi's statistic, in review.
D’Ambra, L., Köksoy, O., Simonetti, B. (2008). Cumulative correspondence analysis of ordered
categorical data from industrial experiments, Journal of Applied Statistics.
Hirotsu, C. (1986). Cumulative chi-squared as a tool of goodness of fit, Biometrika, 73, pp. 165-173.
Nair, V. N. (1986). Testing in industrial experiments with ordered categorical data, Technometrics,
28(4), 283-291.
Taguchi, G. (1966). Statistical Analysis (in Japanese), Tokyo: Maruzen.
Taguchi, G. (1974). A new statistical analysis for clinical data, the accumulating analysis, in
contrast with the chi-square test, Saishin Igaku, 29, 806-813.
Taguchi, G. (1991a). Taguchi methods: Case studies from the U.S. and Europe, Michigan:
American Supplier Institute.
Taguchi, G. (1991b). Taguchi methods: Research and development, Michigan: American
Supplier Institute.
Taguchi, G. (1991c). Taguchi methods: Signal-to-noise ratio for quality evaluation, Michigan:
American Supplier Institute.
Emerson's orthogonal polynomials:
This is a particular family of orthogonal polynomials, applicable to
contingency tables, proposed by Emerson in 1968.
Applied to contingency tables with ordinal variables, these polynomials
decompose the total inertia (the chi-squared index in the symmetric case,
or the numerator of the Goodman-Kruskal tau in the non-symmetric case) into
components of different degree, independent and identically distributed, whose
sum equals the total variability.
Emerson's orthogonal polynomials:
The orthogonal polynomials can easily be computed with the recurrence
formula, using a system of equally spaced scores (natural
scores).
\[
b_v(j) = (A_v\, j + B_v)\, b_{v-1}(j) - C_v\, b_{v-2}(j)
\]
where:
\[
B_v = -A_v \sum_{j=1}^{c} p_{.j}\, j\, b_{v-1}^2(j)
\]
\[
C_v = A_v \sum_{j=1}^{c} p_{.j}\, j\, b_{v-1}(j)\, b_{v-2}(j)
\]
\[
A_v = \left\{ \sum_{j=1}^{c} p_{.j}\, j^2\, b_{v-1}^2(j)
      - \left[ \sum_{j=1}^{c} p_{.j}\, j\, b_{v-1}^2(j) \right]^2
      - \left[ \sum_{j=1}^{c} p_{.j}\, j\, b_{v-1}(j)\, b_{v-2}(j) \right]^2 \right\}^{-1/2}
\]
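The recurrence can be sketched in plain Python as follows. The starting values b_{-1}(j) = 0 and b_0(j) = 1 are the usual ones for this recurrence but are not shown on the slide, so they are an assumption here; the marginal distribution used in the example is the column marginal of the wine table (168, 317, 280, 235 out of 1000), and `emerson_polynomials` is an illustrative name.

```python
from math import sqrt

def emerson_polynomials(p, scores=None):
    """Orthonormal polynomials b_0, ..., b_{c-1} for the marginal
    distribution p, built with the three-term recurrence.
    Assumed starting values: b_{-1}(j) = 0 and b_0(j) = 1."""
    c = len(p)
    if scores is None:
        scores = list(range(1, c + 1))   # natural scores 1, 2, ..., c
    prev2 = [0.0] * c                    # b_{v-2}, initially b_{-1}
    prev1 = [1.0] * c                    # b_{v-1}, initially b_0
    polys = [prev1]
    for _ in range(1, c):
        m_sq = sum(pj * s * s * b1 * b1 for pj, s, b1 in zip(p, scores, prev1))
        m_1 = sum(pj * s * b1 * b1 for pj, s, b1 in zip(p, scores, prev1))
        m_2 = sum(pj * s * b1 * b2
                  for pj, s, b1, b2 in zip(p, scores, prev1, prev2))
        A = 1.0 / sqrt(m_sq - m_1 ** 2 - m_2 ** 2)
        B = -A * m_1
        C = A * m_2
        current = [(A * s + B) * b1 - C * b2
                   for s, b1, b2 in zip(scores, prev1, prev2)]
        polys.append(current)
        prev2, prev1 = prev1, current
    return polys

p = [0.168, 0.317, 0.280, 0.235]         # column marginals of the wine table
polys = emerson_polynomials(p)
for degree, b in enumerate(polys):
    print(degree, [round(value, 4) for value in b])
```

The polynomial of degree 1 is simply the standardised score, (j - mean) / standard deviation, and at most c - 1 non-trivial polynomials exist for a variable with c categories.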
Emerson's orthogonal polynomials:
These polynomials are orthogonal with respect to the matrix whose main
diagonal contains the marginal distribution of the ordinal variable:
\[
\sum_{j=1}^{c} p_{.j}\, b_v(j)\, b_{v'}(j) =
\begin{cases}
1 & v = v' \\
0 & v \neq v'
\end{cases}
\]
References
 Beh, E. J. (2004), Simple correspondence analysis: A bibliographic review,
International Statistical Review, 72, 257-284.
 Beh, E. J., D'Ambra, L., Simonetti, B. (2011), Cumulative correspondence
analysis for ordered categorical data using Taguchi's statistic, Communications in
Statistics.
 Cuadras, C. M. (2002), Correspondence analysis and diagonal expansions in
terms of distribution functions, Journal of Statistical Planning and Inference, 103,
137-150.
 D’Ambra, L., Köksoy, O., Simonetti, B. (2009), Cumulative correspondence
analysis of ordered categorical data from industrial experiments, Journal of
Applied Statistics, 36, 1315-1328.
 Hirotsu, C. (1986), Cumulative chi-squared statistic as a tool for testing
goodness of fit, Biometrika, 73, 165-173.
 Nair, V. N. (1987), Chi-squared type tests for ordered alternatives in contingency
tables, Journal of the American Statistical Association, 82, 283-291.
 Taguchi, G. (1974), A new statistical analysis for clinical data, the accumulating
analysis, in contrast with the chi-square test, Saishin Igaku, 29, 806-813.
 Warrens, M. J., Heiser, W. J. (2009), Diagnostics for regression dependence in
tables re-ordered by the dominant correspondence analysis solution,
Computational Statistics and Data Analysis, 53, 3139-3144.