Transcript 0.3

CoDaPack:
A tool for Compositional
Data Analysis
M. Comas-Cufí & S. Thió-Henestrosa
([email protected])
Dept. Computer Sciences and Applied Mathematics
University of Girona (UdG)
Catalonia-Spain
1
What’s coda?
• Vector x=[x1, x2,…, xD]
• Components add to a constant: 100, 1, 10^6, 10^9, …
  Units: percentage, parts per one, ppm, ppb, …
• Has positive elements
• Carries only relative information
• Examples
– Production (pieces): [Ok, NonOk, Rework] = [87, 1, 12]
– Household budget (€): [Food, Serv., Other] = [1150, 623, 351]
– Daily activities (h): [Work, Sleep, Other] = [7.5, 7.5, 9]
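The three examples are on different scales (pieces, euros, hours), yet each carries only relative information. A minimal sketch of rescaling them to a common constant sum follows. The function name `closure` and its `total` parameter are assumptions made for illustration; they do not come from the slides or from CoDaPack.

```python
def closure(parts, total=1.0):
    """Rescale strictly positive parts so that they add up to `total`."""
    if any(p <= 0 for p in parts):
        raise ValueError("all parts must be strictly positive")
    s = sum(parts)
    return [total * p / s for p in parts]

production = [87, 1, 12]       # pieces: Ok, NonOk, Rework
budget = [1150, 623, 351]      # euros: Food, Serv., Other
activities = [7.5, 7.5, 9]     # hours: Work, Sleep, Other

production_pct = closure(production, total=100)  # percentages
budget_prop = closure(budget)                    # parts per one
activities_prop = closure(activities)            # parts per one

print(production_pct)
print(budget_prop)
print(activities_prop)
```

After rescaling, ratios between parts are unchanged, which is the sense in which the original units do not matter.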
2
Sample space of coda: simplex
• Compositional data live in the simplex (S), represented
in ternary (D=3), quaternary (D=4), … diagrams
[Figure: ternary diagram, D=3, S^3, showing x = [0.45, 0.35, 0.2]]
[Figure: quaternary diagram, D=4, S^4, showing x = [0.2, 0.25, 0.2, 0.35]]
3
Euclidean distance
appropriate?
[Figure: factories A and B plotted by parts STOP PROD., HALF PROD., NON-STOP PROD.]
A2009 = [0.2, 0.1, 0.7]
A2010 = [0.1, 0.2, 0.7]
B2009 = [0.4, 0.3, 0.3]
B2010 = [0.3, 0.4, 0.3]
A2010 - A2009 = B2010 - B2009 = [-0.1, 0.1, 0]
de(A) = de(B) = 0.14: the Euclidean distance measures the absolute difference
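The equality of the two Euclidean distances can be checked directly from the four compositions on this slide. This is a minimal sketch; the function name `euclidean_distance` is chosen here for illustration.

```python
import math

A2009, A2010 = [0.2, 0.1, 0.7], [0.1, 0.2, 0.7]
B2009, B2010 = [0.4, 0.3, 0.3], [0.3, 0.4, 0.3]

def euclidean_distance(x, y):
    """Ordinary Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

de_A = euclidean_distance(A2009, A2010)
de_B = euclidean_distance(B2009, B2010)
print(round(de_A, 2), round(de_B, 2))  # 0.14 0.14
```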
4
Euclidean distance
appropriate?
[Figure: shares per part for factories A and B, 2009 and 2010]
Relative change 2009 → 2010 per part:
                 Factory A              Factory B
Stop Prod.       0.2 → 0.1  (-50%)      0.4 → 0.3  (-25%)
Half Prod.       0.1 → 0.2  (+100%)     0.3 → 0.4  (+33.3%)
Non-Stop Prod.   0.7 → 0.7  (0%)        0.3 → 0.3  (0%)
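The percentages on this slide follow from (after - before) / before for each part. A short sketch, with `relative_change` as an illustrative helper name:

```python
def relative_change(before, after):
    """Percentage change of each part: 100 * (after - before) / before."""
    return [100.0 * (a - b) / b for b, a in zip(before, after)]

# Parts in the order: Stop Prod., Half Prod., Non-Stop Prod.
change_A = relative_change([0.2, 0.1, 0.7], [0.1, 0.2, 0.7])
change_B = relative_change([0.4, 0.3, 0.3], [0.3, 0.4, 0.3])

print([round(c, 1) for c in change_A])  # [-50.0, 100.0, 0.0]
print([round(c, 1) for c in change_B])  # [-25.0, 33.3, 0.0]
```

The same absolute shift of 0.1 is a much larger relative change in factory A than in factory B.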
5
Euclidean distance
appropriate?
[Figure: A2009, A2010, B2009 and B2010 plotted by parts STOP PROD., HALF PROD., NON-STOP PROD.]
Our interest lies in relative values:
A2010/A2009 = [1/2, 2, 1]
B2010/B2009 = [3/4, 4/3, 1]
Euclidean distance:
de(A) = de(B) = 0.14
Aitchison distance:
da(A) = 0.6276
da(B) = 0.3970
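As a rough illustration of a distance built on ratios, the sketch below uses one common textbook form of the Aitchison distance: the square root of (1/D) times the sum, over pairs i < j, of squared differences of log-ratios. This definition is an assumption here, since the slide does not state its formula. Evaluated on the four compositions listed earlier, it gives roughly 0.98 for A and 0.41 for B; the slide's 0.6276 and 0.3970 may come from a different scaling or from inputs not shown, but the ordering da(A) > da(B) is the same.

```python
import math

A2009, A2010 = [0.2, 0.1, 0.7], [0.1, 0.2, 0.7]
B2009, B2010 = [0.4, 0.3, 0.3], [0.3, 0.4, 0.3]

def aitchison_distance(x, y):
    """Assumed definition: sqrt((1/D) * sum_{i<j} (ln(xi/xj) - ln(yi/yj))^2)."""
    D = len(x)
    total = 0.0
    for i in range(D):
        for j in range(i + 1, D):
            diff = math.log(x[i] / x[j]) - math.log(y[i] / y[j])
            total += diff ** 2
    return math.sqrt(total / D)

da_A = aitchison_distance(A2009, A2010)
da_B = aitchison_distance(B2009, B2010)
print(da_A > da_B)  # True: A changed more than B in relative terms
```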
6
Classical multivariate normal
model appropriate?
7
Log-ratio methodology
• Aitchison geometry on compositional data is
equivalent to classical Euclidean
geometry on log-ratio values.
Simplex (restricted space): [x1,…,xD]
→ Real space (unrestricted): log(xi/xj), i,j = 1,…,D, j ≠ i
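One common way to move from the simplex to unrestricted real space is the centred log-ratio (clr) transform: the log of each part minus the mean of the logs. The slide does not name a specific transform, so clr is used here as an assumption. The sketch shows that the transformed values do not depend on the constant sum, and that ordinary Euclidean distance can then be applied to them.

```python
import math

def clr(x):
    """Centred log-ratio: log of each part minus the mean of the logs."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]

def euclidean_distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# The same composition in parts per one and in percentages.
clr_x = clr([0.45, 0.35, 0.2])
clr_x_percent = clr([45, 35, 20])
print(all(abs(a - b) < 1e-12 for a, b in zip(clr_x, clr_x_percent)))  # True

# Euclidean distance computed on log-ratio values.
A2009, A2010 = [0.2, 0.1, 0.7], [0.1, 0.2, 0.7]
d_clr_A = euclidean_distance(clr(A2009), clr(A2010))
print(d_clr_A)
```

The clr values always sum to zero, so they occupy a (D-1)-dimensional subspace of real space.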
8
CoDaPack 2
9
Software
• CoDaPack: software developed by the Department of
Computer Science and Applied Mathematics at the
Universitat de Girona. Easy and intuitive.
http://ima.udg.edu/codapack [email protected]
• compositions (R-package): analysis of compositional and
positive data using different approaches.
http://cran.r-project.org/
[email protected]
• robCompositions (R-package): robust estimation for
compositional data
http://cran.r-project.org/
[email protected]
10
References
• Aitchison, J., 1986. The Statistical Analysis of Compositional
Data. Chapman & Hall, London. Reprinted in 2003 with
additional material by Blackburn Press.
• Proceedings of CoDaWork, 2003, 2005, 2008, 2011: available at
http://dugi-doc.udg.edu/handle/10256/150.
• CoDaWeb: Compositional Data Analysis Web Site:
http://www.compositionaldata.com/
11