Basic principles of probability theory

Proximity matrices and scaling
• Purpose of scaling
• Classical Euclidean scaling
• Non-Euclidean scaling
• Non-Metric Scaling
• Example
Proximity matrices
There are many situations in which dissimilarities (or similarities) between individuals are
measured. The purpose of scaling is then to find the relative positions of the individuals in
some space. If the dissimilarities are ordinary Euclidean distances, the purpose could be to find
relative coordinates of the individuals. In general, a matrix of dissimilarities is called a
proximity matrix, i.e. it records how close the individuals are to each other.
Euclidean distances are the simplest of all and are relatively easy to work with. Dissimilarities
are metric if the triangle inequality holds for all triples (i,j,k):
$$\delta_{ij} + \delta_{jk} \ge \delta_{ik}$$
The matrix with these dissimilarities as elements is denoted $\Delta$. This matrix is said to be
Euclidean if n points corresponding to it can be embedded in a Euclidean space so that the
distances between the points in that space are equal to the corresponding elements of the
matrix. If the Euclidean space is n-dimensional then the distance between two points is
calculated as follows (it can of course be generalised to include weights):
$$d_{ij} = \left( \sum_{k=1}^{n} (x_{ik} - x_{jk})^2 \right)^{1/2}$$
If the dissimilarities correspond to Euclidean distances then there is an elegant solution to the
problem of finding a configuration from the distance matrix.
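As a small illustration of the distance formula and the triangle inequality, here is a minimal sketch in R. The three points and the helper name euclid are hypothetical, chosen only for this example.

```r
# Three hypothetical points in a 2-dimensional space (rows are points).
x = rbind(c(0, 0), c(3, 4), c(6, 0))

# Distance between points i and j from the formula: square root of the
# summed squared coordinate differences.
euclid = function(x, i, j) sqrt(sum((x[i, ] - x[j, ])^2))

d12 = euclid(x, 1, 2)
d23 = euclid(x, 2, 3)
d13 = euclid(x, 1, 3)

# The same distances from the built-in dist() function.
dmat = as.matrix(dist(x, method = "euclidean"))

# Triangle inequality for the triple (1, 2, 3).
metric_ok = d12 + d23 >= d13
```

The hand-written formula and dist() should agree; dist() is what the later R commands expect as input.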
Metric scaling
Suppose that we have an n×n matrix of pairwise distances that are metric. Denote this matrix by
$\Delta$ and its elements by $\delta_{ij}$. We want to find n points in k-dimensional Euclidean
space so that the distances between these points are equal to the elements of $\Delta$. Denote
the n×k matrix of point coordinates by X and define $Q = XX^T$. Then for the distances and the
elements of Q we can write:
$$d_{rs}^2 = \sum_{j=1}^{k} (x_{rj} - x_{sj})^2, \qquad q_{rs} = \sum_{j=1}^{k} x_{rj} x_{sj}$$
Let D be the matrix whose elements are the squared distances $d_{rs}^2$. Then the elements of D
and Q are related by:
$$d_{rs}^2 = q_{rr} + q_{ss} - 2 q_{rs}$$
We can find the positions of the points if we assume that the centroid of the rows of X is 0,
i.e. for every coordinate j:
$$\sum_{r=1}^{n} x_{rj} = 0$$
Then we can write the following relations:
$$\sum_{r=1}^{n} d_{rs}^2 = \sum_{r=1}^{n} q_{rr} + n q_{ss} - 2 \sum_{r=1}^{n} q_{rs} = \sum_{r=1}^{n} q_{rr} + n q_{ss}$$
Here we used the fact that the centroid of the points is 0. Furthermore we can write:
$$\sum_{r=1}^{n} \sum_{s=1}^{n} d_{rs}^2 = n \sum_{r=1}^{n} q_{rr} + n \sum_{s=1}^{n} q_{ss} = 2n \sum_{r=1}^{n} q_{rr}$$
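The step that uses the zero centroid can be spelled out as follows. This is a short worked line based only on the definitions of $q_{rs}$ and of the centroid condition given above: the cross term vanishes when summed over r.

```latex
\sum_{r=1}^{n} q_{rs}
  = \sum_{r=1}^{n} \sum_{j=1}^{k} x_{rj} x_{sj}
  = \sum_{j=1}^{k} x_{sj} \left( \sum_{r=1}^{n} x_{rj} \right)
  = 0
```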
Using these identities we can express the diagonal elements of Q in terms of the elements of D:
$$\sum_{r=1}^{n} q_{rr} = \frac{1}{2n} \sum_{r=1}^{n} \sum_{s=1}^{n} d_{rs}^2, \qquad q_{ss} = \frac{1}{n} \sum_{r=1}^{n} d_{rs}^2 - \frac{1}{2n^2} \sum_{r=1}^{n} \sum_{t=1}^{n} d_{rt}^2$$
Metric scaling: Cont.
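Let us denote the diagonal matrix of the diagonal elements of Q by
$E = \mathrm{diag}(q_{11}, q_{22}, \ldots, q_{nn})$, and let $\mathbf{1}$ be the vector of n ones.
Then the relation between the elements of D and Q can be written in matrix form:
$$D = \mathbf{1}\mathbf{1}^T E + E\,\mathbf{1}\mathbf{1}^T - 2Q$$
Using the relation between the diagonal terms of Q and the elements of D we can write:
$$\mathbf{1}\mathbf{1}^T E = \mathbf{1}\mathbf{1}^T D / n - \mathbf{1}\mathbf{1}^T D\,\mathbf{1}\mathbf{1}^T / (2n^2)$$
$$E\,\mathbf{1}\mathbf{1}^T = D\,\mathbf{1}\mathbf{1}^T / n - \mathbf{1}\mathbf{1}^T D\,\mathbf{1}\mathbf{1}^T / (2n^2)$$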
Then we can write the relation between Q and D:
$$(I - \mathbf{1}\mathbf{1}^T/n)\, D\, (I - \mathbf{1}\mathbf{1}^T/n) = -2Q = -2XX^T$$
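A sketch of why the double centring removes the terms containing E. The shorthand $H = I - \mathbf{1}\mathbf{1}^T/n$ is introduced here only for this illustration and is not notation from the slides.

```latex
% H annihilates the vector of ones, and X is already centred:
H\mathbf{1} = \mathbf{0}, \qquad \mathbf{1}^T H = \mathbf{0}^T, \qquad HX = X

% Hence both E-terms vanish and Q is left unchanged:
H(\mathbf{1}\mathbf{1}^T E)H = (H\mathbf{1})\,\mathbf{1}^T E H = 0, \qquad
H(E\,\mathbf{1}\mathbf{1}^T)H = H E\,\mathbf{1}\,(\mathbf{1}^T H) = 0, \qquad
HQH = (HX)(HX)^T = XX^T = Q

% Therefore
HDH = -2\,HQH = -2Q
```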
Thus, given the matrix of dissimilarities $\Delta$ with elements $\delta_{ij}$, we can find the
matrix Q using the relation:
$$Q = (I - \mathbf{1}\mathbf{1}^T/n)\, A\, (I - \mathbf{1}\mathbf{1}^T/n), \qquad A = \left( -\tfrac{1}{2} \delta_{ij}^2 \right)$$
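Since $\Delta$ is Euclidean, the matrix Q is symmetric and positive semi-definite. We can
therefore use the spectral decomposition of Q (decomposition using eigenvalues and eigenvectors):
$$Q = T \Lambda T^T$$
where the columns of T are the eigenvectors and $\Lambda$ is the diagonal matrix of eigenvalues.
Since Q is positive semi-definite, all eigenvalues are non-negative, and we can write (recall
that $Q = XX^T$):
$$Q = T \Lambda^{1/2} (T \Lambda^{1/2})^T \;\Rightarrow\; X = T \Lambda^{1/2}$$
This gives the matrix of coordinates of the n points (principal coordinates). Of course the
configuration is not unique: any rotation or reflection of it reproduces the same distances.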
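The non-uniqueness can be checked numerically. The sketch below uses a hypothetical configuration X and an arbitrary rotation angle; it rotates and shifts the points and compares the distances.

```r
# Hypothetical configuration: 4 points in 2 dimensions.
X = rbind(c(1, 2), c(-1, 0), c(0, -3), c(0, 1))

# Rotation by 30 degrees, followed by a shift of every coordinate by 5.
theta = pi / 6
R = rbind(c(cos(theta), -sin(theta)),
          c(sin(theta),  cos(theta)))
Y = X %*% R + 5

# Both configurations have the same interpoint distances.
same_distances = isTRUE(all.equal(as.vector(dist(X)), as.vector(dist(Y))))
```

A shift also preserves distances; in the derivation above it is excluded only by the convention that the centroid is 0.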
Metric scaling: Cont.
The algorithm for metric scaling (finding principal coordinates) can be written as follows:
1) Given the matrix of dissimilarities $\Delta$, form the matrix A with elements $-\frac{1}{2}\delta_{ij}^2$.
2) From each element of this matrix subtract the averages of the row and of the column in which
it is located, and add the overall average. Denote the result by B; it is the matrix Q of the
derivation above.
3) Find the eigenvalues of B and the corresponding eigenvectors. The dimensionality of the
representation corresponds to the number of non-zero eigenvalues of this matrix.
4) Normalise the eigenvectors so that $\xi_i^T \xi_i = \lambda_i$, where the $\lambda_i$ are the
eigenvalues and the $\xi_i$ are the eigenvectors.
The coordinates of the n points on the i-th axis are given by the elements of the i-th
renormalised eigenvector.
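The steps above can be sketched directly in R. This is an illustrative sketch, not the code behind cmdscale; the data pts are random and hypothetical, and the tolerance used to decide which eigenvalues count as non-zero is an assumption.

```r
# Hypothetical data: 5 points in 3 dimensions, used only to build a
# Euclidean dissimilarity matrix delta.
set.seed(1)
pts = matrix(rnorm(15), nrow = 5)
delta = as.matrix(dist(pts))
n = nrow(delta)

# Step 1: matrix with elements -1/2 * delta_ij^2.
A = -0.5 * delta^2

# Step 2: double centring (subtract row and column means, add grand mean).
H = diag(n) - matrix(1 / n, n, n)
B = H %*% A %*% H

# Step 3: eigenvalues and eigenvectors of the centred matrix.
e = eigen(B, symmetric = TRUE)
keep = which(e$values > 1e-8 * max(e$values))   # numerically non-zero

# Step 4: rescale each eigenvector so its squared length is its eigenvalue.
coords = e$vectors[, keep, drop = FALSE] %*%
  diag(sqrt(e$values[keep]), length(keep))

# The recovered configuration reproduces the original dissimilarities.
max_err = max(abs(as.matrix(dist(coords)) - delta))
```

Because the points were generated in 3 dimensions, three eigenvalues are expected to be numerically non-zero and max_err should be at rounding-error level.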
If weights are attached to the points then the algorithm can be modified to take them into
account. Define s as the vector of weights (normalised to sum to 1) and $D_w$ as the diagonal
matrix formed from the weights. Then we can use:
1) Form the matrix A with elements $-\frac{1}{2}\delta_{ij}^2$.
2) Calculate $B = (I - \mathbf{1}s^T)\, A\, (I - \mathbf{1}s^T)^T$.
3) Find the eigenvalues and eigenvectors of $D_w^{1/2} B D_w^{1/2}$.
4) Normalise the eigenvectors so that the scalar product of each eigenvector with itself is equal
to its eigenvalue. The coordinates of the n points are then given by the elements of
$D_w^{-1/2} \xi_i$.
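A minimal sketch of the weighted variant in R, under the assumption that the weights sum to 1. The points, the weights s, and all variable names are hypothetical.

```r
set.seed(2)
pts = matrix(rnorm(12), nrow = 4)        # hypothetical 4 points in 3 dimensions
delta = as.matrix(dist(pts))
n = nrow(delta)

s = c(0.4, 0.3, 0.2, 0.1)                # hypothetical weights, summing to 1
Dw = diag(s)
one = rep(1, n)

A = -0.5 * delta^2                       # step 1
C = diag(n) - one %*% t(s)               # I - 1 s^T
B = C %*% A %*% t(C)                     # step 2

M = sqrt(Dw) %*% B %*% sqrt(Dw)          # step 3 (Dw is diagonal)
e = eigen(M, symmetric = TRUE)
keep = which(e$values > 1e-8 * max(e$values))

# Step 4: eigenvectors scaled to squared length lambda, then Dw^(-1/2) applied.
xi = e$vectors[, keep, drop = FALSE] %*%
  diag(sqrt(e$values[keep]), length(keep))
coords = diag(1 / sqrt(s)) %*% xi

max_err = max(abs(as.matrix(dist(coords)) - delta))
centroid = as.vector(t(s) %*% coords)    # weighted centroid of the result
```

With this construction the weighted centroid of the recovered points, rather than the ordinary mean, sits at the origin.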
Dimensionality
“Goodness of fit” of a k-dimensional configuration to the original matrix of dissimilarities is
measured using the “per cent trace”, defined as:
$$P_k = \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{n} \lambda_i} \times 100\%$$
Note that principal coordinate analysis and principal component analysis are closely related. If
X is an n×p data matrix and the n×n dissimilarity matrix consists of the Euclidean distances
between the rows of X, then the principal component scores coincide with the coordinates
calculated using scaling (up to the sign of each axis).
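This relation can be checked numerically. The sketch below assumes a hypothetical random data matrix X and compares the first two principal component scores from prcomp with the coordinates from cmdscale.

```r
set.seed(4)
X = matrix(rnorm(40), nrow = 10)         # hypothetical 10 x 4 data matrix

scores = prcomp(X)$x[, 1:2]              # principal component scores
coords = cmdscale(dist(X), k = 2)        # principal coordinates

# Axes agree up to sign, so absolute values are compared.
max_diff = max(abs(abs(scores) - abs(coords)))
```

The comparison uses absolute values because the sign of an eigenvector is arbitrary in both methods.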
It might happen that the dissimilarity matrix is not Euclidean. Then some of the calculated
eigenvalues may be negative and the coordinate representation may include imaginary
coordinates. If the negative eigenvalues are small in magnitude they can be ignored and set to 0.
Another way of avoiding this problem is to find the best possible approximation of the
dissimilarity matrix by a Euclidean dissimilarity matrix.
If some eigenvalues are negative then the goodness of fit can be modified in two different ways:
$$P_k^{(1)} = \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{n} |\lambda_i|} \times 100\%, \qquad \text{or} \qquad P_k^{(2)} = \frac{\sum_{i=1}^{k} \lambda_i^2}{\sum_{i=1}^{n} \lambda_i^2} \times 100\%$$
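A small sketch of the two modified measures in R. The dissimilarity matrix below is hypothetical and deliberately violates the triangle inequality (1 + 1 is less than 5), so it cannot be Euclidean and a negative eigenvalue appears.

```r
# Hypothetical non-Euclidean dissimilarities between 4 objects.
delta = matrix(c(0, 1, 5, 1,
                 1, 0, 1, 1,
                 5, 1, 0, 1,
                 1, 1, 1, 0), nrow = 4, byrow = TRUE)

fit = cmdscale(delta, k = 2, eig = TRUE)
lambda = fit$eig
has_negative = any(lambda < -1e-8)

k = 2
# Modified goodness of fit: absolute values, or squares, in the denominator.
fit_abs = 100 * sum(lambda[1:k])   / sum(abs(lambda))
fit_sq  = 100 * sum(lambda[1:k]^2) / sum(lambda^2)
```

Both measures stay below 100% here because the negative eigenvalue contributes to the denominator but not to the numerator.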
Other types of scaling
Classical scaling is one of the techniques used to find a configuration from the dissimilarity
matrix. There are several other techniques. Although they do not have a direct algebraic
solution, they can be implemented and used with modern computers. One of these techniques is
the minimisation of:
$$\phi'' = \sum_{i=1}^{n} \sum_{j=1}^{n} (\delta_{ij} - d_{ij})^2$$
No algebraic solution exists. Instead, a dimension is chosen, a starting configuration is
postulated, and the function is then minimised numerically. If the dimension is changed, the
whole procedure is repeated. The values of the function at the minimum for different
dimensions can then be shown in a scree plot to decide on the dimensionality.
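The procedure can be sketched with a general-purpose optimiser. The example below assumes hypothetical random data and uses optim with the BFGS method as one possible choice; the function name loss and the random starting configuration are assumptions of this sketch.

```r
set.seed(5)
pts = matrix(rnorm(12), nrow = 6)        # hypothetical 6 points in 2 dimensions
delta = dist(pts)                        # observed dissimilarities
n = attr(delta, "Size")
k = 2                                    # chosen dimension

# Sum of squared differences between the dissimilarities and the distances
# of a trial configuration. Each pair is counted once, so this is half of
# the double sum over all i and j.
loss = function(par) {
  Y = matrix(par, nrow = n, ncol = k)
  sum((as.vector(delta) - as.vector(dist(Y)))^2)
}

start = rnorm(n * k)                     # postulated starting configuration
res = optim(start, loss, method = "BFGS")
fitted_config = matrix(res$par, nrow = n, ncol = k)
```

Repeating this for several values of k and plotting res$value against k gives the scree plot described above. The optimiser may stop at a local minimum, so several starting configurations are often tried.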
One of the techniques uses the following linear transformation of the observed dissimilarities:
$$\hat{d}_{ij} = a + b\,\delta_{ij}$$
It is then used in two types of function. The first is the standardised residual sum of squares:
$$\mathrm{STRESS} = \left[ \frac{\sum_i \sum_j (d_{ij} - \hat{d}_{ij})^2}{\sum_i \sum_j d_{ij}^2} \right]^{1/2}$$
The other is a modified version of this:
$$\mathrm{SSTRESS} = \left[ \frac{\sum_i \sum_j (d_{ij}^2 - \hat{d}_{ij}^2)^2}{\sum_i \sum_j d_{ij}^4} \right]^{1/2}$$
Both functions must be minimised iteratively using one of the optimisation techniques.
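For a fixed trial configuration the two functions can be evaluated as in the sketch below. The data are hypothetical, and fitting a and b by least squares with lm is an assumption of this example rather than something stated in the slides.

```r
set.seed(6)
pts = matrix(rnorm(16), nrow = 8)        # hypothetical 8 points in 2 dimensions
d = as.vector(dist(pts))                 # distances of a trial configuration
delta = d + runif(length(d), 0, 0.2)     # noisy observed dissimilarities

# Linear transformation dhat = a + b * delta, with a and b fitted by
# least squares (an assumption of this sketch).
ab = unname(coef(lm(d ~ delta)))
dhat = ab[1] + ab[2] * delta

stress  = sqrt(sum((d - dhat)^2) / sum(d^2))
sstress = sqrt(sum((d^2 - dhat^2)^2) / sum(d^4))
```

In a full procedure these values would be recomputed for each trial configuration inside the iterative minimisation.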
Non-metric scaling
Although Euclidean and metric scaling can be applied in a wide range of cases, there are cases
where the requirements of these techniques are not satisfied. For example:
1) The dissimilarities of the objects may not be true distances but only an ordering of
similarity, i.e. we can only say that for the M = n(n-1)/2 pairs of objects we should have
$$\delta_{i_1 j_1} \le \delta_{i_2 j_2} \le \ldots \le \delta_{i_M j_M}$$
2) The measurements may have such large errors that we can be sure only of the order of the
distances.
3) When we use metric scaling we assume that a true configuration exists and that what we
measure is an approximation to the interpoint distances of this configuration. It may happen
that we measure some function of the interpoint distances instead.
These cases are usually handled using non-metric scaling. One of the techniques is to minimise
the function STRESS subject to the constraints:
$$\delta_{ij} \le \delta_{rs} \;\Rightarrow\; \hat{d}_{ij} \le \hat{d}_{rs} \quad \text{for all } i, j, r, s$$
This technique is called non-metric to distinguish it from the previously described techniques.
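One way to obtain fitted values that respect the ordering constraint is monotone (isotonic) regression. The sketch below uses isoreg from the stats package on hypothetical data; it illustrates a single fitting step for a fixed configuration, not the full iterative procedure used by isoMDS.

```r
set.seed(7)
pts = matrix(rnorm(12), nrow = 6)        # hypothetical 6 points in 2 dimensions
d = as.vector(dist(pts))                 # distances of a trial configuration
delta = d^2 + runif(length(d), 0, 0.5)   # dissimilarities: a distorted function of d

# Monotone regression of d on delta: the fitted values are non-decreasing
# in delta, which is the ordering constraint.
ord = order(delta)
dhat = numeric(length(d))
dhat[ord] = isoreg(delta[ord], d[ord])$yf

stress = sqrt(sum((d - dhat)^2) / sum(d^2))
monotone = all(diff(dhat[ord]) >= -1e-12)
```

Only the rank order of delta enters the fit, so any increasing transformation of the dissimilarities gives the same dhat.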
Example: U.K. distances
Here is a table of some of the intercity distances:
          Bristol  Cardiff  Dover  Exeter  Hull  Leeds  London
Bristol         0
Cardiff        47        0
Dover         210      245      0
Exeter         84      121    250       0
Hull          231      251    266     305     0
Leeds         220      240    271     294    61      0
London        120      155     80     200   218    199       0
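The table above can be entered in R and passed to cmdscale as in the following sketch. The variable names are chosen for this example; the signs and orientation of the resulting axes may differ between runs or R versions, since a configuration is determined only up to rotation and reflection.

```r
cities = c("Bristol", "Cardiff", "Dover", "Exeter", "Hull", "Leeds", "London")
m = matrix(0, 7, 7, dimnames = list(cities, cities))

# Lower triangle of the table, filled column by column.
m[lower.tri(m)] = c(47, 210, 84, 231, 220, 120,   # distances to Bristol
                    245, 121, 251, 240, 155,      # distances to Cardiff
                    250, 266, 271, 80,            # distances to Dover
                    305, 294, 200,                # distances to Exeter
                    61, 218,                      # distances to Hull
                    199)                          # Leeds to London

d = as.dist(m)                 # dist object built from the lower triangle
cc = cmdscale(d, k = 2)        # two-dimensional metric scaling
```

The rows of cc are labelled with the city names, which is convenient for the plotting commands given later.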
Example:
Result of metric scaling. Two-dimensional coordinates (rows in the order of the table: Bristol,
Cardiff, Dover, Exeter, Hull, Leeds, London):
            [,1]        [,2]
[1,]   -66.96555    42.04003
[2,]   -76.31011    67.37324
[3,]   -21.27937  -162.65703
[4,]  -136.59765    58.13424
[5,]   163.00725    31.67896
[6,]   152.04793    40.75884
[7,]   -13.90250   -77.32827
Example: Plot
The two-dimensional plot of these coordinates gives a representation of the UK map (up to
rotation and reflection).
[Figure: scatter plot of the two-dimensional configuration, with points labelled Bristol, Cardiff, Dover, Exeter, Hull, Leeds and London; vertical axis y.]
R commands for scaling
Classical metric scaling is provided by the function cmdscale. In current versions of R it is in
the stats package, which is loaded by default; older versions required library(mva).
cc = cmdscale(d, k = 2, eig = FALSE, add = FALSE, x.ret = FALSE)
d is the distance matrix, k is the number of dimensions required.
You can plot using
plot(cc)
or using the following set of commands:
x = cc[,1]   # use cc$points[,1] if cmdscale returned a list (e.g. eig = TRUE)
y = cc[,2]   # use cc$points[,2] if cmdscale returned a list
plot(x, y, main = "Metric scaling results")
text(x, y, labels = rownames(cc))
It is a good idea to check whether the number of dimensions requested is sufficient.
This can be done by requesting the eigenvalues (eig = TRUE) and comparing them.
R commands for scaling
Non-metric scaling can be done using isoMDS from the MASS library:
library(MASS)
?isoMDS
then you can use
cc1 = isoMDS(a, cmdscale(a, k = 2), k = 2)
The second argument is the initial configuration. Then we can plot using
x = cc1$points[,1]
y = cc1$points[,2]
plot(x, y, main = "isoMDS scaling")
text(x, y, labels = rownames(cc1$points))
R commands for scaling
If you have a data matrix x then you can calculate distances using the command dist:
dist(x, method = "euclidean")
The result of this command can then be used for analysis with cmdscale or isoMDS.
