Exercizes Layout

Download Report

Transcript Exercizes Layout

Chemometric functions in Excel
Oxana Rodionova & Alexey Pomerantsev
Semenov Institute of Chemical Physics
[email protected]
01.12.08
1
Distance Learning Course in Chemometrics
for Technological and Natural-Science
Mastership Education
• Unfulfilled need in chemometric education in Russia
• Low number of qualified specialists in chemometrics
• Large distances, e.g. Moscow – Barnaul is about 3000 km
• No modern chemometrics books in Russian
• No available chemometric software
• No support from officials: government, Academy, etc
• Easy available everywhere => INTERNET
• Interactive layout: all calculations should be clear and repeatable
Barnaul
• Web friendly environment for the calculations => EXCEL
01.12.08
2
• Necessity to make and use our own (free) software => EXCEL Add-In
Chemometric
calculations in
Excel
Input
Provides user with all
possibilities of Excel
interface, worksheet
calculations, worksheet
functions, charts, etc.
R esults
VBA helps to simplify routine
work
VBA
Functions
D ATA
Excel User
Interface
С++
DLL
01.12.08
Calculations
All calculations are made "on
the fly“ and very fast
3
Installation
http://rcs.chph.ras.ru/down/sacs.zip
Chemometrics.dll 
Chemometrics. xla 
put in your Windows folder
put in the AddInn folder
(C:\Documents and Settings\
<User>\Application Data\
Microsoft\AddIns\)
(C:\WINDOWS\)
Load Chemometrics.xla by < Excel Options>  <Add-Ins> in the
open Workbook
01.12.08
4
Matrix calculations in Excel
={TRANSPOSE(B6:F10)}
B6:F10
Ctrl-Shift-Enter
Barr
01.12.08
={MMULT(B6:F10,TRANSPOSE(Barr))}
5
Principal Component Analysis (PCA)
Initial
data
Score
matrix
Loading
matrix
× A
J
I
X
=
I
PT
P
J
T
Error
matrix
+ I
E
A
J
A
J
X=TPT+E
01.12.08
6
Chemometrics XLA. PCA Scores
Xcal
Xtst
Centering
AND/OR
weighting
={ScoresPCA(Xcal,5,1,Xtst)}
01.12.08
nPC
7
Chemometrics XLA. PCA Loadings
Xcal
=TRANSPOSE(LoadingsPCA(Xcal,5,1))}
Excel worksheet function
01.12.08
nPC
Centering
AND/OR
weighting
8
List of chemometric functions
PCA
ScoresPCA <for calibration or test samples>
LoadingsPCA
Options:
• Centering
PLS
ScoresPLS <X-scores for calibration or test samples>
UScoresPLS <Y-scores for calibration or test samples>
LoadingsPLS <P-loadings>
WLoadingsPLS
QLoadingsPLS
AND/OR
scaling
• Number of PCs
PLS2
ScoresPLS2 <X-scores for calibration or test samples>
UScoresPLS2 <Y-scores for calibration or test samples>
LoadingsPLS2 <P-loadings>
WLoadingsPLS2
QLoadingsPLS2
01.12.08
9
ScoresPCA
X data (calibration set)
ScoresPCA (rMatrix [, nPCs] [,nCentWeightX] [, rMatrixNew] )
Number of PC (A)
Test set
centering and/or scaling
1 centering
2 scaling
3 both
X[IJ]  T[I A]
01.12.08
10
Validation Rules
If rMatrixNew is omitted then only calibration scores are calculated
If rMatrixNew is specified then only test scores are calculated
If rMatrixNew coincides with rMatrix then cross-validation is calculated
10% -out
crossvalidation
01.12.08
11
LoadingsPCA
X data (calibration set)
LoadingsPCA (rMatrix [, nPCs] [,nCentWeightX])
Number of PC (A)
centering and/or scaling
1 centering
2 scaling
3 both
X[IJ]  P[J A]
01.12.08
12
Explorative Data Analysis
Case study 1: People
01.12.08
13
People
01.12.08
14
Dataset in Excel Workbook (People.xls)
Number of objects (n) = 32
Number of variables (m) = 12
01.12.08
15
Data Preprocessing
Aim: to transform the data into the most
suitable form for data analysis
01.12.08
16
Autoscaling
mean centering
+
scaling
=
autoscaling
01.12.08
17
People: Scores & Loadings (PC1 vs. PC2)
4
0.6
Region
t2
2
MS
MS
MS
MS MS
FS
0.3
MS
MS
FS
FS
0.0
MN
MN
MN
MN
FN
FN
FN
FN
-2
-4
-2
FN
FN FN
MN
FN
0
22
“Map of Samples”
01.12.08
Weight
Strength
t1
FS
0
Height
Shoes
IQ
MS
FS
FS
FS
FS
a)
P2
Wine
P1
Age
Hairs
MN
MN
MN
Sex
44
66
-0.3
-0.4
Income
-0.2
0.0
0.2
Beer
0.4
“Map of Variables”
18
People: Scores & Loadings (PC1 vs. PC3)
3
0.4
t3t3
2
FS
20
18
1
1
30
0
FS
FS
FS
-1
-1
42
41
FN
40
FN
FN
FN
-2
0.0
t1
t1
37
37
-2
FN
FN
FN
MN
FN
-0.2
MN
-0.4
MN
46
MN
48
MN
Income
-0.6
0
502
4
0
2
4
Score plot
01.12.08
Age
6
55
-2
P1
Wine
43
MN
MN
-3
-3 -4
Hairs
Region
49
-4
Strength
Weight
Shoes
Sex
33
36
Height
IQ
27
MN
40
41
0.2
35
30
32
36
MS
33
32
FS
MS
MS
FS
Beer
26
26
24
27
FS
21
FS
MS
MS MS
23
24
a)
P3
MS
MS
6
-0.8
-0.4
-0.2
0.0
0.2
0.4
Loading plot
19
Case study 2: HPLC-DAD
01.12.08
20
Measurements
AU
1.2
1.0
0.8
0.6
0.4
0.2
220
249
wa
v
01.12.08
0.0
277
el
306
en
g
334
th
1
5
9
13
17
21
25
29
ti me
21
Dataset in Excel Workbook
X(3028)
01.12.08
22
Pure compounds A and B
1.2
A
C (t )
1.0
A
AU
B
B
If we observe X can we
predict C and S ?
1.0
0.8
0.8
0.6
0.6
0.4
0.4
0.2
0.2
l, nm
0.0
0
5
10
15
time
20
25
30
0.0
220
240
260
280
300
320
340
X=CST+E
01.12.08
23
Score plot
A
t2
12
11
13
10
14
15
9
16
2019
2
1
22
23
24
25
26
29
28
27
30
18
17
8
t1
1
2
7
1.2
A
C (t )
B
3
1.0
6
0.8
0.6
4
0.4
5
0.2
0.0
01.12.08
0
5
10
15
time
20
25
30
B
24
Conclusions from the Score Plot
1. Linear regions = Pure compounds
2. Curved line= Co-elution
3. Closer to the origin = Lower intensity
4. Number of bends = Number of different compounds
01.12.08
25
Factor analysis vs. PCA analysis
J
X
2
×
J
A
×
ST
I
C
01.12.08
X
I
=
=
I
I
J
J
PT
T
+
+
E1
E2
26
Scores and Loadings
3.2
A
C, T
1.0
A
B
S, P
B
t1
t2
p1
0.8
p2
2.2
0.6
0.4
1.2
0.2
0.0
0.2
220
0
5
10
15
-0.8
25
30
260
280
300
320
340
-0.2
-0.4
time
01.12.08
20
240
wave length
27
Procrustes transformation
X ≈ CST
X ≈ TPT
I = RRT = Identity matrix
X ≈ T(RRT)PT = (TR)(PR)T
^ ≈ TR
^ ≈ PR
C
S
R = Rstretch ×Rrotation
01.12.08
28
Scores Transformation
t2
t2
4
15
17 16
18
1
9
20
21
22
23
26
25
24
28
29
30
27
14
11
12
13
10
3
9
5
6
7
8
t1
1
2
2
8
27
30
29
28
26
1
25
24
23
22
21
20
19
18
9
17
16
15
14
13
10
1211
7
3
6
4
t1
Stretching
5
t5
4 2
6
7
3
8
9
10
11
2
12
17
16151413
30
29
28
27
26
25
24
23
121
22
20
19
18
01.12.08
t1
29
Procrustes analysis results
1.2
1.0
C hat(t )
C (t )
1.0
S(l)
1.2
S hat(l)
0.9
1.0
0.8
1.0
0.8
A
0.7
0.8
0.6
A
Ahat
0.6
0.5
Ahat
0.4
Bhat
0.4
0.8
Bhat
B
0.6
B
0.6
0.4
0.4
0.3
0.2
0.2
0.2
0.2
0.1
0.0
0.0
0
5
10
15
time
01.12.08
20
25
30
0.0
0.0
220
240
260
280
wavelength
300
320
340
l, nm
30
Conclusions
1. Scaling and centering is problem dependent
2. In this example number of PCs = Number of
different compounds
01.12.08
31
Regression
01.12.08
32
Principal Component Regression (PCR)
T
...
X
t1
1) PCA
tA
t
P
...
p1
2) MLR
01.12.08
t
pA
T  a = y+ e
33
Projection on Latent Structures (PLS)
X
P
...
t
p1
t
pA
T
...
U...
t1
u1
tA
Y
uA
t
q1
...
Qq
t
A
t
W
01.12.08
...
w1
t
wA
34
Projection on Latent Structures (PLS)
T B = Y + e
01.12.08
35
PLS and PLS2
1
1
1
PLS
T b = y + e
M
M
M
PLS2
01.12.08
T B = Y + E
36
ScoresPLS
X data (calibration set)
Y data (calibration set)
ScoresPLS (rMatrixX, rMatrixY
[, nPCs] [, nCentWeightX] [, nCentWeightY] [, rMatrixXNew])
Number of PC (A)
centering and/or scaling of X
1 centering
2 scaling
3 both
01.12.08
X Test set
centering and/or scaling of Y
1 centering
2 scaling
3 both
X[IJ], Y[I1]  T[IA]
37
UScoresPLS
X data (calibration set)
Y data (calibration set)
UScoresPLS (rMatrixX, rMatrixY
[, nPCs] [, nCentWeightX] [, nCentWeightY] [, rMatrixXNew] [, rMatrixYNew])
Number of PC (A)
centering and/or scaling of X
1 centering
2 scaling
3 both
01.12.08
X Test set
Y Test set
centering and/or scaling of Y
1 centering
2 scaling
3 both
X[IJ] , Y[I1]  U[I A]
38
WLoadingsPLS
X data (calibration set)
Y data (calibration set)
WLoadingsPLS (rMatrixX, rMatrixY
[, nPCs] [, nCentWeightX] [, nCentWeightY])
Number of PC (A)
centering and/or scaling of X
1 centering
2 scaling
3 both
01.12.08
centering and/or scaling of Y
1 centering
2 scaling
3 both
X[IJ] , Y[I1]  W[J A]
39
LoadingsPLS
X data (calibration set)
Y data (calibration set)
LoadingsPLS (rMatrixX, rMatrixY
[, nPCs] [, nCentWeightX] [, nCentWeightY])
Number of PC (A)
centering and/or scaling of X
1 centering
2 scaling
3 both
01.12.08
X[IJ] , Y[I1]  P[JA]
centering and/or scaling of Y
1 centering
2 scaling
3 both
40
QLoadingsPLS
X data (calibration set)
Y data (calibration set)
QLoadingsPLS (rMatrixX, rMatrixY
[, nPCs] [, nCentWeightX] [, nCentWeightY])
Number of PC (A)
centering and/or scaling of X
1 centering
2 scaling
3 both
01.12.08
X[IJ], Y[I1]  Q[1 A]
centering and/or scaling of Y
1 centering
2 scaling
3 both
41
ScoresPLS2
X data (calibration set)
Y data (calibration set)
ScoresPLS2 (rMatrixX, rMatrixY
[, nPCs] [, nCentWeightX] [, nCentWeightY] [, rMatrixXNew])
Number of PC (A)
centering and/or scaling of X
1 centering
2 scaling
3 both
01.12.08
X Test set
centering and/or scaling of Y
1 centering
2 scaling
3 both
X[IJ], Y[IK]  T[I A]
42
UScoresPLS2
X data (calibration set)
Y data (calibration set)
UScoresPLS2 (rMatrixX, rMatrixY
[, nPCs] [, nCentWeightX] [, nCentWeightY] [, rMatrixXNew] [, rMatrixYNew])
Number of PC (A)
centering and/or scaling of X
1 centering
2 scaling
3 both
01.12.08
X Test set
Y Test set
centering and/or scaling of Y
1 centering
2 scaling
3 both
X[IJ], Y[IK]  U[I A]
43
WLoadingsPLS2
LoadingsPLS2
X data (calibration set)
QLoadingsPLS2
Y data (calibration set)
LoadingsPLS2 (rMatrixX, rMatrixY
[, nPCs] [, nCentWeightX] [, nCentWeightY])
Number of PC (A)
centering and/or scaling of X
1 centering
2 scaling
3 both
01.12.08
centering and/or scaling of Y
1 centering
2 scaling
3 both
X[IJ], Y[IK]  P[J A] or W[J A] or Q[K A]
44
Seventh Winter Symposium on Chemometrics
near Tula city, February 2010
100
km
01.12.08
45