Transcript Slide 1

Modelling procedures for directed network of data blocks
Agnar Höskuldsson, Centre for Advanced Data Analysis, Copenhagen
Data structures:
Directed network of data blocks
Input data blocks
Output data blocks
Intermediate data blocks
Methods
Optimization procedures for each passage through the network
Balanced optimization of fit and prediction (H-principle)
Scores, loadings, loading weights, regression coefficients for each data block
Methods of regression analysis applicable at each data block
Evaluation procedures at each data block
Graphic procedures at each data block
1
Chemometric methods
1. Regression estimation,
X, Y. Traditional presentation: $Y_{est} = XB$, with standard deviations for B.
Latent structure:
$X = TP^T + X_0$. $X_0$ is not used.
$Y = TQ^T + Y_0$. $Y_0$ is not explained.
2. Fit and precision.
Both fit and precision are controlled.
3. Selection of score vectors
Score vectors should be as large as possible
and describe Y as well as possible.
Modelling stops when no more are found (cross-validation).
4. Graphic analysis of latent structure
Score and loading plots
Plot of weight (and loading weight) vectors
2
Chemometric methods
5. Covariance as measure of relationship
$X^T Y$ for scaled data measures the strength of the relationship.
$X_1^T Y = 0$ implies that $X_1$ is removed from the analysis.
6. Causal analysis
T=XR
From score plots we can infer about the original measurement values
Control charts for score values can be related to contribution charts
7. Analysis of X
Most of the analysis time is devoted to understanding the structure of X.
Plots are marked by symbols to better identify points in score or loading plots.
8. Model validation.
Cross-validation is used to validate the results.
Bootstrapping (re-sampling from the data) is used to establish confidence intervals.
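The role of cross-validation in deciding when modelling stops can be sketched as follows. This is a minimal illustration, not the procedure used in the talk: it assumes a single response y, a standard PLS1 fit, K-fold splitting and the PRESS statistic as criterion. The function names `pls1_coefficients` and `cross_validated_press` and the simulated data are hypothetical.

```python
import numpy as np

def pls1_coefficients(X, y, n_components):
    """Regression vector of a PLS1 model with the given number of score vectors."""
    Xa, ya = X.astype(float).copy(), y.astype(float).copy()
    K = X.shape[1]
    W = np.zeros((K, n_components))
    P = np.zeros((K, n_components))
    q = np.zeros(n_components)
    for a in range(n_components):
        w = Xa.T @ ya
        w /= np.linalg.norm(w)        # unit-length weight vector
        t = Xa @ w                    # score vector
        tt = t @ t
        P[:, a] = Xa.T @ t / tt       # X loading
        q[a] = ya @ t / tt            # y loading
        W[:, a] = w
        Xa -= np.outer(t, P[:, a])    # adjust X
        ya -= q[a] * t                # adjust y
    return W @ np.linalg.solve(P.T @ W, q)

def cross_validated_press(X, y, max_components, n_folds=5, seed=0):
    """Sum of squared prediction errors on left-out samples, per model size."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    press = np.zeros(max_components)
    for test_idx in folds:
        train = np.setdiff1d(np.arange(len(y)), test_idx)
        x_mean, y_mean = X[train].mean(axis=0), y[train].mean()
        for a in range(1, max_components + 1):
            b = pls1_coefficients(X[train] - x_mean, y[train] - y_mean, a)
            pred = (X[test_idx] - x_mean) @ b + y_mean
            press[a - 1] += np.sum((y[test_idx] - pred) ** 2)
    return press

# Simulated data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 6)) * np.array([3.0, 2.0, 1.0, 1.0, 0.5, 0.5])
y = X @ np.array([0.2, -0.5, 1.0, 0.3, -1.0, 2.0]) + 0.01 * rng.normal(size=40)
press = cross_validated_press(X, y, max_components=6)
print("score vectors with lowest PRESS:", int(np.argmin(press)) + 1)
```

In practice one would stop adding score vectors once PRESS no longer decreases noticeably.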
3
Chemometric methods
9. Different methods
Different types of data/situations may require different types of methods
One is looking for interpretations of the latent structure found
10. Theory generation
Results from analysis are used to establish views/theories on the data
Results motivate further analysis (groupings, non-linearity etc)
4
Partitioning data, 1
Measurement data: X1, X2, ..., XL
Response data: Y1, Y2
Reference data: Z1, Z2, Z3
5
Partitioning data, 2
Instrumental data
X
X1
X2
Response data
X3
Y
engineering
Y1
Y2
quality
chemical
process
chemical results
- There is often a natural sub-division of data.
- It is often required to study the role of a sub-block.
- A data block with few variables may ’disappear’ among one with many variables, e.g. optical instruments often give many variables.
6
Path diagram 1
[Diagram: directed network connecting the data blocks X1, X2, X3, X4, X5, X6 and X7]
Examples:
Production process
Organisational data
Diagram for sub-processes
Causal diagram
7
Path diagram 2, schematic application of modelling
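[Diagram: network of data blocks X1, ..., X7, with new samples x10, x20 and x30 entering at X1, X2 and X3]
x10 is a new sample from X1,
x20 is a new one from X2,
x30 is a new one from X3.
How do they generate new samples for X4, X5, X6 and X7?
Resulting estimating equations:
$X_{4,est} = X_1 B_{14} + X_2 B_{24} + X_3 B_{34}$
$X_{5,est} = X_1 B_{15} + X_2 B_{25} + X_3 B_{35}$
$X_{6,est} = X_4 B_{46} + X_5 B_{56}$
$X_{7,est} = X_6 B_{67}$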
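The estimating equations can be chained so that new samples at the input blocks produce estimates for every later block. The sketch below assumes the coefficient matrices have already been estimated; here they are random placeholders, and the block widths, the dictionary layout and the function name `propagate_through_path` are hypothetical.

```python
import numpy as np

def propagate_through_path(x10, x20, x30, B):
    """Chain the estimating equations from the input blocks to X7.

    B maps (from_block, to_block) to a coefficient matrix that is
    assumed to have been estimated beforehand.
    """
    x4 = x10 @ B[1, 4] + x20 @ B[2, 4] + x30 @ B[3, 4]
    x5 = x10 @ B[1, 5] + x20 @ B[2, 5] + x30 @ B[3, 5]
    x6 = x4 @ B[4, 6] + x5 @ B[5, 6]   # uses the estimates of X4 and X5
    x7 = x6 @ B[6, 7]
    return x4, x5, x6, x7

# Illustrative block widths and random placeholder coefficients (not real data).
rng = np.random.default_rng(0)
n_vars = {1: 3, 2: 4, 3: 2, 4: 5, 5: 3, 6: 4, 7: 2}
edges = [(1, 4), (2, 4), (3, 4), (1, 5), (2, 5), (3, 5), (4, 6), (5, 6), (6, 7)]
B = {(i, j): rng.normal(size=(n_vars[i], n_vars[j])) for i, j in edges}

x10, x20, x30 = (rng.normal(size=n_vars[k]) for k in (1, 2, 3))
x4, x5, x6, x7 = propagate_through_path(x10, x20, x30, B)
print(x7.shape)  # (2,)
```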
8
Path diagram 3
Time t1
Time t2
X1
X4
X6
X2
X7
X5
X3
Data blocks can be aligned to time.
Modelling can start at time t2.
9
Notation and schematic illustrations
Instrumental data
Response data
w

X
Y
t
q
u
w: weight vector (to be found)
t: score vector, $t = Xw = w_1 x_1 + ... + w_K x_K$
q: loading vector, $q = Y^T t = [(y_1^T t), ..., (y_M^T t)]$
u: Y-score vector, $u = Yq = q_1 y_1 + ... + q_M y_M$
Adjustments:
$X \leftarrow X - t p^T/(t^T t)$
$Y \leftarrow Y - t q^T/(t^T t)$
Vectors are collected into matrices, e.g., $T = (t_1, ..., t_A)$
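One step of this notation can be written out in NumPy. The slide leaves the weight vector w "to be found"; the sketch below assumes it is taken as the dominant left singular vector of X'Y with unit length, and uses p = X't for the X loading. The function name `one_step` and the random data are hypothetical.

```python
import numpy as np

def one_step(X, Y):
    """Compute w, t, p, q, u for one step and return the adjusted X and Y."""
    # Assumed choice of w: unit-length vector maximising |Y'Xw|^2.
    U, _, _ = np.linalg.svd(X.T @ Y, full_matrices=False)
    w = U[:, 0]
    t = X @ w            # score vector: weighted sum of the columns of X
    p = X.T @ t          # loading vector of X
    q = Y.T @ t          # loading vector of Y
    u = Y @ q            # Y-score vector
    tt = t @ t
    X_adj = X - np.outer(t, p) / tt
    Y_adj = Y - np.outer(t, q) / tt
    return w, t, p, q, u, X_adj, Y_adj

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
Y = rng.normal(size=(20, 2))
w, t, p, q, u, X_adj, Y_adj = one_step(X, Y)
# After the adjustment, t is orthogonal to every column of X_adj and Y_adj.
print(np.allclose(X_adj.T @ t, 0), np.allclose(Y_adj.T @ t, 0))
```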
10
Conjugate vectors 1
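[Diagrams: data block X with the vectors w, t, p, r; with t, q, r; and with w, v, s, t, p, r]
r: $t = Xw$, $p = X^T t$; $p_a^T r_b = 0$ for $a \neq b$.
r: $t = Xq$; $q_a^T r_b = 0$ for $a \neq b$.
r and s: $t = Xw$, $p = X^T v$; $p_a^T r_b = 0$, $t_a^T s_b = 0$ for $a \neq b$.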
11
Conjugate vectors 2
The conjugate vectors $R = (r_1, r_2, ..., r_A)$ satisfy: $T = XR$.
Latent structure solution:
$X = T P^T + X_0$, where $X_0$ is the part of X that is not used
$Y = T Q^T + Y_0$, where $Y_0$ is the part of Y that could not be explained
$Y = T Q^T + Y_0 = X (R Q^T) + Y_0 = X B + Y_0$, for $B = R Q^T$
The conjugate vectors are always computed together with the score vectors.
When regression on score vectors has been computed, the regression on the
original variables is computed as shown.
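The relations T = XR and B = RQ' can be checked numerically. The following is a sketch under stated assumptions, not the author's implementation: weight vectors are taken as dominant singular vectors of X'Y, score vectors are scaled to unit length so that the decompositions hold exactly in the form written above, and the function name `latent_structure` is hypothetical.

```python
import numpy as np

def latent_structure(X, Y, n_components):
    """Return T, P, Q, R, B and the unused parts X0, Y0."""
    X0, Y0 = X.astype(float).copy(), Y.astype(float).copy()
    (N, K), M, A = X.shape, Y.shape[1], n_components
    T, P = np.zeros((N, A)), np.zeros((K, A))
    Q, R = np.zeros((M, A)), np.zeros((K, A))
    for a in range(A):
        U, _, _ = np.linalg.svd(X0.T @ Y0, full_matrices=False)
        w = U[:, 0]                                # assumed choice of weight vector
        r = w - R[:, :a] @ (P[:, :a].T @ w)        # conjugate vector: X0 w = X r
        scale = np.linalg.norm(X0 @ w)
        t, r = X0 @ w / scale, r / scale           # unit-length score vector
        p, q = X0.T @ t, Y0.T @ t
        X0 -= np.outer(t, p)                       # adjust X (t't = 1)
        Y0 -= np.outer(t, q)                       # adjust Y
        T[:, a], P[:, a], Q[:, a], R[:, a] = t, p, q, r
    B = R @ Q.T                                    # regression on original variables
    return T, P, Q, R, B, X0, Y0

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
Y = rng.normal(size=(20, 2))
T, P, Q, R, B, X0, Y0 = latent_structure(X, Y, n_components=3)
print(np.allclose(T, X @ R), np.allclose(X @ B, T @ Q.T))
```

With unit-length score vectors, P'R is the identity matrix, which is the conjugacy property of the previous slide.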
12
Optimization procedure, 1
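One data block: weight vector $w_1$, score vector $t_1 = X_1 w_1$.
$|t_1|^2 \to \max$
Two data blocks: weight vector $w_1$, score vector $t_1 = X_1 w_1$, loading vector $q_2 = X_2^T t_1$.
$|q_2|^2 \to \max$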
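Both criteria reduce to a singular value problem once the weight vector is constrained. The sketch below assumes the constraint |w1| = 1, which the slide does not state explicitly; the function names and the random data are hypothetical.

```python
import numpy as np

def weight_one_block(X1):
    """Unit-length w1 maximising |t1|^2 with t1 = X1 w1."""
    _, s, Vt = np.linalg.svd(X1, full_matrices=False)
    return Vt[0], s[0] ** 2          # dominant right singular vector, maximum value

def weight_two_blocks(X1, X2):
    """Unit-length w1 maximising |q2|^2 with q2 = X2' X1 w1."""
    _, s, Vt = np.linalg.svd(X2.T @ X1, full_matrices=False)
    return Vt[0], s[0] ** 2

rng = np.random.default_rng(1)
X1 = rng.normal(size=(30, 4))
X2 = rng.normal(size=(30, 3))

w_a, max_t = weight_one_block(X1)
w_b, max_q = weight_two_blocks(X1, X2)
print(np.isclose(np.sum((X1 @ w_a) ** 2), max_t))
print(np.isclose(np.sum((X2.T @ X1 @ w_b) ** 2), max_q))
```

Under this assumption the one-block case gives the first principal component direction and the two-block case gives the first PLS weight vector.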
13
Three data blocks
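[Diagram: starting from a weight vector w, the blocks X, Y and Z are linked through the vectors t, ty, tz, qy and qz. With X as basis, Y is estimated; with Y as basis, Z is estimated.]
$|q_z|^2 \to \max$
[Diagram: blocks X1, X2, X3 and X4 with the vectors w, t1, q2, t3, t4 and q4]
Adjustments:
t1 describes X1: $X_1 \leftarrow X_1 - t_1 p_1^T/(t_1^T t_1)$, $p_1 = X_1^T t_1$.
t1 describes X2: $X_2 \leftarrow X_2 - t_1 q_2^T/(t_1^T t_1)$, $q_2 = X_2^T t_1$.
q2 describes X3: $X_3 \leftarrow X_3 - t_3 q_2^T/(q_2^T q_2)$, $t_3 = X_3 q_2$.
t3 describes X4: $X_4 \leftarrow X_4 - t_3 q_4^T/(t_3^T t_3)$, $q_4 = X_4^T t_3$.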
14
Optimization procedure, 2
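Two input and two output data blocks:
[Diagram: input blocks X1 (w1, t1) and X2 (w2, t2); output blocks X3 and X4; loading vectors q13, q23, q14 and q24]
Find w1 and w2:
$|q_{13} + q_{23} + q_{14} + q_{24}|^2 \to \max$
Two input, one intermediate and one output data blocks:
[Diagram: input blocks X1 (w1, t1) and X2 (w2, t2); intermediate block X3; output block X4; loading vectors q13, q23, q134 and q234]
Find w1 and w2:
$|q_{134} + q_{234}|^2 \to \max$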
15
Balanced optimization of fit and prediction
(H-principle)
In linear regression we are looking for a weight vector w,
so that the resulting score vector t=Xw is good!
X
Y
Linear regression
The basic measure of quality is the prediction variance for a sample, x0.
Assuming negligible bias and the standard assumptions, it can be written as
$F(w) = Var(y(x_0)) = k[1 - (y^T t)^2/(t^T t)][1 + t_0^2/(t^T t)]$.
It can be shown that $F(cw) = F(w)$ for all $c > 0$. Choose c such that $t^T t = 1$. Then
$F(w) = k[1 - (y^T t)^2][1 + t_0^2]$.
In order to get a prediction variance as small as possible, it is natural to choose
w such that $(y^T t)^2$ becomes as large as possible,
maximize $(y^T t)^2$ = maximize $|q|^2$
(PLS regression)
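The scale invariance F(cw) = F(w) is easy to verify numerically. The sketch below assumes y is scaled to unit length and sets the constant k to 1; the function name `prediction_variance` and the random data are hypothetical. It also evaluates the usual PLS weight vector, w proportional to X'y, which by the Cauchy-Schwarz inequality maximises (y'Xw)^2 over unit-length w.

```python
import numpy as np

def prediction_variance(X, y, x0, w, k=1.0):
    """F(w) = k [1 - (y't)^2/(t't)] [1 + t0^2/(t't)] with t = Xw, t0 = x0'w."""
    t = X @ w
    t0 = x0 @ w
    tt = t @ t
    return k * (1.0 - (y @ t) ** 2 / tt) * (1.0 + t0 ** 2 / tt)

rng = np.random.default_rng(0)
X = rng.normal(size=(25, 4))
y = rng.normal(size=25)
y /= np.linalg.norm(y)             # assumption: y scaled to unit length
x0 = rng.normal(size=4)            # a new sample

w = rng.normal(size=4)
print(np.isclose(prediction_variance(X, y, x0, w),
                 prediction_variance(X, y, x0, 3.7 * w)))

w_pls = X.T @ y / np.linalg.norm(X.T @ y)   # unit-length PLS weight vector
print(prediction_variance(X, y, x0, w_pls))
```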
16
Optimization procedure, 3
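Weighting along objects (rows) (same algorithm, but using the transposes):
[Diagram: X1 with weight vector v1 and loading vector p1; X2 with score vector t2]
Task: find weight vector v1:
maximize $|t_2|^2$
[Diagram: X1 with weight vector v1 and loading vector p1; X2 with score vector t2; X3 with loading vector q3]
Task: find weight vector v1:
maximize $|q_3|^2$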
17
Optimization procedure, 4
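[Diagram: X1 with weight vector w1, score vector t1 and loading vector p1; X2 with score vector t2; X3 with loading vector q3]
Task: find weight vector w1:
maximize $|q_3|^2$, where
$q_3 = X_3^T t_2$
$= X_3^T X_2 p_1$
$= X_3^T X_2 X_1^T t_1$
$= X_3^T X_2 X_1^T X_1 w_1$
If p1 is a good weight vector for X2, a good result may be expected.
Regression equations:
$X_{3,est} = X_2 B_{23}$
$X_{2,est} = B_{12} X_1$
$X_{1,est} = X_1 B_{11}$
Pre-processing may be needed to find variables in X1 and in X2 that are
highly correlated to each other.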
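Because q3 is linear in w1, the task can again be sketched as a singular value problem. The code assumes |w1| = 1 (not stated on the slide), that X1 and X2 share the same variables (columns) and that X2 and X3 share the same objects (rows). The function names and the random data are hypothetical.

```python
import numpy as np

def q3_stepwise(X1, X2, X3, w1):
    """Follow the chain w1 -> t1 -> p1 -> t2 -> q3 step by step."""
    t1 = X1 @ w1        # score vector of X1
    p1 = X1.T @ t1      # loading vector of X1, used as weight vector for X2
    t2 = X2 @ p1        # score vector of X2
    return X3.T @ t2    # loading vector of X3

def weight_for_chain(X1, X2, X3):
    """Unit-length w1 maximising |q3|^2, where q3 = X3' X2 X1' X1 w1."""
    M = X3.T @ X2 @ X1.T @ X1
    _, s, Vt = np.linalg.svd(M, full_matrices=False)
    return Vt[0], s[0]

rng = np.random.default_rng(2)
X1 = rng.normal(size=(15, 4))   # 15 objects, 4 variables
X2 = rng.normal(size=(12, 4))   # 12 objects, same 4 variables
X3 = rng.normal(size=(12, 3))   # same 12 objects, 3 variables

w1, q3_norm = weight_for_chain(X1, X2, X3)
print(np.isclose(np.linalg.norm(q3_stepwise(X1, X2, X3, w1)), q3_norm))
```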
18
Three types of reports
Reports:
How a data block is doing in a network: Xi
How a data block can be described by data blocks that lead to it: Xi-1, Xi-2, Xi-3 → Xi
How a data block can be described by one data block that leads to it: Xi-2 → Xi
19
Production data, 1
[Diagram: X1 and X2 lead to Y]
X1: Process parameters, 8 variables
X2: NIR data, 1560 variables (reduced to 120)
X1 ’disappears’ in the NIR data X2.

No    |X2|^2    |Y|^2     |X|^2     |Y|^2
1     78.961    51.483    74.969    51.964
2     91.538    67.559    86.786    69.553
3     96.351    76.291    91.627    80.643
4     97.942    81.383    95.373    85.058
5     98.620    83.900    95.919    89.056
6     98.967    85.705    97.054    90.050
7     99.205    87.917    97.508    91.990
8     99.294    90.472    97.990    93.455
9     99.349    92.183    98.667    94.020
10    99.426    92.947    98.896    94.708
11    99.606    93.084    99.103    95.082
12    99.657    93.376    99.202    95.740
20
Production data, 2
At each step:
[Diagram: X1 (w1, t1) and X2 (w2, t2) lead to Y]
At each step the score vectors are evaluated. Non-significant ones are excluded.
Results for X2, process parameters: 5 score vectors explain 11.92% of Y.
Results for X1, NIR data: 12 score vectors explain 84.141% of Y.
In total, 96.06% = 11.920% + 84.14% of Y is explained.
Process parameters:
No    Step    |Y|^2
1     1       4.957
2     2       9.315
3     5       10.393
4     6       10.929
5     8       11.920

NIR data:
No    Step    |Y|^2
1     1       51.483
2     2       69.121
3     3       73.070
4     4       76.506
5     5       78.669
6     6       80.923
7     7       82.129
8     8       82.552
9     9       83.132
10    10      83.590
11    11      83.881
12    12      84.141
21
Production data, 3
R2-values shown in the diagram of X1, X2 and Y: 96.06%, 75.12%, 87.75%.
[Plot: estimated versus observed quality variable using only score vectors for process parameters; both axes run from -0.4 to 0.3; R2 = 0.7512]
The process parameters contribute marginally by 11.92%. But if only they were
used, they would explain 75.12% of the variation of Y.
22
Directed network of data blocks
Input blocks: give weight vectors for initial score vectors.
Intermediate blocks: are described by previous blocks and give score vectors for succeeding blocks.
Output blocks: are described by previous blocks.
23
Magnitudes computed between two data blocks
Xk
Xi
Measures of precision
Different views:
a) As a part of a path
b) If the results are viewed
marginally
c) If only Xi → Xk
Measures of fit
...
Ti: Score vectors
Qi: Loading vectors
Bi: Regression coefficients
Etc
24
Stages in batch processes
Time
Batches
X1
X2
Xk
Stages
1
2
K
Y: Final quality
Paths:
X1 → X2 → ... → XK → Y
Given a sample x10, the path model gives estimated samples for later blocks.
[X1 X2 X3] → X4 → Y
Given values of (x10 x20 x30), estimates for values of x4 and y are given.
[X1 X2 X3] → [X4 X5] → Y
Given values of (x10 x20 x30), estimates for values of (x4 x5) and y are given.
25
Schematic illustration of the modelling task for sequential processes
Stages
X1
X2
Now
X3
X4
Y
Initial conditions
Known process parameters
Next stage
Later stages
26
Plots of score vectors
X1
X2
t1
X1
XL
t2
X1 – X2
t2
tL
X1 – XL
tL
t1
t1
The plots will show how the changes are relative to the first data block.
27
Graphic software to specify paths
X4
X5
X1
X3
X2
...
XL
Blocks are dragged into the screen. Relationships specified.
28
Pre-processing of data
• Centring. If desired, the data are centred.
• Scaling. In the computations all variables are scaled to unit length (or
unit standard deviation if centred). It is checked whether scaling disturbs the
variable, e.g. if it is constant except for two values, or if the variable is at
the noise level. When the analysis has been completed, values are scaled
back to the original units.
• Redundant variable. It is investigated whether a variable fails to contribute
to the explanation of any of the variables that the present block leads to. If it
is redundant, it is eliminated from the analysis.
• Redundant data block. It is investigated whether a data block can provide
a significant description of the blocks that it is connected to later in the
network. If it cannot contribute to the description of those blocks, it is
removed from the network.
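The centring and scaling steps can be sketched as follows. This is a simplified illustration: the check for variables that scaling would disturb is reduced to a tolerance on the scale factor, which is a stand-in for the more detailed checks described above, and the function names `preprocess` and `scale_back` are hypothetical.

```python
import numpy as np

def preprocess(X, centre=True, tol=1e-12):
    """Centre (optionally) and scale the columns of X.

    Returns the scaled data together with everything needed to scale back.
    Columns whose scale factor is below tol (e.g. constant variables)
    are dropped, since scaling them would only amplify noise.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0) if centre else np.zeros(X.shape[1])
    Xc = X - mean
    if centre:
        scale = Xc.std(axis=0, ddof=1)        # unit standard deviation
    else:
        scale = np.linalg.norm(Xc, axis=0)    # unit length
    keep = scale > tol
    return Xc[:, keep] / scale[keep], mean, scale, keep

def scale_back(Xs, mean, scale, keep):
    """Return values in the original units for the retained variables."""
    return Xs * scale[keep] + mean[keep]

rng = np.random.default_rng(0)
X = rng.normal(loc=10.0, scale=3.0, size=(12, 4))
X[:, 2] = 5.0                                  # a constant variable
Xs, mean, scale, keep = preprocess(X)
print(keep)                                    # the constant column is dropped
print(np.allclose(scale_back(Xs, mean, scale, keep), X[:, keep]))
```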
29
Post-processing of results
Score vectors computed in the passages through the network are evaluated in
the analysis at one passage. Apart from the input blocks, the score vectors
found between passages are not independent. The score vectors found in a
relationship Xi → Xj are evaluated to see if all are significant or some should be
removed for this relationship.
Cross-validation, as in standard regression methods
Confidence intervals for parameters by resampling technique
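Confidence intervals by resampling can be sketched with a percentile bootstrap. The example assumes that objects (rows) are resampled with replacement and uses ordinary least squares as a stand-in estimator; any routine that returns a coefficient vector could be passed instead. The function names and the simulated data are hypothetical.

```python
import numpy as np

def least_squares(X, y):
    """Stand-in estimator; replace with the regression method of interest."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def bootstrap_intervals(X, y, fit=least_squares, n_boot=500, level=0.95, seed=0):
    """Percentile bootstrap intervals for each regression coefficient."""
    rng = np.random.default_rng(seed)
    n = len(y)
    estimates = np.empty((n_boot, X.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)       # resample rows with replacement
        estimates[b] = fit(X[idx], y[idx])
    tail = (1.0 - level) / 2.0
    lower = np.quantile(estimates, tail, axis=0)
    upper = np.quantile(estimates, 1.0 - tail, axis=0)
    return lower, upper

# Simulated data, for illustration only.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, 0.0, -2.0]) + 0.5 * rng.normal(size=50)
lower, upper = bootstrap_intervals(X, y)
for k, (lo, hi) in enumerate(zip(lower, upper)):
    print(f"coefficient {k}: [{lo:.3f}, {hi:.3f}]")
```

A coefficient whose interval covers zero would be a candidate for removal when score vectors or variables are evaluated for significance.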
30
International workshop on
Multi-block and Path Methods
24. – 30. May 2009, Mijas, Malaga, Spain
31