Transcript Slide 1
Slide 1 — Modelling procedures for directed network of data blocks
Agnar Höskuldsson, Centre for Advanced Data Analysis, Copenhagen

Data structures: a directed network of data blocks
- Input data blocks
- Output data blocks
- Intermediate data blocks

Methods:
- Optimization procedures for each passage through the network
- Balanced optimization of fit and prediction (the H-principle)
- Scores, loadings, loading weights and regression coefficients for each data block
- Methods of regression analysis applicable at each data block
- Evaluation procedures at each data block
- Graphic procedures at each data block

Slide 2 — Chemometric methods
1. Regression estimation, X, Y. Traditional presentation: Yest = XB, with standard deviations for B. Latent structure: X = TP' + X0, where X0 is not used; Y = TQ' + Y0, where Y0 is not explained.
2. Fit and precision. Both fit and precision are controlled.
3. Selection of score vectors. Score vectors should be as large as possible and describe Y as well as possible; modelling stops when no more are found (cross-validation).
4. Graphic analysis of the latent structure. Score and loading plots; plots of weight (and loading weight) vectors.

Slide 3 — Chemometric methods (continued)
5. Covariance as a measure of relationship. X'Y for scaled data measures the strength of the relationship; X1'Y = 0 implies that X1 is removed from the analysis.
6. Causal analysis, T = XR. From score plots we can infer about the original measurement values; control charts for score values can be related to contribution charts.
7. Analysis of X. Most of the analysis time is devoted to understanding the structure of X. Plots are marked by symbols to better identify points in score or loading plots.
8. Model validation. Cross-validation is used to validate the results; bootstrapping (re-sampling from the data) is used to establish confidence intervals.

Slide 4 — Chemometric methods (continued)
9. Different methods. Different types of data/situations may require different types of methods; one is looking for interpretations of the latent structure found.
10. Theory generation. Results from the analysis are used to establish views/theories on the data; results motivate further analysis (groupings, non-linearity, etc.).

Slide 5 — Partitioning data, 1
[Diagram: measurement data blocks X1, X2, ..., XL; response data blocks Y1, Y2; reference data blocks Z1, Z2, Z3]

Slide 6 — Partitioning data, 2
[Diagram: instrumental data X (X1, X2, X3) and response data Y (Y1: engineering quality, Y2: chemical results) for a chemical process]
- There is often a natural sub-division of the data.
- It is often required to study the role of a sub-block.
- A data block with few variables may 'disappear' next to one with many variables; optical instruments, for example, often give many variables.

Slide 7 — Path diagram 1
[Diagram: directed network with input blocks X1, X2, X3, intermediate blocks X4, X5, and output blocks X6, X7]
Examples: production processes, organisational data, diagrams for sub-processes, causal diagrams.

Slide 8 — Path diagram 2, schematic application of modelling
x10 is a new sample from X1, x20 a new one from X2, and x30 a new one from X3. How do they generate new samples for X4, X5, X6 and X7? The resulting estimating equations are:
X4,est = X1B14 + X2B24 + X3B34
X5,est = X1B15 + X2B25 + X3B35
X6,est = X4B46 + X5B56
X7,est = X6B67

Slide 9 — Path diagram 3
[Diagram: blocks X1, X2, X3 at time t1; blocks X4, X5, X6, X7 at time t2]
Data blocks can be aligned to time. Modelling can start at time t2.

Slide 10 — Notation and schematic illustrations
For instrumental data X and response data Y:
w: weight vector (to be found)
t: score vector, t = Xw = w1x1 + ... + wKxK
q: loading vector, q = Y't = [(y1't), ..., (yM't)]
u: Y-score vector, u = Yq = q1y1 + ... + qMyM
Adjustments: X ← X − tp'/(t't), Y ← Y − tq'/(t't)
Vectors are collected into matrices, e.g. T = (t1, ..., tA).

Slide 11 — Conjugate vectors 1
w and r: t = Xw, p = X't; pa'rb = 0 for a ≠ b.
q and r: t = Xq; qa'rb = 0 for a ≠ b.
r and s: t = Xw, p = X'v; pa'rb = 0 and ta'sb = 0 for a ≠ b.

Slide 12 — Conjugate vectors 2
The conjugate vectors R = (r1, r2, ..., rA) satisfy T = XR.
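The score and loading computations above (t = Xw, q = Y't, and the adjustments of X and Y) can be sketched in a few lines of numpy. This is a minimal illustration, not the author's code: the function name is mine, and the choice of w as the dominant right singular vector of Y'X, which maximizes |q|² = |Y'Xw|² over unit-length w, is my assumption.

```python
import numpy as np

def pls_component(X, Y):
    """One pass of the score/loading computations (slide 10 notation)."""
    # Weight vector w: dominant right singular vector of Y'X,
    # maximizing |q|^2 = |Y'Xw|^2 subject to |w| = 1 (assumption)
    _, _, Vt = np.linalg.svd(Y.T @ X, full_matrices=False)
    w = Vt[0]
    t = X @ w                  # score vector t = Xw
    p = X.T @ t                # X-loading p = X't
    q = Y.T @ t                # loading vector q = Y't
    u = Y @ q                  # Y-score vector u = Yq
    # Adjustments: X <- X - tp'/(t't), Y <- Y - tq'/(t't)
    X_new = X - np.outer(t, p) / (t @ t)
    Y_new = Y - np.outer(t, q) / (t @ t)
    return w, t, p, q, u, X_new, Y_new
```

After A such passes the score vectors can be collected into T = (t1, ..., tA), and the adjustments guarantee that each new score vector is orthogonal to the earlier ones.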
The latent structure solution is
X = TP' + X0, where X0 is the part of X that is not used,
Y = TQ' + Y0, where Y0 is the part of Y that could not be explained,
Y = TQ' + Y0 = X(RQ') + Y0 = XB + Y0, for B = RQ'.
The conjugate vectors are always computed together with the score vectors. When the regression on the score vectors has been computed, the regression on the original variables is computed as shown.

Slide 13 — Optimization procedure, 1
One data block: find w1 that maximizes |t1|², where t1 = X1w1.
Two data blocks: find w1 that maximizes |q2|², where t1 = X1w1 and q2 = X2't1.

Slide 14 — Three data blocks
[Diagram: X, Y, Z; starting from the weight w on X with score t, Y is estimated from the X basis (scores ty, loadings qy) and Z from the Y basis (scores tz, loadings qz); |qz|² is maximized]
Adjustments:
t1 describes X1: X1 ← X1 − t1p1'/(t1't1), with p1 = X1't1.
t1 describes X2: X2 ← X2 − t1q2'/(t1't1), with q2 = X2't1.
q2 describes X3: X3 ← X3 − t3q2'/(q2'q2), with t3 = X3q2.
t3 describes X4: X4 ← X4 − t3q4'/(t3't3), with q4 = X4't3.

Slide 15 — Optimization procedure, 2
Two input and two output data blocks (X1, X2 → X3, X4): find w1 and w2 that maximize |q13 + q23 + q14 + q24|².
Two input, one intermediate and one output data block (X1, X2 → X3 → X4): find w1 and w2 that maximize |q134 + q234|².

Slide 16 — Balanced optimization of fit and prediction (H-principle)
In linear regression we are looking for a weight vector w such that the resulting score vector t = Xw is good. The basic measure of quality is the prediction variance for a sample x0. Assuming negligible bias, it can be written (under standard assumptions) as
F(w) = Var(y(x0)) = k[1 − (y't)²/(t't)][1 + t0²/(t't)].
It can be shown that F(cw) = F(w) for all c > 0. Choosing c such that t't = 1 gives
F(w) = k[1 − (y't)²][1 + t0²].
To get a prediction variance that is as small as possible, it is natural to choose w such that (y't)² becomes as large as possible: maximizing (y't)² is the same as maximizing |q|² (PLS regression).

Slide 17 — Optimization procedure, 3
Weighing along objects (rows), using the same algorithm on the transposed matrices:
Two data blocks: find the weight vector v1 that maximizes |t2|², where p1 = X1'v1 and t2 = X2p1.
Three data blocks: find the weight vector v1 that maximizes |q3|², where p1 = X1'v1, t2 = X2p1 and q3 = X3't2.

Slide 18 — Optimization procedure, 4
Task: find the weight vector w1 that maximizes |q3|², where
q3 = X3't2 = X3'X2p1 = X3'X2X1't1 = X3'X2X1'X1w1.
If p1 is a good weight vector for X2, a good result may be expected.
Regression equations: X3,est = X2B23, X2,est = B12X1, X1,est = X1B11.
Pre-processing may be needed to find variables in X1 and in X2 that are highly correlated with each other.

Slide 19 — Three types of reports
- How a data block is doing in a network.
- How a data block can be described by the data blocks that lead to it.
- How a data block can be described by one data block that leads to it.

Slide 20 — Production data, 1
X1: process parameters, 8 variables. X2: NIR data, 1,560 variables (reduced to 120). X1 'disappears' in the NIR data X2.
Cumulative percentages explained, by number of score vectors:
No   |X2|²    |Y|²     |X|²     |Y|²
1    78.961   51.483   74.969   51.964
2    91.538   67.559   86.786   69.553
3    96.351   76.291   91.627   80.643
4    97.942   81.383   95.373   85.058
5    98.620   83.900   95.919   89.056
6    98.967   85.705   97.054   90.050
7    99.205   87.917   97.508   91.990
8    99.294   90.472   97.990   93.455
9    99.349   92.183   98.667   94.020
10   99.426   92.947   98.896   94.708
11   99.606   93.084   99.103   95.082
12   99.657   93.376   99.202   95.740

Slide 21 — Production data, 2
At each step the score vectors are evaluated, and non-significant ones are excluded.
Results for the process parameters (X1): 5 score vectors explain 11.92% of Y.
Results for the NIR data (X2): 12 score vectors explain 84.14% of Y.
In total, 96.06% = 11.92% + 84.14% of Y is explained.
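The estimating equations of the path diagrams (e.g. X4,est = X1B14 + X2B24 + X3B34 on slide 8) can be sketched with ordinary least squares standing in for the score-vector-based regressions. A minimal illustration, with function names and block sizes of my own choosing:

```python
import numpy as np

def block_regression(X_list, X_target):
    """Fit X_target on several input blocks at once and return one
    coefficient matrix B per block, as in X4,est = X1B14 + X2B24 + X3B34.
    Plain least squares is used here as a stand-in (assumption)."""
    X = np.hstack(X_list)
    B, *_ = np.linalg.lstsq(X, X_target, rcond=None)
    # Split the stacked coefficients back into one matrix per input block
    sizes = [Xi.shape[1] for Xi in X_list]
    return np.split(B, np.cumsum(sizes)[:-1], axis=0)

def propagate(x_new_list, B_list):
    """Estimate a new sample for the target block from new input samples."""
    return sum(x @ B for x, B in zip(x_new_list, B_list))
```

Chaining such fits block by block gives the path behaviour of slide 8: new samples x10, x20, x30 generate estimates for the later blocks.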
Step-by-step |Y|² (cumulative percentage of Y explained):

Process parameters (X1):
No  Step  |Y|²
1   1     4.957
2   2     9.315
3   5     10.393
4   6     10.929
5   8     11.920

NIR data (X2):
No  Step  |Y|²
1   1     51.483
2   2     69.121
3   3     73.070
4   4     76.506
5   5     78.669
6   6     80.923
7   7     82.129
8   8     82.552
9   9     83.132
10  10    83.590
11  11    83.881
12  12    84.141

Slide 22 — Production data, 3
[Diagram: network R² values 96.06% (X2), 87.75% (X1); plot of estimated versus observed quality variable using only the score vectors for the process parameters, R² = 0.7512]
The process parameters contribute marginally, by 11.92%. But if only they were used, they would explain 75.12% of the variation of Y.

Slide 23 — Directed network of data blocks
- Input blocks give weight vectors for the initial score vectors.
- Intermediate blocks are described by previous blocks and give score vectors for succeeding blocks.
- Output blocks are described by previous blocks.

Slide 24 — Magnitudes computed between two data blocks
For a relationship Xi → Xk: measures of fit, measures of precision, Ti (score vectors), Qi (loading vectors), Bi (regression coefficients), etc.
Different views: a) as a part of a path; b) if the results are viewed marginally; c) if only Xi → Xk is considered.

Slide 25 — Stages in batch processes
[Diagram: batches over time, with stages 1, 2, ..., K as blocks X1, X2, ..., XK and final quality Y]
Paths:
X1 → X2 → ... → XK → Y: given a sample x10, the path model gives estimated samples for the later blocks.
[X1 X2 X3] → X4 → Y: given values of (x10, x20, x30), estimates for the values of x4 and y are given.
[X1 X2 X3] → [X4 X5] → Y: given values of (x10, x20, x30), estimates for the values of (x4, x5) and y are given.

Slide 26 — Schematic illustration of the modelling task for sequential processes
[Diagram: stages X1 (initial conditions), X2 (known process parameters, 'now'), X3, X4 (next and later stages), Y]

Slide 27 — Plots of score vectors
[Diagram: score plots t1 versus t2 (X1 against X2) and t1 versus tL (X1 against XL)]
The plots will show how the changes are relative to the first data block.

Slide 28 — Graphic software to specify paths
Blocks are dragged onto the screen and the relationships between them are specified.

Slide 29 — Pre-processing of data
- Centring. If desired, centring of the data is carried out.
- Scaling.
In the computations all variables are scaled to unit length (or to unit standard deviation if centred). It is checked whether scaling distorts a variable, e.g. if it is constant except for two values, or if the variable is at the noise level. When the analysis has been completed, values are scaled back so that units are in the original values.
- Redundant variables. It is investigated whether a variable contributes to the explanation of any of the variables in the blocks that the present block leads to. If it is redundant, it is eliminated from the analysis.
- Redundant data blocks. It is investigated whether a data block can provide a significant description of the blocks that it is connected to later in the network. If it cannot contribute to the description of those blocks, it is removed from the network.

Slide 30 — Post-processing of results
Score vectors computed in the passages through the network are evaluated in the analysis at one passage. Apart from those of the input blocks, the score vectors found between passages are not independent. The score vectors found in a relationship Xi → Xj are evaluated to see whether all are significant or some should be removed for this relationship.
- Cross-validation, as in standard regression methods.
- Confidence intervals for parameters by resampling techniques.

Slide 31 — International workshop on Multi-block and Path Methods
24–30 May 2009, Mijas, Malaga, Spain
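The pre-processing step of slide 29 (centring, scaling to unit length, and checking for variables that scaling would distort) can be sketched as follows; the function name, the tolerance and the return values are my own illustrative choices.

```python
import numpy as np

def preprocess(X, centre=True, noise_tol=1e-12):
    """Centre the data (if desired) and scale each variable to unit
    length; variables that are effectively constant are flagged and
    left unscaled, since scaling them would only amplify noise."""
    X = np.asarray(X, dtype=float)
    if centre:
        X = X - X.mean(axis=0)          # centring
    norms = np.linalg.norm(X, axis=0)
    constant = norms < noise_tol        # variables scaling would distort
    scale = np.where(constant, 1.0, norms)
    # The scale factors are returned so that, after the analysis,
    # results can be scaled back to the original units
    return X / scale, scale, constant
```

The returned scale factors support the last requirement of the slide: once the analysis is completed, values can be scaled back so that units are in the original values.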