Transcript Document
PCA: Loadings Plot (p1/p3)

[Figure: PCA loadings plot, p[1] (abscissa) vs. p[3] (ordinate), one labelled point per X variable. Model: 34-months of 1 day rev. 2 (incl. chip data) no. 2.M4 (PCA-X), bad residuals removed.]

NAMP Module 17: “Introduction to Multivariate Analysis” Tier 2, Rev.: 5

Conclusions: p3
[Figure: PCA loadings plot, p[1] vs. p[3], for model 2.M4 (PCA-X), bad residuals removed. The p[3] axis is annotated “Summer” at its positive end and “Winter” at its negative end.]

INTERPRETATION
Component 3: Summer chips vs. winter chips

So what have we accomplished?

Using PCA, we have determined that 45% of the variability in the original 130 variables can be represented by using just 3 new variables, or “components”. These three components are orthogonal, meaning that the variation within each one occurs independently of the others. In other words, the new components are uncorrelated with each other.
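The two claims above (a few components capture a large share of the variance, and the components are uncorrelated with each other) can be checked numerically. The following is a minimal sketch using NumPy on synthetic data, not the mill dataset: the data, the number of variables and the noise level are all assumptions made for illustration, so the percentages it prints have no relation to the 45% quoted above. PCA is computed here through a singular value decomposition of the autoscaled data, which is one common way of doing it and not necessarily what the MVA software does internally.

```python
import numpy as np

# Synthetic stand-in for a process dataset (assumption: NOT the mill data).
rng = np.random.default_rng(0)
n_obs, n_vars, n_hidden = 500, 12, 3

# Three hidden "drivers" of decreasing strength, mixed into 12 measured variables.
hidden = rng.normal(size=(n_obs, n_hidden)) * np.array([3.0, 1.5, 1.0])
mixing = rng.normal(size=(n_hidden, n_vars))
X = hidden @ mixing + rng.normal(scale=2.0, size=(n_obs, n_vars))

# Autoscale: centre each variable and scale it to unit variance.
Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# PCA via singular value decomposition of the autoscaled data.
U, s, Vt = np.linalg.svd(Xs, full_matrices=False)

# Fraction of the total variance explained by each component.
explained = s**2 / np.sum(s**2)

# Loadings (p) of the first three components, and the scores (t):
# each score is a weighted sum of the autoscaled variables.
loadings = Vt[:3].T
scores = Xs @ loadings

# Correlation between the three score vectors.
score_corr = np.corrcoef(scores, rowvar=False)

print("explained by components 1-3:", np.round(explained[:3], 3))
print("cumulative:", round(float(explained[:3].sum()), 3))
print("score correlations:")
print(np.round(score_corr, 6))
```

The off-diagonal entries of the score correlation matrix are zero to within rounding error, which is what “orthogonal” means in practice.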
[Figure: the three components. Component 1: REFINER THROUGHPUT, explains 32%. Component 2: explains 7%. Component 3: SUMMER / WINTER, explains 6%.]

What exactly are the new components?

Each new component is simply a linear combination of the original variables. For instance, in this case component 3 is nothing more and nothing less than the following equation:

Component 3 = 0.242472 × “SEASON” + 0.159948 × “85LCS320.AI” + many more positive terms…
              – 0.224472 × “52ZI144.AI” – 0.214372 × “52TR964.AI” – many more negative terms…

Obviously this equation, when written out fully, has 130 terms, one for each original variable. Many of these, however, have coefficients close to zero, meaning that they have little impact on that component.

What about the unexplained variance?

Our PCA model has captured 45% of the variability in the original dataset. What about the other 55%? The unexplained variance has several sources:
• We only retained three components. More variance is captured by the higher-order components, but much of this is noise and of no use to us as process engineers.
• In any case, our linear model is a simplification of the original dataset, and so can never explain 100% of the variance.
• Outliers and other problems with the original data can severely weaken the model (“Garbage in, garbage out”).
• Some of the variables impacting the process were not measured (or may even be unmeasurable).

This last point is very important for our example, since many key chip characteristics, including wood species, were never measured.

Use of PLS

Now we will have a brief look at the use of PLS, using the same data. An important pulp characteristic is average fibre length, because longer fibres make stronger paper.
This characteristic is represented in our data by three variables: “Pex_L1_LMF”, “Pex_L1_R28” and “Pex_L1_R28”. We will designate these three variables as Y’s. The rest of the pulp characteristics were excluded from the PLS analysis. All the other variables were designated as X’s.

Results for PLS Model

[Figure: cumulative R2Y and Q2 by component number, Comp[1] to Comp[3], on a scale from 0.00 to 1.00. Model: 34-months of 1 day rev. 2 (incl. chip data) no. 2.M8 (PLS).]

This is the R2 and Q2 plot for the PLS model. The R2 values tell us that the first component explains 23% of the variability in the original Y’s, the second another 13% and the third another 8%, for a total of 44%. The Q2 values are only slightly lower, meaning that the model performs relatively well in predicting new Y values.

PLS: Score/Loadings Plot

When doing PLS, one of the main things we want to know is which X’s are important to the model. In other words, which X’s are correlated with our Y’s? We can determine this by studying score and loadings plots, which show the X’s and Y’s in relation to the new components. However, these plots can be messy and complicated to read, as shown on the next page.

Note that the axes are labelled differently for the PLS plots. Instead of p[1], for instance, the abscissa is designated w*c[1]. This refers to the dual nature of this plot, showing both X and Y space together.

PLS Loadings plot

Interpretation of this messy and confusing plot is not obvious.
We therefore turn to other outputs…

[Figure: PLS loadings plot, w*c[1] (abscissa) vs. w*c[2] (ordinate), with the X and Y variables plotted together, one labelled point per variable.]

PLS: Other plots

We will now look at a number of different plots that can help us interpret the PLS results. The first is the “X/Y Overview plot”, which gives R2 and Q2 for each original X. This tells us how well each original variable was modelled.

[Figure: X/Y Overview plot, R2VY[4](cum) and Q2VY[4](cum) by variable ID, on a scale from 0.00 to 1.00. Model: 32-months of 1 day.M2 (PLS). Variables shown include the Pex_L1 pulp quality variables (Pex_L1_R48, Pex_L1_R28, Pex_L1_R14, Pex_L1_R100, Pex_L1_PFM, Pex_L1_PFL, Pex_L1_PFC, Pex_L1_P200, Pex_L1_LMF, Pex_L1_CSF, Pex_L1_Cons, Pex_L1_Blan) and several process variables labelled with their French plant descriptions.]
PLS: Other plots

The next type of plot is the “coefficient plot”, which shows the actual PLS equation in graphical form. Coefficients for each X are shown as positive or negative bars.

[Figure: PLS coefficient plot, CoeffCS[1](53NIC013.PV) by variable ID, one bar per X variable. The X variables, beginning with SEASON, are labelled with their French plant descriptions.]
PLS: VIP plots

Another very useful output is the “Variable Importance Plot” (VIP), which ranks the X’s in terms of importance to the model.

Note that, because no designed experiment has taken place, we cannot infer that these X’s influence the Y’s. MVA on its own does not prove cause and effect. All we can say is that they are correlated, meaning that they tend to change at the same time. The real cause may be external, like a change in raw material quality.

Let’s have a look at the VIP plot.

“Variable Importance Plot”

[Figure: VIP[3] by variable ID (Primary), on a scale from 0.00 to 2.50. Model: 34-months of 1 day rev. 2 (incl. chip data) no. 2.M8 (PLS), PLS of Y1's only. Y’s: average fibre length variables. X’s shown: SEASON, 53PIC305.PV, 52ZI144.AI, 53PIC308.PV, 52PIP193.AI, 52ZIC148.PV, 52PIA193.AI, 52PIP143.AI, 52TIC102.PV, 52TR964.AI, 52FIC167.PV, 52TI011.AI, 811FI102.AI, 52PIC961.PV, 52TI031.AI, 52FIC116.PV, 52PI128.AI, 52FIC164.PV, 52JI189.AI, 52XQI195.AI, 52X_130.AI_split_L1., 53FI012.AI, 52ZIC197.PV, 52FFC117.PV, 52TI118.AI, 52FR960.AI, Cop>3/8, 52PCA161.PV.]

These are the X’s that have the strongest correlation to our Y’s.

The most important X’s

The most important X, according to the VIP plot, is “Season”. This means that the fibre length varies more with the season than with any other X variable.

The other X’s on the list are mainly refiner operating parameters such as dilution water flows, hydraulic pressures, and energy inputs. An expert on refiner operation would find these results interesting, but we will not examine them in detail here.
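To make the idea of a VIP ranking concrete, here is a minimal sketch, assuming a single Y variable and a small NIPALS implementation of PLS written in NumPy. The data are synthetic, the function names `pls1_nipals` and `vip_scores` are made up for this illustration, and the VIP formula is the commonly quoted one; the MVA software may differ in detail. The example above uses three Y’s, which this sketch does not attempt.

```python
import numpy as np

def pls1_nipals(X, y, n_components):
    """PLS with one Y variable (PLS1) by NIPALS. X and y must be centred."""
    X = X.copy()
    y = y.copy()
    weights, ssy = [], []
    for _ in range(n_components):
        w = X.T @ y
        w /= np.linalg.norm(w)       # unit-length X weights
        t = X @ w                    # X scores
        tt = t @ t
        p = X.T @ t / tt             # X loadings
        q = (y @ t) / tt             # Y loading
        X -= np.outer(t, p)          # deflate X
        y -= q * t                   # deflate y
        weights.append(w)
        ssy.append(q**2 * tt)        # sum of squares of y explained by this component
    return np.array(weights).T, np.array(ssy)

def vip_scores(weights, ssy):
    """VIP per X variable: weights squared, averaged over components by explained Y."""
    n_vars = weights.shape[0]
    return np.sqrt(n_vars * (weights**2 @ ssy) / ssy.sum())

# Synthetic data (assumption): y depends strongly on x0, weakly on x1, not on the rest.
rng = np.random.default_rng(1)
names = ["x0", "x1", "x2", "x3", "x4", "x5"]
X = rng.normal(size=(200, len(names)))
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200)

Xc = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
yc = y - y.mean()

W, ssy = pls1_nipals(Xc, yc, n_components=2)
vip = vip_scores(W, ssy)
r2y = ssy.sum() / (yc @ yc)          # cumulative R2Y of the two components
ranking = [names[i] for i in np.argsort(vip)[::-1]]

print("R2Y(cum):", round(float(r2y), 3))
print("VIP ranking:", ranking)
```

With unit-length weights, the squared VIP values average to 1 by construction, which is why values above 1 are often read as “more important than average”. As in the example above, a high VIP only indicates correlation with Y, not cause and effect.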
The limitations of PLS

PLS results are difficult to interpret. It is always preferable to perform a PCA on the entire dataset first, to get a feel for the overall trends. One of the trickiest aspects of PLS is that the first component in the X space must correspond to the first component in the Y space, the second with the second, and so forth. Finding a physical interpretation for each of these can be extremely difficult.

MVA is not magic

It is critical for the student to understand that only those X’s which were measured can be included in the PLS model. There is nothing magical about PCA or PLS. These techniques can only find patterns and correlations that existed in the original data in the first place.

End of Example 1: We’re starting to tame the MVA lion!

2.2: Example (2) Using Fewer Variables

Why use fewer variables?

One obvious problem with the previous example is that the plots are very hard to read, because there are so many variables. We will therefore look at a smaller number of variables from the same dataset.

There is another good reason for doing this. In the previous example, our first “throughput” component dominated the others, probably because so many process variables are associated either directly or indirectly with the overall flowrate through the system. In other words, there was a great deal of redundancy in our choice of variables. This was not inherently a bad thing, and we did manage to learn some useful things about our process, but perhaps by reducing the number of initial variables we can learn other things as well.
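The effect of redundant variables on the first component can be illustrated with a small sketch on synthetic data. Everything here is an assumption made for illustration: a hypothetical “throughput” pattern drives ten variables, while hypothetical “season” and “fibre” patterns drive four and three variables respectively. The sketch compares a PCA on all 17 variables with a PCA on the 7 that remain once the throughput-related variables are removed.

```python
import numpy as np

def pca(X, n_components):
    """Explained-variance fractions and loadings of a PCA on autoscaled data."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    _, s, Vt = np.linalg.svd(Xs, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return explained[:n_components], Vt[:n_components].T

def dominant_group(loading, groups):
    """Name of the variable group carrying most of the squared loading."""
    share = {name: float(np.sum(loading[idx] ** 2)) for name, idx in groups.items()}
    return max(share, key=share.get)

# Hypothetical patterns and the number of measured variables each one drives.
rng = np.random.default_rng(2)
n_obs = 1000
patterns = {"throughput": 10, "season": 4, "fibre": 3}

columns, groups, start = [], {}, 0
for name, n_vars in patterns.items():
    driver = rng.normal(size=(n_obs, 1))
    columns.append(driver + rng.normal(scale=0.5, size=(n_obs, n_vars)))
    groups[name] = np.arange(start, start + n_vars)
    start += n_vars
X_all = np.hstack(columns)

# PCA on all 17 variables.
expl_all, load_all = pca(X_all, 3)
full_order = [dominant_group(load_all[:, a], groups) for a in range(3)]

# PCA after removing the ten throughput-related variables.
keep = np.concatenate([groups["season"], groups["fibre"]])
groups_few = {"season": np.arange(0, 4), "fibre": np.arange(4, 7)}
expl_few, load_few = pca(X_all[:, keep], 2)
few_order = [dominant_group(load_few[:, a], groups_few) for a in range(2)]

print("all variables:  ", full_order, np.round(expl_all, 2))
print("fewer variables:", few_order, np.round(expl_few, 2))
```

In this synthetic case the pattern backed by the most variables takes the first component, simply because it accounts for the largest share of the autoscaled variance. Once those variables are dropped, the remaining patterns become easier to see.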
Iterative nature of MVA

At this point, our approach is probably starting to look CONFUSED. The student may be wondering:
• Do we use all the data, or remove the outliers first?
• Do we do PCA, or PLS?
• Do we use all the variables, or fewer variables?

The answer is that MVA is very iterative, and there is no foolproof recipe. The results of one step guide the next. Sometimes you have to try different things to get useful results, bearing in mind what you know about the process itself and the dataset.

People who are adept at using MVA have a tendency to try all kinds of things, all kinds of different ways. In fact, just doing a basic PCA is the easy part. The difficult part is deciding what to try next, because there are countless possibilities. Knowledge of the process itself is key, which is why this is a job for chemical engineers and not statisticians.

Which variables to use?

Getting back to the example, we made a ‘short list’ of key variables based on our knowledge of the process itself. Just because hundreds of variables are available does not mean that we are obliged to use them all for each MVA trial.

The variables related mainly to chip quality (density and moisture content) and to pulp quality (brightness, consistency, …). Also included were “SEASON”, given its prominence in the previous PCA analysis, bleach consumption and specific refiner energy. In all, only 14 variables were used.

PCA on 14 variables

[Figure: cumulative R2X and Q2 by component number, Comp[1] and Comp[2], on a scale from 0.00 to 1.00. Model: 34-months of 1 day rev. 2 (incl. chip data) no. 2.M25 (PCA-X), PCA of Tom Browne variables.]

This is the R2 and Q2 plot for the 14 variables.
The MVA software only found 2 components, which is not uncommon when there are so few initial variables. The first component explains 28% of the variability in the original data, the second another 16%, for a total of 44%.

The Q2 values are much lower, with a cumulative value of barely 24%. This means that the predictive value of the model is much lower than before. This is hardly surprising, since the inherent information contained within the 116 excluded variables is now missing.

Score Plot for 14 variables

The score plot for the 14 variables is shown on the next page. It is impossible to create a 3-D score plot in this case, since there are only 2 components.

Autumn: Sep 1 – Nov 30
Winter: Dec 1 – Feb 28
Spring: Mar 1 – May 31
Summer: Jun 1 – Aug 31

The vast majority of the days fall on or near the first component. It is plainly obvious from this graph that the first component is related to individual seasons, with clear segregation between the three years. Note how this first component resembles the second component from example 1 (more on this later…)

Score Plot
[Figure: PCA score plot, t[1] (abscissa) vs. t[2] (ordinate), coloured according to classes in M25 (No Class, Class 1, Class 2, Class 3, Class 4). Model: 2.M25 (PCA-X), PCA of Tom Browne variables. Season legend: Autumn: Sep 1 – Nov 30; Winter: Dec 1 – Feb 28; Spring: Mar 1 – May 31; Summer: Jun 1 – Aug 31. Clusters along t[1] are labelled 2000, 2001, 2001/2002 and 2002, with WINTER and SUMMER marked at opposite ends. Two groups of points with high t[2] are labelled “Jun 25 – Jul 1, 01” and “Aug 8 – 12, 01” and annotated “2nd component strongly influenced by these points”.]

INTERPRETATION
Component 1: Individual seasons

Second component

The second component is largely influenced by the observations in the upper-right quadrant (remember, it is the observations that influence the components, not the other way around). Looking back at the original data, we saw that these observations fell within certain specific periods in June and August 2001. What differentiated these periods from the rest of our three-year timeframe?

Trying to figure this out by looking at the original data would be very tedious, if not impossible. We therefore make use of the ‘Contribution plot’ for one of the dates of interest. The contribution plot shows the values of the original variables for that observation point (June 29, 2001) relative to the average of all the observations taken together. It gives us a quick, visual answer to “What’s different about this observation?”

Contribution Plot: June 29, 2001
[Figure: score contribution plot, Score Contrib(Obs 584 - Average), Weight=p1p2, by variable ID (Primary). Model: 2.M25 (PCA-X), PCA of Tom Browne variables. Variables: CopSICC, CopDENS, Pex_L1_R48, Pex_L1_R28, Pex_L1_R14, Pex_L1_R100, Pex_L1_P200, Pex_L1_LMF, Pex_L1_CSF, Pex_L1_Cons, Pex_L1_Blan, 52XAI130.AI, 52FIC165.PV, SEASON. Annotations: “More fines than average” on the positive bars and “Fewer long fibres than average” on the negative bars.]

Contribution plot results

The bars on the contribution plot tell an important story: during the period of interest, the refiners generated more fines than usual, and fewer long fibres. It appears that the refiners were chopping up the fibres, eliminating the longest size fractions while generating fine fragments. This is not desirable process performance, and it is therefore a significant finding.

A study of the loadings plot confirms that the second component is definitely related to fibre length (variables in red ovals). Note that a variable does not have to lie directly upon a component to influence it; in this case, very few of the variables are close to the component line, yet clearly they are affecting it. Their distance from the axis merely means that they are also related to the first component.

Note that specific energy is also related to the second component (green oval). This is highly significant, since it is this energy that chops the fibres!

Bleach consumption, pulp brightness and season are related to the first component (blue ovals). Again, this is similar to example 1.

Loadings Plot

[Figure: PCA loadings plot, p[1] (abscissa) vs. p[2] (ordinate). Model: 34-months of 1 day rev. 2 (incl. chip data) no. 2.M25 (PCA-X), PCA of Tom Browne variables. Pex_L1_P200, Pex_L1_R100 and line 1 specific energy (“ENERGI SPECIF LIGNE 1”) lie towards the positive end of p[2], which is marked “-”; Pex_L1_R28, Pex_L1_R14 and Pex_L1_LMF lie towards the negative end, which is marked “+”.]

INTERPRETATION
Component 2: Fibre length
Same two components?

The most striking difference between the example 1 results and the example 2 results is that the “throughput” component has disappeared. This is because we have removed all the variables that relate to this process parameter.

This leaves us to wonder if the two components we found in example 2 are just the second and third components from example 1. In other words, now that we’ve eliminated throughput, the next most significant component has been “promoted” to become the first component, and the third to second. Because all components are statistically independent, this is plausible.

The physical interpretations of these components seem to be compatible, so this shift is entirely possible. If so, a comparison of examples 1 and 2 could give us further insights into the process.

Was it worth trying fewer variables?

Absolutely! We were able to generate cleaner, easier-to-interpret graphs, while focussing on the variables we were the most interested in.

Once again, we saw the importance of Season, lending credence to our physical interpretations in Example 1. Other similarities with the example 1 results, particularly the two components themselves, could yield further insights about what is actually going on in the process.

However, the Q2 for this bare-bones case is quite low, meaning this model has poor predictive value. Also, a great many important variables were left out completely, so this is not a full picture, but rather an additional view of our original dataset.
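The contribution plot used in this example can be sketched in a few lines. This is a minimal illustration on synthetic data, assuming that a contribution is the autoscaled deviation of one observation from the average, multiplied by a weight built from the loadings of components 1 and 2. The weight used here, sqrt(p1² + p2²), is an assumption standing in for the “Weight=p1p2” option shown on the plot, whose exact definition is not given here. The variable names and the index of the unusual observation are hypothetical.

```python
import numpy as np

# Synthetic data with hypothetical variable names (assumption: NOT the mill data).
rng = np.random.default_rng(3)
names = ["fines", "long_fibres", "brightness", "consistency", "specific_energy"]
n_obs = 300
X = rng.normal(size=(n_obs, len(names)))

# Let fines and long fibres move in opposite directions, as a shared fibre pattern.
fibre = rng.normal(size=n_obs)
X[:, 0] += 1.5 * fibre
X[:, 1] -= 1.5 * fibre

# Plant one unusual observation: many fines, few long fibres.
odd = 42
X[odd, 0] = X[:, 0].mean() + 4.0 * X[:, 0].std()
X[odd, 1] = X[:, 1].mean() - 4.0 * X[:, 1].std()

# Autoscale, then get the loadings of the first two components by SVD.
Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
_, _, Vt = np.linalg.svd(Xs, full_matrices=False)
p1, p2 = Vt[0], Vt[1]

# Assumed weighting: combine the two loadings into one non-negative weight.
weight = np.sqrt(p1**2 + p2**2)

# Contribution of each variable: scaled deviation from the average, times the weight.
contribution = Xs[odd] * weight

for name, value in zip(names, contribution):
    print(f"{name:16s} {value:+.2f}")
```

Because the weight is non-negative, the sign of each bar follows the sign of the deviation: a positive bar means “more than average” and a negative bar “less than average”, which is how the plot for June 29, 2001 was read above.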
End of Example 2: Getting smarter…