Transcript Document

PCA: Loadings Plot (p1/p3)
[Figure: PCA loadings plot, p[1]/p[3], for model M4 (PCA-X), "34-months of 1 day rev. 2 (incl. chip data)", bad residuals removed. Each point is one original X variable. These include process tags (e.g. 52XQI195.AI, 85LCS320.AI, 52ZI144.AI, 52TR964.AI), chip variables (e.g. CopDENS, CopSICC), pulp variables (e.g. Pex_L1_LMF, Pex_L1_R28) and SEASON.]
NAMP Module 17: “Introduction to Multivariate Analysis”
Tier 2, Rev.: 5
Conclusions: p3
[Figure: the same p[1]/p[3] loadings plot, with "Summer" marked toward positive p[3] and "Winter" toward negative p[3].]
INTERPRETATION
Component 3: summer chips vs. winter chips
So what have we accomplished?
Using PCA, we have determined that 45% of the variability in the
original 130 variables can be represented by just 3 new
variables, or "components". These three components are orthogonal,
meaning that their scores are uncorrelated. Knowing where an
observation lies on one component tells us nothing (in a linear sense)
about where it lies on the others.
Component 1: REFINER THROUGHPUT (explains 32%)
Component 2: explains 7%
Component 3: SUMMER / WINTER (explains 6%)
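The idea of "new, uncorrelated variables that each capture a share of the variability" can be sketched in a few lines of Python. This is a minimal sketch that assumes NumPy and autoscaled (mean-centred, unit-variance) data. Random numbers stand in for the mill data, and the code is not the algorithm used by the MVA software in this module. It computes each component's share of the total variance and the cumulative total, which corresponds to 32% + 7% + 6% = 45% above. It also checks that the component scores are uncorrelated.

```python
import numpy as np

def autoscale(X):
    """Mean-centre each column and scale it to unit variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def pca(X, n_components):
    """Return scores T, loadings P and each component's share of the total variance."""
    Z = autoscale(X)
    # SVD: Z = U S V^T; the rows of V^T are the loading vectors
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    P = Vt[:n_components].T            # loadings, one column per component
    T = Z @ P                          # scores, one column per component
    explained = S**2 / np.sum(S**2)    # share of the total variance per component
    return T, P, explained[:n_components]

# Hypothetical data: 500 "days" x 10 "tags", driven by 2 hidden factors plus noise
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
X = latent @ rng.normal(size=(2, 10)) + 0.5 * rng.normal(size=(500, 10))

T, P, explained = pca(X, n_components=3)
print("share per component:", explained.round(3))
print("cumulative:", explained.cumsum().round(3))
# Off-diagonal correlations between component scores are zero (to rounding error)
print(np.corrcoef(T, rowvar=False).round(6))
```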
Example 1
What exactly are the new components?
Each new component is simply a linear combination of the original
variables (in practice, of the variables after mean-centring and scaling).
For instance, in this case component 3 is nothing more and nothing less
than the following equation:
Component 3 =
0.242472 x "SEASON"
+ 0.159948 x "85LCS320.AI"
+ many more positive terms…
– 0.224472 x "52ZI144.AI"
– 0.214372 x "52TR964.AI"
– many more negative terms…
Obviously this equation, when written out fully, has 130 terms, one for
each original variable. Many of these, however, have coefficients
close to zero, meaning that they have little impact on that component.
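The sketch below makes "a linear combination of the original variables" concrete. It uses NumPy and hypothetical random data. It takes one loading vector from a PCA and computes every observation's component score twice: once as a matrix product and once as an explicit weighted sum, term by term. Coefficients such as 0.242472 for "SEASON" play the role of the entries of `p` here. The variable indices are placeholders, not the mill's tags.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 6))           # hypothetical raw data, 6 variables
X[:, 1] += 2 * X[:, 0]                  # make two of the variables correlated

Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # centre and scale each variable
_, _, Vt = np.linalg.svd(Z, full_matrices=False)
p = Vt[0]                               # loading vector: one coefficient per variable

# The component score of each observation is a weighted sum of its scaled variables:
#   t = p[0]*z_0 + p[1]*z_1 + ... + p[5]*z_5
t_matrix = Z @ p
t_by_hand = sum(p[j] * Z[:, j] for j in range(Z.shape[1]))

# List the terms from largest to smallest absolute coefficient
for j in np.argsort(-np.abs(p)):
    print(f"variable {j}: coefficient {p[j]:+.3f}")
```

Variables whose coefficients are near zero barely change `t`, which is why most of the 130 terms in the real equation have little impact.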
What about the unexplained variance?
Our PCA model has captured 45% of the variability in the original
dataset. What about the other 55%?
The unexplained variance has several sources:
• We only retained three components. The higher-order components
capture more of the variance, but much of it is noise and of no use
to us as process engineers.
• In any case, our linear model is a simplification of the original
dataset, and so can never explain 100% of the variance.
• Outliers and other problems with the original data can severely
weaken the model ("Garbage in, garbage out").
• Some of the variables impacting the process were not measured
(or may even be unmeasurable).
This last point is very important for our example, since many key chip
characteristics, including wood species, were never measured.
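The first two points in the list above can be illustrated numerically. This sketch uses NumPy and synthetic data built from three hypothetical underlying effects plus noise. It computes the residual left after keeping k components directly, and the residual's share of the total variance is the "unexplained" part. Keeping every component drives the residual to zero, but then nothing has been simplified. Keeping only a few components leaves most of the noise in the residual.

```python
import numpy as np

def unexplained_fraction(Z, k):
    """Share of the total variance of Z left in the residual after keeping k components."""
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    P = Vt[:k].T                     # loadings of the k retained components
    E = Z - Z @ P @ P.T              # residual: data minus its k-component reconstruction
    return np.sum(E**2) / np.sum(Z**2)

rng = np.random.default_rng(2)
latent = rng.normal(size=(300, 3))                                  # 3 hypothetical real effects
X = latent @ rng.normal(size=(3, 12)) + rng.normal(size=(300, 12))  # ...plus measurement noise
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

for k in (1, 2, 3, 6):
    print(f"{k} components: unexplained = {unexplained_fraction(Z, k):.2f}")
```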
Use of PLS
Now we will take a brief look at PLS, using the same data.
An important pulp characteristic is average fibre length, because
longer fibres make stronger paper. This characteristic is represented
in our data by three variables: "Pex_L1_LMF", "Pex_L1_R28" and
"Pex_L1_R14". We will designate these three variables as Y's.
The rest of the pulp characteristics were excluded from the PLS
analysis.
All the other variables were designated as X’s.
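PLS itself can be sketched with the classical NIPALS algorithm. The code below is a minimal, unoptimised PLS2 sketch in NumPy. It assumes autoscaled X and Y and uses synthetic data: 20 hypothetical process variables and 3 hypothetical Y columns standing in for the fibre-length variables. Commercial MVA software adds cross-validation, missing-data handling and other refinements that are not shown here. For each component the sketch reports the share of Y variance explained, which is what an R2Y plot summarises.

```python
import numpy as np

def autoscale(A):
    """Mean-centre each column and scale it to unit variance."""
    return (A - A.mean(axis=0)) / A.std(axis=0, ddof=1)

def nipals_pls(X, Y, n_components, tol=1e-10, max_iter=500):
    """Minimal NIPALS PLS2 on autoscaled X and Y.

    Returns X-weights W, X-loadings P, Y-weights C and the share of the
    total Y variance explained by each component (R2Y per component).
    """
    X, Y = autoscale(X), autoscale(Y)
    ss_y_total = np.sum(Y**2)
    W, P, C, r2y = [], [], [], []
    for _ in range(n_components):
        u = Y[:, np.argmax(np.sum(Y**2, axis=0))]   # start from the Y column with most variance left
        for _ in range(max_iter):
            w = X.T @ u
            w /= np.linalg.norm(w)                  # X-weights, unit length
            t = X @ w                               # X-scores
            c = Y.T @ t / (t @ t)                   # Y-weights
            u_new = Y @ c / (c @ c)                 # Y-scores
            done = np.linalg.norm(u_new - u) < tol * np.linalg.norm(u_new)
            u = u_new
            if done:
                break
        p = X.T @ t / (t @ t)                       # X-loadings
        ss_before = np.sum(Y**2)
        X = X - np.outer(t, p)                      # remove this component from X ...
        Y = Y - np.outer(t, c)                      # ... and from Y
        r2y.append((ss_before - np.sum(Y**2)) / ss_y_total)
        W.append(w); P.append(p); C.append(c)
    return np.array(W).T, np.array(P).T, np.array(C).T, np.array(r2y)

# Hypothetical data: 20 process variables, 3 responses driven by the first 4 of them
rng = np.random.default_rng(3)
X = rng.normal(size=(300, 20))
B_true = np.zeros((20, 3))
B_true[:4] = rng.normal(size=(4, 3))
Y = X @ B_true + rng.normal(size=(300, 3))

W, P, C, r2y = nipals_pls(X, Y, n_components=3)
print("R2Y per component:", r2y.round(3))
print("R2Y(cum):", r2y.sum().round(3))
```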
Results for PLS Model
[Figure: R2Y(cum) and Q2(cum) by component (Comp[1] to Comp[3]) for model M8 (PLS), "34-months of 1 day rev. 2 (incl. chip data)"; vertical axis 0.00 to 1.00.]
This is the R2 and Q2 plot for the PLS model. The R2 values tell us that
the first component explains 23% of the variability in the original Y’s,
the second another 13% and the third another 8%, for a total of 44%.
The Q2 values are only slightly lower, meaning that the model performs
relatively well in predicting new Y values.
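The difference between R2 and Q2 comes down to how the prediction errors are measured. R2 uses the fitted values. Q2 uses cross-validation: the data are split into groups, and each group is predicted by a model fitted without it. Then Q2 = 1 − PRESS/SS, where PRESS is the sum of squared prediction errors and SS is the total sum of squares of y. The sketch below assumes NumPy and uses ordinary least squares as a stand-in for the PLS model, with seven cross-validation groups. These are illustrative choices, not the settings of the software used in this module.

```python
import numpy as np

def r2_fitted(X, y):
    """R2 from the fitted values of a least-squares model with an intercept."""
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    coef, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
    return 1 - np.sum((yc - Xc @ coef) ** 2) / np.sum(yc ** 2)

def q2_cross_validated(X, y, n_groups=7):
    """Q2 = 1 - PRESS/SS, using n_groups cross-validation groups."""
    groups = np.arange(len(y)) % n_groups          # assign observations to groups in turn
    press = 0.0
    for g in range(n_groups):
        train, test = groups != g, groups == g
        x_mean, y_mean = X[train].mean(axis=0), y[train].mean()
        # Fit on the training groups only (least squares as a stand-in for PLS)
        coef, *_ = np.linalg.lstsq(X[train] - x_mean, y[train] - y_mean, rcond=None)
        y_pred = (X[test] - x_mean) @ coef + y_mean
        press += np.sum((y[test] - y_pred) ** 2)   # prediction error sum of squares
    ss = np.sum((y - y.mean()) ** 2)               # total sum of squares
    return 1 - press / ss

rng = np.random.default_rng(5)
X = rng.normal(size=(200, 5))
y = 2 * X[:, 0] + rng.normal(size=200)             # hypothetical response
r2, q2 = r2_fitted(X, y), q2_cross_validated(X, y)
print(f"R2 = {r2:.3f}, Q2 = {q2:.3f}")
```

Q2 can never exceed R2 here. A Q2 close to R2, as in the PLS model above, suggests that the model is not merely fitting noise.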
PLS: Score/Loadings Plot
When doing PLS, one of the main things we want to know is which
X's are important to the model. In other words, which X's are correlated
with our Y's?
We can determine this by studying score and loadings plots, which
show the X's and Y's in relation to the new components. However,
these plots can be messy and complicated to read, as shown on the
next page.
Note that the axes are labelled differently for the PLS plots. Instead of
p[1], for instance, the abscissa is designated w*c[1]. This refers to the
dual nature of this plot, which shows the X weights (w*) and the Y
weights (c) together on the same axes.
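The w* in the axis label can be computed from the PLS weights W and loadings P as W* = W(PᵀW)⁻¹. The ordinary weights W act on X only after earlier components have been removed (deflated). W* acts directly on the original X, so the scores are T = XW*, and this is what allows the X weights and the Y weights c to share one plot. The sketch below is a minimal single-y (PLS1) NIPALS in NumPy on synthetic data, written only to check this relationship.

```python
import numpy as np

def pls1_nipals(X, y, n_components):
    """Minimal PLS1 (single y) on centred, scaled data. Returns W, P, c and scores T."""
    X, y = X.copy(), y.copy()
    W, P, c, T = [], [], [], []
    for _ in range(n_components):
        w = X.T @ y
        w /= np.linalg.norm(w)          # X-weights (act on the deflated X)
        t = X @ w                       # X-scores
        p = X.T @ t / (t @ t)           # X-loadings
        c_a = (y @ t) / (t @ t)         # Y-weight (a single number for a single y)
        X -= np.outer(t, p)             # deflate X ...
        y -= c_a * t                    # ... and y
        W.append(w); P.append(p); c.append(c_a); T.append(t)
    return np.array(W).T, np.array(P).T, np.array(c), np.array(T).T

# Hypothetical data: 8 X variables, y driven by x0 and x3
rng = np.random.default_rng(4)
X0 = rng.normal(size=(150, 8))
y0 = X0[:, 0] - 0.5 * X0[:, 3] + 0.3 * rng.normal(size=150)
X0 = (X0 - X0.mean(axis=0)) / X0.std(axis=0, ddof=1)
y0 = (y0 - y0.mean()) / y0.std(ddof=1)

W, P, c, T = pls1_nipals(X0, y0, n_components=2)
W_star = W @ np.linalg.inv(P.T @ W)     # w*: weights that act on the original, undeflated X
print("T == X0 @ W*:", np.allclose(T, X0 @ W_star))

# Coordinates for a w*c[1]/w*c[2] plot: one point per X variable and one point for y
for j, (a, b) in enumerate(W_star):
    print(f"x{j}: ({a:+.2f}, {b:+.2f})")
print(f"y : ({c[0]:+.2f}, {c[1]:+.2f})")
```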
PLS Loadings plot
[Figure: PLS loadings plot, w*c[1]/w*c[2], model M9 (PLS), with dozens of X variables and the Y variables plotted on the same axes (both running from about -0.30 to 0.30).]
Interpretation of this messy and confusing plot is not obvious. We therefore turn to other outputs…
PLS: Other plots
We will now look at a number of different plots that can help us interpret
the PLS results.
The first is the “X/Y Overview plot”, which gives R2 and Q2 for each
original X. This tells us how well each original variable was modelled.
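A per-variable R2 is simply the share of that variable's own variance that the model reproduces. For brevity, the sketch below shows the idea on a PCA; the overview plot does the same for the variables of a PLS model. It assumes NumPy and synthetic data in which three variables follow one common factor and two are pure noise. The first three should therefore be well modelled and the last two poorly modelled.

```python
import numpy as np

def r2_per_variable(X, n_components):
    """Fraction of each variable's variance captured by the first n_components of a PCA."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    P = Vt[:n_components].T
    E = Z - Z @ P @ P.T                        # residuals left in each column
    return 1 - np.sum(E**2, axis=0) / np.sum(Z**2, axis=0)

rng = np.random.default_rng(5)
f = rng.normal(size=(400, 1))
X = np.hstack([f + 0.1 * rng.normal(size=(400, 3)),   # three tags that follow one factor
               rng.normal(size=(400, 2))])            # two unrelated, noisy tags
print(r2_per_variable(X, n_components=1).round(2))
```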
[Figure: X/Y Overview plot for model M2 (PLS), "32-months of 1 day": cumulative R2VY[4] and Q2VY[4] (scale 0 to 1) for each modelled variable. These include Pex_L1_R48, Pex_L1_R28, Pex_L1_R14, Pex_L1_R100, Pex_L1_PFM, Pex_L1_PFL, Pex_L1_PFC, Pex_L1_P200, Pex_L1_LMF, Pex_L1_CSF, Pex_L1_Cons and Pex_L1_Blan, plus steam, pressure, consistency and drainage tags (labels originally in French, e.g. TMP consistency to paper machine, TMP steam generated, primary screen accepts pressure).]
PLS: Other plots
The next type of plot is the “coefficient plot”, which shows the actual
PLS equation in graphical form. Coefficients for each X are shown as
positive or negative bars.
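Because the PLS model is linear, its coefficients can be read off as an ordinary regression equation on the centred and scaled variables. The sketch below uses NumPy, synthetic data and hypothetical variables x0 to x4. It fits a one-component PLS for a single y, where the coefficients reduce to the weight vector times the Y-weight, and prints them as crude text bars with their signs. With more components, the coefficients are computed as W*Cᵀ instead.

```python
import numpy as np

# Hypothetical data: 5 process variables; y rises with x0 and falls with x2
rng = np.random.default_rng(6)
X = rng.normal(size=(250, 5))
y = 0.8 * X[:, 0] - 0.4 * X[:, 2] + 0.5 * rng.normal(size=250)

Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)    # centred and scaled X
ys = (y - y.mean()) / y.std(ddof=1)                 # centred and scaled y

# One-component PLS: weights, scores and Y-weight
w = Z.T @ ys
w /= np.linalg.norm(w)
t = Z @ w
c = (ys @ t) / (t @ t)
coef = w * c              # coefficients on the centred, scaled variables

y_hat = Z @ coef          # the model is an ordinary linear equation in the scaled X's
for j, b in enumerate(coef):
    bar = "#" * int(round(abs(b) * 40))
    print(f"x{j} {b:+.3f} {'+' if b >= 0 else '-'}{bar}")
```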
[Figure: PLS coefficients plot, CoeffCS[1](53NIC013.PV), for model M2 (PLS), "32-months of 1 day": one positive or negative bar per X variable, on a scale of about -0.03 to 0.05. The X's (labels originally in French) include SEASON, chip silo level, refiner dilution water flows, steam flows to the refiners, hydrosulphite flow, refiner motor loads, refiner and preheater pressures and temperatures, specific energy, plate positions and plate life, and refiner chamber forces.]
PLS: VIP plots
Another very useful output is the 'Variable Importance Plot', based on
the VIP (Variable Importance in the Projection) score, which ranks the
X's in terms of their importance to the model.
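The VIP score is commonly computed from each X variable's squared (normalised) PLS weights on every component. Each component is weighted by the amount of Y variance it explains. The sketch below assumes NumPy and uses hypothetical weights for five X variables. These are paired with the per-component R2Y values of 23%, 13% and 8% quoted earlier for the PLS model. The squared VIP values always average to 1, so "VIP greater than 1" is often used as a rule of thumb for an important X.

```python
import numpy as np

def vip(W, ss_y):
    """VIP (Variable Importance in the Projection) for each X variable.

    W    : K x A matrix of PLS X-weights (one column per component)
    ss_y : length-A array, share of Y variance explained by each component
    """
    K = W.shape[0]
    w_sq = (W / np.linalg.norm(W, axis=0)) ** 2   # squared weights, each column scaled to unit length
    return np.sqrt(K * (w_sq @ ss_y) / ss_y.sum())

# Hypothetical weights for 5 X variables on 3 components
W = np.array([[0.70, 0.10, 0.00],
              [0.50, -0.60, 0.20],
              [0.10, 0.70, 0.30],
              [0.40, 0.30, -0.90],
              [0.30, 0.20, 0.20]])
ss_y = np.array([0.23, 0.13, 0.08])   # R2Y per component
scores = vip(W, ss_y)
for j in np.argsort(-scores):
    print(f"x{j}: VIP = {scores[j]:.2f}")
```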
Note that, because no designed experiment has taken place, we
cannot infer that these X’s influence the Y’s. MVA on its own does not
prove cause and effect. All we can say is that they are correlated,
meaning that they tend to change at the same time. The real cause
may be external, like a change in raw material quality.
Let’s have a look at the VIP plot.
“Variable Importance Plot”
[Figure: Variable Importance Plot, VIP[3], for model M8 (PLS), "PLS of Y1's only", "34-months of 1 day rev. 2 (incl. chip data)". The X's are ranked by VIP (axis 0.00 to 2.50). SEASON ranks highest, and most of the others are refiner tags (e.g. 53PIC305.PV, 52ZI144.AI, 53PIC308.PV, 52PIP193.AI, 52TR964.AI). The Y's are the average fibre length variables.]
These are the X's that have the strongest correlation to our Y's.
The most important X’s
The most important X, according to the VIP plot, is "Season". This
means that, of all the X variables, the season is the one most strongly
associated with the fibre length variables.
The other X’s on the list are mainly refiner operating parameters such
as dilution water flows, hydraulic pressures, and energy inputs. An
expert on refiner operation would find these results interesting, but we
will not examine them in detail here.
The limitations of PLS
PLS results are difficult to interpret. It is always preferable to perform a
PCA on the entire dataset first, to get a feel for the overall trends.
One of the trickiest aspects of PLS is that the first component in the X
space must correspond to the first component in the Y space, the
second with the second, and so forth. Finding a physical interpretation
for each of these can be extremely difficult.
MVA is not magic
It is critical for the student to understand that only those X’s which were
measured can be included in the PLS model. There is nothing magical
about PCA or PLS. These techniques can only find patterns and
correlations that existed in the original data in the first place.
End of Example 1:
We’re starting to tame the MVA lion!
2.2: Example (2)
Using Fewer Variables
Why use fewer variables?
One obvious problem with the previous example is that the plots are
very hard to read, because there are so many variables. We will
therefore look at a smaller number of variables from the same dataset.
There is another good reason for doing this. In the previous example,
our first “throughput” component dominated the others, probably
because so many process variables are associated either directly or
indirectly with the overall flowrate through the system.
In other words, there was a great deal of redundancy in our choice of
variables. This was not inherently a bad thing, and we did manage to
learn some useful things about our process, but perhaps by reducing
the number of initial variables we can learn other things as well.
Example 2
Iterative nature of MVA
At this point, our approach is probably starting to look CONFUSED.
The student may be wondering:
• Do we use all the data, or remove the outliers first?
• Do we do PCA, or PLS?
• Do we use all the variables, or fewer variables?
The answer is that MVA is very iterative, and there is no foolproof
recipe. The results of one step guide the next. Sometimes you have to
try different things to get useful results, bearing in mind what you know
about the process itself and the dataset.
People who are adept at using MVA have a tendency to try all kinds of
things, all kinds of different ways. In fact, just doing a basic PCA is the
easy part. The difficult part is deciding what to try next, because there
are countless possibilities. Knowledge of the process itself is key,
which is why this is a job for chemical engineers and not statisticians.
Which variables to use?
Getting back to the example, we made a ‘short list’ of key variables
based on our knowledge of the process itself. Just because hundreds
of variables are available does not mean that we are obliged to use
them all for each MVA trial.
The selected variables related mainly to chip quality (density and
moisture content) and to pulp quality (brightness, consistency, …). Also
included were "SEASON", given its prominence in the previous PCA,
bleach consumption and specific refiner energy.
In all, only 14 variables were used.
PCA on 14 variables
[Figure: R2X(cum) and Q2(cum) by component (Comp[1], Comp[2]) for model M25 (PCA-X), "PCA of Tom Browne variables"; vertical axis 0.00 to 1.00.]
This is the R2 and Q2 plot for the 14 variables. The MVA software only
found 2 components, which is not uncommon when there are so few
initial variables. The first component explains 28% of the variability in
the original data, the second another 16%, for a total of 44%.
The Q2 values are much lower, with a cumulative Q2 of barely 24%. This
means that the predictive value of the model is much lower than
before. This is hardly surprising, since the information contained in
the 116 excluded variables is now missing.
Score Plot for 14 variables
The score plot for the 14 variables is shown on the next page. It is
impossible to create a 3-D score plot in this case, since there are only
2 components.
Autumn: Sep 1 – Nov 30
Winter: Dec 1 – Feb 28
Spring: Mar 1 – May 31
Summer: Jun 1 – Aug 31
The vast majority of the days fall on or near the first component. It is
plainly obvious from this graph that the first component is related to
individual seasons, with clear segregation between the three years.
Note how this first component resembles the second component from
example 1 (more on this later…)
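Colouring a score plot by season only requires a class label for each day. The following is a minimal Python sketch that assumes the date ranges listed above. Seasons are defined by calendar month, so February 29 in a leap year also counts as winter.

```python
from datetime import date

def season(d: date) -> str:
    """Season class for one day, using the calendar-month ranges listed above."""
    if d.month in (12, 1, 2):
        return "Winter"      # Dec 1 - Feb 28 (Feb 29 in leap years also lands here)
    if d.month in (3, 4, 5):
        return "Spring"      # Mar 1 - May 31
    if d.month in (6, 7, 8):
        return "Summer"      # Jun 1 - Aug 31
    return "Autumn"          # Sep 1 - Nov 30

# Example: label a few days before colouring the score plot
for d in (date(2001, 6, 29), date(2001, 8, 10), date(2002, 1, 15)):
    print(d.isoformat(), season(d))
```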
Score Plot
[Figure: score plot t[1]/t[2] for model M25 (PCA-X), "PCA of Tom Browne variables", coloured by season class (autumn, winter, spring, summer). The plot is marked WINTER and SUMMER along t[1], and groups of points are labelled 2000, 2001, 2002 and 2001/2002. The second component is strongly influenced by a few points in the upper right, labelled "Jun 25 – Jul 1, 01" and "Aug 8 – 12, 01".]
INTERPRETATION
Component 1: individual seasons
Second component
The second component is largely influenced by the observations in the
upper-right quadrant (remember, it is the observations that influence
the components, not the other way around). Looking back at the
original data, we saw that these observations fell within certain specific
periods in June and August 2001.
What differentiated these periods from the rest of our three-year
timeframe?
Trying to figure this out by looking at the original data would be very
tedious, if not impossible. We therefore make use of the ‘Contribution
plot’ for one of the dates of interest.
The contribution plot shows the values of the original variables for that
observation point (June 29, 2001) relative to the average of all the
observations taken together. It gives us a quick, visual answer to
“What’s different about this observation?”
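For a single component, the contribution idea can be written down exactly. An observation's score minus the average score equals the sum, over all variables, of (scaled value − average scaled value) × loading. The sketch below uses NumPy and synthetic data, with row 584 made deliberately unusual as a hypothetical stand-in for June 29, 2001. It computes both the raw scaled deviations and the weighted contributions, and checks that the contributions add up to the score difference. The plot in this module combines two components (weight = p1p2). How the software combines them is not reproduced here.

```python
import numpy as np

# Hypothetical data: 600 days x 5 variables, with one deliberately unusual day
rng = np.random.default_rng(7)
X = rng.normal(size=(600, 5))
obs = 584
X[obs] = [0.1, 3.0, -2.5, 0.2, 0.0]

Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
_, _, Vt = np.linalg.svd(Z, full_matrices=False)
p = Vt[1]                               # loading vector of component 2
t = Z @ p                               # component 2 scores

deviation = Z[obs] - Z.mean(axis=0)     # how this day differs from the average, in scaled units
contrib = deviation * p                 # each variable's contribution to the score difference
print("deviation:   ", deviation.round(2))
print("contribution:", contrib.round(2))
print("sum of contributions:", round(contrib.sum(), 4), " score difference:", round(t[obs] - t.mean(), 4))
```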
Contribution Plot: June 29, 2001
[Figure: score contribution plot, Obs 584 (June 29, 2001) minus the average, weight = p1p2, model M25 (PCA-X). The plot has one bar per variable: CopSICC, CopDENS, Pex_L1_R48, Pex_L1_R28, Pex_L1_R14, Pex_L1_R100, Pex_L1_P200, Pex_L1_LMF, Pex_L1_CSF, Pex_L1_Cons, Pex_L1_Blan, 52XAI130.AI, 52FIC165.PV and SEASON. It is annotated "More fines than average" (positive bars) and "Fewer long fibres than average" (negative bars).]
Contribution plot results
The bars on the contribution plot graph tell an important story: during
the period of interest, the refiners generated more fines than usual,
and fewer long fibres. It appears that the refiners were chopping up
the fibres, eliminating the longest size fractions while generating fine
fragments. This is not a desirable process performance, and therefore
a significant finding.
A study of the loadings plot confirms that the second component is
definitely related to fibre length (variables in red ovals). Note that a
variable does not have to lie directly upon a component to influence it;
in this case, very few of the variables are close to the component line,
yet clearly they are affecting it. Their distance from the axis merely
means that they are also related to the first component. Note that
specific energy is also related to the second component (green oval).
This is highly significant, since it is this energy that chops the fibres!
Bleach consumption, pulp brightness and season are related to the
first component (blue ovals). Again, this is similar to example 1.
Loadings Plot
[Figure: loadings plot p[1]/p[2] for model M25 (PCA-X), "PCA of Tom Browne variables", annotated "INTERPRETATION, Component 2: Fibre length". Pex_L1_P200, Pex_L1_R100 and ENERGI SPECIF LIGNE 1 (line 1 specific energy) lie toward positive p[2]. Pex_L1_R28, Pex_L1_R14 and Pex_L1_LMF lie toward negative p[2]. The other variables are Pex_L1_Cons, Pex_L1_CSF, Copeaux DENSITE (chip density), HYDRO. SULF.RAFF.NO.5 (hydrosulphite to refiner no. 5), Pex_L1_R48, SEASON, Copeaux SICCITE (chip dryness) and Pex_L1_Blan.]
Same two components?
The most striking difference between the example 1 results and the
example 2 results is that the “throughput” component has
disappeared. This is because we have removed all the variables that
relate to this process parameter.
This leaves us wondering whether the two components we found in example
2 are just the second and third components from example 1. In other
words, now that we've eliminated throughput, the next most significant
component has been "promoted" to become the first component, and
the third to second. Because the components are mutually uncorrelated,
this is plausible.
The physical interpretations of these components seem to be
compatible, so this shift is entirely possible. If so, a comparison of
examples 1 and 2 could give us further insights into the process.
Was it worth trying fewer variables?
Absolutely!
We were able to generate cleaner, easier-to-interpret graphs, while
focussing on the variables we were most interested in.
Once again, we saw the importance of Season, lending credence to
our physical interpretations in Example 1. Other similarities with the
example 1 results, particularly the two components themselves, could
yield further insights about what is actually going on in the process.
However, the Q2 for this bare-bones case is quite low, meaning this
model has poor predictive value. Also, a great many important
variables were left out completely, so this is not a full picture, but
rather an additional view of our original dataset.
End of Example 2:
Getting smarter…