Transcript Document

NC STATE
UNIVERSITY
Program for North American Mobility
in Higher Education
Introducing Process Integration for Environmental
Control in Engineering Curricula
MODULE 17: “Introduction to
Multivariate Analysis”
Created at:
Ecole Polytechnique de Montreal &
North Carolina State University, 2003.
NAMP Module 17: “Introduction to Multivariate Analysis”
Tier 3, Rev.: 4
2.4: Example (3)
Shorter Timescales
Shorter timescales
The previous two examples used daily averages for the 130
process variables. However, we could just as easily have chosen
weekly averages, monthly averages, or several other options.
We could also have chosen shorter timescales, such as 8-hour
averages or 30-minute averages. Obviously, at some point the
number of observations will become unmanageable. For
instance, a spreadsheet with 3 years’ worth of 1-minute averages
would have over a million lines.
Simply by choosing the
timescale, you are
already influencing your
MVA results.
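To get a feel for how quickly the number of observations grows, we can compute the row count for each candidate timescale. This is only a minimal sketch: the function name `observations` and the use of 365-day years are assumptions made for illustration.

```python
# Rows in a dataset covering `days` days, with one row per `period_seconds`.
SECONDS_PER_DAY = 24 * 60 * 60


def observations(days, period_seconds):
    return days * SECONDS_PER_DAY // period_seconds


THREE_YEARS = 3 * 365  # assumption: 365-day years

timescales = {
    "daily": 86400,
    "8-hour": 8 * 3600,
    "30-minute": 1800,
    "1-minute": 60,
}

for name, period in timescales.items():
    print(f"{name:>10}: {observations(THREE_YEARS, period):>9,} rows")
```

The 1-minute case gives 1,576,800 rows, which is the "over a million lines" mentioned above.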
Choosing a timescale
The first thing we need to understand is what timescales are
available. For the TMP process we have been studying, the
shortest possible time period between two logged values is 10
seconds (note that not all tags are updated this frequently).
1 year
1 mo.
1 wk.
24 h
Chips sampled
every 8 hours
8h
1h
10 min
1 min
10 s
Pulp sampled
every 2 hours
Several key values, such as wood and pulp characteristics, are
only measured every few hours as shown above. These tags will
be of little or no use at a very short timescale.
IMPORTANT CONCEPT: Some variables can only be
studied at longer timescales, others at shorter timescales,
depending on their sampling/logging frequency.
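One way to apply this concept is to list, for a chosen timescale, the tags that are logged at least once per averaging period. The sketch below assumes hypothetical tag names; only the 10-second, 2-hour and 8-hour intervals come from the diagram above.

```python
# Hypothetical tag names with their logging intervals in seconds.
logging_interval = {
    "refiner_motor_load": 10,      # assumed to be logged every 10 seconds
    "pulp_freeness": 2 * 3600,     # pulp sampled every 2 hours
    "chip_moisture": 8 * 3600,     # chips sampled every 8 hours
}


def usable_tags(intervals, timescale_seconds):
    """Return the tags that are logged at least once per averaging period."""
    return sorted(
        tag for tag, step in intervals.items() if step <= timescale_seconds
    )


print(usable_tags(logging_interval, 10))          # 10-second timescale
print(usable_tags(logging_interval, 24 * 3600))   # daily averages
```

At the 10-second timescale only the fast tag survives, whereas all three tags can be studied with daily averages.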
Shortest possible timescale
For the purposes of illustration, we will use the shortest possible
timescale in this example, namely 10 seconds. Because some
tags are updated less frequently, we will use interpolated values
for all variables, which may or may not represent reality.
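The interpolation step can be illustrated as follows. This is a sketch under stated assumptions: the readings are made-up numbers, linear interpolation is assumed (the source does not say which method was used), and NumPy's `np.interp` stands in for whatever the data historian actually does.

```python
import numpy as np

# Assumed readings of a slowly sampled tag: one value every 2 hours.
sample_times = np.array([0.0, 7200.0, 14400.0])    # seconds
sample_values = np.array([120.0, 150.0, 135.0])    # made-up numbers

# 10-second grid covering the same 4-hour span.
grid = np.arange(0.0, 14400.0 + 10.0, 10.0)
interpolated = np.interp(grid, sample_times, sample_values)

print(len(grid))           # number of 10-second rows
print(interpolated[360])   # value one hour in, halfway between two readings
```

Here 1,441 rows are generated from only three real measurements, which is why the interpolated values may or may not represent reality.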
To keep the size of the dataset manageable, we have taken these
data over a 24-hour period, which corresponds to around
9,000 observations. Because we have over 100 tags, the
resulting dataset has about one million values.
A million values per day, for only one section of the
papermaking process - if we were to include the entire
industrial plant over several years, we would have to
analyse billions of datapoints.
PCA of entire 24-hour period
[Figure: Model overview bar chart for model M1 (PCA-X), dataset “Jun 2002(1). 10 seconds COMPLETE WITH 45 min LAG”. Bars show R2X(cum) and Q2(cum), on a scale of 0.00 to 1.00, for components 1 to 3.]
Simca found numerous components; 3 were retained.
The PCA for the entire 24-hour period shows quite a strong
model, with a cumulative R2 over 60%. This is misleading,
however. As shown on the score plot, there is a major process
excursion which has totally skewed the MVA results.
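For readers who want to see where a cumulative R2 comes from, here is a minimal sketch using a singular value decomposition. It is not the Simca implementation: the data are synthetic, the function name `pca_r2x_cum` is hypothetical, and unit-variance scaling is assumed. Q2, which requires cross-validation, is not computed.

```python
import numpy as np


def pca_r2x_cum(X, n_components):
    """Cumulative fraction of variance explained by the first components.

    X is (observations x variables). Columns are mean-centred and scaled
    to unit variance first (an assumption of this sketch).
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Squared singular values are proportional to variance per component.
    s = np.linalg.svd(Z, compute_uv=False)
    explained = s**2 / np.sum(s**2)
    return np.cumsum(explained)[:n_components]


# Synthetic stand-in for process data: 5 variables driven by 2 hidden factors.
rng = np.random.default_rng(0)
factors = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 5))
X = factors @ mixing + 0.3 * rng.normal(size=(500, 5))

r2x = pca_r2x_cum(X, 3)
print(np.round(r2x, 2))
```

A single large excursion adds a great deal of shared variance, so it can dominate the first component and inflate these values.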
Major process excursion
[Figure: Score plot, annotated “Major process excursion from 8h15 to 8h45”.]
A review of the original data indicates that production dropped
below 10 t/d during a ten-minute period (8:15 to 8:25). The cause
was a major refiner blockage known as a “feedguard event”,
which makes the refiner motor shut down.
Exclude process excursion
The process excursion sticks out like a sore thumb on the score plot.
This means that the process temporarily went to a radically different
“place” or operating regime, where relationships between the
variables are different.
Sticking out like
a sore thumb…
or a solar flare
Trying to do PCA on several different operating regimes all at once is
a waste of time. The software will try to establish the correlations
between the different variables, and if these correlations change
abruptly the results will be useless. The way to get around this
problem is to divide the observations into different operating regimes,
and study each regime separately.
In this case we will remove the low production period to prevent it
from skewing the rest of the results.
PCA with process excursion removed
We removed the entire period when the process was perturbed (8:10
to 8:45) and did a PCA on the rest of the observations.
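Removing a time window amounts to splitting the observations into two groups. The sketch below assumes a simple list of (timestamp, values) rows; the date is a placeholder, the `production` value is made up, and only the 8:10 to 8:45 window comes from the text.

```python
from datetime import datetime, timedelta


def split_regimes(rows, start, end):
    """Split (timestamp, values) rows into normal operation and an excursion window."""
    normal, excursion = [], []
    for stamp, values in rows:
        (excursion if start <= stamp <= end else normal).append((stamp, values))
    return normal, excursion


# Assumed layout: one row every 10 seconds, from 8:00:00 to 8:59:50.
t0 = datetime(2002, 6, 1, 8, 0, 0)   # placeholder date
rows = [(t0 + timedelta(seconds=10 * i), {"production": 100.0}) for i in range(360)]

normal, excursion = split_regimes(
    rows,
    start=datetime(2002, 6, 1, 8, 10),
    end=datetime(2002, 6, 1, 8, 45),
)
print(len(normal), len(excursion))
```

The PCA is then repeated on the `normal` rows only, while the `excursion` rows can be studied separately as their own operating regime.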
[Figure: Model overview bar chart for model M2 (PCA-X), dataset “Jun 2002(1). 10 seconds COMPLETE WITH 45 min LAG” with extreme outliers removed. Bars show R2X(cum) and Q2(cum), on a scale of 0.00 to 1.00, for components 1 to 3.]
Interestingly, the R2 values went down slightly. This is because many
of the variables changed abruptly all together when the process was
shut down, making it look like they were “correlated” with each other.
Remember, MVA knows nothing about the process, and just uses the
data as it is.
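This effect is easy to reproduce with synthetic data. In the sketch below, two variables are generated independently, then both are made to drop together during a simulated shutdown. All numbers are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000

# Two variables that are independent during normal operation.
a = rng.normal(size=n)
b = rng.normal(size=n)

# Simulated shutdown: both variables drop together for 50 observations.
shutdown = slice(400, 450)
a_upset, b_upset = a.copy(), b.copy()
a_upset[shutdown] -= 20.0
b_upset[shutdown] -= 20.0

# Boolean mask selecting only the normal-operation observations.
normal = np.ones(n, dtype=bool)
normal[shutdown] = False

corr_with = np.corrcoef(a_upset, b_upset)[0, 1]
corr_without = np.corrcoef(a_upset[normal], b_upset[normal])[0, 1]
print(f"with shutdown: {corr_with:.2f}, shutdown removed: {corr_without:.2f}")
```

With the shutdown included, the two variables appear strongly correlated; once it is removed, the apparent correlation all but disappears.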
Score plot of normal operation
Now that we have removed the process upset, the score plot takes
on an entirely different character.
There is now an obvious time trend. During our 24-hour period, the
process “snakes” around in multi-dimensional space. It is a moving
target.
Whereas score plots for longer, averaged
periods generally resemble clouds, score
plots for short timescales resemble
snakes.
Almost all process data show this characteristic, because a real
process is never really in steady state. The process control systems
are constantly responding to outside perturbations, like changes in
feed material quality. Operator intervention is another source of
perturbation. There are many others. One operating goal is to
maintain the “snake” within a certain desirable zone.
Score plot showing time trend
[Figure: Score plot with the start (01:00) and end (00:59) of the 24-hour period marked, showing an obvious time trend.]
What is the significance?
This “snaking” of the process at short timescales is highly significant.
This was not seen when using the daily averages.
By looking at which variables are changing with time, we can get
tremendous insight into the process dynamics. One way to do this is
to compare the contribution plots (like we saw in Example 2) at
different times.
Contribution plots for the start and end points of our 24-hour period
are shown on the next page. Obviously it is impossible to read the
names of all the variables, but that is not the point. Just look at the
bar graphs. They are very different, indicating a continuous change
in operating regime from start to finish.
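The idea behind a contribution plot can be sketched as follows. This is a simplified illustration and not necessarily the formula used by the MVA software: each variable's scaled deviation from the average is weighted by the size of its loadings on the first two components. The function name `score_contributions` and the synthetic data are assumptions.

```python
import numpy as np


def score_contributions(X, obs_index, n_components=2):
    """Per-variable deviation of one observation from the average,
    weighted by the loadings of the first components (simplified)."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    Z = (X - mean) / std
    # Rows of vt are the loading vectors p1, p2, ...
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    weight = np.sqrt(np.sum(vt[:n_components] ** 2, axis=0))
    return Z[obs_index] * weight


# Synthetic data: 4 variables, with a slow drift in the first one only.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 4))
X[:, 0] = np.linspace(-3.0, 3.0, 200) + 0.1 * rng.normal(size=200)

start = score_contributions(X, 0)
end = score_contributions(X, 199)
print(np.round(start, 2))
print(np.round(end, 2))
```

The bar for the drifting variable changes sign between the first and last observation, which is the kind of difference to look for when comparing two contribution plots.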
Time trend within the process
[Figure: Contribution plots… Two score contribution plots from model M3 (PCA-X), dataset “Jun 2002(1). 10 seconds COMPLETE WITH 45 min LAG”: Score Contrib(Obs 7910 - Average) and Score Contrib(Obs 457 - Average), both with Weight=p1p2, at the times labelled 01:00 and 00:59. Horizontal axis: Var ID (Primary), with one bar per process tag.]
Studying the “snake”
To gain further insight, we can colour-code the observations on the
score plot. We did something similar in Example 1, when we colour-coded
the days to show the seasons. This is very easy to do with
modern MVA software.
Here, we have modified the score plot to show which range each
observation falls in for one of the variables. We have
chosen “freeness”, an important pulp quality parameter which the
process control systems try to maintain at a constant value. We
could have chosen any variable.
Note that during the course of our 24-hour period, the freeness starts
high, then gets lower, then goes back up again. Someone with an
intimate knowledge of the process could gain insight from this result.
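Colour-coding by range simply means assigning each observation to a bin. A minimal sketch is given below; the freeness readings, range boundaries and colours are all hypothetical.

```python
def colour_for(value, edges, colours):
    """Return the colour of the range that `value` falls in.

    `edges` are ascending range boundaries; `colours` has one more
    entry than `edges`.
    """
    for edge, colour in zip(edges, colours):
        if value < edge:
            return colour
    return colours[-1]


# Hypothetical freeness readings and range boundaries.
freeness = [152.0, 141.0, 128.0, 119.0, 133.0, 149.0]
edges = [125.0, 145.0]
colours = ["blue", "green", "red"]   # low, medium, high

print([colour_for(f, edges, colours) for f in freeness])
```

In this made-up series the colours run from red through green to blue and back to red, mirroring a freeness that starts high, gets lower, then goes back up.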
Score plot coloured for “freeness”
[Figure: Exactly the same score plot, coloured for pulp “freeness”.]
Score plot in 3-D
[Figure: Same plot, showing 3rd component. Axes: Component 1, Component 2, Component 3.]
MVA “foresight”
Another powerful use of MVA over short timescales is to predict
problems before they become more widely visible.
The residuals plot on the next page tells the whole story. Remember
we said that the refiner shut down at 8:15 due to a blockage? It is
obvious that the process started to move away from normal operation
well before then. The operators tend to look at a handful of key
variables when monitoring the process, but MVA looks at all the
variables at the same time and
is therefore much more sensitive.
An analogy would be a seismometer being used to
predict volcanic eruptions. A seismometer is extremely
sensitive to the slightest vibrations.
Residuals plot showing
MVA “foresight”
Build-up to 8h15 –
something is happening
to the process!
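The principle behind this kind of residuals plot can be sketched with synthetic data. The function below is a simplified stand-in for the DModX statistic, not the software's exact calculation: a PCA model is fitted on normal operation, and each new observation is scored by the part of it that the model cannot explain. The mixing matrix, noise levels and drift are all assumptions.

```python
import numpy as np


def residual_distance(model_data, new_data, n_components):
    """Root-mean-square residual of each new observation after projection
    onto a PCA model fitted on model_data (simplified DModX stand-in)."""
    mean = model_data.mean(axis=0)
    std = model_data.std(axis=0, ddof=1)
    Z = (model_data - mean) / std
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    P = vt[:n_components].T                   # loadings (variables x components)
    Znew = (new_data - mean) / std
    residual = Znew - Znew @ P @ P.T          # part the model cannot explain
    return np.sqrt(np.mean(residual**2, axis=1))


rng = np.random.default_rng(3)
# Normal operation: 6 variables driven by 2 hidden factors plus noise.
mixing = np.array([[1.0, 0.8, -0.6, 0.5, 0.9, -0.7],
                   [0.3, -0.9, 0.7, 1.0, -0.4, 0.6]])
normal = rng.normal(size=(500, 2)) @ mixing + 0.2 * rng.normal(size=(500, 6))

# Build-up to an upset: one variable slowly drifts away from the others.
buildup = rng.normal(size=(100, 2)) @ mixing + 0.2 * rng.normal(size=(100, 6))
buildup[:, 0] += np.linspace(0.0, 5.0, 100)

d = residual_distance(normal, buildup, n_components=2)
print(f"start of build-up: {d[:10].mean():.2f}, end of build-up: {d[-10:].mean():.2f}")
```

The residual grows as the drifting variable breaks away from its usual relationship with the others, even though no single variable has yet reached an alarming value.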
Using shorter timescales
By now it should be clear that doing MVA at shorter timescales is
totally different to studying averages taken over longer timespans.
Once again, we conclude that the best solution is to try many
different approaches. No single MVA approach will provide all the
answers we are seeking.
Part of the power of this technique is the way completely different
results can be obtained from exactly the same database, simply by
“slicing and dicing” the data in various ways:
• Longer vs. shorter timescales
• More vs. fewer variables
• PCA vs. PLS
MVA is just a “black box”. Its use MUST be driven
by an understanding of the process being studied,
otherwise it is just meaningless number-crunching.
“Number Cruncher”
End of Example 3:
One step at a time…
End of Tier 2
Congratulations!
This is the end of Tier 2. Obviously the details of these examples are
hard to grasp for a first-timer, but hopefully some of the overall patterns
are starting to emerge. A true understanding of MVA can only come by
actually doing it on your own, which is the purpose of Tier 3.
All that is left is to complete the short quiz that follows…
Tier 2 Quiz
Question 1:
What is the difference between a tag and a variable?
a) The words “tag” and “variable” are synonyms.
b) A tag is an identity label or address, while a variable is an
attribute of the process.
c) Tags change with time, but variables are fixed.
d) Variables measure similar attributes, while tags measure
dissimilar attributes.
e) Answers (b) and (c).
Question 2:
Does averaging reduce or increase noise?
a) Averaging increases noise significantly.
b) Averaging increases noise, but only slightly.
c) Averaging does not affect noise.
d) Averaging reduces noise.
e) Averaging reduces noise, but increases the likelihood of outliers.
Question 3:
What is the danger of interpolating between readings that are far
apart in time?
a) The interpolation will give far more weight to these individual
readings than they deserve.
b) The interpolated values will indicate slow upward and downward
trends where there are none.
c) The effect of outliers will be enhanced many-fold.
d) The engineer will have the false sense of comparing variables
that are similar, when in fact they are very different.
e) All of the above.
Question 4:
If interpolation is such a problem, then why can’t we just use the
discrete values instead?
a) This would give far too much weight to periods with a large
number of discrete values.
b) Discrete values must be averaged to have meaning.
c) No tag is ever truly discrete.
d) Discrete values have no time signature.
e) Answers (b) and (c).
Question 5:
What is the difference between a process lag and a delayed reading?
a) One is caused by the process itself, the other by the
measurement instruments.
b) They are the same thing.
c) A process lag is due to residence time, while a delayed reading
is due to the time required for sampling, measurement and
recording.
d) One is much longer than the other.
e) Answers (a) and (c).
Question 6:
Why does the MVA software reject variables that do not change
enough with time?
a) Only variables which are part of the “experiment” are permitted.
b) Tags change with time, but these variables are fixed.
c) There are insufficient data points.
d) If a variable does not change with time, then it cannot be
correlated to any other variables.
e) None of the above.
Question 7:
What should you do if your initial PCA gives a score plot with two
distinct and separate data clouds?
a) Study each data cloud separately.
b) Try to determine what these two clouds represent.
c) Ignore the first component, which is probably being artificially
induced by the two clouds.
d) Do an MVA on the entire dataset.
e) Answers (a), (b) and (c).
Question 8:
Your residual (“DModX”) plot shows several moderate outliers. What
should you do?
a) Remove them and continue.
b) Leave them in and continue.
c) Study their contribution plots.
d) Look at the original data to try to determine the cause.
e) Answers (c) and (d).
Question 9:
Two variables are located in opposite corners of your PCA loadings
plot (components 1 and 2). What do you conclude?
a) These variables are uncorrelated with each other.
b) These variables are negatively correlated with each other.
c) These variables contribute to both the first and second
components.
d) These variables contribute to neither the first nor the second
component.
e) Answers (b) and (c).
Question 10:
Theoretically, on average what proportion of residuals should be above
the 95% confidence line? (the red line on the “DModX” plot)
a) Exactly 0.05%.
b) Exactly 5%.
c) More than 5%.
d) Less than 5%.
e) Depends on the dataset.
TIER 3:
Open-Ended Problem
Tier 3: Statement of Intent
The goal of Tier 3 is to finally allow the student to do MVA
independently, though in a controlled context. At the end of Tier 3,
the student should know how to do the following:
• Prepare a spreadsheet for use in MVA
• Import spreadsheet into MVA software
• Set up dataset within MVA software
• Create simple PCA plots
• Identify and investigate major and moderate outliers
• Create and interpret more elaborate PCA plots
In order to avoid losing the student along the way, each of these
steps is broken down into a series of sub-steps with clear
instructions.
Tier 3: Contents
Tier 3 is broken down into four sections:
3.1 Problem Statement and Dataset
3.2 Preparing and Importing the Spreadsheet
3.3 Initial MVA Results
3.4 Outliers and More Elaborate MVA plots
Unlike the previous two sections, Tier 3 has no quiz. The student
must submit the results of the above work in a succinct project
report (10-15 pages).
3.1: Problem Statement
and Dataset
Problem Statement
You are the process engineer at the TMP mill from the Tier 2
examples. Your boss, the plant manager, wants to know why the pulp
has different properties in the summer than in the winter.
You decide to start by generating PCA results for two different
datasets, one taken during the summer, the other during the winter,
and then comparing them to each other.
Summer/Winter datasets
After talking to the operators, you decide to take two full weeks of
data for 15 key tags, using 1-hour averages.
Your data have already been imported by an IT technician into a
standard spreadsheet program. The two files are:
• Summerdata.xls
• Winterdata.xls
These are the actual
data files you are
going to use!
Open these files, and have a look at the data. Can you tell anything
about the summer/winter question just by looking?
Of course not!
3.2: Preparing and
Importing the Spreadsheet
Preparing the spreadsheet
As you can see, the spreadsheet has two names for each variable:
• a long descriptive name, and
• a short “tag” for easy identification on the MVA graphs.
We want to do something similar with the individual observations.
The full time signature is too long, and will make the score plots
impossible to read. Besides, we already know which year and month
it is. This is not useful information. We therefore want to insert a
column to the right of the time signature, which gives the number of
hours from the start of the two-week period.
Do this now, for both spreadsheets. When you are done, save them
under a new name.
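If you prefer to compute the new column outside the spreadsheet, the calculation can be sketched as follows. The time-signature format and the example dates are assumptions; the format of the real spreadsheet may differ.

```python
from datetime import datetime


def hours_from_start(timestamps, fmt="%Y-%m-%d %H:%M"):
    """Convert time signatures to whole hours elapsed since the first one."""
    times = [datetime.strptime(t, fmt) for t in timestamps]
    start = times[0]
    return [int((t - start).total_seconds() // 3600) for t in times]


# Made-up time signatures in an assumed format.
stamps = ["2002-07-01 00:00", "2002-07-01 01:00", "2002-07-02 00:00"]
print(hours_from_start(stamps))
```

The resulting short integer labels stay readable on a score plot.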
Importing the spreadsheet
Now we are ready to open the MVA software. Do it now.
The first thing we need to do is import the data. Go to “File: import
data”, and select your newly renamed file for summer.
The software will ask you a series of questions. Answer them
according to the instructions on Page 2 of the spreadsheet file. One
of these steps involves saving the new dataset as an MVA file.
Repeat this operation for the winter spreadsheet.
3.3: Initial MVA Results
Initial MVA results
Re-open the summer file, and create the following plot:
• Model bar chart
Copy it by right-clicking and import it
into your word processor file. All these
plots must appear in your report.
How many components does the software suggest? Usually for this
kind of initial exercise, keeping 3 components is normal. Eliminate
the components you do not intend to use.
Now create the following basic PCA plots:
• Score plots: t(1) vs. t(2)
What do you notice about the results? Right! There are major
outliers.
Now do the same for the winter dataset.
3.4: Outliers and More
Elaborate MVA Plots
Investigating Outliers
The summer data contains a major process excursion that is clearly
visible on the score plot. Looking at the original data, try to determine
the cause.
Once you are satisfied, remove the outliers and save the new model.
The winter data looks OK on the score plot, but that is not the entire
story. Generate the following residuals plot:
• DModX
What do you notice? Right! There is one major outlier. Create a
contribution plot to investigate:
• Contribution plot
What do you conclude? Remove this point and continue.
Comparing Summer and Winter
Now we are ready to compare the summer and winter results.
Create the following basic PCA plots:
• Score plots: t(1) vs. t(2); t(1) vs. t(3); 3-D plot
• Loadings plot: p(1) vs. p(2); p(1) vs. p(3); 3-D plot
Do you notice any major differences between summer and winter?
Of course you do! What are they?
And what does this imply about the cause of the summer/winter
process differences?
Drawing your conclusions
Now you have something to report to your boss…
More Elaborate MVA Plots
To get familiar with some of the other MVA outputs, create the
following for the final summer and winter datasets:
• DModX
• X/Y Contribution Plot
• Residuals distribution
• …
• …
What do these plots indicate to you? Don’t worry about finding the
“right” answer, just try to figure out what these plots are trying to tell
us. However, you must justify your answers. Don’t just guess.
End of Tier 3
Congratulations!
This is the end of Module 17. Please submit your report to your
professor for grading.
We are always interested in suggestions on how to improve the course.
You may contact us at www.namppimodule.org