STATGRAPHICS Centurion XV

Statistical Tools for
Multivariate Six Sigma
Dr. Neil W. Polhemus
CTO & Director of Development
StatPoint, Inc.
1
The Challenge
The quality of an item or service usually depends on
more than one characteristic.
When the characteristics are not independent,
considering each characteristic separately can give a
misleading estimate of overall performance.
2
The Solution
Proper analysis of data from such processes
requires the use of multivariate statistical
techniques.
3
Outline

Multivariate SPC
  Multivariate control charts
  Multivariate capability analysis
Data exploration and modeling
  Principal components analysis (PCA)
  Partial least squares (PLS)
  Neural network classifiers
Design of experiments (DOE)
  Multivariate optimization
4
Example #1
Textile fiber
Characteristic #1: tensile strength - 115 ± 1
Characteristic #2: diameter - 1.05 ± 0.05
5
Sample Data
n = 100
6
Individuals Chart - strength
[Figure: X chart for strength vs. observation (0 to 100). UCL = 115.69, CTR = 114.98, LCL = 114.27.]
7
Individuals Chart - diameter
[Figure: X chart for diameter vs. observation (0 to 100). UCL = 1.06, CTR = 1.05, LCL = 1.04.]
8
Capability Analysis - strength
Process Capability for strength
LSL = 114.0, Nominal = 115.0, USL = 116.0
Normal: Mean = 114.978, Std. Dev. = 0.238937
DPM = 30.76
Cp = 1.41, Pp = 1.40, Cpk = 1.38, Ppk = 1.36, K = -0.02
[Figure: frequency histogram of strength (114 to 116) with fitted normal curve.]
9
Capability Analysis - diameter
Process Capability for diameter
LSL = 1.04, Nominal = 1.05, USL = 1.06
Normal: Mean = 1.04991, Std. Dev. = 0.00244799
DPM = 44.59
Cp = 1.41, Pp = 1.36, Cpk = 1.39, Ppk = 1.35, K = -0.01
[Figure: frequency histogram of diameter (1.04 to 1.06) with fitted normal curve.]
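The long-term indices on the two capability slides can be checked from the reported means and standard deviations. The sketch below assumes the usual textbook definitions of Pp, Ppk, and K; the slides do not state the formulas, and the function name is illustrative. Cp and Cpk are left out because they depend on a short-term sigma that the slides do not report.

```python
def long_term_indices(mean, sd, lsl, nominal, usl):
    """Return (Pp, Ppk, K) using textbook definitions (an assumption here)."""
    pp = (usl - lsl) / (6 * sd)                    # spec width over 6 sigma
    ppk = min(usl - mean, mean - lsl) / (3 * sd)   # distance to nearer limit
    k = (mean - nominal) / ((usl - lsl) / 2)       # relative off-centering
    return pp, ppk, k

# Values copied from the two capability slides.
strength = long_term_indices(114.978, 0.238937, 114.0, 115.0, 116.0)
diameter = long_term_indices(1.04991, 0.00244799, 1.04, 1.05, 1.06)

print("strength:", [round(v, 2) for v in strength])
print("diameter:", [round(v, 2) for v in diameter])
```

Rounded to two decimals, these values agree with the Pp, Ppk, and K figures shown for each characteristic.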
10
Scatterplot
[Figure: Plot of strength (114 to 116) vs. diameter (1.04 to 1.06); correlation = 0.89.]
11
Multivariate Normal Distribution
[Figure: Multivariate normal density surface over diameter (1.04 to 1.06) and strength (114 to 116).]
12
Control Ellipse
[Figure: Control ellipse, strength (114 to 115.8) vs. diameter (1.04 to 1.058).]
13
Multivariate Capability
Determines joint probability of being within the
specification limits on all characteristics
Variable    Observed Beyond Spec.   Estimated Beyond Spec.   Estimated DPM
strength    0.0%                    0.00307572%              30.7572
diameter    0.0%                    0.00445939%              44.5939
Joint       0.0%                    0.00703461%              70.3461
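A rough way to see where these estimates come from: each univariate DPM is a pair of normal tail areas, and the joint DPM is the probability that a bivariate normal point falls outside the rectangular specification region. The sketch below is one possible calculation, not the Statgraphics implementation. It uses the means, standard deviations, and the rounded correlation (0.89) from the earlier slides, and integrates numerically with the trapezoid rule. Because the correlation is rounded, the joint value may not match the table exactly.

```python
import math

def Phi(z):
    """Standard normal CDF, computed with erfc for accuracy in the tails."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def univariate_dpm(mean, sd, lsl, usl):
    return 1e6 * (Phi((lsl - mean) / sd) + Phi(-(usl - mean) / sd))

def joint_dpm(m1, s1, l1, u1, m2, s2, l2, u2, rho, steps=20000):
    a1, b1 = (l1 - m1) / s1, (u1 - m1) / s1
    a2, b2 = (l2 - m2) / s2, (u2 - m2) / s2
    s = math.sqrt(1 - rho * rho)

    def f(z):
        # density of variable 1 times P(variable 2 out of spec | variable 1 = z)
        return phi(z) * (Phi((a2 - rho * z) / s) + Phi(-(b2 - rho * z) / s))

    h = (b1 - a1) / steps
    area = h * (0.5 * (f(a1) + f(b1)) + sum(f(a1 + i * h) for i in range(1, steps)))
    out1 = Phi(a1) + Phi(-b1)  # variable 1 out of spec
    return 1e6 * (out1 + area)

dpm_strength = univariate_dpm(114.978, 0.238937, 114.0, 116.0)
dpm_diameter = univariate_dpm(1.04991, 0.00244799, 1.04, 1.06)
dpm_joint = joint_dpm(114.978, 0.238937, 114.0, 116.0,
                      1.04991, 0.00244799, 1.04, 1.06, rho=0.89)
print(dpm_strength, dpm_diameter, dpm_joint)
```

Because the two characteristics are positively correlated, the joint DPM is smaller than the sum of the two univariate values: some items fail both specifications at once.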
14
Multivariate Capability
[Figure: Multivariate normal density over strength (113.5 to 116.5) and diameter (1.035 to 1.065), with the specification region. DPM = 70.3461.]
15
Capability Ellipse
[Figure: 99.73% capability ellipse, diameter (1.035 to 1.065) vs. strength (113.5 to 116.5). MCP = 1.27.]
16
Mult. Capability Indices
Defined to give the
same DPM as in the
univariate case.
Capability Indices
Index   Estimate
MCP     1.27
MCR     78.80
DPM     70.3461
Z       3.80696
SQL     5.30696
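The numbers in this table are consistent with the following relationships, which are inferred from the values rather than stated on the slide: Z is the standard normal quantile whose upper tail area equals the joint DPM, MCP = Z/3, MCR = 100/MCP, and SQL = Z + 1.5 (the conventional 1.5-sigma shift). A short check using only the Python standard library:

```python
from statistics import NormalDist

dpm = 70.3461  # joint DPM from the multivariate capability analysis

# Z such that the upper-tail area of a standard normal equals DPM / 1e6.
z = NormalDist().inv_cdf(1 - dpm / 1e6)

mcp = z / 3      # multivariate analogue of Cp (assumed relation)
mcr = 100 / mcp  # capability ratio in percent (assumed relation)
sql = z + 1.5    # sigma quality level with a 1.5-sigma shift (assumed)

print(round(z, 4), round(mcp, 2), round(mcr, 2), round(sql, 4))
```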
17
Test for Normality
Shapiro-Wilk P-Values
strength    0.408004
diameter    0.615164
[Figure: normal probability plot of empirical data vs. normal distribution for strength and diameter.]
18
More than 2 Characteristics
Calculate T-squared:
$$T_i^2 = (x_i - \bar{x})' \, S^{-1} \, (x_i - \bar{x})$$
where
S = sample covariance matrix
\bar{x} = vector of sample means
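As an illustration of the formula, the sketch below computes T-squared for two variables in plain Python, inverting the 2x2 covariance matrix in closed form. The eight data points are made up for the example and are not the textile data from the slides.

```python
def t_squared(x, y):
    """Hotelling T-squared for each (x, y) pair, using the sample covariance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x) / (n - 1)
    syy = sum((b - my) ** 2 for b in y) / (n - 1)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    det = sxx * syy - sxy ** 2  # determinant of S
    values = []
    for a, b in zip(x, y):
        dx, dy = a - mx, b - my
        # (dx, dy) S^{-1} (dx, dy)' with S^{-1} = [[syy, -sxy], [-sxy, sxx]] / det
        values.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return values

# Hypothetical measurements, for illustration only.
strength = [114.9, 115.1, 115.0, 114.7, 115.3, 115.0, 114.8, 115.2]
diameter = [1.049, 1.051, 1.050, 1.047, 1.053, 1.052, 1.048, 1.050]

t2 = t_squared(strength, diameter)
for i, value in enumerate(t2, start=1):
    print(i, round(value, 3))
```

A useful sanity check is that, when S is computed from the same data with divisor n - 1, the T-squared values sum to (n - 1) times the number of variables.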
19
T-Squared Chart
[Figure: Multivariate control chart of T-squared (0 to 30) vs. observation (0 to 120). UCL = 11.25.]
20
T-Squared Decomposition
For each variable, subtracts from T-squared the value that T-squared would have if that variable were removed.
T-Squared Decomposition: Relative Contribution to T-Squared Signal
Observation   T-Squared   diameter   strength
17            26.3659     22.9655    25.951
Large values indicate that a variable makes an important contribution to the signal.
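For two variables, the decomposition can be sketched as follows: removing one variable leaves a univariate T-squared for the other, and the contribution is the difference from the full value. The covariance entries and the deviations below are hypothetical, chosen only to resemble the scale of the strength and diameter data. They are not the values behind observation 17.

```python
def t2_bivariate(dx, dy, sxx, sxy, syy):
    """T-squared for one observation from its deviations and the covariance entries."""
    det = sxx * syy - sxy ** 2
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det

def decomposition(dx, dy, sxx, sxy, syy):
    """Return (T2, contribution of variable 1, contribution of variable 2)."""
    full = t2_bivariate(dx, dy, sxx, sxy, syy)
    without_first = dy * dy / syy   # T2 using variable 2 alone
    without_second = dx * dx / sxx  # T2 using variable 1 alone
    return full, full - without_first, full - without_second

# Hypothetical inputs: variable 1 = strength, variable 2 = diameter.
t2, c_strength, c_diameter = decomposition(
    dx=0.5, dy=-0.002, sxx=0.057, sxy=0.00052, syy=0.000006
)
print(round(t2, 2), round(c_strength, 2), round(c_diameter, 2))
```

In this made-up case both contributions are large because the observation runs against the positive correlation: neither variable is far out on its own, but the combination is unusual.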
21
Control Ellipsoid
[Figure: Control ellipsoid in three dimensions: strength (114 to 116), diameter (1.04 to 1.06), and rnormal(100,10,1) (6.8 to 12.8).]
22
Multivariate EWMA Chart
[Figure: Multivariate EWMA control chart of T-squared (0 to 15) vs. observation (0 to 120). UCL = 11.25, lambda = 0.2. Points are coded by the variable with the largest contribution: strength or diameter.]
23
Generalized Variance Chart
Plots the determinant of the variance-covariance matrix for data
that is sampled in subgroups.
[Figure: Generalized variance chart, generalized variance (X 1.E-7) vs. subgroup (0 to 24). UCL = 3.281E-7, CL = 7.01937E-8, LCL = 0.0.]
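The plotted statistic can be sketched for two variables as the determinant of each subgroup's 2x2 sample covariance matrix. The subgroups below are invented for illustration, and control limits are not computed.

```python
def generalized_variance(rows):
    """Determinant of the 2x2 sample covariance matrix of (x, y) rows."""
    n = len(rows)
    mx = sum(r[0] for r in rows) / n
    my = sum(r[1] for r in rows) / n
    sxx = sum((r[0] - mx) ** 2 for r in rows) / (n - 1)
    syy = sum((r[1] - my) ** 2 for r in rows) / (n - 1)
    sxy = sum((r[0] - mx) * (r[1] - my) for r in rows) / (n - 1)
    return sxx * syy - sxy ** 2

# Hypothetical subgroups of four (strength, diameter) measurements each.
subgroups = [
    [(114.9, 1.049), (115.1, 1.051), (115.0, 1.052), (114.8, 1.048)],
    [(115.2, 1.050), (114.7, 1.047), (115.0, 1.051), (115.3, 1.053)],
]
gv = [generalized_variance(g) for g in subgroups]
print(gv)
```

The determinant shrinks toward zero as the variables in a subgroup become more nearly collinear, so the chart responds to changes in both spread and correlation.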
24
Data Exploration and Modeling
When the number of variables is large, the
dimensionality of the problem often makes it
difficult to determine the underlying
relationships.
Reduction of dimensionality can be very helpful.
25
Example #2
26
Matrix Plot
[Figure: Matrix plot of MPG City, MPG Highway, Engine Size, Horsepower, Fueltank, Passengers, Length, Wheelbase, Width, U Turn Space, and Weight.]
27
Analysis Methods

Predicting certain characteristics based on
others (regression and ANOVA)

Separating items into groups (classification)

Detecting unusual items
28
Multiple Regression
MPG City = 29.6315 + 0.28816*Engine Size - 0.00688362*Horsepower - 0.297446*Passengers - 0.0365723*Length + 0.280224*Wheelbase + 0.111526*Width - 0.139763*U Turn Space - 0.00984486*Weight

Parameter      Estimate       Standard Error   T Statistic   P-Value
CONSTANT       29.6315        12.9763           2.28351      0.0249
Engine Size     0.28816        0.722918         0.398607     0.6912
Horsepower     -0.00688362     0.0134153       -0.513119     0.6092
Passengers     -0.297446       0.54754         -0.543241     0.5884
Length         -0.0365723      0.0447211       -0.817786     0.4158
Wheelbase       0.280224       0.124837         2.24472      0.0274
Width           0.111526       0.218893         0.5095       0.6117
U Turn Space   -0.139763       0.17926         -0.779668     0.4378
Weight         -0.00984486     0.00192619      -5.11104      0.0000

R-squared = 73.544 percent
R-squared (adjusted for d.f.) = 71.0244 percent
Standard Error of Est. = 3.02509
Mean absolute error = 1.99256
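Two of the reported quantities can be reproduced directly from the table. Each T statistic is the estimate divided by its standard error, and the adjusted R-squared follows from R-squared once the sample size is known. The sketch assumes n = 93, the number of cases quoted on the later neural network slides; the regression slide itself does not state n.

```python
# (name, estimate, standard error) for three rows of the table.
rows = [
    ("CONSTANT", 29.6315, 12.9763),
    ("Wheelbase", 0.280224, 0.124837),
    ("Weight", -0.00984486, 0.00192619),
]
t_stats = {name: est / se for name, est, se in rows}

def adjusted_r2(r2, n, p):
    """Adjusted R-squared for a model with p predictors plus a constant."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

adj = adjusted_r2(0.73544, n=93, p=8)  # n = 93 is an assumption

print({k: round(v, 4) for k, v in t_stats.items()})
print(round(100 * adj, 4))
```

Only the constant, Wheelbase, and Weight have P-values below 0.05, which motivates reducing the correlated predictors to a smaller number of components.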
29
Principal Components
The goal of a principal components analysis (PCA) is
to construct k linear combinations of the p
variables X that contain the greatest variance.
$$C_1 = a_{11} X_1 + a_{12} X_2 + \dots + a_{1p} X_p$$
$$C_2 = a_{21} X_1 + a_{22} X_2 + \dots + a_{2p} X_p$$
$$\vdots$$
$$C_k = a_{k1} X_1 + a_{k2} X_2 + \dots + a_{kp} X_p$$
30
Scree Plot
Shows the number of significant components.
[Figure: Scree plot of eigenvalue (0 to 6) vs. component number (1 to 8).]
31
Percentage Explained
Principal Components Analysis
Component Number   Eigenvalue   Percent of Variance   Cumulative Percentage
1                  5.8263       72.829                 72.829
2                  1.09626      13.703                 86.532
3                  0.339796      4.247                 90.779
4                  0.270321      3.379                 94.158
5                  0.179286      2.241                 96.400
6                  0.12342       1.543                 97.942
7                  0.109412      1.368                 99.310
8                  0.0552072     0.690                100.000
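The percentage columns follow directly from the eigenvalues: each percentage is the eigenvalue divided by the sum of all eigenvalues, which is about 8 here because there are eight variables. The short sketch below reproduces them. The "eigenvalue greater than 1" cutoff is a common rule of thumb, included as an assumption rather than as the rule used in the presentation.

```python
eigenvalues = [5.8263, 1.09626, 0.339796, 0.270321,
               0.179286, 0.12342, 0.109412, 0.0552072]

total = sum(eigenvalues)
percent = [100 * e / total for e in eigenvalues]

# Running total of the percentages.
cumulative = []
running = 0.0
for p in percent:
    running += p
    cumulative.append(running)

# Rule of thumb (assumption): keep components with eigenvalue above 1.
retained = [e for e in eigenvalues if e > 1]

for i, (p, c) in enumerate(zip(percent, cumulative), start=1):
    print(i, round(p, 3), round(c, 3))
print("components retained:", len(retained))
```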
32
Components
Table of Component Weights
               Component 1   Component 2
Engine Size    0.376856      -0.205144
Horsepower     0.292144      -0.592729
Passengers     0.239193       0.730749
Length         0.369908       0.0429221
Wheelbase      0.374826       0.259648
Width          0.38949       -0.0422083
U Turn Space   0.359702      -0.0256716
Weight         0.396236      -0.0298902

First component
0.376856*Engine Size + 0.292144*Horsepower + 0.239193*Passengers + 0.369908*Length
+ 0.374826*Wheelbase + 0.38949*Width + 0.359702*U Turn Space + 0.396236*Weight
Second component
-0.205144*Engine Size - 0.592729*Horsepower + 0.730749*Passengers + 0.0429221*Length
+ 0.259648*Wheelbase - 0.0422083*Width - 0.0256716*U Turn Space - 0.0298902*Weight
33
Interpretation
[Figure: Plot of C_2 (-5 to 3) vs. C_1 (-6 to 6), points coded by Type: Compact, Large, Midsize, Small, Sporty, Van.]
34
Principal Component Regression
MPG City = 22.3656 - 1.84685*size + 0.567176*unsportiness
Parameter      Estimate    Standard Error   T Statistic   P-Value
CONSTANT       22.3656     0.353316          63.302       0.0000
size           -1.84685    0.147168         -12.5492      0.0000
unsportiness    0.567176   0.339277           1.67172     0.0981
R-squared = 64.0399 percent
R-squared (adjusted for d.f.) = 63.2408 percent
Standard Error of Est. = 3.40726
Mean absolute error = 2.26553
35
Partial Least Squares (PLS)
Similar to PCA, except that it finds components
that explain as much as possible of the variance
in both the X's and the Y's.
May be used with many X variables, even when
their number exceeds n.
36
Component Extraction
Starts with number of components equal to the
minimum of p and (n-1).
[Figure: Model comparison plot of percent variation explained (0 to 100) in X and in Y vs. number of components (1 to 8).]
37
Coefficient Plot
[Figure: PLS coefficient plot of standardized coefficients (-0.7 to 0.5) for Engine Size, Horsepower, Passengers, Length, Wheelbase, Width, U Turn Space, and Weight, with one series per response: MPG City, MPG Highway, Fueltank.]
38
Model in Original Units
MPG City = 50.0593 - 0.214083*Engine Size - 0.0347708*Horsepower
- 0.884181*Passengers + 0.0294622*Length - 0.0362471*Wheelbase
- 0.0882233*Width - 0.0282326*U Turn Space - 0.00391616*Weight
39
Classification
Principal components can also be used to
classify new observations.
A useful method for classification is a Bayesian
classifier, which can be expressed as a neural
network.
40
6 Types of Automobiles
[Figure: Plot of unsportiness (-5 to 3) vs. size (-6 to 6), points coded by Type: Compact, Large, Midsize, Small, Sporty, Van.]
41
Neural Networks
Input layer (2 variables)
Pattern layer (93 cases)
Summation layer (6 neurons)
Output layer (6 groups)
42
Bayesian Classifier

Begins with prior probabilities for membership in
each group

Uses a Parzen-like density estimator of the density
function for each group
$$g_j(X) = \frac{1}{n_j} \sum_{i=1}^{n_j} \exp\left( -\frac{\lVert X - X_i \rVert^2}{\sigma^2} \right)$$
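A minimal sketch of such a classifier follows, assuming the density estimate is the average of Gaussian-shaped kernels centered on each training case and that a new point is assigned to the group with the largest prior times density. The training points, group labels, and function names are hypothetical. Only the spacing value 0.3 is taken from a later slide.

```python
import math

def group_density(x, members, sigma):
    """Parzen-like estimate: mean of exp(-||x - xi||^2 / sigma^2) over a group."""
    total = 0.0
    for m in members:
        dist2 = sum((a - b) ** 2 for a, b in zip(x, m))
        total += math.exp(-dist2 / sigma ** 2)
    return total / len(members)

def classify(x, training, priors, sigma):
    """Assign x to the group with the largest prior * estimated density."""
    scores = {g: priors[g] * group_density(x, pts, sigma)
              for g, pts in training.items()}
    return max(scores, key=scores.get)

# Hypothetical (size, unsportiness) training cases for two groups.
training = {
    "Small": [(-3.0, 0.5), (-2.5, 0.0)],
    "Large": [(3.0, 0.2), (2.5, -0.1)],
}
priors = {"Small": 0.5, "Large": 0.5}

label_a = classify((-2.8, 0.3), training, priors, sigma=0.3)
label_b = classify((2.7, 0.0), training, priors, sigma=0.3)
print(label_a, label_b)
```

A small sigma makes each training case influence only its immediate neighborhood, while a larger sigma smooths the classification regions.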




43
Options


The prior probabilities may be determined in several ways.
A training set is usually used to find a good value for σ.
44
Output
Number of cases in training set: 93
Number of cases in validation set: 0
Spacing parameter used: 0.0109375 (optimized by jackknifing during training)
Training Set
Type      Members   Percent Correctly Classified
Compact   16         75.0
Large     11        100.0
Midsize   22         77.2727
Small     21         76.1905
Sporty    14         85.7143
Van        9        100.0
Total     93         82.7957
45
Classification Regions
[Figure: Classification plot, sigma = 0.0109375; unsportiness (-5 to 3) vs. size (-6 to 6), regions coded by Type: Compact, Large, Midsize, Small, Sporty, Van.]
46
Changing Sigma
[Figure: Classification plot, sigma = 0.3; unsportiness (-5 to 3) vs. size (-6 to 6), regions coded by Type: Compact, Large, Midsize, Small, Sporty, Van.]
47
Overlay Plot
[Figure: Classification plot with overlay, sigma = 0.3; unsportiness (-5 to 3) vs. size (-6 to 6), coded by Type: Compact, Large, Midsize, Small, Sporty, Van.]
48
Outlier Detection
[Figure: Control ellipse, unsportiness (-5 to 5) vs. size (-8 to 8).]
49
Cluster Analysis
[Figure: Cluster scatterplot, method of k-means with squared Euclidean distance; unsportiness (-5 to 3) vs. size (-6 to 6), clusters 1 to 4 with centroids.]
50
Design of Experiments
When more than one characteristic is important,
finding the optimal operating conditions
usually requires a tradeoff of one
characteristic for another.
One approach to finding a single solution is to
use desirability functions.
51
Example #3
Myers and Montgomery (2002) describe an experiment
on a chemical process:
Response variable       Goal
Conversion percentage   Maximize
Thermal activity        Maintain between 55 and 60

Input factor   Low         High
time           8 minutes   17 minutes
temperature    160 °C      210 °C
catalyst       1.5%        3.5%
52
Experiment
run   time (minutes)   temperature (degrees C)   catalyst (percent)   conversion   activity
 1    10.0             170.0                     2.0                  74.0         53.2
 2    15.0             170.0                     2.0                  51.0         62.9
 3    10.0             200.0                     2.0                  88.0         53.4
 4    15.0             200.0                     2.0                  70.0         62.6
 5    10.0             170.0                     3.0                  71.0         57.3
 6    15.0             170.0                     3.0                  90.0         67.9
 7    10.0             200.0                     3.0                  66.0         59.8
 8    15.0             200.0                     3.0                  97.0         67.8
 9     8.3             185.0                     2.5                  76.0         59.1
10    16.7             185.0                     2.5                  79.0         65.9
11    12.5             160.0                     2.5                  85.0         60.0
12    12.5             210.0                     2.5                  97.0         60.7
13    12.5             185.0                     1.66                 55.0         57.4
14    12.5             185.0                     3.35                 81.0         63.2
15    12.5             185.0                     2.5                  81.0         59.2
16    12.5             185.0                     2.5                  75.0         60.4
17    12.5             185.0                     2.5                  76.0         59.1
18    12.5             185.0                     2.5                  83.0         60.6
19    12.5             185.0                     2.5                  80.0         60.8
20    12.5             185.0                     2.5                  91.0         58.9
53
Step #1: Model Conversion
[Figure: Standardized Pareto chart for conversion, standardized effect (0 to 8), with positive and negative effects marked. Effects in chart order: AC, C:catalyst, CC, B:temperature, BB, BC, AA, AB, A:time.]
54
Step #2: Optimize Conversion
Goal: maximize conversion
Optimum value = 118.174
Factor        Low     High    Optimum
time          8.0     17.0    17.0
temperature   160.0   210.0   210.0
catalyst      1.5     3.5     3.48086
[Figure: Contours of estimated response surface for conversion (70.0 to 100.0 in steps of 2.5), catalyst (1.5 to 3.5) vs. time (8 to 17), temperature = 210.0.]
55
Step #3: Model Activity
[Figure: Standardized Pareto chart for activity, standardized effect (0 to 8), with positive and negative effects marked. Effects in chart order: A:time, C:catalyst, AA, AB, B:temperature, BC, BB, CC, AC.]
56
Step #4: Optimize Activity
Goal: maintain activity at 57.5
Optimum value = 57.5
Factor        Low      High     Optimum
time          8.3      16.7     10.297
temperature   209.99   210.01   210.004
catalyst      1.66     3.35     2.31021
[Figure: Contours of estimated response surface for activity (55.0 to 60.0 in steps of 1.0), catalyst (1.5 to 3.5) vs. time (8 to 17), temperature = 210.0.]
57
Step #5: Select Desirability Fcns.
Maximize
[Figure: Desirability function for maximization: desirability d (0 to 1) vs. predicted response between Low and High, with curves for s = 0.2, 0.4, 1, 2, and 8.]
58
Desirability Function
Hit Target
[Figure: Desirability function for hitting a target: desirability d (0 to 1) vs. predicted response, rising from Low to Target with curves for s = 0.1, 1, and 5, and falling from Target to High with curves for t = 0.1, 1, and 5.]
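The two curve families can be written as short functions. The sketch assumes the standard Derringer-Suich forms, which match the shapes on these slides but are not spelled out in the presentation; the function names are illustrative. The exponents s and t control how quickly desirability rises toward the goal.

```python
def d_maximize(y, low, high, s=1.0):
    """Desirability when larger is better: 0 at or below low, 1 at or above high."""
    if y <= low:
        return 0.0
    if y >= high:
        return 1.0
    return ((y - low) / (high - low)) ** s

def d_target(y, low, target, high, s=1.0, t=1.0):
    """Desirability when a target is best: 1 at the target, 0 outside (low, high)."""
    if y <= low or y >= high:
        return 0.0
    if y <= target:
        return ((y - low) / (target - low)) ** s
    return ((high - y) / (high - target)) ** t

# Limits taken from the chemical process example in this presentation.
d_conversion = d_maximize(95.0388, low=50.0, high=100.0, s=1.0)
d_activity = d_target(57.5, low=55.0, target=57.5, high=60.0)
print(round(d_conversion, 6), d_activity)
```

With s below 1 the maximization curve rises quickly and flattens, so moderately good responses already score well; with s above 1 only responses close to High are rewarded.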
59
Combined Desirability

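$$D = \left( d_1^{I_1} \, d_2^{I_2} \cdots d_m^{I_m} \right)^{1 / \sum_{j=1}^{m} I_j}$$
where m = number of responses, d_j is the desirability of response j, and I_j is its impact, with 0 ≤ I_j ≤ 5. D ranges
from 0 to 1.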
60
Example
Optimum value = 0.949092
Factor        Low     High    Optimum
time          8.0     17.0    11.1394
temperature   160.0   210.0   210.0
catalyst      1.5     3.5     2.20119

Response     Low    High    Goal       Weights First   Weights Second   Impact   Optimum
conversion   50.0   100.0   Maximize   1.0                              3.0      95.0388
activity     55.0   60.0    57.5       1.0             1.0              3.0      57.5
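The reported optimum of 0.949092 can be reproduced from the values on this slide. The sketch assumes a linear desirability for conversion (weight 1.0) between 50 and 100, a desirability of 1 for activity because it sits exactly on its target, and an impact-weighted geometric mean for the combination. The function name is illustrative.

```python
def combined_desirability(d, impact):
    """Impact-weighted geometric mean of the individual desirabilities."""
    product = 1.0
    for dj, ij in zip(d, impact):
        product *= dj ** ij
    return product ** (1.0 / sum(impact))

# Conversion: goal is to maximize, Low = 50, High = 100, weight = 1.
d_conv = (95.0388 - 50.0) / (100.0 - 50.0)
# Activity: the optimum 57.5 equals the target, so desirability is 1.
d_act = 1.0

D = combined_desirability([d_conv, d_act], impact=[3.0, 3.0])
print(round(D, 6))
```

With equal impacts, D reduces to the ordinary geometric mean of the two desirabilities, so a response with zero desirability drives the combined value to zero.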
61
Desirability Contours
[Figure: Contours of estimated response surface for desirability (0.0 to 1.0 in steps of 0.1), catalyst (1.5 to 3.5) vs. time (8 to 17), temperature = 210.0.]
62
Desirability Surface
[Figure: Estimated response surface of desirability (0 to 1) over time (8 to 17) and catalyst (1.5 to 3.5), temperature = 210.0.]
63
Overlaid Contours
[Figure: Overlay plot of conversion and activity contours, catalyst (2 to 3) vs. time (10 to 15), temperature = 210.0.]
64
References

Johnson, R. A. and Wichern, D. W. (2002). Applied Multivariate Statistical
Analysis. Upper Saddle River: Prentice Hall.

Mason, R. L. and Young, J. C. (2002). Multivariate Statistical Process Control with
Industrial Applications. Philadelphia: SIAM.

Montgomery, D. C. (2005). Introduction to Statistical Quality Control, 5th
edition. New York: John Wiley and Sons.

Myers, R. H. and Montgomery, D. C. (2002). Response Surface
Methodology: Process and Product Optimization Using Designed
Experiments, 2nd edition. New York: John Wiley and Sons.
65
PowerPoint Slides
Available at:
www.statgraphics.com/documents.htm
66