STATGRAPHICS Centurion XV

Statistical Tools for
Multivariate Six Sigma
Dr. Neil W. Polhemus
CTO & Director of Development
StatPoint, Inc.
Revised talk:
www.statgraphics.com\documents.htm
1
The Challenge
The quality of an item or service usually depends on
more than one characteristic.
When the characteristics are not independent,
considering each characteristic separately can give a
misleading estimate of overall performance.
2
The Solution
Proper analysis of data from such processes
requires the use of multivariate statistical
techniques.
3
Important Tools

Statistical Process Control
 Multivariate capability analysis
 Multivariate control charts

Statistical Model Building*
 Data Mining - dimensionality reduction
 DOE - multivariate optimization
* Regression and classification.
4
Example #1
Textile fiber
Characteristic #1: tensile strength (115.0 ± 1.0)
Characteristic #2: diameter (1.05 ± 0.01)
5
Individuals Charts
X Chart for strength: UCL = 115.69, CTR = 114.98, LCL = 114.27
X Chart for diameter: UCL = 1.06, CTR = 1.05, LCL = 1.04
[Figures: individuals (X) charts of strength and diameter plotted against observation number, 0 to 100]
6
Capability Analysis (each separately)
Process Capability (normal distribution fitted to each characteristic)

             strength     diameter
LSL          114.0        1.04
Nominal      115.0        1.05
USL          116.0        1.06
Mean         114.978      1.04991
Std. Dev.    0.238937     0.0024479
Cp           1.41         1.41
Pp           1.40         1.36
Cpk          1.38         1.39
Ppk          1.36         1.35
K            -0.02        -0.01
DPM          30.76        44.59

[Figures: histograms of strength and diameter with fitted normal curves and specification limits]
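As a rough check on the univariate figures, the long-term indices and the estimated DPM can be recomputed from the reported means and standard deviations. The sketch below assumes a normal distribution and the usual definitions Pp = (USL - LSL)/(6s) and Ppk = min(USL - mean, mean - LSL)/(3s); the function names are illustrative only. Cp and Cpk are not recomputed because they need a short-term sigma that the slide does not show.

```python
import math

def upper_tail(z):
    """P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def capability(mean, sd, lsl, usl):
    """Return (Pp, Ppk, DPM) for a normal process with the given mean and sd."""
    pp = (usl - lsl) / (6.0 * sd)
    ppk = min(usl - mean, mean - lsl) / (3.0 * sd)
    # Fraction beyond either specification limit, scaled to defects per million.
    p_out = upper_tail((usl - mean) / sd) + upper_tail((mean - lsl) / sd)
    return pp, ppk, 1e6 * p_out

strength = capability(114.978, 0.238937, 114.0, 116.0)
diameter = capability(1.04991, 0.0024479, 1.04, 1.06)
print("strength: Pp=%.2f Ppk=%.2f DPM=%.2f" % strength)
print("diameter: Pp=%.2f Ppk=%.2f DPM=%.2f" % diameter)
```

Because the inputs are rounded slide values, the results should agree with the reported indices only to about the displayed precision.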
7
Scatterplot
[Figure: scatterplot of diameter vs. strength]
Correlation = 0.89
8
Multivariate Normal Distribution
[Figure: bivariate normal density surface over strength (114 to 116) and diameter (1.04 to 1.06)]
9
Control Ellipse
[Figure: control ellipse for diameter vs. strength]
10
Multivariate Capability
Determines joint probability of being within
the specification limits on all characteristics.
Variable     Observed Beyond Spec.   Estimated DPM
strength     0.0%                    30.7572
diameter     0.0%                    44.5939
Joint        0.0%                    70.4091

[Figure: multivariate capability plot of diameter vs. strength, DPM = 70.4091]
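The joint out-of-spec probability for two correlated normal characteristics can be approximated numerically. The following is a minimal sketch, not the method used by the software: it assumes a bivariate normal distribution, uses the rounded means, standard deviations and correlation shown on the earlier slides, and integrates the conditional distribution of the second variable with Simpson's rule. The function name `joint_dpm` is hypothetical.

```python
import math

def upper_tail(z):
    """P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def joint_dpm(m1, s1, l1, u1, m2, s2, l2, u2, rho, steps=2000):
    """DPM for 'either variable out of spec' under a bivariate normal model."""
    # Part 1: variable 1 is out of spec (variable 2 can be anything).
    p_out = upper_tail((u1 - m1) / s1) + upper_tail((m1 - l1) / s1)
    # Part 2: variable 1 in spec, variable 2 out of spec, using the
    # conditional normal distribution of variable 2 given variable 1.
    s2c = s2 * math.sqrt(1.0 - rho * rho)

    def integrand(x):
        z = (x - m1) / s1
        m2c = m2 + rho * s2 * z
        y_out = upper_tail((u2 - m2c) / s2c) + upper_tail((m2c - l2) / s2c)
        return normal_pdf(z) / s1 * y_out

    # Simpson's rule over [l1, u1]; steps must be even.
    h = (u1 - l1) / steps
    total = integrand(l1) + integrand(u1)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(l1 + k * h)
    p_out += total * h / 3.0
    return 1e6 * p_out

dpm = joint_dpm(114.978, 0.238937, 114.0, 116.0,
                1.04991, 0.0024479, 1.04, 1.06, rho=0.89)
print("joint DPM (approx.): %.1f" % dpm)
```

Since the correlation is only given to two decimals, the result should not be expected to reproduce the reported joint DPM exactly. It must, however, lie between the larger univariate DPM and the sum of the two.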
11
Mult. Capability Indices
Defined to give the
same DPM as in the
univariate case.
Capability Indices
Index    Estimate
MCP      1.27
MCR      78.81
DPM      70.4091
Z        3.81
SQL      5.31
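The relationship among these indices can be illustrated by working backwards from the joint DPM. The definitions below are assumptions that happen to be consistent with the table values, not formulas stated on the slide: Z is taken as the one-sided standard normal quantile matching the DPM, MCP = Z/3, MCR = 100/MCP, and SQL = Z + 1.5.

```python
from statistics import NormalDist

def indices_from_dpm(dpm):
    """Derive Z, MCP, MCR and SQL from a DPM value (assumed definitions)."""
    z = NormalDist().inv_cdf(1.0 - dpm / 1e6)  # one-sided normal quantile
    mcp = z / 3.0                               # capability index equivalent
    mcr = 100.0 / mcp                           # capability ratio, in percent
    sql = z + 1.5                               # sigma quality level with 1.5-sigma shift
    return {"Z": z, "MCP": mcp, "MCR": mcr, "SQL": sql}

result = indices_from_dpm(70.4091)
for name, value in result.items():
    print("%-4s %.2f" % (name, value))
```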
12
More than 2 Variables
[Figure: control ellipsoid for three variables X1, X2 and X3]
13
Hotelling’s T-Squared
Measures the distance of each point from the
centroid of the data (or the assumed
distribution).
$T_i^2 = (x_i - \bar{x})' S^{-1} (x_i - \bar{x})$
14
T-Squared Chart
[Figure: multivariate control chart of T-Squared vs. observation number, 0 to 120, with UCL = 11.25]
15
T-Squared Decomposition
Relative Contribution to T-Squared Signal
Observation   T-Squared   X1        X2         X3
17            13.8371     4.54101   0.340022   8.35196
The StatAdvisor
This table decomposes the out-of-control signals on the T-Squared chart. It calculates the relative importance of each variable to the signal by subtracting the value of T-Squared calculated without using that variable from the full T-Squared value. Examine each row closely to determine which variable (or variables) are likely causing that signal.
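The decomposition described by the StatAdvisor can be sketched in a few functions: compute T-squared with all variables, recompute it with one variable left out, and take the difference. This is an illustrative implementation under the assumption that the centroid and covariance come from the sample itself; the data and the helper names (`solve`, `t_squared`, `decompose`) are hypothetical.

```python
def mean_and_cov(rows):
    """Sample mean vector and covariance matrix (n - 1 denominator)."""
    n, p = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(p)]
    cov = [[sum((r[a] - mean[a]) * (r[b] - mean[b]) for r in rows) / (n - 1)
            for b in range(p)] for a in range(p)]
    return mean, cov

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x

def t_squared(point, mean, cov, keep):
    """T-squared of one point using only the variables listed in keep."""
    d = [point[j] - mean[j] for j in keep]
    s = [[cov[a][b] for b in keep] for a in keep]
    return sum(di * wi for di, wi in zip(d, solve(s, d)))

def decompose(point, mean, cov):
    """Full T-squared and the drop in T-squared when each variable is removed."""
    p = len(point)
    full = t_squared(point, mean, cov, list(range(p)))
    contrib = [full - t_squared(point, mean, cov, [k for k in range(p) if k != j])
               for j in range(p)]
    return full, contrib

# Hypothetical data for three variables X1, X2, X3; the last row is unusual in X3.
rows = [(10.0, 8.0, 9.0), (11.0, 9.5, 8.5), (9.0, 7.0, 10.0), (10.5, 8.5, 9.5),
        (9.5, 8.0, 8.0), (10.0, 9.0, 10.5), (12.0, 7.5, 14.0)]
mean, cov = mean_and_cov(rows)
full, contrib = decompose(rows[-1], mean, cov)
print("T-squared = %.3f, contributions = %s" % (full, [round(c, 3) for c in contrib]))
```

Note that the contributions need not add up to the full T-squared, as the slide's own table shows, because correlated variables share part of the signal.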
16
Statistical Model Building




Defining relationships (regression and ANOVA)
Classifying items
Detecting unusual events
Optimizing processes
When the response variables are correlated, it is
important to consider the responses together.
When the number of variables is large, the
dimensionality of the problem often makes it
difficult to determine the underlying relationships.
17
Example #2
18
Matrix Plot
[Figure: matrix plot of MPG City, MPG Highway, Engine Size, Horsepower, Length, Passengers, U Turn Space, Weight, Wheelbase and Width]
19
Multiple Regression
MPG City = 29.6315 + 0.28816*Engine Size - 0.00688362*Horsepower - 0.0365723*Length - 0.297446*Passengers - 0.139763*U Turn Space - 0.00984486*Weight + 0.280224*Wheelbase + 0.111526*Width

Parameter      Estimate       Standard Error   T Statistic   P-Value
CONSTANT       29.6315        12.9763          2.28351       0.0249
Engine Size    0.28816        0.722918         0.398607      0.6912
Horsepower     -0.00688362    0.0134153        -0.513119     0.6092
Length         -0.0365723     0.0447211        -0.817786     0.4158
Passengers     -0.297446      0.54754          -0.543241     0.5884
U Turn Space   -0.139763      0.17926          -0.779668     0.4378
Weight         -0.00984486    0.00192619       -5.11104      0.0000
Wheelbase      0.280224       0.124837         2.24472       0.0274
Width          0.111526       0.218893         0.5095        0.6117
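The T statistics in this table are simply each estimate divided by its standard error, which makes the table easy to sanity-check. The short sketch below recomputes them from the reported values and lists the terms whose reported P-value is below 0.05; the 0.05 threshold is an assumption chosen for illustration.

```python
# (parameter, estimate, standard error, reported P-value) from the table above
rows = [
    ("CONSTANT", 29.6315, 12.9763, 0.0249),
    ("Engine Size", 0.28816, 0.722918, 0.6912),
    ("Horsepower", -0.00688362, 0.0134153, 0.6092),
    ("Length", -0.0365723, 0.0447211, 0.4158),
    ("Passengers", -0.297446, 0.54754, 0.5884),
    ("U Turn Space", -0.139763, 0.17926, 0.4378),
    ("Weight", -0.00984486, 0.00192619, 0.0000),
    ("Wheelbase", 0.280224, 0.124837, 0.0274),
    ("Width", 0.111526, 0.218893, 0.6117),
]

# t = estimate / standard error
t_stats = {name: est / se for name, est, se, _ in rows}
significant = [name for name, _, _, p in rows if p < 0.05]

for name, t in t_stats.items():
    print("%-13s t = %8.4f" % (name, t))
print("P-value below 0.05:", significant)
```

Only the constant, Weight and Wheelbase pass the threshold, which is consistent with the first reduced model keeping just Weight and Wheelbase.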
20
Reduced Models
MPG City = 29.9911 - 0.0103886*Weight + 0.233751*Wheelbase (R² = 73.0%)
MPG City = 64.1402 - 0.054462*Horsepower - 1.56144*Passengers - 0.374767*Width (R² = 64.3%)
21
Dimensionality Reduction
Construction of linear combinations of the variables can often provide
important insights.

Principal components analysis (PCA) and principal components
regression (PCR): constructs linear combinations of the predictor
variables X that contain the greatest variance and then uses those to
predict the responses.

Partial least squares (PLS): finds components that maximize the
covariance between the X’s and the Y’s, so that they explain variation in
both simultaneously.
22
Principal Components Analysis
$C_1 = a_{11} X_1 + a_{12} X_2 + \dots + a_{1p} X_p$

Principal Components Analysis
Component Number   Eigenvalue   Percent of Variance   Cumulative Percentage
1                  5.8263       72.829                72.829
2                  1.09626      13.703                86.532
3                  0.339796     4.247                 90.779
4                  0.270321     3.379                 94.158
5                  0.179286     2.241                 96.400
6                  0.12342      1.543                 97.942
7                  0.109412     1.368                 99.310
8                  0.0552072    0.690                 100.000
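The percentage columns follow directly from the eigenvalues: each eigenvalue is divided by their total, which is approximately 8 here, as expected when eight standardized variables are analyzed. The sketch below reproduces the columns and applies the common "eigenvalue greater than 1" rule of thumb; that rule is an assumption for illustration, not necessarily the criterion used in the talk.

```python
eigenvalues = [5.8263, 1.09626, 0.339796, 0.270321,
               0.179286, 0.12342, 0.109412, 0.0552072]

total = sum(eigenvalues)
percent = [100.0 * ev / total for ev in eigenvalues]

# Running total of the percentages.
cumulative = []
running = 0.0
for pct in percent:
    running += pct
    cumulative.append(running)

# Rule of thumb: keep components whose eigenvalue exceeds 1.
retained = [i + 1 for i, ev in enumerate(eigenvalues) if ev > 1.0]

for i, (ev, pct, cum) in enumerate(zip(eigenvalues, percent, cumulative), start=1):
    print("%d  %9.6f  %7.3f  %8.3f" % (i, ev, pct, cum))
print("components with eigenvalue > 1:", retained)
```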
23
Scree Plot
[Figure: scree plot of eigenvalue vs. component number, 1 to 8]
24
Component Weights
C1 = 0.377*Engine Size + 0.292*Horsepower + 0.239*Passengers + 0.370*Length
+ 0.375*Wheelbase + 0.389*Width + 0.360*U Turn Space + 0.396*Weight
C2 = -0.205*Engine Size – 0.593*Horsepower + 0.731*Passengers + 0.043*Length
+ 0.260*Wheelbase – 0.042*Width – 0.026*U Turn Space – 0.030*Weight
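Two properties of these weight vectors can be checked numerically: each should have unit length, and the two should be orthogonal, up to the rounding of the printed weights. The sketch also computes component scores for one made-up observation, assuming the weights apply to standardized variables; the observation `z` is purely hypothetical.

```python
names = ["Engine Size", "Horsepower", "Passengers", "Length",
         "Wheelbase", "Width", "U Turn Space", "Weight"]
c1 = [0.377, 0.292, 0.239, 0.370, 0.375, 0.389, 0.360, 0.396]
c2 = [-0.205, -0.593, 0.731, 0.043, 0.260, -0.042, -0.026, -0.030]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def scores(z):
    """Component scores for one observation of standardized variables."""
    return dot(c1, z), dot(c2, z)

print("squared length of C1 weights: %.4f" % dot(c1, c1))
print("squared length of C2 weights: %.4f" % dot(c2, c2))
print("C1 . C2: %.4f" % dot(c1, c2))

# Hypothetical standardized car: large engine, powerful, few passengers.
z = [1.0, 1.5, -0.5, 0.2, 0.1, 0.3, 0.0, 0.8]
print("scores (C1, C2): %.3f, %.3f" % scores(z))
```

All weights in C1 are positive and of similar size, while C2 contrasts Passengers against Horsepower.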
25
Interpretation
[Figure: biplot of the eight predictor variables on Component 1 (horizontal) and Component 2 (vertical)]
26
PC Regression
[Figure: estimated response surface of MPG City as a function of C1 and C2]
27
Contour Plot
[Figure: contours of the estimated response surface for MPG City (10.0 to 45.0) over C1 and C2]
28
PLS Model Selection
[Figure: model comparison plot of percent variation explained in X and Y vs. number of components, 1 to 8]
29
PLS Coefficients
Selecting to extract 3 components:

Standardized Coefficients
               MPG City       MPG Highway
Constant       0.0            0.0
Engine Size    -0.0375246     0.0659656
Horsepower     -0.329264      -0.39319
Length         0.0802132      0.22243
Passengers     -0.178438      -0.331005
U Turn Space   -0.0484675     -0.00202398
Weight         -0.428481      -0.642872
Wheelbase      -0.0149712     0.0592427
Width          -0.0320902     0.0532588

Unstandardized Coefficients
               MPG City       MPG Highway
Constant       47.6716        35.6569
Engine Size    -0.203286      0.339043
Horsepower     -0.0353303     -0.0400268
Length         0.0308705      0.0812151
Passengers     -0.965169      -1.69862
U Turn Space   -0.0845038     -0.00334794
Weight         -0.00408204    -0.00581054
Wheelbase      -0.0123371     0.0463168
Width          -0.0477221     0.0751422
30
Interpretation
[Figure: plot of unsportiness vs. size, with points coded by Type: Compact, Large, Midsize, Small, Sporty, Van]
31
Neural Networks
32
Bayesian Classifier
Input layer (2 variables)
Pattern layer (93 cases)
Summation layer (6 neurons)
Output layer (6 groups)
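The four layers listed here match the layout of a probabilistic neural network: one pattern unit per training case, one summation unit per group, and an output that picks the group with the largest activation. The sketch below is a minimal version under stated assumptions, not the software's implementation: it uses a Gaussian kernel with spacing parameter sigma, averages the kernel values within each group, and treats all groups as equally likely beforehand. The training cases are invented.

```python
import math

def pnn_classify(point, cases, sigma):
    """Classify point from (features, label) cases with a Gaussian kernel."""
    sums, counts = {}, {}
    for features, label in cases:
        # Pattern layer: one kernel evaluation per training case.
        dist2 = sum((a - b) ** 2 for a, b in zip(point, features))
        activation = math.exp(-dist2 / (2.0 * sigma ** 2))
        # Summation layer: accumulate activations by group.
        sums[label] = sums.get(label, 0.0) + activation
        counts[label] = counts.get(label, 0) + 1
    # Output layer: group with the largest average activation.
    return max(sums, key=lambda label: sums[label] / counts[label])

# Hypothetical cases in the (C1, C2) plane.
cases = [((0.0, 0.0), "Small"), ((0.2, 0.1), "Small"),
         ((3.0, 3.0), "Large"), ((3.2, 2.9), "Large")]
print(pnn_classify((0.1, 0.1), cases, sigma=0.3))
print(pnn_classify((3.1, 3.0), cases, sigma=0.3))
```

Smaller values of sigma make the classification regions follow individual cases more closely; larger values smooth them.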
33
Classification
[Figure: classification plot of Type regions (Compact, Large, Midsize, Small, Sporty, Van) over C1 and C2, sigma = 0.3]
34
Design of Experiments
When more than one characteristic is important,
finding the optimal operating conditions
usually requires a tradeoff of one
characteristic for another.
One approach to finding a single solution is to
use desirability functions.
35
Example #3
Myers and Montgomery (2002) describe an experiment on a
chemical process (20-run central composite design):
Response variable        Goal
Conversion percentage    maximize
Thermal activity         maintain between 55 and 60

Input factor    Low          High
time            8 minutes    17 minutes
temperature     160 °C       210 °C
catalyst        1.5%         3.5%
36
Optimize Conversion
Goal: maximize conversion
Optimum value = 118.174
Factor         Low      High     Optimum
time           8.0      17.0     17.0
temperature    160.0    210.0    210.0
catalyst       1.5      3.5      3.48086

[Figure: contours of the estimated response surface for conversion (70.0 to 100.0) over time and catalyst, temperature = 210.0]
37
Optimize Activity
Goal: maintain activity at 57.5
Optimum value = 57.5
Factor         Low       High      Optimum
time           8.3       16.7      10.297
temperature    209.99    210.01    210.004
catalyst       1.66      3.35      2.31021

[Figure: contours of the estimated response surface for activity (55.0 to 60.0) over time and catalyst, temperature = 210.0]
38
Desirability Functions
Maximization
[Figure: desirability function for maximization; desirability d (0 to 1) vs. predicted response between Low and High, with curves for s = 0.2, 0.4, 1, 2 and 8]
39
Desirability Functions
Hit a target
[Figure: desirability function for hitting a target; desirability d (0 to 1) vs. predicted response between Low and High, peaking at Target, with curves labeled s = 0.1, 1, 5 and t = 0.1, 1, 5]
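The two curve families on these slides can be written as simple piecewise functions. The sketch below assumes the commonly used Derringer-Suich form, with s shaping the rising side and t the falling side; the slides do not give the formulas, so the function names, argument order and exact form are assumptions.

```python
def d_maximize(y, low, high, s=1.0):
    """Desirability when larger is better: 0 at or below low, 1 at or above high."""
    if y <= low:
        return 0.0
    if y >= high:
        return 1.0
    return ((y - low) / (high - low)) ** s

def d_target(y, low, target, high, s=1.0, t=1.0):
    """Desirability for hitting a target: 1 at target, 0 at or outside low/high."""
    if y <= low or y >= high:
        return 0.0
    if y <= target:
        return ((y - low) / (target - low)) ** s
    return ((high - y) / (high - target)) ** t

# Shape parameters below 1 reward partial progress; above 1 demand
# values close to the goal before desirability rises.
for s in (0.2, 1.0, 8.0):
    print("maximize, s=%.1f: d(50) = %.3f" % (s, d_maximize(50.0, 0.0, 100.0, s)))

# Activity goal from Example #3: keep between 55 and 60, aiming at 57.5.
print("target: d(58.75) = %.3f" % d_target(58.75, 55.0, 57.5, 60.0))
```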
40
Combined Desirability

$D(X) = \left( d_1^{I_1} d_2^{I_2} \cdots d_m^{I_m} \right)^{1 / \sum_{j=1}^{m} I_j}$

d_j = desirability of the j-th of m responses given the settings of the
experimental factors X; I_j = weight given to the j-th response.
D ranges from 0 (least desirable) to 1 (most desirable).
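The combined desirability is a weighted geometric mean of the individual desirabilities, so a single response with zero desirability forces the overall value to zero. A minimal sketch, with the weights passed as a list named `impact` (a name chosen here for illustration):

```python
import math

def combined_desirability(d, impact):
    """Weighted geometric mean of desirabilities d with weights impact."""
    if any(di == 0.0 for di in d):
        return 0.0  # one unacceptable response makes the whole setting unacceptable
    total = sum(impact)
    # Work on the log scale: exp(sum(I_j * log d_j) / sum(I_j)).
    return math.exp(sum(w * math.log(di) for di, w in zip(d, impact)) / total)

print(combined_desirability([1.0, 0.25], [1, 1]))   # equal weights
print(combined_desirability([0.9, 0.25], [3, 1]))   # first response weighted more
print(combined_desirability([0.0, 0.9], [1, 1]))    # zero dominates
```

Using a geometric rather than an arithmetic mean is what prevents a very good value on one response from compensating for an unacceptable value on another.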
41
Desirability Contours
Max D=0.959 at time=11.14, temperature=210.0, and catalyst = 2.20.
[Figure: contours of overall desirability (0.0 to 1.0) over time and catalyst, temperature = 210.0]
42
Desirability Surface
[Figure: estimated response surface of desirability over time and catalyst, temperature = 210.0]
43
References

Johnson, R.A. and Wichern, D.W. (2002). Applied Multivariate Statistical
Analysis. Upper Saddle River: Prentice Hall.

Mason, R.L. and Young, J.C. (2002). Multivariate Statistical Process Control with
Industrial Applications. Philadelphia: SIAM.

Montgomery, D. C. (2005). Introduction to Statistical Quality Control, 5th
edition. New York: John Wiley and Sons.

Myers, R. H. and Montgomery, D. C. (2002). Response Surface
Methodology: Process and Product Optimization Using Designed
Experiments, 2nd edition. New York: John Wiley and Sons.
44