#### Transcript The History of a Research Project

Preliminaries
• Thank you, John McCool
  – Good excuse to review, think about and give an overview of a 25-year research project
• Stat Day: a worthwhile conference
  – I have attended it frequently
• What can/should we do to advance statistics?

Last Year's Stat Day: Follow-up
• Drove up with Ron Snee and Steve Bailey
• Good discussion when driving home, with two major topics:
  – 1) Should we use and teach Definitive Screening Designs? Do they add much to Plackett-Burman Designs for screening?
  – 2) Status of “Living A Great Life: Mathematical Models for Living”, a book that I may write

Definitive Screening Designs
• Jones, B. and Nachtsheim, C. J. (2011) “A Class of Three-Level Designs for Definitive Screening in the Presence of Second-Order Effects” JQT 43, 1, pp 1-15
• Jones, B. and Nachtsheim, C. J. (2013) “Definitive Screening Designs with Added Two-Level Categorical Factors” JQT 45, 2, pp 121-129
• Xiao, L., Lin, D. K. J. and Bai, F. (2012) “Constructing Definitive Screening Designs Using Conference Matrices” JQT 44, 1, pp 2-8
Properties:
• Estimates linear and quadratic terms for each factor
• Design can be saturated, e.g. 6 factors in 13 runs
• Can be blocked
  – Add center points for each block
• Can study 2-level categorical factors
  – Replace 0s with ±1
  – Near orthogonal
• Should use Conference Matrix
  – Computer search with 10,000 starts will not find best design
• Even number of factors
  – Use dummy columns
• Recommendation: use instead of reflected Plackett-Burman Design
  – e.g. 24 (vs 25) points to study 11 (12) factors

Analysis Table for the 12-Run Plackett-Burman Design (L12)
Trial  Avg  X1  X2  X3  X4  X5  X6  X7  X8  X9  X10  X11  Y
  1     +   +   +   -   +   +   +   -   -   -   +    -
  2     +   +   -   +   +   +   -   -   -   +   -    +
  3     +   -   +   +   +   -   -   -   +   -   +    +
  4     +   +   +   +   -   -   -   +   -   +   +    -
  5     +   +   +   -   -   -   +   -   +   +   -    +
  6     +   +   -   -   -   +   -   +   +   -   +    +
  7     +   -   -   -   +   -   +   +   -   +   +    +
  8     +   -   -   +   -   +   +   -   +   +   +    -
  9     +   -   +   -   +   +   -   +   +   +   -    -
 10     +   +   -   +   +   -   +   +   +   -   -    -
 11     +   -   +   +   -   +   +   +   -   -   -    +
 12     +   -   -   -   -   -   -   -   -   -   -    -
Summary rows (blank in the slide): Sum +, Sum, Check, Differ., Effect

Living A Great Life: Mathematical Models for Living
• How to live the best life given your starting status
  – Life is a process, not a destination
  – Always have a Plan B
• Time is the universal limiting resource
• Living well is the best revenge
• There are good mathematical models for many aspects of life:
  – Buying a house
  – Finding a spouse/life partner

RSM & Optimum Design Theory
• Kiefer, J. C. (1959) “Optimum Experimental Designs” JRSS B 21, 272-319
• Lucas, J. M. (1976) “Which Response Surface Design is Best” Technometrics 18, 411-417
• Original paper was rejected by Technometrics
  – just a calculation exercise
• Closed RSM to Optimum Design Theory:
  – Composite and Box-Behnken Designs are so efficient that there is little possibility of better designs.
  – Bill Notz of OSU, a Kiefer student, confirmed this in his talk at the 2013 FTC
• Split-Plot Designs re-opened the door
• Possible Ph.D. topics here

Large Response Surface Designs
  – Crosier, R. B. (1991) “Some New Three-Level Response Surface Designs” Technical Report CRDEC-TR-308, US Army Chemical Research, Development and Engineering Center (available at Stat. Lab Server, Carnegie Mellon University). Up to 15 factors
• I recommended this paper; however, it was rejected by Technometrics
  – Mee, R. W. (2007) Optimal Three-Level Designs for Response Surfaces in Spherical Experimental Regions. (Up to 16 factors and references a 14-factor application)
  – This paper includes Crosier’s designs

Letter Results
• Crosier paper was rejected, decision was unchanged
  – No longer an Associate Editor of Technometrics
  – Would have become longest-term AE

The History of a Research Project
By James M. Lucas
2014 National Quality Month Statistical Symposium
Penn State Great Valley
October 17, 2014

Abstract
I describe how a simple observation led to a long-term research project studying experiments with Hard-to-Change and Easy-to-Change Factors.
The observation was that most experiments I had been involved in were not randomized even though the experimental runs were conducted using a random run order. Randomization requires that each experimental unit be treated the same way. So for a randomized experiment a random run order is necessary but not sufficient: all experimental factors must also be reset before each run for randomization to be achieved. I will give an overview of experiments with Hard-to-Change and Easy-to-Change Factors, and describe recent research and practical applications. I will discuss the contributions of my students and me, describe personal interactions that occurred with other researchers and journal editors during this project, and show how they advanced or slowed down scientific progress. I will discuss the problems and advantages of early disclosure of research results, including the disadvantage of preemption by others and the advantage of faster dissemination of scientific results. I will describe current research efforts and current applications in this very active area and tell what problems need to be solved to best help experimenters.

Audience Questions
• Why?
• To be sure I am telling you something useful
• To be on the same page that you are on

DOE Questions
• Are you involved with running experiments?
• How often do your experiments involve hard-to-change factors? (First asked at ‘92 JSM)
  – Seldom (<10% of the time)
  – Sometimes (<50% of the time)
  – Often (>50% of the time)
• Software developer surprised at large fraction of “Often” answers

DOE Questions
• How many of you are involved with running experiments?
• How many of you “randomize” to guard against trends or other unexpected events?
• If the same level of a factor such as temperature is required on successive runs, how many of you set that factor to a neutral level and reset it?

Additional Questions
• How many of you have conducted experiments on the same process on which you have implemented a Quality Control Procedure?
  – What did you find?

My Observations
• Comparing residual standard deviation from an experiment with residual standard deviation from an in-control process:
  – Experimental standard deviation is larger. 1.5X to 3X is common.
• Why? You are inducing variation by making changes in the process

Background
• Why I made the observation that most experiments I had been involved in were not randomized even though the experimental runs were conducted in a random run order
• I had the mixed model tools from the 1980s
  – DuPont ASG’s Mixed Model Program based on Fellner, William H. (1986) “Robust Estimates of Variance Components” Technometrics 28, pp 51-60
• Commercially available much later
  – SAS PROC MIXED in 1996
  – JMP6 in 2005

Introduction
• Experiments with H-T-C and/or E-T-C factors occur frequently
• Industrial experiments are seldom really randomized
  – They often shouldn’t be randomized
• Split-Plot experiments are often conducted (intentionally or inadvertently)
• Proper Split-Plot Blocking is often the answer
• Good Reference:
  – Jones, B. and Nachtsheim, C. J. (2009) “Split-Plot Designs: What, Why and How” JQT 41, 4, pp 340-361

Good Quote
• Cuthbert Daniel (1976) comes to split-plot experiments from an affordability perspective when he says that “In nearly all experimental situations some factors are hard to vary, whereas others, if not easy, are at least amenable to deliberate variation. Most industrial experiments are, then, split-plot in their design. The total number of runs is largely determined by the number of combinations of the hard-to-vary factors that can be afforded.”
• Compare this quote to J&N (2009):
• All industrial experiments are split-plot experiments

Framework (for much of my research in Design of Experiments)
• DuPont Applied Statistics Group’s Strategy of Experimentation (SOE) Course
  – 2½ day course
  – Taken by a large fraction of DuPont’s professionals
• Managerial familiarity with DOE
  – Sells statistics
• Designs Used:
  – Plackett-Burman Screening Designs
  – Two-level Factorials for process improvement
  – Box-Behnken and Composite Response Surface Designs for Process Optimization

My DOE Course
• Based on the DuPont Course
• Includes recommendations for Hard-to-Change and Easy-to-Change factors
• Screening will also include Definitive Screening Designs
• Also emphasizes “Bold Experimentation”

Statistical Systems: Introduction
• Statistical Systems
  – Take a Systems Approach to solve problems that have statistical aspects
  – Tie together statistical techniques and engineering/scientific knowledge
  – Many were developed following WW II
• SOE is a useful statistical system
• Good Example of Statistical Engineering

Experiments with H-T-C Factors: Approach
• Get Help: Four Ph.D. Dissertations:
• Huey Ju (U of D)
  – Expected Variance over all Randomizations
• Jeetu Ganju (U of D)
  – Bias of RNR Experiments
• Frank Anbari (Drexel)
  – Optimum blocking with One H-T-C factor
• Derek Webb (Montana State)
  – More than one H-T-C factor

Aside: My Decision for LAGL
• U of D Adjunct from 1970s
  – Organized team-taught QC Course
  – Previously directed 3 Ph.D. Dissertations
• Why I am not an academic
  – Vince Laricca told me: “I fought very hard to keep you from joining the department because you would have made the department too applied.”
• How did that work out for him?
• Other Academic Opportunities:
  – Drexel: did not like the commute
  – Penn State: move was too disruptive
• I love to consult

Once Bitten: Conducting the Wrong Experiment
• 3-Factor Box-Behnken design for Mylar
  – One factor easy-to-change
  – Factor had “wrong” sign
  – Factor was not significant
• Learning: Better Consulting
• Better Experiment
  – 3 x 3 Factorial
  – 3 settings of the easy-to-change factor
  – 27 versus 15 runs
  – Larger experiment can be cheaper

RSM: One Easy-to-Change Factor
• Design good experiment in H-T-C factors
  – Examine all levels of E-T-C Factor at each setting of H-T-C Factors (and often use more levels)
  – Gives a Classic Split-Plot Experiment
• Examples:
  – Pharmaceutical Pill experiments: compression
  – Plating Process: Current ratio
• E-T-C factor often has the largest effect

Example: Experiments Used to Develop the Ni/Pd Plating Process

One Hard-to-Change Factor
• Plan good experiment ignoring H-T-C aspect
• Very expensive to change factor
  – Completely restrict running H-T-C factor
  – Sometimes the only feasible experiment
• Visit each level of the H-T-C factor twice using Standard Blocking:
  – 4 Factor Composite (FCC) Show.
  – Box-Behnken often has 2 (or 3) Orthogonal Blocks
  – Completely restrict each block
  – Good Ph.D. Project

Good Standard Blocking
• Are more blocks needed?
• Potential Ph.D. Topic
• Fewer center points

Ju and Lucas (2002) History
• Originally submitted to Technometrics in 1994
  – Associate Editor: Dick DeVeaux
  – Reviewers: Ray Myers, Tom Lorenzen
• Revisions got no closer to publication
  – AE personal letter
• Advanced science by motivating papers published before it was:
  – Letsinger, J. D., Myers, R. H. and Lentner, M. (1996) “Response Surface Methods for Bi-Randomization Structures” JQT 28, pp 381-397
  – Ray said that he expected our paper to be published before his was
• Ju and Lucas Response Surface Paper was not published

Comments on Letsinger et al. (1996)
• High-impact paper
  – Recommended REML for analysis of unbalanced Split-Plot experiments
• Visited each level of the H-T-C factor only once
  – A definite weakness in the designs considered
  – Minimum-Aberration Designs
  – Split-Plots vs Generalized S-Ps

Randomized Not Reset (RNR) Experiments; Also Called Random Run Order (RRO)
• A large fraction (perhaps a large majority) of industrial experiments are Randomized Not Reset (RNR) experiments
• Properties of RNR experiments and a discussion of how experiments should be conducted:
  – “L^k Factorial Experiments with Hard-to-Change and Easy-to-Change Factors” Ju and Lucas, 2002, JQT 34, 411-421 [studies one H-T-C factor and uses Random Run Order (RRO) rather than RNR]
  – “Factorial Experiments when Factor Levels Are Not Necessarily Reset” Webb, Lucas and Borkowski, 2004, JQT 36, 1, pp 1-11

Not Resetting Factors
• Common practice
• Has had many successes!
  – Complete randomization may be impractical
• Not addressed by the classical definition
  – Gives a split-plot blocking structure with the blocks determined at random
• May be cost effective
• Causes biased hypothesis tests (Ganju and Lucas 1997, 1999, 2005)

Ganju and Lucas References
• Ganju, J., and Lucas, J. M. (1997). “Bias in Test Statistics when Restrictions on Randomization are Caused by Factors”. Communications in Statistics – Theory and Methods, 26, pp. 47-63.
• Ganju, J., and Lucas, J. M. (1999). “Detecting Randomization Restrictions Caused by Factors”. Journal of Statistical Planning and Inference, 81, pp. 129-140.
• Ganju, J., and Lucas, J. M. (2000). “Analysis of Unbalanced Data from an Experiment with Random Block Effects and Unequally Spaced Factor Levels”. The American Statistician, 54, 1, pp. 5-11.
• Ganju, J., and Lucas, J. M. (2005). “Randomized and Random Run Order Experiments”. Journal of Statistical Planning and Inference, 133, pp. 199-210.

An Essential Element of Randomized Not Reset (RNR) Experiments (DuPont Applied Statistics Group)

Randomization Questions?
• Should industrial experiments be randomized?
  – Historically most have not been randomized even if a random run order was used
• How large is the experimental error?
  – RSM tools assume that the error is small
• Can confirmatory runs be made?
• Experimental error is increased by setting factor levels
  – Cost-benefit analysis is needed
• Discussion paper, not a Ph.D. topic

Analysis Comments
• Consultant's Rule: 90% of the information is gained from the appropriate plot.
• Academics are heavily into the formal analysis
  – Even though it often adds little
• Wayne Nelson (2007) said “I have often designed and conducted experiments as split-plots but I have seldom bothered to analyze them as split-plots.”

Experiment With Random Block Effects
• 3 Temperatures x 3 Times x 12 Days
  – Factorial with 10 extra center points
  – Almost balanced
• At ’92 JSM in Toronto told Andre he had missed a significant Day x Time interaction in:
  – Khuri, A. I. (1992) “RS Models with Random Block Effects” Technometrics 34, 26-37
• He wrote:
  – Khuri, A. I. (1996) “RS Models with Mixed Effects” JQT 28, 177-186
• This used an RSM approach rather than model structure, so we played off of his analysis and wrote: Ganju, J. and Lucas, J. M. (2000) “Analysis of Unbalanced Data From an Experiment with Random Block Effects and Unequally Spaced Factor Levels” The American Statistician 54, 1, 5-11

The Essential Plot
• Main effects >90% of SS
• Shows significant high-order interactions

The Formal Analysis: Adds Little
Comments:
• Coding used: -1, 0, 1
• Not equally spaced levels
• Other analyses conducted

Current Research Project: Partial Confounding
Purpose: To describe partial confounding and to evaluate the advantages it provides when a blocked 2-level factorial is used to estimate a main effects plus two-factor interaction model.
Abstract: Traditionally blocked 2-level factorial experiments may confound some 2-factor interactions with blocks. Increased precision can be obtained by the use of partial confounding. Partial confounding uses fractional factorials and a different confounding relationship for each fraction to increase the precision of estimates. We describe partial confounding for 2-level factorials for traditional and split-plot blocking. We show the increases in precision obtained by partial confounding and show when it is useful. We give many examples of blocked and split-plot experiments where partial confounding provides increased precision. The precision of partially confounded designs is compared with the precision obtained using computer-generated designs.

A simple example illustrates partial confounding. Consider a 2^4 experiment that is run in 4 blocks. The traditional blocking procedure is to use blocking generators ABC and ABD so that the 2-factor interaction CD is also confounded with blocks. Partial confounding uses the two half fractions with defining contrasts I = ABC and I = -ABC, and uses a different blocking procedure in each half fraction. In one half fraction ABD and CD are confounded with blocks, and in the other half fraction ACD and BD are confounded with blocks. Increased overall precision is obtained because the confounded interaction terms are partially estimated within blocks.

New Observation: Improved Blocking for Factorial Designs
• Why Block?
  – Increase precision of the experiment
  – Reduce bias
  – Better answers to the questions of interest
• Consider an experiment where 4 runs can conveniently be done in a shift. With block size 4, the shift-to-shift variation is placed in blocks.
• How should we do the blocking?
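
The two blocking schemes in the simple example above can be checked with a short script. This is only an illustrative sketch, not the author's software: the helper names (`word`, `assign_blocks`, `blocks_where_estimable`) are hypothetical, and the choice of which half fraction is blocked on ABD versus ACD is an assumption made for the illustration. It builds the 2^4 factorial in coded units, assigns blocks by the traditional generators (ABC, ABD) and by a partial-confounding rule, and counts in how many blocks each interaction contrast takes both signs (and so can be estimated within that block).

```python
from itertools import product

# Full 2^4 factorial in coded units (-1/+1), standard order (A changes fastest).
runs = [dict(zip("ABCD", (a, b, c, d))) for d, c, b, a in product((-1, 1), repeat=4)]

def word(run, letters):
    """Sign of an interaction contrast such as 'ABC' for a single run."""
    sign = 1
    for factor in letters:
        sign *= run[factor]
    return sign

def assign_blocks(runs, rule):
    """Group run indices by the block label that `rule` returns for each run."""
    blocks = {}
    for i, run in enumerate(runs):
        blocks.setdefault(rule(run), []).append(i)
    return blocks

def traditional(run):
    # Blocking generators ABC and ABD; their product CD is confounded as well.
    return (word(run, "ABC"), word(run, "ABD"))

def partial(run):
    # Assumed rule for this sketch: block the I = -ABC half on ABD
    # and the I = ABC half on ACD.
    half = word(run, "ABC")
    return (half, word(run, "ABD" if half == -1 else "ACD"))

def blocks_where_estimable(runs, blocks, letters):
    """Number of blocks in which the contrast takes both signs."""
    return sum(len({word(runs[i], letters) for i in idx}) == 2
               for idx in blocks.values())

trad = assign_blocks(runs, traditional)
part = assign_blocks(runs, partial)

for name, blocks in (("traditional", trad), ("partial", part)):
    counts = {w: blocks_where_estimable(runs, blocks, w) for w in ("CD", "BD", "A")}
    print(name, counts)
```

Under the traditional scheme CD is constant inside every block, so it carries block-to-block variation; under the partial scheme CD and BD each vary inside two of the four blocks, which is the sense in which the confounded interactions are partially estimated within blocks.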

2^4 in Four Blocks: Notes
• Confounded Blocking Relationship:
• I = ABC = ABD = CD
  – Two-factor interaction is also confounded
• Model of interest: 4 main effects plus 6 two-factor interactions
• Conducting the experiment:
  – Randomize block order
  – Randomize run order in each block
• Can we do better?
  – Computer-generated design
  – Partially confounded design

Confounded Blocking: 2^4 Factorial in Four Blocks
Obs.  A  B  C  D  Blk
  1   -  -  -  -   1
  2   +  -  -  -   2
  3   -  +  -  -   2
  4   +  +  -  -   1
  5   -  -  +  -   3
  6   +  -  +  -   4
  7   -  +  +  -   4
  8   +  +  +  -   3
  9   -  -  -  +   4
 10   +  -  -  +   3
 11   -  +  -  +   3
 12   +  +  -  +   4
 13   -  -  +  +   2
 14   +  -  +  +   1
 15   -  +  +  +   1
 16   +  +  +  +   2

Parameter Variances
• V(β) = σ_b^2/4 + σ_e^2/16
  – Constant and CD terms
• V(β) = σ_e^2/16
  – All other model terms
• Design is orthogonal
• Maximum variance of prediction is the sum of the parameter variances

Properties of Confounded 2^4 Design
[Figure: Prediction Variance Profile]
Average Variance = 0.465278
Maximum Variance = (2/4)σ_b^2 + (11/16)σ_e^2 = 0.5 + 0.6875 = 1.1875 when σ_b = σ_e = 1

Globally Optimum Design (a design that may be unachievable)
• Power-Of-Orthogonality (POO) Theorem
  – Orthogonal with maximum diagonals of X'V^-1 X
  – V-, G-, A- & D-Optimal
Optimal Max. Var. = (1/4)σ_b^2 + (11/16)σ_e^2 = 0.25 + 0.6875 = 0.9375 when σ_b = σ_e = 1
Corollary: 2^(k-p) is globally optimum when no model terms are confounded with blocks

Computer Generated Design: JMP Custom Design with 10,000 Starts
Run  Random Block  X1  X2  X3  X4  Y
  1       1        -1  -1   1   1  .
  2       1         1  -1  -1  -1  .
  3       1        -1   1  -1  -1  .
  4       1         1   1  -1   1  .
  5       2        -1   1   1   1  .
  6       2        -1  -1  -1   1  .
  7       2        -1  -1   1  -1  .
  8       2         1  -1   1   1  .
  9       3         1   1  -1  -1  .
 10       3         1   1   1   1  .
 11       3         1  -1   1  -1  .
 12       3        -1   1   1  -1  .
 13       4         1  -1  -1   1  .
 14       4        -1  -1  -1  -1  .
 15       4        -1   1  -1   1  .
 16       4         1   1   1  -1  .
Design Evaluation
[Figure: Prediction Variance Profile]
Average Variance = 0.454653

Partially Confounded Blocking: 2^4 Factorial in Four Blocks
Obs.  A  B  C  D  Blk
  1   -  -  -  -   1
  2   +  -  -  -   2
  3   -  +  -  -   3 (was 2)
  4   +  +  -  -   1
  5   -  -  +  -   2 (was 3)
  6   +  -  +  -   4
  7   -  +  +  -   4
  8   +  +  +  -   3
  9   -  -  -  +   4
 10   +  -  -  +   3
 11   -  +  -  +   2 (was 3)
 12   +  +  -  +   4
 13   -  -  +  +   3 (was 2)
 14   +  -  +  +   1
 15   -  +  +  +   1
 16   +  +  +  +   2

Partial Confounding Relationship
• I = -ABC half fraction: block on ABD
  – So Blocks 1 and 4 are exactly the same as for confounded blocking
• I = ABC half fraction: block on ACD
  – So half the items in Blocks 2 and 3 will change signs
Average Variance = 0.446759
[Figure: Prediction Variance Profile]

Illegitimi non carborundum
Questions? Comments?

Partial Confounding Discussion
• Should a computer program beat a Grand Master chess player?
  – Yes
  – Chess is a deterministic game
• Should a computer program be able to beat an experienced experimental designer?
  – No
  – Science is open-ended
• Opportunity for research where a computer-generated design is current best practice
  – Extend to more factors
  – Workable Ph.D. Dissertation

Classical Definition: A Completely Randomized Design
• “Completely randomized designs are designs in which the assignment of factor-level combinations to a test run sequence or to experimental units (physical entities on which measurements are taken) is made by a random process where all assignments are equally likely” Gunst (QP, Feb. 2000)

The Classical Definition is Inadequate
• It does not address resetting, so it does not address how industrial and scientific experiments are conducted
• It does not address the inherent split-plot aspects of experiments using equipment
  – This affects the desired inferences
  – New edition of MGH changes definition
Letter to the editor on “Randomization is the Key to Experimental Design Structure” by R. F. Gunst, Quality Progress (2000), May, 14.

Operational Definition: A Completely Randomized Design
• Observation = Model + Error
• A completely randomized design is achieved by using a process that makes the errors independent
  – A random order is necessary but not sufficient to achieve a CRD
  – Do what is needed so that each experimental unit is treated in the same way
  – Consistent with Fisher

Split-Plot Experiments
• Main effects are (partially or fully) confounded with blocks (Cochran and Cox 1957)
• Others use a less general definition

Purpose of a Ph.D. Advisor
• Provide a workable topic that can be completed within a year

Types of Factors
• Require resetting for randomization
  – Hard-to-Change (HTC): Temperature
  – Easy-to-Change (ETC): Current density
• Not requiring resetting
  – Surfactant type
• Determined by experimental situation