Challenges in Modeling

COMPLEXITIES OF MODELS
• Large State Space (e.g. Bedrock, Wireless handoff)
  – Model construction problem
  – Model solution problem
• Model Stiffness: fast and slow rates acting together
  – Failure and Recovery/Repair (HSP Markov model in Bedrock)
  – Performance and failure (Wireless handoff)

COMPLEXITIES OF MODELS (Continued)
• Modeling Non-Exponential Distributions (e.g. N+1 problem)
• Believability/Understandability/Usability
• What about software?

Potential Solutions
• Largeness
  – Largeness Tolerance
  – Largeness Avoidance

LARGENESS TOLERANCE
• Automated Model Construction
  – Loops in the specification of CTMC (SHARPE)
  – Stochastic Petri nets (SPNP, SHARPE)
  – High level languages (SAVE, QNAP, ASSIST, SDM)
  – Fault-Tree + Recovery Info (HARP)
  – Object-Oriented Approaches (TANGRAM)

LARGENESS TOLERANCE (Continued)
• Efficient numerical solution techniques
  – Sparse Storage
  – Accurate and Efficient Solution Methods
We have generated and solved models with 1,000,000 states (this figure has gone up considerably recently)
  Steady-State: Near-Optimal SOR
  Transient: Modified Jensen's method

MODEL SPECIFICATION LANGUAGES
• Different languages can be used to specify a single model type: SAVE, QNAP, and SPNP all appear very different, but the underlying model type is Markov
• The same language can be used to specify different model types: the SPNP input language is used for Markovian SPN analytic-numeric solution or non-Markovian SPN simulation solution

MODEL SPECIFICATION LANGUAGES (Continued)
• Languages can be domain specific:
  – Reliability: HARP, SDM
  – Availability: SAVE
  – Performance: RESQ, QNAP
• Languages can be domain independent:
  – SHARPE, SPNP

LARGENESS AVOIDANCE
• Non-State-Space methods
  – Reliability block diagrams
  – Fault-trees
  – Product-Form Queuing Networks
• Approximate solutions
  – State Truncation: SAVE, SPNP (Kantz and Trivedi: PNPM91)

Case Study: JPL REE System
Availability Modeling in Spacecraft Architecture

LARGENESS AVOIDANCE (Cont.)
• Stochastic Petri Nets (State-space-based modeling)
• State truncation by introducing a guard function
  Guard g is defined as:
    if (mark("…_dn") >= K) return (0); else return (1);

SPN MODELING AVAILABILITY MEASURES

LARGENESS AVOIDANCE (Continued)
• Approximate solutions
  – Hierarchical Decomposition and Fixed-Point Iteration among submodels:
    • Heidelberger and Trivedi; IEEE-TC, 1983 (Queueing Models)
    • Ciardo and Trivedi; PNPM91 (SPN Models)
    • Tomek and Trivedi (Availability Models)
    • Lanus, Liang & Trivedi (Bedrock)
    • Wireless handoff work: Ma, Han & Trivedi

LARGENESS AVOIDANCE (Continued)
• Approximate solutions
  – Performability: Multiprocessor example
  – Fluid Approximation: Mitra; Kulkarni; Ciardo; Nicol, and Trivedi; FSPN

Difficulties in Modeling Using MRMs
• Stiffness: causes numerical difficulties in solution
  – Stiffness Tolerance: develop stiffness-tolerant numerical solution methods
  – Stiffness Avoidance: avoid generating stiff models through decomposition

Potential Solutions (Continued)
• Stiffness
  – Stiffness Tolerance
  – Stiffness Avoidance
• Modeling Non-Exponential Distributions
  – Stage-type expansion, MRGP, NHCTMC, DES

STIFFNESS TOLERANCE
• Automatic Detection of Stiffness (HARP)
• Special Stable ODE Solver
  Reibman and Trivedi (TR-BDF2), Computers and Operations Research, 1988.
  Malhotra and Trivedi (Padé, Implicit RK)

STIFFNESS TOLERANCE (Continued)
• Uniformization for Stiff Markov Chains
  Muppala and Trivedi
We can solve models with rate ratios of 10^8 or higher
Implemented in SHARPE & SPNP

STIFFNESS AVOIDANCE
• Model-level decomposition
  – Hierarchical Composition (SHARPE): composition of submodel solutions without generating a single one-level overall model (Bedrock example)
  – Fixed-Point Iteration (Wireless handoff example)

STIFFNESS AVOIDANCE (Continued)
• Importance Sampling (simulation)
  – Lewis, Goyal, Heidelberger, Shahabuddin, Geist, Nicola
  – Can also apply to analytic-numeric methods (Heidelberger, Muppala, and Trivedi; Performance 93)
• Importance Splitting (simulation)
  – Tuffin and Trivedi; Tools '00

Non-Exponential Behavior
• Non-state-space models: Fault Trees, Reliability Graphs, RBDs; no problem

Non-Exponential Behavior in State Space Models

NON-EXPONENTIAL DISTRIBUTIONS
• Phase-Type Expansions
  – N+1 example
• Non-Homogeneous Markov Chains
  CARE III, HARP
  Soft Rel model with imperfect repairs solved using SHARPE

NON-EXPONENTIAL DISTRIBUTIONS (Continued)
• Semi-Markov Chains
  – N+1 example
• Markov Regenerative Processes: Choi, Logothetis, Kulkarni, Trivedi
• DSPN and MRSPN: Choi, Kulkarni, Trivedi
• Discrete-Event Simulation
  Now in SPNP (FSPN and Non-Markovian SPN Simulation), RESQ, QNAP, Bones, SES workbench

CASE STUDY: AT&T
• GSHARPE:
  – A preprocessor to SHARPE developed at Bell Labs by a Duke student.
  – The user can specify Weibull failure times and lognormal and other repair time distributions.
  – GSHARPE fits these to phase-type distributions and generates a Markov model for processing by SHARPE

Potential Solutions (Continued)
• Believability/Understandability/Usability
  – GUI, many practical examples, short courses, tools, Boeing SDM project
• Incorporation in the design process
  – VHDL Availability Model
  – C Program Perf. Model
  – Ada Program SPN Perf. Model (SPC)
• Connection between measurements & models

BELIEVABILITY/UNDERSTANDABILITY
• Integration of Measurements and Models
  – Measurements Provide Parameters to Models
  – Models Provide Guidelines for Measurements
  – Models Validated Against Measurements
• Integration of Different Modeling Tools
  – Boeing SDM project

BELIEVABILITY/UNDERSTANDABILITY (Continued)
• Many Case Studies of Validations Needed
  – Vaxcluster Availability Model: Wein & Sathaye
  – Hsueh, Iyer and Trivedi; IEEE-TC, Apr. 1988
  – Lucent Validation of ESS; Veena Mendiratta
• Technology Transfer
  – Short courses
  – Development and Dissemination of Tools (SHARPE, SPNP)

BELIEVABILITY/UNDERSTANDABILITY (Continued)
• Application of the Techniques and Tools
  – Motorola
  – Cisco
  – 3Com
  – HP
  – Sun

CASE STUDY: BOEING
• An Integrated Reliability Environment
• A working prototype
• Developed a high-level modeling language (SDM)
• Designed and implemented an intelligent interpreter

CASE STUDY: BOEING (Continued)
• The interpreter determines which solution method is applicable
• The translator converts the SDM input file into an input file for any of the underlying engines
• Five different modeling engines are integrated:
  – CAFTA, SETS, EHARP, SHARPE and SPNP.

MODELING AND MEASUREMENTS: INTERFACES
• Measurements supply Input Parameters to Models (Model Calibration or Parameterization)
  Confidence intervals should be obtained
  Boeing, Draper, Union Switch projects
• Model Sensitivity Analysis can suggest which Parameters to Measure More Accurately:
  Blake, Reibman and Trivedi: SIGMETRICS 1988; Fricks and Trivedi: 1997

MODEL CALIBRATION
What is λ?
• Fault Model for Each Component
  – Design, Manufacturing: Heisenbugs, Bohrbugs
  – Operational: Permanent, Intermittent, Transient
  – Human
• Fault Arrival Processes (PP, Weibull, NHPP)
• Failure Rates (Sources: MIL-STD)

MODEL CALIBRATION (Continued)
What is c?
• Field Data
• Fault/Error Injection (FIAT, MESSALINE)
• Analytic Coverage Model
What is μ?
• Maintenance Model
  – Corrective: dispatch, travel, repair time, dead on arrival, imperfect repair
  – Preventive

MODEL CALIBRATION (Continued)
What is r?
• Binary: Up & Down
• Capacity-Oriented: Number of Operational Resources in Each State
• Performance-Oriented: Evaluate Performance at Each Degraded Level of System Configuration
  1. Measurements
  2. Simulation Model
  3. Analytic Model: SHARPE, SPNP

VALIDATION & VERIFICATION
– Validation: Does the conceptual model faithfully reflect the behavior of the system?
– Verification: Has the conceptual model been correctly implemented?

MODEL VALIDATION (Continued)
• Three-step process outlined by Naylor and Finger
  – Face validation: Discussion with the experts
  – Input-Output validation: Compare results obtained from the model with those from measurements
  – Validation of model assumptions: Either prove that the assumptions are correct or do statistical testing

MODEL ASSUMPTIONS/ERRORS
• Errors in Model Structure
  – Missing or Extra Arcs
  – Missing or Extra States
  – Use Face Validation to avoid these errors.
• Errors Due to Non-Independence
• Distributional Errors
• Parametric Errors

MODEL ASSUMPTIONS/ERRORS (Continued)
• Errors Due to Approximations
  – Decomposition/Aggregation/Iteration
  – State Truncation
• Numerical Solution Errors
  – Discretization Errors
  – Round-Off Errors

Model Verification
• Programming Errors
• Approximation errors: Tight bounds on errors due to approximations are desirable
• Numerical: Errors in numerical algorithms should be bounded

What about software?
• Testing phase
  – Software reliability estimation
    • Black-box based approach
    • Architecture-based approach
• Operational phase
  – Fault tolerance coverage (c in Markov model)
  – Countering software aging
    • Symptom-based fault management

Conclusions:
• Availability evaluation is very important in characterizing systems
• Evaluation can be performed through measurements, simulation, or analytical modeling
• Model verification and validation should form an integral part of the modeling process
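
Illustrative sketch (not from the original slides): the analytic-modeling route named in the conclusions can be shown with a very small Markov reward model that uses the calibration parameters discussed earlier (failure rate λ, coverage c, repair rate μ, reward rates r). The three-state duplex system, the parameter values, and the function names `duplex_generator`, `steady_state`, and `expected_reward` are all assumptions chosen for illustration; tools such as SHARPE or SPNP would normally build and solve such a model.

```python
# Hypothetical example: a two-unit (duplex) repairable system with imperfect coverage.
# States: index 0 = both units up, 1 = one unit up, 2 = system down.

def duplex_generator(lam, mu, c):
    """Return the CTMC generator matrix Q (each row sums to zero)."""
    return [
        [-2 * lam, 2 * lam * c, 2 * lam * (1 - c)],  # covered vs. uncovered failure
        [mu, -(mu + lam), lam],                      # repair, or second failure
        [0.0, mu, -mu],                              # repair from the down state
    ]

def steady_state(Q):
    """Solve pi * Q = 0 with sum(pi) = 1 by Gaussian elimination."""
    n = len(Q)
    # Transpose Q, then replace the last balance equation by the normalization.
    A = [[Q[j][i] for j in range(n)] for i in range(n)]
    A[-1] = [1.0] * n
    b = [0.0] * (n - 1) + [1.0]
    for col in range(n):
        # Partial pivoting keeps the elimination numerically stable.
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for k in range(col, n):
                A[r][k] -= f * A[col][k]
            b[r] -= f * b[col]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(A[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (b[i] - s) / A[i][i]
    return x

def expected_reward(pi, r):
    """Steady-state expected reward rate: sum over states of pi_i * r_i."""
    return sum(p * ri for p, ri in zip(pi, r))

# Assumed parameter values (per hour), for illustration only.
pi = steady_state(duplex_generator(lam=0.001, mu=0.1, c=0.95))
print("availability (binary reward):", expected_reward(pi, [1, 1, 0]))
print("expected operational units (capacity reward):", expected_reward(pi, [2, 1, 0]))
```

Changing only the reward vector switches between the binary and capacity-oriented measures, which is why the sketch keeps r separate from the generator matrix.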