Common Parametric Distributions

Gentle Introduction to Modeling Uncertainty Series #6

Lonnie Chrisman, Ph.D.

Lumina Decision Systems
Analytica Users Group Webinar
10 June 2010
Copyright © 2010 Lumina Decision Systems, Inc.

Course Syllabus

Over the coming weeks:
• What is uncertainty? Probability.
• Probability Distributions
• Monte Carlo Sampling
• Measures of Risk and Utility
• Risk analysis for portfolios
• Common parametric distributions
• Assessment of Uncertainty
• Hypothesis testing

Today’s Topics

• Continuous vs. discrete.
• Non-parametric distributions.
• A handful of the most common distributions.
• The cases where each is useful.
• How to encode each in Analytica.

Lots of model building exercises…

Outline (Order of exercises)

• “Pre-test” questions
• Discrete non-parametric: Monty Hall game
• Continuous non-parametric: Data resampling
• Event counts
• Durations between events
• Uncertain percentages
• Bounded
• Bell shapes

Distribution Types

• Discrete
• Continuous

Custom (Non-parametric) Discrete

ChanceDist(P,A,I)

Parameters:
• P = Array of probabilities. Sum(P,I)=1
• A = Array of possible outcomes
• I = Index shared by P and A

Note: When A is the index, you can use: ChanceDist(P,A)

ChanceDist Exercise

• An event occurs on one of the 7 days of the week.
  Each weekday → 8%
  Each day of weekend → 30%
• Create a chance variable named Day_of_event with this distribution.
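As a cross-check outside Analytica, the same discrete distribution can be sketched in Python. This is only an illustration: the day labels, the names `DAYS`, `PROBS` and `sample_day_of_event`, and the use of `random.choices` are assumptions of this sketch, not part of the webinar.

```python
import random
from collections import Counter

# Outcomes (the index) and their probabilities, in the spirit of ChanceDist(P, A).
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
PROBS = [0.08] * 5 + [0.30] * 2   # weekdays 8% each, weekend days 30% each

# The probabilities must sum to 1, mirroring Sum(P, I) = 1.
assert abs(sum(PROBS) - 1.0) < 1e-12

def sample_day_of_event(n, seed=0):
    """Draw n Monte Carlo samples of the day on which the event occurs."""
    rng = random.Random(seed)
    return rng.choices(DAYS, weights=PROBS, k=n)

samples = sample_day_of_event(100_000)
freq = {day: count / len(samples) for day, count in Counter(samples).items()}
weekend_share = freq["Sat"] + freq["Sun"]   # sample estimate of P(weekend)
```

The sampled frequencies should land near the assessed 8% and 30% values, with the two weekend days together carrying about 60% of the probability.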

ChanceDist Exercise 2: Monty Hall Game

You are a contestant on a game show. A prize is hidden behind 1 of three curtains. You select curtain 1.

“Before opening your curtain,” says the host, “let me reveal one of the unselected curtains that does not contain the prize… Curtain 2 is empty! Would you now like to change curtains?”

Task: Build an Analytica model that computes the probability of winning the prize if you do or do not change curtains.

Monty Hall Steps

1. Chance: Start with the uncertain real location of the prize.
2. Model how the host decides which curtain to show you.
   • He will never reveal the prize or your selected curtain. Otherwise he picks randomly.
3. Decision: Change or not?
4. Objective: Probability that your final selection is the one with the prize.
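The four steps can also be followed by exact enumeration rather than sampling. The sketch below is a Python stand-in for the Analytica model, not the webinar's solution; the names `host_reveal_probs`, `posterior`, `p_win_stay` and `p_win_switch` are hypothetical, and it conditions on the scenario in the exercise (you picked curtain 1, the host opened curtain 2).

```python
from fractions import Fraction

CURTAINS = (1, 2, 3)
SELECTED = 1   # the contestant's initial pick
REVEALED = 2   # the curtain the host opens

def host_reveal_probs(prize):
    """P(host opens each curtain | prize location): never the prize, never the pick."""
    allowed = [c for c in CURTAINS if c != prize and c != SELECTED]
    return {c: Fraction(1, len(allowed)) for c in allowed}

# Step 1 and 2: joint probability of (prize location, host opens REVEALED),
# starting from a uniform prior on the prize location.
joint = {
    prize: Fraction(1, 3) * host_reveal_probs(prize).get(REVEALED, Fraction(0))
    for prize in CURTAINS
}
evidence = sum(joint.values())
posterior = {prize: joint[prize] / evidence for prize in CURTAINS}

# Step 3 and 4: probability of winning under each decision.
p_win_stay = posterior[SELECTED]
switch_to = next(c for c in CURTAINS if c not in (SELECTED, REVEALED))
p_win_switch = posterior[switch_to]
```

Comparing `p_win_stay` with `p_win_switch` answers the exercise question; the host's rule in step 2 is what makes the two differ.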

Custom (non-parametric) Continuous Distributions

CumDist(p,x,i)

Parameters:
• p : Probabilities that value <= x
• x : Ascending set of values
• i : Index shared by p and x

When x is itself the index: CumDist(p,x,x) or just CumDist(p,x)

CumDist Exercise

• A geologist estimates the capacity of a recently discovered oil deposit. He expresses his assessments as follows:
  100% that 100K < capacity < 1B barrels
  90% that 5M < capacity < 500M barrels
  75% that 50M < capacity < 100M barrels
  Median estimate: 75M barrels
• Use CumDist to encode these estimates as a distribution for capacity.
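One way to turn these assessments into cumulative points is sketched below in Python. Two things here are assumptions of the sketch and not stated in the exercise: the probability outside each interval is split equally between the lower and upper tails, and the CDF is interpolated linearly between points (whether that matches CumDist's default should be checked in the Analytica documentation). The names `X`, `P`, `quantile` and `sample_capacity` are hypothetical.

```python
import random
from bisect import bisect_left

# Cumulative probabilities P(capacity <= x). Splitting the leftover probability
# equally between the two tails is an assumption of this sketch.
X = [100e3, 5e6, 50e6, 75e6, 100e6, 500e6, 1e9]   # barrels, ascending
P = [0.0, 0.05, 0.125, 0.5, 0.875, 0.95, 1.0]

def quantile(u):
    """Invert the piecewise-linear CDF through the points (X, P) at probability u."""
    if u <= P[0]:
        return X[0]
    if u >= P[-1]:
        return X[-1]
    j = bisect_left(P, u)   # first j with P[j] >= u, so P[j-1] < u <= P[j]
    frac = (u - P[j - 1]) / (P[j] - P[j - 1])
    return X[j - 1] + frac * (X[j] - X[j - 1])

def sample_capacity(n, seed=0):
    """Draw n capacities by inverse-transform sampling."""
    rng = random.Random(seed)
    return [quantile(rng.random()) for _ in range(n)]

capacity = sample_capacity(50_000)
median_estimate = quantile(0.5)
share_50M_to_100M = sum(50e6 < c < 100e6 for c in capacity) / len(capacity)
```

In Analytica the same two arrays would be the p and x parameters of CumDist; the sampled share between 50M and 100M is a quick check that the encoding reproduces the 75% assessment.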

Homework challenge: Using CumDist to Resample

• You have 143 measured values of a quantity. Define an uncertain variable with the same implied distribution (even though your sample size doesn’t match).
• Here is your synthetic data:
  Index Data_i := 1..143
  Variable Data := ArcCos(Random(over:Data_i))
• Steps (the parameters to CumDist):
  Sort Data in ascending order: Sort(Data,Data_i)
  Compute p: equal probability steps along Data_i, starting at 0 and ending at 1.
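The two steps (sort the data, then pair it with equally spaced probabilities from 0 to 1) can be sketched in Python as follows. This is an illustration of the idea, not the Analytica solution: `math.acos` returns radians, which may differ from the units of Analytica's ArcCos, and the names `x`, `p` and `resample` are hypothetical. Linear interpolation between sorted points is an assumption of the sketch.

```python
import math
import random

rng = random.Random(0)

# Synthetic data: 143 values of arccos(U), with U uniform on [0, 1).
# math.acos returns radians; the units of Analytica's ArcCos may differ.
N = 143
data = [math.acos(rng.random()) for _ in range(N)]

# Step 1: sort the data in ascending order.
x = sorted(data)
# Step 2: equal probability steps from 0 to 1 along the sorted data.
p = [i / (N - 1) for i in range(N)]

def resample(n):
    """Draw n values by inverting the piecewise-linear empirical CDF (x, p)."""
    out = []
    for _ in range(n):
        u = rng.random() * (N - 1)      # position along the probability steps
        j = min(int(u), N - 2)          # segment index, 0 .. N-2
        out.append(x[j] + (u - j) * (x[j + 1] - x[j]))
    return out

new_sample = resample(1000)   # sample size differs from the 143 data points
```

The arrays `p` and `x` play the role of the first two CumDist parameters; the resampled values stay within the range of the measured data.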

The Most Commonly used Parametric Distributions

• Discrete: Bernoulli, Poisson, Binomial, Uniform integer
• Continuous: Normal, LogNormal, Uniform, Triangular, Exponential, Gamma, Beta

Why choose one distribution over another?

• Discrete or continuous?
• Bounded quantity or infinite tails?

                      Continuous                       Discrete
Bounded both sides    Uniform, Triangular, Beta        Binomial, Uniform int
One-sided tail        LogNormal, Gamma, Exponential    Poisson
Two tailed            Normal, StudentT, Logistic

Why choose one distribution over another?

• Discrete or continuous?
• Bounded quantity or infinite tails?
• Convenience: Some distributions are more “natural” for certain types of quantities.
• Ease of assessment.
• Analytical properties: for mathematicians, not model builders.

Other than broad properties, the sensitivity of computed results to the specific choice of distributions for assessments is usually extremely low.

Distributions for Integer-valued Counts #1

Poisson(mean)

Count of events per unit time.
• # Earthquakes >6.0 in a given year
• # Vehicles that pass in a given hour
• # Alarms in a given month
• # Pelicans rescued from oil spill today

When the occurrence of each event is independent of the time of occurrence of other events, the # of occurrences in any given window is Poisson distributed.

Distributions for Integer-valued Counts #2

Binomial(n,p)

Number of times an event occurs in n repeated independent trials, each having probability p.
• # oil well blowouts in the next 100 deep-water wells drilled.
• # people that visit a store in its first month out of the 10,000 residents of the town.
• # of positive test results in 50 samples tested.

Exercise with event counts

In a certain region, malaria infections occur at an average rate of 500 infections per year. 10% of infections are fatal.

Build an Analytica model to compute the distribution for the number of people expected to die from a malaria infection in a given year.
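A minimal Monte Carlo sketch of this exercise in Python, assuming the natural reading that the yearly infection count is Poisson with mean 500 and each infection is independently fatal with probability 0.1. The helper names `poisson`, `binomial` and `deaths_in_year` are hypothetical; the standard library has no Poisson sampler, so one is built here from exponential inter-arrival gaps.

```python
import random

rng = random.Random(0)

RATE = 500        # mean infections per year
P_FATAL = 0.10    # probability that an infection is fatal

def poisson(mean):
    """Sample a Poisson count by counting exponential inter-arrival gaps in one unit of time."""
    count, t = 0, rng.expovariate(mean)
    while t < 1.0:
        count += 1
        t += rng.expovariate(mean)
    return count

def binomial(n, p):
    """Sample the number of successes in n independent trials, each with probability p."""
    return sum(rng.random() < p for _ in range(n))

def deaths_in_year():
    infections = poisson(RATE)             # analogous to Poisson(500)
    return binomial(infections, P_FATAL)   # analogous to Binomial(infections, 0.1)

deaths = [deaths_in_year() for _ in range(2000)]
mean_deaths = sum(deaths) / len(deaths)
```

In Analytica the chain would presumably be a Poisson chance variable feeding the n parameter of a Binomial. The sample mean should sit near 500 × 0.1 = 50, since thinning a Poisson count in this way gives another Poisson count.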


Duration between events

• Exponential(rate)
  When events occur independently at a given rate, this gives the time between successive events.
  Note: rate = 1 / meanArrivalTime
• Gamma(a,1/rate)
  Time for a independent events to occur, each having a mean arrival time of 1/rate.

Arrival times exercise

• Cars arrive at a stoplight at a rate of 5 per minute. There is room for 10 cars before nearby freeway traffic is blocked.
• Graph the CDF for the amount of time until cars begin to block freeway traffic when the light is red.
• If the light stays red for 90 seconds, what fraction of red light-change cycles will result in blocked traffic?
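For independent arrivals, the time of the k-th arrival follows Gamma(k, 1/rate), and its CDF has a closed form that can be checked in a few lines of Python. The exercise leaves open whether blocking starts with the 10th or the 11th car, so the sketch computes both readings and labels each as an assumption; it also assumes no cars are queued when the light turns red. The name `erlang_cdf` is hypothetical.

```python
import math

RATE = 5.0   # cars per minute

def erlang_cdf(t, k, rate=RATE):
    """P(time of the k-th arrival <= t) for Poisson arrivals, i.e. the CDF of Gamma(k, 1/rate)."""
    if t <= 0:
        return 0.0
    lam = rate * t
    # The k-th arrival is after t exactly when fewer than k cars arrive by time t.
    p_fewer_than_k = sum(math.exp(-lam) * lam**j / math.factorial(j) for j in range(k))
    return 1.0 - p_fewer_than_k

RED_MINUTES = 1.5   # 90 seconds

# Assumption A: traffic is blocked once the 10th car has arrived.
blocked_at_10th = erlang_cdf(RED_MINUTES, k=10)
# Assumption B: 10 cars fit, so traffic is blocked once the 11th car arrives.
blocked_at_11th = erlang_cdf(RED_MINUTES, k=11)

# Points of the CDF over 0..5 minutes, ready to plot.
cdf_points = [(t / 10, erlang_cdf(t / 10, k=10)) for t in range(0, 51)]
```

Either value is the fraction of red-light cycles that end in blocked traffic under its assumption; a Monte Carlo model built on Gamma should agree with it up to sampling error.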

Uncertain Percentages

• Beta(a,b)
  Useful for modeling uncertainty about a probability or percentage. Beta(a,b) expresses uncertainty on a [0,1] bounded quantity.
• Suppose you’ve seen s true instances out of n observations, with no further information. You’d estimate the true proportion as p=s/n. The uncertainty in this estimate can be modeled as: Beta(s+1,n-s+1)

Exercise: Of 100 sampled voters, 55 supported Candidate A. Model the uncertainty on the true proportion.
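With s = 55 and n = 100 the formula gives Beta(56, 46). A short Python sketch of that distribution follows; the variable names and the extra question it answers (how likely is it that the true proportion exceeds 50%?) are additions for illustration only.

```python
import random

rng = random.Random(0)

S, N_OBS = 55, 100             # supporters, voters sampled
A, B = S + 1, N_OBS - S + 1    # Beta(s+1, n-s+1) = Beta(56, 46)

proportion = [rng.betavariate(A, B) for _ in range(50_000)]

mean_p = sum(proportion) / len(proportion)                     # theoretical mean is A / (A + B)
p_majority = sum(p > 0.5 for p in proportion) / len(proportion)  # share of samples above 50%
```

The sample mean should be close to 56/102, slightly below the raw estimate 55/100, and `p_majority` summarizes how much of the distribution lies above one half.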

Bounded Distributions

• Triangular(min,mode,max)
  Often very convenient & natural for expressing estimates when only the range and a best guess are available.
• Pert(min,mode,max)
  Same idea as Triangular. To use, include “Distribution Variations.ana”
• Uniform(min,max)
  All values between are equally likely.
• Uniform(min,max,integer:true)
  All integer values are equally likely.

Bounded comparisons

• Using: Min = 10, Mode = 25, Max = 40
• Compare distributions (on same PDF & CDF plot): Triangular, Pert, Uniform
• Repeat for Mode=15
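A numerical version of this comparison can be sketched in Python by sampling each distribution and comparing means and standard deviations. The PERT sampler below uses the common Beta-based form with a shape factor of 4, which is an assumption of this sketch and may not match the Pert function in “Distribution Variations.ana”; the helper names are hypothetical.

```python
import random
import statistics

rng = random.Random(0)
N = 40_000

def triangular(lo, mode, hi):
    return rng.triangular(lo, hi, mode)   # note the stdlib argument order: low, high, mode

def pert(lo, mode, hi, shape=4.0):
    """Beta-PERT with the conventional shape factor of 4 (an assumption of this sketch)."""
    a = 1.0 + shape * (mode - lo) / (hi - lo)
    b = 1.0 + shape * (hi - mode) / (hi - lo)
    return lo + (hi - lo) * rng.betavariate(a, b)

def uniform(lo, mode, hi):
    return rng.uniform(lo, hi)            # the mode is ignored

def summarize(mode, lo=10.0, hi=40.0):
    """Return {name: (mean, standard deviation)} for the three distributions."""
    result = {}
    for name, draw in (("Triangular", triangular), ("Pert", pert), ("Uniform", uniform)):
        xs = [draw(lo, mode, hi) for _ in range(N)]
        result[name] = (statistics.fmean(xs), statistics.pstdev(xs))
    return result

stats_mode_25 = summarize(25.0)
stats_mode_15 = summarize(15.0)
```

With the mode at the midpoint all three share the same mean and differ only in spread; moving the mode to 15 shifts the Triangular and Pert means by different amounts while Uniform is unaffected.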

Central Limit Theorem

• Suppose
  $y = x_1 \cdot x_2 \cdot x_3 \cdots x_N$
  $z = x_1 + x_2 + x_3 + \cdots + x_N$
  Each $x_i \sim P(\cdot)$, where $P(\cdot)$ is any distribution (in practice, one with finite variance, and positive-valued for the product). Each $x_i$ is independent.
• Then as $N \to \infty$:
  $y \to \mathrm{LogNormal}(..)$
  $z \to \mathrm{Normal}(..)$
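The effect is easy to see numerically. The Python sketch below uses a deliberately skewed, positive-valued choice of P(·) (a shifted exponential, picked only for illustration) and measures how the skewness of the sum, and of the logarithm of the product, moves toward the value 0 expected of a normal distribution. All names are hypothetical.

```python
import math
import random
import statistics

rng = random.Random(0)

def skewness(xs):
    """Sample skewness: 0 for a symmetric (e.g. normal) distribution."""
    m, s = statistics.fmean(xs), statistics.pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)

def draw_x():
    return rng.expovariate(1.0) + 0.5   # a positive, strongly right-skewed P(.)

def sums_and_log_products(n_terms, n_samples=5000):
    z, log_y = [], []
    for _ in range(n_samples):
        xs = [draw_x() for _ in range(n_terms)]
        z.append(sum(xs))                           # z = x_1 + ... + x_N
        log_y.append(sum(math.log(x) for x in xs))  # ln y = ln x_1 + ... + ln x_N
    return z, log_y

z1, log_y1 = sums_and_log_products(1)
z50, log_y50 = sums_and_log_products(50)

skew_z = (skewness(z1), skewness(z50))              # skewness of the sum for N = 1 and N = 50
skew_log_y = (skewness(log_y1), skewness(log_y50))  # ln y near normal means y near lognormal
```

The product case reduces to the sum case by taking logarithms, which is why the limit for y is LogNormal.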

Sensitivity to Distribution Choice

• Load the TXC model (Example Models – Risk Analysis)
• Compare Total_cost for these Control_cost_factor distributions:
  LogNormal(mean:108.6M,stddev:45.96M)
  Gamma(5.58,19.45M)
  Uniform(29M,188M)
  Triangular(41M,60M,245M)
  Weibull(2.53,122.4M)
• Using the LogNormal:
  Compare Total_cost when Control_cost_factor mean is increased or decreased by 10%.
  Compare when stddev is altered by 50%.

Summary

• Various parametric distributions are convenient for certain types of quantities.
• Choice of parametric distribution is usually driven by:
  • Continuous vs. discrete
  • Tails or bounded
  • Broad shape
  • Type of information easily estimated
• Results are usually fairly insensitive to exact choice of distribution type.