
Moment propagation
Scott Ferson, [email protected]
11 September 2007, Stony Brook University, MAR 550, Challenger 165
Outline of Moment Propagation
Delta method
Intervals (worst case analysis)
• Easy to understand and calculate with
• Often good enough to make a decision
• Appropriate for use with even the worst data
• Results often too wide to be practically useful
• Don’t say anything about tail risks
Moments (delta method)
• Easy to compute
• More precise than is justified
What to do?
• Solution is to marry intervals and moments
– Intervals can be tighter if we use moment information
– Bounding moments would tell us about tails
What do moments say about risks?
Exceedance risk
If we know the mean is 10
and the variance is 2, these
are best possible bounds
on the chance the variable
is bigger than any value
(Chebyshev inequality).
[Figure: best possible bounds on the exceedance probability as a function of the threshold, plotted over the range −10 to 30 with probability from 0 to 1]
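The bounds in this figure can be sketched in a few lines of Python using the one-sided (Cantelli) form of Chebyshev's inequality, which gives the best possible exceedance bounds from a mean and variance alone. The function name and interface below are illustrative, not taken from any particular software.

```python
def exceedance_bounds(mean, var, x):
    """Best possible bounds on P(X > x) knowing only the mean and variance
    of X (one-sided Chebyshev/Cantelli inequality); a minimal sketch."""
    upper = 1.0 if x <= mean else var / (var + (x - mean) ** 2)
    lower = 0.0 if x >= mean else 1.0 - var / (var + (mean - x) ** 2)
    return lower, upper

# With mean 10 and variance 2, the chance the variable exceeds 14 is at
# most 2 / (2 + 16) = 0.111..., and the chance it exceeds 6 is at least 0.888...
print(exceedance_bounds(10, 2, 14))
print(exceedance_bounds(10, 2, 6))
```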
Moment propagation
Operation          Mean                        Variance
k + X              EX + k                      VX
k X                k EX                        k² VX
exp(X)             rowe(exp)                   rowevar(exp)
ln(X), 0 < X       rowe(ln)                    rowevar(ln)
log10(X), 0 < X    rowe(log10)                 rowevar(log10)
1/X, X ≠ 0         rowe(reciprocal)            rowevar(reciprocal)
X²                 (EX)² + VX                  rowevar(square)
sqrt(X), 0 ≤ X     rowe(sqrt)                  rowevar(sqrt)
X + Y              EX + EY                     (√VX ± √VY)²
X − Y              EX − EY                     (√VX ± √VY)²
X × Y              EX EY ± √(VX VY)            Goodman formula
X / Y, Y ≠ 0       E(X × (1/Y))                V(X × (1/Y))
X^Y, 1 ≤ X         E(exp(ln(X) × Y))           V(exp(ln(X) × Y))

where EZ and VZ are the mean and variance of the random variable Z
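A minimal Python sketch of a few rows of this table, assuming the standard identities for sums and products: the Goodman formula is the exact variance of a product of independent variables, and the ± rows become bounds when the dependence is unknown. The function names are mine, not Risk Calc's.

```python
import math

def sum_moments(EX, VX, EY, VY):
    """Mean and variance of X + Y when X and Y are independent."""
    return EX + EY, VX + VY

def sum_variance_bounds(VX, VY):
    """Bounds on Var(X + Y) when the dependence between X and Y is unknown."""
    return (math.sqrt(VX) - math.sqrt(VY)) ** 2, (math.sqrt(VX) + math.sqrt(VY)) ** 2

def product_moments(EX, VX, EY, VY):
    """Mean and variance of X * Y for independent X and Y (Goodman formula)."""
    return EX * EY, EX ** 2 * VY + EY ** 2 * VX + VX * VY

def product_mean_bounds(EX, VX, EY, VY):
    """Bounds on E(X * Y) when the dependence is unknown: EX EY +/- sqrt(VX VY)."""
    c = math.sqrt(VX * VY)
    return EX * EY - c, EX * EY + c

print(sum_moments(10, 2, 5, 1), sum_variance_bounds(2, 1))
print(product_moments(10, 2, 5, 1), product_mean_bounds(10, 2, 5, 1))
```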
Range propagation (interval analysis)
Operation          Least possible value                          Greatest possible value
k + X              k + LX                                        k + GX
k X                k LX if 0 ≤ k; k GX if k < 0                  k GX if 0 ≤ k; k LX if k < 0
exp(X)             exp(LX)                                       exp(GX)
ln(X), 0 < X       ln(LX)                                        ln(GX)
log10(X), 0 < X    log10(LX)                                     log10(GX)
1/X, X ≠ 0         1/GX                                          1/LX
X²                 0 if LX ≤ 0 ≤ GX; else min((LX)², (GX)²)      max((LX)², (GX)²)
sqrt(X), 0 ≤ X     sqrt(LX)                                      sqrt(GX)
|X|                0 if LX ≤ 0 ≤ GX; else min(|LX|, |GX|)        max(|LX|, |GX|)
X + Y              LX + LY                                       GX + GY
X − Y              LX − GY                                       GX − LY
X × Y              min(LX LY, LX GY, GX LY, GX GY)               max(LX LY, LX GY, GX LY, GX GY)
X / Y, Y ≠ 0       L(X × (1/Y))                                  G(X × (1/Y))
X^Y, 1 ≤ X         min(LX^LY, GX^GY, LX^GY, GX^LY)               max(LX^LY, GX^GY, LX^GY, GX^LY)
min(X, Y)          min(LX, LY)                                   min(GX, GY)
max(X, Y)          max(LX, LY)                                   max(GX, GY)

where LZ and GZ are the least and greatest possible values of Z
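The table above translates directly into code. Here is a minimal interval-arithmetic sketch in Python covering a few of its rows; the helper names are mine, intervals are plain (lo, hi) tuples, and no outward rounding is done, so this is not a rigorous implementation.

```python
def add(a, b):
    return (a[0] + b[0], a[1] + b[1])

def sub(a, b):
    return (a[0] - b[1], a[1] - b[0])

def mul(a, b):
    p = [a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1]]
    return (min(p), max(p))

def recip(a):
    if a[0] <= 0 <= a[1]:
        raise ValueError("1/X requires that X not contain zero")
    return (1.0 / a[1], 1.0 / a[0])

def div(a, b):
    return mul(a, recip(b))

def square(a):
    lo = 0.0 if a[0] <= 0 <= a[1] else min(a[0] ** 2, a[1] ** 2)
    return (lo, max(a[0] ** 2, a[1] ** 2))

print(add((1, 2), (2, 3)))   # (3, 5)
print(mul((1, 2), (-2, 5)))  # (-4, 10)
print(square((-3, 2)))       # (0, 9)
```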
Intervals about moments
• Even if we can’t say what the distributions and
dependencies are, we can project the means and
variances through calculations.
• If we know the variables are independent, then the
projections will be tighter.
• This can be combined with propagation of the
ranges as well.
Range and moments together
[Figure: bounds on the cumulative probability P(x < X), from 0 to 1, implied jointly by the range [LX, GX], the mean EX, and the variance VX]
Interpreting a p-box
[Figure: p-boxes of the form {min = 0, max = 100, mean = 50, stdev = s} for s = 1, 5, 10, 15, 20, 25, 30, 35, 40, 45, 49, 50, and of the form {min = 0, max = 100, mean = 10, stdev = s} for s = 1, 2, 3, 4, 5, 6, 8, 10, 15, 20, 25, 29, plotted as cumulative probability P(x < X) from 0 to 1]
Interval bounds on moments
• What if we don’t know the variance? Mean?
Travel time (Lobascio)
T = (n + BD × foc × Koc) × L / (K × i)
Parameter                             Units    Min      Max     Mean     Stdv       Shape
L    source-receptor distance         m        80       120     100      11.55      uniform
i    hydraulic gradient               m/m      0.0003   0.0008  0.00055  0.0001443  uniform
K    hydraulic conductivity           m/yr     300      3000    1000     750        lognorm
n    effective soil porosity          –        0.2      0.35    0.25     0.05       lognorm
BD   soil bulk density                kg/m3    1500     1750    1650     100        lognorm
foc  fraction organic carbon          –        0.0001   0.005   0.00255  0.001415   uniform
Koc  organic partition coefficient    m3/kg    5        20      10       3          normal
Inputs as mmms p-boxes
[Figure: p-boxes for the seven inputs, each constrained by its min, max, mean, and standard deviation: L (m), i, K (m yr-1), n, BD (kg m-3), foc, and Koc (m3 kg-1), plotted as cumulative probability from 0 to 1]
[Figure: resulting p-box for Tind (yr), plotted as cumulative probability over 0 to 100,000 years]
Quantitative results
[Figure: cumulative probability versus traveling time (years, 0 to 2000), comparing the original model with the results after relaxing the independence assumptions]
Is independence reasonable?
• Soil porosity and soil bulk density?
• Hydraulic conductivity and soil porosity?
• Hydraulic gradient and hydraulic conductivity?
• Fraction organic carbon and organic partition coefficient?
• You’re the groundwater modelers; you tell us
• Remember: independence is a much stronger
assumption than uncorrelatedness
Assumptions no longer needed
• A decade ago, you had to assume all variables
were mutually independent
• Software tools now allow us to relax any pesky
independence assumption
• No longer necessary to make independence
assumptions for mathematical convenience
• But do the assumptions make any difference?
[Figure: resulting p-box for Tdep (yr), plotted as cumulative probability over 0 to 100,000 years]
Quantitative results
[Figure: cumulative probability versus traveling time (years, 0 to 2000), comparing the original model with the results after relaxing the independence assumptions]
Dependence bounds
• Guaranteed to enclose results no matter
what correlation or dependence there may
be between the variables
• Best possible (couldn’t be any tighter
without saying more about the dependence)
• Can be combined with independence
assumptions between other variables
Conclusions
• The model is a cartoon, but it illustrates the
use of methods to relax independence and
precise distribution assumptions
• Relaxing these assumptions can have a big
impact on quantitative conclusions from an
assessment
Take-home message
• Whatever assumption you make about dependencies and the shape of distributions is between you and your spreadsheet
• There are methods now available that don’t
force you to make assumptions you’re not
comfortable with
Acknowledgments
• Srikanta Mishra
• Neil Blandford
• William Oberkampf
• Sandia National Laboratories
• National Cancer Institute
• National Institute of Environmental Health
Sciences
More information
• Website: http://www.ramas.com/riskcalc.htm
• Email: [email protected], [email protected]
• Paper: Ferson, S. 1996. What Monte Carlo methods
cannot do. Human and Ecological Risk Assessment
2: 990–1007.
• Software/book: Ferson, S. 2002. RAMAS Risk Calc
4.0 Software: Risk Assessment with Uncertain
Numbers. Lewis Publishers, Boca Raton, Florida.
[31.6, 233800] years
• Is ‘6’ the last decimal digit of the lower bound?
• Did you check that the units balance?
• Did you include units in the answer?
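This bracketed result can be reproduced with naive interval arithmetic on the formula T = (n + BD × foc × Koc) × L / (K × i) and the Min/Max columns of the parameter table. The sketch below uses ordinary floating-point arithmetic rather than the directed (outward) rounding a rigorous implementation would use, which is exactly why the question about the last digit matters: the computed 31.67 must be rounded down to 31.6 to remain a valid lower bound.

```python
# Naive interval evaluation of T = (n + BD*foc*Koc) * L / (K*i)
# using the Min/Max columns of the parameter table (no outward rounding).
lo = dict(L=80,  i=0.0003, K=300,  n=0.2,  BD=1500, foc=0.0001, Koc=5)
hi = dict(L=120, i=0.0008, K=3000, n=0.35, BD=1750, foc=0.005,  Koc=20)

# No input is repeated; T increases with L, n, BD, foc, Koc and decreases
# with K and i, so the extremes occur at the corners of the input box.
T_lo = (lo['n'] + lo['BD'] * lo['foc'] * lo['Koc']) * lo['L'] / (hi['K'] * hi['i'])
T_hi = (hi['n'] + hi['BD'] * hi['foc'] * hi['Koc']) * hi['L'] / (lo['K'] * lo['i'])
print(T_lo, T_hi)   # roughly 31.67 and 233800 years
```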
How to understand this result
• Highly reliable result, given the assumptions
– Can’t get any worse
• Represents parametric uncertainty
– Neglects (possibly big) model uncertainty
• Expresses only best and worst cases
– How likely is 32 years? 50 years? 100 years?
Lobascio’s original formulation
Kd = foc × Koc = [0.0005, 0.1] m3 kg-1
R = 1 + BD × Kd / n = [3.143, 876]
V = K × i / (n × R) = [0.000293, 3.82] m yr-1
T = L / V = [20.95, 408800] yr
Quickest plume reaches the well = 20.95 yr
Longest plume reaches the well = 408,800 yr
What explains the difference?
(hint: n is repeated above)
Repeated parameters
a = [1, 2]
b = [2, 3]
c = [−2, 5]
z = a × (b + c)
zz = a × b + a × c
b + c = [0, 8]
z = [0, 16]
a × b = [2, 6]
a × c = [−4, 10]
zz = [−2, 16]
inflated uncertainty
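A quick check of the inflation with the interval helpers sketched earlier (or any interval library); the arithmetic is the naive interval kind, which treats the two occurrences of a as if they were unrelated.

```python
def add(x, y):
    return (x[0] + y[0], x[1] + y[1])

def mul(x, y):
    p = [x[0] * y[0], x[0] * y[1], x[1] * y[0], x[1] * y[1]]
    return (min(p), max(p))

a, b, c = (1, 2), (2, 3), (-2, 5)
z = mul(a, add(b, c))           # (0, 16): a appears only once
zz = add(mul(a, b), mul(a, c))  # (-2, 16): a repeated, so the range is inflated
print(z, zz)
```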
What to do about repeated parameters
• Always rigorous, but maybe not best possible
when uncertain parameters are repeated
• Inconsequential if all are non-negative and all
operations are increasing (+, ×, but not – or ÷)
• Use cancellation to reduce repetitions, e.g.,
caia/m + cwiw/m + cdid/m = (caia + cwiw + cdid)/m
• Cancellation not always possible, e.g.,
(a + b) / (a + c) = ??
If you can’t cancel
• Use tricks with algebra
e.g., a² + a = (a +½)² – ¼
• Employ subinterval reconstitution
A brute-force and computationally intensive strategy
Workable if there aren’t too many repeated parameters
• Live with the suboptimality
Decisions may not require perfect precision
Tricks
One repetition:
u + u = 2u
u − u = 0
u × u = u²
u / u = 1
(1 + u)/u = 1/u + 1
(1 + u)/(1 − u) = (1/tan(acos(u)/2))²
a u + b u = u(a + b)
a u − b u = u(a − b)
a/u + b/u = (a + b)/u
a/u − b/u = (a − b)/u
u/a + u/b = u(b + a)/(ab)
u/a − u/b = u(b − a)/(ab)
a u^b + c u^b = (a + c) u^b
a u^b × c u^d = a c u^(b+d)
a^u × b^u = exp(u (ln(a) + ln(b)))
u² + u = (u + ½)² − ¼
u² − u = −¼ sin(2 asin(sqrt(u)))²
u² + a u = (u + a/2)² − a²/4
sqrt((1 + u)/(1 − u)) = 1/tan(acos(u)/2)
etc.

Two repetitions:
u + v − uv = 1 − (1 − u)(1 − v)
(u + v)/(1 − uv) = tan(atan(u) + atan(v))
(u − v)/(1 + uv) = tan(atan(u) − atan(v))
(1 + uv)/(u − v) = 1/tan(atan(u) − atan(v))
(1 − uv)/(u + v) = 1/tan(atan(u) + atan(v))
(uv − 1)/(u + v) = −1/tan(atan(u) + atan(v))
u sqrt(1 − v²) + v sqrt(1 − u²) = sin(asin(u) + asin(v))
u sqrt(1 − v²) − v sqrt(1 − u²) = sin(asin(u) − asin(v))
u v + sqrt(1 − u²) sqrt(1 − v²) = cos(acos(u) − acos(v))
u v − sqrt((1 − u²)(1 − v²)) = cos(acos(u) + acos(v))
u v − sqrt(1 − u² − v² + u² v²) = cos(acos(u) + acos(v))
sin(u) sqrt(1 − sin(v)²) + sin(v) sqrt(1 − sin(u)²) = sin(u + v)
cos(u) cos(v) − sin(u) sin(v) = cos(u + v)
sin(u) cos(v) − cos(u) sin(v) = sin(u − v)
etc.

u, v, etc. represent the uncertain numbers
a, b, etc. represent arbitrary expressions
Basic identities
u + 0 = u
u − 0 = u
0 − u = −u
u × 0 = 0
u × 1 = u
u / 1 = u
u^0 = 1
u^1 = u
u & 1 = u
u | 1 = 1
u & 0 = 0
u | 0 = u
u & u = u
u | u = u
u & not(u) = 0
u | not(u) = 1
(u & a) | (u & b) = u & (a | b)
(u | a) & (u | b) = u | (a & b)
etc.
Subinterval reconstitution
• Partition each repeated interval into subintervals
• Compute the function for every subinterval
• The union of all the results contains the true range
f(u, v, …, w, x, y, …, z) ⊆ ⋃_i ⋃_j … ⋃_k f(u_i, v_j, …, w_k, x, y, …, z)
where u, v, …, w are repeated intervals and x, y, …, z are other interval and scalar inputs, and
u = ⋃_i u_i;  v = ⋃_j v_j;  …;  w = ⋃_k w_k
Example: (a + b)^a, with a = [0.1, 1] and b = [0, 1]

Partition the repeated uncertain a into subintervals:
ai = [(i − 1)w/m + a1, i w/m + a1], where i = 1, 2, …, m, m is the number of subintervals, w is the width of a, and a1 is its lower bound.

m         ⋃(ai + b)^ai
1         [0.1, 2]
2         [0.282, 2]
3         [0.398, 2]
4         [0.473, 2]
5         [0.525, 2]
10        [0.624, 2]
100       [0.686, 2]
1,000     [0.692, 2]
10,000    [0.692, 2]
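A sketch of the computation behind this table. The interval power step below takes the minimum and maximum over the four corner combinations, which is enough here because the base stays positive and x^y is monotone in each argument separately; the helper names are mine.

```python
def iadd(x, y):
    return (x[0] + y[0], x[1] + y[1])

def ipow(x, y):
    """Interval power for a positive base: extremes occur at the corners."""
    corners = [x[0] ** y[0], x[0] ** y[1], x[1] ** y[0], x[1] ** y[1]]
    return (min(corners), max(corners))

a, b = (0.1, 1.0), (0.0, 1.0)

def reconstitute(m):
    """Subinterval reconstitution of (a + b)**a with a partitioned into m pieces."""
    w, a1 = a[1] - a[0], a[0]
    lo, hi = float('inf'), float('-inf')
    for i in range(1, m + 1):
        ai = (a1 + (i - 1) * w / m, a1 + i * w / m)  # i-th subinterval of a
        fi = ipow(iadd(ai, b), ai)                   # naive evaluation on this piece
        lo, hi = min(lo, fi[0]), max(hi, fi[1])      # union of the pieces
    return lo, hi

for m in (1, 2, 5, 100, 1000):
    print(m, reconstitute(m))   # tightens toward about [0.692, 2]
```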
Cauchy-deviate method
(Trejo and Kreinovich 2001)
• Propagates intervals through black-box model
– Don’t need to know, but have to be able to query it
• “Sample” from around interval
– Points not necessarily inside the interval!
• Scale results to get an asymptotically correct
estimate of the interval uncertainty of the output
Cauchy-deviate method
• Depends on the number of samples, not inputs
– Works just as well for 2000 variables as 20
– Similar in performance to Monte Carlo
• Need about 200 samples to obtain 20% relative
accuracy of half-width of output range
– With fewer samples, we’d get lower accuracy, but
we can compensate for this by scaling by N,
which works under the linearity assumption
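The following is only a rough sketch of the basic idea as described on these slides and in Trejo and Kreinovich (2001), not the published algorithm, which includes refinements (such as normalizing the deviates) that are omitted here. Under the linearity assumption, the output differences are Cauchy distributed with scale equal to the half-width of the output range, which is then estimated by maximum likelihood.

```python
import math, random

def cauchy_halfwidth(f, midpoints, halfwidths, N=200, seed=1):
    """Estimate the half-width of f's range over the box [m_i +/- h_i] from
    N Cauchy-deviate samples (asymptotically correct under near-linearity,
    not a rigorous enclosure). Sample points may fall outside the box."""
    rng = random.Random(seed)
    f0 = f(midpoints)
    d = []
    for _ in range(N):
        dx = [h * math.tan(math.pi * (rng.random() - 0.5)) for h in halfwidths]
        d.append(f([m + e for m, e in zip(midpoints, dx)]) - f0)
    # Maximum-likelihood estimate of the Cauchy scale D of the differences d:
    # solve sum(1 / (1 + (d_k/D)^2)) = N/2 by bisection (the sum increases with D).
    lo, hi = 1e-12, max(abs(v) for v in d) + 1e-12
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if sum(1.0 / (1.0 + (v / mid) ** 2) for v in d) < N / 2:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# For the linear test function 3x - 2y with half-widths 1 and 0.5, the true
# half-width of the output range is 3*1 + 2*0.5 = 4; with N = 200 samples
# the estimate should land within roughly 20% of that.
print(cauchy_halfwidth(lambda x: 3 * x[0] - 2 * x[1], [0.0, 0.0], [1.0, 0.5]))
```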
Limitations of the method
• Asymptotically correct, but not rigorous
• Intervals must be narrow relative to the nonlinearity
– Function almost linear OR uncertainties small
– Could combine with subinterval reconstitution
• Most efficient when dimensionality is high
• Only handles interval uncertainty
Computing
• Sequence of binary operations
– Need to deduce dependencies of intermediate
results with each other and the original inputs
– Different calculation order can give different
results (which should be intersected)
• Do all at once in one multivariate calculation
– Can be much more difficult computationally
– Can produce much better tightening
Specifying input intervals
Interval uncertainty
• Statisticians often ignore this uncertainty
• “Interval uncertainty doesn’t exist in real life”
(Tony O’Hagan et al.)
Hammer salesmen saying screws don’t exist?
When do intervals arise?
• Periodic observations
When did the fish in my aquarium die during the night?
• Plus-or-minus measurement uncertainties
Coarse measurements, measurements from digital readouts
• Non-detects and data censoring
Chemical detection limits, studies prematurely terminated
• Privacy requirements
Epidemiological or medical information, census data
• Theoretical constraints
Concentrations, solubilities, probabilities, survival rates
• Bounding studies
Presumed or hypothetical limits in what-if calculations
Ways to characterize intervals
• Theoretical constraints
• Modeled from other intervals
• Expert assertions
• Discounting (widening) intervals (Shlyakhter)
• Confidence procedures (Grosof)
– But 95% confidence isn’t the same as surety
– Use in interval calculations requires an assumption
Problems with confidence intervals
• Cannot be combined in arithmetic or logical
operations without an assumption
• Don’t measure epistemic belief anyway
Example (Walley): a 95% confidence interval can have zero chance of containing the value. For example, suppose X ~ normal(μ, 1), where 0 < μ. If the sample mean happens by chance to be −21.3, the 95% confidence interval on the mean is the empty set.
Why we have to be careful
• Interval analysis yields contingent results
• Results are contingent on assumptions that model inputs are
within their respective intervals
• But all analysis results are contingent on similar assumptions
that the models they came from are true
• Naïve elicitation has big problems
• Intervals are usually unrealistically narrow
• People make incoherent statements
• Can’t mix together different kinds
• Not clear how to translate data into intervals
Determining endpoints
• The largest observed may not be the largest possible
(and it usually isn’t)
• Sampling theory >> theory of extremes
• Rigor of analysis is contingent on inputs
• If you’re nervous, just widen the bounds
[Diagram: ways to obtain intervals from point sample data, from a distribution, or from a model: range (envelope), percentile range, extreme value model, prediction interval, tolerance interval, confidence interval, central value and width, plus-minus interval, support, support cut, p-box, credibility interval, level cut, Shlyakhter widening, model simulation output range, Cauchy deviates (Trejo-Kreinovich), certain and tenable ranges, interval function, intersection, envelope, backcalculation, tolerance solution]
Eliciting dependence
• As hard as getting intervals (maybe a bit worse)
• Theoretical or “physics-based” arguments
• Inference from empirical data
– Risk of loss of rigor at this step (just as there is
when we try to infer intervals from data)
Updating
Aggregation (updating)
• How do you combine different sources?
• If you trust them all, take the intersection
– [max(x1, y1, z1, …), min(x2, y2, z2, …)]
– What if there is no intersection (right<left)?
• If you’re not sure which is right, use the envelope
– [min(x1, y1, z1, …), max(x2, y2, z2, …)]
– But are you sure this is wide enough?
Example
• Suppose we have two rigorous interval
estimates of the same quantity: [1,7] & [4,10]
[Figure: the intervals A = [1, 7] and B = [4, 10] plotted on an axis from 0 to 10]
• Their intersection [4,7] is also a rigorous
interval for the quantity
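These two aggregation rules are one-liners; here is a minimal sketch (the function names are mine).

```python
def intersection(*intervals):
    """Use when every source is believed to be rigorous."""
    lo = max(i[0] for i in intervals)
    hi = min(i[1] for i in intervals)
    if lo > hi:
        raise ValueError("no intersection: the sources conflict")
    return (lo, hi)

def envelope(*intervals):
    """Use when you are not sure which source is right."""
    return (min(i[0] for i in intervals), max(i[1] for i in intervals))

print(intersection((1, 7), (4, 10)))  # (4, 7)
print(envelope((1, 7), (4, 10)))      # (1, 10)
```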
Constraint analysis (updating)
• Using knowledge of how variables are related
to tighten their estimates
• Removes internal inconsistency and explicates
unrecognized knowledge
• Also called ‘constraint updating’ or ‘editing’
• Also called ‘natural extension’
Example
• Suppose we know
W = [23, 33] m
H = [112, 150] m
A = [2000, 3200] m2
• Does knowing W × H = A let us say any more?
Answer
• Yes! We can infer that
W = [23, 28.57]
H = [112, 139.13]
A = [2576, 3200]
• The formulas are just W = intersect(W, A/H), etc.
To get the largest possible W, for instance, let A be as large
as possible and H as small as possible, and solve for W =A/H.
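A sketch of the update in Python with plain (lo, hi) tuples and no outward rounding; the interval helpers are mine, but the update itself is just the intersect(W, A/H) formula stated above.

```python
def imul(a, b):
    p = [a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1]]
    return (min(p), max(p))

def idiv(a, b):
    return imul(a, (1.0 / b[1], 1.0 / b[0]))   # assumes 0 is not in b

def intersect(a, b):
    return (max(a[0], b[0]), min(a[1], b[1]))

W, H, A = (23.0, 33.0), (112.0, 150.0), (2000.0, 3200.0)
print(intersect(W, idiv(A, H)))   # about (23, 28.57)
print(intersect(H, idiv(A, W)))   # about (112, 139.13)
print(intersect(A, imul(W, H)))   # (2576, 3200)
```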
Updating with p-boxes
[Figure: p-boxes for W (20 to 40 m), H (120 to 160 m), and A (2000 to 4000 m²), plotted as cumulative probability from 0 to 1]
Answers
[Figure: updated p-boxes intersect(W, A/H), intersect(H, A/W), and intersect(A, WH), plotted over the same axes]
Bayesian strategy
Prior
Pr(W, H, A) = I(W ∈ [23, 33])/(33 − 23) × I(H ∈ [112, 150])/(150 − 112) × I(A ∈ [2000, 3200])/(3200 − 2000)
Likelihood
L(A = W × H | W, H, A) = δ(A − W × H)
Posterior
f(W, H, A | A = W × H) ∝ δ(A − W × H) × Pr(W, H, A)
Bayes’ rule
• Concentrates mass onto the manifold of
feasible combinations of W, H, and A
• Answers have the same supports as intervals
• Computationally complex
• Needs specification of priors
• Yields distributions that are not justified
(coming from the choice of priors)
• Expresses less uncertainty than is present
Backcalculation
Backcalculation
• Needed for cleanup and remediation planning
• Untangles an equation in uncertain numbers
when we know all but one of the variables
• For instance, backcalculation finds B such
that A+B = C, from estimates for A and C
Can’t just invert the equation
Dose (prescribed) = Concentration (unknown) × Intake (known)
Concentration = Dose / Intake
When concentration is put back into the forward equation, the resulting dose is wider than planned
Example
dose = [0, 2] milligram per kilogram
intake = [1, 2.5] liter
mass = [60, 96] kilogram
conc = dose * mass / intake
[ 0, 192] milligram liter-1
dose = conc * intake / mass
[ 0, 8] milligram kilogram-1
Doses four times larger than tolerable levels we planned
Untangling backcalculations
• Solving for B given A + B = C
B = backcalc(A, C) = [C1 − A1, C2 − A2]
• Solving for B given A × B = C
B = factor(A, C) = [C1 / A1, C2 / A2]
• Sometimes called “tolerance solutions”
Kernel versus shell
Given A = [1, 2], C = [2, 6], and C = A + B, there are two different ways to solve for B.
Kernel (tolerance solution): B = backcalc(A, C) = [1, 4]
Shell (united solution): B = C − A = [0, 5]
[Figure: the kernel [1, 4] and shell [0, 5] solutions for B plotted against A over A = [1, 2]]
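A sketch of the two solutions for additive backcalculation: backcalc gives the kernel (tolerance solution) and ordinary interval subtraction gives the shell (united solution). The function name backcalc follows the slides, but the implementation and the shell helper are mine.

```python
def backcalc(a, c):
    """Kernel (tolerance solution) of A + B = C: the B that makes A + B reproduce C."""
    b = (c[0] - a[0], c[1] - a[1])
    if b[0] > b[1]:
        raise ValueError("no kernel: C is narrower than A")
    return b

def shell(a, c):
    """Shell (united solution): every b with a + b = c for some a in A and c in C."""
    return (c[0] - a[1], c[1] - a[0])

A, C = (1, 2), (2, 6)
print(backcalc(A, C))  # (1, 4)
print(shell(A, C))     # (0, 5)
```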
When you need    And you have      Use this formula
for              estimates for     to find the unknown
A + B = C        A, B              C = A + B
                 A, C              B = backcalc(A, C)
                 B, C              A = backcalc(B, C)
A − B = C        A, B              C = A − B
                 A, C              B = −backcalc(A, C)
                 B, C              A = backcalc(−B, C)
A × B = C        A, B              C = A × B
                 A, C              B = factor(A, C)
                 B, C              A = factor(B, C)
A / B = C        A, B              C = A / B
                 A, C              B = 1/factor(A, C)
                 B, C              A = factor(1/B, C)
A ^ B = C        A, B              C = A ^ B
                 A, C              B = factor(log A, log C)
                 B, C              A = exp(factor(B, log C))
2A = C           A                 C = 2 × A
                 C                 A = C / 2
A² = C           A                 C = A²
                 C                 A = sqrt(C)
Interval algebra
• Commutativity  a + b = b + a, a × b = b × a
• Associativity  (a + b) + c = a + (b + c), (a × b) × c = a × (b × c)
• Neutral elements  a + 0 = 0 + a = a, a × 1 = 1 × a = a
• Subdistributivity  a × (b + c) ⊆ a × b + a × c
• Subcancellation  a ⊆ a + b − b, a ⊆ a × b / b
• No inverse elements  a + (−a) ≠ 0, a × (1/a) ≠ 1
Conclusions
• Interval analysis is a worst case analysis (that also
includes the best case)
• Repeated uncertain parameters can cause
unnecessary inflation of uncertainty
• Results will always be rigorous, but might not be
best possible
• Moving an uncertain parameter to the other side of
an equal sign often requires backcalculation
Exercises
1. Do the inputs in the travel time example seem dependent?
2. What does subinterval reconstitution with m=100 on the
original Lobascio formulation give for the travel time?
3. What contaminant concentrations C in water will lead to
doses D no larger than 6 mg per kg per day if it comes from
both drinking and eating fish as
D = (Iwater × C) / W + (Ifish × B × C) / W, where
Iwater = [1.5, 2.5] liters per day    // water intake
Ifish = [0, 8] g per day              // dietary ingestion of fish
B = [0.9, 2.1] liters per mg          // bioaccumulation factor
W = [60, 90] kg                       // receptor biomass
How do you check the solution?
4. Is there a Bayesian analog of backcalculation?
Conclusions
• Easy to compute rigorous bounds
• Mathematical programming may be needed
to get answers that are also best possible
• Rigor of analysis is contingent on inputs
• If you’re nervous, just widen the bounds
Exercises
1. Calculate the probability of tank rupture under pumping that assumes the interval inputs and makes no assumption about the dependencies among the events.
2. Develop a fault tree for establishment of snake populations on a Hawaiian island (or a star exploding).
3. Compute the probability of the conjunction of two events having probabilities 0.29 and 0.22, assuming a Pearson correlation of 1.0. Compare the result to the Fréchet range for such probabilities. What’s going on?
4. Derive an algorithm to compute the probability that n of k events occur, given intervals for the probability of each event, assuming they’re independent. Derive an analogous algorithm for the Fréchet case.
Rigorousness
• The computations are guaranteed to enclose
the true results (so long as the inputs do)
• “Automatically verified calculations”
• You can still be wrong, but the method
won’t be the reason if you are
Conclusions
Why bounding?
• Often sufficient to specify a decision
• Possible even when estimates are impossible
• Usually easy to compute and simple to combine
• Rigorous, rather than an approximation
(after N.C. Rowe 1988)
Reasons to use interval analysis
• Requires very little data
• Applicable to all kinds of uncertainty
• Can be comprehensive
• Fast and easy to compute answers
• Conservative when correlations unknown
• Can be made “best possible”
• Backcalculations easy
• Updating relatively easy
Reasons not to use it
• Same thing as worst case analysis
• Doesn't say how likely extreme event is
• Repeated parameters are cumbersome
• Not optimal when there’s a lot of data
• Can't use distribution information
• Can't use correlation information
Interval (worst case) analysis
How?
– bound inputs, a = [a1, a2], where a1 ≤ a2
– addition:
[a1, a2] + [b1, b2] = [a1+b1, a2+b2]
– subtraction:
[a1, a2] – [b1, b2] = [a1–b2, a2–b1]
– multiplication, division, etc. are a little more complex
Why?
– natural for scientists and easy to explain to others
– works no matter where uncertainty comes from
Why not?
– paradoxical: can’t give exact value but can give exact bounds
– ranges could grow quickly, yielding very wide results
– doesn’t give probabilities of extreme outcomes (tail risks)
Interval probability
How?
– bound event probabilities, p = [p1, p2], where 0 ≤ p1 ≤ p2 ≤ 1
– evaluate event trees as composition of ANDs, ORs, etc.
– standard probabilistic rules if events are independent
– Fréchet rules if their dependence is unknown
– other dependency relations can also be represented
Why?
– can capture incertitude about event probabilities
Why not?
– paradoxical: can’t give exact value but can give exact bounds
– ranges can grow quickly, especially without independence
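A minimal sketch of the conjunction and disjunction rules for interval probabilities, under independence and under unknown dependence (Fréchet bounds). Both formulas are monotone in each argument, so the endpoints can be propagated directly; the values at the bottom are just illustrative.

```python
def and_independent(p, q):
    return (p[0] * q[0], p[1] * q[1])

def or_independent(p, q):
    return (p[0] + q[0] - p[0] * q[0], p[1] + q[1] - p[1] * q[1])

def and_frechet(p, q):
    """Conjunction with no assumption about dependence."""
    return (max(0.0, p[0] + q[0] - 1.0), min(p[1], q[1]))

def or_frechet(p, q):
    """Disjunction with no assumption about dependence."""
    return (max(p[0], q[0]), min(1.0, p[1] + q[1]))

p, q = (0.2, 0.3), (0.1, 0.4)
print(and_independent(p, q), and_frechet(p, q))
print(or_independent(p, q), or_frechet(p, q))
```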
References
Dwyer, P. 1951. Linear Computations. John Wiley & Sons, New York.
Ferson, S. 2002. RAMAS Risk Calc 4.0: Risk Assessment with Uncertain Numbers. Lewis Publishers, Boca Raton.
Grosof, B.N. 1986. An inequality paradigm for probabilistic knowledge: the logic of conditional probability intervals.
Uncertainty in Artificial Intelligence. L.N. Kanal and J.F. Lemmer (eds.), Elsevier Science Publishers, Amsterdam.
Hailperin, T. 1986. Boole’s Logic and Probability. North-Holland, Amsterdam.
Kyburg, H.E., Jr. 1998. “Interval Valued Probabilities,” http://ippserv.rug.ac.be/documentation/interval_prob/interval_prob.html,
The Imprecise Probabilities Project, edited by G. de Cooman and P. Walley, http://ippserv.rug.ac.be/home/ipp.html.
Lobascio, M.C. 1993. Uncertainty analysis tools for environmental modeling: application of Crystal Ball® to predict
groundwater plume traveling times. ENVIRONews 1: 6-10.
Loui, R.P. 1986. Interval based decisions for reasoning systems. Uncertainty in Artificial Intelligence. L.N. Kanal and
J.F. Lemmer (eds.), Elsevier Science Publishers, Amsterdam.
Moore, R.E. 1966. Interval Analysis. Prentice-Hall, Englewood Cliffs, New Jersey.
Moore, R. 1979. Methods and Applications of Interval Analysis. SIAM, Philadelphia.
Rowe, N.C. 1988. Absolute bounds on the mean and standard deviation of transformed data for constant-sign-derivative
transformations. SIAM Journal on Scientific and Statistical Computing 9: 1098–1113.
Shlyakhter A. 1994. Improved framework for uncertainty analysis: accounting for unsuspected errors. Risk Analysis
14(4):441-447.
Tessem, B. 1992. Interval probability propagation. International Journal of Approximate Reasoning 7: 95-120.
Trejo, R. and V. Kreinovich. 2001. Error estimations for indirect measurements: randomized vs. deterministic algorithms
for ‘black-box’ programs. Handbook on Randomized Computing, S. Rajasekaran, P. Pardalos, J. Reif, and J. Rolim
(eds.), Kluwer, 673–729. http://www.cs.utep.edu/vladik/2000/tr00-17.pdf
Vesely, W.E., F.F. Goldberg, N.H. Roberts, D.F. Haasl. 1981. Fault Tree Handbook. Nuclear Regulatory Commission,
Washington, DC.
Vick, S.G. 2002. Degrees of Belief: Subjective Probability and Engineering Judgment. ASCE Press, Reston, Virginia.
End
Software
• RAMAS Risk Calc 4.0 (NIH, commercial)
• GlobSol (Baker Kearfott)
• WIC (NIH, freeware)
• Interval Solver (<<>>)
Web presentations and documents
Interval computations home page
Uncertainty about distributions
[Figure: the seven inputs L (m), i, K (m/yr), n, BD (kg/m3), foc, and Koc (m3/kg), each shown as a p-box expressing uncertainty about its distribution]
Distribution uncertainty
• Could be much bigger
• Could be smaller (could be zero)
• Could be mixed for different variables
• Could be parametric
• Could be uncertainty about the shape
• Could arise from sampling information
Dependence and distribution
[Figure: cumulative probability versus traveling time (years, 0 to 2000), comparing the original model, the results after relaxing the dependence assumptions, and the results after relaxing both the dependence and distribution assumptions]
Uncertainty about distributions
[Figure: the seven inputs L (m), i, K (m/yr), n, BD (kg/m3), foc, and Koc (m3/kg), each shown as a p-box expressing uncertainty about its distribution]
Probability bounds
• Guaranteed to enclose results no matter the
distribution (so long as it’s inside the
probability box)
• In many cases, the results are best possible
(can’t be tighter without more information)
• Can be combined with precise distributions