Untangling equations involving uncertainty
Scott Ferson, Applied Biomathematics
Vladik Kreinovich, University of Texas at El Paso
W. Troy Tucker, Applied Biomathematics
Overview
• Three kinds of operations
– Deconvolutions
– Backcalculations
– Updates (oh, my!)
• Very elementary methods of interval analysis
– Low-dimensional
– Simple arithmetic operations
• But combined with probability theory
Probability box (p-box)
• Bounds on a cumulative distribution function (CDF)
• Envelope of a Dempster-Shafer structure
• Used in risk analysis and uncertainty arithmetic
• Generalizes probability distributions and intervals

[Figure: three examples plotted as cumulative probability (0 to 1) against value: a precise distribution, a p-box, and an interval; the last is an interval, not a uniform distribution.]
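As a concrete illustration of this idea (not from the talk itself; the endpoints 10 and 30 below are only read approximately off the figure), a p-box can be stored as lower and upper bounds on the CDF over a grid, and an interval is the degenerate p-box whose upper bound jumps at the left endpoint and whose lower bound jumps at the right endpoint:

```python
import numpy as np

# Illustrative p-box representation: lower and upper bounds on the CDF
# evaluated on a common grid of x values.
xs = np.linspace(0, 40, 401)

# The interval [10, 30] as a p-box: the upper CDF bound steps up at 10
# (all the mass could sit at 10), the lower bound steps up at 30.
upper_cdf = (xs >= 10).astype(float)
lower_cdf = (xs >= 30).astype(float)

# A uniform distribution on [10, 30] is only one of the many distributions
# enclosed by these bounds, which is why the interval is not the same
# thing as a uniform distribution.
uniform_cdf = np.clip((xs - 10) / 20.0, 0.0, 1.0)
assert np.all(lower_cdf <= uniform_cdf) and np.all(uniform_cdf <= upper_cdf)
```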
Probability bounds analysis (PBA)
[Figure: combining two p-boxes with an arithmetic operation; the CDF of the result is shown both assuming independence and assuming nothing about the dependence, which gives wider bounds.]
PBA handles common problems
• Imprecisely specified distributions
• Poorly known or unknown dependencies
• Non-negligible measurement error
• Inconsistency in the quality of input data
• Model uncertainty and non-stationarity
• Plus, it’s much faster than Monte Carlo
Updating
• Using knowledge of how variables are
related to tighten their estimates
• Removes internal inconsistency and
explicates unrecognized knowledge
• Also called constraint updating or editing
• Also called natural extension
Example
• Suppose
W = [23, 33]
H = [112, 150]
A = [2000, 3200]
• Does knowing WH = A let us say any more?
Answer
• Yes, we can infer that
W = [23, 28.57]
H = [112, 139.13]
A = [2576, 3200]
• The formulas are just W = intersect(W, A/H), etc.
To get the largest possible W, for instance, let A be as large as possible and H as small as possible, and solve W = A/H. (A small interval sketch follows below.)
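Here is a small interval sketch of this update in Python; the helpers mul, div, and intersect are illustrative, not any particular library's API:

```python
# Interval constraint updating for W*H = A, with intervals as (lo, hi) tuples.

def mul(x, y):
    p = [x[0]*y[0], x[0]*y[1], x[1]*y[0], x[1]*y[1]]
    return (min(p), max(p))

def div(x, y):
    # Assumes y does not contain zero.
    p = [x[0]/y[0], x[0]/y[1], x[1]/y[0], x[1]/y[1]]
    return (min(p), max(p))

def intersect(x, y):
    lo, hi = max(x[0], y[0]), min(x[1], y[1])
    if lo > hi:
        raise ValueError("inconsistent intervals")
    return (lo, hi)

W, H, A = (23, 33), (112, 150), (2000, 3200)

W2 = intersect(W, div(A, H))   # (23, 28.57...)
H2 = intersect(H, div(A, W))   # (112, 139.13...)
A2 = intersect(A, mul(W, H))   # (2576, 3200)
```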
Bayesian strategy
Prior

Pr(W, H, A) = I(W ∈ [23, 33]) / (33 − 23) × I(H ∈ [112, 150]) / (150 − 112) × I(A ∈ [2000, 3200]) / (3200 − 2000)

Likelihood

L(A = W·H | W, H, A) = δ(A − W·H)

Posterior

f(W, H, A | A = W·H) ∝ δ(A − W·H) × Pr(W, H, A)
Bayes’ rule
• Concentrates mass onto the manifold of
feasible combinations of W, H, and A
• Answers have the same supports as intervals
• Computationally complex
• Needs specification of priors
• Yields distributions that are not justified
(come from the choice of priors)
• Expresses less uncertainty than is present
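For comparison, here is a rough sketch of the Bayesian strategy above (illustrative only, not the authors' code): uniform priors on the three boxes, with the delta-function likelihood approximated by a narrow tolerance band around A = W·H.

```python
import random

# Rejection-sampling approximation of the posterior: keep samples that land
# near the manifold A = W*H.
random.seed(1)
tol = 25.0                     # illustrative stand-in for the delta function
kept = []
while len(kept) < 2000:
    W = random.uniform(23, 33)
    H = random.uniform(112, 150)
    A = random.uniform(2000, 3200)
    if abs(A - W * H) < tol:
        kept.append((W, H, A))

ws = sorted(w for w, _, _ in kept)
print(ws[0], ws[-1])   # roughly 23 to 28.6, the same support as the interval answer
```

The posterior also spreads a full distribution over that support, which is the sense in which the Bayesian answer expresses less uncertainty than is actually present.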
Updating with p-boxes
[Figure: the estimates of W, H, and A before updating, each shown as CDF bounds.]
Answers
[Figure: the updated estimates, intersect(W, A/H), intersect(H, A/W), and intersect(A, WH), shown as tighter CDF bounds for W, H, and A.]
Calculation with p-boxes
• Agrees with interval analysis whenever
inputs are intervals
• Relaxes Bayesian strategy when precise
priors are not warranted
• Produces more reasonable answers when priors are not well known
• Much easier to compute than Bayes’ rule
Backcalculation
• Find constraints on B that ensure C = A + B
satisfies specified constraints
• Or, more generally, C = f(A1, A2,…, Ak, B)
• If A and C are intervals, the answer is called
the tolerance solution
Can’t just invert the equation
dose = conc × intake / body mass

Inverting this naively gives

conc = dose × body mass / intake

but when this conc is put back into the forward equation, the dose is wider than planned.
Example
dose = [0, 2] milligram per kilogram
intake = [1, 2.5] liter
mass = [60, 96] kilogram

conc = dose × mass / intake = [0, 192] milligram per liter

dose = conc × intake / mass = [0, 8] milligram per kilogram

Doses 4 times larger than tolerable levels!
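A minimal sketch of that round trip with intervals (illustrative helpers; endpoint arithmetic is enough here because every quantity is nonnegative):

```python
# Naive backcalculation by inverting the equation, with intervals as (lo, hi).

def mul(x, y): return (x[0] * y[0], x[1] * y[1])   # nonnegative intervals only
def div(x, y): return (x[0] / y[1], x[1] / y[0])   # y strictly positive

dose   = (0.0, 2.0)     # mg per kg, the planned tolerable range
intake = (1.0, 2.5)     # liters
mass   = (60.0, 96.0)   # kg

conc = div(mul(dose, mass), intake)        # (0.0, 192.0) mg per liter
dose_back = div(mul(conc, intake), mass)   # (0.0, 8.0) mg per kg, 4 times too wide
```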
Backcalculating probability distributions
• Needed for engineering design problems,
e.g., cleanup and remediation planning for
environmental contamination
• Available analytical algorithms are unstable
for almost all problems
• Except in a few special cases, Monte Carlo
simulation cannot compute backcalculations;
trial and error methods are required
Backcalculation with p-boxes
Suppose A + B = C, where
A = normal(5, 1)
C = {0 ≤ C, median ≤ 15, 90th %ile ≤ 35, max ≤ 50}

[Figure: A and C shown as CDF bounds (A roughly over 2 to 8; C over 0 to 50).]
Getting the answer
• The backcalculation algorithm basically
reverses the forward convolution
• Not hard at all…but a little messy to show
• Any distribution totally inside B is sure to satisfy the constraint … it’s the “kernel”

[Figure: the backcalculated p-box B, shown as CDF bounds over roughly 0 to 50.]
Check by plugging back in
A + B = C* ⊆ C

[Figure: the recomputed sum C* plotted against the original constraint C; C* lies inside C.]
When you        And you have        Use this formula
know that       estimates for       to find the unknown

A + B = C       A, B                C = A + B
                A, C                B = backcalc(A, C)
                B, C                A = backcalc(B, C)

A – B = C       A, B                C = A – B
                A, C                B = –backcalc(A, C)
                B, C                A = backcalc(–B, C)

A * B = C       A, B                C = A * B
                A, C                B = factor(A, C)
                B, C                A = factor(B, C)

A / B = C       A, B                C = A / B
                A, C                B = 1 / factor(A, C)
                B, C                A = factor(1/B, C)

A ^ B = C       A, B                C = A ^ B
                A, C                B = factor(log A, log C)
                B, C                A = exp(factor(B, log C))

2A = C          A                   C = 2 * A
                C                   A = C / 2

A² = C          A                   C = A ^ 2
                C                   A = sqrt(C)
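For intervals, backcalc and factor in the table above can be sketched as the tolerance (kernel) solutions; the code below is an illustration under that assumption, not the p-box algorithms themselves, and factor assumes strictly positive intervals:

```python
# Tolerance-solution sketches of backcalc and factor for intervals (lo, hi).

def backcalc(a, c):
    """Widest B such that A + B stays inside C for every value in A."""
    lo, hi = c[0] - a[0], c[1] - a[1]
    if lo > hi:
        raise ValueError("C is too narrow relative to A; no kernel exists")
    return (lo, hi)

def factor(a, c):
    """Widest B such that A * B stays inside C, for positive intervals."""
    lo, hi = c[0] / a[0], c[1] / a[1]
    if lo > hi:
        raise ValueError("C is too narrow relative to A; no kernel exists")
    return (lo, hi)

A, C = (3.0, 4.0), (10.0, 20.0)
B = backcalc(A, C)                    # (7.0, 16.0)
check = (A[0] + B[0], A[1] + B[1])    # (10.0, 20.0), inside C as required
```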
Kernels
• Existence more likely if p-boxes are fat
• Wider if we can also assume independence
• Answers are not unique, even though
tolerance solutions always are
• Different kernels can emphasize different
properties
• Envelope of all possible kernels is the shell
(i.e., the united solution)
Precise distributions
• Precise distributions can’t express the nature
of the target
• Finding a conc distribution that results in a
prescribed distribution of doses says we want
some doses to be high (any distribution to the
left would be even better)
• We need to express the dose target as a p-box
Deconvolution
• Uses information about dependence to
tighten estimates
• Useful, for instance, in correcting an
estimated distribution for measurement
uncertainty
• For instance, suppose Y = X + ε
• If X and ε are independent, σ_Y² = σ_X² + σ_ε²
• Then we do an uncertainty correction
Example
• Y = X + ε
• Y, ε ~ normal
• X ~ N(decon(μ_ε, μ_Y), sqrt(decon(σ_ε², σ_Y²)))
• Y ~ N([5, 9], [2, 3]); ε ~ N([−1, +1], [½, 1])
• X ~ N(dcn([−1, +1], [5, 9]), sqrt(dcn([¼, 1], [4, 9])))
• X ~ N([6, 8], sqrt([3¾, 8]))
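A minimal sketch of this correction with intervals, assuming dcn is the same tolerance-solution operation as the interval backcalc sketched earlier (illustrative only; it reproduces the [6, 8] mean interval above):

```python
# Interval deconvolution correction for Y = X + eps with normal Y and eps.

def dcn(a, c):
    """Widest X such that a + X stays inside c."""
    lo, hi = c[0] - a[0], c[1] - a[1]
    if lo > hi:
        raise ValueError("c is too narrow relative to a")
    return (lo, hi)

mu_eps,  mu_Y  = (-1.0, 1.0), (5.0, 9.0)
var_eps, var_Y = (0.25, 1.0), (4.0, 9.0)   # sd intervals [1/2, 1] and [2, 3], squared

mu_X  = dcn(mu_eps,  mu_Y)    # (6.0, 8.0)
var_X = dcn(var_eps, var_Y)   # (3.75, 8.0); the sd of X is the sqrt of this
```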
Deconvolutions with p-boxes
• As for backcalculations, computation of
deconvolutions is troublesome in
probability theory, but often much simpler
with p-boxes
• Deconvolution didn’t have an analog in
interval analysis (until now via p-boxes)
Relaxing over-determination
• Constraint problems almost never have solutions with precise probability distributions
• The constraints are too numerous and strict
• P-boxes relax these constraints so that many
problems can have solutions
P-boxes in interval analysis
• P-boxes bring probability distributions into the realm
of intervals
• Express and solve backcalculation problems better
than is possible in probability theory by itself
• Generalize the notion of tolerance solutions (kernels)
• Relax the unwarranted assumptions about priors that a Bayesian approach to updating problems requires
• Introduce deconvolution into interval analysis
Acknowledgments
• Janos Hajagos, Stony Brook University
• Lev Ginzburg, Stony Brook University
• David Myers, Applied Biomathematics
• National Institutes of Health SBIR program
End