Bayesian Methods


Bayesian Methods
What they are and how they fit into
Forensic Science
Outline
• Bayes’ Rule
• Bayesian Statistics (Briefly!)
• Conjugates
• General parametric models using BUGS/MCMC software
• Bayesian Networks
• Some software: GeNIe, SamIam, Hugin, gR R-packages
• Bayesian Hypothesis Testing
• The “Bayesian Framework” in Forensic Science
• Likelihood Ratios with Bayesian Network software
A little about conditional probability

Pr(A | B) = Pr(A ∩ B) / Pr(B)

Pr(B | A) = Pr(B ∩ A) / Pr(A)

Since Pr(A ∩ B) = Pr(B ∩ A), rearranging gives Bayes' Rule:

Pr(A | B) = Pr(B | A) Pr(A) / Pr(B)
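As a quick sanity check, Bayes' rule can be verified numerically from any small joint distribution; the joint probabilities below are made-up illustration values, not from the slides:

```python
# Verify Bayes' rule on a small, invented joint distribution over
# two binary events A and B.
pr_joint = {("A", "B"): 0.12, ("A", "notB"): 0.28,
            ("notA", "B"): 0.18, ("notA", "notB"): 0.42}

pr_B = pr_joint[("A", "B")] + pr_joint[("notA", "B")]   # marginal Pr(B)
pr_A = pr_joint[("A", "B")] + pr_joint[("A", "notB")]   # marginal Pr(A)

pr_A_given_B = pr_joint[("A", "B")] / pr_B              # Pr(A | B)
pr_B_given_A = pr_joint[("A", "B")] / pr_A              # Pr(B | A)

# Bayes' rule: Pr(A | B) = Pr(B | A) Pr(A) / Pr(B)
assert abs(pr_A_given_B - pr_B_given_A * pr_A / pr_B) < 1e-12
print(pr_A_given_B)  # 0.12 / 0.30 = 0.4
```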
Probability
• Frequency: the ratio of the number of observations of interest (n_i) to the total number of observations (N):

frequency of observation i = n_i / N

• It is EMPIRICAL!
• Probability (frequentist): the frequency of observation i in the limit of a very large number of observations
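The "limit of a very large number of observations" can be illustrated with a short simulation; the fair-die example below is our own illustration, not from the slides:

```python
import random

# Frequentist probability as a limiting frequency: the observed
# frequency of rolling a "6" approaches 1/6 as N grows.
random.seed(1)
for N in (100, 10_000, 1_000_000):
    n_six = sum(1 for _ in range(N) if random.randint(1, 6) == 6)
    print(N, n_six / N)  # frequency drifts toward 1/6 ~ 0.1667
```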
Probability
• Belief: a Bayesian's interpretation of probability.
• An observation (outcome, event) is a "measure of the state of knowledge" [Jaynes].
• Bayesian probabilities reflect degrees of belief and can be assigned to any statement.
• Beliefs (probabilities) can be updated in light of new evidence (data) via Bayes' theorem.
Bayesian Statistics
• The basic Bayesian philosophy:

Prior Knowledge × Data = Updated Knowledge
(a better understanding of the world)

Prior × Data = Posterior
Bayesian Statistics
• Bayesian-ism can be a lot like a religion
• Different "sects" of (dogmatic) Bayesians don't believe other "sects" are "true Bayesians"

The major Bayesian "churches":
• Parametric: BUGS (Bayesian inference Using Gibbs Sampling), MCMC (Markov chain Monte Carlo); Andrew Gelman (Columbia), David Spiegelhalter (Cambridge)
• Bayes Nets / Graphical Models: Steffen Lauritzen (Oxford), Judea Pearl (UCLA)
• Empirical Bayes: data-driven; Brad Efron (Stanford)
Bayesian Statistics
• What's a Bayesian…??
• Someone who adheres ONLY to the belief interpretation of probability?
• Someone who uses Bayesian methods?
• Someone who ONLY uses Bayesian methods?
• Someone who usually likes to beat up on frequentist methodology…
Bayesian Statistics
• Actually DOING Bayesian statistics is hard!
• Why? (We will update this prior belief later.)
Parametric Bayesian Methods
• All probability functions are "parameterized"
• E.g., a Gaussian is parameterized by its mean and standard deviation
Bayesian Statistics
• We have a prior "belief" for the value of the mean
• We observe some data
• What can we say about the mean now?
• We need Bayes' rule. YUK! And this is for an "easy" problem.
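The slide's equation did not survive transcription; for data x = (x₁, …, xₙ) and an unknown mean μ, the update being described is the standard continuous form of Bayes' rule (a reconstruction, not the slide's exact notation):

```latex
p(\mu \mid x) \;=\; \frac{p(x \mid \mu)\, p(\mu)}{\displaystyle\int p(x \mid \mu)\, p(\mu)\, d\mu}
```

The "YUK" is the integral in the denominator: outside of special cases it has no closed form and must be computed numerically.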
Bayesian Statistics
• So what can we do????
• Until ~1990: get lucky…
  • Sometimes we can work out the integrals by hand
  • Sometimes the posteriors have the same form as the priors (conjugacy)
• Now there is software to evaluate the integrals. Some free stuff:
  • MCMC: WinBUGS, OpenBUGS, JAGS
  • HMC: Stan
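Conjugacy deserves a concrete sketch: for a Gaussian likelihood with known variance and a Gaussian prior on the mean, the posterior is again Gaussian, so no integration is needed. The prior and data values below are illustrative assumptions:

```python
import math

# Conjugate Normal-Normal update for an unknown mean.
mu0, tau0 = 0.0, 2.0        # prior mean and prior s.d. for the mean
sigma = 1.0                 # known s.d. of each observation
data = [1.2, 0.8, 1.5, 1.1]

n = len(data)
xbar = sum(data) / n

# Precisions (1/variance) add under the conjugate update:
post_prec = 1 / tau0**2 + n / sigma**2
post_mean = (mu0 / tau0**2 + n * xbar / sigma**2) / post_prec
post_sd = math.sqrt(1 / post_prec)
print(post_mean, post_sd)  # posterior pulled from the prior toward xbar
```

The posterior mean is a precision-weighted average of the prior mean and the sample mean, which is why the update stays closed-form.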
Bayesian Networks
• A "scenario" is represented by a joint probability function
  • It contains variables relevant to the situation, which represent uncertain information
  • It contains "dependencies" between variables that describe how they influence each other
• A graphical way to represent the joint probability function is with nodes and directed edges
• Called a Bayesian Network [Pearl]
Bayesian Networks
• (A Very!!) Simple example [Wiki]:
  • What is the probability the Grass is Wet?
    • Influenced by the possibility of Rain
    • Influenced by the possibility of Sprinkler action
  • Sprinkler action is influenced by the possibility of Rain
• Construct a joint probability function to answer questions about this scenario:
  • Pr(Grass Wet, Rain, Sprinkler)
Bayesian Networks

Pr(Rain):
  Rain:       yes    no
              20%    80%

Pr(Sprinkler | Rain):
  Sprinkler:  Rain: yes   Rain: no
  was on          1%         40%
  was off        99%         60%

Pr(Grass Wet | Rain, Sprinkler):
  Sprinkler:  was on   was on   was off   was off
  Rain:       yes      no       yes       no
  Grass Wet:
    yes       99%      90%      80%        0%
    no         1%      10%      20%      100%
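The tables above are the standard Wikipedia sprinkler example, so queries such as "given that the grass is wet, did it rain?" can be answered by brute-force enumeration of the joint; a minimal sketch (variable names are ours):

```python
from itertools import product

# Enumerate the joint Pr(R, S, W) = Pr(R) * Pr(S | R) * Pr(W | R, S)
# using the CPT values from the tables above.
pr_rain = {True: 0.2, False: 0.8}
pr_sprinkler_on = {True: 0.01, False: 0.4}          # Pr(S = on | R)
pr_wet = {(True, True): 0.99, (True, False): 0.8,   # Pr(W = wet | R, S)
          (False, True): 0.9, (False, False): 0.0}

def joint(r, s, w):
    p_s = pr_sprinkler_on[r] if s else 1 - pr_sprinkler_on[r]
    p_w = pr_wet[(r, s)] if w else 1 - pr_wet[(r, s)]
    return pr_rain[r] * p_s * p_w

# Pr(Rain = yes | GrassWet = yes): sum out the sprinkler.
num = sum(joint(True, s, True) for s in (True, False))
den = sum(joint(r, s, True) for r, s in product((True, False), repeat=2))
print(num / den)  # ~ 0.3577
```

For networks this small, enumeration is exact; the BN software listed later does the same computation efficiently on much larger graphs.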
Bayesian Networks
• You observe the grass is wet; the other probabilities, Pr(Rain) and Pr(Sprinkler), are adjusted given the observation.
[Figure: the network with updated node probabilities after entering the evidence "Grass Wet".]
Bayesian Networks
• Areas where Bayesian Networks are used
• Medical recommendation/diagnosis
• IBM/Watson, Massachusetts General Hospital/DXplain
• Image processing
• Business decision support
• Boeing, Intel, United Technologies, Oracle, Philips
• Information search algorithms and on-line recommendation
engines
• Space vehicle diagnostics
• NASA
• Search and rescue planning
• US Military
• Requires software. Some free stuff:
• GeNIe (University of Pittsburgh) [G]
• SamIam (UCLA) [S]
• Hugin (free only for a few nodes) [H]
• gR R-packages [gR]
Bayesian Statistics
[Figure: Bayesian network for the provenance of a painting given trace evidence found on that painting.]
Bayesian Statistics
• Frequentist hypothesis testing:
  • Assume/derive a "null" probability model for a statistic
  • E.g.: sample averages follow a Gaussian curve
  • If the sample statistic falls far in the tail: "Wow!" That's an unlikely value under the null hypothesis (small p-value)
Bayesian Statistics
• Bayesian hypothesis testing:
  • Assume/derive a "null" probability model for a statistic, p(x | null)
  • Assume an "alternative" probability model, p(x | alt)
  • Compare how probable the observed statistic is under each model
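The comparison of p(x | null) and p(x | alt) at the observed statistic can be sketched in a few lines; the specific models here (null N(0, 1), alternative N(3, 1)) are assumed for illustration only:

```python
from math import exp, pi, sqrt

# Density of a Normal(mu, sd) at x.
def normal_pdf(x, mu, sd):
    return exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * sqrt(2 * pi))

x = 2.0  # the observed statistic
support_ratio = normal_pdf(x, 3.0, 1.0) / normal_pdf(x, 0.0, 1.0)
print(support_ratio)  # e^1.5 ~ 4.48: x = 2 is better explained by the alternative
```

Unlike a p-value, this ratio weighs the observation under both models rather than only asking how extreme it is under the null.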
The "Bayesian Framework"
• Bayes' Rule [Aitken, Taroni]:
  • Hp = the prosecution's hypothesis
  • Hd = the defence's hypothesis
  • E = any evidence
  • I = any background information

Pr(Hp | E, I) = Pr(E | Hp, I) Pr(Hp | I) / Pr(E | I)

Pr(Hd | E, I) = Pr(E | Hd, I) Pr(Hd | I) / Pr(E | I)
The "Bayesian Framework"
• Odds form of Bayes' Rule:

Pr(Hp | E, I) / Pr(Hd | E, I) = [Pr(E | Hp, I) / Pr(E | Hd, I)] × [Pr(Hp | I) / Pr(Hd | I)]

where the left side is the posterior odds in favour of the prosecution's hypothesis, the first factor on the right is the Likelihood Ratio, and the second is the prior odds in favour of the prosecution's hypothesis:

Posterior Odds = Likelihood Ratio × Prior Odds
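A worked numerical instance of the odds form, with illustrative numbers (not from the slides): prior odds of 1 to 100 in favour of Hp, and evidence with LR = 1000.

```python
# Posterior Odds = Likelihood Ratio x Prior Odds, with made-up numbers.
prior_odds = 0.01          # 1 to 100 in favour of Hp
lr = 1000.0                # weight of the evidence
posterior_odds = lr * prior_odds                   # 10 to 1 in favour of Hp
posterior_prob = posterior_odds / (1 + posterior_odds)  # odds -> probability
print(posterior_odds, round(posterior_prob, 3))
```

Strong evidence can overturn long prior odds, but the posterior still depends on both factors; the LR alone is not a posterior probability.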
The "Bayesian Framework"
• The likelihood ratio has largely come to be the main quantity of interest in the forensic statistics literature:

LR = Pr(E | Hp, I) / Pr(E | Hd, I)

• A measure of how much "weight" or "support" the "evidence" gives to one hypothesis relative to the other
  • Here, Hp relative to Hd
• Major players: Evett, Aitken, Taroni, Champod
• Influenced by Dennis Lindley
The "Bayesian Framework"

LR = Pr(E | Hp, I) / Pr(E | Hd, I)

• The likelihood ratio ranges from 0 to infinity
• Points of interest on the LR scale:
  • LR = 0: the evidence TOTALLY DOES NOT SUPPORT Hp relative to Hd
  • LR = 1: the evidence does not support either hypothesis more strongly
  • LR = ∞: the evidence TOTALLY SUPPORTS Hp relative to Hd
The "Bayesian Framework"

LR = Pr(E | Hp, I) / Pr(E | Hd, I)

• A standard verbal scale for LR "weight of evidence" IS IN NO WAY, SHAPE OR FORM SETTLED IN THE STATISTICS LITERATURE!
• A popular verbal scale is due to Jeffreys, but there are others
• READ the British R v. T footwear case!
Bayesian Networks
• The Likelihood Ratio can be obtained from the BN once evidence is entered
• Use the odds form of Bayes' theorem: divide the probabilities of the hypotheses after the evidence is entered (posterior odds) by the probabilities of the hypotheses before the evidence is entered (prior odds)
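For the sprinkler network, this recipe can be carried out directly; a self-contained sketch computing the LR for Hp: "it rained" vs Hd: "it did not rain" after observing wet grass (variable names are ours):

```python
# CPTs of the standard sprinkler example.
cpt_s = {True: 0.01, False: 0.4}                   # Pr(S = on | R)
cpt_w = {(True, True): 0.99, (True, False): 0.8,
         (False, True): 0.9, (False, False): 0.0}  # Pr(W = wet | R, S)

# Pr(R, W = wet) for each rain state, summing out the sprinkler:
pr_rw = {r: sum((0.2 if r else 0.8)
                * (cpt_s[r] if s else 1 - cpt_s[r])
                * cpt_w[(r, s)]
                for s in (True, False))
         for r in (True, False)}

posterior_odds = pr_rw[True] / pr_rw[False]  # Pr(W = wet) cancels in the ratio
prior_odds = 0.2 / 0.8
print(posterior_odds / prior_odds)  # equals Pr(W | Hp) / Pr(W | Hd) ~ 2.23
```

Dividing posterior odds by prior odds recovers exactly Pr(evidence | Hp) / Pr(evidence | Hd), which is why BN software can report an LR after evidence is entered.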
The "Bayesian Framework"
• Computing the LR from our painting provenance example:
  • How good of a "match" is it?
Empirical Bayes [Efron]
• An I.D. is output for each questioned toolmark
  • This is a computer "match"
• What's the probability the tool is truly the source of the toolmark?
• A similar problem arises in genomics when detecting disease from microarray data
  • They use data and Bayes' theorem to get an estimate
  • No disease (genomics) = not a true "match" (toolmarks)
Empirical Bayes
• We use Efron's machinery for the "empirical Bayes two-groups model" [Efron]
• Surprisingly simple!
  • Use binned data to do a Poisson regression
• Some notation:
  • S−: truly no association; the null hypothesis
  • S+: truly an association; the non-null hypothesis
  • z: a score derived from a machine-learning task to I.D. an unknown pattern with a group
  • z is a Gaussian random variate under the null
Empirical Bayes
• From Bayes' theorem we can get [Efron]:

Pr̂(S− | z) = Pr̂(S−) p̂(z | S−) / f̂(z)

the estimated probability of not a true "match" given the algorithm's output z-score associated with its "match".

• Names: Posterior error probability (PEP) [Kall]; Local false discovery rate (lfdr) [Efron]
• Suggested interpretation for casework:

1 − Pr̂(S− | z) = estimated "believability" of a machine-made association

• We agree with Gelman and Shalizi [Gelman]:
"…posterior model probabilities …[are]… useful as tools for prediction and for understanding structure in data, as long as these probabilities are not taken too seriously."
Empirical Bayes
• Bootstrap procedure to get an estimate of the KNM distribution of "Platt scores" [Platt, e1071]
  • Uses a "training" set
  • Use this to get p-values/z-values on a "validation" set
  • Inspired by Storey and Tibshirani's null estimation method [Storey]
• Use an SVM to get the KM and KNM "Platt score" distributions
  • Uses a "validation" set
From the fitted histogram, Efron's method gives:

f̂(z): the "mixture" density
p̂(z | S−): the z-density given KNM (should be Gaussian)
Pr̂(S−): an estimate of the prior for KNM
What's the point?? We can test the fits to f̂(z) and p̂(z | S−)!
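To make the two-groups machinery concrete, here is a sketch of the posterior computation Pr̂(S− | z) = Pr̂(S−) p̂(z | S−) / f̂(z). The "fitted" quantities below (prior 0.95, null N(0, 1), non-null N(3, 1)) are stand-in assumptions, not the output of an actual locfdr fit:

```python
from math import exp, pi, sqrt

pr_null = 0.95                      # stand-in for Pr^(S-)
def p_null(z):                      # stand-in for p^(z | S-): N(0, 1)
    return exp(-0.5 * z * z) / sqrt(2 * pi)
def p_alt(z):                       # assumed non-null density: N(3, 1)
    return exp(-0.5 * (z - 3) ** 2) / sqrt(2 * pi)
def f(z):                           # the two-groups mixture density f^(z)
    return pr_null * p_null(z) + (1 - pr_null) * p_alt(z)

def lfdr(z):                        # local false discovery rate / PEP
    return pr_null * p_null(z) / f(z)

for z in (0.0, 2.0, 4.0):
    print(z, round(lfdr(z), 4))     # PEP falls as the z-score grows
```

In the real procedure, p̂(z | S−), Pr̂(S−), and f̂(z) are estimated from the binned z-scores (the Poisson-regression step above) rather than assumed.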
[Figure: Posterior association probability ("believability curve"): 12D PCA-SVM locfdr fit for Glock primer shear patterns, ± 2 standard errors.]
Bayes Factors/Likelihood Ratios
• In the "Forensic Bayesian Framework", the likelihood ratio is the measure of the weight of evidence
• LRs are called Bayes Factors by most statisticians
• LRs give the measure of support the "evidence" lends to the "prosecution hypothesis" vs. the "defense hypothesis"
• From Bayes' theorem:

LR = Pr(E | Hp) / Pr(E | Hd) = [Pr(Hp | E) / Pr(Hd | E)] / [Pr(Hp) / Pr(Hd)] = Posterior Odds / Prior Odds
Bayes Factors/Likelihood Ratios
• Once the "fits" for the empirical Bayes method are obtained, it is easy to compute the corresponding likelihood ratios.
• Using the identity:

Pr(Hp | E) = 1 − Pr̂(S− | z)

the likelihood ratio can be computed as:

LR(z) = [ (1 − Pr̂(S− | z)) / Pr̂(S− | z) ] ÷ [ (1 − Pr̂(S−)) / Pr̂(S−) ]
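The formula above is a one-liner once the PEP and the prior are in hand; the numeric inputs below are illustrative, not fitted values:

```python
# Convert an estimated PEP, Pr^(S- | z), and estimated prior, Pr^(S-),
# into a likelihood ratio via LR(z) = posterior odds / prior odds.
def lr_from_pep(pep, prior_null):
    posterior_odds = (1 - pep) / pep            # odds of a true association given z
    prior_odds = (1 - prior_null) / prior_null  # odds of a true association a priori
    return posterior_odds / prior_odds

# e.g. PEP = 0.10 with Pr^(S-) = 0.95: (0.9/0.1) / (0.05/0.95) = 9 * 19 = 171
print(lr_from_pep(0.10, 0.95))
```

Note that dividing out the prior odds is what turns the posterior-flavoured PEP back into a prior-free weight of evidence.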
Bayes Factors/Likelihood Ratios
• Using the fit posteriors and priors we can obtain the likelihood ratios [Tippett, Ramos]
[Figure: distributions of known-match and known-non-match LR values.]
Acknowledgements
• Professor Chris Saunders (SDSU)
• Professor Christophe Champod (Lausanne)
• Alan Zheng (NIST)
• Research Team:
  • Dr. Martin Baiker
  • Ms. Helen Chan
  • Ms. Julie Cohen
  • Mr. Peter Diaczuk
  • Dr. Peter De Forest
  • Mr. Antonio Del Valle
  • Ms. Carol Gambino
  • Dr. James Hamby
  • Ms. Alison Hartwell, Esq.
  • Dr. Thomas Kubic, Esq.
  • Ms. Loretta Kuo
  • Ms. Frani Kammerman
  • Dr. Brooke Kammrath
  • Mr. Chris Luckie
  • Off. Patrick McLaughlin
  • Dr. Linton Mohammed
  • Mr. Nicholas Petraco
  • Dr. Dale Purcel
  • Ms. Stephanie Pollut
  • Dr. Peter Pizzola
  • Dr. Graham Rankin
  • Dr. Jacqueline Speir
  • Dr. Peter Shenkin
  • Ms. Rebecca Smith
  • Mr. Chris Singh
  • Mr. Peter Tytell
  • Ms. Elizabeth Willie
  • Ms. Melodie Yu
  • Dr. Peter Zoon