Bayesian Signal Processing

Ercan Engin Kuruoglu
Why do we study Bayesian Theory?
It allows us to formulate our prior knowledge or our
belief about the data.
Classical techniques ignore any prior information we
might have about the problem.
Many real world applications require processing vast
amounts of data.
We have to be selective and look for solutions in the right
We have to be able to learn from the changing nature of
It closely mimics how our own brain learns and
reasons.
Why now and not before?
Our computational power can now handle the
computational complexity of numerical Bayesian
Radar, sonar
Image processing
Machine vision
Biomedical signal processing
Our approach
Bayesian theory does not involve difficult mathematics or
hard-to-decide strategies.
It is a unified theory and approach
Our discussion will be more heuristic than rigorous
We will first try to adopt the Bayesian way of thinking
We will obtain some basis of the theory by analytical
Numerical techniques will be emphasized since they are an
active area of research.
We will discuss various applications and see how
your own research could adopt a Bayesian formulation.
Our aims
To learn the basics of Bayesian theory
To understand its potential
To see a picture of current research on
applications of Bayesian theory
To see whether we can use it in our research
History – Thomas Bayes
Bayes’ Tomb
Rev. Thomas Bayes (1701-1761)
Bayes was born to an
English family from
Sheffield (Yorkshire)
Bayes was a Presbyterian,
and a nonconformist one
He therefore could not study at
Cambridge or Oxford.
Studied at Edinburgh.
In Edinburgh took
mathematics lessons from
James Gregory
Religious writings
Divine Benevolence
Royal society member
Critic of many mathematical works
Close contact with John Canton and Richard Price
Natural sciences
Strong defender of Isaac Newton
Doctrine of Fluxions
Interest in infinite series
advanced the work of Maclaurin
The publication of Bayes’ Theorem
The famous paper on Bayes’ Theorem was published posthumously
in 1763 by Price, who communicated it to the Royal Society:
“An Essay Towards Solving a Problem in the Doctrine of
Chances”, Philosophical Transactions of the Royal Society
of London, vol 53, pp. 370-418.
Reference: D.R. Bellhouse, “The reverend Thomas Bayes: a
biography to celebrate the tercentenary of his birth,”
Statistical Science, 2004, Vol. 19, No. 1, pp. 3-43.
Bayes’ theory attracted much attention after its publication but
received much criticism because of its philosophical nature.
Initial reaction was due to non-orthodox implications in
The theory was given a formal framework by Laplace (1774).
During the 19th century it was almost completely abandoned,
but starting with the work of Jeffreys in the 1930s it was
once again placed at the center of controversy.
Cox’s work in 1946 helped clarify several issues.
These debates still continue. The theory is still resisted by the
frequentist school, but since the 1980s the Bayesian standpoint
has become increasingly accepted.
Frequentist vs Bayesian view of probability
Frequentist vs Bayesian Probability
There are two main interpretations of probability:
Frequentist or objective or empirical (classical)
Bayesian or subjective or evidential
Frequentist Probability
The frequentist views probability as a long-run
relative frequency.
Events are imagined as outcomes of
experiments that can be repeated indefinitely under
identical conditions.
The frequentist probability of an event occurring
is the limit, as the number of experiments goes
to infinity, of the proportion of times that the
event occurs.
Frequentists are outside observers!
Frequentist example
Experiment: Tossing a fair coin
Possible events : {H, T}
A frequentist would assign probability to the event
“tails” by imagining the experiment being run infinitely
many times and measuring the proportion of the times a
T comes up.
This concept of probability was first proposed by Venn
in 1866 and it leads to the classical, frequentist approach
to statistical inference.
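The long-run relative frequency can be illustrated with a small simulation. This is only a sketch: the function name `relative_frequency_of_tails`, the seed, and the sample sizes are assumptions made for illustration, and a finite simulation can only suggest the limit, not reach it.

```python
import random


def relative_frequency_of_tails(n_tosses, seed=0):
    """Proportion of tails in n_tosses simulated tosses of a fair coin."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    tails = sum(1 for _ in range(n_tosses) if rng.random() < 0.5)
    return tails / n_tosses


# The proportion of tails settles near one half as the number of tosses grows.
for n in (10, 1_000, 100_000):
    print(n, relative_frequency_of_tails(n))
```

The frequentist probability is the value this proportion approaches as the number of tosses grows without bound.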
Subjective probability
Another way to view probability is as a personal, but
rational, measure of certainty/uncertainty based on available
evidence.
This view of probability is called subjective.
However, when available evidence is empirical, the
frequentist and subjective probabilities coincide.
The subjectivist does not need to imagine tossing coins
infinitely many times: the evidence tells us that the coin has
two sides and is fair, and therefore the probability of throwing
T on a single toss is one half.
Subjectivists are part of the experiment, since they act with
their belief based on evidence.
Example 2
The probability of passing this course this
The subjectivist can look at how you performed in
previous statistics courses, your skills and study habits
and guess that it is 0.9.
The frequentist, however, cannot make you take the exams
infinitely many times. They cannot rationally and systematically
conceptualize this process to provide a meaningful
probability statement regarding this event.
Thus, subjective probabilities are more general than
frequentist ones, as they can be used to assign
uncertainty to single, unique events.
Parameter uncertainty
Frequentist (classical) statistics assumes that population
parameters are unknown but deterministic constants.
Estimation proceeds by randomly sampling from the population and
using this data to estimate the parameters of interest.
So the truth is out there: you collect data, fit the model to it, and
find the fixed parameters.
The subjectivist statistics views the parameters unknown and
To estimate the parameter, we sample data, but since it is only
sampled data, we will never be able to know the exact value of our
quantity. We will, however, know more after the sample has been
studied than we did before.
Subjectivist view of learning
A subjectivist is uncertain about parameter values before
sample data are collected. The new evidence in the sample will
reduce the uncertainty. The subjectivist measures uncertainty
using probability.
The method of statistical inference that starts with initial
uncertainty about parameter values and then modifies this
uncertainty using sample information is known as Bayesian
statistical inference, or Bayesian learning.
Bayesian learning naturally reflects our way of learning since
childhood and also the way scientific knowledge is advanced.
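A minimal sketch of this learning loop, under assumptions that are not in the slides: two made-up hypotheses about a coin ("fair" and "biased"), a made-up sequence of tosses, and a helper named `update`. After each observation the posterior becomes the prior for the next one.

```python
def update(prior, likelihood):
    """One learning step: posterior is proportional to prior * likelihood."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())
    return {h: value / total for h, value in unnormalized.items()}


# Hypothetical states of nature: probability of tails under each hypothesis.
p_tails = {"fair": 0.5, "biased": 0.8}
# Assumed initial uncertainty: both hypotheses equally plausible.
belief = {"fair": 0.5, "biased": 0.5}

history = [dict(belief)]
for toss in "TTTH":  # made-up observations, T = tails, H = heads
    likelihood = {h: (p if toss == "T" else 1 - p) for h, p in p_tails.items()}
    belief = update(belief, likelihood)  # the posterior becomes the next prior
    history.append(dict(belief))

for step, state in enumerate(history):
    print(step, state)
```

Processing the tosses one at a time gives the same final belief as processing them all at once, because the tosses are modelled as independent given the hypothesis.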
Introduction to Bayesian Inference
What do hypotheses predict about potential data?
How does data support (or undermine) hypotheses?
Deduction: Deduce outcomes from hypotheses
If A then B (A → B)
A is true
Therefore B
Induction: Infer hypotheses from outcomes
If A then we are likely to observe B and C
B and C are observed
Therefore A is supported
How new data support hypotheses
What do we infer if we observe D1? D2? D3?
Observing D1 refutes H1, supports H2 a little and
H3 strongly
Observing D2 supports H1 and H3 a little and H2
Statistical inference is not magic
Cannot get information that isn’t present in the data
Statistical inference is easily misused
Watch out for “Garbage in, Garbage out”
Often the right solution is a better experiment
or better observations, not slick statistical
For a Bayesian, probability is conditioned on what
each individual knows. It can vary from individual
to individual.
Probability is not “out there”. It is in your head.
“Probability does not exist”—Phil Dawid
Probability is about epistemology, not ontology.
Probability as belief
In the Bayesian view, probability describes your
degree of belief, given what you know.
R.T. Cox (and independently, I.J. Good) proposed the
following reasonable assumptions about plausibilities or
degrees of belief
Plausibility should be transitive, i.e., if A is more plausible than B
and B more plausible than C then A is more plausible than C. This
means that it should be possible to attach a real number P(A) to
each proposition, and rank plausibilities by those numbers
The plausibility of ~A (not-A, the negation of A, also written Ā) should be
some function of the plausibility of A: P(~A)=f(P(A))
The plausibility of (A and B) should be some function of the
plausibility of A given that B is true, and the plausibility of B:
P(A&B)=g(P(A|B),P(B))
The Conditional Nature of Probability
Probability is always conditional, that is, the probability
that You assign to something is always dependent on
information that You have. Someone else, with different
information, will usually, and legitimately, assign a
different probability to a proposition than You will
Background information is information that is assumed but
not always stated. Such information could include things
like how to do mathematics, basic physical and
astronomical knowledge that You may have, things You
learned while growing up, and so on. There is always
background information, which we write H and often
include as an explicit reminder of this fact.
Failure to include H can lead to apparent paradoxes in
probability theory
Probability Axioms
We write P(A|H) to represent “the probability that A is true,
given that H is true”. It is a real number. P(A|H) satisfies
the following axioms of probability:
0 ≤ P(A|H) ≤ 1
P(A|A,H) = 1
P(A|H) + P(~A|H) = 1 where ~A is the negation of A
P(A&B|H) = P(A|B&H) P(B|H) product law, and definition of
conditional probability
• P(A|B&H) = P(A&B|H)/P(B|H) if P(B|H)≠0
If A and B are mutually exclusive propositions, then
P(A or B|H) = P(A|H)+P(B|H)
[Derivable from the product law, hence not an independent axiom]
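The rules above can be checked numerically on a small joint table. The sketch below assumes a made-up joint distribution over two propositions A and B. The background information H is implicit in the table, and the helper names `prob` and `prob_given` are illustrative only.

```python
# Hypothetical joint probabilities P(A&B|H) for two propositions A and B.
joint = {
    (True, True): 0.12,
    (True, False): 0.18,
    (False, True): 0.28,
    (False, False): 0.42,
}


def prob(event):
    """P(event|H): sum the joint table over the outcomes where event holds."""
    return sum(p for (a, b), p in joint.items() if event(a, b))


def prob_given(event, condition):
    """P(event|condition&H), defined only when P(condition|H) != 0."""
    p_condition = prob(condition)
    if p_condition == 0:
        raise ValueError("cannot condition on a zero-probability proposition")
    return prob(lambda a, b: event(a, b) and condition(a, b)) / p_condition


def is_a(a, b):
    return a


def is_b(a, b):
    return b


def is_not_a(a, b):
    return not a


def is_a_and_b(a, b):
    return a and b


def is_a_and_not_b(a, b):
    return a and not b


tol = 1e-12
checks = {
    "bounds": 0 <= prob(is_a) <= 1,
    "certainty": abs(prob_given(is_a, is_a) - 1) < tol,
    "negation": abs(prob(is_a) + prob(is_not_a) - 1) < tol,
    "product": abs(prob(is_a_and_b) - prob_given(is_a, is_b) * prob(is_b)) < tol,
    # (A and B) and (A and not-B) are mutually exclusive, and together they make up A.
    "sum": abs(prob(is_a) - (prob(is_a_and_b) + prob(is_a_and_not_b))) < tol,
}
print(checks)
```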
The Bayes’ Theorem
Suppose we have two events, A and B
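$$P(A \cap B) = P(A \mid B)\,P(B)$$
$$P(A \cap B) = P(B \mid A)\,P(A)$$
Equating the right-hand sides and dividing by P(B), assuming P(B) ≠ 0:
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$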
Bayes’ Theorem is a trivial result of the definition of
conditional probability: when P(D)≠0,
$$P(\theta \mid D) = \frac{P(\theta \,\&\, D)}{P(D)} = \frac{P(D \mid \theta)\,P(\theta)}{P(D)} \propto P(D \mid \theta)\,P(\theta)$$
Note that the denominator P(D) is nothing but a
normalization constant required to make the total
probability on the left sum to 1
Often we can dispense with the denominator, leaving its
calculation until last, or even leave it out altogether!
Bayes’ theorem is a model for learning. Thus, suppose we
have an initial or prior belief about the truth of θ. Suppose
we observe some data D. Then we can calculate our
revised or posterior belief about the truth of θ, in the light
of the new data D, using Bayes’ theorem:
$$P(\theta \mid D) = \frac{P(\theta \,\&\, D)}{P(D)} = \frac{P(D \mid \theta)\,P(\theta)}{P(D)} \propto P(D \mid \theta)\,P(\theta)$$
$$P(\theta \mid D) = \frac{P(D \mid \theta)\,P(\theta)}{P(D)}$$
P(D|θ): likelihood
P(θ): prior distribution (quantification of our belief)
P(D): marginal of the data distribution
P(θ|D): posterior distribution
$$P(D) = \sum_i P(D \,\&\, \theta_i) = \sum_i P(D \mid \theta_i)\,P(\theta_i)$$
The Bayesian mantra:
posterior ∝ prior × likelihood
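For a finite set of hypotheses the mantra translates almost line by line into code. The following is a sketch with made-up numbers for three hypotheses. The function name `posterior` and the input values are assumptions for illustration.

```python
def posterior(prior, likelihood):
    """Return (posterior, evidence) for a finite set of hypotheses theta_i.

    prior[i]      = P(theta_i)
    likelihood[i] = P(D | theta_i) for the data D actually observed
    """
    joint = [p * l for p, l in zip(prior, likelihood)]  # P(D & theta_i)
    evidence = sum(joint)  # P(D), the normalization constant
    if evidence == 0:
        raise ValueError("P(D) = 0: the data are impossible under every hypothesis")
    return [j / evidence for j in joint], evidence


# Hypothetical numbers for three hypotheses.
prior = [0.5, 0.3, 0.2]
likelihood = [0.1, 0.4, 0.7]
post, p_d = posterior(prior, likelihood)
print(post, p_d)
```

The division by `evidence` happens only in the last step, so the ranking of the hypotheses is already fixed by the products of prior and likelihood.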
In the special case that there are only two states of nature,
A1 and A2=~A1, we can bypass the calculation of the
marginal likelihood by using the odds ratio, the ratio of the
probabilities of the two hypotheses:
$$\text{Prior odds} = \frac{P(A_1)}{P(A_2)}$$
$$\text{Posterior odds} = \frac{P(D \mid A_1)\,P(A_1)}{P(D \mid A_2)\,P(A_2)} = \text{Likelihood ratio} \times \text{Prior odds}$$
The marginal probability of the data, P(D), is the same in
each case and cancels out
The likelihood ratio is also known as the Bayes factor.
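A short sketch of the odds form, with made-up numbers: the function `posterior_odds` and its arguments are illustrative names, not part of the slides. The cross-check computes P(D) explicitly to confirm that it cancels.

```python
def posterior_odds(prior_a1, lik_a1, lik_a2):
    """Posterior odds of A1 against A2 = ~A1, without computing P(D)."""
    prior_odds = prior_a1 / (1 - prior_a1)  # P(A1) / P(A2)
    bayes_factor = lik_a1 / lik_a2  # P(D|A1) / P(D|A2)
    return bayes_factor * prior_odds


# Hypothetical numbers: P(A1) = 0.2, P(D|A1) = 0.9, P(D|A2) = 0.3.
odds = posterior_odds(0.2, 0.9, 0.3)

# Cross-check against the full form of Bayes' theorem, which does need P(D).
p_d = 0.9 * 0.2 + 0.3 * 0.8
p_a1_given_d = 0.9 * 0.2 / p_d
p_a2_given_d = 0.3 * 0.8 / p_d
print(odds, p_a1_given_d / p_a2_given_d)
```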
In this case, since A1 and A2 are mutually
exclusive and exhaustive, we can calculate
P(A1|D) as well as P(A1) from the posterior and
prior odds ratios, respectively, and vice versa
$$\text{Odds} = \frac{\text{Probability}}{1 - \text{Probability}}$$
$$\text{Probability} = \frac{\text{Odds}}{1 + \text{Odds}}$$
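The two conversions are inverses of each other, which a small sketch can confirm. The function names and the input checks are assumptions added for illustration.

```python
def odds_from_probability(p):
    """Odds = p / (1 - p); requires 0 <= p < 1."""
    if not 0 <= p < 1:
        raise ValueError("probability must lie in [0, 1)")
    return p / (1 - p)


def probability_from_odds(odds):
    """Probability = odds / (1 + odds); requires odds >= 0."""
    if odds < 0:
        raise ValueError("odds must be non-negative")
    return odds / (1 + odds)


print(odds_from_probability(0.2))
print(probability_from_odds(0.75))
print(probability_from_odds(odds_from_probability(0.2)))  # round trip
```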
The entire program of Bayesian inference can be
encapsulated as follows:
Enumerate all of the possible states of nature and
choose a prior distribution on them that reflects your
honest belief about the probability that each state of
nature happens to be the case, given what you know
Establish the likelihood function, which tells you how
well the data we actually observed are predicted by
each hypothetical state of nature
Compute the posterior distribution by Bayes’ theorem
Summarize the results in the form of marginal
distributions, (posterior) means of interesting quantities,
Bayesian credible intervals, or other useful statistics
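The four steps can be sketched end to end for a toy problem. Everything specific here is an assumption for illustration: the unknown is theta = P(tails) of a coin, the states of nature are a grid of 101 values, the prior is uniform over the grid, the data (14 tails in 20 tosses) are made up, and the likelihood is binomial.

```python
from math import comb

# 1. Enumerate the states of nature: a grid of candidate values for theta.
thetas = [i / 100 for i in range(101)]
# Assumed prior: every grid value equally plausible before seeing data.
prior = [1 / len(thetas)] * len(thetas)

# 2. Likelihood of the made-up data: k tails in n tosses, binomial model.
n, k = 20, 14
likelihood = [comb(n, k) * t**k * (1 - t) ** (n - k) for t in thetas]

# 3. Posterior by Bayes' theorem: normalize prior * likelihood.
unnormalized = [p * l for p, l in zip(prior, likelihood)]
evidence = sum(unnormalized)
post = [u / evidence for u in unnormalized]

# 4. Summaries: posterior mean and an equal-tailed 95% credible interval.
post_mean = sum(t * p for t, p in zip(thetas, post))


def quantile(q):
    """Smallest grid value whose cumulative posterior mass reaches q."""
    cumulative = 0.0
    for t, p in zip(thetas, post):
        cumulative += p
        if cumulative >= q:
            return t
    return thetas[-1]


interval = (quantile(0.025), quantile(0.975))
print(post_mean, interval)
```

A grid is only one way to enumerate states of nature. It is used here because it keeps every step explicit. For more than a few parameters, numerical techniques other than a plain grid would be needed.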
That’s it! In Bayesian inference there is one uniform way
of approaching every possible problem in inference
There’s not a collection of arbitrary, disparate “tests” or
“methods”—everything is handled in the same way
So, once you have internalized the basic idea, you can
address problems of great complexity by using the same
uniform approach
Of course, this means that there are no black boxes. You
have to think about the problem you have: establish the
model, think carefully about priors, and decide what summaries
of the results are appropriate. It also requires clear thinking
about what answers you really want, so that you know what
questions to ask.