Introduction to Statistics

Probability and Bayes Theorem
Pregnancy Test Kit
The test kit may either be accurate, or inaccurate.
                     Actually pregnant          Actually not pregnant
Test kit shows +ve   Correct +ve diagnosis      Incorrect +ve diagnosis
                     (Sensitivity, or Power)
Test kit shows –ve   Incorrect –ve diagnosis    Correct –ve diagnosis
                                                (Specificity)
Sensitivity or Specificity?
• Objective of the experiment
• HIV diagnostic kit, 99.9% sensitive (correct identification
of HIV +ves) and 99.5% specific (correct identification of
HIV –ves)
What do these numbers mean?
What’s the interpretation of
probability and chance?
Data exploration and Statistical analysis
1. Data checking, identifying problems and characteristics
2. Understanding chance and uncertainty
Probability
• A mathematical attempt at explaining random
phenomena
• Examples:
• Flipping a coin
• The bus arriving within the next 5 minutes
Types of probability
Experimental (empirical) probabilities
Arbitrarily good estimates of the probability of a certain
outcome of an experiment can be obtained by repeating
the experiment sufficiently often. (e.g. flipping a coin, rolling
a die)
Subjective probabilities
Some phenomena are observed just once and repeated
experiments are impossible. When there is no optimal
model, probabilities are often based on subjective
judgement. (e.g. flash floods in Orchard Road, Wall Street
crashes)
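The idea behind experimental probabilities can be sketched with a short simulation. This is only an illustration and not part of the original slides: the function name estimate_heads_probability and the fixed seed are assumptions made here so that the run is reproducible.

```python
import random

def estimate_heads_probability(n_flips, seed=0):
    """Estimate P(heads) for a fair coin from n_flips simulated flips."""
    rng = random.Random(seed)  # fixed seed (an assumption) keeps runs reproducible
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

# The estimate tends to settle near 0.5 as the number of flips grows.
for n in (10, 1_000, 100_000):
    print(n, estimate_heads_probability(n))
```

With only a few flips the estimate can be far from 0.5; repeating the experiment more often is what makes it reliable.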
Understanding probability
• Need to know what are the possible outcomes (or sample
space (S), in geek-speak)
• Is the outcome predictable, or random?
Basic definitions in probability
• Union of events: A ∪ B means that A or B (or both) occur.
• Intersection of events: A ∩ B means that both A and B
occur at the same time.
• Complementary event: The complement of an event A,
denoted A’ or Ā, occurs if A does not occur.
• Mutually exclusive events: Two events are mutually
exclusive if they cannot occur at the same time.
Suppose that you plan to roll a die just once. Let A be the event that
you get an odd number, and B the event that you get a six. Then A and
B are mutually exclusive.
Basic rules in probability
• P(A) ≥ 0, for any event A, i.e. probabilities are never
negative.
• P(S) = 1, which means that some element of the
probability space will occur for sure as outcome of our
random experiment. Note that the empty set is also found
in the probability space.
• P(A ∪ B) = P(A) + P(B), if A and B are mutually exclusive.
It is actually assumed that this rule generalizes to an
arbitrary number of mutually exclusive events.
Example 1:
Flip a fair coin twice and count the number of heads. Let S
= {0, 1, 2} denote the sample space. Are all elements of S
equally likely?
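One way to check Example 1 is to list the underlying flip sequences and count heads in each. The sketch below is an illustration added here, assuming the four sequences HH, HT, TH, TT are equally likely for a fair coin; the variable names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All four equally likely sequences of two fair-coin flips.
sequences = list(product("HT", repeat=2))

# Count how many sequences give 0, 1 or 2 heads.
head_counts = Counter(seq.count("H") for seq in sequences)

probabilities = {k: Fraction(c, len(sequences)) for k, c in head_counts.items()}
for k in sorted(probabilities):
    print(k, "heads:", probabilities[k])
```

Under this assumption the elements of S are not equally likely: one head can arise in two ways, while zero or two heads can each arise in only one way.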
Combinatorics
• Often sample spaces can be quite large. In such cases,
combinatorics is helpful to figure out the precise size.
• Multiplication rule
Suppose that an experiment (or a procedure) is carried out
k times and that in each step there are n possible
outcomes. All combinations of individual outcomes are
possible. Then there is a total of
n × n × … × n (k times) = n^k
possible outcome sequences.
Example 2:
Suppose a die is rolled six times and consider the sample
space S of all possible outcome sequences. One element
of S would be for instance (4, 2, 2, 6, 1, 3). What is the
number of elements of S?
Generalized multiplication rule
Suppose that an experiment (or a procedure) consists of k
steps and that there are nj possible outcomes for step j. All
combinations of individual outcomes are possible. Then
there is a total of
n1 × n2 × … × nk
possible outcome sequences.
Example 3:
Suppose that a license number consists of four digits
followed by three letters (uppercase). How many different
license numbers are there?
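The multiplication rules can be sketched in a few lines of Python. The helper count_sequences is a hypothetical name introduced for this illustration; it multiplies the number of options per step and is applied to Examples 2 and 3, assuming 10 possible digits and 26 uppercase letters.

```python
import math

def count_sequences(options_per_step):
    """Number of outcome sequences when step j has options_per_step[j] outcomes."""
    return math.prod(options_per_step)

# Example 2: six rolls of a die, six outcomes per roll.
die_sequences = count_sequences([6] * 6)

# Example 3: four digits (10 options each) then three letters (26 options each).
license_numbers = count_sequences([10] * 4 + [26] * 3)

print(die_sequences)    # 46656
print(license_numbers)  # 175760000
```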
Example 4:
DNA strands consist of nucleotide sequences. There are
four possible nucleotides, labeled A, C, T, G. Find the total
number of possible nucleotide sequences of length 50.
What is the chance that a randomly composed sequence of
length 50 will start with the letters “TATA”?
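Example 4 can be worked through as below. This is a sketch under the assumption that each of the four nucleotides is equally likely at every position and that positions are independent; the variable names are illustrative.

```python
from fractions import Fraction

N_NUCLEOTIDES = 4   # A, C, T, G
LENGTH = 50
PREFIX = "TATA"

# Multiplication rule: 4 choices at each of the 50 positions.
total_sequences = N_NUCLEOTIDES ** LENGTH

# Sequences that start with the fixed prefix: the remaining positions are free.
with_prefix = N_NUCLEOTIDES ** (LENGTH - len(PREFIX))

p_prefix = Fraction(with_prefix, total_sequences)
print(total_sequences)
print(p_prefix)  # 1/256
```

The chance depends only on the four fixed positions, so the length of the rest of the sequence does not matter.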
Permutation
How many different arrangements of the letters a, b and c
are possible? It is easy to check that there are six, namely
abc, acb, bac, bca, cab, cba
Each arrangement is called a permutation. When arranging
a larger number of items, direct enumeration quickly
becomes unwieldy.
For n different objects, there are
n! = n × (n – 1) × … × 2 × 1
possible arrangements.
Example 5:
An important branch of statistics is experimental design. In
agriculture and plant biology, methods of experimental
design are used to find good ways to allocate different
varieties of plant (that should be compared) to different
experimental fields. Suppose you have six varieties of plant
and six experimental fields. How many possible allocations
of plants to fields are there?
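The factorial formula can be checked against direct enumeration for the letters a, b, c and then applied to Example 5. The sketch below is an illustration that assumes each field receives exactly one variety.

```python
import math
from itertools import permutations

# Direct enumeration for three letters agrees with 3! = 6.
arrangements = ["".join(p) for p in permutations("abc")]
print(arrangements)

# Example 5: six plant varieties allocated to six fields, one variety per field.
allocations = math.factorial(6)
print(allocations)  # 720
```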
K out of N permutations
Suppose we have to assign k different items to n different
objects, one item per object (k ≤ n). The number of possible
assignments is
n! / (n – k)! = n × (n – 1) × … × (n – k + 1)
Example 6:
In football world championships, there are usually 32
participants. How many possible outcomes are there, if
only the first three places are of interest?
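Example 6 is an ordered selection of 3 teams out of 32, which Python's standard math.perm computes directly. The sketch assumes that ties are not possible.

```python
import math

TEAMS = 32
PLACES = 3

# Ordered choice of first, second and third place: 32 × 31 × 30.
podium_outcomes = math.perm(TEAMS, PLACES)
print(podium_outcomes)  # 29760
```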
K distinct items out of N permutations
Suppose that you are allocating n = n1 + n2 +…+ nk items.
Among these n1 are of type 1, n2 are of type 2,…, and nk of
type k. Items of the same type are not distinguishable.
Then the number of distinct possible allocations is
n! / (n1! × n2! × … × nk!)
Example 7:
How many nucleotide sequences are there that consist of
four A’s, three C’s, three T’s and no G? If a nucleotide
sequence of length 10 is composed at random such that
each letter has the same probability to occur at any
position, what is the chance of getting one of the above
mentioned nucleotide sequences?
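A possible way to work through Example 7 is sketched below. The helper arrangements_with_repeats is a hypothetical name introduced here; it divides n! by the factorials of the group sizes, and the probability assumes all 4^10 sequences are equally likely.

```python
import math
from fractions import Fraction

def arrangements_with_repeats(group_sizes):
    """Distinct orderings of items where items within a group are indistinguishable."""
    result = math.factorial(sum(group_sizes))
    for size in group_sizes:
        result //= math.factorial(size)
    return result

# Four A's, three C's, three T's and no G: 10! / (4! × 3! × 3!).
favourable = arrangements_with_repeats([4, 3, 3])

# Every position is one of four letters with equal probability.
total = 4 ** 10
p_favourable = Fraction(favourable, total)

print(favourable)           # 4200
print(float(p_favourable))  # roughly 0.004
```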
Choosing K items out of N irrespective of order
If the order of the selection is irrelevant, there are
n! / (k! × (n – k)!)
possibilities to choose k objects from n. This number is called
“n choose k”, the binomial coefficient, written here as C(n, k).
Example 8:
Suppose that there are 20 patients participating in a
clinical study. We want to assign five of them to a control
group that receives only a placebo. How many possible
assignments are there?
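Example 8 is an unordered selection, so Python's standard math.comb applies. The sketch assumes that only the membership of the control group matters, not the order in which patients are picked.

```python
import math

PATIENTS = 20
CONTROL_SIZE = 5

# "20 choose 5": unordered choice of the control group.
control_groups = math.comb(PATIENTS, CONTROL_SIZE)
print(control_groups)  # 15504
```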
Example 9:
Suppose that there is a group of 23 people in a room.
What is the chance that at least two of them have the same
birthday? (assuming none of them were born in leap years)
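Example 9 is easiest to approach through the complement: first compute the chance that all 23 birthdays differ. The sketch below assumes 365 equally likely birthdays that are independent across people; the function name is illustrative.

```python
import math

def p_shared_birthday(n_people, n_days=365):
    """P(at least two of n_people share a birthday), assuming uniform birthdays."""
    # Probability that every person has a different birthday.
    p_all_distinct = math.prod((n_days - i) / n_days for i in range(n_people))
    return 1 - p_all_distinct

p_23 = p_shared_birthday(23)
print(round(p_23, 4))  # about 0.5073
```

Under these assumptions the chance is slightly above one half, which many people find surprisingly high for such a small group.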
Example 10:
In ecology, animals are often caught, marked and released. When the
same animal is recaptured, this provides valuable information
concerning the size of a population (capture/recapture models).
Development of animals and habits such as migration behavior can
also be studied this way.
Suppose that we have a population of 1000 birds in an area. A team of
ecologists plans to capture 50 of them (one after the other), mark them
and release them subsequently. Suppose that for all the birds, the
probability of capture is the same (this will often be an oversimplification of reality). What is the chance that none of the birds is
recaptured?
(Consider the use of Stirling’s formula:
n! ≈ √(2πn) × (n/e)^n
)
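One reading of Example 10 is that each bird is released before the next capture, so that every capture is an independent, uniform draw from the 1000 birds. Under that assumption (which is an interpretation, not stated explicitly in the slides), the chance of no recapture is 1000! / (950! × 1000^50). The sketch compares the exact product with the value obtained from Stirling's formula on a log scale; all names are illustrative.

```python
import math

N_BIRDS = 1000
N_CAPTURES = 50

# Exact: the i-th capture must avoid the i birds already marked.
p_exact = math.prod((N_BIRDS - i) / N_BIRDS for i in range(N_CAPTURES))

def log_stirling(n):
    """Stirling's approximation to log(n!)."""
    return 0.5 * math.log(2 * math.pi * n) + n * (math.log(n) - 1)

# log of 1000! / (950! × 1000^50), using Stirling for both factorials.
log_p = (log_stirling(N_BIRDS)
         - log_stirling(N_BIRDS - N_CAPTURES)
         - N_CAPTURES * math.log(N_BIRDS))
p_stirling = math.exp(log_p)

print(p_exact, p_stirling)
```

Working with logarithms avoids the very large intermediate numbers that 1000! would otherwise produce.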
Addition rule in probability
P(E ∪ F) = P(E) + P(F) – P(E ∩ F)
Example 11:
We toss two fair coins. Let A denote the event that the first
coin lands heads, and B the event that the second coin
lands heads. Find P(A ∪ B).
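Example 11 can be checked by enumeration, reading the question as the probability that A or B (or both) occur, as the addition rule heading suggests. The sketch assumes the four outcomes of two fair coins are equally likely; the helper prob is a hypothetical name.

```python
from fractions import Fraction
from itertools import product

outcomes = set(product("HT", repeat=2))

A = {o for o in outcomes if o[0] == "H"}  # first coin lands heads
B = {o for o in outcomes if o[1] == "H"}  # second coin lands heads

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(outcomes))

p_union_direct = prob(A | B)
p_union_rule = prob(A) + prob(B) - prob(A & B)  # addition rule

print(p_union_direct, p_union_rule)  # 3/4 3/4
```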
Example 12:
A total of 36 members of a club play tennis, 28 play
squash, and 18 play badminton. Furthermore, 22 of the
members play both tennis and squash, 12 play both tennis
and badminton, 9 play squash and badminton and 4 play
all three sports. How many members of the club play at
least one of these sports?
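Example 12 extends the addition rule to three sets: add the single counts, subtract the pairwise overlaps, and add back the triple overlap. The sketch below applies this to the club counts; the variable names are illustrative.

```python
# Counts from Example 12.
tennis, squash, badminton = 36, 28, 18
tennis_squash, tennis_badminton, squash_badminton = 22, 12, 9
all_three = 4

# Inclusion-exclusion for three sets.
at_least_one = (tennis + squash + badminton
                - tennis_squash - tennis_badminton - squash_badminton
                + all_three)

print(at_least_one)  # 43
```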
Mutual exclusivity and independence
If A and B are mutually exclusive events, i.e. A ∩ B = ∅,
then P(A ∩ B) = 0.
Two events are independent if the occurrence or non-occurrence of one event has no influence on the
occurrence or non-occurrence of the other event.
Events A and B are independent ⇔ P(A ∩ B) = P(A) × P(B)
Dependent events and conditional probability
Consider this scenario:
If 2 cards are drawn from a deck, what is the probability that both will
be diamonds?
Simple?
Alternatively, we could treat the situation as two outcomes: the outcome
for the first draw of a card and the outcome for the draw of the second
card. Let us denote event A as drawing a diamond on the first draw and
event B as drawing a diamond on the second draw.
In this case, the probability for the second event is dependent on the
outcome of the first event. The probability that the second draw is a
diamond given that the first card drawn is a diamond is denoted by P(B
| A) and is called the conditional probability of the event B given that the
event A has occurred.
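The two-card scenario can be sketched as follows, assuming a standard 52-card deck with 13 diamonds and drawing without replacement (assumptions not spelled out in the slides). The product P(A) × P(B | A) is compared with a direct count of two-card hands.

```python
import math
from fractions import Fraction

DECK, DIAMONDS = 52, 13

# Stepwise: first draw is a diamond, then a diamond from the remaining 51 cards.
p_a = Fraction(DIAMONDS, DECK)
p_b_given_a = Fraction(DIAMONDS - 1, DECK - 1)
p_both_stepwise = p_a * p_b_given_a

# Direct count: unordered pairs of diamonds over all unordered pairs of cards.
p_both_counting = Fraction(math.comb(DIAMONDS, 2), math.comb(DECK, 2))

print(p_both_stepwise, p_both_counting)  # 1/17 1/17
```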
(Figure: probability tree diagram)
Definitions concerning conditional probability
P(X | Y) denotes the probability that the event X occurs given that the
event Y has occurred. If P(Y) > 0, then the conditional probability of the
event X given the event Y is defined by
P(X | Y) = P(X ∩ Y) / P(Y)
This generalizes to yield the very useful Bayes’ Theorem.
Partitioning
A useful concept in biomedical sciences is the concept of partitioning.
Given any events A and B,
P(A) = P(A | B) P(B) + P(A | B’)P(B’)
where B’ is the complement of B (or defined as “not B”).
This generalizes to a useful result in probability, known as the
Law of Total Probability
Example 13:
A hospital sends samples to one of three laboratories. Twenty
percent of the samples are sent to lab A, 30% to lab B and 50%
to lab C. Lab A is expensive, but produces incorrect results with
a probability of only 0.0002. Lab B has an error probability of
0.0005 and lab C one of 0.0008. Suppose you go to this
hospital and a sample is taken. What is the chance that you get
a correct result, if you do not know to which of the labs your
sample will be sent? (In practice, the error probabilities will also
depend on the type and difficulty of the analysis required.)
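Example 13 is a direct application of the Law of Total Probability with the three labs as the partition. The sketch below is an illustration; the dictionary layout and names are assumptions made for readability.

```python
# lab: (probability of being sent there, probability of an incorrect result)
labs = {
    "A": (0.20, 0.0002),
    "B": (0.30, 0.0005),
    "C": (0.50, 0.0008),
}

# Law of Total Probability: weight each lab's error rate by how often it is used.
p_error = sum(p_lab * p_err for p_lab, p_err in labs.values())
p_correct = 1 - p_error

print(round(p_error, 5))    # 0.00059
print(round(p_correct, 5))  # 0.99941
```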
Partitioning with Bayes’ Theorem
Suppose that both P(A), P(B) > 0. Then
P(B | A) = P(A | B) P(B) / P(A)
         = P(A | B) P(B) / [P(A | B) P(B) + P(A | B’) P(B’)]
This is an extremely useful representation!
Example 14:
An ectopic pregnancy is twice as likely to develop when the
pregnant woman is a smoker as it is when she is a non-smoker.
If 32% of women of childbearing age are smokers, what
percentage of women having ectopic pregnancies are smokers?
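For Example 14, let r be the (unknown) chance of an ectopic pregnancy for a non-smoker, so the chance for a smoker is 2r. In Bayes' Theorem r cancels, so any positive value can be used. The sketch below sets r = 1 purely for convenience, which is an assumption of this illustration.

```python
from fractions import Fraction

p_smoker = Fraction(32, 100)
relative_risk = 2      # smokers are twice as likely to develop an ectopic pregnancy
r = Fraction(1)        # placeholder baseline risk; it cancels in the ratio

# Bayes' Theorem with the partition {smoker, non-smoker}.
numerator = relative_risk * r * p_smoker
denominator = numerator + r * (1 - p_smoker)
p_smoker_given_ectopic = numerator / denominator

print(p_smoker_given_ectopic)         # 16/33
print(float(p_smoker_given_ectopic))  # roughly 0.485
```

So, under these assumptions, roughly 48% of women having ectopic pregnancies would be smokers.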
Example 15:
There is a 50-50 chance that the queen carries the gene for
hemophilia. If she is a carrier, then each prince has a 50-50
chance of having hemophilia. If the queen has had three
princes without the disease, what is the probability that the
queen is a carrier?
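Example 15 can be worked through with Bayes' Theorem, assuming that the princes' outcomes are independent given the queen's carrier status and that a prince of a non-carrier queen never has the disease. Both assumptions are made for this sketch.

```python
from fractions import Fraction

p_carrier = Fraction(1, 2)
n_healthy_princes = 3

# Likelihood of three healthy princes under each hypothesis.
p_data_given_carrier = Fraction(1, 2) ** n_healthy_princes
p_data_given_not_carrier = Fraction(1)

# Bayes' Theorem with the partition {carrier, not carrier}.
numerator = p_data_given_carrier * p_carrier
denominator = numerator + p_data_given_not_carrier * (1 - p_carrier)
p_carrier_given_data = numerator / denominator

print(p_carrier_given_data)  # 1/9
```

Each healthy prince makes it less plausible that the queen is a carrier, which is why the answer drops from 1/2 to 1/9.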
Arrggghhhhh!!!
(I thought I chose life sciences so I wouldn’t need to do
mathematics anymore!!!)
So why do we need to learn all this theoretical probability
in life sciences?!
GIBBERISH?!
Pregnancy Test Kit
The test kit may either be accurate, or inaccurate.
How do we interpret the findings from such a pregnancy
test kit?
Example 16:
Suppose you (or your girlfriend) bought a pregnancy test-kit and
observed a positive result upon testing. Recalling today’s lesson
in probability, you immediately turned to the back of the box,
and it says:
“… 99% chance of a true positive result, and a 99.9% chance of
a true negative result.”
What is the probability that you (or your girlfriend) are actually
NOT pregnant, given that your prior belief is that you are
equally likely to be either?
Thinking that the test-kit may be faulty, you decided to buy
another kit which showed another positive result. So what is the
probability that you (or your girlfriend) are actually NOT
pregnant, given the observation of two positive results?
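Example 16 can be sketched with Bayes' Theorem, reading the box as a sensitivity of 99% and a specificity of 99.9%. For the second kit, the sketch assumes that the two test results are independent given the true pregnancy status, which is an assumption of this illustration and may not hold for real kits. The function name is hypothetical.

```python
from fractions import Fraction

def posterior_not_pregnant(n_positive, sensitivity, specificity, prior_pregnant):
    """P(not pregnant | n_positive positive results), assuming independent tests."""
    false_positive_rate = 1 - specificity
    # Likelihood of the observed positives under each hypothesis.
    like_pregnant = sensitivity ** n_positive
    like_not_pregnant = false_positive_rate ** n_positive
    numerator = like_not_pregnant * (1 - prior_pregnant)
    return numerator / (like_pregnant * prior_pregnant + numerator)

SENS = Fraction(99, 100)
SPEC = Fraction(999, 1000)
PRIOR = Fraction(1, 2)

after_one = posterior_not_pregnant(1, SENS, SPEC, PRIOR)
after_two = posterior_not_pregnant(2, SENS, SPEC, PRIOR)

print(after_one, float(after_one))  # 1/991, about 0.001
print(after_two, float(after_two))  # 1/980101, about 0.000001
```

A different prior belief would change these numbers, so the 50-50 prior in the question matters for the result.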
Students should be able to
• understand the fundamental rules of probability
• perform simple counting through combinatorics and
permutations
• understand the meanings of mutual exclusivity and
independence
• calculate conditional probabilities
• understand partitioning and the use of the Law of Total
Probability
• understand and use Bayes’ Theorem for tackling practical
problems in probability