Chapter 1: Data Collection

Chapter 5: Probability
5.1 Probability Rules
5.2 The Addition Rule and Complements
5.3 Independence and the Multiplication Rule
5.4 Conditional Probability and the General Multiplication Rule
5.5 Counting Techniques
November 4, 2008
Example
Question: When you submit your 2007 IRS tax return in 2008, what
are your chances of having it audited by the IRS if your income is less
than $25,000? That is, what is the probability of being audited?
Information: In 1997, 1.5% were audited.
2007 Tax Auditing
Income | Filed | Examined
< $25K | 59,211,700 | 1,076,945
$25K-$50K | 27,263,000 | 259,794
$50K-$100K | 17,019,200 | 196,582
> $100K | 4,540,800 | 129,320
For incomes from $0 to $25,000: 1,076,945 / 59,211,700 ≈ 0.018 = 1.8%
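The same relative-frequency calculation can be repeated for every income bracket. The sketch below is only an illustration: the dictionary name `returns` and the helper `audit_rate` are hypothetical, and the counts are copied from the 2007 table above.

```python
# Filed and examined return counts, copied from the 2007 table above.
# The names `returns` and `audit_rate` are illustrative, not from the slides.
returns = {
    "< $25K": (59_211_700, 1_076_945),
    "$25K-$50K": (27_263_000, 259_794),
    "$50K-$100K": (17_019_200, 196_582),
    "> $100K": (4_540_800, 129_320),
}


def audit_rate(filed, examined):
    """Relative frequency of examined returns among filed returns."""
    return examined / filed


rates = {bracket: audit_rate(f, e) for bracket, (f, e) in returns.items()}

for bracket, rate in rates.items():
    print(f"{bracket:>11}: {rate:.3f}")
```

Each printed value is an empirical estimate of the probability of an audit for a filer in that bracket.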
How can Probability Quantify Randomness?
Question: What does the word probability mean?
Possible Answer: Probability is a branch of mathematics that deals with
calculating the likelihood of a given event's occurrence, which is expressed as
a number between 1 and 0. An event with a probability of 1 can be considered
a certainty: for example, the probability of a coin toss resulting in either
"heads" or "tails" is 1, because there are no other options, assuming the coin
lands flat. An event with a probability of .5 can be considered to have equal
odds of occurring or not occurring: for example, the probability of a coin toss
resulting in "heads" is .5, because the toss is equally as likely to result in
"tails." An event with a probability of 0 can be considered an impossibility: for
example, the probability that the coin will land (flat) without either side facing
up is 0, because either "heads" or "tails" must be facing up. A little
paradoxical, probability theory applies precise calculations to quantify
uncertain measures of random events.
http://whatis.techtarget.com/definition/0,,sid9_gci549076,00.html
Another Definition
Probability is the branch of mathematics that studies the possible
outcomes of given events together with the outcomes' relative
likelihoods and distributions. In common usage, the word
"probability" is used to mean the chance that a particular event (or
set of events) will occur expressed on a linear scale from 0
(impossibility) to 1 (certainty), also expressed as a percentage
between 0 and 100%. The analysis of events governed by
probability is called statistics.
http://mathworld.wolfram.com/Probability.html
Randomness
• Randomness is often observed in the outcomes of a response
variable in either an observational or experimental study.
• All the possible outcomes are known, but it is uncertain which
outcome will occur for any given observation.
• Randomness is the opposite of determinism: in a deterministic process a
given input always produces the same result, whereas in a random
process it does not.
Creating Random Events
• A machine or procedure that produces random events is
called a randomizer.
• Examples of Randomizers:
– Rolling dice
– Wheel of Fortune
– Flipping a coin
– Drawing a card from a shuffled deck
Terminology
The process of rolling a die several times with varying results is
called a probability experiment. Each roll of the die is called
a trial, and the result of a trial is called an outcome. The number of
times that a certain event (outcome) occurs divided by the total number
of trials is called the cumulative proportion or relative frequency of
the event. Suppose that in rolling one die 200 times,
the number 2 occurs 45 times. The cumulative proportion of this
event is 45/200 = 0.225.
Events and Sample Space
Definition: A simple event is an outcome from a probability
experiment that is observed on a single repetition of the experiment.
The sample space of a probability experiment is the set of all possible
simple events from the probability experiment. An event is a collection
of simple events; in other words, it is a subset of the sample space.
An event that consists of more than one outcome is called a
compound event.
Examples
Probability Experiment | Sample Space
Coin toss | {H,T}
Roll a single die | {1,2,3,4,5,6}
True/False quiz question | {T,F}
Roll two dice | {(1,1),(1,2),(1,3),(1,4),(1,5),(1,6),(2,1),(2,2),…,(6,1),(6,2),(6,3),(6,4),(6,5),(6,6)}
Simple and Compound Events
Probability Experiment | Sample Space | Simple Event | Compound Event
Roll two dice | {(1,1),(1,2),…,(6,6)} | Die1 = 1, Die2 = 4: {(1,4)} | Sum of dice is 5: {(1,4),(4,1),(2,3),(3,2)}
Choose a single card* | {(A,H),(2,H),(3,H),…,(A,S),(2,S),…,(K,S)} | Ace of Hearts: {(A,H)} | Queen: {(Q,H),(Q,D),(Q,C),(Q,S)}
* H = heart, D = diamond, C = club, S = spade
Examples
Probability Experiment | Sample Space | Event
Birth of a single child: male (m) or female (f) | S = {m,f} | Male: {m} is a simple event.
Three births | S = {mmm,fff,ffm,mmf,mff,mfm,fmf,fmm} | 2 males and 1 female is not a simple event; mmf, mfm and fmm are each simple events.
Note: The event, 2 males and 1 female, is not simple, because it can happen as mmf, mfm or fmm.
We would call this a compound event.
Probability
Definition: Suppose that a probability experiment has n equally likely outcomes. Let E be
an event for this probability experiment and suppose that E can occur in m ways.
Then the probability of the event E, denoted P(E), is defined as \( P(E) = \frac{m}{n} \). In other words,
if N(E) denotes the number of outcomes in E and N(S) is the number of outcomes in the sample
space, then \( P(E) = \frac{N(E)}{N(S)} \).
Note: Since \( 0 \le m \le n \), we have \( 0 \le P(E) \le 1 \).
Remark: The probability of an event E can be approximated by the relative frequency of E
occurring in a probability experiment. That is, we perform k trials of the experiment and count
how many times E occurs. Then the relative frequency of E is
\( f_E = \frac{\text{frequency of } E}{\text{number of trials}} \approx P(E) \).
Remark
If the relative frequencies of events are known in a population, then the
probabilities of these events are exactly the relative frequencies. For
example, if we know that a bag contains 2 red balls, 3 green balls, 4 white
balls and 1 blue ball, then the probability of randomly selecting a red ball
from the bag is the relative frequency of red balls, i.e., 2/10 = 1/5 or 0.2. If
the relative frequency of an event in a population is unknown, then it can
be approximated by simulating the event.
Rules of Probabilities
Let E denote an event and P(E) the probability that the event will occur. Then
1. \( 0 \le P(E) \le 1 \)
2. Let \( E_1, E_2, \ldots, E_n \) denote all of the possible simple events in a probability experiment, i.e.,
the sample space of the experiment is \( \{E_1, E_2, \ldots, E_n\} \). Then
\( \sum_{i=1}^{n} P(E_i) = P(E_1) + P(E_2) + \cdots + P(E_n) = 1. \)
Probability Model
Definition: A probability model is a table that lists all possible
outcomes of a probability experiment and their probabilities. The
form of this table is shown below.
Outcome | Probability
Example (probability model)
The following table contains the probabilities of choosing different color
balls out of a bag. Could this be a probability model for the experiment?
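Ball Color | Probability
red | 0.25
yellow | 0.35
white | 0.20
blue | 0.25
\( S = \{\text{red}, \text{yellow}, \text{white}, \text{blue}\} = \{E_1, E_2, E_3, E_4\} \)
\( \sum_{i=1}^{4} P(E_i) = P(E_1) + P(E_2) + P(E_3) + P(E_4) = 0.25 + 0.35 + 0.20 + 0.25 = 1.05 \ne 1.0 \)
Since the probabilities do not sum to 1, this table is not a probability model.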
Example (probability model)
Suppose that we have a bag of jelly beans in six colors (green, orange, blue, red,
yellow and brown). We have determined by performing a number of trials that the
probability of picking a particular color is:
Color | Probability
green | 0.16
orange | 0.20
blue | 0.24
red | 0.13
yellow | 0.14
brown | 0.13
\( S = \{\text{green}, \text{orange}, \text{blue}, \text{red}, \text{yellow}, \text{brown}\} \)
P(green) = 0.16, P(red) = 0.13
P(green) + P(orange) + P(blue) + P(red) + P(yellow) + P(brown)
= 0.16 + 0.20 + 0.24 + 0.13 + 0.14 + 0.13 = 1.0
Note: This last equation is the probability of picking a jelly
bean that is of the colors green or orange or blue or … .
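The two rules of probability give a mechanical check for whether a table is a probability model. The following is a minimal sketch of that check applied to the ball-color and jelly-bean tables; the function name `is_probability_model` and the tolerance parameter are assumptions made for illustration.

```python
import math


def is_probability_model(model, tol=1e-9):
    """Check rule 1 (each probability in [0, 1]) and rule 2 (sum equals 1).

    `model` maps each outcome to its probability. The tolerance guards
    against floating-point rounding when the probabilities are summed.
    """
    probabilities = list(model.values())
    in_range = all(0.0 <= p <= 1.0 for p in probabilities)
    sums_to_one = math.isclose(sum(probabilities), 1.0, abs_tol=tol)
    return in_range and sums_to_one


# Probabilities copied from the two example tables.
balls = {"red": 0.25, "yellow": 0.35, "white": 0.20, "blue": 0.25}
jelly_beans = {
    "green": 0.16, "orange": 0.20, "blue": 0.24,
    "red": 0.13, "yellow": 0.14, "brown": 0.13,
}

print(is_probability_model(balls))        # probabilities sum to 1.05
print(is_probability_model(jelly_beans))  # probabilities sum to 1.00
```

The ball-color table fails the check because its probabilities sum to 1.05, while the jelly-bean table passes.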
Example
Probability Experiment: Gender of children when a couple has three
children.
Event: Exactly 2 boys among the three children. E = 2 boys in 3 births.
Sample Space: S = {fff, ffm, fmf, fmm, mff, mfm, mmf, mmm}. n = 8.
Event for 2 boys: {fmm, mfm, mmf}. m = 3.
Probability: P(E) = m/n = 3/8 = 0.375.
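For small experiments like this one, the sample space can be enumerated directly and the classical formula P(E) = m/n applied by counting. This is only a sketch under the slide's assumption that all eight outcomes are equally likely; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Every sequence of three births, each either female (f) or male (m).
sample_space = ["".join(births) for births in product("fm", repeat=3)]

# Event E: exactly two boys among the three children.
event = [outcome for outcome in sample_space if outcome.count("m") == 2]

# Classical probability: favorable outcomes over total outcomes.
p_two_boys = Fraction(len(event), len(sample_space))

print(sample_space)
print(event)
print(p_two_boys, float(p_two_boys))
```

Using `Fraction` keeps the answer exact (3/8) before converting it to the decimal 0.375.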
Law of Large Numbers
As the number of trials of a random event, which we denote by N, increases, the
relative frequency of the event E, which we denote by \( f_E \), approaches the probability
of the event, P(E). In mathematical language, \( \lim_{N \to \infty} f_E = P(E) \).
Two Types of Probability
1. Classical Method: Using mathematics to compute the exact
probability of an event.
2. Empirical Method: Approximate the probability of an event
by using a probability experiment and using the relative
frequency of the event to estimate the exact probability
Example (Empirical)
Find the long-run relative frequency of heads in the probability experiment of flipping a coin.
S = {h,t}.
N = 10. Trial = {t, h, t, h, h, t, t, t, h, t}. fh = 4/10 = 0.4
N = 50. Trial = {h, h, h, h, t, t, h, t, h, t, h, t, t, h, t, t, t, t, h, h, t, t, h, t, t, h, h, h, t, t, h, h, h, h,
t, t, t, t, h, t, t, h, h, h, t, h, h, h, t, t}. fh = 25/50 = 0.5
N = 100. Trial = {h, h, h, t, h, t, t, t, h, t, t, h, h, h, h, h, t, t, t, t, h, t, t, t, t, t, h, h, t, t, h, h, t, h,
h, t, h, t, t, h, t, h, h, h, t, h, t, h, t, h, t, h, t, h, h, t, t, h, t, t, h, h, t, h, h, t, t, t, t, h, t, h, t, t, t, t,
t,
h, h, t, t, t, h, h, t, h, h, t, t, t, h, h, t, h, h, h, t, t, h, h}.
fh= 48/100 = 0.48
As N becomes large, fh approaches 1/2.
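The empirical experiment can also be simulated. The sketch below assumes a fair coin and uses Python's pseudo-random generator as the randomizer; the seed value and the function name are arbitrary choices made here so that repeated runs give the same numbers.

```python
import random


def relative_frequency_of_heads(n_flips, rng):
    """Flip a simulated fair coin n_flips times and return heads / flips."""
    heads = sum(1 for _ in range(n_flips) if rng.random() < 0.5)
    return heads / n_flips


rng = random.Random(2008)  # fixed seed: an arbitrary, illustrative choice

results = {n: relative_frequency_of_heads(n, rng) for n in (10, 100, 10_000, 100_000)}

for n, f_h in results.items():
    print(f"N = {n:>7,}: f_h = {f_h:.4f}")
```

For small N the relative frequency can be far from 1/2, while for large N it settles close to 1/2, as the Law of Large Numbers describes.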
Caution
Over the “short run”, we do not expect the proportion of a particular
event to be the same as the proportion in the “long run.” For example, if I
flip a coin 6 times, then I do not necessarily expect that “heads” will
occur exactly 3 times.
Examples (classical)
Find the probability of selecting a heart from a deck of cards, each of which has an equal
chance of being selected.
E = {heart}
P(E) = 13/52 = 1/4
Find the probability of selecting a red card from a deck of cards, each of which has an equal
chance of being selected.
E = {red card}
P(E) = 26/52 = 1/2
Find the probability of selecting a face card from a deck of cards, each of which has an equal
chance of being selected.
E = {face card}
P(E) = 12/52 = 3/13
Example (classical)
The West Meade Golf Shop sells used golf balls. This past Saturday I went to the
shop and was told that the bag of used golf balls contained 35 Titleists, 25 Maxflis, and
20 Top-Flites. I was told that I could reach into the bag and select a ball. What is
the probability that I would select a Titleist?
Solution: Let E = {Titleist}. P(E) = 35/(35+25+20) = 35/80 = 0.4375.
Example (classical)
Suppose a probability model has the following sample space:
S = {1,2,3,4,5,6,7,8,9,10}
i.e., there are ten possible outcomes in the probability experiment.
(a) Compute the probability of the event E = {3,4,7}, i.e., that the selected number
is 3, 4 or 7.
(b) Compute the probability of the event that the selected number is an even integer.
Answers (assuming the ten outcomes are equally likely): (a) P(E) = 3/10; (b) P(E) = 1/2.
Example
Example: Three question quiz.
Problem: Find the possible outcomes for the student taking this three
question quiz.
Sample Space: {CCC,CCI,CIC,CII,ICC,ICI,IIC,III}
Question: What are the chances of getting all
three questions correct? Note that this is
a simple event.
Answer: 1/8 = 0.125
Question: What are the chances of getting one out of the three
questions correct?
Answer: If A = {CII,ICI,IIC}, then P(A) = 3/8 = 0.375. Note that A is a
compound event.
Probability Estimates from Survey Data
200 Vanderbilt students were surveyed about their main recreational habits: {listening to
music, watching television, playing cards, exercising, other}. The following table summarizes the
frequencies and relative frequencies in the survey.
Recreation | Frequency | Relative Frequency
music | 75 | 3/8
TV | 50 | 1/4
cards | 15 | 3/40
exercise | 35 | 7/40
other | 25 | 1/8
The probability that a Vanderbilt student watches TV for his or her main recreation is
approximately 0.25.
Tree Diagram
• A nice way of visualizing a sample space with a small number of
outcomes.
• As the number of possible outcomes for each trial increases, the tree
diagram becomes impractical.
Tree Example
S = {mmm, mmf, mfm, mff, fmm, fmf, ffm, fff}
Simple Event: mmf
Non-simple Event: Two girls and one boy {mff, fmf, ffm}
Probability of two girls and one boy: P(E) = 3/8 = 0.375
Example
Background: An experimental study was conducted by the University of Wisconsin to
determine if Echinacea is an effective treatment for the common cold.
Medical Experiment
• Multi-center randomized experiment
• Half of the volunteers are randomly chosen to receive the
herbal remedy and the other half will receive the placebo
• Clinic in Madison, Wisconsin has four volunteers
– Two men: Jamal and Ken
– Two women: Linda and Mary
Probability Experiment
• Randomly pairing the four volunteers
Sample Space to receive the herbal remedy:
{(Jamal, Ken), (Jamal, Linda), (Jamal, Mary), (Ken, Linda), (Ken,
Mary), (Linda, Mary)}
Assumption: The six possible outcomes in this sample space for receiving the
Echinacea are equally likely. Hence, the probability that any simple event in the sample
space will occur is 1/6.
Hence, if S = {(Jamal, Ken), (Jamal, Linda), (Jamal, Mary), (Ken, Linda), (Ken, Mary),
(Linda, Mary)}, then the probability of picking Ken and Linda is 1/6.
The probability of picking one man and one woman is the probability
of picking (Jamal, Linda) or (Jamal, Mary) or (Ken, Linda) or (Ken, Mary), i.e., 4(1/6) =
2/3.
The only simple event containing only women is (Linda, Mary), so the probability of
picking two women is 1/6.
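The six pairs can be generated rather than listed by hand, which makes the counting explicit. This is a sketch under the slide's assumption that all pairs are equally likely; the helper `probability` and the variable names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

men = {"Jamal", "Ken"}
women = {"Linda", "Mary"}
volunteers = ["Jamal", "Ken", "Linda", "Mary"]

# Each unordered pair of volunteers is one simple event.
sample_space = list(combinations(volunteers, 2))


def probability(event):
    """Classical probability: pairs satisfying `event` over all pairs."""
    favorable = [pair for pair in sample_space if event(pair)]
    return Fraction(len(favorable), len(sample_space))


p_ken_linda = probability(lambda pair: set(pair) == {"Ken", "Linda"})
p_one_man_one_woman = probability(lambda pair: len(set(pair) & men) == 1)
p_two_women = probability(lambda pair: set(pair) <= women)

print(sample_space)
print(p_ken_linda, p_one_man_one_woman, p_two_women)
```

The results agree with the values on the slide: 1/6, 2/3 and 1/6.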
Example
Problem: Suppose we roll two dice (#1 & #2) once. What is the probability that the
sum of the numbers on the dice is 7?
Sample Space: Let (x,y) denote the ordered pair where x is the number from die #1
and y is the number from die #2. Then
S = {(1,1),(1,2),(1,3),(1,4),(1,5),(1,6),(2,1),(2,2),(2,3),(2,4),(2,5),(2,6),
(3,1),(3,2),(3,3),(3,4),(3,5),(3,6),(4,1),(4,2),(4,3),(4,4),(4,5),(4,6),
(5,1),(5,2),(5,3),(5,4),(5,5),(5,6),(6,1),(6,2),(6,3),(6,4),(6,5),(6,6)}
and N = 36.
Event: E = {(1,6),(2,5),(3,4),(4,3),(5,2),(6,1)}
Probability: P(E) = 6/36 = 1/6.
Problem: What is the probability of rolling “box cars” (two sixes)?
Event: E = {(6,6)} (simple event)
Probability: P(E) = 1/36
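Both dice questions can be answered by enumerating the 36 ordered pairs and counting. The sketch below assumes fair dice, so every ordered pair is equally likely; the helper name `probability` is illustrative.

```python
from fractions import Fraction
from itertools import product

# All ordered pairs (x, y): x from die #1, y from die #2.
sample_space = list(product(range(1, 7), repeat=2))


def probability(event):
    """Count the outcomes satisfying `event` and divide by N = 36."""
    favorable = sum(1 for outcome in sample_space if event(outcome))
    return Fraction(favorable, len(sample_space))


p_sum_is_7 = probability(lambda outcome: outcome[0] + outcome[1] == 7)
p_box_cars = probability(lambda outcome: outcome == (6, 6))

print(len(sample_space), p_sum_is_7, p_box_cars)
```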
Example
Problem: What are the chances of a taxpayer being audited by the IRS in 2003?
Solution: The problem can be solved with a contingency table for the audits according
to income level. We can compute the relative frequencies of being audited for each
income level.
Sample Space: We define the sample space to be ordered pairs (x,y) where x is
the income range and y is yes (audited) or no (not audited). For x we introduce
some notation:
x = 1: < $25K
x = 2: $25K-$49.999K
x = 3: $50K-$99.999K
x = 4: ≥ $100K
S = {(1,yes),(1,no),(2,yes),(2,no),(3,yes),(3,no),(4,yes),(4,no)}
Income/Audited | Yes | No | Total
< $25K (1) | 90 | 14,010 | 14,100
$25K-$49.999K (2) | 71 | 30,629 | 30,700
$50K-$99.999K (3) | 69 | 24,631 | 24,700
≥ $100K (4) | 80 | 10,620 | 10,700
Total | 310 | 79,890 | 80,200
• Probability of being audited (any income): 310/80200 = 0.004 or 0.4%
• Probability of being audited among taxpayers with an income of $100K or more:
80/10700 = 0.007 or 0.7%
• Probability of being audited among taxpayers with an income less than $25K:
90/14100 = 0.006 or 0.6%
• Probability of not being audited among taxpayers with an income less than $25K:
14010/14100 = 0.994 or 99.4%
Remark: In this example we did not use the sample space, but rather a contingency table of
incomes and audits.
Subjective Probability
The probability of an event that is obtained on the basis of personal
judgment is called subjective probability. This type of calculation is the
opposite of objective probability (for example, empirical probability
calculations).
Example: What was the probability of landing a man on the moon in
the 1960’s? Any estimate of this probability would be subjective since
we have no prior history of the event.
The Addition Rule and Complements
Section 5.2
Definition: Two events in a probability experiment are said to be disjoint
if they have no common outcomes. Another term for the same concept is
mutually exclusive, i.e., both events cannot happen simultaneously.
Suppose that A and B are events. If they are disjoint, then the probability
of A and B both happening is zero, i.e., P(A and B) = 0.
Intersection and Union of Sets
We need a little information about working with sets. In particular, we define the notions of
the intersection and union of two sets. Let A and B be two sets: \( A = \{a_1, a_2, \ldots, a_N\} \) and
\( B = \{b_1, b_2, \ldots, b_N\} \). The union of A and B is the set \( A \cup B = \{a_1, a_2, \ldots, a_N, b_1, b_2, \ldots, b_N\} \). The
intersection of A and B is the set \( A \cap B = \{x : x \in A, x \in B\} \), i.e., it is the set of objects
that belong to both sets. Two sets are said to be disjoint if they have no common elements,
i.e., \( A \cap B = \emptyset \) (the so-called empty set).
Example: \( A = \{a,1,b,2\},\ B = \{a,3,c\} \Rightarrow A \cup B = \{a,b,c,1,2,3\},\ A \cap B = \{a\} \)
Example: \( A = \{a,b,c\},\ B = \{d,e\} \Rightarrow A \cup B = \{a,b,c,d,e\},\ A \cap B = \emptyset \). Hence, A and B
are disjoint sets.
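The two set examples can be reproduced with Python's built-in `set` type, whose `|` and `&` operators compute union and intersection. This is only an illustrative sketch; the second pair of sets is renamed `C` and `D` here to avoid reusing the names A and B.

```python
# First example: the sets share the element "a".
A = {"a", 1, "b", 2}
B = {"a", 3, "c"}

union_ab = A | B         # elements in A or B (or both)
intersection_ab = A & B  # elements in both A and B

# Second example (called A and B on the slide): no common elements.
C = {"a", "b", "c"}
D = {"d", "e"}

print(union_ab, intersection_ab)
print(C | D, C & D, C.isdisjoint(D))
```

An empty intersection, `set()`, is Python's representation of the empty set, and `isdisjoint` tests for it directly.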
Venn Diagrams
There is a graphical way of looking at the intersection and union of sets.
They are called Venn Diagrams.
http://kt2.exp.sis.pitt.edu:8080/venn/andor.jsp
[Figure: Picture of Intersection/Union]
[Figure: Complement of a Set]
Sets and Events
Suppose that A is a set of simple events (possibly a single simple event in the
sample space) and B is another set of simple events (again, possibly a single
one) in a common sample space. We consider A and B to be
subsets of the sample space. We can perform the set operations
intersection and union. For example, the union of A and B is the set of
simple events that are in A or B. The intersection of A and B is the set of
simple events that are common to both A and B. Similarly, we can talk about the
complement of A, denoted Ac, i.e., the set of simple events that are not in A but are in the
sample space.
Union/Intersection of Events
• The union of two events A and B is the new event consisting of the simple events that
are in A or in B (or in both).
• The intersection of two events A and B is the new event consisting of the
simple events that are in both A and B.
Sets, Events and Probabilities
Suppose that we want to calculate the probability that a particular event in the sample
space will occur. For example, in our three question pop quiz illustration, what is the
probability that a student will have two and only two correct answers to the three
questions? As we already know, the sample space can be viewed as a set and
events are subsets of the sample space. That is, we have the event: A =
{CCI,CIC,ICC}. Notice that the event B = {CII,ICI,IIC} is a disjoint event from the
event A. In fact, A is disjoint from its complement:
Ac = {CCC,CII,ICI,IIC,III}.
The event, C, of getting one or two correct answers on the quiz is given by the set C
which is the union of A and B, i.e.,
C = {CII,ICI,IIC,CCI,CIC,ICC}.
We would like to determine the probability of
the event C by using information about the
probabilities of events A and B.
Notation
Let A and B be events (sets in the sample space). We introduce the notation:
\( A \cup B = A \text{ or } B \) and \( A \cap B = A \text{ and } B \).
That is, we associate the mathematical symbol \( \cup \) with the word or and the symbol \( \cap \)
with the word and. In particular, \( P(A \cup B) = P(A \text{ or } B) \) and \( P(A \cap B) = P(A \text{ and } B) \).
Disjoint Events
Let A and B be two disjoint events, i.e., \( A \cap B = \emptyset = \{\} \). Then \( P(A \text{ and } B) = 0 \).
That is, the probability that two disjoint events both occur is zero, which makes sense since
they do not share any simple events. Disjoint events are also called mutually exclusive
events.
Probability of A or B for Disjoint Events
Theorem: Let A and B be disjoint (i.e., mutually exclusive) events. Then
\( P(A \text{ or } B) = P(A) + P(B) \). Furthermore, if \( A_1, A_2, \ldots, A_k \) are (pairwise) disjoint events,
then \( P(A_1 \text{ or } A_2 \text{ or } \ldots \text{ or } A_k) = \sum_{i=1}^{k} P(A_i) \).
Example
Consider a deck of 52 cards (spades, hearts, clubs and diamonds;
2,3,…,10,J,Q,K,A). Consider the problem of drawing one card from this deck.
The sample space has 52 simple events (drawing one of the cards). We
characterize the sample space as ordered pairs: (count,suit) e.g., 10 of
diamonds is (10,D).
Question: What is the probability of drawing the ace of diamonds?
Answer: P(A,D) = 1/52 = 0.019 .
Question: What is the probability of a king?
Answer: P(king) = P(K,H) + P(K,D) + P(K,C) + P(K,S) = 4/52 = 1/13 = 0.077
Note that (K,H), (K,D), (K,C) and (K,S) are mutually exclusive events.
Question: What is the probability of drawing a heart?
Answer: P(heart) = P(A,H) + P(2,H) +…+ P(K,H) = 13/52 = 1/4 = 0.25
Note that (A,H),(2,H),…,(K,H) are mutually exclusive events.
Question: What is the probability of drawing a king or queen?
Answer: P(king or queen) = P(king) + P(queen)
P(king or queen) = 4/52 + 4/52 = 8/52 = 2/13 = 0.154
Probability of A or B for any two Events
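Theorem: Let A and B be any events. Then \( P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B) \).
For more than two events, every overlap must be accounted for (inclusion-exclusion). For example,
for any three events \( A_1, A_2, A_3 \):
\( P(A_1 \text{ or } A_2 \text{ or } A_3) = \sum_{i=1}^{3} P(A_i) - P(A_1 \text{ and } A_2) - P(A_1 \text{ and } A_3) - P(A_2 \text{ and } A_3) + P(A_1 \text{ and } A_2 \text{ and } A_3) \).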
Example
Consider a deck of 52 cards (spades, hearts, clubs and diamonds;
2,3,…,10,J,Q,K,A). Consider the problem of drawing one card from this deck.
The sample space has 52 simple events (drawing one of the cards).
Question: What is the probability of drawing an ace or a diamond?
Answer: Let A = ace and B = diamond. These are not disjoint events.
P(A) = 4/52 = 1/13, P(B) = 13/52
P(A or B) = P(A) + P(B) - P(A and B)
P(A and B) = P(ace and diamond) = 1/52
Hence, P(ace or diamond) = 1/13 + 13/52 - 1/52 = (4+13-1)/52 = 16/52 = 4/13
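The general addition rule can be checked against a direct count over the deck. The sketch below represents each card as a (rank, suit) pair, as in the earlier card example, and assumes every card is equally likely to be drawn; the helper names are illustrative.

```python
from fractions import Fraction
from itertools import product

ranks = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
suits = ["S", "H", "C", "D"]
deck = list(product(ranks, suits))  # 52 simple events


def probability(event):
    """Classical probability of `event` for one card drawn from the deck."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))


def is_ace(card):
    return card[0] == "A"


def is_diamond(card):
    return card[1] == "D"


p_ace = probability(is_ace)
p_diamond = probability(is_diamond)
p_ace_and_diamond = probability(lambda card: is_ace(card) and is_diamond(card))

# Direct count of "ace or diamond" versus the addition rule.
p_direct = probability(lambda card: is_ace(card) or is_diamond(card))
p_by_rule = p_ace + p_diamond - p_ace_and_diamond

print(p_direct, p_by_rule)
```

Subtracting P(A and B) removes the ace of diamonds, which would otherwise be counted twice.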
Example
Two hundred and fifty Vanderbilt students were analyzed for
their IQ and their ability to do a certain mathematical puzzle.
The results are summarized in the following contingency table.
Puzzle/IQ | Average (90-120) | High (>120)
Couldn’t do puzzle | 75 | 30
Could do puzzle | 20 | 125
Let A = high IQ. Then P(A) = (30+125)/250 = 155/250 = 0.62
Let B = could. Then P(B) = (20+125)/250 = 145/250 = 0.58
Question: What is the probability that a student has a high IQ or could do the
puzzle?
Answer: P(A or B) = P(A) + P(B) - P(A and B) = 0.62 + 0.58 - ?
From the table, P(A and B) = 125/250 = 0.50
Hence, P(A or B) = 0.62 + 0.58 - 0.50 = 0.70
Puzzle/IQ | Average (90-120) | High (>120) | Total
Couldn’t do puzzle | 75 | 30 | 105
Could do puzzle | 20 | 125 | 145
Total | 95 | 155 | 250
Another Approach
Let us set up the situation as ordered pairs:
S = {(couldn’t,average),(couldn’t,high),(could,average),(could,high)}.
There are 250 observed ordered pairs, one per student. We want to calculate the
probability of choosing an ordered pair of the form (could,---) or (---,high), which
also requires the probability of choosing the ordered pair (could,high). Then
P[(could,---)] = (20+125)/250 = 145/250 = 0.58
P[(---,high)] = (30+125)/250 = 155/250 = 0.62
P[(could,high)] = 125/250 = 0.50
Then
P[(could,---) or (---,high)] = 0.58 + 0.62 - 0.50 = 0.70
Example (gender/marriage)
Consider the following contingency table for present marital status and
gender of people in the U.S. over the age of 18 in 2003.
Marital Status | Males | Females | Total
Never Married | 28.6 | 23.3 | 51.9
Married | 62.1 | 62.8 | 124.9
Widowed | 2.7 | 11.3 | 14.0
Divorced | 9.0 | 12.7 | 21.7
Total | 102.4 | 110.1 | 212.5
We want to calculate the probability that a person in the U.S. over the
age of 18 has some particular characteristic. We use the relative
frequencies from this table to compute the probabilities.
Question: What is the probability that a person in this census is a female?
Answer: P(female) = 110.1/212.5 = 0.518
Question: What is the probability that a person in this census is widowed?
Answer: P(widowed) = 14.0/212.5 = 0.066
Question: What is the probability that a person in this census is widowed or divorced?
Answer: P(widowed or divorced) = P(widowed) + P(divorced) - P(widowed and divorced)
P(widowed or divorced) = 14.0/212.5 + 21.7/212.5 - 0 = 0.168
Question: What is the probability that a person in this census is married or female?
Answer: P(married or female) = P(married) + P(female) - P(married and female)
P(married or female) = 124.9/212.5 + 110.1/212.5 - 62.8/212.5 = 172.2/212.5 = 0.810
Question: What is the probability that a person in this census is male or divorced?
Answer: P(male or divorced) = P(male) + P(divorced) - P(male and divorced)
P(male or divorced) = 102.4/212.5 + 21.7/212.5 - 9.0/212.5 = 115.1/212.5 = 0.542
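These answers can be checked by summing cells of the contingency table directly and comparing with the general addition rule. The sketch below stores the table cells from the marital status example in a dictionary; the helper `p` and the key names are illustrative choices.

```python
# Cell counts from the marital status table, keyed by (status, gender).
counts = {
    ("never married", "male"): 28.6, ("never married", "female"): 23.3,
    ("married", "male"): 62.1, ("married", "female"): 62.8,
    ("widowed", "male"): 2.7, ("widowed", "female"): 11.3,
    ("divorced", "male"): 9.0, ("divorced", "female"): 12.7,
}
total = sum(counts.values())


def p(event):
    """Relative frequency of the table cells for which `event` is true."""
    return sum(n for (status, gender), n in counts.items() if event(status, gender)) / total


# Direct count of cells that are "married or female".
p_married_or_female = p(lambda s, g: s == "married" or g == "female")

# The same probability through the general addition rule.
p_by_rule = (
    p(lambda s, g: s == "married")
    + p(lambda s, g: g == "female")
    - p(lambda s, g: s == "married" and g == "female")
)

p_male_or_divorced = p(lambda s, g: g == "male" or s == "divorced")

print(round(p_married_or_female, 3), round(p_by_rule, 3), round(p_male_or_divorced, 3))
```

Both routes give 172.2/212.5 for "married or female", since each married female is counted only once.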
Complement of an Event
Definition: Suppose E is an event in a sample space S. The complement of
the event E, denoted \( E^c \), is the set of all outcomes in S that are not in E.
Example: Three question quiz.
S = {CCC,CCI,CIC,CII,ICC,ICI,IIC,III}
Let E be the event of having one correct answer:
E = {CII,ICI,IIC}.
Then
Ec = {CCC,CCI,CIC,ICC,III}.
Probability of the Complement of an Event
Theorem: If \( P(E) \) is the probability of an event E and \( P(E^c) \) is the
probability of the complement of E, then \( P(E^c) = 1 - P(E) \).
Example: Three question quiz. Find the probability of not having
one, and only one, correct answer on the quiz.
Let E be the event of having one correct answer, E = {CII,ICI,IIC},
then Ec = {CCC,CCI,CIC,ICC,III}. If each event in the sample space
is equally likely, then P(E) = 3/8 and P(Ec) = 5/8 = 1 - 3/8.
Example
The following table shows the relative frequencies of the size of farms in the U.S.
Size in Acres (x) | Relative Frequency
x < 10 | 0.084
10 ≤ x < 50 | 0.265
50 ≤ x < 100 | 0.161
100 ≤ x < 180 | 0.149
180 ≤ x < 260 | 0.077
260 ≤ x < 500 | 0.106
500 ≤ x < 1000 | 0.076
1000 ≤ x < 2000 | 0.047
2000 ≤ x | 0.035
(1) What is the probability that a farm in the
U.S. will be between 100 and 500 acres?
(2) What is the probability that a farm will be
greater than or equal to 10 acres?
(1) P(100 ≤ x < 500) = P(100 ≤ x < 180) + P(180 ≤ x < 260) + P(260 ≤ x < 500)
= 0.149 + 0.077 + 0.106 = 0.332
(2) P(x ≥ 10) = 1 - P(x < 10) = 1 - 0.084 = 0.916
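Both answers follow from the table with one addition and one complement. The sketch below keys each size class by its lower bound in acres, which is an illustrative encoding chosen here rather than anything from the slides.

```python
# Relative frequency of each farm-size class, keyed by its lower bound in acres.
relative_frequency = {
    0: 0.084, 10: 0.265, 50: 0.161, 100: 0.149, 180: 0.077,
    260: 0.106, 500: 0.076, 1000: 0.047, 2000: 0.035,
}

# (1) The classes are disjoint, so their probabilities simply add.
p_100_to_500 = sum(p for lower, p in relative_frequency.items() if 100 <= lower < 500)

# (2) Complement rule: P(x >= 10) = 1 - P(x < 10).
p_at_least_10 = 1 - relative_frequency[0]

# Cross-check (2) by adding every class with lower bound >= 10.
p_at_least_10_direct = sum(p for lower, p in relative_frequency.items() if lower >= 10)

print(round(p_100_to_500, 3), round(p_at_least_10, 3), round(p_at_least_10_direct, 3))
```

The complement rule replaces a sum over eight classes with a single subtraction.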
Example
Roulette consists of a wheel with 38 slots, numbered 0, 00, 1, 2, …, 36.
In this example, the odd-numbered slots are red and the even-numbered slots
(2 through 36) are black. The slots 0 and 00 are colored green. The wheel is spun and a
metal ball circles the wheel until it lands in a numbered slot.
(a) What is the probability that the ball lands on a green or red slot?
(b) What is the probability that it does not land in a green slot?
Let A = event of landing on red slot, B = event of landing on green slot.
Then P(A) = 18/38 = 9/19 and P(B) = 2/38 = 1/19.
(a) Then P(A or B) = P(A) + P(B) - P(A and B) = 9/19 + 1/19 - 0 = 10/19
(b) P(Bc) = 1 - P(B) = 1 - 1/19 = 18/19.
Independence and the Multiplication Rule
Section 5.3
Definition: Two events A and B are called independent events if the fact
that A occurs does not affect the probability of B occurring and vice-versa.
When the occurrence of A affects the probability of B, we say that the
events are dependent.
Examples: Independent
(a) Rolling a die and getting a 4 and then rolling it a second time and getting a 2.
(b) Drawing a card from a deck and getting the 10 of diamonds, then replacing the
card, shuffling and drawing the 10 of diamonds again.
Counterexample: Dependent
(c) Suppose a patient at the Vanderbilt Hospital is selected at random from a group
of patients. Let A be the event that the patient has atherosclerosis. Let B be
the event that the patient is a smoker. A and B are not independent, i.e., they
are dependent, since smokers have a higher incidence of heart disease.
Dependent Events
Examples: Dependent Events
(a) Being a lifeguard and getting a suntan.
(b) Parking in a no-parking zone and getting a parking ticket.
(c) Being a Vanderbilt student and getting a good education.
Remark
Disjoint events and independent events are different concepts. Two
events are disjoint if, when one event occurs, the other one cannot
happen. Two events are independent if the occurrence of one
event does not affect the probability of the other event. Hence, if two
events (each with nonzero probability) are disjoint, then they cannot be independent.
Probability of A and B for Independent Events
Let A and B be two independent events in a sample space. Then
P(A and B) = P(A)P(B).
Note: If A and B are disjoint events, then P(A and B) = 0.
Example
Suppose we flip a coin twice in a row. What is the probability that we will
see two heads?
Sample space for each flip: S = {h,t}.
Note that the two flips are independent.
Let A = {first flip is h} and B = {second flip is h}. Then P(A and B) = P(A)P(B) = (0.5)(0.5) = 0.25.
Question: Is this the same as asking for the probability of the event: given that the
first flip is a head, the second flip is also a head?
Example
Suppose we have a two question pop quiz. We perform a probability experiment
and find the following data about the outcome of taking the quiz.
Outcomes | Probabilities
II | 0.26
IC | 0.11
CI | 0.05
CC | 0.58
Suppose that the events (answering the questions) are not
independent. For example, if you answer the first question correctly, the
probability of answering the second question correctly is not necessarily the same as
if you had answered the first question incorrectly.
Let A = event that the first question is answered correctly, regardless of the
answer to the second question.
Let B = event that the second question is answered correctly, regardless of the
answer to the first question.
P(A) = P(CI or CC) = P(CI) + P(CC) = 0.05 + 0.58 = 0.63
P(B) = P(IC or CC) = P(IC) + P(CC) = 0.11 + 0.58 = 0.69
P(A and B) = P(CC) = 0.58
If A and B were independent, then P(A and B) = P(A) P(B) = (0.63)(0.69) = 0.43
Notice that we obtain different values for P(A and B) depending on whether we
assume independence of the two events: 0.58 from the data versus 0.43 under
independence. Hence, A and B are dependent.
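The comparison between P(A and B) and P(A)P(B) is a simple numerical test for independence. The sketch below applies it to the two-question quiz model; the variable names and the tolerance are illustrative assumptions.

```python
import math

# Probability model for the two-question quiz (first letter = question 1).
model = {"II": 0.26, "IC": 0.11, "CI": 0.05, "CC": 0.58}

# A: first question correct; B: second question correct.
p_a = sum(p for outcome, p in model.items() if outcome[0] == "C")
p_b = sum(p for outcome, p in model.items() if outcome[1] == "C")
p_a_and_b = model["CC"]

# A and B are independent exactly when P(A and B) equals P(A) * P(B).
product_if_independent = p_a * p_b
independent = math.isclose(p_a_and_b, product_if_independent, abs_tol=1e-9)

print(round(p_a, 2), round(p_b, 2), p_a_and_b)
print(round(product_if_independent, 4), independent)
```

Because 0.58 differs from the product of P(A) and P(B), the two events are dependent.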
Example
The E.P.T. Pregnancy Test states that the test is “99% accurate in detecting typical
pregnancy hormone levels.” Suppose that we randomly select 12 pregnant women.
(a) What is the probability that all 12 of them will test positive?
(b) What is the probability that at least one will not test positive?
Let A = event that one woman tests positive. Hence, P(A) = 0.99. We assume that the 12 test
results are independent.
(a) P(all test positive) = P(A and A and … and A) = P(A)^12 = (0.99)^12 = 0.886385
(b) P(at least one tests negative) = 1 - P(all test positive) = 0.113615
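The two answers combine the multiplication rule for independent events with the complement rule. The following sketch assumes, as the calculation on the slide does, that the 12 test results are independent; the variable names are illustrative.

```python
p_positive = 0.99  # stated accuracy of the test for a pregnant woman
n_women = 12

# Multiplication rule for independent events, applied 12 times.
p_all_positive = p_positive ** n_women

# Complement rule: "at least one negative" is the complement of "all positive".
p_at_least_one_negative = 1 - p_all_positive

print(f"{p_all_positive:.6f}")
print(f"{p_at_least_one_negative:.6f}")
```

Even with a 99% accurate test, there is roughly an 11% chance that at least one of the 12 results is negative.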
Caution
Don’t assume that events are independent unless
you have given this assumption careful thought and
it seems plausible.
Summary
Summary of the Laws of Probability
Let A and B be events (simple or compound) in a sample space \( S = \{E_1, E_2, \ldots, E_n\} \).
1. \( 0 \le P(A) \le 1 \)
2. \( \sum_{i=1}^{n} P(E_i) = 1 \)
3. If A and B are disjoint events, then \( P(A \text{ or } B) = P(A) + P(B) \).
4. If A and B are any events, then \( P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B) \).
5. \( P(A^c) = 1 - P(A) \).
6. If A and B are independent events, then \( P(A \text{ and } B) = P(A)P(B) \).
7. If A and B are disjoint events, then \( P(A \text{ and } B) = 0 \).
Conditional Probability and the General Multiplication Rule
Section 5.4
Suppose that we have two events, A and B.
(a) P(A or B) = P(A) + P(B) - P(A and B)
(b) If they are independent events, then P(A and B) = P(A)P(B).
Recall that two events are independent if the occurrence of A does not affect the
probability of B and vice-versa.
Question: What happens if the events are not independent, i.e., they are
dependent? That is, the probability of one depends on whether the
other has occurred.
Definition: The symbol P(A|B) means the probability of an event A given that an
event B has occurred. This probability is called a conditional probability.
Conditional Events
P(E|F): the probability of E
given F
In other words, the
probability of E, given that F
has already happened.
Conditional Probability Formula
Let A and B be events. The conditional probability of A, given B, is given by
\( P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)} \). Furthermore, \( P(B \mid A) = \frac{P(A \text{ and } B)}{P(A)} \).
Example
Twenty-five percent of Vanderbilt professors in the age range 50-60
years old have hypertension. In this same group, five percent have both
hypertension and diabetes. Given that an individual in this age group has
hypertension, what is the probability that he or she will also have
diabetes?
A = event of having hypertension
B = event of having diabetes
P(A) = 0.25
P(A and B) = 0.05
\( P(B \mid A) = \frac{P(A \text{ and } B)}{P(A)} = \frac{0.05}{0.25} = \frac{1}{5} = 0.20 \)
Example
Here we consider two dependent events: one is the income level and the other is being audited.
Contingency Table for IRS filers.
Income/Audited       Yes        No      Total
< $25K                90    14,011     14,100
$25K-$49.999K         71    30,629     30,700
$50K-$100K            69    24,631     24,700
$100K <               80    10,620     10,700
Total                310    79,890     80,200
Probabilities of audit for different income classes:
P(yes and <$25K) = 90/80200 = 0.0011221
P(no and <$25K) = 14011/80200 = 0.1747007
P(yes and $25K-$50K) = 71/80200 = 0.0008852
P(yes) = 310/80200 = 0.0038653 (overall)
Probability Table for IRS filers.
S = {(<$25K,yes), (<$25K,no), ($25K-$50K,yes), ($25K-$50K,no), ($50K-$100K,yes), ($50K-$100K,no), ($100K<,yes), ($100K<,no)}.
Income/Audited         Yes          No       Total
< $25K           0.0011221   0.1747007   0.1758104
$25K-$49.999K    0.0008852   0.3819077   0.3827930
$50K-$100K       0.0008603   0.3071197   0.3079800
$100K <          0.0009975   0.1324189   0.1334164
Total            0.0038653   0.9961346   1.000
P[(<$25K,yes)] = 0.0011 P[($100K<,no)] = 0.1324
The sum of the probabilities of the outcomes in the sample space:
0.0011+0.1747+0.0009+0.3819+0.0009+0.3071+0.001+0.1324 = 1.0
79
Question: Let A be the event {audit=yes} and let B be the event {income>$100K}. Find
P(A|B), i.e., the probability that an individual will be audited, given that his or her income is
greater than $100K.
Answer: We note that P(A and B) = P[(yes,$100K<)] = 0.0009975. Furthermore, the
probability that a taxpayer has an income greater than $100K is P(B) = P($100K<) = 0.1334164.
Therefore, using P(A | B) = P(A and B) / P(B),
P(yes | $100K<) = P[(yes,$100K<)] / P($100K<) = 0.0009975/0.1334164 = 0.0074766
Probability Table for IRS filers (from previous page)
Income/Audited         Yes          No       Total
< $25K           0.0011221   0.1747007   0.1758104
$25K-$49.999K    0.0008852   0.3819077   0.3827930
$50K-$100K       0.0008603   0.3071197   0.3079800
$100K <          0.0009975   0.1324189   0.1334164
Total            0.0038653   0.9961346   1.0
80
Probability Table for IRS filers (from previous page)
Income/Audited         Yes          No       Total
< $25K           0.0011221   0.1747007   0.1758104
$25K-$49.999K    0.0008852   0.3819077   0.3827930
$50K-$100K       0.0008603   0.3071197   0.3079800
$100K <          0.0009975   0.1324189   0.1334164
Total            0.0039      0.9961      1.0
With some rounding, we can form a conditional probability table, where A = audit, B = income,
and P(A | B) = P(A and B) / P(B).
Income/Audited     Yes      No     Total
< $25K           0.006   0.994     1.000
$25K-$49.999K    0.002   0.998     1.000
$50K-$100K       0.003   0.997     1.000
$100K <          0.007   0.993     1.000
Given income is less than $25K,
0.0011221/0.1758104 = 0.0064
0.1747007/0.1758104 = 0.9937
Given income is greater than $100K,
0.0009975/0.1334164 = 0.0075
0.1324189/0.1334164 = 0.9925
81
Remark
We can use the contingency table directly, i.e., without first finding the conditional proportions.
Income/Audited       Yes        No      Total
< $25K                90    14,011     14,100
$25K-$49.999K         71    30,629     30,700
$50K-$100K            69    24,631     24,700
$100K <               80    10,620     10,700
Total                310    79,890     80,200
P(yes | $100K<) = P(yes and $100K<) / P($100K<) = (80/80200) / (10700/80200) = 80/10700 = 0.0074766
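The same calculation can be repeated for every income class. This is a minimal sketch using the "Yes" and "Total" columns of the contingency table; the dictionary layout and the function name `p_audit_given` are illustrative assumptions, not part of the slides.

```python
# "Yes" and "Total" columns of the IRS contingency table above.
audited = {"<$25K": 90, "$25K-$49.999K": 71, "$50K-$100K": 69, "$100K<": 80}
filers = {"<$25K": 14_100, "$25K-$49.999K": 30_700,
          "$50K-$100K": 24_700, "$100K<": 10_700}
grand_total = sum(filers.values())  # 80,200 filers in the table

def p_audit_given(income):
    """P(yes | income) = P(yes and income) / P(income)."""
    p_joint = audited[income] / grand_total
    p_income = filers[income] / grand_total
    return p_joint / p_income  # the grand total cancels: audited / filers

for income in filers:
    print(income, round(p_audit_given(income), 7))
```

Because the grand total cancels, dividing the cell count by the row total gives the same result, which is the point of the remark above.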
82
Example
A survey asked 100 people their opinions about gender and combat in the military.
Here are the results of this survey.
Gender/Combat    Yes (y)    No (n)    Total
Male (M)              32        18       50
Female (F)             8        42       50
Total                 40        60      100
P(y and F) = 8/100
P(n and M) = 18/100
Question: What is the probability that the answer is yes, given that the respondent is female?
Answer: P(y|F) = P(y and F)/P(F) = (8/100)/(50/100) = 8/50 = 4/25 = 0.16
Question: What is the probability that the respondent is male, given that the answer is no?
Answer: P(M|n) = P(M and n)/P(n) = (18/100)/(60/100) = 18/60 = 3/10 = 0.30
83
Example
Problem: Analyzing the Triple Blood Test for Down Syndrome. Take blood from a
pregnant woman and perform the biochemical analysis (see if there is an
extra copy of chromosome 21). The test can be either positive or negative.
Unfortunately, it is not always accurate.
Results of Test:
1. True Positive: Test is positive and baby has extra chromosome.
2. False Positive: Test is positive and baby does not have extra chromosome.
3. True Negative: Test is negative and baby does not have extra chromosome.
4. False Negative: Test is negative and baby has extra chromosome.
84
Study: 5,282 women of age 35 or older took the Triple Blood Test and, after they
had their child, the accuracy of the test was analyzed. Here is a contingency
table for the study.
Syndrome/Test    Positive (p)    Negative (n)    Total
Yes (D)                    48               6       54
No (Dc)                  1307            3921     5228
Total                    1355            3927     5282
P(p) = 1355/5282 = 0.2565
P(n) = 3927/5282 = 0.7435
P(D and p) = 48/5282 = 0.0091
P(D and n) = 6/5282 = 0.0011
P(D|p) = P(D and p)/P(p) = 0.0091/0.2565 = 0.035
P(D|n) = P(D and n)/P(n) = 0.0011/0.7435 = 0.0015
85
Probability Table: P(A), P(B), P(A and B)
Syndrome/Test    Positive (p)    Negative (n)    Total
Yes (D)                 0.009           0.001    0.010
No (Dc)                 0.247           0.742    0.990
Total                   0.257           0.743    1.0
Conclusion: If the sample of 5,282 is representative of all women who take the test,
then for women who test positively, only approximately 4% [P(D|p)] of the fetuses have
Down Syndrome. Overall, about 1% of the fetuses have the syndrome [P(D)], and for women
who test negatively only about 0.15% of the fetuses still have it [P(D|n)]. Hence, a negative
test is a good indicator that the syndrome is not present.
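The conditional probabilities behind this conclusion can be recomputed from the study counts. The sketch below conditions on the test result (a column of the table); the dictionary keys and the function name `p_syndrome_given` are hypothetical names chosen for this example.

```python
# Cell counts from the study: (syndrome status, test result) -> women.
counts = {
    ("D", "p"): 48, ("D", "n"): 6,
    ("Dc", "p"): 1307, ("Dc", "n"): 3921,
}
total = sum(counts.values())  # 5,282 women in the study

def p_syndrome_given(test_result):
    """P(D | test_result) = P(D and test_result) / P(test_result)."""
    p_joint = counts[("D", test_result)] / total
    p_test = sum(v for (_, t), v in counts.items() if t == test_result) / total
    return p_joint / p_test

print(round(p_syndrome_given("p"), 3))  # P(D | p)
print(round(p_syndrome_given("n"), 4))  # P(D | n)
```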
86
Example
The following table shows the results of a study of 137,243 men in the U.S. The study investigated
the association between cigar smoking and death from cancer.
(a) What is the probability that a randomly selected man from the study who died from cancer was a
former cigar smoker?
(b) What is the probability that a randomly selected man who was a former cigar smoker died from
cancer?
                        Died from cancer    Did not die from cancer      Total
Never smoked cigars                  782                    120,747    121,529
Former cigar smoker                   91                      7,757      7,848
Current cigar smoker                 141                      7,725      7,866
Total                              1,014                    136,229    137,243
Let A = died from cancer and B = former cigar smoker.
(a) P(B | A) = P(A and B) / P(A) = (91/137243) / (1014/137243) = 91/1014 = 0.08974
(b) P(A | B) = P(A and B) / P(B) = (91/137243) / (7848/137243) = 91/7848 = 0.01160
87
Counting Techniques
In order to calculate probabilities using the classical method, one
must be able to count how many times a particular event will occur.
For sample spaces that are small, this is usually an easy task.
However, for sample spaces that are large, we will need some
“counting formulas.”
Section 5.5
88
Multiplication Rule of Counting
This rule is often useful in counting outcomes (events) that have a
tree structure. For example, how many different ways can the births
of three children occur?
Count of Final Events = 2 × 2 × 2 = 2^3 = 8
89
Generalization
Suppose that we have 3 branch points, with 2 possibilities at the first
branch point, then 3 possibilities, then 2 possibilities.
Count of Final Events = 2 × 3 × 2 = 12
Theorem: Suppose there are r1 possibilities at the first step, then r2 possibilities at the second
step, then r3 possibilities at the third step, then r4 possibilities at the fourth step, ..., then rk at the
kth step. The total number of outcomes is then r1 × r2 × r3 × ... × rk.
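The multiplication rule can be checked by enumerating the tree. This is a small sketch using `itertools.product`; the labels "B" and "G" for the births and the generic numbering of branches are assumptions made for illustration.

```python
from itertools import product
from math import prod

# Births of three children: 2 possibilities at each of 3 steps.
births = list(product("BG", repeat=3))
assert len(births) == 2 ** 3 == 8

# Generalization: r1 = 2, r2 = 3, r3 = 2 possibilities at successive steps.
branch_sizes = [2, 3, 2]
outcomes = list(product(*(range(r) for r in branch_sizes)))
assert len(outcomes) == prod(branch_sizes) == 12

print(births[:3])
```

Each tuple in `outcomes` is one path through the tree, so counting tuples is the same as counting final events.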
90
Example
Problem: A girl has five blouses and three skirts. How many different outfits can
she put together?
Answer: (5)(3) = 15
Problem: A girl has five blouses and three skirts and 4 pairs of shoes. How
many different outfits (including the shoes) can she put together?
Answer: (5)(3)(4) = 60
91
Factorials and Sequences
Suppose that we have n elements in a set: S = {a1, a2, ..., an}. Suppose
that we construct sequences (b1, b2, ..., bn) using elements of S, where the first element, b1,
can be any of the elements of S, the second element, b2, can be any of the elements of
S - {b1}, the third element, b3, can be any element of S - {b1, b2}, and so forth.
How many possible sequences are there?
Answer: Using the Multiplication Rule: n × (n - 1) × (n - 2) × ... × (3)(2)(1) = n!
1. How many sequences can be constructed in the above way from the set S = {a,b,c,d}?
Answer: 4! = (1)(2)(3)(4) = 24
2. A salesman must travel to five cities to promote his products. How many different trips
are possible if any route between two cities is possible?
Answer: Sample trip: {city1, city2, city3, city4, city5}. Hence, 5! = 120.
92
Permutations
Suppose we have n distinct elements in a set: S = {a1, a2, ..., an}. If we
choose any r distinct elements in this set, then how many different arrangements
of the r elements can be constructed where order is important (e.g., (A,B,C) is different
from (C,A,B) or (B,A,C))?
Answer: nPr = n! / (n - r)!
Note: nPn = n! / (n - n)! = n! / 0! = n! / 1 = n!
      nP(n-1) = n! / (n - (n - 1))! = n! / 1! = n! / 1 = n!
93
Example
How many different ways (permutations) can four letters, {a,b,c,d}, be arranged,
taken two at a time, e.g., (a,b), (a,d), (c,d), etc.?
4P2 = 4! / (4 - 2)! = 24 / 2 = 12
(a,b),(b,a),(a,c),(c,a),(a,d),(d,a),(b,c),(c,b),(c,d),(d,c),(b,d),(d,b) --- 12 arrangements
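Both the factorial count and the permutation count can be verified by listing the arrangements. A minimal sketch with the standard library follows; `math.perm` requires Python 3.8 or later.

```python
from itertools import permutations
from math import factorial, perm

letters = "abcd"

# All orderings of the four letters: n! sequences.
full_orderings = list(permutations(letters))
assert len(full_orderings) == factorial(4) == 24

# Ordered arrangements taken two at a time: nPr = n! / (n - r)!.
pairs = list(permutations(letters, 2))
assert len(pairs) == perm(4, 2) == factorial(4) // factorial(4 - 2) == 12

print(pairs)
```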
94
Permutations with Duplicates
Suppose we have a collection of n elements in which some of the elements are repeated.
Suppose that there are m distinct elements, a1, a2, ..., am. Let k1 be the number of copies of a1,
k2 be the number of copies of a2, ..., and km be the number of copies of am. If we choose all n
elements, then how many different arrangements can be constructed where order is
important (e.g., (A,B,C) is different from (C,A,B) or (B,A,C))? Note that k1 + k2 + ... + km = n.
Answer: The number of distinct arrangements is n! / (k1! k2! ... km!).
Example: How many ways can 7 girls and 2 boys be arranged in a sequence, e.g., (B,G,B,G,G,G,G,G,G)?
Answer: 9! / (7! 2!) = (7!)(8)(9) / (7! 2!) = 72/2 = 36
95
Combinations
Suppose we have n distinct elements in a set: S = {a1, a2, ..., an}. If we
choose any r distinct elements in this set, then how many different selections
of the r elements can be made where order is not important (e.g., {A,B,C} is treated
the same as {C,A,B} or {B,A,C})?
Answer: nCr = n! / ((n - r)! r!)
Note: nCn = n! / ((n - n)! n!) = n! / (0! n!) = 1
      nC(n-1) = n! / ((n - (n - 1))! (n - 1)!) = n! / ((1!)(n - 1)!) = n
96
Examples
Example: Find the number of combinations of five distinct objects, {a, b, c, d, e},
taken two at a time.
Answer: 5C2 = 5! / ((5 - 2)! 2!) = 5! / (3! 2!) = (4 × 5) / 2 = 10
Example: In the U.S. Senate, there are 21 members of the Committee on Banking, Housing and
Urban Affairs. Nine of these 21 members are selected to be on the Subcommittee on Economic
Policy. How many different committee structures are possible for this subcommittee?
Answer: 21C9 = 21! / ((21 - 9)! 9!) = 21! / (12! 9!) = (13 × 14 × ... × 21) / (1 × 2 × 3 × ... × 9) = 293,930
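Both combination counts can be reproduced with the standard library. A minimal sketch follows; `math.comb` requires Python 3.8 or later.

```python
from itertools import combinations
from math import comb, factorial

# Five objects taken two at a time: list them and count them.
pairs = list(combinations("abcde", 2))
assert len(pairs) == comb(5, 2) == 10

# Subcommittee of 9 chosen from 21 members: nCr = n! / ((n - r)! r!).
subcommittees = comb(21, 9)
assert subcommittees == factorial(21) // (factorial(12) * factorial(9))

print(pairs)
print(subcommittees)
```

Unlike `permutations`, `combinations` yields each unordered pair once, so ("a", "b") appears but ("b", "a") does not.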
97
Examples
Example: In the Pick 5 lottery game, one must pick 5 balls from a group of 39 balls that are numbered 1-39.
The order of the 5 balls is not important. What is the probability of winning this game?
Answer: P(winning) = 1 / (number of combinations of 5 balls) = 1 / 39C5 = 5! (39 - 5)! / 39! = 5! 34! / 39!
= 5! / (35 × 36 × 37 × 38 × 39) = 1 / 575,757 = 1.73684 × 10^-6
Example: How many ways can two particular horses finish in a 10-horse race?
Answer: Notice that order is important in counting the different ways. Hence,
10P2 = 10! / (10 - 2)! = 10! / 8! = 9 × 10 = 90.
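The two answers above contrast a case where order does not matter with one where it does. The sketch below recomputes both; the variable names are illustrative, and it assumes every set of 5 balls is equally likely to be drawn.

```python
from fractions import Fraction
from math import comb, perm

# Lottery: order of the 5 balls is not important, so count combinations.
n_tickets = comb(39, 5)
p_win = Fraction(1, n_tickets)  # assumes all combinations equally likely

# Horse race: order is important, so count permutations.
finishes = perm(10, 2)

print(n_tickets)
print(float(p_win))
print(finishes)
```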
98
Example
Example: How many different 10-letter words (real or imaginary) can be formed from the letters in the word STATISTICS?
Answer: Each word must have 3 S's, 3 T's, 2 I's, 1 A and 1 C. Each word will have 10 letters. If we ignored the repeated letters
and treated all ten as distinct, there would be 10! arrangements. However, we have restrictions on the number of times each letter appears.
Therefore, we use the multiplication rule, choosing the positions for each letter in turn:
(number of ways for S)(number of ways for T)(number of ways for I)(number of ways for A)(number of ways for C)
= 10C3 × 7C3 × 4C2 × 2C1 × 1C1
= [10! / (3! (10 - 3)!)] × [7! / (3! (7 - 3)!)] × [4! / (2! (4 - 2)!)] × [2! / (1! (2 - 1)!)] × [1! / (1! (1 - 1)!)]
= [10! / (3! 7!)] × [7! / (3! 4!)] × [4! / (2! 2!)] × [2! / (1! 1!)] × [1! / (1! 0!)]
= 10! / (3! 3! 2! 1! 1!) = 50,400, since 0! = 1 and the factors 7!, 4!, 2! and 1! cancel.
99
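The STATISTICS count can be reproduced both ways: as a product of combinations (choosing positions letter by letter) and with the permutations-with-duplicates formula. This is a sketch; the variable names are illustrative.

```python
from collections import Counter
from math import comb, factorial

word = "STATISTICS"
letter_counts = Counter(word)  # S: 3, T: 3, I: 2, A: 1, C: 1

# Multiplication rule: choose positions for each letter in turn.
remaining = len(word)
ways_by_steps = 1
for k in letter_counts.values():
    ways_by_steps *= comb(remaining, k)
    remaining -= k

# Direct formula: 10! / (3! 3! 2! 1! 1!).
ways_by_formula = factorial(len(word))
for k in letter_counts.values():
    ways_by_formula //= factorial(k)

print(ways_by_steps, ways_by_formula)
```

The order in which the letters are processed does not change the product, because the intermediate factorials cancel in the same way.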
