Lecture 6 - Alex Braunstein's Blog


Statistics 111 - Lecture 6
Probability
Introduction to Probability,
Conditional Probability and
Random Variables
June 3, 2008
Stat 111 - Lecture 6 - Probability
Administrative Note
• Homework 2 due Monday, June 8th
– Look at the questions now!
• Prepare to have your minds blown
today
Course Overview
• Collecting Data
• Exploring Data
• Probability Intro.
• Inference
– Comparing Variables: Means, Proportions
– Relationships between Variables: Regression, Contingency Tables
Why do we need Probability?
• We have several graphical and numerical
statistics for summarizing our data
• We want to make probability statements
about the significance of our statistics
• Eg. In Stat111, mean(height) = 66.7 inches
• What is the chance that the true mean height of Penn
students is between 60 and 70 inches?
• Eg. r = -0.22 for draft order and birthday
• What is the chance that the true correlation is
significantly different from zero?
Deterministic vs. Random Processes
• In deterministic processes, the outcome can be
predicted exactly in advance
• Eg. Force = mass x acceleration. If we are given
values for mass and acceleration, we exactly know
the value of force
• In random processes, the outcome is not
known exactly, but we can still describe the
probability distribution of possible outcomes
• Eg. 10 coin tosses: we don’t know exactly how
many heads we will get, but we can calculate the
probability of getting a certain number of heads
Events
• An event is an outcome or a set of outcomes of
a random process
• Example: Tossing a coin three times
Event A = getting exactly two heads = {HTH, HHT, THH}
• Example: Picking a real number X between 1 and 20
Event A = chosen number is at most 8.23 = {X ≤ 8.23}
• Example: Tossing a fair die
Event A = result is an even number = {2, 4, 6}
• Notation: P(A) = Probability of event A
• Probability Rule 1:
0 ≤ P(A) ≤ 1 for any event A
Sample Space
• The sample space S of a random process is
the set of all possible outcomes
Example: one coin toss
S = {H,T}
Example: three coin tosses
S = {HHH, HTH, HHT, TTT, HTT, THT, TTH, THH}
Example: roll a six-sided die
S = {1, 2, 3, 4, 5, 6}
Example: Pick a real number X between 1 and 20
S = all real numbers between 1 and 20
• Probability Rule 2: The probability of the
whole sample space is 1
P(S) = 1
Combinations of Events
• The complement Ac of an event A is the event that A
does not occur
• Probability Rule 3:
P(Ac) = 1 - P(A)
• The union of two events A and B is the event that
either A or B or both occurs
• The intersection of two events A and B is the event
that both A and B occur
[Figure: Venn diagrams showing Event A, the Complement of A, the Union of A and B, and the Intersection of A and B]
Disjoint Events
• Two events are called disjoint if they cannot
happen at the same time
• Events A and B are disjoint means that the
intersection of A and B is empty
• Example: coin is tossed twice
• S = {HH,TH,HT,TT}
• Events A={HH} and B={TT} are disjoint
• Events A={HH,HT} and B = {HH} are not disjoint
• Probability Rule 4: If A and B are disjoint
events then
P(A or B) = P(A) + P(B)
Independent events
• Events A and B are independent if knowing that A
occurs does not affect the probability that B occurs
• Example: tossing two coins
Event A = first coin is a head
Event B = second coin is a head
Independent
• Disjoint events cannot be independent!
• If A and B cannot occur together (disjoint), then knowing that
A occurs does change the probability that B occurs
• Probability Rule 5: If A and B are independent
P(A and B) = P(A) x P(B)
multiplication rule for independent events
Equally Likely Outcomes Rule
• If all possible outcomes from a random process
have the same probability, then
• P(A) = (# of outcomes in A)/(# of outcomes in S)
• Example: One die tossed
P(even number) = |{2, 4, 6}| / |{1, 2, 3, 4, 5, 6}| = 3/6 = 1/2
• Note: the equally likely outcomes rule only works if the
number of outcomes is finite
• Eg. of an uncountable process is sampling any real number between
0 and 1. Impossible to count all possible real numbers!
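The equally likely outcomes rule can be sketched in a few lines of Python (not from the lecture, just an illustration), using the one-die example above:

```python
# Equally likely outcomes rule: P(A) = (# outcomes in A) / (# outcomes in S).
# A minimal sketch for one roll of a fair six-sided die.
sample_space = {1, 2, 3, 4, 5, 6}
event_even = {x for x in sample_space if x % 2 == 0}

p_even = len(event_even) / len(sample_space)
print(p_even)  # 0.5
```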
Combining Probability Rules Together
• Initial screening for HIV in the blood first uses
an enzyme immunoassay test (EIA)
• Even if an individual is HIV-negative, EIA has
probability of 0.006 of giving a positive result
• Suppose 100 people are tested who are all
HIV-negative. What is probability that at
least one will show positive on the test?
• First, use complement rule:
P(at least one positive) = 1 - P(all negative)
Combining Probability Rules Together
• Now, we assume that each individual is
independent and use the multiplication rule for
independent events:
P(all negative) = P(test 1 negative) ×…× P(test 100 negative)
• P(test negative) = 1 - P(test positive) = 0.994
P(all negative) = 0.994 ×…× 0.994 = (0.994)^100
• So, finally, we have
P(at least one positive) = 1 − (0.994)^100 = 0.452
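The same calculation is a one-liner in Python (a quick sketch, not part of the slides), combining the complement rule with the multiplication rule for independent tests:

```python
# 100 independent HIV-negative individuals, each with a 0.006
# chance of a false positive on the EIA test.
p_false_positive = 0.006

# Multiplication rule: all 100 tests come back negative.
p_all_negative = (1 - p_false_positive) ** 100
# Complement rule: at least one positive.
p_at_least_one_positive = 1 - p_all_negative
print(round(p_at_least_one_positive, 3))  # 0.452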
Curse of the Bambino:
Boston Red Sox traded Babe
Ruth after 1918 and did not
win a World Series again until
2004 (86 years later)
• What are the chances that a team will go 86
years without winning a world series?
• Simplifying assumptions:
• Baseball has always had 30 teams
• Each team has equal chance of winning each year
Curse of the Bambino
• With 30 teams that are “equally likely” to win in a year, we
have
P(no WS in a year) = 29/30 ≈ 0.97
• If we also assume that each year is independent, we can
use multiplication rule
P(no WS in 86 years)
= P(no WS in year 1) ×…× P(no WS in year 86)
= (29/30) ×…× (29/30)
= (29/30)^86 ≈ 0.05 (only a 5% chance!)
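Under the slide's simplifying assumptions, the drought probability can be checked directly (a sketch, not from the lecture; note that using the exact fraction 29/30 rather than the rounded 0.97 is what gives 0.05):

```python
# 30 equally likely teams, independent years: probability a given
# team wins no World Series in 86 straight years.
p_no_ws_one_year = 29 / 30

p_drought = p_no_ws_one_year ** 86
print(round(p_drought, 2))  # 0.05
```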
Break
Outline
• Moore, McCabe and Craig: Sections 4.3, 4.5
• Conditional Probability
• Discrete Random Variables
• Continuous Random Variables
• Properties of Random Variables
• Means of Random Variables
• Variances of Random Variables
June 4, 2008
Stat 111 - Lecture 6 - Random Variables
Conditional Probabilities
• The notion of conditional probability can be
found in many different types of problems
• Eg. imperfect diagnostic test for a disease
         Disease +   Disease -   Total
Test +      30          10         40
Test -      10          50         60
Total       40          60        100
• What is probability that a person has the
disease? Answer: 40/100 = 0.4
• What is the probability that a person has the
disease given that they tested positive?
More Complicated !
Definition: Conditional Probability
• Let A and B be two events in a sample space
• The conditional probability that event B occurs
given that event A has occurred is:
P(B|A) = P(A and B) / P(A)
• Eg. probability of disease given test positive
P(disease + | test +) = P(disease + and test +) / P(test +) = (30/100)/(40/100) = 0.75
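The conditional probability from the contingency table can be computed directly from the cell counts (a quick sketch, not part of the slides):

```python
# P(disease+ | test+) = P(disease+ and test+) / P(test+),
# using the counts from the diagnostic-test contingency table.
n_total = 100
n_test_pos = 40          # row total for "Test +"
n_disease_and_pos = 30   # "Disease +" and "Test +" cell

p_test_pos = n_test_pos / n_total
p_both = n_disease_and_pos / n_total
p_disease_given_pos = p_both / p_test_pos
print(p_disease_given_pos)
```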
Independent vs. Non-independent Events
• If A and B are independent, then
P(A and B) = P(A) x P(B)
which means that conditional probability is:
P(B | A) = P(A and B) / P(A) = P(A)P(B)/P(A) = P(B)
• We have a more general multiplication rule for
events that are not independent:
P(A and B) = P(B | A) × P(A)
Random variables
• A random variable is a numerical outcome of
a random process or random event
• Example: three tosses of a coin
• S = {HHH,THH,HTH,HHT,HTT,THT,TTH,TTT}
• Random variable X = number of observed tails
• Possible values for X = {0,1, 2, 3}
• Why do we need random variables?
• We use them as a model for our observed data
Discrete Random Variables
• A discrete random variable has a finite or
countable number of distinct values
• Discrete random variables can be summarized
by listing all values along with the probabilities
• Called a probability distribution
• Example: number of members in US families
X      2      3      4      5      6      7
P(X)   0.413  0.236  0.211  0.090  0.032  0.018
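A discrete distribution like this is naturally stored as a dict of value-to-probability pairs. The sketch below (not from the lecture) checks Probability Rule 2, that the listed probabilities sum to 1:

```python
# Probability distribution of X = number of members in US families,
# taken from the table above.
dist = {2: 0.413, 3: 0.236, 4: 0.211, 5: 0.090, 6: 0.032, 7: 0.018}

# Rule 2: the probabilities over the whole sample space sum to 1.
total = sum(dist.values())
print(round(total, 6))  # 1.0
```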
Another Example
• Random variable X = the sum of two dice
• X takes on values from 2 to 12
• Use “equally-likely outcomes” rule to
calculate the probability distribution:
X            2     3     4     5     6     7     8     9     10    11    12
# outcomes   1     2     3     4     5     6     5     4     3     2     1
P(X)         1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36
• If discrete r.v. takes on many values, it is
better to use a probability histogram
Probability Histograms
• Probability histogram of sum of two dice:
• Using the disjoint addition rule, probabilities
for discrete random variables are calculated
by adding up the “bars” of this histogram:
P(sum > 10) = P(sum = 11) + P(sum = 12) = 3/36
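The distribution and the "add up the bars" calculation can be reproduced by enumerating all 36 equally likely outcomes (a sketch, not part of the slides):

```python
from itertools import product
from collections import Counter

# Count how many of the 36 equally likely (die1, die2) outcomes
# give each possible sum.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# Disjoint addition rule: P(sum > 10) = P(sum = 11) + P(sum = 12).
p_gt_10 = (counts[11] + counts[12]) / 36
print(p_gt_10 == 3 / 36)  # True
```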
Continuous Random Variables
• Continuous random variables have an uncountable number of values
• Can’t list the entire probability distribution, so
we use a density curve instead of a histogram
• Eg. Normal density curve:
Calculating Continuous Probabilities
• Discrete case: add up bars from probability histogram
• Continuous case: we have to use integration to
calculate the area under the density curve:
• Although it seems more complicated, it is often easier to
integrate than add up discrete “bars”
• If a discrete r.v. has many possible values, we often
treat that variable as continuous instead
Example: Normal Distribution
We will use the normal distribution throughout
this course for two reasons:
1. It is usually a good approximation to real data
2. We have tables of calculated areas under the
normal curve, so we avoid doing integration!
Mean of a Random Variable
• Average of all possible values of a random
variable (often called expected value)
• Notation: don’t want to confuse random
variables with our collected data variables
μ = mean of a random variable
x̄ = mean of a data variable
• For continuous r.v, we again need integration
to calculate the mean
• For discrete r.v., we can calculate the mean
by hand since we can list all probabilities
Mean of Discrete random variables
• Mean is the sum of all possible values, with
each value weighted by its probability:
μ = Σ xi⋅P(xi) = x1⋅P(x1) + … + xk⋅P(xk)
• Example: X = sum of two dice
X      2     3     4     5     6     7     8     9     10    11    12
P(X)   1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36
μ = 2⋅ (1/36) + 3⋅ (2/36) + 4 ⋅ (3/36) +…+12⋅ (1/36)
= 252/36 = 7
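The same weighted sum can be checked in Python (a sketch, not from the lecture), building the distribution by enumeration rather than typing in the table:

```python
from itertools import product
from collections import Counter

# Distribution of X = sum of two dice, from the 36 equally likely outcomes.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
dist = {x: c / 36 for x, c in counts.items()}

# Mean of a discrete r.v.: each value weighted by its probability.
mu = sum(x * p for x, p in dist.items())
print(round(mu, 10))  # 7.0
```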
Variance of a Random Variable
• Spread of all possible values of a random
variable around its mean
• Again, we don’t want to confuse random
variables with our collected data variables:
2 = variance of random variable
s2 = variance of a data variable
• For continuous r.v, again need integration to
calculate the variance
• For discrete r.v., can calculate the variance by
hand since we can list all probabilities
Variance of Discrete r.v.s
• Variance is the sum of the squared deviations
away from the mean of all possible values,
weighted by each value's probability:
σ² = Σ (xi − μ)²⋅P(xi) = (x1 − μ)²⋅P(x1) + … + (xk − μ)²⋅P(xk)
• Example: X = sum of two dice
X      2     3     4     5     6     7     8     9     10    11    12
P(X)   1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36
σ² = (2 − 7)²⋅(1/36) + (3 − 7)²⋅(2/36) + … + (12 − 7)²⋅(1/36)
= 210/36 = 5.83
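The variance calculation follows the same pattern as the mean (again a sketch, not part of the slides):

```python
from itertools import product
from collections import Counter

# Distribution of X = sum of two dice, from the 36 equally likely outcomes.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
dist = {x: c / 36 for x, c in counts.items()}

# Variance: squared deviations from the mean, weighted by probability.
mu = sum(x * p for x, p in dist.items())
var = sum((x - mu) ** 2 * p for x, p in dist.items())
print(round(var, 2))  # 5.83
```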
Next Class - Lecture 7
• Standardization and the Normal
Distribution
• Moore and McCabe: Sections 4.3, 1.3