Transcript: Basic Probability and Statistics
Random variables, distribution functions, and various probability distributions
Definitions
• An experiment is a process whose output is not known with certainty.
• The set of all possible outcomes of an experiment is called the sample space ($S$).
• The outcomes are called sample points in $S$.
• A random variable is a function that assigns a real number to each point in $S$.
• A distribution function $F(x)$ of the random variable $X$ is defined for each real number $x$ as follows:
  $F(x) = \Pr(X \le x)$.
Properties of distribution function
• $0 \le F(x) \le 1$ for all $x$.
• $F(x)$ is non-decreasing: for $x_1 < x_2$, $F(x_1) \le F(x_2)$.
• $\lim_{x \to \infty} F(x) = 1$; $\lim_{x \to -\infty} F(x) = 0$.
Random Variables
• A random variable (r.v.) $X$ is discrete if it can take on at most a countable number of values $x_1, x_2, x_3, \dots$
• The probability that the discrete r.v. $X$ takes on a value $x_i$ is given by $p(x_i) = \Pr(X = x_i)$. $p(x)$ is called the probability mass function.
• $\sum_{i=1}^{\infty} p(x_i) = 1$.
• $F(x) = \sum_{x_i \le x} p(x_i)$.
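To make the pmf-to-CDF relation concrete, here is a minimal Python sketch (not from the slides; the pmf values are made up):

```python
# Sketch: CDF of a discrete r.v. built from its pmf (illustrative values).
pmf = {1: 0.2, 2: 0.5, 3: 0.3}  # p(x_i) = Pr(X = x_i); must sum to 1

def cdf(x, pmf):
    """F(x) = sum of p(x_i) over all x_i <= x."""
    return sum(p for xi, p in pmf.items() if xi <= x)

assert abs(sum(pmf.values()) - 1.0) < 1e-12
print(cdf(2, pmf))   # 0.7 = Pr(X <= 2)
print(cdf(10, pmf))  # 1.0 (F tends to 1)
```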
Random Variables
• A r.v. $X$ is said to be continuous if there exists a nonnegative function $f(x)$ such that for any set of real numbers $B$,
  $\Pr(X \in B) = \int_B f(x)\,dx$ and $\int_{-\infty}^{\infty} f(x)\,dx = 1$.
  $f(x)$ is called the probability density function.
• $F(x) = \Pr(X \le x) = \Pr\{X \in (-\infty, x]\} = \int_{-\infty}^{x} f(y)\,dy$.
Random Variables
• The mean or expected value of a r.v. $X$ is denoted by $E[X]$ or $\mu$, and is given by:
  $E[X] = \sum_{j=1}^{\infty} x_j\, p(x_j)$ if $X$ is discrete, and $E[X] = \int_{-\infty}^{\infty} x f(x)\,dx$ if $X$ is continuous.
• The variance of a r.v. $X$ is denoted by $\text{Var}(X)$ or $\sigma^2$, and is given by:
  $\sigma^2 = E[X^2] - (E[X])^2$.
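A small Python sketch of these two formulas for a discrete r.v. (the pmf values are illustrative, not from the slides):

```python
# Sketch: E[X] and Var(X) from a discrete pmf, using
# E[X] = sum x_j p(x_j) and Var(X) = E[X^2] - (E[X])^2.
pmf = {0: 0.25, 1: 0.5, 2: 0.25}  # illustrative values

mean = sum(x * p for x, p in pmf.items())
second_moment = sum(x**2 * p for x, p in pmf.items())
variance = second_moment - mean**2
print(mean, variance)  # 1.0 0.5
```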
Properties of mean
• If $X$ is a discrete random variable having pmf $p(x)$, then $E[g(X)] = \sum_x g(x)\,p(x)$.
• If $X$ is continuous with pdf $f(x)$, then $E[g(X)] = \int_{-\infty}^{\infty} g(x) f(x)\,dx$.
• Hence, for constants $a$ and $b$, $E[aX + b] = aE[X] + b$.
Property of variance
• For constants $a$ and $b$, $\text{Var}(aX + b) = a^2\,\text{Var}(X)$.
Joint Distribution
• If $X$ and $Y$ are discrete r.v.'s, then $p(x, y) = \Pr(X = x, Y = y)$ is called the joint probability mass function of $X$ and $Y$.
• Marginal probability mass functions of $X$ and $Y$:
  $p_X(x) = \sum_y p(x, y)$ and $p_Y(y) = \sum_x p(x, y)$.
• $X$ and $Y$ are independent if $p(x, y) = p_X(x)\,p_Y(y)$ for all $x, y$.
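The marginals and the independence check can be computed directly from a joint pmf table; a minimal sketch with made-up probabilities:

```python
# Sketch: marginals from a joint pmf and an independence check.
# p[(x, y)] = Pr(X = x, Y = y); values are illustrative.
from collections import defaultdict

p = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.4}

pX, pY = defaultdict(float), defaultdict(float)
for (x, y), prob in p.items():
    pX[x] += prob   # p_X(x) = sum over y of p(x, y)
    pY[y] += prob   # p_Y(y) = sum over x of p(x, y)

independent = all(abs(p[(x, y)] - pX[x] * pY[y]) < 1e-12 for (x, y) in p)
print(dict(pX), dict(pY), independent)  # here: not independent
```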
Conditional probability
• Let $A$ and $B$ be two events. $\Pr(A|B)$ is the conditional probability of event $A$ happening given that $B$ has already occurred.
• Bayes' theorem: $\Pr(A|B) = \dfrac{\Pr(A \cap B)}{\Pr(B)}$.
• If events $A$ and $B$ are independent, then $\Pr(A|B) = \Pr(A)$.
• Hence, from Bayes' theorem: $\Pr(A \cap B) = \Pr(A)\,\Pr(B)$.
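A quick simulation sketch of conditional probability (the two-dice events are my own illustration, not from the slides):

```python
# Sketch: estimate Pr(A|B) by counting, with two fair dice.
# A = "sum is 8", B = "first die shows at least 4".
import random

random.seed(1)
n, count_B, count_A_and_B = 100_000, 0, 0
for _ in range(n):
    d1, d2 = random.randint(1, 6), random.randint(1, 6)
    if d1 >= 4:
        count_B += 1
        if d1 + d2 == 8:
            count_A_and_B += 1

print(count_A_and_B / count_B)  # ~ Pr(A|B) = (3/36)/(18/36) = 1/6
```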
Dependency
• Covariance is a measure of linear dependence and is denoted by $C_{ij}$ or $\text{Cov}(X_i, X_j)$:
  $C_{ij} = E[(X_i - \mu_i)(X_j - \mu_j)] = E[X_i X_j] - \mu_i \mu_j$, $i = 1..n$ and $j = 1..n$.
• Another measure of linear dependency is the correlation factor:
  $\rho_{ij} = \dfrac{C_{ij}}{\sqrt{\sigma_i^2\, \sigma_j^2}}$, $i = 1..n$ and $j = 1..n$.
• Correlation factor is dimensionless but covariance is not.
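Both quantities are one-liners with NumPy (assumed available); a sketch on synthetic, linearly dependent data:

```python
# Sketch: sample covariance and correlation with NumPy (illustrative data).
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 0.8 * x + rng.normal(scale=0.6, size=1000)  # linearly dependent on x

cov = np.cov(x, y)[0, 1]        # sample version of C_xy = E[XY] - mu_x mu_y
corr = np.corrcoef(x, y)[0, 1]  # rho_xy = C_xy / sqrt(var_x var_y), dimensionless
print(cov, corr)
```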
Two random variables in a simulation experiment
• Let $X$ and $Y$ be two random variates in a given simulation experiment that are not independent.
• Our performance parameter is $X + Y$:
  $E[X + Y] = E[X] + E[Y]$.
  $\text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y) + 2\,\text{Cov}(X, Y)$.
• However, if the two r.v.'s are independent:
  $\text{Cov}(X, Y) = 0$, so $\text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y)$.
Bernoulli trial
• An experiment with only two outcomes – "Success" and "Failure" – where the chance of each outcome is known a priori.
• Denoted by the chance of success "$p$" (this is a parameter for the distribution).
• Example: tossing a "fair" coin.
• Let us define a variable $X_i$ such that:
  $X_i = 1$ if trial $i$ is a success, and $X_i = 0$ otherwise.
• Then, $E[X_i] = p$ and $\text{Var}(X_i) = p(1 - p)$.
Binomial r.v.
• A series of $n$ independent Bernoulli trials.
• If $X$ is the number of successes that occur in the $n$ trials, then $X$ is said to be a Binomial r.v. with parameters $(n, p)$. Its probability mass function is:
  $P_x = \Pr(X = x) = \dbinom{n}{x} p^x (1 - p)^{n - x}$, $x = 0, 1, 2, \dots, n$,
  where $\dbinom{n}{x} = \dfrac{n!}{x!\,(n - x)!}$.
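A direct Python sketch of this pmf (the parameters $n = 10$, $p = 0.5$ are illustrative):

```python
# Sketch: Binomial pmf Pr(X = x) = C(n, x) p^x (1-p)^(n-x).
from math import comb

def binom_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 10, 0.5
print(binom_pmf(5, n, p))                              # ~0.246 for a fair coin
print(sum(binom_pmf(x, n, p) for x in range(n + 1)))   # 1.0
```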
Binomial r.v.
• $X = \sum_{i=1}^{n} X_i$, where $X_i = 1$ if trial $i$ is a success and $X_i = 0$ otherwise.
• $E[X] = \sum_{i=1}^{n} E[X_i] = np$, and $\text{Var}(X) = \sum_{i=1}^{n} \text{Var}(X_i) = np(1 - p)$.
Poisson r.v.
• A r.v. $X$ which can take values 0, 1, 2, … is said to have a Poisson distribution with parameter $\lambda$ ($\lambda > 0$) if the pmf is given by:
  $p_i = \Pr(X = i) = \dfrac{e^{-\lambda} \lambda^i}{i!}$, $i = 0, 1, 2, \dots$
• For a Poisson r.v., $E[X] = \text{Var}(X) = \lambda$.
• The probabilities can be found recursively:
  $p_{i+1} = \dfrac{\lambda}{i + 1}\, p_i$, $i \ge 0$.
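The recursion gives a cheap way to tabulate the pmf without computing factorials; a minimal sketch with an illustrative $\lambda = 3$:

```python
# Sketch: Poisson probabilities via the recursion p_{i+1} = lambda/(i+1) * p_i.
from math import exp

lam = 3.0
p = exp(-lam)        # p_0 = e^{-lambda}
probs = [p]
for i in range(20):  # p_1 .. p_20 (tail beyond is negligible here)
    p *= lam / (i + 1)
    probs.append(p)

mean = sum(i * pi for i, pi in enumerate(probs))
print(probs[3], mean)  # Pr(X = 3) ~ 0.224; mean ~ lambda = 3
```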
Uniform r.v.
• A r.v. $X$ is said to be uniformly distributed over the interval $(a, b)$ when its pdf is:
  $f(x) = \dfrac{1}{b - a}$ if $a \le x \le b$, and $f(x) = 0$ otherwise.
• Expected value:
  $E[X] = \dfrac{1}{b - a} \int_a^b x\,dx = \dfrac{b^2 - a^2}{2(b - a)} = \dfrac{a + b}{2}$.
  $E[X^2] = \dfrac{1}{b - a} \int_a^b x^2\,dx = \dfrac{b^3 - a^3}{3(b - a)} = \dfrac{a^2 + ab + b^2}{3}$.
Uniform r.v.
• Variance:
  $\text{Var}(X) = E[X^2] - (E[X])^2 = \dfrac{(b - a)^2}{12}$.
• The distribution function $F(x)$ for a given $x$, $a < x < b$, is:
  $F(x) = \Pr(X \le x) = \int_a^x \dfrac{1}{b - a}\,dy = \dfrac{x - a}{b - a}$.
Normal r.v.
pdf: $f(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-(x - \mu)^2 / 2\sigma^2}$, $-\infty < x < \infty$.
The normal density is a bell-shaped curve that is symmetric about $\mu$.
It can be shown that for a normal r.v. $X$ with parameters $(\mu, \sigma^2)$, $E[X] = \mu$ and $\text{Var}(X) = \sigma^2$.
Normal r.v.
• If $X \sim N(\mu, \sigma^2)$, then $Z = \dfrac{X - \mu}{\sigma} \sim N(0, 1)$.
• The probability distribution function of the "Standard Normal" is given as:
  $\Phi(x) = \dfrac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-y^2/2}\,dy$, $-\infty < x < \infty$.
• If $X \sim N(\mu, \sigma^2)$, then $F(x) = \Phi\!\left(\dfrac{x - \mu}{\sigma}\right)$.
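$\Phi$ has no closed form, but it can be evaluated through the error function; a sketch using Python's `math.erf` (the identity $\Phi(x) = (1 + \text{erf}(x/\sqrt{2}))/2$ is standard, and the example numbers are illustrative):

```python
# Sketch: N(mu, sigma^2) CDF via the standard normal Phi.
from math import erf, sqrt

def phi(x):
    """Phi(x) = (1 + erf(x / sqrt(2))) / 2."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def normal_cdf(x, mu, sigma):
    """F(x) = Phi((x - mu) / sigma)."""
    return phi((x - mu) / sigma)

print(phi(1.96))                 # ~0.975
print(normal_cdf(110, 100, 15))  # Pr(X <= 110) for X ~ N(100, 15^2), ~0.748
```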
Central Limit Theorem
• Let $X_1, X_2, X_3, \dots, X_n$ be a sequence of IID random variables having a finite mean $\mu$ and finite variance $\sigma^2$. Then:
  $\lim_{n \to \infty} \Pr\!\left(\dfrac{X_1 + X_2 + \dots + X_n - n\mu}{\sigma\sqrt{n}} \le x\right) = \Phi(x)$.
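A quick simulation sketch of the theorem: standardized means of Uniform(0, 1) samples (an arbitrary choice of underlying distribution, not from the slides) behave like $N(0, 1)$:

```python
# Sketch: CLT demo -- standardized sums of uniforms look N(0, 1).
import random
from statistics import mean, stdev

random.seed(42)
mu, sigma = 0.5, (1 / 12) ** 0.5   # mean and std dev of Uniform(0, 1)
n = 30

z_values = []
for _ in range(10_000):
    xs = [random.random() for _ in range(n)]
    z = (sum(xs) - n * mu) / (sigma * n ** 0.5)
    z_values.append(z)

print(mean(z_values), stdev(z_values))  # ~0 and ~1, as Phi predicts
```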
Exponential r.v.
pdf: $f(x) = \lambda e^{-\lambda x}$, $0 \le x < \infty$.
cdf: $F(x) = \int_0^x f(y)\,dy = \int_0^x \lambda e^{-\lambda y}\,dy = 1 - e^{-\lambda x}$.
$E[X] = \dfrac{1}{\lambda}$; $\text{Var}(X) = \dfrac{1}{\lambda^2}$.
Exponential r.v.
• When multiplied by a constant, it still remains an exponential r.v.:
  $\Pr(cX \le x) = \Pr(X \le x/c) = 1 - e^{-\lambda x / c}$, so $cX \sim \text{Expo}(\lambda / c)$.
• Most useful property: memoryless!
  $\Pr(X > s + t \mid X > t) = \Pr(X > s)$, $s, t \ge 0$.
• Analytical simplicity: if $X_1 \sim \text{Expo}(\lambda_1)$ and $X_2 \sim \text{Expo}(\lambda_2)$ are independent, then
  $P(X_1 < X_2) = \dfrac{\lambda_1}{\lambda_1 + \lambda_2}$.
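The last result is easy to check by simulation; a sketch with illustrative rates $\lambda_1 = 2$, $\lambda_2 = 1$:

```python
# Sketch: checking P(X1 < X2) = lambda1 / (lambda1 + lambda2) by simulation.
import random

random.seed(7)
lam1, lam2 = 2.0, 1.0
trials = 100_000
wins = sum(random.expovariate(lam1) < random.expovariate(lam2)
           for _ in range(trials))
print(wins / trials, lam1 / (lam1 + lam2))  # both ~0.667
```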
Poisson process
A counting process $\{N(t), t \ge 0\}$ is said to be a Poisson process if:
• $N(0) = 0$.
• The process has independent increments.
• The number of events in any interval of length $t$ is Poisson distributed with mean $\lambda t$. That is, for all $s, t \ge 0$:
  $\Pr\{N(t + s) - N(s) = n\} = \dfrac{e^{-\lambda t} (\lambda t)^n}{n!}$, $n = 0, 1, 2, \dots$
If $T_n$, $n \ge 1$, is the time between the $(n-1)$st and $n$th events, then this interarrival time has an exponential distribution.
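This last fact gives the standard way to simulate a Poisson process: accumulate exponential interarrival times. A minimal sketch (the rate and horizon are illustrative):

```python
# Sketch: simulating a Poisson process on [0, T] using exponential
# interarrival times T_n ~ Expo(lambda).
import random

random.seed(3)
lam, T = 4.0, 1000.0
t, event_times = 0.0, []
while True:
    t += random.expovariate(lam)  # next interarrival time
    if t > T:
        break
    event_times.append(t)

# N(T) is Poisson with mean lam * T, so the event rate should be ~lam.
print(len(event_times) / T)  # ~4.0 events per unit time
```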
Useful property of Poisson process
• Let $S_1^1$ denote the time of the first event of the first Poisson process (with rate $\lambda_1$), and $S_1^2$ denote the time of the first event of the second Poisson process (with rate $\lambda_2$). Then:
  $P(S_1^1 < S_1^2) = \dfrac{\lambda_1}{\lambda_1 + \lambda_2}$.
Covariance stationary processes
• The covariance between two observations $X_i$ and $X_{i+j}$ depends only on $j$ and not on $i$.
• Let $C_j$ be the covariance for this process.
• So the correlation factor is given by:
  $\rho_j = \dfrac{C_{i, i+j}}{\sqrt{\sigma_i^2\, \sigma_{i+j}^2}} = \dfrac{C_j}{\sigma^2}$, $j = 1, 2, \dots$
Point Estimation
• Let $X_1, X_2, X_3, \dots, X_n$ be a sequence of IID random variables (observations) having a finite population mean $\mu$ and finite population variance $\sigma^2$.
• We are interested in finding these population parameters through the sample values:
  $\bar{X}(n) = \dfrac{\sum_{i=1}^{n} X_i}{n}$
• This sample mean is an unbiased point estimator of $\mu$. That is to say, $E[\bar{X}(n)] = \mu$.
Point Estimation
• The sample variance:
  $S^2(n) = \dfrac{\sum_{i=1}^{n} \left( X_i - \bar{X}(n) \right)^2}{n - 1}$
  is an unbiased point estimator of $\sigma^2$.
• Variance of the mean: $\text{Var}\!\left(\bar{X}(n)\right) = \dfrac{\sigma^2}{n}$.
• We can estimate this variance of the mean by: $\widehat{\text{Var}}\!\left(\bar{X}(n)\right) = \dfrac{S^2(n)}{n}$.
• This is true only if $X_1, X_2, X_3, \dots, X_n$ are IID.
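A direct Python sketch of these three estimators on an illustrative IID sample (the distribution and its parameters are made up):

```python
# Sketch: sample mean, unbiased sample variance S^2(n), and the
# estimated variance of the mean S^2(n)/n for IID data.
import random

random.seed(0)
xs = [random.gauss(10.0, 2.0) for _ in range(500)]  # illustrative IID sample
n = len(xs)

xbar = sum(xs) / n
s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)  # note n-1, not n
var_of_mean = s2 / n
print(xbar, s2, var_of_mean)  # ~10, ~4, ~0.008
```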
Point Estimation
• However, most often in a simulation experiment, the data are correlated.
• In that case, estimation using the sample variance is dangerous, because it underestimates the actual population variance:
  $E[S^2(n)] \ne \sigma^2$, and $E\!\left[\dfrac{S^2(n)}{n}\right] \ne \text{Var}\!\left(\bar{X}(n)\right)$.
Interval Estimation
• Let $X_1, X_2, X_3, \dots, X_n$ be a sequence of IID random variables (observations) having a finite population mean $\mu$ and finite population variance $\sigma^2$ ($\sigma^2 > 0$).
• We want to construct a confidence interval for the mean $\mu$.
• Let $Z_n$ be a random variable with probability distribution $F_n(z)$:
  $Z_n = \dfrac{\bar{X}(n) - \mu}{\sqrt{\sigma^2 / n}}$, with $F_n(z) = \Pr(Z_n \le z)$.
Interval Estimation
• The Central Limit Theorem states that $F_n(z) \to \Phi(z)$ as $n \to \infty$, where $\Phi$ is the standard normal distribution with mean 0 and variance 1.
• Often, we don't know the population variance $\sigma^2$.
• It can be shown that the CLT still applies if we replace $\sigma^2$ by the sample variance $S^2(n)$:
  $t_n = \dfrac{\bar{X}(n) - \mu}{\sqrt{S^2(n) / n}}$
• The variable $t_n$ is approximately normal as $n$ increases.
Standard Normal distribution
• The Standard Normal distribution is $N(0, 1)$.
• The cumulative distribution function (CDF) at any given value $z$ can be found using standard statistical tables.
• Conversely, if we know the probability, we can compute the corresponding value of $z_1$ such that:
  $F(z_1) = \Pr(Z \le z_1) = 1 - \dfrac{\alpha}{2}$.
• This value is $z_{1-\alpha/2}$ and is called the critical point for $N(0, 1)$.
• Similarly, the other critical point ($z_2 = -z_{1-\alpha/2}$) is such that:
  $F(z_2) = \Pr(Z \le z_2) = \dfrac{\alpha}{2}$.
Interval Estimation
• It follows, for large $n$:
  $\Pr\!\left(-z_{1-\alpha/2} \le Z_n \le z_{1-\alpha/2}\right) \approx \Pr\!\left(-z_{1-\alpha/2} \le \dfrac{\bar{X}(n) - \mu}{\sqrt{S^2(n)/n}} \le z_{1-\alpha/2}\right)$
  $= \Pr\!\left(\bar{X}(n) - z_{1-\alpha/2}\sqrt{\dfrac{S^2(n)}{n}} \le \mu \le \bar{X}(n) + z_{1-\alpha/2}\sqrt{\dfrac{S^2(n)}{n}}\right) \approx 1 - \alpha$.
Interval Estimation
• Therefore, if $n$ is sufficiently large, an approximate $100(1-\alpha)$ percent confidence interval for $\mu$ is given by:
  $\bar{X}(n) \pm z_{1-\alpha/2}\sqrt{\dfrac{S^2(n)}{n}}$.
• If we construct a large number of independent $100(1-\alpha)$ percent confidence intervals, each based on $n$ different observations ($n$ sufficiently large), the proportion of these confidence intervals that contain $\mu$ should be $1-\alpha$.
Interval Estimation
• What if $n$ is not "sufficiently large"?
• If the $X_i$'s are normal random variables, the random variable $t_n$ has a $t$-distribution with $n - 1$ degrees of freedom.
• In this case, the $100(1-\alpha)$ percent confidence interval for $\mu$ is given by:
  $\bar{X}(n) \pm t_{n-1,\, 1-\alpha/2}\sqrt{\dfrac{S^2(n)}{n}}$.
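A sketch of this $t$ confidence interval; SciPy is assumed available for the critical point $t_{n-1,\,1-\alpha/2}$, and the sample is synthetic:

```python
# Sketch: 100(1-alpha)% CI for mu: X_bar(n) +/- t_{n-1,1-alpha/2} sqrt(S^2(n)/n).
import random
from scipy.stats import t

random.seed(1)
xs = [random.gauss(50.0, 5.0) for _ in range(25)]  # small illustrative sample
n = len(xs)

xbar = sum(xs) / n
s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)
alpha = 0.05
t_crit = t.ppf(1 - alpha / 2, df=n - 1)  # t_{n-1, 1-alpha/2}
half_width = t_crit * (s2 / n) ** 0.5
print(xbar - half_width, xbar + half_width)  # contains 50 ~95% of the time
```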
Interval Estimation
• In practice, the distribution of the $X_i$'s is rarely normal, and the confidence interval (with the $t$-distribution) will be approximate.
• Since $t_{n-1,\, 1-\alpha/2} > z_{1-\alpha/2}$, the confidence interval with "$t$" is larger than the one with "$z$".
• Hence, it is recommended that we use the CI with "$t$". Why? The wider interval compensates for the extra uncertainty from estimating $\sigma^2$ with $S^2(n)$, so its coverage is closer to $1-\alpha$ for small $n$.
• However, $t_{n-1,\, 1-\alpha/2} \to z_{1-\alpha/2}$ as $n \to \infty$.
Hypothesis testing
• Assume that $X_1, X_2, X_3, \dots, X_n$ are normally distributed (or approximately normal) and that we would like to test whether $\mu = \mu_0$, where $\mu_0$ is a fixed hypothesized value of $\mu$.
• If $\left|\bar{X}(n) - \mu_0\right|$ is large, then our hypothesis is likely not true.
• To conduct such a test (of whether the hypothesis is true or not), we need a statistic whose distribution is known when the hypothesis is true.
• It turns out that if our hypothesis is true ($\mu = \mu_0$), then the statistic $t_n = \dfrac{\bar{X}(n) - \mu_0}{\sqrt{S^2(n)/n}}$ has a $t$-distribution with $n - 1$ df.
Hypothesis testing
• We form our two-tailed hypothesis ($H_0$) to test for $\mu = \mu_0$ as:
  If $|t_n| > t_{n-1,\, 1-\alpha/2}$, reject $H_0$; otherwise, "accept" $H_0$.
• The portion of the real line that corresponds to the rejection of $H_0$ is called the critical region for the test.
• The probability that the statistic $t_n$ falls in the critical region given that $H_0$ is true, which is clearly equal to $\alpha$, is called the level of the test.
• Typically, if $t_n$ doesn't fall in the rejection region, we "do not reject" $H_0$.
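A sketch of the two-tailed test; again SciPy is assumed for the critical point, and the data and $\mu_0$ are illustrative:

```python
# Sketch: two-tailed test of mu = mu0 at level alpha.
import random
from scipy.stats import t

random.seed(2)
xs = [random.gauss(50.0, 5.0) for _ in range(25)]
n, mu0, alpha = len(xs), 48.0, 0.05

xbar = sum(xs) / n
s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)
t_n = (xbar - mu0) / (s2 / n) ** 0.5       # test statistic
t_crit = t.ppf(1 - alpha / 2, df=n - 1)    # critical point t_{n-1, 1-alpha/2}

print("reject H0" if abs(t_n) > t_crit else "do not reject H0")
```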
Hypothesis testing
• Type I error: if one rejects $H_0$ when it is true, this is called a Type I error, whose probability is again equal to $\alpha$. This error is under the experimenter's control.
• Type II error: if one accepts $H_0$ when it is false, it is a Type II error. It is denoted by $\beta$.
• We call $\delta = 1 - \beta$ the power of the test, which is the probability of rejecting $H_0$ when it is false.
• For a fixed $\alpha$, the power of the test can only be increased by increasing $n$.