Basic Probability and Statistics


Outline: random variables, distribution functions, and various probability distributions.

Definitions

• An experiment is a process whose output is not known with certainty.

• The set of all possible outcomes of an experiment is called the sample space (S).

• The outcomes are called sample points in S.

• A random variable is a function that assigns a real number to each point in S.

• A distribution function F(x) of the random variable X is defined for each real number x as follows:

F(x) = Pr(X ≤ x).

Properties of distribution function

• 0 ≤ F(x) ≤ 1 for all x.

• F(x) is nondecreasing: for x₁ < x₂, F(x₁) ≤ F(x₂).

• lim_{x→∞} F(x) = 1;  lim_{x→−∞} F(x) = 0.

Random Variables

• A random variable (r.v.) X is discrete if it can take on at most a countable number of values x₁, x₂, x₃, …

• The probability that the discrete r.v. X takes on the value xᵢ is given by p(xᵢ) = Pr(X = xᵢ). The function p(x) is called the probability mass function (pmf).

• ∑_{i=1}^{∞} p(xᵢ) = 1.

• F(x) = ∑_{xᵢ ≤ x} p(xᵢ).
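As a concrete illustration (not part of the original slides), here is a minimal Python sketch with a made-up pmf, showing that the probabilities sum to 1 and how F(x) is obtained by summing p(xᵢ) over xᵢ ≤ x:

```python
# Minimal sketch: a hypothetical discrete pmf and its distribution function F(x).
pmf = {1: 0.2, 2: 0.5, 3: 0.3}   # p(x_i) for a made-up discrete r.v.

assert abs(sum(pmf.values()) - 1.0) < 1e-12   # probabilities sum to 1

def F(x):
    """Distribution function F(x) = sum of p(x_i) over all x_i <= x."""
    return sum(p for xi, p in pmf.items() if xi <= x)

print(F(0), F(1), F(2.5), F(3))   # approximately 0.0, 0.2, 0.7, 1.0
```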

Random Variables

• A r.v. X is said to be continuous if there exists a nonnegative function f(x), called the probability density function (pdf), such that for any set of real numbers B,

Pr(X ∈ B) = ∫_B f(x) dx,   and   ∫_{−∞}^{∞} f(x) dx = 1.

• Its distribution function is

F(x) = Pr(X ≤ x) = Pr(X ∈ (−∞, x]) = ∫_{−∞}^{x} f(y) dy.

Random Variables

• The mean or expected value of a r.v. X is denoted by E[X] or µ, and is given by:

E[X] = ∑_{j=1}^{∞} xⱼ p(xⱼ)   if X is discrete,
E[X] = ∫_{−∞}^{∞} x f(x) dx   if X is continuous.

• The variance of a r.v. X is denoted by Var(X) or σ², and is given by:

σ² = E[(X − µ)²] = E[X²] − µ².
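To make the discrete case concrete, the following sketch (with an assumed pmf, not from the slides) computes E[X] and Var(X) and checks the identity σ² = E[X²] − µ²:

```python
# Mean and variance of a discrete r.v. from an assumed pmf.
pmf = {0: 0.1, 1: 0.4, 2: 0.3, 3: 0.2}   # hypothetical p(x_j)

mean = sum(x * p for x, p in pmf.items())                 # E[X] = sum of x_j p(x_j)
ex2  = sum(x * x * p for x, p in pmf.items())             # E[X^2]
var  = sum((x - mean) ** 2 * p for x, p in pmf.items())   # E[(X - mu)^2]

print(mean, var, ex2 - mean ** 2)   # var and E[X^2] - mu^2 agree
```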

Properties of mean

• If X is a discrete random variable having pmf p(x), then:

E[g(X)] = ∑_x g(x) p(x).

• If X is continuous with pdf f(x), then:

E[g(X)] = ∫_{−∞}^{∞} g(x) f(x) dx.

• Hence, for constants a and b,

E[aX + b] = a E[X] + b.

Property of variance

• For constants a and b,

Var(aX + b) = a² Var(X).

Joint Distribution

• If X and Y are discrete r.v.'s, then

p(x, y) = Pr(X = x, Y = y),   for all x, y,

is called the joint probability mass function of X and Y.

• Marginal probability mass functions of X and Y:

p_X(x) = ∑_y p(x, y),    p_Y(y) = ∑_x p(x, y).

• X and Y are independent if

p(x, y) = p_X(x) p_Y(y)   for all x, y.

Conditional probability

• Let A and B be two events. Pr(A|B) is the conditional probability of event A occurring given that B has already occurred.

• Bayes' theorem:

Pr(A|B) = Pr(A ∩ B) / Pr(B).

• If events A and B are independent, then Pr(A|B) = Pr(A).

• Hence, from Bayes' theorem: Pr(A ∩ B) = Pr(A) Pr(B).

Dependency

• Covariance is a measure of linear dependence between two r.v.'s Xᵢ and Xⱼ, denoted by C_ij or Cov(Xᵢ, Xⱼ):

C_ij = E[(Xᵢ − µᵢ)(Xⱼ − µⱼ)] = E[Xᵢ Xⱼ] − µᵢ µⱼ,   i = 1..n, j = 1..n.

• Another measure of linear dependence is the correlation factor:

ρ_ij = C_ij / √(σᵢ² σⱼ²),   i = 1..n, j = 1..n.

• The correlation factor is dimensionless, but covariance is not.

Two random numbers in simulation experiment

• Let X and Y be two random variates in a given simulation experiment that are not independent.

• Our performance parameter is X + Y. Then:

E[X + Y] = E[X] + E[Y],
Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y).

• However, if the two r.v.'s are independent:

Cov(X, Y) = 0,   so   Var(X + Y) = Var(X) + Var(Y).
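A small simulation sketch (using hypothetical dependent variates, not from the slides) illustrates the last two slides: covariance and correlation computed with numpy, and the identity Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y):

```python
import numpy as np

# Hypothetical dependent variates: Y shares a component with X by construction.
rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
y = 0.6 * x + rng.normal(size=100_000)

cov_xy = np.cov(x, y, ddof=0)[0, 1]     # sample covariance Cov(X, Y)
rho_xy = np.corrcoef(x, y)[0, 1]        # dimensionless correlation factor

# Var(X+Y) should match Var(X) + Var(Y) + 2*Cov(X, Y).
print(np.var(x + y), np.var(x) + np.var(y) + 2 * cov_xy)
print(cov_xy, rho_xy)
```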

Bernoulli trial

• An experiment with only two outcomes, "Success" and "Failure", where the chance of each outcome is known a priori.

• The distribution is characterized by the chance of success, p (the parameter of the distribution).

• Example: tossing a "fair" coin.

• Let us define a variable Xᵢ such that

Xᵢ = 1 if trial i is a success, and Xᵢ = 0 otherwise.

• Then, E[Xᵢ] = p and Var(Xᵢ) = p(1 − p).

Binomial r.v.

• A binomial r.v. arises from a series of n independent Bernoulli trials.

• If X is the number of successes that occur in the n trials, then X is said to be a binomial r.v. with parameters (n, p). Its probability mass function is:

Pr(X = x) = C(n, x) p^x (1 − p)^(n − x),   x = 0, 1, 2, …, n,

where C(n, x) = n! / (x! (n − x)!).

Binomial r.v.

• A binomial r.v. can be written as X = ∑_{i=1}^{n} Xᵢ, where

Xᵢ = 1 if trial i is a success, and Xᵢ = 0 otherwise.

• Therefore,

E[X] = ∑_{i=1}^{n} E[Xᵢ] = np,    Var(X) = ∑_{i=1}^{n} Var(Xᵢ) = np(1 − p).
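A quick simulation sketch (parameters chosen arbitrarily) illustrates that the sum of n Bernoulli(p) trials behaves like a Binomial(n, p) r.v. with mean np and variance np(1 − p):

```python
import numpy as np

n, p, reps = 20, 0.3, 200_000        # hypothetical parameters
rng = np.random.default_rng(1)

# Each row is one experiment: n independent Bernoulli(p) trials.
trials = rng.random((reps, n)) < p
x = trials.sum(axis=1)               # number of successes in each experiment

print(x.mean(), n * p)               # close to np = 6.0
print(x.var(), n * p * (1 - p))      # close to np(1-p) = 4.2
```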

Poisson r.v.

• A r.v. X which can take the values 0, 1, 2, … is said to have a Poisson distribution with parameter λ (λ > 0) if the pmf is given by:

pᵢ = Pr(X = i) = e^(−λ) λ^i / i!,   i = 0, 1, 2, …

• For a Poisson r.v., E[X] = Var(X) = λ.

• The probabilities can be found recursively:

p_{i+1} = (λ / (i + 1)) pᵢ,   i ≥ 0.
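The recursion avoids computing factorials directly. A minimal sketch (λ chosen arbitrarily) builds the pmf from p₀ = e^(−λ) and checks that the probabilities sum to about 1 with mean close to λ:

```python
import math

lam = 2.5                      # hypothetical rate parameter
p = math.exp(-lam)             # p_0 = e^{-lambda}
probs = [p]
for i in range(20):            # compute p_1 ... p_20 recursively
    p = lam / (i + 1) * p      # p_{i+1} = lambda/(i+1) * p_i
    probs.append(p)

print(sum(probs))                                # close to 1
print(sum(i * q for i, q in enumerate(probs)))   # mean, close to lambda
```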

Uniform r.v.

• A r.v. X is said to be uniformly distributed over the interval (a, b) when its pdf is:

f(x) = 1 / (b − a)   if a ≤ x ≤ b,   and 0 otherwise.

• Expected value:

E[X] = (1 / (b − a)) ∫_a^b x dx = (b² − a²) / (2(b − a)) = (a + b) / 2.

E[X²] = (1 / (b − a)) ∫_a^b x² dx = (b³ − a³) / (3(b − a)) = (a² + b² + ab) / 3.

Uniform r.v.

• Variance:

Var(X) = E[X²] − (E[X])² = (b − a)² / 12.

• The distribution function F(x) for a given x, a < x < b, is

F(x) = Pr(X ≤ x) = ∫_a^x (1 / (b − a)) dy = (x − a) / (b − a).

Normal r.v.

• pdf:

f(x) = (1 / (σ √(2π))) e^(−(x − µ)² / (2σ²)),   −∞ < x < ∞.

• The normal density is a bell-shaped curve that is symmetric about µ.

• It can be shown that for a normal r.v. X with parameters (µ, σ²),

E[X] = µ,   Var(X) = σ².

Normal r.v.

• If X ~ N(µ, σ²), then Z = (X − µ) / σ ~ N(0, 1).

• The probability distribution function of the "standard normal" is given by:

Φ(x) = (1 / √(2π)) ∫_{−∞}^{x} e^(−y²/2) dy,   −∞ < x < ∞.

• If X ~ N(µ, σ²), then:

F(x) = Φ((x − µ) / σ).
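Since Φ has no closed form, it is usually evaluated numerically. A minimal sketch (values chosen arbitrarily) uses the error function from the Python standard library together with the standardization F(x) = Φ((x − µ)/σ):

```python
import math

def phi(z):
    """Standard normal CDF via the error function: Phi(z) = (1 + erf(z/sqrt(2)))/2."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_cdf(x, mu, sigma):
    """F(x) = Phi((x - mu)/sigma) for X ~ N(mu, sigma^2)."""
    return phi((x - mu) / sigma)

print(phi(0.0))                  # 0.5
print(phi(1.96))                 # roughly 0.975
print(normal_cdf(110, 100, 15))  # probability that N(100, 15^2) is below 110
```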

Central Limit Theorem

• Let X₁, X₂, X₃, …, X_n be a sequence of IID random variables having a finite mean µ and finite variance σ². Then:

lim_{n→∞} Pr( (X₁ + X₂ + … + X_n − nµ) / (σ √n) ≤ x ) = Φ(x).
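A simulation sketch (using an arbitrary non-normal population, Uniform(0, 1)) illustrates the theorem: the standardized sum of n IID uniforms has an empirical CDF close to Φ:

```python
import math
import numpy as np

rng = np.random.default_rng(2)
n, reps = 30, 100_000
mu, sigma2 = 0.5, 1.0 / 12.0                 # mean and variance of Uniform(0, 1)

sums = rng.random((reps, n)).sum(axis=1)     # X_1 + ... + X_n, repeated many times
z = (sums - n * mu) / math.sqrt(n * sigma2)  # standardized sums

for x in (-1.0, 0.0, 1.0):
    empirical = (z <= x).mean()
    phi = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    print(x, empirical, phi)                 # empirical CDF is close to Phi(x)
```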

Exponential r.v.

• pdf:

f(x) = λ e^(−λx),   0 ≤ x < ∞.

• cdf:

F(x) = ∫_0^x f(y) dy = ∫_0^x λ e^(−λy) dy = 1 − e^(−λx).

• E[X] = 1/λ;   Var(X) = 1/λ².

Exponential r.v.

• When multiplied by a constant, it still remains an exponential r.v.:

Pr(cX ≤ x) = Pr(X ≤ x/c) = 1 − e^(−λx/c),   so   cX ~ Expo(λ/c).

• Most useful property: memorylessness!

Pr(X > s + t | X > t) = Pr(X > s),   s, t ≥ 0.

• Analytical simplicity: if X₁ ~ Expo(λ₁) and X₂ ~ Expo(λ₂) are independent, then

P(X₁ < X₂) = λ₁ / (λ₁ + λ₂).
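The memoryless property can be checked empirically. A minimal sketch (rate and offsets chosen arbitrarily) compares Pr(X > s + t | X > t) with Pr(X > s) and with the exact value e^(−λs):

```python
import numpy as np

rng = np.random.default_rng(3)
lam, s, t = 0.5, 2.0, 3.0                  # hypothetical rate and offsets
x = rng.exponential(scale=1.0 / lam, size=1_000_000)

survived_t = x[x > t]                      # condition on X > t
lhs = (survived_t > s + t).mean()          # Pr(X > s + t | X > t)
rhs = (x > s).mean()                       # Pr(X > s)
print(lhs, rhs, np.exp(-lam * s))          # all approximately equal
```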

Poisson process

A counting process {N(t), t ≥ 0} is said to be a Poisson process with rate λ if:

– N(0) = 0.
– The process has independent increments.
– The number of events in any interval of length t is Poisson distributed with mean λt. That is, for all s, t ≥ 0,

Pr(N(t + s) − N(s) = n) = e^(−λt) (λt)^n / n!,   n = 0, 1, 2, …

If T_n, n ≥ 1, is the time between the (n − 1)st and nth events, then this interarrival time has an exponential distribution.
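Because the interarrival times are exponential, a Poisson process can be simulated by cumulatively summing exponential interarrival times. The sketch below (rate and horizon chosen arbitrarily) checks that the count in [0, t] has mean and variance close to λt:

```python
import numpy as np

rng = np.random.default_rng(4)
lam, t_end, reps = 2.0, 10.0, 20_000        # hypothetical rate and horizon

counts = []
for _ in range(reps):
    # Generate exponential interarrival times until the horizon is exceeded.
    arrival, n_events = 0.0, 0
    while True:
        arrival += rng.exponential(scale=1.0 / lam)
        if arrival > t_end:
            break
        n_events += 1
    counts.append(n_events)

counts = np.array(counts)
print(counts.mean(), counts.var(), lam * t_end)   # both close to lambda*t = 20
```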

Useful property of Poisson process

• Let S₁¹ denote the time of the first event of the first Poisson process (with rate λ₁), and S₁² denote the time of the first event of the second Poisson process (with rate λ₂). Then:

P(S₁¹ < S₁²) = λ₁ / (λ₁ + λ₂).

Covariance stationary processes

• For a covariance-stationary process, the covariance between two observations Xᵢ and X_{i+j} depends only on j and not on i.

• Let Cⱼ be the covariance at lag j for this process.

• So the correlation factor is given by:

ρⱼ = C_{i,i+j} / √(σᵢ² σ_{i+j}²) = Cⱼ / σ²,   j = 1, 2, ….

Point Estimation

• Let X₁, X₂, X₃, …, X_n be a sequence of IID random variables (observations) having a finite population mean µ and finite population variance σ².

• We are interested in estimating these population parameters from the sample values.

• The sample mean is

X̄(n) = ∑_{i=1}^{n} Xᵢ / n.

• This sample mean is an unbiased point estimator of µ; that is to say,

E[X̄(n)] = µ.

Point Estimation

• The sample variance

S²(n) = ∑_{i=1}^{n} (Xᵢ − X̄(n))² / (n − 1)

is an unbiased point estimator of σ².

• Variance of the mean:

Var(X̄(n)) = σ² / n.

• We can estimate this variance of the mean by:

S²(n) / n.

• This is true only if X₁, X₂, X₃, …, X_n are IID.
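A small sketch (with arbitrary IID data) of the three estimators on this slide: the sample mean X̄(n), the unbiased sample variance S²(n), and the estimated variance of the mean S²(n)/n:

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(loc=10.0, scale=2.0, size=50)   # hypothetical IID observations
n = len(x)

xbar = x.sum() / n                             # sample mean Xbar(n)
s2 = ((x - xbar) ** 2).sum() / (n - 1)         # unbiased sample variance S^2(n)
var_of_mean = s2 / n                           # estimate of Var(Xbar(n)) = sigma^2/n

print(xbar, s2, var_of_mean)
```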

Point Estimation

• However, most often in a simulation experiment the data are correlated.

• In that case, estimation using the sample variance is dangerous, because it underestimates the actual population variance:

E[S²(n)] ≠ σ²,   and   E[S²(n)/n] ≠ Var(X̄(n)).

Interval Estimation

• Let X₁, X₂, X₃, …, X_n be a sequence of IID random variables (observations) having a finite population mean µ and finite population variance σ² (σ² > 0).

• We want to construct a confidence interval for the mean µ.

• Let Z_n be a random variable with probability distribution F_n(z):

Z_n = (X̄(n) − µ) / √(σ² / n),    F_n(z) = Pr(Z_n ≤ z).

Interval Estimation

• The Central Limit Theorem states that:

F_n(z) → Φ(z)   as n → ∞,

where Φ is the standard normal distribution function with mean 0 and variance 1.

• Often, we don't know the population variance σ².

• It can be shown that the CLT still applies if we replace σ² by the sample variance S²(n):

t_n = (X̄(n) − µ) / √(S²(n) / n).

• The variable t_n is approximately standard normal as n increases.

Standard Normal distribution

• The standard normal distribution is N(0, 1).

• The cumulative distribution function (CDF) at any given value z can be found using standard statistical tables.

• Conversely, if we know the probability, we can compute the corresponding value z₁ such that

F(z₁) = Pr(Z ≤ z₁) = 1 − α/2.

• This value z₁ = z_{1−α/2} is called the critical point for N(0, 1).

• Similarly, the other critical point, z₂ = −z_{1−α/2}, is such that

F(z₂) = Pr(Z ≤ z₂) = α/2.

Interval Estimation

• It follows that for large n:

Pr(−z_{1−α/2} ≤ Z_n ≤ z_{1−α/2})
  = Pr(−z_{1−α/2} ≤ (X̄(n) − µ) / √(S²(n)/n) ≤ z_{1−α/2})
  = Pr(X̄(n) − z_{1−α/2} √(S²(n)/n) ≤ µ ≤ X̄(n) + z_{1−α/2} √(S²(n)/n))
  ≈ 1 − α.

Interval Estimation

• Therefore, if n is sufficiently large, an approximate 100(1 − α) percent confidence interval for µ is given by:

X̄(n) ± z_{1−α/2} √(S²(n) / n).

• If we construct a large number of independent 100(1 − α) percent confidence intervals, each based on n different observations (n sufficiently large), the proportion of these confidence intervals that contain µ should be 1 − α.

Interval Estimation

• What if n is not "sufficiently large"?

• If the Xᵢ's are normal random variables, the random variable t_n has a t-distribution with n − 1 degrees of freedom.

• In this case, the 100(1 − α) percent confidence interval for µ is given by:

X̄(n) ± t_{n−1, 1−α/2} √(S²(n) / n).
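A sketch that computes both forms of the 100(1 − α) percent confidence interval for µ on arbitrary data: the large-sample (z) interval and the small-sample (t) interval. The normal critical point comes from the Python standard library; the t critical point uses scipy.stats, which is assumed to be available:

```python
import math
import statistics
import numpy as np
from scipy import stats   # assumed available for the t critical point

rng = np.random.default_rng(6)
x = rng.normal(loc=5.0, scale=1.5, size=25)     # hypothetical observations
n, alpha = len(x), 0.05

xbar = x.mean()
s2 = x.var(ddof=1)                               # unbiased S^2(n)
scale = math.sqrt(s2 / n)                        # sqrt(S^2(n)/n)

z = statistics.NormalDist().inv_cdf(1 - alpha / 2)   # z_{1-alpha/2}
t = stats.t.ppf(1 - alpha / 2, df=n - 1)             # t_{n-1, 1-alpha/2}

print("z interval:", xbar - z * scale, xbar + z * scale)
print("t interval:", xbar - t * scale, xbar + t * scale)   # slightly wider
```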

Interval Estimation

• In practice, the distribution of the Xᵢ's is rarely normal, and the confidence interval (with the t-distribution) will be approximate.

• Since t_{n−1, 1−α/2} > z_{1−α/2}, the CI with "t" is larger than the one with "z".

• Hence, it is recommended that we use the CI with "t". Why? The wider interval is more conservative, so its actual coverage is closer to the desired 1 − α.

• However, t_{n−1, 1−α/2} → z_{1−α/2} as n → ∞.

Hypotheses testing

• Assume that X₁, X₂, X₃, …, X_n are normally distributed (or approximately normal) and that we would like to test whether µ = µ₀, where µ₀ is a fixed, hypothesized value of µ.

• If |X̄(n) − µ₀| is large, then our hypothesis is probably not true.

• To conduct such a test (of whether the hypothesis is true or not), we need a statistic whose distribution is known when the hypothesis is true.

• It turns out that if our hypothesis is true (µ = µ₀), then the statistic t_n has a t-distribution with n − 1 degrees of freedom.

Hypothesis testing

• We form our two-tailed hypothesis (H₀) to test for µ = µ₀ as:

If |t_n| > t_{n−1, 1−α/2}, reject H₀; otherwise, "accept" H₀.

• The portion of the real line that corresponds to rejection of H₀ is called the critical region for the test.

• The probability that the statistic t_n falls in the critical region given that H₀ is true, which is clearly equal to α, is called the level of the test.

• Typically, if t_n does not fall in the rejection region, we "do not reject" H₀.
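A sketch of the two-tailed test on arbitrary data: compute t_n for a hypothesized µ₀ and compare |t_n| with the critical point t_{n−1, 1−α/2} (scipy.stats assumed available for the quantile):

```python
import math
import numpy as np
from scipy import stats   # assumed available for the t critical point

rng = np.random.default_rng(7)
x = rng.normal(loc=10.4, scale=2.0, size=30)    # hypothetical sample
mu0, alpha = 10.0, 0.05                         # hypothesized mean and test level
n = len(x)

t_n = (x.mean() - mu0) / math.sqrt(x.var(ddof=1) / n)
crit = stats.t.ppf(1 - alpha / 2, df=n - 1)     # t_{n-1, 1-alpha/2}

if abs(t_n) > crit:
    print(f"|t_n| = {abs(t_n):.3f} > {crit:.3f}: reject H0")
else:
    print(f"|t_n| = {abs(t_n):.3f} <= {crit:.3f}: do not reject H0")
```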

Hypothesis testing

• Type I error: rejecting H₀ when it is true. Its probability is again equal to α. This error is under the experimenter's control.

• Type II error: accepting H₀ when it is false. Its probability is denoted by β.

• We call δ = 1 − β the power of the test; it is the probability of rejecting H₀ when it is false.

• For a fixed α, the power of the test can only be increased by increasing n.