Chapter 2 in Undergraduate Econometrics

Chapter 2: Probability

Random Variable

A random variable (r.v.) is a variable whose value is unknown until it is observed. The value of a random variable results from an experiment.

Experiments can be either controlled (laboratory) or uncontrolled (observational). Most economic variables are random and are the result of uncontrolled experiments.


Random Variables

A discrete random variable can take on only a finite number of values, such as:
• The number of visits to a doctor’s office
• Number of children in a household
• Flip of a coin
• Dummy (binary) variable: D = 0 if male, D = 1 if female

A continuous random variable can take any real value (not just whole numbers) in an interval on the real number line, such as:
• Gross Domestic Product next year
• Price of a share in Microsoft
• Interest rate on a 30-year mortgage

Probability Distributions of Random Variables
• All random variables have probability distributions that describe the values the random variable can take on and the associated probabilities of these values.
• Knowing the probability distribution of a random variable gives us some indication of the value the r.v. may take on.

Probability Distribution for a Discrete Random Variable

Expressed as a table, graph, or function.

1. Suppose X = # of tails when a coin is flipped twice. X can take on the values 0, 1, or 2. Let f(x) be the associated probabilities:

X     f(x)
0     0.25
1     0.50
2     0.25

[Bar graph: f(x) plotted against x = 0, 1, 2; probability is represented as the height of each bar.]

2. Suppose X is a binary variable that can take on two values: 0 or 1. Furthermore, assume P(X=1) = p and P(X=0) = 1 - p.

Function: P(X=x) = f(x) = p^x (1-p)^(1-x) for x = 0, 1

X     f(x)
0     1 - p
1     p

Suppose p = 0.10. Then X takes on 0 with probability 0.90 and X takes on 1 with probability 0.10.
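As a minimal Python sketch (the helper name bernoulli_pmf and the printed checks are our own, not part of the slides), the function form of this pmf can be evaluated directly:

# Probability mass function of a binary (Bernoulli) random variable:
# f(x) = p^x * (1 - p)^(1 - x) for x = 0, 1
def bernoulli_pmf(x, p):
    return p**x * (1 - p)**(1 - x)

p = 0.10
print(bernoulli_pmf(0, p))   # 0.9, i.e. P(X=0)
print(bernoulli_pmf(1, p))   # 0.1, i.e. P(X=1)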

Facts about discrete probability distribution functions

1. Each probability P(X=x) = f(x) must lie between 0 and 1: 0 ≤ f(x) ≤ 1.
2. The sum of the probabilities must be 1. If X can take on n different values, then f(x_1) + f(x_2) + . . . + f(x_n) = 1.


Probability Distribution (Density) for Continuous Random Variables

Expressed as a function or graph. Continuous r.v.’s can take on an infinite number of values in a given interval, so a table isn’t appropriate to express the pdf.

EX: f(x) = 2x for 0 ≤ x ≤ 1
    f(x) = 0 otherwise


Because a continuous random variable has an uncountably infinite number of values, the probability of any single value occurring is zero: P(X = a) = 0. Instead, we ask “What is the probability that X is between a and b?”

P[a < X < b] = ?

In an experiment, the probability P[a < X < b] is the proportion of the time, in many experiments, that X will fall between a and b.


Probability is represented as area under the function; the total area must be 1.0.

[Graph of f(x) = 2x on 0 ≤ x ≤ 1: a triangle with base 1 and height 2, so its area is 1.0.]

Probability that X lies between 0 and 1/2: P[0 ≤ X ≤ 1/2] = 0.25. [The area of any triangle is ½ × base × height; here ½ × ½ × 1 = 0.25, since f(½) = 1.]

Uniform Random Variable

U is distributed uniformly between a and b; the p.d.f. is a flat line between a and b of height 1/(b - a):

f(u) = 1/(b - a) if a ≤ u ≤ b
f(u) = 0 otherwise

EX: Spin a dial on a clock face, so a = 0 and b = 12. Find the probability that u lies between 1 and 2: the area is (2 - 1) × (1/12) = 1/12.

[Graph: f(u) = 1/12 for 0 ≤ u ≤ 12.]

In calculus, the integral of a function defines the area under it:

P[a ≤ X ≤ b] = ∫_a^b f(x) dx

For continuous random variables it is the area under f(x), and not f(x) itself, which defines the probability of an event. We will NOT be integrating functions; when necessary we use tables and/or computers to calculate the necessary probability (integral).
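As a small Python sketch of “letting the computer do the integral” (a crude midpoint Riemann sum of our own, not anything from the slides), we can check the two examples above: P[0 ≤ X ≤ 1/2] = 0.25 for f(x) = 2x, and the clock-dial probability P[1 ≤ U ≤ 2] = 1/12.

# Approximate P[a <= X <= b] = integral of the pdf f over [a, b] (midpoint Riemann sum).
def prob(f, a, b, n=100_000):
    width = (b - a) / n
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width

f_triangle = lambda x: 2 * x                             # pdf f(x) = 2x on [0, 1]
f_uniform  = lambda u: 1 / 12 if 0 <= u <= 12 else 0.0   # clock-dial uniform pdf

print(prob(f_triangle, 0, 0.5))   # ~0.25
print(prob(f_uniform, 1, 2))      # ~0.0833 = 1/12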

Rules of Summation

Rule 1: Σ_{i=1}^{n} x_i = x_1 + x_2 + . . . + x_n

Rule 2: Σ_{i=1}^{n} a = na

Rule 3: Σ a·x_i = a·Σ x_i

Rule 4: Σ_{i=1}^{n} (x_i + y_i) = Σ_{i=1}^{n} x_i + Σ_{i=1}^{n} y_i

Rules of Summation (continued)

Rule 5: Σ_{i=1}^{n} (a·x_i + b·y_i) = a·Σ_{i=1}^{n} x_i + b·Σ_{i=1}^{n} y_i

Rule 6: x̄ = (1/n) Σ_{i=1}^{n} x_i = (x_1 + x_2 + . . . + x_n)/n

From Rule 6, we can prove (in class) that: Σ_{i=1}^{n} (x_i - x̄) = 0


Rules of Summation (continued)

Rule 7: Σ_{i=1}^{n} f(x_i) = f(x_1) + f(x_2) + . . . + f(x_n)

Notation: Σ_x f(x_i) = Σ_i f(x_i) = Σ_{i=1}^{n} f(x_i)

Rule 8: Σ_{i=1}^{n} Σ_{j=1}^{m} f(x_i, y_j) = Σ_{i=1}^{n} [ f(x_i, y_1) + f(x_i, y_2) + . . . + f(x_i, y_m) ]

The order of summation does not matter:

Σ_{i=1}^{n} Σ_{j=1}^{m} f(x_i, y_j) = Σ_{j=1}^{m} Σ_{i=1}^{n} f(x_i, y_j)
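As a quick numerical check of two of these facts (a sketch with invented numbers; numpy is our choice, not the text’s), the deviations from the mean sum to zero and the order of a double summation does not matter:

import numpy as np

x = np.array([2.0, 5.0, 7.0, 10.0])       # arbitrary illustrative data
print(np.sum(x - x.mean()))                # ~0: sum of deviations from the mean

# A table of values f(x_i, y_j) with n = 3 rows and m = 2 columns:
f = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
print(f.sum(axis=1).sum(), f.sum(axis=0).sum())   # same total, either order of summation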

The Mean of a Random Variable

The mean of a random variable is its mathematical expectation, or expected value. For a discrete random variable, this is:

E(X) = Σ_{i=1}^{n} x_i f(x_i) = x_1 f(x_1) + x_2 f(x_2) + . . . + x_n f(x_n)

where n measures the number of values X can take on.

It is a probability-weighted average of the possible values the random variable X can take on. This is a sum for discrete r.v.’s and an integral for continuous r.v.’s.

• E(X) tells us the “long-run” average value for X. It is not the value one would expect X to take on.
• If you were to randomly draw values of X from its pdf an infinite number of times and average these values, you would get E(X).
• E(X) = μ. This Greek letter “mu” is not used in your text but is commonly used to denote the mean of X.

Example: Roll a fair die

E(X) = Σ_{i=1}^{6} x_i f(x_i) = 1(1/6) + 2(1/6) + 3(1/6) + 4(1/6) + 5(1/6) + 6(1/6) = 21/6 = 3.5

Interpretation: In a large number of rolls of a fair die, one-sixth of the values will be 1’s, one-sixth of the values will be 2’s, etc., and the average of these values will be 3.5.
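A short simulation sketch of this “long-run average” interpretation (numpy; the seed and the 100,000 rolls are our own illustrative choices):

import numpy as np

values = np.arange(1, 7)                    # die faces 1..6
probs = np.full(6, 1 / 6)
print(np.sum(values * probs))               # E(X) = 3.5, the probability-weighted average

rng = np.random.default_rng(0)
rolls = rng.integers(1, 7, size=100_000)    # simulate many rolls of a fair die
print(rolls.mean())                         # sample average, close to 3.5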

Mathematical Expectation

Think of E(·) as an operator that requires you to weight by probabilities any expression inside the parentheses, and then sum:

E(g(X)) = Σ g(x_i) f(x_i) = g(x_1) f(x_1) + g(x_2) f(x_2) + . . . + g(x_n) f(x_n)

Rules of Mathematical Expectation
• E(c) = c, where c is a constant
• E(cX) = cE(X), where c is a constant and X is a random variable
• E(a + cX) = a + cE(X), where a and c are constants and X is a random variable.


Variance of a Random Variable

• Like the mean, the variance of a r.v. is an expected value, but it is the expected value of the squared deviations from the mean.
• Let g(x) = (x - E(X))². Then the variance is

  σ² = Var(X) = E[(X - E(X))²] = Σ g(x_i) f(x_i) = Σ (x_i - E(X))² f(x_i)

• It measures the amount of dispersion in the possible values for X.


About Variance
• The unit of measurement is X’s units squared.
• When we create a new random variable as a linear transformation of X, Y = a + cX, we know that E(Y) = a + cE(X), but Var(Y) = c² Var(X) (proof in class). This property tells us that the amount of variation in Y is determined by the amount of variation in X and the constant c. The additive constant a in no way alters the amount of variation.

About Variance (con’t)

E[(X - E(X))²] = E[X² - 2E(X)·X + E(X)²]
              = E(X²) - 2E(X)E(X) + E(X)²
              = E(X²) - 2E(X)² + E(X)²
              = E(X²) - E(X)²

• Run the E(·) operator through, pulling out constants and stopping on random variables. Remember that E(X) is itself a constant, so E(E(X)) = E(X).
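Continuing the die example, a small numpy sketch (the constants a = 2 and c = 3 are our own illustration) confirms the shortcut Var(X) = E(X²) - E(X)² and the rule Var(a + cX) = c²Var(X):

import numpy as np

x = np.arange(1, 7)              # die values
f = np.full(6, 1 / 6)            # probabilities

mu = np.sum(x * f)                          # E(X)
var_def = np.sum((x - mu) ** 2 * f)         # E[(X - E(X))^2], the definition
var_short = np.sum(x ** 2 * f) - mu ** 2    # E(X^2) - E(X)^2, the shortcut
print(var_def, var_short)                   # both ~2.9167

a, c = 2, 3                                 # illustrative constants
y = a + c * x                               # values of Y = a + cX (same probabilities)
mu_y = np.sum(y * f)
var_y = np.sum((y - mu_y) ** 2 * f)
print(var_y, c ** 2 * var_def)              # equal: Var(Y) = c^2 Var(X)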


Standard Deviation
• Because the variance is in squared units of the r.v., we can take the square root of the variance to obtain the standard deviation:

  σ = √σ² = √Var(X)

  Be sure to take the square root after you square and sum the deviations from the mean.


Joint Probability
• An experiment can randomly determine the outcome of more than one variable.
• When there are 2 random variables of interest, we study the joint probability density function.
• When there are more than 2 random variables of interest, we study the multivariate probability density function.

For a discrete joint pdf, probability is expressed in a matrix. Let X = return on stocks and Y = return on bonds:

              Y = 6    Y = 8    Y = 10
X = -10       0        0        0.10
X =   0       0        0.10     0.10
X =  10       0.10     0.30     0
X =  20       0.10     0.20     0

P(X=x, Y=y) = f(x,y); e.g., P(X=10, Y=8) = 0.30. The row totals give the marginal distribution f(x) and the column totals give f(y).


About Joint P.d.f.’s
• Marginal Probability Distribution: What is the probability distribution for X regardless of what values Y takes on?

  f(x) = Σ_y f(x,y)

  What is the probability distribution for Y regardless of what values X takes on?

  f(y) = Σ_x f(x,y)


• Conditional Probability Distribution: What is the probability distribution for X given that Y takes on a particular value?

  f(x|y) = f(x,y)/f(y)

  What is the probability distribution for Y given that X takes on a particular value?

  f(y|x) = f(x,y)/f(x)
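A sketch of these calculations for the stock/bond table above (numpy; the array layout, with X down the rows and Y across the columns, mirrors the matrix on the earlier slide):

import numpy as np

# Joint pdf f(x,y): rows are X = -10, 0, 10, 20; columns are Y = 6, 8, 10.
f_xy = np.array([[0.00, 0.00, 0.10],
                 [0.00, 0.10, 0.10],
                 [0.10, 0.30, 0.00],
                 [0.10, 0.20, 0.00]])

f_x = f_xy.sum(axis=1)     # marginal f(x): sum each row over y
f_y = f_xy.sum(axis=0)     # marginal f(y): sum each column over x
print(f_x, f_y)

# Conditional distribution of X given Y = 8 (the middle column): f(x|y) = f(x,y)/f(y)
print(f_xy[:, 1] / f_y[1])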


• Covariance: A measure that summarizes the joint probability distribution between two random variables.

  cov(X,Y) = E[(X - E(X))(Y - E(Y))] = Σ_x Σ_y (x - E(X))(y - E(Y)) f(x,y)

  Ex:

About Covariance: It measures the joint association between 2 random variables. Try asking: “When X is large, is Y more or less likely to also be large?” If the answer is that Y is likely to be large when X is large, then we say X and Y have a positive relationship. Cov(x,y) > 0 If the answer is that Y is likely to be small when X is large, then we say that X and Y have a negative relationship. Cov(x,y) < 0.

cov(X,Y) = E[(X - E(X))(Y - E(Y))]
         = E[XY - E(X)·Y - X·E(Y) + E(X)E(Y)]
         = E(XY) - E(X)E(Y) - E(X)E(Y) + E(X)E(Y)
         = E(XY) - E(X)E(Y)   ← useful!!

• Correlation: Covariance has awkward units of measurement. Correlation removes all units of measurement by dividing covariance by the product of the standard deviations:

  ρ_xy = cov(X,Y)/(σ_x σ_y), and -1 ≤ ρ_xy ≤ 1

  Ex:
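A sketch applying these two definitions to the stock/bond table (numpy; the printed numbers would fill in the “Ex:” lines above):

import numpy as np

x_vals = np.array([-10.0, 0.0, 10.0, 20.0])    # returns on stocks
y_vals = np.array([6.0, 8.0, 10.0])            # returns on bonds
f_xy = np.array([[0.00, 0.00, 0.10],
                 [0.00, 0.10, 0.10],
                 [0.10, 0.30, 0.00],
                 [0.10, 0.20, 0.00]])

f_x, f_y = f_xy.sum(axis=1), f_xy.sum(axis=0)
mu_x, mu_y = np.sum(x_vals * f_x), np.sum(y_vals * f_y)

# cov(X,Y) = sum over all (x, y) cells of (x - E(X))(y - E(Y)) f(x,y)
cov = np.sum(np.outer(x_vals - mu_x, y_vals - mu_y) * f_xy)

var_x = np.sum((x_vals - mu_x) ** 2 * f_x)
var_y = np.sum((y_vals - mu_y) ** 2 * f_y)
rho = cov / np.sqrt(var_x * var_y)             # correlation, always between -1 and 1
print(cov, rho)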

What does correlation look like?

[Four scatter plots of (X, Y) pairs with correlations ρ = 0, ρ = .3, ρ = .7, and ρ = .9.]

Statistical Independence Two random variables are statistically independent if knowing the value that one will take on does not reveal anything about what value the other may take on: f(x|y) = f(x) or f(y|x) = f(y) This implies that f(x,y) = f(x)f(y) if X and Y are independent. If 2 r.v.’s are independent, then their covariance will necessarily be equal to 0.


Functions of More than One Random Variable

Suppose that X and Y are two random variables. If we combine them in a weighted sum we create a new random variable with the following mean and variance:

Z = aX + bY
E(Z) = E(aX + bY) = aE(X) + bE(Y)
Var(Z) = Var(aX + bY) = a²Var(X) + b²Var(Y) + 2ab·Cov(X,Y)

If X and Y are independent:
Var(Z) = Var(aX + bY) = a²Var(X) + b²Var(Y)

(see page 31)
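A sketch that checks these formulas against the stock/bond table, treating Z = aX + bY as a portfolio return (the equal weights a = b = 0.5 are our own illustrative choice):

import numpy as np

x_vals = np.array([-10.0, 0.0, 10.0, 20.0])
y_vals = np.array([6.0, 8.0, 10.0])
f_xy = np.array([[0.00, 0.00, 0.10],
                 [0.00, 0.10, 0.10],
                 [0.10, 0.30, 0.00],
                 [0.10, 0.20, 0.00]])
a, b = 0.5, 0.5                                # illustrative portfolio weights

f_x, f_y = f_xy.sum(axis=1), f_xy.sum(axis=0)
mu_x, mu_y = np.sum(x_vals * f_x), np.sum(y_vals * f_y)
var_x = np.sum((x_vals - mu_x) ** 2 * f_x)
var_y = np.sum((y_vals - mu_y) ** 2 * f_y)
cov = np.sum(np.outer(x_vals - mu_x, y_vals - mu_y) * f_xy)

# Direct calculation: evaluate Z = aX + bY on every (x, y) cell of the joint pdf
z = a * x_vals[:, None] + b * y_vals[None, :]
mu_z = np.sum(z * f_xy)
var_z = np.sum((z - mu_z) ** 2 * f_xy)

print(mu_z, a * mu_x + b * mu_y)                               # E(Z) = aE(X) + bE(Y)
print(var_z, a**2 * var_x + b**2 * var_y + 2 * a * b * cov)    # Var(Z) formula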

Normal Probability Distribution
• Many random variables tend to have a normal distribution (a well-known bell shape).
• Theoretically, X ~ N(μ, σ²), where E(X) = μ and Var(X) = σ². The probability density function is

  f(x) = (1/√(2πσ²)) exp( -(x - μ)²/(2σ²) )

[Graph: bell-shaped curve with the area between a and b shaded.]

Normal Distribution (con’t)

A family of distributions, each with its own mean and variance. The mean anchors the distribution’s center and the variance captures the spread of the bell-shaped curve.

• To find the area under the curve would require integrating the p.d.f. – too complicated. A computer-generated table gives all the probabilities we need for a normal r.v. that has mean 0 and variance 1. To use the table (pg. 389), we take a normal random variable X ~ N(μ, σ²) and transform it by subtracting the mean and dividing by the standard deviation. This is a linear transformation of X that creates a new random variable with mean 0 and variance 1:

Z = (X - μ)/σ, where Z ~ N(0,1)

Statistical inference: drawing conclusions about a population based on a sample.
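A sketch of the standardization step in Python (the values μ = 10, σ = 2 and the interval [9, 13] are invented for illustration; the standard normal cdf is computed with math.erf rather than read from the table on pg. 389):

from math import erf, sqrt

def std_normal_cdf(z):
    # P(Z <= z) for Z ~ N(0, 1)
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma = 10.0, 2.0        # X ~ N(mu, sigma^2), illustrative values
a, b = 9.0, 13.0             # we want P(a < X < b)

# Standardize: Z = (X - mu)/sigma, so P(a < X < b) = P(z_a < Z < z_b)
z_a, z_b = (a - mu) / sigma, (b - mu) / sigma
print(std_normal_cdf(z_b) - std_normal_cdf(z_a))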

Summary of formulas

Population:
E(X) = μ = Σ x_i f(x_i)
Var(X) = σ² = E[(X - E(X))²] = E[(X - μ)²]
σ_x = √Var(X)
Cov(X,Y) = E[(X - μ_x)(Y - μ_y)] = E(XY) - μ_x μ_y
ρ_xy = Cov(X,Y) / √( Var(X) Var(Y) )

Sample:
x̄ = (1/T) Σ_{t=1}^{T} x_t
s²_x = Σ_{t=1}^{T} (x_t - x̄)² / (T - 1)
s_x = √s²_x
S_xy = Σ_{t=1}^{T} (x_t - x̄)(y_t - ȳ) / (T - 1)
r = S_xy / √(s²_x s²_y) = Σ (x_t - x̄)(y_t - ȳ) / √( Σ (x_t - x̄)² · Σ (y_t - ȳ)² )
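A closing sketch of the sample formulas (numpy; the two short data series are invented, and we assume the usual T - 1 divisor shown above, which matches numpy’s ddof=1 option):

import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])   # invented sample of T = 5 observations
y = np.array([1.0, 3.0, 2.0, 5.0, 4.0])
T = len(x)

x_bar = x.sum() / T                                     # sample mean
s2_x = np.sum((x - x_bar) ** 2) / (T - 1)               # sample variance
s_x = np.sqrt(s2_x)                                     # sample standard deviation
s_xy = np.sum((x - x_bar) * (y - y.mean())) / (T - 1)   # sample covariance
r = s_xy / np.sqrt(s2_x * np.var(y, ddof=1))            # sample correlation

print(x_bar, s2_x, s_x, s_xy, r)
print(np.corrcoef(x, y)[0, 1])                          # same r computed by numpy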