Chapter 10: Covariance and Correlation

CIS 2033 BASED ON
DEKKING ET AL. A MODERN INTRODUCTION TO PROBABILITY AND STATISTICS. 2007
INSTRUCTOR LONGIN JAN LATECKI
CHAPTER 10:
COVARIANCE AND CORRELATION
As an example, take g(x, y) = xy for
discrete random variables X and Y with
the joint probability distribution given in the
table. The expectation of XY is computed
as follows:
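The joint table from the slide is not reproduced in this transcript, so the calculation cannot be shown with the original numbers. As a minimal sketch of the same rule, E[g(X, Y)] = sum over all value pairs (a, b) of g(a, b) P(X = a, Y = b), the Python below uses a hypothetical joint distribution (the probabilities and the names joint_pmf and expectation are assumptions made for illustration only):

```python
from fractions import Fraction

# Hypothetical joint pmf P(X = a, Y = b); NOT the table from the slide.
joint_pmf = {
    (0, 0): Fraction(1, 4),
    (0, 1): Fraction(1, 4),
    (1, 0): Fraction(1, 8),
    (1, 1): Fraction(3, 8),
}

def expectation(g, pmf):
    """Return E[g(X, Y)] = sum of g(a, b) * P(X = a, Y = b) over all pairs."""
    return sum(g(a, b) * p for (a, b), p in pmf.items())

# With g(x, y) = xy only the pair (1, 1) contributes a nonzero term.
e_xy = expectation(lambda x, y: x * y, joint_pmf)
print(e_xy)  # 3/8
```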
With the rule above we can compute the expectation of a random
variable X with a Bin(n,p) distribution,
which can be viewed as a sum of n independent Ber(p) random variables:
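The resulting formula is missing from the transcript. Each Ber(p) variable has expectation p, so adding n of them gives E[X] = np. The sketch below checks this against a direct computation from the binomial probability mass function; the values of n and p are hypothetical.

```python
from fractions import Fraction
from math import comb

def binomial_mean(n, p):
    """E[X] computed directly from the Bin(n, p) probability mass function."""
    return sum(k * comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1))

n, p = 10, Fraction(3, 10)   # hypothetical parameters
mean_from_pmf = binomial_mean(n, p)
mean_from_sum = n * p        # E[R_1] + ... + E[R_n], with each E[R_i] = p
print(mean_from_pmf, mean_from_sum)  # 3 3
```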
Proof that E[X + Y] = E[X] + E[Y]:
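The derivation on the slide did not survive in the transcript. For discrete X and Y, it can be reconstructed along the following lines, where the symbols a_i and b_j for the possible values of X and Y are notation assumed here:

```latex
\begin{align*}
E[X + Y] &= \sum_i \sum_j (a_i + b_j)\, P(X = a_i, Y = b_j) \\
         &= \sum_i a_i \sum_j P(X = a_i, Y = b_j)
            + \sum_j b_j \sum_i P(X = a_i, Y = b_j) \\
         &= \sum_i a_i\, P(X = a_i) + \sum_j b_j\, P(Y = b_j) \\
         &= E[X] + E[Y].
\end{align*}
```

The third line uses the fact that summing the joint probabilities over all values of one variable gives the marginal probability of the other. No independence assumption is needed.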
Var(X + Y) is generally not equal to Var(X) + Var(Y)
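The gap between the two sides is twice the covariance: Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y). A small sketch with a hypothetical joint distribution of dependent X and Y (the probabilities and helper names are assumptions for illustration) shows the discrepancy:

```python
from fractions import Fraction

# Hypothetical joint pmf of (X, Y), chosen so that X and Y are dependent.
joint_pmf = {
    (0, 0): Fraction(3, 8),
    (0, 1): Fraction(1, 8),
    (1, 0): Fraction(1, 8),
    (1, 1): Fraction(3, 8),
}

def expect(g):
    """E[g(X, Y)] under joint_pmf."""
    return sum(g(x, y) * p for (x, y), p in joint_pmf.items())

def var(g):
    """Var(g(X, Y)) = E[g^2] - (E[g])^2."""
    return expect(lambda x, y: g(x, y) ** 2) - expect(g) ** 2

var_x = var(lambda x, y: x)
var_y = var(lambda x, y: y)
var_sum = var(lambda x, y: x + y)
cov_xy = expect(lambda x, y: x * y) - expect(lambda x, y: x) * expect(lambda x, y: y)

print(var_sum, var_x + var_y)            # 3/4 1/2
print(var_x + var_y + 2 * cov_xy)        # 3/4
```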
If Cov(X,Y) > 0, then X and Y are positively correlated.
If Cov(X,Y) < 0, then X and Y are negatively correlated.
If Cov(X,Y) = 0, then X and Y are uncorrelated.
Gustavo Orellana
Now let X and Y be two independent random variables.
Then E[XY] = E[X]E[Y], so Cov(X, Y) = E[XY] − E[X]E[Y] = 0.
Hence X and Y are uncorrelated.
We proved that if X and Y are two independent random variables,
then they are uncorrelated.
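As an illustration of this result, the sketch below builds a joint distribution as the product of two marginals, which is what independence means for discrete variables, and checks that the covariance vanishes. The marginal distributions are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Hypothetical marginal pmfs for X and Y.
pmf_x = {0: Fraction(1, 3), 1: Fraction(1, 2), 2: Fraction(1, 6)}
pmf_y = {-1: Fraction(1, 4), 3: Fraction(3, 4)}

# Independence: P(X = a, Y = b) = P(X = a) * P(Y = b) for all a, b.
joint_pmf = {(a, b): pmf_x[a] * pmf_y[b] for a, b in product(pmf_x, pmf_y)}

e_x = sum(a * p for a, p in pmf_x.items())
e_y = sum(b * p for b, p in pmf_y.items())
e_xy = sum(a * b * p for (a, b), p in joint_pmf.items())
cov_xy = e_xy - e_x * e_y
print(e_xy, e_x * e_y, cov_xy)  # 5/3 5/3 0
```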
In general, E[XY] is NOT equal to E[X]E[Y].
INDEPENDENT VERSUS UNCORRELATED.
If two random variables X and Y are independent, then
X and Y are uncorrelated.
The converse is not true as we will see on the next slide.
Then Cov(X, Y) = E[XY] − E[X]E[Y] = 0, so X and Y are uncorrelated,
but they are dependent.
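The example that this slide refers to is not in the transcript. A standard illustration of the same point, which may differ from the one on the slide, takes X uniform on {-1, 0, 1} and Y = X^2. The sketch below checks that the covariance is zero while the joint probabilities do not factor into the marginals:

```python
from fractions import Fraction

# Assumed example: X uniform on {-1, 0, 1}, Y = X^2 (fully determined by X).
pmf_x = {-1: Fraction(1, 3), 0: Fraction(1, 3), 1: Fraction(1, 3)}
joint_pmf = {(x, x * x): p for x, p in pmf_x.items()}

e_x = sum(x * p for (x, y), p in joint_pmf.items())
e_y = sum(y * p for (x, y), p in joint_pmf.items())
e_xy = sum(x * y * p for (x, y), p in joint_pmf.items())
cov_xy = e_xy - e_x * e_y
print(cov_xy)  # 0, so X and Y are uncorrelated

# Dependence: compare P(X = 0, Y = 0) with P(X = 0) * P(Y = 0).
p_joint = joint_pmf[(0, 0)]
p_y0 = sum(p for (x, y), p in joint_pmf.items() if y == 0)
p_product = pmf_x[0] * p_y0
print(p_joint, p_product)  # 1/3 1/9, not equal, so X and Y are dependent
```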
The variance of a random variable with a Bin(n,p) distribution:
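The formula itself is missing from the transcript. Viewing X as a sum of n independent Ber(p) variables, each with variance p(1 - p), and using the fact that variances add for uncorrelated variables, gives Var(X) = np(1 - p). The sketch below compares this with a direct computation from the probability mass function, for hypothetical n and p:

```python
from fractions import Fraction
from math import comb

def binomial_variance(n, p):
    """Var(X) = E[X^2] - (E[X])^2, computed from the Bin(n, p) pmf."""
    pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
    mean = sum(k * q for k, q in enumerate(pmf))
    second_moment = sum(k * k * q for k, q in enumerate(pmf))
    return second_moment - mean**2

n, p = 10, Fraction(3, 10)          # hypothetical parameters
var_from_pmf = binomial_variance(n, p)
var_from_sum = n * p * (1 - p)      # n times the Ber(p) variance p(1 - p)
print(var_from_pmf, var_from_sum)   # 21/10 21/10
```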
The covariance changes under a change of units
The covariance Cov(X,Y) may not always be suitable to express the dependence
between X and Y. For this reason, there is a standardized version of the
covariance called the correlation coefficient of X and Y, which remains unaffected
by a change of units and, therefore, is dimensionless.
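A short numerical sketch of this point, using the usual definition of the correlation coefficient, Cov(X, Y) / sqrt(Var(X) Var(Y)). The joint distribution and the meters-to-centimeters conversion are hypothetical, chosen only to show the scaling behavior:

```python
from math import isclose, sqrt

# Hypothetical joint pmf of (X, Y), with X thought of as measured in meters.
joint_pmf = {(0, 0): 0.375, (0, 1): 0.125, (1, 0): 0.125, (1, 1): 0.375}

def expect(g, pmf):
    """E[g(X, Y)] under the given joint pmf."""
    return sum(g(x, y) * p for (x, y), p in pmf.items())

def cov(pmf):
    """Cov(X, Y) = E[(X - E[X]) (Y - E[Y])]."""
    ex = expect(lambda x, y: x, pmf)
    ey = expect(lambda x, y: y, pmf)
    return expect(lambda x, y: (x - ex) * (y - ey), pmf)

def corr(pmf):
    """Correlation coefficient: Cov(X, Y) / sqrt(Var(X) * Var(Y))."""
    ex = expect(lambda x, y: x, pmf)
    ey = expect(lambda x, y: y, pmf)
    var_x = expect(lambda x, y: (x - ex) ** 2, pmf)
    var_y = expect(lambda x, y: (y - ey) ** 2, pmf)
    return cov(pmf) / sqrt(var_x * var_y)

# Change of units: express X in centimeters instead of meters.
rescaled_pmf = {(100 * x, y): p for (x, y), p in joint_pmf.items()}

print(cov(joint_pmf), cov(rescaled_pmf))    # covariance grows by a factor of 100
print(corr(joint_pmf), corr(rescaled_pmf))  # correlation stays the same
```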
The correlation coefficient is also called the Pearson
correlation coefficient.
(from Wikipedia) Examples of scatter diagrams with different
values of correlation coefficient.
(from Wikipedia) Several sets of (x, y) points, with the correlation coefficient
of x and y for each set. Note that the correlation reflects the noisiness
and direction of a linear relationship (top row), but not the slope of that
relationship (middle), nor many aspects of nonlinear relationships (bottom).
N.B.: the figure in the center has a slope of 0, but in that case the correlation
coefficient is undefined because the variance of Y is zero.
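The figure is not reproduced here, but the points in the caption can be checked on small data sets. The sketch below uses a hand-written sample version of the Pearson correlation (the function name pearson and the data are assumptions for illustration) and returns None where the coefficient is undefined:

```python
from math import isclose, sqrt

def pearson(xs, ys):
    """Sample Pearson correlation; returns None when a variance is zero."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None  # undefined: the formula would divide by zero
    return sxy / sqrt(sxx * syy)

xs = [-2, -1, 0, 1, 2]
steep = pearson(xs, [10 * x for x in xs])      # exact line with slope 10
shallow = pearson(xs, [0.1 * x for x in xs])   # exact line with slope 0.1
flat = pearson(xs, [3 for _ in xs])            # slope 0, so Var(Y) = 0
parabola = pearson(xs, [x * x for x in xs])    # y depends on x, but nonlinearly

print(steep, shallow)  # both approximately 1, regardless of slope
print(flat)            # None
print(parabola)        # 0.0
```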