INTRODUCTION - METU | Middle East Technical University


STAT 552
PROBABILITY AND
STATISTICS II
INTRODUCTION
Short review of S551
WHAT IS STATISTICS?
• Statistics is the science of collecting, organizing, and describing data, and drawing conclusions from it. That is, statistics is a way to get information from data. It is the science of uncertainty.
BASIC DEFINITIONS
• POPULATION: The collection of all items of interest in a particular study.
• SAMPLE: A set of data drawn from the population; a subset of the population available for observation.
• PARAMETER: A descriptive measure of the population, e.g., the mean.
• STATISTIC: A descriptive measure of a sample.
• VARIABLE: A characteristic of interest about each element of a population or sample.
STATISTIC
• A statistic (or estimator) is any function of the random variables of a random sample that does not contain any unknown quantity. E.g.,
∑_{i=1}^n Xi, ∏_{i=1}^n Xi, ∑_{i=1}^n Xi / n, min(Xi), max(Xi)
are statistics.
∑_{i=1}^n Xi − μ and ∑_{i=1}^n Xi / σ are NOT, since they involve the unknown parameters μ and σ.
• Any observed or particular value of an
estimator is an estimate.
Sample Space
• The set of all possible outcomes of an
experiment is called a sample space and
denoted by S.
• Determining the outcomes.
– Build an exhaustive list of all possible
outcomes.
– Make sure the listed outcomes are mutually
exclusive.
RANDOM VARIABLES
• Variables whose observed value is determined
by chance
• A r.v. is a function defined on the sample space
S that associates a real number with each
outcome in S.
• Rvs are denoted by uppercase letters, and their
observed values by lowercase letters.
DESCRIPTIVE STATISTICS
• Descriptive statistics involves the
arrangement, summary, and presentation of
data, to enable meaningful interpretation, and to
support decision making.
• Descriptive statistics methods make use of
– graphical techniques
– numerical descriptive measures.
Types of data – examples
• Quantitative
– Continuous: blood pressure, height, weight, age
– Discrete: number of children; number of attacks of asthma per week
• Categorical (Qualitative)
– Ordinal (ordered categories): grade of breast cancer; better/same/worse; disagree/neutral/agree
– Nominal (unordered categories): sex (male/female); alive or dead; blood group (O, A, B, AB)
PROBABILITY
(Diagram: probability leads from the POPULATION to the SAMPLE; STATISTICAL INFERENCE leads from the SAMPLE back to the POPULATION.)
• PROBABILITY: A numerical value
expressing the degree of uncertainty
regarding the occurrence of an event. A
measure of uncertainty.
• STATISTICAL INFERENCE: The science of drawing inferences about the population based only on a part of it, the sample.
Probability
• A probability function P maps the sample space S (the domain) into [0, 1] (the range):
P : S → [0, 1]
THE CALCULUS OF PROBABILITIES
• If P is a probability function and A is any set, then
a. P(∅) = 0
b. P(A) ≤ 1
c. P(Aᶜ) = 1 − P(A)
ODDS
• The odds of an event A is defined by
odds(A) = P(A)/P(Aᶜ) = P(A)/(1 − P(A))
• It tells us how much more likely the event A is to occur than not to occur.
ODDS RATIO
• OR is the ratio of two odds.
• Useful for comparing the odds under two
different conditions or for two different
groups, e.g. odds for males versus
females.
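As a quick numerical sketch of odds and the odds ratio (the 40%/25% prevalence figures below are made up for illustration):

```python
def odds(p):
    """Odds of an event with probability p: P(A) / (1 - P(A))."""
    return p / (1.0 - p)

# Hypothetical example: 40% of group A and 25% of group B have some trait.
odds_a = odds(0.40)            # 0.40 / 0.60 = 2/3
odds_b = odds(0.25)            # 0.25 / 0.75 = 1/3
odds_ratio = odds_a / odds_b   # OR = 2: the odds are twice as large in group A
print(odds_ratio)
```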
CONDITIONAL PROBABILITY
• (Marginal) Probability: P(A): How likely is it
that an event A will occur when an
experiment is performed?
• Conditional Probability: P(A|B): How will
the probability of event A be affected by
the knowledge of the occurrence or
nonoccurrence of event B?
• If two events are independent, then
P(A|B)=P(A)
CONDITIONAL PROBABILITY
P(A | B) = P(A ∩ B)/P(B), with 0 ≤ P(A | B) ≤ 1, if P(B) > 0
P(A ∩ B) = P(A)P(B | A) = P(B)P(A | B)
P(A1 ∩ A2 ∩ … ∩ An) = P(A1)P(A2 | A1)P(A3 | A1, A2) … P(An | A1, …, An−1)
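The multiplication (chain) rule above can be illustrated with a standard deck of cards:

```python
from fractions import Fraction

# Chain rule: P(A1 ∩ A2) = P(A1) P(A2 | A1).
# Example: draw two cards without replacement; probability both are aces.
p_first_ace = Fraction(4, 52)
p_second_ace_given_first = Fraction(3, 51)   # one ace already removed
p_both_aces = p_first_ace * p_second_ace_given_first
print(p_both_aces)  # 1/221
```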
BAYES THEOREM
• Suppose you have P(B|A), but need
P(A|B).
P(A  B) P(B | A)P(A)
P(A | B) 

for P(B)  0
P(B)
P(B)
17
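A minimal sketch of Bayes' theorem combined with the law of total probability, using hypothetical test characteristics (99% sensitivity, 95% specificity, 1% prevalence; all numbers are made up for illustration):

```python
# Bayes' theorem with hypothetical numbers: a diagnostic test with 99%
# sensitivity, 95% specificity, and 1% disease prevalence.
p_disease = 0.01
p_pos_given_disease = 0.99     # sensitivity: P(+ | D)
p_pos_given_healthy = 0.05     # false-positive rate: 1 - specificity

# Total probability: P(+) = P(+|D)P(D) + P(+|D^c)P(D^c)
p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)

# Bayes: P(D | +) = P(+|D) P(D) / P(+)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_disease_given_pos, 4))  # about 1/6: most positives are false
```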
Independence
• A and B are independent iff
– P(A|B)=P(A) or P(B|A)=P(B)
– P(AB)=P(A)P(B)
• A1, A2, …, An are mutually independent iff
P(∩_{i∈J} Ai) = ∏_{i∈J} P(Ai) for every subset J of {1, 2, …, n}.
E.g. for n = 3, A1, A2, A3 are mutually independent iff
P(A1 ∩ A2 ∩ A3) = P(A1)P(A2)P(A3) and P(A1 ∩ A2) = P(A1)P(A2) and
P(A1 ∩ A3) = P(A1)P(A3) and P(A2 ∩ A3) = P(A2)P(A3).
DISCRETE RANDOM
VARIABLES
• If the set of all possible values of a r.v. X is a countable set, then X is called a discrete r.v.
• The function f(x) = P(X = x) for x = x1, x2, … that assigns the probability to each value x is called the probability density function (p.d.f.) or probability mass function (p.m.f.)
Example
• Discrete Uniform distribution:
P(X = x) = 1/N; x = 1, 2, …, N; N = 1, 2, …
• Example: throw a fair die.
P(X=1) = … = P(X=6) = 1/6
CONTINUOUS RANDOM
VARIABLES
• When sample space is uncountable
(continuous)
• Example: Continuous Uniform(a,b)
f(x) = 1/(b − a), a ≤ x ≤ b.
CUMULATIVE DISTRIBUTION FUNCTION (C.D.F.)
• The CDF of a r.v. X is defined as F(x) = P(X ≤ x).
JOINT DISCRETE
DISTRIBUTIONS
• A function f(x1, x2,…, xk) is the joint pmf for
some vector valued rv X=(X1, X2,…,Xk) iff
the following properties are satisfied:
f(x1, x2, …, xk) ≥ 0 for all (x1, x2, …, xk), and
∑_{x1} … ∑_{xk} f(x1, x2, …, xk) = 1.
MARGINAL DISCRETE
DISTRIBUTIONS
• If the pair (X1,X2) of discrete random
variables has the joint pmf f(x1,x2), then the
marginal pmfs of X1 and X2 are
f1  x1    f  x1 , x2  and f 2  x2    f  x1 , x2 
x2
x1
24
CONDITIONAL
DISTRIBUTIONS
• If X1 and X2 are discrete or continuous
random variables with joint pdf f(x1,x2),
then the conditional pdf of X2 given X1=x1
is defined by
f x1, x 2 
f x 2 x1  
, x1 such that f x1   0, 0 elsewhere.
f x1 
• For independent rvs,
f  x2 x1   f  x2 .
f  x1 x2   f  x1 .
25
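The joint, marginal, and conditional pmfs above can be sketched numerically; the joint table below is a made-up example:

```python
import numpy as np

# A hypothetical joint pmf f(x1, x2) for x1 in {0,1}, x2 in {0,1,2},
# stored as a table with rows indexed by x1 and columns by x2.
joint = np.array([[0.10, 0.20, 0.10],
                  [0.15, 0.25, 0.20]])
assert np.isclose(joint.sum(), 1.0)     # a pmf must sum to 1

# Marginals: sum the joint pmf over the other variable.
f1 = joint.sum(axis=1)   # marginal pmf of X1: sum over x2
f2 = joint.sum(axis=0)   # marginal pmf of X2: sum over x1
print(f1, f2)

# Conditional pmf of X2 given X1 = x1: f(x2 | x1) = f(x1, x2) / f1(x1).
cond_x2_given_x1 = joint / f1[:, None]
assert np.allclose(cond_x2_given_x1.sum(axis=1), 1.0)  # each row is a pmf
```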
EXPECTED VALUES
Let X be a rv with pdf fX(x) and g(X) be a
function of X. Then, the expected value (or
the mean or the mathematical expectation)
of g(X)
 g  x  f X  x  , if X is discrete
 x
E  g  X    
  g  x  f X  x  dx, if X is continuous

providing the sum or the integral exists, i.e.,
<E[g(X)]<.
26
EXPECTED VALUES
• E[g(X)] is finite if E[| g(X) |] is finite.
 g  x  f X  x < , if X is discrete
 x
E  g  X     
  g  x  f X  x  dx< , if X is continuous

27
Laws of Expected Value and Variance
Let X be a rv and c be a constant.
Laws of Expected Value
• E(c) = c
• E(X + c) = E(X) + c
• E(cX) = cE(X)
Laws of Variance
• V(c) = 0
• V(X + c) = V(X)
• V(cX) = c²V(X)
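These laws can be checked exactly for a small discrete pmf (the values and probabilities below are arbitrary):

```python
import numpy as np

# Check E(cX) = cE(X), V(cX) = c^2 V(X), and V(X + c) = V(X)
# for a small discrete pmf.
x = np.array([1.0, 2.0, 3.0])
p = np.array([0.2, 0.5, 0.3])
c = 4.0

def mean(vals, probs):
    return float(np.sum(vals * probs))

def var(vals, probs):
    m = mean(vals, probs)
    return float(np.sum((vals - m) ** 2 * probs))

assert np.isclose(mean(c * x, p), c * mean(x, p))
assert np.isclose(var(c * x, p), c ** 2 * var(x, p))
assert np.isclose(var(x + c, p), var(x, p))   # adding a constant leaves V unchanged
print(mean(x, p), var(x, p))
```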
EXPECTED VALUE
E   ai X i    ai E  X i .
 i 1
 i 1
k
k
If X and Y are independent,
Eg  X hY   Eg  X EhY 
The covariance of X and Y is defined as
CovX, Y   EX  EX Y  EY 
 E(XY)  E(X)E(Y)
29
EXPECTED VALUE
If X and Y are independent,
Cov(X, Y) = 0.
The converse is not true in general! It holds under the normal distribution:
if (X, Y) ~ Normal, then X and Y are independent iff Cov(X, Y) = 0.
EXPECTED VALUE
Var X1  X 2   Var X1   Var X 2   2Cov X1 , X 2 
If X1 and X2 are independent,
Var X1  X 2   Var X1   Var X 2 
31
CONDITIONAL EXPECTATION
AND VARIANCE
E(Y | x) = ∑_y y f(y | x), if X and Y are discrete;
E(Y | x) = ∫_{−∞}^{∞} y f(y | x) dy, if X and Y are continuous.
Var(Y | x) = E(Y² | x) − [E(Y | x)]²
CONDITIONAL EXPECTATION
AND VARIANCE
E E Y X   E Y 
Var (Y)  EX (Var (Y | X))  VarX (E(Y | X))
(EVVE rule)
Proofs available in Casella & Berger (1990), pgs. 154 &
158
33
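The EVVE rule can be verified exactly for a simple two-component mixture (the mixture weights, conditional means, and conditional variances below are arbitrary):

```python
import numpy as np

# EVVE / law of total variance, checked exactly for a two-component mixture:
# X = 0 with prob 0.3 and X = 1 with prob 0.7; given X = x, Y has mean m[x], var v[x].
p = np.array([0.3, 0.7])
m = np.array([1.0, 5.0])    # conditional means     E(Y | X = x)
v = np.array([2.0, 0.5])    # conditional variances Var(Y | X = x)

# Direct moments of the mixture: E(Y) = E[m(X)], E(Y^2) = E[v(X) + m(X)^2].
ey = np.sum(p * m)
ey2 = np.sum(p * (v + m ** 2))
var_y = ey2 - ey ** 2

# EVVE: Var(Y) = E[Var(Y|X)] + Var[E(Y|X)]
e_var = np.sum(p * v)                   # E_X[Var(Y|X)]
var_e = np.sum(p * (m - ey) ** 2)       # Var_X[E(Y|X)]
assert np.isclose(var_y, e_var + var_e)
print(var_y)
```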
SOME MATHEMATICAL
EXPECTATIONS
• Population Mean: μ = E(X)
• Population Variance:
σ² = Var(X) = E(X − μ)² = E(X²) − μ² ≥ 0
(measure of the deviation from the population mean)
• Population Standard Deviation: σ = √σ² ≥ 0
• Moments:
μk* = E(X^k): the k-th moment
μk = E[(X − μ)^k]: the k-th central moment
The Variance
• This measure reflects the dispersion of all the observations.
• The variance of a population of size N, x1, x2, …, xN, whose mean is μ is defined as
σ² = ∑_{i=1}^N (xi − μ)² / N
• The variance of a sample of n observations x1, x2, …, xn whose mean is x̄ is defined as
s² = ∑_{i=1}^n (xi − x̄)² / (n − 1)
• Shortcut formula:
s² = [∑_{i=1}^n xi² − (∑_{i=1}^n xi)²/n] / (n − 1)
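The defining and shortcut formulas for s² can be checked against each other (and against NumPy's ddof=1 sample variance) on an arbitrary sample:

```python
import numpy as np

# Sample variance two ways: the definition and the shortcut formula.
x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
n = len(x)

s2_def = np.sum((x - x.mean()) ** 2) / (n - 1)
s2_short = (np.sum(x ** 2) - np.sum(x) ** 2 / n) / (n - 1)

assert np.isclose(s2_def, s2_short)
assert np.isclose(s2_def, x.var(ddof=1))   # NumPy's unbiased sample variance
print(s2_def)
```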
MOMENT GENERATING
FUNCTION
The m.g.f. of random variable X is defined as
M_X(t) = E(e^{tX}) = ∫_{all x} e^{tx} f(x) dx, if X is continuous;
M_X(t) = E(e^{tX}) = ∑_{all x} e^{tx} f(x), if X is discrete;
for t ∈ (−h, h) for some h > 0.
Properties of m.g.f.
• M(0) = E[1] = 1
• If a r.v. X has m.g.f. M(t), then Y = aX + b has m.g.f. e^{bt}M(at)
• E(X^k) = M^(k)(0), where M^(k) is the k-th derivative.
• The m.g.f. does not always exist (e.g. Cauchy distribution).
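A sketch of the moments-from-m.g.f. property, using the known m.g.f. of a Normal(μ, σ²), M(t) = exp(μt + σ²t²/2), with derivatives at 0 approximated by finite differences (the values of μ, σ, and the step h are arbitrary choices):

```python
import math

# M.g.f. of a Normal(mu, sigma^2): M(t) = exp(mu*t + sigma^2 * t^2 / 2).
# Check E(X) = M'(0) and E(X^2) = M''(0) by numerical differentiation.
mu, sigma = 2.0, 3.0

def M(t):
    return math.exp(mu * t + 0.5 * sigma ** 2 * t ** 2)

h = 1e-4
m1 = (M(h) - M(-h)) / (2 * h)              # central difference ~ M'(0)
m2 = (M(h) - 2 * M(0.0) + M(-h)) / h ** 2  # second difference ~ M''(0)

assert abs(m1 - mu) < 1e-4                  # E(X) = mu
assert abs(m2 - (mu**2 + sigma**2)) < 1e-2  # E(X^2) = mu^2 + sigma^2
print(m1, m2)
```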
CHARACTERISTIC FUNCTION
The c.h.f. of random variable X is defined as
φ_X(t) = E(e^{itX}) = ∫_{all x} e^{itx} f(x) dx, if X is continuous;
φ_X(t) = E(e^{itX}) = ∑_{all x} e^{itx} f(x), if X is discrete;
for all real numbers t, where i = √(−1), i.e. i² = −1.
The c.h.f. always exists.
Uniqueness
Theorem:
1. If two r.v.s have m.g.f.s that exist and are equal, then they have the same distribution.
2. If two r.v.s have the same distribution, then they have the same m.g.f. (if it exists).
Similar statements are true for the c.h.f.
SOME DISCRETE PROBABILITY
DISTRIBUTIONS
• Please review: Degenerate, Uniform,
Bernoulli, Binomial, Poisson, Negative
Binomial, Geometric, Hypergeometric,
Extended Hypergeometric, Multinomial
SOME CONTINUOUS
PROBABILITY DISTRIBUTIONS
• Please review: Uniform, Normal (Gaussian), Exponential, Gamma, Chi-Square, Beta, Weibull, Cauchy, Log-Normal, t, F Distributions
TRANSFORMATION OF RANDOM
VARIABLES
• If X is an rv with pdf f(x), then Y=g(X) is also an
rv. What is the pdf of Y?
• If X is a discrete rv, substitute x = g⁻¹(y) wherever x appears in the pmf f(x).
• If X is a continuous rv, do the same, but also multiply by the absolute value of the Jacobian.
• If the transformation is not 1-to-1, divide the region into sub-regions on which it is 1-to-1.
CDF method
• Example: Let F_X(x) = 1 − e^{−2x} for x ≥ 0. Consider Y = e^X. What is the p.d.f. of Y?
• Solution:
F_Y(y) = P(Y ≤ y) = P(e^X ≤ y) = P(X ≤ ln y) = F_X(ln y) = 1 − y⁻² for y ≥ 1
f_Y(y) = (d/dy) F_Y(y) = 2y⁻³ for y ≥ 1
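The result can be checked by simulation: X with cdf 1 − e^{−2x} is Exponential with rate 2, so the empirical cdf of Y = e^X should match 1 − y⁻² (sample size and seed are arbitrary):

```python
import numpy as np

# Monte Carlo check of the CDF-method example: if F_X(x) = 1 - exp(-2x), x >= 0,
# i.e. X ~ Exponential(rate 2), then Y = e^X has F_Y(y) = 1 - y**-2 for y >= 1.
rng = np.random.default_rng(0)
x = rng.exponential(scale=0.5, size=200_000)   # rate 2  <=>  scale 1/2
y = np.exp(x)

for y0 in (1.5, 2.0, 5.0):
    empirical = np.mean(y <= y0)      # empirical cdf at y0
    theoretical = 1 - y0 ** -2
    assert abs(empirical - theoretical) < 0.01
print("empirical CDF matches 1 - y**-2")
```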
M.G.F. Method
• If X1, X2, …, Xn are independent random variables with MGFs M_{Xi}(t), then the MGF of Y = ∑_{i=1}^n Xi is
M_Y(t) = M_{X1}(t) … M_{Xn}(t)
THE PROBABILITY INTEGRAL
TRANSFORMATION
• Let X have continuous cdf F_X(x) and define the rv Y as Y = F_X(X). Then
Y ~ Uniform(0,1), that is,
P(Y ≤ y) = y, 0 < y < 1.
• This is very commonly used, especially in random number generation procedures.
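This is the basis of inverse-transform sampling; a minimal sketch for generating Exponential(λ) draws from uniforms (λ, sample size, and seed are arbitrary choices):

```python
import numpy as np

# Inverse-transform sampling via the probability integral transformation:
# if U ~ Uniform(0,1), then X = F^{-1}(U) has cdf F. For Exponential(rate lam),
# F(x) = 1 - exp(-lam*x), so F^{-1}(u) = -ln(1 - u) / lam.
rng = np.random.default_rng(1)
lam = 2.0
u = rng.uniform(size=200_000)
x = -np.log(1 - u) / lam

assert abs(x.mean() - 1 / lam) < 0.01       # E(X) = 1/lam
assert abs(x.var() - 1 / lam ** 2) < 0.01   # Var(X) = 1/lam^2
print(x.mean(), x.var())
```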
SAMPLING DISTRIBUTION
• A statistic is also a random variable. Its
distribution depends on the distribution of
the random sample and the form of the
function Y=T(X1, X2,…,Xn). The probability
distribution of a statistic Y is called the
sampling distribution of Y.
SAMPLING FROM THE NORMAL
DISTRIBUTION
Properties of the Sample Mean and
Sample Variance
• Let X1, X2, …, Xn be a r.s. of size n from a N(μ, σ²) distribution. Then,
a) X̄ and S² are independent rvs.
b) X̄ ~ N(μ, σ²/n)
c) (n − 1)S²/σ² ~ χ²_{n−1}
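A simulation sketch of properties (b) and (c) above (the values of μ, σ, n, and the number of replications are arbitrary):

```python
import numpy as np

# Simulate many random samples of size n from N(mu, sigma^2) and check:
# Xbar ~ N(mu, sigma^2/n), and (n-1)S^2/sigma^2 ~ chi-square(n-1) (mean n-1).
rng = np.random.default_rng(2)
mu, sigma, n, reps = 10.0, 2.0, 5, 100_000

samples = rng.normal(mu, sigma, size=(reps, n))
xbar = samples.mean(axis=1)
s2 = samples.var(axis=1, ddof=1)           # sample variance per replication

assert abs(xbar.mean() - mu) < 0.02                # E(Xbar) = mu
assert abs(xbar.var() - sigma ** 2 / n) < 0.02     # Var(Xbar) = sigma^2/n
q = (n - 1) * s2 / sigma ** 2
assert abs(q.mean() - (n - 1)) < 0.05              # chi-square(n-1) has mean n-1
print(xbar.mean(), xbar.var(), q.mean())
```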
SAMPLING FROM THE NORMAL
DISTRIBUTION
If the population variance is unknown, we use the sample variance:
(X̄ − μ)/(S/√n) ~ t_{n−1}
SAMPLING FROM THE NORMAL
DISTRIBUTION
• The F distribution allows us to compare the variances by giving the distribution of
(S_X²/σ_X²) / (S_Y²/σ_Y²) ~ F_{n−1, m−1}
• If X ~ F_{p,q}, then 1/X ~ F_{q,p}.
• If X ~ t_q, then X² ~ F_{1,q}.
CENTRAL LIMIT THEOREM
If a random sample is drawn from any population, the sampling distribution of the sample mean X̄ is approximately normal for a sufficiently large sample size. The larger the sample size, the more closely the sampling distribution of X̄ will resemble a normal distribution.
(Diagram: a random sample (X1, X2, …, Xn) from the population distribution of X yields the sample mean X̄, whose distribution approaches a normal as n → ∞.)
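A simulation sketch of the CLT, using a skewed Exponential(1) population (sample size n = 50 and the number of replications are arbitrary choices):

```python
import numpy as np

# Central limit theorem sketch: means of samples from a skewed exponential
# population look approximately normal once n is reasonably large.
rng = np.random.default_rng(3)
n, reps = 50, 100_000

pop_mean, pop_sd = 1.0, 1.0          # Exponential(1) has mean 1 and sd 1
xbar = rng.exponential(1.0, size=(reps, n)).mean(axis=1)

# Standardize: Z = (Xbar - mu) / (sigma / sqrt(n)) should be ~ N(0, 1).
z = (xbar - pop_mean) / (pop_sd / np.sqrt(n))
assert abs(z.mean()) < 0.02
assert abs(z.std() - 1.0) < 0.02
# P(Z <= 1) should be close to the standard normal value, about 0.8413.
assert abs(np.mean(z <= 1.0) - 0.8413) < 0.01
print(z.mean(), z.std(), np.mean(z <= 1.0))
```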
Sampling Distribution of the Sample Mean
μ_X̄ = μ
σ²_X̄ = σ²/n  or  σ_X̄ = σ/√n
If X is normal, X̄ is normal:
X̄ ~ N(μ, σ²/n)  ⇒  Z = (X̄ − μ)/(σ/√n) ~ N(0, 1)
If X is non-normal, X̄ is approximately normally distributed for sample sizes greater than or equal to 30.