GEOSTATISTICS - RSLAB-NTU


STATISTICS
Random Variables and
Probability Distributions
Professor Ke-Sheng Cheng
Department of Bioenvironmental Systems Engineering
National Taiwan University
Definition of random variable (RV)
• For a given probability space $(\Omega, \mathcal{A}, P[\cdot])$, a random variable, denoted by $X$ or $X(\cdot)$, is a function with domain $\Omega$ and counterdomain the real line. The function $X(\cdot)$ must be such that the set $A_r$, defined by $A_r = \{\omega : X(\omega) \le r\}$, belongs to $\mathcal{A}$ for every real number r.
• Unlike the probability which is defined on the
event space, a random variable is defined on
the sample space.
7/17/2015
Laboratory for Remote Sensing Hydrology and Spatial Modeling,
Dept of Bioenvironmental Systems Engineering, National Taiwan Univ.
2
[Diagram: random experiment, sample space, event space, probability space]
$P[\{\omega_1, \omega_2\}]$ is defined, whereas $X(\{\omega_1, \omega_2\})$ is not defined.
$$P[X \le r] = P[A_r] = P[\{\omega : X(\omega) \le r\}]$$
$$P[\{\omega_1, \omega_2\}] = P[X = X(\omega_1) \text{ or } X = X(\omega_2)]$$
Cumulative distribution function
(CDF)
• The cumulative distribution function of a random variable X, denoted by $F_X(\cdot)$, is defined to be
$$F_X(x) = P[X \le x] = P[\{\omega : X(\omega) \le x\}], \quad \forall x \in R.$$
• Consider the experiment of tossing two fair coins. Let random variable X denote the number of heads. The CDF of X is
$$F_X(x) = \begin{cases} 0, & x < 0 \\ 0.25, & 0 \le x < 1 \\ 0.75, & 1 \le x < 2 \\ 1, & 2 \le x \end{cases}$$
or, written with indicator functions,
$$F_X(x) = 0.25\, I_{[0,1)}(x) + 0.75\, I_{[1,2)}(x) + I_{[2,\infty)}(x)$$
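The two-coin CDF can be evaluated directly from its indicator-function form. The following is a minimal sketch; the function names `indicator` and `cdf_heads` are illustrative choices, not part of the lecture.

```python
def indicator(lower, upper, x):
    """Indicator of the half-open interval [lower, upper)."""
    return 1.0 if lower <= x < upper else 0.0


def cdf_heads(x):
    """CDF of X = number of heads in two tosses of a fair coin."""
    inf = float("inf")
    return (0.25 * indicator(0, 1, x)
            + 0.75 * indicator(1, 2, x)
            + indicator(2, inf, x))


if __name__ == "__main__":
    # The CDF is a right-continuous step function with jumps at 0, 1 and 2.
    for x in (-1, 0, 0.5, 1, 1.9, 2, 10):
        print(x, cdf_heads(x))
```

Because the intervals are closed on the left, the value at each jump point already includes the jump, which is the right-continuity property of a CDF.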
Indicator function or indicator
variable
• Let $\Omega$ be any space with points $\omega$ and A any subset of $\Omega$. The indicator function of A, denoted by $I_A(\cdot)$, is the function with domain $\Omega$ and counterdomain equal to the set consisting of the two real numbers 0 and 1, defined by
$$I_A(\omega) = \begin{cases} 1, & \text{if } \omega \in A \\ 0, & \text{if } \omega \notin A \end{cases}$$
Discrete random variables
• A random variable X will be defined to be discrete if
the range of X is countable.
• If X is a discrete random variable with values $x_1, x_2, \ldots, x_n, \ldots$, then the function denoted by $f_X(\cdot)$ and defined by
$$f_X(x) = \begin{cases} P[X = x_j], & \text{if } x = x_j,\ j = 1, 2, \ldots, n, \ldots \\ 0, & \text{if } x \ne x_j \end{cases}$$
is defined to be the discrete density function of X.
Continuous random variables
• A random variable X will be defined to be continuous if there exists a function $f_X(\cdot)$ such that $F_X(x) = \int_{-\infty}^{x} f_X(u)\,du$ for every real number x.
• The function $f_X(\cdot)$ is called the probability density function of X.
Properties of a CDF
• $F_X(-\infty) = \lim_{x \to -\infty} F_X(x) = 0$
• $F_X(+\infty) = \lim_{x \to +\infty} F_X(x) = 1$
• $F_X(a) \le F_X(b)$ for $a < b$
• $F_X(\cdot)$ is continuous from the right, i.e. $\lim_{0 < h \to 0} F_X(x + h) = F_X(x)$
Properties of a PDF
• $f_X(x) \ge 0, \quad \forall x \in R$
• $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$
Example 1
• Determine which of the following are valid distribution functions:
$$F_X(x) = \begin{cases} 1 - e^{-2x}/2, & x \ge 0 \\ e^{2x}/2, & x < 0 \end{cases}$$
$$F_X(x) = \frac{x}{a}\,[u(x - a) - u(x - 2a)], \quad \text{where } u(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases}$$
Example 2
• Determine the real constant a, for arbitrary real constants m and 0 < b, such that
$$f_X(x) = a e^{-|x - m|/b}, \quad x \in R$$
is a valid density function.
• Function $f_X(x)$ is symmetric about m, so
$$\int_{-\infty}^{\infty} f_X(x)\,dx = 2\int_{m}^{\infty} a e^{-(x - m)/b}\,dx = 2ab\int_{0}^{\infty} e^{-y}\,dy = 2ab = 1,$$
which gives $a = 1/(2b)$.
Characterizing random variables
• Cumulative distribution function
• Probability density function
– Expectation (expected value)
– Variance
– Moments
– Quantile
– Median
– Mode
Expectation of a random variable
• The expectation (or mean, expected value) of X, denoted by $\mu_X$ or $E(X)$, is defined by:
$$E[X] = \sum_j x_j f_X(x_j) \ \text{if X is discrete}, \qquad E[X] = \int_{-\infty}^{\infty} x f_X(x)\,dx \ \text{if X is continuous}.$$
Rules for expectation
• Let $X$ and $X_i$ be random variables and c be any real constant.
$X(t) = 25\sin(\omega t)$
$E[X(t)] = ?$
Variance of a random variable
• $\sigma_X = \sqrt{Var(X)} \ge 0$ is called the standard deviation of X.
$$Var[X] = \sigma_X^2 = E[(X - \mu_X)^2] = E[X^2] - (E[X])^2$$
• Variance characterizes the dispersion of data
with respect to the mean. Thus, shifting a
density function does not change its variance.
Rules for variance
• Two random variables are said to be
independent if knowledge of the value
assumed by one gives no clue to the value
assumed by the other.
• Events A and B are defined to be independent if and only if
$$P[AB] = P[A \cap B] = P[A]\,P[B]$$
Moments and central moments of a
random variable
Properties of moments
Quantile
• The qth quantile of a random variable X, denoted by $\xi_q$, is defined as the smallest number $\xi$ satisfying $F_X(\xi) \ge q$.
[Figure: discrete uniform example]
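For a discrete random variable, the "smallest number satisfying $F_X(\xi) \ge q$" can be found by accumulating probabilities in increasing order of the values. The sketch below assumes the distribution is given as two parallel lists; the function name `quantile` and the round-off tolerance are illustrative choices.

```python
def quantile(values, probs, q):
    """Smallest value xi with F_X(xi) >= q for a discrete random variable."""
    cumulative = 0.0
    for value, prob in sorted(zip(values, probs)):
        cumulative += prob
        # Small tolerance so that floating-point round-off in the running
        # sum does not push an exact quantile to the next value.
        if cumulative >= q - 1e-12:
            return value
    return max(values)


if __name__ == "__main__":
    die_values = list(range(1, 7))
    die_probs = [1 / 6] * 6
    print(quantile(die_values, die_probs, 0.5))   # median of a fair die
    print(quantile([0, 1, 2], [0.25, 0.5, 0.25], 0.5))  # two-coin example
```

With this definition the median of a fair die is 3, because $F_X(3) = 0.5$ already reaches q = 0.5.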
Median and mode
• The median of a random variable is the 0.5th quantile, or $\xi_{0.5}$.
• The mode of a random variable X is defined as the value u at which $f_X(u)$ is the maximum of $f_X(\cdot)$.
Note: For a positively skewed distribution with only one mode, the mean is typically the highest estimate of central tendency and the mode the lowest. For a negatively skewed distribution with only one mode, the mean is typically the lowest estimate of central tendency and the mode the highest. In either case the median usually falls between the mean and the mode. These orderings are rules of thumb and can fail for some distributions.
Moment generating function
• The moment generating function (MGF) of a random variable X, denoted by $m_X(t)$, is defined as
$$m_X(t) = E[e^{tX}],$$
provided the expectation exists for every t in some interval $-h < t < h$, $h > 0$.
Usage of MGF
• The MGF can be used to express moments in terms of PDF parameters, and such expressions can in turn be used to express the mean, variance, coefficient of skewness, etc. in terms of PDF parameters.
• Random variables with the same MGF have the same probability distribution.
• The moment generating function of a sum of
independent random variables is the product
of the moment generating functions of
individual random variables.
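The product rule can be checked numerically on a simple case: the sum of n independent Bernoulli(p) variables is binomial(n, p), so the n-th power of the Bernoulli MGF $pe^t + q$ should equal $E[e^{tX}]$ computed directly from the binomial probabilities. The sketch below uses the Bernoulli and binomial formulas stated in the sections on those distributions; the function names are illustrative.

```python
import math


def mgf_bernoulli(t, p):
    """MGF of a Bernoulli(p) random variable: p*e^t + q."""
    return p * math.exp(t) + (1 - p)


def mgf_binomial_from_pmf(t, n, p):
    """E[exp(tX)] for X ~ binomial(n, p), summed directly over the pmf."""
    return sum(
        math.comb(n, x) * p**x * (1 - p) ** (n - x) * math.exp(t * x)
        for x in range(n + 1)
    )


if __name__ == "__main__":
    t, n, p = 0.3, 5, 0.4
    product = mgf_bernoulli(t, p) ** n      # product of n identical MGFs
    direct = mgf_binomial_from_pmf(t, n, p)  # MGF of the sum
    print(product, direct)
```

The two numbers agree up to floating-point round-off, which is the binomial MGF $(q + pe^t)^n$.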
Expected value of a function of a random
variable
• If Y = g(X),
$$E[g(X)] = E[Y] = \int_{-\infty}^{\infty} g(x) f_X(x)\,dx = \int_{-\infty}^{\infty} y f_Y(y)\,dy$$
$$Var[X] = E[(X - \mu_X)^2] = \int_{-\infty}^{\infty} (x - \mu_X)^2 f_X(x)\,dx$$
[Figure: curve of Y = g(X) in the X-Y plane; a level y on the Y axis corresponds to the points x1, x2 and x3 on the X axis]
Theorem
Chebyshev Inequality
• The Chebyshev inequality gives a bound,
which does not depend on the distribution of X,
for the probability of particular events
described in terms of a random variable and its
mean and variance.
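In its usual form the Chebyshev inequality reads $P[|X - \mu_X| \ge k\sigma_X] \le 1/k^2$ for any $k > 0$. The sketch below compares the exact tail probability with this bound for a fair die; the choice of distribution and the function name `chebyshev_check` are assumptions made only for illustration.

```python
import math


def chebyshev_check(values, probs, k):
    """Return (exact tail probability, Chebyshev bound 1/k^2)."""
    mean = sum(v * p for v, p in zip(values, probs))
    var = sum((v - mean) ** 2 * p for v, p in zip(values, probs))
    sd = math.sqrt(var)
    # Exact probability that X lies at least k standard deviations from its mean.
    tail = sum(p for v, p in zip(values, probs) if abs(v - mean) >= k * sd)
    return tail, 1.0 / k**2


if __name__ == "__main__":
    die_values = [1, 2, 3, 4, 5, 6]
    die_probs = [1 / 6] * 6
    for k in (1.0, 1.2, 1.5, 2.0):
        tail, bound = chebyshev_check(die_values, die_probs, k)
        print(k, tail, bound)
```

The bound holds for every k but is often loose, which is the price of not using any information about the distribution beyond its mean and variance.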
Probability density functions of discrete random variables
• Discrete uniform distribution
• Bernoulli distribution
• Binomial distribution
• Negative binomial distribution
• Geometric distribution
• Hypergeometric distribution
• Poisson distribution
Discrete uniform distribution
$$f_X(x; N) = \begin{cases} 1/N, & x = 1, 2, \ldots, N \\ 0, & \text{otherwise} \end{cases} = \frac{1}{N} I_{\{1, 2, \ldots, N\}}(x)$$
N ranges over the positive integers.
$$E[X] = (N + 1)/2, \quad Var[X] = (N^2 - 1)/12, \quad m_X(t) = \sum_{j=1}^{N} e^{jt}\frac{1}{N}$$
Bernoulli distribution
$$f_X(x; p) = \begin{cases} p^x (1 - p)^{1 - x}, & x = 0 \text{ or } 1 \\ 0, & \text{otherwise} \end{cases} = p^x (1 - p)^{1 - x} I_{\{0, 1\}}(x), \quad 0 \le p \le 1$$
1-p is often denoted by q.
$$E[X] = p, \quad Var[X] = pq, \quad m_X(t) = pe^t + q$$
Binomial distribution
• Binomial distribution represents the probability of having exactly x successes in n independent and identical Bernoulli trials.
$$f_X(x; n, p) = \begin{cases} \binom{n}{x} p^x (1 - p)^{n - x}, & x = 0, 1, \ldots, n \\ 0, & \text{otherwise} \end{cases} = \binom{n}{x} p^x (1 - p)^{n - x} I_{\{0, 1, \ldots, n\}}(x)$$
$$E[X] = np, \quad Var[X] = np(1 - p) = npq, \quad m_X(t) = (q + pe^t)^n$$
Negative binomial distribution
• Negative binomial distribution represents the probability of achieving the r-th success on the x-th of a sequence of independent and identical Bernoulli trials.
• Unlike the binomial distribution, for which the number of trials is fixed, here the number of successes is fixed and the number of trials varies from experiment to experiment. The negative binomial random variable represents the number of trials needed to achieve the r-th success.
$$f_X(x; r, p) = \binom{x - 1}{r - 1} p^r (1 - p)^{x - r}, \quad r = 1, 2, \ldots;\ x = r, r + 1, \ldots$$
$$E[X] = r/p, \quad Var[X] = rq/p^2, \quad m_X(t) = (pe^t)^r / (1 - qe^t)^r$$
Geometric distribution
• Geometric distribution represents the probability of obtaining the first success on the x-th of a sequence of independent and identical Bernoulli trials.
$$f_X(x; p) = (1 - p)^{x - 1} p, \quad x = 1, 2, 3, \ldots$$
$$E[X] = 1/p, \quad Var[X] = q/p^2, \quad m_X(t) = (pe^t)/(1 - qe^t)$$
Hypergeometric distribution
$$f_X(x; M, K, n) = \begin{cases} \dfrac{\binom{K}{x}\binom{M - K}{n - x}}{\binom{M}{n}}, & \text{for } x = 0, 1, \ldots, n \\ 0, & \text{otherwise} \end{cases}$$
where M is a positive integer, K is a nonnegative integer that is at most M, and n is a positive integer that is at most M.
• Let X denote the number of defective products in a sample of size n when sampling without replacement from a box containing M products, K of which are defective.
$$E[X] = nK/M, \quad Var[X] = n \cdot \frac{K}{M} \cdot \frac{M - K}{M} \cdot \frac{M - n}{M - 1}$$
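The mean and variance formulas can be checked against the density by direct summation. The sketch below uses an arbitrary illustrative box (M = 20 products, K = 7 defective, sample size n = 5); the function names are not from the lecture.

```python
import math


def hypergeom_pmf(x, M, K, n):
    """P[X = x] when n items are drawn without replacement from M items, K defective."""
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible outcomes automatically get probability 0.
    return math.comb(K, x) * math.comb(M - K, n - x) / math.comb(M, n)


def hypergeom_moments(M, K, n):
    """Mean and variance obtained by summing over the pmf."""
    xs = range(n + 1)
    mean = sum(x * hypergeom_pmf(x, M, K, n) for x in xs)
    var = sum((x - mean) ** 2 * hypergeom_pmf(x, M, K, n) for x in xs)
    return mean, var


if __name__ == "__main__":
    M, K, n = 20, 7, 5
    mean, var = hypergeom_moments(M, K, n)
    print(mean, n * K / M)
    print(var, n * (K / M) * ((M - K) / M) * ((M - n) / (M - 1)))
```

The factor $(M - n)/(M - 1)$ is what distinguishes the variance from that of a binomial with $p = K/M$; it accounts for sampling without replacement.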
Poisson distribution
• The Poisson distribution provides a realistic model for
many random phenomena for which the number of
occurrences within a given scope (time, length, area,
volume) is of interest. For example, the number of
fatal traffic accidents per day in Taipei, the number of
meteorites that collide with a satellite during a single
orbit, the number of defects per unit of some material,
the number of flaws per unit length of some wire, etc.
$$f_X(x; \lambda) = \frac{e^{-\lambda}\lambda^x}{x!} \ \text{for } x = 0, 1, 2, \ldots, \quad \text{i.e.} \quad f_X(x; \lambda) = \frac{e^{-\lambda}\lambda^x}{x!} I_{\{0, 1, \ldots\}}(x), \quad \lambda > 0$$
$$E[X] = \lambda, \quad Var[X] = \lambda, \quad m_X(t) = e^{\lambda(e^t - 1)}$$
Assume that we are observing the occurrence of a certain happening in time, space, region or length. Also assume that there exists a positive quantity $\lambda$ which satisfies the following properties:
1. The probability of exactly one happening in a small time interval of length h is approximately $\lambda h$.
2. The probability of more than one happening in a small time interval of length h is negligible compared with the probability of exactly one happening.
3. The numbers of happenings in non-overlapping time intervals are independent.
The probability of success (occurrence) in each trial.
$$f_X(x; \lambda) = \frac{e^{-\lambda}\lambda^x}{x!}, \quad x = 0, 1, 2, \ldots$$
[Figure: curves for alpha = 0.05, 0.1, 0.2 and 0.5, plotted over a horizontal axis from 0 to 50 and a vertical axis from 0 to 1]
Comparison of Poisson and
Binomial distributions
• Example
Suppose that the average number of telephone calls
arriving at the switchboard of a company is 30 calls
per hour.
(1) What is the probability that no calls will arrive in
a 3-minute period?
(2) What is the probability that more than five calls
will arrive in a 5-minute interval?
Assume that the number of calls arriving during any
time period has a Poisson distribution.
Assuming time is measured in minutes: Poisson distribution is NOT an appropriate choice.
Assuming time is measured in seconds: Poisson distribution is an appropriate choice.
• The first property provides the basis for transferring the mean rate of occurrence between different observation scales.
• The “small time interval of length h” can be measured in different observation scales.
• $h_{\Delta_i} = h/\Delta_i$ represents the time length measured in the scale of $\Delta_i$.
• $\lambda_i$ is the mean rate of occurrence when observation scale $\Delta_i$ is used.
• If the first property holds for various observation scales, say $\Delta_1, \ldots, \Delta_n$, then it implies that the probability of exactly one happening in a small time interval h can be approximated by
$$\lambda_1 h_{\Delta_1} = \lambda_2 h_{\Delta_2} = \cdots = \lambda_n h_{\Delta_n}, \quad \text{i.e.} \quad \lambda_1\frac{h}{\Delta_1} = \lambda_2\frac{h}{\Delta_2} = \cdots = \lambda_n\frac{h}{\Delta_n} = p \ll 1.$$
• The probability of more than one happening in time interval h is negligible.
• The probability that more than five calls will arrive in a 5-minute interval is
$$1 - P_0(5) - P_1(5) - P_2(5) - P_3(5) - P_4(5) - P_5(5) = 0.042021.$$
• Occurrences of events that can be characterized by the Poisson distribution are known as a Poisson process.
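Both parts of the telephone-call example can be reproduced with a few lines of code. The sketch assumes, as the example states, that the number of calls in an interval of t minutes is Poisson with mean 0.5t (30 calls per hour is 0.5 calls per minute); the variable and function names are illustrative.

```python
import math


def poisson_pmf(x, mean):
    """P[X = x] for a Poisson random variable with the given mean."""
    return math.exp(-mean) * mean**x / math.factorial(x)


rate_per_minute = 30 / 60  # 30 calls per hour

# (1) No calls in a 3-minute period: Poisson mean is 0.5 * 3 = 1.5.
p_no_calls = poisson_pmf(0, rate_per_minute * 3)

# (2) More than five calls in a 5-minute interval: Poisson mean is 0.5 * 5 = 2.5.
p_more_than_five = 1 - sum(poisson_pmf(x, rate_per_minute * 5) for x in range(6))

if __name__ == "__main__":
    print(round(p_no_calls, 6))        # e^{-1.5}
    print(round(p_more_than_five, 6))  # 0.042021, as in the example
```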
Probability density functions of
continuous random variables
• Uniform or rectangular distribution
• Normal distribution (also known as the Gaussian
distribution)
• Exponential distribution (or negative exponential
distribution)
• Gamma distribution (Pearson Type III)
• Chi-squared distribution
• Lognormal distribution
Uniform or rectangular distribution
$$f_X(x; a, b) = \frac{1}{b - a} I_{[a, b]}(x)$$
$$E[X] = (a + b)/2, \quad Var[X] = (b - a)^2/12, \quad m_X(t) = \frac{e^{bt} - e^{at}}{(b - a)t}$$
PDF of U(a,b)
Normal distribution (Gaussian distribution)
$$f_X(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}$$
$$E[X] = \mu, \quad Var[X] = \sigma^2, \quad m_X(t) = e^{\mu t + \sigma^2 t^2/2}$$
Standardization: $Z = \dfrac{X - \mu_1}{\sigma_1}$ for X~N(μ1, σ1) and $Z = \dfrac{Y - \mu_2}{\sigma_2}$ for Y~N(μ2, σ2); in both cases Z~N(0,1).
Commonly used values of normal distributions
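The values usually quoted for the normal distribution are probabilities of falling within one, two or three standard deviations of the mean, together with a few standard normal quantiles. The table from this slide is not reproduced in the transcript, so the sketch below only shows how such values can be computed from the standard normal CDF via the error function; the function name `phi` is an illustrative choice.

```python
import math


def phi(z):
    """CDF of the standard normal distribution, computed from math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_within(k):
    """P[mu - k*sigma <= X <= mu + k*sigma] for any normal random variable X."""
    return phi(k) - phi(-k)


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(k, round(prob_within(k), 4))
    for z in (1.645, 1.96):
        print(z, round(phi(z), 4))
```

Because standardization maps any N(μ, σ) variable to Z~N(0,1), these probabilities do not depend on μ or σ.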
Exponential distribution
(negative exponential distribution)
$$f_X(x; \lambda) = \lambda e^{-\lambda x} I_{[0, \infty)}(x), \quad \lambda > 0.$$
$\lambda$: mean rate of occurrence in a Poisson process.
$$E[X] = 1/\lambda, \quad Var[X] = 1/\lambda^2, \quad m_X(t) = \frac{\lambda}{\lambda - t} \ \text{for } t < \lambda$$
Gamma distribution
$$f_X(x; \alpha, \beta) = \frac{1}{\beta\,\Gamma(\alpha)}\left(\frac{x}{\beta}\right)^{\alpha - 1} e^{-x/\beta}\, I_{[0, \infty)}(x), \quad \alpha > 0,\ \beta > 0.$$
$$E[X] = \alpha\beta, \quad Var[X] = \alpha\beta^2, \quad m_X(t) = (1 - \beta t)^{-\alpha} \ \text{for } t < 1/\beta.$$
$1/\beta$ represents the mean rate of occurrence in a Poisson process.
$1/\beta$ is equivalent to $\lambda$ in the exponential density.
• The exponential distribution is a special case of the gamma distribution with $\alpha = 1$.
• The sum of n independent identically distributed exponential random variables with parameter $\lambda$ has a gamma distribution with parameters $\alpha = n$ and $\beta = 1/\lambda$.
Pearson Type III distribution (PT3)
$$f_X(x) = \frac{1}{|\beta|\,\Gamma(\alpha)}\left(\frac{x - \varepsilon}{\beta}\right)^{\alpha - 1} e^{-(x - \varepsilon)/\beta}, \quad \varepsilon \le x < \infty \ \ (\beta > 0)$$
with $\alpha = 4/\gamma^2$, $\beta = \sigma\gamma/2$ and $\varepsilon = \mu - 2\sigma/\gamma$, where $\mu$, $\sigma$ and $\gamma$ are the mean, standard deviation and skewness coefficient of X, respectively.
It reduces to the gamma distribution if $\varepsilon = 0$.
• The Pearson type III distribution is widely
applied in stochastic hydrology.
• Total rainfall depths of storm events can be
characterized by the Pearson type III
distribution.
• Annual maximum rainfall depths are also often characterized by the Pearson type III or log-Pearson type III distribution.
Chi-squared distribution
$$f_X(x; k) = \frac{1}{2\,\Gamma(k/2)}\left(\frac{x}{2}\right)^{(k/2) - 1} e^{-x/2}\, I_{[0, \infty)}(x), \quad k = 1, 2, \ldots$$
$$E[X] = k, \quad Var[X] = 2k, \quad m_X(t) = (1 - 2t)^{-k/2} \ \text{for } t < 1/2.$$
• The chi-squared distribution is a special case of the gamma distribution with $\alpha = k/2$ and $\beta = 2$.
Log-Normal Distribution
Log-Pearson Type III Distribution (LPT3)
• A random variable X is said to have a lognormal distribution if Log(X) is distributed with a normal density.
• A random variable X is said to have a log-Pearson type III distribution if Log(X) has a Pearson type III distribution.
Lognormal distribution
$$f_X(x; \mu, \sigma) = \frac{1}{x\,\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{\ln x - \mu}{\sigma}\right)^2} I_{(0, \infty)}(x)$$
$$E[X] = e^{\mu + (\sigma^2/2)}, \quad Var[X] = e^{2\mu + 2\sigma^2} - e^{2\mu + \sigma^2}$$
Approximations between random variables
• Approximation of binomial distribution by Poisson distribution
• Approximation of binomial distribution by normal distribution
• Approximation of Poisson distribution by normal distribution
Approximation of binomial distribution
by Poisson distribution
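The body of this slide is not in the transcript. The usual statement, assumed here, is that a binomial distribution with large n and small p is close to a Poisson distribution with mean $\lambda = np$. The sketch below measures the largest pointwise gap between the two densities; the function names and the parameter values are illustrative.

```python
import math


def binomial_pmf(x, n, p):
    """P[X = x] for X ~ binomial(n, p)."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)


def poisson_pmf(x, lam):
    """P[X = x] for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**x / math.factorial(x)


def max_pmf_gap(n, p, upto=20):
    """Largest |binomial - Poisson| density difference over x = 0..upto, with lam = n*p."""
    lam = n * p
    return max(abs(binomial_pmf(x, n, p) - poisson_pmf(x, lam)) for x in range(upto + 1))


if __name__ == "__main__":
    # Same mean np = 3 in both cases, but very different n and p.
    print(max_pmf_gap(10, 0.3))
    print(max_pmf_gap(1000, 0.003))
```

With the mean held at 3, the gap shrinks markedly when n grows from 10 to 1000, which illustrates why the approximation is recommended only for large n and small p.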
Approximation of binomial
distribution by normal distribution
• Let X have a binomial distribution with parameters n and p. If $n \to \infty$, then for fixed a < b,
$$P\left[a \le \frac{X - np}{\sqrt{npq}} \le b\right] = P\left[np + a\sqrt{npq} \le X \le np + b\sqrt{npq}\right] \to \Phi(b) - \Phi(a),$$
where $\Phi(x)$ is the cumulative distribution function of the standard normal distribution.
• Equivalently, as n approaches infinity X can be approximated by a normal distribution with mean np and variance npq.
Approximation of Poisson distribution
by normal distribution
• Let X have a Poisson distribution with parameter $\lambda$. If $\lambda \to \infty$, then for fixed a < b,
$$P\left[a \le \frac{X - \lambda}{\sqrt{\lambda}} \le b\right] = P\left[\lambda + a\sqrt{\lambda} \le X \le \lambda + b\sqrt{\lambda}\right] \to \Phi(b) - \Phi(a)$$
• Equivalently, as $\lambda$ approaches infinity X can be approximated by a normal distribution with mean $\lambda$ and variance $\lambda$.
Example
• Suppose that two fair dice are tossed 600 times. Let X denote the number of times that a total of 7 dots occurs. What is the probability that $90 \le X \le 110$?
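A total of 7 occurs with probability 6/36 = 1/6 on each toss, so X is binomial with n = 600 and p = 1/6, giving np = 100 and npq = 250/3. The worked solution is not in the transcript; the sketch below computes the exact binomial probability and two normal approximations, without and with a continuity correction. The continuity correction is an addition not mentioned in the lecture, and all names are illustrative.

```python
import math


def phi(z):
    """CDF of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


n, p = 600, 1 / 6
q = 1 - p
mean = n * p                # np = 100
sd = math.sqrt(n * p * q)   # sqrt(npq)

# Exact probability: sum of the binomial density over 90..110.
exact = sum(math.comb(n, k) * p**k * q ** (n - k) for k in range(90, 111))

# Normal approximation as in the theorem: Phi(b) - Phi(a).
approx = phi((110 - mean) / sd) - phi((90 - mean) / sd)

# Same approximation with a continuity correction of 0.5 at each end.
approx_cc = phi((110.5 - mean) / sd) - phi((89.5 - mean) / sd)

if __name__ == "__main__":
    print(exact, approx, approx_cc)
```

The continuity correction accounts for X being integer-valued and usually brings the normal approximation closer to the exact binomial probability.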
Transformation of random variables
• [Theorem] Let X be a continuous RV with density $f_X$. Let Y = g(X), where g is strictly monotonic and differentiable. The density for Y, denoted by $f_Y$, is given by
$$f_Y(y) = f_X(g^{-1}(y))\left|\frac{d g^{-1}(y)}{dy}\right|.$$
• Proof: Assume that Y=g(X) is a strictly
monotonic increasing function of X.
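The remaining steps of the proof are not in the transcript. A standard argument, given here as a sketch, works through the CDF of Y:

```latex
% Increasing case: g(X) <= y is equivalent to X <= g^{-1}(y).
F_Y(y) = P[Y \le y] = P[g(X) \le y] = P[X \le g^{-1}(y)] = F_X(g^{-1}(y))

% Differentiate with respect to y (chain rule); the derivative of g^{-1} is positive.
f_Y(y) = \frac{d F_Y(y)}{dy} = f_X(g^{-1}(y)) \, \frac{d g^{-1}(y)}{dy}

% Decreasing case: g(X) <= y is equivalent to X >= g^{-1}(y).
F_Y(y) = 1 - F_X(g^{-1}(y)), \qquad
f_Y(y) = - f_X(g^{-1}(y)) \, \frac{d g^{-1}(y)}{dy}

% Here the derivative of g^{-1} is negative, so both cases combine into
f_Y(y) = f_X(g^{-1}(y)) \left| \frac{d g^{-1}(y)}{dy} \right|
```

The absolute value in the theorem is therefore exactly what makes a single formula cover both the increasing and the decreasing case.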
Example
• Let X be a gamma random variable with
$$f_X(x; \alpha, \beta) = \frac{1}{\beta\,\Gamma(\alpha)}\left(\frac{x}{\beta}\right)^{\alpha - 1} e^{-x/\beta}\, I_{[0, \infty)}(x), \quad \alpha > 0,\ \beta > 0.$$
Let $Y = X/\beta$. Then $X = \beta Y$ and $dX/dY = \beta$, so
$$f_Y(y) = \frac{1}{\beta\,\Gamma(\alpha)}\left(\frac{\beta y}{\beta}\right)^{\alpha - 1} e^{-\beta y/\beta}\,\beta = \frac{1}{\Gamma(\alpha)}\, y^{\alpha - 1} e^{-y}, \quad y \ge 0.$$
Y is also a gamma random variable with scale parameter 1 and shape parameter $\alpha$.
Definition of the location parameter
Example of location parameter
Definition of the scale parameter
Example of scale parameter
Simulation
• Given a random variable X with CDF $F_X(x)$, there are situations in which we want to obtain a set of n random numbers (i.e., a random sample of size n) from $F_X(\cdot)$.
• Advances in computer technology have made it possible to generate such random numbers using computers. Work of this nature is termed “simulation”, or more precisely “stochastic simulation”.
Pseudo-random number generation
• Pseudorandom number generation (PRNG) is
the technique of generating a sequence of
numbers that appears to be a random sample
of random variables uniformly distributed
over (0,1).
• A commonly applied approach of PRNG starts with an initial seed $x_0$ and the following recursive algorithm (Ross, 2002):
$$x_n = a x_{n-1} \bmod m$$
where a and m are given positive integers, and the above equation means that $a x_{n-1}$ is divided by m and the remainder is taken as the value of $x_n$.
• The quantity $x_n / m$ is then taken as an approximation to the value of a uniform (0,1) random variable.
• Such an algorithm deterministically generates a sequence of values that eventually repeats itself. Consequently, the constants a and m should be chosen to satisfy the following criteria:
– For any initial seed, the resultant sequence has the “appearance” of being a sequence of independent uniform (0,1) random variables.
– For any initial seed, the number of random variables that can be generated before repetition begins is large.
– The values can be computed efficiently on a digital computer.
• A guideline for selection of a and m is that m
be chosen to be a large prime number that
can be fitted to the computer word size. For a
32-bit word computer, $m = 2^{31} - 1$ and $a = 7^5$
result in desired properties (Ross, 2002).
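The recursion with the constants quoted above can be sketched in R as follows. This is a minimal illustration rather than a production generator: the function name `lcg` and its arguments are chosen here for illustration and do not come from the slides or from Ross (2002). The seed is assumed to be an integer between 1 and m − 1, since a seed of 0 would produce only zeros.

```r
# Multiplicative congruential generator: x_n = a * x_{n-1} mod m
# (illustrative helper; name and arguments are assumptions)
lcg <- function(n, seed, a = 7^5, m = 2^31 - 1) {
  x <- numeric(n)
  prev <- seed
  for (i in seq_len(n)) {
    # a * prev stays below 2^53, so the product and remainder are exact in doubles
    prev <- (a * prev) %% m
    x[i] <- prev
  }
  x / m  # scale to (0, 1)
}

u <- lcg(1000, seed = 12345)
range(u)  # all values lie strictly between 0 and 1
```

Calling `lcg` twice with the same seed returns the same sequence, which is the deterministic behaviour described above.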
Simulating a continuous random variable
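• Probability integral transformation: if X is a continuous random variable with CDF $F_X$, then $U = F_X(X)$ is uniformly distributed over (0,1).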
The cumulative distribution function of a continuous
random variable is a monotonically non-decreasing function
(strictly increasing wherever the density is positive).
Example
$$f(x) = \lambda e^{-\lambda x}$$
$$F(x) = \int_0^x f(u)\,du = 1 - e^{-\lambda x} = U$$
$$X = \frac{-\ln(1 - U)}{\lambda} = \frac{-\ln V}{\lambda}, \quad V \sim \text{iid } U(0,1)$$
• Generate a random sample $\{v_1, v_2, \ldots, v_n\}$ of
random variable V which has a uniform
density over (0, 1).
• Convert $\{v_1, v_2, \ldots, v_n\}$ to $\{x_1, x_2, \ldots, x_n\}$ using the
above V-to-X transformation.
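The two steps above can be sketched in R for the exponential example. The rate `lambda = 0.5` and the sample size are arbitrary values assumed here for illustration; `runif` supplies the uniform (0,1) sample.

```r
set.seed(1)                 # for reproducibility
lambda <- 0.5               # assumed rate parameter (illustrative)
n <- 10000                  # assumed sample size (illustrative)

v <- runif(n)               # step 1: random sample from U(0, 1)
x <- -log(v) / lambda       # step 2: V-to-X transformation, X = -ln(V) / lambda

mean(x)                     # should be close to the theoretical mean 1 / lambda
```

A histogram of `x` (for example `hist(x, breaks = 50)`) can be compared visually with the density $\lambda e^{-\lambda x}$.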
Random number generation in R
• R commands for stochastic simulation (for the
normal distribution):
– pnorm – cumulative probability
– qnorm – quantile function
– rnorm – generating a random sample of a specific
sample size
– dnorm – probability density function
For other distributions, simply change the distribution name.
For example, (punif, qunif, runif, and dunif) for the uniform
distribution and (ppois, qpois, rpois, and dpois) for the Poisson
distribution.
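A short sketch of how the four function families are called. The argument values below are arbitrary choices made for illustration.

```r
set.seed(123)

p <- pnorm(1.96)                      # P[Z <= 1.96] for a standard normal
z <- qnorm(0.975)                     # quantile with cumulative probability 0.975
d <- dnorm(0)                         # standard normal density at 0, i.e. 1 / sqrt(2 * pi)
x <- rnorm(1000, mean = 10, sd = 2)   # random sample of size 1000 from N(10, 2^2)

# Same pattern for other distributions: change the distribution name
pp <- ppois(2, lambda = 3)            # P[X <= 2] for Poisson with mean 3
u  <- runif(5, min = 0, max = 1)      # five uniform (0, 1) random numbers
```

Note that `pnorm` and `qnorm` are inverses of each other, so `qnorm(pnorm(1.5))` returns 1.5 up to rounding error.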
Generating random numbers from
discrete distributions in R
• Discrete uniform distribution
– R does not provide default functions for random
number generation for the discrete uniform
distribution.
– However, the following functions can be used for
the discrete uniform distribution on the integers 1 to k.
• rdu <- function(n, k) sample(1:k, n, replace = TRUE)  # random numbers
• ddu <- function(x, k) ifelse(x >= 1 & x <= k & round(x) == x, 1/k, 0)  # probability mass function
• pdu <- function(x, k) ifelse(x < 1, 0, ifelse(x <= k, floor(x)/k, 1))  # CDF
• qdu <- function(p, k) ifelse(p <= 0 | p > 1, NA, ceiling(p * k))  # quantile; NA where undefined
– Similar, yet more flexible, functions are defined as
follows:
• dunifdisc <- function(x, min = 0, max = 1) ifelse(x >= min & x <= max &
round(x) == x, 1/(max - min + 1), 0)
> dunifdisc(23, 21, 40)
> dunifdisc(c(0, 1))
• punifdisc <- function(q, min = 0, max = 1) ifelse(q < min, 0, ifelse(q > max, 1,
floor(q - min + 1)/(max - min + 1)))
> punifdisc(0.2)
> punifdisc(5, 2, 19)
• qunifdisc <- function(p, min = 0, max = 1) pmax(min,
ceiling(p * (max - min + 1)) + min - 1)  # smallest x with F(x) >= p
> qunifdisc(0.2222222, 2, 19)
> qunifdisc(0.2)
• runifdisc <- function(n, min = 0, max = 1) min +
sample.int(max - min + 1, n, replace = TRUE) - 1  # also safe when min == max
> runifdisc(30, 2, 19)
> runifdisc(30)
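As a cross-check, discrete uniform numbers on 1 to k can also be produced by applying the quantile function ceiling(p*k) to uniform (0,1) numbers, the same inverse-transform idea used for continuous variables. The sketch below is illustrative only: `k = 6`, the sample size and the variable names are assumptions made here.

```r
set.seed(2015)
k <- 6                      # assumed number of outcomes (e.g. a fair die)
n <- 60000                  # assumed sample size

u <- runif(n)               # uniform (0, 1) random numbers
x <- ceiling(u * k)         # quantile function of the discrete uniform on 1..k

# Relative frequencies should all be close to 1/k
freq <- as.numeric(table(factor(x, levels = 1:k))) / n
round(freq, 3)
```

Each relative frequency should be near 1/6, which agrees with the probability mass function 1/k.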
• Binomial distribution
• Negative binomial distribution
• Geometric distribution
• Hypergeometric distribution
• Poisson distribution
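For these distributions, base R (package stats) provides the random number generators `rbinom`, `rnbinom`, `rgeom`, `rhyper` and `rpois`. The sketch below shows one call to each; all parameter values and variable names are arbitrary choices made here for illustration. Note that R's `rnbinom` and `rgeom` count the number of failures before the required success(es), not the number of trials.

```r
set.seed(42)
nsim <- 10000   # assumed sample size

x_binom  <- rbinom(nsim, size = 10, prob = 0.3)   # successes in 10 trials; mean 10 * 0.3 = 3
x_nbinom <- rnbinom(nsim, size = 3, prob = 0.4)   # failures before the 3rd success; mean 3 * 0.6 / 0.4 = 4.5
x_geom   <- rgeom(nsim, prob = 0.2)               # failures before the 1st success; mean 0.8 / 0.2 = 4
x_hyper  <- rhyper(nsim, m = 7, n = 13, k = 5)    # white balls in 5 draws from 7 white + 13 black; mean 5 * 7 / 20 = 1.75
x_pois   <- rpois(nsim, lambda = 4)               # Poisson counts with mean 4

# Sample means for comparison with the theoretical means in the comments
c(mean(x_binom), mean(x_nbinom), mean(x_geom), mean(x_hyper), mean(x_pois))
```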
An example of stochastic simulation
• The travel time from your home (or dormitory)
to NTU campus may involve a few factors:
– Walking to bus stop (stop for traffic lights,
crowdedness on the streets, etc.)
– Transportation by bus
– Stop by 7-11 or Starbucks for breakfast (long queue)
– Walking to campus
$X_1 \sim N(15, \sigma^2 = 36)$
$X_2 \sim$ Gamma distribution with mean 30 minutes and
standard deviation 10 minutes.
$X_3 \sim$ Exponential distribution with a mean of 20 minutes.
$X_4 \sim N(10, \sigma^2 = 25)$
All $X_i$'s are independently distributed.
• If you leave home at 8:00 a.m. for a class
session of 9:10, what is the probability of
being late for the class?
$Y = X_1 + X_2 + X_3 + X_4$