Random Variables & Entropy: Examples


Random Variables & Entropy: Extension and Examples

Brooks Zurn, EE 270 / STAT 270, Fall 2007

Overview

• Density Functions and Random Variables
• Distribution Types
• Entropy

Density Functions

• PDF vs. CDF
[Figure: PDF and CDF curves for normalized rodent size; both axes run from 0 to 1]
– PDF shows the probability of each size bin
– CDF shows the cumulative probability for all sizes up to and including the current bin
– This data shows the normalized, relative size of a rodent as seen from an overhead camera for 8 behaviors
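A minimal sketch of the PDF/CDF relationship shown in the figure; the sample data here is a hypothetical stand-in, not the rodent measurements:

```python
import numpy as np

# Hypothetical stand-in for the normalized rodent-size measurements
# (the real data from the overhead camera is not reproduced here).
sizes = np.random.default_rng(0).beta(2, 5, size=1000)

# Empirical PDF: fraction of samples falling in each size bin.
bins = np.linspace(0.0, 1.0, 11)
counts, _ = np.histogram(sizes, bins=bins)
pdf = counts / sizes.size

# CDF: cumulative probability up to and including the current bin.
cdf = np.cumsum(pdf)

for lo, hi, p, c in zip(bins[:-1], bins[1:], pdf, cdf):
    print(f"bin [{lo:.1f}, {hi:.1f}): PDF={p:.3f}  CDF={c:.3f}")
```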

Markov & Chebyshev Inequalities

• What’s the point?
• Setting a maximum limit on probability
• This limits the search space for a solution
– When looking for a needle in a haystack, it helps to have a smaller haystack.
• Can use the limit to determine the necessary sample size

Markov & Chebyshev Inequalities

• Example: Mean height of a child in a kindergarten class is 3’6” (Leon-Garcia text, p. 137 – see end of presentation).

– Using Markov’s inequality, the probability of a child being taller than 9 feet (108 inches) is <= 42/108 ≈ 0.389.

 There will be fewer than 39 students over 9 feet tall in a class of 100 students.
 Also, there will be NO FEWER THAN 61 students who are under 9’ tall.

– Using Chebyshev’s inequality (and assuming a standard deviation of 1 foot = 12 inches), with c = 108 − 42 = 66 inches, the probability of a child being taller than 9 feet is <= 12²/66² ≈ 0.033.

 There will be fewer than 4 students taller than 9’ in a class of 100 students (consistent with, and much tighter than, Markov’s inequality).
 Also, there will be NO FEWER THAN 96 students under 9’ tall.

This gives us a basic idea of how many student heights we would need to measure to rule out the possibility that we have a 9’ tall student… SAMPLE SIZE!!
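A quick numerical check of both bounds; the class size of 100 and the 1-foot standard deviation follow the slide’s assumptions:

```python
# Markov and Chebyshev bounds for the kindergarten height example.
mean_in = 42.0      # mean height: 3'6" = 42 inches
sigma_in = 12.0     # assumed standard deviation: 1 foot = 12 inches
tall_in = 108.0     # threshold: 9 feet = 108 inches
class_size = 100

# Markov: P[X >= c] <= E[X] / c, valid for any nonnegative X.
markov = mean_in / tall_in

# Chebyshev: P[|X - E[X]| >= c] <= sigma^2 / c^2, with c the distance
# from the mean to the threshold (108 - 42 = 66 inches).
c = tall_in - mean_in
chebyshev = sigma_in**2 / c**2

print(f"Markov bound:    {markov:.3f}  -> at most {markov*class_size:.1f} tall students")
print(f"Chebyshev bound: {chebyshev:.4f} -> at most {chebyshev*class_size:.1f} tall students")
```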

Markov’s Inequality

For a random variable $X \ge 0$ and any $c > 0$,

$$P\{X \ge c\} \le \frac{E[X]}{c}$$

Derivation:

$$E[X] = \int_0^\infty x\, f_X(x)\, dx, \quad \text{where } f_X(x) = \lim_{\epsilon \to 0} \frac{P[x - \epsilon/2 \le X \le x + \epsilon/2]}{\epsilon}$$

Since $X \ge 0$, dropping the part of the integral below $c$ can only decrease it, and on the remaining range every $x$ is at least $c$:

$$E[X] \ge \int_c^\infty x\, f_X(x)\, dx \ge c \int_c^\infty f_X(x)\, dx = c\, P\{X \ge c\}$$

Therefore $P\{X \ge c\} \le E[X]/c$.

Markov’s Inequality

For fixed $c > 0$, every $x$ in the tail $\{x \ge c\}$ satisfies $x \ge c$, so replacing $x$ with the constant $c$ inside the tail integral can only decrease it; the bound therefore holds no matter how far beyond $c$ the values of $X$ extend.
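A small Monte Carlo check of the inequality; the exponential distribution here is an arbitrary choice of nonnegative random variable:

```python
import numpy as np

# Monte Carlo check of Markov's inequality P[X >= c] <= E[X]/c
# for a nonnegative random variable.
rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=100_000)  # E[X] = 2.0

for c in (1.0, 2.0, 4.0, 8.0):
    empirical = np.mean(x >= c)
    bound = x.mean() / c
    assert empirical <= bound + 1e-9, "bound violated (should never happen)"
    print(f"c={c:4.1f}  P[X>=c]={empirical:.4f}  Markov bound={bound:.4f}")
```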

Reference: Lefebvre text.

Chebyshev’s Inequality

$$P\{|Y - E[Y]| \ge c\} \le \frac{\sigma^2}{c^2}, \qquad c > 0$$

Chebyshev’s Inequality

Derivation (INCOMPLETE): As before, $c^2$ is constant while $(Y - E[Y])^2$ can grow without bound. But how do $|Y - E[Y]|$ and $(Y - E[Y])^2$ relate?

$$(|Y - E[Y]|)^2 = (Y - E[Y])^2$$

As long as $u = |Y - E[Y]|$ is $\ge 1$, $u^2$ will be $\ge u$ and the inequality holds, as per Markov’s inequality. Note: this is not a rigorous proof, and cases for which $|Y - E[Y]| < 1$ are not discussed.

Reference: Lefebvre text.
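For completeness, the standard proof applies Markov’s inequality to the nonnegative random variable $(Y - E[Y])^2$, which avoids the $u \ge 1$ caveat above entirely:

```latex
% Apply Markov's inequality to the nonnegative variable (Y - E[Y])^2:
P\{|Y - E[Y]| \ge c\}
  = P\{(Y - E[Y])^2 \ge c^2\}        % the two events are identical
  \le \frac{E[(Y - E[Y])^2]}{c^2}    % Markov with threshold c^2
  = \frac{\sigma^2}{c^2}             % definition of the variance
```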

Note

• Both of these inequalities relate to the Central Limit Theorem, which is derived in the Leon-Garcia text on p. 287.
• The Central Limit Theorem states that the CDF of a normalized sum of n random variables approaches the CDF of a Gaussian random variable (p. 280).
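A quick simulation of that statement; the uniform distribution and n = 30 are arbitrary choices:

```python
import math
import numpy as np

# Sum n uniform random variables, normalize to zero mean / unit variance,
# and compare the empirical CDF against the standard Gaussian CDF.
n, trials = 30, 100_000
rng = np.random.default_rng(3)
sums = rng.uniform(size=(trials, n)).sum(axis=1)
z = (sums - n * 0.5) / math.sqrt(n / 12.0)  # a uniform's variance is 1/12

for point in (-2.0, -1.0, 0.0, 1.0, 2.0):
    empirical = np.mean(z <= point)
    gaussian = 0.5 * (1.0 + math.erf(point / math.sqrt(2.0)))
    print(f"z={point:+.1f}: empirical CDF {empirical:.4f}  vs  Gaussian {gaussian:.4f}")
```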

Overview

• Entropy
– What is it?
– Used in…

Entropy

• What is it?
– According to Jorge Cham (PhD Comics)… [cartoon not reproduced]

Entropy

• “Measure of uncertainty in a random experiment” (Reference: Leon-Garcia text)
• Used in information theory
– Message transmission (for example, Lathi text p. 682)
– Decision Tree ‘Gain Criterion’
• Leon-Garcia text p. 167
• ID3, C4.5, ITI, etc. by J. Ross Quinlan and Paul Utgoff
• Note: NOT the same as the Gini index used as a splitting criterion by the CART tree method (Breiman et al., 1984).

Entropy

• ID3 Decision Tree: Expected Information for a Binary Tree

$$E(A) = \sum_{j=1}^{q} \frac{s_{1j} + s_{2j} + \dots + s_{nj}}{s}\; I(S_{1j}, S_{2j}, \dots, S_{nj})$$

where the entropy $I$ is

$$I(S_1, S_2, \dots, S_n) = -\sum_{i=1}^{n} p_i \log_2 p_i$$

• E(A) is the average information needed to classify A.
• ITI (Incremental Tree Inducer): based on ID3 and its successor, C4.5.
– Uses a gain ratio metric to improve performance in certain cases
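A minimal sketch of these two formulas in code; the class counts below are made up for illustration, not taken from the rodent data:

```python
import math

def entropy(counts):
    """I(s_1, ..., s_n) = -sum p_i log2 p_i, with p_i = s_i / total."""
    total = sum(counts)
    return -sum((s / total) * math.log2(s / total) for s in counts if s > 0)

def expected_information(partitions):
    """E(A): entropy of each subset S_j induced by attribute A,
    weighted by the fraction of samples falling in S_j."""
    s = sum(sum(subset) for subset in partitions)
    return sum(sum(subset) / s * entropy(subset) for subset in partitions)

# Made-up example: attribute A splits 14 samples into two subsets,
# each holding (class-1 count, class-2 count).
partitions = [(6, 2), (3, 3)]
print(f"E(A) = {expected_information(partitions):.4f}")

# Information gain = entropy before the split - E(A):
before = entropy([6 + 3, 2 + 3])
print(f"gain = {before - expected_information(partitions):.4f}")
```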

Entropy

• ITI Decision Tree for Rodent Behaviors
– ITI is an extension of ID3
Reference: ‘Rodent Data’ paper.

Distribution Types

• Continuous Random Variables
– Normal (or Gaussian) Distribution
– Uniform Distribution
– Exponential Distribution
– Rayleigh Random Variable
• Discrete (‘counting’) Random Variables
– Binomial Distribution
– Bernoulli and Geometric Distributions
– Poisson Distribution


Poisson Distribution

$$P\{X = n\} = \frac{e^{-\alpha}\,\alpha^n}{n!} \qquad \text{and} \qquad P_X(z) = e^{-\alpha} \sum_{n=0}^{\infty} \frac{(\alpha z)^n}{n!} = e^{\alpha(z-1)}$$

• Gives the number of events occurring in one time unit, where the time between events is exponentially distributed with mean $1/\alpha$.
• Gives a method for modeling completely random, independent events that occur after a random interval of time. (Leon-Garcia p. 106)
• The Poisson distribution can model a long sequence of Bernoulli trials. (Leon-Garcia p. 109)
– Bernoulli gives the probability of a single coin toss.

References: Kao text, Leon-Garcia text.
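A short simulation sketch tying these together: counting exponentially-distributed arrivals within one time unit reproduces the Poisson PMF. The rate α = 3 is an arbitrary choice:

```python
import math
import numpy as np

# Arrivals with exponential inter-arrival times (mean 1/alpha); the number
# falling inside one time unit should follow a Poisson(alpha) distribution.
alpha, trials = 3.0, 200_000
rng = np.random.default_rng(2)

gaps = rng.exponential(scale=1.0 / alpha, size=(trials, 40))
arrival_times = np.cumsum(gaps, axis=1)
counts = (arrival_times <= 1.0).sum(axis=1)  # events within one time unit

for n in range(7):
    empirical = np.mean(counts == n)
    pmf = math.exp(-alpha) * alpha**n / math.factorial(n)
    print(f"n={n}: simulated {empirical:.4f}  vs  PMF {pmf:.4f}")
```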

Poisson Distribution

• http://en.wikipedia.org/wiki/Image:Poisson_distribution_PMF.png

References

• Lefebvre Text:
– Applied Stochastic Processes, Mario Lefebvre. New York, NY: Springer, 2003.
• Kao Text:
– An Introduction to Stochastic Processes, Edward P. C. Kao. Belmont, CA, USA: Duxbury Press at Wadsworth Publishing Company, 1997.
• Lathi Text:
– Modern Digital and Analog Communication Systems, 3rd ed., B. P. Lathi. New York, Oxford: Oxford University Press, 1998.
• Leon-Garcia Text:
– Probability and Random Processes for Electrical Engineering, 2nd ed., Alberto Leon-Garcia. Reading, MA: Addison-Wesley, 1994.
• Entropy-Based Decision Trees:
– ID3: P. E. Utgoff, "Incremental induction of decision trees," Machine Learning, vol. 4, pp. 161-186, 1989.
– C4.5: J. R. Quinlan, C4.5: Programs for Machine Learning, 1st ed. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1993.
– ITI: P. E. Utgoff, N. C. Berkman, and J. A. Clouse, "Decision tree induction based on efficient tree restructuring," Machine Learning, vol. 29, pp. 5-44, 1997.
• Other Decision Tree Methods:
– CART: L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone, Classification and Regression Trees. Belmont, CA: Wadsworth, 1984.
• Rodent Data:
– J. Brooks Zurn, Xianhua Jiang, Yuichi Motai, "Video-Based Tracking and Incremental Learning Applied to Rodent Behavioral Activity under Near-Infrared Illumination." To appear: IEEE Transactions on Instrumentation and Measurement, December 2007 or February 2008.
• Poisson Distribution Example:
– http://en.wikipedia.org/wiki/Image:Poisson_distribution_PMF.png

Questions?