SAM Estimation Using Maximum Entropy Methods

National Accounts and SAM Estimation Using Cross-Entropy Methods
Sherman Robinson
Estimation Problem
• Partial equilibrium models such as IMPACT require balanced and consistent datasets that represent disaggregated production and demand by commodity
• Estimating such a dataset requires an efficient method to incorporate and reconcile information from a variety of sources
2
Primary Data Sources for IMPACT
Base Year
• FAOSTAT for country totals for:
– Production: Area, Yields and Supply
– Demand: Total, Food, Intermediate, Feed, Other
Demands
– Trade: Exports, Imports, Net Trade
– Nutrition: Calories per capita, calories per kg of
commodity
• AQUASTAT for country-level irrigated and rainfed production
• SPAM pixel-level estimates of the global allocation of production
3
Estimating a Consistent and Disaggregated Database
• Estimate IMPACT country database
– From: FAOSTAT
• Estimate technology-disaggregated production
– From: IMPACT country database, FAO AQUASTAT
• Estimate geographically disaggregated production
– From: technology-disaggregated production, SPAM
4
Bayesian Work Plan
• Source data (FAO, SPAM)
• Priors on values and estimation errors of production, demand, and trade
• Estimation by cross-entropy method
• Check results against priors and identify potential data problems
• New information to correct identified problems
• Feedback to data source
5
Information Theory Approach
• Goal is to recover parameters and data we
observe imperfectly. Estimation rather than
prediction.
• Assume very little information about the error
generating process and nothing about the
functional form of the error distribution.
• Very different from standard statistical approaches (e.g., econometrics), which usually assume lots of data
6
Estimation Principles
• Use all the information you have.
• Do not use or assume any information you do
not have.
• Arnold Zellner: “Efficient Information
Processing Rule (IPR).”
• Close links to Bayesian estimation
7
Information Theory
• Need to be flexible in incorporating
information in parameter/data estimation
– Lots of different forms of information
• In classical statistics, the “information” in a data set can be summarized by the moments of the distribution of the data
– The moments summarize what is needed for estimation
• We need a broader view of “estimation” and
need to define “information”
8
An analogy from physics
A force takes a system from an initial state of motion to a final state of motion. Force is whatever induces a change of motion:
$$F = \frac{dp}{dt}$$
9
Inference is dynamics as well
Information takes you from old beliefs to new beliefs: “information” is what induces a change in rational beliefs.
10
Information Theory
• Suppose an event E will occur with probability
p. What is the information content of a
message stating that E occurs?
• If p is “high”, the event’s occurrence carries little “information.” If p is low, the occurrence is a surprise and carries a lot of information
– The content of the message is not the issue: the amount, not the meaning, of information
11
Information Theory
• Shannon (1948) developed a formal measure of
“information content” of the arrival of a message
(he worked for AT&T)
h( p )  log(1 / p )
IF p  1 then h( p )  0
IF p  0 then h( p )  
12
Information Theory
• For a set of events, the expected information
content of a message before it arrives is the
entropy measure:
$$H(p) = \sum_{k=1}^{n} p_k\, h(p_k) = -\sum_{k=1}^{n} p_k \log(p_k), \qquad \text{and} \qquad \sum_k p_k = 1$$
13
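A minimal Python sketch of these two measures, using arbitrary example probabilities:

import numpy as np

def surprisal(p):
    # Shannon information content h(p) = log(1/p) of an event with probability p
    return np.log(1.0 / p)

def entropy(p):
    # Expected information content H(p) = -sum_k p_k log(p_k), with 0*log(0) treated as 0
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log(p[nz]))

# A low-probability event carries more information than a likely one.
print(surprisal(0.99), surprisal(0.01))    # ~0.01 vs ~4.6 nats

# Among distributions over four outcomes, the uniform one has maximum entropy.
print(entropy([0.25, 0.25, 0.25, 0.25]))   # log(4) ~ 1.386
print(entropy([0.70, 0.10, 0.10, 0.10]))   # smaller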
Claude Shannon
14
E.T. Jaynes
• Jaynes proposed using the Shannon entropy
measure in estimation
• Maximum entropy (MaxEnt) principle:
– Out of all probability distributions that are
consistent with the constraints, choose the one
that has maximum uncertainty (maximizes the
Shannon entropy metric)
• Idea of estimating probabilities (or frequencies)
– In the absence of any constraints, entropy is
maximized for the uniform distribution
15
E.T. Jaynes
16
Estimation With a Prior
• The estimation problem is to estimate a set of
probabilities that are “close” to a known prior
and that satisfy various known moment
constraints.
• Jaynes suggested using the criterion of
minimizing the Kullback-Leibler “cross
entropy” (CE) “divergence” between the
estimated probabilities and the prior.
17
Cross Entropy Estimation
Minimize:
$$\sum_k p_k \log\!\left(\frac{p_k}{\bar p_k}\right) = \sum_k p_k \left[\log p_k - \log \bar p_k\right]$$
where $\bar p$ is the prior probability.
“Divergence”, not “distance”. Measure is not symmetric
and does not satisfy the triangle inequality. It is not a
“norm”.
18
MaxEnt vs Cross-Entropy
• If the prior is specified as a uniform distribution, the CE estimate is equivalent to the MaxEnt estimate
• Laplace’s Principle of Insufficient Reason: In the
absence of any information, you should choose
the uniform distribution, which has maximum
uncertainty
– Uniform distribution as a prior is an admission of
“ignorance”, not knowledge
19
Cross Entropy Measure
• Two kinds of information
– Prior distribution of the probabilities
– Moments of the distribution
• Any moments can be specified
– Inequalities can also be specified
– Moments measured with error will be considered later
– Summary statistics such as quantiles
20
Cross-Entropy Measure
Minimize
$$\sum_{k=1}^{K} p_k \ln\!\left(\frac{p_k}{\bar p_k}\right)$$
subject to constraints (information) about moments
$$\sum_{k=1}^{K} p_k\, x_{t,k} = y_t$$
and the adding-up constraint (finite distribution)
$$\sum_{k=1}^{K} p_k = 1$$
21
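For concreteness, a small Python sketch of this constrained minimization, with a made-up support set, a uniform prior, and a single moment constraint (scipy’s SLSQP solver handles the equality constraints):

import numpy as np
from scipy.optimize import minimize

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])           # x_{t,k} for a single moment t (toy values)
pbar = np.full(5, 0.2)                             # uniform prior probabilities
y = 3.6                                            # "observed" moment (toy value)

def cross_entropy(p):
    # KL divergence of p from the prior pbar
    return np.sum(p * np.log(p / pbar))

constraints = [
    {"type": "eq", "fun": lambda p: p @ x - y},    # moment constraint
    {"type": "eq", "fun": lambda p: p.sum() - 1},  # adding-up constraint
]
bounds = [(1e-10, 1.0)] * 5

res = minimize(cross_entropy, pbar, bounds=bounds, constraints=constraints)
print(res.x, res.x @ x)                            # estimated probabilities reproduce the moment y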
Lagrangian
$$\mathcal{L} = \sum_{k=1}^{K} p_k \ln\!\left(\frac{p_k}{\bar p_k}\right) + \sum_{t=1}^{T} \lambda_t \left( y_t - \sum_{k=1}^{K} p_k\, x_{t,k} \right) + \mu \left( 1 - \sum_{k=1}^{K} p_k \right)$$
22
First Order Conditions
$$\ln p_k - \ln \bar p_k + 1 - \sum_{t=1}^{T} \lambda_t\, x_{t,k} - \mu = 0$$
$$y_t - \sum_{k=1}^{K} p_k\, x_{t,k} = 0$$
$$1 - \sum_{k=1}^{K} p_k = 0$$
23
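To see how these conditions lead to the solution on the next slide, solve the first condition for $p_k$ and use the adding-up constraint to eliminate $\mu$:
$$\ln p_k = \ln \bar p_k - 1 + \mu + \sum_{t=1}^{T} \lambda_t x_{t,k} \quad\Longrightarrow\quad p_k = \bar p_k\, e^{\mu - 1} \exp\!\left(\sum_{t=1}^{T} \lambda_t x_{t,k}\right),$$
and $\sum_k p_k = 1$ then implies $e^{\mu - 1} = 1/\Omega$, where $\Omega$ is the normalizing constant defined on the next slide.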
Solution
$$p_k = \frac{\bar p_k}{\Omega(\lambda_1, \lambda_2, \ldots, \lambda_T)} \exp\!\left(\sum_{t=1}^{T} \lambda_t\, x_{t,k}\right)$$
where
$$\Omega(\lambda) = \sum_{k=1}^{K} \bar p_k \exp\!\left(\sum_{t=1}^{T} \lambda_t\, x_{t,k}\right)$$
24
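A Python sketch of this closed form for the same toy problem as above (one moment constraint, uniform prior): the single multiplier lambda is chosen so that the moment constraint holds, and dividing by Omega enforces the adding-up constraint.

import numpy as np
from scipy.optimize import brentq

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])    # toy support points
pbar = np.full(5, 0.2)                     # uniform prior
y = 3.6                                    # toy moment

def p_of_lambda(lam):
    # Closed-form CE solution: p_k = pbar_k * exp(lam * x_k) / Omega(lam)
    w = pbar * np.exp(lam * x)
    return w / w.sum()                     # w.sum() is the partition function Omega

def moment_gap(lam):
    return p_of_lambda(lam) @ x - y

lam = brentq(moment_gap, -10.0, 10.0)      # pick lambda so the moment constraint is satisfied
print(lam, p_of_lambda(lam))               # matches the direct constrained minimization above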
Cross-Entropy (CE) Estimates
• Ω is called the “partition function”.
• Can be viewed as a limiting form (nonparametric) of a Bayesian estimator,
transforming prior and sample information
into posterior estimates of probabilities.
• Not strictly Bayesian, because you do not specify the prior as a frequency function but as a discrete set of probabilities.
25
From Probabilities to Parameters
• From information theory, we now have a way
to use “information” to estimate probabilities
• But in economics, we want to estimate
parameters of a model or a “consistent” data
set
• How do we move from estimating
probabilities to estimating parameters and/or
data?
26
Types of Information
• Values:
– Areas, production, demand, trade
• Coefficients: technology
– Crop and livestock yields
– Input-output coefficients for processed
commodities (sugar, oils)
• Prior Distribution of measurement error:
– Mean
– Standard error of measurement
– “Informative” or “uninformative” prior
distribution
27
Data Estimation
• Generate a prior “best” estimate of all entries:
Values and/or coefficients.
• A “prototype” based on:
– Values and aggregates
• Historical and current data
• Expert Knowledge
– Coefficients: technology and behavior
• Current and/or historical data
• Assumption of behavior and technical stability
28
Estimation Constraints
• Nationally
– Area times Yield = Production by crop
– Total area = Sum of area over crops
– Total Demand = Sum of demand over types of
demand
– Net trade = Supply – Demand
• Globally
– Net trade sums to 0
29
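A toy Python check of these accounting identities (the numbers are illustrative, not IMPACT data):

area = {"A": 10.0, "B": 8.0}                              # crop area by country
yields = {"A": 2.0, "B": 2.5}                             # yield per unit area
supply = {c: area[c] * yields[c] for c in area}           # Area x Yield = Production
demand = {"A": 15.0, "B": 25.0}                           # total demand by country
net_trade = {c: supply[c] - demand[c] for c in area}      # Net trade = Supply - Demand

# Globally, net trade must sum to zero: one country's exports are another's imports.
assert abs(sum(net_trade.values())) < 1e-9, "global net trade does not balance"
print(net_trade)                                          # {'A': 5.0, 'B': -5.0}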
Measurement Error
• Error specification
– Error on coefficients or values
– Additive or multiplicative errors
• Multiplicative errors
– Logarithmic distribution
– Errors cannot be negative
• Additive
– Possibility of entries changing sign
30
Error Specification
Typical error specification (additive):
$$\bar{x}_i = x_i + e_i, \qquad e_i = \sum_k W_{i,k}\, v_{i,k}$$
where $0 \le W_{i,k} \le 1$, $\sum_k W_{i,k} = 1$, and $v_{i,k}$ is the "support set" for the errors.
31
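A small Python illustration of this specification, with a hypothetical 7-element support set and candidate weights:

import numpy as np

s = 2.0                                            # scale of the support set (assumed)
v = np.array([-3, -2, -1, 0, 1, 2, 3]) * s         # fixed support set, in units of the item
W = np.array([0.05, 0.05, 0.10, 0.30, 0.25, 0.15, 0.10])   # candidate weights (probabilities)

assert np.isclose(W.sum(), 1.0) and np.all((W >= 0) & (W <= 1))
e = W @ v                                          # e_i = sum_k W_{i,k} v_{i,k}
x_true = 100.0                                     # hypothetical true value
x_observed = x_true + e                            # x_bar_i = x_i + e_i
print(e, x_observed)                               # 1.0, 101.0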
Error Specification
• Errors are weighted averages of support-set values
– The v parameters are fixed and have the units of the item being estimated.
– The W variables are probabilities that need to be estimated.
• This converts the problem of estimating errors into one of estimating probabilities.
32
Error Specification
• The technique provides a bridge between
standard estimation where parameters to be
estimated are in “natural” units and the
information approach where the parameters
are probabilities.
– The specified support set provides the link.
33
Error Specification
• Conversion of a “standard” stochastic
specification with continuous random
variables into a specification with a discrete
set of probabilities
– Golan, Judge, Miller
• Problem is to estimate a discrete probability
distribution
34
Uninformative Prior
• Prior incorporates only information about the
bounds between which the errors must fall.
• Uniform distribution is the continuous
uninformative prior in Bayesian analysis.
– Laplace: Principle of insufficient reason
• We specify a finite probability distribution that
approximates the uniform distribution.
35
Uninformative Prior
• Assume that the bounds are set at ±3s where
s is a constant.
• For a uniform distribution on $[-3s, 3s]$, the variance is:
$$\sigma^2 = \frac{\left[3s - (-3s)\right]^2}{12} = 3s^2$$
36
7-Element Support Set
v1  3s v2  2s v3  s v4  0
v5   s v6  2s v7  3s
   wk  v
2
2
k
k
1
and the prior is wk 
7
2
s
2
   9  4  1  1  4  9   4s
7
2
37
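A one-line Python check of this prior variance:

import numpy as np

s = 1.0
v = np.array([-3, -2, -1, 0, 1, 2, 3]) * s
w = np.full(7, 1 / 7)                     # uniform prior weights
print(w @ v**2)                           # 4.0 = 4 s^2 (the continuous uniform would give 3 s^2)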
Uninformative Prior
• Finite uniform prior with 7-element support set is
a conservative uninformative prior.
• Adding more elements would more closely approximate the continuous uniform distribution, reducing the prior variance toward the limit of 3s².
• Posterior distribution is essentially unconstrained.
38
Informative Prior
• Start with a prior on both mean and standard
deviation of the error distribution
– Prior mean is normally zero.
– Standard deviation of e is the prior on the
standard error of measurement of item.
• Define the support set with s=σ so that the
bounds are now ±3σ.
39
Informative Prior, 2 Parameters
$$\sum_k W_{i,k}\, v_{i,k} = 0 \qquad \text{(Mean)}$$
$$\sum_k W_{i,k}\, v_{i,k}^2 = \sigma_i^2 \qquad \text{(Variance)}$$
40
3-Element Support Set
vi ,1  3 i
vi ,3  0
vi ,5  3 i
41
Informative Prior, 2 Parameters
$$\sigma_i^2 = W_{i,1}\left(9\sigma_i^2\right) + W_{i,2}\left(0\right) + W_{i,3}\left(9\sigma_i^2\right)$$
$$W_{i,1} = W_{i,3} = \frac{1}{18}, \qquad W_{i,2} = 1 - W_{i,1} - W_{i,3} = \frac{16}{18}$$
42
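A quick Python check that these weights reproduce the two prior moments:

import numpy as np

sigma = 1.0
v = np.array([-3.0, 0.0, 3.0]) * sigma
W = np.array([1 / 18, 16 / 18, 1 / 18])
print(W.sum(), W @ v, W @ v**2)           # 1.0, 0.0 (mean), 1.0 = sigma^2 (variance)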
Informative Prior: 4 Parameters
• Must specify prior for additional statistics
– Skewness and Kurtosis
• Assume symmetric distribution:
– Skewness is zero.
• Specify normal prior:
– Kurtosis is a function of σ.
• Can recover additional information on error
distribution.
43
Informative Prior, 4 Parameters
$$\sum_k W_{i,k}\, v_{i,k} = 0 \qquad \text{(Mean)}$$
$$\sum_k W_{i,k}\, v_{i,k}^2 = \sigma_i^2 \qquad \text{(Variance)}$$
$$\sum_k W_{i,k}\, v_{i,k}^3 = 0 \qquad \text{(Skewness)}$$
$$\sum_k W_{i,k}\, v_{i,k}^4 = 3\sigma_i^4 \qquad \text{(Kurtosis)}$$
44
5-Element Support Set
vi ,1  3.0  i
vi ,2  1.5  i
vi ,3  0
vi ,4  1.5  i
vi ,5  3.0  i
45
Informative Prior, 4 Parameters
Imposing symmetry ($W_{i,5} = W_{i,1}$, $W_{i,4} = W_{i,2}$), the variance and kurtosis conditions become:
$$\sigma_i^2 = W_{i,1}\left(9\sigma_i^2\right) + W_{i,2}\left(2.25\,\sigma_i^2\right) + W_{i,3}\left(0\right) + W_{i,2}\left(2.25\,\sigma_i^2\right) + W_{i,1}\left(9\sigma_i^2\right)$$
$$3\sigma_i^4 = W_{i,1}\left(81\,\sigma_i^4\right) + W_{i,2}\left(\tfrac{81}{16}\,\sigma_i^4\right) + W_{i,3}\left(0\right) + W_{i,2}\left(\tfrac{81}{16}\,\sigma_i^4\right) + W_{i,1}\left(81\,\sigma_i^4\right)$$
with solution
$$W_{i,1} = W_{i,5} = \frac{1}{162}; \qquad W_{i,2} = W_{i,4} = \frac{16}{81}; \qquad W_{i,3} = \frac{48}{81}$$
46
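These weights can be recovered in Python by solving the five moment conditions (mean, variance, skewness, kurtosis, adding-up) as a linear system in W:

import numpy as np

sigma = 1.0
v = np.array([-3.0, -1.5, 0.0, 1.5, 3.0]) * sigma
A = np.vstack([v, v**2, v**3, v**4, np.ones(5)])       # moment conditions, row by row
b = np.array([0.0, sigma**2, 0.0, 3 * sigma**4, 1.0])  # mean, variance, skewness, kurtosis, sum
W = np.linalg.solve(A, b)
print(W)    # approximately [1/162, 16/81, 48/81, 16/81, 1/162]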
Implementation
• Implement the program in GAMS
– A large, difficult estimation problem
– Major advances in solvers: solution is now robust and routine
• The CE minimand is similar to maximum likelihood estimators
• Excel front end for the GAMS program
– Easy to use
47
Implementation
• Data Collection
– Commodity Balance
– Food Balance
• Data Cleaning and Setting Priors
– Crop Production
– Livestock Production
– Commodity Demand and Trade
– Processed Commodities (oilseeds, sugar, etc.)
• Data Estimation with Cross Entropy
– Nationally: Trade = Supply − Demand
– Nationally: Area × Yield = Supply
– Globally: Supply = Demand
• Output: IMPACT 3 FAOSTAT Database
48