Aggregate Governance Indicators Aart Kraay The World Bank Presentation at World Bank Conference “The Empirics of Governance” May 1-2 2008

Aggregate Governance Indicators
Aart Kraay
The World Bank
Presentation at World Bank Conference
“The Empirics of Governance”
May 1-2 2008
Why Aggregate Indicators?
• synthesize information about governance from a diversity
of viewpoints
– particularly useful for advocacy (look at success of TI-CPI)
• achieve greater country coverage than individual
indicators
– enable comparisons across different sources
• smooth out idiosyncrasies of many individual sources
– better measure of broad concepts of governance
– but at the cost of specificity
• generate explicit measures of the imprecision of
aggregate – and individual – indicators
they can (almost) always be disaggregated!
Plan for Talk
• How to construct and use aggregate indicators
– define what you want to measure
– select sources and combine them
– compute (and use!) margins of error
– discuss only “multisource aggregates”
• Details, details, details.....
– balanced or unbalanced comparisons?
– independence of errors?
– relative or absolute changes?
– complexity?
– scaring the jello?
• Summing up
Defining Topics for Aggregate Indicators
• governance is hard to define sharply
– but ... easy to overstate lack of definitional consensus
around governance (see next slide)
• some areas of governance have particularly clear and
unambiguous definitions:
– “Corruption is the use of public office for private gain”
• lack of definitional agreement (at the margin?) should not
paralyze measurement efforts
– proponents of alternative definitions can feel free to
construct their own indicators
• are the resulting country rankings different?
• what do we learn from the differences?
What do we mean by “Governance” ?
• World Bank (1992): "Governance is the manner in which
power is exercised in the management of a country's
economic and social resources for development"
• World Bank (2007) definition: "...the manner in which public
officials and institutions acquire and exercise the authority to
shape public policy and provide public goods and services"
• WGI Definition (1999): "...the traditions and institutions by
which authority in a country is exercised. This includes the
process by which governments are selected, monitored and
replaced; the capacity of the government to effectively
formulate and implement sound policies; and the respect of
citizens and the state for the institutions that govern economic
and social interactions among them."
Six Dimensions of Governance in the WGI
• The process by which those in authority are selected and
replaced
– VOICE AND ACCOUNTABILITY
– POLITICAL STABILITY & ABSENCE OF
VIOLENCE/TERRORISM
• The capacity of government to formulate and implement
policies
– GOVERNMENT EFFECTIVENESS
– REGULATORY QUALITY
• The respect of citizens and state for institutions that
govern interactions among them
– RULE OF LAW
– CONTROL OF CORRUPTION
Selecting Individual Indicators For Use As
Ingredients of Aggregate Indicators
• view individual indicators as imperfect or noisy proxies for
broader concepts of governance, e.g.:
– control of corruption: proxies include:
• is corruption widespread?
• percent of contract value demanded in bribes?
• risk that value of FDI adversely affected by bribes?
• Crucial observation: proxies do not need to be
perfect to be useful!
Indicator = Signal + Noise
– ideally want indicators with low Noise/Signal... but
Noise/Signal=0 is unattainable
– as long as Noise/Signal is finite (i.e. the indicator carries some
signal), it is a useful ingredient for an aggregate indicator
– aggregation can be used to downweight indicators with
high Noise/Signal (to come....)
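The Signal + Noise logic above can be illustrated with a small simulation. This is a minimal sketch, not the WGI procedure: the noise levels, the number of countries, and the helper names (`simulate_sources`, `combine`, `rmse`) are all hypothetical, and the errors are assumed independent across sources.

```python
import random
import statistics


def simulate_sources(noise_sds, num_countries=2000, seed=0):
    """Draw a 'true' governance score g per country and one noisy proxy per source."""
    rng = random.Random(seed)
    g = [rng.gauss(0.0, 1.0) for _ in range(num_countries)]
    # Each source k reports y_k = g + e_k, with independent noise of sd noise_sds[k].
    sources = [[gj + rng.gauss(0.0, sd) for gj in g] for sd in noise_sds]
    return g, sources


def combine(sources, weights):
    """Weighted average of the sources, country by country (weights are normalized)."""
    total = sum(weights)
    return [sum(w * y for w, y in zip(weights, ys)) / total for ys in zip(*sources)]


def rmse(estimate, truth):
    """Root mean squared distance between an estimate and the true signal."""
    return statistics.fmean((e - t) ** 2 for e, t in zip(estimate, truth)) ** 0.5


noise_sds = [0.5, 1.0, 2.0]  # hypothetical noise levels, from precise to very noisy
g, sources = simulate_sources(noise_sds)
equal = combine(sources, [1.0, 1.0, 1.0])
downweighted = combine(sources, [1 / sd ** 2 for sd in noise_sds])

print("best single source:", round(rmse(sources[0], g), 3))
print("equal weights     :", round(rmse(equal, g), 3))
print("inverse-variance  :", round(rmse(downweighted, g), 3))
```

Under these assumptions the expected errors are roughly 0.50 for the best single source, 0.76 for the equal-weighted average, and 0.44 for the inverse-variance average. Even the noisiest proxy helps once it is downweighted, while a naive average can be worse than the best single source.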
More Examples of Ingredients for Aggregate
Governance Indicators
• Rule of Law (WGI)
– enforceability of private contracts (DRI)
– fairness/speed of judicial process (EIU)
– confidence in police (GWP)
– property rights over rural land (IFD)
– many more....
• Sustainable Economic Opportunity (Ibrahim Index of
African Governance)
– GDP/Capita, Growth, Inflation, Budget Deficit
– Days to start a business
– Contract-intensive money
– Road density, computer and internet density
Placing Indicators in Common Units
• Trivial to rescale data to 0-1 scale
• More subtle issue: how do we compare a 7/10 score in a
source that covers mostly developed countries with a 7/10
score in a source that covers mostly developing
countries?
– Option 1: percentile matching (TI-CPI)
• Source 1: A > B > C
• Source 2: C > D
• Aggregate: A > B > C > D
– Option 2: elaboration on unobserved components
model (WGI). Details in KKZ (1999).
• A useful byproduct of aggregate indicators is that they allow
comparisons based on dissimilar sources (in the example
above you can now compare country A and country D)
Weight A Minute – All Aggregate Indicators
Require Decisions on Weights!
• Option 1: Arbitrarily assign weights
– equal weights (e.g. TI-CPI, most others)
– different weights based on views of what matters more
(e.g. Ibrahim Index of African Governance)
– decision to exclude a source implies setting a zero weight
(e.g. TI-CPI excludes all household surveys)
• Option 2: Let the data choose the weights (logic of
unobserved components model underlying WGI)
y1=g+e1
y2=g+e2
y3=g+e3
– if CORR(y1,y2)>>CORR(y1,y3), y1 and y2 are more
informative about g (if errors are independent)
• Option 3: Regression-based weights to capture importance of
each indicator for outcomes
– not widely (ever?) used
– in principle is appealing, in practice virtually impossible
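Option 2 can be sketched for the three-source case y1, y2, y3 above. This is a simplified method-of-moments illustration, not the estimation actually used for the WGI: it assumes errors independent of g and of each other, g scaled to mean 0 and variance 1, and sources already expressed in the units of g. The function names are hypothetical.

```python
def noise_to_signal_from_correlations(r12, r13, r23):
    """Back out VAR(e_k)/VAR(g) for three sources from their pairwise correlations.

    With y_k = g + e_k and independent errors,
    CORR(y_i, y_j) = sqrt(rel_i * rel_j), where rel_k = VAR(g) / VAR(y_k).
    """
    reliabilities = (r12 * r13 / r23, r12 * r23 / r13, r13 * r23 / r12)
    return [1.0 / rel - 1.0 for rel in reliabilities]


def precision_weights(noise_to_signal):
    """Weights on each source in E[g | y], plus the variance of g given y.

    Standard normal-normal updating with prior g ~ N(0, 1).
    """
    precisions = [1.0 / ns for ns in noise_to_signal]
    denom = 1.0 + sum(precisions)
    weights = [p / denom for p in precisions]
    return weights, 1.0 / denom


# Hypothetical correlations: sources 1 and 2 agree closely, source 3 less so.
ratios = noise_to_signal_from_correlations(0.8, 0.5, 0.5)
weights, variance = precision_weights(ratios)
print("noise/signal ratios:", [round(r, 3) for r in ratios])
print("weights            :", [round(w, 3) for w in weights])
print("variance of g | y  :", round(variance, 3))
```

Sources that correlate more strongly with the others are inferred to be less noisy and receive larger weights, which is the sense in which the data choose the weights.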
Does Weighting Matter?
• Depends crucially on the extent to which the underlying
data sources are correlated with each other
– if correlations are high, weighting matters little
– if correlations are low, weighting matters a lot
• Example: Two robustness checks on WGI weighting
scheme for Control of Corruption
– Option 1: equally-weighted
– Option 2: aggregate 4 types of sources (commercial,
NGO, public sector, and surveys)
– Very highly correlated with baseline WGI indicator
• Option 1: correlation with baseline = 0.998
• Option 2: correlation with baseline = 0.959
– conclude that “ingredients” of WGI-CC are quite highly
correlated – so details of weighting don’t matter much
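The link between source correlations and the importance of weights can be checked in a simulation. This is a minimal sketch on made-up data, not a replication of the WGI robustness checks: the six sources, the noise levels, and the "lopsided" weighting scheme are hypothetical.

```python
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def agreement_between_weightings(noise_sd, weights_a, weights_b,
                                 num_countries=3000, seed=1):
    """Correlation between two aggregates built from the same noisy sources."""
    rng = random.Random(seed)
    agg_a, agg_b = [], []
    for _ in range(num_countries):
        g = rng.gauss(0.0, 1.0)
        # One score per source: common signal plus independent noise.
        ys = [g + rng.gauss(0.0, noise_sd) for _ in weights_a]
        agg_a.append(sum(w * y for w, y in zip(weights_a, ys)))
        agg_b.append(sum(w * y for w, y in zip(weights_b, ys)))
    return pearson(agg_a, agg_b)


equal = [1 / 6] * 6
lopsided = [0.5] + [0.1] * 5  # half the weight on a single source

tight = agreement_between_weightings(0.3, equal, lopsided)  # highly correlated sources
loose = agreement_between_weightings(2.0, equal, lopsided)  # weakly correlated sources
print("low-noise sources :", round(tight, 3))
print("high-noise sources:", round(loose, 3))
```

In expectation the two aggregates correlate at about 0.99 when the sources are highly correlated and about 0.87 when they are not, which matches the pattern described on the slide.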
Spot the Difference: Alternative Aggregations of WGI Control of Corruption Indicators Using Different Weights
[Figure: scatter plot of alternate WGI-CC (vertical axis) against baseline WGI-CC (horizontal axis), both from -2.5 to 2.5, with two series: Equal Weights and Equal Weights By Type.]
Margins of Error
• margins of error summarize the degree of disagreement
across sources in their assessment of governance
• two ways to construct them:
– standard deviation across sources
– estimate based on a structural statistical model (e.g.
WGI uses unobserved components model)
• precision-weighting of sources in WGI (modestly)
reduces margins of error
• aggregation reduces measurement error about broad
concepts (smooths out idiosyncrasies of individual
sources)
• essential to use them to assess significance of cross-country differences or changes over time
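The first way of constructing margins of error, and their use in comparisons, can be sketched as follows. The scores are made up, the standard error is taken as the standard deviation across sources divided by the square root of the number of sources (an assumption of this sketch), and "significant" is read here as non-overlapping confidence intervals, which is one simple and conservative rule rather than the only possible test.

```python
from statistics import NormalDist, fmean, stdev


def mean_and_standard_error(source_scores):
    """Aggregate = mean across sources; standard error = sd across sources / sqrt(K)."""
    k = len(source_scores)
    return fmean(source_scores), stdev(source_scores) / k ** 0.5


def confidence_interval(estimate, standard_error, level=0.90):
    """Symmetric normal interval; z is about 1.645 for a 90% interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate - z * standard_error, estimate + z * standard_error


def significantly_different(a, b, level=0.90):
    """True when the two intervals do not overlap; a and b are (estimate, se) pairs."""
    lo_a, hi_a = confidence_interval(a[0], a[1], level)
    lo_b, hi_b = confidence_interval(b[0], b[1], level)
    return hi_a < lo_b or hi_b < lo_a


# Hypothetical rescaled scores from several sources for two countries.
country_x = mean_and_standard_error([0.9, 1.1, 1.0, 1.2])
country_y = mean_and_standard_error([0.1, -0.2, 0.3, 0.0])
print("X:", country_x, "Y:", country_y)
print("significantly different:", significantly_different(country_x, country_y))
```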
Margins of Error Decline With the Number (and Quality) of Data Sources
[Figure: Control of Corruption. Standard error of governance estimate (vertical axis, 0 to 0.8) against number of data sources (horizontal axis, 0 to 20), with series for 1996 and 2006.]
Control of Corruption, Selected Countries, 2006
[Figure: governance level with margins of error for selected countries, on a scale from -2.5 (poor governance) to 2.5 (good governance). Countries shown: Finland, Iceland, Denmark, New Zealand, Singapore, Japan, Chile, United States, Slovenia, Estonia, Botswana, Uruguay, South Africa, Hungary, Greece, Costa Rica, Slovakia, Italy, Brazil, Mexico, Georgia, China, Cameroon, Kenya, Paraguay, Sudan, Haiti, Cambodia, Equatorial Guinea, Myanmar, Somalia.]
DISCLAIMER: The data and research reported here do not reflect the official views of the World Bank, its Executive Directors, or the countries they represent.
The WGI are not used by the World Bank Group to allocate resources or for any other official purpose.
Source for data: ‘Governance Matters VI: Governance Indicators for 1996-2006’, by D. Kaufmann, A. Kraay and M. Mastruzzi, June 2007,
www.govindicators.org. Colors are assigned according to the following criteria: Dark Red: country is in the bottom 10th percentile rank (‘governance crisis’);
Light Red: between 10th and 25th percentile rank; Orange: between 25th and 50th percentile rank; Yellow: between 50th and 75th percentile rank; Light Green: between 75th and
90th percentile rank; and Dark Green: between 90th and 100th percentile rank (‘exemplary governance’). Estimates subject to margins of error.
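Changes in Control of Corruption, 1996-2004
[Figure: scatter plot of 2006 estimates (vertical axis) against 1996 estimates (horizontal axis), both from -3 to 3. Labeled countries include ARE, QAT, EST, ISR, CYP, LVA, BGR, TZA, TTO, PNG, BGD, ZWE and CIV.]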
Margins of Error: A Little Perspective
• do not confuse absence of explicit margins of error with
absence of measurement error – present in all
governance indicators
• margins of error are not unique to subjective- or
perceptions-based aggregate indicators
– can infer them based on inter-correlations of any type
of indicator
• keep the baby, ditch the bathwater!
– 2/3 of pairwise comparisons on WGI are significant (at
90% level)
– 1/3 of countries show a significant (at the 90% level)
change in at least one of the six WGI between 1996
and 2006
Details 1: Don’t Lose Your Balance!
• comparisons of aggregate indicators across countries and
over time are often “unbalanced” – different set of sources
underlying the two comparators
• the alternative (strictly balanced) is far too restrictive
– balanced WGI-CC based on top five sources would
cover just 117 countries, not 207
– much less diverse set of sources as well
• “unbalancedness” is not as bad as you think!
– 60% of pairwise comparisons in WGI involve 5 or more
common sources
– just 7% of variation in large changes due to changes in
composition of sources
– can always go back to the source data!
Details 2: I Think You Think I Think You Think I
Think You Think Bangladesh is Corrupt
• Correlated perception errors are potentially an important
issue, as they could:
– reduce the information content of aggregate indicators
– distort weighting scheme
• First-order issue: single- versus multiple-source
aggregate indicators
– single-source aggregates average responses of the
same experts to many questions (CPIA, GII, DB, etc)
• almost by definition have strongly correlated
perception errors across components
– multiple-source aggregates are less subject to this
problem
• unless perception errors perfectly correlated, still
can get efficiency gains from aggregation
Evidence on Correlated Perception Errors?
• easy to assert, but hard to test
y1=g+e1
y2=g+e2
– all we observe is CORR(y1, y2) – is it because:
• CORR(e1, e2) is high?
• VAR(e1) and VAR(e2) are small?
– need an identification strategy (not storytelling)
• Example: expert assessments more likely to make
correlated perception errors than survey respondents
– are expert assessments more correlated with each other
than with surveys? Not necessarily
• average pairwise correlation of 5 expert assessments
of corruption = 0.80
• average correlation of each with a firm survey = 0.82
– correlations among expert assessments don’t increase
over time
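The identification problem on this slide can be written out explicitly. The following is a short derivation under the assumption (not stated on the slide) that the errors e1 and e2 are uncorrelated with g:

```latex
\begin{align*}
\operatorname{CORR}(y_1, y_2)
  &= \frac{\operatorname{COV}(g + e_1,\; g + e_2)}
          {\sqrt{\operatorname{VAR}(y_1)\,\operatorname{VAR}(y_2)}} \\
  &= \frac{\operatorname{VAR}(g) + \operatorname{COV}(e_1, e_2)}
          {\sqrt{\bigl(\operatorname{VAR}(g) + \operatorname{VAR}(e_1)\bigr)
                 \bigl(\operatorname{VAR}(g) + \operatorname{VAR}(e_2)\bigr)}}
\end{align*}
```

A high observed correlation is therefore consistent both with a large COV(e1, e2) and with small VAR(e1) and VAR(e2). The single number CORR(y1, y2) cannot separate the two without further assumptions.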
Belarus: Assessments of Corruption
[Figure: assessments of corruption in Belarus, 2002 to 2006, on a 0 to 1 scale, from BEEPS, Global Insight/DRI, Global Insight/WMO, Political Risk Services, Merchant International Group, and Freedom House.]
Details 3: Everything is Relative .... Or Is It?
• All indicators require choice of units
– 0-10 (TI-CPI), 1-6 (CPIA), A-D (PEFA)
– WGI has particularly nerdy choice (are you surprised?):
• standard normal distribution
• forces world average = 0 in each period
– most other indicators also implicitly make choices
about averages (e.g. CPIA grade inflation)
• Does this confuse relative and absolute changes?
Δy(j) = Δ(y(j)-average) + Δaverage
– absolute changes and relative changes coincide if no
changes in the world average
– look for evidence in individual sources whether world
averages change – answer is a resounding “no”!
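The decomposition Δy(j) = Δ(y(j)-average) + Δaverage can be illustrated with a few lines of code. The three countries and their scores are made up, and `decompose_change` is a hypothetical helper name.

```python
def decompose_change(scores_before, scores_after, country):
    """Split a country's change into a relative part and the shift in the average."""
    avg_before = sum(scores_before.values()) / len(scores_before)
    avg_after = sum(scores_after.values()) / len(scores_after)
    absolute = scores_after[country] - scores_before[country]
    relative = (scores_after[country] - avg_after) - (scores_before[country] - avg_before)
    average_shift = avg_after - avg_before
    return absolute, relative, average_shift


# Hypothetical scores on a 0-1 scale in two periods.
before = {"A": 0.2, "B": 0.5, "C": 0.8}
after = {"A": 0.4, "B": 0.5, "C": 0.9}

absolute, relative, shift = decompose_change(before, after, "A")
print("absolute change:", round(absolute, 3))
print("relative change:", round(relative, 3))
print("average shift  :", round(shift, 3))
```

When the average does not move, the last term is zero and the relative and absolute changes coincide, which is the case the slide argues is supported by the individual sources.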
Details 4: It’s Just Too Haaaaard.....!
[Slide graphic: a dense collage of formulas from the WGI methodology, not recoverable from this transcript. Legible fragments refer to the governance estimate g(j,t), the observed scores y(j,t), expectation and variance terms such as V[g_j | y_j], source weights w_kt, and sums over subsets of sources (K_A, K_C, K_D).]
• common critique is that aggregate indicators are too
complicated and non-transparent
• same is true for all kinds of things (national income
accounts, PPP adjustments, poverty measures, the NFL
draft, the engine under the hood of my car)
• better to be complicated (and a bit closer to right) than
naive (and a bit further from right)
Details 5: Scaring the Plants (and the Jello)
• how big are the risks of measurement ahead of theory?
– risk of inaction until we all agree on a theory is worse
• how do we verify indicators (and ingredients of indicators)?
– are different indicators of a core dimension of
governance correlated?
• exactly what unobserved components model does
– are they uncorrelated with other core dimensions of
governance?
• pretty much a hopeless question since core
dimensions of governance are correlated
• more or less “free entry” in the market for indicators
– more interesting to show the quantitative relevance of
critiques than to simply speculate
Summing Up
• aggregate indicators can (for some purposes) serve as a
useful summary of large numbers of indicators
– but no reason to be wedded to any particular
aggregate
• we learn a lot from cross-referencing alternative related
indicators as part of process of building aggregates
– why are they similar, why are there outliers?
– formally can construct margins of error
• crucial for policy dialogue (and sensible use)
• lots of potential for argument over the “nitty-gritty details”
– less clear that these are first-order concerns
Bottom Line
• differences between:
– alternative aggregates,
– aggregate versus individual,
– subjective vs objective
– ‘actionable’ versus ‘whatever the antonym is’
are minor compared to difference between having data and
not having it at all