Lecture Notes - Andre R. Neveu


EC339: Applied
Econometrics
Introduction
1
What is Econometrics?

Scope of application is large



Literal definition: measurement in economics
Working definition: application of statistical
methods to problems that are of concern to
economists
Econometrics has wide applications—beyond
the scope of economics
2
What is Econometrics?

Econometrics is primarily interested in



Quantifying economic relationships
Testing competing hypotheses
Forecasting
3
Quantifying Economic
Relationships



Outcomes of many policies tied to the magnitude of the slope
of supply and demand curves
Often need to know elasticities before we can begin practical
analysis
For example, if the minimum wage is raised, unemployment
may rise as more workers enter the labor force

However, this depends on the slopes of the labor supply and labor
demand curves


Econometric analysis attempts to determine this answer
Allows us to quantify causal relationships when the luxury of
a formal experiment is not available
4
Testing Competing Hypotheses


Econometrics helps fill the gap between the
theoretical world and the real world
For instance, will a tax cut impact consumer
spending?


Keynesian models relate consumer spending to annual
disposable income, suggesting that a cut in taxes will
change consumer spending
Other theories relate consumer spending to lifetime
income, suggesting a tax cut (especially a “one-shot
deal”) will have little impact on consumer spending
5
Forecasting

Econometrics attempts to provide the
information needed to forecast future values

Such as inflation, unemployment, stock market
levels, etc.
6
The Use of Models

Economists use models to describe real-world
processes

Models are simplified depictions of reality


Usually an equation or set of equations
Economic theories are usually deterministic while
the world is characterized by randomness


Empirical models include a random component known as
the error term, or $\varepsilon_i$
Typically assume that the mean of the error term is zero
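
To make the difference between a deterministic model and an empirical model concrete, here is a minimal sketch. The intercept, slope, sample size, and the choice of a normally distributed error are all assumptions made for illustration, not values from the course.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

a, b = 3.0, 2.0   # hypothetical intercept and slope
n = 10_000        # hypothetical number of observations

x = [random.uniform(0, 10) for _ in range(n)]
errors = [random.gauss(0, 1) for _ in range(n)]  # error term drawn with mean zero

y_theory = [a + b * xi for xi in x]                       # deterministic model
y_observed = [yt + e for yt, e in zip(y_theory, errors)]  # empirical model

mean_error = statistics.fmean(errors)
print(f"average error term: {mean_error:.4f}")  # near zero, but not exactly zero
```

In any one sample the average error is only approximately zero; the assumption is about the error term's mean, not about a particular draw.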
7
Types of Data

Data provide the raw material needed to

Quantify economic relationships
Test competing theories
Construct forecasts

Data can be described as a set of observations such
as income, age, grade

Each occurrence is called an observation
Data are in different formats



Cross-sectional
Time series
Panel data
8
Cross-Sectional Data

Provide information on a variety of entities at
the same point in time
9
Time Series Data

Provides information for the same entity at
different points in time
10
Panel (or Longitudinal) Data

Represents a combination of cross-sectional
and time series data

Provides information on a variety of entities at
different periods in time
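
The three data formats can be sketched as small record lists. Everything here (entity labels "A" and "B", the years, the income numbers, and the helper `shape`) is a made-up placeholder for illustration, not real data.

```python
# All names and numbers below are made-up placeholders, not real data.

# Cross-sectional: several entities, one point in time.
cross_section = [
    {"entity": "A", "year": 2000, "income": 10},
    {"entity": "B", "year": 2000, "income": 20},
]

# Time series: one entity, several points in time.
time_series = [
    {"entity": "A", "year": 2000, "income": 10},
    {"entity": "A", "year": 2001, "income": 11},
]

# Panel: several entities, each observed at several points in time.
panel = [
    {"entity": e, "year": t, "income": base + (t - 2000)}
    for e, base in [("A", 10), ("B", 20)]
    for t in (2000, 2001)
]

def shape(rows):
    """Return (number of distinct entities, number of distinct periods)."""
    return (len({r["entity"] for r in rows}), len({r["year"] for r in rows}))

print(shape(cross_section), shape(time_series), shape(panel))
# (2, 1) (1, 2) (2, 2)
```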
11
Conducting an Empirical Project


How to Write an Empirical Paper
Select a topic


Textbooks, JSTOR, News sources (for ideas),
“pop-econ”
Learn what others have learned about this
topic


Spend time researching what others have done
Conduct extensive literature review
12
Conducting an Empirical Project


Theoretical Foundation
Have an empirical strategy





Existing literature may help
Would apply the methods you learn in this book
Gather data and apply appropriate econometric techniques
Interpret your results
Write it up…

Build like a court case or newspaper article
13
Where to obtain data

How to use DataFerrett
CPS.doc

Files for course will be stored on datastor
\\datastor\courses\economic\ec339

You can download all files from book
http://caleb.wabash.edu/econometrics/index.htm
14
Web Links
Resources for Economists on the Internet are
available at
www.rfe.org
www.freelunch.com
www.bea.gov, www.census.gov, www.bls.gov
15
Math Review
There is much more to it… but these are the
basics you must know
16
Math Review
$y = f(x) = a + bx$
Differentiation expresses the rate at which a quantity, y, changes with respect to the change in another quantity, x, on which it has a functional relationship. Using the symbol Δ to refer to change in a quantity:
$\frac{\Delta y}{\Delta x} = \text{slope} = b$
Linear Relationship (i.e., a straight line) has a specific equation. As x changes, how does y change?
Directly related (x increases, y increases)
Inversely related (x increases, y decreases)
Example: $f(x) = 3 + 2x$
x=0, y=3 or (0,3).
x=2, y=3+2(2) or (2,7)
$\frac{\Delta y}{\Delta x} = \frac{y_1 - y_0}{x_1 - x_0} = \frac{7 - 3}{2 - 0} = \frac{4}{2} = 2 = b$
[Figure: graph of f(x), with y on the vertical axis and x on the horizontal axis]
17
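
The slope calculation for f(x) = 3 + 2x can be checked with a few lines of Python. This is only a sketch; the helper name `slope` is chosen here for illustration.

```python
def f(x):
    """The example line f(x) = 3 + 2x."""
    return 3 + 2 * x

def slope(x0, x1):
    """Change in y divided by change in x between two points on the line."""
    return (f(x1) - f(x0)) / (x1 - x0)

print(slope(0, 2))  # (7 - 3) / (2 - 0) = 2.0
print(slope(5, 9))  # any other pair of points gives the same slope, 2.0
```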
Math Review
$y = f(x) = a + bx$
$\frac{\Delta y}{\Delta x} = \text{slope} = b$
Derivatives are essentially the same thing. Instead of looking at the difference in y as x goes from 0 to 2, if you look at very small intervals, say changing x from 0 to 0.0001, the slope does not change for a straight line.
Example: $f(x) = 3 + 2x$
x=0, y=3 or (0,3).
x=.0001, y=3+2(.0001) or (x,y)=(.0001,3.0002)
$\frac{\Delta y}{\Delta x} = \frac{y_1 - y_0}{x_1 - x_0} = \frac{3.0002 - 3}{0.0001 - 0} = \frac{0.0002}{0.0001} = 2 = b$
The basic rule for derivatives is that the distance between the initial x and new x approaches zero (in what is called the limit).
[Figure: graph of f(x), with y on the vertical axis and x on the horizontal axis]
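
As a rough illustration of the limit idea, the sketch below shrinks the interval and recomputes the slope each time. The helper name `average_slope` and the particular interval sizes are assumptions for this example; the printed values may differ from 2 in the last decimal places because of floating-point rounding.

```python
def f(x):
    """The example line f(x) = 3 + 2x."""
    return 3 + 2 * x

def average_slope(x0, h):
    """Slope of f between x0 and x0 + h."""
    return (f(x0 + h) - f(x0)) / h

# Shrink the interval toward zero; for a straight line the slope stays at 2.
for h in (2, 0.1, 0.0001, 0.000001):
    print(h, average_slope(0, h))
```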
18
Math Review
$y = f(x) = a + bx^c$
Derivatives have a slightly different notation than delta-y/delta-x, namely dy/dx or f'(x). Constants, such as the y-intercept, do not change as x changes, and thus are dropped when taking derivatives.
$\frac{dy}{dx} = f'(x) = c(b)x^{c-1}$
Example: $f(x) = 3 + 2x$
$f'(x) = (1)2x^{1-1} = 2(x^0) = 2$
Derivatives represent the general formula to find the slope of a function when evaluated at a particular point. For straight lines, this value is fixed.
x=0, y=3 or (0,3).
x=.0001, y=3+2(.0001) or (x,y)=(.0001,3.0002)
[Figure: graph of f(x), with y on the vertical axis and x on the horizontal axis]
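
The power rule can be sketched as a small function and compared with a numerical slope. The helper names and the second function, 3 + 2x^2, are assumptions added for illustration; only f(x) = 3 + 2x comes from the slide.

```python
def derivative_power(b, c):
    """Return f'(x) for f(x) = a + b*x**c, using f'(x) = c*b*x**(c-1).

    The constant a does not appear because it drops out of the derivative.
    """
    def f_prime(x):
        return c * b * x ** (c - 1)
    return f_prime

def numeric_slope(f, x, h=1e-6):
    """Central-difference approximation of the slope of f at x."""
    return (f(x + h) - f(x - h)) / (2 * h)

line_prime = derivative_power(b=2, c=1)    # f(x) = 3 + 2x
curve_prime = derivative_power(b=2, c=2)   # hypothetical g(x) = 3 + 2x**2

print(line_prime(0.5), line_prime(10))     # the slope of the line is 2 everywhere
print(curve_prime(1), curve_prime(3))      # the slope of the curve depends on x
print(numeric_slope(lambda x: 3 + 2 * x ** 2, 3))  # close to curve_prime(3)
```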
19


Math Review
$F(x) = \int y\,dx = \int f(x)\,dx = \int (a + bx^c)\,dx$
$F(x) = \int (a + bx^c)\,dx = ax + \frac{b}{c+1}x^{c+1} + C$
Integration (or reverse differentiation) is just the opposite of taking a derivative. You have to remember to add back in C (for constant), since you may not know the “primitive” equation.
Example:
$F(x) = \int (3 + 2x)\,dx = 3x + \frac{2}{1+1}x^2 + C = 3x + x^2 + C$
There are indefinite integrals (over no specified region) and definite integrals (where the region of integration is specified).
$\int_0^{10} (3 + 2x)\,dx = [3x + x^2 + C]_0^{10} = [3(10) + (10)^2] - [3(0) + (0)^2] = 130$
Also, the result of integration should be the function you would HAVE TO TAKE the derivative of to get the initial function.
[Figure: the line y = 3 + 2x from x = 0 (y = 3) to x = 10 (y = 23), with the area under the line shaded]
Area = [3*(10-0)] + [1/2*(10-0)*(23-3)] = 130
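
The definite integral of 3 + 2x from 0 to 10 can be checked two ways: with the antiderivative and with a numerical area calculation. This is a minimal sketch; the function names and the number of steps are choices made for the example.

```python
def F(x, C=0.0):
    """Antiderivative of f(x) = 3 + 2x, i.e. 3x + x**2 + C."""
    return 3 * x + x ** 2 + C

def trapezoid(f, lo, hi, steps=1000):
    """Approximate the area under f between lo and hi with thin trapezoids."""
    width = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        left = lo + i * width
        right = left + width
        total += 0.5 * (f(left) + f(right)) * width
    return total

exact = F(10) - F(0)                      # the constant C cancels out
approx = trapezoid(lambda x: 3 + 2 * x, 0, 10)
print(exact, round(approx, 6))            # both are 130 (up to rounding)
```

Any value of C gives the same definite integral, which is why C matters only for indefinite integrals.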
20
Basic Definitions

Random variable

A function or rule that assigns a real number to
each basic outcome in the sample space




The domain of random variable X is the sample space
The range of X is the real number line
Value changes from trial to trial
Uncertainty prevails in advance of the trial as to
the outcome
21
Case Study
Weight Data
Introductory Statistics class
Spring, 1997
Virginia Commonwealth University
22
Weight Data
192
152
135
110
128
180
260
170
165
150
110
120
185
165
212
119
165
210
186
100
195
170
120
185
175
203
185
123
139
106
180
130
155
220
140
157
150
172
175
133
170
130
101
180
187
148
106
180
127
124
215
125
194
23
Weight Data: Frequency Table
Weight Group    Count
100 - <120      7
120 - <140      12
140 - <160      7
160 - <180      9
180 - <200      12
200 - <220      4
220 - <240      1
240 - <260      0
260 - <280      1
sqrt(53) ≈ 7.3, rounded up to 8 intervals; range (260 - 100 = 160) / 8 = 20 = class width
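
The tally can be reproduced from the 53 weights listed in the data. The sketch below is one way to do it, assuming classes start at 100 and include the left endpoint but not the right; counting this way puts nine values in the 160 - <180 class and the counts sum to 53.

```python
import math

# The 53 weights, in the order they are listed in the data.
weights = [
    192, 152, 135, 110, 128, 180, 260, 170, 165, 150,
    110, 120, 185, 165, 212, 119, 165, 210, 186, 100,
    195, 170, 120, 185, 175, 203, 185, 123, 139, 106,
    180, 130, 155, 220, 140, 157, 150, 172, 175, 133,
    170, 130, 101, 180, 187, 148, 106, 180, 127, 124,
    215, 125, 194,
]

n = len(weights)
num_classes = math.ceil(math.sqrt(n))                 # sqrt(53) is about 7.3, so 8
width = (max(weights) - min(weights)) / num_classes   # (260 - 100) / 8 = 20.0

# The maximum (260) sits exactly on a class boundary, so a ninth class
# (260 - <280) is needed to hold it.
counts = {}
lower = 100
while lower <= max(weights):
    upper = lower + int(width)
    counts[(lower, upper)] = sum(lower <= w < upper for w in weights)
    lower = upper

for (lower, upper), count in counts.items():
    print(f"{lower} - <{upper}: {count}")
```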
24
Weight Data: Histogram
[Figure: histogram of the weight data. Vertical axis: number of students (frequency), 0 to 14. Horizontal axis: weight, 100 to 280 in classes of width 20.]
* Left endpoint is included in the group, right endpoint is not.
25
Numerical Summaries

Center of the data



mean
median
Variation




range
quartiles (interquartile range)
variance
standard deviation
26
Mean or Average


Traditional measure of center
Sum the values and divide by the number
of values
$\bar{x} = \frac{1}{n}(x_1 + x_2 + x_3 + \cdots + x_n) = \frac{1}{n}\sum_{i=1}^{n} x_i$
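
As a small worked instance of the formula, here is a sketch with four illustrative values (2, 4, 6, 8), chosen only to keep the arithmetic easy.

```python
def mean(values):
    """Sum the values and divide by the number of values."""
    return sum(values) / len(values)

print(mean([2, 4, 6, 8]))  # (2 + 4 + 6 + 8) / 4 = 5.0
```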
27
Median (M)





A resistant measure of the data’s center
At least half of the ordered values are less
than or equal to the median value
At least half of the ordered values are
greater than or equal to the median value
If n is odd, the median is the middle ordered value
If n is even, the median is the average of the two
middle ordered values
28
Median (M)
Location of the median: L(M) = (n+1)/2 ,
where n = sample size.
Example: If 25 data values are recorded, the
Median would be the
(25+1)/2 = 13th ordered value.
29
Median

Example 1 data: 2 4 6
Median (M) = 4

Example 2 data: 2 4 6 8
Median = 5 (ave. of 4 and 6)

Example 3 data: 6 2 4
Median ≠ 2
(order the values: 2 4 6 , so Median = 4)
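
The location rule L(M) = (n+1)/2 translates directly into code. This is a minimal sketch of that rule, not a library routine; the function name is chosen for the example.

```python
def median(values):
    """Median via the location rule L(M) = (n + 1) / 2 on the ordered values."""
    ordered = sorted(values)          # the values must be ordered first
    n = len(ordered)
    location = (n + 1) / 2            # 1-based position of the median
    if location.is_integer():         # n is odd: take the middle value
        return ordered[int(location) - 1]
    # n is even: average the two middle ordered values
    below = ordered[int(location) - 1]
    above = ordered[int(location)]
    return (below + above) / 2

print(median([2, 4, 6]))     # 4
print(median([2, 4, 6, 8]))  # 5.0
print(median([6, 2, 4]))     # 4, not 2: ordering comes first
```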
30
Comparing the Mean & Median


The mean and median of data from a
symmetric distribution should be close
together. The actual (true) mean and
median of a symmetric distribution are
exactly the same.
In a skewed distribution, the mean is
farther out in the long tail than is the
median [the mean is ‘pulled’ in the
direction of the possible outlier(s)].
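
The pull of an outlier on the mean can be seen with two tiny data sets. The values are illustrative only; the second set simply replaces the largest value with a much larger one.

```python
import statistics

symmetric = [2, 4, 6]
skewed = [2, 4, 60]   # same data with one value moved far into the right tail

for data in (symmetric, skewed):
    print(data, "mean:", statistics.mean(data), "median:", statistics.median(data))
# The mean moves from 4 to 22, while the median stays at 4.
```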
31
Quartiles




Three numbers which divide the ordered data
into four equal sized groups.
Q1 has 25% of the data below it.
Q2 has 50% of the data below it. (Median)
Q3 has 75% of the data below it.
32
Weight Data: Sorted
L(M)=(53+1)/2=27
L(Q1)=(26+1)/2=13.5
100 101 106 106 110 110 119 120 120 123
124 125 127 128 130 130 133 135 139 140
148 150 150 152 155 157 165 165 165 170
170 170 172 175 175 180 180 180 180 185
185 185 186 187 192 194 195 203 210 212
215 220 260
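
The location rules L(M) and L(Q1) can be applied to the sorted weights as follows. This is a sketch of the slide's rule only (the median is excluded from both halves, which assumes n is odd as it is here); statistical software often uses slightly different quartile conventions.

```python
ordered = [
    100, 101, 106, 106, 110, 110, 119, 120, 120, 123,
    124, 125, 127, 128, 130, 130, 133, 135, 139, 140,
    148, 150, 150, 152, 155, 157, 165, 165, 165, 170,
    170, 170, 172, 175, 175, 180, 180, 180, 180, 185,
    185, 185, 186, 187, 192, 194, 195, 203, 210, 212,
    215, 220, 260,
]

def value_at(values, location):
    """Value at a 1-based location; a .5 location averages the two neighbours."""
    below = values[int(location) - 1]
    if location == int(location):
        return below
    return (below + values[int(location)]) / 2

n = len(ordered)
loc_m = (n + 1) / 2                       # (53 + 1) / 2 = 27
med = value_at(ordered, loc_m)

lower_half = ordered[: int(loc_m) - 1]    # the 26 values below the median
upper_half = ordered[int(loc_m):]         # the 26 values above the median
loc_q = (len(lower_half) + 1) / 2         # (26 + 1) / 2 = 13.5

q1 = value_at(lower_half, loc_q)          # average of the 13th and 14th values
q3 = value_at(upper_half, loc_q)
print(med, q1, q3, q3 - q1)               # median, Q1, Q3, interquartile range
```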
33
Variance and Standard Deviation

Recall that variability exists when some
values are different from (above or below)
the mean.

Each data value has an associated deviation
from the mean:
$x_i - \bar{x}$
34
Deviations



what is a typical deviation from the
mean? (standard deviation)
small values of this typical deviation
indicate small variability in the data
large values of this typical deviation
indicate large variability in the data
35
Variance





Find the mean
Find the deviation of each value from the
mean
Square the deviations
Sum the squared deviations
Divide the sum by n-1
(gives typical squared deviation from mean)
36
Variance Formula
$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$
where $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Remember that you must find the deviations of EACH x, square the deviations, THEN add them up!
37
Standard Deviation Formula
typical deviation from the mean
$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$
[ standard deviation = square root of the variance ]
38
Variance and Standard Deviation
Example from Text
Metabolic rates of 7 men (cal./24hr.) :
1792 1666 1362 1614 1460 1867 1439
$\bar{x} = \frac{1792 + 1666 + 1362 + 1614 + 1460 + 1867 + 1439}{7} = \frac{11{,}200}{7} = 1600$
39
Variance and Standard Deviation
Example
Observations    Deviations              Squared deviations
$x_i$           $x_i - \bar{x}$         $(x_i - \bar{x})^2$
1792            1792 - 1600 = 192       (192)^2 = 36,864
1666            1666 - 1600 = 66        (66)^2 = 4,356
1362            1362 - 1600 = -238      (-238)^2 = 56,644
1614            1614 - 1600 = 14        (14)^2 = 196
1460            1460 - 1600 = -140      (-140)^2 = 19,600
1867            1867 - 1600 = 267       (267)^2 = 71,289
1439            1439 - 1600 = -161      (-161)^2 = 25,921
                sum = 0                 sum = 214,870
Notice the deviations add to zero, so
each deviation must be squared
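
The deviation table can be rebuilt step by step. This sketch follows the first four steps of the variance recipe (mean, deviations, squares, sum) using the seven metabolic rates.

```python
rates = [1792, 1666, 1362, 1614, 1460, 1867, 1439]  # cal. per 24 hr.

mean = sum(rates) / len(rates)            # 11,200 / 7 = 1600.0
deviations = [x - mean for x in rates]    # each value minus the mean
squared = [d ** 2 for d in deviations]    # square EACH deviation first

print(sum(deviations))   # 0.0: the deviations cancel out
print(sum(squared))      # 214870.0: the squared deviations do not
```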
40
Variance versus Standard Deviation
$s^2 = \frac{1}{7-1}(214{,}870) = \frac{1}{6}(214{,}870) = 35{,}811.67$
$s = \sqrt{s^2} = \sqrt{35{,}811.67} = 189.24$
Note: Standard deviation is in the same units as the original data (cal/24 hours) while variance is in those units squared, (cal/24 hours)^2. Thus variance is not easily comparable to the original data.
Spreadsheet check (values in cells B1:B7):
Observation     Value
1               1,792
2               1,666
3               1,362
4               1,614
5               1,460
6               1,867
7               1,439
=sum(B1:B7)     11,200
=stdevp(B1:B7)  175
=stdev(B1:B7)   189
=var(B1:B7)     35,812
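
The same figures can be cross-checked with Python's standard `statistics` module. This is a sketch for comparison only: `variance` and `stdev` divide by n-1 (the sample versions), while `pstdev` divides by n, which is why it gives the smaller value of about 175.

```python
import statistics

rates = [1792, 1666, 1362, 1614, 1460, 1867, 1439]  # cal. per 24 hr.

s2 = statistics.variance(rates)   # sample variance, divides by n - 1
s = statistics.stdev(rates)       # sample standard deviation, sqrt of s2
sp = statistics.pstdev(rates)     # population version, divides by n

print(round(s2, 2))   # in (cal/24 hr) squared
print(round(s, 2))    # in cal/24 hr, the units of the data
print(round(sp, 1))   # smaller than s because it divides by 7 instead of 6
```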
41
Density Curves
Example: here is a histogram
of vocabulary scores of 947
seventh graders.
The smooth curve drawn
over the histogram is a
mathematical model for the
distribution. This is typically
written as f(x), also known
as the PROBABILITY
DENSITY
FUNCTION (PDF)
42
Density Curves
Example: the areas of the
shaded bars in this histogram
represent the proportion of
scores in the observed data
that are less than or equal to
6.0. This proportion is equal
to 0.303. The area
underneath the curve to the
left of a given value is called the
CUMULATIVE DISTRIBUTION FUNCTION
(CDF), denoted F(x)
43
Density Curves
Example: now the area under
the smooth curve to the left of
6.0 is shaded. If the scale is
adjusted so the total area
under the curve is exactly 1,
then this curve is called a
density curve. The proportion
of the area to the left of 6.0 is
now equal to 0.293.
.55
F ( x) 


1 xx
 (
1
e 2
2 x
x
)2
 .293
44
45
46
Density Curves

Always on or above the horizontal axis

Have area exactly 1 underneath curve

Area under the curve and above any range of
values is the proportion of all observations
that fall in that range
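
These properties can be checked numerically for one particular density curve. The sketch below uses a bell-shaped density with mean 0 and standard deviation 1 as an assumed example, and a simple midpoint rule for the area; the function names and step count are choices made for illustration.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Bell-shaped density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def area(f, lo, hi, steps=20_000):
    """Midpoint-rule approximation of the area under f between lo and hi."""
    width = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * width) for i in range(steps)) * width

total = area(normal_pdf, -10, 10)   # essentially the whole curve
share = area(normal_pdf, -1, 1)     # proportion of observations between -1 and 1

print(round(total, 4))   # about 1
print(round(share, 4))   # the proportion falling in that range
```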
47
Density Curves

The median of a density curve is the equal-areas point, the point that divides the area
under the curve in half

The mean of a density curve is the balance
point, at which the curve would balance if
made of solid material
48
Density Curves

The mean and standard deviation computed
from actual observations (data) are denoted
by $\bar{x}$ and s, respectively.

The mean and standard deviation of the
actual distribution represented by the
density curve are denoted by µ (“mu”) and
σ (“sigma”), respectively.
49
Question
Data sets consisting of physical measurements
(heights, weights, lengths of bones, and so on) for
adults of the same species and sex tend to follow
a similar pattern. The pattern is that most
individuals are clumped around the average, with
numbers decreasing the farther values are from
the average in either direction. Describe what
shape a histogram (or density curve) of such
measurements would have.
50
Bell-Shaped Curve:
The Normal Distribution
[Figure: bell-shaped Normal curve with the mean and the standard deviation marked]
51
52
The Normal Distribution
Knowing the mean (µ) and standard deviation (σ)
allows us to make various conclusions about Normal
distributions. Notation: N(µ, σ).
53
54
55
56
68-95-99.7 Rule for
Any Normal Curve



68% of the observations fall within (meaning above
and below) one standard deviation of the mean
95% of the observations fall within two standard
deviations (actually 1.96) of the mean
99.7% of the observations fall within three standard
deviations of the mean
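
The three percentages can be verified from the Normal cumulative distribution function, which Python can compute through `math.erf`. This is a minimal sketch; the helper names are chosen for the example.

```python
import math

def normal_cdf(z):
    """Area under the standard Normal curve to the left of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def within(k):
    """Proportion of observations within k standard deviations of the mean."""
    return normal_cdf(k) - normal_cdf(-k)

for k in (1, 1.96, 2, 3):
    print(k, round(within(k), 4))
# 1 and 3 give roughly 68% and 99.7%; 1.96 gives 95% and 2 gives slightly more.
```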
57
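68-95-99.7 Rule: Approximations for Any Normal Curve
68% of observations lie between µ - σ and µ + σ
95% of observations lie between µ - 2σ and µ + 2σ
99.7% of observations lie between µ - 3σ and µ + 3σ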
58
68-95-99.7 Rule for
Any Normal Curve
59
60