Midterm Review - UCSB Economics

Midterm Review
Econ 240A
• Descriptive Statistics
• Probability
• Inference
• Differences between populations
• Regression

I. Descriptive Statistics
• Telling stories with tables and graphs
  ◦ That are self-explanatory and esthetically appealing
• Exploratory Data Analysis for random variables that are not normally distributed
  ◦ Stem and Leaf diagrams
  ◦ Box and Whisker plots

Stem and Leaf Diagram
• Example: Problem 2.24
• Prices in thousands of $ of houses sold in a Los Angeles suburb in a given year

Subsample: Prices (Problem 2.24: prices in thousands of $, houses sold in a Los Angeles suburb)
289, 208, 255, 215, 270, 222, 206, 221, 210, 224, 209, 250, 222, 213, 220, 250, 209

Sorted Data: Prices (Problem 2.24: prices in thousands of $, houses sold in a Los Angeles suburb)
192, 195, 198, 200, 202, 205, 206, 206, 208, 208, 209, 209, 209, 209, 209, 210, 211

Summary Statistics: Prices (Problem 2.24: prices in thousands of $, houses sold in a Los Angeles suburb)
Mean                  237.9882
Standard Error        3.314365
Median                230
Mode                  222
Standard Deviation    30.55693
Sample Variance       933.7261
Kurtosis              1.620493
Skewness              1.164885
Range                 149
Minimum               192
Maximum               341
Sum                   20229
Count                 85

Stem & Leaf Display (Problem 2.24: prices in thousands of $, houses sold in a Los Angeles suburb)
Stems  Leaves
19 -> 258
20 -> 025668899999
21 -> 01233457789
22 -> 0012222223346699
23 -> 00336
24 -> 012244467788
25 -> 00002255
26 -> 0569
27 -> 00235689
28 -> 69
29 ->
30 -> 68
31 ->
32 ->
33 ->
34 -> 01

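A stem-and-leaf display splits each price into a stem (hundreds and tens) and a leaf (units). The Python sketch below is illustrative only: the function name `stem_and_leaf` is an assumption, and the data are the seventeen prices listed on the Sorted Data slide, not the full sample of 85.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return {stem: string of leaves}, using the last digit as the leaf."""
    rows = defaultdict(str)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)   # e.g. 208 -> stem 20, leaf 8
        rows[stem] += str(leaf)
    return dict(rows)

# Seventeen prices from the Sorted Data slide (thousands of $)
prices = [192, 195, 198, 200, 202, 205, 206, 206, 208,
          208, 209, 209, 209, 209, 209, 210, 211]

display = stem_and_leaf(prices)
for stem in sorted(display):
    print(f"{stem} -> {display[stem]}")
```

The rows for stems 19 and 20 agree with the first two rows of the display for the full sample; stem 21 is incomplete here because the listing stops at 211.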
Box and Whiskers Plots
• Example: Problem 4.30
• Starting salaries by degree

Subsample: starting salaries by degree (Problem 4.50)
BA      BSc     BBA     Other
26819   28930   38968   34550
25797   36602   35187   30245
29115   35098   29452   31520
32877   36793   30943   26680
30015   36171   31610   29047
25090   28396   39738   35037
23163   26204   37444   26550
28225   37280   38403   35704
25103   37660   36459   32262
29742   24539   37963   34206
24587   27222   34138   26917
20780   39536   42062   26723
30353   32653   32700   36297

Box plot summaries: starting salaries by degree
            BA        BSc        BBA      Other
Smallest    18719     23451      23401    21994
Q1          25730     29927      31316    28253.5
Median      27765     33396.5    34284    29950.5
Q3          29835.5   36745.25   39551    32905.25
Largest     37025     40105      47639    38812
IQR         4105.5    6818.25    8235     4651.75
Outliers: BA: 37025, 36345, 18719; BSc, BBA, Other: none listed
[Figure: box plots for BA, BSc, BBA and Other, each on an axis from 0 to 50000]

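The outliers flagged for the BA group are consistent with the common 1.5 × IQR fence rule. The slides do not say which rule the plotting software applied, so the rule, the multiplier `k` and the function name `tukey_fences` in this Python sketch are assumptions.

```python
def tukey_fences(q1, q3, k=1.5):
    """Lower and upper fences; points outside them are flagged as outliers."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Quartiles reported for the BA group
lower, upper = tukey_fences(25730, 29835.5)

# The three BA values the slide lists as outliers
flagged = [x for x in (18719, 36345, 37025) if x < lower or x > upper]
print(lower, upper, flagged)
```

Under the same assumed rule, the reported smallest and largest values for BSc, BBA and Other fall inside their fences, which matches the empty outlier lists for those groups.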
II. Probability
• Concepts
  ◦ Elementary outcomes
  ◦ Bernoulli trials
  ◦ Random experiments
  ◦ Events

Probability (Cont.)
• Rules or axioms:
  ◦ Addition rule: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
  ◦ Conditional probability: $P(A \mid B) = P(A \cap B)/P(B)$
  ◦ Independence

Probability (Cont.)
• Conditional probability: $P(A \mid B) = P(A \cap B)/P(B)$
• Independence: $P(A)\,P(B) = P(A \cap B)$
  ◦ So $P(A \mid B) = P(A)$

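The addition rule, conditional probability and the independence condition can be checked on a small example. The Python sketch below uses one roll of a fair die; the events `A` and `B` are hypothetical and chosen only so that they turn out to be independent.

```python
from fractions import Fraction

outcomes = set(range(1, 7))   # one roll of a fair die
A = {2, 4, 6}                 # hypothetical event: an even number
B = {1, 2, 3, 4}              # hypothetical event: four or less

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(outcomes))

p_union = prob(A) + prob(B) - prob(A & B)        # addition rule
p_a_given_b = prob(A & B) / prob(B)              # conditional probability
independent = prob(A) * prob(B) == prob(A & B)   # independence condition
print(p_union, p_a_given_b, independent)
```

Because the two events are independent, the conditional probability of A given B equals the unconditional probability of A.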
Probability (Cont.)
• Discrete Binomial Distribution
  ◦ $P(k) = C_n(k)\, p^k (1-p)^{n-k}$
  ◦ $n$ repeated independent Bernoulli trials
  ◦ $k$ successes and $n-k$ failures

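The binomial formula can be evaluated directly. In the Python sketch below the function name `binomial_pmf` is an assumption, and n = 50 with p = 0.5 are chosen to match the fifty-state example used elsewhere in this review.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(k) = C(n, k) * p**k * (1 - p)**(n - k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 50, 0.5
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

# The probabilities over k = 0..n sum to one; the peak is at k = 25
print(sum(pmf), max(range(n + 1), key=lambda k: pmf[k]))
```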
Binomial Random Number Generator
• Take 50 states
• Suppose each state was a battleground state, with probability 0.5 of winning that state
• What would the distribution of states look like?
  ◦ How few could you win?
  ◦ How many could you win?

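One way to explore these questions is to simulate the fifty Bernoulli trials many times. This Python sketch is not the generator used for the slides; the function name `states_won`, the seed and the number of repetitions are all assumptions.

```python
import random

def states_won(n_states=50, p=0.5, rng=random):
    """Count successes in n_states independent Bernoulli(p) trials."""
    return sum(rng.random() < p for _ in range(n_states))

rng = random.Random(240)   # fixed seed so the run is repeatable
draws = [states_won(rng=rng) for _ in range(1000)]

# Fewest states won, most states won, and the average across simulations
print(min(draws), max(draws), sum(draws) / len(draws))
```

The minimum and maximum of `draws` give a rough empirical answer to how few and how many states could be won.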
Subsample (states won)
24, 24, 28, 25, 18, 29, 25, 24, 24, 23, 25, 24, 29

[Figure: Histogram of States Won; x-axis: Bin (18 to 36), y-axis: Frequency (0 to 10)]

[Figure: Discrete Probability Density, p=0.5; x-axis: States Won (15 to 40), y-axis: Probability (0 to 0.12)]

[Figure: Discrete Cumulative Distribution, p=0.5; x-axis: States Won (0 to 40), y-axis: Probability (0 to 1.2)]

[Figure: Discrete Cumulative Distribution for p=0.5 and p=0.48; x-axis: States Won (0 to 40), y-axis: Probability (0 to 1.2)]

Probability (Cont.)
• Continuous normal distribution as an approximation to the binomial
  ◦ $np > 5$, $n(1-p) > 5$
  ◦ $f(z) = (1/2\pi)^{1/2} \exp[-\tfrac{1}{2} z^2]$
  ◦ $z = (x - \mu)/\sigma$
  ◦ $f(x) = (1/\sigma)(1/2\pi)^{1/2} \exp[-\tfrac{1}{2}\{(x - \mu)/\sigma\}^2]$

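The quality of the normal approximation can be checked numerically. The Python sketch below compares an exact binomial cumulative probability with its normal counterpart for n = 50 and p = 0.5; the cut-off of 20 states and the 0.5 continuity correction are assumptions added for illustration and do not come from the slides.

```python
from math import comb, erf, sqrt

def binom_cdf(k, n, p):
    """Exact P(X <= k) for a binomial(n, p) variable."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k + 1))

def normal_cdf(x, mu, sigma):
    """P(X <= x) for a normal(mu, sigma) variable, via the error function."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

n, p = 50, 0.5
mu, sigma = n * p, sqrt(n * p * (1 - p))   # mean np, variance np(1-p)

exact = binom_cdf(20, n, p)
approx = normal_cdf(20.5, mu, sigma)       # 0.5 is the continuity correction
print(round(exact, 4), round(approx, 4))
```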
III. Inference
• Rates and Proportions
• Population Means and Sample Means
• Population Variances and Sample Variances
• Decision Theory

Decision Theory
• In inference, i.e. hypothesis testing and confidence interval estimation, we can make mistakes because we are making guesses about unknown parameters
• The objective is to minimize the expected cost of making errors
• $E(C) = \alpha\,C(I) + \beta\,C(II)$, where $\alpha$ and $\beta$ are the probabilities of Type I and Type II errors and $C(I)$, $C(II)$ are their costs

Sample Proportions from Polls
$\hat{p} = k/n$
where $n$ is the sample size and $k$ is the number of successes
$k \sim B(np,\; np(1-p))$

Sample Proportions
$E\hat{p} = (1/n)\,Ek = (1/n)\,np = p$
$\mathrm{Var}\,\hat{p} = (1/n^2)\,\mathrm{Var}\,k = (1/n^2)\,np(1-p) = p(1-p)/n$
$\hat{p} \sim N(p,\; (1/n)\,p(1-p))$
So the estimate $\hat{p}$ is approximately normal for large sample sizes

Sample Proportions
• Where the sample size is large

Problem 9.38
• A commercial for a household appliances manufacturer claims that less than 5% of all of its products require a service call in the first year. A consumer protection association wants to check the claim by surveying 400 households that recently purchased one of the company's appliances.

Problem 9.38 (Cont.)
• What is the probability that more than 10% require a service call in the first year?
• What would you say about the commercial's honesty if in a random sample of 400 households, 10% report at least one service call?

Problem 9.38 Answer
• Null Hypothesis: $H_0: p = 0.05$
• Alternative Hypothesis: $H_A: p > 0.05$
• Statistic: $z = (\hat{p} - E\hat{p})/\sigma_{\hat{p}} = (\hat{p} - p)/\sqrt{(1/n)\,p(1-p)}$, with $p$ set to its value under $H_0$
$z = (0.10 - 0.05)/\sqrt{(1/400)(0.05)(0.95)}$
$z = 4.59$

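The arithmetic for this test statistic can be reproduced in a few lines of Python. The function name `proportion_z` is an assumption; the standard error uses the null value p = 0.05, as in the calculation in the notes.

```python
from math import sqrt

def proportion_z(p_hat, p0, n):
    """z statistic for a sample proportion under the null value p0."""
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

z = proportion_z(0.10, 0.05, 400)
print(round(z, 2))   # 4.59
```

Since 4.59 lies far above the one-sided 5% critical value of 1.645, a sample proportion of 10% is very unlikely if the claimed 5% were true.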
[Figure: Continuous density of the standardized normal variate Z; the 5% critical value z = 1.645 and the test statistic z = 4.59 are marked]

Sample means and population means where the population variance is known

Problem 9.26, Sample Means
• The dean of a business school claims that the average MBA graduate is offered a starting salary of $55,000. The standard deviation of the offers is $4600. What is the probability that in a sample of 38 MBA graduates, the mean starting salary is less than $53,000?

Problem 9.26 (Cont.)
• Null Hypothesis: $H_0: \mu = 55{,}000$
• Alternative Hypothesis: $H_A: \mu < 55{,}000$
• Statistic: $z = (\bar{x} - E\bar{x})/\sigma_{\bar{x}} = (\bar{x} - \mu)/(\sigma/\sqrt{n})$
$z = (53000 - 55000)/(4600/\sqrt{38})$
$z = -2000/746.2 = -2.68$

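The z statistic and the corresponding lower-tail probability can be checked with the standard library. This is a minimal Python sketch; the helper `normal_cdf` is an assumption and uses the error function rather than a printed table.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma, n, x_bar = 55_000, 4_600, 38, 53_000
z = (x_bar - mu) / (sigma / sqrt(n))   # sample mean minus claimed mean
p_value = normal_cdf(z)                # P(sample mean < 53,000)
print(round(z, 2), round(p_value, 4))
```

The statistic is negative because the sample mean in question lies below the claimed population mean.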
[Figure: Continuous density of the standardized normal variate Z; the 1% critical value Zcrit = -2.33 and the test statistic z = -2.68 are marked, with a lower-tail area of 0.0037 (0.37%)]

Sample means and population means when the population variance is unknown

Problems 12.33
• A federal agency responsible for enforcing laws governing weights and measures routinely inspects packages to determine whether the weight of the contents is at least as great as that advertised on the package. A random sample of 18 containers whose packaging states that the contents weigh 8 ounces was drawn.

Problems 12.33 (Cont.)
• Can we conclude that on average the containers are mislabeled? Use $\alpha = 0.1$.
$t = (\bar{x} - E\bar{x})/s_{\bar{x}} = (\bar{x} - \mu)/(s/\sqrt{n})$

[Figure: Density function for Student's t-distribution, 17 degrees of freedom; the 5% critical value (1.74) is marked]

Problems 12.33 (Cont.)
Contents' weights (ounces), n = 18:
7.80, 7.97, 7.92, 7.91, 7.95, 7.87, 7.93, 7.79, 7.92, 7.99, 8.06, 7.98, 7.94, 7.82, 8.05, 7.75, 7.89, 7.91

Summary statistics for the 18 container weights
Mean                  7.913888889
Standard Error        0.019969567
Median                7.92
Mode                  7.91
Standard Deviation    0.084723695
Sample Variance       0.007178105
Kurtosis              -0.24366084
Skewness              -0.22739254
Range                 0.31
Minimum               7.75
Maximum               8.06
Sum                   142.45
Count                 18

Problems 12.33 (Cont.)
• Can we conclude that on average the containers are mislabeled? Use $\alpha = 0.1$.
$t = (\bar{x} - E\bar{x})/s_{\bar{x}} = (\bar{x} - \mu)/(s/\sqrt{n})$
$t = (7.914 - 8)/(0.0847/\sqrt{18}) = -0.086/0.020$
$t = -4.3$

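The t statistic can be recomputed from the eighteen weights listed in these notes. This Python sketch uses only the standard library; `statistics.stdev` divides by n - 1, matching the sample standard deviation in the summary statistics.

```python
from math import sqrt
from statistics import mean, stdev

# Container weights in ounces, as listed for Problem 12.33
weights = [7.80, 7.97, 7.92, 7.91, 7.95, 7.87, 7.93, 7.79, 7.92,
           7.99, 8.06, 7.98, 7.94, 7.82, 8.05, 7.75, 7.89, 7.91]

n = len(weights)
x_bar, s = mean(weights), stdev(weights)   # sample mean and standard deviation
t = (x_bar - 8.0) / (s / sqrt(n))          # advertised weight is 8 ounces
print(round(x_bar, 3), round(s, 4), round(t, 2))
```

The negative sign indicates that the sample mean lies below the advertised 8 ounces.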
Confidence Intervals for Variances

Problems 12.33 & 12.55
• A federal agency responsible for enforcing laws governing weights and measures routinely inspects packages to determine whether the weight of the contents is at least as great as that advertised on the package. A random sample of 18 containers whose packaging states that the contents weigh 8 ounces was drawn.

Problems 12.33 & 12.55 (Cont.)
• Estimate with 95% confidence the variance in contents' weight.
• The $\chi^2$ variable with $n-1$ degrees of freedom is $(n-1)s^2/\sigma^2$

[Figure: Chi-square density for 17 degrees of freedom; 2.5% tails below 7.564 and above 30.191]

Problems 12.33 & 12.55 (Cont.)
• $7.564 < (n-1)s^2/\sigma^2 < 30.191$
• $7.564 < 17 \times 0.00718/\sigma^2 < 30.191$
• $(1/7.564) \times 17 \times 0.00718 > \sigma^2 > (1/30.191) \times 17 \times 0.00718$
• $0.0161 > \sigma^2 > 0.0040$

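The interval for the population variance follows from inverting the chi-square inequality. In this Python sketch the function name `variance_interval` is an assumption; the sample variance and the two chi-square critical values for 17 degrees of freedom are taken from the notes rather than computed.

```python
def variance_interval(s2, n, chi2_lower, chi2_upper):
    """Confidence interval for the variance from (n-1)s^2/sigma^2 ~ chi-square."""
    df = n - 1
    return df * s2 / chi2_upper, df * s2 / chi2_lower

# s^2 = 0.00718, n = 18, and the 2.5% tail points 7.564 and 30.191
low, high = variance_interval(0.00718, 18, 7.564, 30.191)
print(round(low, 4), round(high, 4))
```

Dividing by the larger critical value gives the lower bound, which is why the inequality signs flip in the derivation.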
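IV. Differences in Populations
• Null Hypothesis: $H_0: \mu_1 = \mu_2$, or $\mu_1 - \mu_2 = 0$
• Alternative Hypothesis: $H_A: \mu_1 - \mu_2 \neq 0$
$t = [(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)]/s_{\bar{x}_1 - \bar{x}_2}$
$\mathrm{Var}[\bar{x}_1 - \bar{x}_2] = E[(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)]^2$
$\mathrm{Var}[\bar{x}_1 - \bar{x}_2] = E[(\bar{x}_1 - \mu_1) - (\bar{x}_2 - \mu_2)]^2$
$\mathrm{Var}[\bar{x}_1 - \bar{x}_2] = E[(\bar{x}_1 - \mu_1)^2 + (\bar{x}_2 - \mu_2)^2 - 2(\bar{x}_1 - \mu_1)(\bar{x}_2 - \mu_2)]$
$\mathrm{Var}[\bar{x}_1 - \bar{x}_2] = \mathrm{Var}\,\bar{x}_1 + \mathrm{Var}\,\bar{x}_2 - 2\,\mathrm{Cov}(\bar{x}_1, \bar{x}_2)$
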
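IV. Differences in Populations
$t = [(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)]/s_{\bar{x}_1 - \bar{x}_2}$
$\mathrm{Var}[\bar{x}_1 - \bar{x}_2] = \mathrm{Var}\,\bar{x}_1 + \mathrm{Var}\,\bar{x}_2 - 2\,\mathrm{Cov}(\bar{x}_1, \bar{x}_2)$
For independent samples the covariance term is zero, so
$t = [(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)]/\sqrt{\mathrm{Var}\,\bar{x}_1 + \mathrm{Var}\,\bar{x}_2}$
$t = [(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)]/\sqrt{(s_1^2/n_1) + (s_2^2/n_2)}$
Reference Ch. 9 & Ch. 13
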
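The last form of the t statistic, with separate sample variances for two independent samples, can be sketched in Python as follows. The function name `two_sample_t` and both data lists are hypothetical and invented purely for illustration; they are not taken from any problem in this review.

```python
from math import sqrt
from statistics import mean, variance

def two_sample_t(x1, x2, hypothesized_diff=0.0):
    """t statistic for a difference in means, independent samples."""
    # Standard error: sqrt(s1^2/n1 + s2^2/n2), with n-1 divisors in s^2
    se = sqrt(variance(x1) / len(x1) + variance(x2) / len(x2))
    return (mean(x1) - mean(x2) - hypothesized_diff) / se

# Hypothetical samples, invented for this example only
group_1 = [10.0, 12.0, 11.0, 13.0]
group_2 = [9.0, 10.0, 8.0, 9.0]

t = two_sample_t(group_1, group_2)
print(round(t, 3))
```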
V. Regression
• Model: $y_i = a + b\,x_i + e_i$
Fitted: $\hat{y}_i = \hat{a} + \hat{b}\,x_i$
Estimate: $\hat{b} = \sum_{i=1}^{n} [y_i - \bar{y}][x_i - \bar{x}] \big/ \sum_{i=1}^{n} [x_i - \bar{x}]^2$
Estimate: $\hat{a} = \bar{y} - \hat{b}\,\bar{x}$
Estimated error: $\hat{e}_i = y_i - \hat{y}_i$
Sum of squared residuals: $\sum_{i=1}^{n} \hat{e}_i^2$
ANOVA: Total sum of squares $TSS = \sum_{i=1}^{n} [y_i - \bar{y}]^2$
$TSS = \text{Explained Sum (ESS)} + \text{Unexplained Sum (USS)}$
$TSS = \hat{b}^2 \sum_{i=1}^{n} [x_i - \bar{x}]^2 + \sum_{i=1}^{n} \hat{e}_i^2$

Lab Five
[Figure: Fortune 500, 1999: Assets vs. Revenue, in logs; x-axis: Log Revenue (1000 to 1000000), y-axis: Log Assets (10000 to 1000000). Labeled firms: Citigroup, Bank of America, Fannie Mae, Chase Manhattan, General Electric, Morgan Stanley, Prudential, Merrill Lynch, General Motors, TIAA-CREF, Bank One, American International, Exxon Mobil, State Farm, Allstate, Wal-Mart, Kroger, McKesson HBOC, Ingram Micro, Costco Wholesale]

The Financials
rank  firm                            industry                           revenue M$   profits M$   assets M$
5     General Electric                Diversified Financials             111630       10717        405200
7     Citigroup                       Diversified Financials             82005        9867         716900
11    Bank of America Corp.           Commercial Banks                   51392        7882         632574
26    Fannie Mae                      Diversified Financials             36968.6      3911.9       575167.4
31    Chase Manhattan Corp.           Commercial Banks                   33710        5446         406105
48    Prudential Ins. Co. of America  Insurance: Life, Health (stock)    26618        813          285094
50    Bank One Corp.                  Commercial Banks                   25986        3479         269425
30    Morgan Stanley Dean Witter      Securities                         33928        4791         366967
29    Merrill Lynch                   Securities                         34879        2618         328071
19    TIAA-CREF                       Insurance: Life, Health (mutual)   39410.2      1024.07      289247.99
17    American International Group    Insurance: P&C (stock)             40656.08     5055.44      268238

Excel Chart
[Figure: The Financials: Eleven Firms; x-axis: ln Revenue M$ (10 to 11.8), y-axis: ln Assets M$ (12.4 to 13.6); fitted line y = 0.4335x + 8.2535, R² = 0.3039]

Excel Regression
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.5512779
R Square             0.3039073
Adjusted R Square    0.2265636
Standard Error       0.3117374
Observations         11

ANOVA
             df   SS            MS         F          Significance F
Regression   1    0.381851405   0.381851   3.929312   0.078773838
Residual     9    0.874622016   0.09718
Total        10   1.256473421

               Coefficients   Standard Error   t Stat     P-value    Lower 95%      Upper 95%   Lower 95.0%   Upper 95.0%
Intercept      8.2534951      2.33138973       3.540161   0.006313   2.979521108    13.52747    2.979521      13.52747
X Variable 1   0.4335105      0.218696259      1.982249   0.078774   -0.061215204   0.928236    -0.06122      0.928236

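The least-squares formulas from Section V can be applied directly to the eleven financial firms. The Python sketch below takes revenue and assets (in M$) from The Financials table, regresses ln assets on ln revenue, and uses no statistics library; the variable names are assumptions.

```python
from math import log

# Revenue and assets in M$, in the row order of The Financials table
revenue = [111630, 82005, 51392, 36968.6, 33710, 26618,
           25986, 33928, 34879, 39410.2, 40656.08]
assets = [405200, 716900, 632574, 575167.4, 406105, 285094,
          269425, 366967, 328071, 289247.99, 268238]

x = [log(r) for r in revenue]   # ln revenue
y = [log(a) for a in assets]    # ln assets
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b_hat = sxy / sxx               # slope estimate
a_hat = y_bar - b_hat * x_bar   # intercept estimate

residuals = [yi - (a_hat + b_hat * xi) for xi, yi in zip(x, y)]
uss = sum(e ** 2 for e in residuals)       # unexplained sum of squares
tss = sum((yi - y_bar) ** 2 for yi in y)   # total sum of squares
r_squared = 1 - uss / tss
print(round(b_hat, 4), round(a_hat, 4), round(r_squared, 4))
```

The slope, intercept and R² can be compared with the X Variable 1 coefficient, the Intercept and R Square in the Excel output.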
Eviews Chart
[Figure: Eleven Financial Firms; x-axis: LNSALES (10.0 to 12.0), y-axis: LNASSETS (12.4 to 13.6)]

Eviews Regression
Eviews: Actual, Fitted & Residual
[Figure: Actual and Fitted values (12.4 to 13.6) and Residual (-0.4 to 0.6) for observations 1 to 11]