Economic Reasoning Using Statistics


ECONOMIC REASONING USING
STATISTICS
Econ 138
Dr. Adrienne Ohler
HOW YOU WILL LEARN.
Textbook: Stats: Data and
Models 2nd Ed., by Richard D.
DeVeaux, Paul E. Velleman,
and David E. Bock
 Homework: MyStatLab
brought to you by
www.coursecompass.com

THE REST OF THIS CLASS
Attendance Policy
 Cellphone Policy
 Homeworks (10 out of 12)
   Due Sundays by 11:59pm
 Quizzes (5 out of 6)
 Exams
   Oct. 10th
   Nov. 28th
   Cumulative Optional Final
 Data Project

HELP FOR THIS CLASS
READ THE BOOK
 Come to class prepared and awake
 READ THE BOOK
 Office Hours: T, H 9-11am and by Appointment
 READ THE BOOK
 Get a tutor at the Visor Center

ECONOMIC REASONING USING STATISTICS

What is economics?
 The study of scarcity, incentives, and choices.
 The branch of knowledge concerned with the production, consumption, and transfer of wealth. (Google)

Wealth
 The health, happiness, and fortunes of a person or group. (Google)

What is/are statistics?
Statistics (the discipline) is a way of reasoning, a collection
of tools and methods, designed to help us understand the
world.
 Statistics (plural) are particular calculations made from
data.
 Data are values with a context.

STATISTICS


Statistics (the discipline) is a way of reasoning, a
collection of tools and methods, designed to help us
understand the world.
Will the sun rise tomorrow?
WHAT IS STATISTICS REALLY ABOUT?

A statistic is a number that represents a
characteristic of a sample (e.g., average,
standard deviation, maximum, minimum, range);
the corresponding number for an entire population
is called a parameter.
Statistics is about variation.
 All measurements are imperfect, since there is
variation that we cannot see.
 Statistics helps us to understand the real,
imperfect world in which we live and it helps us
to get closer to the unveiled truth.

THE LANGUAGE OF STATISTICS
Form of literacy
 4 cows in a field
 7 cows by the road

4 cows in a field on the left
 3 cows in a field on the right


At a party
Average age is 18
 Average age is 22
 Average age is 75

IN THIS CLASS
Observe the real world
 Create a hypothesis
 Collect data
 Understand and classify our data
 Graph our data
 Standardize our data
 Apply probability rules to our data
 Test our hypothesis
 Interpret our results

QUESTIONING A STATISTIC
A. ½ of all American children will witness the breakup of
a parent’s marriage. Of these, close to ½ will also see
the breakup of a parent’s second marriage.
(Furstenberg et al., American Sociological Review, 1983)

B. 66% of the total adult population in this country is
currently overweight or obese.
(http://win.niddk.nih.gov/statistics/)

C. 28% of American adults have left the faith in which
they were raised in favor of another religion - or no
religion at all.
(http://religions.pewforum.org/reports)

CHAPTER 2 - WHAT ARE DATA?




Information
Data can be numbers, record names, or other
labels.
Not all data represented by numbers are
numerical data (e.g., 1=male, 2=female).
Data are useless without their context…
THE “W’S”

To provide context we need the W’s
 Who
 What (and in what units)
 When
 Where
 Why (if possible)
 and How
of the data.

Note: the answers to “who” and “what” are
essential.
WHO

The Who of the data tells us the individual cases
about which (or whom) we have collected data.
Individuals who answer a survey are called
respondents.
 People on whom we experiment are called subjects or
participants.
 Animals, plants, and inanimate subjects are called
experimental units.


Sometimes people just refer to data values as
observations and are not clear about the Who.

But we need to know the Who of the data so we can
learn what the data say.
IDENTIFY THE WHO IN THE FOLLOWING
DATASET?



Are physically fit people less likely to die of
cancer?
Suppose an article in a sports medicine journal
reported results of a study that followed 22,563
men aged 30 to 87 for 5 years.
The physically fit men had a 57% lower risk of
death from cancer than the least fit group.
WHO ARE THEY STUDYING?
1. The cause of death for 22,563 men in the study
2. The fitness level of the 22,563 men in the study
3. The age of each of the 22,563 men in the study
4. The 22,563 men in the study
WHAT AND WHY



Variables are characteristics recorded about each
individual.
The variables should have names that identify
What has been measured.
A categorical (or qualitative) variable names
categories and answers questions about how
cases fall into those categories.

Categorical examples: sex, race, ethnicity
WHAT AND WHY (CONT.)

A quantitative variable is a measured variable
(with units) that answers questions about the
quantity of what is being measured.

Quantitative examples: income ($), height (inches),
weight (pounds)
WHAT AND WHY (CONT.)

Example: In a fitness evaluation, one question
asked to evaluate the statement “I consider
myself physically fit” on the following scale:






1 = Disagree Strongly;
2 = Disagree;
3 = Neutral;
4 = Agree;
5 = Agree Strongly.
Question: Is fitness categorical or quantitative?
WHAT AND WHY (CONT.)
We sense an order to these ratings, but there are
no natural units for the variable fitness.
 Variables like fitness are often called ordinal
variables.


With an ordinal variable, look at the Why of the
study to decide whether to treat it as categorical or
quantitative.
ARE FIT PEOPLE LESS LIKELY TO DIE OF CANCER?
WHO IS THE POPULATION OF INTEREST?
1. All people
2. All men who exercise
3. All men who die of cancer
4. All men
Poll chart values: 25%, 25%, 25%, 25%
IDENTIFYING IDENTIFIERS

Identifier variables are categorical variables with
exactly one individual in each category.



Examples: Social Security Number, ISBN, FedEx
Tracking Number
Don’t be tempted to analyze identifier variables.
Be careful not to consider all variables with one
case per category, like year, as identifier
variables.

The Why will help you decide how to treat identifier
variables.
COUNTS COUNT

When we count the cases in each category of a
categorical variable, the counts are not the data, but
something we summarize about the data.
The category labels are the What, and
 the individuals counted are the Who.

                          2009      2010      Percent (2009)   Percent (2010)
Male – Undergraduate      8,106     8,111     44.2             44.4
Female – Undergraduate    10,238    10,143    55.8             55.6
Male – Graduate           864       888       34.4             35.4
Female – Graduate         1,648     1,620     65.6             64.6
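
The percentages in this table can be recomputed from the counts. The following is a minimal Python sketch, assuming each percentage is the share of that level's enrollment (undergraduate or graduate) in the same year; the names `counts` and `percent_share` are illustrative and not part of the course materials.

```python
# Counts from the table, keyed by (level, sex); each value is (2009, 2010).
counts = {
    ("Undergraduate", "Male"): (8106, 8111),
    ("Undergraduate", "Female"): (10238, 10143),
    ("Graduate", "Male"): (864, 888),
    ("Graduate", "Female"): (1648, 1620),
}

def percent_share(level, sex, year_index):
    """Percent of students at `level` who are `sex` (year_index 0 = 2009, 1 = 2010)."""
    # Assumption: the denominator is everyone at the same level in the same year.
    level_total = sum(v[year_index] for (lvl, _), v in counts.items() if lvl == level)
    return round(100 * counts[(level, sex)][year_index] / level_total, 1)

for level, sex in counts:
    print(level, sex, percent_share(level, sex, 0), percent_share(level, sex, 1))
```

Note that the counts are summaries of the data: the Who is still the individual students, and the What is their category.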
WHERE, WHEN, AND HOW
 When and Where give us some nice information about the context.
 Example: Values recorded at a large public university may mean
something different than similar values recorded at a small private
college.
WHERE, WHEN, AND HOW
GPA of Econ 101 classes.
 Class 1 – 2.56
 Class 2 – 3.34

Where – Washington State University
 When – during the fall and spring semesters

WHERE, WHEN, AND HOW (CONT.)

How the data are collected can make the
difference between insight and nonsense.

Example: results from voluntary Internet surveys are
often useless

Example: Data collection of ‘Who will win Republican
Primary?’
Survey ISU students on campus
 Run a Facebook survey
 Rasmussen Reports national telephone survey

WHY IS STATISTICS CHALLENGING?

Word problems…

Rules of statistics don’t change

Data is information






If you are struggling with a problem, always ask the
W questions about the data collected.
Who
What
When
Where
Why
CHAPTER 3

Displaying and Describing
 Categorical Data
METHODS OF DISPLAYING DATA
Frequency Table
 Relative Frequency table
 Bar Chart
 Relative Frequency bar chart
 Pie Chart
 Contingency table
 Contingency tables and Conditional Distributions
 Segmented Bar charts

DATA ON STUDENTS
             Gender    Year in School    Major        My Class
Kim B.       Female    Sr.               Elem. Ed.    ECO 138 Section 1
Stacie M.    Female    So.               Math         ECO 138 Section 2
Tom A.       Male      Sr.               Elem. Ed.    ECO 255 Section 1
Tim B.       Male      Jr.               Renew        ECO 255 Section 1
Kelly Y.     Male      Fr.               Safety       ECO 255 Section 2
…
FREQUENCY TABLES: MAKING PILES
We can “pile” the data by counting the number of
data values in each category of interest.
 We can organize these counts into a frequency
table, which records the totals and the category
names.

ECO 138
Male      50
Female    20
Total     70
FREQUENCY TABLES: MAKING PILES
(CONT.)

A relative frequency table is similar, but gives
the percentages (instead of counts) for each
category.
ECO 138
Male      50/70 * 100 = 71.43%
Female    20/70 * 100 = 28.57%
Total     70/70 * 100 = 100%
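
The same "piling" can be sketched in Python. This is a minimal illustration, not the course's method: the list `genders` is hypothetical raw data built to match the ECO 138 counts (50 male, 20 female), and the variable names are made up for the example.

```python
from collections import Counter

# Hypothetical raw data: one gender label per ECO 138 student,
# constructed to match the frequency table (50 male, 20 female).
genders = ["Male"] * 50 + ["Female"] * 20

frequency = Counter(genders)        # frequency table: count per category
total = sum(frequency.values())     # total number of students

# Relative frequency table: each count as a percentage of the total.
relative_frequency = {cat: 100 * n / total for cat, n in frequency.items()}

for cat in frequency:
    print(f"{cat:<8}{frequency[cat]:>4}{relative_frequency[cat]:>8.2f}%")
print(f"{'Total':<8}{total:>4}{sum(relative_frequency.values()):>8.2f}%")
```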
BAR CHARTS



A bar chart displays the distribution of a categorical
variable, showing the counts for each category next to
each other for easy comparison.
A bar chart stays true to the area principle.
Thus, a better display for the ship data is:
BAR CHARTS (CONT.)



A relative frequency bar chart displays the relative
proportion of counts for each category.
A relative frequency bar chart also stays true to the
area principle.
Replacing counts with percentages in the ship data:
WHAT YEAR IN SCHOOL ARE YOU?
1. Freshman
2. Sophomore
3. Junior
4. Senior
Poll chart values: 61%, 17%, 17%, 6%
PIE CHARTS



When you are interested in parts of the whole, a
pie chart might be your display of choice.
Pie charts show the whole group of cases as a circle.
They slice the circle into pieces whose size is proportional
to the fraction of the whole in each category.
METHODS OF DISPLAYING DATA
Frequency Table (How much?)
 Relative Frequency table (What percentage?)

Bar Chart (How much?)
 Relative Frequency bar chart (What percentage?)


Pie Chart (How much?)
Contingency table and Marginal Distributions
 Contingency tables and Conditional Distributions

CONTINGENCY TABLES


A contingency table allows us to look at two
categorical variables together.
It shows how individuals are distributed along each
variable, contingent on the value of the other
variable.
 Example: we can examine the class of ticket and
whether a person survived the Titanic:
CONTINGENCY TABLE
The two variables in this contingency table are
gender and class/section number.

          ECO 255 – Section 1    ECO 138 – Section 1    ECO 138 – Section 2    Total
Male      29                     26                     24                     79
Female    4                      9                      11                     24
Total     33                     35                     35                     103
CONTINGENCY TABLES (CONT.)


The margins of the table, both on the right and on the
bottom, give totals and the frequency distributions for each
of the variables.
Each frequency distribution is called a marginal distribution
of its respective variable.
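
As a rough illustration, the margins of the gender-by-section table can be computed from its cell counts. This Python sketch uses the counts from that table; the names `table`, `gender_totals`, and `section_totals` are illustrative only.

```python
# Cell counts from the gender-by-section contingency table.
sections = ["ECO 255 - Section 1", "ECO 138 - Section 1", "ECO 138 - Section 2"]
table = {
    "Male": [29, 26, 24],
    "Female": [4, 9, 11],
}

# Right margin: row totals give the marginal distribution of gender.
gender_totals = {gender: sum(row) for gender, row in table.items()}

# Bottom margin: column totals give the marginal distribution of section.
section_totals = {
    section: sum(table[gender][i] for gender in table)
    for i, section in enumerate(sections)
}

grand_total = sum(gender_totals.values())

print(gender_totals)
print(section_totals)
print(grand_total)
```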
CONDITIONAL DISTRIBUTIONS

A conditional distribution shows the distribution
of one variable for just the individuals who
satisfy some condition on another variable.

The following is the conditional distribution of ticket
Class, conditional on having survived:
CONDITIONAL DISTRIBUTIONS (CONT.)

The following is the conditional distribution of ticket
Class, conditional on having perished:
WHAT CAN GO WRONG? (CONT.)

Don’t confuse similar-sounding percentages—pay
particular attention to the wording of the context.

The percentage of students that are female & in ECO
138 Section 1
(cell distribution)

The percentage of females that are in ECO 138
Section 1
(conditioned upon females)

The percentage of ECO 138 Section 1 students that
are females
(conditioned upon ECO 138 Section 1)
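
The three percentages use the same cell count but different denominators. A minimal Python sketch using the counts from the gender-by-section contingency table (9 females in ECO 138 Section 1, 24 females, 35 students in that section, 103 students overall); the helper `pct` and the variable names are made up for this example.

```python
# Counts taken from the gender-by-section contingency table.
female_in_138_s1 = 9     # cell count: female AND in ECO 138 Section 1
all_students = 103       # grand total
all_females = 24         # row total for Female
all_138_s1 = 35          # column total for ECO 138 Section 1

def pct(part, whole):
    """Return part/whole as a percentage, rounded to two decimals."""
    return round(100 * part / whole, 2)

pct_of_students = pct(female_in_138_s1, all_students)  # share of all students
pct_of_females = pct(female_in_138_s1, all_females)    # conditioned upon females
pct_of_section = pct(female_in_138_s1, all_138_s1)     # conditioned upon the section

print(pct_of_students, pct_of_females, pct_of_section)
```

The numerator never changes; only the group named after "of" in the sentence does, which is why the wording matters.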
CONDITIONAL DISTRIBUTIONS (CONT.)
 The conditional distributions tell us that there is
a difference in class for those who survived and
those who perished.
 This is better shown with pie charts of the two
distributions:
SEGMENTED BAR CHARTS
A segmented bar chart displays the same information as a
pie chart, but in the form of bars instead of circles.
 Here is the segmented bar chart for ticket Class by
Survival status:

CONDITIONAL DISTRIBUTIONS (CONT.)



We see that the distribution of Class/Section for
males is different from that for females.
This leads us to believe that Class/Section and Gender
are associated, that they are not independent.
The variables would be considered independent when
the distribution of one variable in a contingency table is
the same for all categories of the other variable.
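
To make the comparison concrete, here is a small Python sketch that computes the conditional distribution of Class/Section for each gender from the contingency table counts. The function name `conditional_distribution` is illustrative, and the sketch only compares observed percentages; it is not a formal test of independence.

```python
# Cell counts from the gender-by-section contingency table.
sections = ["ECO 255 - Section 1", "ECO 138 - Section 1", "ECO 138 - Section 2"]
table = {
    "Male": [29, 26, 24],
    "Female": [4, 9, 11],
}

def conditional_distribution(row):
    """Percent of a row's total falling in each section, rounded to one decimal."""
    row_total = sum(row)
    return [round(100 * n / row_total, 1) for n in row]

male_dist = conditional_distribution(table["Male"])
female_dist = conditional_distribution(table["Female"])

for section, m, f in zip(sections, male_dist, female_dist):
    print(f"{section:<22} Male {m:>5}%   Female {f:>5}%")
```

If the two variables were independent, the male and female columns of percentages would be roughly the same; here they are visibly different.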
WHICH OF THE COMPARISONS DO YOU
CONSIDER MOST VALID?
1. Overall average, because it does not differentiate
between the four programs.
2. Individual program comparisons, because they take
into account the different number of applicants
and admission rates for each of the four
programs.
3. Overall average, because it takes into account the
differences in number of applicants and
admission rates for each of the four programs.
Poll chart values: 93%, 7%, 0%
NEXT TIME…

Chapter 4 – Displaying Quantitative Data