Political Research and Statistics

Design and Cross-Tabs
10/10/2013
Readings
• Chapter 3 Proposing Explanations, Framing
Hypotheses, and Making Comparisons
(Pollock) (pp. 58-76)
• Chapter 5 Making Controlled Comparisons
(Pollock)
• Chapter 4 Making Comparisons (Pollock
Workbook)
OPPORTUNITIES TO DISCUSS
COURSE CONTENT
Office Hours For the Week
• When
– Monday 10-12
– Tuesday 8-12
– And by appointment
Homework
• Due 10/15/2013
– Homework assignment on the paper
Course Learning Objectives
1. Students will learn the basics of research
design and be able to critically analyze the
advantages and disadvantages of different
types of design.
2. Students will achieve competency in
conducting statistical data analysis using the
SPSS software program.
OBSERVATIONAL STUDIES
Cross-Sectional Design
• What is it?
• An Example
• Limits on Cross Section
Time-Series Design
• Cross-Sectional design over time
• Problems with time series design
RCP tries for Content Validity
Panel Data
• A Special Kind
of Time Series
• Expensive and
difficult to
collect
• You can track
individual
change…
Field Research
• Observations in a natural setting
• People do not like to be followed and may change their
behavior when observed (reactivity); other things are less reactive
Case Study Design
• What is it?
• N=1 (one unit in your
study)
• Problems
Bivariate Data Analysis
CROSS-TABULATIONS
Variables
• Dependent Variable- the variable/result that
you want to explain.
• Independent Variable(s)- the variables that
you believe will cause/explain/change your
dependent variable
Univariate Statistics answer discrete
questions
What are Cross Tabs?
• A simple and effective way to measure
relationships between two variables.
• Also called contingency tables, because they
help us look at whether the value of one
variable is "contingent" upon that of another
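As a rough illustration of what a cross-tab does, the sketch below counts how many cases fall into each combination of two categorical variables. The responses are invented for illustration and are not GSS data; the names `responses` and `crosstab` are hypothetical. It uses only the Python standard library, whereas the course itself uses SPSS.

```python
from collections import Counter

# Hypothetical (education, happiness) pairs, invented for illustration.
responses = [
    ("HS", "Very happy"), ("HS", "Pretty happy"), ("HS", "Pretty happy"),
    ("HS", "Not too happy"), ("College", "Very happy"), ("College", "Very happy"),
    ("College", "Pretty happy"), ("College", "Pretty happy"),
]

def crosstab(pairs):
    """Count the cases in each (column value, row value) cell."""
    return Counter(pairs)

counts = crosstab(responses)
columns = ["HS", "College"]                              # independent variable
rows = ["Very happy", "Pretty happy", "Not too happy"]   # dependent variable

# Print the table with the dependent variable in the rows.
print(f"{'':15}" + "".join(f"{c:>10}" for c in columns))
for r in rows:
    print(f"{r:15}" + "".join(f"{counts[(c, r)]:>10}" for c in columns))
```

A `Counter` returns 0 for a combination that never occurs, so empty cells print as 0 rather than raising an error.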
When to use them
• When you have 2
variables
• They can only be used for
categorical variables
– ordinal variables (the
categories are ranked, but the
differences between them are
not certain: Less than HS, HS,
College, Grad School)
– nominal variables (the
categories are simply given
names: Gingrich, Perry,
Romney, Santorum)
You can use them if you have
• two ordinal variables
• one nominal and one
ordinal variable
• two nominal variables.
• variables with
fewer than 10
categories each
When it is a bad method
• If you have ratio or
interval variables
• You have a variable with
more than 10 values
• You want to test
multiple independent
variables against a
single d.v. in one model
Useless
DOING CROSS-TABS IN SPSS
Open Up the GSS
• Open GSS2008
A Simple Hypothesis
What is it
• Dependent variable- Happiness
• Independent variable- Intelligence
• What would be the hypothesis
Running Cross Tabs
• Select Analyze
– Descriptive Statistics
– Crosstabs
Running Cross-Tabs
• Dependent variable is
usually the row
• Independent variable is
usually the column.
We have to use the
measures available
Case Processing Summary
• Ignore the Case Processing Summary
• Delete it from your outputs
Cross-Tab Terminology
• Rows (appear along the side of the table) and
Columns (appear at the top)
• The category formed by the intersection of a
row and a column is called a cell
The Outputs
• “As education increases, unhappiness
increases”
• Raw Counts are not very helpful
Most are
“pretty
happy”
Let's Add Some Percentages
Click on Cells
Cell Display
Row %'s- These measure data across each row
• 15% of people who are very happy have fewer than 12 years
of education
• Overall 27.7% have 16+ and 17.5% have fewer than 12.
• This allows us to measure change across one category and
compare to the total
Column %'s- These measure data down
each column
• Compare across each column
– 37.0% of 16+ are very happy vs 27.0% of <12’s
– 6.5% of 16+ are not too happy vs 23.3% of <12’s
– Overall, 31.6% of people are very happy
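The difference between row and column percentages can be sketched as follows. The cell counts are invented for illustration and are not the GSS figures quoted on the slides; the function names are hypothetical. Row percentages divide each cell by its row total, while column percentages divide each cell by its column total.

```python
# Hypothetical cell counts, invented for illustration (not the GSS figures).
# Outer keys are rows (happiness); inner keys are columns (years of education).
table = {
    "Very happy":    {"<12": 20, "16+": 40},
    "Not too happy": {"<12": 30, "16+": 10},
}

def row_percents(t):
    """Each cell as a percentage of its row total."""
    out = {}
    for r, cells in t.items():
        total = sum(cells.values())
        out[r] = {c: 100 * n / total for c, n in cells.items()}
    return out

def column_percents(t):
    """Each cell as a percentage of its column total."""
    col_totals = {}
    for cells in t.values():
        for c, n in cells.items():
            col_totals[c] = col_totals.get(c, 0) + n
    return {r: {c: 100 * n / col_totals[c] for c, n in cells.items()}
            for r, cells in t.items()}

for label, result in (("Row %", row_percents(table)),
                      ("Column %", column_percents(table))):
    print(label)
    for r, cells in result.items():
        print(" ", r, {c: round(p, 1) for c, p in cells.items()})
```

When the independent variable is in the columns, the column percentages are the ones to compare across categories, since they show how the dependent variable is distributed within each group.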
How we interpret
• Using Face Validity to interpret x-tabs
– is there a pattern?
– does one column stand out?
• When we have two ordinal variables we can
state directional relationships!
• Would you say Hemingway is correct?
THE COMPARE MEANS TEST
When do we use this?
• A way to compare the mean
of a ratio or interval variable
across the categories of an
ordinal or nominal variable
– One ordinal vs. a ratio or
interval
– One nominal vs. a ratio or
interval
• This shows the average of
each category
In SPSS
• Open the States.SAV
• Analyze
– Compare Means
– Means
Where the Stuff Goes
• Your categorical
variable goes in the
Independent List
• Your continuous
variable goes in the
Dependent List
Reading the Output
• We can compare each
region against each
other and the total
• For ordinal variables,
we can state
relationships
• This is all face validity!
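The idea behind the compare means procedure can be sketched in a few lines: group the continuous variable by the categorical one and average each group, then add the overall mean for comparison. The regions and smoking rates below are invented for illustration and are not taken from States.SAV; `compare_means` is a hypothetical name.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (region, percent of adults who smoke) pairs, invented for illustration.
states = [
    ("South", 24.0), ("South", 22.0),
    ("West", 16.0), ("West", 18.0),
    ("Northeast", 19.0),
]

def compare_means(pairs):
    """Mean of the continuous variable within each category, plus the overall mean."""
    groups = defaultdict(list)
    for category, value in pairs:
        groups[category].append(value)
    result = {category: mean(values) for category, values in groups.items()}
    result["Total"] = mean(value for _, value in pairs)
    return result

for category, average in compare_means(states).items():
    print(f"{category:10} {average:.1f}")
```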
The Practical Significance
• Why do some regions smoke more? (possible
i.v.’s)
• What are the policy effects?
• Lung cancer and Smoking
Hypothesis Testing
Variables
• Dependent Variable- the variable/result that
you want to explain.
• Independent Variable(s)- the variables that
you believe will cause/explain/change your
dependent variable
Why Hypothesis Testing
• To determine whether a relationship exists
between two variables and did not arise by
chance (statistical significance)
• To measure the strength of the relationship
between an independent and a dependent
variable (association)
TESTING FOR STATISTICAL
SIGNIFICANCE
What is Statistical Significance?
• Saying that an observed relationship really
exists and is not happening by chance
• It doesn't mean the finding is important or
that it has any real-world application (beware
of large samples, where even trivial differences
can be statistically significant)
• Practical significance is often more important
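The warning about large samples can be illustrated with a small sketch. The slides do not name a specific test, so the chi-square test of independence is used here as an assumption, since it is one common choice for cross-tabs. Both tables are invented and show the same 52% versus 48% split; only the sample size differs. The cutoff 3.841 is the standard chi-square critical value for p<.05 with 1 degree of freedom.

```python
def chi_square(table):
    """Chi-square statistic for a table of counts given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

CRITICAL_05_DF1 = 3.841  # chi-square cutoff for p<.05 with 1 degree of freedom

# Hypothetical 2x2 tables with identical percentages but different sample sizes.
small = [[26, 24], [24, 26]]
large = [[2600, 2400], [2400, 2600]]

for name, counts in (("N=100", small), ("N=10,000", large)):
    value = chi_square(counts)
    verdict = "significant" if value > CRITICAL_05_DF1 else "not significant"
    print(name, round(value, 2), verdict)
```

The four-point gap is the same in both tables, yet only the larger sample crosses the cutoff. Whether a gap that small matters is a question of practical significance, not statistical significance.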
Determining Statistical Significance
• Establishing parameters or “confidence levels”
• Are we confident that our relationship is not
happening by chance?
• We want to be rigorous (we usually use the 95%
confidence level; does anyone remember why?)
How do we establish confidence
• Setting an alpha level, the cutoff for the “p” value
• This is the amount of error we are willing to
accept and still say a relationship exists
P-values or Alpha levels
• p<.05 (95% confidence level): less
than a 5% chance of seeing a relationship
this strong if none really exists
• p<.01 (99% confidence level): less
than a 1% chance
• p<.001 (99.9% confidence level): less
than a 1 in 1000 chance
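The decision rule behind these thresholds can be sketched as below. As an assumption for illustration, the p-value comes from a chi-square statistic with 1 degree of freedom (a 2x2 cross-tab); the slides do not specify a test. The example statistics and the function names `p_value_df1` and `strictest_level` are hypothetical.

```python
import math

def p_value_df1(chi_square):
    """Upper-tail probability for a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi_square / 2))

def strictest_level(p, alphas=(0.001, 0.01, 0.05)):
    """Return the strictest conventional alpha level that p falls below, or None."""
    for alpha in alphas:
        if p < alpha:
            return alpha
    return None

# Hypothetical chi-square statistics, invented for illustration.
for stat in (0.16, 5.0, 16.0):
    p = p_value_df1(stat)
    level = strictest_level(p)
    label = f"significant at p<{level}" if level else "not significant at p<.05"
    print(f"chi-square={stat:5.2f}  p={p:.4f}  {label}")
```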
Problems of the Alpha level (p-value)
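• Setting it too high (more
likely to accept a relationship
that arose by chance)
• Setting it too low (more
likely to miss a relationship
that really exists)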
• We have to remember
our concepts and our
units of analysis
You should always use the 95%
confidence level (p<.05) unless
there is a good reason not to.