Survey Documentation and Analysis (SDA)

Workshop Agenda
 Overview
 What is online analysis?
 Available SDA data sets
 Statistical procedures (Frequencies, Crosstabs, Regression)
 Recoding, subsetting, downloading
 Teaching resources for SDA and developing instructional materials
SSRIC
Social Science Research & Instructional Council
http://www.ssric.org
The Council
 Oldest CSU discipline council
    Founded in 1972
 Representatives from CSU campuses meet three times per year
 Negotiates with data providers for access to data
 Promotes use of data analysis in research and teaching
The Council
 Annual student research conference
    at CSU Long Beach in 2008
    at CSU Sacramento in 2009
 Sponsors travel to ICPSR summer workshops in Ann Arbor, Michigan
    http://www.ssric.org/participate/icpsr_summer
 Works with Field Research
    Question credits to California Field Poll
    Selects faculty fellow
What is Online Analysis?
 “Online data analysis” refers to the ability to perform statistical analysis using special Web-based software as an alternative to downloading data into a standalone statistical package on your computer.
 The software we’re using is called Survey Documentation and Analysis (SDA), which was developed at the University of California, Berkeley.
Alternative Statistical Packages
 You can get a complete list of available online statistical packages at http://statpages.org/
 Some of these include:
    OpenStat
    ViSta
    Statext
    SISA
Advantages
 Many, like SDA, are free and don’t require a site license
 Only require a computer with an internet connection
 Some, like SDA, are easy to learn
 Can show students how to use some of them in 30 minutes or less
Disadvantages
 Some online statistical packages (certainly not all) are limited in what they can do statistically
 Documentation is not very good for some
 Some (like SDA) can only be used with data sets that have already been created in a format that can be read by that package
Available SDA Data Sets
SDA Data Sets
 While SDA is an extremely easy statistical package to learn to use, it’s difficult to create SDA data sets.
 You have to purchase an SDA site license to create a data set and then learn how to use it.
 So we typically use SDA data sets that have been created for us.
Sources for SDA Data Sets

 SDA Archive located at UC Berkeley (http://sda.berkeley.edu/archive.htm)
 ICPSR Topical Archives (http://www.icpsr.org/cocoon/ICPSR/all/archives.xml)
 Field data located at UC Berkeley (http://ucdata.berkeley.edu/data_record.php?recid=3#analyze)
 List of SDA data sets at CSU Long Beach (http://www.csulb.edu/library/eref/datasets.html)
 University of Denver’s IDEA project (http://www.du.edu/idea/data.htm)
SDA Archive at UC Berkeley
(http://sda.berkeley.edu/archive.htm)
 GSS Cumulative Datafile (1972-2008; 2008 is a preliminary version).
 ANES Cumulative Datafile (1948-2000) and ANES datafiles for 1996, 2000, and 2004.
 Census microdata including 2000-2003 American Community Surveys and 1990 and 2000 U.S. 1% PUMS with separate files for 2000 and 1990 California PUMS.
ICPSR

 National Archive of Computerized Data on Aging (http://www.icpsr.umich.edu/NACDA/)
 National Archive of Criminal Justice Data (http://www.icpsr.umich.edu/NACJD/)
 Substance Abuse and Mental Health Data Archive (http://www.icpsr.umich.edu/SAMHDA/)
 International Archive of Education Data (http://www.icpsr.umich.edu/IAED/)
Field Data
http://ucdata.berkeley.edu/data_record.php?recid=3#analyze
 Field Polls from 1956 through 2006 are available as publicly accessible SDA data sets
 More recent Field Polls are available as SPSS data sets (through FTP) for CSU faculty, staff, and students.
Other Sources of SDA Data Sets at ICPSR
 Voting Behavior: The 2004 Election by Charles Prysby and Carmine Scavo (http://www.icpsr.umich.edu/SETUPS/)
 Investigating Community and Social Capital by Lori Weber (http://www.icpsr.umich.edu/ICSC/index.html)
Statistical Procedures
Available Statistical Procedures
 Frequencies and crosstabulation (discussed in this workshop)
 Comparison of means
 Correlation matrix
 Comparison of correlations
 Multiple regression (discussed in this workshop)
 Logit/Probit regression
Using SDA
 Select the data set
 Look at the codebook
 Decide what statistical procedure to use
 Fill in what you want to do
 Run it
Data Set
 We’re going to use the GSS 1972-2008 Cumulative Data File (2008 is preliminary data)
    http://sda.berkeley.edu/archive.htm
 We’re going to use three variables
    SEX
    RELITEN
    PORNLAW
Frequencies
 List the variables you want to use
    ROW: SEX,RELITEN,PORNLAW
 Click on “Run the Table”
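
For readers who want to see what a frequency table computes, here is a minimal Python sketch. It is not how SDA is implemented; the responses are toy values invented for illustration (not GSS data), and the function name `frequencies` is hypothetical.

```python
from collections import Counter

# Toy responses, invented for illustration; these are not GSS data.
# None stands for a missing answer.
sex = ["male", "female", "female", "male", "female", None, "female", "male"]


def frequencies(values):
    """Return {category: (count, percent of valid cases)}, skipping missing values."""
    valid = [v for v in values if v is not None]
    counts = Counter(valid)
    total = len(valid)
    return {cat: (n, 100.0 * n / total) for cat, n in counts.items()}


table = frequencies(sex)
for category, (count, percent) in sorted(table.items()):
    print(f"{category:<8}{count:>4}{percent:>8.1f}%")
```

Percentages here are based on valid cases only, which is an assumption of this sketch.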
Crosstabs
 Now let’s use RELITEN as our independent variable and PORNLAW as our dependent variable to create a bivariate crosstabulation.
 List the variables
    ROW: PORNLAW
    COLUMN: RELITEN
Crosstabulation Continued
 Options
    Percentaging: column
    Statistics
    Question text
    Color coding
 Run the Table
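
Column percentaging means each cell is divided by its column total, so the categories of the independent (column) variable can be compared directly. The sketch below illustrates the arithmetic only; the category labels and cases are invented for illustration (not GSS data), and `column_percents` is a hypothetical helper.

```python
from collections import Counter

# Toy (column, row) pairs, invented for illustration; not GSS data.
# Column = independent variable, row = dependent variable.
cases = [
    ("strong", "illegal to all"), ("strong", "illegal to all"),
    ("strong", "illegal under 18"), ("not strong", "illegal to all"),
    ("not strong", "illegal under 18"), ("not strong", "illegal under 18"),
    ("not strong", "illegal under 18"),
]


def column_percents(pairs):
    """Return {(row, col): percent of that column's cases falling in that row}."""
    cell_counts = Counter((row, col) for col, row in pairs)
    col_totals = Counter(col for col, _ in pairs)
    return {
        (row, col): 100.0 * n / col_totals[col]
        for (row, col), n in cell_counts.items()
    }


percents = column_percents(cases)
for (row, col), pct in sorted(percents.items()):
    print(f"{row:<18}{col:<12}{pct:6.1f}%")
```

Within each column the percentages sum to 100, which is what makes the columns comparable.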
Your Turn
 Let’s run two more bivariate crosstabs
    Independent variable: SEX
    Dependent variables: RELITEN and PORNLAW
 Go ahead and run these crosstabs
What Did We Discover?
 RELITEN is strongly related to PORNLAW.
 SEX is also related to both RELITEN and PORNLAW.
 Could the relationship between RELITEN and PORNLAW be spurious? SEX is related to both RELITEN and PORNLAW and could be creating the relationship between RELITEN and PORNLAW.
 How do we test this possibility? Let’s run a three-variable crosstabulation with RELITEN as our independent variable, PORNLAW as our dependent variable, and SEX as our control variable.
Multivariate Crosstabulation
 List the variables
    ROW: PORNLAW
    COLUMN: RELITEN
    CONTROL: SEX
 Options
    Percentaging: column
    Statistics
    Question text
    Color coding
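
Adding a control variable produces one partial table per category of the control. A rough Python sketch of that idea follows; the records and labels are invented for illustration (not GSS data), and `partial_tables` is a hypothetical helper rather than anything SDA exposes.

```python
from collections import Counter, defaultdict

# Toy (control, column, row) records, invented for illustration; not GSS data.
records = [
    ("male", "strong", "illegal"), ("male", "strong", "legal"),
    ("male", "weak", "legal"), ("male", "weak", "legal"),
    ("female", "strong", "illegal"), ("female", "strong", "illegal"),
    ("female", "weak", "illegal"), ("female", "weak", "legal"),
]


def partial_tables(rows):
    """Build one column-percentaged table per category of the control variable."""
    by_control = defaultdict(list)
    for control, col, row in rows:
        by_control[control].append((col, row))
    tables = {}
    for control, pairs in by_control.items():
        cells = Counter((row, col) for col, row in pairs)
        col_totals = Counter(col for col, _ in pairs)
        # key is (row, col); divide each cell by its own column total
        tables[control] = {
            key: 100.0 * n / col_totals[key[1]] for key, n in cells.items()
        }
    return tables


tables = partial_tables(records)
for control, table in tables.items():
    print(control, sorted(table.items()))
```

If the column percentages still differ across the independent variable inside each partial table, the original relationship is not explained away by the control.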
Spuriousness
 Was the relationship between RELITEN and PORNLAW spurious due to SEX?
 How do you know?
 Does that mean that the relationship can never be spurious?
Regression
 Crosstabulation is used when all the variables are categorical.
 What do we do when our variables are continuous (i.e., interval and/or ratio)?
 Regression is the answer.
Bivariate Regression
 Let’s look at the relationship between the respondent’s socioeconomic status (SEI) and the amount of television the respondent watches (TVHOURS).
 List the variables
    Dependent: TVHOURS
    Independent: SEI
 Options
    T-Tests
    Correlation matrix
    Color coding
    Question Text
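
As background, bivariate regression fits a line, dependent = intercept + slope × independent, by least squares. The sketch below shows that calculation in plain Python. The numbers are invented to keep the arithmetic easy; they are not GSS data and say nothing about the real SEI/TVHOURS relationship. `simple_ols` is a hypothetical helper.

```python
# Toy (SEI, TVHOURS) values, invented for illustration; not GSS data.
sei = [20.0, 30.0, 40.0, 50.0, 60.0]
tvhours = [5.0, 4.0, 3.0, 2.0, 1.0]


def simple_ols(x, y):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares around the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = simple_ols(sei, tvhours)
print(f"TVHOURS = {intercept:.2f} + ({slope:.2f}) * SEI")
```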
Multivariate Regression
 Now let’s add in another variable: SEX
 But sex is not a continuous variable. How do we enter a variable like SEX into the regression analysis? Answer: create a dummy variable.
 Dummy variables take on the values of 1 and 0.
Creating a Dummy Variable
 SEX(d:1)
    SEX is the name of the variable you want to make into a dummy variable
    d indicates that you want to create a dummy variable
    1 indicates that the value 1 will be assigned the value 1. All other values will be assigned the value 0.
 Run the table
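
The effect of SEX(d:1) can be sketched in a few lines of Python. This is an illustration of dummy coding in general, not SDA's own code. The sample codes are invented, the helper `dummy` is hypothetical, and keeping missing values (None) as missing is an assumption of this sketch.

```python
def dummy(values, target):
    """Code `target` as 1 and every other valid value as 0.

    Missing values (None) stay missing; that handling is an assumption here.
    """
    return [None if v is None else int(v == target) for v in values]


# Toy SEX codes, invented for illustration; not GSS data.
sex_codes = [1, 2, 2, 1, None, 2]
sex_dummy = dummy(sex_codes, target=1)
print(sex_dummy)
```

In a regression, the coefficient on the dummy is the difference in the dependent variable between the category coded 1 and everyone coded 0, holding the other predictors constant.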
Recoding, Subsetting, Downloading
Recoding Existing Variables
Example (from GSS Cumulative File): ATTEND (How often Respondent attends religious services)
ATTEND
0 Never
1 Less than once a year
2 Once a year
3 Several times a year
4 Once a month
5 2 to 3 times a month
6 Nearly Every Wk
7 Every week
8 More than once a week
9 DK/NA (Missing)
ATTENDR
1 Seldom (0 to 3)
2 Sometimes (4 to 5)
3 Often (6 to 8)
9 Missing (9)
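
The ATTEND to ATTENDR mapping above can be written out as a small function. This Python sketch mirrors the code ranges on the slide; it is not SDA recode syntax, and the function name `recode_attend` is hypothetical.

```python
def recode_attend(code):
    """Collapse ATTEND (0-8, 9 = missing) into ATTENDR (1-3, 9 = missing)."""
    if code is None or code == 9:
        return 9  # DK/NA stays missing
    if 0 <= code <= 3:
        return 1  # Seldom
    if 4 <= code <= 5:
        return 2  # Sometimes
    if 6 <= code <= 8:
        return 3  # Often
    raise ValueError(f"unexpected ATTEND code: {code!r}")


attendr = [recode_attend(code) for code in range(10)]
print(attendr)
```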
Your Turn
Recode AGE into the following categories:
1 = 18-29
2 = 30-64
3 = 65 and older
Obtain FREQUENCIES for the result
For More Information, See:
http://sda.berkeley.edu/HELPDOCS/helpnewv.htm#recode
Compute a New Variable
Example (from GSS Cumulative File): Alienation Index
Create a measure of ALIENATION from these variables, asked in 1978 only (all coded 1 = agree, 2 = disagree, other = missing data)
 ALIENAT1 PEOPLE RUNNING COUNTRY DONT CARE
 ALIENAT2 RICH GET RICHER, POOR POORER
 ALIENAT3 WHAT YOU THINK DOESNT COUNT
 ALIENAT4 YOU'RE LEFT OUT OF THINGS
 ALIENAT5 POWERFUL PEOPLE TAKE ADVANTAGE OF YOU
 ALIENAT6 PEOPLE IN WASH D.C. ARE OUT OF TOUCH
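
The slide does not state the formula for the index, so the sketch below assumes one common choice: count the number of "agree" answers across the six items, and treat the index as missing if any item is missing. Both the counting rule and the helper name `alienation_index` are assumptions made for illustration.

```python
AGREE, DISAGREE = 1, 2


def alienation_index(items):
    """Count of 'agree' answers over ALIENAT1..ALIENAT6 (assumed scoring rule).

    Returns None when any item is coded other than 1 or 2 (missing data).
    """
    if any(value not in (AGREE, DISAGREE) for value in items):
        return None
    return sum(1 for value in items if value == AGREE)


# Toy respondents, invented for illustration; not GSS data.
print(alienation_index([1, 1, 2, 2, 1, 2]))
print(alienation_index([1, 1, 2, 0, 1, 2]))
```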
Your Turn
Create an index of parental education:
(MAEDUC + PAEDUC)/2
For More Information, See:
http://sda.berkeley.edu/HELPDOCS/helpnewv.htm#compute
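
The parental education formula, (MAEDUC + PAEDUC)/2, looks like this as a Python sketch. Returning a missing value when either parent's education is missing is an assumption of this sketch, and `parental_education` is a hypothetical name.

```python
def parental_education(maeduc, paeduc):
    """Mean of mother's and father's years of schooling.

    Returns None if either value is missing (assumed missing-data rule).
    """
    if maeduc is None or paeduc is None:
        return None
    return (maeduc + paeduc) / 2


# Toy values, invented for illustration; not GSS data.
print(parental_education(12, 16))
print(parental_education(None, 12))
```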
Subsetting and Downloading
Example: create and download a subset of the GSS cumulative file, selecting only cases from 2008, all Case Identification variables, and some Personal and Family Information variables (MARITAL, AGEWED, DIVORCE, WIDOWED).
At the end of each intermediate step, click on the “Continue” button.
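
Conceptually, subsetting filters cases (rows) and selects variables (columns). The Python sketch below shows that logic on invented records. It is not SDA's download procedure, and it assumes YEAR and ID stand in for the Case Identification variables; `subset` and `to_csv` are hypothetical helpers.

```python
import csv
import io

# Variables to keep; YEAR and ID are assumed stand-ins for the case identifiers.
KEEP = ["YEAR", "ID", "MARITAL", "AGEWED", "DIVORCE", "WIDOWED"]

# Toy records, invented for illustration; not GSS data.
rows = [
    {"YEAR": 2006, "ID": 1, "SEX": 1, "MARITAL": 1, "AGEWED": 24, "DIVORCE": 2, "WIDOWED": 2},
    {"YEAR": 2008, "ID": 1, "SEX": 2, "MARITAL": 3, "AGEWED": 21, "DIVORCE": 1, "WIDOWED": 2},
    {"YEAR": 2008, "ID": 2, "SEX": 1, "MARITAL": 1, "AGEWED": 30, "DIVORCE": 2, "WIDOWED": 2},
]


def subset(records, year, keep):
    """Keep only cases from `year`, and only the variables named in `keep`."""
    return [
        {name: record[name] for name in keep}
        for record in records
        if record["YEAR"] == year
    ]


def to_csv(records, fieldnames):
    """Serialize the subset as CSV text (one possible download format)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


extract = subset(rows, year=2008, keep=KEEP)
csv_text = to_csv(extract, KEEP)
print(csv_text)
```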
SPSS Syntax File
Creating an SPSS system file

 Run the SPSS (syntax) file against the data (ASCII) file.
 For more information, see
    http://www.ssric.org/data/icpsr_direct (scroll down)
    http://www.ssric.org/data/icpsr_direct (scroll to “Syntax Files”)
    http://www.icpsr.com/cocoon/ICPSR/FAQ/0062.xml
    http://web.pdx.edu/~stipakb/download/Data/SDA_data_to_SPSS.pdf (portions outdated)
File Directory
Your Turn
Subset and download your own custom GSS SPSS system file.
Sample Instructional Applications: Crosstabs With a Control Variable
Example 1
GSS Cumulative File (selecting 2002 and 2004 only):
1. Crosstab voting in the 2000 election (VOTE00) by computer usage (COMPUSE).
2. Repeat, but with a control for respondent’s education level (DEGREE).
Example 2
ANES 2004 Study:
Instructor’s note: In addition to using this example in teaching use of control variables, I also use it in teaching about reactivity in interviewing.
1. Run frequency distribution for V5205 (Working mother can have warm relationship with kids).
2. Crosstab V5205 with V1109a (Respondent gender). Weight by Post-election weight.
3. Repeat, but use V4103 (Interviewer gender) as independent variable.
4. Run frequency distribution for V4103.
5. Repeat #1 with a control for V4103.
6. Repeat #2 with a control for V1109a.
Teaching Resources for SDA and Developing Instructional Materials
ICPSR Web-Based Instructional Materials
http://www.icpsr.umich.edu/ICPSR/training/index.html#instructional
Investigating Community & Social Capital
http://www.icpsr.umich.edu/ICSC/index.html
Voting Behavior: the 2004 Election
http://www.icpsr.umich.edu/SETUPS/index.html