Introductory Workshop SPSS
CSU Fresno
March 12, 2010
Social Science Research and
Instructional Council (SSRIC)
• Discipline council for the social sciences
made up of representatives from each
campus in the CSU. A list of campus
representatives can be found at
http://www.ssric.org/reps
• Promotes use of data analysis in research
and teaching
• Website is at http://www.ssric.org
Social Science Data Bases
• The SSRIC helps maintain and promote
the use of the social science data bases in
the CSU
• Data bases include:
– Inter-university Consortium for Political and
Social Research (ICPSR)
– The Field Institute
– The Roper Center for Public Opinion
Research
Agenda for the Introductory
SPSS Workshop
• Overview of SPSS
• A brief tour
• Creating your own SPSS data file or opening a data file you
got somewhere else
• Transforming data
– Recode
– Compute
– Select If
• Univariate analysis
– Frequencies
– Descriptives
– Explore
• A look ahead at the intermediate workshop – March 15 from 9:00 am
to noon
Overview of SPSS
• SPSS is a statistical package for
beginning, intermediate, and advanced
data analysis
• Other statistical packages include SAS
and Stata
• Online statistical packages that don’t
require site licenses include SDA
Text – SPSS for Windows
Version 16: A Basic Tutorial
• Authors: Linda Fiddler (Bakersfield), Laura
Hecht (Bakersfield), Ed Nelson (Fresno),
Elizabeth Nelson (Fresno), Jim Ross
(Bakersfield)
• Available from McGraw-Hill Custom Publishing.
Call 800-338-3987 to order. Request ISBN 007-353833-7
• Available on the web at
http://www.ssric.org/trd/spss16. The data
set for this workshop can be downloaded at this
site
SPSS Files and Extensions
• Portable file -- .por
• Data file -- .sav
• Output file -- .spo
• Syntax file -- .sps
Opening SPSS
• Go to the Start menu and find SPSS for Windows
• Click on SPSS 16.0 to open it
• You’ll need to update your SPSS license
every year (or your school technician will
do it for you)
A Brief Tour of SPSS
(see ch. 1 in text)
• Frequencies -- Analyze/Descriptive
Statistics/Frequencies
– Select ABANY, move it into the “Variable(s)” box, and click on
OK
• Crosstabs – Analyze/Descriptive
Statistics/Crosstabs
– Move ABANY to the “Row” box
– Move SEX to the “Column” box
– Click on “Cells” and select “Column” percents
– Click on OK
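To make “Column” percents concrete, here is a small Python sketch of the arithmetic (an illustration, not SPSS code). Each cell count is divided by the total of its column, so every column adds up to 100% and men and women can be compared directly. The counts and the function name column_percents are made up for the example.

```python
# Sketch of column percentages in a crosstab (hypothetical data, not GSS results).

def column_percents(table):
    """table maps row label -> {column label: count}; returns the same shape in percents."""
    columns = {col for row in table.values() for col in row}
    # Total for each column, summed over all rows
    col_totals = {col: sum(row.get(col, 0) for row in table.values()) for col in columns}
    return {
        row_label: {col: 100.0 * row.get(col, 0) / col_totals[col] for col in columns}
        for row_label, row in table.items()
    }

# Hypothetical ABANY (rows) by SEX (columns) counts
counts = {
    "Yes": {"Male": 40, "Female": 60},
    "No": {"Male": 60, "Female": 90},
}
pct = column_percents(counts)
print(pct["Yes"]["Male"], pct["No"]["Male"])  # 40.0 60.0
```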
A Brief Tour Continued
• Comparing means – Analyze/Compare
Means/Means
– Move AGEKDBRN and EDUC into the
“Dependent List” box
– Move SEX to the “Independent List” box
– Click on OK
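Analyze/Compare Means/Means computes the mean of each dependent variable within each category of the independent variable (by default it also reports N and the standard deviation). A minimal Python sketch of the mean-and-N part, assuming missing values are stored as None; the EDUC and SEX values and the function name are made up for illustration.

```python
from collections import defaultdict

def means_by_group(values, groups):
    """Mean and count of `values` within each category of `groups`; None means missing."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for v, g in zip(values, groups):
        if v is None or g is None:  # skip cases missing on either variable
            continue
        sums[g] += v
        counts[g] += 1
    return {g: (sums[g] / counts[g], counts[g]) for g in sums}

# Hypothetical EDUC values and SEX codes (1 = male, 2 = female)
educ = [12, 16, 14, None, 12, 18]
sex = [1, 2, 1, 2, 2, 1]
print(means_by_group(educ, sex))
```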
A Brief Tour Continued
• Correlations
– Analyze/Correlate/Bivariate
– Move EDUC, MAEDUC, and PAEDUC into
the “Variables” box
– Click on OK
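The statistic in the Bivariate Correlations table is Pearson's r by default. The Python sketch below (hypothetical function name, made-up data) computes it using only the cases that have valid values on both variables, which mirrors SPSS's default of excluding cases pairwise.

```python
import math

def pearson_r(x, y):
    """Pearson correlation using only cases valid on both variables (pairwise deletion)."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)  # cross-products of deviations
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical EDUC and PAEDUC values; the third case is missing PAEDUC
educ = [12, 16, 14, 18, 10]
paeduc = [10, 14, None, 12, 8]
print(round(pearson_r(educ, paeduc), 3))
```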
A Brief Tour Continued
• Scatterplots
– Graphs/Legacy Dialogs/Scatter/Dot
– Click on “Simple Scatter” and then on “Define”
– Move EDUC into the “Y Axis” box
– Move PAEDUC into the “X Axis” box
– Click on OK
Creating Your Own SPSS Data File
(see ch. 2 in text)
• Involves creating:
– Variable names
– Variable labels
– Value labels
– Missing values
Creating a Data File in SPSS
• Questions (see p. 11)
– Age
– Sex
– Religious preference
– Political views
– Type of marriage preferred
– Opinion on abortion (7 different questions)
Basic Steps in Creating a Data File
• Assign an identification number to each case
• Assign each variable a variable name and
an extended variable label
• Each variable will have a set of values.
Assign each value an extended value label
• If a variable has missing information,
decide which values will be used as the
missing values
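The four pieces of information above (variable name, variable label, value labels, and missing values) are what you will type into Variable View. As a way of picturing them, here is a hypothetical Python record for one variable. VariableDef is not part of SPSS, and treating 9 as the missing-value code for SEX is an assumption made only for the example.

```python
from dataclasses import dataclass, field

@dataclass
class VariableDef:
    """Hypothetical mirror of one row of Variable View (not an SPSS API)."""
    name: str                                          # short variable name, e.g. "SEX"
    label: str                                         # extended variable label
    value_labels: dict = field(default_factory=dict)   # value -> extended value label
    missing: set = field(default_factory=set)          # values treated as missing

    def describe(self, value):
        """Return the value label, or a marker if the value is declared missing."""
        if value in self.missing:
            return "(missing)"
        return self.value_labels.get(value, str(value))

sex = VariableDef("SEX", "Respondent's sex", {1: "Male", 2: "Female"}, missing={9})
print(sex.describe(2), sex.describe(9))  # Female (missing)
```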
Variable Names
• Traditionally variable names had to be 8
characters or less, start with a letter, and contain
no embedded blanks
• Now they can be longer than 8 characters, but
we’ll stick with names of 8 or fewer characters
• Names can contain some special characters, but
not all of them. Hyphens (-), for example, are
not allowed, so the only special character we
use in names is the underscore (_)
Variable Names
• Age is named AGE
• Sex is named SEX
• Religious preference is named REL
• Political orientation is named C_L
• Preferred marriage is named MG
• There are seven abortion variables and
they are named ABD, ABN, ABH, ABP,
ABR, ABS, ABA
Entering the Information
for a Data File
• You already have SPSS open
• Click on File/New/Data
• You should see a blank data screen that
looks like a spreadsheet
• At the bottom are two tabs called “Data
View” and “Variable View”. Click on
“Variable View”
Defining the Variables
• Enter the variable names in the “Name”
column in the order you want them
• Enter the variable labels in the “Label” column
• Enter the value labels in the “Values” column.
To do this you will need to click in the
appropriate cell and then click in the little gray
box on the right
• Enter the missing values in the “Missing”
column. To do this you will need to click in the
appropriate cell and then click in the little gray
box on the right
Adding in the Data
• Now that you have defined the variables, click
on the tab at the bottom called “Data View” and
enter the data into the appropriate cells. The
data are on p. 18 of the text
• Once you have entered the data, go back and
check to make sure you didn’t make any data
entry errors
• Congratulations! You have created a SPSS data
file. You could also enter the data using a
spreadsheet like Excel
Saving the Data File
• Now you want to save your data file
• Click on File/Save As. The default is to save
it as a SPSS data file with .sav as the
extension
• Give it a file name and indicate where you
want to save it on your hard drive or on
your flash drive
Opening an Existing File
You Got Somewhere Else
• Often you will want to open a data set that you
got from someplace else such as:
– ICPSR
– Field Institute
– Roper Center
• These files will usually be in the form of a:
– SPSS portable file (.por)
– SPSS data file (.sav)
– Raw data file with a SPSS syntax file (.sps)
– Raw data file without a syntax file
Opening a Portable file
• Click on the yellow open-folder icon to open an
existing file
• Change file type to .por
• Browse to where the portable file you want
to open is located and double click on that
file
Opening an SPSS Data File
• Click on the yellow open-folder icon to open an
existing file
• Change file type to .sav
• Browse to where the data file you want to open
is located and double click on that file
• We’re going to use the data set that comes with
the text – gss06a.sav. You can download it from
the web site that has the text --
http://www.ssric.org/tr/onlinetextbooks.
Look for the text – “Right click here to download
GSS06A.”
Opening a Raw Data File with a
SPSS Syntax File
• Sometimes you will need to open a raw data file
(ASCII or text) and there will be an
accompanying SPSS syntax file
• You will need to modify the “File Handle” and
“Save Outfile” commands
• See
http://www.ssric.org/files/ASCII_to_SPSS.pdf
and
http://www.icpsr.umich.edu/cocoon/ICPSR/FAQ/0062.xml
for more information
• You may need help doing this. Feel free to
contact me for help
Opening a Raw Data File Without
a SPSS Syntax File
• If you don’t have a SPSS syntax file you
will have to use the codebook that came
with the data and create your own syntax
file
• You may need help doing this. Feel free to
contact me for help
Choosing Options in SPSS
• Click on “Edit” and “Options.”
• General tab -- under “Variable Lists,”
check “Display Names” and “Alphabetical.”
• Output Labels tab -- select “Names and
Labels” in the first box, and “Values and
Labels” in the second.
What’s Next?
• Now you know how to create a SPSS data
file and how to open an existing SPSS
portable or data file
• Next we’ll learn how to transform variables
Transforming Data
(see ch. 3 in text)
• We can transform variables by recoding, which
means combining the categories of an existing
variable into fewer categories
• We can transform variables by creating new
variables out of existing variables
• We can select particular cases and analyze only
these cases
• We can do other things like weighting cases that
we’re not going to talk about in this workshop.
Recoding Variables
• Recoding into different variables
• Recoding into the same variable
• We recommend recoding into different
variables rather than into the same variable,
because recoding into the same variable
overwrites the original values
Recoding into Different Variables
• Click on “Transform” and then on “Recode”
and then on “into different variables”
• Select the variable you want to recode
• Start by giving the new variable a new
name and assigning a variable label to the
new variable. Click on “Change”
Recoding AGE into AGE1
• Recode AGE into four categories and give it the
name of AGE1
– Click on “Old and New Values”
• Use “Range” (fourth option down) to recode as
follows. Remember to click on “Add” after
entering each recode
– 18 to 29 = 1
– 30 to 49 = 2
– 50 to 69 = 3
– 70 to 89 = 4
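The recode above amounts to the following rule, written as a Python sketch (recode_age is a made-up name, not SPSS code). Ranges include both endpoints, as SPSS ranges do, and any value that falls outside every range becomes missing, which is what happens to unspecified values when you recode into a different variable.

```python
def recode_age(age):
    """Recode AGE into AGE1 using the four ranges; None stands for a missing value."""
    ranges = [(18, 29, 1), (30, 49, 2), (50, 69, 3), (70, 89, 4)]
    if age is None:
        return None
    for low, high, new_value in ranges:
        if low <= age <= high:  # a range includes both of its endpoints
            return new_value
    return None  # values not covered by any range end up missing in the new variable

print([recode_age(a) for a in (18, 29, 30, 65, 89)])  # [1, 1, 2, 3, 4]
```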
Recoding Options
• When you click on “Old and New Values”
there will be seven options
• For most recoding you will only have to
use two of these options
– The first option from the top allows you to
recode a single value into a new value
– The fourth option from the top allows you to
recode a range of values from X to Y into a
new value
Assign Value Labels to the
Four Categories of AGE1
• Go into “Variable View”
• Find the variable AGE1 (should be at the
bottom of the list of variables)
• Click in the “Values” column and then click
on the small gray box
• Enter the value labels
• Click on OK
Exercises for Recoding
• INCOME06 is total family income. Do a
frequency distribution to see what it looks like
before recoding
• Recode into 4 categories and call this new
variable INCOME1. Use the following
categories: under $20K, $20K to under $40K,
$40K to under $60K, and $60K and over
• Add the value labels
• Run a frequency distribution for INCOME1 and
check to make sure that you recoded it correctly
by comparing the unrecoded and recoded
frequency distributions
More Exercises for Recoding
• Now recode INCOME06 again and call the new
variable INCOME2
• This time use 8 categories: under $10K, $10K
to under $20K, $20K to under $30K, $30K to
under $40K, $40K to under $50K, $50K to under
$60K, $60K to under $75K, and $75K and over
• Add the value labels
• Run a frequency distribution for INCOME2 and
check to make sure that you recoded it correctly
by comparing the unrecoded and recoded
frequency distributions
Creating a New Variable
with Compute
• Let’s create a new variable and call it
ABORTION which is the sum of the seven
abortion variables
• Click on “Transform” and then on “Compute”
• Enter the new variable name (ABORTION) into
the target variable box
• Enter the formula for this new variable into the
“Numeric Expression” box: abany + abdefect +
abhlth + abnomore + abpoor + abrape + absingle
• Click on OK
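Summing with plus signs has one important consequence: if any of the seven variables is missing for a case, the whole sum is missing for that case. A Python sketch of that behavior, using None for a missing value, hypothetical answers coded 1 and 2, and a made-up function name:

```python
def sum_with_plus(values):
    """Mimic an a + b + ... expression: any missing input (None) makes the result missing."""
    if any(v is None for v in values):
        return None
    return sum(values)

# Hypothetical answers to the seven abortion items
complete = [1, 2, 1, 1, 2, 2, 1]
one_missing = [1, 2, None, 1, 2, 2, 1]
print(sum_with_plus(complete), sum_with_plus(one_missing))  # 10 None
```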
Dealing with Missing Data
• If there is missing data for any of these variables
(ABANY to ABSINGLE), the new variable
ABORTION will be assigned a system missing
value
• What do we do if we want to allow no more than
two missing values?
• Instead, let’s compute the mean: the sum of
the abortion values divided by the number of
variables with valid information
• But let’s allow only two variables with missing
values
Dealing with Missing Data
Continued
• Click on “Reset” to erase what is currently in the
“Compute Variable” box
• Click on “Statistical” in the “Function Group” box
• Then double click on “Mean” in the “Functions
and Special Variables” box
• In the “Target Variable” box, enter the name of
the new variable. Let’s call it ABORMEAN
• In the “Numeric Expression” box, you should see
“MEAN(?,?)”
Dealing with Missing Data
Continued
• Replace the “?,?” with the variables you
want to include so it reads “MEAN(abany,
abdefect,abhlth,abnomore,abpoor,abrape,
absingle)”
• Insert .5 right after MEAN so it reads
“MEAN.5(abany,abdefect,abhlth,abnomore,
abpoor,abrape,absingle)”. This means at least
five of the seven variables must have valid
information, so no more than two can be missing
• Click on OK
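The “.5” suffix works as a threshold on the number of valid values. The Python sketch below (mean_n is a hypothetical name, not an SPSS function) averages only the valid values and returns missing when fewer than min_valid of them are available, which is how we understand MEAN.n to behave.

```python
def mean_n(values, min_valid):
    """Mean of the valid (non-None) values, or None if fewer than min_valid are valid."""
    valid = [v for v in values if v is not None]
    if len(valid) < min_valid:
        return None
    return sum(valid) / len(valid)

two_missing = [1, 2, None, 1, None, 2, 1]        # 5 valid values
three_missing = [1, None, None, 1, None, 2, 1]   # 4 valid values
print(mean_n(two_missing, 5), mean_n(three_missing, 5))  # 1.4 None
```

The same function with min_valid=4 would match the SPKMEAN exercise below, where one of five variables may be missing.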
Exercises for Compute
• There are five variables that measure
tolerance for letting someone speak in
your community who may have different
views than your own: SPKATH,
SPKCOM, SPKHOMO, SPKMIL, and
SPKRAC
• For each of these variables, 1 means they
would allow such a person to speak and 2
means they would not allow it
Exercises for Compute Continued
• Create a new variable (call it SPEAK)
which is the sum of these five variables
• Run a frequency distribution for SPEAK
• What do the values in this new variable tell
us?
More Exercises for Compute
• Now let’s create a variable called
SPKMEAN which allows for one of the five
variables (SPKATH to SPKRAC) to be
missing
• What happens if there is more than one
variable with a missing value?
• How does SPSS calculate the new
variable if there is only one variable with a
missing value?
Using Select Cases to Select
Specific Cases for Analysis
• Let’s select only Protestants for further analysis
• Click on “Data” and then on “Select Cases”
• Click on “If condition is satisfied” and then on the
“If” button below it
• Select the variable RELIG and move it into the
box on the right
• In this box, enter the expression “relig = 1”
• Click on “Continue” and then on OK
Using Select Cases Continued
• Now let’s select Protestants who are under
35 years old
• Enter the expression “relig = 1” as you did
before
• Use & for “and.” Add “age < 35” so the
expression reads “relig = 1 & age < 35”
• Click on “Continue” and then on OK
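Select Cases keeps only the cases for which the condition is true. In this Python sketch (made-up cases, hypothetical helper name) a case with a missing age is not selected, since SPSS likewise does not select a case whose condition cannot be evaluated because of missing data.

```python
# Hypothetical cases; field names follow the GSS variables used above.
cases = [
    {"id": 1, "relig": 1, "age": 28},
    {"id": 2, "relig": 2, "age": 30},
    {"id": 3, "relig": 1, "age": 52},
    {"id": 4, "relig": 1, "age": 34},
    {"id": 5, "relig": 1, "age": None},
]

def select_cases(rows, condition):
    """Keep only the rows for which condition(row) is True."""
    return [r for r in rows if condition(r)]

protestants = select_cases(cases, lambda r: r["relig"] == 1)
# A missing age cannot satisfy "age < 35", so that case is left out.
young_protestants = select_cases(
    cases, lambda r: r["relig"] == 1 and r["age"] is not None and r["age"] < 35
)
print([r["id"] for r in young_protestants])  # [1, 4]
```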
Exercises for Select If
• Select all males (1 on the variable SEX)
and do a frequency distribution for the
variable FEAR (afraid to walk alone at
night in the neighborhood)
• Now select all females (2 on the variable
SEX) and run a frequency distribution for
FEAR
• Are males or females more fearful of
walking alone at night?
More Exercises for Select If
• Now let’s select males under age 35 and
run a frequency distribution for FEAR
• Do the same thing for females under 35
• Are males or females under 35 more
fearful of walking alone at night?
Important Note on Using
Select Cases
• When you are finished using “Select
Cases” and want to revert to using all the
cases be sure to click on Data/Select
Cases and select “All cases”. Then click
on OK
• If you don’t do this, you will continue to
use only those cases you last selected
Univariate Analysis
• Now that we know how to open existing
files and transform variables, we’re ready
to begin analyzing data
• Univariate analysis refers to analyzing
variables one-at-a-time
Types of Univariate
Analysis Procedures
(see ch. 4 in text)
• Frequencies
• Descriptives
• Explore
Frequencies
• Go to Analyze/Descriptive
Statistics/Frequencies
• Select ABANY and AGE and click on OK
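The Frequencies table reports, for each value, the count, the percent of all cases, the valid percent (missing cases left out), and the cumulative percent. A Python sketch of those columns, assuming missing values are stored as None and using made-up ABANY codes and a hypothetical function name:

```python
from collections import Counter

def frequencies(values):
    """Rows of (value, count, percent, valid percent, cumulative percent)."""
    total = len(values)
    valid = [v for v in values if v is not None]
    counts = Counter(valid)
    rows = []
    cumulative = 0.0
    for value in sorted(counts):
        n = counts[value]
        valid_pct = 100.0 * n / len(valid)   # missing cases excluded from the base
        cumulative += valid_pct
        rows.append((value, n, 100.0 * n / total, valid_pct, cumulative))
    return rows

abany = [1, 2, 2, None, 1, 2, 2, None]  # hypothetical codes, None = missing
for row in frequencies(abany):
    print(row)
```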
Bar Charts
• Bar charts – click on Analyze/Descriptive
Statistics/Frequencies
• Click on “Charts”
• Select “Bar Charts” and click on
“Continue” and then on OK
• Do you think bar charts are appropriate for
both ABANY and AGE?
Histograms
• Click on Analyze/Descriptive
Statistics/Frequencies
• Click on “Charts”
• Select “Histograms” and click on “Continue” and
then on OK
• Do you think histograms are appropriate for both
ABANY and AGE?
• Which do you think is the most appropriate chart
(bar chart or histogram) for ABANY and for
AGE?
Statistics
• Click on Analyze/Descriptive
Statistics/Frequencies
• Click on “Statistics”
• Select the statistics you want and click on
“Continue” and then on OK
Exercises for Frequencies
• There are seven variables dealing with
abortion: ABANY, ABDEFECT, ABHLTH,
ABNOMORE, ABPOOR, ABRAPE, and
ABSINGLE
• Run a frequency distribution for each
variable
• Get a bar chart for each variable
• Compare and contrast how people
answered these seven questions
More Exercises for Frequencies
• Run the frequency distribution for AGE
• Get a histogram for AGE
• Compute the following statistics for AGE:
– Mean
– Median
– Standard deviation
– Percentiles – 25th, 50th, and 75th
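To check your output, the same statistics can be computed with Python's standard statistics module on a handful of made-up ages. The standard deviation divides by n - 1, as SPSS does. Percentile definitions vary between programs, so SPSS's 25th and 75th percentiles may not match these exactly.

```python
import statistics

ages = [23, 35, 41, 29, 67, 52, 38, 45, 71, 30]  # hypothetical AGE values

mean = statistics.mean(ages)
median = statistics.median(ages)
sd = statistics.stdev(ages)  # sample standard deviation (divides by n - 1)
# quantiles(n=4) returns the 25th, 50th, and 75th percentile cut points;
# method="exclusive" interpolates at position (n + 1) * p in the sorted data.
p25, p50, p75 = statistics.quantiles(ages, n=4, method="exclusive")
print(mean, median, round(sd, 2), p25, p50, p75)
```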
Descriptives
• Click on Analyze/Descriptive
Statistics/Descriptives
• Select AGE and EDUC
• Click on “Options” and select the statistics
you want and then click on “Continue” and
OK
Exercises for Descriptives
• Use Descriptives to compute the following
statistics for AGE
– Mean
– Standard deviation
– Variance
– Skewness
– Kurtosis
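Variance, skewness, and kurtosis are all calculated from deviations around the mean. The Python sketch below uses the bias-adjusted formulas that, to our knowledge, SPSS Descriptives reports (the same ones as Excel's SKEW and KURT); treat it as an illustration rather than a reference implementation. On this scale a normal distribution has skewness and kurtosis near 0.

```python
import math

def describe(values):
    """Mean, sample variance, and bias-adjusted skewness and kurtosis (needs n >= 4)."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / (n - 1)
    sd = math.sqrt(variance)
    z = [(x - mean) / sd for x in values]  # standardized scores
    skewness = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)
    kurtosis = (
        n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
        - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    )
    return mean, variance, skewness, kurtosis

print(describe([1, 2, 3, 4, 5]))
```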
More Exercises for Descriptives
• Use Descriptives to compute the mean for
EDUC, MAEDUC, and PAEDUC
• Who has more education – respondents
or their parents?
• Who has more education – mothers or
fathers?
Explore
• Click on Analyze/Descriptive
Statistics/Explore
• Select EDUC and put it in the “Dependent
List”
• In the Display box on the lower left, click
on “Both”
• Click on OK
Selecting Statistics for Explore
• Click on Analyze/Descriptive
Statistics/Explore
• Click on “Statistics” and select the
statistics you want
• Click on “Continue” and then OK
Selecting Plots for Explore
• Click on “Plots”
• Select the plots you want
• Click on “Continue” and then OK
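The Explore boxplot draws the box from the lower to the upper hinge and flags unusual cases: points more than 1.5 box lengths beyond the box are marked as outliers, and points more than 3 box lengths beyond it as extremes. A Python sketch of that rule, assuming Tukey's hinges and made-up EDUC values (the function names are hypothetical):

```python
import statistics

def tukey_hinges(values):
    """Medians of the lower and upper halves of the sorted data
    (the middle value belongs to both halves when n is odd)."""
    data = sorted(values)
    half = (len(data) + 1) // 2
    return statistics.median(data[:half]), statistics.median(data[-half:])

def classify_boxplot_points(values):
    """Label points beyond the box as 'outlier' (1.5 to 3 box lengths) or 'extreme' (over 3)."""
    lower, upper = tukey_hinges(values)
    box = upper - lower
    labels = {}
    for x in values:
        distance = max(lower - x, x - upper, 0)  # how far the point lies outside the box
        if distance > 3 * box:
            labels[x] = "extreme"
        elif distance > 1.5 * box:
            labels[x] = "outlier"
    return labels

educ = [8, 12, 12, 12, 13, 14, 16, 16, 20, 30]  # hypothetical years of schooling
print(tukey_hinges(educ), classify_boxplot_points(educ))
```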
Exercises for Explore
• Use Explore to get the following statistics and
plots for the variables EDUC, PAEDUC, and
MAEDUC
– Descriptives
– Outliers
– Stem-and-leaf plot
– Histogram
– Boxplot
• First select “Factor levels together” and run it
• Then select “Dependents together” and run it again
• What’s the difference?
Intermediate Workshop for SPSS
• In the next workshop we’ll look at different types
of statistical analysis you can do in SPSS
– Cross tabulations (ch. 5)
– Comparing means (ch. 6)
– Correlation and regression (ch. 7)
– Multivariate analysis (ch. 8): cross tabulations
and multiple regression
– Presenting your data – charts and tables (ch. 9)