SOCY3700
Selected Overheads
Prof. Backman
Spring 2008
Central Limit Theorem
If repeated random samples of size n
are drawn from any population with
mean μ and standard deviation δ,
the sampling distribution of sample
means will approach normality as n
gets large, with mean μ and standard
deviation δ/√n (also known as the
standard error of the mean).
Hence, the standard deviation of the
means drawn from many, many
samples reflects 1) the standard
deviation of the population, and 2)
the sample size
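A minimal simulation sketch of the theorem, using only the Python standard library. The exponential population (mean 1, standard deviation 1), the sample size of 50, and the 2,000 repetitions are illustrative assumptions, not values from these overheads, and the function name is hypothetical.

```python
import math
import random
import statistics


def simulate_sample_means(n, num_samples, seed=0):
    """Draw num_samples random samples of size n from a skewed population
    (exponential, mean 1, standard deviation 1) and return the sample means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(num_samples)
    ]


means = simulate_sample_means(n=50, num_samples=2000)

# Standard deviation of the simulated means versus the theorem's prediction.
observed_se = statistics.stdev(means)
predicted_se = 1.0 / math.sqrt(50)  # population sd divided by sqrt(n)
print("mean of sample means:", round(statistics.fmean(means), 3))
print("observed standard error:", round(observed_se, 3))
print("predicted standard error:", round(predicted_se, 3))
```

Even though the population is far from normal, the two standard errors should come out close to each other, and rerunning with a larger n should shrink both.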
Probability Sampling
• Probability sampling is any
method of drawing a sample of
elements from a population
such that the probability that
any element or set of elements
will be included in the sample
is known and is not zero
• The chief advantage of
probability sampling is that
the accuracy (or lack thereof)
of estimates of population
parameters from the sample
can be estimated
Finite Populations and
Sampling
• Sampling error estimation depends
on the Central Limit Theorem
• The Central Limit Theorem applies
to infinite populations
– Infinite populations are easy to assume
in theory, but rare in practice
• If you sample everyone in a finite
population, the sampling error
would be 0
– The closer you get to sampling
everyone, the smaller your error
should be
– Central Limit Theorem says error is
proportional to δ/√n
Finite Populations and
Sampling, cont.
• The finite population correction
factor (fpc) takes into account the
reduction in error you should get
from sampling all or a large fraction
of a finite population
• The fraction of the population that
is in the sample, n/N, is called the
sampling ratio (f)
• fpc = (N-n)/(N-1) ≈ (N-n)/N
= (1 – f)
• The standard error of the mean
from a finite population (with
simple random sampling) is
√fpc * (δ/√n)
• In practice, we ignore the fpc when
the sampling ratio is less than 10%
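The correction can be sketched in a few lines of Python. The function names and the example values (standard deviation of 10, sample of 100) are assumptions chosen for illustration; the formulas are the ones on the overhead.

```python
import math


def fpc(N, n):
    """Finite population correction factor, (N - n) / (N - 1)."""
    if N < 2 or not 0 < n <= N:
        raise ValueError("need N >= 2 and 0 < n <= N")
    return (N - n) / (N - 1)


def standard_error_finite(sd, n, N):
    """Standard error of the mean under simple random sampling
    from a finite population: sqrt(fpc) * sd / sqrt(n)."""
    return math.sqrt(fpc(N, n)) * sd / math.sqrt(n)


# Same sample size, increasingly large populations (illustrative values).
for N in (100, 1_000, 1_000_000):
    print(N, round(standard_error_finite(10, 100, N), 4))
```

When the sample is the whole population the error is zero, and when the sampling ratio is tiny the result is essentially the uncorrected δ/√n, which is why the fpc is ignored for small sampling ratios.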
Simple Random Sampling
(SRS)
• Frame – complete list of the
survey population
• Sample size – calculated
based on desired precision of
results
• Selection rule – random
selection without replacement
• Estimate of population mean is
the sample mean
– Unbiased
– s.e. = √fpc * (δ / √sample size)
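A rough sketch of the SRS recipe above in Python. The frame here is a made-up list of 1,000 ages, the function name is hypothetical, and the sample standard deviation stands in for the unknown population δ, which is an assumption added for this example.

```python
import math
import random
import statistics


def srs_estimate(frame_values, n, seed=0):
    """Draw a simple random sample without replacement and return
    (sample, estimated mean, estimated standard error)."""
    rng = random.Random(seed)
    sample = rng.sample(frame_values, n)  # selection without replacement
    N = len(frame_values)
    mean = statistics.fmean(sample)
    correction = (N - n) / (N - 1)        # fpc
    # Sample sd is used in place of the unknown population sd.
    se = math.sqrt(correction) * statistics.stdev(sample) / math.sqrt(n)
    return sample, mean, se


# Hypothetical frame: ages of 1,000 people on a complete roster.
frame = [18 + (i * 37) % 60 for i in range(1000)]
sample, mean, se = srs_estimate(frame, n=50)
print("estimated mean age:", round(mean, 2), "s.e.:", round(se, 2))
```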
Simple Random Sampling:
Advantages and Disadvantages
• SRS advantages
– Samples are easy to draw
– Samples are easy to use
– Estimation of errors is “easy”
• SRS disadvantages
– Not always the lowest standard error
method
– Requires complete roster
– Can be very expensive
• Completing the frame may be expensive
• Reaching geographically dispersed
respondents may be expensive
– May require large sample sizes to deal
with rare population elements
• Most elements in the sample will not be
rare
Telephone Survey Sampling
Frames
There are four methods for sampling
phone numbers for general
population telephone surveys
1. Sample from the phone book
– Problems with unlisteds
2. Random digit dialing
– With known exchanges, use
random number generator to
come up with numbers
– Many non-working numbers
3. Plus-one, minus-one
– Sample from phone book, but add
or subtract 1 from the number
before dialing
4. Buy a list of known working
numbers
– Usually the best solution
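Methods 2 and 3 are mechanical enough to sketch in Python. The exchange "706-555", the NNN-NNN-NNNN format, and the wrap-around from 9999 to 0000 are all assumptions made for this illustration, and both function names are hypothetical.

```python
import random


def random_digit_numbers(exchanges, count, seed=0):
    """Random digit dialing: pick a known exchange, then append
    four random digits to form a number to dial."""
    rng = random.Random(seed)
    return [
        f"{rng.choice(exchanges)}-{rng.randrange(10_000):04d}"
        for _ in range(count)
    ]


def plus_one(number, step=1):
    """Plus-one / minus-one: shift the last four digits of a listed
    number by step (use step=-1 for minus-one)."""
    prefix, last_four = number.rsplit("-", 1)
    return f"{prefix}-{(int(last_four) + step) % 10_000:04d}"


print(random_digit_numbers(["706-555"], count=3))
print(plus_one("706-555-0199"))
```

Both approaches reach unlisted numbers, at the price of dialing some numbers that are not in service.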
Stratified Sampling
• Frame
– Usual SRS frame except broken
into exhaustive, mutually
exclusive groups
– Requires knowledge ahead of
time about how many elements
in the population there are in
each group
– Each group is a stratum (plural
strata)
• Sample size - calculated based
on desired precision of results
– Calculations more complex than
with SRS because there are
more alternatives
Stratified Sampling (2)
• Selection rules
– Cases are drawn from each
stratum
– Cases within strata are drawn by
SRS
– Two alternatives for number
drawn with each stratum
• Proportionate to size – every
element in the population has an
equal chance of being drawn into
the sample, regardless of stratum
• Disproportionate – some strata
will have a larger proportion of the
sample than they will of the
population
Stratified Sampling (3)
• Proportionate sampling is
technically known as
probability proportionate to
size selection, or PPS
• Disproportionate sampling is
non-PPS selection
• Disproportionate sampling can
be used to get enough “rare
elements” into a sample to
allow analysis of such
elements with a reasonable
level of confidence
Stratified Sampling (4)
• Estimation of the mean
– If proportionate to size selection is
used, the sample mean is an unbiased
estimate of the population mean
– If disproportionate selection is used,
weights must be used to obtain an
unbiased estimate of the population
mean
– Standard error of the mean will
ordinarily be lower than the standard
error from a simple random sample of
the same size
– The more homogeneous the elements
are within strata, the more efficient
stratified sampling will be
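A small Python sketch of the two ideas above: allocating a sample proportionately to stratum size, and weighting stratum means by population share when selection was disproportionate. The strata, sizes, and stratum means are invented for illustration, and the function names are hypothetical.

```python
def proportionate_allocation(stratum_sizes, n):
    """Number of cases to draw from each stratum so that each stratum's
    share of the sample matches its share of the population.
    Rounding can make the total differ slightly from n."""
    total = sum(stratum_sizes.values())
    return {name: round(n * size / total) for name, size in stratum_sizes.items()}


def weighted_mean(stratum_sizes, stratum_means):
    """Estimate of the population mean: each stratum's sample mean is
    weighted by that stratum's share of the population."""
    total = sum(stratum_sizes.values())
    return sum(
        stratum_sizes[name] / total * stratum_means[name]
        for name in stratum_sizes
    )


# Hypothetical population counts and sample means by stratum.
sizes = {"urban": 6000, "suburban": 3000, "rural": 1000}
means = {"urban": 10.0, "suburban": 20.0, "rural": 40.0}

print(proportionate_allocation(sizes, 200))
print(weighted_mean(sizes, means))
```

If equal numbers had been drawn from each stratum and simply pooled, the unweighted mean would overstate the influence of the small rural stratum; the weights correct for that.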
Stratified Sampling:
Advantages and Disadvantages
(compared with
Simple Random Sampling)
• Advantages
– Reduced standard errors of estimate
over SRS
– Can thus get the same precision as
SRS with smaller sample size
– If proportionate selection is used,
unweighted sample statistics can be
used to estimate population
parameters
– Disproportionate selection can be
used to get sufficient numbers of
members of rare populations
• Disadvantages
– Requires advance knowledge about
stratum sizes
– Disproportionate selection requires
use of weights in making estimates of
parameters
Cluster Sampling
• Most complex method. Often used
in conjunction with stratification and
SRS; this is called multi-stage
sampling
• Frame
– Broken into groups called clusters
– Complete frame is needed only for
clusters that are selected
• It is necessary to know the size of clusters
that are not selected
• Sample size – usually calculated
based on explicit tradeoff between
costs and precision of results
– Calculations more complex than with
SRS or stratification because there are
more alternatives
Cluster Sampling (2)
• Selection rules
– A sample of the clusters is drawn
by simple random sampling
– Within each cluster either all the
elements or a simple random
sample of the elements are
drawn
– When possible, sample sizes
within clusters are drawn
proportionate to size
– NOTE that in cluster sampling
only some of the clusters are
used, while in stratified sampling,
all of the strata are
Cluster Sampling (3)
• Estimation of the mean
– If clusters and elements within clusters
were drawn so that all elements in the
population had equal probabilities of
selection, the sample mean is an
unbiased estimate of the population
mean. This rarely is possible
– In the likely case of unequal
probabilities of selection, weights must
be used to obtain an unbiased
estimate of the population mean
– Standard error of the mean will
ordinarily be higher than the standard
error from a simple random sample of
the same size
– The more heterogeneous the
elements are within clusters, the more
efficient cluster sampling will be
• To the extent possible, each cluster should
be representative of the entire population
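A sketch of a simple two-stage cluster design in Python, with the weight for each sampled element computed as the inverse of its probability of selection. The frame of city blocks and households is hypothetical, as is the function name; real designs are usually more elaborate.

```python
import random


def two_stage_sample(clusters, num_clusters, per_cluster, seed=0):
    """Stage 1: simple random sample of clusters.
    Stage 2: simple random sample of elements within each chosen cluster.
    Returns (cluster, element, weight) tuples; weight is the inverse of
    the element's overall probability of selection."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), num_clusters)
    p_cluster = num_clusters / len(clusters)
    rows = []
    for name in chosen:
        members = clusters[name]
        k = min(per_cluster, len(members))
        p_within = k / len(members)
        weight = 1.0 / (p_cluster * p_within)
        rows.extend((name, element, weight) for element in rng.sample(members, k))
    return rows


# Hypothetical frame: 10 city blocks of 20 households each.
blocks = {
    f"block{b:02d}": [f"hh{b:02d}-{h:02d}" for h in range(20)]
    for b in range(10)
}
rows = two_stage_sample(blocks, num_clusters=4, per_cluster=5)
print(len(rows), "households sampled; weight of the first:", rows[0][2])
```

Note that a roster of households is needed only for the four blocks actually chosen, which is where the cost savings come from.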
Cluster Sampling:
Advantages and Disadvantages
(compared with
Simple Random Sampling)
• Advantages
– Cost control
• In general, the only reason to use
clustering is to reduce financial or time
costs
– Can be used with stratification of
clusters to help control standard errors
– If proportionate selection is used,
unweighted sample statistics can be
used to estimate population
parameters
• Disadvantages
– Sampling consultant probably needed
– Larger standard errors than with SRS
– Parameter and error estimation
usually requires use of weights
Sample Pathologies
• Biggest, most common
problem: non-response
– Estimation of parameters and
errors assumes that data were
collected from every element in
the sample
• Limitations on generalizability
due to mismatch between the
population of interest (target
population) and the frame
(survey population)
– Called coverage error
Surveys and the
US Census
The Census has numerous benefits
for survey researchers
– The decennial census is mostly a mail
survey
• Therefore, the Census Bureau sponsors a
great deal of research on writing questions
and other aspects of mail surveys
– Census information is often used in
developing stratified and cluster
samples, where knowledge of
population counts is necessary
– Census information is often used to
evaluate how well a sample covers a
population
• Especially when there is nonresponse,
comparing demographic data from the
sample with demographic data from the
census can give some idea of who the
nonrespondents were
Sampling Review
• Rule of thumb sampling error of a
proportion at the 95 percent
confidence level =
1 / square root (sample size)
– If size = 400, error = 1/20 = 5%
• The Central Limit Theorem is
important for social science
research because it provides the
mathematical basis for using
probability samples 1) to make
estimates of parameters from large
populations using small samples
and 2) to estimate the precision of
those estimates
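The rule of thumb can be checked against the standard formula for the 95 percent margin of error of a proportion, z * sqrt(p(1 - p)/n) with z = 1.96 and the worst case p = 0.5. That formula is not on the overhead; it is supplied here as the usual textbook version, and the function names are hypothetical.

```python
import math


def rule_of_thumb_error(n):
    """Rule of thumb from the overhead: 1 / sqrt(sample size)."""
    return 1 / math.sqrt(n)


def margin_of_error(n, p=0.5, z=1.96):
    """Standard 95 percent margin of error for a proportion p
    from a simple random sample of size n (p = 0.5 is the worst case)."""
    return z * math.sqrt(p * (1 - p) / n)


for n in (100, 400, 1600):
    print(n, round(rule_of_thumb_error(n), 3), round(margin_of_error(n), 3))
```

The rule of thumb is slightly conservative because it effectively rounds 1.96 up to 2. Quadrupling the sample size only halves the error.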
Sampling Review (2)
• In both stratified and cluster
sampling the survey population
is divided into exhaustive,
mutually exclusive groups.
Each group could be either a
stratum or a cluster
• If we use all the groups in our
final sample, we call each
group a stratum
• If we use only some of the
groups in our final sample, we
call each group a cluster
Dillman on the Survey
Process
• Dillman analyzes the survey
process from an exchange
theory perspective
– There is an exchange between
the researcher and the
respondent
– Compliance with researcher’s
request for information is a
function of the social rewards the
researcher can offer the
respondent
• Rewards such as gratitude,
opportunity to have a say on
something important
Surveys à la Dillman:
Eight Steps
1. Decide what information you
need
2. Choose a survey method
3. Draw a sample
4. Write questions
5. Design the questionnaire
6. Field the survey
7. Turn answers into usable
data
8. Report results
Source: Patricia Salant and Don A. Dillman. 1994.
How to Conduct Your Own Survey. NY: Wiley
Writing Survey Questions
• Question topics
– There is little you can’t ask about
– Useful distinction:
• Questions about subjective states
like attitudes, beliefs, and
knowledge
• Questions about objective
phenomena like behavior or
demographic attributes
– Always remembering that in a
questionnaire even objective
phenomena are filtered through the
respondent’s mind
Writing Survey Questions (2):
Question Form
• Two basic question forms:
open-ended and closed-ended
• Open-ended questions are
questions to which
respondents can give any
answer
• Closed-ended questions both
ask a question and provide the
respondent with preset
answers to the question to
choose among
Pp. 177ff in W.L. Neuman. 2007. Basics of Social
Research. 2nd ed. Boston: Pearson
Writing Survey Questions (3):
Closed-ended Questions
• Questions with ordered
categories
– E.g., Likert scale items
– When there is an order, be sure
to use it
• Questions with unordered
categories
• Partially closed-ended
– One option is something like
“Other (please specify) ____”
Writing Survey Questions:
Neuman’s
Dirty Dozen Don’ts
1. Avoid jargon, slang, and
abbreviations
2. Avoid ambiguity, confusion,
and vagueness
a. Whatever
3. Avoid emotional language
a. Can evoke frames that
effectively hijack the intent of
the question
4. Avoid prestige bias
Pp. 170-3 in W.L. Neuman. 2007. Basics of Social
Research. 2nd ed. Boston: Pearson
Writing Survey Questions:
Neuman’s
Dirty Dozen Don’ts (2)
5. Avoid double-barreled
questions
6. Do not confuse beliefs with
reality
7. Avoid leading questions
8. Avoid asking questions that
are beyond respondents’
capabilities
Pp. 170-3 in W.L. Neuman. 2007. Basics of Social
Research. 2nd ed. Boston: Pearson
Writing Survey Questions:
Neuman’s
Dirty Dozen Don’ts (3)
9. Avoid false premises
10. Avoid asking about intentions
in the distant future
11. Avoid double negatives
12. Avoid overlapping or
unbalanced response
categories
Pp. 170-3 in W.L. Neuman. 2007. Basics of Social
Research. 2nd ed. Boston: Pearson
Questionnaire Layout (1)
• Very important
– Reflects your professionalism in the
eyes or ears of your respondents and
the eyes of your interviewers
– Affects the likelihood of measurement
error through respondent or
interviewer error
– Affects response rate
• In mail surveys, designed primarily
with the respondent in mind
• In telephone and face-to-face
surveys, designed with both
interviewer and respondent in mind
Questionnaire Layout (2):
Mail Surveys
• Overall objectives
– Minimize perceived (and real)
respondent burden
– Don’t confuse respondent
– Simplify later data entry
• Make a booklet
– Questions are enclosed inside a
booklet made of folded legal
sized (8.5 x 14 inch) paper
– No questions on the front or
back of the booklet
Questionnaire Layout (3):
Mail Surveys
• Front page of booklet:
– Title of study
– Some graphic stuff
– Sponsor
– Return address
• Back page
– Request for comments
– Thank you
– Return address and telephone
contact information
Questionnaire Layout (4):
Mail Surveys
• Overall question sequence
– Start easy
• First question must grab attention,
reflect the issues in the cover letter,
and not be too difficult or
threatening
– Start on topic
– Group like questions together
• Makes writing transitions easier
– Keep threatening questions until
later in the questionnaire
– Get your demographics last
• That’s probably least important to
you and apparently least relevant to
respondent
Questionnaire Layout (5):
Mail Surveys
• Layout of individual pages
– Use white space
• What counts is not how many
pages the survey is, but rather how
long it seems to be to respondents
– Use fonts consistently to
distinguish questions, answers,
and instructions
• Dillman likes to use bold for
questions, all caps for answers,
unbolded for transitions, and
unbolded in parentheses for
instructions
– Establish a vertical flow
– Precode the answers, usually on
the left margin
Fielding Mail Surveys (1)
Overview
1. We’re always trying to
increase response rates
2. Respondents are most
likely to respond if they think
benefits outweigh their costs
3. We need to keep
respondents engaged from the
opening of the mail through the
returning of the completed
questionnaire
Source: Salant and Dillman
Fielding Mail Surveys (2)
Bottom lines
1. Mail survey response rates
depend very much on the
number of contacts
2. Mail surveys require
advance planning
- Be sure you have the
resources to meet the schedule
3. What really matters is the
overall look and feel of the
questionnaire
- It’s a lot like buying (or selling!)
a car
Source: Salant and Dillman
Fielding Mail Surveys (3)
• First mailout – advance
notice letter
– Sent to the entire sample
– Mailed first class
– Handwritten signature
– Explains why there will be a
survey
– Explains why participation will be
appreciated
• Put yourself on the mailing list
for this and all other mailings
Fielding Mail Surveys (4)
• Second mailout – cover letter,
questionnaire, and return
envelope
– Sent one week after advance
notice
– Cover letter
• Personalized
• Explains survey purpose
• Explains ID# on the questionnaire
and promises confidentiality
• Reinforces importance of
everyone’s participation
• Specifies who should complete the
questionnaire
• Thanks respondent for participation
• Hand signed
Source: Salant and Dillman
Fielding Mail Surveys (5)
[Second mailout, cont]
• Questionnaire – with ID
number
• Return envelope is stamped,
addressed, and ready for use
Fielding Mail Surveys (6)
• Third mailout – postcard
followup
– 4 to 8 days later
– Personalized
– Reminding and thanking
• Fourth mailout – new cover
letter, questionnaire, and
return envelope
– Three weeks after the second
mailout (the first one with a copy
of the questionnaire)
– Sent only to addresses that have
not yet returned the survey
Fielding Mail Surveys (7)
• The four mailings should yield
a final response rate of 50 – 60
percent
• To further increase response
rate, one can:
– Send another follow up like the
fourth mailing
– Send the follow up as certified or
express mail
– Telephone
• Often you will discover that people
shouldn’t have been in the sample
in the first place
Experiments: Overview
• Experiments are particularly
important in microsociological
research, i.e., social
psychology
– It is difficult to have enough
control over the setting to do
macrosociological experiments
• We can do quasi-experiments
• We can observe natural
experiments
• In an experiment we create a
believable environment
• In experiments we try to control
“everything”
– Ceteris paribus – all other things
being equal
Levels of Involvement in
Observational Research
• There are several types of
involvement between the researcher
and the people being studied
1. Outside observation – studying
a group entirely from the
outside, perhaps through
intensive interviews
2. Overt complete observation –
observing from within the group
and known by the group to be
there as an observer
Levels of Involvement in
Observational Research (2)
3. Participating observer –
mostly an observer but also
participating in the activities of
the group
4. Observing participant –
mostly a group member but
also observing the activities of
the group
• Participating observer and
observing participant are
types of participant
observation
Street Corner Society:
The Social Structure of
an Italian Slum
William Foote Whyte,
1943 (third edition, 1981)
Whyte Bio
• Educated middle class upbringing
• Loved to write
• Attended Swarthmore in suburban
Philadelphia
• Engaged in some reform activities
in college, but engaged even more
in writing
• Wrote a novel, decided it was lousy
because he didn’t have enough to
say
• Got a Junior Fellowship at Harvard
– three years just to hang around
and do whatever research took his
fancy (sort of)
The Research Problem
• Whyte came to Harvard knowing
mainly that he wanted to study slums
and somehow improve the world
• Social scientific literature was just
beginning to appear. He read lots of it
• Other folks at Harvard had done
similar work and were developing
some theoretical ideas about group
process
– One would not think one would go to a
slum to study group process, but in the
end that was a big part of what Whyte did
• Many of the ideas Whyte had when he
started his work came to naught
– “We set out on the frontiers of our
personal knowledge and began exploring
beyond those frontiers” (Whyte 1984:63)
“Cornerville”
• In the usual fashion, Whyte gave
his city and neighborhood a
pseudonym. Cornerville refers to
the slum, now known to be
Boston’s North End. He called
Boston “Eastern City.”
• At the time (around 1937)
Cornerville was suffering the effects
of The Great Depression
• Predominately Italian in a city
whose big politicians were mostly
Irish
• Many residents spoke only Italian
Getting In
• Wandered around Boston,
settled on Cornerville because
it “looked like” his vision of a
slum
• Could observe from the
outside, but wanted to observe
from the inside
• After various failed schemes,
introduced to Doc by the social
worker in charge of girls’
programs at the local
settlement house
• Moved into the neighborhood
Doc
• Doc (a pseudonym for Ernest
Pecci) is probably the most
famous informant in sociology
– A pretty good sociologist himself
for someone who never had a
sociology course
• Late 20s, mostly unemployed
guy from the neighborhood
• Informal leader of a group of
similarly underemployed age
mates
• Interested in making things
better
Doc and Bill
• Doc’s famous response to Whyte’s
first rambling description of what
Whyte was trying to do in
Cornerville:
“Do you want to see the high life
or the low life?”
• Doc served as Whyte’s sponsor,
guide, and “member validator”
– Having a sponsor can be a problem in
settings with a great deal of conflict, as
you may be seen as being on your
sponsor’s side
– “Member validator”: insider who
reviews the sociologist’s analysis from
an insider’s point of view
Getting On
• Whyte moved into Cornerville,
taking a room with a family
• Whyte tried to learn Italian
– Though he never became proficient, he
felt his efforts gave him a great
deal of credibility, especially with
the older generation
• Joined various clubs,
becoming secretary of at least
one
• Hung out with Doc’s gang
• Returned regularly to Harvard
for baths and brainstorming
with other social scientists
Going Native
• When you start to act like and
especially to think like the people
you are studying, you have gone
native
– Quite common occurrence
– It is difficult to completely go native
• Whyte’s efforts to swear like the
other guys weren’t successful,
partly because they wanted him to
be himself
– Can get you in trouble
• Whyte voted illegally
• Whyte almost inadvertently got
engaged because he didn’t
understand as much of native
practice as he thought
– The natives aren’t always grateful
Street Corner Society:
Sources
Whyte, William F. [1943] 1981.
Street Corner Society. 3rd ed.
Chicago, IL: University of
Chicago Press.
Whyte, William F. 1984. Learning
From the Field: A Guide from
Experience. Newbury Park,
CA: Sage.
Whyte, William F. nd. Various
personal and classroom
communications.
Bernard on
Unstructured Interviews
• H. Russell Bernard – cultural
anthropologist from U of Florida,
author of a research methods text I
have used in advanced research
methods courses
– As surveys are to sociologists, so
unstructured (and semi-structured)
interviews are to cultural
anthropologists
– As a researcher, journal editor, and
methods text author, Bernard has
been given credit for strengthening the
rigor of anthropological research
Source: Bernard, H. Russell. 1995. Research
Methods in Anthropology: Qualitative and
Quantitative Approaches. 2nd ed. Walnut Creek,
CA: AltaMira. Mostly Chapter 10, pp. 208-36.
Bernard on Unstructured
Interviews (2):
Continuum of Interview
Situations
Since the researcher is an outsider,
the locals will generally be aware
that any contact is likely to involve
information gathering
• Continuum of situations based on
how much the interviewer
controls the situation
1. Informal interview – more or less
normal conversation
- Typical early in research
- Useful for rapport
- Useful later for finding topics that might
have been overlooked
Bernard on Unstructured
Interviews (3):
Continuum of Interview
Situations (2)
2. Unstructured interview – not just
normal conversation, but with
minimal control over the responses
of the interviewee
3. Semi-structured – like unstructured
but with an interview guide
- Interview guide: written list of topics,
probes, etc. intended to be covered in
the interview
- More formal than unstructured
4. Structured – questions (and often
answer choices) established ahead
of time by the interviewer
- For example, standard survey interviews,
self-administered questionnaires
Bernard on Unstructured
Interviews (4):
Starting the Interview
• Assure anonymity
• Explain their importance to
your understanding
• Ask for permission to record
the interview and to take notes
– The value of the interview is much
lower if you can’t record or take
notes
– Even with recorder it helps to
take occasional notes
Bernard on Unstructured
Interviews (5):
Let the Informant Lead
Rule # 1: get an informant on
the topic and get out of the
way
– You pick the topic, interviewee
provides the content
– In general, it is the interviewee’s
ideas you are interested in, not
yours
• This rule is not always slavishly
followed
– Interviewee may stray off topic
– You may have ideas you want
responded to
Bernard on Unstructured
Interviews (6):
Probes
• Use probes to guide interview
• Probe (Bernard definition):
stimulating an informant to give
more information without injecting
yourself so much into the
interaction that you get only a
reflection of yourself in the data
– There are many types of probes
– Our textbook definition: a neutral
request to clarify an ambiguous
answer, to complete an incomplete
answer, or to obtain a relevant
response (p. 192 in Neuman 2007)
Bernard on Unstructured
Interviews (6):
Types of Probes 1
• Silent probe – don’t say
anything when the interviewee
stops
– Difficult to do appropriately
– Culturally sensitive since
different cultures have different
rules about silence
• Echo probe – repeat the last
thing the interviewee said
– Signals that you are interested in
what was said without saying
why or suggesting what to say
Bernard on Unstructured
Interviews (7):
Types of Probes 2
• Uh-huh (neutral) probe –
make regular affirmative
noises, as one often does in
normal conversation to indicate
you are still listening and are
interested
– Keeps the interviewee talking
Rule #2: In general, more
talking by the respondent is
better
– Hence, longer responses are
better
Bernard on Unstructured
Interviews (8):
Types of Probes 3
• The long question probe –
instead of keeping a question
short and to the point, asking a
long roundabout question
– You’re modeling the kind of long
answer you want to get back
– The trick is not to guide the
answer as you ask the question
Bernard on Unstructured
Interviews (9):
Types of Probes 4
• Probe by leading – ask a
leading question as a way of
focusing or provoking the
interviewee
– Usually we try not to lead, but
sometimes respondents seem to
be avoiding a topic or conclusion
– Can be used to ask about more
specific incidents or about what
happens when things don’t work
out as expected
– Often based on earlier interviews
Bernard on Unstructured
Interviews (10):
Types of Probes 5
• Phased assertion (baiting)
probe – you take some
information that may or may
not be true and ask questions
as if it were true
– For example, “I guess Hillary and
Barack are friends again. I
wonder why.”
– This is a favorite ploy of gossipmongers
Bernard on Unstructured
Interviews (11):
Verbal Respondents;
Equipment
• Verbal respondents – don’t
be afraid to interrupt a
long-winded respondent who is
wandering away from your
topic. Try to be graceful about
it
• Equipment – always make
sure that your tape recorder is
ready before the interview
(fresh tapes and batteries)
Bernard on Unstructured
Interviews (12):
Uses of Unstructured
Interviews
• A primary source of raw data
• Preparation for semi-structured
interviews
• To get info from people unlikely
to give more formal interviews
• Developing rapport
• Studying sensitive topics
– E.g., hot political topics,
sexuality, racial prejudice
– Conflict: you can get a wide range
of information from multiple
interviewees
Bivariate
Relationships with
Integer-level Variables
Preliminaries to multiple
regression
Steps in Analysis of
Bivariate Relationships
Between
Integer-level Variables
• Look at scatterplot
– Dependent variable as the Y
(vertical) axis
– Independent variable as the X
(horizontal) axis
• Make best-fit line
– Since it is a line, we call it linear
regression
– Since we have only one
independent variable, we call it
simple linear regression
• Calculate slope (b)
• Calculate goodness of fit (r)
Interpretation of Simple
Regression Results
Equation:
Dependent = intercept
+ coefficient * independent
+ error
• Coefficient (aka b, beta, or
regression coefficient) tells
how many units of the
dependent variable go with the
increase of one unit on the
independent variable
– Mathematically, the slope
Interpretation of Simple
Regression Results (2)
• Correlation coefficient (aka r,
Pearson’s r) – a measure of how
well the line fits the data, usually
interpreted as how strong the
relationship is
– Measures the “goodness of fit”
• The higher the absolute value of r, the
better the fit
– Ranges between -1 and 1
• Positive coefficient means there is a
positive relationship between the two
variables (high on the independent goes
with high on the dependent)
• Negative coefficient means there is a
negative relationship between the two
variables (high on the independent goes
with low on the dependent)
Interpretation of Simple
Regression Results (3)
• Intercept – how many units of
the dependent variable you
would be expected to have
with 0 units of the independent
– Mathematically, it is where the
line crosses the vertical axis
• Error – the difference between
what was actually measured
for the dependent variable for a
particular case and the
measurement predicted by the
equation for the line
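The pieces defined on the last three overheads (slope, intercept, r, and error) can be computed directly from their textbook formulas. This is a bare-bones Python sketch; the five data points are invented for illustration, and the function name is hypothetical.

```python
import math
import statistics


def simple_regression(x, y):
    """Return (intercept, slope, r, errors) for the least squares line
    predicting y (dependent) from x (independent)."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    slope = sxy / sxx                      # units of y per one unit of x
    intercept = mean_y - slope * mean_x    # expected y when x is 0
    r = sxy / math.sqrt(sxx * syy)         # Pearson's r, between -1 and 1
    # Error for each case: measured y minus the y predicted by the line.
    errors = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    return intercept, slope, r, errors


# Hypothetical data: five cases measured on two variables.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
intercept, slope, r, errors = simple_regression(x, y)
print("intercept:", round(intercept, 2))
print("slope:", round(slope, 2))
print("r:", round(r, 3))
```

The slope and r always share the same sign, since both take their sign from the same cross-product sum.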
Interpretation of Simple
Regression Results (4)
• Statistical significance –
tests how sure we are that the
regression coefficient is not
zero OR that the correlation
coefficient is not zero
– Conventionally we use the 95
percent confidence level
– At the 95 percent confidence
level, the probability of a false
positive is less than 5 percent,
usually written as p<.05
Interpretation of Simple
Regression Results (5)
Example
Dependent variable: violent crimes per
100,000 population
Independent variable: percent of population
15 and up who are currently divorced
Correlation coefficient = 0.24
There is a positive relationship
Regression coefficient = 38.6
For every additional 1 percentage point in the
percent divorced of the population 15+
there is an increase in the violent crime
rate of about 39 per 100,000
Intercept = 160
If no one in the population were divorced,
there would be 160 violent crimes per
100,000
The relationship is significant at the p<.048
level
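To see how the example's numbers fit together, the fitted equation can be used to generate predictions. The intercept, slope, and r below come from the overhead; the percent-divorced values plugged in are hypothetical, and the function name is made up for this sketch.

```python
INTERCEPT = 160.0  # violent crimes per 100,000 when percent divorced is 0
SLOPE = 38.6       # added violent crimes per 100,000 per percentage point
R = 0.24           # correlation coefficient from the example


def predicted_violent_crime_rate(percent_divorced):
    """Predicted violent crimes per 100,000 from the fitted line."""
    return INTERCEPT + SLOPE * percent_divorced


for pct in (0, 5, 10):  # hypothetical percent-divorced values
    print(pct, round(predicted_violent_crime_rate(pct), 1))

# In simple regression, r squared is the share of variation in the
# dependent variable accounted for by the independent variable.
print("r squared:", round(R ** 2, 4))
```

The small r squared is a reminder that a statistically significant relationship can still leave most of the variation in crime rates unexplained, and that the intercept describes a case (no one divorced) that may lie well outside the observed data.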
Multiple Regression
• Multiple regression is multiple
because it allows the use of
more than one independent
variable
– This is nice since so much of
social life has multiple causes
• Multiple regression is probably
the most important statistical
tool in use in sociology today
• There are many similarities
between simple regression and
multiple regression
Multiple Regression (2):
Similarities with Simple
Regression
• The key mathematical operation is
fitting a line to the data points
– The method is the same: choose the
line that minimizes the squared
distances between the points and the
line
• Called the method of least squares;
the line is sometimes called the
least squares line. Sometimes it is
called the ordinary least squares
(OLS) line
• There is a statistic for the overall fit
of the line to the data points
• Each independent variable gets its
own regression coefficient
Multiple Regression (3):
Differences from Simple
Regression
• Scatterplots are in hyperspace
– That is, for each variable, including the
dependent, there is another dimension
in the graph
• They’re really hard to draw!
• The goodness of fit statistic doesn’t
tell you the direction of the
relationships
– We use R (not r) as its symbol
– Actually, we usually use R²
– R² tells us the proportion of variation in
the dependent variable that is
accounted for by the independent
variables
Multiple Regression (4):
Interpretation of
Regression Coefficients
• New term: ceteris paribus – all
other things being equal
• A regression coefficient tells us
how much change in the
dependent variable is
associated with a change of
one unit in the coefficient’s
independent variable, ceteris
paribus
Multiple Regression (5):
The Regression Equation
• Multiple regression is based on
the matrix equation
Y = XB + e
where Y is the dependent
variable, X is the matrix of
independent variables, B is a
vector of regression
coefficients (and the intercept),
and e is the error
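A compact sketch of how B can be estimated from Y = XB + e by least squares, written in plain Python so it runs without extra libraries. It solves the normal equations (X'X)B = X'Y, which is the standard least squares solution but is not spelled out on the overheads. The data are invented so that the true coefficients are known, and all function names are hypothetical.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination
    with partial pivoting."""
    n = len(a)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


def ols(X, y):
    """Return [intercept, b1, b2, ...] from the normal equations."""
    design = [[1.0] + list(row) for row in X]  # column of 1s for the intercept
    xt = transpose(design)
    xtx = [[sum(p * q for p, q in zip(r, c)) for c in xt] for r in xt]
    xty = [sum(p * q for p, q in zip(col, y)) for col in xt]
    return solve(xtx, xty)


def r_squared(X, y, coefs):
    """Proportion of variation in y accounted for by the fitted equation."""
    predicted = [coefs[0] + sum(c * v for c, v in zip(coefs[1:], row)) for row in X]
    mean_y = sum(y) / len(y)
    sse = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - sse / sst


# Hypothetical data generated from y = 1 + 2*x1 + 3*x2 with no error.
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [1, 3, 4, 6, 8]
coefs = ols(X, y)
print([round(c, 6) for c in coefs], round(r_squared(X, y, coefs), 6))
```

Each coefficient is the change in the dependent variable for a one-unit change in its own independent variable with the other held constant, which is the ceteris paribus reading given earlier.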
Multiple Regression (6):
Varieties of Multiple
Regression
• Ordinary regression makes certain
assumptions about the relations
between the independent variables
and about the errors
– These assumptions are not always
met
• Ordinary regression is limited to
only one dependent variable
• There are a large number of
modifications to ordinary regression
that overcome some of its
limitations and loosen the
assumptions
Multiple Regression (7):
The General Linear Model
• The collection of modifications and
extensions to ordinary regression is
called the general linear model
– The GLM is based on the equation
given earlier
– It brings together a wide range of
statistical methods, some of which had
been invented independently
• The GLM is a conceptual and
methodological breakthrough
paralleled in its importance for
quantitative social science only by
the discovery of sampling theory
Steps in the Research
Process
• Neuman’s Seven Steps
1. Select topic
2. Focus question
3. Design study
4. Collect data
5. Analyze data
6. Interpret data
7. Inform others
• Adapted from pp. 9-10 in
Neuman (the textbook)
Steps in the Research Process:
1. Select Topic
Steps in the Research Process:
2. Focus the question
Steps in the Research Process:
3. Design the study
1. Decide what you want to
know
2. Decide how to measure
concepts you’re working with
3. Decide what information you
will need
4. Decide which method of data
collection to use
– Using information from this
class
5. Determine what kind of
support you will need
Steps in the Research Process:
3. Design the study:
Types of Support
• There are two kinds of support a
researcher may need:
– money
– permissions
• Money (funding) is absolutely
necessary for some research
– If you are a consultant for hire, money
is why you are doing the research
– Grant writing is its own art
• It can be its own profession
• It is a fact of life for many social workers
since grant money is the lifeblood of many
social service organizations
• Writing grant proposals is a common
assignment in courses like SOCY3700
Steps in the Research Process:
3. Design the study:
Permissions (1)
• If you are going to do research
directly with live people, you
will probably need to obtain
some kind of permission
• Human subjects – research at
a university working directly
with people must usually be
approved by the university’s
institutional review board
(IRB)
– To ensure the research meets
ethical standards
Steps in the Research Process:
3. Design the study:
Permissions (2)
• Informed consent
– Contemporary ethical standards
require that research subjects be
made aware of the nature of the
research and be given the
opportunity to choose not to
participate
• Particularly if the research process
may directly affect them and/or may
pose some level of risk
• Children, the mentally ill, and
others may not be able to give
informed consent; it must be
obtained from guardians
Steps in the Research Process:
3. Design the study:
Permissions (3)
• Access – to observe the people or
things you are interested in, you
may have to go through
intermediaries
– Access to subjects: studying students
can require working with school
districts; studying workers can require
going through management
– Access to sites: making observations
in “owned” natural settings like bars or
even public parks can require making
arrangements with owners or their
staff
Steps in the Research Process:
4. Collect data
Steps in the Research Process:
5. Analyze the data
Steps in the Research Process:
6. Interpret the data
Steps in the Research Process:
7. Inform others: sponsored
research
• For sponsored research,
present findings to client
– Perhaps as a presentation
– More often as a written report
• Reports generally include an
executive summary – a brief
summary of the findings
• The executive summary may be the
only thing the client actually reads
Steps in the Research Process:
7. Inform others: scholarly
research
• Scholarly research usually
appears in one of three places:
– scholarly journals
– books
– conference presentations
• Journals are the most
important outlet
– Having your work published in a
journal can be very difficult
– Most important journals in
sociology reject more papers
than they publish
Method of Hypothesis
and Deduction
Deduction and Induction
• In deduction we use logical
processes to develop
hypotheses from existing
theory
• In induction we develop
hypotheses (and theory) on the
basis of observations
False Positives and
False Negatives
• False Positive: when something
you say is true is actually false
– Examples: an innocent man is found
guilty. The hurricane you predicted
does not happen. Medicine that you
think will help doesn’t.
• False Negative: when something
you say is false is actually true
– Examples: a guilty woman is found not
guilty. The hurricane that you said
would not happen, happens. You
decide that medicine that is actually
helping you is not necessary.
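The two definitions above can be made concrete with a small Python sketch that compares a list of claims against what actually turned out to be true. The function name count_errors and the sample values are hypothetical, chosen only for illustration.

```python
def count_errors(claims, truths):
    """Count false positives and false negatives.

    claims: what we said (True = "it is true")
    truths: what was actually the case
    """
    # False positive: we said true, but it was actually false.
    false_pos = sum(1 for c, t in zip(claims, truths) if c and not t)
    # False negative: we said false, but it was actually true.
    false_neg = sum(1 for c, t in zip(claims, truths) if not c and t)
    return false_pos, false_neg


# Hypothetical example: five hurricane forecasts versus what happened.
claims = [True, True, False, False, True]
truths = [True, False, True, False, True]

false_pos, false_neg = count_errors(claims, truths)
print("false positives:", false_pos)
print("false negatives:", false_neg)
```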
False Positives and
False Negatives (2)
• If we always knew the truth, then
there would be no false negatives
or false positives.
• When there is uncertainty, any time
we say yes or no, we raise the
possibility of a false positive or
false negative. WHICH IS
WORSE?
• The conventions of science say
that false positives are the more
important to avoid
– That is, we’re more worried about
claiming something is true when it isn’t
than about saying that something is
false when it is actually true
Requirements for
Establishing Causality
1) Time order: the cause must
come before the effect
2) Correlation: the cause and effect
must be correlated. That is,
changes in the value of the cause
must generally be accompanied
by changes in the value of the
effect
3) Non-spuriousness: there must
not be some other factor that
explains the correlation. If there
is, then the original relationship is
spurious
Operationalization
• Operationalization – the process
of moving from the conceptual
definition of a construct [theoretical
idea] to a set of specific activities or
measures that allow a researcher
to observe it empirically (i.e., its
operational definition) [Neuman 2nd, p. 370]
• Operational definition – the
definition of a variable in terms of
the specific activities to measure or
indicate it with empirical evidence
[Neuman 2nd, p. 370]
• Operationalization is a key link
in the connection between
theory and observation
Measurement Reliability
Measurement Validity
• Measurement validity is the
extent to which a measure
measures whatever it is
intended to measure
• Three types of measurement
validity
– Face validity – does the
measure seem (“on its face”) like
it measures what it’s supposed
to (often tested by asking experts
and others)
Measurement Validity, cont.
– Content validity – the extent to
which the measure covers the
full range of the concept
• The richer the concept (say,
religiosity or feminism), the more
likely that multiple indicators will be
needed
– Criterion validity – the extent to
which the measure is supported
by other accepted measures
• Concurrent validity – how well the
measure correlates with other
measures of the concept
• Predictive validity – how well the
measure correlates with other
concepts it should be related to
Levels of Measurement
• Nominal – values identify
categories only
– Do not have arithmetic meaning
– Also called categorical variables
– When there are only two
categories, called dichotomies
or binary variables
– Two technical requirements
for categories:
• Exhaustive (every observation fits
into some category)
– Leads to lots of “Others”
• Mutually exclusive (no
observation fits into more than
one category)
Levels of Measurement, cont.
• Ordinal – same characteristics as
nominal PLUS the fact that
categories can be ranked from
lower to higher
– Mathematical operation of subtraction
makes no sense, but > and < do
– Most common example: Likert-type
response categories
• Interval – same characteristics as
ordinal PLUS the fact that the
arithmetic difference between any
two values makes sense
– That is, the usual subtraction
operation makes the usual arithmetic
sense
• Ratio – same characteristics as
interval PLUS the fact that there is
a sensible zero value
– Thus division and ratios make sense
Abbreviations often used
for “Other” categories
• NA – no answer or not
answered
• DK – don’t know
• NAP – not applicable. Often
this means the question was
not even asked
• nec or n.e.c. – not elsewhere
classified. Typically in the
category title, “Other, nec”
Scales and
Indexes (Indices)
Ecological and
Reductionist Fallacies
• Unit of analysis – level (individual
or some kind of aggregate)
addressed by your theory or
hypothesis
• Unit of observation – level
(individual or some kind of
aggregate) from which data are
collected
• Ecological fallacy – drawing
conclusions about individuals
based on data from aggregates
• Reductionist fallacy – drawing
conclusions about aggregates
based on data from individuals
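A small numerical sketch may help show why the ecological fallacy is a fallacy. In the made-up Python example below, three hypothetical aggregates (labeled A, B, and C) each contain three individuals measured on two variables. The correlation computed from the aggregate means is strongly positive, while the correlation computed from the individuals is negative. All values and names are invented for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: (x, y) pairs for the individuals in each aggregate.
groups = {
    "A": [(0, 4), (2, 2), (4, 0)],
    "B": [(1, 5), (3, 3), (5, 1)],
    "C": [(2, 6), (4, 4), (6, 2)],
}

# Aggregate-level data: one pair of means per group.
mean_x = [sum(x for x, _ in members) / len(members) for members in groups.values()]
mean_y = [sum(y for _, y in members) / len(members) for members in groups.values()]
r_aggregate = pearson_r(mean_x, mean_y)

# Individual-level data: every person, ignoring group membership.
all_x = [x for members in groups.values() for x, _ in members]
all_y = [y for members in groups.values() for _, y in members]
r_individual = pearson_r(all_x, all_y)

print("correlation across aggregates: ", r_aggregate)
print("correlation across individuals:", r_individual)
```

Someone who saw only the three group means would conclude the two variables rise together, which is the opposite of what holds among the individuals.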
Writing About Crosstabulations
From a Sample
• Lead with what is important
– What’s important?
• The fate of your hypotheses (if you
have stated some)
• The overall pattern for the
dependent variable, especially if it
is striking or surprising. Then look
at deviations from the pattern in the
categories of your independent
variable
• Big differences between categories
of your independent variable
• Things of interest to your audience
– Remember, the usual point of a
crosstab is to display differences
between categories of the
independent variable
Writing About Crosstabulations
From a Sample, cont.
• Do not use raw counts; use
percents
• Use the correct percents
– Do not confuse row, column, and total
percents
• Be sure to specify the base for
percents
– Usually something like, “… x percent
of [the base] …” or “Of all [bases]
surveyed, x percent responded…”
• Round percents in your text (but
not necessarily in your tables) to
integers
• Be ready to convert percents to
simple fractions
– For example, 23 percent could be
called “nearly a quarter” or “about one
in four”
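To illustrate the difference between row, column, and total percents, here is a minimal Python sketch using an invented crosstab. The categories ("Women", "Men", "Agree", "Disagree") and all counts are hypothetical. The independent variable is assumed to be in the columns, so column percents are the ones that answer the usual question.

```python
# Hypothetical crosstab: counts[column category][row category]
counts = {
    "Women": {"Agree": 45, "Disagree": 15},
    "Men": {"Agree": 20, "Disagree": 20},
}


def pct(part, base):
    """Percent that `part` is of `base`."""
    return 100.0 * part / base


women_total = sum(counts["Women"].values())                      # column total
agree_total = sum(col["Agree"] for col in counts.values())       # row total
grand_total = sum(sum(col.values()) for col in counts.values())  # table total

cell = counts["Women"]["Agree"]
column_pct = pct(cell, women_total)  # base: all women
row_pct = pct(cell, agree_total)     # base: all who agreed
total_pct = pct(cell, grand_total)   # base: everyone surveyed

# Round to integers for text, and always name the base.
print(f"Of all women surveyed, {round(column_pct)} percent agreed.")
print(f"Of all who agreed, {round(row_pct)} percent were women.")
print(f"Of everyone surveyed, {round(total_pct)} percent were women who agreed.")
```

The same cell count produces three different percents; which one is correct depends entirely on the base named in the sentence.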
Writing About Crosstabulations
From a Sample, cont.
• Do not confuse percentage
differences and percentage
point differences
– Subtracting one percent from
another gives a percentage
point difference; a percentage
difference must be calculated
relative to a base percent
• Be ready to collapse
categories
– For example, to combine
“Strongly agree” and “Agree”
responses into one category
• Be ready to calculate
cumulative percents
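The three calculations in this list can be sketched in a few lines of Python. The response distribution and the two group percents below are invented for illustration only.

```python
from itertools import accumulate

# Percentage point difference versus percentage difference,
# using two hypothetical group percents.
women_agree = 75.0
men_agree = 50.0
point_diff = women_agree - men_agree                      # percentage points
pct_diff = 100.0 * (women_agree - men_agree) / men_agree  # relative to men

print(f"Women are {point_diff:.0f} percentage points higher than men.")
print(f"Women are {pct_diff:.0f} percent higher than men.")

# Hypothetical percent distribution for one survey item.
likert = {
    "Strongly agree": 20,
    "Agree": 30,
    "Neutral": 25,
    "Disagree": 15,
    "Strongly disagree": 10,
}

# Collapsing categories: combine adjacent responses into one.
collapsed = {
    "Agree": likert["Strongly agree"] + likert["Agree"],
    "Neutral": likert["Neutral"],
    "Disagree": likert["Disagree"] + likert["Strongly disagree"],
}

# Cumulative percents: running total down the ordered categories.
cumulative = dict(zip(likert, accumulate(likert.values())))

print(collapsed)
print(cumulative)
```

Cumulative percents only make sense when the categories are ordered, as they are for an agree/disagree item.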