Secondary Data Analysis
Linda K. Owens, PhD
Assistant Director for Sampling
and Analysis
Survey Research Laboratory
University of Illinois
What is secondary data?
• Data collected by a person or
organization other than the users
of the data
Survey Research Laboratory
2 of 27
Advantages of Secondary Data
• Unobtrusive
• Fast & inexpensive
• Avoid data collection problems
• Provide bases for comparison
Disadvantages of Secondary Data
• Data availability
• Level of observation
• Quality of documentation
• Data quality control
• Outdated data
Data Sources
• Inter-university Consortium for Political and Social Research (ICPSR)
http://www.icpsr.umich.edu/index-medium.html
• National Center for Health Statistics (NCHS)
http://www.cdc.gov/nchs/default.htm
• Centers for Medicare & Medicaid Services (CMS)
http://cms.hhs.gov/researchers/
• US Census Bureau
http://www.census.gov/main/www/access.html
Data Sources (cont.)
Examples of Directly Downloadable Data from NCHS:
National Health and Nutrition Examination Survey (NHANES)
National Ambulatory Medical Care Survey (NAMCS)
National Hospital Ambulatory Medical Care Survey (NHAMCS)
National Hospital Discharge Survey (NHDS)
National Home and Hospice Care Survey (NHHCS)
National Nursing Home Survey (NNHS)
National Survey of Ambulatory Surgery (NSAS)
National Employer Health Insurance Survey (NEHIS)
National Vital Statistics System (NVSS)
National Health Interview Survey (NHIS)
Survey Documentation & Analysis
Web-based analysis and documentation
• http://sda.berkeley.edu/
• http://www.icpsr.umich.edu/access/sda.html
• http://www.icpsr.umich.edu/NACJD/das.html
• http://www.icpsr.umich.edu/SAMHDA/
Data Sources (cont.)
Data Available for Use with Survey Documentation and Analysis
(SDA):
Aging Data
• Longitudinal Study of Aging, 70 Years and Older, 1984-1990
• National Survey of Self-Care and Aging: Follow-Up, 1994
• National Health and Nutrition Examination Survey II: Mortality Study, 1992
• National Hospital Discharge Survey, 1994-1997
• National Health Interview Survey, 1994, Second Supplement on Aging
Criminal Justice Data
• International Crime Data
• Homicide Data
• National Crime Victimization Survey Data
• Corrections Data
Data Sources (cont.)
Data Available for Use with Survey Documentation and
Analysis (continued):
Substance Abuse Data
• Drug Abuse Warning Network
• Monitoring the Future
• National Household Survey on Drug Abuse
• National Pregnancy and Health Survey
• National Treatment Improvement Evaluation Study
• Treatment Episode Data Set
• Uniform Facility Data Set
• Washington, DC Metropolitan Area Drug Study (DC*MADS)
Evaluation of Data Sources
• Purpose of the study
• Sponsor/collector of the data
• Mode of data collection
• Sampling procedures
• Consistency of data with other sources
Evaluation of Data Sources (cont.)
• Documentation
• Number of observations
• Number of variables
• Coding scheme
• Summary statistics
Types of Survey Sample Design
• Simple Random Sampling
• Systematic Sampling
• Complex sample designs
▪ stratified designs
▪ cluster designs
▪ mixed mode designs
Types of Survey Sample Design
• Simple Random Sampling
▪ Each member of the population has an equal and known chance of being selected
▪ Simple Random Sample With Replacement (SRSWR)
▪ Simple Random Sample Without Replacement (SRSWOR)
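
The difference between SRSWR and SRSWOR can be sketched in a few lines of Python. This is only an illustration using the standard library; the frame of 100 unit IDs and the function names are assumptions made for the example, not part of the original slides.

```python
import random

def srswor(frame, n, seed=None):
    """Simple random sample WITHOUT replacement: no unit can be drawn twice."""
    rng = random.Random(seed)
    return rng.sample(frame, n)

def srswr(frame, n, seed=None):
    """Simple random sample WITH replacement: a unit may be drawn more than once."""
    rng = random.Random(seed)
    return rng.choices(frame, k=n)

# Hypothetical sampling frame of N = 100 unit IDs (an assumption for this sketch).
frame = list(range(1, 101))
sample_without = srswor(frame, 10, seed=1)
sample_with = srswr(frame, 10, seed=1)
```

Under SRSWOR every unit ID in the result is distinct; under SRSWR duplicates are possible, which matters more as n approaches N.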
Types of Survey Sample Design
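• Systematic Random Sampling
▪ Selection of every kth element from a sampling frame, where the sampling interval is k = N/n (population size divided by sample size), usually beginning from a random start within the first interval.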
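
A minimal Python sketch of systematic selection follows. It assumes the frame size N is an exact multiple of n and that selection begins at a random position within the first interval; the function name and the frame are hypothetical.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Select every kth element of the frame, with k = N // n."""
    N = len(frame)
    k = N // n                    # sampling interval; assumes N is a multiple of n
    rng = random.Random(seed)
    start = rng.randrange(k)      # random start within the first interval
    return [frame[start + i * k] for i in range(n)]

# Hypothetical frame of N = 100 units, sample of n = 10, so k = 10.
frame = list(range(100))
sample = systematic_sample(frame, 10, seed=7)
```

If the frame is ordered in a way that cycles with period k, a systematic sample can be badly unrepresentative, so the frame ordering is worth checking before using this approach.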
Types of Survey Sample Design
• Stratified sample
▪ The population is first divided into nonoverlapping subpopulations (strata), such as gender, race, or SES.
▪ A sample is then drawn from each stratum.
▪ Allocation across strata can be proportionate or disproportionate.
▪ Works most effectively when the variance of the dependent variable is smaller within strata than in the population as a whole.
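
Proportionate allocation can be sketched as follows: each stratum receives the same share of the sample that it holds in the population. The stratum names and sizes are invented for illustration, and the function name is hypothetical.

```python
def proportionate_allocation(stratum_sizes, n):
    """Give each stratum a sample share equal to its population share."""
    total = sum(stratum_sizes.values())
    # round() can leave the allocations summing to slightly more or less than n
    return {h: round(n * size / total) for h, size in stratum_sizes.items()}

# Hypothetical population of 10,000 split into three SES strata.
strata = {"low SES": 5000, "middle SES": 3000, "high SES": 2000}
allocation = proportionate_allocation(strata, 500)
```

A disproportionate design would deliberately depart from these shares, for example by oversampling a small stratum, and would then need selection weights to restore the population proportions at the analysis stage.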
Types of Survey Sample Design
• Cluster sample
▪ Elements are selected in groups or clusters.
▪ PSU: Primary Sampling Unit. This is the first unit that is sampled in the design. For example, school districts from Chicago may be sampled and then schools within districts may be sampled.
▪ Homogeneity within cluster: Intracluster correlation (ICC)
Why complex survey design?
• Increased efficiency
• Decreased costs
• Sometimes the only option
available
Complex Survey Design
• Complex designs with clustering
and unequal selection
probabilities generally increase
the sampling variance.
• Ignoring the complex sample design typically understates standard errors, which inflates the Type I error rate.
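
One common way to quantify the variance increase from clustering is the design effect. The sketch below uses the standard approximation deff = 1 + (m - 1) × ICC for clusters of roughly equal size m. This formula is not on the original slides, and the cluster size, ICC, and sample size are made-up values.

```python
import math

def design_effect(avg_cluster_size, icc):
    """Approximate design effect for clustering: 1 + (m - 1) * ICC."""
    return 1.0 + (avg_cluster_size - 1) * icc

def effective_sample_size(n, deff):
    """Size of a simple random sample with the same precision."""
    return n / deff

# Hypothetical design: 1,000 respondents in clusters of 20, ICC = 0.05.
deff = design_effect(avg_cluster_size=20, icc=0.05)
n_eff = effective_sample_size(1000, deff)
se_inflation = math.sqrt(deff)   # factor by which the true SE exceeds the naive SRS SE
```

Here the design effect is about 1.95, so the 1,000 clustered interviews carry roughly the information of 513 independent ones. Standard errors computed as if the data were a simple random sample would be too small by a factor of about 1.4.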
Sample Weights
• “pweight” or selection weight: the inverse of a unit's probability of selection, used to adjust for differing probabilities of selection (= N/n when every unit has selection probability n/N).
• In theory, simple random samples are self-weighting.
• In practice, simple random samples are also likely to require adjustments for non-response.
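
As a rough illustration, a selection weight is the inverse of the selection probability. The population and sample sizes below are invented for the example, and the function name is hypothetical.

```python
def selection_weight(prob):
    """Selection weight: inverse of the probability of selection."""
    if not 0 < prob <= 1:
        raise ValueError("selection probability must be in (0, 1]")
    return 1.0 / prob

# Simple random sample: every unit has probability n/N, so every weight is N/n.
N, n = 10000, 500
srs_weight = selection_weight(n / N)

# Disproportionate stratified sample (hypothetical strata): weights differ by stratum.
stratum_N = {"A": 8000, "B": 2000}
stratum_n = {"A": 250, "B": 250}
stratum_weights = {h: selection_weight(stratum_n[h] / stratum_N[h]) for h in stratum_N}
```

In the simple random sample every respondent represents 20 population members, which is why such samples are self-weighting. In the stratified case, respondents in stratum A represent 32 people each and those in stratum B only 8.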
Types of Sample Weights
Post-stratification weights:
• Typically used to adjust for minor differences in nonresponse by demographic subgroup.
• Bring the sample proportions in demographic subgroups into agreement with the population proportions in those subgroups.
• Require an auxiliary dataset to use as a comparison.
• Not a fix for bad sample design.
Post-Stratification Weights Example
          Sample     Population
          Percent    Percent       Weight
Male      42%        49%           1.17
Female    58%        51%           0.879
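
Each weight in the example above is the group's population percent divided by its sample percent. A short Python sketch using the percentages from the example (the variable names are illustrative only):

```python
# Percentages taken from the example above.
sample_pct = {"Male": 42, "Female": 58}
population_pct = {"Male": 49, "Female": 51}

# Post-stratification weight = population share / sample share.
weights = {g: population_pct[g] / sample_pct[g] for g in sample_pct}

# After weighting, the sample shares match the population shares.
weighted_pct = {g: sample_pct[g] * weights[g] for g in sample_pct}
```

Men are under-represented in the sample, so each male respondent counts for more than one case (about 1.167), while each female respondent counts for less than one (about 0.879).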
Types of Sample Weights (cont.)
Non-response weights:
• Designed to inflate the weights of
survey respondents to compensate for
nonrespondents with similar
characteristics.
• Only useful if nonresponse varies by
stratum (unless inflating sample size to
population size).
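
A non-response adjustment of this kind can be sketched by inflating weights within each class by the inverse of that class's response rate. The class names, counts, and base weight below are hypothetical.

```python
def nonresponse_factors(sampled, responded):
    """Adjustment factor per class: units sampled / units responding."""
    return {c: sampled[c] / responded[c] for c in sampled}

# Hypothetical weighting classes with different response rates.
sampled = {"urban": 400, "rural": 100}
responded = {"urban": 200, "rural": 80}
factors = nonresponse_factors(sampled, responded)

base_weight = 20.0   # assumed selection weight, equal for all sampled units
adjusted = {c: base_weight * factors[c] for c in factors}

# Respondents' adjusted weights add up to the full sample's base weights.
total_adjusted = sum(adjusted[c] * responded[c] for c in responded)
total_base = base_weight * sum(sampled.values())
```

If both classes had the same response rate, every weight would be multiplied by the same constant and relative weights would not change. That is why the adjustment only matters when non-response varies across classes.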
Types of Sample Weights (cont.)
“Blow-up” (expansion) weights:
• Weights sum to population total
• Provide estimates for the total
population of interest
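
As a sketch, expansion weights can be obtained by rescaling any set of relative weights so that they sum to the population size. The weights, population size, and outcome values here are invented for illustration.

```python
def rescale(weights, target_total):
    """Rescale weights proportionally so they sum to target_total."""
    current_total = sum(weights)
    return [w * target_total / current_total for w in weights]

# Hypothetical relative weights for four respondents; population of 4,000.
relative_weights = [1.2, 0.8, 1.0, 1.0]
expansion_weights = rescale(relative_weights, 4000)

# With expansion weights, a weighted sum estimates a population total.
has_condition = [1, 0, 0, 1]   # hypothetical 0/1 outcome per respondent
estimated_total = sum(w * y for w, y in zip(expansion_weights, has_condition))
```

Rescaling changes estimated totals but not estimated means or proportions, because every weight is multiplied by the same constant.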
Types of Sample Weights (cont.)
Replicate weights:
• A series of weight variables that are
used instead of PSUs and strata in an
effort to protect the respondents'
identity. Pweight and the replicate
weights must be used for the correct
calculation of the point estimate and its
standard error.
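
The general pattern for using replicate weights is sketched below. The point estimate comes from the full-sample weight, it is re-estimated once per replicate weight, and the variance is a scaled sum of squared deviations. The scale factor depends on the replication method and should come from the dataset's documentation. This example assumes a delete-one jackknife, for which the factor is (R - 1)/R, and uses toy data.

```python
def weighted_mean(values, weights):
    """Weighted mean of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def replicate_variance(values, full_weights, replicate_weights, scale):
    """Variance of the weighted mean from a set of replicate weights."""
    theta = weighted_mean(values, full_weights)
    squared_devs = [(weighted_mean(values, rw) - theta) ** 2
                    for rw in replicate_weights]
    return scale * sum(squared_devs)

# Toy data: four observations with equal full-sample weights.
values = [2.0, 4.0, 6.0, 8.0]
full_weights = [1.0, 1.0, 1.0, 1.0]
R = len(values)

# Delete-one jackknife replicates: drop unit r, reweight the rest by R/(R-1).
replicates = [[0.0 if i == r else R / (R - 1) for i in range(R)]
              for r in range(R)]

variance = replicate_variance(values, full_weights, replicates,
                              scale=(R - 1) / R)
standard_error = variance ** 0.5
```

For this toy case the result equals the textbook variance of a mean, s²/n, which is a useful sanity check. Other schemes such as balanced repeated replication use the same loop with a different scale factor.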
Summary of Weights
• Weight for probability of selection
• Adjust for non-response
• Post-stratify
• Expand or contract to population/sample totals
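
The four steps above are usually combined multiplicatively. The sketch below assumes each respondent already has a base weight, a non-response factor, and a post-stratification factor. All numbers and names are hypothetical.

```python
def final_weight(base_weight, nonresponse_factor, poststrat_factor):
    """Final weight = selection weight x non-response adj. x post-strat. adj."""
    return base_weight * nonresponse_factor * poststrat_factor

# Hypothetical respondents: (base weight, non-response factor, post-strat factor).
respondents = [
    (20.0, 2.00, 49 / 42),
    (20.0, 1.25, 51 / 58),
    (20.0, 2.00, 51 / 58),
]
final_weights = [final_weight(b, nr, ps) for b, nr, ps in respondents]

# "Contract" step: rescale so the weights sum to the number of respondents.
scale = len(final_weights) / sum(final_weights)
normalized_weights = [w * scale for w in final_weights]
```

Weights contracted to the sample size have a mean of 1, which some software expects. Leaving them expanded to the population size gives population totals directly.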
Syntax Examples of Design-based
Analysis in STATA, SUDAAN & SAS
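STATA
* Older Stata syntax, as on the original slide. In Stata 9 and later the equivalent is:
*   svyset psu [pweight=finalwt], strata(strata)
*   svy: regress fatintk age male black hispanic
svyset strata strata
svyset psu psu
svyset pweight finalwt
svyreg fatintk age male black hispanic
SUDAAN
proc regress data="c:\nhanes.sav" filetype=spss design=wr;
nest strata psu;
weight finalwt;
subgroup sex race;
levels 2 3;
model fatintk = age sex race;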
Syntax Examples of Design-based
Analysis in STATA, SUDAAN & SAS
SAS
proc surveyreg data=nhanes;
strata strata;
cluster psu;
class sex race;
model fatintk = age sex race;
weight finalwt;
run;