Graphical Methods for Complex Surveys



Definitions

• Observation unit
• Target population
• Sample
• Sampled population
• Sampling unit
• Sampling frame

Target Population and Sampling Frame

Types of Surveys

Cross-sectional
• surveys a specific population at a given point in time
• usually has one or more of the following design components:
  • stratification
  • clustering with multistage sampling
  • unequal probabilities of selection
Longitudinal
• surveys a specific population repeatedly over a period of time
• panel surveys
• rotating samples

Cross-Sectional Surveys

Sampling Design Terminology

Methods of Sample Selection

Basic methods
• simple random sampling
• systematic sampling
• unequal probability sampling
• stratified random sampling
• cluster sampling
• two-stage sampling

Simple Random Sampling

Why?

• basic building block of sampling
• sample from a homogeneous group of units

How?

• make physical random draws of the units under study
• use computer selection methods, e.g., in R or Stata
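As a minimal sketch of computer selection (shown in Python's standard library rather than R or Stata), an SRS of n units without replacement can be drawn with `random.Random.sample`, which gives every subset of size n the same chance of selection. The population of 100 labelled units is hypothetical and only mirrors the 0 to 100 scale of the original slide graphic.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw a simple random sample of n units without replacement.

    Every subset of size n has the same probability of being selected.
    """
    rng = random.Random(seed)  # seeded generator so the draw is reproducible
    return rng.sample(population, n)

units = list(range(1, 101))  # hypothetical population of 100 labelled units
sample = simple_random_sample(units, 10, seed=42)
print(sorted(sample))
```

Seeding the generator is a design choice for reproducibility; omit the seed for a fresh draw each time.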

Systematic Sampling

Why?

• easy to carry out
• can be very efficient, depending on the structure of the population

How?

• choose a random start among the first k units of the population
• then sample every kth unit, for some chosen interval k
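The random-start-then-every-kth-unit procedure can be sketched as below, assuming the population is an ordered Python list; the 100-unit population and the interval k = 10 are hypothetical. When N is not a multiple of k, the realized sample size depends on the start.

```python
import random

def systematic_sample(population, k, seed=None):
    """Systematic sample: a random start among the first k units,
    then every kth unit after it in list order."""
    rng = random.Random(seed)
    start = rng.randint(0, k - 1)  # 0-based index of the random start
    return population[start::k]

units = list(range(1, 101))  # hypothetical ordered population of 100 units
sample = systematic_sample(units, k=10, seed=1)
print(sample)
```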

Additional Note

Simplifying assumption:
• for estimation, a systematic sample is often treated as a simple random sample
Key assumption:
• the order of the units is unrelated to the measurements taken on them

Unequal Probability Sampling

Why?

• may want to give certain population units a greater or lesser chance of selection
• two-stage sampling with probability proportional to size (pps) at the first stage and equal sample sizes at the second stage gives a self-weighting design (all units have the same chance of inclusion in the sample)

How?

• with replacement
• without replacement
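A minimal sketch of pps selection with replacement, assuming hypothetical first-stage units with known size measures: `random.Random.choices` with the `weights` argument picks unit i on each draw with probability p_i = size_i / (total size). Without-replacement pps selection needs a dedicated algorithm and is not shown here.

```python
import random

def pps_with_replacement(units, sizes, n, seed=None):
    """Draw n units with replacement; on every draw, unit i is chosen with
    probability p_i = size_i / sum(sizes) (probability proportional to size)."""
    rng = random.Random(seed)
    total = sum(sizes)
    probs = {u: s / total for u, s in zip(units, sizes)}  # single-draw probabilities
    draws = rng.choices(units, weights=sizes, k=n)
    return draws, probs

# Hypothetical first-stage units (e.g., institutions) and their size measures
units = ["A", "B", "C", "D"]
sizes = [50, 30, 15, 5]
draws, probs = pps_with_replacement(units, sizes, n=3, seed=7)
print(draws, probs)
```

Because draws are independent, the same unit can appear more than once in `draws`.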

With or Without Replacement?

• in practice, sampling is usually done without replacement
• the variance formula for without-replacement sampling is difficult to use
• the with-replacement variance formula for the first stage is often used as an approximation
Assumption: the population size is large and the sample size is small (sampling fraction less than 10%)
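One standard with-replacement formula is the Hansen-Hurwitz estimator. With single-draw selection probabilities p_i, the total is estimated by t = (1/n) * sum(y_i / p_i), with variance estimate sum((y_i / p_i - t)^2) / (n(n - 1)). The sketch below assumes that form; in a two-stage design, y_i would be the estimated total of the i-th selected primary unit, which is the approximation described above. The data are hypothetical.

```python
def hansen_hurwitz(y, p):
    """Estimate a population total and its variance under with-replacement
    sampling. y[i] is the value for draw i; p[i] is the probability of
    selecting that unit on a single draw."""
    n = len(y)
    if n < 2:
        raise ValueError("need at least two draws to estimate the variance")
    z = [yi / pi for yi, pi in zip(y, p)]  # each draw's estimate of the total
    total = sum(z) / n
    var = sum((zi - total) ** 2 for zi in z) / (n * (n - 1))
    return total, var

# Hypothetical draws: primary-unit totals y and single-draw probabilities p
y = [120.0, 40.0, 100.0]
p = [0.5, 0.15, 0.5]
est, var = hansen_hurwitz(y, p)
print(est, var)
```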

Stratified Random Sampling

Why?

• administrative convenience
• improved efficiency
• estimates may be required for each stratum

How?

• independent simple random samples are chosen within each stratum
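A minimal sketch of stratified random sampling, assuming the strata are given as a hypothetical dictionary of unit values: an SRS is drawn separately in each stratum, and the population mean is estimated by weighting each stratum's sample mean by its population share N_h / N.

```python
import random
from statistics import mean

def stratified_srs(strata, n_h, seed=None):
    """Select an SRS within each stratum, separately for every stratum.

    strata: dict mapping stratum name -> list of unit values
    n_h:    dict mapping stratum name -> sample size in that stratum
    """
    rng = random.Random(seed)
    return {h: rng.sample(units, n_h[h]) for h, units in strata.items()}

def stratified_mean(strata, samples):
    """Stratified estimator of the population mean: sum_h (N_h / N) * ybar_h."""
    N = sum(len(units) for units in strata.values())
    return sum(len(strata[h]) / N * mean(samples[h]) for h in strata)

# Hypothetical population split into two strata
strata = {"small": list(range(1, 61)), "large": list(range(100, 140))}
samples = stratified_srs(strata, {"small": 6, "large": 4}, seed=3)
print(stratified_mean(strata, samples))
```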

Example: Survey of Youth in Custody
• first U.S. survey of youths confined to long-term, state-operated institutions
• complemented the existing Children in Custody censuses
• companion survey to the Surveys of State Prisons
• the data contain information on criminal histories, family situations, drug and alcohol use, and peer group activities
• survey carried out in 1989 using stratified systematic sampling

SYC Design

Strata
• type (a): groups of smaller institutions
• type (b): individual larger institutions
Sampling units
• strata of type (a)
  • first stage: institutions, selected with probability proportional to size of the institution
  • second stage: individual youths in custody
• strata of type (b)
  • individual youths in custody
• individuals chosen by systematic random sampling

Cluster Sampling

Why?

• convenience and cost
• the frame (list of population units) may be defined only for the clusters and not for the units

How?

• take a simple random sample of clusters and measure all units in each selected cluster
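A minimal sketch of one-stage cluster sampling, assuming hypothetical clusters given as lists of unit values: an SRS of n of the N clusters is taken, every unit in a chosen cluster is measured, and the population total is estimated as (N / n) times the sum of the selected cluster totals.

```python
import random

def one_stage_cluster_total(clusters, n, seed=None):
    """SRS of n clusters out of N; all units in a selected cluster are measured.
    Estimated total = (N / n) * sum of the selected cluster totals."""
    rng = random.Random(seed)
    N = len(clusters)
    chosen = rng.sample(range(N), n)  # indices of the selected clusters
    cluster_totals = [sum(clusters[i]) for i in chosen]
    return N / n * sum(cluster_totals), chosen

# Hypothetical clusters (e.g., households) with one value per member
clusters = [[3, 5], [2, 2, 4], [6], [1, 1, 1, 1], [7, 3]]
est, chosen = one_stage_cluster_total(clusters, n=2, seed=11)
print(est, chosen)
```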

Two-Stage Sampling

Why?

• cost and convenience
• lack of a complete frame

How?

• take either a simple random sample or an unequal probability sample of primary units, then take a simple random sample of secondary units within each selected primary unit
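For the simple-random-sample version of both stages, a sketch of selection plus the usual estimator of the total, (N / n) * sum_i (M_i / m_i) * sum_j y_ij, might look like this. The primary units and their values are hypothetical, and the unequal-probability first stage is not covered.

```python
import random

def two_stage_total(psus, n, m, seed=None):
    """Two-stage sampling with SRS at both stages.

    psus: list of primary units, each a list of secondary-unit values
    n:    number of primary units selected by SRS
    m:    number of secondary units selected by SRS in each chosen primary
          (capped at that primary's size M_i)
    Estimated total = (N / n) * sum_i (M_i / m_i) * sum_j y_ij
    """
    rng = random.Random(seed)
    N = len(psus)
    total = 0.0
    for i in rng.sample(range(N), n):
        units = psus[i]
        m_i = min(m, len(units))
        sub = rng.sample(units, m_i)
        total += len(units) / m_i * sum(sub)  # estimated total of primary unit i
    return N / n * total

# Hypothetical primary units (e.g., areas) holding secondary-unit values
psus = [[1, 2, 3], [4, 5], [6, 7, 8, 9], [2, 2, 2, 2, 2]]
print(two_stage_total(psus, n=2, m=2, seed=5))
```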

Synthesis to a Complex Design

Stratified two-stage cluster sampling
• strata: geographical areas
• first-stage units: smaller areas within the larger areas
• second-stage units: households
• clusters: all individuals in the household

Why a Complex Design?

• better coverage of the entire region of interest (stratification)
• efficient for interviewing: less travel, lower cost (clustering)
Problem: estimation and analysis are more complex

Ontario Health Survey

• carried out in 1990
• measured the health status of the population
• collected data on the risk factors associated with major causes of morbidity and mortality in Ontario
• Statistics Canada surveyed 61,239 persons using a stratified two-stage cluster sample

OHS Sample Selection

• strata: public health units, divided into rural and urban strata
• first stage: enumeration areas defined by the 1986 Census of Canada, selected by pps
• second stage: dwellings selected by SRS
• cluster: all persons in the dwelling
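For a design of this shape, the initial weight of a person is the inverse of the product of the stage-wise inclusion probabilities, and everyone in a selected dwelling shares that weight. The sketch below is a hypothetical illustration, not Statistics Canada's actual OHS weighting: it assumes the first-stage pps inclusion probability n_h * size_i / size_h (valid when that value does not exceed 1) and an SRS of m_i of the area's dwellings at the second stage.

```python
def design_weight(n_h, size_i, size_h, m_i, dwellings_i):
    """Initial (design) weight for a person in a stratified two-stage design.

    first stage:  n_h areas chosen with probability proportional to size,
                  pi_1 = n_h * size_i / size_h
    second stage: m_i of the area's dwellings_i dwellings chosen by SRS,
                  pi_2 = m_i / dwellings_i
    All persons in a selected dwelling (the cluster) share this weight.
    """
    pi_1 = n_h * size_i / size_h
    pi_2 = m_i / dwellings_i
    if not (0 < pi_1 <= 1 and 0 < pi_2 <= 1):
        raise ValueError("inclusion probabilities must lie in (0, 1]")
    return 1.0 / (pi_1 * pi_2)

# Hypothetical stratum with 20,000 dwellings in total, 4 areas selected;
# the selected area has 400 dwellings, of which 20 are sampled.
w = design_weight(n_h=4, size_i=400, size_h=20000, m_i=20, dwellings_i=400)
print(w)
```

If the size measure is the dwelling count and m_i is the same in every selected area, the weight reduces to size_h / (n_h * m_i) for every area, which is the self-weighting property mentioned under unequal probability sampling.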

Longitudinal Surveys

Sampling Design

Schematic Representation

Panel Survey

[Figure: panel survey schematic; axis label: Respondents]

Schematic Representation

Rotation Survey

[Figure: rotation survey schematic; axis label: Respondents]

Survey Weights

Survey Weights: Definitions

Initial weight
• equal to the inverse of the inclusion probability of the unit
Final weight
• initial weight adjusted for nonresponse, poststratification and/or benchmarking
• interpreted as the number of units in the population that the sample unit represents
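As one example of turning initial weights into final weights, a poststratification adjustment rescales the weights within each poststratum g so that they sum to a known population count N_g. This is a minimal sketch with hypothetical weights, groups, and totals; a nonresponse adjustment within weighting classes follows the same ratio pattern.

```python
def poststratify(initial_weights, classes, known_totals):
    """Rescale initial weights so that, within each poststratum g, they sum
    to the known population count N_g:
        final_weight = initial_weight * N_g / (sum of initial weights in g)
    """
    sums = {}
    for w, g in zip(initial_weights, classes):
        sums[g] = sums.get(g, 0.0) + w
    return [w * known_totals[g] / sums[g] for w, g in zip(initial_weights, classes)]

# Hypothetical respondents: initial weights (1 / inclusion probability) and age group
initial = [10.0, 10.0, 20.0, 20.0]
groups = ["young", "young", "old", "old"]
final = poststratify(initial, groups, {"young": 30, "old": 36})
print(final)
```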

Interpretation
• the survey weight for a particular sample unit is the number of units in the population that the unit represents

Interpretation

[Figure legend: Not sampled, Wt = 2, Wt = 5, Wt = 6, Wt = 7]

Effect of the Weights

• Example: age distribution, Survey of Youth in Custody

Age     Count   Sum of weights
11          1               28
12          9              149
13         53              764
14        167             2143
15        372             3933
16        622             5983
17        634             5189
18        334             2778
19        196             1763
20        122             1164
21         57              567
22         27              273
23         14              150
24         13              128
Total    2621            25012
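The two histograms that follow plot relative frequencies by age. A short sketch, using only the counts and sums of weights from the table above, computes both sets of bar heights and the unweighted and weighted mean ages. The function names are illustrative.

```python
ages = list(range(11, 25))
counts = [1, 9, 53, 167, 372, 622, 634, 334, 196, 122, 57, 27, 14, 13]
weight_sums = [28, 149, 764, 2143, 3933, 5983, 5189, 2778, 1763, 1164, 567, 273, 150, 128]

def relative_frequencies(values):
    """Share of the total in each age group (the histogram bar heights)."""
    total = sum(values)
    return [v / total for v in values]

def mean_age(totals):
    """Mean age, treating totals (counts or sums of weights) as frequencies."""
    return sum(a * t for a, t in zip(ages, totals)) / sum(totals)

unweighted = relative_frequencies(counts)     # unweighted histogram heights
weighted = relative_frequencies(weight_sums)  # weighted histogram heights

for a, u, w in zip(ages, unweighted, weighted):
    print(f"{a:>3}  {u:.3f}  {w:.3f}")
print(f"mean age: unweighted {mean_age(counts):.2f}, weighted {mean_age(weight_sums):.2f}")
```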

Unweighted Histogram

[Figure: Age Distribution of Youth in Custody, unweighted; x-axis: Age (11 to 24); y-axis scale 0 to 0.3]

Weighted Histogram

[Figure: Age Distribution of Youth in Custody, weighted; x-axis: Age (11 to 24); y-axis scale 0 to 0.3]

Weighted versus Unweighted

[Figure: Weighted and Unweighted Histograms, Age Distribution of Youth in Custody; x-axis: Age (11 to 24); y-axis scale 0 to 0.3; legend: Weighted, Unweighted]

Observations

• the two histograms have a similar shape but differ noticeably
• the design probably used approximately proportional allocation
• compared with the weighted distribution, the unweighted age distribution tends to be shifted to the right
• older youths are over-represented in the sample: ages 17 to 20 make up a larger share of the unweighted counts than of the weighted totals, while ages 11 to 16 make up a smaller share
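The over- and under-representation can be checked directly from the table. For each age, the ratio of its share of the sample counts to its share of the summed weights exceeds 1 when that age is over-represented. Equivalently, it is the overall mean weight divided by the mean weight within that age group. The sketch uses exact fractions to avoid rounding near a ratio of 1.

```python
from fractions import Fraction

ages = list(range(11, 25))
counts = [1, 9, 53, 167, 372, 622, 634, 334, 196, 122, 57, 27, 14, 13]
weight_sums = [28, 149, 764, 2143, 3933, 5983, 5189, 2778, 1763, 1164, 567, 273, 150, 128]

n, N_hat = sum(counts), sum(weight_sums)
# ratio > 1: the age group is a larger share of the sample than of the
# weighted (estimated population) distribution, i.e. it is over-represented
ratios = {a: Fraction(c, n) / Fraction(w, N_hat)
          for a, c, w in zip(ages, counts, weight_sums)}
over = [a for a, r in ratios.items() if r > 1]
under = [a for a, r in ratios.items() if r < 1]
print("over-represented:", over)
print("under-represented:", under)
```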