PROC SURVEYCORR
Jessica Hampton
CCSU, New Britain, CT
September 2013
Introduction
Medical Expenditure Panel Survey (MEPS)
• Administered annually by the U.S. Department of Health and Human Services since 1996
• Agency for Healthcare Research and Quality (AHRQ)
• Anonymity protected by removing individual identifiers from the public data files
• MEPS 2010 consolidated data file released September 2012
• Multiple components (household, insurance/employer, and medical provider)
• Household component (1,911 variables) covers the following topics:
  • Demographics
  • Household income
  • Employment
  • Diagnosed health conditions
  • Additional health status issues
  • Medical expenditures and utilization
  • Satisfaction with and access to care
  • Insurance coverage
• 18,692 respondents remain after excluding those who are out of scope, have negative person weights, or are under 18 or 65 and older
• U.S. civilian, noninstitutionalized population
• ~3% out of scope (birth/adoption, death, incarceration, living abroad)
MEPS Survey Design Methods
• MEPS is a representative but NOT a simple random sample of the population
• Person weights must be used to produce reliable population estimates
• Stratification:
  • By demographic variables such as age, race, sex, income, etc.
  • Goal is to maximize homogeneity within and heterogeneity between strata
  • Sometimes used to oversample groups that are under-represented in the general population or that have characteristics relevant to the study
  • For example: blacks, Hispanics, and low-income households
• Clustering:
  • By geography in order to reduce survey costs: it is not feasible or cost-effective to draw a random sample of the entire U.S. population
  • Ignoring within-cluster correlation underestimates variance/error: two families in the same neighborhood are more likely to be similar demographically (for example, similar income)
  • Ideally, clusters are spatially close for cost effectiveness but as heterogeneous within as possible for reasonable variance
  • Multi-stage clustering used in MEPS: sample of counties >> sample of blocks >> individuals/households surveyed from block sample
Survey Design Considerations
• If person weights are ignored and one tries to generalize sample findings to the entire population, total numbers, percentages, or means are inflated for the groups that are oversampled and underestimated for others
• In regression analysis, ignoring person weights leads to biased coefficient estimates
• If sampling strata and cluster variables are ignored, means and coefficient estimates are unaffected, but the standard error (or population variance) may be underestimated; that is, the reliability of an estimate may be overestimated
• Or, when comparing one estimated population mean to another, the difference may appear to be statistically significant when it is not
(Machlin, S., Yu, W., & Zodet, M., 2005)
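One way to see these effects is to estimate the same mean three ways and compare the results. The following is a minimal sketch, assuming the PQI.MEPS_2010 data set and the design variables (VARSTR, VARPSU, PERWT10F) used elsewhere in this presentation; it is an illustration, not output or code from the original analysis.

```sas
/* Run 1: unweighted, design ignored entirely */
PROC MEANS DATA=PQI.MEPS_2010 MEAN STDERR;
   VAR TOTEXP10;
RUN;

/* Run 2: person weights only, strata and clusters ignored */
PROC MEANS DATA=PQI.MEPS_2010 MEAN STDERR;
   WEIGHT PERWT10F;
   VAR TOTEXP10;
RUN;

/* Run 3: full survey design (strata, clusters, and weights) */
PROC SURVEYMEANS DATA=PQI.MEPS_2010 MEAN STDERR;
   STRATA VARSTR;
   CLUSTER VARPSU;
   WEIGHT PERWT10F;
   VAR TOTEXP10;
RUN;
```

Following the points above, runs 2 and 3 would be expected to agree on the estimated mean but not on its standard error, while run 1 may differ on both.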
SAS Survey Procedures
• Intended for use with sample designs that may include unequal person weights, clustering, and stratification
• PROC SURVEYMEANS estimates population totals, percentages, and means; includes estimated variance, confidence intervals, and descriptive statistics
• PROC SURVEYFREQ produces frequency tables, population estimates, percentages, and standard errors
• PROC SURVEYREG estimates regression coefficients by generalized least squares
• PROC SURVEYLOGISTIC fits logistic regression models for discrete response (categorical) survey data by maximum likelihood
• PROC SURVEYMEANS and PROC SURVEYREG available starting with SAS version 8
• PROC SURVEYFREQ and PROC SURVEYLOGISTIC available starting with SAS version 9
• PROC SURVEYSELECT is used for sampling and will not be used in this project
PROC SURVEYMEANS Syntax
PROC SURVEYMEANS DATA=PQI.MEPS_2010;
STRATA VARSTR;
CLUSTER VARPSU;
WEIGHT PERWT10F;
DOMAIN INSCOV10;
VAR TOTEXP10 TOTSLF10;
RUN;
PROC SURVEYMEANS Output
PROC SURVEYFREQ Syntax
PROC SURVEYFREQ DATA=PQI.MEPS_2010;
STRATA VARSTR;
CLUSTER VARPSU;
WEIGHT PERWT10F;
TABLES PRIEU10 PRING10 INSCOV10;
RUN;
PROC SURVEYFREQ Output
PROC SURVEYREG Syntax
PROC SURVEYREG DATA=PQI.MEPS_2010;
STRATA VARSTR;
CLUSTER VARPSU;
WEIGHT PERWT10F;
MODEL &TARGET=&&VAR&I /SOLUTION;
ODS OUTPUT
PARAMETERESTIMATES=PARAMETER_EST
FITSTATISTICS=FIT;
RUN;
PROC SURVEYLOGISTIC Syntax
PROC SURVEYLOGISTIC
DATA=SASUSER.MEPS_2010;
STRATA VARSTR;
CLUSTER VARPSU;
WEIGHT PERWT10F;
MODEL TOTEXP_HIGH(EVENT='1')=AGE10X
MARRIED--HISPANX POVLEV10--PHYACT53
OBESE--ADSMOK42 ADINSA42--LOCATN_ER;
ODS OUTPUT
PARAMETERESTIMATES=WORK.PARAM;
RUN;
PROC SURVEYLOGISTIC/REG Output
Default output (similar to PROC LOGISTIC and PROC REG):
• Fit statistics (AIC, Schwarz’s criterion, R-square)
• Chi-squared tests of the global null hypothesis
• Degrees of freedom
• Coefficient estimates
• Standard errors of coefficient estimates and p-values
• Odds ratio point estimates
• 95% Wald confidence intervals
Does not include:
• Option for stepwise selection
• Chi-squared test of residuals/tabled residuals (assumptions of normality and equal variance do not apply)
• Influential observations/outliers (person weights)
PROC SURVEYCORR
Correlations
• Three approaches:
• Unweighted PROC CORR
• PROC CORR with person weights
• “PROC SURVEYCORR” macro with PROC SURVEYREG:
  • Uses all survey design variables (strata/cluster/weight)
  • Iteratively runs simple regression models for each predictor variable
  • Builds table with r-squared, r, and p-values
  • Sorted by absolute value of r
• Similar results for all three approaches
• PROC CORR output unwieldy with a large number of predictor variables
• PROC CORR cannot use strata and cluster variables
PROC CORR
PROC CORR DATA=PQI.MEPS_2010 PLOTS=MATRIX
RANK;
VAR AGE10X WAGEP10X TTLP10X FAMINC10 POVLEV10
TOTSLF10 ERTEXP10 ERTOT10 RXEXP10 OPTEXP10
OPTOTV10 OBVEXP10 OBTOTV10 IPTEXP10 IPNGTD10;
WITH TOTEXP10;
WEIGHT PERWT10F;
RUN;
Step 1: PROC SURVEYCORR
PROC SQL;
SELECT NVAR INTO :NVAR
FROM DICTIONARY.TABLES
WHERE LIBNAME='PQI' AND MEMNAME='MEPS_2010';
QUIT;
• SQL dictionary tables used to select the number of predictor variables in the data set and store it in a macro variable.
• Note: Data set names are stored in dictionary tables in all caps.
• Number of predictor variables (NVAR) = number of iterations SAS will use in the %DO loop later on in the program.
Step 2: PROC SURVEYCORR
PROC CONTENTS DATA=PQI.MEPS_2010 OUT=CONTENTS
NOPRINT;
RUN;
PROC SQL NOPRINT;
SELECT NAME INTO:VAR1-:VAR76
FROM WORK.CONTENTS;
QUIT;
• PROC CONTENTS used to obtain a list of predictor variable names
• List of variable names stored as macro variables (VAR1 through VAR76) using the PROC SQL SELECT INTO statement
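The hard-coded upper bound (VAR76) has to match the number of variables in the data set. A possible alternative, sketched below, lets SAS size the list itself. It assumes SAS 9.3 or later (for the open-ended :VAR1- range) and restricts the list to numeric variables, which is an assumption added here and not part of the original program.

```sas
/* Read variable names directly from the dictionary table.          */
/* The open-ended range :VAR1- creates as many macro variables as   */
/* there are rows returned (assumes SAS 9.3 or later).              */
PROC SQL NOPRINT;
   SELECT NAME INTO :VAR1-
   FROM DICTIONARY.COLUMNS
   WHERE LIBNAME='PQI' AND MEMNAME='MEPS_2010'
     AND TYPE='num';   /* assumption: keep numeric variables only */
QUIT;

/* SQLOBS holds the number of rows stored by the last SELECT */
%LET NVAR=&SQLOBS;
%PUT NOTE: Stored &NVAR variable names, starting with &VAR1;
```

With this variant, NVAR is set from the same query that fills VAR1, VAR2, and so on, so the loop count and the list of names cannot drift apart.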
Step 3: PROC SURVEYCORR
PROC SQL;
CREATE TABLE SURVEYCORR
(PARAMETER CHAR(15),
R_SQUARE CHAR(8),
R NUM(8),
PROBT NUM(8));
QUIT;
• Create empty table to store data
• Output from PROC SURVEYREG will be inserted one row at a time
Step 4: PROC SURVEYCORR
%MACRO CORR(TARGET=);
PROC SURVEYREG DATA=PQI.MEPS_2010;
STRATA VARSTR;
CLUSTER VARPSU;
WEIGHT PERWT10F;
MODEL &TARGET=&&VAR&I /SOLUTION;
ODS OUTPUT PARAMETERESTIMATES=PARAMETER_EST
FITSTATISTICS=FIT;
RUN;
• First part of macro
• PROC SURVEYREG uses survey design variables in strata, cluster, and
weight statements
• Optional ODS OUTPUT statement stores parameter estimates, fit
statistics, and other information created when the model runs
Step 5: PROC SURVEYCORR
PROC SQL;
INSERT INTO SURVEYCORR
SELECT
PARAMETER
,CVALUE1 AS R_SQUARE
,SIGN(ESTIMATE)* SQRT(INPUT(CVALUE1,8.)) AS R
,PROBT AS PVALUE
FROM FIT
,PARAMETER_EST
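/* UPCASE guards against mixed-case values such as "R-Square" in the ODS tables */
WHERE UPCASE(LABEL1) = "R-SQUARE"
AND UPCASE(PARAMETER) = UPCASE("&&VAR&I");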
QUIT;
%MEND CORR;
• R-square value extracted from FitStatistics output with PROC SQL
• P-value and sign of estimated regression coefficient taken from ParameterEstimates
• Square root function applied to R-square to get the correlation coefficient
• Sign of regression coefficient = direction of correlation (-/+) with target
• Target variable input as a parameter when the macro is called
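The relationship used here, r = sign(slope) * sqrt(R-square) for a regression with a single predictor, can be checked on a small example. The sketch below uses a made-up five-row data set (TOY, with hypothetical variables X and Y) and ordinary PROC REG and PROC CORR, with no survey design involved; it only illustrates the identity.

```sas
/* Hypothetical toy data: Y tends to fall as X rises */
DATA TOY;
   INPUT X Y;
   DATALINES;
1 9
2 7
3 8
4 4
5 2
;
RUN;

/* Pearson correlation between X and Y */
PROC CORR DATA=TOY;
   VAR X;
   WITH Y;
RUN;

/* Simple regression of Y on X: compare R-square and the slope sign */
PROC REG DATA=TOY;
   MODEL Y = X;
RUN;
QUIT;
```

For these five points the regression R-square works out to 0.85 with a negative slope, and the Pearson correlation is about -0.922, which is -sqrt(0.85).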
Step 6: PROC SURVEYCORR
%MACRO LOOP;
%DO I=1 %TO &NVAR;
%CORR(TARGET=PUBAT10X);
%END;
%MEND LOOP;
%LOOP;
• Call the macro
• Input desired target variable as parameter
• Iterate for each predictor variable (NVAR times)
• Each time the macro runs, a new row is inserted in table SURVEYCORR
Step 7: PROC SURVEYCORR
PROC SQL;
CREATE TABLE PQI.SURVEYCORR AS
SELECT
PARAMETER
,R_SQUARE
,R FORMAT BEST6.4
,PROBT AS PVALUE FORMAT PVALUE6.4
,CASE WHEN PROBT <=0.05 THEN "YES" ELSE "NO" END AS SIGNIFICANT_95
FROM SURVEYCORR
WHERE PARAMETER NOT IN
('DUPERSID','VARSTR','VARPSU','PERWT10F')
ORDER BY ABS(R) DESC;
QUIT;
• Use PROC SQL to:
  • Format results
  • Sort by correlation size
  • Exclude survey design variables from tabulated output
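Once the final table exists, the strongest correlations can be listed directly. This is a small sketch, assuming the PQI.SURVEYCORR table created by the PROC SQL step in this section; the choice of ten rows is arbitrary.

```sas
/* Print the ten rows with the largest absolute correlation.          */
/* The table is already sorted by ABS(R) descending, so OBS=10 keeps  */
/* the top of the ranking.                                            */
PROC PRINT DATA=PQI.SURVEYCORR(OBS=10) NOOBS;
RUN;
```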
PROC SURVEYCORR Output
parameter      r-square   r       p-value    significance (95% C.L.)
TOTEXP10       1.000      1.000   <0.0001    yes
IPTEXP10       0.687      0.829   <0.0001    yes
TOTEXP_HIGH    0.287      0.536   <0.0001    yes
IPNGTD10       0.270      0.520   <0.0001    yes
OBVEXP10       0.228      0.477   <0.0001    yes
RXEXP10        0.206      0.454   <0.0001    yes
OBTOTV10       0.158      0.398   <0.0001    yes
OPTEXP10       0.121      0.348   <0.0001    yes
TOTSLF10       0.116      0.340   <0.0001    yes
ADAPPT42       0.089      0.298   <0.0001    yes
Conclusions
Recommendations/Conclusions
• Only 4 SAS survey analysis procedures; no PROC SURVEYCORR
• PROC CORR:
  • Accepts person weights, but not strata/cluster variables
  • Significance levels (p-values) may be less accurate with complex survey designs
• Iterative approach with PROC SURVEYREG:
  • Can get r and p for a large number of predictor variables
  • Output tabled and ranked
• For categorical variables:
  • Either reformat to numeric first
  • Or use the CLASS statement in PROC SURVEYREG
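The CLASS statement option might look like the sketch below. It assumes the same data set and design variables as the rest of this presentation and uses INSCOV10 (the insurance coverage variable used as a domain earlier) as an example categorical predictor of TOTEXP10; the pairing is an illustration, not a model from the original analysis.

```sas
/* Simple survey regression with a categorical predictor */
PROC SURVEYREG DATA=PQI.MEPS_2010;
   STRATA VARSTR;
   CLUSTER VARPSU;
   WEIGHT PERWT10F;
   CLASS INSCOV10;                        /* treat as categorical */
   MODEL TOTEXP10 = INSCOV10 / SOLUTION;  /* SOLUTION prints the level estimates */
   ODS OUTPUT PARAMETERESTIMATES=PARAMETER_EST
              FITSTATISTICS=FIT;
RUN;
```

With a CLASS variable, ParameterEstimates contains one row per level instead of a single slope, so the R-square can still be read from the fit statistics, but a single signed r is no longer defined and the parameter-name filter in the macro would need adjusting.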
References
• Carrington, W. J., Eltinge, J. L., & McCue, K. (2000). An Economist’s Primer on Survey Samples. Working Paper no. 00-15. Suitland, MD: Center for Economic Studies, U.S. Bureau of the Census, October 2000. Retrieved from ftp://tigerline.census.gov/ces/wp/2000/CES-WP-00-15.pdf January 15, 2013.
• Cohen, J. W., & Rhoades, J. A. (2009). Group and Non-Group Private Health Insurance Coverage, 1996 to 2007: Estimates for the U.S. Civilian Noninstitutionalized Population under Age 65. Medical Expenditure Panel Survey (MEPS) Statistical Brief #267. Agency for Healthcare Research and Quality, Rockville, MD. Retrieved from http://meps.ahrq.gov/data_files/publications/st267/stat267.pdf
• DiJulio, B., & Claxton, G. (2010). Comparison of Expenditures in Nongroup and Employer-Sponsored Insurance: 2004-2007. Kaiser Family Foundation, Menlo Park, CA. Retrieved from http://www.kff.org/insurance/snapshot/chcm111006oth.cfm
• Kaiser Family Foundation (2008). How Non-Group Health Coverage Varies with Income. Menlo Park, CA. Retrieved from http://www.kff.org/insurance/upload/7737.pdf
• Machlin, S., & Yu, W. (2005). MEPS Sample Persons In-Scope for Part of the Year: Identification and Analytic Considerations. April 2005. Agency for Healthcare Research and Quality, Rockville, MD. Retrieved from http://www.meps.ahrq.gov/survey_comp/hc_survey/hc_sample.shtml
• Machlin, S., Yu, W., & Zodet, M. (2005). Computing Standard Errors for MEPS Estimates. January 2005. Agency for Healthcare Research and Quality, Rockville, MD. Retrieved from http://www.meps.ahrq.gov/survey_comp/standard_errors.jsp
• Medical Expenditure Panel Survey (MEPS). (2012). MEPS HC-138: 2010 Full Year Consolidated Data File. Rockville, MD: Agency for Healthcare Research and Quality (AHRQ), September 2012. Retrieved from http://meps.ahrq.gov/data_stats/download_data/pufs/h138/h138doc.pdf September 27, 2012.
• Medical Expenditure Panel Survey (MEPS). (2012). MEPS HC-138: 2010 Full Year Consolidated Data Codebook. Rockville, MD: Agency for Healthcare Research and Quality (AHRQ), August 30, 2012. Retrieved from http://meps.ahrq.gov/mepsweb/data_stats/download_data_files_codebook.jsp?PUFId=H138 September 27, 2012.
• Medical Expenditure Panel Survey (MEPS). MEPS-HC Panel Design and Collection Process. Agency for Healthcare Research and Quality, Rockville, MD. Retrieved from http://www.meps.ahrq.gov/survey_comp/hc_data_collection.jsp
• Medical Expenditure Panel Survey (MEPS). Data Use Agreement. Agency for Healthcare Research and Quality, Rockville, MD. Retrieved from http://meps.ahrq.gov/mepsweb/data_stats/data_use.jsp
• O’Neill, J., & O’Neill, D. (2009). Who are the uninsured? An Analysis of America’s Uninsured Population, Their Characteristics, and Their Health. Employment Policies Institute, Washington, D.C.
• SAS Institute Inc. (2008). SAS/STAT 9.2 User’s Guide. Chapter 14: Introduction to Survey Sampling and Analysis Procedures. Pp. 259-270. Cary, NC: SAS Institute Inc. Retrieved from http://support.sas.com/documentation/cdl/en/statugsurveysamp/61762/PDF/default/statugsurveysamp.pdf on January 15, 2013.
• Trish, E., Damico, A., Claxton, G., Levitt, L., & Garfield, R. (2011). A Profile of Health Insurance Exchange Enrollees. Kaiser Family Foundation, Menlo Park, CA. Retrieved from http://www.kff.org/healthreform/upload/8147.pdf