Transcript Document
Cancer Prevention at CCO from Surveys to Surveillance
Eric Holowaty, Director, Cancer Surveillance, CCO.
Mohamed Abdollel, Sr. Res. Assoc., CCO.
Or Making More Sense of

Outline
- Cancer prevention at CCO
- RRFSS surveys
- Fundamentals of estimating population parameters
- More advanced analyses
- Lessons

Audience
- Experienced with RRFSS analysis using standard statistical packages (SPSS; SAS)
  - frequency tabs and cross tabs using unweighted or weighted/normalized adjustments.
- Wanting to do more advanced RRFSS analyses
  - combining PHU areas or examining finer geo. breakdown
  - estimating age-adjusted proportions (reweighting)
  - comparing differences between proportions with proper adjustment for variances assoc. with complex survey designs
  - estimating temporal trends
  - multi-variate analysis to better explain differences

Cancer Prevention at CCO

Cancer 2020 Targets
- Reduce teen smoking to 2%; reduce adult smoking to 5%
- At least 90% of smokers try to quit each year; <1% ETS exposure
- At least 90% consume 5 or more servings of F&V per day
- At least 90% participate in mod-vig. PA on most days
- At least 90% of Ontarians have BMI <30
- At least 98% of Ontarians follow CAMH low risk drinking guide
- Reduce time in sun by 75%
- At least 90% of women 50-69 yrs participate in org. breast screening
- At least 95% of women ever sexually active participate in org. cervical screening
- At least 90% of adults 50-74 yrs participate in organized CRC screening

Cancer Indicators - The Journey
- Developing the Indicators - expert opinion, with consideration of reliability, validity, robustness, responsiveness, ease of collection, interpretability, ext. comparability, vulnerability to gaming, utility.
- Assessing Data Availability and Collecting and Warehousing the Data - administrative sources, registries, surveys; methodological problems (coverage, timeliness, accessibility, permanence, granularity, quality control)
- Synthesis and Analysis - standardization, comparators, estimation (point estimates and precision), hypothesis testing, trend analysis
- Reporting and Dissemination - especially through e-Portals.

Cancer Prevention Indicators at CCO
Risk Factor Surveillance in the GTA
Methodology

Background

While data on direct cancer outcomes are quite comprehensive, our data on risk factors and other determinants of cancer risk are not nearly as complete. Household sample surveys have proven to be our most important source of information about these determinants. A number of larger national household surveys have been undertaken, although these typically address a wide range of health-related issues. While these surveys have contributed substantially to our understanding of the prevalence of risk factors in our population, it is important to note that they have varying objectives, target populations, methodology and data quality.

The overall purpose of the Risk Factor Surveillance Project recently initiated at CCO is to ensure the optimal use of available data and, where necessary, to introduce enhancements to data collection, analysis and dissemination, in order to effectively promote cancer risk factor surveillance and to support effective decision-making with regard to cancer prevention in Ontario.

More specific aims of this demonstration project include:

1. Development and enhancement of population indicators for cancer prevention and screening, with the primary focus on the utility of these indicators as measures of effective program outcomes.
These indicators cover a broad scope, and include: demographic and socio-economic factors; health status indicators; lifestyle/behavioural factors; living and working conditions, including the physicochemical environment; availability and utilization of prevention and screening programs; and expenditures. In particular, these indicators must be closely aligned with targets and priorities set in the Cancer 2020 Plan (see Table).

2. Liaison with major suppliers and users of risk factor data, with particular emphasis on standardization of methods and tools of data collection and analysis, effective data synthesis (particularly across independent surveys), development and application of robust analytic methodology, quality assurance, and improvements in the presentation and dissemination of useful information.

3. Creation of a user-friendly inventory/database of those indicators necessary for effective planning and evaluation of cancer prevention and screening programmes and activities across Ontario. This inventory will include international, national, provincial and more local surveys, registries and administrative sources of relevant data. It will include a concise description of survey methods and questions, clear definitions of the indicators and their components, data quality issues such as representativeness of sample frames, non-response rates and measurement errors (validity, reliability), and ease of access to, and granularity of, available data.

4. Assurance of timely reporting of accurate estimates of the prevalence of risk and protective factors, short-term (2-3 years) and longer-term (10-20 years) temporal trends, and differences between/among subgroups within the major sampling domains, including sex (males, females, both), age groupings (teens, 12-19 years; young adults, 20-44; middle-aged, 45-64; and older, 65+), and geographic area of residence.

Cancer Prevention Indicators
(Table columns: CANCER 2020 RISK FACTOR; MEASURE; MOST RECENT ESTIMATE; CANCER 2020 TARGET)

TEEN SMOKING
  Measure: Percent of teens who are current cigarette smokers.
  Most recent estimate: 19%
  Cancer 2020 target: 2%

ADULT SMOKING
  Measure: Percent of adults who are current cigarette smokers (ages 20 and older).
  Most recent estimate: 26%
  Cancer 2020 target: 5%

QUITTING SMOKING
  Measure: Percent of current smokers who will not make at least one attempt to quit smoking per year.
  Most recent estimate: 52%
  Cancer 2020 target: 10%

EXPOSURE TO SECONDHAND SMOKE
  Measure: Percent of non-smoking Ontarians who will be exposed to secondhand smoke in the home and in private vehicles.
  Most recent estimate: 18% (children); 25% (adults)
  Cancer 2020 target: Less than 1%

SMOKE-FREE SPACE
  Measure: Percent of public places (including bars, restaurants and gaming facilities) in Ontario that will not be smoke-free.
  Most recent estimate: 50% coverage in Ontario
  Cancer 2020 target: 0%

FRUIT & VEGETABLE INTAKE
  Measure: Percent of Ontarians who consume less than 5 servings of vegetables and fruits daily.
  Most recent estimate: 32% adults; 44% children over 12 years old
  Cancer 2020 target: 90%

PHYSICAL INACTIVITY
  Measure: Percent of Ontarians who participate in less than moderate to vigorous activity on most days of the week.
  Most recent estimate: 34%
  Cancer 2020 target: 90%

OBESITY
  Measure: Percent of Ontarians who are obese, as measured by a Body Mass Index over 30.
  Most recent estimate: Over 15%
  Cancer 2020 target: 10%

Cancer Prevention Indicators
Define : a balanced portfolio of measures of outcomes, processes and resources which reflect the performance of the cancer prevention system/programmes in Ontario.
Features of GOOD Indicators:
- amenable to change by planner and provider; understandable by people who need to act; help to galvanize action
- robust, reliable and valid; demonstrated relationship to Ca prevention
- easy to access necessary data; simple and cheap to collect/measure
- sufficient granularity and timeliness
- suitable for internal and external benchmarking; adjustable for confounders
- if measured over time, will reflect results of actions

Risk Factor Surveillance at CCO
Goals
- timely reporting of accurate estimates of the prevalence of risk and protective factors across the population
- estimation of shorter term (2-3 years) and longer term (10-20 yrs) temporal trends
- estimation of differences between/among subgroups within the major sampling domains
  - sex (males, females)
  - age groups (12-19; 20-44; 45-64 and 65+)
  - areas of residence (CDs; PHU areas; health planning regions; prov./natl comparisons)
  - various socio-demographic factors (income; educ.; ethn.)

Risk Factor Surveillance in Ontario over 1970s-1990s
- Uncoordinated
- Fragmented
- Lack of smaller area data
- Poorly analysed
- Poor dissemination
- Not timely
- Difficult to access

RRFSS Design
- Population - adults (18+ yrs) living in households in the PHU stratum of interest, with 1+ phones (landlines)
- Sampling - DSS (disproportionate strat. sampling)
  - 1st stage - selection of PHU stratum
  - 2nd stage - random selection (RDD) of eligible households (PSUs) within each PHU stratum
  - 3rd stage - random selection of one eligible adult within each household

RRFSS Population Coverage
[Figure: RRFSS Start-up by PHU. Population coverage (PHU popn and Cum Coverage) by date of start-up, Jan-01 to Sep-03, for Durham, Peel, Toronto, Halton and York. Annotations: 87% of pop’n!; RRFSS-GTA respondents : 15,000; CCHS-GTA resp. : 9,000.]

Benefits of RRFSS?
- Monthly data more timely; possibly more suitable for detecting temporal changes
- More flexibility re.
aggregation - before / after comparisons; geographic areas; demographic groups
- Known sampling weights, design and use of complex survey analysis software permit accurate est. of Var
- Robust SPC procedures permit timely detection of stat. signif. changes
- LARGE sample size permits more precise analysis
- Standard CORE of questions helps ensure comparability over time and with other geo. areas.
- Flexible MODULES permit targeted sampling and invest. of local concerns

[Figure: RRFSS GTA Coverage of Cancer Prevention Indicators 2001-03. Bars by PHU (Toronto, Peel, York, Durham, Halton) for each indicator: Teen Smoke, Adult Smoke, Quit smoke, ETS, Alcohol, F&V, Phys Act, Wt control, sun safety, Cx screening, Br screening, CRC screening. X-axis: GTA Pop’n Coverage, 0% to 100%.]

RRFSS Fundamental Population Parameters
- PHU-wide (or part), Regional, Provincial
- Total population affected
- Proportions (prevalence) and Means
- Differences of proportions or means

Estimate of population proportion (P):
- not as simple as P = Y/N
- rather: $P = \sum_i w_i y_i / \sum_i w_i$
- where $\sum_i w_i$ sums to the total adult pop’n of the PHU

BUT, point estimates not enough
- Need estimated variance
- Because the prevalence estimate is really a ratio estimate, and because of the complex design, formulas for estimating variance are very COMPLICATED
- Taylor Series Linearization commonly used (SUDAAN; Stata)
- Replication/resampling methods - bootstrapping; balanced half-samples; jackknife techniques

CAUTION : Ignore the design effects in your analysis, and you will likely underestimate the true variance, leading to Type 1 errors!

Analysis procedure
Effect on estimates of prevalence and std error
Example : Prevalence of Current Smoking in Durham 2003

Males (No. surveyed: 517)
- Design effects + weights: 25.0% (std error 2.1%)
- Final wts. only: 25.0% (std error 0.1%)
- Normalized weights: 24.7% (std error 1.9%)
- No weighting: 24.0% (std error 1.9%)

Females (No. surveyed: 684)
- Design effects + weights: 21.2% (std error 1.8%)
- Final wts. only: 21.2% (std error 0.1%)
- Normalized weights: 20.9% (std error 1.6%)
- No weighting: 21.1% (std error 1.6%)

Problem of Multiple Hypothesis Testing
- Bonferroni Inequality - the probability that at least one of n events occurs is at most the sum of the probabilities of the indiv. events
- If a single hypothesis is tested, and alpha=.05, then the probability of falsely rejecting the null hypothesis is at most .05
- however, if 6 independent tests, and alpha=.05, then the probability of falsely rejecting at least one of the hypotheses can be as high as .30 (for independent tests it is 1 - .95^6, or about .26)
- Bonferroni correction : select alpha/n
- in this example, if alpha/n = .008, then the probability of falsely rejecting at least one of the six tests is now at most .05

Prevalence of Current Smoking Among Adults in the GTA, 2003
1. Tabular Results
(Rows grouped by PHU AREA/CANCER PLANNING REGION and DOMAINS; each row lists SUBGROUP; PREVALENCE #; PREVALENCE %; SHORT TERM TRENDS; LONG TERM TRENDS)

DURHAM (Sex; Age Group *)
- Male; 48,100; 25.1; falling; falling
- Female; 42,000; 21.2; sl. rising; falling
- 20-44; 59,700; 29.0; falling; falling
- 45-64; 25,700; 19.5; falling; falling
- 65+; 4,700; 9.0; n/a; falling

HALTON (Sex; Age Group *)
- Male; 33,700; 23.3; falling; falling
- Female; 27,600; 18.2; rising; sl. falling
- 20-44; 37,000; 25.6; rising; falling
- 45-64; 20,400; 19.9; falling; falling
- 65+; 3,900; 8.0; falling; falling

PEEL (Sex *; Age Group)
- Male; 94,200; 24.0; falling; falling
- Female; 64,100; 15.8; falling; falling
- 20-44; 110,700; 24.7; rising; falling
- 45-64; 37,000; 14.3; falling; falling
- 65+; 10,600; 11.6; n/a; falling

TORONTO (Sex; Age Group *)
- Male; 212,700; 22.5; rising; falling
- Female; 169,700; 16.5; falling; falling
- 20-44; 227,600; 22.5; falling; falling
- 45-64; 125,400; 20.7; rising; falling
- 65+; 29,500; 8.2; n/a; falling

YORK (Sex; Age Group)
- Male; 57,000; 20.2; falling; sl. rising
- Female; 47,700; 16.5; falling; sl. rising
- 20-44; 66,700; 22.3; falling; rising
- 45-64; 30,300; 15.3; rising; falling
- 65+; 7,700; 10.4; n/a; sl. falling

GTA TOTAL* (Sex *; Age Group *)
- Male; 445,700; 22.8; stable; sl. falling
- Female; 351,100; 16.9; falling; falling
- 20-44; 501,600; 23.8; stable; falling
- 45-64; 238,800; 18.4; falling; falling
- 65+; 56,400; 9.0; rising; falling

NON-GTA(2) (Sex; Age Group)
- Male; 757,000; 28.2; n/a; sl. falling
- Female; 671,300; 24.2; n/a; falling
- 20-44; 794,200; 33.4; n/a; falling
- 45-64; 396,500; 26.0; n/a; sl. falling
- 65+; 108,900; 13.0; n/a; falling

ALL ONTARIO(2) (Sex; Age Group)
- Male; 1,319,700; 27.2; n/a; falling
- Female; 1,099,700; 21.9; n/a; falling
- 20-44; 1,399,900; 30.7; n/a; falling
- 45-64; 652,100; 24.0; n/a; falling
- 65+; 161,900; 11.6; n/a; falling

Figure 1 : Prevalence of Current Smoking in the GTA, Adult Females (20+ yrs), 2003
[Figure: Percent Current Smokers by Public Health Unit in the GTA (Peel, York, Toronto, Halton, Durham). Reference lines: Ontario 21.9%; Cancer 2020 Target 5%.]
Source : Rapid Risk Factor Surveillance System, 2003.
Note : Weighted point estimates and 95% confidence limits are shown; white line denotes weighted point estimate for Ontario as a whole (CCHS 2000/01); yellow line denotes Cancer 2020 Target.

Frequency of Current Smokers in the GTA, Both sexes combined (20+ yrs), 2003
[Figure: Frequency of Current Smokers by Public Health Unit Area (Toronto, Peel, York, Durham, Halton), with Cumulative Frequency across GTA on a second axis.]
Source : Rapid Risk Factor Surveillance System, 2003.
Note : Bars denote weighted estimates within each PHU; white line denotes cumulative sum across the GTA.

Frequency of Obese Adults in the GTA, Both sexes combined (20-64 yrs), 2003
[Figure: Frequency by Public Health Unit Area (Toronto, Peel, York, Durham, Halton), with Cumulative Frequency across GTA on a second axis.]
Source : Rapid Risk Factor Surveillance System, 2003.
Note : Bars denote weighted estimates within each PHU; white line denotes cumulative sum across the GTA.

Cancer Prevention and Screening
Tobacco-assoc. Cancers
Other Indicators under development at CCO
- Cigarette smoking among youths (OSDUS)
- Exposure to ETS in public places
- ? Cigarette sales
- ? Cigarette prices
- ? Prevalence of ex-smokers

Cancer Prevention and Screening
Nutrition-assoc. Cancers
Other Indicators (cont’d)
- No daily fruit or vegetable consumption among youths
- Overweight prevalence among adults

Cancer Prevention and Screening
Physical Activity-assoc. Cancers
Indicators
- Lack of vigorous PA among youths

Cancer Prevention and Screening
Potentially Carcinogenic Exposures
Indicators
- Workplace
- Ambient environment

Multivariate Logistic Regression
- Taking into account final weights and design effects
- Permits finer adjustment for “nuisance” variables, while testing for main effects (e.g. geographic differences), as well as interactions.
- Remember, the model describes the ODDS (on the log scale) of having the risk factor, not the probability.
- ODDS = Pr(RF+) / Pr(RF-) (see example)

Multivariate Logistic Regression (cont’d)
This is the syntax for weighted LOGISTIC Regression in SUDAAN. The model is: CURRSMK = GENDER+AGEGRP+HUAREA+month
Additionally, a contrast comparing DURHAM to the other REGIONS combined is undertaken.

Multivariate Logistic Regression (cont’d)
According to the Wald ChiSq Test, GENDER, AGEGRP and HUAREA are highly signif. However, month is NOT. The DURHAM contrast is highly significant and accounts for most of the HUAREA variation.

Multivariate Logistic Regression (cont’d)
In this multivariate analysis, the odds of being a CURRSMK are 41% higher in males than females, and decline steadily with age. And the odds are signif. higher in Durham region. Again, the odds assoc. with a monthly change are non-signif.
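
The distinction between odds and probability drawn in the logistic regression slides can be checked with a few lines of arithmetic. The sketch below is illustrative only: it takes the 41% higher odds for males quoted above and, as an assumption made purely for illustration, applies it to the unadjusted 21.2% prevalence reported for Durham females. The function and variable names are hypothetical and this is not the SUDAAN analysis itself.

```python
import math


def prob_to_odds(p: float) -> float:
    """Convert a probability (0 < p < 1) to odds: Pr(RF+) / Pr(RF-)."""
    return p / (1.0 - p)


def odds_to_prob(odds: float) -> float:
    """Convert odds back to a probability."""
    return odds / (1.0 + odds)


# Illustrative inputs (assumptions for this sketch, not model output):
female_prevalence = 0.212  # Durham females, 2003, unadjusted estimate
odds_ratio_male = 1.41     # odds of CURRSMK 41% higher in males

female_odds = prob_to_odds(female_prevalence)
male_odds = odds_ratio_male * female_odds      # apply the odds ratio
male_prevalence = odds_to_prob(male_odds)      # back to a probability

# On the log-odds (logit) scale the same effect is an additive coefficient.
log_odds_ratio = math.log(odds_ratio_male)

print(f"female odds       = {female_odds:.3f}")
print(f"male odds         = {male_odds:.3f}")
print(f"implied male prev = {male_prevalence:.3f}")
print(f"prevalence ratio  = {male_prevalence / female_prevalence:.2f}")
print(f"log odds ratio    = {log_odds_ratio:.3f}")
```

Under these assumed inputs the implied male prevalence is about 27.5%, so a 41% increase in the odds corresponds to a smaller relative increase in prevalence. This is why odds ratios should not be read as ratios of proportions unless the risk factor is rare.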
Time Trends
- How to best “model” a time series of data points in order to confidently detect a real secular trend, or an abrupt change point?
- How to detect and control for seasonal patterns and autocorrelations among neighbouring data points?
- Is there an optimal periodicity to use? Monthly, quarterly, semi-annually?

[Figure: Exposure to ETS, Bootstrap Estimates of Slopes. Box and Whisker plots of Slope by Periodicity (Month, Quarter, Semi). Panels: GTA, Females; Durham, Females; Durham, Ages 20-44. Sample sizes shown: n= 50, 155, 310; n= 55, 165, 330; n= 250, 770, 1540.]

The spread of these Box and Whisker plots suggests tighter estimates for monthly time series, vs quarterly or semiannual. This is empirical evidence that trends are more precise if estimated from monthly series.

Typical Time Series for One Participating PHU
p-Charts
[Figure: p-charts of ETS Exposure over time, Sex = F. Panels: PHU= DURHAM, YORK, PEEL, HALTON, TORONTO, GTA. Annotation: Stat. Signif.]

? Control Charts
The Control Chart is a more powerful and sensitive version of the basic Run Chart, with a central value (CL) that is usually the arithmetic mean, and control limits (UCL and LCL) to establish the bounds of “acceptable” variation.

Statistical Testing - Western Electric Rules
The Western Electric Rules, originally developed for AT&T, are more sensitive for detecting “special cause” variation, as distinct from expected, or general cause, variation.
These patterns, if they occur, are unlikely to be chance variations. Of course, the larger the total number of observations, the greater the risk of “false alarms”. A series of 20-25 data points is probably ideal to study.
- 1 point beyond Zone A (3 sd)
- 2 of 3 points in a row in Zone A (3 sd)
- 9 points in seq. in Zone C (1 sd)
- 4 of 5 points in a row in Zone B (2 sd)
- 6 points in a row
- 15 points in a row in Zone C (1 sd)
- 14 points in a row alternating
- 8 points in a row in Zones A or B

[Figure: p-charts of ETS Exposure over time, Sex = M. Panels: PHU= DURHAM, YORK, PEEL, HALTON, TORONTO, GTA.]
- Are these data points indep. and normally distributed?
- Are there temporal trends?
- Are there important change points?

Loglinear Regression Change-point Detection
Joinpoint Version 2.7
- Analysis of trends using joinpoint models
- fits simplest joinpoint model to trend data
- statistical comparison of whether more joinpoints are statistically significant.
For more info, see: http://srab.cancer.gov/joinpoint/

Example : Detecting trends regarding ETS Exposure
The “best fitting” line has zero Join Points. And the est. annual percent change (EAPC) is 10.1% per year. This is statistically significant (p<0.004)

Example : Detecting trends regarding ETS Exposure
The “best fitting” line has zero Join Points. And the est. annual percent change (EAPC) is 15.2% per year. This is statistically significant (p<0.0005)

Example : Detecting trends regarding ETS Exposure
The parameter estimate for slope has to be multiplied by 12 and then exponentiated to get the true EAPC = 10.1% per year. But it's statistically significant!
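
The conversion from a monthly slope to an annual percent change described in these examples can be written out as a short calculation. This is a minimal sketch, assuming the fitted model is log-linear in time measured in months (ln(rate) = a + b x month) and that the percent change is expressed as 100 x (exp(12b) - 1). The slides state only that the slope is multiplied by 12 and exponentiated, so the final subtraction and scaling, along with the function names, are assumptions.

```python
import math


def eapc_from_monthly_slope(slope_per_month: float) -> float:
    """Estimated annual percent change implied by a monthly log-linear slope.

    Assumes ln(rate) = a + slope_per_month * month, so twelve months of
    change multiply the rate by exp(12 * slope_per_month).
    """
    return 100.0 * (math.exp(12.0 * slope_per_month) - 1.0)


def monthly_slope_from_eapc(eapc_percent: float) -> float:
    """Inverse conversion: the monthly log-scale slope for a given EAPC."""
    return math.log(1.0 + eapc_percent / 100.0) / 12.0


# Round trip using the two EAPC values quoted in the examples.
for eapc in (10.1, 15.2):
    slope = monthly_slope_from_eapc(eapc)
    print(f"EAPC {eapc}% per year <-> monthly slope {slope:.5f} "
          f"(check: {eapc_from_monthly_slope(slope):.1f}%)")
```

The monthly slopes involved are small (under 0.02 on the log scale), which is why the raw parameter estimate can look negligible even when the annualized change is substantial.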

Example : Detecting trends regarding ETS Exposure
The parameter estimate for slope has to be multiplied by 12 and then exponentiated to get the true EAPC = 15.2% per year. And it's statistically significant!

Conclusions : Benefits of RRFSS
- Monthly data more timely; likely more suitable for detecting temporal changes
- More flexibility re. aggregation - before / after comparisons; geographic areas; demographic groups
- Known sampling weights, design and use of complex survey analysis software permit accurate est. of Var
- Robust SPC procedures permit timely detection of stat. signif. changes - not sure about this! ?JoinPoint
- LARGE sample size permits more precise analysis
- Standard CORE of questions helps ensure comparability over time and with other geo. areas.
- Flexible MODULES permit targeted sampling and invest. of local concerns

Lessons
- Analysis of survey data is not straightforward - complex sampling; multiple sources of error, etc.
- Use population sampling weights when deriving estimates
- Use specialized software to estimate variance correctly
- Data utilization is the best guide for Indicator Dev’t and Data Quality Measurement/Improvement
- Murphy’s Law : “If anything can go wrong, it will”
- Corollary to Murphy’s Law : “Everything takes longer than you think”
- Dr. A.B. Miller : “If it wasn’t a difficult problem to solve, it wouldn’t be worth it in the end”

And, finally
Thank you for the privilege!