Transcript Document

Cancer Prevention at CCO
from Surveys to Surveillance
Eric Holowaty,
Director, Cancer Surveillance, CCO.
Mohamed Abdollel,
Sr. Res. Assoc., CCO.
Or
Making More Sense of
Outline
 Cancer prevention at CCO
 RRFSS surveys
 Fundamentals of estimating population
parameters
 More advanced analyses
 Lessons
Audience
 Experienced with RRFSS analysis using
standard statistical packages (SPSS; SAS)
 frequency tabs and cross tabs using unweighted or
weighted/normalized adjustments.
 Wanting to do more advanced RRFSS analyses
 combining PHU areas or examining finer geo. breakdown
 estimating age-adjusted proportions (reweighting)
 comparing differences between proportions with proper
adjustment for variances assoc. with complex survey designs
 estimating temporal trends
 multi-variate analysis to better explain differences
Cancer Prevention at CCO
Cancer 2020 Targets










Reduce teen smoking to 2%; reduce adult smoking to 5%
At least 90% of smokers try to quit each year; <1% ETS exposure
At least 90% consume 5 or more servings of F&V per day
At least 90% participate in mod-vig. PA on most days
At least 90% of Ontarians have BMI <30
At least 98% of Ontarians follow CAMH low risk drinking guide
Reduce time in sun by 75%
At least 90% of women 50-69 yrs participate in org. breast screening
At least 95% of women ever sexually active participate in org.
cervical screening
At least 90% of adults 50-74 yrs participate in organized CRC
screening
Cancer Indicators - The Journey

Developing the Indicators - expert opinion, with consideration of
reliability, validity, robustness, responsiveness, ease of collection,
interpretability, ext. comparability, vulnerability to gaming, utility.

Assessing Data Availability and Collecting and Warehousing the
Data - administrative sources, registries, surveys; methodological
problems (coverage, timeliness, accessibility, permanence,
granularity, quality control)

Synthesis and Analysis - standardization, comparators, estimation
(point estimates and precision), hypothesis testing, trend analysis
 Reporting and Dissemination - especially through e-Portals.
Cancer Prevention Indicators at
CCO
Risk Factor Surveillance in the GTA
Methodology
Background
While data on direct cancer outcomes are quite comprehensive, our data on risk factors and other
determinants of cancer risk are not nearly as complete. Household sample surveys have proven to
be our most important source of information about these determinants. A number of larger
national household surveys have been undertaken, although these typically address a wide range
of health-related issues. While these surveys have contributed substantially to our understanding
of the prevalence of risk factors in our population, it is important to note that they have varying
objectives, target populations, methodology and data quality.
The overall purpose of the Risk Factor Surveillance Project recently initiated at CCO is to ensure
the optimal use of available data and, where necessary, to introduce enhancements to data
collection, analysis and dissemination, in order to effectively promote cancer risk factor
surveillance and to support effective decision-making with regard to cancer prevention in
Ontario. More specific aims of this demonstration project include:
1. Development and enhancement of population indicators for cancer prevention and
screening, with the primary focus on the utility of these indicators as measures of
effective program outcomes. These indicators cover a broad scope, and include:
demographic and socio-economic factors; health status indicators;
lifestyle/behavioural factors; living and working conditions, including the
physicochemical environment; availability and utilization of prevention and screening
programs; and expenditures. In particular, these indicators must be closely aligned
with targets and priorities set in the Cancer 2020 Plan (see Table).
2. Liaison with major suppliers and users of risk factor data, with particular emphasis
on standardization of methods and tools of data collection and analysis, effective data
synthesis (particularly across independent surveys), development and application of
robust analytic methodology, quality assurance, and improvements in the
presentation and dissemination of useful information.
3. Creation of a user-friendly inventory/database of those indicators necessary for
effective planning and evaluation of cancer prevention and screening programmes
and activities across Ontario. This inventory will include international, national,
provincial and more local surveys, registries and administrative sources of relevant
data. It will include a concise description of survey methods and questions, clear
definitions of the indicators and their components, data quality issues such as
representativeness of sample frames, non-response rates and measurement errors
(validity, reliability), and ease of access to, and granularity of, available data.
Assurance of timely reporting of accurate estimates of the prevalence of risk and protective
factors, short-term (2-3 years) and longer-term (10-20 years) temporal trends, differences
between/among subgroups within the major sampling domains, including sex (males, females,
both), age groupings (teens (12-19 years); young adults (20-44); middle-aged (45-64) and older
(65+)), geographic area of residence
Cancer Prevention Indicators
Columns: CANCER 2020 RISK FACTOR; MEASURE; MOST RECENT ESTIMATE; CANCER 2020 TARGET

TEEN SMOKING
Measure: Percent of teens who are current cigarette smokers.
Most recent estimate: 19%
Cancer 2020 target: 2%

ADULT SMOKING
Measure: Percent of adults who are current cigarette smokers (ages 20 and older).
Most recent estimate: 26%
Cancer 2020 target: 5%

QUITTING SMOKING
Measure: Percent of current smokers who will not make at least one attempt to quit smoking per year.
Most recent estimate: 52%
Cancer 2020 target: 10%

EXPOSURE TO SECONDHAND SMOKE
Measure: Percent of non-smoking Ontarians who will be exposed to secondhand smoke in the home and in private vehicles.
Most recent estimate: 18% (children); 25% (adults)
Cancer 2020 target: Less than 1%

SMOKE-FREE SPACE
Measure: Percent of public places (including bars, restaurants and gaming facilities) in Ontario that will not be smoke-free.
Most recent estimate: 50% coverage in Ontario
Cancer 2020 target: 0%

FRUIT & VEGETABLE INTAKE
Measure: Percent of Ontarians who consume less than 5 servings of vegetables and fruits daily.
Most recent estimate: 32% adults; 44% children over 12 years old
Cancer 2020 target: 90%

PHYSICAL INACTIVITY
Measure: Percent of Ontarians who participate in less than moderate to vigorous activity on most days of the week.
Most recent estimate: 34%
Cancer 2020 target: 90%

OBESITY
Measure: Percent of Ontarians who are obese, as measured by a Body Mass Index over 30.
Most recent estimate: Over 15%
Cancer 2020 target: 10%
Cancer Prevention Indicators
Define : balanced portfolio of measures of outcomes, processes
and resources which reflect the performance of the cancer
prevention system/programmes in Ontario.
Features of GOOD Indicators:






amenable to change by planner and provider; understandable by
people who need to act; help to galvanize action
robust, reliable and valid; demonstrated relationship to Ca prevention
easy to access necessary data; simple and cheap to collect/measure
sufficient granularity and timeliness
suitable for internal and external benchmarking; adjustable for
confounders
if measured over time, will reflect results of actions
Risk Factor Surveillance at CCO
Goals
 timely reporting of accurate estimates of the prevalence
of risk and protective factors across the population
 estimation of shorter term (2-3 years) and longer term
(10-20 yrs) temporal trends
 estimation of differences between/among subgroups
within the major sampling domains
 sex (males, females)
 age groups (12-19; 20-44; 45-64 and 65+)
 areas of residence (CDs; PHU areas; health planning
regions;
prov./natl comparisons)
 various socio-demographic factors (income; educ.;ethn.)
Risk Factor Surveillance in
Ontario over 1970s-1990s







Uncoordinated
Fragmented
Lack of smaller area data
Poorly analysed
Poor dissemination
Not timely
Difficult to access
RRFSS
Design
 Population - adults (18+ yrs) living in households in
the PHU stratum of interest, with 1+ phones
(landlines)
 Sampling - DSS (disproportionate strat. sampling)
 1st stage - selection of PHU stratum
 2nd stage - random selection (RDD) of eligible
households (PSUs) within each PHU stratum
 3rd stage - random selection of one eligible adult within
each household
RRFSS Population Coverage
RRFSS Start-up by PHU
[Figure: PHU pop'n and cumulative population coverage (0 to 12,000,000) by date of start-up, Jan-01 to Sep-03. PHUs labelled: Durham, Peel, Toronto, Halton, York. Annotation: 87% of pop'n!]
RRFSS-GTA respondents : 15,000
CCHS-GTA resp. : 9,000
Benefits of RRFSS?
Monthly data more timely; possibly more suitable for
detecting temporal changes
 More flexibility re. aggregation - before / after
comparisons; geographic areas; demographic groups
 Known sampling weights, design and use of complex
survey analysis software permit accurate est. of Var
 Robust SPC procedures permit timely detection of stat.
signif. changes
 LARGE sample size permits more precise analysis
 Standard CORE of questions helps ensure comparability
over time and with other geo. areas.
 Flexible MODULES permit targetted sampling and invest.
of local concerns
RRFSS GTA Coverage of Cancer
Prevention Indicators 2001-03
[Figure: GTA pop'n coverage (0% to 100%) for each indicator, by PHU (Toronto, Peel, York, Durham, Halton). Indicators: Teen Smoke; Adult Smoke; Quit smoke; ETS; Alcohol; F&V; Phys Act; Wt control; sun safety; Cx screening; Br screening; CRC screening.]
RRFSS Fundamental Population
Parameters




PHU-wide (or part), Regional, Provincial
Total population affected
Proportions (prevalence) and Means
Differences of proportions or means
Estimate of population proportion (P):
not as simple as
$P = Y / N$
rather:
$P = \sum_i w_i y_i \,/\, \sum_i w_i$
where $\sum_i w_i$ sums to the total
adult pop'n of the PHU
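The weighted estimator above can be sketched in a few lines. This is only an illustration: the four respondents and their weights are hypothetical, and the function name is not part of any RRFSS tooling.

```python
def weighted_proportion(y, w):
    """Weighted prevalence: sum(w_i * y_i) / sum(w_i).

    y: 0/1 indicators (1 = respondent has the risk factor)
    w: final sampling weights (each respondent "represents" w_i adults)
    """
    if len(y) != len(w):
        raise ValueError("y and w must have the same length")
    total_weight = sum(w)
    return sum(wi * yi for wi, yi in zip(w, y)) / total_weight


# Hypothetical data: four respondents with unequal weights.
y = [1, 0, 0, 1]
w = [100.0, 300.0, 300.0, 300.0]

p_weighted = weighted_proportion(y, w)      # (100 + 300) / 1000
p_unweighted = sum(y) / len(y)              # simple Y / N
adults_affected = p_weighted * sum(w)       # estimated total population affected
```

With these made-up weights the simple Y/N figure and the weighted figure differ, which is the point of the slide: the weights, not the raw counts, determine the population estimate.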
BUT, point estimates are not enough. Need estimated variance
 Because the prevalence estimate is really a ratio
estimate, and because of the complex design, formulas
for estimating variance are very COMPLICATED
 Taylor Series Linearization commonly used (SUDAAN;
Stata)
 Replication/resampling methods - bootstrapping;
balanced half-samples; jackknife techniques
 CAUTION : Ignore the design effects in your analysis, and you will
likely underestimate the true variance
Type 1 errors!
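A minimal sketch of Taylor series linearization for the ratio estimate, under strong simplifying assumptions that are not taken from the RRFSS design: a single stratum, each respondent treated as their own PSU, sampling with replacement, and no finite population correction. Packages such as SUDAAN or Stata handle the full stratified design; the function name and data here are hypothetical.

```python
import math


def linearized_se(y, w):
    """Weighted proportion and its linearized standard error.

    Assumes one stratum, one respondent per PSU, with-replacement sampling.
    """
    n = len(y)
    total_weight = sum(w)
    p = sum(wi * yi for wi, yi in zip(w, y)) / total_weight
    # Linearized residual of the ratio estimator for each respondent;
    # these sum to zero by construction.
    z = [wi * (yi - p) / total_weight for wi, yi in zip(w, y)]
    variance = n / (n - 1) * sum(zi * zi for zi in z)
    return p, math.sqrt(variance)


# Hypothetical sample: 25 of 100 respondents report the risk factor.
y = [1] * 25 + [0] * 75

# Equal weights: reduces to the familiar simple-random-sample result.
p_equal, se_equal = linearized_se(y, [1.0] * 100)

# Unequal weights (made up): the same responses give a different estimate and SE.
w_unequal = [3.0 if i % 2 == 0 else 1.0 for i in range(100)]
p_unequal, se_unequal = linearized_se(y, w_unequal)
```

With equal weights the result collapses to sqrt(p(1-p)/(n-1)), which is a useful sanity check before trusting the weighted case.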
Analysis procedure
Effect on estimates of prevalence and std error
Example : Prevalence of Current Smoking in Durham 2003

Sex      No. surveyed  Design effects + weights  Final wts. only  Normalized weights  No weighting
males    517           25.0% (2.1%)              25.0% (0.1%)     24.7% (1.9%)        24.0% (1.9%)
females  684           21.2% (1.8%)              21.2% (0.1%)     20.9% (1.6%)        21.1% (1.6%)

Std errors are shown in parentheses.
Problem of Multiple Hypothesis
Testing
Bonferroni Inequality - probability that at least one of
n events occurs is at most the sum of the
probabilities of the indiv. events
 If a single hypothesis is tested, and alpha=.05, then the probability of
rejecting the null hypothesis is at most .05
 however, if 6 independent tests, and alpha=.05, then probability of
falsely rejecting at least one of the hypotheses is about .26 (at most .30 by the Bonferroni bound)
 Bonferroni correction : select alpha/n
 in this example, if alpha/n = .008, then the probability of falsely rejecting at
least one of the six tests is now at most .05
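The arithmetic behind the six-test example can be checked with a short sketch. The function names are illustrative; the exact family-wise error formula assumes the tests are independent, while the Bonferroni bound does not.

```python
def familywise_error_independent(alpha, n_tests):
    """Probability of at least one false rejection across independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_bound(alpha, n_tests):
    """Upper bound on the same probability; holds without independence."""
    return min(1.0, alpha * n_tests)


alpha, n_tests = 0.05, 6

fwer = familywise_error_independent(alpha, n_tests)   # 1 - 0.95**6
bound = bonferroni_bound(alpha, n_tests)              # 6 * 0.05

# Bonferroni correction: test each hypothesis at alpha / n.
per_test_alpha = alpha / n_tests
fwer_corrected = familywise_error_independent(per_test_alpha, n_tests)
```

The exact value for six independent tests sits a little below the bound, and after the correction the family-wise error stays at or below the original alpha.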
Prevalence of Current Smoking Among Adults in the GTA, 2003 (1)
Tabular Results

PHU AREA / CANCER                                      PREVALENCE         SHORT TERM   LONG TERM
PLANNING REGION   DOMAINS       SUBGROUPS           #         %           TRENDS       TRENDS
DURHAM            Sex           Male           48,100      25.1           falling      falling
DURHAM            Sex           Female         42,000      21.2           sl. rising   falling
DURHAM            Age Group *   20-44          59,700      29.0           falling      falling
DURHAM            Age Group *   45-64          25,700      19.5           falling      falling
DURHAM            Age Group *   65+             4,700       9.0           n/a          falling
HALTON            Sex           Male           33,700      23.3           falling      falling
HALTON            Sex           Female         27,600      18.2           rising       sl. falling
HALTON            Age Group *   20-44          37,000      25.6           rising       falling
HALTON            Age Group *   45-64          20,400      19.9           falling      falling
HALTON            Age Group *   65+             3,900       8.0           falling      falling
PEEL              Sex *         Male           94,200      24.0           falling      falling
PEEL              Sex *         Female         64,100      15.8           falling      falling
PEEL              Age Group     20-44         110,700      24.7           rising       falling
PEEL              Age Group     45-64          37,000      14.3           falling      falling
PEEL              Age Group     65+            10,600      11.6           n/a          falling
TORONTO           Sex           Male          212,700      22.5           rising       falling
TORONTO           Sex           Female        169,700      16.5           falling      falling
TORONTO           Age Group *   20-44         227,600      22.5           falling      falling
TORONTO           Age Group *   45-64         125,400      20.7           rising       falling
TORONTO           Age Group *   65+            29,500       8.2           n/a          falling
YORK              Sex           Male           57,000      20.2           falling      sl. rising
YORK              Sex           Female         47,700      16.5           falling      sl. rising
YORK              Age Group     20-44          66,700      22.3           falling      rising
YORK              Age Group     45-64          30,300      15.3           rising       falling
YORK              Age Group     65+             7,700      10.4           n/a          sl. falling
GTA TOTAL*        Sex *         Male          445,700      22.8           stable       sl. falling
GTA TOTAL*        Sex *         Female        351,100      16.9           falling      falling
GTA TOTAL*        Age Group *   20-44         501,600      23.8           stable       falling
GTA TOTAL*        Age Group *   45-64         238,800      18.4           falling      falling
GTA TOTAL*        Age Group *   65+            56,400       9.0           rising       falling
NON-GTA (2)       Sex           Male          757,000      28.2           n/a          sl. falling
NON-GTA (2)       Sex           Female        671,300      24.2           n/a          falling
NON-GTA (2)       Age Group     20-44         794,200      33.4           n/a          falling
NON-GTA (2)       Age Group     45-64         396,500      26.0           n/a          sl. falling
NON-GTA (2)       Age Group     65+           108,900      13.0           n/a          falling
ALL ONTARIO (2)   Sex           Male        1,319,700      27.2           n/a          falling
ALL ONTARIO (2)   Sex           Female      1,099,700      21.9           n/a          falling
ALL ONTARIO (2)   Age Group     20-44       1,399,900      30.7           n/a          falling
ALL ONTARIO (2)   Age Group     45-64         652,100      24.0           n/a          falling
ALL ONTARIO (2)   Age Group     65+           161,900      11.6           n/a          falling
Figure 1 : Prevalence of Current Smoking in the GTA
Adult Females (20+ yrs) 2003
[Figure: Percent Current Smokers (0 to 40) by Public Health Unit in the GTA (Peel, York, Toronto, Halton, Durham). Reference lines: Ontario 21.9%; Cancer 2020 Target 5%.]
Source : Rapid Risk Factor Surveillance System, 2003.
Note : Weighted point estimates and 95% confidence limits are shown;
White line denotes weighted point estimate for Ontario as a whole (CCHS 2000/01);
Yellow line denotes Cancer 2020 Target
Frequency of Current Smokers
in the GTA Both sexes combined (20+ yrs)
[Figure: Frequency of Current Smokers (bars, 0 to 450,000) and Cumulative Frequency across GTA (line, 0 to 900,000) by Public Health Unit Area (Toronto, Peel, York, Durham, Halton).]
Source : Rapid Risk Factor Surveillance System, 2003.
Note : Bars denote weighted estimates within each PHU;
White line denotes cumulative sum across the GTA
Frequency of Obese Adults in the GTA
Both sexes combined (20-64 yrs) 2003
[Figure: bars (0 to 180,000; axis labelled Frequency of Current Smokers) and Cumulative Frequency across GTA (line, 0 to 400,000) by Public Health Unit Area (Toronto, Peel, York, Durham, Halton).]
Source : Rapid Risk Factor Surveillance System, 2003.
Note : Bars denote weighted estimates within each PHU;
White line denotes cumulative sum across the GTA
Cancer Prevention and Screening
Tobacco-assoc. Cancers
Other Indicators under development at CCO

Cigarette smoking among youths (OSDUS)
 Exposure to ETS in public places
 ? Cigarette sales
 ? Cigarette prices
 ? Prevalence of ex-smokers
Cancer Prevention and Screening
Nutrition-assoc. Cancers
Other Indicators (cont’d)


No daily fruit or vegetable consumption among youths
Overweight prevalence among adults
Cancer Prevention and Screening
Physical Activity-assoc. Cancers
Indicators

Lack of vigorous PA among youths
Cancer Prevention and Screening
Potentially Carcinogenic Exposures
Indicators

Workplace

Ambient environment
Multivariate Logistic Regression
 Taking into account final weights and design
effects
 Permits finer adjustment for “nuisance” variables,
while testing for main effects (e.g. geographic
differences), as well as interactions.
 Remember, the dependent variable is the ODDS
(on the log scale) of having the risk factor, not the probability.
 ODDS = Pr(RF+) / Pr(RF-)
(see example)
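To make the odds/probability distinction concrete, here is a small sketch using the Durham 2003 prevalences reported in the tabular results (25.1% for males, 21.2% for females). The helper names are illustrative only.

```python
def odds_from_probability(p):
    """ODDS = Pr(RF+) / Pr(RF-)."""
    return p / (1.0 - p)


def probability_from_odds(odds):
    """Inverse transformation: Pr(RF+) = odds / (1 + odds)."""
    return odds / (1.0 + odds)


p_male, p_female = 0.251, 0.212

odds_male = odds_from_probability(p_male)
odds_female = odds_from_probability(p_female)

odds_ratio = odds_male / odds_female     # what logistic regression reports
prevalence_ratio = p_male / p_female     # ratio of the probabilities themselves
```

The odds ratio comes out larger than the prevalence ratio, so an odds ratio from a logistic model should not be read as "x% more likely".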
Multivariate Logistic Regression (cont’d)
This is the syntax for weighted LOGISTIC Regression
in SUDAAN. The model is:
CURRSMK = GENDER+AGEGRP+HUAREA+month
Additionally, a contrast comparing DURHAM to the
other REGIONS combined is undertaken.
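The SUDAAN run itself is not reproduced here. As a rough sketch of what "weighted" means for the point estimates, the following fits a logistic model with an intercept and a single binary covariate by Newton-Raphson, weighting each record's contribution by its sampling weight. Everything in it is an assumption for illustration: the function name, the reduction of the model to GENDER alone, and the four weighted records (built from the Durham 2003 prevalences). It gives point estimates only; design-based standard errors and Wald tests need survey software.

```python
import math


def fit_weighted_logistic(x, y, w, n_iter=25):
    """Fit logit Pr(y=1) = b0 + b1*x by Newton-Raphson with record weights w."""
    b0 = b1 = 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi, wi in zip(x, y, w):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            r = wi * (yi - p)            # weighted score contribution
            v = wi * p * (1.0 - p)       # weighted information contribution
            g0 += r
            g1 += r * xi
            h00 += v
            h01 += v * xi
            h11 += v * xi * xi
        # Solve the 2x2 Newton system (information matrix times step = score).
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical weighted records: x = 1 for males, y = 1 for current smokers.
x = [1, 1, 0, 0]
y = [1, 0, 1, 0]
w = [251.0, 749.0, 212.0, 788.0]

b0, b1 = fit_weighted_logistic(x, y, w)
odds_ratio_male_vs_female = math.exp(b1)
```

For a single binary covariate the fitted odds ratio equals the weighted cross-product ratio of the 2x2 table, which gives a quick check on the fit.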
Multivariate Logistic Regression (cont’d)
According to the Wald ChiSq Test, GENDER,
AGEGRP and HUAREA are highly signif.
However, month is NOT.
The DURHAM contrast is highly
significant and accounts for most of the
HUAREA variation.
Multivariate Logistic Regression (cont’d)
In this multivariate analysis, the odds of being a
CURRSMK are 41% higher in males than females,
and decline steadily with age. The odds are also signif.
higher in Durham region. Again, the odds ratio assoc.
with a monthly change is non-signif.
Time Trends
 How to best “model” a time series of data points
in order to confidently detect a real secular
trend, or an abrupt change point?
 How to detect and control for seasonal patterns
and autocorrelations among neighbouring
data points?
 Is there an optimal periodicity to use? Monthly,
quarterly, semi-annually?
Exposure to ETS
Bootstrap Estimates of Slopes
[Figure: Box and Whisker plots of Slope (approx. 0.90 to 1.05) by Periodicity (Month, Quarter, Semi). Panels: GTA, Females; Durham, Females; Durham, Ages 20-44. Sample sizes shown: n= 50, 155, 310; n= 55, 165, 330; n= 250, 770, 1540.]
The spread of these Box and Whisker
plots suggests tighter estimates for
monthly time series, vs quarterly
or semiannual.
This is empirical evidence that trends
are more precise if estimated from
monthly series.
Typical Time Series for One
Participating PHU
p-Charts
[Figure: p-charts of ETS Exposure (proportion, 0.0 to 0.6) over time points 0 to 30, Sex = F. Panels: PHU = DURHAM, YORK, PEEL, HALTON, TORONTO and GTA. Annotation: Stat. Signif.]
?Control Charts
The Control Chart is a more powerful
and sensitive version of the
basic Run Chart, with a central
value (CL) that is usually the
arithmetic mean, and control limits
(UCL and LCL) to establish
the bounds of “acceptable” variation.
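A minimal sketch of how the centre line and control limits of a p-chart might be computed. It assumes the conventional 3-sd binomial limits around the pooled proportion and ignores survey weights and design effects, so it is a simplification rather than the procedure used for the charts shown. The monthly counts are hypothetical.

```python
import math


def p_chart(counts, sizes):
    """Centre line plus per-period (proportion, LCL, UCL, out_of_control) rows."""
    centre = sum(counts) / sum(sizes)          # pooled proportion as CL
    rows = []
    for exposed, n in zip(counts, sizes):
        sd = math.sqrt(centre * (1.0 - centre) / n)
        lcl = max(0.0, centre - 3.0 * sd)      # limits clipped to [0, 1]
        ucl = min(1.0, centre + 3.0 * sd)
        p = exposed / n
        rows.append((p, lcl, ucl, p < lcl or p > ucl))
    return centre, rows


# Hypothetical monthly series: respondents reporting ETS exposure out of 100.
counts = [30, 28, 32, 29, 31, 12]
sizes = [100] * 6

centre_line, chart_rows = p_chart(counts, sizes)
out_of_control = [flag for (_, _, _, flag) in chart_rows]
```

In this made-up series only the last month falls outside the limits; with equal monthly sample sizes the pooled proportion equals the arithmetic mean of the monthly proportions.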
Statistical Testing - Western Electric
Rules
The Western Electric Rules, originally developed for AT&T,
are more sensitive for detecting “special cause” variation,
as distinct from expected, or common cause, variation. These patterns,
if they occur, are unlikely to be chance variations. Of course, the larger the
total number of observations, the greater the risk of “false alarms”. A series
of 20-25 data points is probably ideal to study.
1 point beyond Zone A (beyond 3 sd)
2 of 3 points in a row in Zone A (2-3 sd) or beyond
9 points in sequence in Zone C or beyond, on one side of the centre line
4 of 5 points in a row in Zone B (1-2 sd) or beyond
6 points in a row steadily increasing or decreasing
15 points in a row in Zone C (within 1 sd), above and below the centre line
14 points in a row alternating up and down
8 points in a row in Zones A or B (none in Zone C)
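Four of these run rules can be sketched as follows, assuming the centre line and standard deviation are already known. The function and rule names are hypothetical, the run length of 9 follows the slide (8 is also commonly used), and the remaining rules (trend, alternation, hugging the centre line) are omitted for brevity.

```python
def western_electric_flags(values, centre, sd, run_length=9):
    """Return, per rule, the indices at which the rule is triggered."""
    z = [(v - centre) / sd for v in values]
    flags = {
        "beyond_3sd": [],
        "two_of_three_beyond_2sd": [],
        "four_of_five_beyond_1sd": [],
        "run_one_side": [],
    }

    def count_beyond(window, sign, limit):
        # Number of points in the window beyond `limit` sd on the given side.
        return sum(1 for v in window if sign * v > limit)

    for i in range(len(z)):
        if abs(z[i]) > 3:
            flags["beyond_3sd"].append(i)
        for sign in (1, -1):                      # check each side separately
            if i >= 2 and count_beyond(z[i - 2:i + 1], sign, 2) >= 2:
                flags["two_of_three_beyond_2sd"].append(i)
            if i >= 4 and count_beyond(z[i - 4:i + 1], sign, 1) >= 4:
                flags["four_of_five_beyond_1sd"].append(i)
            if i >= run_length - 1:
                window = z[i - run_length + 1:i + 1]
                if count_beyond(window, sign, 0) == run_length:
                    flags["run_one_side"].append(i)
    return flags


# Hypothetical standardized series (centre 0, sd 1).
spike = western_electric_flags([0.5, -0.3, 0.2, 3.5, 0.1], 0.0, 1.0)
run = western_electric_flags([0.2] * 9, 0.0, 1.0)
pair = western_electric_flags([0.0, 0.0, 2.5, 0.1, 2.6], 0.0, 1.0)
```

Each example series is built to trip exactly one rule: a single extreme point, nine points on one side of the centre line, and two of three points beyond 2 sd.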
[Figure: p-charts of ETS Exposure (proportion, 0.0 to 0.6) over time points 0 to 30, Sex = M. Panels: PHU = YORK, PEEL, HALTON, DURHAM, TORONTO and GTA.]
Are these data points indep.
and normally distributed? Are
there temporal trends? Are there
important change points?
Loglinear Regression
Change-point Detection
 Joinpoint Version 2.7
 Analysis of trends using joinpoint models
fits the simplest joinpoint model to the trend data,
then tests statistically whether adding more
joinpoints is justified.
For more info, see:
http://srab.cancer.gov/joinpoint/
Example : Detecting trends regarding ETS
Exposure
The “best fitting” line has zero Join Points.
And the est. annual percent change (EAPC)
is 10.1% per year. This is statistically
significant (p<0.004)
Example : Detecting trends regarding ETS
Exposure
The “best fitting” line has zero Join Points.
And the est. annual percent change (EAPC)
is 15.2% per year. This is statistically
significant (p<0.0005)
Example : Detecting trends regarding ETS
Exposure
The parameter estimate for slope has to be
multiplied by 12 and then exponentiated (minus 1, times 100)
to get the true EAPC = 10.1% per year. But it's
statistically significant!
Example : Detecting trends regarding ETS
Exposure
The parameter estimate for slope has to be
multiplied by 12 and then exponentiated (minus 1, times 100)
to get the true EAPC = 15.2% per year. And it's
statistically significant!
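The conversion from a monthly log-linear slope to an annual percent change can be sketched as below. The series is synthetic and constructed so that the annual change is exactly 10.1%; the function names are illustrative, and the fit is ordinary least squares on the log scale, not the Joinpoint program.

```python
import math


def ols_slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def eapc_from_slope(slope, periods_per_year=12):
    """Annual percent change implied by a per-period slope on the log scale."""
    return 100.0 * (math.exp(periods_per_year * slope) - 1.0)


# Synthetic monthly proportions changing by exactly 10.1% per year.
months = list(range(24))
true_monthly_slope = math.log(1.101) / 12
proportions = [0.30 * math.exp(true_monthly_slope * t) for t in months]

monthly_slope = ols_slope(months, [math.log(p) for p in proportions])
eapc = eapc_from_slope(monthly_slope)
```

The fitted monthly slope is small (under 0.01), which is why the slide stresses multiplying by 12 and exponentiating before quoting an annual figure.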
Conclusions : Benefits of RRFSS
Monthly data more timely; likely more suitable for
detecting temporal changes
 More flexibility re. aggregation - before / after
comparisons; geographic areas; demographic groups
 Known sampling weights, design and use of complex
survey analysis software permit accurate est. of Var
 Robust SPC procedures permit timely detection of stat.
signif. changes - not sure about this! ?JoinPoint
 LARGE sample size permits more precise analysis
 Standard CORE of questions helps ensure comparability
over time and with other geo. areas.
 Flexible MODULES permit targetted sampling and invest.
of local concerns
Lessons
Analysis of survey data not straightforward - complex
sampling; multiple sources of error, etc.
 Use population sampling weights when deriving
estimates
 Use specialized software to estimate variance correctly
 Data utilization the best guide for Indicator Dev’t and
Data Quality Measurement/Improvement
 Murphy’s Law : “If anything can go wrong, it will”
 Corollary to Murphy’s Law : “Everything takes longer
than you think”
 Dr. A.B. Miller : “If it wasn’t a difficult problem to solve, it
wouldn’t be worth it in the end”
And, finally
Thank you for the privilege!