Experimental Studies - Fraser Health Authority


Surviving Pharmacy Residency Research: Tips and Tricks for Statistical Planning
© Fraser Health Authority, 2007
The Fraser Health Authority (“FH”) authorizes the use, reproduction and/or
modification of this publication for purposes other than commercial redistribution. In
consideration for this authorization, the user agrees that any unmodified
reproduction of this publication shall retain all copyright and proprietary notices. If
the user modifies the content of this publication, all FH copyright notices shall be
removed, however FH shall be acknowledged as the author of the source
publication.
Reproduction or storage of this publication in any form by any means for the purpose
of commercial redistribution is strictly prohibited.
This publication is intended to provide general information only, and should not be
relied on as providing specific healthcare, legal or other professional advice. The
Fraser Health Authority, and every person involved in the creation of this publication,
disclaims any warranty, express or implied, as to its accuracy, completeness or
currency, and disclaims all liability in respect of any actions, including the results of
any actions, taken or not taken in reliance on the information contained herein.

FH Health Research Intelligence Unit
How can we help?

Grant Facilitator-Writer
- Conducting a search for funding opportunities.
- Automatic notification of new funding sources and deadlines.
- Identifying a research team.
- Preparing letters of intent.
- Identifying resources required for conducting research.
- Formulating the research budget.
- Writing the grant application in collaboration with researchers.
- Understanding FH and funding agency requirements regarding preparation of specific documents.

FH Health Research Intelligence Unit
How can we help?

Epidemiologist
- Specifying the research goal, objectives and hypothesis.
- Identifying measurable outcomes.
- Specifying the variables for analysis.
- Identifying sources of data.
- Developing data collection tools for quantitative or qualitative studies.
- Developing the statistical analysis plan.
- Understanding how to use statistical software, such as SPSS.

Workshop Outline
- Research 101: Basic Research Steps
- Research Question Refinement
- Common Study Designs (Resource)
- Levels of Data
- Power and Sample Size
- Statistical Test Selection (Exercise)
- Data Reporting (Resource)
- Simple Stats with Excel (Resource)

Pharmacy Residency Project
1) Develop a research question.
2) Conduct a thorough literature review.
3) Refine the research question or hypothesis.
4) Design the research methodology/study.
5) Create a research proposal.
6) Apply for funding.
7) Apply for ethics approval.
8) Collect and analyze data.
9) Draw conclusions and relate the findings.

Research Question Refinement
The research question should describe, in operational terms, what you think will happen in the study.

Good Versus Bad Research Question
- Bad (vague): Are patients who take drug X more likely to experience episodes of delirium?
- Good (operational): Do patients who received medication X between September 2008 and November 2008 experience more episodes of delirium than patients who received drug Y during the same time period?

Classification of Research Studies
Research studies are divided into observational and experimental studies:
- Observational studies
  - Descriptive studies: focus on describing populations and the relationships between variables.
  - Analytic studies: make inferences about the population based on a random sample.
- Experimental studies: test relationships between exposures and outcomes. The investigator has direct control over study conditions and exposure status.

Hierarchy of Studies
The type of study is selected according to the purpose of the research. Levels of evidence, from highest to lowest:
1. Experimental studies
2. Analytic studies
3. Descriptive studies

Handout: Research Design Hierarchy

Probability Sampling Methods: Random
- There are several methods to choose from:
  - Simple random sampling: every unit in the population has an equal chance of being selected.
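A minimal sketch of simple random sampling, assuming the sampling frame is a numbered list of patients. The function name `simple_random_sample` and the frame of 200 IDs are hypothetical; the draw uses Python's standard `random.Random.sample`, which selects without replacement.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n units without replacement so that every unit has the same chance of selection."""
    rng = random.Random(seed)  # a recorded seed lets you document and reproduce the draw
    return rng.sample(frame, n)

# Hypothetical sampling frame: patients numbered 1 to 200
frame = list(range(1, 201))
sample = simple_random_sample(frame, 40, seed=2008)
print(sorted(sample))
```

Recording the seed in the study protocol is one simple way to make the selection auditable.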

Probability Sampling Methods: Stratified
- Stratified sampling: divide the population into non-overlapping strata and sample from within each stratum independently.
- Guarantees representation of all important groups.
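The stratified procedure can be sketched as follows, assuming each unit carries a stratum label (here a hypothetical hospital site) and that the same number of units is drawn from every stratum. Equal allocation is an assumption for illustration; proportional allocation (sampling each stratum in proportion to its size) is a common alternative.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, n_per_stratum, seed=None):
    """Split units into non-overlapping strata, then sample each stratum independently."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)  # each unit lands in exactly one stratum
    sample = {}
    for name, members in strata.items():
        k = min(n_per_stratum, len(members))  # cannot draw more units than the stratum holds
        sample[name] = rng.sample(members, k)
    return sample

# Hypothetical frame: (patient_id, site) pairs, 30 patients at each of three sites
frame = list(zip(range(1, 91), ["A", "B", "C"] * 30))
result = stratified_sample(frame, stratum_of=lambda u: u[1], n_per_stratum=5, seed=7)
for site, chosen in sorted(result.items()):
    print(site, sorted(pid for pid, _ in chosen))
```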

Probability Sampling Methods: Systematic
Selecting the sample using an interval k, so that every kth unit in the sampling frame is selected, is called systematic random sampling.

Probability Sampling Methods: Systematic
Steps to achieve a systematic random sample:
1. Number the units in the population from 1 to N.
2. Decide on the sample size n that you want or need. The interval size is k = N/n.
3. Randomly select an integer between 1 and k.
4. Then take every kth unit.

Example:
1. N = 200.
2. n = 40, so k = N/n = 200/40 = 5 (interval size).
3. Randomly select a number between 1 and 5 (let's pick 4).
4. Begin with unit 4 and take every 5th unit (4, 9, 14, ..., 199).
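The four steps translate directly into a short function. This is a sketch under the assumption that units are numbered 1 to N; the name `systematic_sample` is hypothetical, and the interval is rounded down when N is not a multiple of n.

```python
import random

def systematic_sample(N, n, start=None, seed=None):
    """Units are numbered 1..N; take every kth unit from a random start between 1 and k."""
    k = N // n  # interval size k = N/n (rounded down if N is not a multiple of n)
    if start is None:
        start = random.Random(seed).randint(1, k)  # randint includes both 1 and k
    return list(range(start, N + 1, k))[:n]  # keep at most n units

units = systematic_sample(200, 40, start=4)  # the worked example: N = 200, n = 40, start = 4
print(units[:5], "...", units[-1], len(units))  # [4, 9, 14, 19, 24] ... 199 40
```

Leaving `start` as `None` performs step 3 (the random start) automatically.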

Probability Sampling Methods: Cluster
- Cluster sampling:
  - Divide the population into clusters and randomly sample clusters.
  - Measure all units within the sampled clusters.
  - Example: see the blue areas on the map. Clusters need not be geographic areas; you could also select hospitals, schools, etc.
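A minimal sketch of cluster sampling, assuming the clusters are hospitals and each maps to a list of its patients. The hospital names and patient IDs are hypothetical. Note the contrast with stratified sampling: here whole clusters are chosen at random and every unit inside them is kept.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters, then keep every unit inside the chosen ones."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # sorted keys make a seeded draw repeatable
    return {name: clusters[name] for name in chosen}

# Hypothetical clusters: hospitals and the patients each one holds
hospitals = {
    "Hospital 1": ["p01", "p02", "p03"],
    "Hospital 2": ["p04", "p05"],
    "Hospital 3": ["p06", "p07", "p08", "p09"],
    "Hospital 4": ["p10", "p11"],
}
selected = cluster_sample(hospitals, n_clusters=2, seed=11)
all_units = [p for patients in selected.values() for p in patients]
print(sorted(selected), all_units)
```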


Non-Probability Sampling Methods
There are also several types of non-probability sampling:
- Convenience sampling (not representative of the population).
- Purposive sampling (a certain group in mind).
- Expert sampling (seek out specific expertise).
- Snowball sampling (ask people to participate, and they ask more people).

If you use a non-probability sampling method, the conclusions drawn from the study results apply only to the specific group you sampled.

Measurement: Levels of Data
The level of data dictates which statistical test you should use.
- Categorical: data classified into categories that cannot be arranged in any particular order (e.g. apples and pears, gender, eye colour, ethnicity).
- Ordinal: data that can be ordered, but the distances between categories are not necessarily equal (e.g. low, middle and high income).
- Continuous: equal distance between each interval (e.g. 1, 2, 3, ...; age).

Statistics and Statistical Test Selection

Types of Statistics
- Descriptive statistics: describe research findings (e.g. frequencies, averages).
- Inferential statistics: make inferences about the population based on a random sample.
  - In a random sample, each person/unit has an equal chance of being selected.
  - This allows generalization to the population.

Types of Variables
Variables can be classified as independent or dependent.
- An independent variable is the variable you believe will influence your outcome measure.
- A dependent variable is the variable that depends on, or is influenced by, the independent variable(s). The dependent variable can also be the variable you are trying to predict.

Statistical Test Selection
Selecting the appropriate statistical test should be based on:
1) What is your goal? Description? Comparison? Prediction? Quantifying association? Proving effectiveness? Proving causality?
2) What kind of data have you collected? What are the levels of data (nominal, ordinal, continuous)? Was your sample randomly selected?
3) Are your data normally distributed? Should you use a parametric or non-parametric test?
4) What are the assumptions of the statistical test you would like to use? Do the data meet these assumptions?
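For question 3, one rough numeric screen is the sample skewness: values near 0 suggest a roughly symmetric distribution, while a large positive value signals a long right tail. This sketch is an illustration only, not a formal normality test; statistical packages offer formal tests (such as Shapiro-Wilk) and plots that should be preferred. The data values below are hypothetical.

```python
from statistics import fmean

def sample_skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (assumes the values are not all identical)."""
    mean = fmean(values)
    m2 = fmean([(x - mean) ** 2 for x in values])  # average squared deviation
    m3 = fmean([(x - mean) ** 3 for x in values])  # average cubed deviation keeps the sign of the tail
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 2, 10]  # hypothetical, e.g. length of stay with one long admission
print(round(sample_skewness(symmetric), 2), round(sample_skewness(right_skewed), 2))
```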

Parametric Tests
- Parametric tests assume that the variable in question comes from a normal distribution.
- Non-parametric tests do not require the assumption of normality.
- Most non-parametric tests do not require an interval level of measurement; they can be used with nominal or ordinal data.
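To see why a non-parametric test does not need normality, here is a sketch of the Mann-Whitney U statistic: it only counts how often values in one group exceed values in the other, so only the ordering matters. The pain scores are hypothetical, and the sketch stops at the U statistic; converting U to a p-value is left to statistical software.

```python
def mann_whitney_u(group_a, group_b):
    """U for group_a: number of pairs (a, b) with a > b, counting ties as one half."""
    u = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u += 1
            elif a == b:
                u += 0.5
    return u

# Hypothetical ordinal pain scores (0-10) for two unpaired groups
drug_x = [3, 4, 2, 5, 4]
drug_y = [6, 7, 5, 8, 6]
u_x = mann_whitney_u(drug_x, drug_y)
u_y = mann_whitney_u(drug_y, drug_x)
print(u_x, u_y, u_x + u_y == len(drug_x) * len(drug_y))  # 0.5 24.5 True
```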

Assumptions
- Each test has its own assumptions.
- Before you select a test, be sure to check its assumptions.
- To find this information, contact a consultant or review statistical/research methods resources.
- Some examples of common assumptions are:
  - The dependent variable must be measured at a certain level (e.g. interval level).
  - The independent variable(s) must be measured at a certain level (e.g. ordinal level).
  - The population is normally distributed (not skewed).

If your data do not meet the assumptions for a specific test, you may be able to use a non-parametric test instead.

Goal | Measurement (normal population) | Ordinal, or non-normal population | Binomial (two possible outcomes) | Survival time
Describe one group | Mean, SD | Median, interquartile range | Proportion | Kaplan-Meier survival curve
Compare one group to a hypothetical value | One-sample t test | Wilcoxon test | Chi-square or binomial test** | -
Compare two unpaired groups | Unpaired t test | Mann-Whitney test | Fisher's test (chi-square for large samples) | Log-rank test or Mantel-Haenszel*
Compare two paired groups | Paired t test | Wilcoxon test | McNemar's test | Conditional proportional hazards regression*
Compare three or more unmatched groups | One-way ANOVA | Kruskal-Wallis test | Chi-square test | Cox proportional hazard regression**
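The goal-by-data-type table above can be encoded as a simple lookup. This is a hypothetical sketch (`TABLE` and `suggest_test` are illustrative names, and the footnote asterisks are omitted); it only covers the rows shown and is no substitute for checking each test's assumptions.

```python
DATA_TYPES = ("normal", "non-normal or ordinal", "binomial", "survival")

# One row per goal; columns follow DATA_TYPES. None marks a cell the table leaves blank.
TABLE = {
    "describe one group": ("Mean, SD", "Median, interquartile range", "Proportion",
                           "Kaplan-Meier survival curve"),
    "compare one group to a hypothetical value": ("One-sample t test", "Wilcoxon test",
                                                  "Chi-square or binomial test", None),
    "compare two unpaired groups": ("Unpaired t test", "Mann-Whitney test",
                                    "Fisher's test (chi-square for large samples)",
                                    "Log-rank test or Mantel-Haenszel"),
    "compare two paired groups": ("Paired t test", "Wilcoxon test", "McNemar's test",
                                  "Conditional proportional hazards regression"),
    "compare three or more unmatched groups": ("One-way ANOVA", "Kruskal-Wallis test",
                                               "Chi-square test",
                                               "Cox proportional hazard regression"),
}

def suggest_test(goal, data_type):
    """Return the table entry for a goal and data type (KeyError/ValueError if not in the table)."""
    column = DATA_TYPES.index(data_type)
    return TABLE[goal.lower()][column]

print(suggest_test("Compare two paired groups", "normal"))  # Paired t test
```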

Statistical Test Selection Group Exercise
- Using your tables, select the appropriate statistical tests for 10 research scenarios.

During the group exercise…
Steps to choose the appropriate statistical method for the data analysis:
1. Identify whether the research problem raises the question of describing, relating (association), or comparing (difference).
2. Identify the levels of measurement in the research question (nominal/categorical, ordinal/rank, continuous/evenly spaced).
3. Identify the number of variables or samples being described, related, or compared.
4. Identify whether the comparison samples are related (e.g. the same group analyzed before and after) or independent (unrelated, different groups).
5. Choose the appropriate statistical tool for the data and situation using the decision tree in the handout.



1. A pilot experiment designed to test the effectiveness of a new approach to electrode placement for electroconvulsive therapy (ECT) has been conducted over a one-year period in the Fraser Health Authority.
Patients from two different mood disorder clinics participated in this study. Patients from Clinic X received ECT according to current practice guidelines. Patients from Clinic Y received a new exploratory ECT treatment. Patients in each clinic were matched for age, gender, and type of disorder. A random sample of 30 matched pairs of patients was selected for inclusion in the study. At the end of one year, patients were administered a memory test yielding a total score out of 100.
Dr. Vasdil would like to know which statistical procedure should be selected to test for differences between the groups of patients on the memory test.
What is the question: Compare
How many samples: 2
Related or independent: Related (matched pairs)
What is the level of measurement: Continuous
How many dependent variables: 1
Test: Paired t-test
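Whether the two clinics' samples are treated as related or independent (step 4) changes which t statistic is computed. The sketch below computes both on hypothetical memory scores for 5 matched pairs (the scenario itself has 30 pairs, whose data are not given). The paired version works on within-pair differences, so it removes the between-pair variation that matching was designed to control; p-values would come from statistical software.

```python
from math import sqrt
from statistics import mean, stdev

def unpaired_t(a, b):
    """Student's t for two independent groups with pooled variance; df = len(a) + len(b) - 2."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))

def paired_t(a, b):
    """t computed on within-pair differences; df = number of pairs - 1."""
    d = [x - y for x, y in zip(a, b)]
    return mean(d) / (stdev(d) / sqrt(len(d)))

# Hypothetical memory scores (out of 100); position i in each list is one matched pair
clinic_x = [72, 65, 80, 58, 90]
clinic_y = [70, 61, 77, 55, 86]
print(round(unpaired_t(clinic_x, clinic_y), 2), round(paired_t(clinic_x, clinic_y), 2))
```

With these numbers the same 3.2-point mean difference gives an unpaired t of about 0.41 but a paired t of about 8.55.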

Sample Size
There are several rules of thumb for determining sample size:
1) It's a good idea to have a minimum of 30 cases (as a total group or, if comparing groups, 30 for each group). If you have fewer, you can use a non-parametric test, but it is still better to have close to 30 cases.
2) If using regression, it is best to have between 10 and 50 cases per independent variable.
3) If you are validating a survey, never have more questions than cases.
4) If the total population you are examining has fewer than 30 units, use all of them.
5) For pilot studies, the recommendation is a sample size of 12 per group.
6) For surveys, a sample size of 400 per group is sufficient for most purposes.
7) For surveys, a 30% response rate is the bare minimum.

Note: For a precise sample size estimate, you will need to conduct a power analysis.
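One way to see why rule 6 is reasonable: at 95% confidence, a survey proportion estimated from 400 respondents has a margin of error of about 5 percentage points, even in the worst case (p = 0.5). This sketch uses the normal approximation; the helper names are hypothetical, and `regression_case_range` simply restates rule 2.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(n, p=0.5, confidence=0.95):
    """Half-width of a normal-approximation confidence interval for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95% confidence
    return z * sqrt(p * (1 - p) / n)

def regression_case_range(n_predictors):
    """Rule 2: between 10 and 50 cases per independent variable."""
    return 10 * n_predictors, 50 * n_predictors

print(round(margin_of_error(400), 3))  # 0.049, i.e. about +/- 5 percentage points
print(regression_case_range(4))         # (40, 200)
```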






Statistical Power
- Power is the probability that a statistical test will detect an effect if one truly exists.
- It takes a value between 0 and 1 (often expressed as a percentage).
- Power = 1 - β, where β is the probability of a Type II error.
- Type II error: failing to reject a null hypothesis that is false (missing a real effect).
- Type I error: rejecting a null hypothesis that is true (finding an effect that is not there).

Types of Power
- A priori: conducted before the study commences (at the proposal stage).
- Post hoc: conducted after the study has been completed.
- Easy way to increase power?
  - Increase the sample size.
  - Increase the effect size.
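A minimal sketch of how power grows with sample size and effect size, assuming a two-sided comparison of two group means at alpha = .05 and using the normal approximation (software based on the t distribution gives slightly different values). Here `d` is the standardized effect size (difference in means divided by the SD); `approx_power` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison of means."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    return z.cdf(d * sqrt(n_per_group / 2) - z_alpha)  # ignores the negligible opposite tail

for n in (20, 40, 63, 100):
    print(f"d = 0.5, n = {n} per group: power = {approx_power(0.5, n):.2f}")
print(f"d = 0.8, n = 40 per group: power = {approx_power(0.8, 40):.2f}")
```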

Components Involved in Power Calculation
- Sample size: the number of cases.
- Effect size: the magnitude of the trend relative to its variation.
- Alpha level: the probability of concluding that an effect is present when the result is actually due to chance alone (.05 or .01). Also known as the Type I error rate: the probability of rejecting a true null hypothesis.
- Power level: 80-90% is common.
- One- or two-tailed test: two-tailed is common.

Components Involved in Power Calculation
- Sample size: what we want to find out.
- Effect size: the magnitude of the trend… but what if you don't know it?
  - Look to pilot data or the literature.
  - Keep in mind: the smaller the effect size, the larger the sample size required.
- Alpha level: .05
- Power level: 80-90%
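Putting the components together, a common normal-approximation formula for the number of cases per group in a two-sided, two-group comparison of means is n = 2(z₁₋α/₂ + z_power)² / d², where d is the standardized effect size. The sketch below assumes that formula; `n_per_group` is a hypothetical name, and dedicated power software using the t distribution typically returns a slightly larger number.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate cases per group: 2 * (z_(1 - alpha/2) + z_power)**2 / d**2, rounded up."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = .05
    z_beta = z.inv_cdf(power)           # about 0.84 when power = .80
    return ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)

print(n_per_group(0.5))  # 63: medium effect
print(n_per_group(0.2))  # 393: a small effect needs far more cases
```

The second line illustrates the point above: shrinking the effect size from 0.5 to 0.2 multiplies the required sample by about six.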

Important Consultation Information
- What is your research question?
- Components of the power calculation
- Levels of data (nominal, ordinal, continuous)
- Sampling plan

Data Organization: Codebook
- What is a codebook?
- A codebook is a log of your variables (and their levels of data) and how you will code them.
- A codebook helps everyone understand the coding schemes, so that everyone is on the same page!

Data Processing and Analyses: Codebook Example
Variable Name | Variable Label | Values | Coding | Missing | Variable Type
age | age | 1,2,3,4,5 | 1=10-20 years, 2=21-30 years, 3=31-40 years, 4=41-50 years, 5=51+ years | 97=Incorrect response, 98=No response, 99=Not applicable | Ordinal
sex | sex | 1,2 | 1=male, 2=female | 97=Incorrect response, 98=No response, 99=Not applicable | Nominal
happiness | happiness at work | 1,2,3 | 1=not happy, 2=somewhat happy, 3=very happy | 97=Incorrect response, 98=No response, 99=Not applicable | Ordinal
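The codebook can also live alongside the analysis as a data structure, so that codes are decoded the same way every time. This is a hypothetical sketch (`CODEBOOK`, `MISSING` and `decode` are illustrative names) that mirrors the example codebook and keeps the missing-value codes 97-99 separate from valid answers.

```python
MISSING = {97: "Incorrect response", 98: "No response", 99: "Not applicable"}

CODEBOOK = {
    "age": {"label": "age", "type": "ordinal",
            "codes": {1: "10-20 years", 2: "21-30 years", 3: "31-40 years",
                      4: "41-50 years", 5: "51+ years"}},
    "sex": {"label": "sex", "type": "nominal", "codes": {1: "male", 2: "female"}},
    "happiness": {"label": "happiness at work", "type": "ordinal",
                  "codes": {1: "not happy", 2: "somewhat happy", 3: "very happy"}},
}

def decode(variable, value):
    """Translate a stored code to its meaning; missing-value codes are reported as missing."""
    codes = CODEBOOK[variable]["codes"]
    if value in codes:
        return codes[value]
    if value in MISSING:
        return f"missing ({MISSING[value]})"
    raise ValueError(f"{value!r} is not a valid code for {variable}")

print(decode("happiness", 3))  # very happy
print(decode("sex", 98))       # missing (No response)
```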

Spreadsheet Example
ID# | Age | Sex | Happiness
1 | 1 | 1 | 2
2 | 2 | 2 | 2
3 | 3 | 1 | 2
4 | 57 | 2 | 2
5 | 45 | 2 | 3
6 | 66 | 2 | 3
7 | 2 | 2 | 3
8 | 88 | 2 | 3
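Once data are entered, a quick check against the codebook catches values that fall outside the defined codes. The sketch below assumes the spreadsheet rows are read as (ID#, Age, Sex, Happiness) in the order listed; `ALLOWED` and `find_uncoded` are hypothetical names. With the codebook's age codes of 1-5, entries such as 57 are flagged, meaning they either need recoding into the age groups or belong in a separate raw-age variable.

```python
ALLOWED = {"Age": {1, 2, 3, 4, 5}, "Sex": {1, 2}, "Happiness": {1, 2, 3}}
MISSING_CODES = {97, 98, 99}

# (ID#, Age, Sex, Happiness) from the spreadsheet example
rows = [
    (1, 1, 1, 2), (2, 2, 2, 2), (3, 3, 1, 2), (4, 57, 2, 2),
    (5, 45, 2, 3), (6, 66, 2, 3), (7, 2, 2, 3), (8, 88, 2, 3),
]

def find_uncoded(rows, columns=("Age", "Sex", "Happiness")):
    """Return (ID, column, value) for every value that is neither a valid code nor a missing code."""
    problems = []
    for record in rows:
        row_id, values = record[0], record[1:]
        for column, value in zip(columns, values):
            if value not in ALLOWED[column] and value not in MISSING_CODES:
                problems.append((row_id, column, value))
    return problems

for row_id, column, value in find_uncoded(rows):
    print(f"ID {row_id}: {column} = {value} is not a codebook value")
```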

Data Analysis with Excel
- Most simple analyses can be done in Excel, including correlation, regression and even random number generation.
- Install the Analysis ToolPak:
  - Go to Tools > Add-Ins and add the Analysis ToolPak.
- Create the worksheet and codebook.
- Choose the statistical test.
  - Follow the commands in the Help menu.

Data Analysis with Excel


http://home.ubalt.edu/ntsbarsh/excel/excel.htm

Data Reporting and Presentation of Data
- Graphical summaries are a great way to present your data.
- Excel is great for creating tables and graphs.
- The type of data you have determines the type of graphical summary you should use.

Data Reporting and Presentation of Descriptive Data
- Categorical data: frequency tables and bar charts.
- Example: Fruit

Fruit | Count | Percent | Valid Percent
Pineapples | 4 | 20% | 21%
Apples | 5 | 25% | 26%
Oranges | 10 | 50% | 53%
Unknown | 1 | 5% | -
Total | 20 | 100% | 100%
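The difference between "Percent" and "Valid Percent" is only the denominator: percent divides by all 20 responses, while valid percent excludes the unknown answer and divides by the 19 valid ones (e.g. 4/19 ≈ 21%). A minimal sketch reproducing the fruit table, with `frequency_table` as a hypothetical helper and `None` standing for an unknown answer:

```python
from collections import Counter

def frequency_table(responses, categories):
    """Rows of (category, count, percent, valid percent); None marks a missing/unknown answer."""
    counts = Counter(responses)
    total = len(responses)
    valid_total = total - counts[None]  # valid percent leaves missing answers out of the denominator
    rows = []
    for category in categories:
        n = counts[category]
        valid_pct = None if category is None else round(100 * n / valid_total)
        rows.append((category or "Unknown", n, round(100 * n / total), valid_pct))
    return rows

# The fruit example: 4 pineapples, 5 apples, 10 oranges and 1 unknown answer
fruit = ["Pineapples"] * 4 + ["Apples"] * 5 + ["Oranges"] * 10 + [None]
for row in frequency_table(fruit, ["Pineapples", "Apples", "Oranges", None]):
    print(row)
```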

Data Reporting and Presentation of Descriptive Data
[Bar chart: Fruit Study, counts for Pineapples, Apples, Oranges and Unknown]

Data Reporting and Presentation of Descriptive Data
- Continuous data: tables and histograms.

Age | Count | Percent
20-30 | 4 | 20%
31-40 | 5 | 25%
41-50 | 10 | 50%
51-60 | 1 | 5%
Total | 20 | 100%
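Building a table like this from raw ages means assigning each age to one class interval. A minimal sketch, assuming ages are recorded in whole years; the 20 ages below are hypothetical, chosen only so the counts match the table, and `age_group` is an illustrative helper name.

```python
from collections import Counter

AGE_GROUPS = [(20, 30), (31, 40), (41, 50), (51, 60)]  # class intervals from the table, in whole years

def age_group(age):
    """Return the label of the interval containing age, e.g. 35 -> '31-40'."""
    for low, high in AGE_GROUPS:
        if low <= age <= high:
            return f"{low}-{high}"
    return "out of range"

# Hypothetical ages for 20 respondents
ages = [24, 29, 22, 30, 35, 38, 33, 40, 31, 45, 42, 48, 50, 41, 44, 47, 49, 43, 46, 58]
counts = Counter(age_group(a) for a in ages)
for low, high in AGE_GROUPS:
    label = f"{low}-{high}"
    print(label, counts[label], f"{round(100 * counts[label] / len(ages))}%")
```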

Data Reporting and Presentation of Descriptive Data
[Histogram: count by age group (20-30, 31-40, 41-50, 51-60)]

What is the difference between a histogram and a bar chart?
- Histogram: for continuous data, where the data are divided into contiguous class intervals (in other words, intervals connected in an unbroken sequence).
- Bar chart: for categorical data, where the categories are not contiguous.

Measures of Central Tendency
- Reporting averages:
  - Categorical data = mode
  - Ordinal data = median
  - Continuous data = mean
- If there are outliers (or extreme values), report the median instead of the mean.
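A short sketch of matching the average to the level of data, using Python's standard `statistics` module on hypothetical values. The last line shows why the median is preferred with outliers: one extreme length of stay pulls the mean well above the typical value, while the median barely moves.

```python
from statistics import mean, median, mode

eye_colour = ["brown", "blue", "brown", "green", "brown"]  # categorical
income = [1, 2, 2, 3, 3, 3, 2]                              # ordinal codes: 1=low, 2=middle, 3=high
length_of_stay = [3, 4, 4, 5, 5, 6, 40]                     # continuous (days), one extreme value

print(mode(eye_colour))                                        # brown
print(median(income))                                          # 2
print(round(mean(length_of_stay), 1), median(length_of_stay))  # 9.6 5
```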

Reporting Inferential Stats
- Handout
- Resource: APA Guidelines
  http://www.ilstu.edu/~jhkahn/apastats.html

Reporting Inferential Stats
- It's important to include means, standard deviations and sample sizes in your results section.
- Example (correlation): Variable X was strongly correlated with Variable Y, r = .59, p < .01.
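A minimal sketch that computes a Pearson correlation and formats it the way the example does, dropping the leading zero as APA style does for statistics that cannot exceed 1. The helper names are hypothetical, and the p-value is assumed to come from your statistical software rather than being computed here.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length lists of numbers."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def strip_zero(value):
    """Two decimals without the leading zero: 0.59 -> '.59', -0.3 -> '-.30'."""
    return f"{value:.2f}".replace("0.", ".", 1)

def apa_r(r, p):
    """Format r and a p-value supplied by your software, e.g. 'r = .59, p < .01'."""
    p_text = "p < .01" if p < 0.01 else "p < .05" if p < 0.05 else f"p = {strip_zero(p)}"
    return f"r = {strip_zero(r)}, {p_text}"

print(apa_r(0.5912, 0.004))  # r = .59, p < .01
```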

Important to Keep Your Audience in Mind
- Residency project
- Publication
- Departmental report

Aaron: TCPS certification reminder for residents…

Questions?


Slide 2

Surviving Pharmacy
Residency Research: Tips
and Tricks for Statistical
Planning

Surviving Pharmacy Residency
Research: Tips and Tricks for
Statistical Planning
© Fraser Health Authority, 2007
The Fraser Health Authority (“FH”) authorizes the use, reproduction and/or
modification of this publication for purposes other than commercial redistribution. In
consideration for this authorization, the user agrees that any unmodified
reproduction of this publication shall retain all copyright and proprietary notices. If
the user modifies the content of this publication, all FH copyright notices shall be
removed, however FH shall be acknowledged as the author of the source
publication.
Reproduction or storage of this publication in any form by any means for the purpose
of commercial redistribution is strictly prohibited.
This publication is intended to provide general information only, and should not be
relied on as providing specific healthcare, legal or other professional advice. The
Fraser Health Authority, and every person involved in the creation of this publication,
disclaims any warranty, express or implied, as to its accuracy, completeness or
currency, and disclaims all liability in respect of any actions, including the results of
any actions, taken or not taken in reliance on the information contained herein.

FH Health Research Intelligence
Unit
How can we help?


Grant Facilitator-Writer
 Conducting a search for funding opportunities.
 Automatic notification of new funding sources and
deadlines.
 Identifying a research team.
 Preparing letters of intent.
 Identifying resources required for conducting research.
 Formulating the research budget.
 Writing the grant application in collaboration with
researchers.
 Understanding FH and funding agency requirements
regarding preparation of specific documents.

FH Health Research Intelligence
Unit
How can we help?











Epidemiologist
Specifying the research goal,
objectives and hypothesis.
Identifying measurable
outcomes.
Specifying the variables for
analysis.
Identifying sources of data.
Developing data collection
tools for quantitative or
qualitative studies.
Developing the statistical
analysis plan.
Understanding how to use
statistical software, such as
SPSS.

Workshop Outline










Research 101- Basic Research Steps
Research Question Refinement
Common Study Designs- Resource
Levels of Data
Power and Sample Size
Statistical Test Selection- Exercise
Data Reporting- Resource
Simple Stats with Excel- Resource

Pharmacy Residency Project










1) Develop a research question
2) Conduct thorough literature review
3) Re-define research question or hypothesis
4) Design research methodology/study
5) Create research proposal
6) Apply for funding
7) Apply for ethics approval
8) Collect and analyze data
9) Draw conclusions and relate findings

Research Question Refinement


Research question will describe in operational
terms, what you think will happen in the study.

Good Versus Bad Research
Question


Are patients who take
drug X more likely to
experience episodes
of delirium?



Do patients who
receive medication X
between September
2008 and November
2008 experience
more episodes of
delirium as compared
to patients who
received drug Y
during the same time
period?

Classification of Research
Studies
Observational Studies:
Descriptive Studies:
Focus on describing
populations and describing the
relationship between variables

Research Studies

Analytic Studies:
Observational
Descriptive

Experimental

Analytic

Make inferences about the
population based on a random
sample.
Experimental Studies:
Test relationships between
exposures and outcomes.
Investigator has direct control
over study condition and
exposure status.

Hierarchy of Studies
Type of study is selected
according to the purpose of
research.
Experimental Studies

Analytic Studies

Descriptive Studies

Levels of Evidence


HandoutResearch Design
Hierarchy

Probability Sampling Methods:
Random
 There

are several methods to choose

from:
 Simple random
sampling.

Probability Sampling Methods:
Stratified
 Stratified

sampling

(divide the population into
non-overlapping strata and
sample from within each
stratum independently).


Guarantees representation
of all important groups.

Probability Sampling Methods:
Systematic


Selection of the
sample using an
interval “k” so that
every “k” unit in the
frame is selected,
is called systematic
random sampling.

Probability Sampling Methods:
Systematic


Steps to achieve a systematic random sample:

1. Number the units in the population from 1 to N.
2. Decide on the n (sample size) that you want or need.


k = N/n = the interval size.

3. Randomly select an integer between 1 and k.
4. Then take every kth unit.
Example:
1. N=200
2. n=40, take N/n, 200/40=5 (interval size).
3. Randomly select a number between 1 and 5 (let’s pick 4).
4. Begin with 4, and take every 5th unit.

Probability Sampling Methods:
Cluster
 Cluster

sampling.
 Divide population into clusters and
randomly sample clusters.
 Measure all units within sampled clusters.
 Example: See blue areas on map.
Not just geographic areas,
could select hospitals,
schools etc.


Non-Probability Sampling
Methods


There are different types of non-probability
sampling methods as well:








Convenience (not representative of population).
Purposive (certain group in mind).
Expert sampling (seek out specific expertise).
Snowball sampling (ask people to participate, they
ask more people).

If you select non-probability sampling methods, the
conclusions drawn from the study results apply only
to that specific population.

Measurement: Levels of Data



The level of data will dictate which statistical test you
should use.
Categorical = Data that is classified into categories and
cannot be arranged in any particular order (e.g. Apples
and pears, gender, eye colour, ethnicity).



Ordinal = Data ordered, but distance between intervals
not always equal. (e.g. Low, middle and high income).



Continuous = equal distance between each interval
(e.g. 1,2,3., age).

Statistics and Statistical Test
Selection

Types of Statistics
 Descriptive

Statistics: Describes
 research findings


E.g. Frequencies, averages.

 Inferential

Statistics: Makes inferences
about the population, based on a random
sample.




In a random sample, each person/unit has an
equal chance of being selected
Allows generalizability to population.

Types of Variables


Variables can be classified as independent or
dependent.



An independent variable is the variable you
believe will influence your outcome measure.



A dependent variable is the variable that is
dependant on or influenced by independent
variable(s). The dependent variable can also
be the variable you are trying to predict.

Statistical Test Selection
Selecting the appropriate Statistical test requires
several steps:
Test selection should be based on:
1) What is your goal? Description? Comparison? Prediction? Quantify
association? Prove effectiveness? Prove causality?
2) What kind of data have you collected? What are the levels of data
(Nominal, ordinal, continuous)? Was your sample randomly selected?
3) Is your data normally distributed? Should you use a parametric or nonparametric test?
4) What are the assumptions of the statistical test you would like to
use? Does the data meet these assumptions?

Parametric Tests


Parametric tests assume that the variable in question is
from a normal distribution.



Non-parametric tests do not require the assumption of
normality.



Most non-parametric tests do not require an interval level
of measurement; can be used with nominal/ordinal level
data.

Assumptions







There are various assumptions for each test.
Before you select a test, be sure to check the assumptions of
each test.
You will need to contact a consultant, or review
statistical/research methods resources to find this information.
Some examples of common assumptions are:
 The dependent variable will need to be measured on a
certain level (i.e. Interval level).
 The independent variable(s) will need to be measured on a
certain level (i.e. Ordinal level).
 The population is normally distributed (not skewed).

If your data do not meet the assumptions for a specific test,
you may be able to use a non-parametric test instead.

Type of Data
Goal

Measurement
Normal
Population

Ordinal, or
NonNormal
Population
Median,
interquartil
e range
Wilcoxon test

Binomial
Survival Time
-Two
Possible
Outcomes
Proportion
Kaplan Meier
survival curve

Describe one
group

Mean, SD

Compare one
group to a
hypothetic
al value

One-sample t
test

Compare two
unpaired
groups

Unpaired t test

Mann-Whitney
test

Log-rank test or
Mantel-Haenszel*

Paired t test

Wilcoxon test

Fisher's test
(chisquare
for large
samples)
McNemar's
test

Compare two
paired
groups

Chi-square
test

Cox proportional
hazard
regression**

One-way
Compare
ANOVA
three or
more
unmatched
groups

Kruskal-Wallis
test

Chi-square
or
Binomial
test **

Conditional
proportional
hazards
regression*

Statistical Test Selection Group
Exercise

 Using

your tables, select the
appropriate statistical tests for 10
research scenarios.

During the group exercise…


Steps to choose the appropriate statistical method
for the data analysis:

1. Identify whether the research problem raises the
question of describe, relate (association), or compare
(difference).
2. Identify the levels of measurement in the research
question (Nominal/Categorical, Ordinal/Rank,
Continuous/Evenly spaced).
3. Identify the number of variables, or samples being
described, related, or compared.
4. Identify whether comparison samples are related
(analyze same group before and after) or independent
(not at all related, looking at different groups).
5. Choose the appropriate statistical tool for the data and
situation using the decision tree in the handout.



1. A pilot experiment designed to test the effectiveness of a new
approach to electrode placement for Electro Shock Therapy
(ECT) has been conducted over a one year time period in the
Fraser Health Authority.
Patients from two different mood disorder clinics participated in
this study. Patients from Clinic X received ECT therapy
according to current practice guidelines. Patients from Clinic Y
received a new exploratory ECT treatment. Patients in each
clinic were matched for age, gender, and type of disorder. A
random sample of 30 matched pairs of patients were selected
for inclusion in the study. At end of one year, patients were
administered a memory test yielding a total score out of 100.
Dr. Vasdil would like to know what statistical procedure needs
to be selected to test for differences among groups of patients
on the memory test.
What is the question: Compare
How many samples: 2
Related or independent: Independent
What is the level of measurement: Continuous
How many dependent variables: 1
Test: T-test

Sample Size


There are several rules of thumb for determining
sample size.






1) It’s a good idea to have a minimum of 30 cases (as a total group, or if comparing
groups, 30 for each group).
If you have less you can use a non-parametric test, but it is still better to have close
to 30 cases.
2) If using regression, it is best to have between 10-50 cases per independent
variable.
3) If you are validating a survey, it is never good to have more questions than cases.
4) If the total population that you are examining is less than 30. Use all of them.
5) For pilot studies the recommendation is a sample size of 12 per group
6) For surveys, a sample size of 400 per group can do just about anything.



7) For surveys, a 30% response rate is the bare minimum.



Note: For a precise sample size estimate you will need to conduct a power analysis.






Statistical Power


Power is the capability of a statistical test to
correctly detect a significant effect if it exists.
 Assumes value between 0 and 1 (%)



Power= 1-B (B= probability of a Type II error).
Type II error – the error of not rejecting a false
research finding.
 Type I error- the error of rejecting a correct
research finding.

Types of Power
 A Priori-

Conducted before study
commences (at proposal stage).
 Post Hoc- After study has been completed.
 Easy




way to increase power?

Increase sample size
Increase Effect size

Components Involved in Power
Calculation



Sample Size- Number of cases.
Effect Size –Magnitude of the trend and
variation.
 Alpha Level- Odds of concluding that the
presence of an effect is due to chance alone (.05
or .01).





Also known as Type I Error, or the error of rejecting a
correct research finding

Power level- 80-90% common
One or two-tailed test- two tailed is common.

Components Involved in Power
Calculation
 Sample

Size- What we want to find out.
 Effect Size –Magnitude of the trend…but
what if you don’t know?



Look to pilot data or literature.
Keep in mind, the smaller the effect size, the
larger the sample size required.

 Alpha

Level- .05
 Power level- 80-90%

Important Consultation Information
 What

is your research question?
 Components of power calculation
 Levels of data (nominal, ordinal,
continuous)
 Sampling plan

Data Organization: Codebook
 What

is a codebook?
 A codebook is a log of your variables
(and levels of data) and how you will
code them.
 A codebook will help everyone
understand the coding schemes to
ensure that they are on the same page!

Data Processing and Analyses:


FH Health Research Intelligence Unit
How can we help?

Epidemiologist
- Specifying the research goal, objectives and hypothesis.
- Identifying measurable outcomes.
- Specifying the variables for analysis.
- Identifying sources of data.
- Developing data collection tools for quantitative or qualitative studies.
- Developing the statistical analysis plan.
- Understanding how to use statistical software, such as SPSS.

Workshop Outline
- Research 101: basic research steps
- Research question refinement
- Common study designs (resource)
- Levels of data
- Power and sample size
- Statistical test selection (exercise)
- Data reporting (resource)
- Simple stats with Excel (resource)

Pharmacy Residency Project
1) Develop a research question.
2) Conduct a thorough literature review.
3) Refine the research question or hypothesis.
4) Design the research methodology/study.
5) Create a research proposal.
6) Apply for funding.
7) Apply for ethics approval.
8) Collect and analyze data.
9) Draw conclusions and relate your findings.

Research Question Refinement
The research question should describe, in operational terms, what you expect to happen in the study.

Good Versus Bad Research Question
- Bad (too vague): Are patients who take drug X more likely to experience episodes of delirium?
- Good (operational): Do patients who received medication X between September 2008 and November 2008 experience more episodes of delirium compared with patients who received drug Y during the same time period?

Classification of Research Studies
Research studies are either observational or experimental:
- Observational studies
  - Descriptive studies: focus on describing populations and the relationships between variables.
  - Analytic studies: make inferences about the population based on a random sample.
- Experimental studies: test relationships between exposures and outcomes; the investigator has direct control over the study conditions and exposure status.

Hierarchy of Studies
The type of study is selected according to the purpose of the research.
Levels of evidence (highest to lowest):
- Experimental studies
- Analytic studies
- Descriptive studies

Handout: Research Design Hierarchy

Probability Sampling Methods: Random
There are several methods to choose from:
- Simple random sampling.
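Simple random sampling can be sketched in a few lines of Python. This is a minimal illustration rather than part of the workshop materials: the sampling frame (patient IDs 1 to 200) and the helper name `simple_random_sample` are hypothetical. It relies on the standard library's `random.Random.sample`, which draws without replacement so that every unit has the same chance of selection; seeding the generator makes the draw reproducible for later checking.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n units without replacement; each unit has an equal chance of selection."""
    rng = random.Random(seed)  # a seeded generator lets you reproduce the same draw later
    return rng.sample(frame, n)


# Hypothetical sampling frame: patient IDs numbered 1 to 200.
frame = list(range(1, 201))
sample = simple_random_sample(frame, 40, seed=2008)
print(sorted(sample))
```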

Probability Sampling Methods: Stratified
- Stratified sampling: divide the population into non-overlapping strata and sample from within each stratum independently.
- Guarantees representation of all important groups.
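The two steps of stratified sampling (split into non-overlapping strata, then sample each stratum independently) are sketched below in Python. The frame of patients at three sites, the helper name `stratified_sample` and the choice of an equal number of units per stratum are all assumptions for illustration; proportional allocation (sampling each stratum in proportion to its size) is an equally common alternative.

```python
import random


def stratified_sample(units, stratum_of, n_per_stratum, seed=None):
    """Group units into non-overlapping strata, then sample each stratum independently."""
    rng = random.Random(seed)
    strata = {}
    for unit in units:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = {}
    for name, members in strata.items():
        k = min(n_per_stratum, len(members))  # a stratum cannot supply more units than it holds
        sample[name] = rng.sample(members, k)
    return sample


# Hypothetical frame: (patient_id, site) pairs, with 50, 30 and 20 patients at sites A, B and C.
sites = ["A"] * 50 + ["B"] * 30 + ["C"] * 20
patients = [(pid, site) for pid, site in enumerate(sites, start=1)]
result = stratified_sample(patients, stratum_of=lambda p: p[1], n_per_stratum=10, seed=1)
for site, chosen in sorted(result.items()):
    print(site, len(chosen))  # every site is represented
```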

Probability Sampling Methods: Systematic
Systematic random sampling selects the sample using a fixed interval "k", so that every kth unit in the sampling frame is selected.

Probability Sampling Methods: Systematic
Steps to achieve a systematic random sample:
1. Number the units in the population from 1 to N.
2. Decide on the sample size (n) that you want or need. The interval size is k = N/n.
3. Randomly select an integer between 1 and k.
4. Then take every kth unit.

Example:
1. N = 200.
2. n = 40, so k = N/n = 200/40 = 5 (interval size).
3. Randomly select a number between 1 and 5 (say, 4).
4. Begin with unit 4 and take every 5th unit (4, 9, 14, ..., 199).
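The systematic procedure translates directly into code. This is a minimal Python sketch, assuming units are numbered 1 to N as in step 1; the function name `systematic_sample` is hypothetical. When N is not a whole multiple of n, the sketch rounds k down and trims the result to n units, which is one of several reasonable conventions.

```python
import random


def systematic_sample(N, n, start=None, seed=None):
    """Select every kth unit (units numbered 1..N) after a random start between 1 and k."""
    k = N // n  # interval size k = N/n, rounded down if N is not a multiple of n
    if start is None:
        start = random.Random(seed).randint(1, k)  # randint includes both endpoints
    if not 1 <= start <= k:
        raise ValueError("start must be between 1 and k")
    return list(range(start, N + 1, k))[:n]  # trim in case rounding left extra units


# The slide's example: N = 200, n = 40, k = 5, starting at unit 4.
units = systematic_sample(200, 40, start=4)
print(units[:5], "...", units[-1])
```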

Probability Sampling Methods: Cluster
- Cluster sampling:
  - Divide the population into clusters and randomly sample clusters.
  - Measure all units within the sampled clusters.
- Example (map figure): the sampled clusters are shown as blue areas on a map. Clusters need not be geographic areas; you could select hospitals, schools, etc.
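In cluster sampling the random draw happens at the cluster level, and every unit inside a chosen cluster is kept. The Python sketch below illustrates this with hypothetical hospitals as clusters and patient IDs as units; the data and the function name `cluster_sample` are assumptions for illustration only.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters, then keep every unit inside each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # sort names so a given seed is reproducible
    return {name: list(clusters[name]) for name in chosen}


# Hypothetical frame: hospitals are the clusters, patient IDs are the units.
hospitals = {
    "Hospital 1": [101, 102, 103],
    "Hospital 2": [201, 202],
    "Hospital 3": [301, 302, 303, 304],
    "Hospital 4": [401],
}
sample = cluster_sample(hospitals, n_clusters=2, seed=3)
for name, patients in sample.items():
    print(name, patients)
```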


Non-Probability Sampling Methods
There are different types of non-probability sampling methods as well:
- Convenience sampling (not representative of the population).
- Purposive sampling (a certain group in mind).
- Expert sampling (seek out specific expertise).
- Snowball sampling (ask people to participate; they ask more people).

If you select non-probability sampling methods, the conclusions drawn from the study results apply only to the specific group you sampled.

Measurement: Levels of Data
The level of data will dictate which statistical test you should use.
- Categorical (nominal): data classified into categories that cannot be arranged in any particular order (e.g., apples and pears, gender, eye colour, ethnicity).
- Ordinal: data that are ordered, but the distances between intervals are not necessarily equal (e.g., low, middle and high income).
- Continuous: equal distances between each interval (e.g., 1, 2, 3; age).

Statistics and Statistical Test Selection

Types of Statistics
- Descriptive statistics: describe research findings.
  - E.g., frequencies, averages.
- Inferential statistics: make inferences about the population based on a random sample.
  - In a random sample, each person/unit has an equal chance of being selected.
  - This allows generalizability to the population.

Types of Variables
- Variables can be classified as independent or dependent.
- An independent variable is a variable you believe will influence your outcome measure.
- A dependent variable is a variable that depends on, or is influenced by, the independent variable(s). The dependent variable can also be the variable you are trying to predict.

Statistical Test Selection
Selecting the appropriate statistical test requires several steps. Base your choice on:
1) What is your goal? Description? Comparison? Prediction? Quantifying an association? Demonstrating effectiveness? Establishing causality?
2) What kind of data have you collected? What are the levels of data (nominal, ordinal, continuous)? Was your sample randomly selected?
3) Are your data normally distributed? Should you use a parametric or a non-parametric test?
4) What are the assumptions of the statistical test you would like to use? Do your data meet these assumptions?

Parametric Tests
- Parametric tests assume that the variable in question comes from a normally distributed population.
- Non-parametric tests do not require the assumption of normality.
- Most non-parametric tests do not require an interval level of measurement; they can be used with nominal or ordinal data.

Assumptions
- Each test has its own assumptions. Before you select a test, be sure to check them.
- To find this information, contact a consultant or review statistical/research methods resources.
- Some examples of common assumptions are:
  - The dependent variable must be measured at a certain level (e.g., interval level).
  - The independent variable(s) must be measured at a certain level (e.g., ordinal level).
  - The population is normally distributed (not skewed).

If your data do not meet the assumptions for a specific test, you may be able to use a non-parametric test instead.

Type of Data
Goal | Measurement (normal population) | Ordinal, or non-normal population | Binomial (two possible outcomes) | Survival time
Describe one group | Mean, SD | Median, interquartile range | Proportion | Kaplan-Meier survival curve
Compare one group to a hypothetical value | One-sample t test | Wilcoxon test | Chi-square or binomial test** | -
Compare two unpaired groups | Unpaired t test | Mann-Whitney test | Fisher's test (chi-square for large samples) | Log-rank test or Mantel-Haenszel*
Compare two paired groups | Paired t test | Wilcoxon test | McNemar's test | Conditional proportional hazards regression*
Compare three or more unmatched groups | One-way ANOVA | Kruskal-Wallis test | Chi-square test | Cox proportional hazard regression**
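The table can also be expressed as a lookup from goal and type of data to a suggested method. This Python sketch only transcribes the rows shown above (footnote markers omitted); the data-type keys and the function name `choose_test` are hypothetical. It is no substitute for checking each test's assumptions, and combinations the table leaves blank raise an error rather than guess.

```python
# Lookup transcribed from the test-selection table (footnote markers * and ** omitted).
TEST_TABLE = {
    "describe one group": {
        "normal": "Mean, SD",
        "ordinal/non-normal": "Median, interquartile range",
        "binomial": "Proportion",
        "survival": "Kaplan-Meier survival curve",
    },
    "compare one group to a hypothetical value": {
        "normal": "One-sample t test",
        "ordinal/non-normal": "Wilcoxon test",
        "binomial": "Chi-square or binomial test",
    },
    "compare two unpaired groups": {
        "normal": "Unpaired t test",
        "ordinal/non-normal": "Mann-Whitney test",
        "binomial": "Fisher's test (chi-square for large samples)",
        "survival": "Log-rank test or Mantel-Haenszel",
    },
    "compare two paired groups": {
        "normal": "Paired t test",
        "ordinal/non-normal": "Wilcoxon test",
        "binomial": "McNemar's test",
        "survival": "Conditional proportional hazards regression",
    },
    "compare three or more unmatched groups": {
        "normal": "One-way ANOVA",
        "ordinal/non-normal": "Kruskal-Wallis test",
        "binomial": "Chi-square test",
        "survival": "Cox proportional hazard regression",
    },
}


def choose_test(goal, data_type):
    """Return the table's suggested method; raise if the table has no entry."""
    try:
        return TEST_TABLE[goal.lower()][data_type]
    except KeyError:
        raise ValueError(f"no entry for goal={goal!r}, data_type={data_type!r}") from None


print(choose_test("Compare two unpaired groups", "normal"))
```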

Statistical Test Selection Group Exercise
- Using your tables, select the appropriate statistical tests for 10 research scenarios.

During the group exercise…
Steps to choose the appropriate statistical method for the data analysis:
1. Identify whether the research problem raises a question of description, relation (association), or comparison (difference).
2. Identify the levels of measurement in the research question (nominal/categorical, ordinal/rank, continuous/evenly spaced).
3. Identify the number of variables or samples being described, related, or compared.
4. Identify whether the comparison samples are related (the same group analyzed before and after) or independent (unrelated, different groups).
5. Choose the appropriate statistical tool for the data and situation using the decision tree in the handout.



1. A pilot experiment designed to test the effectiveness of a new approach to electrode placement for electroconvulsive therapy (ECT) has been conducted over a one-year period in the Fraser Health Authority.
Patients from two different mood disorder clinics participated in this study. Patients from Clinic X received ECT according to current practice guidelines; patients from Clinic Y received a new exploratory ECT treatment. Patients in each clinic were matched for age, gender and type of disorder, and a random sample of 30 matched pairs of patients was selected for inclusion in the study. At the end of one year, patients were given a memory test yielding a total score out of 100.
Dr. Vasdil would like to know which statistical procedure to use to test for differences between the two groups of patients on the memory test.
- What is the question: Compare
- How many samples: 2
- Related or independent: Independent
- What is the level of measurement: Continuous
- How many dependent variables: 1
- Test: t-test

Sample Size
There are several rules of thumb for determining sample size:
1) It’s a good idea to have a minimum of 30 cases (as a total group, or, if comparing groups, 30 for each group). If you have fewer, you can use a non-parametric test, but it is still better to have close to 30 cases.
2) If using regression, it is best to have 10-50 cases per independent variable.
3) If you are validating a survey, never have more questions than cases.
4) If the total population that you are examining has fewer than 30 members, use all of them.
5) For pilot studies, the recommended sample size is 12 per group.
6) For surveys, a sample size of 400 per group is sufficient for most purposes.
7) For surveys, a 30% response rate is the bare minimum.

Note: For a precise sample size estimate, you will need to conduct a power analysis.
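The slide does not explain where the 400-per-group and 30% figures come from. One common rationale, offered here as an assumption rather than the authors' reasoning, is that with n = 400 the 95% margin of error for a proportion is at most about ±5 percentage points. The Python sketch below computes that margin with a normal approximation and, using hypothetical helper names, how many surveys to send so that a 30% response rate still yields 400 responses.

```python
import math
from statistics import NormalDist


def margin_of_error(n, p=0.5, confidence=0.95):
    """Half-width of a normal-approximation confidence interval for a proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95% confidence
    return z * math.sqrt(p * (1 - p) / n)  # p = 0.5 gives the widest (worst-case) interval


def invitations_needed(target_responses, response_rate_percent):
    """Surveys to send so the expected number of responses reaches the target."""
    numerator = target_responses * 100
    # Integer ceiling division avoids floating-point rounding surprises.
    return (numerator + response_rate_percent - 1) // response_rate_percent


print(round(margin_of_error(400), 3))  # 0.049, i.e. about +/- 5 percentage points
print(invitations_needed(400, 30))     # 1334
```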






Statistical Power
- Power is the probability that a statistical test will correctly detect an effect when one truly exists.
  - It takes a value between 0 and 1 (often expressed as a percentage).
- Power = 1 - β (β = probability of a Type II error).
  - Type II error: failing to reject a false null hypothesis (missing an effect that really exists).
  - Type I error: rejecting a true null hypothesis (concluding there is an effect when there is none).

Types of Power
- A priori: conducted before the study commences (at the proposal stage).
- Post hoc: conducted after the study has been completed.
- Easy ways to increase power?
  - Increase the sample size.
  - Increase the effect size.
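The effect of sample size and effect size on power can be seen numerically. The Python sketch below is a rough illustration under stated assumptions: a two-sided comparison of two group means, equal group sizes, a standardized effect size d, and the normal approximation in place of the t distribution. The function name `approx_power` is hypothetical, and dedicated power software will give slightly different numbers.

```python
import math
from statistics import NormalDist


def approx_power(n_per_group, d, alpha=0.05):
    """Approximate power of a two-sided, two-sample comparison of means.

    Uses the normal approximation; d is the standardized effect size
    (difference in means divided by the standard deviation). The tiny chance
    of rejecting in the wrong direction is ignored.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    return z.cdf(d * math.sqrt(n_per_group / 2) - z_crit)


# Power rises as either the sample size or the effect size grows.
for n in (20, 40, 63, 100):
    print(n, round(approx_power(n, d=0.5), 2), round(approx_power(n, d=0.8), 2))
```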

Components Involved in Power Calculation
- Sample size: the number of cases.
- Effect size: the magnitude of the trend relative to its variation.
- Alpha level: the probability of concluding that an effect is present when the observed result is due to chance alone (usually .05 or .01).
  - Also known as the Type I error rate (the probability of rejecting a true null hypothesis).
- Power level: 80-90% is common.
- One- or two-tailed test: two-tailed is common.

Components Involved in Power Calculation
- Sample size: what we want to find out.
- Effect size: the magnitude of the trend... but what if you don’t know it?
  - Look to pilot data or the literature.
  - Keep in mind that the smaller the effect size, the larger the sample size required.
- Alpha level: .05
- Power level: 80-90%
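With alpha, power and effect size fixed, the sample size can be solved for. This sketch assumes the textbook normal-approximation formula for two equal-sized groups compared on a mean, n = 2(z(1 - α/2) + z(1 - β))² / d², where d is the standardized effect size; the function name `n_per_group` is hypothetical. Power software based on the t distribution returns slightly larger numbers, so treat the result as a planning estimate to discuss with a consultant. It also shows the point above: halving the effect size roughly quadruples the required sample.

```python
import math
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate cases per group for a two-sided, two-sample comparison of means."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = .05, two-tailed
    z_beta = z.inv_cdf(power)           # about 0.84 for 80% power
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)  # round up to whole cases


# Conventional "large", "medium" and "small" standardized effect sizes.
for d in (0.8, 0.5, 0.2):
    print(d, n_per_group(d))  # 25, 63 and 393 cases per group
```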

Important Consultation Information
- What is your research question?
- Components of the power calculation
- Levels of data (nominal, ordinal, continuous)
- Sampling plan

Data Organization: Codebook
- What is a codebook?
  - A codebook is a log of your variables (and their levels of data) and how you will code them.
  - A codebook helps everyone understand the coding schemes, so that the whole team is on the same page!

Data Processing and Analyses: Codebook Example
Variable name | Variable label | Values | Coding | Missing | Variable type
age | age | 1,2,3,4,5 | 1=10-20 years, 2=21-30 years, 3=31-40 years, 4=41-50 years, 5=51+ years | 97=Incorrect response, 98=No response, 99=Not applicable | Ordinal
sex | sex | 1,2 | 1=male, 2=female | 97=Incorrect response, 98=No response, 99=Not applicable | Nominal
happiness | happiness at work | 1,2,3 | 1=not happy, 2=somewhat happy, 3=very happy | 97=Incorrect response, 98=No response, 99=Not applicable | Ordinal

Spreadsheet Example
ID# | Age | Sex | Happiness
1 | 1 | 1 | 2
2 | 2 | 2 | 2
3 | 3 | 1 | 2
4 | 57 | 2 | 2
5 | 45 | 2 | 3
6 | 66 | 2 | 3
7 | 2 | 2 | 3
8 | 88 | 2 | 3
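A codebook also makes data entry checkable. The Python sketch below is an illustration only: it transcribes the example codebook and spreadsheet rows into hypothetical names (`CODEBOOK`, `MISSING_CODES`, `ROWS`) and lists every entry that is neither a defined code nor one of the missing-value codes 97-99. Run on the example rows, it flags the age entries 57, 45, 66 and 88, which fall outside the codebook's age codes 1-5. Mismatches like these are worth resolving before analysis.

```python
# Example codebook: valid codes for each variable, plus the shared missing-value codes.
MISSING_CODES = {97: "Incorrect response", 98: "No response", 99: "Not applicable"}
CODEBOOK = {
    "age": {1: "10-20 years", 2: "21-30 years", 3: "31-40 years", 4: "41-50 years", 5: "51+ years"},
    "sex": {1: "male", 2: "female"},
    "happiness": {1: "not happy", 2: "somewhat happy", 3: "very happy"},
}

# Spreadsheet example rows: (ID, age, sex, happiness).
ROWS = [
    (1, 1, 1, 2), (2, 2, 2, 2), (3, 3, 1, 2), (4, 57, 2, 2),
    (5, 45, 2, 3), (6, 66, 2, 3), (7, 2, 2, 3), (8, 88, 2, 3),
]
VARIABLES = ("age", "sex", "happiness")


def invalid_entries(rows):
    """Return (ID, variable, value) for values that are neither valid codes nor missing codes."""
    problems = []
    for row_id, *values in rows:
        for variable, value in zip(VARIABLES, values):
            if value not in CODEBOOK[variable] and value not in MISSING_CODES:
                problems.append((row_id, variable, value))
    return problems


for problem in invalid_entries(ROWS):
    print(problem)
```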

Data Analysis with Excel
- Most simple analyses can be done using Excel, including correlation, regression and even random number generation.
- Install the Analysis ToolPak.
  - Go to Tools > Add-Ins and add the Analysis ToolPak.
- Create your worksheet and codebook.
- Choose a statistical test.
  - Follow the commands in the Help menu.

Data Analysis with Excel
Resource: http://home.ubalt.edu/ntsbarsh/excel/excel.htm

Data Reporting and Presentation of Data
- Graphical summaries are a great way to present your data.
- Excel is great for creating tables and graphs.
- The type of data you have determines the type of graphical summary you should use.

Data Reporting and Presentation of Descriptive Data
Categorical data: frequency tables and bar charts.
Example: Fruit
Fruit | Count | Percent | Valid Percent
Pineapples | 4 | 20% | 21%
Apples | 5 | 25% | 26%
Oranges | 10 | 50% | 53%
Unknown | 1 | 5% | -
Total | 20 | 100% | 100%
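The difference between "Percent" and "Valid Percent" is the denominator: percent divides by all 20 responses, whereas valid percent excludes the missing ("Unknown") response and divides by 19 (for example, 4/19 ≈ 21% for pineapples). A minimal Python sketch that reproduces the fruit table, using a hypothetical helper `frequency_table`:

```python
from collections import Counter


def frequency_table(values, missing=("Unknown",)):
    """Return (category, count, percent, valid percent) rows; valid percent excludes missing values."""
    counts = Counter(values)  # keeps categories in order of first appearance
    total = len(values)
    valid_total = total - sum(counts[m] for m in missing)
    rows = []
    for category, count in counts.items():
        percent = round(100 * count / total)
        valid = None if category in missing else round(100 * count / valid_total)
        rows.append((category, count, percent, valid))
    return rows


fruit = ["Pineapples"] * 4 + ["Apples"] * 5 + ["Oranges"] * 10 + ["Unknown"]
for row in frequency_table(fruit):
    print(row)
```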

Data Reporting and Presentation of Descriptive Data
Figure: "Fruit Study" bar chart of counts for Pineapples, Apples, Oranges and Unknown (count axis 0-10).

Data Reporting and Presentation of Descriptive Data
Continuous data: tables and histograms.
Age | Count | Percent
20-30 | 4 | 20%
31-40 | 5 | 25%
41-50 | 10 | 50%
51-60 | 1 | 5%
Total | 20 | 100%

Data Reporting and Presentation of Descriptive Data
Figure: histogram of age, with bars for the 20-30, 31-40, 41-50 and 51-60 age groups (count axis 0-10).

What is the difference between a Histogram and a Bar Chart?
- Histogram: for continuous data, where the data are divided into contiguous class intervals (in other words, intervals connected in an unbroken sequence).
- Bar chart: for categorical data, where the categories are not contiguous.
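Histogram bins must be contiguous: each class interval ends exactly where the next begins. The Python sketch below uses half-open intervals, with edges chosen so that whole-number ages fall into the slide's 20-30, 31-40, 41-50 and 51-60 groups. The helper name `bin_counts` and the ages are hypothetical.

```python
def bin_counts(values, edges):
    """Count values into contiguous intervals [edges[i], edges[i + 1]).

    Each interval ends exactly where the next begins, so no value can fall
    between bars; values outside the first and last edge are not counted.
    """
    counts = [0] * (len(edges) - 1)
    for value in values:
        for i in range(len(edges) - 1):
            if edges[i] <= value < edges[i + 1]:
                counts[i] += 1
                break
    return counts


# Edges chosen so whole-number ages fall into 20-30, 31-40, 41-50 and 51-60.
AGE_EDGES = [20, 31, 41, 51, 61]
hypothetical_ages = [20, 30, 31, 45, 50, 51, 60]
print(bin_counts(hypothetical_ages, AGE_EDGES))  # [2, 1, 2, 2]
```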

Measures of Central Tendency
- Reporting averages:
  - Categorical data = mode
  - Ordinal data = median
  - Continuous data = mean
- If there are outliers (or extreme values), report the median instead of the mean.
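Python's standard `statistics` module illustrates why the median is preferred when there are outliers, and which average suits each level of data. All data values below are hypothetical.

```python
from statistics import mean, median, mode

# Hypothetical lengths of stay (days), including one extreme value.
length_of_stay = [2, 3, 3, 4, 100]
print(mean(length_of_stay))    # 22.4, pulled upward by the outlier
print(median(length_of_stay))  # 3, unaffected by the outlier

# Ordinal data: hypothetical happiness codes (1 = not happy ... 3 = very happy).
happiness = [1, 2, 2, 3, 3]
print(median(happiness))       # 2

# Categorical data: only the mode is meaningful.
eye_colour = ["brown", "blue", "brown", "green"]
print(mode(eye_colour))        # brown
```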

Reporting Inferential Stats
- Handout
- Resource: APA guidelines
  http://www.ilstu.edu/~jhkahn/apastats.html

Reporting Inferential Stats
- It’s important to include means, standard deviations and sample sizes in your results section.
- Example: correlation
  - Variable X was strongly correlated with Variable Y, r = .59, p < .01.
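Excel's Analysis ToolPak or a statistics package would normally give you r and its p value. As a hedged illustration, the sketch below computes Pearson's r directly and formats a result in the style of the slide's example, dropping the leading zero as APA style does for statistics that cannot exceed 1. The threshold-style p reporting (p < .01) mirrors the slide and is an assumption; current APA guidance generally prefers exact p values. The p value is passed in rather than computed, and the function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of the same length, with at least 2 values")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def drop_leading_zero(text):
    """APA style omits the leading zero for values that cannot exceed 1, such as r and p."""
    return text.replace("0.", ".", 1)


def format_correlation(r, p):
    """Format r and p in the slide's style, e.g. 'r = .59, p < .01'."""
    for cutoff in (0.001, 0.01, 0.05):
        if p < cutoff:
            p_text = "p < " + drop_leading_zero(f"{cutoff:g}")
            break
    else:  # not below any cutoff: report the value itself
        p_text = "p = " + drop_leading_zero(f"{p:.2f}")
    return "r = " + drop_leading_zero(f"{r:.2f}") + ", " + p_text


r = pearson_r([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])  # hypothetical paired scores
print(round(r, 2))                        # 0.8
print(format_correlation(0.59, 0.004))    # r = .59, p < .01
```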

Important to Keep Your Audience in Mind
- Residency project
- Publication
- Departmental report

Aaron: TCPS certification for
residents reminder…

Questions?



Reporting Inferential Stats
 Handout


Resource- APA Guidelines

http://www.ilstu.edu/~jhkahn/apastats.html

Reporting Inferential Stats
 It’s

important to include means, standard
deviations and sample size in your results
section.

 Example:

Correlation
 Variable X was strongly correlated with
Variable Y, r=.59, p<.01.

Important to Keep your
Audience in Mind
Residency Project
Publication
Departmental Report

Aaron: TCPS certification for
residents reminder…

Questions?


Slide 8

Surviving Pharmacy
Residency Research: Tips
and Tricks for Statistical
Planning

Surviving Pharmacy Residency
Research: Tips and Tricks for
Statistical Planning
© Fraser Health Authority, 2007
The Fraser Health Authority (“FH”) authorizes the use, reproduction and/or
modification of this publication for purposes other than commercial redistribution. In
consideration for this authorization, the user agrees that any unmodified
reproduction of this publication shall retain all copyright and proprietary notices. If
the user modifies the content of this publication, all FH copyright notices shall be
removed, however FH shall be acknowledged as the author of the source
publication.
Reproduction or storage of this publication in any form by any means for the purpose
of commercial redistribution is strictly prohibited.
This publication is intended to provide general information only, and should not be
relied on as providing specific healthcare, legal or other professional advice. The
Fraser Health Authority, and every person involved in the creation of this publication,
disclaims any warranty, express or implied, as to its accuracy, completeness or
currency, and disclaims all liability in respect of any actions, including the results of
any actions, taken or not taken in reliance on the information contained herein.

FH Health Research Intelligence
Unit
How can we help?


Grant Facilitator-Writer
 Conducting a search for funding opportunities.
 Automatic notification of new funding sources and
deadlines.
 Identifying a research team.
 Preparing letters of intent.
 Identifying resources required for conducting research.
 Formulating the research budget.
 Writing the grant application in collaboration with
researchers.
 Understanding FH and funding agency requirements
regarding preparation of specific documents.

FH Health Research Intelligence
Unit
How can we help?











Epidemiologist
Specifying the research goal,
objectives and hypothesis.
Identifying measurable
outcomes.
Specifying the variables for
analysis.
Identifying sources of data.
Developing data collection
tools for quantitative or
qualitative studies.
Developing the statistical
analysis plan.
Understanding how to use
statistical software, such as
SPSS.

Workshop Outline










Research 101- Basic Research Steps
Research Question Refinement
Common Study Designs- Resource
Levels of Data
Power and Sample Size
Statistical Test Selection- Exercise
Data Reporting- Resource
Simple Stats with Excel- Resource

Pharmacy Residency Project










1) Develop a research question
2) Conduct thorough literature review
3) Re-define research question or hypothesis
4) Design research methodology/study
5) Create research proposal
6) Apply for funding
7) Apply for ethics approval
8) Collect and analyze data
9) Draw conclusions and relate findings

Research Question Refinement


Research question will describe in operational
terms, what you think will happen in the study.

Good Versus Bad Research
Question


Are patients who take
drug X more likely to
experience episodes
of delirium?



Do patients who
receive medication X
between September
2008 and November
2008 experience
more episodes of
delirium as compared
to patients who
received drug Y
during the same time
period?

Classification of Research
Studies
Observational Studies:
Descriptive Studies:
Focus on describing
populations and describing the
relationship between variables

Research Studies

Analytic Studies:
Observational
Descriptive

Experimental

Analytic

Make inferences about the
population based on a random
sample.
Experimental Studies:
Test relationships between
exposures and outcomes.
Investigator has direct control
over study condition and
exposure status.

Hierarchy of Studies
Type of study is selected
according to the purpose of
research.
Experimental Studies

Analytic Studies

Descriptive Studies

Levels of Evidence


HandoutResearch Design
Hierarchy

Probability Sampling Methods:
Random
 There

are several methods to choose

from:
 Simple random
sampling.

Probability Sampling Methods:
Stratified
 Stratified

sampling

(divide the population into
non-overlapping strata and
sample from within each
stratum independently).


Guarantees representation
of all important groups.

Probability Sampling Methods:
Systematic


Selection of the
sample using an
interval “k” so that
every “k” unit in the
frame is selected,
is called systematic
random sampling.

Probability Sampling Methods:
Systematic


Steps to achieve a systematic random sample:

1. Number the units in the population from 1 to N.
2. Decide on the n (sample size) that you want or need.


k = N/n = the interval size.

3. Randomly select an integer between 1 and k.
4. Then take every kth unit.
Example:
1. N=200
2. n=40, take N/n, 200/40=5 (interval size).
3. Randomly select a number between 1 and 5 (let’s pick 4).
4. Begin with 4, and take every 5th unit.

Probability Sampling Methods:
Cluster
 Cluster

sampling.
 Divide population into clusters and
randomly sample clusters.
 Measure all units within sampled clusters.
 Example: See blue areas on map.
Not just geographic areas,
could select hospitals,
schools etc.


Non-Probability Sampling
Methods


There are different types of non-probability
sampling methods as well:








Convenience (not representative of population).
Purposive (certain group in mind).
Expert sampling (seek out specific expertise).
Snowball sampling (ask people to participate, they
ask more people).

If you select non-probability sampling methods, the
conclusions drawn from the study results apply only
to that specific population.

Measurement: Levels of Data



The level of data will dictate which statistical test you
should use.
Categorical = Data that is classified into categories and
cannot be arranged in any particular order (e.g. Apples
and pears, gender, eye colour, ethnicity).



Ordinal = Data ordered, but distance between intervals
not always equal. (e.g. Low, middle and high income).



Continuous = equal distance between each interval
(e.g. 1,2,3., age).

Statistics and Statistical Test
Selection

Types of Statistics
 Descriptive

Statistics: Describes
 research findings


E.g. Frequencies, averages.

 Inferential

Statistics: Makes inferences
about the population, based on a random
sample.




In a random sample, each person/unit has an
equal chance of being selected
Allows generalizability to population.

Types of Variables


Variables can be classified as independent or
dependent.



An independent variable is the variable you
believe will influence your outcome measure.



A dependent variable is the variable that is
dependant on or influenced by independent
variable(s). The dependent variable can also
be the variable you are trying to predict.

Statistical Test Selection
Selecting the appropriate Statistical test requires
several steps:
Test selection should be based on:
1) What is your goal? Description? Comparison? Prediction? Quantify
association? Prove effectiveness? Prove causality?
2) What kind of data have you collected? What are the levels of data
(Nominal, ordinal, continuous)? Was your sample randomly selected?
3) Is your data normally distributed? Should you use a parametric or nonparametric test?
4) What are the assumptions of the statistical test you would like to
use? Does the data meet these assumptions?

Parametric Tests


Parametric tests assume that the variable in question is
from a normal distribution.



Non-parametric tests do not require the assumption of
normality.



Most non-parametric tests do not require an interval level
of measurement; can be used with nominal/ordinal level
data.

Assumptions







There are various assumptions for each test.
Before you select a test, be sure to check the assumptions of
each test.
You will need to contact a consultant, or review
statistical/research methods resources to find this information.
Some examples of common assumptions are:
 The dependent variable will need to be measured on a
certain level (i.e. Interval level).
 The independent variable(s) will need to be measured on a
certain level (i.e. Ordinal level).
 The population is normally distributed (not skewed).

If your data do not meet the assumptions for a specific test,
you may be able to use a non-parametric test instead.

Type of Data
Goal

Measurement
Normal
Population

Ordinal, or
NonNormal
Population
Median,
interquartil
e range
Wilcoxon test

Binomial
Survival Time
-Two
Possible
Outcomes
Proportion
Kaplan Meier
survival curve

Describe one
group

Mean, SD

Compare one
group to a
hypothetic
al value

One-sample t
test

Compare two
unpaired
groups

Unpaired t test

Mann-Whitney
test

Log-rank test or
Mantel-Haenszel*

Paired t test

Wilcoxon test

Fisher's test
(chisquare
for large
samples)
McNemar's
test

Compare two
paired
groups

Chi-square
test

Cox proportional
hazard
regression**

One-way
Compare
ANOVA
three or
more
unmatched
groups

Kruskal-Wallis
test

Chi-square
or
Binomial
test **

Conditional
proportional
hazards
regression*

Statistical Test Selection Group
Exercise

 Using

your tables, select the
appropriate statistical tests for 10
research scenarios.

During the group exercise…


Steps to choose the appropriate statistical method
for the data analysis:

1. Identify whether the research problem raises the
question of describe, relate (association), or compare
(difference).
2. Identify the levels of measurement in the research
question (Nominal/Categorical, Ordinal/Rank,
Continuous/Evenly spaced).
3. Identify the number of variables, or samples being
described, related, or compared.
4. Identify whether comparison samples are related
(analyze same group before and after) or independent
(not at all related, looking at different groups).
5. Choose the appropriate statistical tool for the data and
situation using the decision tree in the handout.



1. A pilot experiment designed to test the effectiveness of a new
approach to electrode placement for Electro Shock Therapy
(ECT) has been conducted over a one year time period in the
Fraser Health Authority.
Patients from two different mood disorder clinics participated in
this study. Patients from Clinic X received ECT therapy
according to current practice guidelines. Patients from Clinic Y
received a new exploratory ECT treatment. Patients in each
clinic were matched for age, gender, and type of disorder. A
random sample of 30 matched pairs of patients were selected
for inclusion in the study. At end of one year, patients were
administered a memory test yielding a total score out of 100.
Dr. Vasdil would like to know what statistical procedure needs
to be selected to test for differences among groups of patients
on the memory test.
What is the question: Compare
How many samples: 2
Related or independent: Independent
What is the level of measurement: Continuous
How many dependent variables: 1
Test: T-test

Sample Size


There are several rules of thumb for determining
sample size.






1) It’s a good idea to have a minimum of 30 cases (as a total group, or if comparing
groups, 30 for each group).
If you have less you can use a non-parametric test, but it is still better to have close
to 30 cases.
2) If using regression, it is best to have between 10-50 cases per independent
variable.
3) If you are validating a survey, it is never good to have more questions than cases.
4) If the total population that you are examining is less than 30. Use all of them.
5) For pilot studies the recommendation is a sample size of 12 per group
6) For surveys, a sample size of 400 per group can do just about anything.



7) For surveys, a 30% response rate is the bare minimum.



Note: For a precise sample size estimate you will need to conduct a power analysis.






Statistical Power


Power is the capability of a statistical test to
correctly detect a significant effect if it exists.
 Assumes value between 0 and 1 (%)



Power= 1-B (B= probability of a Type II error).
Type II error – the error of not rejecting a false
research finding.
 Type I error- the error of rejecting a correct
research finding.

Types of Power
 A Priori-

Conducted before study
commences (at proposal stage).
 Post Hoc- After study has been completed.
 Easy




way to increase power?

Increase sample size
Increase Effect size

Components Involved in Power
Calculation



Sample Size- Number of cases.
Effect Size –Magnitude of the trend and
variation.
 Alpha Level- Odds of concluding that the
presence of an effect is due to chance alone (.05
or .01).





Also known as Type I Error, or the error of rejecting a
correct research finding

Power level- 80-90% common
One or two-tailed test- two tailed is common.

Components Involved in Power
Calculation
 Sample

Size- What we want to find out.
 Effect Size –Magnitude of the trend…but
what if you don’t know?



Look to pilot data or literature.
Keep in mind, the smaller the effect size, the
larger the sample size required.

 Alpha

Level- .05
 Power level- 80-90%

Important Consultation Information
- What is your research question?
- Components of the power calculation
- Levels of data (nominal, ordinal, continuous)
- Sampling plan

Data Organization: Codebook
- What is a codebook?
  - A codebook is a log of your variables (and their levels of data) and how you will
    code them.
  - A codebook helps everyone understand the coding schemes, so that everyone is on
    the same page!

Data Processing and Analyses: Codebook Example

Variable name | Variable label | Values | Coding | Missing | Variable type
age | age | 1,2,3,4,5 | 1=10-20 years, 2=21-30 years, 3=31-40 years, 4=41-50 years, 5=51+ years | 97=Incorrect response, 98=No response, 99=Not applicable | Ordinal
sex | sex | 1,2 | 1=male, 2=female | 97=Incorrect response, 98=No response, 99=Not applicable | Nominal
happiness | happiness at work | 1,2,3 | 1=not happy, 2=somewhat happy, 3=very happy | 97=Incorrect response, 98=No response, 99=Not applicable | Ordinal
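
If the data will be analysed in a script rather than by hand, the same codebook can be written down as a data structure. The Python sketch below is one hypothetical way to do this. It stores the value labels from the codebook example, treats 97, 98 and 99 as missing-value codes, and decodes a single record. The names CODEBOOK, MISSING_CODES and decode are illustrative only.

```python
# Value labels from the codebook example; 97-99 are the shared missing-value codes.
CODEBOOK = {
    "age": {1: "10-20 years", 2: "21-30 years", 3: "31-40 years",
            4: "41-50 years", 5: "51+ years"},
    "sex": {1: "male", 2: "female"},
    "happiness": {1: "not happy", 2: "somewhat happy", 3: "very happy"},
}
MISSING_CODES = {97: "Incorrect response", 98: "No response", 99: "Not applicable"}

def decode(record):
    """Translate coded values to labels; missing-value codes become None."""
    decoded = {}
    for variable, code in record.items():
        if code in MISSING_CODES:
            decoded[variable] = None  # keep missing values out of the analysis
        else:
            decoded[variable] = CODEBOOK[variable][code]
    return decoded

print(decode({"age": 3, "sex": 2, "happiness": 98}))
# {'age': '31-40 years', 'sex': 'female', 'happiness': None}
```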

Spreadsheet Example

ID# | Age | Sex | Happiness
1 | 1 | 1 | 2
2 | 2 | 2 | 2
3 | 3 | 1 | 2
4 | 57 | 2 | 2
5 | 45 | 2 | 3
6 | 66 | 2 | 3
7 | 2 | 2 | 3
8 | 88 | 2 | 3
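
Once data are entered, it is worth checking every cell against the codebook. In the spreadsheet example, the ages 57, 45, 66 and 88 are not among the age codes 1 to 5 defined in the codebook; they look like raw ages. A check like the Python sketch below would flag those rows for review. The sketch is illustrative only: the function name is hypothetical, and the allowed values are copied from the codebook example.

```python
# Allowed codes per variable (from the codebook example), plus missing codes 97-99.
VALID_CODES = {
    "age": {1, 2, 3, 4, 5},
    "sex": {1, 2},
    "happiness": {1, 2, 3},
}
MISSING_CODES = {97, 98, 99}

# Rows of the spreadsheet example: (ID#, age, sex, happiness)
rows = [
    (1, 1, 1, 2), (2, 2, 2, 2), (3, 3, 1, 2), (4, 57, 2, 2),
    (5, 45, 2, 3), (6, 66, 2, 3), (7, 2, 2, 3), (8, 88, 2, 3),
]

def find_invalid(rows):
    """Return (ID, variable, value) for every value that is neither a valid nor a missing code."""
    problems = []
    for case_id, *values in rows:
        for variable, value in zip(("age", "sex", "happiness"), values):
            if value not in VALID_CODES[variable] and value not in MISSING_CODES:
                problems.append((case_id, variable, value))
    return problems

for problem in find_invalid(rows):
    print(problem)
```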

Data Analysis with Excel
- Most simple analyses can be done in Excel, including correlation, regression and even
  random number generation.
- Install the Analysis ToolPak add-in.
  - In older versions of Excel, go to Tools > Add-ins and tick "Analysis ToolPak"; in
    Excel 2010 and later, go to File > Options > Add-ins, choose Manage: Excel Add-ins,
    click Go and tick "Analysis ToolPak".
- Create your worksheet and codebook.
- Choose the statistical test.
  - Follow the commands in the Help menu.

Data Analysis with Excel
http://home.ubalt.edu/ntsbarsh/excel/excel.htm

Data Reporting and Presentation of Data
- Graphical summaries are a great way to present your data.
- Excel is great for creating tables and graphs.
- The type of data you have determines the type of graphical summary you should use.

Data Reporting and Presentation of Descriptive Data
- Categorical data: frequency tables and bar charts.
- Example: Fruit

Fruit | Count | Percent | Valid Percent
Pineapples | 4 | 20% | 21%
Apples | 5 | 25% | 26%
Oranges | 10 | 50% | 53%
Unknown | 1 | 5% | -
Total | 20 | 100% | 100%

Valid Percent excludes the one Unknown response, so it is calculated out of 19.
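
The Valid Percent column is each count divided by the number of non-missing answers. The Python sketch below rebuilds the fruit table from a list of 20 responses that match its counts, with None standing in for an Unknown answer. The function name is hypothetical, and percentages are rounded to whole numbers, as in the table.

```python
from collections import Counter

# 20 responses matching the fruit table; None marks an unknown answer
responses = ["Pineapples"] * 4 + ["Apples"] * 5 + ["Oranges"] * 10 + [None]

def frequency_table(values):
    """Return {category: (count, percent, valid_percent)} with whole-number percentages."""
    counts = Counter(values)
    total = len(values)
    valid_total = total - counts[None]  # valid percent ignores missing answers
    table = {}
    for category, count in counts.items():
        percent = round(100 * count / total)
        valid = None if category is None else round(100 * count / valid_total)
        table[category] = (count, percent, valid)
    return table

for category, row in frequency_table(responses).items():
    print(category if category is not None else "Unknown", *row)
```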

Data Reporting and Presentation of Descriptive Data
[Bar chart: "Fruit Study", counts (0 to 10) for Pineapples, Apples, Oranges and Unknown]

Data Reporting and Presentation of Descriptive Data
- Continuous data: tables and histograms.

Age | Count | Percent
20-30 | 4 | 20%
31-40 | 5 | 25%
41-50 | 10 | 50%
51-60 | 1 | 5%
Total | 20 | 100%
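
Grouping a continuous variable into class intervals like these is easy to script. The Python sketch below is illustrative only. The ages are hypothetical values, not the workshop data, and the interval boundaries come from the age table.

```python
# Class intervals from the age table (inclusive bounds)
INTERVALS = [(20, 30), (31, 40), (41, 50), (51, 60)]

def bin_ages(ages):
    """Count whole-number ages per interval; ages outside every interval are ignored."""
    counts = {f"{low}-{high}": 0 for low, high in INTERVALS}
    for age in ages:
        for low, high in INTERVALS:
            if low <= age <= high:
                counts[f"{low}-{high}"] += 1
                break
    return counts

ages = [24, 35, 38, 42, 47, 49, 55, 29]  # hypothetical ages
print(bin_ages(ages))  # {'20-30': 2, '31-40': 2, '41-50': 3, '51-60': 1}
```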

Data Reporting and Presentation of Descriptive Data
[Histogram: counts (0 to 10) for the age groups 20-30, 31-40, 41-50 and 51-60]

What is the difference between a Histogram and a Bar Chart?
- Histogram: for continuous data, where the data are divided into contiguous class
  intervals (in other words, adjacent intervals with no gaps between them).
- Bar chart: for categorical data, where the categories are not contiguous.

Measures of Central Tendency
- Reporting averages:
  - Categorical data = Mode
  - Ordinal data = Median
  - Continuous data = Mean
- If there are outliers (or extreme values), report the median instead of the mean.
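
The effect of an outlier is easy to see with Python's statistics module. The values below are hypothetical lengths of stay in days. They were chosen so that one extreme value pulls the mean well above the median.

```python
from statistics import mean, median, mode

# Hypothetical lengths of stay (days); 40 is an extreme value
stay = [3, 4, 4, 5, 5, 6, 40]
print(round(mean(stay), 1))  # 9.6, pulled upward by the outlier
print(median(stay))          # 5, barely affected
# The mode suits categorical data, e.g. hypothetical answers to "favourite fruit"
print(mode(["Apples", "Oranges", "Oranges", "Pineapples"]))  # Oranges
```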

Reporting Inferential Stats
- Handout
  - Resource: APA Guidelines
    http://www.ilstu.edu/~jhkahn/apastats.html

Reporting Inferential Stats
- It is important to include means, standard deviations and sample sizes in your
  results section.
- Example: Correlation
  - Variable X was strongly correlated with Variable Y, r = .59, p < .01.
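
APA style reports statistics that cannot exceed 1, such as r and p, without a leading zero. It also gives the degrees of freedom in parentheses (n - 2 for a Pearson correlation). The Python helper below is a hypothetical sketch of that convention. The sample sizes and exact p values in the usage lines are made up for illustration.

```python
def apa_correlation(r, n, p):
    """Format a Pearson correlation APA-style, e.g. 'r(28) = .59, p = .002'."""
    def no_leading_zero(value, places):
        text = f"{value:.{places}f}"
        return text.replace("0.", ".", 1)  # "0.59" -> ".59", "-0.59" -> "-.59"
    p_text = "p < .001" if p < 0.001 else f"p = {no_leading_zero(p, 3)}"
    return f"r({n - 2}) = {no_leading_zero(r, 2)}, {p_text}"

print(apa_correlation(0.59, 30, 0.0006))  # r(28) = .59, p < .001
print(apa_correlation(0.59, 20, 0.006))   # r(18) = .59, p = .006
```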

Important to Keep Your Audience in Mind
- Residency project
- Publication
- Departmental report

Aaron: TCPS certification for
residents reminder…

Questions?



FH Health Research Intelligence Unit
How can we help?

Epidemiologist
- Specifying the research goal, objectives and hypothesis.
- Identifying measurable outcomes.
- Specifying the variables for analysis.
- Identifying sources of data.
- Developing data collection tools for quantitative or qualitative studies.
- Developing the statistical analysis plan.
- Understanding how to use statistical software, such as SPSS.

Workshop Outline
- Research 101: Basic Research Steps
- Research Question Refinement
- Common Study Designs (Resource)
- Levels of Data
- Power and Sample Size
- Statistical Test Selection (Exercise)
- Data Reporting (Resource)
- Simple Stats with Excel (Resource)

Pharmacy Residency Project
1) Develop a research question.
2) Conduct a thorough literature review.
3) Refine the research question or hypothesis.
4) Design the research methodology/study.
5) Create a research proposal.
6) Apply for funding.
7) Apply for ethics approval.
8) Collect and analyze data.
9) Draw conclusions and relate findings.

Research Question Refinement
- The research question should describe, in operational terms, what you think will
  happen in the study.

Good Versus Bad Research Question
- Bad: Are patients who take drug X more likely to experience episodes of delirium?
- Good: Do patients who receive medication X between September 2008 and November 2008
  experience more episodes of delirium compared with patients who received drug Y
  during the same time period?

Classification of Research Studies
[Diagram: Research Studies, divided into Observational (Descriptive, Analytic) and Experimental]
Observational studies:
- Descriptive studies: focus on describing populations and the relationship between
  variables.
- Analytic studies: make inferences about the population based on a random sample.
Experimental studies:
- Test relationships between exposures and outcomes.
- The investigator has direct control over the study conditions and exposure status.

Hierarchy of Studies
- The type of study is selected according to the purpose of the research.
[Figure: Levels of Evidence, listing Experimental Studies, Analytic Studies and Descriptive Studies]
- Handout: Research Design Hierarchy

Probability Sampling Methods: Random
- There are several methods to choose from:
  - Simple random sampling.
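
In a simple random sample, every unit has the same chance of selection. Given a numbered list of eligible patients (the sampling frame), Python's random.sample draws a sample without replacement. The frame size, sample size and seed below are hypothetical.

```python
import random

frame = list(range(1, 201))  # hypothetical frame: patients numbered 1 to 200
rng = random.Random(2008)    # fixed seed so the draw can be reproduced and documented
sample = sorted(rng.sample(frame, k=40))
print(sample)
```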

Probability Sampling Methods: Stratified
- Stratified sampling (divide the population into non-overlapping strata and sample
  from within each stratum independently).
  - Guarantees representation of all important groups.
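
Here is a minimal sketch of stratified sampling in Python. It assumes a hypothetical frame of patients tagged with the hospital site they attend and takes a fixed 10% sampling fraction within each stratum (proportional allocation). The site names, sizes and function name are invented for illustration.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, fraction, seed=0):
    """Draw the same fraction independently from every stratum (at least one unit each)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)  # split the frame into non-overlapping strata
    sample = []
    for members in strata.values():
        k = max(1, round(fraction * len(members)))
        sample.extend(rng.sample(members, k))  # independent simple random sample per stratum
    return sample

# Hypothetical frame: (patient_id, site)
patients = (
    [(i, "Site A") for i in range(1, 101)]
    + [(i, "Site B") for i in range(101, 151)]
    + [(i, "Site C") for i in range(151, 161)]
)
chosen = stratified_sample(patients, stratum_of=lambda p: p[1], fraction=0.10)
print(sorted({site for _, site in chosen}))  # every site is represented
```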

Probability Sampling Methods: Systematic
- Selecting the sample using an interval "k", so that every kth unit in the frame is
  selected, is called systematic random sampling.

Probability Sampling Methods: Systematic
Steps to achieve a systematic random sample:
1. Number the units in the population from 1 to N.
2. Decide on the n (sample size) that you want or need.
   k = N/n = the interval size.
3. Randomly select an integer between 1 and k.
4. Then take every kth unit.
Example:
1. N = 200
2. n = 40; take N/n: 200/40 = 5 (interval size).
3. Randomly select a number between 1 and 5 (let's pick 4).
4. Begin with 4, and take every 5th unit.
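
The four steps translate directly into code. In this Python sketch the function name is hypothetical. The random start can be passed in so that the worked example (N = 200, n = 40, start of 4) can be reproduced. The sketch assumes N is a multiple of n, as in the example.

```python
import random

def systematic_sample(N, n, start=None, seed=None):
    """Return unit numbers for a systematic random sample from units numbered 1..N."""
    k = N // n                                     # step 2: interval size k = N/n
    if start is None:
        start = random.Random(seed).randint(1, k)  # step 3: random integer from 1 to k
    return list(range(start, N + 1, k))            # step 4: every kth unit from the start

units = systematic_sample(200, 40, start=4)
print(len(units), units[:5], units[-1])  # 40 [4, 9, 14, 19, 24] 199
```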

Probability Sampling Methods: Cluster
- Cluster sampling.
  - Divide the population into clusters and randomly sample clusters.
  - Measure all units within the sampled clusters.
  - Example: See blue areas on map.
  - Clusters need not be geographic areas; you could select hospitals, schools, etc.
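
Here is a minimal sketch of cluster sampling in Python, using hypothetical hospitals as the clusters. Whole clusters are chosen at random, and then every unit inside each chosen cluster is measured. The hospital names, patient IDs and seed are invented for illustration.

```python
import random

# Hypothetical clusters: hospital -> patients in that hospital
clusters = {
    "Hospital 1": ["P01", "P02", "P03"],
    "Hospital 2": ["P04", "P05"],
    "Hospital 3": ["P06", "P07", "P08", "P09"],
    "Hospital 4": ["P10", "P11"],
    "Hospital 5": ["P12", "P13", "P14"],
}

rng = random.Random(7)
chosen = rng.sample(sorted(clusters), k=2)  # randomly sample whole clusters
sample = [patient for hospital in chosen for patient in clusters[hospital]]  # take every unit
print(chosen, sample)
```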


Non-Probability Sampling Methods
- There are different types of non-probability sampling methods as well:
  - Convenience (not representative of the population).
  - Purposive (a certain group in mind).
  - Expert sampling (seek out specific expertise).
  - Snowball sampling (ask people to participate, and they ask more people).
- If you select non-probability sampling methods, the conclusions drawn from the study
  results apply only to that specific population.

Measurement: Levels of Data
- The level of data will dictate which statistical test you should use.
- Categorical = data classified into categories that cannot be arranged in any
  particular order (e.g., apples and pears, gender, eye colour, ethnicity).
- Ordinal = data that are ordered, but the distances between intervals are not always
  equal (e.g., low, middle and high income).
- Continuous = equal distance between each interval (e.g., 1, 2, 3; age).

Statistics and Statistical Test Selection

Types of Statistics
- Descriptive statistics: describe research findings.
  - E.g., frequencies, averages.
- Inferential statistics: make inferences about the population, based on a random sample.
  - In a random sample, each person/unit has an equal chance of being selected.
  - Allows generalizability to the population.

Types of Variables
- Variables can be classified as independent or dependent.
- An independent variable is the variable you believe will influence your outcome measure.
- A dependent variable is the variable that depends on, or is influenced by, the
  independent variable(s). The dependent variable can also be the variable you are
  trying to predict.

Statistical Test Selection
Selecting the appropriate statistical test should be based on:
1) What is your goal? Description? Comparison? Prediction? Quantifying an association?
   Proving effectiveness? Proving causality?
2) What kind of data have you collected? What are the levels of data (nominal, ordinal,
   continuous)? Was your sample randomly selected?
3) Are your data normally distributed? Should you use a parametric or non-parametric test?
4) What are the assumptions of the statistical test you would like to use? Do the data
   meet these assumptions?

Parametric Tests
- Parametric tests assume that the variable in question comes from a normal distribution.
- Non-parametric tests do not require the assumption of normality.
- Most non-parametric tests do not require an interval level of measurement; they can be
  used with nominal or ordinal data.

Assumptions
- There are various assumptions for each test.
- Before you select a test, be sure to check its assumptions.
- To find this information, contact a consultant or review statistical/research
  methods resources.
- Some examples of common assumptions are:
  - The dependent variable must be measured at a certain level (e.g., interval level).
  - The independent variable(s) must be measured at a certain level (e.g., ordinal level).
  - The population is normally distributed (not skewed).
- If your data do not meet the assumptions for a specific test, you may be able to use
  a non-parametric test instead.

Type of Data

Goal | Measurement (Normal Population) | Ordinal, or Non-Normal Population | Binomial (Two Possible Outcomes) | Survival Time
Describe one group | Mean, SD | Median, interquartile range | Proportion | Kaplan Meier survival curve
Compare one group to a hypothetical value | One-sample t test | Wilcoxon test | Chi-square or Binomial test ** |
Compare two unpaired groups | Unpaired t test | Mann-Whitney test | Fisher's test (chi-square for large samples) | Log-rank test or Mantel-Haenszel*
Compare two paired groups | Paired t test | Wilcoxon test | McNemar's test | Conditional proportional hazards regression*
Compare three or more unmatched groups | One-way ANOVA | Kruskal-Wallis test | Chi-square test | Cox proportional hazard regression**
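
One way to keep this table at hand during analysis planning is to encode it as a lookup. The Python sketch below is a hypothetical helper that stores only the rows and columns of the test-selection table, with the footnote asterisks omitted. It is no substitute for checking each test's assumptions, and the short column keys are illustrative names.

```python
# Test-selection table from the slide: goal -> {type of data: suggested method}
TESTS = {
    "describe one group": {
        "normal": "Mean, SD", "ordinal/non-normal": "Median, interquartile range",
        "binomial": "Proportion", "survival": "Kaplan Meier survival curve"},
    "compare one group to a hypothetical value": {
        "normal": "One-sample t test", "ordinal/non-normal": "Wilcoxon test",
        "binomial": "Chi-square or Binomial test"},
    "compare two unpaired groups": {
        "normal": "Unpaired t test", "ordinal/non-normal": "Mann-Whitney test",
        "binomial": "Fisher's test (chi-square for large samples)",
        "survival": "Log-rank test or Mantel-Haenszel"},
    "compare two paired groups": {
        "normal": "Paired t test", "ordinal/non-normal": "Wilcoxon test",
        "binomial": "McNemar's test",
        "survival": "Conditional proportional hazards regression"},
    "compare three or more unmatched groups": {
        "normal": "One-way ANOVA", "ordinal/non-normal": "Kruskal-Wallis test",
        "binomial": "Chi-square test", "survival": "Cox proportional hazard regression"},
}

def suggest_test(goal, data_type):
    """Look up the table; returns None where the table has no entry."""
    return TESTS.get(goal.lower(), {}).get(data_type)

print(suggest_test("Compare two unpaired groups", "ordinal/non-normal"))  # Mann-Whitney test
```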

Statistical Test Selection Group Exercise
- Using your tables, select the appropriate statistical tests for 10 research scenarios.

During the group exercise…
Steps to choose the appropriate statistical method for the data analysis:
1. Identify whether the research problem raises a question of description, relationship
   (association), or comparison (difference).
2. Identify the levels of measurement in the research question (nominal/categorical,
   ordinal/rank, continuous/evenly spaced).
3. Identify the number of variables or samples being described, related, or compared.
4. Identify whether the comparison samples are related (the same group analyzed before
   and after) or independent (unrelated, e.g. different groups).
5. Choose the appropriate statistical tool for the data and situation using the decision
   tree in the handout.
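
Step 4 matters because related and independent samples are analysed differently, even when the numbers are the same. The Python sketch below uses hypothetical memory scores for five pairs and computes two t statistics. The paired t statistic is based on within-pair differences. The unpaired (Welch) t statistic is based on the two separate group variances. The sketch computes only the t statistics; the p values would come from a t table or statistical software.

```python
from math import sqrt
from statistics import fmean, stdev, variance

# Hypothetical scores for five pairs (same position = same pair)
group_x = [70, 65, 80, 75, 60]
group_y = [72, 70, 79, 80, 66]

def paired_t(x, y):
    """Paired t: mean within-pair difference divided by its standard error (df = n - 1)."""
    diffs = [b - a for a, b in zip(x, y)]
    return fmean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

def unpaired_t(x, y):
    """Unpaired (Welch) t: difference in means divided by the SE from both group variances."""
    se = sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (fmean(y) - fmean(x)) / se

print(f"paired t = {paired_t(group_x, group_y):.2f}")      # 2.64
print(f"unpaired t = {unpaired_t(group_x, group_y):.2f}")  # 0.77
```

Pairing removes the between-person variability. As a result, the same mean difference of 3.4 points gives a much larger t statistic when the samples really are matched.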



1. A pilot experiment designed to test the effectiveness of a new approach to electrode
placement for electroconvulsive therapy (ECT) has been conducted over a one-year period
in the Fraser Health Authority.
Patients from two different mood disorder clinics participated in this study. Patients
from Clinic X received ECT according to current practice guidelines. Patients from
Clinic Y received a new exploratory ECT treatment. Patients in each clinic were matched
for age, gender, and type of disorder. A random sample of 30 matched pairs of patients
was selected for inclusion in the study. At the end of the year, patients were
administered a memory test yielding a total score out of 100.
Dr. Vasdil would like to know which statistical procedure to use to test for
differences between the groups of patients on the memory test.
What is the question: Compare
How many samples: 2
Related or independent: Related (matched pairs)
What is the level of measurement: Continuous
How many dependent variables: 1
Test: Paired t-test

Slide 10

Surviving Pharmacy
Residency Research: Tips
and Tricks for Statistical
Planning

Surviving Pharmacy Residency
Research: Tips and Tricks for
Statistical Planning
© Fraser Health Authority, 2007
The Fraser Health Authority (“FH”) authorizes the use, reproduction and/or
modification of this publication for purposes other than commercial redistribution. In
consideration for this authorization, the user agrees that any unmodified
reproduction of this publication shall retain all copyright and proprietary notices. If
the user modifies the content of this publication, all FH copyright notices shall be
removed, however FH shall be acknowledged as the author of the source
publication.
Reproduction or storage of this publication in any form by any means for the purpose
of commercial redistribution is strictly prohibited.
This publication is intended to provide general information only, and should not be
relied on as providing specific healthcare, legal or other professional advice. The
Fraser Health Authority, and every person involved in the creation of this publication,
disclaims any warranty, express or implied, as to its accuracy, completeness or
currency, and disclaims all liability in respect of any actions, including the results of
any actions, taken or not taken in reliance on the information contained herein.

FH Health Research Intelligence
Unit
How can we help?


Grant Facilitator-Writer
 Conducting a search for funding opportunities.
 Automatic notification of new funding sources and
deadlines.
 Identifying a research team.
 Preparing letters of intent.
 Identifying resources required for conducting research.
 Formulating the research budget.
 Writing the grant application in collaboration with
researchers.
 Understanding FH and funding agency requirements
regarding preparation of specific documents.

FH Health Research Intelligence
Unit
How can we help?











Epidemiologist
Specifying the research goal,
objectives and hypothesis.
Identifying measurable
outcomes.
Specifying the variables for
analysis.
Identifying sources of data.
Developing data collection
tools for quantitative or
qualitative studies.
Developing the statistical
analysis plan.
Understanding how to use
statistical software, such as
SPSS.

Workshop Outline










Research 101- Basic Research Steps
Research Question Refinement
Common Study Designs- Resource
Levels of Data
Power and Sample Size
Statistical Test Selection- Exercise
Data Reporting- Resource
Simple Stats with Excel- Resource

Pharmacy Residency Project










1) Develop a research question
2) Conduct thorough literature review
3) Re-define research question or hypothesis
4) Design research methodology/study
5) Create research proposal
6) Apply for funding
7) Apply for ethics approval
8) Collect and analyze data
9) Draw conclusions and relate findings

Research Question Refinement


Research question will describe in operational
terms, what you think will happen in the study.

Good Versus Bad Research
Question


Are patients who take
drug X more likely to
experience episodes
of delirium?



Do patients who
receive medication X
between September
2008 and November
2008 experience
more episodes of
delirium as compared
to patients who
received drug Y
during the same time
period?

Classification of Research
Studies
Observational Studies:
Descriptive Studies:
Focus on describing
populations and describing the
relationship between variables

Research Studies

Analytic Studies:
Observational
Descriptive

Experimental

Analytic

Make inferences about the
population based on a random
sample.
Experimental Studies:
Test relationships between
exposures and outcomes.
Investigator has direct control
over study condition and
exposure status.

Hierarchy of Studies
Type of study is selected
according to the purpose of
research.
Experimental Studies

Analytic Studies

Descriptive Studies

Levels of Evidence


HandoutResearch Design
Hierarchy

Probability Sampling Methods:
Random
 There

are several methods to choose

from:
 Simple random
sampling.

Probability Sampling Methods:
Stratified
 Stratified

sampling

(divide the population into
non-overlapping strata and
sample from within each
stratum independently).


Guarantees representation
of all important groups.

Probability Sampling Methods:
Systematic


Selection of the
sample using an
interval “k” so that
every “k” unit in the
frame is selected,
is called systematic
random sampling.

Probability Sampling Methods:
Systematic


Steps to achieve a systematic random sample:

1. Number the units in the population from 1 to N.
2. Decide on the n (sample size) that you want or need.


k = N/n = the interval size.

3. Randomly select an integer between 1 and k.
4. Then take every kth unit.
Example:
1. N=200
2. n=40, take N/n, 200/40=5 (interval size).
3. Randomly select a number between 1 and 5 (let’s pick 4).
4. Begin with 4, and take every 5th unit.

Probability Sampling Methods:
Cluster
 Cluster

sampling.
 Divide population into clusters and
randomly sample clusters.
 Measure all units within sampled clusters.
 Example: See blue areas on map.
Not just geographic areas,
could select hospitals,
schools etc.


Non-Probability Sampling
Methods


There are different types of non-probability
sampling methods as well:








Convenience (not representative of population).
Purposive (certain group in mind).
Expert sampling (seek out specific expertise).
Snowball sampling (ask people to participate, they
ask more people).

If you select non-probability sampling methods, the
conclusions drawn from the study results apply only
to that specific population.

Measurement: Levels of Data



The level of data will dictate which statistical test you
should use.
Categorical = Data that is classified into categories and
cannot be arranged in any particular order (e.g. Apples
and pears, gender, eye colour, ethnicity).



Ordinal = Data ordered, but distance between intervals
not always equal. (e.g. Low, middle and high income).



Continuous = equal distance between each interval
(e.g. 1,2,3., age).

Statistics and Statistical Test
Selection

Types of Statistics
 Descriptive

Statistics: Describes
 research findings


E.g. Frequencies, averages.

 Inferential

Statistics: Makes inferences
about the population, based on a random
sample.




In a random sample, each person/unit has an
equal chance of being selected
Allows generalizability to population.

Types of Variables


Variables can be classified as independent or
dependent.



An independent variable is the variable you
believe will influence your outcome measure.



A dependent variable is the variable that is
dependant on or influenced by independent
variable(s). The dependent variable can also
be the variable you are trying to predict.

Statistical Test Selection
Selecting the appropriate Statistical test requires
several steps:
Test selection should be based on:
1) What is your goal? Description? Comparison? Prediction? Quantify
association? Prove effectiveness? Prove causality?
2) What kind of data have you collected? What are the levels of data
(Nominal, ordinal, continuous)? Was your sample randomly selected?
3) Is your data normally distributed? Should you use a parametric or nonparametric test?
4) What are the assumptions of the statistical test you would like to
use? Does the data meet these assumptions?

Parametric Tests


Parametric tests assume that the variable in question is
from a normal distribution.



Non-parametric tests do not require the assumption of
normality.



Most non-parametric tests do not require an interval level
of measurement; can be used with nominal/ordinal level
data.

Assumptions







There are various assumptions for each test.
Before you select a test, be sure to check the assumptions of
each test.
You will need to contact a consultant, or review
statistical/research methods resources to find this information.
Some examples of common assumptions are:
 The dependent variable will need to be measured on a
certain level (i.e. Interval level).
 The independent variable(s) will need to be measured on a
certain level (i.e. Ordinal level).
 The population is normally distributed (not skewed).

If your data do not meet the assumptions for a specific test,
you may be able to use a non-parametric test instead.

Type of Data
Goal

Measurement
Normal
Population

Ordinal, or
NonNormal
Population
Median,
interquartil
e range
Wilcoxon test

Binomial
Survival Time
-Two
Possible
Outcomes
Proportion
Kaplan Meier
survival curve

Describe one
group

Mean, SD

Compare one
group to a
hypothetic
al value

One-sample t
test

Compare two
unpaired
groups

Unpaired t test

Mann-Whitney
test

Log-rank test or
Mantel-Haenszel*

Paired t test

Wilcoxon test

Fisher's test
(chisquare
for large
samples)
McNemar's
test

Compare two
paired
groups

Chi-square
test

Cox proportional
hazard
regression**

One-way
Compare
ANOVA
three or
more
unmatched
groups

Kruskal-Wallis
test

Chi-square
or
Binomial
test **

Conditional
proportional
hazards
regression*

Statistical Test Selection Group
Exercise

 Using

your tables, select the
appropriate statistical tests for 10
research scenarios.

During the group exercise…


Steps to choose the appropriate statistical method
for the data analysis:

1. Identify whether the research problem raises the
question of describe, relate (association), or compare
(difference).
2. Identify the levels of measurement in the research
question (Nominal/Categorical, Ordinal/Rank,
Continuous/Evenly spaced).
3. Identify the number of variables, or samples being
described, related, or compared.
4. Identify whether comparison samples are related
(analyze same group before and after) or independent
(not at all related, looking at different groups).
5. Choose the appropriate statistical tool for the data and
situation using the decision tree in the handout.



1. A pilot experiment designed to test the effectiveness of a new
approach to electrode placement for Electro Shock Therapy
(ECT) has been conducted over a one year time period in the
Fraser Health Authority.
Patients from two different mood disorder clinics participated in
this study. Patients from Clinic X received ECT therapy
according to current practice guidelines. Patients from Clinic Y
received a new exploratory ECT treatment. Patients in each
clinic were matched for age, gender, and type of disorder. A
random sample of 30 matched pairs of patients were selected
for inclusion in the study. At end of one year, patients were
administered a memory test yielding a total score out of 100.
Dr. Vasdil would like to know what statistical procedure needs
to be selected to test for differences among groups of patients
on the memory test.
What is the question: Compare
How many samples: 2
Related or independent: Independent
What is the level of measurement: Continuous
How many dependent variables: 1
Test: T-test

Sample Size


There are several rules of thumb for determining
sample size.






1) It’s a good idea to have a minimum of 30 cases (as a total group, or if comparing
groups, 30 for each group).
If you have less you can use a non-parametric test, but it is still better to have close
to 30 cases.
2) If using regression, it is best to have between 10-50 cases per independent
variable.
3) If you are validating a survey, it is never good to have more questions than cases.
4) If the total population that you are examining is less than 30. Use all of them.
5) For pilot studies the recommendation is a sample size of 12 per group
6) For surveys, a sample size of 400 per group can do just about anything.



7) For surveys, a 30% response rate is the bare minimum.



Note: For a precise sample size estimate you will need to conduct a power analysis.






Statistical Power


Power is the capability of a statistical test to
correctly detect a significant effect if it exists.
 Assumes value between 0 and 1 (%)



Power= 1-B (B= probability of a Type II error).
Type II error – the error of not rejecting a false
research finding.
 Type I error- the error of rejecting a correct
research finding.

Types of Power Analysis
- A priori: conducted before the study commences (at the proposal stage).
- Post hoc: conducted after the study has been completed.
- Easy ways to increase power?
  - Increase the sample size.
  - Increase the effect size (for example, by reducing measurement variability).
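To see why a larger sample or a larger effect raises power (1 - β), the sketch below approximates the power of a two-sided test comparing two independent group means, using the normal approximation and a standardized effect size d (the mean difference divided by the standard deviation). This is a simplification under stated assumptions: exact calculations based on the t distribution, as done by dedicated power software, give slightly lower values for small samples. `approx_power` is an illustrative name.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def approx_power(n_per_group, d, alpha=0.05):
    """Approximate power of a two-sided, two-sample test of means.

    d is the standardized effect size (difference in means / SD).
    """
    z_crit = Z.inv_cdf(1 - alpha / 2)
    shift = d * math.sqrt(n_per_group / 2)  # expected z statistic if the effect is real
    # Probability that the statistic lands beyond either critical value.
    return Z.cdf(shift - z_crit) + Z.cdf(-shift - z_crit)


for n in (20, 64, 100):
    print(f"n = {n:3d} per group, d = 0.5: power = {approx_power(n, 0.5):.2f}")
```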

Components Involved in Power Calculation
- Sample size: the number of cases.
- Effect size: the magnitude of the trend relative to its variation.
- Alpha level: the probability of concluding that an effect is present when the observed result is actually due to chance alone (usually .05 or .01). This is the Type I error rate, i.e., the probability of rejecting a true null hypothesis.
- Power level: 80-90% is common.
- One- or two-tailed test: two-tailed is common.

Components Involved in Power Calculation
- Sample size: what we want to find out.
- Effect size: the magnitude of the trend… but what if you don't know it?
  - Look to pilot data or the literature.
  - Keep in mind: the smaller the effect size, the larger the required sample size.
- Alpha level: .05
- Power level: 80-90%
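With alpha, power and an expected effect size in hand, the sample size per group for comparing two means can be approximated with the standard normal-theory formula n = 2(z_a + z_b)² / d², where z_a is the standard normal quantile for 1 - α/2 and z_b is the quantile for the desired power (1 - β). The sketch below assumes a two-sided test, equal group sizes and a standardized effect size d taken from pilot data or the literature; `n_per_group` is an illustrative name. For d = 0.5, alpha = .05 and 80% power it gives 63 per group, while exact t-based calculations give about 64, so treat the result as a close estimate and confirm it with power software or a consultant.

```python
import math
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate cases needed in EACH group for a two-sided comparison of two means.

    d is the standardized effect size (difference in means / SD).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = .05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)


# The smaller the effect size, the larger the required sample.
for d in (0.8, 0.5, 0.2):
    print(f"d = {d}: {n_per_group(d)} per group (80% power), "
          f"{n_per_group(d, power=0.90)} per group (90% power)")
```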

Important Consultation Information
- What is your research question?
- Components of the power calculation
- Levels of data (nominal, ordinal, continuous)
- Sampling plan

Data Organization: Codebook
- What is a codebook?
- A codebook is a log of your variables (and their levels of data) and how you will code them.
- A codebook helps everyone understand the coding scheme, so that the whole team is on the same page!

Data Processing and Analyses: Codebook Example

Variable name | Variable label | Values | Coding | Missing | Variable type
age | age | 1,2,3,4,5 | 1=10-20 years, 2=21-30 years, 3=31-40 years, 4=41-50 years, 5=51+ years | 97=Incorrect response, 98=No response, 99=Not applicable | Ordinal
sex | sex | 1,2 | 1=male, 2=female | 97=Incorrect response, 98=No response, 99=Not applicable | Nominal
happiness | happiness at work | 1,2,3 | 1=not happy, 2=somewhat happy, 3=very happy | 97=Incorrect response, 98=No response, 99=Not applicable | Ordinal
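A codebook can also live next to the analysis as a small lookup table, so that numeric codes are translated consistently and the missing-value codes (97, 98, 99) are never averaged as if they were real answers. The sketch below mirrors the codebook example; the `CODEBOOK`, `MISSING_CODES` and `decode` names are illustrative.

```python
MISSING_CODES = {97: "Incorrect response", 98: "No response", 99: "Not applicable"}

# variable name -> (variable label, level of data, code -> meaning)
CODEBOOK = {
    "age": ("age", "ordinal", {1: "10-20 years", 2: "21-30 years", 3: "31-40 years",
                               4: "41-50 years", 5: "51+ years"}),
    "sex": ("sex", "nominal", {1: "male", 2: "female"}),
    "happiness": ("happiness at work", "ordinal",
                  {1: "not happy", 2: "somewhat happy", 3: "very happy"}),
}


def decode(variable, value):
    """Translate a stored code into its meaning; missing-value codes become None."""
    if value in MISSING_CODES:
        return None  # treat 97/98/99 as missing, not as a number
    _label, _level, codes = CODEBOOK[variable]
    if value not in codes:
        raise ValueError(f"{value!r} is not a valid code for {variable}")
    return codes[value]


print(decode("sex", 2))         # female
print(decode("happiness", 98))  # None
```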

Spreadsheet Example

ID#   Age   Sex   Happiness
1     1     1     2
2     2     2     2
3     3     1     2
4     57    2     2
5     45    2     3
6     66    2     3
7     2     2     3
8     88    2     3
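Checking each column of the spreadsheet against the codebook before analysis catches entry errors early. In the spreadsheet example, the Age column mixes codebook values (1 to 3) with values such as 57 and 88 that look like raw ages. The sketch below flags any cell whose value is neither a valid code nor a missing-value code; the allowed values are copied from the codebook example, and `find_invalid` is an illustrative name.

```python
ALLOWED = {
    "age": {1, 2, 3, 4, 5},
    "sex": {1, 2},
    "happiness": {1, 2, 3},
}
MISSING_CODES = {97, 98, 99}

# The spreadsheet example as (ID, age, sex, happiness) rows.
rows = [
    (1, 1, 1, 2), (2, 2, 2, 2), (3, 3, 1, 2), (4, 57, 2, 2),
    (5, 45, 2, 3), (6, 66, 2, 3), (7, 2, 2, 3), (8, 88, 2, 3),
]


def find_invalid(rows, columns=("age", "sex", "happiness")):
    """Return (ID, variable, value) for every cell the codebook does not allow."""
    problems = []
    for case_id, *values in rows:
        for variable, value in zip(columns, values):
            if value not in ALLOWED[variable] and value not in MISSING_CODES:
                problems.append((case_id, variable, value))
    return problems


for case_id, variable, value in find_invalid(rows):
    print(f"ID {case_id}: {variable} = {value} is not in the codebook")
```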

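Data Analysis with Excel
- Most simple analyses can be done in Excel, including correlation, regression and even random number generation.
- Install the Analysis ToolPak add-in:
  - In older versions of Excel, go to Tools > Add-ins and select 'Analysis ToolPak'.
  - In Excel 2010 and later, go to File > Options > Add-ins, choose Excel Add-ins in the Manage box, click Go and select 'Analysis ToolPak'.
- Create your worksheet and codebook.
- Choose a statistical test.
  - Follow the commands in the Help menu.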

Data Analysis with Excel


http://home.ubalt.edu/ntsbarsh/excel/excel.htm

Data Reporting and Presentation of Data
- Graphical summaries are a great way to present your data.
- Excel is well suited to creating tables and graphs.
- The type of data you have determines the type of graphical summary you should use.

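Data Reporting and Presentation of Descriptive Data
- Categorical data: frequency tables and bar charts.
- Example: Fruit

Fruit        Count   Percent   Valid Percent
Pineapples   4       20%       21%
Apples       5       25%       26%
Oranges      10      50%       53%
Unknown      1       5%        -
Total        20      100%      100%

Valid Percent leaves out the Unknown (missing) response, so it is calculated out of the 19 valid responses rather than all 20.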
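The Percent and Valid Percent columns differ only in their denominators: Percent divides by all 20 responses, while Valid Percent leaves out the Unknown category and divides by the 19 valid responses. The sketch below reproduces the fruit table from its counts; treating "Unknown" as the only missing category is the assumption that matches the table, and `frequency_table` is an illustrative name.

```python
counts = {"Pineapples": 4, "Apples": 5, "Oranges": 10, "Unknown": 1}
missing = {"Unknown"}  # categories excluded from Valid Percent


def frequency_table(counts, missing):
    """Return {category: (count, percent, valid_percent or None)} with whole-number percents."""
    total = sum(counts.values())
    valid_total = sum(c for cat, c in counts.items() if cat not in missing)
    table = {}
    for cat, c in counts.items():
        valid = None if cat in missing else round(100 * c / valid_total)
        table[cat] = (c, round(100 * c / total), valid)
    return table


for cat, (c, pct, valid) in frequency_table(counts, missing).items():
    shown = "-" if valid is None else f"{valid}%"
    print(f"{cat:<11}{c:>4}{pct:>5}%{shown:>6}")
```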

Data Reporting and Presentation of
Descriptive Data
[Bar chart: Fruit Study, showing the count (0 to 10) for Pineapples, Apples, Oranges and Unknown]

Data Reporting and Presentation of Descriptive Data
- Continuous data: tables and histograms.

Age     Count   Percent
20-30   4       20%
31-40   5       25%
41-50   10      50%
51-60   1       5%
Total   20      100%

Data Reporting and Presentation of
Descriptive Data
[Histogram: Age, showing the count for the age groups 20-30, 31-40, 41-50 and 51-60]

What is the difference between a histogram and a bar chart?
- Histogram: for continuous data, where the data are divided into contiguous class intervals (in other words, intervals connected in an unbroken sequence).
- Bar chart: for categorical data, where the categories are not contiguous.

Measures of Central Tendency
- Reporting averages:
  - Categorical data = Mode
  - Ordinal data = Median
  - Continuous data = Mean
- If there are outliers (extreme values), report the median instead of the mean.
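The rule of matching the average to the level of data, and the reason to prefer the median when there are outliers, can both be seen with Python's standard `statistics` module. In the sketch below, `average_for` is an illustrative helper, and the lengths of stay are hypothetical values with one extreme case.

```python
import statistics


def average_for(values, level):
    """Pick the measure of central tendency that matches the level of data."""
    if level == "categorical":
        return statistics.mode(values)    # most frequent category
    if level == "ordinal":
        return statistics.median(values)  # middle value of the ordered data
    if level == "continuous":
        return statistics.mean(values)
    raise ValueError(f"unknown level of data: {level}")


print(average_for(["apple", "orange", "orange"], "categorical"))

lengths_of_stay = [2, 3, 3, 4, 50]  # hypothetical days; 50 is an outlier
print("mean:  ", statistics.mean(lengths_of_stay))    # pulled upward by the outlier
print("median:", statistics.median(lengths_of_stay))  # unaffected by the outlier
```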

Reporting Inferential Stats
- Handout
  - Resource: APA Guidelines
    http://www.ilstu.edu/~jhkahn/apastats.html

Reporting Inferential Stats
- It's important to include means, standard deviations and sample sizes in your results section.
- Example (correlation):
  - Variable X was strongly correlated with Variable Y, r = .59, p < .01.

Important to Keep Your Audience in Mind
- Residency project
- Publication
- Departmental report

Aaron: TCPS certification for
residents reminder…

Questions?


Slide 12

Surviving Pharmacy
Residency Research: Tips
and Tricks for Statistical
Planning

Surviving Pharmacy Residency
Research: Tips and Tricks for
Statistical Planning
© Fraser Health Authority, 2007
The Fraser Health Authority (“FH”) authorizes the use, reproduction and/or
modification of this publication for purposes other than commercial redistribution. In
consideration for this authorization, the user agrees that any unmodified
reproduction of this publication shall retain all copyright and proprietary notices. If
the user modifies the content of this publication, all FH copyright notices shall be
removed, however FH shall be acknowledged as the author of the source
publication.
Reproduction or storage of this publication in any form by any means for the purpose
of commercial redistribution is strictly prohibited.
This publication is intended to provide general information only, and should not be
relied on as providing specific healthcare, legal or other professional advice. The
Fraser Health Authority, and every person involved in the creation of this publication,
disclaims any warranty, express or implied, as to its accuracy, completeness or
currency, and disclaims all liability in respect of any actions, including the results of
any actions, taken or not taken in reliance on the information contained herein.

FH Health Research Intelligence
Unit
How can we help?


Grant Facilitator-Writer
 Conducting a search for funding opportunities.
 Automatic notification of new funding sources and
deadlines.
 Identifying a research team.
 Preparing letters of intent.
 Identifying resources required for conducting research.
 Formulating the research budget.
 Writing the grant application in collaboration with
researchers.
 Understanding FH and funding agency requirements
regarding preparation of specific documents.

FH Health Research Intelligence
Unit
How can we help?











Epidemiologist
Specifying the research goal,
objectives and hypothesis.
Identifying measurable
outcomes.
Specifying the variables for
analysis.
Identifying sources of data.
Developing data collection
tools for quantitative or
qualitative studies.
Developing the statistical
analysis plan.
Understanding how to use
statistical software, such as
SPSS.

Workshop Outline










Research 101- Basic Research Steps
Research Question Refinement
Common Study Designs- Resource
Levels of Data
Power and Sample Size
Statistical Test Selection- Exercise
Data Reporting- Resource
Simple Stats with Excel- Resource

Pharmacy Residency Project










1) Develop a research question
2) Conduct thorough literature review
3) Re-define research question or hypothesis
4) Design research methodology/study
5) Create research proposal
6) Apply for funding
7) Apply for ethics approval
8) Collect and analyze data
9) Draw conclusions and relate findings

Research Question Refinement


Research question will describe in operational
terms, what you think will happen in the study.

Good Versus Bad Research
Question


Are patients who take
drug X more likely to
experience episodes
of delirium?



Do patients who
receive medication X
between September
2008 and November
2008 experience
more episodes of
delirium as compared
to patients who
received drug Y
during the same time
period?

Classification of Research
Studies
Observational Studies:
Descriptive Studies:
Focus on describing
populations and describing the
relationship between variables

Research Studies

Analytic Studies:
Observational
Descriptive

Experimental

Analytic

Make inferences about the
population based on a random
sample.
Experimental Studies:
Test relationships between
exposures and outcomes.
Investigator has direct control
over study condition and
exposure status.

Hierarchy of Studies
Type of study is selected
according to the purpose of
research.
Experimental Studies

Analytic Studies

Descriptive Studies

Levels of Evidence


HandoutResearch Design
Hierarchy

Probability Sampling Methods:
Random
 There

are several methods to choose

from:
 Simple random
sampling.

Probability Sampling Methods:
Stratified
 Stratified

sampling

(divide the population into
non-overlapping strata and
sample from within each
stratum independently).


Guarantees representation
of all important groups.

Probability Sampling Methods:
Systematic


Selection of the
sample using an
interval “k” so that
every “k” unit in the
frame is selected,
is called systematic
random sampling.

Probability Sampling Methods:
Systematic


Steps to achieve a systematic random sample:

1. Number the units in the population from 1 to N.
2. Decide on the n (sample size) that you want or need.


k = N/n = the interval size.

3. Randomly select an integer between 1 and k.
4. Then take every kth unit.
Example:
1. N=200
2. n=40, take N/n, 200/40=5 (interval size).
3. Randomly select a number between 1 and 5 (let’s pick 4).
4. Begin with 4, and take every 5th unit.

Probability Sampling Methods:
Cluster
 Cluster

sampling.
 Divide population into clusters and
randomly sample clusters.
 Measure all units within sampled clusters.
 Example: See blue areas on map.
Not just geographic areas,
could select hospitals,
schools etc.


Non-Probability Sampling
Methods


There are different types of non-probability
sampling methods as well:








Convenience (not representative of population).
Purposive (certain group in mind).
Expert sampling (seek out specific expertise).
Snowball sampling (ask people to participate, they
ask more people).

If you select non-probability sampling methods, the
conclusions drawn from the study results apply only
to that specific population.

Measurement: Levels of Data



The level of data will dictate which statistical test you
should use.
Categorical = Data that is classified into categories and
cannot be arranged in any particular order (e.g. Apples
and pears, gender, eye colour, ethnicity).



Ordinal = Data ordered, but distance between intervals
not always equal. (e.g. Low, middle and high income).



Continuous = equal distance between each interval
(e.g. 1,2,3., age).

Statistics and Statistical Test
Selection

Types of Statistics
 Descriptive

Statistics: Describes
 research findings


E.g. Frequencies, averages.

 Inferential

Statistics: Makes inferences
about the population, based on a random
sample.




In a random sample, each person/unit has an
equal chance of being selected
Allows generalizability to population.

Types of Variables


Variables can be classified as independent or
dependent.



An independent variable is the variable you
believe will influence your outcome measure.



A dependent variable is the variable that is
dependant on or influenced by independent
variable(s). The dependent variable can also
be the variable you are trying to predict.

Statistical Test Selection
Selecting the appropriate Statistical test requires
several steps:
Test selection should be based on:
1) What is your goal? Description? Comparison? Prediction? Quantify
association? Prove effectiveness? Prove causality?
2) What kind of data have you collected? What are the levels of data
(Nominal, ordinal, continuous)? Was your sample randomly selected?
3) Is your data normally distributed? Should you use a parametric or nonparametric test?
4) What are the assumptions of the statistical test you would like to
use? Does the data meet these assumptions?

Parametric Tests


Parametric tests assume that the variable in question is
from a normal distribution.



Non-parametric tests do not require the assumption of
normality.



Most non-parametric tests do not require an interval level
of measurement; can be used with nominal/ordinal level
data.

Assumptions







There are various assumptions for each test.
Before you select a test, be sure to check the assumptions of
each test.
You will need to contact a consultant, or review
statistical/research methods resources to find this information.
Some examples of common assumptions are:
 The dependent variable will need to be measured on a
certain level (i.e. Interval level).
 The independent variable(s) will need to be measured on a
certain level (i.e. Ordinal level).
 The population is normally distributed (not skewed).

If your data do not meet the assumptions for a specific test,
you may be able to use a non-parametric test instead.

Type of Data
Goal

Measurement
Normal
Population

Ordinal, or
NonNormal
Population
Median,
interquartil
e range
Wilcoxon test

Binomial
Survival Time
-Two
Possible
Outcomes
Proportion
Kaplan Meier
survival curve

Describe one
group

Mean, SD

Compare one
group to a
hypothetic
al value

One-sample t
test

Compare two
unpaired
groups

Unpaired t test

Mann-Whitney
test

Log-rank test or
Mantel-Haenszel*

Paired t test

Wilcoxon test

Fisher's test
(chisquare
for large
samples)
McNemar's
test

Compare two
paired
groups

Chi-square
test

Cox proportional
hazard
regression**

One-way
Compare
ANOVA
three or
more
unmatched
groups

Kruskal-Wallis
test

Chi-square
or
Binomial
test **

Conditional
proportional
hazards
regression*

Statistical Test Selection Group
Exercise

 Using

your tables, select the
appropriate statistical tests for 10
research scenarios.

During the group exercise…


Steps to choose the appropriate statistical method
for the data analysis:

1. Identify whether the research problem raises the
question of describe, relate (association), or compare
(difference).
2. Identify the levels of measurement in the research
question (Nominal/Categorical, Ordinal/Rank,
Continuous/Evenly spaced).
3. Identify the number of variables, or samples being
described, related, or compared.
4. Identify whether comparison samples are related
(analyze same group before and after) or independent
(not at all related, looking at different groups).
5. Choose the appropriate statistical tool for the data and
situation using the decision tree in the handout.



1. A pilot experiment designed to test the effectiveness of a new
approach to electrode placement for Electro Shock Therapy
(ECT) has been conducted over a one year time period in the
Fraser Health Authority.
Patients from two different mood disorder clinics participated in
this study. Patients from Clinic X received ECT therapy
according to current practice guidelines. Patients from Clinic Y
received a new exploratory ECT treatment. Patients in each
clinic were matched for age, gender, and type of disorder. A
random sample of 30 matched pairs of patients were selected
for inclusion in the study. At end of one year, patients were
administered a memory test yielding a total score out of 100.
Dr. Vasdil would like to know what statistical procedure needs
to be selected to test for differences among groups of patients
on the memory test.
What is the question: Compare
How many samples: 2
Related or independent: Independent
What is the level of measurement: Continuous
How many dependent variables: 1
Test: T-test

Sample Size

There are several rules of thumb for determining sample size:

1) Aim for a minimum of 30 cases (for the total group or, if you are comparing groups, 30 per group).
If you have fewer, you can use a non-parametric test, but it is still better to get as close
to 30 cases as possible.
2) For regression, aim for 10-50 cases per independent variable.
3) When validating a survey, avoid having more questions than cases.
4) If the total population you are examining is smaller than 30, include all of them.
5) For pilot studies, the usual recommendation is a sample size of 12 per group.
6) For surveys, a sample of about 400 per group is enough for most purposes (it gives a margin of
error of at most about ±5 percentage points for a proportion at 95% confidence).
7) For surveys, a 30% response rate is the bare minimum.

Note: For a precise sample size estimate, you will need to conduct a power analysis.
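The survey and regression rules of thumb above are simple arithmetic. The sketch below is a minimal Python version. It assumes a simple random sample and the usual normal approximation for a proportion, with p = 0.5 as the most conservative choice. The function names are hypothetical.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion (simple random sample assumed).

    p = 0.5 gives the widest, most conservative margin.
    """
    return z * math.sqrt(p * (1 - p) / n)


def surveys_to_send(completed_needed, response_rate=0.30):
    """Number of surveys to distribute to expect `completed_needed` returns."""
    return math.ceil(completed_needed / response_rate)


def regression_cases(n_predictors, low=10, high=50):
    """Rule-of-thumb range of cases for a regression with n_predictors independent variables."""
    return n_predictors * low, n_predictors * high


print(round(margin_of_error(400), 3))  # prints 0.049, i.e. about ±5 percentage points
print(surveys_to_send(400))            # surveys to send for 400 returns at a 30% response rate
print(regression_cases(4))             # case range for 4 independent variables
```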






Statistical Power

Power is the probability that a statistical test will detect an effect when that effect truly exists.
- It takes a value between 0 and 1 (often expressed as a percentage).
- Power = 1 - β, where β is the probability of a Type II error.
- Type II error: failing to reject a false null hypothesis (missing an effect that is really there).
- Type I error: rejecting a true null hypothesis (concluding there is an effect when there is none).

Types of Power Analysis
- A priori: conducted before the study begins (at the proposal stage).
- Post hoc: conducted after the study has been completed.
- Easy ways to increase power?
  - Increase the sample size.
  - Increase the effect size.
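The following sketch shows why a larger sample or a larger effect raises power. It uses a normal approximation for a two-sided comparison of two group means with equal group sizes. The standardized effect size d (difference in means divided by the standard deviation) and the function name are assumptions for illustration. Dedicated power software uses the t distribution and will give slightly different numbers.

```python
import math
from statistics import NormalDist


def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample comparison of means.

    Normal approximation; ignores the negligible chance of rejecting
    in the wrong direction.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(d * math.sqrt(n_per_group / 2) - z_crit)


# Power rises with sample size (fixed effect size d = 0.5) ...
for n in (20, 64, 100):
    print(n, round(approx_power(0.5, n), 2))

# ... and with effect size (fixed n = 64 per group).
for d in (0.2, 0.5, 0.8):
    print(d, round(approx_power(d, 64), 2))
```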

Components Involved in Power Calculation
- Sample size: the number of cases.
- Effect size: the magnitude of the difference or relationship, relative to the variation in the data.
- Alpha level: the probability of concluding that an effect exists when the result is really due to
  chance alone (usually .05 or .01). This is the Type I error rate, the probability of rejecting
  a true null hypothesis.
- Power level: 80-90% is common.
- One- or two-tailed test: two-tailed is common.

Components Involved in Power Calculation
- Sample size: what we want to find out.
- Effect size: the magnitude of the effect. But what if you don’t know it?
  - Look to pilot data or the literature.
  - Keep in mind that the smaller the effect size, the larger the sample size required.
- Alpha level: .05
- Power level: 80-90%
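With alpha, power and an effect size in hand, the sample size for comparing two group means can be approximated with the normal-approximation formula n per group = 2(z₁₋α/₂ + z_power)² / d². The sketch below is a minimal Python version. It assumes a two-sided test, equal group sizes and a standardized effect size d. The function name is hypothetical. Software that uses the t distribution typically returns a slightly larger n.

```python
import math
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate cases per group for a two-sided, two-sample comparison of means.

    n = 2 * (z_{1-alpha/2} + z_{power})^2 / d^2, rounded up to a whole case.
    """
    z = NormalDist().inv_cdf
    return math.ceil(2 * (z(1 - alpha / 2) + z(power)) ** 2 / d ** 2)


print(n_per_group(0.5))             # medium effect, alpha .05, power 80%
print(n_per_group(0.5, power=0.9))  # same effect, power 90%
print(n_per_group(0.2))             # small effect: many more cases needed
```

The last line illustrates the point above: shrinking the effect size from 0.5 to 0.2 multiplies the required sample size by more than six.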

Important Consultation Information
- What is your research question?
- Components of the power calculation
- Levels of data (nominal, ordinal, continuous)
- Sampling plan

Data Organization: Codebook
- What is a codebook?
- A codebook is a log of your variables (and their levels of data) and how you will code them.
- A codebook helps everyone understand the coding scheme, so that everyone is on the same page!

Data Processing and Analyses: Codebook Example

Variable Name  Variable Label     Values     Coding                                                                   Variable Type
age            age                1,2,3,4,5  1=10-20 years, 2=21-30 years, 3=31-40 years, 4=41-50 years, 5=51+ years  Ordinal
sex            sex                1,2        1=male, 2=female                                                         Nominal
happiness      happiness at work  1,2,3      1=not happy, 2=somewhat happy, 3=very happy                              Ordinal

Missing (all variables): 97=Incorrect response, 98=No response, 99=Not applicable

Spreadsheet Example

ID#  Age  Sex  Happiness
1    1    1    2
2    2    2    2
3    3    1    2
4    57   2    2
5    45   2    3
6    66   2    3
7    2    2    3
8    88   2    3
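Several Age values in the spreadsheet (57, 45, 66, 88) are not among the codes 1-5 defined in the codebook. They look like raw ages entered instead of age-group codes. Checking every entry against the codebook catches this kind of mismatch early. The sketch below is a minimal Python version. The dictionary layout and the function name are illustrative assumptions, and the rows are transcribed from the spreadsheet example.

```python
# Codebook from the example: valid codes per variable, plus shared missing-value codes.
MISSING = {97: "Incorrect response", 98: "No response", 99: "Not applicable"}
CODEBOOK = {
    "age": {1: "10-20 years", 2: "21-30 years", 3: "31-40 years", 4: "41-50 years", 5: "51+ years"},
    "sex": {1: "male", 2: "female"},
    "happiness": {1: "not happy", 2: "somewhat happy", 3: "very happy"},
}

# Rows from the spreadsheet example: (ID, age, sex, happiness).
ROWS = [(1, 1, 1, 2), (2, 2, 2, 2), (3, 3, 1, 2), (4, 57, 2, 2),
        (5, 45, 2, 3), (6, 66, 2, 3), (7, 2, 2, 3), (8, 88, 2, 3)]


def check_codes(rows, codebook=CODEBOOK, missing=MISSING):
    """Return (ID, variable, value) for every value the codebook does not define."""
    problems = []
    for case_id, *values in rows:
        # Column order in each row matches the codebook order: age, sex, happiness.
        for name, value in zip(codebook, values):
            if value not in codebook[name] and value not in missing:
                problems.append((case_id, name, value))
    return problems


for problem in check_codes(ROWS):
    print(problem)
```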

Data Analysis with Excel
- Most simple analyses can be done in Excel, including correlation, regression
  and even random number generation.
- Install the Analysis ToolPak:
  - Go to Tools > Add-Ins and select 'Analysis ToolPak' (in newer versions of Excel,
    the add-ins are under File > Options > Add-ins).
- Create your worksheet and codebook.
- Choose a statistical test.
  - Follow the instructions in the Help menu.

Data Analysis with Excel


http://home.ubalt.edu/ntsbarsh/excel/excel.htm

Data Reporting and Presentation of Data
- Graphical summaries are a great way to present your data.
- Excel is great for creating tables and graphs.
- The type of data you have determines the type of graphical summary you should use.

Data Reporting and Presentation of Descriptive Data
- Categorical data: frequency tables and bar charts.
- Example: Fruit

Fruit       Count  Percent  Valid Percent
Pineapples  4      20%      21%
Apples      5      25%      26%
Oranges     10     50%      53%
Unknown     1      5%       -
Total       20     100%     100%

(Valid percent leaves out the 1 unknown response, so it is based on 19 cases.)
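The Percent and Valid Percent columns differ only in the denominator. Percent divides by all 20 cases, while Valid Percent divides by the 19 cases with a known answer. The minimal Python sketch below rebuilds the table from the counts in the example. Treating "Unknown" as the only missing category is an assumption taken from that example, and the function name is hypothetical.

```python
from collections import Counter


def frequency_table(values, missing=("Unknown",)):
    """Return {category: (count, percent, valid percent)}; valid percent excludes missing."""
    counts = Counter(values)
    total = sum(counts.values())
    valid_total = sum(c for cat, c in counts.items() if cat not in missing)
    table = {}
    for cat, c in counts.items():
        valid = None if cat in missing else round(100 * c / valid_total)
        table[cat] = (c, round(100 * c / total), valid)
    return table


# Raw responses reconstructed from the counts in the example table.
fruit = ["Pineapples"] * 4 + ["Apples"] * 5 + ["Oranges"] * 10 + ["Unknown"]
for category, (count, pct, valid_pct) in frequency_table(fruit).items():
    print(category, count, f"{pct}%", "-" if valid_pct is None else f"{valid_pct}%")
```

Because each percentage is rounded separately, the columns will not always add up to exactly 100%. In this example they do.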

Data Reporting and Presentation of Descriptive Data
[Bar chart: Fruit Study, counts of Pineapples, Apples, Oranges and Unknown]

Data Reporting and Presentation of Descriptive Data
- Continuous data: tables and histograms.

Age    Count  Percent
20-30  4      20%
31-40  5      25%
41-50  10     50%
51-60  1      5%
Total  20     100%

Data Reporting and Presentation of Descriptive Data
[Histogram: count of cases by age group (20-30, 31-40, 41-50, 51-60)]

What is the difference between a histogram and a bar chart?
- Histogram: for continuous data divided into contiguous class intervals (intervals that
  follow one another in an unbroken sequence, so the bars touch).
- Bar chart: for categorical data, where the categories are not contiguous (the bars are
  drawn with gaps between them).
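A histogram's class intervals are tabulated by counting how many values fall in each interval. The minimal Python sketch below uses the age groups from the table with hypothetical ages. It assumes whole-number ages, because intervals labelled 20-30 and 31-40 leave no room for a value such as 30.5. The function name is hypothetical.

```python
def bin_counts(values, intervals):
    """Count how many values fall in each closed interval (low, high).

    Assumes whole-number values, matching class labels such as 20-30 and 31-40.
    Values outside every interval are not counted.
    """
    counts = {f"{low}-{high}": 0 for low, high in intervals}
    for value in values:
        for low, high in intervals:
            if low <= value <= high:
                counts[f"{low}-{high}"] += 1
                break  # each value belongs to exactly one interval
    return counts


AGE_GROUPS = [(20, 30), (31, 40), (41, 50), (51, 60)]
sample_ages = [23, 29, 35, 44, 47, 52]  # hypothetical ages
print(bin_counts(sample_ages, AGE_GROUPS))
```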

Measures of Central Tendency
- Reporting averages:
  - Categorical data = Mode
  - Ordinal data = Median
  - Continuous data = Mean
- If there are outliers (or extreme values), report the median instead of the mean.
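Python's standard `statistics` module has a function for each of these averages. The small sketch below uses hypothetical data. It shows how one extreme value pulls the mean but barely moves the median.

```python
import statistics

fruit_responses = ["Oranges", "Apples", "Oranges", "Pineapples", "Oranges"]  # categorical
happiness_codes = [1, 2, 2, 3, 3, 3, 2]  # ordinal (1=not happy ... 3=very happy)
ages_with_outlier = [34, 36, 38, 40, 41, 90]  # continuous; 90 is an extreme value

print(statistics.mode(fruit_responses))      # categorical: mode
print(statistics.median(happiness_codes))    # ordinal: median
print(statistics.mean(ages_with_outlier))    # continuous: mean, pulled up by the outlier
print(statistics.median(ages_with_outlier))  # median barely affected by the outlier
```

Here the mean age is 46.5, but the median is 39.0, which better describes a typical case.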

Reporting Inferential Stats
- Handout
- Resource: APA Guidelines
  http://www.ilstu.edu/~jhkahn/apastats.html

Reporting Inferential Stats
- It’s important to include means, standard deviations and sample sizes in your results section.
- Example (correlation): Variable X was strongly correlated with Variable Y, r = .59, p < .01.
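A correlation coefficient like the one in this example can be computed with Excel's CORREL function or by hand. The minimal Python sketch below uses hypothetical data. It computes Pearson's r and formats it APA-style: two decimals and no leading zero, since r cannot exceed 1. The p-value is not computed here and would come from a statistics package. The function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def apa_r(r):
    """Format r APA-style: two decimals, leading zero dropped (e.g. 'r = .59')."""
    return f"r = {r:.2f}".replace("0.", ".", 1)


x = [1, 2, 3, 4, 5]  # hypothetical Variable X
y = [2, 4, 5, 4, 5]  # hypothetical Variable Y
print(apa_r(pearson_r(x, y)))
```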

It’s Important to Keep Your Audience in Mind
- Residency project
- Publication
- Departmental report

Aaron: TCPS certification for
residents reminder…

Questions?


Slide 14

Surviving Pharmacy
Residency Research: Tips
and Tricks for Statistical
Planning

Surviving Pharmacy Residency
Research: Tips and Tricks for
Statistical Planning
© Fraser Health Authority, 2007
The Fraser Health Authority (“FH”) authorizes the use, reproduction and/or
modification of this publication for purposes other than commercial redistribution. In
consideration for this authorization, the user agrees that any unmodified
reproduction of this publication shall retain all copyright and proprietary notices. If
the user modifies the content of this publication, all FH copyright notices shall be
removed, however FH shall be acknowledged as the author of the source
publication.
Reproduction or storage of this publication in any form by any means for the purpose
of commercial redistribution is strictly prohibited.
This publication is intended to provide general information only, and should not be
relied on as providing specific healthcare, legal or other professional advice. The
Fraser Health Authority, and every person involved in the creation of this publication,
disclaims any warranty, express or implied, as to its accuracy, completeness or
currency, and disclaims all liability in respect of any actions, including the results of
any actions, taken or not taken in reliance on the information contained herein.

FH Health Research Intelligence
Unit
How can we help?


Grant Facilitator-Writer
 Conducting a search for funding opportunities.
 Automatic notification of new funding sources and
deadlines.
 Identifying a research team.
 Preparing letters of intent.
 Identifying resources required for conducting research.
 Formulating the research budget.
 Writing the grant application in collaboration with
researchers.
 Understanding FH and funding agency requirements
regarding preparation of specific documents.

FH Health Research Intelligence
Unit
How can we help?











Epidemiologist
Specifying the research goal,
objectives and hypothesis.
Identifying measurable
outcomes.
Specifying the variables for
analysis.
Identifying sources of data.
Developing data collection
tools for quantitative or
qualitative studies.
Developing the statistical
analysis plan.
Understanding how to use
statistical software, such as
SPSS.

Workshop Outline










Research 101- Basic Research Steps
Research Question Refinement
Common Study Designs- Resource
Levels of Data
Power and Sample Size
Statistical Test Selection- Exercise
Data Reporting- Resource
Simple Stats with Excel- Resource

Pharmacy Residency Project










1) Develop a research question
2) Conduct thorough literature review
3) Re-define research question or hypothesis
4) Design research methodology/study
5) Create research proposal
6) Apply for funding
7) Apply for ethics approval
8) Collect and analyze data
9) Draw conclusions and relate findings

Research Question Refinement


Research question will describe in operational
terms, what you think will happen in the study.

Good Versus Bad Research
Question


Are patients who take
drug X more likely to
experience episodes
of delirium?



Do patients who
receive medication X
between September
2008 and November
2008 experience
more episodes of
delirium as compared
to patients who
received drug Y
during the same time
period?

Classification of Research
Studies
Observational Studies:
Descriptive Studies:
Focus on describing
populations and describing the
relationship between variables

Research Studies

Analytic Studies:
Observational
Descriptive

Experimental

Analytic

Make inferences about the
population based on a random
sample.
Experimental Studies:
Test relationships between
exposures and outcomes.
Investigator has direct control
over study condition and
exposure status.

Hierarchy of Studies
Type of study is selected
according to the purpose of
research.
Experimental Studies

Analytic Studies

Descriptive Studies

Levels of Evidence


HandoutResearch Design
Hierarchy

Probability Sampling Methods:
Random
 There

are several methods to choose

from:
 Simple random
sampling.

Probability Sampling Methods:
Stratified
 Stratified

sampling

(divide the population into
non-overlapping strata and
sample from within each
stratum independently).


Guarantees representation
of all important groups.

Probability Sampling Methods:
Systematic


Selection of the
sample using an
interval “k” so that
every “k” unit in the
frame is selected,
is called systematic
random sampling.

Probability Sampling Methods:
Systematic


Steps to achieve a systematic random sample:

1. Number the units in the population from 1 to N.
2. Decide on the n (sample size) that you want or need.


k = N/n = the interval size.

3. Randomly select an integer between 1 and k.
4. Then take every kth unit.
Example:
1. N=200
2. n=40, take N/n, 200/40=5 (interval size).
3. Randomly select a number between 1 and 5 (let’s pick 4).
4. Begin with 4, and take every 5th unit.

Probability Sampling Methods:
Cluster
 Cluster

sampling.
 Divide population into clusters and
randomly sample clusters.
 Measure all units within sampled clusters.
 Example: See blue areas on map.
Not just geographic areas,
could select hospitals,
schools etc.


Non-Probability Sampling
Methods


There are different types of non-probability
sampling methods as well:








Convenience (not representative of population).
Purposive (certain group in mind).
Expert sampling (seek out specific expertise).
Snowball sampling (ask people to participate, they
ask more people).

If you select non-probability sampling methods, the
conclusions drawn from the study results apply only
to that specific population.

Measurement: Levels of Data



The level of data will dictate which statistical test you
should use.
Categorical = Data that is classified into categories and
cannot be arranged in any particular order (e.g. Apples
and pears, gender, eye colour, ethnicity).



Ordinal = Data ordered, but distance between intervals
not always equal. (e.g. Low, middle and high income).



Continuous = equal distance between each interval
(e.g. 1,2,3., age).

Statistics and Statistical Test
Selection

Types of Statistics
 Descriptive

Statistics: Describes
 research findings


E.g. Frequencies, averages.

 Inferential

Statistics: Makes inferences
about the population, based on a random
sample.




In a random sample, each person/unit has an
equal chance of being selected
Allows generalizability to population.

Types of Variables


Variables can be classified as independent or
dependent.



An independent variable is the variable you
believe will influence your outcome measure.



A dependent variable is the variable that is
dependant on or influenced by independent
variable(s). The dependent variable can also
be the variable you are trying to predict.

Statistical Test Selection
Selecting the appropriate Statistical test requires
several steps:
Test selection should be based on:
1) What is your goal? Description? Comparison? Prediction? Quantify
association? Prove effectiveness? Prove causality?
2) What kind of data have you collected? What are the levels of data
(Nominal, ordinal, continuous)? Was your sample randomly selected?
3) Is your data normally distributed? Should you use a parametric or nonparametric test?
4) What are the assumptions of the statistical test you would like to
use? Does the data meet these assumptions?

Parametric Tests


Parametric tests assume that the variable in question is
from a normal distribution.



Non-parametric tests do not require the assumption of
normality.



Most non-parametric tests do not require an interval level
of measurement; can be used with nominal/ordinal level
data.

Assumptions







There are various assumptions for each test.
Before you select a test, be sure to check the assumptions of
each test.
You will need to contact a consultant, or review
statistical/research methods resources to find this information.
Some examples of common assumptions are:
 The dependent variable will need to be measured on a
certain level (i.e. Interval level).
 The independent variable(s) will need to be measured on a
certain level (i.e. Ordinal level).
 The population is normally distributed (not skewed).

If your data do not meet the assumptions for a specific test,
you may be able to use a non-parametric test instead.

Type of Data
Goal

Measurement
Normal
Population

Ordinal, or
NonNormal
Population
Median,
interquartil
e range
Wilcoxon test

Binomial
Survival Time
-Two
Possible
Outcomes
Proportion
Kaplan Meier
survival curve

Describe one
group

Mean, SD

Compare one
group to a
hypothetic
al value

One-sample t
test

Compare two
unpaired
groups

Unpaired t test

Mann-Whitney
test

Log-rank test or
Mantel-Haenszel*

Paired t test

Wilcoxon test

Fisher's test
(chisquare
for large
samples)
McNemar's
test

Compare two
paired
groups

Chi-square
test

Cox proportional
hazard
regression**

One-way
Compare
ANOVA
three or
more
unmatched
groups

Kruskal-Wallis
test

Chi-square
or
Binomial
test **

Conditional
proportional
hazards
regression*

Statistical Test Selection Group
Exercise

 Using

your tables, select the
appropriate statistical tests for 10
research scenarios.

During the group exercise…


Steps to choose the appropriate statistical method
for the data analysis:

1. Identify whether the research problem raises the
question of describe, relate (association), or compare
(difference).
2. Identify the levels of measurement in the research
question (Nominal/Categorical, Ordinal/Rank,
Continuous/Evenly spaced).
3. Identify the number of variables, or samples being
described, related, or compared.
4. Identify whether comparison samples are related
(analyze same group before and after) or independent
(not at all related, looking at different groups).
5. Choose the appropriate statistical tool for the data and
situation using the decision tree in the handout.



1. A pilot experiment designed to test the effectiveness of a new approach to electrode placement for electroconvulsive therapy (ECT) has been conducted over a one-year period in the Fraser Health Authority.
Patients from two different mood disorder clinics participated in this study. Patients from Clinic X received ECT according to current practice guidelines. Patients from Clinic Y received a new exploratory ECT treatment. Patients in each clinic were matched for age, gender, and type of disorder. A random sample of 30 matched pairs of patients was selected for inclusion in the study. At the end of one year, patients were administered a memory test yielding a total score out of 100.
Dr. Vasdil would like to know which statistical procedure should be used to test for differences between the two groups of patients on the memory test.
What is the question: Compare
How many samples: 2
Related or independent: Related (patients were matched in pairs)
What is the level of measurement: Continuous
How many dependent variables: 1
Test: Paired t test
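For matched-pair data such as the 30 pairs in this scenario, the t test can be computed on the within-pair differences (a paired t test). Below is a minimal sketch using only the Python standard library; the helper name `paired_t_statistic` and the five pairs of scores are hypothetical, and the sketch stops at the t statistic and its degrees of freedom. The p-value comes from the t distribution with n − 1 degrees of freedom (a t table or a statistics package such as SPSS).

```python
import math
import statistics


def paired_t_statistic(group_a: list[float], group_b: list[float]) -> tuple[float, int]:
    """Return (t, degrees of freedom) for a paired t test.

    group_a[i] and group_b[i] must be the two members of the same matched pair.
    """
    if len(group_a) != len(group_b):
        raise ValueError("every patient needs exactly one matched partner")
    diffs = [a - b for a, b in zip(group_a, group_b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two pairs")
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD of the differences (n - 1 denominator)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical memory scores (out of 100) for five matched pairs;
# position i in each list is the same pair.
clinic_x = [70, 65, 80, 75, 60]
clinic_y = [72, 70, 81, 79, 66]

if __name__ == "__main__":
    t, df = paired_t_statistic(clinic_x, clinic_y)
    print(f"t({df}) = {t:.2f}")  # t(4) = -3.88
```

With all 30 pairs from the study, the same calculation would give df = 29.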

Sample Size

There are several rules of thumb for determining sample size.

1) It’s a good idea to have a minimum of 30 cases (as a total group or, if comparing groups, 30 for each group). If you have fewer, you can use a non-parametric test, but it is still better to have close to 30 cases.
2) If using regression, it is best to have between 10 and 50 cases per independent variable.
3) If you are validating a survey, you should never have more questions than cases.
4) If the total population that you are examining is smaller than 30, use all of them.
5) For pilot studies, the recommendation is a sample size of 12 per group.
6) For surveys, a sample size of about 400 per group is enough for most purposes (for a proportion, it gives a 95% margin of error of roughly ±5 percentage points).
7) For surveys, a 30% response rate is the bare minimum.

Note: For a precise sample size estimate, you will need to conduct a power analysis.
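Two of these rules of thumb reduce to quick arithmetic. The sketch below (Python standard library only) computes the case range implied by rule 2 and, as one way to read rule 6, the margin of error for a survey proportion at a given sample size. The margin-of-error reading, the normal approximation, and the function names are assumptions made for illustration; they are not part of the rules themselves and do not replace a power analysis.

```python
import math
from statistics import NormalDist


def regression_case_range(n_predictors: int) -> tuple[int, int]:
    """Rule 2: between 10 and 50 cases per independent variable."""
    return 10 * n_predictors, 50 * n_predictors


def survey_margin_of_error(n: int, p: float = 0.5, confidence: float = 0.95) -> float:
    """Half-width of a normal-approximation confidence interval for a proportion.

    p = 0.5 is the most conservative choice (it gives the widest interval).
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * math.sqrt(p * (1 - p) / n)


if __name__ == "__main__":
    print(regression_case_range(4))                # (40, 200)
    print(round(survey_margin_of_error(400), 3))  # 0.049
```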






Statistical Power

Power is the probability that a statistical test will correctly detect an effect if one truly exists.
• It takes a value between 0 and 1 (often expressed as a percentage).
• Power = 1 − β, where β is the probability of a Type II error.
• Type II error: failing to reject a null hypothesis that is false (missing a real effect).
• Type I error: rejecting a null hypothesis that is true (reporting an effect that is not there).

Types of Power
• A priori: conducted before the study commences (at the proposal stage).
• Post hoc: conducted after the study has been completed.
• Easy ways to increase power?
Increase the sample size.
Increase the effect size.
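Both levers show up directly in a power formula. The sketch below is a minimal illustration, assuming a two-group comparison of means, a two-sided test, and the normal approximation (a t-based calculation gives slightly lower power for small samples); effect size is expressed as Cohen's d (difference in means divided by the common SD), and `two_group_power` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def two_group_power(n_per_group: int, effect_size: float, alpha: float = 0.05) -> float:
    """Approximate power (1 - beta) of a two-sided, two-group comparison of means.

    Normal approximation; effect_size is Cohen's d.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = abs(effect_size) * math.sqrt(n_per_group / 2)
    # Probability of landing beyond the critical value on the side of the
    # true effect; the opposite tail is negligible and ignored here.
    return z.cdf(shift - z_crit)


if __name__ == "__main__":
    for n in (20, 64, 100):
        for d in (0.2, 0.5, 0.8):
            print(f"n = {n:>3} per group, d = {d}: power = {two_group_power(n, d):.2f}")
```

Reading down the output, power rises with the sample size at a fixed effect size, and with the effect size at a fixed sample size.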

Components Involved in Power Calculation
• Sample size: the number of cases.
• Effect size: the magnitude of the trend relative to its variation.
• Alpha level: the probability of concluding that an effect is present when the observed result is due to chance alone (commonly .05 or .01). Alpha is also known as the Type I error rate: the probability of rejecting a null hypothesis that is true.
• Power level: 80-90% is common.
• One- or two-tailed test: two-tailed is common.

Components Involved in Power Calculation
• Sample size: what we want to find out.
• Effect size: the magnitude of the trend… but what if you don’t know it?
Look to pilot data or the literature.
Keep in mind: the smaller the effect size, the larger the sample size required.
• Alpha level: .05
• Power level: 80-90%
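With alpha fixed at .05 (two-tailed) and power at 80-90%, the effect size and the design determine the sample size. A minimal sketch for one common case, assuming two independent groups compared on a continuous outcome, Cohen's d as the effect size, and the normal approximation n = 2(z₁₋α/₂ + z_power)² / d² per group; t-based calculators typically give one or two more cases per group. `n_per_group` is a hypothetical name, and this does not replace a power analysis matched to your actual design.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate cases per group for a two-sided, two-group comparison of means.

    n = 2 * (z_{1 - alpha/2} + z_{power})**2 / d**2, rounded up.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)


if __name__ == "__main__":
    for d in (0.2, 0.5, 0.8):
        print(f"d = {d}: {n_per_group(d)} per group at 80% power")
    print(f"d = 0.5: {n_per_group(0.5, power=0.90)} per group at 90% power")
```

The output illustrates the point above: halving the effect size roughly quadruples the required sample.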

Important Consultation Information
• What is your research question?
• Components of the power calculation
• Levels of data (nominal, ordinal, continuous)
• Sampling plan

Data Organization: Codebook
• What is a codebook?
• A codebook is a log of your variables (and their levels of data) and how you will code them.
• A codebook helps everyone understand the coding scheme, so that the whole team is on the same page!

Data Processing and Analyses: Codebook Example

Variable Name | Variable Label | Values | Coding | Missing | Variable Type
age | age | 1,2,3,4,5 | 1=10-20 years, 2=21-30 years, 3=31-40 years, 4=41-50 years, 5=51+ years | 97=Incorrect response, 98=No response, 99=Not applicable | Ordinal
sex | sex | 1,2 | 1=male, 2=female | 97=Incorrect response, 98=No response, 99=Not applicable | Nominal
happiness | happiness at work | 1,2,3 | 1=not happy, 2=somewhat happy, 3=very happy | 97=Incorrect response, 98=No response, 99=Not applicable | Ordinal
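Keeping the codebook in a machine-readable form lets everyone decode values the same way and keeps the missing-data codes (97, 98, 99) out of calculations. A minimal Python sketch, assuming the three variables in the codebook example; the dictionary layout and the `decode` function are hypothetical, not a standard file format or an SPSS feature.

```python
from typing import Optional

# Missing-data codes shared by every variable in the codebook.
MISSING_CODES = {97: "Incorrect response", 98: "No response", 99: "Not applicable"}

CODEBOOK = {
    "age": {
        "label": "age",
        "type": "ordinal",
        "values": {1: "10-20 years", 2: "21-30 years", 3: "31-40 years",
                   4: "41-50 years", 5: "51+ years"},
    },
    "sex": {"label": "sex", "type": "nominal", "values": {1: "male", 2: "female"}},
    "happiness": {
        "label": "happiness at work",
        "type": "ordinal",
        "values": {1: "not happy", 2: "somewhat happy", 3: "very happy"},
    },
}


def decode(variable: str, code: int) -> Optional[str]:
    """Translate a stored code into its label; missing-data codes become None."""
    if code in MISSING_CODES:
        return None
    values = CODEBOOK[variable]["values"]
    if code not in values:
        raise ValueError(f"{code} is not a valid code for {variable!r}")
    return values[code]


if __name__ == "__main__":
    print(decode("happiness", 3))  # very happy
    print(decode("sex", 98))       # None (no response)
```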

Spreadsheet Example

ID# | Age | Sex | Happiness
1 | 1 | 1 | 2
2 | 2 | 2 | 2
3 | 3 | 1 | 2
4 | 57 | 2 | 2
5 | 45 | 2 | 3
6 | 66 | 2 | 3
7 | 2 | 2 | 3
8 | 88 | 2 | 3
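A codebook also makes data entry checkable. Run against the spreadsheet rows, a check like the one below flags the Age entries 57, 45, 66 and 88, which look like raw ages rather than the 1-5 codes the codebook defines; whether to recode them or revise the codebook is a decision for the study team. This is a minimal Python sketch with hypothetical names (`ALLOWED`, `find_invalid`), not a feature of Excel or SPSS.

```python
# Rows from the spreadsheet example: (ID, age, sex, happiness).
ROWS = [
    (1, 1, 1, 2), (2, 2, 2, 2), (3, 3, 1, 2), (4, 57, 2, 2),
    (5, 45, 2, 3), (6, 66, 2, 3), (7, 2, 2, 3), (8, 88, 2, 3),
]

# Codes the codebook allows, including the shared missing-data codes.
MISSING = {97, 98, 99}
ALLOWED = {
    "age": {1, 2, 3, 4, 5} | MISSING,
    "sex": {1, 2} | MISSING,
    "happiness": {1, 2, 3} | MISSING,
}
COLUMNS = ("age", "sex", "happiness")


def find_invalid(rows):
    """Return (ID, variable, value) for every entry the codebook does not allow."""
    problems = []
    for record_id, *values in rows:
        for column, value in zip(COLUMNS, values):
            if value not in ALLOWED[column]:
                problems.append((record_id, column, value))
    return problems


if __name__ == "__main__":
    for problem in find_invalid(ROWS):
        print(problem)
```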

Data Analysis with Excel
• Most simple analyses can be done in Excel, including correlation, regression and even random number generation.
• Install the data analysis pack:
Go to Tools > Add-Ins and add the ‘Analysis ToolPak’.
• Create the worksheet and codebook.
• Choose the statistical test.
Follow the commands in the Help menu.

Data Analysis with Excel


http://home.ubalt.edu/ntsbarsh/excel/excel.htm

Data Reporting and Presentation of Data
• Graphical summaries are a great way to present your data.
• Excel is great for creating tables and graphs.
• The type of data you have will determine the type of graphical summary you should use.

Data Reporting and Presentation of Descriptive Data
Categorical data: frequency tables and bar charts.
• Example: Fruit

Fruit | Count | Percent | Valid Percent
Pineapples | 4 | 20% | 21%
Apples | 5 | 25% | 26%
Oranges | 10 | 50% | 53%
Unknown | 1 | 5% |
Total | 20 | 100% | 100%
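The Valid Percent column divides by the 19 non-missing answers instead of all 20, which is why Oranges make up 50% of all responses but 53% of valid responses. A minimal Python sketch that reproduces both columns from the counts in the fruit example; treating "Unknown" as missing is the assumption that drives the difference, and `frequency_table` is a hypothetical helper.

```python
from collections import Counter


def frequency_table(values, missing=("Unknown",)):
    """Return {category: (count, percent, valid_percent)}.

    Percent uses all responses; valid percent leaves out the missing
    categories, so it is None for them.
    """
    counts = Counter(values)
    total = sum(counts.values())
    valid_total = sum(c for v, c in counts.items() if v not in missing)
    table = {}
    for category, count in counts.items():
        percent = 100 * count / total
        valid_percent = None if category in missing else 100 * count / valid_total
        table[category] = (count, percent, valid_percent)
    return table


# Responses matching the counts in the fruit example.
responses = ["Pineapples"] * 4 + ["Apples"] * 5 + ["Oranges"] * 10 + ["Unknown"]

if __name__ == "__main__":
    for category, (count, pct, valid) in frequency_table(responses).items():
        valid_text = "" if valid is None else f"{valid:.0f}%"
        print(f"{category:<11}{count:>3}{pct:>6.0f}%{valid_text:>7}")
```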

Data Reporting and Presentation of Descriptive Data
[Figure: Fruit Study, bar chart of counts (axis 0 to 10) for Pineapples, Apples, Oranges and Unknown]

Data Reporting and Presentation of Descriptive Data
Continuous data: tables and histograms.

Age | Count | Percent
20-30 | 4 | 20%
31-40 | 5 | 25%
41-50 | 10 | 50%
51-60 | 1 | 5%
Total | 20 | 100%
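Building a table like this means choosing class intervals and counting the values that fall in each. A minimal Python sketch, assuming ages recorded in whole years (so 30 and 31 land in different intervals) and using made-up ages for ten respondents, not the data behind the table; `age_table` is a hypothetical helper that also refuses to run if any age falls outside the intervals.

```python
def age_table(ages, bins):
    """Count whole-year ages into closed intervals (low, high), with percents."""
    n = len(ages)
    rows = []
    binned = 0
    for low, high in bins:
        count = sum(1 for age in ages if low <= age <= high)
        binned += count
        rows.append((f"{low}-{high}", count, 100 * count / n))
    if binned != n:
        raise ValueError("the intervals do not cover every age exactly once")
    return rows


BINS = [(20, 30), (31, 40), (41, 50), (51, 60)]
# Hypothetical ages for ten respondents.
ages = [23, 27, 35, 38, 41, 44, 45, 47, 52, 58]

if __name__ == "__main__":
    for label, count, pct in age_table(ages, BINS):
        print(f"{label:<6}{count:>3}{pct:>6.0f}%")
```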

Data Reporting and Presentation of Descriptive Data
[Figure: histogram of age (20 to 60) showing the count in each age interval: 20-30, 31-40, 41-50, 51-60]

What is the difference between a Histogram and a Bar Chart?
• Histogram: for continuous data, where the data are divided into contiguous class intervals (in other words, intervals connected in an unbroken sequence).
• Bar Chart: for categorical data, where the categories are not contiguous.

Measures of Central Tendency
• Reporting averages:
Categorical data = Mode
Ordinal data = Median
Continuous data = Mean
• If there are outliers (or extreme values), report the median instead of the mean.
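A short illustration of why the outlier rule matters, using Python's standard `statistics` module and hypothetical values: one extreme value pulls the mean well above most of the data, while the median stays near the typical value. The mode is shown on categorical data, where it is the only meaningful average.

```python
import statistics

# Hypothetical lengths of stay in days; 40 is an extreme value.
stay = [2, 3, 3, 4, 5, 40]
# Hypothetical categorical responses.
sex = ["female", "male", "female", "female"]

if __name__ == "__main__":
    print(statistics.mean(stay))    # 9.5: pulled up by the single 40-day stay
    print(statistics.median(stay))  # 3.5: stays near the typical value
    print(statistics.mode(sex))     # female: the most frequent category
```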

Reporting Inferential Stats
• Handout
Resource: APA Guidelines
http://www.ilstu.edu/~jhkahn/apastats.html

Reporting Inferential Stats
• It’s important to include means, standard deviations and sample sizes in your results section.
• Example: Correlation
• Variable X was strongly correlated with Variable Y, r = .59, p < .01.

Important to Keep Your Audience in Mind
Residency Project
Publication
Departmental Report

Aaron: TCPS certification for residents reminder…

Questions?


Slide 16

Surviving Pharmacy
Residency Research: Tips
and Tricks for Statistical
Planning

Surviving Pharmacy Residency
Research: Tips and Tricks for
Statistical Planning
© Fraser Health Authority, 2007
The Fraser Health Authority (“FH”) authorizes the use, reproduction and/or
modification of this publication for purposes other than commercial redistribution. In
consideration for this authorization, the user agrees that any unmodified
reproduction of this publication shall retain all copyright and proprietary notices. If
the user modifies the content of this publication, all FH copyright notices shall be
removed, however FH shall be acknowledged as the author of the source
publication.
Reproduction or storage of this publication in any form by any means for the purpose
of commercial redistribution is strictly prohibited.
This publication is intended to provide general information only, and should not be
relied on as providing specific healthcare, legal or other professional advice. The
Fraser Health Authority, and every person involved in the creation of this publication,
disclaims any warranty, express or implied, as to its accuracy, completeness or
currency, and disclaims all liability in respect of any actions, including the results of
any actions, taken or not taken in reliance on the information contained herein.

FH Health Research Intelligence
Unit
How can we help?


Grant Facilitator-Writer
 Conducting a search for funding opportunities.
 Automatic notification of new funding sources and
deadlines.
 Identifying a research team.
 Preparing letters of intent.
 Identifying resources required for conducting research.
 Formulating the research budget.
 Writing the grant application in collaboration with
researchers.
 Understanding FH and funding agency requirements
regarding preparation of specific documents.

FH Health Research Intelligence
Unit
How can we help?











Epidemiologist
Specifying the research goal,
objectives and hypothesis.
Identifying measurable
outcomes.
Specifying the variables for
analysis.
Identifying sources of data.
Developing data collection
tools for quantitative or
qualitative studies.
Developing the statistical
analysis plan.
Understanding how to use
statistical software, such as
SPSS.

Workshop Outline










Research 101- Basic Research Steps
Research Question Refinement
Common Study Designs- Resource
Levels of Data
Power and Sample Size
Statistical Test Selection- Exercise
Data Reporting- Resource
Simple Stats with Excel- Resource

Pharmacy Residency Project










1) Develop a research question
2) Conduct thorough literature review
3) Re-define research question or hypothesis
4) Design research methodology/study
5) Create research proposal
6) Apply for funding
7) Apply for ethics approval
8) Collect and analyze data
9) Draw conclusions and relate findings

Research Question Refinement


Research question will describe in operational
terms, what you think will happen in the study.

Good Versus Bad Research
Question


Are patients who take
drug X more likely to
experience episodes
of delirium?



Do patients who
receive medication X
between September
2008 and November
2008 experience
more episodes of
delirium as compared
to patients who
received drug Y
during the same time
period?

Classification of Research
Studies
Observational Studies:
Descriptive Studies:
Focus on describing
populations and describing the
relationship between variables

Research Studies

Analytic Studies:
Observational
Descriptive

Experimental

Analytic

Make inferences about the
population based on a random
sample.
Experimental Studies:
Test relationships between
exposures and outcomes.
Investigator has direct control
over study condition and
exposure status.

Hierarchy of Studies
Type of study is selected
according to the purpose of
research.
Experimental Studies

Analytic Studies

Descriptive Studies

Levels of Evidence


HandoutResearch Design
Hierarchy

Probability Sampling Methods:
Random
 There

are several methods to choose

from:
 Simple random
sampling.

Probability Sampling Methods:
Stratified
 Stratified

sampling

(divide the population into
non-overlapping strata and
sample from within each
stratum independently).


Guarantees representation
of all important groups.

Probability Sampling Methods:
Systematic


Selection of the
sample using an
interval “k” so that
every “k” unit in the
frame is selected,
is called systematic
random sampling.

Probability Sampling Methods:
Systematic


Steps to achieve a systematic random sample:

1. Number the units in the population from 1 to N.
2. Decide on the n (sample size) that you want or need.


k = N/n = the interval size.

3. Randomly select an integer between 1 and k.
4. Then take every kth unit.
Example:
1. N=200
2. n=40, take N/n, 200/40=5 (interval size).
3. Randomly select a number between 1 and 5 (let’s pick 4).
4. Begin with 4, and take every 5th unit.

Probability Sampling Methods:
Cluster
 Cluster

sampling.
 Divide population into clusters and
randomly sample clusters.
 Measure all units within sampled clusters.
 Example: See blue areas on map.
Not just geographic areas,
could select hospitals,
schools etc.


Non-Probability Sampling
Methods


There are different types of non-probability
sampling methods as well:








Convenience (not representative of population).
Purposive (certain group in mind).
Expert sampling (seek out specific expertise).
Snowball sampling (ask people to participate, they
ask more people).

If you select non-probability sampling methods, the
conclusions drawn from the study results apply only
to that specific population.

Measurement: Levels of Data



The level of data will dictate which statistical test you
should use.
Categorical = Data that is classified into categories and
cannot be arranged in any particular order (e.g. Apples
and pears, gender, eye colour, ethnicity).



Ordinal = Data ordered, but distance between intervals
not always equal. (e.g. Low, middle and high income).



Continuous = equal distance between each interval
(e.g. 1,2,3., age).

Statistics and Statistical Test
Selection

Types of Statistics
 Descriptive

Statistics: Describes
 research findings


E.g. Frequencies, averages.

 Inferential

Statistics: Makes inferences
about the population, based on a random
sample.




In a random sample, each person/unit has an
equal chance of being selected
Allows generalizability to population.

Types of Variables


Variables can be classified as independent or
dependent.



An independent variable is the variable you
believe will influence your outcome measure.



A dependent variable is the variable that is
dependant on or influenced by independent
variable(s). The dependent variable can also
be the variable you are trying to predict.

Statistical Test Selection
Selecting the appropriate Statistical test requires
several steps:
Test selection should be based on:
1) What is your goal? Description? Comparison? Prediction? Quantify
association? Prove effectiveness? Prove causality?
2) What kind of data have you collected? What are the levels of data
(Nominal, ordinal, continuous)? Was your sample randomly selected?
3) Is your data normally distributed? Should you use a parametric or nonparametric test?
4) What are the assumptions of the statistical test you would like to
use? Does the data meet these assumptions?

Parametric Tests


Parametric tests assume that the variable in question is
from a normal distribution.



Non-parametric tests do not require the assumption of
normality.



Most non-parametric tests do not require an interval level
of measurement; can be used with nominal/ordinal level
data.

Assumptions







There are various assumptions for each test.
Before you select a test, be sure to check the assumptions of
each test.
You will need to contact a consultant, or review
statistical/research methods resources to find this information.
Some examples of common assumptions are:
 The dependent variable will need to be measured on a
certain level (i.e. Interval level).
 The independent variable(s) will need to be measured on a
certain level (i.e. Ordinal level).
 The population is normally distributed (not skewed).

If your data do not meet the assumptions for a specific test,
you may be able to use a non-parametric test instead.

Type of Data
Goal

Measurement
Normal
Population

Ordinal, or
NonNormal
Population
Median,
interquartil
e range
Wilcoxon test

Binomial
Survival Time
-Two
Possible
Outcomes
Proportion
Kaplan Meier
survival curve

Describe one
group

Mean, SD

Compare one
group to a
hypothetic
al value

One-sample t
test

Compare two
unpaired
groups

Unpaired t test

Mann-Whitney
test

Log-rank test or
Mantel-Haenszel*

Paired t test

Wilcoxon test

Fisher's test
(chisquare
for large
samples)
McNemar's
test

Compare two
paired
groups

Chi-square
test

Cox proportional
hazard
regression**

One-way
Compare
ANOVA
three or
more
unmatched
groups

Kruskal-Wallis
test

Chi-square
or
Binomial
test **

Conditional
proportional
hazards
regression*

Statistical Test Selection Group
Exercise

 Using

your tables, select the
appropriate statistical tests for 10
research scenarios.

During the group exercise…


Steps to choose the appropriate statistical method
for the data analysis:

1. Identify whether the research problem raises the
question of describe, relate (association), or compare
(difference).
2. Identify the levels of measurement in the research
question (Nominal/Categorical, Ordinal/Rank,
Continuous/Evenly spaced).
3. Identify the number of variables, or samples being
described, related, or compared.
4. Identify whether comparison samples are related
(analyze same group before and after) or independent
(not at all related, looking at different groups).
5. Choose the appropriate statistical tool for the data and
situation using the decision tree in the handout.



1. A pilot experiment designed to test the effectiveness of a new approach to electrode placement for electroconvulsive therapy (ECT) has been conducted over a one-year period in the Fraser Health Authority.
Patients from two different mood disorder clinics participated in this study. Patients from Clinic X received ECT according to current practice guidelines. Patients from Clinic Y received a new exploratory ECT treatment. Patients in each clinic were matched for age, gender, and type of disorder. A random sample of 30 matched pairs of patients was selected for inclusion in the study. At the end of one year, patients were administered a memory test yielding a total score out of 100.
Dr. Vasdil would like to know which statistical procedure should be used to test for differences between the two groups of patients on the memory test.
What is the question: Compare
How many samples: 2
Related or independent: Related (matched pairs)
What is the level of measurement: Continuous
How many dependent variables: 1
Test: Paired t test (Wilcoxon test if the paired differences are not normally distributed)
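Because the patients were matched into pairs, the analysis works on the within-pair differences. A minimal Python sketch of the paired t statistic follows. The five pairs of memory scores are invented for illustration; the study itself would have 30 pairs. The p-value would still come from a t table, or from Excel/SPSS, with n - 1 degrees of freedom.

```python
import math
import statistics


def paired_t(first, second):
    """Paired t statistic and degrees of freedom for matched pairs.

    first[i] and second[i] must be the two members of the same matched pair.
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need at least two complete pairs")
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = statistics.fmean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    return mean_diff / se, n - 1


# Hypothetical memory scores (out of 100); position i is matched pair i.
clinic_x = [70, 65, 80, 75, 60]  # current practice
clinic_y = [72, 70, 79, 80, 64]  # new electrode placement

t, df = paired_t(clinic_x, clinic_y)
print(f"t({df}) = {t:.2f}")
```

An unpaired t test on the same numbers would ignore the matching. It would usually lose power, because the between-patient variation that matching removes would be counted as noise.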

Sample Size

There are several rules of thumb for determining sample size:

1) It's a good idea to have a minimum of 30 cases (as a total group or, if comparing groups, 30 for each group). If you have fewer, you can use a non-parametric test, but it is still better to have close to 30 cases.
2) If using regression, it is best to have between 10 and 50 cases per independent variable.
3) If you are validating a survey, avoid having more questions than cases.
4) If the total population you are examining is smaller than 30, use all of them.
5) For pilot studies, the recommendation is a sample size of 12 per group.
6) For surveys, a sample size of about 400 per group is enough for most purposes. It gives a margin of error of roughly ±5 percentage points for a proportion, at 95% confidence.
7) For surveys, a 30% response rate is the bare minimum.

Note: For a precise sample size estimate, you will need to conduct a power analysis.
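Rules 6 and 7 can be checked with a little arithmetic. The sketch below uses Python's standard library. It computes the normal-approximation margin of error for a proportion, using the worst case p = 0.5. It also computes the number of surveys to send for an assumed response rate. Both helper names are hypothetical.

```python
import math
from statistics import NormalDist


def margin_of_error(n, p=0.5, confidence=0.95):
    """Half-width of a normal-approximation confidence interval for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    return z * math.sqrt(p * (1 - p) / n)


def surveys_to_send(target_responses, response_rate):
    """Surveys to distribute so that target_responses are expected back."""
    return math.ceil(target_responses / response_rate)


print(round(margin_of_error(400), 3))
print(surveys_to_send(400, 0.30))
```

With n = 400 the margin is about 0.049, or ±4.9 percentage points. At a 30% response rate, about 1,334 surveys would need to go out to expect 400 responses.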






Statistical Power

- Power is the ability of a statistical test to correctly detect an effect if it exists.
- Power takes a value between 0 and 1 (often expressed as a percentage).
- Power = 1 - β (β = probability of a Type II error).
- Type II error: failing to reject a null hypothesis that is false (missing a real effect).
- Type I error: rejecting a null hypothesis that is true (a false positive).

Types of Power Analysis
- A priori: conducted before the study commences (at the proposal stage).
- Post hoc: conducted after the study has been completed.
- Easy ways to increase power?
  - Increase the sample size.
  - Increase the effect size.
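The link between power, sample size and effect size can be illustrated with the normal approximation for a two-sided comparison of two group means. This Python sketch makes two simplifying assumptions:
- The effect size d is a standardized mean difference (Cohen's d).
- The tiny chance of rejecting in the wrong direction is ignored.

Dedicated software (e.g. G*Power) uses the exact t distribution instead.

```python
import math
from statistics import NormalDist


def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample test of means.

    Normal approximation: power = Phi(d * sqrt(n / 2) - z_(1 - alpha/2)),
    where d is the standardized effect size and n is the size of each group.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    return std_normal.cdf(d * math.sqrt(n_per_group / 2) - z_alpha)


if __name__ == "__main__":
    for n in (20, 64, 100):
        print(n, round(approx_power(0.5, n), 2))
```

For d = 0.5, the approximate power rises from about 0.35 with 20 per group to about 0.81 with 64 per group. A larger d raises power at the same n.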

Components Involved in Power Calculation

- Sample size: the number of cases.
- Effect size: the magnitude of the effect (e.g. a difference in means) relative to the variation in the data.
- Alpha level: the probability of concluding that an effect is present when the result is actually due to chance alone (usually .05 or .01). Also known as the Type I error rate, i.e. the probability of rejecting a null hypothesis that is true.
- Power level: 80-90% is common.
- One- or two-tailed test: two-tailed is common.

Components Involved in Power Calculation
- Sample size: what we want to find out.
- Effect size: the magnitude of the effect… but what if you don't know it?
  - Look to pilot data or the literature.
  - Keep in mind: the smaller the effect size, the larger the sample size required.
- Alpha level: .05
- Power level: 80-90%
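Once you have fixed alpha, power and an effect size from pilot data or the literature, the sample size follows. This Python sketch uses the common normal-approximation formula for two equal groups compared with a two-sided test. Again, d is assumed to be Cohen's d, and the function name is hypothetical. Exact t-based calculations give slightly larger answers (about 64 per group for d = 0.5).

```python
import math
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-sample comparison of means.

    Normal-approximation formula: n = 2 * (z_(1 - alpha/2) + z_power)^2 / d^2,
    rounded up to the next whole case.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_beta = std_normal.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)


print(n_per_group(0.5), n_per_group(0.2))
```

n_per_group(0.5) returns 63 and n_per_group(0.2) returns 393. This illustrates the point above: the smaller the effect size, the larger the sample.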

Important Consultation Information
- What is your research question?
- Components of power calculation
- Levels of data (nominal, ordinal, continuous)
- Sampling plan

Data Organization: Codebook
- What is a codebook?
- A codebook is a log of your variables (and levels of data) and how you will code them.
- A codebook will help everyone understand the coding schemes and ensure that they are on the same page!

Data Processing and Analyses: Codebook Example

Variable Name | Variable Label | Values | Coding | Missing | Variable Type
age | age | 1,2,3,4,5 | 1=10-20 years, 2=21-30 years, 3=31-40 years, 4=41-50 years, 5=51+ years | 97=Incorrect response, 98=No response, 99=Not applicable | Ordinal
sex | sex | 1,2 | 1=male, 2=female | 97=Incorrect response, 98=No response, 99=Not applicable | Nominal
happiness | happiness at work | 1,2,3 | 1=not happy, 2=somewhat happy, 3=very happy | 97=Incorrect response, 98=No response, 99=Not applicable | Ordinal

Spreadsheet Example

ID# | Age | Sex | Happiness
1 | 1 | 1 | 2
2 | 2 | 2 | 2
3 | 3 | 1 | 2
4 | 57 | 2 | 2
5 | 45 | 2 | 3
6 | 66 | 2 | 3
7 | 2 | 2 | 3
8 | 88 | 2 | 3
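A codebook also makes data checking easy to automate. The Python sketch below encodes the codebook example and the spreadsheet rows. It then lists every value that is neither a defined code nor a missing-value code. With the data as entered, it flags IDs 4, 5, 6 and 8, whose ages (57, 45, 66, 88) look like raw ages rather than the 1-5 age codes. The variable and function names are illustrative.

```python
# Codes taken from the codebook example (variable order: age, sex, happiness).
MISSING = {97: "Incorrect response", 98: "No response", 99: "Not applicable"}
CODEBOOK = {
    "age": {1: "10-20 years", 2: "21-30 years", 3: "31-40 years",
            4: "41-50 years", 5: "51+ years"},
    "sex": {1: "male", 2: "female"},
    "happiness": {1: "not happy", 2: "somewhat happy", 3: "very happy"},
}

# Rows from the spreadsheet example: (ID, age, sex, happiness).
ROWS = [
    (1, 1, 1, 2), (2, 2, 2, 2), (3, 3, 1, 2), (4, 57, 2, 2),
    (5, 45, 2, 3), (6, 66, 2, 3), (7, 2, 2, 3), (8, 88, 2, 3),
]


def invalid_cells(rows, codebook=CODEBOOK, missing=MISSING):
    """Return (ID, variable, value) for every value the codebook does not define."""
    problems = []
    variables = list(codebook)  # dicts keep insertion order: age, sex, happiness
    for row_id, *values in rows:
        for name, value in zip(variables, values):
            if value not in codebook[name] and value not in missing:
                problems.append((row_id, name, value))
    return problems


for problem in invalid_cells(ROWS):
    print(problem)
```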

Data Analysis with Excel
- Most simple analyses can be done using Excel, including correlation, regression and even random number generation.
- Install the Analysis ToolPak:
  - Go to Tools > Add-Ins and add the Analysis ToolPak.
- Create a worksheet and codebook.
- Choose a statistical test.
  - Follow the commands in the Help menu.

Data Analysis with Excel


http://home.ubalt.edu/ntsbarsh/excel/excel.htm

Data Reporting and Presentation of Data
- Graphical summaries are a great way to present your data.
- Excel is great for creating tables and graphs.
- The type of data you have should determine the type of graphical summary you use.

Data Reporting and Presentation of Descriptive Data
- Categorical data: frequency tables and bar charts.
- Example: Fruit

Fruit | Count | Percent | Valid Percent
Pineapples | 4 | 20% | 21%
Apples | 5 | 25% | 26%
Oranges | 10 | 50% | 53%
Unknown | 1 | 5% | -
Total | 20 | 100% | 100%

(Valid percent excludes the 1 unknown response, so it is based on 19 cases.)
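The Percent and Valid Percent columns differ only in the denominator: all 20 cases versus the 19 non-missing cases. The Python sketch below rebuilds the table from raw responses. Two assumptions apply:
- The response list is reconstructed from the counts.
- "Unknown" is treated as missing, which matches the blank Valid Percent cell.

```python
from collections import Counter

# Hypothetical raw responses that reproduce the counts in the fruit example.
responses = ["Pineapples"] * 4 + ["Apples"] * 5 + ["Oranges"] * 10 + ["Unknown"]


def frequency_table(values, missing=("Unknown",)):
    """Rows of (category, count, percent, valid percent or None for missing)."""
    counts = Counter(values)  # keeps first-seen order of categories
    total = sum(counts.values())
    valid_total = total - sum(counts[m] for m in missing)
    table = []
    for category, count in counts.items():
        percent = round(100 * count / total)
        valid = None if category in missing else round(100 * count / valid_total)
        table.append((category, count, percent, valid))
    return table


for row in frequency_table(responses):
    print(row)
```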

Data Reporting and Presentation of Descriptive Data
(Figure: bar chart titled "Fruit Study" showing the count for each category: Pineapples, Apples, Oranges and Unknown.)

Data Reporting and Presentation of Descriptive Data
- Continuous data: tables and histograms.

Age | Count | Percent
20-30 | 4 | 20%
31-40 | 5 | 25%
41-50 | 10 | 50%
51-60 | 1 | 5%
Total | 20 | 100%
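For continuous data, values are grouped into class intervals before counting. This Python sketch uses invented ages that reproduce the counts above. It assumes whole-year ages, since a value such as 30.5 would fall between the inclusive intervals.

```python
# Hypothetical ages chosen to reproduce the counts in the age table.
ages = [22, 25, 28, 30, 31, 35, 36, 38, 40,
        41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 57]

BINS = [(20, 30), (31, 40), (41, 50), (51, 60)]  # inclusive class intervals


def age_table(values, bins=BINS):
    """Rows of (interval label, count, whole-number percent)."""
    total = len(values)
    rows = []
    for low, high in bins:
        count = sum(low <= v <= high for v in values)
        rows.append((f"{low}-{high}", count, round(100 * count / total)))
    return rows


for row in age_table(ages):
    print(row)
```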

Data Reporting and Presentation of Descriptive Data
(Figure: histogram of the age counts above, by age group 20-30, 31-40, 41-50 and 51-60.)

What is the difference between a Histogram and a Bar Chart?
- Histogram: for continuous data, where the data are divided into contiguous class intervals (in other words, intervals connected in an unbroken sequence).
- Bar chart: for categorical data, where the categories are not contiguous.

Measures of Central Tendency
- Reporting averages:
  - Categorical data = Mode
  - Ordinal data = Median
  - Continuous data = Mean
- If there are outliers (or extreme values), report the median instead of the mean.
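Python's standard `statistics` module covers all three averages. The sketch below picks the average by level of data and shows how a single extreme value pulls the mean away from the median. The helper name and the length-of-stay data are hypothetical. `median_low` is used for ordinal codes so that the result is always an actual category.

```python
import statistics


def average(values, level):
    """Return the average suggested for each level of data."""
    if level == "categorical":
        return statistics.mode(values)
    if level == "ordinal":
        return statistics.median_low(values)  # always returns an existing code
    if level == "continuous":
        return statistics.fmean(values)
    raise ValueError(f"unknown level of data: {level}")


# Hypothetical lengths of stay in days; 60 is an extreme value.
stay = [2, 3, 3, 4, 5, 6, 60]
print(round(average(stay, "continuous"), 1), statistics.median(stay))
```

For the stay data, the mean is about 11.9 days while the median is 4 days. That gap is why the median is preferred when outliers are present.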

Reporting Inferential Stats
- Handout
  - Resource: APA Guidelines
  - http://www.ilstu.edu/~jhkahn/apastats.html

Reporting Inferential Stats
- It's important to include means, standard deviations and sample sizes in your results section.
- Example: Correlation
  - Variable X was strongly correlated with Variable Y, r = .59, p < .01.
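The correlation in the example is Pearson's r. The Python sketch below computes r from two lists and formats it APA-style: two decimals and no leading zero, because r cannot exceed 1. The data and the helper names are illustrative. The p-value would still come from your statistics package.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length lists with at least 3 values")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def apa_r(r):
    """Format r APA-style: two decimals, leading zero dropped."""
    return f"{r:.2f}".replace("0.", ".", 1)


r = pearson_r([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])  # illustrative data
print(f"r = {apa_r(r)}")
```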

Important to Keep Your Audience in Mind
- Residency project
- Publication
- Departmental report

Aaron: TCPS certification for
residents reminder…

Questions?


Slide 18

Surviving Pharmacy
Residency Research: Tips
and Tricks for Statistical
Planning

Surviving Pharmacy Residency
Research: Tips and Tricks for
Statistical Planning
© Fraser Health Authority, 2007
The Fraser Health Authority (“FH”) authorizes the use, reproduction and/or
modification of this publication for purposes other than commercial redistribution. In
consideration for this authorization, the user agrees that any unmodified
reproduction of this publication shall retain all copyright and proprietary notices. If
the user modifies the content of this publication, all FH copyright notices shall be
removed, however FH shall be acknowledged as the author of the source
publication.
Reproduction or storage of this publication in any form by any means for the purpose
of commercial redistribution is strictly prohibited.
This publication is intended to provide general information only, and should not be
relied on as providing specific healthcare, legal or other professional advice. The
Fraser Health Authority, and every person involved in the creation of this publication,
disclaims any warranty, express or implied, as to its accuracy, completeness or
currency, and disclaims all liability in respect of any actions, including the results of
any actions, taken or not taken in reliance on the information contained herein.

FH Health Research Intelligence
Unit
How can we help?


Grant Facilitator-Writer
 Conducting a search for funding opportunities.
 Automatic notification of new funding sources and
deadlines.
 Identifying a research team.
 Preparing letters of intent.
 Identifying resources required for conducting research.
 Formulating the research budget.
 Writing the grant application in collaboration with
researchers.
 Understanding FH and funding agency requirements
regarding preparation of specific documents.

FH Health Research Intelligence
Unit
How can we help?











Epidemiologist
Specifying the research goal,
objectives and hypothesis.
Identifying measurable
outcomes.
Specifying the variables for
analysis.
Identifying sources of data.
Developing data collection
tools for quantitative or
qualitative studies.
Developing the statistical
analysis plan.
Understanding how to use
statistical software, such as
SPSS.

Workshop Outline










Research 101- Basic Research Steps
Research Question Refinement
Common Study Designs- Resource
Levels of Data
Power and Sample Size
Statistical Test Selection- Exercise
Data Reporting- Resource
Simple Stats with Excel- Resource

Pharmacy Residency Project










1) Develop a research question
2) Conduct thorough literature review
3) Re-define research question or hypothesis
4) Design research methodology/study
5) Create research proposal
6) Apply for funding
7) Apply for ethics approval
8) Collect and analyze data
9) Draw conclusions and relate findings

Research Question Refinement


Research question will describe in operational
terms, what you think will happen in the study.

Good Versus Bad Research
Question


Are patients who take
drug X more likely to
experience episodes
of delirium?



Do patients who
receive medication X
between September
2008 and November
2008 experience
more episodes of
delirium as compared
to patients who
received drug Y
during the same time
period?

Classification of Research
Studies
Observational Studies:
Descriptive Studies:
Focus on describing
populations and describing the
relationship between variables

Research Studies

Analytic Studies:
Observational
Descriptive

Experimental

Analytic

Make inferences about the
population based on a random
sample.
Experimental Studies:
Test relationships between
exposures and outcomes.
Investigator has direct control
over study condition and
exposure status.

Hierarchy of Studies
Type of study is selected
according to the purpose of
research.
Experimental Studies

Analytic Studies

Descriptive Studies

Levels of Evidence


HandoutResearch Design
Hierarchy

Probability Sampling Methods:
Random
 There

are several methods to choose

from:
 Simple random
sampling.

Probability Sampling Methods:
Stratified
 Stratified

sampling

(divide the population into
non-overlapping strata and
sample from within each
stratum independently).


Guarantees representation
of all important groups.

Probability Sampling Methods:
Systematic


Selection of the
sample using an
interval “k” so that
every “k” unit in the
frame is selected,
is called systematic
random sampling.

Probability Sampling Methods:
Systematic


Steps to achieve a systematic random sample:

1. Number the units in the population from 1 to N.
2. Decide on the n (sample size) that you want or need.


k = N/n = the interval size.

3. Randomly select an integer between 1 and k.
4. Then take every kth unit.
Example:
1. N=200
2. n=40, take N/n, 200/40=5 (interval size).
3. Randomly select a number between 1 and 5 (let’s pick 4).
4. Begin with 4, and take every 5th unit.

Probability Sampling Methods:
Cluster
 Cluster

sampling.
 Divide population into clusters and
randomly sample clusters.
 Measure all units within sampled clusters.
 Example: See blue areas on map.
Not just geographic areas,
could select hospitals,
schools etc.


Non-Probability Sampling
Methods


There are different types of non-probability
sampling methods as well:








Convenience (not representative of population).
Purposive (certain group in mind).
Expert sampling (seek out specific expertise).
Snowball sampling (ask people to participate, they
ask more people).

If you select non-probability sampling methods, the
conclusions drawn from the study results apply only
to that specific population.

Measurement: Levels of Data



The level of data will dictate which statistical test you
should use.
Categorical = Data that is classified into categories and
cannot be arranged in any particular order (e.g. Apples
and pears, gender, eye colour, ethnicity).



Ordinal = Data ordered, but distance between intervals
not always equal. (e.g. Low, middle and high income).



Continuous = equal distance between each interval
(e.g. 1,2,3., age).

Statistics and Statistical Test
Selection

Types of Statistics
 Descriptive

Statistics: Describes
 research findings


E.g. Frequencies, averages.

 Inferential

Statistics: Makes inferences
about the population, based on a random
sample.




In a random sample, each person/unit has an
equal chance of being selected
Allows generalizability to population.

Types of Variables


Variables can be classified as independent or
dependent.



An independent variable is the variable you
believe will influence your outcome measure.



A dependent variable is the variable that is
dependant on or influenced by independent
variable(s). The dependent variable can also
be the variable you are trying to predict.

Statistical Test Selection
Selecting the appropriate Statistical test requires
several steps:
Test selection should be based on:
1) What is your goal? Description? Comparison? Prediction? Quantify
association? Prove effectiveness? Prove causality?
2) What kind of data have you collected? What are the levels of data
(Nominal, ordinal, continuous)? Was your sample randomly selected?
3) Is your data normally distributed? Should you use a parametric or nonparametric test?
4) What are the assumptions of the statistical test you would like to
use? Does the data meet these assumptions?

Parametric Tests


Parametric tests assume that the variable in question is
from a normal distribution.



Non-parametric tests do not require the assumption of
normality.



Most non-parametric tests do not require an interval level
of measurement; can be used with nominal/ordinal level
data.

Assumptions







There are various assumptions for each test.
Before you select a test, be sure to check the assumptions of
each test.
You will need to contact a consultant, or review
statistical/research methods resources to find this information.
Some examples of common assumptions are:
 The dependent variable will need to be measured on a
certain level (i.e. Interval level).
 The independent variable(s) will need to be measured on a
certain level (i.e. Ordinal level).
 The population is normally distributed (not skewed).

If your data do not meet the assumptions for a specific test,
you may be able to use a non-parametric test instead.

Type of Data
Goal

Measurement
Normal
Population

Ordinal, or
NonNormal
Population
Median,
interquartil
e range
Wilcoxon test

Binomial
Survival Time
-Two
Possible
Outcomes
Proportion
Kaplan Meier
survival curve

Describe one
group

Mean, SD

Compare one
group to a
hypothetic
al value

One-sample t
test

Compare two
unpaired
groups

Unpaired t test

Mann-Whitney
test

Log-rank test or
Mantel-Haenszel*

Paired t test

Wilcoxon test

Fisher's test
(chisquare
for large
samples)
McNemar's
test

Compare two
paired
groups

Chi-square
test

Cox proportional
hazard
regression**

One-way
Compare
ANOVA
three or
more
unmatched
groups

Kruskal-Wallis
test

Chi-square
or
Binomial
test **

Conditional
proportional
hazards
regression*

Statistical Test Selection Group Exercise
- Using your tables, select the appropriate statistical tests for 10 research scenarios.

During the group exercise…

Steps to choose the appropriate statistical method for the data analysis:
1. Identify whether the research problem asks you to describe, relate (association), or compare (difference).
2. Identify the levels of measurement in the research question (nominal/categorical, ordinal/rank, continuous/evenly spaced).
3. Identify the number of variables or samples being described, related, or compared.
4. Identify whether the comparison samples are related (analyze the same group before and after) or independent (not at all related, looking at different groups).
5. Choose the appropriate statistical tool for the data and situation using the decision tree in the handout.



1. A pilot experiment designed to test the effectiveness of a new approach to electrode placement for electroconvulsive therapy (ECT) has been conducted over a one-year period in the Fraser Health Authority.
Patients from two different mood disorder clinics participated in this study. Patients from Clinic X received ECT according to current practice guidelines. Patients from Clinic Y received a new, exploratory ECT treatment. Patients in the two clinics were matched for age, gender, and type of disorder, and a random sample of 30 matched pairs of patients was selected for inclusion in the study. At the end of one year, patients were given a memory test yielding a total score out of 100.
Dr. Vasdil would like to know which statistical procedure to use to test for differences between the two groups of patients on the memory test.
What is the question: Compare
How many samples: 2
Related or independent: Related (the groups contain different patients, but each patient is matched to one in the other clinic)
What is the level of measurement: Continuous
How many dependent variables: 1
Test: Paired t test
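When matched pairs are analysed as related samples, the paired t statistic is computed from the within-pair differences. The following is a minimal sketch using only Python's standard library. The five pairs of memory scores are hypothetical and do not come from the study. The p-value step is omitted here, but a library such as SciPy (`scipy.stats.ttest_rel`) returns it directly.

```python
import math
import statistics


def paired_t(group_1, group_2):
    """Paired t statistic and degrees of freedom.

    Pair i is (group_1[i], group_2[i]); the test works on the differences.
    """
    if len(group_1) != len(group_2) or len(group_1) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    diffs = [b - a for a, b in zip(group_1, group_2)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical memory scores (out of 100) for 5 matched pairs.
clinic_x = [70, 65, 80, 75, 60]
clinic_y = [72, 70, 79, 80, 66]
t, df = paired_t(clinic_x, clinic_y)
print(f"t({df}) = {t:.2f}")
```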

Sample Size
There are several rules of thumb for determining sample size:
1) Aim for a minimum of 30 cases (as a total group or, if comparing groups, 30 per group). If you have fewer, you can use a non-parametric test, but it is still better to have close to 30 cases.
2) If using regression, it is best to have 10-50 cases per independent variable.
3) If you are validating a survey, never have more questions than cases.
4) If the total population you are examining has fewer than 30 members, use all of them.
5) For pilot studies, the recommended sample size is 12 per group.
6) For surveys, a sample size of 400 per group can support just about any analysis.
7) For surveys, a 30% response rate is the bare minimum.

Note: For a precise sample size estimate, you will need to conduct a power analysis.
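One way to see why about 400 respondents per group is usually enough for a survey is to compute the 95% margin of error for a proportion. The following sketch assumes simple random sampling and the normal approximation, and uses the worst case p = 0.5. The function name `margin_of_error` is hypothetical.

```python
import math
from statistics import NormalDist


def margin_of_error(n: int, p: float = 0.5, confidence: float = 0.95) -> float:
    """Half-width of the normal-approximation confidence interval for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    return z * math.sqrt(p * (1 - p) / n)


for n in (100, 400, 1000):
    print(n, f"+/-{margin_of_error(n):.1%}")
```

With n = 400, the margin is about ±4.9 percentage points. Halving the margin requires roughly four times as many respondents.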






Statistical Power
- Power is the probability that a statistical test will correctly detect an effect when one truly exists.
  - It takes a value between 0 and 1 (often expressed as a percentage).
- Power = 1 - β, where β is the probability of a Type II error.
  - Type II error: failing to reject the null hypothesis when it is false (missing a real effect).
- Type I error: rejecting the null hypothesis when it is true (a false positive).
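The relationship between power, β and sample size can be approximated for a two-sided comparison of two group means. The following sketch uses the normal approximation and assumes equal group sizes and a standardized effect size d (the difference in means divided by the common SD). Exact t-test power is slightly lower, so treat the result as illustrative. The function name `approx_power` is hypothetical.

```python
from statistics import NormalDist


def approx_power(effect_size: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-sample comparison of means."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = effect_size * (n_per_group / 2) ** 0.5  # expected z statistic under the effect
    # Probability of landing beyond the upper critical value; the tiny chance
    # of rejecting in the opposite tail is ignored.
    return z.cdf(shift - z_crit)


power = approx_power(0.5, 64)
print(f"power = {power:.2f}, beta = {1 - power:.2f}")
```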

Types of Power Analysis
- A priori: conducted before the study begins (at the proposal stage).
- Post hoc: conducted after the study has been completed.
- Easy ways to increase power:
  - Increase the sample size.
  - Increase the effect size.

Components Involved in Power Calculation
- Sample size: the number of cases.
- Effect size: the magnitude of the trend or difference, relative to its variation.
- Alpha level: the probability of concluding that an effect is present when the result is actually due to chance alone (usually .05 or .01).
  - Also known as the Type I error rate, i.e. the risk of rejecting a true null hypothesis.
- Power level: 80-90% is common.
- One- or two-tailed test: two-tailed is common.

Components Involved in Power Calculation
- Sample size: what we want to find out.
- Effect size: the magnitude of the trend… but what if you don't know it?
  - Look to pilot data or the literature.
  - Keep in mind: the smaller the effect size, the larger the sample size required.
- Alpha level: .05
- Power level: 80-90%
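With alpha, power and an effect size in hand, the sample size per group for comparing two means can be estimated with the normal-approximation formula n = 2((z₁₋α/₂ + z₁₋β) / d)². The following sketch assumes equal groups and a two-sided test. Dedicated power software (for example G*Power), which uses the t distribution, typically returns a slightly larger n. The function name `n_per_group` is hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate cases needed per group for a two-sided, two-sample comparison of means."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # 1.96 for alpha = .05
    z_beta = z.inv_cdf(power)           # 0.84 for 80% power
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


# Smaller effects need much larger samples.
for d in (0.8, 0.5, 0.2):
    print(f"d = {d}: {n_per_group(d)} per group")
```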

Important Consultation Information
- What is your research question?
- Components of the power calculation
- Levels of data (nominal, ordinal, continuous)
- Sampling plan

Data Organization: Codebook
- What is a codebook?
  - A codebook is a log of your variables (and their levels of data) and how you will code them.
  - A codebook helps everyone understand the coding schemes, so that the whole team is on the same page.

Data Processing and Analyses: Codebook Example

Variable name | Variable label | Values | Coding | Missing | Variable type
age | age | 1,2,3,4,5 | 1=10-20 years, 2=21-30 years, 3=31-40 years, 4=41-50 years, 5=51+ years | 97=Incorrect response, 98=No response, 99=Not applicable | Ordinal
sex | sex | 1,2 | 1=male, 2=female | 97=Incorrect response, 98=No response, 99=Not applicable | Nominal
happiness | happiness at work | 1,2,3 | 1=not happy, 2=somewhat happy, 3=very happy | 97=Incorrect response, 98=No response, 99=Not applicable | Ordinal
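A codebook can also be kept in machine-readable form, so that data entry can be checked against it. The following sketch stores the example codebook above as Python dictionaries. It turns the shared missing-value codes (97, 98, 99) into `None` and raises an error for any value the codebook does not define. The names `CODEBOOK`, `MISSING_CODES` and `decode` are hypothetical.

```python
# Missing-value codes shared by every variable in the codebook above.
MISSING_CODES = {97: "Incorrect response", 98: "No response", 99: "Not applicable"}

# Value labels transcribed from the codebook above.
CODEBOOK = {
    "age": {1: "10-20 years", 2: "21-30 years", 3: "31-40 years",
            4: "41-50 years", 5: "51+ years"},
    "sex": {1: "male", 2: "female"},
    "happiness": {1: "not happy", 2: "somewhat happy", 3: "very happy"},
}


def decode(variable, code):
    """Return the label for a code, None for a missing-value code,
    or raise ValueError for a code the codebook does not define."""
    if code in MISSING_CODES:
        return None
    labels = CODEBOOK[variable]
    if code not in labels:
        raise ValueError(f"{code} is not a valid code for {variable!r}")
    return labels[code]


# A hypothetical respondent who did not answer the happiness question.
row = {"age": 2, "sex": 1, "happiness": 98}
print({var: decode(var, code) for var, code in row.items()})
```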

Spreadsheet Example

ID# | Age | Sex | Happiness
1 | 1 | 1 | 2
2 | 2 | 2 | 2
3 | 3 | 1 | 2
4 | 57 | 2 | 2
5 | 45 | 2 | 3
6 | 66 | 2 | 3
7 | 2 | 2 | 3
8 | 88 | 2 | 3

Data Analysis with Excel
- Most simple analyses can be done in Excel, including correlation, regression and even random number generation.
- Install the Analysis ToolPak:
  - Go to Tools > Add-Ins and add the 'Analysis ToolPak' (in Excel 2010 and later: File > Options > Add-Ins).
- Create your worksheet and codebook.
- Choose a statistical test.
  - Follow the commands in the Help menu.

Data Analysis with Excel


http://home.ubalt.edu/ntsbarsh/excel/excel.htm

Data Reporting and Presentation of Data
- Graphical summaries are a great way to present your data.
- Excel is well suited to creating tables and graphs.
- The type of data you have determines the type of graphical summary you should use.

Data Reporting and Presentation of Descriptive Data
- Categorical data: frequency tables and bar charts.
- Example: Fruit

Fruit | Count | Percent | Valid Percent
Pineapples | 4 | 20% | 21%
Apples | 5 | 25% | 26%
Oranges | 10 | 50% | 53%
Unknown | 1 | 5% | -
Total | 20 | 100% | 100%
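"Percent" uses all responses as the denominator. "Valid percent" leaves out the unknown or missing responses. The following sketch reproduces the Fruit table above from a list of raw responses. It assumes missing answers are recorded with the hypothetical label "Unknown" and that percentages are rounded to whole numbers, so rounded columns may not always sum to exactly 100%.

```python
from collections import Counter


def frequency_table(responses, missing_label="Unknown"):
    """Rows of (category, count, percent, valid percent); valid percent excludes missing."""
    counts = Counter(responses)  # keeps categories in first-seen order
    total = sum(counts.values())
    valid_total = total - counts.get(missing_label, 0)
    rows = []
    for category, count in counts.items():
        percent = round(100 * count / total)
        valid = None if category == missing_label else round(100 * count / valid_total)
        rows.append((category, count, percent, valid))
    return rows


responses = ["Pineapples"] * 4 + ["Apples"] * 5 + ["Oranges"] * 10 + ["Unknown"]
for row in frequency_table(responses):
    print(row)
```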

Data Reporting and Presentation of Descriptive Data
[Bar chart: Fruit Study, counts for Pineapples, Apples, Oranges and Unknown]

Data Reporting and Presentation of Descriptive Data
- Continuous data: tables and histograms.

Age | Count | Percent
20-30 | 4 | 20%
31-40 | 5 | 25%
41-50 | 10 | 50%
51-60 | 1 | 5%
Total | 20 | 100%

Data Reporting and Presentation of Descriptive Data
[Histogram: count of participants by age group (20-30, 31-40, 41-50, 51-60)]

What is the difference between a histogram and a bar chart?
- Histogram: for continuous data, where the data are divided into contiguous class intervals (intervals that follow one another in an unbroken sequence).
- Bar chart: for categorical data, where the categories are not contiguous.
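To build a histogram, continuous values first have to be assigned to contiguous class intervals. The following sketch uses the age intervals from the table above. Each interval is treated as half-open (for example, 20 up to but not including 31), so a fractional age such as 30.5 still falls into exactly one interval and no gaps remain. The names `EDGES`, `LABELS` and `age_group` and the sample ages are hypothetical.

```python
import bisect
from collections import Counter

# Interval edges: [20, 31), [31, 41), [41, 51), [51, 61) share boundaries, so they are contiguous.
EDGES = [20, 31, 41, 51, 61]
LABELS = ["20-30", "31-40", "41-50", "51-60"]


def age_group(age: float) -> str:
    """Return the class interval an age falls into."""
    if not EDGES[0] <= age < EDGES[-1]:
        raise ValueError(f"age {age} is outside {EDGES[0]}-{EDGES[-1] - 1}")
    return LABELS[bisect.bisect_right(EDGES, age) - 1]


ages = [25, 34, 45, 47, 52, 38]  # hypothetical participants
counts = Counter(age_group(a) for a in ages)
for label in LABELS:
    print(label, counts[label])
```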

Measures of Central Tendency
- Reporting averages:
  - Categorical data = mode
  - Ordinal data = median
  - Continuous data = mean
- If there are outliers (extreme values), report the median instead of the mean.
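The following sketch matches each level of data to its recommended average and shows why one extreme value pulls the mean but barely moves the median. It uses Python's built-in `statistics` module. The function name `average_for` and the lengths-of-stay data are hypothetical.

```python
import statistics


def average_for(level, values):
    """Return the average recommended above for each level of data."""
    if level == "categorical":
        return statistics.mode(values)    # most frequent category
    if level == "ordinal":
        return statistics.median(values)  # middle of the ordered values
    if level == "continuous":
        return statistics.mean(values)
    raise ValueError(f"unknown level of data: {level!r}")


# Hypothetical lengths of stay (days) with one extreme value.
stays = [2, 3, 3, 4, 5, 40]
print("mean:", average_for("continuous", stays))
print("median:", statistics.median(stays))
print("mode:", average_for("categorical", ["apple", "orange", "apple"]))
```

For ordinal codes with an even number of values, `statistics.median` can fall between two categories. `statistics.median_low` always returns an observed category.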

Reporting Inferential Stats
- Handout
  - Resource: APA Guidelines
  - http://www.ilstu.edu/~jhkahn/apastats.html

Reporting Inferential Stats
- It's important to include means, standard deviations and sample sizes in your results section.
- Example (correlation):
  - Variable X was strongly correlated with Variable Y, r = .59, p < .01.
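The following sketch computes a Pearson correlation and formats it in the APA style shown above. APA style omits the leading zero because r cannot exceed 1 and puts the degrees of freedom (n - 2) in parentheses. The p-value is not computed here, and the names `pearson_r` and `apa_r` and the data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx = math.fsum(x) / len(x)
    my = math.fsum(y) / len(y)
    sxy = math.fsum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = math.fsum((a - mx) ** 2 for a in x)
    syy = math.fsum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)  # undefined if either variable is constant


def apa_r(r, n):
    """Format r APA-style: no leading zero, df = n - 2 in parentheses."""
    text = f"{r:.2f}".replace("0.", ".", 1)  # "0.77" -> ".77", "-0.59" -> "-.59"
    return f"r({n - 2}) = {text}"


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(apa_r(pearson_r(x, y), len(x)))
```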

Important to Keep Your Audience in Mind
- Residency project
- Publication
- Departmental report

Aaron: TCPS certification for
residents reminder…

Questions?


Slide 19

FH Health Research Intelligence Unit
How can we help?

Epidemiologist
- Specifying the research goal, objectives and hypothesis.
- Identifying measurable outcomes.
- Specifying the variables for analysis.
- Identifying sources of data.
- Developing data collection tools for quantitative or qualitative studies.
- Developing the statistical analysis plan.
- Understanding how to use statistical software, such as SPSS.

Workshop Outline
- Research 101: Basic Research Steps
- Research Question Refinement
- Common Study Designs (Resource)
- Levels of Data
- Power and Sample Size
- Statistical Test Selection (Exercise)
- Data Reporting (Resource)
- Simple Stats with Excel (Resource)

Pharmacy Residency Project
1) Develop a research question.
2) Conduct a thorough literature review.
3) Refine the research question or hypothesis.
4) Design the research methodology/study.
5) Create a research proposal.
6) Apply for funding.
7) Apply for ethics approval.
8) Collect and analyze data.
9) Draw conclusions and relate findings.

Research Question Refinement
- A research question should describe, in operational terms, what you think will happen in the study.

Good Versus Bad Research Question
- Bad (vague): Are patients who take drug X more likely to experience episodes of delirium?
- Good (specific): Do patients who receive medication X between September 2008 and November 2008 experience more episodes of delirium compared with patients who received drug Y during the same time period?

Classification of Research Studies
Research studies
- Observational studies
  - Descriptive studies: focus on describing populations and the relationship between variables.
  - Analytic studies: make inferences about the population based on a random sample.
- Experimental studies: test relationships between exposures and outcomes. The investigator has direct control over the study conditions and exposure status.

Hierarchy of Studies
- The type of study is selected according to the purpose of the research.
- Levels of evidence (highest to lowest):
  1. Experimental studies
  2. Analytic studies
  3. Descriptive studies
- Handout: Research Design Hierarchy

Probability Sampling Methods: Random
- There are several methods to choose from:
  - Simple random sampling.
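In simple random sampling, every unit in the sampling frame has the same chance of selection. The following sketch assumes the frame is a list of unit numbers and uses `random.Random.sample` from Python's standard library, which draws without replacement. The optional seed makes the draw reproducible for an audit trail. The function name `simple_random_sample` is hypothetical.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from the sampling frame, each equally likely."""
    rng = random.Random(seed)  # a fixed seed lets the same sample be re-drawn
    return rng.sample(frame, n)


frame = list(range(1, 201))  # hypothetical frame: units numbered 1..200
sample = simple_random_sample(frame, 40, seed=2024)
print(sorted(sample))
```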

Probability Sampling Methods: Stratified
- Stratified sampling: divide the population into non-overlapping strata and sample from within each stratum independently.
  - Guarantees representation of all important groups.
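The following sketch shows stratified sampling with equal allocation. It groups the frame into non-overlapping strata and draws a simple random sample independently within each one. Proportional allocation, where each stratum's sample size reflects its share of the population, is another common choice that is not shown here. The frame, the site names and the function name `stratified_sample` are hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(units, stratum_of, n_per_stratum, seed=None):
    """Simple random sample drawn independently within each stratum.

    stratum_of maps a unit to its stratum. A stratum smaller than
    n_per_stratum is taken whole.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)  # each unit lands in exactly one stratum
    return {
        name: rng.sample(members, min(n_per_stratum, len(members)))
        for name, members in strata.items()
    }


# Hypothetical frame of (patient_id, site) with unequal site sizes.
frame = (
    [(i, "Site A") for i in range(1, 51)]
    + [(i, "Site B") for i in range(51, 81)]
    + [(i, "Site C") for i in range(81, 101)]
)
sample = stratified_sample(frame, stratum_of=lambda unit: unit[1], n_per_stratum=5, seed=42)
for site, chosen in sample.items():
    print(site, sorted(pid for pid, _ in chosen))
```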

Probability Sampling Methods: Systematic
- Systematic random sampling selects the sample using a fixed interval k, so that every kth unit in the sampling frame is chosen.

Probability Sampling Methods: Systematic
Steps to achieve a systematic random sample:
1. Number the units in the population from 1 to N.
2. Decide on the sample size n that you want or need.
   - k = N/n = the interval size.
3. Randomly select an integer between 1 and k.
4. Then take every kth unit.

Example:
1. N = 200
2. n = 40, so k = N/n = 200/40 = 5 (interval size).
3. Randomly select a number between 1 and 5 (let's pick 4).
4. Begin with unit 4 and take every 5th unit (4, 9, 14, ..., 199).
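The steps above translate directly into a few lines of Python. The following sketch reproduces the worked example (N = 200, n = 40, k = 5, start = 4). When N/n is not a whole number, it rounds k down and keeps the first n units. That is one common convention among several. The function name `systematic_sample` is hypothetical.

```python
import random


def systematic_sample(N, n, start=None, seed=None):
    """Systematic random sample of unit numbers from a frame numbered 1..N."""
    if not 0 < n <= N:
        raise ValueError("need 0 < n <= N")
    k = N // n  # interval size k = N/n, rounded down if N/n is not whole
    if start is None:
        start = random.Random(seed).randint(1, k)  # random start, 1..k inclusive
    if not 1 <= start <= k:
        raise ValueError("start must be between 1 and k")
    return list(range(start, N + 1, k))[:n]  # every kth unit, at most n units


sample = systematic_sample(200, 40, start=4)
print(sample[:5], "...", sample[-1])
```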

Probability Sampling Methods: Cluster
- Cluster sampling:
  - Divide the population into clusters and randomly sample clusters.
  - Measure all units within the sampled clusters.
  - Example: see the blue areas on the map.
    - Clusters need not be geographic areas; you could select hospitals, schools, etc.
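Cluster sampling randomizes whole groups instead of individual units. Every unit in a chosen cluster is measured. The following sketch uses hospitals as clusters, which is one of the non-geographic options mentioned above. The hospital names, patient IDs and the function name `cluster_sample` are hypothetical.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters, then keep every unit in each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # sorted keys keep a seeded draw reproducible
    return {name: list(clusters[name]) for name in chosen}


# Hypothetical clusters: hospitals and their patient IDs.
hospitals = {
    "Hospital 1": [101, 102, 103],
    "Hospital 2": [201, 202],
    "Hospital 3": [301, 302, 303, 304],
    "Hospital 4": [401, 402],
}
sample = cluster_sample(hospitals, n_clusters=2, seed=3)
print(sample)
```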


Non-Probability Sampling Methods
- There are different types of non-probability sampling methods as well:
  - Convenience (not representative of the population).
  - Purposive (a certain group in mind).
  - Expert sampling (seek out specific expertise).
  - Snowball sampling (ask people to participate, and they ask more people).
- If you select a non-probability sampling method, the conclusions drawn from the study results apply only to that specific group and cannot be generalized to the wider population.

Measurement: Levels of Data
- The level of data dictates which statistical test you should use.
- Categorical = data classified into categories that cannot be arranged in any particular order (e.g. apples and pears, gender, eye colour, ethnicity).
- Ordinal = data that are ordered, but the distance between intervals is not always equal (e.g. low, middle and high income).
- Continuous = equal distance between each interval (e.g. 1, 2, 3; age).

Statistics and Statistical Test Selection

Types of Statistics
- Descriptive statistics: describe research findings.
  - E.g. frequencies, averages.
- Inferential statistics: make inferences about the population based on a random sample.
  - In a random sample, each person/unit has an equal chance of being selected.
  - This allows the results to be generalized to the population.

Types of Variables
- Variables can be classified as independent or dependent.
- An independent variable is a variable you believe will influence your outcome measure.
- A dependent variable is the variable that depends on, or is influenced by, the independent variable(s). The dependent variable can also be the variable you are trying to predict.

Statistical Test Selection
Selecting the appropriate statistical test requires several steps. Base your selection on:
1) What is your goal? Description? Comparison? Prediction? Quantifying an association? Proving effectiveness? Proving causality?
2) What kind of data have you collected? What are the levels of data (nominal, ordinal, continuous)? Was your sample randomly selected?
3) Are your data normally distributed? Should you use a parametric or a non-parametric test?
4) What are the assumptions of the statistical test you would like to use? Do your data meet these assumptions?

Slide 20

Surviving Pharmacy
Residency Research: Tips
and Tricks for Statistical
Planning

Surviving Pharmacy Residency
Research: Tips and Tricks for
Statistical Planning
© Fraser Health Authority, 2007
The Fraser Health Authority (“FH”) authorizes the use, reproduction and/or
modification of this publication for purposes other than commercial redistribution. In
consideration for this authorization, the user agrees that any unmodified
reproduction of this publication shall retain all copyright and proprietary notices. If
the user modifies the content of this publication, all FH copyright notices shall be
removed, however FH shall be acknowledged as the author of the source
publication.
Reproduction or storage of this publication in any form by any means for the purpose
of commercial redistribution is strictly prohibited.
This publication is intended to provide general information only, and should not be
relied on as providing specific healthcare, legal or other professional advice. The
Fraser Health Authority, and every person involved in the creation of this publication,
disclaims any warranty, express or implied, as to its accuracy, completeness or
currency, and disclaims all liability in respect of any actions, including the results of
any actions, taken or not taken in reliance on the information contained herein.

FH Health Research Intelligence
Unit
How can we help?


Grant Facilitator-Writer
 Conducting a search for funding opportunities.
 Automatic notification of new funding sources and
deadlines.
 Identifying a research team.
 Preparing letters of intent.
 Identifying resources required for conducting research.
 Formulating the research budget.
 Writing the grant application in collaboration with
researchers.
 Understanding FH and funding agency requirements
regarding preparation of specific documents.

FH Health Research Intelligence
Unit
How can we help?











Epidemiologist
Specifying the research goal,
objectives and hypothesis.
Identifying measurable
outcomes.
Specifying the variables for
analysis.
Identifying sources of data.
Developing data collection
tools for quantitative or
qualitative studies.
Developing the statistical
analysis plan.
Understanding how to use
statistical software, such as
SPSS.

Workshop Outline










Research 101- Basic Research Steps
Research Question Refinement
Common Study Designs- Resource
Levels of Data
Power and Sample Size
Statistical Test Selection- Exercise
Data Reporting- Resource
Simple Stats with Excel- Resource

Pharmacy Residency Project










1) Develop a research question
2) Conduct thorough literature review
3) Re-define research question or hypothesis
4) Design research methodology/study
5) Create research proposal
6) Apply for funding
7) Apply for ethics approval
8) Collect and analyze data
9) Draw conclusions and relate findings

Research Question Refinement


Research question will describe in operational
terms, what you think will happen in the study.

Good Versus Bad Research
Question


Are patients who take
drug X more likely to
experience episodes
of delirium?



Do patients who
receive medication X
between September
2008 and November
2008 experience
more episodes of
delirium as compared
to patients who
received drug Y
during the same time
period?

Classification of Research
Studies
Observational Studies:
Descriptive Studies:
Focus on describing
populations and describing the
relationship between variables

Research Studies

Analytic Studies:
Observational
Descriptive

Experimental

Analytic

Make inferences about the
population based on a random
sample.
Experimental Studies:
Test relationships between
exposures and outcomes.
Investigator has direct control
over study condition and
exposure status.

Hierarchy of Studies
Type of study is selected
according to the purpose of
research.
Experimental Studies

Analytic Studies

Descriptive Studies

Levels of Evidence


HandoutResearch Design
Hierarchy

Probability Sampling Methods:
Random
 There

are several methods to choose

from:
 Simple random
sampling.

Probability Sampling Methods:
Stratified
 Stratified

sampling

(divide the population into
non-overlapping strata and
sample from within each
stratum independently).


Guarantees representation
of all important groups.

Probability Sampling Methods:
Systematic


Selection of the
sample using an
interval “k” so that
every “k” unit in the
frame is selected,
is called systematic
random sampling.

Probability Sampling Methods:
Systematic


Steps to achieve a systematic random sample:

1. Number the units in the population from 1 to N.
2. Decide on the n (sample size) that you want or need.


k = N/n = the interval size.

3. Randomly select an integer between 1 and k.
4. Then take every kth unit.
Example:
1. N=200
2. n=40, take N/n, 200/40=5 (interval size).
3. Randomly select a number between 1 and 5 (let’s pick 4).
4. Begin with 4, and take every 5th unit.

Probability Sampling Methods:
Cluster
 Cluster

sampling.
 Divide population into clusters and
randomly sample clusters.
 Measure all units within sampled clusters.
 Example: See blue areas on map.
Not just geographic areas,
could select hospitals,
schools etc.


Non-Probability Sampling
Methods


There are different types of non-probability
sampling methods as well:








Convenience (not representative of population).
Purposive (certain group in mind).
Expert sampling (seek out specific expertise).
Snowball sampling (ask people to participate, they
ask more people).

If you select non-probability sampling methods, the
conclusions drawn from the study results apply only
to that specific population.

Measurement: Levels of Data



The level of data will dictate which statistical test you
should use.
Categorical = Data that is classified into categories and
cannot be arranged in any particular order (e.g. Apples
and pears, gender, eye colour, ethnicity).



Ordinal = Data ordered, but distance between intervals
not always equal. (e.g. Low, middle and high income).



Continuous = equal distance between each interval
(e.g. 1,2,3., age).

Statistics and Statistical Test
Selection

Types of Statistics
 Descriptive

Statistics: Describes
 research findings


E.g. Frequencies, averages.

 Inferential

Statistics: Makes inferences
about the population, based on a random
sample.




In a random sample, each person/unit has an
equal chance of being selected
Allows generalizability to population.

Types of Variables


Variables can be classified as independent or
dependent.



An independent variable is the variable you
believe will influence your outcome measure.



A dependent variable is the variable that is
dependant on or influenced by independent
variable(s). The dependent variable can also
be the variable you are trying to predict.

Statistical Test Selection
Selecting the appropriate Statistical test requires
several steps:
Test selection should be based on:
1) What is your goal? Description? Comparison? Prediction? Quantify
association? Prove effectiveness? Prove causality?
2) What kind of data have you collected? What are the levels of data
(Nominal, ordinal, continuous)? Was your sample randomly selected?
3) Is your data normally distributed? Should you use a parametric or nonparametric test?
4) What are the assumptions of the statistical test you would like to
use? Does the data meet these assumptions?

Parametric Tests


Parametric tests assume that the variable in question is
from a normal distribution.



Non-parametric tests do not require the assumption of
normality.



Most non-parametric tests do not require an interval level
of measurement; can be used with nominal/ordinal level
data.

Assumptions







There are various assumptions for each test.
Before you select a test, be sure to check the assumptions of
each test.
You will need to contact a consultant, or review
statistical/research methods resources to find this information.
Some examples of common assumptions are:
 The dependent variable will need to be measured on a
certain level (i.e. Interval level).
 The independent variable(s) will need to be measured on a
certain level (i.e. Ordinal level).
 The population is normally distributed (not skewed).

If your data do not meet the assumptions for a specific test,
you may be able to use a non-parametric test instead.

Goal | Measurement (normal population) | Ordinal, or non-normal population | Binomial (two possible outcomes) | Survival time
Describe one group | Mean, SD | Median, interquartile range | Proportion | Kaplan Meier survival curve
Compare one group to a hypothetical value | One-sample t test | Wilcoxon test | Chi-square or Binomial test ** | -
Compare two unpaired groups | Unpaired t test | Mann-Whitney test | Fisher's test (chi-square for large samples) | Log-rank test or Mantel-Haenszel*
Compare two paired groups | Paired t test | Wilcoxon test | McNemar's test | Conditional proportional hazards regression*
Compare three or more unmatched groups | One-way ANOVA | Kruskal-Wallis test | Chi-square test | Cox proportional hazard regression**

Statistical Test Selection Group Exercise
- Using your tables, select the appropriate statistical tests for 10 research scenarios.

During the group exercise…

Steps to choose the appropriate statistical method for the data analysis:

1. Identify whether the research problem asks you to describe, relate (association), or compare (difference).
2. Identify the levels of measurement in the research question (nominal/categorical, ordinal/rank, continuous/evenly spaced).
3. Identify the number of variables or samples being described, related, or compared.
4. Identify whether the comparison samples are related (e.g., the same group analyzed before and after, or matched pairs) or independent (unrelated, different groups).
5. Choose the appropriate statistical tool for the data and situation using the decision tree in the handout.
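Steps 1 to 5 end with a lookup in a goal-by-data-type table. The table of tests in this section can be sketched as such a lookup. The short column keys (`normal`, `ordinal_or_nonnormal`, `binomial`, `survival`) are hypothetical, and the table's footnote markers are omitted. The sketch only mirrors the table; it does not check whether your data meet a test's assumptions.

```python
# Rows are goals; columns are types of data (transcribed from the test-selection table).
TEST_TABLE = {
    "describe one group": {
        "normal": "Mean, SD",
        "ordinal_or_nonnormal": "Median, interquartile range",
        "binomial": "Proportion",
        "survival": "Kaplan Meier survival curve",
    },
    "compare one group to a hypothetical value": {
        "normal": "One-sample t test",
        "ordinal_or_nonnormal": "Wilcoxon test",
        "binomial": "Chi-square or Binomial test",
    },
    "compare two unpaired groups": {
        "normal": "Unpaired t test",
        "ordinal_or_nonnormal": "Mann-Whitney test",
        "binomial": "Fisher's test (chi-square for large samples)",
        "survival": "Log-rank test or Mantel-Haenszel",
    },
    "compare two paired groups": {
        "normal": "Paired t test",
        "ordinal_or_nonnormal": "Wilcoxon test",
        "binomial": "McNemar's test",
        "survival": "Conditional proportional hazards regression",
    },
    "compare three or more unmatched groups": {
        "normal": "One-way ANOVA",
        "ordinal_or_nonnormal": "Kruskal-Wallis test",
        "binomial": "Chi-square test",
        "survival": "Cox proportional hazard regression",
    },
}

def choose_test(goal, data_type):
    """Look up the suggested test for a goal (row) and type of data (column)."""
    row = TEST_TABLE.get(goal.lower())
    if row is None:
        raise KeyError(f"unknown goal: {goal!r}")
    return row.get(data_type, "not covered by this table; consult a statistician")

print(choose_test("Compare two paired groups", "normal"))
```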



1. A pilot experiment designed to test the effectiveness of a new approach to electrode placement for electroconvulsive therapy (ECT) has been conducted over a one-year period in the Fraser Health Authority.
Patients from two different mood disorder clinics participated in this study. Patients from Clinic X received ECT according to current practice guidelines. Patients from Clinic Y received a new exploratory ECT treatment. Patients in each clinic were matched for age, gender, and type of disorder. A random sample of 30 matched pairs of patients was selected for inclusion in the study. At the end of one year, patients were administered a memory test yielding a total score out of 100.
Dr. Vasdil would like to know which statistical procedure to use to test for differences between the two groups of patients on the memory test.
What is the question: Compare
How many samples: 2
Related or independent: Related (patients were matched in pairs)
What is the level of measurement: Continuous
How many dependent variables: 1
Test: Paired t test (Wilcoxon signed-rank test if the within-pair differences are not normally distributed)
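Each Clinic X patient has a matched partner in Clinic Y, so the comparison is carried out on the within-pair differences. Below is a minimal sketch of the paired t statistic, using made-up scores for five pairs. The real study has 30 pairs, so df would be 29. The p-value would still come from a t distribution in statistical software or a table.

```python
import math
import statistics

def paired_t(scores_x, scores_y):
    """t statistic and degrees of freedom for matched pairs (scores_x[i] paired with scores_y[i])."""
    if len(scores_x) != len(scores_y):
        raise ValueError("each patient in one group needs a matched partner in the other")
    diffs = [x - y for x, y in zip(scores_x, scores_y)]
    n = len(diffs)
    mean_d = statistics.fmean(diffs)
    sd_d = statistics.stdev(diffs)  # sample SD of the within-pair differences
    t = mean_d / (sd_d / math.sqrt(n))
    return t, n - 1

# Hypothetical memory scores (out of 100) for five matched pairs
clinic_x = [72, 65, 80, 58, 70]
clinic_y = [68, 66, 74, 55, 67]
t, df = paired_t(clinic_x, clinic_y)
print(round(t, 2), df)  # 2.63 4
```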

Sample Size

There are several rules of thumb for determining sample size:

1) It’s a good idea to have a minimum of 30 cases (as a total group or, if comparing groups, 30 for each group). If you have fewer, you can use a non-parametric test, but it is still better to have close to 30 cases.
2) If using regression, it is best to have 10 to 50 cases per independent variable.
3) If you are validating a survey, avoid having more questions than cases.
4) If the total population you are examining has fewer than 30 members, use all of them.
5) For pilot studies, a commonly cited recommendation is a sample size of 12 per group.
6) For surveys, about 400 respondents per group is enough for most purposes: it estimates a proportion to within roughly ±5 percentage points at 95% confidence.
7) For surveys, a 30% response rate is the bare minimum.

Note: For a precise sample size estimate, you will need to conduct a power analysis.
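The survey rule of thumb of about 400 per group can be checked with the standard sample size formula for estimating a proportion, n = z² × p(1 - p) / E². Here E is the desired margin of error. The sketch below assumes simple random sampling from a large population and the conservative choice p = 0.5. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def survey_sample_size(margin, confidence=0.95, p=0.5):
    """Respondents needed to estimate a proportion to within +/- margin.

    n = z^2 * p * (1 - p) / margin^2, rounded up; p = 0.5 gives the largest n.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95% confidence
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

print(survey_sample_size(0.05))  # 385
print(survey_sample_size(0.10))  # 97
```

Response rates also affect how many people you need to contact. At the 30% minimum response rate, collecting 385 completed surveys would mean inviting about 385 / 0.30 ≈ 1,284 people.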






Statistical Power

Power is the probability that a statistical test will detect an effect when that effect truly exists.
- It takes a value between 0 and 1 (often expressed as a percentage).
- Power = 1 - β, where β is the probability of a Type II error.
- Type II error: failing to reject a null hypothesis that is false (missing a real effect).
- Type I error: rejecting a null hypothesis that is true (a false positive).
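The sketch below shows how power (1 - β) depends on sample size for a two-group comparison of means. It uses the normal approximation from Python's `statistics.NormalDist`. The standardized effect size d (difference in means divided by the common SD) and the function name are assumptions for illustration. Dedicated power software uses the t distribution and will give slightly different values.

```python
import math
from statistics import NormalDist

def two_sample_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample comparison of means.

    Normal approximation; ignores the negligible chance of rejecting in the wrong direction.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)  # 1.96 for alpha = .05, two-tailed
    return z.cdf(d * math.sqrt(n_per_group / 2) - z_crit)

for n in (20, 64, 100):
    power = two_sample_power(0.5, n)
    print(n, round(power, 2), "beta =", round(1 - power, 2))
```

With a medium effect (d = 0.5), power rises from well under 50% at 20 per group to about 80% at 64 per group. This is the sense in which increasing the sample size increases power.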

Types of Power Analysis
- A priori: conducted before the study commences (at the proposal stage).
- Post hoc: conducted after the study has been completed.
- Easy ways to increase power?
  Increase the sample size.
  Increase the effect size (e.g., by reducing measurement variability).

Components Involved in Power Calculation
- Sample size: the number of cases.
- Effect size: the magnitude of the difference or relationship, relative to its variation.
- Alpha level: the probability of concluding that an effect exists when the observed result is actually due to chance alone (usually .05 or .01). Also known as the Type I error rate: the probability of rejecting a true null hypothesis.
- Power level: 80-90% is common.
- One- or two-tailed test: two-tailed is common.

Components Involved in Power Calculation
- Sample size: what we want to find out.
- Effect size: the magnitude of the trend… but what if you don’t know it?
  Look to pilot data or the literature.
  Keep in mind: the smaller the effect size, the larger the sample size required.
- Alpha level: .05
- Power level: 80-90%
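Working the power calculation in reverse gives the sample size per group for a chosen alpha, power and effect size. The sketch below uses the normal approximation n = 2 × (z_(1-α/2) + z_(1-β))² / d². The function name is hypothetical, and d is assumed to be a standardized difference in means. It also shows that smaller effects need much larger samples.

```python
import math
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate cases per group for a two-sided, two-sample comparison of means."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-tailed alpha
    z_beta = z.inv_cdf(power)           # quantile corresponding to the desired power
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)

for d in (0.8, 0.5, 0.2):  # large, medium and small effects (Cohen's conventions)
    print(d, n_per_group(d))
```

Software based on the t distribution typically reports a slightly larger number than this approximation. For example, it gives 64 rather than 63 per group for d = 0.5.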

Important Consultation Information
- What is your research question?
- Components of the power calculation
- Levels of data (nominal, ordinal, continuous)
- Sampling plan

Data Organization: Codebook
- What is a codebook?
- A codebook is a log of your variables (and their levels of data) and how you will code them.
- A codebook helps everyone understand the coding schemes, so that everyone is on the same page!

Data Processing and Analyses: Codebook Example

Variable Name | Variable Label | Values | Coding | Missing | Variable Type
age | age | 1,2,3,4,5 | 1=10-20 years, 2=21-30 years, 3=31-40 years, 4=41-50 years, 5=51+ years | 97=Incorrect response, 98=No response, 99=Not applicable | Ordinal
sex | sex | 1,2 | 1=male, 2=female | 97=Incorrect response, 98=No response, 99=Not applicable | Nominal
happiness | happiness at work | 1,2,3 | 1=not happy, 2=somewhat happy, 3=very happy | 97=Incorrect response, 98=No response, 99=Not applicable | Ordinal
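A codebook can also be kept in machine-readable form so that coding is applied consistently. The sketch below assumes integer codes and transcribes the example codebook. `decode` and `age_code` are hypothetical helpers. `age_code` shows how an age recorded in years would be converted to the codebook's age categories.

```python
# Missing-value codes shared by every variable in the example codebook
MISSING_CODES = {97: "Incorrect response", 98: "No response", 99: "Not applicable"}

CODEBOOK = {
    "age": {1: "10-20 years", 2: "21-30 years", 3: "31-40 years",
            4: "41-50 years", 5: "51+ years"},
    "sex": {1: "male", 2: "female"},
    "happiness": {1: "not happy", 2: "somewhat happy", 3: "very happy"},
}

def decode(variable, code):
    """Return the label for a code, or None if it is a missing-value code."""
    if code in MISSING_CODES:
        return None
    labels = CODEBOOK[variable]
    if code not in labels:
        raise ValueError(f"{code} is not a valid code for {variable!r}")
    return labels[code]

def age_code(years):
    """Convert an age in whole years to the codebook's age category (1-5)."""
    if years < 10:
        raise ValueError("the codebook has no category for ages under 10")
    for code, upper in ((1, 20), (2, 30), (3, 40), (4, 50)):
        if years <= upper:
            return code
    return 5  # 51+ years

print(decode("sex", 2), decode("happiness", 98), age_code(57))
```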

Spreadsheet Example

ID# | Age | Sex | Happiness
1 | 1 | 1 | 2
2 | 2 | 2 | 2
3 | 3 | 1 | 2
4 | 57 | 2 | 2
5 | 45 | 2 | 3
6 | 66 | 2 | 3
7 | 2 | 2 | 3
8 | 88 | 2 | 3

Data Analysis with Excel
- Most simple analyses can be done in Excel, including correlation, regression and even random number generation.
- Install the Analysis ToolPak:
  Go to Tools > Add-ins and add the 'Analysis ToolPak' (in newer versions of Excel: File > Options > Add-ins).
- Create your worksheet and codebook.
- Choose a statistical test.
  Follow the instructions in the Help menu.

Data Analysis with Excel


http://home.ubalt.edu/ntsbarsh/excel/excel.htm

Data Reporting and Presentation of Data
- Graphical summaries are a great way to present your data.
- Excel is great for creating tables and graphs.
- The type of data you have should determine the type of graphical summary you use.

Data Reporting and Presentation of Descriptive Data
- Categorical data: frequency tables and bar charts.
- Example:

Fruit | Count | Percent | Valid Percent
Pineapples | 4 | 20% | 21%
Apples | 5 | 25% | 26%
Oranges | 10 | 50% | 53%
Unknown | 1 | 5% | -
Total | 20 | 100% | 100%
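In the fruit example, "valid percent" excludes the missing response ("Unknown") from the denominator. For instance, 10 oranges out of 19 valid responses is 53%. Below is a minimal sketch that builds the same frequency table with `collections.Counter`. The function name and the choice of which categories count as missing are assumptions.

```python
from collections import Counter

def frequency_table(responses, missing=("Unknown",)):
    """Rows of (category, count, percent of all cases, valid percent or None if missing)."""
    counts = Counter(responses)
    total = sum(counts.values())
    valid_total = sum(c for cat, c in counts.items() if cat not in missing)
    rows = []
    for cat, c in counts.most_common():
        percent = round(100 * c / total)
        valid = None if cat in missing else round(100 * c / valid_total)
        rows.append((cat, c, percent, valid))
    return rows

fruit = ["Pineapples"] * 4 + ["Apples"] * 5 + ["Oranges"] * 10 + ["Unknown"]
for row in frequency_table(fruit):
    print(row)
```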

Data Reporting and Presentation of Descriptive Data
[Bar chart: "Fruit Study", counts for Pineapples, Apples, Oranges and Unknown]

Data Reporting and Presentation of Descriptive Data
- Continuous data: tables and histograms.

Age | Count | Percent
20-30 | 4 | 20%
31-40 | 5 | 25%
41-50 | 10 | 50%
51-60 | 1 | 5%
Total | 20 | 100%

Data Reporting and Presentation of Descriptive Data
[Histogram: counts by age group (20-30, 31-40, 41-50, 51-60)]

What is the difference between a Histogram and a Bar Chart?
- Histogram: for continuous data, where the data are divided into contiguous class intervals (in other words, intervals connected in an unbroken sequence), so the bars touch.
- Bar chart: for categorical data, where the categories are not contiguous, so the bars are drawn apart.
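The contiguous class intervals of a histogram can be made concrete in code. The sketch below counts values in half-open intervals such as [20, 30). Each interval starts where the previous one ends, so every value falls in exactly one bin. The ages are hypothetical and the function name is an assumption.

```python
def bin_counts(values, edges):
    """Count values in contiguous class intervals [edges[i], edges[i+1]).

    Values outside the first and last edge are ignored.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(edges) - 1):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    return counts

ages = [23, 28, 35, 39, 41, 44, 47, 52, 58, 61]  # hypothetical ages in years
print(bin_counts(ages, [20, 30, 40, 50, 60, 70]))  # [2, 2, 3, 2, 1]
```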

Measures of Central Tendency
- Reporting averages:
  Categorical data = Mode
  Ordinal data = Median
  Continuous data = Mean
- If there are outliers (or extreme values), report the median instead of the mean.
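Python's standard `statistics` module illustrates these choices. The data below are hypothetical. A single extreme value pulls the mean up while barely moving the median. For ordinal codes, `median_low` returns an actual category rather than a value between two categories.

```python
import statistics

lengths_of_stay = [3, 4, 4, 5, 6, 45]  # hypothetical days in hospital; 45 is an outlier
print(statistics.mean(lengths_of_stay))    # about 11.2, pulled up by the outlier
print(statistics.median(lengths_of_stay))  # 4.5, barely affected

happiness_codes = [1, 2, 3, 3]  # ordinal codes from a hypothetical survey
print(statistics.median(happiness_codes))      # 2.5 is not a valid category
print(statistics.median_low(happiness_codes))  # 2 ("somewhat happy")

routes = ["oral", "IV", "oral", "topical"]  # categorical data
print(statistics.mode(routes))  # oral
```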

Reporting Inferential Stats
- Handout
- Resource: APA Guidelines
  http://www.ilstu.edu/~jhkahn/apastats.html

Reporting Inferential Stats
- It’s important to include means, standard deviations and sample sizes in your results section.
- Example (correlation):
  Variable X was strongly correlated with Variable Y, r = .59, p < .01.
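The sketch below computes a Pearson correlation and formats it in APA style. APA style drops the leading zero for statistics that cannot exceed 1, such as r, and usually reports degrees of freedom as r(df), where df = n - 2. The helper names and data are hypothetical. The p-value requires the t distribution, so take it from statistical software.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient for paired observations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def apa_r(r):
    """Two decimals with no leading zero, e.g. 0.5912 -> '.59'."""
    sign = "-" if round(r, 2) < 0 else ""
    return sign + f"{abs(r):.2f}".lstrip("0")

x = [2, 4, 5, 7, 9]  # hypothetical Variable X
y = [1, 3, 2, 6, 8]  # hypothetical Variable Y
print(f"r({len(x) - 2}) = {apa_r(pearson_r(x, y))}")
```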

Important to Keep your
Audience in Mind
Residency Project
Publication
Departmental Report

Aaron: TCPS certification for
residents reminder…

Questions?


Slide 21

Surviving Pharmacy
Residency Research: Tips
and Tricks for Statistical
Planning

Surviving Pharmacy Residency
Research: Tips and Tricks for
Statistical Planning
© Fraser Health Authority, 2007
The Fraser Health Authority (“FH”) authorizes the use, reproduction and/or
modification of this publication for purposes other than commercial redistribution. In
consideration for this authorization, the user agrees that any unmodified
reproduction of this publication shall retain all copyright and proprietary notices. If
the user modifies the content of this publication, all FH copyright notices shall be
removed, however FH shall be acknowledged as the author of the source
publication.
Reproduction or storage of this publication in any form by any means for the purpose
of commercial redistribution is strictly prohibited.
This publication is intended to provide general information only, and should not be
relied on as providing specific healthcare, legal or other professional advice. The
Fraser Health Authority, and every person involved in the creation of this publication,
disclaims any warranty, express or implied, as to its accuracy, completeness or
currency, and disclaims all liability in respect of any actions, including the results of
any actions, taken or not taken in reliance on the information contained herein.

FH Health Research Intelligence
Unit
How can we help?


Grant Facilitator-Writer
 Conducting a search for funding opportunities.
 Automatic notification of new funding sources and
deadlines.
 Identifying a research team.
 Preparing letters of intent.
 Identifying resources required for conducting research.
 Formulating the research budget.
 Writing the grant application in collaboration with
researchers.
 Understanding FH and funding agency requirements
regarding preparation of specific documents.

FH Health Research Intelligence
Unit
How can we help?











Epidemiologist
Specifying the research goal,
objectives and hypothesis.
Identifying measurable
outcomes.
Specifying the variables for
analysis.
Identifying sources of data.
Developing data collection
tools for quantitative or
qualitative studies.
Developing the statistical
analysis plan.
Understanding how to use
statistical software, such as
SPSS.

Workshop Outline










Research 101- Basic Research Steps
Research Question Refinement
Common Study Designs- Resource
Levels of Data
Power and Sample Size
Statistical Test Selection- Exercise
Data Reporting- Resource
Simple Stats with Excel- Resource

Pharmacy Residency Project










1) Develop a research question
2) Conduct thorough literature review
3) Re-define research question or hypothesis
4) Design research methodology/study
5) Create research proposal
6) Apply for funding
7) Apply for ethics approval
8) Collect and analyze data
9) Draw conclusions and relate findings

Research Question Refinement


Research question will describe in operational
terms, what you think will happen in the study.

Good Versus Bad Research
Question


Are patients who take
drug X more likely to
experience episodes
of delirium?



Do patients who
receive medication X
between September
2008 and November
2008 experience
more episodes of
delirium as compared
to patients who
received drug Y
during the same time
period?

Classification of Research
Studies
Observational Studies:
Descriptive Studies:
Focus on describing
populations and describing the
relationship between variables

Research Studies

Analytic Studies:
Observational
Descriptive

Experimental

Analytic

Make inferences about the
population based on a random
sample.
Experimental Studies:
Test relationships between
exposures and outcomes.
Investigator has direct control
over study condition and
exposure status.

Hierarchy of Studies
Type of study is selected
according to the purpose of
research.
Experimental Studies

Analytic Studies

Descriptive Studies

Levels of Evidence


HandoutResearch Design
Hierarchy

Probability Sampling Methods:
Random
 There

are several methods to choose

from:
 Simple random
sampling.

Probability Sampling Methods:
Stratified
 Stratified

sampling

(divide the population into
non-overlapping strata and
sample from within each
stratum independently).


Guarantees representation
of all important groups.

Probability Sampling Methods:
Systematic


Selection of the
sample using an
interval “k” so that
every “k” unit in the
frame is selected,
is called systematic
random sampling.

Probability Sampling Methods:
Systematic


Steps to achieve a systematic random sample:

1. Number the units in the population from 1 to N.
2. Decide on the n (sample size) that you want or need.


k = N/n = the interval size.

3. Randomly select an integer between 1 and k.
4. Then take every kth unit.
Example:
1. N=200
2. n=40, take N/n, 200/40=5 (interval size).
3. Randomly select a number between 1 and 5 (let’s pick 4).
4. Begin with 4, and take every 5th unit.

Probability Sampling Methods:
Cluster
 Cluster

sampling.
 Divide population into clusters and
randomly sample clusters.
 Measure all units within sampled clusters.
 Example: See blue areas on map.
Not just geographic areas,
could select hospitals,
schools etc.


Non-Probability Sampling
Methods


There are different types of non-probability
sampling methods as well:








Convenience (not representative of population).
Purposive (certain group in mind).
Expert sampling (seek out specific expertise).
Snowball sampling (ask people to participate, they
ask more people).

If you select non-probability sampling methods, the
conclusions drawn from the study results apply only
to that specific population.

Measurement: Levels of Data



The level of data will dictate which statistical test you
should use.
Categorical = Data that is classified into categories and
cannot be arranged in any particular order (e.g. Apples
and pears, gender, eye colour, ethnicity).



Ordinal = Data ordered, but distance between intervals
not always equal. (e.g. Low, middle and high income).



Continuous = equal distance between each interval
(e.g. 1,2,3., age).

Statistics and Statistical Test
Selection

Types of Statistics
 Descriptive

Statistics: Describes
 research findings


E.g. Frequencies, averages.

 Inferential

Statistics: Makes inferences
about the population, based on a random
sample.




In a random sample, each person/unit has an
equal chance of being selected
Allows generalizability to population.

Types of Variables


Variables can be classified as independent or
dependent.



An independent variable is the variable you
believe will influence your outcome measure.



A dependent variable is the variable that is
dependant on or influenced by independent
variable(s). The dependent variable can also
be the variable you are trying to predict.

Statistical Test Selection
Selecting the appropriate Statistical test requires
several steps:
Test selection should be based on:
1) What is your goal? Description? Comparison? Prediction? Quantify
association? Prove effectiveness? Prove causality?
2) What kind of data have you collected? What are the levels of data
(Nominal, ordinal, continuous)? Was your sample randomly selected?
3) Is your data normally distributed? Should you use a parametric or nonparametric test?
4) What are the assumptions of the statistical test you would like to
use? Does the data meet these assumptions?

Parametric Tests


Parametric tests assume that the variable in question is
from a normal distribution.



Non-parametric tests do not require the assumption of
normality.



Most non-parametric tests do not require an interval level
of measurement; can be used with nominal/ordinal level
data.

Assumptions







There are various assumptions for each test.
Before you select a test, be sure to check the assumptions of
each test.
You will need to contact a consultant, or review
statistical/research methods resources to find this information.
Some examples of common assumptions are:
 The dependent variable will need to be measured on a
certain level (i.e. Interval level).
 The independent variable(s) will need to be measured on a
certain level (i.e. Ordinal level).
 The population is normally distributed (not skewed).

If your data do not meet the assumptions for a specific test,
you may be able to use a non-parametric test instead.

Type of Data
Goal

Measurement
Normal
Population

Ordinal, or
NonNormal
Population
Median,
interquartil
e range
Wilcoxon test

Binomial
Survival Time
-Two
Possible
Outcomes
Proportion
Kaplan Meier
survival curve

Describe one
group

Mean, SD

Compare one
group to a
hypothetic
al value

One-sample t
test

Compare two
unpaired
groups

Unpaired t test

Mann-Whitney
test

Log-rank test or
Mantel-Haenszel*

Paired t test

Wilcoxon test

Fisher's test
(chisquare
for large
samples)
McNemar's
test

Compare two
paired
groups

Chi-square
test

Cox proportional
hazard
regression**

One-way
Compare
ANOVA
three or
more
unmatched
groups

Kruskal-Wallis
test

Chi-square
or
Binomial
test **

Conditional
proportional
hazards
regression*

Statistical Test Selection Group
Exercise

 Using

your tables, select the
appropriate statistical tests for 10
research scenarios.

During the group exercise…


Steps to choose the appropriate statistical method
for the data analysis:

1. Identify whether the research problem raises the
question of describe, relate (association), or compare
(difference).
2. Identify the levels of measurement in the research
question (Nominal/Categorical, Ordinal/Rank,
Continuous/Evenly spaced).
3. Identify the number of variables, or samples being
described, related, or compared.
4. Identify whether comparison samples are related
(analyze same group before and after) or independent
(not at all related, looking at different groups).
5. Choose the appropriate statistical tool for the data and
situation using the decision tree in the handout.



1. A pilot experiment designed to test the effectiveness of a new
approach to electrode placement for Electro Shock Therapy
(ECT) has been conducted over a one year time period in the
Fraser Health Authority.
Patients from two different mood disorder clinics participated in
this study. Patients from Clinic X received ECT therapy
according to current practice guidelines. Patients from Clinic Y
received a new exploratory ECT treatment. Patients in each
clinic were matched for age, gender, and type of disorder. A
random sample of 30 matched pairs of patients were selected
for inclusion in the study. At end of one year, patients were
administered a memory test yielding a total score out of 100.
Dr. Vasdil would like to know what statistical procedure needs
to be selected to test for differences among groups of patients
on the memory test.
What is the question: Compare
How many samples: 2
Related or independent: Independent
What is the level of measurement: Continuous
How many dependent variables: 1
Test: T-test

Sample Size


There are several rules of thumb for determining
sample size.






1) It’s a good idea to have a minimum of 30 cases (as a total group, or if comparing
groups, 30 for each group).
If you have less you can use a non-parametric test, but it is still better to have close
to 30 cases.
2) If using regression, it is best to have between 10-50 cases per independent
variable.
3) If you are validating a survey, it is never good to have more questions than cases.
4) If the total population that you are examining is less than 30. Use all of them.
5) For pilot studies the recommendation is a sample size of 12 per group
6) For surveys, a sample size of 400 per group can do just about anything.



7) For surveys, a 30% response rate is the bare minimum.



Note: For a precise sample size estimate you will need to conduct a power analysis.






Statistical Power


Power is the capability of a statistical test to
correctly detect a significant effect if it exists.
 Assumes value between 0 and 1 (%)



Power= 1-B (B= probability of a Type II error).
Type II error – the error of not rejecting a false
research finding.
 Type I error- the error of rejecting a correct
research finding.

Types of Power
 A Priori-

Conducted before study
commences (at proposal stage).
 Post Hoc- After study has been completed.
 Easy




way to increase power?

Increase sample size
Increase Effect size

Components Involved in Power
Calculation



Sample Size- Number of cases.
Effect Size –Magnitude of the trend and
variation.
 Alpha Level- Odds of concluding that the
presence of an effect is due to chance alone (.05
or .01).





Also known as Type I Error, or the error of rejecting a
correct research finding

Power level- 80-90% common
One or two-tailed test- two tailed is common.

Components Involved in Power
Calculation
 Sample

Size- What we want to find out.
 Effect Size –Magnitude of the trend…but
what if you don’t know?



Look to pilot data or literature.
Keep in mind, the smaller the effect size, the
larger the sample size required.

 Alpha

Level- .05
 Power level- 80-90%

Important Consultation Information
 What

is your research question?
 Components of power calculation
 Levels of data (nominal, ordinal,
continuous)
 Sampling plan

Data Organization: Codebook
 What

is a codebook?
 A codebook is a log of your variables
(and levels of data) and how you will
code them.
 A codebook will help everyone
understand the coding schemes to
ensure that they are on the same page!

Data Processing and Analyses:
Codebook Example
Variable
Name

Variable
Label

Values

Coding

Missing

Variable
Type

age

age

1,2,3,4,5

1=10-20 years
2=21-30 years
3=31-40 years
4=41-50 years
5=51+ years

97=Incorrect
response
98=No response
99=Not
Applicable

Ordinal

sex

sex

1,2

1=male, 2=female

97=Incorrect
response
98=No response
99=Not
Applicable

Nominal

happiness

happiness
at
work

1,2,3

1=not happy
2=somewhat happy
3=very happy

97=Incorrect
response
98=No response
99=Not
Applicable

Ordinal

Spreadsheet Example
ID#

Age

Sex Happiness

1

1

1

2

2

2

2

2

3

3

1

2

4

57

2

2

5

45

2

3

6

66

2

3

7

2

2

3

8

88

2

3

Data Analysis with Excel
 Most

simple analyses can be done using
Excel, including correlation, regression
and even random number generation.
 Install the data analysis pack.


Go to tools, add-ins, and add the ‘analysis
tool pack’.

 Create

worksheet and codebook.
 Choose statistical test.


Follow commands in help menu.

Data Analysis with Excel


http://home.ubalt.edu/ntsbarsh/excel/excel.htm

Data Reporting and Presentation of
Data
 Graphical

summaries are a great way to
present your data
 Excel is great for creating tables and
graphs
 The type of data you have will reflect the
type of graphical summary you should
use.

Data Reporting and Presentation of
Descriptive Data


Categorical data:
Frequency Tables and
Bar Charts.
 Example: Fruit

Count

Percent

Valid
Percent

Pineapples

4

20%

21%

Apples

5

25%

26%

Oranges

10

50%

53%

Unknown

1

5%

_______

Total

20

100%

100%

Data Reporting and Presentation of
Descriptive Data
Fruit Study

Unknown
Oranges
Apples
Pineapples
0

5

10

Data Reporting and Presentation of
Descriptive Data


Continuous Data:
Tables and
Histograms
Age

Count

Percent

20-30

4

20%

31-40

5

25%

41-50

10

50%

51-60

1

5%

Total

20

100%

Data Reporting and Presentation of
Descriptive Data
10
8
20-30
31-40
41-50
51-60

6
4
2
0

20

30

40

50

What is the difference between a
Histogram and a Bar Chart?
 Histogram:

For continuous data where
data are divided into contiguous class
intervals (or in other words, connected
through unbroken sequence).

 Bar

Chart: For categorical data where
categories are not contiguous.

Measures of Central Tendency
 Reporting




 If

averages

Categorical data= Mode
Ordinal data= Median
Continuous data= Mean

there are outliers (or extreme values),
report the median instead of the mean.

Reporting Inferential Stats
 Handout


Resource- APA Guidelines

http://www.ilstu.edu/~jhkahn/apastats.html

Reporting Inferential Stats
 It’s

important to include means, standard
deviations and sample size in your results
section.

 Example:

Correlation
 Variable X was strongly correlated with
Variable Y, r=.59, p<.01.

Important to Keep your
Audience in Mind
Residency Project
Publication
Departmental Report

Aaron: TCPS certification for
residents reminder…

Questions?


Slide 22

Surviving Pharmacy
Residency Research: Tips
and Tricks for Statistical
Planning

Surviving Pharmacy Residency
Research: Tips and Tricks for
Statistical Planning
© Fraser Health Authority, 2007
The Fraser Health Authority (“FH”) authorizes the use, reproduction and/or
modification of this publication for purposes other than commercial redistribution. In
consideration for this authorization, the user agrees that any unmodified
reproduction of this publication shall retain all copyright and proprietary notices. If
the user modifies the content of this publication, all FH copyright notices shall be
removed, however FH shall be acknowledged as the author of the source
publication.
Reproduction or storage of this publication in any form by any means for the purpose
of commercial redistribution is strictly prohibited.
This publication is intended to provide general information only, and should not be
relied on as providing specific healthcare, legal or other professional advice. The
Fraser Health Authority, and every person involved in the creation of this publication,
disclaims any warranty, express or implied, as to its accuracy, completeness or
currency, and disclaims all liability in respect of any actions, including the results of
any actions, taken or not taken in reliance on the information contained herein.

FH Health Research Intelligence
Unit
How can we help?


Grant Facilitator-Writer
 Conducting a search for funding opportunities.
 Automatic notification of new funding sources and
deadlines.
 Identifying a research team.
 Preparing letters of intent.
 Identifying resources required for conducting research.
 Formulating the research budget.
 Writing the grant application in collaboration with
researchers.
 Understanding FH and funding agency requirements
regarding preparation of specific documents.

FH Health Research Intelligence
Unit
How can we help?











Epidemiologist
Specifying the research goal,
objectives and hypothesis.
Identifying measurable
outcomes.
Specifying the variables for
analysis.
Identifying sources of data.
Developing data collection
tools for quantitative or
qualitative studies.
Developing the statistical
analysis plan.
Understanding how to use
statistical software, such as
SPSS.

Workshop Outline










Research 101- Basic Research Steps
Research Question Refinement
Common Study Designs- Resource
Levels of Data
Power and Sample Size
Statistical Test Selection- Exercise
Data Reporting- Resource
Simple Stats with Excel- Resource

Pharmacy Residency Project










1) Develop a research question
2) Conduct thorough literature review
3) Re-define research question or hypothesis
4) Design research methodology/study
5) Create research proposal
6) Apply for funding
7) Apply for ethics approval
8) Collect and analyze data
9) Draw conclusions and relate findings

Research Question Refinement


Research question will describe in operational
terms, what you think will happen in the study.

Good Versus Bad Research
Question


Are patients who take
drug X more likely to
experience episodes
of delirium?



Do patients who
receive medication X
between September
2008 and November
2008 experience
more episodes of
delirium as compared
to patients who
received drug Y
during the same time
period?

Classification of Research
Studies
Observational Studies:
Descriptive Studies:
Focus on describing
populations and describing the
relationship between variables

Research Studies

Analytic Studies:
Observational
Descriptive

Experimental

Analytic

Make inferences about the
population based on a random
sample.
Experimental Studies:
Test relationships between
exposures and outcomes.
Investigator has direct control
over study condition and
exposure status.

Hierarchy of Studies
Type of study is selected
according to the purpose of
research.
Experimental Studies

Analytic Studies

Descriptive Studies

Levels of Evidence


HandoutResearch Design
Hierarchy

Probability Sampling Methods:
Random
 There

are several methods to choose

from:
 Simple random
sampling.

Probability Sampling Methods:
Stratified
 Stratified

sampling

(divide the population into
non-overlapping strata and
sample from within each
stratum independently).


Guarantees representation
of all important groups.

Probability Sampling Methods:
Systematic


Selection of the
sample using an
interval “k” so that
every “k” unit in the
frame is selected,
is called systematic
random sampling.

Probability Sampling Methods:
Systematic


Steps to achieve a systematic random sample:

1. Number the units in the population from 1 to N.
2. Decide on the n (sample size) that you want or need.


k = N/n = the interval size.

3. Randomly select an integer between 1 and k.
4. Then take every kth unit.
Example:
1. N=200
2. n=40, take N/n, 200/40=5 (interval size).
3. Randomly select a number between 1 and 5 (let’s pick 4).
4. Begin with 4, and take every 5th unit.

Probability Sampling Methods:
Cluster
 Cluster

sampling.
 Divide population into clusters and
randomly sample clusters.
 Measure all units within sampled clusters.
 Example: See blue areas on map.
Not just geographic areas,
could select hospitals,
schools etc.


Non-Probability Sampling
Methods


There are different types of non-probability
sampling methods as well:








Convenience (not representative of population).
Purposive (certain group in mind).
Expert sampling (seek out specific expertise).
Snowball sampling (ask people to participate, they
ask more people).

If you select non-probability sampling methods, the
conclusions drawn from the study results apply only
to that specific population.

Measurement: Levels of Data



The level of data will dictate which statistical test you
should use.
Categorical = Data that is classified into categories and
cannot be arranged in any particular order (e.g. Apples
and pears, gender, eye colour, ethnicity).



Ordinal = Data ordered, but distance between intervals
not always equal. (e.g. Low, middle and high income).



Continuous = equal distance between each interval
(e.g. 1,2,3., age).

Statistics and Statistical Test
Selection

Types of Statistics
 Descriptive

Statistics: Describes
 research findings


E.g. Frequencies, averages.

 Inferential

Statistics: Makes inferences
about the population, based on a random
sample.




In a random sample, each person/unit has an
equal chance of being selected
Allows generalizability to population.

Types of Variables


Variables can be classified as independent or
dependent.



An independent variable is the variable you
believe will influence your outcome measure.



A dependent variable is the variable that is
dependant on or influenced by independent
variable(s). The dependent variable can also
be the variable you are trying to predict.

Statistical Test Selection
Selecting the appropriate Statistical test requires
several steps:
Test selection should be based on:
1) What is your goal? Description? Comparison? Prediction? Quantify
association? Prove effectiveness? Prove causality?
2) What kind of data have you collected? What are the levels of data
(Nominal, ordinal, continuous)? Was your sample randomly selected?
3) Is your data normally distributed? Should you use a parametric or nonparametric test?
4) What are the assumptions of the statistical test you would like to
use? Does the data meet these assumptions?

Parametric Tests
• Parametric tests assume that the variable in question comes from a normally distributed population.
• Non-parametric tests do not require the assumption of normality.
• Most non-parametric tests do not require an interval level of measurement, so they can be used with nominal or ordinal data.
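Many non-parametric tests work with the order of the observations rather than their raw values, which is why they need neither normality nor an interval scale. As an illustration, a minimal sketch of the Mann-Whitney U statistic: for every pair of observations taken one from each group, it counts how often the first group's value is larger (ties count one half). It computes only the statistic; a p-value would come from tables or a statistics package. The function name and the scores are hypothetical.

```python
def mann_whitney_u(group_a, group_b):
    """Return (U_a, U_b) for two independent samples.

    U_a counts pairs (a, b) with a > b; ties count 0.5.
    U_a + U_b always equals len(group_a) * len(group_b).
    """
    u_a = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u_a += 1.0
            elif a == b:
                u_a += 0.5
    u_b = len(group_a) * len(group_b) - u_a
    return u_a, u_b


# Hypothetical ordinal scores (1 = not happy ... 3 = very happy) for two groups.
print(mann_whitney_u([1, 2, 2, 3], [2, 3, 3, 3]))  # (3.5, 12.5)
```

Only comparisons of "larger, equal or smaller" are used, so the result is the same whether the codes are 1, 2, 3 or 10, 20, 30.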

Assumptions
• Each test has its own assumptions. Before you select a test, be sure to check them.
• To find this information, contact a consultant or review statistical/research methods resources.
• Some examples of common assumptions are:
  - The dependent variable must be measured at a certain level (e.g. interval level).
  - The independent variable(s) must be measured at a certain level (e.g. ordinal level).
  - The population is normally distributed (not skewed).
If your data do not meet the assumptions for a specific test, you may be able to use a non-parametric test instead.
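One informal way to screen the "not skewed" assumption is the sample skewness coefficient. The sketch below uses the adjusted Fisher-Pearson coefficient (G1); the cut-off of 1 in `looks_skewed` is a common rule of thumb assumed here, not a formal test, so a histogram or a formal normality test in a statistics package should still be checked. The lengths of stay are hypothetical.

```python
import math


def sample_skewness(values):
    """Adjusted Fisher-Pearson skewness coefficient (G1).

    Roughly 0 for symmetric data, positive for a long right tail,
    negative for a long left tail.
    """
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 observations")
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    if m2 == 0:
        raise ValueError("all values are identical; skewness is undefined")
    g1 = m3 / m2 ** 1.5
    return math.sqrt(n * (n - 1)) / (n - 2) * g1


def looks_skewed(values, cutoff=1.0):
    """Rule-of-thumb flag only: |G1| above the cutoff suggests marked skew."""
    return abs(sample_skewness(values)) > cutoff


# Hypothetical lengths of stay (days) with one long stay.
stay = [2, 3, 3, 4, 4, 5, 6, 21]
print(round(sample_skewness(stay), 2), looks_skewed(stay))
```

A strongly skewed result like this one is a prompt to consider the non-parametric column of the test selection table.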

Type of Data
Goal | Measurement (Normal Population) | Ordinal, or Non-Normal Population | Binomial (Two Possible Outcomes) | Survival Time
Describe one group | Mean, SD | Median, interquartile range | Proportion | Kaplan Meier survival curve
Compare one group to a hypothetical value | One-sample t test | Wilcoxon test | Chi-square or Binomial test** | (none listed)
Compare two unpaired groups | Unpaired t test | Mann-Whitney test | Fisher's test (chi-square for large samples) | Log-rank test or Mantel-Haenszel*
Compare two paired groups | Paired t test | Wilcoxon test | McNemar's test | Conditional proportional hazards regression*
Compare three or more unmatched groups | One-way ANOVA | Kruskal-Wallis test | Chi-square test | Cox proportional hazard regression**
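The test selection table above is essentially a two-way lookup: goal (row) by type of data (column). A minimal sketch encodes it as a dictionary; the short column keys ("normal", "non-normal", "binomial", "survival") and the function name `choose_test` are hypothetical labels for this illustration, and the table's * and ** footnote markers are dropped.

```python
# Rows are goals; columns are the four types of data in the test selection table.
TEST_TABLE = {
    "describe one group": {
        "normal": "Mean, SD",
        "non-normal": "Median, interquartile range",
        "binomial": "Proportion",
        "survival": "Kaplan Meier survival curve",
    },
    "compare one group to a hypothetical value": {
        "normal": "One-sample t test",
        "non-normal": "Wilcoxon test",
        "binomial": "Chi-square or Binomial test",
        "survival": None,  # no entry in the table
    },
    "compare two unpaired groups": {
        "normal": "Unpaired t test",
        "non-normal": "Mann-Whitney test",
        "binomial": "Fisher's test (chi-square for large samples)",
        "survival": "Log-rank test or Mantel-Haenszel",
    },
    "compare two paired groups": {
        "normal": "Paired t test",
        "non-normal": "Wilcoxon test",
        "binomial": "McNemar's test",
        "survival": "Conditional proportional hazards regression",
    },
    "compare three or more unmatched groups": {
        "normal": "One-way ANOVA",
        "non-normal": "Kruskal-Wallis test",
        "binomial": "Chi-square test",
        "survival": "Cox proportional hazard regression",
    },
}


def choose_test(goal, data_type):
    """Look up a suggested test; data_type is normal, non-normal, binomial or survival."""
    row = TEST_TABLE.get(goal.strip().lower())
    if row is None:
        raise ValueError(f"unknown goal: {goal!r}")
    if data_type not in row:
        raise ValueError(f"unknown data type: {data_type!r}")
    if row[data_type] is None:
        raise LookupError(f"the table lists no test for {goal!r} with {data_type} data")
    return row[data_type]


print(choose_test("Compare two paired groups", "normal"))  # Paired t test
```

A lookup like this only encodes the table; checking each test's assumptions is still a separate step.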

Statistical Test Selection Group Exercise
• Using your tables, select the appropriate statistical tests for 10 research scenarios.

During the group exercise…
Steps to choose the appropriate statistical method for the data analysis:
1. Identify whether the research problem raises a question of description, relationship (association), or comparison (difference).
2. Identify the levels of measurement in the research question (nominal/categorical, ordinal/rank, continuous/evenly spaced).
3. Identify the number of variables or samples being described, related, or compared.
4. Identify whether the comparison samples are related (e.g. the same group analyzed before and after, or matched pairs) or independent (unrelated, different groups).
5. Choose the appropriate statistical tool for the data and situation using the decision tree in the handout.



1. A pilot experiment designed to test the effectiveness of a new approach to electrode placement for electroconvulsive therapy (ECT) has been conducted over a one-year period in the Fraser Health Authority.
Patients from two different mood disorder clinics participated in this study. Patients from Clinic X received ECT according to current practice guidelines. Patients from Clinic Y received a new exploratory ECT treatment. Patients in each clinic were matched for age, gender, and type of disorder, and a random sample of 30 matched pairs of patients was selected for inclusion in the study. At the end of the year, patients were administered a memory test yielding a total score out of 100.
Dr. Vasdil would like to know which statistical procedure should be used to test for differences between the two groups of patients on the memory test.
What is the question: Compare
How many samples: 2
Related or independent: Related (matched pairs)
What is the level of measurement: Continuous
How many dependent variables: 1
Test: Paired t test
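The scenario's answer is a t test, and step 4 (related or independent samples) decides which variant applies. A minimal sketch computes both the unpaired (pooled-variance) t statistic and the paired t statistic, which works on the within-pair differences. Only t and the degrees of freedom are returned; the p-value would come from a t table or statistics package. The function names and memory scores are hypothetical, not study data.

```python
import math
import statistics


def unpaired_t(group_a, group_b):
    """Student's t for two independent groups (pooled variance). Returns (t, df)."""
    n_a, n_b = len(group_a), len(group_b)
    # statistics.variance uses the n - 1 (sample) denominator.
    pooled = ((n_a - 1) * statistics.variance(group_a)
              + (n_b - 1) * statistics.variance(group_b)) / (n_a + n_b - 2)
    se = math.sqrt(pooled * (1 / n_a + 1 / n_b))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / se, n_a + n_b - 2


def paired_t(group_a, group_b):
    """Paired t computed on the within-pair differences a - b. Returns (t, df)."""
    if len(group_a) != len(group_b):
        raise ValueError("paired data need one observation per group in every pair")
    diffs = [a - b for a, b in zip(group_a, group_b)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se, len(diffs) - 1


# Hypothetical memory scores (out of 100) for five matched pairs, in pair order.
clinic_y = [61, 62, 63, 64, 65]
clinic_x = [58, 61, 59, 62, 60]
print(paired_t(clinic_y, clinic_x))
print(unpaired_t(clinic_y, clinic_x))
```

With these made-up numbers the paired statistic (about 4.24 on 4 df) is larger than the unpaired one (3.0 on 8 df), because pairing removes the variation between pairs. Running the unpaired version on matched data throws that advantage away.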

Sample Size
There are several rules of thumb for determining sample size:
1) It's a good idea to have a minimum of 30 cases (as a total group or, if comparing groups, 30 in each group). If you have fewer, you can use a non-parametric test, but it is still better to have close to 30 cases.
2) If using regression, it is best to have between 10 and 50 cases per independent variable.
3) If you are validating a survey, never have more questions than cases.
4) If the total population you are examining has fewer than 30 members, use all of them.
5) For pilot studies, the recommendation is a sample size of 12 per group.
6) For surveys, a sample size of 400 per group is sufficient for almost any analysis.
7) For surveys, a 30% response rate is the bare minimum.
Note: For a precise sample size estimate, you will need to conduct a power analysis.
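Rules 6 and 7 together determine how many surveys to distribute: divide the number of completed surveys you need by the response rate you expect, and round up. A minimal sketch, assuming the expected response rate is given as a whole-number percentage (integer arithmetic avoids floating-point rounding surprises); the function name is hypothetical.

```python
def invitations_needed(target_responses, response_rate_percent):
    """Surveys to send so that the expected number of returns reaches the target."""
    if not 0 < response_rate_percent <= 100:
        raise ValueError("response rate must be between 1 and 100 percent")
    # Integer ceiling division: (a + b - 1) // b rounds a / b up.
    return (target_responses * 100 + response_rate_percent - 1) // response_rate_percent


print(invitations_needed(400, 30))  # 1334
```

Rounding up matters: sending 1333 surveys at a 30% response rate would be expected to return slightly fewer than 400.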






Statistical Power
• Power is the probability that a statistical test will detect an effect when one truly exists.
  - It takes a value between 0 and 1 (often expressed as a percentage).
• Power = 1 - β, where β is the probability of a Type II error.
  - Type II error: failing to reject a null hypothesis that is false (missing a real effect).
  - Type I error: rejecting a null hypothesis that is true (reporting an effect that is not there).

Types of Power Analysis
• A priori: Conducted before the study commences (at the proposal stage).
• Post hoc: Conducted after the study has been completed.
• Easy ways to increase power?
  - Increase the sample size.
  - Increase the effect size.
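To see why a larger sample or a larger effect raises power, a minimal sketch uses the normal approximation for a two-tailed comparison of two group means: power ≈ Φ(d·√(n/2) - z), where Φ is the standard normal CDF, d is the standardized effect size (difference in means divided by the SD), n is the number per group, and z is the critical value for alpha (1.96 for .05). The approximation and the function name are assumptions of this sketch; dedicated power software uses the t distribution and gives slightly lower power for small samples.

```python
from statistics import NormalDist


def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-tailed comparison of two group means.

    effect_size is the standardized difference d (difference in means / SD).
    Normal approximation; the negligible opposite tail is ignored.
    """
    std_normal = NormalDist()  # mean 0, SD 1
    z_crit = std_normal.inv_cdf(1 - alpha / 2)  # 1.96 for alpha = .05
    return std_normal.cdf(effect_size * (n_per_group / 2) ** 0.5 - z_crit)


for n in (20, 50, 100):  # larger samples give more power
    print("d = 0.5, n =", n, "power =", round(approx_power(0.5, n), 2))
for d in (0.2, 0.5, 0.8):  # larger effects give more power
    print("d =", d, "n = 50, power =", round(approx_power(d, 50), 2))
```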

Components Involved in Power Calculation
• Sample size: Number of cases.
• Effect size: Magnitude of the trend relative to its variation.
• Alpha level: Probability of concluding that an effect is present when the result is actually due to chance alone (.05 or .01).
  - Also known as the Type I error rate: the probability of rejecting a null hypothesis that is true.
• Power level: 80-90% is common.
• One- or two-tailed test: Two-tailed is common.

Components Involved in Power Calculation
• Sample size: What we want to find out.
• Effect size: Magnitude of the trend… but what if you don't know it?
  - Look to pilot data or the literature.
  - Keep in mind that the smaller the effect size, the larger the sample size required.
• Alpha level: .05
• Power level: 80-90%
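With alpha, power and effect size fixed, the sample size can be solved for. A minimal sketch, assuming a two-tailed comparison of two group means and the normal-approximation formula n = 2(z₁₋α/₂ + z₁₋β)² / d² per group, where d is the standardized effect size. For alpha .05, 80% power and d = 0.5 it returns 63 per group; exact t-based calculations typically give a slightly larger figure (64), so treat this as a planning estimate to confirm with a consultant or power software. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-tailed two-group comparison."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # 1.96 for alpha = .05
    z_beta = std_normal.inv_cdf(power)           # 0.84 for 80% power
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)


for d in (0.2, 0.5, 0.8):  # smaller effects need much larger samples
    print("d =", d, "->", n_per_group(d), "per group")
```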

Important Consultation Information
• What is your research question?
• Components of the power calculation
• Levels of data (nominal, ordinal, continuous)
• Sampling plan

Data Organization: Codebook
• What is a codebook?
  - A codebook is a log of your variables (and their levels of data) and how you will code them.
  - A codebook helps everyone understand the coding scheme, so the whole team is on the same page!

Data Processing and Analyses: Codebook Example
Variable Name | Variable Label | Values | Coding | Missing | Variable Type
age | age | 1,2,3,4,5 | 1=10-20 years; 2=21-30 years; 3=31-40 years; 4=41-50 years; 5=51+ years | 97=Incorrect response; 98=No response; 99=Not Applicable | Ordinal
sex | sex | 1,2 | 1=male; 2=female | 97=Incorrect response; 98=No response; 99=Not Applicable | Nominal
happiness | happiness at work | 1,2,3 | 1=not happy; 2=somewhat happy; 3=very happy | 97=Incorrect response; 98=No response; 99=Not Applicable | Ordinal
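A codebook can also be kept in a form that software can check against. A minimal sketch, assuming the example codebook is stored as a Python dictionary (the structure and the `decode` function are hypothetical), translates stored codes into labels, reports the 97/98/99 missing codes, and rejects anything not listed.

```python
MISSING_CODES = {97: "Incorrect response", 98: "No response", 99: "Not applicable"}

CODEBOOK = {
    "age": {"label": "age", "type": "ordinal",
            "codes": {1: "10-20 years", 2: "21-30 years", 3: "31-40 years",
                      4: "41-50 years", 5: "51+ years"}},
    "sex": {"label": "sex", "type": "nominal",
            "codes": {1: "male", 2: "female"}},
    "happiness": {"label": "happiness at work", "type": "ordinal",
                  "codes": {1: "not happy", 2: "somewhat happy", 3: "very happy"}},
}


def decode(variable, value):
    """Translate a stored code into its label, flagging missing and invalid codes."""
    entry = CODEBOOK[variable]
    if value in entry["codes"]:
        return entry["codes"][value]
    if value in MISSING_CODES:
        return f"MISSING ({MISSING_CODES[value]})"
    raise ValueError(f"{value!r} is not a valid code for {variable!r}")


print(decode("happiness", 3))  # very happy
print(decode("sex", 98))       # MISSING (No response)
```

A check like this would reject entries such as 57 in the Age column of the spreadsheet example that follows, since only codes 1 to 5 and 97 to 99 are listed for age.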

Spreadsheet Example
ID# | Age | Sex | Happiness
1 | 1 | 1 | 2
2 | 2 | 2 | 2
3 | 3 | 1 | 2
4 | 57 | 2 | 2
5 | 45 | 2 | 3
6 | 66 | 2 | 3
7 | 2 | 2 | 3
8 | 88 | 2 | 3

Data Analysis with Excel
• Most simple analyses can be done using Excel, including correlation, regression and even random number generation.
• Install the Analysis ToolPak.
  - Go to Tools, Add-Ins, and add the 'Analysis ToolPak'.
• Create your worksheet and codebook.
• Choose a statistical test.
  - Follow the commands in the Help menu.

Data Analysis with Excel


http://home.ubalt.edu/ntsbarsh/excel/excel.htm

Data Reporting and Presentation of Data
• Graphical summaries are a great way to present your data.
• Excel is great for creating tables and graphs.
• The type of data you have will determine the type of graphical summary you should use.

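Data Reporting and Presentation of Descriptive Data
• Categorical data: Frequency tables and bar charts.
• Example:
Fruit | Count | Percent | Valid Percent
Pineapples | 4 | 20% | 21%
Apples | 5 | 25% | 26%
Oranges | 10 | 50% | 53%
Unknown | 1 | 5% | _______
Total | 20 | 100% | 100%
Valid percent leaves out the unknown (missing) response, so it is calculated out of the 19 known responses.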
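The percent and valid percent columns differ only in their denominators. A minimal sketch, using the counts from the fruit example and treating "Unknown" as the missing category (the function name is hypothetical), rebuilds the table with the standard library's `collections.Counter`.

```python
from collections import Counter


def frequency_table(responses, missing=("Unknown",)):
    """Rows of (category, count, percent, valid percent).

    Percent uses all responses; valid percent leaves out the missing
    categories, so it is None for those rows.
    """
    counts = Counter(responses)  # keeps the order in which categories first appear
    total = sum(counts.values())
    valid_total = sum(c for cat, c in counts.items() if cat not in missing)
    rows = []
    for category, count in counts.items():
        valid = None if category in missing else 100 * count / valid_total
        rows.append((category, count, 100 * count / total, valid))
    return rows


# Counts from the fruit example.
fruit = ["Pineapples"] * 4 + ["Apples"] * 5 + ["Oranges"] * 10 + ["Unknown"]
for category, count, percent, valid in frequency_table(fruit):
    valid_text = "-" if valid is None else f"{valid:.0f}%"
    print(f"{category:<12}{count:>4}{percent:>6.0f}%{valid_text:>6}")
```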

Data Reporting and Presentation of Descriptive Data
[Figure: "Fruit Study" bar chart of counts (axis 0-10) for Pineapples, Apples, Oranges and Unknown]

Data Reporting and Presentation of Descriptive Data
• Continuous data: Tables and histograms.
Age | Count | Percent
20-30 | 4 | 20%
31-40 | 5 | 25%
41-50 | 10 | 50%
51-60 | 1 | 5%
Total | 20 | 100%

Data Reporting and Presentation of Descriptive Data
[Figure: histogram of age counts (axis 0-10) for the intervals 20-30, 31-40, 41-50 and 51-60]

What is the difference between a Histogram and a Bar Chart?
• Histogram: For continuous data, where the data are divided into contiguous class intervals (in other words, intervals connected in an unbroken sequence).
• Bar chart: For categorical data, where the categories are not contiguous.
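Because histogram classes are contiguous, every continuous value falls into exactly one interval. A minimal sketch of that binning step, assuming whole-number ages and the class intervals from the age table written as boundaries 20, 31, 41, 51, 61; the function name and the ages are hypothetical.

```python
import bisect


def bin_counts(values, edges):
    """Count values in contiguous class intervals [edges[i], edges[i + 1]).

    With whole-number ages, edges 20, 31, 41, 51, 61 give the classes
    20-30, 31-40, 41-50 and 51-60 used in the age table.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        i = bisect.bisect_right(edges, v) - 1  # index of the interval holding v
        if not 0 <= i < len(counts):
            raise ValueError(f"{v} falls outside the class intervals")
        counts[i] += 1
    return counts


# Hypothetical ages.
ages = [22, 30, 31, 45, 50, 51, 60]
print(bin_counts(ages, [20, 31, 41, 51, 61]))  # [2, 1, 2, 2]
```

Categorical data need no such step: each bar in a bar chart is simply the count for one category.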

Measures of Central Tendency
• Reporting averages:
  - Categorical data = Mode
  - Ordinal data = Median
  - Continuous data = Mean
• If there are outliers (or extreme values), report the median instead of the mean.
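The rules above map directly onto Python's standard `statistics` module. A minimal sketch; the function name and data are hypothetical, and using `median_low` for ordinal codes is a design choice of this sketch, so the reported average is always an actual category (e.g. 2 = somewhat happy) rather than a value between codes such as 2.5.

```python
import statistics


def average_for(values, level):
    """Return the average suited to the level of data."""
    if level == "categorical":
        return statistics.mode(values)
    if level == "ordinal":
        # median_low always returns an observed code, never a midpoint.
        return statistics.median_low(values)
    if level == "continuous":
        return statistics.mean(values)
    raise ValueError(f"unknown level of data: {level!r}")


# Hypothetical lengths of stay (days); 45 is an extreme value.
stay = [3, 4, 4, 5, 6, 7, 45]
print(average_for(stay, "continuous"))  # mean, pulled up by the outlier
print(statistics.median(stay))          # median, barely affected
print(average_for(["apples", "pears", "apples"], "categorical"))
print(average_for([1, 2, 3, 3], "ordinal"))  # happiness codes
```

Here the mean (about 10.6 days) describes none of the patients well, while the median (5 days) does.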

Reporting Inferential Stats
• Handout
  - Resource: APA Guidelines
http://www.ilstu.edu/~jhkahn/apastats.html

Reporting Inferential Stats
• It's important to include means, standard deviations and sample sizes in your results section.
• Example: Correlation
  - Variable X was strongly correlated with Variable Y, r = .59, p < .01.
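A minimal sketch of computing Pearson's r and formatting it in the style of the example (two decimals, no leading zero). The function names and data are hypothetical; the p-value requires the t distribution, which is not in Python's standard library, so it would come from Excel or a statistics package.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient for paired observations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def apa_r(r):
    """Format r with two decimals and no leading zero (e.g. .59)."""
    text = f"{abs(r):.2f}"
    if text.startswith("0"):
        text = text[1:]
    return ("-" if round(r, 2) < 0 else "") + text


# Hypothetical paired measurements.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
print("r =", apa_r(pearson_r(x, y)))  # r = .80
```

The leading zero is dropped because r cannot exceed 1 in absolute value.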

Important to Keep your Audience in Mind
• Residency Project
• Publication
• Departmental Report

Aaron: TCPS certification for residents reminder…

Questions?

Slide 24

Surviving Pharmacy
Residency Research: Tips
and Tricks for Statistical
Planning

Surviving Pharmacy Residency
Research: Tips and Tricks for
Statistical Planning
© Fraser Health Authority, 2007
The Fraser Health Authority (“FH”) authorizes the use, reproduction and/or
modification of this publication for purposes other than commercial redistribution. In
consideration for this authorization, the user agrees that any unmodified
reproduction of this publication shall retain all copyright and proprietary notices. If
the user modifies the content of this publication, all FH copyright notices shall be
removed, however FH shall be acknowledged as the author of the source
publication.
Reproduction or storage of this publication in any form by any means for the purpose
of commercial redistribution is strictly prohibited.
This publication is intended to provide general information only, and should not be
relied on as providing specific healthcare, legal or other professional advice. The
Fraser Health Authority, and every person involved in the creation of this publication,
disclaims any warranty, express or implied, as to its accuracy, completeness or
currency, and disclaims all liability in respect of any actions, including the results of
any actions, taken or not taken in reliance on the information contained herein.

FH Health Research Intelligence
Unit
How can we help?


Grant Facilitator-Writer
 Conducting a search for funding opportunities.
 Automatic notification of new funding sources and
deadlines.
 Identifying a research team.
 Preparing letters of intent.
 Identifying resources required for conducting research.
 Formulating the research budget.
 Writing the grant application in collaboration with
researchers.
 Understanding FH and funding agency requirements
regarding preparation of specific documents.

FH Health Research Intelligence
Unit
How can we help?











Epidemiologist
Specifying the research goal,
objectives and hypothesis.
Identifying measurable
outcomes.
Specifying the variables for
analysis.
Identifying sources of data.
Developing data collection
tools for quantitative or
qualitative studies.
Developing the statistical
analysis plan.
Understanding how to use
statistical software, such as
SPSS.

Workshop Outline










Research 101- Basic Research Steps
Research Question Refinement
Common Study Designs- Resource
Levels of Data
Power and Sample Size
Statistical Test Selection- Exercise
Data Reporting- Resource
Simple Stats with Excel- Resource

Pharmacy Residency Project










1) Develop a research question
2) Conduct thorough literature review
3) Re-define research question or hypothesis
4) Design research methodology/study
5) Create research proposal
6) Apply for funding
7) Apply for ethics approval
8) Collect and analyze data
9) Draw conclusions and relate findings

Research Question Refinement


Research question will describe in operational
terms, what you think will happen in the study.

Good Versus Bad Research
Question


Are patients who take
drug X more likely to
experience episodes
of delirium?



Do patients who
receive medication X
between September
2008 and November
2008 experience
more episodes of
delirium as compared
to patients who
received drug Y
during the same time
period?

Classification of Research
Studies
Observational Studies:
Descriptive Studies:
Focus on describing
populations and describing the
relationship between variables

Research Studies

Analytic Studies:
Observational
Descriptive

Experimental

Analytic

Make inferences about the
population based on a random
sample.
Experimental Studies:
Test relationships between
exposures and outcomes.
Investigator has direct control
over study condition and
exposure status.

Hierarchy of Studies
Type of study is selected
according to the purpose of
research.
Experimental Studies

Analytic Studies

Descriptive Studies

Levels of Evidence


HandoutResearch Design
Hierarchy

Probability Sampling Methods:
Random
 There

are several methods to choose

from:
 Simple random
sampling.

Probability Sampling Methods:
Stratified
 Stratified

sampling

(divide the population into
non-overlapping strata and
sample from within each
stratum independently).


Guarantees representation
of all important groups.

Probability Sampling Methods:
Systematic


Selection of the
sample using an
interval “k” so that
every “k” unit in the
frame is selected,
is called systematic
random sampling.

Probability Sampling Methods:
Systematic


Steps to achieve a systematic random sample:

1. Number the units in the population from 1 to N.
2. Decide on the n (sample size) that you want or need.


k = N/n = the interval size.

3. Randomly select an integer between 1 and k.
4. Then take every kth unit.
Example:
1. N=200
2. n=40, take N/n, 200/40=5 (interval size).
3. Randomly select a number between 1 and 5 (let’s pick 4).
4. Begin with 4, and take every 5th unit.

Probability Sampling Methods:
Cluster
 Cluster

sampling.
 Divide population into clusters and
randomly sample clusters.
 Measure all units within sampled clusters.
 Example: See blue areas on map.
Not just geographic areas,
could select hospitals,
schools etc.


Non-Probability Sampling
Methods


There are different types of non-probability
sampling methods as well:








Convenience (not representative of population).
Purposive (certain group in mind).
Expert sampling (seek out specific expertise).
Snowball sampling (ask people to participate, they
ask more people).

If you select non-probability sampling methods, the
conclusions drawn from the study results apply only
to that specific population.

Measurement: Levels of Data



The level of data will dictate which statistical test you
should use.
Categorical = Data that is classified into categories and
cannot be arranged in any particular order (e.g. Apples
and pears, gender, eye colour, ethnicity).



Ordinal = Data ordered, but distance between intervals
not always equal. (e.g. Low, middle and high income).



Continuous = equal distance between each interval
(e.g. 1,2,3., age).

Statistics and Statistical Test
Selection

Types of Statistics
 Descriptive

Statistics: Describes
 research findings


E.g. Frequencies, averages.

 Inferential

Statistics: Makes inferences
about the population, based on a random
sample.




In a random sample, each person/unit has an
equal chance of being selected
Allows generalizability to population.

Types of Variables
Variables can be classified as independent or dependent.
• An independent variable is the variable you believe will influence your outcome measure.
• A dependent variable is the variable that depends on, or is influenced by, the independent variable(s). The dependent variable can also be the variable you are trying to predict.

Statistical Test Selection
Selecting the appropriate statistical test should be based on:
1) What is your goal? Description? Comparison? Prediction? Quantifying an association? Demonstrating effectiveness? Demonstrating causality?
2) What kind of data have you collected? What are the levels of data (nominal, ordinal, continuous)? Was your sample randomly selected?
3) Are your data normally distributed? Should you use a parametric or non-parametric test?
4) What are the assumptions of the statistical test you would like to use? Do your data meet these assumptions?

Parametric Tests
• Parametric tests assume that the variable in question comes from a normally distributed population.
• Non-parametric tests do not require the assumption of normality.
• Most non-parametric tests also do not require an interval level of measurement; they can be used with nominal or ordinal data.
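To see the difference in practice, the Python sketch below computes two statistics on the same data. The first is a parametric statistic, the pooled two-sample (Student's) t. The second is a rank-based, non-parametric one, the Mann-Whitney U. The sketch uses only the standard library (Python 3.8+), and the function names are hypothetical. It computes only the test statistics; for p-values, use a statistics package or consult a statistician.

```python
import math
import statistics

def pooled_t(a, b):
    """Student's two-sample t statistic (parametric; assumes normality and equal variances)."""
    na, nb = len(a), len(b)
    # Pooled variance from the two sample variances (n - 1 denominators).
    sp2 = ((na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))

def average_ranks(values):
    """Rank values 1..n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over every value tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1  # 0-based positions i..j hold ranks i+1..j+1
        i = j + 1
    return ranks

def mann_whitney_u(a, b):
    """Mann-Whitney U for group a (non-parametric; uses only the ranks of the pooled data)."""
    ranks = average_ranks(list(a) + list(b))
    return sum(ranks[:len(a)]) - len(a) * (len(a) + 1) / 2

a = [1, 2, 3]
print(pooled_t(a, [4, 5, 6]), mann_whitney_u(a, [4, 5, 6]))
print(pooled_t(a, [4, 5, 60]), mann_whitney_u(a, [4, 5, 60]))  # one extreme value in group b
```

Replacing the 6 in the second group with an extreme value of 60 shrinks t from about -3.67 to about -1.13. U stays at 0, because every value in the second group still ranks above the first group.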

Assumptions
Each test has its own assumptions. Before you select a test, be sure to check them; you can find this information by contacting a consultant or reviewing statistical/research methods resources.
Some examples of common assumptions are:
• The dependent variable must be measured at a certain level (e.g., interval level).
• The independent variable(s) must be measured at a certain level (e.g., ordinal level).
• The population is normally distributed (not skewed).
If your data do not meet the assumptions for a specific test, you may be able to use a non-parametric test instead.

Type of Data
Goal | Measurement (normal population) | Ordinal, or non-normal population | Binomial (two possible outcomes) | Survival time
Describe one group | Mean, SD | Median, interquartile range | Proportion | Kaplan-Meier survival curve
Compare one group to a hypothetical value | One-sample t test | Wilcoxon test | Chi-square or binomial test** | -
Compare two unpaired groups | Unpaired t test | Mann-Whitney test | Fisher's test (chi-square for large samples) | Log-rank test or Mantel-Haenszel*
Compare two paired groups | Paired t test | Wilcoxon test | McNemar's test | Conditional proportional hazards regression*
Compare three or more unmatched groups | One-way ANOVA | Kruskal-Wallis test | Chi-square test | Cox proportional hazard regression**
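The table can also be kept as a simple lookup, so that a goal and a type of data map straight to the suggested test. The Python sketch below transcribes the table above. The shorthand keys (`normal`, `ordinal/non-normal`, `binomial`, `survival`) and the function name are hypothetical. The footnote asterisks are left out because their footnotes are not reproduced here.

```python
# Goal -> type of data -> suggested test, transcribed from the table above.
TEST_TABLE = {
    "describe one group": {
        "normal": "Mean, SD",
        "ordinal/non-normal": "Median, interquartile range",
        "binomial": "Proportion",
        "survival": "Kaplan-Meier survival curve",
    },
    "compare one group to a hypothetical value": {
        "normal": "One-sample t test",
        "ordinal/non-normal": "Wilcoxon test",
        "binomial": "Chi-square or binomial test",
        # The table gives no entry for survival data here.
    },
    "compare two unpaired groups": {
        "normal": "Unpaired t test",
        "ordinal/non-normal": "Mann-Whitney test",
        "binomial": "Fisher's test (chi-square for large samples)",
        "survival": "Log-rank test or Mantel-Haenszel",
    },
    "compare two paired groups": {
        "normal": "Paired t test",
        "ordinal/non-normal": "Wilcoxon test",
        "binomial": "McNemar's test",
        "survival": "Conditional proportional hazards regression",
    },
    "compare three or more unmatched groups": {
        "normal": "One-way ANOVA",
        "ordinal/non-normal": "Kruskal-Wallis test",
        "binomial": "Chi-square test",
        "survival": "Cox proportional hazard regression",
    },
}

def suggest_test(goal, data_type):
    """Look up the table's suggestion; raise ValueError if the table has no entry."""
    try:
        return TEST_TABLE[goal][data_type]
    except KeyError:
        raise ValueError(f"no entry for goal={goal!r}, data_type={data_type!r}") from None

print(suggest_test("compare two paired groups", "normal"))  # Paired t test
```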

Statistical Test Selection Group Exercise
• Using your tables, select the appropriate statistical tests for 10 research scenarios.

During the group exercise…
Steps to choose the appropriate statistical method for the data analysis:
1. Identify whether the research problem asks you to describe, relate (association), or compare (difference).
2. Identify the levels of measurement in the research question (nominal/categorical, ordinal/rank, continuous/evenly spaced).
3. Identify the number of variables or samples being described, related, or compared.
4. Identify whether the comparison samples are related (the same group analyzed before and after, or matched pairs) or independent (unrelated, different groups).
5. Choose the appropriate statistical tool for the data and situation using the decision tree in the handout.
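Step 4 matters because related samples are analyzed through the differences within each pair, rather than as two separate groups. As an illustration only, here is a minimal Python sketch of the paired t statistic using the standard library. The function name and the before/after scores are made up, and the p-value step is left to a statistics package.

```python
import math
import statistics

def paired_t(before, after):
    """Paired t statistic: a one-sample t test on the within-pair differences."""
    if len(before) != len(after):
        raise ValueError("each observation needs a partner")
    diffs = [post - pre for pre, post in zip(before, after)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    return statistics.mean(diffs) / se, n - 1   # (t statistic, degrees of freedom)

# Made-up memory scores for the same five patients, measured before and after.
before = [70, 65, 80, 75, 60]
after = [72, 70, 79, 80, 66]
t, df = paired_t(before, after)
print(round(t, 2), df)
```

With these made-up scores the mean difference is 3.4 points, giving t ≈ 2.64 with 4 degrees of freedom.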



1. A pilot experiment designed to test the effectiveness of a new approach to electrode placement for electroconvulsive therapy (ECT) has been conducted over a one-year period in the Fraser Health Authority.
Patients from two different mood disorder clinics participated in this study. Patients from Clinic X received ECT according to current practice guidelines. Patients from Clinic Y received a new exploratory ECT treatment. Patients in the two clinics were matched for age, gender, and type of disorder, and a random sample of 30 matched pairs of patients was selected for inclusion in the study. At the end of the year, patients were administered a memory test yielding a total score out of 100.
Dr. Vasdil would like to know which statistical procedure to use to test for differences between the two groups of patients on the memory test.
What is the question: Compare
How many samples: 2
Related or independent: Related (patients were matched in pairs)
What is the level of measurement: Continuous
How many dependent variables: 1
Test: Paired t test

Sample Size
There are several rules of thumb for determining sample size:
1) It's a good idea to have a minimum of 30 cases (as a total group or, if comparing groups, 30 for each group). If you have fewer, you can use a non-parametric test, but it is still better to have close to 30 cases.
2) If using regression, it is best to have between 10 and 50 cases per independent variable.
3) If you are validating a survey, never have more questions than cases.
4) If the total population you are examining is smaller than 30, use all of them.
5) For pilot studies, the recommendation is a sample size of 12 per group.
6) For surveys, a sample size of 400 per group is enough for most analyses (for a proportion, it gives a margin of error of about ±5 percentage points at 95% confidence).
7) For surveys, a 30% response rate is the bare minimum.
Note: For a precise sample size estimate, you will need to conduct a power analysis.

Statistical Power
• Power is the probability that a statistical test will detect an effect if one truly exists.
• It takes a value between 0 and 1 (often expressed as a percentage).
• Power = 1 - β, where β is the probability of a Type II error.
• Type II error: failing to reject the null hypothesis when it is false (missing a real effect).
• Type I error: rejecting the null hypothesis when it is true (concluding there is an effect when there is none).

Types of Power Analysis
• A priori: conducted before the study commences (at the proposal stage).
• Post hoc: conducted after the study has been completed.
• Easy ways to increase power: increase the sample size, or increase the effect size.
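The link between sample size, effect size and power can be made concrete with the usual normal approximation for comparing two group means. This sketch assumes a two-tailed test, equal group sizes, and a standardized effect size d (the difference in means divided by the standard deviation). The function name is hypothetical. Dedicated software such as G*Power uses the exact t distribution and gives slightly different numbers.

```python
import math
from statistics import NormalDist

def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-tailed, two-sample comparison of means.

    Normal approximation; the tiny chance of rejecting in the wrong
    direction is ignored.
    """
    z = NormalDist()  # standard normal distribution
    z_crit = z.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = .05
    return z.cdf(d * math.sqrt(n_per_group / 2) - z_crit)

# Same medium effect (d = 0.5), increasing sample size per group.
for n in (20, 64, 100):
    print(n, round(approx_power(0.5, n), 2))
```

For d = 0.5, power rises from roughly 0.35 with 20 cases per group to about 0.81 with 64 and about 0.94 with 100. This illustrates the first of the two easy ways to increase power.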

Components Involved in Power Calculation
• Sample size: the number of cases.
• Effect size: the magnitude of the trend (e.g., a difference between groups) relative to the variation in the data.
• Alpha level: the probability of concluding that an effect is present when the observed result is actually due to chance alone (.05 or .01). Alpha is the probability of a Type I error, the error of rejecting a null hypothesis that is true.
• Power level: 80-90% is common.
• One- or two-tailed test: two-tailed is common.

Components Involved in Power Calculation
• Sample size: what we want to find out.
• Effect size: the magnitude of the trend… but what if you don't know it? Look to pilot data or the literature. Keep in mind that the smaller the effect size, the larger the sample size required.
• Alpha level: .05
• Power level: 80-90%
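For an a priori estimate, the same normal approximation can be solved for n. This sketch plugs in the components listed above (alpha .05, power 80-90%, two-tailed) for a two-group comparison of means. The function name is hypothetical. Expressing effect size as a standardized d (difference in means divided by the standard deviation) is an assumption of the sketch.

```python
import math
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate cases needed per group for a two-tailed, two-sample comparison of means."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the chosen alpha
    z_beta = z.inv_cdf(power)           # value matching the desired power (1 - beta)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)  # round up to whole cases

print(n_per_group(0.5))              # medium effect, alpha .05, power 80%
print(n_per_group(0.5, power=0.90))  # same effect, power 90%
print(n_per_group(0.25))             # half the effect size, power 80%
```

These calls give 63, 85 and 252 cases per group. Halving the effect size roughly quadruples the required sample, which is the point made above. Exact t-based calculations come out slightly larger; 64 per group is the figure commonly cited for d = 0.5, alpha .05 and 80% power.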

Important Consultation Information
• What is your research question?
• Components of the power calculation
• Levels of data (nominal, ordinal, continuous)
• Sampling plan

Data Organization: Codebook
• What is a codebook?
• A codebook is a log of your variables (and their levels of data) and how you will code them.
• A codebook will help everyone understand the coding schemes, so that everyone is on the same page!

Data Processing and Analyses: Codebook Example
Variable Name | Variable Label | Values | Coding | Missing | Variable Type
age | age | 1,2,3,4,5 | 1=10-20 years; 2=21-30 years; 3=31-40 years; 4=41-50 years; 5=51+ years | 97=Incorrect response; 98=No response; 99=Not applicable | Ordinal
sex | sex | 1,2 | 1=male; 2=female | 97=Incorrect response; 98=No response; 99=Not applicable | Nominal
happiness | happiness at work | 1,2,3 | 1=not happy; 2=somewhat happy; 3=very happy | 97=Incorrect response; 98=No response; 99=Not applicable | Ordinal

Spreadsheet Example
ID# | Age | Sex | Happiness
1 | 1 | 1 | 2
2 | 2 | 2 | 2
3 | 3 | 1 | 2
4 | 57 | 2 | 2
5 | 45 | 2 | 3
6 | 66 | 2 | 3
7 | 2 | 2 | 3
8 | 88 | 2 | 3
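Because the codebook lists every valid code, it can double as a data-entry check. The minimal Python sketch below transcribes the codebook and spreadsheet examples into dictionaries and tuples. The data structure and the function name are our own hypothetical choices.

```python
# Codebook values transcribed from the codebook example; the structure is our own.
MISSING_CODES = {97: "Incorrect response", 98: "No response", 99: "Not applicable"}
CODEBOOK = {
    "age": {1: "10-20 years", 2: "21-30 years", 3: "31-40 years", 4: "41-50 years", 5: "51+ years"},
    "sex": {1: "male", 2: "female"},
    "happiness": {1: "not happy", 2: "somewhat happy", 3: "very happy"},
}

# Rows of the spreadsheet example: (ID, age, sex, happiness).
ROWS = [
    (1, 1, 1, 2), (2, 2, 2, 2), (3, 3, 1, 2), (4, 57, 2, 2),
    (5, 45, 2, 3), (6, 66, 2, 3), (7, 2, 2, 3), (8, 88, 2, 3),
]

def invalid_entries(rows, codebook, missing_codes):
    """Return (ID, variable, value) for every value the codebook does not define."""
    problems = []
    for row_id, *values in rows:
        # Columns follow the codebook's variable order: age, sex, happiness.
        for variable, value in zip(codebook, values):
            if value not in codebook[variable] and value not in missing_codes:
                problems.append((row_id, variable, value))
    return problems

for problem in invalid_entries(ROWS, CODEBOOK, MISSING_CODES):
    print(problem)
```

On the spreadsheet example it flags the Age entries 57, 45, 66 and 88 (IDs 4, 5, 6 and 8). These look like raw ages rather than the age-group codes 1 to 5 defined in the codebook, so either the entries or the codebook would need to change before analysis.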

Data Analysis with Excel
• Most simple analyses can be done using Excel, including correlation, regression and even random number generation.
• Install the Analysis ToolPak: go to Tools > Add-Ins and select 'Analysis ToolPak' (in Excel 2010 and later, go to File > Options > Add-Ins).
• Create your worksheet and codebook.
• Choose a statistical test, then follow the commands in the Help menu.

Data Analysis with Excel
http://home.ubalt.edu/ntsbarsh/excel/excel.htm

Data Reporting and Presentation of Data
• Graphical summaries are a great way to present your data.
• Excel is great for creating tables and graphs.
• The type of data you have determines the type of graphical summary you should use.

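Data Reporting and Presentation of Descriptive Data
• Categorical data: frequency tables and bar charts.
• Example: Fruit
Fruit | Count | Percent | Valid Percent
Pineapples | 4 | 20% | 21%
Apples | 5 | 25% | 26%
Oranges | 10 | 50% | 53%
Unknown | 1 | 5% | -
Total | 20 | 100% | 100%
Valid Percent excludes the one unknown response, so it is based on the 19 valid responses.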
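The Percent and Valid Percent columns can be reproduced with a few lines of Python. Valid Percent simply drops the missing (Unknown) response from the denominator, using 19 instead of 20. The function name is hypothetical, and the responses are rebuilt from the counts in the table.

```python
from collections import Counter

def frequency_table(responses, missing="Unknown"):
    """Rows of (category, count, percent of all responses, valid percent excluding missing)."""
    counts = Counter(responses)
    total = len(responses)
    valid_total = total - counts[missing]  # Counter returns 0 if no response is missing
    rows = []
    for category, count in counts.most_common():
        percent = round(100 * count / total)
        valid = None if category == missing else round(100 * count / valid_total)
        rows.append((category, count, percent, valid))
    return rows

# The fruit example: 4 pineapples, 5 apples, 10 oranges and 1 unknown response.
fruit = ["Pineapples"] * 4 + ["Apples"] * 5 + ["Oranges"] * 10 + ["Unknown"]
for row in frequency_table(fruit):
    print(row)
```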

Data Reporting and Presentation of Descriptive Data
Figure: Fruit Study bar chart (Pineapples, Apples, Oranges, Unknown; count axis from 0 to 10).

Data Reporting and Presentation of Descriptive Data
• Continuous data: tables and histograms.
Age | Count | Percent
20-30 | 4 | 20%
31-40 | 5 | 25%
41-50 | 10 | 50%
51-60 | 1 | 5%
Total | 20 | 100%

Data Reporting and Presentation of Descriptive Data
Figure: histogram of age (classes 20-30, 31-40, 41-50, 51-60; count axis from 0 to 10).

What is the difference between a histogram and a bar chart?
• Histogram: for continuous data, where the data are divided into contiguous class intervals (in other words, intervals connected in an unbroken sequence).
• Bar chart: for categorical data, where the categories are not contiguous.

Measures of Central Tendency
• Reporting averages:
   Categorical data = Mode
   Ordinal data = Median
   Continuous data = Mean
• If there are outliers (or extreme values), report the median instead of the mean.
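The reporting rules above map neatly onto Python's `statistics` module. This is a minimal sketch; the function name is hypothetical and the ages are made up. Using `median_low` for ordinal data, so that the result is always an actual category, is our assumption rather than a rule from the slide.

```python
import statistics

def report_average(values, level, has_outliers=False):
    """Return the 'average' to report for a given level of data."""
    if level == "categorical":
        return statistics.mode(values)  # most frequent category
    if level == "ordinal":
        return statistics.median_low(values)  # a middle value that is an actual category
    if level == "continuous":
        # Extreme values pull the mean, so report the median when outliers are present.
        return statistics.median(values) if has_outliers else statistics.mean(values)
    raise ValueError(f"unknown level of data: {level!r}")

sex = [1, 2, 1, 2, 2, 2, 2, 2]        # spreadsheet example, coded 1=male, 2=female
happiness = [2, 2, 2, 2, 3, 3, 3, 3]  # spreadsheet example, coded 1-3
ages = [30, 32, 34, 36, 98]           # made-up ages with one extreme value

print(report_average(sex, "categorical"))
print(report_average(happiness, "ordinal"))
print(report_average(ages, "continuous"), report_average(ages, "continuous", has_outliers=True))
```

Here the mode of the sex codes is 2 (female), and the median happiness code is 2 (somewhat happy). For the made-up ages, the single extreme value of 98 pulls the mean up to 46, while the median stays at 34.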

Reporting Inferential Stats
• Handout
• Resource: APA guidelines, http://www.ilstu.edu/~jhkahn/apastats.html

Reporting Inferential Stats
• It's important to include means, standard deviations and sample sizes in your results section.
• Example (correlation): Variable X was strongly correlated with Variable Y, r = .59, p < .01.

Important to Keep Your Audience in Mind
• Residency project
• Publication
• Departmental report

Aaron: TCPS certification for
residents reminder…

Questions?


Slide 26

Surviving Pharmacy
Residency Research: Tips
and Tricks for Statistical
Planning

Surviving Pharmacy Residency
Research: Tips and Tricks for
Statistical Planning
© Fraser Health Authority, 2007
The Fraser Health Authority (“FH”) authorizes the use, reproduction and/or
modification of this publication for purposes other than commercial redistribution. In
consideration for this authorization, the user agrees that any unmodified
reproduction of this publication shall retain all copyright and proprietary notices. If
the user modifies the content of this publication, all FH copyright notices shall be
removed, however FH shall be acknowledged as the author of the source
publication.
Reproduction or storage of this publication in any form by any means for the purpose
of commercial redistribution is strictly prohibited.
This publication is intended to provide general information only, and should not be
relied on as providing specific healthcare, legal or other professional advice. The
Fraser Health Authority, and every person involved in the creation of this publication,
disclaims any warranty, express or implied, as to its accuracy, completeness or
currency, and disclaims all liability in respect of any actions, including the results of
any actions, taken or not taken in reliance on the information contained herein.

FH Health Research Intelligence
Unit
How can we help?


Grant Facilitator-Writer
 Conducting a search for funding opportunities.
 Automatic notification of new funding sources and
deadlines.
 Identifying a research team.
 Preparing letters of intent.
 Identifying resources required for conducting research.
 Formulating the research budget.
 Writing the grant application in collaboration with
researchers.
 Understanding FH and funding agency requirements
regarding preparation of specific documents.

FH Health Research Intelligence
Unit
How can we help?











Epidemiologist
Specifying the research goal,
objectives and hypothesis.
Identifying measurable
outcomes.
Specifying the variables for
analysis.
Identifying sources of data.
Developing data collection
tools for quantitative or
qualitative studies.
Developing the statistical
analysis plan.
Understanding how to use
statistical software, such as
SPSS.

Workshop Outline










Research 101- Basic Research Steps
Research Question Refinement
Common Study Designs- Resource
Levels of Data
Power and Sample Size
Statistical Test Selection- Exercise
Data Reporting- Resource
Simple Stats with Excel- Resource

Pharmacy Residency Project










1) Develop a research question
2) Conduct thorough literature review
3) Re-define research question or hypothesis
4) Design research methodology/study
5) Create research proposal
6) Apply for funding
7) Apply for ethics approval
8) Collect and analyze data
9) Draw conclusions and relate findings

Research Question Refinement


Research question will describe in operational
terms, what you think will happen in the study.

Good Versus Bad Research
Question


Are patients who take
drug X more likely to
experience episodes
of delirium?



Do patients who
receive medication X
between September
2008 and November
2008 experience
more episodes of
delirium as compared
to patients who
received drug Y
during the same time
period?

Classification of Research
Studies
Observational Studies:
Descriptive Studies:
Focus on describing
populations and describing the
relationship between variables

Research Studies

Analytic Studies:
Observational
Descriptive

Experimental

Analytic

Make inferences about the
population based on a random
sample.
Experimental Studies:
Test relationships between
exposures and outcomes.
Investigator has direct control
over study condition and
exposure status.

Hierarchy of Studies
Type of study is selected
according to the purpose of
research.
Experimental Studies

Analytic Studies

Descriptive Studies

Levels of Evidence


HandoutResearch Design
Hierarchy

Probability Sampling Methods:
Random
 There

are several methods to choose

from:
 Simple random
sampling.

Probability Sampling Methods:
Stratified
 Stratified

sampling

(divide the population into
non-overlapping strata and
sample from within each
stratum independently).


Guarantees representation
of all important groups.

Probability Sampling Methods:
Systematic


Selection of the
sample using an
interval “k” so that
every “k” unit in the
frame is selected,
is called systematic
random sampling.

Probability Sampling Methods:
Systematic


Steps to achieve a systematic random sample:

1. Number the units in the population from 1 to N.
2. Decide on the n (sample size) that you want or need.


k = N/n = the interval size.

3. Randomly select an integer between 1 and k.
4. Then take every kth unit.
Example:
1. N=200
2. n=40, take N/n, 200/40=5 (interval size).
3. Randomly select a number between 1 and 5 (let’s pick 4).
4. Begin with 4, and take every 5th unit.

Probability Sampling Methods:
Cluster
 Cluster

sampling.
 Divide population into clusters and
randomly sample clusters.
 Measure all units within sampled clusters.
 Example: See blue areas on map.
Not just geographic areas,
could select hospitals,
schools etc.


Non-Probability Sampling
Methods


There are different types of non-probability
sampling methods as well:








Convenience (not representative of population).
Purposive (certain group in mind).
Expert sampling (seek out specific expertise).
Snowball sampling (ask people to participate, they
ask more people).

If you select non-probability sampling methods, the
conclusions drawn from the study results apply only
to that specific population.

Measurement: Levels of Data
• The level of data determines which statistical test you should use.
• Categorical = data classified into categories that cannot be arranged in any particular order (e.g. apples and pears, gender, eye colour, ethnicity).
• Ordinal = data that can be ordered, but the distances between categories are not necessarily equal (e.g. low, middle and high income).
• Continuous = data with equal distances between values (e.g. 1, 2, 3; age).

Statistics and Statistical Test Selection

Types of Statistics
• Descriptive statistics: describe research findings (e.g. frequencies, averages).
• Inferential statistics: make inferences about the population based on a random sample.
  • In a random sample, each person/unit has an equal chance of being selected.
  • This allows the results to be generalized to the population.

Types of Variables
• Variables can be classified as independent or dependent.
• An independent variable is a variable you believe will influence your outcome measure.
• A dependent variable is a variable that depends on, or is influenced by, the independent variable(s). The dependent variable can also be the variable you are trying to predict.

Statistical Test Selection
Selecting the appropriate statistical test should be based on the following questions:
1) What is your goal? Description? Comparison? Prediction? Quantifying an association? Proving effectiveness? Proving causality?
2) What kind of data have you collected? What are the levels of data (nominal, ordinal, continuous)? Was your sample randomly selected?
3) Are your data normally distributed? Should you use a parametric or non-parametric test?
4) What are the assumptions of the statistical test you would like to use? Do your data meet these assumptions?

Parametric Tests
• Parametric tests assume that the variable in question comes from a normal distribution.
• Non-parametric tests do not require the assumption of normality.
• Most non-parametric tests do not require an interval level of measurement; they can be used with nominal or ordinal data.

Assumptions
• Each test has its own assumptions. Before you select a test, check the assumptions of each candidate test.
• To find this information, contact a consultant or review statistical/research methods resources.
• Some examples of common assumptions are:
  • The dependent variable must be measured at a certain level (e.g. interval level).
  • The independent variable(s) must be measured at a certain level (e.g. ordinal level).
  • The population is normally distributed (not skewed).
If your data do not meet the assumptions of a specific test, you may be able to use a non-parametric test instead.

Test selection table (rows: goal; columns: type of data)
Goal | Measurement (Normal Population) | Ordinal, or Non-Normal Population | Binomial (Two Possible Outcomes) | Survival Time
Describe one group | Mean, SD | Median, interquartile range | Proportion | Kaplan Meier survival curve
Compare one group to a hypothetical value | One-sample t test | Wilcoxon test | Chi-square or Binomial test ** | -
Compare two unpaired groups | Unpaired t test | Mann-Whitney test | Fisher's test (chi-square for large samples) | Log-rank test or Mantel-Haenszel*
Compare two paired groups | Paired t test | Wilcoxon test | McNemar's test | Conditional proportional hazards regression*
Compare three or more unmatched groups | One-way ANOVA | Kruskal-Wallis test | Chi-square test | Cox proportional hazard regression**
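The test selection table can also be stored as a lookup, so that a goal and a data type lead straight to a suggested test. The Python sketch below copies the table's rows and columns. The column keys ("normal", "non-normal", "binomial", "survival") are shorthand chosen for this example. The * and ** markers are left out because their footnotes are not included in the slides. A lookup like this does not replace checking the assumptions of the test it returns.

```python
# Rows = goal, columns = type of data, transcribed from the test selection table.
TEST_TABLE = {
    "describe one group": {
        "normal": "Mean, SD",
        "non-normal": "Median, interquartile range",
        "binomial": "Proportion",
        "survival": "Kaplan Meier survival curve",
    },
    "compare one group to a hypothetical value": {
        "normal": "One-sample t test",
        "non-normal": "Wilcoxon test",
        "binomial": "Chi-square or Binomial test",
        # the table lists no survival-time test for this goal
    },
    "compare two unpaired groups": {
        "normal": "Unpaired t test",
        "non-normal": "Mann-Whitney test",
        "binomial": "Fisher's test (chi-square for large samples)",
        "survival": "Log-rank test or Mantel-Haenszel",
    },
    "compare two paired groups": {
        "normal": "Paired t test",
        "non-normal": "Wilcoxon test",
        "binomial": "McNemar's test",
        "survival": "Conditional proportional hazards regression",
    },
    "compare three or more unmatched groups": {
        "normal": "One-way ANOVA",
        "non-normal": "Kruskal-Wallis test",
        "binomial": "Chi-square test",
        "survival": "Cox proportional hazard regression",
    },
}

def choose_test(goal, data_type):
    """Return the suggested test for a goal (case-insensitive) and a column key."""
    row = TEST_TABLE.get(goal.lower())
    if row is None:
        raise KeyError(f"goal not in table: {goal!r}")
    return row.get(data_type, "no test listed in the table")

print(choose_test("Compare two unpaired groups", "normal"))  # Unpaired t test
```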

Statistical Test Selection Group Exercise
• Using your tables, select the appropriate statistical tests for 10 research scenarios.

During the group exercise…
Steps to choose the appropriate statistical method for the data analysis:
1. Identify whether the research problem raises a question of description, relationship (association), or comparison (difference).
2. Identify the levels of measurement in the research question (nominal/categorical, ordinal/rank, continuous/evenly spaced).
3. Identify the number of variables or samples being described, related, or compared.
4. Identify whether the comparison samples are related (e.g. the same group analyzed before and after) or independent (unrelated, different groups).
5. Choose the appropriate statistical tool for the data and situation using the decision tree in the handout.



1. A pilot experiment designed to test the effectiveness of a new approach to electrode placement for electroconvulsive therapy (ECT) has been conducted over a one-year period in the Fraser Health Authority. Patients from two different mood disorder clinics participated in this study. Patients from Clinic X received ECT according to current practice guidelines. Patients from Clinic Y received a new, exploratory ECT treatment. Patients in the two clinics were matched for age, gender, and type of disorder, and a random sample of 30 matched pairs of patients was selected for inclusion in the study. At the end of the year, patients were given a memory test yielding a total score out of 100. Dr. Vasdil would like to know which statistical procedure to use to test for differences between the two groups of patients on the memory test.
What is the question: Compare
How many samples: 2
Related or independent: Related (patients were matched in pairs)
What is the level of measurement: Continuous
How many dependent variables: 1
Test: Paired t-test
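Because the patients in this scenario were matched, the two samples are related, and the t test is calculated from the within-pair differences. The Python sketch below computes the paired t statistic and its degrees of freedom. The five pairs of memory scores are invented for illustration and are not study data. A p-value would also require the t distribution with n - 1 degrees of freedom, for example from `scipy.stats.ttest_rel`. The standard library does not provide that distribution, so the sketch leaves the p-value out.

```python
import math
import statistics

def paired_t(group_x, group_y):
    """Paired t statistic from the within-pair differences; returns (t, df)."""
    if len(group_x) != len(group_y):
        raise ValueError("every patient needs a matched partner")
    diffs = [x - y for x, y in zip(group_x, group_y)]  # one difference per matched pair
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    t = mean_d / (sd_d / math.sqrt(n))
    return t, n - 1

# Hypothetical memory scores (out of 100) for five matched pairs.
clinic_x = [72, 65, 80, 58, 90]
clinic_y = [70, 60, 78, 55, 85]
t, df = paired_t(clinic_x, clinic_y)
print(f"t({df}) = {t:.2f}")  # t(4) = 5.01
```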

Sample Size
There are several rules of thumb for determining sample size:
1) It's a good idea to have a minimum of 30 cases (as a total group or, if comparing groups, 30 per group). If you have fewer, you can use a non-parametric test, but it is still better to have close to 30 cases.
2) If using regression, it is best to have 10-50 cases per independent variable.
3) If you are validating a survey, never have more questions than cases.
4) If the total population you are examining is smaller than 30, use all of them.
5) For pilot studies, the recommended sample size is 12 per group.
6) For surveys, a sample size of 400 per group is adequate for most analyses.
7) For surveys, a 30% response rate is the bare minimum.
Note: For a precise sample size estimate, you will need to conduct a power analysis.
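One common justification for the 400-per-group survey rule of thumb comes from the margin of error for a proportion. With n = 400 and the worst-case proportion p = 0.5, the 95% confidence interval is about ±5 percentage points. The Python sketch below uses the usual normal-approximation formula, z * sqrt(p(1 - p)/n). It is a rough check on the rule, not a substitute for a power analysis.

```python
from statistics import NormalDist

def margin_of_error(n, p=0.5, confidence=0.95):
    """Half-width of the normal-approximation confidence interval for a proportion.

    p = 0.5 gives the largest margin, so it is the safe choice when p is unknown.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    return z * (p * (1 - p) / n) ** 0.5

for n in (100, 400, 1000):
    print(n, f"±{margin_of_error(n):.1%}")
```

Quadrupling the sample size only halves the margin of error, which is why precision gains become expensive beyond a few hundred responses.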






Statistical Power
• Power is the ability of a statistical test to correctly detect an effect if it exists.
• It takes a value between 0 and 1 (often expressed as a percentage).
• Power = 1 - β, where β is the probability of a Type II error.
  • Type II error: failing to reject the null hypothesis when it is false (missing a real effect).
  • Type I error: rejecting the null hypothesis when it is true (reporting an effect that is not real).

Types of Power Analysis
• A priori: conducted before the study begins (at the proposal stage).
• Post hoc: conducted after the study has been completed.
• An easy way to increase power?
  • Increase the sample size.
  • Increase the effect size.
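To see why larger samples and larger effects both increase power, you can compute approximate power directly. The Python sketch below assumes a two-sided comparison of two group means with equal group sizes. It uses a normal approximation and ignores the very small opposite tail. d is the standardized effect size, meaning the difference in means divided by the SD, which is a definition not given on the slide. With d = 0.5 and alpha = .05, power rises from roughly 35% with 20 per group to about 81% with 64 per group.

```python
from statistics import NormalDist

def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample comparison of means.

    d is the standardized effect size (difference in means / SD). Uses the
    normal approximation and ignores the negligible opposite tail.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value, about 1.96 for alpha = .05
    return z.cdf(d * (n_per_group / 2) ** 0.5 - z_alpha)

for n in (20, 40, 64, 100):
    print(n, round(approx_power(0.5, n), 2))
```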

Components Involved in Power Calculation
• Sample size: number of cases.
• Effect size: magnitude of the difference or relationship, relative to its variation.
• Alpha level: the probability of concluding that an effect exists when the observed result is due to chance alone (usually .05 or .01).
  • Also known as the Type I error rate: the probability of rejecting the null hypothesis when it is true.
• Power level: 80-90% is common.
• One- or two-tailed test: two-tailed is common.

Components Involved in Power Calculation
• Sample size: what we want to find out.
• Effect size: the magnitude of the trend... but what if you don't know it?
  • Look to pilot data or the literature.
  • Keep in mind: the smaller the effect size, the larger the sample size required.
• Alpha level: .05
• Power level: 80-90%
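An a priori sample size calculation combines these components. The Python sketch below uses the normal-approximation formula for comparing two independent group means with equal sizes and a two-sided test: n per group = 2(z_{1-α/2} + z_{power})² / d², rounded up. The standardized effect size d is an assumption you would take from pilot data or the literature. With alpha = .05 and 80% power, d = 0.5 gives 63 per group and d = 0.2 gives 393 per group. This illustrates the slide's point that smaller effects need larger samples. Calculators based on the t distribution give a slightly larger answer, 64 per group for d = 0.5.

```python
import math
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided, two-sample comparison of means.

    Normal-approximation formula: n = 2 * (z_{1-alpha/2} + z_{power})^2 / d^2,
    rounded up. d is the expected standardized effect size.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = .05
    z_beta = z.inv_cdf(power)           # about 0.84 for 80% power
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)

print(n_per_group(0.5))  # 63
print(n_per_group(0.2))  # 393
```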

Important Consultation Information
• What is your research question?
• Components of the power calculation
• Levels of data (nominal, ordinal, continuous)
• Sampling plan

Data Organization: Codebook
• What is a codebook?
  • A codebook is a log of your variables (and their levels of data) and how you will code them.
  • A codebook helps everyone understand the coding schemes and ensures that everyone is on the same page!

Data Processing and Analyses: Codebook Example
Variable Name | Variable Label | Values | Coding | Missing | Variable Type
age | age | 1,2,3,4,5 | 1=10-20 years, 2=21-30 years, 3=31-40 years, 4=41-50 years, 5=51+ years | 97=Incorrect response, 98=No response, 99=Not applicable | Ordinal
sex | sex | 1,2 | 1=male, 2=female | 97=Incorrect response, 98=No response, 99=Not applicable | Nominal
happiness | happiness at work | 1,2,3 | 1=not happy, 2=somewhat happy, 3=very happy | 97=Incorrect response, 98=No response, 99=Not applicable | Ordinal
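When the analysis is scripted, a codebook can be kept as data so that stored codes are translated the same way every time. The Python sketch below represents the codebook example as dictionaries. The structure and the `decode` helper are hypothetical. Missing-value codes 97-99 are returned as None, so they cannot accidentally be treated as real answers.

```python
# Hypothetical Python encoding of the codebook example.
MISSING = {97: "Incorrect response", 98: "No response", 99: "Not applicable"}

CODEBOOK = {
    "age": {"label": "age", "type": "ordinal",
            "codes": {1: "10-20 years", 2: "21-30 years", 3: "31-40 years",
                      4: "41-50 years", 5: "51+ years"}},
    "sex": {"label": "sex", "type": "nominal",
            "codes": {1: "male", 2: "female"}},
    "happiness": {"label": "happiness at work", "type": "ordinal",
                  "codes": {1: "not happy", 2: "somewhat happy", 3: "very happy"}},
}

def decode(variable, value):
    """Translate a stored code into its label; missing-value codes return None."""
    if value in MISSING:
        return None  # keep missing codes out of the analysis
    codes = CODEBOOK[variable]["codes"]
    if value not in codes:
        raise ValueError(f"{value} is not a valid code for {variable}")
    return codes[value]

print(decode("happiness", 3))  # very happy
print(decode("sex", 98))       # None
```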

Spreadsheet Example
ID# | Age | Sex | Happiness
1 | 1 | 1 | 2
2 | 2 | 2 | 2
3 | 3 | 1 | 2
4 | 57 | 2 | 2
5 | 45 | 2 | 3
6 | 66 | 2 | 3
7 | 2 | 2 | 3
8 | 88 | 2 | 3
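Checking each spreadsheet entry against the codebook is easy to automate. The Python sketch below copies the example rows and flags any value that is neither a defined code nor a missing-value code (97-99). As entered, the Age column for IDs 4, 5, 6 and 8 holds raw ages (57, 45, 66, 88) instead of the 1-5 age-group codes, which is exactly the kind of mismatch this check catches. The `check_rows` helper is hypothetical.

```python
# Rows from the spreadsheet example: (ID, Age, Sex, Happiness).
rows = [
    (1, 1, 1, 2), (2, 2, 2, 2), (3, 3, 1, 2), (4, 57, 2, 2),
    (5, 45, 2, 3), (6, 66, 2, 3), (7, 2, 2, 3), (8, 88, 2, 3),
]
# Valid codes per column, taken from the codebook; 97-99 are missing-value codes.
MISSING = {97, 98, 99}
VALID = {"Age": {1, 2, 3, 4, 5}, "Sex": {1, 2}, "Happiness": {1, 2, 3}}
COLUMNS = ("Age", "Sex", "Happiness")

def check_rows(rows):
    """Return (ID, column, value) for every entry that is neither a valid nor a missing code."""
    problems = []
    for record_id, *values in rows:
        for column, value in zip(COLUMNS, values):
            if value not in VALID[column] and value not in MISSING:
                problems.append((record_id, column, value))
    return problems

for problem in check_rows(rows):
    print(problem)
```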

Data Analysis with Excel
• Most simple analyses can be done in Excel, including correlation, regression and even random number generation.
• Install the Analysis ToolPak add-in:
  • Go to Tools > Add-Ins and select 'Analysis ToolPak' (in Excel 2010 and later: File > Options > Add-ins).
• Create a worksheet and codebook.
• Choose a statistical test.
  • Follow the commands in the help menu.

Data Analysis with Excel


http://home.ubalt.edu/ntsbarsh/excel/excel.htm

Data Reporting and Presentation of Data
• Graphical summaries are a great way to present your data.
• Excel is great for creating tables and graphs.
• The type of data you have determines the type of graphical summary you should use.

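Data Reporting and Presentation of Descriptive Data
• Categorical data: frequency tables and bar charts.
• Example: Fruit
Fruit | Count | Percent | Valid Percent
Pineapples | 4 | 20% | 21%
Apples | 5 | 25% | 26%
Oranges | 10 | 50% | 53%
Unknown | 1 | 5% | n/a
Total | 20 | 100% | 100%
Valid percent leaves out the missing (Unknown) case, so it is calculated out of 19 rather than 20 (e.g. 4/19 = 21%).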
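The percent and valid percent columns can be reproduced in a few lines of Python. In the sketch below, the list of 20 responses is rebuilt from the counts in the fruit table. It assumes "Unknown" is the only missing category. Each percentage is rounded on its own, so in other data sets the rounded values may not add up to exactly 100%.

```python
from collections import Counter

def frequency_table(values, missing=("Unknown",)):
    """Return {category: (count, percent, valid_percent)}, percents rounded to whole numbers.

    Percent uses all cases; valid percent leaves the missing categories out of the denominator.
    """
    counts = Counter(values)
    total = len(values)
    valid_total = sum(c for v, c in counts.items() if v not in missing)
    table = {}
    for value, count in counts.items():
        percent = round(100 * count / total)
        valid = None if value in missing else round(100 * count / valid_total)
        table[value] = (count, percent, valid)
    return table

# The fruit example, rebuilt from the counts in the table: 4 + 5 + 10 + 1 = 20 cases.
fruit = ["Pineapples"] * 4 + ["Apples"] * 5 + ["Oranges"] * 10 + ["Unknown"]
for value, (count, percent, valid) in frequency_table(fruit).items():
    valid_text = "n/a" if valid is None else f"{valid}%"
    print(f"{value:<11} {count:>3} {percent:>4}% {valid_text:>5}")
```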

Data Reporting and Presentation of Descriptive Data
[Figure: Fruit Study bar chart showing the count (0 to 10) for Pineapples, Apples, Oranges and Unknown.]

Data Reporting and Presentation of Descriptive Data
• Continuous data: tables and histograms.
Age | Count | Percent
20-30 | 4 | 20%
31-40 | 5 | 25%
41-50 | 10 | 50%
51-60 | 1 | 5%
Total | 20 | 100%
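A grouped frequency table like the age table comes from sorting raw continuous values into class intervals. In the Python sketch below, the 20 ages are hypothetical and were chosen only so that the counts match the table. The sketch assumes ages are recorded in whole years, so inclusive limits such as 30 and 31 leave no gaps between classes. For fractional values, half-open intervals such as 20 ≤ age < 31 would be safer.

```python
def age_table(ages, classes=((20, 30), (31, 40), (41, 50), (51, 60))):
    """Grouped frequency table: (class label, count, percent) for inclusive class limits.

    Assumes ages are whole years, so inclusive limits such as 30 and 31 leave no gaps.
    """
    total = len(ages)
    rows = []
    for low, high in classes:
        count = sum(1 for age in ages if low <= age <= high)
        rows.append((f"{low}-{high}", count, round(100 * count / total)))
    return rows

# Hypothetical ages, chosen only so that the counts match the table (4, 5, 10, 1).
ages = [22, 25, 28, 30, 31, 35, 36, 38, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 55]
for label, count, percent in age_table(ages):
    print(f"{label:<6} {count:>3} {percent:>4}%")
```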

Data Reporting and Presentation of Descriptive Data
[Figure: histogram of age (classes 20-30, 31-40, 41-50, 51-60); y-axis: count, 0 to 10.]

What is the difference between a Histogram and a Bar Chart?
• Histogram: for continuous data, where the data are divided into contiguous class intervals (that is, intervals connected in an unbroken sequence, so the bars touch).
• Bar chart: for categorical data, where the categories are not contiguous (so the bars are separated).

Measures of Central Tendency
• Reporting averages:
  • Categorical data = mode
  • Ordinal data = median
  • Continuous data = mean
• If there are outliers (or extreme values), report the median instead of the mean.
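Python's `statistics` module provides all three averages. The data in the sketch below are hypothetical: one categorical, one ordinal and one continuous variable, where the continuous variable (length of stay in days) contains one extreme value. For the ordinal codes the sketch uses `median_low`, which always returns an actual category; a plain median of an even-sized list could fall between two categories. The single extreme stay pulls the mean up to 9.7 days while the median stays at 3.5, which is why the median is preferred when outliers are present.

```python
import statistics

# Hypothetical data, one variable per level of measurement.
eye_colour = ["brown", "blue", "brown", "green", "brown"]  # categorical
income_level = [1, 2, 2, 3, 3, 3, 2]  # ordinal codes: 1 = low, 2 = middle, 3 = high
stay_days = [2, 3, 3, 4, 5, 41]       # continuous, with one extreme value (41)

mode_eye = statistics.mode(eye_colour)               # most frequent category
median_income = statistics.median_low(income_level)  # always an existing category
mean_stay = statistics.mean(stay_days)
median_stay = statistics.median(stay_days)

print(mode_eye)        # brown
print(median_income)   # 2
print(f"mean = {mean_stay:.1f}, median = {median_stay}")  # mean = 9.7, median = 3.5
```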

Reporting Inferential Stats
• Handout
  • Resource: APA Guidelines
  • http://www.ilstu.edu/~jhkahn/apastats.html

Reporting Inferential Stats
• It's important to include means, standard deviations and sample sizes in your results section.
• Example: Correlation
  • Variable X was strongly correlated with Variable Y, r = .59, p < .01.
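The Python sketch below computes Pearson's r from two lists and formats it APA-style. Because r can never exceed 1 in absolute value, the leading zero is dropped, giving ".77" rather than "0.77". The X and Y values are hypothetical. The p-value is not computed here because it requires the t distribution with n - 2 degrees of freedom. Python 3.10 and later also include `statistics.correlation`, which returns the same r.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def apa_r(r):
    """Format r to two decimals without a leading zero (APA style, since |r| <= 1)."""
    text = f"{abs(r):.2f}"
    if text.startswith("0"):
        text = text[1:]
    return ("-" if r < 0 else "") + text

x = [1, 2, 3, 4, 5]   # hypothetical Variable X
y = [2, 4, 5, 4, 5]   # hypothetical Variable Y
r = pearson_r(x, y)
print(f"r = {apa_r(r)}")  # r = .77
```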

Important to Keep Your Audience in Mind
• Residency project
• Publication
• Departmental report

Aaron: TCPS certification for
residents reminder…

Questions?


Slide 28

Surviving Pharmacy
Residency Research: Tips
and Tricks for Statistical
Planning

Surviving Pharmacy Residency
Research: Tips and Tricks for
Statistical Planning
© Fraser Health Authority, 2007
The Fraser Health Authority (“FH”) authorizes the use, reproduction and/or
modification of this publication for purposes other than commercial redistribution. In
consideration for this authorization, the user agrees that any unmodified
reproduction of this publication shall retain all copyright and proprietary notices. If
the user modifies the content of this publication, all FH copyright notices shall be
removed, however FH shall be acknowledged as the author of the source
publication.
Reproduction or storage of this publication in any form by any means for the purpose
of commercial redistribution is strictly prohibited.
This publication is intended to provide general information only, and should not be
relied on as providing specific healthcare, legal or other professional advice. The
Fraser Health Authority, and every person involved in the creation of this publication,
disclaims any warranty, express or implied, as to its accuracy, completeness or
currency, and disclaims all liability in respect of any actions, including the results of
any actions, taken or not taken in reliance on the information contained herein.

FH Health Research Intelligence
Unit
How can we help?


Grant Facilitator-Writer
 Conducting a search for funding opportunities.
 Automatic notification of new funding sources and
deadlines.
 Identifying a research team.
 Preparing letters of intent.
 Identifying resources required for conducting research.
 Formulating the research budget.
 Writing the grant application in collaboration with
researchers.
 Understanding FH and funding agency requirements
regarding preparation of specific documents.

FH Health Research Intelligence
Unit
How can we help?











Epidemiologist
Specifying the research goal,
objectives and hypothesis.
Identifying measurable
outcomes.
Specifying the variables for
analysis.
Identifying sources of data.
Developing data collection
tools for quantitative or
qualitative studies.
Developing the statistical
analysis plan.
Understanding how to use
statistical software, such as
SPSS.

Workshop Outline










Research 101- Basic Research Steps
Research Question Refinement
Common Study Designs- Resource
Levels of Data
Power and Sample Size
Statistical Test Selection- Exercise
Data Reporting- Resource
Simple Stats with Excel- Resource

Pharmacy Residency Project










1) Develop a research question
2) Conduct thorough literature review
3) Re-define research question or hypothesis
4) Design research methodology/study
5) Create research proposal
6) Apply for funding
7) Apply for ethics approval
8) Collect and analyze data
9) Draw conclusions and relate findings

Research Question Refinement


Research question will describe in operational
terms, what you think will happen in the study.

Good Versus Bad Research
Question


Are patients who take
drug X more likely to
experience episodes
of delirium?



Do patients who
receive medication X
between September
2008 and November
2008 experience
more episodes of
delirium as compared
to patients who
received drug Y
during the same time
period?

Classification of Research
Studies
Observational Studies:
Descriptive Studies:
Focus on describing
populations and describing the
relationship between variables

Research Studies

Analytic Studies:
Observational
Descriptive

Experimental

Analytic

Make inferences about the
population based on a random
sample.
Experimental Studies:
Test relationships between
exposures and outcomes.
Investigator has direct control
over study condition and
exposure status.

Hierarchy of Studies
Type of study is selected
according to the purpose of
research.
Experimental Studies

Analytic Studies

Descriptive Studies

Levels of Evidence


HandoutResearch Design
Hierarchy

Probability Sampling Methods:
Random
 There

are several methods to choose

from:
 Simple random
sampling.

Probability Sampling Methods:
Stratified
 Stratified

sampling

(divide the population into
non-overlapping strata and
sample from within each
stratum independently).


Guarantees representation
of all important groups.

Probability Sampling Methods:
Systematic


Selection of the
sample using an
interval “k” so that
every “k” unit in the
frame is selected,
is called systematic
random sampling.

Probability Sampling Methods:
Systematic


Steps to achieve a systematic random sample:

1. Number the units in the population from 1 to N.
2. Decide on the n (sample size) that you want or need.


k = N/n = the interval size.

3. Randomly select an integer between 1 and k.
4. Then take every kth unit.
Example:
1. N=200
2. n=40, take N/n, 200/40=5 (interval size).
3. Randomly select a number between 1 and 5 (let’s pick 4).
4. Begin with 4, and take every 5th unit.

Probability Sampling Methods: Cluster
• Cluster sampling:
• Divide the population into clusters and randomly sample clusters.
• Measure all units within the sampled clusters.
• Example: See blue areas on map. Clusters need not be geographic areas; you could select hospitals, schools, etc.
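A minimal Python sketch of cluster sampling, assuming the clusters are hospitals and each hospital maps to a list of its patients. The hospital names, patient IDs and the helper cluster_sample are hypothetical. The random draw is made at the cluster level, and then every unit inside each chosen cluster is kept.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select whole clusters, then keep every unit inside the chosen clusters."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # sort names so a seed gives a repeatable draw
    return {name: list(clusters[name]) for name in chosen}  # measure all units in each sampled cluster

# Hypothetical clusters: hospitals and the patients in each
hospitals = {
    "Hospital A": ["a1", "a2", "a3"],
    "Hospital B": ["b1", "b2"],
    "Hospital C": ["c1", "c2", "c3", "c4"],
    "Hospital D": ["d1"],
}
sample = cluster_sample(hospitals, 2, seed=7)
print(sample)
```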


Non-Probability Sampling Methods
There are also different types of non-probability sampling methods:
• Convenience (not representative of the population).
• Purposive (a certain group in mind).
• Expert sampling (seek out specific expertise).
• Snowball sampling (ask people to participate; they ask more people).
If you select non-probability sampling methods, the conclusions drawn from the study results apply only to the specific group studied and cannot be generalized to the wider population.

Measurement: Levels of Data
• The level of data will dictate which statistical test you should use.
• Categorical (nominal) = data classified into categories that cannot be arranged in any particular order (e.g. apples and pears, gender, eye colour, ethnicity).
• Ordinal = data that can be ordered, but the distance between levels is not necessarily equal (e.g. low, middle and high income).
• Continuous = equal distance between each interval (e.g. 1, 2, 3; age).

Statistics and Statistical Test Selection

Types of Statistics
• Descriptive statistics: describe research findings (e.g. frequencies, averages).
• Inferential statistics: make inferences about the population based on a random sample. In a random sample, each person/unit has an equal chance of being selected, which allows generalizability to the population.

Types of Variables
• Variables can be classified as independent or dependent.
• An independent variable is a variable you believe will influence your outcome measure.
• A dependent variable is a variable that depends on, or is influenced by, the independent variable(s). The dependent variable can also be the variable you are trying to predict.

Statistical Test Selection
Selecting the appropriate statistical test requires several steps. Base your selection on:
1) What is your goal? Description? Comparison? Prediction? Quantifying an association? Demonstrating effectiveness? Demonstrating causality?
2) What kind of data have you collected? What are the levels of data (nominal, ordinal, continuous)? Was your sample randomly selected?
3) Are your data normally distributed? Should you use a parametric or non-parametric test?
4) What are the assumptions of the statistical test you would like to use? Do your data meet these assumptions?

Parametric Tests
• Parametric tests assume that the variable in question comes from a normal distribution.
• Non-parametric tests do not require the assumption of normality.
• Most non-parametric tests do not require an interval level of measurement; they can be used with nominal/ordinal level data.

Assumptions
• There are various assumptions for each test. Before you select a test, be sure to check its assumptions.
• To find this information, contact a consultant or review statistical/research methods resources.
• Some examples of common assumptions are:
• The dependent variable needs to be measured at a certain level (e.g. interval level).
• The independent variable(s) need to be measured at a certain level (e.g. ordinal level).
• The population is normally distributed (not skewed).
If your data do not meet the assumptions for a specific test, you may be able to use a non-parametric test instead.

Goal by Type of Data
Goal | Measurement (Normal Population) | Ordinal, or Non-Normal Population | Binomial (Two Possible Outcomes) | Survival Time
Describe one group | Mean, SD | Median, interquartile range | Proportion | Kaplan Meier survival curve
Compare one group to a hypothetical value | One-sample t test | Wilcoxon test | Chi-square or Binomial test ** | n/a
Compare two unpaired groups | Unpaired t test | Mann-Whitney test | Fisher's test (chi-square for large samples) | Log-rank test or Mantel-Haenszel*
Compare two paired groups | Paired t test | Wilcoxon test | McNemar's test | Conditional proportional hazards regression*
Compare three or more unmatched groups | One-way ANOVA | Kruskal-Wallis test | Chi-square test | Cox proportional hazard regression**

Statistical Test Selection Group Exercise
• Using your tables, select the appropriate statistical tests for 10 research scenarios.

During the group exercise…
Steps to choose the appropriate statistical method for the data analysis:
1. Identify whether the research problem raises a question of description, relationship (association), or comparison (difference).
2. Identify the levels of measurement in the research question (nominal/categorical, ordinal/rank, continuous/evenly spaced).
3. Identify the number of variables or samples being described, related, or compared.
4. Identify whether the comparison samples are related (e.g. the same group analyzed before and after) or independent (unrelated, different groups).
5. Choose the appropriate statistical tool for the data and situation using the decision tree in the handout.
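The five steps can be sketched as a small lookup over the test-selection table above. This is a hypothetical helper, not the handout's decision tree: the data-type labels ("normal", "non-normal", "binomial", "survival") and function names are assumptions, the table's footnote asterisks are dropped, and anything the table does not cover (such as association questions) raises an error so that it goes to a statistician instead.

```python
# Hypothetical encoding of the test-selection table: (goal, data type) -> suggested procedure
TABLE = {
    ("describe one group", "normal"): "Mean, SD",
    ("describe one group", "non-normal"): "Median, interquartile range",
    ("describe one group", "binomial"): "Proportion",
    ("describe one group", "survival"): "Kaplan Meier survival curve",
    ("compare one group to a hypothetical value", "normal"): "One-sample t test",
    ("compare one group to a hypothetical value", "non-normal"): "Wilcoxon test",
    ("compare one group to a hypothetical value", "binomial"): "Chi-square or Binomial test",
    ("compare two unpaired groups", "normal"): "Unpaired t test",
    ("compare two unpaired groups", "non-normal"): "Mann-Whitney test",
    ("compare two unpaired groups", "binomial"): "Fisher's test (chi-square for large samples)",
    ("compare two unpaired groups", "survival"): "Log-rank test or Mantel-Haenszel",
    ("compare two paired groups", "normal"): "Paired t test",
    ("compare two paired groups", "non-normal"): "Wilcoxon test",
    ("compare two paired groups", "binomial"): "McNemar's test",
    ("compare two paired groups", "survival"): "Conditional proportional hazards regression",
    ("compare three or more unmatched groups", "normal"): "One-way ANOVA",
    ("compare three or more unmatched groups", "non-normal"): "Kruskal-Wallis test",
    ("compare three or more unmatched groups", "binomial"): "Chi-square test",
    ("compare three or more unmatched groups", "survival"): "Cox proportional hazard regression",
}

def goal_for(question, n_samples, related=False):
    """Steps 1, 3 and 4: turn question type, number of samples and pairing into a table row."""
    if question == "describe" and n_samples == 1:
        return "describe one group"
    if question == "compare":
        if n_samples == 1:
            return "compare one group to a hypothetical value"
        if n_samples == 2:
            return "compare two paired groups" if related else "compare two unpaired groups"
        if n_samples >= 3 and not related:
            return "compare three or more unmatched groups"
    raise ValueError("this combination is not covered by the table; consult a statistician")

def choose_test(question, data_type, n_samples, related=False):
    """Step 2 supplies the data type; step 5 looks up the suggested test."""
    key = (goal_for(question, n_samples, related), data_type)
    if key not in TABLE:
        raise ValueError(f"no entry in the table for {key}")
    return TABLE[key]

print(choose_test("compare", "normal", n_samples=2, related=False))  # Unpaired t test
print(choose_test("compare", "non-normal", n_samples=3))              # Kruskal-Wallis test
```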



1. A pilot experiment designed to test the effectiveness of a new approach to electrode placement for electroconvulsive therapy (ECT) has been conducted over a one-year period in the Fraser Health Authority.
Patients from two different mood disorder clinics participated in this study. Patients from Clinic X received ECT according to current practice guidelines. Patients from Clinic Y received a new exploratory ECT treatment. Patients in each clinic were matched for age, gender, and type of disorder. A random sample of 30 matched pairs of patients was selected for inclusion in the study. At the end of the year, patients were administered a memory test yielding a total score out of 100.
Dr. Vasdil would like to know which statistical procedure to use to test for differences between the groups of patients on the memory test.
What is the question: Compare
How many samples: 2
Related or independent: Independent
What is the level of measurement: Continuous
How many dependent variables: 1
Test: t test

Sample Size
There are several rules of thumb for determining sample size:
1) It’s a good idea to have a minimum of 30 cases (as a total group or, if comparing groups, 30 in each group). If you have fewer, you can use a non-parametric test, but it is still better to have close to 30 cases.
2) If using regression, it is best to have between 10 and 50 cases per independent variable.
3) If you are validating a survey, never have more questions than cases.
4) If the total population that you are examining has fewer than 30 units, use all of them.
5) For pilot studies, the recommended sample size is 12 per group.
6) For surveys, a sample size of 400 per group is enough for most purposes: with simple random sampling it estimates a proportion to within about ±5 percentage points at 95% confidence.
7) For surveys, a 30% response rate is the bare minimum.
Note: For a precise sample size estimate, you will need to conduct a power analysis.
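The 400-per-group survey rule can be checked with the usual normal-approximation margin of error for a proportion, z × sqrt(p(1 − p)/n). A minimal Python sketch, assuming simple random sampling, 95% confidence and the worst case p = 0.5; the function name margin_of_error is hypothetical.

```python
import math
from statistics import NormalDist

def margin_of_error(n, p=0.5, confidence=0.95):
    """Half-width of a normal-approximation confidence interval for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95% confidence
    return z * math.sqrt(p * (1 - p) / n)

for n in (30, 100, 400):
    print(n, round(margin_of_error(n), 3))
```

At n = 400 the margin is about 0.049, i.e. roughly ±4.9 percentage points; quadrupling the sample size only halves the margin.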






Statistical Power
• Power is the probability that a statistical test will detect an effect (i.e. reject the null hypothesis) when the effect really exists.
• It takes a value between 0 and 1 (often expressed as a percentage).
• Power = 1 - β, where β is the probability of a Type II error.
• Type II error: failing to reject a false null hypothesis (missing an effect that really exists).
• Type I error: rejecting a true null hypothesis (concluding there is an effect when there is none).

Types of Power
• A priori power analysis: conducted before the study commences (at the proposal stage).
• Post hoc power analysis: conducted after the study has been completed.
• An easy way to increase power?
• Increase the sample size.
• Increase the effect size.
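To see how sample size and effect size drive power, here is a minimal Python sketch for a two-sided comparison of two group means. It assumes a standardized effect size d (difference in means divided by the standard deviation) and uses the normal approximation power ≈ Φ(d·sqrt(n/2) − z(1 − α/2)), ignoring the negligible chance of rejecting in the wrong direction. The function name approx_power is hypothetical; dedicated software (e.g. G*Power) uses exact t-based formulas and gives slightly different values.

```python
import math
from statistics import NormalDist

def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample comparison of means (normal approximation)."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for alpha = .05
    return NormalDist().cdf(d * math.sqrt(n_per_group / 2) - z_crit)

# Power rises with sample size (fixed effect size d = 0.5) ...
for n in (20, 64, 100):
    print(f"n = {n:3d} per group, d = 0.5: power = {approx_power(0.5, n):.2f}")
# ... and with effect size (fixed n = 30 per group)
for d in (0.2, 0.5, 0.8):
    print(f"n =  30 per group, d = {d}: power = {approx_power(d, 30):.2f}")
```

The Type II error rate for any setting is simply β = 1 - approx_power(...).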

Components Involved in Power Calculation
• Sample size: the number of cases.
• Effect size: the magnitude of the effect (e.g. a difference in means) relative to its variation.
• Alpha level: the probability of concluding that an effect is present when the observed result is actually due to chance alone (usually .05 or .01). Alpha is also known as the Type I error rate: the probability of rejecting a true null hypothesis.
• Power level: 80-90% is common.
• One- or two-tailed test: two-tailed is common.

Components Involved in Power Calculation
• Sample size: what we want to find out.
• Effect size: the magnitude of the effect… but what if you don’t know it?
• Look to pilot data or the literature.
• Keep in mind: the smaller the effect size, the larger the sample size required.
• Alpha level: .05
• Power level: 80-90%
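With these components fixed, the sample size can be estimated. A minimal Python sketch for comparing two group means, assuming a standardized effect size d and the normal-approximation formula n = 2(z(1 − α/2) + z(power))² / d² per group, rounded up. The function name n_per_group is hypothetical, and this is a planning approximation, not a substitute for a proper power analysis.

```python
import math
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided, two-sample comparison of means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for alpha = .05, two-tailed
    z_power = NormalDist().inv_cdf(power)          # 0.84 for 80% power
    return math.ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)

for d in (0.8, 0.5, 0.2):  # large, medium and small standardized effects
    print(d, n_per_group(d), n_per_group(d, power=0.90))
```

For d = 0.5, alpha = .05 and 80% power this gives 63 per group; exact t-based calculations usually come out one or two cases higher (about 64 per group), so treat the result as a lower bound.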

Important Consultation Information
• What is your research question?
• Components of the power calculation
• Levels of data (nominal, ordinal, continuous)
• Sampling plan

Data Organization: Codebook
• What is a codebook?
• A codebook is a log of your variables (and their levels of data) and how you will code them.
• A codebook helps everyone understand the coding schemes, so that everyone is on the same page!

Data Processing and Analyses: Codebook Example
Variable Name | Variable Label | Values | Coding | Missing | Variable Type
age | age | 1, 2, 3, 4, 5 | 1=10-20 years; 2=21-30 years; 3=31-40 years; 4=41-50 years; 5=51+ years | 97=Incorrect response; 98=No response; 99=Not applicable | Ordinal
sex | sex | 1, 2 | 1=male; 2=female | 97=Incorrect response; 98=No response; 99=Not applicable | Nominal
happiness | happiness at work | 1, 2, 3 | 1=not happy; 2=somewhat happy; 3=very happy | 97=Incorrect response; 98=No response; 99=Not applicable | Ordinal
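The codebook example can also be kept in machine-readable form so that analysis code uses exactly the same codes as the data collectors. A hypothetical Python sketch: the dictionary layout and the decode function are assumptions, while the codes themselves come from the codebook example.

```python
# Hypothetical Python version of the codebook example
MISSING = {97: "Incorrect response", 98: "No response", 99: "Not applicable"}
CODEBOOK = {
    "age": {"label": "age", "type": "ordinal",
            "codes": {1: "10-20 years", 2: "21-30 years", 3: "31-40 years",
                      4: "41-50 years", 5: "51+ years"}},
    "sex": {"label": "sex", "type": "nominal", "codes": {1: "male", 2: "female"}},
    "happiness": {"label": "happiness at work", "type": "ordinal",
                  "codes": {1: "not happy", 2: "somewhat happy", 3: "very happy"}},
}

def decode(variable, value):
    """Translate a stored code into its label; missing-value codes return None."""
    entry = CODEBOOK[variable]
    if value in MISSING:
        return None  # 97-99 mark missing data and should be excluded from analyses
    if value not in entry["codes"]:
        raise ValueError(f"{value!r} is not a valid code for {variable}")
    return entry["codes"][value]

print(decode("happiness", 3))  # very happy
print(decode("sex", 98))       # None (no response)
```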

Spreadsheet Example
ID# | Age | Sex | Happiness
1 | 1 | 1 | 2
2 | 2 | 2 | 2
3 | 3 | 1 | 2
4 | 57 | 2 | 2
5 | 45 | 2 | 3
6 | 66 | 2 | 3
7 | 2 | 2 | 3
8 | 88 | 2 | 3
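A short Python sketch of checking entered values against the codebook before analysis. The rows are the spreadsheet example; the allowed values come from the codebook example (valid codes plus the missing codes 97-99); the function name check_codes is hypothetical. Run on these rows, it flags the Age entries 57, 45, 66 and 88 because they are not among the age codes listed in the codebook, so they would need to be resolved before analysis.

```python
# Rows from the spreadsheet example: (ID, Age, Sex, Happiness)
rows = [(1, 1, 1, 2), (2, 2, 2, 2), (3, 3, 1, 2), (4, 57, 2, 2),
        (5, 45, 2, 3), (6, 66, 2, 3), (7, 2, 2, 3), (8, 88, 2, 3)]

# Allowed values per column, from the codebook example (valid codes plus missing codes)
MISSING = {97, 98, 99}
VALID = {"Age": {1, 2, 3, 4, 5} | MISSING,
         "Sex": {1, 2} | MISSING,
         "Happiness": {1, 2, 3} | MISSING}
COLUMNS = ("Age", "Sex", "Happiness")

def check_codes(rows):
    """Return (ID, column, value) for every entry the codebook does not allow."""
    problems = []
    for record_id, *values in rows:
        for column, value in zip(COLUMNS, values):
            if value not in VALID[column]:
                problems.append((record_id, column, value))
    return problems

for problem in check_codes(rows):
    print(problem)
```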

Data Analysis with Excel
• Most simple analyses can be done using Excel, including correlation, regression and even random number generation.
• Install the Analysis ToolPak:
• Go to Tools > Add-Ins and select ‘Analysis ToolPak’ (in Excel 2010 and later: File > Options > Add-ins).
• Create your worksheet and codebook.
• Choose your statistical test.
• Follow the commands in the Help menu.

Data Analysis with Excel


http://home.ubalt.edu/ntsbarsh/excel/excel.htm

Data Reporting and Presentation of Data
• Graphical summaries are a great way to present your data.
• Excel is great for creating tables and graphs.
• The type of data you have determines the type of graphical summary you should use.

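Data Reporting and Presentation of Descriptive Data
Categorical data: frequency tables and bar charts.
• Example: Fruit
Fruit | Count | Percent | Valid Percent
Pineapples | 4 | 20% | 21%
Apples | 5 | 25% | 26%
Oranges | 10 | 50% | 53%
Unknown | 1 | 5% | n/a
Total | 20 | 100% | 100%
Valid percent excludes the unknown (missing) response, so it is calculated out of 19 cases rather than 20.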
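A minimal Python sketch of building the fruit frequency table, including the valid percent that leaves missing responses out of the denominator. The response list simply reproduces the counts in the example; the function name frequency_table and the treatment of "Unknown" as the missing category are assumptions.

```python
from collections import Counter

def frequency_table(values, missing=("Unknown",)):
    """Count, percent of all cases, and valid percent (missing categories excluded)."""
    counts = Counter(values)
    total = sum(counts.values())
    valid_total = sum(c for cat, c in counts.items() if cat not in missing)
    table = []
    for category, count in counts.items():
        percent = round(100 * count / total)
        valid = None if category in missing else round(100 * count / valid_total)
        table.append((category, count, percent, valid))
    return table

# Responses reproducing the counts in the fruit example
fruit = ["Pineapples"] * 4 + ["Apples"] * 5 + ["Oranges"] * 10 + ["Unknown"] * 1
for row in frequency_table(fruit):
    print(row)
```

Because each percentage is rounded separately, rounded values will not always add up to exactly 100%.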

Data Reporting and Presentation of Descriptive Data
[Figure: bar chart titled “Fruit Study” showing counts (axis 0 to 10) for Pineapples, Apples, Oranges and Unknown]

Data Reporting and Presentation of Descriptive Data
Continuous data: tables and histograms.
Age | Count | Percent
20-30 | 4 | 20%
31-40 | 5 | 25%
41-50 | 10 | 50%
51-60 | 1 | 5%
Total | 20 | 100%

Data Reporting and Presentation of Descriptive Data
[Figure: histogram of age with class intervals 20-30, 31-40, 41-50 and 51-60 on the horizontal axis and counts from 0 to 10 on the vertical axis]

What is the difference between a histogram and a bar chart?
• Histogram: for continuous data, where the data are divided into contiguous class intervals (i.e. intervals that form an unbroken sequence).
• Bar chart: for categorical data, where the categories are not contiguous.
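The contiguous class intervals behind a histogram can be sketched in Python as half-open bins [lower, upper), so that every value falls into exactly one interval. The ages are hypothetical, and the edges are chosen so whole-number ages land in the 20-30, 31-40, 41-50 and 51-60 groups used in the age table; bin_counts is a hypothetical helper name.

```python
def bin_counts(values, edges):
    """Count values in contiguous class intervals [edges[i], edges[i + 1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(edges) - 1):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
        else:  # no interval matched this value
            raise ValueError(f"{v} falls outside the class intervals")
    return counts

# Edges chosen so whole-number ages fall into 20-30, 31-40, 41-50 and 51-60
edges = [20, 31, 41, 51, 61]
ages = [22, 25, 35, 38, 44, 47, 49, 55]  # hypothetical ages
for lo, hi, count in zip(edges, edges[1:], bin_counts(ages, edges)):
    print(f"{lo}-{hi - 1}: {count}")
```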

Measures of Central Tendency
• Reporting averages:
• Categorical data = mode
• Ordinal data = median
• Continuous data = mean
• If there are outliers (extreme values), report the median instead of the mean.
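A small Python illustration of matching the average to the level of data, using the standard statistics module. All values are hypothetical. For ordinal codes, median_low is used here (an assumption, not a rule from the slide) because it always returns an observed category rather than a value halfway between two codes.

```python
from statistics import mean, median, median_low, mode

# Hypothetical values, one list per level of data
eye_colour = ["brown", "blue", "brown", "green", "brown"]  # categorical
happiness = [1, 2, 2, 3, 3, 3]                               # ordinal codes (1=not happy, 3=very happy)
ages = [25, 30, 35, 40, 95]                                  # continuous, with one extreme value

print(mode(eye_colour))          # brown
print(median_low(happiness))     # 2: median_low returns an observed category
print(mean(ages), median(ages))  # 45 35: the outlier pulls the mean up, the median stays put
```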

Reporting Inferential Stats
• Handout
• Resource: APA Guidelines
http://www.ilstu.edu/~jhkahn/apastats.html

Reporting Inferential Stats
• It’s important to include means, standard deviations and sample sizes in your results section.
• Example (correlation): Variable X was strongly correlated with Variable Y, r = .59, p < .01.
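A minimal Python sketch of computing a Pearson correlation and formatting it in the style of the example. The data and the helper names pearson_r and apa_r are hypothetical. APA style typically reports the degrees of freedom (n - 2) in parentheses and drops the leading zero for statistics that cannot exceed 1. The p value needs the t distribution, which the standard library lacks; a package such as SciPy (scipy.stats.pearsonr) can supply it.

```python
import math
from statistics import fmean

def pearson_r(x, y):
    """Pearson correlation coefficient computed from deviations about the means."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two equal-length lists with at least 3 pairs")
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def apa_r(r, n):
    """Format r APA-style: degrees of freedom (n - 2) in parentheses, no leading zero."""
    text = f"{r:.2f}".replace("0.", ".", 1)
    return f"r({n - 2}) = {text}"

x = [1, 2, 3, 4, 5]  # hypothetical paired observations
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(apa_r(r, len(x)))  # r(3) = .77
```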

Important to Keep Your Audience in Mind
• Residency project
• Publication
• Departmental report

Aaron: TCPS certification for
residents reminder…

Questions?


Slide 30

Surviving Pharmacy
Residency Research: Tips
and Tricks for Statistical
Planning

Surviving Pharmacy Residency
Research: Tips and Tricks for
Statistical Planning
© Fraser Health Authority, 2007
The Fraser Health Authority (“FH”) authorizes the use, reproduction and/or
modification of this publication for purposes other than commercial redistribution. In
consideration for this authorization, the user agrees that any unmodified
reproduction of this publication shall retain all copyright and proprietary notices. If
the user modifies the content of this publication, all FH copyright notices shall be
removed, however FH shall be acknowledged as the author of the source
publication.
Reproduction or storage of this publication in any form by any means for the purpose
of commercial redistribution is strictly prohibited.
This publication is intended to provide general information only, and should not be
relied on as providing specific healthcare, legal or other professional advice. The
Fraser Health Authority, and every person involved in the creation of this publication,
disclaims any warranty, express or implied, as to its accuracy, completeness or
currency, and disclaims all liability in respect of any actions, including the results of
any actions, taken or not taken in reliance on the information contained herein.

FH Health Research Intelligence
Unit
How can we help?


Grant Facilitator-Writer
 Conducting a search for funding opportunities.
 Automatic notification of new funding sources and
deadlines.
 Identifying a research team.
 Preparing letters of intent.
 Identifying resources required for conducting research.
 Formulating the research budget.
 Writing the grant application in collaboration with
researchers.
 Understanding FH and funding agency requirements
regarding preparation of specific documents.

FH Health Research Intelligence
Unit
How can we help?











Epidemiologist
Specifying the research goal,
objectives and hypothesis.
Identifying measurable
outcomes.
Specifying the variables for
analysis.
Identifying sources of data.
Developing data collection
tools for quantitative or
qualitative studies.
Developing the statistical
analysis plan.
Understanding how to use
statistical software, such as
SPSS.

Workshop Outline










Research 101- Basic Research Steps
Research Question Refinement
Common Study Designs- Resource
Levels of Data
Power and Sample Size
Statistical Test Selection- Exercise
Data Reporting- Resource
Simple Stats with Excel- Resource

Pharmacy Residency Project










1) Develop a research question
2) Conduct thorough literature review
3) Re-define research question or hypothesis
4) Design research methodology/study
5) Create research proposal
6) Apply for funding
7) Apply for ethics approval
8) Collect and analyze data
9) Draw conclusions and relate findings

Research Question Refinement


Research question will describe in operational
terms, what you think will happen in the study.

Good Versus Bad Research
Question


Are patients who take
drug X more likely to
experience episodes
of delirium?



Do patients who
receive medication X
between September
2008 and November
2008 experience
more episodes of
delirium as compared
to patients who
received drug Y
during the same time
period?

Classification of Research
Studies
Observational Studies:
Descriptive Studies:
Focus on describing
populations and describing the
relationship between variables

Research Studies

Analytic Studies:
Observational
Descriptive

Experimental

Analytic

Make inferences about the
population based on a random
sample.
Experimental Studies:
Test relationships between
exposures and outcomes.
Investigator has direct control
over study condition and
exposure status.

Hierarchy of Studies
Type of study is selected
according to the purpose of
research.
Experimental Studies

Analytic Studies

Descriptive Studies

Levels of Evidence


HandoutResearch Design
Hierarchy

Probability Sampling Methods:
Random
 There

are several methods to choose

from:
 Simple random
sampling.

Probability Sampling Methods:
Stratified
 Stratified

sampling

(divide the population into
non-overlapping strata and
sample from within each
stratum independently).


Guarantees representation
of all important groups.

Probability Sampling Methods: Systematic

Selecting the sample at a fixed interval k, so that every kth unit in the sampling frame is chosen, is called systematic random sampling.

Probability Sampling Methods: Systematic

Steps to achieve a systematic random sample:
1. Number the units in the population from 1 to N.
2. Decide on the sample size n that you want or need. The interval size is k = N/n.
3. Randomly select an integer between 1 and k.
4. Starting from that unit, take every kth unit.

Example:
1. N = 200.
2. n = 40, so k = N/n = 200/40 = 5 (interval size).
3. Randomly select a number between 1 and 5 (let's pick 4).
4. Begin with unit 4 and take every 5th unit (4, 9, 14, ...).
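The four steps above translate directly into a few lines of Python. This is a sketch under one stated assumption: when N/n is not a whole number, it rounds k down and keeps the first n units, which is only one of several ways to handle that case.

```python
import random

def systematic_sample(N, n, start=None, seed=None):
    """Return the unit numbers (1..N) selected by taking every kth unit."""
    k = N // n  # interval size k = N/n (rounded down if N/n is not a whole number)
    if start is None:
        start = random.Random(seed).randint(1, k)  # random integer from 1 to k inclusive
    return list(range(start, N + 1, k))[:n]

# The slide's example: N = 200, n = 40, k = 5, random start = 4.
units = systematic_sample(200, 40, start=4)
print(units[:5], "...", units[-1])  # [4, 9, 14, 19, 24] ... 199
```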

Probability Sampling Methods: Cluster
• Cluster sampling:
  • Divide the population into clusters and randomly sample clusters.
  • Measure all units within the sampled clusters.
  • Example: see the blue areas on the map. Clusters are not limited to geographic areas; you could select hospitals, schools, etc.
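A minimal Python sketch of cluster sampling, assuming hypothetical hospitals as the clusters. The key difference from simple random sampling is that the random draw is made over cluster names, and then every unit inside each chosen cluster is kept.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select whole clusters, then keep every unit in each selected cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # sample cluster names, not units
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical clusters: hospitals and the patient IDs they contain.
hospitals = {
    "Hospital A": [101, 102, 103],
    "Hospital B": [201, 202],
    "Hospital C": [301, 302, 303, 304],
    "Hospital D": [401, 402, 403],
}
sample = cluster_sample(hospitals, n_clusters=2, seed=7)
for hospital, patients in sample.items():
    print(hospital, patients)  # every patient in a selected hospital is measured
```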


Non-Probability Sampling Methods

There are also several types of non-probability sampling methods:
• Convenience sampling (not representative of the population).
• Purposive sampling (a certain group in mind).
• Expert sampling (seek out specific expertise).
• Snowball sampling (ask people to participate, and they ask more people).

If you select a non-probability sampling method, the conclusions drawn from the study results apply only to the specific group studied.

Measurement: Levels of Data

The level of data will dictate which statistical test you should use.

Categorical (nominal) = data classified into categories that cannot be arranged in any particular order (e.g., apples and pears, gender, eye colour, ethnicity).

Ordinal = data that can be ordered, but the distances between categories are not necessarily equal (e.g., low, middle and high income).

Continuous = data with equal distances between adjacent values (e.g., 1, 2, 3; age).

Statistics and Statistical Test Selection

Types of Statistics
• Descriptive statistics: describe research findings (e.g., frequencies, averages).
• Inferential statistics: make inferences about the population based on a random sample.
  • In a random sample, each person/unit has an equal chance of being selected.
  • This allows the results to be generalized to the population.

Types of Variables

Variables can be classified as independent or dependent.

An independent variable is a variable you believe will influence your outcome measure.

A dependent variable is the variable that depends on, or is influenced by, the independent variable(s). The dependent variable can also be the variable you are trying to predict.

Statistical Test Selection
Selecting the appropriate statistical test involves several steps. Base your selection on:
1) What is your goal? Description? Comparison? Prediction? Quantifying an association? Proving effectiveness? Proving causality?
2) What kind of data have you collected? What are the levels of data (nominal, ordinal, continuous)? Was your sample randomly selected?
3) Are your data normally distributed? Should you use a parametric or non-parametric test?
4) What are the assumptions of the statistical test you would like to use? Do your data meet these assumptions?

Parametric Tests

Parametric tests assume that the variable in question comes from a normally distributed population.

Non-parametric tests do not require the assumption of normality.

Most non-parametric tests do not require an interval level of measurement; they can be used with nominal or ordinal data.

Assumptions

Each test has its own assumptions. Before you select a test, be sure to check them. To find this information, contact a consultant or review statistical/research methods resources.
Some examples of common assumptions are:
• The dependent variable must be measured at a certain level (e.g., interval level).
• The independent variable(s) must be measured at a certain level (e.g., ordinal level).
• The population is normally distributed (not skewed).

If your data do not meet the assumptions for a specific test, you may be able to use a non-parametric test instead.

Type of Data

Goal | Measurement (Normal Population) | Ordinal, or Non-Normal Population | Binomial (Two Possible Outcomes) | Survival Time
Describe one group | Mean, SD | Median, interquartile range | Proportion | Kaplan Meier survival curve
Compare one group to a hypothetical value | One-sample t test | Wilcoxon test | Chi-square or Binomial test ** | -
Compare two unpaired groups | Unpaired t test | Mann-Whitney test | Fisher's test (chi-square for large samples) | Log-rank test or Mantel-Haenszel*
Compare two paired groups | Paired t test | Wilcoxon test | McNemar's test | Conditional proportional hazards regression*
Compare three or more unmatched groups | One-way ANOVA | Kruskal-Wallis test | Chi-square test | Cox proportional hazard regression**
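To see why the table separates unpaired from paired groups, here is a minimal sketch that computes both t statistics with Python's standard library on the same hypothetical memory scores (four pairs, invented for illustration). It computes only the t statistic, not a p-value; the function names are our own.

```python
import math
import statistics

def unpaired_t(x, y):
    """Student's t for two independent groups (pooled variance)."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * statistics.variance(x)
                  + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    se = math.sqrt(pooled_var * (1 / nx + 1 / ny))
    return (statistics.mean(x) - statistics.mean(y)) / se

def paired_t(x, y):
    """t for paired data: a one-sample t test on the within-pair differences."""
    diffs = [a - b for a, b in zip(x, y)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se

# Hypothetical memory scores for four matched pairs (pair i is x[i] with y[i]).
x = [70, 65, 80, 75]
y = [66, 62, 75, 73]
print(round(unpaired_t(x, y), 2))  # 0.79
print(round(paired_t(x, y), 2))    # 5.42
```

The same numbers give a much larger t when the pairing is respected, because working with within-pair differences removes the variation between pairs. For p-values, SciPy's `scipy.stats.ttest_ind` and `scipy.stats.ttest_rel` run these two tests directly.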

Statistical Test Selection Group Exercise
• Using your tables, select the appropriate statistical tests for 10 research scenarios.

During the group exercise…

Steps to choose the appropriate statistical method for the data analysis:
1. Identify whether the research problem asks you to describe, relate (association), or compare (difference).
2. Identify the levels of measurement in the research question (nominal/categorical, ordinal/rank, continuous/evenly spaced).
3. Identify the number of variables or samples being described, related, or compared.
4. Identify whether the comparison samples are related (e.g., the same group analyzed before and after, or matched pairs) or independent (unrelated, different groups).
5. Choose the appropriate statistical tool for the data and situation using the decision tree in the handout.
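Steps 1 to 4 narrow down a row and a column of the test-selection table, and step 5 reads off the cell. The sketch below encodes only the five rows of that table as a Python lookup; it is not the handout's decision tree, the goal and data-type labels are our own illustrative keys, and the footnote markers (*, **) are dropped.

```python
# Columns of the test-selection table, in order.
DATA_TYPES = ("normal", "ordinal or non-normal", "binomial", "survival")

# Rows of the table: goal -> one entry per data type (None = no entry in the table).
TEST_TABLE = {
    "describe one group": (
        "Mean, SD", "Median, interquartile range", "Proportion",
        "Kaplan Meier survival curve"),
    "compare one group to a hypothetical value": (
        "One-sample t test", "Wilcoxon test", "Chi-square or Binomial test", None),
    "compare two unpaired groups": (
        "Unpaired t test", "Mann-Whitney test",
        "Fisher's test (chi-square for large samples)",
        "Log-rank test or Mantel-Haenszel"),
    "compare two paired groups": (
        "Paired t test", "Wilcoxon test", "McNemar's test",
        "Conditional proportional hazards regression"),
    "compare three or more unmatched groups": (
        "One-way ANOVA", "Kruskal-Wallis test", "Chi-square test",
        "Cox proportional hazard regression"),
}

def suggest_test(goal, data_type):
    """Look up the table entry for a goal (steps 1, 3, 4) and a data type (step 2)."""
    row = TEST_TABLE[goal.strip().lower()]
    return row[DATA_TYPES.index(data_type)]

print(suggest_test("Compare two paired groups", "normal"))  # Paired t test
```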



1. A pilot experiment designed to test the effectiveness of a new approach to electrode placement for electroconvulsive therapy (ECT) has been conducted over a one-year period in the Fraser Health Authority.
Patients from two different mood disorder clinics participated in this study. Patients from Clinic X received ECT according to current practice guidelines. Patients from Clinic Y received a new exploratory ECT treatment. Patients in each clinic were matched for age, gender, and type of disorder. A random sample of 30 matched pairs of patients was selected for inclusion in the study. At the end of one year, patients were administered a memory test yielding a total score out of 100.
Dr. Vasdil would like to know which statistical procedure should be used to test for differences between the two groups of patients on the memory test.
What is the question: Compare
How many samples: 2
Related or independent: Related (patients were matched in pairs)
What is the level of measurement: Continuous
How many dependent variables: 1
Test: Paired t test

Sample Size

There are several rules of thumb for determining sample size:
1) Aim for a minimum of 30 cases (as a total group or, if comparing groups, 30 per group). If you have fewer, you can use a non-parametric test, but it is still better to have close to 30 cases.
2) If using regression, it is best to have 10-50 cases per independent variable.
3) If you are validating a survey, you should never have more questions than cases.
4) If the total population you are examining is smaller than 30, use all of them.
5) For pilot studies, the recommended sample size is 12 per group.
6) For surveys, a sample size of 400 per group is sufficient for most purposes.
7) For surveys, a 30% response rate is the bare minimum.

Note: For a precise sample size estimate, you will need to conduct a power analysis.
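One common way to justify the survey figure of about 400 (our assumption; the slide does not give its reasoning) is the margin of error for a proportion. With p = 0.5 (the worst case), 95% confidence and a simple random sample, 400 responses give a margin of roughly 5 percentage points. A minimal sketch using Python's standard `statistics.NormalDist`:

```python
import math
from statistics import NormalDist

def margin_of_error(n, p=0.5, confidence=0.95):
    """Half-width of a normal-approximation confidence interval for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    return z * math.sqrt(p * (1 - p) / n)

def n_for_margin(margin, p=0.5, confidence=0.95):
    """Smallest n whose margin of error is at most `margin`."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

print(f"n = 400 -> +/- {margin_of_error(400):.1%}")  # n = 400 -> +/- 4.9%
print(n_for_margin(0.05))                          # 385
```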






Statistical Power

Power is the probability that a statistical test will correctly detect an effect when one truly exists.
• It takes a value between 0 and 1 (often expressed as a percentage).
• Power = 1 - β, where β is the probability of a Type II error.
  • Type II error: failing to reject a false null hypothesis (missing an effect that really exists).
  • Type I error: rejecting a true null hypothesis (concluding there is an effect when there is none).

Types of Power Analysis
• A priori: conducted before the study commences (at the proposal stage).
• Post hoc: conducted after the study has been completed.
• Easy ways to increase power:
  • Increase the sample size.
  • Increase the effect size.
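How sample size and effect size drive power can be seen with a normal-approximation formula for comparing two group means. This is a sketch, not the exact t-based calculation that power software performs; `d` is assumed to be a standardized effect size (difference in means divided by the standard deviation), and the function name is our own.

```python
import math
from statistics import NormalDist

def power_two_groups(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample comparison of means.

    Uses the normal approximation and ignores the negligible chance of
    rejecting in the wrong direction.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(d * math.sqrt(n_per_group / 2) - z_alpha)

# Power rises along each row (larger n) and down the rows (larger effect size).
for d in (0.2, 0.5, 0.8):
    print(d, [round(power_two_groups(d, n), 2) for n in (20, 50, 100)])
```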

Components Involved in Power Calculation
• Sample size: the number of cases.
• Effect size: the magnitude of the effect relative to the variation in the data.
• Alpha level: the probability of concluding that an effect is present when the observed result is actually due to chance alone (usually .05 or .01).
  • Also known as the Type I error rate: the probability of rejecting a true null hypothesis.
• Power level: 80-90% is common.
• One- or two-tailed test: two-tailed is common.

Components Involved in Power Calculation
• Sample size: what we want to find out.
• Effect size: the magnitude of the effect... but what if you don't know it?
  • Look to pilot data or the literature.
  • Keep in mind that the smaller the effect size, the larger the sample size required.
• Alpha level: .05
• Power level: 80-90%
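Putting the components together, an a priori sample size for comparing two means can be approximated as n per group = 2(z for alpha/2 + z for power)^2 / d^2. The sketch below uses that normal approximation with a standardized effect size `d` (an assumption; the slide does not specify a formula). Dedicated software based on the t distribution typically gives a slightly larger answer (about 64 per group for d = 0.5), so treat these numbers as a planning estimate.

```python
import math
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate cases per group for a two-sided, two-sample comparison of means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed test
    z_beta = NormalDist().inv_cdf(power)           # power = 1 - beta
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)

print(n_per_group(0.5))              # 63
print(n_per_group(0.2))              # 393: smaller effect, much larger sample
print(n_per_group(0.5, power=0.90))  # 85
```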

Important Consultation Information
• What is your research question?
• Components of the power calculation
• Levels of data (nominal, ordinal, continuous)
• Sampling plan

Data Organization: Codebook
• What is a codebook?
  • A codebook is a record of your variables (and their levels of data) and how you will code them.
  • A codebook helps everyone understand the coding schemes, so that the whole team is on the same page!

Data Processing and Analyses: Codebook Example

Variable Name | Variable Label | Values | Coding | Missing | Variable Type
age | age | 1,2,3,4,5 | 1=10-20 years, 2=21-30 years, 3=31-40 years, 4=41-50 years, 5=51+ years | 97=Incorrect response, 98=No response, 99=Not applicable | Ordinal
sex | sex | 1,2 | 1=male, 2=female | 97=Incorrect response, 98=No response, 99=Not applicable | Nominal
happiness | happiness at work | 1,2,3 | 1=not happy, 2=somewhat happy, 3=very happy | 97=Incorrect response, 98=No response, 99=Not applicable | Ordinal

Spreadsheet Example

ID# | Age | Sex | Happiness
1 | 1 | 1 | 2
2 | 2 | 2 | 2
3 | 3 | 1 | 2
4 | 57 | 2 | 2
5 | 45 | 2 | 3
6 | 66 | 2 | 3
7 | 2 | 2 | 3
8 | 88 | 2 | 3
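A codebook can also be written in a form the computer can check. The Python sketch below encodes the example codebook as a dictionary (the layout and the function name are our own illustration, not a feature of any particular software) and flags any spreadsheet value that is neither a valid code nor a documented missing-value code. Run on the example spreadsheet, it flags the Age entries 57, 45, 66 and 88, which fall outside the codebook's 1-5 coding.

```python
# The example codebook: valid codes and level of data for each variable.
CODEBOOK = {
    "age": {"valid": {1, 2, 3, 4, 5}, "type": "ordinal"},
    "sex": {"valid": {1, 2}, "type": "nominal"},
    "happiness": {"valid": {1, 2, 3}, "type": "ordinal"},
}
MISSING_CODES = {97: "Incorrect response", 98: "No response", 99: "Not applicable"}

# The example spreadsheet, one dictionary per row.
ROWS = [
    {"id": 1, "age": 1, "sex": 1, "happiness": 2},
    {"id": 2, "age": 2, "sex": 2, "happiness": 2},
    {"id": 3, "age": 3, "sex": 1, "happiness": 2},
    {"id": 4, "age": 57, "sex": 2, "happiness": 2},
    {"id": 5, "age": 45, "sex": 2, "happiness": 3},
    {"id": 6, "age": 66, "sex": 2, "happiness": 3},
    {"id": 7, "age": 2, "sex": 2, "happiness": 3},
    {"id": 8, "age": 88, "sex": 2, "happiness": 3},
]

def check_row(row):
    """Return a list of problems: values that are not valid or missing-value codes."""
    problems = []
    for var, spec in CODEBOOK.items():
        value = row[var]
        if value in MISSING_CODES:
            continue  # a documented missing-value code, not an error
        if value not in spec["valid"]:
            problems.append(f"{var}={value} is not a valid code")
    return problems

for row in ROWS:
    for problem in check_row(row):
        print(f"ID {row['id']}: {problem}")
```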

Data Analysis with Excel
• Most simple analyses can be done in Excel, including correlation, regression and even random number generation.
• Install the Analysis ToolPak:
  • Go to Tools > Add-Ins and add the 'Analysis ToolPak'.
• Create your worksheet and codebook.
• Choose a statistical test.
  • Follow the instructions in the Help menu.

Data Analysis with Excel


http://home.ubalt.edu/ntsbarsh/excel/excel.htm

Data Reporting and Presentation of Data
• Graphical summaries are a great way to present your data.
• Excel is well suited to creating tables and graphs.
• The type of data you have determines the type of graphical summary you should use.

Data Reporting and Presentation of Descriptive Data
• Categorical data: frequency tables and bar charts.
• Example:

Fruit | Count | Percent | Valid Percent
Pineapples | 4 | 20% | 21%
Apples | 5 | 25% | 26%
Oranges | 10 | 50% | 53%
Unknown | 1 | 5% | -
Total | 20 | 100% | 100%
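The difference between Percent (all 20 cases) and Valid Percent (the 19 cases with a known answer) can be reproduced with `collections.Counter`. In this sketch `None` stands in for "Unknown"; the function name and that convention are our own choices.

```python
from collections import Counter

def frequency_table(values, missing=None):
    """Count, percent of all cases, and percent of valid (non-missing) cases."""
    counts = Counter(values)
    total = sum(counts.values())
    valid_total = total - counts.get(missing, 0)
    table = {}
    for value, count in counts.items():
        valid_pct = None if value == missing else 100 * count / valid_total
        table[value] = (count, 100 * count / total, valid_pct)
    return table

# The fruit example from the slide, with None standing in for "Unknown".
fruit = ["Pineapples"] * 4 + ["Apples"] * 5 + ["Oranges"] * 10 + [None]
for value, (count, pct, valid_pct) in frequency_table(fruit).items():
    valid_text = "-" if valid_pct is None else f"{valid_pct:.0f}%"
    print(f"{value or 'Unknown':<10} {count:>3} {pct:>4.0f}% {valid_text:>5}")
```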

Data Reporting and Presentation of Descriptive Data
[Figure: "Fruit Study" bar chart of counts for Pineapples, Apples, Oranges and Unknown]

Data Reporting and Presentation of Descriptive Data
• Continuous data: tables and histograms.

Age | Count | Percent
20-30 | 4 | 20%
31-40 | 5 | 25%
41-50 | 10 | 50%
51-60 | 1 | 5%
Total | 20 | 100%
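Grouping a continuous variable into the contiguous class intervals of a table like this one can be sketched in a few lines of Python. The ages below are hypothetical, and the sketch assumes ages are recorded in whole years (so 30 and 31 leave no gap); values outside every interval are simply not counted.

```python
def bin_counts(values, bins):
    """Count values falling in each closed interval lo..hi (whole-number ages assumed)."""
    counts = {f"{lo}-{hi}": 0 for lo, hi in bins}
    for v in values:
        for lo, hi in bins:
            if lo <= v <= hi:
                counts[f"{lo}-{hi}"] += 1
                break  # intervals do not overlap, so stop at the first match
    return counts

AGE_BINS = [(20, 30), (31, 40), (41, 50), (51, 60)]  # the slide's age groups
ages = [23, 29, 35, 38, 44, 47, 52]  # hypothetical ages in whole years
print(bin_counts(ages, AGE_BINS))
```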

Data Reporting and Presentation of Descriptive Data
[Figure: histogram of counts for the age groups 20-30, 31-40, 41-50 and 51-60]

What is the difference between a histogram and a bar chart?
• Histogram: for continuous data, where the data are divided into contiguous class intervals (in other words, intervals connected in an unbroken sequence).
• Bar chart: for categorical data, where the categories are not contiguous.

Measures of Central Tendency
• Reporting averages:
  • Categorical data = mode
  • Ordinal data = median
  • Continuous data = mean
• If there are outliers (or extreme values), report the median instead of the mean.
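Python's standard `statistics` module provides all three averages. The sketch below uses small hypothetical data sets, one per level of data, and shows how a single extreme value pulls the mean but barely moves the median.

```python
import statistics

eye_colour = ["brown", "blue", "brown", "green", "brown"]  # categorical
income = [1, 2, 2, 3, 3, 3, 1]    # ordinal codes: 1=low, 2=middle, 3=high
ages = [34, 36, 38, 40, 42]       # continuous
ages_with_outlier = ages + [95]   # one extreme value added

print(statistics.mode(eye_colour))           # brown
print(statistics.median(income))             # 2
print(statistics.mean(ages))                 # 38
print(statistics.mean(ages_with_outlier))    # 47.5, pulled up by the outlier
print(statistics.median(ages_with_outlier))  # 39.0, barely affected
```

With an even number of ordinal values, `statistics.median` averages the two middle codes, which may not be a meaningful category; `statistics.median_low` or `statistics.median_high` return an actual observed code instead.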

Reporting Inferential Stats
• Handout
• Resource: APA guidelines
  http://www.ilstu.edu/~jhkahn/apastats.html

Reporting Inferential Stats
• It's important to include means, standard deviations and sample sizes in your results section.
• Example (correlation):
  • Variable X was strongly correlated with Variable Y, r = .59, p < .01.
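A small helper can produce the mean, standard deviation and sample size in a consistent format for the results section. This is a sketch with hypothetical scores; the "M = …, SD = …, n = …" layout follows common APA-style reporting, and the function name is our own.

```python
import statistics

def describe(values):
    """Format the mean, standard deviation and sample size for a results section."""
    m = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 in the denominator)
    return f"M = {m:.2f}, SD = {sd:.2f}, n = {len(values)}"

scores = [70, 65, 80, 75]  # hypothetical memory scores
print(describe(scores))  # M = 72.50, SD = 6.45, n = 4
```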

Important to Keep Your Audience in Mind
• Residency project
• Publication
• Departmental report

Aaron: TCPS certification for
residents reminder…

Questions?


Slide 31

Surviving Pharmacy
Residency Research: Tips
and Tricks for Statistical
Planning

Surviving Pharmacy Residency
Research: Tips and Tricks for
Statistical Planning
© Fraser Health Authority, 2007
The Fraser Health Authority (“FH”) authorizes the use, reproduction and/or
modification of this publication for purposes other than commercial redistribution. In
consideration for this authorization, the user agrees that any unmodified
reproduction of this publication shall retain all copyright and proprietary notices. If
the user modifies the content of this publication, all FH copyright notices shall be
removed, however FH shall be acknowledged as the author of the source
publication.
Reproduction or storage of this publication in any form by any means for the purpose
of commercial redistribution is strictly prohibited.
This publication is intended to provide general information only, and should not be
relied on as providing specific healthcare, legal or other professional advice. The
Fraser Health Authority, and every person involved in the creation of this publication,
disclaims any warranty, express or implied, as to its accuracy, completeness or
currency, and disclaims all liability in respect of any actions, including the results of
any actions, taken or not taken in reliance on the information contained herein.

FH Health Research Intelligence
Unit
How can we help?


Grant Facilitator-Writer
 Conducting a search for funding opportunities.
 Automatic notification of new funding sources and
deadlines.
 Identifying a research team.
 Preparing letters of intent.
 Identifying resources required for conducting research.
 Formulating the research budget.
 Writing the grant application in collaboration with
researchers.
 Understanding FH and funding agency requirements
regarding preparation of specific documents.

FH Health Research Intelligence
Unit
How can we help?











Epidemiologist
Specifying the research goal,
objectives and hypothesis.
Identifying measurable
outcomes.
Specifying the variables for
analysis.
Identifying sources of data.
Developing data collection
tools for quantitative or
qualitative studies.
Developing the statistical
analysis plan.
Understanding how to use
statistical software, such as
SPSS.

Workshop Outline










Research 101- Basic Research Steps
Research Question Refinement
Common Study Designs- Resource
Levels of Data
Power and Sample Size
Statistical Test Selection- Exercise
Data Reporting- Resource
Simple Stats with Excel- Resource

Pharmacy Residency Project










1) Develop a research question
2) Conduct thorough literature review
3) Re-define research question or hypothesis
4) Design research methodology/study
5) Create research proposal
6) Apply for funding
7) Apply for ethics approval
8) Collect and analyze data
9) Draw conclusions and relate findings

Research Question Refinement


Research question will describe in operational
terms, what you think will happen in the study.

Good Versus Bad Research
Question


Are patients who take
drug X more likely to
experience episodes
of delirium?



Do patients who
receive medication X
between September
2008 and November
2008 experience
more episodes of
delirium as compared
to patients who
received drug Y
during the same time
period?

Classification of Research
Studies
Observational Studies:
Descriptive Studies:
Focus on describing
populations and describing the
relationship between variables

Research Studies

Analytic Studies:
Observational
Descriptive

Experimental

Analytic

Make inferences about the
population based on a random
sample.
Experimental Studies:
Test relationships between
exposures and outcomes.
Investigator has direct control
over study condition and
exposure status.

Hierarchy of Studies
Type of study is selected
according to the purpose of
research.
Experimental Studies

Analytic Studies

Descriptive Studies

Levels of Evidence


HandoutResearch Design
Hierarchy

Probability Sampling Methods:
Random
 There

are several methods to choose

from:
 Simple random
sampling.

Probability Sampling Methods:
Stratified
 Stratified

sampling

(divide the population into
non-overlapping strata and
sample from within each
stratum independently).


Guarantees representation
of all important groups.

Probability Sampling Methods:
Systematic


Selection of the
sample using an
interval “k” so that
every “k” unit in the
frame is selected,
is called systematic
random sampling.

Probability Sampling Methods:
Systematic


Steps to achieve a systematic random sample:

1. Number the units in the population from 1 to N.
2. Decide on the n (sample size) that you want or need.


k = N/n = the interval size.

3. Randomly select an integer between 1 and k.
4. Then take every kth unit.
Example:
1. N=200
2. n=40, take N/n, 200/40=5 (interval size).
3. Randomly select a number between 1 and 5 (let’s pick 4).
4. Begin with 4, and take every 5th unit.

Probability Sampling Methods:
Cluster
 Cluster

sampling.
 Divide population into clusters and
randomly sample clusters.
 Measure all units within sampled clusters.
 Example: See blue areas on map.
Not just geographic areas,
could select hospitals,
schools etc.


Non-Probability Sampling
Methods


There are different types of non-probability
sampling methods as well:








Convenience (not representative of population).
Purposive (certain group in mind).
Expert sampling (seek out specific expertise).
Snowball sampling (ask people to participate, they
ask more people).

If you select non-probability sampling methods, the
conclusions drawn from the study results apply only
to that specific population.

Measurement: Levels of Data



The level of data will dictate which statistical test you
should use.
Categorical = Data that is classified into categories and
cannot be arranged in any particular order (e.g. Apples
and pears, gender, eye colour, ethnicity).



Ordinal = Data ordered, but distance between intervals
not always equal. (e.g. Low, middle and high income).



Continuous = equal distance between each interval
(e.g. 1,2,3., age).

Statistics and Statistical Test
Selection

Types of Statistics
 Descriptive

Statistics: Describes
 research findings


E.g. Frequencies, averages.

 Inferential

Statistics: Makes inferences
about the population, based on a random
sample.




In a random sample, each person/unit has an
equal chance of being selected
Allows generalizability to population.

Types of Variables


Variables can be classified as independent or
dependent.



An independent variable is the variable you
believe will influence your outcome measure.



A dependent variable is the variable that is
dependant on or influenced by independent
variable(s). The dependent variable can also
be the variable you are trying to predict.

Statistical Test Selection
Selecting the appropriate Statistical test requires
several steps:
Test selection should be based on:
1) What is your goal? Description? Comparison? Prediction? Quantify
association? Prove effectiveness? Prove causality?
2) What kind of data have you collected? What are the levels of data
(Nominal, ordinal, continuous)? Was your sample randomly selected?
3) Is your data normally distributed? Should you use a parametric or nonparametric test?
4) What are the assumptions of the statistical test you would like to
use? Does the data meet these assumptions?

Parametric Tests


Parametric tests assume that the variable in question is
from a normal distribution.



Non-parametric tests do not require the assumption of
normality.



Most non-parametric tests do not require an interval level
of measurement; can be used with nominal/ordinal level
data.

Assumptions







There are various assumptions for each test.
Before you select a test, be sure to check the assumptions of
each test.
You will need to contact a consultant, or review
statistical/research methods resources to find this information.
Some examples of common assumptions are:
 The dependent variable will need to be measured on a
certain level (i.e. Interval level).
 The independent variable(s) will need to be measured on a
certain level (i.e. Ordinal level).
 The population is normally distributed (not skewed).

If your data do not meet the assumptions for a specific test,
you may be able to use a non-parametric test instead.

Type of Data
Goal

Measurement
Normal
Population

Ordinal, or
NonNormal
Population
Median,
interquartil
e range
Wilcoxon test

Binomial
Survival Time
-Two
Possible
Outcomes
Proportion
Kaplan Meier
survival curve

Describe one
group

Mean, SD

Compare one
group to a
hypothetic
al value

One-sample t
test

Compare two
unpaired
groups

Unpaired t test

Mann-Whitney
test

Log-rank test or
Mantel-Haenszel*

Paired t test

Wilcoxon test

Fisher's test
(chisquare
for large
samples)
McNemar's
test

Compare two
paired
groups

Chi-square
test

Cox proportional
hazard
regression**

One-way
Compare
ANOVA
three or
more
unmatched
groups

Kruskal-Wallis
test

Chi-square
or
Binomial
test **

Conditional
proportional
hazards
regression*

Statistical Test Selection Group
Exercise

 Using

your tables, select the
appropriate statistical tests for 10
research scenarios.

During the group exercise…


Steps to choose the appropriate statistical method
for the data analysis:

1. Identify whether the research problem raises the
question of describe, relate (association), or compare
(difference).
2. Identify the levels of measurement in the research
question (Nominal/Categorical, Ordinal/Rank,
Continuous/Evenly spaced).
3. Identify the number of variables, or samples being
described, related, or compared.
4. Identify whether comparison samples are related
(analyze same group before and after) or independent
(not at all related, looking at different groups).
5. Choose the appropriate statistical tool for the data and
situation using the decision tree in the handout.



1. A pilot experiment designed to test the effectiveness of a new
approach to electrode placement for Electro Shock Therapy
(ECT) has been conducted over a one year time period in the
Fraser Health Authority.
Patients from two different mood disorder clinics participated in
this study. Patients from Clinic X received ECT therapy
according to current practice guidelines. Patients from Clinic Y
received a new exploratory ECT treatment. Patients in each
clinic were matched for age, gender, and type of disorder. A
random sample of 30 matched pairs of patients were selected
for inclusion in the study. At end of one year, patients were
administered a memory test yielding a total score out of 100.
Dr. Vasdil would like to know what statistical procedure needs
to be selected to test for differences among groups of patients
on the memory test.
What is the question: Compare
How many samples: 2
Related or independent: Independent
What is the level of measurement: Continuous
How many dependent variables: 1
Test: T-test

Sample Size


There are several rules of thumb for determining
sample size.






1) It’s a good idea to have a minimum of 30 cases (as a total group, or if comparing
groups, 30 for each group).
If you have less you can use a non-parametric test, but it is still better to have close
to 30 cases.
2) If using regression, it is best to have between 10-50 cases per independent
variable.
3) If you are validating a survey, it is never good to have more questions than cases.
4) If the total population that you are examining is less than 30. Use all of them.
5) For pilot studies the recommendation is a sample size of 12 per group
6) For surveys, a sample size of 400 per group can do just about anything.



7) For surveys, a 30% response rate is the bare minimum.



Note: For a precise sample size estimate you will need to conduct a power analysis.






Statistical Power


Power is the capability of a statistical test to
correctly detect a significant effect if it exists.
 Assumes value between 0 and 1 (%)



Power= 1-B (B= probability of a Type II error).
Type II error – the error of not rejecting a false
research finding.
 Type I error- the error of rejecting a correct
research finding.

Types of Power
 A Priori-

Conducted before study
commences (at proposal stage).
 Post Hoc- After study has been completed.
 Easy




way to increase power?

Increase sample size
Increase Effect size

Components Involved in Power
Calculation



Sample Size- Number of cases.
Effect Size –Magnitude of the trend and
variation.
 Alpha Level- Odds of concluding that the
presence of an effect is due to chance alone (.05
or .01).





Also known as Type I Error, or the error of rejecting a
correct research finding

Power level- 80-90% common
One or two-tailed test- two tailed is common.

Components Involved in Power
Calculation
 Sample

Size- What we want to find out.
 Effect Size –Magnitude of the trend…but
what if you don’t know?



Look to pilot data or literature.
Keep in mind, the smaller the effect size, the
larger the sample size required.

 Alpha

Level- .05
 Power level- 80-90%

Important Consultation Information
 What

is your research question?
 Components of power calculation
 Levels of data (nominal, ordinal,
continuous)
 Sampling plan

Data Organization: Codebook
 What

is a codebook?
 A codebook is a log of your variables
(and levels of data) and how you will
code them.
 A codebook will help everyone
understand the coding schemes to
ensure that they are on the same page!

Data Processing and Analyses:
Codebook Example
Variable
Name

Variable
Label

Values

Coding

Missing

Variable
Type

age

age

1,2,3,4,5

1=10-20 years
2=21-30 years
3=31-40 years
4=41-50 years
5=51+ years

97=Incorrect
response
98=No response
99=Not
Applicable

Ordinal

sex

sex

1,2

1=male, 2=female

97=Incorrect
response
98=No response
99=Not
Applicable

Nominal

happiness

happiness
at
work

1,2,3

1=not happy
2=somewhat happy
3=very happy

97=Incorrect
response
98=No response
99=Not
Applicable

Ordinal

Spreadsheet Example
ID#

Age

Sex Happiness

1

1

1

2

2

2

2

2

3

3

1

2

4

57

2

2

5

45

2

3

6

66

2

3

7

2

2

3

8

88

2

3

Data Analysis with Excel
 Most

simple analyses can be done using
Excel, including correlation, regression
and even random number generation.
 Install the data analysis pack.


Go to tools, add-ins, and add the ‘analysis
tool pack’.

 Create

worksheet and codebook.
 Choose statistical test.


Follow commands in help menu.

Data Analysis with Excel


http://home.ubalt.edu/ntsbarsh/excel/excel.htm

Data Reporting and Presentation of
Data
 Graphical

summaries are a great way to
present your data
 Excel is great for creating tables and
graphs
 The type of data you have will reflect the
type of graphical summary you should
use.

Data Reporting and Presentation of
Descriptive Data


Categorical data:
Frequency Tables and
Bar Charts.
 Example: Fruit

Count

Percent

Valid
Percent

Pineapples

4

20%

21%

Apples

5

25%

26%

Oranges

10

50%

53%

Unknown

1

5%

_______

Total

20

100%

100%

Data Reporting and Presentation of
Descriptive Data
Fruit Study

Unknown
Oranges
Apples
Pineapples
0

5

10

Data Reporting and Presentation of
Descriptive Data


Continuous Data:
Tables and
Histograms
Age

Count

Percent

20-30

4

20%

31-40

5

25%

41-50

10

50%

51-60

1

5%

Total

20

100%

Data Reporting and Presentation of
Descriptive Data
10
8
20-30
31-40
41-50
51-60

6
4
2
0

20

30

40

50

What is the difference between a
Histogram and a Bar Chart?
 Histogram:

For continuous data where
data are divided into contiguous class
intervals (or in other words, connected
through unbroken sequence).

 Bar

Chart: For categorical data where
categories are not contiguous.

Measures of Central Tendency
 Reporting




 If

averages

Categorical data= Mode
Ordinal data= Median
Continuous data= Mean

there are outliers (or extreme values),
report the median instead of the mean.

Reporting Inferential Stats
 Handout


Resource- APA Guidelines

http://www.ilstu.edu/~jhkahn/apastats.html

Reporting Inferential Stats
 It’s

important to include means, standard
deviations and sample size in your results
section.

 Example:

Correlation
 Variable X was strongly correlated with
Variable Y, r=.59, p<.01.

Important to Keep your
Audience in Mind
Residency Project
Publication
Departmental Report

Aaron: TCPS certification for
residents reminder…

Questions?


Slide 32

Surviving Pharmacy
Residency Research: Tips
and Tricks for Statistical
Planning

Surviving Pharmacy Residency
Research: Tips and Tricks for
Statistical Planning
© Fraser Health Authority, 2007
The Fraser Health Authority (“FH”) authorizes the use, reproduction and/or
modification of this publication for purposes other than commercial redistribution. In
consideration for this authorization, the user agrees that any unmodified
reproduction of this publication shall retain all copyright and proprietary notices. If
the user modifies the content of this publication, all FH copyright notices shall be
removed, however FH shall be acknowledged as the author of the source
publication.
Reproduction or storage of this publication in any form by any means for the purpose
of commercial redistribution is strictly prohibited.
This publication is intended to provide general information only, and should not be
relied on as providing specific healthcare, legal or other professional advice. The
Fraser Health Authority, and every person involved in the creation of this publication,
disclaims any warranty, express or implied, as to its accuracy, completeness or
currency, and disclaims all liability in respect of any actions, including the results of
any actions, taken or not taken in reliance on the information contained herein.

FH Health Research Intelligence
Unit
How can we help?


Grant Facilitator-Writer
 Conducting a search for funding opportunities.
 Automatic notification of new funding sources and
deadlines.
 Identifying a research team.
 Preparing letters of intent.
 Identifying resources required for conducting research.
 Formulating the research budget.
 Writing the grant application in collaboration with
researchers.
 Understanding FH and funding agency requirements
regarding preparation of specific documents.

FH Health Research Intelligence
Unit
How can we help?











Epidemiologist
Specifying the research goal,
objectives and hypothesis.
Identifying measurable
outcomes.
Specifying the variables for
analysis.
Identifying sources of data.
Developing data collection
tools for quantitative or
qualitative studies.
Developing the statistical
analysis plan.
Understanding how to use
statistical software, such as
SPSS.

Workshop Outline










Research 101- Basic Research Steps
Research Question Refinement
Common Study Designs- Resource
Levels of Data
Power and Sample Size
Statistical Test Selection- Exercise
Data Reporting- Resource
Simple Stats with Excel- Resource

Pharmacy Residency Project










1) Develop a research question
2) Conduct thorough literature review
3) Re-define research question or hypothesis
4) Design research methodology/study
5) Create research proposal
6) Apply for funding
7) Apply for ethics approval
8) Collect and analyze data
9) Draw conclusions and relate findings

Research Question Refinement


Research question will describe in operational
terms, what you think will happen in the study.

Good Versus Bad Research
Question


Are patients who take
drug X more likely to
experience episodes
of delirium?



Do patients who
receive medication X
between September
2008 and November
2008 experience
more episodes of
delirium as compared
to patients who
received drug Y
during the same time
period?

Classification of Research Studies
Research studies are either observational or experimental; observational studies are further divided into descriptive and analytic studies.
• Descriptive studies (observational): focus on describing populations and the relationship between variables.
• Analytic studies (observational): make inferences about the population based on a random sample.
• Experimental studies: test relationships between exposures and outcomes. The investigator has direct control over study conditions and exposure status.

Hierarchy of Studies
The type of study is selected according to the purpose of the research.
Levels of evidence, from highest to lowest:
• Experimental studies
• Analytic studies
• Descriptive studies
Handout: Research Design Hierarchy

Probability Sampling Methods: Random
There are several methods to choose from:
• Simple random sampling: every unit in the sampling frame has an equal chance of being selected.
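Simple random sampling can be sketched with the Python standard library's `random.sample`, which draws without replacement so every unit in the frame has the same chance of selection. The sampling frame (patient IDs 1 to 200) and the seed value are hypothetical.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n units without replacement; each unit has an equal chance of selection."""
    rng = random.Random(seed)  # a seeded generator makes the draw reproducible for audit
    return rng.sample(frame, n)

# Hypothetical sampling frame: patient IDs 1 to 200
frame = list(range(1, 201))
sample = simple_random_sample(frame, 40, seed=2008)
print(sorted(sample))
```

Recording the seed in the study log lets someone else regenerate exactly the same sample.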

Probability Sampling Methods: Stratified
• Stratified sampling: divide the population into non-overlapping strata and sample from within each stratum independently.
• Guarantees representation of all important groups.
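A minimal sketch of stratified sampling, assuming hospital units as the strata and a hypothetical frame of 90 patients. It draws the same number from each stratum (equal allocation); proportional allocation, where each stratum's sample matches its share of the population, is the other common choice.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, n_per_stratum, seed=None):
    """Group units into non-overlapping strata, then draw a simple random
    sample within each stratum independently (equal allocation)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    sample = {}
    for name in sorted(strata):  # fixed order keeps a seeded run reproducible
        members = strata[name]
        k = min(n_per_stratum, len(members))  # a stratum cannot supply more units than it has
        sample[name] = rng.sample(members, k)
    return sample

# Hypothetical frame: 90 patients spread across three hospital units
patients = [(pid, ("ICU", "Medicine", "Surgery")[pid % 3]) for pid in range(1, 91)]
result = stratified_sample(patients, stratum_of=lambda p: p[1], n_per_stratum=5, seed=1)
for name, members in result.items():
    print(name, [pid for pid, _ in members])
```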

Probability Sampling Methods: Systematic
Selecting the sample using an interval k, so that every kth unit in the sampling frame is selected, is called systematic random sampling.

Probability Sampling Methods: Systematic
Steps to achieve a systematic random sample:
1. Number the units in the population from 1 to N.
2. Decide on the sample size n that you want or need. The interval size is k = N/n.
3. Randomly select an integer between 1 and k.
4. Then take every kth unit.
Example:
1. N = 200.
2. n = 40, so k = N/n = 200/40 = 5.
3. Randomly select a number between 1 and 5 (say, 4).
4. Begin with unit 4 and take every 5th unit (4, 9, 14, ..., 199).
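The steps above can be sketched in a few lines of Python. The sketch reproduces the worked example and assumes N is an exact multiple of n so that k is a whole number. The function name is hypothetical.

```python
import random

def systematic_sample(N, n, start=None, seed=None):
    """Return the unit numbers (1..N) selected by systematic random sampling.

    Assumes N is an exact multiple of n, so k = N / n is a whole number.
    """
    k = N // n  # interval size
    if start is None:
        start = random.Random(seed).randint(1, k)  # random start, 1 to k inclusive
    return list(range(start, N + 1, k))  # start, start + k, start + 2k, ...

# The worked example: N = 200, n = 40, k = 5, starting at unit 4
units = systematic_sample(200, 40, start=4)
print(units[:5], "...", units[-1], "| n =", len(units))
# prints: [4, 9, 14, 19, 24] ... 199 | n = 40
```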

Probability Sampling Methods: Cluster
• Cluster sampling: divide the population into clusters and randomly sample clusters.
• Measure all units within the sampled clusters.
• Example: see the blue areas on the map (slide figure). Clusters are not just geographic areas; you could select hospitals, schools, etc.
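A sketch of one-stage cluster sampling with hospitals as the clusters. The hospital names and patient IDs are hypothetical. Whole clusters are chosen at random, and every unit inside a chosen cluster is kept.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """One-stage cluster sampling: randomly choose whole clusters, then keep every unit in them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # sort names so a given seed is reproducible
    return {name: clusters[name] for name in chosen}

# Hypothetical frame: hospitals (clusters) and the patient IDs in each
hospitals = {
    "Hospital A": [101, 102, 103],
    "Hospital B": [201, 202],
    "Hospital C": [301, 302, 303, 304],
    "Hospital D": [401, 402, 403],
}
picked = cluster_sample(hospitals, n_clusters=2, seed=3)
for name, patients in picked.items():
    print(name, patients)  # all patients in a sampled hospital are measured
```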


Non-Probability Sampling Methods
There are different types of non-probability sampling methods as well:
• Convenience sampling (not representative of the population).
• Purposive sampling (a certain group in mind).
• Expert sampling (seek out specific expertise).
• Snowball sampling (ask people to participate; they ask more people).
If you select a non-probability sampling method, the conclusions drawn from the study results apply only to the specific group you sampled.

Measurement: Levels of Data
The level of data will dictate which statistical test you should use.
• Categorical (nominal) = data classified into categories that cannot be arranged in any particular order (e.g. apples and pears, gender, eye colour, ethnicity).
• Ordinal = data that can be ordered, but the distance between intervals is not necessarily equal (e.g. low, middle and high income).
• Continuous = data with equal distance between each interval (e.g. 1, 2, 3; age).

Statistics and Statistical Test Selection

Types of Statistics
• Descriptive statistics: describe research findings (e.g. frequencies, averages).
• Inferential statistics: make inferences about the population based on a random sample. In a random sample, each person/unit has an equal chance of being selected, which allows generalizability to the population.

Types of Variables
Variables can be classified as independent or dependent.
• An independent variable is the variable you believe will influence your outcome measure.
• A dependent variable is the variable that depends on, or is influenced by, the independent variable(s). It can also be the variable you are trying to predict.
For example, in the delirium research question, the drug received is the independent variable and the number of delirium episodes is the dependent variable.

Statistical Test Selection
Selection of the appropriate statistical test should be based on:
1) Your goal: description, comparison, prediction, quantifying an association, proving effectiveness, or proving causality?
2) The kind of data you have collected: what are the levels of data (nominal, ordinal, continuous)? Was your sample randomly selected?
3) The distribution of your data: are your data normally distributed? Should you use a parametric or non-parametric test?
4) The assumptions of the test you would like to use: do your data meet these assumptions?

Parametric Tests
• Parametric tests assume that the variable in question comes from a normal distribution.
• Non-parametric tests do not require the assumption of normality.
• Most non-parametric tests do not require an interval level of measurement; they can be used with nominal or ordinal data.

Assumptions
Each test has its own assumptions. Before you select a test, check the assumptions of every test you are considering. To find this information, consult a statistician or review statistical/research methods resources.
Some examples of common assumptions are:
• The dependent variable must be measured at a certain level (e.g. interval level).
• The independent variable(s) must be measured at a certain level (e.g. ordinal level).
• The population is normally distributed (not skewed).
If your data do not meet the assumptions for a specific test, you may be able to use a non-parametric test instead.

Test selection table (rows: goal; columns: type of data)
Goal | Measurement, normal population | Ordinal, or non-normal population | Binomial (two possible outcomes) | Survival time
Describe one group | Mean, SD | Median, interquartile range | Proportion | Kaplan Meier survival curve
Compare one group to a hypothetical value | One-sample t test | Wilcoxon test | Chi-square or Binomial test ** | -
Compare two unpaired groups | Unpaired t test | Mann-Whitney test | Fisher's test (chi-square for large samples) | Log-rank test or Mantel-Haenszel*
Compare two paired groups | Paired t test | Wilcoxon test | McNemar's test | Conditional proportional hazards regression*
Compare three or more unmatched groups | One-way ANOVA | Kruskal-Wallis test | Chi-square test | Cox proportional hazard regression**
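To make one row of the table concrete: for two unpaired groups whose data are ordinal or non-normal, the table points to the Mann-Whitney test. The sketch below computes its U statistic from the pair-counting definition, with ties counted as one half. The scores are hypothetical, and the function name is not from any particular library. The p-value would still come from a table of critical values or from software such as SPSS.

```python
def mann_whitney_u(x, y):
    """Mann-Whitney U for two independent (unpaired) samples.

    U_x counts the pairs (xi, yj) with xi > yj, counting ties as 1/2.
    U_x + U_y always equals len(x) * len(y).
    """
    u_x = sum(1.0 if xi > yj else 0.5 if xi == yj else 0.0 for xi in x for yj in y)
    u_y = len(x) * len(y) - u_x
    return u_x, u_y

# Hypothetical ordinal scores (e.g. a 1-5 rating) from two unpaired groups
group_a = [3, 4, 2, 5, 4]
group_b = [1, 2, 2, 3, 1]
u_a, u_b = mann_whitney_u(group_a, group_b)
print(u_a, u_b, "-> U =", min(u_a, u_b))  # the smaller U is compared with the critical value
```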

Statistical Test Selection Group Exercise
• Using your tables, select the appropriate statistical tests for 10 research scenarios.

During the group exercise…

Steps to choose the appropriate statistical method for the data analysis:
1. Identify whether the research problem raises a question of description, relation (association), or comparison (difference).
2. Identify the levels of measurement in the research question (nominal/categorical, ordinal/rank, continuous/evenly spaced).
3. Identify the number of variables or samples being described, related, or compared.
4. Identify whether the comparison samples are related (the same group analyzed before and after, or individually matched pairs) or independent (unrelated, different groups).
5. Choose the appropriate statistical tool for the data and situation using the decision tree in the handout.

1. A pilot experiment designed to test the effectiveness of a new approach to electrode placement for electroconvulsive therapy (ECT) has been conducted over a one-year period in the Fraser Health Authority.
Patients from two different mood disorder clinics participated in this study. Patients from Clinic X received ECT according to current practice guidelines. Patients from Clinic Y received a new exploratory ECT treatment. Patients in each clinic were matched for age, gender, and type of disorder, and a random sample of 30 matched pairs of patients was selected for inclusion in the study. At the end of one year, patients were administered a memory test yielding a total score out of 100.
Dr. Vasdil would like to know which statistical procedure should be used to test for differences between the two groups of patients on the memory test.
What is the question: Compare
How many samples: 2
Related or independent: Related (patients were individually matched into pairs)
What is the level of measurement: Continuous
How many dependent variables: 1
Test: Paired t-test
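Whichever way the samples are classified in step 4, the t statistic itself is simple to compute. The sketch below shows both versions on hypothetical memory scores for five pairs, because the study's 30 pairs are not given. The first is Student's unpaired t with a pooled variance. The second is the paired t, which is a one-sample t test on the within-pair differences. The function names are hypothetical. The p-value would then come from a t table with the printed degrees of freedom, or from software such as SPSS.

```python
import math
import statistics

def unpaired_t(x, y):
    """Student's t for two independent groups (pooled variance); returns (t, df)."""
    n1, n2 = len(x), len(y)
    pooled_var = ((n1 - 1) * statistics.variance(x) + (n2 - 1) * statistics.variance(y)) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (statistics.mean(y) - statistics.mean(x)) / se, n1 + n2 - 2

def paired_t(x, y):
    """Paired t: a one-sample t test on the within-pair differences y - x; returns (t, df)."""
    if len(x) != len(y):
        raise ValueError("each patient in x needs a matched partner in y")
    diffs = [b - a for a, b in zip(x, y)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))  # sample SD of the differences
    return statistics.mean(diffs) / se, len(diffs) - 1

# Hypothetical memory scores (out of 100); pair i is (clinic_x[i], clinic_y[i])
clinic_x = [70, 65, 80, 75, 60]
clinic_y = [72, 70, 79, 80, 66]
print("unpaired: t = %.2f, df = %d" % unpaired_t(clinic_x, clinic_y))
print("paired:   t = %.2f, df = %d" % paired_t(clinic_x, clinic_y))
```

On these made-up numbers the paired analysis gives a much larger t (about 2.64 versus 0.77). Pairing removes the between-patient variation from the comparison, which is why it matters to get step 4 right.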

Sample Size
There are several rules of thumb for determining sample size:
1) Aim for a minimum of 30 cases (as a total group or, if comparing groups, 30 per group). If you have fewer, you can use a non-parametric test, but it is still better to have close to 30 cases.
2) If using regression, it is best to have 10-50 cases per independent variable.
3) If you are validating a survey, avoid having more questions than cases.
4) If the total population you are examining is smaller than 30, use all of them.
5) For pilot studies, the recommended sample size is 12 per group.
6) For surveys, a sample of 400 per group is large enough for most analyses.
7) For surveys, a 30% response rate is the bare minimum.
Note: For a precise sample size estimate, you will need to conduct a power analysis.
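One possible reading of the 400-per-group survey rule is offered here as an assumption, since the slide does not say where the number comes from. Consider a yes/no question answered by a simple random sample. The 95% margin of error for a proportion is largest at p = 0.5, and at n = 400 it is about ±5 percentage points. The sketch uses the normal approximation z × √(p(1 - p)/n).

```python
from statistics import NormalDist

def margin_of_error(n, p=0.5, confidence=0.95):
    """Normal-approximation margin of error for a sample proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95% confidence
    return z * (p * (1 - p) / n) ** 0.5

for n in (100, 400, 1000):
    print(f"n = {n}: about ±{100 * margin_of_error(n):.1f} percentage points")
```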






Statistical Power
Power is the probability that a statistical test will detect an effect when one truly exists.
• It takes a value between 0 and 1 (often expressed as a percentage).
• Power = 1 - β, where β is the probability of a Type II error.
• Type II error: failing to reject the null hypothesis when it is false (missing a real effect).
• Type I error: rejecting the null hypothesis when it is true (reporting an effect that does not exist).

Types of Power Analysis
• A priori: conducted before the study commences (at the proposal stage).
• Post hoc: conducted after the study has been completed.
Easy ways to increase power:
• Increase the sample size.
• Increase the effect size.
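The effect of both levers shows up in a normal-approximation sketch of power for comparing two group means with a two-sided test. In this sketch `d` is a standardized effect size (the difference in means divided by the common SD). That parameterization is an assumption of the sketch, and the small chance of rejecting in the wrong direction is ignored.

```python
from statistics import NormalDist

def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided comparison of two independent group means."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = .05
    return z.cdf(abs(d) * (n_per_group / 2) ** 0.5 - z_crit)

for d in (0.2, 0.5, 0.8):  # conventionally small, medium and large standardized effects
    powers = ", ".join(f"{approx_power(d, n):.2f}" for n in (20, 50, 100))
    print(f"d = {d}: power with 20, 50, 100 per group = {powers}")
```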

Components Involved in Power Calculation
• Sample size: the number of cases.
• Effect size: the magnitude of the effect, relative to its variation.
• Alpha level: the probability of concluding that an effect is present when the observed result is due to chance alone (usually .05 or .01). This is also known as the Type I error rate: the probability of rejecting a true null hypothesis.
• Power level: 80-90% is common.
• One- or two-tailed test: two-tailed is common.

Components Involved in Power Calculation
• Sample size: what we want to find out.
• Effect size: the magnitude of the effect. What if you don't know it? Look to pilot data or the literature. Keep in mind that the smaller the effect size, the larger the sample size required.
• Alpha level: .05
• Power level: 80-90%
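Putting these components together, a common normal-approximation formula for the sample size per group when comparing two means is n = 2(z[1-α/2] + z[1-β])² / d², where d is the standardized effect size. The sketch assumes two independent groups, a two-tailed test and a continuous outcome. Calculations based on the t distribution give slightly larger answers (64 rather than 63 per group for d = 0.5, α = .05, power = 80%), so dedicated power software is still worth using for the final number.

```python
import math
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate sample size per group to detect standardized effect d (two-tailed)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the alpha level
    z_beta = z.inv_cdf(power)           # quantile for the desired power (1 - beta)
    return math.ceil(2 * ((z_alpha + z_beta) / d) ** 2)  # round up to whole cases

for d in (0.8, 0.5, 0.2):
    print(f"d = {d}: {n_per_group(d)} per group at 80% power, "
          f"{n_per_group(d, power=0.90)} at 90% power")
```

Halving the effect size roughly quadruples the required sample, which is why a realistic effect size from pilot data or the literature matters so much.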

Important Consultation Information
• What is your research question?
• Components of the power calculation
• Levels of data (nominal, ordinal, continuous)
• Sampling plan

Data Organization: Codebook
• What is a codebook?
• A codebook is a log of your variables (and their levels of data) and how you will code them.
• A codebook helps everyone understand the coding scheme, so that everyone is on the same page.

Data Processing and Analyses: Codebook Example
Variable Name | Variable Label | Values | Coding | Missing | Variable Type
age | age | 1,2,3,4,5 | 1=10-20 years, 2=21-30 years, 3=31-40 years, 4=41-50 years, 5=51+ years | 97=Incorrect response, 98=No response, 99=Not applicable | Ordinal
sex | sex | 1,2 | 1=male, 2=female | 97=Incorrect response, 98=No response, 99=Not applicable | Nominal
happiness | happiness at work | 1,2,3 | 1=not happy, 2=somewhat happy, 3=very happy | 97=Incorrect response, 98=No response, 99=Not applicable | Ordinal
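A codebook can also live alongside the analysis code so the coding is applied consistently. The minimal Python sketch below uses a dictionary layout of our own choosing, not a standard format. It decodes stored values and treats 97, 98 and 99 as missing rather than as numbers.

```python
MISSING_CODES = {97: "Incorrect response", 98: "No response", 99: "Not applicable"}

# The codebook example, written as a dictionary (illustrative layout)
CODEBOOK = {
    "age": {"label": "age", "type": "ordinal",
            "codes": {1: "10-20 years", 2: "21-30 years", 3: "31-40 years",
                      4: "41-50 years", 5: "51+ years"}},
    "sex": {"label": "sex", "type": "nominal",
            "codes": {1: "male", 2: "female"}},
    "happiness": {"label": "happiness at work", "type": "ordinal",
                  "codes": {1: "not happy", 2: "somewhat happy", 3: "very happy"}},
}

def decode(variable, code):
    """Return the label for a stored code, or None for a missing-data code.

    Raises KeyError for a code the codebook does not define, so entry errors surface early.
    """
    if code in MISSING_CODES:
        return None  # keeps 97-99 out of averages and counts of valid responses
    return CODEBOOK[variable]["codes"][code]

print(decode("happiness", 3))  # very happy
print(decode("sex", 98))       # None
```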

Spreadsheet Example
ID# | Age | Sex | Happiness
1 | 1 | 1 | 2
2 | 2 | 2 | 2
3 | 3 | 1 | 2
4 | 57 | 2 | 2
5 | 45 | 2 | 3
6 | 66 | 2 | 3
7 | 2 | 2 | 3
8 | 88 | 2 | 3
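Checking the spreadsheet against the codebook before analysis catches entry errors. In this example, IDs 4, 5, 6 and 8 hold values such as 57 and 88 in the Age column. These look like raw ages rather than the 1-5 age-group codes. The small validation sketch below takes its allowed-value sets from the codebook example; the function name is hypothetical.

```python
COLUMNS = ("age", "sex", "happiness")
ALLOWED = {  # valid codes plus the missing-data codes 97, 98, 99
    "age": {1, 2, 3, 4, 5, 97, 98, 99},
    "sex": {1, 2, 97, 98, 99},
    "happiness": {1, 2, 3, 97, 98, 99},
}

# (ID, age, sex, happiness) as shown in the spreadsheet example
rows = [
    (1, 1, 1, 2), (2, 2, 2, 2), (3, 3, 1, 2), (4, 57, 2, 2),
    (5, 45, 2, 3), (6, 66, 2, 3), (7, 2, 2, 3), (8, 88, 2, 3),
]

def invalid_cells(rows):
    """List (ID, variable, value) for every value the codebook does not allow."""
    problems = []
    for record_id, *values in rows:
        for variable, value in zip(COLUMNS, values):
            if value not in ALLOWED[variable]:
                problems.append((record_id, variable, value))
    return problems

for record_id, variable, value in invalid_cells(rows):
    print(f"ID {record_id}: {variable} = {value} is not a valid code")
```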

Data Analysis with Excel
• Most simple analyses can be done in Excel, including correlation, regression and even random number generation.
• Install the Analysis ToolPak: go to Tools > Add-Ins and select "Analysis ToolPak".
• Create the worksheet and codebook.
• Choose the statistical test.
• Follow the commands in the Help menu.

Data Analysis with Excel


http://home.ubalt.edu/ntsbarsh/excel/excel.htm

Data Reporting and Presentation of Data
• Graphical summaries are a great way to present your data.
• Excel is great for creating tables and graphs.
• The type of data you have determines the type of graphical summary you should use.

Data Reporting and Presentation of Descriptive Data
Categorical data: frequency tables and bar charts.
Example: Fruit
Fruit | Count | Percent | Valid Percent
Pineapples | 4 | 20% | 21%
Apples | 5 | 25% | 26%
Oranges | 10 | 50% | 53%
Unknown | 1 | 5% | n/a (missing)
Total | 20 | 100% | 100%
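The Percent and Valid Percent columns differ only in the denominator. Percent divides by all 20 responses. Valid Percent leaves out the missing (Unknown) response and divides by 19. The short sketch below reproduces the fruit table; the function name is hypothetical.

```python
def frequency_table(counts, missing=()):
    """Return (category, count, percent, valid_percent) rows.

    valid_percent is None for missing categories, whose counts are left out of its denominator.
    """
    total = sum(counts.values())
    valid_total = sum(c for cat, c in counts.items() if cat not in missing)
    rows = []
    for category, count in counts.items():
        valid = None if category in missing else 100 * count / valid_total
        rows.append((category, count, 100 * count / total, valid))
    return rows

fruit = {"Pineapples": 4, "Apples": 5, "Oranges": 10, "Unknown": 1}
for category, count, pct, valid in frequency_table(fruit, missing={"Unknown"}):
    valid_text = "" if valid is None else f"{valid:.0f}%"
    print(f"{category:<11}{count:>4}{pct:>6.0f}%{valid_text:>6}")
```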

Data Reporting and Presentation of Descriptive Data
[Figure: "Fruit Study" bar chart of counts for Pineapples, Apples, Oranges and Unknown]

Data Reporting and Presentation of Descriptive Data
Continuous data: tables and histograms.
Age | Count | Percent
20-30 | 4 | 20%
31-40 | 5 | 25%
41-50 | 10 | 50%
51-60 | 1 | 5%
Total | 20 | 100%

Data Reporting and Presentation of Descriptive Data
[Figure: histogram of counts for the age groups 20-30, 31-40, 41-50 and 51-60]

What is the difference between a histogram and a bar chart?
• Histogram: for continuous data, where the data are divided into contiguous class intervals (intervals connected in an unbroken sequence, so the bars touch).
• Bar chart: for categorical data, where the categories are not contiguous.

Measures of Central Tendency
Reporting averages:
• Categorical data = mode
• Ordinal data = median
• Continuous data = mean
If there are outliers (extreme values), report the median instead of the mean.
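Python's standard `statistics` module has a function for each of these averages. In the sketch below the length-of-stay values and the fruit list are hypothetical, and the happiness codes are the eight values from the spreadsheet example. For ordinal codes, `median_low` is used because the ordinary median of an even number of values can fall between two categories, and 2.5 is not a valid happiness code.

```python
import statistics

# Hypothetical lengths of stay (days) with one extreme value
length_of_stay = [3, 4, 4, 5, 6, 7, 45]
mean_stay = statistics.mean(length_of_stay)      # 74 / 7, about 10.6: pulled up by the outlier
median_stay = statistics.median(length_of_stay)  # 5: the middle value, unaffected by the outlier

# Ordinal: happiness codes from the spreadsheet example (1 = not happy ... 3 = very happy)
happiness = [2, 2, 2, 2, 3, 3, 3, 3]
median_happiness = statistics.median_low(happiness)  # 2, an actual category; median() gives 2.5

# Categorical: the mode is the only meaningful average
fruit = ["oranges", "apples", "oranges", "pineapples"]
most_common_fruit = statistics.mode(fruit)  # "oranges"

print(mean_stay, median_stay, median_happiness, most_common_fruit)
```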

Reporting Inferential Stats
• Handout resource: APA Guidelines
http://www.ilstu.edu/~jhkahn/apastats.html

Reporting Inferential Stats
• It's important to include means, standard deviations and sample sizes in your results section.
• Example (correlation): Variable X was strongly correlated with Variable Y, r = .59, p < .01.
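A small formatting helper can keep reported results consistent with this style. The sketch follows two common APA conventions. Values that cannot exceed 1, such as r and p, are written without a leading zero, and means and standard deviations are reported together with the sample size. The function names are hypothetical and the scores are made up.

```python
import statistics

def no_leading_zero(value, digits=2):
    """Format r or p APA style: 0.59 -> '.59', -0.5 -> '-.50'."""
    text = f"{value:.{digits}f}"
    return text.replace("0.", ".", 1) if abs(value) < 1 else text

def describe(values):
    """Report M, SD (sample SD, n - 1 denominator) and n for one group."""
    return f"M = {statistics.mean(values):.2f}, SD = {statistics.stdev(values):.2f}, n = {len(values)}"

print(f"r = {no_leading_zero(0.59)}, p < {no_leading_zero(0.01)}")  # r = .59, p < .01
print(describe([2, 5, -1, 5, 6]))                                   # M = 3.40, SD = 2.88, n = 5
```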

Important to Keep Your Audience in Mind
• Residency project
• Publication
• Departmental report

Aaron: TCPS certification for residents reminder…

Questions?


Slide 34

Surviving Pharmacy
Residency Research: Tips
and Tricks for Statistical
Planning

Surviving Pharmacy Residency
Research: Tips and Tricks for
Statistical Planning
© Fraser Health Authority, 2007
The Fraser Health Authority (“FH”) authorizes the use, reproduction and/or
modification of this publication for purposes other than commercial redistribution. In
consideration for this authorization, the user agrees that any unmodified
reproduction of this publication shall retain all copyright and proprietary notices. If
the user modifies the content of this publication, all FH copyright notices shall be
removed, however FH shall be acknowledged as the author of the source
publication.
Reproduction or storage of this publication in any form by any means for the purpose
of commercial redistribution is strictly prohibited.
This publication is intended to provide general information only, and should not be
relied on as providing specific healthcare, legal or other professional advice. The
Fraser Health Authority, and every person involved in the creation of this publication,
disclaims any warranty, express or implied, as to its accuracy, completeness or
currency, and disclaims all liability in respect of any actions, including the results of
any actions, taken or not taken in reliance on the information contained herein.

FH Health Research Intelligence
Unit
How can we help?


Grant Facilitator-Writer
 Conducting a search for funding opportunities.
 Automatic notification of new funding sources and
deadlines.
 Identifying a research team.
 Preparing letters of intent.
 Identifying resources required for conducting research.
 Formulating the research budget.
 Writing the grant application in collaboration with
researchers.
 Understanding FH and funding agency requirements
regarding preparation of specific documents.

FH Health Research Intelligence
Unit
How can we help?











Epidemiologist
Specifying the research goal,
objectives and hypothesis.
Identifying measurable
outcomes.
Specifying the variables for
analysis.
Identifying sources of data.
Developing data collection
tools for quantitative or
qualitative studies.
Developing the statistical
analysis plan.
Understanding how to use
statistical software, such as
SPSS.

Workshop Outline










Research 101- Basic Research Steps
Research Question Refinement
Common Study Designs- Resource
Levels of Data
Power and Sample Size
Statistical Test Selection- Exercise
Data Reporting- Resource
Simple Stats with Excel- Resource

Pharmacy Residency Project
1) Develop a research question.
2) Conduct a thorough literature review.
3) Refine the research question or hypothesis.
4) Design the research methodology/study.
5) Create the research proposal.
6) Apply for funding.
7) Apply for ethics approval.
8) Collect and analyze data.
9) Draw conclusions and relate findings.

Research Question Refinement
A research question describes, in operational terms, what you expect to happen in the study.

Good Versus Bad Research Question
Bad: Are patients who take drug X more likely to experience episodes of delirium?
Good: Do patients who received medication X between September 2008 and November 2008 experience more episodes of delirium compared with patients who received drug Y during the same time period?
The better question names a comparison group (drug Y) and a defined time period, so it can be answered with data.

Classification of Research Studies
Research studies are either observational (descriptive or analytic) or experimental.
- Observational studies:
  - Descriptive studies: focus on describing populations and describing the relationship between variables.
  - Analytic studies: make inferences about the population based on a random sample.
- Experimental studies: test relationships between exposures and outcomes. The investigator has direct control over study conditions and exposure status.

Hierarchy of Studies
The type of study is selected according to the purpose of the research.
Levels of evidence, from highest to lowest:
1) Experimental studies
2) Analytic studies
3) Descriptive studies

Handout: Research Design Hierarchy

Probability Sampling Methods: Random
There are several methods to choose from:
- Simple random sampling.
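A minimal sketch of simple random sampling in Python, assuming the sampling frame is a numbered list of units (the 200 chart numbers are hypothetical). `random.Random.sample` draws without replacement, so every unit has the same chance of being selected; fixing the seed lets you reproduce the draw later.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units; every unit in the frame has an equal chance of selection."""
    rng = random.Random(seed)  # a seeded generator lets you reproduce the same draw later
    return rng.sample(frame, n)  # sampling without replacement


# Hypothetical sampling frame: 200 numbered patient charts
charts = list(range(1, 201))
sample = simple_random_sample(charts, 40, seed=2008)
print(sorted(sample))
```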

Probability Sampling Methods: Stratified
- Stratified sampling: divide the population into non-overlapping strata and sample from within each stratum independently.
- Guarantees representation of all important groups.
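A sketch of stratified sampling, assuming each unit belongs to exactly one stratum. The ward names and the 10% proportional allocation are hypothetical choices for illustration. Each stratum gets its own independent draw, which is what guarantees every group appears in the sample.

```python
import random


def stratified_sample(strata, n_per_stratum, seed=None):
    """Sample independently within each non-overlapping stratum."""
    rng = random.Random(seed)
    return {
        name: rng.sample(units, n_per_stratum[name])  # separate draw per stratum
        for name, units in strata.items()
    }


# Hypothetical strata: patients grouped by ward (each patient is in exactly one ward)
strata = {
    "medicine": [f"M{i}" for i in range(1, 61)],
    "surgery": [f"S{i}" for i in range(1, 31)],
    "psychiatry": [f"P{i}" for i in range(1, 11)],
}
# Proportional allocation: 10% of each stratum, but at least one unit per stratum
sizes = {name: max(1, len(units) // 10) for name, units in strata.items()}
result = stratified_sample(strata, sizes, seed=1)
for name, units in result.items():
    print(name, len(units), units)
```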

Probability Sampling Methods: Systematic
Selecting the sample using an interval k, so that every kth unit in the frame is selected, is called systematic random sampling.

Probability Sampling Methods: Systematic
Steps to achieve a systematic random sample:
1. Number the units in the population from 1 to N.
2. Decide on the sample size n that you want or need. The interval size is k = N/n.
3. Randomly select an integer between 1 and k.
4. Then take every kth unit.

Example:
1. N = 200.
2. n = 40, so k = N/n = 200/40 = 5 (interval size).
3. Randomly select a number between 1 and 5 (say 4).
4. Begin with unit 4 and take every 5th unit (4, 9, 14, ..., 199).
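The four steps translate directly into a few lines of Python. This sketch assumes N is a multiple of n; if it is not, rounding k down can yield slightly more than n units, which the sketch does not handle.

```python
import random


def systematic_sample(N, n, start=None, seed=None):
    """Return unit numbers 1..N chosen by systematic random sampling."""
    k = N // n  # interval size k = N/n (this sketch assumes N is a multiple of n)
    if start is None:
        start = random.Random(seed).randint(1, k)  # random start between 1 and k, inclusive
    return list(range(start, N + 1, k))  # start, start + k, start + 2k, ...


units = systematic_sample(200, 40, start=4)  # the worked example: N=200, n=40, start at 4
print(units)
```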

Probability Sampling Methods: Cluster
- Cluster sampling: divide the population into clusters and randomly sample clusters.
- Measure all units within the sampled clusters.
- Example: see the blue areas on the map. Clusters need not be geographic areas; you could select hospitals, schools, etc.
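A sketch of cluster sampling with hospitals as the clusters. The hospital names and patient lists are hypothetical. Note the difference from stratified sampling: here whole clusters are chosen at random and every unit inside a chosen cluster is kept.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters, then keep every unit in each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # random sample of cluster names
    return {name: clusters[name] for name in chosen}  # all units in sampled clusters


# Hypothetical clusters: hospitals, each listing its eligible patients
hospitals = {
    "Hospital A": ["A1", "A2", "A3"],
    "Hospital B": ["B1", "B2"],
    "Hospital C": ["C1", "C2", "C3", "C4"],
    "Hospital D": ["D1", "D2", "D3"],
}
selected = cluster_sample(hospitals, 2, seed=3)
for name, patients in selected.items():
    print(name, patients)
```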


Non-Probability Sampling Methods
There are several types of non-probability sampling methods as well:
- Convenience sampling (not representative of the population).
- Purposive sampling (a certain group in mind).
- Expert sampling (seek out specific expertise).
- Snowball sampling (ask people to participate; they in turn ask more people).

If you select a non-probability sampling method, the conclusions drawn from the study results apply only to that specific population and cannot be generalized beyond it.

Measurement: Levels of Data
The level of data dictates which statistical test you should use.
- Categorical: data classified into categories that cannot be arranged in any particular order (e.g. apples and pears, gender, eye colour, ethnicity).
- Ordinal: data that are ordered, but the distance between categories is not necessarily equal (e.g. low, middle and high income).
- Continuous: data with equal distance between each interval (e.g. 1, 2, 3; age).

Statistics and Statistical Test Selection

Types of Statistics
- Descriptive statistics: describe research findings (e.g. frequencies, averages).
- Inferential statistics: make inferences about the population based on a random sample.
  - In a random sample, each person/unit has an equal chance of being selected.
  - This allows generalizability to the population.

Types of Variables
Variables can be classified as independent or dependent.
- An independent variable is a variable you believe will influence your outcome measure.
- A dependent variable is the variable that depends on, or is influenced by, the independent variable(s). The dependent variable can also be the variable you are trying to predict.
For example, in the delirium question above, the drug received (X or Y) is the independent variable and the number of delirium episodes is the dependent variable.

Statistical Test Selection
Selecting the appropriate statistical test should be based on:
1) Your goal: description, comparison, prediction, quantifying an association, demonstrating effectiveness, or establishing causality?
2) The kind of data you have collected: what are the levels of data (nominal, ordinal, continuous)? Was your sample randomly selected?
3) The distribution of your data: are the data normally distributed? Should you use a parametric or non-parametric test?
4) The assumptions of the statistical test you would like to use: do the data meet these assumptions?

Parametric Tests
- Parametric tests assume that the variable in question comes from a normal distribution.
- Non-parametric tests do not require the assumption of normality.
- Most non-parametric tests do not require an interval level of measurement; they can be used with nominal or ordinal data.

Assumptions
- Each test has its own assumptions. Before you select a test, be sure to check them.
- To find this information, contact a consultant or review statistical/research methods resources.
- Some examples of common assumptions are:
  - The dependent variable must be measured at a certain level (e.g. interval level).
  - The independent variable(s) must be measured at a certain level (e.g. ordinal level).
  - The population is normally distributed (not skewed).

If your data do not meet the assumptions for a specific test, you may be able to use a non-parametric test instead.

Test selection by goal and type of data:

Goal | Measurement, normal population | Ordinal, or non-normal population | Binomial (two possible outcomes) | Survival time
Describe one group | Mean, SD | Median, interquartile range | Proportion | Kaplan-Meier survival curve
Compare one group to a hypothetical value | One-sample t test | Wilcoxon test | Chi-square or Binomial test** | (none)
Compare two unpaired groups | Unpaired t test | Mann-Whitney test | Fisher's test (chi-square for large samples) | Log-rank test or Mantel-Haenszel*
Compare two paired groups | Paired t test | Wilcoxon test | McNemar's test | Conditional proportional hazards regression*
Compare three or more unmatched groups | One-way ANOVA | Kruskal-Wallis test | Chi-square test | Cox proportional hazard regression**
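One way to keep the test-selection table at hand during analysis is to encode it as a lookup. This is a hypothetical helper, not part of any package; the entries are transcribed from the table above, with the footnote asterisks left out because their footnotes are not reproduced here.

```python
# (goal, type of data) -> suggested summary or test, transcribed from the table above
TEST_TABLE = {
    ("describe one group", "normal"): "Mean, SD",
    ("describe one group", "non-normal"): "Median, interquartile range",
    ("describe one group", "binomial"): "Proportion",
    ("describe one group", "survival"): "Kaplan-Meier survival curve",
    ("compare one group to a hypothetical value", "normal"): "One-sample t test",
    ("compare one group to a hypothetical value", "non-normal"): "Wilcoxon test",
    ("compare one group to a hypothetical value", "binomial"): "Chi-square or Binomial test",
    ("compare two unpaired groups", "normal"): "Unpaired t test",
    ("compare two unpaired groups", "non-normal"): "Mann-Whitney test",
    ("compare two unpaired groups", "binomial"): "Fisher's test (chi-square for large samples)",
    ("compare two unpaired groups", "survival"): "Log-rank test or Mantel-Haenszel",
    ("compare two paired groups", "normal"): "Paired t test",
    ("compare two paired groups", "non-normal"): "Wilcoxon test",
    ("compare two paired groups", "binomial"): "McNemar's test",
    ("compare two paired groups", "survival"): "Conditional proportional hazards regression",
    ("compare three or more unmatched groups", "normal"): "One-way ANOVA",
    ("compare three or more unmatched groups", "non-normal"): "Kruskal-Wallis test",
    ("compare three or more unmatched groups", "binomial"): "Chi-square test",
    ("compare three or more unmatched groups", "survival"): "Cox proportional hazard regression",
}


def suggest_test(goal, data_type):
    """Return the table entry, or a reminder to consult a statistician if there is none."""
    key = (goal.strip().lower(), data_type.strip().lower())
    return TEST_TABLE.get(key, "No table entry: consult a statistician")


print(suggest_test("Compare two unpaired groups", "non-normal"))  # Mann-Whitney test
```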

Statistical Test Selection Group Exercise
- Using your tables, select the appropriate statistical tests for 10 research scenarios.

During the group exercise…
Steps to choose the appropriate statistical method for the data analysis:
1. Identify whether the research problem raises a question of description, relation (association), or comparison (difference).
2. Identify the levels of measurement in the research question (nominal/categorical, ordinal/rank, continuous/evenly spaced).
3. Identify the number of variables or samples being described, related, or compared.
4. Identify whether the comparison samples are related (the same group analyzed before and after) or independent (not at all related; different groups).
5. Choose the appropriate statistical tool for the data and situation using the decision tree in the handout.

1. A pilot experiment designed to test the effectiveness of a new approach to electrode placement for electroconvulsive therapy (ECT) has been conducted over a one-year period in the Fraser Health Authority.
Patients from two different mood disorder clinics participated in this study. Patients from Clinic X received ECT according to current practice guidelines. Patients from Clinic Y received a new exploratory ECT treatment. Patients in each clinic were matched for age, gender, and type of disorder, and a random sample of 30 matched pairs of patients was selected for inclusion in the study. At the end of one year, patients were administered a memory test yielding a total score out of 100.
Dr. Vasdil would like to know which statistical procedure to select to test for differences between the two groups of patients on the memory test.
What is the question: Compare
How many samples: 2
Related or independent: Related (matched pairs)
What is the level of measurement: Continuous
How many dependent variables: 1
Test: Paired t test
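When patients are recruited as matched pairs, the t test is usually computed on the within-pair differences (a paired t test). A minimal sketch using only the Python standard library; the five pairs of memory scores are hypothetical, not study data. The standard library has no t distribution, so look up the p-value in a t table or a statistical package such as SPSS, using df = n - 1.

```python
import math
import statistics


def paired_t(x, y):
    """Paired t statistic and degrees of freedom for matched pairs (x[i] paired with y[i])."""
    diffs = [a - b for a, b in zip(x, y)]  # within-pair differences
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample SD of the differences (n - 1 denominator)
    t = mean_d / (sd_d / math.sqrt(n))  # mean difference divided by its standard error
    return t, n - 1


# Hypothetical memory scores (out of 100) for five matched pairs
clinic_x = [72, 65, 80, 58, 77]
clinic_y = [70, 60, 79, 55, 71]
t, df = paired_t(clinic_x, clinic_y)
print(f"t({df}) = {t:.2f}")
```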

Sample Size
There are several rules of thumb for determining sample size:
1) Aim for a minimum of 30 cases (as a total group or, if comparing groups, 30 for each group). If you have fewer, you can use a non-parametric test, but it is still better to have close to 30 cases.
2) If using regression, aim for 10-50 cases per independent variable.
3) If you are validating a survey, never have more questions than cases.
4) If the total population you are examining is fewer than 30, use all of them.
5) For pilot studies, the recommendation is a sample size of 12 per group.
6) For surveys, a sample size of 400 per group is adequate for most analyses.
7) For surveys, a 30% response rate is the bare minimum.

Note: For a precise sample size estimate, you will need to conduct a power analysis.

Statistical Power
- Power is the probability that a statistical test will correctly detect an effect if one truly exists.
- It takes a value between 0 and 1 (often expressed as a percentage).
- Power = 1 - β, where β is the probability of a Type II error.
  - Type II error: failing to reject a false null hypothesis (missing an effect that really exists).
  - Type I error: rejecting a true null hypothesis (concluding there is an effect when there is none).

Types of Power Analysis
- A priori: conducted before the study commences (at the proposal stage).
- Post hoc: conducted after the study has been completed.
- Easy ways to increase power:
  - Increase the sample size.
  - Increase the effect size.
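To see how sample size and effect size drive power, here is a normal-approximation sketch for a two-sided comparison of two group means, where d is the standardized effect size (difference in means divided by the SD). It ignores the negligible chance of rejecting in the wrong direction and runs slightly optimistic compared with exact t-test calculations; the values of d and n are illustrative. `statistics.NormalDist` requires Python 3.8 or later.

```python
from statistics import NormalDist


def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample comparison of means (normal approximation)."""
    z = NormalDist()  # standard normal distribution
    z_crit = z.inv_cdf(1 - alpha / 2)  # critical value, e.g. about 1.96 for alpha = .05
    return z.cdf(d * (n_per_group / 2) ** 0.5 - z_crit)


for d in (0.5, 0.8):
    for n in (20, 40, 80):
        print(f"d = {d}, n = {n} per group: power = {approx_power(d, n):.2f}")
```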

Components Involved in Power Calculation
- Sample size: the number of cases.
- Effect size: the magnitude of the trend relative to its variation.
- Alpha level: the probability of a Type I error, i.e. of concluding that an effect is present when the observed result is actually due to chance alone (commonly .05 or .01).
- Power level: 80-90% is common.
- One- or two-tailed test: two-tailed is common.

Components Involved in Power Calculation
- Sample size: what we want to find out.
- Effect size: the magnitude of the trend. But what if you don't know it?
  - Look to pilot data or the literature.
  - Keep in mind: the smaller the effect size, the larger the sample size required.
- Alpha level: .05
- Power level: 80-90%
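With those components fixed, a rough a priori sample size for comparing two means can be sketched with the normal-approximation formula n per group = 2 × (z(1 - α/2) + z(1 - β))² / d², where d is the standardized effect size. For d = 0.5, α = .05 (two-tailed) and 80% power it gives 63 per group; software based on the t distribution typically reports one or two more, so treat this as a planning estimate and confirm with a statistician or dedicated power software.

```python
import math
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-tailed, two-sample comparison of means."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-tailed alpha
    z_beta = z.inv_cdf(power)  # power = 1 - beta
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)  # round up to whole cases


print(n_per_group(0.5))  # 63
```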

Important Consultation Information
- What is your research question?
- Components of the power calculation
- Levels of data (nominal, ordinal, continuous)
- Sampling plan

Data Organization: Codebook
- What is a codebook?
  - A codebook is a log of your variables (and their levels of data) and how you will code them.
  - A codebook helps everyone understand the coding schemes, so that everyone is on the same page!

Data Processing and Analyses: Codebook Example

Variable Name | Variable Label | Values | Coding | Missing | Variable Type
age | age | 1,2,3,4,5 | 1=10-20 years; 2=21-30 years; 3=31-40 years; 4=41-50 years; 5=51+ years | 97=Incorrect response; 98=No response; 99=Not applicable | Ordinal
sex | sex | 1,2 | 1=male; 2=female | 97=Incorrect response; 98=No response; 99=Not applicable | Nominal
happiness | happiness at work | 1,2,3 | 1=not happy; 2=somewhat happy; 3=very happy | 97=Incorrect response; 98=No response; 99=Not applicable | Ordinal

Spreadsheet Example

ID# | Age | Sex | Happiness
1 | 1 | 1 | 2
2 | 2 | 2 | 2
3 | 3 | 1 | 2
4 | 57 | 2 | 2
5 | 45 | 2 | 3
6 | 66 | 2 | 3
7 | 2 | 2 | 3
8 | 88 | 2 | 3
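A codebook also makes data entry checkable. This sketch encodes the codebook example as Python dictionaries and checks each spreadsheet row against it; the helper names `decode` and `find_invalid` are hypothetical. Applied to the spreadsheet example, it flags the raw ages (57, 45, 66, 88) that were entered instead of the 1-5 age codes.

```python
# Codebook example: allowed codes per variable (insertion order matches the spreadsheet columns)
CODEBOOK = {
    "age": {1: "10-20 years", 2: "21-30 years", 3: "31-40 years",
            4: "41-50 years", 5: "51+ years"},
    "sex": {1: "male", 2: "female"},
    "happiness": {1: "not happy", 2: "somewhat happy", 3: "very happy"},
}
MISSING = {97: "Incorrect response", 98: "No response", 99: "Not applicable"}

# Rows from the spreadsheet example: (ID, age, sex, happiness)
ROWS = [
    (1, 1, 1, 2), (2, 2, 2, 2), (3, 3, 1, 2), (4, 57, 2, 2),
    (5, 45, 2, 3), (6, 66, 2, 3), (7, 2, 2, 3), (8, 88, 2, 3),
]


def decode(variable, value):
    """Translate a stored code into its meaning; None means the code is not in the codebook."""
    if value in MISSING:
        return f"missing ({MISSING[value]})"
    return CODEBOOK[variable].get(value)


def find_invalid(rows):
    """List (ID, variable, value) for every entry that the codebook does not allow."""
    problems = []
    for row_id, *values in rows:
        for variable, value in zip(CODEBOOK, values):
            if decode(variable, value) is None:
                problems.append((row_id, variable, value))
    return problems


for problem in find_invalid(ROWS):
    print("ID %d: %s=%d is not in the codebook" % problem)
```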

Data Analysis with Excel
- Most simple analyses can be done using Excel, including correlation, regression and even random number generation.
- Install the Analysis ToolPak: go to Tools > Add-ins and add the Analysis ToolPak.
- Create your worksheet and codebook.
- Choose a statistical test.
  - Follow the commands in the Help menu.

Data Analysis with Excel
- Resource: http://home.ubalt.edu/ntsbarsh/excel/excel.htm

Data Reporting and Presentation of Data
- Graphical summaries are a great way to present your data.
- Excel is great for creating tables and graphs.
- The type of data you have determines the type of graphical summary you should use.

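Data Reporting and Presentation of Descriptive Data
Categorical data: frequency tables and bar charts.
Example: Fruit

Fruit | Count | Percent | Valid Percent
Pineapples | 4 | 20% | 21%
Apples | 5 | 25% | 26%
Oranges | 10 | 50% | 53%
Unknown | 1 | 5% | (excluded)
Total | 20 | 100% | 100%

Valid percent excludes the "Unknown" response, so it is calculated out of the 19 known responses (e.g. 4/19 = 21%).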
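The Count, Percent and Valid Percent columns can be reproduced in a few lines. This sketch rebuilds the 20 fruit responses from the counts in the table and treats "Unknown" as missing; the `missing` parameter is an assumption you would adjust to match your own codebook.

```python
from collections import Counter


def frequency_table(values, missing=("Unknown",)):
    """Return {category: (count, percent, valid_percent)}; valid percent excludes missing codes."""
    counts = Counter(values)
    total = sum(counts.values())
    valid_total = sum(n for category, n in counts.items() if category not in missing)
    table = {}
    for category, n in counts.items():
        valid = None if category in missing else 100 * n / valid_total
        table[category] = (n, 100 * n / total, valid)
    return table


# Responses rebuilt from the counts in the fruit example
fruit = ["Pineapples"] * 4 + ["Apples"] * 5 + ["Oranges"] * 10 + ["Unknown"]
table = frequency_table(fruit)
for category, (n, pct, valid) in table.items():
    valid_text = "excluded" if valid is None else f"{valid:.0f}%"
    print(f"{category:<11} {n:>3} {pct:>4.0f}% {valid_text:>9}")
```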

Data Reporting and Presentation of Descriptive Data
[Bar chart: Fruit Study, counts for Pineapples, Apples, Oranges and Unknown]

Data Reporting and Presentation of Descriptive Data
Continuous data: tables and histograms.

Age | Count | Percent
20-30 | 4 | 20%
31-40 | 5 | 25%
41-50 | 10 | 50%
51-60 | 1 | 5%
Total | 20 | 100%

Data Reporting and Presentation of Descriptive Data
[Histogram: counts by age group (20-30, 31-40, 41-50, 51-60)]

What is the difference between a histogram and a bar chart?
- Histogram: for continuous data divided into contiguous class intervals (in other words, intervals connected in an unbroken sequence).
- Bar chart: for categorical data, where categories are not contiguous.
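Before drawing a histogram, continuous values have to be counted into contiguous class intervals. A small sketch using the age groups from the table; the ages are hypothetical, and the sketch assumes whole-year ages, since a value such as 30.5 would fall between the 20-30 and 31-40 intervals and not be counted.

```python
def bin_counts(values, intervals):
    """Count values into contiguous class intervals given as inclusive (low, high) pairs."""
    counts = {f"{low}-{high}": 0 for low, high in intervals}
    for value in values:
        for low, high in intervals:
            if low <= value <= high:
                counts[f"{low}-{high}"] += 1
                break  # stop at the first matching interval; values outside all intervals are skipped
    return counts


intervals = [(20, 30), (31, 40), (41, 50), (51, 60)]  # age groups from the table
ages = [23, 35, 41, 44, 58, 29, 47]  # hypothetical whole-year ages
print(bin_counts(ages, intervals))
```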

Measures of Central Tendency
- Reporting averages:
  - Categorical data = mode
  - Ordinal data = median
  - Continuous data = mean
- If there are outliers (or extreme values), report the median instead of the mean.
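The rule above maps directly onto Python's `statistics` module. In this sketch the function name `average_for` is hypothetical, and using `median_low` for ordinal codes is a design choice: it always returns an observed category rather than a half-way value between two codes. The ages are hypothetical and include one extreme value to show why the median is preferred when outliers are present.

```python
import statistics


def average_for(level, values):
    """Pick the average that suits the level of data."""
    if level == "categorical":
        return statistics.mode(values)  # most frequent category
    if level == "ordinal":
        # median_low always returns an observed code, never a half-way value between two codes
        return statistics.median_low(values)
    if level == "continuous":
        return statistics.mean(values)
    raise ValueError(f"unknown level of data: {level!r}")


print(average_for("categorical", ["pear", "apple", "pear"]))  # pear
print(average_for("ordinal", [1, 2, 2, 3]))  # 2

# Hypothetical ages with one extreme value: the mean is pulled upward, the median is not
ages = [34, 36, 38, 41, 95]
print(statistics.mean(ages), statistics.median(ages))  # 48.8 38
```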

Reporting Inferential Stats
- Handout
- Resource: APA Guidelines, http://www.ilstu.edu/~jhkahn/apastats.html

Reporting Inferential Stats
- It's important to include means, standard deviations and sample sizes in your results section.
- Example (correlation): Variable X was strongly correlated with Variable Y, r = .59, p < .01.
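A sketch of computing a Pearson correlation and formatting it for a results section. APA style typically gives the degrees of freedom (n - 2) in parentheses and drops the leading zero because r cannot exceed 1. The data are hypothetical, and the p-value is not computed because the standard library has no t distribution; take it from your statistical package.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def apa_r(r, n):
    """Format r APA-style: degrees of freedom n - 2, two decimals, no leading zero."""
    value = f"{r:.2f}".replace("0.", ".", 1)
    return f"r({n - 2}) = {value}"


x = [1, 2, 3, 4, 5]  # hypothetical paired observations
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(apa_r(r, len(x)))  # r(3) = .77
```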

Important to Keep Your Audience in Mind
- Residency project
- Publication
- Departmental report

Aaron: TCPS certification for residents reminder…

Questions?


Slide 35

Surviving Pharmacy
Residency Research: Tips
and Tricks for Statistical
Planning

Surviving Pharmacy Residency
Research: Tips and Tricks for
Statistical Planning
© Fraser Health Authority, 2007
The Fraser Health Authority (“FH”) authorizes the use, reproduction and/or
modification of this publication for purposes other than commercial redistribution. In
consideration for this authorization, the user agrees that any unmodified
reproduction of this publication shall retain all copyright and proprietary notices. If
the user modifies the content of this publication, all FH copyright notices shall be
removed, however FH shall be acknowledged as the author of the source
publication.
Reproduction or storage of this publication in any form by any means for the purpose
of commercial redistribution is strictly prohibited.
This publication is intended to provide general information only, and should not be
relied on as providing specific healthcare, legal or other professional advice. The
Fraser Health Authority, and every person involved in the creation of this publication,
disclaims any warranty, express or implied, as to its accuracy, completeness or
currency, and disclaims all liability in respect of any actions, including the results of
any actions, taken or not taken in reliance on the information contained herein.

FH Health Research Intelligence
Unit
How can we help?


Grant Facilitator-Writer
 Conducting a search for funding opportunities.
 Automatic notification of new funding sources and
deadlines.
 Identifying a research team.
 Preparing letters of intent.
 Identifying resources required for conducting research.
 Formulating the research budget.
 Writing the grant application in collaboration with
researchers.
 Understanding FH and funding agency requirements
regarding preparation of specific documents.

FH Health Research Intelligence
Unit
How can we help?











Epidemiologist
Specifying the research goal,
objectives and hypothesis.
Identifying measurable
outcomes.
Specifying the variables for
analysis.
Identifying sources of data.
Developing data collection
tools for quantitative or
qualitative studies.
Developing the statistical
analysis plan.
Understanding how to use
statistical software, such as
SPSS.

Workshop Outline










Research 101- Basic Research Steps
Research Question Refinement
Common Study Designs- Resource
Levels of Data
Power and Sample Size
Statistical Test Selection- Exercise
Data Reporting- Resource
Simple Stats with Excel- Resource

Pharmacy Residency Project










1) Develop a research question
2) Conduct thorough literature review
3) Re-define research question or hypothesis
4) Design research methodology/study
5) Create research proposal
6) Apply for funding
7) Apply for ethics approval
8) Collect and analyze data
9) Draw conclusions and relate findings

Research Question Refinement


Research question will describe in operational
terms, what you think will happen in the study.

Good Versus Bad Research
Question


Are patients who take
drug X more likely to
experience episodes
of delirium?



Do patients who
receive medication X
between September
2008 and November
2008 experience
more episodes of
delirium as compared
to patients who
received drug Y
during the same time
period?

Classification of Research
Studies
Observational Studies:
Descriptive Studies:
Focus on describing
populations and describing the
relationship between variables

Research Studies

Analytic Studies:
Observational
Descriptive

Experimental

Analytic

Make inferences about the
population based on a random
sample.
Experimental Studies:
Test relationships between
exposures and outcomes.
Investigator has direct control
over study condition and
exposure status.

Hierarchy of Studies
Type of study is selected
according to the purpose of
research.
Experimental Studies

Analytic Studies

Descriptive Studies

Levels of Evidence


HandoutResearch Design
Hierarchy

Probability Sampling Methods:
Random
 There

are several methods to choose

from:
 Simple random
sampling.

Probability Sampling Methods:
Stratified
 Stratified

sampling

(divide the population into
non-overlapping strata and
sample from within each
stratum independently).


Guarantees representation
of all important groups.

Probability Sampling Methods:
Systematic


Selection of the
sample using an
interval “k” so that
every “k” unit in the
frame is selected,
is called systematic
random sampling.

Probability Sampling Methods:
Systematic


Steps to achieve a systematic random sample:

1. Number the units in the population from 1 to N.
2. Decide on the n (sample size) that you want or need.


k = N/n = the interval size.

3. Randomly select an integer between 1 and k.
4. Then take every kth unit.
Example:
1. N=200
2. n=40, take N/n, 200/40=5 (interval size).
3. Randomly select a number between 1 and 5 (let’s pick 4).
4. Begin with 4, and take every 5th unit.

Probability Sampling Methods:
Cluster
 Cluster

sampling.
 Divide population into clusters and
randomly sample clusters.
 Measure all units within sampled clusters.
 Example: See blue areas on map.
Not just geographic areas,
could select hospitals,
schools etc.


Non-Probability Sampling
Methods


There are different types of non-probability
sampling methods as well:








Convenience (not representative of population).
Purposive (certain group in mind).
Expert sampling (seek out specific expertise).
Snowball sampling (ask people to participate, they
ask more people).

If you select non-probability sampling methods, the
conclusions drawn from the study results apply only
to that specific population.

Measurement: Levels of Data



The level of data will dictate which statistical test you
should use.
Categorical = Data that is classified into categories and
cannot be arranged in any particular order (e.g. Apples
and pears, gender, eye colour, ethnicity).



Ordinal = Data ordered, but distance between intervals
not always equal. (e.g. Low, middle and high income).



Continuous = equal distance between each interval
(e.g. 1,2,3., age).

Statistics and Statistical Test
Selection

Types of Statistics
 Descriptive

Statistics: Describes
 research findings


E.g. Frequencies, averages.

 Inferential

Statistics: Makes inferences
about the population, based on a random
sample.




In a random sample, each person/unit has an
equal chance of being selected
Allows generalizability to population.

Types of Variables


Variables can be classified as independent or
dependent.



An independent variable is the variable you
believe will influence your outcome measure.



A dependent variable is the variable that is
dependant on or influenced by independent
variable(s). The dependent variable can also
be the variable you are trying to predict.

Statistical Test Selection
Selecting the appropriate Statistical test requires
several steps:
Test selection should be based on:
1) What is your goal? Description? Comparison? Prediction? Quantify
association? Prove effectiveness? Prove causality?
2) What kind of data have you collected? What are the levels of data
(Nominal, ordinal, continuous)? Was your sample randomly selected?
3) Is your data normally distributed? Should you use a parametric or nonparametric test?
4) What are the assumptions of the statistical test you would like to
use? Does the data meet these assumptions?

Parametric Tests


Parametric tests assume that the variable in question is
from a normal distribution.



Non-parametric tests do not require the assumption of
normality.



Most non-parametric tests do not require an interval level
of measurement; can be used with nominal/ordinal level
data.

Assumptions







There are various assumptions for each test.
Before you select a test, be sure to check the assumptions of
each test.
You will need to contact a consultant, or review
statistical/research methods resources to find this information.
Some examples of common assumptions are:
 The dependent variable will need to be measured on a
certain level (i.e. Interval level).
 The independent variable(s) will need to be measured on a
certain level (i.e. Ordinal level).
 The population is normally distributed (not skewed).

If your data do not meet the assumptions for a specific test,
you may be able to use a non-parametric test instead.

Type of Data
Goal

Measurement
Normal
Population

Ordinal, or
NonNormal
Population
Median,
interquartil
e range
Wilcoxon test

Binomial
Survival Time
-Two
Possible
Outcomes
Proportion
Kaplan Meier
survival curve

Describe one
group

Mean, SD

Compare one
group to a
hypothetic
al value

One-sample t
test

Compare two
unpaired
groups

Unpaired t test

Mann-Whitney
test

Log-rank test or
Mantel-Haenszel*

Paired t test

Wilcoxon test

Fisher's test
(chisquare
for large
samples)
McNemar's
test

Compare two
paired
groups

Chi-square
test

Cox proportional
hazard
regression**

One-way
Compare
ANOVA
three or
more
unmatched
groups

Kruskal-Wallis
test

Chi-square
or
Binomial
test **

Conditional
proportional
hazards
regression*

Statistical Test Selection Group
Exercise

 Using

your tables, select the
appropriate statistical tests for 10
research scenarios.

During the group exercise…


Steps to choose the appropriate statistical method
for the data analysis:

1. Identify whether the research problem raises the
question of describe, relate (association), or compare
(difference).
2. Identify the levels of measurement in the research
question (Nominal/Categorical, Ordinal/Rank,
Continuous/Evenly spaced).
3. Identify the number of variables, or samples being
described, related, or compared.
4. Identify whether comparison samples are related
(analyze same group before and after) or independent
(not at all related, looking at different groups).
5. Choose the appropriate statistical tool for the data and
situation using the decision tree in the handout.



1. A pilot experiment designed to test the effectiveness of a new
approach to electrode placement for Electro Shock Therapy
(ECT) has been conducted over a one year time period in the
Fraser Health Authority.
Patients from two different mood disorder clinics participated in
this study. Patients from Clinic X received ECT therapy
according to current practice guidelines. Patients from Clinic Y
received a new exploratory ECT treatment. Patients in each
clinic were matched for age, gender, and type of disorder. A
random sample of 30 matched pairs of patients were selected
for inclusion in the study. At end of one year, patients were
administered a memory test yielding a total score out of 100.
Dr. Vasdil would like to know what statistical procedure needs
to be selected to test for differences among groups of patients
on the memory test.
What is the question: Compare
How many samples: 2
Related or independent: Independent
What is the level of measurement: Continuous
How many dependent variables: 1
Test: T-test

Sample Size


There are several rules of thumb for determining
sample size.






1) It’s a good idea to have a minimum of 30 cases (as a total group, or if comparing
groups, 30 for each group).
If you have less you can use a non-parametric test, but it is still better to have close
to 30 cases.
2) If using regression, it is best to have between 10-50 cases per independent
variable.
3) If you are validating a survey, it is never good to have more questions than cases.
4) If the total population that you are examining is less than 30. Use all of them.
5) For pilot studies the recommendation is a sample size of 12 per group
6) For surveys, a sample size of 400 per group can do just about anything.



7) For surveys, a 30% response rate is the bare minimum.



Note: For a precise sample size estimate you will need to conduct a power analysis.






Statistical Power


Power is the capability of a statistical test to
correctly detect a significant effect if it exists.
 Assumes value between 0 and 1 (%)



Power= 1-B (B= probability of a Type II error).
Type II error – the error of not rejecting a false
research finding.
 Type I error- the error of rejecting a correct
research finding.

Types of Power
 A Priori-

Conducted before study
commences (at proposal stage).
 Post Hoc- After study has been completed.
 Easy




way to increase power?

Increase sample size
Increase Effect size

Components Involved in Power
Calculation



Sample Size- Number of cases.
Effect Size –Magnitude of the trend and
variation.
 Alpha Level- Odds of concluding that the
presence of an effect is due to chance alone (.05
or .01).





Also known as Type I Error, or the error of rejecting a
correct research finding

Power level- 80-90% common
One or two-tailed test- two tailed is common.

Components Involved in Power
Calculation
 Sample

Size- What we want to find out.
 Effect Size –Magnitude of the trend…but
what if you don’t know?



Look to pilot data or literature.
Keep in mind, the smaller the effect size, the
larger the sample size required.

 Alpha

Level- .05
 Power level- 80-90%

Important Consultation Information
 What

is your research question?
 Components of power calculation
 Levels of data (nominal, ordinal,
continuous)
 Sampling plan

Data Organization: Codebook
 What

is a codebook?
 A codebook is a log of your variables
(and levels of data) and how you will
code them.
 A codebook will help everyone
understand the coding schemes to
ensure that they are on the same page!

Data Processing and Analyses:
Codebook Example
Variable
Name

Variable
Label

Values

Coding

Missing

Variable
Type

age

age

1,2,3,4,5

1=10-20 years
2=21-30 years
3=31-40 years
4=41-50 years
5=51+ years

97=Incorrect
response
98=No response
99=Not
Applicable

Ordinal

sex

sex

1,2

1=male, 2=female

97=Incorrect
response
98=No response
99=Not
Applicable

Nominal

happiness

happiness
at
work

1,2,3

1=not happy
2=somewhat happy
3=very happy

97=Incorrect
response
98=No response
99=Not
Applicable

Ordinal

Spreadsheet Example
ID#

Age

Sex Happiness

1

1

1

2

2

2

2

2

3

3

1

2

4

57

2

2

5

45

2

3

6

66

2

3

7

2

2

3

8

88

2

3

Data Analysis with Excel
 Most

simple analyses can be done using
Excel, including correlation, regression
and even random number generation.
 Install the data analysis pack.


Go to tools, add-ins, and add the ‘analysis
tool pack’.

 Create

worksheet and codebook.
 Choose statistical test.


Follow commands in help menu.

Data Analysis with Excel


http://home.ubalt.edu/ntsbarsh/excel/excel.htm

Data Reporting and Presentation of
Data
 Graphical

summaries are a great way to
present your data
 Excel is great for creating tables and
graphs
 The type of data you have will reflect the
type of graphical summary you should
use.

Data Reporting and Presentation of
Descriptive Data


Categorical data:
Frequency Tables and
Bar Charts.
 Example: Fruit

Count

Percent

Valid
Percent

Pineapples

4

20%

21%

Apples

5

25%

26%

Oranges

10

50%

53%

Unknown

1

5%

_______

Total

20

100%

100%

Data Reporting and Presentation of
Descriptive Data
Fruit Study

Unknown
Oranges
Apples
Pineapples
0

5

10

Data Reporting and Presentation of
Descriptive Data


Continuous Data:
Tables and
Histograms
Age

Count

Percent

20-30

4

20%

31-40

5

25%

41-50

10

50%

51-60

1

5%

Total

20

100%

Data Reporting and Presentation of
Descriptive Data
10
8
20-30
31-40
41-50
51-60

6
4
2
0

20

30

40

50

What is the difference between a
Histogram and a Bar Chart?
 Histogram:

For continuous data where
data are divided into contiguous class
intervals (or in other words, connected
through unbroken sequence).

 Bar

Chart: For categorical data where
categories are not contiguous.

Measures of Central Tendency
 Reporting




 If

averages

Categorical data= Mode
Ordinal data= Median
Continuous data= Mean

there are outliers (or extreme values),
report the median instead of the mean.

Reporting Inferential Stats
 Handout


Resource- APA Guidelines

http://www.ilstu.edu/~jhkahn/apastats.html

Reporting Inferential Stats
 It’s

important to include means, standard
deviations and sample size in your results
section.

 Example:

Correlation
 Variable X was strongly correlated with
Variable Y, r=.59, p<.01.

Important to Keep your
Audience in Mind
Residency Project
Publication
Departmental Report

Aaron: TCPS certification for
residents reminder…

Questions?


Slide 38

Surviving Pharmacy
Residency Research: Tips
and Tricks for Statistical
Planning

Surviving Pharmacy Residency
Research: Tips and Tricks for
Statistical Planning
© Fraser Health Authority, 2007
The Fraser Health Authority (“FH”) authorizes the use, reproduction and/or
modification of this publication for purposes other than commercial redistribution. In
consideration for this authorization, the user agrees that any unmodified
reproduction of this publication shall retain all copyright and proprietary notices. If
the user modifies the content of this publication, all FH copyright notices shall be
removed, however FH shall be acknowledged as the author of the source
publication.
Reproduction or storage of this publication in any form by any means for the purpose
of commercial redistribution is strictly prohibited.
This publication is intended to provide general information only, and should not be
relied on as providing specific healthcare, legal or other professional advice. The
Fraser Health Authority, and every person involved in the creation of this publication,
disclaims any warranty, express or implied, as to its accuracy, completeness or
currency, and disclaims all liability in respect of any actions, including the results of
any actions, taken or not taken in reliance on the information contained herein.

FH Health Research Intelligence Unit: How can we help?


Grant Facilitator-Writer
- Conducting a search for funding opportunities.
- Automatic notification of new funding sources and deadlines.
- Identifying a research team.
- Preparing letters of intent.
- Identifying resources required for conducting research.
- Formulating the research budget.
- Writing the grant application in collaboration with researchers.
- Understanding FH and funding agency requirements regarding preparation of specific documents.

FH Health Research Intelligence Unit: How can we help?

Epidemiologist
- Specifying the research goal, objectives and hypothesis.
- Identifying measurable outcomes.
- Specifying the variables for analysis.
- Identifying sources of data.
- Developing data collection tools for quantitative or qualitative studies.
- Developing the statistical analysis plan.
- Understanding how to use statistical software, such as SPSS.

Workshop Outline
- Research 101: basic research steps
- Research question refinement
- Common study designs (resource)
- Levels of data
- Power and sample size
- Statistical test selection (exercise)
- Data reporting (resource)
- Simple stats with Excel (resource)

Pharmacy Residency Project
1) Develop a research question.
2) Conduct a thorough literature review.
3) Refine the research question or hypothesis.
4) Design the research methodology/study.
5) Create a research proposal.
6) Apply for funding.
7) Apply for ethics approval.
8) Collect and analyze data.
9) Draw conclusions and relate findings.

Research Question Refinement
The research question should describe, in operational terms, what you expect to happen in the study.

Good Versus Bad Research Question
Bad (vague): Are patients who take drug X more likely to experience episodes of delirium?
Good (specific): Do patients who received drug X between September 2008 and November 2008 experience more episodes of delirium than patients who received drug Y during the same time period?

Classification of Research Studies
Research studies are either observational (descriptive or analytic) or experimental.
Observational studies:
- Descriptive studies: focus on describing populations and the relationships between variables.
- Analytic studies: make inferences about the population based on a random sample.
Experimental studies:
- Test relationships between exposures and outcomes; the investigator has direct control over study conditions and exposure status.

Hierarchy of Studies
The type of study is selected according to the purpose of the research. Levels of evidence, from highest to lowest:
1. Experimental studies
2. Analytic studies
3. Descriptive studies
Handout: Research Design Hierarchy

Probability Sampling Methods: Random
There are several methods to choose from:
- Simple random sampling.
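A minimal sketch of simple random sampling in Python, assuming the sampling frame is a list of numbered units (the frame of 200 chart numbers, the function name and the seed value are hypothetical). `random.Random.sample` draws without replacement, so each unit has the same chance of being selected.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n units without replacement, so every unit has an equal chance of selection."""
    if n > len(population):
        raise ValueError("sample size cannot exceed the population size")
    rng = random.Random(seed)  # a seeded generator lets you reproduce (and document) the draw
    return rng.sample(population, n)


# Hypothetical sampling frame: 200 numbered patient charts
frame = list(range(1, 201))
chosen = simple_random_sample(frame, 40, seed=2008)
print(sorted(chosen))
```

Recording the seed in your protocol lets someone else regenerate exactly the same sample.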

Probability Sampling Methods: Stratified
- Stratified sampling: divide the population into non-overlapping strata and sample from within each stratum independently.
- Guarantees representation of all important groups.
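A sketch of stratified sampling, assuming each unit carries a stratum label (the hospital sites, the function name and the quota of 10 per stratum are hypothetical). This version uses equal allocation; proportional allocation, where each stratum's quota is scaled to its share of the population, is a common alternative.

```python
import random
from collections import defaultdict


def stratified_sample(units, stratum_of, n_per_stratum, seed=None):
    """Split units into non-overlapping strata, then draw a simple random
    sample from each stratum independently (equal allocation)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)  # each unit belongs to exactly one stratum
    sample = {}
    for name, members in strata.items():
        k = min(n_per_stratum, len(members))  # a stratum smaller than the quota is taken whole
        sample[name] = rng.sample(members, k)
    return sample


# Hypothetical frame: (patient_id, site) pairs; site C is small
sites = ["A"] * 50 + ["B"] * 30 + ["C"] * 8
patients = list(enumerate(sites, start=1))
by_site = stratified_sample(patients, stratum_of=lambda p: p[1], n_per_stratum=10, seed=1)
for site, members in by_site.items():
    print(site, len(members))
```

Because every stratum is sampled separately, the small site C still appears in the sample, which a simple random sample of 28 from all 88 patients would not guarantee.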

Probability Sampling Methods: Systematic
Systematic random sampling selects the sample using an interval k, so that every kth unit in the sampling frame is selected.

Probability Sampling Methods: Systematic
Steps to achieve a systematic random sample:
1. Number the units in the population from 1 to N.
2. Decide on the sample size n that you want or need; the interval size is k = N/n.
3. Randomly select an integer between 1 and k.
4. Then take every kth unit.
Example:
1. N = 200.
2. n = 40, so k = N/n = 200/40 = 5 (interval size).
3. Randomly select a number between 1 and 5 (say, 4).
4. Begin with unit 4 and take every 5th unit (4, 9, 14, ..., 199).
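The steps above can be sketched in Python as follows. The function name is hypothetical, and rounding k down when N is not a multiple of n is an assumption the slide does not address.

```python
import random


def systematic_sample(N, n, start=None, seed=None):
    """Systematic random sample of the unit numbers 1..N."""
    if not 0 < n <= N:
        raise ValueError("n must be between 1 and N")
    k = N // n  # interval size; rounded down if N is not a multiple of n
    if start is None:
        start = random.Random(seed).randint(1, k)  # randint includes both 1 and k
    if not 1 <= start <= k:
        raise ValueError("start must be between 1 and k")
    # Take every kth unit from the start; slicing keeps exactly n units
    return list(range(start, N + 1, k))[:n]


# Worked example from the slide: N = 200, n = 40, so k = 5, starting at unit 4
units = systematic_sample(200, 40, start=4)
print(units[:5], units[-1], len(units))  # [4, 9, 14, 19, 24] 199 40
```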

Probability Sampling Methods: Cluster
Cluster sampling:
- Divide the population into clusters and randomly sample clusters.
- Measure all units within the sampled clusters.
- Example: see blue areas on map. Clusters need not be geographic areas; you could select hospitals, schools, etc.
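A sketch of one-stage cluster sampling, where whole clusters are drawn at random and every unit inside them is measured. The hospital names, patient IDs and function name are hypothetical.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select whole clusters, then keep every unit inside each selected cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # sorted keys give a reproducible draw
    return {name: list(clusters[name]) for name in chosen}


# Hypothetical frame: hospitals as clusters, each listing its patient IDs
hospitals = {
    "Hospital A": [101, 102, 103],
    "Hospital B": [201, 202],
    "Hospital C": [301, 302, 303, 304],
    "Hospital D": [401],
}
selected = cluster_sample(hospitals, n_clusters=2, seed=3)
all_units = [pid for pids in selected.values() for pid in pids]
print(sorted(selected), all_units)
```

Note that the total sample size depends on which clusters are drawn, since every unit in a chosen cluster is included.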


Non-Probability Sampling Methods
There are different types of non-probability sampling methods as well:
- Convenience sampling (not representative of the population).
- Purposive sampling (a certain group in mind).
- Expert sampling (seek out specific expertise).
- Snowball sampling (ask people to participate; they ask more people).
If you select non-probability sampling methods, the conclusions drawn from the study results apply only to that specific population.

Measurement: Levels of Data
The level of data dictates which statistical test you should use.
- Categorical (nominal): data classified into categories that cannot be arranged in any particular order (e.g. apples and pears, gender, eye colour, ethnicity).
- Ordinal: data that can be ordered, but the distances between categories are not necessarily equal (e.g. low, middle and high income).
- Continuous: equal distances between adjacent values (e.g. 1, 2, 3; age).

Statistics and Statistical Test Selection

Types of Statistics
- Descriptive statistics: describe research findings (e.g. frequencies, averages).
- Inferential statistics: make inferences about the population based on a random sample.
  - In a random sample, each person/unit has an equal chance of being selected.
  - This allows the results to be generalized to the population.

Types of Variables
Variables can be classified as independent or dependent.
- An independent variable is the variable you believe will influence your outcome measure.
- A dependent variable is the variable that depends on, or is influenced by, the independent variable(s). The dependent variable can also be the variable you are trying to predict.

Statistical Test Selection
Selecting the appropriate statistical test should be based on the following:
1) What is your goal? Description? Comparison? Prediction? Quantifying an association? Demonstrating effectiveness? Establishing causality?
2) What kind of data have you collected? What are the levels of data (nominal, ordinal, continuous)? Was your sample randomly selected?
3) Are your data normally distributed? Should you use a parametric or non-parametric test?
4) What are the assumptions of the statistical test you would like to use? Do your data meet these assumptions?

Parametric Tests
- Parametric tests assume that the variable in question comes from a normally distributed population.
- Non-parametric tests do not require the assumption of normality.
- Most non-parametric tests do not require an interval level of measurement; they can be used with nominal or ordinal data.

Assumptions
There are various assumptions for each test. Before you select a test, be sure to check its assumptions; to find this information, contact a consultant or review statistical/research methods resources.
Some examples of common assumptions are:
- The dependent variable must be measured at a certain level (e.g. interval level).
- The independent variable(s) must be measured at a certain level (e.g. ordinal level).
- The population is normally distributed (not skewed).
If your data do not meet the assumptions for a specific test, you may be able to use a non-parametric test instead.

Type of Data (columns) by Goal (rows)
Goal | Measurement (Normal Population) | Ordinal, or Non-Normal Population | Binomial (Two Possible Outcomes) | Survival Time
Describe one group | Mean, SD | Median, interquartile range | Proportion | Kaplan Meier survival curve
Compare one group to a hypothetical value | One-sample t test | Wilcoxon test | Chi-square or Binomial test** | -
Compare two unpaired groups | Unpaired t test | Mann-Whitney test | Fisher's test (chi-square for large samples) | Log-rank test or Mantel-Haenszel*
Compare two paired groups | Paired t test | Wilcoxon test | McNemar's test | Conditional proportional hazards regression*
Compare three or more unmatched groups | One-way ANOVA | Kruskal-Wallis test | Chi-square test | Cox proportional hazard regression**
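For the two t tests in the first column, the test statistics can be computed with the Python standard library, as in this sketch (hypothetical scores; the function names are hypothetical, and the pooled-variance version assumes equal variances in the two groups). It shows why the paired/unpaired distinction matters: the same numbers give a larger t once the pairing is used. For p-values, `scipy.stats.ttest_ind` and `scipy.stats.ttest_rel` return the statistic together with the p-value.

```python
import math
import statistics


def unpaired_t(a, b):
    """Two-sample (unpaired) t statistic using a pooled variance; assumes equal variances."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2  # statistic and degrees of freedom


def paired_t(first, second):
    """Paired t statistic: a one-sample t test on the within-pair differences."""
    diffs = [x - y for x, y in zip(first, second)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Hypothetical scores for three pairs of subjects
group_1 = [10, 12, 14]
group_2 = [9, 10, 11]
print(unpaired_t(group_1, group_2))  # ignores the pairing: t is about 1.55 on 4 df
print(paired_t(group_1, group_2))    # uses the pairing: t is about 3.46 on 2 df
```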

Statistical Test Selection Group Exercise
Using your tables, select the appropriate statistical tests for 10 research scenarios.

During the group exercise…
Steps to choose the appropriate statistical method for the data analysis:
1. Identify whether the research problem raises the question of describe, relate (association), or compare (difference).
2. Identify the levels of measurement in the research question (nominal/categorical, ordinal/rank, continuous/evenly spaced).
3. Identify the number of variables or samples being described, related, or compared.
4. Identify whether the comparison samples are related (e.g. the same group analyzed before and after) or independent (unrelated, different groups).
5. Choose the appropriate statistical tool for the data and situation using the decision tree in the handout.
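Steps 1 to 5 amount to choosing a row (goal) and a column (type of data) in the handout's test-selection table. The sketch below encodes that table as a Python dictionary; the names `TABLE`, `goal_row` and `choose_test` are hypothetical, the footnote markers are dropped, and "relate (association)" questions are not covered because the table has no row for them.

```python
# Row (goal) -> column (type of data) -> test, transcribed from the test-selection table
TABLE = {
    "describe one group": {
        "normal": "Mean, SD", "non-normal": "Median, interquartile range",
        "binomial": "Proportion", "survival": "Kaplan Meier survival curve"},
    "compare one group to a hypothetical value": {
        "normal": "One-sample t test", "non-normal": "Wilcoxon test",
        "binomial": "Chi-square or Binomial test"},
    "compare two unpaired groups": {
        "normal": "Unpaired t test", "non-normal": "Mann-Whitney test",
        "binomial": "Fisher's test (chi-square for large samples)",
        "survival": "Log-rank test or Mantel-Haenszel"},
    "compare two paired groups": {
        "normal": "Paired t test", "non-normal": "Wilcoxon test",
        "binomial": "McNemar's test", "survival": "Conditional proportional hazards regression"},
    "compare three or more unmatched groups": {
        "normal": "One-way ANOVA", "non-normal": "Kruskal-Wallis test",
        "binomial": "Chi-square test", "survival": "Cox proportional hazard regression"},
}


def goal_row(question, n_groups, related=False):
    """Steps 1, 3 and 4: map question type, number of groups and related/independent to a row."""
    if question == "describe":
        return "describe one group"
    if question != "compare":
        raise ValueError("this table only covers 'describe' and 'compare' questions")
    if n_groups == 1:
        return "compare one group to a hypothetical value"
    if n_groups == 2:
        return "compare two paired groups" if related else "compare two unpaired groups"
    if related:
        raise ValueError("matched designs with three or more groups are not in this table")
    return "compare three or more unmatched groups"


def choose_test(question, data_type, n_groups=1, related=False):
    """Steps 2 and 5: look up the tool for the chosen row and type of data."""
    row = TABLE[goal_row(question, n_groups, related)]
    if data_type not in row:
        raise ValueError(f"the table has no entry for {data_type!r} in this row")
    return row[data_type]


print(choose_test("compare", "normal", n_groups=2, related=True))  # Paired t test
print(choose_test("compare", "non-normal", n_groups=3))            # Kruskal-Wallis test
```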



1. A pilot experiment designed to test the effectiveness of a new approach to electrode placement for electroconvulsive therapy (ECT) has been conducted over a one-year period in the Fraser Health Authority.
Patients from two different mood disorder clinics participated in this study. Patients from Clinic X received ECT according to current practice guidelines. Patients from Clinic Y received a new exploratory ECT treatment. Patients in each clinic were matched for age, gender, and type of disorder, and a random sample of 30 matched pairs of patients was selected for inclusion in the study. At the end of one year, patients were administered a memory test yielding a total score out of 100.
Dr. Vasdil would like to know which statistical procedure should be used to test for differences between the two groups of patients on the memory test.
What is the question: Compare
How many samples: 2
Related or independent: Related (patients were matched in pairs)
What is the level of measurement: Continuous
How many dependent variables: 1
Test: Paired t test

Sample Size
There are several rules of thumb for determining sample size:
1) It’s a good idea to have a minimum of 30 cases (as a total group or, if comparing groups, 30 in each group). If you have fewer, you can use a non-parametric test, but it is still better to have close to 30 cases.
2) If using regression, it is best to have 10-50 cases per independent variable.
3) If you are validating a survey, avoid having more questions than cases.
4) If the total population you are examining is smaller than 30, use all of them.
5) For pilot studies, the recommendation is a sample size of 12 per group.
6) For surveys, a sample size of 400 per group is adequate for most analyses.
7) For surveys, a 30% response rate is the bare minimum.

Note: For a precise sample size estimate, you will need to conduct a power analysis.
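One common rationale for the survey figure of 400 (an assumption here; the slide gives no derivation) is the margin of error for a proportion: at 95% confidence and the most conservative proportion p = 0.5, about 385 respondents give a margin of roughly ±5 percentage points. This sketch uses the normal approximation, ignores any finite-population correction, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def margin_of_error(n, p=0.5, confidence=0.95):
    """Half-width of a normal-approximation confidence interval for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    return z * math.sqrt(p * (1 - p) / n)


def n_for_margin(margin, p=0.5, confidence=0.95):
    """Smallest n whose margin of error does not exceed `margin`."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


print(n_for_margin(0.05))               # 385
print(round(margin_of_error(400), 3))   # 0.049
```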






Statistical Power
- Power is the probability that a statistical test will detect an effect when one truly exists.
- It takes a value between 0 and 1 (often expressed as a percentage).
- Power = 1 - β, where β is the probability of a Type II error.
- Type II error: failing to reject the null hypothesis when it is false (missing a real effect).
- Type I error: rejecting the null hypothesis when it is true (concluding there is an effect when there is none).

Types of Power Analysis
- A priori: conducted before the study commences (at the proposal stage).
- Post hoc: conducted after the study has been completed.
- Easy ways to increase power?
  - Increase the sample size.
  - Increase the effect size.
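To see how sample size and effect size drive power, here is a sketch using the normal approximation for a two-sided comparison of two group means. Expressing the effect as a standardized difference d (difference in means divided by the standard deviation) is an assumption, and the function name is hypothetical; dedicated power software uses the t distribution and gives slightly different numbers.

```python
from statistics import NormalDist


def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power (1 - beta) of a two-sided, two-sample comparison of means.

    d is the standardized effect size. Uses the normal approximation and ignores
    the tiny chance of rejecting in the wrong direction.
    """
    z = NormalDist()  # standard normal distribution
    z_crit = z.inv_cdf(1 - alpha / 2)
    return z.cdf(d * (n_per_group / 2) ** 0.5 - z_crit)


for d in (0.2, 0.5):
    for n in (20, 64, 100):
        print(f"d = {d}, n = {n} per group: power = {approx_power(d, n):.2f}")
```

Power rises with both n and d; for example, d = 0.5 with 64 per group gives roughly 0.81.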

Components Involved in Power Calculation
- Sample size: the number of cases.
- Effect size: the magnitude of the trend or difference relative to its variation.
- Alpha level: the probability of concluding that an effect is present when the observed result is due to chance alone (commonly .05 or .01). Also known as the Type I error rate: the probability of rejecting a true null hypothesis.
- Power level: 80-90% is common.
- One- or two-tailed test: two-tailed is common.

Components Involved in Power Calculation
- Sample size: what we want to find out.
- Effect size: the magnitude of the trend... but what if you don’t know it?
  - Look to pilot data or the literature.
  - Keep in mind that the smaller the effect size, the larger the sample size required.
- Alpha level: .05
- Power level: 80-90%
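With alpha, power and an effect size in hand, a first-pass sample size for comparing two means can be sketched with the normal-approximation formula n = 2((z_(1-α/2) + z_(1-β)) / d)² per group. Using the standardized effect size d is an assumption, and the function name is hypothetical; your consultant may use a different effect-size measure or a t-based calculation, which gives slightly larger numbers.

```python
import math
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80, two_tailed=True):
    """Approximate cases needed per group to compare two means.

    n = 2 * ((z_alpha + z_beta) / d) ** 2, where d is the standardized
    effect size and z_beta comes from power = 1 - beta.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2) if two_tailed else z.inv_cdf(1 - alpha)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) / d) ** 2)


# Smaller effects need larger samples (alpha = .05, power = 80%, two-tailed)
for d in (0.8, 0.5, 0.2):
    print(d, n_per_group(d))  # 25, 63 and 393 cases per group
```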

Important Consultation Information
- What is your research question?
- Components of the power calculation
- Levels of data (nominal, ordinal, continuous)
- Sampling plan

Data Organization: Codebook
- What is a codebook?
- A codebook is a log of your variables (and their levels of data) and how you will code them.
- A codebook helps everyone understand the coding schemes, so that everyone is on the same page!

Data Processing and Analyses: Codebook Example
Variable Name | Variable Label | Values | Coding | Missing | Variable Type
age | age | 1,2,3,4,5 | 1=10-20 years, 2=21-30 years, 3=31-40 years, 4=41-50 years, 5=51+ years | 97=Incorrect response, 98=No response, 99=Not applicable | Ordinal
sex | sex | 1,2 | 1=male, 2=female | 97=Incorrect response, 98=No response, 99=Not applicable | Nominal
happiness | happiness at work | 1,2,3 | 1=not happy, 2=somewhat happy, 3=very happy | 97=Incorrect response, 98=No response, 99=Not applicable | Ordinal

Spreadsheet Example
ID# | Age | Sex | Happiness
1 | 1 | 1 | 2
2 | 2 | 2 | 2
3 | 3 | 1 | 2
4 | 57 | 2 | 2
5 | 45 | 2 | 3
6 | 66 | 2 | 3
7 | 2 | 2 | 3
8 | 88 | 2 | 3
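Keeping the codebook in machine-readable form lets you check the spreadsheet against it. In this sketch the names `CODEBOOK`, `MISSING`, `check_row` and `label` are hypothetical, and the rows are the spreadsheet example's values. Several Age entries (57, 45, 66, 88) are not codes in the age coding scheme, which is exactly the kind of inconsistency a codebook check surfaces.

```python
# Codebook from the example: valid codes per variable, plus shared missing-value codes
CODEBOOK = {
    "age": {1: "10-20 years", 2: "21-30 years", 3: "31-40 years", 4: "41-50 years", 5: "51+ years"},
    "sex": {1: "male", 2: "female"},
    "happiness": {1: "not happy", 2: "somewhat happy", 3: "very happy"},
}
MISSING = {97: "Incorrect response", 98: "No response", 99: "Not applicable"}

# Rows copied from the spreadsheet example: (ID, age, sex, happiness)
ROWS = [(1, 1, 1, 2), (2, 2, 2, 2), (3, 3, 1, 2), (4, 57, 2, 2),
        (5, 45, 2, 3), (6, 66, 2, 3), (7, 2, 2, 3), (8, 88, 2, 3)]


def check_row(row):
    """Return (variable, value) pairs that are neither a valid code nor a missing code."""
    record = dict(zip(("age", "sex", "happiness"), row[1:]))
    return [(var, val) for var, val in record.items()
            if val not in CODEBOOK[var] and val not in MISSING]


def label(variable, value):
    """Translate a stored code into its codebook label."""
    if value in MISSING:
        return MISSING[value]
    return CODEBOOK[variable].get(value, "NOT IN CODEBOOK")


for row in ROWS:
    problems = check_row(row)
    if problems:
        print(f"ID {row[0]}: not in codebook -> {problems}")
```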

Data Analysis with Excel
- Most simple analyses can be done using Excel, including correlation, regression and even random number generation.
- Install the Analysis ToolPak: go to Tools > Add-Ins and add the Analysis ToolPak.
- Create a worksheet and codebook.
- Choose a statistical test.
  - Follow the commands in the Help menu.

Data Analysis with Excel


http://home.ubalt.edu/ntsbarsh/excel/excel.htm

Data Reporting and Presentation of Data
- Graphical summaries are a great way to present your data.
- Excel is great for creating tables and graphs.
- The type of data you have determines the type of graphical summary you should use.

Data Reporting and Presentation of Descriptive Data
Categorical data: frequency tables and bar charts.
Example:
Fruit | Count | Percent | Valid Percent
Pineapples | 4 | 20% | 21%
Apples | 5 | 25% | 26%
Oranges | 10 | 50% | 53%
Unknown | 1 | 5% | -
Total | 20 | 100% | 100%
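The Percent column divides by all 20 responses, while Valid Percent leaves out the Unknown response and divides by the 19 valid ones. A short sketch reproducing the fruit table from its counts (the function name is hypothetical, and treating "Unknown" as the only missing category is an assumption):

```python
from collections import Counter


def frequency_table(values, missing=("Unknown",)):
    """Count, percent of all cases, and valid percent (missing categories excluded)."""
    counts = Counter(values)
    total = sum(counts.values())
    valid_total = total - sum(counts[m] for m in missing)  # Counter returns 0 for absent keys
    rows = []
    for category, count in counts.items():  # categories in first-seen order
        percent = round(100 * count / total)
        valid = None if category in missing else round(100 * count / valid_total)
        rows.append((category, count, percent, valid))
    return rows


# Rebuild the 20 fruit responses from the counts in the table
fruit = ["Pineapples"] * 4 + ["Apples"] * 5 + ["Oranges"] * 10 + ["Unknown"] * 1
for category, count, percent, valid in frequency_table(fruit):
    valid_text = "-" if valid is None else f"{valid}%"
    print(f"{category:<10} {count:>3} {percent:>4}% {valid_text:>5}")
```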

Data Reporting and Presentation of Descriptive Data
Figure: Fruit Study (bar chart of counts for Pineapples, Apples, Oranges and Unknown)

Data Reporting and Presentation of Descriptive Data
Continuous data: tables and histograms.
Age | Count | Percent
20-30 | 4 | 20%
31-40 | 5 | 25%
41-50 | 10 | 50%
51-60 | 1 | 5%
Total | 20 | 100%

Data Reporting and Presentation of Descriptive Data
Figure: histogram of counts by age group (20-30, 31-40, 41-50, 51-60)

What is the difference between a histogram and a bar chart?
- Histogram: for continuous data, where data are divided into contiguous class intervals (in other words, connected through an unbroken sequence).
- Bar chart: for categorical data, where categories are not contiguous.

Measures of Central Tendency
Reporting averages:
- Categorical data: mode
- Ordinal data: median
- Continuous data: mean
If there are outliers (extreme values), report the median instead of the mean.
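A sketch of this rule using Python's `statistics` module, with hypothetical lengths of stay in which one patient is an extreme value. The function name is hypothetical, and using `median_low` for ordinal codes, so that the result is always an actual category, is an assumption rather than part of the slide.

```python
import statistics


def average_for(values, level):
    """Pick the average recommended for each level of data."""
    if level == "categorical":
        return statistics.mode(values)
    if level == "ordinal":
        return statistics.median_low(values)  # always returns an actual category code
    if level == "continuous":
        return statistics.mean(values)
    raise ValueError(f"unknown level of data: {level!r}")


# Hypothetical lengths of stay in days; 60 is an extreme value
stay = [2, 3, 3, 4, 5, 5, 5, 6, 60]
print(average_for(["apples", "oranges", "oranges"], "categorical"))  # oranges
print(average_for([1, 2, 2, 3], "ordinal"))                          # 2
print(round(average_for(stay, "continuous"), 2))                     # 10.33
print(statistics.median(stay))                                       # 5
```

The single 60-day stay pulls the mean to about 10 days while the median stays at 5, which is why the median is the better summary here.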

Reporting Inferential Stats
Handout resource: APA guidelines
http://www.ilstu.edu/~jhkahn/apastats.html

Reporting Inferential Stats
It’s important to include means, standard deviations and sample sizes in your results section.
Example (correlation): Variable X was strongly correlated with Variable Y, r = .59, p < .01.
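A sketch of computing and formatting these values (mean, standard deviation, sample size and Pearson's r) in APA style, with hypothetical data. The helper names are hypothetical; the formatting follows the APA convention of dropping the leading zero for statistics that cannot exceed 1. For the p-value, `scipy.stats.pearsonr` returns both r and p.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient for paired observations x and y."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def apa(value, decimals=2):
    """Drop the leading zero for statistics that cannot exceed 1 (such as r and p)."""
    text = f"{value:.{decimals}f}"
    return text.replace("0.", ".", 1) if abs(value) < 1 else text


def describe(values):
    """Mean, standard deviation and sample size for the results section."""
    return f"M = {statistics.mean(values):.2f}, SD = {statistics.stdev(values):.2f}, n = {len(values)}"


# Hypothetical paired measurements
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(describe(x))                    # M = 3.00, SD = 1.58, n = 5
print(f"r = {apa(pearson_r(x, y))}")  # r = .77
```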

Important to Keep Your Audience in Mind
- Residency project
- Publication
- Departmental report

Aaron: TCPS certification for
residents reminder…

Questions?


Slide 39

Surviving Pharmacy
Residency Research: Tips
and Tricks for Statistical
Planning

Surviving Pharmacy Residency
Research: Tips and Tricks for
Statistical Planning
© Fraser Health Authority, 2007
The Fraser Health Authority (“FH”) authorizes the use, reproduction and/or
modification of this publication for purposes other than commercial redistribution. In
consideration for this authorization, the user agrees that any unmodified
reproduction of this publication shall retain all copyright and proprietary notices. If
the user modifies the content of this publication, all FH copyright notices shall be
removed, however FH shall be acknowledged as the author of the source
publication.
Reproduction or storage of this publication in any form by any means for the purpose
of commercial redistribution is strictly prohibited.
This publication is intended to provide general information only, and should not be
relied on as providing specific healthcare, legal or other professional advice. The
Fraser Health Authority, and every person involved in the creation of this publication,
disclaims any warranty, express or implied, as to its accuracy, completeness or
currency, and disclaims all liability in respect of any actions, including the results of
any actions, taken or not taken in reliance on the information contained herein.

FH Health Research Intelligence
Unit
How can we help?


Grant Facilitator-Writer
 Conducting a search for funding opportunities.
 Automatic notification of new funding sources and
deadlines.
 Identifying a research team.
 Preparing letters of intent.
 Identifying resources required for conducting research.
 Formulating the research budget.
 Writing the grant application in collaboration with
researchers.
 Understanding FH and funding agency requirements
regarding preparation of specific documents.

FH Health Research Intelligence
Unit
How can we help?











Epidemiologist
Specifying the research goal,
objectives and hypothesis.
Identifying measurable
outcomes.
Specifying the variables for
analysis.
Identifying sources of data.
Developing data collection
tools for quantitative or
qualitative studies.
Developing the statistical
analysis plan.
Understanding how to use
statistical software, such as
SPSS.

Workshop Outline










Research 101- Basic Research Steps
Research Question Refinement
Common Study Designs- Resource
Levels of Data
Power and Sample Size
Statistical Test Selection- Exercise
Data Reporting- Resource
Simple Stats with Excel- Resource

Pharmacy Residency Project










1) Develop a research question
2) Conduct thorough literature review
3) Re-define research question or hypothesis
4) Design research methodology/study
5) Create research proposal
6) Apply for funding
7) Apply for ethics approval
8) Collect and analyze data
9) Draw conclusions and relate findings

Research Question Refinement


Research question will describe in operational
terms, what you think will happen in the study.

Good Versus Bad Research
Question


Are patients who take
drug X more likely to
experience episodes
of delirium?



Do patients who
receive medication X
between September
2008 and November
2008 experience
more episodes of
delirium as compared
to patients who
received drug Y
during the same time
period?

Classification of Research
Studies
Observational Studies:
Descriptive Studies:
Focus on describing
populations and describing the
relationship between variables

Research Studies

Analytic Studies:
Observational
Descriptive

Experimental

Analytic

Make inferences about the
population based on a random
sample.
Experimental Studies:
Test relationships between
exposures and outcomes.
Investigator has direct control
over study condition and
exposure status.

Hierarchy of Studies
Type of study is selected
according to the purpose of
research.
Experimental Studies

Analytic Studies

Descriptive Studies

Levels of Evidence


HandoutResearch Design
Hierarchy

Probability Sampling Methods:
Random
 There

are several methods to choose

from:
 Simple random
sampling.

Probability Sampling Methods:
Stratified
 Stratified

sampling

(divide the population into
non-overlapping strata and
sample from within each
stratum independently).


Guarantees representation
of all important groups.

Probability Sampling Methods:
Systematic


Selection of the
sample using an
interval “k” so that
every “k” unit in the
frame is selected,
is called systematic
random sampling.

Probability Sampling Methods:
Systematic


Steps to achieve a systematic random sample:

1. Number the units in the population from 1 to N.
2. Decide on the n (sample size) that you want or need.


k = N/n = the interval size.

3. Randomly select an integer between 1 and k.
4. Then take every kth unit.
Example:
1. N=200
2. n=40, take N/n, 200/40=5 (interval size).
3. Randomly select a number between 1 and 5 (let’s pick 4).
4. Begin with 4, and take every 5th unit.

Probability Sampling Methods:
Cluster
 Cluster

sampling.
 Divide population into clusters and
randomly sample clusters.
 Measure all units within sampled clusters.
 Example: See blue areas on map.
Not just geographic areas,
could select hospitals,
schools etc.


Non-Probability Sampling
Methods


There are different types of non-probability
sampling methods as well:








Convenience (not representative of population).
Purposive (certain group in mind).
Expert sampling (seek out specific expertise).
Snowball sampling (ask people to participate, they
ask more people).

If you select non-probability sampling methods, the
conclusions drawn from the study results apply only
to that specific population.

Measurement: Levels of Data



The level of data will dictate which statistical test you
should use.
Categorical = Data that is classified into categories and
cannot be arranged in any particular order (e.g. Apples
and pears, gender, eye colour, ethnicity).



Ordinal = Data ordered, but distance between intervals
not always equal. (e.g. Low, middle and high income).



Continuous = equal distance between each interval
(e.g. 1,2,3., age).

Statistics and Statistical Test
Selection

Types of Statistics
 Descriptive

Statistics: Describes
 research findings


E.g. Frequencies, averages.

 Inferential

Statistics: Makes inferences
about the population, based on a random
sample.




In a random sample, each person/unit has an
equal chance of being selected
Allows generalizability to population.

Types of Variables


Variables can be classified as independent or
dependent.



An independent variable is the variable you
believe will influence your outcome measure.



A dependent variable is the variable that is
dependant on or influenced by independent
variable(s). The dependent variable can also
be the variable you are trying to predict.

Statistical Test Selection
Selecting the appropriate Statistical test requires
several steps:
Test selection should be based on:
1) What is your goal? Description? Comparison? Prediction? Quantify
association? Prove effectiveness? Prove causality?
2) What kind of data have you collected? What are the levels of data
(Nominal, ordinal, continuous)? Was your sample randomly selected?
3) Is your data normally distributed? Should you use a parametric or nonparametric test?
4) What are the assumptions of the statistical test you would like to
use? Does the data meet these assumptions?

Parametric Tests


Parametric tests assume that the variable in question is
from a normal distribution.



Non-parametric tests do not require the assumption of
normality.



Most non-parametric tests do not require an interval level
of measurement; can be used with nominal/ordinal level
data.

Assumptions







There are various assumptions for each test.
Before you select a test, be sure to check the assumptions of
each test.
You will need to contact a consultant, or review
statistical/research methods resources to find this information.
Some examples of common assumptions are:
 The dependent variable will need to be measured on a
certain level (i.e. Interval level).
 The independent variable(s) will need to be measured on a
certain level (i.e. Ordinal level).
 The population is normally distributed (not skewed).

If your data do not meet the assumptions for a specific test,
you may be able to use a non-parametric test instead.

Type of Data
Goal

Measurement
Normal
Population

Ordinal, or
NonNormal
Population
Median,
interquartil
e range
Wilcoxon test

Binomial
Survival Time
-Two
Possible
Outcomes
Proportion
Kaplan Meier
survival curve

Describe one
group

Mean, SD

Compare one
group to a
hypothetic
al value

One-sample t
test

Compare two
unpaired
groups

Unpaired t test

Mann-Whitney
test

Log-rank test or
Mantel-Haenszel*

Paired t test

Wilcoxon test

Fisher's test
(chisquare
for large
samples)
McNemar's
test

Compare two
paired
groups

Chi-square
test

Cox proportional
hazard
regression**

One-way
Compare
ANOVA
three or
more
unmatched
groups

Kruskal-Wallis
test

Chi-square
or
Binomial
test **

Conditional
proportional
hazards
regression*

Statistical Test Selection Group
Exercise

 Using

your tables, select the
appropriate statistical tests for 10
research scenarios.

During the group exercise…


Steps to choose the appropriate statistical method
for the data analysis:

1. Identify whether the research problem raises the
question of describe, relate (association), or compare
(difference).
2. Identify the levels of measurement in the research
question (Nominal/Categorical, Ordinal/Rank,
Continuous/Evenly spaced).
3. Identify the number of variables, or samples being
described, related, or compared.
4. Identify whether comparison samples are related
(analyze same group before and after) or independent
(not at all related, looking at different groups).
5. Choose the appropriate statistical tool for the data and
situation using the decision tree in the handout.



1. A pilot experiment designed to test the effectiveness of a new
approach to electrode placement for Electro Shock Therapy
(ECT) has been conducted over a one year time period in the
Fraser Health Authority.
Patients from two different mood disorder clinics participated in
this study. Patients from Clinic X received ECT therapy
according to current practice guidelines. Patients from Clinic Y
received a new exploratory ECT treatment. Patients in each
clinic were matched for age, gender, and type of disorder. A
random sample of 30 matched pairs of patients were selected
for inclusion in the study. At end of one year, patients were
administered a memory test yielding a total score out of 100.
Dr. Vasdil would like to know what statistical procedure needs
to be selected to test for differences among groups of patients
on the memory test.
What is the question: Compare
How many samples: 2
Related or independent: Independent
What is the level of measurement: Continuous
How many dependent variables: 1
Test: T-test

Sample Size


There are several rules of thumb for determining
sample size.






1) It’s a good idea to have a minimum of 30 cases (as a total group, or if comparing
groups, 30 for each group).
If you have less you can use a non-parametric test, but it is still better to have close
to 30 cases.
2) If using regression, it is best to have between 10-50 cases per independent
variable.
3) If you are validating a survey, it is never good to have more questions than cases.
4) If the total population that you are examining is less than 30. Use all of them.
5) For pilot studies the recommendation is a sample size of 12 per group
6) For surveys, a sample size of 400 per group can do just about anything.



7) For surveys, a 30% response rate is the bare minimum.



Note: For a precise sample size estimate you will need to conduct a power analysis.






Statistical Power


Power is the capability of a statistical test to
correctly detect a significant effect if it exists.
 Assumes value between 0 and 1 (%)



Power= 1-B (B= probability of a Type II error).
Type II error – the error of not rejecting a false
research finding.
 Type I error- the error of rejecting a correct
research finding.

Types of Power
 A Priori-

Conducted before study
commences (at proposal stage).
 Post Hoc- After study has been completed.
 Easy




way to increase power?

Increase sample size
Increase Effect size

Components Involved in Power
Calculation



Sample Size- Number of cases.
Effect Size –Magnitude of the trend and
variation.
 Alpha Level- Odds of concluding that the
presence of an effect is due to chance alone (.05
or .01).





Also known as Type I Error, or the error of rejecting a
correct research finding

Power level- 80-90% common
One or two-tailed test- two tailed is common.

Components Involved in Power
Calculation
 Sample

Size- What we want to find out.
 Effect Size –Magnitude of the trend…but
what if you don’t know?



Look to pilot data or literature.
Keep in mind, the smaller the effect size, the
larger the sample size required.

 Alpha

Level- .05
 Power level- 80-90%

Important Consultation Information
 What

is your research question?
 Components of power calculation
 Levels of data (nominal, ordinal,
continuous)
 Sampling plan

Data Organization: Codebook
 What

is a codebook?
 A codebook is a log of your variables
(and levels of data) and how you will
code them.
 A codebook will help everyone
understand the coding schemes to
ensure that they are on the same page!

Data Processing and Analyses:
Codebook Example
Variable
Name

Variable
Label

Values

Coding

Missing

Variable
Type

age

age

1,2,3,4,5

1=10-20 years
2=21-30 years
3=31-40 years
4=41-50 years
5=51+ years

97=Incorrect
response
98=No response
99=Not
Applicable

Ordinal

sex

sex

1,2

1=male, 2=female

97=Incorrect
response
98=No response
99=Not
Applicable

Nominal

happiness

happiness
at
work

1,2,3

1=not happy
2=somewhat happy
3=very happy

97=Incorrect
response
98=No response
99=Not
Applicable

Ordinal

Spreadsheet Example
ID#

Age

Sex Happiness

1

1

1

2

2

2

2

2

3

3

1

2

4

57

2

2

5

45

2

3

6

66

2

3

7

2

2

3

8

88

2

3

Data Analysis with Excel
 Most

simple analyses can be done using
Excel, including correlation, regression
and even random number generation.
 Install the data analysis pack.


Go to tools, add-ins, and add the ‘analysis
tool pack’.

 Create

worksheet and codebook.
 Choose statistical test.


Follow commands in help menu.

Data Analysis with Excel


http://home.ubalt.edu/ntsbarsh/excel/excel.htm

Data Reporting and Presentation of
Data
 Graphical

summaries are a great way to
present your data
 Excel is great for creating tables and
graphs
 The type of data you have will reflect the
type of graphical summary you should
use.

Data Reporting and Presentation of
Descriptive Data


Categorical data:
Frequency Tables and
Bar Charts.
 Example: Fruit

Count

Percent

Valid
Percent

Pineapples

4

20%

21%

Apples

5

25%

26%

Oranges

10

50%

53%

Unknown

1

5%

_______

Total

20

100%

100%

Data Reporting and Presentation of
Descriptive Data
Fruit Study

Unknown
Oranges
Apples
Pineapples
0

5

10

Data Reporting and Presentation of
Descriptive Data


Continuous Data:
Tables and
Histograms
Age

Count

Percent

20-30

4

20%

31-40

5

25%

41-50

10

50%

51-60

1

5%

Total

20

100%

Data Reporting and Presentation of
Descriptive Data
10
8
20-30
31-40
41-50
51-60

6
4
2
0

20

30

40

50

What is the difference between a
Histogram and a Bar Chart?
 Histogram:

For continuous data where
data are divided into contiguous class
intervals (or in other words, connected
through unbroken sequence).

 Bar

Chart: For categorical data where
categories are not contiguous.

Measures of Central Tendency
 Reporting




 If

averages

Categorical data= Mode
Ordinal data= Median
Continuous data= Mean

there are outliers (or extreme values),
report the median instead of the mean.

Reporting Inferential Stats
 Handout


Resource- APA Guidelines

http://www.ilstu.edu/~jhkahn/apastats.html

Reporting Inferential Stats
 It’s

important to include means, standard
deviations and sample size in your results
section.

 Example:

Correlation
 Variable X was strongly correlated with
Variable Y, r=.59, p<.01.

Important to Keep your
Audience in Mind
Residency Project
Publication
Departmental Report

Aaron: TCPS certification for
residents reminder…

Questions?



FH Health Research Intelligence Unit
How can we help?

Grant Facilitator-Writer
- Conducting a search for funding opportunities.
- Automatic notification of new funding sources and deadlines.
- Identifying a research team.
- Preparing letters of intent.
- Identifying resources required for conducting research.
- Formulating the research budget.
- Writing the grant application in collaboration with researchers.
- Understanding FH and funding agency requirements regarding preparation of specific documents.

FH Health Research Intelligence Unit
How can we help?

Epidemiologist
- Specifying the research goal, objectives and hypothesis.
- Identifying measurable outcomes.
- Specifying the variables for analysis.
- Identifying sources of data.
- Developing data collection tools for quantitative or qualitative studies.
- Developing the statistical analysis plan.
- Understanding how to use statistical software, such as SPSS.

Workshop Outline
- Research 101: Basic Research Steps
- Research Question Refinement
- Common Study Designs (Resource)
- Levels of Data
- Power and Sample Size
- Statistical Test Selection (Exercise)
- Data Reporting (Resource)
- Simple Stats with Excel (Resource)

Pharmacy Residency Project
1) Develop a research question.
2) Conduct a thorough literature review.
3) Redefine the research question or hypothesis.
4) Design the research methodology/study.
5) Create a research proposal.
6) Apply for funding.
7) Apply for ethics approval.
8) Collect and analyze data.
9) Draw conclusions and relate findings.

Research Question Refinement
- The research question should describe, in operational terms, what you think will happen in the study.

Good Versus Bad Research Question
- Bad (vague): Are patients who take drug X more likely to experience episodes of delirium?
- Good (operational): Do patients who receive drug X between September 2008 and November 2008 experience more episodes of delirium compared with patients who received drug Y during the same time period?

Classification of Research Studies
Research studies are either observational (descriptive or analytic) or experimental.
- Observational studies:
  - Descriptive studies: focus on describing populations and the relationship between variables.
  - Analytic studies: make inferences about the population based on a random sample.
- Experimental studies: test relationships between exposures and outcomes. The investigator has direct control over study conditions and exposure status.

Hierarchy of Studies
The type of study is selected according to the purpose of the research. From the top of the hierarchy down:
1. Experimental studies
2. Analytic studies
3. Descriptive studies

Levels of Evidence
- Handout: Research Design Hierarchy

Probability Sampling Methods: Random
- There are several methods to choose from:
  - Simple random sampling: every unit in the population has an equal chance of being selected.
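A simple random sample can be drawn with Python's standard `random` module. This is a minimal sketch that assumes you already have a numbered sampling frame. The helper name `simple_random_sample` and the frame of 200 patient IDs are hypothetical.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n units without replacement; each unit has the same chance of selection."""
    rng = random.Random(seed)  # fixing the seed lets you reproduce and document the draw
    return rng.sample(frame, n)

# Hypothetical sampling frame: patient IDs 1 to 200
frame = list(range(1, 201))
sample = simple_random_sample(frame, 40, seed=2008)
print(sorted(sample))
```

Recording the seed in your protocol lets a reviewer regenerate exactly the same sample.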

Probability Sampling Methods: Stratified
- Stratified sampling: divide the population into non-overlapping strata and sample from within each stratum independently.
- Guarantees representation of all important groups.
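Here is a sketch of stratified sampling under one stated assumption: it uses equal allocation, drawing the same number from every stratum. Proportional allocation is another common choice. The function name, the three sites and the patient list are all hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, n_per_stratum, seed=None):
    """Split units into non-overlapping strata, then draw a simple random
    sample independently within each stratum (equal allocation)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    sample = {}
    for name, members in strata.items():
        k = min(n_per_stratum, len(members))  # a stratum smaller than n is taken whole
        sample[name] = rng.sample(members, k)
    return sample

# Hypothetical frame: (patient_id, site) pairs from three sites of different sizes
sites = ["Site A"] * 50 + ["Site B"] * 30 + ["Site C"] * 5
patients = [(pid, site) for pid, site in enumerate(sites, start=1)]
result = stratified_sample(patients, stratum_of=lambda p: p[1], n_per_stratum=10, seed=1)
for site, chosen in result.items():
    print(site, len(chosen))
```

The small Site C is still represented. That is the point of stratifying.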

Probability Sampling Methods: Systematic
- Systematic random sampling selects the sample using an interval k, so that every kth unit in the sampling frame is selected.

Probability Sampling Methods: Systematic
Steps to achieve a systematic random sample:
1. Number the units in the population from 1 to N.
2. Decide on the sample size n that you want or need. The interval size is k = N/n.
3. Randomly select an integer between 1 and k.
4. Then take every kth unit.

Example:
1. N = 200.
2. n = 40, so k = N/n = 200/40 = 5 (interval size).
3. Randomly select a number between 1 and 5 (let’s pick 4).
4. Begin with unit 4 and take every 5th unit (4, 9, 14, ..., 199).
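The four steps can be sketched in a few lines. One assumption is built in: when N/n is not a whole number, the sketch rounds k down. `systematic_sample` is a hypothetical helper name, and the `start` argument lets you reproduce the worked example.

```python
import random

def systematic_sample(N, n, start=None, seed=None):
    """Unit numbers (1..N) for a systematic random sample of size n."""
    k = N // n  # interval size k = N/n (assumption: rounded down if not a whole number)
    if start is None:
        start = random.Random(seed).randint(1, k)  # random start, 1 <= start <= k
    if not 1 <= start <= k:
        raise ValueError("start must be between 1 and k")
    return list(range(start, start + n * k, k))  # start, start + k, start + 2k, ...

# Worked example from the slide: N = 200, n = 40, k = 5, start = 4
units = systematic_sample(200, 40, start=4)
print(units[:5], units[-1])  # [4, 9, 14, 19, 24] 199
```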

Probability Sampling Methods: Cluster
- Cluster sampling:
  - Divide the population into clusters and randomly sample clusters.
  - Measure all units within the sampled clusters.
  - Example: see the blue areas on the map. Clusters need not be geographic areas; you could select hospitals, schools, etc.
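The sketch below shows cluster sampling with hospitals as the clusters. It randomly selects whole clusters and then keeps every unit inside them. The hospital names and patient lists are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select n_clusters whole clusters, then keep every unit in them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # sort names so a seed is reproducible
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical clusters: hospitals, each with its list of eligible patients
hospitals = {
    "Hospital 1": ["p1", "p2", "p3"],
    "Hospital 2": ["p4", "p5"],
    "Hospital 3": ["p6", "p7", "p8", "p9"],
    "Hospital 4": ["p10"],
}
sampled = cluster_sample(hospitals, n_clusters=2, seed=3)
units = [p for members in sampled.values() for p in members]  # all units in chosen clusters
print(sorted(sampled), len(units))
```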


Non-Probability Sampling Methods
There are different types of non-probability sampling methods as well:
- Convenience (not representative of the population).
- Purposive (a certain group in mind).
- Expert sampling (seek out specific expertise).
- Snowball sampling (ask people to participate; they ask more people).

If you select non-probability sampling methods, the conclusions drawn from the study results apply only to that specific population.

Measurement: Levels of Data
- The level of data dictates which statistical test you should use.
- Categorical: data classified into categories that cannot be arranged in any particular order (e.g., apples and pears, gender, eye colour, ethnicity).
- Ordinal: data that are ordered, but the distance between intervals is not necessarily equal (e.g., low, middle and high income).
- Continuous: equal distance between each interval (e.g., 1, 2, 3; age).

Statistics and Statistical Test Selection

Types of Statistics
- Descriptive statistics: describe research findings (e.g., frequencies, averages).
- Inferential statistics: make inferences about the population based on a random sample.
  - In a random sample, each person/unit has an equal chance of being selected.
  - This allows generalizability to the population.

Types of Variables
- Variables can be classified as independent or dependent.
- An independent variable is the variable you believe will influence your outcome measure.
- A dependent variable is the variable that depends on, or is influenced by, the independent variable(s). The dependent variable can also be the variable you are trying to predict.

Statistical Test Selection
Selecting the appropriate statistical test requires several steps. Base your selection on:
1) What is your goal? Description? Comparison? Prediction? Quantifying association? Proving effectiveness? Proving causality?
2) What kind of data have you collected? What are the levels of data (nominal, ordinal, continuous)? Was your sample randomly selected?
3) Are your data normally distributed? Should you use a parametric or non-parametric test?
4) What are the assumptions of the statistical test you would like to use? Do the data meet these assumptions?

Parametric Tests
- Parametric tests assume that the variable in question comes from a normal distribution.
- Non-parametric tests do not require the assumption of normality.
- Most non-parametric tests do not require an interval level of measurement; they can be used with nominal/ordinal level data.
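Many non-parametric tests work with ranks rather than raw values, which is why they tolerate non-normal data and extreme values. As an illustration, here is a minimal sketch of the Mann-Whitney U statistic, the rank-based test for two unpaired groups in the test-selection table. The sketch computes only U. A p-value still requires tables or statistical software, and the function names are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..N, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend the run of tied values
        avg = (i + j) / 2 + 1  # positions i..j are 0-based; ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def mann_whitney_u(group_a, group_b):
    """Smaller of the two U statistics for two independent groups."""
    n_a, n_b = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    r_a = sum(ranks[:n_a])  # rank sum of group A
    u_a = r_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return min(u_a, u_b)

# Replacing 6 with an extreme 600 leaves U unchanged: only the ordering matters.
print(mann_whitney_u([1, 2, 3], [4, 5, 6]), mann_whitney_u([1, 2, 3], [4, 5, 600]))  # 0.0 0.0
```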

Assumptions
- There are various assumptions for each test. Before you select a test, be sure to check its assumptions.
- To find this information, contact a consultant or review statistical/research methods resources.
- Some examples of common assumptions are:
  - The dependent variable needs to be measured at a certain level (e.g., interval level).
  - The independent variable(s) need to be measured at a certain level (e.g., ordinal level).
  - The population is normally distributed (not skewed).
- If your data do not meet the assumptions for a specific test, you may be able to use a non-parametric test instead.

Type of Data
Goal | Measurement (Normal Population) | Ordinal, or Non-Normal Population | Binomial (Two Possible Outcomes) | Survival Time
Describe one group | Mean, SD | Median, interquartile range | Proportion | Kaplan Meier survival curve
Compare one group to a hypothetical value | One-sample t test | Wilcoxon test | Chi-square or Binomial test** | -
Compare two unpaired groups | Unpaired t test | Mann-Whitney test | Fisher's test (chi-square for large samples) | Log-rank test or Mantel-Haenszel*
Compare two paired groups | Paired t test | Wilcoxon test | McNemar's test | Conditional proportional hazards regression*
Compare three or more unmatched groups | One-way ANOVA | Kruskal-Wallis test | Chi-square test | Cox proportional hazard regression**
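As a study aid for the group exercise, the goal-by-data-type table can be encoded as a simple lookup. The entries are transcribed from the table, with the footnote asterisks dropped. The names `COLUMNS`, `TABLE` and `choose_test` are hypothetical. This sketch does not replace checking each test's assumptions.

```python
# Column order follows the table: normal, ordinal/non-normal, binomial, survival time.
COLUMNS = ("normal", "non-normal", "binomial", "survival")
TABLE = {
    "describe one group": ("Mean, SD", "Median, interquartile range",
                           "Proportion", "Kaplan Meier survival curve"),
    "compare one group to a hypothetical value": ("One-sample t test", "Wilcoxon test",
                                                  "Chi-square or Binomial test", None),
    "compare two unpaired groups": ("Unpaired t test", "Mann-Whitney test",
                                    "Fisher's test (chi-square for large samples)",
                                    "Log-rank test or Mantel-Haenszel"),
    "compare two paired groups": ("Paired t test", "Wilcoxon test", "McNemar's test",
                                  "Conditional proportional hazards regression"),
    "compare three or more unmatched groups": ("One-way ANOVA", "Kruskal-Wallis test",
                                               "Chi-square test",
                                               "Cox proportional hazard regression"),
}

def choose_test(goal, data_type):
    """Look up the table entry for a goal and a type of data."""
    entry = TABLE[goal.lower()][COLUMNS.index(data_type)]
    if entry is None:  # the table leaves this cell empty
        raise LookupError(f"No entry for {goal!r} with {data_type} data")
    return entry

print(choose_test("Compare two paired groups", "normal"))
```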

Statistical Test Selection Group Exercise
- Using your tables, select the appropriate statistical tests for 10 research scenarios.

During the group exercise…
Steps to choose the appropriate statistical method for the data analysis:
1. Identify whether the research problem raises a question of description, relationship (association), or comparison (difference).
2. Identify the levels of measurement in the research question (nominal/categorical, ordinal/rank, continuous/evenly spaced).
3. Identify the number of variables or samples being described, related, or compared.
4. Identify whether the comparison samples are related (e.g., the same group analyzed before and after, or matched pairs) or independent (unrelated, e.g., different groups).
5. Choose the appropriate statistical tool for the data and situation using the decision tree in the handout.



1. A pilot experiment designed to test the effectiveness of a new approach to electrode placement for electroconvulsive therapy (ECT) has been conducted over a one-year period in the Fraser Health Authority.
Patients from two different mood disorder clinics participated in this study. Patients from Clinic X received ECT according to current practice guidelines. Patients from Clinic Y received a new exploratory ECT treatment. Patients in each clinic were matched for age, gender, and type of disorder, and a random sample of 30 matched pairs of patients was selected for inclusion in the study. At the end of one year, patients were administered a memory test yielding a total score out of 100.
Dr. Vasdil would like to know which statistical procedure to use to test for differences between the groups of patients on the memory test.
- What is the question: Compare
- How many samples: 2
- Related or independent: Related (patients were matched in pairs)
- What is the level of measurement: Continuous
- How many dependent variables: 1
- Test: Paired t-test

Sample Size
There are several rules of thumb for determining sample size:
1) It’s a good idea to have a minimum of 30 cases (as a total group or, if comparing groups, 30 for each group). If you have fewer, you can use a non-parametric test, but it is still better to have close to 30 cases.
2) If using regression, it is best to have between 10 and 50 cases per independent variable.
3) If you are validating a survey, never have more questions than cases.
4) If the total population you are examining is smaller than 30, use all of them.
5) For pilot studies, the recommendation is a sample size of 12 per group.
6) For surveys, a sample size of 400 per group is enough for most purposes; for a proportion, it gives a margin of error of about ±5 percentage points at 95% confidence.
7) For surveys, a 30% response rate is the bare minimum.

Note: For a precise sample size estimate, you will need to conduct a power analysis.
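You can check the 400-per-group survey rule of thumb with the usual normal-approximation margin of error for a proportion. This sketch makes three assumptions: simple random sampling, no finite-population correction, and the worst case p = 0.5. `margin_of_error` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(n, p=0.5, confidence=0.95):
    """Half-width of a normal-approximation confidence interval for a proportion.
    p = 0.5 gives the widest (most conservative) interval."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    return z * sqrt(p * (1 - p) / n)

for n in (100, 400, 1000):
    print(n, round(margin_of_error(n), 3))
```

At n = 400 the half-width is about 0.049, roughly ±5 percentage points. Quadrupling n only halves the margin.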






Statistical Power
- Power is the probability that a statistical test will correctly detect an effect (i.e., reject a false null hypothesis) if the effect exists.
  - It takes a value between 0 and 1 (often expressed as a percentage).
- Power = 1 - β, where β is the probability of a Type II error.
  - Type II error: failing to reject a null hypothesis that is false (missing a real effect).
  - Type I error: rejecting a null hypothesis that is true (reporting an effect that is not there).
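To see how power and β relate, here is a minimal sketch for a two-sided comparison of two group means. It uses the normal approximation, which is an assumption: t-test-based software gives slightly different values. `d` is the standardized effect size (difference in means divided by the common SD). `approx_power` is a hypothetical name.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample comparison of means
    (normal approximation; ignores rejections in the wrong direction)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the chosen alpha
    return z.cdf(d * sqrt(n_per_group / 2) - z_alpha)

power = approx_power(0.5, 64)
beta = 1 - power  # probability of a Type II error
print(round(power, 2), round(beta, 2))
```

Increasing the sample size or the effect size raises the power, matching the advice that follows.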

Types of Power
- A priori: conducted before the study commences (at the proposal stage).
- Post hoc: conducted after the study has been completed.
- Easy ways to increase power?
  - Increase the sample size.
  - Increase the effect size.

Components Involved in Power Calculation
- Sample size: the number of cases.
- Effect size: the magnitude of the effect relative to its variation.
- Alpha level: the probability of concluding that an effect is present when the observed result is due to chance alone (.05 or .01).
  - Also known as the Type I error rate: the probability of rejecting a true null hypothesis.
- Power level: 80-90% is common.
- One- or two-tailed test: two-tailed is common.
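The components combine in the standard normal-approximation formula for the sample size per group when comparing two means: n = 2((z_alpha + z_beta) / d)^2. This is a sketch only. It assumes equal group sizes and equal variances, and t-test-based programs usually return slightly larger numbers. `n_per_group` is a hypothetical name.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80, two_tailed=True):
    """Approximate cases needed per group to detect standardized effect d."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2) if two_tailed else z.inv_cdf(1 - alpha)
    z_beta = z.inv_cdf(power)  # power = 1 - beta
    return ceil(2 * ((z_alpha + z_beta) / d) ** 2)  # round up to whole cases

for d in (0.2, 0.5, 0.8):
    print(d, n_per_group(d))
```

With d = 0.5, alpha = .05 (two-tailed) and 80% power, the sketch gives 63 per group.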

Components Involved in Power Calculation
- Sample size: what we want to find out.
- Effect size: the magnitude of the effect... but what if you don’t know it?
  - Look to pilot data or the literature.
  - Keep in mind: the smaller the effect size, the larger the sample size required.
- Alpha level: .05
- Power level: 80-90%
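If you have pilot data, one common way to estimate the effect size is Cohen's d with a pooled standard deviation. Here is a minimal sketch; the pilot scores are hypothetical. Estimates from small pilots are imprecise, so treat the result as a rough planning value.

```python
from math import sqrt
from statistics import mean, stdev

def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled sample SD."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * stdev(group_a) ** 2 +
                  (n_b - 1) * stdev(group_b) ** 2) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)

# Hypothetical pilot memory-test scores (out of 100)
pilot_new = [72, 68, 75, 80, 71, 77]
pilot_usual = [65, 70, 62, 68, 66, 64]
print(round(cohens_d(pilot_new, pilot_usual), 2))
```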

Important Consultation Information
- What is your research question?
- Components of the power calculation
- Levels of data (nominal, ordinal, continuous)
- Sampling plan

Data Organization: Codebook
- What is a codebook?
  - A codebook is a log of your variables (and their levels of data) and how you will code them.
  - A codebook helps everyone understand the coding schemes, ensuring that everyone is on the same page.

Data Processing and Analyses: Codebook Example
Variable Name | Variable Label | Values | Coding | Missing | Variable Type
age | age | 1,2,3,4,5 | 1=10-20 years; 2=21-30 years; 3=31-40 years; 4=41-50 years; 5=51+ years | 97=Incorrect response; 98=No response; 99=Not applicable | Ordinal
sex | sex | 1,2 | 1=male; 2=female | 97=Incorrect response; 98=No response; 99=Not applicable | Nominal
happiness | happiness at work | 1,2,3 | 1=not happy; 2=somewhat happy; 3=very happy | 97=Incorrect response; 98=No response; 99=Not applicable | Ordinal
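A codebook also translates directly into code, so every analyst decodes values the same way. The sketch below transcribes the example codebook and treats codes 97-99 as missing. The structure and the `decode` helper are hypothetical; SPSS and similar tools have their own value-label and missing-value features.

```python
# Missing-data codes shared by every variable in the example codebook
MISSING = {97: "Incorrect response", 98: "No response", 99: "Not applicable"}

CODEBOOK = {
    "age": {1: "10-20 years", 2: "21-30 years", 3: "31-40 years",
            4: "41-50 years", 5: "51+ years"},                      # ordinal
    "sex": {1: "male", 2: "female"},                                # nominal
    "happiness": {1: "not happy", 2: "somewhat happy", 3: "very happy"},  # ordinal
}

def decode(variable, code):
    """Translate a stored code into its label; missing-data codes become None."""
    if code in MISSING:
        return None
    labels = CODEBOOK[variable]
    if code not in labels:
        raise ValueError(f"{code!r} is not a valid code for {variable!r}")
    return labels[code]

print(decode("happiness", 3), decode("sex", 98))
```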

Spreadsheet Example
ID# | Age | Sex | Happiness
1 | 1 | 1 | 2
2 | 2 | 2 | 2
3 | 3 | 1 | 2
4 | 57 | 2 | 2
5 | 45 | 2 | 3
6 | 66 | 2 | 3
7 | 2 | 2 | 3
8 | 88 | 2 | 3
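You can check a spreadsheet against its codebook before analysis. In this example, the Age column mixes codebook codes (1-3) with values such as 57 and 88. Those values are not among the age codes 1-5, so a check like the one below flags them for review. The rows are transcribed from the spreadsheet; `VALID` and `find_invalid` are hypothetical names.

```python
# Allowed codes per column (from the codebook) plus the shared missing-data codes
VALID = {"Age": {1, 2, 3, 4, 5}, "Sex": {1, 2}, "Happiness": {1, 2, 3}}
MISSING_CODES = {97, 98, 99}

rows = [
    {"ID": 1, "Age": 1, "Sex": 1, "Happiness": 2},
    {"ID": 2, "Age": 2, "Sex": 2, "Happiness": 2},
    {"ID": 3, "Age": 3, "Sex": 1, "Happiness": 2},
    {"ID": 4, "Age": 57, "Sex": 2, "Happiness": 2},
    {"ID": 5, "Age": 45, "Sex": 2, "Happiness": 3},
    {"ID": 6, "Age": 66, "Sex": 2, "Happiness": 3},
    {"ID": 7, "Age": 2, "Sex": 2, "Happiness": 3},
    {"ID": 8, "Age": 88, "Sex": 2, "Happiness": 3},
]

def find_invalid(rows):
    """Return (ID, column, value) for every value the codebook does not define."""
    problems = []
    for row in rows:
        for column, allowed in VALID.items():
            value = row[column]
            if value not in allowed and value not in MISSING_CODES:
                problems.append((row["ID"], column, value))
    return problems

for problem in find_invalid(rows):
    print(problem)
```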

Data Analysis with Excel
- Most simple analyses can be done in Excel, including correlation, regression and even random number generation.
- Install the Analysis ToolPak:
  - Go to Tools > Add-ins and add the Analysis ToolPak (in Excel 2010 and later: File > Options > Add-ins).
- Create your worksheet and codebook.
- Choose your statistical test.
  - Follow the commands in the Help menu.

Data Analysis with Excel
- Resource: http://home.ubalt.edu/ntsbarsh/excel/excel.htm

Data Reporting and Presentation of Data
- Graphical summaries are a great way to present your data.
- Excel is well suited to creating tables and graphs.
- The type of data you have determines the type of graphical summary you should use.

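Data Reporting and Presentation of Descriptive Data
Categorical data: frequency tables and bar charts.
Example: Fruit

Fruit | Count | Percent | Valid Percent
Pineapples | 4 | 20% | 21%
Apples | 5 | 25% | 26%
Oranges | 10 | 50% | 53%
Unknown | 1 | 5% | -
Total | 20 | 100% | 100%

Valid percent excludes the Unknown response, so it is calculated out of the 19 known answers (e.g., 4/19 ≈ 21%).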
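A frequency table with both percent and valid percent takes only a few lines. This sketch rebuilds the fruit example from its counts and treats "Unknown" as the missing category, which is an assumption you would adjust to your own codebook. `frequency_table` is a hypothetical name.

```python
from collections import Counter

def frequency_table(responses, missing=("Unknown",)):
    """Rows of (category, count, percent, valid percent); valid percent excludes missing."""
    counts = Counter(responses)
    total = sum(counts.values())
    valid_total = total - sum(counts[m] for m in missing)
    table = []
    for category, count in counts.most_common():
        percent = 100 * count / total
        valid = None if category in missing else 100 * count / valid_total
        table.append((category, count, percent, valid))
    return table, total

# Counts from the fruit example: 4 pineapples, 5 apples, 10 oranges, 1 unknown
responses = ["Pineapples"] * 4 + ["Apples"] * 5 + ["Oranges"] * 10 + ["Unknown"]
table, total = frequency_table(responses)
for category, count, pct, valid in table:
    valid_text = "-" if valid is None else f"{valid:.0f}%"
    print(f"{category:<10} {count:>5} {pct:>6.0f}% {valid_text:>6}")
print(f"{'Total':<10} {total:>5}")
```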

[Bar chart: Fruit Study, counts (0-10) for Pineapples, Apples, Oranges and Unknown]

Data Reporting and Presentation of Descriptive Data
Continuous data: tables and histograms.

Age | Count | Percent
20-30 | 4 | 20%
31-40 | 5 | 25%
41-50 | 10 | 50%
51-60 | 1 | 5%
Total | 20 | 100%
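To build a table like this from raw ages, you first group the values into contiguous class intervals. This sketch assumes ages are recorded in whole years, so the intervals 20-30, 31-40 and so on leave no gaps. The ages and the `bin_counts` name are hypothetical.

```python
def bin_counts(values, bins):
    """Count values falling in each closed interval (low, high)."""
    counts = {f"{lo}-{hi}": 0 for lo, hi in bins}
    for v in values:
        for lo, hi in bins:
            if lo <= v <= hi:
                counts[f"{lo}-{hi}"] += 1
                break
        else:  # no interval matched this value
            raise ValueError(f"{v} is outside every interval")
    return counts

AGE_BINS = [(20, 30), (31, 40), (41, 50), (51, 60)]  # the slide's intervals, in whole years
ages = [24, 29, 33, 36, 38, 44, 47, 52]  # hypothetical ages
counts = bin_counts(ages, AGE_BINS)
total = sum(counts.values())
for label, count in counts.items():
    print(f"{label}: {count} ({100 * count / total:.0f}%)")
```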

[Histogram: Age, counts (0-10) for age groups 20-30, 31-40, 41-50 and 51-60]

What is the difference between a histogram and a bar chart?
- Histogram: for continuous data, where the data are divided into contiguous class intervals (in other words, intervals connected in an unbroken sequence).
- Bar chart: for categorical data, where the categories are not contiguous.

Measures of Central Tendency
- Reporting averages:
  - Categorical data = mode
  - Ordinal data = median
  - Continuous data = mean
- If there are outliers (or extreme values), report the median instead of the mean.
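Python's `statistics` module provides all three averages. This sketch uses hypothetical data. The one extreme age (90) pulls the mean well above the median, which is why the median is preferred when outliers are present.

```python
from statistics import mean, median, mode

fruit = ["Oranges", "Apples", "Oranges", "Pineapples", "Oranges"]  # categorical
happiness = [1, 2, 2, 3, 3, 3, 2]  # ordinal codes (1=not happy ... 3=very happy)
ages = [30, 32, 35, 36, 38, 90]    # continuous, with one extreme value

print(mode(fruit))                 # Oranges
print(median(happiness))           # 2
print(mean(ages), median(ages))    # 43.5 35.5
```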

Reporting Inferential Stats
- Handout
- Resource: APA Guidelines, http://www.ilstu.edu/~jhkahn/apastats.html

Reporting Inferential Stats
- It’s important to include means, standard deviations and sample sizes in your results section.
- Example (correlation): Variable X was strongly correlated with Variable Y, r = .59, p < .01.
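Small formatting helpers keep reported statistics consistent across a results section. This sketch follows the APA convention of dropping the leading zero for statistics that cannot exceed 1 in absolute value, such as r and p. It also reports exact p-values to three decimals except below .001. Check these conventions against the current APA manual. All function names are hypothetical.

```python
from statistics import mean, stdev

def apa_decimal(value, places=2):
    """Format a statistic bounded by 1 (e.g., r, p) without the leading zero."""
    return f"{value:.{places}f}".replace("0.", ".", 1)

def report_correlation(r, p):
    p_text = "p < .001" if p < 0.001 else f"p = {apa_decimal(p, 3)}"
    return f"r = {apa_decimal(r)}, {p_text}"

def report_mean(values):
    """Mean, SD and sample size, the descriptives a results section needs."""
    return f"M = {mean(values):.2f}, SD = {stdev(values):.2f}, n = {len(values)}"

print(report_correlation(0.59, 0.004))  # r = .59, p = .004
```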

It’s Important to Keep Your Audience in Mind
- Residency project
- Publication
- Departmental report

Aaron: TCPS certification for
residents reminder…

Questions?


Slide 41

Surviving Pharmacy
Residency Research: Tips
and Tricks for Statistical
Planning

Surviving Pharmacy Residency
Research: Tips and Tricks for
Statistical Planning
© Fraser Health Authority, 2007
The Fraser Health Authority (“FH”) authorizes the use, reproduction and/or
modification of this publication for purposes other than commercial redistribution. In
consideration for this authorization, the user agrees that any unmodified
reproduction of this publication shall retain all copyright and proprietary notices. If
the user modifies the content of this publication, all FH copyright notices shall be
removed, however FH shall be acknowledged as the author of the source
publication.
Reproduction or storage of this publication in any form by any means for the purpose
of commercial redistribution is strictly prohibited.
This publication is intended to provide general information only, and should not be
relied on as providing specific healthcare, legal or other professional advice. The
Fraser Health Authority, and every person involved in the creation of this publication,
disclaims any warranty, express or implied, as to its accuracy, completeness or
currency, and disclaims all liability in respect of any actions, including the results of
any actions, taken or not taken in reliance on the information contained herein.

FH Health Research Intelligence
Unit
How can we help?


Grant Facilitator-Writer
 Conducting a search for funding opportunities.
 Automatic notification of new funding sources and
deadlines.
 Identifying a research team.
 Preparing letters of intent.
 Identifying resources required for conducting research.
 Formulating the research budget.
 Writing the grant application in collaboration with
researchers.
 Understanding FH and funding agency requirements
regarding preparation of specific documents.

FH Health Research Intelligence
Unit
How can we help?











Epidemiologist
Specifying the research goal,
objectives and hypothesis.
Identifying measurable
outcomes.
Specifying the variables for
analysis.
Identifying sources of data.
Developing data collection
tools for quantitative or
qualitative studies.
Developing the statistical
analysis plan.
Understanding how to use
statistical software, such as
SPSS.

Workshop Outline










Research 101- Basic Research Steps
Research Question Refinement
Common Study Designs- Resource
Levels of Data
Power and Sample Size
Statistical Test Selection- Exercise
Data Reporting- Resource
Simple Stats with Excel- Resource

Pharmacy Residency Project










1) Develop a research question
2) Conduct thorough literature review
3) Re-define research question or hypothesis
4) Design research methodology/study
5) Create research proposal
6) Apply for funding
7) Apply for ethics approval
8) Collect and analyze data
9) Draw conclusions and relate findings

Research Question Refinement


Research question will describe in operational
terms, what you think will happen in the study.

Good Versus Bad Research
Question


Are patients who take
drug X more likely to
experience episodes
of delirium?



Do patients who
receive medication X
between September
2008 and November
2008 experience
more episodes of
delirium as compared
to patients who
received drug Y
during the same time
period?

Classification of Research
Studies
Observational Studies:
Descriptive Studies:
Focus on describing
populations and describing the
relationship between variables

Research Studies

Analytic Studies:
Observational
Descriptive

Experimental

Analytic

Make inferences about the
population based on a random
sample.
Experimental Studies:
Test relationships between
exposures and outcomes.
Investigator has direct control
over study condition and
exposure status.

Hierarchy of Studies
Type of study is selected
according to the purpose of
research.
Experimental Studies

Analytic Studies

Descriptive Studies

Levels of Evidence


HandoutResearch Design
Hierarchy

Probability Sampling Methods:
Random
 There

are several methods to choose

from:
 Simple random
sampling.

Probability Sampling Methods:
Stratified
 Stratified

sampling

(divide the population into
non-overlapping strata and
sample from within each
stratum independently).


Guarantees representation
of all important groups.

Probability Sampling Methods:
Systematic


Selection of the
sample using an
interval “k” so that
every “k” unit in the
frame is selected,
is called systematic
random sampling.

Probability Sampling Methods:
Systematic


Steps to achieve a systematic random sample:

1. Number the units in the population from 1 to N.
2. Decide on the n (sample size) that you want or need.


k = N/n = the interval size.

3. Randomly select an integer between 1 and k.
4. Then take every kth unit.
Example:
1. N=200
2. n=40, take N/n, 200/40=5 (interval size).
3. Randomly select a number between 1 and 5 (let’s pick 4).
4. Begin with 4, and take every 5th unit.

Probability Sampling Methods:
Cluster
 Cluster

sampling.
 Divide population into clusters and
randomly sample clusters.
 Measure all units within sampled clusters.
 Example: See blue areas on map.
Not just geographic areas,
could select hospitals,
schools etc.


Non-Probability Sampling
Methods


There are different types of non-probability
sampling methods as well:








Convenience (not representative of population).
Purposive (certain group in mind).
Expert sampling (seek out specific expertise).
Snowball sampling (ask people to participate, they
ask more people).

If you select non-probability sampling methods, the
conclusions drawn from the study results apply only
to that specific population.

Measurement: Levels of Data



The level of data will dictate which statistical test you
should use.
Categorical = Data that is classified into categories and
cannot be arranged in any particular order (e.g. Apples
and pears, gender, eye colour, ethnicity).



Ordinal = Data ordered, but distance between intervals
not always equal. (e.g. Low, middle and high income).



Continuous = equal distance between each interval
(e.g. 1,2,3., age).

Statistics and Statistical Test
Selection

Types of Statistics
 Descriptive

Statistics: Describes
 research findings


E.g. Frequencies, averages.

 Inferential

Statistics: Makes inferences
about the population, based on a random
sample.




In a random sample, each person/unit has an
equal chance of being selected
Allows generalizability to population.

Types of Variables


Variables can be classified as independent or
dependent.



An independent variable is the variable you
believe will influence your outcome measure.



A dependent variable is the variable that is
dependant on or influenced by independent
variable(s). The dependent variable can also
be the variable you are trying to predict.

Statistical Test Selection
Selecting the appropriate Statistical test requires
several steps:
Test selection should be based on:
1) What is your goal? Description? Comparison? Prediction? Quantify
association? Prove effectiveness? Prove causality?
2) What kind of data have you collected? What are the levels of data
(Nominal, ordinal, continuous)? Was your sample randomly selected?
3) Is your data normally distributed? Should you use a parametric or nonparametric test?
4) What are the assumptions of the statistical test you would like to
use? Does the data meet these assumptions?

Parametric Tests


Parametric tests assume that the variable in question is
from a normal distribution.



Non-parametric tests do not require the assumption of
normality.



Most non-parametric tests do not require an interval level
of measurement; can be used with nominal/ordinal level
data.

Assumptions







There are various assumptions for each test.
Before you select a test, be sure to check the assumptions of
each test.
You will need to contact a consultant, or review
statistical/research methods resources to find this information.
Some examples of common assumptions are:
 The dependent variable will need to be measured on a
certain level (i.e. Interval level).
 The independent variable(s) will need to be measured on a
certain level (i.e. Ordinal level).
 The population is normally distributed (not skewed).

If your data do not meet the assumptions for a specific test,
you may be able to use a non-parametric test instead.

Type of Data
Goal

Measurement
Normal
Population

Ordinal, or
NonNormal
Population
Median,
interquartil
e range
Wilcoxon test

Binomial
Survival Time
-Two
Possible
Outcomes
Proportion
Kaplan Meier
survival curve

Describe one
group

Mean, SD

Compare one
group to a
hypothetic
al value

One-sample t
test

Compare two
unpaired
groups

Unpaired t test

Mann-Whitney
test

Log-rank test or
Mantel-Haenszel*

Paired t test

Wilcoxon test

Fisher's test
(chisquare
for large
samples)
McNemar's
test

Compare two
paired
groups

Chi-square
test

Cox proportional
hazard
regression**

One-way
Compare
ANOVA
three or
more
unmatched
groups

Kruskal-Wallis
test

Chi-square
or
Binomial
test **

Conditional
proportional
hazards
regression*

Statistical Test Selection Group
Exercise

 Using

your tables, select the
appropriate statistical tests for 10
research scenarios.

During the group exercise…


Steps to choose the appropriate statistical method
for the data analysis:

1. Identify whether the research problem raises the
question of describe, relate (association), or compare
(difference).
2. Identify the levels of measurement in the research
question (Nominal/Categorical, Ordinal/Rank,
Continuous/Evenly spaced).
3. Identify the number of variables, or samples being
described, related, or compared.
4. Identify whether comparison samples are related
(analyze same group before and after) or independent
(not at all related, looking at different groups).
5. Choose the appropriate statistical tool for the data and
situation using the decision tree in the handout.



1. A pilot experiment designed to test the effectiveness of a new
approach to electrode placement for Electro Shock Therapy
(ECT) has been conducted over a one year time period in the
Fraser Health Authority.
Patients from two different mood disorder clinics participated in
this study. Patients from Clinic X received ECT therapy
according to current practice guidelines. Patients from Clinic Y
received a new exploratory ECT treatment. Patients in each
clinic were matched for age, gender, and type of disorder. A
random sample of 30 matched pairs of patients were selected
for inclusion in the study. At end of one year, patients were
administered a memory test yielding a total score out of 100.
Dr. Vasdil would like to know what statistical procedure needs
to be selected to test for differences among groups of patients
on the memory test.
What is the question: Compare
How many samples: 2
Related or independent: Independent
What is the level of measurement: Continuous
How many dependent variables: 1
Test: T-test

Sample Size


There are several rules of thumb for determining
sample size.






1) It’s a good idea to have a minimum of 30 cases (as a total group, or if comparing
groups, 30 for each group).
If you have less you can use a non-parametric test, but it is still better to have close
to 30 cases.
2) If using regression, it is best to have between 10-50 cases per independent
variable.
3) If you are validating a survey, it is never good to have more questions than cases.
4) If the total population that you are examining is less than 30. Use all of them.
5) For pilot studies the recommendation is a sample size of 12 per group
6) For surveys, a sample size of 400 per group can do just about anything.



7) For surveys, a 30% response rate is the bare minimum.



Note: For a precise sample size estimate you will need to conduct a power analysis.






Statistical Power


Power is the capability of a statistical test to
correctly detect a significant effect if it exists.
 Assumes value between 0 and 1 (%)



Power= 1-B (B= probability of a Type II error).
Type II error – the error of not rejecting a false
research finding.
 Type I error- the error of rejecting a correct
research finding.

Types of Power
 A Priori-

Conducted before study
commences (at proposal stage).
 Post Hoc- After study has been completed.
 Easy




way to increase power?

Increase sample size
Increase Effect size

Components Involved in Power
Calculation



Sample Size- Number of cases.
Effect Size –Magnitude of the trend and
variation.
 Alpha Level- Odds of concluding that the
presence of an effect is due to chance alone (.05
or .01).





Also known as Type I Error, or the error of rejecting a
correct research finding

Power level- 80-90% common
One or two-tailed test- two tailed is common.

Components Involved in Power
Calculation
 Sample

Size- What we want to find out.
 Effect Size –Magnitude of the trend…but
what if you don’t know?



Look to pilot data or literature.
Keep in mind, the smaller the effect size, the
larger the sample size required.

 Alpha

Level- .05
 Power level- 80-90%

Important Consultation Information
 What

is your research question?
 Components of power calculation
 Levels of data (nominal, ordinal,
continuous)
 Sampling plan

Data Organization: Codebook
 What

is a codebook?
 A codebook is a log of your variables
(and levels of data) and how you will
code them.
 A codebook will help everyone
understand the coding schemes to
ensure that they are on the same page!

Data Processing and Analyses:
Codebook Example
Variable
Name

Variable
Label

Values

Coding

Missing

Variable
Type

age

age

1,2,3,4,5

1=10-20 years
2=21-30 years
3=31-40 years
4=41-50 years
5=51+ years

97=Incorrect
response
98=No response
99=Not
Applicable

Ordinal

sex

sex

1,2

1=male, 2=female

97=Incorrect
response
98=No response
99=Not
Applicable

Nominal

happiness

happiness
at
work

1,2,3

1=not happy
2=somewhat happy
3=very happy

97=Incorrect
response
98=No response
99=Not
Applicable

Ordinal

Slide 42

Surviving Pharmacy Residency Research: Tips and Tricks for Statistical Planning


FH Health Research Intelligence Unit: How can we help?


Grant Facilitator-Writer
• Conducting a search for funding opportunities.
• Automatic notification of new funding sources and deadlines.
• Identifying a research team.
• Preparing letters of intent.
• Identifying resources required for conducting research.
• Formulating the research budget.
• Writing the grant application in collaboration with researchers.
• Understanding FH and funding agency requirements regarding preparation of specific documents.

FH Health Research Intelligence Unit: How can we help?

Epidemiologist
• Specifying the research goal, objectives and hypothesis.
• Identifying measurable outcomes.
• Specifying the variables for analysis.
• Identifying sources of data.
• Developing data collection tools for quantitative or qualitative studies.
• Developing the statistical analysis plan.
• Understanding how to use statistical software, such as SPSS.

Workshop Outline
• Research 101: Basic Research Steps
• Research Question Refinement
• Common Study Designs (Resource)
• Levels of Data
• Power and Sample Size
• Statistical Test Selection (Exercise)
• Data Reporting (Resource)
• Simple Stats with Excel (Resource)

Pharmacy Residency Project
1) Develop a research question.
2) Conduct a thorough literature review.
3) Refine the research question or hypothesis.
4) Design the research methodology/study.
5) Create a research proposal.
6) Apply for funding.
7) Apply for ethics approval.
8) Collect and analyze data.
9) Draw conclusions and relate findings.

Research Question Refinement
The research question should describe, in operational terms, what you think will happen in the study.

Good Versus Bad Research Question
• Bad (too vague): Are patients who take drug X more likely to experience episodes of delirium?
• Good (operational): Do patients who received drug X between September 2008 and November 2008 experience more episodes of delirium than patients who received drug Y during the same time period?

Classification of Research Studies
Research studies are either observational or experimental.
• Observational studies
  - Descriptive studies: focus on describing populations and describing the relationship between variables.
  - Analytic studies: make inferences about the population based on a random sample.
• Experimental studies: test relationships between exposures and outcomes. The investigator has direct control over the study conditions and exposure status.

Hierarchy of Studies
The type of study is selected according to the purpose of the research. Levels of evidence, from highest to lowest:
1. Experimental Studies
2. Analytic Studies
3. Descriptive Studies
Handout: Research Design Hierarchy

Probability Sampling Methods: Random
There are several methods to choose from:
• Simple random sampling.
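A minimal Python sketch of simple random sampling, assuming the sampling frame is a list of patient IDs. The function name `simple_random_sample` and the ID range are hypothetical. Python's `random.sample` draws without replacement, so every unit in the frame has the same chance of being selected.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from the sampling frame, each with equal probability."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible and auditable
    return rng.sample(frame, n)  # sampling without replacement

# Hypothetical sampling frame: patient IDs 1 to 200.
frame = list(range(1, 201))
sample = simple_random_sample(frame, 40, seed=2008)
print(sorted(sample))
```

Recording the seed in the study log lets someone else regenerate exactly the same sample later.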

Probability Sampling Methods: Stratified
• Stratified sampling: divide the population into non-overlapping strata and sample from within each stratum independently.
• Guarantees representation of all important groups.
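A minimal sketch of stratified sampling in Python. It assumes equal allocation, meaning the same number of units is drawn from every stratum. Proportional allocation, where each stratum's share of the sample matches its share of the population, is another common choice. The names `stratified_sample` and `stratum_of` and the three-site frame are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, n_per_stratum, seed=None):
    """Group units into non-overlapping strata, then sample each stratum independently."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)  # each unit lands in exactly one stratum
    sample = []
    for name in sorted(strata):  # fixed order keeps results reproducible
        members = strata[name]
        sample.extend(rng.sample(members, min(n_per_stratum, len(members))))
    return sample

# Hypothetical frame: (patient ID, site) pairs across three sites of very different size.
patients = (
    [(i, "Site A") for i in range(1, 101)]
    + [(i, "Site B") for i in range(101, 161)]
    + [(i, "Site C") for i in range(161, 171)]
)
sample = stratified_sample(patients, stratum_of=lambda p: p[1], n_per_stratum=5, seed=1)
print(sample)
```

Even the small Site C contributes five patients. A simple random sample of 15 from all 170 patients could easily miss it altogether.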

Probability Sampling Methods: Systematic
Selecting the sample using an interval k, so that every kth unit in the sampling frame is selected, is called systematic random sampling.

Probability Sampling Methods: Systematic
Steps to achieve a systematic random sample:
1. Number the units in the population from 1 to N.
2. Decide on the n (sample size) that you want or need. The interval size is k = N/n.
3. Randomly select an integer between 1 and k.
4. Then take every kth unit.
Example:
1. N = 200.
2. n = 40, so k = N/n = 200/40 = 5 (interval size).
3. Randomly select a number between 1 and 5 (let's pick 4).
4. Begin with unit 4, and take every 5th unit (4, 9, 14, ..., 199).
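The four steps above map directly onto a short Python function. This is a minimal sketch that assumes N is a multiple of n, so k = N/n is a whole number. The function name `systematic_sample` is hypothetical.

```python
import random

def systematic_sample(N, n, start=None, seed=None):
    """Return the unit numbers (1..N) chosen by systematic random sampling."""
    k = N // n  # interval size k = N/n (this sketch assumes N is a multiple of n)
    if start is None:
        start = random.Random(seed).randint(1, k)  # random start between 1 and k, inclusive
    return list(range(start, N + 1, k))[:n]  # every kth unit from the start

# The worked example: N = 200, n = 40, random start = 4.
units = systematic_sample(200, 40, start=4)
print(units[:5], "...", units[-1])  # [4, 9, 14, 19, 24] ... 199
```

Leave `start` unset to let the function draw the random start itself. Pass it explicitly, as above, to reproduce a start that was already chosen.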

Probability Sampling Methods: Cluster
• Cluster sampling.
• Divide the population into clusters and randomly sample clusters.
• Measure all units within the sampled clusters.
• Example: see blue areas on map. Clusters are not just geographic areas; you could select hospitals, schools, etc.
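In cluster sampling, the random draw is made over whole clusters rather than individual units. A minimal sketch, assuming each cluster (here a hospital) comes with a list of its units. The hospital names, patient IDs and the function name `cluster_sample` are all hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select whole clusters, then keep every unit inside the chosen clusters."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # the clusters are the sampling units
    return {name: list(clusters[name]) for name in chosen}  # measure all units in each

# Hypothetical clusters: hospitals, each listing its (made-up) patient IDs.
hospitals = {
    "Hospital 1": [101, 102, 103],
    "Hospital 2": [201, 202],
    "Hospital 3": [301, 302, 303, 304],
    "Hospital 4": [401],
}
picked = cluster_sample(hospitals, n_clusters=2, seed=3)
print(picked)
```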


Non-Probability Sampling Methods
There are different types of non-probability sampling methods as well:
• Convenience (not representative of the population).
• Purposive (a certain group in mind).
• Expert sampling (seek out specific expertise).
• Snowball sampling (ask people to participate; they ask more people).
If you select non-probability sampling methods, the conclusions drawn from the study results apply only to that specific population.

Measurement: Levels of Data
The level of data will dictate which statistical test you should use.
• Categorical = Data that are classified into categories and cannot be arranged in any particular order (e.g. apples and pears, gender, eye colour, ethnicity).
• Ordinal = Data are ordered, but the distance between intervals is not always equal (e.g. low, middle and high income).
• Continuous = Equal distance between each interval (e.g. 1, 2, 3; age).

Statistics and Statistical Test Selection

Types of Statistics
• Descriptive statistics: describe research findings (e.g. frequencies, averages).
• Inferential statistics: make inferences about the population based on a random sample.
  - In a random sample, each person/unit has an equal chance of being selected.
  - This allows generalizability to the population.

Types of Variables
Variables can be classified as independent or dependent.
• An independent variable is the variable you believe will influence your outcome measure.
• A dependent variable is the variable that is dependent on, or influenced by, the independent variable(s). The dependent variable can also be the variable you are trying to predict.

Statistical Test Selection
Selecting the appropriate statistical test requires several steps. Test selection should be based on:
1) What is your goal? Description? Comparison? Prediction? Quantifying an association? Proving effectiveness? Proving causality?
2) What kind of data have you collected? What are the levels of data (nominal, ordinal, continuous)? Was your sample randomly selected?
3) Are your data normally distributed? Should you use a parametric or non-parametric test?
4) What are the assumptions of the statistical test you would like to use? Do the data meet these assumptions?

Parametric Tests
• Parametric tests assume that the variable in question comes from a normal distribution.
• Non-parametric tests do not require the assumption of normality.
• Most non-parametric tests do not require an interval level of measurement; they can be used with nominal/ordinal data.
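Many non-parametric tests work on ranks instead of raw values. That is why they do not need normality or interval-level data. A minimal sketch that computes the Mann-Whitney U statistic (the non-parametric test for two unpaired groups), giving tied values the average of their ranks. It stops at the statistic. Converting U to a p-value is left to statistical software or tables. The function names are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..N; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend the run of tied values
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1  # mean of 1-based positions i+1..j+1
        i = j + 1
    return ranks

def mann_whitney_u(group_a, group_b):
    """Return the smaller of the two U statistics for two independent groups."""
    ranks = average_ranks(list(group_a) + list(group_b))
    n_a, n_b = len(group_a), len(group_b)
    u_a = sum(ranks[:n_a]) - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return min(u_a, u_b)

# Ordinal codes (e.g. 1 = not happy ... 3 = very happy) work because only their order matters.
print(mann_whitney_u([1, 2, 2], [2, 3, 3]))  # 1.0
```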

Assumptions
There are various assumptions for each test. Before you select a test, be sure to check its assumptions. To find this information, contact a consultant or review statistical/research methods resources.
Some examples of common assumptions are:
• The dependent variable must be measured at a certain level (e.g. interval level).
• The independent variable(s) must be measured at a certain level (e.g. ordinal level).
• The population is normally distributed (not skewed).
If your data do not meet the assumptions for a specific test, you may be able to use a non-parametric test instead.

Type of Data

Goal | Measurement (Normal Population) | Ordinal, or Non-Normal Population | Binomial (Two Possible Outcomes) | Survival Time
--- | --- | --- | --- | ---
Describe one group | Mean, SD | Median, interquartile range | Proportion | Kaplan Meier survival curve
Compare one group to a hypothetical value | One-sample t test | Wilcoxon test | Chi-square or Binomial test ** |
Compare two unpaired groups | Unpaired t test | Mann-Whitney test | Fisher's test (chi-square for large samples) | Log-rank test or Mantel-Haenszel*
Compare two paired groups | Paired t test | Wilcoxon test | McNemar's test | Conditional proportional hazards regression*
Compare three or more unmatched groups | One-way ANOVA | Kruskal-Wallis test | Chi-square test | Cox proportional hazard regression**
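The unpaired and paired t tests from the Normal-population column can be sketched with Python's standard `statistics` module. This is a minimal sketch that assumes approximately normal data and, for the unpaired test, equal variances (pooled variance). It returns only the t statistic and degrees of freedom. The p-value would come from a t table, a statistics package, or Excel's `TTEST` worksheet function. The function names and sample values are hypothetical.

```python
import math
import statistics

def unpaired_t(group_a, group_b):
    """Student's unpaired t statistic (pooled variance) and its degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * statistics.variance(group_a)
                  + (n_b - 1) * statistics.variance(group_b)) / (n_a + n_b - 2)
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))  # standard error of the difference
    return (statistics.mean(group_a) - statistics.mean(group_b)) / se, n_a + n_b - 2

def paired_t(before, after):
    """Paired t statistic: a one-sample t test on the within-pair differences."""
    diffs = [b - a for a, b in zip(before, after)]
    n = len(diffs)
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n)), n - 1

print(unpaired_t([1, 2, 3], [4, 5, 6]))
print(paired_t([1, 2, 3], [2, 4, 5]))
```

The paired version analyzes the differences within each pair. Matching removes between-pair variation, which is why a paired design can need fewer subjects than an unpaired one.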

Statistical Test Selection Group Exercise
• Using your tables, select the appropriate statistical tests for 10 research scenarios.

During the group exercise…
Steps to choose the appropriate statistical method for the data analysis:
1. Identify whether the research problem raises the question of describing, relating (association), or comparing (difference).
2. Identify the levels of measurement in the research question (Nominal/Categorical, Ordinal/Rank, Continuous/Evenly spaced).
3. Identify the number of variables or samples being described, related, or compared.
4. Identify whether comparison samples are related (the same group analyzed before and after) or independent (not at all related; different groups).
5. Choose the appropriate statistical tool for the data and situation using the decision tree in the handout.
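Steps 1 to 4 narrow the choice down to a goal and a type of data. Step 5 then becomes a lookup in the "Type of Data" table. Here is a hypothetical Python sketch of that lookup, transcribed from the table with its footnote asterisks dropped. The names `TEST_TABLE` and `choose_method` are assumptions. Treating a continuous outcome as "normal" is also an assumption that should be checked first.

```python
# Hypothetical lookup transcribed from the "Type of Data" table: (goal, data type) -> method.
TEST_TABLE = {
    ("describe one group", "normal"): "Mean, SD",
    ("describe one group", "ordinal/non-normal"): "Median, interquartile range",
    ("describe one group", "binomial"): "Proportion",
    ("describe one group", "survival"): "Kaplan Meier survival curve",
    ("compare one group to a hypothetical value", "normal"): "One-sample t test",
    ("compare one group to a hypothetical value", "ordinal/non-normal"): "Wilcoxon test",
    ("compare one group to a hypothetical value", "binomial"): "Chi-square or Binomial test",
    ("compare two unpaired groups", "normal"): "Unpaired t test",
    ("compare two unpaired groups", "ordinal/non-normal"): "Mann-Whitney test",
    ("compare two unpaired groups", "binomial"): "Fisher's test (chi-square for large samples)",
    ("compare two unpaired groups", "survival"): "Log-rank test or Mantel-Haenszel",
    ("compare two paired groups", "normal"): "Paired t test",
    ("compare two paired groups", "ordinal/non-normal"): "Wilcoxon test",
    ("compare two paired groups", "binomial"): "McNemar's test",
    ("compare two paired groups", "survival"): "Conditional proportional hazards regression",
    ("compare three or more unmatched groups", "normal"): "One-way ANOVA",
    ("compare three or more unmatched groups", "ordinal/non-normal"): "Kruskal-Wallis test",
    ("compare three or more unmatched groups", "binomial"): "Chi-square test",
    ("compare three or more unmatched groups", "survival"): "Cox proportional hazard regression",
}

def choose_method(goal, data_type):
    """Look up the suggested statistic or test; raise if the table has no entry."""
    try:
        return TEST_TABLE[(goal.lower(), data_type.lower())]
    except KeyError:
        raise ValueError(f"No entry for {goal!r} with {data_type!r} data; consult a statistician")

# Two independent groups, continuous outcome assumed to be roughly normal.
print(choose_method("Compare two unpaired groups", "normal"))  # Unpaired t test
```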



1. A pilot experiment designed to test the effectiveness of a new approach to electrode placement for electroconvulsive therapy (ECT) has been conducted over a one-year period in the Fraser Health Authority.
Patients from two different mood disorder clinics participated in this study. Patients from Clinic X received ECT according to current practice guidelines. Patients from Clinic Y received a new exploratory ECT treatment. Patients in each clinic were matched for age, gender, and type of disorder. A random sample of 30 matched pairs of patients was selected for inclusion in the study. At the end of one year, patients were administered a memory test yielding a total score out of 100.
Dr. Vasdil would like to know which statistical procedure should be selected to test for differences between the groups of patients on the memory test.
What is the question: Compare
How many samples: 2
Related or independent: Independent
What is the level of measurement: Continuous
How many dependent variables: 1
Test: t-test

Sample Size
There are several rules of thumb for determining sample size.
1) It's a good idea to have a minimum of 30 cases (as a total group, or, if comparing groups, 30 for each group). If you have fewer, you can use a non-parametric test, but it is still better to have close to 30 cases.
2) If using regression, it is best to have between 10 and 50 cases per independent variable.
3) If you are validating a survey, it is never good to have more questions than cases.
4) If the total population you are examining is smaller than 30, use all of them.
5) For pilot studies, the recommendation is a sample size of 12 per group.
6) For surveys, a sample size of 400 per group can do just about anything.
7) For surveys, a 30% response rate is the bare minimum.
Note: For a precise sample size estimate, you will need to conduct a power analysis.
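Rule 6 can be put in concrete terms with the 95% margin of error for a survey proportion, z × sqrt(p(1 - p)/n). This value is largest when p = 0.5. The sketch below is an illustration only and is not necessarily the reasoning behind the rule of thumb. It assumes simple random sampling and the normal approximation. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def margin_of_error(n, p=0.5, confidence=0.95):
    """Half-width of the normal-approximation confidence interval for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    return z * math.sqrt(p * (1 - p) / n)

for n in (100, 400, 1000):
    print(n, round(margin_of_error(n), 3))
```

With 400 respondents, the worst-case margin of error is about ±4.9 percentage points. Quadrupling n only halves the margin, which is one reason gains level off beyond a few hundred.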






Statistical Power
• Power is the probability that a statistical test will correctly detect an effect if one truly exists.
• It takes a value between 0 and 1 (often expressed as a percentage).
• Power = 1 - β (β = probability of a Type II error).
• Type II error (β): failing to reject a null hypothesis that is false, i.e. missing an effect that really exists.
• Type I error (α): rejecting a null hypothesis that is true, i.e. concluding there is an effect when there is none.

Types of Power
• A priori: conducted before the study commences (at the proposal stage).
• Post hoc: after the study has been completed.
• Easy ways to increase power?
  - Increase the sample size.
  - Increase the effect size.

Components Involved in Power Calculation
• Sample size: number of cases.
• Effect size: magnitude of the trend relative to its variation.
• Alpha level: the probability of concluding that an effect is present when the observed result is due to chance alone (.05 or .01).
  - Also known as the Type I error rate: the probability of rejecting a null hypothesis that is true.
• Power level: 80-90% is common.
• One- or two-tailed test: two-tailed is common.
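For a two-group comparison of means, these components combine in a standard normal-approximation formula: n per group = 2(z_{1-α/2} + z_{1-β})² / d². Here d is the standardized effect size (the difference in means divided by the standard deviation). The sketch below uses that approximation with a two-tailed test. The function names are hypothetical. Dedicated tools such as G*Power use the t distribution and usually return a slightly larger n (about 64 per group for d = 0.5, alpha = .05, power = 80%).

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate cases per group for a two-tailed, two-sample comparison of means."""
    z_alpha = Z.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = .05, two-tailed
    z_beta = Z.inv_cdf(power)           # about 0.84 for 80% power
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)

def approx_power(d, n, alpha=0.05):
    """Approximate power with n cases per group (ignores the negligible opposite tail)."""
    z_alpha = Z.inv_cdf(1 - alpha / 2)
    return Z.cdf(d * math.sqrt(n / 2) - z_alpha)

for d in (0.8, 0.5, 0.2):  # conventionally "large", "medium" and "small" effects
    print(d, n_per_group(d))
```

The output illustrates the earlier point: as the effect size shrinks from 0.8 to 0.2, the required n grows from 25 to 393 per group.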

Components Involved in Power Calculation
• Sample size: what we want to find out.
• Effect size: magnitude of the trend... but what if you don't know?
  - Look to pilot data or the literature.
  - Keep in mind: the smaller the effect size, the larger the sample size required.
• Alpha level: .05
• Power level: 80-90%

Important Consultation Information
• What is your research question?
• Components of the power calculation
• Levels of data (nominal, ordinal, continuous)
• Sampling plan

Data Organization: Codebook
• What is a codebook?
• A codebook is a log of your variables (and levels of data) and how you will code them.
• A codebook will help everyone understand the coding schemes to ensure that they are on the same page!

Data Processing and Analyses: Codebook Example

Variable Name | Variable Label | Values | Coding | Missing | Variable Type
--- | --- | --- | --- | --- | ---
age | age | 1,2,3,4,5 | 1=10-20 years, 2=21-30 years, 3=31-40 years, 4=41-50 years, 5=51+ years | 97=Incorrect response, 98=No response, 99=Not Applicable | Ordinal
sex | sex | 1,2 | 1=male, 2=female | 97=Incorrect response, 98=No response, 99=Not Applicable | Nominal
happiness | happiness at work | 1,2,3 | 1=not happy, 2=somewhat happy, 3=very happy | 97=Incorrect response, 98=No response, 99=Not Applicable | Ordinal

Spreadsheet Example

ID# | Age | Sex | Happiness
--- | --- | --- | ---
1 | 1 | 1 | 2
2 | 2 | 2 | 2
3 | 3 | 1 | 2
4 | 57 | 2 | 2
5 | 45 | 2 | 3
6 | 66 | 2 | 3
7 | 2 | 2 | 3
8 | 88 | 2 | 3
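A codebook can also be used to check data entry. This minimal sketch assumes that the codebook and spreadsheet example are typed in as Python data. `CODEBOOK`, `MISSING_CODES`, `ROWS` and `check_rows` are hypothetical names. The check treats 97-99 as missing-value codes and flags any other value that is not a listed code.

```python
MISSING_CODES = {97: "Incorrect response", 98: "No response", 99: "Not applicable"}

CODEBOOK = {  # variable -> (variable type, valid codes), from the codebook example
    "age": ("Ordinal", {1: "10-20 years", 2: "21-30 years", 3: "31-40 years",
                        4: "41-50 years", 5: "51+ years"}),
    "sex": ("Nominal", {1: "male", 2: "female"}),
    "happiness": ("Ordinal", {1: "not happy", 2: "somewhat happy", 3: "very happy"}),
}

ROWS = [  # (ID#, age, sex, happiness) from the spreadsheet example
    (1, 1, 1, 2), (2, 2, 2, 2), (3, 3, 1, 2), (4, 57, 2, 2),
    (5, 45, 2, 3), (6, 66, 2, 3), (7, 2, 2, 3), (8, 88, 2, 3),
]

def check_rows(rows):
    """Return (ID, variable, value) for every value that is neither a valid nor a missing code."""
    problems = []
    for row_id, *values in rows:
        for variable, value in zip(("age", "sex", "happiness"), values):
            _, codes = CODEBOOK[variable]
            if value not in codes and value not in MISSING_CODES:
                problems.append((row_id, variable, value))
    return problems

print(check_rows(ROWS))
```

On these rows, the check flags the ages 57, 45, 66 and 88. They appear to be ages in years rather than the 1-5 age-group codes the codebook specifies. A shared codebook is meant to catch exactly this kind of mismatch before analysis.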

Data Analysis with Excel
• Most simple analyses can be done using Excel, including correlation, regression and even random number generation.
• Install the Analysis ToolPak.
  - Go to Tools > Add-Ins, and add the 'Analysis ToolPak'.
• Create a worksheet and codebook.
• Choose a statistical test.
  - Follow the commands in the Help menu.

Data Analysis with Excel


http://home.ubalt.edu/ntsbarsh/excel/excel.htm

Data Reporting and Presentation of Data
• Graphical summaries are a great way to present your data.
• Excel is great for creating tables and graphs.
• The type of data you have will determine the type of graphical summary you should use.

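Data Reporting and Presentation of Descriptive Data
• Categorical data: frequency tables and bar charts.
• Example: Fruit

Fruit | Count | Percent | Valid Percent
--- | --- | --- | ---
Pineapples | 4 | 20% | 21%
Apples | 5 | 25% | 26%
Oranges | 10 | 50% | 53%
Unknown | 1 | 5% | -
Total | 20 | 100% | 100%

Valid Percent excludes the Unknown response (e.g. Pineapples: 4/19 = 21%).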
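The Percent and Valid Percent columns differ only in their denominator: all 20 responses versus the 19 valid ones. A minimal sketch that rebuilds the fruit table from raw responses, assuming "Unknown" is the only missing category. The function name `frequency_table` is hypothetical.

```python
from collections import Counter

def frequency_table(responses, missing="Unknown"):
    """Rows of (category, count, percent, valid percent); valid percent excludes missing."""
    counts = Counter(responses)
    total = sum(counts.values())
    valid_total = total - counts.get(missing, 0)
    rows = []
    for category, count in counts.most_common():
        percent = round(100 * count / total)
        valid = None if category == missing else round(100 * count / valid_total)
        rows.append((category, count, percent, valid))
    return rows

# The fruit example: 4 pineapples, 5 apples, 10 oranges, 1 unknown.
fruit = ["Pineapples"] * 4 + ["Apples"] * 5 + ["Oranges"] * 10 + ["Unknown"]
for row in frequency_table(fruit):
    print(row)
```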

Data Reporting and Presentation of Descriptive Data
[Bar chart: Fruit Study. Bars for Pineapples, Apples, Oranges and Unknown; count axis from 0 to 10.]

Data Reporting and Presentation of Descriptive Data
• Continuous data: tables and histograms.

Age | Count | Percent
--- | --- | ---
20-30 | 4 | 20%
31-40 | 5 | 25%
41-50 | 10 | 50%
51-60 | 1 | 5%
Total | 20 | 100%
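Histograms group continuous data into contiguous class intervals, like the age groups in the table. This minimal sketch counts whole-year ages into those same intervals. The ages are hypothetical because the slide's raw data are not given, and the function name `bin_counts` is an assumption.

```python
def bin_counts(ages, bins=((20, 30), (31, 40), (41, 50), (51, 60))):
    """Count whole-year ages into contiguous class intervals (inclusive bounds)."""
    counts = {f"{low}-{high}": 0 for low, high in bins}
    for age in ages:
        for low, high in bins:
            if low <= age <= high:
                counts[f"{low}-{high}"] += 1
                break
        else:  # no interval matched
            raise ValueError(f"age {age} falls outside every interval")
    return counts

# Hypothetical ages, for illustration only.
print(bin_counts([24, 30, 31, 38, 45, 47, 52]))
```

Failing loudly on out-of-range values means every case is counted in exactly one interval and no data are silently dropped.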

Data Reporting and Presentation of Descriptive Data
[Histogram of age: bins 20-30, 31-40, 41-50 and 51-60; count axis from 0 to 10; age axis marked 20 to 50.]

What is the difference between a Histogram and a Bar Chart?
• Histogram: for continuous data, where data are divided into contiguous class intervals (in other words, intervals connected in an unbroken sequence).
• Bar chart: for categorical data, where categories are not contiguous.

Measures of Central Tendency
• Reporting averages:
  - Categorical data = Mode
  - Ordinal data = Median
  - Continuous data = Mean
• If there are outliers (or extreme values), report the median instead of the mean.
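The mode/median/mean rule, and the effect of an outlier, can be shown with Python's `statistics` module. This is a minimal sketch, and `average_for` is a hypothetical name. For ordinal data it uses `median_low` as a design choice: with an even number of values, it returns an actual category instead of a value halfway between two codes, such as 2.5. The ages are made-up data containing one extreme value.

```python
import statistics

def average_for(level, values):
    """Pick the average that matches the level of data."""
    if level == "categorical":
        return statistics.mode(values)        # most frequent category
    if level == "ordinal":
        return statistics.median_low(values)  # always one of the observed codes
    if level == "continuous":
        return statistics.mean(values)
    raise ValueError(f"unknown level of data: {level!r}")

print(average_for("categorical", ["Oranges", "Apples", "Oranges"]))  # Oranges
print(average_for("ordinal", [1, 2, 3, 3]))                         # 2
ages = [40, 42, 45, 47, 300]  # hypothetical data with one extreme value
print(statistics.mean(ages), statistics.median(ages))               # 94.8 45
```

The single extreme value pulls the mean to 94.8, while the median stays at 45. That is why the median is the better summary when outliers are present.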

Reporting Inferential Stats
• Handout resource: APA Guidelines
http://www.ilstu.edu/~jhkahn/apastats.html

Reporting Inferential Stats
• It's important to include means, standard deviations and sample size in your results section.
• Example: Correlation
  - Variable X was strongly correlated with Variable Y, r = .59, p < .01.
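A minimal sketch that computes Pearson's r and formats it like the slide's example, with no leading zero. The p-value is not computed here; it must come from statistical software. The p thresholds only mirror the slide's example. APA style generally favours exact p values (e.g. p = .004) when they are available. The names `pearson_r` and `apa_r` are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def apa_r(r, p):
    """Format r and p in the style of the slide's example (no leading zero)."""
    r_text = f"{r:.2f}".replace("0.", ".", 1)
    if p < .01:
        p_text = "p < .01"
    elif p < .05:
        p_text = "p < .05"
    else:
        p_text = "p = " + f"{p:.2f}".replace("0.", ".", 1)
    return f"r = {r_text}, {p_text}"

print(apa_r(0.5912, 0.004))  # r = .59, p < .01
```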

Important to Keep Your Audience in Mind
• Residency Project
• Publication
• Departmental Report

Aaron: TCPS certification for
residents reminder…

Questions?


Slide 43

Surviving Pharmacy
Residency Research: Tips
and Tricks for Statistical
Planning

Surviving Pharmacy Residency
Research: Tips and Tricks for
Statistical Planning
© Fraser Health Authority, 2007
The Fraser Health Authority (“FH”) authorizes the use, reproduction and/or
modification of this publication for purposes other than commercial redistribution. In
consideration for this authorization, the user agrees that any unmodified
reproduction of this publication shall retain all copyright and proprietary notices. If
the user modifies the content of this publication, all FH copyright notices shall be
removed, however FH shall be acknowledged as the author of the source
publication.
Reproduction or storage of this publication in any form by any means for the purpose
of commercial redistribution is strictly prohibited.
This publication is intended to provide general information only, and should not be
relied on as providing specific healthcare, legal or other professional advice. The
Fraser Health Authority, and every person involved in the creation of this publication,
disclaims any warranty, express or implied, as to its accuracy, completeness or
currency, and disclaims all liability in respect of any actions, including the results of
any actions, taken or not taken in reliance on the information contained herein.

FH Health Research Intelligence
Unit
How can we help?


Grant Facilitator-Writer
 Conducting a search for funding opportunities.
 Automatic notification of new funding sources and
deadlines.
 Identifying a research team.
 Preparing letters of intent.
 Identifying resources required for conducting research.
 Formulating the research budget.
 Writing the grant application in collaboration with
researchers.
 Understanding FH and funding agency requirements
regarding preparation of specific documents.

FH Health Research Intelligence
Unit
How can we help?











Epidemiologist
Specifying the research goal,
objectives and hypothesis.
Identifying measurable
outcomes.
Specifying the variables for
analysis.
Identifying sources of data.
Developing data collection
tools for quantitative or
qualitative studies.
Developing the statistical
analysis plan.
Understanding how to use
statistical software, such as
SPSS.

Workshop Outline










Research 101- Basic Research Steps
Research Question Refinement
Common Study Designs- Resource
Levels of Data
Power and Sample Size
Statistical Test Selection- Exercise
Data Reporting- Resource
Simple Stats with Excel- Resource

Pharmacy Residency Project










1) Develop a research question
2) Conduct thorough literature review
3) Re-define research question or hypothesis
4) Design research methodology/study
5) Create research proposal
6) Apply for funding
7) Apply for ethics approval
8) Collect and analyze data
9) Draw conclusions and relate findings

Research Question Refinement


Research question will describe in operational
terms, what you think will happen in the study.

Good Versus Bad Research
Question


Are patients who take
drug X more likely to
experience episodes
of delirium?



Do patients who
receive medication X
between September
2008 and November
2008 experience
more episodes of
delirium as compared
to patients who
received drug Y
during the same time
period?

Classification of Research
Studies
Observational Studies:
Descriptive Studies:
Focus on describing
populations and describing the
relationship between variables

Research Studies

Analytic Studies:
Observational
Descriptive

Experimental

Analytic

Make inferences about the
population based on a random
sample.
Experimental Studies:
Test relationships between
exposures and outcomes.
Investigator has direct control
over study condition and
exposure status.

Hierarchy of Studies
Type of study is selected
according to the purpose of
research.
Experimental Studies

Analytic Studies

Descriptive Studies

Levels of Evidence


HandoutResearch Design
Hierarchy

Probability Sampling Methods:
Random
 There

are several methods to choose

from:
 Simple random
sampling.

Probability Sampling Methods:
Stratified
 Stratified

sampling

(divide the population into
non-overlapping strata and
sample from within each
stratum independently).


Guarantees representation
of all important groups.

Probability Sampling Methods:
Systematic


Selection of the
sample using an
interval “k” so that
every “k” unit in the
frame is selected,
is called systematic
random sampling.

Probability Sampling Methods:
Systematic


Steps to achieve a systematic random sample:

1. Number the units in the population from 1 to N.
2. Decide on the n (sample size) that you want or need.


k = N/n = the interval size.

3. Randomly select an integer between 1 and k.
4. Then take every kth unit.
Example:
1. N=200
2. n=40, take N/n, 200/40=5 (interval size).
3. Randomly select a number between 1 and 5 (let’s pick 4).
4. Begin with 4, and take every 5th unit.

Probability Sampling Methods:
Cluster
 Cluster

sampling.
 Divide population into clusters and
randomly sample clusters.
 Measure all units within sampled clusters.
 Example: See blue areas on map.
Not just geographic areas,
could select hospitals,
schools etc.


Non-Probability Sampling
Methods


There are different types of non-probability
sampling methods as well:








Convenience (not representative of population).
Purposive (certain group in mind).
Expert sampling (seek out specific expertise).
Snowball sampling (ask people to participate, they
ask more people).

If you select non-probability sampling methods, the
conclusions drawn from the study results apply only
to that specific population.

Measurement: Levels of Data



The level of data will dictate which statistical test you
should use.
Categorical = Data that is classified into categories and
cannot be arranged in any particular order (e.g. Apples
and pears, gender, eye colour, ethnicity).



Ordinal = Data ordered, but distance between intervals
not always equal. (e.g. Low, middle and high income).



Continuous = equal distance between each interval
(e.g. 1,2,3., age).

Statistics and Statistical Test
Selection

Types of Statistics
 Descriptive

Statistics: Describes
 research findings


E.g. Frequencies, averages.

 Inferential

Statistics: Makes inferences
about the population, based on a random
sample.




In a random sample, each person/unit has an
equal chance of being selected
Allows generalizability to population.

Types of Variables


Variables can be classified as independent or
dependent.



An independent variable is the variable you
believe will influence your outcome measure.



A dependent variable is the variable that is
dependant on or influenced by independent
variable(s). The dependent variable can also
be the variable you are trying to predict.

Statistical Test Selection
Selecting the appropriate Statistical test requires
several steps:
Test selection should be based on:
1) What is your goal? Description? Comparison? Prediction? Quantify
association? Prove effectiveness? Prove causality?
2) What kind of data have you collected? What are the levels of data
(Nominal, ordinal, continuous)? Was your sample randomly selected?
3) Is your data normally distributed? Should you use a parametric or nonparametric test?
4) What are the assumptions of the statistical test you would like to
use? Does the data meet these assumptions?

Parametric Tests


Parametric tests assume that the variable in question is
from a normal distribution.



Non-parametric tests do not require the assumption of
normality.



Most non-parametric tests do not require an interval level
of measurement; can be used with nominal/ordinal level
data.

Assumptions







There are various assumptions for each test.
Before you select a test, be sure to check the assumptions of
each test.
You will need to contact a consultant, or review
statistical/research methods resources to find this information.
Some examples of common assumptions are:
 The dependent variable will need to be measured on a
certain level (i.e. Interval level).
 The independent variable(s) will need to be measured on a
certain level (i.e. Ordinal level).
 The population is normally distributed (not skewed).

If your data do not meet the assumptions for a specific test,
you may be able to use a non-parametric test instead.

Type of Data
Goal

Measurement
Normal
Population

Ordinal, or
NonNormal
Population
Median,
interquartil
e range
Wilcoxon test

Binomial
Survival Time
-Two
Possible
Outcomes
Proportion
Kaplan Meier
survival curve

Describe one
group

Mean, SD

Compare one
group to a
hypothetic
al value

One-sample t
test

Compare two
unpaired
groups

Unpaired t test

Mann-Whitney
test

Log-rank test or
Mantel-Haenszel*

Paired t test

Wilcoxon test

Fisher's test
(chisquare
for large
samples)
McNemar's
test

Compare two
paired
groups

Chi-square
test

Cox proportional
hazard
regression**

One-way
Compare
ANOVA
three or
more
unmatched
groups

Kruskal-Wallis
test

Chi-square
or
Binomial
test **

Conditional
proportional
hazards
regression*

Statistical Test Selection Group
Exercise

 Using

your tables, select the
appropriate statistical tests for 10
research scenarios.

During the group exercise…


Steps to choose the appropriate statistical method
for the data analysis:

1. Identify whether the research problem raises the
question of describe, relate (association), or compare
(difference).
2. Identify the levels of measurement in the research
question (Nominal/Categorical, Ordinal/Rank,
Continuous/Evenly spaced).
3. Identify the number of variables, or samples being
described, related, or compared.
4. Identify whether comparison samples are related
(analyze same group before and after) or independent
(not at all related, looking at different groups).
5. Choose the appropriate statistical tool for the data and
situation using the decision tree in the handout.



1. A pilot experiment designed to test the effectiveness of a new
approach to electrode placement for Electro Shock Therapy
(ECT) has been conducted over a one year time period in the
Fraser Health Authority.
Patients from two different mood disorder clinics participated in
this study. Patients from Clinic X received ECT therapy
according to current practice guidelines. Patients from Clinic Y
received a new exploratory ECT treatment. Patients in each
clinic were matched for age, gender, and type of disorder. A
random sample of 30 matched pairs of patients were selected
for inclusion in the study. At end of one year, patients were
administered a memory test yielding a total score out of 100.
Dr. Vasdil would like to know what statistical procedure needs
to be selected to test for differences among groups of patients
on the memory test.
What is the question: Compare
How many samples: 2
Related or independent: Independent
What is the level of measurement: Continuous
How many dependent variables: 1
Test: T-test

Sample Size


There are several rules of thumb for determining
sample size.






1) It’s a good idea to have a minimum of 30 cases (as a total group, or if comparing
groups, 30 for each group).
If you have less you can use a non-parametric test, but it is still better to have close
to 30 cases.
2) If using regression, it is best to have between 10-50 cases per independent
variable.
3) If you are validating a survey, it is never good to have more questions than cases.
4) If the total population that you are examining is less than 30. Use all of them.
5) For pilot studies the recommendation is a sample size of 12 per group
6) For surveys, a sample size of 400 per group can do just about anything.



7) For surveys, a 30% response rate is the bare minimum.



Note: For a precise sample size estimate you will need to conduct a power analysis.






Statistical Power


Power is the capability of a statistical test to
correctly detect a significant effect if it exists.
 Assumes value between 0 and 1 (%)



Power= 1-B (B= probability of a Type II error).
Type II error – the error of not rejecting a false
research finding.
 Type I error- the error of rejecting a correct
research finding.

Types of Power
 A Priori-

Conducted before study
commences (at proposal stage).
 Post Hoc- After study has been completed.
 Easy




way to increase power?

Increase sample size
Increase Effect size

Components Involved in Power
Calculation



Sample Size- Number of cases.
Effect Size –Magnitude of the trend and
variation.
 Alpha Level- Odds of concluding that the
presence of an effect is due to chance alone (.05
or .01).





Also known as Type I Error, or the error of rejecting a
correct research finding

Power level- 80-90% common
One or two-tailed test- two tailed is common.

Components Involved in Power
Calculation
 Sample

Size- What we want to find out.
 Effect Size –Magnitude of the trend…but
what if you don’t know?



Look to pilot data or literature.
Keep in mind, the smaller the effect size, the
larger the sample size required.

 Alpha

Level- .05
 Power level- 80-90%

Important Consultation Information
 What

is your research question?
 Components of power calculation
 Levels of data (nominal, ordinal,
continuous)
 Sampling plan

Data Organization: Codebook
 What

is a codebook?
 A codebook is a log of your variables
(and levels of data) and how you will
code them.
 A codebook will help everyone
understand the coding schemes to
ensure that they are on the same page!

Data Processing and Analyses:
Codebook Example
Variable
Name

Variable
Label

Values

Coding

Missing

Variable
Type

age

age

1,2,3,4,5

1=10-20 years
2=21-30 years
3=31-40 years
4=41-50 years
5=51+ years

97=Incorrect
response
98=No response
99=Not
Applicable

Ordinal

sex

sex

1,2

1=male, 2=female

97=Incorrect
response
98=No response
99=Not
Applicable

Nominal

happiness

happiness
at
work

1,2,3

1=not happy
2=somewhat happy
3=very happy

97=Incorrect
response
98=No response
99=Not
Applicable

Ordinal

Spreadsheet Example
ID#

Age

Sex Happiness

1

1

1

2

2

2

2

2

3

3

1

2

4

57

2

2

5

45

2

3

6

66

2

3

7

2

2

3

8

88

2

3

Data Analysis with Excel
 Most

simple analyses can be done using
Excel, including correlation, regression
and even random number generation.
 Install the data analysis pack.


Go to tools, add-ins, and add the ‘analysis
tool pack’.

 Create

worksheet and codebook.
 Choose statistical test.


Follow commands in help menu.

Data Analysis with Excel


http://home.ubalt.edu/ntsbarsh/excel/excel.htm

Data Reporting and Presentation of
Data
 Graphical

summaries are a great way to
present your data
 Excel is great for creating tables and
graphs
 The type of data you have will reflect the
type of graphical summary you should
use.

Data Reporting and Presentation of
Descriptive Data


Categorical data:
Frequency Tables and
Bar Charts.
 Example: Fruit

Count

Percent

Valid
Percent

Pineapples

4

20%

21%

Apples

5

25%

26%

Oranges

10

50%

53%

Unknown

1

5%

_______

Total

20

100%

100%

Data Reporting and Presentation of
Descriptive Data
Fruit Study

Unknown
Oranges
Apples
Pineapples
0

5

10

Data Reporting and Presentation of Descriptive Data
- Continuous data: tables and histograms.

Age | Count | Percent
20-30 | 4 | 20%
31-40 | 5 | 25%
41-50 | 10 | 50%
51-60 | 1 | 5%
Total | 20 | 100%

Data Reporting and Presentation of Descriptive Data
Figure: histogram of the age data above (age classes 20-30, 31-40, 41-50 and 51-60 on the horizontal axis, count on the vertical axis).

What is the difference between a Histogram and a Bar Chart?
- Histogram: for continuous data, where the data are divided into contiguous class intervals (intervals that follow one another in an unbroken sequence), so adjacent bars touch.
- Bar chart: for categorical data, where the categories are not contiguous, so the bars are drawn with gaps between them.
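Because histogram classes are contiguous, every value falls into exactly one class and no value falls into a gap. The following is a minimal Python sketch of that binning step, using the age classes from the table above (20-30, 31-40, 41-50, 51-60). The function name `histogram_counts` is hypothetical. The ages are made-up values for illustration, not the study data.

```python
from bisect import bisect_left

def histogram_counts(values, lower, upper_bounds):
    """Count values into contiguous classes [lower, b0], (b0, b1], (b1, b2], ..."""
    counts = [0] * len(upper_bounds)
    for v in values:
        if v < lower or v > upper_bounds[-1]:
            raise ValueError(f"{v} is outside {lower}-{upper_bounds[-1]}")
        # bisect_left finds the first class whose upper bound is >= v
        counts[bisect_left(upper_bounds, v)] += 1
    return counts

# Hypothetical ages, for illustration only
ages = [23, 29, 35, 40, 41, 47, 50, 58]
print(histogram_counts(ages, 20, [30, 40, 50, 60]))  # [2, 2, 3, 1]
```

A bar chart needs no such step, because each category is counted on its own.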

Measures of Central Tendency
- Reporting averages:
  - Categorical data = mode
  - Ordinal data = median
  - Continuous data = mean
- If there are outliers (or extreme values), report the median instead of the mean.
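The following sketch uses Python's standard `statistics` module to compute the three averages and show why an outlier favours the median. All values are hypothetical, chosen only for illustration.

```python
import statistics

# Hypothetical data, for illustration only
fruit = ["apple", "orange", "orange", "pineapple", "orange"]  # categorical
happiness = [1, 2, 2, 3, 3, 3, 3]                            # ordinal codes 1-3
length_of_stay = [2, 3, 3, 4, 5, 40]                         # continuous, one outlier (40)

print(statistics.mode(fruit))             # orange
print(statistics.median(happiness))       # 3
print(statistics.mean(length_of_stay))    # 9.5
print(statistics.median(length_of_stay))  # 3.5
```

The single value of 40 pulls the mean up to 9.5, even though five of the six values are 5 or less. The median of 3.5 describes the typical case better.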

Reporting Inferential Stats
- Handout
- Resource: APA guidelines, http://www.ilstu.edu/~jhkahn/apastats.html

Reporting Inferential Stats
- It's important to include means, standard deviations and sample sizes in your results section.
- Example (correlation): Variable X was strongly correlated with Variable Y, r = .59, p < .01.

Important to Keep Your Audience in Mind
- Residency project
- Publication
- Departmental report

Aaron: reminder about TCPS certification for residents…

Questions?


Slide 45

Surviving Pharmacy
Residency Research: Tips
and Tricks for Statistical
Planning

Surviving Pharmacy Residency
Research: Tips and Tricks for
Statistical Planning
© Fraser Health Authority, 2007
The Fraser Health Authority (“FH”) authorizes the use, reproduction and/or
modification of this publication for purposes other than commercial redistribution. In
consideration for this authorization, the user agrees that any unmodified
reproduction of this publication shall retain all copyright and proprietary notices. If
the user modifies the content of this publication, all FH copyright notices shall be
removed, however FH shall be acknowledged as the author of the source
publication.
Reproduction or storage of this publication in any form by any means for the purpose
of commercial redistribution is strictly prohibited.
This publication is intended to provide general information only, and should not be
relied on as providing specific healthcare, legal or other professional advice. The
Fraser Health Authority, and every person involved in the creation of this publication,
disclaims any warranty, express or implied, as to its accuracy, completeness or
currency, and disclaims all liability in respect of any actions, including the results of
any actions, taken or not taken in reliance on the information contained herein.

FH Health Research Intelligence
Unit
How can we help?


Grant Facilitator-Writer
 Conducting a search for funding opportunities.
 Automatic notification of new funding sources and
deadlines.
 Identifying a research team.
 Preparing letters of intent.
 Identifying resources required for conducting research.
 Formulating the research budget.
 Writing the grant application in collaboration with
researchers.
 Understanding FH and funding agency requirements
regarding preparation of specific documents.

FH Health Research Intelligence
Unit
How can we help?











Epidemiologist
Specifying the research goal,
objectives and hypothesis.
Identifying measurable
outcomes.
Specifying the variables for
analysis.
Identifying sources of data.
Developing data collection
tools for quantitative or
qualitative studies.
Developing the statistical
analysis plan.
Understanding how to use
statistical software, such as
SPSS.

Workshop Outline










Research 101- Basic Research Steps
Research Question Refinement
Common Study Designs- Resource
Levels of Data
Power and Sample Size
Statistical Test Selection- Exercise
Data Reporting- Resource
Simple Stats with Excel- Resource

Pharmacy Residency Project










1) Develop a research question
2) Conduct thorough literature review
3) Re-define research question or hypothesis
4) Design research methodology/study
5) Create research proposal
6) Apply for funding
7) Apply for ethics approval
8) Collect and analyze data
9) Draw conclusions and relate findings

Research Question Refinement


Research question will describe in operational
terms, what you think will happen in the study.

Good Versus Bad Research
Question


Are patients who take
drug X more likely to
experience episodes
of delirium?



Do patients who
receive medication X
between September
2008 and November
2008 experience
more episodes of
delirium as compared
to patients who
received drug Y
during the same time
period?

Classification of Research
Studies
Observational Studies:
Descriptive Studies:
Focus on describing
populations and describing the
relationship between variables

Research Studies

Analytic Studies:
Observational
Descriptive

Experimental

Analytic

Make inferences about the
population based on a random
sample.
Experimental Studies:
Test relationships between
exposures and outcomes.
Investigator has direct control
over study condition and
exposure status.

Hierarchy of Studies
Type of study is selected
according to the purpose of
research.
Experimental Studies

Analytic Studies

Descriptive Studies

Levels of Evidence


HandoutResearch Design
Hierarchy

Probability Sampling Methods:
Random
 There

are several methods to choose

from:
 Simple random
sampling.

Probability Sampling Methods:
Stratified
 Stratified

sampling

(divide the population into
non-overlapping strata and
sample from within each
stratum independently).


Guarantees representation
of all important groups.

Probability Sampling Methods:
Systematic


Selection of the
sample using an
interval “k” so that
every “k” unit in the
frame is selected,
is called systematic
random sampling.

Probability Sampling Methods:
Systematic


Steps to achieve a systematic random sample:

1. Number the units in the population from 1 to N.
2. Decide on the n (sample size) that you want or need.


k = N/n = the interval size.

3. Randomly select an integer between 1 and k.
4. Then take every kth unit.
Example:
1. N=200
2. n=40, take N/n, 200/40=5 (interval size).
3. Randomly select a number between 1 and 5 (let’s pick 4).
4. Begin with 4, and take every 5th unit.

Probability Sampling Methods:
Cluster
 Cluster

sampling.
 Divide population into clusters and
randomly sample clusters.
 Measure all units within sampled clusters.
 Example: See blue areas on map.
Not just geographic areas,
could select hospitals,
schools etc.


Non-Probability Sampling
Methods


There are different types of non-probability
sampling methods as well:








Convenience (not representative of population).
Purposive (certain group in mind).
Expert sampling (seek out specific expertise).
Snowball sampling (ask people to participate, they
ask more people).

If you select non-probability sampling methods, the
conclusions drawn from the study results apply only
to that specific population.

Measurement: Levels of Data



The level of data will dictate which statistical test you
should use.
Categorical = Data that is classified into categories and
cannot be arranged in any particular order (e.g. Apples
and pears, gender, eye colour, ethnicity).



Ordinal = Data ordered, but distance between intervals
not always equal. (e.g. Low, middle and high income).



Continuous = equal distance between each interval
(e.g. 1,2,3., age).

Statistics and Statistical Test
Selection

Types of Statistics
 Descriptive

Statistics: Describes
 research findings


E.g. Frequencies, averages.

 Inferential

Statistics: Makes inferences
about the population, based on a random
sample.




In a random sample, each person/unit has an
equal chance of being selected
Allows generalizability to population.

Types of Variables


Variables can be classified as independent or
dependent.



An independent variable is the variable you
believe will influence your outcome measure.



A dependent variable is the variable that is
dependant on or influenced by independent
variable(s). The dependent variable can also
be the variable you are trying to predict.

Statistical Test Selection
Selecting the appropriate Statistical test requires
several steps:
Test selection should be based on:
1) What is your goal? Description? Comparison? Prediction? Quantify
association? Prove effectiveness? Prove causality?
2) What kind of data have you collected? What are the levels of data
(Nominal, ordinal, continuous)? Was your sample randomly selected?
3) Is your data normally distributed? Should you use a parametric or nonparametric test?
4) What are the assumptions of the statistical test you would like to
use? Does the data meet these assumptions?

Parametric Tests


Parametric tests assume that the variable in question is
from a normal distribution.



Non-parametric tests do not require the assumption of
normality.



Most non-parametric tests do not require an interval level
of measurement; can be used with nominal/ordinal level
data.

Assumptions







There are various assumptions for each test.
Before you select a test, be sure to check the assumptions of
each test.
You will need to contact a consultant, or review
statistical/research methods resources to find this information.
Some examples of common assumptions are:
 The dependent variable will need to be measured on a
certain level (i.e. Interval level).
 The independent variable(s) will need to be measured on a
certain level (i.e. Ordinal level).
 The population is normally distributed (not skewed).

If your data do not meet the assumptions for a specific test,
you may be able to use a non-parametric test instead.

Type of Data
Goal

Measurement
Normal
Population

Ordinal, or
NonNormal
Population
Median,
interquartil
e range
Wilcoxon test

Binomial
Survival Time
-Two
Possible
Outcomes
Proportion
Kaplan Meier
survival curve

Describe one
group

Mean, SD

Compare one
group to a
hypothetic
al value

One-sample t
test

Compare two
unpaired
groups

Unpaired t test

Mann-Whitney
test

Log-rank test or
Mantel-Haenszel*

Paired t test

Wilcoxon test

Fisher's test
(chisquare
for large
samples)
McNemar's
test

Compare two
paired
groups

Chi-square
test

Cox proportional
hazard
regression**

One-way
Compare
ANOVA
three or
more
unmatched
groups

Kruskal-Wallis
test

Chi-square
or
Binomial
test **

Conditional
proportional
hazards
regression*

Statistical Test Selection Group
Exercise

 Using

your tables, select the
appropriate statistical tests for 10
research scenarios.

During the group exercise…


Steps to choose the appropriate statistical method
for the data analysis:

1. Identify whether the research problem raises the
question of describe, relate (association), or compare
(difference).
2. Identify the levels of measurement in the research
question (Nominal/Categorical, Ordinal/Rank,
Continuous/Evenly spaced).
3. Identify the number of variables, or samples being
described, related, or compared.
4. Identify whether comparison samples are related
(analyze same group before and after) or independent
(not at all related, looking at different groups).
5. Choose the appropriate statistical tool for the data and
situation using the decision tree in the handout.



1. A pilot experiment designed to test the effectiveness of a new
approach to electrode placement for Electro Shock Therapy
(ECT) has been conducted over a one year time period in the
Fraser Health Authority.
Patients from two different mood disorder clinics participated in
this study. Patients from Clinic X received ECT therapy
according to current practice guidelines. Patients from Clinic Y
received a new exploratory ECT treatment. Patients in each
clinic were matched for age, gender, and type of disorder. A
random sample of 30 matched pairs of patients were selected
for inclusion in the study. At end of one year, patients were
administered a memory test yielding a total score out of 100.
Dr. Vasdil would like to know what statistical procedure needs
to be selected to test for differences among groups of patients
on the memory test.
What is the question: Compare
How many samples: 2
Related or independent: Independent
What is the level of measurement: Continuous
How many dependent variables: 1
Test: T-test

Sample Size


There are several rules of thumb for determining
sample size.






1) It’s a good idea to have a minimum of 30 cases (as a total group, or if comparing
groups, 30 for each group).
If you have less you can use a non-parametric test, but it is still better to have close
to 30 cases.
2) If using regression, it is best to have between 10-50 cases per independent
variable.
3) If you are validating a survey, it is never good to have more questions than cases.
4) If the total population that you are examining is less than 30. Use all of them.
5) For pilot studies the recommendation is a sample size of 12 per group
6) For surveys, a sample size of 400 per group can do just about anything.



7) For surveys, a 30% response rate is the bare minimum.



Note: For a precise sample size estimate you will need to conduct a power analysis.






Statistical Power


Power is the capability of a statistical test to
correctly detect a significant effect if it exists.
 Assumes value between 0 and 1 (%)



Power= 1-B (B= probability of a Type II error).
Type II error – the error of not rejecting a false
research finding.
 Type I error- the error of rejecting a correct
research finding.

Types of Power
 A Priori-

Conducted before study
commences (at proposal stage).
 Post Hoc- After study has been completed.
 Easy




way to increase power?

Increase sample size
Increase Effect size

Components Involved in Power
Calculation



Sample Size- Number of cases.
Effect Size –Magnitude of the trend and
variation.
 Alpha Level- Odds of concluding that the
presence of an effect is due to chance alone (.05
or .01).





Also known as Type I Error, or the error of rejecting a
correct research finding

Power level- 80-90% common
One or two-tailed test- two tailed is common.

Components Involved in Power
Calculation
 Sample

Size- What we want to find out.
 Effect Size –Magnitude of the trend…but
what if you don’t know?



Look to pilot data or literature.
Keep in mind, the smaller the effect size, the
larger the sample size required.

 Alpha

Level- .05
 Power level- 80-90%

Important Consultation Information
 What

is your research question?
 Components of power calculation
 Levels of data (nominal, ordinal,
continuous)
 Sampling plan

Data Organization: Codebook
 What

is a codebook?
 A codebook is a log of your variables
(and levels of data) and how you will
code them.
 A codebook will help everyone
understand the coding schemes to
ensure that they are on the same page!

Data Processing and Analyses:
Codebook Example
Variable
Name

Variable
Label

Values

Coding

Missing

Variable
Type

age

age

1,2,3,4,5

1=10-20 years
2=21-30 years
3=31-40 years
4=41-50 years
5=51+ years

97=Incorrect
response
98=No response
99=Not
Applicable

Ordinal

sex

sex

1,2

1=male, 2=female

97=Incorrect
response
98=No response
99=Not
Applicable

Nominal

happiness

happiness
at
work

1,2,3

1=not happy
2=somewhat happy
3=very happy

97=Incorrect
response
98=No response
99=Not
Applicable

Ordinal

Spreadsheet Example
ID#

Age

Sex Happiness

1

1

1

2

2

2

2

2

3

3

1

2

4

57

2

2

5

45

2

3

6

66

2

3

7

2

2

3

8

88

2

3

Data Analysis with Excel
- Most simple analyses can be done in Excel, including correlation, regression and even random number generation.
- Install the data analysis pack:
  - Go to Tools > Add-ins and add the 'Analysis ToolPak' (in newer versions of Excel, go to File > Options > Add-ins).
- Create the worksheet and codebook.
- Choose the statistical test.
  - Follow the commands in the Help menu.

Data Analysis with Excel


http://home.ubalt.edu/ntsbarsh/excel/excel.htm

Data Reporting and Presentation of Data
- Graphical summaries are a great way to present your data.
- Excel is great for creating tables and graphs.
- The type of data you have determines the type of graphical summary you should use.

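Data Reporting and Presentation of Descriptive Data
Categorical data: frequency tables and bar charts.
- Example: Fruit

Fruit | Count | Percent | Valid Percent
Pineapples | 4 | 20% | 21%
Apples | 5 | 25% | 26%
Oranges | 10 | 50% | 53%
Unknown | 1 | 5% | -
Total | 20 | 100% | 100%

Valid percent leaves out the missing (Unknown) response, so it is calculated out of 19 rather than 20 (e.g. 4/19 = 21%).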
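
The percent and valid percent columns can be reproduced with a few lines of code. This sketch (the function name is hypothetical) treats "Unknown" as the missing category, matching the example.

```python
from collections import Counter


def frequency_table(responses, missing="Unknown"):
    """Return (category, count, percent, valid percent) for each category.

    Percent is out of all responses; valid percent leaves out the missing
    category, whose valid percent is reported as None.
    """
    counts = Counter(responses)  # keeps first-seen order of categories
    total = len(responses)
    valid_total = total - counts[missing]
    table = []
    for category, n in counts.items():
        valid = None if category == missing else 100 * n / valid_total
        table.append((category, n, 100 * n / total, valid))
    return table


fruit = ["Pineapples"] * 4 + ["Apples"] * 5 + ["Oranges"] * 10 + ["Unknown"]
for category, n, pct, valid in frequency_table(fruit):
    valid_text = "-" if valid is None else f"{valid:.0f}%"
    print(f"{category:<11}{n:>3}{pct:>5.0f}%{valid_text:>6}")
```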

Data Reporting and Presentation of Descriptive Data
[Bar chart: Fruit Study. Counts (axis 0-10) for Pineapples, Apples, Oranges and Unknown.]

Data Reporting and Presentation of Descriptive Data
Continuous data: tables and histograms.

Age | Count | Percent
20-30 | 4 | 20%
31-40 | 5 | 25%
41-50 | 10 | 50%
51-60 | 1 | 5%
Total | 20 | 100%
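
To build a table like this from raw ages, each value is assigned to a class interval and the intervals are counted. The sketch below assumes ages recorded in whole years; the function name is hypothetical, and the 20 ages are made-up values chosen to reproduce the counts in the table.

```python
from collections import Counter

# Class intervals from the table, as (label, lower, upper), both ends inclusive
INTERVALS = [("20-30", 20, 30), ("31-40", 31, 40), ("41-50", 41, 50), ("51-60", 51, 60)]


def age_group(age: int) -> str:
    """Return the label of the interval that contains a whole-year age."""
    for label, low, high in INTERVALS:
        if low <= age <= high:
            return label
    raise ValueError(f"age {age} is outside the table's intervals")


# Hypothetical ages for 20 participants (illustration only)
ages = [22, 25, 28, 30, 33, 35, 38, 39, 40, 41,
        43, 44, 45, 46, 47, 48, 49, 50, 50, 55]
counts = Counter(age_group(a) for a in ages)
for label, _, _ in INTERVALS:
    n = counts[label]
    print(f"{label}: {n} ({100 * n / len(ages):.0f}%)")
```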

Data Reporting and Presentation of Descriptive Data
[Histogram: number of participants (axis 0-10) by age group: 20-30, 31-40, 41-50 and 51-60.]

What is the difference between a Histogram and a Bar Chart?
- Histogram: for continuous data that are divided into contiguous class intervals (in other words, intervals connected in an unbroken sequence), so the bars touch.
- Bar chart: for categorical data, where the categories are not contiguous, so the bars are separated by gaps.

Measures of Central Tendency
- Reporting averages:
  - Categorical data = mode
  - Ordinal data = median
  - Continuous data = mean
- If there are outliers (extreme values), report the median instead of the mean.
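
A small example shows why the median is preferred when there are outliers. The values are hypothetical; Python's standard `statistics` module provides all three measures.

```python
from statistics import mean, median, mode

# Hypothetical lengths of stay in days; the last value is an outlier
length_of_stay = [2, 3, 3, 4, 50]

print(mode(length_of_stay))    # most frequent value
print(median(length_of_stay))  # middle value once sorted
print(mean(length_of_stay))    # pulled upward by the outlier
```

Here the mode and median are both 3 days, while the single 50-day stay pulls the mean up to 12.4 days, a value that describes none of the patients well.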

Reporting Inferential Stats
- Handout
- Resource: APA Guidelines
  http://www.ilstu.edu/~jhkahn/apastats.html

Reporting Inferential Stats
- It's important to include the means, standard deviations and sample size in your results section.
- Example (correlation): Variable X was strongly correlated with Variable Y, r = .59, p < .01.
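
APA style usually also reports the degrees of freedom for a correlation (N - 2) and drops the leading zero for values that cannot exceed 1, e.g. r(48) = .59. The sketch below computes Pearson's r and formats it that way; the function names and data are hypothetical. It does not compute the p value, which needs the t distribution (for example via `scipy.stats.pearsonr`).

```python
from math import sqrt
from statistics import mean


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient for two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def apa_r(r: float, n: int) -> str:
    """Format r APA-style: df = n - 2 in brackets, no leading zero."""
    text = f"{r:.2f}".replace("0.", ".", 1)  # 0.59 -> .59, -0.59 -> -.59
    return f"r({n - 2}) = {text}"


# Hypothetical paired observations, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(apa_r(pearson_r(x, y), len(x)))
```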

Important to Keep Your Audience in Mind
- Residency project
- Publication
- Departmental report

Aaron: TCPS certification for
residents reminder…

Questions?


FH Health Research Intelligence Unit
How can we help?

Epidemiologist
- Specifying the research goal, objectives and hypothesis.
- Identifying measurable outcomes.
- Specifying the variables for analysis.
- Identifying sources of data.
- Developing data collection tools for quantitative or qualitative studies.
- Developing the statistical analysis plan.
- Understanding how to use statistical software, such as SPSS.

Workshop Outline
- Research 101: basic research steps
- Research question refinement
- Common study designs (resource)
- Levels of data
- Power and sample size
- Statistical test selection (exercise)
- Data reporting (resource)
- Simple stats with Excel (resource)

Pharmacy Residency Project
1) Develop a research question.
2) Conduct a thorough literature review.
3) Refine the research question or hypothesis.
4) Design the research methodology/study.
5) Create the research proposal.
6) Apply for funding.
7) Apply for ethics approval.
8) Collect and analyze data.
9) Draw conclusions and relate the findings.

Research Question Refinement
- The research question should describe, in operational terms, what you think will happen in the study.

Good Versus Bad Research Question
- Bad: Are patients who take drug X more likely to experience episodes of delirium?
- Good: Do patients who receive medication X between September 2008 and November 2008 experience more episodes of delirium than patients who received drug Y during the same time period?

The second question is better because it specifies the comparison group (drug Y) and the time frame.

Classification of Research Studies
Research studies are either observational (descriptive or analytic) or experimental.
- Observational studies:
  - Descriptive studies: focus on describing populations and describing the relationships between variables.
  - Analytic studies: make inferences about the population based on a random sample.
- Experimental studies: test relationships between exposures and outcomes. The investigator has direct control over the study conditions and exposure status.

Hierarchy of Studies
- The type of study is selected according to the purpose of the research.
- Levels of evidence, from highest to lowest:
  - Experimental studies
  - Analytic studies
  - Descriptive studies
- Handout: Research Design Hierarchy

Probability Sampling Methods: Random
- There are several methods to choose from:
  - Simple random sampling.
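
Simple random sampling gives every unit in the sampling frame an equal chance of selection. A minimal sketch using Python's `random` module, assuming a hypothetical frame of 200 numbered patients:

```python
import random

# Hypothetical sampling frame: patient numbers 1..200
frame = list(range(1, 201))

rng = random.Random(2008)         # fixed seed so the draw can be reproduced and documented
sample = rng.sample(frame, k=40)  # 40 units, drawn without replacement
print(sorted(sample))
```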

Probability Sampling Methods: Stratified
- Stratified sampling (divide the population into non-overlapping strata and sample from within each stratum independently).
  - Guarantees representation of all important groups.
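
One common way to stratify is proportional allocation: draw the same fraction from each stratum, so every group is represented in proportion to its size. This sketch assumes a hypothetical frame of 200 patients split between two clinics; other allocation schemes (e.g. equal numbers per stratum) are also used.

```python
import random
from collections import defaultdict

# Hypothetical frame of (patient_id, clinic) pairs; the clinic is the stratum
frame = [(i, "Clinic A" if i <= 120 else "Clinic B") for i in range(1, 201)]


def stratified_sample(frame, fraction, seed=None):
    """Draw the same fraction of units from every stratum, independently."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit_id, stratum in frame:
        strata[stratum].append(unit_id)  # group units into non-overlapping strata
    return {name: sorted(rng.sample(ids, round(len(ids) * fraction)))
            for name, ids in strata.items()}


sample = stratified_sample(frame, fraction=0.2, seed=1)
print({name: len(ids) for name, ids in sample.items()})
```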

Probability Sampling Methods: Systematic
- Selecting the sample using an interval "k", so that every kth unit in the frame is selected, is called systematic random sampling.

Probability Sampling Methods: Systematic
Steps to achieve a systematic random sample:
1. Number the units in the population from 1 to N.
2. Decide on the sample size (n) that you want or need.
   k = N/n = the interval size.
3. Randomly select an integer between 1 and k.
4. Then take every kth unit.
Example:
1. N = 200
2. n = 40, so k = N/n = 200/40 = 5 (interval size).
3. Randomly select a number between 1 and 5 (let's pick 4).
4. Begin with unit 4 and take every 5th unit (4, 9, 14, ..., 199).
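
The steps above translate directly into code. In this sketch (the function name is hypothetical), the slide's example with N = 200, n = 40 and a start of 4 selects units 4, 9, 14, ..., 199. It assumes N is a multiple of n; otherwise k has to be rounded and the sample size varies slightly with the starting point.

```python
import random


def systematic_sample(N: int, n: int, start=None) -> list[int]:
    """Systematic random sample of unit numbers 1..N.

    Assumes N is a multiple of n, so k = N / n is a whole number.
    """
    k = N // n  # interval size
    if start is None:
        start = random.randint(1, k)  # random start between 1 and k, inclusive
    return list(range(start, N + 1, k))


units = systematic_sample(200, 40, start=4)  # the slide's example
print(units[:5], "...", units[-1], len(units))
```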

Probability Sampling Methods: Cluster
- Cluster sampling.
- Divide the population into clusters and randomly sample clusters.
- Measure all units within the sampled clusters.
- Example: see the blue areas on the map. Clusters are not just geographic areas; you could select hospitals, schools, etc.
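
In cluster sampling the random draw is made over clusters, not individuals, and then every unit in a chosen cluster is measured. A minimal sketch, assuming hypothetical hospitals as clusters:

```python
import random

# Hypothetical clusters: hospital -> patients
clusters = {
    "Hospital A": ["A1", "A2", "A3"],
    "Hospital B": ["B1", "B2"],
    "Hospital C": ["C1", "C2", "C3", "C4"],
    "Hospital D": ["D1", "D2", "D3"],
}

rng = random.Random(7)
chosen = rng.sample(sorted(clusters), k=2)  # randomly sample whole clusters
# Measure every unit inside each sampled cluster
sample = [patient for hospital in chosen for patient in clusters[hospital]]
print(chosen, sample)
```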


Non-Probability Sampling Methods
There are different types of non-probability sampling methods as well:
- Convenience (not representative of the population).
- Purposive (a certain group in mind).
- Expert sampling (seek out specific expertise).
- Snowball sampling (ask people to participate; they ask more people).

If you select a non-probability sampling method, the conclusions drawn from the study results apply only to the specific group sampled; they cannot be generalized to the wider population.

Measurement: Levels of Data
- The level of data dictates which statistical test you should use.
- Categorical = data classified into categories that cannot be arranged in any particular order (e.g. apples and pears, gender, eye colour, ethnicity).
- Ordinal = data that can be ordered, but the distances between categories are not necessarily equal (e.g. low, middle and high income).
- Continuous = data with equal distances between each interval (e.g. 1, 2, 3; age).

Statistics and Statistical Test Selection

Types of Statistics
- Descriptive statistics: describe the research findings.
  - E.g. frequencies, averages.
- Inferential statistics: make inferences about the population, based on a random sample.
  - In a random sample, each person/unit has an equal chance of being selected.
  - This allows the results to be generalized to the population.

Types of Variables
- Variables can be classified as independent or dependent.
- An independent variable is the variable you believe will influence your outcome measure.
- A dependent variable is the variable that depends on, or is influenced by, the independent variable(s). The dependent variable can also be the variable you are trying to predict.

Statistical Test Selection
Selecting the appropriate statistical test requires several steps. Base your test selection on:
1) What is your goal? Description? Comparison? Prediction? Quantifying an association? Proving effectiveness? Proving causality?
2) What kind of data have you collected? What are the levels of data (nominal, ordinal, continuous)? Was your sample randomly selected?
3) Are your data normally distributed? Should you use a parametric or non-parametric test?
4) What are the assumptions of the statistical test you would like to use? Do the data meet these assumptions?

Parametric Tests
- Parametric tests assume that the variable in question comes from a normally distributed population.
- Non-parametric tests do not require the assumption of normality.
- Most non-parametric tests do not require an interval level of measurement; they can be used with nominal/ordinal level data.

Assumptions
- There are various assumptions for each test.
- Before you select a test, be sure to check its assumptions.
- To find this information, contact a consultant or review statistical/research methods resources.
- Some examples of common assumptions are:
  - The dependent variable must be measured at a certain level (e.g. interval level).
  - The independent variable(s) must be measured at a certain level (e.g. ordinal level).
  - The population is normally distributed (not skewed).

If your data do not meet the assumptions for a specific test, you may be able to use a non-parametric test instead.

Type of Data

Goal | Measurement (Normal Population) | Ordinal, or Non-Normal Population | Binomial (Two Possible Outcomes) | Survival Time
Describe one group | Mean, SD | Median, interquartile range | Proportion | Kaplan Meier survival curve
Compare one group to a hypothetical value | One-sample t test | Wilcoxon test | Chi-square or Binomial test ** | -
Compare two unpaired groups | Unpaired t test | Mann-Whitney test | Fisher's test (chi-square for large samples) | Log-rank test or Mantel-Haenszel*
Compare two paired groups | Paired t test | Wilcoxon test | McNemar's test | Conditional proportional hazards regression*
Compare three or more unmatched groups | One-way ANOVA | Kruskal-Wallis test | Chi-square test | Cox proportional hazard regression**
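
For quick reference during the exercise, the table can be encoded as a lookup. This is only a sketch of the table above: the short data-type keys and the function name are hypothetical, the footnote markers are omitted, and it does not check a test's assumptions for you.

```python
# The rows and columns of the table, encoded as a nested lookup.
# Data-type keys ("normal", "non_normal", "binomial", "survival") are hypothetical labels.
TESTS = {
    "describe one group": {
        "normal": "Mean, SD", "non_normal": "Median, interquartile range",
        "binomial": "Proportion", "survival": "Kaplan Meier survival curve"},
    "compare one group to a hypothetical value": {
        "normal": "One-sample t test", "non_normal": "Wilcoxon test",
        "binomial": "Chi-square or Binomial test"},
    "compare two unpaired groups": {
        "normal": "Unpaired t test", "non_normal": "Mann-Whitney test",
        "binomial": "Fisher's test (chi-square for large samples)",
        "survival": "Log-rank test or Mantel-Haenszel"},
    "compare two paired groups": {
        "normal": "Paired t test", "non_normal": "Wilcoxon test",
        "binomial": "McNemar's test",
        "survival": "Conditional proportional hazards regression"},
    "compare three or more unmatched groups": {
        "normal": "One-way ANOVA", "non_normal": "Kruskal-Wallis test",
        "binomial": "Chi-square test",
        "survival": "Cox proportional hazard regression"},
}


def suggest_test(goal: str, data_type: str) -> str:
    """Look up the table entry; raises KeyError for combinations it does not cover."""
    return TESTS[goal.lower()][data_type]


print(suggest_test("Compare two paired groups", "normal"))
```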

Statistical Test Selection Group Exercise
- Using your tables, select the appropriate statistical tests for 10 research scenarios.

During the group exercise…
Steps to choose the appropriate statistical method for the data analysis:
1. Identify whether the research problem raises a question of description, relationship (association), or comparison (difference).
2. Identify the levels of measurement in the research question (nominal/categorical, ordinal/rank, continuous/evenly spaced).
3. Identify the number of variables or samples being described, related, or compared.
4. Identify whether the comparison samples are related (e.g. the same group analyzed before and after, or matched pairs) or independent (unrelated, different groups).
5. Choose the appropriate statistical tool for the data and situation, using the decision tree in the handout.



1. A pilot experiment designed to test the effectiveness of a new approach to electrode placement for electroconvulsive therapy (ECT) has been conducted over a one-year period in the Fraser Health Authority.
Patients from two different mood disorder clinics participated in this study. Patients from Clinic X received ECT according to current practice guidelines. Patients from Clinic Y received a new exploratory ECT treatment. Patients in each clinic were matched for age, gender, and type of disorder. A random sample of 30 matched pairs of patients was selected for inclusion in the study. At the end of one year, patients were administered a memory test yielding a total score out of 100.
Dr. Vasdil would like to know which statistical procedure should be selected to test for differences between the two groups of patients on the memory test.
What is the question: Compare
How many samples: 2
Related or independent: Related (the patients are matched pairs)
What is the level of measurement: Continuous
How many dependent variables: 1
Test: Paired t-test
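
When two groups consist of matched pairs, as in this scenario, the comparison is made on the within-pair differences. A minimal sketch of the paired t statistic, using a hypothetical function name and made-up scores for five pairs (the scenario itself has 30):

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(scores_a: list[float], scores_b: list[float]) -> tuple[float, int]:
    """Paired t statistic and degrees of freedom for matched scores.

    The test is run on the within-pair differences, which is what makes it
    suitable for matched pairs rather than independent groups.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))  # mean difference / its standard error
    return t, n - 1


# Hypothetical memory scores for five matched pairs (illustration only)
clinic_x = [78, 82, 75, 90, 85]
clinic_y = [77, 80, 72, 86, 80]
t, df = paired_t(clinic_x, clinic_y)
print(f"t({df}) = {t:.2f}")
```

The p value comes from comparing t with a t distribution on n - 1 degrees of freedom; a library routine such as `scipy.stats.ttest_rel` performs both steps.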

Sample Size
There are several rules of thumb for determining sample size:
1) It's a good idea to have a minimum of 30 cases (as a total group or, if comparing groups, 30 in each group). If you have fewer, you can use a non-parametric test, but it is still better to have close to 30 cases.
2) If using regression, it is best to have 10-50 cases per independent variable.
3) If you are validating a survey, it is never good to have more questions than cases.
4) If the total population that you are examining is smaller than 30, use all of them.
5) For pilot studies, the recommended sample size is 12 per group.
6) For surveys, a sample size of 400 per group is sufficient for most purposes (it estimates a proportion to within about ±5 percentage points at 95% confidence).
7) For surveys, a 30% response rate is the bare minimum.

Note: For a precise sample size estimate, you will need to conduct a power analysis.
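
The survey rule of thumb can be checked with the margin of error for a proportion. This sketch (hypothetical function name) uses the normal approximation and assumes the worst case p = 0.5, which gives the widest margin.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(n: int, p: float = 0.5, confidence: float = 0.95) -> float:
    """Half-width of a normal-approximation confidence interval for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    return z * sqrt(p * (1 - p) / n)


for n in (100, 400, 1000):
    print(n, f"±{100 * margin_of_error(n):.1f} percentage points")
```

At n = 400 the margin is about ±4.9 percentage points, compared with about ±9.8 at n = 100.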






Statistical Power
- Power is the probability that a statistical test will correctly detect an effect when one truly exists.
  - It takes a value between 0 and 1 (often expressed as a percentage).
- Power = 1 - β, where β is the probability of a Type II error.
  - Type II error: failing to reject a null hypothesis that is false (a false negative, i.e. missing a real effect).
  - Type I error: rejecting a null hypothesis that is true (a false positive).

Types of Power
- A priori: conducted before the study begins (at the proposal stage).
- Post hoc: conducted after the study has been completed.
- Easy ways to increase power?
  - Increase the sample size.
  - Increase the effect size.
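
The effect of sample size on power can be sketched with the same normal approximation used for sample size planning. The function name is hypothetical and the result is approximate; it assumes a two-tailed comparison of two independent means with equal group sizes.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(effect_size: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-tailed, two-sample comparison of means.

    Normal approximation: power = Phi(d * sqrt(n / 2) - z_(1 - alpha/2)),
    ignoring the negligible chance of rejecting in the wrong direction.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    return z.cdf(effect_size * sqrt(n_per_group / 2) - z_crit)


for n in (20, 40, 64, 100):
    print(n, round(approx_power(0.5, n), 2))
```

For d = 0.5, power rises from roughly 0.35 with 20 per group to about 0.8 with 64 per group, the usual 80% target.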

intervals (or in other words, connected
through unbroken sequence).

 Bar

Chart: For categorical data where
categories are not contiguous.

Measures of Central Tendency
 Reporting




 If

averages

Categorical data= Mode
Ordinal data= Median
Continuous data= Mean

there are outliers (or extreme values),
report the median instead of the mean.

Reporting Inferential Stats
 Handout


Resource- APA Guidelines

http://www.ilstu.edu/~jhkahn/apastats.html

Reporting Inferential Stats
 It’s

important to include means, standard
deviations and sample size in your results
section.

 Example:

Correlation
 Variable X was strongly correlated with
Variable Y, r=.59, p<.01.

Important to Keep your
Audience in Mind
Residency Project
Publication
Departmental Report

Aaron: TCPS certification for
residents reminder…

Questions?


FH Health Research Intelligence Unit
How can we help?

Grant Facilitator-Writer
- Conducting a search for funding opportunities.
- Automatic notification of new funding sources and deadlines.
- Identifying a research team.
- Preparing letters of intent.
- Identifying resources required for conducting research.
- Formulating the research budget.
- Writing the grant application in collaboration with researchers.
- Understanding FH and funding agency requirements regarding preparation of specific documents.

FH Health Research Intelligence Unit
How can we help?

Epidemiologist
- Specifying the research goal, objectives and hypothesis.
- Identifying measurable outcomes.
- Specifying the variables for analysis.
- Identifying sources of data.
- Developing data collection tools for quantitative or qualitative studies.
- Developing the statistical analysis plan.
- Understanding how to use statistical software, such as SPSS.

Workshop Outline
- Research 101: Basic Research Steps
- Research Question Refinement
- Common Study Designs (Resource)
- Levels of Data
- Power and Sample Size
- Statistical Test Selection (Exercise)
- Data Reporting (Resource)
- Simple Stats with Excel (Resource)

Pharmacy Residency Project
1) Develop a research question.
2) Conduct a thorough literature review.
3) Refine the research question or hypothesis.
4) Design the research methodology/study.
5) Create a research proposal.
6) Apply for funding.
7) Apply for ethics approval.
8) Collect and analyze data.
9) Draw conclusions and relate findings.

Research Question Refinement
The research question should describe, in operational terms, what you think will happen in the study.

Good Versus Bad Research Question
Bad: Are patients who take drug X more likely to experience episodes of delirium?
Good: Do patients who receive medication X between September 2008 and November 2008 experience more episodes of delirium as compared to patients who received drug Y during the same time period?

Classification of Research Studies
Research Studies
- Observational Studies
  - Descriptive Studies: focus on describing populations and describing the relationship between variables.
  - Analytic Studies: make inferences about the population based on a random sample.
- Experimental Studies: test relationships between exposures and outcomes. The investigator has direct control over study conditions and exposure status.

Hierarchy of Studies
The type of study is selected according to the purpose of the research.
Levels of evidence (highest to lowest):
1. Experimental Studies
2. Analytic Studies
3. Descriptive Studies
Handout: Research Design Hierarchy

Probability Sampling Methods: Random
There are several methods to choose from:
- Simple random sampling (every unit in the population has an equal chance of being selected).
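As an illustration only, simple random sampling can be sketched in Python with the standard library's `random.sample`, which draws units without replacement so that every unit has the same chance of selection. The numbering of units from 1 to N, the function name, and the fixed seed (used only to make the draw reproducible) are assumptions of this sketch.

```python
import random


def simple_random_sample(population_size, sample_size, seed=None):
    """Draw a simple random sample of unit numbers from 1..population_size.

    Every unit has the same chance of being selected, and no unit is
    selected twice (sampling without replacement).
    """
    if sample_size > population_size:
        raise ValueError("sample size cannot exceed population size")
    rng = random.Random(seed)              # seeded generator: reproducible draw
    units = range(1, population_size + 1)  # units numbered 1..N
    return sorted(rng.sample(units, sample_size))


# Hypothetical example: sample 10 patient records out of 200.
chosen = simple_random_sample(200, 10, seed=42)
print(chosen)
```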

Probability Sampling Methods: Stratified
- Stratified sampling: divide the population into non-overlapping strata and sample from within each stratum independently.
- Guarantees representation of all important groups.
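A minimal sketch of stratified sampling, assuming proportional allocation (the same sampling fraction in every stratum, with at least one unit per non-empty stratum so that every group is represented). The hospital-site frame and the function name are hypothetical.

```python
import random


def stratified_sample(strata, fraction, seed=None):
    """Sample the same fraction of units independently from each stratum.

    strata: dict mapping a stratum name to the list of units in it.
    Returns a dict mapping each stratum name to its sampled units.
    """
    rng = random.Random(seed)
    sample = {}
    for name, units in strata.items():
        if not units:
            sample[name] = []  # nothing to draw from an empty stratum
            continue
        # Proportional allocation: round to the nearest whole unit, but take
        # at least one so every stratum is represented.
        k = max(1, round(len(units) * fraction))
        sample[name] = rng.sample(units, min(k, len(units)))
    return sample


# Hypothetical sampling frame: patients grouped by hospital site.
frame = {
    "Site A": [f"A{i}" for i in range(1, 101)],  # 100 patients
    "Site B": [f"B{i}" for i in range(1, 51)],   # 50 patients
    "Site C": [f"C{i}" for i in range(1, 6)],    # 5 patients
}
result = stratified_sample(frame, fraction=0.2, seed=1)
for site, units in result.items():
    print(site, len(units))
```

Other allocation rules (for example, equal numbers per stratum) are equally valid; proportional allocation is simply the easiest to explain.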

Probability Sampling Methods: Systematic
Selecting the sample using an interval k, so that every kth unit in the sampling frame is selected, is called systematic random sampling.

Probability Sampling Methods: Systematic
Steps to achieve a systematic random sample:
1. Number the units in the population from 1 to N.
2. Decide on the sample size n that you want or need. The interval size is k = N/n.
3. Randomly select an integer between 1 and k.
4. Starting from that unit, take every kth unit.

Example:
1. N = 200
2. n = 40, so k = N/n = 200/40 = 5 (interval size).
3. Randomly select a number between 1 and 5 (let's pick 4).
4. Begin with unit 4 and take every 5th unit (4, 9, 14, ..., 199).
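The four steps translate directly into a short function. This is a sketch under one assumption not stated on the slide: when N is not an exact multiple of n, the interval is rounded down (k = N // n) and the list is cut off after n units. The function name is hypothetical.

```python
import random


def systematic_sample(population_size, sample_size, start=None, seed=None):
    """Systematic random sample of unit numbers from 1..N.

    k = N // n is the sampling interval; a random start between 1 and k is
    chosen (unless one is given), and then every kth unit is taken.
    """
    k = population_size // sample_size  # interval size, rounded down
    if start is None:
        start = random.Random(seed).randint(1, k)  # randint includes both ends
    if not 1 <= start <= k:
        raise ValueError("start must be between 1 and k")
    return list(range(start, population_size + 1, k))[:sample_size]


# The worked example: N = 200, n = 40, so k = 5; start at unit 4.
units = systematic_sample(200, 40, start=4)
print(units[:5], "...", units[-1])  # [4, 9, 14, 19, 24] ... 199
```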

Probability Sampling Methods: Cluster
- Cluster sampling:
  - Divide the population into clusters and randomly sample clusters.
  - Measure all units within the sampled clusters.
- Example: See blue areas on map. Clusters are not just geographic areas; you could select hospitals, schools, etc.
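A minimal sketch of cluster sampling, assuming the sampling frame is a dictionary of clusters (here, hypothetical hospitals) and their units. Whole clusters are chosen at random, and every unit inside a chosen cluster is kept, which is what distinguishes this from stratified sampling.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select whole clusters and keep every unit inside them.

    clusters: dict mapping a cluster name (e.g. a hospital) to its units.
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # sorted names: reproducible
    # Measure all units within each sampled cluster.
    return {name: list(clusters[name]) for name in chosen}


# Hypothetical frame: four hospitals and the patient IDs in each.
hospitals = {
    "Hospital 1": [101, 102, 103],
    "Hospital 2": [201, 202],
    "Hospital 3": [301, 302, 303, 304],
    "Hospital 4": [401],
}
picked = cluster_sample(hospitals, n_clusters=2, seed=3)
for name, patients in picked.items():
    print(name, patients)
```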


Non-Probability Sampling Methods
There are different types of non-probability sampling methods as well:
- Convenience (not representative of the population).
- Purposive (a certain group in mind).
- Expert sampling (seek out specific expertise).
- Snowball sampling (ask people to participate; they ask more people).

If you select non-probability sampling methods, the conclusions drawn from the study results apply only to the specific group that was sampled.

Measurement: Levels of Data
The level of data will dictate which statistical test you should use.
- Categorical = Data that are classified into categories and cannot be arranged in any particular order (e.g., apples and pears, gender, eye colour, ethnicity).
- Ordinal = Data that are ordered, but the distance between intervals is not always equal (e.g., low, middle and high income).
- Continuous = Equal distance between each interval (e.g., 1, 2, 3; age).

Statistics and Statistical Test Selection

Types of Statistics
- Descriptive statistics: describe research findings (e.g., frequencies, averages).
- Inferential statistics: make inferences about the population based on a random sample.
  - In a random sample, each person/unit has an equal chance of being selected.
  - This allows generalizability to the population.

Types of Variables
Variables can be classified as independent or dependent.
- An independent variable is the variable you believe will influence your outcome measure.
- A dependent variable is the variable that depends on, or is influenced by, the independent variable(s). The dependent variable can also be the variable you are trying to predict.

Statistical Test Selection
Selecting the appropriate statistical test requires several steps. Test selection should be based on:
1) What is your goal? Description? Comparison? Prediction? Quantifying an association? Proving effectiveness? Proving causality?
2) What kind of data have you collected? What are the levels of data (nominal, ordinal, continuous)? Was your sample randomly selected?
3) Are your data normally distributed? Should you use a parametric or non-parametric test?
4) What are the assumptions of the statistical test you would like to use? Do the data meet these assumptions?

Parametric Tests
- Parametric tests assume that the variable in question comes from a normal distribution.
- Non-parametric tests do not require the assumption of normality.
- Most non-parametric tests do not require an interval level of measurement; they can be used with nominal/ordinal level data.

Assumptions
There are various assumptions for each test. Before you select a test, be sure to check its assumptions. You can find this information by contacting a consultant or reviewing statistical/research methods resources.
Some examples of common assumptions are:
- The dependent variable must be measured at a certain level (e.g., interval level).
- The independent variable(s) must be measured at a certain level (e.g., ordinal level).
- The population is normally distributed (not skewed).

If your data do not meet the assumptions for a specific test, you may be able to use a non-parametric test instead.

Type of Data
| Goal | Measurement (Normal Population) | Ordinal, or Non-Normal Population | Binomial (Two Possible Outcomes) | Survival Time |
|---|---|---|---|---|
| Describe one group | Mean, SD | Median, interquartile range | Proportion | Kaplan Meier survival curve |
| Compare one group to a hypothetical value | One-sample t test | Wilcoxon test | Chi-square or Binomial test ** | |
| Compare two unpaired groups | Unpaired t test | Mann-Whitney test | Fisher's test (chi-square for large samples) | Log-rank test or Mantel-Haenszel* |
| Compare two paired groups | Paired t test | Wilcoxon test | McNemar's test | Conditional proportional hazards regression* |
| Compare three or more unmatched groups | One-way ANOVA | Kruskal-Wallis test | Chi-square test | Cox proportional hazard regression** |
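One way to make the table above easier to apply is to encode it as a lookup keyed by goal and type of data. The key names ("normal", "ordinal or non-normal", "binomial", "survival") and the function name are assumptions chosen for this sketch; the entries are copied from the table, with the footnote markers (* and **) dropped because the footnotes themselves are not reproduced.

```python
# Rows of the test-selection table: goal -> {type of data -> suggested test}.
TEST_TABLE = {
    "describe one group": {
        "normal": "Mean, SD",
        "ordinal or non-normal": "Median, interquartile range",
        "binomial": "Proportion",
        "survival": "Kaplan Meier survival curve",
    },
    "compare one group to a hypothetical value": {
        "normal": "One-sample t test",
        "ordinal or non-normal": "Wilcoxon test",
        "binomial": "Chi-square or Binomial test",
    },
    "compare two unpaired groups": {
        "normal": "Unpaired t test",
        "ordinal or non-normal": "Mann-Whitney test",
        "binomial": "Fisher's test (chi-square for large samples)",
        "survival": "Log-rank test or Mantel-Haenszel",
    },
    "compare two paired groups": {
        "normal": "Paired t test",
        "ordinal or non-normal": "Wilcoxon test",
        "binomial": "McNemar's test",
        "survival": "Conditional proportional hazards regression",
    },
    "compare three or more unmatched groups": {
        "normal": "One-way ANOVA",
        "ordinal or non-normal": "Kruskal-Wallis test",
        "binomial": "Chi-square test",
        "survival": "Cox proportional hazard regression",
    },
}


def suggest_test(goal, data_type):
    """Look up the suggested test; raise KeyError if the table has no entry."""
    row = TEST_TABLE[goal.lower()]
    if data_type not in row:
        raise KeyError(f"no test listed for {data_type!r} data with goal {goal!r}")
    return row[data_type]


# Two independent groups with a continuous, normally distributed outcome.
print(suggest_test("Compare two unpaired groups", "normal"))  # Unpaired t test
```

The lookup only encodes the table; checking whether the data meet the chosen test's assumptions still has to be done separately.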

Statistical Test Selection Group Exercise
Using your tables, select the appropriate statistical tests for 10 research scenarios.

During the group exercise…
Steps to choose the appropriate statistical method for the data analysis:
1. Identify whether the research problem raises the question of describe, relate (association), or compare (difference).
2. Identify the levels of measurement in the research question (nominal/categorical, ordinal/rank, continuous/evenly spaced).
3. Identify the number of variables or samples being described, related, or compared.
4. Identify whether comparison samples are related (the same group analyzed before and after) or independent (not at all related; different groups).
5. Choose the appropriate statistical tool for the data and situation using the decision tree in the handout.



1. A pilot experiment designed to test the effectiveness of a new approach to electrode placement for electroconvulsive therapy (ECT) has been conducted over a one-year period in the Fraser Health Authority.
Patients from two different mood disorder clinics participated in this study. Patients from Clinic X received ECT according to current practice guidelines. Patients from Clinic Y received a new exploratory ECT treatment. Patients in each clinic were matched for age, gender, and type of disorder. A random sample of 30 matched pairs of patients was selected for inclusion in the study. At the end of one year, patients were administered a memory test yielding a total score out of 100.
Dr. Vasdil would like to know which statistical procedure should be selected to test for differences between the groups of patients on the memory test.
- What is the question: Compare
- How many samples: 2
- Related or independent: Independent
- What is the level of measurement: Continuous
- How many dependent variables: 1
- Test: t-test

Sample Size
There are several rules of thumb for determining sample size:
1) It's a good idea to have a minimum of 30 cases (as a total group or, if comparing groups, 30 for each group). If you have fewer, you can use a non-parametric test, but it is still better to have close to 30 cases.
2) If using regression, it is best to have between 10 and 50 cases per independent variable.
3) If you are validating a survey, it is never good to have more questions than cases.
4) If the total population that you are examining has fewer than 30 members, use all of them.
5) For pilot studies, the recommended sample size is 12 per group.
6) For surveys, a sample size of about 400 per group is adequate for most purposes (it gives roughly a ±5% margin of error for a proportion at 95% confidence).
7) For surveys, a 30% response rate is the bare minimum.

Note: For a precise sample size estimate, you will need to conduct a power analysis.
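The survey rules of thumb (items 6 and 7) can be checked with the standard formula for estimating a proportion, n = z² × p(1 - p) / e². With the most conservative choice p = 0.5, a ±5% margin of error at 95% confidence needs 385 completed responses, which is consistent with the round figure of 400. The sketch also divides by an expected response rate to estimate how many surveys to send out; the function names and the 30% rate are illustrative assumptions.

```python
import math
from statistics import NormalDist


def survey_sample_size(margin_of_error, confidence=0.95, p=0.5):
    """Completed responses needed to estimate a proportion within +/- margin.

    Uses n = z^2 * p * (1 - p) / e^2; p = 0.5 gives the largest (safest) n.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    return math.ceil(z ** 2 * p * (1 - p) / margin_of_error ** 2)


def surveys_to_send(completed_needed, response_rate):
    """Number of invitations needed if only `response_rate` of them come back."""
    return math.ceil(completed_needed / response_rate)


needed = survey_sample_size(0.05)             # completed responses per group
print(needed, surveys_to_send(needed, 0.30))  # 385 1284
```

The formula covers estimating a single proportion; comparing groups or detecting a specific effect still calls for a power analysis.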






Statistical Power
- Power is the probability that a statistical test will correctly detect an effect if one truly exists.
  - It takes a value between 0 and 1 (often expressed as a percentage).
- Power = 1 - β, where β is the probability of a Type II error.
  - Type II error: failing to detect an effect that really exists (not rejecting a false null hypothesis).
  - Type I error: concluding that an effect exists when it does not (rejecting a true null hypothesis).

Types of Power Analysis
- A priori: conducted before the study commences (at the proposal stage).
- Post hoc: conducted after the study has been completed.
- Easy ways to increase power?
  - Increase the sample size.
  - Increase the effect size.
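To see why a larger sample or a larger effect increases power, the sketch below uses the normal approximation for a two-sided comparison of two group means with a standardized effect size d (difference in means divided by the common SD): power ≈ Φ(d × √(n/2) - z₁₋α/₂), where n is the size of each group. This approximation ignores the tiny chance of rejecting in the wrong direction and is slightly optimistic compared with exact t-test calculations; the function name is hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, SD 1


def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample comparison of means.

    d: standardized effect size (difference in means / common SD).
    Normal approximation: power ~ Phi(d * sqrt(n / 2) - z_(1 - alpha/2)).
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)  # 1.96 for alpha = .05
    return STD_NORMAL.cdf(d * (n_per_group / 2) ** 0.5 - z_crit)


# Power rises with sample size (d = 0.5, alpha = .05) ...
for n in (20, 40, 64, 100):
    print(n, round(approx_power(0.5, n), 2))
# ... and with effect size (n = 40 per group).
for d in (0.2, 0.5, 0.8):
    print(d, round(approx_power(d, 40), 2))
```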

Components Involved in Power Calculation
- Sample size: number of cases.
- Effect size: magnitude of the difference or trend, relative to its variation.
- Alpha level: the probability of concluding that an effect is present when the result is actually due to chance alone (.05 or .01).
  - Also known as the Type I error rate: the probability of rejecting a true null hypothesis.
- Power level: 80-90% is common.
- One- or two-tailed test: two-tailed is common.

Components Involved in Power Calculation
- Sample size: what we want to find out.
- Effect size: magnitude of the trend… but what if you don't know it?
  - Look to pilot data or the literature.
  - Keep in mind: the smaller the effect size, the larger the sample size required.
- Alpha level: .05
- Power level: 80-90%
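When the effect size comes from pilot data, one common choice is Cohen's d, the difference in means divided by the pooled standard deviation. The sketch below computes d and then inverts the same normal approximation used for power to get the sample size per group, n = 2(z₁₋α/₂ + z₁₋β)² / d². It is a planning aid under those assumptions, not a replacement for a proper power analysis; exact t-test methods usually return a slightly larger n (about 64 rather than 63 per group for d = 0.5). The pilot scores and function names are hypothetical.

```python
import math
from statistics import NormalDist, mean, variance


def cohens_d(group_a, group_b):
    """Standardized difference in means using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (
        n_a + n_b - 2
    )
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


def n_per_group(d, alpha=0.05, power=0.80):
    """Sample size per group for a two-sided, two-sample comparison of means."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = .05
    z_beta = z.inv_cdf(power)           # about 0.84 for 80% power
    return math.ceil(2 * ((z_alpha + z_beta) / abs(d)) ** 2)


# Hypothetical pilot memory scores (out of 100) for two treatment groups.
pilot_new = [70, 62, 78, 66, 74, 58]
pilot_usual = [66, 60, 72, 64, 70, 54]
d = cohens_d(pilot_new, pilot_usual)
print(round(d, 2), n_per_group(d))

# Smaller effects need larger samples.
for effect in (0.8, 0.5, 0.2):
    print(effect, n_per_group(effect))
```

Because a pilot study is small, its effect size estimate is imprecise; checking it against effect sizes reported in the literature is a sensible safeguard.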

Important Consultation Information
- What is your research question?
- Components of the power calculation
- Levels of data (nominal, ordinal, continuous)
- Sampling plan

Data Organization: Codebook
- What is a codebook?
  - A codebook is a log of your variables (and their levels of data) and how you will code them.
  - A codebook helps everyone understand the coding schemes, so that the whole team is on the same page!

Data Processing and Analyses: Codebook Example
| Variable Name | Variable Label | Values | Coding | Missing | Variable Type |
|---|---|---|---|---|---|
| age | age | 1,2,3,4,5 | 1=10-20 years; 2=21-30 years; 3=31-40 years; 4=41-50 years; 5=51+ years | 97=Incorrect response; 98=No response; 99=Not applicable | Ordinal |
| sex | sex | 1,2 | 1=male; 2=female | 97=Incorrect response; 98=No response; 99=Not applicable | Nominal |
| happiness | happiness at work | 1,2,3 | 1=not happy; 2=somewhat happy; 3=very happy | 97=Incorrect response; 98=No response; 99=Not applicable | Ordinal |
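A codebook can also live next to the analysis code. The sketch below stores the codebook example as a Python dictionary and decodes raw values into labels, turning the missing-value codes 97, 98 and 99 into explicit missing entries (None) so they are never averaged as if they were real data. The dictionary layout and the function name are assumptions for illustration.

```python
# The codebook example as a dictionary: variable -> type and code -> label.
MISSING_CODES = {97: "Incorrect response", 98: "No response", 99: "Not applicable"}

CODEBOOK = {
    "age": {
        "type": "ordinal",
        "values": {1: "10-20 years", 2: "21-30 years", 3: "31-40 years",
                   4: "41-50 years", 5: "51+ years"},
    },
    "sex": {"type": "nominal", "values": {1: "male", 2: "female"}},
    "happiness": {
        "type": "ordinal",
        "values": {1: "not happy", 2: "somewhat happy", 3: "very happy"},
    },
}


def decode(variable, code):
    """Return the label for a coded value, or None for a missing-value code."""
    if code in MISSING_CODES:
        return None  # treat 97/98/99 as missing rather than as real data
    values = CODEBOOK[variable]["values"]
    if code not in values:
        raise ValueError(f"{code!r} is not a valid code for {variable!r}")
    return values[code]


print(decode("happiness", 3))  # very happy
print(decode("sex", 98))       # None
```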

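Spreadsheet Example
| ID# | Age | Sex | Happiness |
|---|---|---|---|
| 1 | 1 | 1 | 2 |
| 2 | 2 | 2 | 2 |
| 3 | 3 | 1 | 2 |
| 4 | 57 | 2 | 2 |
| 5 | 45 | 2 | 3 |
| 6 | 66 | 2 | 3 |
| 7 | 2 | 2 | 3 |
| 8 | 88 | 2 | 3 |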
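Before analysis, it helps to check every spreadsheet entry against the codebook. The sketch below (allowed values taken from the codebook example; variable and function names are hypothetical) lists each cell whose value is neither a valid code nor a missing-value code. Applied to the spreadsheet example, it flags the Age entries 57, 45, 66 and 88 (IDs 4, 5, 6 and 8), which appear to be ages in years rather than the age-group codes 1-5 defined in the codebook.

```python
# Allowed codes per variable (from the codebook) and the shared missing codes.
VALID = {"age": {1, 2, 3, 4, 5}, "sex": {1, 2}, "happiness": {1, 2, 3}}
MISSING = {97, 98, 99}

# The spreadsheet example: one row per ID.
rows = [
    {"id": 1, "age": 1, "sex": 1, "happiness": 2},
    {"id": 2, "age": 2, "sex": 2, "happiness": 2},
    {"id": 3, "age": 3, "sex": 1, "happiness": 2},
    {"id": 4, "age": 57, "sex": 2, "happiness": 2},
    {"id": 5, "age": 45, "sex": 2, "happiness": 3},
    {"id": 6, "age": 66, "sex": 2, "happiness": 3},
    {"id": 7, "age": 2, "sex": 2, "happiness": 3},
    {"id": 8, "age": 88, "sex": 2, "happiness": 3},
]


def find_invalid(rows):
    """Return (id, variable, value) for every value the codebook does not allow."""
    problems = []
    for row in rows:
        for variable, allowed in VALID.items():
            value = row[variable]
            if value not in allowed and value not in MISSING:
                problems.append((row["id"], variable, value))
    return problems


for problem in find_invalid(rows):
    print(problem)
```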

Data Analysis with Excel
- Most simple analyses can be done using Excel, including correlation, regression and even random number generation.
- Install the data analysis pack: go to Tools > Add-Ins and add the 'Analysis ToolPak' (in Excel 2010 and later: File > Options > Add-Ins).
- Create a worksheet and codebook.
- Choose a statistical test.
  - Follow the commands in the Help menu.

Data Analysis with Excel
http://home.ubalt.edu/ntsbarsh/excel/excel.htm

Data Reporting and Presentation of Data
- Graphical summaries are a great way to present your data.
- Excel is great for creating tables and graphs.
- The type of data you have will determine the type of graphical summary you should use.

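Data Reporting and Presentation of Descriptive Data
Categorical data: Frequency Tables and Bar Charts
Example: Fruit
| Fruit | Count | Percent | Valid Percent |
|---|---|---|---|
| Pineapples | 4 | 20% | 21% |
| Apples | 5 | 25% | 26% |
| Oranges | 10 | 50% | 53% |
| Unknown | 1 | 5% | - |
| Total | 20 | 100% | 100% |

Valid percent excludes the Unknown response and divides by the 19 valid responses (e.g., 4/19 = 21%).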
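The Percent and Valid Percent columns can be reproduced with a few lines of Python: percent divides by all 20 responses, while valid percent leaves out the Unknown response and divides by the 19 valid ones. Treating "Unknown" as the only missing category, and the function name, are assumptions of this sketch.

```python
from collections import Counter

# Raw responses behind the fruit example (counts taken from the table).
responses = ["Pineapples"] * 4 + ["Apples"] * 5 + ["Oranges"] * 10 + ["Unknown"]
MISSING_LABELS = {"Unknown"}


def frequency_table(values, missing=frozenset()):
    """Return rows of (category, count, percent, valid percent), whole percents."""
    counts = Counter(values)  # keeps categories in order of first appearance
    total = sum(counts.values())
    valid_total = sum(c for cat, c in counts.items() if cat not in missing)
    table = []
    for category, count in counts.items():
        percent = round(100 * count / total)
        # Missing categories get no valid percent.
        valid = None if category in missing else round(100 * count / valid_total)
        table.append((category, count, percent, valid))
    return table


for row in frequency_table(responses, MISSING_LABELS):
    print(row)
```

Rounded valid percents do not always add up to exactly 100%; here 21 + 26 + 53 happens to.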

Data Reporting and Presentation of Descriptive Data
[Figure: "Fruit Study" bar chart of counts (axis 0 to 10) for Pineapples, Apples, Oranges and Unknown]

Data Reporting and Presentation of Descriptive Data
Continuous Data: Tables and Histograms
| Age | Count | Percent |
|---|---|---|
| 20-30 | 4 | 20% |
| 31-40 | 5 | 25% |
| 41-50 | 10 | 50% |
| 51-60 | 1 | 5% |
| Total | 20 | 100% |

Data Reporting and Presentation of Descriptive Data
[Figure: histogram of age (classes 20-30, 31-40, 41-50, 51-60) with counts on the vertical axis (0 to 10)]

What is the difference between a Histogram and a Bar Chart?
- Histogram: for continuous data, where the data are divided into contiguous class intervals (in other words, intervals connected in an unbroken sequence).
- Bar chart: for categorical data, where the categories are not contiguous.
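Histogram classes are contiguous intervals. The sketch below groups continuous ages into the same 10-year classes as the age table (20-30, 31-40, 41-50, 51-60). The ages are made-up values chosen so that the counts match that table, and whole-year ages are assumed so that the classes leave no gaps between them.

```python
from collections import Counter

# Hypothetical ages (whole years) for 20 respondents.
ages = [22, 25, 28, 30, 33, 35, 36, 38, 40,
        41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 57]

# Contiguous classes: each upper bound is one less than the next lower bound.
CLASSES = [(20, 30), (31, 40), (41, 50), (51, 60)]


def age_class(age):
    """Return the label of the class interval containing `age`."""
    for low, high in CLASSES:
        if low <= age <= high:
            return f"{low}-{high}"
    raise ValueError(f"age {age} is outside the defined classes")


counts = Counter(age_class(a) for a in ages)
for low, high in CLASSES:
    label = f"{low}-{high}"
    n = counts[label]
    print(label, n, f"{100 * n / len(ages):.0f}%")
```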

Measures of Central Tendency
- Reporting averages:
  - Categorical data = Mode
  - Ordinal data = Median
  - Continuous data = Mean
- If there are outliers (or extreme values), report the median instead of the mean.
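The reporting rules above can be illustrated with Python's `statistics` module. All data values here are hypothetical; the last two lines show how a single extreme value pulls the mean far more than the median.

```python
from statistics import mean, median, mode

# Categorical data -> mode (the most frequent category).
eye_colour = ["brown", "blue", "brown", "green", "brown"]
print(mode(eye_colour))  # brown

# Ordinal data -> median (happiness coded 1 = not, 2 = somewhat, 3 = very happy).
happiness = [1, 2, 2, 3, 3, 3, 2]
print(median(happiness))  # 2

# Continuous data -> mean, unless there are outliers.
length_of_stay = [3, 4, 4, 5, 6]       # days, hypothetical
with_outlier = length_of_stay + [60]   # one extreme value
print(mean(length_of_stay), median(length_of_stay))  # 4.4 4
print(mean(with_outlier), median(with_outlier))      # mean jumps, median barely moves
```

For ordinal data with an even number of values, `statistics.median_low` or `statistics.median_high` returns an actual category instead of averaging the two middle codes.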

Reporting Inferential Stats
- Handout
- Resource: APA Guidelines
  http://www.ilstu.edu/~jhkahn/apastats.html

Reporting Inferential Stats
- It's important to include means, standard deviations and sample sizes in your results section.
- Example: Correlation
  - Variable X was strongly correlated with Variable Y, r = .59, p < .01.
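A small helper can keep reported numbers consistent. This sketch computes the mean, standard deviation, sample size and Pearson's r with the standard library and formats them in an APA-like style (no leading zero for r). It does not compute the p-value, which needs a t distribution that Python's standard library does not provide; the data and function names are hypothetical.

```python
import math
from statistics import mean, stdev


def describe(values):
    """Mean, SD and n formatted for a results section."""
    return f"M = {mean(values):.2f}, SD = {stdev(values):.2f}, n = {len(values)}"


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def apa_r(r):
    """Format r to two decimals without a leading zero, e.g. 'r = .59'."""
    text = f"{abs(r):.2f}".lstrip("0")  # '0.59' -> '.59'; '1.00' stays '1.00'
    return f"r = {'-' if r < 0 else ''}{text}"


# Hypothetical paired measurements of Variable X and Variable Y.
x = [2, 4, 5, 7, 8, 10]
y = [1, 3, 7, 5, 9, 8]
print("X:", describe(x))
print("Y:", describe(y))
print(apa_r(pearson_r(x, y)))
```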

Important to Keep Your Audience in Mind
- Residency project
- Publication
- Departmental report

Aaron: TCPS certification for residents reminder…

Questions?


Slide 49

Surviving Pharmacy
Residency Research: Tips
and Tricks for Statistical
Planning

Surviving Pharmacy Residency
Research: Tips and Tricks for
Statistical Planning
© Fraser Health Authority, 2007
The Fraser Health Authority (“FH”) authorizes the use, reproduction and/or
modification of this publication for purposes other than commercial redistribution. In
consideration for this authorization, the user agrees that any unmodified
reproduction of this publication shall retain all copyright and proprietary notices. If
the user modifies the content of this publication, all FH copyright notices shall be
removed, however FH shall be acknowledged as the author of the source
publication.
Reproduction or storage of this publication in any form by any means for the purpose
of commercial redistribution is strictly prohibited.
This publication is intended to provide general information only, and should not be
relied on as providing specific healthcare, legal or other professional advice. The
Fraser Health Authority, and every person involved in the creation of this publication,
disclaims any warranty, express or implied, as to its accuracy, completeness or
currency, and disclaims all liability in respect of any actions, including the results of
any actions, taken or not taken in reliance on the information contained herein.

FH Health Research Intelligence
Unit
How can we help?


Grant Facilitator-Writer
 Conducting a search for funding opportunities.
 Automatic notification of new funding sources and
deadlines.
 Identifying a research team.
 Preparing letters of intent.
 Identifying resources required for conducting research.
 Formulating the research budget.
 Writing the grant application in collaboration with
researchers.
 Understanding FH and funding agency requirements
regarding preparation of specific documents.

FH Health Research Intelligence
Unit
How can we help?











Epidemiologist
Specifying the research goal,
objectives and hypothesis.
Identifying measurable
outcomes.
Specifying the variables for
analysis.
Identifying sources of data.
Developing data collection
tools for quantitative or
qualitative studies.
Developing the statistical
analysis plan.
Understanding how to use
statistical software, such as
SPSS.

Workshop Outline










Research 101- Basic Research Steps
Research Question Refinement
Common Study Designs- Resource
Levels of Data
Power and Sample Size
Statistical Test Selection- Exercise
Data Reporting- Resource
Simple Stats with Excel- Resource

Pharmacy Residency Project










1) Develop a research question
2) Conduct thorough literature review
3) Re-define research question or hypothesis
4) Design research methodology/study
5) Create research proposal
6) Apply for funding
7) Apply for ethics approval
8) Collect and analyze data
9) Draw conclusions and relate findings

Research Question Refinement


Research question will describe in operational
terms, what you think will happen in the study.

Good Versus Bad Research
Question


Are patients who take
drug X more likely to
experience episodes
of delirium?



Do patients who
receive medication X
between September
2008 and November
2008 experience
more episodes of
delirium as compared
to patients who
received drug Y
during the same time
period?

Classification of Research
Studies
Observational Studies:
Descriptive Studies:
Focus on describing
populations and describing the
relationship between variables

Research Studies

Analytic Studies:
Observational
Descriptive

Experimental

Analytic

Make inferences about the
population based on a random
sample.
Experimental Studies:
Test relationships between
exposures and outcomes.
Investigator has direct control
over study condition and
exposure status.

Hierarchy of Studies
Type of study is selected
according to the purpose of
research.
Experimental Studies

Analytic Studies

Descriptive Studies

Levels of Evidence


HandoutResearch Design
Hierarchy

Probability Sampling Methods:
Random
 There

are several methods to choose

from:
 Simple random
sampling.

Probability Sampling Methods:
Stratified
 Stratified

sampling

(divide the population into
non-overlapping strata and
sample from within each
stratum independently).


Guarantees representation
of all important groups.

Probability Sampling Methods:
Systematic


Selection of the
sample using an
interval “k” so that
every “k” unit in the
frame is selected,
is called systematic
random sampling.

Probability Sampling Methods:
Systematic


Steps to achieve a systematic random sample:

1. Number the units in the population from 1 to N.
2. Decide on the n (sample size) that you want or need.


k = N/n = the interval size.

3. Randomly select an integer between 1 and k.
4. Then take every kth unit.
Example:
1. N=200
2. n=40, take N/n, 200/40=5 (interval size).
3. Randomly select a number between 1 and 5 (let’s pick 4).
4. Begin with 4, and take every 5th unit.

Probability Sampling Methods:
Cluster
 Cluster

sampling.
 Divide population into clusters and
randomly sample clusters.
 Measure all units within sampled clusters.
 Example: See blue areas on map.
Not just geographic areas,
could select hospitals,
schools etc.


Non-Probability Sampling
Methods


There are different types of non-probability
sampling methods as well:








Convenience (not representative of population).
Purposive (certain group in mind).
Expert sampling (seek out specific expertise).
Snowball sampling (ask people to participate, they
ask more people).

If you select non-probability sampling methods, the
conclusions drawn from the study results apply only
to that specific population.

Measurement: Levels of Data



The level of data will dictate which statistical test you
should use.
Categorical = Data that is classified into categories and
cannot be arranged in any particular order (e.g. Apples
and pears, gender, eye colour, ethnicity).



Ordinal = Data ordered, but distance between intervals
not always equal. (e.g. Low, middle and high income).



Continuous = equal distance between each interval
(e.g. 1,2,3., age).

Statistics and Statistical Test
Selection

Types of Statistics
 Descriptive

Statistics: Describes
 research findings


E.g. Frequencies, averages.

 Inferential

Statistics: Makes inferences
about the population, based on a random
sample.




In a random sample, each person/unit has an
equal chance of being selected
Allows generalizability to population.

Types of Variables


Variables can be classified as independent or
dependent.



An independent variable is the variable you
believe will influence your outcome measure.



A dependent variable is the variable that is
dependant on or influenced by independent
variable(s). The dependent variable can also
be the variable you are trying to predict.

Statistical Test Selection
Selecting the appropriate Statistical test requires
several steps:
Test selection should be based on:
1) What is your goal? Description? Comparison? Prediction? Quantify
association? Prove effectiveness? Prove causality?
2) What kind of data have you collected? What are the levels of data
(Nominal, ordinal, continuous)? Was your sample randomly selected?
3) Is your data normally distributed? Should you use a parametric or nonparametric test?
4) What are the assumptions of the statistical test you would like to
use? Does the data meet these assumptions?

Parametric Tests


Parametric tests assume that the variable in question is
from a normal distribution.



Non-parametric tests do not require the assumption of
normality.



Most non-parametric tests do not require an interval level
of measurement; can be used with nominal/ordinal level
data.

Assumptions







There are various assumptions for each test.
Before you select a test, be sure to check the assumptions of
each test.
You will need to contact a consultant, or review
statistical/research methods resources to find this information.
Some examples of common assumptions are:
 The dependent variable will need to be measured on a
certain level (i.e. Interval level).
 The independent variable(s) will need to be measured on a
certain level (i.e. Ordinal level).
 The population is normally distributed (not skewed).

If your data do not meet the assumptions for a specific test,
you may be able to use a non-parametric test instead.

Type of Data
Goal

Measurement
Normal
Population

Ordinal, or
NonNormal
Population
Median,
interquartil
e range
Wilcoxon test

Binomial
Survival Time
-Two
Possible
Outcomes
Proportion
Kaplan Meier
survival curve

Describe one
group

Mean, SD

Compare one
group to a
hypothetic
al value

One-sample t
test

Compare two
unpaired
groups

Unpaired t test

Mann-Whitney
test

Log-rank test or
Mantel-Haenszel*

Paired t test

Wilcoxon test

Fisher's test
(chisquare
for large
samples)
McNemar's
test

Compare two
paired
groups

Chi-square
test

Cox proportional
hazard
regression**

One-way
Compare
ANOVA
three or
more
unmatched
groups

Kruskal-Wallis
test

Chi-square
or
Binomial
test **

Conditional
proportional
hazards
regression*

Statistical Test Selection Group
Exercise

 Using

your tables, select the
appropriate statistical tests for 10
research scenarios.

During the group exercise…


Steps to choose the appropriate statistical method
for the data analysis:

1. Identify whether the research problem raises the
question of describe, relate (association), or compare
(difference).
2. Identify the levels of measurement in the research
question (Nominal/Categorical, Ordinal/Rank,
Continuous/Evenly spaced).
3. Identify the number of variables, or samples being
described, related, or compared.
4. Identify whether comparison samples are related
(analyze same group before and after) or independent
(not at all related, looking at different groups).
5. Choose the appropriate statistical tool for the data and
situation using the decision tree in the handout.



1. A pilot experiment designed to test the effectiveness of a new
approach to electrode placement for Electro Shock Therapy
(ECT) has been conducted over a one year time period in the
Fraser Health Authority.
Patients from two different mood disorder clinics participated in
this study. Patients from Clinic X received ECT therapy
according to current practice guidelines. Patients from Clinic Y
received a new exploratory ECT treatment. Patients in each
clinic were matched for age, gender, and type of disorder. A
random sample of 30 matched pairs of patients were selected
for inclusion in the study. At end of one year, patients were
administered a memory test yielding a total score out of 100.
Dr. Vasdil would like to know what statistical procedure needs
to be selected to test for differences among groups of patients
on the memory test.
What is the question: Compare
How many samples: 2
Related or independent: Independent
What is the level of measurement: Continuous
How many dependent variables: 1
Test: T-test

Sample Size


There are several rules of thumb for determining
sample size.






1) It’s a good idea to have a minimum of 30 cases (as a total group, or if comparing
groups, 30 for each group).
If you have less you can use a non-parametric test, but it is still better to have close
to 30 cases.
2) If using regression, it is best to have between 10-50 cases per independent
variable.
3) If you are validating a survey, it is never good to have more questions than cases.
4) If the total population that you are examining is less than 30. Use all of them.
5) For pilot studies the recommendation is a sample size of 12 per group
6) For surveys, a sample size of 400 per group can do just about anything.



7) For surveys, a 30% response rate is the bare minimum.



Note: For a precise sample size estimate you will need to conduct a power analysis.






Statistical Power


Power is the capability of a statistical test to
correctly detect a significant effect if it exists.
 Assumes value between 0 and 1 (%)



Power= 1-B (B= probability of a Type II error).
Type II error – the error of not rejecting a false
research finding.
 Type I error- the error of rejecting a correct
research finding.

Types of Power
 A Priori-

Conducted before study
commences (at proposal stage).
 Post Hoc- After study has been completed.
 Easy




way to increase power?

Increase sample size
Increase Effect size

Components Involved in Power
Calculation



Sample Size- Number of cases.
Effect Size –Magnitude of the trend and
variation.
 Alpha Level- Odds of concluding that the
presence of an effect is due to chance alone (.05
or .01).





Also known as Type I Error, or the error of rejecting a
correct research finding

Power level- 80-90% common
One or two-tailed test- two tailed is common.

Components Involved in Power Calculation
• Sample size: what we want to find out.
• Effect size: the magnitude of the trend. But what if you don't know it?
  - Look to pilot data or the literature.
  - Keep in mind: the smaller the effect size, the larger the sample size required.
• Alpha level: .05
• Power level: 80-90%
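Rearranging the same normal approximation gives a rough sample size per group for comparing two means: n = 2(z(1-α/2) + z(1-β))² / d². The Python sketch below assumes a two-sided test with equal group sizes. `n_per_group` is a hypothetical helper. Exact t-based calculations, as done by dedicated power software, give slightly larger numbers. Treat the output as a starting point to discuss with a consultant.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def n_per_group(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate sample size per group for a two-sided, two-sample comparison of means.

    n = 2 (z(1-α/2) + z(1-β))² / d², rounded up, where d is the standardized
    effect size expected from pilot data or the literature.
    """
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = STD_NORMAL.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)


for d in (0.8, 0.5, 0.2):  # conventionally "large", "medium" and "small" effects
    print(f"d = {d}: about {n_per_group(d)} per group (alpha = .05, power = 80%)")
```

With α = .05 and 80% power, this gives about 25, 63 and 393 per group for d = 0.8, 0.5 and 0.2. The smaller the effect size, the larger the sample required.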

Important Consultation Information
• What is your research question?
• Components of the power calculation
• Levels of data (nominal, ordinal, continuous)
• Sampling plan

Data Organization: Codebook
• What is a codebook?
• A codebook is a log of your variables (and their levels of data) and how you will code them.
• A codebook helps everyone understand the coding schemes, so that the whole team is on the same page!

Data Processing and Analyses: Codebook Example

Variable name | Variable label | Values | Coding | Missing | Variable type
age | age | 1,2,3,4,5 | 1=10-20 years; 2=21-30 years; 3=31-40 years; 4=41-50 years; 5=51+ years | 97=Incorrect response; 98=No response; 99=Not applicable | Ordinal
sex | sex | 1,2 | 1=male; 2=female | 97=Incorrect response; 98=No response; 99=Not applicable | Nominal
happiness | happiness at work | 1,2,3 | 1=not happy; 2=somewhat happy; 3=very happy | 97=Incorrect response; 98=No response; 99=Not applicable | Ordinal
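Storing the codebook in a machine-readable form makes it easier to apply consistently. This minimal Python sketch assumes the codebook is held in a dictionary. The structure and the `decode` helper are hypothetical and not part of any particular software package.

```python
# Hypothetical machine-readable version of the example codebook.
MISSING_CODES = {97: "Incorrect response", 98: "No response", 99: "Not applicable"}

CODEBOOK = {
    "age": {
        "label": "age",
        "type": "ordinal",
        "codes": {1: "10-20 years", 2: "21-30 years", 3: "31-40 years",
                  4: "41-50 years", 5: "51+ years"},
    },
    "sex": {"label": "sex", "type": "nominal", "codes": {1: "male", 2: "female"}},
    "happiness": {
        "label": "happiness at work",
        "type": "ordinal",
        "codes": {1: "not happy", 2: "somewhat happy", 3: "very happy"},
    },
}


def decode(variable: str, value: int) -> str:
    """Translate a numeric code into its label, flagging missing and invalid codes."""
    codes = CODEBOOK[variable]["codes"]
    if value in codes:
        return codes[value]
    if value in MISSING_CODES:
        return f"missing ({MISSING_CODES[value]})"
    raise ValueError(f"{value} is not a valid code for {variable!r}")


print(decode("happiness", 3))
print(decode("sex", 98))
```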

Spreadsheet Example

ID# | Age | Sex | Happiness
1 | 1 | 1 | 2
2 | 2 | 2 | 2
3 | 3 | 1 | 2
4 | 57 | 2 | 2
5 | 45 | 2 | 3
6 | 66 | 2 | 3
7 | 2 | 2 | 3
8 | 88 | 2 | 3
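In the spreadsheet example, the Age column for IDs 4, 5, 6 and 8 holds the values 57, 45, 66 and 88. None of these is an age code (1-5) or a missing code (97-99) from the codebook. A check like the Python sketch below can flag such entries before analysis. It assumes the codes listed in the example codebook. `find_invalid` is a hypothetical helper.

```python
# Rows from the spreadsheet example: (ID, age, sex, happiness).
ROWS = [
    (1, 1, 1, 2), (2, 2, 2, 2), (3, 3, 1, 2), (4, 57, 2, 2),
    (5, 45, 2, 3), (6, 66, 2, 3), (7, 2, 2, 3), (8, 88, 2, 3),
]

# Valid codes per column, taken from the example codebook.
VALID = {
    "age": {1, 2, 3, 4, 5},
    "sex": {1, 2},
    "happiness": {1, 2, 3},
}
MISSING = {97, 98, 99}  # missing-value codes shared by all variables
COLUMNS = ("age", "sex", "happiness")


def find_invalid(rows):
    """Return (ID, column, value) for every value that the codebook does not define."""
    problems = []
    for record_id, *values in rows:
        for column, value in zip(COLUMNS, values):
            if value not in VALID[column] and value not in MISSING:
                problems.append((record_id, column, value))
    return problems


for record_id, column, value in find_invalid(ROWS):
    print(f"ID {record_id}: {column} = {value} is not in the codebook")
```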

Data Analysis with Excel
• Most simple analyses can be done in Excel, including correlation, regression and even random number generation.
• Install the Analysis ToolPak add-in.
  - In older versions of Excel, go to Tools > Add-Ins and tick "Analysis ToolPak"; in newer versions, go to File > Options > Add-ins.
• Create your worksheet and codebook.
• Choose the statistical test.
  - Follow the instructions in the Help menu.

Data Analysis with Excel


http://home.ubalt.edu/ntsbarsh/excel/excel.htm

Data Reporting and Presentation of Data
• Graphical summaries are a great way to present your data.
• Excel is well suited to creating tables and graphs.
• The type of data you have determines the type of graphical summary you should use.

Data Reporting and Presentation of Descriptive Data
• Categorical data: frequency tables and bar charts.
• Example: Fruit

Fruit | Count | Percent | Valid Percent
Pineapples | 4 | 20% | 21%
Apples | 5 | 25% | 26%
Oranges | 10 | 50% | 53%
Unknown | 1 | 5% | -
Total | 20 | 100% | 100%
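The Percent and Valid Percent columns differ only in their denominator. Percent divides by all 20 responses, while Valid Percent leaves out the unknown response and divides by 19. This Python sketch rebuilds the fruit table and assumes "Unknown" is the only missing category. `frequency_table` is a hypothetical helper.

```python
from collections import Counter

# The fruit example: 20 responses, one recorded as "Unknown".
responses = ["Pineapples"] * 4 + ["Apples"] * 5 + ["Oranges"] * 10 + ["Unknown"]


def frequency_table(values, missing=("Unknown",)):
    """Return (category, count, percent, valid percent) rows plus the total count.

    Percent uses all responses; valid percent leaves out the missing categories.
    """
    counts = Counter(values)
    total = sum(counts.values())
    valid_total = total - sum(counts[m] for m in missing)
    rows = []
    for category, count in counts.items():
        percent = 100 * count / total
        valid = None if category in missing else 100 * count / valid_total
        rows.append((category, count, percent, valid))
    return rows, total


rows, total = frequency_table(responses)
for category, count, percent, valid in rows:
    valid_text = "-" if valid is None else f"{valid:.0f}%"
    print(f"{category:<11}{count:>6}{percent:>8.0f}%{valid_text:>8}")
print(f"{'Total':<11}{total:>6}{'100%':>9}{'100%':>8}")
```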

Data Reporting and Presentation of
Descriptive Data
[Figure: bar chart titled "Fruit Study" showing the count (axis 0 to 10) for Unknown, Oranges, Apples and Pineapples]

Data Reporting and Presentation of Descriptive Data
• Continuous data: tables and histograms.

Age | Count | Percent
20-30 | 4 | 20%
31-40 | 5 | 25%
41-50 | 10 | 50%
51-60 | 1 | 5%
Total | 20 | 100%

Data Reporting and Presentation of
Descriptive Data
[Figure: histogram of age showing the count (axis 0 to 10) for the intervals 20-30, 31-40, 41-50 and 51-60]

What is the difference between a histogram and a bar chart?
• Histogram: for continuous data, where the data are divided into contiguous class intervals (in other words, intervals connected in an unbroken sequence).
• Bar chart: for categorical data, where the categories are not contiguous.
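A histogram needs each value to fall into exactly one interval, with no gaps between intervals. This Python sketch shows that binning step. It assumes the age intervals from the example table and uses hypothetical ages, because the data behind the table are not given.

```python
import bisect
from collections import Counter

# Interval edges matching the example age table. Each interval runs from its lower
# edge up to (but not including) the next edge, so the intervals join without gaps;
# in whole years they are 20-30, 31-40, 41-50 and 51-60.
EDGES = [20, 31, 41, 51, 61]
LABELS = ["20-30", "31-40", "41-50", "51-60"]


def age_interval(age: float) -> str:
    """Return the class interval that contains `age`."""
    i = bisect.bisect_right(EDGES, age) - 1
    if not 0 <= i < len(LABELS):
        raise ValueError(f"age {age} is outside the intervals ({EDGES[0]} up to {EDGES[-1]})")
    return LABELS[i]


# Hypothetical ages, for illustration only (not the data behind the example table).
ages = [23, 29, 34, 38, 41, 45, 47, 52]
counts = Counter(age_interval(a) for a in ages)
for label in LABELS:
    print(f"{label}: {'#' * counts[label]}")  # a text histogram: one # per case
```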

Measures of Central Tendency
• Reporting averages:
  - Categorical data = mode
  - Ordinal data = median
  - Continuous data = mean
• If there are outliers (or extreme values), report the median instead of the mean.
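This short Python illustration shows which average fits each level of data. It also shows how a single extreme value pulls the mean but not the median. All values are hypothetical.

```python
from statistics import mean, median, mode

# Hypothetical values, for illustration only.
fruit = ["Apples", "Oranges", "Oranges", "Pineapples"]  # categorical -> mode
happiness = [1, 2, 2, 3, 3, 3, 2]                       # ordinal codes -> median
length_of_stay = [3, 4, 4, 5, 6, 45]                    # continuous (days), one outlier

print("Mode (categorical):", mode(fruit))
print("Median (ordinal):", median(happiness))
print("Mean length of stay:", round(mean(length_of_stay), 1))
print("Median length of stay:", median(length_of_stay))
```

Here the single long stay of 45 days raises the mean to about 11.2 days, while the median stays at 4.5 days.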

Reporting Inferential Stats
• Handout
• Resource: APA guidelines
  http://www.ilstu.edu/~jhkahn/apastats.html

Reporting Inferential Stats
• It's important to include means, standard deviations and sample sizes in your results section.
• Example (correlation): Variable X was strongly correlated with Variable Y, r = .59, p < .01.
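A small helper keeps the formatting consistent when several results need to be written up. The Python sketch below follows one common APA-style convention: no leading zero for statistics that cannot exceed 1, exact p values to three decimals, and p < .001 for smaller values. Check the APA resource above for the rules that apply to your report. The helper names and data are hypothetical.

```python
from statistics import mean, stdev


def describe(values) -> str:
    """Summarize one group as mean, sample standard deviation and sample size."""
    return f"M = {mean(values):.2f}, SD = {stdev(values):.2f}, n = {len(values)}"


def format_r(r: float, p: float) -> str:
    """Format a correlation in an APA-like style.

    r and p cannot exceed 1, so the leading zero is dropped; p values below
    .001 are reported as "p < .001", otherwise to three decimal places.
    """
    sign = "-" if r < 0 else ""
    r_text = sign + f"{abs(r):.2f}".lstrip("0")
    if p < 0.001:
        p_text = "p < .001"
    else:
        p_text = "p = " + f"{p:.3f}".lstrip("0")
    return f"r = {r_text}, {p_text}"


# Hypothetical values, for illustration only.
print(describe([10, 12, 14, 16]))
print(format_r(0.59, 0.004))
```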

Important to Keep Your Audience in Mind
• Residency project
• Publication
• Departmental report

Aaron: TCPS certification for
residents reminder…

Questions?



FH Health Research Intelligence Unit
How can we help?

Epidemiologist
• Specifying the research goal, objectives and hypothesis.
• Identifying measurable outcomes.
• Specifying the variables for analysis.
• Identifying sources of data.
• Developing data collection tools for quantitative or qualitative studies.
• Developing the statistical analysis plan.
• Understanding how to use statistical software, such as SPSS.

Workshop Outline
• Research 101: Basic Research Steps
• Research Question Refinement
• Common Study Designs (resource)
• Levels of Data
• Power and Sample Size
• Statistical Test Selection (exercise)
• Data Reporting (resource)
• Simple Stats with Excel (resource)

Pharmacy Residency Project
1) Develop a research question.
2) Conduct a thorough literature review.
3) Refine the research question or hypothesis.
4) Design the research methodology/study.
5) Create a research proposal.
6) Apply for funding.
7) Apply for ethics approval.
8) Collect and analyze data.
9) Draw conclusions and relate findings.

Research Question Refinement
• The research question should describe, in operational terms, what you think will happen in the study.

Good Versus Bad Research Question
• Bad (vague): Are patients who take drug X more likely to experience episodes of delirium?
• Good (specific): Do patients who receive medication X between September 2008 and November 2008 experience more episodes of delirium compared with patients who received drug Y during the same time period?

Classification of Research Studies
Research studies
• Observational studies
  - Descriptive studies: focus on describing populations and the relationships between variables.
  - Analytic studies: make inferences about the population based on a random sample.
• Experimental studies: test relationships between exposures and outcomes. The investigator has direct control over the study conditions and exposure status.

Hierarchy of Studies
The type of study is selected according to the purpose of the research. From top to bottom:
• Experimental studies
• Analytic studies
• Descriptive studies

Levels of Evidence
• Handout: Research Design Hierarchy

Probability Sampling Methods: Random
• There are several methods to choose from:
  - Simple random sampling.
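Simple random sampling can be done with Python's `random` module. This minimal sketch assumes the sampling frame is a list of patient IDs. The frame and the `simple_random_sample` helper are hypothetical.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n units without replacement, giving every unit the same chance of selection."""
    rng = random.Random(seed)  # pass a seed to make the draw reproducible
    return rng.sample(frame, n)


# Hypothetical sampling frame: patient IDs 1 to 200.
patients = list(range(1, 201))
chosen = simple_random_sample(patients, 40, seed=1)
print(sorted(chosen))
```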

Probability Sampling Methods: Stratified
• Stratified sampling: divide the population into non-overlapping strata and sample from within each stratum independently.
  - Guarantees representation of all important groups.
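This Python sketch shows stratified sampling. It assumes a hypothetical frame of patient IDs grouped by hospital site and uses proportional allocation, with each stratum contributing 20% of its units. Other allocations are equally possible.

```python
import random


def stratified_sample(strata, n_per_stratum, seed=None):
    """Draw an independent simple random sample of the requested size from each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(units, n_per_stratum[name]) for name, units in strata.items()}


# Hypothetical frame: patient IDs grouped by hospital site (non-overlapping strata).
strata = {
    "Site A": list(range(1, 101)),
    "Site B": list(range(101, 161)),
    "Site C": list(range(161, 201)),
}
# Proportional allocation for a total sample of 40 (20% of each stratum).
allocation = {name: round(0.2 * len(units)) for name, units in strata.items()}
sample = stratified_sample(strata, allocation, seed=1)
for name, units in sample.items():
    print(name, sorted(units))
```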

Probability Sampling Methods: Systematic
• Systematic random sampling selects the sample using an interval k, so that every kth unit in the sampling frame is selected.

Probability Sampling Methods: Systematic
Steps to achieve a systematic random sample:
1. Number the units in the population from 1 to N.
2. Decide on the sample size n that you want or need. The interval size is k = N/n.
3. Randomly select an integer between 1 and k.
4. Then take every kth unit.

Example:
1. N = 200.
2. n = 40, so k = N/n = 200/40 = 5 (interval size).
3. Randomly select a number between 1 and 5 (let's pick 4).
4. Begin with unit 4 and take every 5th unit (4, 9, 14, ..., 199).
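The four steps translate directly into code. This Python sketch reproduces the worked example with N = 200, n = 40 and a start of 4. `systematic_sample` is a hypothetical name.

```python
import random


def systematic_sample(N, n, start=None, seed=None):
    """Systematic random sample of n units from a frame numbered 1..N.

    k = N // n is the interval size; the start is a random integer from 1 to k
    unless one is supplied; then every kth unit is taken.
    """
    k = N // n
    if k < 1:
        raise ValueError("sample size n cannot exceed population size N")
    if start is None:
        start = random.Random(seed).randint(1, k)
    if not 1 <= start <= k:
        raise ValueError(f"start must be between 1 and k = {k}")
    # If N is not an exact multiple of n, range() can yield more than n units;
    # this sketch simply keeps the first n.
    return list(range(start, N + 1, k))[:n]


# The worked example: N = 200, n = 40, so k = 5; starting at unit 4.
units = systematic_sample(200, 40, start=4)
print(units[:3], "...", units[-1], f"({len(units)} units)")
```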

Probability Sampling Methods: Cluster
• Cluster sampling.
  - Divide the population into clusters and randomly sample clusters.
  - Measure all units within the sampled clusters.
  - Example: see blue areas on map. Clusters need not be geographic areas; you could select hospitals, schools, etc.
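This Python sketch shows one-stage cluster sampling: whole clusters are selected at random and every unit in them is measured. The hospitals, patient IDs and the `cluster_sample` helper are hypothetical.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select whole clusters, then keep every unit within the selected clusters."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # sort names so a seed is reproducible
    return {name: clusters[name] for name in chosen}


# Hypothetical frame: patients grouped by the hospital that treats them.
clusters = {
    "Hospital A": ["A1", "A2", "A3"],
    "Hospital B": ["B1", "B2"],
    "Hospital C": ["C1", "C2", "C3", "C4"],
    "Hospital D": ["D1", "D2", "D3"],
}
selected = cluster_sample(clusters, 2, seed=3)
for hospital, patients in selected.items():
    print(hospital, patients)
```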


Non-Probability Sampling Methods
There are different types of non-probability sampling methods as well:
• Convenience (not representative of the population).
• Purposive (a certain group in mind).
• Expert sampling (seek out specific expertise).
• Snowball sampling (ask people to participate, and they ask more people).

If you select non-probability sampling methods, the conclusions drawn from the study results apply only to the group you actually sampled and cannot be generalized to the wider population.

Measurement: Levels of Data
• The level of data will dictate which statistical test you should use.
• Categorical = data classified into categories that cannot be arranged in any particular order (e.g. apples and pears, gender, eye colour, ethnicity).
• Ordinal = data that can be ordered, but the distances between categories are not necessarily equal (e.g. low, middle and high income).
• Continuous = data with equal distances between each interval (e.g. 1, 2, 3; age).

Statistics and Statistical Test Selection

Types of Statistics
• Descriptive statistics: describe research findings.
  - E.g. frequencies, averages.
• Inferential statistics: make inferences about the population, based on a random sample.
  - In a random sample, each person/unit has an equal chance of being selected.
  - Allows generalizability to the population.

Types of Variables
• Variables can be classified as independent or dependent.
• An independent variable is the variable you believe will influence your outcome measure.
• A dependent variable is the variable that depends on, or is influenced by, the independent variable(s). The dependent variable can also be the variable you are trying to predict.

Statistical Test Selection
Selecting the appropriate statistical test requires several steps. Test selection should be based on:
1) What is your goal? Description? Comparison? Prediction? Quantifying an association? Proving effectiveness? Proving causality?
2) What kind of data have you collected? What are the levels of data (nominal, ordinal, continuous)? Was your sample randomly selected?
3) Are your data normally distributed? Should you use a parametric or non-parametric test?
4) What are the assumptions of the statistical test you would like to use? Do the data meet these assumptions?

Parametric Tests
• Parametric tests assume that the variable in question comes from a normally distributed population.
• Non-parametric tests do not require the assumption of normality.
• Most non-parametric tests do not require an interval level of measurement; they can be used with nominal- or ordinal-level data.

Assumptions
• There are various assumptions for each test.
• Before you select a test, be sure to check its assumptions.
• You will need to contact a consultant, or review statistical/research methods resources, to find this information.
• Some examples of common assumptions are:
  - The dependent variable must be measured at a certain level (e.g. interval level).
  - The independent variable(s) must be measured at a certain level (e.g. ordinal level).
  - The population is normally distributed (not skewed).

If your data do not meet the assumptions for a specific test, you may be able to use a non-parametric test instead.

Type of Data

Goal | Measurement (Normal Population) | Ordinal, or Non-normal Population | Binomial (Two Possible Outcomes) | Survival Time
Describe one group | Mean, SD | Median, interquartile range | Proportion | Kaplan Meier survival curve
Compare one group to a hypothetical value | One-sample t test | Wilcoxon test | Chi-square or Binomial test ** | -
Compare two unpaired groups | Unpaired t test | Mann-Whitney test | Fisher's test (chi-square for large samples) | Log-rank test or Mantel-Haenszel*
Compare two paired groups | Paired t test | Wilcoxon test | McNemar's test | Conditional proportional hazards regression*
Compare three or more unmatched groups | One-way ANOVA | Kruskal-Wallis test | Chi-square test | Cox proportional hazard regression**

Statistical Test Selection Group Exercise
• Using your tables, select the appropriate statistical tests for 10 research scenarios.

During the group exercise…

Steps to choose the appropriate statistical method for the data analysis:
1. Identify whether the research problem asks you to describe, relate (association), or compare (difference).
2. Identify the levels of measurement in the research question (nominal/categorical, ordinal/rank, continuous/evenly spaced).
3. Identify the number of variables, or samples, being described, related, or compared.
4. Identify whether the comparison samples are related (e.g. the same group analyzed before and after) or independent (unrelated, i.e. different groups).
5. Choose the appropriate statistical tool for the data and situation using the decision tree in the handout.
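Steps 3 and 4, combined with the level of data, amount to a lookup in the test-selection table. The Python sketch below encodes only the comparison rows of that table, with the footnote asterisks dropped. The dictionary keys, the data-type labels and `choose_test` are hypothetical conventions for illustration. The result should still be checked against the chosen test's assumptions.

```python
# Comparison rows of the test-selection table, keyed by (goal, type of data).
TEST_TABLE = {
    ("one group vs hypothetical value", "normal"): "One-sample t test",
    ("one group vs hypothetical value", "ordinal/non-normal"): "Wilcoxon test",
    ("one group vs hypothetical value", "binomial"): "Chi-square or Binomial test",
    ("two unpaired groups", "normal"): "Unpaired t test",
    ("two unpaired groups", "ordinal/non-normal"): "Mann-Whitney test",
    ("two unpaired groups", "binomial"): "Fisher's test (chi-square for large samples)",
    ("two unpaired groups", "survival"): "Log-rank test or Mantel-Haenszel",
    ("two paired groups", "normal"): "Paired t test",
    ("two paired groups", "ordinal/non-normal"): "Wilcoxon test",
    ("two paired groups", "binomial"): "McNemar's test",
    ("two paired groups", "survival"): "Conditional proportional hazards regression",
    ("three or more unmatched groups", "normal"): "One-way ANOVA",
    ("three or more unmatched groups", "ordinal/non-normal"): "Kruskal-Wallis test",
    ("three or more unmatched groups", "binomial"): "Chi-square test",
    ("three or more unmatched groups", "survival"): "Cox proportional hazard regression",
}


def choose_test(n_groups, related, data_type):
    """Pick a test from the number of samples, whether they are related, and the type of data.

    data_type is one of "normal", "ordinal/non-normal", "binomial" or "survival".
    """
    if n_groups < 1:
        raise ValueError("need at least one group")
    if n_groups == 1:
        goal = "one group vs hypothetical value"
    elif n_groups == 2:
        goal = "two paired groups" if related else "two unpaired groups"
    elif not related:
        goal = "three or more unmatched groups"
    else:
        raise ValueError("three or more matched groups are not covered by this part of the table")
    try:
        return TEST_TABLE[(goal, data_type)]
    except KeyError:
        raise ValueError(f"no entry for {goal!r} with {data_type!r} data") from None


print(choose_test(2, related=False, data_type="normal"))
print(choose_test(3, related=False, data_type="ordinal/non-normal"))
```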



1. A pilot experiment designed to test the effectiveness of a new approach to electrode placement for electroconvulsive therapy (ECT) has been conducted over a one-year period in the Fraser Health Authority.
Patients from two different mood disorder clinics participated in this study. Patients from Clinic X received ECT according to current practice guidelines. Patients from Clinic Y received a new exploratory ECT treatment. Patients in each clinic were matched for age, gender, and type of disorder. A random sample of 30 matched pairs of patients was selected for inclusion in the study. At the end of the year, patients were administered a memory test yielding a total score out of 100.
Dr. Vasdil would like to know which statistical procedure should be selected to test for differences between the two groups of patients on the memory test.
What is the question: Compare
How many samples: 2
Related or independent: Independent
What is the level of measurement: Continuous
How many dependent variables: 1
Test: T-test

Sample Size


There are several rules of thumb for determining
sample size.






1) It’s a good idea to have a minimum of 30 cases (as a total group, or if comparing
groups, 30 for each group).
If you have less you can use a non-parametric test, but it is still better to have close
to 30 cases.
2) If using regression, it is best to have between 10-50 cases per independent
variable.
3) If you are validating a survey, it is never good to have more questions than cases.
4) If the total population that you are examining is less than 30. Use all of them.
5) For pilot studies the recommendation is a sample size of 12 per group
6) For surveys, a sample size of 400 per group can do just about anything.



7) For surveys, a 30% response rate is the bare minimum.



Note: For a precise sample size estimate you will need to conduct a power analysis.






Statistical Power


Power is the capability of a statistical test to
correctly detect a significant effect if it exists.
 Assumes value between 0 and 1 (%)



Power= 1-B (B= probability of a Type II error).
Type II error – the error of not rejecting a false
research finding.
 Type I error- the error of rejecting a correct
research finding.

Types of Power
 A Priori-

Conducted before study
commences (at proposal stage).
 Post Hoc- After study has been completed.
 Easy




way to increase power?

Increase sample size
Increase Effect size

Components Involved in Power
Calculation



Sample Size- Number of cases.
Effect Size –Magnitude of the trend and
variation.
 Alpha Level- Odds of concluding that the
presence of an effect is due to chance alone (.05
or .01).





Also known as Type I Error, or the error of rejecting a
correct research finding

Power level- 80-90% common
One or two-tailed test- two tailed is common.

Components Involved in Power
Calculation
 Sample

Size- What we want to find out.
 Effect Size –Magnitude of the trend…but
what if you don’t know?



Look to pilot data or literature.
Keep in mind, the smaller the effect size, the
larger the sample size required.

 Alpha

Level- .05
 Power level- 80-90%

Important Consultation Information
 What

is your research question?
 Components of power calculation
 Levels of data (nominal, ordinal,
continuous)
 Sampling plan

Data Organization: Codebook
 What

is a codebook?
 A codebook is a log of your variables
(and levels of data) and how you will
code them.
 A codebook will help everyone
understand the coding schemes to
ensure that they are on the same page!

Data Processing and Analyses:
Codebook Example
Variable
Name

Variable
Label

Values

Coding

Missing

Variable
Type

age

age

1,2,3,4,5

1=10-20 years
2=21-30 years
3=31-40 years
4=41-50 years
5=51+ years

97=Incorrect
response
98=No response
99=Not
Applicable

Ordinal

sex

sex

1,2

1=male, 2=female

97=Incorrect
response
98=No response
99=Not
Applicable

Nominal

happiness

happiness
at
work

1,2,3

1=not happy
2=somewhat happy
3=very happy

97=Incorrect
response
98=No response
99=Not
Applicable

Ordinal

Spreadsheet Example
ID#

Age

Sex Happiness

1

1

1

2

2

2

2

2

3

3

1

2

4

57

2

2

5

45

2

3

6

66

2

3

7

2

2

3

8

88

2

3

Data Analysis with Excel
 Most

simple analyses can be done using
Excel, including correlation, regression
and even random number generation.
 Install the data analysis pack.


Go to tools, add-ins, and add the ‘analysis
tool pack’.

 Create

worksheet and codebook.
 Choose statistical test.


Follow commands in help menu.

Data Analysis with Excel


http://home.ubalt.edu/ntsbarsh/excel/excel.htm

Data Reporting and Presentation of
Data
 Graphical

summaries are a great way to
present your data
 Excel is great for creating tables and
graphs
 The type of data you have will reflect the
type of graphical summary you should
use.

Data Reporting and Presentation of
Descriptive Data


Categorical data:
Frequency Tables and
Bar Charts.
 Example: Fruit

Count

Percent

Valid
Percent

Pineapples

4

20%

21%

Apples

5

25%

26%

Oranges

10

50%

53%

Unknown

1

5%

_______

Total

20

100%

100%

Data Reporting and Presentation of
Descriptive Data
Fruit Study

Unknown
Oranges
Apples
Pineapples
0

5

10

Data Reporting and Presentation of
Descriptive Data


Continuous Data:
Tables and
Histograms
Age

Count

Percent

20-30

4

20%

31-40

5

25%

41-50

10

50%

51-60

1

5%

Total

20

100%

Data Reporting and Presentation of
Descriptive Data
10
8
20-30
31-40
41-50
51-60

6
4
2
0

20

30

40

50

What is the difference between a
Histogram and a Bar Chart?
 Histogram:

For continuous data where
data are divided into contiguous class
intervals (or in other words, connected
through unbroken sequence).

 Bar

Chart: For categorical data where
categories are not contiguous.

Measures of Central Tendency
 Reporting




 If

averages

Categorical data= Mode
Ordinal data= Median
Continuous data= Mean

there are outliers (or extreme values),
report the median instead of the mean.

Reporting Inferential Stats
 Handout


Resource- APA Guidelines

http://www.ilstu.edu/~jhkahn/apastats.html

Reporting Inferential Stats
 It’s

important to include means, standard
deviations and sample size in your results
section.

 Example:

Correlation
 Variable X was strongly correlated with
Variable Y, r=.59, p<.01.

Important to Keep your
Audience in Mind
Residency Project
Publication
Departmental Report

Aaron: TCPS certification for
residents reminder…

Questions?


Slide 51

Surviving Pharmacy
Residency Research: Tips
and Tricks for Statistical
Planning

Surviving Pharmacy Residency
Research: Tips and Tricks for
Statistical Planning
© Fraser Health Authority, 2007
The Fraser Health Authority (“FH”) authorizes the use, reproduction and/or
modification of this publication for purposes other than commercial redistribution. In
consideration for this authorization, the user agrees that any unmodified
reproduction of this publication shall retain all copyright and proprietary notices. If
the user modifies the content of this publication, all FH copyright notices shall be
removed, however FH shall be acknowledged as the author of the source
publication.
Reproduction or storage of this publication in any form by any means for the purpose
of commercial redistribution is strictly prohibited.
This publication is intended to provide general information only, and should not be
relied on as providing specific healthcare, legal or other professional advice. The
Fraser Health Authority, and every person involved in the creation of this publication,
disclaims any warranty, express or implied, as to its accuracy, completeness or
currency, and disclaims all liability in respect of any actions, including the results of
any actions, taken or not taken in reliance on the information contained herein.

FH Health Research Intelligence
Unit
How can we help?


Grant Facilitator-Writer
 Conducting a search for funding opportunities.
 Automatic notification of new funding sources and
deadlines.
 Identifying a research team.
 Preparing letters of intent.
 Identifying resources required for conducting research.
 Formulating the research budget.
 Writing the grant application in collaboration with
researchers.
 Understanding FH and funding agency requirements
regarding preparation of specific documents.

FH Health Research Intelligence
Unit
How can we help?











Epidemiologist
Specifying the research goal,
objectives and hypothesis.
Identifying measurable
outcomes.
Specifying the variables for
analysis.
Identifying sources of data.
Developing data collection
tools for quantitative or
qualitative studies.
Developing the statistical
analysis plan.
Understanding how to use
statistical software, such as
SPSS.

Workshop Outline










Research 101- Basic Research Steps
Research Question Refinement
Common Study Designs- Resource
Levels of Data
Power and Sample Size
Statistical Test Selection- Exercise
Data Reporting- Resource
Simple Stats with Excel- Resource

Pharmacy Residency Project










1) Develop a research question
2) Conduct thorough literature review
3) Re-define research question or hypothesis
4) Design research methodology/study
5) Create research proposal
6) Apply for funding
7) Apply for ethics approval
8) Collect and analyze data
9) Draw conclusions and relate findings

Research Question Refinement


Research question will describe in operational
terms, what you think will happen in the study.

Good Versus Bad Research
Question


Are patients who take
drug X more likely to
experience episodes
of delirium?



Do patients who
receive medication X
between September
2008 and November
2008 experience
more episodes of
delirium as compared
to patients who
received drug Y
during the same time
period?

Classification of Research
Studies
Observational Studies:
Descriptive Studies:
Focus on describing
populations and describing the
relationship between variables

Research Studies

Analytic Studies:
Observational
Descriptive

Experimental

Analytic

Make inferences about the
population based on a random
sample.
Experimental Studies:
Test relationships between
exposures and outcomes.
Investigator has direct control
over study condition and
exposure status.

Hierarchy of Studies
Type of study is selected
according to the purpose of
research.
Experimental Studies

Analytic Studies

Descriptive Studies

Levels of Evidence


HandoutResearch Design
Hierarchy

Probability Sampling Methods:
Random
 There

are several methods to choose

from:
 Simple random
sampling.

Probability Sampling Methods:
Stratified
 Stratified

sampling

(divide the population into
non-overlapping strata and
sample from within each
stratum independently).


Guarantees representation
of all important groups.

Probability Sampling Methods:
Systematic


Selection of the
sample using an
interval “k” so that
every “k” unit in the
frame is selected,
is called systematic
random sampling.

Probability Sampling Methods:
Systematic


Steps to achieve a systematic random sample:

1. Number the units in the population from 1 to N.
2. Decide on the n (sample size) that you want or need.


k = N/n = the interval size.

3. Randomly select an integer between 1 and k.
4. Then take every kth unit.
Example:
1. N=200
2. n=40, take N/n, 200/40=5 (interval size).
3. Randomly select a number between 1 and 5 (let’s pick 4).
4. Begin with 4, and take every 5th unit.

Probability Sampling Methods:
Cluster
 Cluster

sampling.
 Divide population into clusters and
randomly sample clusters.
 Measure all units within sampled clusters.
 Example: See blue areas on map.
Not just geographic areas,
could select hospitals,
schools etc.


Non-Probability Sampling
Methods


There are different types of non-probability
sampling methods as well:








Convenience (not representative of population).
Purposive (certain group in mind).
Expert sampling (seek out specific expertise).
Snowball sampling (ask people to participate, they
ask more people).

If you select non-probability sampling methods, the
conclusions drawn from the study results apply only
to that specific population.

Measurement: Levels of Data



The level of data will dictate which statistical test you
should use.
Categorical = Data that is classified into categories and
cannot be arranged in any particular order (e.g. Apples
and pears, gender, eye colour, ethnicity).



Ordinal = Data ordered, but distance between intervals
not always equal. (e.g. Low, middle and high income).



Continuous = equal distance between each interval
(e.g. 1,2,3., age).

Statistics and Statistical Test
Selection

Types of Statistics
 Descriptive

Statistics: Describes
 research findings


E.g. Frequencies, averages.

 Inferential

Statistics: Makes inferences
about the population, based on a random
sample.




In a random sample, each person/unit has an
equal chance of being selected
Allows generalizability to population.

Types of Variables


Variables can be classified as independent or
dependent.



An independent variable is the variable you
believe will influence your outcome measure.



A dependent variable is the variable that is
dependant on or influenced by independent
variable(s). The dependent variable can also
be the variable you are trying to predict.

Statistical Test Selection
Selecting the appropriate Statistical test requires
several steps:
Test selection should be based on:
1) What is your goal? Description? Comparison? Prediction? Quantify
association? Prove effectiveness? Prove causality?
2) What kind of data have you collected? What are the levels of data
(Nominal, ordinal, continuous)? Was your sample randomly selected?
3) Is your data normally distributed? Should you use a parametric or nonparametric test?
4) What are the assumptions of the statistical test you would like to
use? Does the data meet these assumptions?

Parametric Tests


Parametric tests assume that the variable in question is
from a normal distribution.



Non-parametric tests do not require the assumption of
normality.



Most non-parametric tests do not require an interval level
of measurement; can be used with nominal/ordinal level
data.

Assumptions







There are various assumptions for each test.
Before you select a test, be sure to check the assumptions of
each test.
You will need to contact a consultant, or review
statistical/research methods resources to find this information.
Some examples of common assumptions are:
 The dependent variable will need to be measured on a
certain level (i.e. Interval level).
 The independent variable(s) will need to be measured on a
certain level (i.e. Ordinal level).
 The population is normally distributed (not skewed).

If your data do not meet the assumptions for a specific test,
you may be able to use a non-parametric test instead.

Type of Data
Goal

Measurement
Normal
Population

Ordinal, or
NonNormal
Population
Median,
interquartil
e range
Wilcoxon test

Binomial
Survival Time
-Two
Possible
Outcomes
Proportion
Kaplan Meier
survival curve

Describe one
group

Mean, SD

Compare one
group to a
hypothetic
al value

One-sample t
test

Compare two
unpaired
groups

Unpaired t test

Mann-Whitney
test

Log-rank test or
Mantel-Haenszel*

Paired t test

Wilcoxon test

Fisher's test
(chisquare
for large
samples)
McNemar's
test

Compare two
paired
groups

Chi-square
test

Cox proportional
hazard
regression**

One-way
Compare
ANOVA
three or
more
unmatched
groups

Kruskal-Wallis
test

Chi-square
or
Binomial
test **

Conditional
proportional
hazards
regression*

Statistical Test Selection Group
Exercise

 Using

your tables, select the
appropriate statistical tests for 10
research scenarios.

During the group exercise…


Steps to choose the appropriate statistical method
for the data analysis:

1. Identify whether the research problem raises the
question of describe, relate (association), or compare
(difference).
2. Identify the levels of measurement in the research
question (Nominal/Categorical, Ordinal/Rank,
Continuous/Evenly spaced).
3. Identify the number of variables, or samples being
described, related, or compared.
4. Identify whether comparison samples are related
(analyze same group before and after) or independent
(not at all related, looking at different groups).
5. Choose the appropriate statistical tool for the data and
situation using the decision tree in the handout.



1. A pilot experiment designed to test the effectiveness of a new
approach to electrode placement for Electro Shock Therapy
(ECT) has been conducted over a one year time period in the
Fraser Health Authority.
Patients from two different mood disorder clinics participated in
this study. Patients from Clinic X received ECT therapy
according to current practice guidelines. Patients from Clinic Y
received a new exploratory ECT treatment. Patients in each
clinic were matched for age, gender, and type of disorder. A
random sample of 30 matched pairs of patients were selected
for inclusion in the study. At end of one year, patients were
administered a memory test yielding a total score out of 100.
Dr. Vasdil would like to know what statistical procedure needs
to be selected to test for differences among groups of patients
on the memory test.
What is the question: Compare
How many samples: 2
Related or independent: Independent
What is the level of measurement: Continuous
How many dependent variables: 1
Test: T-test

Sample Size


There are several rules of thumb for determining
sample size.






1) It’s a good idea to have a minimum of 30 cases (as a total group, or if comparing
groups, 30 for each group).
If you have less you can use a non-parametric test, but it is still better to have close
to 30 cases.
2) If using regression, it is best to have between 10-50 cases per independent
variable.
3) If you are validating a survey, it is never good to have more questions than cases.
4) If the total population that you are examining is less than 30. Use all of them.
5) For pilot studies the recommendation is a sample size of 12 per group
6) For surveys, a sample size of 400 per group can do just about anything.



7) For surveys, a 30% response rate is the bare minimum.



Note: For a precise sample size estimate you will need to conduct a power analysis.






Statistical Power

• Power is the probability that a statistical test will detect an effect when one truly exists.
• It takes a value between 0 and 1 (often expressed as a percentage).
• Power = 1 - β, where β is the probability of a Type II error.
• Type II error (β): failing to reject the null hypothesis when it is false, i.e., missing an effect that really exists.
• Type I error (α): rejecting the null hypothesis when it is true, i.e., concluding that there is an effect when there is none.
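The relationship Power = 1 - β can be made concrete with the normal approximation for comparing two group means. The minimal sketch below assumes a two-sided test, equal group sizes and an effect size expressed as Cohen's d (difference in means divided by the standard deviation). The helper name `approx_power` is hypothetical.

```python
import math
from statistics import NormalDist


def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample comparison of means.

    Uses the normal approximation and ignores the (negligible) opposite tail.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    shift = effect_size * math.sqrt(n_per_group / 2)  # expected test statistic under the effect
    return NormalDist().cdf(shift - z_crit)


power = approx_power(0.5, 63)
beta = 1 - power  # probability of a Type II error
print(f"power = {power:.2f}, beta = {beta:.2f}")
```

For a medium effect (d = 0.5) with 63 cases per group, power is about 0.80, so β is about 0.20. Increasing the number of cases per group raises power.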

Types of Power Analysis
• A priori: conducted before the study begins (at the proposal stage).
• Post hoc: conducted after the study has been completed.
• Easy ways to increase power:
  - Increase the sample size.
  - Increase the effect size.

Components Involved in Power Calculation
• Sample size: the number of cases.
• Effect size: the magnitude of the effect (e.g., a difference between means) relative to its variability.
• Alpha level: the probability of concluding that an effect exists when the observed result is actually due to chance alone (usually .05 or .01). Also known as the Type I error rate.
• Power level: 80-90% is common.
• One- or two-tailed test: two-tailed is common.

Components Involved in Power Calculation
• Sample size: what we want to find out.
• Effect size: the magnitude of the effect... but what if you don’t know it?
  - Look to pilot data or the literature.
  - Keep in mind: the smaller the effect size, the larger the sample size required.
• Alpha level: .05
• Power level: 80-90%
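The components above combine into the standard normal-approximation formula for comparing two means. The number of cases per group is n = 2(z_alpha + z_beta)^2 / d^2. Here z_alpha is the standard normal quantile for 1 - α/2 (or 1 - α for a one-tailed test), z_beta is the quantile for the desired power, and d is the effect size (Cohen's d). The minimal sketch below assumes equal group sizes. The function name `n_per_group` is hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80, two_tailed=True):
    """Approximate cases needed per group to compare two means (normal approximation).

    The result is rounded up to a whole case.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2) if two_tailed else z.inv_cdf(1 - alpha)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)


for d in (0.8, 0.5, 0.2):  # conventional large, medium and small effect sizes
    print(d, n_per_group(d))
```

With the defaults (alpha = .05, two-tailed, power = 80%), this gives 25, 63 and 393 cases per group. The smaller the effect, the larger the sample. Tools such as G*Power use the t distribution and give a slightly larger answer, for example 64 rather than 63 per group for d = 0.5. Treat this sketch as a first estimate to bring to a consultation.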

Important Consultation Information
• What is your research question?
• Components of the power calculation
• Levels of data (nominal, ordinal, continuous)
• Sampling plan

Data Organization: Codebook
• What is a codebook?
• A codebook is a log of your variables (and their levels of data) and how you will code them.
• A codebook helps everyone understand the coding schemes, so that the whole team is on the same page!

Data Processing and Analyses: Codebook Example
Variable name | Variable label | Values | Coding | Missing | Variable type
age | age | 1,2,3,4,5 | 1=10-20 years, 2=21-30 years, 3=31-40 years, 4=41-50 years, 5=51+ years | 97=Incorrect response, 98=No response, 99=Not applicable | Ordinal
sex | sex | 1,2 | 1=male, 2=female | 97=Incorrect response, 98=No response, 99=Not applicable | Nominal
happiness | happiness at work | 1,2,3 | 1=not happy, 2=somewhat happy, 3=very happy | 97=Incorrect response, 98=No response, 99=Not applicable | Ordinal
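If the analysis is scripted, the codebook example above can also be kept as data so that codes are translated the same way everywhere. The minimal sketch below uses hypothetical names (`CODEBOOK`, `MISSING_CODES`, `decode`). It treats 97-99 as missing-value codes, as in the example.

```python
# Missing-value codes shared by every variable in the codebook example.
MISSING_CODES = {97: "Incorrect response", 98: "No response", 99: "Not applicable"}

# Variable name -> label, level of measurement and value coding.
CODEBOOK = {
    "age": {
        "label": "age",
        "type": "Ordinal",
        "values": {1: "10-20 years", 2: "21-30 years", 3: "31-40 years",
                   4: "41-50 years", 5: "51+ years"},
    },
    "sex": {"label": "sex", "type": "Nominal", "values": {1: "male", 2: "female"}},
    "happiness": {
        "label": "happiness at work",
        "type": "Ordinal",
        "values": {1: "not happy", 2: "somewhat happy", 3: "very happy"},
    },
}


def decode(variable, code):
    """Translate a stored numeric code into its meaning; reject codes not in the codebook."""
    entry = CODEBOOK[variable]
    if code in entry["values"]:
        return entry["values"][code]
    if code in MISSING_CODES:
        return f"missing ({MISSING_CODES[code]})"
    raise ValueError(f"{code!r} is not a valid code for {variable}")


print(decode("happiness", 3))
print(decode("sex", 98))
```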

Spreadsheet Example
ID# | Age | Sex | Happiness
1 | 1 | 1 | 2
2 | 2 | 2 | 2
3 | 3 | 1 | 2
4 | 57 | 2 | 2
5 | 45 | 2 | 3
6 | 66 | 2 | 3
7 | 2 | 2 | 3
8 | 88 | 2 | 3
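A codebook also makes data entry easy to check. The minimal sketch below compares every cell of the spreadsheet example above with the codes allowed by the codebook example (including the missing-value codes 97-99) and lists anything else. The names `ALLOWED` and `find_invalid` are hypothetical. Applied to these rows, it flags the Age entries such as 57 that are not among the codebook's age codes 1-5.

```python
# Allowed codes per column, taken from the codebook example (97-99 are missing-value codes).
MISSING = {97, 98, 99}
ALLOWED = {
    "Age": {1, 2, 3, 4, 5} | MISSING,
    "Sex": {1, 2} | MISSING,
    "Happiness": {1, 2, 3} | MISSING,
}
COLUMNS = ("Age", "Sex", "Happiness")

# Rows from the spreadsheet example: (ID#, Age, Sex, Happiness).
rows = [
    (1, 1, 1, 2), (2, 2, 2, 2), (3, 3, 1, 2), (4, 57, 2, 2),
    (5, 45, 2, 3), (6, 66, 2, 3), (7, 2, 2, 3), (8, 88, 2, 3),
]


def find_invalid(rows):
    """Return (ID, column, value) for every cell whose value the codebook does not allow."""
    problems = []
    for record_id, *values in rows:
        for column, value in zip(COLUMNS, values):
            if value not in ALLOWED[column]:
                problems.append((record_id, column, value))
    return problems


for problem in find_invalid(rows):
    print(problem)
```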

Data Analysis with Excel
• Most simple analyses can be done in Excel, including correlation, regression and even random number generation.
• Install the Analysis ToolPak add-in.
  - Go to Tools > Add-Ins and tick 'Analysis ToolPak' (in Excel 2010 and later: File > Options > Add-Ins, then manage Excel Add-ins).
• Create your worksheet and codebook.
• Choose the statistical test.
  - Follow the instructions in the Help menu.

Data Analysis with Excel


http://home.ubalt.edu/ntsbarsh/excel/excel.htm

Data Reporting and Presentation of Data
• Graphical summaries are a great way to present your data.
• Excel is well suited to creating tables and graphs.
• The type of data you have determines the type of graphical summary you should use.

Data Reporting and Presentation of Descriptive Data
• Categorical data: frequency tables and bar charts.
• Example: Fruit (valid percent leaves out the Unknown responses)
Fruit | Count | Percent | Valid percent
Pineapples | 4 | 20% | 21%
Apples | 5 | 25% | 26%
Oranges | 10 | 50% | 53%
Unknown | 1 | 5% | -
Total | 20 | 100% | 100%
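Percent and valid percent differ only in the denominator. Percent divides by all 20 cases, while valid percent divides by the 19 cases with a known answer. The minimal sketch below reproduces the fruit table. The function name `frequency_table` is hypothetical.

```python
def frequency_table(counts, missing=()):
    """Return {category: (count, percent, valid_percent)}.

    Percent uses all cases; valid percent leaves out the missing categories
    (valid_percent is None for those rows).
    """
    total = sum(counts.values())
    valid_total = sum(c for k, c in counts.items() if k not in missing)
    table = {}
    for category, count in counts.items():
        percent = 100 * count / total
        valid_percent = None if category in missing else 100 * count / valid_total
        table[category] = (count, percent, valid_percent)
    return table


fruit = {"Pineapples": 4, "Apples": 5, "Oranges": 10, "Unknown": 1}
table = frequency_table(fruit, missing={"Unknown"})
for category, (count, percent, valid_percent) in table.items():
    valid_text = "-" if valid_percent is None else f"{valid_percent:.0f}%"
    print(f"{category:<11} {count:>3} {percent:>4.0f}% {valid_text:>5}")
```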

Data Reporting and Presentation of
Descriptive Data
[Figure: bar chart titled "Fruit Study" showing the count for each category (Pineapples, Apples, Oranges, Unknown) on an axis from 0 to 10.]

Data Reporting and Presentation of Descriptive Data
• Continuous data: tables and histograms.
Age | Count | Percent
20-30 | 4 | 20%
31-40 | 5 | 25%
41-50 | 10 | 50%
51-60 | 1 | 5%
Total | 20 | 100%
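A table like the age table above starts from raw continuous values that are grouped into contiguous class intervals. The minimal sketch below assumes that ages are recorded as whole years, so the intervals 20-30, 31-40 and so on leave no gaps. The ages and the helper name `age_class` are hypothetical.

```python
from collections import Counter

# Class intervals from the age table (inclusive bounds, whole years).
BINS = [(20, 30), (31, 40), (41, 50), (51, 60)]


def age_class(age):
    """Return the label of the interval that contains age, e.g. '31-40'."""
    for low, high in BINS:
        if low <= age <= high:
            return f"{low}-{high}"
    raise ValueError(f"age {age} is outside the table's range")


# Hypothetical raw ages for illustration only.
ages = [23, 29, 35, 44, 47, 52, 38, 41]
counts = Counter(age_class(a) for a in ages)
for low, high in BINS:
    label = f"{low}-{high}"
    n = counts[label]
    print(f"{label}: {n} ({100 * n / len(ages):.0f}%)")
```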

Data Reporting and Presentation of
Descriptive Data
[Figure: histogram of age in the class intervals 20-30, 31-40, 41-50 and 51-60; y-axis: count (0 to 10).]

What is the difference between a Histogram and a Bar Chart?
• Histogram: for continuous data divided into contiguous class intervals (i.e., intervals that follow one another without gaps), so the bars touch.
• Bar chart: for categorical data, where the categories are not contiguous, so the bars are drawn separately.

Measures of Central Tendency
• Reporting averages:
  - Categorical data = Mode
  - Ordinal data = Median
  - Continuous data = Mean
• If there are outliers (or extreme values), report the median instead of the mean.
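The rules above can be seen with Python's `statistics` module. The minimal sketch below uses hypothetical data for each level of measurement. The continuous example includes one extreme value, which pulls the mean far above the median.

```python
import statistics

eye_colour = ["brown", "blue", "brown", "green"]  # categorical: report the mode
pain_rating = [1, 2, 2, 3, 3, 3, 4]               # ordinal (1 = none ... 4 = severe): median
length_of_stay = [2, 3, 3, 4, 100]                # continuous (days) with one outlier

print(statistics.mode(eye_colour))        # most frequent category
print(statistics.median(pain_rating))     # middle value of the ordered ratings
print(statistics.mean(length_of_stay))    # pulled upward by the 100-day stay
print(statistics.median(length_of_stay))  # unaffected by the outlier
```

Here the mean length of stay is 22.4 days, but the median is 3 days. The median better describes a typical patient.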

Reporting Inferential Stats
• Handout
• Resource: APA Guidelines
  http://www.ilstu.edu/~jhkahn/apastats.html

Reporting Inferential Stats
• It’s important to include means, standard deviations and sample sizes in your results section.
• Example (correlation): Variable X was strongly correlated with Variable Y, r = .59, p < .01.
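Reporting strings like the example above follow a few common APA conventions. Degrees of freedom go in parentheses (N - 2 for a Pearson correlation). Statistics that cannot exceed 1, such as r and p, are written without a leading zero. Exact p values are given to two or three decimals, with p < .001 used for very small values. The minimal sketch below applies those conventions. The helper names and the example p value and sample size are hypothetical.

```python
def apa_number(value, decimals=2):
    """Format a statistic that cannot exceed 1 (r, p) without the leading zero."""
    text = f"{value:.{decimals}f}"
    return text.replace("0.", ".", 1) if abs(value) < 1 else text


def report_correlation(r, p, n):
    """Write a Pearson correlation as r(df) = .xx, p = .xxx, where df = n - 2."""
    p_text = "p < .001" if p < 0.001 else f"p = {apa_number(p, 3)}"
    return f"r({n - 2}) = {apa_number(r)}, {p_text}"


print(report_correlation(0.59, 0.004, 30))  # hypothetical p value and sample size
```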

Important to Keep Your Audience in Mind
• Residency project
• Publication
• Departmental report

Aaron: TCPS certification for
residents reminder…

Questions?

