neuron.mefst.hr


Ana Jerončić, PhD
Department for Research in Biomedicine and Health
E-mail: [email protected]
Location: main building, 5th floor, room 512
Phone: 557-862
1. Describing data - Central tendency and variability
2. Estimation - Accuracy, precision, standard error, confidence intervals
3. Hypothesis testing - Test statistics, P-value, choice of a statistical test
4. Interpretation of data - Causality and association, odds ratio, risk, correlation, linear regression
5. Sources of error - Type 1 and type 2 errors, power, bias, confounding
The focus is on critical appraisal of scientific papers,
NOT on implementation of data analysis.



To identify the best available treatment
To prevent “medical zombies”
To perform your own research
1. How the data should be organized prior to data analysis
2. Data types
3. Graphical & tabular techniques for description, summary statistics


Qualitative Data
Quantitative Data
Height measurements among 1st year medical students
157 150 167 159 146 147 146 189 145 141 197 172 160
204 193 159 147 169 166 151 204 161 163 141 143 173
184 205 187 144 198 167 203 189 173 179 146 151 187
186 150 173 204 164 180 171 200 155 195 202 200 172
197 161 146 184 182 169 186 202 203 155 149 197 177
155 169 179 192 165 174 179 147 190 197 197 192 179
169 147 201 165 173 201 152 181 164 151 203 192 188


What is the unit of measurement?
How many observations per subject?

Columns are VARIABLES, rows are OBSERVATIONS:

Entity     Height (cm)   Weight (kg)   Age (years)   Sex (category)
Person 1   176           70            33            Male
Person 2   171           60            38            Female
Person 3   182           75            62            Male
*          *             *             *             *
*          *             *             *             *

Each cell holds one measurement/observation.
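The layout above (one row per entity, one column per variable) can be sketched in code. This is only an illustration: the class name `Observation` and its field names are made up for this example, and the values are the three persons from the table.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """One row of the data table: all variables measured on one entity."""
    entity: str
    height_cm: int
    weight_kg: int
    age_years: int
    sex: str  # categorical variable


# One record per person (values taken from the example table).
observations = [
    Observation("Person 1", 176, 70, 33, "Male"),
    Observation("Person 2", 171, 60, 38, "Female"),
    Observation("Person 3", 182, 75, 62, "Male"),
]

# A "column" (variable) is the same field read across all rows.
heights_cm = [obs.height_cm for obs in observations]
print(heights_cm)
```

Keeping one row per entity makes it easy to pull out a single variable for description or plotting.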
Variable               Features of variables                      Example                Descriptive statistics   Informativeness level
Categorical, nominal   Unordered/unarranged categories            Gender, urbanization   Number, proportion       Low
Categorical, ordinal   Ordered/arranged categories                Grades, scales         Median                   Medium
Numerical              Arranged categories with equal intervals   Height, weight         Mean or median           High

Categorical (nominal, ordinal) = qualitative
Numerical = quantitative






Height
Grades
Age in years
Weight
Insulin concentration
Blood glucose
How many cigarettes do you smoke a day?
☐ 1-5
☐ 6-10
☐ 11-15
☐ 16-20
☐ 21 and more
Have you ever had a heart attack?
☐ Yes
☐ No
Do you suffer from hypertension?
☐ Yes
☐ No
☐ ?
Gender:
☐ Male
☐ Female
Marital status:
☐ married
☐ divorced
☐ widowed
☐ single
☐ lives alone
☐ ?
Education:
☐ elementary school
☐ high school
☐ two-year college
☐ four-year college
☐ ?

Likert scale

Claim: Violence among the youth is becoming an increasing problem in Croatia.
1  I agree completely
2  I agree
3  Undecided
4  I disagree
5  I strongly disagree

Visual analogue scale

E.g. the pain level that the examinee experiences:
I don’t feel pain |--------------------| I feel intolerable pain
Numerical: distance is meaningful
Ordinal: attributes can be ordered
Nominal: attributes are only named; weakest
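As a rough sketch of how the measurement level determines which summary makes sense, the hypothetical helper `summarize` below counts nominal data, takes the median of ordinal data, and reports mean and median for numerical data. The function name and the grade values are assumptions made for illustration; the sex and height values are those of Persons 1 to 3.

```python
from collections import Counter
from statistics import mean, median, median_low


def summarize(values, level):
    """Return a summary that is meaningful for the given measurement level."""
    if level == "nominal":
        # Categories are only named, so they can only be counted.
        return dict(Counter(values))
    if level == "ordinal":
        # Categories can be ordered; median_low returns an observed category.
        return median_low(values)
    if level == "numerical":
        # Distances are meaningful, so the mean is allowed as well.
        return {"mean": mean(values), "median": median(values)}
    raise ValueError(f"unknown measurement level: {level}")


sex = ["Male", "Female", "Male"]   # Persons 1 to 3
grades = [2, 4, 3, 5, 4]           # hypothetical grades, for illustration only
heights_cm = [176, 171, 182]       # Persons 1 to 3

print(summarize(sex, "nominal"))
print(summarize(grades, "ordinal"))
print(summarize(heights_cm, "numerical"))
```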
Person No.   Height [cm]
Person 1     148
Person 2     142
Person 3     154
Person 4     153
Person 5     160
Person 6     177
Person 7     204
Person 8     192
Person 9     191
Person 10    203
Person 11    197
Person 12    202
Person 13    177
Organized data are the input for graphical & tabular data representations.
In one study, researchers investigated the genotype of the YPEL5 gene in a
population sample from Split. They obtained the following results for 10
examinees:

YPEL5
Individual   Genotype
1            AA
2            BB
3            BB
4            BB
5            AB
6            AB
7            BB
8            AA
9            AB
10           BB
Table: Frequency distribution of YPEL5 genotypes

Genotype   Frequency   Relative frequency (proportion)   Relative frequency [%] (percentage)
AA         2           0.2                               20%
AB         3           0.3                               30%
BB         5           0.5                               50%
Total      10          1.00                              100%
Bar charts are often used to display frequencies…
[Figure: bar chart of YPEL5 genotype; y-axis: frequency (counts or percentages, 0 to 5); x-axis: category names (BB, AA, AB)]



For nominal data, the only allowable calculation is to count the
frequency of each category.
We can summarize the data in a table that presents the categories
and their counts, called a frequency distribution.
A relative frequency distribution lists the categories
and the proportion with which each occurs.
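A minimal sketch of building a frequency distribution and a relative frequency distribution, using the ten YPEL5 genotypes listed earlier. The function name `frequency_distribution` is an assumption for this example, not part of the lecture.

```python
from collections import Counter

# Genotypes of individuals 1 to 10 from the YPEL5 example.
genotypes = ["AA", "BB", "BB", "BB", "AB", "AB", "BB", "AA", "AB", "BB"]


def frequency_distribution(observations):
    """Map each category to (count, proportion of all observations)."""
    counts = Counter(observations)
    n = len(observations)
    return {category: (count, count / n)
            for category, count in sorted(counts.items())}


for category, (count, proportion) in frequency_distribution(genotypes).items():
    # Count, proportion, and percentage, as in the frequency table.
    print(f"{category}  {count}  {proportion:.2f}  {proportion:.0%}")
```

The counts reproduce the table: AA 2, AB 3, BB 5.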
Nominal data have no order. However, it is sometimes useful to arrange the
outcomes from the most frequently occurring to the least frequently
occurring. This bar chart representation is called a “Pareto chart”.
[Figure: Pareto chart; y-axis: counts; x-axis: category names]
A chart with relative frequencies is more informative.
[Figure: Pareto chart; y-axis: percentages; x-axis: category names]
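The ordering step behind such a chart can be sketched as follows, again with the ten YPEL5 genotypes. `Counter.most_common()` returns the categories sorted from most to least frequent; the text bars are only a stand-in for a real plot.

```python
from collections import Counter

# Genotypes of individuals 1 to 10 from the YPEL5 example.
genotypes = ["AA", "BB", "BB", "BB", "AB", "AB", "BB", "AA", "AB", "BB"]

# Categories ordered from the most to the least frequent.
pareto_order = Counter(genotypes).most_common()

n = len(genotypes)
for category, count in pareto_order:
    # One '#' per observation, followed by the count and the percentage.
    print(f"{category} {'#' * count} {count} ({count / n:.0%})")
```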
Pie Chart
Pie charts show relative frequencies…
[Figure: pie chart of YPEL5 genotypes: BB 50%, AB 30%, AA 20%]



Authors can use percentages to hide the true size of
the data.
Saying that 50% of a sample has a certain condition
when there are only four people in the sample clearly
does not provide the same level of information as 50% of a
sample of 400 people.
So, percentages should be used as an additional help for
the reader rather than as a replacement for the actual data.
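One simple way to follow this advice is to always print the count and the sample size next to the percentage. The helper name `describe_share` is made up for this sketch.

```python
def describe_share(cases, sample_size):
    """Report a count with its sample size and percentage, e.g. '2/4 (50%)'."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return f"{cases}/{sample_size} ({cases / sample_size:.0%})"


# The same 50% reads very differently once the sample size is visible.
print(describe_share(2, 4))
print(describe_share(200, 400))
```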
Height measurements among 1st year medical students
Individual   Height (cm)
1            186
2            144
3            175
4            199
5            149
6            157
7            150
8            176
9            179
10           165
11           151
12           164
13           167
14           175
15           191
16           163
17           187
18           176
19           184
20           191
21           172
22           151
23           179
Frequency distribution for quantitative data:
Building a Histogram

Category limits [cm]   Freq.   Relative freq.   Percent relative freq.
>140; <=150            3       0.13             13%
>150; <=160            3       0.13             13%
>160; <=170            4       0.17             17%
>170; <=180            7       0.30             30%
>180; <=190            5       0.22             22%
>190; <=200            1       0.04             4%
Total                  23      1.00             100%
[Figure: histogram, "Frequency distribution of height"; y-axis: percent relative frequency (0% to 35%); x-axis: Height [cm], class midpoints 145, 155, 165, 175, 185, 195]
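The class counts behind a histogram can be sketched as below, assuming right-closed classes of width 10 cm starting above 140 cm (as the first class, >140; <=150, suggests). The function name `class_counts` and its parameters are assumptions for this example. Run on the 23 heights exactly as listed, the last two classes come out as 3 and 3 rather than the 5 and 1 shown in the table, so the listed values and the table appear to differ slightly.

```python
import math

# The 23 height measurements as listed (individuals 1 to 23).
heights_cm = [186, 144, 175, 199, 149, 157, 150, 176, 179, 165, 151, 164,
              167, 175, 191, 163, 187, 176, 184, 191, 172, 151, 179]


def class_counts(values, lower=140, width=10, n_classes=6):
    """Count values in right-closed classes (lower, lower + width], ..."""
    counts = [0] * n_classes
    for value in values:
        # ceil puts 150 into class 0 (>140; <=150) and 151 into class 1.
        index = math.ceil((value - lower) / width) - 1
        if 0 <= index < n_classes:
            counts[index] += 1
    return counts


counts = class_counts(heights_cm)
n = len(heights_cm)
for i, count in enumerate(counts):
    low = 140 + 10 * i
    print(f">{low}; <={low + 10}  {count}  {count / n:.2f}  {count / n:.0%}")
```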



There are several graphical methods that are
used when the data are quantitative (numeric).
The most important of these graphical
methods is the histogram.
The histogram is not only a powerful graphical
technique used to summarize interval data, but
it is also used to help explain probabilities.

http://www.shodor.org/interactivate/activities/Histogram/

Qualitative:
  Frequency Distribution – tabular summary of data
  Bar Chart
  Pie Chart

Quantitative:
  Frequency Distribution – tabular summary of data
  Histogram
  Line Chart (Time-Series Plot)
  Stem and Leaf Display

To compare two variables we use:
  Scatter plot/diagram (quantitative)
  Cross table (qualitative)

Scatter plot, showing the strong association between enzyme
activity at pH 5.5 and the 5α-reductase 2-specific mRNA
expression, as expressed on the basis of β-actin (n = 30; rs = 0.81;
95% confidence interval, 0.64–0.91; P < 0.0001).
Linearity and direction are two concepts we are
interested in.
[Figure: three example scatter plots: positive linear relationship, negative linear relationship, weak or non-linear relationship]
Squamous cell carcinoma tumor and perilesional tissue display distinctly
different scatter plots from normal tissue.
Expression levels for gene subset 1 in patient 1


A cross table is used to compare two qualitative
variables.
If the first variable has r categories and the second
variable has c categories, then we have an r×c
cross table.
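Frequency of YPEL5 genotype by Disease X status

YPEL5 genotype   Disease X: YES   Disease X: NO   TOTAL
AA               2                0               2
AB               1                3               4
BB               0                4               4
TOTAL            3                7               10

[Figure: bar chart of frequency (0 to 4) by YPEL5 genotype (AA, AB, BB), with separate bars for Disease and Healthy]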
Based on data presented do you think that
YPEL5 could be associated with disease X?
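A minimal sketch of how such a cross table can be tallied from individual records. It assumes the cell counts read off the table (AA: 2 yes, 0 no; AB: 1 yes, 3 no; BB: 0 yes, 4 no) and expands them into one (genotype, disease status) pair per examinee; the function name `cross_table` is an assumption for this example. With only 10 examinees, the row percentages are a rough description, not evidence of an association.

```python
from collections import Counter

# One (genotype, has disease X) pair per examinee, expanded from the cell
# counts assumed above.
records = ([("AA", "yes")] * 2 + [("AB", "yes")] * 1 +
           [("AB", "no")] * 3 + [("BB", "no")] * 4)


def cross_table(pairs, row_levels, col_levels):
    """Tally pairs into an r x c table stored as nested dictionaries."""
    counts = Counter(pairs)
    return {row: {col: counts[(row, col)] for col in col_levels}
            for row in row_levels}


table = cross_table(records, ["AA", "AB", "BB"], ["yes", "no"])
for genotype, row in table.items():
    total = sum(row.values())
    # Row percentage: share of examinees with this genotype who have disease X.
    print(f"{genotype}: yes={row['yes']} no={row['no']} total={total} "
          f"diseased={row['yes'] / total:.0%}")
```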
The results of measuring the height among med. students (the same 23 measurements as listed under "Height measurements among 1st year medical students")
[Figure: height [cm] (y-axis, 0 to 250) plotted for each subject (x-axis, subjects 1 to 23)]
[Figure: same data with reshuffled subjects; same axes]