Introduction to Biostatistics (1st and 2nd lectures)


INTRODUCTION TO
BIOSTATISTICS
Dr. S. Shaffi Ahamed
Asst. Professor
Dept. of Family and Comm. Medicine
KKUH
This session covers:
- Background and the need to know biostatistics
- Origin and development of biostatistics
- Definition of statistics and biostatistics
- Types of data
- Graphical representation of data
- Frequency distribution of data
“Statistics is the science which deals with collection, classification and tabulation of numerical facts as the basis for explanation, description and comparison of phenomena.”
(Lovitt)
“BIOSTATISTICS”
(1) Statistics arising out of the biological sciences, particularly from the fields of medicine and public health.
(2) The methods used in dealing with statistics in the fields of medicine, biology and public health for planning, conducting and analyzing data that arise in investigations in these branches.
Origin and development of statistics in Medical Research
- In 1929, Dunn published a lengthy paper on the application of statistics in a physiology journal.
- In 1937, 15 articles on statistical methods by Austin Bradford Hill were published in book form.
- In 1948, a randomized controlled trial (RCT) of streptomycin for pulmonary tuberculosis was published, in which Bradford Hill had a key influence.
- The use of statistics in medicine then grew rapidly, increasing 8-fold between 1952 and 1982.
Douglas Altman, Gauss, Ronald Fisher, Karl Pearson, C.R. Rao
Basis
Sources of Medical Uncertainties
1. Intrinsic variation due to biological, environmental and sampling factors
2. Natural variation among methods, observers, instruments, etc.
3. Errors in measurement or assessment, or errors in knowledge
4. Incomplete knowledge
Intrinsic variation as a source of medical uncertainties
- Biological: due to age, gender, heredity, parity, height, weight, etc.; also due to variation in anatomical, physiological and biochemical parameters
- Environmental: due to nutrition, smoking, pollution, water and sanitation facilities, road traffic, legislation, stress and strain, etc.
- Sampling fluctuations: the entire world cannot be studied, and future cases, at least, can never be included
- Chance variation: due to factors that are unknown or too complex to comprehend
Natural variation despite best care as a source of uncertainties
- In the assessment of any medical parameter
- Due to partial compliance by patients
- Due to incomplete information in conditions such as when the patient is in a coma
Medical Errors that cause Uncertainties
- Carelessness of providers such as physicians, surgeons, nursing staff, radiographers and pharmacists
- Errors in methods, such as using an incorrect quantity or quality of chemicals and reagents, misinterpreting an ECG, using inappropriate diagnostic tools, misrecording information, etc.
- Instrument error due to the use of non-standardized or faulty instruments, or improper use of the right instrument
- Not collecting full information
- Inconsistent responses by patients or other subjects under evaluation
Incomplete knowledge as a source of Uncertainties
- Diagnostic, therapeutic and prognostic uncertainties due to lack of knowledge
- Predictive uncertainties, such as in the survival duration of a cancer patient
- Other uncertainties, such as how to measure positive health
Biostatistics is the science that helps in managing medical uncertainties.
Reasons to know about biostatistics:
- Medicine is becoming increasingly quantitative.
- The planning, conduct and interpretation of much of medical research are becoming increasingly reliant on statistical methodology.
- Statistics pervades the medical literature.
CLINICAL MEDICINE
- Documentation of the medical history of diseases
- Planning and conduct of clinical studies
- Evaluating the merits of different procedures
- Providing methods for defining “normal” and “abnormal”
Role of Biostatistics in patient care
- In increasing awareness of diagnostic, therapeutic and prognostic uncertainties, and providing rules of probability to delineate those uncertainties
- In providing methods to integrate chances with value judgments that could be most beneficial to the patient
- In providing methods such as sensitivity, specificity and predictive values that help choose valid tests for patient assessment
- In providing tools such as scoring systems and expert systems that can help reduce epistemic uncertainties
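Sensitivity, specificity and predictive values are simple ratios taken from a 2 × 2 table of test result against true disease status. A minimal Python sketch, assuming hypothetical counts (the class name `TwoByTwo`, the field names and the numbers are illustrative only, not data from this lecture):

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts from a diagnostic 2 x 2 table (test result vs true disease status)."""
    tp: int  # test positive, disease present
    fp: int  # test positive, disease absent
    fn: int  # test negative, disease present
    tn: int  # test negative, disease absent

    def sensitivity(self) -> float:
        # Proportion of diseased people correctly detected by the test
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Proportion of non-diseased people correctly classified as negative
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        # Positive predictive value: P(disease | test positive)
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        # Negative predictive value: P(no disease | test negative)
        return self.tn / (self.tn + self.fn)


# Hypothetical counts, for illustration only
table = TwoByTwo(tp=90, fp=30, fn=10, tn=870)
print(f"Sensitivity {table.sensitivity():.2f}")  # Sensitivity 0.90
print(f"Specificity {table.specificity():.2f}")  # Specificity 0.97
print(f"PPV {table.ppv():.2f}")                  # PPV 0.75
print(f"NPV {table.npv():.2f}")                  # NPV 0.99
```

Unlike sensitivity and specificity, the predictive values depend on the prevalence of disease in the group being tested.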
PREVENTIVE MEDICINE
- To measure the magnitude of any health problem in the community
- To find out the basic factors underlying ill-health
- To evaluate the success or failure of health programs introduced in the community
- To introduce and promote health legislation
Role of Biostatistics in Health Planning and Evaluation
- In carrying out a valid and reliable health situation analysis, including proper summarization and interpretation of data
- In proper evaluation of the achievements and failures of a health programme
Role of Biostatistics in Medical Research
- In developing a research design that can minimize the impact of uncertainties
- In assessing the reliability and validity of tools and instruments used to collect the information
- In proper analysis of data
Example: Evaluation of penicillin (treatment A) vs penicillin and chloramphenicol (treatment B) for treating bacterial pneumonia in children under 2 years of age.
- What sample size is needed to demonstrate a significant difference between the two groups?
- Is treatment A better than treatment B, or vice versa?
- If so, how much better?
- What is the normal variation in clinical measurements (mild, moderate and severe)?
- How reliable and valid are the measurements (clinical and radiological)?
- What is the magnitude and effect of laboratory and technical error?
- How does one interpret abnormal values?
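The first question, sample size, is answered before the study starts. A minimal sketch, assuming the outcome is the proportion of children cured and using the common normal-approximation formula for comparing two proportions, n per group = [z(1-α/2)·√(2·p̄·q̄) + z(1-β)·√(p1·q1 + p2·q2)]² / (p1 - p2)², where p̄ is the average of p1 and p2 and q = 1 - p. The cure rates of 60% and 80%, the 5% two-sided significance level and 80% power are hypothetical choices, not figures from this study, and the function name is illustrative.

```python
import math
from statistics import NormalDist


def sample_size_two_proportions(p1: float, p2: float,
                                alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate subjects per group to detect p1 vs p2 with a two-sided test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    # Round up: a fraction of a patient cannot be recruited
    return math.ceil(numerator / (p1 - p2) ** 2)


# Hypothetical cure rates: 60% with treatment A, 80% with treatment B
n_per_group = sample_size_two_proportions(0.60, 0.80)
print(n_per_group)  # 82
```

Other versions of the formula (for example with a continuity correction) give somewhat larger numbers, and a smaller expected difference between treatments requires a much larger sample.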
WHAT DOES STATISTICS COVER?
Planning
Design
Execution (Data collection)
Data Processing
Data analysis
Presentation
Interpretation
Publication
BASIC CONCEPTS
Data: a set of values of one or more variables recorded on one or more observational units
Sources of data
1. Routinely kept records
2. Surveys (census)
3. Experiments
4. External source
Categories of data
1. Primary data: observation, questionnaire, record form, interviews, survey
2. Secondary data: census, medical records, registry
TYPES OF DATA
- QUALITATIVE DATA
- DISCRETE QUANTITATIVE
- CONTINUOUS QUANTITATIVE
QUALITATIVE
Nominal
Example: Sex (M, F)
Exam result (P, F)
Blood group (A, B, O or AB)
Color of eyes (blue, green, brown, black)
ORDINAL
Example:
Response to treatment (poor, fair, good)
Severity of disease (mild, moderate, severe)
Income status (low, middle, high)
QUANTITATIVE (DISCRETE)
Example: The no. of family members
The no. of heart beats
The no. of admissions in a day
QUANTITATIVE (CONTINUOUS)
Example: Height, weight, age, BP, serum cholesterol and BMI
Discrete data: gaps between possible values (e.g., number of children)
Continuous data: theoretically, no gaps between possible values (e.g., Hb)
Continuous data can be converted into qualitative data, e.g.:
Wt. (in kg): underweight, normal and overweight
Ht. (in cm): short, medium and tall
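Converting a continuous measurement into ordered categories amounts to comparing it with cut-off values. The slide gives no cut-offs, so this sketch swaps in body mass index (BMI = weight in kg / height in m²) with the WHO adult cut-offs of 18.5 and 25; folding obesity (BMI of 30 or more) into "overweight" is a simplification made here only to keep the slide's three groups. The function name is illustrative.

```python
def bmi_category(weight_kg: float, height_m: float) -> str:
    """Convert continuous weight and height into an ordinal BMI category.

    Assumed cut-offs: WHO adult BMI classification, with obesity (>= 30)
    merged into "overweight" to give three groups.
    """
    bmi = weight_kg / height_m ** 2
    if bmi < 18.5:
        return "underweight"
    if bmi < 25.0:
        return "normal"
    return "overweight"


for weight in (50, 70, 90):
    print(weight, "kg at 1.75 m:", bmi_category(weight, 1.75))
```

Grouping discards information: a BMI of 19 and one of 24.9 fall into the same category.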
Table 1. Distribution of blunt-injured patients according to hospital length of stay
Hospital length of stay    Number    Percent
1–3 days                     5891       43.3
4–7 days                     3489       25.6
2 weeks                      2449       18.0
3 weeks                       813        6.0
1 month                       417        3.1
More than 1 month             545        4.0
Total                       13604      100.0
Mean = 7.85, SE = 0.10
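Recomputing the percentages from the counts is a quick check that a frequency table is internally consistent. A small Python sketch using the six counts from Table 1 (the dictionary keys are just labels):

```python
counts = {
    "1-3 days": 5891,
    "4-7 days": 3489,
    "2 weeks": 2449,
    "3 weeks": 813,
    "1 month": 417,
    "More than 1 month": 545,
}

# The total is the denominator for every percentage
total = sum(counts.values())
for category, n in counts.items():
    # Percentage of all patients in this length-of-stay group
    print(f"{category:<18} {n:>6} {100 * n / total:6.1f}")
print(f"{'Total':<18} {total:>6} {100.0:6.1f}")
```

The counts sum to 13,604, and that denominator reproduces every percentage in the table.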
Scale of measurement
Qualitative variable:
A categorical variable
Nominal (classificatory) scale
- gender, marital status, race
Ordinal (ranking) scale
- severity scale, good/better/best
Scale of measurement
Quantitative variable:
A numerical variable: discrete; continuous
Interval scale:
Data are placed in meaningful intervals and order. The units of measurement are arbitrary.
- Temperature: the difference between 37 °C and 36 °C equals that between 38 °C and 37 °C, but there is no implication of ratio (30 °C is not twice as hot as 15 °C)
Ratio scale:
Data have ordered, equal intervals and a true zero point, so a meaningful ratio exists.
- Age, weight, height, pulse rate
- A pulse rate of 120 is twice as fast as one of 60
- A person weighing 80 kg is twice as heavy as one weighing 40 kg
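The temperature example can be checked numerically: Celsius has an arbitrary zero, while Kelvin starts at absolute zero and is therefore a ratio scale. A short Python sketch (the helper name is illustrative):

```python
def celsius_to_kelvin(celsius: float) -> float:
    """Kelvin starts at absolute zero (a true zero), so it is a ratio scale."""
    return celsius + 273.15


hot, cool = 30.0, 15.0

# On the Celsius (interval) scale the ratio looks like 2, but 0 degrees C is not "no heat"
celsius_ratio = hot / cool
# On the Kelvin (ratio) scale the same two temperatures differ by only about 5%
kelvin_ratio = celsius_to_kelvin(hot) / celsius_to_kelvin(cool)
# Differences are preserved on both scales (15 degrees either way)
kelvin_diff = celsius_to_kelvin(hot) - celsius_to_kelvin(cool)

print(f"Celsius ratio {celsius_ratio:.2f}, Kelvin ratio {kelvin_ratio:.2f}, "
      f"difference {kelvin_diff:.1f}")
```

Equal differences carry over between the two scales, but ratios do not, which is exactly what separates an interval scale from a ratio scale.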
Scales of Measure
Nominal - qualitative classification of equal value (gender, race, color, city)
Ordinal - qualitative classification that can be rank ordered (socioeconomic status of families)
Interval - numerical or quantitative data that can be rank ordered and whose differences can be compared (temperature)
Ratio - quantitative interval data with a meaningful ratio (time, age)
CLINIMETRICS
Clinimetrics is the science in which qualities are converted into meaningful quantities by means of scoring systems.
Examples: (1) The Apgar score, based on appearance, pulse, grimace, activity and respiration, is used for neonatal prognosis.
(2) Smoking index: number of cigarettes, duration, filter or not, whether pipe, cigar, etc.
(3) APACHE (Acute Physiology and Chronic Health Evaluation) score: to quantify the severity of a patient's condition.
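A scoring system such as the Apgar score turns qualitative observations into a number: each of the five signs is rated 0, 1 or 2 and the ratings are added, giving a total from 0 to 10. A minimal Python sketch (the function name and the example ratings are hypothetical):

```python
APGAR_COMPONENTS = ("appearance", "pulse", "grimace", "activity", "respiration")


def apgar_score(ratings: dict[str, int]) -> int:
    """Sum the five Apgar components, each rated 0, 1 or 2 (total 0-10)."""
    total = 0
    for component in APGAR_COMPONENTS:
        value = ratings[component]  # KeyError if a component is missing
        if value not in (0, 1, 2):
            raise ValueError(f"{component} must be rated 0, 1 or 2, got {value}")
        total += value
    return total


# Hypothetical ratings for one newborn
print(apgar_score({"appearance": 1, "pulse": 2, "grimace": 2,
                   "activity": 2, "respiration": 2}))  # 9
```

The validation step matters: a scoring system only yields meaningful quantities if every component is rated on the agreed scale.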
INVESTIGATION
  Data Collection
  Data Presentation
    Tabulation
    Diagrams
    Graphs
  Descriptive Statistics
    Measures of Location
    Measures of Dispersion
    Measures of Skewness & Kurtosis
  Inferential Statistics
    Estimation
      Point estimate
      Interval estimate
    Hypothesis Testing
    Univariate analysis
    Multivariate analysis
Frequency Distributions
- Data distribution: pattern of variability
- The center of a distribution
- The ranges
- The shapes
- Simple frequency distributions
- Grouped frequency distributions
- Midpoint
Tabulate the hemoglobin values of 30 adult male patients listed below
Patient No   Hb (g/dl)   Patient No   Hb (g/dl)   Patient No   Hb (g/dl)
 1           12.0        11           11.2        21           14.9
 2           11.9        12           13.6        22           12.2
 3           11.5        13           10.8        23           12.2
 4           14.2        14           12.3        24           11.4
 5           12.3        15           12.3        25           10.7
 6           13.0        16           15.7        26           12.5
 7           10.5        17           12.6        27           11.8
 8           12.8        18            9.1        28           15.1
 9           13.2        19           12.9        29           13.4
10           11.2        20           14.6        30           13.1
Steps for making a table
Step 1: Find the minimum (9.1) and maximum (15.7)
Step 2: Calculate the difference (range): 15.7 - 9.1 = 6.6
Step 3: Decide the number and width of the classes (7 class intervals: 9.0–9.9, 10.0–10.9, ...)
Step 4: Prepare a dummy table with the columns Hb (g/dl), Tally marks and No. of patients
DUMMY TABLE
Hb (g/dl)      Tally marks      No. of patients
9.0 – 9.9
10.0 – 10.9
11.0 – 11.9
12.0 – 12.9
13.0 – 13.9
14.0 – 14.9
15.0 – 15.9
Total
TALLY MARKS TABLE
Hb (g/dl)      Tally marks      No. of patients
9.0 – 9.9      l                 1
10.0 – 10.9    lll               3
11.0 – 11.9    lllll l           6
12.0 – 12.9    lllll lllll      10
13.0 – 13.9    lllll             5
14.0 – 14.9    lll               3
15.0 – 15.9    ll                2
Total                           30
Table. Frequency distribution of 30 adult male patients by Hb
Hb (g/dl)      No. of patients
9.0 – 9.9       1
10.0 – 10.9     3
11.0 – 11.9     6
12.0 – 12.9    10
13.0 – 13.9     5
14.0 – 14.9     3
15.0 – 15.9     2
Total          30
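The tabulation steps above can be sketched in Python with the 30 Hb values. This is an illustrative sketch, not a prescribed procedure: it assumes classes of width 1.0 g/dl starting at whole numbers (as in the table), so a value's class is simply its integer part, and it writes tally marks in groups of five.

```python
import math
from collections import Counter

# Hb values (g/dl) of the 30 patients, in patient-number order
hb = [12.0, 11.9, 11.5, 14.2, 12.3, 13.0, 10.5, 12.8, 13.2, 11.2,
      11.2, 13.6, 10.8, 12.3, 12.3, 15.7, 12.6, 9.1, 12.9, 14.6,
      14.9, 12.2, 12.2, 11.4, 10.7, 12.5, 11.8, 15.1, 13.4, 13.1]


def tally(n: int) -> str:
    """Tally marks written in groups of five, e.g. 6 -> 'lllll l'."""
    groups = ["lllll"] * (n // 5)
    if n % 5:
        groups.append("l" * (n % 5))
    return " ".join(groups)


# Steps 1-2: minimum, maximum and their difference (range)
low, high = min(hb), max(hb)
print(f"Min {low}, Max {high}, Range {high - low:.1f}")

# Step 3: classes 9.0-9.9, 10.0-10.9, ...; each class starts at a whole
# number, so the class of a value is its integer part
counts = Counter(math.floor(x) for x in hb)

# Step 4: fill in the table row by row
for lower in range(math.floor(low), math.floor(high) + 1):
    n = counts[lower]
    print(f"{lower}.0 - {lower}.9".ljust(14) + tally(n).ljust(14) + str(n))
print("Total".ljust(28) + str(sum(counts.values())))
```

With a class width other than 1.0, the class index would instead be computed as the floor of (value - start) / width.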
Table. Frequency distribution of adult patients by Hb and gender
Hb (g/dl)      Male   Female   Total
<9.0              0        2       2
9.0 – 9.9         1        3       4
10.0 – 10.9       3        5       8
11.0 – 11.9       6        8      14
12.0 – 12.9      10        6      16
13.0 – 13.9       5        4       9
14.0 – 14.9       3        2       5
15.0 – 15.9       2        0       2
Total            30       30      60
Elements of a Table
An ideal table should have:
- Number: table number for identification in a report
- Title: describes the body of the table and its variables (what, how classified, where and when), including place and time period
- Column headings: variable name, No., percentages (%), etc.
- Foot-notes: to describe some column/row headings, special cells, source, etc.
Table II. Distribution of 120 (Madras) Corporation divisions according to annual death rate based on registered deaths in 1975 and 1976
Death rate (per 1000 per annum)    No. of divisions
7.0 – 7.9                            4 (3.3)
8.0 – 8.9                           13 (10.8)
9.0 – 9.9                           20 (16.7)
10.0 – 10.9                         27 (22.5)
11.0 – 11.9                         18 (15.0)
12.0 – 12.9                         11 (9.2)
13.0 – 13.9                         11 (9.2)
14.0 – 14.9                          6 (5.0)
15.0 – 15.9                          2 (1.7)
16.0 – 16.9                          4 (3.3)
17.0 – 18.9                          3 (2.5)
19.0 +                               1 (0.8)
Total                              120 (100.0)
Figures in parentheses indicate percentages
DIAGRAMS/GRAPHS
Discrete data
--- Bar charts (one or two groups)
Continuous data
--- Histogram
--- Frequency polygon (curve)
--- Stem-and-leaf plot
--- Box-and-whisker plot
Example data
68 79 43 28 49 16 49 30 63 27 25 25 38 24 28
43 42 22 74 45 42 64 23 49 27 28 51 12 27 47
19 12 30 24 36 57 31 23 11 36 25 42 51 50 22
52 28 44 28 12 38 43 46 32 65 31 32 21 27 31
Histogram
Figure 1 Histogram of ages of 60 subjects (x-axis: Age; y-axis: Frequency)
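The bar heights of a histogram are the class frequencies. The exact bins of Figure 1 cannot be recovered from the transcript, so this Python sketch assumes classes of width 10 years (10–19, 20–29, ...), the same grouping as the stems of the stem-and-leaf plot, and prints a text histogram of the 60 ages:

```python
from collections import Counter

ages = [68, 79, 43, 28, 49, 16, 49, 30, 63, 27, 25, 25, 38, 24, 28,
        43, 42, 22, 74, 45, 42, 64, 23, 49, 27, 28, 51, 12, 27, 47,
        19, 12, 30, 24, 36, 57, 31, 23, 11, 36, 25, 42, 51, 50, 22,
        52, 28, 44, 28, 12, 38, 43, 46, 32, 65, 31, 32, 21, 27, 31]

# Assumed classes of width 10 years: lower limit = age rounded down to a multiple of 10
freq = Counter(age // 10 * 10 for age in ages)

for lower in sorted(freq):
    # One '*' per subject; bar length is proportional to class frequency
    print(f"{lower}-{lower + 9}  {'*' * freq[lower]} ({freq[lower]})")
```

For a histogram proper, the bars touch because the classes are contiguous; this is what distinguishes it from a bar chart of categories.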
Polygon
(Figure: frequency polygon; x-axis: Age; y-axis: Frequency)
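A frequency polygon joins the points (class midpoint, class frequency), usually with an empty class added at each end so that the line starts and ends on the x-axis. A Python sketch of those points for the 60 ages, under the same assumed 10-year classes as above (midpoint of 10–19 taken as 14.5):

```python
from collections import Counter

ages = [68, 79, 43, 28, 49, 16, 49, 30, 63, 27, 25, 25, 38, 24, 28,
        43, 42, 22, 74, 45, 42, 64, 23, 49, 27, 28, 51, 12, 27, 47,
        19, 12, 30, 24, 36, 57, 31, 23, 11, 36, 25, 42, 51, 50, 22,
        52, 28, 44, 28, 12, 38, 43, 46, 32, 65, 31, 32, 21, 27, 31]

width = 10  # assumed class width in years
freq = Counter(age // width * width for age in ages)
lowest, highest = min(freq), max(freq)

# One point per class, including an empty class below the lowest and above the highest
points = []
for lower in range(lowest - width, highest + 2 * width, width):
    midpoint = lower + (width - 1) / 2  # e.g. class 10-19 -> 14.5
    points.append((midpoint, freq.get(lower, 0)))

print(points)
```

Plotting these points and joining them with straight lines gives the polygon; it carries the same information as the histogram but makes it easier to overlay two distributions.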
Stem and leaf plot
Stem-and-leaf of Age   N = 60
Leaf Unit = 1.0
   6   1  122269
  19   2  1223344555777788888
 (11)  3  00111226688
  13   4  2223334567999
   5   5  01127
   4   6  3458
   2   7  49
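A stem-and-leaf plot can be built by sorting the values and splitting each into a stem (tens digit) and a leaf (units digit). A Python sketch for the 60 ages; the first column gives the number of leaves in each row, and unlike the display above it does not mark the median row with parentheses. The function name is illustrative.

```python
from collections import defaultdict

ages = [68, 79, 43, 28, 49, 16, 49, 30, 63, 27, 25, 25, 38, 24, 28,
        43, 42, 22, 74, 45, 42, 64, 23, 49, 27, 28, 51, 12, 27, 47,
        19, 12, 30, 24, 36, 57, 31, 23, 11, 36, 25, 42, 51, 50, 22,
        52, 28, 44, 28, 12, 38, 43, 46, 32, 65, 31, 32, 21, 27, 31]


def stem_and_leaf(values: list[int]) -> list[str]:
    """Rows of a stem-and-leaf plot: stem = tens digit, leaf = units digit."""
    leaves: dict[int, list[int]] = defaultdict(list)
    for value in sorted(values):  # sorting puts the leaves in order
        leaves[value // 10].append(value % 10)
    rows = []
    for stem in range(min(leaves), max(leaves) + 1):
        row = leaves.get(stem, [])  # an empty stem still gets a row
        rows.append(f"{len(row):>4}  {stem}  {''.join(str(leaf) for leaf in row)}")
    return rows


print(f"Stem-and-leaf of Age  N = {len(ages)}")
for line in stem_and_leaf(ages):
    print(line)
```

Unlike a histogram, the plot keeps every original value, so the data can be read back from it.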
Box plot
(Figure: box plot of Age; y-axis scale 10 to 80)
Descriptive statistics reported by a boxplot:
- minimum score
- maximum score
- lower quartile
- upper quartile
- median
- mean
- the skew of the distribution:
  positive skew: mean > median and the high-score whisker is longer
  negative skew: mean < median and the low-score whisker is longer
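The quantities listed above can be computed with Python's standard `statistics` module for the 60 ages. Two assumptions are worth stating: `statistics.quantiles` uses its default "exclusive" method, so other software may report slightly different quartiles, and the whiskers here run to the minimum and maximum, whereas many box plots cap them at 1.5 × IQR and plot outliers separately.

```python
import statistics

ages = [68, 79, 43, 28, 49, 16, 49, 30, 63, 27, 25, 25, 38, 24, 28,
        43, 42, 22, 74, 45, 42, 64, 23, 49, 27, 28, 51, 12, 27, 47,
        19, 12, 30, 24, 36, 57, 31, 23, 11, 36, 25, 42, 51, 50, 22,
        52, 28, 44, 28, 12, 38, 43, 46, 32, 65, 31, 32, 21, 27, 31]

# Lower quartile, median and upper quartile (default 'exclusive' method)
q1, median, q3 = statistics.quantiles(ages, n=4)
mean = statistics.mean(ages)
minimum, maximum = min(ages), max(ages)

# Assumption: whiskers extend to the minimum and maximum (no outlier rule)
low_whisker = q1 - minimum
high_whisker = maximum - q3

# Direction of skew from the mean-median comparison
if mean > median:
    skew = "positive"
elif mean < median:
    skew = "negative"
else:
    skew = "none (symmetric)"

print(f"min={minimum} Q1={q1} median={median} Q3={q3} max={maximum} mean={mean:.2f}")
print(f"low whisker={low_whisker}, high whisker={high_whisker}, skew: {skew}")
```

For these ages the mean (36.45) exceeds the median (31.5) and the high-score whisker is longer, consistent with the long upper tail in the stem-and-leaf plot.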
Pie Chart
• Circular diagram: total = 100%
• Divided into segments, each representing a category
• Decide which categories are adjacent
• The amount for each category is proportional to its slice of the pie
(Figure: the prevalence of different degrees of hypertension in the population; slices Mild, Moderate and Severe, with shares of 10%, 20% and 70%)
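Because the whole circle (360°) represents 100%, each slice's angle is 3.6° per percentage point. A small Python sketch for the three shares shown in the pie chart (the function name is illustrative):

```python
def pie_angles(percentages: list[float]) -> list[float]:
    """Convert category shares into slice angles (degrees) of a pie chart."""
    total = sum(percentages)
    # Each slice's angle is proportional to its share of the whole circle
    return [360 * p / total for p in percentages]


print(pie_angles([10, 20, 70]))  # [36.0, 72.0, 252.0]
```

Dividing by the total, rather than assuming it is exactly 100, keeps the slices summing to 360° even when rounded percentages add up to 99 or 101.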
Bar Graphs
- Heights of the bars indicate frequency
- Frequency on the Y axis and categories of the variable on the X axis
- The bars should be of equal width and should not touch one another
(Figure: the distribution of risk factors among cases with cardiovascular diseases; x-axis: risk factor (smoking, alcohol, cholesterol, DM, HTN, no exercise, family history); y-axis: Number)
HIV cases enrollment in USA by gender
Bar chart
(Figure: bars for men and women side by side, 1986–1992; x-axis: Year; y-axis: Enrollment (hundred))
HIV cases enrollment in USA by gender
Stacked bar chart
(Figure: stacked bars for men and women, 1986–1992; x-axis: Year; y-axis: Enrollment (thousands))
Graphic Presentation of Data
the frequency polygon (quantitative data)
the histogram (quantitative data)
the bar graph (qualitative data)
General rules for designing graphs
- A graph should have a self-explanatory legend
- A graph should help the reader to understand the data
- Axes should be labeled, with units of measurement indicated
- Scales are important: start with zero (otherwise indicate a break with //)
- Avoid graphs with a three-dimensional impression; it may be misleading (readers visualize the data less easily)
Any questions?