7 - INTRODUCTION TO STATISTICS.ppt


Introduction to biostatistics
By
Dr. S. Shaffi Ahamed
Asst. Professor
Dept. of Family & Community Medicine
KKUH
1
This session covers:
• Background and need to know Biostatistics
• Definition of Statistics and Biostatistics
• Types of data
• Frequency distribution of data
• Graphical representation of data

2
What Is Statistics?
1. Collecting Data, e.g., Sample, Survey, Observe, Simulate
2. Characterizing Data, e.g., Organize/Classify, Count, Summarize
3. Presenting Data, e.g., Tables, Charts, Statements
4. Interpreting Results, e.g., Infer, Conclude, Specify Confidence
Why? Data Analysis leads to Decision-Making
3

Statistics is the science of conducting studies to collect, organize, summarize, analyze, present, interpret and draw conclusions from data.
Data: any values (observations or measurements) that have been collected
4
Basis
5
Dynamic nature of the Universe
The continuous change in Nature brings
- uncertainty
and
- variability
in each and every sphere of the Universe
6
We can by no means control or overpower the factor of uncertainty, but we are capable of measuring it in terms of Probability
7
Sources of Medical Uncertainties
1. Intrinsic due to biological, environmental and sampling factors
2. Natural variation among methods, observers, instruments etc.
3. Errors in measurement or assessment or errors in knowledge
4. Incomplete knowledge
8
Biostatistics is the
science that helps in
managing medical
uncertainties
9
“BIOSTATISTICS”
(1) Statistics arising out of biological sciences, particularly from the fields of medicine and public health.
(2) The methods used in dealing with statistics in the fields of medicine, biology and public health for planning and conducting investigations in these branches and for analyzing the data which arise from them.

10
CLINICAL MEDICINE
• Documentation of medical history of diseases.
• Planning and conduct of clinical studies.
• Evaluating the merits of different procedures.
• In providing methods for definition of “normal” and “abnormal”.

11
PREVENTIVE MEDICINE
• To provide the magnitude of any health problem in the community.
• To find out the basic factors underlying the ill-health.
• To evaluate the health programs which were introduced in the community (success/failure).
• To introduce and promote health legislation.

12
Role of Biostatistics in Health Planning and Evaluation
• In carrying out a valid and reliable health situation analysis, including proper summarization and interpretation of data.
• In proper evaluation of the achievements and failures of health programs.
13
Role of Biostatistics in Medical Research
• In developing a research design that can minimize the impact of uncertainties
• In assessing reliability and validity of tools and instruments to collect the information
• In proper analysis of data

14
BASIC CONCEPTS
Data: set of values of one or more variables recorded on one or more observational units (singular: datum)
Sources of data
1. Routinely kept records
2. Surveys (census)
3. Experiments
4. External source
Categories of data
1. Primary data: observation, questionnaire, record form, interviews, survey
2. Secondary data: census, medical record, registry
15
Variables and Types of Data
To gain knowledge about seemingly haphazard events, statisticians collect information for variables, which describe the event.
Variables
• A variable is a characteristic or attribute that can assume different values.
• It is also a characteristic of interest, one that can be expressed as a number and that is possessed by each item under study.
• The value of this characteristic is likely to change or vary from one item in the data set to the next.
Variables whose values are determined by chance are called random variables.
16
Variables can be classified
• As Quantitative and Qualitative
• By how they are categorized, counted or measured (level of measurement of data)
17
Nomenclature
Nominal variable:
• Variable consists of named categories with no implied order among them.
   - Has cancer or not
   - Received treatment or did not
   - Is alive or dead
• May be coded (male = 1, female = 2), but the codes have no quantitative value.
18
Nomenclature (cont.)
Ordinal variable:
• Variable consists of ordered categories, and differences between categories are not equal.
   - Patient status (Improved / Same / Worse)
   - Diagnosis (Stage I / Stage II / Stage III)
   - Evaluation (Satisfied / Neutral / Dissatisfied)
• The coding now has meaning: Improved = 2, Same = 1, Worse = 0
• However, the distance between values is not constant.
19
Other Nomenclature (cont.)
Interval variable:
• Variable has equal distances between values but the zero point is arbitrary.
• The difference between IQ 70 and IQ 80 is the same as between IQ 90 and IQ 100.
• The IQ scale could be rescaled so that 100 becomes 500 and have the same meaning.
• An IQ of 100 is not twice as smart as an IQ of 50.
• Temperature: the differences 37 ºC to 36 ºC and 38 ºC to 37 ºC are equal, but there is no implication of ratio (30 ºC is not twice as hot as 15 ºC).
20
Other Nomenclature (cont.)
Ratio variable:
• Variable has equal intervals between values and a meaningful zero point.
   - Height, Weight
   - 220 pounds is twice as heavy as 110 pounds.
   - Even when converted to kilos, the ratio stays the same (100 kilos is twice as heavy as 50 kilos).
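The interval versus ratio distinction can be checked with a little arithmetic. The sketch below is only an illustration: the Celsius-to-Fahrenheit helper and the pound-per-kilogram factor are assumptions added here, not part of the slides.

```python
# Interval scale: Celsius has an arbitrary zero, so ratios change with the unit.
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32

ratio_c = 30 / 15                      # looks like "twice as hot" in Celsius
ratio_f = c_to_f(30) / c_to_f(15)      # same temperatures in Fahrenheit: 86 / 59

# Ratio scale: weight has a true zero, so ratios survive a change of unit.
LB_PER_KG = 2.20462                    # approximate conversion factor (assumption)
ratio_lb = 220 / 110
ratio_kg = (220 / LB_PER_KG) / (110 / LB_PER_KG)

# Differences on an interval scale are still meaningful.
diff_equal = (37 - 36) == (38 - 37)

print(ratio_c, round(ratio_f, 2), ratio_lb, round(ratio_kg, 2), diff_equal)
```

The Celsius ratio of 2 does not survive the change of unit, while the weight ratio does, which is what "meaningful zero point" buys.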
21
Scales of Measurement
Data
   Qualitative (nonnumerical): Nominal, Ordinal
   Quantitative (numerical): Interval, Ratio
22
Level of Measurements of Data
Nominal-level data: classifies data into mutually exclusive (nonoverlapping), exhaustive categories in which no order or ranking can be imposed on the data
Ordinal-level data: classifies data into categories that can be ranked; however, precise differences between the ranks do not exist
Interval-level data: ranks data, and precise differences between units of measure do exist; however, there is no meaningful zero
Ratio-level data: possesses all the characteristics of interval measurement, and there exists a true zero
Examples
23
Discrete data: gaps between possible values, e.g., Number of Children
Continuous data: theoretically, no gaps between possible values, e.g., Hb
24
CONTINUOUS DATA converted to QUALITATIVE DATA
wt. (in Kg.): under wt, normal & over wt.
Ht. (in cm.): short, medium & tall
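Grouping a continuous measurement into ordered categories can be sketched as below. The slides give only the category names; the cut-offs (155 cm and 180 cm), the sample heights and the helper name categorize are hypothetical values chosen purely for illustration.

```python
def categorize(value, low, high, labels):
    """Map a continuous value to one of three ordered labels."""
    if value < low:
        return labels[0]
    if value <= high:
        return labels[1]
    return labels[2]

# Hypothetical cut-offs and data, for illustration only (not from the slides).
HEIGHT_LABELS = ("short", "medium", "tall")
heights_cm = [148.0, 165.5, 171.2, 190.3]
height_groups = [categorize(h, 155, 180, HEIGHT_LABELS) for h in heights_cm]

print(height_groups)
```

Note that the conversion loses information: 165.5 cm and 171.2 cm both become "medium", so the result is ordinal rather than ratio-level data.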
25
Table 1 Distribution of blunt injured patients according to hospital length of stay

Hospital length of stay   Number   Percent
1 – 3 days                5891     43.3
4 – 7 days                3489     25.6
2 weeks                   2449     18.0
3 weeks                   813      6.0
1 month                   417      3.1
More than 1 month         545      4.0
Total                     13604    100.0

Mean = 7.85 SE = 0.10
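The percent column of Table 1 can be recomputed from the counts. In this minimal sketch the dictionary keys are just labels copied from the table; the six listed counts sum to 13604, and the printed percentages agree with that total.

```python
# Counts copied from Table 1 (hospital length of stay of blunt injured patients).
counts = {
    "1 - 3 days": 5891,
    "4 - 7 days": 3489,
    "2 weeks": 2449,
    "3 weeks": 813,
    "1 month": 417,
    "More than 1 month": 545,
}

total = sum(counts.values())
# Percent = 100 * class count / total, rounded to one decimal as in the table.
percents = {label: round(100 * n / total, 1) for label, n in counts.items()}

for label, n in counts.items():
    print(f"{label:<20}{n:>6}{percents[label]:>7.1f}")
print(f"{'Total':<20}{total:>6}")
```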
26
CLINIMETRICS
Clinimetrics is a science in which qualities are converted to meaningful quantities by using a scoring system.
Examples:
(1) Apgar score: based on appearance, pulse, grimace, activity and respiration; used for neonatal prognosis.
(2) Smoking Index: no. of cigarettes, duration, filter or not, whether pipe, cigar etc.
(3) APACHE (Acute Physiology and Chronic Health Evaluation) score: to quantify the severity of the condition of a patient
27
INVESTIGATION
Data Collection
Data Presentation
   - Tabulation
   - Diagrams
   - Graphs
Descriptive Statistics
   - Measures of Location
   - Measures of Dispersion
   - Measures of Skewness & Kurtosis
Inferential Statistics
   - Estimation: Point estimate, Interval estimate
   - Hypothesis Testing: Univariate analysis, Multivariate analysis
30
An overview of descriptive statistics and statistical inference
START
1. Gathering of data
2. Classification, summarization, and processing of data
3. Presentation and communication of summarized information
(Steps 1 to 3: Descriptive Statistics)
4. Is the information from a sample?
   Yes: Use sample information to make inferences about the population, then draw conclusions about the population characteristic (parameter) under study (Statistical Inference)
   No: Use census data to analyze the population characteristic under study
STOP
31
Descriptive & Inferential Statistics
Descriptive statistics
• Consists of the collection, organization, classification, summarization, and presentation of data obtained from the sample.
• Used to describe the characteristics of the sample
• Used to determine whether the sample represents the target population by comparing the sample statistic and the population parameter
Inferential statistics
• Consists of generalizing from samples to populations, performing estimations and hypothesis testing, determining relationships among variables, and making predictions.
• Used when we want to draw a conclusion from the data obtained from the sample
• Used to describe, infer, estimate, approximate the characteristics of the target population
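As a rough illustration of moving from description to inference, the sketch below computes a point estimate (sample mean) and an approximate interval estimate for the 30 hemoglobin values that are tabulated later in this session. The 1.96 multiplier is an assumption (a normal approximation for a roughly 95% interval); the slides do not prescribe a method.

```python
import math
import statistics

# Hb values (g/dl) of the 30 adult male patients listed later in the session.
hb = [12.0, 11.9, 11.5, 14.2, 12.3, 13.0, 10.5, 12.8, 13.2, 11.2,
      11.2, 13.6, 10.8, 12.3, 12.3, 15.7, 12.6, 9.1, 12.9, 14.6,
      14.9, 12.2, 12.2, 11.4, 10.7, 12.5, 11.8, 15.1, 13.4, 13.1]

n = len(hb)
mean_hb = statistics.mean(hb)          # descriptive: sample mean (point estimate)
sd_hb = statistics.stdev(hb)           # descriptive: sample standard deviation
se_hb = sd_hb / math.sqrt(n)           # standard error of the mean

Z_95 = 1.96                            # assumption: normal approximation
ci_low = mean_hb - Z_95 * se_hb        # inferential: interval estimate
ci_high = mean_hb + Z_95 * se_hb

print(f"n = {n}, mean = {mean_hb:.2f}, SE = {se_hb:.2f}")
print(f"approximate 95% interval: {ci_low:.2f} to {ci_high:.2f}")
```

The mean and SE describe the sample; the interval is a statement about the population mean, which is the inferential step.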
32
Frequency Distributions
“A Picture is Worth a
Thousand Words”
33
Frequency Distributions
• Data distribution: pattern of variability.
   - the center of a distribution
   - the ranges
   - the shapes
• Simple frequency distributions
• Grouped frequency distributions
34
Simple Frequency Distribution
• The number of times that each score occurs
• Make a table with the highest score at the top, decreasing through every possible whole number
• N (total number of scores) always equals the sum of the frequencies
• Σf = N
35
Example of a simple frequency distribution
Scores: 5 7 8 1 5 9 3 4 2 2 3 4 9 7 1 4 5 6 8 9 4 3 5 2 1

Score   f
9       3
8       2
7       2
6       1
5       4
4       4
3       3
2       3
1       3

Σf = 25
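The simple frequency distribution above can be reproduced in a few lines. This is a minimal sketch using the 25 scores from the slide; the variable names are illustrative.

```python
from collections import Counter

# The 25 single-digit scores from the slide.
scores = [int(ch) for ch in "5781593422349714568943521"]

freq = Counter(scores)                 # f: how many times each score occurs

# Print the table with the highest score at the top.
for score in range(max(scores), min(scores) - 1, -1):
    print(score, freq[score])

n = sum(freq.values())                 # sum of the frequencies equals N
print("sum of f =", n)
```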
36
Relative Frequency Distribution
• Proportion of the total N
• Divide the frequency of each score by N
• Rel. f = f/N
• Sum of relative frequencies should equal 1.0
• Gives us a frame of reference

37
Example of a relative frequency distribution
Scores: 5 7 8 1 5 9 3 4 2 2 3 4 9 7 1 4 5 6 8 9 4 3 5 2 1

Score   f    rel f
9       3    .12
8       2    .08
7       2    .08
6       1    .04
5       4    .16
4       4    .16
3       3    .12
2       3    .12
1       3    .12

Σf = 25
Σ rel f = 1.0
38
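The relative frequencies in the preceding table follow directly from Rel. f = f/N. A minimal sketch with the same 25 scores (variable names are illustrative):

```python
from collections import Counter

scores = [int(ch) for ch in "5781593422349714568943521"]
freq = Counter(scores)
n = len(scores)

# Relative frequency: each score's frequency divided by the total N.
rel_freq = {score: freq[score] / n for score in sorted(freq, reverse=True)}

for score, rel in rel_freq.items():
    print(score, freq[score], f"{rel:.2f}")
print("sum of rel f =", round(sum(rel_freq.values()), 10))
```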
Cumulative Frequency Distributions
• cf = cumulative frequency: number of scores at or below a particular score
• A score’s standing relative to other scores
• Count from the lowest score and add the simple frequencies for all scores at or below that score

39
Example of a cumulative frequency distribution
Scores: 5 7 8 1 5 9 3 4 2 2 3 4 9 7 1 4 5 6 8 9 4 3 5 2 1

Score   f    rel f   cf
9       3    .12     25
8       2    .08     22
7       2    .08     20
6       1    .04     18
5       4    .16     17
4       4    .16     13
3       3    .12     9
2       3    .12     6
1       3    .12     3

Σf = 25
Σ rel f = 1.0
40
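Cumulative frequencies can be computed by a running sum that starts at the lowest score, following the definition of cf as the number of scores at or below a particular score. A minimal sketch with the same 25 scores:

```python
from collections import Counter
from itertools import accumulate

scores = [int(ch) for ch in "5781593422349714568943521"]
freq = Counter(scores)

values = sorted(freq)                                  # lowest score first
running = accumulate(freq[v] for v in values)          # running sum of f
cf = dict(zip(values, running))                        # score -> scores at or below it

# Print with the highest score at the top, as in the frequency tables.
for score in reversed(values):
    print(score, freq[score], cf[score])
```

With this definition the highest score always has cf = N, which gives a quick check on the table.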
Tabulate the hemoglobin values of 30 adult male patients listed below

Patient No   Hb (g/dl)    Patient No   Hb (g/dl)    Patient No   Hb (g/dl)
1            12.0         11           11.2         21           14.9
2            11.9         12           13.6         22           12.2
3            11.5         13           10.8         23           12.2
4            14.2         14           12.3         24           11.4
5            12.3         15           12.3         25           10.7
6            13.0         16           15.7         26           12.5
7            10.5         17           12.6         27           11.8
8            12.8         18           9.1          28           15.1
9            13.2         19           12.9         29           13.4
10           11.2         20           14.6         30           13.1
41
Steps for making a table
Step 1: Find Minimum (9.1) & Maximum (15.7)
Step 2: Calculate difference 15.7 – 9.1 = 6.6
Step 3: Decide the number and width of the classes (7 classes): 9.0 – 9.9, 10.0 – 10.9, ...
Step 4: Prepare dummy table: Hb (g/dl), Tally mark, No. patients
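The four steps can be sketched in code as follows. The class width of 1 g/dl and the seven classes come from the slide; the use of floor to assign each value to its class is an implementation choice made here.

```python
import math

# Hb values (g/dl) of the 30 adult male patients.
hb = [12.0, 11.9, 11.5, 14.2, 12.3, 13.0, 10.5, 12.8, 13.2, 11.2,
      11.2, 13.6, 10.8, 12.3, 12.3, 15.7, 12.6, 9.1, 12.9, 14.6,
      14.9, 12.2, 12.2, 11.4, 10.7, 12.5, 11.8, 15.1, 13.4, 13.1]

# Step 1: minimum and maximum.
lowest, highest = min(hb), max(hb)
# Step 2: difference between them.
value_range = round(highest - lowest, 1)
# Step 3: classes of width 1 g/dl, starting at the whole number below the minimum.
start = math.floor(lowest)
n_classes = math.floor(highest) - start + 1
# Step 4: fill the table, one tally per patient.
counts = [0] * n_classes
for value in hb:
    counts[math.floor(value) - start] += 1

for i, count in enumerate(counts):
    lower = start + i
    print(f"{lower:.1f} - {lower + 0.9:.1f}  {'|' * count:<10}  {count}")
print("Total", sum(counts))
```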
42
DUMMY TABLE

Hb (g/dl)      Tally marks   No. patients
9.0 – 9.9
10.0 – 10.9
11.0 – 11.9
12.0 – 12.9
13.0 – 13.9
14.0 – 14.9
15.0 – 15.9
Total

Tally Marks TABLE

Hb (g/dl)      Tally marks   No. patients
9.0 – 9.9      l             1
10.0 – 10.9    lll           3
11.0 – 11.9    llll l        6
12.0 – 12.9    llll llll     10
13.0 – 13.9    llll          5
14.0 – 14.9    lll           3
15.0 – 15.9    ll            2
Total          -             30

(Each "llll" group stands for a crossed bundle of five.)
43
Table Frequency distribution of 30 adult male patients by Hb

Hb (g/dl)      No. of patients
9.0 – 9.9      1
10.0 – 10.9    3
11.0 – 11.9    6
12.0 – 12.9    10
13.0 – 13.9    5
14.0 – 14.9    3
15.0 – 15.9    2
Total          30
44
Table Frequency distribution of adult patients by Hb and gender:

Hb (g/dl)      Male   Female   Total
<9.0           0      2        2
9.0 – 9.9      1      3        4
10.0 – 10.9    3      5        8
11.0 – 11.9    6      8        14
12.0 – 12.9    10     6        16
13.0 – 13.9    5      4        9
14.0 – 14.9    3      2        5
15.0 – 15.9    2      0        2
Total          30     30       60
45
Elements of a Table
An ideal table should have: Number, Title, Column headings, Foot-notes
Number: table number for identification in a report
Title: describes the body of the table, variables, place and time period (what, how classified, where and when)
Column heading: variable name, No., percentages (%), etc.
Foot-note(s): to describe some column/row headings, special cells, source, etc.
46
DIAGRAMS/GRAPHS
Qualitative data (Nominal & Ordinal)
--- Bar charts (one or two groups)
Quantitative data (discrete & continuous)
--- Histogram
--- Frequency polygon (curve)
--- Stem-and-leaf plot
--- Box-and-whisker plot
47
Example data
68 79 43 28 49 16 49 30 63 27
25 25 38 24 28 43 42 22 74 45
42 64 23 49 27 28 51 12 27 47
19 12 30 24 36 57 31 23 11 36
25 42 51 50 22 52 28 44 28 12
38 43 46 32 65 31 32 21 27 31
48
Histogram
Figure 1 Histogram of ages of 60 subjects
[x-axis: Age, ticks at 11.5, 21.5, 31.5, 41.5, 51.5, 61.5, 71.5; y-axis: Frequency, 0 to 20]
49
Polygon
[Figure: frequency polygon of Age; x-axis ticks at 11.5, 21.5, 31.5, 41.5, 51.5, 61.5, 71.5; y-axis: Frequency, 0 to 20]
50
Cumulative Frequency Polygon
• Cumulative counts can be converted to percents.
• Shows the number of cases up to and including all those within the interval.
• Common in vital statistics
[Figure: cumulative frequency polygon with axes for counts (#) and percents (%)]
51
52
Stem and leaf plot
Stem-and-leaf of Age
N = 60
Leaf Unit = 1.0

Count   Stem   Leaves
6       1      122269
19      2      1223344555777788888
(11)    3      00111226688
13      4      2223334567999
5       5      01127
4       6      3458
2       7      49
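A stem-and-leaf display like the one above can be built by splitting each age into a tens digit (stem) and a units digit (leaf). This is a minimal sketch using the 60 example ages; it prints the row count, the stem and the sorted leaves, and does not try to reproduce the exact layout of any statistics package.

```python
from collections import defaultdict

# The 60 example ages from the session.
ages = [68, 79, 43, 28, 49, 16, 49, 30, 63, 27,
        25, 25, 38, 24, 28, 43, 42, 22, 74, 45,
        42, 64, 23, 49, 27, 28, 51, 12, 27, 47,
        19, 12, 30, 24, 36, 57, 31, 23, 11, 36,
        25, 42, 51, 50, 22, 52, 28, 44, 28, 12,
        38, 43, 46, 32, 65, 31, 32, 21, 27, 31]

stems = defaultdict(list)
for age in sorted(ages):
    stems[age // 10].append(age % 10)      # stem = tens digit, leaf = units digit

rows = {stem: "".join(str(leaf) for leaf in leaves) for stem, leaves in stems.items()}
for stem in sorted(rows):
    print(f"{len(rows[stem]):>3}  {stem}  {rows[stem]}")
```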
53
Box plot
[Figure: box plot of Age; vertical axis from 10 to 80]
54
Descriptive statistics report: Boxplot
- minimum score
- maximum score
- lower quartile
- upper quartile
- median
- mean
- the skew of the distribution:
positive skew: mean > median & high-score whisker is longer
negative skew: mean < median & low-score whisker is longer
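The quantities a box plot reports can be computed for the 60 example ages as sketched below. The quartiles use the default method of Python's statistics.quantiles; other packages use slightly different quartile rules, so the quartile values here are one convention among several.

```python
import statistics

# The 60 example ages from the session.
ages = [68, 79, 43, 28, 49, 16, 49, 30, 63, 27,
        25, 25, 38, 24, 28, 43, 42, 22, 74, 45,
        42, 64, 23, 49, 27, 28, 51, 12, 27, 47,
        19, 12, 30, 24, 36, 57, 31, 23, 11, 36,
        25, 42, 51, 50, 22, 52, 28, 44, 28, 12,
        38, 43, 46, 32, 65, 31, 32, 21, 27, 31]

minimum, maximum = min(ages), max(ages)
q1, median, q3 = statistics.quantiles(ages, n=4)   # lower quartile, median, upper quartile
mean = statistics.mean(ages)

# Rule from the slide: mean > median suggests positive skew, mean < median negative skew.
if mean > median:
    skew = "positive"
elif mean < median:
    skew = "negative"
else:
    skew = "none"

print(minimum, q1, median, q3, maximum, mean, skew)
```

For these ages the mean (36.45) lies above the median (31.5), which points to a positive skew.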
55
Application of a box and Whisker
diagram
56
Pie Chart
• Circular diagram: total = 100%
• Divided into segments, each representing a category
• Decide adjacent category
• The size of each slice of the pie is proportional to the amount for that category
[Figure: The prevalence of different degrees of hypertension in the population; categories Mild, Moderate and Severe; segments of 70%, 20% and 10%]
57
Bar Graphs
• Heights of the bars indicate frequency
• Frequency on the Y axis and categories of the variable on the X axis
• The bars should be of equal width and should not touch the other bars
[Figure: The distribution of risk factors among cases with cardiovascular diseases; y-axis: Number, 0 to 25; x-axis: Risk factor, with categories Smo, Alc, Chol, DM, HTN, No Exer, F-H; bar labels as extracted: 20, 20, 16, 12, 12, 9, 8]
58
Bar chart
HIV cases enrolment in USA by gender
[Figure: bars for Men and Women; x-axis: Year, 1986 to 1992; y-axis: Enrollment (hundred), 0 to 12]
59
Stacked bar chart
HIV cases enrollment in USA by gender
[Figure: stacked bars for Women and Men; x-axis: Year, 1986 to 1992; y-axis: Enrollment (Thousands), 0 to 18]
60
Graphic Presentation of Data
the frequency polygon
(quantitative data)
the histogram
(quantitative data)
the bar graph
(qualitative data)
61
62
General rules for designing graphs
• A graph should have a self-explanatory legend
• A graph should help the reader to understand the data
• Axes should be labeled and units of measurement indicated
• Scales are important. Start with zero (otherwise use a // break)
• Avoid graphs with a three-dimensional impression; they may be misleading (the reader visualizes them less easily)
63
Any Questions
64