Transcript 7-INTRODUCTION TO BIOSTATISTICS & BASIC CONCEPTS(1432UG).ppt
Introduction to biostatistics
By Dr. S. Shaffi Ahamed
Asst. Professor, Dept. of Family & Community Medicine, KKUH
This session covers:
• Background and need to know Biostatistics
• Definition of Statistics and Biostatistics
• Types of data
• Frequency distribution of data
• Graphical representation of data
Statistics is the science of conducting studies to collect, organize, summarize, analyze, present, interpret and draw conclusions from data.
Data: any values (observations or measurements) that have been collected.
What Is Statistics?
1. Collecting Data, e.g., Sample, Survey, Observe, Simulate
2. Characterizing Data, e.g., Organize/Classify, Count, Summarize
3. Presenting Data, e.g., Tables, Charts, Statements
4. Interpreting Results, e.g., Infer, Conclude, Specify Confidence
Data Analysis
Why? Decision Making
“BIOSTATISTICS”
(1) Statistics arising out of biological sciences, particularly from the fields of Medicine and public health.
(2) The methods used in dealing with statistics in the fields of medicine, biology and public health for planning, conducting and analyzing data which arise in investigations of these branches.
CLINICAL MEDICINE
Documentation of medical history of diseases.
Planning and conduct of clinical studies.
Evaluating the merits of different procedures.
In providing methods for definition of “normal” and “abnormal”.
PREVENTIVE MEDICINE
To provide the magnitude of any health problem in the community.
To find out the basic factors underlying the ill-health.
To evaluate the health programs which were introduced in the community (success/failure).
To introduce and promote health legislation.
Role of Biostatistics in Health Planning and Evaluation
In carrying out a valid and reliable health situation analysis, including proper summarization and interpretation of data.
In proper evaluation of the achievements and failures of health programs.
Role of Biostatistics in Medical Research
• In developing a research design that can minimize the impact of uncertainties
• In assessing reliability and validity of tools and instruments to collect the information
• In proper analysis of data
What can we do with Biostatistics?
Summarize data
Generalize from a sample to a population
Compare groups
Test for relationships
Analyze multiple variables and their relationship to a given outcome
What can we do with Biostatistics?
Assess time to an event as an endpoint
Report the characteristics of a test
Combine results of several studies
Assess costs and consequences of treatment
Report decision analyses and clinical practice guidelines
BASIC CONCEPTS
Data: Set of values of one or more variables recorded on one or more observational units (singular: datum)
Sources of data
1. Routinely kept records
2. Surveys (census)
3. Experiments
4. External source
Categories of data
1. Primary data: observation, questionnaire, record form, interviews, survey
2. Secondary data: census, medical record, registry
Variables and Types of Data
To gain knowledge about seemingly haphazard events, statisticians collect information for variables, which describe the event.
Variables
• A variable is a characteristic or attribute that can assume different values.
• It is also a characteristic of interest, one that can be expressed as a number and that is possessed by each item under study.
• The value of this characteristic is likely to change or vary from one item in the data set to the next.
Variables whose values are determined by chance are called random variables.
Variables can be classified
• As Quantitative and Qualitative
• By how they are categorized, counted or measured: level of measurement of data
Nomenclature
Nominal variable: Variable consists of named categories with no implied order among them.
• Has cancer or not
• Received treatment or did not
• Is alive or dead
• Is coded (male = 1, female = 2) but has no quantitative value.
Nomenclature (cont.)
Ordinal variable: Variable consists of ordered categories, and differences between categories are not equal.
• Patient status (Improved / Same / Worse)
• Diagnosis (Stage I / Stage II / Stage III)
• Evaluation (Satisfied / Neutral / Dissatisfied)
• The coding now has meaning: Improved = 2, Same = 1, Worse = 0
• However, the distance between values is not constant.
Other Nomenclature (cont.)
Interval variable: Variable has equal distances between values but the zero point is arbitrary.
• The distance from IQ 70 to 80 is the same as from IQ 90 to 100.
• IQ scale could convert 100 to 500 and have the same meaning.
• Differences between numbers are meaningful but the ratios between them are not.
• An IQ of 100 is not twice as smart as an IQ of 50.
Other Nomenclature (cont.)
Ratio variable: Variable has equal intervals between values and a meaningful zero point.
• Height, Weight
• 220 pounds is twice as heavy as 110 pounds.
• Even when converted to kilos, the ratio stays the same (100 kilos is twice as heavy as 50 kilos).
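The contrast between interval and ratio scales can be checked with a few lines of arithmetic. The sketch below is only illustrative: the conversion factor is approximate, and the constant `SHIFT` is a hypothetical re-zeroing of the IQ scale (chosen so that 100 becomes 500), not something defined in the lecture.

```python
import math

LB_PER_KG = 2.20462  # approximate pounds per kilogram

# Ratio scale: a change of units multiplies every value, so ratios survive.
heavy_lb, light_lb = 220.0, 110.0
ratio_lb = heavy_lb / light_lb
ratio_kg = (heavy_lb / LB_PER_KG) / (light_lb / LB_PER_KG)

# Interval scale: the zero point is arbitrary, so it may be shifted.
SHIFT = 400  # hypothetical shift, assumed for illustration only
iq_a, iq_b = 100, 50
diff_before = iq_a - iq_b
diff_after = (iq_a + SHIFT) - (iq_b + SHIFT)
ratio_before = iq_a / iq_b
ratio_after = (iq_a + SHIFT) / (iq_b + SHIFT)

print("weight ratio preserved:", math.isclose(ratio_lb, ratio_kg))
print("IQ difference preserved:", diff_before == diff_after)
print("IQ ratio preserved:", math.isclose(ratio_before, ratio_after))
```

Differences survive the shift but ratios do not, which is why "twice as smart" is not a meaningful statement on an interval scale.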
Scales of Measurement
Data
• Qualitative (nonnumerical): Nominal, Ordinal
• Quantitative (numerical): Interval, Ratio
Level of Measurements of Data
• Nominal-level data: classifies data into mutually exclusive (non-overlapping), exhaustive categories in which no order or ranking can be imposed on the data
• Ordinal-level data: classifies data into categories that can be ranked; however, precise differences between the ranks do not exist
• Interval-level data: ranks data, and precise differences between units of measure do exist; however, there is no meaningful zero
• Ratio-level data: possesses all the characteristics of interval measurement, and there exists a true zero
Discrete data: gaps between possible values. Example: number of children
Continuous data: theoretically, no gaps between possible values. Example: Hb
Continuous data can be converted to qualitative data:
Wt. (in kg): underweight, normal & overweight
Ht. (in cm): short, medium & tall
Table 1 Distribution of blunt injured patients according to hospital length of stay

Hospital length of stay    Number    Percent
1 – 3 days                   5891       43.3
4 – 7 days                   3489       25.6
2 weeks                      2449       18.0
3 weeks                       813        6.0
1 month                       417        3.1
More than 1 month             545        4.0
Total                       13604      100.0

Mean = 7.85, SE = 0.10
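The percent column of Table 1 can be recomputed from the counts as a quick consistency check. This is a minimal sketch; the variable names are illustrative, and rounding to one decimal place is assumed to match the table.

```python
# Counts per length-of-stay category, as listed in Table 1.
counts = {
    "1 - 3 days": 5891,
    "4 - 7 days": 3489,
    "2 weeks": 2449,
    "3 weeks": 813,
    "1 month": 417,
    "More than 1 month": 545,
}

total = sum(counts.values())

# Percent = 100 * category count / total, rounded to one decimal place.
percents = {label: round(100 * n / total, 1) for label, n in counts.items()}

for label, n in counts.items():
    print(f"{label:<20} {n:>6} {percents[label]:>6.1f}")
print(f"{'Total':<20} {total:>6}")
```

The six counts sum to 13604, and the percentages computed against that total reproduce the percent column.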
CLINIMETRICS
Clinimetrics is a science in which qualities are converted to meaningful quantities by using a scoring system.
Examples:
(1) Apgar score: based on appearance, pulse, grimace, activity and respiration; used for neonatal prognosis.
(2) Smoking Index: no. of cigarettes, duration, filter or not, whether pipe, cigar, etc.
(3) APACHE (Acute Physiology and Chronic Health Evaluation) score: to quantify the severity of a patient's condition.
INVESTIGATION
Data Collection
Data Presentation
  • Tabulation
  • Diagrams
  • Graphs
Descriptive Statistics
  • Measures of Location
  • Measures of Dispersion
  • Measures of Skewness & Kurtosis
Inferential Statistics
  • Estimation: Point estimate, Interval estimate
  • Hypothesis Testing: Univariate analysis, Multivariate analysis
Frequency Distributions
“A Picture is Worth a Thousand Words”
Frequency Distributions
Data distribution: pattern of variability
• the center of a distribution
• the ranges
• the shapes
• simple frequency distributions
• grouped frequency distributions
Simple Frequency Distribution
• f (frequency): the number of times a score occurs
• Make a table with the highest score at the top, decreasing for every possible whole number
• N (total number of scores) always equals the sum of the frequencies: Σf = N
Example of a simple frequency distribution
Raw scores: 5 7 8 1 5 9 3 4 2 2 3 4 9 7 1 4 5 6 8 9 4 3 5 2 1

Score    f
9        3
8        2
7        2
6        1
5        4
4        4
3        3
2        3
1        3
Σf = 25
Relative Frequency Distribution
• Proportion of the total N
• Divide the frequency of each score by N: rel. f = f / N
• Sum of relative frequencies should equal 1.0
• Gives us a frame of reference
Example of a relative frequency distribution
Raw scores: 5 7 8 1 5 9 3 4 2 2 3 4 9 7 1 4 5 6 8 9 4 3 5 2 1

Score    f    rel f
9        3    .12
8        2    .08
7        2    .08
6        1    .04
5        4    .16
4        4    .16
3        3    .12
2        3    .12
1        3    .12
Σf = 25
Σ rel f = 1.0
Cumulative Frequency Distributions
cf = cumulative frequency: number of scores at or below a particular score
• A score's standing relative to other scores
• Count from the lowest score upward, adding the simple frequencies of all scores at or below that score
Example of a cumulative frequency distribution
Raw scores: 5 7 8 1 5 9 3 4 2 2 3 4 9 7 1 4 5 6 8 9 4 3 5 2 1

Score    f    rel f    cf
9        3    .12      25
8        2    .08      22
7        2    .08      20
6        1    .04      18
5        4    .16      17
4        4    .16      13
3        3    .12       9
2        3    .12       6
1        3    .12       3
Σf = 25
Σ rel f = 1.0
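The three distributions (simple, relative and cumulative) can be built in one pass over the raw scores. The sketch below assumes the 25 raw scores are those shown in the slide's data grid; the function name `frequency_table` is illustrative. The cumulative frequency follows the definition given above: the number of scores at or below a particular score.

```python
from collections import Counter

# The 25 raw scores from the example (reading order of the grid is assumed).
scores = [5, 7, 8, 1, 5, 9, 3, 4, 2, 2, 3, 4, 9, 7, 1,
          4, 5, 6, 8, 9, 4, 3, 5, 2, 1]

def frequency_table(values):
    """Return rows (score, f, rel_f, cf) with the highest score first."""
    counts = Counter(values)
    n = len(values)
    rows = []
    running = 0
    # Accumulate from the lowest score upward so cf counts scores at or below.
    for score in range(min(values), max(values) + 1):
        f = counts.get(score, 0)
        running += f
        rows.append((score, f, f / n, running))
    rows.reverse()  # tables are conventionally shown highest score first
    return rows

table = frequency_table(scores)
print("Score   f  rel f   cf")
for score, f, rel_f, cf in table:
    print(f"{score:>5} {f:>3} {rel_f:>6.2f} {cf:>4}")
```

The cf of the highest score always equals N, which is a convenient check on the table.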
Tabulate the hemoglobin values of 30 adult male patients listed below

Patient No.   Hb (g/dl)    Patient No.   Hb (g/dl)    Patient No.   Hb (g/dl)
1             12.0         11            11.2         21            14.9
2             11.9         12            13.6         22            12.2
3             11.5         13            10.8         23            12.2
4             14.2         14            12.3         24            11.4
5             12.3         15            12.3         25            10.7
6             13.0         16            15.7         26            12.5
7             10.5         17            12.6         27            11.8
8             12.8         18            9.1          28            15.1
9             13.2         19            12.9         29            13.4
10            11.2         20            14.6         30            13.1
Steps for making a table
Step 1: Find the minimum (9.1) and maximum (15.7)
Step 2: Calculate the difference: 15.7 – 9.1 = 6.6
Step 3: Decide the number and width of the classes (7 class intervals): 9.0 – 9.9, 10.0 – 10.9, ...
Step 4: Prepare a dummy table with columns Hb (g/dl), Tally marks, No. patients
DUMMY TABLE

Hb (g/dl)      Tally marks    No. patients
9.0 – 9.9
10.0 – 10.9
11.0 – 11.9
12.0 – 12.9
13.0 – 13.9
14.0 – 14.9
15.0 – 15.9
Total

TALLY MARKS TABLE
(Each |||| group stands for a crossed bundle of five.)

Hb (g/dl)      Tally marks    No. patients
9.0 – 9.9      |              1
10.0 – 10.9    |||            3
11.0 – 11.9    |||| |         6
12.0 – 12.9    |||| ||||      10
13.0 – 13.9    ||||           5
14.0 – 14.9    |||            3
15.0 – 15.9    ||             2
Total                         30
Table Frequency distribution of 30 adult male patients by Hb

Hb (g/dl)      No. of patients
9.0 – 9.9       1
10.0 – 10.9     3
11.0 – 11.9     6
12.0 – 12.9    10
13.0 – 13.9     5
14.0 – 14.9     3
15.0 – 15.9     2
Total          30
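The tabulation steps can be reproduced programmatically. This is a minimal sketch, assuming classes of width 1 g/dl that start at the whole number below the minimum, as in the table above; the function name `grouped_frequency` is illustrative.

```python
import math

# Hb values (g/dl) of the 30 adult male patients, in patient order.
hb = [12.0, 11.9, 11.5, 14.2, 12.3, 13.0, 10.5, 12.8, 13.2, 11.2,
      11.2, 13.6, 10.8, 12.3, 12.3, 15.7, 12.6, 9.1, 12.9, 14.6,
      14.9, 12.2, 12.2, 11.4, 10.7, 12.5, 11.8, 15.1, 13.4, 13.1]

def grouped_frequency(values):
    """Count values per class of width 1 (9.0-9.9, 10.0-10.9, ...)."""
    first = math.floor(min(values))   # lower limit of the first class
    last = math.floor(max(values))    # lower limit of the last class
    counts = {start: 0 for start in range(first, last + 1)}
    for value in values:
        counts[math.floor(value)] += 1  # the class is set by the whole-number part
    return counts

hb_table = grouped_frequency(hb)
print(f"min = {min(hb)}, max = {max(hb)}, classes = {len(hb_table)}")
for start, count in hb_table.items():
    print(f"{start}.0 - {start}.9  {count:>3}")
print(f"Total        {sum(hb_table.values()):>3}")
```

Keying each class by its lower limit keeps empty classes in the table, which matters when the result is later drawn as a histogram.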
Table Frequency distribution of adult patients by Hb and gender

Hb (g/dl)      Male    Female    Total
<9.0              0         2        2
9.0 – 9.9         1         3        4
10.0 – 10.9       3         5        8
11.0 – 11.9       6         8       14
12.0 – 12.9      10         6       16
13.0 – 13.9       5         4        9
14.0 – 14.9       3         2        5
15.0 – 15.9       2         0        2
Total            30        30       60
Elements of a Table
An ideal table should have: Number, Title, Column headings, Foot-notes
• Number: table number for identification in a report
• Title, place: describe the body of the table, variables, time period (what, how classified, where and when)
• Column heading: variable name, No., percentages (%), etc.
• Foot-note(s): to describe some column/row headings, special cells, source, etc.
DIAGRAMS/GRAPHS
Qualitative data (Nominal & Ordinal)
--- Bar charts (one or two groups)
Quantitative data (discrete & continuous)
--- Histogram
--- Frequency polygon (curve)
--- Stem-and-leaf plot
--- Box-and-whisker plot
Example data
68 79 43 28 49 16 49 30 63 27
25 25 38 24 28 43 42 22 74 45
42 64 23 49 27 28 51 12 27 47
19 12 30 24 36 57 31 23 11 36
25 42 51 50 22 52 28 44 28 12
38 43 46 32 65 31 32 21 27 31
Histogram
(Figure 1: Histogram of ages of 60 subjects; x-axis: Age, class midpoints 11.5 to 71.5; y-axis: frequency, 0 to 20)
Polygon
(Figure: frequency polygon; x-axis: Age, class midpoints 11.5 to 71.5; y-axis: frequency, 0 to 20)
Cumulative Frequency Polygon
• Cumulative counts can be converted to percents.
• Shows the number of cases up to and including all those within the interval.
• Common in vital statistics.
(Figure: cumulative frequency polygon, with 50% marked against a count of 30)
Stem and leaf plot
Stem-and-leaf of Age   N = 60
Leaf Unit = 1.0

  6   1  122269
 19   2  1223344555777788888
(11)  3  00111226688
 13   4  2223334567999
  5   5  01127
  4   6  3458
  2   7  49
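The stem-and-leaf display can be generated directly from the 60 ages in the example data. A minimal sketch, assuming a leaf unit of 1 so that the tens digit is the stem and the units digit is the leaf; the function name `stem_and_leaf` is illustrative.

```python
from collections import defaultdict

# Ages of the 60 subjects from the example data.
ages = [68, 79, 43, 28, 49, 16, 49, 30, 63, 27,
        25, 25, 38, 24, 28, 43, 42, 22, 74, 45,
        42, 64, 23, 49, 27, 28, 51, 12, 27, 47,
        19, 12, 30, 24, 36, 57, 31, 23, 11, 36,
        25, 42, 51, 50, 22, 52, 28, 44, 28, 12,
        38, 43, 46, 32, 65, 31, 32, 21, 27, 31]

def stem_and_leaf(values):
    """Map each stem (tens digit) to its sorted leaves (units digits)."""
    leaves_by_stem = defaultdict(list)
    for value in sorted(values):
        leaves_by_stem[value // 10].append(value % 10)
    return {stem: "".join(str(leaf) for leaf in leaves)
            for stem, leaves in sorted(leaves_by_stem.items())}

plot = stem_and_leaf(ages)
for stem, leaves in plot.items():
    print(f"{len(leaves):>3}  {stem}  {leaves}")
```

Unlike a histogram, the display keeps every original value, so the raw data can be read back from the plot.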
Box plot
(Figure: box plot; axis scale 10 to 80)
Descriptive statistics report: Boxplot
- minimum score
- maximum score
- lower quartile
- upper quartile
- median
- mean
- the skew of the distribution:
  positive skew: mean > median & high-score whisker is longer
  negative skew: mean < median & low-score whisker is longer
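The quantities a boxplot reports can be computed for the 60 ages in the example data. This is a sketch under one assumption worth stating: quartile conventions differ between textbooks and software, and the code simply uses the default method of Python's `statistics.quantiles`, so the quartiles may differ slightly from those in other packages. The function name `boxplot_summary` is illustrative.

```python
import statistics

# Ages of the 60 subjects from the example data.
ages = [68, 79, 43, 28, 49, 16, 49, 30, 63, 27,
        25, 25, 38, 24, 28, 43, 42, 22, 74, 45,
        42, 64, 23, 49, 27, 28, 51, 12, 27, 47,
        19, 12, 30, 24, 36, 57, 31, 23, 11, 36,
        25, 42, 51, 50, 22, 52, 28, 44, 28, 12,
        38, 43, 46, 32, 65, 31, 32, 21, 27, 31]

def boxplot_summary(values):
    """Return the descriptive statistics a boxplot reports."""
    # Three cut points split the data into four parts: Q1, median, Q3.
    q1, median, q3 = statistics.quantiles(values, n=4)
    mean = statistics.mean(values)
    if mean > median:
        skew = "positive"
    elif mean < median:
        skew = "negative"
    else:
        skew = "none"
    return {"min": min(values), "q1": q1, "median": median,
            "q3": q3, "max": max(values), "mean": mean, "skew": skew}

summary = boxplot_summary(ages)
for name, value in summary.items():
    print(f"{name:>6}: {value}")
```

For these ages the mean (36.45) exceeds the median (31.5), so by the rule above the distribution is positively skewed.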
Application of a box and Whisker diagram
Pie Chart
• Circular diagram: the total is 100%
• Divided into segments, each representing a category
• Decide adjacent categories
• The size of each slice of the pie is proportional to the amount for its category
(Figure: pie chart, "The prevalence of different degrees of hypertension in the population"; categories: Mild, Moderate, Severe; slices: 70%, 10%, 20%)
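Because each slice is proportional to its share of the total, the angle of a slice is its share of 360 degrees. The sketch below applies this to the three percentages shown on the slide; the function name `slice_angles` is illustrative, and the percentages are not matched to category names here because that mapping is not recoverable from the transcript.

```python
def slice_angles(amounts):
    """Return the central angle (in degrees) of each pie slice."""
    total = sum(amounts)
    # Each slice takes the same fraction of 360 degrees as of the total.
    return [360 * amount / total for amount in amounts]

percentages = [70, 10, 20]
angles = slice_angles(percentages)
for pct, angle in zip(percentages, angles):
    print(f"{pct:>3}%  ->  {angle:.0f} degrees")
```

Dividing by the total rather than by 100 means the same function works for raw counts as well as percentages.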
Bar Graphs
• Heights of the bars indicate frequency
• Frequency on the Y axis and categories of the variable on the X axis
• The bars should be of equal width and should not touch the other bars
(Figure: bar graph, "The distribution of risk factors among cases with cardiovascular diseases"; x-axis: Risk factor (Smo, Alc, Chol, DM, HTN, No Exer, F-H); y-axis: frequency, 0 to 25; bar values shown: 20, 20, 16, 12, 12, 9, 8)
HIV cases enrollment in USA by gender
Bar chart
(Figure: bar chart; x-axis: Year, 1986 to 1992; y-axis: 0 to 12; series: Men, Women)
HIV cases enrollment in USA by gender
Stacked bar chart
(Figure: stacked bar chart; x-axis: Year, 1986 to 1992; y-axis: 0 to 18; series: Women, Men)
Graphic Presentation of Data
• the frequency polygon (quantitative data)
• the histogram (quantitative data)
• the bar graph (qualitative data)
General rules for designing graphs
• A graph should have a self-explanatory legend
• A graph should help the reader to understand the data
• Axes should be labeled and units of measurement indicated
• Scales are important. Start with zero (otherwise show a // break)
• Avoid graphs with a three-dimensional impression; they may be misleading (the reader visualizes them less easily)
Any Questions?