7-INTRODUCTION TO BIOSTATISTICS & BASIC CONCEPTS(1432UG).ppt


Introduction to biostatistics

By Dr. S. Shaffi Ahamed

Asst. Professor, Dept. of Family & Community Medicine, KKUH

This session covers:

- Background and the need to know biostatistics
- Definition of statistics and biostatistics
- Types of data
- Frequency distribution of data
- Graphical representation of data

Statistics

Statistics is the science of conducting studies to collect, organize, summarize, analyze, present, interpret and draw conclusions from data.

Data: any values (observations or measurements) that have been collected.

What Is Statistics?

1. Collecting data, e.g., sample, survey, observe, simulate
2. Characterizing data, e.g., organize/classify, count, summarize
3. Presenting data, e.g., tables, charts, statements
4. Interpreting results, e.g., infer, conclude, specify confidence

Why? Data analysis supports decision making.

BIOSTATISTICS

(1) Statistics arising out of biological sciences, particularly from the fields of medicine and public health.

(2) The methods used in dealing with statistics in the fields of medicine, biology and public health: planning and conducting investigations in these branches and analyzing the data that arise from them.

CLINICAL MEDICINE

- Documentation of the medical history of diseases.
- Planning and conduct of clinical studies.
- Evaluating the merits of different procedures.
- Providing methods for the definition of "normal" and "abnormal".

PREVENTIVE MEDICINE

- To provide the magnitude of any health problem in the community.
- To find out the basic factors underlying ill-health.
- To evaluate the health programs introduced in the community (success/failure).
- To introduce and promote health legislation.

Role of Biostatistics in Health Planning and Evaluation

- In carrying out a valid and reliable health situation analysis, including proper summarization and interpretation of data.
- In proper evaluation of the achievements and failures of health programs.

Role of Biostatistics in Medical Research

- In developing a research design that can minimize the impact of uncertainties
- In assessing the reliability and validity of tools and instruments used to collect the information
- In proper analysis of data

What can we do with Biostatistics?

Summarize data

Generalize from a sample to a population

Compare groups

Test for relationships

Analyze multiple variables and their relationship to a given outcome

What can we do with Biostatistics?

Assess time to an event as an endpoint

Report the characteristics of a test

Combine results of several studies

Assess costs and consequences of treatment

Report decision analyses and clinical practice guidelines

BASIC CONCEPTS

Data: Set of values of one or more variables recorded on one or more observational units (singular: datum)

Sources of data

1. Routinely kept records
2. Surveys (census)
3. Experiments
4. External source

Categories of data

1. Primary data: observation, questionnaire, record form, interviews, survey
2. Secondary data: census, medical record, registry

Variables and Types of Data

To gain knowledge about seemingly haphazard events, statisticians collect information for variables, which describe the event.

A variable is a characteristic or attribute that can assume different values. It is a characteristic of interest, possessed by each item under study, that can be expressed as a number. The value of this characteristic is likely to change or vary from one item in the data set to the next.

Variables whose values are determined by chance are called random variables.

Variables can be classified

- As quantitative or qualitative
- By how they are categorized, counted or measured (level of measurement of data)

Nomenclature

Nominal variable: the variable consists of named categories with no implied order among them.

- Has cancer or not
- Received treatment or did not
- Is alive or dead
- Is coded (male = 1, female = 2) but has no quantitative value.

Nomenclature (cont.)

Ordinal variable: the variable consists of ordered categories, and the differences between categories are not equal.

- Patient status (Improved / Same / Worse)
- Diagnosis (Stage I / Stage II / Stage III)
- Evaluation (Satisfied / Neutral / Dissatisfied)
- The coding now has meaning: Improved = 2, Same = 1, Worse = 0
- However, the distance between values is not constant.

Other Nomenclature (cont.)

Interval variable: the variable has equal distances between values, but the zero point is arbitrary.

- The distance from IQ 70 to 80 is the same as from IQ 90 to 100.
- The IQ scale could convert 100 to 500 and have the same meaning.
- Differences between numbers are meaningful but the ratios between them are not.
- An IQ of 100 is not twice as smart as an IQ of 50.

Other Nomenclature (cont.)

Ratio variable: the variable has equal intervals between values and a meaningful zero point.

- Height, weight
- 220 pounds is twice as heavy as 110 pounds.
- Even when converted to kilos, the ratio stays the same (100 kilos is twice as heavy as 50 kilos).
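The difference between interval and ratio scales can be checked with a little arithmetic. The sketch below is only an illustration: the conversion factor is approximate, and the `relabel` function is a hypothetical shift of the IQ scale (mapping 100 to 500) invented for this example, not an actual IQ convention.

```python
# Ratio scale: a unit conversion is a pure multiplication, so ratios survive.
LB_PER_KG = 2.20462  # approximate conversion factor


def lb_to_kg(pounds: float) -> float:
    """Convert pounds to kilograms."""
    return pounds / LB_PER_KG


ratio_lb = 220 / 110
ratio_kg = lb_to_kg(220) / lb_to_kg(110)


# Interval scale: the zero point is arbitrary, so the scale can be shifted.
# Hypothetical relabelling (an assumption for illustration): 100 becomes 500.
def relabel(score: float) -> float:
    return score + 400


diff_before = 80 - 70
diff_after = relabel(80) - relabel(70)    # differences are preserved
ratio_before = 100 / 50
ratio_after = relabel(100) / relabel(50)  # ratios are not preserved

print("weight ratio in lb and kg:", ratio_lb, round(ratio_kg, 6))
print("IQ difference before/after:", diff_before, diff_after)
print("IQ ratio before/after:", ratio_before, round(ratio_after, 3))
```

The weight ratio is the same in both units, while the IQ "ratio" changes as soon as the arbitrary zero point moves; only the differences stay meaningful.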

Scales of Measurement

Data
  Qualitative
    Numerical: Nominal, Ordinal
    Nonnumerical: Nominal, Ordinal
  Quantitative
    Numerical: Interval, Ratio

Level of Measurement of Data

- Nominal-level data: classifies data into mutually exclusive (non-overlapping), exhaustive categories in which no order or ranking can be imposed on the data
- Ordinal-level data: classifies data into categories that can be ranked; however, precise differences between the ranks do not exist
- Interval-level data: ranks data, and precise differences between units of measure do exist; however, there is no meaningful zero
- Ratio-level data: possesses all the characteristics of interval measurement, and there exists a true zero

Discrete data: gaps between possible values, e.g., number of children

Continuous data: theoretically, no gaps between possible values, e.g., Hb

Continuous data can be converted to qualitative data:

- Wt. (in kg): underweight, normal & overweight
- Ht. (in cm): short, medium & tall
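Turning a continuous measurement into categories amounts to choosing cut points and looking up which interval each value falls in. A minimal sketch follows; the cut points and the sample weights are made up purely for illustration (they are not clinical thresholds and do not come from these slides).

```python
from bisect import bisect_right

# Hypothetical cut points in kg (assumption, for illustration only).
# Intervals are: below 50, 50 up to (not including) 80, and 80 or more.
WEIGHT_CUTS = [50.0, 80.0]
WEIGHT_LABELS = ["under wt", "normal", "over wt"]


def categorize(value, cuts, labels):
    """Return the label of the interval that contains value."""
    return labels[bisect_right(cuts, value)]


weights_kg = [45.5, 62.0, 80.0, 91.3]  # made-up example values
categories = [categorize(w, WEIGHT_CUTS, WEIGHT_LABELS) for w in weights_kg]
print(categories)
```

The conversion loses information: once a weight is recorded only as "normal", the original value in kg cannot be recovered.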

Table 1 Distribution of blunt injured patients according to hospital length of stay

Hospital length of stay   Number   Percent
1 – 3 days                  5891      43.3
4 – 7 days                  3489      25.6
2 weeks                     2449      18.0
3 weeks                      813       6.0
1 month                      417       3.1
More than 1 month            545       4.0
Total                      13604     100.0

Mean = 7.85, SE = 0.10

CLINIMETRICS

Clinimetrics is a science in which qualities are converted to meaningful quantities by using a scoring system.

Examples:

(1) Apgar score: based on appearance, pulse, grimace, activity and respiration; used for neonatal prognosis.
(2) Smoking index: no. of cigarettes, duration, filter or not, whether pipe, cigar, etc.
(3) APACHE (Acute Physiology and Chronic Health Evaluation) score: to quantify the severity of a patient's condition.

INVESTIGATION

- Data Collection
- Data Presentation: Tabulation, Diagrams, Graphs
- Descriptive Statistics: Measures of Location, Measures of Dispersion, Measures of Skewness & Kurtosis
- Inferential Statistics:
  - Estimation: Point estimate, Interval estimate
  - Hypothesis Testing: Univariate analysis, Multivariate analysis

Frequency Distributions

"A Picture is Worth a Thousand Words"

Frequency Distributions

- Data distribution: the pattern of variability
  - the center of a distribution
  - the ranges
  - the shapes
- Simple frequency distributions
- Grouped frequency distributions

Simple Frequency Distribution

- The frequency (f) is the number of times that a score occurs
- Make a table with the highest score at the top, decreasing through every possible whole number
- N (total number of scores) always equals the sum of the frequencies
- Σf = N

Example of a simple frequency distribution

Raw scores: 5 7 8 1 5 9 3 4 2 2 3 4 9 7 1 4 5 6 8 9 4 3 5 2 1

Score   f
9       3
8       2
7       2
6       1
5       4
4       4
3       3
2       3
1       3

Σf = 25

Relative Frequency Distribution

- Proportion of the total N
- Divide the frequency of each score by N
- rel f = f / N
- The sum of the relative frequencies should equal 1.0
- Gives us a frame of reference

Example of a relative frequency distribution

Raw scores: 5 7 8 1 5 9 3 4 2 2 3 4 9 7 1 4 5 6 8 9 4 3 5 2 1

Score   f   rel f
9       3   .12
8       2   .08
7       2   .08
6       1   .04
5       4   .16
4       4   .16
3       3   .12
2       3   .12
1       3   .12

Σf = 25, Σ rel f = 1.0

Cumulative Frequency Distributions

- cf = cumulative frequency: the number of scores at or below a particular score
- Shows a score's standing relative to other scores
- Count up from the lowest score, adding the simple frequencies of all scores at or below that score

Example of a cumulative frequency distribution

Raw scores: 5 7 8 1 5 9 3 4 2 2 3 4 9 7 1 4 5 6 8 9 4 3 5 2 1

Score   f   rel f   cf
9       3   .12     25
8       2   .08     22
7       2   .08     20
6       1   .04     18
5       4   .16     17
4       4   .16     13
3       3   .12      9
2       3   .12      6
1       3   .12      3

Σf = 25, Σ rel f = 1.0
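The simple, relative and cumulative frequencies can all be recomputed from the 25 raw scores. The sketch below is one possible way to do it with the Python standard library; the variable names are illustrative, and cf is accumulated from the lowest score upward so that it counts scores at or below each value.

```python
from collections import Counter
from itertools import accumulate

scores = [5, 7, 8, 1, 5, 9, 3, 4, 2, 2, 3, 4, 9,
          7, 1, 4, 5, 6, 8, 9, 4, 3, 5, 2, 1]
n = len(scores)                             # N, the total number of scores

tally = Counter(scores)
values = sorted(tally)                      # lowest score first
counts = [tally[v] for v in values]         # f
rel_f = [c / n for c in counts]             # rel f = f / N
cf = list(accumulate(counts))               # scores at or below each value

# Print with the highest score at the top, as in a frequency table.
print("score   f  rel f  cf")
for v, c, r, k in reversed(list(zip(values, counts, rel_f, cf))):
    print(f"{v:>5}  {c:>2}   {r:.2f}  {k:>2}")
print("sum f =", sum(counts), " sum rel f =", round(sum(rel_f), 2))
```

Two quick checks are built into the definitions: the frequencies must add up to N, and the cumulative frequency of the highest score must also equal N.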

Tabulate the hemoglobin values of 30 adult male patients listed below

Patient No   Hb (g/dl)   Patient No   Hb (g/dl)   Patient No   Hb (g/dl)
 1           12.0        11           11.2        21           14.9
 2           11.9        12           13.6        22           12.2
 3           11.5        13           10.8        23           12.2
 4           14.2        14           12.3        24           11.4
 5           12.3        15           12.3        25           10.7
 6           13.0        16           15.7        26           12.5
 7           10.5        17           12.6        27           11.8
 8           12.8        18            9.1        28           15.1
 9           13.2        19           12.9        29           13.4
10           11.2        20           14.6        30           13.1

Steps for making a table

Step 1: Find the minimum (9.1) and maximum (15.7)
Step 2: Calculate the difference: 15.7 – 9.1 = 6.6
Step 3: Decide the number and width of the classes (7 classes): 9.0 – 9.9, 10.0 – 10.9, ...
Step 4: Prepare a dummy table with columns Hb (g/dl), Tally marks, No. patients

DUMMY TABLE

Hb (g/dl)      Tally marks    No. patients
9.0 – 9.9
10.0 – 10.9
11.0 – 11.9
12.0 – 12.9
13.0 – 13.9
14.0 – 14.9
15.0 – 15.9
Total

Tally Marks TABLE

Hb (g/dl)      Tally marks    No. patients
9.0 – 9.9      |               1
10.0 – 10.9    |||             3
11.0 – 11.9    ||||| |         6
12.0 – 12.9    ||||| |||||    10
13.0 – 13.9    |||||           5
14.0 – 14.9    |||             3
15.0 – 15.9    ||              2
Total                         30

Table Frequency distribution of 30 adult male patients by Hb

Hb (g/dl)      No. of patients
9.0 – 9.9       1
10.0 – 10.9     3
11.0 – 11.9     6
12.0 – 12.9    10
13.0 – 13.9     5
14.0 – 14.9     3
15.0 – 15.9     2
Total          30
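The table-making steps for the 30 hemoglobin values can be sketched in a few lines. This is a minimal illustration rather than a general-purpose routine: it assumes classes of width 1 g/dl starting at whole numbers, so the class of a value is simply its integer part.

```python
hb = [12.0, 11.9, 11.5, 14.2, 12.3, 13.0, 10.5, 12.8, 13.2, 11.2,
      11.2, 13.6, 10.8, 12.3, 12.3, 15.7, 12.6, 9.1, 12.9, 14.6,
      14.9, 12.2, 12.2, 11.4, 10.7, 12.5, 11.8, 15.1, 13.4, 13.1]

# Step 1: minimum and maximum
lowest, highest = min(hb), max(hb)

# Step 2: difference (rounded to one decimal to hide floating-point noise)
value_range = round(highest - lowest, 1)

# Step 3: seven classes of width 1 g/dl, with lower limits 9, 10, ..., 15
lower_limits = list(range(9, 16))

# Step 4: count the patients in each class (the tally)
counts = {low: 0 for low in lower_limits}
for value in hb:
    counts[int(value)] += 1   # int() truncates: 12.8 falls in 12.0 - 12.9

print("Min:", lowest, "Max:", highest, "Range:", value_range)
for low in lower_limits:
    print(f"{low}.0 - {low}.9  {counts[low]:>3}")
print(f"Total        {sum(counts.values()):>3}")
```

The counts should reproduce the frequency distribution table, and the total should equal the number of patients.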

Table Frequency distribution of adult patients by Hb and gender

Hb (g/dl)      Male   Female   Total
<9.0              0        2       2
9.0 – 9.9         1        3       4
10.0 – 10.9       3        5       8
11.0 – 11.9       6        8      14
12.0 – 12.9      10        6      16
13.0 – 13.9       5        4       9
14.0 – 14.9       3        2       5
15.0 – 15.9       2        0       2
Total            30       30      60

Elements of a Table

An ideal table should have a number, a title, column headings and foot-notes.

- Number: table number for identification in a report
- Title, place: describes the body of the table, the variables and the time period (what, how classified, where and when)
- Column heading: variable name, No., percentages (%), etc.
- Foot-note(s): to describe some column/row headings, special cells, source, etc.

DIAGRAMS/GRAPHS

Qualitative data (nominal & ordinal)
- Bar charts (one or two groups)

Quantitative data (discrete & continuous)
- Histogram
- Frequency polygon (curve)
- Stem-and-leaf plot
- Box-and-whisker plot

Example data

68 79 43 28 49 16 49 30 63 27 25 25 38 24 28 43 42 22 74 45 42 64 23 49 27 28 51 12 27 47 19 12 30 24 36 57 31 23 11 36 25 42 51 50 22 52 28 44 28 12 38 43 46 32 65 31 32 21 27 31

Histogram

Figure 1 Histogram of ages of 60 subjects (x-axis: Age, ticks at 11.5, 21.5, 31.5, 41.5, 51.5, 61.5, 71.5; y-axis ticks at 0, 10, 20)

Polygon

[Figure: frequency polygon (x-axis: Age, ticks at 11.5, 21.5, 31.5, 41.5, 51.5, 61.5, 71.5; y-axis ticks at 0, 10, 20)]

Cumulative Frequency Polygon

- Cumulative counts can be converted to percents.
- Shows the number of cases up to and including all those within the interval.
- Common in vital statistics

Example data

68 79 43 28 49 16 49 30 63 27 25 25 38 24 28 43 42 22 74 45 42 64 23 49 27 28 51 12 27 47 19 12 30 24 36 57 31 23 11 36 25 42 51 50 22 52 28 44 28 12 38 43 46 32 65 31 32 21 27 31

Stem and leaf plot

Stem-and-leaf of Age N = 60 Leaf Unit = 1.0

  6   1  122269
 19   2  1223344555777788888
(11)  3  00111226688
 13   4  2223334567999
  5   5  01127
  4   6  3458
  2   7  49
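A stem-and-leaf display like the one above can be built directly from the 60 ages: the tens digit is the stem and the units digit is the leaf (leaf unit = 1.0). The following is a minimal sketch that prints only stems and leaves, without the count column.

```python
from collections import defaultdict

ages = [68, 79, 43, 28, 49, 16, 49, 30, 63, 27, 25, 25, 38, 24, 28,
        43, 42, 22, 74, 45, 42, 64, 23, 49, 27, 28, 51, 12, 27, 47,
        19, 12, 30, 24, 36, 57, 31, 23, 11, 36, 25, 42, 51, 50, 22,
        52, 28, 44, 28, 12, 38, 43, 46, 32, 65, 31, 32, 21, 27, 31]

stems = defaultdict(list)
for age in sorted(ages):             # sorting puts the leaves in order
    stem, leaf = divmod(age, 10)     # 68 -> stem 6, leaf 8
    stems[stem].append(str(leaf))

lines = [f"{stem} | {''.join(leaves)}" for stem, leaves in sorted(stems.items())]
print("\n".join(lines))
```

Unlike a histogram, the display keeps every original value: each age can be read back from its stem and leaf.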

Box plot

[Figure: box plot (axis ticks from 10 to 80)]

Descriptive statistics report: Boxplot

- minimum score
- maximum score
- lower quartile
- upper quartile
- median
- mean
- the skew of the distribution:
  positive skew: mean > median & high-score whisker is longer
  negative skew: mean < median & low-score whisker is longer
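The quantities a box plot reports can be computed for the 60 example ages with Python's `statistics` module. This is a sketch under one assumption worth flagging: quartiles have several competing definitions, and `statistics.quantiles` with its default method is only one of them, so other software may give slightly different quartile values.

```python
import statistics

ages = [68, 79, 43, 28, 49, 16, 49, 30, 63, 27, 25, 25, 38, 24, 28,
        43, 42, 22, 74, 45, 42, 64, 23, 49, 27, 28, 51, 12, 27, 47,
        19, 12, 30, 24, 36, 57, 31, 23, 11, 36, 25, 42, 51, 50, 22,
        52, 28, 44, 28, 12, 38, 43, 46, 32, 65, 31, 32, 21, 27, 31]

minimum, maximum = min(ages), max(ages)
# Three cut points that split the sorted data into four equal parts.
lower_q, median, upper_q = statistics.quantiles(ages, n=4)
mean = statistics.mean(ages)

# Rule of thumb from the list above: compare the mean with the median.
if mean > median:
    skew = "positive"
elif mean < median:
    skew = "negative"
else:
    skew = "none"

print("min:", minimum, "max:", maximum)
print("lower quartile:", lower_q, "median:", median, "upper quartile:", upper_q)
print("mean:", mean, "skew:", skew)
```

For these ages the mean exceeds the median, which by the rule above indicates a positive skew (a longer tail toward the older ages).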

Application of a box and Whisker diagram


Pie Chart

- Circular diagram; total = 100%
- Divided into segments, each representing a category
- Decide adjacent category
- The slice of the pie for each category is proportional to its amount

[Figure: The prevalence of different degrees of hypertension in the population (categories: Mild, Moderate, Severe; slice labels: 70%, 10%, 20%)]
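Because the whole circle (360 degrees) stands for 100%, the angle of each slice follows directly from its percentage. A minimal sketch using the three slice percentages shown on the chart; which percentage belongs to which category is not stated in the transcript, so no category names are attached here.

```python
# Slice percentages from the pie chart (category assignment not assumed).
percentages = [70, 10, 20]

# The percentages must cover the whole pie.
total = sum(percentages)

# Each slice gets its share of the 360 degrees of the circle.
angles = [p * 360 / 100 for p in percentages]

print("total percent:", total)
print("slice angles in degrees:", angles)
```

The angles add up to 360 degrees exactly when the percentages add up to 100.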

Bar Graphs

- The heights of the bars indicate frequency
- Frequency on the Y axis and categories of the variable on the X axis
- The bars should be of equal width and should not touch the other bars

[Figure: The distribution of risk factors among cases with cardiovascular diseases (x-axis: Risk factor, categories Smo, Alc, Chol, DM, HTN, No Exer, F-H; y-axis: frequency, 0 to 25; bar labels shown: 20, 20, 16, 12, 12, 9, 8)]

Bar chart

[Figure: HIV cases enrolment in USA by gender (Men, Women; x-axis: Year, 1986 to 1992; y-axis ticks from 0 to 12)]

Stacked bar chart

[Figure: HIV cases enrolment in USA by gender (Women, Men; x-axis: Year, 1986 to 1992; y-axis ticks from 0 to 18)]

Graphic Presentation of Data

- the frequency polygon (quantitative data)
- the histogram (quantitative data)
- the bar graph (qualitative data)

General rules for designing graphs

- A graph should have a self-explanatory legend
- A graph should help the reader understand the data
- Axes should be labeled, with units of measurement indicated
- Scales are important: start at zero (otherwise show a // break)
- Avoid graphs with a three-dimensional impression; they may be misleading (the reader visualizes them less easily)

Any Questions?