Biostatistics 200ab Lecture 2 23 September 1999

Introduction to Biostatistics (BIO/EPI 540)
Data Presentation: Graphs and Tables
Acknowledgement: Thanks to Professor Pagano (Harvard School of Public Health) for lecture material
1
Class Plan
Data Presentation (Lec 2 overview)
Example (hand/SAS)
Mean and variance
Describing Data (and in next class)
Simulating Data (and in next class)
2
Outline
• Descriptive Statistics – means of
organizing and summarizing
observations
• Types of data
• Data presentation and numerical
summary measures
3
Types of data
•Nominal Data
•Ordinal Data
•Rank Data
•Discrete Data
•Continuous Data
4
Types of data
•Nominal Data
e.g. 1: male, 0: female
•Nominal data values fall into unordered categories or classes
5
Types of data
•Ordinal Data
e.g. 1. Mild, 2. Moderate, 3. Severe
•Observations with order among categories are referred to as ordinal
6
7
Example: Death of Manatees in Florida
Cause                    1999   1998
Floodgates/Canal Lock      15      9
Human Related               8      6
Natural                    43     21
Perinatal                  52     53
Watercraft                 82     66
Undetermined               69     76
Total                     263    231
Nominal categories
Source: Florida Fish and Wildlife Conservation Commission
8
Example: Death of Manatees in Florida
Cause                    1999   1998   Rank
Floodgates/Canal Lock      15      9      4
Human Related               8      6      5
Natural                    43     21      3
Perinatal                  52     53      2
Watercraft                 82     66      1
Undetermined               69     76
Total                     263    231
Ranked data
Source: Florida Fish and Wildlife Conservation Commission
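The ranks on this slide can be reproduced by sorting the causes on their 1999 counts. What follows is a minimal Python sketch, not part of the original lecture. It assumes that the counts 8 (1999) and 6 (1998) belong to the Human Related row, and it leaves Undetermined unranked because the slide lists only five ranks. All variable names are illustrative.

```python
# 1999 manatee deaths by cause, as read from the slide.
# Assumption: "Undetermined" is excluded because the slide gives it no rank.
deaths_1999 = {
    "Floodgates/Canal Lock": 15,
    "Human Related": 8,
    "Natural": 43,
    "Perinatal": 52,
    "Watercraft": 82,
}

# Sort causes from most to fewest deaths, then number them 1, 2, 3, ...
ordered = sorted(deaths_1999, key=deaths_1999.get, reverse=True)
ranks = {cause: position for position, cause in enumerate(ordered, start=1)}

for cause in deaths_1999:
    print(f"{cause:<24}{deaths_1999[cause]:>4}  rank {ranks[cause]}")
```

Ranking keeps the order of the categories but discards the size of the gaps between them: Watercraft and Perinatal differ by 30 deaths, yet their ranks differ by only 1.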
9
Types of data
•Discrete Data
e.g. data on number of children per subject
•Both order and magnitude are important
•Data consist of a restricted set of values
Subject   Number of children
1         2
2         3
3         1
4         2
5         4
10
Types of data
•Continuous Data
e.g. US adult heights; US adult individual cholesterol measurements
•Data represent measurable quantities but are not restricted to taking on specific values
11
Outline
• Descriptive Statistics – means of
organizing and summarizing
observations
• Types of data
• Data presentation and numerical
summary measures
12
Data Presentation
• Nominal / Ordinal Data:
– Frequency (relative frequency) tables
– Bar charts
• Discrete/ Continuous Data:
– Histogram (Frequency Polygon)
– One way scatter plot
• Continuous Data:
– Box plot
– 2 way scatter plot
– Line Graph
13
Frequency Table
Example: Serum cholesterol level of men aged 25-34 years.
Cholesterol Level (mg/100 ml)   Number of Men
80-119                                     13
120-159                                   150
160-199                                   442
200-239                                   299
240-279                                   115
280-319                                    34
320-359                                     9
360-399                                     5
Total                                   1,067
14
Frequency Table
Example: Serum cholesterol level of men aged 25-34 years.
Cholesterol Level (mg/100 ml)   Number of Men   Relative Frequency (%)
80-119                                     13                      1.2
120-159                                   150                     14.1
160-199                                   442                     41.4
200-239                                   299                     28.0
240-279                                   115                     10.8
280-319                                    34                      3.2
320-359                                     9                      0.8
360-399                                     5                      0.5
Total                                   1,067                    100.0
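Each relative frequency is the count in an interval divided by the total count, expressed as a percentage. The short Python sketch below, which is not part of the original slides, recomputes that column from the counts for men aged 25-34. The dictionary and variable names are illustrative.

```python
# Counts of men aged 25-34 per cholesterol interval (mg/100 ml), from the slide.
counts = {
    "80-119": 13,
    "120-159": 150,
    "160-199": 442,
    "200-239": 299,
    "240-279": 115,
    "280-319": 34,
    "320-359": 9,
    "360-399": 5,
}

total = sum(counts.values())

# Relative frequency (%) = 100 * count / total, rounded to one decimal place.
relative_frequency = {
    interval: round(100 * n / total, 1) for interval, n in counts.items()
}

for interval, percent in relative_frequency.items():
    print(f"{interval:<8}{counts[interval]:>6}{percent:>7.1f}")
print(f"{'Total':<8}{total:>6}")
```

Relative frequencies make groups of different sizes comparable, which is why the two age groups are later plotted in percent rather than in counts.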
15
Bar Chart
Car defects in three factories
Label axes;
Leave space between bars
http://www.ncsu.edu/labwrite/res/gh/gh-bargraph.html#horizbar
16
Data Presentation
• Nominal / Ordinal Data:
– Frequency (relative frequency) tables
– Bar charts
• Discrete/ Continuous Data:
– Histogram (Frequency Polygon)
• Continuous Data:
– Box plot
– 2 way scatter plot
– Line Graph
17
Histogram
Example
18
Histogram
Key points
• Choosing the number of bins –
depends on range of data
• Equal widths of bins recommended
• When data demands unequal bin
widths, take care to plot area
proportional to relative frequency
19
Histogram
Key points
• A histogram represents percentages by
areas*
• Density scale (Y axis): the height of each
block (bin) equals the percentage in that
block (bin) divided by the bin width
• Total area of histogram = 100%
• When bin widths are equal – it is common
for the histogram to show just the counts
in each bin
Source: http://www.stat.berkeley.edu/users/rice/Stat2/Chapt3.pdf
20
Histogram - example
Source: http://www.stat.berkeley.edu/users/rice/Stat2/Chapt3.pdf
21
Histogram - example
Percent
Source: http://www.stat.berkeley.edu/users/rice/Stat2/Chapt3.pdf
22
Histogram
Source: http://www.stat.berkeley.edu/users/rice/Stat2/Chapt3.pdf
23
Histogram
Constructing a 100% area
histogram
Source: http://www.stat.berkeley.edu/users/rice/Stat2/Chapt3.pdf
24
Histogram
Constructing a 100% area histogram
[Figure: histogram on the density scale; x-axis ticks at -2.0, -0.4, 0, 0.4, 2.0]
Source: http://www.stat.berkeley.edu/users/rice/Stat2/Chapt3.pdf
26
Frequency Polygon - Example
Serum cholesterol level of men (1976-1980 survey)
Cholesterol Level   Relative Frequency (%)   Relative Frequency (%)
(mg/100 ml)         25-34 yrs                55-64 yrs
80-119                1.2                      0.4
120-159              14.1                      3.9
160-199              41.4                     21.6
200-239              28.0                     37.3
240-279              10.8                     22.9
280-319               3.2                     10.4
320-359               0.8                      2.9
360-399               0.5                      0.6
Total               100.0                    100.0
27
Frequency Polygon - Example
[Figure: Frequency polygon of cholesterol. Y-axis: Percent (0 to 45). X-axis: Levels (80-119, 120-159, 160-199, 200-239, 240-279, 280-319, 320-359, 360-399). Series: 25-34, 55-64]
28
Frequency Polygon - Example
Serum cholesterol level of men aged 25-34 years.
Cholesterol Level   Relative Frequency (%)   Cumulative Relative
(mg/100 ml)                                  Frequency (%)
80-119                1.2                      1.2
120-159              14.1                     15.3
160-199              41.4                     56.7
200-239              28.0                     84.7
240-279              10.8                     95.5
280-319               3.2                     98.7
320-359               0.8                     99.5
360-399               0.5                    100.0
Total               100.0
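The cumulative column is a running sum of the relative frequencies: each entry is the percentage of men at or below the top of that interval. A minimal Python sketch of that running sum follows; it is not from the original slides, and the variable names are illustrative.

```python
from itertools import accumulate

intervals = ["80-119", "120-159", "160-199", "200-239",
             "240-279", "280-319", "320-359", "360-399"]

# Relative frequencies (%) for men aged 25-34, from the slide.
relative_frequency = [1.2, 14.1, 41.4, 28.0, 10.8, 3.2, 0.8, 0.5]

# Running total, rounded to one decimal to hide floating-point noise.
cumulative = [round(value, 1) for value in accumulate(relative_frequency)]

for interval, rel, cum in zip(intervals, relative_frequency, cumulative):
    print(f"{interval:<8}{rel:>6.1f}{cum:>7.1f}")
```

Reading the result: 56.7% of these men have a cholesterol level below 200 mg/100 ml, so the median falls in the 160-199 interval.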
29
Frequency Polygon - Example
[Figure: Cumulative frequency polygon of cholesterol. Y-axis: Percent (0 to 100). X-axis: Levels (80-119, 120-159, 160-199, 200-239, 240-279, 280-319, 320-359, 360-399). Series: 25-34]
30
Frequency Polygon - Example
[Figure: Cumulative frequency polygon of cholesterol. Y-axis: Percent (0 to 100). X-axis: Levels (80-119, 120-159, 160-199, 200-239, 240-279, 280-319, 320-359, 360-399). Series: 25-34, 55-64]
31
Data Presentation
• Nominal / Ordinal Data:
– Frequency (relative frequency) tables
– Bar charts
• Discrete/ Continuous Data:
– Histogram (Frequency Polygon)
• Continuous Data:
– Box plot
– 2 way scatter plot
– Line Graph
32
Example - Dyslipidemia in HIV Cohort
Histogram reveals an asymmetric,
skewed distribution
33
Example - Dyslipidemia in HIV Cohort
Natural log transformation of the data
results in a more symmetric distribution
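To illustrate why a log transformation can make a right-skewed variable more symmetric, the Python sketch below compares the sample skewness of a small data set before and after taking natural logs. The values are hypothetical, made up for illustration only, and are not the HIV cohort measurements. The `skewness` helper is also an illustrative name; it computes the moment-based coefficient (third central moment divided by the 1.5 power of the second).

```python
import math

def skewness(values):
    """Moment-based sample skewness: m3 / m2**1.5 (0 for a symmetric sample)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical triglyceride-like values with a long right tail (NOT study data).
triglycerides = [60, 80, 95, 110, 130, 150, 180, 240, 400, 900]
log_triglycerides = [math.log(v) for v in triglycerides]

raw_skew = skewness(triglycerides)
log_skew = skewness(log_triglycerides)

print(f"skewness, raw scale: {raw_skew:.2f}")
print(f"skewness, log scale: {log_skew:.2f}")
```

The log pulls in large values more than small ones, so the long right tail shrinks and the skewness coefficient moves toward zero.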
34
Box plot
Dyslipidemia in HIV Cohort
Natural log transformed triglyceride measurements
[Figure: box plot annotated, from top to bottom: Outliers, UB, 75th percentile, 50th percentile, 25th percentile, LB]
UB (LB) = most extreme data point that is within 1.5 times box width (IQR) of the 75th (25th) percentile
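The UB and LB rule can be made concrete with a few lines of Python. This is a sketch under stated assumptions, not the method used to draw the slide: the data are hypothetical, `box_plot_summary` is an illustrative name, and quartiles are computed with `statistics.quantiles` using the "inclusive" method (other quartile conventions give slightly different values).

```python
import statistics

def box_plot_summary(values):
    """Quartiles, whisker ends (LB, UB) and outliers under the 1.5 * IQR rule."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1                      # box width
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    return {
        "q1": q1, "median": median, "q3": q3,
        "lb": min(inside),             # most extreme point within the lower fence
        "ub": max(inside),             # most extreme point within the upper fence
        "outliers": [v for v in values if v < lower_fence or v > upper_fence],
    }

# Hypothetical measurements with one unusually large value (NOT study data).
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
summary = box_plot_summary(data)
print(summary)
```

Note that UB and LB are actual data points, not the fences themselves: here the upper fence is 14.5, but the whisker stops at 9, and 30 is plotted separately as an outlier.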
35
Box plot
Dyslipidemia in HIV Cohort
36
2 way scatter plot
Dyslipidemia in HIV Cohort
Reveals relationship between 2 continuous variables
37
Summary
• Data Types:
– Nominal
– Ordinal
– Discrete
– Continuous
• Data presentation (Nominal/Ordinal data):
– Tables (Frequency, Relative Frequency)
– Bar charts
• Data presentation (Discrete/Continuous)
– Histogram (Frequency Polygon)
• Data presentation (Continuous)
– Box plot, shapes of distributions
– 2 way scatter plot
38
In-Class Example
Distance Willing to Travel to a Household Hazardous Waste Site:
Distance       Freq
< 1 mile         75
>1-2 miles       90
>2-5 miles       45
>5-10 miles      90
Total           300
Histogram, Polygon, Cum % Dist.
39
In-Class Example
Distance Willing to Travel to a Household Hazardous Waste Site:
Distance       Freq     %   %/mile
< 1 mile         75    25       25
>1-2 miles       90    30       30
>2-5 miles       45    15        5
>5-10 miles      90    30        6
Total           300
Histogram, Polygon, Cum % Dist.
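Because the distance intervals have unequal widths (1, 1, 3 and 5 miles), the histogram height for each interval is the percentage divided by the interval width, so that area stays proportional to relative frequency. The Python sketch below reworks the table under that rule; it is not part of the original slides, and the variable names are illustrative.

```python
# Interval edges in miles and the observed frequencies from the table.
edges = [0, 1, 2, 5, 10]
freq = [75, 90, 45, 90]

n = sum(freq)
widths = [right - left for left, right in zip(edges, edges[1:])]

# Percentage of respondents in each interval.
percent = [100 * f / n for f in freq]

# Density-scale height: percent per mile.
density = [p / w for p, w in zip(percent, widths)]

# Area of each block is height * width; the areas add up to 100%.
total_area = sum(d * w for d, w in zip(density, widths))

# Cumulative percent at the right edge of each interval.
cumulative = []
running = 0.0
for p in percent:
    running += p
    cumulative.append(running)

for i, f in enumerate(freq):
    print(f"{edges[i]:>2}-{edges[i + 1]:<3} miles  freq={f:>3}  "
          f"%={percent[i]:>5.1f}  %/mile={density[i]:>5.1f}  cum%={cumulative[i]:>6.1f}")
```

The last two intervals show why the density scale matters: >5-10 miles holds twice as many respondents as >2-5 miles (30% versus 15%), yet its block is only slightly taller (6 versus 5 percent per mile) because it is spread over a wider range.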
40
Histogram of Travel Distance (miles) for n=300
[Figure: histogram. Y-axis: Density. X-axis: Distance (Miles), ticks at 0, 1, 2, 3, 4, 5, 10]
41
Polygon of Travel Distance (miles) for n=300
[Figure: frequency polygon. Y-axis: Density. X-axis: Distance (Miles), ticks at 0, 1, 2, 3, 4, 5, 10]
42
Cumulative % of Travel Distance (miles) for n=300
[Figure: cumulative percent polygon. Y-axis: Cum. Percent, ticks at 0, 25, 50, 75, 100. X-axis: Distance (Miles), ticks at 0, 1, 2, 3, 4, 5, 10]
43