Statistics for Managers AHS 360

Download Report

Transcript Statistics for Managers AHS 360

UNIT 4
A Picture Really Is Worth a Thousand Words:
Illustrating Descriptive Statistics
Chap. 4, NJS
Revised 1/22/12
AHS 360 - Unit 4
1
OBJECTIVES – Unit 4
 Why pictures are worth “a thousand words”
 How to create
 Histogram
 Polygon
 Using the Data Analysis Toolpak to create histograms
 Distinguish between a quantitative and a qualitative
distribution.
 Using the SKEW and KURT functions
 Using Excel to create and modify charts
 Different types of charts and their uses
Revised 1/22/12
AHS 360 - Unit 4
2
Presentation of Data
 Descriptive statistics – methods used to present
information in a more useful format
 Used to review streams of data
 Clinical
 Financial
 Operating activity
 Summarizes information
 What is the level of detail?
Revised 1/22/12
AHS 360 - Unit 4
3
Frequency Distributions: Patients and Length of Stay*
PATIENT ID
DAYS
PATIENT ID
DAYS
PATIENT ID
DAYS
473528
19
823117
5
412058
8
58930
14
987647
10
612149
17
214129
1
314435
7
529269
17
987973
2
546065
4
490189
7
175474
13
330233
1
3235
4
692529
5
267526
15
302927
4
710470
8
192958
5
933821
5
947133
8
673579
13
545422
17
176857
3
334366
14
341657
15
43008
20
676380
4
330600
17
946967
15
417604
12
156147
12
275066
4
150297
14
388707
2
740316
18
474010
20
372666
19
296280
10
901837
20
465126
1
758281
20
271951
16
441428
4
884235
2
823066
5
793344
3
701551
5
725294
18
824470
8
778009
4
Imagine another 900 or more of these rows/columns!!! What does it mean?
Revised 1/22/12
AHS 360 - Unit 4
4
Illustrating Data
 When describing a set of scores, you will want to use two
things…
 One score for describing the group of data

Measure of Central Tendency
 Measure of how diverse or different the scores are from
one another

Measure of Variability
 However, a visual representation of these two measures is
much more effective when examining distributions
Revised 1/22/12
AHS 360 - Unit 4
5
SAMPLE DATA SET*
Nicotine Levels in smokers
1 0 131 173 265 210 44 277 32 3
35 112 477 289 227 103 222 149 313 491
130 234 164 198 17 253 87 121 266 290
123 167 250 245 48 86 284 1 208 173
Each box represents an individual and their nicotine level.
Revised 1/22/12
AHS 360 - Unit 4
6
Frequency Distribution*
Nicotine
Frequency
0 – 99.99
11
100 – 199.99
12
200 – 299.99
14
300 – 399.99
1
400 – 499.99
2
So, 11 individuals had a nicotine level between 0 and 99,
12 had a level between 100 – 199, etc.
Revised 1/22/12
AHS 360 - Unit 4
7
CLASS FREQUENCY*
 The term class frequency refers to the number of
observations or items assigned to a given category
 Example: The number of individuals in a specific group
CLASS INTERVAL*
 The class interval indicates the range of values contained
in a given category of a frequency distribution
 Example: 1 – 5, 6 – 10, 11 – 15, etc.
Revised 1/22/12
AHS 360 - Unit 4
8
Class Interval (or Width)*
 The difference between two consecutive lower class
limits or two consecutive class boundaries
Nicotine
Class Width
Revised 1/22/12
Frequency
100
0 – 99.99
11
100
100 – 199.99
12
100
200 – 299.99
14
100
300 – 399.99
1
100
400 – 499.99
2
AHS 360 - Unit 4
9
CLASS LIMITS*
 The upper class limit identifies the maximum value that
appears in a given category while the lower class limit
corresponds to the minimum value appearing in a given
group
Revised 1/22/12
AHS 360 - Unit 4
10
Upper/Lower Class Limits*
 Upper: The largest number belonging to each class
 Lower: The smallest number belonging to each class
Nicotine
Lower
Class
Limits
Revised 1/22/12
Frequency
0 – 99.99
11
100 – 199.99
12
200 – 299.99
14
300 – 399.99
1
400 – 499.99
2
AHS 360 - Unit 4
Upper
Class
Limits
11
Class Mark (Midpoints)*
 The mid-point is the
mid-point between
the upper and lower
class interval of a
given category
Class
Midpoints
Nicotine
Frequency
0 - 50 – 99.99
11
100 - 150 – 199.99
12
200 - 250 – 299.99
14
300 - 350 – 399.99
1
400 - 450 – 499.99
2
For example: (199.99 – 100 ) /
2 + 100 (base value) = 150
Revised 1/22/12
AHS 360 - Unit 4
12
Frequency Table Example
Outpatients’ Use of Ancillary Services*
CLASS INTERVAL
CLASS MIDPOINT = 3
Units Per Patient
Number of Patients
1 to 5
6 to 10
450
UPPER LIMIT
11 to 15
16 to 20
Total
Revised 1/22/12
CLASS FREQUENCY
210
19
150
LOWER LIMIT
1,000
AHS 360 - Unit 4
13
Frequency Distributions*
 What does a frequency distribution tell us?
 Rearranges information into a more useable and
understandable form
 Groups the data
 Proportion of items in each category
 What it does not tell us?
 Minimum or maximum services per patient
 Details for individuals and groups

What was the service level or length of stay for patient #101?
 We don’t know with a frequency distribution
Revised 1/22/12
AHS 360 - Unit 4
14
COLLECTIVELY EXHAUSTIVE*
 The term collectively exhaustive simply tells us that the
distribution will accommodate all observations, ranging
from the smallest to the largest value in the set of data
 Example: Individuals grouped by race


Two Categories Chosen: African-American and White
Problem: 10 Hispanic individuals are in our sample
 Your categories were not collectively exhaustive
Revised 1/22/12
AHS 360 - Unit 4
15
MUTUALLY EXCLUSIVE*
 The term mutually exclusive simply tells us that an
observation is assigned to one and only one category
 Example: Hospitals grouped by type



Types: Rural, urban, academic, for-profit, not-for-profit, and
government
What about Cooper Green (government and urban) and Shelby
Baptist (rural and not-for-profit)? Where do they fit?
Problem: Your categories were not mutually exclusive.
Revised 1/22/12
AHS 360 - Unit 4
16
Frequency Distributions*
 How do you determine the groups / categories?
 How many are in the overall sample?
 Generally (may vary by data and/or textbook)


No fewer than 3
No more than 10 groups
 More than 10 groups can get VERY messy – note this especially in
regards to your assignments and the exams (the book says 10 – 20)
 Groups must be
 Collectively exhaustive
 Mutually exclusive
 Include all classes, even if the frequency is zero
 Try to use the same width for all classes
 Select convenient numbers for your class limits
 The sum of all class frequencies must equal the number of
original data values
Revised 1/22/12
AHS 360 - Unit 4
17
QUANTITATIVE FREQUENCY DISTRIBUTION*
 A quantitative frequency distribution is defined as one in
which the categories used to group the subjects (e.g.,
patients) are expressed numerically.
Units Per Patient
Number of Patients
1 to 5
450
6 to 10
210
11 to 15
19
16 to 20
150
Total
Revised 1/22/12
1,000
AHS 360 - Unit 4
18
Relative Frequency Distributions
Quantitative Distribution Example*
 Let’s say we have the number of ancillary services for a
set of patients during their hospitalization
 Number of services received (Medication, blood
transfusion, shot, X-ray, catheter placement, etc.)
 1,000 patients in sample

We have data for 1,000 patients and their hospitalization
Revised 1/22/12
AHS 360 - Unit 4
19
Quantitative Frequency Distribution Example*
Number of
Ancillary
Services Per
Patient
Number of
Patients
Relative
Frequency
1 to 5
450
0.45 (45%)
6 to 10
210
0.21 (21%)
11 to 15
190
0.19 (19%)
16 to 20
150
0.15 (15%)
1,000
1.00 (100%)
Total
Revised 1/22/12
AHS 360 - Unit 4
20
Relative Frequency Distributions*
 Steps
1. Select the categories that will characterize the
distribution.
2. Sort the data into the selected categories.
3. Count the items in each group.
4. Compute the relative frequencies for each group.
 Equation: fi/T
 fi = the number of observations per category
 T = total number of observations
 Example 1:
450 / 1,000 = 0.45
Revised 1/22/12
AHS 360 - Unit 4
21
QUALITATIVE FREQUENCY DISTRIBUTION*
 The groups that comprise a qualitative frequency
distribution are defined in categorical terms
 These are non-numerical categories
Category
Frequency
Relative Frequency
Digestive Disorders
250
0.25
Eye Disorders
150
0.15
Respiratory Disorders
200
0.20
Nervous System Disorders
400
0.40
1,000
1.00
Total
Revised 1/22/12
AHS 360 - Unit 4
22
Relative Frequency Distribution
Qualitative Distribution*
 1,000 patients
 Patients grouped by diagnosis type
 Can you describe the results from the frequency
distribution?
Category
Frequency
Relative Frequency
Digestive Disorders
250
0.25
Eye Disorders
150
0.15
Respiratory Disorders
200
0.20
Nervous System Disorders
400
0.40
1,000
1.00
Total
Revised 1/22/12
AHS 360 - Unit 4
23
Constructing A Relative Frequency Distribution*
1.
2.
Decide on the number of classes.
Determine the class width by dividing the range by the number of
classes (range = highest score - lowest score) and round up.
range
class width  round up of
number of classes
3.
4.
5.
6.
7.
Select for the first lower limit either the lowest score or a
convenient value slightly less than the lowest score.
Add the class width to the starting point to get the second lower
class limit, add the width to the second lower limit to get the third,
and so on.
List the lower class limits in a vertical column and enter the upper
class limits.
Represent each score by a tally mark in the appropriate class.
Total tally marks to find the total frequency for each class.
Revised 1/22/12
AHS 360 - Unit 4
24
Tabulating Numerical Data:
Frequency Distributions*
 Sort Raw Data in Ascending Order
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
 Find Range: 58 - 12 = 46
 Select Number of Classes: 5 (usually between 3 and 10)
 Compute Class Interval (width): 10 (46/5 then round up)
 Determine Class Limits: 10-19.99, 20-29.99, 30-39.99, etc.
 Compute Class Midpoints: 15, 25, 35, 45, etc.
 Count Observations & Assign to Classes
Revised 1/22/12
AHS 360 - Unit 4
25
Frequency Distributions, Relative Frequency
Distributions and Percentage Distributions*
Data in an ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
Class
10 – 19.99
20 – 29.99
30 – 39.00
40 – 49.99
50 – 59.99
Total
Revised 1/22/12
Relative
Frequency Frequency Percentage
3
6
5
4
2
20
.15
.30
.25
.20
.10
1
AHS 360 - Unit 4
15%
30%
25 %
20%
10%
100%
26
Cumulative Frequency Distribution*
Data in an ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
Lower
Limit
0 to 10
0 to 20
0 to 30
0 to 40
0 to 50
0 to 60
Revised 1/22/12
Cumulative
Frequency
Cumulative
% Frequency
0
3
9
14
18
20
AHS 360 - Unit 4
0
15
45
70
90
100
27
CUMULATIVE DISTRIBUTION EXAMPLE:
Inpatients, Grouped by Age, Presenting with Diseases of the
Respiratory System*
Age
Number of
Patients
Years or
Less
Cumulative
Frequency
Relative
Frequency
20-29
150
20-29
150
0.06
30-39
235
20-39
385
0.16
40-49
270
20-49
655
0.28
50-59
340
20-59
995
0.42
60-69
400
20-69
1,395
0.59
70-79
450
20-79
1,845
0.78
80-89
510
20-89
2,355
1.00
Total
Revised 1/22/12
2,355
AHS 360 - Unit 4
28
CUMULATIVE DISTRIBUTIONS*
 Example: Diagnostic Category – Respiratory Illnesses
 Variables:


Age Range – How many categories?
Number of patients?
 N (number in sample) = 2,355
 Equation: CFi/T
 CFi = the cumulative frequency associated the category i
 T = total number of observations
 Example (79 years old or less): 1,845 / 2,355 =
Revised 1/22/12
AHS 360 - Unit 4
0.78
29
Frequency Distribution Types*
 Frequency Distribution – only give you the numbers in
each group
 Relative Frequency Distribution – gives you the
proportion in each group
 Cumulative Frequency Distributions – give you the total in
a given group plus previous groups (can be reversed – low
to high or vice-versa)
Revised 1/22/12
AHS 360 - Unit 4
30
FREQUENCY DISTRIBUTIONS
Excel Example 4.1*
 Open Excel Example 4.1
 Review “Raw Numbers” tab
 We have 20 patients
 ID number for each patient is in cells A6 to A25
 The number of secondary diagnoses for each patient in
located in cells B6 to B25
Revised 1/22/12
AHS 360 - Unit 4
31
FREQUENCY DISTRIBUTIONS
USING EXCEL – EXCEL EXAMPLE 4.1*
 We want to categorize the data so that is makes more
sense
 Assumption
 All patients had at least one secondary diagnosis
 Sort observations into categories
 No patient had more than 15 secondary conditions
 Categories should be collectively exhaustive – 1 to 15
 Count the number of items assigned to each
Revised 1/22/12
AHS 360 - Unit 4
32
Excel Example 4.1
Categories: Using Bins*
 We are now going to create the categories
 Review instructions on the bottom of the “Working
Sheet” tab
 Review upper and lower limits
 Bins: Receptacles into which the observations are sorted
– cells C6 to C9
 Maximum value for each category (upper limit for each
category)
Revised 1/22/12
AHS 360 - Unit 4
33
Excel Example 4.1*
 Summarize Results - What does this information tell us?
 Be able to explain your results so that your grandmother could
understand it!
 We have 20 patients and the number of secondary diagnoses for




each one.
We’ve divided the sample into four groups based on their
secondary diagnoses (1 to 4, etc.)
There are 7 patients who has a 1 to 4 diagnoses, which represent
35% of the overall sample.
There are 4 patients (20%) in the group that has 9 to 12
diagnoses.
90% of the sample had 12 or fewer secondary diagnoses.
Revised 1/22/12
AHS 360 - Unit 4
34
Excel’s Histogram Function
 Very limited
 Not enough
information
 Not detailed
enough
 So, use the
steps/directions in
the previous and
following slides to
create your
bar/column charts
and frequency
distribution tables
Revised 1/22/12
AHS 360 - Unit 4
35
ABSOLUTE CELL REFERENCES (just an aside,
hopefully to make your Excel work a little easier)*
 When you copy formulas from one cell to another, Excel also copies any cell
references in those formulas. But Excel adjusts the cell references in the copy of the
formula so that the reference has the same relative relationship to the new
formula. See the example shown here.
 Usually these relative references are exactly what you want. But suppose you want
to copy a formula but maintain the reference to the exact same cells it originally
referred to? You can do this using absolute cell references. To create an absolute
cell reference, simply place a dollar sign before the row and/or the column. For
example, an absolute cell reference to B3 is $B$3. (You can create partially absolute
references by placing a dollar sign before only the row or only the column. Then,
when you copy the formula, the part with the dollar sign stays constant, but the
part without the dollar sign is adjusted like a relative reference.)
Revised 1/22/12
AHS 360 - Unit 4
36
HISTOGRAM
 A histogram is a graphical tool that compares the sizes of the







response groups. Heights of bars show counts for categories.
Usually depicted as a bar chart or column chart
Graphical way to represent a frequency distribution or a
distribution of relative frequencies
Series of rectangles with the area of each given by the product
of its base and height
Height = class or relative frequencies
Base = class interval (on horizontal scale)
If class intervals are equal, the area of a given rectangle
indicates the relative importance of the class in the set of
categories that defines the distribution
See Excel Examples 4.2 and 4.3
Revised 1/22/12
AHS 360 - Unit 4
37
Histograms
Revised 1/22/12
AHS 360 - Unit 4
38
Histograms
Revised 1/22/12
AHS 360 - Unit 4
39
Graphing Numerical Data: The Histogram*
Data in an ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
Histogram
7
6
Frequency
6
5
5
4
4
3
3
2
2
1
0
0
5
Class Limits
Revised 1/22/12
0
15
25
35
Class Midpoints
AHS 360 - Unit 4
45
55
More
40
Frequency Polygon
 A continuous
line that
represents the
frequencies of
scores within a
class interval
Revised 1/22/12
AHS 360 - Unit 4
41
Fat and Skinny of Frequency Distributions
 Distributions can be different in four ways…
 Average value
 Variability
 Skewness
 Kurtosis
Revised 1/22/12
AHS 360 - Unit 4
42
Same Mean, Median, Mode - Different
Shapes Though*
16
16
16
Revised 1/22/12
AHS 360 - Unit 4
43
Average Value
How distributions can differ in their average score
Revised 1/22/12
AHS 360 - Unit 4
44
Measures of Central Tendency*
 Normal Curve = Mean, mode, and
median are same score
Normal Distribution
Mean. Mode & Median
15
10
>36-40
>32-36
>28-32
>24-28
>20-24
>16-20
>12-16
>8-12
0
>4-8
5
0-4
Frequency
20
Age Class Intervals (years)
Revised 1/22/12
AHS 360 - Unit 4
45
COMPARISON OF MEAN,
MEDIAN & MODE*
 If the data are unimodal and symmetrical, the three
measures of central tendency (mean, median, and mode)
will be of similar value
Peaked Distribution
20
Frequency
15
10
>36-40
>32-36
>28-32
>24-28
>20-24
>16-20
>12-16
>8-12
>4-8
0
0-4
5
Age Class Intervals (years)
Revised 1/22/12
AHS 360 - Unit 4
46
Variability
How distributions can differ in variability
Revised 1/22/12
AHS 360 - Unit 4
47
SYMMETRY & ASSEMETRY*
 A distribution is symmetrical if it is possible to construct a
perpendicular line that divides it into two identical halves.
 A distribution is asymmetrical if a tail appear at either
end.
Revised 1/22/12
AHS 360 - Unit 4
48
SHAPE OF A DISTRIBUTION
SKEWNESS & SYMMETRY*
 Describe How Data are Distributed
 Measures of Shape
 Symmetric or skewed
Negatively (Left) Skewed
Mean < Median < Mode
Revised 1/22/12
Symmetric
Mean = Median =Mode
AHS 360 - Unit 4
Positively (Right) Skewed
Mode < Median < Mean
49
Mode Median
Median Mode
Mean
Mean
 Skewed Curve =
• Mode = peak of distribution
• Median = in middle
• Mean = closest to tail of distribution
Revised 1/22/12
AHS 360 - Unit 4
50
SKEWNESS
Interpreting histograms*
 Shape………
Mean < Median < Mode
Mean > Median > Mode
Skewed to the left (negative skew)
Skewed to the right (positive skew)
15
Frequency
10
>36-40
>32-36
>28-32
>24-28
>20-24
>16-20
Age Class Intervals (years)
Age Class Intervals (years)
Revised 1/22/12
>12-16
0
>8-12
5
>4-8
>36-40
>32-36
>28-32
>24-28
>20-24
>16-20
>12-16
>8-12
0
>4-8
5
0-4
10
0-4
Frequency
15
AHS 360 - Unit 4
51
COMPARISON OF MEAN,
MEDIAN, & MODE*
 When data are skewed, the mean and median will not
be equal. The mean will be ‘pulled towards the skew’
 For example, for a positive skew, the mean will be
greater than the median
 For a negative skew, the mean will be less than the
median
Revised 1/22/12
AHS 360 - Unit 4
52
Skewness
 Positive and negative skewness
Degree of skewness in different distributions
Revised 1/22/12
AHS 360 - Unit 4
53
Kurtosis
 Platykurtic (A) - flat curve
 Leptokurtic (C) – peaked curve
Revised 1/22/12
AHS 360 - Unit 4
54
USING EXCEL TO PREPARE GRAPHIC
PRESENTATIONS
 Excel is very useful for creating charts
 However, often you will need to “clean up” your charts, play
with the settings, or even add a text box to your chart to make
it appropriate for the data and understandable for the reader
 A chart should be able to stand alone and not need an
explanation
 For our purposes, we are going to use Excel to create the
following types of charts




Column Charts (a type of Histogram)
Bar Charts (a type of Histogram)
Pie Charts
Line Charts
Revised 1/22/12
AHS 360 - Unit 4
55
Column Charts - Excel Example 4.2*
1. Select
“Insert”
3. Select “Column”
4. Select
“2-D
Clustered
Column”
2. Highlight Data (Hold
Left Mouse Key
While You Drag Over
the Appropriate
Cells)
Revised 1/22/12
AHS 360 - Unit 4
56
2. Select
“Select
Data”
Column Charts - Excel Example 4.2*
3. Select “Edit”
1. Move Mouse Over
Chart and Hit Left
Mouse Button
Revised 1/22/12
AHS 360 - Unit 4
57
Column Charts - Excel Example 4.2*
2. Select “Chart Tools”
3. Select down arrow on “Chart
Layouts”
4. Select appropriate chart layout
with legend, title, and axis options
(in this case, Layout 9)
NOTE: You can also use the layout
function to choose these items.
1. Move Mouse Over
Chart and Hit Left
Mouse Button
Revised 1/22/12
AHS 360 - Unit 4
58
Column Charts Excel Example 4.2*
1. Edit Axis Titles and Chart Title
as Appropriate.
NOTE: I deleted the “Series 1” as it
did not add any value to the chart.
FINALLY: Make your charts very
readable – they should be able to
stand alone without any need for
explanation.
Revised 1/22/12
AHS 360 - Unit 4
59
Column Charts - Excel Example 4.2*
 Describe your chart
 What are the categories?
 How many are in each group?
 What is the implication for the manager or the reader?
 Are the groupings appropriate?
 Does the chart adequately summarize your data?
Revised 1/22/12
AHS 360 - Unit 4
60
Column Charts - Excel Example 4.3*
 Open Excel Example 4.3
 Note that we now have the cumulative frequency
distribution as well
 The exercise here involves selecting the groups by the
cumulative frequency distribution and creating a column
chart
Revised 1/22/12
AHS 360 - Unit 4
61
Bar Charts - Excel Example 4.3*
 Select cells C5:C10 while holding your left mouse button
 Release mouse key and hit and hold the control button
 With the control button down, also hold the left mouse
button and select cells A4:A10
 Release all buttons and the appropriate cells should be
highlighted
Revised 1/22/12
AHS 360 - Unit 4
62
Bar Charts - Excel Example 4.3*
 Select “Insert” → “Bar” → 2-D Clustered Column
 Select (highlight) the chart → select “Chart Tools” (at the
very top) → select “2-D Clustered Bar”
Revised 1/22/12
AHS 360 - Unit 4
63
Bar Charts - Excel Example 4.3*
1. Note how the
axis titles on the
left is not correct.
Revised 1/22/12
2. With the chart selected, hit
select data, then edit axis labels.
AHS 360 - Unit 4
64
Bar Charts - Excel Example 4.3*
1. Select the
appropriate axis
labels (A7:A10)
and hit okay.
2. Now your axis labels for
your groups are correct.
Revised 1/22/12
AHS 360 - Unit 4
65
Bar Charts (Histograms)
Excel Example 4.3*
 Go to your chart layouts and select the appropriate chart type
 In this case, I selected Layout 9
 Add your chart title and axis titles, labeled appropriately
 Delete series labels on the right – they don’t add anything for
this chart.
 Add data labels to each bar (select bars and then right click,
then “Add Data Labels”
 Wholla! You have a great looking bar chart histogram!
Revised 1/22/12
AHS 360 - Unit 4
66
LINE CHARTS
Revised 1/22/12
AHS 360 - Unit 4
67
LINE CHARTS - Excel Example 4.4*
 Condenses information about two or more variables and
portray the interrelation among the variables of interest
 Show changes in the value of the variables over a period
(of time, often)
 Often useful when you are comparing TWO variables
 See Excel Example 4.4
Revised 1/22/12
AHS 360 - Unit 4
68
LINE CHARTS - Excel Example 4.4*
2. Select “Insert”
3. Select “Line”
4. Select 2-D Line (not stacked)
1. Select (Highlight) the
Data (B3:C14)
Revised 1/22/12
AHS 360 - Unit 4
69
LINE CHARTS - Excel Example 4.4*
 Select the chart and go to an appropriate line chart layout
(in this case, I chose Layout #8, which had a chart title
and axis titles)
 Change the axis and chart titles to appropriately match
the data
 Add data labels if you’d like
 Resize the chart as appropriate
Revised 1/22/12
AHS 360 - Unit 4
70
Pie Charts
 The Pie Chart is a
graphical tool that
helps us see what
part of the whole
each group forms
 Typically you don’t
want any more
than 10 “slices” (3 –
5 is more typical)
Revised 1/22/12
AHS 360 - Unit 4
71
PIE CHARTS - Excel Example 4.5*
 The area of the chart represents the total of the
phenomenon of interest
 Each slice represents the relative importance of each
category or class
 See Excel Example 2.5
Revised 1/22/12
AHS 360 - Unit 4
72
PIE CHARTS - Excel Example 4.5*
 Select the appropriate data (A4:B10)
 Select “Insert”
 Select “Pie Chart – 2-D Pie”
 Select chart layout as appropriate
 Labels the chart appropriately (title)
 Choose percent data labels if appropriate
Revised 1/22/12
AHS 360 - Unit 4
73
Excel Example 4.6 – “Pulling it all
together”*
 Now, let’s try another example using all of the previous tools





plus two additional tools
Review Example 4.6
Review “Raw Data” tab
What do we have?
 How many subjects do we have?
 What are the variables of interest?
Can you describe the data set as is?
So, we must use Excel to help us summarize, describe, and
graphically represent our data
Revised 1/22/12
AHS 360 - Unit 4
74
Excel Example 4.6*
 Open the data set in Excel (Example 4.6)
 We have 5,099 cases (subjects/patients)
 We have the weight (in pounds) and gender (1 = male, 2 =
female) for each subject
Revised 1/22/12
AHS 360 - Unit 4
75
Excel Example 4.6*
 Using “Sort”
 Using the “Descriptive Statistics” tool to help you
categorize your data
Revised 1/22/12
AHS 360 - Unit 4
76
Excel Example 4.6 – Sorting*
 Sometimes it may be helpful at first to sort your data to
see how your data looks when it is in order
 In this case, let’s sort our data by weight so we can see
values from low to high
Revised 1/22/12
AHS 360 - Unit 4
77
Excel Example 4.6 – Sorting*
1. Highlight your variables
(just select columns A, B,
and C)
Revised 1/22/12
2. Select
“Data”
3. Select the variable you wish to
sort by – in this case, weight –
and then select “okay”
AHS 360 - Unit 4
78
Excel Example 4.6 – Sorting*
 We can see that our subjects range in weight from 76 to 350
pounds
 But uh-oh, we have a problem: 16 of our subjects do not have
a weight
 We have several options, but the easy one (but not necessarily
sound statistics though) is to just delete those cases
 Why may this not be appropriate? When you delete some
cases, you are deleting part of your sample and this could
influence your overall results
 If you do this, just make sure you note it in your write-up
 So, let’s take the easy route and delete those cases
Revised 1/22/12
AHS 360 - Unit 4
79
Excel Example 4.6 – Descriptive Statistics*
 “Descriptive Statistics” tool
 Make sure you have added your Data Analysis toolpak
Revised 1/22/12
AHS 360 - Unit 4
80
Excel Example 4.6 – Descriptive Statistics*
1. Select
“Data”
2. Select “Data Analysis”
3. Select “Descriptive
Statistics”
Revised 1/22/12
AHS 360 - Unit 4
81
Excel Example 4.6 – Descriptive Statistics*
3. Select the “Output Range” – this can be anywhere, but generally a blank space on
your sheet – note that you need to hit the white box first or Excel will try to bump you to
the input range ; I used an output range of E20
1. Select “Input Range”
– B2:B5100 – the
weights of all the
subjects
2. Check the “Summary
statistics” box
Revised 1/22/12
AHS 360 - Unit 4
82
Excel Example 4.6 – Descriptive Statistics*
1. Note the Descriptive
Statistic table that pops
up
2. We can now see the
“Mean,” or the average
weight of the subject –
more on this later – the
average weight of the
subjects is ~ 157 pounds
3. We can also see the
minimum and maximum –
76 and 350
Revised 1/22/12
AHS 360 - Unit 4
83
Excel Example 4.6 – Frequency Distribution*
 Using the “Descriptive Statistics” table will make it easier to
determine the appropriate groups for our data
 With a minimum of 76 and a maximum of 350, let’s make our
groups like this
 50 – 99, 100 – 149, 150 – 199, 200 – 249, 250 – 299, and
300 – 350

Note that the last category is one higher than the others (350)
 That gives us six groups, which is reasonable, and it is
collectively exhaustive (includes everyone) and mutually
exclusive (no two subjects are going to fall into more than
one category)
Revised 1/22/12
AHS 360 - Unit 4
84
Excel Example 4.6 – Frequency Distribution*
 So, in the Excel Example 4.6, let’s create our bins and out
labels for each column – groups, relative frequencies, and
cumulative frequencies
 Now, let’s use Excel to create our frequency table
 Select “Data” → “Data Analysis” → “Histogram”
Revised 1/22/12
AHS 360 - Unit 4
85
Excel Example 4.6 – Frequency Distribution*
1. Select the input
range:
B2:B5100
2. Select the Bin
range: E4:E9
3. Select the output
range (anywhere on a
blank spot on the
page) – I chose
4. Check the
cumulative
percentage box
5. Click “OK”
Revised 1/22/12
AHS 360 - Unit 4
86
Excel Example 4.6 – Frequency Distribution*
 Now, paste your Histogram output into the appropriate places
in the table
 Calculate your relative frequencies for each group (see the
instructions in Excel Example 4.1)
 Create a column chart by frequency, a bar chart by relative
frequency, and a pie chart (any way you want)
 a line chart for this data would not be appropriate
 Label charts appropriately
 Vertical axis, horizontal axis, title, get rid of column label, add data
labels, etc.
 Now, describe your results in words your grandmother would
understand
Revised 1/22/12
AHS 360 - Unit 4
87
Ten Ways to a Great Figure
 Minimize the “junk”
 Plan before you start creating
 Say what you mean…mean what you say
 Label everything
 Communicate one idea
 Keep things balanced
 Maintain the scale in the graph
 Remember…simple is best
 Limit the number of words
 The chart alone should convey what you want to say
Revised 1/22/12
AHS 360 - Unit 4
88
Principles of Graphical Excellence*
 Well-Designed Presentation of Data that Provides:
 Substance
 Statistics
 Design
 Communicates Complex Ideas with Clarity, Precision and
Efficiency
 Gives the Largest Number of Ideas in the Most Efficient
Manner
 Almost Always Involves Several Dimensions
 Telling the Truth about the Data
Revised 1/22/12
AHS 360 - Unit 4
89
Errors in Presenting Data*
 Using ‘Chart Junk’
 No Relative Basis in Comparing Data between Groups
 Compressing the Vertical Axis
 No Zero Point on the Vertical Axis
Revised 1/22/12
AHS 360 - Unit 4
90
‘Chart Junk’*
Bad Presentation

Good Presentation
Minimum Wage
1960: $1.00
Minimum Wage
4
$
1970: $1.60
2
1980: $3.10
0
1990: $3.80
1960
1970
1980
1990
In the depiction on the left, you can’t really tell the detailed significance between the
groups. The sizes can be deceptive.
Revised 1/22/12
AHS 360 - Unit 4
91
No Relative Basis*
Bad Presentation

A’s received by
students.
Freq.
300
200
30 %

10
0

FR
SO
JR
SR
Good Presentation
A’s received by
students.

FR
SO
JR
SR
FR = Freshmen, SO = Sophomore, JR = Junior, SR = Senior
In this case, there are many more freshmen than juniors or seniors so the larger
number of A’s for freshmen deceptive. When put into relative frequencies, the
percentages are the same.
Revised 1/22/12
AHS 360 - Unit 4
92
Compressing Vertical Axis*
Bad Presentation
Good Presentation
Quarterly Sales
200
$
50
100
25
0
0
Q1
Q2
Q3 Q4
Quarterly Sales
$
Q1
Q2
Q3
Q4
When the vertical axis is compressed, it is more difficult to see the differences
between the groups.
Revised 1/22/12
AHS 360 - Unit 4
93
No Zero Point on Vertical Axis*
Bad Presentation

Good Presentation
Monthly Sales
45
Monthly Sales
$
42
39
45
42
39
$
36
36
J
F
0
M A M J
J
F
M A M J
Graphing the first six months of sales.
Revised 1/22/12
AHS 360 - Unit 4
94
ARTICLE REVIEW
 DesRoches “Electronic Health Record”
 DesRoches, C., Campbell, E., Vogeli, C., Zheng, J., Rao, S., Shi
elds, A., Donelan, K., Rosenbaum, S., Bristol, S., & Jha, A. (20
10). Electronic Health Records' Limited Successes Suggest
More Targeted Uses. Health Affairs, 29(4), 639-46.
Revised 1/22/12
AHS 360 - Unit 4
95
UNIT SUMMARY
 Frequencies
 Distributions
 Basic terms
 Using Excel
 Creating graphs
* NOTE: This slide contains information that is not in the chapter.
Revised 1/22/12
AHS 360 - Unit 4
96
Homework: Unit 4 Assignment
 Question 1: combine what you’ve learned from Excel Examples 4.1 and 4. 6
(see the Excel files in the Unit 4 Material)
 BIN values are 3, 7, and 11 (the upper limits – these will be placed
down the left column; this looks a bit different in the back of the book
in the answer section)
 Use the Excel formulas and instructions from the slides and the Excel
examples
 DO NOT simply insert numbers or use a calculator; use the formulas
we’ve learned in class
 For Question 2, label each axis thoroughly and label each piece of
the pie in detail. Someone should be able to look at each chart and
be able to immediately tell you the full story about it.
 Questions 3 & 4: written answers only
Revised 1/22/12
AHS 360 - Unit 4
97
Practice, Practice, Practice
 Don’t go crazy yet!
 Review and practice all of the exercises we’ve gone over
in class.
 Call or e-mail me with any questions.
Revised 1/22/12
AHS 360 - Unit 4
98
NEXT WEEK
 Unit 5: Computing Correlation Coefficients &
Hypotheticals
 Review Unit 4, Chap. 4
 Deliverables Due Before the Next Class:
 Unit 4 Quiz

Reidpath, D. D., & Allotey, P. (2003). Infant mortality rate as an
indicator of population health. Journal of Epidemiology and
Community Health, 57(5), 344-346.
 Unit 4 Assignment
* NOTE: This slide contains information that is not in the chapter.
Revised 1/22/12
AHS 360 - Unit 4
99