QBM117 Business Statistics Descriptive Statistics

Objectives
• To distinguish between a variable and data
• To distinguish between quantitative and qualitative data
• To discuss the different levels of measurement
• To summarise quantitative data using frequency distributions and histograms
• To learn how to produce a histogram in Excel
Introduction
• Managers, economists and business analysts frequently have access to large masses of potentially useful data.
• Before the data can be used to support a decision (inferential statistics), they must be organised and summarised (descriptive statistics).
Descriptive Statistics
• Descriptive statistics involves collecting, organising, summarising and presenting numerical data.
• Once the data are collected and organised, they need to be summarised and presented in such a way that the important features of the data are highlighted.
• Descriptive statistics methods can be applied to data from an entire population and to data from a sample.
Variables and Data
• A variable is any characteristic of a population or sample that is of interest to us.
• The term data refers to the actual values of variables.
Example 1
Information concerning a magazine’s readership is of
interest to both the publisher and to the magazine’s
advertisers. A survey of 100 subscribers included the
following questions:
What is your age?
What is your sex?
What is your marital status?
What is your annual income?
What are the variables?
The variables are age, sex, marital status and annual
income.
What are the data?
The data are the actual values of the variables:
- for the age variable, the data are the actual ages of the 100 subscribers sampled, e.g. 34 years;
- for the sex variable, the data are the sexes of the 100 subscribers sampled, e.g. Male or Female.
Types of Data
• Data may be either quantitative (numerical) or qualitative (categorical).
• Quantitative data are numerical observations.
• Qualitative data are categorical observations.
Example 1 revisited
Consider again the survey of 100 magazine subscribers in Example 1. For each of its four questions, determine the data type of the possible responses.
What is your age?
quantitative
What is your sex?
qualitative
What is your marital status?
qualitative
What is your annual income?
quantitative
Levels of Measurement
• Data can also be described in terms of the level of measurement attained.
• All data are generated by one of four scales of measurement:
- nominal
- ordinal
- interval
- ratio
Levels of Measurement of Qualitative Data
• Qualitative data are considered to be measured on a nominal scale or an ordinal scale.
• A nominal scale classifies data into distinct categories in which no ordering is implied.
• An ordinal scale classifies data into distinct categories in which ordering is implied.
Example 2
For each of the following examples of qualitative
data, determine the level of measurement.
1. Type of stocks owned (Growth, Income,
Technology, Other, None)
Nominal
2. Product satisfaction (Very unsatisfied,
Unsatisfied, Neutral, Satisfied, Very satisfied)
Ordinal
3. Student Grades (HD, DI, CR, PS, FL)
Ordinal
4. Personal Notebook (Compaq, Toshiba, IBM,
Apple, ACER, Other)
Nominal
5. Commodities (Gold, Oil, Aluminium, Copper,
Zinc, Wheat, Wool, Cotton, Sugar)
Nominal
6. Faculty rank (Professor, Associate Professor,
Senior Lecturer, Lecturer, Associate Lecturer)
Ordinal
Levels of Measurement of Quantitative Data
• Quantitative data are considered to be measured on an interval scale or a ratio scale.
• An interval scale is an ordered scale in which the difference between measurements is a meaningful quantity, but there is no true zero point.
• A ratio scale is an ordered scale in which the difference between measurements is a meaningful quantity and there is a true zero point, so ratios of measurements are also meaningful.
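A quick numerical check makes the distinction concrete. The sketch below converts two temperatures from Celsius to Fahrenheit and two heights from centimetres to inches; the particular values (10 and 20 degrees, 90 and 180 cm) are made up for illustration. Because the temperature scales have no true zero, the ratio of the two temperatures changes with the unit, whereas the ratio of the two heights does not.

```python
def celsius_to_fahrenheit(c: float) -> float:
    """Convert degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32


def cm_to_inches(cm: float) -> float:
    """Convert centimetres to inches (1 inch = 2.54 cm)."""
    return cm / 2.54


# Interval scale: the ratio depends on the unit, so "twice as hot" has no meaning.
temp_ratio_c = 20 / 10
temp_ratio_f = celsius_to_fahrenheit(20) / celsius_to_fahrenheit(10)

# Ratio scale: the ratio is the same in any unit, so "twice as tall" is meaningful.
height_ratio_cm = 180 / 90
height_ratio_in = cm_to_inches(180) / cm_to_inches(90)

print(f"Temperature ratio: {temp_ratio_c:.2f} in Celsius, {temp_ratio_f:.2f} in Fahrenheit")
print(f"Height ratio: {height_ratio_cm:.2f} in cm, {height_ratio_in:.2f} in inches")
```

The temperatures give a ratio of 2.00 in Celsius but 1.36 in Fahrenheit (68/50), while the heights give 2.00 in both units.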
Example 3
For each of the following examples of quantitative
data, determine the level of measurement.
1. Temperature (degrees Celsius or Fahrenheit)
Interval
2. Height (centimeters or inches)
Ratio
3. Calendar Years
Interval
4. Annual income
Ratio
Example 4
For each of the following examples of data,
determine the data type and the level of
measurement.
1. Name of Internet provider
qualitative, nominal
2. Monthly charge for Internet service
quantitative, ratio
3. Amount of time spent on the Internet per week
quantitative, ratio
4. Primary purpose for using the Internet
qualitative, nominal
5. Number of emails received per week
quantitative, ratio
6. Number of on-line purchases made in a month
quantitative, ratio
7. Total amount spent on on-line purchases in a month
quantitative, ratio
8. Whether the personal computer has a rewritable CD drive
qualitative, nominal
Graphical and Tabular Methods for Quantitative Data
• The best way to examine large amounts of data is to present them in summary form by constructing appropriate tables and graphs.
• We can then extract the important features of the data from these tables and graphs.
• Often, the first step towards summarising a mass of numbers is to form what is known as a frequency distribution.
Frequency Distribution
• A frequency distribution is a tabular summary of a set of data showing the number (frequency) of observations in each of several non-overlapping classes.
• When constructing a frequency distribution you need to
- select an appropriate number of classes
- select an appropriate width for each class
- make sure that the classes are non-overlapping and contain all observations
The following table is a guide to the appropriate number of classes for different numbers of observations.

Number of observations    Number of classes
Less than 50              5-7
50-200                    7-9
200-500                   9-10
500-1000                  10-11
1000-5000                 11-13
5000-50000                13-17
More than 50000           17-20
• An alternative rough guide to selecting the appropriate number of classes K required to accommodate n observations is given by Sturges' formula:
$$K = 1 + 3.3\log_{10} n$$
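As a quick check of Sturges' formula, the minimal Python sketch below evaluates K for a few sample sizes. Rounding K up with `math.ceil` is an assumption made here for illustration; K is only a rough guide and is normally adjusted to give convenient classes.

```python
import math


def sturges_classes(n: int) -> float:
    """Rough number of classes K for n observations: K = 1 + 3.3 * log10(n)."""
    if n < 1:
        raise ValueError("n must be a positive number of observations")
    return 1 + 3.3 * math.log10(n)


for n in (25, 100, 1000):
    k = sturges_classes(n)
    # Round up to a whole number of classes (an illustrative choice).
    print(f"n = {n:5d}: K = {k:.2f}, use about {math.ceil(k)} classes")
```

For n = 100 and n = 1000 this suggests 8 and 11 classes, which agrees with the ranges in the table above.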
• Once the number of classes to be used has been chosen, the approximate class width is calculated using the following formula:
$$\text{Class width} = \frac{\text{largest value} - \text{smallest value}}{\text{number of classes}}$$
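The class width formula translates directly into code. This sketch applies it to the 25 daily returns used in Example 5 with 6 classes; the function name `approx_class_width` is hypothetical, chosen only for illustration.

```python
def approx_class_width(data: list[float], num_classes: int) -> float:
    """Approximate class width = (largest value - smallest value) / number of classes."""
    if num_classes < 1:
        raise ValueError("num_classes must be at least 1")
    return (max(data) - min(data)) / num_classes


# Items returned over the last 25 days (Example 5 data).
returns = [21, 18, 8, 19, 17, 14, 22, 17, 19, 11, 6, 12, 21,
           16, 25, 16, 19, 10, 9, 29, 24, 6, 21, 20, 25]

width = approx_class_width(returns, 6)
print(f"Approximate class width: {width:.2f}")  # (29 - 6) / 6
```

The result, 3.83, is only a starting point; the width actually used is rounded to a convenient value (5 in Example 5).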
• The class width chosen should allow for convenient and easy reading.
• You need to ensure that the classes do not overlap and that each observation is contained in a class.
• The classes should then be listed in a column.
• You then need to count the number of observations that fall into each class interval.
• The counts (frequencies) are then listed next to their respective classes.
Example 5
Exercise 2.41 page 50 of text
The numbers of items returned to a leading Brisbane retailer by its customers, recorded for the last 25 days, are as follows:
21  18   8  19  17
14  22  17  19  11
 6  12  21  16  25
16  19  10   9  29
24   6  21  20  25
Construct a frequency distribution for these data.
There are n = 25 observations.
The table suggests that 5-7 classes would be appropriate.
A rough guide to an appropriate number of classes is given by Sturges' formula:
$$K = 1 + 3.3\log_{10} 25 = 5.61 \text{ (2 d.p.)}$$
Rounding up gives 6 classes. The largest value is 29 and the smallest is 6, so
$$\text{Approximate class width} = \frac{29 - 6}{6} = 3.83 \text{ (2 d.p.)}$$
Round this up to 5, as a class width of 5 is easy and convenient to read.
Now we need to choose non-overlapping intervals of width 5 so that each observation falls into exactly one interval.
Number of items               Tally         Frequency
>5 up to and including 10     IIII/         5
>10 up to and including 15    III           3
>15 up to and including 20    IIII/ IIII    9
>20 up to and including 25    IIII/ II      7
>25 up to and including 30    I             1
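The tallying step can be automated. The minimal sketch below counts observations into classes that exclude the lower limit and include the upper limit, matching the ">5 up to and including 10" convention of these notes; `frequency_distribution` is a hypothetical helper name, and the class containing each value is located with the standard-library `bisect_left`.

```python
from bisect import bisect_left


def frequency_distribution(data, lower, width, num_classes):
    """Count observations in the classes (lower, lower+width], (lower+width, lower+2*width], ...

    Each class excludes its lower limit and includes its upper limit.
    Returns the list of upper class limits and the list of frequencies.
    """
    uppers = [lower + width * (i + 1) for i in range(num_classes)]
    counts = [0] * num_classes
    for x in data:
        if not lower < x <= uppers[-1]:
            raise ValueError(f"{x} lies outside ({lower}, {uppers[-1]}]")
        # bisect_left gives the index of the first upper limit >= x,
        # which is the class that contains x.
        counts[bisect_left(uppers, x)] += 1
    return uppers, counts


returns = [21, 18, 8, 19, 17, 14, 22, 17, 19, 11, 6, 12, 21,
           16, 25, 16, 19, 10, 9, 29, 24, 6, 21, 20, 25]

WIDTH = 5
uppers, counts = frequency_distribution(returns, lower=5, width=WIDTH, num_classes=5)
for upper, count in zip(uppers, counts):
    print(f">{upper - WIDTH} up to and including {upper}: {count}")
print("Total:", sum(counts))
```

The counts reproduce the table above (5, 3, 9, 7, 1), and the total confirms that every one of the 25 observations falls into exactly one class.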
Histograms
• The information in a frequency distribution is often grasped more easily if the distribution is graphed.
• The most common graphical technique used for representing a frequency distribution for quantitative data is the frequency histogram.
Frequency Histograms
• A frequency histogram is constructed by placing the variable of interest on the horizontal axis and the frequency on the vertical axis.
• The frequency of each class is shown by drawing a rectangle whose base is the class interval on the horizontal axis and whose height is the corresponding frequency.
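A plotting package is the usual tool for drawing histograms. As a dependency-free sketch, the Python below prints a sideways text histogram of the Example 5 frequency distribution. The layout is an illustrative assumption: classes run down the page, written in interval notation ( 5, 10] for ">5 up to and including 10", and each bar's length equals the class frequency, so the axes are swapped relative to the histograms in these notes.

```python
def text_histogram(uppers, counts, width, symbol="#"):
    """Return one line per class: the class interval, a bar of length = frequency, the frequency."""
    lines = []
    for upper, count in zip(uppers, counts):
        label = f"({upper - width:>2}, {upper:>2}]"
        lines.append(f"{label} {symbol * count} {count}")
    return lines


# Frequency distribution from Example 5: classes >5 up to and including 10, ..., >25 up to and including 30.
for line in text_histogram([10, 15, 20, 25, 30], [5, 3, 9, 7, 1], width=5):
    print(line)
```

Because adjacent classes share a boundary, the bars sit directly next to each other, just as the rectangles of a frequency histogram touch with no gaps.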
Example 5 revisited
Recall the data from Example 5 on the number of items returned to a leading Brisbane retailer by its customers over the last 25 days.
Construct a frequency histogram for these data.
[Figure: Histogram of the Number of Items Returned by Customers. Horizontal axis: Number of Items Returned by Customers; vertical axis: Frequency.]
Relative Frequency Histograms
• Instead of showing the absolute frequency of observations in each class, it is often preferable to show the proportion of observations falling into each class.
• To do this we replace the class frequency by the class relative frequency, which is calculated as follows:
$$\text{Class relative frequency} = \frac{\text{class frequency}}{\text{total number of observations}}$$
• We start by forming a relative frequency distribution.
• The frequencies in the frequency distribution are replaced by the relative frequencies.
• We then construct a relative frequency histogram.
• The relative frequency histogram is constructed by placing the relative frequency on the vertical axis (in place of the frequency).
Example 5 revisited
We use the same 25 days of returns data from Example 5.
Construct a relative frequency distribution for these data.

Number of items               Frequency    Relative frequency
>5 up to and including 10     5            0.20
>10 up to and including 15    3            0.12
>15 up to and including 20    9            0.36
>20 up to and including 25    7            0.28
>25 up to and including 30    1            0.04
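The relative frequencies follow from dividing each class frequency by the total number of observations. This short sketch reproduces the column above from the Example 5 frequencies and checks, using exact `fractions.Fraction` arithmetic, that the relative frequencies sum to 1.

```python
from fractions import Fraction

counts = [5, 3, 9, 7, 1]   # class frequencies from Example 5
n = sum(counts)            # total number of observations

# Class relative frequency = class frequency / total number of observations.
relative = [c / n for c in counts]
print([f"{r:.2f}" for r in relative])

# The relative frequencies always sum to 1 (checked exactly with fractions).
assert sum(Fraction(c, n) for c in counts) == 1
```

This prints ['0.20', '0.12', '0.36', '0.28', '0.04'], matching the table.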
Construct a relative frequency histogram for these
data.
[Figure: Relative Frequency Histogram of the Number of Items Returned by Customers. Horizontal axis: Number of Items Returned by Customers; vertical axis: Relative Frequency.]
Shapes of Histograms
• The purpose of drawing histograms is to acquire information.
• We describe the shape of a histogram on the basis of the following four characteristics:
- symmetry
- skewness
- number of modes
- bell-shaped
Symmetry
• A histogram is said to be symmetric if, when we draw a vertical line down the centre of the histogram, the two sides are identical in shape and size.
Skewness
• A histogram with a long tail extending to the right is positively skewed.
• A histogram with a long tail extending to the left is negatively skewed.
Number of Modes
• A unimodal histogram is one with a single peak.
• A bimodal histogram is one with two peaks.
• A multimodal histogram is one with several peaks.
Bell-shaped
• A special type of symmetric unimodal histogram is one that is bell-shaped.
• You will discover the importance of this in the next topic.
Cumulative Frequency Distribution
• A variation of the frequency distribution that provides another tabular summary of quantitative data is the cumulative frequency distribution.
• The cumulative frequency distribution contains the same number of classes as the frequency distribution.
• However, the cumulative frequency distribution shows the number of observations less than or equal to the upper class limit of each class.
Cumulative Relative Frequency Distribution
• The cumulative relative frequency distribution shows the proportion of observations with values less than or equal to the upper limit of each class.
• The cumulative relative frequency distribution can be computed either by summing the relative frequencies in the relative frequency distribution, or by dividing the cumulative frequencies by the total number of observations.
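The two routes to the cumulative relative frequencies can be compared directly. This sketch uses the Example 5 class frequencies and the standard-library `itertools.accumulate` for the running totals. It works with `fractions.Fraction`, a design choice made here (not something Excel does), so that the comparison is exact rather than subject to floating-point rounding.

```python
from fractions import Fraction
from itertools import accumulate

counts = [5, 3, 9, 7, 1]   # class frequencies from Example 5
n = sum(counts)

# Method 1: add up the relative frequencies class by class.
relative = [Fraction(c, n) for c in counts]
cum_rel_1 = list(accumulate(relative))

# Method 2: form the cumulative frequencies first, then divide by n.
cum_freq = list(accumulate(counts))
cum_rel_2 = [Fraction(cf, n) for cf in cum_freq]

assert cum_rel_1 == cum_rel_2  # both methods agree exactly
print("Cumulative frequencies:", cum_freq)
print("Cumulative relative frequencies:", [f"{float(r):.2f}" for r in cum_rel_2])
```

The cumulative frequencies are 5, 8, 17, 24 and 25, giving cumulative relative frequencies 0.20, 0.32, 0.68, 0.96 and 1.00.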
Ogives
• A graph of the cumulative relative frequency is called an ogive.
• The cumulative relative frequency of each class is plotted above the upper limit of the corresponding class, and the points representing the cumulative relative frequencies are then joined by straight lines.
• The ogive is closed at the lower end by extending a straight line to the lower limit of the first class, where the cumulative relative frequency is 0.
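The points of an ogive can be listed before any plotting is done. The sketch below, using a hypothetical helper `ogive_points` and assuming equal-width classes, starts at the lower limit of the first class with cumulative relative frequency 0 and then places one point above each upper class limit; joining consecutive points with straight lines gives the ogive.

```python
from itertools import accumulate


def ogive_points(lower, width, counts):
    """Return the (x, y) points of an ogive for equal-width classes.

    The first point closes the ogive at the lower limit of the first class
    (cumulative relative frequency 0); each later point sits above a class's
    upper limit at that class's cumulative relative frequency.
    """
    n = sum(counts)
    points = [(lower, 0.0)]
    for i, cum in enumerate(accumulate(counts), start=1):
        points.append((lower + i * width, cum / n))
    return points


# Example 5: first class >5 up to and including 10, class width 5.
for x, y in ogive_points(lower=5, width=5, counts=[5, 3, 9, 7, 1]):
    print(f"({x}, {y:.2f})")
```

For the Example 5 data the points are (5, 0.00), (10, 0.20), (15, 0.32), (20, 0.68), (25, 0.96) and (30, 1.00).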
Example 5 revisited
Return once more to the Example 5 data on items returned by customers over the last 25 days.
Construct a cumulative relative frequency distribution for these data.

Number of items               Relative frequency    Cumulative relative frequency
>5 up to and including 10     0.20                  0.20
>10 up to and including 15    0.12                  0.32
>15 up to and including 20    0.36                  0.68
>20 up to and including 25    0.28                  0.96
>25 up to and including 30    0.04                  1.00
Construct an ogive for these data.
[Figure: Ogive of the Number of Items Returned by Customers. Horizontal axis: Number of Items Returned by Customers; vertical axis: Cumulative Relative Frequency.]
Histograms for Large Data Sets
• We have constructed a frequency distribution and histogram for a small data set by hand.
• We are now going to construct a frequency distribution and histogram for a large data set.
• Doing this by hand would be very time consuming.
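Excel
• There are many computer software packages available which make dealing with large data sets quite manageable.
• We will use Excel rather than a statistical package, as most students are familiar with Excel.
• However, some of the things Excel does are not "statistically" correct; for example, its histograms have gaps between the columns and label each column with its upper class limit placed at the centre of the column.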
Defining Class Intervals
• Note that the method we use to define class intervals for frequency distributions is slightly different from the method described in the text.
• On page 20 of the text (page 19 of the abridged version) the class intervals for the frequency distribution for Example 2.1 are
- 0 up to but not including 15
- 15 up to but not including 30
- and so on
• Using our method the class intervals would be
- >0 up to and including 15
- >15 up to and including 30
- and so on
• We use this method as it is consistent with the method of defining intervals used by Excel.
• This way, manually prepared frequency distributions will be the same as frequency distributions prepared using Excel.
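The two conventions only disagree about observations lying exactly on a class boundary. The sketch below classifies a few values under each convention using the standard-library `bisect` functions; the boundary 45 and the test values 14, 15 and 16 are assumptions made for illustration, since the text's list continues only as "and so on".

```python
from bisect import bisect_left, bisect_right

# Class boundaries 0, 15, 30, ... as in the intervals listed above (45 assumed).
BOUNDS = [0, 15, 30, 45]


def text_class(x):
    """Text's convention: 'a up to but not including b', i.e. [a, b)."""
    if not BOUNDS[0] <= x < BOUNDS[-1]:
        raise ValueError(f"{x} is outside the classes")
    i = bisect_right(BOUNDS, x) - 1
    return f"{BOUNDS[i]} up to but not including {BOUNDS[i + 1]}"


def our_class(x):
    """Our convention (the one these notes use for Excel): '>a up to and including b', i.e. (a, b]."""
    if not BOUNDS[0] < x <= BOUNDS[-1]:
        raise ValueError(f"{x} is outside the classes")
    i = bisect_left(BOUNDS, x) - 1
    return f">{BOUNDS[i]} up to and including {BOUNDS[i + 1]}"


for x in (14, 15, 16):
    print(f"{x:>3}: text -> {text_class(x)}; ours -> {our_class(x)}")
```

Only the value 15 lands in different classes: the text's method puts it in the second class, while our method puts it in the first.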
Histograms in Excel
• There are instructions on how to produce a histogram in Excel on page 23 of the text (page 21 of the abridged version).
• We will modify some of these instructions.
• Detailed instructions will be given in Tutorial 1.
• The histogram produced by Excel needs some editing.
• Excel produces histograms with gaps between the columns.
• We need to remove these gaps.
• We need to change the horizontal axis label.
• We need to remove the legend.
• And we need to add an appropriate title to the plot.
• Excel allows you to specify the upper limits of the intervals.
• However, when it creates the histogram, it puts the upper limit in the centre of the interval.
• The upper limit should be at the extreme right of the interval.
• We will use the Chart Wizard to edit the histogram produced by Excel.
• As Excel places the upper limit in the middle of the column, we will determine the midpoint of each class and use the Chart Wizard to plot these values instead of the upper limits.
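For equal-width classes, each class midpoint is the upper limit minus half the class width. The sketch below assumes upper limits 30, 40, ..., 110 with width 10, which is consistent with the axis labels in the salary charts of the example that follows; `class_midpoints` is a hypothetical helper name.

```python
def class_midpoints(upper_limits, width):
    """Midpoint of each class = upper limit - width / 2 (equal-width classes assumed)."""
    return [upper - width / 2 for upper in upper_limits]


# Assumed upper limits 30, 40, ..., 110 (in $000's) with class width 10.
uppers = list(range(30, 111, 10))
print(class_midpoints(uppers, 10))
```

These midpoints, 25 to 105, are the values plotted on the horizontal axis in place of the upper limits.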
Example 2.1 from text
• We are going to produce a histogram of the salary data from Exercise 2.5 from the text.
• The data are stored in the file XR02-46.
[Figure: Histogram from Excel (unedited). Horizontal axis: upper limit; vertical axis: Frequency; legend: Frequency.]
[Figure: Edited Histogram, titled "Histogram of Annual Salaries of University Academics". Horizontal axis: Salary ($000's); vertical axis: Frequency.]
Reading for next lecture
• Chapter 2 Section 2.5
• Chapter 3 Sections 3.1-3.2
Exercises
• 2.3
• 2.9: omit part a and revise parts b and c to read “…>20 as the lower limit…”