WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam
WFM 5201: Data Management and
Statistical Analysis
Lecture-1: Descriptive Statistics
[Measures of central tendency]
Akm Saiful Islam
Institute of Water and Flood Management (IWFM)
Bangladesh University of Engineering and Technology (BUET)
April, 2008
Descriptive Statistics
Measures of Central Tendency
Measures of Location
Measures of Dispersion
Measures of Symmetry
Measures of Peakedness
Measures of Central Tendency
The central tendency is measured by
averages. These describe the point about
which the various observed values cluster.
In mathematics, an average, or central
tendency of a data set refers to a
measure of the "middle" or "expected"
value of the data set.
Measures of Central Tendency
Arithmetic Mean
Geometric Mean
Weighted Mean
Harmonic Mean
Median
Mode
Arithmetic Mean
The arithmetic mean is the sum of a set of
observations, positive, negative or zero,
divided by the number of observations. If
we have $n$ real numbers $x_1, x_2, x_3, \ldots, x_n$,
their arithmetic mean, denoted by $\bar{x}$, can
be expressed as:
\[ \bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} = \frac{1}{n} \sum_{i=1}^{n} x_i \]
Arithmetic Mean of Grouped Data
If $z_1, z_2, z_3, \ldots, z_k$ are the mid-values and
$f_1, f_2, f_3, \ldots, f_k$ are the corresponding
frequencies, where the subscript $k$ stands
for the number of classes, then the mean
is
\[ \bar{z} = \frac{\sum_{i=1}^{k} f_i z_i}{\sum_{i=1}^{k} f_i} \]
Geometric Mean
The geometric mean is defined as the positive $n$-th root of the
product of $n$ positive observations. Symbolically,
\[ G = (x_1 x_2 x_3 \cdots x_n)^{1/n} \]
It is often used for a set of numbers whose values
are meant to be multiplied together or are exponential in
nature, such as data on the growth of the human
population or interest rates of a financial investment.
Exercise: find the geometric mean of the rates of growth 34, 27, 45, 55,
22, 34.
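The exercise above can be checked with a minimal Python sketch. It assumes the six figures are averaged directly as positive values, since the slide does not say whether they are percentages or growth factors. The function name `geometric_mean` is illustrative, not part of the lecture.

```python
import math


def geometric_mean(values):
    """Return the positive n-th root of the product of positive values."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs a non-empty list of positive values")
    # Averaging the logarithms is equivalent to taking the n-th root of the
    # product, and avoids overflow when the product is very large.
    return math.exp(sum(math.log(v) for v in values) / len(values))


growth_rates = [34, 27, 45, 55, 22, 34]  # values from the exercise
g = geometric_mean(growth_rates)
print(round(g, 2))  # roughly 34.5
```

The result is slightly below the arithmetic mean of the same six values (about 36.2), as expected for positive data that are not all equal.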
Geometric Mean of Grouped Data
If the $n$ non-zero and positive variate values $x_1, x_2, \ldots, x_n$ occur $f_1, f_2, \ldots, f_n$ times,
respectively, then the geometric mean of
the set of observations is defined by:
\[ G = \left( x_1^{f_1} x_2^{f_2} \cdots x_n^{f_n} \right)^{1/N} = \left( \prod_{i=1}^{n} x_i^{f_i} \right)^{1/N} \]
where
\[ N = \sum_{i=1}^{n} f_i \]
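Geometric Mean (Revised Eqn.)
Ungrouped Data
\[ G = (x_1 x_2 x_3 \cdots x_n)^{1/n} = \mathrm{AntiLog}\left( \frac{1}{n} \sum_{i=1}^{n} \mathrm{Log}\, x_i \right) \]
Grouped Data
\[ G = \left( x_1^{f_1} x_2^{f_2} x_3^{f_3} \cdots x_n^{f_n} \right)^{1/N} = \mathrm{AntiLog}\left( \frac{1}{N} \sum_{i=1}^{n} f_i \, \mathrm{Log}\, x_i \right), \quad N = \sum_{i=1}^{n} f_i \]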
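The logarithmic form follows from the root form because the logarithm of a product is the sum of the logarithms. A short derivation for the ungrouped case, using the same symbols, might read:

```latex
\begin{aligned}
G &= (x_1 x_2 \cdots x_n)^{1/n} \\
\log G &= \frac{1}{n} \left( \log x_1 + \log x_2 + \cdots + \log x_n \right)
        = \frac{1}{n} \sum_{i=1}^{n} \log x_i \\
G &= \operatorname{antilog}\left( \frac{1}{n} \sum_{i=1}^{n} \log x_i \right)
\end{aligned}
```

The grouped case is the same argument with each $\log x_i$ counted $f_i$ times and $n$ replaced by $N = \sum f_i$. Any logarithm base can be used, provided the antilog uses the same base.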
Harmonic Mean
The harmonic mean (formerly sometimes called
the subcontrary mean) is one of several
kinds of average.
Typically, it is appropriate for situations when
the average of rates is desired. The harmonic
mean is the number of observations divided by the
sum of the reciprocals of the observations. It is useful
for ratios such as speed (= distance/time).
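To illustrate why the harmonic mean suits rates, the sketch below uses made-up numbers that do not come from the lecture: a vehicle covers two equal distances, one at 40 km/h and one at 60 km/h. The function name `harmonic_mean` is illustrative.

```python
def harmonic_mean(values):
    """Return the number of values divided by the sum of their reciprocals."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("harmonic mean needs a non-empty list of positive values")
    return len(values) / sum(1 / v for v in values)


# Hypothetical speeds over two equal distances (km/h).
speeds_kmh = [40, 60]
average_speed = harmonic_mean(speeds_kmh)
arithmetic_average = sum(speeds_kmh) / len(speeds_kmh)

print(average_speed, arithmetic_average)  # about 48 versus 50
```

The harmonic mean (about 48 km/h) equals total distance divided by total time, whereas the arithmetic mean (50 km/h) overstates the average speed because more time is spent on the slower leg.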
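Harmonic Mean of Grouped Data
The harmonic mean $H$ of the positive real
numbers $x_1, x_2, \ldots, x_n$ is defined to be
Ungrouped Data
\[ H = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}} \]
Grouped Data
\[ H = \frac{N}{\sum_{i=1}^{n} \frac{f_i}{x_i}}, \quad N = \sum_{i=1}^{n} f_i \]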
Exercise-1: Find the Arithmetic,
Geometric and Harmonic Mean
Class    Frequency (f)   Mid-value (x)   fx        f Log x   f/x
20-29    3               24.5            73.5      4.17      0.122
30-39    5               34.5            172.5     7.69      0.145
40-49    20              44.5            890       32.97     0.449
50-59    10              54.5            545       17.37     0.183
60-69    5               64.5            322.5     9.05      0.078
Sum      N = 43                          2003.5    71.24     0.978
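The three means for this exercise follow from the column sums. A minimal Python sketch, assuming the mid-values and frequencies in the table and base-10 logarithms for the `f Log x` column:

```python
import math

# Class mid-values and frequencies from the Exercise-1 table.
mid_values = [24.5, 34.5, 44.5, 54.5, 64.5]
frequencies = [3, 5, 20, 10, 5]

N = sum(frequencies)  # total number of observations

# Arithmetic mean: sum(f * x) / N
sum_fx = sum(f * x for f, x in zip(frequencies, mid_values))
arithmetic_mean = sum_fx / N

# Geometric mean: AntiLog( sum(f * Log x) / N ), with base-10 logarithms
sum_f_log_x = sum(f * math.log10(x) for f, x in zip(frequencies, mid_values))
geometric_mean = 10 ** (sum_f_log_x / N)

# Harmonic mean: N / sum(f / x)
sum_f_over_x = sum(f / x for f, x in zip(frequencies, mid_values))
harmonic_mean = N / sum_f_over_x

print(round(arithmetic_mean, 2), round(geometric_mean, 2), round(harmonic_mean, 2))
```

The values come out near 46.6, 45.4 and 44.0, respectively. The ordering arithmetic ≥ geometric ≥ harmonic holds for any set of positive observations, which gives a quick sanity check on the arithmetic.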
Weighted Mean
The weighted mean of the positive real numbers
$x_1, x_2, \ldots, x_n$ with weights $w_1, w_2, \ldots, w_n$ is
defined to be
\[ \bar{x} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i} \]
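A small Python sketch of this definition follows. The values and weights are made up for illustration and do not come from the lecture, and the function name `weighted_mean` is hypothetical.

```python
def weighted_mean(values, weights):
    """Return sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for w, x in zip(weights, values)) / total_weight


# Hypothetical observations and their weights.
values = [70, 80, 90]
weights = [1, 2, 3]

result = weighted_mean(values, weights)
print(round(result, 2))  # (70*1 + 80*2 + 90*3) / 6
```

With equal weights the weighted mean reduces to the ordinary arithmetic mean, and with class frequencies as weights it reduces to the grouped-data mean.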
Median
The median is the middle value of the
observations arranged in order of magnitude, such that the number of
observations above it is equal to the
number of observations below it. Writing $X_k$ for the $k$-th ordered observation:
If $n$ is odd
\[ M_e = X_{\frac{n+1}{2}} \]
If $n$ is even
\[ M_e = \frac{1}{2} \left( X_{\frac{n}{2}} + X_{\frac{n}{2}+1} \right) \]
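The two cases can be sketched in Python as follows. The function name `median_of` is illustrative. List indices are 0-based, so the 1-based position $(n+1)/2$ becomes index `n // 2` when $n$ is odd.

```python
def median_of(values):
    """Return the median of a non-empty list of numbers."""
    if not values:
        raise ValueError("median needs at least one observation")
    ordered = sorted(values)  # the formulas assume ordered observations
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the single middle observation X_((n+1)/2).
        return ordered[mid]
    # Even n: the average of X_(n/2) and X_(n/2 + 1).
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_case = median_of([7, 1, 5])       # ordered: 1, 5, 7
even_case = median_of([4, 1, 3, 2])   # ordered: 1, 2, 3, 4
print(odd_case, even_case)
```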
Median of Grouped Data
\[ M_e = L_0 + \frac{h}{f_0} \left( \frac{N}{2} - F \right) \]
L0 = Lower class boundary of the median
class
h = Width of the median class
f0 = Frequency of the median class
F = Cumulative frequency of the class preceding the median class
N = Total number of observations
Steps to Find the Median of Grouped Data
1. Compute the less-than type cumulative frequencies.
2. Determine N/2, one-half of the total number of cases.
3. Locate the median class: the first class whose cumulative
frequency is at least N/2.
4. Determine the lower class boundary of the median class. This is L0.
5. Sum the frequencies of all classes prior to the median
class. This is F.
6. Determine the frequency of the median class. This is f0.
7. Determine the class width of the median class. This is h.
Example-3: Find Median
Age in years   Number of births   Cumulative number of births
14.5-19.5      677                677
19.5-24.5      1908               2585
24.5-29.5      1737               4322
29.5-34.5      1040               5362
34.5-39.5      294                5656
39.5-44.5      91                 5747
44.5-49.5      16                 5763
All ages       5763               -
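The grouped-median formula can be applied to this table with a short Python sketch. It assumes the class boundaries and birth counts shown in the table and a common class width of 5 years. The function name `grouped_median` is illustrative.

```python
def grouped_median(lower_boundaries, frequencies, width):
    """Median of grouped data: L0 + (h / f0) * (N / 2 - F)."""
    total = sum(frequencies)
    if total <= 0:
        raise ValueError("total frequency must be positive")
    half = total / 2
    cumulative = 0  # F: cumulative frequency before the current class
    for lower, freq in zip(lower_boundaries, frequencies):
        # The median class is the first class whose cumulative
        # frequency reaches N / 2.
        if cumulative + freq >= half:
            return lower + (width / freq) * (half - cumulative)
        cumulative += freq
    raise ValueError("median class not found")


# Lower class boundaries and numbers of births from the Example-3 table.
lower_boundaries = [14.5, 19.5, 24.5, 29.5, 34.5, 39.5, 44.5]
births = [677, 1908, 1737, 1040, 294, 91, 16]

median_age = grouped_median(lower_boundaries, births, width=5.0)
print(round(median_age, 2))  # about 25.35 years
```

Here N/2 = 2881.5 falls in the 24.5-29.5 class, so L0 = 24.5, F = 2585, f0 = 1737 and h = 5.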
Mode
Mode is the value of a distribution for which the
frequency is maximum. In other words, mode is
the value of a variable, which occurs with the
highest frequency.
For example, the mode of the list (1, 2, 2, 3, 3, 3, 4) is 3.
The mode is not necessarily unique. The
list (1, 2, 2, 3, 3, 5) has two modes, 2 and 3.
Example-2: Find Mean, Median
and Mode of Ungrouped Data
The weekly pocket money for 9 first year pupils
was found to be:
3, 12, 4, 6, 1, 4, 2, 5, 8
Mean = 5
Median = 4
Mode = 4
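These three results can be checked with Python's standard `statistics` module. This is a verification sketch, not part of the original lecture. `statistics.multimode` requires Python 3.8 or later.

```python
import statistics

# Weekly pocket money of the 9 pupils in Example-2.
pocket_money = [3, 12, 4, 6, 1, 4, 2, 5, 8]

mean_value = statistics.mean(pocket_money)      # sum 45 divided by 9
median_value = statistics.median(pocket_money)  # 5th of the 9 ordered values
mode_value = statistics.mode(pocket_money)      # most frequent value

print(mean_value, median_value, mode_value)

# A list with two equally frequent values has two modes.
two_modes = statistics.multimode([1, 2, 2, 3, 3, 5])
print(two_modes)
```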
Mode of Grouped Data
\[ M_0 = L_1 + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times h \]
L1 = Lower boundary of modal class
Δ1 = difference of frequency between
modal class and class before it
Δ2 = difference of frequency between
modal class and class after it
h = class interval
Steps of Finding Mode
Find the modal class, which is the class with the highest
frequency. Then determine:
L1 = Lower class boundary of modal class
h = Interval of modal class
Δ1 = difference of frequency of modal
class and class before modal class
Δ2 = difference of frequency of modal class and
class after modal class
Example-4: Find Mode
Slope angle (°)   Midpoint (x)   Frequency (f)   Midpoint × frequency (fx)
0-4               2              6               12
5-9               7              12              84
10-14             12             7               84
15-19             17             5               85
20-24             22             0               0
Total                            n = 30          ∑(fx) = 265
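The grouped-mode formula can be applied to this table with a short Python sketch. It rests on two assumptions that the slide does not state: the class limits 0-4, 5-9, ... are converted to class boundaries -0.5 to 4.5, 4.5 to 9.5, ... with a width of 5, and a missing neighbouring class is treated as having frequency 0. The function name `grouped_mode` is illustrative.

```python
def grouped_mode(lower_boundaries, frequencies, width):
    """Mode of grouped data: L1 + delta1 / (delta1 + delta2) * h."""
    # Modal class: the class with the highest frequency (first one on ties).
    modal = max(range(len(frequencies)), key=lambda i: frequencies[i])
    f_modal = frequencies[modal]
    # Assumption: a missing neighbour is treated as frequency 0.
    f_before = frequencies[modal - 1] if modal > 0 else 0
    f_after = frequencies[modal + 1] if modal < len(frequencies) - 1 else 0
    delta1 = f_modal - f_before
    delta2 = f_modal - f_after
    if delta1 + delta2 == 0:
        raise ValueError("mode is undefined when delta1 + delta2 is zero")
    return lower_boundaries[modal] + delta1 / (delta1 + delta2) * width


# Assumed class boundaries for the classes 0-4, 5-9, 10-14, 15-19, 20-24.
lower_boundaries = [-0.5, 4.5, 9.5, 14.5, 19.5]
frequencies = [6, 12, 7, 5, 0]

slope_mode = grouped_mode(lower_boundaries, frequencies, width=5)
print(round(slope_mode, 2))  # about 7.23 degrees
```

The modal class is 5-9 with frequency 12, so Δ1 = 12 - 6 = 6 and Δ2 = 12 - 7 = 5, giving 4.5 + (6/11) × 5. For comparison, the grouped mean from the same table is 265/30, or about 8.83 degrees.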