Transcript Document

STATISTIKA
CHAPTER 2
Summarizing and Graphing Data
2-2 Frequency Distributions
2-3 Histograms
2-4 Statistical Graphics
SULIDAR FITRI, M.Sc
FEBRUARY 20, 2013
Introduction To Statistics
Math 13
Essentials of
Statistics
3rd edition
by Mario F. Triola
Copyright © 2007 Pearson Education, Inc Publishing as Pearson Addison-Wesley.
Overview
Important Characteristics of Data
1. Center: An average value that
indicates where the middle of the
data set is located.
2. Variation: A measure of spread - the
amount by which the values vary
among themselves.
3. Distribution: The nature (shape) of
the distribution of data: bell-shaped, uniform, or skewed.
4. Outliers: Data values that lie very
far away from the vast majority of
other values.
5. Time: Changing characteristics of
the data values over time.
Section 2-2
Frequency Distributions
Definition
Frequency Distribution Table

lists data values (individually or by
groups) in one column, and their
corresponding frequencies (counts) in
the second column.


Frequencies are denoted F (for a population) or f (for a sample).
The sum of the frequencies in all classes must equal the
population size (or the sample size):
$\sum F = N$
$\sum f = n$
Frequency Distribution: Ages of Best Actresses
[Table: original data shown beside its frequency distribution]
Definitions
Lower Class Limit
is the smallest number that can actually belong to a class.
Upper Class Limit
is the largest number that can actually belong to a class.
Class Width
is the difference between two consecutive lower class limits (or two
consecutive upper class limits).
In Table 2-2, every class has a width of 10.
Class Boundaries
are the numbers used to separate classes: each class boundary falls in
the middle of the gap created by the class limits.
Class boundaries in Table 2-2: 20.5, 30.5, 40.5, 50.5, 60.5, 70.5, 80.5
Class Midpoints
are the numbers that fall in the middle of each class; a class midpoint
can be found as
$\text{class midpoint} = \dfrac{\text{lower class limit} + \text{upper class limit}}{2}$
Class midpoints in the example table: 25.5, 35.5, 45.5, 55.5, 65.5, 75.5
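As a quick check of the three definitions above, the sketch below recomputes the class width, class boundaries, and class midpoints. The class limits 21 – 30 through 71 – 80 are an assumption inferred from the boundaries and midpoints listed on the slides, and the variable names are illustrative only.

```python
# Class limits assumed from the slide values: 21-30, 31-40, ..., 71-80.
limits = [(21 + 10 * i, 30 + 10 * i) for i in range(6)]

# Class width: difference between two consecutive lower class limits.
width = limits[1][0] - limits[0][0]

# Gap between one class's upper limit and the next class's lower limit.
gap = limits[1][0] - limits[0][1]

# Boundaries sit in the middle of each gap, plus one at each outer end.
boundaries = [limits[0][0] - gap / 2] + [high + gap / 2 for _, high in limits]

# Midpoint of each class: (lower limit + upper limit) / 2.
midpoints = [(low + high) / 2 for low, high in limits]

print(width)       # 10
print(boundaries)  # [20.5, 30.5, 40.5, 50.5, 60.5, 70.5, 80.5]
print(midpoints)   # [25.5, 35.5, 45.5, 55.5, 65.5, 75.5]
```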
Reasons for Constructing
Frequency Distributions
1. Large data sets can be summarized.
2. We can gain some insight into the
nature of data.
3. We have a basis for constructing
important graphs.
Constructing A Frequency Distribution
1. Decide on the number of classes (best: between 5 and 20).
2. Calculate the class width (round up):
$\text{class width} \approx \dfrac{(\text{maximum data value}) - (\text{minimum data value})}{\text{number of classes}}$
3. Choose the starting point, which will be the lower limit of the
first class.
4. List the lower class limits by adding the calculated class width
to the lower limit of the first class.
5. List the upper class limits, which should be one less than the
next lower class limit.
6. Enter a count (frequency) of data values in each class.
Grouping with Equal Class Widths Ex. 1
Given the ages of the male Oscar award recipients, construct a frequency
distribution table with 6 classes:
(1) Smallest data value = 29
Largest data value = 76
(2) Class width = $\dfrac{76 - 29}{6} \approx 7.8$, rounded up to 8
(3) Let’s choose the starting point to
be 29
(4) then the lower class limits are:
29
29 + 8 = 37
37 + 8 = 45
45 + 8 = 53
53 + 8 = 61
61 + 8 = 69
69 + 8 = 77 (start of the next class; it gives the last upper limit, 76)
(5) then the upper class limits are:
36, 44, 52, 60, 68, 76
(6) Now, tally the values in each
class and fill in the 2nd column:
Age of Actor
29 – 36
37 – 44
45 – 52
53 – 60
61 – 68
69 – 76
Solution:
Frequency Distribution: Ages of Best Actors
Age of Actor    Frequency, f
29 – 36         15
37 – 44         32
45 – 52         17
53 – 60         8
61 – 68         3
69 – 76         1
n = 76 (total)
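The six construction steps can be sketched in Python as follows. This is a minimal illustration for whole-number data, not the textbook's procedure verbatim: the function names `class_limits` and `tally` are hypothetical, and the five ages passed to `tally` are made up, because the raw Oscar data is not reproduced in these slides.

```python
import math

def class_limits(minimum, maximum, num_classes, start=None):
    """Return (lower, upper) class limits for whole-number data."""
    start = minimum if start is None else start              # step 3
    width = math.ceil((maximum - minimum) / num_classes)     # step 2: round up
    # If the division was exact, the last class would stop short of the
    # maximum, so widen every class by one unit.
    if start + num_classes * width - 1 < maximum:
        width += 1
    lowers = [start + i * width for i in range(num_classes)]  # step 4
    uppers = [low + width - 1 for low in lowers]              # step 5
    return list(zip(lowers, uppers))

def tally(values, limits):
    """Step 6: count how many values fall in each class."""
    return [sum(low <= v <= high for v in values) for low, high in limits]

limits = class_limits(29, 76, 6)
print(limits)
# [(29, 36), (37, 44), (45, 52), (53, 60), (61, 68), (69, 76)]

# Made-up ages, used only to show the tally step.
print(tally([29, 36, 37, 50, 76], limits))
# [2, 1, 1, 0, 0, 1]
```

The extra width check is a design choice of this sketch: the slides only say to round up, which is sufficient whenever the division is not exact, as in Example 1.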
Definition
Relative Frequency Table

lists the same classes as the Frequency
Distribution table in one column, and
their corresponding relative frequencies
(percentages) in the second column.

Relative frequency is sometimes denoted p-hat ($\hat{p}$) and sometimes p.
It is equal to the percent of the subjects in a class out of the
whole sample:
$p = \dfrac{f}{n}$
The relative frequencies of all classes sum to 100%:
$\sum \hat{p} = 100\%$
$\sum p = 100\%$
Frequency and Relative Frequency
distribution tables:
$28/76 \approx 37\%$
$30/76 \approx 39\%$
$12/76 \approx 16\%$
$\sum f = n = 76$
Relative Frequency Distribution:
Ex. 2
Given the Frequency Distribution table constructed in Example 1, create the
corresponding Relative Frequency Distribution table:
Frequency Distribution: Ages of Best Actors
Age of Actor    Frequency
29 – 36         15
37 – 44         32
45 – 52         17
53 – 60         8
61 – 68         3
69 – 76         1
n = 76
(1) Calculate the relative frequency in
each class by finding the ratio f/n:
$15/76 \approx 19.7\%$
$32/76 \approx 42.1\%$
$17/76 \approx 22.4\%$
(2) Write the relative frequency
values in the second column:
Relative Frequency: Ages of Best Actors
Age of Actor    Relative Frequency
29 – 36         19.7%
37 – 44         42.1%
45 – 52         22.4%
53 – 60         10.5%
61 – 68         3.9%
69 – 76         1.3%
99.9% (total; not exactly 100% because of rounding)
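The calculation in Example 2 can be reproduced with a few lines of Python. This is only a sketch: the frequencies come from the Example 1 table, while the variable names and the rounding to one decimal place are choices made here to match the slide.

```python
classes = ["29-36", "37-44", "45-52", "53-60", "61-68", "69-76"]
freqs = [15, 32, 17, 8, 3, 1]   # frequencies from Example 1

n = sum(freqs)                                       # sample size
rel_freqs = [round(100 * f / n, 1) for f in freqs]   # percent, one decimal

for label, f, p in zip(classes, freqs, rel_freqs):
    print(f"{label}  {f:>2}  {p:>4}%")

# The rounded percentages add up to 99.9, not 100, because of rounding.
print("total", round(sum(rel_freqs), 1))
```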
Definition
Cumulative Frequency Table

replaces both columns of the Frequency
Distribution table with cumulative groups
and cumulative frequencies.

Each cumulative group consists of all the values that are less than the
lower class limit of the next original class (for example, "less than 37"
for the class 29 – 36).

The cumulative frequency of each cumulative group is found by
adding the frequency of the original class to the cumulative
frequency of the previous group.
Cumulative Frequency Distribution
[Table: cumulative frequency distribution, with the cumulative frequencies in the second column]
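A cumulative frequency table is a running total of the ordinary frequencies. The sketch below applies this to the Best Actors frequencies from Example 1 (the slide's own cumulative table is not reproduced here, so this data choice is an assumption); `accumulate` is from Python's standard `itertools` module.

```python
from itertools import accumulate

lower_limits = [29, 37, 45, 53, 61, 69]   # lower class limits from Example 1
freqs = [15, 32, 17, 8, 3, 1]             # frequencies from Example 1
width = 8                                 # class width from Example 1

# Each cumulative group is "less than" the lower limit of the next class.
cutoffs = [low + width for low in lower_limits]   # 37, 45, ..., 77
cum_freqs = list(accumulate(freqs))               # running totals

for cutoff, c in zip(cutoffs, cum_freqs):
    print(f"Less than {cutoff}: {c}")
```

The last cumulative frequency always equals the sample size, here 76.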
Critical Thinking Interpreting
Frequency Distributions
In later chapters, there will be
frequent references to data with a
normal distribution. One key
characteristic of a normal distribution
is that it has a “bell” shape:
– The frequencies start low, then increase to some
maximum frequency, then decrease to a low frequency.
– The distribution is approximately symmetric, with
frequencies nearly evenly distributed on both sides of
the maximum frequency.
Comparing Two Samples
Ex. 3
Given their corresponding Relative Frequency Distribution tables, make a
comparison of the ages of Oscar-winning actresses and actors:
Ages of Best Actresses and Actors
Age        Relative Frequency    Relative Frequency
           for Actresses         for Actors
21 – 30    37%                   4%
31 – 40    39%                   33%
41 – 50    16%                   39%
51 – 60    3%                    18%
61 – 70    3%                    4%
71 – 80    3%                    1%
Total      101%                  99%
(1) Actresses tend to be somewhat younger than actors: 37% of the
actresses are in the youngest age group, while only 4% of the actors
fall into that age group.
(2) The highest relative frequency for the actresses (39%) corresponds
to the age group from 31 to 40; the highest relative frequency for actors
(39%) corresponds to the age group from 41 to 50 years old, ten years
older than the most frequent age group for actresses.
(3) Neither distribution appears to be normal: data values are not
distributed symmetrically on each side of the maximum frequencies.
Recap
In this Section we have discussed
• Important characteristics of data: center, variation, distribution, outliers, time (CVDOT);
• Frequency distributions;
• Procedures for constructing frequency distributions;
• Relative frequency distributions;
• Cumulative frequency distributions;
• Comparing samples using their frequency distributions.
HOW TO DETERMINE THE NUMBER OF CLASSES
STURGES' RULE

K = 1 + 3.3 log n   (log is the base-10 logarithm)

K: the number of classes

n: the number of data values we have

Case:

A sample consists of a company's product sales
to 80 customers. How do we determine a
suitable number of classes?
Answer:

The number of data values (n) = 80

Then:
K = 1 + 3.3 log n
  = 1 + 3.3 log 80
  = 1 + 3.3 (1.9031)
  = 7.280, rounded to 7 classes
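The Sturges calculation above can be checked with a short Python sketch. The function name `sturges_classes` is hypothetical; `math.log10` is the standard-library base-10 logarithm, and the result is rounded to the nearest whole number as on the slide.

```python
import math

def sturges_classes(n):
    """Sturges' rule as given on the slide: K = 1 + 3.3 * log10(n)."""
    return 1 + 3.3 * math.log10(n)

k = sturges_classes(80)
print(round(k, 2))  # 7.28
print(round(k))     # 7 classes
```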
Section 2-3
Histograms
Key Concept
A histogram is an important type of graph
that portrays the nature of the distribution:
Relative Frequency Histogram
Has the same shape and horizontal scale as a
histogram, but the vertical scale is marked with
relative frequencies instead of actual frequencies
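To illustrate why a relative frequency histogram has the same shape as the ordinary histogram, the sketch below draws both as text bars for the Best Actors frequencies from Section 2-2. It is an illustration only: the bars are drawn sideways for simplicity, and the helper `bar_lengths` is a hypothetical name.

```python
classes = ["29-36", "37-44", "45-52", "53-60", "61-68", "69-76"]
freqs = [15, 32, 17, 8, 3, 1]              # frequencies from the Best Actors table
n = sum(freqs)
rel_freqs = [100 * f / n for f in freqs]   # relative frequencies in percent

def bar_lengths(heights, unit):
    """Bar length in characters when one character stands for `unit`."""
    return [round(h / unit) for h in heights]

freq_bars = bar_lengths(freqs, 1)            # one '#' per actor
rel_bars = bar_lengths(rel_freqs, 100 / n)   # one '#' per (100/n) percent

for label, length, f, p in zip(classes, freq_bars, freqs, rel_freqs):
    print(f"{label} | {'#' * length}  f={f}  ({p:.1f}%)")

print(freq_bars == rel_bars)  # True: same shape, only the scale labels differ
```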
Key Concept
A histogram makes visible the nature of the
distribution, as well as where its center is
and whether there are any outliers:
The distribution of the ages of
Best Actresses is skewed, heavier
on the left, indicating that
actresses who win Oscars tend to be
disproportionately young.
Normal Distribution:
One key characteristic of a normal distribution is
that it is “bell-shaped”:
Recap
In this Section we have discussed
• Histograms
• Relative Frequency Histograms
Section 2-4
Statistical Graphics
Key Concept
This section presents other graphs
beyond histograms commonly used in
statistical analysis.
The main objective is to understand a
data set by using a suitable graph that is
effective in revealing some important
characteristic.
Frequency Polygon
Uses line segments connected to points directly
above class midpoint values
Ogive
A line graph that depicts cumulative frequencies
[Figure 2-6 from page 58: ogive]
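The points that a frequency polygon and an ogive connect can be computed directly from a frequency table. The sketch below uses the Best Actors table from Section 2-2 as an assumed data set and only produces the (x, y) coordinates; drawing the line segments is left to whatever plotting tool is at hand.

```python
from itertools import accumulate

limits = [(29, 36), (37, 44), (45, 52), (53, 60), (61, 68), (69, 76)]
freqs = [15, 32, 17, 8, 3, 1]

# Frequency polygon: one point above each class midpoint.
polygon = [((low + high) / 2, f) for (low, high), f in zip(limits, freqs)]

# Ogive: cumulative frequency plotted at each upper class boundary,
# starting from 0 at the lowest class boundary.
ogive = [(limits[0][0] - 0.5, 0)] + [
    (high + 0.5, c) for (_, high), c in zip(limits, accumulate(freqs))
]

print(polygon)
print(ogive)
```

The ogive never decreases and ends at the sample size, which is a quick way to check a hand-drawn one.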
Dot Plot
Consists of a graph in which each data value is
plotted as a point (or dot) along a scale of values
Stemplot (or Stem-and-Leaf Plot)
Represents data by separating each value into two
parts: the stem (such as the leftmost digit) and the
leaf (such as the rightmost digit)
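A stemplot can be built by splitting each value into a tens digit (stem) and a units digit (leaf). The following is a minimal sketch for non-negative whole numbers; the function name `stemplot` and the ages in the usage line are made up for illustration.

```python
from collections import defaultdict

def stemplot(values):
    """Return stem-and-leaf rows: stem = tens part, leaf = units digit."""
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)
    lo, hi = min(leaves), max(leaves)
    # Stems with no data are kept as empty rows so the shape stays honest.
    return [f"{stem} | {''.join(map(str, leaves[stem]))}" for stem in range(lo, hi + 1)]

# Made-up ages, for illustration only.
for row in stemplot([29, 31, 36, 37, 42, 45, 45, 53, 76]):
    print(row)
```

Turned on its side, the rows of leaves look like a histogram while still showing every original value.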
Pareto Chart
A bar graph for qualitative data, with the bars
arranged in order according to frequencies.
Example: complaints against phone carriers.
Pie Chart
A graph depicting qualitative data as slices of a pie
Pie Chart analysis
Ex. 1
What percent of total
complaints corresponds
to complaints due to
Access Charges?
(1) Percent is the ratio of the Access Charges
complaints out of the total number of
complaints, so we need to find the total
number of all complaints first: 21086
(2) Now the proportion of the Access Charges complaints is:
$p = \dfrac{614}{21086} \approx 0.029 = 2.9\%$
(3) Complaints due to Access Charges make up 2.9% of all complaints against
phone carriers.
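The arithmetic in this example is a single division, sketched below in Python. The slice angle is not on the slide; it is included here on the assumption that a pie chart allots each category its proportion of the full 360 degrees.

```python
access, total = 614, 21086   # counts from the example above

p = access / total           # proportion of Access Charges complaints
print(f"{p:.3f} = {p:.1%}")  # 0.029 = 2.9%

# Central angle of the corresponding pie slice, in degrees.
angle = 360 * p
print(round(angle, 1))       # 10.5
```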
Scatter Plot (or Scatter Diagram)
A plot of paired (x,y) data with a horizontal x-axis
and a vertical y-axis.
Example: number of cricket chirps per minute related to the temperature.
Time-Series Graph
A graph of data that have been collected at different points in
time
Other Graphs
Recap
In this section we have discussed graphs
that are pictures of distributions.
Keep in mind that a graph is a tool for
describing, exploring and comparing data.
1. A sample value that lies very far away from the
majority of the other sample values is
A. The center.
B. A distribution.
C. An outlier.
D. A variance.
2. A table that lists data values along with their
counts is
A. An ogive.
B. A frequency distribution.
C. A cumulative table.
D. A histogram.
3. The smallest numbers that can actually belong
to different classes are
A. Upper class limits.
B. Class boundaries.
C. Midpoints.
D. Lower class limits.
4. A bar graph where the horizontal scale
represents the classes of data values and the
vertical scale represents the frequencies is
called
A. A frequency distribution.
B. A histogram.
C. A dot plot.
D. A pie chart.
5. The pie chart below shows the percent of the total population
of 12,200 Springfield inhabitants living in the given types of
housing. Find the number of people who live in single family
housing (to the nearest whole number).
Pie chart: Apartments 35%, Single family 39%, Condo 18%, Duplex 2%, Townhouse 6%
A. 4758 people
B. 39 people
C. 5368 people
D. 7442 people
6. Based on the frequency distribution table you made earlier
(include that table in your answer sheet as is):
a. Determine a suitable number of classes using Sturges' rule.
b. Draw graphs based on your frequency distribution table:
• Polygon for the frequencies
• Histogram for the relative frequencies
• Ogive for the cumulative frequencies
ANSWER
A sample value that lies very far away from
the majority of the other sample values
is
A. The center.
B. A distribution.
C. An outlier.
D. A variance.
Answer: C. An outlier.
A table that lists data values along with
their counts is
A. An ogive.
B. A frequency distribution.
C. A cumulative table.
D. A histogram.
Answer: B. A frequency distribution.
The smallest numbers that can actually
belong to different classes are
A. Upper class limits.
B. Class boundaries.
C. Midpoints.
D. Lower class limits.
Answer: D. Lower class limits.
A bar graph where the horizontal scale
represents the classes of data values
and the vertical scale represents the
frequencies is called
A. A frequency distribution.
B. A histogram.
C. A dot plot.
D. A pie chart.
Answer: B. A histogram.
The pie chart below shows the percent of the total
population of 12,200 Springfield inhabitants living in the
given types of housing. Find the number of people who
live in single family housing (to the nearest whole number).
Pie chart: Apartments 35%, Single family 39%, Condo 18%, Duplex 2%, Townhouse 6%
A. 4758 people
B. 39 people
C. 5368 people
D. 7442 people
Answer: A. 4758 people (39% of 12,200 is 0.39 × 12,200 = 4758).
Any Queries?