STATISTICS
CHAPTER 2: Summarizing and Graphing Data
2-2 Frequency Distributions
2-3 Histograms
2-4 Statistical Graphics
SULIDAR FITRI, M.Sc
FEBRUARY 20, 2013
Introduction to Statistics, Math 13
Essentials of Statistics, 3rd edition, by Mario F. Triola
Copyright © 2007 Pearson Education, Inc. Publishing as Pearson Addison-Wesley.

Overview
Important Characteristics of Data
1. Center: An average value that indicates where the middle of the data set is located.
2. Variation: A measure of spread, i.e. the amount by which the values vary among themselves.
3. Distribution: The nature (shape) of the distribution of the data, such as bell-shaped, uniform, or skewed.
4. Outliers: Data values that lie very far away from the vast majority of the other values.
5. Time: Changing characteristics of the data values over time.

Section 2-2 Frequency Distributions

Definition
A Frequency Distribution (or frequency table) lists data values (individually or by groups) in one column and their corresponding frequencies (counts) in the second column.
Frequencies are denoted F (for a population) or f (for a sample). The frequencies of all classes must add up to the population size (or the sample size): $\sum F = N$ and $\sum f = n$.

[Figure: Frequency Distribution: Ages of Best Actresses (original data and frequency distribution)]

Definitions
Lower Class Limit is the smallest number that can actually belong to a class.
Upper Class Limit is the largest number that can actually belong to a class.
Class Width is the difference between two consecutive lower class limits (or two consecutive upper class limits). In Table 2-2, every class has width 10.
Class Boundaries are the numbers used to separate classes: each class boundary falls in the middle of the gap between the upper limit of one class and the lower limit of the next.
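As a quick check of the class width and class boundary definitions, here is a minimal Python sketch. It assumes the Table 2-2 classes are 21–30, 31–40, 41–50, 51–60, 61–70 and 71–80 (consistent with the boundaries 20.5 to 80.5 and midpoints 25.5 to 75.5 given for that table), written as (lower limit, upper limit) pairs, and that every gap between classes has the same size; the function names are illustrative only.

```python
def class_widths(classes):
    """Differences between consecutive lower class limits."""
    lowers = [lower for lower, _ in classes]
    return [b - a for a, b in zip(lowers, lowers[1:])]


def class_boundaries(classes):
    """Numbers halfway across each gap between one class and the next.

    The first and last boundaries extend half a gap beyond the outermost
    limits; this assumes every gap between classes has the same size.
    """
    half_gap = (classes[1][0] - classes[0][1]) / 2  # e.g. (31 - 30) / 2
    inner = [(upper + next_lower) / 2
             for (_, upper), (next_lower, _) in zip(classes, classes[1:])]
    return [classes[0][0] - half_gap] + inner + [classes[-1][1] + half_gap]


# Assumed Table 2-2 classes (ages of Best Actresses)
actress_classes = [(21, 30), (31, 40), (41, 50), (51, 60), (61, 70), (71, 80)]
print(class_widths(actress_classes))      # [10, 10, 10, 10, 10]
print(class_boundaries(actress_classes))  # [20.5, 30.5, 40.5, 50.5, 60.5, 70.5, 80.5]
```

Six classes give five differences between consecutive lower limits; all equal 10, which is the class width of the table.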
For Table 2-2, the class boundaries are 20.5, 30.5, 40.5, 50.5, 60.5, 70.5 and 80.5.
Class Midpoints are the numbers that fall in the middle of each class: class midpoint $= \frac{\text{lower class limit} + \text{upper class limit}}{2}$. For Table 2-2 the class midpoints are 25.5, 35.5, 45.5, 55.5, 65.5 and 75.5 (for example, $(21 + 30)/2 = 25.5$).

Reasons for Constructing Frequency Distributions
1. Large data sets can be summarized.
2. We can gain some insight into the nature of the data.
3. We have a basis for constructing important graphs.

Constructing a Frequency Distribution
1. Decide on the number of classes (best: between 5 and 20).
2. Calculate the class width and round up: class width $\approx \frac{(\text{maximum data value}) - (\text{minimum data value})}{\text{number of classes}}$
3. Choose the starting point, which will be the lower limit of the first class.
4. List the lower class limits by repeatedly adding the class width to the lower limit of the first class.
5. List the upper class limits; each one is one less than the next lower class limit.
6. Enter the count (frequency) of data values in each class.

Grouping with Equal Class Widths
Ex. 1 Given the ages of the male Oscar award recipients, construct a frequency distribution table with 6 classes.
(1) Smallest data value = 29; largest data value = 76.
(2) Class width $= \frac{76 - 29}{6} \approx 7.8$, rounded up to 8.
(3) Let's choose the starting point to be 29.
(4) Then the lower class limits are 29, 37, 45, 53, 61, 69 (29 + 8 = 37, 37 + 8 = 45, and so on).
(5) Then the upper class limits are 36, 44, 52, 60, 68, 76 (each one less than the next lower limit; the last follows from 69 + 8 = 77).
(6) Now tally the values in each class and fill in the second column.

Solution: Frequency Distribution: Ages of Best Actors
Age of Actor    Frequency, f
29–36           15
37–44           32
45–52           17
53–60           8
61–68           3
69–76           1
Total           n = 76

Definition
A Relative Frequency Table lists the same classes as the frequency distribution table in one column and their corresponding relative frequencies (percentages) in the second column.
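Steps 2 to 6 of the construction procedure, applied to Example 1, can be sketched in Python as follows. This is a minimal sketch assuming whole-number data (so each upper limit is one less than the next lower limit); `build_classes`, `midpoints` and `tally` are hypothetical helper names, and widening the classes by 1 when they would stop short of the maximum value is one common convention, not a step stated on these slides.

```python
import math


def build_classes(min_value, max_value, num_classes, start=None):
    """Steps 2-5: class width (rounded up), then lower and upper class limits.

    Assumes whole-number data, so each upper limit is one less than the
    next lower limit.
    """
    start = min_value if start is None else start
    width = math.ceil((max_value - min_value) / num_classes)
    # Assumption: if the classes would stop short of the maximum value
    # (e.g. when the division comes out exact), widen them by 1.
    while start + num_classes * width - 1 < max_value:
        width += 1
    lowers = [start + i * width for i in range(num_classes)]
    uppers = [lower + width - 1 for lower in lowers]
    return width, list(zip(lowers, uppers))


def midpoints(classes):
    """(lower class limit + upper class limit) / 2 for each class."""
    return [(lower + upper) / 2 for lower, upper in classes]


def tally(data, classes):
    """Step 6: count the data values falling inside each class."""
    return [sum(lower <= x <= upper for x in data) for lower, upper in classes]


# Example 1: smallest age 29, largest age 76, 6 classes, starting at 29
width, classes = build_classes(29, 76, 6)
print(width)               # 8
print(classes)             # [(29, 36), (37, 44), (45, 52), (53, 60), (61, 68), (69, 76)]
print(midpoints(classes))  # [32.5, 40.5, 48.5, 56.5, 64.5, 72.5]

# Frequencies from the Example 1 solution table should sum to n = 76
freqs = [15, 32, 17, 8, 3, 1]
print(sum(freqs))          # 76
```

Calling `tally` on the full list of 76 ages (not reproduced on these slides) would produce the frequency column of the solution table.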
Relative frequency is denoted $\hat{p}$ (sometimes $p$) and equals the percentage of the subjects in a class out of the whole sample: $\hat{p} = \frac{f}{n} \times 100\%$.
For example, in the frequency and relative frequency tables for the ages of Best Actresses (n = 76): $\frac{28}{76} \approx 37\%$, $\frac{30}{76} \approx 39\%$, $\frac{12}{76} \approx 16\%$.

Relative Frequency Distribution
Ex. 2 Given the frequency distribution table constructed in Example 1, create the corresponding relative frequency distribution table.
(1) Calculate the relative frequency of each class as the ratio f/n, for example $\frac{15}{76} \approx 19.7\%$, $\frac{32}{76} \approx 42.1\%$, $\frac{17}{76} \approx 22.4\%$.
(2) Write the relative frequency values in the second column:

Relative Frequency: Ages of Best Actors
Age of Actor    Frequency, f    Relative Frequency
29–36           15              19.7%
37–44           32              42.1%
45–52           17              22.4%
53–60           8               10.5%
61–68           3               3.9%
69–76           1               1.3%
Total           n = 76          99.9%
The total is 99.9% rather than 100% because each entry was rounded.

Definition
A Cumulative Frequency Table replaces both columns of the frequency distribution table with cumulative groups and cumulative frequencies. Each cumulative group consists of all values less than the lower class limit of the next class (for the class 21–30, the cumulative group is "Less than 31"). The cumulative frequency of each group is found by adding the frequency of the original class to the cumulative frequency of the previous group.

[Figure: Cumulative Frequency Distribution (frequency table and the corresponding cumulative frequencies)]

Critical Thinking: Interpreting Frequency Distributions
In later chapters there will be frequent references to data with a normal distribution. One key characteristic of a normal distribution is that it has a "bell" shape:
– The frequencies start low, then increase to some maximum frequency, then decrease to a low frequency.
– The frequencies are approximately symmetric, distributed nearly evenly on both sides of the maximum frequency.

Comparing Two Samples
Ex. 3 Given their corresponding relative frequency distribution tables, compare the ages of Oscar-winning actresses and actors.

Ages of Best Actresses and Actors
Age      Relative Frequency for Actresses    Relative Frequency for Actors
21–30    37%                                 4%
31–40    39%                                 33%
41–50    16%                                 39%
51–60    3%                                  18%
61–70    3%                                  4%
71–80    3%                                  1%
Total    101%                                99%

(1) Actresses tend to be somewhat younger than actors: 37% of the actresses are in the youngest age group, while only 4% of the actors fall into that group.
(2) The highest relative frequency for actresses (39%) corresponds to the age group 31 to 40; the highest relative frequency for actors (39%) corresponds to the age group 41 to 50, ten years older than the most frequent age group for actresses.
(3) Neither distribution appears to be normal: the data values are not distributed symmetrically on each side of the maximum frequency.

Recap
In this section we have discussed:
– Important characteristics of data: Center, Variation, Distribution, Outliers, Time (CVDOT);
– Frequency distributions;
– Procedures for constructing frequency distributions;
– Relative frequency distributions;
– Cumulative frequency distributions;
– Comparing samples using their frequency distributions.

How to Determine the Number of Classes: Sturges' Rule
$K = 1 + 3.3 \log n$
where K is the number of classes, n is the number of data values we have, and log is the base-10 logarithm.
Case: A sample consists of a company's product sales to 80 customers. How do we determine a suitable number of classes?
Answer: The number of data values is n = 80, so
$K = 1 + 3.3 \log 80 = 1 + 3.3(1.9031) \approx 7.280$, which is rounded to 7 classes.

Section 2-3 Histograms
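A relative frequency histogram is drawn from a table's relative frequencies, and the ogive of Section 2-4 from its cumulative frequencies, so it can help to compute those columns first. The minimal Python sketch below does this for the Example 1 frequencies and also evaluates Sturges' rule for n = 80. It assumes base-10 logarithms and rounding K to the nearest whole number (as in the 7.280 to 7 case above); the function names are hypothetical.

```python
import math
from itertools import accumulate


def relative_frequencies(freqs):
    """Relative frequency of each class as a percentage: f / n * 100."""
    n = sum(freqs)
    return [100 * f / n for f in freqs]


def cumulative_frequencies(freqs):
    """Each class's frequency added to the cumulative total of the classes before it."""
    return list(accumulate(freqs))


def sturges_classes(n):
    """Sturges' rule K = 1 + 3.3 log10(n), rounded to the nearest whole number."""
    return round(1 + 3.3 * math.log10(n))


freqs = [15, 32, 17, 8, 3, 1]  # Example 1: ages of Best Actors, n = 76
print([round(p, 1) for p in relative_frequencies(freqs)])
# [19.7, 42.1, 22.4, 10.5, 3.9, 1.3]
print(cumulative_frequencies(freqs))  # [15, 47, 64, 72, 75, 76]
print(sturges_classes(80))            # 7
```

The unrounded percentages sum to 100%; the 99.9% total in Example 2 appears only after each entry is rounded to one decimal place.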
Key Concept
A histogram is a bar graph in which the horizontal scale represents the classes of data values and the vertical scale represents the frequencies. It is an important type of graph that portrays the nature of the distribution.

Relative Frequency Histogram
Has the same shape and horizontal scale as a histogram, but the vertical scale is marked with relative frequencies instead of actual frequencies.

Key Concept
A histogram makes visible the nature of the distribution, as well as where its center is and whether there are any outliers. The distribution of the ages of Best Actresses is skewed to the right: most values are concentrated on the left, at the younger ages, indicating that actresses who win Oscars tend to be disproportionately young.

Normal Distribution
One key characteristic of a normal distribution is that it is "bell-shaped".

Recap
In this section we have discussed:
– Histograms;
– Relative frequency histograms.

Section 2-4 Statistical Graphics

Key Concept
This section presents other graphs beyond histograms that are commonly used in statistical analysis. The main objective is to understand a data set by using a suitable graph that is effective in revealing some important characteristic.

Frequency Polygon
Uses line segments connected to points located directly above the class midpoint values.

Ogive
A line graph that depicts cumulative frequencies.
[Figure 2-6, page 58]

Dot Plot
Consists of a graph in which each data value is plotted as a point (or dot) along a scale of values.

Stemplot (or Stem-and-Leaf Plot)
Represents data by separating each value into two parts: the stem (such as the leftmost digit) and the leaf (such as the rightmost digit).

Pareto Chart
A bar graph for qualitative data, with the bars arranged in descending order according to frequencies (example: complaints against phone carriers).

Pie Chart
A graph depicting qualitative data as slices of a pie.

Pie Chart Analysis
Ex. 1 What percent of the total complaints corresponds to complaints due to Access Charges?
(1) The percentage is the ratio of Access Charges complaints to the total number of complaints, so we first find the total number of all complaints: 21,086.
(2) The proportion of Access Charges complaints is then $\hat{p} = \frac{614}{21086} \approx 0.029 = 2.9\%$.
(3) Complaints due to Access Charges make up 2.9% of all complaints against phone carriers.

Scatter Plot (or Scatter Diagram)
A plot of paired (x, y) data with a horizontal x-axis and a vertical y-axis (example: number of cricket chirps per minute related to the temperature).

Time-Series Graph
A graph of data that have been collected at different points in time.

Other Graphs

Recap
In this section we have discussed graphs that are pictures of distributions. Keep in mind that a graph is a tool for describing, exploring and comparing data.

1. A sample value that lies very far away from the majority of the other sample values is
A. The center. B. A distribution. C. An outlier. D. A variance.
2. A table that lists data values along with their counts is
A. An ogive. B. A frequency distribution. C. A cumulative table. D. A histogram.
3. The smallest numbers that can actually belong to different classes are
A. Upper class limits. B. Class boundaries. C. Midpoints. D. Lower class limits.
4. A bar graph where the horizontal scale represents the classes of data values and the vertical scale represents the frequencies is called
A. A frequency distribution. B. A histogram. C. A dot plot. D. A pie chart.
5. The pie chart shows the percentage of Springfield's total population of 12,200 inhabitants living in each type of housing: Apartments 35%, Single family 39%, Condo 18%, Townhouse 6%, Duplex 2%. Find the number of people who live in single-family housing (rounded to the nearest whole number).
A. 4758 people B. 39 people C. 5368 people D. 7442 people
6. Based on the frequency distribution table you constructed earlier (include that table, as is, on your answer sheet):
a. Determine the appropriate number of classes using Sturges' rule.
b. Draw graphs based on your frequency distribution table:
– a polygon for the frequencies;
– a histogram for the relative frequencies;
– an ogive for the cumulative frequencies.

ANSWER
1. C. An outlier.
2. B. A frequency distribution.
3. D. Lower class limits.
4. B. A histogram.
5. A. 4758 people ($0.39 \times 12{,}200 = 4{,}758$).

Any queries?