Transcript Slide 1
EF 507 QUANTITATIVE METHODS FOR ECONOMICS AND FINANCE
FALL 2008
Chapter 2
Describing Data: Graphical
Chap 2-1

Chapter Goals
After completing this chapter, you should be able to:
• Identify types of data and levels of measurement
• Create and interpret graphs to describe categorical variables: bar chart, pie chart
• Create a line chart to describe time-series data
• Create and interpret graphs to describe numerical variables: histogram
• Construct and interpret graphs to describe relationships between variables
• Describe appropriate and inappropriate ways to display data graphically
Chap 2-2

Types of Data
Categorical (defined categories or groups)
  Examples: Marital Status; Are you registered to vote?; Eye Color
Numerical
  Discrete (counted items)
    Examples: Number of Children; Defects per hour
  Continuous (measured characteristics)
    Examples: Weight; Voltage
Chap 2-3

Measurement Levels
Quantitative Data
• Ratio Data: differences between measurements, true zero exists
• Interval Data: differences between measurements but no true zero
Qualitative Data
• Ordinal Data: ordered categories (rankings, order, or scaling)
• Nominal Data: categories (no ordering or direction)
Chap 2-4

Graphical Presentation of Data
• Data in raw form are usually not easy to use for decision making
• Some type of organization is needed: Table, Graph
• The type of graph to use depends on the variable being summarized
Chap 2-5

Graphical Presentation of Data (continued)
Techniques reviewed in this chapter:
Categorical Variables
• Bar chart
• Pie chart
• Pareto diagram
Numerical Variables
• Line chart
• Histogram
• Scatter plot
Chap 2-6

Tables and Graphs for Categorical Variables
Categorical Data
  Graphing Data: Bar Chart, Pie Chart
Chap 2-7

The Frequency Distribution Table
Summarize data by category
Example: Hospital Patients by Unit

Hospital Unit     Number of Patients
Cardiac Care      1,052
Emergency         2,245
Intensive Care      340
Maternity           552
Surgery           4,630

(Variables are categorical)
Chap 2-8

Bar and Pie Charts
• Bar charts and pie charts are
often used for qualitative (category) data
• Height of bar or size of pie slice shows the frequency or percentage for each category
Chap 2-9

Bar Chart Example
[Bar chart: Hospital Patients by Unit. Horizontal axis: hospital unit; vertical axis: number of patients per year (0 to 5,000).]

Hospital Unit     Number of Patients
Cardiac Care      1,052
Emergency         2,245
Intensive Care      340
Maternity           552
Surgery           4,630
Chap 2-10

Pie Chart Example

Hospital Unit     Number of Patients   % of Total
Cardiac Care      1,052                11.93
Emergency         2,245                25.46
Intensive Care      340                 3.86
Maternity           552                 6.26
Surgery           4,630                52.50

[Pie chart: Hospital Patients by Unit. Cardiac Care 12%, Emergency 25%, Intensive Care 4%, Maternity 6%, Surgery 53%.]
(Percentages are rounded to the nearest percent)
Chap 2-11

Pareto Diagram
• Used to portray categorical data
• A bar chart, where categories are shown in descending order of frequency
• A cumulative polygon is often shown in the same graph
• Used to separate the “vital few” from the “trivial many”
Chap 2-12

Pareto Diagram Example
Example: 400 defective items are examined for cause of defect:

Source of Manufacturing Error   Number of defects
Bad Weld                         34
Poor Alignment                  223
Missing Part                     25
Paint Flaw                       78
Electrical Short                 19
Cracked case                     21
Total                           400
Chap 2-13

Pareto Diagram Example (continued)
Step 1: Sort the defect causes by number of defects, in descending order
Step 2: Determine % in each category

Source of Manufacturing Error   Number of defects   % of Total Defects
Poor Alignment                  223                  55.75
Paint Flaw                       78                  19.50
Bad Weld                         34                   8.50
Missing Part                     25                   6.25
Cracked case                     21                   5.25
Electrical Short                 19                   4.75
Total                           400                 100%
Chap 2-14

Pareto Diagram Example (continued)
Step 3: Show results graphically
[Pareto diagram: Cause of Manufacturing Defect. Bar graph (axis 0% to 60%): % of defects in each category; line graph (axis 0% to 100%): cumulative %. Categories in order: Poor Alignment, Paint Flaw, Bad Weld, Missing Part, Cracked case, Electrical Short.]
Chap 2-15

Graphs for Time-Series Data
• A line chart (time-series plot) is used to
show the values of a variable over time
• Time is measured on the horizontal axis
• The variable of interest is measured on the vertical axis
Chap 2-16

Line Chart Example
[Line chart: Magazine Subscriptions by Year. Horizontal axis: year, 1990 to 2006; vertical axis: thousands of subscribers, 0 to 350.]
Chap 2-17

Graphs to Describe Numerical Variables
Numerical Data
  Frequency Distributions and Cumulative Distributions
  Histogram
Chap 2-18

Histogram
• A graph of the data in a frequency distribution is called a histogram
• The interval endpoints are shown on the horizontal axis
• The vertical axis is either frequency, relative frequency, or percentage
• Bars of the appropriate heights are used to represent the number of observations within each class
Chap 2-19

Histogram Example

Interval                Frequency
10 but less than 20     3
20 but less than 30     6
30 but less than 40     5
40 but less than 50     4
50 but less than 60     2

[Histogram: Daily High Temperature. Horizontal axis: temperature in degrees (0 to 70); vertical axis: frequency.]
(No gaps between bars)
Chap 2-20

Histograms in Excel
1. Select Tools/Data Analysis
Chap 2-21

Histograms in Excel (continued)
2. Choose Histogram
3. Input data range and bin range (bin range is a cell range containing the upper interval endpoints for each class grouping). Select Chart Output and click “OK”
Chap 2-22

Questions for Grouping Data into Intervals
1. How wide should each interval be? (How many classes should be used?)
2. How should the endpoints of the intervals be determined?
• Often answered by trial and error, subject to user judgment
• The goal is to create a distribution that is neither too “jagged” nor too “blocky”
• Goal is to appropriately show the pattern of variation in the data
Chap 2-23

How Many Class Intervals?
Many (Narrow class intervals)
• may yield a very jagged distribution with gaps from empty classes
• can give a poor indication of how frequency varies across classes
Few (Wide class intervals)
• may compress variation too much and yield a blocky distribution
• can obscure important patterns of variation
[Two histograms of temperature frequency: one with many narrow classes, one with few wide classes.]
(X axis labels are upper class endpoints)
Chap 2-24

Distribution Shape
• The shape of the distribution is said to be symmetric if the observations are balanced, or evenly distributed, about the center.
[Histogram: Symmetric Distribution. Vertical axis: frequency.]
Chap 2-25

Distribution Shape (continued)
• The shape of the distribution is said to be skewed if the observations are not symmetrically distributed around the center.
• A positively skewed distribution (skewed to the right) has a tail that extends to the right in the direction of positive values.
[Histogram: Positively Skewed Distribution. Vertical axis: frequency.]
• A negatively skewed distribution (skewed to the left) has a tail that extends to the left in the direction of negative values.
[Histogram: Negatively Skewed Distribution. Vertical axis: frequency.]
Chap 2-26

Relationships Between Variables
• Graphs illustrated so far have involved only a single variable
• When two variables exist, other techniques are used:
  Categorical (Qualitative) Variables: Cross tables
  Numerical (Quantitative) Variables: Scatter plots
Chap 2-27

Scatter Diagrams
• Scatter diagrams are used for paired observations taken from two numerical variables
• The scatter diagram: one variable is measured on the vertical axis and the other variable is measured on the horizontal axis
Chap 2-28

Scatter Diagram Example

Volume per day   Cost per day
23               125
26               140
29               146
33               160
38               167
42               170
50               188
55               195
60               200

Cost per Day vs.
Production Volume
[Scatter plot: horizontal axis is Volume per Day (0 to 70); vertical axis is Cost per Day (0 to 250).]
Chap 2-29

Scatter Diagrams in Excel
1. Select the chart wizard
2. Select XY (Scatter) option, then click “Next”
3. When prompted, enter the data range, desired legend, and desired destination to complete the scatter diagram
Chap 2-30

Graphing Multivariate Categorical Data
Side-by-side bar charts
[Side-by-side bar chart: Comparing Investors. Categories: Savings, CD, Bonds, Stocks; series: Investor A, Investor B, Investor C; axis from 0 to 60.]
Chap 2-31

Side-by-Side Chart Example
Sales by quarter for three sales territories:

         1st Qtr   2nd Qtr   3rd Qtr   4th Qtr
East     20.4      27.4      59        20.4
West     30.6      38.6      34.6      31.6
North    45.9      46.9      45        43.9

[Side-by-side bar chart: East, West, and North by quarter; axis from 0 to 60.]
Chap 2-32

Data Presentation Errors
Goals for effective data presentation:
• Present data to display essential information
• Communicate complex ideas clearly and accurately
• Avoid distortion that might convey the wrong message
Chap 2-33

Data Presentation Errors (continued)
• Unequal histogram interval widths
• Compressing or distorting the vertical axis
• Providing no zero point on the vertical axis
• Failing to provide a relative basis in comparing data between groups
Chap 2-34

Chapter Summary
• Reviewed types of data and measurement levels
• Data in raw form are usually not easy to use for decision making; some type of organization is needed: Table, Graph
• Techniques reviewed in this chapter:
  Frequency distribution
  Bar chart
  Pie chart
  Pareto diagram
  Line chart
  Frequency distribution
  Histogram
  Scatter plot
  Side-by-side bar charts
Chap 2-35

Which of the following variables is an example of a categorical variable?
A. The amount of money you spend on eating out each month.
B. The time it takes you to write a test.
C. The geographic region of the country in which you live.
D. The weight of a cereal box.
Chap 2-36

The data in the time-series plot below represent monthly sales for two years of beanbag animals at a local retail store (Month 1 represents January and Month 12 represents December). Do you see any obvious patterns in the data? Explain.
This is a representation of seasonal data. There seems to be a small increase in months 3, 4, and 5 and a large increase at the end of the year. The sales of this item seem to peak in December and have a significant drop-off in January.
Chap 2-37

At a large company, the majority of the employees earn from $20,000 to $30,000 per year. Middle management employees earn between $30,000 and $50,000 per year, while top management earn between $50,000 and $100,000 per year. A histogram of all salaries would have which of the following shapes?
a. Symmetrical
b. Uniform
c. Skewed to right
d. Skewed to left
Chap 2-38
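
Worked illustration: Pareto table
The first two steps of the Pareto Diagram Example earlier in this chapter (sort the causes by number of defects, then determine the % in each category) can be sketched in a few lines of Python, together with the cumulative % that the line graph plots. This is only a minimal sketch, not part of the original slides. The defect counts are the ones from the example, while the function name pareto_table and the row layout are assumptions made for illustration.

```python
# Defect counts copied from the Pareto Diagram Example (400 defective items).
defects = {
    "Bad Weld": 34,
    "Poor Alignment": 223,
    "Missing Part": 25,
    "Paint Flaw": 78,
    "Electrical Short": 19,
    "Cracked case": 21,
}


def pareto_table(counts):
    """Return rows of (cause, count, percent, cumulative percent).

    Hypothetical helper for illustration: rows are sorted by count in
    descending order, as a Pareto diagram requires.
    """
    total = sum(counts.values())
    # Step 1: sort causes by number of defects, largest first.
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    rows = []
    running = 0
    for cause, count in ordered:
        running += count
        # Step 2: % in each category, plus the cumulative % for the line graph.
        rows.append((cause, count, 100 * count / total, 100 * running / total))
    return rows


for cause, count, pct, cum_pct in pareto_table(defects):
    print(f"{cause:<18}{count:>5}{pct:>8.2f}{cum_pct:>9.2f}")
```

The percent column should agree with the % of Total Defects column in the example table; the cumulative column is the quantity that separates the “vital few” causes from the “trivial many”.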