Introduction to Statistics
Chapter 1
MSIS 111
Prof. Nick Dedeke

Objectives
Define statistics
Differentiate between descriptive and inferential statistics
Define statistical variables
Classifying numbers

What is Statistics?
A general way to view statistics is as follows: it is a language and a set of rules that enable us to make sense of data about events, people, places and things.

Valid Statistic?: Example 1
An online survey conducted recently led some to the conclusion that Apple’s iPhone will not succeed in the U.S. market. 75% of the men and 89% of the women surveyed answered “never” when asked the question: Would you buy an iPod?

Valid Statistic?: Example 2
When you vote, consider this information. A mail survey showed that in the years when Democrats controlled Congress, the U.S. had a higher number of destructive, level 5 hurricanes. In the years when Republicans controlled Congress, the U.S. had more extremely cold and extremely hot days.

Valid Statistic?: Example 3
If you want to get a job quickly after you graduate, do not wear white clothing to your interview. A recent phone survey of fifty human resources managers at the top 10 retail firms in America revealed that only 2% of them wear white clothing to work.

Facts
There is such a thing as bad statistics
Poor methods, sample, and/or interpretation
You can always make bad statistics say anything you want them to say
The cure for bad statistics is good statistics

Do we really need statistics?
Imagine a government that never gathers data about population growth
Imagine a hospital that never stores data about patients and their care
Imagine a car firm that never analyzes data about vehicle rollovers
Imagine an insurance firm that never interprets the causes of increases in health care costs

Definition of statistics
Statistics is a science dealing with the collection, organization, analysis, interpretation and presentation of quantitative and qualitative data.
Statistics is a means to an end. The objective is not statistics for its own sake; what matters for firms is the effective use of statistics for decision-making.

Challenges of statistics
Statistics has two primary challenges:
Describing a group of entities using a segment of the group. For example, there are over 300 million U.S. citizens, and I want to answer the question: How tall are Americans? This kind is called descriptive statistics. FOCUS: present or past
Generating conclusions about future trends of a large group of data using a smaller set from the same or a related group. For example, I have the question: At what rate are we depleting the fish in our rivers? This kind is called inferential statistics. FOCUS: present or future

Terminology in statistics
Census: Gathering of data from every member of a group or population, e.g. all voters in a presidential election, all subscribers to cable TV
Sample: A randomly selected subset of the members of a population (a fraction of the size of a census)
Variable: Attribute of interest of each member of a group
Observation or measurement: The value of a variable for a member of a group (population or sample)

Exercise 1
How many members are in this sample? Bill, Marty, Mary, Sue, Buba, Dub, Anne, Ali Baba, Jane, Phil, Don, Monki
If I were interested in the physical attributes of the members, which two variables would I survey?
If I were interested in the opinions of the sample, which two variables would I survey?
If I were interested in the identity of the members, which two variables would I survey?

Exercise 1 Responses
How many members are in this sample (data set)? 12
Physical attributes: height, weight, hair color, gender
Opinion: political affiliation, political worldview
Identity: last name, nationality, ID number, Social Security number
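
The terms sample, variable and observation can be illustrated with a minimal Python sketch. The member names are the ones listed in Exercise 1; the heights and political affiliations are made-up values used only for illustration, and the names `sample`, `observations` and `variables` are hypothetical.

```python
# Member names come from Exercise 1. Everything in `observations` is
# made-up illustration data, not data from the course.
sample = ["Bill", "Marty", "Mary", "Sue", "Buba", "Dub",
          "Anne", "Ali Baba", "Jane", "Phil", "Don", "Monki"]

# Sample size: the number of members in the sample.
sample_size = len(sample)
print("Members in sample:", sample_size)

# A variable is an attribute of interest; an observation is the value
# of that variable for one member of the sample.
observations = {
    "Bill": {"height_in": 70, "political_affiliation": "Republican"},
    "Mary": {"height_in": 64, "political_affiliation": "Democrat"},
}

# Collect the distinct variables recorded across all members.
variables = sorted({name for record in observations.values() for name in record})
print("Variables surveyed:", variables)
```

Here `height_in` stands for a physical attribute and `political_affiliation` for an opinion, matching two of the variable groups in the Exercise 1 responses.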

Exercise 2
For each of the underlined variables, write down an example of what the observation (response to the survey) would be when you survey a member of the population.
Physical attributes: height, weight, hair color, gender
Opinion: political affiliation, political worldview
Identity: last name, nationality, ID number, Social Security number

Exercise 2 Responses
Weight: 200 pounds
Gender: Female
Political affiliation: Republican
Political view: Liberal
Nationality: Nigerian
Social Security number: 123974
Numerical data: Permit the use of arithmetical operations
Categorical data: Permit only the building of subgroups

Data Measurement
The question that one puts on a survey determines how a variable is measured. Consider the following questions:
How much income do you make per year (in thousand $)?
Do you make more than the US national average of $30,000 per year? [Yes] [No]
How much income do you make per year? [Below $10k] [$10k to $30k] [$30k to $50k] [$50k to $70k] [above $70k]

Data Measurement
Many variables could be measured at different levels.
Do you make more than the US national average of $30,000 per year? [Yes] [No]
Nominal level. Grouping only; ranking not advisable/permissible.
How much income do you make per year? [Below $10k] [$10k to $30k] [$30k to $50k] [$50k to $70k] [above $70k]
Ordinal level. Absolute zero not emphasized; ranking possible.
How much income do you make per year (in thousand $)?
Ratio level. Absolute zero and ratios of numbers are meaningful. Arithmetical operations possible.

Exercise 3: Data Measurement
What is the level of measurement of each of the following observations?
1980: date of birth
Social Security number
Temperature, e.g. 90 degrees Fahrenheit
Age: 19 years old
Rating of customer service: Excellent (7)

Exercise 3: Responses
What is the level of measurement of each of the following observations?
1980: date of birth [ORDINAL]
Social Security number [NOMINAL]
Temperature, e.g. 90 degrees Fahrenheit [INTERVAL]
Age: 19 years old [RATIO]
Rating of customer service: Excellent (7) [ORDINAL]

Analyzing Data
Nonparametric statistics: [ORDINAL], [NOMINAL]
Parametric statistics: [INTERVAL], [RATIO]

Data Measurement: Examples
Two respondents, A and B: $20,000 and $40,000 income/yr.
Many variables could be measured at different levels.
Do you make more than the US national average of $30,000 per year? [Yes] [No]
Nominal level. Grouping only; ranking not advisable/permissible.
Analyses: Income class of B ranks higher than A. Difference in incomes: not possible. Ratio of income classes: not possible.
How much income do you make per year? [Below $10k] [$10k to $30k] [$30k to $50k] [$50k to $70k] [above $70k]
Ordinal level. Absolute zero not emphasized; ranking possible.
Analyses: Income class of B ranks higher than A. Difference in income classes: ranges from $1 to $40,000. Ratio of income classes: not possible.
If you divide your salary by $20,000 per year, what do you get? [¼] [½] [¾] [1] [1¼] [1½] [1¾] [2] [2¼] [2½] [2¾]
Interval level. Absolute zero is convenient and ratios of numbers are meaningful.
Analyses: Income B ranks higher than A. Difference between consecutive income classes = $5,000; income of B is twice as high as A (2 divided by 1).
How much income do you make per year (in thousand $)? ___________ $ thousands
Ratio level. Absolute zero and ratios of numbers are meaningful.
Analyses: Income B ranks higher than A. Difference in income = $20,000; income of B is twice as high as A (40,000/20,000).
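
The two-respondent example can be worked through in a short Python sketch that records the same two incomes at the nominal, ordinal and ratio levels. The function names are hypothetical. The $30,000 threshold and the brackets come from the survey questions; because the brackets as written share their boundary values, the sketch assumes a boundary income falls into the higher bracket.

```python
# Sketch only: function names are hypothetical. The threshold and brackets
# are taken from the survey questions in the slides.

def nominal_answer(income):
    """Yes/No answer to: do you make more than $30,000 per year?"""
    return "Yes" if income > 30_000 else "No"

BRACKETS = ["Below $10k", "$10k to $30k", "$30k to $50k",
            "$50k to $70k", "above $70k"]
UPPER_BOUNDS = [10_000, 30_000, 50_000, 70_000]

def ordinal_rank(income):
    """Index of the income bracket; a higher index means a higher rank.

    Assumption: an income exactly on a boundary goes to the higher bracket.
    """
    for rank, upper in enumerate(UPPER_BOUNDS):
        if income < upper:
            return rank
    return len(UPPER_BOUNDS)

income_a, income_b = 20_000, 40_000  # respondents A and B

# Nominal level: the answers only place respondents into groups.
print("Nominal:", nominal_answer(income_a), nominal_answer(income_b))

# Ordinal level: brackets can be ranked, but the exact difference is unknown.
rank_a, rank_b = ordinal_rank(income_a), ordinal_rank(income_b)
print("Ordinal:", BRACKETS[rank_a], "ranks below", BRACKETS[rank_b])

# Ratio level: differences and ratios of the raw incomes are meaningful.
difference = income_b - income_a
ratio = income_b / income_a
print("Ratio level: difference =", difference, "ratio =", ratio)
```

Only the ratio-level measurement supports both the $20,000 difference and the statement that B earns twice as much as A. The nominal and ordinal answers discard the information needed for that arithmetic.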