Transcript Document

Introductory Statistics – Part 1

By

Samuel Chukwuemeka (Samdom For Peace) www.samdomforpeace.com

Terms

 

Statistics

Sample

Population

Individual

Statistic

Parameter

Statistical Process

Data

Variables

     

Statistics

Statistics is the science that deals with the collection, organization, presentation, analysis, and interpretation of data in order to reach a conclusion or make a decision.

There are two basic types of statistics: Descriptive Statistics and Inferential Statistics.

Descriptive Statistics and Inferential Statistics

Descriptive Statistics is the science that deals with the organization and presentation of the collected data.

Inferential Statistics is the science that uses methods that take the results obtained from a sample, extend them to the population, and measure the reliability of those results.

 We shall discuss more of this as we move on.

    

Why do we learn Statistics?

The media use statistics to predict the outcome of elections such as the Presidential Election, to nominate people for awards, and so on.

Administrators use statistics to know how their districts and schools are performing, and so make decisions as necessary.

Health professionals use statistics to know how different people react to different medicines.

Teachers use statistics to know how to meet individual students' learning needs.

You use statistics to make informed decisions on whom to marry, what car to buy, what school to attend, among others.

The uses go on and on.

        

Data

Data is the list of observed values for a variable.

Data is the set of facts used to make a conclusion or decision.

It is also referred to as "Information".

It is collected from a survey, an experiment, a historical record, among others.

It can be numeric, such as age, weight, etc.

It can also be non-numeric, such as color, gender, etc.

Data varies. It changes within an individual. It also changes among individuals.

Understanding the variability of data is very important in Statistics.

Collecting data about something involves a study of that thing. The study could involve measurement or observation. That leads us to…

Population, Sample, and Individual

Population refers to the entire group of individuals or things being studied. It contains all subjects of interest.

Sample refers to a subset of the population (that is, part of the population, or some of its members) being studied. It contains some of the subjects of interest.

An Individual refers to a member of the population being studied. It is a single subject of interest.

Example 1

 For each of these scenarios, identify the population, the sample, and the individual.

 A 2012 survey of 100 million Nigerians in Nigeria found that they would prefer the South to secede from the North.

Population: All Nigerians in Nigeria

Sample: 100 million Nigerians in Nigeria

Individual: A Nigerian

Example 2

A poll contacts 200 ladies aged 19 to 35 who live in the U.S. and asks whether they use abstinence as a form of birth control.

Population: All ladies aged 19 to 35 who live in the U.S.

Sample: 200 ladies aged 19 to 35 who live in the U.S.

Individual: A lady aged 19 to 35 who lives in the U.S.

Example 3

A farmer randomly sampled 125 plants on his farm on June 27 and weighed the chlorophyll in each plant.

Population: All plants on his farm on June 27

Sample: 125 plants on his farm on June 27

Individual: A plant on his farm on June 27
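
The relationship between population, sample, and individual in Example 3 can be sketched in Python. This is only an illustration under stated assumptions: the example gives the sample size (125) but not the size of the farm, so the figure of 1,000 plants and the plant labels below are hypothetical.

```python
import random

# Hypothetical population: every plant on the farm (the size 1,000 is assumed).
population = [f"plant-{i}" for i in range(1, 1001)]

# Simple random sample of 125 plants, drawn without replacement.
rng = random.Random(27)  # fixed seed so the sketch is repeatable
sample = rng.sample(population, 125)

# An individual is any single member of the population.
individual = sample[0]

print(len(population), len(sample), individual in population)
```

Every member of the sample is also a member of the population, which is what makes the sample a subset.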

   

Consider Example 1

Assume that 95 million of the 100 million Nigerians surveyed said that they were ready to secede immediately. This means that 95% of the 100 million Nigerians who were surveyed are ready for immediate secession.

This describes the results of the sample without drawing any conclusions about the population (Descriptive Statistics). Note that the population here is the entire Nigerian population.

This leads us to…

Statistic, Parameter

A statistic is a numerical summary of a sample.

In our example, the 95% is the statistic.

Suppose we now take this 95% and extend it to the entire Nigerian population. If we say that 95% of all Nigerians in Nigeria are ready to secede immediately, then the 95% is taken as the parameter.

A parameter is a numerical summary of a population.

We just performed Inferential Statistics.
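
As a small worked sketch, the statistic in Example 1 is just the sample proportion. The variable names below are illustrative; the two counts come from the example.

```python
surveyed = 100_000_000        # sample size from Example 1
ready_to_secede = 95_000_000  # respondents ready to secede immediately

# The statistic is a numerical summary of the sample.
sample_proportion = ready_to_secede / surveyed

print(f"{sample_proportion:.0%}")  # 95%
```

The calculation uses only the sample. Carrying the same 95% over to all Nigerians is the separate, inferential step.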

Example 4

 For each of these scenarios, determine whether the underlined value is a statistic or parameter.

A sample of London residents was surveyed, and it was found that 85% had a bachelor's degree or higher.

85% is a statistic because it is a numerical summary of the sample of London residents.

Example 5

26 of the 50 states in the United States voted for Barack Obama in the 2012 Presidential Elections.

26 is a parameter because it is a numerical summary of all 50 states, which together make up the entire population of states in the United States.

50 is also a parameter because it is the size of the population of states in the United States.

Example 6

A recent study by Harvard University researchers of 93,600 women aged between 25 and 42 found that three or more servings of berries per week may slash the risk of a heart attack by 33%. (Source: Journal of the American Heart Association)

33% is a statistic because it is a numerical summary of a sample (93,600) of the women aged between 25 and 42.

  

Statistical Process

Statistics is a science because its process follows the scientific method. Here are the basic steps of a statistical process:

Identify the research objective: What do you want to find out? What are the necessary questions to ask? What is the population of the study?

Collect the data needed to answer the questions: Use appropriate data collection techniques. Gaining access to an entire population is usually difficult, so a sample is needed. How random and how large is your sample?


Describe the data: Obtain descriptive statistics for your sample data. Organize and present or summarize your data properly.

Perform inference: Apply appropriate techniques to extend the results of your sample data to the population of your study. Report a level of reliability of the results. What is the confidence level of your results? What is the margin of error?

Once a research objective is stated and the population is identified, the researcher must create a list of the information wanted about the individuals of the population. That leads to…

    

Variables

A variable is a characteristic of the individuals in the population being studied.

As the name implies, it varies.

Variables can be classified as:
Qualitative or Categorical Variables
Quantitative Variables

Quantitative Variables can then be classified as:
Discrete Variables
Continuous Variables

  

Qualitative and Quantitative Variables

Qualitative Variables express qualitative attributes of the individuals of a population. They are not measurable and are not usually numeric. They are also known as Categorical Variables. Examples are gender, favorite color, religion, street names, zip codes, etc.

Quantitative Variables express numerical measures of the individuals of a population. They are always measurable and always have numeric values. Examples include age of students, area of a room, volume of a box, temperature, weight, height, shoe size, etc.

Let us now look at the types of Quantitative Variables – Discrete Variables and Continuous Variables

Discrete Variables

Discrete Variables are quantitative variables that have a finite or countable number of possible values. If you count to get the value of a quantitative variable, then it is discrete. Examples are the number of prime numbers obtained after tossing two dice one time, the number of kings in a deck of cards, the number of U.S. senators, among others.

NOTE: If you can count it physically, then it is a discrete variable.
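
The dice example can be checked by listing every outcome. The sketch below assumes that "number of prime numbers obtained" means how many of the two dice show a prime face (2, 3, or 5). That reading is an interpretation, not something the example states.

```python
from itertools import product

PRIMES = {2, 3, 5}  # prime faces on a six-sided die

# All 36 outcomes of tossing two dice once.
outcomes = list(product(range(1, 7), repeat=2))

# Count the prime faces in each outcome and collect the distinct counts.
possible_values = sorted({sum(face in PRIMES for face in roll) for roll in outcomes})

print(possible_values)  # [0, 1, 2]
```

Under that reading the variable can only take the values 0, 1, or 2, a finite list obtained by counting, so it is discrete.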

Continuous Variables

Continuous Variables are quantitative variables that have an infinite number of possible values that cannot be counted. If you measure to get the value of a quantitative variable, then it is continuous. Examples are: the time it takes for the "sequester" to take effect, the distance between Nigeria and the United States, among others.

NOTE: If you can measure it rather than count it, then it is a continuous variable.

On the sidelines, we also have…

    

Dependent and Independent Variables

Dependent Variables are:
Also known as the Response Variables
Variables that are predicted
The outcome of a study
The "y-value" of a function

Independent Variables are:
Also known as the Explanatory or Predictor Variables
Variables that explain the response variables
The "x-value" of a function

In Algebra and Calculus, we note that: y = f(x)

    

Data and Variables

The type of variable dictates the methods that can be used to analyze the data.

Qualitative data are observations corresponding to a qualitative variable.

Quantitative data are observations corresponding to a quantitative variable.

Discrete data are observations corresponding to a discrete variable.

Continuous data are observations corresponding to a continuous variable.

Level of Measurement of a Variable

The level of measurement of a variable determines the types of descriptive and inferential statistics that may be applied to the variable.

It is an important factor in determining what tools may be used to describe the variable, and what means of analysis to use for inference about the variable.

Rather than classify a variable as qualitative or quantitative, we can assign a level of measurement to the variable.

The levels of measurement of a variable are: Nominal, Ordinal, Interval, and Ratio.

Nominal Level

A variable is at the nominal level of measurement if its values only name, label, categorize, or code the individuals. Order or ranking is not relevant. Examples include:
Race (African-American, European-American, etc.)
Nationality (Nigerian, American, etc.)
Religion (Christianity, Islam, Hinduism, etc.)
Marital Status (Single, Married, etc.)
and so on, and so forth

      

Ordinal Level

A variable is at the ordinal level of measurement if it has the properties of the nominal level of measurement and, in addition, the order or ranking of its values is relevant. Examples include:
Likert Scales (Strongly Agree, Agree, Neutral, Disagree, Strongly Disagree)
Grades (A, B, C, D, F, etc.)
Rankings (1st, 2nd, 3rd, 4th; five stars, four stars; etc.)
Levels (High, Medium, Low, etc.)
Thumbs up, Thumbs down
and so on, and so forth
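
A minimal sketch of what "order is relevant" means in practice, using the Likert scale from the examples. The list of responses is hypothetical, and the integer ranks are an illustrative encoding rather than anything prescribed here.

```python
# Ordinal scale from the Likert example, listed from lowest to highest rank.
LIKERT = ["Strongly Disagree", "Disagree", "Neutral", "Agree", "Strongly Agree"]
rank = {label: position for position, label in enumerate(LIKERT)}

# Hypothetical survey responses (not from the original document).
responses = ["Agree", "Neutral", "Strongly Agree", "Disagree", "Agree"]

# Ranking is meaningful, so responses can be sorted and compared.
ordered = sorted(responses, key=rank.get)

print(ordered)
print(rank["Agree"] > rank["Neutral"])  # True
```

The ranks carry order only. The gap between "Agree" and "Neutral" is not a measured quantity, so subtracting one rank from another has no meaning at this level.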

       

Interval Level

A variable is at the interval level of measurement if it has the properties of the ordinal level of measurement and, in addition, the differences between the values of the variable are measurable and meaningful. Arithmetic operations of addition and subtraction can be performed on these values.

There is no true zero starting point: a value of zero does not mean the absence of the quantity.

Because there is no true zero starting point, the ratios of data values are meaningless.

Examples include:
Calendar dates
Celsius and Fahrenheit temperatures
and so on, and so forth

       

Ratio Level

A variable is at the ratio level of measurement if it has the properties of the interval level of measurement and, in addition, there is a true zero starting point, so the ratios of data values are meaningful. Arithmetic operations of multiplication and division can be performed on these values. Examples include:
Weights of people
Kelvin temperatures
Time between the deposit and the clearance of a check
Volume of water used by a household in a day
and so on, and so forth
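
The difference between the interval and ratio levels can be seen by comparing Celsius and Kelvin temperatures. The two readings below (20 and 10 degrees Celsius) are hypothetical values chosen for illustration; the conversion K = C + 273.15 is the standard one.

```python
def celsius_to_kelvin(celsius: float) -> float:
    """Convert a Celsius temperature to Kelvin."""
    return celsius + 273.15

warm_c, cool_c = 20.0, 10.0  # hypothetical readings

# Differences agree on both scales, so subtraction is meaningful.
diff_c = warm_c - cool_c
diff_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cool_c)

# Ratios disagree: 20/10 is 2 in Celsius, but the Kelvin ratio is about 1.035.
ratio_c = warm_c / cool_c
ratio_k = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)

print(diff_c, round(diff_k, 2), ratio_c, round(ratio_k, 3))
```

Saying that 20 degrees Celsius is "twice as hot" as 10 degrees Celsius is therefore meaningless, because the Celsius zero is arbitrary. Only the Kelvin ratio, measured from a true zero, reflects a physical ratio.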

Example 7

 Identify the individuals, variables and their corresponding data, and the type of variable in the table below:

Participants   Weight (lb.)   Type        Price ($)
A              170            Athletic    20
B              250            Muscular    50
C              120            Athletic    15
D              100            Skinny      10
E              300            Obese       95

         

Solution

Individuals are the Participants A, B, C, D, and E.

Variables are Weight (lb.), Type, and Price ($).

Variables and their corresponding data are:
Weight (lb.) [170, 250, 120, 100, 300]
Type (Athletic, Muscular, Athletic, Skinny, Obese)
Price ($) [20, 50, 15, 10, 95]

Variables and the types of variables are:
Weight (lb.) is a continuous variable
Type is a qualitative variable
Price ($) is a discrete variable
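
The same solution can be laid out as a small Python sketch, with one record per individual. The helper names `data_for` and `kind` are illustrative. The numeric check is only a rough rule of thumb, since a numeric code such as a zip code is still qualitative, and the discrete or continuous label cannot be read from the values alone.

```python
# The table from Example 7, one record per individual.
participants = {
    "A": {"Weight (lb.)": 170, "Type": "Athletic", "Price ($)": 20},
    "B": {"Weight (lb.)": 250, "Type": "Muscular", "Price ($)": 50},
    "C": {"Weight (lb.)": 120, "Type": "Athletic", "Price ($)": 15},
    "D": {"Weight (lb.)": 100, "Type": "Skinny", "Price ($)": 10},
    "E": {"Weight (lb.)": 300, "Type": "Obese", "Price ($)": 95},
}

def data_for(variable):
    """Return the observed values (the data) for one variable."""
    return [record[variable] for record in participants.values()]

def kind(variable):
    """Rough check: quantitative if every observation is a number."""
    values = data_for(variable)
    is_numeric = all(isinstance(value, (int, float)) for value in values)
    return "quantitative" if is_numeric else "qualitative"

for variable in ["Weight (lb.)", "Type", "Price ($)"]:
    print(variable, data_for(variable), kind(variable))
```

Deciding that Weight is continuous and Price is discrete still takes judgment about how each value is obtained, by measuring or by counting.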

     

Example 8

A study looked at the impact of berry consumption in women. Of the 93,600 women aged 25 to 42 involved in the study, it found that three or more servings of berries per week may slash the risk of a heart attack by 33%. Assume the study was done with a margin of error of 5% and a 95% confidence level.

What is the research objective?
Identify the population.
Identify the sample.
List the descriptive statistics.
What can be inferred from the study?

    

Solution

The research objective is to determine the effect of berry consumption in reducing the risk of heart attack in women.

The population is all women aged 25 to 42.

The sample is the 93,600 women aged 25 to 42.

The descriptive statistic is: "it found that three or more servings of berries per week may slash the risk of a heart attack by 33%."

It can be inferred, with 95% confidence, that three or more servings of berries per week may slash the risk of a heart attack by between 28% and 38%.
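
The range of 28% to 38% comes from subtracting and adding the margin of error to the reported 33%. A short sketch of that arithmetic, treating the 5% margin of error as 5 percentage points (the reading the solution uses):

```python
point_estimate = 33   # percent reduction reported by the study
margin_of_error = 5   # percentage points, as assumed in Example 8

lower = point_estimate - margin_of_error
upper = point_estimate + margin_of_error

print(f"{lower}% to {upper}%")  # 28% to 38%
```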
