STA 291-021 Summer 2007 - University of Kentucky


Lecture 1 Dustin Lueker

• Statistical terminology
• Descriptive methods
• Probability and distribution functions
• Estimation (confidence intervals)
• Hypothesis testing
• Inferential methods for two samples
• Simple linear regression and correlation
STA 291 Fall 2009 Lecture 1

• Research in all fields is becoming more quantitative
  ◦ Look at research journals
  ◦ Most graduates will need to be familiar with basic statistical methodology and terminology
• Newspapers, advertising, surveys, etc.
  ◦ Many statements contain statistical arguments
• Computers make complex statistical methods easier to use

• Many times statistics are used in an incorrect and misleading manner
  ◦ Purposely misused
    • Companies/people wanting to further their agenda
    • Cooking the data
    • Completely making up data
    • Massaging the numbers
  ◦ Accidentally misused
    • Using inappropriate methods
    • Vital to understand a method before using it

• Statistics is a mathematical science pertaining to the collection, analysis, interpretation or explanation, and presentation of data
• Applicable to a wide variety of academic disciplines
  ◦ Physical sciences
  ◦ Social sciences
  ◦ Humanities
• Statistics are used for making informed decisions
  ◦ Business
  ◦ Government

Design
• Planning research studies
• How to best obtain the required data
• Ensuring that our data is representative of the entire population
Description
• Summarizing data
• Exploring patterns in the data
• Extracting/condensing information
Inference
• Making predictions based on the data
• ‘Inferring’ from sample to population
• Summarizing results

• Population
  ◦ Total set of all subjects of interest
    • Entire group of people, animals, products, etc. about which we want information
• Elementary Unit
  ◦ Any individual member of the population
• Sample
  ◦ Subset of the population from which the study actually collects information
  ◦ Used to draw conclusions about the whole population

• Variable
  ◦ A characteristic of a unit that can vary among subjects in the population/sample
    • Ex: gender, nationality, age, income, hair color, height, disease status, state of residence, grade in STA 291
• Parameter
  ◦ Numerical characteristic of the population
    • Calculated using the whole population
• Statistic
  ◦ Numerical characteristic of the sample
    • Calculated using the sample
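The difference between a parameter and a statistic can be sketched in a few lines of Python. Everything here is an assumption for illustration: the population of ages is simulated, and the sample size of 200 and the seed are arbitrary choices.

```python
import random
import statistics

# Hypothetical population: ages of 10,000 people, simulated for illustration only.
rng = random.Random(291)  # fixed seed so the run is repeatable
population = [rng.randint(18, 90) for _ in range(10_000)]

# Parameter: numerical characteristic calculated using the whole population.
population_mean = statistics.mean(population)

# Sample: a subset of the population, drawn at random without replacement.
sample = rng.sample(population, 200)

# Statistic: the same characteristic calculated using only the sample.
sample_mean = statistics.mean(sample)

print(f"parameter (population mean): {population_mean:.2f}")
print(f"statistic (sample mean):     {sample_mean:.2f}")
```

In practice the parameter is usually unknown, and the statistic is what we actually get to compute.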

 ◦ ◦ ◦ ◦ ◦ Why take a sample? Why not take a census? Why not measure all of the units in the population?

Accuracy  May not be able to find every unit in the population Time  Speed of response from units Money Infinite Population Destructive Sampling or Testing STA 291 Fall 2009 Lecture 1

 ◦ ◦ University Health Services at UK conducts a survey about alcohol abuse among students 200 of the students are sampled and asked to complete a questionnaire One question is “have you regretted something you did while drinking?”  What is the population? Sample?


• Descriptive Statistics
  ◦ Summarizing the information in a collection of data
• Inferential Statistics
  ◦ Using information from a sample to make conclusions/predictions about the population

• The Current Population Survey of about 60,000 households in the United States in 2002 distinguishes three types of families: Married couple (MC), Female householder and no husband (FH), Male householder and no wife (MH)
• It indicated that 5.3% of “MC”, 26.5% of “FH”, and 12.1% of “MH” families have annual income below the poverty level
  ◦ Are these numbers statistics or parameters?
• The report says that the percentage of all “FH” families in the USA with income below the poverty level is at least 25.5% but no greater than 27.5%
  ◦ Is this an example of descriptive or inferential statistics?
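One common way a range like this arises is a normal-approximation interval around a sample proportion. The sketch below is an assumption, not the method the report used: the source does not say how the 25.5% to 27.5% range was computed, and the sample size `n = 7500` is hypothetical, chosen only to make the arithmetic concrete. The function name `proportion_interval` is also made up for this example.

```python
import math

def proportion_interval(p_hat, n, z=1.96):
    """Normal-approximation interval for a population proportion.

    p_hat: sample proportion, n: sample size, z: critical value
    (1.96 corresponds to roughly 95% confidence).
    """
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z * standard_error
    return p_hat - margin, p_hat + margin

# 0.265 is the "FH" figure from the slide; n = 7500 is a hypothetical sample size.
low, high = proportion_interval(0.265, 7500)
print(f"{low:.3f} to {high:.3f}")
```

The sample proportion alone describes the sample; attaching a range to it and applying that range to all families is what makes the statement about the population.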


• Univariate data
  ◦ Consists of observations on a single attribute
• Multivariate data
  ◦ Consists of observations on several attributes
    • Special case: Bivariate data
      ◦ Consists of observations on two attributes

• Quantitative or Numerical
  ◦ Variables with numerical values associated with them
• Qualitative or Categorical
  ◦ Variables without numerical values associated with them

• Nominal
  ◦ Gender, nationality, hair color, state of residence
    • Nominal variables have a scale of unordered categories
    • It does not make sense to say, for example, that green hair is greater/higher/better than orange hair
• Ordinal
  ◦ Disease status, company rating, grade in STA 291
    • Ordinal variables have a scale of ordered categories; they are often treated in a quantitative manner (A = 4.0, B = 3.0, etc.)
    • One unit can have more of a certain property than does another unit

 ◦ Quantitative Age, income, height  Quantitative variables are measured numerically, that is, for each subject a number is observed  The scale for quantitative variables is called interval scale STA 291 Fall 2009 Lecture 1

 ◦ ◦ ◦ A study about oral hygiene and periodontal conditions among institutionalized elderly measured the following  Nominal (Qualitative): Requires assistance from staff?

Yes     No Ordinal (Qualitative): Plaque score No visible plaque Small amounts of plaque Moderate amounts of plaque  Abundant plaque Interval (Quantitative): Number of teeth STA 291 Fall 2009 Lecture 1

• A birth registry database collects the following information on newborns
  ◦ Birthweight: in grams
  ◦ Infant’s Condition:
    • Excellent
    • Good
    • Fair
    • Poor
  ◦ Number of prenatal visits
  ◦ Ethnic background:
    • African-American
    • Caucasian
    • Hispanic
    • Native American
    • Other
• What are the appropriate scales?
  ◦ Quantitative (Interval)
  ◦ Qualitative (Ordinal, Nominal)

• Statistical methods vary for quantitative and qualitative variables
• Methods for quantitative data cannot be used to analyze qualitative data
• Quantitative variables can be treated in a less quantitative manner
  ◦ Height: measured in cm/in
    • Interval (Quantitative)
    • Can be treated as Qualitative
      ◦ Ordinal: Short, Average, Tall
      ◦ Nominal: <60in or >72in, 60in-72in
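Treating height in a less quantitative manner amounts to binning it. The sketch below uses hypothetical function names, and it assumes the Short/Average/Tall cutoffs are the same 60in and 72in used for the nominal grouping; the slide does not state the ordinal cutoffs.

```python
def height_ordinal(height_in):
    """Map a height in inches to an ordered category (assumed cutoffs: 60 and 72)."""
    if height_in < 60:
        return "Short"
    if height_in <= 72:
        return "Average"
    return "Tall"

def height_nominal(height_in):
    """Map a height in inches to one of two unordered groups from the slide."""
    if 60 <= height_in <= 72:
        return "60in-72in"
    return "<60in or >72in"

heights = [58.0, 66.5, 74.0]  # hypothetical measurements in inches
print([height_ordinal(h) for h in heights])
print([height_nominal(h) for h in heights])
```

Note that the conversion only goes one way: the original measurement cannot be recovered from the category.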

• Try to measure variables in as much detail as possible
  ◦ Quantitative
• More detailed data can be analyzed in further depth
  ◦ Caution: Sometimes ordinal variables are treated as quantitative (ex: GPA)
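GPA is the standard example of an ordinal variable treated as quantitative: letter grades are mapped to points and then averaged. In this sketch, A = 4.0 and B = 3.0 follow the earlier slide, the remaining point values are assumptions, the grades are made up, and the average is unweighted (credit hours are ignored).

```python
import statistics

# A and B follow the slide; C, D, and E point values are assumed.
GRADE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "E": 0.0}

grades = ["A", "B", "B", "C"]  # hypothetical letter grades

# Averaging treats the gap between A and B as equal to the gap between B and C,
# which the ordinal scale itself does not guarantee.
gpa = statistics.mean([GRADE_POINTS[g] for g in grades])
print(gpa)
```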

 ◦ ◦ ◦ ◦ ◦ ◦ A variable is discrete if it can take on a finite number of values Gender Nationality Hair color Disease status Grade in STA 291 Favorite MLB team  Qualitative variables are discrete STA 291 Fall 2009 Lecture 1

 ◦ Continuous variables can take an

continuum infinite

of possible real number values Time spent studying for STA 291 per day     43 minutes 2 minutes 27.487 minutes 27.48682 minutes  Can be subdivided into more accurate values  Therefore continuous STA 291 Fall 2009 Lecture 1

• Number of children in a family
• Distance a car travels on a tank of gas
• % grade on an exam

• Quantitative variables can be discrete or continuous
• Age, income, height?
  ◦ Depends on the scale
    • Age is potentially continuous, but usually measured in years (discrete)
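The age example can be made concrete by computing the same age on two scales. This is a sketch with hypothetical dates and function names; the continuous version divides by 365.25 days per year, which is an approximation.

```python
from datetime import date

def age_continuous(birth, today):
    """Age as a real number of years (approximate: 365.25 days per year)."""
    return (today - birth).days / 365.25

def age_in_years(birth, today):
    """Age in completed years, the discrete value people usually report."""
    years = today.year - birth.year
    # Subtract one if this year's birthday has not happened yet.
    if (today.month, today.day) < (birth.month, birth.day):
        years -= 1
    return years

birth = date(1990, 3, 15)   # hypothetical birth date
today = date(2009, 8, 26)   # hypothetical measurement date

print(age_continuous(birth, today))  # a fractional number of years
print(age_in_years(birth, today))    # a whole number of years
```

The underlying quantity is the same; only the measurement scale changes whether we treat it as continuous or discrete.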