STA 291-021 Summer 2007


Lecture 1
Dustin Lueker




Statistical terminology
Descriptive statistics
Probability and distribution functions
Inferential statistics
◦ Estimation (confidence intervals)
◦ Hypothesis testing

Simple linear regression and correlation
STA 291 Summer 2010 Lecture 1

Research in all fields is becoming more
quantitative
◦ Research journals
◦ Most graduates will need to be familiar with basic
statistical methodology and terminology

Newspapers, advertising, surveys, etc.
◦ Many statements contain statistical arguments

Computers make complex statistical methods
easier to use


Many times statistics are used in an incorrect
and misleading manner
Purposely misused
◦ Companies/people wanting to further their agenda
▪ Cooking the data: completely making up data
▪ Massaging the numbers: altering values to get a desired result

Accidentally misused
◦ Using inappropriate methods
▪ Vital to understand a method before using it


Statistics is a mathematical science pertaining to
the collection, analysis, interpretation or
explanation, and presentation of data
Applicable to a wide variety of academic
disciplines
◦ Physical sciences
◦ Social sciences
◦ Humanities

Statistics are used for making informed decisions
◦ Business
◦ Government

Design
◦ Planning research studies
◦ How to best obtain the required data
◦ Assuring that our data is representative of the entire population

Description
◦ Summarizing data
◦ Exploring patterns in the data
◦ Extract/condense information

Inference
◦ Make predictions based on the data
◦ ‘Infer’ from sample to population
◦ Summarize results

Population
◦ Total set of all subjects of interest
▪ Entire group of people, animals, products, etc. about
which we want information

Elementary Unit
◦ Any individual member of the population

Sample
◦ Subset of the population from which the study
actually collects information
◦ Used to draw conclusions about the whole
population

Variable
◦ A characteristic of a unit that can vary among
subjects in the population/sample
▪ Ex: gender, nationality, age, income, hair color,
height, disease status, state of residence, grade in STA 291

Parameter
◦ Numerical characteristic of the population
▪ Calculated using the whole population

Statistic
◦ Numerical characteristic of the sample
▪ Calculated using the sample
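
The parameter/statistic distinction can be sketched in a few lines of Python. This is only an illustration: the population of heights is simulated, and the seed, the population size, and the sample size of 200 are assumptions, not values from the lecture.

```python
import random
import statistics

# Hypothetical population: 10,000 simulated heights in inches (made-up data).
rng = random.Random(291)
population = [rng.gauss(67, 4) for _ in range(10_000)]

# Parameter: numerical characteristic calculated using the whole population.
population_mean = statistics.mean(population)

# Statistic: the same characteristic calculated using only a sample.
sample = rng.sample(population, 200)
sample_mean = statistics.mean(sample)

print(f"parameter (population mean): {population_mean:.2f}")
print(f"statistic (sample mean):     {sample_mean:.2f}")
```

The two numbers will usually be close but not equal, which is why a statistic is treated as an estimate of the parameter rather than a replacement for it.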

Why take a sample? Why not take a census?
Why not measure all of the units in the
population?
◦ Accuracy
▪ May not be able to find every unit in the population
◦ Time
▪ Speed of response from units
◦ Money
◦ Infinite Population
◦ Destructive Sampling or Testing

University Health Services at UK conducts a
survey about alcohol abuse among students
◦ 200 of the students are sampled and asked to
complete a questionnaire
◦ One question is “have you regretted something you
did while drinking?”
▪ What is the population?
▪ What is the sample?

Descriptive Statistics
◦ Summarizing the information in a collection of data

Inferential Statistics
◦ Using information from a sample to make
conclusions/predictions about the population
▪ Ex: using a sample statistic to estimate a population
parameter


The Current Population Survey of about 60,000
households in the United States in 2002
distinguishes three types of families: Married-couple (MC), Female householder and no
husband (FH), Male householder and no wife (MH)
It indicated that 5.3% of “MC”, 26.5% of “FH”, and
12.1% of “MH” families have annual income below
the poverty level
◦ Are these numbers statistics or parameters?

The report says that the percentage of all “FH”
families in the USA with income below the
poverty level is at least 25.5% but no greater than
27.5%
◦ Is this an example of descriptive or inferential statistics?
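
The range quoted for all “FH” families is centered on the survey figure. A minimal sketch of that arithmetic, assuming the range is the 26.5% figure plus or minus a margin (the 1.0-point margin is inferred from the two endpoints and is not stated in the source; the function name is hypothetical):

```python
def interval_from_margin(estimate_pct, margin_pct):
    """Return (lower, upper) bounds for estimate +/- margin, in percent."""
    return estimate_pct - margin_pct, estimate_pct + margin_pct

fh_survey_pct = 26.5  # figure reported by the survey for "FH" families
margin_pct = 1.0      # assumed margin, inferred from the 25.5%-27.5% range

lower, upper = interval_from_margin(fh_survey_pct, margin_pct)
print(f"at least {lower}% but no greater than {upper}%")
```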

Quantitative or Numerical
◦ Variables with numerical values associated with them

Qualitative or Categorical
◦ Variables without numerical values associated with
them

Ordinal
◦ Disease status, company rating, grade in STA 291
▪ Ordinal variables have a scale of ordered categories;
they are often treated in a quantitative manner (A =
4.0, B = 3.0, etc.)
▪ One unit can have more of a certain property than another
unit

Nominal
◦ Gender, nationality, hair color, state of residence
▪ Nominal variables have a scale of unordered categories
▪ It does not make sense to say, for example, that green
hair is greater/higher/better than orange hair
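
To illustrate how an ordinal variable can be used both ways, the sketch below compares grades by their order alone and then averages them as numbers. Only A = 4.0 and B = 3.0 come from the slide; the remaining letters, their point values, and the list of grades are assumptions for illustration.

```python
# Ordered categories for an ordinal variable, lowest to highest (assumed letters).
GRADE_ORDER = ["E", "D", "C", "B", "A"]

# Point values: A = 4.0 and B = 3.0 are from the slide, the rest are assumed.
GRADE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "E": 0.0}

grades = ["B", "A", "C", "A"]  # hypothetical grades

# Ordinal use: only the ordering of the categories matters.
highest = max(grades, key=GRADE_ORDER.index)

# Quantitative use: averaging assumes the gaps between grades are equal.
gpa = sum(GRADE_POINTS[g] for g in grades) / len(grades)

print(highest, gpa)
```

A nominal variable such as hair color has no counterpart to `GRADE_ORDER`, so neither the comparison nor the average would be meaningful for it.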

Quantitative
◦ Age, income, height
▪ Quantitative variables are measured numerically, that
is, for each subject a number is observed
▪ The scale for quantitative variables is called interval scale

A study about oral hygiene and periodontal
conditions among institutionalized elderly
measured the following
◦ Nominal (Qualitative): Requires assistance from staff?
▪ Yes
▪ No
◦ Ordinal (Qualitative): Plaque score
▪ No visible plaque
▪ Small amounts of plaque
▪ Moderate amounts of plaque
▪ Abundant plaque
◦ Interval (Quantitative): Number of teeth

A birth registry database collects the following information
on newborns
◦ Birth weight: in grams
◦ Infant’s Condition:
▪ Excellent
▪ Good
▪ Fair
▪ Poor
◦ Number of prenatal visits
◦ Ethnic background:
▪ African-American
▪ Caucasian
▪ Hispanic
▪ Native American
▪ Other

What are the appropriate scales? Quantitative (Interval)
Qualitative (Ordinal, Nominal)



Statistical methods vary for quantitative and
qualitative variables
Methods for quantitative data cannot be used to
analyze qualitative data
Quantitative variables can be treated in a less
quantitative manner
◦ Height: measured in cm/in
▪ Interval (Quantitative)
▪ Can be treated as Qualitative
▪ Ordinal: Short / Average / Tall
▪ Nominal: "<60in or >72in" / "60in-72in"
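
The height example can be sketched as two small functions that turn a measured height into categories. The nominal grouping uses the 60in and 72in limits from the slide; reusing those limits as the Short/Average/Tall cutoffs is an assumption, as are the function names and the sample heights.

```python
def height_ordinal(height_in):
    """Ordered categories; the 60in and 72in cutoffs are assumed here."""
    if height_in < 60:
        return "Short"
    if height_in <= 72:
        return "Average"
    return "Tall"

def height_nominal(height_in):
    """Unordered grouping from the slide: inside or outside 60in-72in."""
    return "60in-72in" if 60 <= height_in <= 72 else "<60in or >72in"

heights = [58.5, 64.0, 70.2, 74.0]  # hypothetical measurements in inches
for h in heights:
    print(h, height_ordinal(h), height_nominal(h))
```

The conversion only goes one way: once a height is recorded as "Average", the original measurement cannot be recovered.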

Try to measure variables in as much detail as
possible
◦ Quantitative

More detailed data can be analyzed in greater
depth
◦ Caution: Sometimes ordinal variables are treated as
quantitative (ex: GPA)

A variable is discrete if it can take on a finite
number of values
◦ Gender
◦ Nationality
◦ Hair color
◦ Disease status
◦ Grade in STA 291
◦ Favorite MLB team
▪ Qualitative variables are discrete

Continuous variables can take an infinite
continuum of possible real number values
◦ Time spent studying for STA 291 per day
▪ 43 minutes
▪ 2 minutes
▪ 27.487 minutes
▪ 27.48682 minutes
▪ Can always be subdivided into more precise values
▪ Therefore continuous



Discrete or continuous?
◦ Number of children in a family
◦ Distance a car travels on a tank of gas
◦ % grade on an exam


Quantitative variables can be discrete or
continuous
Age, income, height?
◦ Depends on the scale
▪ Age is potentially continuous, but usually measured in
years (discrete)
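
The age example can be sketched by computing the same person's age on both scales. The dates and function names are hypothetical, and dividing by 365.25 days per year is an approximation used only for illustration.

```python
from datetime import date

def age_continuous(birth, today):
    """Age as a real number of years (approximate: 365.25 days per year)."""
    return (today - birth).days / 365.25

def age_in_years(birth, today):
    """Age in completed years, the usual discrete measurement."""
    had_birthday = (today.month, today.day) >= (birth.month, birth.day)
    return today.year - birth.year - (0 if had_birthday else 1)

birth = date(1990, 3, 15)   # hypothetical birth date
today = date(2010, 6, 10)   # hypothetical measurement date

print(age_continuous(birth, today))  # a value between 20 and 21
print(age_in_years(birth, today))    # a whole number of years
```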