STA 291-021 Summer 2007
Lecture 1
Dustin Lueker
Statistical terminology
Descriptive statistics
Probability and distribution functions
Inferential statistics
◦ Estimation (confidence intervals)
◦ Hypothesis testing
Simple linear regression and correlation
STA 291 Summer 2010 Lecture 1
Research in all fields is becoming more
quantitative
◦ Research journals
◦ Most graduates will need to be familiar with basic
statistical methodology and terminology
Newspapers, advertising, surveys, etc.
◦ Many statements contain statistical arguments
Computers make complex statistical methods
easier to use
Many times statistics are used in an incorrect
and misleading manner
Purposely misused
◦ Companies/people wanting to further their agenda
Cooking the data
Completely making up data
Massaging the numbers
Altering values to get desired result
Accidentally misused
◦ Using inappropriate methods
Vital to understand a method before using it
Statistics is a mathematical science pertaining to
the collection, analysis, interpretation or
explanation, and presentation of data
Applicable to a wide variety of academic
disciplines
◦ Physical sciences
◦ Social sciences
◦ Humanities
Statistics are used for making informed decisions
◦ Business
◦ Government
Design
◦ Planning research studies
◦ How to best obtain the required data
◦ Assuring that our data is representative of the entire population
Description
◦ Summarizing data
◦ Exploring patterns in the data
◦ Extract/condense information
Inference
◦ Make predictions based on the data
◦ ‘Infer’ from sample to population
◦ Summarize results
Population
◦ Total set of all subjects of interest
Entire group of people, animals, products, etc. about
which we want information
Elementary Unit
◦ Any individual member of the population
Sample
◦ Subset of the population from which the study
actually collects information
◦ Used to draw conclusions about the whole
population
Variable
◦ A characteristic of a unit that can vary among
subjects in the population/sample
Ex: gender, nationality, age, income, hair color,
height, disease status, state of residence, grade in STA
291
Parameter
◦ Numerical characteristic of the population
Calculated using the whole population
Statistic
◦ Numerical characteristic of the sample
Calculated using the sample
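The parameter/statistic distinction can be sketched in a few lines of Python. The population below is simulated purely for illustration (the heights, the population size of 1,000, and the sample size of 50 are all assumptions, not course data); the point is only that the parameter uses every unit while the statistic uses the sampled units.

```python
import random
import statistics

# HYPOTHETICAL population: heights (inches) of 1,000 subjects, simulated
# for illustration only. These are not real data.
rng = random.Random(291)
population = [rng.gauss(67, 4) for _ in range(1000)]

# Parameter: numerical characteristic calculated using the whole population.
population_mean = statistics.fmean(population)

# Statistic: numerical characteristic calculated using only the sample.
sample = rng.sample(population, 50)
sample_mean = statistics.fmean(sample)

print(f"parameter (population mean): {population_mean:.2f}")
print(f"statistic (sample mean):     {sample_mean:.2f}")
```

In practice the population mean is usually unknown, which is why the sample mean is used to estimate it.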
Why take a sample? Why not take a census?
Why not measure all of the units in the
population?
◦ Accuracy
May not be able to find every unit in the population
◦ Time
Speed of response from units
◦ Money
◦ Infinite Population
◦ Destructive Sampling or Testing
University Health Services at UK conducts a
survey about alcohol abuse among students
◦ 200 of the students are sampled and asked to
complete a questionnaire
◦ One question is “have you regretted something you
did while drinking?”
What is the population?
What is the sample?
Descriptive Statistics
◦ Summarizing the information in a collection of data
Inferential Statistics
◦ Using information from a sample to make
conclusions/predictions about the population
Ex: using a sample statistic to estimate a population
parameter
The Current Population Survey of about 60,000
households in the United States in 2002
distinguishes three types of families:
Married-couple (MC), Female householder and no
husband (FH), Male householder and no wife (MH)
It indicated that 5.3% of “MC”, 26.5% of “FH”, and
12.1% of “MH” families have annual income below
the poverty level
◦ Are these numbers statistics or parameters?
The report says that the percentage of all “FH”
families in the USA with income below the
poverty level is at least 25.5% but no greater than
27.5%
◦ Is this an example of descriptive or inferential statistics?
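One common way to turn a sample percentage into a range for the population percentage is a normal-approximation interval for a proportion. The sketch below is only an illustration of that idea: the lecture does not say how the report computed its range, what confidence level it used, or how many “FH” families were in the sample, so `n_fh` and `z = 1.96` are assumptions.

```python
import math


def proportion_interval(p_hat, n, z=1.96):
    """Normal-approximation interval for a population proportion.

    p_hat: sample proportion, n: sample size, z: normal multiplier
    (1.96 corresponds to roughly 95% confidence).
    """
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


p_hat = 0.265  # sample statistic from the slide: 26.5% of sampled "FH" families
n_fh = 7500    # HYPOTHETICAL number of "FH" families; not given in the lecture

low, high = proportion_interval(p_hat, n_fh)
print(f"estimated range for all FH families: {low:.3f} to {high:.3f}")
```

The sample percentage alone is a description of the sample; the range around it is a statement about all “FH” families in the population.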
Quantitative or Numerical
◦ Variables with numerical values associated with them
Qualitative or Categorical
◦ Variables without numerical values associated with
them
Ordinal
◦ Disease status, company rating, grade in STA 291
Ordinal variables have a scale of ordered categories;
they are often treated in a quantitative manner (A =
4.0, B = 3.0, etc.)
One unit can have more of a certain property than another
unit
Nominal
◦ Gender, nationality, hair color, state of residence
Nominal variables have a scale of unordered categories
It does not make sense to say, for example, that green
hair is greater/higher/better than orange hair
Quantitative
◦ Age, income, height
Quantitative variables are measured numerically, that
is, for each subject a number is observed
The scale for quantitative variables is called an interval scale
A study about oral hygiene and periodontal
conditions among institutionalized elderly
measured the following
◦ Nominal (Qualitative): Requires assistance from staff?
Yes
No
◦ Ordinal (Qualitative): Plaque score
No visible plaque
Small amounts of plaque
Moderate amounts of plaque
Abundant plaque
◦ Interval (Quantitative): Number of teeth
A birth registry database collects the following information
on newborns
◦ Birth weight: in grams
◦ Infant’s Condition:
Excellent
Good
Fair
Poor
◦ Number of prenatal visits
◦ Ethnic background:
African-American
Caucasian
Hispanic
Native American
Other
What are the appropriate scales?
◦ Quantitative (Interval)
◦ Qualitative (Ordinal, Nominal)
Statistical methods vary for quantitative and
qualitative variables
Methods for quantitative data cannot be used to
analyze qualitative data
Quantitative variables can be treated in a less
quantitative manner
◦ Height: measured in cm/in
Interval (Quantitative)
Can be treated as Qualitative
Ordinal:
Short
Average
Tall
Nominal:
<60in or >72in
60in-72in
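The height example can be sketched in Python as follows. The slide gives cut points only for the nominal grouping (60in and 72in); reusing the same cut points for Short/Average/Tall is an assumption made here for illustration, and the function names and sample heights are hypothetical.

```python
def height_as_ordinal(height_in):
    """Collapse a height in inches into ordered categories.

    ASSUMPTION: the 60in and 72in cut points are borrowed from the
    nominal example; the slide does not define Short/Average/Tall.
    """
    if height_in < 60:
        return "Short"
    if height_in <= 72:
        return "Average"
    return "Tall"


def height_as_nominal(height_in):
    """Collapse a height into the two unordered groups from the slide."""
    if 60 <= height_in <= 72:
        return "60in-72in"
    return "<60in or >72in"


heights = [58.5, 64.0, 70.2, 74.1]  # HYPOTHETICAL measurements in inches
for h in heights:
    print(h, height_as_ordinal(h), height_as_nominal(h))
```

Note that the conversion only goes one way: the categories cannot be turned back into the original measurements, so detail is lost.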
Try to measure variables in as much detail as
possible
◦ Quantitative
More detailed data can be analyzed in further
depth
◦ Caution: Sometimes ordinal variables are treated as
quantitative (ex: GPA)
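The GPA caution can be illustrated with a short Python sketch. The A = 4.0 and B = 3.0 values come from the earlier slide; the values for C, D, and F, and the list of grades, are assumptions added here for illustration.

```python
import statistics

# A and B are from the lecture; C, D, and F are ASSUMED to continue
# the same one-point spacing.
GRADE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}

grades = ["A", "B", "B", "C", "A"]  # HYPOTHETICAL grades for one student

# Treating the ordinal grades as quantitative: average the point values.
# This assumes the gap between A and B equals the gap between B and C.
gpa = statistics.fmean(GRADE_POINTS[g] for g in grades)

# A summary that relies only on the ordering: the middle grade.
ordered = sorted(grades, key=GRADE_POINTS.get)
median_grade = ordered[len(ordered) // 2]

print(f"GPA (quantitative treatment): {gpa:.2f}")
print(f"Median grade (ordinal treatment): {median_grade}")
```

The median uses only the order of the categories, while the GPA also depends on the numbers assigned to them.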
A variable is discrete if it can take on a finite
number of values
◦ Gender
◦ Nationality
◦ Hair color
◦ Disease status
◦ Grade in STA 291
◦ Favorite MLB team
Qualitative variables are discrete
Continuous variables can take an infinite
continuum of possible real number values
◦ Time spent studying for STA 291 per day
43 minutes
2 minutes
27.487 minutes
27.48682 minutes
Can be subdivided into more accurate values
Therefore continuous
Discrete or continuous?
◦ Number of children in a family
◦ Distance a car travels on a tank of gas
◦ % grade on an exam
Quantitative variables can be discrete or
continuous
Age, income, height?
◦ Depends on the scale
Age is potentially continuous, but usually measured in
years (discrete)
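A small Python sketch of the age example: many different exact ages are recorded as the same whole number of years. The exact ages below are hypothetical values chosen for illustration.

```python
import math

# HYPOTHETICAL exact ages in years (potentially continuous)
exact_ages = [19.02, 19.5, 19.9999, 20.0, 20.3847]

# Age as usually measured: completed whole years (discrete)
recorded_ages = [math.floor(a) for a in exact_ages]

print(recorded_ages)
print(sorted(set(recorded_ages)))  # distinct recorded values
```

Five distinct exact ages collapse into two recorded values, which is what measuring on a coarser scale does to a continuous variable.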