Statistics-MAT 150
Chapter 1 Introduction to Statistics
Prof. Felix Apfaltrer
Office: N518 Phone: x7421
Chapter 1
• Overview
• Nature of data
• Skills needed in statistics
Overview
Statistics:
• Descriptive
– Analyze the nature of data from surveys, experiments, observations, …
• Inferential
– Draw conclusions from the analyses with respect to the population
Survey: tool to collect data from a smaller group which is part of a larger group to learn something about the larger group
Key goal of statistics:
• Learn about a large group (population) from data from a smaller subgroup (sample)
Overview
Definitions:
• Data: observations collected (measurements, gender, answers, …)
• Statistics: collection of methods to analyze data
• Population: complete collection of elements (scores, measurements, subjects, …)
• Sample: subcollection of members selected from a population
• Census: collection of data from every member of the population
Overview 2
Example:
• Poll: 1087 adults are asked whether they drink alcoholic beverages or not.
– Sample: 1087 adults
– Population: US adults (150 million)
• Census: Every 10 years, the Census Bureau tries to collect information from every member of the US population.
– Impossible!
– Very expensive!
Use sample data to draw conclusions about the whole population: inferential statistics!
Types of data
• Parameter: a numerical measurement describing some characteristic of the population.
Lincoln elected: 39.82% of 1,865,908 votes counted.
– 39.82% is a parameter.
• Statistic: a numerical measurement describing some characteristic of the sample.
Based on a sample of 877 elected executives, 45% would not hire an applicant with a typographical error in the application.
– 45% is a statistic.
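The difference between a parameter and a statistic can be sketched in a few lines of Python. The population below is made up for illustration (its 40% "yes" share and the sample size of 500 are assumptions, not figures from the slides):

```python
import random

# Hypothetical population: 1 = "yes", 0 = "no". The 40% share is made up.
population = [1] * 4000 + [0] * 6000

# Parameter: computed from every member of the population.
parameter = sum(population) / len(population)

# Statistic: computed from a sample only.
rng = random.Random(0)  # fixed seed so the run is repeatable
sample = rng.sample(population, 500)
statistic = sum(sample) / len(sample)

print(f"parameter = {parameter:.2%}")
print(f"statistic = {statistic:.2%}")
```

The parameter is fixed, while the statistic will generally change a little each time a new sample is drawn.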
Types of data 2
Quantitative data: numbers representing counts or measurements.
• Weights of supermodels.
Qualitative data: nonnumerical.
• Gender of an athlete.
Discrete vs. continuous data:
• # of people in a household vs. temperatures in May.
Nominal level of measurement: names, labels, categories; no ordering.
• Yes/No/Undecided responses, colors.
Ordinal level of measurement: some order, but numerical values meaningless or nonexistent.
• Course grades A, B, C, D, F.
• “Livability rank of a city”.
Interval level of measurement: ordered, with meaningful differences between values, but no natural zero (the zero is absent or meaningless).
• Temperature, year.
Ratio level of measurement: as interval, but with a meaningful zero.
• Weights, prices (non-negative).
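One way to see why the zero matters: at the interval level, ratios change when the unit changes, while at the ratio level they do not. A small sketch, with made-up temperatures and weights (the helper names are illustrative):

```python
def celsius_to_fahrenheit(c):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32

def kg_to_lb(kg):
    """Convert a weight from kilograms to pounds (approximate factor)."""
    return kg * 2.20462

# Interval level: 20 °C looks like "twice" 10 °C, but not in Fahrenheit.
t1, t2 = 10.0, 20.0
ratio_celsius = t2 / t1
ratio_fahrenheit = celsius_to_fahrenheit(t2) / celsius_to_fahrenheit(t1)

# Ratio level: 100 kg is twice 50 kg in any unit, because 0 means "no weight".
w1, w2 = 50.0, 100.0
ratio_kg = w2 / w1
ratio_lb = kg_to_lb(w2) / kg_to_lb(w1)

print(f"temperature ratio: {ratio_celsius:.2f} (°C) vs {ratio_fahrenheit:.2f} (°F)")
print(f"weight ratio:      {ratio_kg:.2f} (kg) vs {ratio_lb:.2f} (lb)")
```

The temperature ratio depends on the scale chosen, so "twice as hot" is not a meaningful statement; the weight ratio is the same in both units.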
Basic skills
Samples:
• Representative:
“39/40 polled people vote for A”. Sampled in A’s headquarters!
• Not too small:
CDF published “among HS students suspended, 67% suspended more than 3 times” Sample size: 3!
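To see why a sample of size 3 is too small, it helps to list every percentage such a sample can produce. A minimal sketch (the variable names are illustrative):

```python
n = 3  # sample size from the example above

# k of the n sampled students can have the characteristic, k = 0, 1, ..., n.
possible_percentages = [round(100 * k / n) for k in range(n + 1)]

print(possible_percentages)  # [0, 33, 67, 100]
```

With only four possible outcomes, "67%" simply means 2 students out of 3.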
Graphs:
In which one does red do better?
[Figure: two bar charts of Median Weekly Income (16-24), Men vs. Women. One vertical axis runs from $300 to $390, the other from $0 to $400.]
Percentage of:
• 6% of 1200 = 6 / 100 * 1200 = 72
Fraction >>> percentage:
• 3/4 = 0.75 >>> 0.75 * 100% = 75 %
Percentage >>> decimal:
• 27.3% = 27.3/100 = 0.273
Decimal >>> percentage:
• 0.852 >>> 0.852 * 100% = 85.2%
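The four conversions above can be checked with a few small Python helpers. The function names are illustrative, not from the slides:

```python
def percent_of(percent, total):
    """Return `percent` % of `total`."""
    return percent / 100 * total

def fraction_to_percent(numerator, denominator):
    """Convert a fraction to a percentage."""
    return numerator / denominator * 100

def percent_to_decimal(percent):
    """Convert a percentage to a decimal."""
    return percent / 100

def decimal_to_percent(decimal):
    """Convert a decimal to a percentage."""
    return decimal * 100

# The ":g" format trims floating-point noise for display.
print(f"{percent_of(6, 1200):g}")            # 72
print(f"{fraction_to_percent(3, 4):g}%")     # 75%
print(f"{percent_to_decimal(27.3):g}")       # 0.273
print(f"{decimal_to_percent(0.852):g}%")     # 85.2%
```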
Calculator:
Basic skills 2
Design
• Observational study: observe and measure characteristics without trying to modify subjects.
Gallup poll.
– Cross-sectional: data observed, measured at one point in time.
– Retrospective: data are collected from the past (records)
– Prospective: data collected along the way from groups (smokers/nonsmokers)
• Experiment: apply treatment and observe and measure effects.
Clinical trial for Lipitor.
– Control: blinding - placebo, double-blinding, blocks
– Replication: ability to repeat experiment
– Randomization: data need to be collected in an appropriate (random) way, otherwise they are completely useless!
– Random sample: members of the population are selected so that each individual member has the same chance of being selected.
– Simple random sample of size n: every possible sample of size n has the same chance of being chosen.
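A simple random sample can be drawn with Python's standard `random.sample`, which selects without replacement so that every subset of the requested size is equally likely. The population of 100 member IDs and the sample size of 10 below are assumptions for illustration:

```python
import random

# Hypothetical population: member IDs 1 to 100.
population = list(range(1, 101))

rng = random.Random(42)  # fixed seed so the run is repeatable
n = 10

# Draw n distinct members; every set of n members has the same chance.
simple_random_sample = rng.sample(population, n)

print(sorted(simple_random_sample))
```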
Design 2
Sampling:
• systematic:
select a starting point and then choose every kth member.
• convenience:
use easy-to-get data
• stratified:
subdivide population into at least 2 subgroups with common characteristic and draw samples from each (e.g. gender or age)
• cluster:
divide population into areas (clusters) and draw samples from clusters
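The systematic, stratified and cluster methods can be sketched as follows. The population, its `group` and `area` fields, and the function names are all hypothetical. Cluster sampling is taken here to mean picking whole areas at random and keeping every member of the chosen areas, which is one common reading of the definition above.

```python
import random

# Hypothetical population of 60 people: each has an id, a group label
# (the stratum) and an area number (the cluster). All values are made up.
population = [
    {"id": i, "group": "A" if i % 2 == 0 else "B", "area": i // 10}
    for i in range(60)
]

def systematic_sample(items, k, start=0):
    """Take the member at position `start`, then every k-th member after it."""
    return items[start::k]

def stratified_sample(items, key, per_stratum, rng):
    """Split items into strata by `key`, then draw randomly from each stratum."""
    strata = {}
    for item in items:
        strata.setdefault(item[key], []).append(item)
    chosen = []
    for members in strata.values():
        chosen.extend(rng.sample(members, per_stratum))
    return chosen

def cluster_sample(items, key, n_clusters, rng):
    """Pick whole clusters at random and keep every member of those clusters."""
    cluster_ids = sorted({item[key] for item in items})
    picked = set(rng.sample(cluster_ids, n_clusters))
    return [item for item in items if item[key] in picked]

rng = random.Random(1)  # fixed seed so the run is repeatable
print(len(systematic_sample(population, k=6, start=2)))   # 10
print(len(stratified_sample(population, "group", 5, rng)))  # 10
print(len(cluster_sample(population, "area", 2, rng)))    # 20
```

Convenience sampling is left out on purpose: it involves no selection rule beyond taking whatever data are at hand.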
Sampling error:
the difference between a sample result and the true population result; results from chance sample fluctuations
Nonsampling error:
occurs when data are incorrectly collected, measured, recorded or analyzed.
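Sampling error can be made visible with a small simulation: draw several random samples from the same population and compare each sample mean with the population mean. The population of household sizes below is invented for illustration, as are the sample size and the number of repetitions:

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the run is repeatable

# Hypothetical population: household sizes (1 to 6 people) for 10,000 households.
population = [rng.randint(1, 6) for _ in range(10_000)]
population_mean = statistics.mean(population)

# Draw five random samples of 50 households and record each sampling error.
sampling_errors = []
for _ in range(5):
    sample = rng.sample(population, 50)
    sample_mean = statistics.mean(sample)
    sampling_errors.append(sample_mean - population_mean)

for error in sampling_errors:
    print(f"sampling error: {error:+.3f}")
```

Every sample here is collected and measured correctly, so the differences come from chance alone. A nonsampling error would correspond to a mistake in the data themselves, for example a mistyped household size.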