Statistics-MAT 150 Chapter 2 Descriptive Statistics


Statistics-MAT 150

Chapter 1 Introduction to Statistics

Prof. Felix Apfaltrer


Office:N518 Phone: x7421

Chapter 1

• Overview
• Nature of data
• Skills needed in statistics

Overview

Statistics:

• Descriptive

– Analyze the nature of data from surveys, experiments, and observations

• Inferential

– Draw conclusions from the analyses with respect to the population

Survey: tool to collect data from a smaller group which is part of a larger group to learn something about the larger group

Key goal of statistics:

• Learn about a large group (population) from data from a smaller subgroup (sample)

Overview

Definitions

• Data: observations collected (measurements, gender, answers, …)
• Statistics: collection of methods to analyze data
• Population: complete collection of elements (scores, measurements, subjects, …)
• Sample: subcollection of members selected from the population
• Census: collection of data from every member of the population

Overview 2

Example:

• Poll: 1087 adults are asked whether they drink alcoholic beverages or not.

– Sample: 1087 adults
– Population: US adults, 150 million.

• Census: Every 10 years, the Census Bureau tries to collect information from every member of the US population.

– Impossible!
– Very expensive!

Use sample data to draw conclusions about the whole population: inferential statistics!

Types of data

Parameter:

• A numerical measurement describing some characteristic of the population.

Lincoln elected: 39.82% of 1,865,908 votes counted.

39.82% is a parameter.

Statistic:

• A numerical measurement describing some characteristic of the sample.

Based on a sample of 877 elected executives, 45% would not hire an applicant with a typographical error in the application.

45% is a statistic.

Types of data 2

Quantitative data:

Numbers representing counts or measurements.

• Weights of supermodels.

Qualitative data:

Nonnumerical.

• Gender of an athlete.

Discrete vs. continuous data:

• Number of people in a household vs. temperatures in May.

Nominal level of measurement: names, labels, categories; no ordering.

• Yes/No/Undecided responses, colors.

Ordinal level of measurement: some order, but differences between values are meaningless or nonexistent.

• Course grades A, B, C, D, F.
• “Livability rank of a city”.

Interval level of measurement: order with meaningful differences, but no zero, or a zero that is meaningless as a starting point.

• Temperature, year.

Ratio level of measurement: as the interval level, but with a meaningful zero.

• Weights, prices (non-negative).

Basic skills

Samples:

• Representative:

“39/40 polled people vote for A” Sampled in A’s headquarters!

• Not too small:

CDF published “among HS students suspended, 67% suspended more than 3 times” Sample size: 3!

Graphs:

In which one does red do better?

[Figure: two bar charts, both titled “Median Weekly Income (16-24)”, comparing men and women. One vertical axis runs from $300 to $390, the other from $0 to $400.]

Percentage of:

• 6% of 1200 = 6 / 100 * 1200 = 72

Fraction >>> percentage:

• 3/4 = 0.75 >>> 0.75 * 100% = 75 %

Percentage >>> decimal:

• 27.3% = 27.3/100 = 0.273

Decimal >>> percentage:

• 0.852 >>> 0.852 * 100% = 85.2%
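
The four conversions above can be checked with a few one-line functions. This is only a sketch in Python; the function names are made up for illustration and are not part of the course material.

```python
def percent_of(percent, total):
    """Return `percent` percent of `total` (for example, 6% of 1200)."""
    return percent / 100 * total


def fraction_to_percent(numerator, denominator):
    """Convert a fraction such as 3/4 to a percentage."""
    return numerator / denominator * 100


def percent_to_decimal(percent):
    """Convert a percentage such as 27.3% to a decimal."""
    return percent / 100


def decimal_to_percent(decimal):
    """Convert a decimal such as 0.852 to a percentage."""
    return decimal * 100


# The examples from the slide; round() hides tiny floating-point noise.
print(round(percent_of(6, 1200), 6))
print(round(fraction_to_percent(3, 4), 6))
print(round(percent_to_decimal(27.3), 6))
print(round(decimal_to_percent(0.852), 6))
```

Note that "6% of 1200" is a count (72), not a percentage.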

Calculator:

Basic skills 2

Design

Observational study:

observe and measure characteristics without trying to modify subjects.

Gallup poll.

• Cross-sectional:

data observed and measured at one point in time.

• Retrospective:

data are collected from the past (records)

• Prospective:

data collected along the way from groups (smokers/nonsmokers)

Experiment:

apply treatment and observe and measure effects.

Clinical trial for Lipitor.

• Control:

blinding - placebo, double-blinding, blocks

• Replication:

ability to repeat experiment

• Randomization:

data needs to be collected in an appropriate (random) way, otherwise it is completely useless!

Random sample:

members of the population are selected so that each individual member has the same chance of being selected.

Simple random sample of size n:

every possible sample of size n has the same chance of being chosen.
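
As an illustration of a simple random sample, the sketch below uses Python's standard `random` module. The population (members labeled 1 to 100), the sample size, and the seed are all assumptions made for the example. `Random.sample` draws without replacement, so each subset of size n is equally likely to be chosen.

```python
import random

# Hypothetical population: 100 members labeled 1..100.
population = list(range(1, 101))
n = 10  # sample size, chosen only for illustration

# A fixed seed makes the sketch repeatable; drop it for a fresh draw each run.
rng = random.Random(42)

# Draw n distinct members; every subset of size n has the same chance.
sample = rng.sample(population, n)
print(sorted(sample))
```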

Design 2

Sampling:

• systematic:

select a starting point and choose every kth member.

• convenience:

use easy to get data

• stratified:

subdivide population into at least 2 subgroups with common characteristic and draw samples from each (e.g. gender or age)

• cluster:

divide population into areas and draw samples from clusters
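
The four sampling methods above can be compared side by side in a short Python sketch. Everything here is hypothetical: the 100-person population, the field names `group` and `area`, and the helper function names. The cluster version shown takes every member of a few randomly chosen areas, which is one common variant and an assumption about what the slide intends.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 100 people with an id, an age group, and an area.
population = [
    {"id": i, "group": "under 30" if i % 2 == 0 else "30 and over", "area": i // 10}
    for i in range(100)
]


def systematic_sample(items, k, rng):
    """Pick a random starting point, then take every kth member."""
    start = rng.randrange(k)
    return items[start::k]


def convenience_sample(items, n):
    """Take whatever is easiest to get: here, simply the first n records."""
    return items[:n]


def stratified_sample(items, key, n_per_stratum, rng):
    """Split into subgroups sharing a characteristic, then sample each one."""
    strata = {}
    for item in items:
        strata.setdefault(item[key], []).append(item)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample


def cluster_sample(items, key, n_clusters, rng):
    """Choose whole clusters at random and keep all of their members."""
    clusters = sorted({item[key] for item in items})
    chosen = set(rng.sample(clusters, n_clusters))
    return [item for item in items if item[key] in chosen]


print([p["id"] for p in systematic_sample(population, 10, rng)])
print([p["id"] for p in convenience_sample(population, 5)])
print([p["id"] for p in stratified_sample(population, "group", 5, rng)])
print([p["id"] for p in cluster_sample(population, "area", 2, rng)])
```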

Sampling error:

the difference between a sample result and the true population result; results from chance sample fluctuations

Nonsampling error:

occurs when data is incorrectly collected, measured, recorded or analyzed.
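
Sampling error can be made concrete with a small simulation. The Python sketch below assumes a made-up population of 10,000 yes/no answers in which exactly 40% are "yes", and it measures how far a sample proportion lands from that true value. The numbers and names are illustrative only.

```python
import random

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 answers, 1 = yes, 0 = no, with 40% yes.
population = [1] * 4000 + [0] * 6000
true_proportion = sum(population) / len(population)  # the parameter


def sampling_error(sample_size):
    """Sample proportion (a statistic) minus the true population proportion."""
    sample = rng.sample(population, sample_size)
    return sum(sample) / sample_size - true_proportion


# Larger samples tend to fluctuate less around the true value.
for size in (10, 100, 1000):
    print(size, round(sampling_error(size), 3))
```

Sampling the entire population (a census) gives a sampling error of zero, although nonsampling errors could still occur.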