Introduction

Introduction to Statistics
Problems in Statistics
A company took the blood pressure of 1000 people of
various ages to see if blood pressure increases with age.
The weather forecast predictions were compared with
the actual weather to see how accurate weather
predictions are.
A pollster interviews a certain number of voters to predict
who will win an upcoming election.
A city planning employee records the number of cars
that pass through an intersection every hour to
determine if a light should be placed there.
What is Statistics?
Statistics is the science of collecting,
simplifying, and describing data, as well as
making inferences (drawing conclusions)
based on the analysis of data.
Data values or observations are the raw
materials of statistics. They are numbers
in context, e.g., the number of those polled
ages 30-49 with blood pressure 91, or the
number of cars passing through the
intersection at 3:00 pm.
1.1 Displaying
Distributions with
Graphs
Viewing Data
For all intents and purposes,
data is meaningless if it cannot be
interpreted.
We present several ways to “see” the
data.
Depending on the data, some ways of
displaying the data are more beneficial
than others.
Consider the following
“data”
No context, no units: the data is
meaningless.
1280
1812
3509
602
934
3596
1550
207
2601
324
1642
1817
Industry description in Montgomery County, PA
http://factfinder.census.gov/servlet/GQRTable?_bm=y&-geo_id=05000US42091&ds_name=EC0200A1&-_lang=en

Industry description                                               Number of establishments
Manufacturing                                                      1280
Wholesale trade                                                    1812
Retail trade                                                       3509
Information                                                        602
Real estate & rental & leasing                                     934
Professional, scientific, & technical services                     3596
Administrative & support & waste management & remediation service  1550
Educational services                                               207
Health care & social assistance                                    2601
Arts, entertainment, & recreation                                  324
Accommodation & food services                                      1642
Other services (except public administration)                      1817

This gives a context to the data, but it might not
give any kind of insight.
Things to look for
Shape (symmetric, skewed to the right or left)
Center
Spread
Outliers
Not all of these will be applicable to all
graphical displays.
Bar Graph
[Figure: bar graph titled "Number of Establishments", one bar per industry; vertical axis from 0 to 4000 in steps of 500]
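The bar graph can be approximated in plain text. The following is a minimal sketch, not the tool used to draw the slide: the counts are transcribed from the Montgomery County table, while the scale of one mark per 100 establishments and the names `establishments`, `bar`, and `ranked` are assumptions made for illustration.

```python
# Establishment counts transcribed from the Montgomery County, PA table.
establishments = {
    "Manufacturing": 1280,
    "Wholesale trade": 1812,
    "Retail trade": 3509,
    "Information": 602,
    "Real estate & rental & leasing": 934,
    "Professional, scientific, & technical services": 3596,
    "Administrative & support & waste management & remediation service": 1550,
    "Educational services": 207,
    "Health care & social assistance": 2601,
    "Arts, entertainment, & recreation": 324,
    "Accommodation & food services": 1642,
    "Other services (except public administration)": 1817,
}

UNITS_PER_MARK = 100  # assumed scale: one '#' per 100 establishments


def bar(count, units_per_mark=UNITS_PER_MARK):
    """Return a row of '#' marks whose length is proportional to count."""
    return "#" * round(count / units_per_mark)


# Largest categories first, so the ranking is visible at a glance.
ranked = sorted(establishments.items(), key=lambda item: item[1], reverse=True)
for industry, count in ranked:
    print(f"{count:>5}  {bar(count)}  {industry}")
```

Sorting the bars is a presentation choice; for categorical data the order of the categories carries no meaning, so any order is valid.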
Pie Chart
[Figure: pie chart titled "Number of Establishments", one slice per industry, from Manufacturing through Other services (except public administration)]
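A pie chart shows each category as a share of the whole, so the size of each slice is its count divided by the total. A short sketch of that arithmetic, using the counts from the Montgomery County table (the variable names are illustrative assumptions):

```python
# Establishment counts transcribed from the Montgomery County, PA table.
establishments = {
    "Manufacturing": 1280,
    "Wholesale trade": 1812,
    "Retail trade": 3509,
    "Information": 602,
    "Real estate & rental & leasing": 934,
    "Professional, scientific, & technical services": 3596,
    "Administrative & support & waste management & remediation service": 1550,
    "Educational services": 207,
    "Health care & social assistance": 2601,
    "Arts, entertainment, & recreation": 324,
    "Accommodation & food services": 1642,
    "Other services (except public administration)": 1817,
}

total = sum(establishments.values())

# Each slice is the category's percentage of all establishments.
shares = {
    industry: 100 * count / total
    for industry, count in establishments.items()
}

for industry, pct in shares.items():
    print(f"{pct:5.1f}%  {industry}")
```

The slice angle in degrees would be the same fraction multiplied by 360 instead of 100.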
Stem and Leaf Plots
With the bar graph and pie chart, we were interested
in both the value and the identity of the object
which gave that value.
This information may sometimes be either
superfluous or confidential.
Consider the midterm grades of a class I taught
years ago.
81, 89, 82, 82, 79, 85, 76, 54, 75, 75, 78, 71, 83,
88, 52, 86, 89, 89, 84, 79, 80, 85.
Stem and Leaf

Stem | Leaf
5    | 24
6    |
7    | 1556899
8    | 0122345568999
9    |

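The plot can be built mechanically: split each grade into a tens digit (the stem) and a units digit (the leaf), then list the leaves in order beside each stem. A minimal sketch for two-digit integer data follows; the function name `stem_and_leaf` is an assumption for illustration.

```python
from collections import defaultdict

# Midterm grades from the slide.
grades = [81, 89, 82, 82, 79, 85, 76, 54, 75, 75, 78, 71, 83,
          88, 52, 86, 89, 89, 84, 79, 80, 85]


def stem_and_leaf(values):
    """Map each stem (tens digit) to its leaves (units digits) in sorted order."""
    leaves = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        leaves[stem].append(str(leaf))
    # Keep empty stems between the smallest and largest so gaps stay visible.
    low, high = min(leaves), max(leaves)
    return {stem: "".join(leaves.get(stem, [])) for stem in range(low, high + 1)}


plot = stem_and_leaf(grades)
for stem, leaf in plot.items():
    print(f"{stem} | {leaf}")
```

This sketch stops at the largest stem present in the data, so it does not print the empty 9 row shown on the slide.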
This data is skewed to the left (the tail
extends toward the low scores) and clusters
in the 70-89 range.
Should 52 and 54 both
be considered outliers?
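One way to approach this question is the 1.5 x IQR rule, a common convention that the slides themselves do not state: a value is flagged if it lies more than 1.5 interquartile ranges below the first quartile or above the third. The sketch below applies that assumed rule to the midterm grades; quartile definitions vary between textbooks and software, so the fences may differ slightly elsewhere.

```python
import statistics

# Midterm grades from the slide.
grades = [81, 89, 82, 82, 79, 85, 76, 54, 75, 75, 78, 71, 83,
          88, 52, 86, 89, 89, 84, 79, 80, 85]

# Quartiles using the statistics module's default method.
q1, _median, q3 = statistics.quantiles(grades, n=4)
iqr = q3 - q1

# Assumed convention: fences at 1.5 * IQR beyond the quartiles.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = sorted(g for g in grades if g < lower_fence or g > upper_fence)
print(f"fences: [{lower_fence}, {upper_fence}]  outliers: {outliers}")
```

Under this convention both 52 and 54 fall below the lower fence. Whether to treat them as outliers is still a judgment call that depends on context.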
Histogram
Unlike a bar graph, which displays
categorical data, a histogram displays
numerical data.
Consider the GPA distribution of 20
students with GPAs 3.1, 2.7, 3.2, 2.9, 2.8,
3.1, 3.3, 2.8, 2.9, 3.2, 2.5, 3.9, 3.8, 2.4,
2.7, 2.8, 3.9, 2.6, 3.1, and 3.1.
[Figure: histogram titled "Grade Distribution"; horizontal axis "Grades" from 2.4 to 4.0 in steps of 0.1, vertical axis "Frequency" from 0 to 5]
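The heights of the histogram bars are just the frequency of each GPA value. A minimal sketch of the tally, assuming bins 0.1 wide from 2.4 to 4.0 to match the axis on the slide (the variable names are illustrative):

```python
from collections import Counter

# GPAs of the 20 students from the slide.
gpas = [3.1, 2.7, 3.2, 2.9, 2.8, 3.1, 3.3, 2.8, 2.9, 3.2,
        2.5, 3.9, 3.8, 2.4, 2.7, 2.8, 3.9, 2.6, 3.1, 3.1]

# Count in tenths of a grade point (integers) to avoid float-equality issues.
counts = Counter(round(g * 10) for g in gpas)

# One row per bin; a Counter reports 0 for bins with no students.
for tenths in range(24, 41):
    print(f"{tenths / 10:.1f} | {'*' * counts[tenths]}")
```

Empty bins such as 3.0 and 3.4 through 3.7 are printed on purpose: in a histogram the gaps are part of the shape of the distribution.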
Time Plots
A time plot plots an observation against
the time it was measured.
A pattern that repeats itself at regular
intervals is a seasonal variation.
We can graph the working hours per week
over the years in the United States
(www.gapminder.org).
[Figure: time plot titled "Hours worked per week in US by Year since 1980"; vertical axis from 34 to 35.6 in steps of 0.2]