Transcript Unit 1.1

Unit 1.1
Investigating Data
1
Frequency and Histograms
CCSS:
S.ID.1 Represent data with plots on the real number line (dot
plots, histograms, and box plots).
Also N.Q.1
2
Types of Data Graphs
• Dot Plots
• Frequency Tables
• Histograms
• Box-and-whisker plots
• 2-way tables
3
Dot Plots
One dot
represents one
occurrence of
the item.
Sometimes an X
is used instead
of a dot. These
plots are
sometimes
called line plots.
4
Creating a Dot Plot
• Find the least and greatest value in a data set.
• Use these values to draw a number line.
• For each piece of data, draw a dot above the
number line that corresponds to the data.
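
A minimal sketch of the three steps above, assuming whole-number data and using an X in place of a dot. The function name dot_plot and the sample data are hypothetical, chosen only for illustration.

```python
from collections import Counter

def dot_plot(data):
    """Return a text dot plot (one X per occurrence) for whole-number data."""
    counts = Counter(data)
    low, high = min(data), max(data)            # least and greatest values
    values = range(low, high + 1)               # the number line
    width = max(len(str(v)) for v in values) + 1
    rows = []
    # Build rows from the tallest stack down to the number line.
    for level in range(max(counts.values()), 0, -1):
        row = "".join(("X" if counts[v] >= level else " ").rjust(width) for v in values)
        rows.append(row.rstrip())
    rows.append("".join(str(v).rjust(width) for v in values))
    return "\n".join(rows)

# Hypothetical data: number of pets owned by eight students
print(dot_plot([0, 1, 1, 2, 2, 2, 4, 1]))
```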
5
Frequency Tables
Frequency tables
show the number
of times something
occurs in a given
interval. From this
chart, we don’t
have individual
data, just numbers
in each group.
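
One way to build such a table is sketched below. The function frequency_table, its parameters, and the test scores are assumptions made for illustration. Each interval includes its lower bound and excludes its upper bound.

```python
def frequency_table(data, start, width, bins):
    """Count how many values fall in each interval [low, low + width)."""
    table = []
    for i in range(bins):
        low = start + i * width
        high = low + width
        count = sum(1 for x in data if low <= x < high)
        table.append((low, high, count))
    return table

# Hypothetical test scores (whole numbers)
scores = [52, 67, 71, 75, 78, 83, 85, 88, 91, 96]
for low, high, count in frequency_table(scores, start=50, width=10, bins=5):
    print(f"{low}-{high - 1}: {count}")
```

Note that the output keeps only the count in each group, not the individual scores.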
6
Histograms
Histograms display continuous
numerical data grouped into
ranges (intervals). The bars
touch each other.
7
Histograms
• Bar graph used to display the frequency of
data divided into equal intervals
• Bars must be equal width and should touch,
but not overlap
8
Steps to Make a Histogram
• Make a frequency table
• Use scale and intervals from table
• Draw a bar for the number in each interval
• Title the graph and label axes
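
The steps above can be sketched as a text histogram, with bars drawn sideways. This is a rough illustration only: the function text_histogram and the frequency table are hypothetical, and a real histogram would be drawn with touching vertical bars.

```python
def text_histogram(title, table):
    """Render (interval label, frequency) pairs as horizontal bars of # marks."""
    label_width = max(len(label) for label, _ in table)
    lines = [title]                                   # title the graph
    for label, count in table:
        # One bar per interval; bar length equals the frequency.
        lines.append(f"{label.rjust(label_width)} | {'#' * count} {count}")
    return "\n".join(lines)

# Hypothetical frequency table: equal-width intervals of test scores
table = [("50-59", 1), ("60-69", 1), ("70-79", 3), ("80-89", 3), ("90-99", 2)]
print(text_histogram("Test scores (interval | frequency)", table))
```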
9
Shape of Histograms
10
Measures of Central Tendency and
Dispersion
CCSS:
S.ID.2 Use statistics appropriate to the data distribution to
compare center (median, mean) and spread (interquartile
range, standard deviation) of two or more different data sets.
Also S.ID.3, N.Q.2
11
Measures of Central Tendency
Central tendency – where the “center” of the
data is.
Mean (x-bar) – numerical average of the data
Mode – most frequent number in the data
Median – middle number of the data if put into
numerical order from lowest to highest
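
A short sketch of the three measures using Python's standard statistics module. The data set is hypothetical. With an even number of values, the median is the average of the two middle values.

```python
import statistics

data = [3, 5, 5, 6, 8, 9]            # hypothetical data, already in order

mean = statistics.mean(data)         # sum 36 divided by 6 values
median = statistics.median(data)     # average of the two middle values, 5 and 6
mode = statistics.mode(data)         # 5 occurs most often

print(mean, median, mode)
```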
12
Measures of Dispersion
Dispersion – how spread out the data is.
▪ Range – difference between the maximum
value and minimum value of the data
▪ Standard deviation – measure of how values
in a data set vary (deviate) from the mean.
13
Standard Deviation
• Symbol: σ
• Calculation:
1. Find the mean of the data
2. Find the difference of each item from the
mean.
3. Square the differences.
4. Find the average of the squared differences.
5. Take the square root.
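
The calculation can be sketched in code as follows, where step 4 averages the squared differences from step 3. The function name and data are hypothetical. This version divides by the number of values (the population standard deviation, σ); a sample standard deviation would divide by one less than that.

```python
import math

def population_std_dev(data):
    mean = sum(data) / len(data)                   # 1. mean of the data
    differences = [x - mean for x in data]         # 2. difference of each item from the mean
    squares = [d ** 2 for d in differences]        # 3. square the differences
    average_square = sum(squares) / len(squares)   # 4. average the squared differences
    return math.sqrt(average_square)               # 5. take the square root

# Hypothetical data with mean 5
print(population_std_dev([2, 4, 4, 4, 5, 5, 7, 9]))   # prints 2.0
```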
14
Example of Calculating Std. Dev.
Data: 12.6, 15.1, 11.2, 17.9, 18.2

X      X-bar   X – X-bar   (X – X-bar)^2
12.6   15
15.1   15
11.2   15
17.9   15
18.2   15

Note: X-bar stands for the mean of the data.
Average of the squared differences:
Square root of the average (σ):
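
One way to check the blank columns is to let a short script fill them in. The sketch below uses the slide's data and divides by the number of values. Variable names are illustrative. Rounded, the average of the squared differences comes to about 7.77 and σ to about 2.79.

```python
import math

data = [12.6, 15.1, 11.2, 17.9, 18.2]
x_bar = sum(data) / len(data)            # mean of the data, 15

print("X      X-bar   X - X-bar   (X - X-bar)^2")
squares = []
for x in data:
    diff = x - x_bar                     # difference from the mean
    squares.append(diff ** 2)            # squared difference
    print(f"{x:<6} {x_bar:<7.1f} {diff:<11.1f} {diff ** 2:.2f}")

average = sum(squares) / len(squares)    # average of the squared differences
sigma = math.sqrt(average)               # standard deviation
print(f"Average of the squared differences: {average:.3f}")
print(f"Square root of the average (sigma): {sigma:.2f}")
```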
15
Interpreting the Standard Deviation
• As the data become more widely distributed,
the standard deviation increases.
• A small standard deviation means that the
data are clustered tightly around the mean.
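
A small comparison may help here. The two hypothetical data sets below share the same mean, but one is clustered near it and the other is spread out. statistics.pstdev divides by the number of values.

```python
import statistics

clustered = [9, 10, 10, 10, 11]      # hypothetical: values close to the mean
spread_out = [0, 5, 10, 15, 20]      # hypothetical: values far from the mean

# Both data sets have a mean of 10.
print(statistics.mean(clustered), statistics.mean(spread_out))

# The spread-out set has the larger standard deviation.
print(statistics.pstdev(clustered), statistics.pstdev(spread_out))
```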
16
Box-and-Whisker Plots
CCSS:
S.ID.2 Use statistics appropriate to the data distribution to compare
center (median, mean) and spread (interquartile range, standard
deviation) of two or more different data sets.
Also N.Q.1, S.ID.1
17
Box-and-Whisker Plot
Graph that summarizes a set of data by
displaying it along a number line. It
consists of a box and two whiskers.
18
Box-and-whisker Plot
• Composed of 5 numbers (sometimes called the 5-number summary):
▪ Min – minimum value (left whisker)
▪ Q1 – median of lower half of data (left side of box)
▪ Median (Q2) (middle line)
▪ Q3 – median of upper half of data (right side of box)
▪ Max – maximum value (right whisker)
19
Quartiles
• Quartiles – values that divide a data set into 4
equal parts.
• The interquartile range, or IQR, is the difference Q3 – Q1. It
spans the middle half of the data (contained in the box).
• From Min to Q1 – 25% of data
• From Q1 to median – 25% of data
• From Median to Q3 – 25% of data
• From Q3 to Max – 25% of data
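
A sketch of the five-number summary and the IQR. It assumes the common classroom convention that, when the number of values is odd, the median is left out of both halves. Other conventions exist and can give slightly different quartiles. The function name and data are hypothetical.

```python
import statistics

def five_number_summary(data):
    ordered = sorted(data)
    half = len(ordered) // 2
    lower = ordered[:half]                    # lower half (median excluded if count is odd)
    upper = ordered[len(ordered) - half:]     # upper half (median excluded if count is odd)
    return {
        "min": ordered[0],
        "Q1": statistics.median(lower),
        "median": statistics.median(ordered),
        "Q3": statistics.median(upper),
        "max": ordered[-1],
    }

# Hypothetical data set with 11 values
data = [6, 7, 15, 36, 39, 40, 41, 42, 43, 47, 49]
summary = five_number_summary(data)
iqr = summary["Q3"] - summary["Q1"]           # width of the box
print(summary, iqr)
```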
20
Interpreting Box Plots
• Shows middle of data, range (spread) of data,
extreme values. Does not show individual data
or mean (average).
• Outlier – a data value that is much higher or
much lower than other values in the data set.
• Percentile rank – percent of data values that
are ≤ that value.
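
Percentile rank, as defined above, can be sketched in a few lines. The function name and the scores are hypothetical.

```python
def percentile_rank(data, value):
    """Percent of data values that are less than or equal to value."""
    at_or_below = sum(1 for x in data if x <= value)
    return 100 * at_or_below / len(data)

# Hypothetical scores: 6 of the 10 values are <= 80
scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]
print(percentile_rank(scores, 80))   # prints 60.0
```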
21
Two-way Tables
CCSS:
S.ID.5 Summarize categorical data for two categories in two-way frequency tables. Interpret relative frequencies in the
context of the data (including joint, marginal, and conditional
relative frequencies).
22
2-way Tables
Way of organizing data to show data that pertain to two
different categories.
Can find the conditional probability of events occurring.
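
A sketch of how marginal and conditional probabilities can be read from a 2-way table. The counts below are invented for illustration and are not taken from any table in this unit. The helper names marginal and conditional are also hypothetical.

```python
from fractions import Fraction

# Hypothetical counts for 100 students (not from the slides)
counts = {
    ("sport", "language"): 30,
    ("sport", "no language"): 20,
    ("no sport", "language"): 25,
    ("no sport", "no language"): 25,
}
total = sum(counts.values())

def marginal(counts, label):
    """Row or column total for one category."""
    return sum(n for key, n in counts.items() if label in key)

def conditional(counts, event, given):
    """P(event | given) = count with both labels / total for the given label."""
    both = sum(n for key, n in counts.items() if event in key and given in key)
    return Fraction(both, marginal(counts, given))

print(conditional(counts, "language", "sport"))           # 30 of 50 -> 3/5
print(conditional(counts, "no sport", "no language"))     # 25 of 45 -> 5/9
print(Fraction(marginal(counts, "no language"), total))   # 45 of 100 -> 9/20
```

For a conditional probability the denominator is the row or column total for the given condition, not the grand total.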
23
2-way Tables (cont’d)
1. What is the probability that if a student plays a sport, he
also takes a foreign language?
2. What is the probability that if a student doesn’t take a
foreign language, she doesn’t play a sport?
3. What is the probability that a student doesn’t take a
foreign language?
24
More 2-way tables
1. What is the probability that a student has an MP3 player?
2. What is the probability that if a student doesn’t have an
MP3 player, he has a cell phone?
3. What is the probability that if a student doesn’t have a
cell phone, he has an MP3 player?
25