Chapter 8: Data Management



This chapter starts on page 366.
Chapter 8: Get Ready

Before starting Chapter 8, we need to review these concepts:
1. Display data
2. Box-and-whisker plots
3. Measures of central tendency
4. Interpolate and extrapolate
8.1: Scatter plots

Statistics Canada collects and organizes
data to help Canadians better understand
their country: its population, resources,
economy, society and culture.
Scatter plots


A scatter plot is a graph of ordered pairs of
numeric data.
A scatter plot is used to see relationships
between 2 variables or quantities.
The line of best fit

The line of best fit is the line that passes
through or near as many points as
possible on a scatter plot.
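In this chapter the line of best fit is drawn by eye. A common computational alternative, not described on these slides, is the least-squares line: it chooses the slope and y-intercept that make the sum of the squared vertical distances from the points to the line as small as possible. The sketch below assumes the data is a list of (x, y) pairs; the function name `least_squares_line` and the study-time data are hypothetical.

```python
def least_squares_line(points):
    """Return (slope, intercept) of the least-squares line y = slope*x + intercept."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Sum of products of x- and y-deviations, and sum of squared x-deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("all x values are equal, so the slope is undefined")
    slope = sxy / sxx
    # The least-squares line always passes through the point (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: hours studied (x) and test score (y)
data = [(1, 55), (2, 60), (3, 68), (4, 71), (5, 80)]
slope, intercept = least_squares_line(data)
print(f"y = {slope:.2f}x + {intercept:.2f}")  # y = 6.10x + 48.50
```

For this made-up data, each extra hour of study is associated with roughly 6 more marks. A line drawn by eye should land close to this one.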
An outlier
An outlier is a data point that does not fit
the pattern of the other data.
An outlier seems to be very different from most of the data in the scatter plot.

Interpolating values
Interpolating data values from a graph means estimating values between two known pieces of data.
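A minimal sketch of interpolation, assuming neighbouring known points are joined by a straight segment (linear interpolation). The function name `interpolate` and the temperature readings are hypothetical.

```python
def interpolate(points, x):
    """Estimate y at x by joining the two neighbouring known points with a straight line."""
    pts = sorted(points)
    if len(pts) < 2:
        raise ValueError("need at least two known points")
    if not pts[0][0] <= x <= pts[-1][0]:
        raise ValueError("x is outside the collected data; that is extrapolation")
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 <= x <= x1:
            if x1 == x0:
                return y0
            # Move the same fraction of the way from y0 to y1 as x is from x0 to x1
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Hypothetical temperature readings: (hour, degrees Celsius)
readings = [(0, 10), (2, 14), (4, 20)]
print(interpolate(readings, 3))  # 17.0, between the readings at hours 2 and 4
```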
Extrapolating data
Extrapolating data values from a graph means predicting values beyond the collected data.
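A sketch of extrapolation under a simplifying assumption: the trend is assumed to continue along the straight segment through the two data points at the nearer end of the data. Extending a line of best fit instead would use all of the data, and any extrapolation becomes less reliable the farther it reaches. The function name `extrapolate` and the readings are hypothetical.

```python
def extrapolate(points, x):
    """Predict y at an x value beyond the data by extending the nearest end segment."""
    pts = sorted(points)
    if len(pts) < 2:
        raise ValueError("need at least two known points")
    if pts[0][0] <= x <= pts[-1][0]:
        raise ValueError("x is inside the collected data; use interpolation instead")
    # Use the last two points for predictions to the right, the first two to the left
    (x0, y0), (x1, y1) = pts[-2:] if x > pts[-1][0] else pts[:2]
    if x1 == x0:
        raise ValueError("end points share an x value; the slope is undefined")
    slope = (y1 - y0) / (x1 - x0)
    return y1 + slope * (x - x1)


# Hypothetical temperature readings: (hour, degrees Celsius)
readings = [(0, 10), (2, 14), (4, 20)]
print(extrapolate(readings, 6))   # 26.0, beyond the last reading
print(extrapolate(readings, -2))  # 6.0, before the first reading
```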
The Independent variable



In a relation, the independent variable is the variable that determines the value of the dependent variable.
For example, when an object travels at a constant speed (speed = distance ÷ time), time is the independent variable because the distance travelled depends on the time.
The independent variable is usually represented by x.
The Dependent variable



In a relation, the dependent variable is the variable whose value is determined by the independent variable.
For example, when an object travels at a constant speed (speed = distance ÷ time), distance is the dependent variable because its value depends on the time.
The dependent variable is usually represented by y.
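A small worked example of the speed relation, assuming a constant speed of 60 km/h (a made-up value). Because distance = speed × time, choosing a time (the independent variable, x) determines the distance (the dependent variable, y). The names `SPEED_KMH` and `distance_km` are hypothetical.

```python
SPEED_KMH = 60  # hypothetical constant speed in km/h


def distance_km(time_h):
    """Dependent variable: the distance travelled depends on the time."""
    return SPEED_KMH * time_h


# Time is the independent variable (x); distance is the dependent variable (y)
table = [(t, distance_km(t)) for t in range(5)]
for t, d in table:
    print(f"time = {t} h, distance = {d} km")
```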
The types of data

There are 2 types of data:
1. Continuous data
2. Discrete data
Continuous data



Continuous data is a set of data where a variable can take any real-number value within a range.
On a graph, continuous data is shown by joining the data points with a line.
Examples of continuous data are speed and temperature.
Discrete data



Discrete data is a set of data where a variable can take only separate values, such as whole numbers.
On a graph, discrete data is shown as separate points that are not joined by a line.
Examples of discrete data are the number of pages in a book or the number of students in a class.
Correlations #1

To better understand and organize a data set, Statistics Canada creates scatter plots to determine whether there is a correlation between 2 variables.
Correlations #2

A correlation is a measure of how closely the points on a scatter plot fit a line (i.e., the degree to which 2 quantities show a linear relationship).
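The slides describe correlation in words. One standard way to measure it as a number, not covered on these slides, is Pearson's correlation coefficient r, which always lies between -1 and +1: values near +1 or -1 mean the points lie close to a line, and values near 0 mean they do not. A sketch assuming (x, y) pairs; the function name `correlation_coefficient` and the data are hypothetical.

```python
import math


def correlation_coefficient(points):
    """Pearson's r for a list of (x, y) pairs."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Compare how x and y vary together with how much each varies on its own
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when the x or y values do not vary")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: hours studied (x) and test score (y)
data = [(1, 55), (2, 60), (3, 68), (4, 71), (5, 80)]
print(round(correlation_coefficient(data), 3))  # close to +1: strong and positive
```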
Adjectives that describe correlations

The correlation between 2 variables can be:
- Strong
- Weak
- Positive
- Negative
- Non-existent
Strong correlation

If most of the points are closely grouped
around the line, then the correlation is
strong.
Weak Correlation

If the points are spread out but show a
general trend, then the correlation is weak.
Positive correlation
A positive correlation means that the two variables tend to increase together:
- As the independent variable increases, the dependent variable increases.
- The slope of a line showing positive correlation is positive (the line goes up as you move from left to right).

Negative correlation
A negative correlation means that as one variable increases, the other tends to decrease:
- As the independent variable increases, the dependent variable decreases.
- The slope of a line showing negative correlation is negative (the line goes down as you move from left to right).

Non-existent correlation

If the points are spread out and show no general trend, then the correlation is non-existent.
A relationship
A relationship is a pattern between 2 sets of numbers.
The types of relationships

In Data Management, there are two types of mathematical relationships:
- A linear relationship (the points form a straight line)
- A non-linear relationship (the points do not form a straight line)
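Whether a set of points forms a straight line can be checked by computing the slope between each pair of neighbouring points: in a linear relationship the slope is the same everywhere. A sketch assuming exact integer coordinates with distinct x values; measured data rarely lines up exactly, so real data would need a tolerance. The function name `is_linear` is hypothetical.

```python
from fractions import Fraction


def is_linear(points):
    """Return True if every pair of neighbouring points gives the same slope."""
    pts = sorted(points)
    slopes = set()
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x1 == x0:
            raise ValueError("repeated x value; this check assumes distinct x values")
        # Fraction keeps slopes exact, so 2/4 and 1/2 compare as equal
        slopes.add(Fraction(y1 - y0, x1 - x0))
    return len(slopes) <= 1


print(is_linear([(0, 1), (1, 3), (2, 5), (3, 7)]))  # True  (y = 2x + 1)
print(is_linear([(0, 0), (1, 1), (2, 4), (3, 9)]))  # False (y = x squared)
```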
8.2: Assess data and make predictions


To assess and analyze data, it is useful to display your data set as a scatter plot.
Then, draw the line of best fit for the data by inspection (by eye).
The goodness of fit of a line


After drawing the line of best fit, it is
necessary to judge its goodness of fit.
A correlation grid is a guide to indicate the
goodness of fit for a line.
A correlation grid

Correlation          | Goodness of fit
+1                   | Perfect fit with a line having a positive slope.
Strong and positive  | Most points are closely grouped around a line having a positive slope.
Weak and positive    | The points are spread out but show a general positive trend.
0                    | No apparent relationship.
Weak and negative    | The points are spread out but show a general negative trend.
Strong and negative  | Most points are closely grouped around a line having a negative slope.
-1                   | Perfect fit with a line having a negative slope.
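If a numeric correlation coefficient r is available (for example, Pearson's r), the rows of the grid can be matched to it. The grid itself only fixes the values +1, 0 and -1; the cut-offs below (0.7 for strong, 0.3 for weak) are illustrative assumptions, not values from this chapter. The function name `describe_correlation` is hypothetical.

```python
def describe_correlation(r, strong_cutoff=0.7, weak_cutoff=0.3):
    """Map a correlation coefficient onto a row of the correlation grid.

    The cut-off values are hypothetical choices, not part of the grid.
    """
    if not -1 <= r <= 1:
        raise ValueError("r must be between -1 and +1")
    if r == 1:
        return "perfect fit, positive slope"
    if r == -1:
        return "perfect fit, negative slope"
    if abs(r) < weak_cutoff:
        return "no apparent relationship"
    strength = "strong" if abs(r) >= strong_cutoff else "weak"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} and {direction}"


for r in (1, 0.85, 0.4, 0.05, -0.5, -0.9):
    print(r, "->", describe_correlation(r))
```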
8.3: Display data

Here are the 6 types of data displays used in Grade 9:
1. A scatter plot
2. A histogram
3. A circle graph
4. A stem-and-leaf plot
5. A box-and-whisker plot
6. A bar graph
A bar graph
- A bar graph is a diagram that displays data visually with vertical or horizontal bars.
- Bar graphs are used to compare categories.
A circle graph
- A circle graph is a graph in which a circle representing the whole data set is divided into sections.
- Circle graphs are used to compare categories to each other and each category to the whole data set.
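Drawing a circle graph requires the angle of each section. Since the whole circle (360°) represents the whole data set, each category gets its fraction of 360°. A sketch; the function name `sector_angles` and the survey data are hypothetical.

```python
def sector_angles(counts):
    """Return the central angle, in degrees, for each category of a circle graph."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must add up to a positive total")
    # Each category's share of the whole circle: (count / total) x 360 degrees
    return {category: count * 360 / total for category, count in counts.items()}


# Hypothetical survey: how 30 students travel to school
survey = {"Walk": 10, "Bus": 15, "Car": 5}
for category, angle in sector_angles(survey).items():
    print(f"{category}: {angle:.0f} degrees")
```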
A stem-and-leaf plot
- A stem-and-leaf plot is a way of organizing numerical data by representing part of each number as a stem and the other part as a leaf.
- Stem-and-leaf plots organize data based on place value.
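Splitting numbers by place value can be sketched as follows, assuming non-negative whole numbers where the tens digit is the stem and the ones digit is the leaf (larger numbers may use a different split). The function name `stem_and_leaf` and the scores are hypothetical.

```python
def stem_and_leaf(values):
    """Group non-negative whole numbers into stems (tens) and leaves (ones)."""
    if not values:
        return {}
    low, high = min(values) // 10, max(values) // 10
    # Include every stem in the range, even ones with no leaves
    plot = {stem: [] for stem in range(low, high + 1)}
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        plot[stem].append(leaf)
    return plot


# Hypothetical quiz scores
scores = [23, 35, 27, 41, 38, 22, 45, 30]
for stem, leaves in stem_and_leaf(scores).items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```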
A histogram
- A histogram is a connected bar graph that shows data organized into intervals.
- Histograms show how many data values fall within each interval.
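Organizing data into intervals amounts to counting how many values land in each one. A sketch, assuming equal-width intervals where each interval includes its lower bound and excludes its upper bound (a convention that should be stated on any real histogram). The function name `histogram_counts` and the marks are hypothetical.

```python
def histogram_counts(values, start, width, num_intervals):
    """Count how many values fall in each interval [start + k*width, start + (k+1)*width)."""
    counts = [0] * num_intervals
    for value in values:
        k = int((value - start) // width)
        if not 0 <= k < num_intervals:
            raise ValueError(f"{value} is outside the chosen intervals")
        counts[k] += 1
    return counts


# Hypothetical test marks grouped into intervals of width 10, starting at 50
marks = [52, 67, 71, 74, 78, 83, 85, 91, 95, 60]
counts = histogram_counts(marks, start=50, width=10, num_intervals=5)
for k, count in enumerate(counts):
    low = 50 + 10 * k
    # Text bars stand in for the connected bars of the histogram
    print(f"{low}-{low + 9}: {'#' * count}")
```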
A box-and-whisker plot
A box-and-whisker plot is a diagram that shows the median, quartiles and range of a numeric data set.
The use of box-and-whisker plots
A box-and-whisker plot shows how data is dispersed, or spread, around the median of a data set.
The vocabulary of box-and-whisker plots

- The box of the graph represents the middle 50% of the data, from the lower quartile to the upper quartile.
- The least and greatest data values are called the minimum and maximum, or the lower extreme and the upper extreme.
- The lower quartile is the median value of the lower half of the data.
- The upper quartile is the median value of the upper half of the data.
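The five values needed to draw a box-and-whisker plot (minimum, lower quartile, median, upper quartile, maximum) can be computed directly from these definitions. When the number of values is odd, this sketch leaves the median out of both halves; that is one common convention, and textbooks and software do not all agree. The function names and data are hypothetical.

```python
def median_of_sorted(data):
    """Median of a list that is already sorted."""
    n = len(data)
    mid = n // 2
    if n % 2 == 1:
        return data[mid]
    return (data[mid - 1] + data[mid]) / 2


def five_number_summary(values):
    """Minimum, lower quartile, median, upper quartile and maximum."""
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    # Halves exclude the middle value when n is odd (one common convention)
    lower_half = data[: n // 2]
    upper_half = data[(n + 1) // 2 :]
    return {
        "minimum": data[0],
        "lower quartile": median_of_sorted(lower_half),
        "median": median_of_sorted(data),
        "upper quartile": median_of_sorted(upper_half),
        "maximum": data[-1],
    }


# Hypothetical data set
summary = five_number_summary([20, 12, 30, 15, 25, 18, 22])
print(summary)
```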
How to choose the most appropriate way to display your data set

The most appropriate choice of data
display depends on the type of data
and the information you wish to
convey.
Hints for choosing the best way to display your data #1
1. Line graphs and scatter plots can be used to analyze trends.
2. Histograms, box-and-whisker plots and stem-and-leaf plots can be used to analyze how the data is spread, check where the data is clustered and find the measures of central tendency.
Hints for choosing the best way to display your data #2
3. Bar graphs and circle graphs are used to compare categories.
Measures of central tendency
A measure of central tendency is a value that represents the centre of a set of data.
The types of measures of central tendency
There are 3 measures of central tendency:
1. The mean
2. The median
3. The mode
The mean



The mean is the sum of a set of values divided by the number of values in the set.
Advantage of the mean: it gives information about the sum of the values (the sum equals the mean multiplied by the number of values).
Disadvantage of the mean: it is influenced by extreme data values.
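Both the advantage and the disadvantage of the mean can be seen in a short calculation. A sketch; the function name `mean` and the scores are hypothetical (Python's `statistics.mean` would give the same values).

```python
def mean(values):
    """Sum of the values divided by the number of values."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


scores = [70, 72, 75, 78, 80]  # hypothetical
print(mean(scores))                # 75.0
# Advantage: the sum can be recovered from the mean
print(mean(scores) * len(scores))  # 375.0
# Disadvantage: one extreme value (5) pulls the mean down by more than 11
print(mean(scores + [5]))
```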
The median



The median is the middle value when a set
of data is arranged in order from least to
greatest.
Advantage of the median: it is not greatly influenced by extreme data values.
Disadvantage of the median: it gives no information about the sum of the values.
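The same scores show why the median resists extreme values. When there is an even number of values, this sketch uses the usual convention of taking the mean of the two middle values. The function name `median` and the data are hypothetical.

```python
def median(values):
    """Middle value of the sorted data (mean of the two middle values if n is even)."""
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return data[mid]
    return (data[mid - 1] + data[mid]) / 2


scores = [70, 72, 75, 78, 80]  # hypothetical
print(median(scores))        # 75
print(median(scores + [5]))  # 73.5, barely changed by the extreme value 5
```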
The mode



The mode is the most common value in a
set of data.
Advantage of the mode: Easy to locate in
frequency tables, graphs, bar graphs or
histograms.
Disadvantage of the mode: May change
greatly with new data values.
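Counting how often each value occurs is what a frequency table does, and it also shows how quickly the mode can change. A data set can have more than one mode, so this sketch returns every value tied for the highest count; note that some textbooks instead say a data set has no mode when every value occurs only once, which this sketch does not handle. The function name `modes` and the shoe sizes are hypothetical.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs most often."""
    counts = Counter(values)
    if not counts:
        raise ValueError("the mode of an empty data set is undefined")
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)


shoe_sizes = [7, 8, 8, 9, 10, 8, 9]  # hypothetical
print(modes(shoe_sizes))           # [8]
# Disadvantage: two new values change the mode completely
print(modes(shoe_sizes + [9, 9]))  # [9]
```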