Transcript Document
Here are some additional methods for describing data
1
0 10 100 Here I have the number line. We start on the left at zero and as we move to the right we increase in value. (We can move to the left of zero into the negative numbers.) Now, if we want a number line vertically we tip it from the 100 side up, as my arrow shows. This is the normal convention in all of math. One EXCEPTion is for the stem and-leaf plot or display. In this one area we tip the line up from the zero side.
As an example of some data say 50 people take a test that has up to 150 points. Possible values on the test are 70, 120, 132, 79 and so on.
2
With the 50 people who took the test, let’s imagine we sort the data from low score to high score. In other words, we might arrange the data so that the lowest score is first, and then the next highest and so on. The last score is the highest.
When we have a 3-digit number, like with test score, we have in general xxy. I put xxy because we will call the xx part the stem and y the leaf. The number 69, for example would be stem 06, or 6, and leaf 9. 123 would be steam 12 and leaf 3.
Say we had two people on the test with score 69 and 68. In sorted order we would have 68, 69. In stem-and-leaf form we would have stem 6 and leafs 8 and 9. On the next slide I will have the beginning of a stem-and-leaf display similar to what you might see in the book. 3
10 11 12 13 6 7 8 9 8 9 1 2 2 2 4 5 5 6 7 8 8 8 Here I have stem and leaf for the scores in the 60’s and the 90’s. Not the frequency of scores in the 60’s was two (two leafs) and the frequency of scores in the 90’s was 12.
The stem-and-leaf is like a histogram, but the leafs of each number are used in a stack to the side of the stem. Longer stacks represent more frequent groups. 4
Crosstabulations, or crosstabs for short.
Sometimes in statistics we will want to work with two variables at a time. The reason is that we think the two variables are somehow related. As an example, what do you think is the relationship between student grades and the number of classes they skip in a semester? My hunch would be the more classes skipped the lower the grade. But we may want to research this idea. Crosstabs is a tabular summary of two variables. On the next screen I create a basic crosstabs based on students midterm scores in the rows and the students final exam scores in the columns. 30 students were observed. 5
Final exam <60 60 to <70 70 to <80 80 to <90 90 + total Midterm < 60 2 2 0 0 0 4 60 to < 70 0 3 3 0 0 6 70 to < 80 0 0 3 3 0 6 80 to < 90 0 0 0 2 2 4 90 + 0 0 0 0 10 10 Total 2 5 6 5 12 30 Here I have a frequency crosstab. Note that 2 students had midterm scores < 60 and final exam scores < 60, for example.
The “total” row is really the frequency distribution for the variable Final exam score, and the “total” column is the frequency distribution for the variable midterm score.
In this table, we could put percentages in several formats. 6
Percentages in the cells We call each part of the table a cell and depending on what we want to think about we can calculate different percentages.
Column percentages When we look at a given column we have a final exam score and each row is a midterm score. If we use a column total as the basis of a percentage then we see percents of midterm score at each final exam score.
Row percentages When we look at a given row we have a midterm score and each column is a final exam score. If we use the row total as the basis for a percentage then we see the percents of final exam score for a given midterm score.
7
Overall percentages Since we had 30 in the data, each cell divided by 30 tells us the percent of times that cell came up.
Scatter Diagram Say we talk to people and we ask them their years of schooling and income last year. For each person we would have two values. On the next screen I show a scatter plot, where each person is a dot in the graph.
8
y - income
Scatterplot
note here in the
scatterplot
that in general the higher the schooling, the higher the income. Thus, knowing schooling will permit better prediction of income.
x - years of schooling In a graph we put the variable of interest on the y axis. Here it is thought that knowing the years of schooling for a person will better help us understand income and schooling is put on the x axis. In other words, certain values of income are ‘matched’ with schooling amounts. 9
In a scatterplot, when the dots seem to be going up hill we say there is a positive, or direct, relationship between the variables. This means that the higher the value on one variable, the higher the value of the other variable.
Dots going down hill suggest a negative, or indirect, relationship. This means as the value of one variable is getter higher the value on the other is getting lower.
y y y Positive relationship x Negative relationship x x No relationship?
10