How (not) to write up quantitative results


Results

... of quantitative research

... A work in progress

Names of chapters after Method

• What key words might we find?

– Data Analysis
– Findings
– Results
– Interpretation
– Discussion
– Implications
– Conclusion

Results and Discussion separate

• Which would go in which?

1) Graphs derived from the data
2) Tables of means and percentages
3) Significance tests
4) Talk about what the graphs and other stats show
5) Talk about whether the stats show what was expected
6) Comparison/synthesis of what one instrument showed with what another showed about the same thing
7) Talk about whether what is shown fits teacher experience
8) References to the RQs / RHs
9) References to what other studies found (covered in lit review)
10) Talk about how the qualitative data support/complement the quantitative
11) Talk about implications for theory

Organisation of Results

• Whether or not Results are first given ‘bare’... how to organise?

A) Instrum1…Instrum2…Instrum3
B) Descriptive stats….Inferential stats
C) RQ1….RQ2….RQ3…. or RH1...RH2...RH3...
D) Interesting theme1...theme2...

Refer back to the research Qs and hypotheses

• Not…
– The first research hypothesis assumes the teachers in a private school use more English....
• But…
– I expected that private school teachers would explain meaning in English more than state school teachers (2.3.2). Accordingly my first hypothesis (2.5.1) was that there would be a difference between state school and private school in the extent to which teachers explain the meaning of new words in English....

Refer back to the variables

• Not…
– The two explanatory variables are the state school and the private school
• But…
– One of our explanatory variables is school (state and private)
• Not…
– This essay presents the results of correlating the two types of school with the participants’ reading scores
• But…
– We first present the result for the relationship of school with reading score

Results: Three things to report... And interpret

Descriptive stats

Graphs and tables

Inferential stats

How to persuade with percentages

• Which sounds more impressive, A or B?
A) 2 out of 4 subjects agreed
B) 50% of subjects agreed
• And here?
A) 80 out of 160 subjects agreed
B) 50% of subjects agreed
• OK, but which result would you actually trust more? How should one report such results?

How to perplex with percentages: What’s it out of?

What is unclear? How to restate this better?

In our survey we polled 50 people, though 10 declined to participate. …. 60% said yes to the question ‘Do you like the English class?’…
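One way to remove the ambiguity is to report the count, the base, and the percentage together. The sketch below is illustrative only: the helper name `report_proportion` is hypothetical, and because the example does not say whether the 60% is out of the 50 people polled or the 40 who took part, both readings are shown as assumptions.

```python
def report_proportion(count: int, base: int, base_label: str) -> str:
    """Format a proportion so the reader can see what it is out of."""
    if base <= 0:
        raise ValueError("base must be positive")
    percent = 100 * count / base
    return f"{count} of {base} {base_label} ({percent:.0f}%)"


# Figures from the example: 50 polled, 10 declined.
polled, declined = 50, 10
responded = polled - declined

# Reading 1 (assumed): 60% of the 40 who actually responded, i.e. 24 people.
print(report_proportion(24, responded, "respondents"))
# Reading 2 (assumed): 60% of all 50 polled, i.e. 30 people.
print(report_proportion(30, polled, "people polled"))
```

Both lines end in "(60%)", which is exactly why the bare percentage perplexes: only the stated base tells the two readings apart.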

How to perplex with percentages: Report percentages or mean ratings?

A five point Likert scale for agreement has been used. Any problem?

52% of the student sample strongly agrees and 43% of them agree that it is important for them to have the least number of errors in writing, which leaves 2% only who disagree to this.

While 33% of the sample disagrees and 13 % strongly disagree that correcting and discussing their writing errors in front of the class is embarrassing for them, however, 25% agree and 22% strongly agree. Adding up the percentages of the agreements and disagreements together, it turns out there is a close outcome on this point, almost a tie.

On our 1-5 agreement scale the mean rating was 3.1, close to the midpoint
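A mean rating like this can be checked against the percentages. The sketch below assumes the scale is coded 1 (strongly disagree) to 5 (strongly agree) and that the 7% of students not accounted for in the 'embarrassing' item chose the neutral midpoint; neither assumption is stated in the example, and `mean_rating` is a hypothetical helper.

```python
def mean_rating(percent_by_point: dict[int, float]) -> float:
    """Weighted mean of scale points, weighting each point by its percentage."""
    total = sum(percent_by_point.values())
    if total <= 0:
        raise ValueError("percentages must sum to a positive number")
    return sum(point * pct for point, pct in percent_by_point.items()) / total


# Percentages quoted for the 'embarrassing' item; the 7% neutral is ASSUMED.
embarrassing = {1: 13, 2: 33, 3: 7, 4: 25, 5: 22}
print(round(mean_rating(embarrassing), 2))
```

Under those assumptions the four percentages and the "almost a tie" remark collapse into a single figure that the reader can compare directly with the scale midpoint of 3.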

How to perplex with percentages: Slippage of interpretation

Teachers were asked (item 10) whether they agree that the students achieve the minimum level of the objectives of the program. From figure 4.5 we can see that there is no noticeable difference between the % of teachers who 'strongly agree', 'agree', or 'disagree' (around 31% each). ........30% of teachers report that their students do not achieve the minimum level of the program objectives. Thus, 70% of the pupils are considered to have achieved most of the objectives regardless of their level of proficiency, and only a third of the pupils fail to achieve most of the objectives.

How to confuse with counts: Percentage scores versus group/aggregate percent

• Two ways of handling data arising from different numbers of potential occurrences for different people. Imaginary example of data where three subjects have been recorded in quasi-natural conversation, and counts have been made of their NS-like/correct use of third person –s.
• Why do the percent differ in A and B? Which would the statistician prefer and why?

A) Analysis with subjects as cases: percentage scores and their mean

Case    Correct  Incorrect  Total  Percent correct
Lnr1    12       12         24     50
Lnr2    8        12         20     40
Lnr3    3        9          12     25
Total   23       33         56     Mean % correct: 38.3%

B) Analysis with occurrences as cases: group percent

                 Correct  Incorrect  Total
Group frequency  23       33         56
Percent          41.1%    58.9%
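The two analyses can be reproduced from the counts in table A. This is a minimal sketch; the function names are hypothetical and only the three learners' counts come from the example.

```python
# (correct, incorrect) counts per learner, as in table A.
counts = {"Lnr1": (12, 12), "Lnr2": (8, 12), "Lnr3": (3, 9)}


def mean_of_percent_scores(data):
    """A) Subjects as cases: one percent score per learner, then their mean."""
    scores = [100 * c / (c + i) for c, i in data.values()]
    return sum(scores) / len(scores)


def group_percent(data):
    """B) Occurrences as cases: pool all occurrences, then take one percent."""
    correct = sum(c for c, _ in data.values())
    total = sum(c + i for c, i in data.values())
    return 100 * correct / total


print(f"A) mean % correct:  {mean_of_percent_scores(counts):.1f}")
print(f"B) group % correct: {group_percent(counts):.1f}")
```

The figures differ because B lets learners who produced more occurrences count for more: Lnr1 contributes 24 of the 56 occurrences, so the pooled percentage is pulled toward that learner's 50%, whereas A gives each learner equal weight.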

How to muddle with means: Appropriate scales?

• Reporting background information about teachers of young learners. What has gone wrong?

Variable                                    Mean    SD      N
AGE                                         2.0727  .83565  55
EDU                                         1.4182  .49781  55
MAJOR                                       .6727   .47354  55
Years of TEYL                               2.4182  ...     55
Years of teaching in Primary school         2.2000  .70448  55
Difficulty of teaching mixed ability class  3.1273  .88306  55

• The questionnaire items were of the following types:
– What is your age? 21-25 / 26-30 / 31-35
– What is your educational level? BA / MA / PhD
– How many years have you been teaching in Primary school? Less than 2 / 2-5 / 6-10
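One reading of the problem is that the answers are category codes (band 1, 2, 3), so a mean such as 2.07 for AGE is a mean of codes, not of ages. A frequency table per band is a safer summary. The sketch below uses made-up codes purely for illustration; the data, the coding scheme and the helper name `band_frequencies` are all assumptions.

```python
from collections import Counter

# HYPOTHETICAL coded answers to "What is your age?"; the coding is assumed.
AGE_BANDS = {1: "21-25", 2: "26-30", 3: "31-35"}
age_codes = [1, 1, 2, 2, 2, 3, 3, 1, 2, 3]


def band_frequencies(codes, labels):
    """Return (label, count, percent) for each band, in scale order."""
    tally = Counter(codes)
    n = len(codes)
    return [(labels[k], tally[k], 100 * tally[k] / n) for k in sorted(labels)]


for label, count, pct in band_frequencies(age_codes, AGE_BANDS):
    print(f"{label}: {count} ({pct:.0f}%)")
```

Reporting "26-30: 4 (40%)" tells the reader something about the teachers; a mean code does not, and for a nominal item such as educational level it has no meaning at all.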

How to muddle with means: Slippage of interpretation

Students produced each answer based on a Likert scale ranging from 1 (strongly disagree) to 5 (strongly agree). See table X.

Table X – Students’ answers to writing quality items

Item number | Item wording | Average number of students
2 | I don’t pay more attention to spelling when I use the computer instead of writing by hand | 1.9
6 | The computer cannot help me write my papers better | 2.9
7 | I am more careful about punctuation when writing by hand than with the computer | 2.5
Etc.

How to muddle with means: What does mean mean?

• Do these make sense?

– We can see from the mean in Table 5 that all our participants scored above average on the reading test

Test         Min  Max  Mean  SD
Reading /60  33   57   42.8  8.35
Writing /50  22   43   30.7  5.46

– We can see from the mean in Fig. 2 (4.72) that most of our participants scored above average

Joke from WWW: Most of us have A Greater Than Average Number of Legs

The great majority of people have more than the average number of legs. Amongst the 57 million people in Britain there are probably 5,000 people who have only one leg. Therefore the average number of legs is ((5000 x 1) + (56,995,000 x 2)) / 57,000,000 = 1.9999123.

Since most people have two legs... need I say more?
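The arithmetic of the joke can be checked directly, and it shows why "above average" says nothing about "most". The sketch uses only the figures quoted above; the variable names are illustrative.

```python
# Figures from the joke: number of people by number of legs.
people_by_legs = {1: 5_000, 2: 56_995_000}

population = sum(people_by_legs.values())
mean_legs = sum(legs * n for legs, n in people_by_legs.items()) / population
# The mode is the most frequent value, not the mean.
mode_legs = max(people_by_legs, key=people_by_legs.get)
share_above_mean = (
    sum(n for legs, n in people_by_legs.items() if legs > mean_legs) / population
)

print(f"mean = {mean_legs:.7f}")
print(f"mode = {mode_legs}")
print(f"share above the mean = {share_above_mean:.4%}")
```

In a skewed distribution like this one, nearly everyone sits above the mean, and the most frequent value (the mode) is not the mean either, which is the confusion in the three statements on this slide.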

– We can see from the mean in table 3 that the most popular score was 21

[Histogram of PARENTAL ENCOURAGEMENT: Std. Dev = 5.08, Mean = 21, N = 72.00]

Talk about means… but not like this

• Not...
– This table shows the two figures collected from the thirteen participants.
• But...
– Table 4.2 shows the means and SDs from our sample
• Not...
– Custom tables present the averages (means) of the two languages
• But...
– Table 4.2 presents the means for the two languages
• Not...
– To support what I have found in the error bars , now I will check the information provided in Table 4. As is easily seen, the averages of the three groups... are similar to the means in the error bars.
• But...
– From the means (Fig 3 and Table 4) we can see....
• Not...
– The difference between mean scores is 17,9250%
• But...
– The difference between mean scores is 17.93%
• Not...
– The difference between the means is instinctively large (0.4907 and 2.6111).
• But...
– The difference between the means is intuitively large at 2.12, given the length of the scale (5 points)
• Not...
– Moreover, the averages/means of the state school is .46, and the private school is .54 which shows that the results are not as significant as we expect...
• But...
– The means of the schools are similar (state school M=0.46, private school M = 0.54). This is not a pedagogically substantial difference and furthermore it is not significant (t=1.34 , p=0.563): hence our hypothesis is not supported

Talk about standard deviations… but..

• Figure 2 shows the total number of respondents, their means and SDs
• Ok as far as it goes, but what does the SD really tell us? Is any of these near the truth?
1) Both of the standard deviations for the two variables seem fairly large…. This would seem to infer that ..... there seems to be distribution of the two variables throughout the speech community.
2) The SD of present tense is larger than that of past tense…., so the population of present tense is more spread.
3) The standard deviation between the two schools is not significant
4) All SDs are large which suggests that the students generally agree on the answers they gave...
5) The SDs of the two schools are .59 and .64, which are close to each other....and it denotes that students agree with each other on the matter and that there is no clear difference between the two groups.
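The SD describes how spread out the scores are within one set of scores: a small SD means the scores cluster near the mean, a large SD means they are scattered. The sketch below uses two invented groups of ratings (the data are hypothetical) with the same mean but very different SDs.

```python
from statistics import mean, stdev

# HYPOTHETICAL ratings on a 1-5 scale from two groups of six students.
agreeing = [3, 3, 3, 3, 3, 3]   # everyone gives the same rating
divided = [1, 1, 1, 5, 5, 5]    # half strongly disagree, half strongly agree

for name, ratings in (("agreeing", agreeing), ("divided", divided)):
    print(f"{name}: M = {mean(ratings):.2f}, SD = {stdev(ratings):.2f}")
```

Both groups have a mean of 3, so the means alone cannot show that one group agrees and the other is split; the SD can. Note that it is a small SD, not a large one, that indicates agreement, and that two similar SDs say nothing about whether the group means differ.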

How to cheat with charts

• http://privatewww.essex.ac.uk/~scholp/onevardesc.htm

Talk about graphs... but..

• Not...
– For rating scale scores, we need to use a histogram instead of a bar chart because we have two EVs to compare with and the scale of scores need to be shown. Besides, we need a graph for each school separately. So we can go through the routine to make a graph. First of all, go to the GraphHistogram, and choose the “teacher explaining words in English” column and click it to the Variable item. Then put “school” in Column and press OK the graph will come out.
• But...
• Not...
– As the above graph indicates....in general there were rather more 5s than 4s
• But...
– As we see in fig 4.6, the number of students who rated English at school highly useful (5) far exceeded those who gave it lower ratings.

• What are the key terms to avoid these clumsy statements describing graphs?

1) From the graphs we can see that…The scores are pushed in different directions against one side in each graph
2) There are about 40 and 45 students stand on the first two scale, respectively... heaped to the left/at the end of the scale
3) Figure 2 provides us with the responses which are inclined in different positions
• http://privatewww.essex.ac.uk/~scholp/distrib.htm

• Premature interpretation of graphs
– According to the histogram, it represents most state school teachers do not use English to explain words as much as private school teachers do. So all of a sudden, we may feel that the research hypothesis is correct.
– Because of the large similarity between the two histograms we can assume that the hypothesis does not hold water.
– Both from graph 11 and 12, we see there is a significant difference between the averages.

Talk about correlation and the associated scatterplot

• Not...
– The two variables are not comparable, so a Pearson correlational design was chosen
• But...
– Since we are interested in the relationship between two continuous variables, the Pearson correlation was chosen
• Not...
– To investigate the correlation, the Pearson r correlation coefficient is used as a method.
• But...
– In order to quantify this relationship, the Pearson r correlation was calculated
• Not...
– The correlations suggest that writers' linguistic knowledge is more likely to have an influence on their use of affective behaviors than their writing performance.
• But...
– The correlations suggest that there is a stronger relationship between writers’ linguistic knowledge and their affective behavior than their writing performance
• Not...
– Table 3: Mann-Whitney significance test of correlation between school type and teacher's word explanation in Greek and English.
• But...
– Table 3: Mann-Whitney significance test of the relationship between school type and teacher's word explanation in Greek and English.
• Not...
– As you can see in the graph, there are some squares particularly off-line…in fact, these two subjects were the ones plotted far off the trendy line.
• But...
– As seen in Figure 4.3, there are two outliers which are markedly distant from the overall linear trend

Talk about normality... wrong

1) The bars and the mode are skewed to the end in both histograms and the population is concentrated around the median.... so we do not see a nice bell-shaped heap

2) From the “bell-shaped heap”, we suddenly find out the heap of the scale in both schools are the same, both on 0.5 which tells us that the frequency the teachers explain words in English are the similar.

3) sd in the state is .59 and sd in the private is .64. Again this proofs that the variables are not normally distributed in the two schools.

4) As the distribution is so non-normal for both variables, it is clear that these results would have a significant effect upon the overall picture.

5) The test distribution seems normal based on the figures . That is why you use the Wilcoxon test not the t test.

Talk about normality... right

http://privatewww.essex.ac.uk/~scholp/distrib.htm

• The distribution of our sample data is skewed (fig 5), indicating that the population sampled is unlikely to be normally distributed
• The K-S test was significant, indicating that the distribution of our data is not consistent with the normal distribution
• Since our data is not distributed normally, we do not use parametric significance tests / t tests / ANOVA but instead use the Mann-Whitney tests / etc. / the ordinal option of the Generalized Linear Model

Talk about the need for inferential stats...

• Not good thinking
1) To ensure the above results to show the right things, I need to do the inferential test of the difference... to look for what the p or sig value is...
2) A Wilcoxon test was used to provide more details about our results. This kind of test can provide more explanation of the use of language
3) …to see if this relationship is one which bears some significance and goes some way to supporting the hypothesis, some descriptive statistics are needed.
• On the right lines but badly worded
4) Hence this needs to be tested inferentially... guiding us to the way we get our results from some population. To assure the given results,... a T-test was taken.
5) In addition a Mann-Whitney test was taken to ensure that there is a sampling error that causes this slight difference.
• Better
6) The Mann Whitney test will indicate whether the null hypothesis might be supported, i.e. the difference between the means might be so small that it could easily arise by chance due to sampling error. Alternatively RH1 might be supported.
7) In order to check if our results can be generalised to the wider population which was sampled, we use inferential statistics.

Talk about p values and significance

• Not...
– It can therefore be said that the differences between the different types of use are significant, as p<0.000
• But...
– The difference between the two kinds of use is therefore significant (p<.001).
• Not...
– The P value, in Fig. 4, it calls sig. value, is .000. But actually it is not equal to zero, as here it only shows 3 decimal places, it should be something a little bit more than zero.
• But...
– From table 4 we see that p<.001
• Not...
– In fig. 4, the sig. value (p value) is less than 0.5 for each gender.
• But...
– The difference between genders is significant (p<.05)
• Not...
– The sig value is (.001), and therefore, the difference is significant, since the sig value is substantially smaller than the conventional significance level of .005.
• But...
– The difference is significant at the alpha level of .05 (p=.001)
• Not...
– The p value seems to be significant enough to reject the null hypothesis. LED makes a significant difference with this type of information
• But…
– There is a significant difference between LEDs in this type of information. Hence we reject the null hypothesis.
• Not...
– The value of the significance test result is .170 and cannot considered to be significant.
• But...
– The result was non-significant (p=0.170)
• Not...
– This is well above the level of .05 and so the difference here is insignificant.
• But...
– This is above our alpha threshold of .05 and so is nonsignificant.
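The reporting conventions illustrated above can be captured in a small helper. This is a sketch under stated assumptions, not a prescribed format: the name `format_p` is hypothetical, the default alpha of .05 follows the examples, and dropping the leading zero (p=.170 rather than p=0.170) is a style choice, since the examples use both.

```python
def format_p(p: float, alpha: float = 0.05) -> str:
    """Report a p value to three decimals, never as zero, with the verdict."""
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    verdict = "significant" if p < alpha else "non-significant"
    if p < 0.001:
        # Software output of .000 means "below .001", not exactly zero.
        return f"{verdict} (p<.001)"
    text = f"{p:.3f}"
    if text.startswith("0."):
        text = text[1:]  # assumed style: no leading zero
    return f"{verdict} (p={text})"


print(format_p(0.0004))
print(format_p(0.001))
print(format_p(0.170))
```

The helper compares p with alpha (.05), not with .5 or .005, and it uses "non-significant" rather than "insignificant", matching the preferred wordings on this slide.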

Talking about various inferential stats: Interpretation of significance

• Not...
– The Pearson correlation coefficient shows no significant difference.
• But...
– The Pearson correlation showed no significant relationship
• Not...
– In the Wilcoxon test, the figure of z is -.8928. And the sig <.001 is smaller than .05 so we can infer the samples are chosen randomly from the population
• But...
– There was a significant difference between the two conditions (Wilcoxon z = -.893, p<.001)
• Not...
– I applied the Wilcoxon test to make sure that the samples are normally distributed from populations or not.
• But...
– The Wilcoxon test was used to test the significance of the difference between the two conditions / occasions
• Not...
– t test show that the samples can't represent a real population.
• But...
– The t test shows a significant difference between the two groups
• Not...
– Spelling was significant, but content was not.
• But...
– Pre-post improvement was significantly different between groups for spelling but not for content.

Talk about implications of results for hypotheses and research questions

• Which is better?
– This proves the hypothesis.
– This reinforces my hypothesis.
– Our hypothesis is therefore supported.
• Which is better?
– The null hypothesis will be refused
– It is assumed that the null hypothesis, according to which there is no difference…. will be rejected.
– The null hypothesis is not supported
– Therefore... I infer that there will be a null hypothesis to the RQ.