R graphics
R has several graphics packages
The plotting functions are quick and easy to use
We will cover:
Bar charts – frequency, proportion
Pie charts
Histograms
Box plots
Scatter plots
Explore further on your own - R help, demo(graphics)
Bar charts
A bar chart draws one bar per category, with a height proportional to the
count in the table
The height can be given by the frequency or the
proportion; the graph looks the same, but the
y-axis scale differs
Use scan() to read in the data from a file or by typing
Try ?scan for more information
Usage is simple: type in the data. Input stops
when you enter a blank line
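Because scan() with no arguments waits for keyboard input, it is awkward inside a script. A minimal sketch of two non-interactive alternatives, assuming the values are known in advance (the names vals and vals2 are illustrative only):

```r
# scan() can also read from a character string through its text argument
vals <- scan(text = "3 4 1 1 3", quiet = TRUE)

# c() builds the same numeric vector directly
vals2 <- c(3, 4, 1, 1, 3)

identical(vals, vals2)   # compare the two vectors
length(vals)             # number of items read
```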
Bar charts
Example:
Suppose a group of 25 animals is surveyed for
feeding preference. The categories are (1) grass, (2)
shrubs, (3) trees and (4) fruit. The raw data are
3 4 1 1 3 4 3 3 1 3 2 1 2 1 2 3 2 3 1 1 1 1 4 3 1
Let's make a barplot of both frequencies and
proportions…
Bar chart - frequency
Example: Feeding preference
> feed = scan()
1: 3 4 1 1 3 4 3 3 1 3 2 1 2 1 2 3 2 3 1 1 1 1 4 3 1
26:
Read 25 items
> barplot(table(feed))
Note: barplot(feed) is not correct.
Use the table command to create
summarized data; the result is then
passed to barplot, creating the barplot of
frequencies
[Figure: bar plot of frequencies for categories 1 to 4; y-axis from 0 to 10]
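The reason barplot(feed) is wrong is that it would draw 25 bars, one per animal, each as tall as the category code. A small sketch of the tabulate-then-plot step, with the data entered through c() instead of scan() so that it runs as a script (the ylab text is an illustrative addition):

```r
# feeding preferences: 1 = grass, 2 = shrubs, 3 = trees, 4 = fruit
feed <- c(3, 4, 1, 1, 3, 4, 3, 3, 1, 3, 2, 1, 2,
          1, 2, 3, 2, 3, 1, 1, 1, 1, 4, 3, 1)

feed.counts <- table(feed)   # tally how often each category occurs
print(feed.counts)

# one bar per category, height = frequency
barplot(feed.counts, ylab = "Frequency")
```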
Bar chart - proportion
Example cont…
> table(feed)/length(feed)   # divide by n for proportion
feed
   1    2    3    4
0.40 0.16 0.32 0.12
> barplot(table(feed)/length(feed))
[Figure: bar plot of proportions for categories 1 to 4; y-axis from 0.0 to 0.4]
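As a cross-check on the proportions, base R also offers prop.table(), which divides a table by its total. The sketch below assumes the same feeding data, typed in with c(), and confirms that both routes agree and that the proportions sum to 1:

```r
feed <- c(3, 4, 1, 1, 3, 4, 3, 3, 1, 3, 2, 1, 2,
          1, 2, 3, 2, 3, 1, 1, 1, 1, 4, 3, 1)

feed.props  <- table(feed) / length(feed)   # divide counts by n
feed.props2 <- prop.table(table(feed))      # same result via prop.table()

all.equal(as.vector(feed.props), as.vector(feed.props2))
sum(feed.props)                             # proportions add up to 1

barplot(feed.props, ylab = "Proportion")    # ylab is an illustrative label
```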
Pie charts
The same data can be studied with pie charts, using the
pie() function
The following simple examples show usage similar to barplot(), but with some added features
We use names() to assign names to the categories
We add colour to the pie chart by setting the
col argument
The help page (?pie) gives some examples for
automatically getting different colours
Pie charts
> feed.counts = table(feed)                                    # store the table result
> pie(feed.counts)                                             # first pie -- kind of dull
> names(feed.counts) = c("grass","shrubs","trees","fruit")     # give names
> pie(feed.counts)                                             # prints out names
> pie(feed.counts,col=c("purple","green2","cyan","white"))     # with colour
[Figure: three pie charts. "Boring pie" with slices labelled 1 to 4; "Named pie" and "Coloured pie" with slices labelled grass, shrubs, trees, fruit]
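One way to get the colours automatically, rather than naming each one, is a palette function such as rainbow() from base R. This is a sketch under that assumption; the main title is an illustrative addition:

```r
feed <- c(3, 4, 1, 1, 3, 4, 3, 3, 1, 3, 2, 1, 2,
          1, 2, 3, 2, 3, 1, 1, 1, 1, 4, 3, 1)

feed.counts <- table(feed)
names(feed.counts) <- c("grass", "shrubs", "trees", "fruit")

# rainbow(n) returns n distinct colours, one per slice
slice.cols <- rainbow(length(feed.counts))
pie(feed.counts, col = slice.cols, main = "Feeding preference")
```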
Histograms
Histograms are similar to the bar chart, but the bars are
touching
The height can be the frequencies, or the proportions
In the latter case, the areas sum to 1 -- a property you
should be familiar with, since you’ve already studied
probability distributions
In either case the area is proportional to probability
Histograms
To draw a histogram, the hist() function is used
A nice addition to the histogram is to plot the points using
the rug command
As you will see in the next example, it is used to give the
tick marks just above the x-axis. If the data is discrete and
has ties, then the rug(jitter(x)) command will give a little
jitter to the x values to eliminate ties
Histograms
Example:
Suppose a lecturer recorded the number of hours that 15
students spent studying for their exams during one week
29.6 28.2 19.6 13.7 13.0 7.8 3.4 2.0 1.9 1.0 0.7 0.4 0.4 0.3 0.3
Enter the data:
> a=scan()
1: 29.6 28.2 19.6 13.7 13.0 7.8 3.4 2.0 1.9 1.0 0.7 0.4 0.4 0.3 0.3
16:
Read 15 items
Histograms
Draw a histogram:
> hist(a)                     # frequencies
> hist(a,probability=TRUE)    # proportions (or probabilities)
> rug(jitter(a))              # add tick marks
NULL
[Figure: two panels, each titled "Histogram of a", x-axis a from 0 to 30. Left: histogram of frequencies (default), y-axis Frequency. Right: preferred histogram of proportions (total area = 1), y-axis Density]
Note different y-axis
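The "total area = 1" property can be checked numerically, because hist() returns the break points and bar heights it used. A short sketch with the study-hours data entered through c():

```r
a <- c(29.6, 28.2, 19.6, 13.7, 13.0, 7.8, 3.4, 2.0,
       1.9, 1.0, 0.7, 0.4, 0.4, 0.3, 0.3)

h <- hist(a, probability = TRUE)   # bar heights are densities
rug(jitter(a))                     # tick marks, jittered to separate ties

bin.widths <- diff(h$breaks)       # width of each bin
total.area <- sum(h$density * bin.widths)
total.area                         # area of all bars added together
```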
Histograms
The basic histogram has a predefined set of break points
for the bins
You can, however, specify the number of breaks or break
points
Use:
hist(a,breaks=3) or hist(a,3)
Try it….
[Figure: "Histogram of a"; x-axis a from 0 to 30, y-axis Frequency from 0 to 10]
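A single number passed as breaks is treated by R only as a suggestion, whereas a vector of break points is used exactly. The sketch below shows both forms; the break points 0, 10, 20, 30 are chosen here for illustration:

```r
a <- c(29.6, 28.2, 19.6, 13.7, 13.0, 7.8, 3.4, 2.0,
       1.9, 1.0, 0.7, 0.4, 0.4, 0.3, 0.3)

h3 <- hist(a, breaks = 3)    # a suggested number of bins
h3$breaks                    # the break points R actually chose

# explicit break points are used exactly as given
h.fixed <- hist(a, breaks = c(0, 10, 20, 30))
h.fixed$counts               # observations falling in each bin
```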
Boxplots
The boxplot is used to summarize data succinctly, quickly
displaying whether the data is symmetric or has
suspected outliers
Typical boxplot:
[Figure: boxplot labelled with the lower extreme, lower hinge/quartile, median, upper hinge/quartile, upper extreme and whiskers]
Boxplots
To showcase possible outliers, by convention each whisker extends
no further than 1.5 times the box length from the box;
any observations beyond that are plotted as individual points
[Figure: boxplot with Min, Max and Outliers labelled]
Thus, the boxplot allows us to check quickly for
symmetry (or its absence, when the shape looks unbalanced) and
outliers (data points beyond the whiskers)
In the example we see a skewed distribution with a long
tail
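The 1.5 times box length convention can be inspected with boxplot.stats() from base R, which returns the five plotted values and any points beyond the whiskers. The vector v below is made-up data for illustration only, not data from the slides:

```r
# hypothetical data with one obviously extreme value
v <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 30)

s <- boxplot.stats(v)   # default coef = 1.5
s$stats                 # whisker end, lower hinge, median, upper hinge, whisker end
s$out                   # points lying beyond the whiskers

box.length  <- s$stats[4] - s$stats[2]           # upper hinge minus lower hinge
upper.fence <- s$stats[4] + 1.5 * box.length     # furthest a whisker may reach
upper.fence

boxplot(v)              # the extreme value is drawn as a separate point
```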
Boxplots
To draw boxplots, the boxplot function is used
As sample data, let's get R to produce random numbers
from a normal distribution:
> z = rnorm(100)    # generate random numbers
> z                 # list numbers in z
Because the numbers are generated at random,
each time you execute this command, different numbers
will be produced
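If the same random numbers are needed again, for example to reproduce a plot, the random number generator can be seeded first with set.seed(). A minimal sketch; the seed value 42 is arbitrary:

```r
set.seed(42)        # fix the generator state (42 is an arbitrary choice)
z1 <- rnorm(100)

set.seed(42)        # same seed again
z2 <- rnorm(100)

identical(z1, z2)   # the two draws can now be compared
```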
Boxplots
Now you draw a boxplot of the dataset (z, in this case)….
Use the boxplot command, in conjunction with various
arguments
You must indicate the dataset name, but then you can
also label the plot and orientate the plot
The notch argument is useful to put a notch on the boxplot, at
the median
> boxplot(z,main="Horizontal z boxplot",horizontal=TRUE)
> boxplot(z,main="Vertical z boxplot",horizontal=FALSE)   # vertical is the default
> boxplot(z,notch=TRUE)
What do you get, when you try it?
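Besides drawing the plot, boxplot() returns the numbers it used, which helps when interpreting what appears on screen. A sketch, assuming a seeded sample so the result is repeatable (the seed value 1 is arbitrary):

```r
set.seed(1)         # arbitrary seed, only so the example is repeatable
z <- rnorm(100)

# horizontal, notched boxplot; the return value is stored in b
b <- boxplot(z, main = "Horizontal z boxplot", horizontal = TRUE, notch = TRUE)

b$stats             # whisker end, lower hinge, median, upper hinge, whisker end
b$out               # any points plotted beyond the whiskers

all.equal(b$stats[3, 1], median(z))   # the middle value is the median
```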
Boxplots
A side-by-side boxplot to compare two treatments
Data:
experimental: 5 5 5 13 7 11 11 9 8 9
control: 11 8 4 5 9 5 10 5 4 10
> x = c(5, 5, 5, 13, 7, 11, 11, 9, 8, 9)
> y = c(11, 8, 4, 5, 9, 5, 10, 5, 4, 10)
> boxplot(x,y)
[Figure: side-by-side boxplots labelled 1 and 2; y-axis from 4 to 12]
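By default the two boxes are labelled 1 and 2. The names argument of boxplot() replaces these with group labels; the ylab text below is an illustrative assumption, since the slides do not say what was measured:

```r
x <- c(5, 5, 5, 13, 7, 11, 11, 9, 8, 9)    # experimental
y <- c(11, 8, 4, 5, 9, 5, 10, 5, 4, 10)    # control

# label each box with its treatment instead of 1 and 2
boxplot(x, y, names = c("experimental", "control"), ylab = "Response")

median(x)   # centre line of the first box
median(y)   # centre line of the second box
```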
Plotting
The functions plot(), points(), lines(), text(), mtext(), axis(),
identify(), legend() etc. form a suite that plots points, lines,
and text, gives fine control over axis ticks and labels, and
adds a legend as specified
Change the default parameter settings
- persistently (for the current graphics device) using the par() function
- only for the duration of the function call, e.g.,
> plot(x, y, pch="+")   # produces scatterplot using a + sign
Time is limited, but you should be aware of the power of
R and explore these options further
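The difference between the two ways of changing a setting can be sketched as follows. par() returns the previous values, so they can be saved and restored afterwards; the name old.par is illustrative:

```r
x <- c(5, 5, 5, 13, 7, 11, 11, 9, 8, 9)
y <- c(11, 8, 4, 5, 9, 5, 10, 5, 4, 10)

old.par <- par(pch = "+")   # change the default symbol; old value is returned
plot(x, y)                  # uses "+" because the default was changed
par(old.par)                # restore the previous default

plot(x, y, pch = "+")       # "+" applies to this call only
par("pch")                  # the default itself is unchanged
```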
Scatter plots
The plot function will draw a scatter plot
Additional descriptions of the plot can be included
Using the data from the previous example, draw some
scatter plots….
> plot(x)
> plot(x,y)
> plot(y,x)             # change axis
> plot(x,pch=c(2,4))    # print character
> plot(x,col=c(2,4))    # adds colour
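Two points are worth seeing in code: descriptive titles and axis labels are passed as extra arguments, and a short vector such as c(2,4) is recycled across the points, so symbols or colours alternate. The title and label text below are illustrative assumptions:

```r
x <- c(5, 5, 5, 13, 7, 11, 11, 9, 8, 9)    # experimental
y <- c(11, 8, 4, 5, 9, 5, 10, 5, 4, 10)    # control

# main, xlab and ylab add a title and axis labels
plot(x, y, main = "Control vs experimental",
     xlab = "experimental (x)", ylab = "control (y)")

# c(2, 4) is recycled to the number of points, so the symbols alternate
symbols.used <- rep_len(c(2, 4), length(x))
symbols.used
plot(x, pch = c(2, 4), col = c(2, 4))
```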
Linear regression
Linear regression is the name of a procedure that fits a
straight line to the data
Remember the equation of the line: y = b0 + b1x
After plot(x, y) has drawn the points, abline(lm(y ~ x)) will find the
values of b0 and b1 and add the fitted line to the graph
The lm function fits a linear model
The funny syntax y ~ x tells R to model the y variable as
a linear function of x
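Put together, the steps might look like the sketch below, which reuses the x and y vectors from the boxplot example. Storing the model in fit (an illustrative name) lets the coefficients be inspected before the line is drawn:

```r
x <- c(5, 5, 5, 13, 7, 11, 11, 9, 8, 9)
y <- c(11, 8, 4, 5, 9, 5, 10, 5, 4, 10)

fit <- lm(y ~ x)   # model y as a linear function of x
coef(fit)          # b0 is "(Intercept)", b1 is the coefficient named "x"

plot(x, y)         # draw the points first
abline(fit)        # then add the fitted straight line to the existing plot
```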