Introduction to Statistics
Using R
Harry R. Erwin, PhD, School of Computing and Technology, University of Sunderland
Resources
• Crawley, MJ (2005)
Statistics: An Introduction Using R.
Wiley.
• Gonick, L., and Woollcott Smith (1993)
The Cartoon Guide to Statistics.
HarperResource (for fun).
Lecture Outline
• R as a statistical calculator
• Creating data
• Graphing and plotting
• Statistical distributions
• Dataframes
• Summarising data
Using R
• We will work through a few examples of statistical calculations and creating data.
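• y<-c(3,7,9,11) creates a vector from the listed values.
• z<-scan() reads values typed at the keyboard.
• a<-1:6 creates the integer sequence 1 to 6.
• b<-seq(0.5,0.0,-0.1) creates a sequence from 0.5 down to 0.0 in steps of 0.1.
• rep(value,count) creates a vector containing value repeated count times.
• gl(upTo,repeats) generates factor data: levels 1 to upTo, each repeated repeats times.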
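The data-creation calls above can be tried together in a short sketch. The values are illustrative only, and scan() is left out because it waits for keyboard input.

```r
# Build vectors in several ways (values are illustrative)
y <- c(3, 7, 9, 11)            # concatenate explicit values
a <- 1:6                       # integer sequence 1, 2, ..., 6
b <- seq(0.5, 0.0, -0.1)       # descending sequence in steps of 0.1
r <- rep(9, 4)                 # the value 9 repeated 4 times
f <- gl(3, 2)                  # factor with 3 levels, each repeated twice

print(y)
print(a)
print(b)
print(r)
print(f)
```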
Graphics
• Examples of plot()
• ?par for help on graphics parameters
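A minimal sketch of plot() with a graphics parameter set through par(). The data are made up for illustration, and the particular arguments shown (main, xlab, ylab, type, col, mfrow) are only a small selection of what ?par and ?plot describe.

```r
# Made-up data for illustration only
x <- 1:10
y <- x^2

old_par <- par(mfrow = c(1, 2))   # two plots side by side; keep the old settings
plot(x, y, main = "Points", xlab = "x", ylab = "x squared")
plot(x, y, type = "l", col = "blue", main = "Line")
par(old_par)                      # restore the previous graphics parameters
```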
Working with Dataframes
• R works with data in dataframes, objects with rows and columns.
• Each row is an observation or a measurement.
• Each column contains the values of a variable.
• Variable types include numbers, text (factors), dates, or logical variables.
• Columns have names. Rows have row.names.
Reading a Dataframe
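• worms<-read.table("worms.txt", header = TRUE, row.names = 1) reads the file, taking column names from the first line and row names from the first column.
• attach(worms) makes the columns accessible by name.
• names(worms) lists the column names.
• If the row.names or the names are unsuitable, you can assign new values to them.
• worms prints the dataframe.
• summary(worms) summarises each column.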
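A sketch of the reading steps, assuming a small stand-in file. worms.txt itself is not reproduced here, so the file contents, the name fields, and the temporary path are all invented for illustration. attach() is omitted, and columns are reached through the dataframe instead.

```r
# Hypothetical stand-in for worms.txt: the values below are invented
path <- tempfile(fileext = ".txt")
writeLines(c(
  "Field Area Slope Vegetation",
  "North 3.6 11 Grassland",
  "South 5.1 2 Arable",
  "East 2.8 3 Grassland",
  "West 3.9 1 Meadow"
), path)

# header = TRUE: the first line holds the column names
# row.names = 1: the first column supplies the row names
fields <- read.table(path, header = TRUE, row.names = 1)

names(fields)        # column names
row.names(fields)    # row names
summary(fields)      # per-column summary

# Names can be replaced by assignment if they are unsuitable
names(fields)[3] <- "Veg"
```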
Selecting Rows or Columns
• worms[,1:3] for all the rows and columns 1-3.
• worms[5:15,] for the middle rows
• worms[Area>3 & Slope < 3,] for logical tests
• To sort a dataframe, you specify the column to base the sort on and the columns to show: worms[order(worms[,1]),1:6]
• Example of a reverse sort
• Vectors are subscripted the same way (y here is the 13-element vector defined under Vectors):
• z<-13:1
• y[3]
• y[3:7]
• y[c(3,5,6,9)]
• y[-1]
• y[-length(y)]
• y[y>6]
• z[y>6]
• y[y%%3!=0]
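The dataframe selections above can be sketched on a small invented dataframe. The name fields and its values are assumptions, not the real worms data, and rev() is just one way to get a reverse sort.

```r
# Invented data for illustration; not the real worms dataset
fields <- data.frame(
  Area  = c(3.6, 5.1, 2.8, 3.9, 2.2),
  Slope = c(11, 2, 3, 1, 0),
  Damp  = c(FALSE, FALSE, FALSE, TRUE, TRUE),
  row.names = c("North", "South", "East", "West", "Centre")
)

fields[, 1:2]                                   # all rows, columns 1 to 2
fields[2:4, ]                                   # rows 2 to 4, all columns
fields[fields$Area > 3 & fields$Slope < 3, ]    # rows passing a logical test

by_area  <- fields[order(fields[, 1]), ]        # ascending by column 1
rev_area <- fields[rev(order(fields[, 1])), ]   # one way to reverse the sort
```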
Vectors
• y<-c(5,7,7,8,2,5,6,6,7,5,8,3,4)
• Try mean, var, range, max, min, summary, IQR, fivenum
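The summary functions listed above, applied to the same vector y. The comments give a brief reading of each function.

```r
y <- c(5, 7, 7, 8, 2, 5, 6, 6, 7, 5, 8, 3, 4)

mean(y)      # arithmetic mean
var(y)       # sample variance (divides by n - 1)
range(y)     # smallest and largest value
max(y)
min(y)
summary(y)   # minimum, quartiles, median, mean, maximum
IQR(y)       # interquartile range
fivenum(y)   # Tukey's five-number summary
```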
Vector Operations
• * multiplies vectors element by element.
  – If they are not the same length, the shorter vector is repeated as needed.
• To join vectors, use the c() function
• ?c
• Subscripting can be based on a number, vector, or test.
• To drop an element, subscript with a minus sign in front
• Vectors can be combined into a matrix with cbind() (as columns) and rbind() (as rows)
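A sketch of the vector operations above, using the y and z vectors from the earlier slides. The short recycling example with c(1, 2, 3, 4) and c(10, 100) is an added illustration.

```r
y <- c(5, 7, 7, 8, 2, 5, 6, 6, 7, 5, 8, 3, 4)
z <- 13:1

y * z                          # element-wise product of equal-length vectors
c(1, 2, 3, 4) * c(10, 100)     # shorter vector recycled: 10 200 30 400

joined <- c(y, z)              # c() joins vectors end to end

y[3]                # subscript by a single number
y[c(3, 5, 6, 9)]    # by a vector of positions
y[y > 6]            # by a logical test
z[y > 6]            # test on one vector, selection from another
y[y %% 3 != 0]      # values not divisible by 3
y[-1]               # drop the first element
y[-length(y)]       # drop the last element

cbind(y, z)         # 13 x 2 matrix, vectors as columns
rbind(y, z)         # 2 x 13 matrix, vectors as rows
```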
Arrays, etc.
• Arrays are like vectors or dataframes, but can have multiple dimensions.
• Lists can be used to combine data of different types.
• val <- list(varname=value,…)
• Although vectors are subscripted using [], a single list element is extracted with [[]]
• Factors are special
  – citizen <- factor(c("US","US","UK"))
• Examples from book.
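A brief sketch of arrays, lists, and factors. The array contents and the list fields (name, count, readings) are hypothetical, chosen only to show the subscripting. The citizen factor is the one from the slide.

```r
arr <- array(1:24, dim = c(2, 3, 4))   # a 2 x 3 x 4 array
arr[2, 3, 1]                           # one element, one subscript per dimension

# Hypothetical record mixing types
val <- list(name = "site A", count = 3L, readings = c(1.5, 2.5))
val[["readings"]]    # the element itself (a numeric vector)
val$count            # shorthand for val[["count"]]
val["readings"]      # single brackets return a one-element list

citizen <- factor(c("US", "US", "UK"))
levels(citizen)      # distinct categories, sorted alphabetically
table(citizen)       # count per level
```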
Sorting and Ordering
• Never sort a dataframe column on its own. The other columns are not reordered, so the rows no longer match up.
• So don’t use sort()
• Instead use order(), since it leaves the dataframe unmodified. It returns a vector of subscripts, not values, and you can then subscript the dataframe rows with that vector to show it in the new order.
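The difference between sort() and order() can be sketched on a tiny invented dataframe. The names d, site, and score are assumptions for illustration.

```r
# Invented data for illustration
d <- data.frame(
  site  = c("a", "b", "c", "d"),
  score = c(30, 10, 40, 20)
)

sort(d$score)            # sorted values only; the site labels are left behind
idx <- order(d$score)    # positions that put score in ascending order
idx
sorted <- d[idx, ]       # whole rows reordered together
reversed <- d[order(d$score, decreasing = TRUE), ]   # descending order
```

The original d is unchanged throughout; sorted and reversed are new dataframes.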
Table
• Suppose vals is a vector, or a collection of vectors
• table(vals) reports the count of each unique value
• tapply takes three arguments
  – Variable or dataframe to be summarised
  – Variable by which the summary is classified
  – Function to apply
• Examples
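A small sketch of table() and tapply(), following the three-argument pattern described above. The vectors vals, yield, and group are invented for illustration.

```r
# Invented data for illustration
vals <- c("a", "b", "a", "c", "a", "b")
table(vals)                           # counts: a 3, b 2, c 1

yield <- c(10, 12, 14, 20, 22, 24)
group <- factor(c("low", "low", "low", "high", "high", "high"))
means <- tapply(yield, group, mean)   # mean of yield within each group
means
```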
Data Manipulation
• To convert a continuous variable into a categorical variable, use cut(vals,levels), where levels is the number of intervals.
• You can also specify the break points
• split() can be used to generate a list of vectors on the basis of the levels of a factor.
• Example
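One possible example, using invented values. The break points and the labels small, medium, and large are assumptions chosen for illustration.

```r
# Invented data for illustration
vals <- c(1, 4, 6, 9, 12, 15)

cut(vals, 3)                                   # three equal-width intervals
size <- cut(vals, breaks = c(0, 5, 10, 15),    # explicit break points
            labels = c("small", "medium", "large"))
size

groups <- split(vals, size)    # list of vectors, one per factor level
groups$medium
```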
Saving your Work
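• history(Inf) shows the commands entered so far
• savehistory("filename") saves the command history to a file
• save(list=ls(), file = "filename") saves all objects in the workspace
• Tidying up
  – rm(var) removes any temporary variables
  – detach(dataframes) detaches any attached dataframes
• rm(list=ls()) will clean up everything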
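A sketch of saving and restoring objects. The objects and the temporary file path are hypothetical. The history functions are left out because they generally need an interactive session, and load() is added here only to show that the saved objects come back.

```r
# Hypothetical objects and a temporary file, used only for this sketch
y <- c(3, 7, 9, 11)
tmp <- y * 2                              # a temporary variable

path <- tempfile(fileext = ".RData")
save(list = c("y", "tmp"), file = path)   # save the named objects to a file

rm(tmp)                                   # tidy up a temporary variable
rm(y)
load(path)                                # restore the saved objects
exists("y")
```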
Conclusions
• There are other tools and languages
  – Minitab
  – SAS
  – Spreadsheets
• Use what you’re comfortable with.
• But professional statisticians use R.