Introduction to Statistics

Using R

Harry R. Erwin, PhD, School of Computing and Technology, University of Sunderland

Resources

• Crawley, MJ (2005). Statistics: An Introduction Using R. Wiley.

• Gonick, L., and Woollcott Smith (1993). The Cartoon Guide to Statistics. HarperResource (for fun).

Lecture Outline

• R as a statistical calculator
• Creating data
• Graphing and plotting
• Statistical distributions
• Dataframes
• Summarising data

Using R

• We will work through a few examples of statistical calculations and creating data.

• y<-c(3,7,9,11)
• z<-scan()
• a<-1:6
• b<-seq(0.5,0.0,-0.1)
• rep(value,count) creates a vector containing value repeated count times.
• gl(upTo,repeats) can be used to generate factor data: levels 1 to upTo, each repeated repeats times.
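The data-creation commands above can be tried together. This is a minimal sketch; the arguments to rep() and gl() are arbitrary illustrative values, and scan() is left out because it waits for keyboard input.

```r
# Combine explicit values into a numeric vector
y <- c(3, 7, 9, 11)

# Integer sequence 1, 2, ..., 6
a <- 1:6

# Descending sequence from 0.5 to 0.0 in steps of 0.1
b <- seq(0.5, 0.0, -0.1)

# Repeat the value 9 three times (9 and 3 are arbitrary choices)
r <- rep(9, 3)

# Factor with 3 levels, each level repeated twice
f <- gl(3, 2)

print(y)
print(a)
print(b)
print(r)
print(f)
```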

Graphics

• Examples of plot()
• ?par for help on graphics parameters
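One possible plot() example, using invented data; the labels, plotting symbol, and colour are illustrative choices, not taken from the lecture.

```r
# Invented data for illustration only
x <- 1:10
y <- x^2

# Scatterplot with axis labels, a title, filled points, and a colour
plot(x, y,
     xlab = "x", ylab = "x squared",
     main = "A simple scatterplot",
     pch = 16, col = "blue")

# Query one current graphics parameter; ?par documents all of them
par("pch")
```

When run non-interactively (for example with Rscript), the plot is written to a file rather than shown in a window.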

Working with Dataframes

• R works with data in dataframes, objects with rows and columns.

• Each row is an observation or a measurement.
• Each column contains the values of a variable.

• Variable types include numbers, text (factors), dates, or logical variables.

• Columns have names. Rows have row.names.
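As a sketch of this structure, the following builds a small dataframe by hand. All names and values are hypothetical and chosen only to show one column of each variable type.

```r
# Hypothetical field data, invented for illustration only
fields <- data.frame(
  Area       = c(3.6, 5.1, 2.8),                        # numbers
  Vegetation = factor(c("Grassland", "Arable", "Grassland")),  # text stored as a factor
  Surveyed   = as.Date(c("2005-06-01", "2005-06-02", "2005-06-03")),  # dates
  Damp       = c(FALSE, FALSE, TRUE),                   # logical values
  row.names  = c("FieldA", "FieldB", "FieldC")
)

names(fields)       # column names
row.names(fields)   # row names
str(fields)         # type of each column
```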

Reading a Dataframe

• worms<-read.table("worms.txt", header = TRUE, row.names = 1)
• attach(worms)
• names(worms)
• If the row.names or the names are unsuitable, you can assign new values to them.
• worms
• summary(worms)
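The same steps can be tried without worms.txt. This sketch writes a tiny hypothetical file to a temporary location first; the file contents and the replacement names are invented for illustration.

```r
# Write a small hypothetical data file so the example is self-contained
path <- tempfile(fileext = ".txt")
writeLines(c(
  "Field Area Slope Vegetation",
  "A 3.6 11 Grassland",
  "B 5.1 2 Arable",
  "C 2.8 3 Grassland"
), path)

# First line holds the column names; first column holds the row names
demo <- read.table(path, header = TRUE, row.names = 1)
names(demo)

# Assign new values to a column name and to the row names
names(demo)[1] <- "AreaHa"
row.names(demo) <- c("F1", "F2", "F3")

demo
summary(demo)
```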

Selecting Rows or Columns

• worms[,1:3] for all the rows and columns 1-3.

• worms[5:15,] for the middle rows.
• worms[Area>3 & Slope<3,] for logical tests.
• To sort a dataframe, name the column to sort on inside order() and the columns to display: worms[order(worms[,1]),1:6]
• Example of a reverse sort
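A sketch of these selections on a hypothetical dataframe standing in for worms. The values are invented, and the reverse sort shown (decreasing = TRUE) is one of several ways to do it, not necessarily the one used in the lecture.

```r
# Hypothetical data standing in for the worms dataframe
df <- data.frame(
  Area  = c(3.6, 5.1, 2.8, 2.4, 3.8),
  Slope = c(11, 2, 3, 5, 0),
  Worms = c(4, 7, 2, 5, 6)
)

df[, 1:2]                                 # all rows, columns 1 to 2
df[2:4, ]                                 # rows 2 to 4, all columns
picked <- df[df$Area > 3 & df$Slope < 3, ]  # rows passing both tests
picked

ascending <- df[order(df[, 1]), 1:3]      # sorted on column 1, columns 1 to 3 shown
ascending

descending <- df[order(df[, 1], decreasing = TRUE), ]  # reverse sort on column 1
descending
```

Without attach(), the columns are referred to as df$Area and df$Slope inside the logical test.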

• Vector subscripts, using the 13-element y defined under Vectors:
• z<-13:1
• y[3]
• y[3:7]
• y[c(3,5,6,9)]
• y[-1]
• y[-length(y)]
• y[y>6]
• z[y>6]
• y[y%%3!=0]
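These subscripts can be run as follows, assuming y is the 13-element vector from the Vectors slide (it is repeated here so the sketch runs on its own).

```r
y <- c(5, 7, 7, 8, 2, 5, 6, 6, 7, 5, 8, 3, 4)
z <- 13:1

y[3]              # third element
y[3:7]            # elements 3 to 7
y[c(3, 5, 6, 9)]  # elements at chosen positions
y[-1]             # everything except the first element
y[-length(y)]     # everything except the last element
y[y > 6]          # values of y greater than 6
z[y > 6]          # values of z at positions where y is greater than 6
y[y %% 3 != 0]    # values of y that are not multiples of 3
```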

Vectors

• y<-c(5,7,7,8,2,5,6,6,7,5,8,3,4)
• Try mean, var, range, max, min, summary, IQR, fivenum
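The summary functions listed above can be applied to y directly; a minimal sketch:

```r
y <- c(5, 7, 7, 8, 2, 5, 6, 6, 7, 5, 8, 3, 4)

mean(y)      # arithmetic mean
var(y)       # sample variance
range(y)     # smallest and largest value
max(y)
min(y)
summary(y)   # minimum, quartiles, median, mean, maximum
IQR(y)       # interquartile range
fivenum(y)   # Tukey's five-number summary
```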

Vector Operations

• * is element-by-element vector multiplication
  – If the vectors are not the same length, the shorter vector is repeated as needed.
• To join vectors, use the c() function
• ?c
• Subscripting can be based on a number, vector, or test.
• To drop an element, subscript with a minus sign in front
• Vectors can be combined into a matrix with cbind() (as columns) and rbind() (as rows)
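A short sketch of these operations, with invented values:

```r
x <- 1:6
w <- c(10, 100)

prod_xw <- x * w     # w is repeated to match x: (10, 100, 10, 100, 10, 100)
prod_xw

joined <- c(x, w)    # join two vectors end to end
joined

x[-1]                # drop the first element

cols <- cbind(1:3, 4:6)  # vectors become the columns of a 3 x 2 matrix
rows <- rbind(1:3, 4:6)  # vectors become the rows of a 2 x 3 matrix
cols
rows
```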

Arrays, etc.

• Arrays are like vectors or dataframes with multiple dimensions
• Lists can be used to combine data of different types.
• val <- list(varname=value,…)
• Although vectors are subscripted using [], list elements are extracted with [[]]
• Factors are special
  – citizen <- factor(c("US","US","UK"))
• Examples from book.
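A sketch of an array, a list, and a factor. The list contents and array dimensions are hypothetical; only the citizen factor comes from the slide.

```r
# Array: 24 values arranged in 2 rows, 3 columns, 4 layers
arr <- array(1:24, dim = c(2, 3, 4))
arr[2, 3, 4]            # one element, addressed by three subscripts

# List combining a character string and a numeric vector (invented contents)
val <- list(name = "Ann", scores = c(1, 2, 3))
val[["scores"]]         # [[ ]] extracts the element itself
val$name                # $ is shorthand for [[ ]] with a name
val["scores"]           # [ ] returns a list of length one

# Factor: text values stored with a fixed set of levels
citizen <- factor(c("US", "US", "UK"))
levels(citizen)
table(citizen)
```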

Sorting and Ordering

• Never sort a dataframe column on its own. The other columns are not reordered with it, so the rows no longer line up.

• So don’t use sort()
• Instead use order(), since it leaves the dataframe unmodified. It returns a vector of subscripts, not values; you can then subscript the dataframe with that vector to show it in the new order.
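The difference between sort() and order() can be seen on a small hypothetical dataframe (names and scores are invented):

```r
df <- data.frame(Name = c("c", "a", "b"), Score = c(30, 10, 20))

sort(df$Score)           # sorted values only; Name is left behind

idx <- order(df$Score)   # subscripts that would put Score in ascending order
idx

sorted <- df[idx, ]      # whole rows shown in the new order
sorted

df                       # the original dataframe is unchanged
```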

Table

• Suppose vals is a collection of vectors
• table(vals) reports the count of each unique value
• tapply takes three arguments
  – Variable or dataframe to be summarised
  – Variable by which the summary is classified
  – Function to apply
• Examples
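A minimal sketch of table() and tapply(), using invented values and hypothetical variable names (growth, group):

```r
# Count how often each unique value occurs
vals <- c("a", "b", "a", "c", "a")
counts <- table(vals)
counts

# Mean of growth within each group:
# first argument is summarised, second classifies, third is the function
growth <- c(10, 20, 30, 40)
group  <- c("x", "y", "x", "y")
means <- tapply(growth, group, mean)
means
```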

Data Manipulation

• To convert a continuous variable into a categorical variable, use cut(vals,levels), where levels is the number of intervals
• You can also specify the break points
• split() can be used to generate a list of vectors on the basis of the levels of a factor.

• Example
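One possible example, with invented values and break points:

```r
x <- c(1, 4, 6, 9, 12)

# Cut into 3 intervals of equal width chosen by R
thirds <- cut(x, 3)
thirds

# Cut at explicit break points
size <- cut(x, breaks = c(0, 5, 10, 15))
size

# Split a vector into a list, one component per factor level
measure <- c(10, 20, 30, 40)
grp <- factor(c("a", "b", "a", "b"))
groups <- split(measure, grp)
groups
```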

Saving your Work

• history(Inf)
• savehistory("filename")
• save(list=ls(), file = "filename")
• Tidying up
  – rm(var) any temporary variables
  – detach(dataframes)
• rm(list=ls()) will clean up everything
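A sketch of saving and restoring objects. The object names and the temporary file path are hypothetical, and the history commands are omitted because they only apply to an interactive session.

```r
# Hypothetical objects to save
scores <- c(5, 7, 7, 8)
label <- "demo"

# Temporary file for illustration; choose your own path in practice
path <- tempfile(fileext = ".RData")

# Save the named objects (save(list = ls(), file = path) would save everything)
save(list = c("scores", "label"), file = path)

# Tidy up, then restore the objects from the file
rm(scores, label)
load(path)
scores
label
```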

Conclusions

• There are other tools and languages
  – Minitab
  – SAS
  – Spreadsheets
• Use what you’re comfortable with.

• But professional statisticians use R.