An Introduction to R

Introduction to R
A workshop hosted by STAT CLUB
November 19, 2014
Outline
I. About R
II. Getting started
III. The very basics
IV. Importing data
V. Common commands
7/20/2015
I. About R
What is R?
• Free, open-source software with its own language
• Works on all operating systems
• Extensive: LOTS of built-in functions and downloadable packages
available
• Flexible: can define your own functions, modify existing commands,
customize graphics, etc.
• Powerful: can do all sorts of analyses and can handle large data sets
• Integrates with other environments such as Excel, LaTeX, Hadoop, etc.
II. Getting started
Installing R
• Download from http://lib.stat.cmu.edu/R/CRAN/
• Select the correct platform
• Download the “base” package
• Run the set-up
• It’s quick, easy, and free!
Interacting with R
• The command prompt ‘>’ indicates that we can begin typing a
command
• Hit the Esc key to cancel a partially typed line of code
• Basic rule: type a command and hit ‘enter’ to execute it
• Hit the “Stop” button at top to stop running a line of code
• For example: x = 1:100 creates a vector of values 1,2,…,100
R Script/Editor
• A file where you can write, edit, and save code
• Go to File > New script
• When you have typed in the code you want to run, highlight the
chunk you want to run and either hit ‘ctrl+R’ or right-click and select
“Run line or selection”
• You can save this script for later use by hitting ‘ctrl+S’ or going to File >
Save while the Script window is active
• Will NOT save any results of running the commands – saves the text script
only
Workspaces
• An R workspace includes all the functions and variables (called
“objects”) defined in a session
• The result of a command is stored in the workspace only if you assign it
to a name; output that is just printed to the console is not kept
• Can be saved by going to File > Save Workspace
• Load workspace by going to File > Load Workspace
• If you want to clear some part of the workspace, use rm()
• Use ls() to list the objects currently in the workspace
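A minimal sketch of the workspace commands above. The object names x and y are only illustrative and are not part of the workshop material.

```r
# Create two objects; both now live in the workspace
x <- 1:100
y <- "age"

ls()    # lists the objects defined so far, including "x" and "y"

rm(x)   # remove x only
ls()    # "x" is gone, "y" is still there
```

To clear everything at once, rm(list = ls()) is commonly used, but it cannot be undone.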
Working directory
• Where everything will be saved/loaded from by default
• It is, by default, usually in My Documents
• Can change this under File > Change dir
• If you want to save/load from a different place, can usually just type
the file path into the name of the file and R will find it
• Makes it easier to always work from your working directory
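The slide describes the menu route. As a hedged sketch, the same can be done from the console with getwd() and setwd(), which are not mentioned on the slide. The temporary folder from tempdir() is used here only so the example runs anywhere.

```r
old_dir <- getwd()   # remember the current working directory
setwd(tempdir())     # switch to another folder (here: R's temporary folder)
getwd()              # confirm where files will now be saved/loaded from
setwd(old_dir)       # switch back
```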
R packages
• Collections of R functions and datasets created by others
• Many standard packages included, others have to be downloaded
• If you know the name of the package, can install it by going to
Package > Install Packages or by using
install.packages("packagename") in command line
• Even if a package is installed on your computer, R will not
automatically use it, so if you need to use a function from a package,
use library("packagename") in command line
Use as a calculator
• Basic arithmetic can be done intuitively: 12+5, 3/8, 17+(6*5), 5^2, etc.
• Don’t use square brackets [ ] or curly braces { } to group terms! They mean something else in R. Use parentheses ( )
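The expressions from the slide can be typed straight into the console. A short sketch, with results stored in illustrative names so they can be reused:

```r
a <- 12 + 5        # 17
b <- 3 / 8         # 0.375
d <- 17 + (6 * 5)  # 47; parentheses group the multiplication
e <- 5^2           # 25
```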
R commands and language
• Mostly in the form of functions: mean(x), plot(x,y), etc.
• CaSe SenSitVe!
• Spaces don’t usually mean anything
• Can use periods ‘.’ and underscores ‘_’ in object names
Getting help in R
• Go to Help menu
• If you know the exact command that you need help with, can type ‘?’
before the command name in the console (e.g. ?mean), and this will bring up the
documentation for that command
• If you do not know the exact command but have an idea of what it
might look like or what words may be used in the description, type
‘??’ before the search term (e.g. ??regression)
• Google!
III. The very basics
Basics to know first
• Creating your own Objects (variables, vectors, matrices, lists,
functions, etc.)
• Assigning names to these objects
• Learning to access objects
• Performing simple calculations and transformations on these objects
Types of objects
Functions that you can perform on or with objects depend on their “class” or type:
• Numeric (double-precision numbers)
  • Double (same as numeric)
  • Integer (integer-valued; rarely used)
• Character (strings, non-numerical values)
• Matrix (matrix of numerical values)
• Logical (Boolean: TRUE/FALSE)
• Factor (“groups” or levels)
• List (list of other types of objects)
• Dataframe (table or other collection of data that is numerical or non-numerical)
• Functions (functions that take inputs)
• To find out which class a variable belongs to, use class()
• To determine the dimensions of an object use dim()
• Verify a class by using is.numeric(), is.character(), is.logical(), is.data.frame(), etc.
• Change a class by using as.numeric(), as.character(), as.logical(), as.data.frame(), etc.
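A small sketch of checking and changing classes with the functions named above. The values and object names are made up for illustration.

```r
x <- 36
class(x)                   # "numeric"
is.numeric(x)              # TRUE
is.character(x)            # FALSE

x_chr <- as.character(x)   # convert the number to a string
class(x_chr)               # "character"

m <- matrix(1:6, nrow = 2, ncol = 3)
dim(m)                     # 2 3 (rows, then columns)
```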
Single value
• Use '=' or '<-' to assign a name to a value
• Use quotation marks if not a numerical value
• Example: x = 36
• Example: y <- "age"
Vectors
• Vector: c()
• Use '=' or '<-' to assign a name to a vector
• If the vector contains non-numerical values, use quotation marks
• Example: mileage = c(1200,200,6700,1000,1200)
• Example: type <- c("Compact", "Minivan", "SUV", "Roadster", "Truck")
Matrix
• Matrix: matrix(data=c(2,3,4,5), nrow=2, ncol=2)
• data = vector of values you want entered in (enters in by COLUMN!)
• nrow = number of rows
• ncol = number of columns
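To illustrate the fill-by-column behavior, here is a sketch using the same values as the slide. The byrow argument is not mentioned on the slide; it is a standard option of matrix() that is shown here for contrast.

```r
# Default: values fill the first column, then the second
m_col <- matrix(data = c(2, 3, 4, 5), nrow = 2, ncol = 2)
m_col
#      [,1] [,2]
# [1,]    2    4
# [2,]    3    5

# byrow = TRUE fills the first row, then the second
m_row <- matrix(data = c(2, 3, 4, 5), nrow = 2, ncol = 2, byrow = TRUE)
m_row
#      [,1] [,2]
# [1,]    2    3
# [2,]    4    5
```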
Dataframe
• Like a table
• Can contain both numerical and string variables
• Use data.frame(vars)
Lists
• Each element in a list can be ANY object – vector, matrix, dataframe,
even another list!
• Use list(vars)
Functions
• Creating functions is more complex
• Of the form: g <- function(var1,var2) {var1 + var2}
• g is the function name
• var1, var2 are the input variables
• The body of the function goes in the curly brackets
• To use the function: g(input1, input2)
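The function from the slide, written out and called. The input values 2 and 3 are arbitrary.

```r
# Define a function g that adds its two inputs
g <- function(var1, var2) {
  var1 + var2   # the last expression evaluated is the return value
}

result <- g(2, 3)   # 5
```

Because arithmetic is element-wise, g(c(1, 2), c(10, 20)) also works and returns 11 22.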
IV. Importing data
Importing data
• Can import from many formats (.txt, .csv, .xls, .xlsx, .sav, .dta, .ssd, …)
• Recommend .txt or .csv; others need packages
• If in working directory:
data1 = read.table("mydata.txt", header=TRUE, sep=",")
• header=TRUE indicates that a row of column headings/titles is included in the file;
set to FALSE if not
• sep="," indicates that a comma separates the values in each row, like in a .csv; can remove this
argument if separated by whitespace (spaces or tabs, the default); or can modify if separated by something
else
• If not in working directory, use file path:
data1 = read.table("C:/Users/xyz/Desktop/folder/mydata.txt",
header=TRUE, sep=",")
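A self-contained sketch of the import step. Since no data file ships with these slides, the example first writes a tiny comma-separated file to a temporary folder; the file contents and column names are invented for illustration.

```r
# Write a small example file (hypothetical contents)
path <- file.path(tempdir(), "mydata.txt")
writeLines(c("type,mileage",
             "Compact,1200",
             "Minivan,200",
             "SUV,6700"),
           path)

# Read it back: first row holds column names, values are comma-separated
data1 <- read.table(path, header = TRUE, sep = ",")

names(data1)   # "type" "mileage"
nrow(data1)    # 3
```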
Working with data sets
• Attach datasets to the current space: attach(dataset)
• Use a variable from a dataset: dataset$varname
• Retrieve the names of the variables: names(dataset)
• Take a subset of your data according to some criterion:
subset(dataset,criterion)
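A brief sketch of these commands on a small data frame. The data reuse the car values from the vector examples; the cutoff of 1000 is arbitrary.

```r
dataset <- data.frame(
  type    = c("Compact", "Minivan", "SUV", "Roadster", "Truck"),
  mileage = c(1200, 200, 6700, 1000, 1200)
)

names(dataset)       # "type" "mileage"
dataset$mileage      # the mileage column as a vector

# Keep only the rows where mileage exceeds 1000
high <- subset(dataset, mileage > 1000)
nrow(high)           # 3
```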
V. Common commands
Arithmetic/calculator
• Add: +
• Subtract: -
• Multiply: *
• Divide: /
• Raise to a power: ^
• Natural logarithm: log()
• Exponential function (e^x): exp()
• Square root: sqrt()
Vector commands
• Create a vector of numbers: c(num1,num2)
• Combine vectors together to create one: c(vec1,vec2)
• Create a vector of numbers from a to b in increments of 1: a:b
• Create a vector of numbers from a to b in increments of d: seq(a,b,d)
• Create a vector of numbers from a to b in equal increments such that
there are k total numbers: seq(a,b,length=k)
• Return the number of elements in a vector x: length(x)
• Sort entries in a vector x: sort(x, decreasing=FALSE)
• Element-wise arithmetic: 3*x, 4+x, log(x), sqrt(x), etc.
• Arithmetic of two vectors x and y will be element-wise: x*y, x+y, etc.
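A short sketch running the vector commands above on made-up values. It uses the full argument name length.out, which length=k matches.

```r
a_to_b   <- 1:5                          # 1 2 3 4 5
by_two   <- seq(0, 10, 2)                # 0 2 4 6 8 10
five_pts <- seq(0, 1, length.out = 5)    # 0.00 0.25 0.50 0.75 1.00

x <- c(3, 1, 2)
y <- c(10, 20, 30)

combined <- c(x, y)                      # 3 1 2 10 20 30
length(combined)                         # 6
sorted_x <- sort(x, decreasing = FALSE)  # 1 2 3

scaled  <- 3 * x                         # 9 3 6
product <- x * y                         # 30 20 60
```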
Matrix commands
• Create a matrix: matrix(vals,nrow,ncol)
• Create a diagonal matrix: diag(vals)
• Multiply matrices M1 and M2: M1 %*% M2
• Note that M1*M2 will be element-wise
• Find the determinant of matrix M: det(M)
• Find inverse of matrix M: solve(M)
• Find transpose of matrix M: t(M)
• Combine matrices by column: cbind(M1,M2)
• Combine matrices by row: rbind(M1,M2)
• Find dimensions of a matrix M: dim(M)
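A sketch contrasting matrix multiplication with element-wise multiplication, followed by the other matrix commands. The matrices are small made-up examples.

```r
M1 <- matrix(c(2, 3, 4, 5), nrow = 2, ncol = 2)  # rows: (2, 4) and (3, 5)
M2 <- diag(c(1, 2))                              # diagonal matrix with 1 and 2

prod_matrix  <- M1 %*% M2   # true matrix product: rows (2, 8) and (3, 10)
prod_element <- M1 * M2     # element-wise: rows (2, 0) and (0, 10)

det(M1)                     # -2, up to floating-point rounding
M1_inv <- solve(M1)         # inverse; M1_inv %*% M1 is the identity
t(M1)                       # transpose: rows (2, 3) and (4, 5)

dim(cbind(M1, M2))          # 2 4
dim(rbind(M1, M2))          # 4 2
```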
Retrieving parts of objects
• Return the kth element of a vector x: x[k]
• Return the (i,j)th element of a matrix x (row i, column j): x[i,j]
• Return the kth object of a list x: x[[k]]
• Return the ith element of the kth object of a list x: x[[k]][i]
• Return the element or object called “name”: x$name
• Can retrieve more than one element at a time
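A sketch of each indexing form above, using made-up objects. The names v, M, and lst are illustrative only.

```r
v <- c(10, 20, 30, 40)
v[2]           # 20
v[c(1, 3)]     # 10 30 (more than one element at a time)

M <- matrix(1:6, nrow = 2)   # columns are (1,2), (3,4), (5,6)
M[2, 3]        # 6
M[1, ]         # entire first row: 1 3 5

lst <- list(name = "age", values = c(5, 6, 7))
lst[[2]]       # 5 6 7
lst[[2]][3]    # 7
lst$name       # "age"
```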
Summaries and statistics
• Mean: mean(x)
• Standard deviation: sd(x)
• Median: median(x)
• Minimum: min(x)
• Maximum: max(x)
• Range(min and max): range(x)
• Sum: sum(x)
• Which index contains the minimum value: which.min(x)
• Which index contains the maximum value: which.max(x)
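The summary functions applied to a small made-up vector:

```r
x <- c(4, 8, 15, 16, 23, 42)

mean(x)        # 18
median(x)      # 15.5
sd(x)          # sample standard deviation
min(x)         # 4
max(x)         # 42
range(x)       # 4 42
sum(x)         # 108
which.min(x)   # 1 (position of the smallest value)
which.max(x)   # 6 (position of the largest value)
```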
Logical
• Operators: > greater than, >= greater than or equal to, < less than, <=
less than or equal to, == equal to, != not equal to, & and, | or
• Entering an expression built from these operators returns a Boolean (TRUE
or FALSE) vector
• R often treats TRUE as 1 and FALSE as 0 so that you can conduct
mathematical operations on them
• Return indices of a vector that satisfies criterion: which(x > 45)
• To get the actual value: x[x>45]
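A sketch of logical vectors in action, using a made-up vector and the cutoff 45 from the slide:

```r
x <- c(30, 50, 45, 60)

above <- x > 45         # FALSE TRUE FALSE TRUE
idx   <- which(x > 45)  # 2 4 (positions that satisfy the criterion)
vals  <- x[x > 45]      # 50 60 (the values themselves)

n_above <- sum(x > 45)  # 2, because TRUE counts as 1 and FALSE as 0
both    <- x > 45 & x < 55   # FALSE TRUE FALSE FALSE
```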
Logical
• If-then statements: if (criterion) {command} else {command}
• For-loops: for (i in x){ commands }
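A minimal sketch combining the two constructs. The loop visits each value of a made-up vector and adds or subtracts it depending on a condition.

```r
x <- c(3, 8, 12)
total <- 0

for (i in x) {            # i takes the values 3, 8, 12 in turn
  if (i > 5) {
    total <- total + i    # values above 5 are added
  } else {
    total <- total - i    # other values are subtracted
  }
}

total   # -3 + 8 + 12 = 17
```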
Apply functions
• Apply a function to rows or columns of a matrix:
  • apply(M, 1, mean) takes the mean of each row
  • apply(M, 2, sum) sums each column
• Apply a function to each element of a vector, list or data.frame: sapply(L, length)
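The apply calls above, run on a small made-up matrix and list:

```r
M <- matrix(1:6, nrow = 2)           # rows: (1, 3, 5) and (2, 4, 6)

row_means <- apply(M, 1, mean)       # 3 4
col_sums  <- apply(M, 2, sum)        # 3 7 11

L <- list(a = 1:3, b = letters)      # letters is R's built-in a to z vector
lengths_of_L <- sapply(L, length)    # a = 3, b = 26
```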
Plots
• Scatterplot of a vector x and vector y: plot(x,y)
• Add points to an already-existing scatterplot: points(xvals,yvals)
• Add a line to an already-existing scatterplot: lines(xvals,yvals)
• Histogram of a vector of values x: hist(x)
[Figure: output of hist(err), titled "Histogram of err"; x-axis: err, y-axis: Frequency]
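A sketch of the plotting commands using simulated data. The relationship y = 2x + err and the use of rnorm() to generate err are assumptions made only so there is something to plot.

```r
set.seed(1)             # make the random numbers reproducible
x   <- 1:20
err <- rnorm(20)        # simulated noise (assumed for illustration)
y   <- 2 * x + err

plot(x, y)                          # scatterplot
lines(x, 2 * x)                     # add the line y = 2x
points(x[1:3], y[1:3], pch = 19)    # re-draw the first three points as filled dots

hist(err)                           # histogram of the noise values
```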
Tables
• Create a table of frequencies from a vector of values x: table(x)
• Create a two-way table between vectors x and y of same length:
table(x,y)
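A short sketch of both kinds of table on made-up categorical vectors:

```r
x <- c("a", "b", "a", "c", "a")
y <- c("u", "v", "u", "u", "v")

freq <- table(x)         # counts: a = 3, b = 1, c = 1
two_way <- table(x, y)   # rows are values of x, columns are values of y

two_way["a", "u"]        # 2 observations with x = "a" and y = "u"
```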
Linear regression
• Linear regression of y on x: lm(y~x)
• Can get more info using summary(model)
Working with datasets: linear regression
• Can find all objects in the model: names(model)
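A sketch of fitting and inspecting a regression. The data are made up, and the fit is saved under the name model so that summary(model) and names(model) have something to work on.

```r
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

model <- lm(y ~ x)     # regress y on x and save the fitted model

summary(model)         # coefficient table, R-squared, etc.
names(model)           # components stored in the model object
model$coefficients     # intercept and slope
model$residuals        # observed minus fitted values
```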
Hypothesis tests
• One-sample t-test for vector of values x: t.test(x, alternative="two.sided", mu=0)
• Two-sample t-test between vectors x and y: t.test(x,y)
• Chi-square test of independence in two-way table "tab": chisq.test(tab)
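A sketch of the three tests on made-up data. Saving each result lets you pull out pieces such as the p-value afterwards.

```r
x <- c(5.1, 4.9, 6.2, 5.8, 6.0, 5.5)
y <- c(4.2, 4.8, 5.0, 4.5, 4.9, 4.4)

one_sample <- t.test(x, alternative = "two.sided", mu = 0)
one_sample$p.value     # p-value for H0: mean of x equals 0

two_sample <- t.test(x, y)
two_sample$p.value     # p-value for H0: the two means are equal

tab <- matrix(c(20, 30, 30, 20), nrow = 2)   # a 2x2 table of counts
indep <- chisq.test(tab)
indep$p.value          # p-value for H0: rows and columns are independent
```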
Final warnings!
• Floating point arithmetic is not exact!
• Missing values are not excluded by default – must use
na.rm = TRUE option
• Combining different classes in one vector will coerce all entries to the same
class
• Some things, such as quotation marks, cannot be
easily copied and pasted into R from other applications
such as Word
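Three of these warnings can be seen directly in the console. A brief sketch with made-up values; all.equal() is one common way to compare numbers within a tolerance, and is not mentioned on the slide.

```r
# Floating point arithmetic is not exact
exact_equal <- (0.1 + 0.2 == 0.3)                 # FALSE
close_equal <- isTRUE(all.equal(0.1 + 0.2, 0.3))  # TRUE (compares within a tolerance)

# Missing values are not excluded by default
v <- c(1, NA, 3)
mean_with_na <- mean(v)                  # NA
mean_no_na   <- mean(v, na.rm = TRUE)    # 2

# Combining classes coerces everything to one class
mixed <- c(1, "a", TRUE)
class(mixed)                             # "character"
```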