MATLAB/R Dictionary
R meetup NYC, January 7, 2010
Harlan Harris, [email protected], @HarlanH
Marck Vaisman, [email protected], @wahalulu
MATLAB and the MATLAB logo are registered trademarks of The Mathworks.
About MATLAB

What is MATLAB
• Commercial numerical programming language, with simulation and visualization
• One million users (engineers, scientists, academics)
• MATrix LABoratory – specializes in matrix operations
• Mathworks - base & add-ons
• Open-source Octave project

MATLAB History
• Developed by Cleve Moler (Math/CS Prof at UNM) in the 1970s as a higher-level numerical programming language (vs. Fortran LINPACK)
• Adopted by engineers for signal processing, control modeling
• Multipurpose programming language

Notes
• Today's focus: compare MATLAB & R for data analysis, contrast them as programming languages
• MATLAB is Base plus many toolboxes
  – Base includes: descriptive stats, covariance and correlation, linear and nonlinear regression
  – Statistics toolbox adds: dataset and category arrays (like data.frames and factors), more visualizations, distributions, ANOVA, multivariate regression, hypothesis tests
• Interactive programming: scripts and a Read-Evaluate-Print Loop
• Similar representations of data
  – Both use vectors/arrays as the primary data structures
    • MATLAB is based on 2-D matrices; R is based on 1-D vectors
  – Both prefer vectorized functions to for loops
  – Variables are declared dynamically
• Can do most MATLAB functionality in R; can do most R functionality in MATLAB.
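To make two of the notes above concrete (vectorized functions preferred over for loops, and R being built on 1-D vectors), here is a minimal R sketch. The vector x and the names squares_loop, squares_vec and m are made up for illustration and do not come from the talk.

```r
# Hypothetical data, used only for illustration
x <- c(1.5, 2.0, 3.5, 4.0)

# Loop version: fill the result one element at a time
squares_loop <- numeric(length(x))
for (i in seq_along(x)) {
  squares_loop[i] <- x[i]^2
}

# Vectorized version: the operator is applied to every element at once
squares_vec <- x^2

# A plain R vector has no dimensions ...
print(is.null(dim(x)))

# ... while a matrix is the same kind of vector with a dim attribute attached
m <- matrix(x, nrow = 2)
print(dim(m))
```

Both versions give the same result; the vectorized form is shorter and is the style both languages encourage.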
The basics: vectors, matrices and indexing

Task: Create a row vector
  MATLAB: v = [1 2 3 4]
  R:      v <- c(1,2,3,4)

Task: Create a column vector
  MATLAB: v = [1;2;3;4] or v = [1 2 3 4]'
  R:      v <- c(1,2,3,4)
          Note: R does not distinguish between row and column vectors

Task: Enter a matrix A
  MATLAB: A = [1 2 3; 4 5 6]
  R:      Enter values by row: A <- matrix(c(1,2,3,4,5,6), nrow=2, byrow=TRUE)
          Enter values by column: A <- matrix(c(1,4,2,5,3,6), nrow=2)

Task: Access third element of vector v
  MATLAB: v(3)
  R:      v[3] or v[[3]]

Task: Access element of matrix A
  MATLAB: A(2,3)
  R:      A[2,3]

Task: "Glue" two matrices a1 and a2, same number of rows, side by side
  MATLAB: A = [a1 a2]
  R:      A <- cbind(a1,a2)

Task: "Stack" two matrices a1 and a2, same number of columns
  MATLAB: A = [a1;a2]
  R:      A <- rbind(a1,a2)

Task: Reshape matrix A, making it an m x n matrix with elements taken columnwise from A
  MATLAB: A = reshape(A,m,n)
  R:      dim(A) <- c(m,n)

Operators

Task: Assignment
  MATLAB: =
  R:      <- or =

Task: Whole matrix operations
  Multiplication:
    MATLAB: A*B
    R:      A %*% B
  Square the matrix:
    MATLAB: A^2
    R:      A %*% A
  Raise to power k:
    MATLAB: A^k
    R:      A %*% A %*% A … (k factors)

Task: Element-by-element operations
  MATLAB: A.*B   A./B   A.^k
  R:      A*B    A/B    A^k

Task: Compute A^(-1) B
  MATLAB: A\B
  R:      solve(A, B) or solve(A) %*% B

Task: Sums
  Columns of matrix:
    MATLAB: sum(A)
    R:      colSums(A)
  Rows of matrix:
    MATLAB: sum(A,2)
    R:      rowSums(A)

Task: Logical operators (element-by-element on vectors/matrices)
  MATLAB: a < b, a > b, a <= b, a >= b
          a == b
          a ~= b
          AND: a & b (element-wise), a && b (short-circuit, scalars only)
          OR: a | b (element-wise), a || b (short-circuit, scalars only)
          XOR: xor(a,b)
          NOT: ~a
  R:      a < b, a > b, a <= b, a >= b
          a == b
          a != b
          AND: a && b (short-circuit), a & b (element-wise)
          OR: a || b (short-circuit), a | b (element-wise)
          XOR: xor(a,b)
          NOT: !a

Working with data structures

Task: Build a structure v of length n, capable of containing different data types in different elements.
  MATLAB: cell array
          v = cell(1,n)
          In general, cell(m,n) makes an m × n cell array. Then you can do e.g.:
          v{1} = 12
          v{2} = 'hi there'
          v{3} = rand(3)
  R:      list
          v <- vector('list',n)
          Then you can do e.g.:
          v[[1]] <- 12
          v[[2]] <- 'hi there'
          v[[3]] <- matrix(runif(9),3)

Task: Create a matrix-like object with different named columns.
  MATLAB: struct array
          avals = 2*ones(1,6);
          yyvals = 6:-1:1;
          v = [1 5 3 2 3 7];
          d = struct('a', avals, 'yy', yyvals, 'fac', v);
  R:      data.frame
          v <- c(1,5,3,2,3,7)
          d <- data.frame(cbind(a=2, yy=6:1), v)

Conditionals, control structures, loops

Task: for loops over values in vector v
  MATLAB: for i=v
            command1
            command2
          end
  R:      If only one command:
          for (i in v) command
          If multiple commands:
          for (i in v) {
            command1
            command2
          }

Task: If/else statement
  MATLAB: if cond
            command1
            command2
          else
            command3
            command4
          end
          MATLAB also has the elseif statement.
  R:      if (cond) {
            command1
            command2
          } else {
            command3
            command4
          }
          R uses chained "else if" statements.

Task: ifelse() function
  R:      > print(ifelse(c(T,F), 2, 3))
          [1] 2 3

Help!

Task: Get help on a function
  MATLAB: help fminsearch
  R:      help(pmin) or ?pmin

Task: Search the help for a word
  MATLAB: lookfor inverse
  R:      ??inverse

Task: Describe a variable
  MATLAB: class(a)
  R:      class(a)
          str(a)

Task: Show variables in environment
  MATLAB: who
  R:      ls()

Task: Underlying type of variable
  MATLAB: whos('a')
  R:      typeof(a)

Example: k-means clustering of Fisher Iris data

Fisher Iris Dataset
sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
4.7,3.2,1.3,0.2,setosa
4.6,3.1,1.5,0.2,setosa
…

MATLAB and R as programming languages
• MATLAB: Scripting, real-time analysis; R: Scripting, real-time analysis
• MATLAB: File-based environments; R: Files unimportant
• MATLAB: Imperative programming style; R: Functional programming style (impure)
• MATLAB: Statically scoped; R: Lexically scoped, but environments can be inspected and modified at run time
• MATLAB: Functions with multiple return values; R: Functions with named arguments, lazy evaluation
• MATLAB: Evolving OOP system; R: Multiple competing OOP systems
• MATLAB: Can be compiled; R: Cannot be compiled
• MATLAB: Large library of functions, professionally developed, costs money; R: Large library of functions, varying quality and support
• MATLAB: Can embed (in) many other languages; R: Can embed (in) many other languages

Functions

MATLAB:
function [a, b] = minmax(z) % one function per .m file!
% assign to formal return names
a = min(z);
b = max(z);
end

% if minmax.m in path
[smallest, largest] = ...
    minmax([1 30 3])

R:
minmax <- function(z, opt=12) {
  # functions are assigned to variables
  ret <- list(min = min(z), max = max(z))
  ret  # last statement is return value
}

# if minmax was created in current environment
x <- minmax(c(1, 30, 3))
smallest <- x$min

Object-Oriented Programming

MATLAB
• Formerly: objects were defined by a directory tree, with one method per file
• As of 2008: new classdef syntax resembles other languages

R
• S3 classes: attributes + syntax
  – class(object)
  – plot.lm()
• S4 classes: definitions + methods
• R.oo, proto, etc.

Other notes
• R.matlab package
• Graphics
  – MATLAB has much better 3-D/interactive graphics support
  – R has ggplot2 and much better statistical graphics

Additional Resources
• Will Dwinnell, Data Mining in MATLAB
• Computerworld article on Cleve Moler
• Mathworks
• Matlabcentral
• Comparison of Data Analysis packages (http://anyall.org/blog/2009/02/comparison-of-data-analysis-packages-r-matlab-scipy-excel-sas-spss-stata/)
• R.matlab package
• stackoverflow

References used for this talk
• David Hiebeler, MATLAB/R Reference document: http://www.math.umaine.edu/~hiebeler/comp/matlabR.html
• http://www.cyclismo.org/tutorial/R/index.html
• http://www.stat.berkeley.edu/~spector/R.pdf
• MATLAB documentation
• http://www.r-cookbook.com/node/23

Thank You!
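Backup: the k-means example on the Fisher Iris slide is named in this transcript without its code. The following is a minimal R sketch of what such an analysis could look like, not the code shown in the talk. It assumes R's built-in iris data frame (whose columns are named Sepal.Length etc. rather than the CSV header above), three clusters, and an arbitrary seed; the names measurements and fit are made up for illustration.

```r
# Built-in iris data frame: four numeric measurements plus Species
data(iris)
measurements <- iris[, 1:4]

# Arbitrary seed so the random starting centers are repeatable
set.seed(42)

# k-means with 3 clusters (one per species is an assumption), 20 random starts
fit <- kmeans(measurements, centers = 3, nstart = 20)

# Cross-tabulate the cluster labels against the known species
print(table(cluster = fit$cluster, species = iris$Species))
```

The cross-tabulation shows how closely the unsupervised clusters line up with the species labels, which the clustering itself never sees.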