Introduction to Statistical Analysis Using R

Nick, Caroline, Tanya
What is R?
• R is a programming language for data analysis and graphics
• Information about R is available at http://www.R-project.org
• The R system contains two major components:
1. Base system – contains the R language software and the high-priority add-on packages listed on pg. 3
2. User-contributed add-on packages
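To see the two components on your own machine, a minimal sketch is given below. It assumes a standard R installation and uses installed.packages() from the utils package, which reports a priority for each package.

```r
# Packages with priority "base" belong to the R base system itself.
base_pkgs <- rownames(installed.packages(priority = "base"))
print(base_pkgs)

# Packages with priority "recommended" are the high-priority add-on
# packages that are normally installed together with R.
recommended_pkgs <- rownames(installed.packages(priority = "recommended"))
print(recommended_pkgs)
```

Anything else returned by installed.packages() is a user-contributed add-on package.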
Who uses R?
• Scientists in many fields, especially those working in developing countries
– It gives everyone free access to state-of-the-art tools for statistical data analysis
– It is widely used for teaching statistics to undergraduate and graduate students because they can use it free of charge
Installing R – Base System
1. Go to http://CRAN.R-project.org
2. Choose your operating system from the list (Linux, MacOS X, or Windows)
3. Click on Base (the choices are Base or Contrib)
4. Click on R-2.6.1-win32.exe (the Windows installer)
5. Save the installer, then run it to install R
Getting Started
• Changing prompt – pg. 3
• Example – using R as a pocket calculator – pg. 3
• Storing vs. Printing
• R is not space sensitive, but it is case sensitive
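The points above can be tried out with a short sketch like the following. The variable names x, X, y and z are only illustrative choices.

```r
# Change the prompt from the default "> " to "R> " (visible in interactive use).
options(prompt = "R> ")

# R as a pocket calculator: the result is printed right away.
print(sqrt(25) + 2)

# Storing: the value is assigned to x and nothing is printed.
x <- sqrt(25) + 2

# Printing: calling print (or typing the name at the prompt) shows the value.
print(x)

# Spaces do not matter: both lines store the same value.
y<-3*4
z <- 3 * 4
print(identical(y, z))

# Case does matter: x and X are two different objects.
X <- 100
print(x == X)
```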
Getting Help in R
• The Help system is a collection of manual pages describing each function and data set that comes with R
• A manual page is shown when the name of the function is passed to the help function
– Ex. help("mean") or help(mean) or ?mean
Installing add-on packages
• All packages are available at: http://CRAN.R-project.org/src/contrib/PACKAGES.html
– Pick a package from the list and download it
• To install and load an add-on package:
1. install.packages("package name") – installs the package
2. library("package name") – loads it into the current session
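The two steps can be combined so that a package is only downloaded when it is missing. This is a sketch under assumptions: ensure_package is a hypothetical helper name, not a function that comes with R, and installing requires an internet connection.

```r
# Hypothetical helper (not part of R): install a package only if it is
# missing, then load it into the current session.
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)               # step 1: download and install from CRAN
  }
  library(pkg, character.only = TRUE)   # step 2: load the package
  invisible(TRUE)
}

# Usage with a package that ships with R, so nothing is downloaded:
ensure_package("stats")
```

Calling ensure_package("HSAUR") would install HSAUR on the first call and only load it on later calls.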
Forbes2000 Example
• Go to http://CRAN.R-project.org/src/contrib/PACKAGES.html and select HSAUR from the list
• Choose the file that matches your computer, e.g. Windows binary HSAUR 1.2-1.zip
• Save to desktop
• Find the Forbes2000 list in the rawdata folder
• Install and load in R:
– install.packages("HSAUR")
– library("HSAUR")
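Once the package is installed, the data set itself still has to be made available in the session. A minimal sketch, assuming the HSAUR package provides the data set under the name Forbes2000:

```r
# Load the Forbes2000 data only if the HSAUR package is installed.
if (requireNamespace("HSAUR", quietly = TRUE)) {
  data("Forbes2000", package = "HSAUR")   # creates the object Forbes2000
  print(head(Forbes2000))                 # show the first six rows
} else {
  message("HSAUR is not installed; run install.packages(\"HSAUR\") first.")
}
```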
Working with Data Sets – Ex. Forbes 2000 list
• Vector – elementary structure for data handling in R; a set of simple elements, all being objects of the same class
– Ex. First 3 companies in the Forbes list: Forbes2000[,"name"][1:3]
• Variable names – headings
– names(Forbes2000)
• Structure of the data set – useful for large data sets
– str(Forbes2000)
• Dimensions
– dim(Forbes2000)
– nrow(Forbes2000)
– ncol(Forbes2000)
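The same commands can be tried without the HSAUR package. The sketch below uses forbes_demo, a made-up stand-in data frame whose values are purely illustrative and are not the real Forbes figures.

```r
# Made-up stand-in for Forbes2000 (illustrative values only).
forbes_demo <- data.frame(
  name   = c("Company A", "Company B", "Company C", "Company D"),
  sales  = c(10.5, 20.0, 7.25, 3.0),
  assets = c(100, 250, 80, 40),
  stringsAsFactors = FALSE
)

print(forbes_demo[, "name"][1:3])  # first three entries of the "name" vector
print(names(forbes_demo))          # variable names (column headings)
str(forbes_demo)                   # compact overview of the structure
print(dim(forbes_demo))            # number of rows and columns together
print(nrow(forbes_demo))           # number of rows
print(ncol(forbes_demo))           # number of columns
```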
Simple Summary Statistics
• Mean – mean(Forbes2000[, "sales"])
• Median – median(Forbes2000[, "assets"])
• Range – range(Forbes2000[, "sales"])
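A small sketch of the three functions on a made-up vector (the values are illustrative, not taken from the Forbes list). It also shows the na.rm argument, which may be needed when a column contains missing values.

```r
# Made-up values with one missing entry (NA).
sales <- c(10.5, 20.0, 7.25, 3.0, NA)

print(mean(sales))                   # NA, because one value is missing
print(mean(sales, na.rm = TRUE))     # mean of the four observed values
print(median(sales, na.rm = TRUE))   # middle of the observed values
print(range(sales, na.rm = TRUE))    # smallest and largest observed value
```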
Importing Data Not Part of a Package
• When is this used?
– Most data sets are not part of a downloadable package
– Most people need to import their own data sets into R
• Example – Airport data (download to Desktop)
• In R:
– File → Change Dir → Desktop → OK
– airport <- read.table("airport.csv", header = TRUE, sep = ",", row.names = 1), where airport is the name given to the data set
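To show what each argument of read.table does, the sketch below first writes a tiny made-up CSV file to a temporary folder and then reads it back. The file name airport_demo.csv, the airport names and all values are assumptions for illustration; only the column names Rank, Shop and Domestic come from these slides.

```r
# Write a small made-up CSV file (stand-in for airport.csv).
csv_path <- file.path(tempdir(), "airport_demo.csv")
writeLines(c(
  "Airport,Rank,Shop,Domestic",
  "Airport A,1,4.2,3.9",
  "Airport B,2,3.8,4.1",
  "Airport C,3,3.1,3.5"
), csv_path)

# header = TRUE : the first line holds the column names
# sep = ","     : fields are separated by commas
# row.names = 1 : the first column supplies the row names
airport_demo <- read.table(csv_path, header = TRUE, sep = ",", row.names = 1)
print(airport_demo)
print(dim(airport_demo))
```

In a script, setwd() is the command counterpart of the File → Change Dir menu, and getwd() shows the current folder.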
Making a Graph
• Graph of “Rank” of airport vs. “Shop”
• plot(Rank ~ Shop, data = airport, pch = "O"), where airport is the name given to the imported data set
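A self-contained sketch of the same plot call, using a made-up data frame airport_demo in place of the imported airport data (the values are illustrative only):

```r
# Made-up stand-in for the imported airport data.
airport_demo <- data.frame(
  Rank = c(1, 2, 3, 4),
  Shop = c(4.2, 3.8, 3.1, 2.9)
)

# Formula interface: Rank on the vertical axis, Shop on the horizontal axis.
# pch = "O" draws each point as the letter O.
plot(Rank ~ Shop, data = airport_demo, pch = "O",
     xlab = "Shop", ylab = "Rank", main = "Rank vs. Shop (made-up data)")
```

Note that the function name is lower case (plot) and that the data argument takes the object itself, not its name in quotes.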
Homework
• Change Prompt “>” to “R>”
• Import Airport Data Set from Excel
• Print data set in R
• Find the Dimensions, the number of Columns, and the number of Rows in the data set
• Find structure of data set
• Find median of category “Shop”
• Find mean of “Domestic”