Introduction to the R language

Introduction to microarray data
analysis with Bioconductor
Katherine S. Pollard
March 11, 2004
© Copyright 2004, all rights reserved
Bioconductor
o Open source and open development R software
project for the analysis and comprehension of
biomedical and genomic data.
– Gene expression arrays (cDNA, Affymetrix)
– Pathway graphs
– Genome sequence data
o Started in 2001 by Robert Gentleman, Dana-Farber
Cancer Institute.
o About 25 core developers, at various institutions
in the US and Europe.
o Tools for integrating biological metadata from the
web (annotation, literature) in the analysis of
experimental data.
Websites
o Bioconductor: www.bioconductor.org
– software, data, and documentation;
– training materials from short courses;
www.bioconductor.org/workshops/UCSC03/ucsc03.html
– mailing list.
o R: www.r-project.org
– software;
– documentation;
– RNews.
Basic R Commands
o Working directory/file path: File – Change dir
> setwd("C:/cygwin")
o List objects in session: Misc – List objects
> ls()
o Delete objects from session: Misc – Remove all objects
> rm(my.matrix)      # removes one named object
> rm(list=ls())      # removes all objects, like the menu item
o Run a script: File – Source R code
> source("mycode.R")
o Stopping R: File – Exit
> q()
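The commands on this slide can be tried together in a short session. This is only a sketch: it works in a temporary directory rather than C:/cygwin, and the script contents and the object names my.vector and script.result are made up for illustration.

```r
# Work inside a temporary directory so nothing permanent is touched
old.dir <- setwd(tempdir())     # setwd() returns the previous directory

# Write a one-line script, then run it with source()
writeLines("script.result <- 1 + 1", "mycode.R")
source("mycode.R")

# Create two objects, list the session contents, then delete one
my.matrix <- matrix(1:6, nrow = 2)
my.vector <- c(1, 2, 3)
print(ls())
rm(my.matrix)
print(exists("my.matrix"))      # FALSE: the object is gone

# Go back to the original working directory
setwd(old.dir)
```

Note that rm() takes object names, while rm(list=ls()) clears everything returned by ls().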
Getting Help
o Details about a specific command whose name
you know (input arguments, options, algorithm):
> ?t.test
> help(t.test)
> example(t.test)
> t.test
o Information about commands containing a
certain text string:
> apropos("test")
> help.search("test")
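As a small illustration of searching by text string, apropos() returns a character vector of matching object names, so its result can be inspected like any other R object. The variable name hits is made up for this sketch, which assumes the default packages (including stats) are attached.

```r
# Names of all visible objects whose name contains "test"
hits <- apropos("test")
print(head(hits))

# t.test is one of them, since the stats package is attached by default
print("t.test" %in% hits)

# args() shows the arguments of a function without opening the help page
print(args(t.test))
```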
Packages & Vignettes
o Load a package library: Packages menu
> library(marrayTools)
o Run the package vignette:
> library(tkWidgets)
> vExplorer()
> openVignette()
o Read the Vignette PDF file
o Look at Short Courses and Lab Materials
Storing Data
o Every R object (or the whole current working
environment) can be stored into and restored
from a file with the commands save and load,
or by using the File menu.
> save(x, file="x.RData")
> load("x.RData")
> save.image("splicingArrays.RData")
o These files are portable between MS Windows, Unix, and Mac versions of R.
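A minimal round trip shows what save and load do. The values in x and the temporary file location are assumptions made for this sketch.

```r
# An example object to store (values are made up)
x <- c(2.5, 3.1, 4.7)

# Save it to a file in the session's temporary directory
rdata.file <- file.path(tempdir(), "x.RData")
save(x, file = rdata.file)

# Remove the object, then restore it from the file
rm(x)
restored <- load(rdata.file)    # load() returns the names of the restored objects
print(restored)
print(x)
```

load() restores objects under their original names, so it can silently overwrite an object of the same name in the current session.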
Importing and Exporting Data
o There are many ways to get data in and out.
o Most programs (e.g. Excel), as well as humans,
know how to deal with rectangular tables in the
form of tab-delimited text files.
> x <- read.delim("filename.txt")
Also: read.table, read.csv, scan
> write.table(x, file="x.txt", sep="\t")
Also: write.matrix, write
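The export and import steps can be checked against each other with a small table. This is a sketch with made-up data; the column names gene and M and the temporary file are assumptions, and row.names=FALSE and quote=FALSE are choices made here to keep the text file plain.

```r
# A small rectangular table (values are made up)
x <- data.frame(gene = c("g1", "g2", "g3"),
                M    = c(0.5, -1.2, 2.0),
                stringsAsFactors = FALSE)

# Export as tab-delimited text, without row names or quotes
txt.file <- tempfile(fileext = ".txt")
write.table(x, file = txt.file, sep = "\t", row.names = FALSE, quote = FALSE)

# Import it again; read.delim expects a header line and tab separators
y <- read.delim(txt.file, stringsAsFactors = FALSE)
print(all.equal(x, y))
```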
Script to import GenePix data
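library(marrayTools)
# Import GenePix data: gal = .gal file name, details = array description file name
importGPR <- function(gal, details) {
  # Gene (spot) annotation from the .gal file
  g.info <- read.marrayInfo(fname=gal, info.id=4:5, labels=5)
  # Array (target) descriptions
  a.info <- read.marrayInfo(fname=details, labels=2)
  # Array layout: 4 x 4 grids, each with 24 rows x 25 columns of spots
  grid <- read.marrayLayout(fname=gal, ngr=4, ngc=4, nsr=24,
                            nsc=25, pl.col=7, ctl.col=6)
  # Foreground intensities: green = F532 Median, red = F635 Median
  data <- read.GenePix(layout=grid, targets=a.info,
                       gnames=g.info, name.Gf="F532 Median",
                       name.Rf="F635 Median")
  return(data)
}
# galfile and detailsfile must hold the two file names before this call
data.raw <- importGPR(galfile, detailsfile)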
o Note: If the .gal file has n header lines before the data begin, use skip=n.
o Note: read.GenePix will read ALL .gpr files in the current directory. To
read only certain files (and to specify their order), use the fname argument.
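The two notes can be illustrated with base R alone. The sketch below is not the marrayTools code: it uses read.delim to show what skipping n header lines does, and list.files to build an explicit, ordered vector of .gpr file names of the kind one might pass as fname. The file contents and all file names are made up.

```r
# A made-up file with 2 header lines before the tab-delimited table
gal.file <- tempfile(fileext = ".txt")
writeLines(c("ATF\t1.0",
             "2\t3",
             "Block\tID\tName",
             "1\tid1\tgeneA",
             "1\tid2\tgeneB"), gal.file)

# skip = 2 jumps over the header lines, so the column names are read correctly
gal <- read.delim(gal.file, skip = 2, stringsAsFactors = FALSE)
print(gal)

# A made-up directory holding two .gpr files and one unrelated file
gpr.dir <- file.path(tempdir(), "gpr_example")
dir.create(gpr.dir, showWarnings = FALSE)
file.create(file.path(gpr.dir, c("b.gpr", "a.gpr", "notes.txt")))

# All .gpr files (sorted alphabetically), then an explicit selection and order
all.gpr <- list.files(gpr.dir, pattern = "\\.gpr$")
chosen.gpr <- c("b.gpr", "a.gpr")
print(all.gpr)
print(chosen.gpr %in% all.gpr)
```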
Working with log ratios
o Loess normalization by print tip:
data.norm<-maNormMain(data.raw)
ratios<-as(data.norm,"exprSet")
o Array statistics:
apply(exprs(ratios),2,summary)
apply(maGb(data.raw),2,median,na.rm=TRUE)
o Combine replicate spots on an array:
meanM<-aggregate(exprs(ratios),
list(maLabels(maGnames(data.norm))), mean,
na.rm=TRUE)
o Export normalized log2 ratios:
write.table(meanM,"Mvals.txt",sep="\t",row.names=FALSE)
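The ideas on this slide can be sketched in base R on toy data. This is not the maNormMain implementation: it assumes that print-tip loess means fitting a loess curve of the log ratio M on the average log intensity A within each print-tip group and keeping the residuals. All data, group sizes, and variable names below are made up.

```r
# --- Print-tip loess on simulated data ---
set.seed(1)
n.per.tip <- 50
d <- data.frame(tip = rep(1:4, each = n.per.tip),   # print-tip group of each spot
                A   = runif(4 * n.per.tip, 6, 14))  # average log2 intensity
# Raw log2 ratios with an intensity-dependent bias plus noise
d$M <- 0.1 * (d$A - 10) + rnorm(nrow(d), sd = 0.2)

# Fit M ~ A separately within each print-tip group and keep the residuals
d$M.norm <- NA_real_
for (g in unique(d$tip)) {
  idx <- d$tip == g
  fit <- loess(M ~ A, data = d[idx, ])
  d$M.norm[idx] <- residuals(fit)
}
# The dependence of M on A should be much weaker after normalization
print(c(raw = cor(d$M, d$A), normalized = cor(d$M.norm, d$A)))

# --- Array statistics and replicate spots on a small fixed matrix ---
# 6 spots on 2 arrays; each gene is spotted twice
M <- matrix(c(1.0, 3.0, -2.0, 0.0, 0.5, 1.5,
              2.0, 4.0, -1.0, 1.0,  NA, 2.5),
            ncol = 2, dimnames = list(NULL, c("array1", "array2")))
gene <- c("g1", "g1", "g2", "g2", "g3", "g3")

# One median per array (column), ignoring missing values
array.medians <- apply(M, 2, median, na.rm = TRUE)
print(array.medians)

# Average the replicate spots: one row per gene, one column per array
meanM <- aggregate(as.data.frame(M), by = list(gene = gene),
                   FUN = mean, na.rm = TRUE)
print(meanM)
```

The na.rm=TRUE argument is passed through to mean, so a gene with one missing replicate still gets a value from the remaining spot.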
Useful R/BioC Packages
o marrayTools, marrayPlots: Spotted cDNA array analysis
o affy: Affymetrix array analysis
o vsn: Variance stabilization
o annotate: Link microarray data to metadata on the web
o ctest: Statistical tests
o genefilter, limma, multtest, siggenes: Gene filtering (e.g., differential expression)
o mva, cluster, clust: Clustering
o class, rpart, nnet: Classification
Acknowledgments
Workshop materials developed with
• Robert Gentleman, Harvard
• Sandrine Dudoit, UC Berkeley
Bioconductor core developers include
• Vince Carey, Harvard
• Yongchao Ge, Mount Sinai School of Medicine
• Robert Gentleman, Harvard
• Jeff Gentry, Dana-Farber Cancer Institute
• Rafael Irizarry, Johns Hopkins
• Yee Hwa (Jean) Yang, UCSF
• Jianhua (John) Zhang, Dana-Farber Cancer Institute
• Sandrine Dudoit, UC Berkeley