Bioinformatics lectures at Rice University

Lecture 3. Common tools used in
bioinformatics and an introduction to R
Li Zhang
March, 2012
Summary of previous lectures
• Lecture 1. Introduction to Bioinformatics
• Lecture 2. High throughput technologies in genomics.
• Lecture 3. Common tools used in bioinformatics. An
introduction to R.
Common tools used in bioinformatics
• BioPerl (http://www.bioperl.org/wiki/BioPerl_Tutorial)
• Biopython (http://biopython.org/DIST/docs/tutorial/Tutorial.html)
• MATLAB, SAS, S-PLUS
• R, with packages from Bioconductor
• Genomic data visualization: genome browser
• Databases: MySQL, S3DB, CouchDB
• Other commercial bioinformatics software:
  Pathway visualization: Ingenuity
  Illumina's GenomeStudio
  Partek, GeneSpring, Spotfire, DNAnexus
R code Example: Hierarchical clustering
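A minimal sketch of what such an example might look like, using only base R. The simulated expression matrix, the sample names, and the choice of Euclidean distance with average linkage are illustrative assumptions, not necessarily the settings used in the lecture.

```r
# Simulate a small expression matrix: 6 samples (rows) x 20 genes (columns).
# Samples 4-6 are shifted upward so that two groups exist by construction.
set.seed(1)
expr <- rbind(
  matrix(rnorm(3 * 20, mean = 0), nrow = 3),
  matrix(rnorm(3 * 20, mean = 3), nrow = 3)
)
rownames(expr) <- paste0("sample", 1:6)

# Pairwise Euclidean distances between samples (rows).
d <- dist(expr, method = "euclidean")

# Agglomerative hierarchical clustering with average linkage.
hc <- hclust(d, method = "average")

# Cut the tree into two clusters and draw the dendrogram.
groups <- cutree(hc, k = 2)
plot(hc, main = "Hierarchical clustering of simulated samples",
     xlab = "", sub = "")
```

To cluster genes instead of samples, pass the transposed matrix, t(expr), to dist().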
Getting started with Bioconductor
Bioconductor is designed to grow to meet
your genomic analysis needs. A typical
entry point into the Bioconductor project is:
1) Read the online Bioconductor common
workflows.
2) Download and install the Bioconductor
software.
3) Subscribe to the Bioconductor mailing lists to
become part of the Bioconductor community.
4) Read Bioconductor publications for a more
comprehensive treatment of the software.
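Step 2 can be sketched in R as follows, assuming a recent R release in which the BiocManager package from CRAN is the documented installer. The run_install switch is a hypothetical guard added for this sketch so that nothing is downloaded until you opt in.

```r
# Hypothetical switch for this sketch: set to TRUE to actually download
# and install (requires network access).
run_install <- FALSE

if (run_install) {
  # Install the BiocManager helper from CRAN if it is not yet available.
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
  }
  # Install (or update) the core Bioconductor packages.
  BiocManager::install()
}

# Report whether BiocManager is available in this R installation.
bioc_ready <- requireNamespace("BiocManager", quietly = TRUE)
print(bioc_ready)
```

Individual packages can then be installed by name, for example BiocManager::install("limma").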
Goals of the Bioconductor Project
• To provide widespread access to a broad range of powerful
statistical and graphical methods for the analysis of
genomic data.
• To facilitate the inclusion of biological metadata in the
analysis of genomic data, e.g. literature data from PubMed,
annotation data from LocusLink.
• To provide a common software platform that enables the
rapid development and deployment of extensible, scalable,
and interoperable software.
• To further scientific understanding by producing high-quality
documentation and reproducible research.
• To train researchers on computational and statistical
methods for the analysis of genomic data.
Main Features of the Bioconductor Project
• It contains a high-level interpreted language in which one can easily and
quickly prototype new computational methods.
• It includes a well-established system for packaging together software
components and documentation.
• It can address the diversity and complexity of computational biology and
bioinformatics problems in a common object-oriented framework.
• It provides on-line computational biology and bioinformatics data sources.
• It supports a rich set of statistical simulation and modeling activities.
• It contains cutting-edge data and model visualization capabilities.
• It has been the basis for pathbreaking research in parallel statistical
computing.
• It is under very active development by a dedicated team of researchers
with a strong commitment to good documentation and software design.
Homework:
1) Install R and Bioconductor.
2) Use R to draw a set of concentric circles.
3) Use the following code to generate two
random variables x and y. What is the
average Pearson correlation between x and
y? Also, please make a scatter plot of x
against y. What is the relationship between x
and y?
Tmp=rnorm(500)
x = rnorm(500)+Tmp
y = rnorm(500)+Tmp
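One way to approach item 3, offered as a sketch rather than a model answer: x and y share the component Tmp, so Cov(x, y) = Var(Tmp) = 1 while Var(x) = Var(y) = 2, which gives a theoretical Pearson correlation of 1/2. The simulation below repeats the experiment many times to estimate the average. The helper name simulate_cor, the seed, and the number of repetitions are assumptions made for illustration.

```r
set.seed(42)

# One run of the homework experiment; returns the Pearson correlation.
simulate_cor <- function(n = 500) {
  Tmp <- rnorm(n)          # component shared by x and y
  x <- rnorm(n) + Tmp
  y <- rnorm(n) + Tmp
  cor(x, y)                # Pearson is the default method
}

# Repeat the experiment and average the correlations.
cors <- replicate(1000, simulate_cor())
mean_cor <- mean(cors)
print(mean_cor)

# Scatter plot for a single run, with a least-squares line for reference.
Tmp <- rnorm(500)
x <- rnorm(500) + Tmp
y <- rnorm(500) + Tmp
plot(x, y, main = "x versus y", pch = 20)
abline(lm(y ~ x), col = "red")
```

The scatter plot should show a positive linear trend with considerable spread, which is what a correlation near 0.5 looks like.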