Bioinformatics at WSU

Matt Settles
Bioinformatics Core
Washington State University
Wednesday, April 23, 2008
WSU Linux User Group (LUG)
What is Bioinformatics?
The analysis of biological information using computers and statistical techniques
[Diagram: the disciplines Biology, Statistics, and Computer Science, with the roles Computational Biologist, Biostatistician, and Bioinformatician]
Subfields of Bioinformatics
Sequence analysis
  Main areas: Sequence alignment and sequence databases
Genome annotation
  Main areas: Gene finding, gene prediction
Computational evolutionary biology
  Main areas: Systematics, phylogenetics
Analysis of high-throughput data
  Main areas: RNA microarrays, aCGH, whole genome genotyping arrays
Analysis of whole genome sequencing data
  Emerging field
Subfields of Bioinformatics
Comparative genomics
  How are species different and how are they the same?
Systems Biology
  Networks of Networks (the golden goose!!)
Quantitative Genetics
Measuring Biodiversity
Modeling biological systems
High-throughput image analysis
Analysis of protein expression
Prediction of protein structure
Protein-protein docking
What does a Bioinformatician Do?
Works in an interdisciplinary team
  Design of experiments
  Data management, databases
  Analysis from start to finish
  Data integration, annotation, visualization
Software Development
  Visual tools
  Databases
Research
  New techniques for the storage and analysis of biological data, both statistical and computational
What tools do we use?
Software programs developed by others
  GUI and command line
  Open source preferably
Statistical Programming Languages/Environments
  R – programming environment
    www.r-project.org
    www.bioconductor.org
    C-like interpreted language that behaves similarly to Scheme
    Full graphics capabilities
    C/Python/Perl interfaces
Software programs we ourselves develop
Central dogma of molecular biology
Each gene is transcribed (at the appropriate time)
from DNA into mRNA, which then leaves the
nucleus and is translated into the required protein.
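The two steps named above can be sketched in a few lines of Python. This is only an illustration, not a tool from the talk: the function names transcribe and translate are hypothetical, the input is assumed to be the coding strand with only A, C, G, T, and the codon table is the standard genetic code.

```python
# Standard genetic code, built from the conventional T, C, A, G base ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    (a + b + c).replace("T", "U"): AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def transcribe(coding_dna: str) -> str:
    """Return the mRNA for a coding-strand DNA sequence (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate from the first AUG up to (not including) the first stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, so nothing is translated
    protein = []
    for pos in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[pos:pos + 3]]
        if amino_acid == "*":  # "*" marks a stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ATGGCCTAA")
print(mrna)             # AUGGCCUAA
print(translate(mrna))  # MA
```

Real genes add complications this sketch ignores, such as introns, the reverse strand, and ambiguous bases.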
Whole Genome Association Analysis
Whole Genome Genotyping Array
  Bovine (cow) 58,000 SNPs Illumina BeadArray
  Represents all 29 autosomes, the X chromosome, and the Unknown chromosome
Samples
  255 dairy cattle from 4 different herds
  130 control cattle (healthy)
  125 Johne's positive cattle (sick)
14.8 MILLION DATA POINTS !!!
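The data-point figure follows from the counts on this slide: one genotype call per animal per SNP. A quick check, assuming the rounded 58,000 SNP figure (the variable names are illustrative only):

```python
n_controls = 130
n_cases = 125
n_samples = n_controls + n_cases   # 255 cattle in total
n_snps = 58_000                    # rounded figure from the slide

# One genotype call per sample per SNP.
data_points = n_samples * n_snps
print(f"{data_points:,} genotype calls")  # 14,790,000, i.e. about 14.8 million
```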
Biological Question of Interest
Is there a collection of SNPs that is associated with Johne's disease?
Analysis Outline
Read in and format data into something we can work with in R and PLINK.
Quality Assurance
  Toss samples that do not meet QA (7 samples)
  Toss SNPs that do not meet QA (8,935 SNPs)
Treat SNPs as independent and analyze each with a statistical model.
Correct for multiple testing
Visualize results
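The middle steps of the outline can be sketched in plain Python. Everything specific here is an assumption for illustration, since the slides do not name the QA criteria or the statistical model: SNPs are filtered by call rate (95% threshold), each SNP is tested with a 1-df allelic chi-square test, and Bonferroni is used as the simplest multiple testing correction. Genotypes are coded as 0/1/2 copies of one allele, with None for a missing call, and the data are synthetic.

```python
import math


def call_rate(genotypes):
    """Fraction of non-missing calls; None marks a missing genotype."""
    return sum(g is not None for g in genotypes) / len(genotypes)


def allelic_chi2(genotypes, is_case):
    """1-df chi-square test on the 2x2 table of allele counts by case status."""
    counts = [[0, 0], [0, 0]]  # counts[group][allele]; group 1 = case
    for g, case in zip(genotypes, is_case):
        if g is None:
            continue
        counts[case][1] += g        # copies of the counted allele
        counts[case][0] += 2 - g    # copies of the other allele
    (a, b), (c, d) = counts
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0             # no variation, so nothing to test
    chi2 = (a + b + c + d) * (a * d - b * c) ** 2 / denom
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square tail for 1 df
    return chi2, p_value


def analyze(snps, is_case, min_call_rate=0.95, alpha=0.05):
    """QA-filter SNPs, test each one independently, apply Bonferroni."""
    kept = {name: g for name, g in snps.items() if call_rate(g) >= min_call_rate}
    threshold = alpha / max(len(kept), 1)
    results = {}
    for name, g in kept.items():
        chi2, p = allelic_chi2(g, is_case)
        results[name] = (chi2, p, p < threshold)
    return results


# Synthetic toy data (not from the study): 4 cases followed by 4 controls.
is_case = [1, 1, 1, 1, 0, 0, 0, 0]
snps = {
    "snp_assoc": [2, 2, 2, 2, 0, 0, 0, 0],
    "snp_null": [1, 1, 1, 1, 1, 1, 1, 1],
    "snp_lowcall": [None, None, 1, 0, 2, 1, 0, 1],
}
results = analyze(snps, is_case)
for name, (chi2, p, significant) in results.items():
    print(f"{name}: chi2={chi2:.2f} p={p:.2e} significant={significant}")
```

In this toy run snp_lowcall is dropped by the QA step, and only snp_assoc passes the corrected threshold. Sample-level QA (the 7 tossed samples) is not shown.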
Results
10 regions were identified as potentially interesting at p < 0.001 with a permutation-based multiple testing correction
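The slides do not say which permutation procedure was used. One common choice is the max-statistic approach, sketched below under that assumption: case/control labels are shuffled many times, the largest test statistic across all SNPs is recorded for each shuffle, and each observed statistic is compared with that null distribution. The function names and toy data are hypothetical.

```python
import random


def allelic_chi2(genotypes, labels):
    """1-df allelic chi-square statistic; genotypes are allele counts 0/1/2."""
    case_b = sum(g for g, y in zip(genotypes, labels) if y)
    ctrl_b = sum(g for g, y in zip(genotypes, labels) if not y)
    n_case = 2 * sum(labels)
    n_ctrl = 2 * (len(labels) - sum(labels))
    a, b = n_ctrl - ctrl_b, ctrl_b
    c, d = n_case - case_b, case_b
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return (a + b + c + d) * (a * d - b * c) ** 2 / denom


def max_stat_permutation(snps, labels, n_perm=1000, seed=0):
    """Adjusted p-value per SNP: share of shuffles whose best statistic
    across all SNPs is at least as large as that SNP's observed statistic."""
    rng = random.Random(seed)
    observed = {name: allelic_chi2(g, labels) for name, g in snps.items()}
    exceed = {name: 0 for name in snps}
    shuffled = list(labels)
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between genotype and status
        max_null = max(allelic_chi2(g, shuffled) for g in snps.values())
        for name, stat in observed.items():
            if max_null >= stat:
                exceed[name] += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return {name: (exceed[name] + 1) / (n_perm + 1) for name in snps}


# Synthetic toy data (not from the study): 10 cases followed by 10 controls.
labels = [1] * 10 + [0] * 10
toy_snps = {
    "snp_assoc": [2] * 10 + [0] * 10,
    "snp_null": [0, 1, 2, 0, 1, 2, 0, 1, 2, 1] * 2,
}
adjusted = max_stat_permutation(toy_snps, labels)
for name, p in adjusted.items():
    print(f"{name}: adjusted p = {p:.4f}")
```

Because the shuffles keep the correlation between SNPs intact, this style of correction is typically less conservative than Bonferroni on dense genotyping arrays.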
Next Steps
Validate the regions of interest in the lab.
Perform multi-locus analysis; a computer cluster will be necessary here.
Mine the data for additional information.
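A rough count shows why multi-locus analysis calls for a cluster. Assuming the rounded 58,000 SNP figure and that "multi-locus" means at least testing every pair of SNPs that survived QA:

```python
import math

snps_on_array = 58_000        # rounded figure from the slides
snps_removed_by_qa = 8_935
snps_remaining = snps_on_array - snps_removed_by_qa

# Number of distinct SNP pairs to test in a two-locus scan.
pairwise_tests = math.comb(snps_remaining, 2)
print(f"{snps_remaining:,} SNPs -> {pairwise_tests:,} pairs")
# 49,065 SNPs -> 1,203,662,580 pairs
```

That is roughly 25,000 times the number of single-SNP tests, before any permutations are added on top.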
Job Position
Position with the Bioinformatics Core
~ 20 hours per week
~ $12-$15/hour
Potential internship credit
Description: Aid in the analysis of microarray data, create analysis pipeline to be used by WSU researchers.
Required Skills: know how to code
Bonuses: Possibility of publications!
The END
QUESTIONS??