Bioinformatics at WSU
Matt Settles
Bioinformatics Core
Washington State University
Wednesday, April 23, 2008
WSU Linux User Group (LUG)
What is Bioinformatics?
The analysis of biological information using
computers and statistical techniques
[Figure: diagram relating Biology, Statistics, and Computer Science, with the roles Computational Biologist, Biostatistician, and Bioinformatician]
Subfields of Bioinformatics
Sequence analysis
Main areas: Sequence alignment and Sequence
databases
Genome annotation
Main areas: Gene finding, Gene predicting
Computational evolutionary biology
Main areas: Systematics, Phylogenetics
Analysis of High-throughput data
Main areas: RNA microarrays, aCGH, Whole
genome genotyping arrays.
Analysis of Whole Genome Sequencing Data
Emerging Field
Subfields of Bioinformatics
Comparative genomics
How are species different and how are they the
same?
Systems Biology
Networks of Networks (the golden goose!!)
Quantitative Genetics
Measuring Biodiversity
Modeling biological systems
High-throughput image analysis
Analysis of protein expression
Prediction of protein structure
Protein-protein docking
What does a Bioinformatician Do?
Works in an interdisciplinary team
Design of experiments
Data management, databases
Analysis from start to finish
Data integration, annotation, visualization
Software Development
Visual tools
Databases
Research
New techniques for the storage and analysis of
biological data, both statistical and computational
What tools do we use?
Software programs developed by others,
GUI and command line
Open Source preferably
Statistical Programming
Languages/Environments
R – programming environment
www.r-project.org
www.bioconductor.org
C-like interpreted language that behaves much like
Scheme
Full graphics capabilities
C/Python/Perl interfaces
Software programs we ourselves develop
Central dogma of molecular biology
Each gene is transcribed (at the appropriate time)
from DNA into mRNA, which then leaves the
nucleus and is translated into the required protein.
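The two steps of the central dogma can be sketched in a few lines of Python. This is only an illustration: the function names `transcribe` and `translate` are made up for this example, the input is assumed to be the coding (sense) strand, and real genes involve splicing and regulation that the sketch ignores.

```python
# Standard genetic code, laid out in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def transcribe(coding_dna: str) -> str:
    """Return the mRNA for a coding-strand DNA sequence (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate mRNA codon by codon, stopping at the first stop codon."""
    protein = []
    for pos in range(0, len(mrna) - 2, 3):
        codon = mrna[pos:pos + 3].replace("U", "T")  # table is keyed by DNA letters
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":  # stop codon ends the protein
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ATGGCCTGA")
print(mrna)             # AUGGCCUGA
print(translate(mrna))  # MA
```

Here ATG codes for methionine (M), GCC for alanine (A), and TGA is a stop codon.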
Whole Genome Association
Analysis
Whole Genome Genotyping Array
Samples
Bovine (cow) 58,000-SNP Illumina BeadArray
Represents all 29 autosomes, the X chromosome,
and the Unknown chromosome
255 dairy cattle from 4 different herds
130 Control cattle (healthy)
125 Johne's positive cattle (sick)
14.8 MILLION DATA POINTS !!!
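The 14.8 million figure follows from one genotype call per animal per SNP. A quick check, using the rounded 58,000 SNP count from the slide (the variable names are just for illustration):

```python
n_controls = 130
n_cases = 125
n_samples = n_controls + n_cases  # 255 animals in total
n_snps = 58_000                   # rounded SNP count quoted on the slide

data_points = n_samples * n_snps  # one genotype call per animal per SNP
print(f"{data_points:,}")                               # 14,790,000
print(f"{round(data_points / 1e6, 1)} million")         # 14.8 million
```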
Biological Question of interest
Is there a collection of SNPs that are associated
with the disease Johne's?
Analysis Outline
Read in and format data into something we
can work with in R and plink.
Quality Assurance
Toss samples that do not meet QA (7 samples)
Toss SNPs that do not meet QA (8,935 SNPs)
Treat SNPs as independent and analyze
each with a statistical model.
Correct for multiple testing
Visualize results
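The QA and per-SNP testing steps of the outline might look roughly like the sketch below. It is not the pipeline used for this study (that work was done in R and plink), and several things are assumptions made for illustration: genotypes are coded as 0, 1, or 2 copies of one allele with `None` for a missing call, QA is reduced to simple call-rate thresholds, and the "statistical model" is a basic 1-df allelic chi-square test. The function names and default thresholds are hypothetical.

```python
import math


def call_rate(genotypes):
    """Fraction of non-missing genotype calls (None marks a missing call)."""
    return sum(g is not None for g in genotypes) / len(genotypes)


def allele_counts(genotypes):
    """Return (count of allele A, count of allele B) over non-missing calls."""
    called = [g for g in genotypes if g is not None]
    b = sum(called)  # each genotype is coded as the number of B alleles
    return 2 * len(called) - b, b


def allelic_chi_square(case_genotypes, control_genotypes):
    """1-df chi-square test on the 2x2 case/control by allele table."""
    a_case, b_case = allele_counts(case_genotypes)
    a_ctrl, b_ctrl = allele_counts(control_genotypes)
    row_case, row_ctrl = a_case + b_case, a_ctrl + b_ctrl
    col_a, col_b = a_case + a_ctrl, b_case + b_ctrl
    if 0 in (row_case, row_ctrl, col_a, col_b):
        return 0.0, 1.0  # monomorphic or empty table: nothing to test
    total = row_case + row_ctrl
    stat = (total * (a_case * b_ctrl - b_case * a_ctrl) ** 2
            / (row_case * row_ctrl * col_a * col_b))
    p_value = math.erfc(math.sqrt(stat / 2))  # chi-square tail area for 1 df
    return stat, p_value


def analyze(snps, is_case, min_sample_call_rate=0.95, min_snp_call_rate=0.95):
    """snps maps SNP name -> genotypes (one per sample); is_case flags cases.

    Thresholds are illustrative defaults, not the ones used in the study.
    Returns SNP name -> (chi-square statistic, p-value) for SNPs passing QA.
    """
    if not snps:
        return {}
    # QA step 1: toss samples with too many missing calls.
    keep = [i for i in range(len(is_case))
            if call_rate([g[i] for g in snps.values()]) >= min_sample_call_rate]
    if not keep:
        return {}
    results = {}
    for name, genotypes in snps.items():
        # QA step 2: toss SNPs with too many missing calls among kept samples.
        if call_rate([genotypes[i] for i in keep]) < min_snp_call_rate:
            continue
        # Treat each SNP independently and test cases against controls.
        cases = [genotypes[i] for i in keep if is_case[i]]
        controls = [genotypes[i] for i in keep if not is_case[i]]
        results[name] = allelic_chi_square(cases, controls)
    return results
```

The p-values returned here are unadjusted; with tens of thousands of SNPs they still need the multiple testing correction listed in the outline.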
Results
10 regions were identified as potentially
interesting, with p < 0.001 after permutation-based
multiple testing correction
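One common way to do a permutation-based correction is the single-step max-statistic approach: shuffle the case/control labels many times, record the largest statistic across all SNPs in each shuffle, and compare every observed statistic with that null distribution. The sketch below is an assumption about the general technique, not the exact procedure behind these results. To stay short it uses the absolute difference in mean genotype as the per-SNP statistic, and the function names are hypothetical.

```python
import random


def mean_diff(genotypes, is_case):
    """Absolute difference in mean genotype between cases and controls."""
    cases = [g for g, c in zip(genotypes, is_case) if c]
    controls = [g for g, c in zip(genotypes, is_case) if not c]
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))


def max_stat_adjusted_p(snps, is_case, n_permutations=1000, seed=0):
    """Adjusted p-values from the permutation distribution of the max statistic.

    snps maps SNP name -> genotypes coded 0/1/2 (no missing values here);
    is_case must contain at least one case and one control.
    """
    rng = random.Random(seed)
    observed = {name: mean_diff(g, is_case) for name, g in snps.items()}
    exceed = {name: 0 for name in snps}
    labels = list(is_case)
    for _ in range(n_permutations):
        rng.shuffle(labels)  # break any real genotype/disease association
        max_stat = max(mean_diff(g, labels) for g in snps.values())
        for name, obs in observed.items():
            if max_stat >= obs:
                exceed[name] += 1
    # Add one to numerator and denominator so no p-value is exactly zero.
    return {name: (exceed[name] + 1) / (n_permutations + 1) for name in snps}
```

The smallest adjusted p-value this can report is 1 / (n_permutations + 1), so resolving p < 0.001 needs more than 1,000 permutations, which is part of why this kind of analysis is computationally heavy.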
Next Step
Validate the regions of interest in the lab.
Perform multi-locus analysis; a computer
cluster will be necessary here.
Mine the data for additional information
Job Position
Position with the Bioinformatics Core
~ 20 hours per week
~ $12-$15/hour
Potential internship credit
Description: Aid in the analysis of microarray
data, create analysis pipeline to be used by
WSU researchers.
Required Skills: know how to code
Bonuses: Possibility of publications!
The END
QUESTIONS??