
Team 3
Ned Bakelman
Chris Clark
Kamal Khan
Dmitry Nikelshpur
Bobby Tesoriero
Source: [http://artedi.ebc.uu.se/course/BioInfo-10p-2006/projects/katarzyna/Bioinformatics.html]
Bioinformatics
7/21/2015
Discussion Topics
• Background
• The Human Genome Project
• The Techniques
• An Example
Background
What is it?
[Source: http://abhishek-tiwari.com/2009/02/bioinformaticselephant-and-blind-men.html]
Bioinformatics is a field of science where biology, computer science, and information technology come together. The ultimate goal is to enable the discovery of new biological insights and to understand the mystery of life.
[Defined by the National Center for Biotechnology Information]
Background
Major Aims of Bioinformatics
• Store the biological data
• Develop tools that perform computationally intensive tasks
• Analyze & interpret data in a biologically meaningful manner
[Source: http://www.ittc.ku.edu/bioinfo_seminar/F07.html]
Background
Why are Computers Important?
• Transform biology from a science of wet-lab experiments into an information science
o Use computers to execute repetitive tasks millions of times, such as sequencing
o Use computers to solve complex problems, such as understanding protein folding
Source: [http://www.bibalex.org/libraries/presentation/static/Bioinformatics.pdf]
The Human Genome Project
Biology 101
• The DNA in a single human cell would extend more than 6 feet if stretched out in a line.
• DNA is wrapped around bead-like proteins called histones. These DNA-histone complexes are in turn coiled tightly to form chromosomes, located in the nucleus of the cell.
• The DNA in the 100 trillion cells of the human body would stretch over 113 billion miles [Source: Centre for Integrated Genomics]
Video Excerpt: DNA Replication
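The 113-billion-mile figure follows from the slide's own numbers: about 6 feet of DNA per cell multiplied by 100 trillion cells, then converted from feet to miles. A quick check, as a sketch; the variable names are ours, and since 6 feet is the slide's lower bound ("more than 6 feet"), the result is likewise a lower bound:

```python
FEET_PER_MILE = 5280

feet_per_cell = 6       # "more than 6 feet" of DNA per cell (lower bound taken from the slide)
cells = 100 * 10**12    # 100 trillion cells (figure taken from the slide)

# Total length of all the DNA laid end to end, converted from feet to miles
total_miles = feet_per_cell * cells / FEET_PER_MILE
print(f"{total_miles:.3e} miles")  # 1.136e+11 miles, i.e. roughly 113.6 billion miles
```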
The Human Genome Project
Timeline
Project goals: map the projected 20,000-25,000 human genes and determine the sequence of the 3 billion DNA base pairs.
[Source: http://sblazak.wordpress.com/2010/04/26/]
[Source: http://www.freewebs.com]
5/1990 – DOE & NIH presented a 15-year HGP plan to Congress
6/2000 – Draft genome completed
2/2001 – Draft genome published
4/2003 – Genome mapping completed
10/2004 – Completed genome published
5/2006 – Last chromosome published
[Source: http://translatingscience.com/]
The Human Genome Project
Impacts – Genetic Association
Source: http://translatingscience.com/2010/11/04/individualized-medicine-a-dawn-of-a-new-era/
The Human Genome Project
Impacts - Personalized Medicine
• Optimize drugs and drug combinations based on each individual's unique genetic makeup
• Predisposition testing – determine who is at risk for a disease
o BRCA1/BRCA2 – predicting breast or ovarian cancers
o Type 2 diabetes – screening individuals with impaired glucose tolerance for TCF7L2 variants may help identify high-risk populations
• Pharmacogenomics – determine which drug, and how much of it, is best for a given individual based on that individual's genotype
The Techniques
Research 101
• Statistical Algorithms
o T-Test
o ANOVA
• Clustering Techniques
o K-Means
o K-Nearest Neighbor (KNN)
o Ward’s Method
The Techniques
Research 101 - continued
• T-Test (t-statistic)
o Introduced by William Sealy Gosset in 1908
o Appropriate whenever you want to compare the means of two groups
o Assesses whether the means of two groups are statistically different from each other
Figure: Control Group vs. Treatment Group. Because there is little overlap between the bell-shaped curves, we conclude that the two groups appear most different or distinct.
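To make the comparison concrete, here is a minimal sketch of the pooled-variance (Student's) two-sample t-statistic: the difference between the group means divided by the standard error of that difference. It assumes both groups share the same variance (Welch's variant drops that assumption). The function name and the measurements are invented for illustration, and turning t into a p-value, which requires the t-distribution, is left out.

```python
import math
from statistics import mean, variance

def pooled_t_statistic(control, treatment):
    """Student's two-sample t-statistic; assumes the two groups have equal variances."""
    n1, n2 = len(control), len(treatment)
    dof = n1 + n2 - 2
    # Pooled variance: the two sample variances weighted by their degrees of freedom
    pooled_var = ((n1 - 1) * variance(control) + (n2 - 1) * variance(treatment)) / dof
    # Standard error of the difference between the two group means
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    # Signal (difference in means) divided by noise (variability within the groups)
    return (mean(treatment) - mean(control)) / std_err, dof

# Hypothetical measurements, made up for illustration only
control = [5.1, 4.9, 5.0, 5.2, 4.8]
treatment = [6.0, 6.2, 5.9, 6.1, 5.8]
t, dof = pooled_t_statistic(control, treatment)
print(round(t, 3), dof)  # t is about 10 with 8 degrees of freedom
```

A large |t| means the means are far apart relative to the spread of the data, which is the "little overlap between the bell-shaped curves" situation in the figure.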
The Techniques
Research 101 - continued
• ANOVA
o Analysis of Variance
o Appropriate whenever you want to compare the means of multiple groups
o Basic idea is to compare the variability among groups to the variability within groups
  o If variability among groups is small (relative to within groups), this is consistent with the group means being equal (null hypothesis)
  o If variability among groups is large (relative to within groups), this supports that at least one group mean is statistically different (alternative hypothesis)
Figure panels: Little Statistical Difference | Significant Statistical Difference
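The among-versus-within comparison is usually summarized by the one-way ANOVA F-statistic: the among-group mean square divided by the within-group mean square. Below is a minimal sketch; the function name and sample groups are illustrative assumptions, and it stops at F rather than computing a p-value (a statistics library such as SciPy's `scipy.stats.f_oneway` returns both).

```python
from statistics import mean

def one_way_anova_f(*groups):
    """One-way ANOVA F-statistic: variability among groups relative to within groups."""
    k = len(groups)                      # number of groups
    n = sum(len(g) for g in groups)      # total number of observations
    grand_mean = mean([x for g in groups for x in g])
    # Among-group variability: how far each group mean sits from the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group variability: spread of observations around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within, (k - 1, n - k)

# Three hypothetical groups whose means (2, 5, 8) are far apart relative to their spread
f_stat, dof = one_way_anova_f([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(f_stat, dof)  # 27.0 (2, 6)
```

Groups with identical means give F = 0 (the "little statistical difference" case); the larger F grows, the stronger the evidence for the alternative hypothesis.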
The Techniques
Research 101 - continued
• K-Means (Partition Clustering / Unsupervised)
o Decomposes data into a set of clusters
o k parameter (the number of clusters) must be specified
o k points in (Euclidean) space are randomly chosen as cluster centers
o Items are assigned to their closest cluster center
o The mean (centroid) of the items in each cluster is calculated
o These centroids become the new “center values” and the whole process repeats
o Iteration continues until the same or similar points are assigned to each cluster
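The steps above map directly onto a short loop (often called Lloyd's algorithm). This is a minimal sketch, not a production implementation: points are assumed to be tuples of numbers, the initial centers are k input points chosen at random, iteration stops once assignments stop changing, and a cluster that ends up empty simply keeps its previous center. The function and parameter names are our own.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Basic k-means on tuples of coordinates; returns (centers, labels)."""
    rng = random.Random(seed)
    # Randomly choose k of the input points as the initial cluster centers
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assign every item to its closest center (Euclidean distance)
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Stop once the same points are assigned to each cluster as last time
        if new_labels == labels:
            break
        labels = new_labels
        # Recompute each center as the mean (centroid) of the items assigned to it
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster ends up empty
                centers[c] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return centers, labels

# Two hypothetical, well-separated groups of 2-D points
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(points, k=2)
print(labels)  # the first three points share one label, the last three share the other
```

Because the starting centers are random, different seeds can number the clusters differently, and on less cleanly separated data they can even converge to different clusterings; running several seeds and keeping the tightest result is a common remedy.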
The Techniques
Research 101 - continued
• K-Nearest Neighbor (Classification / Supervised)
o Classifies new, unlabeled patterns based on labeled training samples
o k parameter (must be specified): the number of training patterns closest to the query point
o Classification is performed by majority vote among the k nearest patterns
o Uses minimum (Euclidean) distance to determine the nearest neighbors
Example
• Three classes (labeled examples from training data)
• Query point X
• Using Euclidean distance, k = 5
• X is assigned to W1
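The classification rule can be sketched in a few lines: rank the labeled training patterns by Euclidean distance to the query point, keep the k closest, and take a majority vote. The class names W1 to W3 echo the slide's example, but the coordinates are made up, and ties in the vote are broken arbitrarily (by `Counter.most_common` order) rather than by any rule from the slide.

```python
import math
from collections import Counter

def knn_classify(training, query, k):
    """Label `query` by majority vote among its k nearest labeled training patterns."""
    # Rank every labeled training pattern by Euclidean distance to the query point
    nearest = sorted(training, key=lambda item: math.dist(item[0], query))[:k]
    # Majority vote over the labels of the k closest patterns
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical training data: (point, class label) pairs for three classes
training = [
    ((1, 1), "W1"), ((1, 2), "W1"), ((2, 1), "W1"),
    ((6, 6), "W2"), ((6, 7), "W2"),
    ((10, 0), "W3"), ((11, 0), "W3"),
]
print(knn_classify(training, query=(2, 2), k=5))  # W1 (3 of the 5 nearest are W1)
```

Unlike k-means, there is no training loop: the labeled samples themselves are the model, and all the work happens when a query arrives.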
The Techniques
Research 101 - continued
• Ward’s Method (Hierarchical Clustering / Unsupervised)
o Agglomerative procedure (bottom up)
o Begin with each pattern in a distinct cluster
o Successively merge clusters together until a stopping criterion is satisfied
o A similarity (distance) matrix is constructed using the pairwise distances between all patterns
o Each pattern is assigned to a single cluster
o The two clusters whose merger gives the smallest increase in total within-cluster variance (computed from Euclidean distances) are merged to form a single cluster
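A naive sketch of the agglomerative procedure with Ward's criterion follows. The cost of merging clusters A and B is |A||B| / (|A| + |B|) times the squared Euclidean distance between their centroids, which equals the increase in total within-cluster sum of squares. Two simplifications are our own assumptions: the stopping criterion is "stop when `n_clusters` clusters remain", and costs are recomputed on the fly instead of being kept in a distance matrix, which is slow but fine for small inputs.

```python
import math

def ward_cluster(points, n_clusters):
    """Naive bottom-up clustering with Ward's criterion; returns clusters as lists of point indices."""
    dims = len(points[0])
    # Begin with each pattern in its own cluster
    clusters = [[i] for i in range(len(points))]

    def centroid(cluster):
        return tuple(sum(points[i][d] for i in cluster) / len(cluster) for d in range(dims))

    def ward_cost(a, b):
        # Increase in within-cluster sum of squares if clusters a and b are merged
        na, nb = len(a), len(b)
        return na * nb / (na + nb) * math.dist(centroid(a), centroid(b)) ** 2

    # Assumed stopping criterion: merge until only n_clusters clusters remain
    while len(clusters) > max(n_clusters, 1):
        pairs = ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda ij: ward_cost(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters

points = [(0, 0), (0, 1), (10, 10), (10, 11), (20, 0)]
print(ward_cluster(points, n_clusters=3))  # [[0, 1], [2, 3], [4]]
```

Recording the sequence of merges instead of stopping early gives the full dendrogram that hierarchical clustering is usually visualized with.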
The Techniques
Sequencing
DNA – 4-letter strings (A, C, G, T)
RNA – 4-letter strings (A, C, G, U)
Protein – 20-letter strings (one letter per amino acid)
The Techniques
Sequencing - continued
Nucleotide (or Base) Sequence → Transcribed → RNA Sequence → Translated → Protein Sequence
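The DNA → RNA → protein flow can be illustrated with strings. This sketch makes two simplifying assumptions: the input is the coding (sense) strand, so transcription reduces to replacing T with U (in the cell, RNA polymerase actually reads the complementary template strand), and the codon table is a deliberately tiny subset of the standard genetic code, so any codon outside it raises a `KeyError`. The table and function names are hypothetical.

```python
# Tiny subset of the standard genetic code (RNA codon -> one-letter amino acid);
# a real translator needs all 64 codons.
CODON_TABLE = {
    "AUG": "M", "UUU": "F", "GCU": "A", "AAA": "K", "UGG": "W",
    "GGC": "G", "GAA": "E", "UGC": "C",
    "UAA": "*", "UAG": "*", "UGA": "*",  # stop codons
}

def transcribe(dna):
    """Coding-strand DNA -> mRNA: same sequence with thymine (T) replaced by uracil (U)."""
    if set(dna) - set("ACGT"):
        raise ValueError("DNA must use only the 4 letters A, C, G, T")
    return dna.replace("T", "U")

def translate(rna):
    """mRNA -> protein: read 3-base codons in frame from position 0 until a stop codon."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]  # KeyError for codons outside this tiny table
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

rna = transcribe("ATGTTTGCTAAATGGTAA")
print(rna)             # AUGUUUGCUAAAUGGUAA
print(translate(rna))  # MFAKW
```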
The Techniques
Sequencing - continued
The Techniques
Sequencing - continued
Sequence Analysis
• Content Analysis
o Focuses on broad characteristics of a sequence (e.g., tendency to code for proteins, fulfillment of certain biological functions)
• Signal Analysis
o Recognition of “short” signals or motifs in a sequence (e.g., gene structural elements, regulatory elements)
Video Excerpt: How DNA Makes Protein
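As a rough illustration of the two styles of analysis, the sketch below computes GC content, a simple whole-sequence compositional statistic standing in for "content analysis" (our choice of example; real coding-potential analysis uses richer statistical models), and finds every occurrence of a short motif as a stand-in for "signal analysis". The motif TATAAA is a commonly cited TATA-box consensus, a regulatory element; real motif finders usually score approximate matches (e.g., with position weight matrices) rather than exact strings. The sequence is made up.

```python
def gc_content(seq):
    """Content analysis: fraction of bases in the whole sequence that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def find_motif(seq, motif):
    """Signal analysis: every (possibly overlapping) start position of an exact motif."""
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)  # advance by one so overlapping hits are found
    return positions

seq = "GGTATAAAGGTATAAA"  # hypothetical sequence
print(gc_content(seq))            # 0.25
print(find_motif(seq, "TATAAA"))  # [2, 10]
```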
An Example
EVI1 Study
An Example
EVI1 Study - continued
Aberrant DNA hypermethylation signature in acute myeloid leukemia directed by EVI1
Sanne Lugthart, Maria E. Figueroa, Eric Bindels, Lucy Skrabanek, Peter J. M. Valk, Yushan Li, Stefan Meyer, Claudia Erpelinck-Verschueren, John Greally, Bob Löwenberg, Ari Melnick, and Ruud Delwel
Blood. 2011;117(1):234-241
References
• Human Genome Project, http://www.ornl.gov/sci/techresources/Human_Genome/home.shtml, last modified 3 Feb 2011, viewed 6 Mar 2011
• Freudenrich, C., "How DNA Works", HowStuffWorks.com, http://science.howstuffworks.com/environmental/life/cellularmicroscopic/dna1.htm, viewed 6 Mar 2011
• National Center for Biotechnology Information, "Bioinformatics", http://www.ncbi.nlm.nih.gov/About/primer/bioinformatics.html, last modified 29 Mar 2004, viewed 6 Mar 2011
• Nair, A., "Computational Biology & Bioinformatics: A Gentle Overview", Communications of the Computer Society of India, January 2007
• Perez-Iratxeta, C., Andrade-Navarro, M., Wren, J., "Evolving research trends in bioinformatics", Briefings in Bioinformatics, Vol. 8, Issue 2, pp. 88-95, October 31, 2006, http://bib.oxfordjournals.org/content/8/2/88.full
• Salib, M. & Abdelrassoul, M., "Bioinformatics", Bibliotheca Alexandrina, 080415, http://www.bibalex.org/libraries/presentation/static/Bioinformatics.pdf
• Shrestha, R., "What is Bioinformatics? – A General Perspective", http://raunakms.wordpress.com/2010/06/05/what-is-bioinformatics-%E2%80%93-ageneral-perspective/, last modified 5 Jun 2010, viewed 6 Mar 2011
• Wikipedia, "Bioinformatics", http://en.wikipedia.org/wiki/Bioinformatics, last modified 6 Mar 2011, viewed 6 Mar 2011
References - continued
• "Analysis of variance - Wikipedia, the free encyclopedia." [Online]. Available: http://en.wikipedia.org/wiki/Anova. [Accessed: 11-Mar-2011]
• "Analysis of variance (ANOVA)," Spring 2009. [Online]. Available: http://geography.uoregon.edu/geogr/topics/anovaex1.htm. [Accessed: 11-Mar-2011]
• "Clustering - K-means demo." [Online]. Available: http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/AppletKM.html. [Accessed: 12-Mar-2011]
• "Google Image Result for http://cs.jhu.edu/~razvanm/fsexpedition/summary.tux3.hclust.canberra.ward.png." [Online]. [Accessed: 13-Mar-2011]
• W. Trochim, "The T-Test," Research Methods Knowledge Base, 2nd Edition, 20-Oct-2006. [Online]. Available: http://www.socialresearchmethods.net/kb/stat_t.php. [Accessed: 10-Mar-2011]
• I. Frades and R. Matthiesen, "Overview on Techniques in Cluster Analysis," in Bioinformatics Methods in Clinical Research, Humana Press, 2009, pp. 81-107
• S. Lugthart, M. E. Figueroa, E. Bindels, et al., "Aberrant DNA hypermethylation signature in acute myeloid leukemia directed by EVI1," Blood, vol. 117, no. 1, pp. 234-241, Jan. 2011