Microarray DNA Sequence Data Analysis using Boolean Networks Project Mentor Dr. Shahadat Kowuser Team Simon Bartholomew Patrick Bailey Loodwing Murillo.

Microarray DNA Sequence Data Analysis
using Boolean Networks
Project Mentor
Dr. Shahadat Kowuser
Team
Simon Bartholomew
Patrick Bailey
Loodwing Murillo
Presentation Outline
• What is a Microarray?
• Microarray Technology
• DNA Sequences
• Boolean Algebra
• Boolean Networks
• Microarray Data Analysis
• Conclusion
DNA Microarray
• DNA microarrays allow the researcher to analyze the expression levels of many genes simultaneously.
• A microarray is a collection of thousands of small test locations, arranged in a 1” x 3” array.
• Each test location holds a small fragment of DNA, called a probe (about 20-70 bases), which corresponds to a particular gene.
• Fragments of mRNA (recently transcribed messenger RNA) from a test subject bind to each probe.
• We measure the quantity of mRNA that “sticks” to each probe to determine how much mRNA for that gene is present in the sample.
What is a Microarray?
• A kind of gene chip used to discover gene function or gene expression patterns
• Allows these patterns to be studied in parallel
• Example:
  • In each location, a known probe (cDNA) is placed with cDNA from a certain sample
  • For example, cDNA from cancerous and healthy cells with different probes (known strands of cDNA)
• Color indicates the intensity of a labeled cDNA, meaning the gene has been activated.
A Microarray Slide
In this schematic:
GREEN represents Control DNA
RED represents Sample DNA
YELLOW represents a combination of Control and Sample DNA
BLACK represents areas where neither the Control nor Sample DNA hybridized
Each color in an array represents either healthy (control) or diseased (sample) tissue.
The location and intensity of a color tell us whether the gene is present in the control and/or sample DNA.
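The color coding above can be sketched in code. This is only an illustration: the log2 ratio of sample (red) to control (green) intensity is a common way to summarize a spot, but the function names, the `threshold` and `background` cutoffs, and the intensity values are assumptions, not values from the slides.

```python
import math

def log2_ratio(red, green, floor=1.0):
    """Log2 ratio of sample (red) to control (green) intensity.

    `floor` is an assumed lower bound that avoids taking log of zero.
    """
    return math.log2(max(red, floor) / max(green, floor))

def spot_color(red, green, threshold=1.0, background=50.0):
    """Classify a spot using the slide's color scheme (cutoffs are hypothetical)."""
    if red < background and green < background:
        return "black"   # neither control nor sample bound here
    ratio = log2_ratio(red, green)
    if ratio > threshold:
        return "red"     # mostly sample DNA
    if ratio < -threshold:
        return "green"   # mostly control DNA
    return "yellow"      # comparable amounts of both

if __name__ == "__main__":
    for red, green in [(4000, 500), (500, 4000), (1000, 1200), (10, 20)]:
        print(red, green, spot_color(red, green))
```

A real analysis would also normalize the two channels before comparing them; that step is left out here.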
DNA Sequence
• “Linear polymers” composed of three types of building blocks:
  1) Phosphate group (PO4), at the 5’ end
  2) Sugar (five-carbon ribose or five-carbon deoxyribose)
  3) Bases (attached to sugar molecules)
  A free hydroxyl group (OH) marks the 3’ end.
• DNA: {Adenine - A, Guanine - G, Thymine - T, Cytosine - C}
• RNA: {Adenine - A, Guanine - G, Uracil - U, Cytosine - C}
• Oriented sequences (distinguish 3’ and 5’ ends)
• Symbols are Bases: G, C, A, T
• DNA = nucleotide sequence
  • Alphabet size = 4 (A,C,G,T)
• DNA → mRNA (single stranded)
  • Alphabet size = 4 (A,C,G,U)
• mRNA → amino acid sequence
  • Alphabet size = 20
• Amino acid sequence “folds” into a 3-dimensional molecule called a protein
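The DNA → mRNA → amino acid chain above can be illustrated with a short sketch. The function names and the example sequence are made up for illustration, and `CODONS` is deliberately only a four-entry excerpt of the standard genetic code, not the full 64-codon table.

```python
DNA_ALPHABET = set("ACGT")

# Partial excerpt of the standard genetic code (illustration only).
CODONS = {"AUG": "Met", "UUU": "Phe", "GGC": "Gly", "UAA": "Stop"}

def transcribe(coding_strand):
    """Return the mRNA for a DNA coding strand written 5'->3' (T becomes U)."""
    seq = coding_strand.upper()
    if not set(seq) <= DNA_ALPHABET:
        raise ValueError("DNA may only contain A, C, G, T")
    return seq.replace("T", "U")

def translate(mrna):
    """Read the mRNA three bases at a time until a stop codon is reached."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino = CODONS.get(mrna[i:i + 3], "?")  # "?" marks codons missing from the excerpt
        if amino == "Stop":
            break
        protein.append(amino)
    return protein

if __name__ == "__main__":
    mrna = transcribe("ATGTTTGGCTAA")   # hypothetical sequence
    print(mrna, translate(mrna))
```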
Chromatograph
Microarray Technology
Millions of DNA strands
build up on each location.
Tagged probes become hybridized to
the DNA chip’s microarray.
The DNA Molecule
5’ (Phosphate)
G -- C
A -- T
T -- A
G -- C
C -- G
Base pairing property
G -- C
T -- A
G -- C
T -- A
T -- A
A -- T
A -- T
C -- G
T -- A
3’(Hydroxyl)
Base = Nucleotide
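The base pairing property can be checked with a few lines of code. The sketch below takes the left-hand strand of the diagram above, read from the 5’ end, and derives its partner; the function names are chosen here for illustration.

```python
# Watson-Crick pairing: A pairs with T, G pairs with C.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(strand):
    """Base-by-base partner strand, listed in the same top-to-bottom order."""
    return "".join(PAIR[base] for base in strand)

def reverse_complement(strand):
    """Partner strand rewritten in its own 5'->3' direction."""
    return complement(strand)[::-1]

if __name__ == "__main__":
    left = "GATGCGTGTTAACT"            # left column of the diagram, 5' to 3'
    print(complement(left))            # right column of the diagram
    print(reverse_complement(left))
```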

Boolean Algebra (basic gates)
0=F, 1=T

In A  In B | AND  OR   NOR
  0     0  |  0    0    1
  0     1  |  0    1    0
  1     0  |  0    1    0
  1     1  |  1    1    0

AND: All High=High, else Low
OR: Any High=High, else Low
NOR: Any High=Low, else High
Boolean Algebra (continued)

In A  In B | XOR  NAND
  0     0  |  0    1
  0     1  |  1    1
  1     0  |  1    1
  1     1  |  0    0

In A | NOT
  0  |  1
  1  |  0

XOR: Diff=High, Same=Low
NAND: All High=Low, else High
NOT: (Inverter)
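The gates on these two slides are small enough to write out directly. A minimal sketch, with 0 and 1 standing for F and T; the `GATES` dictionary and `truth_table` helper are names chosen for this example.

```python
from itertools import product

# Two-input gates on the integers 0 and 1.
GATES = {
    "AND":  lambda a, b: a & b,
    "OR":   lambda a, b: a | b,
    "NOR":  lambda a, b: 1 - (a | b),
    "XOR":  lambda a, b: a ^ b,
    "NAND": lambda a, b: 1 - (a & b),
}

def NOT(a):
    """Inverter: flips a single input."""
    return 1 - a

def truth_table(gate):
    """Rows (In A, In B, Out) in the order 00, 01, 10, 11."""
    return [(a, b, gate(a, b)) for a, b in product((0, 1), repeat=2)]

if __name__ == "__main__":
    for name, gate in GATES.items():
        print(name, [out for _, _, out in truth_table(gate)])
    print("NOT", [NOT(0), NOT(1)])
```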
Boolean Networks
• A Boolean network is an acyclic graph.
• Each node of the graph is a gate (may not be basic).
• Each edge implies a connection between two gates.
• Example:
[Figure: network diagram with inputs x1, x2, x3, x4, x5, x6 and nodes y1, y2, y3, y4, y5; gates labeled AND, OR, NOR]
• Description of the network:
  • y1 = x2’ + x3’
  • y2 = x4’ + x5’
  • y3 = x4’ y1’
  • y4 = x1 + y3’
  • y5 = x6 y2 + x6’ y3’
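The five equations of the example network can be evaluated in order, since each node depends only on inputs and on nodes defined before it. A minimal sketch, reading the prime as NOT, "+" as OR and juxtaposition as AND; the function name and the choice to return every node are assumptions made for illustration.

```python
def evaluate_network(x1, x2, x3, x4, x5, x6):
    """Evaluate y1..y5 for one assignment of the inputs (each 0 or 1)."""
    def n(v):
        return 1 - v                      # the prime in the equations (NOT)

    y1 = n(x2) | n(x3)                    # y1 = x2' + x3'
    y2 = n(x4) | n(x5)                    # y2 = x4' + x5'
    y3 = n(x4) & n(y1)                    # y3 = x4' y1'
    y4 = x1 | n(y3)                       # y4 = x1 + y3'
    y5 = (x6 & y2) | (n(x6) & n(y3))      # y5 = x6 y2 + x6' y3'
    return {"y1": y1, "y2": y2, "y3": y3, "y4": y4, "y5": y5}

if __name__ == "__main__":
    print(evaluate_network(0, 0, 0, 0, 0, 0))
    print(evaluate_network(0, 1, 1, 0, 0, 0))
```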
Boolean Networks and Gene Interactions

     T      |    T+1
G1  G2  G3  | G1  G2  G3
 0   0   0  |  0   0   1
 0   0   1  |  0   0   1
 0   1   0  |  1   0   1
 0   1   1  |  0   0   0
 1   0   0  |  1   0   1
 1   0   1  |  0   1   0
 1   1   0  |  0   0   1
 1   1   1  |  0   1   1

[Figure: state transition diagram over the states 000, 001, 010, 011, 100, 101, 110, 111]
Attractors (point, periodic)
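The attractors can be found by following each state until it repeats. The sketch below assumes the table is read as pairs, each state of G1 G2 G3 at time T followed by its state at T+1; `TRANSITIONS` and the function names are introduced here for illustration.

```python
# State at time T -> state at time T+1, each written as G1 G2 G3.
TRANSITIONS = {
    "000": "001", "001": "001", "010": "101", "011": "000",
    "100": "101", "101": "010", "110": "001", "111": "011",
}

def find_attractor(state, transitions):
    """Follow the trajectory from `state` and return the cycle it falls into."""
    seen = []
    while state not in seen:
        seen.append(state)
        state = transitions[state]
    cycle = seen[seen.index(state):]
    start = cycle.index(min(cycle))       # rotate so each cycle has one canonical form
    return tuple(cycle[start:] + cycle[:start])

def attractors(transitions):
    """All distinct attractors: length 1 is a point, longer is periodic."""
    return sorted({find_attractor(s, transitions) for s in transitions})

if __name__ == "__main__":
    for cycle in attractors(TRANSITIONS):
        kind = "point" if len(cycle) == 1 else "periodic"
        print(kind, cycle)
```

Under this reading, 001 maps to itself and 010 and 101 map to each other.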
Boolean functions depend on:
• Code graph;
• Decoding algorithm;
• Initial state of network variables (some cases);
C1: {1,2,3,6,7,11}; C2: {1,2,3,4,5,12};
C3: {6,7,8,11}; C4: {1,4,8,9,10,12};
C5: {2,4,5,7,8,9}; C6: {3,5,6,9,10,11}
Controls for node V1: {1,2,3,4,5,6,7,8,9,10,11,12} = V
Controls for node V2: {1,2,3,4,5,6,7,8,9,11,12} = V-{10}
Definition: The set of controls for variable i is the set of variables sharing a check with i, including i. The control nodes are the inputs for the Boolean function assigned to variable i.
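The definition translates directly into a set union over the checks that contain variable i. A minimal sketch using the check sets exactly as listed on the slide; the `CHECKS` dictionary and `controls` function are names chosen for this example.

```python
# Check sets as listed on the slide.
CHECKS = {
    "C1": {1, 2, 3, 6, 7, 11},
    "C2": {1, 2, 3, 4, 5, 12},
    "C3": {6, 7, 8, 11},
    "C4": {1, 4, 8, 9, 10, 12},
    "C5": {2, 4, 5, 7, 8, 9},
    "C6": {3, 5, 6, 9, 10, 11},
}

def controls(i, checks=CHECKS):
    """Variables sharing at least one check with variable i, including i."""
    result = {i}
    for members in checks.values():
        if i in members:
            result |= members
    return result

if __name__ == "__main__":
    V = set(range(1, 13))
    print(controls(1) == V)           # variable 1 is controlled by all of V
    print(sorted(V - controls(2)))    # what is missing for variable 2
```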
Microarray Data Analysis
Conclusion
• Our future work will use distance measures:
  • Calculate Euclidean Distance
  • Calculate Pearson Correlation Coefficient
  • Analyze Mutual Information
• Develop Clustering Networks and Algorithms
• Develop 3D Visualization Framework with JavaScript Language
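For reference, the three measures named above have standard textbook definitions that fit in a few lines. This is a sketch of those definitions only, not of the project's planned implementation; the function names and the toy expression profiles are hypothetical.

```python
import math
from collections import Counter

def euclidean_distance(x, y):
    """Straight-line distance between two expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def pearson_correlation(x, y):
    """Linear correlation in [-1, 1]; assumes neither profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def mutual_information(x, y):
    """Mutual information in bits between two discrete (e.g. 0/1) profiles."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * math.log2((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )

if __name__ == "__main__":
    g1, g2 = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]   # toy profiles
    print(euclidean_distance(g1, g2), pearson_correlation(g1, g2))
    print(mutual_information([0, 0, 1, 1], [0, 0, 1, 1]))
```

Mutual information is computed on discrete values here, which fits expression data that has been binarized for a Boolean network.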
Acknowledgement:
We thank Dr. Samir Raychowdhury and Dr. Godiwn Mabamalu for supporting our project and giving us the opportunity to carry it out. Finally, we are preparing to publish a journal paper with Dr. Kowuser.
Questions?
Thank You