Udine Lectures Lecture #5:
Microarray Analysis of Networks
Bud Mishra
Professor of Computer Science and Mathematics (Courant, NYU)
Professor (Watson School, CSHL)
July 9, 2002
© Bud Mishra, 2002
Gene Expression
Transcriptional State of a Cell
• The transcriptional state of a cell can be characterized by detecting and quantitating gene expression levels:
– northern blots
– S1 nuclease protection
– differential display
– sequencing of cDNA libraries
– serial analysis of gene expression (SAGE)
– array-based technologies:
• spotted arrays
• oligonucleotide arrays
Gene Expression Data
• Microarrays enable one to simultaneously measure the activity of up to 30,000 (≈ 10^4–10^5) genes.
• In particular, the amount of mRNA for each gene in a given sample (or a pair of samples) can be measured.
Gene Expression Data
• Microarrays provide a tool for answering a wide variety of questions: – In which cells is each gene active?
– Under what environmental conditions is each gene active?
– How does the activity level of a gene change under different conditions?
• Stage of a cell cycle?
• Environmental conditions?
• Diseases?
Gene Expression Data
• Functional genomics with microarrays: – What genes can be inferred to be regulated together?
– What happens to the expression level of every gene when a (candidate) gene is mutated?
– What can be inferred about the regulatory structure?
The Computational Tasks
• Clustering Genes: – Which genes are regulated together?
• Classifying Genes: – Which functional class does a particular gene fall into?
• Classifying Gene Expressions: – What can be learnt about a cell from the set of all mRNAs expressed in it? Classifying diseases: does a patient have ALL or AML (two types of leukemia)?
• Inferring Regulatory Networks: – What is the “circuitry” of the cell?
Microarrays
Microarrays
• Two general types currently popular…
– Spotted arrays (Pat Brown, Stanford)
– Oligonucleotide arrays (Affymetrix)
– Other variations (Agilent, Incyte, NGS, …)
• The key idea is to query a genome for a particular pattern by complementary hybridization.
Complementary Hybridization
• Due to Watson-Crick base pairing, an mRNA molecule will hybridize to a complementary DNA molecule.
[Figure: the mRNA UCGCAAGCUUAUGG hybridizes only to the complementary DNA probe AGCGTTCGAATACC, and not to unrelated probes such as ATCGGTACGTTAACG or CCGAAAATAGCCAG.]
Complementary Hybridization
• Practical implementation of this idea:
– Put the actual gene sequence on the array
– Convert mRNA to cDNA (copy DNA) using reverse transcriptase
– Hybridize the cDNA to the sequence on the array
[Figure: the mRNA UCGCAAGCUUAUGG is first converted to the cDNA TCGCAAGCTTATGG by reverse transcriptase; the cDNA then hybridizes to the gene sequence AGCGTTCGAATACC on the array.]
Spotted Arrays
• Robots array microscopic spots of DNA on glass slides (a cDNA array).
– Each spot is the DNA analog (cDNA) of one of the mRNAs we wish to measure…
Spotted Arrays
• Two samples (reference and test) of mRNA are converted to cDNA, labeled with fluorochrome dyes, and allowed to hybridize to the array.
[Figure: reference and test cDNA samples hybridizing to the spotted array.]
Spotted Arrays
• Lasers applied to the arrays yield an emission for each fluorescent dye.
Oligonucleotide Arrays
• “Gene Chips”
– Instead of putting entire genes on the array, put sets of DNA 25-mers (synthesized oligonucleotides)
– Produced using a photolithography process similar to the one used to create semiconductor chips
– mRNA samples are processed separately instead of in pairs (reference/control and test)
Oligonucleotide Arrays
• Given a gene to be queried/measured, select a large number (≈ 20) of 25-mers for that gene.
• Selection criteria:
– Specificity
– Hybridization properties
– Ease of manufacturing
Oligonucleotide Arrays
• Each of these probes is put on the chip.
• Additionally, a slight variant (that differs only at the 13th base) of each oligo is put next to it.
– This helps factor out false hybridization (PM [perfect match] vs. MM [mismatch]).
• The measurement for a gene is derived from these 40 separate measurements (20 PM/MM pairs).
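The slides do not spell out how the 40 values are combined; one classical summary in this spirit (close to the early Affymetrix average-difference statistic, shown here only as an assumption-laden sketch) averages the PM − MM differences:

```python
import numpy as np

def probe_set_signal(pm, mm):
    """Summarize ~20 PM/MM probe pairs into one expression value.

    Subtracting MM from PM factors out nonspecific (false)
    hybridization; averaging over pairs reduces probe-level noise.
    This particular statistic is an assumption, not from the slides.
    """
    pm = np.asarray(pm, dtype=float)
    mm = np.asarray(mm, dtype=float)
    return np.mean(pm - mm)

# Example: 20 synthetic probe pairs for one gene.
rng = np.random.default_rng(42)
pm = rng.uniform(200, 400, size=20)   # perfect-match intensities
mm = rng.uniform(50, 150, size=20)    # mismatch intensities
print(probe_set_signal(pm, mm))
```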
Cluster Analysis
Genome-wide Cluster Analysis
• Eisen et al., PNAS 1998
• Put all genes (≈ 6200) of S. cerevisiae (yeast) on a single microarray
• Measure expression across m independent experiments
• Group together genes that have similar expression profiles.
Genome-wide Cluster Analysis
• Each measurement $G_i$ represents $\log(\mathrm{red}_i / \mathrm{green}_i)$, where red is the test expression level and green is the reference expression level for gene $G$ in the $i$-th experiment.
• The expression profile of a gene is the vector of measurements across all experiments: $\langle G_1, \ldots, G_m \rangle$.
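As a concrete illustration (not from the slides), the profile can be computed directly from the two intensity channels; base-2 logarithms are assumed here, a common convention, though the slide writes only "log":

```python
import numpy as np

def expression_profile(red, green):
    """Profile <G_1, ..., G_m> with G_i = log(red_i / green_i).

    red   : test-channel intensities, one per experiment
    green : reference-channel intensities, one per experiment
    """
    red = np.asarray(red, dtype=float)
    green = np.asarray(green, dtype=float)
    return np.log2(red / green)  # base 2 assumed

# Example: up 2x, down 2x, and unchanged relative to the reference.
print(expression_profile([1200, 300, 640], [600, 600, 640]))  # [ 1. -1.  0.]
```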
The Data
• 79 measurements for each of 2467 genes
• Data collected at various times during:
– Diauxic shift (shutting down genes for metabolizing sugar, activating genes for metabolizing ethanol)
– Mitotic cell division cycle
– Sporulation
– Temperature shock
– Reducing shock
The Data
• n genes measured in m experiments; the column for a gene is its expression vector:
$$\begin{pmatrix} G_{1,1} & \cdots & G_{1,n} \\ G_{2,1} & \cdots & G_{2,n} \\ \vdots & \ddots & \vdots \\ G_{m,1} & \cdots & G_{m,n} \end{pmatrix}$$
The Task
• Given – Expression profiles for a set of genes.
• Compute – An organization of genes into clusters such that genes within a cluster have similar profiles.
The Task
[Figure: heatmap of $\log(\mathrm{red}_i/\mathrm{green}_i)$ values; rows are genes, columns are experiments.]
Approaches
• Eisen et al.: Hierarchical clustering.
• Other clustering methods have been applied to this gene expression data:
– EM with Gaussian clusters [Mjolsness et al. ’99]
– Self-Organizing Maps [Tamayo et al. ’99]
– Graph-theoretic algorithms [Ben-Dor & Yakhini ’98, Hartuv et al. ’99]
Hierarchical Clustering
[Figure: dendrogram over genes; the vertical axis shows degrees of dissimilarity.]
Hierarchical Clustering
• P = set of genes
• While more than one subtree in P:
– Pick the most similar pair i, j in P
– Define a new subtree k joining i and j
– Remove i and j from P and insert k
(A runnable sketch of this loop follows below.)
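A sketch of the loop in code (our illustration, not the authors' software); representing each subtree by the mean profile of its members is one simple linkage choice among several:

```python
import numpy as np

def hierarchical_cluster(profiles, similarity):
    """Greedy agglomerative clustering of expression profiles.

    profiles   : (n_genes, n_experiments) array
    similarity : function of two profile vectors; higher = more similar
    Returns a nested-tuple tree over gene indices.
    """
    # P = set of genes: one singleton subtree per gene, each carrying
    # a representative profile (the mean over its members).
    P = [(i, profiles[i]) for i in range(len(profiles))]
    while len(P) > 1:  # while more than one subtree in P
        # Pick the most similar pair i, j in P.
        i, j = max(
            ((a, b) for a in range(len(P)) for b in range(a + 1, len(P))),
            key=lambda ab: similarity(P[ab[0]][1], P[ab[1]][1]),
        )
        (tree_i, prof_i), (tree_j, prof_j) = P[i], P[j]
        # Define a new subtree k joining i and j; remove i and j, insert k.
        k = ((tree_i, tree_j), (prof_i + prof_j) / 2.0)
        P = [P[a] for a in range(len(P)) if a not in (i, j)] + [k]
    return P[0][0]
```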
Gene Similarity Metric
• Similarity between two genes X and Y:
$$S(X, Y) = \frac{1}{N} \sum_{i=1}^{N} \left( \frac{X_i - X_{\mathrm{offset}}}{\Phi_X} \right) \left( \frac{Y_i - Y_{\mathrm{offset}}}{\Phi_Y} \right), \quad \text{where} \quad \Phi_G = \left[ \sum_{i=1}^{N} \frac{(G_i - G_{\mathrm{offset}})^2}{N} \right]^{1/2}$$
Gene Similarity Metric
• Since there is an assumed reference state (the gene’s expression level did not change), $G_{\mathrm{offset}}$ is set to 0 for all genes, so
$$S(X, Y) = \frac{1}{N} \sum_{i=1}^{N} \frac{X_i}{\left[ \sum_{i=1}^{N} X_i^2 / N \right]^{1/2}} \cdot \frac{Y_i}{\left[ \sum_{i=1}^{N} Y_i^2 / N \right]^{1/2}} = \frac{1}{N} \frac{\sum_{i=1}^{N} X_i Y_i}{\mathrm{SD}(X)\,\mathrm{SD}(Y)} = \frac{1}{N} \frac{X \cdot Y}{\mathrm{SD}(X)\,\mathrm{SD}(Y)}$$
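A direct transcription of the metric into code (a sketch, with the offsets fixed at zero as on the slide); note that SD here is the root-mean-square about the offset, not the usual mean-centered standard deviation:

```python
import numpy as np

def eisen_similarity(x, y):
    """S(X, Y) = (1/N) * (X . Y) / (SD(X) * SD(Y)), offsets = 0.

    An uncentered correlation: it rewards profiles that move in the
    same direction relative to the (unchanged) reference state.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    sd_x = np.sqrt(np.sum(x**2) / n)  # root-mean-square, offset = 0
    sd_y = np.sqrt(np.sum(y**2) / n)
    return (x @ y) / (n * sd_x * sd_y)
```

This function can be passed as the `similarity` argument to the clustering sketch above.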
Results
• Redundant representations of genes cluster together.
– But individual genes can be distinguished from related genes by subtle differences in expression.
• Genes of similar function cluster together.
– E.g., 126 genes were found strongly down-regulated in response to stress.
• 112 of these genes encode ribosomal proteins and other proteins related to translation.
• This agrees with the previously known result that yeast responds to favorable growth conditions by increasing the production of ribosomes.
Comments on Gene Clustering
• A descriptive approach to analyzing data.
• This approach can potentially be used to:
– Gain insight into a gene’s function.
– Identify classes of genes.
Large Margin Classifier
Support Vector Machine
• Classification of microarray expression data
• Brown, Grundy, Lin, Cristianini, Sugnet, Ares & Haussler ’99
• Analysis of (Stanford) S. cerevisiae data from Pat Brown’s lab:
– Instead of clustering genes to see what groupings emerge,
– devise models to match genes to predefined classes.
The Classes
• From the MIPS yeast genome database (MYGD):
– Tricarboxylic acid pathway (Krebs cycle)
– Respiration chain complexes
– Cytoplasmic ribosomal proteins
– Proteasome
– Histones
– Helix-turn-helix (control)
• Classes come from biochemical/genetic studies of genes.
Gene Classification
• Learning task:
– Given: expression profiles of genes and their class labels
– Do: learn models distinguishing genes of each class from genes in other classes
• Classification task:
– Given: the expression profile of a gene whose class is not known
– Do: predict the class to which this gene belongs
The Approach
• Brown et al. apply a variety of algorithms to this task:
– Support vector machines (SVMs) [Vapnik ’95]
– Decision trees
– Parzen windows
– Fisher linear discriminant
Support Vector Machines
• Consider the genes in our example as m points in an n-dimensional space (m genes, n experiments).
[Figure: genes plotted as points in the space of experiments; one axis per experiment.]
Support Vector Machines
• Learning in SVMs involves finding a hyperplane (decision surface) that separates the examples of one class from another.
[Figure: a hyperplane separating two classes of points.]
Support Vector Machines
• For the $i$-th example, let $x_i$ be the vector of expression measurements, and let $y_i$ be +1 if the example is in the class of interest, and −1 otherwise.
• The hyperplane is given by $w \cdot x + b = 0$, where $b$ is a constant and $w$ is a vector of weights.
Support Vector Machines
• The function used to classify examples is then $y_P = \mathrm{sgn}(w \cdot x + b)$, where $y_P$ is the predicted value of $y$.
Support Vector Machines
• There may be many such hyperplanes…
• Which one should we choose?
Maximizing the Margin
• Key SVM idea:
– Pick the hyperplane that maximizes the margin, i.e., the distance to the hyperplane from the closest point.
– Motivation: obtain the tightest possible bounds on the error rate of the classifier.
SVM: Finding the Hyperplane
• Can be formulated as an optimization task:
– Minimize $\sum_{j=1}^{n} w_j^2$
– Subject to $\forall i:\; y_i\,[w \cdot x_i + b] \ge 1$
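As an illustration (not part of the lecture), this quadratic program can be handed to an off-the-shelf convex solver; a minimal sketch using the cvxpy library, assuming separable data, with X the (m, n) example matrix and y the ±1 labels:

```python
import cvxpy as cp
import numpy as np

def fit_hard_margin_svm(X, y):
    """Minimize |w|^2 subject to y_i (w . x_i + b) >= 1 for all i."""
    m, n = X.shape
    w = cp.Variable(n)
    b = cp.Variable()
    constraints = [cp.multiply(y, X @ w + b) >= 1]
    cp.Problem(cp.Minimize(cp.sum_squares(w)), constraints).solve()
    return w.value, b.value

# Tiny separable example in the plane.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b = fit_hard_margin_svm(X, y)
print(w, b)
```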
Learning Algorithm for Separable Problems
• Vapnik & Lerner ’63; Vapnik & Chervonenkis ’64
• Class of hyperplanes: $w \cdot x + b = 0$, $w \in \mathbb{R}^n$, $b \in \mathbb{R}$
• Decision function: $f(x) = \mathrm{sgn}(w \cdot x + b)$
• Construct f from empirical data (“Generalized Portrait”):
– Among all hyperplanes separating the data, there exists a unique one yielding the maximum margin of separation between the classes:
$$\max_{w,b}\; \min \{\, \|x - x_i\| : x \in \mathbb{R}^n,\; w \cdot x + b = 0,\; i = 1, \ldots, n \,\}$$
Maximum Margin
• The margin hyperplanes $\{x \mid w \cdot x + b = +1\}$ and $\{x \mid w \cdot x + b = -1\}$ lie on either side of $\{x \mid w \cdot x + b = 0\}$.
• For $x_1$ on the first and $x_2$ on the second:
$$w \cdot x_1 + b = +1, \qquad w \cdot x_2 + b = -1$$
$$\Rightarrow\; w \cdot (x_1 - x_2) = 2 \;\Rightarrow\; (x_1 - x_2) \cdot \frac{w}{\|w\|_2} = \frac{2}{\|w\|_2}$$
Construction of Optimal Hyperplane
• Margin $= (x_1 - x_2) \cdot \frac{w}{\|w\|_2} = \frac{2}{\|w\|_2}$
• Optimization problem: maximize the margin ($= 2/\|w\|_2$) with a hyperplane separating the classes:
– Minimize $\|w\|^2 = \sum_j w_j^2$
– Subject to $y_i (w \cdot x_i + b) \ge 1, \;\; \forall i \in \{1, \ldots, m\}$.
Optimization Problem
• Using Lagrange multipliers $\alpha_i \ge 0$, $i \in \{1, \ldots, m\}$:
$$L(w, b, \alpha) = \frac{1}{2} w^T w - \sum_{i=1}^{m} \alpha_i \,[\, y_i (x_i \cdot w + b) - 1 \,]$$
– Minimize the Lagrangian L with respect to the primal variables w and b.
– Maximize the Lagrangian L with respect to the dual variables $\alpha_i$.
• Saddle point…
Intuition
• If a constraint is violated, then
– $y_i (w \cdot x_i + b) - 1 < 0$
– L can be increased by increasing the corresponding $\alpha_i$
– w and b have to change such that L decreases
• To prevent $\alpha_i [\, y_i (w \cdot x_i + b) - 1 \,]$ from becoming arbitrarily large, the change in w and b will ensure that eventually the constraint is satisfied.
– Assuming that the problem is separable.
• Karush-Kuhn-Tucker (KKT) complementarity condition:
– For all constraints that are not satisfied precisely as equalities, i.e., $y_i (w \cdot x_i + b) - 1 > 0$, the corresponding $\alpha_i = 0$.
Duality
• At the saddle point, the derivatives with respect to the primal variables must vanish:
$$\frac{\partial}{\partial b} L(w, b, \alpha) = 0 \;\Rightarrow\; \sum_{i=1}^{m} \alpha_i y_i = 0$$
$$\frac{\partial}{\partial w} L(w, b, \alpha) = 0 \;\Rightarrow\; w - \sum_{i=1}^{m} \alpha_i y_i x_i = 0 \;\Rightarrow\; w = \sum_{i=1}^{m} \alpha_i y_i x_i$$
• By the Karush-Kuhn-Tucker complementarity:
$$\alpha_i \,[\, y_i (x_i \cdot w + b) - 1 \,] = 0, \quad \forall i \in \{1, \ldots, m\}$$
• Those patterns whose $\alpha_i \ne 0$ are the “support vectors”.
Lagrangian
• Substituting $w = \sum_{i=1}^{m} \alpha_i y_i x_i$ (and using $\sum_{i=1}^{m} \alpha_i y_i = 0$):
$$L(w, b, \alpha) = \frac{1}{2} w^T w - \sum_{i=1}^{m} \alpha_i [\, y_i (x_i \cdot w + b) - 1 \,] = \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{m} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j)$$
• Maximize the dual:
$$W(\alpha) = \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{m} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j)$$
Wolfe Dual Optimization Problem
• Maximize
$$W(\alpha) = \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{m} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j)$$
• Subject to $\alpha_i \ge 0$, $i = 1, \ldots, m$, and $\sum_{i=1}^{m} \alpha_i y_i = 0$.
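A sketch of solving this dual numerically (our illustration, again with cvxpy); the quadratic form is routed through a Cholesky factor of the Gram matrix so the solver recognizes the objective as concave:

```python
import cvxpy as cp
import numpy as np

def solve_wolfe_dual(X, y):
    """Maximize sum(a) - (1/2) sum_ij a_i a_j y_i y_j (x_i . x_j)
    subject to a_i >= 0 and sum_i a_i y_i = 0."""
    m = X.shape[0]
    K = X @ X.T                                   # Gram matrix of dot products
    L = np.linalg.cholesky(K + 1e-8 * np.eye(m))  # K ~ L @ L.T (regularized)
    a = cp.Variable(m)
    ay = cp.multiply(a, y)
    objective = cp.Maximize(cp.sum(a) - 0.5 * cp.sum_squares(L.T @ ay))
    cp.Problem(objective, [a >= 0, a @ y == 0]).solve()
    return a.value
```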
Decision Function
• The hyperplane decision function can be written as
$$f(x) = \mathrm{sgn}\left( \sum_{i=1}^{m} y_i \alpha_i (x \cdot x_i) + b \right)$$
• where b is the solution to $\alpha_i [\, y_i (x_i \cdot w + b) - 1 \,] = 0$ (taken at any support vector i).
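In code (a sketch; the multipliers α are assumed to come from a dual solver such as the one above):

```python
import numpy as np

def recover_bias(alphas, y, X, tol=1e-6):
    """KKT complementarity: any support vector k (alpha_k > 0)
    satisfies y_k (w . x_k + b) = 1, hence b = y_k - w . x_k."""
    w = (alphas * y) @ X              # w = sum_i alpha_i y_i x_i
    k = int(np.argmax(alphas > tol))  # index of one support vector
    return y[k] - w @ X[k]

def decision(x, alphas, y, X, b):
    """f(x) = sgn( sum_i y_i alpha_i (x . x_i) + b ).
    Only support vectors (alpha_i > 0) contribute to the sum."""
    return np.sign(np.sum(y * alphas * (X @ x)) + b)
```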
Dealing with Data Not Separable by a Hyperplane
• Map the data into some other dot product space (called the “feature space”) F via a nonlinear map $\Phi : \mathbb{R}^N \to F$.
• Kernel function: $k(x, y) = \Phi(x) \cdot \Phi(y)$
• Examples:
– Sigmoid kernel: $k(x, y) = \tanh(\kappa (x \cdot y) + \Theta)$, where $\kappa$ = gain and $\Theta$ = threshold
– Radial basis kernel: $k(x, y) = \exp\{ -\|x - y\|^2 / 2\sigma^2 \}$
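The two example kernels written out (a sketch; the parameter names are ours):

```python
import numpy as np

def sigmoid_kernel(x, y, gain=1.0, threshold=0.0):
    """k(x, y) = tanh(gain * (x . y) + threshold)."""
    return np.tanh(gain * np.dot(x, y) + threshold)

def rbf_kernel(x, y, sigma=1.0):
    """k(x, y) = exp(-|x - y|^2 / (2 * sigma^2))."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return np.exp(-np.dot(diff, diff) / (2.0 * sigma**2))
```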
Dealing with Data Not Separable by a Hyperplane
• Find the maximum margin hyperplane in the feature space:
$$y_P = \mathrm{sgn}[\, w \cdot \Phi(x) + b \,] = \mathrm{sgn}\left[ \sum_{i=1}^{m} \alpha_i y_i \, (\Phi(x) \cdot \Phi(x_i)) + b \right] = \mathrm{sgn}\left[ \sum_{i=1}^{m} \alpha_i y_i \, k(x, x_i) + b \right]$$
• Optimization problem:
– Maximize: $W(\alpha) = \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{m} \alpha_i \alpha_j y_i y_j \, k(x_i, x_j)$
– Subject to: $\alpha_i \ge 0$, $i = 1, \ldots, m$, and $\sum_{i=1}^{m} \alpha_i y_i = 0$
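The only change from the linear decision function sketched earlier is that every dot product is routed through k; for example:

```python
import numpy as np

def kernel_decision(x, alphas, y, X, b, kernel):
    """y_P = sgn( sum_i alpha_i y_i k(x, x_i) + b )."""
    k_vals = np.array([kernel(x, x_i) for x_i in X])
    return np.sign(np.sum(alphas * y * k_vals) + b)
```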
Dealing with the Noise in Data
• One can relax the requirement that the hyperplane strictly separates the data.
• A soft margin allows some misclassified training examples:
– Introduce m slack variables $\xi_i \ge 0$, $\forall i \in \{1, \ldots, m\}$
– Minimize the objective function: $\tau(w, \xi) = \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{m} \xi_i$ (with C > 0)
• Dual optimization problem:
– Maximize: $W(\alpha) = \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{m} \alpha_i \alpha_j y_i y_j \, k(x_i, x_j)$
– Subject to: $0 \le \alpha_i \le C$, $i = 1, \ldots, m$, and $\sum_{i=1}^{m} \alpha_i y_i = 0$
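This soft-margin formulation is what standard SVM libraries solve; a minimal sketch with scikit-learn (not the software used in the original study), where the C argument is exactly the slack penalty above:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                      # 100 noisy examples
y = np.sign(X[:, 0] + 0.3 * rng.normal(size=100))  # labels in {-1, +1}

# Larger C tolerates fewer margin violations (harder margin);
# smaller C spends slack more freely (softer margin).
clf = SVC(C=1.0, kernel="rbf", gamma=0.5).fit(X, y)
print(len(clf.support_), "support vectors")
```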
SVM & Neural Networks
• SVM:
– Represents a linear or nonlinear separating surface
– Weights determined by an optimization method (optimizing margins)
• Neural network:
– Represents a linear or nonlinear separating surface
– Weights determined by an optimization method (optimizing the sum of squared errors, or a related objective function)
Experiments
• 3-fold cross validation
• Create a separate model for each class
• SVM with various kernel functions:
– Dot product raised to power d = 1, 2, 3: $k(x, y) = (x \cdot y)^d$
– Gaussian
• Various other classification methods:
– Decision trees
– Parzen windows
– Fisher linear discriminant
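A sketch of this protocol on stand-in data (the real inputs would be the 2467 × 79 expression matrix with MYGD class labels; the synthetic labels here are only for illustration):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 79))                 # stand-in: genes x measurements
y = np.where(X[:, :5].sum(axis=1) > 0, 1, -1)  # stand-in class labels

for d in (1, 2, 3):                            # k(x, y) = (x . y)^d
    clf = SVC(kernel="poly", degree=d, gamma=1.0, coef0=0.0)
    scores = cross_val_score(clf, X, y, cv=3)  # 3-fold cross validation
    print(f"degree {d}: mean accuracy {scores.mean():.3f}")

print("rbf :", cross_val_score(SVC(kernel="rbf"), X, y, cv=3).mean())
```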
SVM Results

Class             FP  FN   TP    TN
Krebs cycle        8   8    9  2442
Respiration        9   6   24  2428
Ribosome           9   4  117  2337
Proteasome         3   7   28  2429
Histone            0   2    9  2456
Helix-turn-helix   1  16    0  2450

(FP = false positives, FN = false negatives, TP = true positives, TN = true negatives)
SVM Results
• SVM had the highest accuracy for all classes (except the control).
• Many of the false positives could be easily explained in terms of the underlying biology:
– E.g., YAL003W was repeatedly assigned to the ribosome class:
• Not a ribosomal protein
• But known to be required for the proper functioning of the ribosome