Udine Lectures Lecture #5: Microarray Analysis of Networks
Bud Mishra
Professor of Computer Science and Mathematics (Courant, NYU)
Professor (Watson School, CSHL)
7/9/2002



Gene Expression


Transcriptional State of a Cell

• The transcriptional state of a cell can be characterized by detecting and quantitating gene expression levels:
– northern blots
– S1 nuclease protection
– differential display
– sequencing of cDNA libraries
– serial analysis of gene expression (SAGE)
– array-based technologies:
• spotted arrays
• oligonucleotide arrays

Gene Expression Data

• Microarrays enable one to simultaneously measure the activity of up to 30,000 (≈ 10^4–10^5) genes.

• In particular, the amount of mRNA for each gene in a given sample (or a pair of samples) can be measured.


Gene Expression Data

• Microarrays provide a tool for answering a wide variety of questions: – In which cells is each gene active?

– Under what environmental conditions is each gene active?

– How does the activity level of a gene change under different conditions?

• Stage of a cell cycle?

• Environmental conditions?

• Diseases?


Gene Expression Data

• Functional genomics with microarrays: – What genes can be inferred to be regulated together?

– What happens to the expression level of every gene when a (candidate) gene is mutated?

– What can be inferred about the regulatory structure?


The Computational Tasks

• Clustering Genes: – Which genes are regulated together?

• Classifying Genes: – Which functional class does a particular gene fall into?

• Classifying Gene Expressions: – What can be learnt about a cell from the set of all mRNAs it expresses? – Classifying diseases: does a patient have ALL or AML (two classes of leukemia)?

• Inferring Regulatory Networks: – What is the “circuitry” of the cell?



Microarrays


Microarrays

• Two general types are currently popular:
– Spotted arrays (Pat Brown, Stanford)
– Oligonucleotide arrays (Affymetrix)
– Other variations (Agilent, Incyte, NGS, …)
• The key idea is to query a genome for a particular pattern by complementary hybridization.


Complementary Hybridization

• Due to Watson-Crick base pairing, an mRNA molecule will hybridize to a complementary DNA molecule.

[Figure: the mRNA UCGCAAGCUUAUGG hybridizes only to the complementary probe AGCGTTCGAATACC, not to unrelated probes such as ATCGGTACGTTAACG or CCGAAAATAGCCAG.]

Complementary Hybridization

• Practical implementation of this idea:
– Put the actual gene sequence on the array
– Convert the mRNA to cDNA (copy DNA) using reverse transcriptase
– Hybridize the cDNA to the sequence on the array

[Figure: the mRNA UCGCAAGCUUAUGG is converted by reverse transcriptase to the cDNA TCGCAAGCTTATGG, which hybridizes to the gene sequence AGCGTTCGAATACC on the array.]


Spotted Array

• Robots array microscopic spots of DNA on glass slides.
– Each spot is the DNA analog (cDNA) of one of the mRNAs we wish to measure…


Spotted Arrays

• Two samples (reference and test) of mRNA are converted to cDNA, labeled with fluorochrome dyes, and allowed to hybridize to the array.


Spotted Arrays

• Lasers applied to the arrays yield an emission for each fluorescent dye.


Oligonucleotide Arrays

• “Gene Chips”
– Instead of putting entire genes on the array, put sets of DNA 25-mers (synthesized oligonucleotides)
– Produced using a photolithography process similar to the one used to create semiconductor chips
– mRNA samples are processed separately instead of in pairs (reference/control and test)

Oligonucleotide Arrays

• Given a gene to be queried/measured, select a large number (≈ 20) of 25-mers for that gene.

• Selection criteria:
– Specificity
– Hybridization properties
– Ease of manufacturing

Oligonucleotide Arrays

• Each of these probes is put on the chip.
• Additionally, a slight variant (differing only at the 13th base) of each oligo is put next to it.
– This helps factor out false hybridization (PM [perfect match] vs. MM [mismatch]).
• The measurement for a gene is derived from these 40 separate measurements.

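To make the probe-level arithmetic concrete, here is a minimal Python sketch of the simplest summarization, averaging the PM minus MM differences over a gene's probe pairs; the data are made up, and real Affymetrix software uses more robust estimators.

```python
import numpy as np

def average_difference(pm, mm):
    """Summarize one gene from its probe pairs: average the
    perfect-match minus mismatch intensity differences."""
    pm = np.asarray(pm, dtype=float)  # ~20 perfect-match intensities
    mm = np.asarray(mm, dtype=float)  # ~20 mismatch intensities
    return float(np.mean(pm - mm))

# Hypothetical probe-level intensities for one gene (20 PM/MM pairs).
rng = np.random.default_rng(1)
pm = 500 + 50 * rng.standard_normal(20)
mm = 120 + 40 * rng.standard_normal(20)
print(average_difference(pm, mm))  # one expression value for the gene
```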

Cluster Analysis


Genome-wide Cluster Analysis

• Eisen et al., PNAS 1998
• Put all genes (≈ 6200) of S. cerevisiae (yeast) on a single microarray
• Measure expression across m independent experiments
• Group together genes that have similar expression profiles.


Genome-wide Cluster Analysis

• Each measurement G_i represents log(red_i/green_i), where red_i is the test expression level and green_i is the reference expression level for gene G in the i-th experiment.

• The expression profile of a gene is the vector of measurements across all experiments: ⟨G_1, …, G_m⟩.

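A small sketch of building such profiles with NumPy; the intensities below are hypothetical, and base-2 logs are an assumption (the slide does not fix the base).

```python
import numpy as np

# Hypothetical intensities: rows = genes, columns = experiments.
red   = np.array([[120., 480., 950.],    # test sample
                  [300., 310., 290.]])
green = np.array([[115., 240., 118.],    # reference sample
                  [305., 300., 295.]])

# Expression profile of gene G: the vector <G_1, ..., G_m> of log ratios.
profiles = np.log2(red / green)
print(np.round(profiles, 2))  # gene 0 is induced; gene 1 is roughly flat
```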

The Data

• 79 measurements for each of 2467 genes
• Data collected at various times during:
– Diauxic shift (shutting down genes for metabolizing sugar, activating genes for metabolizing ethanol)
– Mitotic cell division cycle
– Sporulation
– Temperature shock
– Reducing shock

The Data

• n genes measured in m experiments; the j-th column is the expression vector for gene j:

\[
\begin{pmatrix}
G_{1,1} & \cdots & G_{1,n} \\
G_{2,1} & \cdots & G_{2,n} \\
\vdots  & \ddots & \vdots  \\
G_{m,1} & \cdots & G_{m,n}
\end{pmatrix}
\]

The Task

• Given – Expression profiles for a set of genes.

• Compute – An organization of genes into clusters such that genes within a cluster have similar profiles.


The Task

[Figure: genes × experiments matrix of log(red_i/green_i) values, displayed with experiments along one axis.]

Approaches

• Eisen et al.: Hierarchical clustering.

• Other clustering methods have been applied to this gene expression data:
– EM with Gaussian clusters [Mjolsness et al. ’99]
– Self-Organizing Maps [Tamayo et al. ’99]
– Graph-theoretic algorithms [Ben-Dor & Yakhini ’98, Hartuv et al. ’99]

Hierarchical Clustering

[Figure: dendrogram over the genes; the vertical axis shows degrees of dissimilarity.]

Hierarchical Clustering

• P = set of genes
• While there is more than one subtree in P:
– Pick the most similar pair i, j in P
– Define a new subtree k joining i and j
– Remove i and j from P and insert k
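A compact sketch of the same bottom-up procedure using SciPy's agglomerative clustering; correlation distance (1 minus Pearson correlation) stands in here for the similarity metric defined on the next slide, and the profiles are synthetic.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

rng = np.random.default_rng(0)
profiles = rng.standard_normal((20, 8))   # 20 hypothetical genes x 8 experiments

# Repeatedly join the most similar pair of subtrees (average linkage),
# using correlation distance between expression profiles.
Z = linkage(profiles, method="average", metric="correlation")
tree = dendrogram(Z, no_plot=True)        # the nested cluster structure
print(tree["ivl"])                        # leaf order of the 20 genes
```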

Gene Similarity Metric

• Similarity between two genes X and Y:

\[ S(X, Y) = \frac{1}{N} \sum_{i=1}^{N} \left( \frac{X_i - X_{\text{offset}}}{\Phi_X} \right) \left( \frac{Y_i - Y_{\text{offset}}}{\Phi_Y} \right) \]

where

\[ \Phi_G = \left[ \frac{1}{N} \sum_{i=1}^{N} (G_i - G_{\text{offset}})^2 \right]^{1/2} \]

Gene Similarity Metric

• Since there is an assumed reference state (the gene's expression level did not change), G_offset is set to 0 for all genes, so

\[ S(X, Y) = \frac{1}{N} \sum_{i=1}^{N} \frac{X_i}{\left[\sum_{i=1}^{N} X_i^2 / N\right]^{1/2}} \cdot \frac{Y_i}{\left[\sum_{i=1}^{N} Y_i^2 / N\right]^{1/2}} = \frac{1}{N} \frac{\sum_{i=1}^{N} X_i Y_i}{SD(X)\, SD(Y)} = \frac{1}{N} \frac{X \cdot Y}{SD(X)\, SD(Y)} \]
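A direct transcription of the zero-offset similarity into Python (a sketch; the function and variable names are mine):

```python
import numpy as np

def eisen_similarity(x, y):
    """S(X, Y) with offsets fixed at 0: the mean of the products of the
    two profiles after each is scaled by its root-mean-square."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    rms = lambda g: np.sqrt(np.mean(g ** 2))
    return float(np.mean((x / rms(x)) * (y / rms(y))))

x = np.array([1.0, -0.5, 2.0, 0.3])
print(eisen_similarity(x, x), eisen_similarity(x, -x))   # 1.0 -1.0
```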

Results

• Redundant representations of genes cluster together.

– But individual genes can be distinguished from related genes by subtle differences in expression.

• Genes of similar function cluster together.

– E.g., 126 genes were found to be strongly down-regulated in response to stress.

• 112 of these genes encode ribosomal proteins and other proteins related to translation.
• This agrees with the previously known finding that yeast responds to favorable growth conditions by increasing the production of ribosomes.


Comments on Gene Clustering

• A descriptive approach to analyzing data
• This approach can potentially be used to:
– Gain insight into a gene's function.

– Identify classes of genes.


Large Margin Classifier


Support Vector Machine

• Classification of microarray expression data
• Brown, Grundy, Lin, Cristianini, Sugnet, Ares & Haussler ’99
• Analysis of (Stanford) S. cerevisiae data from Pat Brown's lab:
– Instead of clustering genes to see what groupings emerge,
– devise models to match genes to predefined classes.

The Classes

• From the MIPS yeast genome database (MYGD):
– Tricarboxylic acid pathway (Krebs cycle)
– Respiration chain complexes
– Cytoplasmic ribosomal proteins
– Proteasome
– Histones
– Helix-turn-helix (control)
• Classes come from biochemical/genetic studies of genes

Gene Classification

• Learning Task
– Given: expression profiles of genes and their class labels
– Do: learn models distinguishing genes of each class from genes in other classes
• Classification Task
– Given: the expression profile of a gene whose class is not known
– Do: predict the class to which this gene belongs

The Approach

• Brown et al. apply a variety of algorithms to this task:
– Support vector machines (SVMs) [Vapnik ’95]
– Decision trees
– Parzen windows
– Fisher linear discriminant

Support Vector Machines

• Consider the genes in our example as m points in an n-dimensional space (m genes, n experiments).

Support Vector Machines

• Learning in SVMs involves finding a hyperplane (decision surface) that separates the examples of one class from another.

Support Vector Machines

• For the i-th example, let x_i be the vector of expression measurements, and y_i be +1 if the example is in the class of interest, and −1 otherwise.
• The hyperplane is given by \( w \cdot x + b = 0 \), where b is a constant and w is a vector of weights.

Support Vector Machines

• The function used to classify examples is then

\[ y^P = \mathrm{sgn}(w \cdot x + b) \]

where \( y^P \) is the predicted value of y.
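In code, the resulting classifier is a one-liner (w and b below are hypothetical):

```python
import numpy as np

def predict(w, b, x):
    """y^P = sgn(w . x + b): which side of the hyperplane x falls on."""
    return int(np.sign(np.dot(w, x) + b))

w, b = np.array([0.8, -1.2, 0.1]), 0.5   # hypothetical weights and offset
print(predict(w, b, np.array([1.0, 0.2, 0.0])))   # +1
```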

Support Vector Machines

• There may be many such hyperplanes…

• Which one should we choose?


Maximizing the Margin

• Key SVM idea:
– Pick the hyperplane that maximizes the margin, the distance from the hyperplane to the closest point.
– Motivation: obtain the tightest possible bounds on the error rate of the classifier.


SVM: Finding the Hyperplane

• Can be formulated as an optimization task:
– Minimize \( \sum_{i} w_i^2 \) (i.e., \( \|w\|^2 \))
– Subject to \( \forall i:\; y_i [w \cdot x_i + b] \ge 1 \)

Learning Algorithm for Separable Problems

• Vapnik & Lerner ’63; Vapnik & Chervonenkis ’64
• Class of hyperplanes: \( w \cdot x + b = 0 \), \( w \in \mathbb{R}^n \), \( b \in \mathbb{R} \)
• Decision function: \( f(x) = \mathrm{sgn}(w \cdot x + b) \)
• Construct f from empirical data (“Generalized Portrait”)
– Among all hyperplanes separating the data, there exists a unique one yielding the maximum margin of separation between the classes:

\[ \max_{w, b} \; \min \{ \|x - x_i\| : x \in \mathbb{R}^n,\; w \cdot x + b = 0,\; i = 1, \dots, n \} \]

Maximum Margin

[Figure: points \( x_1 \) and \( x_2 \) lying on the two margin hyperplanes \( \{x \mid w \cdot x + b = +1\} \) and \( \{x \mid w \cdot x + b = -1\} \), on either side of the decision hyperplane \( \{x \mid w \cdot x + b = 0\} \).]

\[ w \cdot x_1 + b = +1, \qquad w \cdot x_2 + b = -1 \]
\[ \Rightarrow\; w \cdot (x_1 - x_2) = 2 \]
\[ \Rightarrow\; (x_1 - x_2) \cdot \frac{w}{\|w\|} = \frac{2}{\|w\|} \]

Construction of Optimal Hyperplane

• Margin \( = (x_1 - x_2) \cdot \frac{w}{\|w\|} = \frac{2}{\|w\|} \)
• Optimization problem: maximize the margin \( (= 2/\|w\|) \) with a hyperplane separating the classes:
– Minimize \( \|w\|^2 = \sum_i w_i^2 \)
– Subject to \( y_i (w \cdot x_i + b) \ge 1 \) for all training examples \( i \in \{1, \dots, m\} \).

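A quick numerical check of the margin formula with a hand-picked 2-D hyperplane (all numbers hypothetical):

```python
import numpy as np

w, b = np.array([3.0, 4.0]), -2.0          # |w| = 5
x1 = np.array([1.0, 0.0])                  # w.x1 + b = +1
x2 = np.array([1.0 / 3.0, 0.0])            # w.x2 + b = -1

# (x1 - x2) . w/|w| should equal 2/|w|.
print(np.dot(x1 - x2, w / np.linalg.norm(w)), 2 / np.linalg.norm(w))  # 0.4 0.4
```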

Optimization Problem

• Using Lagrange multipliers \( \alpha_i \ge 0 \), \( i \in \{1, \dots, m\} \):

\[ L(w, b, \alpha) = \tfrac{1}{2}\, w^T w \;-\; \sum_{i=1}^{m} \alpha_i \left[ y_i (x_i \cdot w + b) - 1 \right] \]

– Minimize the Lagrangian L with respect to the primal variables w and b
– Maximize the Lagrangian L with respect to the dual variables \( \alpha_i \)

• Saddle point…

Intuition

• If a constraint is violated, then
– \( y_i (w \cdot x_i + b) - 1 < 0 \)
– L can be increased by increasing the corresponding \( \alpha_i \)
– w and b have to change such that L decreases
• To prevent \( \alpha_i [ y_i (w \cdot x_i + b) - 1 ] \) from becoming arbitrarily large, the change in w and b will ensure that eventually the constraint is satisfied.
– Assuming that the problem is separable
• Karush-Kuhn-Tucker complementarity condition:
– For all constraints that are not satisfied precisely as equalities, i.e., \( y_i (w \cdot x_i + b) - 1 > 0 \), the corresponding \( \alpha_i \equiv 0 \).


Duality

• At the saddle point, the derivatives with respect to the primal variables must vanish:

\[ \frac{\partial}{\partial b} L(w, b, \alpha) = 0 \;\Rightarrow\; \sum_{i=1}^{m} \alpha_i y_i = 0 \]
\[ \frac{\partial}{\partial w} L(w, b, \alpha) = 0 \;\Rightarrow\; w = \sum_{i=1}^{m} \alpha_i y_i x_i \]

• By the Karush-Kuhn-Tucker complementarity:

\[ \alpha_i \left[ y_i (x_i \cdot w + b) - 1 \right] = 0, \quad \forall i \in \{1, \dots, m\} \]

• Those patterns \( x_i \) whose \( \alpha_i \neq 0 \) are the “support vectors”.

Lagrangian

• Substituting \( w = \sum_i \alpha_i y_i x_i \) (and using \( \sum_i \alpha_i y_i = 0 \)) into the Lagrangian

\[ L(w, b, \alpha) = \tfrac{1}{2}\, w^T w - \sum_{i=1}^{m} \alpha_i \left[ y_i (x_i \cdot w + b) - 1 \right] \]

gives, since \( w^T w = \sum_{i,j=1}^{m} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) \):

• Maximize the dual:

\[ W(\alpha) = \sum_{i=1}^{m} \alpha_i - \tfrac{1}{2} \sum_{i,j=1}^{m} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) \]

Wolfe Dual Optimization Problem

• Maximize \( W(\alpha) = \sum_{i=1}^{m} \alpha_i - \tfrac{1}{2} \sum_{i,j=1}^{m} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) \)
• Subject to \( \alpha_i \ge 0 \), \( i = 1, \dots, m \), and \( \sum_{i=1}^{m} \alpha_i y_i = 0 \)
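This dual is a standard quadratic program. Below is a sketch of solving it with the cvxopt QP solver on separable toy data; the solver choice, the support-vector threshold, and the recovery of b are my assumptions, not part of the lecture.

```python
import numpy as np
from cvxopt import matrix, solvers

def svm_dual(X, y):
    """Hard-margin SVM via the Wolfe dual.
    Minimize (1/2) a^T P a - 1^T a with P_ij = y_i y_j (x_i . x_j),
    subject to a_i >= 0 and sum_i a_i y_i = 0."""
    m = X.shape[0]
    P = matrix(np.outer(y, y) * (X @ X.T))
    q = matrix(-np.ones(m))
    G, h = matrix(-np.eye(m)), matrix(np.zeros(m))      # a_i >= 0
    A, b0 = matrix(y.reshape(1, -1).astype(float)), matrix(0.0)
    solvers.options["show_progress"] = False
    sol = solvers.qp(P, q, G, h, A, b0)
    a = np.ravel(sol["x"])
    w = (a * y) @ X                      # w = sum_i a_i y_i x_i
    sv = a > 1e-6                        # support vectors: a_i != 0
    b = float(np.mean(y[sv] - X[sv] @ w))
    return a, w, b

X = np.array([[2.0, 2.0], [1.5, 2.5], [-2.0, -1.0], [-1.0, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
a, w, b = svm_dual(X, y)
```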

Decision Function

• The hyperplane decision function can be written as

\[ f(x) = \mathrm{sgn}\left( \sum_{i=1}^{m} y_i \alpha_i \,(x \cdot x_i) + b \right) \]

where b is the solution to \( \alpha_i \left[ y_i (x_i \cdot w + b) - 1 \right] = 0 \) for the support vectors.

Dealing with Data Not Separable by a Hyperplane

• Map the data into some other dot product space (called the “feature space”) F via a nonlinear map \( \Phi : \mathbb{R}^N \to F \).
• Kernel function: \( k(x, y) = \Phi(x) \cdot \Phi(y) \)
• Examples:
– Sigmoid kernel: \( k(x, y) = \tanh(\kappa (x \cdot y) + \Theta) \), where \( \kappa \) = gain and \( \Theta \) = threshold
– Radial basis kernel: \( k(x, y) = \exp\{-\|x - y\|^2 / 2\sigma^2\} \)
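The two example kernels written out in Python, with parameter names following the slide (kappa = gain, theta = threshold, sigma = width):

```python
import numpy as np

def sigmoid_kernel(x, y, kappa=1.0, theta=0.0):
    """k(x, y) = tanh(kappa * (x . y) + theta)."""
    return np.tanh(kappa * np.dot(x, y) + theta)

def rbf_kernel(x, y, sigma=1.0):
    """k(x, y) = exp(-|x - y|^2 / (2 * sigma^2))."""
    d = np.asarray(x, float) - np.asarray(y, float)
    return np.exp(-np.dot(d, d) / (2.0 * sigma ** 2))

x, y = np.array([1.0, 0.0]), np.array([0.0, 1.0])
print(sigmoid_kernel(x, y), rbf_kernel(x, y))
```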

Dealing with Data Not Separable by a Hyperplane

• Find the maximum margin hyperplane in the feature space:

\[ y^P = \mathrm{sgn}[\, w \cdot \Phi(x) + b \,] = \mathrm{sgn}\left[ \sum_{i=1}^{m} \alpha_i y_i \,(\Phi(x) \cdot \Phi(x_i)) + b \right] = \mathrm{sgn}\left[ \sum_{i=1}^{m} \alpha_i y_i \, k(x, x_i) + b \right] \]

• Optimization problem:
– Maximize: \( W(\alpha) = \sum_{i=1}^{m} \alpha_i - \tfrac{1}{2} \sum_{i,j=1}^{m} \alpha_i \alpha_j y_i y_j \, k(x_i, x_j) \)
– Subject to: \( \alpha_i \ge 0 \), \( i = 1, \dots, m \), and \( \sum_{i=1}^{m} \alpha_i y_i = 0 \)

Dealing with the Noise in Data

• One can relax the requirement that the hyperplane strictly separates the data.

• A soft margin allows some misclassified training examples:
– Introduce m slack variables \( \xi_i \ge 0 \), \( \forall i \in \{1, \dots, m\} \)
– Minimize the objective function \( \tau(w, \xi) = \tfrac{1}{2} \|w\|^2 + C \sum_{i=1}^{m} \xi_i \) (with C > 0)
• Dual optimization problem:
– Maximize: \( W(\alpha) = \sum_{i=1}^{m} \alpha_i - \tfrac{1}{2} \sum_{i,j=1}^{m} \alpha_i \alpha_j y_i y_j \, k(x_i, x_j) \)
– Subject to: \( 0 \le \alpha_i \le C \), \( i = 1, \dots, m \), and \( \sum_{i=1}^{m} \alpha_i y_i = 0 \)
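In practice one rarely codes this dual by hand. A sketch with scikit-learn's SVC, whose C parameter plays exactly the role of the soft-margin constant above (the data are synthetic):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))                                    # hypothetical profiles
y = np.where(X[:, 0] + 0.3 * rng.standard_normal(100) > 0, 1, -1)    # noisy labels

# Larger C -> less slack allowed (harder margin); smaller C -> softer margin.
clf = SVC(kernel="rbf", C=1.0).fit(X, y)
print(clf.score(X, y))
```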

SVM & Neural Networks

• SVM:
– Represents a linear or nonlinear separating surface
– Weights determined by an optimization method (optimizing margins)
• Neural network:
– Represents a linear or nonlinear separating surface
– Weights determined by an optimization method (optimizing the sum of squared errors, or a related objective function)

Experiments

• 3-fold cross validation
• Create a separate model for each class
• SVM with various kernel functions:
– Dot product raised to power d = 1, 2, 3: \( k(x, y) = (x \cdot y)^d \)
– Gaussian
• Various other classification methods:
– Decision trees
– Parzen windows
– Fisher linear discriminant
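A sketch of this protocol in scikit-learn: with gamma=1 and coef0=0, the polynomial kernel reduces to (x · y)^d as on the slide; the data below are synthetic stand-ins for the expression profiles.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.standard_normal((120, 10))                  # stand-in expression data
y = np.where(X[:, :2].sum(axis=1) > 0, 1, -1)       # stand-in class labels

for d in (1, 2, 3):                                 # k(x, y) = (x . y)^d
    clf = SVC(kernel="poly", degree=d, gamma=1.0, coef0=0.0)
    print(d, cross_val_score(clf, X, y, cv=3).mean())
gauss = SVC(kernel="rbf")                           # Gaussian kernel
print("rbf", cross_val_score(gauss, X, y, cv=3).mean())
```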

SVM Results

Class               FP    FN    TP     TN
Krebs cycle          8     8     9   2442
Respiration          9     6    24   2428
Ribosome             9     4   117   2337
Proteasome           3     7    28   2429
Histone              0     2     9   2456
Helix-turn-helix     1    16     0   2450

(FP = false positives, FN = false negatives, TP = true positives, TN = true negatives; each row sums to the 2467 genes.)

SVM Results

• SVM had the highest accuracy for all classes (except the control).
• Many of the false positives could easily be explained in terms of the underlying biology:
– E.g., YAL003W was repeatedly assigned to the ribosome class
• Not a ribosomal protein
• But known to be required for proper functioning of the ribosome.
