Udine Lectures Lecture #5:
Microarray Analysis of Networks
Bud Mishra
Professor of Computer Science and Mathematics (Courant, NYU)
Professor (Watson School, CSHL)
July 9, 2002
© Bud Mishra, 2002
Gene Expression
Transcriptional State of a Cell
• The transcriptional state of a cell can be characterized by detecting and quantitating gene expression levels:
– northern blots
– S1 nuclease protection
– differential display
– sequencing of cDNA libraries
– serial analysis of gene expression (SAGE)
– array-based technologies:
• spotted arrays
• oligonucleotide arrays
Gene Expression Data
• Microarrays enable one to simultaneously measure the activity of up to 30,000 (≈ 10^4–10^5) genes.
• In particular, the amount of mRNA for each gene in a given sample (or a pair of samples) can be measured.
Gene Expression Data
• Microarrays provide a tool for answering a wide variety of questions: – In which cells is each gene active?
– Under what environmental conditions is each gene active?
– How does the activity level of a gene change under different conditions?
• Stage of a cell cycle?
• Environmental conditions?
• Diseases?
Gene Expression Data
• Functional genomics with microarrays: – What genes can be inferred to be regulated together?
– What happens to the expression level of every gene when a (candidate) gene is mutated?
– What can be inferred about the regulatory structure?
The Computational Tasks
• Clustering Genes: – Which genes are regulated together?
• Classifying Genes: – Which functional class does a particular gene fall into?
• Classifying Gene Expressions: – What can be learnt about a cell from the set of all mRNAs expressed in it? Classifying diseases: does a patient have ALL or AML (two types of leukemia)?
• Inferring Regulatory Networks: – What is the “circuitry” of the cell?
Microarrays
Microarrays
• Two general types currently popular…
– Spotted arrays (Pat Brown, Stanford)
– Oligonucleotide arrays (Affymetrix)
– Other variations (Agilent, Incyte, NGS, …)
• The key idea is to query a genome for a particular pattern by complementary hybridization.
Complementary Hybridization
• Due to Watson-Crick base pairing, an mRNA molecule will hybridize to a complementary DNA molecule.
[Figure: the mRNA UCGCAAGCUUAUGG hybridizes only to the complementary DNA probe AGCGTTCGAATACC, and not to unrelated probes such as ATCGGTACGTTAACG or CCGAAAATAGCCAG.]
Complementary Hybridization
• Practical implementation of this idea:
– Put the actual gene sequence on the array
– Convert mRNA to cDNA (copy DNA) using reverse transcriptase
– Hybridize the cDNA to the sequence on the array
[Figure: the mRNA UCGCAAGCUUAUGG is first converted to the cDNA TCGCAAGCTTATGG by reverse transcriptase; the cDNA then hybridizes to the gene sequence AGCGTTCGAATACC on the array.]
Spotted Arrays
• Robots array microscopic spots of DNA on glass slides (a cDNA array).
– Each spot is the DNA analog (cDNA) of one of the mRNAs we wish to measure…
Spotted Arrays
• Two samples (reference and test) of mRNA are converted to cDNA, labeled with fluorochrome dyes, and allowed to hybridize to the array.
[Figure: reference and test cDNA samples hybridizing to the spotted array.]
Spotted Arrays
• Lasers applied to the arrays yield an emission for each fluorescent dye.
Oligonucleotide Arrays
• “Gene Chips”
– Instead of putting entire genes on the array, put sets of DNA 25-mers (synthesized oligonucleotides)
– Produced using a photolithography process similar to the one used to create semiconductor chips
– mRNA samples are processed separately instead of in pairs (reference/control and test)
Oligonucleotide Arrays
• Given a gene to be queried/measured, select a large number (≈ 20) of 25-mers for that gene.
• Selection criteria:
– Specificity
– Hybridization properties
– Ease of manufacturing
Oligonucleotide Arrays
• Each of these probes is put on the chip.
• Additionally, a slight variant (that differs only at the 13th base) of each oligo is put next to it.
– This helps factor out false hybridization (PM [perfect match] vs. MM [mismatch]).
• The measurement for a gene is derived from these 40 separate measurements (20 PM/MM pairs).
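The slides do not spell out how the 40 values are combined; one classical summary in this spirit (close to the early Affymetrix average-difference statistic, shown here only as an assumption-laden sketch) averages the PM − MM differences:

```python
import numpy as np

def probe_set_signal(pm, mm):
    """Summarize ~20 PM/MM probe pairs into one expression value.

    Subtracting MM from PM factors out nonspecific (false)
    hybridization; averaging over pairs reduces probe-level noise.
    This particular statistic is an assumption, not from the slides.
    """
    pm = np.asarray(pm, dtype=float)
    mm = np.asarray(mm, dtype=float)
    return np.mean(pm - mm)

# Example: 20 synthetic probe pairs for one gene.
rng = np.random.default_rng(42)
pm = rng.uniform(200, 400, size=20)   # perfect-match intensities
mm = rng.uniform(50, 150, size=20)    # mismatch intensities
print(probe_set_signal(pm, mm))
```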
Cluster Analysis
Genome-wide Cluster Analysis
• Eisen et al., PNAS 1998
• Put all genes (≈ 6200) of S. cerevisiae (yeast) on a single microarray
• Measure expression across m independent experiments
• Group together genes that have similar expression profiles.
Genome-wide Cluster Analysis
• Each measurement $G_i$ represents $\log(\mathrm{red}_i / \mathrm{green}_i)$, where red is the test expression level and green is the reference expression level for gene $G$ in the $i$-th experiment.
• The expression profile of a gene is the vector of measurements across all experiments: $\langle G_1, \ldots, G_m \rangle$.
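As a concrete illustration (not from the slides), the profile can be computed directly from the two intensity channels; base-2 logarithms are assumed here, a common convention, though the slide writes only "log":

```python
import numpy as np

def expression_profile(red, green):
    """Profile <G_1, ..., G_m> with G_i = log(red_i / green_i).

    red   : test-channel intensities, one per experiment
    green : reference-channel intensities, one per experiment
    """
    red = np.asarray(red, dtype=float)
    green = np.asarray(green, dtype=float)
    return np.log2(red / green)  # base 2 assumed

# Example: up 2x, down 2x, and unchanged relative to the reference.
print(expression_profile([1200, 300, 640], [600, 600, 640]))  # [ 1. -1.  0.]
```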
The Data
• 79 measurements for each of 2467 genes
• Data collected at various times during:
– Diauxic shift (shutting down genes for metabolizing sugar, activating genes for metabolizing ethanol)
– Mitotic cell division cycle
– Sporulation
– Temperature shock
– Reducing shock
The Data
• n genes measured in m experiments; the column for a gene is its expression vector:
$$\begin{pmatrix} G_{1,1} & \cdots & G_{1,n} \\ G_{2,1} & \cdots & G_{2,n} \\ \vdots & \ddots & \vdots \\ G_{m,1} & \cdots & G_{m,n} \end{pmatrix}$$
The Task
• Given – Expression profiles for a set of genes.
• Compute – An organization of genes into clusters such that genes within a cluster have similar profiles.
The Task
[Figure: heatmap of $\log(\mathrm{red}_i/\mathrm{green}_i)$ values; rows are genes, columns are experiments.]
Approaches
• Eisen et al.: Hierarchical clustering.
• Other clustering methods have been applied to this gene expression data:
– EM with Gaussian clusters [Mjolsness et al. ’99]
– Self-Organizing Maps [Tamayo et al. ’99]
– Graph-theoretic algorithms [Ben-Dor & Yakhini ’98, Hartuv et al. ’99]
Hierarchical Clustering
[Figure: dendrogram over genes; the vertical axis shows degrees of dissimilarity.]
Hierarchical Clustering
• P = set of genes
• While more than one subtree in P:
– Pick the most similar pair i, j in P
– Define a new subtree k joining i and j
– Remove i and j from P and insert k
(A runnable sketch of this loop follows below.)
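A sketch of the loop in code (our illustration, not the authors' software); representing each subtree by the mean profile of its members is one simple linkage choice among several:

```python
import numpy as np

def hierarchical_cluster(profiles, similarity):
    """Greedy agglomerative clustering of expression profiles.

    profiles   : (n_genes, n_experiments) array
    similarity : function of two profile vectors; higher = more similar
    Returns a nested-tuple tree over gene indices.
    """
    # P = set of genes: one singleton subtree per gene, each carrying
    # a representative profile (the mean over its members).
    P = [(i, profiles[i]) for i in range(len(profiles))]
    while len(P) > 1:  # while more than one subtree in P
        # Pick the most similar pair i, j in P.
        i, j = max(
            ((a, b) for a in range(len(P)) for b in range(a + 1, len(P))),
            key=lambda ab: similarity(P[ab[0]][1], P[ab[1]][1]),
        )
        (tree_i, prof_i), (tree_j, prof_j) = P[i], P[j]
        # Define a new subtree k joining i and j; remove i and j, insert k.
        k = ((tree_i, tree_j), (prof_i + prof_j) / 2.0)
        P = [P[a] for a in range(len(P)) if a not in (i, j)] + [k]
    return P[0][0]
```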
Gene Similarity Metric
• Similarity between two genes X and Y:
$$S(X, Y) = \frac{1}{N} \sum_{i=1}^{N} \left( \frac{X_i - X_{\mathrm{offset}}}{\Phi_X} \right) \left( \frac{Y_i - Y_{\mathrm{offset}}}{\Phi_Y} \right), \quad \text{where} \quad \Phi_G = \left[ \sum_{i=1}^{N} \frac{(G_i - G_{\mathrm{offset}})^2}{N} \right]^{1/2}$$
Gene Similarity Metric
• Since there is an assumed reference state (the gene’s expression level did not change), $G_{\mathrm{offset}}$ is set to 0 for all genes, so
$$S(X, Y) = \frac{1}{N} \sum_{i=1}^{N} \frac{X_i}{\left[ \sum_{i=1}^{N} X_i^2 / N \right]^{1/2}} \cdot \frac{Y_i}{\left[ \sum_{i=1}^{N} Y_i^2 / N \right]^{1/2}} = \frac{1}{N} \frac{\sum_{i=1}^{N} X_i Y_i}{\mathrm{SD}(X)\,\mathrm{SD}(Y)} = \frac{1}{N} \frac{X \cdot Y}{\mathrm{SD}(X)\,\mathrm{SD}(Y)}$$
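A direct transcription of the metric into code (a sketch, with the offsets fixed at zero as on the slide); note that SD here is the root-mean-square about the offset, not the usual mean-centered standard deviation:

```python
import numpy as np

def eisen_similarity(x, y):
    """S(X, Y) = (1/N) * (X . Y) / (SD(X) * SD(Y)), offsets = 0.

    An uncentered correlation: it rewards profiles that move in the
    same direction relative to the (unchanged) reference state.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    sd_x = np.sqrt(np.sum(x**2) / n)  # root-mean-square, offset = 0
    sd_y = np.sqrt(np.sum(y**2) / n)
    return (x @ y) / (n * sd_x * sd_y)
```

This function can be passed as the `similarity` argument to the clustering sketch above.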
Results
• Redundant representations of genes cluster together.
– But individual genes can be distinguished from related genes by subtle differences in expression.
• Genes of similar function cluster together.
– E.g., 126 genes were found strongly down-regulated in response to stress.
• 112 of these genes encode ribosomal proteins and other proteins related to translation.
• This agrees with the previously known result that yeast responds to favorable growth conditions by increasing the production of ribosomes.
Comments on Gene Clustering
• A descriptive approach to analyzing data.
• This approach can potentially be used to:
– Gain insight into a gene’s function.
– Identify classes of genes.
Large Margin Classifier
Support Vector Machine
• Classification of microarray expression data
• Brown, Grundy, Lin, Cristianini, Sugnet, Ares & Haussler ’99
• Analysis of (Stanford) S. cerevisiae data from Pat Brown’s lab:
– Instead of clustering genes to see what groupings emerge,
– devise models to match genes to predefined classes.
The Classes
• From the MIPS yeast genome database (MYGD):
– Tricarboxylic acid pathway (Krebs cycle)
– Respiration chain complexes
– Cytoplasmic ribosomal proteins
– Proteasome
– Histones
– Helix-turn-helix (control)
• Classes come from biochemical/genetic studies of genes.
Gene Classification
• Learning task:
– Given: expression profiles of genes and their class labels
– Do: learn models distinguishing genes of each class from genes in other classes
• Classification task:
– Given: the expression profile of a gene whose class is not known
– Do: predict the class to which this gene belongs
The Approach
• Brown et al. apply a variety of algorithms to this task:
– Support vector machines (SVMs) [Vapnik ’95]
– Decision trees
– Parzen windows
– Fisher linear discriminant
Support Vector Machines
• Consider the genes in our example as m points in an n-dimensional space (m genes, n experiments).
[Figure: genes plotted as points in the space of experiments; one axis per experiment.]
Support Vector Machines
• Learning in SVMs involves finding a hyperplane (decision surface) that separates the examples of one class from another.
[Figure: a hyperplane separating two classes of points.]
Support Vector Machines
• For the $i$-th example, let $x_i$ be the vector of expression measurements, and let $y_i$ be +1 if the example is in the class of interest, and −1 otherwise.
• The hyperplane is given by $w \cdot x + b = 0$, where $b$ is a constant and $w$ is a vector of weights.
Support Vector Machines
• The function used to classify examples is then $y_P = \mathrm{sgn}(w \cdot x + b)$, where $y_P$ is the predicted value of $y$.
Support Vector Machines
• There may be many such hyperplanes…
• Which one should we choose?
Maximizing the Margin
• Key SVM idea:
– Pick the hyperplane that maximizes the margin, i.e., the distance to the hyperplane from the closest point.
– Motivation: obtain the tightest possible bounds on the error rate of the classifier.
SVM: Finding the Hyperplane
• Can be formulated as an optimization task:
– Minimize $\sum_{j=1}^{n} w_j^2$
– Subject to $\forall i:\; y_i\,[w \cdot x_i + b] \ge 1$
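As an illustration (not part of the lecture), this quadratic program can be handed to an off-the-shelf convex solver; a minimal sketch using the cvxpy library, assuming separable data, with X the (m, n) example matrix and y the ±1 labels:

```python
import cvxpy as cp
import numpy as np

def fit_hard_margin_svm(X, y):
    """Minimize |w|^2 subject to y_i (w . x_i + b) >= 1 for all i."""
    m, n = X.shape
    w = cp.Variable(n)
    b = cp.Variable()
    constraints = [cp.multiply(y, X @ w + b) >= 1]
    cp.Problem(cp.Minimize(cp.sum_squares(w)), constraints).solve()
    return w.value, b.value

# Tiny separable example in the plane.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b = fit_hard_margin_svm(X, y)
print(w, b)
```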
Learning Algorithm for Separable Problems
• Vapnik & Lerner ’63; Vapnik & Chervonenkis ’64
• Class of hyperplanes: $w \cdot x + b = 0$, $w \in \mathbb{R}^n$, $b \in \mathbb{R}$
• Decision function: $f(x) = \mathrm{sgn}(w \cdot x + b)$
• Construct f from empirical data (“Generalized Portrait”):
– Among all hyperplanes separating the data, there exists a unique one yielding the maximum margin of separation between the classes:
$$\max_{w,b}\; \min \{\, \|x - x_i\| : x \in \mathbb{R}^n,\; w \cdot x + b = 0,\; i = 1, \ldots, n \,\}$$
Maximum Margin
• The margin hyperplanes $\{x \mid w \cdot x + b = +1\}$ and $\{x \mid w \cdot x + b = -1\}$ lie on either side of $\{x \mid w \cdot x + b = 0\}$.
• For $x_1$ on the first and $x_2$ on the second:
$$w \cdot x_1 + b = +1, \qquad w \cdot x_2 + b = -1$$
$$\Rightarrow\; w \cdot (x_1 - x_2) = 2 \;\Rightarrow\; (x_1 - x_2) \cdot \frac{w}{\|w\|_2} = \frac{2}{\|w\|_2}$$
Construction of Optimal Hyperplane
• Margin $= (x_1 - x_2) \cdot \frac{w}{\|w\|_2} = \frac{2}{\|w\|_2}$
• Optimization problem: maximize the margin ($= 2/\|w\|_2$) with a hyperplane separating the classes:
– Minimize $\|w\|^2 = \sum_j w_j^2$
– Subject to $y_i (w \cdot x_i + b) \ge 1, \;\; \forall i \in \{1, \ldots, m\}$.
Optimization Problem
• Using Lagrange multipliers $\alpha_i \ge 0$, $i \in \{1, \ldots, m\}$:
$$L(w, b, \alpha) = \frac{1}{2} w^T w - \sum_{i=1}^{m} \alpha_i \,[\, y_i (x_i \cdot w + b) - 1 \,]$$
– Minimize the Lagrangian L with respect to the primal variables w and b.
– Maximize the Lagrangian L with respect to the dual variables $\alpha_i$.
• Saddle point…
Intuition
• If a constraint is violated, then
– $y_i (w \cdot x_i + b) - 1 < 0$
– L can be increased by increasing the corresponding $\alpha_i$
– w and b have to change such that L decreases
• To prevent $\alpha_i [\, y_i (w \cdot x_i + b) - 1 \,]$ from becoming arbitrarily large, the change in w and b will ensure that eventually the constraint is satisfied.
– Assuming that the problem is separable.
• Karush-Kuhn-Tucker (KKT) complementarity condition:
– For all constraints that are not satisfied precisely as equalities, i.e., $y_i (w \cdot x_i + b) - 1 > 0$, the corresponding $\alpha_i = 0$.
Duality
• At the saddle point, the derivatives with respect to the primal variables must vanish:
$$\frac{\partial}{\partial b} L(w, b, \alpha) = 0 \;\Rightarrow\; \sum_{i=1}^{m} \alpha_i y_i = 0$$
$$\frac{\partial}{\partial w} L(w, b, \alpha) = 0 \;\Rightarrow\; w - \sum_{i=1}^{m} \alpha_i y_i x_i = 0 \;\Rightarrow\; w = \sum_{i=1}^{m} \alpha_i y_i x_i$$
• By the Karush-Kuhn-Tucker complementarity:
$$\alpha_i \,[\, y_i (x_i \cdot w + b) - 1 \,] = 0, \quad \forall i \in \{1, \ldots, m\}$$
• Those patterns whose $\alpha_i \ne 0$ are the “support vectors”.
Lagrangian
• Substituting $w = \sum_{i=1}^{m} \alpha_i y_i x_i$ (and using $\sum_{i=1}^{m} \alpha_i y_i = 0$):
$$L(w, b, \alpha) = \frac{1}{2} w^T w - \sum_{i=1}^{m} \alpha_i [\, y_i (x_i \cdot w + b) - 1 \,] = \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{m} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j)$$
• Maximize the dual:
$$W(\alpha) = \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{m} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j)$$
Wolfe Dual Optimization Problem
• Maximize
$$W(\alpha) = \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{m} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j)$$
• Subject to $\alpha_i \ge 0$, $i = 1, \ldots, m$, and $\sum_{i=1}^{m} \alpha_i y_i = 0$.
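A sketch of solving this dual numerically (our illustration, again with cvxpy); the quadratic form is routed through a Cholesky factor of the Gram matrix so the solver recognizes the objective as concave:

```python
import cvxpy as cp
import numpy as np

def solve_wolfe_dual(X, y):
    """Maximize sum(a) - (1/2) sum_ij a_i a_j y_i y_j (x_i . x_j)
    subject to a_i >= 0 and sum_i a_i y_i = 0."""
    m = X.shape[0]
    K = X @ X.T                                   # Gram matrix of dot products
    L = np.linalg.cholesky(K + 1e-8 * np.eye(m))  # K ~ L @ L.T (regularized)
    a = cp.Variable(m)
    ay = cp.multiply(a, y)
    objective = cp.Maximize(cp.sum(a) - 0.5 * cp.sum_squares(L.T @ ay))
    cp.Problem(objective, [a >= 0, a @ y == 0]).solve()
    return a.value
```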
Decision Function
• The hyperplane decision function can be written as
$$f(x) = \mathrm{sgn}\left( \sum_{i=1}^{m} y_i \alpha_i (x \cdot x_i) + b \right)$$
• where b is the solution to $\alpha_i [\, y_i (x_i \cdot w + b) - 1 \,] = 0$ (taken at any support vector i).
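In code (a sketch; the multipliers α are assumed to come from a dual solver such as the one above):

```python
import numpy as np

def recover_bias(alphas, y, X, tol=1e-6):
    """KKT complementarity: any support vector k (alpha_k > 0)
    satisfies y_k (w . x_k + b) = 1, hence b = y_k - w . x_k."""
    w = (alphas * y) @ X              # w = sum_i alpha_i y_i x_i
    k = int(np.argmax(alphas > tol))  # index of one support vector
    return y[k] - w @ X[k]

def decision(x, alphas, y, X, b):
    """f(x) = sgn( sum_i y_i alpha_i (x . x_i) + b ).
    Only support vectors (alpha_i > 0) contribute to the sum."""
    return np.sign(np.sum(y * alphas * (X @ x)) + b)
```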
Dealing with Data Not Separable by a Hyperplane
• Map the data into some other dot product space (called the “feature space”) F via a nonlinear map $\Phi : \mathbb{R}^N \to F$.
• Kernel function: $k(x, y) = \Phi(x) \cdot \Phi(y)$
• Examples:
– Sigmoid kernel: $k(x, y) = \tanh(\kappa (x \cdot y) + \Theta)$, where $\kappa$ = gain and $\Theta$ = threshold
– Radial basis kernel: $k(x, y) = \exp\{ -\|x - y\|^2 / 2\sigma^2 \}$
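The two example kernels written out (a sketch; the parameter names are ours):

```python
import numpy as np

def sigmoid_kernel(x, y, gain=1.0, threshold=0.0):
    """k(x, y) = tanh(gain * (x . y) + threshold)."""
    return np.tanh(gain * np.dot(x, y) + threshold)

def rbf_kernel(x, y, sigma=1.0):
    """k(x, y) = exp(-|x - y|^2 / (2 * sigma^2))."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return np.exp(-np.dot(diff, diff) / (2.0 * sigma**2))
```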
Dealing with Data Not Separable by a Hyperplane
• Find the maximum margin hyperplane in the feature space:
$$y_P = \mathrm{sgn}[\, w \cdot \Phi(x) + b \,] = \mathrm{sgn}\left[ \sum_{i=1}^{m} \alpha_i y_i \, (\Phi(x) \cdot \Phi(x_i)) + b \right] = \mathrm{sgn}\left[ \sum_{i=1}^{m} \alpha_i y_i \, k(x, x_i) + b \right]$$
• Optimization problem:
– Maximize: $W(\alpha) = \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{m} \alpha_i \alpha_j y_i y_j \, k(x_i, x_j)$
– Subject to: $\alpha_i \ge 0$, $i = 1, \ldots, m$, and $\sum_{i=1}^{m} \alpha_i y_i = 0$
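The only change from the linear decision function sketched earlier is that every dot product is routed through k; for example:

```python
import numpy as np

def kernel_decision(x, alphas, y, X, b, kernel):
    """y_P = sgn( sum_i alpha_i y_i k(x, x_i) + b )."""
    k_vals = np.array([kernel(x, x_i) for x_i in X])
    return np.sign(np.sum(alphas * y * k_vals) + b)
```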
Dealing with the Noise in Data
• One can relax the requirement that the hyperplane strictly separates the data.
• A soft margin allows some misclassified training examples:
– Introduce m slack variables $\xi_i \ge 0$, $\forall i \in \{1, \ldots, m\}$
– Minimize the objective function: $\tau(w, \xi) = \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{m} \xi_i$ (with C > 0)
• Dual optimization problem:
– Maximize: $W(\alpha) = \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{m} \alpha_i \alpha_j y_i y_j \, k(x_i, x_j)$
– Subject to: $0 \le \alpha_i \le C$, $i = 1, \ldots, m$, and $\sum_{i=1}^{m} \alpha_i y_i = 0$
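This soft-margin formulation is what standard SVM libraries solve; a minimal sketch with scikit-learn (not the software used in the original study), where the C argument is exactly the slack penalty above:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                      # 100 noisy examples
y = np.sign(X[:, 0] + 0.3 * rng.normal(size=100))  # labels in {-1, +1}

# Larger C tolerates fewer margin violations (harder margin);
# smaller C spends slack more freely (softer margin).
clf = SVC(C=1.0, kernel="rbf", gamma=0.5).fit(X, y)
print(len(clf.support_), "support vectors")
```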
SVM & Neural Networks
• SVM:
– Represents a linear or nonlinear separating surface
– Weights determined by an optimization method (optimizing margins)
• Neural network:
– Represents a linear or nonlinear separating surface
– Weights determined by an optimization method (optimizing the sum of squared errors, or a related objective function)
Experiments
• 3-fold cross validation
• Create a separate model for each class
• SVM with various kernel functions:
– Dot product raised to power d = 1, 2, 3: $k(x, y) = (x \cdot y)^d$
– Gaussian
• Various other classification methods:
– Decision trees
– Parzen windows
– Fisher linear discriminant
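A sketch of this protocol on stand-in data (the real inputs would be the 2467 × 79 expression matrix with MYGD class labels; the synthetic labels here are only for illustration):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 79))                 # stand-in: genes x measurements
y = np.where(X[:, :5].sum(axis=1) > 0, 1, -1)  # stand-in class labels

for d in (1, 2, 3):                            # k(x, y) = (x . y)^d
    clf = SVC(kernel="poly", degree=d, gamma=1.0, coef0=0.0)
    scores = cross_val_score(clf, X, y, cv=3)  # 3-fold cross validation
    print(f"degree {d}: mean accuracy {scores.mean():.3f}")

print("rbf :", cross_val_score(SVC(kernel="rbf"), X, y, cv=3).mean())
```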
SVM Results

Class             FP  FN   TP    TN
Krebs cycle        8   8    9  2442
Respiration        9   6   24  2428
Ribosome           9   4  117  2337
Proteasome         3   7   28  2429
Histone            0   2    9  2456
Helix-turn-helix   1  16    0  2450

(FP = false positives, FN = false negatives, TP = true positives, TN = true negatives)
SVM Results
• SVM had the highest accuracy for all classes (except the control).
• Many of the false positives could be easily explained in terms of the underlying biology:
– E.g., YAL003W was repeatedly assigned to the ribosome class:
• Not a ribosomal protein
• But known to be required for the proper functioning of the ribosome