Gene Expression Deconvolution with Single-cell Data

James Lindsay¹, Caroline Jakuba², Ion Mandoiu¹, Craig Nelson²
University of Connecticut
¹Department of Computer Science and Engineering; ²Department of Molecular and Cell Biology

Mouse Embryo
[Figure: schematic of the mouse embryo from anterior (head) to posterior (tail), labeling the somites, node, primitive streak, and an unknown mesoderm progenitor population.]
• What is the expression profile of the progenitor cell type?
NSB = node-streak border; PSM = presomitic mesoderm; S = somite; NT = neural tube/neurectoderm; EN = endoderm

Characterizing Cell-types
• Goal: whole-transcriptome expression profiles of individual cell types
• Measuring whole-transcriptome expression in single cells is technically challenging
• Approach: computational deconvolution of cell mixtures
• Assisted by single-cell qPCR expression data for a small number of genes

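Modeling Cell Mixtures
Mixtures (X) are modeled as the product of a signature matrix (S) and a concentration matrix (C), so each mixture is a linear combination of the cell-type signatures weighted by their concentrations:
$$X_{m \times n} = S_{m \times k} \cdot C_{k \times n}$$
where m is the number of genes, k the number of cell types, and n the number of mixtures.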
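A minimal NumPy sketch of this model, for illustration only: the helper `mix` and the random signature values are hypothetical placeholders, not data from this study; only the sizes m = 31, k = 3 and n = 12 are taken from the poster's Experimental Design.

```python
import numpy as np


def mix(S: np.ndarray, C: np.ndarray) -> np.ndarray:
    """Hypothetical helper: return X = S @ C after checking the shared dimension.

    S: m genes x k cell types; C: k cell types x n mixtures.
    """
    if S.shape[1] != C.shape[0]:
        raise ValueError(f"S has {S.shape[1]} cell types but C has {C.shape[0]}")
    return S @ C


m, k, n = 31, 3, 12
rng = np.random.default_rng(0)
S = rng.uniform(0.0, 10.0, size=(m, k))   # placeholder signatures, one column per cell type
C = rng.uniform(0.0, 1.0, size=(k, n))
C /= C.sum(axis=0, keepdims=True)          # the proportions of each mixture sum to 1
X = mix(S, C)

# Column i of X is the concentration-weighted sum of the cell-type signatures.
assert np.allclose(X[:, 0], sum(C[l, 0] * S[:, l] for l in range(k)))
```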

Previous Work
1. Coupled deconvolution (given: X; infer: S, C)
• NMF (Repsilber, BMC Bioinformatics, 2010)
• Minimum polytope (Schwartz, BMC Bioinformatics, 2010)
2. Estimation of mixing proportions (given: X, S; infer: C)
• Quadratic programming (Gong, PLoS One, 2012)
• LDA (Qiao, PLoS Comp Bio, 2012)
3. Estimation of expression signatures (given: X, C; infer: S)
• csSAM (Shen-Orr, Nature Brief Com, 2010)

Single-cell Assisted Deconvolution
Given: X and single-cell qPCR data
Infer: S, C
Approach:
1. Identify cell types and estimate a reduced signature $\tilde{S}$ (restricted to the genes profiled in single cells)
• Outlier removal
• K-means clustering followed by averaging
2. Estimate mixing proportions C using $\tilde{S}$
• Quadratic programming, 1 mixture at a time
3. Estimate the full expression signature matrix S using C
• Quadratic programming, 1 gene at a time

Step 1: Outlier Removal + Clustering
Remove cells whose maximum Pearson correlation to any other cell is below 0.95.

[Figure: single-cell profiles, unfiltered vs. filtered.]
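Step 1 could be sketched as follows, assuming cells are stored as columns of an m × n matrix. The 0.95 correlation cutoff, K-means, and cluster averaging come from the poster; the farthest-point initialisation, the plain Lloyd iterations, and all function names are assumptions of this sketch, chosen so that it needs only NumPy.

```python
import numpy as np


def remove_outliers(cells: np.ndarray, threshold: float = 0.95):
    """Hypothetical helper: drop cells whose best Pearson correlation with any
    other cell is below `threshold`. cells: m genes x n cells."""
    corr = np.corrcoef(cells, rowvar=False)   # n x n cell-cell correlations
    np.fill_diagonal(corr, -np.inf)           # a cell's correlation with itself does not count
    keep = corr.max(axis=1) >= threshold
    return cells[:, keep], keep


def kmeans(points: np.ndarray, k: int, n_iter: int = 100) -> np.ndarray:
    """Lloyd's k-means on the rows of `points`, with farthest-point initialisation
    (an assumed choice; the poster does not specify the initialisation)."""
    centers = [points[0]]
    for _ in range(1, k):
        # next center: the point farthest from every center chosen so far
        d = np.min([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(points[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def reduced_signature(cells: np.ndarray, k: int = 3, threshold: float = 0.95):
    """Hypothetical helper: filter outliers, cluster the remaining cells, and
    average each cluster into one column of the reduced signature (m x k)."""
    filtered, keep = remove_outliers(cells, threshold)
    labels = kmeans(filtered.T, k)
    S_red = np.column_stack([filtered[:, labels == j].mean(axis=1) for j in range(k)])
    return S_red, labels, keep
```

The returned `labels` index the filtered cells, so they can be reused when drawing cells per cluster for simulated mixtures.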

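Step 2: Estimate Mixture Proportions
For a given mixture i, with $\tilde{S}$ the reduced signature from Step 1, solve:
$$\min_{c} \lVert \tilde{S} c - x \rVert_2^2 \quad \text{s.t.} \quad \sum_{l=1}^{k} c_l = 1, \qquad c_l \ge 0 \;\; \forall\, l = 1, \dots, k$$
where $x_j = X_{j,i}$ for all $j = 1, \dots, m$ and $c_l = C_{l,i}$ for all $l = 1, \dots, k$.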
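A dependency-free sketch of Step 2. The poster uses quadratic programming; this sketch swaps in a different solver, projected gradient descent with an exact Euclidean projection onto the probability simplex, which converges to a minimiser of the same convex problem. Function names, the iteration cap, and the tolerance are assumptions.

```python
import numpy as np


def project_to_simplex(v: np.ndarray) -> np.ndarray:
    """Euclidean projection of v onto {c : sum(c) = 1, c >= 0} (sort-based algorithm)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    ks = np.arange(1, len(v) + 1)
    rho = np.nonzero(u * ks > css - 1)[0][-1]   # last index where the shifted value stays positive
    theta = (css[rho] - 1) / (rho + 1)
    return np.maximum(v - theta, 0.0)


def estimate_proportions(S_red: np.ndarray, x: np.ndarray,
                         max_iter: int = 10_000, tol: float = 1e-12) -> np.ndarray:
    """Hypothetical helper: min_c ||S_red c - x||^2 s.t. sum(c) = 1, c >= 0,
    solved by projected gradient descent (not a QP solver)."""
    k = S_red.shape[1]
    c = np.full(k, 1.0 / k)                            # start at the centre of the simplex
    step = 1.0 / (2 * np.linalg.norm(S_red, 2) ** 2)   # 1 / Lipschitz constant of the gradient
    for _ in range(max_iter):
        grad = 2 * S_red.T @ (S_red @ c - x)
        c_new = project_to_simplex(c - step * grad)
        if np.linalg.norm(c_new - c) < tol:
            return c_new
        c = c_new
    return c


def estimate_C(S_red: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Solve one problem per mixture (column of X) and stack the results into C (k x n)."""
    return np.column_stack([estimate_proportions(S_red, X[:, i]) for i in range(X.shape[1])])
```

Solving one mixture at a time mirrors the poster's description; with a QP library available, the same objective and constraints could be passed to it directly.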

Step 3: Estimating Full Expression Signatures
For each new gene:
• x: observed signals of the new gene across the mixtures (1 × n)
• C: known from Step 2 (k × n)
• s: signature of the new gene across the cell types, to be estimated (1 × k)
Now solve:
$$\min_{s} \lVert sC - x \rVert_2^2$$
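Because each gene's problem is independent and, as written, unconstrained, all new genes can be solved in one batched least-squares call, as sketched below with NumPy (the function name is hypothetical). The Approach lists quadratic programming for this step, so the authors may impose constraints such as s ≥ 0 that the objective above does not show; this sketch does not. A unique solution requires C to have full row rank (in particular n ≥ k).

```python
import numpy as np


def estimate_signatures(C: np.ndarray, X_new: np.ndarray) -> np.ndarray:
    """Hypothetical helper: estimate the k-cell-type signature of each new gene.

    C: k x n proportions from Step 2; X_new: g x n signals of g new genes.
    Returns g x k, where each row s minimises ||s C - x||^2 (no constraints).
    """
    # s C = x  <=>  C^T s^T = x^T, so every gene is one right-hand side of a single lstsq call
    sol, *_ = np.linalg.lstsq(C.T, X_new.T, rcond=None)
    return sol.T
```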

Experimental Design

Single Cell Profiles
• 92 profiles
• 31 genes

Actual Mixtures
• 12 mixtures
• 31 genes

Dimensions
• k = 3
• m = 31
• n = 92 (single-cell profiles), 12 (actual mixtures)
• # mixtures (simulated) = {10…300}

Simulated Concentrations
• Sample each entry uniformly at random from [0, 1]
• Scale each column to sum to 1
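A one-function sketch of the simulated concentrations (the function name is hypothetical). NumPy's `Generator.uniform` samples the half-open interval [0, 1), which differs from [0, 1] only on a set of probability zero.

```python
import numpy as np


def simulate_concentrations(k: int, n: int, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical helper: k x n concentration matrix for n simulated mixtures."""
    C = rng.uniform(0.0, 1.0, size=(k, n))    # each entry uniform on [0, 1)
    return C / C.sum(axis=0, keepdims=True)   # scale every column to sum to 1


rng = np.random.default_rng(0)
C = simulate_concentrations(k=3, n=300, rng=rng)
```

Normalising i.i.d. uniforms does not sample the simplex uniformly; if that property is wanted, `rng.dirichlet(np.ones(k), size=n).T` would provide it.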

Simulated Mixtures
• Choose single cells randomly, with replacement, from each cluster
• Sum them to generate a mixture
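The poster does not say how many cells are drawn from each cluster. This sketch assumes the count for cluster l is its concentration c_l times a hypothetical total `n_cells`, rounded to the nearest integer; the function name is also hypothetical.

```python
import numpy as np


def simulate_mixture(cells: np.ndarray, labels: np.ndarray, c: np.ndarray,
                     n_cells: int, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical helper: sum randomly chosen single cells into one mixture.

    cells: m genes x n single cells; labels: cluster of each cell (0..k-1);
    c: length-k concentrations; n_cells: assumed total number of cells per mixture.
    """
    # assumed per-cluster cell counts; rounding means the total may differ slightly from n_cells
    counts = np.rint(c * n_cells).astype(int)
    mixture = np.zeros(cells.shape[0])
    for l, count in enumerate(counts):
        members = np.flatnonzero(labels == l)
        picks = rng.choice(members, size=count, replace=True)   # sample with replacement
        mixture += cells[:, picks].sum(axis=1)
    return mixture
```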

Data Processing (RT-qPCR)
• CT value: the cycle at which a gene was detected
• Relative normalization to housekeeping genes
  • Housekeeping genes: gapdh, bactin1
  • Combined by geometric mean (Vandesompele, 2002)
• dCT(x) = geometric mean − CT(x)
• expression(x) = 2^dCT(x)
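A literal sketch of the normalization as stated above, for a single sample (the function name is hypothetical): the reference is the geometric mean of the housekeeping CT values.

```python
import numpy as np


def relative_expression(ct_gene: np.ndarray, ct_housekeeping: np.ndarray) -> np.ndarray:
    """Hypothetical helper: expression = 2 ** (geomean of housekeeping CTs - CT of the gene).

    ct_gene: CT values of target genes in one sample;
    ct_housekeeping: CT values of the housekeeping genes (e.g. gapdh, bactin1) in that sample.
    """
    reference = np.exp(np.mean(np.log(ct_housekeeping)))   # geometric mean of the CT values
    d_ct = reference - np.asarray(ct_gene, dtype=float)
    return 2.0 ** d_ct
```

If the geometric mean is instead meant over housekeeping expression levels 2^(−CT), as in Vandesompele et al., note that geomean(2^(−CT_i)) = 2^(−mean(CT_i)), so the reference would become the arithmetic mean of the CT values.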

Accuracy of Inferred Mixing Proportions

[Figure: Concentration Matrix, concordance of predicted mixing proportions.]

Leave-one-out Accuracy of Inferred Gene Expression Signatures

Future Work
• Apply the gene signature estimation technique using more genes in mixed samples
• Identify the PSM-Pr signature
• Confirm the anatomical location of the putative PSM-Pr cell population through exhaustive ISH

Conclusion

Special Thanks to:
• Ion Mandoiu
• Craig Nelson
• Caroline Jakuba
• Mathew Gajdosik