Transcript Final

Group Populus:
Petra van Berkel
Casper Gerritsen
Astri Herlino
Brian Lavrijssen
Dataset of S. cerevisiae
 Data generated by Nookaew et al (2012)
 Two conditions:
 Glucose excess (Batch) & Glucose limited (Chemostat)

3 biological replicates per condition
 RNA-seq data:
 12 files

3 sets of paired-end reads per condition
 Pipeline for differential gene expression analysis
TopHat – Cufflinks analysis
 Protocols based on Trapnell et al (2012)
 75% of reads mapped to the reference genome
 Plots based on Cuffdiff gene expression output
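The TopHat – Cufflinks steps can be sketched as a dry run that only prints the command lines in protocol order (TopHat, Cufflinks, Cuffmerge, Cuffdiff). This is a minimal sketch, not the script used for this analysis: the sample names, the Bowtie index name `genome`, and the files `genes.gtf`, `genome.fa`, `assemblies.txt` and `<sample>_1.fq`/`<sample>_2.fq` are hypothetical. Check the flags against the installed tool versions before running anything.

```python
# Dry-run sketch: build (but do not run) the TopHat -> Cufflinks ->
# Cuffmerge -> Cuffdiff command lines. All file and sample names are hypothetical.
import shlex

CONDITIONS = {
    "batch": ["batch_R1", "batch_R2", "batch_R3"],
    "chemostat": ["chemo_R1", "chemo_R2", "chemo_R3"],
}


def build_commands(threads=8):
    cmds = []
    samples = [s for reps in CONDITIONS.values() for s in reps]
    # 1. Map each paired-end sample with TopHat, guided by the annotation.
    for s in samples:
        cmds.append(["tophat", "-p", str(threads), "-G", "genes.gtf",
                     "-o", f"{s}_thout", "genome", f"{s}_1.fq", f"{s}_2.fq"])
    # 2. Assemble transcripts per sample with Cufflinks.
    for s in samples:
        cmds.append(["cufflinks", "-p", str(threads), "-o", f"{s}_clout",
                     f"{s}_thout/accepted_hits.bam"])
    # 3. Merge assemblies; assemblies.txt lists each <sample>_clout/transcripts.gtf.
    cmds.append(["cuffmerge", "-g", "genes.gtf", "-s", "genome.fa",
                 "-p", str(threads), "assemblies.txt"])
    # 4. Differential expression: one comma-separated BAM list per condition.
    bam_groups = [",".join(f"{s}_thout/accepted_hits.bam" for s in reps)
                  for reps in CONDITIONS.values()]
    cmds.append(["cuffdiff", "-o", "diff_out", "-b", "genome.fa",
                 "-p", str(threads), "-L", ",".join(CONDITIONS),
                 "-u", "merged_asm/merged.gtf", *bam_groups])
    return cmds


if __name__ == "__main__":
    for cmd in build_commands():
        print(shlex.join(cmd))
```

Printing instead of executing keeps the sketch safe to run anywhere. The printed lines can be reviewed and then pasted into a shell.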
Cuffdiff output
• 5800 genes with FPKM values
• Q-value threshold based on Nookaew et al (2012)
Data Summary
Filter                                  Q-value < 0.05   Q-value < 1e-5
Significant differentially expressed    2560             1293
FPKM value_1 > 0 and value_2 > 0        2554             1292
log2(fold change) > 1                   735              516
log2(fold change) < -1                  510              410
log2(fold change) > 3                   177              151
log2(fold change) < -2                  44               33
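The counts in this table can be reproduced in outline from Cuffdiff's gene_exp.diff file. This is a minimal sketch. It assumes the standard tab-separated columns value_1, value_2, log2(fold_change) and q_value. It also assumes the fold-change filters were applied only to genes with FPKM > 0 in both conditions. Both assumptions should be checked against the actual output.

```python
# Sketch: count Cuffdiff genes passing the Data Summary filters.
# Assumes the tab-separated gene_exp.diff columns value_1, value_2,
# log2(fold_change) and q_value.
import csv


def load_rows(path):
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def summarize(rows, q_max):
    significant = [r for r in rows if float(r["q_value"]) < q_max]
    # Assumption: fold-change filters use genes with FPKM > 0 in both
    # conditions, so log2(fold change) is finite.
    expressed = [r for r in significant
                 if float(r["value_1"]) > 0 and float(r["value_2"]) > 0]
    lfc = [float(r["log2(fold_change)"]) for r in expressed]
    return {
        "significant": len(significant),
        "fpkm_both_gt0": len(expressed),
        "lfc_gt_1": sum(v > 1 for v in lfc),
        "lfc_lt_-1": sum(v < -1 for v in lfc),
        "lfc_gt_3": sum(v > 3 for v in lfc),
        "lfc_lt_-2": sum(v < -2 for v in lfc),
    }
```

Usage, with a hypothetical path: `rows = load_rows("diff_out/gene_exp.diff")`, then `summarize(rows, 0.05)` and `summarize(rows, 1e-5)` for the two table columns.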
Validation of TopHat - Cufflinks
 Validation of selection
 Using Excel
 Literature study
 Boer et al (2003)




Influence of C, N, P and S limitation
Microarray analysis
> 68 out of 151 significantly upregulated genes (log2(fold change) > 3, Q-value < 1e-5)
> 9 out of 33 significantly downregulated genes (log2(fold change) < -2, Q-value < 1e-5)
 Largely the same genes are found in other papers
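The comparison with Boer et al (2003) was done in Excel, and it amounts to a set intersection of gene identifiers. Below is a minimal Python equivalent. It assumes both lists use the same identifier type (for example systematic ORF names). The gene IDs in the example are placeholders, not results.

```python
# Sketch: overlap between our gene list and a gene list from the literature.
def overlap(ours, reported):
    """Return (shared count, size of our list, sorted shared IDs)."""
    ours_set = {g.strip().upper() for g in ours}
    reported_set = {g.strip().upper() for g in reported}
    shared = ours_set & reported_set
    return len(shared), len(ours_set), sorted(shared)


if __name__ == "__main__":
    # Placeholder IDs only; replace with the real gene lists.
    n, total, genes = overlap(["GENE_A", "GENE_B", "GENE_C"],
                              ["gene_a", "GENE_C", "GENE_D"])
    print(f"{n} out of {total} genes also reported: {genes}")
```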
Expression network up




Up regulated genes
mrnet method in R
Number of Nodes = 57
Number of Edges = 1560
Expression network down




Down regulated genes
mrnet method in R
Number of Nodes = 33
Number of Edges = 513
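In R, `mrnet` comes from the minet package. It builds a network from a mutual information (MI) matrix using the minimum-redundancy maximum-relevance (MRMR) criterion. For each gene, candidate partners are chosen greedily: the MI with the target is rewarded and the average MI with partners already chosen is penalised. The weight of an edge is the larger of the two directional scores. The Python below is a simplified re-implementation for illustration only. It assumes a precomputed MI matrix and may differ in detail from minet, for example in its stopping rule.

```python
# Simplified MRNET sketch (illustration only, not the minet implementation).
# Input: symmetric mutual-information matrix `mi` as a list of lists.


def mrmr_scores(mi, target):
    """Greedy MRMR selection of partners for `target`; returns a score per gene."""
    n = len(mi)
    candidates = [j for j in range(n) if j != target]
    selected = []
    scores = [0.0] * n

    def score(j):
        relevance = mi[j][target]
        if not selected:
            return relevance
        redundancy = sum(mi[j][s] for s in selected) / len(selected)
        return relevance - redundancy

    while candidates:
        best = max(candidates, key=score)
        best_score = score(best)
        if best_score <= 0:  # stop once redundancy outweighs relevance
            break
        scores[best] = best_score
        selected.append(best)
        candidates.remove(best)
    return scores


def mrnet(mi):
    """Edge weight = larger of the two directional MRMR scores."""
    n = len(mi)
    per_target = [mrmr_scores(mi, i) for i in range(n)]
    return [[max(per_target[i][j], per_target[j][i]) if i != j else 0.0
             for j in range(n)] for i in range(n)]


def edges(adj):
    """Undirected edges with a positive weight, as (i, j, weight)."""
    n = len(adj)
    return [(i, j, adj[i][j]) for i in range(n) for j in range(i + 1, n)
            if adj[i][j] > 0]
```

With this definition, the node and edge counts on the slides correspond to the number of genes and to `len(edges(mrnet(mi)))`.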
GO Terms and GO Enrichment
R version 2.15.0 (2012-03-30)
 Packages:
 biomaRt: Ensembl gene 69, S. cerevisiae EF3
 org.Sc.sgd.db
 GOstats
 Rgraphviz
 GO enrichment:
 8419 genes in the universe (org.Sc.sgdPMID2ORF)
 Threshold: p-value < 1e-4
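GOstats' `hyperGTest` scores each GO term with a hypergeometric test. Given the universe, the selected genes and the number of genes annotated to the term (Size), it computes the probability of seeing at least Count annotated genes in the selection. ExpCount is the count expected by chance. The Python below is a minimal sketch of that calculation for a single term. It ignores the optional GO-graph conditioning in GOstats and does not compute the odds ratio.

```python
# Sketch: one-sided hypergeometric test for over-representation of a GO term.
from math import comb


def go_term_test(universe, selected, size, count):
    """P(X >= count) and the expected count, where X is the number of
    term-annotated genes among `selected` genes drawn from `universe`,
    `size` of which carry the term."""
    total = comb(universe, selected)
    upper = min(size, selected)
    tail = sum(comb(size, k) * comb(universe - size, selected - k)
               for k in range(count, upper + 1))
    p_value = tail / total  # exact big-integer ratio, rounded to float
    expected = selected * size / universe
    return p_value, expected
```

Python integers are arbitrary precision, so the binomial coefficients stay exact even for a universe of several thousand genes.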
GO Terms
 Down regulated
32 genes: 29 annotated with 208 GO terms, 3 not annotated
Gene   GO ID                                                        Description
HXT3   GO:0006810, GO:0016020, GO:0016021, GO:0005215, GO:0055085   Low affinity glucose transporter
HXT4   GO:0006810, GO:0055085, GO:0022891, GO:0005215, GO:0022857   High-affinity glucose transporter
 Up regulated
133 genes: 113 annotated with 855 GO terms, 20 not annotated
Gene   GO ID                                                        Description
RGI2   -                                                            Protein of unknown function involved in energy metabolism under respiratory conditions
SPG4   -                                                            Protein required for survival at high temperature during stationary phase
JEN1   GO:0097079, GO:0015355, GO:0022857, GO:0016021, GO:0034219   Monocarboxylate/proton symporter of the plasma membrane
GO Enrichment
 Down regulated
 Biological process: no significantly enriched terms found
 Up regulated
GOBPID       Pvalue     OddsRatio   ExpCount   Count   Size   Term
GO:0055114   2.02E-10   4.98        7.66       29      415    oxidation-reduction process
GO:0072329   2.41E-10   33.95       0.46       9       23     monocarboxylic acid catabolic process
GO:0006091   1.70E-09   6.00        4.40       21      221    generation of precursor metabolites and energy
GO:0006099   3.75E-09   22.61       0.60       9       30     tricarboxylic acid cycle
GO:0009109   3.75E-09   22.61       0.60       9       30     coenzyme catabolic process
Biological process of up regulated genes
 Validation: Yeast genome database
 Problem:
 Not well annotated because the biomaRt version used was not updated to Ensembl gene 70, S. cerevisiae EF4
Top 100
 gffread: make the transcript FASTA file (transcripts.fa)
 Determine the 100 highest and 100 lowest expressed genes for each of the two conditions
 R: sort the Cuffdiff output by FPKM value (4 files)
 Remove the genes with FPKM = 0
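The R step (sort the Cuffdiff output by FPKM, drop FPKM = 0, keep the top and bottom 100 per condition) can be sketched in Python. This is a minimal sketch that assumes Cuffdiff's gene_exp.diff with columns gene_id, value_1 and value_2, one FPKM column per condition. The output file names are hypothetical.

```python
# Sketch: top and bottom N genes by FPKM per condition from gene_exp.diff.
import csv


def rank_genes(rows, value_col, n=100):
    """Return (top, bottom) gene IDs by FPKM in `value_col`; FPKM = 0 is dropped."""
    expressed = [r for r in rows if float(r[value_col]) > 0]
    expressed.sort(key=lambda r: float(r[value_col]), reverse=True)
    top = [r["gene_id"] for r in expressed[:n]]
    bottom = [r["gene_id"] for r in reversed(expressed)][:n]  # lowest first
    return top, bottom


def write_top_bottom(diff_path, n=100):
    """Write four ID files: top/bottom N for value_1 and value_2."""
    with open(diff_path, newline="") as handle:
        rows = list(csv.DictReader(handle, delimiter="\t"))
    for col in ("value_1", "value_2"):  # one FPKM column per condition
        top, bottom = rank_genes(rows, col, n)
        for label, ids in (("top", top), ("bottom", bottom)):
            # Hypothetical output names, e.g. top100_value_1.txt
            with open(f"{label}{n}_{col}.txt", "w") as out:
                out.write("\n".join(ids) + "\n")
```

Calling `write_top_bottom("diff_out/gene_exp.diff")` (hypothetical path) produces the four files.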
Top 100
 Top genes:
 G3P dehydrogenase
 F16P aldolase
 Ribosomal subunit proteins
 Bottom genes:
 Dubious transcripts
 Retrotransposons
 etc.
GC-content & transcript length
 Determine GC content and transcript length
 Import the top 100 gene files
 For each file, look up its genes in transcripts.fa and compute their GC content and transcript length
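The lookup in transcripts.fa can be sketched as follows. This is a minimal sketch that assumes the first word of each FASTA header (as written by gffread) matches the IDs in the top 100 files. It averages the GC fraction per transcript rather than pooling all bases, which is a design choice that may differ from the original script.

```python
# Sketch: mean transcript length and mean GC fraction for a list of IDs.
def parse_fasta(lines):
    """Map the first word of each '>' header to its sequence (upper case)."""
    records, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line.upper())
    if name is not None:
        records[name] = "".join(chunks)
    return records


def gc_and_length(records, ids):
    """Mean length and mean per-transcript GC fraction; unknown IDs are skipped."""
    lengths, gc = [], []
    for gene_id in ids:
        seq = records.get(gene_id)
        if not seq:
            continue
        lengths.append(len(seq))
        gc.append((seq.count("G") + seq.count("C")) / len(seq))
    if not lengths:
        return None
    return sum(lengths) / len(lengths), sum(gc) / len(gc)
```

Usage: `with open("transcripts.fa") as fh: records = parse_fasta(fh)`. Then call `gc_and_length(records, ids)` once for each top or bottom 100 list.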
GC-content & transcript length
Gene set                        Mean length (nt)   Mean GC fraction
Highly expressed in batch       515.19             0.43
Lowly expressed in batch        831.46             0.41
Highly expressed in chemostat   556.65             0.43
Lowly expressed in chemostat    727.29             0.41
GC-content & transcript length
 Short sequence lengths, mainly among highly expressed genes
 These short sequences give an unrealistic view of codon usage and intron length
 The short, highly expressed genes are often ribosomal subunit proteins
Intron length





 Genes.gtf as input
 Create an index file
 Look for the genes of interest
 Print them to an output file
 Calculate the average intron length per file
File               Mean intron length (bp)
introns_hi1.out    429.455
introns_hi2.out    440.125
introns_low1.out   60.6667
introns_low2.out   43.5
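The intron workflow can be sketched by treating introns as the gaps between consecutive exons of the same transcript in genes.gtf. This is a minimal sketch. It scans the GTF directly instead of building an index file, assumes standard `gene_id` and `transcript_id` attributes, and averages over all introns found for the selected genes.

```python
# Sketch: intron lengths for selected genes from a GTF (genes.gtf).
import re
from collections import defaultdict

GENE_RE = re.compile(r'\bgene_id "([^"]+)"')
TRANSCRIPT_RE = re.compile(r'\btranscript_id "([^"]+)"')


def intron_lengths(gtf_lines, wanted_genes):
    """Introns = gaps between consecutive exons of the same transcript."""
    exons = defaultdict(list)
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        gene = GENE_RE.search(fields[8])
        transcript = TRANSCRIPT_RE.search(fields[8])
        if gene is None or transcript is None or gene.group(1) not in wanted_genes:
            continue
        exons[transcript.group(1)].append((int(fields[3]), int(fields[4])))
    lengths = []
    for blocks in exons.values():
        blocks.sort()
        # GTF is 1-based and inclusive: the intron runs from end + 1 to next start - 1.
        for (_, end), (next_start, _) in zip(blocks, blocks[1:]):
            lengths.append(next_start - end - 1)
    return lengths


def mean_intron_length(lengths):
    """Mean over all introns found; None when the genes have no introns."""
    return sum(lengths) / len(lengths) if lengths else None
```

Most yeast genes have no intron, so the result depends strongly on whether genes without introns are counted. This sketch leaves them out.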
Codon usage
 Method (Perl script):
 Input: the top high and low expressed genes
 Build a gene ID list and a codon list, and retrieve the sequences
 Count codon usage, then calculate the RSCU (relative synonymous codon usage) and the average RSCU
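The RSCU of a codon is its observed count divided by the count expected if all synonymous codons for that amino acid were used equally. A value of 1 therefore means no bias. The original analysis used a Perl script. This Python sketch assumes the standard genetic code and reads each sequence in frame from its first base. It treats the stop codons as their own family, and it takes "average RSCU" to be the per-codon mean of per-gene values. All of these are assumptions.

```python
# Sketch: codon counts and RSCU under the standard genetic code (assumption).
from collections import Counter, defaultdict
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

SYNONYMS = defaultdict(list)  # amino acid (or '*' for stop) -> its codons
for codon, aa in CODON_TABLE.items():
    SYNONYMS[aa].append(codon)


def count_codons(seqs):
    """Count in-frame codons from position 0; codons with non-ACGT letters are skipped."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(0, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TABLE:
                counts[codon] += 1
    return counts


def rscu(counts):
    """RSCU = observed count / (family total / number of synonymous codons)."""
    values = {}
    for codons in SYNONYMS.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent: RSCU undefined
        expected = total / len(codons)
        for c in codons:
            values[c] = counts[c] / expected
    return values


def average_rscu(seqs):
    """Mean per-gene RSCU for each codon, over genes where it is defined."""
    sums, n = defaultdict(float), Counter()
    for seq in seqs:
        for codon, value in rscu(count_codons([seq])).items():
            sums[codon] += value
            n[codon] += 1
    return {codon: sums[codon] / n[codon] for codon in sums}
```

Within each amino acid family, the RSCU values sum to the number of synonymous codons. This gives a quick sanity check on the output.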
Conclusion
 The up and down regulated genes are involved in carbon metabolism
 Highly expressed genes are involved in carbon metabolism or are ribosomal subunit proteins