Transcript Final
Group Populus:
Petra van Berkel
Casper Gerritsen
Astri Herlino
Brian Lavrijssen
Dataset of S. cerevisiae
Data generated by Nookaew et al (2012)
Two conditions:
Glucose excess (Batch) & Glucose limited (Chemostat)
3 Biological replicates per condition
RNA-seq data:
12 Files
3 Sets of Paired-end reads per condition
Pipeline for differential gene expression analysis
TopHat – Cufflinks analysis
Protocols based on Trapnell et al (2012)
75% of reads mapped
Plots based on Cuffdiff gene expression output
Cuffdiff output
• 5800 genes with FPKM values
• Q-value threshold based on Nookaew et al (2012)
Data Summary
Number of genes passing each filter:
Filter                                  Q-value < 0.05   Q-value < 1e-5
Significant differentially expressed    2560             1293
FPKM > 0 and value_2 > 0                2554             1292
log2(fold change) > 1                   735              516
log2(fold change) < -1                  510              410
log2(fold change) > 3                   177              151
log2(fold change) < -2                  44               33
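The counts in this summary come from filtering the Cuffdiff gene table on Q-value and log2 fold change. A minimal sketch of such a filter, assuming a tab-separated table with the gene_exp.diff columns value_1, value_2, log2(fold_change) and q_value; the function name, the example rows and the cut-offs of 3 and -2 are illustrative only, not the group's actual script or data:

```python
import csv
import io


def count_genes(rows, q_max, min_log2fc=None, max_log2fc=None):
    """Count genes passing a Q-value cut-off and optional fold-change cut-offs."""
    n = 0
    for row in rows:
        if float(row["q_value"]) >= q_max:
            continue
        # Require expression in both conditions so the fold change is finite.
        if float(row["value_1"]) <= 0 or float(row["value_2"]) <= 0:
            continue
        lfc = float(row["log2(fold_change)"])
        if min_log2fc is not None and lfc <= min_log2fc:
            continue
        if max_log2fc is not None and lfc >= max_log2fc:
            continue
        n += 1
    return n


# Made-up rows in the layout of gene_exp.diff (not real data).
EXAMPLE = (
    "gene_id\tvalue_1\tvalue_2\tlog2(fold_change)\tq_value\n"
    "geneA\t10\t100\t3.32\t1e-8\n"
    "geneB\t50\t5\t-3.32\t1e-6\n"
    "geneC\t20\t25\t0.32\t0.4\n"
    "geneD\t0\t12\tinf\t1e-9\n"
)
rows = list(csv.DictReader(io.StringIO(EXAMPLE), delimiter="\t"))
up = count_genes(rows, q_max=1e-5, min_log2fc=3)
down = count_genes(rows, q_max=1e-5, max_log2fc=-2)
print(up, down)
```

For a real run, the rows would be read from the Cuffdiff output file instead of the inline example.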
Validation of TopHat - Cufflinks
Validation of selection
Using Excel
Literature study
Boer et al (2003)
Influence of C, N, P and S limitation
Microarray analysis
> 68 out of 151 significantly upregulated
> 9 out of 33 significantly downregulated
Largely the same genes are reported in other papers
Expression network up
Up regulated genes
mrnet method in R
Number of Nodes = 57
Number of Edges = 1560
Expression network down
Down regulated genes
mrnet method in R
Number of Nodes = 33
Number of Edges = 513
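One way to read the node and edge counts of the two networks is as a density: the fraction of all possible gene pairs that are connected. The sketch below assumes the networks are undirected and have no self-loops, which the slides do not state:

```python
def density(nodes, edges):
    """Fraction of possible undirected edges (no self-loops) that are present."""
    max_edges = nodes * (nodes - 1) // 2
    return edges / max_edges


up_density = density(57, 1560)    # up regulated network: 1560 of 1596 pairs
down_density = density(33, 513)   # down regulated network: 513 of 528 pairs
print(round(up_density, 3), round(down_density, 3))
```

Under that assumption both networks are close to fully connected, so a stricter edge threshold may be needed before the topology becomes informative.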
GO Terms and GO Enrichment
R version 2.15.0 (2012-03-30)
Packages:
biomaRt: Ensembl gene 69, S. cerevisiae EF3
org.Sc.sgd.db
GOstats
Rgraphviz
GO enrichment:
8419 genes in the universe (org.Sc.sgdPMID2ORF)
Threshold: p-value < 10^-4
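GO enrichment of this kind is usually scored with a hypergeometric test: given a universe of genes, how likely is it to see at least the observed number of selected genes in one GO term by chance? A minimal sketch of that calculation with the standard library; the function names and the example numbers (other than the universe size of 8419) are hypothetical, and this is not the GOstats code itself:

```python
from math import comb


def expected_count(universe, term_size, selected):
    """Expected number of selected genes annotated with the term."""
    return selected * term_size / universe


def hypergeom_upper_tail(universe, term_size, selected, hits):
    """P(X >= hits) when `selected` genes are drawn without replacement."""
    total = comb(universe, selected)
    max_hits = min(term_size, selected)
    favourable = sum(
        comb(term_size, k) * comb(universe - term_size, selected - k)
        for k in range(hits, max_hits + 1)
    )
    return favourable / total


# Hypothetical term: 30 annotated genes, 9 of them among 120 selected genes.
p = hypergeom_upper_tail(universe=8419, term_size=30, selected=120, hits=9)
print(p < 1e-4)
```

A term would be reported as enriched when this p-value falls below the chosen threshold.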
GO Terms
Down regulated
32 genes: 29 genes with 208 GO terms (3 genes are not annotated)
Gene: HXT3
Description: Low-affinity glucose transporter
GO ID: GO:0006810, GO:0016020, GO:0016021, GO:0005215, GO:0055085
Gene: HXT4
Description: High-affinity glucose transporter
GO ID: GO:0006810, GO:0055085, GO:0022891, GO:0005215, GO:0022857
Up regulated
133 genes: 113 genes with 855 GO terms (20 genes are not annotated)
Gene: RGI2
Description: Protein of unknown function involved in energy metabolism under respiratory conditions
GO ID: -
Gene: SPG4
Description: Protein required for survival at high temperature during stationary phase
GO ID: -
Gene: JEN1
Description: Monocarboxylate/proton symporter of the plasma membrane
GO ID: GO:0097079, GO:0015355, GO:0022857, GO:0016021, GO:0034219
GO Enrichment
Down regulated
Biological process: not found
Up regulated
GOBPID      Pvalue    OddsRatio  ExpCount  Count  Size  Term
GO:0055114  2.02E-10  4.98       7.66      29     415   oxidation-reduction process
GO:0072329  2.41E-10  33.95      0.46      9      23    monocarboxylic acid catabolic process
GO:0006091  1.70E-09  6.00       4.40      21     221   generation of precursor metabolites and energy
GO:0006099  3.75E-09  22.61      0.60      9      30    tricarboxylic acid cycle
GO:0009109  3.75E-09  22.61      0.60      9      30    coenzyme catabolic process
Biological process of up regulated genes
Validation: Yeast genome database
Problem:
Genes are not well annotated because biomaRt had not yet been
updated to Ensembl gene 70, S. cerevisiae EF4
Top 100
gffread: make the transcripts fasta file
Determine the top 100 highest and lowest expressed
genes for the two conditions
R: order cuffdiff output on FPKM value (4 files)
Take out the genes with FPKM = 0
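The ranking step can be sketched as follows: drop genes with FPKM = 0, sort the rest by FPKM, and take the two ends of the list. The function name and the example values are illustrative; the group did this step in R on the Cuffdiff output:

```python
def top_and_bottom(fpkm_by_gene, n=100):
    """Return the n highest and n lowest expressed genes, ignoring FPKM = 0."""
    expressed = {gene: fpkm for gene, fpkm in fpkm_by_gene.items() if fpkm > 0}
    ranked = sorted(expressed, key=expressed.get, reverse=True)
    highest = ranked[:n]
    lowest = ranked[-n:][::-1]  # lowest expressed gene first
    return highest, lowest


# Made-up FPKM values for one condition.
example = {"geneA": 5.0, "geneB": 0.0, "geneC": 1.0, "geneD": 9.0}
highest, lowest = top_and_bottom(example, n=2)
print(highest, lowest)
```

Running it once per condition gives the four gene lists (high and low for batch, high and low for chemostat).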
Top 100
Top genes:
G3P dehydrogenase,
F16P aldolase,
Ribosomal subunit protein
Bottom genes:
dubious transcript,
retrotransposon,
etc.
GC-content & transcript length
Determine GC-content and transcript length
Import top 100 genes files
For each file, look up its top 100 genes in transcripts.fa
and compute the GC content and the transcript length
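The GC-content and length step can be sketched as below: read the FASTA records, then average GC fraction and length over the identifiers in a top 100 list. It assumes the identifiers in the list match the first word of each FASTA header, which may not hold for every gffread output; all names and sequences here are illustrative:

```python
def read_fasta(lines):
    """Parse FASTA lines into {identifier: sequence}, using the first header word."""
    records, name = {}, None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {key: "".join(parts) for key, parts in records.items()}


def gc_and_length(seq):
    """Return (GC fraction, length) for one sequence."""
    gc = seq.count("G") + seq.count("C")
    return (gc / len(seq) if seq else 0.0), len(seq)


def mean_gc_and_length(records, ids):
    """Average GC fraction and length over the identifiers found in the records."""
    stats = [gc_and_length(records[i]) for i in ids if i in records]
    if not stats:
        return 0.0, 0.0
    mean_gc = sum(s[0] for s in stats) / len(stats)
    mean_len = sum(s[1] for s in stats) / len(stats)
    return mean_gc, mean_len


# Made-up sequences standing in for transcripts.fa.
fasta = [">t1 gene=x", "GGCC", "AATT", ">t2", "ATAT"]
records = read_fasta(fasta)
print(mean_gc_and_length(records, ["t1", "t2"]))
```

With a real file, `read_fasta` would be given the open file handle of transcripts.fa instead of the inline list.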
GC-content & transcript length
Highly expressed in batch:
Length: 515.19 GC: 0.43
Lowly expressed in batch:
Length: 831.46 GC: 0.41
Highly expressed in chemostat:
Length: 556.65 GC: 0.43
Lowly expressed in chemostat:
Length: 727.29 GC: 0.41
GC-content & transcript length
Short sequence length!
Occurs mainly in highly expressed genes and gives an unrealistic
view of codon usage and intron length
These are often ribosomal subunit proteins
Intron length
Genes.gtf as input
Create an index file
Look for the interesting genes
Print them to an output file
Calculate average
File               Mean intron length
introns_hi1.out    429.455
introns_hi2.out    440.125
introns_low1.out   60.6667
introns_low2.out   43.5
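Intron lengths can be derived from a GTF file by taking the gaps between consecutive exons of the same transcript. A minimal sketch, assuming exon records carry gene_id and transcript_id attributes and use 1-based inclusive coordinates (as GTF does); the function name and the example lines are illustrative, not the group's script:

```python
import re
from collections import defaultdict


def intron_lengths(gtf_lines, wanted_genes):
    """Return intron lengths for the wanted genes, from exon records in a GTF."""
    exons = defaultdict(list)
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        attrs = dict(re.findall(r'(\S+) "([^"]*)"', fields[8]))
        if attrs.get("gene_id") not in wanted_genes:
            continue
        exons[attrs.get("transcript_id")].append((int(fields[3]), int(fields[4])))
    lengths = []
    for coords in exons.values():
        coords.sort()
        # An intron spans the bases between one exon's end and the next exon's start.
        for (_, prev_end), (next_start, _) in zip(coords, coords[1:]):
            lengths.append(next_start - prev_end - 1)
    return lengths


# Made-up GTF lines: g1 has two exons separated by a 100 bp intron.
gtf = [
    'chrI\tsrc\texon\t1\t100\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    'chrI\tsrc\texon\t201\t300\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    'chrI\tsrc\texon\t500\t900\t.\t+\t.\tgene_id "g2"; transcript_id "t2";',
]
lengths = intron_lengths(gtf, {"g1", "g2"})
mean_length = sum(lengths) / len(lengths) if lengths else 0.0
print(lengths, mean_length)
```

Single-exon genes contribute no introns, so the mean is taken only over genes that actually contain one.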
Codon usage
Method (perl script):
Input: the top highly and lowly expressed genes
Build gene ID list and codons list and retrieve sequences
Count codon usage and calculate RSCU and average RSCU
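RSCU (relative synonymous codon usage) divides the observed count of a codon by the count expected if all synonymous codons for that amino acid were used equally. A minimal sketch in Python rather than the group's perl script, assuming the standard genetic code and in-frame coding sequences; stop codons are left out here as a simplification:

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(sequences):
    """Return {codon: RSCU} for amino acids that occur in the sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TABLE:
                counts[codon] += 1
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":  # stop codons are skipped in this sketch
            families[aa].append(codon)
    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue
        for c in codons:
            # observed count / (family total / number of synonymous codons)
            values[c] = counts[c] * len(codons) / total
    return values


# Made-up sequence: three alanine codons (GCT, GCT, GCC).
values = rscu(["GCTGCTGCC"])
print(values["GCT"], values["GCC"], values["GCA"])
```

An RSCU above 1 means the codon is used more often than expected under equal usage, below 1 less often.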
Conclusion
The up and down regulated genes are involved in
carbon metabolism
Highly expressed genes are involved in carbon
metabolism or are ribosomal subunit proteins