Data generation - Bioinformatics

Comparative transcriptomic analysis of fungi
Group Nicotiana
Daan van Vliet, Dou Hu, Joost de Jong, Krista Kokki
Research objective
To study differences in gene expression in
related fungus species
Criteria for the studied species:
- Reference genome
- RNA reads > 100 bp
- Preferably: Paired-end
- Related species
- Similar conditions
Comparison
Comparison between different species
- Saccharomyces cerevisiae (yeast)
- Komagataella pastoris (Pichia, yeast)
- Aspergillus oryzae (fungus)
Methods – Data [Daan]
RNA-seq reads: Sequence Read Archive (SRA)
Genome and annotation: Ensembl Fungi
Read quality analysis: FastQC
Methods - Data processing
Cleaning reads: SolexaQA
Mapping reads: TopHat
Assembly/Quantification: Cufflinks
Optional replicate assembly: Cuffmerge
Extracting transcript seqs: gffread
Selection of top 100 genes: Linux
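The last step, selecting the top 100 genes, can be sketched as follows. This is a minimal illustration and not the group's actual command: the tab-separated layout and the column names `gene_id` and `expression` are assumptions made for the example.

```python
import csv
import io


def top_genes(table_text, value_column, id_column, n=100):
    """Return the ids of the n rows with the highest value in value_column."""
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    rows = [(float(row[value_column]), row[id_column]) for row in reader]
    # Highest value first; ties are broken by id so the result is deterministic.
    rows.sort(key=lambda pair: (-pair[0], pair[1]))
    return [gene_id for _, gene_id in rows[:n]]


# Hypothetical three-gene table, only to show the call.
example = "gene_id\texpression\ngeneA\t12.5\ngeneB\t300.0\ngeneC\t45.1\n"
print(top_genes(example, "expression", "gene_id", n=2))
```

With real data the table text would come from the quantification output, and `n` would stay at its default of 100.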
Methods – Gene properties
Property | Explanation | Tool (input datafile)
Expression | Count of mapped reads | Perl script (fasta)
Length | Count of base pairs of whole gene | Perl script (fasta)
Intron length | Count of base pairs within introns | Perl script (gtf)
GC content | GC count / Length | Perl script (fasta)
Nc | Effective number of codons, range 20-61; 20 = one codon per amino acid, 61 = random codon use | CodonW (fasta)
GC3s | GC content of the 3rd synonymous codon position | CodonW (fasta)
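The simpler properties in the table can be illustrated in a few lines. The sketch below is not the group's Perl code and not CodonW itself: the function names are hypothetical, and `gc3s` is an approximation that assumes GC3s skips Met (ATG), Trp (TGG) and stop codons because they offer no synonymous choice.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
# Assumption: Met and Trp have a single codon each, so they are excluded.
NON_SYNONYMOUS = {"ATG", "TGG"} | STOP_CODONS


def gc_content(seq):
    """GC count divided by sequence length."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def gc3s(cds):
    """Fraction of synonymous codons with G or C at the third position."""
    cds = cds.upper()
    # Ignore a trailing partial codon, if any.
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    usable = [c for c in codons if c not in NON_SYNONYMOUS]
    if not usable:
        return 0.0
    return sum(c[2] in "GC" for c in usable) / len(usable)


def intron_length(exons):
    """Sum of gaps between exons given as 1-based inclusive (start, end) pairs."""
    ordered = sorted(exons)
    return sum(max(0, nxt[0] - cur[1] - 1) for cur, nxt in zip(ordered, ordered[1:]))


# Hypothetical toy inputs.
print(gc_content("ATGGCCAAATGA"))
print(gc3s("ATGGCCAAATGA"))
print(intron_length([(1, 100), (201, 300)]))
```

Nc is left out on purpose: it needs per-amino-acid codon homozygosity and is better taken from CodonW directly.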
Methods – Interaction
Top 100 genes were mapped to the
interactome file and visualised through
Cytoscape.
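Mapping a gene list onto an interactome file amounts to filtering an edge list, as in the sketch below. It assumes the interactome is available as pairs of gene identifiers; the function names, the identifiers `G1` to `G4` and the interaction label `pp` are hypothetical. The output uses the simple interaction format (SIF), which Cytoscape can import.

```python
def subnetwork(edges, genes, require_both=False):
    """Keep edges that touch the gene list (or lie fully inside it)."""
    selected = set(genes)
    kept = []
    for a, b in edges:
        hits = (a in selected) + (b in selected)
        if hits == 2 or (hits == 1 and not require_both):
            kept.append((a, b))
    return kept


def to_sif(edges, interaction="pp"):
    """Format edges as SIF lines: source, interaction type, target."""
    return "\n".join(f"{a}\t{interaction}\t{b}" for a, b in edges)


# Hypothetical four-gene interactome and a two-gene "top" list.
interactome = [("G1", "G2"), ("G2", "G3"), ("G3", "G4")]
top = ["G1", "G2"]
print(to_sif(subnetwork(interactome, top)))
```

Setting `require_both=True` restricts the network to interactions among the selected genes themselves, which gives a smaller and easier-to-read graph.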
Hypotheses for yeast - Validation
• GC-content correlates positively with gene
length.
• Gene length correlates negatively with the
degree of codon bias.
• Codon bias is more extreme in highly
expressed genes.
• Genes with longer introns show higher bias in
codon usage.
• The overall codon usage matches the known
bias.
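Each hypothesis above is a statement about the sign of a correlation between two gene properties. The slides do not say which coefficient was used, so the sketch below shows both Pearson and Spearman (rank) correlation as an assumption; the input values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Hypothetical gene lengths and Nc values for four genes.
lengths = [300, 900, 1500, 2400]
nc_values = [35.0, 42.5, 48.0, 55.1]
print(pearson(lengths, nc_values), spearman(lengths, nc_values))
```

Spearman is the safer default for skewed quantities such as expression counts, because it only depends on the ordering of the values.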
GO-terms and gene locations
No. | GOBPID | Pvalue | OddsRatio | ExpCount | Count | Size | Term
1 | GO:0002181 | 3.58E-97 | 54.125 | 6.305508 | 95 | 171 | cytoplasmic translation
2 | GO:0044238 | 1.80E-14 | 3.670035 | 51.04701 | 96 | 1319 | primary metabolic process
3 | GO:0071843 | 2.06E-12 | 3.421344 | 21.49811 | 57 | 587 | cellular component biogenesis at cellular level
4 | GO:0006407 | 7.92E-12 | 37.62835 | 0.700612 | 11 | 19 | rRNA export from nucleus
5 | GO:0070925 | 4.44E-11 | 8.76313 | 2.949945 | 19 | 80 | organelle assembly
The top 5 most over-represented GO-terms for all the found genes
Chromosome | I | II | III | IV | V | VI | VII | VIII | IX | X | XI | XII | XIII | XIV | XV | XVI
No. of genes | 3 | 15 | 5 | 46 | 5 | 16 | 14 | 5 | 0 | 11 | 12 | 37 | 17 | 11 | 1 | 2
Number of genes found on each chromosome.
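The Pvalue, ExpCount, Count and Size columns of the GO table are what a hypergeometric over-representation test typically reports. The slides do not name the tool or the test, so the sketch below is an assumption about how such numbers arise; the gene counts in the example are hypothetical and do not reproduce the table.

```python
from math import comb


def expected_count(category_size, n_selected, universe_size):
    """Expected number of selected genes in a category under random sampling."""
    return category_size * n_selected / universe_size


def hypergeom_upper_tail(count, category_size, n_selected, universe_size):
    """P(X >= count) when n_selected genes are drawn without replacement."""
    total = comb(universe_size, n_selected)
    upper = min(category_size, n_selected)
    tail = sum(
        comb(category_size, k) * comb(universe_size - category_size, n_selected - k)
        for k in range(count, upper + 1)
    )
    return tail / total


# Hypothetical toy case: 10 genes in total, 4 carry the GO term,
# 3 genes are selected and 2 of them carry the term.
print(expected_count(4, 3, 10))
print(hypergeom_upper_tail(2, 4, 3, 10))
```

A small p-value means the selected genes carry the term more often than random sampling from the gene universe would explain.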
Results – Correlations
Figures, one panel per species (Saccharomyces cerevisiae, Aspergillus oryzae, Komagataella pastoris):
- Gene expression vs. gene length
- Gene expression vs. intron length
- Gene expression vs. effective number of codons (Nc)
- Effective number of codons (Nc) vs. GC content at 3rd position
- Gene length vs. effective number of codons (Nc)
- Gene length vs. GC content
- Gene length vs. intron length
- Intron length vs. Nc
Results – Correlations
Overall:
- Within species:
Few correlations between gene properties
- Between species:
Different patterns(?)
Cytoscape
• GO terms
The top 100 genes form different interaction networks depending on GO term
Results - First choice
Yeast Interactome Project for S. cerevisiae
• High-throughput yeast two-hybrid (Y2H)
provides high-quality binary interaction
information.
• High-throughput Y2H dataset covering ~20% of
all yeast binary interactions.
• This binary map is enriched for transient
signalling interactions and inter-complex
connections, with a highly significant clustering
between essential proteins.
Database choice
• Interactions from CCSB-YI1:
1,809 interactions among 1,278 proteins
Second choice
YeastNet v.2
• A probabilistic functional gene network of yeast genes,
constructed from ~1.8 million experimental
observations from DNA microarrays, physical protein
interactions, genetic interactions, literature, and
comparative genomics methods.
• In total, YeastNet v.2 covers 102,803 linkages among
5,483 yeast proteins.
• A modified Bayesian integration of diverse data types,
with each data type weighted according to how well it
links genes that are known to share functions (log likelihood score, LLS).
Database choice
• Interactors were found in YeastNet v.2 for
all of the top 100 genes.
• We found 9,896 candidate interactions among
the 102,803 linkages.
The end
Questions?
Results – Correlations
Gene expression vs. GC content
Figure panels: Saccharomyces cerevisiae, Aspergillus oryzae, Komagataella pastoris