Computational Biology


V5 Forward genetics – molecular markers

Review of lecture V4 ...

- The Arabidopsis genome contains a large number of gene duplications
- More than 50% of the genome is duplicated
- The TAIR website contains the full sequence of the Columbia ecotype

SS 2008 lecture 5 Biological Sequence Analysis 1

Reverse genetics

The reverse genetics approach tries to identify the function of a particular gene by studying the effects of a manipulation of its sequence. Possible manipulations:

- random insertions or deletions
- site-directed mutagenesis (point mutations)
- gene knockout (yeast or mouse)
- RNA silencing

After an alteration, the method attempts to find a phenotype that may have resulted from this sequence change. If variations become observable, conclusions can be drawn about the normal function of the mutated gene.

Modifying the sequence of a gene requires that its sequence information is already available.

Reverse genetics

After choosing a specific target sequence, one selects mutations that inactivate the gene or disrupt its function and thus hopefully lead to a visible mutant phenotype.

Main advantage of reverse genetic studies: the gene concerned is known beforehand.

Regrettably, the mutations used often result in reduced function (thus gain-of-function mutations cannot be identified), and the discovery of redundant pathways is not possible.

In addition, only a small portion of the mutations exhibit informative phenotypes, and even fewer display morphological changes that provide a direct clue about gene function.

M.Sc. thesis S. Pfeifer Biological Sequence Analysis SS 2008 lecture 5 3

Forward genetics

Instead, one often uses the forward genetic (also called classical genetic) approach to discover the function(s) of a gene. It

- allows gain-of-function mutations to be considered,
- allows the identification of genes acting within a common pathway as well as genes encoding interacting proteins, and
- is not restricted to any tissue type.

Because of its wide range of applications, this method is often the preferred strategy in functional studies.

Meiosis

Meiosis can be divided into the first and the second meiotic division.

First meiosis: segregation of the homologous chromosomes from each other and division of the diploid cell into two haploid cells, each containing one of the segregates.

Second meiosis: separation of each chromosome's sister strands (the chromatids) and segregation of the DNA into two sets of strands (each containing one of each homologue). It further divides both haploid cells, whose chromosomes are still duplicated, to produce 4 gametes, which can fuse with other haploid cells during fertilisation to create a new diploid cell, the zygote.

Meiosis terms

A zygote is a cell that is the result of fertilization.

The haploid number is the number of chromosomes in a gamete of an individual.

Diploid cells have two homologous copies of each chromosome, usually one from the mother and one from the father.

Plants and some algae switch between a haploid and a diploid or polyploid state, with one of the stages emphasized over the other.

Meiosis terms

Zygosity describes the similarity or dissimilarity of DNA between homologous chromosomes at a specific allelic position or gene. Every gene in a diploid organism has two alleles at the gene's locus. These alleles are defined as dominant or recessive, depending on the phenotype resulting from the two alleles.

An organism is called homozygous at a specific locus when it carries two identical copies of the gene affecting a given trait on the two corresponding homologous chromosomes (e.g., the genotype is PP or pp when P and p refer to different possible alleles of the same gene).

An organism is heterozygous at a locus or gene when it has different alleles occupying the gene's position on each of the homologous chromosomes. In diploid organisms, the two different alleles were inherited from the organism's two parents. For example, a heterozygous individual would have the allele combination Pp.
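The zygosity terms can be illustrated with a minimal Python sketch. The function names `zygosity` and `cross` are hypothetical and chosen only for this illustration; genotypes are written as two-letter strings such as PP, Pp or pp, as in the text above, and each parent is assumed to pass on either of its two alleles with equal chance.

```python
from collections import Counter
from itertools import product


def zygosity(genotype: str) -> str:
    """Classify a two-allele genotype such as 'PP', 'Pp' or 'pp'."""
    if len(genotype) != 2:
        raise ValueError("expected exactly two alleles, e.g. 'Pp'")
    first, second = genotype
    return "homozygous" if first == second else "heterozygous"


def cross(parent_a: str, parent_b: str) -> Counter:
    """Count offspring genotypes when each parent passes on one allele."""
    offspring = Counter()
    for allele_a, allele_b in product(parent_a, parent_b):
        # Sort so that 'pP' and 'Pp' are counted as the same genotype.
        offspring["".join(sorted((allele_a, allele_b)))] += 1
    return offspring


print(zygosity("PP"), zygosity("Pp"))
print(cross("Pp", "Pp"))
```

Crossing two heterozygous individuals in this sketch yields the genotypes PP, Pp and pp in a 1:2:1 ratio.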

Meiosis

During the pairing of the homologous chromosomes in the first meiosis (the synapsis), the two copies of each chromosome pair come physically close.

A process named recombination or crossover can occur if the homologous chromosome arms undergo breakage and an exchange of DNA segments, resulting in gametic chromosomes consisting of material from both members of the chromosome pair.

Meiosis

The crossover directly affects the inheritance pattern of the genes involved, as it determines whether two genes will remain linked and be inherited together or whether they will be separated and inherited independently. Thus, meiosis not only ensures proper chromosome disjunction but also contributes to genetic diversity among the gametes.

Because recombination events give insight into the distance between two genes, they can assist map-based cloning approaches. Map-based cloning relies on the high frequency of genetic exchange during meiotic recombination: two closely adjacent markers are separated by a randomly occurring recombination less frequently than two markers that are more distant from each other. In general, the crossover probability between two markers increases monotonically with the distance between them along the chromosome.
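The monotonic relationship between marker distance and recombination can be sketched in Python. The lecture does not name a specific mapping function; Haldane's mapping function is used here as one common choice, under the assumption that crossovers occur independently of each other (no interference). The function names are hypothetical.

```python
import math


def haldane_recombination_fraction(distance_cm: float) -> float:
    """Expected recombination fraction for a map distance in centimorgans.

    Assumption: Haldane's mapping function, r = 0.5 * (1 - exp(-2d)),
    with d measured in Morgans.
    """
    if distance_cm < 0:
        raise ValueError("map distance must be non-negative")
    d_morgans = distance_cm / 100.0
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))


def observed_recombination_fraction(recombinants: int, total: int) -> float:
    """Fraction of scored gametes in which the two markers were separated."""
    if total <= 0:
        raise ValueError("total must be positive")
    return recombinants / total


for distance in (1, 10, 50, 200):
    print(distance, round(haldane_recombination_fraction(distance), 4))
```

Under this model the recombination fraction rises with distance but never exceeds 0.5, the value expected for markers that are inherited independently.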

Map-based cloning

Outcrossing: practice of introducing unrelated genetic material into a breeding line.

Marker in F2 generation

Bulk analysis of mutation effect

Below: schematic representation of the marker positions used in the mapping experiment. Open circles: centromeres.

Right: gel of PCR products for these markers. In each panel, the left lane shows the result for the heterozygous control sample and the right lane that for a pooled mutant sample. Bands specific for the Ler ecotype are marked with an asterisk. The mutation created in Ler is linked to markers ciw 1 and nga 280, because both markers show only the Ler-specific band. In contrast, all other markers used show approximately the same ratio of Col and Ler amplification in both lanes. This indicates that the mutation is not linked to these loci. Lukowitz et al. [2000].
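The reasoning applied to the gel can be written down as a small Python sketch. The band calls below are hypothetical and serve only as an illustration: the markers ciw1 and nga280 are named in the text, while the other marker names and the function `linked_markers` are placeholders. A marker is treated as linked when the pooled mutant lane shows only the band of the ecotype in which the mutation was created.

```python
def linked_markers(pool_bands: dict[str, set[str]], mutant_ecotype: str = "Ler") -> list[str]:
    """Return markers whose pooled mutant lane shows only the mutant ecotype's band."""
    return sorted(
        marker
        for marker, bands in pool_bands.items()
        if bands == {mutant_ecotype}
    )


# Hypothetical band calls for the pooled mutant lane (not measured data).
pooled_mutants = {
    "ciw1": {"Ler"},
    "nga280": {"Ler"},
    "unlinked_marker_a": {"Col", "Ler"},
    "unlinked_marker_b": {"Col", "Ler"},
}

print(linked_markers(pooled_mutants))
```

A real analysis would compare band intensities between the control lane and the pooled lane rather than rely on simple presence or absence calls.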

SNPs

In most organisms, SNPs comprise the largest set of sequence variants. SNP: a single nucleotide is replaced by one of the other three nucleotides between members of a species (see Figure below).

One distinguishes transitions (substitutions between the purines A and G or between the pyrimidines C and T) and transversions (substitutions between a purine and a pyrimidine). In Arabidopsis both kinds are equally abundant in the genome (see Table).
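The distinction between transitions and transversions can be expressed as a short Python sketch. The function name `classify_snp` is hypothetical; the purine and pyrimidine sets follow the definition given above.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_snp(ref: str, alt: str) -> str:
    """Classify a single-nucleotide substitution as transition or transversion."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError("alleles must be one of A, C, G, T")
    if ref == alt:
        raise ValueError("reference and alternative allele are identical")
    # A transition stays within one chemical class, a transversion switches class.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


print(classify_snp("A", "G"))
print(classify_snp("A", "C"))
```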

SNPs

SNPs can be found

- in intergenic regions (frequency 1 SNP per 3.5 kb),
- in coding areas of genes (frequency 1 SNP per 2.2 kb), and
- in non-coding areas of genes (frequency 1 SNP per 3.1 kb).

SNPs falling within coding regions are of particular interest. Due to the redundancy of the genetic code, not every modification necessarily results in a different amino acid.
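The effect of this redundancy can be checked with a minimal Python sketch. It assumes the standard genetic code (written here in the usual T, C, A, G ordering) and a hypothetical helper `snp_effect` that replaces one base of a codon and reports whether the encoded amino acid changes.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def snp_effect(codon: str, position: int, new_base: str) -> str:
    """Report whether replacing one base (0-based position) changes the amino acid."""
    codon = codon.upper()
    mutated = codon[:position] + new_base.upper() + codon[position + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutated]
    if before == after:
        return "synonymous"
    return f"non-synonymous ({before} -> {after})"


print(snp_effect("GAA", 2, "G"))
print(snp_effect("GAA", 2, "T"))
```

In this example, changing the third base of GAA to G still encodes glutamate, whereas changing it to T yields aspartate.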

Molecular markers: SNPs

RFLPs – Restriction fragment length polymorphisms

RFLPs [Botstein et al., 1980] were one of the first types of DNA markers to be developed. They exploit the fact that different accessions have almost identical genomes but always differ at a few nucleotides (due to base substitutions, insertions, deletions or sequence rearrangements during evolution).

Idea: use these variations to distinguish between ecotypes.

Employ restriction endonucleases that recognise specific nucleic acid sequences in the DNA and cleave the DNA at these (or adjacent) sites.

Some restriction enzymes and their recognition sites (arrows indicate the cut site). Some enzymes recognise not only one particular sequence but also allow variations of certain nucleotides within their recognition site, e.g. N stands for any nucleotide. Source: Restriction Enzyme Database (REBASE) [2007]
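Searching a sequence for recognition sites, including sites with the wildcard N, can be sketched in Python as follows. The two sites used in the example, GAATTC (EcoRI) and GANTC (HinfI), are chosen here for illustration and are not taken from the slide's figure; the helper names and the test sequence are hypothetical. Only N is handled as an ambiguity code in this sketch.

```python
import re


def site_to_regex(recognition_site: str) -> re.Pattern:
    """Translate a recognition site into a regex, treating N as any nucleotide."""
    pattern = "".join(
        "[ACGT]" if base == "N" else base for base in recognition_site.upper()
    )
    # A lookahead is used so that overlapping occurrences are also reported.
    return re.compile(f"(?={pattern})")


def find_sites(sequence: str, recognition_site: str) -> list[int]:
    """Return the 0-based start positions of all occurrences of the site."""
    regex = site_to_regex(recognition_site)
    return [match.start() for match in regex.finditer(sequence.upper())]


sequence = "AAGAATTCTTGACTCGG"  # hypothetical example sequence
print(find_sites(sequence, "GAATTC"))
print(find_sites(sequence, "GANTC"))
```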

Effect of RFLPs

RFLPs – Restriction fragment length polymorphisms

After cutting, the resulting fragments may differ in size (due to insertions or deletions), and the number of fragments produced may also vary (when a base change alters a recognition site in the sequence) between different accessions (see Fig.).
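How a single base change alters the fragment pattern can be sketched in Python. The two allele sequences are hypothetical and differ at one base inside the second recognition site; the site GAATTC with a cut after its first base is an assumption chosen for the example, and the function `digest` handles only exact, linear sites (no wildcards).

```python
def digest(sequence: str, site: str, cut_offset: int) -> list[int]:
    """Return fragment lengths after cutting at every occurrence of `site`.

    `cut_offset` is the number of bases of the site that lie left of the cut
    (1 means the cut falls right after the first base of the site).
    """
    sequence = sequence.upper()
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    boundaries = [0] + cuts + [len(sequence)]
    return [right - left for left, right in zip(boundaries, boundaries[1:])]


# Hypothetical alleles of two accessions; allele_b has lost the second site.
allele_a = "ATGCGAATTCTTAGGCCGAATTCAAT"
allele_b = "ATGCGAATTCTTAGGCCGAGTTCAAT"

print(digest(allele_a, "GAATTC", 1))
print(digest(allele_b, "GAATTC", 1))
```

The first allele is cut twice and yields three fragments, while the second is cut once and yields two, which is the kind of difference a gel would reveal.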

Cleaved amplified polymorphic sequence (CAPS) markers

CAPS markers detect single base changes that create or remove a recognition site for a restriction enzyme in one of a pair of alleles.

Molecular markers: short sequence repeats

Simple sequence repeats (SSRs; also called short tandem repeats, STRs, simple sequence length polymorphisms, SSLPs, or microsatellites) are highly polymorphic loci consisting of short sequence motifs, 2-4 bp long, that are repeated multiple times and are embedded in DNA with unique sequence.

Minisatellites (also named variable number tandem repeats, VNTR) are similar to SSRs, but their repeated sequence is longer (about 10-100 base pairs). Both often arise from tandem duplications or slipped strand mispairing (slippage) occurring during replication or DNA repair on a single DNA double helix.

Microsatellites

Microsatellites can be classified according to the number of nucleotides in the repeat unit.

Mononucleotide and dinucleotide repeat elements are quite common; longer repeat units become increasingly unlikely.

Alternative classification: perfect repeats, containing a single uninterrupted repeat element flanked on both sides by non-repeated sequences, and imperfect ones, with two or more runs of the same repeat unit interrupted by short stretches of other sequences.

Besides these simple perfect repeats (such as (CA)n) and simple imperfect repeats (for example (CA)n GT (CA)m), composed perfect repeats (for instance (AC)n (TC)m) and composed imperfect repeats (such as (CA)n A (AC)m A (GA)o) also arise in the genome of most organisms.
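Finding simple perfect repeats such as (CA)n in a sequence can be sketched with a regular expression in Python. The function name, the default thresholds (`max_unit=4`, `min_repeats=5`) and the example sequence are assumptions made for this illustration; imperfect and composed repeats are not detected by this sketch.

```python
import re


def find_perfect_repeats(sequence: str, max_unit: int = 4, min_repeats: int = 5):
    """Return (start, repeat_unit, copy_number) for each perfect tandem repeat.

    The repeat unit is matched lazily, so the shortest unit that reaches
    `min_repeats` copies is reported (e.g. 'A', not 'AA', for a poly-A run).
    """
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_repeats - 1))
    return [
        (match.start(), match.group(1), len(match.group(0)) // len(match.group(1)))
        for match in pattern.finditer(sequence.upper())
    ]


# Hypothetical sequence containing (CA)6 and (A)7 embedded in unique sequence.
example = "GGT" + "CA" * 6 + "TTG" + "A" * 7 + "CG"
print(find_perfect_repeats(example))
```

Each hit gives the 0-based start position, the repeat unit and the number of copies, which is the quantity that varies between accessions at a polymorphic microsatellite.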

Molecular markers: microsatellites

As early as 1984, Tautz and Renz showed that all possible types of perfect simple sequence repeats composed of only one or two nucleotides are present to at least some extent in eukaryotic genomes, and that one can expect to encounter at least one simple sequence stretch every 10 kb of DNA sequence. In 1994, Bell and Ecker examined mono- and dinucleotide repeats longer than 20 nucleotides in the Arabidopsis accessions Columbia and Landsberg erecta. Most of them display polymorphisms between these ecotypes due to variation in the number of repeat units. In 2000, this result was confirmed by a study of Lukowitz et al. showing that there is a likelihood of 40 % that such DNA segments are polymorphic between different accessions.
