Biology for Bioinformatics - NIU Department of Biological

Biology for Bioinformatics
The Big Picture of Biology
What is Life?
• NASA: life is a self-sustaining set of
chemical reactions capable of
reproducing similar copies of itself.
• We can separate this into:
1. chemistry: movement of electrons between
atoms
2. reproduction, which immediately leads to
natural selection: offspring that are better at
surviving and reproducing end up taking
over in future generations.
Chemistry
• Life is applied chemistry. All living systems are based
on the interactions of chemical compounds, the sharing
of electrons between atoms.
• Life occurs in cells, with a membrane separating the
inside from the outside.
– membrane is impermeable to almost everything (but not
gasses or water).
– other molecules enter or leave using specific channels
• Homeostasis: maintaining a constant internal state
despite external changes. Homeostasis requires:
1. Metabolism: capture matter and energy from the outside world,
and use it to maintain, grow, reproduce.
2. Irritability: Detect and respond to environmental changes.
All life on Earth is similar
• Current belief: life originated on Earth at least 3.5 billion
years ago, not long after the Earth’s surface cooled.
• There may have been many forms of life early in our
history, many semi-independent origins, but we believe all
life on Earth today can be traced to a single common
ancestor. Sometimes referred to as the Last Universal
Common Ancestor (LUCA).
• All organisms are made of the same molecules in similar
structures: DNA used for instructions and heredity,
proteins do the necessary work, cells are surrounded by
lipid membranes
– Many seemingly arbitrary decisions, such as the
handedness of molecules and the genetic code, are identical
in all organisms that have been studied.
• Individuals are born and then die, but each individual’s life
comes from previous life. “Life” is an unbroken chain of
living cells extending back at least 3.5 billion years to our
original common ancestor.
– DNA’s point of view: individuals exist merely as temporary
carriers of the DNA.
Reproduction
• All the information needed to produce a living
thing is coded in its DNA.
– this is a well-supported belief, but as is usual in
biology, there is some fuzziness around the edges
• To reproduce, organisms replicate their DNA,
then use the DNA instructions to create a new
organism.
– for microorganisms, this usually means growing large,
and then splitting into 2 halves, each of which gets a
copy of the DNA.
– fancier process in multicellular organisms
Evolution by Natural Selection
• Offspring resemble their parents because the offspring are built from
their parents’ genes.
• Random changes in the DNA (mutations) occur at a slow but steady
rate. This produces a lot of variation within a species.
• Some members of a species are more “fit”: better able to survive
and reproduce than other members of the species. This is natural
selection: the more fit individuals are “selected” by Nature to
reproduce more than the less fit individuals.
– this can also happen by artificial selection, where a human decides
which individuals will be allowed to reproduce.
• The genes from the more fit individuals will slowly take over the
species.
• Thus, the genes within a species slowly change (or occasionally,
change rapidly).
• However, most mutations have no effect on fitness, and all
organisms contain large numbers of DNA positions that are different
from other members of their species.
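The "slowly take over" step can be sketched numerically. This is a minimal illustration, assuming a simple model in which carriers of a variant leave (1 + s) offspring for every 1 left by non-carriers; the function names and the values of s are hypothetical and not from the lecture.

```python
def next_frequency(p, s):
    """Frequency of the fitter variant after one generation.

    p: current frequency of the variant (between 0 and 1)
    s: assumed reproductive advantage (0.05 means 5% more offspring)
    """
    fit = p * (1 + s)      # offspring left by carriers of the variant
    unfit = (1 - p) * 1.0  # offspring left by everyone else
    return fit / (fit + unfit)


def generations_to_reach(p0, s, target):
    """Count generations until the variant reaches `target` frequency.

    Assumes s > 0 and p0 > 0; otherwise the frequency never rises.
    """
    p, generations = p0, 0
    while p < target:
        p = next_frequency(p, s)
        generations += 1
    return generations


if __name__ == "__main__":
    # Illustrative numbers: a variant at 1% with a 5% advantage.
    print(generations_to_reach(0.01, 0.05, 0.99))
```

With s = 0 the frequency stays where it is, which matches the point that most mutations have no effect on fitness and simply persist as variation.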
DNA Sharing
• An important consideration: DNA is traded between
different organisms, so innovations can spread very
widely.
– which is the reason antibiotic resistance is so widespread among
pathogenic bacteria.
– higher organisms use a formal sexual mechanism: each
offspring gets half its DNA from each parent. Also, DNA is
almost always confined within a species.
– bacteria share DNA less formally, with small segments being
passed around by several different mechanisms, often between
species.
• cross-species transfer is called lateral or horizontal transfer
• This process is quite widespread, and I have heard estimates that
up to 1/3 of all genes in bacteria have been transferred in from
another species, as opposed to having come from the common
ancestor by vertical transfer, regular parent-to-offspring descent.
Diversity of Life
• Lots of different ways to make a living
• Several million different species
• Species: a group of similar organisms that can reproduce with each
other but not with others.
– Easy to define in sexually-reproducing organisms, but not in bacteria
• Speciation: one species splits into two different species very easily:
– Isolate two groups so they can’t mate with each other
– Different random mutations quickly cause differences in sexual
attractiveness and fertility
– When brought back together, the two groups no longer want to mate
with each other, or they can’t produce fertile offspring.
– They are now two different species.
• Phylogeny: the branching pattern of descent of a species starting at
the Last Universal Common Ancestor.
– a binary tree: each parent node has 2 offspring nodes. Of course, any
given species may be extinct.
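A binary phylogeny is easy to hold in a program. The sketch below assumes a tree written as nested pairs, with strings as the species at the tips; the species labels are placeholders, not a real phylogeny.

```python
def leaves(node):
    """Return the species names at the tips of a binary tree."""
    if isinstance(node, str):
        return [node]
    left, right = node  # each parent node has exactly 2 offspring nodes
    return leaves(left) + leaves(right)


def count_splits(node):
    """Count the parent nodes (speciation events) in the tree."""
    if isinstance(node, str):
        return 0
    left, right = node
    return 1 + count_splits(left) + count_splits(right)


if __name__ == "__main__":
    # Placeholder species labels, purely for illustration.
    toy_tree = (("A", "B"), ("C", ("D", "E")))
    print(leaves(toy_tree), count_splits(toy_tree))
```

A strictly binary tree with n tips always has n - 1 splits, which is a quick sanity check on any tree stored this way.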
Basic Division of Life
• Prokaryotes: simple cells with no
internal compartments: especially,
no separate nucleus that contains
the DNA.
• Eukaryotes: more complex cells
with internal compartments and
membranes, with the DNA
contained in the nucleus, a special
membrane-bound compartment.
• Viruses: have no metabolism,
aren’t composed of cells, are
parasites: use cells to reproduce
– called “bacteriophage” or just
“phage” in bacteria
– conventional theory: viruses are
escaped bits of cellular machinery
– but: viruses often have genes with
no homologues in living cells
Major Sub-divisions
• Prokaryotes: Eubacteria (common bacteria found
everywhere) and Archaea (special forms found in
extreme conditions such as hot, high salt, acidic).
• Eukaryotes: Protists (single celled), Fungi (digest their
food externally), Plants (produce food from sunlight),
Animals (move under their own power for part of their
lives).
– “Protist” is a catch-all group, containing many different lineages
that are no more related than animals and plants are. Also,
multicellular seaweeds are considered protists.
– In contrast, plants, animals and fungi each seem to have had a
single common ancestor.
Older Trees
[Figures: Darwin’s original tree of life, from his 1837 notebook, and
a tree from Haeckel (1866).]
Another View
I just like this one.
I don’t know its origin:
I found it on a creationist
web site.
Ring of Life
• This is rather
speculative, based on
the idea that
eukaryotes arose from
a fusion of a
bacterium and an
archaean.
– Bacteria contributed
the basic metabolic
pathways
– Archaea contributed
information handling
system
Biological Molecules
Molecules in the Cell
• The most common molecule in cells is water, which is
the universal solvent that all the other molecules are
dissolved in.
• Various small ions, dissolved salts, keep the cell in
osmotic balance.
– The main positively charged ions are sodium (Na+), potassium
(K+), magnesium (Mg2+), and calcium (Ca2+).
– The main negative ions are chloride (Cl-), bicarbonate (HCO3-),
and phosphate (PO4^3-).
• Four main classes of macromolecule: nucleic acids,
proteins, polysaccharides, and lipids. These molecules
are usually in the form of polymers, long chains of similar
subunits, which are called monomers.
• Miscellaneous “small” molecules that act as helpers (co-factors)
in enzymatic reactions. Many of these are
“vitamins”: co-factors we humans can’t synthesize for
ourselves.
Carbohydrates
• Sugars and starches: “saccharides”.
• The name “carbohydrate” comes from the
approximate composition: a ratio of 1 carbon
to 2 hydrogens to one oxygen (CH2O). For
instance the sugar glucose is C6H12O6.
• Carbohydrates are composed of rings of 5 or
6 carbons, with alcohol (-OH) groups
attached. This makes most carbohydrates
water-soluble.
• Carbohydrates are used for energy
production and storage, and for structure.
• Glucose, a simple 6-carbon sugar, is the
primary fuel source for most living things. It is
broken down by the process of glycolysis.
• Starches are glucose polymers, used to store
fuel.
• Structural carbohydrates include cellulose
(another glucose polymer) and chitin, the
outer coating of insects and many fungi.
Lipids
• Lipids are the main non-polar component of
cells. Mostly hydrocarbons—carbon and
hydrogen.
• They are used primarily as energy storage
and cell membranes.
• Energy storage: triglycerides (fats).
Composed of glycerol attached to 3 fatty acid
molecules. Fatty acids are long chains of
carbon and hydrogen. Double bonds kink
the chains and lower the melting
temperature.
• Cell membranes are composed primarily of
phospholipids. These have 2 fatty acids
attached to glycerol, plus a phosphate-containing
polar “head group”.
– The heads stick into the water outside the
membrane, while the non-polar tails stay in
the hydrophobic interior of the membrane.
This acts as a waterproof coat that keeps
most other molecules from passing through
the membrane. The membrane consists of 2
layers of phospholipids: the lipid bilayer.
Proteins
• The most important type of macromolecule.
• Roles:
– Structure: collagen in skin, keratin in hair,
crystallin in eye.
– Enzymes: all metabolic transformations,
building up, rearranging, and breaking down
of organic compounds, are done by enzymes,
which are proteins.
– Transport: oxygen in the blood is carried by
hemoglobin, everything that goes in or out of
a cell (except water and a few gasses) is
carried by proteins.
– Also: nutrition (egg yolk), hormones, defense,
movement
• Proteins are composed of linear chains of
amino acids.
• There are 20 different kinds of amino acids in
proteins. Each one has a functional group
(the “R group”) attached to it.
• Different R groups give the 20 amino acids
different properties, such as charged (+ or -),
polar, hydrophobic, etc.
• The different properties of a protein come
from the arrangement of the amino acids.
Protein Structure
• A polypeptide is one linear chain of amino acids. A
protein may contain one or more polypeptides.
Proteins also sometimes contain small helper
molecules such as heme.
– Each gene codes for one polypeptide
• After the polypeptides are synthesized by the cell,
they spontaneously fold up into a characteristic
conformation which allows them to be active. The
proper shape is essential for active proteins. For
most proteins, the amino acid sequence itself is all
that is needed to get proper folding.
• Proteins fold up because they form hydrogen bonds
between amino acids. The need for hydrophobic
amino acids to be away from water also plays a big
role. Similarly, the charged and polar amino acids
need to be near each other.
• The joining of polypeptide subunits into a single
protein also happens spontaneously, for the same
reasons.
• Enzymes are usually roughly globular, while
structural proteins are usually fiber-shaped. Proteins
that transport materials across membranes have a
long segment of hydrophobic amino acids that sits in
the hydrophobic interior of the membrane.
Nucleic Acids
• Only 2 types: DNA and RNA
• Both DNA and RNA are linear
chains of nucleotides
• DNA: 2 chains running antiparallel twisted together into
a double helix
• RNA: usually 1 chain of
nucleotides, with secondary
structure caused by base
pairing between nucleotides
on the same strand.
Nucleotides
• Each nucleotide has 3 parts: sugar, phosphate,
base.
– Sugar is ribose (RNA) or deoxyribose (DNA)
– Bases are attached to the 1’ carbon of the sugar
– Base (sometimes called “nitrogenous base”) is
purine or pyrimidine.
– Purines: 2 carbon-nitrogen rings, adenine (A) or
guanine (G)
– Pyrimidines: 1 carbon-nitrogen ring, cytosine (C),
thymine (T) (DNA only), uracil (U) (RNA only)
• In the backbone, nucleotides are bonded
together between the phosphate on the 5'
carbon and the -OH on the 3' carbon.
– Thus each nucleic acid has a free 5' phosphate on
one end and a free 3' -OH on the other.
– Used to write the polarity of the molecule: each
nucleotide chain has a 5’ end and a 3’ end.
• DNA has -H on 2' carbon of the sugar; RNA has
-OH.
• This difference makes DNA more stable and
allows it to form a regular double helix structure
Base Pairing
• A bonds with T (or U); G bonds with C. Held
together by hydrogen bonds
– A-T has 2 hydrogen bonds; G-C has 3. This
makes G-C stronger and more stable at high
temperatures.
• In DNA, 2 antiparallel chains are held
together by this pairing.
– Implies that the amount of A = amount of T,
and G = C in DNA.
– One characteristic of genomes is their GC
content: the percentage of G and C. This can
vary from about 20% to 70%.
Eukaryotes generally have GC contents
around 40%. Also, there are large scale
variations in GC content along the length of
chromosomes called “isochores”, which may
be the result of horizontal gene transfer.
• RNA is usually single stranded and held in a
folded conformation by base pairing within
the RNA molecule, e.g. tRNA.
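The pairing rules and GC content translate directly into a few lines of code. This is a minimal sketch, assuming DNA is given as a plain string of A, C, G, T read 5' to 3'; the function names are illustrative.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the antiparallel partner strand, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def gc_content(seq):
    """Percentage of bases in seq that are G or C."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def double_strand_counts(seq):
    """Count each base over both strands of the double helix."""
    both = seq.upper() + reverse_complement(seq)
    return {base: both.count(base) for base in "ACGT"}


if __name__ == "__main__":
    strand = "AAGC"  # toy sequence, not from a real genome
    print(reverse_complement(strand), gc_content(strand))
    print(double_strand_counts(strand))
```

Counting over both strands always gives A = T and G = C, while a single strand taken alone need not be balanced.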
Genetic Information Processing
Central Dogma of Molecular Biology
• Concerns the flow of information in the cell.
– DNA is long term information storage
– RNA is produced from individual genes when needed by the cell
– Protein is the actual usable product of each gene
Replication
• Main enzyme: DNA polymerase. Several other enzymes also
involved (see below)
• Replication is semiconservative:
– DNA helix is opened up and unwound by a helicase
– Each old strand gets a new strand built on it.
– DNA polymerase can only add bases to the 3’ –OH group on a pre-existing
nucleic acid that is base-paired with the template strand it is
copying. This means that DNA synthesis starts with the enzyme primase
synthesizing a short RNA primer. DNA polymerase then adds bases to
this primer.
• DNA polymerase can only add new bases to 3' end, so one strand is
synthesized continuously (leading strand) and the other is built up of
short fragments: discontinuous synthesis on the lagging strand.
– The short (100-1000 bp) DNA fragments, called Okazaki fragments, are
built in the opposite direction of fork movement and then ligated together
(by DNA ligase).
• In eukaryotes, the whole process starts at several points on each
chromosome and goes in both directions. Takes about 8 hr to complete.
• In bacteria (which have circular chromosomes), there is a single
origin of replication, with replication proceeding in both directions and
meeting at the opposite side of the circular chromosome.
Transcription
• Transcription is making an RNA copy of a short region of DNA.
• Only part of the DNA is transcribed. A transcribed region is called a
transcription unit, which is approximately equivalent to “gene”.
– most transcription units code for proteins
– However, some code for functional RNAs that never get translated into
proteins (RNA genes).
• When transcription starts, the DNA double helix is unwound and
only one strand is used as a template for the RNA.
– the template DNA strand is called the antisense strand, and the other
DNA strand, not used in transcription is called the sense strand. This is
because the sense strand has the same base sequence as the RNA
transcript.
– Genes are oriented from 5' to 3' based on transcription direction (even
though the template DNA is read 3' to 5'). Thus, 5' end of a gene is
where transcription starts. “Upstream” and “downstream” also relate to
this direction.
• In the scientific literature, only the sense strand is written, with the 5’
end on the left.
– The antisense strand is implied.
– Sequences are written as DNA (using T) and not RNA (using U).
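The sense/antisense convention can be checked with a short sketch. It assumes sequences are plain strings written 5' to 3'; the function names are illustrative, and the toy sequence is not a real gene.

```python
DNA_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}
RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}  # template base -> RNA base


def template_strand(sense):
    """Antisense (template) strand of a sense strand, written 5' to 3'."""
    return "".join(DNA_PAIR[base] for base in reversed(sense.upper()))


def transcribe_from_template(template):
    """Build the RNA by pairing against the template, read 3' to 5'.

    Reversing the 5'-to-3' template string reads it 3' to 5', so the
    RNA comes out written 5' to 3'.
    """
    return "".join(RNA_PAIR[base] for base in reversed(template.upper()))


def transcribe(sense):
    """Shortcut: the RNA has the sense strand's sequence with U for T."""
    return sense.upper().replace("T", "U")


if __name__ == "__main__":
    sense = "ATGGCC"  # toy sequence
    print(template_strand(sense))
    print(transcribe_from_template(template_strand(sense)), transcribe(sense))
```

Both routes give the same RNA, which is why only the sense strand needs to be written down.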
Transcription Process
• The primary enzyme used for
transcription is RNA polymerase
• RNA polymerase binds to a promoter
sequence just upstream from the
transcription start point, with the help
of several proteins called
transcription factors.
– some transcription factors are used
for all transcriptions, but others are
very specific for cell type, hormonal
stimulus, developmental time, etc.
• RNA polymerase then moves toward
the 3’ end of the gene, adding new RNA
nucleotides to the growing RNA
molecule.
– New bases are always added to the
3’ end of the growing RNA molecule
• In prokaryotes, transcription ends at
a specific terminator sequence
• In eukaryotes, there is no definite
transcription terminator, but the RNA
molecules are cut off at a poly-A
addition site (part of RNA processing)
Gene Regulation
• What makes cells within an organism different from each other is which
genes are being expressed and which are not: gene regulation.
• Most of the control of gene expression occurs at the point of transcription.
• Transcription regulation is based on interactions between transcription
factors (proteins) and DNA sequences near the gene.
– transcription factors are trans-acting: they diffuse freely through the cell and
affect any DNA sequence they can bind to.
– in contrast, DNA sequences near the gene are cis-acting: they can only affect
transcription of the gene they are next to (and not, for example, the same gene
on the other homologous chromosome).
• Types of cis-acting sequence:
– promoters: several short regions within 100 bp of transcription start, especially
the TATA box, which are all similar to TATAAA.
– enhancers: can be up to several kilobases from the gene, either upstream or
downstream, and in either orientation. Increase transcription level.
– silencers: similar to enhancers, but opposite effect.
• Regulatory sequences are short consensus sequences: imperfect variants
on a common sequence
• Genes are also affected by the region of chromosome they are in: some
areas are highly condensed and unable to be transcribed (depending on cell
type).
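Because regulatory sequences are imperfect variants of a consensus, searching for them means tolerating mismatches. The sketch below scans a sequence for windows close to TATAAA; the function name and the mismatch limit are assumptions for illustration, and real promoter finding uses more refined models.

```python
def find_consensus_matches(seq, consensus="TATAAA", max_mismatches=1):
    """Return (position, window, mismatches) for near-matches to consensus.

    Positions are 0-based offsets into seq. max_mismatches is an
    arbitrary illustrative cutoff, not a biological constant.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - len(consensus) + 1):
        window = seq[i:i + len(consensus)]
        mismatches = sum(1 for a, b in zip(window, consensus) if a != b)
        if mismatches <= max_mismatches:
            hits.append((i, window, mismatches))
    return hits


if __name__ == "__main__":
    # Toy sequences, not real promoters.
    print(find_consensus_matches("GGTATAAAGG"))
    print(find_consensus_matches("CCTATATACC"))
```

Raising max_mismatches finds more variants but also more chance matches, which is the usual trade-off with short consensus sequences.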
RNA Processing
• In prokaryotes, transcription and translation are essentially simultaneous:
translation of the messenger RNA starts before transcription is completed.
• In eukaryotes, transcription occurs in the nucleus (where the DNA is), and
translation occurs in the cytoplasm. This de-coupling of transcription and
translation requires several steps specific to eukaryotes: RNA processing.
• The initial RNA molecule produced by transcription is called a primary
transcript. It is an exact copy of the DNA. Before it can be translated into
protein, it must be processed, then transported to the cytoplasm. RNA
processing has 3 steps:
1. Splicing out of introns, which are non-protein-coding regions in the middle of
protein-coding genes. Most eukaryotic genes are interrupted by introns: up to 99% of the
gene in some cases. Exons are the regions of genes that code for protein. The primary
transcript contains introns, but spliceosomes (RNA/protein hybrids) splice out the
introns. There are signals on the RNA for this, but splicing can vary between tissues
(alternative splicing).
2. 5' cap: a 7-methylguanosine linked 5’ to 5’ with the first nucleotide of the RNA.
3. 3' poly A tail: several hundred adenosines added to 3’ end. The signal for poly A
marks end of gene, but transcription continues past this without having a definite end
point. All mRNAs except those of histone genes have poly A. Stability of mRNA is the
probable reason for it.
• After processing, the RNA is called messenger RNA, and it gets transported to
the cytoplasm.
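Splicing and poly A addition can be sketched as string operations. This assumes the exon positions are already known and given as 0-based (start, end) pairs; the function names, the coordinates, and the toy transcript are hypothetical, and the 5' cap is left out because it is a modified nucleotide that a plain sequence string does not represent.

```python
def splice(primary_transcript, exons):
    """Join the exons, dropping everything between them (the introns).

    exons: list of (start, end) pairs, 0-based, end-exclusive, in
    5'-to-3' order. These coordinates are assumed inputs.
    """
    return "".join(primary_transcript[start:end] for start, end in exons)


def add_poly_a(rna, tail_length=200):
    """Append a poly A tail; tail_length is an illustrative value."""
    return rna + "A" * tail_length


if __name__ == "__main__":
    # Toy transcript: two exons (upper case) around one intron (lower case).
    primary = "AUGGCC" + "guaaguuuuag" + "GAAUAA"
    exons = [(0, 6), (17, 23)]
    print(add_poly_a(splice(primary, exons), tail_length=10))
```

Passing a different exon list to splice for the same primary transcript is a simple way to picture alternative splicing.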
Intron Splicing and RNA Processing
Overview
Translation
• After transcription, the messenger RNA molecules are translated into
polypeptides. That is, the base sequence of the mRNA is used as a code to
construct an entirely different molecule, the polypeptide.
• The polypeptide is synthesized from N-terminus to C-terminus, based on
free -NH2 and -COOH groups on terminal amino acids of the polypeptide.
The polypeptide is collinear with the mRNA: the N-terminus of the
polypeptide corresponds to the 5’ end (beginning) of the mRNA, because
the ribosome moves down the messenger RNA from 5’ end to 3’ end.
• Translation is performed by the ribosome, a protein/RNA hybrid structure.
• Each group of 3 RNA bases is a codon. Each codon codes for a specific
amino acid.
• The ribosome starts translation at a start codon
– There are untranslated regions (UTRs) at both ends of the mRNA.
– Start codons are also used internally in the polypeptides, where AUG simply codes for methionine.
– In eukaryotes, translation starts at first AUG in the messenger RNA, goes to first stop codon.
(So, only one polypeptide per messenger RNA.)
– In bacteria (but not archaea, which are like eukaryotes in this), AUG, GUG, and UUG can all
be used as start codons.
• The ribosome then moves down the mRNA, adding one new amino acid for
each codon.
• Translation stops when the ribosome reaches a stop codon.
• Most mRNA molecules are translated multiple times.
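The eukaryotic rule (start at the first AUG, stop at the first in-frame stop codon) can be sketched as follows. The codon table is the standard genetic code written in the usual U, C, A, G order with one-letter amino acid codes; the function name is illustrative, and bacterial start codons such as GUG and UUG are not handled.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order; "*" = stop.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(mrna):
    """Translate from the first AUG to the first in-frame stop codon.

    Returns the polypeptide as one-letter codes, N-terminus first, or
    an empty string if the mRNA has no AUG.
    """
    mrna = mrna.upper().replace("T", "U")
    start = mrna.find("AUG")
    if start == -1:
        return ""
    polypeptide = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon: release the polypeptide
            break
        polypeptide.append(amino_acid)
    return "".join(polypeptide)


if __name__ == "__main__":
    # Toy mRNA: 5' UTR "GG", then AUG GCC UAA, then 3' UTR "UU".
    print(translate("GGAUGGCCUAAUU"))
```

The bases before the AUG and after the stop codon are never read, which corresponds to the untranslated regions at both ends of the mRNA.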
More on Translation
• A key actor in translation is
transfer RNA: short RNA
molecules that act as
adapters between codons on
the mRNA and the amino
acids.
• The ribosome holds the
growing polypeptide chain
attached to a transfer RNA,
and it also holds a transfer
RNA carrying the next amino
acid.
• At each step in the synthesis
process, the ribosome
catalyzes the transfer of the
growing polypeptide to the
next amino acid
Genetic Code
• Three bases of DNA or RNA code
for 1 amino acid = codon.
• Since there are 4 bases, there are
4^3 = 64 codons. 61 of these code
for amino acids, while the last 3
are stop codons that end the
translation process.
• Most amino acids have more than
1 possible codon: code is
degenerate. Most variation is in
third position of codon.
• Nearly all organisms use the same
code, with minor variations mostly
in mitochondria and chloroplasts.
– mitochondria often use a slightly
altered genetic code
• All translations start with
methionine (N-formyl methionine
in bacteria), regardless of which
start codon is used (only AUG in
eukaryotes).
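The codon counts and the degeneracy of the code can be verified by building the table. This sketch uses the standard genetic code in the usual U, C, A, G ordering with one-letter amino acid codes; the variable names are illustrative.

```python
from collections import defaultdict
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order; "*" = stop.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")

# All 4 x 4 x 4 three-base combinations, paired with their meanings.
codon_table = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

stop_codons = sorted(c for c, aa in codon_table.items() if aa == "*")

# Group the remaining codons by the amino acid they code for.
codons_by_amino_acid = defaultdict(list)
for codon, aa in codon_table.items():
    if aa != "*":
        codons_by_amino_acid[aa].append(codon)

if __name__ == "__main__":
    print(len(codon_table), "codons,", len(stop_codons), "stops")
    for aa in sorted(codons_by_amino_acid):
        print(aa, codons_by_amino_acid[aa])
```

Printing the groups shows that codons for the same amino acid mostly differ only in their third base.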
Reading Frames
• Codons are groups of 3 bases. Since
translation can start at any nucleotide,
the same region of DNA can be read in
3 ways, starting one base apart. Each
of these 3 modes is a reading frame.
– The DNA might also be read on the
opposite strand, giving a total of 6
possible reading frames.
• Genes occur in open reading frames
(ORFs), areas where there are no stop
codons. Genes end at the first stop
codon that exists in their reading frame.
• 3 out of every 64 codons are stop
codons, so large open reading frames are
rare in random, unselected DNA. Since
genes are under selection pressure,
most long open reading frames contain
genes.
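Scanning all 6 reading frames for stop-free stretches is a common first step in gene finding. The sketch below is a simplified illustration with hypothetical function names: it reports any stretch without a stop codon and does not require a start codon. The probability helper assumes all 64 codons are equally likely, which real DNA only approximates.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Opposite strand, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def open_stretches(seq, min_codons):
    """Find stop-free stretches of at least min_codons in all 6 frames.

    Returns (strand, frame, start, codons) tuples. For the "-" strand,
    start is an offset into the reverse-complemented sequence.
    """
    seq = seq.upper()
    results = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            run_start, run_length = frame, 0
            for i in range(frame, len(s) - 2, 3):
                if s[i:i + 3] in STOP_CODONS:
                    if run_length >= min_codons:
                        results.append((strand, frame, run_start, run_length))
                    run_start, run_length = i + 3, 0
                else:
                    run_length += 1
            if run_length >= min_codons:  # stretch running off the end
                results.append((strand, frame, run_start, run_length))
    return results


def chance_no_stop(n_codons):
    """Chance that n random codons contain no stop (61 of 64 are not stops)."""
    return (61 / 64) ** n_codons


if __name__ == "__main__":
    print(open_stretches("ATGAAACCCTAG", min_codons=3))  # toy sequence
    print(chance_no_stop(100))
```

The chance of a stop-free stretch shrinks geometrically with length, which is why long open reading frames in real genomes are good gene candidates.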
Protein folding
• After they have been synthesized, most
proteins fold spontaneously to the most
stable (lowest energy) configuration.
• Some proteins are assisted by
chaperone proteins, which also assist in
recovery from heat shock by causing refolding to proper configuration.
– Thus, chaperone proteins are also often
called heat shock proteins. (Actually,
these proteins were first discovered in
Drosophila as proteins synthesized in
large amounts when the flies were given
a heat shock.)
• However, predicting protein structure
from the amino acid sequence is (so far)
an unsolved and very difficult problem in
biochemistry.
Post-translational modification
• Various chemical modifications occur on many proteins:
– Glycosylation: adding sugars. Begins in the rough ER and continues
in the Golgi. Mostly for
proteins that are secreted or on outside of plasma membrane or
inside of lysosomes. Large blocks of sugars added. Proteins
called glycoproteins.
– Phosphorylation: adding phosphates. An important way to activate
various enzymes, especially for turning genes on and off. On
serine, threonine, or tyrosine.
– Adding lipids: so proteins get anchored to membrane. Various
names depending on which lipid is added. For example,
myristoylation, prenylation, palmitoylation, etc. Proteins called
lipoproteins.
– Others as well.
• Cleavage. Often the N-terminal Met is removed. Other regions can
also be removed: middle region of insulin, removal of signal
peptides.
Localization
• How do proteins get to the proper
location in the cell?
• Polypeptides often contain signal
sequences that cause protein to
end up in proper organelle, or be
secreted, or become embedded
in the membrane. Often a leader
sequence (or signal sequence) at
N terminus that is then removed.
• Best known is for secretion into
ER, into membrane, and
extracellular: About 20 mostly
hydrophobic amino acids at the
N-terminus of the polypeptide. A
Signal Recognition Particle
(RNA/protein hybrid) recognizes
this during translation and guides
ribosomes to the rough ER where
translation finishes.
• Also signals for nucleus,
lysosome, mitochondria. Some
are internal to protein and not
removed.
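The "about 20 mostly hydrophobic amino acids at the N-terminus" description suggests a crude screen. This is only a sketch: the set of residues counted as hydrophobic, the function names, and the 0.6 cutoff are all assumptions made for illustration, and real signal sequence prediction relies on dedicated tools.

```python
# Assumed set of hydrophobic residues (one-letter codes); other
# classifications draw the line slightly differently.
HYDROPHOBIC = set("AVLIMFWC")


def n_terminal_hydrophobic_fraction(protein, window=20):
    """Fraction of hydrophobic residues in the first `window` amino acids."""
    head = protein.upper()[:window]
    if not head:
        return 0.0
    return sum(1 for aa in head if aa in HYDROPHOBIC) / len(head)


def looks_like_signal_sequence(protein, window=20, threshold=0.6):
    """Crude screen; threshold is an arbitrary illustrative cutoff."""
    return n_terminal_hydrophobic_fraction(protein, window) >= threshold


if __name__ == "__main__":
    # Made-up sequences, not real proteins.
    toy_secreted = "M" + "L" * 19 + "DEKRDEKR"
    toy_cytoplasmic = "M" + "DEKR" * 5
    print(looks_like_signal_sequence(toy_secreted))
    print(looks_like_signal_sequence(toy_cytoplasmic))
```

A screen like this ignores signals that are internal to the protein, so it would miss many of the nuclear, lysosomal and mitochondrial cases mentioned above.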
A Few Odds and Ends
Operons
• In eukaryotes, each messenger RNA contains a single gene. Genes
are scattered randomly throughout the genome, with no grouping of
related genes.
– “monocistronic” = having only 1 gene on a mRNA.
• In prokaryotes, genes that make different parts of the same structure
or metabolic pathway are often grouped together and transcribed as
a single unit. Several different proteins are independently translated
from the same mRNA molecule. This group of genes is called an
operon.
– “polycistronic” = having several genes co-transcribed onto the same
mRNA.
Exceptions in Prokaryotes
• In addition to the 20 regular amino acids, two other
amino acids coded in the DNA have been found:
selenocysteine and pyrrolysine. Both of these use a
stop codon (UGA for selenocysteine, UAG for pyrrolysine),
with other bases around it used to
signal that it is to be interpreted as an amino acid and
not a stop.
• Bacteria have been seen (rarely) to use several other
start codons, including CTG, ATA, ATC, and ATT.
– Regardless of which start codon is used, all bacteria (NOT
Archaea) use N-formyl methionine as the first amino acid in the
polypeptide.
• “RNA editing” is a process by which certain messenger
RNAs are altered by adding, deleting, or altering certain
bases. It seems rare and (so far) confined to eukaryotes
(including mitochondria and chloroplasts).
Reverse Transcription
• A few exceptions to the Central Dogma exist.
• Most importantly, some RNA viruses, called “retroviruses” make a
DNA copy of themselves using the enzyme reverse transcriptase.
The DNA copy incorporates into one of the chromosomes and
becomes a permanent feature of the genome. The DNA copy
inserted into the genome is called a “provirus”. This represents a
flow of information from RNA to DNA.
– Closely related to retroviruses are “retrotransposons”, sequences of
DNA that make RNA copies of themselves, which then get reverse-transcribed
into DNA that inserts into new locations in the genome.
Unlike retroviruses, retrotransposons always remain within the cell.
They lack genes to make the protein coat that surrounds viruses.
• Some viruses use RNA for their genome, and directly copy it into
more RNA without any DNA intermediate. The enzyme involved is
called a “replicase” or “RNA dependent RNA polymerase”.