Transcript Document

Recap





DNA
RNA
4 bases
base pairing/double helix
Central Dogma of Molecular Biology
Questions for you

Which one is a longer sequence:
DNA or RNA?

What does RNA do exactly?

What is the difference between:
transcription and translation?
Questions for you

What are the four bases of DNA?

What are the four bases of RNA?

Could you draw a detailed picture of the
double helix?
DNA
[Figure: DNA double helix. Each strand is a chain of alternating sugar and acid units, and the two strands are joined by the base pairs A-T, T-A, G-C and A-T.]
DNA
Double Helix
More questions for you

Which bases go together?
– TA
– CG
Just remember T & A
What does T & A stand for anyway?
– Thymine
– Adenine
Recap
Structure and Function of Genes

Genetic information is stored in DNA, and the
expression of this information requires several steps
that flow in one direction: DNA → RNA → protein.
Genes

Genes are segments of DNA encoding
information that ultimately directs the
production of RNA molecules that serve a
variety of functions, including:
1. dictating the synthesis of proteins that
perform a wide variety of functions in the
body,
2. regulating (turning on or turning off) the
expression of other genes,
3. forming structures in the cell (ribosomes)
that are critical for the manufacture of
proteins, and
4. transporting amino acids (the building
blocks of proteins) to the ribosomes for the
creation of proteins.
Human Genome Project



has confirmed that human DNA contains a little over
3 billion bases
99% of them are the same in all people
In February 2001, the first major goal of the Human
Genome Project, a detailed working draft of the
sequence of human DNA, was published
simultaneously in the journals
1. Nature (Lander ES et al: Nature 409:860-921, 2001) and
2. Science (Venter JC et al: Science 291:1304-1351, 2001).
Features of DNA



it offers a means of storing and coding vast amounts
of information captured by the sequence of bases
present in the DNA strand;
humans have about 3,000,000,000 bases in their genome
(the complete set of genetic information);
the complementary structure allows for the faithful
replication of DNA as cells divide, with one strand
serving as a template for the synthesis of the other;
Features of DNA




a mechanism for preventing loss of information is
built into the structure
a base that is lost or altered on one strand can be
replaced using the complementary strand to direct its
repair; and
the complementarity of DNA allows strands to find
each other in a complex mixture of molecules;
this is termed "reannealing" or "hybridization".
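As a rough illustration of complementarity and repair, here is a minimal Python sketch. The names `PAIR`, `complement` and `repair` are made up for this example, a lost base is marked with `?`, and strand direction (5' to 3') is ignored to keep things simple.

```python
# Base pairing from the slides: A pairs with T, C pairs with G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(strand: str) -> str:
    """Return the complementary strand, position by position."""
    return "".join(PAIR[base] for base in strand)

def repair(damaged: str, partner: str) -> str:
    """Replace bases marked '?' using the complementary partner strand."""
    return "".join(
        PAIR[p] if d == "?" else d
        for d, p in zip(damaged, partner)
    )

original = "ATGCA"
partner = complement(original)        # the other strand of the helix
restored = repair("AT?CA", partner)   # the third base was lost

print(partner)    # complementary strand
print(restored)   # original strand recovered from its partner
```

Because each base has exactly one partner, either strand carries enough information to rebuild the other.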
Transcription



entails the synthesis of a single-stranded
polynucleotide of RNA at an unwound section of
DNA with one of the DNA strands serving as a
template for the synthesis of the RNA.
The product of this process is called an RNA
transcript, or messenger RNA (mRNA).
The result of transcription is that the genetic
information encoded in DNA is transferred to RNA;
this occurs in the nucleus of the cell.
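Transcription can be sketched in a few lines of Python. This is only a toy model: `transcribe` is a hypothetical helper, the template strand is an invented example, and reading direction is ignored. The pairing rule is the same as for DNA, except that RNA uses U where DNA would use T.

```python
def transcribe(template: str) -> str:
    """Build the RNA that is complementary to a DNA template strand."""
    pairing = {"A": "U", "T": "A", "C": "G", "G": "C"}
    return "".join(pairing[base] for base in template)

template = "TACCGGATT"        # made-up DNA template strand
mrna = transcribe(template)
print(mrna)
```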
Translation



follows the movement of mRNA to the
cytoplasm where it interacts with structures
called ribosomes to synthesize a protein.
Proteins are a linear sequence of amino
acids, each of which is specified by the
sequence of nucleotides in the RNA
molecule
(which, in turn, was specified by the DNA
where it was synthesized).
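A minimal sketch of translation in Python, assuming the standard genetic code. The mRNA is read three nucleotides at a time, and each group of three selects one amino acid (one-letter codes, `*` for stop). The names `CODON_TABLE` and `translate` are illustrative, and the sketch simply starts at the first nucleotide instead of searching for a start signal.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code written in UCAG order for each position; '*' means stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO)
}

def translate(mrna: str) -> str:
    """Read the mRNA three bases at a time until a stop signal is reached."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

protein = translate("AUGGCCUAA")
print(protein)
```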
Protein Encoding




Genetic information is encoded in sequences of
three nucleotides termed codons.
The four nucleotides of RNA are adenine(A),
guanine (G), cytosine(C), and uracil (U), which
replaces thymine (T) in the DNA template.
These four nucleotides can be arranged in various
combinations to form 64 codons,
each containing three letters (4 × 4 × 4 = 64).
Protein Encoding


Since there are 20 amino acids that nature
draws on to create proteins,
there are more than enough codons in the
genetic code to specify the 20 amino acids
used in proteins.
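The counting argument can be checked with a short Python sketch (the helper name `words` is made up here). It also shows why codons need three letters: words of one or two nucleotides give too few combinations for 20 amino acids.

```python
from itertools import product

RNA_BASES = "ACGU"
AMINO_ACIDS = 20

def words(length: int) -> list[str]:
    """All nucleotide words of the given length."""
    return ["".join(p) for p in product(RNA_BASES, repeat=length)]

for length in (1, 2, 3):
    count = len(words(length))
    verdict = "enough" if count >= AMINO_ACIDS else "too few"
    print(f"{length}-letter words: {count} ({verdict} for {AMINO_ACIDS} amino acids)")

codons = words(3)
```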
Gene Structure





The number of genes in the human genome
is estimated to be about 35,000 to 40,000,
dispersed throughout the set of
chromosomes.
This is considerably fewer than once thought
(80,000-100,000).
But I think they're really not sure yet.
Gene Structure



Although the average gene is about 3,000
bases long,
the smallest genes may be just a few
hundred base pairs;
the largest is over two million base pairs in
length.
Famous people named Gene





Gene Simmons
Gene Kelly
Gene Roddenberry
Gene Hackman
Gena Lee Nolin
– close enough, right?
Gene “The Hunk”
Gene Structure




Human genes, like most genes in
multicellular organisms (eukaryotes), contain
introns—stretches of DNA located within the
gene that are transcribed into RNA
and then spliced out before the RNA is
translated into protein (see diagram).
These stretches of DNA have no discernible
coding functions.
Gene Structure



However, it also appears that splicing may
occur at various alternative points along the
RNA transcript,
allowing for differing proteins to be
constructed from what might otherwise
appear to be a single "gene."
Cool, right?
Gene Structure


Once mRNA is transcribed from a gene, it goes
through several processing steps in the nucleus
before being translated in the cytoplasm.
This "processing" involves:
1. the addition of a modified guanine molecule to the 5' end
(called capping),
2. the addition of a "tail" composed of a series of adenine
bases (called a poly-A tail),
3. excision of the introns, and
4. splicing of the exons back together.
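The four processing steps above can be sketched as a toy Python function. Everything here is illustrative: `process_transcript`, the example sequence and the exon positions are invented, the cap is shown as the text marker `[cap]`, and the order of operations in the code says nothing about timing in a real cell.

```python
def process_transcript(pre_mrna: str,
                       exons: list[tuple[int, int]],
                       tail_length: int = 10) -> str:
    """Turn a raw transcript into a mature mRNA string.

    `exons` holds (start, end) slice positions; anything between
    them is treated as an intron and dropped.
    """
    # Steps 3 and 4: cut out the introns and join the exons.
    spliced = "".join(pre_mrna[start:end] for start, end in exons)
    # Steps 1 and 2: add the 5' cap marker and the poly-A tail.
    return "[cap]" + spliced + "A" * tail_length

# Made-up transcript: exon, intron, exon.
pre_mrna = "AUGGCC" + "GUAAGUUUCAG" + "UUCUAA"
exons = [(0, 6), (17, 23)]
mature = process_transcript(pre_mrna, exons, tail_length=5)
print(mature)
```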
Let's take a break and
look at some good genes

Gena Lee

Gene “The Hunk”
Protein Sequences




Proteins are macromolecules (heteropolymers)
made up from 20 different amino acids, also referred
to as residues.
A certain number of residues is necessary to perform
a particular biochemical function
around 40-50 residues appears to be the lower limit
for a functional domain size.
Protein sizes range from this lower limit to several
hundred residues in multi-functional proteins.
Protein Sequences



Amino acids
The basic structure of
an α-amino acid is quite
simple.
R denotes any one of
the 20 possible side
chains (see table).
Name           3-letter  1-letter  Relative abundance  MW   VdW volume  C/P/H  pK
               code      code      (%) E.C.                 (Å³)
Alanine        ALA       A         13.0                71   67          H
Arginine       ARG       R         5.3                 157  148         C+     12.5
Asparagine     ASN       N         9.9                 114  96          P
Aspartate      ASP       D         9.9                 114  91          C-     3.9
Cysteine       CYS       C         1.8                 103  86          P
Glutamate      GLU       E         10.8                128  109         C-     4.3
Glutamine      GLN       Q         10.8                128  114         P
Glycine        GLY       G         7.8                 57   48          -
Histidine      HIS       H         0.7                 137  118         P,C+   6.0
Isoleucine     ILE       I         4.4                 113  124         H
Leucine        LEU       L         7.8                 113  124         H
Lysine         LYS       K         7.0                 129  135         C+     10.5
Methionine     MET       M         3.8                 131  124         H
Phenylalanine  PHE       F         3.3                 147  135         H
Proline        PRO       P         4.6                 97   90          H
Serine         SER       S         6.0                 87   73          P
Threonine      THR       T         4.6                 101  93          P
Tryptophan     TRP       W         1.0                 186  163         P
Tyrosine       TYR       Y         2.2                 163  141         P      10.1
Valine         VAL       V         6.0                 99   105         H
C/P/H: Charged, Polar, Hydrophobic
Protein Sequences




The polypeptide chain
Two amino acids are combined in a
condensation reaction.
The sequence of the different amino acids is
considered the primary structure of the
peptide or protein.
Counting of residues always starts at the N-terminal end (NH2 group).
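Since each condensation step releases one water molecule, the mass of a chain can be estimated from residue masses. The Python sketch below assumes that the MW column of the amino acid table lists residue masses (amino acid minus water), which fits glycine at 57 versus about 75 for free glycine. The names `RESIDUE_MW` and `peptide_mass` are made up, only a few residues are included, and water is rounded to 18.

```python
# A few residue masses (Da) copied from the MW column of the table.
RESIDUE_MW = {"G": 57, "A": 71, "S": 87, "V": 99, "L": 113}
WATER = 18  # approximate mass of one H2O

def peptide_mass(sequence: str) -> int:
    """Approximate mass of a peptide written N-terminus first.

    The residues add up, and one water is added back for the free
    NH2 and COOH groups at the two ends of the chain.
    """
    return sum(RESIDUE_MW[aa] for aa in sequence) + WATER

mass = peptide_mass("GAS")   # residue 1 is G, at the N-terminal end
print(mass)
```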
Protein Sequences
[Figure: polypeptide chain, with labels marking the first residue at the start of the protein (NH2 group) and the second residue.]
Protein Sequences





Primary structure
It’s the sequence of residues
GLARENLQKNEDMFNPGICH
Sometimes real proteins spell a lot of funny
things
Like
– GIMP
– WHY
Protein Sequences



Bond angles
In contrast to the rather
rigid peptide bond
angle ω (always close
to 180 deg)
the bond angles phi (φ)
and psi (ψ) can have a
certain range of
possible values
Secondary structure elements





The polypeptide chain of a protein seldom forms just
a random coil.
Proteins have either a chemical (enzymes) or
structural function to fulfill.
High specificity requires an intricate arrangement of
3-dimensional interactions
and therefore a defined conformation of the polypeptide
chain.
In fact, some neurodegenerative diseases like
Huntington's may be related to random coil formation
in certain proteins.
Secondary structure elements

The two most common secondary structure
arrangements are
1. the right-handed α-helix and
2. the β-sheet,
which can be connected into a larger tertiary
structure (or fold)
– by turns and loops of a variety of types.
The β-sheets can be formed by parallel or, most
commonly, antiparallel arrangement of individual β-strands.
Secondary
structure

shows the hydrogen
bonding in an actual α-helix backbone
Secondary
structure



Electron density is used
to show an even nicer
picture.
It's hard to see
But this is really an
alpha helix
Secondary
structure




This is a β-sheet
The chain turns then
attaches back to itself
The chain can do this over
and over again
Forming a woven sheet
Homework & Upcoming Stuff




Read 22-44.
Project #1 will be given out next Tuesday.
I’ll explain it in a second
Pop-quiz is coming up.
Project #1







I’m collecting a list of the top 15 research papers in
bioinformatics.
Pick a paper
Read it
Talk with me about it
Try to understand it
Summarize it in a 2-page paper with at least one
diagram and one chart/graph/table.
Prepare a 20-minute PowerPoint or visual
demonstration about the paper.
Project #1





Presentation 50% (graded 0-100)
Paper 50% (graded 0-100)
Presentations will start two weeks from next
Tuesday.
First person to go gets +6
Second person to go gets +3
Project #1






The hard part:
Not only do I want a summary but I want to know
The immediate impact: What exactly was made
possible through this work?
Broader impact: Indirectly, how is the world a better
place because of this work? Did it lead to curing of a
disease, etc.
This will take research beyond just reading the
paper.
In fact, you might even have to read other cited
papers.