Molecular Genetics
Genetics is the study of heredity (from the Greek genesis = birth or origin). The big question to be answered is: why do organisms look almost, but not exactly, like their parents? There are three branches of modern genetics:
Molecular Genetics (or Molecular Biology), which is the study of heredity at the molecular level, and so is mainly concerned with the molecule DNA. It also includes genetic engineering and cloning, and is very trendy. This module is mostly about molecular genetics.
Classical or Mendelian Genetics, which is the study of heredity at the whole-organism level by looking at how characteristics are inherited. This method was pioneered by Gregor Mendel (1822-1884). It is less fashionable today than molecular genetics, but still has a lot to tell us. This is covered in Module 5.
Population Genetics, which is the study of genetic differences within and between species, including how species evolve by natural selection. Some of this is also covered in Module 5.
DNA
and its close relative RNA are perhaps the most important molecules in biology.
They contain the instructions that make every single living organism on the
planet, and yet it is only in the past 50 years that we have begun to understand
them. DNA stands for deoxyribonucleic acid and RNA for ribonucleic
acid, and they are called nucleic acids because they are weak acids,
first found in the nuclei of cells. They are polymers, composed of monomers
called nucleotides.
Nucleotides have three parts to them:
a phosphate group (derived from phosphoric acid)
a pentose (5-carbon) sugar, which is deoxyribose in DNA. By convention the carbon atoms of the sugar are numbered 1' to 5' to distinguish them from the carbon atoms in the base. If carbon 2' has a hydroxyl group (OH) attached then the sugar is ribose, found in RNA.
a nitrogenous base. There are five different organic bases, and they all contain the elements carbon, hydrogen and nitrogen (all except adenine also contain oxygen). They fall into two groups: purines (two rings of carbon and nitrogen atoms) and pyrimidines (a single ring of carbon and nitrogen atoms). The base thymine is found in DNA only and the base uracil is found in RNA only, so there are only four different bases present at a time in one nucleic acid molecule.
Bases: Adenine (A), Cytosine (C), Guanine (G), Thymine (T), Uracil (U)
Nucleotides can join together by a condensation
reaction (results in the removal of water) between the phosphate group of
one nucleotide and the hydroxyl group on carbon 3 of the sugar of the other
nucleotide. The bonds linking the
nucleotides together are strong, covalent phosphodiester bonds.
The
bases do not take part in the polymerisation, so there is a sugar-phosphate
backbone with the bases extending off it. This means that the nucleotides
can join together in any order along the chain. Many nucleotides form a polynucleotide.
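Each polynucleotide chain has two distinct ends: a 5' end, with a free phosphate group on carbon 5 of the sugar, and a 3' end, with a free hydroxyl group on carbon 3.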
The three-dimensional structure of DNA was discovered in 1953 by Watson and Crick. The main features of the structure are:
DNA is double-stranded, so there are two polynucleotide strands alongside each other. The strands are antiparallel, i.e. they run in opposite directions (5'→3' and 3'→5').
The
two strands are wound round each other to form a double helix.
The
two strands are joined together by hydrogen bonds between the bases.
The bases therefore form base pairs, which are like rungs of a
ladder.
The
base pairs are specific. A only binds to T (and T with A), and C only binds
to G (and G with C). These are called complementary base pairs. This
means that whatever the sequence of bases along one strand, the sequence of
bases on the other strand must be complementary to it. (Incidentally, complementary,
which means matching, is different from complimentary, which means
being nice.)
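The base-pairing rules above can be sketched in a few lines of Python. This is only an illustration: the names `PAIRS` and `complementary_strand` are made up for this example. Because the two strands are antiparallel, the partner strand is reversed so that it is also written 5'→3'.

```python
# Complementary base pairs in DNA: A pairs with T, C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complementary_strand(strand):
    """Return the partner of a DNA strand, with both strands written 5'->3'."""
    # Read the input backwards (the strands are antiparallel) and pair each base.
    return "".join(PAIRS[base] for base in reversed(strand))

print(complementary_strand("ATGCCG"))  # CGGCAT
```

Applying the function twice returns the original sequence, which is another way of saying that each strand fully determines the other.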
DNA
is the genetic material, and genes are made of DNA. DNA therefore has two
essential functions: replication and expression.
· Replication means that the DNA, with all its genes, must be copied every time a cell divides.
· Expression means that the genes on DNA must control characteristics. A gene was traditionally defined as a factor that controls a particular characteristic (such as flower colour), but a much more precise definition is that a gene is a section of DNA that codes for a particular protein. Characteristics are controlled by genes through the proteins they code for: gene (DNA) → mRNA → protein → characteristic.
Expression
can be split into two parts: transcription (making RNA) and translation
(making proteins).
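Early estimates of the number of genes we humans have to control all our characteristics were 60,000-80,000, but sequencing of the human genome has since put the figure at about 20,000 protein-coding genes. The complete set of DNA in an organism, including all its genes, is called the genome.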
The
table shows the estimated number of genes in different organisms:
| Species | Common name | Length of DNA (kbp)* | No. of genes |
|---|---|---|---|
| phage | virus | 48 | 60 |
| Escherichia coli | Bacterium | 4 639 | 7 000 |
| Saccharomyces cerevisiae | Yeast | 13 500 | 6 000 |
| Drosophila melanogaster | fruit fly | 165 000 | ~10 000 |
| Homo sapiens | Human | 3 150 000 | ~70 000 |

*kbp = kilo base pairs, i.e. thousands of nucleotide pairs.
Amazingly, genes only seem to comprise about 2% of the DNA in a cell. The majority of the DNA does not code for proteins. Some of it helps to control when genes are switched on, but the purpose of much of this so-called junk DNA remains a mystery!
RNA is a nucleic acid like DNA, but with four differences:
RNA has the sugar ribose instead of deoxyribose
RNA has the base uracil instead of thymine
RNA is usually single stranded
RNA is usually shorter than DNA
mRNA carries the "message" that codes for a particular protein from the nucleus (where the DNA master copy is) to the cytoplasm (where proteins are synthesised). It is single stranded and just long enough to contain one gene only. It has a short lifetime and is degraded soon after it is used.
rRNA, together with proteins, forms ribosomes, which are the site of mRNA translation and protein synthesis. Ribosomes have two subunits, small and large, and are assembled in the nucleolus of the nucleus and exported into the cytoplasm.
tRNA is an “adapter” that matches amino acids to their codons. tRNA is only about 80 nucleotides long, and it folds up by complementary base pairing to form a looped clover-leaf structure. At the 3' end of the molecule there is always the base sequence CCA, where the amino acid binds. On the middle loop there is a triplet nucleotide sequence called the anticodon. Each type of tRNA has a different anticodon, complementary to one or more of the 61 codons that specify an amino acid (the three stop codons have no matching tRNA). The amino acids are attached to their tRNA molecule by specific enzymes. These are highly specific, so that each amino acid is attached to a tRNA adapter with the appropriate anticodon.
DNA is copied, or replicated, before every cell division, so that one identical copy can go to each daughter cell. The method of DNA replication is obvious from its structure: the double helix unzips and two new strands are built up by complementary base-pairing onto the two old strands.
Replication
starts at a specific sequence on the DNA molecule called the replication
origin.
An enzyme (DNA helicase) unwinds and unzips DNA, breaking the hydrogen bonds that join the base pairs, and forming two separate strands.
The
new DNA is built up from the four nucleotides (A, C, G and T) that are
abundant in the nucleoplasm.
These
nucleotides attach themselves to the bases on the old strands by
complementary base pairing. Where there is a T base, only an A nucleotide
will bind, and so on.
The
enzyme DNA polymerase joins the new nucleotides to each other by
strong covalent bonds, forming the sugar-phosphate backbone.
A
winding enzyme winds the new strands up to form double helices.
The
two new molecules are identical to the old molecule.
DNA
replication can take a few hours, and in fact this limits the speed of cell
division. One reason bacteria can reproduce so fast is that they have a
relatively small amount of DNA.
The Meselson-Stahl Experiment
This
replication mechanism is sometimes called semi-conservative replication,
because each new DNA molecule contains one new strand and one old strand. This
need not be the case, and alternative theories suggested that a
"photocopy" of the original DNA could be made, leaving the original
DNA conserved (conservative replication). The evidence for the
semi-conservative method came from an elegant experiment performed in 1958 by
Meselson and Stahl. They used the bacterium E.
coli together with the technique of density gradient centrifugation,
which separates molecules on the basis of their density.
1. Grow bacteria on medium with normal ¹⁴NH₄⁺.
2. Grow bacteria for many generations on medium with ¹⁵NH₄⁺.
These first two steps are a calibration. They show that the method can distinguish between DNA containing ¹⁴N (light) and that containing ¹⁵N (heavy).
3. Return to ¹⁴NH₄⁺ medium for 20 minutes (one generation). This is the crucial step. The DNA has replicated just once in ¹⁴N medium. The resulting DNA is not heavy or light, but exactly half way between the two. This rules out conservative replication, which would give one heavy and one light band.
4. Grow on ¹⁴NH₄⁺ medium for 40 minutes (two generations). After two generations the DNA is either light or half-and-half. This rules out dispersive replication, in which every strand would be a patchwork of old and new DNA and so all the DNA would have the same intermediate density. The results are all explained by semi-conservative replication.
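The band pattern predicted by semi-conservative replication can be checked with a small simulation. This is a sketch under simple assumptions: each DNA molecule is modelled as a pair of strand labels, and the function names `replicate` and `bands` are invented for this example.

```python
from collections import Counter

def replicate(molecules, new_isotope="14N"):
    """Semi-conservative replication: each old strand gains one new partner strand."""
    daughters = []
    for strand_a, strand_b in molecules:
        daughters.append((strand_a, new_isotope))
        daughters.append((strand_b, new_isotope))
    return daughters

def bands(molecules):
    """Count molecules by density band, based on how many 15N strands they hold."""
    names = {2: "heavy", 1: "hybrid", 0: "light"}
    counts = Counter(names[molecule.count("15N")] for molecule in molecules)
    return dict(counts)

generation0 = [("15N", "15N")]          # fully heavy DNA after growth on 15N
generation1 = replicate(generation0)    # one generation on 14N medium
generation2 = replicate(generation1)    # two generations on 14N medium

print(bands(generation1))  # {'hybrid': 2}
print(bands(generation2))  # {'hybrid': 2, 'light': 2}
```

After one generation every molecule is half-and-half, and after two generations half are light and half are half-and-half, matching the results in steps 3 and 4.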
The
sequence of bases on DNA codes for the sequence of amino acids in proteins. But
there are 20 different amino acids and only 4 different bases, so the bases are
read in groups of 3. This gives 4³ or 64 combinations, more than
enough to code for 20 amino acids. A group of three bases coding for an amino
acid is called a codon, and the meaning of each of the 64 codons is
called the genetic code.
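The arithmetic can be confirmed by listing every possible triplet. This is a minimal sketch using Python's standard `itertools.product`. The variable names are chosen just for this example.

```python
from itertools import product

BASES = "UCAG"  # the four bases found in mRNA

# Every ordered combination of three bases is one codon.
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]

print(len(BASES) ** 2)  # 16: groups of two bases could not cover 20 amino acids
print(len(codons))      # 64: groups of three bases are more than enough
```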
| First base (5' end) | Second base U | Second base C | Second base A | Second base G | Third base (3' end) |
|---|---|---|---|---|---|
| U | UUU Phe | UCU Ser | UAU Tyr | UGU Cys | U |
| U | UUC Phe | UCC Ser | UAC Tyr | UGC Cys | C |
| U | UUA Leu | UCA Ser | UAA Stop | UGA Stop | A |
| U | UUG Leu | UCG Ser | UAG Stop | UGG Trp | G |
| C | CUU Leu | CCU Pro | CAU His | CGU Arg | U |
| C | CUC Leu | CCC Pro | CAC His | CGC Arg | C |
| C | CUA Leu | CCA Pro | CAA Gln | CGA Arg | A |
| C | CUG Leu | CCG Pro | CAG Gln | CGG Arg | G |
| A | AUU Ile | ACU Thr | AAU Asn | AGU Ser | U |
| A | AUC Ile | ACC Thr | AAC Asn | AGC Ser | C |
| A | AUA Ile | ACA Thr | AAA Lys | AGA Arg | A |
| A | AUG Met | ACG Thr | AAG Lys | AGG Arg | G |
| G | GUU Val | GCU Ala | GAU Asp | GGU Gly | U |
| G | GUC Val | GCC Ala | GAC Asp | GGC Gly | C |
| G | GUA Val | GCA Ala | GAA Glu | GGA Gly | A |
| G | GUG Val | GCG Ala | GAG Glu | GGG Gly | G |

*** Note that this table represents bases in mRNA. There are some tables that may only show the DNA code.
There are several interesting points from this triplet code:
It is a linear code, i.e. the code is only read in one direction (5’→3’) along the mRNA molecule.
The code is degenerate, i.e. there are more base combinations than there are amino acids, so there is often more than one codon for an amino acid. E.g. CCA, CCC, CCG and CCU all code for the same amino acid: proline. The first two bases of a codon are more important than the third base in specifying a particular amino acid.
The code is non-overlapping, i.e. each triplet specifies one amino acid. Each base is part of only one triplet, and is therefore involved in specifying only one amino acid.
At the start and end of a sequence there are punctuation codes, i.e. there is a ‘start’ signal given by AUG (which also codes for methionine) and there are three ‘stop’ signals (UAA, UAG and UGA). The three stop signals do not code for an amino acid.
It is a (nearly) universal code, i.e. the same base sequence codes for the same amino acid in almost all species.
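Degeneracy and the punctuation codons can be explored by looking the code up in reverse. The sketch below builds the standard mRNA codon table from a 64-letter string, in the same first-base, second-base, third-base order as the table (U, C, A, G). The one-letter amino acid symbols (for example P = proline, M = methionine, W = tryptophan, with `*` for stop) and the helper name `codons_for` are conventions adopted for this example, not something the notes above define.

```python
from itertools import product

BASES = "UCAG"
# One letter per codon, in the order UUU, UUC, UUA, UUG, UCU, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def codons_for(amino_acid):
    """Return every codon that specifies the given one-letter amino acid symbol."""
    return sorted(codon for codon, aa in CODE.items() if aa == amino_acid)

print(codons_for("P"))  # ['CCA', 'CCC', 'CCG', 'CCU']  (proline: degenerate)
print(codons_for("M"))  # ['AUG']                       (methionine: the start codon)
print(codons_for("*"))  # ['UAA', 'UAG', 'UGA']         (the three stop codons)
```

For proline only the third base varies, which illustrates why the first two bases matter most.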
DNA
never leaves the nucleus, but proteins are synthesised in the cytoplasm, so a
copy of each gene is made to carry the “message” from the nucleus to the
cytoplasm. This copy is mRNA, and the process of copying is called
transcription.
The
start of each gene on DNA is marked by a special sequence of bases.
The
RNA molecule is built up from the four ribose nucleotides (A, C, G and U) in
the nucleoplasm. The nucleotides attach themselves to the bases on the DNA
by complementary base pairing, just as in DNA replication. However, only one
strand of RNA is made. The DNA strand that is copied is called the template or antisense strand. The other strand has the same base sequence as the mRNA (with T in place of U) and so carries the code for the protein. It is called the non-template, coding or sense strand.
The
new nucleotides are joined to each other by strong covalent bonds by the
enzyme RNA polymerase.
Only
about 8 base pairs remain attached at a time, since the mRNA molecule peels
off from the DNA as it is made. A winding enzyme rewinds the DNA.
The initial mRNA, or primary transcript, contains many regions that are not needed as part of the protein code. These are called introns (for intervening sequences), while the parts that are needed are called exons (for expressed sequences). Most eukaryotic genes have introns, and they are often longer than the exons.
The introns are cut out and the exons are spliced together by enzymes.
The
result is a shorter mature RNA containing only exons. The introns are
broken down.
The
mRNA diffuses out of the nucleus through a nuclear pore into the
cytoplasm.
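Transcription and splicing can be sketched as two small string operations. This is an illustration only: the template sequence is invented, and the exon positions are supplied by hand, whereas a real cell recognises splice sites from signals in the RNA itself. The function names `transcribe` and `splice` are hypothetical.

```python
# Pairing rules for building RNA on a DNA template (U replaces T in RNA).
DNA_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}

def transcribe(template_strand):
    """Build mRNA (5'->3') from a DNA template strand written 3'->5'."""
    return "".join(DNA_TO_RNA[base] for base in template_strand)

def splice(primary_transcript, exon_ranges):
    """Join the exons, given as (start, end) index pairs, discarding the introns."""
    return "".join(primary_transcript[start:end] for start, end in exon_ranges)

template = "TACGGATTTCCAACT"   # made-up template strand, written 3'->5'
primary = transcribe(template)
print(primary)                 # AUGCCUAAAGGUUGA

# Treat bases 6-8 (AAA) as a pretend intron and keep the two flanking exons.
mature = splice(primary, [(0, 6), (9, 15)])
print(mature)                  # AUGCCUGGUUGA
```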
1. A ribosome attaches to the mRNA at an initiation codon (AUG). The ribosome encloses two codons.
2. met-tRNA diffuses to the ribosome and attaches to the mRNA initiation codon by complementary base pairing.
3. The next amino acid-tRNA attaches to the adjacent mRNA codon (leu in this case).
4. The bond between the amino acid and the tRNA is cut and a peptide bond is formed between the two amino acids.
5. The ribosome moves along one codon so that a new amino acid-tRNA can attach. The free tRNA molecule leaves to collect another amino acid. The cycle repeats from step 3.
6. The polypeptide chain elongates one amino acid at a time, and peels away from the ribosome, folding up into a protein as it goes. This continues for hundreds of amino acids until a stop codon is reached, when the ribosome falls apart, releasing the finished protein.
A
single piece of mRNA can be translated by many ribosomes simultaneously, so many
protein molecules can be made from one mRNA molecule. A group of ribosomes all
attached to one piece of mRNA is called a polysome.
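The translation steps can be imitated by reading an mRNA string one codon at a time. The sketch below assumes the standard genetic code, written with one-letter amino acid symbols (M = met, L = leu, P = pro, `*` = stop). The function name `translate` and the example sequence are invented for illustration, and the code ignores details such as tRNA charging and protein folding.

```python
from itertools import product

BASES = "UCAG"
# One letter per codon, in the order UUU, UUC, UUA, UUG, UCU, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(mrna):
    """Translate from the first AUG to the first stop codon (or the end of the mRNA)."""
    start = mrna.find("AUG")              # the ribosome begins at the start codon
    if start == -1:
        return ""                         # no start codon, so no protein
    polypeptide = []
    for i in range(start, len(mrna) - 2, 3):   # move along one codon at a time
        amino_acid = GENETIC_CODE[mrna[i:i + 3]]
        if amino_acid == "*":             # stop codon: release the chain
            break
        polypeptide.append(amino_acid)
    return "".join(polypeptide)

print(translate("GGAUGCUUCCAUAAGC"))  # MLP  (met-leu-pro, then UAA stops)
```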
Post-Translational Modification
In
eukaryotes, proteins often need to be altered before they become fully
functional. Modifications are carried out by other enzymes and include: chain
cutting, adding methyl or phosphate groups to amino acids, or adding sugars (to
make glycoproteins) or lipids (to make lipoproteins).
Last updated 20/06/2004