Breaking the Genetic Code

Updated December 18, 2010

Breaking the Code

After Watson and Crick identified the three-dimensional structure of DNA in 1953, it became clear that the base sequence was the primary carrier of genetic information. How that sequence of bases specified the sequence of amino acids in a protein, however, remained elusive for the next decade.

One of the first questions to resolve was how many nucleotides are needed to specify a single amino acid. This basic unit, the set of bases that encodes one amino acid, is now known as a codon. Because each base position in the mRNA can be occupied by any one of four bases (A, G, C and U), it was quickly recognized that a codon must contain at least three nucleotides.

Why Three Bases?

If a codon consisted of only one base, only four different codons would be possible, not nearly enough to code for the 20 amino acids commonly found in proteins. If a codon were made up of two bases, there would be 16 possibilities (4 options for the first base × 4 options for the second), which is still not enough. With three bases, there are 64 possibilities (4 × 4 × 4), which is more than enough.

So three bases is the shortest codon length that can distinguish all 20 amino acids, which is why the genetic code is a triplet code.
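The counting argument above can be checked with a short Python sketch. It simply raises the codon length until the number of possible four-letter combinations reaches the 20 amino acids, and cross-checks 4 × 4 × 4 by listing every triplet. The helper names `codon_count` and `minimal_codon_length` are illustrative only and do not come from any library.

```python
from itertools import product

BASES = "ACGU"           # the four RNA bases
AMINO_ACIDS_NEEDED = 20  # amino acids commonly found in proteins


def codon_count(length: int) -> int:
    """Number of distinct codons of a given length over the four-base alphabet."""
    return len(BASES) ** length


def minimal_codon_length(needed: int = AMINO_ACIDS_NEEDED) -> int:
    """Smallest codon length whose codon count is at least `needed`."""
    length = 1
    while codon_count(length) < needed:
        length += 1
    return length


for n in (1, 2, 3):
    print(f"{n} base(s) per codon: {codon_count(n)} codons")

# Cross-check 4 x 4 x 4 by listing every triplet explicitly.
triplets = ["".join(p) for p in product(BASES, repeat=3)]
print(len(triplets), "triplets; minimal codon length:", minimal_codon_length())
```

Running it lists 4, 16 and 64 codons for lengths one to three and reports 3 as the minimal length, matching the reasoning in the text.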

General Characteristics of the Genetic Code

The genetic code has several general characteristics that allow it to function smoothly and to specify the immense diversity of proteins.

  • The genetic code consists of a sequence of nucleotides in DNA or RNA. Its four letters correspond to the four bases A, C, G and U (T in DNA).
  • It is a triplet code. Each amino acid is encoded by a sequence of three consecutive nucleotides, called a codon.
  • The code is degenerate: some codons are synonymous, coding for the same amino acid. This follows from there being 64 codons (61 of which specify amino acids) but only 20 amino acids.
  • Generally, the code is non-overlapping: during translation, each nucleotide in the sequence belongs to only one codon of the reading frame used by the ribosome.
  • The “reading frame” is set by an initiation codon, usually AUG.
  • When a “reading frame” has been set, codons are read as successive groups of three nucleotides.
  • Any one of the three termination codons (UAA, UAG and UGA) can signal the end of a protein; these codons encode no amino acid.
  • The code is (almost) universal. Variant termination codons and, very rarely, variant sense codons (codons that specify an amino acid) have been identified, mostly in mitochondrial and protozoan genes.
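Several of the characteristics listed above (triplet codons, a reading frame set by AUG, non-overlapping reading, stop codons and degeneracy) can be illustrated together in a minimal Python sketch. It assumes the standard genetic code only (not the mitochondrial or protozoan variants mentioned above), an uppercase mRNA string containing only A, C, G and U, and the simplification that the first AUG found sets the reading frame. The names `CODON_TABLE`, `translate` and `codon_usage` are hypothetical, chosen for illustration.

```python
from collections import Counter
from itertools import product

# Standard genetic code in the conventional U, C, A, G order: the first base
# varies slowest and the third base fastest. "*" marks a termination codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

START_CODON = "AUG"
STOP = "*"


def translate(mrna: str) -> str:
    """Translate an mRNA string into one-letter amino-acid codes.

    The reading frame is set by the first AUG; codons are then read as
    successive, non-overlapping triplets until a stop codon or the end
    of the sequence is reached.
    """
    start = mrna.find(START_CODON)
    if start == -1:
        return ""  # no initiation codon, so no reading frame
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == STOP:
            break  # termination codon: it encodes no amino acid
        protein.append(amino_acid)
    return "".join(protein)


# Degeneracy: how many codons specify each amino acid (and the stop signal).
codon_usage = Counter(CODON_TABLE.values())

print(translate("GGAUGUUUGGCUAAGC"))  # MFG
print(codon_usage["L"], codon_usage["M"], codon_usage[STOP])  # 6 1 3
```

In the example, the frame starts at AUG (M), then reads UUU (F) and GGC (G), and stops at UAA. The codon counts show degeneracy directly: leucine has six synonymous codons, methionine only one, and three codons are reserved for termination. Storing the table as a 64-character string in U, C, A, G order keeps it compact and mirrors the usual printed codon chart.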