This section reviews the basic biology background for some of the software tools discussed in this paper. It will be more useful to readers coming from a software background than to those from a biology or medical background. A fuller biology background is given in Purves, et al., Life: The Science of Biology [9], in Griffiths, et al., Introduction to Genetic Analysis [28], and in Gibas and Jambeck, Developing Bioinformatics Skills [10].
Deoxyribonucleic acid (DNA) is composed of nucleotides, each containing one of the four nucleic acid bases adenine, guanine, cytosine, and thymine. To produce proteins, all living organisms transcribe DNA to ribonucleic acid (RNA), which is then translated to protein. The sequence of nucleotides and its translation form the basis for much theory and ongoing research in biology.
In all living organisms DNA is transcribed to RNA. There are several different kinds of RNA. Most genes code for messenger RNA (mRNA), which is then translated to protein. Some genes code for ribosomal RNA (rRNA), which forms a large part of the ribosomes in eukaryotes (including humans). A small number of genes code for transfer RNA (tRNA), which assists in the translation process itself. In prokaryotes (e.g. bacteria), which do not have a nucleus, mRNA is translated to protein near the transcription site. In eukaryotes, however, transcription takes place in the nucleus and the product, mRNA, is transported to the cytoplasm for protein synthesis.
In eukaryotes, DNA within genes contains non-coding sequences called introns. The continuous coding sequences within genes are called exons. During transcription the introns are spliced out and the exons are concatenated to form the mRNA exported from the nucleus. Alternative splicing can also take place, making it possible to form different proteins from the same gene.
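The splicing step can be pictured as dropping some segments of a string and joining the rest. The following is a minimal sketch only: the segment sequences are invented for illustration and do not come from a real gene, and `splice` and `segments` are hypothetical names.

```python
# Toy pre-mRNA, given as labelled segments. The sequences are made up.
segments = [
    ("exon", "AUGGCA"),
    ("intron", "GUAAGUUUAG"),
    ("exon", "CAGGCA"),
    ("intron", "GUCCUUAG"),
    ("exon", "CUGUAA"),
]


def splice(segments, skip=()):
    """Join exon sequences in order, dropping introns.

    `skip` holds 0-based exon indexes to leave out, which is a crude
    stand-in for alternative splicing.
    """
    exons = [seq for kind, seq in segments if kind == "exon"]
    return "".join(seq for i, seq in enumerate(exons) if i not in skip)


print(splice(segments))            # AUGGCACAGGCACUGUAA
print(splice(segments, skip={1}))  # AUGGCACUGUAA
```

Skipping the middle exon yields a shorter mRNA from the same gene, which is the idea behind different proteins arising from one gene.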
Proteins are synthesized from mRNA in molecular factories called ribosomes.
Since there are four nucleic acid bases but 20 amino acids, it takes more than one base to specify an amino acid. Two bases would give only 16 (4²) combinations, so the bases are grouped into sets of three called codons, each of which translates into a single amino acid or a stop signal. There are 64 (4³) possible codons, and their translations to amino acids are given in the table below.
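The count of 64 can be confirmed by enumerating every ordered triple of bases. This is a small sketch using the Python standard library; the variable names are illustrative.

```python
from itertools import product

BASES = "UCAG"  # RNA alphabet, in the order used by the table below

# Every ordered combination of three bases is one codon.
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]

print(len(codons))  # 64
print(codons[:4])   # ['UUU', 'UUC', 'UUA', 'UUG']
```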
| First Position | Second Position: U (T) | Second Position: C | Second Position: A | Second Position: G | Third Position |
| U (T) | UUU Phenylalanine (Phe, F) | UCU Serine (Ser, S) | UAU Tyrosine (Tyr, Y) | UGU Cysteine (Cys, C) | U |
| U (T) | UUC Phenylalanine (Phe, F) | UCC Serine (Ser, S) | UAC Tyrosine (Tyr, Y) | UGC Cysteine (Cys, C) | C |
| U (T) | UUA Leucine (Leu, L) | UCA Serine (Ser, S) | UAA Stop | UGA Stop | A |
| U (T) | UUG Leucine (Leu, L) | UCG Serine (Ser, S) | UAG Stop | UGG Tryptophan (Trp, W) | G |
| C | CUU Leucine (Leu, L) | CCU Proline (Pro, P) | CAU Histidine (His, H) | CGU Arginine (Arg, R) | U |
| C | CUC Leucine (Leu, L) | CCC Proline (Pro, P) | CAC Histidine (His, H) | CGC Arginine (Arg, R) | C |
| C | CUA Leucine (Leu, L) | CCA Proline (Pro, P) | CAA Glutamine (Gln, Q) | CGA Arginine (Arg, R) | A |
| C | CUG Leucine (Leu, L) | CCG Proline (Pro, P) | CAG Glutamine (Gln, Q) | CGG Arginine (Arg, R) | G |
| A | AUU Isoleucine (Ile, I) | ACU Threonine (Thr, T) | AAU Asparagine (Asn, N) | AGU Serine (Ser, S) | U |
| A | AUC Isoleucine (Ile, I) | ACC Threonine (Thr, T) | AAC Asparagine (Asn, N) | AGC Serine (Ser, S) | C |
| A | AUA Isoleucine (Ile, I) | ACA Threonine (Thr, T) | AAA Lysine (Lys, K) | AGA Arginine (Arg, R) | A |
| A | AUG Methionine (Met, M); Start | ACG Threonine (Thr, T) | AAG Lysine (Lys, K) | AGG Arginine (Arg, R) | G |
| G | GUU Valine (Val, V) | GCU Alanine (Ala, A) | GAU Aspartic Acid (Asp, D) | GGU Glycine (Gly, G) | U |
| G | GUC Valine (Val, V) | GCC Alanine (Ala, A) | GAC Aspartic Acid (Asp, D) | GGC Glycine (Gly, G) | C |
| G | GUA Valine (Val, V) | GCA Alanine (Ala, A) | GAA Glutamic Acid (Glu, E) | GGA Glycine (Gly, G) | A |
| G | GUG Valine (Val, V) | GCG Alanine (Ala, A) | GAG Glutamic Acid (Glu, E) | GGG Glycine (Gly, G) | G |
Uracil (U) in RNA replaces thymine (T) in DNA.
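For software use, the table above is often held as a lookup from codon to one-letter amino acid code. The sketch below builds such a lookup from a compact 64-character string ordered by first, second, and third base in the order U, C, A, G, with `*` standing for a stop codon. The names `CODON_TABLE` and `lookup` are illustrative, and accepting T in place of U is a convenience chosen for this example.

```python
from itertools import product

BASES = "UCAG"

# One letter per codon, in the same row and column order as the table above.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # first base U
    "LLLLPPPPHHQQRRRR"  # first base C
    "IIIMTTTTNNKKSSRR"  # first base A
    "VVVVAAAADDEEGGGG"  # first base G
)

CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def lookup(codon):
    """Return the one-letter amino acid for an RNA or DNA codon ('*' = stop)."""
    return CODON_TABLE[codon.upper().replace("T", "U")]


print(lookup("AUG"))  # M
print(lookup("tgg"))  # W
print(lookup("UAA"))  # *
```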
This genetic code is common to nearly all organisms. Mitochondrial genomes use slightly different codes, however.
In most prokaryotes and all eukaryotes, the first amino acid synthesized is methionine. Hence, in the table above, methionine is listed in the same cell as the start codon. Not all occurrences of AUG represent the start of a coding sequence, however. In the ribosome an initiation complex assembles and scans the mRNA for an AUG codon that is in the proper sequence context.
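To see why AUG alone is not enough to locate a coding sequence, it helps to list every occurrence in a sequence. The sketch below does only that: it does not model sequence context, and the function name and the toy input are hypothetical.

```python
def find_start_candidates(mrna):
    """Return (position, frame) for every AUG (or ATG), with 0-based positions.

    The frame is position % 3, so candidates with different frame values
    would be read with different codon boundaries.
    """
    seq = mrna.upper().replace("T", "U")
    hits = []
    pos = seq.find("AUG")
    while pos != -1:
        hits.append((pos, pos % 3))
        pos = seq.find("AUG", pos + 1)
    return hits


# Toy sequence, invented for illustration.
print(find_start_candidates("CCAUGGCAUGA"))  # [(2, 2), (7, 1)]
```

Each hit is only a candidate; deciding which one actually starts translation needs information beyond the codon itself.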
Transfer RNA (tRNA) molecules recognize codons in mRNA and translate them to amino acids. Special proteins called release factors recognize stop codons and terminate translation when they encounter one.
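In software, the roles of tRNA and release factors reduce to a loop that reads one codon at a time and halts at the first stop codon. The following is a simplified sketch under that assumption; `translate` and its `start` parameter are illustrative names, and the input sequences are invented.

```python
from itertools import product

# Standard genetic code as a codon-to-letter lookup; '*' marks a stop codon.
CODON_TABLE = dict(zip(
    ("".join(c) for c in product("UCAG", repeat=3)),
    "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
))


def translate(mrna, start=0):
    """Translate from `start`, codon by codon, until a stop codon or the end."""
    seq = mrna.upper().replace("T", "U")
    protein = []
    for i in range(start, len(seq) - 2, 3):
        amino_acid = CODON_TABLE[seq[i:i + 3]]
        if amino_acid == "*":  # stop codon: translation terminates here
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("AUGUUUUGGUAGGGG"))  # MFW
```

Any trailing bases that do not fill a complete codon are ignored, and the bases after the stop codon are never read.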
Basic translation takes the nucleotide sequence and translates it to an amino acid sequence using the table above. As an example, consider the gene SCN3A sodium channel, voltage-gated, type III, alpha protein [Homo sapiens] in the GenBank genome database [2] MapViewer. The gene is located on chromosome 2. The first few codons, starting at position 471 in the messenger RNA (mRNA) nucleotide sequence NM_006922, and their amino acid translations are:
| Codon | atg | gca | cag | gca | ctg | ttg | gta | ccc | cca | gga | cct | gaa | agc | ttc | cgc | ctt | ttt | act | aga |
| Amino Acid | M | A | Q | A | L | L | V | P | P | G | P | E | S | F | R | L | F | T | R |
The sequence given by GenBank is the same as that derived from the table above. There are a couple of points to note:
This can be checked with the Swiss Institute of Bioinformatics (SIB) ExPASy Translate Tool [4] by pasting the nucleotide sequence atggcacaggcactgttggtacccccaggacctgaaagcttccgcctttttactaga into the text area of the web form.
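The same check can also be sketched locally. The code below splits the SCN3A fragment quoted above into codons and looks each one up in the standard genetic code; the variable names are illustrative, and the lookup uses lowercase DNA letters to match the sequence as written.

```python
from itertools import product

# Standard genetic code keyed by lowercase DNA codons; '*' marks a stop codon.
TABLE = dict(zip(
    ("".join(c) for c in product("tcag", repeat=3)),
    "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
))

seq = "atggcacaggcactgttggtacccccaggacctgaaagcttccgcctttttactaga"

# Frame 1: codons start at the first base.
codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
protein = "".join(TABLE[codon] for codon in codons)

print(" ".join(codons))
print(protein)  # MAQALLVPPGPESFRLFTR
```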
To do this kind of translation you need to decide what to look for and where to start. What to look for is an amino acid sequence (protein) from any of the large number of proteins in the organism being studied. How to decide which protein to look for is beyond the scope of the present document. Let's assume that someone has told us what amino acid sequence to look for.
To start a sequence we need to begin with a start codon. However, a quick look over the SCN3A gene using the browser's find function for the codon atg yields many matches. We also need to consider the reverse direction (3′-5′ versus 5′-3′), and we do not know which nucleotide triplets form the codons. Is it atg gca ..., tgg ca..., or ggc a...? This gives three possible frames: Frame 1, Frame 2, and Frame 3. There are another three possible frames in the opposite direction, giving a total of six. Translation tools either try all frames or let users specify in advance which frame to use.
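A six-frame translation can be sketched as three offsets on the given strand plus three offsets on its reverse complement. This is a minimal illustration, not how any particular tool is implemented: the function names and the frame labels (`+1` to `+3`, `-1` to `-3`) are choices made for this example, labelling conventions differ between tools, and input containing characters other than A, C, G, and T will raise a `KeyError`.

```python
from itertools import product

# Standard genetic code keyed by uppercase DNA codons; '*' marks a stop codon.
TABLE = dict(zip(
    ("".join(c) for c in product("TCAG", repeat=3)),
    "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
))
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the opposite strand, read in its own 5' to 3' direction."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate_frame(dna, offset):
    """Translate every complete codon starting at `offset` (0, 1, or 2)."""
    codons = (dna[i:i + 3] for i in range(offset, len(dna) - 2, 3))
    return "".join(TABLE[codon] for codon in codons)


def six_frames(dna):
    """Return translations for three forward and three reverse frames."""
    forward = dna.upper()
    reverse = reverse_complement(forward)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate_frame(forward, offset)
        frames[f"-{offset + 1}"] = translate_frame(reverse, offset)
    return frames


for name, protein in six_frames("atggcacag").items():
    print(name, protein)
```

For the first nine bases of the SCN3A fragment, frame `+1` gives `MAQ`, matching the start of the translation shown earlier, while the other five frames give unrelated short peptides.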
See the page Genome Resources on this site for a list of popular and user suggested genome links for additional information on this topic.
Please send ideas and opinions by email at alexamies@gmail.com.