Approaches to Web Development for Bioinformatics
Codon Bias Example
In the Biology Background section I
discussed translation of codons to amino acids. The translation
is many-to-one in the sense that different codons translate to the same
amino acid. Therefore, if given a nucleotide sequence (without
introns) the protein primary structure (amino acid sequence) is
uniquely determined. On the other hand, for a given protein there
are many possible nucleotide sequences. However, not all
nucleotide sequences are equally likely.
Codon bias is a term that describes which codon to amino acid
translations are favored. This varies by organism. Here is
a program that computes the codon bias for a nucleotide sequences in a
FASTA file.
use MedicalComputing::CodonBias;
use Bio::SeqIO;
if ($#ARGV != 1) {
die "usage: codon_bias.pl file_name starting_position\n";
}
my $inputFile = $ARGV[0];
my $startingPos = $ARGV[1];
if (! -e $inputFile) {
die "Cannot find file $inputFile.\n";
}
if (! -r $inputFile) {
die "You are not authorized to read file $inputFile.\n";
}
my $inseq = Bio::SeqIO->new(-file => "<$inputFile", -format
=> "FASTA");
my $seqObj = $inseq->next_seq();
if (! $seqObj) {
die "No sequence object.\n";
}
if ($startingPos > $seqObj->length) {
die "Starting position $startingPos is greater than the length of DNA ",
$seqObj->length, ".\n";
}
my $dna = $seqObj->subseq($startingPos, $seqObj->length-1);
my $codonBias = MedicalComputing::CodonBias->new($dna);
$codonBias->tally();
print "The codon counts are:\n\tCodon\tCount\t\Percent\n";
my @aminoAcids = $codonBias->aminoAcids();
my $totalCount = 0;
my $numCodons = 0;
for my $aminoAcid (@aminoAcids) {
my $totalPercent = 0.0;
print $aminoAcid, "\n";
for my $codon ($codonBias->codonsForAminoAcid($aminoAcid)) {
if ($codonBias->codonCount($codon)) {
print "\t", $codon, "\t", $codonBias->codonCount($codon), "\t";
my $codonPercent = $codonBias->codonProportion($aminoAcid, $codon) *
100;
$totalPercent += $codonPercent;
$totalCount += $codonBias->codonCount($codon);
printf "%2.2f\n", $codonPercent;
}
else {
print "\t$codon\tNot found\n";
}
$numCodons++;
}
$totalCount += $codonBias->totalCondonCount($aminoAcid);
print "\t", "Total:", "\t", $codonBias->totalCondonCount($aminoAcid),
"\t", $totalPercent, "\n";
}
print "Total codon count for all amino acids:
$totalCount, ",
($#aminoAcids + 1), " amino acids, $numCodons unique codons.";
The program gets the sequence from the FASTA file and computes the
frequency of each codon translation. Most of the work is done by
the MedicalComputing::CodonBias
and Bio::SeqIO modules.
The MedicalComputing::CodonBias module should be placed
in a directory called MedicalComputing in a file called CodonBias.pm
in your Perl library path.
The program uses a few Perl language features not discussed before.
The die function terminates the program, giving the operating system an
error in the process. The -e $inputFile if
condition tests for the existence of the file specified on the command
line. The -r $inputFile if condition checks to see
whether the user is allowed to read the file.
The program can be run from the command line shown here. The input
file is HD.txt (the gene for Huntington
disease) and the coding region starting position is 146.
The first nucleotide is position 1.
>codon_bias.pl HD.txt 146
The codon counts are:
Codon Count Percent
Asparagine
AAT 35
38.89
AAC 55
61.11
Total: 90 100
Glycine
GGT 17
12.23
GGC 39
28.06
GGA 44
31.65
GGG 39
28.06
Total: 139 100
Proline
...
The full output listing is in codon_bias_hd.txt.
Since most of the analytic code is externalized in the modules and
most of the user interface code is in the main script the modules can
be reused in a web context. Here is a CGI script that uploads a
FASTA format file and reports the total codon biases.
#!/usr/bin/perl
# A CGI script to compute codon bias from
an uploaded a FASTA file
# Omit this line or replace myhome with the
location of your Perl libraries
use lib
qw(/myhome/perllib);
use IO::String;
use CGI;
use Bio::SeqIO;
use
MedicalComputing::CodonBias;
my $q = new CGI;
print
$q->header,
$q->start_html('Codon Bias Computation'),
$q->h1('Codon Bias Computation');
# Display the form
if (!
$q->param('uploaded_file')) {
print $q->start_multipart_form,
"Upload a FASTA
file to compute codon bias", $q->br,
$q->filefield(-name=>'uploaded_file',
-default=>'starting value ignored',
-size=>50,
-maxlength=>120),$q->p,
"Starting
position", $q->br,
$q->textfield(-name=>'startingPos',
-default=>'1',
-size=>4,
-maxlength=>4),
$q->p,
$q->submit('Submit'),
$q->end_form;
# Process the form
} else {
# upload the file
my
$filename = $q->upload('uploaded_file');
# Get starting position
my
$startingPos = 1;
if
($q->param('startingPos')) {
$startingPos =
$q->param('startingPos');
}
# Get file upload
information
my
$type = $q->uploadInfo($filename)->{'Content-Type'};
print "File $filename, starting position
$startingPos", $q->p, "\n";
# Only process text
files
if
($type eq 'text/plain') {
#
Read the text file into the string object $data (max 10 000 lines)
my $data;
my $i = 0;
while (defined($line
= <$filename>) && ($i < 10000)) {
$data .= $line;
$i++;
}
#
With an IO::String the data can be read into Bio::SeqIO
my $stringfh = new
IO::String($data);
#
Use Bio::SeqIO to parse the string
my $seqIO =
Bio::SeqIO->new(-fh => $stringfh, -format => "FASTA");
my $seqObj =
$seqIO->next_seq();
if (! $seqObj) {
print "No sequence information.", $q->p;
} else {
# Get a substring starting with the coding region
my $dna =
$seqObj->subseq($startingPos, $seqObj->length-1);
# Create a new CodonBias object
my $codonBias =
MedicalComputing::CodonBias->new($dna);
# Scan through the sequence and tally the
translations
$codonBias->tally();
print "<table border='1'>\n",
"<caption>Codon Percentages</caption>\n",
"<tr><th>Amino
Acid</th><th>Codon</th><th>Percent</th></tr>\n";
# An array that holds the names of the amino
acids translated.
my @aminoAcids =
$codonBias->aminoAcids();
# Iterate over the amino acids to display the
proportions
for my $aminoAcid
(@aminoAcids) {
my
@codonsForAminoAcid = $codonBias->codonsForAminoAcid($aminoAcid);
my $rowSpan =
$#codonsForAminoAcid + 1;
print "<tr><td rowspan='$rowSpan'>$aminoAcid</td>\n";
# Iterate over the codons translating to
each amino acid
my $codonIndex =
0;
for my $codon
(@codonsForAminoAcid) {
if ($codonIndex)
{
print "<tr>";
}
$codonIndex++;
print "<td>$codon</td><td>";
if
($codonBias->codonCount($codon)) {
my $codonPercent
= $codonBias->codonProportion($aminoAcid, $codon) * 100;
# Format the percentage bias with 2 digits
before and after the decimal point
printf "%2.2f", $codonPercent;
} else {
print "Not found";
}
print "</td></tr>\n";
}
}
print "</table>", $q->p, "\n";
}
} else
{
#
Give a warning message for other types of files than text files
print "Only text files
permitted", $q->p;
}
# Go back to the start
page
print $q->a({-href=>'codon_bias_web.pl'},
"Start again"), $q->p;
}
print $q->end_html;
A working version of the page is at codon_bias_web.
The main difference in usage of the Bio::SeqIO module is
that instead of loading the sequence directly from the FASTA file it is
uploaded, read into a string, and then used to construct an IO::String
object. The IO::String object can then be used to
construct a Bio::SeqIO object. Another version of
the web page with additional frills is at codon_bias_mc.
References
There are no user comments.
Please send ideas and opinions by email at alexamies@gmail.com.
© 2006-2007 Alex Amies