Approaches to Web Development for Bioinformatics


BioPerl

BioPerl [15] is an open source project that provides a number of bioinformatics utilities.  These include modules for reading the various bioinformatics file formats used by NCBI, EMBL, and other organizations.

Basics

The BioPerl distribution can be downloaded from the BioPerl web site or installed from the web.  To install it on Windows with ActivePerl, start the ActivePerl Perl Package Manager from the Start menu and enter the commands


ppm> repository add BioPerl http://bioperl.org/DIST/BioPerl-1.4.ppd
ppm> install BioPerl

On UNIX, unpack the distribution, change to its top-level directory, and enter the commands


>perl Makefile.PL
>make
>make install

To test the distribution, enter the following command from the same directory


>perl -I. -w t/SeqIO.t

See the INSTALL text file included with the distribution for other options.

Documentation is available on the BioPerl web site, or from the command line by typing


>perldoc Bio::SeqIO

where Bio::SeqIO should be replaced with the name of the module you need documentation on.
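
Before reading the documentation it can help to confirm that Perl actually finds the module. The following is a minimal sketch, not part of the BioPerl distribution: the helper name module_location is made up for this example, and it relies only on Perl's built-in require and the %INC hash. It tries to load a module and reports the file it was loaded from.

```perl
#!/usr/bin/perl
# Sketch: report whether a module can be loaded and, if so, from which file.
# module_location is a hypothetical helper written for this example.
use strict;
use warnings;

# Returns the path a module was loaded from, or undef if it cannot be loaded.
sub module_location {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # Bio::SeqIO -> Bio/SeqIO.pm
    return eval { require $file; 1 } ? $INC{$file} : undef;
}

# Check Bio::SeqIO by default, or the module named on the command line.
my $module = @ARGV ? $ARGV[0] : 'Bio::SeqIO';
my $path   = module_location($module);

print defined $path
    ? "$module loaded from $path\n"
    : "$module could not be loaded; check the installation\n";
```

Run it with no arguments to check Bio::SeqIO, or pass another module name as the first argument.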

Here is one of the simplest BioPerl programs possible. It imports the module Bio::SeqIO, uses it to read and parse a FASTA file called HD.txt, and prints out the length of the sequence.


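#!/bin/perl -w
# Example BioPerl script that reads the FASTA file HD.txt and prints the length of each sequence

use strict;
use Bio::SeqIO;

# Open HD.txt for reading ("<") and parse it as FASTA
my $inseq = Bio::SeqIO->new(-file => "<HD.txt", -format => "FASTA");

# next_seq() returns one sequence object per call and a false value at the end of the file
while (my $seq = $inseq->next_seq()) {
    print $seq->length, "\n";
}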

If you want to use BioPerl on a web server to which you do not have administrative access, copy all the .pm files to a directory of your own, say /home/yourname/lib, keeping the Bio directory structure intact.  A use lib pragma at the top of your script tells the Perl runtime where to find the BioPerl modules. Here is a CGI script to test that.


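#!/usr/bin/perl
# Example CGI script that loads BioPerl from a private library directory

use strict;
use warnings;
use lib qw(/home/yourname/lib);
use Bio::SeqIO;

# The output is plain text, so declare it as such to keep the line breaks
print "Content-type: text/plain\n\n";
print "BioPerl output:\n";

my $inseq = Bio::SeqIO->new(-file => "<HD.txt", -format => "FASTA");
while (my $seq = $inseq->next_seq()) {
    print $seq->length, "\n";
}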
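
If the CGI script fails with a "Can't locate Bio/SeqIO.pm" error, it can be useful to look at the module search path itself. This is a small diagnostic sketch that uses the same example directory, /home/yourname/lib, and prints the contents of @INC. The use lib pragma places its directories at the front of that list, so modules found there take precedence over system-wide copies.

```perl
#!/usr/bin/perl
# Sketch: print the module search path as seen by a CGI script.
use strict;
use warnings;

# /home/yourname/lib is the example directory from the text above.
use lib qw(/home/yourname/lib);

print "Content-type: text/plain\n\n";
print "Module search path:\n";

# Directories are searched in the order printed.
print "$_\n" for @INC;
```

The private directory should appear first in the output. If it does, and Bio/SeqIO.pm exists beneath it, the use Bio::SeqIO line should succeed.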

Here is another command line program. It iterates over the NCBI GenBank format file rna.gbk [2] and prints the ID and description of each locus. Because the output is large, I suggest redirecting the output of this script to another file.


#!/bin/perl -w
# Example BioPerl script that reads the GenBank file rna.gbk from the NCBI site and iterates over all the loci in it

use strict;
use Bio::SeqIO;

my $in = Bio::SeqIO->new(-file => "<rna.gbk", -format => "genbank");

# Print one tab-separated line per locus: the ID followed by the description
while (my $seq = $in->next_seq()) {
    print $seq->id, "\t", $seq->desc(), "\n";
}
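
Once the output has been redirected to a file, it can be searched with plain Perl, without BioPerl. The following is a sketch under stated assumptions: the file name loci.txt, the default search word, and the helper parse_locus_line are all placeholders chosen for this example. It reads the tab-separated lines written by the script above and prints those whose description contains the search word.

```perl
#!/usr/bin/perl
# Sketch: search the saved ID/description output for a word.
# loci.txt, the default pattern, and parse_locus_line are hypothetical.
use strict;
use warnings;

# Split one "ID<TAB>description" line into its two fields.
sub parse_locus_line {
    my ($line) = @_;
    chomp $line;
    my ($id, $desc) = split /\t/, $line, 2;
    return ($id, defined $desc ? $desc : '');
}

my $file    = @ARGV > 0 ? $ARGV[0] : 'loci.txt';
my $pattern = @ARGV > 1 ? $ARGV[1] : 'kinase';

if (open my $fh, '<', $file) {
    my $matches = 0;
    while (my $line = <$fh>) {
        my ($id, $desc) = parse_locus_line($line);
        # Case-insensitive, literal match against the description
        next unless $desc =~ /\Q$pattern\E/i;
        print "$id\t$desc\n";
        $matches++;
    }
    close $fh;
    print STDERR "$matches matching loci\n";
}
else {
    warn "Cannot open $file: $!\n";
}
```

The file name is the first argument and the search word is the second; both fall back to the placeholder defaults when omitted.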



Please send ideas and opinions by email at alexamies@gmail.com.

© 2006-2007 Alex Amies