Approaches to Web Development for Bioinformatics
Human Genome
The human genome is distributed in a number of different files and
formats. The messenger RNA (mRNA) records are provided in a file called
rna.gbk.gz. Each record includes the locus with reference identifiers,
PubMed references, the mRNA, a comment, and the gene product (usually a
protein). Here are a few sample lines:
GenBank mRNA file fragment
LOCUS       NM_004239    6452 bp    mRNA    linear   PRI 16-OCT-2005
DEFINITION  Homo sapiens thyroid hormone receptor interactor 11
            (TRIP11), mRNA.
...
COMMENT     PROVISIONAL REFSEQ: This record has not yet been subject to final
            NCBI review. The reference sequence was derived from Y12490.1.
            Summary: TRIP11 was first identified through its ability to
            interact functionally with thyroid hormone receptor-beta (THRB; MIM
            190160). It has also been found in association with the Golgi
            apparatus and microtubules.[supplied by OMIM].
FEATURES             Location/Qualifiers
...
/db_xref="GeneID:9321"
The program below scans this file and, for each record, writes the gene
ID, locus accession, and comment to another file as one tab-separated
line. It reads rna.gbk, so the downloaded rna.gbk.gz must be decompressed
first.
Perl
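use strict;
use warnings;

my $i = 0;
my $accession;
my $comment;

# The input file (rna.gbk.gz after decompression)
open my $rna_gbk, "<", "rna.gbk" or die "Cannot open rna.gbk: $!";
# The output file
open my $comments_rna, ">", "comments_rna.gbk"
    or die "Cannot open comments_rna.gbk: $!";

# Iterate over each line in the input file
while (defined(my $line = <$rna_gbk>)) {
    if ($line =~ /^LOCUS\s+([A-Z0-9_]+)/) {
        # Start of a new record: remember the accession, reset the comment
        $accession = $1;
        $comment   = "";
    }
    elsif ($line =~ /^COMMENT\s+(.*)/) {
        $comment = $1;
        # Append the indented continuation lines of the comment
        while (defined($line = <$rna_gbk>)) {
            last unless $line =~ /^\s+(.*)/;
            $comment .= " $1";
        }
        last unless defined $line;    # end of file inside a comment
    }
    # Not elsif: the line that ended the comment may itself be FEATURES
    if ($line =~ /^FEATURES/) {
        while (defined($line = <$rna_gbk>)) {
            last if $line =~ m{^//};  # end of record, no gene ID found
            if ($line =~ /GeneID:([0-9]+)/) {
                print $comments_rna "$1\t$accession\t$comment\n";
                $i++;                 # count only records actually written
                last;
            }
        }
    }
}
print "$i loci written.\n";
close $rna_gbk;
close $comments_rna;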
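The program above opens rna.gbk, which presumably means the download was decompressed beforehand. As a variation, here is a minimal sketch that reads a gzip-compressed file directly with the core module IO::Uncompress::Gunzip. The subroutine name locus_accessions and the demo file sample_rna.gbk.gz are assumptions made for illustration and are not part of the original scripts.

```perl
use strict;
use warnings;
use IO::Compress::Gzip qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw($GunzipError);

# Return the accession of every LOCUS line in a gzip-compressed GenBank file.
# (Hypothetical helper, not part of the original scripts.)
sub locus_accessions {
    my ($path) = @_;
    my $in = IO::Uncompress::Gunzip->new($path)
        or die "Cannot read $path: $GunzipError";
    my @accessions;
    while (defined(my $line = $in->getline())) {
        push @accessions, $1 if $line =~ /^LOCUS\s+([A-Z0-9_]+)/;
    }
    $in->close();
    return @accessions;
}

# Demo on a tiny sample written to a temporary file;
# the real input would be rna.gbk.gz.
my $sample = "LOCUS       NM_004239    6452 bp    mRNA    linear   PRI 16-OCT-2005\n"
           . "DEFINITION  Homo sapiens thyroid hormone receptor interactor 11\n"
           . "//\n";
my $demo_file = "sample_rna.gbk.gz";
gzip(\$sample => $demo_file) or die "gzip failed: $GzipError";
my @found = locus_accessions($demo_file);
print "@found\n";
unlink $demo_file;
```

The same getline loop could replace the `<RNA_GBK>` reads in the extraction program, at the cost of decompressing the file on every run.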
The comment for a given gene can then be looked up with a simple script, such as the following, which prints the comment for gene ID 9321:
Perl
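use strict;
use warnings;

open my $comments_rna, "<", "comments_rna.gbk"
    or die "Cannot open comments_rna.gbk: $!";
# Keep only the lines whose first field is exactly gene ID 9321
my @line = grep { /^9321\t/ } <$comments_rna>;
close $comments_rna;
die "Gene ID 9321 not found\n" unless @line;
my @fields = split /\t/, $line[0];
print $fields[2];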
This is used in conjunction with the scripts above to produce the
detailed comments for the human gene search page on this site.
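A search page typically answers many lookups, and rescanning the whole file for each one may become slow. One possible alternative, sketched here under assumptions, is to load the tab-separated file once into a hash keyed by gene ID. The subroutine name load_comments and the second sample record (accession DEMO_0001) are hypothetical, and the sample comment is shortened.

```perl
use strict;
use warnings;

# Build a lookup table: gene ID => [accession, comment].
# Expects lines of the form "geneID\taccession\tcomment".
# (Hypothetical helper, not part of the original scripts.)
sub load_comments {
    my ($fh) = @_;
    my %by_gene;
    while (defined(my $line = <$fh>)) {
        chomp $line;
        my ($gene_id, $accession, $comment) = split /\t/, $line, 3;
        next unless defined $comment;    # skip malformed lines
        # Keep the first record seen for each gene ID
        $by_gene{$gene_id} //= [$accession, $comment];
    }
    return \%by_gene;
}

# Demo on an in-memory sample; the real input would be comments_rna.gbk.
my $sample = "9321\tNM_004239\tPROVISIONAL REFSEQ: This record has not yet been subject to final NCBI review.\n"
           . "93210\tDEMO_0001\tPlaceholder record used only for this demo.\n";
open my $fh, "<", \$sample or die "Cannot open in-memory sample: $!";
my $comments = load_comments($fh);
close $fh;

# Exact-key lookup: gene ID 9321 cannot be confused with 93210
print $comments->{9321}[1], "\n";
```

Because the hash key is the whole gene ID, a lookup for 9321 never picks up a longer ID that merely starts with the same digits.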
© 2006-2007 Alex Amies