Approaches to Web Development for Bioinformatics
In this section I give an example of processing DNA sequence data from a FASTA file in a
web application. The example was inspired by the GCContent program in the BioJava tutorial.
It shows that it is easy to adapt a command line example to a web environment: all you
have to do is add the web user interface. The example also demonstrates use of the
Apache Commons FileUpload component to upload the FASTA format file.
The example starts with a JSP file to display a form prompting the user to upload a FASTA file.
<%@ page language="java" contentType="text/html; charset=UTF-8" pageEncoding="UTF-8"%>
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
</head>
<body>
<h1>FASTA File Processing Example</h1>
<%
String errorMessage = (String)request.getAttribute("errorMessage");
Double gcContent = (Double)request.getAttribute("GCContent");
if (errorMessage != null) {
%>
<p><%=errorMessage%></p>
<p>Upload another file:</p>
<%
}
else if (gcContent != null) {
java.text.NumberFormat nf = java.text.NumberFormat.getInstance();
nf.setMaximumFractionDigits(0);
String gcString = nf.format(gcContent.doubleValue());
%>
<p>The GC Content of your file is <%=gcString%> percent.
<%=request.getAttribute("Annotation")%></p>
<p>Upload another file:</p>
<%
} else {
%>
<p>Please upload a FASTA file to find out the GC content:</p>
<%
}
%>
<form method="post" action="gccontent" enctype="multipart/form-data">
<div>
<input type="file" name="uploaded_file"/><br/><br/>
<input type="submit" name="Upload" value="Upload" />
</div>
</form>
</body>
</html>
Before displaying the form the JSP checks to see if there is already some information in the
HTTP request object. This is data that the Servlet below will forward to the JSP. The
first possible piece of data is an error message. This may be generated when processing an
uploaded file. The next possible piece of data is the GC (guanine-cytosine) content of
the DNA sequence, expressed as a percentage. If a number is present it is formatted with the
NumberFormat class, with the maximum number of fraction digits set to zero, so that it is
displayed as a whole number. The third possible piece of information is a description of
the sequence uploaded.
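The number formatting step can be tried outside the JSP. The following is a minimal sketch, assuming a fixed US locale so that the output does not depend on the server settings (the JSP itself uses the default locale). The class and method names are hypothetical and are not part of the application.

```java
import java.text.NumberFormat;
import java.util.Locale;

/** Hypothetical helper class; not part of the original application. */
class GcFormatSketch {

    /** Formats a percentage with no fraction digits, as the JSP does. */
    static String formatPercent(double value, Locale locale) {
        NumberFormat nf = NumberFormat.getInstance(locale);
        // Zero fraction digits: the value is rounded to a whole number.
        nf.setMaximumFractionDigits(0);
        return nf.format(value);
    }

    public static void main(String[] args) {
        System.out.println(formatPercent(41.666, Locale.US)); // prints 42
        System.out.println(formatPercent(58.25, Locale.US));  // prints 58
    }
}
```

Passing the locale explicitly is a design choice for the sketch only. In a real page you would normally want the locale of the user rather than a hard-coded one.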
The form has an enctype of multipart/form-data. This indicates that the request body
is sent as a series of parts, one for each form field, with the contents of the uploaded
file carried in one of them. The action of the form is important. It is the relative URL of
the Servlet that will process the request. Apart from the submit button, the only field in
the form is the upload widget of type "file". Here is what the page looks like when it is
first displayed.
Screenshot when FASTA File Upload Page is First Displayed
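To make the multipart/form-data encoding described above more concrete, here is a rough sketch that builds the kind of request body a browser might send for this form. The boundary string, the text/plain content type and the sample FASTA text are assumptions for illustration; a real browser chooses its own boundary and content type. The class is hypothetical and is not part of the application.

```java
/** Hypothetical illustration class; not part of the original application. */
class MultipartSketch {

    private static final String CRLF = "\r\n";

    /** Builds one part carrying the contents of an uploaded file. */
    static String filePart(String boundary, String field,
                           String fileName, String content) {
        return "--" + boundary + CRLF
            + "Content-Disposition: form-data; name=\"" + field
            + "\"; filename=\"" + fileName + "\"" + CRLF
            + "Content-Type: text/plain" + CRLF
            + CRLF
            + content + CRLF;
    }

    /** Builds one part carrying an ordinary form field. */
    static String fieldPart(String boundary, String field, String value) {
        return "--" + boundary + CRLF
            + "Content-Disposition: form-data; name=\"" + field + "\"" + CRLF
            + CRLF
            + value + CRLF;
    }

    /** Builds the whole body: file part, submit button part, closing boundary. */
    static String body(String boundary, String fastaText) {
        return filePart(boundary, "uploaded_file", "HD.txt", fastaText)
            + fieldPart(boundary, "Upload", "Upload")
            + "--" + boundary + "--" + CRLF;
    }

    public static void main(String[] args) {
        String boundary = "----sketchBoundary42"; // assumed value
        System.out.println("Content-Type: multipart/form-data; boundary=" + boundary);
        System.out.println();
        System.out.print(body(boundary, ">example hypothetical sequence\nACGTGC"));
    }
}
```

The file part comes first because the file input is the first field in the form. A library such as FileUpload parses this body back into one item per part.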
Here is the Servlet that processes the request. The code is in file
GCContentServlet.java.
package net.medicalcomputing.web;
import java.io.*;
import java.util.*;
import java.util.logging.*;
import javax.servlet.*;
import javax.servlet.http.*;
import org.apache.commons.fileupload.FileItem;
import org.apache.commons.fileupload.FileItemFactory;
import org.apache.commons.fileupload.FileUploadException;
import org.apache.commons.fileupload.disk.DiskFileItemFactory;
import org.apache.commons.fileupload.servlet.ServletFileUpload;
import org.biojava.bio.BioException;
import org.biojava.bio.seq.DNATools;
import org.biojava.bio.symbol.Symbol;
import org.biojavax.bio.seq.RichSequence;
import org.biojavax.bio.seq.RichSequenceIterator;
public class GCContentServlet extends HttpServlet implements Servlet {

    private static final Logger LOGGER =
            Logger.getLogger("net.medicalcomputing");

    protected void doPost(HttpServletRequest request,
                          HttpServletResponse response)
            throws ServletException, IOException {
        String errorMessage = null;
        FileItemFactory factory = new DiskFileItemFactory();
        ServletFileUpload upload = new ServletFileUpload(factory);
        try {
            List items = upload.parseRequest(request);
            Iterator it = items.iterator();
            // The code assumes the file is the first part of the request.
            // next() throws NoSuchElementException if there are no parts.
            FileItem item = (FileItem) it.next();
            if (!item.isFormField()) {
                LOGGER.log(Level.INFO,
                        "Uploaded file info:" +
                        "\nField Name:\t" + item.getFieldName() +
                        "\nFile Name:\t" + item.getName() +
                        "\nContent Type:\t" + item.getContentType() +
                        "\nIs In Memory:\t" + item.isInMemory() +
                        "\nSize (Bytes):\t" + item.getSize() +
                        "\n");
                // Connect the uploaded file to the BioJava FASTA reader.
                InputStream uploadedStream = item.getInputStream();
                BufferedReader br =
                        new BufferedReader(new InputStreamReader(uploadedStream));
                RichSequenceIterator sequence =
                        org.biojavax.bio.seq.RichSequence.IOTools.readFastaDNA(br, null);
                // Only the first sequence in the file is processed.
                if (sequence.hasNext()) {
                    RichSequence seq = sequence.nextRichSequence();
                    int gc = 0;
                    // BioJava symbol positions start at 1.
                    for (int pos = 1; pos <= seq.length(); ++pos) {
                        Symbol sym = seq.symbolAt(pos);
                        if (sym == DNATools.g() || sym == DNATools.c())
                            ++gc;
                    }
                    double gcContent = (gc * 100.0) / seq.length();
                    request.setAttribute("GCContent",
                            Double.valueOf(gcContent));
                    String annotation = "";
                    if (seq.getDescription() != null) {
                        annotation = "Information about your sequence: " +
                                seq.getDescription();
                    }
                    request.setAttribute("Annotation", annotation);
                }
            }
        }
        catch (FileUploadException e) {
            errorMessage = "Error uploading file: " + e.getMessage();
            LOGGER.log(Level.WARNING, e.getMessage(), e);
        }
        catch (NoSuchElementException e) {
            errorMessage = "Error processing file: " + e.getMessage();
            LOGGER.log(Level.WARNING, e.getMessage(), e);
        }
        catch (BioException e) {
            errorMessage = "Error processing file: " + e.getMessage();
            LOGGER.log(Level.WARNING, e.getMessage(), e);
        }
        // A null errorMessage tells the JSP that no error occurred.
        request.setAttribute("errorMessage", errorMessage);
        getServletContext().getRequestDispatcher("/index.jsp")
                .forward(request, response);
    }
}
After importing the relevant Java and Servlet libraries, the Servlet imports the Apache Commons
FileUpload classes and finally the BioJava classes. Using * as a wildcard in import statements
can save space, but it is good practice to write out the full import for each class, for
clarity and to help people who have to read your code.
The Servlet uses the DiskFileItemFactory to help in uploading the file. The factory writes
uploads above a configurable size threshold to a temporary file, so that the application server
does not have to hold the entire file in memory. After getting a reference to the uploaded
FileItem, the Servlet logs some information about the file being uploaded. This information is
written to the application server log files.
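The logging step uses only java.util.logging, so it can be tried without an application server. The sketch below is an illustration under assumptions: the logger name suffix, the CapturingHandler class and the sample values are hypothetical. It shows that where a message ends up depends on the handlers attached to the logger, which is why the same call writes to the server log files when the code runs inside an application server.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

/** Hypothetical illustration class; not part of the original application. */
class UploadLoggingSketch {

    /** Handler that keeps records in memory instead of writing them to a file. */
    static final class CapturingHandler extends Handler {
        final List<LogRecord> records = new ArrayList<LogRecord>();
        @Override public void publish(LogRecord record) { records.add(record); }
        @Override public void flush() { }
        @Override public void close() { }
    }

    /** Logs a summary of an upload at INFO level, in the style of the Servlet. */
    static void logUpload(Logger logger, String fieldName,
                          String fileName, long sizeBytes) {
        logger.log(Level.INFO,
                "Uploaded file info:" +
                "\nField Name:\t" + fieldName +
                "\nFile Name:\t" + fileName +
                "\nSize (Bytes):\t" + sizeBytes +
                "\n");
    }

    public static void main(String[] args) {
        Logger logger = Logger.getLogger("net.medicalcomputing.sketch");
        logger.setLevel(Level.INFO);
        CapturingHandler handler = new CapturingHandler();
        logger.addHandler(handler);
        logUpload(logger, "uploaded_file", "HD.txt", 1234L); // sample values
        System.out.println("Captured records: " + handler.records.size());
    }
}
```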
The critical lines of the application are:
InputStream uploadedStream = item.getInputStream();
BufferedReader br =
        new BufferedReader(new InputStreamReader(uploadedStream));
RichSequenceIterator sequence =
        org.biojavax.bio.seq.RichSequence.IOTools.readFastaDNA(br, null);
This is where an InputStream is obtained from the uploaded file and fed into the BioJava
RichSequenceIterator via a BufferedReader. Having connected the uploaded file
to the BioJava API, we can then use the latter to extract information from the DNA sequence. The
GC content is computed and the description in the FASTA file is extracted.
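For readers who want to check the arithmetic without BioJava, here is a simplified sketch of the same calculation in plain Java. It is not the BioJava method: it reads only the first record, treats the whole header line as the description, and does not validate the bases. All class and method names are hypothetical.

```java
/** Hypothetical illustration class; not part of the original application. */
class GcContentSketch {

    /** Header line and residues of one FASTA record. */
    static final class FastaRecord {
        final String header;
        final String residues;
        FastaRecord(String header, String residues) {
            this.header = header;
            this.residues = residues;
        }
    }

    /** Reads the first record from FASTA text; later records are ignored. */
    static FastaRecord readFirstRecord(String fasta) {
        String header = null;
        StringBuilder residues = new StringBuilder();
        for (String rawLine : fasta.split("\\r?\\n")) {
            String line = rawLine.trim();
            if (line.isEmpty()) continue;
            if (line.startsWith(">")) {
                if (header != null) break;      // start of a second record
                header = line.substring(1).trim();
            } else if (header != null) {
                residues.append(line);          // sequence may span several lines
            }
        }
        if (header == null) {
            throw new IllegalArgumentException("No FASTA header line found");
        }
        return new FastaRecord(header, residues.toString());
    }

    /** Percentage of residues that are G or C, ignoring case. */
    static double gcPercent(String residues) {
        if (residues.isEmpty()) {
            throw new IllegalArgumentException("Sequence is empty");
        }
        int gc = 0;
        for (int i = 0; i < residues.length(); i++) {
            char base = Character.toUpperCase(residues.charAt(i));
            if (base == 'G' || base == 'C') gc++;
        }
        return (gc * 100.0) / residues.length();
    }

    public static void main(String[] args) {
        String fasta = ">example hypothetical test sequence\nGGCC\nAATT\n";
        FastaRecord record = readFirstRecord(fasta);
        // 4 of the 8 bases are G or C, so this prints 50.0.
        System.out.println(record.header + ": " + gcPercent(record.residues));
    }
}
```

The explicit check for an empty sequence avoids a division by zero, a case the calculation has to guard against whichever parser is used.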
The information extracted from the sequence is then placed in the request object, and the
RequestDispatcher forwards the request back to the JavaServer Page index.jsp for display.
This keeps most of the Java code out of the HTML, although the JSP still contains some
scriptlet code. There are frameworks that reduce this mixing further, but they can be a task
to learn in themselves.
Here is the screen after uploading the file HD.txt.
Screenshot when FASTA File has been Uploaded
You will need several libraries to compile the code:
- servlet-api.jar - the Servlet API
- biojava-1.5-beta2.jar - or a later version of the BioJava API
- bytecode.jar - BioJava bytecode utilities
- commons-fileupload-1.2.jar - or a later version of the Apache FileUpload API
- commons-io-1.3.1.jar - or a later version of the Apache File IO utilities API
The web application descriptor file web.xml also declares a listener
to clean up the temporary files left over from uploading. The listener is part of the Apache
FileUpload component. If you have trouble putting together the web application, you can always
use my web application archive gc.war.
Please send ideas and opinions by email at alexamies@gmail.com.
© 2006-2007 Alex Amies