Approaches to Web Development for Bioinformatics


Processing Sequence Data

In this section I will give an example of processing DNA sequence data from a FASTA file in a web application. The example was inspired by the GCContent program in the BioJava tutorial. It shows that it is easy to adapt a command line example to a web environment: all you have to do is add the web user interface. The example also demonstrates use of the Apache Commons FileUpload component [60] to upload the FASTA format file.

The example starts with a JSP file to display a form prompting the user to upload a FASTA file.


<%@ page language="java" contentType="text/html; charset=UTF-8" pageEncoding="UTF-8"%>
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>FASTA File Processing Example</title>
</head>
<body>
<h1>FASTA File Processing Example</h1>
<%
String errorMessage = (String)request.getAttribute("errorMessage");
Double gcContent = (Double)request.getAttribute("GCContent");
if (errorMessage != null) {
%>
<p><%=errorMessage%></p>
<p>Upload another file:</p>
<%
} else if (gcContent != null) {
java.text.NumberFormat nf = java.text.NumberFormat.getInstance();
nf.setMaximumFractionDigits(0);
String gcString = nf.format(gcContent.doubleValue());
%> <p>The GC Content of your file is <%=gcString%> percent.
<%=request.getAttribute("Annotation")%></p>
<p>Upload another file:</p>
<%
} else {
%>
<p>Please upload a FASTA file to find out the GC content:</p>
<%
}
%>
<form method="post" action="gccontent" enctype="multipart/form-data">
<div>
<input type="file" name="uploaded_file"/><br/><br/>
<input type="submit" name="Upload" value="Upload" />
</div>
</form>
</body>
</html>

Before displaying the form the JSP checks whether there is already some information in the HTTP request object. This is data that the Servlet below forwards to the JSP. The first possible piece of data is an error message, which may be generated when processing an uploaded file. The next possible piece of data is the GC (guanine-cytosine) content of the DNA sequence, expressed as a percentage. If a number is present then it is formatted with the NumberFormat class, with the maximum number of fraction digits set to zero, so that it is displayed as a whole number. The third possible piece of information is a description of the sequence uploaded.
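The number formatting step can be tried outside the JSP. The following is a minimal sketch; the class name GcFormatExample and the use of Locale.US are assumptions made here so that the output is predictable, whereas the JSP itself uses the default locale of the server.

```java
import java.text.NumberFormat;
import java.util.Locale;

public class GcFormatExample {

    /** Formats a percentage with no fraction digits, as the JSP does. */
    public static String formatPercent(double value) {
        // Locale.US is an assumption for this sketch; the JSP calls
        // NumberFormat.getInstance() with no locale argument.
        NumberFormat nf = NumberFormat.getInstance(Locale.US);
        nf.setMaximumFractionDigits(0);
        return nf.format(value);
    }

    public static void main(String[] args) {
        System.out.println(formatPercent(41.7)); // prints 42
        System.out.println(formatPercent(58.2)); // prints 58
    }
}
```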

The form in the JSP page has an enctype of multipart/form-data. This indicates that the body of the HTTP request will be sent in several parts, one for each form field, so that the contents of the uploaded file can be included as one of the parts. The action of the form is important. It is the relative URL of the Servlet that will process the request. Apart from the submit button, the only field in the form is the upload widget of type "file". Here is what the page looks like when it is first displayed.
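To make the idea of a 'part' concrete, here is a rough sketch that builds the text of a multipart/form-data request body holding a single file part. The class name, the boundary string and the sample FASTA content are made up for illustration; a real browser chooses its own boundary and would normally also send a part for the Upload button.

```java
public class MultipartBodyExample {

    /**
     * Builds a multipart/form-data body carrying one file part.
     * The boundary is normally chosen by the browser; the caller
     * supplies an example value here.
     */
    public static String buildBody(String boundary, String fieldName,
                                   String fileName, String fileContent) {
        String crlf = "\r\n";
        StringBuilder body = new StringBuilder();
        // Each part starts with two hyphens followed by the boundary
        body.append("--").append(boundary).append(crlf);
        // Part headers: the form field name and the original file name
        body.append("Content-Disposition: form-data; name=\"").append(fieldName)
            .append("\"; filename=\"").append(fileName).append("\"").append(crlf);
        body.append("Content-Type: text/plain").append(crlf);
        // A blank line separates the part headers from the part content
        body.append(crlf);
        body.append(fileContent).append(crlf);
        // The closing boundary has two extra hyphens at the end
        body.append("--").append(boundary).append("--").append(crlf);
        return body.toString();
    }

    public static void main(String[] args) {
        String fasta = ">seq1 example sequence\nGATTACA\n";
        System.out.print(buildBody("----ExampleBoundary", "uploaded_file", "HD.txt", fasta));
    }
}
```

On the server side the FileUpload component parses a body of this shape and hands each part to the Servlet as a FileItem.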

Screenshot when FASTA File Upload Page is First Displayed

Here is the Servlet that processes the request. The code is in file GCContentServlet.java.


package net.medicalcomputing.web;

import java.io.*;
import java.util.*;
import java.util.logging.*;

import javax.servlet.*;
import javax.servlet.http.*;

import org.apache.commons.fileupload.FileItem;
import org.apache.commons.fileupload.FileItemFactory;
import org.apache.commons.fileupload.FileUploadException;
import org.apache.commons.fileupload.disk.DiskFileItemFactory;
import org.apache.commons.fileupload.servlet.ServletFileUpload;

import org.biojava.bio.BioException;
import org.biojava.bio.seq.DNATools;
import org.biojava.bio.symbol.Symbol;
import org.biojavax.bio.seq.RichSequence;
import org.biojavax.bio.seq.RichSequenceIterator;

/**
* Servlet that demonstrates use of the BioJava library to compute the GC content.
* Uses the Apache Commons FileUpload component to upload a FASTA format file.
*/
public class GCContentServlet extends HttpServlet implements Servlet {

/*
* Processes the POST method
*/
protected void doPost(HttpServletRequest request, HttpServletResponse response)
throws ServletException, IOException {

String errorMessage = null;

// Create a new file upload handler
FileItemFactory factory = new DiskFileItemFactory();
ServletFileUpload upload = new ServletFileUpload(factory);
try {
List items = upload.parseRequest(request);
Iterator it = items.iterator();
FileItem item = (FileItem)it.next();
if (!item.isFormField()) {
// Print info about file uploaded to the log
Logger.getLogger("net.medicalcomputing").log(Level.INFO, "Uploaded file info:" +
"\nField Name:\t" + item.getFieldName() +
"\nFile Name:\t" + item.getName() +
"\nContent Type:\t" + item.getContentType() +
"\nIs In Memory:\t" + item.isInMemory() +
"\nSize (Bytes):\t" + item.getSize() +
"\n");
InputStream uploadedStream = item.getInputStream();
BufferedReader br = new BufferedReader(new InputStreamReader(uploadedStream));
RichSequenceIterator sequence = org.biojavax.bio.seq.RichSequence.IOTools.readFastaDNA(br, null);
if (sequence.hasNext()) {
RichSequence seq = sequence.nextRichSequence();
int gc = 0;
for (int pos = 1; pos <= seq.length(); ++pos) {
Symbol sym = seq.symbolAt(pos);
if (sym == DNATools.g() || sym == DNATools.c())
++gc;
}
double gcContent = (gc * 100.0) / seq.length();
request.setAttribute("GCContent", Double.valueOf(gcContent));

// Put annotation info in request
String annotation = "";
if (seq.getDescription() != null) {
annotation = "Information about your sequence: " + seq.getDescription();
}
request.setAttribute("Annotation", annotation);
}
}
} catch (FileUploadException e) {
errorMessage = "Error uploading file: " + e.getMessage();
Logger.getLogger("net.medicalcomputing").log(Level.WARNING, e.getMessage(), e);
} catch (NoSuchElementException e) {
errorMessage = "Error processing file: " + e.getMessage();
Logger.getLogger("net.medicalcomputing").log(Level.WARNING, e.getMessage(), e);
} catch (BioException e) {
errorMessage = "Error processing file: " + e.getMessage();
Logger.getLogger("net.medicalcomputing").log(Level.WARNING, e.getMessage(), e);
}

// Set the error message in the request
request.setAttribute("errorMessage", errorMessage);

// Forward back to the JSP to display the data
getServletContext().getRequestDispatcher("/index.jsp").forward(request, response);
}
}

After importing the relevant Java and Servlet libraries, the Servlet imports the Apache Commons FileUpload classes and finally the BioJava classes. Using * as a wildcard in import statements can save space, but it is good practice to write out the full import for each class to help people who have to read your code.

The Servlet uses the DiskFileItemFactory to help in uploading the file. This saves a temporary copy of the uploaded file on disk when the file is too large to keep in memory, so that the application server does not have to hold the entire file in memory. After getting a reference to the uploaded FileItem, the Servlet logs some information about the file being uploaded. This information is written to the application server log files.
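The idea of keeping small uploads in memory and moving larger ones to a temporary file can be illustrated with the standard library alone. The class SpillingBuffer below is hypothetical and is not part of Commons FileUpload, nor is it how that component is implemented; it is only a sketch of the technique, with a threshold chosen by the caller.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical illustration only; not part of Commons FileUpload. */
public class SpillingBuffer {
    private final int threshold;
    private final ByteArrayOutputStream memory = new ByteArrayOutputStream();
    private Path tempFile;       // null while the data is still held in memory
    private OutputStream fileOut;
    private long size;

    public SpillingBuffer(int threshold) {
        this.threshold = threshold;
    }

    public void write(byte[] data) {
        try {
            if (tempFile == null && size + data.length > threshold) {
                // Threshold exceeded: move what has been buffered to a temporary file
                tempFile = Files.createTempFile("upload_", ".tmp");
                fileOut = Files.newOutputStream(tempFile);
                memory.writeTo(fileOut);
                memory.reset();
            }
            if (tempFile == null) {
                memory.write(data, 0, data.length);
            } else {
                fileOut.write(data);
            }
            size += data.length;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public boolean isInMemory() { return tempFile == null; }

    public long getSize() { return size; }

    /** Closes the file stream and deletes the temporary file, if one was created. */
    public void close() {
        try {
            if (fileOut != null) {
                fileOut.close();
                Files.deleteIfExists(tempFile);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        SpillingBuffer small = new SpillingBuffer(1024);
        small.write(new byte[100]);
        System.out.println(small.isInMemory()); // prints true
        SpillingBuffer large = new SpillingBuffer(1024);
        large.write(new byte[4096]);
        System.out.println(large.isInMemory()); // prints false
        small.close();
        large.close();
    }
}
```

The "Is In Memory" value that the Servlet writes to the log reports the same distinction for the real FileItem.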

The critical lines of the application are


InputStream uploadedStream = item.getInputStream();
BufferedReader br = new BufferedReader(new InputStreamReader(uploadedStream));
RichSequenceIterator sequence = org.biojavax.bio.seq.RichSequence.IOTools.readFastaDNA(br, null);

This is where an InputStream is obtained from the uploaded file, wrapped in a BufferedReader and passed to the BioJava readFastaDNA method, which returns a RichSequenceIterator. Having connected the uploaded file to the BioJava API we can then use the latter to extract information from the DNA sequence. The GC content is computed and the description in the FASTA file is extracted.
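For readers who want to check the arithmetic without BioJava, here is a minimal plain Java sketch of the same GC content calculation for the first record in a FASTA stream. It is not the BioJava API: the class name PlainGcContent is made up, and counting lower case letters and treating every other character as a non-GC base are simplifying assumptions of this sketch.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.List;
import java.util.stream.Collectors;

public class PlainGcContent {

    /** Returns the GC percentage of the first record in a FASTA stream. */
    public static double gcPercent(BufferedReader reader) {
        List<String> lines = reader.lines().collect(Collectors.toList());
        long gc = 0;
        long total = 0;
        boolean inRecord = false;
        for (String raw : lines) {
            String line = raw.trim();
            if (line.startsWith(">")) {
                if (inRecord) {
                    break;           // a second header ends the first record
                }
                inRecord = true;     // header line of the first record
                continue;
            }
            if (!inRecord) {
                continue;            // ignore anything before the first header
            }
            for (char c : line.toCharArray()) {
                char base = Character.toUpperCase(c);
                if (base == 'G' || base == 'C') {
                    gc++;
                }
                total++;
            }
        }
        if (total == 0) {
            throw new IllegalArgumentException("No sequence data found");
        }
        return gc * 100.0 / total;
    }

    public static void main(String[] args) {
        String fasta = ">seq1 example\nGGCC\nAATT\n";
        BufferedReader br = new BufferedReader(new StringReader(fasta));
        System.out.println(gcPercent(br)); // prints 50.0
    }
}
```

The BioJava version in the Servlet has the advantage that the parser validates the symbols against the DNA alphabet, which this sketch does not attempt.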

The information extracted from the sequence is then placed in the request object, and the RequestDispatcher is used to forward the request back to the JavaServer Page index.jsp, which displays the page. In this way the Java code has been somewhat separated from the HTML code. There are frameworks that reduce this mixing further, but they can be a task to learn in themselves. Here is the screen after uploading the file HD.txt.

Screenshot when FASTA File has been Uploaded

You will need several libraries to compile the code, at least those that the Servlet imports: the Servlet API, Apache Commons FileUpload and BioJava.

The web application descriptor file web.xml also declares a listener to clean up the temporary files left over from uploading. This listener is part of the Apache Commons FileUpload component. If you have trouble putting together the web application you can always use my web application archive gc.war.



Please send ideas and opinions by email at alexamies@gmail.com.

© 2006-2007 Alex Amies