Approaches to Web Development for Bioinformatics
This section discusses the basics of the Java language with a bioinformatics flavor.
The section Web Programming with Java discusses developing
web user interfaces with Java and the section BioJava
discusses the BioJava open source project. Several other sections give Java examples as
well, including the sections AJAX and
Accessing a Relational Database with Java.
The Sun Microsystems Java web site [16] is a great
place to begin studying Java. You can freely download the latest Java
Development Kit (JDK) and work your way through the Java Tutorial.
Here is a Java program that does the same thing as our Hello World Perl
program. The file is Hello.java.
public class Hello {
    public static void main(String[] argv) {
        System.out.println("Hello World!");
    }
}
Compile it by entering the following command, which generates a
Hello.class file in the same directory:
> javac Hello.java
Run the program by entering the command
> java Hello
The first thing you may notice when comparing it to our Perl Hello World
program is that it takes three lines to write instead of one, ignoring
the comments in the Perl program and the closing braces. All Java programs must be
encapsulated in a class (Hello). Procedural code is
usually encapsulated in a method (main). There are
also concepts of class and method visibility (public). There is
the concept of static versus instance in the main method
signature: main is static, so it can be called without first creating a Hello object.
The method return type (void) must be
specified even if nothing is returned. Finally, to print the text you have to go
through the class System to reach its static field out, an output stream
(a java.io.PrintStream), and call that stream's println() method.
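The concepts listed above are easier to see in a slightly larger program. The following is a minimal sketch that is not part of the original text; the class name Greeter and its greeting() method are hypothetical. It adds an instance field, a constructor, and an instance method with a non-void return type, and it copies System.out into a local PrintStream variable to show what out actually is.

```java
import java.io.PrintStream;

/** Hypothetical variation on Hello.java that makes the concepts above explicit. */
public class Greeter {
    // Instance field: each Greeter object holds its own name.
    private final String name;

    // Constructor: runs when an instance is created with "new".
    public Greeter(String name) {
        this.name = name;
    }

    // Instance method with a declared (non-void) return type.
    public String greeting() {
        return "Hello " + name + "!";
    }

    // Static method: called on the class itself, no Greeter instance needed.
    public static void main(String[] argv) {
        // System is a class; out is one of its static fields, a PrintStream.
        PrintStream out = System.out;
        Greeter greeter = new Greeter("World");
        out.println(greeter.greeting()); // prints "Hello World!"
    }
}
```

The printed result is the same as Hello.java; the difference is that greeting() can now be reused by any other class that needs the text rather than the console output.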
Although it seems more complex than the Perl example, all
these concepts have been developed with code reuse in mind. Let's look
at an example similar to the Perl regular expression example that was used to
validate that all the symbols in a DNA string were valid
nucleotides. The file is Validate.java.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Class to demonstrate validation of an input string to make
 * sure all symbols are valid nucleotides.
 */
public class Validate {

    /**
     * The entry point for the program.
     * @param argv The first and only command line argument is the string to validate
     */
    public static void main(String[] argv) {
        if (argv.length != 1) {
            System.err.println("Usage: java Validate <sequence>");
            System.exit(1);
        }
        // Matches any character that is not a, t, c, or g (in either case)
        Pattern pattern = Pattern.compile("[^atcg]", Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher(argv[0]);
        // find() searches for the first invalid symbol
        if (matcher.find()) {
            System.out.println("Invalid symbol " + matcher.group()
                    + " found at position " + matcher.start() + ".");
        }
    }
}
Compile the class with the command
>javac Validate.java
Test the program with similar input to the Perl script:
>java Validate atcgNSTOP
The output is similar to that of the Perl script (matcher.start() counts positions from zero):
Invalid symbol N found at position 4.
A couple of points are immediately obvious:
- The regular expression is not part of the basic syntax of the
language but an argument to the compile() method of the class
Pattern. It is more verbose than the Perl example.
- Evaluation of the regular expression is a two-step process (compile a
Pattern, then create a Matcher for the input string) rather than the
single step in Perl, but the result is the same.
- The comments start with /**. This allows the javadoc tool to
generate HTML describing the class. The @param tag is used to describe
arguments to methods and is copied into the generated HTML.
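One benefit of the two-step design is that a Matcher remembers where its last match ended, so calling find() in a loop reports every invalid symbol rather than just the first, much like a global match in Perl. The sketch below is a hypothetical extension of Validate.java, not part of the original; the class name ValidateAll and the method invalidSymbols() are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical extension of Validate.java that reports every invalid symbol. */
public class ValidateAll {
    // Compiled once and reused for every call.
    private static final Pattern INVALID =
            Pattern.compile("[^atcg]", Pattern.CASE_INSENSITIVE);

    /** Returns one message per invalid symbol; positions are counted from zero. */
    public static List<String> invalidSymbols(String sequence) {
        List<String> messages = new ArrayList<String>();
        Matcher matcher = INVALID.matcher(sequence);
        // Each call to find() resumes the search where the previous match ended.
        while (matcher.find()) {
            messages.add("Invalid symbol " + matcher.group()
                    + " found at position " + matcher.start() + ".");
        }
        return messages;
    }

    public static void main(String[] argv) {
        if (argv.length != 1) {
            System.err.println("Usage: java ValidateAll <sequence>");
            System.exit(1);
        }
        for (String message : invalidSymbols(argv[0])) {
            System.out.println(message);
        }
    }
}
```

With the same input as before, java ValidateAll atcgNSTOP reports N at position 4, S at 5, O at 7, and P at 8; the T is accepted because the pattern is case insensitive.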
The language constructs described above allow programmers to
better encapsulate their code within an application programming
interface (API). The interfaces exposed to users of the API can be
restricted to the minimum needed. Users of the API are constrained to
use it in a particular way, according to the Java types in the
interface, and the Java platform comes with tools to document the use
of APIs in HTML. This creates some programming overhead, but many
large-scale software development projects, for both products and
in-house systems in all kinds of businesses, have judged it worthwhile.
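As an illustration of a minimal API, the validation logic from Validate.java could be wrapped so that callers see only a few public methods while the regular expression stays private. This is a hypothetical sketch, not code from the original; the class NucleotideValidator and the methods forDna(), forRna(), firstInvalidPosition(), and isValid() are names chosen for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical API sketch: callers see only the public methods;
 * the regular expression and its compiled form stay private.
 */
public final class NucleotideValidator {
    /** Compiled pattern matching one invalid symbol; invisible outside this class. */
    private final Pattern invalidSymbol;

    // Private constructor: instances come only from the factory methods below.
    private NucleotideValidator(String validSymbols) {
        this.invalidSymbol = Pattern.compile("[^" + validSymbols + "]",
                Pattern.CASE_INSENSITIVE);
    }

    /** @return a validator that accepts the DNA nucleotides a, c, g, t */
    public static NucleotideValidator forDna() {
        return new NucleotideValidator("acgt");
    }

    /** @return a validator that accepts the RNA nucleotides a, c, g, u */
    public static NucleotideValidator forRna() {
        return new NucleotideValidator("acgu");
    }

    /**
     * @param sequence the sequence to check
     * @return the zero-based position of the first invalid symbol, or -1 if none
     */
    public int firstInvalidPosition(String sequence) {
        Matcher matcher = invalidSymbol.matcher(sequence);
        return matcher.find() ? matcher.start() : -1;
    }

    /** @return true if every symbol in the sequence is valid */
    public boolean isValid(String sequence) {
        return firstInvalidPosition(sequence) == -1;
    }

    public static void main(String[] argv) {
        System.out.println(forDna().firstInvalidPosition("atcgNSTOP")); // prints 4
        System.out.println(forRna().isValid("augGCA"));                 // prints true
    }
}
```

Because the constructor and the Pattern field are private, the way validation is implemented could later change without affecting any code that calls the public methods, and javadoc would document only those public methods by default.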
Finally, to complete our comparison of Perl with Java, let's look at
an example that translates an RNA sequence to an amino acid sequence.
The Java source is in Translate.java.
import java.util.HashMap;
import java.util.Map;

/**
 * Class demonstrates translation of an RNA sequence into an amino
 * acid sequence.
 */
public class Translate {

    // RNA string to translate
    private static final String RNA =
            "auggcacaggcacuguugguacccccaggaccugaaagcuuccgccuuuuuacuaga";

    // Map from each codon to the amino acid it encodes
    private static final Map<String, String> TRANSLATION =
            new HashMap<String, String>();

    // Codons to translate
    private static final String[] CODONS = {
        "uuu", "uuc", "uua", "uug", "ucu", "ucc", "uca", "ucg",
        "uau", "uac", "uaa", "uag", "ugu", "ugc", "uga", "ugg",
        "cuu", "cuc", "cua", "cug", "ccu", "ccc", "cca", "ccg",
        "cau", "cac", "caa", "cag", "cgu", "cgc", "cga", "cgg",
        "auu", "auc", "aua", "aug", "acu", "acc", "aca", "acg",
        "aau", "aac", "aaa", "aag", "agu", "agc", "aga", "agg",
        "guu", "guc", "gua", "gug", "gcu", "gcc", "gca", "gcg",
        "gau", "gac", "gaa", "gag", "ggu", "ggc", "gga", "ggg"
    };

    // Amino acid (one-letter code) for the codon at the same index in CODONS
    private static final String[] AMINO_ACIDS = {
        "F", "F", "L", "L", "S", "S", "S", "S",
        "Y", "Y", "--STOP--", "--STOP--", "C", "C", "--STOP--", "W",
        "L", "L", "L", "L", "P", "P", "P", "P",
        "H", "H", "Q", "Q", "R", "R", "R", "R",
        "I", "I", "I", "M", "T", "T", "T", "T",
        "N", "N", "K", "K", "S", "S", "R", "R",
        "V", "V", "V", "V", "A", "A", "A", "A",
        "D", "D", "E", "E", "G", "G", "G", "G"
    };

    // Initialize the map
    private static void init() {
        for (int i = 0; i < CODONS.length; i++) {
            TRANSLATION.put(CODONS[i], AMINO_ACIDS[i]);
        }
    }

    /**
     * The entry point for the program.
     * @param argv No command line arguments are used
     */
    public static void main(String[] argv) {
        init();
        StringBuilder aminoAcidSequence = new StringBuilder();
        // Step through the RNA one codon (three bases) at a time,
        // stopping before any incomplete trailing codon
        for (int i = 0; i + 3 <= RNA.length(); i += 3) {
            aminoAcidSequence.append(TRANSLATION.get(RNA.substring(i, i + 3)));
        }
        System.out.println("Amino acid string: " + aminoAcidSequence);
    }
}
The program can be compiled and run with the commands
>javac Translate.java
>java Translate
Amino acid string: MAQALLVPPGPESFRLFTR
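Although the program is more verbose than the Perl equivalent, there
are a couple of notable things:
- Because of the Java type system, a variable can be declared as a more
general type than the object it refers to. TRANSLATION is declared as the
interface Map but refers to a HashMap, so the implementation could be
changed without affecting the code that uses it.
- The type parameters of the generic Map interface ensure that only the
correct types are put into and retrieved from the map. This is shown by
the fragment
Map<String, String> TRANSLATION = new HashMap<String, String>().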
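Both points can be seen in a small codon-counting sketch. It is hypothetical (the class CodonCount and the method countCodons() are not from the original): the counts variable is declared as a Map but created as a TreeMap, which keeps its keys sorted, and swapping in a HashMap would require no change anywhere else. The type parameters also let the compiler reject a call such as counts.put(42, "x").

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical example: count codon usage in an RNA string. */
public class CodonCount {
    /**
     * Counts each codon in the first reading frame.
     * Trailing bases that do not form a complete codon are ignored.
     */
    public static Map<String, Integer> countCodons(String rna) {
        // Declared as the general Map interface; TreeMap keeps keys sorted.
        // With Map<String, Integer>, counts.put(42, "x") would not compile.
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (int i = 0; i + 3 <= rna.length(); i += 3) {
            String codon = rna.substring(i, i + 3);
            Integer previous = counts.get(codon); // typed as Integer, no cast needed
            counts.put(codon, previous == null ? 1 : previous + 1);
        }
        return counts;
    }

    public static void main(String[] argv) {
        System.out.println(countCodons("auggcacaggca")); // prints {aug=1, cag=1, gca=2}
    }
}
```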
A number of tools for Java development are included with the basic
Java Development Kit (JDK). These include the compiler javac,
the documentation tool javadoc, the Java ARchive
(compression) tool jar, and performance
instrumentation tools. In addition, there are
probably more freely available development tools for Java than for any other
platform, including
- Integrated development environments, such as Eclipse and Forte (NetBeans),
that include graphical debuggers, code completion, syntax
coloring, static code checking, and code review assistants, among a
huge number of other features
- The JUnit unit testing framework
- Aspect-oriented programming tools, such as AspectJ
There are also a number of commercial tools that can be useful for
web development and performance testing.
Java has been the language of choice for introductory computer
science courses at many universities and for many companies in project
and product development. It is a very well designed language and
it is a pleasure (for me, at least) to program in. However, it
has a number of disadvantages:
- It is not as quick to pick up as Perl and other scripting
languages. This can be a significant barrier to people who are
not software engineering professionals or who do not have a team of
software engineering professionals at their disposal for programming
tasks. Many fundamental aspects of the language were introduced to
support large programming teams and ease of maintenance, and these same
features can be a barrier to more casual programmers.
- Although it is often the choice for large projects and products with web
interfaces, the minimum cost and barrier to entry can be higher than for
other platforms, such as Perl / CGI, PHP, and Microsoft .NET.
- Java is inherently at least a little slower than C, and the use of a garbage
collector to clean up memory does add some overhead. A more
significant problem is startup time. Java Virtual Machines
(JVMs) have become larger and more bloated over the years, resulting in
slower startup times and larger memory footprints. A bigger problem,
however, is the programming culture. Java culture has often
focused on style and capability at the expense of performance.
One case in point is Enterprise JavaBeans (EJBs), which are
notoriously complex and slow and do not deliver the value promised. All
that said, Java is an efficient platform when programs are
constructed with care.
The availability of open source projects in Java is also an
important factor. There are a number of open source
bioinformatics projects, such as BioJava, and there are a huge number
of other open source Java projects of all kinds.
See the page Java Resources on this
web site for a list of popular and user suggested online resources.
Please send ideas and opinions by email at alexamies@gmail.com.
© 2006-2007 Alex Amies