Approaches to Web Development for Bioinformatics

Web Servers

A web server receives and responds to Hypertext Transfer Protocol (HTTP) requests, so it is more accurately called an HTTP server. When developing a web application, your main concern will be how your application programming language interfaces with the web server. That is discussed for several programming languages in the following sections. However, some aspects are common to many application programming languages and others are particular to certain web servers. These include file permissions and access, authentication, and encryption. The individual web servers and those common aspects are the subject of this section.
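To make the request and response cycle concrete, here is a minimal sketch of an HTTP server written with the Python standard library. It is only an illustration and not a substitute for a production server such as Apache. The names HelloHandler and fetch_once are invented for this example.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: answers every GET with a short text body."""

    def do_GET(self):
        body = ("You requested " + self.path + "\n").encode("utf-8")
        self.send_response(200)  # status line
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()  # blank line that ends the headers
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet; a real server would log requests


def fetch_once(path):
    """Start the server on a free local port, send one GET, shut down."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        port = server.server_address[1]
        conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        conn.request("GET", path)
        response = conn.getresponse()
        result = (response.status, response.read().decode("utf-8"))
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(fetch_once("/index.html"))
```

The handler shows the three parts of every HTTP response: a status line, headers, and a body.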

The two most commonly used web servers are the Apache HTTP Server and Microsoft Internet Information Services (IIS). A number of others are also available. Java application servers usually include their own web server but also provide plug-ins that support integration with other web servers. It can be a good idea to put a separate, more production-ready web server in front of the Java application server for security purposes. Many development environments, including Eclipse and Microsoft Visual Studio, have built-in web servers, which makes it less necessary for developers to be familiar with production web servers.

Regardless of the web server you choose, two files that you will probably want to have in the root document directory are robots.txt and favicon.ico. The file robots.txt tells web crawlers where to look and where not to look to index the files on your server. A basic robots.txt file is


# /robots.txt file

User-agent: *
Disallow: /cgi-bin/

This tells all web crawlers that they may look anywhere except the cgi-bin directory.
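One way to check how a well-behaved crawler would read these rules is the robots.txt parser in the Python standard library. The sketch below assumes the Disallow path is written with a leading slash. The agent name ExampleBot and the helper can_crawl are invented for illustration.

```python
from urllib.robotparser import RobotFileParser

# The same rules as the basic robots.txt file: every crawler is asked
# to stay out of /cgi-bin/ and may fetch everything else.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
"""


def can_crawl(path, agent="ExampleBot"):
    """Return True if the rules above let the given agent fetch the path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for path in ("/index.html", "/cgi-bin/geneinfo.pl"):
        print(path, can_crawl(path))
```

Keep in mind that robots.txt is only a request to crawlers. It does not restrict access, so it is no replacement for file permissions or authentication.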

The file favicon.ico is the icon that browsers place at the left-hand side of the address bar and on tabs. You can create one with a graphics program such as GIMP. Some browsers (mostly IE) automatically look for the favicon.ico file in the same directory that the page is served from. You can also explicitly direct browsers to the icon file with a link element in the HTML head:

HTML

<link rel="shortcut icon" href="/favicon.ico"/>

Apache HTTP Server

The Apache HTTP Server is reportedly the most commonly used web server. It is a favorite of Internet Service Providers (ISPs) and of the open source community. The Apache HTTP Server can be freely downloaded from the Apache site [51]. There are versions of the server for many different operating systems, including Windows, for which there is an easy installation program. My environment at present is Apache 2.2.4 on Windows for development and an older version of Apache on Linux in production.

The Apache executable is called httpd and is found in the bin directory under the Apache installation tree. On Windows you can start it as a Windows service using the convenient Apache icon in the system tray. The configuration file is httpd.conf in the conf directory. The first thing that you will likely want to do is change the document root, which is the location of the files served in response to HTTP requests. To do that, modify the DocumentRoot directive and add or modify the matching <Directory> element.

By default the Apache log files access.log and error.log are written to the logs directory. The first things that you will probably notice are 404 File Not Found entries for robots.txt and favicon.ico.

Apache modules are a way to add functionality to the basic web server. The Common Gateway Interface (CGI) and PHP web platforms discussed later in this article are supported through Apache modules. Other useful modules are available for authorization, virtual hosts, logging, and URL mapping.

Setting Up the Common Gateway Interface

The Common Gateway Interface (CGI) is discussed in the section Web User Interfaces of this article. Here I include some notes on setting it up for Apache. See the Apache Tutorial: Dynamic Content with CGI and Using Apache with Microsoft Windows [51].

To set up CGI on Apache you need to load the CGI module with a line like this in httpd.conf:

Apache Configuration File Fragment

LoadModule cgi_module modules/mod_cgi.so

It is a good practice to set a script alias (a directive of the mod_alias module, discussed below) to prevent scripts being accessed directly as files. This can be done with a line like this in httpd.conf:

Apache Configuration File Fragment

ScriptAlias /cgi-bin/ "C:/cgi-bin/"

Forward slashes are used in paths on Windows, just as on UNIX. A directory stanza is also needed. It should look something like this:

Apache Configuration File Fragment

<Directory "C:/cgi-bin">
AllowOverride None
Options None
Order allow,deny
Allow from all
</Directory>

On UNIX and Linux the first line of a script usually starts with #! to let the server know what program to use to execute the script. On Windows it is convenient to set the ScriptInterpreterSource directive so that the interpreter is looked up in the Windows registry. You can do that with a line like

Apache Configuration File Fragment

ScriptInterpreterSource registry
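A small test script is a quick way to confirm that the CGI setup works. The sketch below is written in Python purely for illustration; the scripts referred to in this article are Perl and PHP, and the function name build_response is invented. A CGI script writes the response headers, then a blank line, then the body to standard output, and the server passes the query string in the QUERY_STRING environment variable.

```python
#!/usr/bin/env python3
# The #! line above is what UNIX and Linux use to find the interpreter.
# With ScriptInterpreterSource registry, Windows uses the program
# registered for the file extension instead.
import os
import sys


def build_response(query_string):
    """Build a complete CGI response: headers, a blank line, then the body."""
    body = "CGI is working. Query string: " + (query_string or "(none)") + "\n"
    return "Content-Type: text/plain\n\n" + body


if __name__ == "__main__":
    # The web server sets QUERY_STRING before it runs the script.
    sys.stdout.write(build_response(os.environ.get("QUERY_STRING", "")))
```

If the script is saved in the script alias directory under a name such as hello.py (a hypothetical name), requesting /cgi-bin/hello.py?symbol=HD should return the text with the query string echoed back, assuming a Python interpreter is installed and the file is executable on UNIX and Linux.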

URL Mapping and REST Patterns

The mod_alias and mod_rewrite modules are designed for URL mapping. The mod_alias module is mostly suitable for mapping URLs to files that can exist either on your server or somewhere else. The mod_rewrite module can map URLs that match regular expressions to either files or scripts that generate dynamic content. This can be useful in mapping REST patterns to scripts that process requests and dynamically generate data without requiring users to see the unfriendly URLs involved. See the section Application Integration for a discussion of REST.

To enable the rewrite module, add or uncomment this line in the httpd.conf file:

Apache Configuration File Fragment

LoadModule rewrite_module modules/mod_rewrite.so

As an example, to map URLs of the form http://host/gene/symbol, where symbol is an arbitrary gene symbol, to a script that dynamically generates content about the gene, you could use this rewriting rule:

Apache Configuration File Fragment

<IfModule rewrite_module>
RewriteEngine on
RewriteRule ^/gene/?(.*) /cgi-bin/geneinfo.pl?symbol=$1 [QSA]
</IfModule>

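The stanza is only used if rewrite_module is loaded. It turns the rewrite engine on. Any URL path matching the regular expression is mapped to the script geneinfo.pl in the cgi-bin directory, with the symbol of the gene passed in the query string. In the regular expression, ^ anchors the match at the start of the URL path, /? makes the slash after gene optional, and (.*) captures the rest of the path, which is available in the substitution as $1.

For example, the URL http://localhost/gene/HD is mapped to http://localhost/cgi-bin/geneinfo.pl?symbol=HD. The QSA (query string append) flag is intended for rules that modify the query string: it appends the query string of the original request to the one set by the rule instead of discarding it.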
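The mapping can be tried out away from the server with a short Python sketch that applies the same regular expression to a URL path. It only approximates what mod_rewrite does for this one rule. The function name rewrite and the format=xml parameter in the usage example are invented for illustration.

```python
import re

# Pattern and substitution copied from the RewriteRule. Apache writes the
# back-reference as $1; Python's re module writes it as \1.
PATTERN = re.compile(r"^/gene/?(.*)")
SUBSTITUTION = r"/cgi-bin/geneinfo.pl?symbol=\1"


def rewrite(url_path, query_string=""):
    """Return the rewritten URL, or None if the rule does not match."""
    match = PATTERN.match(url_path)
    if match is None:
        return None
    target = match.expand(SUBSTITUTION)
    # QSA: append the query string sent by the client instead of dropping it.
    if query_string:
        target += "&" + query_string
    return target


if __name__ == "__main__":
    print(rewrite("/gene/HD"))
    print(rewrite("/gene/HD", "format=xml"))
    print(rewrite("/index.html"))
```

Trying a few paths this way also shows that the pattern is fairly loose. Because the slash is optional, a path such as /genes matches too, with s captured as the symbol. A stricter pattern such as ^/gene/(.+)$ is one option if that is not wanted.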



Please send ideas and opinions by email at alexamies@gmail.com.

© 2006-2007 Alex Amies