Apache Perl CGI Bioinformatics II - Tools for Functional Genomics

Perl CGI Jul-2008 C1

Execution of Perl scripts in internet pages (Apache web server, Windows XP)

Perl scripts can also be embedded in web pages that are served by a web server such as Apache. The Perl code is executed through the CGI (Common Gateway Interface) module of the web server and generates new web pages dynamically. To execute CGI scripts from inside web pages, e.g. for calculations, the inclusion of scripts (Include module) must be enabled so that "server-side includes" (SSI) can be used.

Security: Server-side includes and the execution of CGI scripts provide additional targets for attacks and are therefore normally disabled. Compare the Apache security tips:

http://httpd.apache.org/docs/2.2/misc/security_tips.html

1. Configuration of web server Apache

We chose the latest stable Apache version, Apache 2.2.9, and ActiveState Perl 5.10.0.1003. Apache can best be installed as part of a WAMPP system (Windows, Apache, MySQL, PHP, phpMyAdmin) as described in WAMPP_install.pdf, chapter 2. The Perl distribution is installed as described in perl_install_xp.pdf.

The configuration file is httpd.conf in "C:\Program Files\Apache Software Foundation\Apache2.2\conf"

The web pages presented by the Apache server are located in the \htdocs directory: "C:\Program Files\Apache Software Foundation\Apache2.2\htdocs"

The Perl scripts executed by the Apache web server are located in: "C:\Program Files\Apache Software Foundation\Apache2.2\cgi-bin"

For enabling CGI-scripts the configuration file httpd.conf in the folder "C:\Program Files\Apache Software Foundation\Apache2.2\conf" has to be edited:

1a. To load the respective Apache modules at start-up the comment sign "#" has to be removed from the LoadModule section for the cgi_module and include_module, if not already done:

….
LoadModule cgi_module modules/mod_cgi.so
…..
LoadModule include_module modules/mod_include.so
….

1b. The web pages root directory /htdocs and the CGI script directory /cgi-bin should both be enabled for the execution of includes, and the /cgi-bin directory additionally for the execution of CGI scripts. To do so, the "Options" parameter is modified by adding "+IncludesNOEXEC" for /htdocs and "+Includes" and "+ExecCGI" for /cgi-bin, as described in: http://httpd.apache.org/docs/2.2/en/mod/core.html#options

<Directory "C:/Program Files/Apache Software Foundation/Apache2.2/htdocs">
….
Options Indexes FollowSymLinks +IncludesNOEXEC
….
</Directory>

….

<Directory "C:/Program Files/Apache Software Foundation/Apache2.2/cgi-bin">
…
Options +Includes +ExecCGI
…
</Directory>

Security: This configuration allows CGI scripts to be executed only when they are located inside the "\cgi-bin" directory. The option "IncludesNOEXEC" permits server-side includes but does not allow an include to start an executable script directly by using "exec cgi" or "exec cmd". Scripts can only be called using "include virtual" from a dedicated script directory "\cgi-bin", remapped by the "ScriptAlias" directive of the httpd.conf file. The script directory is not a subdirectory of the web pages root directory "/htdocs" and is therefore only reachable through the remapping and through "include virtual" calls.

1c. Check the correct setting of the directory for the CGI scripts (ScriptAlias). The alias "/cgi-bin/" is mapped to a specific script directory:

<IfModule alias_module>
…
ScriptAlias /cgi-bin/ "C:/Program Files/Apache Software Foundation/Apache2.2/cgi-bin/"
</IfModule>

1d. The extension ".pl" of Perl scripts should be added to the AddHandler line for the "cgi-script" handler, and the comment sign "#" in front of that line should be removed. Further down, the MIME type "text/html" is assigned to the extension ".shtml", and an output filter is switched on that scans files of this type for includes. The comment signs "#" in front of these lines should be removed as well:

<IfModule mime_module>
…
AddHandler cgi-script .cgi .pl
…
AddType text/html .shtml
AddOutputFilter INCLUDES .shtml
…
</IfModule>

1e. The modified httpd.conf file is saved and the Apache web server has to be stopped and restarted to read the new configuration settings. By double clicking on the Apache Monitor Tool you can open the Monitor window. Here you can "stop" and "start" the httpd server.

Security and performance: Includes are only executed from special .shtml files and not searched for in normal .html or other files. This also frees the web server from scanning every file for includes.

2. Modification of Perl scripts

Every Perl script should begin with a "shebang" line as its very first line, not indented, starting with the shebang symbol "#!" and followed by the path of the Perl executable. The flag "-w" switches on warnings for extended error reporting.

#!C:/Program Files/Perl/bin/perl.exe -w

Using the Perl module CGI::Carp allows sending error messages produced during execution of the Perl script to the Browser Window:

use CGI::Carp qw(fatalsToBrowser);

3. Calling the Perl script from a HTML web site

From an HTML file the Perl CGI script is called

1) either directly as a <a href> web link, which deletes the old page and opens a new one interpreting the output of the Perl scripts "print" commands as HTML text and commands:

<a href="/cgi-bin/printenv2.pl">Environment variables</a>

2) as an included part of a web page using the call "#exec cgi" for a script not receiving additional parameter values from the calling web site or using "#include virtual" for scripts which may use parameters delivered by the calling web site.

The calls are surrounded by HTML comment tags ("<!-- … -->"). If the directory option "IncludesNOEXEC" was used in httpd.conf, only "include virtual" is possible, and only for scripts in a directory defined by the httpd.conf directive "ScriptAlias":

<!--#exec cgi="/cgi-bin/printenv2.pl" -->

<!--#include virtual="/cgi-bin/printenv2.pl" -->

4. Test-HTML for executing the Perl script

The following HTML code can be saved as a file "showenv.shtml" in the web server root directory "C:\Program Files\Apache Software Foundation\Apache2.2\htdocs". It will call the Perl script "printenv2.pl" shown below, which should be saved in the web server cgi-bin folder: "C:\Program Files\Apache Software Foundation\Apache2.2\cgi-bin".

The web site can be called by targeting http://localhost/showenv.shtml with your browser. The Perl script "printenv2.pl" will list the environment variables. For demonstration it is called directly from the page as an include using "include virtual", or it is executed by clicking on the link "Environment variables", which produces a new page.

HTML code showenv.shtml:

<html>
<head>
<title>Environment</title>
<style type="text/css">
body { font-family:HELVETICA, ARIAL, sans-serif; font-size:9pt; line-height:12pt; }
</style>
</head>
<body>
<a href="/cgi-bin/printenv2.pl">Environment variables</a><br><br>
<!--#echo var="SERVER_NAME" --><br><br>
<!--#config timefmt="%d.%m.%Y, %H.%M" -->
<!--#echo var="DATE_LOCAL" --><br><br>
<!--#include virtual="/cgi-bin/printenv2.pl" -->
</body>
</html>

Perl script printenv2.pl:

#!C:/Program Files/Perl/bin/perl.exe -w

# printenv2 -- CGI program printing its environment

use strict;

# send error messages to the browser
use CGI::Carp qw(fatalsToBrowser);

# set MIME type text/html and character set
print "Content-type: text/html; charset=iso-8859-1\n\n";

# Printing the environment variables to the calling HTML page
# Printing is done one by one in a foreach loop
# the hash %ENV contains the environment settings
# sort(keys %ENV) sorts the keys (parameter names) of the environment hash
# $_ is the current environment parameter name
# $ENV{$_} is the value of the current environment parameter

foreach (sort(keys %ENV)) {
    print $_ . " = " . "\"$ENV{$_}\"" . "<br>\n";
}

exit;
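printenv2.pl prints the environment values into the page unchanged, so a value that contains "<" or "&" would be interpreted by the browser as markup. The following is a minimal sketch of how the same listing could be built with the values escaped first. The helper names escape_html and env_lines and the small demo hash are hypothetical and not part of the original script.

```perl
use strict;
use warnings;

# Escape the characters that have a special meaning in HTML.
# "&" is handled first so that the entities inserted afterwards stay intact.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Build one line 'NAME = "value"<br>' per hash entry, sorted by key,
# in the same layout that printenv2.pl prints.
sub env_lines {
    my ($env) = @_;
    return map { escape_html($_) . ' = "' . escape_html($env->{$_}) . "\"<br>\n" }
           sort keys %$env;
}

# Demonstration with a small hypothetical hash instead of the real %ENV.
my %demo = (QUERY_STRING => 'a=1&b=<2>', SERVER_NAME => 'localhost');
print env_lines(\%demo);
```

In a CGI script the call would be env_lines(\%ENV), printed after the Content-type header.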

5. Example: Perl generating web pages with Apache - DNA analysis

The following HTML page dnamw.html will collect DNA data from the user and call the Perl script dna1.pl inside the /cgi-bin folder to analyse the DNA and calculate the molecular weight.

Web service to show base composition, GC content and molecular weight → dnamw.html → /cgi-bin/dna1.pl

Copy the file dnamw.html to the Apache \htdocs directory and dna1.pl to the \cgi-bin directory. Open the web page in Firefox with one of the following links, depending on your web server port (80 or 8080):

http://localhost/dnamw.html or http://localhost:8080/dnamw.html

Fill the entry fields with data and press "Analyze".
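The analysis part of such a service can be sketched independently of the web server. The example below counts the bases and derives the GC content as a percentage; it is only an illustration and not the code of dna1.pl. The function names base_composition and gc_content are hypothetical, and the molecular weight is left out because it depends on per-nucleotide masses that are not stated here.

```perl
use strict;
use warnings;

# Count A, C, G and T in a sequence (case-insensitive, whitespace ignored).
# All other symbols are collected under the key "other".
sub base_composition {
    my ($seq) = @_;
    $seq =~ s/\s+//g;
    my %count = (A => 0, C => 0, G => 0, T => 0, other => 0);
    for my $base (split //, uc $seq) {
        my $key = $base =~ /^[ACGT]$/ ? $base : 'other';
        $count{$key}++;
    }
    return %count;
}

# GC content in percent, relative to the number of A, C, G and T bases.
# Returns 0 for a sequence without any of these bases.
sub gc_content {
    my ($seq) = @_;
    my %count = base_composition($seq);
    my $total = $count{A} + $count{C} + $count{G} + $count{T};
    return 0 unless $total;
    return 100 * ($count{G} + $count{C}) / $total;
}

my %comp = base_composition('ACGTTAT');
print join(', ', map { "$_=$comp{$_}" } qw(A C G T)), "\n";   # A=2, C=1, G=1, T=3
printf "GC content: %.1f %%\n", gc_content('ACGTTAT');         # GC content: 28.6 %
```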

The data are passed to the Perl script by the HTTP POST method, which sends them URL-encoded as one string in the body of the request. Special characters inside the data are encoded as "%hexcode". dna1.pl has to translate the hex-coded data back to the original characters before analysing them. The HTML page has a <form></form> section which is bound to the Perl script by its action and method attributes. It is activated by pressing the submit button "Analyze" and sends the data entered into the text field "Name:" and the text area "DNA Sequence (plain or FASTA):"

<form action="/cgi-bin/dna1.pl" method="post" name="input">
<p>Name:<br><input type="text" name="name" size="40" maxlength="40"></p>
<p>DNA Sequence (plain or FASTA):<br><textarea rows="5" cols="50" name="sequence"></textarea></p>
<p><input type="submit" value="Analyze"></p>
</form>

The POST string received by the Perl program from STDIN uses "&" as field separator. Each field consists of the field name (the name attributes "name" and "sequence" in the HTML code above) and the field content, joined by an equals sign "=": …&name=testdna&sequence=ACGTTAT….

A common strategy, also used in this script, is to collect the data from STDIN (which here is not the keyboard but the web server Apache), split the field-value pairs into an array using "&" as the split indicator, and split the array entries step by step into a hash with the field names as keys and the field contents as values. The hex-coded special characters are decoded to normal characters. Special characters which should be shown on the web site have to be encoded as HTML entities (starting with an ampersand), e.g. the FASTA sequence start symbol ">" as "&gt;"; the most important case is "<", the HTML tag open marker, which is encoded as "&lt;".
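As a minimal sketch of this step, the code below reads the POST string and splits it into a hash of field names and (still encoded) values. It assumes that the length of the data is taken from the CGI environment variable CONTENT_LENGTH. The function names read_post_body and split_pairs are hypothetical, and an in-memory filehandle stands in for STDIN so that the example runs without a web server.

```perl
use strict;
use warnings;

# Read up to $length bytes of POST data from a filehandle.
# In a CGI script the handle is STDIN and the length is $ENV{CONTENT_LENGTH}.
sub read_post_body {
    my ($fh, $length) = @_;
    my $body = '';
    read($fh, $body, $length) if $length && $length > 0;
    return $body;
}

# Split "name=value&name2=value2" into a hash; the values stay hex-coded.
sub split_pairs {
    my ($body) = @_;
    my %fields;
    for my $pair (split /&/, $body) {
        my ($name, $value) = split /=/, $pair, 2;
        next unless defined $name && length $name;
        $fields{$name} = defined $value ? $value : '';
    }
    return %fields;
}

# Demonstration: an in-memory filehandle replaces STDIN.
my $raw = 'name=testdna&sequence=ACGTTAT';
open(my $in, '<', \$raw) or die "cannot open in-memory handle: $!";
my %fields_values = split_pairs(read_post_body($in, length $raw));
close $in;
print "$_ => $fields_values{$_}\n" for sort keys %fields_values;
```

Inside a real CGI script the call would look like split_pairs(read_post_body(\*STDIN, $ENV{CONTENT_LENGTH})).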

Compare at SELFHTML: http://de.selfhtml.org/html/referenz/zeichen.htm#benannte_html

Apache passes the FASTA ">" as hex code: %3E
Perl decodes the hex code to the normal character: >
Apache gets the ">" encoded as HTML entity: &gt;

The hex decoding and the entity encoding are done by the following constructs:

$value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
$value =~ s/>/&gt;/g;

The pattern in parentheses, ([a-fA-F0-9][a-fA-F0-9]), is captured as $1 (the hex number without the leading %) in the search pattern and is converted in the replacement part, which the e modifier evaluates as Perl code:
- the function hex converts the hex string into its numeric character code
- pack with the template "C" (unsigned char) packs this number into the corresponding single character.
The second substitution replaces every ">" by the HTML entity "&gt;".
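The two substitutions can be tried out on a sample value. The sketch below wraps them in two small functions; the names url_decode and escape_gt are hypothetical. As an addition that is not part of the constructs above, url_decode also converts "+" to a space, because form data encodes spaces that way.

```perl
use strict;
use warnings;

# Decode a form-encoded value: "+" becomes a space,
# "%XX" becomes the character with hex code XX.
sub url_decode {
    my ($value) = @_;
    $value =~ tr/+/ /;
    $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
    return $value;
}

# Encode ">" as an HTML entity so that it can be shown on a web page.
sub escape_gt {
    my ($value) = @_;
    $value =~ s/>/&gt;/g;
    return $value;
}

my $encoded = '%3Eseq1+test';
my $decoded = url_decode($encoded);
print "$decoded\n";              # >seq1 test
print escape_gt($decoded), "\n"; # &gt;seq1 test
```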

Next, if the sequence stored under the key "sequence" of the hash %fields_values contains a FASTA header line, the header is separated from the sequence:

if ($fields_values{sequence} =~ /(^\&gt\;[^\n]*)\n([A-Za-z]*)/) {
    $fasta_header = $1;
    $dna_sequence = $2;
}

The first set of parentheses matches our already entity-encoded ">" FASTA start symbol ("&gt;") at the beginning (^), followed by zero or more (quantifier *) characters that are not line breaks ([^\n]). The "&" and the semicolon are masked by a backslash here; this is harmless, although neither is a metacharacter in Perl regular expressions. The second set of parentheses captures the letters of the sequence that follow the line break after the header.

Attention: A caret ^ as the first character inside square brackets means "not": any character except the listed ones.
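A slightly extended variant of this separation is sketched below. It differs from the construct above in two labelled assumptions: carriage returns are removed first, because browsers may send line breaks as CR+LF, and sequences spanning several lines are joined, whereas the pattern above captures only the first run of letters. The function name split_fasta is hypothetical.

```perl
use strict;
use warnings;

# Separate an entity-encoded FASTA header ("&gt;...") from the sequence.
# Returns (header, sequence); the header is '' for plain sequences.
sub split_fasta {
    my ($text) = @_;
    $text =~ s/\r//g;                        # assumption: input may contain CR+LF
    if ($text =~ /^(&gt;[^\n]*)\n(.*)/s) {   # /s lets "." match line breaks too
        my ($header, $body) = ($1, $2);
        $body =~ s/[^A-Za-z]//g;             # drop line breaks and other non-letters
        return ($header, $body);
    }
    (my $plain = $text) =~ s/[^A-Za-z]//g;
    return ('', $plain);
}

my ($fasta_header, $dna_sequence) = split_fasta("&gt;seq1 test\nACGT\nTTAA\n");
print "$fasta_header\n";   # &gt;seq1 test
print "$dna_sequence\n";   # ACGTTTAA
```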

The output HTML page is simply written with print to STDOUT, which is still connected to Apache and not to the screen:

print "Content-type: text/html\n\n";
print '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">', "\n";
print "<html>\n";
print "<head>\n";
........
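Instead of many single print statements, the page can also be assembled with a here-document. The following is only a sketch of that alternative and not the code of dna1.pl; the function name render_page and the page content are hypothetical.

```perl
use strict;
use warnings;

# Return the CGI header followed by a complete HTML page.
# $title and $body_html are inserted as they are, so they must
# already be entity-encoded where necessary.
sub render_page {
    my ($title, $body_html) = @_;
    return "Content-type: text/html\n\n" . <<"HTML";
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>$title</title>
</head>
<body>
$body_html
</body>
</html>
HTML
}

print render_page('DNA analysis', '<p>Hello from Perl</p>');
```

The empty line after the Content-type header is required; it separates the header from the page content.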